Paper Title

Assessment of contextualised representations in detecting outcome phrases in clinical trials

Authors

Abaho, Micheal, Bollegala, Danushka, Williamson, Paula R, Dodd, Susanna

Abstract


Automating the recognition of outcomes reported in clinical trials using machine learning has huge potential to speed up access to the evidence needed in healthcare decision-making. Prior research has, however, acknowledged inadequate training corpora as a challenge for the outcome detection (OD) task. Additionally, several contextualized representations such as BERT and ELMo have achieved unparalleled success in detecting various diseases, genes, proteins, and chemicals; the same cannot yet be emphatically stated for outcomes, because these models have been relatively under-tested and under-studied for the OD task. We introduce "EBM-COMET", a dataset in which 300 PubMed abstracts are expertly annotated for clinical outcomes. Unlike prior related datasets that use arbitrary outcome classifications, we use labels from a recently published taxonomy to standardize outcome classifications. To extract outcomes, we fine-tune a variety of pre-trained contextualized representations; additionally, we use frozen contextualized and context-independent representations in our custom neural model, augmented with clinically informed Part-of-Speech embeddings and a cost-sensitive loss function. We adopt strict evaluation for the trained models, rewarding them for correctly identifying full outcome phrases rather than individual words within the entities, i.e. given the outcome "systolic blood pressure", a model earns a classification score only when it predicts all three words in sequence; otherwise, it is not rewarded. We observe that our best model (BioBERT) achieves 81.5\% F1, 81.3\% sensitivity and 98.0\% specificity. We reach a consensus on which contextualized representations are best suited for detecting outcomes from clinical-trial abstracts. Furthermore, our best model outperforms the scores published on the original EBM-NLP dataset leader-board.
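The strict, phrase-level evaluation the abstract describes can be sketched in a few lines. This is a minimal illustration, not the authors' code: it extracts entity spans from BIO tag sequences (the tag names `B-OUT`/`I-OUT` are assumptions for illustration) and only credits a prediction as a true positive when the predicted span matches the gold span exactly, so predicting "systolic blood" for "systolic blood pressure" earns nothing.

```python
def spans(tags):
    """Extract (start, end, type) entity spans from one BIO tag sequence."""
    out, start = [], None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                out.append((start, i, tags[start][2:]))
                start = None
            if tag.startswith("B-"):
                start = i
    return set(out)

def strict_scores(gold_tags, pred_tags):
    """Exact-span precision, recall and F1 over lists of tagged sentences.

    A predicted span counts as a true positive only if its start, end and
    entity type all match a gold span -- no partial credit for overlap.
    """
    gold, pred = set(), set()
    for s, (g, p) in enumerate(zip(gold_tags, pred_tags)):
        gold |= {(s,) + span for span in spans(g)}
        pred |= {(s,) + span for span in spans(p)}
    tp = len(gold & pred)
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# "systolic blood pressure" annotated as a single three-token outcome:
gold = [["B-OUT", "I-OUT", "I-OUT", "O"]]
partial = [["B-OUT", "I-OUT", "O", "O"]]   # only "systolic blood" -- no reward
full = [["B-OUT", "I-OUT", "I-OUT", "O"]]  # full phrase -- rewarded
```

Under this scheme `strict_scores(gold, partial)` yields an F1 of 0.0 despite two of three tokens being correct, while `strict_scores(gold, full)` yields 1.0, which is exactly the all-or-nothing behaviour described above.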
