Paper Title


Data Mining in Clinical Trial Text: Transformers for Classification and Question Answering Tasks

Paper Authors

Lena Schmidt, Julie Weeds, Julian P. T. Higgins

Abstract


This research on data extraction methods applies recent advances in natural language processing to evidence synthesis based on medical texts. Texts of interest include abstracts of clinical trials in English and in multilingual contexts. The main focus is on information characterized via the Population, Intervention, Comparator, and Outcome (PICO) framework, but data extraction is not limited to these fields. Recent neural network architectures based on transformers show capacities for transfer learning and increased performance on downstream natural language processing tasks such as universal reading comprehension, brought forward by this architecture's use of contextualized word embeddings and self-attention mechanisms. This paper contributes to solving problems related to ambiguity in PICO sentence prediction tasks, as well as highlighting how annotations for training named entity recognition systems are used to train a high-performing, but nevertheless flexible architecture for question answering in systematic review automation. Additionally, it demonstrates how the problem of insufficient amounts of training annotations for PICO entity extraction is tackled by augmentation. All models in this paper were created with the aim to support systematic review (semi)automation. They achieve high F1 scores, and demonstrate the feasibility of applying transformer-based classification methods to support data mining in the biomedical literature.
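The abstract attributes the transformers' gains to contextualized word embeddings produced by self-attention. A minimal single-head, scaled dot-product self-attention sketch in plain NumPy can make that mechanism concrete; it is illustrative only and is not the architecture or code used in the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (n_tokens, d_model) token embeddings.
    Returns contextualized token vectors and the attention weights.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise token relevance
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
```

Because each output vector is a weighted mixture over all tokens in the sentence, the same word receives a different representation in different contexts, which is what makes these embeddings "contextualized".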
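One contribution described above is reusing annotations made for named entity recognition to train a question-answering model. A minimal sketch of that reframing, converting a BIO-tagged sentence into SQuAD-style question/context/answer examples (the tag scheme, field names, and example sentence here are hypothetical, not the paper's actual preprocessing):

```python
def bio_to_qa(tokens, tags, label, question):
    """Turn one BIO-tagged sentence into SQuAD-style QA examples
    for a single entity type (e.g. a PICO intervention span)."""
    context = " ".join(tokens)
    # Character offset of each token within the joined context.
    offsets, pos = [], 0
    for tok in tokens:
        offsets.append(pos)
        pos += len(tok) + 1
    # Collect (start_token, end_token) spans for the requested label.
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == f"B-{label}":
            if start is not None:
                spans.append((start, i - 1))
            start = i
        elif tag != f"I-{label}" and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(tags) - 1))
    # Emit one QA example per entity span.
    return [{"question": question,
             "context": context,
             "answer_text": " ".join(tokens[s:e + 1]),
             "answer_start": offsets[s]}
            for s, e in spans]

examples = bio_to_qa(
    tokens=["Patients", "received", "40", "mg", "atorvastatin", "daily"],
    tags=["O", "O", "B-INT", "I-INT", "I-INT", "O"],
    label="INT",
    question="What is the intervention?",
)
```

Framing extraction as answering "What is the intervention?" against the abstract text is what lets one QA architecture stay flexible across PICO fields, instead of training a separate tagger per entity type.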
