Paper Title

Closing the Gap: Joint De-Identification and Concept Extraction in the Clinical Domain

Authors

Lukas Lange, Heike Adel, Jannik Strötgen

Abstract

Exploiting natural language processing in the clinical domain requires de-identification, i.e., anonymization of personal information in texts. However, current research considers de-identification and downstream tasks, such as concept extraction, only in isolation and does not study the effects of de-identification on other tasks. In this paper, we close this gap by reporting concept extraction performance on automatically anonymized data and investigating joint models for de-identification and concept extraction. In particular, we propose a stacked model with restricted access to privacy-sensitive information and a multitask model. We set the new state of the art on benchmark datasets in English (96.1% F1 for de-identification and 88.9% F1 for concept extraction) and Spanish (91.4% F1 for concept extraction).
