Closing the Gap: Joint De-Identification and Concept Extraction in the Clinical Domain

Paper Title

Closing the Gap: Joint De-Identification and Concept Extraction in the Clinical Domain

Authors

Lukas Lange, Heike Adel, Jannik Strötgen

Abstract

Exploiting natural language processing in the clinical domain requires de-identification, i.e., anonymization of personal information in texts. However, current research considers de-identification and downstream tasks, such as concept extraction, only in isolation and does not study the effects of de-identification on other tasks. In this paper, we close this gap by reporting concept extraction performance on automatically anonymized data and investigating joint models for de-identification and concept extraction. In particular, we propose a stacked model with restricted access to privacy-sensitive information and a multitask model. We set the new state of the art on benchmark datasets in English (96.1% F1 for de-identification and 88.9% F1 for concept extraction) and Spanish (91.4% F1 for concept extraction).
