Paper Title
A Natural Language Processing Pipeline of Chinese Free-text Radiology Reports for Liver Cancer Diagnosis
Authors
Abstract
Despite the rapid development of natural language processing (NLP) for electronic medical records (EMRs), processing Chinese EMRs remains challenging because of the limited corpus and specific grammatical characteristics, especially for radiology reports. In this study, we designed an NLP pipeline that extracts clinically relevant features directly from Chinese radiology reports, which is the first key step in computer-aided radiologic diagnosis. The pipeline comprises named entity recognition, synonym normalization, and relationship extraction, and it derives radiological features composed of one or more terms. For named entity recognition, we incorporated a lexicon into the deep learning model bidirectional long short-term memory-conditional random field (BiLSTM-CRF), which achieved an F1 score of 93.00%. With the extracted radiological features, the least absolute shrinkage and selection operator (LASSO) and machine learning methods (support vector machine, random forest, decision tree, and logistic regression) were used to build classifiers for liver cancer prediction. Random forest had the highest predictive performance for liver cancer diagnosis (F1 score 86.97%, precision 87.71%, and recall 86.25%). This work is a comprehensive NLP study focusing on Chinese radiology reports and on the application of NLP to cancer risk prediction. The proposed NLP pipeline for radiological feature extraction could be easily applied to other kinds of Chinese clinical texts and to other disease prediction tasks.
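As a quick consistency check on the random forest figures above, the F1 score can be recomputed as the harmonic mean of the reported precision and recall. This is a minimal sketch, assuming the three numbers describe the same single class and are not averaged across classes or folds (the abstract does not say). The helper name `f1_score` is hypothetical and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs).

    Hypothetical helper for illustration; not the authors' code.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the random forest classifier, in percent.
precision = 87.71
recall = 86.25

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}%")  # F1 = 86.97%
```

Under that assumption, the recomputed value agrees with the reported F1 score of 86.97%.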