Effective Transfer Learning for Identifying Similar Questions: Matching User Questions to COVID-19 FAQs

Paper Title

Effective Transfer Learning for Identifying Similar Questions: Matching User Questions to COVID-19 FAQs

Authors

McCreery, Clara H., Katariya, Namit, Kannan, Anitha, Chablani, Manish, Amatriain, Xavier

Abstract

People increasingly search online for answers to their medical questions but the rate at which medical questions are asked online significantly exceeds the capacity of qualified people to answer them. This leaves many questions unanswered or inadequately answered. Many of these questions are not unique, and reliable identification of similar questions would enable more efficient and effective question answering schema. COVID-19 has only exacerbated this problem. Almost every government agency and healthcare organization has tried to meet the informational need of users by building online FAQs, but there is no way for people to ask their question and know if it is answered on one of these pages. While many research efforts have focused on the problem of general question similarity, these approaches do not generalize well to domains that require expert knowledge to determine semantic similarity, such as the medical domain. In this paper, we show how a double fine-tuning approach of pretraining a neural network on medical question-answer pairs followed by fine-tuning on medical question-question pairs is a particularly useful intermediate task for the ultimate goal of determining medical question similarity. While other pretraining tasks yield an accuracy below 78.7% on this task, our model achieves an accuracy of 82.6% with the same number of training examples, an accuracy of 80.0% with a much smaller training set, and an accuracy of 84.5% when the full corpus of medical question-answer data is used. We also describe a currently live system that uses the trained model to match user questions to COVID-related FAQs.
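The matching step described in the abstract (compare a user question against a fixed set of FAQ questions and return the closest one, or nothing if none is close enough) can be sketched as follows. This is only an illustration under stated assumptions: the bag-of-words `embed` function is a stand-in for the paper's fine-tuned model, and the names `match_faq`, `FAQS`, and the `threshold` value of 0.5 are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in encoder: a bag-of-words count vector.

    Assumption: the real system would use the fine-tuned neural model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_faq(user_question, faqs, threshold=0.5):
    """Return the most similar FAQ question, or None if none reaches the threshold."""
    query = embed(user_question)
    best, best_score = None, 0.0
    for faq in faqs:
        score = cosine(query, embed(faq))
        if score > best_score:
            best, best_score = faq, score
    return best if best_score >= threshold else None


# Hypothetical FAQ questions, for illustration only.
FAQS = [
    "What are the symptoms of COVID-19?",
    "How does COVID-19 spread?",
    "Should I wear a mask in public?",
]

if __name__ == "__main__":
    print(match_faq("What symptoms does COVID-19 cause?", FAQS))
    print(match_faq("Where can I renew my passport?", FAQS))
```

Returning `None` below the threshold reflects the case where a user question is not answered by any FAQ. A word-overlap encoder like this one would miss paraphrases that share few words, which is the kind of gap the abstract says requires domain-specific training to close.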
