Paper title
Learning Contextualized Document Representations for Healthcare Answer Retrieval
Paper authors
Abstract
We present Contextual Discourse Vectors (CDV), a distributed document representation for efficient answer retrieval from long healthcare documents. Our approach is based on structured query tuples of entities and aspects from free text and medical taxonomies. Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse. We use our continuous representations to resolve queries with low latency using approximate nearest neighbor search at the sentence level. We apply the CDV model to retrieve coherent answer passages from nine English public health resources from the Web, addressing both patients and medical professionals. Because there is no end-to-end training data available for all application scenarios, we train our model with self-supervised data from Wikipedia. We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking and is able to adapt to heterogeneous domains without additional fine-tuning.
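To make the sentence-level retrieval step more concrete, the following is a minimal sketch of how a query vector might be matched against precomputed sentence vectors. It is not the authors' implementation. The names `normalize` and `retrieve_passage` are hypothetical, the vectors are toy placeholders rather than outputs of the CDV encoders, exact cosine search stands in for an approximate nearest neighbor index, and widening the hit by a fixed `window` of neighboring sentences is only an assumed way of forming a passage.

```python
import numpy as np

def normalize(x):
    # L2-normalize along the last axis so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieve_passage(query_vec, sentence_vecs, doc_ids, window=1):
    """Find the best-matching sentence and widen it to a passage.

    Assumptions (not from the paper): sentences of one document are stored
    contiguously, and a passage is the best sentence plus `window`
    neighbors on each side, clipped to the document boundaries.
    Returns (doc_id, first_sentence_index, last_sentence_index).
    """
    # Exact cosine search; a real system would query an ANN index instead.
    scores = normalize(sentence_vecs) @ normalize(query_vec)
    best = int(np.argmax(scores))

    # Indices of all sentences that belong to the same document as the hit.
    same_doc = np.flatnonzero(doc_ids == doc_ids[best])
    start = max(best - window, int(same_doc[0]))
    end = min(best + window, int(same_doc[-1]))
    return int(doc_ids[best]), start, end

# Toy data: 7 sentences in 2 documents, each with an 8-dimensional vector.
sentence_vecs = np.eye(7, 8)
doc_ids = np.array([0, 0, 0, 1, 1, 1, 1])

# Placeholder query vector, closest to sentence 4 by construction.
query_vec = np.array([0, 0, 0, 0, 1.0, 0.1, 0, 0])

doc_id, start, end = retrieve_passage(query_vec, sentence_vecs, doc_ids)
```

In a dual encoder setup, the query (an entity and aspect pair) and the document sentences would be embedded by separate encoders into the same vector space, so the sentence vectors can be indexed ahead of time and only the query needs to be encoded at request time.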