Paper Title
Multi-Perspective Semantic Information Retrieval in the Biomedical Domain
Paper Authors
Paper Abstract
Information Retrieval (IR) is the task of obtaining pieces of data (such as documents) that are relevant to a particular query or need from a large repository of information. IR is a valuable component of several downstream Natural Language Processing (NLP) tasks. In practice, IR is at the heart of many widely used technologies, such as search engines. While probabilistic ranking functions such as Okapi BM25 have been used in IR systems since the 1970s, modern neural approaches offer certain advantages over their classical counterparts. In particular, the release of BERT (Bidirectional Encoder Representations from Transformers) has had a significant impact on the NLP community by demonstrating how a Masked Language Model trained on a large corpus of data can improve a variety of downstream NLP tasks, including sentence classification and passage re-ranking. IR systems are also important in the biomedical and clinical domains. Given the increasing amount of scientific literature across the biomedical domain, the ability to find answers to specific clinical queries in a repository of millions of articles is of practical value to medical professionals. Moreover, there are domain-specific challenges, including handling clinical jargon and evaluating the similarity or relatedness of various medical symptoms when determining the relevance of a sentence to a query. This work presents contributions to several aspects of the Biomedical Semantic Information Retrieval domain. First, it introduces Multi-Perspective Sentence Relevance, a novel methodology for utilizing BERT-based models for contextual IR. The system is evaluated using the BioASQ Biomedical IR Challenge. Finally, practical contributions are provided in the form of a live IR system for medics and a proposed challenge on the Living Systematic Review clinical task.