Paper title
Query expansion with artificially generated texts
Paper authors
Paper abstract
A well-known way to improve document retrieval performance is to expand the user's query. Several approaches have been proposed in the literature, and some of them are considered to yield state-of-the-art results in information retrieval (IR). In this paper, we explore the use of text generation to expand queries automatically. We rely on a well-known neural generative model, GPT-2, which comes with pre-trained models for English but can also be fine-tuned on specific corpora. Through different experiments, we show that text generation is a very effective way to improve the performance of an IR system, by a large margin (+10% MAP gains), and that it outperforms strong baselines that also rely on query expansion (LM+RM3). This conceptually simple approach can easily be implemented on any IR system thanks to the availability of GPT code and models.
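The general idea of expanding a query with generated text can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the function names `expand_query`, `score`, and `fake_generate` are hypothetical, the term-count scoring is a toy stand-in for a real retrieval model, and `fake_generate` is a placeholder for a call to a GPT-2 model prompted with the query.

```python
from collections import Counter
from typing import Callable


def expand_query(query: str, generate: Callable[[str], str], n_texts: int = 3) -> Counter:
    """Return a bag of words: the query terms plus the terms of n_texts generated texts."""
    terms = Counter(query.lower().split())
    for _ in range(n_texts):
        # Each generated text is conditioned on the original query (assumption).
        terms.update(generate(query).lower().split())
    return terms


def score(doc: str, expanded: Counter) -> int:
    """Toy relevance score: sum of expanded-query term counts over the document tokens."""
    return sum(expanded[tok] for tok in doc.lower().split())


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-2 generation call; returns a fixed continuation."""
    return prompt + " symptoms include fever and cough"


if __name__ == "__main__":
    expanded = expand_query("flu", fake_generate, n_texts=2)
    for doc in ["fever and cough", "stock market"]:
        print(doc, score(doc, expanded))
```

In this sketch, a document that never mentions the original query term can still receive a non-zero score because it shares vocabulary with the generated texts, which is the intuition behind this kind of expansion. In a real system the expanded bag of words would be passed to the retrieval model in place of the toy `score` function.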