Paper Title
MS-Shift: An Analysis of MS MARCO Distribution Shifts on Neural Retrieval
Paper Authors
Paper Abstract
Pre-trained Language Models have recently emerged in Information Retrieval as providing the backbone of a new generation of neural systems that outperform traditional methods on a variety of tasks. However, it is still unclear to what extent such approaches generalize in zero-shot conditions. The recent BEIR benchmark provides partial answers to this question by comparing models on datasets and tasks that differ from the training conditions. We aim to address the same question by comparing models under more explicit distribution shifts. To this end, we build three query-based distribution shifts within MS MARCO (query-semantic, query-intent, query-length), which are used to evaluate the three main families of neural retrievers based on BERT: sparse, dense, and late-interaction -- as well as a monoBERT re-ranker. We further analyse the performance drops between the train and test query distributions. In particular, we experiment with two generalization indicators: the first one based on train/test query vocabulary overlap, and the second based on representations of a trained bi-encoder. Intuitively, those indicators verify that the further away the test set is from the train one, the worse the drop in performance. We also show that models respond differently to the shifts -- dense approaches being the most impacted. Overall, our study demonstrates that it is possible to design more controllable distribution shifts as a tool to better understand generalization of IR models. Finally, we release the MS MARCO query subsets, which provide an additional resource to benchmark zero-shot transfer in Information Retrieval.
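The abstract's first generalization indicator is based on train/test query vocabulary overlap. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch of such an indicator (the tokenization, the overlap ratio, and the example queries are all assumptions for illustration):

```python
# Hypothetical sketch of a train/test query vocabulary-overlap indicator,
# in the spirit of the first indicator described in the abstract.
# The paper's actual formulation may differ.

def vocab_overlap(train_queries, test_queries):
    """Fraction of the test-query vocabulary that also appears in the
    train-query vocabulary (simple lowercase whitespace tokenization)."""
    train_vocab = {tok for q in train_queries for tok in q.lower().split()}
    test_vocab = {tok for q in test_queries for tok in q.lower().split()}
    if not test_vocab:
        return 0.0
    return len(test_vocab & train_vocab) / len(test_vocab)

# Illustrative (made-up) queries: a test set whose vocabulary is close to
# the train set's should yield a high overlap, a distant one a low overlap.
train = ["what is information retrieval", "bert dense retrieval"]
test_near = ["what is dense retrieval"]
test_far = ["symptoms of influenza"]

print(vocab_overlap(train, test_near))  # high overlap -> smaller expected drop
print(vocab_overlap(train, test_far))   # low overlap -> larger expected drop
```

Intuitively, and consistent with the abstract's claim, the lower this overlap (the further the test distribution is from the train one), the larger the performance drop one would expect.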