Regularized Contrastive Learning of Semantic Search

Paper Title

Regularized Contrastive Learning of Semantic Search

Authors

Tan, Mingxi; Rolland, Alexis; Tian, Andong

Abstract

Semantic search is an important task whose objective is to find the relevant index in a database for a given query. It requires a retrieval model that can properly learn the semantics of sentences. Transformer-based models are widely used as retrieval models because of their excellent ability to learn semantic representations, and many regularization methods suited to them have been proposed. In this paper, we propose a new regularization method, Regularized Contrastive Learning, which helps transformer-based models learn better sentence representations. It first augments several different semantic representations for every sentence, then takes them into the contrastive objective as regulators. These contrastive regulators can overcome overfitting and alleviate the anisotropy problem. We first evaluate our approach on 7 semantic search benchmarks with the strong pre-trained model SRoBERTA. The results show that our method is more effective at learning superior sentence representations. We then evaluate our approach on 2 challenging FAQ datasets, Cough and Faqir, which have long queries and indexes. The results of our experiments demonstrate that our method outperforms baseline methods.
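
The abstract does not spell out the loss function, so the following is only a rough illustration of what a contrastive objective with additional contrast targets can look like. It assumes an InfoNCE-style loss over cosine similarities with a temperature, and it assumes the augmented representations ("regulators") are simply appended to the set of contrast targets. The function names, the temperature value, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(query, positive, contrast_set, temperature=0.05):
    """InfoNCE-style loss (an assumption, not the paper's exact objective).

    Returns -log of the softmax probability assigned to the positive,
    where the softmax runs over the positive plus every vector in
    contrast_set.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, c) / temperature for c in contrast_set]
    # Subtract the maximum logit before exponentiating for numerical stability.
    m = max(logits)
    log_sum = math.log(sum(math.exp(x - m) for x in logits))
    return log_sum + (m - logits[0])


# Toy 2-dimensional embeddings, chosen by hand for illustration only.
query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.2]]
# Hypothetical augmented representations used as extra contrast targets.
regulators = [[0.7, 0.7], [0.8, -0.6]]

base = contrastive_loss(query, positive, negatives)
regularized = contrastive_loss(query, positive, negatives + regulators)
```

In this sketch, adding regulators can only enlarge the softmax denominator, so the regularized loss is never smaller than the base loss. That extra penalty is one way such terms could act as a regularizer, though the paper's actual formulation may differ.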
