Paper Title

Field: Retrieval-Augmented Language Model Pre-Training

REALM: Retrieval-Augmented Language Model Pre-Training

Paper Authors

Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang

Paper Abstract

Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents. We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity.
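To make the abstract's core idea concrete, below is a minimal sketch (not the authors' implementation) of the retrieval-augmented objective it describes: the model marginalizes over retrieved documents, p(y|x) = Σ_z p(y|x,z) p(z|x), so the masked-language-modeling loss backpropagates through the retrieval step. All names, shapes, and the toy encoders here are illustrative assumptions; the real system uses BERT-style encoders and maximum inner product search over millions of documents.

```python
# Hedged sketch of REALM-style retrieval-augmented masked language modeling.
# Toy sizes and the linear "knowledge-augmented encoder" are assumptions.
import torch

torch.manual_seed(0)

num_docs, dim, vocab = 8, 16, 100                   # toy corpus / embedding / vocab sizes

# Retriever: dense embeddings for the query and each candidate document.
query_emb = torch.randn(dim, requires_grad=True)            # Embed_input(x)
doc_embs  = torch.randn(num_docs, dim, requires_grad=True)  # Embed_doc(z)

# p(z|x): softmax over inner-product relevance scores (the retrieval step).
scores = doc_embs @ query_emb                       # f(x, z) for every document z
p_z_given_x = torch.softmax(scores, dim=0)

# p(y|x, z): a stand-in predictor for the masked token, conditioned on the
# query and one retrieved document (a real model would run a joint encoder).
predictor = torch.nn.Linear(dim, vocab)
logits_per_doc = predictor(doc_embs + query_emb)    # (num_docs, vocab)
p_y_given_xz = torch.softmax(logits_per_doc, dim=-1)

# Marginalize over documents, then apply the MLM loss for the true token y.
y = torch.tensor(42)                                # masked-token target (toy)
p_y_given_x = (p_z_given_x.unsqueeze(-1) * p_y_given_xz).sum(dim=0)
loss = -torch.log(p_y_given_x[y])

loss.backward()
# Because p(z|x) appears inside the marginal likelihood, the retriever
# embeddings receive gradient signal from the language-modeling loss.
print(query_emb.grad.norm(), doc_embs.grad.norm())
```

In the paper's full setting, summing over every document is intractable, so the marginalization is approximated with the top-k documents returned by maximum inner product search over a periodically refreshed index.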
