Paper Title
When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories
Paper Authors
Paper Abstract
Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the limitations of relying solely on their parameters to encode a wealth of world knowledge. This paper aims to understand LMs' strengths and limitations in memorizing factual knowledge, by conducting large-scale knowledge probing experiments of 10 models and 4 augmentation methods on PopQA, our new open-domain QA dataset with 14k questions. We find that LMs struggle with less popular factual knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the long tail. We then show that retrieval-augmented LMs largely outperform orders of magnitude larger LMs, while unassisted LMs remain competitive in questions about high-popularity entities. Based on those findings, we devise a simple, yet effective, method for powerful and efficient retrieval-augmented LMs, which retrieves non-parametric memories only when necessary. Experimental results show that this significantly improves models' performance while reducing the inference costs.
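The abstract's "retrieve non-parametric memories only when necessary" idea can be illustrated with a minimal routing sketch. This is not the paper's implementation; the threshold value and all function names (`get_popularity`, `lm_answer`, `retrieve_and_answer`) are hypothetical stand-ins for an entity-popularity signal and the two answering paths:

```python
# Hypothetical sketch of adaptive retrieval as summarized in the abstract:
# answer from the LM's parametric memory for popular entities, and fall back
# to retrieval augmentation only for long-tail (unpopular) entities.
# POPULARITY_THRESHOLD and all callables are illustrative, not from the paper.

POPULARITY_THRESHOLD = 1000  # e.g., an entity-popularity score such as page views


def answer(question, entity_popularity, lm_answer, retrieve_and_answer):
    """Route a question between parametric and retrieval-augmented answering."""
    if entity_popularity >= POPULARITY_THRESHOLD:
        # High-popularity entity: parametric memory is likely reliable,
        # so skip retrieval and save inference cost.
        return lm_answer(question)
    # Long-tail entity: the LM is likely to struggle, so retrieve evidence.
    return retrieve_and_answer(question)
```

The design choice mirrors the paper's finding: retrieval helps most in the long tail, while unassisted LMs remain competitive on high-popularity entities, so gating retrieval on popularity trades little accuracy for lower inference cost.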