Paper Title

Teaching language models to support answers with verified quotes

Paper Authors

Jacob Menick, Maja Trebacz, Vladimir Mikulik, John Aslanides, Francis Song, Martin Chadwick, Mia Glaese, Susannah Young, Lucy Campbell-Gillingham, Geoffrey Irving, Nat McAleese

Paper Abstract

Recent large language models often answer factual questions correctly. But users can't trust any given claim a model makes without fact-checking, because language models can hallucinate convincing nonsense. In this work we use reinforcement learning from human preferences (RLHP) to train "open-book" QA models that generate answers whilst also citing specific evidence for their claims, which aids in the appraisal of correctness. Supporting evidence is drawn from multiple documents found via a search engine, or from a single user-provided document. Our 280 billion parameter model, GopherCite, is able to produce answers with high quality supporting evidence and abstain from answering when unsure. We measure the performance of GopherCite by conducting human evaluation of answers to questions in a subset of the NaturalQuestions and ELI5 datasets. The model's response is found to be high-quality 80% of the time on this Natural Questions subset, and 67% of the time on the ELI5 subset. Abstaining from the third of questions for which it is most unsure improves performance to 90% and 80% respectively, approaching human baselines. However, analysis on the adversarial TruthfulQA dataset shows why citation is only one part of an overall strategy for safety and trustworthiness: not all claims supported by evidence are true.
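The abstract describes two mechanisms: answers accompanied by verbatim supporting quotes, and abstention when the model is unsure. Below is a minimal sketch, assuming the common pattern of using a reward model learned from human preferences to score candidate (answer, quote) pairs and a tuned threshold to decide when to abstain; it is not the authors' implementation, and all names (Candidate, answer_or_abstain, THRESHOLD) are hypothetical.

```python
# Minimal sketch (not the authors' implementation) of reward-model-based
# selective answering in the style the GopherCite abstract describes:
# score candidate answers with a learned preference model and abstain
# when even the best candidate falls below a confidence threshold.
# All names here are hypothetical placeholders.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    answer: str       # the claim the model makes
    quote: str        # verbatim supporting quote from a source document
    source: str       # document or URL the quote was drawn from
    score: float      # reward-model estimate of human preference


# Hypothetical threshold, tuned on held-out data so that the model
# abstains on roughly the third of questions it is least sure about.
THRESHOLD = 0.5


def answer_or_abstain(candidates: list[Candidate]) -> Optional[Candidate]:
    """Return the best-scoring candidate, or None to abstain when unsure."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.score)
    return best if best.score >= THRESHOLD else None
```

Under such a scheme, raising the threshold trades coverage for quality, which matches the abstract's observation that abstaining on the least certain third of questions lifts response quality to 90% and 80% on the two subsets.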
