Paper Title


WeCheck: Strong Factual Consistency Checker via Weakly Supervised Learning

Paper Authors

Wenhao Wu, Wei Li, Xinyan Xiao, Jiachen Liu, Sujian Li, Yajuan Lv

Paper Abstract


A crucial issue with current text generation models is that they often uncontrollably generate text that is factually inconsistent with their inputs. Limited by the lack of annotated data, existing work on evaluating factual consistency directly transfers the reasoning ability of models trained on other data-rich upstream tasks, such as question answering (QA) and natural language inference (NLI), without any further adaptation. As a result, these methods perform poorly on real generated text and are heavily biased by their single-source upstream tasks. To alleviate this problem, we propose a weakly supervised framework, namely WeCheck, that aggregates multiple resources to train a precise and efficient factual metric. WeCheck first utilizes a generative model to accurately label real generated samples by aggregating their weak labels, which are inferred from multiple resources. Then, we train the target metric model with this weak supervision while taking noise into account. Comprehensive experiments on a variety of tasks demonstrate the strong performance of WeCheck, which achieves an average 3.4% absolute improvement over previous state-of-the-art methods on the TRUE benchmark.
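The abstract's core idea is to combine several noisy upstream signals (e.g., QA- and NLI-based consistency scores) into one training label, while discarding samples where the signals disagree too much. The sketch below is a minimal illustration of that idea under simplifying assumptions: it uses simple vote aggregation and an abstain band as a stand-in for noise-aware training, not the paper's actual generative labeling model; all function names and thresholds are hypothetical.

```python
# Hedged sketch of weak-label aggregation for factual consistency.
# Assumption: each upstream metric yields a score in [0, 1], where
# higher means "more factually consistent". The aggregation rule and
# thresholds below are illustrative, not WeCheck's exact procedure.

def aggregate_weak_labels(scores, threshold=0.5):
    """Binarize each metric's score into a vote, then return the
    fraction of metrics voting 'consistent' as a soft label."""
    votes = [1 if s >= threshold else 0 for s in scores]
    return sum(votes) / len(votes)

def noise_aware_target(soft_label, abstain_band=(0.4, 0.6)):
    """Treat ambiguous aggregated labels as abstentions (None) so the
    target metric model is not trained on likely-noisy samples;
    otherwise round the soft label to a hard 0/1 training target."""
    lo, hi = abstain_band
    if lo < soft_label < hi:
        return None  # metrics disagree: skip this sample
    return round(soft_label)

# Example: QA-based and two NLI-based scores for one generated summary.
soft = aggregate_weak_labels([0.9, 0.8, 0.2])  # two of three vote "consistent"
target = noise_aware_target(soft)              # hard label 1
```

In a fuller system, the hard 0/1 targets surviving the abstain filter would supervise a lightweight checker model, which is what makes the resulting metric efficient at inference time.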
