Paper Title

AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning

Authors

Jiaxin Wen, Yeshuang Zhu, Jinchao Zhang, Jie Zhou, Minlie Huang

Abstract

Recent studies have shown the impressive efficacy of counterfactually augmented data (CAD) for reducing NLU models' reliance on spurious features and improving their generalizability. However, current methods still rely heavily on human effort or task-specific designs to generate counterfactuals, which impedes CAD's applicability to a broad range of NLU tasks. In this paper, we present AutoCAD, a fully automatic and task-agnostic CAD generation framework. AutoCAD first leverages a classifier to identify rationales, in an unsupervised manner, as the spans to intervene on, which disentangles spurious and causal features. Then, AutoCAD performs controllable generation enhanced by unlikelihood training to produce diverse counterfactuals. Extensive evaluations on multiple out-of-domain and challenge benchmarks demonstrate that AutoCAD consistently and significantly boosts the out-of-distribution performance of powerful pre-trained models across different NLU tasks, with results comparable to or even better than previous state-of-the-art human-in-the-loop or task-specific CAD methods. The code is publicly available at https://github.com/thu-coai/AutoCAD.
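
The abstract mentions unlikelihood training without giving its form. As a rough illustration only, the sketch below shows the generic unlikelihood objective: the usual negative log-likelihood on tokens the model should produce, plus a penalty of -log(1 - p) on tokens it should avoid. The function names, the `alpha` weight, and the choice of which tokens count as "negative" are assumptions made for this example and are not taken from the AutoCAD code.

```python
import math

def likelihood_loss(target_probs):
    """Mean negative log-likelihood over tokens the model should generate."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)

def unlikelihood_loss(negative_probs, eps=1e-8):
    """Mean of -log(1 - p) over tokens the model should avoid.

    eps guards against log(0) when a negative token has probability 1.
    """
    return -sum(math.log(max(1.0 - p, eps)) for p in negative_probs) / len(negative_probs)

def combined_loss(target_probs, negative_probs, alpha=1.0):
    """Likelihood term plus an alpha-weighted unlikelihood term (alpha is hypothetical)."""
    return likelihood_loss(target_probs) + alpha * unlikelihood_loss(negative_probs)

# Hypothetical per-token probabilities assigned by a generator:
# tokens of the desired counterfactual span vs. tokens we want to discourage.
wanted = [0.6, 0.7, 0.5]
unwanted = [0.3, 0.2]
loss = combined_loss(wanted, unwanted, alpha=0.5)
```

The penalty grows as the model puts more probability on an unwanted token, so minimizing the combined loss pushes probability mass away from those tokens while still fitting the desired ones.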
