Paper Title

Investigating the Robustness of Natural Language Generation from Logical Forms via Counterfactual Samples

Paper Authors

Chengyuan Liu, Leilei Gan, Kun Kuang, Fei Wu

Paper Abstract

The aim of Logic2Text is to generate controllable and faithful texts conditioned on tables and logical forms, which not only requires a deep understanding of the tables and logical forms, but also warrants symbolic reasoning over the tables. State-of-the-art methods based on pre-trained models have achieved remarkable performance on the standard test dataset. However, we question whether these methods really learn how to perform logical reasoning, rather than just relying on spurious correlations between the headers of the tables and the operators of the logical forms. To verify this hypothesis, we manually construct a set of counterfactual samples, which modify the original logical forms to generate counterfactual logical forms with rarely co-occurring table headers and logical operators. SOTA methods give much worse results on these counterfactual samples than on the original test dataset, which confirms our hypothesis. To deal with this problem, we first analyze this bias from a causal perspective, based on which we propose two approaches to reduce the model's reliance on such shortcuts. The first incorporates the hierarchical structure of the logical forms into the model. The second exploits automatically generated counterfactual data for training. Automatic and manual experimental results on the original test dataset and the counterfactual dataset show that our method is effective in alleviating the spurious correlations. Our work points out the weaknesses of previous methods and takes a further step toward developing Logic2Text models with genuine logical reasoning ability.
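
To make the idea of a counterfactual sample concrete, the sketch below shows one way such a perturbation could be produced: a Logic2Text-style logical form (a LISP-like string) has an operator swapped for a contrasting one, so the resulting operator rarely co-occurs with the original table headers. This is a minimal illustration only; the operator mapping, the example logical form, and the regex-based tokenization are assumptions, not the authors' actual construction procedure (the paper builds the evaluation set manually and generates training-time counterfactuals automatically).

import re

# Contrasting-operator pairs used to perturb a logical form. This mapping
# is an assumption for illustration; the paper's actual construction of
# counterfactual logical forms may differ.
OP_SWAPS = {
    "max": "min", "min": "max",
    "argmax": "argmin", "argmin": "argmax",
    "greater": "less", "less": "greater",
}

def make_counterfactual(logical_form: str) -> str:
    """Swap operator tokens for contrasting counterparts, yielding a
    logical form whose operators rarely co-occur with the table headers
    of the original sample."""
    def swap(match: re.Match) -> str:
        token = match.group(0)
        return OP_SWAPS.get(token, token)
    # Tokens in Logic2Text-style forms are lowercase words joined by "_".
    return re.sub(r"[a-z_]+", swap, logical_form)

# Hypothetical Logic2Text-style logical form (LISP-like string).
original = "eq { hop { argmax { all_rows ; attendance } ; date } ; october 31 }"
print(make_counterfactual(original))
# -> eq { hop { argmin { all_rows ; attendance } ; date } ; october 31 }

A model that leans on header-operator co-occurrence rather than on the logical form itself will tend to ignore such a swap and describe the original operator, which is exactly the failure mode the counterfactual evaluation is designed to expose.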
