Paper Title
Can Rationalization Improve Robustness?
Paper Authors
Paper Abstract
A growing line of work has investigated the development of neural NLP models that can produce rationales, subsets of the input that can explain the model's predictions. In this paper, we ask whether such rationale models can also provide robustness against adversarial attacks, in addition to being interpretable. Since these models first generate a rationale ("rationalizer") before making a prediction ("predictor"), they have the potential to ignore noise or adversarially added text by simply masking it out of the generated rationale. To this end, we systematically generate various types of 'AddText' attacks for both token-level and sentence-level rationalization tasks, and perform an extensive empirical evaluation of state-of-the-art rationale models across five different tasks. Our experiments reveal that rationale models show promise in improving robustness, but struggle in certain scenarios, such as when the rationalizer is sensitive to positional bias or to the lexical choices of the attack text. Further, leveraging human rationales as supervision does not always translate to better performance. Our study is a first step towards exploring the interplay between interpretability and robustness in the rationalize-then-predict framework.
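For intuition, here is a minimal illustrative sketch in Python of the rationalize-then-predict pipeline facing an 'AddText'-style attack. The lexicon-based rationalizer, the predictor, and the distractor sentence are hypothetical stand-ins chosen for clarity; they are not the neural models or attack generators used in the paper.

# Toy rationalize-then-predict pipeline under an 'AddText'-style attack.
# The lexicons and the distractor sentence are hypothetical; real
# rationalizers/predictors are trained neural models.
from typing import List, Tuple

DISTRACTOR = "the ticket stub was printed on blue paper"  # label-preserving insertion

def add_text_attack(sentence: str, distractor: str = DISTRACTOR) -> str:
    # Append an off-topic sentence that should not change the gold label.
    return f"{sentence} {distractor}"

RELEVANT = {"loved", "hated", "brilliant", "terrible", "boring", "great"}
POSITIVE = {"loved", "brilliant", "great"}

def rationalizer(tokens: List[str], budget: int = 4) -> List[str]:
    # Keep the `budget` tokens judged most task-relevant; everything else
    # (ideally including the attack text) is masked out of the rationale.
    ranked = sorted(tokens, key=lambda t: t.lower() in RELEVANT, reverse=True)
    return ranked[:budget]

def predictor(rationale: List[str]) -> str:
    # Predict only from the extracted rationale.
    score = sum(+1 if t.lower() in POSITIVE else -1
                for t in rationale if t.lower() in RELEVANT)
    return "positive" if score >= 0 else "negative"

def rationalize_then_predict(sentence: str) -> Tuple[List[str], str]:
    tokens = sentence.split()
    rationale = rationalizer(tokens)
    return rationale, predictor(rationale)

print(rationalize_then_predict("I loved this film and the acting was brilliant"))
print(rationalize_then_predict(add_text_attack("I loved this film and the acting was brilliant")))
# If the rationalizer masks out the distractor, the prediction is unchanged,
# which is the robustness benefit the paper investigates.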