Paper Title
FiD-Ex: Improving Sequence-to-Sequence Models for Extractive Rationale Generation
Paper Authors
Paper Abstract
Natural language (NL) explanations of model predictions are gaining popularity as a means to understand and verify decisions made by large black-box pre-trained models, for NLP tasks such as Question Answering (QA) and Fact Verification. Recently, pre-trained sequence-to-sequence (seq2seq) models have proven to be very effective in jointly making predictions as well as generating NL explanations. However, these models have several shortcomings: they can fabricate explanations even for incorrect predictions, they are difficult to adapt to long input documents, and their training requires a large amount of labeled data. In this paper, we develop FiD-Ex, which addresses these shortcomings for seq2seq models by: 1) introducing sentence markers to eliminate explanation fabrication by encouraging extractive generation, 2) using the fusion-in-decoder architecture to handle long input contexts, and 3) intermediate fine-tuning on re-structured open-domain QA datasets to improve few-shot performance. FiD-Ex significantly improves over prior work in terms of explanation metrics and task accuracy on multiple tasks from the ERASER explainability benchmark, in both the fully supervised and the few-shot settings.
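To illustrate the sentence-marker idea described in the abstract, here is a minimal sketch of how input sentences might be prefixed with index markers so a seq2seq model can produce extractive rationales by emitting marker references instead of free-form (potentially fabricated) text. The function name and the exact marker format (`sentence N:`) are illustrative assumptions, not the paper's verbatim implementation.

```python
def add_sentence_markers(sentences):
    """Prefix each context sentence with an index marker (illustrative sketch).

    A seq2seq model trained on such input can generate rationales as
    marker tokens (e.g. "sentence 2"), which are then mapped back to the
    original sentences, keeping the explanation strictly extractive.
    """
    return " ".join(f"sentence {i}: {s}" for i, s in enumerate(sentences, 1))


def resolve_markers(marker_indices, sentences):
    """Map predicted marker indices (1-based) back to the source sentences."""
    return [sentences[i - 1] for i in marker_indices]


# Hypothetical usage:
ctx = ["Paris is in France.", "It is the capital city."]
marked = add_sentence_markers(ctx)
# marked == "sentence 1: Paris is in France. sentence 2: It is the capital city."
rationale = resolve_markers([2], ctx)
# rationale == ["It is the capital city."]
```

Because the model can only point at numbered sentences that actually appear in the input, any generated rationale is guaranteed to be a span of the source document rather than a hallucinated paraphrase.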