Paper Title


How Does Selective Mechanism Improve Self-Attention Networks?

Paper Authors

Xinwei Geng, Longyue Wang, Xing Wang, Bing Qin, Ting Liu, Zhaopeng Tu

Paper Abstract


Self-attention networks (SANs) with a selective mechanism have produced substantial improvements in various NLP tasks by concentrating on a subset of input words. However, the underlying reasons for their strong performance have not been well explained. In this paper, we bridge the gap by assessing the strengths of selective SANs (SSANs), which are implemented with a flexible and universal Gumbel-Softmax. Experimental results on several representative NLP tasks, including natural language inference, semantic role labelling, and machine translation, show that SSANs consistently outperform the standard SANs. Through well-designed probing experiments, we empirically validate that the improvement of SSANs can be attributed in part to mitigating two commonly-cited weaknesses of SANs: word order encoding and structure modeling. Specifically, the selective mechanism improves SANs by paying more attention to content words that contribute to the meaning of the sentence. The code and data are released at https://github.com/xwgeng/SSAN.
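
The abstract states that SSANs are implemented with Gumbel-Softmax, i.e., each token is softly selected or discarded before attention is computed. The snippet below is a minimal NumPy sketch of that general idea, assuming a per-token binary (select/discard) gate; the selector projection `w_sel`, the temperature `tau`, and the toy dimensions are illustrative assumptions, not the authors' released implementation (see the repository linked above).

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Differentiable (soft) sample from a categorical distribution:
    add Gumbel(0,1) noise to the logits and apply a temperature-scaled softmax."""
    rng = np.random.default_rng() if rng is None else rng
    gumbel = -np.log(-np.log(rng.uniform(1e-9, 1.0, logits.shape)))
    y = (logits + gumbel) / tau
    y -= y.max(axis=-1, keepdims=True)            # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

def selective_self_attention(X, Wq, Wk, Wv, w_sel, tau=0.5, rng=None):
    """Toy selective self-attention over a sentence X of shape (n, d).

    A per-token gate (select vs. discard) is sampled with Gumbel-Softmax
    from selection logits X @ w_sel; attention weights are renormalized
    over the soft-gated tokens, so discarded tokens receive little mass."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n, n) attention logits

    # Two-way logits per token: [select, discard]; keep the "select" probability.
    sel_logits = np.stack([X @ w_sel, -(X @ w_sel)], axis=-1)
    gate = gumbel_softmax(sel_logits, tau=tau, rng=rng)[..., 0]   # (n,)

    weights = np.exp(scores - scores.max(axis=-1, keepdims=True)) * gate
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # (n, d_v) attended output

# Usage: a random toy "sentence" of 5 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
n, d = 5, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
w_sel = rng.normal(size=d)
out = selective_self_attention(X, Wq, Wk, Wv, w_sel, rng=rng)
print(out.shape)  # (5, 8)
```

In this sketch the gate stays soft (Gumbel-Softmax rather than a hard sample), which keeps the selection step differentiable during training; a hard, straight-through variant is an equally common design choice.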
