Paper Title
Improving BERT with Syntax-aware Local Attention
Paper Authors
Paper Abstract
Pre-trained Transformer-based neural language models, such as BERT, have achieved remarkable results on a variety of NLP tasks. Recent work has shown that attention-based models can benefit from more focused attention over local regions. Most such approaches either restrict the attention scope to a linear span or are confined to certain tasks such as machine translation and question answering. In this paper, we propose a syntax-aware local attention, in which the attention scope is restricted based on distances in the syntactic structure. The proposed syntax-aware local attention can be integrated with pre-trained language models, such as BERT, to make the model focus on syntactically relevant words. We conduct experiments on various single-sentence benchmarks, including sentence classification and sequence labeling tasks. Experimental results show consistent gains over BERT on all benchmark datasets. Extensive studies verify that our model achieves better performance owing to more focused attention over syntactically relevant words.
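
The central mechanism, restricting each token's attention to its syntactic neighbourhood, can be illustrated with a short sketch. The following is a minimal single-head example, assuming the scope is defined by hop distance in a dependency tree; the names tree_distances and syntax_local_attention, the max_dist threshold, and the NumPy formulation are illustrative assumptions rather than the paper's exact implementation, which integrates such a mask into BERT's multi-head attention.

import numpy as np

def tree_distances(heads):
    # Pairwise hop distances between tokens in a dependency tree.
    # heads[i] is the index of token i's head; -1 marks the root.
    n = len(heads)
    adj = [[] for _ in range(n)]          # undirected adjacency of the tree
    for i, h in enumerate(heads):
        if h >= 0:
            adj[i].append(h)
            adj[h].append(i)
    dist = np.full((n, n), np.inf)
    for s in range(n):                    # BFS from every token
        dist[s, s] = 0
        queue = [s]
        while queue:
            u = queue.pop(0)
            for v in adj[u]:
                if dist[s, v] == np.inf:
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    return dist

def syntax_local_attention(Q, K, V, heads, max_dist=2):
    # Scaled dot-product attention restricted to tokens within
    # max_dist hops in the dependency tree.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (n, n) attention logits
    mask = tree_distances(heads) <= max_dist          # syntactic neighbourhood
    scores = np.where(mask, scores, -1e9)             # block distant tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over allowed tokens
    return weights @ V

# Toy usage: four tokens whose dependency heads are given by index (-1 = root).
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
heads = [1, -1, 1, 2]
out = syntax_local_attention(Q, K, V, heads, max_dist=1)
print(out.shape)                                      # (4, 8)

In a full model, the same boolean mask would simply be added as a large negative bias to the attention logits of each head before the softmax, leaving the rest of BERT unchanged.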