Paper Title
Exploring Cross-sentence Contexts for Named Entity Recognition with BERT
Paper Authors
Paper Abstract
Named entity recognition (NER) is frequently addressed as a sequence classification task where each input consists of one sentence of text. It is nevertheless clear that useful information for the task can often be found outside of the scope of a single-sentence context. Recently proposed self-attention models such as BERT can both efficiently capture long-distance relationships in the input and represent inputs consisting of several sentences, creating new opportunities for approaches that incorporate cross-sentence information in natural language processing tasks. In this paper, we present a systematic study exploring the use of cross-sentence information for NER using BERT models in five languages. We find that adding context in the form of additional sentences to BERT input systematically increases NER performance on all of the tested languages and models. Including multiple sentences in each input also allows us to study the predictions of the same sentences in different contexts. We propose a straightforward method, Contextual Majority Voting (CMV), to combine different predictions for sentences and demonstrate that this further increases NER performance with BERT. Our approach does not require any changes to the underlying BERT architecture, instead relying on restructuring examples for training and prediction. Evaluation on established datasets, including the CoNLL'02 and CoNLL'03 NER benchmarks, demonstrates that our proposed approach can improve on the state-of-the-art NER results on English, Dutch, and Finnish, achieves the best reported BERT-based results on German, and is on par with performance reported with other BERT-based approaches in Spanish. We release all methods implemented in this work under open licenses.
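The abstract does not spell out the combination rule, but the core idea of combining predictions for a sentence that appears in several cross-sentence contexts can be sketched as a simple unweighted per-token majority vote. The function name and the tie-breaking behavior below are assumptions for illustration, not the paper's exact implementation:

```python
from collections import Counter

def contextual_majority_voting(predictions):
    """Combine per-token NER labels predicted for the same sentence
    in several different cross-sentence contexts via majority vote.

    predictions: list of label sequences, one per context, all of the
    same length (one label per token of the target sentence).
    """
    assert predictions and all(len(p) == len(predictions[0]) for p in predictions)
    combined = []
    for token_labels in zip(*predictions):
        # most_common(1) yields the label with the highest count;
        # ties are broken by first occurrence in the input.
        label, _ = Counter(token_labels).most_common(1)[0]
        combined.append(label)
    return combined

# The same three-token sentence predicted in three different contexts:
ctx_preds = [
    ["B-PER", "I-PER", "O"],
    ["B-PER", "O",     "O"],
    ["B-PER", "I-PER", "O"],
]
print(contextual_majority_voting(ctx_preds))  # ['B-PER', 'I-PER', 'O']
```

In practice, each context corresponds to a different placement of the sentence within a multi-sentence BERT input window; the vote is taken only over the labels assigned to that sentence's own tokens.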