Paper Title

Computationally Efficient NER Taggers with Combined Embeddings and Constrained Decoding

Paper Authors

Brian Lester, Daniel Pressel, Amy Hemmeter, Sagnik Ray Choudhury

Paper Abstract

Current state-of-the-art models in Named Entity Recognition (NER) are neural models with a Conditional Random Field (CRF) as the final network layer, and pre-trained "contextual embeddings". The CRF layer is used to facilitate global coherence between labels, and the contextual embeddings provide a better representation of words in context. However, both of these improvements come at a high computational cost. In this work, we explore two simple techniques that substantially improve NER performance over a strong baseline with negligible cost. First, we use multiple pre-trained embeddings as word representations via concatenation. Second, we constrain a tagger trained with a cross-entropy loss to eliminate illegal transitions during decoding. While training a tagger on CoNLL 2003, we find a $786$\% speed-up over a contextual-embeddings-based tagger without sacrificing strong performance. We also show that the concatenation technique works across multiple tasks and datasets. We analyze aspects of similarity and coverage between pre-trained embeddings and the dynamics of tag co-occurrence to explain why these techniques work. We provide an open-source implementation of our tagger using these techniques in three popular deep learning frameworks: TensorFlow, PyTorch, and DyNet.
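To make the first technique concrete, here is a minimal PyTorch sketch of concatenating several pre-trained embedding tables into one word representation. The `ConcatEmbeddings` module, the shared vocabulary, and the random stand-in weight matrices are illustrative assumptions, not the paper's released code; in practice each pre-trained table has its own vocabulary, so token ids would be mapped per table first.

```python
# A minimal sketch (not the paper's released code) of the embedding-
# concatenation technique: each token id is looked up in several frozen,
# independently pre-trained embedding tables and the vectors are concatenated.
import torch
import torch.nn as nn

class ConcatEmbeddings(nn.Module):
    """Concatenate lookups from multiple pre-trained embedding tables."""

    def __init__(self, pretrained_weights):
        super().__init__()
        # One frozen embedding table per pre-trained weight matrix.
        self.tables = nn.ModuleList(
            nn.Embedding.from_pretrained(w, freeze=True) for w in pretrained_weights
        )
        self.output_dim = sum(w.size(1) for w in pretrained_weights)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> (batch, seq_len, sum of table dims)
        return torch.cat([table(token_ids) for table in self.tables], dim=-1)

# Toy usage with random stand-ins for real GloVe/word2vec weights; a shared
# 5,000-word vocabulary across tables is assumed here for simplicity.
glove_like = torch.randn(5000, 100)
w2v_like = torch.randn(5000, 300)
embed = ConcatEmbeddings([glove_like, w2v_like])
ids = torch.randint(0, 5000, (2, 7))  # a batch of 2 sentences, 7 tokens each
print(embed(ids).shape)               # torch.Size([2, 7, 400])
```

Since each lookup is just a table read, the richer concatenated representation adds essentially no inference cost, in line with the paper's "negligible cost" framing.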

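The second technique can be sketched as a transition mask applied at decode time. The greedy decoder below is a simplified illustration, not necessarily the paper's exact search procedure; the five-tag BIO tag set, the `legal` rule, and the toy logits are assumptions chosen to demonstrate the standard constraint that `I-X` may only follow `B-X` or `I-X`.

```python
# A simplified illustration (not necessarily the paper's exact procedure) of
# constrained decoding for a tagger trained with per-token cross-entropy:
# greedy decoding with a mask that forbids BIO transitions such as O -> I-PER.
import numpy as np

TAGS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG"]  # toy tag set
NEG_INF = -1e9

def legal(prev, curr):
    """BIO rule: I-X is legal only immediately after B-X or I-X."""
    if curr.startswith("I-"):
        entity = curr[2:]
        return prev in (f"B-{entity}", f"I-{entity}")
    return True

# (num_tags x num_tags) additive mask: 0 where legal, -inf where illegal.
MASK = np.array([[0.0 if legal(p, c) else NEG_INF for c in TAGS] for p in TAGS])

def constrained_greedy_decode(logits):
    """logits: (seq_len, num_tags) per-token scores from the tagger."""
    path = [int(np.argmax(logits[0]))]       # first token is unconstrained
    for t in range(1, len(logits)):
        scores = logits[t] + MASK[path[-1]]  # knock out illegal successors
        path.append(int(np.argmax(scores)))
    return [TAGS[i] for i in path]

# Toy logits: the unconstrained argmax at step 2 would be the illegal I-PER;
# the mask redirects it to B-PER, after which I-PER becomes legal at step 3.
logits = np.array([
    [2.0, 0.1, 0.0, 0.1, 0.0],
    [0.3, 0.4, 0.9, 0.1, 0.0],
    [0.1, 0.2, 1.5, 0.1, 0.0],
])
print(constrained_greedy_decode(logits))  # ['O', 'B-PER', 'I-PER']
```

Because the mask is applied only at inference, the model itself remains a cheap cross-entropy tagger; the constraint recovers much of the label coherence that a CRF layer would otherwise provide.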