Paper Title

Using Holographically Compressed Embeddings in Question Answering

Paper Authors

Barbosa, Salvador E.

Paper Abstract

Word vector representations are central to deep learning natural language processing models. Many forms of these vectors, known as embeddings, exist, including word2vec and GloVe. Embeddings are trained on large corpora and learn a word's usage in context, capturing semantic relationships between words. However, the semantics from such training are at the level of distinct words (known as word types) and can be ambiguous when, for example, a word type can be either a noun or a verb. In question answering, parts of speech and named entity types are important, but encoding these attributes in neural models expands the size of the input. This research employs holographic compression of pre-trained embeddings to represent a token, its part of speech, and its named entity type in the same number of dimensions as a representation of the token alone. The implementation, in a modified question answering recurrent deep learning network, shows that semantic relationships are preserved and yields strong performance.
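The abstract does not spell out the compression mechanism, but holographic representations conventionally bind vectors with circular convolution (Plate's Holographic Reduced Representations) and superpose the results, so the combined vector keeps the original dimensionality. A minimal sketch under that assumption, with hypothetical random role/filler vectors standing in for a pre-trained token embedding and its POS/NER tags:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 300  # embedding dimension, e.g. GloVe-300


def cconv(a, b):
    # Circular convolution (HRR binding), computed via FFT.
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


def ccorr(a, b):
    # Circular correlation: approximate unbinding of a from b.
    return np.fft.irfft(np.conj(np.fft.rfft(a)) * np.fft.rfft(b), n=len(a))


# Hypothetical vectors: in the paper these would be a pre-trained token
# embedding and learned/assigned vectors for the POS and NER attributes.
token = rng.normal(0.0, 1.0 / np.sqrt(d), d)
pos_role, pos_tag = rng.normal(0.0, 1.0 / np.sqrt(d), (2, d))
ner_role, ner_tag = rng.normal(0.0, 1.0 / np.sqrt(d), (2, d))

# Superpose the token with bound role:filler pairs.
# The trace has the same dimension d as the token embedding alone.
trace = token + cconv(pos_role, pos_tag) + cconv(ner_role, ner_tag)
assert trace.shape == token.shape

# Unbinding the POS role yields a noisy copy of the POS tag vector;
# the other superposed terms contribute only cross-talk noise.
recovered = ccorr(pos_role, trace)
cos = recovered @ pos_tag / (np.linalg.norm(recovered) * np.linalg.norm(pos_tag))
print(f"cosine(recovered, pos_tag) = {cos:.2f}")  # well above chance
```

Because the cross-talk from the other terms grows only as the square root of the number of superposed pairs, a few attributes can share one vector with little interference, which is consistent with the abstract's claim that semantic relationships are preserved at unchanged input size.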
