Paper Title

Stacked DeBERT: All Attention in Incomplete Data for Text Classification

Paper Authors

Gwenaelle Cunha Sergio, Minho Lee

Paper Abstract

In this paper, we propose Stacked DeBERT, short for Stacked Denoising Bidirectional Encoder Representations from Transformers. This novel model improves robustness to incomplete data, compared to existing systems, by designing a novel encoding scheme in BERT, a powerful language representation model based solely on attention mechanisms. Incomplete data in natural language processing refers to text with missing or incorrect words, and its presence can hinder the performance of current models, which were not built to withstand such noise but must still perform well under duress. This is because current approaches are built for and trained with clean and complete data, and thus cannot extract features that adequately represent incomplete data. Our proposed approach obtains intermediate input representations by applying an embedding layer to the input tokens, followed by vanilla transformers. These intermediate features are given as input to novel denoising transformers, which are responsible for obtaining richer input representations. The proposed approach takes advantage of stacks of multilayer perceptrons, which reconstruct the embeddings of missing words by extracting more abstract and meaningful hidden feature vectors, and of bidirectional transformers for improved embedding representations. We consider two datasets for training and evaluation: the Chatbot Natural Language Understanding Evaluation Corpus and Kaggle's Twitter Sentiment Corpus. Our model shows improved F1-scores and better robustness on the informal/incorrect text present in tweets and on text with Speech-to-Text errors, in both sentiment and intent classification tasks.
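
To make the described pipeline concrete, here is a minimal PyTorch sketch of the architecture as the abstract outlines it: an embedding layer plus vanilla transformers producing intermediate representations, stacked MLPs that reconstruct corrupted embeddings in a denoising-autoencoder style, and bidirectional transformer layers on top. The module names (DenoisingTransformerBlock, StackedDeBERTSketch), the bottleneck dimension, and the layer counts are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class DenoisingTransformerBlock(nn.Module):
    """Hypothetical denoising block: a stack of MLPs reconstructs the
    embeddings of missing/corrupted words, then a bidirectional
    (unmasked self-attention) encoder layer refines the result."""
    def __init__(self, hidden_dim=768, bottleneck_dim=128, num_heads=12):
        super().__init__()
        # Stacked multilayer perceptrons: compress each token into a more
        # abstract hidden vector, then reconstruct the full embedding.
        self.mlp_stack = nn.Sequential(
            nn.Linear(hidden_dim, bottleneck_dim), nn.Tanh(),
            nn.Linear(bottleneck_dim, bottleneck_dim), nn.Tanh(),
            nn.Linear(bottleneck_dim, hidden_dim),
        )
        self.encoder = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=num_heads, batch_first=True)

    def forward(self, x):              # x: (batch, seq_len, hidden_dim)
        return self.encoder(self.mlp_stack(x))

class StackedDeBERTSketch(nn.Module):
    def __init__(self, vocab_size=30522, hidden_dim=768,
                 num_vanilla=12, num_denoising=2, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_dim)
        # Vanilla transformers yield intermediate input representations.
        self.vanilla = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=12,
                                       batch_first=True),
            num_layers=num_vanilla)
        # Denoising transformers yield richer input representations.
        self.denoising = nn.ModuleList(
            [DenoisingTransformerBlock(hidden_dim)
             for _ in range(num_denoising)])
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):      # token_ids: (batch, seq_len)
        h = self.vanilla(self.embedding(token_ids))
        for block in self.denoising:
            h = block(h)
        # Classify from the first token's final state ([CLS]-style).
        return self.classifier(h[:, 0])

# Usage: intent/sentiment classification over noisy token sequences.
model = StackedDeBERTSketch(num_classes=10)
logits = model(torch.randint(0, 30522, (4, 32)))   # shape: (4, 10)
```

In the paper's setting, the MLP stack would additionally be trained with a reconstruction objective against embeddings of the clean text, so that corrupted inputs are mapped back toward their complete counterparts before classification.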
