Paper Title
BURT: BERT-inspired Universal Representation from Learning Meaningful Segment
Paper Authors
Paper Abstract
Although pre-trained contextualized language models such as BERT achieve significant performance on various downstream tasks, current language representations still focus only on linguistic objectives at a specific granularity, which may not be applicable when multiple levels of linguistic units are involved at the same time. Thus, this work introduces and explores universal representation learning, i.e., the embedding of different levels of linguistic units in a uniform vector space. We present a universal representation model, BURT (BERT-inspired Universal Representation from learning meaningful segmenT), to encode different levels of linguistic units into the same vector space. Specifically, we extract and mask meaningful segments based on point-wise mutual information (PMI) to incorporate objectives of different granularities into the pre-training stage. We conduct experiments on English and Chinese datasets, including the GLUE and CLUE benchmarks, where our model surpasses its baselines and alternatives on a wide range of downstream tasks. We present our approach for constructing analogy datasets in terms of words, phrases and sentences, and we experiment with multiple representation models to examine geometric properties of the learned vector space through a task-independent evaluation. Finally, we verify the effectiveness of our unified pre-training strategy in two real-world text matching scenarios. As a result, our model significantly outperforms existing information retrieval (IR) methods and yields universal representations that can be directly applied to retrieval-based question answering and natural language generation tasks.
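To make the PMI-based segment masking idea concrete, below is a minimal Python sketch, not the paper's implementation: it estimates PMI for adjacent word pairs from corpus counts and masks high-PMI bigrams as whole segments. The function names (`bigram_pmi`, `mask_segments`), the bigram-only segments, the PMI threshold, and the masking probability are all illustrative assumptions; BURT's actual segment extraction and masking procedure may differ in segment length, scoring, and ratios.

```python
# Illustrative sketch (not the paper's code): PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ),
# estimated from corpus counts, then used to mask whole high-PMI segments at once.
import math
import random
from collections import Counter

def bigram_pmi(corpus):
    """Estimate point-wise mutual information for adjacent word pairs in a tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        pmi[(x, y)] = math.log(p_xy / (p_x * p_y))
    return pmi

def mask_segments(sent, pmi, threshold=1.0, mask_token="[MASK]", mask_prob=0.15):
    """Mask high-PMI bigrams as single segments; remaining tokens are masked individually."""
    out, i = [], 0
    while i < len(sent):
        is_segment = i + 1 < len(sent) and pmi.get((sent[i], sent[i + 1]), 0.0) > threshold
        span = 2 if is_segment else 1
        if random.random() < mask_prob:
            out.extend([mask_token] * span)  # mask the entire segment in one step
        else:
            out.extend(sent[i:i + span])
        i += span
    return out

if __name__ == "__main__":
    corpus = [
        "new york is a big city".split(),
        "she moved to new york last year".split(),
        "the city is big".split(),
    ]
    pmi = bigram_pmi(corpus)
    random.seed(0)
    # mask_prob is raised here only so the toy example visibly masks a segment.
    print(mask_segments(corpus[1], pmi, mask_prob=0.5))
```

On this toy corpus, "new york" receives a high PMI score and is therefore masked as a single unit when selected, which is the intuition behind feeding segment-level (rather than only token-level) objectives into pre-training.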