Paper Title

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters

Authors

Wang, Ruize; Tang, Duyu; Duan, Nan; Wei, Zhongyu; Huang, Xuanjing; Ji, Jianshu; Cao, Guihong; Jiang, Daxin; Zhou, Ming

Abstract

We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa. Existing methods typically update the original parameters of pre-trained models when injecting knowledge. However, when multiple kinds of knowledge are injected, the historically injected knowledge would be flushed away. To address this, we propose K-Adapter, a framework that keeps the original parameters of the pre-trained model fixed and supports the development of versatile knowledge-infused models. Taking RoBERTa as the backbone model, K-Adapter has a neural adapter for each kind of infused knowledge, like a plug-in connected to RoBERTa. There is no information flow between different adapters, so multiple adapters can be trained efficiently in a distributed way. As a case study, we inject two kinds of knowledge in this work: (1) factual knowledge obtained from automatically aligned text-triplets on Wikipedia and Wikidata, and (2) linguistic knowledge obtained via dependency parsing. Results on three knowledge-driven tasks, namely relation classification, entity typing, and question answering, demonstrate that each adapter improves performance and that combining both adapters brings further improvements. Further analysis indicates that K-Adapter captures more versatile knowledge than RoBERTa.
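
The architecture described in the abstract (a frozen backbone with independent, plug-in adapters and no information flow between them) can be illustrated with a minimal sketch. The snippet below is not the authors' released implementation; it assumes a Hugging Face RobertaModel backbone, and the adapter bottleneck size, the residual form of the adapter, and the concatenation-based fusion of outputs are illustrative choices.

```python
import torch
import torch.nn as nn
from transformers import RobertaModel


class KnowledgeAdapter(nn.Module):
    """One adapter per kind of injected knowledge; it only reads hidden states
    produced by the frozen backbone, so adapters never interfere with each other."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 128):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual bottleneck transformation of the backbone's hidden states.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


class KAdapterSketch(nn.Module):
    def __init__(self, adapter_names=("factual", "linguistic")):
        super().__init__()
        self.backbone = RobertaModel.from_pretrained("roberta-base")
        # The pre-trained parameters stay fixed, so injecting a new kind of
        # knowledge cannot flush away previously injected knowledge.
        for p in self.backbone.parameters():
            p.requires_grad = False
        # One independent adapter per kind of knowledge; there is no
        # information flow between them, so each can be trained separately.
        self.adapters = nn.ModuleDict(
            {name: KnowledgeAdapter(self.backbone.config.hidden_size)
             for name in adapter_names}
        )

    def forward(self, input_ids, attention_mask=None):
        hidden = self.backbone(input_ids, attention_mask=attention_mask).last_hidden_state
        # For a downstream task, the backbone output and every adapter output
        # are combined; plain concatenation here is purely for illustration.
        outputs = [hidden] + [adapter(hidden) for adapter in self.adapters.values()]
        return torch.cat(outputs, dim=-1)
```

Because the backbone's parameters are frozen and each adapter only reads its hidden states, the "factual" and "linguistic" adapters in this sketch could be trained separately and combined afterwards, mirroring the distributed-training property the abstract emphasizes.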
