Paper Title
Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers
Paper Authors
Paper Abstract
Following the major success of neural language models (LMs) such as BERT or GPT-2 on a variety of language understanding tasks, recent work focused on injecting (structured) knowledge from external resources into these models. While on the one hand, joint pretraining (i.e., training from scratch, adding objectives based on external knowledge to the primary LM objective) may be prohibitively computationally expensive, post-hoc fine-tuning on external knowledge, on the other hand, may lead to the catastrophic forgetting of distributional knowledge. In this work, we investigate models for complementing the distributional knowledge of BERT with conceptual knowledge from ConceptNet and its corresponding Open Mind Common Sense (OMCS) corpus, respectively, using adapter training. While overall results on the GLUE benchmark paint an inconclusive picture, a deeper analysis reveals that our adapter-based models substantially outperform BERT (up to 15-20 performance points) on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and OMCS. All code and experiments are open-sourced at: https://github.com/wluper/retrograph.
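
The abstract describes injecting conceptual knowledge into a pretrained BERT via adapter training, i.e., training small bottleneck modules while the original transformer weights stay frozen. Below is a minimal PyTorch sketch of the general mechanism only; the hidden size (768), bottleneck size (64), and GELU activation are illustrative assumptions, not the exact adapter configuration used in the paper.

```python
# Minimal sketch of a bottleneck adapter layer, assuming a Houlsby-style design;
# the paper's actual adapter architecture and hyperparameters may differ.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)  # down-projection
        self.up = nn.Linear(bottleneck_size, hidden_size)    # up-projection
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter output is added back onto the
        # (frozen) transformer layer's hidden states.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# During knowledge injection, only the adapter parameters would be updated
# (e.g., with a language-modeling objective over OMCS-style text), leaving the
# pretrained BERT weights untouched and avoiding catastrophic forgetting of
# distributional knowledge.
adapter = Adapter()
x = torch.randn(2, 16, 768)  # (batch, sequence length, hidden size)
print(adapter(x).shape)      # torch.Size([2, 16, 768])
```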