Paper Title
Formulating Few-shot Fine-tuning Towards Language Model Pre-training: A Pilot Study on Named Entity Recognition
Paper Authors
Paper Abstract
Fine-tuning pre-trained language models has recently become a common practice in building NLP models for various tasks, especially few-shot tasks. We argue that under the few-shot setting, formulating fine-tuning closer to the pre-training objectives shall be able to unleash more benefits from the pre-trained language models. In this work, we take few-shot named entity recognition (NER) for a pilot study, where existing fine-tuning strategies are much different from pre-training. We propose a novel few-shot fine-tuning framework for NER, FFF-NER. Specifically, we introduce three new types of tokens, "is-entity", "which-type" and bracket, so we can formulate the NER fine-tuning as (masked) token prediction or generation, depending on the choice of pre-trained language models. In our experiments, we apply FFF-NER to fine-tune both BERT and BART for few-shot NER on several benchmark datasets and observe significant improvements over existing fine-tuning strategies, including sequence labeling, prototype meta-learning, and prompt-based approaches. We further perform a series of ablation studies, showing few-shot NER performance is strongly correlated with the similarity between fine-tuning and pre-training.
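To make the abstract's idea concrete, below is a minimal sketch of casting span-level NER as masked token prediction with newly added "is-entity", "which-type", and bracket tokens. The template, token names (`[IS-ENTITY]`, `[WHICH-TYPE]`, `[BRACKET]`), and the helper `build_masked_input` are illustrative assumptions for this sketch, not the paper's exact FFF-NER design; it only shows how such a formulation can reuse BERT's masked-language-model head.

```python
# Sketch: span-level NER as masked token prediction with special tokens.
# Assumption: a hypothetical template "... [BRACKET] span [BRACKET] ... [IS-ENTITY] [MASK] [WHICH-TYPE] [MASK]".
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")

# Register the three new token types and grow the embedding matrix accordingly.
new_tokens = ["[IS-ENTITY]", "[WHICH-TYPE]", "[BRACKET]"]
tokenizer.add_special_tokens({"additional_special_tokens": new_tokens})
model.resize_token_embeddings(len(tokenizer))

def build_masked_input(sentence_tokens, span_start, span_end):
    """Wrap a candidate span with bracket tokens and append two masked slots:
    one for the is-entity decision and one for the entity type (hypothetical template)."""
    tokens = (
        sentence_tokens[:span_start]
        + ["[BRACKET]"] + sentence_tokens[span_start:span_end + 1] + ["[BRACKET]"]
        + sentence_tokens[span_end + 1:]
        + ["[IS-ENTITY]", tokenizer.mask_token, "[WHICH-TYPE]", tokenizer.mask_token]
    )
    return tokenizer(" ".join(tokens), return_tensors="pt")

# Example: query the two masked slots for the candidate span "New York".
inputs = build_masked_input(["Alice", "flew", "to", "New", "York"], 3, 4)
with torch.no_grad():
    logits = model(**inputs).logits
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
for pos in mask_positions:
    # Without few-shot fine-tuning these predictions are just a demonstration.
    print(tokenizer.convert_ids_to_tokens(int(logits[0, pos].argmax())))
```

During fine-tuning, the masked slots would be supervised with label words (e.g., a yes/no word for the is-entity slot and a type word for the which-type slot), so the task stays close to the masked-token-prediction objective used in pre-training; the same template can be produced generatively with BART instead of filled in with BERT.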