Paper Title

Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

Paper Authors

Trapit Bansal, Rishikesh Jha, Tsendsuren Munkhdalai, Andrew McCallum

Paper Abstract

Self-supervised pre-training of transformer models has revolutionized NLP applications. Such pre-training with language modeling objectives provides a useful initial point for parameters that generalize well to new tasks with fine-tuning. However, fine-tuning is still data inefficient -- when there are few labeled examples, accuracy can be low. Data efficiency can be improved by optimizing pre-training directly for future fine-tuning with few examples; this can be treated as a meta-learning problem. However, standard meta-learning techniques require many training tasks in order to generalize; unfortunately, finding a diverse set of such supervised tasks is usually difficult. This paper proposes a self-supervised approach to generate a large, rich, meta-learning task distribution from unlabeled text. This is achieved using a cloze-style objective, but creating separate multi-class classification tasks by gathering tokens-to-be blanked from among only a handful of vocabulary terms. This yields as many unique meta-training tasks as the number of subsets of vocabulary terms. We meta-train a transformer model on this distribution of tasks using a recent meta-learning framework. On 17 NLP tasks, we show that this meta-training leads to better few-shot generalization than language-model pre-training followed by finetuning. Furthermore, we show how the self-supervised tasks can be combined with supervised tasks for meta-learning, providing substantial accuracy gains over previous supervised meta-learning.
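As a rough illustration of the task-construction idea described in the abstract, below is a minimal Python sketch, not the authors' implementation: the function name `build_cloze_task`, the toy corpus, and the whitespace tokenization are illustrative assumptions (the paper works with a transformer's word-piece vocabulary). It samples a small subset of vocabulary terms, blanks each term out in sentences that contain it, and labels every masked sentence with the index of the blanked term, so each sampled subset of terms yields one multi-class classification task.

```python
import random
from collections import defaultdict


def build_cloze_task(sentences, vocab_terms, num_classes=3,
                     examples_per_class=4, mask_token="[MASK]"):
    """Build one self-supervised cloze-style classification task.

    A small subset of vocabulary terms is sampled; each term becomes one
    class. Sentences containing a sampled term have that term replaced by
    a mask token, and the label is which of the sampled terms was blanked.
    """
    # Index sentences (as token lists) by the vocabulary terms they contain.
    by_term = defaultdict(list)
    for sent in sentences:
        tokens = sent.split()
        for term in vocab_terms:
            if term in tokens:
                by_term[term].append(tokens)

    # Keep only terms with enough supporting sentences, then sample a handful.
    candidates = [t for t, s in by_term.items() if len(s) >= examples_per_class]
    if len(candidates) < num_classes:
        raise ValueError("not enough vocabulary terms with supporting sentences")
    chosen = random.sample(candidates, num_classes)

    # Blank out the chosen term; the class label is the term's index
    # within this task's sampled subset.
    task = []
    for label, term in enumerate(chosen):
        for tokens in random.sample(by_term[term], examples_per_class):
            masked = [mask_token if tok == term else tok for tok in tokens]
            task.append((" ".join(masked), label))
    random.shuffle(task)
    return chosen, task


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "a dog chased the cat",
        "the dog barked at night",
        "a bird flew over the house",
        "the bird sang in the tree",
        "the cat chased a bird",
        "the dog slept on the mat",
        "a cat climbed the tree",
    ]
    vocab = ["cat", "dog", "bird"]
    terms, examples = build_cloze_task(corpus, vocab, num_classes=3,
                                       examples_per_class=2)
    print("classes:", terms)
    for text, label in examples:
        print(label, text)
```

Because a different task can be built for every subset of vocabulary terms, this construction gives the combinatorially large pool of meta-training tasks that the abstract refers to.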
