Paper Title
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
Paper Authors
Paper Abstract
Few-shot in-context learning (ICL) enables pre-trained language models to perform a previously-unseen task without any gradient-based training by feeding a small number of training examples as part of the input. ICL incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. Parameter-efficient fine-tuning (PEFT) (e.g. adapter modules, prompt tuning, sparse update methods, etc.) offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and PEFT and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs. Along the way, we introduce a new PEFT method called (IA)$^3$ that scales activations by learned vectors, attaining stronger performance while only introducing a relatively tiny amount of new parameters. We also propose a simple recipe based on the T0 model called T-Few that can be applied to new tasks without task-specific tuning or modifications. We validate the effectiveness of T-Few on completely unseen tasks by applying it to the RAFT benchmark, attaining super-human performance for the first time and outperforming the state-of-the-art by 6% absolute. All of the code used in our experiments is publicly available.
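The abstract describes (IA)$^3$ as scaling activations by learned vectors. As a rough illustration only, here is a minimal sketch of that idea in PyTorch: an element-wise scaling vector initialized to ones (so the pretrained model is unchanged at the start), with only these vectors trained while the base model stays frozen. The module name `IA3Scaling`, the dimensions, and the choice of which activations to rescale are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class IA3Scaling(nn.Module):
    """Element-wise rescaling by a learned vector (hypothetical sketch of the (IA)^3 idea).

    The vector is initialized to ones, so initially it acts as the identity;
    only this vector would be trained, with the base model parameters frozen.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (..., dim); broadcasting applies a per-feature scale
        return x * self.scale


# Illustrative usage: rescale attention keys/values and FFN hidden activations
d_model, d_ff = 768, 3072  # assumed sizes for the sketch
l_k, l_v, l_ff = IA3Scaling(d_model), IA3Scaling(d_model), IA3Scaling(d_ff)

keys = torch.randn(2, 16, d_model)      # (batch, seq, d_model)
values = torch.randn(2, 16, d_model)
ffn_hidden = torch.randn(2, 16, d_ff)   # activation after the FFN nonlinearity

keys, values, ffn_hidden = l_k(keys), l_v(values), l_ff(ffn_hidden)
```

Because only a few small vectors are introduced per layer, this kind of rescaling adds relatively few trainable parameters, which is the parameter-efficiency property the abstract emphasizes.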