Paper Title
SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters
Paper Authors
Paper Abstract
Adapter Tuning, which freezes the pretrained language model (PLM) and fine-tunes only a few extra modules, has become an appealing, efficient alternative to full model fine-tuning. Although computationally efficient, recent Adapters often increase their parameter budget (e.g., the bottleneck dimension) to match the performance of full model fine-tuning, which we argue goes against their original intention. In this work, we re-examine the parameter-efficiency of Adapters through the lens of network pruning (we name this plug-in concept \texttt{SparseAdapter}) and find that SparseAdapter can achieve comparable or better performance than standard Adapters when the sparse ratio reaches up to 80\%. Based on our findings, we introduce an easy but effective setting, ``\textit{Large-Sparse}'', to improve the model capacity of Adapters under the same parameter budget. Experiments with five competitive Adapters on three advanced PLMs show that, with a proper sparse method (e.g., SNIP) and ratio (e.g., 40\%), SparseAdapter consistently outperforms its corresponding counterparts. Encouragingly, with the \textit{Large-Sparse} setting, we obtain further appealing gains, even outperforming full fine-tuning by a large margin. Our code will be released at: https://github.com/Shwai-He/SparseAdapter.
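The abstract describes pruning adapter weights at a given sparse ratio using a method such as SNIP (connection sensitivity, i.e., the magnitude of weight times gradient). Below is a minimal PyTorch sketch of that idea, not the authors' released implementation: the `Adapter` module and the `snip_prune` helper are illustrative names, and the scoring follows the general SNIP recipe of ranking weights by |w * dL/dw| on a single mini-batch.

```python
# Minimal sketch of SparseAdapter-style pruning (assumed interface, not the
# authors' actual code): score adapter weights with SNIP-like connection
# sensitivity |w * dL/dw| and zero out the lowest-scoring fraction.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-projection, nonlinearity, up-projection."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, x):
        # Residual connection around the bottleneck.
        return x + self.up(self.act(self.down(x)))


def snip_prune(adapter: Adapter, loss: torch.Tensor, sparse_ratio: float):
    """Zero out the `sparse_ratio` fraction of adapter weights with the
    lowest |weight * gradient| scores, computed from one mini-batch loss."""
    weights = [adapter.down.weight, adapter.up.weight]
    grads = torch.autograd.grad(loss, weights)

    # Connection sensitivity for every adapter weight.
    scores = torch.cat([(w * g).abs().flatten() for w, g in zip(weights, grads)])
    k = int(sparse_ratio * scores.numel())
    threshold = torch.kthvalue(scores, k).values if k > 0 else scores.min() - 1

    with torch.no_grad():
        for w, g in zip(weights, grads):
            mask = (w * g).abs() > threshold
            w.mul_(mask)  # prune low-sensitivity connections
    # In practice the binary mask would be kept and reapplied after each
    # optimizer step so that pruned weights stay at zero during fine-tuning.


# Usage sketch: score on one batch, then prune 40% of the adapter weights.
adapter = Adapter(hidden_dim=768, bottleneck_dim=64)
x = torch.randn(8, 768)
loss = adapter(x).pow(2).mean()  # placeholder loss for illustration only
snip_prune(adapter, loss, sparse_ratio=0.4)
```

Under the \textit{Large-Sparse} setting described in the abstract, one would enlarge `bottleneck_dim` while raising `sparse_ratio`, so that the number of remaining nonzero adapter parameters stays within the same budget.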