Paper Title

SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers

Paper Authors

Ameet Deshpande, Md Arafat Sultan, Anthony Ferritto, Ashwin Kalyan, Karthik Narasimhan, Avirup Sil

Paper Abstract

Fine-tuning pre-trained language models (PLMs) achieves impressive performance on a range of downstream tasks, and their sizes have consequently been getting bigger. Since a different copy of the model is required for each task, this paradigm is infeasible for storage-constrained edge devices like mobile phones. In this paper, we propose SPARTAN, a parameter efficient (PE) and computationally fast architecture for edge devices that adds hierarchically organized sparse memory after each Transformer layer. SPARTAN freezes the PLM parameters and fine-tunes only its memory, thus significantly reducing storage costs by re-using the PLM backbone for different tasks. SPARTAN contains two levels of memory, with only a sparse subset of parents being chosen in the first level for each input, and children cells corresponding to those parents being used to compute an output representation. This sparsity combined with other architecture optimizations improves SPARTAN's throughput by over 90% during inference on a Raspberry Pi 4 when compared to PE baselines (adapters) while also outperforming the latter by 0.1 points on the GLUE benchmark. Further, it can be trained 34% faster in a few-shot setting, while performing within 0.9 points of adapters. Qualitative analysis shows that different parent cells in SPARTAN specialize in different topics, thus dividing responsibility efficiently.
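
The abstract only outlines the memory mechanism, so the following is a minimal illustrative sketch in PyTorch rather than the paper's implementation. It assumes parents are scored by dot product with the token representation, that a top-k subset of parents is kept per token, and that the output is a gated attention readout over the selected parents' child key/value cells, added residually to the frozen layer's output. The module and hyperparameter names (SparseHierarchicalMemory, n_parents, n_children, k) are hypothetical.

```python
# Illustrative sketch of a two-level sparse memory inserted after a frozen
# Transformer layer. The routing and aggregation details are assumptions,
# not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseHierarchicalMemory(nn.Module):
    def __init__(self, d_model, n_parents=32, n_children=16, k=2):
        super().__init__()
        self.k = k  # number of parents selected per token (the sparsity level)
        # Level 1: parent keys, used only to route each token.
        self.parent_keys = nn.Parameter(torch.randn(n_parents, d_model) * 0.02)
        # Level 2: per-parent child key/value cells that produce the output.
        self.child_keys = nn.Parameter(torch.randn(n_parents, n_children, d_model) * 0.02)
        self.child_values = nn.Parameter(torch.randn(n_parents, n_children, d_model) * 0.02)

    def forward(self, hidden):
        # hidden: (batch, seq, d_model), output of a frozen Transformer layer.
        # Score all parents, keep only the top-k per token (the sparse subset).
        parent_scores = hidden @ self.parent_keys.t()               # (B, S, P)
        topk_scores, topk_idx = parent_scores.topk(self.k, dim=-1)  # (B, S, k)
        gate = F.softmax(topk_scores, dim=-1)                       # weight per chosen parent

        # Gather only the children belonging to the chosen parents.
        ck = self.child_keys[topk_idx]    # (B, S, k, C, D)
        cv = self.child_values[topk_idx]  # (B, S, k, C, D)

        # Attend over each chosen parent's children, then mix parents by the gate.
        attn = F.softmax(torch.einsum("bsd,bskcd->bskc", hidden, ck), dim=-1)
        per_parent = torch.einsum("bskc,bskcd->bskd", attn, cv)     # (B, S, k, D)
        memory_out = (gate.unsqueeze(-1) * per_parent).sum(dim=2)   # (B, S, D)

        # Residual connection back onto the frozen backbone's representation.
        return hidden + memory_out
```

In this reading, only the memory parameters above are trained per task while the PLM backbone producing `hidden` stays frozen, which is what keeps the per-task storage footprint small, and restricting computation to the k selected parents is what gives the sparsity-driven speedup the abstract describes.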
