Paper Title
HyperPrompt: Prompt-based Task-Conditioning of Transformers
Paper Authors
Paper Abstract
Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter-efficient way. Here, we explore the use of HyperNetworks to generate hyper-prompts: we propose HyperPrompt, a novel architecture for prompt-based task-conditioning of self-attention in Transformers. The hyper-prompts are end-to-end learnable via generation by a HyperNetwork. HyperPrompt allows the network to learn task-specific feature maps where the hyper-prompts serve as task global memories for the queries to attend to, at the same time enabling flexible information sharing among tasks. We show that HyperPrompt is competitive against strong multi-task learning baselines with as few as $0.14\%$ of additional task-conditioning parameters, achieving great parameter and computational efficiency. Through extensive empirical experiments, we demonstrate that HyperPrompt can achieve superior performances over strong T5 multi-task learning baselines and parameter-efficient adapter variants including Prompt-Tuning and HyperFormer++ on Natural Language Understanding benchmarks of GLUE and SuperGLUE across many model sizes.
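To make the idea concrete, below is a minimal PyTorch sketch of prompt-based task-conditioning of self-attention. It is not the authors' exact architecture; the class and parameter names (`HyperPromptAttention`, `prompt_len`, `task_dim`) and the single-layer, shared-task-embedding setup are assumptions made for illustration. A small HyperNetwork maps a task embedding to key/value hyper-prompts, which are prepended to the layer's keys and values so that queries can attend to them as task-specific global memory.

```python
import torch
import torch.nn as nn


class HyperPromptAttention(nn.Module):
    """Sketch of one self-attention layer conditioned by hyper-prompts
    that a HyperNetwork generates from a learned task embedding."""

    def __init__(self, d_model=512, n_heads=8, n_tasks=8, prompt_len=6, task_dim=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.prompt_len, self.d_model = prompt_len, d_model
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.o_proj = nn.Linear(d_model, d_model)
        # Per-task embedding (assumption: shared across layers in this sketch).
        self.task_emb = nn.Embedding(n_tasks, task_dim)
        # HyperNetwork: task embedding -> key/value hyper-prompts.
        self.hyper_k = nn.Linear(task_dim, prompt_len * d_model)
        self.hyper_v = nn.Linear(task_dim, prompt_len * d_model)

    def _split_heads(self, t):
        b, s, _ = t.shape
        return t.view(b, s, self.n_heads, self.d_head).transpose(1, 2)

    def forward(self, x, task_id):
        # x: (batch, seq, d_model); task_id: (batch,) integer task indices.
        b, s, _ = x.shape
        z = self.task_emb(task_id)                                      # (b, task_dim)
        p_k = self.hyper_k(z).view(b, self.prompt_len, self.d_model)    # key hyper-prompts
        p_v = self.hyper_v(z).view(b, self.prompt_len, self.d_model)    # value hyper-prompts

        q = self._split_heads(self.q_proj(x))
        # Prepend hyper-prompts to keys and values as task global memory.
        k = self._split_heads(torch.cat([p_k, self.k_proj(x)], dim=1))
        v = self._split_heads(torch.cat([p_v, self.v_proj(x)], dim=1))

        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, s, self.d_model)
        return self.o_proj(out)


# Usage sketch: two examples from different tasks in one batch.
layer = HyperPromptAttention()
x = torch.randn(2, 10, 512)
y = layer(x, task_id=torch.tensor([0, 3]))
print(y.shape)  # torch.Size([2, 10, 512])
```

Only the task embeddings and the HyperNetwork projections are task-conditioning parameters here, which is the source of the parameter efficiency the abstract refers to; the backbone attention weights stay shared across tasks.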