Paper Title

Transformer with Memory Replay

Paper Authors

Rui Liu, Barzan Mozafari

Paper Abstract

Transformers achieve state-of-the-art performance for natural language processing tasks by pre-training on large-scale text corpora. They are extremely compute-intensive and have very high sample complexity. Memory replay is a mechanism that remembers and reuses past examples by saving them to and replaying them from a memory buffer. It has been used successfully in reinforcement learning and GANs due to better sample efficiency. In this paper, we propose \emph{Transformer with Memory Replay} (TMR), which integrates memory replay with transformers, making them more sample-efficient. Experiments on the GLUE and SQuAD benchmark datasets show that Transformer with Memory Replay achieves at least a $1\%$-point increase compared to the baseline transformer model when pretrained with the same number of examples. Further, by adopting a careful design that reduces the wall-clock time overhead of memory replay, we also empirically achieve better runtime efficiency.
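
To make the memory-replay idea in the abstract concrete, below is a minimal, hedged sketch of a generic replay buffer mixed into a pretraining data stream. It is not the paper's TMR algorithm: the buffer capacity, the uniform sampling policy, and the replay_ratio knob are illustrative assumptions only, chosen to show what "saving to and replaying from a memory buffer" can look like in code.

    # Illustrative sketch only; NOT the TMR method from the paper.
    # Buffer size, replay_ratio, and uniform sampling are assumptions.
    import random
    from collections import deque


    class ReplayBuffer:
        """Fixed-capacity store of past training examples."""

        def __init__(self, capacity=10_000, seed=0):
            # Oldest examples are evicted first once capacity is reached.
            self.buffer = deque(maxlen=capacity)
            self.rng = random.Random(seed)

        def add(self, example):
            self.buffer.append(example)

        def sample(self, k):
            # Uniformly sample up to k previously seen examples.
            k = min(k, len(self.buffer))
            return self.rng.sample(list(self.buffer), k)


    def training_batches(data_stream, buffer, batch_size=8, replay_ratio=0.25):
        """Yield batches that mix fresh examples with replayed ones.

        replay_ratio is a hypothetical knob: the fraction of each batch
        drawn from the buffer instead of the fresh data stream.
        """
        n_replay = int(batch_size * replay_ratio)
        n_fresh = batch_size - n_replay
        fresh = []
        for example in data_stream:
            fresh.append(example)
            buffer.add(example)
            if len(fresh) == n_fresh:
                yield fresh + buffer.sample(n_replay)
                fresh = []


    if __name__ == "__main__":
        # Toy "corpus": integers standing in for tokenized pretraining examples.
        buf = ReplayBuffer(capacity=50)
        for i, batch in enumerate(training_batches(range(100), buf)):
            if i < 3:
                print(batch)  # each batch mixes new and previously seen examples

The design choice this sketch highlights is that replay reuses examples the model has already paid the cost of fetching and tokenizing, which is the intuition behind the sample-efficiency gains the abstract reports; how TMR actually selects, stores, and schedules replayed examples is specified in the paper itself.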
