Paper Title

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

Authors

Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret

Abstract

Transformers achieve remarkable performance in several tasks but due to their quadratic complexity, with respect to the input's length, they are prohibitively slow for very long sequences. To address this limitation, we express the self-attention as a linear dot-product of kernel feature maps and make use of the associativity property of matrix products to reduce the complexity from $\mathcal{O}\left(N^2\right)$ to $\mathcal{O}\left(N\right)$, where $N$ is the sequence length. We show that this formulation permits an iterative implementation that dramatically accelerates autoregressive transformers and reveals their relationship to recurrent neural networks. Our linear transformers achieve similar performance to vanilla transformers and they are up to 4000x faster on autoregressive prediction of very long sequences.
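
To make the complexity argument in the abstract concrete, the sketch below illustrates the kernelized attention idea in NumPy: replacing the softmax similarity with a dot product of feature maps phi(.) lets phi(K)^T V be aggregated once, so attention becomes linear in the sequence length N, and the causal variant shows the per-step recurrent update alluded to by the title. The feature map used here (elu(x) + 1) and all function names are illustrative assumptions, not necessarily the paper's exact implementation.

```python
import numpy as np

def elu_feature_map(x):
    # A simple positive feature map phi(x) = elu(x) + 1 (illustrative choice).
    return np.where(x > 0, x + 1.0, np.exp(x))

def softmax_attention(Q, K, V):
    # Standard attention: materializes an N x N score matrix, hence O(N^2) in sequence length.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    # Kernelized attention: (phi(Q) phi(K)^T) V is regrouped as phi(Q) (phi(K)^T V)
    # using associativity of matrix products, so the cost is linear in N.
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)
    KV = Kf.T @ V                       # (d, d_v): aggregated once over the whole sequence
    Z = Qf @ Kf.sum(axis=0) + eps       # (N,): per-query normalizer
    return (Qf @ KV) / Z[:, None]

def causal_linear_attention(Q, K, V, eps=1e-6):
    # Autoregressive case: keep running sums S = sum_i phi(k_i) v_i^T and z = sum_i phi(k_i),
    # so each new position is a constant-time state update (the recurrent view).
    Qf, Kf = elu_feature_map(Q), elu_feature_map(K)
    S = np.zeros((Qf.shape[-1], V.shape[-1]))
    z = np.zeros(Qf.shape[-1])
    out = np.empty_like(V)
    for i in range(Q.shape[0]):
        S += np.outer(Kf[i], V[i])
        z += Kf[i]
        out[i] = (Qf[i] @ S) / (Qf[i] @ z + eps)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d = 8, 4
    Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))
    print(linear_attention(Q, K, V).shape)          # (8, 4)
    print(causal_linear_attention(Q, K, V).shape)   # (8, 4)
```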
