Title

The Quarks of Attention

Authors

Pierre Baldi, Roman Vershynin

Abstract

Attention plays a fundamental role in both natural and artificial intelligence systems. In deep learning, attention-based neural architectures, such as transformer architectures, are widely used to tackle problems in natural language processing and beyond. Here we investigate the fundamental building blocks of attention and their computational properties. Within the standard model of deep learning, we classify all possible fundamental building blocks of attention in terms of their source, target, and computational mechanism. We identify and study the three most important mechanisms: additive activation attention, multiplicative output attention (output gating), and multiplicative synaptic attention (synaptic gating). The gating mechanisms correspond to multiplicative extensions of the standard model and are used across all current attention-based deep learning architectures. We study their functional properties and estimate the capacity of several attentional building blocks in the case of linear and polynomial threshold gates. Surprisingly, additive activation attention plays a central role in the proofs of the lower bounds. Attention mechanisms reduce the depth of certain basic circuits and leverage the power of quadratic activations without incurring their full cost.
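The three mechanisms named in the abstract can be illustrated for a single target unit. The sketch below is a hedged, minimal reading of the taxonomy: the scalar summary `a.sum()`, the `tanh` nonlinearity, and all variable names are illustrative assumptions, not the paper's exact formalism.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)   # input vector to the target unit
w = rng.standard_normal(4)   # synaptic weights of the target unit
a = rng.standard_normal(4)   # activity of the attending (source) unit

def sigma(z):
    # Placeholder nonlinearity; the paper studies linear and
    # polynomial threshold gates instead.
    return np.tanh(z)

# 1) Additive activation attention: the attending signal is added
#    to the pre-activation of the target unit.
attn_signal = a.sum()                     # assumed scalar summary of the source
y_additive = sigma(w @ x + attn_signal)

# 2) Multiplicative output attention (output gating): the target
#    unit's output is multiplied by a gate derived from the source.
gate = sigma(a.sum())
y_output_gated = gate * sigma(w @ x)

# 3) Multiplicative synaptic attention (synaptic gating): the gate
#    acts on the synaptic weights before the activation is computed.
y_synaptic_gated = sigma((gate * w) @ x)
```

Note that for a linear activation the two gating variants coincide, since the gate then factors out of the sum; with a nonlinear activation they generally differ, which is one reason the taxonomy distinguishes them.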
