Paper Title
AReLU: Attention-based Rectified Linear Unit
Paper Authors
Paper Abstract
Element-wise activation functions play a critical role in deep neural networks by affecting both expressive power and learning dynamics. Learning-based activation functions have recently gained increasing attention and success. We propose a new perspective on learnable activation functions by formulating them with an element-wise attention mechanism. In each network layer, we devise an attention module that learns an element-wise, sign-based attention map for the pre-activation feature map. The attention map scales each element according to its sign. Combining the attention module with a rectified linear unit (ReLU) amplifies positive elements and suppresses negative ones, both with learned, data-adaptive parameters. We coin the resulting activation function the Attention-based Rectified Linear Unit (AReLU). Since ReLU can be viewed as an identity transformation on its activated part, the attention module essentially learns an element-wise residue of that activated part, which makes network training more resistant to vanishing gradients. The learned attentive activation leads to well-focused activation of the relevant regions of a feature map. Through extensive evaluations, we show that AReLU significantly boosts the performance of most mainstream network architectures while introducing only two extra learnable parameters per layer. Notably, AReLU enables fast network training under small learning rates, which makes it especially well suited to transfer learning and meta-learning. Our source code has been released (see https://github.com/densechen/AReLU).
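The abstract describes AReLU as ReLU plus a learned, element-wise, sign-based attention map with two learnable parameters per layer. Below is a minimal PyTorch sketch of that idea; the specific parameterization (a clamped scalar alpha scaling negative elements and a sigmoid-squashed scalar beta scaling positive elements) is an illustrative assumption, and the released code at https://github.com/densechen/AReLU remains the authoritative implementation.

```python
import torch
import torch.nn as nn


class AReLUSketch(nn.Module):
    """Illustrative AReLU-style activation: ReLU plus a learned,
    element-wise, sign-based attention map (two scalars per layer)."""

    def __init__(self, alpha: float = 0.90, beta: float = 2.0):
        super().__init__()
        # Two extra learnable parameters per layer, as stated in the abstract.
        self.alpha = nn.Parameter(torch.tensor(alpha))  # scales negative elements
        self.beta = nn.Parameter(torch.tensor(beta))    # scales positive elements

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sign-based attention map: suppressed negatives, amplified positives.
        neg_att = torch.clamp(self.alpha, 0.01, 0.99) * torch.clamp(x, max=0.0)
        pos_att = torch.sigmoid(self.beta) * torch.relu(x)
        # ReLU acts as an identity on the positive part; the attention map is
        # the learned element-wise residue added on top of it, so positives
        # become (1 + sigmoid(beta)) * x and negatives clamp(alpha) * x.
        return torch.relu(x) + neg_att + pos_att


if __name__ == "__main__":
    act = AReLUSketch()
    x = torch.randn(4, 8)
    print(act(x).shape)  # torch.Size([4, 8])
```

In use, such a module would simply replace an `nn.ReLU()` instance inside a network, with one `AReLUSketch` per layer so each layer learns its own pair of parameters.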