Paper Title
Adaptive Attention Link-based Regularization for Vision Transformers
Paper Authors
Paper Abstract
Although transformer networks have recently been employed in various vision tasks with outstanding performance, extensive training data and a lengthy training time are required to train a model that lacks an inductive bias. Using trainable links between the channel-wise spatial attention of a pre-trained Convolutional Neural Network (CNN) and the attention heads of a Vision Transformer (ViT), we present a regularization technique that improves the training efficiency of ViT. The trainable links, referred to as the attention augmentation module, are trained simultaneously with the ViT, boosting its training and allowing it to avoid the overfitting caused by a lack of data. From the trained attention augmentation module, we can extract the relevant relationship between each CNN activation map and each ViT attention head, and based on this relationship we also propose an advanced attention augmentation module. Consequently, even with a small amount of data, the proposed method considerably improves the performance of ViT while achieving faster convergence during training.
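The abstract describes trainable links that map CNN channel-wise spatial attention onto ViT attention heads and use the result as a regularization target. Below is a minimal sketch of what such a module could look like, assuming PyTorch; the class name `AttentionAugmentationModule`, the per-(channel, head) linear link, the bilinear resizing of CNN maps to the ViT's patch grid, and the MSE regularization term are all illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionAugmentationModule(nn.Module):
    """Sketch of a trainable link from CNN channel-wise spatial attention
    maps to ViT attention-head maps (hypothetical design, not the paper's
    exact architecture).

    Assumed shapes: the pre-trained CNN provides C activation maps and the
    ViT has H attention heads; each CNN map is resized to the ViT's patch
    grid before the linear link is applied.
    """

    def __init__(self, num_cnn_channels: int, num_vit_heads: int):
        super().__init__()
        # One trainable weight per (CNN channel, ViT head) pair. After
        # training, these weights expose which activation maps correlate
        # with which heads, as described in the abstract.
        self.link = nn.Linear(num_cnn_channels, num_vit_heads, bias=False)

    def forward(self, cnn_maps: torch.Tensor, grid_size: int) -> torch.Tensor:
        # cnn_maps: (B, C, h, w) activation maps from the frozen CNN.
        maps = F.interpolate(
            cnn_maps, size=(grid_size, grid_size),
            mode="bilinear", align_corners=False,
        )
        maps = maps.flatten(2)                     # (B, C, P), P = grid_size^2
        target = self.link(maps.transpose(1, 2))   # (B, P, H)
        return target.transpose(1, 2)              # (B, H, P)


def attention_link_loss(vit_attn: torch.Tensor,
                        augmented: torch.Tensor) -> torch.Tensor:
    """Auxiliary regularization term pulling each ViT head's spatial
    attention toward the CNN-derived target (MSE is an assumption here).

    vit_attn: (B, H, P) per-head attention over patches, e.g. the class
    token's attention row, flattened to the same grid as `augmented`.
    """
    return F.mse_loss(vit_attn, augmented)
```

In training, the total objective would plausibly be the task loss plus a weighted `attention_link_loss` between selected ViT layers' head attention and the module's output, with both the ViT and the link weights updated jointly, consistent with the abstract's statement that the module is trained simultaneously with the ViT.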