Paper Title
Learning Self-Regularized Adversarial Views for Self-Supervised Vision Transformers
Paper Authors
Paper Abstract
Automatic data augmentation (AutoAugment) strategies are indispensable in supervised data-efficient training protocols of vision transformers, and have led to state-of-the-art results in supervised learning. Despite this success, their development and application to self-supervised vision transformers have been hindered by several barriers, including the high search cost, the lack of supervision, and an unsuitable search space. In this work, we propose AutoView, a self-regularized adversarial AutoAugment method that learns views for self-supervised vision transformers by addressing the above barriers. First, we reduce the search cost of AutoView to nearly zero by learning views and network parameters simultaneously in a single forward-backward step, minimizing and maximizing the mutual information among different augmented views, respectively. Then, to avoid information collapse caused by the lack of label supervision, we propose a self-regularized loss term that guarantees information propagation. Additionally, we present a curated augmentation policy search space for self-supervised learning, obtained by modifying the search space commonly used for supervised learning. On ImageNet, AutoView achieves a remarkable improvement over the RandAug baseline (+10.2% k-NN accuracy) and consistently outperforms state-of-the-art manually tuned view policies by a clear margin (up to +1.3% k-NN accuracy). Extensive experiments show that AutoView pretraining also benefits downstream tasks (+1.2% mAcc on ADE20K semantic segmentation and +2.8% mAP on the revisited Oxford image retrieval benchmark) and improves model robustness (+2.3% Top-1 accuracy on ImageNet-A and +1.0% AUPR on ImageNet-O). Code and models will be available at https://github.com/Trent-tangtao/AutoView.
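The abstract's core mechanism, learning views and network parameters in one forward-backward step with the views minimizing and the network maximizing mutual information among augmented views, can be illustrated with a gradient-reversal trick. The Python/PyTorch sketch below is only our reading of that idea, not the authors' implementation: the LearnableView per-channel jitter, the cosine-similarity agreement loss, and the identity-anchored regularizer are simplified stand-ins for the paper's augmentation policy search space, mutual-information objective, and self-regularized loss term.

# Minimal sketch (assumed names and toy components, not the released AutoView code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output


class LearnableView(nn.Module):
    """Toy differentiable 'view': per-channel color jitter with learnable magnitudes."""

    def __init__(self):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(3))   # multiplicative jitter strength
        self.shift = nn.Parameter(torch.zeros(3))  # additive jitter strength

    def forward(self, x):
        # Gradient reversal makes these parameters ascend the loss that the encoder
        # descends, so the adversarial min-max collapses into a single backward pass.
        scale = GradReverse.apply(self.scale).view(1, 3, 1, 1)
        shift = GradReverse.apply(self.shift).view(1, 3, 1, 1)
        return x * scale + shift


def agreement_loss(z1, z2):
    """Negative cosine similarity between embeddings (a crude mutual-information surrogate)."""
    return -F.cosine_similarity(z1, z2, dim=-1).mean()


# Stand-in encoder; a real setup would use a ViT backbone.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
view1, view2 = LearnableView(), LearnableView()
optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(view1.parameters()) + list(view2.parameters()),
    lr=0.1,
)

images = torch.randn(8, 3, 32, 32)  # dummy batch
z1, z2 = encoder(view1(images)), encoder(view2(images))
loss = agreement_loss(z1, z2)

# Placeholder self-regularizer: keeps the learned views near the identity transform so
# they cannot degenerate into information-destroying augmentations (the role the
# abstract assigns to its self-regularized loss term).
reg = (view1.scale - 1).pow(2).sum() + view1.shift.pow(2).sum() \
    + (view2.scale - 1).pow(2).sum() + view2.shift.pow(2).sum()
loss = loss + 1e-3 * reg

optimizer.zero_grad()
loss.backward()  # one backward pass: encoder minimizes the loss, views maximize it via reversal
optimizer.step()

Because the sign flip happens inside autograd, a single optimizer step updates both sides of the game; no alternating inner/outer optimization loop is needed, which is what lets the search cost stay close to zero.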