Paper Title


A Simple Framework for Contrastive Learning of Visual Representations

Paper Authors

Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton

Abstract

This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.
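The contrastive loss at the heart of the framework described above is SimCLR's NT-Xent (normalized temperature-scaled cross-entropy): each augmented view is pulled toward the other view of the same image and pushed away from every other example in the batch. The following is a minimal NumPy sketch, not the authors' implementation; the function name, the temperature default, and the convention that rows 2k and 2k+1 hold the two views of example k are assumptions made for illustration.

```python
import numpy as np

def nt_xent_loss(z, tau=0.5):
    """NT-Xent loss over a batch of 2N projection vectors.

    z has shape (2N, d); rows 2k and 2k+1 are assumed to be the two
    augmented views of example k (a pairing convention for this sketch).
    """
    # Cosine similarity via dot products of L2-normalized rows.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)        # an anchor is never its own negative
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    # Log-softmax over each row; the target is the anchor's positive partner.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = np.arange(z.shape[0]) ^ 1       # partner index: 0<->1, 2<->3, ...
    return -log_prob[np.arange(z.shape[0]), pos].mean()
```

Because every other example in the batch serves as a negative, larger batches supply more negatives per anchor, which is consistent with finding (3) above that contrastive learning benefits from larger batch sizes.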
