Time to augment self-supervised visual representation learning

Paper title

Time to augment self-supervised visual representation learning

Authors

Arthur Aubret, Markus Ernst, Céline Teulière, Jochen Triesch

Abstract

Biological vision systems are unparalleled in their ability to learn visual representations without supervision. In machine learning, self-supervised learning (SSL) has led to major advances in forming object representations in an unsupervised fashion. Such systems learn representations invariant to augmentation operations over images, like cropping or flipping. In contrast, biological vision systems exploit the temporal structure of the visual experience during natural interactions with objects. This gives access to "augmentations" not commonly used in SSL, like watching the same object from multiple viewpoints or against different backgrounds. Here, we systematically investigate and compare the potential benefits of such time-based augmentations during natural interactions for learning object categories. Our results show that time-based augmentations achieve large performance gains over state-of-the-art image augmentations. Specifically, our analyses reveal that: 1) 3-D object manipulations drastically improve the learning of object categories; 2) viewing objects against changing backgrounds is important for learning to discard background-related information from the latent representation. Overall, we conclude that time-based augmentations during natural interactions with objects can substantially improve self-supervised learning, narrowing the gap between artificial and biological vision systems.
