Paper Title

Investigating Power laws in Deep Representation Learning

Authors

Ghosh, Arna, Mondal, Arnab Kumar, Agrawal, Kumar Krishna, Richards, Blake

Abstract

Representation learning that leverages large-scale labelled datasets is central to recent progress in machine learning. Access to task-relevant labels at scale is often scarce or expensive, motivating the need to learn from unlabelled datasets with self-supervised learning (SSL). Such large unlabelled datasets (with data augmentations) often provide good coverage of the underlying input distribution. However, evaluating the representations learned by SSL algorithms still requires task-specific labelled samples in the training pipeline. Additionally, the generalization of task-specific encodings is often sensitive to potential distribution shift. Inspired by recent advances in theoretical machine learning and vision neuroscience, we observe that the eigenspectrum of the empirical feature covariance matrix often follows a power law. For visual representations, we estimate the coefficient of the power law, $\alpha$, across three key attributes which influence representation learning: learning objective (supervised, SimCLR, Barlow Twins and BYOL), network architecture (VGG, ResNet and Vision Transformer), and tasks (object and scene recognition). We observe that under mild conditions, proximity of $\alpha$ to 1 is strongly correlated with downstream generalization performance. Furthermore, $\alpha \approx 1$ is a strong indicator of robustness to label noise during fine-tuning. Notably, $\alpha$ is computable from the representations without knowledge of any labels, thereby offering a framework to evaluate the quality of representations on unlabelled datasets.
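The label-free estimation of $\alpha$ described in the abstract can be sketched as follows: form the empirical covariance matrix of the features, compute its eigenvalues, and fit a line to the eigenspectrum on a log-log scale. This is a minimal illustration, not the paper's exact estimator; the function name, the fitting range `n_fit`, and the synthetic data are assumptions for demonstration.

```python
import numpy as np

def estimate_alpha(features, n_fit=100):
    """Estimate the power-law exponent alpha of the eigenspectrum of the
    empirical feature covariance matrix (illustrative sketch; the paper's
    exact fitting procedure may differ).

    features: (n_samples, n_dims) array of representations.
    n_fit: number of leading eigenvalues used in the log-log fit
           (a hypothetical choice).
    """
    # Center the features and form the empirical covariance matrix.
    X = features - features.mean(axis=0)
    cov = X.T @ X / X.shape[0]
    # Eigenvalues in descending order; keep only positive ones.
    eigvals = np.linalg.eigvalsh(cov)[::-1][:n_fit]
    eigvals = eigvals[eigvals > 0]
    # Power law: lambda_i ~ i^(-alpha)  =>  log lambda_i = -alpha * log i + c.
    ranks = np.arange(1, len(eigvals) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(eigvals), 1)
    return -slope

# Synthetic features whose covariance spectrum decays as i^(-1),
# i.e. the alpha ~ 1 regime the abstract associates with good generalization.
rng = np.random.default_rng(0)
d = 200
scales = np.arange(1, d + 1.0) ** -0.5  # std ~ i^(-1/2) => variance ~ i^(-1)
X = rng.standard_normal((20000, d)) * scales
alpha = estimate_alpha(X, n_fit=100)
```

Because no labels enter the computation, this quantity can be tracked on purely unlabelled data, which is the evaluation framework the abstract proposes.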
