Paper Title

Understanding Deep Contrastive Learning via Coordinate-wise Optimization

Paper Authors

Tian, Yuandong

Abstract

We show that Contrastive Learning (CL) under a broad family of loss functions (including InfoNCE) has a unified formulation as coordinate-wise optimization over the network parameters $\boldsymbol{\theta}$ and the pairwise importance $\alpha$, where the \emph{max player} $\boldsymbol{\theta}$ learns representations for contrastiveness, and the \emph{min player} $\alpha$ puts more weight on pairs of distinct samples that share similar representations. The resulting formulation, called $\alpha$-CL, not only unifies various existing contrastive losses, which differ in how the sample-pair importance $\alpha$ is constructed, but also extrapolates to novel contrastive losses beyond the popular ones, opening a new avenue for contrastive loss design. These novel losses yield performance comparable to (or better than) classic InfoNCE on CIFAR-10, STL-10 and CIFAR-100. Furthermore, we analyze the max player in detail: we prove that with fixed $\alpha$, the max player is equivalent to Principal Component Analysis (PCA) for deep linear networks, and almost all local minima are global and rank-1, recovering optimal PCA solutions. Finally, we extend our analysis of the max player to 2-layer ReLU networks, showing that its fixed points can have higher ranks.
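The min-max structure described in the abstract can be sketched numerically. The following is a minimal illustration, not the paper's implementation: the function name `alpha_cl_step`, the temperature `tau`, and the specific choice of a softmax over negative-pair similarities (which recovers an InfoNCE-like weighting) are all assumptions made for this sketch. The min player's $\alpha$ places more weight on distinct-sample pairs with similar representations; the max player's objective then pulls positive pairs together and pushes $\alpha$-weighted negatives apart.

```python
import numpy as np

def alpha_cl_step(z, z_pos, tau=0.5):
    """Hedged sketch of one alpha-CL-style coordinate-wise evaluation.

    z, z_pos : (N, d) L2-normalized embeddings of two augmented views
               of the same N samples.
    Returns the scalar max-player objective, with the min player's
    alpha computed first and then held fixed (the coordinate-wise step).
    """
    sim = z @ z.T / tau                      # (N, N) negative-pair similarities
    np.fill_diagonal(sim, -np.inf)           # exclude self-pairs

    # Min player: softmax over each row puts more weight on distinct-sample
    # pairs that share similar representations (an InfoNCE-like alpha;
    # this particular choice is an assumption of the sketch).
    a = np.exp(sim - sim.max(axis=1, keepdims=True))
    alpha = a / a.sum(axis=1, keepdims=True)  # rows sum to 1, diagonal is 0

    # Max player: maximize positive-pair similarity minus the
    # alpha-weighted similarity of negative pairs.
    sim_pos = (z * z_pos).sum(axis=1) / tau
    sim_off = np.where(np.isfinite(sim), sim, 0.0)  # zero out -inf diagonal
    return sim_pos.mean() - (alpha * sim_off).sum(axis=1).mean()
```

Different constructions of `alpha` (e.g. uniform weights, or sharper reweightings of hard negatives) would correspond to the different members of the loss family that the abstract says $\alpha$-CL unifies.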
