Paper Title

Understanding Deep Contrastive Learning via Coordinate-wise Optimization

Paper Authors

Tian, Yuandong

Abstract

We show that Contrastive Learning (CL) under a broad family of loss functions (including InfoNCE) has a unified formulation as coordinate-wise optimization over the network parameters $\boldsymbol{\theta}$ and the pairwise importance $\alpha$, where the \emph{max player} $\boldsymbol{\theta}$ learns representations for contrastiveness, and the \emph{min player} $\alpha$ puts more weight on pairs of distinct samples that share similar representations. The resulting formulation, called $\alpha$-CL, not only unifies various existing contrastive losses, which differ in how the sample-pair importance $\alpha$ is constructed, but also extrapolates to novel contrastive losses beyond the popular ones, opening a new avenue for contrastive loss design. These novel losses yield performance comparable to (or better than) classic InfoNCE on CIFAR-10, STL-10 and CIFAR-100. Furthermore, we analyze the max player in detail: we prove that with fixed $\alpha$, the max player is equivalent to Principal Component Analysis (PCA) for deep linear networks, and almost all local minima are global and rank-1, recovering optimal PCA solutions. Finally, we extend our analysis of the max player to 2-layer ReLU networks, showing that its fixed points can have higher ranks.
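The min-max structure described in the abstract can be sketched numerically. The following is a minimal illustration, not the paper's implementation: the function name `alpha_cl_step`, the temperature `tau`, and the specific choice of a softmax over negative-pair similarities (which recovers an InfoNCE-like weighting) are all assumptions made for this sketch. The min player's $\alpha$ places more weight on distinct-sample pairs with similar representations; the max player's objective then pulls positive pairs together and pushes $\alpha$-weighted negatives apart.

```python
import numpy as np

def alpha_cl_step(z, z_pos, tau=0.5):
    """Hedged sketch of one alpha-CL-style coordinate-wise evaluation.

    z, z_pos : (N, d) L2-normalized embeddings of two augmented views
               of the same N samples.
    Returns the scalar max-player objective, with the min player's
    alpha computed first and then held fixed (the coordinate-wise step).
    """
    sim = z @ z.T / tau                      # (N, N) negative-pair similarities
    np.fill_diagonal(sim, -np.inf)           # exclude self-pairs

    # Min player: softmax over each row puts more weight on distinct-sample
    # pairs that share similar representations (an InfoNCE-like alpha;
    # this particular choice is an assumption of the sketch).
    a = np.exp(sim - sim.max(axis=1, keepdims=True))
    alpha = a / a.sum(axis=1, keepdims=True)  # rows sum to 1, diagonal is 0

    # Max player: maximize positive-pair similarity minus the
    # alpha-weighted similarity of negative pairs.
    sim_pos = (z * z_pos).sum(axis=1) / tau
    sim_off = np.where(np.isfinite(sim), sim, 0.0)  # zero out -inf diagonal
    return sim_pos.mean() - (alpha * sim_off).sum(axis=1).mean()
```

Different constructions of `alpha` (e.g. uniform weights, or sharper reweightings of hard negatives) would correspond to the different members of the loss family that the abstract says $\alpha$-CL unifies.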
