Paper Title
Contrastive Clustering
Paper Authors
Paper Abstract
In this paper, we propose a one-stage online clustering method called Contrastive Clustering (CC) that explicitly performs instance- and cluster-level contrastive learning. Specifically, for a given dataset, positive and negative instance pairs are constructed through data augmentation and then projected into a feature space. Instance- and cluster-level contrastive learning are then conducted in the row and column space, respectively, by maximizing the similarities of positive pairs while minimizing those of negative ones. Our key observation is that the rows of the feature matrix can be regarded as soft labels of instances, and accordingly the columns can be regarded as cluster representations. By simultaneously optimizing the instance- and cluster-level contrastive losses, the model jointly learns representations and cluster assignments in an end-to-end manner. Extensive experimental results show that CC remarkably outperforms 17 competitive clustering methods on six challenging image benchmarks. In particular, CC achieves an NMI of 0.705 (0.431) on the CIFAR-10 (CIFAR-100) dataset, an improvement of up to 19% (39%) over the best baseline.
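To make the row/column duality concrete, below is a minimal sketch (in PyTorch) of the dual contrastive objective the abstract describes: the instance head produces per-sample projections whose rows are contrasted across two augmented views, while the cluster head produces soft assignments whose columns, after transposition, serve as cluster representations and are contrasted the same way. The function names, temperature values, and the shared NT-Xent formulation here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def nt_xent(a: torch.Tensor, b: torch.Tensor, temperature: float) -> torch.Tensor:
    """NT-Xent loss over the 2N rows of [a; b]: row i of `a` is positive with
    row i of `b`; every other row acts as a negative."""
    n = a.size(0)
    reps = torch.cat([a, b], dim=0)                                  # (2N, d)
    sim = F.cosine_similarity(reps.unsqueeze(1), reps.unsqueeze(0), dim=2)
    sim = sim / temperature                                          # (2N, 2N)
    # Mask self-similarities so a sample is never its own negative.
    mask = torch.eye(2 * n, dtype=torch.bool, device=a.device)
    sim = sim.masked_fill(mask, float('-inf'))
    # The positive of sample i is sample i + n, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(a.device)
    return F.cross_entropy(sim, targets)

def contrastive_clustering_loss(z_a, z_b, c_a, c_b,
                                tau_instance: float = 0.5,
                                tau_cluster: float = 1.0) -> torch.Tensor:
    """z_a, z_b: (N, d) instance projections of two augmented views.
    c_a, c_b: (N, K) soft cluster assignments (softmax outputs).
    Rows are contrasted at the instance level; transposed columns
    (one per cluster) are contrasted at the cluster level."""
    loss_instance = nt_xent(z_a, z_b, tau_instance)
    loss_cluster = nt_xent(c_a.t(), c_b.t(), tau_cluster)
    return loss_instance + loss_cluster
```

Note that the published method additionally regularizes the cluster-level loss with the entropy of the assignment distribution to discourage collapsing all samples into a single cluster; that term is omitted from this sketch for brevity.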