Paper Title
MutexMatch: Semi-Supervised Learning with Mutex-Based Consistency Regularization
Authors
Abstract
The core issue in semi-supervised learning (SSL) lies in how to effectively leverage unlabeled data, whereas most existing methods tend to put a great emphasis on the utilization of high-confidence samples yet seldom fully explore the usage of low-confidence samples. In this paper, we aim to utilize low-confidence samples in a novel way with our proposed mutex-based consistency regularization, namely MutexMatch. Specifically, the high-confidence samples are required to exactly predict "what it is" by conventional True-Positive Classifier, while the low-confidence samples are employed to achieve a simpler goal -- to predict with ease "what it is not" by True-Negative Classifier. In this sense, we not only mitigate the pseudo-labeling errors but also make full use of the low-confidence unlabeled data by consistency of dissimilarity degree. MutexMatch achieves superior performance on multiple benchmark datasets, i.e., CIFAR-10, CIFAR-100, SVHN, STL-10, mini-ImageNet and Tiny-ImageNet. More importantly, our method further shows superiority when the amount of labeled data is scarce, e.g., 92.23% accuracy with only 20 labeled data on CIFAR-10. Our code and model weights have been released at https://github.com/NJUyued/MutexMatch4SSL.
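To make the division of labor described in the abstract concrete, here is a minimal PyTorch sketch of one possible instantiation: a shared backbone feeds a True-Positive Classifier (TPC) trained on high-confidence pseudo-labels ("what it is") and a True-Negative Classifier (TNC) trained on low-confidence samples toward a relaxed "what it is not" target. The confidence threshold tau, the toy backbone, and the choice of the least-likely class as the negative target are illustrative assumptions, not the authors' exact formulation (see the released code for the actual implementation).

# Minimal sketch of the mutex-style TPC/TNC split (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, feat_dim, tau = 10, 64, 0.95  # tau: assumed confidence threshold

backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim), nn.ReLU())
tpc = nn.Linear(feat_dim, num_classes)  # True-Positive Classifier: "what it is"
tnc = nn.Linear(feat_dim, num_classes)  # True-Negative Classifier: "what it is not"

# Stand-ins for weakly / strongly augmented views of one unlabeled batch.
x_weak = torch.randn(8, 3, 32, 32)
x_strong = torch.randn(8, 3, 32, 32)

with torch.no_grad():
    probs_w = F.softmax(tpc(backbone(x_weak)), dim=1)  # weak-view predictions
conf, pseudo = probs_w.max(dim=1)
high, low = conf >= tau, conf < tau  # split by prediction confidence

feat_s = backbone(x_strong)

# TPC: standard pseudo-label consistency, applied only to high-confidence samples.
loss_tpc = (F.cross_entropy(tpc(feat_s), pseudo, reduction="none") * high.float()).mean()

# TNC: on low-confidence samples, predict a class the sample is *not* --
# here, the least likely class under the weak view (an assumed negative target).
neg_label = probs_w.argmin(dim=1)
loss_tnc = (F.cross_entropy(tnc(feat_s), neg_label, reduction="none") * low.float()).mean()

loss = loss_tpc + loss_tnc
print(loss_tpc.item(), loss_tnc.item())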