Parametric Classification for Generalized Category Discovery: A Baseline Study

Paper Title

Parametric Classification for Generalized Category Discovery: A Baseline Study

Authors

Xin Wen, Bingchen Zhao, Xiaojuan Qi

Abstract

Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples. Previous studies argued that parametric classifiers are prone to overfitting to seen categories, and endorsed using a non-parametric classifier formed with semi-supervised k-means. However, in this study, we investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem. We demonstrate that two prediction biases exist: the classifier tends to predict seen classes more often, and produces an imbalanced distribution across seen and novel categories. Based on these findings, we propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers. We hope the investigation and proposed simple framework can serve as a strong baseline to facilitate future studies in this field. Our code is available at: https://github.com/CVMI-Lab/SimGCD.
