Paper Title
Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective
Paper Authors
Paper Abstract
Positive-Unlabeled (PU) learning tries to learn binary classifiers from a few labeled positive examples together with many unlabeled ones. Compared with ordinary semi-supervised learning, this task is much more challenging due to the absence of any known negative labels. While existing cost-sensitive methods have achieved state-of-the-art performance, they explicitly minimize the risk of classifying unlabeled data as negative samples, which might result in a negative-prediction preference of the classifier. To alleviate this issue, we resort to a label distribution perspective for PU learning in this paper. Noticing that the label distribution of unlabeled data is fixed when the class prior is known, it can be naturally used as learning supervision for the model. Motivated by this, we propose to pursue label distribution consistency between the predicted and ground-truth label distributions, which is formulated by aligning their expectations. Moreover, we further adopt entropy minimization and Mixup regularization to avoid the trivial solution of the label distribution consistency on unlabeled data and to mitigate the consequent confirmation bias. Experiments on three benchmark datasets validate the effectiveness of the proposed method. Code available at: https://github.com/Ray-rui/Dist-PU-Positive-Unlabeled-Learning-from-a-Label-Distribution-Perspective.
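To make the abstract's main idea concrete, below is a minimal PyTorch-style sketch of the three ingredients it mentions: aligning the expected positive rate on unlabeled data with a known class prior, entropy minimization, and Mixup regularization. The function names, the L1 form of the expectation alignment, and the assumption of sigmoid-output binary probabilities are illustrative choices, not the authors' exact formulation; see the linked repository for the official Dist-PU implementation.

```python
import torch

def label_distribution_loss(probs_unlabeled, class_prior):
    # Align the expected fraction of positive predictions on unlabeled data
    # with the known class prior (expectation alignment; L1 distance assumed here).
    return torch.abs(probs_unlabeled.mean() - class_prior)

def entropy_loss(probs):
    # Entropy minimization pushes predictions toward confident 0/1 values,
    # discouraging the trivial solution where every score collapses to the prior.
    eps = 1e-7
    return -(probs * torch.log(probs + eps)
             + (1 - probs) * torch.log(1 - probs + eps)).mean()

def mixup(x, y, alpha=1.0):
    # Standard Mixup: convex combinations of inputs and (pseudo-)labels,
    # used to mitigate confirmation bias from self-generated targets.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0), device=x.device)
    return lam * x + (1 - lam) * x[idx], lam * y + (1 - lam) * y[idx]

# Hypothetical usage on one unlabeled mini-batch, assuming a sigmoid-output model
# and a known class prior of 0.4:
# probs_u = torch.sigmoid(model(x_unlabeled)).squeeze(-1)
# loss = label_distribution_loss(probs_u, 0.4) + entropy_loss(probs_u)
```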