Paper Title

Rethinking Class-Prior Estimation for Positive-Unlabeled Learning

Paper Authors

Yu Yao, Tongliang Liu, Bo Han, Mingming Gong, Gang Niu, Masashi Sugiyama, Dacheng Tao

Paper Abstract

Given only positive (P) and unlabeled (U) data, PU learning can train a binary classifier without any negative data. It has two building blocks: PU class-prior estimation (CPE) and PU classification; the latter has been well studied while the former has received less attention. Hitherto, the distributional-assumption-free CPE methods rely on a critical assumption that the support of the positive data distribution cannot be contained in the support of the negative data distribution. If this is violated, those CPE methods will systematically overestimate the class prior; worse still, we cannot verify the assumption from the data. In this paper, we rethink CPE for PU learning: can we remove the assumption to make CPE always valid? We show an affirmative answer by proposing Regrouping CPE (ReCPE), which builds an auxiliary probability distribution such that the support of the positive data distribution is never contained in the support of the negative data distribution. ReCPE can work with any CPE method by treating it as the base method. Theoretically, ReCPE does not affect its base if the assumption already holds for the original probability distribution; otherwise, it reduces the positive bias of its base. Empirically, ReCPE improves all state-of-the-art CPE methods on various datasets, implying that the assumption has indeed been violated in practice.
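
A brief sketch of why the support assumption matters, using the standard PU mixture notation (the symbols below are ours for illustration, not quoted from the paper body): the unlabeled marginal is a mixture of the two class-conditionals, and a distribution-free CPE method can at best recover the largest proportion with which the positive distribution can be mixed into the unlabeled one,

\[
p_U(x) = \pi\, p_P(x) + (1-\pi)\, p_N(x), \qquad
\hat{\pi} \;=\; \inf_{x:\, p_P(x)>0} \frac{p_U(x)}{p_P(x)}
\;=\; \pi + (1-\pi)\inf_{x:\, p_P(x)>0} \frac{p_N(x)}{p_P(x)} \;\ge\; \pi .
\]

When the assumption holds, some region of the positive support carries no negative mass, so the infimum on the right can reach zero and \(\hat{\pi}\) recovers \(\pi\); when the positive support is contained in the negative support and the density ratio \(p_N/p_P\) stays bounded away from zero, the infimum is strictly positive and \(\hat{\pi}\) overestimates \(\pi\). This is the positive bias that the abstract refers to and that ReCPE is designed to reduce.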
