Paper Title
VaB-AL: Incorporating Class Imbalance and Difficulty with Variational Bayes for Active Learning
Paper Authors
Paper Abstract
Active Learning for discriminative models has largely been studied with a focus on individual samples, with less emphasis on how classes are distributed or which classes are hard to deal with. In this work, we show that this is harmful. We propose a method based on Bayes' rule that can naturally incorporate class imbalance into the Active Learning framework. We derive that three terms should be considered together when estimating the probability of a classifier making a mistake for a given sample: i) the probability of mislabelling a class, ii) the likelihood of the data given a predicted class, and iii) the prior probability of the abundance of a predicted class. Implementing these terms requires a generative model and an intractable likelihood estimation. Therefore, we train a Variational Auto-Encoder (VAE) for this purpose. To further tie the VAE to the classifier and facilitate VAE training, we use the classifier's deep feature representations as input to the VAE. By considering all three probabilities, especially the data imbalance, we can substantially improve the potential of existing methods under a limited data budget. We show that our method can be applied to classification tasks on multiple different datasets -- including a real-world dataset with heavy data imbalance -- significantly outperforming the state of the art.
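The acquisition rule sketched in the abstract combines three per-class terms into one per-sample mislabelling score. The sketch below illustrates that combination; the function name, arguments, and toy numbers are all hypothetical, and in the actual method the log-likelihood term would come from the VAE trained on the classifier's deep features rather than being supplied directly.

```python
import numpy as np

def acquisition_scores(log_px_given_c, p_mislabel, p_class, preds):
    """Log-score each unlabelled sample by the (unnormalised) probability
    that the classifier mislabels it, combining the three Bayes-rule terms
    for the class c_i predicted for sample i:

        score_i ∝ P(mislabel | c_i) * p(x_i | c_i) * P(c_i)

    log_px_given_c : (n_samples, n_classes) log-likelihoods p(x | c)
                     (in the paper, estimated via a VAE on deep features;
                     passed in directly here for illustration)
    p_mislabel     : (n_classes,) estimated mislabelling rate per class
    p_class        : (n_classes,) prior class abundance
    preds          : (n_samples,) predicted class index per sample
    """
    idx = np.arange(len(preds))
    return (np.log(p_mislabel[preds])
            + log_px_given_c[idx, preds]
            + np.log(p_class[preds]))

# Toy example: two samples, two classes; samples with the highest
# scores would be selected for labelling.
log_px = np.array([[-1.0, -2.0],
                   [-3.0, -1.0]])
scores = acquisition_scores(log_px,
                            p_mislabel=np.array([0.5, 0.1]),
                            p_class=np.array([0.5, 0.5]),
                            preds=np.array([0, 1]))
```

Working in log space keeps the product of the three terms numerically stable, and ranking by the sum is equivalent to ranking by the product.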