Paper title
GALAXY: Graph-based Active Learning at the Extreme
Paper authors
Paper abstract
Active learning is a label-efficient approach to training highly effective models while interactively selecting only small subsets of unlabeled data for labeling and training. In "open world" settings, the classes of interest can make up a small fraction of the overall dataset, and most of the data may be viewed as an out-of-distribution or irrelevant class. This leads to extreme class imbalance, and our theory and methods focus on this core issue. We propose a new strategy for active learning called GALAXY (Graph-based Active Learning At the eXtrEme), which blends ideas from graph-based active learning and deep learning. GALAXY automatically and adaptively selects more class-balanced examples for labeling than most other methods for active learning. Our theory shows that GALAXY performs a refined form of uncertainty sampling that gathers a much more class-balanced dataset than vanilla uncertainty sampling. Experimentally, we demonstrate GALAXY's superiority over existing state-of-the-art deep active learning algorithms in unbalanced vision classification settings generated from popular datasets.
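For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of vanilla margin-based uncertainty sampling. It is not GALAXY itself and is not taken from the paper. The function names, the margin criterion, and the toy probabilities are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Set


def margin_scores(probs: Sequence[Sequence[float]]) -> List[float]:
    """Gap between the two largest predicted class probabilities per example.

    A small margin means the model is unsure between its two best guesses.
    """
    scores = []
    for p in probs:
        top_two = sorted(p, reverse=True)[:2]
        scores.append(top_two[0] - top_two[1])
    return scores


def uncertainty_sample(
    probs: Sequence[Sequence[float]], labeled: Set[int], batch_size: int
) -> List[int]:
    """Pick the unlabeled examples with the smallest margins (illustrative baseline)."""
    scores = margin_scores(probs)
    candidates = [i for i in range(len(probs)) if i not in labeled]
    candidates.sort(key=lambda i: scores[i])  # most uncertain first
    return candidates[:batch_size]


# Hypothetical softmax outputs for a 5-example pool over 2 classes.
pool_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
    [0.90, 0.10],
]
already_labeled = {1}  # example 1 was labeled in an earlier round
picked = uncertainty_sample(pool_probs, already_labeled, batch_size=2)
print(picked)
```

In this toy pool the query goes to the examples nearest the decision boundary, regardless of which class they belong to. The abstract's claim is that GALAXY refines this kind of selection so that the labeled set ends up more class-balanced under extreme imbalance.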