Paper title
Active Learning for Open-set Annotation
Paper authors
Paper abstract
Existing active learning studies typically work in the closed-set setting, assuming that all data examples to be labeled are drawn from known classes. However, in real annotation tasks, the unlabeled data usually contains a large number of examples from unknown classes, which causes most active learning methods to fail. To tackle this open-set annotation (OSA) problem, we propose a new active learning framework called LfOSA, which boosts classification performance with an effective sampling strategy that precisely detects examples from known classes for annotation. The LfOSA framework introduces an auxiliary network to model the per-example max activation value (MAV) distribution with a Gaussian Mixture Model, which can dynamically select the examples in the unlabeled set that are most likely to belong to known classes. Moreover, by reducing the temperature $T$ of the loss function, the detection model is further optimized by exploiting both known and unknown supervision. The experimental results show that the proposed method can significantly improve the selection quality of known classes, and achieve higher classification accuracy with lower annotation cost than state-of-the-art active learning methods. To the best of our knowledge, this is the first work on active learning for open-set annotation.
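The MAV-based selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: the mixture has two components, the component with the larger mean corresponds to known classes, and a single mixture is fitted over the whole unlabeled pool. The function names known_class_probability and select_for_annotation are hypothetical, and the MAV values are synthetic.

```python
import math
import random


def known_class_probability(mavs, iters=50):
    """Fit a 1-D, two-component Gaussian mixture to the MAVs with plain EM.

    Returns, for every example, the posterior of the component with the
    larger mean (ASSUMED here to model known-class examples).
    """
    lo, hi = min(mavs), max(mavs)
    means = [lo, hi]                           # initialise at the extremes
    spread = max((hi - lo) / 4.0, 1e-3)
    variances = [spread ** 2, spread ** 2]
    weights = [0.5, 0.5]

    def pdf(x, mean, var):
        return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    def posteriors():
        out = []
        for x in mavs:
            p = [weights[k] * pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300           # guard against underflow
            out.append([pk / total for pk in p])
        return out

    for _ in range(iters):
        resp = posteriors()                    # E-step
        for k in range(2):                     # M-step
            nk = sum(r[k] for r in resp) or 1e-12
            means[k] = sum(r[k] * x for r, x in zip(resp, mavs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, mavs)) / nk
            variances[k] = max(var, 1e-6)      # keep the variance positive
            weights[k] = nk / len(mavs)

    high = 0 if means[0] > means[1] else 1     # larger-mean component
    return [r[high] for r in posteriors()]


def select_for_annotation(mavs, budget):
    """Return the indices of the `budget` examples most likely to be known."""
    probs = known_class_probability(mavs)
    ranked = sorted(range(len(mavs)), key=lambda i: probs[i], reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    random.seed(0)
    # Synthetic MAVs: 60 "unknown" examples with low activation,
    # followed by 40 "known" examples with high activation.
    mavs = [random.gauss(2.0, 0.5) for _ in range(60)]
    mavs += [random.gauss(8.0, 0.5) for _ in range(40)]
    print(select_for_annotation(mavs, budget=10))
```

In this sketch the annotation budget is spent only on the examples ranked highest by the mixture posterior, which is the sense in which the sampling strategy "detects" known-class examples before they are sent to annotators. A practical system would obtain the MAVs from the auxiliary detection network rather than from synthetic draws.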