Paper Title

PT4AL: Using Self-Supervised Pretext Tasks for Active Learning

Authors

John Seon Keun Yi, Minseok Seo, Jongchan Park, Dong-Geol Choi

Abstract

Labeling a large set of data is expensive. Active learning aims to tackle this problem by asking to annotate only the most informative data from the unlabeled set. We propose a novel active learning approach that utilizes self-supervised pretext tasks and a unique data sampler to select data that are both difficult and representative. We discover that the loss of a simple self-supervised pretext task, such as rotation prediction, is closely correlated to the downstream task loss. Before the active learning iterations, the pretext task learner is trained on the unlabeled set, and the unlabeled data are sorted and split into batches by their pretext task losses. In each active learning iteration, the main task model is used to sample the most uncertain data in a batch to be annotated. We evaluate our method on various image classification and segmentation benchmarks and achieve compelling performances on CIFAR10, Caltech-101, ImageNet, and Cityscapes. We further show that our method performs well on imbalanced datasets, and can be an effective solution to the cold-start problem where active learning performance is affected by the randomly sampled initial labeled set.
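
As a rough, illustrative sketch (not the authors' released code), the batch construction and in-batch uncertainty sampling described in the abstract could look like the following. The function name pt4al_select and its arguments are hypothetical, and the uncertainty measure used here (lowest maximum softmax confidence) is one common choice, assumed for illustration.

```python
import numpy as np

def pt4al_select(pretext_losses, main_task_probs, n_batches, it, budget):
    """Illustrative PT4AL-style sampler (a sketch, not the official code).

    pretext_losses:  (N,) pretext-task loss (e.g. rotation prediction)
                     per unlabeled sample, precomputed.
    main_task_probs: (N, C) softmax outputs of the current main-task model.
    it:              0-based active-learning iteration index.
    budget:          number of samples to send for annotation this round.
    """
    # Sort unlabeled indices by pretext loss (hardest first) and split
    # them into n_batches roughly equal batches.
    order = np.argsort(-pretext_losses)
    batches = np.array_split(order, n_batches)
    batch = batches[it]
    # Within this iteration's batch, pick the samples the main-task model
    # is least confident about (lowest maximum softmax probability).
    confidence = main_task_probs[batch].max(axis=1)
    return batch[np.argsort(confidence)[:budget]]

# Toy usage: 10 unlabeled samples, 3 classes, 2 batches, annotate 2 per round.
rng = np.random.default_rng(0)
losses = rng.random(10)
probs = rng.dirichlet(np.ones(3), size=10)
print(pt4al_select(losses, probs, n_batches=2, it=0, budget=2))
```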
