Paper Title

Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning

Authors

Zhongzheng Ren, Raymond A. Yeh, Alexander G. Schwing

Abstract

Existing semi-supervised learning (SSL) algorithms use a single weight to balance the loss of labeled and unlabeled examples, i.e., all unlabeled examples are equally weighted. But not all unlabeled data are equal. In this paper we study how to use a different weight for every unlabeled example. Manual tuning of all those weights -- as done in prior work -- is no longer possible. Instead, we adjust those weights via an algorithm based on the influence function, a measure of a model's dependency on one training example. To make the approach efficient, we propose a fast and effective approximation of the influence function. We demonstrate that this technique outperforms state-of-the-art methods on semi-supervised image and language classification tasks.
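To make the idea concrete, below is a minimal PyTorch sketch of per-example weighting of the unlabeled loss. It is not the authors' implementation: the paper estimates influence with an (approximate) inverse-Hessian term, while this sketch substitutes a cheaper first-order proxy, the alignment between each unlabeled example's gradient and the labeled-loss gradient. All identifiers (`net`, `y_pseudo`, `eta`) are illustrative assumptions.

```python
# Sketch only: first-order stand-in for the paper's influence-function update.
import torch
import torch.nn.functional as F

def influence_weight_update(net, x_lab, y_lab, x_unl, y_pseudo, w, eta=0.1):
    """Adjust per-unlabeled-example weights w using an influence-style signal."""
    # Gradient of the labeled loss w.r.t. all parameters, flattened.
    lab_loss = F.cross_entropy(net(x_lab), y_lab)
    g_lab = torch.autograd.grad(lab_loss, tuple(net.parameters()))
    g_lab = torch.cat([g.flatten() for g in g_lab])

    # Influence proxy per unlabeled example: dot product of its loss
    # gradient with the labeled gradient. Positive alignment means the
    # example pushes the model in a direction that also reduces labeled loss.
    scores = torch.empty(len(x_unl))
    for j in range(len(x_unl)):
        loss_j = F.cross_entropy(net(x_unl[j:j + 1]), y_pseudo[j:j + 1])
        g_j = torch.autograd.grad(loss_j, tuple(net.parameters()))
        g_j = torch.cat([g.flatten() for g in g_j])
        scores[j] = torch.dot(g_lab, g_j)

    # Raise the weights of helpful examples, lower the rest; keep in [0, 1].
    return (w + eta * torch.tanh(scores)).clamp(0.0, 1.0)
```

Training then replaces the usual single balancing coefficient with these per-example weights, e.g. `loss = loss_lab + (w * per_example_unl_loss).mean()`, so unhelpful or mislabeled unlabeled examples are gradually down-weighted. The per-example gradient loop above is written for clarity; an efficient version would batch it, in the spirit of the fast approximation the paper proposes.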
