Paper Title

Understanding Self-Training for Gradual Domain Adaptation

Paper Authors

Ananya Kumar, Tengyu Ma, Percy Liang

Paper Abstract

Machine learning systems must adapt to data distributions that evolve over time, in applications ranging from sensor networks and self-driving car perception modules to brain-machine interfaces. We consider gradual domain adaptation, where the goal is to adapt an initial classifier trained on a source domain given only unlabeled data that shifts gradually in distribution towards a target domain. We prove the first non-vacuous upper bound on the error of self-training with gradual shifts, under settings where directly adapting to the target domain can result in unbounded error. The theoretical analysis leads to algorithmic insights, highlighting that regularization and label sharpening are essential even when we have infinite data, and suggesting that self-training works particularly well for shifts with small Wasserstein-infinity distance. Leveraging the gradual shift structure leads to higher accuracies on a rotating MNIST dataset and a realistic Portraits dataset.
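The abstract describes the gradual self-training procedure: train a regularized classifier on the labeled source domain, then repeatedly pseudo-label the next unlabeled intermediate domain with hard (sharpened) labels and retrain on those pseudo-labels, walking step by step toward the target. The Wasserstein-infinity distance mentioned above bounds how far any single point can move between consecutive distributions, which is why small per-step shifts keep pseudo-labels mostly correct. Below is a minimal sketch of this loop, assuming scikit-learn; the function name `gradual_self_train`, the logistic-regression model, and the regularization strength `C` are illustrative assumptions, not the paper's exact implementation.

```python
# A minimal sketch of gradual self-training, assuming scikit-learn.
# `unlabeled_domains` is a hypothetical list of unlabeled feature arrays
# ordered from closest-to-source to the target domain.
import numpy as np
from sklearn.linear_model import LogisticRegression

def gradual_self_train(X_source, y_source, unlabeled_domains, C=0.1):
    """Adapt a regularized classifier across gradually shifting domains.

    X_source, y_source: labeled source-domain data.
    unlabeled_domains: unlabeled arrays ordered by increasing shift.
    C: inverse L2 regularization strength (the analysis highlights that
       regularization matters even with infinite data).
    """
    # Initial classifier, trained on labeled source data only.
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_source, y_source)
    for X_unlabeled in unlabeled_domains:
        # Label sharpening: hard pseudo-labels, not soft probabilities.
        pseudo_labels = clf.predict(X_unlabeled)
        # Retrain with regularization on the pseudo-labeled domain.
        clf = LogisticRegression(C=C, max_iter=1000).fit(
            X_unlabeled, pseudo_labels)
    return clf
```

On rotating MNIST, for example, each entry of `unlabeled_domains` would hold images at a slightly larger rotation angle than the last, so each retraining step only has to bridge a small shift.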
