Paper Title
Dataset Dynamics via Gradient Flows in Probability Space
Authors
Abstract
Many machine learning tasks, from generative modeling to domain adaptation, revolve around transforming and manipulating datasets. While various methods exist for transforming unlabeled datasets, principled methods for labeled (e.g., classification) datasets are missing. In this work, we propose a novel framework for dataset transformation, which we cast as optimization over data-generating joint probability distributions. We approach this class of problems through Wasserstein gradient flows in probability space, and derive practical and efficient particle-based methods for a flexible but well-behaved class of objective functions. Through various experiments, we show that this framework can be used to impose constraints on classification datasets, adapt them for transfer learning, or re-purpose fixed or black-box models to classify previously unseen datasets with high accuracy.
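To make the particle-based idea concrete, here is a minimal sketch, not the paper's method. It assumes the simplest possible objective, a potential-energy functional F(rho) = E_rho[V(x)], for which the Wasserstein gradient flow reduces to each particle following dx/dt = -grad V(x). The quadratic potential, the names `potential_grad` and `particle_flow`, and the step size are all illustrative assumptions, and the sketch moves only feature vectors, whereas the abstract concerns labeled datasets.

```python
import numpy as np

def potential_grad(x, target):
    # Gradient of the hypothetical potential V(x) = 0.5 * ||x - target||^2.
    return x - target

def particle_flow(particles, target, step=0.1, n_steps=100):
    # Forward-Euler discretization of dx/dt = -grad V(x), applied to every
    # particle independently. For a potential-energy functional this is the
    # particle form of the Wasserstein gradient flow.
    x = particles.copy()
    for _ in range(n_steps):
        x = x - step * potential_grad(x, target)
    return x

rng = np.random.default_rng(0)
particles = rng.normal(size=(200, 2))   # toy "dataset" of 2-D points
target = np.array([3.0, -1.0])          # minimizer of the assumed potential
moved = particle_flow(particles, target)
```

With this choice of potential every particle contracts toward `target`. Richer objectives, such as interaction terms or distances to a reference dataset, would add terms to the velocity field that couple the particles to one another.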