Paper Title
Information-theoretic analysis for transfer learning
Paper Authors
Paper Abstract
Transfer learning, or domain adaptation, is concerned with machine learning problems in which the training and testing data come from possibly different distributions (denoted $\mu$ and $\mu'$, respectively). In this work, we give an information-theoretic analysis of the generalization error and the excess risk of transfer learning algorithms, following a line of work initiated by Russo and Zou. Our results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence $D(\mu \| \mu')$ plays an important role in characterizing the generalization error in the domain adaptation setting. Specifically, we provide generalization error upper bounds for general transfer learning algorithms and extend the results to a specific empirical risk minimization (ERM) algorithm in which data from both distributions are available in the training phase. We further apply the method to iterative, noisy gradient descent algorithms and obtain upper bounds that can be computed easily, using only parameters of the learning algorithms. A few illustrative examples are provided to demonstrate the usefulness of the results. In particular, for a specific classification problem, our bound is tighter than the bound derived using Rademacher complexity.
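To make the role of the KL divergence concrete, here is an illustrative sketch of a bound of this flavor, under the assumptions that the loss $\ell(w, Z)$ is $\sigma$-subgaussian under $Z \sim \mu'$ for every hypothesis $w$ and that the training sample $S = (Z_1, \ldots, Z_n)$ is drawn i.i.d. from the source distribution $\mu$; the notation ($W$, $L_S$, $L_{\mu'}$) is introduced here for illustration, and this is not a verbatim quotation of the paper's theorems:

$$ \Big| \mathbb{E}\big[ L_{\mu'}(W) - L_S(W) \big] \Big| \;\le\; \sqrt{\, 2\sigma^2 \left( D(\mu \,\|\, \mu') + \frac{I(S; W)}{n} \right)}, $$

where $W$ is the hypothesis returned by the algorithm, $L_S$ is the empirical risk on the training sample, $L_{\mu'}$ is the population risk under the test distribution, and $I(S; W)$ is the mutual information between the sample and the hypothesis. When $\mu = \mu'$ the KL term vanishes and the expression reduces to the standard information-theoretic generalization bound, so the extra $D(\mu \| \mu')$ term can be read as the price of the distribution shift.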