Paper Title

First Steps Toward Understanding the Extrapolation of Nonlinear Models to Unseen Domains

Paper Authors

Kefan Dong, Tengyu Ma

Abstract

Real-world machine learning applications often involve deploying neural networks to domains that are not seen at training time. Hence, we need to understand the extrapolation of nonlinear models -- under what conditions on the distributions and function class can models be guaranteed to extrapolate to new test distributions? The question is very challenging because even two-layer neural networks cannot be guaranteed to extrapolate outside the support of the training distribution without further assumptions on the domain shift. This paper makes some initial steps toward analyzing the extrapolation of nonlinear models for structured domain shift. We primarily consider settings where the marginal distribution of each coordinate of the data (or subset of coordinates) does not shift significantly across the training and test distributions, but the joint distribution may have a much bigger shift. We prove that the family of nonlinear models of the form $f(x)=\sum f_i(x_i)$, where $f_i$ is an arbitrary function on the subset of features $x_i$, can extrapolate to unseen distributions, if the covariance of the features is well-conditioned. To the best of our knowledge, this is the first result that goes beyond linear models and the bounded density ratio assumption, even though the assumptions on the distribution shift and function class are stylized.
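The setting described in the abstract can be illustrated with a minimal numerical sketch. The code below is an illustration of the setup, not the paper's method: it assumes a hypothetical additive ground truth $f(x)=\sum_i f_i(x_i)$, a training distribution with independent coordinates, and a test distribution with the same per-coordinate marginals but a correlated joint (the kind of structured shift the abstract considers). It fits an additive hypothesis class (per-coordinate polynomials) by least squares and reports the conditioning of the feature covariance alongside the test error under the shift.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth of the additive form f(x) = sum_i f_i(x_i).
def f_true(X):
    return np.sin(X[:, 0]) + X[:, 1] ** 2 + np.abs(X[:, 2])

# Training distribution: independent standard-normal coordinates.
X_train = rng.normal(size=(2000, 3))

# Test distribution: same per-coordinate marginals (standard normal),
# but a correlated joint -- the joint distribution shifts, marginals do not.
cov = np.array([[1.0, 0.9, 0.0],
                [0.9, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
X_test = rng.multivariate_normal(np.zeros(3), cov, size=2000)

def features(X, degree=5):
    """Per-coordinate polynomial bases plus a constant column.

    Concatenating one basis per coordinate keeps the hypothesis class
    additive: f(x) = sum_i f_i(x_i)."""
    cols = [np.ones((len(X), 1))]
    cols += [X[:, [i]] ** d for i in range(X.shape[1]) for d in range(1, degree + 1)]
    return np.hstack(cols)

Phi_train = features(X_train)
y_train = f_true(X_train)
w, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)

# Conditioning of the empirical feature covariance -- the abstract's
# key sufficient condition for extrapolation.
cond = np.linalg.cond(Phi_train.T @ Phi_train / len(Phi_train))

test_err = np.mean((features(X_test) @ w - f_true(X_test)) ** 2)
print(f"feature-covariance condition number: {cond:.1f}")
print(f"test MSE under joint-distribution shift: {test_err:.4f}")
```

Despite the test points lying in a region (strongly correlated $x_0, x_1$) that is rare under the training distribution, the additive fit transfers, because each $f_i$ is determined by the unchanged marginals; a non-additive model class would carry no such guarantee.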
