Paper Title
How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks
Paper Authors
Paper Abstract
We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. Previous works report mixed empirical results when extrapolating with neural networks: while feedforward neural networks, a.k.a. multilayer perceptrons (MLPs), do not extrapolate well in certain simple tasks, Graph Neural Networks (GNNs) -- structured networks with MLP modules -- have shown some success in more complex tasks. Working towards a theoretical explanation, we identify conditions under which MLPs and GNNs extrapolate well. First, we quantify the observation that ReLU MLPs quickly converge to linear functions along any direction from the origin, which implies that ReLU MLPs do not extrapolate most nonlinear functions. But, they can provably learn a linear target function when the training distribution is sufficiently "diverse". Second, in connection to analyzing the successes and limitations of GNNs, these results suggest a hypothesis for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or edge weights) relies on encoding task-specific non-linearities in the architecture or features. Our theoretical analysis builds on a connection of over-parameterized networks to the neural tangent kernel. Empirically, our theory holds across different training settings.
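The abstract's first result states that ReLU MLPs converge to linear functions along any direction from the origin. A minimal numpy sketch (not code from the paper) can illustrate why: a ReLU network is piecewise linear, so along a ray t ↦ f(t·x) the output becomes exactly linear once t is large enough that no hidden unit's activation pattern changes. All names here (the random one-hidden-layer network, the threshold `t0`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random one-hidden-layer ReLU MLP: f(x) = w2 @ relu(W1 @ x + b1) + b2.
d, h = 4, 16
W1 = rng.normal(size=(h, d))
b1 = rng.normal(size=h)
w2 = rng.normal(size=h)
b2 = rng.normal()

def mlp(x):
    return w2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Fix a direction x and move away from the origin along the ray t * x.
x = rng.normal(size=d)
a = W1 @ x  # per-unit slope along the ray

# Once t exceeds |b1_i| / |a_i| for every unit i, the sign of each
# pre-activation t*a_i + b1_i is frozen, so f(t*x) is linear in t.
t0 = 1.0 + np.max(np.abs(b1) / np.maximum(np.abs(a), 1e-12))
ts = t0 * np.array([2.0, 3.0, 4.0, 5.0])
vals = np.array([mlp(t * x) for t in ts])

# A linear function sampled at equally spaced t has vanishing second
# differences -- the network extrapolates linearly along this ray.
print(np.allclose(np.diff(vals, n=2), 0.0, atol=1e-6))
```

This is exactly the behavior the abstract points to: outside the training support the model is constrained to linear extrapolation along rays, which is why ReLU MLPs cannot extrapolate most nonlinear target functions.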