Paper Title
On Feature Learning in Neural Networks with Global Convergence Guarantees
Paper Authors
Paper Abstract
We study the optimization of wide neural networks (NNs) via gradient flow (GF) in setups that allow feature learning while admitting non-asymptotic global convergence guarantees. First, for wide shallow NNs under the mean-field scaling and with a general class of activation functions, we prove that when the input dimension is no less than the size of the training set, the training loss converges to zero at a linear rate under GF. Building upon this analysis, we study a model of wide multi-layer NNs whose second-to-last layer is trained via GF, for which we also prove a linear-rate convergence of the training loss to zero, but regardless of the input dimension. We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
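As a point of reference for the setup described in the abstract, the following is a minimal sketch of a standard mean-field parameterization of a shallow network and its gradient flow on the empirical squared loss; the notation (width m, features sigma, parameters a_i, w_i, constant c) is generic and assumed here for illustration, not taken from the paper itself.

% Mean-field shallow network: the 1/m prefactor (rather than the 1/sqrt(m) of the NTK scaling)
% is what permits feature learning in the infinite-width limit.
\[
  f_\theta(x) \;=\; \frac{1}{m} \sum_{i=1}^{m} a_i \,\sigma\!\left(w_i^\top x\right),
  \qquad \theta = \{(a_i, w_i)\}_{i=1}^{m}.
\]
% Empirical squared loss over n training pairs and the gradient flow dynamics on it.
\[
  L(\theta) \;=\; \frac{1}{2n} \sum_{j=1}^{n} \bigl(f_\theta(x_j) - y_j\bigr)^2,
  \qquad
  \frac{d\theta(t)}{dt} \;=\; -\,\nabla_\theta L\bigl(\theta(t)\bigr).
\]
% A linear-rate global convergence statement of the kind described in the abstract has the form
% L(theta(t)) <= exp(-c t) L(theta(0)) for some constant c > 0.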