Paper Title
Predicting the outputs of finite deep neural networks trained with noisy gradients

Authors

Naveh, Gadi, Ben-David, Oded, Sompolinsky, Haim, Ringel, Zohar

Abstract

A recent line of works studied wide deep neural networks (DNNs) by approximating them as Gaussian Processes (GPs). A DNN trained with gradient flow was shown to map to a GP governed by the Neural Tangent Kernel (NTK), whereas earlier works showed that a DNN with an i.i.d. prior over its weights maps to the so-called Neural Network Gaussian Process (NNGP). Here we consider a DNN training protocol, involving noise, weight decay and finite width, whose outcome corresponds to a certain non-Gaussian stochastic process. An analytical framework is then introduced to analyze this non-Gaussian process, whose deviation from a GP is controlled by the finite width. Our contribution is three-fold: (i) In the infinite width limit, we establish a correspondence between DNNs trained with noisy gradients and the NNGP, not the NTK. (ii) We provide a general analytical form for the finite width correction (FWC) for DNNs with arbitrary activation functions and depth and use it to predict the outputs of empirical finite networks with high accuracy. Analyzing the FWC behavior as a function of $n$, the training set size, we find that it is negligible for both the very small $n$ regime, and, surprisingly, for the large $n$ regime (where the GP error scales as $O(1/n)$). (iii) We flesh out algebraically how these FWCs can improve the performance of finite convolutional neural networks (CNNs) relative to their GP counterparts on image classification tasks.
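The training protocol described above (gradient descent with added Gaussian noise plus weight decay, i.e. discretized Langevin dynamics) can be illustrated with a minimal sketch. This is not the authors' code; the function and parameter names (`lr`, `weight_decay`, `temperature`) are illustrative assumptions, and the toy quadratic loss stands in for a real DNN objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def langevin_step(w, grad_loss, lr=1e-3, weight_decay=1e-2, temperature=1.0):
    """One discretized Langevin update:
    w <- w - lr * (grad L(w) + weight_decay * w) + sqrt(2 * temperature * lr) * xi,
    where xi is standard Gaussian noise. At long times this samples a
    stationary distribution over weights rather than a single minimizer.
    """
    noise = rng.standard_normal(w.shape)
    return w - lr * (grad_loss + weight_decay * w) + np.sqrt(2.0 * temperature * lr) * noise

# Toy usage: quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = rng.standard_normal(5)
for _ in range(1000):
    w = langevin_step(w, grad_loss=w)
print(w.shape)  # (5,)
```

In the paper's setting, running such dynamics on a wide DNN yields output statistics matching the NNGP in the infinite-width limit, with the finite-width corrections (FWCs) capturing the leading deviation.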