Paper Title

Deep Equals Shallow for ReLU Networks in Kernel Regimes

Paper Authors

Alberto Bietti, Francis Bach

Paper Abstract

Deep networks are often considered to be more expressive than shallow ones in terms of approximation. Indeed, certain functions can be approximated by deep networks provably more efficiently than by shallow ones; however, no tractable algorithms are known for learning such deep models. Separately, a recent line of work has shown that deep networks trained with gradient descent may behave like (tractable) kernel methods in a certain over-parameterized regime, where the kernel is determined by the architecture and initialization, and this paper focuses on approximation for such kernels. We show that for ReLU activations, the kernels derived from deep fully-connected networks have essentially the same approximation properties as their shallow two-layer counterpart, namely the same eigenvalue decay for the corresponding integral operator. This highlights the limitations of the kernel framework for understanding the benefits of such deep architectures. Our main theoretical result relies on characterizing such eigenvalue decays through differentiability properties of the kernel function, which also easily applies to the study of other kernels defined on the sphere.
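The abstract's central claim can be probed numerically. Below is a minimal sketch (not from the paper's code) that builds the standard NTK recursion for fully-connected ReLU networks via arc-cosine kernels, samples inputs on the unit sphere, and compares the empirical eigenvalue decay of the Gram matrices for a two-layer network versus a deeper one; under these assumptions the fitted decay exponents should come out roughly equal, consistent with the paper's "deep equals shallow" result. The depths, dimension, and fitting range are illustrative choices, not values taken from the paper.

```python
import numpy as np

def kappa0(u):
    # Arc-cosine kernel of degree 0: expectation of the product of ReLU derivatives.
    u = np.clip(u, -1.0, 1.0)
    return (np.pi - np.arccos(u)) / np.pi

def kappa1(u):
    # Arc-cosine kernel of degree 1: expectation of the product of ReLU activations.
    u = np.clip(u, -1.0, 1.0)
    return (u * (np.pi - np.arccos(u)) + np.sqrt(1.0 - u**2)) / np.pi

def ntk(u, depth):
    # NTK recursion for a fully-connected ReLU network with `depth` hidden layers,
    # applied to inner products u of unit-norm inputs (Jacot et al.-style recursion).
    sigma, theta = u, u
    for _ in range(depth):
        theta = kappa1(sigma) + theta * kappa0(sigma)
        sigma = kappa1(sigma)
    return theta

# Sample points on the sphere S^{d-1} and compare empirical eigenvalue decays.
rng = np.random.default_rng(0)
d, n = 3, 2000
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
G = X @ X.T

for depth in (1, 5):  # shallow two-layer network vs. a deeper one
    eigs = np.linalg.eigvalsh(ntk(G, depth) / n)[::-1]
    # Fit a power-law decay exponent on a mid-range of the sorted spectrum.
    k = np.arange(20, 400)
    slope = np.polyfit(np.log(k), np.log(eigs[k]), 1)[0]
    print(f"depth={depth}: fitted eigenvalue decay exponent ~ {slope:.2f}")
```

If the paper's claim holds, the two printed exponents should be close, reflecting that depth changes constants but not the polynomial rate of eigenvalue decay of the corresponding integral operator.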
