Paper Title
Limitations of the NTK for Understanding Generalization in Deep Learning
Paper Authors
Paper Abstract
The "Neural Tangent Kernel" (NTK) (Jacot et al., 2018) and its empirical variants have been proposed as proxies to capture certain behaviors of real neural networks. In this work, we study NTKs through the lens of scaling laws and demonstrate that they fall short of explaining important aspects of neural network generalization. In particular, we demonstrate realistic settings where finite-width neural networks have significantly better data scaling exponents than their corresponding empirical and infinite NTKs at initialization. This reveals a more fundamental difference between real networks and NTKs, beyond just a few percentage points of test accuracy. Further, we show that even if the empirical NTK is allowed to be pre-trained on a constant number of samples, the kernel scaling does not catch up to the neural network scaling. Finally, we show that the empirical NTK continues to evolve throughout most of training, in contrast with prior work suggesting that it stabilizes after a few epochs. Altogether, our work establishes concrete limitations of the NTK approach for understanding the generalization of real networks on natural datasets.
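Both measurements underlying the abstract are concrete enough to sketch. Below is a minimal JAX illustration (not the paper's code; the toy MLP, function names, and sizes are all assumptions made here for illustration): the empirical NTK at initialization, K(x, x') = <grad_theta f(x), grad_theta f(x')>, and a log-log least-squares fit of the data scaling exponent alpha under the assumed power law L(n) ~ a * n^(-alpha).

```python
# Hypothetical sketch of the two measurements discussed in the abstract:
# (1) the empirical NTK of a toy MLP at initialization, and
# (2) fitting a data scaling exponent alpha from test losses L(n) ~ a * n^(-alpha).
import jax
import jax.numpy as jnp


def init_mlp(key, sizes):
    """Initialize a fully connected net with 1/sqrt(fan_in) weight scaling."""
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params


def mlp(params, x):
    """Forward pass for one example x of shape (d_in,); scalar output."""
    for w, b in params[:-1]:
        x = jnp.tanh(x @ w + b)
    w, b = params[-1]
    return (x @ w + b)[0]


def empirical_ntk(params, x1, x2):
    """K[i, j] = <grad_theta f(x1[i]), grad_theta f(x2[j])> at the given params."""
    per_example_grads = jax.vmap(jax.grad(mlp), in_axes=(None, 0))

    def flatten(tree):  # stack per-parameter gradients into an (n, #params) matrix
        leaves = jax.tree_util.tree_leaves(tree)
        return jnp.concatenate([g.reshape(g.shape[0], -1) for g in leaves], axis=1)

    return flatten(per_example_grads(params, x1)) @ flatten(per_example_grads(params, x2)).T


def data_scaling_exponent(ns, test_losses):
    """Least-squares fit of log L = c - alpha * log n; returns alpha."""
    ln = jnp.log(jnp.asarray(ns, dtype=jnp.float32))
    ll = jnp.log(jnp.asarray(test_losses, dtype=jnp.float32))
    slope = jnp.sum((ln - ln.mean()) * (ll - ll.mean())) / jnp.sum((ln - ln.mean()) ** 2)
    return -slope


key, data_key = jax.random.split(jax.random.PRNGKey(0))
params = init_mlp(key, [16, 128, 1])        # toy width-128 network
x = jax.random.normal(data_key, (8, 16))    # 8 inputs of dimension 16
print(empirical_ntk(params, x, x).shape)    # (8, 8) kernel at initialization
```

In these terms, the paper's central comparison is the exponent alpha recovered from L(n) for the trained finite-width network versus the alpha recovered from regression with its empirical or infinite-width NTK; the abstract's claim is that the network's exponent is significantly better, and that the gap persists even for kernels taken from partially trained networks.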