Paper Title
Generalization Error Bounds for Deep Neural Networks Trained by SGD
Paper Authors
Paper Abstract
Generalization error bounds for deep neural networks trained by stochastic gradient descent (SGD) are derived by combining dynamical control of an appropriate parameter norm with a Rademacher complexity estimate based on parameter norms. The bounds depend explicitly on the loss along the training trajectory and hold for a wide range of network architectures, including multilayer perceptrons (MLPs) and convolutional neural networks (CNNs). Compared with other algorithm-dependent generalization estimates such as uniform-stability-based bounds, our bounds do not require $L$-smoothness of the nonconvex loss function and apply directly to SGD rather than stochastic gradient Langevin dynamics (SGLD). Numerical results show that our bounds are non-vacuous and robust to changes in the optimizer and network hyperparameters.
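To indicate the general shape of such norm-based guarantees, the following display is a standard Rademacher-complexity template, not the paper's theorem; the symbols $\mathcal{F}$, $\widehat{\mathfrak{R}}_S$, $n$, and $\delta$ are introduced here purely for illustration. For a loss $\ell$ taking values in $[0,1]$, with probability at least $1-\delta$ over an i.i.d. sample $S=\{z_1,\dots,z_n\}$, every predictor $f \in \mathcal{F}$ satisfies
$$ \mathbb{E}\,\ell(f;z) \;\le\; \frac{1}{n}\sum_{i=1}^{n}\ell(f;z_i) \;+\; 2\,\widehat{\mathfrak{R}}_S(\ell\circ\mathcal{F}) \;+\; 3\sqrt{\frac{\ln(2/\delta)}{2n}}, $$
where $\widehat{\mathfrak{R}}_S$ denotes the empirical Rademacher complexity. In the setting described in the abstract, $\mathcal{F}$ would be a class of networks whose parameter norm does not exceed a value controlled dynamically along the SGD trajectory, which is how the training loss enters the final bound.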