Paper Title
Neural Network Training Techniques Regularize Optimization Trajectory: An Empirical Study
Paper Authors
Paper Abstract
Modern deep neural network (DNN) training utilizes various training techniques, e.g., nonlinear activation functions, batch normalization, skip-connections, etc. Despite their effectiveness, it remains unclear how they help accelerate DNN training in practice. In this paper, we provide an empirical study of the regularization effect of these training techniques on DNN optimization. Specifically, we find that the optimization trajectories of successful DNN training runs consistently obey a certain regularity principle that regularizes the model update direction to be aligned with the trajectory direction. Theoretically, we show that such a regularity principle leads to a convergence guarantee in nonconvex optimization, with a convergence rate that depends on a regularization parameter. Empirically, we find that DNN training that applies these training techniques achieves fast convergence and obeys the regularity principle with a large regularization parameter, implying that the model updates are well aligned with the trajectory. In contrast, DNN training without these techniques converges slowly and obeys the regularity principle with a small regularization parameter, implying that the model updates are poorly aligned with the trajectory. Therefore, different training techniques regularize the model update direction via the regularity principle to facilitate convergence.
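To make the alignment notion concrete, the sketch below shows one way to probe it empirically. This is not the authors' code, and the paper's exact regularity principle and regularization parameter are not reproduced here; the function name trajectory_alignment and the choice of x_{t+1} - x_0 as the trajectory direction are illustrative assumptions. The sketch simply computes the cosine similarity between each model update and the trajectory traced so far, given flattened parameter snapshots saved during training.

```python
# Minimal sketch (hypothetical, not from the paper): measure how well each
# model update x_{t+1} - x_t aligns with the trajectory direction x_{t+1} - x_0.
import numpy as np

def trajectory_alignment(snapshots):
    """snapshots: list of 1-D arrays [x_0, x_1, ..., x_T], the flattened model
    parameters saved at successive training steps. Returns the cosine
    similarity between each update and the trajectory direction at that step."""
    x0 = snapshots[0]
    alignments = []
    for x_prev, x_next in zip(snapshots[:-1], snapshots[1:]):
        update = x_next - x_prev      # model update at this step
        trajectory = x_next - x0      # trajectory direction traced so far
        denom = np.linalg.norm(update) * np.linalg.norm(trajectory)
        alignments.append(float(update @ trajectory / denom) if denom > 0 else 0.0)
    return alignments

# Toy usage: in practice the snapshots would be flattened DNN weights, e.g.
# np.concatenate([p.ravel() for p in model_parameters]) saved once per step.
rng = np.random.default_rng(0)
snaps = [rng.normal(size=10)]
for _ in range(5):
    snaps.append(snaps[-1] + rng.normal(scale=0.1, size=10))
print(trajectory_alignment(snaps))
```

Under the abstract's reading, a training run that benefits from techniques such as batch normalization or skip-connections would be expected to show consistently larger alignment values than one without them, though the precise quantitative criterion is defined in the paper itself.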