Title
CoolMomentum: A Method for Stochastic Optimization by Langevin Dynamics with Simulated Annealing
Authors
Abstract
Deep learning applications require global optimization of non-convex objective functions that have multiple local minima. The same problem often arises in physical simulations, where it can be addressed with Langevin dynamics combined with Simulated Annealing, a well-established approach for minimizing many-particle potentials. This analogy provides useful insights into non-convex stochastic optimization in machine learning. Here we find that integrating the discretized Langevin equation yields a coordinate update rule equivalent to the well-known Momentum optimization algorithm. As our main result, we show that gradually decreasing the momentum coefficient from an initial value close to unity down to zero is equivalent, in physical terms, to applying Simulated Annealing, that is, slow cooling. Building on this approach, we propose CoolMomentum, a new stochastic optimization method. Applying CoolMomentum to the optimization of ResNet-20 on the CIFAR-10 dataset and of EfficientNet-B0 on ImageNet, we demonstrate that it achieves high accuracy.
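The equivalence between discretized Langevin dynamics and Momentum can be illustrated with a short sketch. Assume a semi-implicit Euler discretization of m dv = -grad U dt - gamma m v dt + (random force) with time step dt, and write the displacement as dx = dt * v. The velocity update then becomes dx_{t+1} = (1 - gamma dt) dx_t - (dt^2 / m) grad U(x_t) + noise, which is the Momentum rule with momentum coefficient rho = 1 - gamma dt and learning rate lr = dt^2 / m. Lowering rho therefore means raising the friction gamma. If the random force is taken to be the minibatch gradient error with a fixed variance sigma^2 (an assumption), matching it to the fluctuation-dissipation noise of the Langevin equation gives the rough estimate k_B T = lr * sigma^2 / (2 (1 - rho)). Under that estimate, decreasing rho at fixed lr lowers the effective temperature. The toy potential, the linear schedule for rho, and all function names below are hypothetical illustrations, not the paper's implementation.

```python
import math
import random


def grad_u(x: float) -> float:
    """Gradient of a hypothetical toy potential U(x) = x**2/10 + sin(3x) with several local minima."""
    return x / 5.0 + 3.0 * math.cos(3.0 * x)


def langevin_step(x, v, dt, mass, gamma, eps):
    """One semi-implicit Euler step of the Langevin equation.

    The random force is modelled as the error eps of a stochastic gradient
    grad_u(x) + eps (an assumption made for this illustration).
    """
    v = (1.0 - gamma * dt) * v - (dt / mass) * (grad_u(x) + eps)
    return x + dt * v, v


def momentum_step(x, dx, rho, lr, eps):
    """Classical Momentum (heavy-ball) update on the noisy gradient grad_u(x) + eps."""
    dx = rho * dx - lr * (grad_u(x) + eps)
    return x + dx, dx


def compare_trajectories(steps=50, dt=0.1, mass=2.0, gamma=0.5, seed=1):
    """Run both integrators on the same noise; they agree up to floating-point rounding."""
    rng = random.Random(seed)
    rho, lr = 1.0 - gamma * dt, dt * dt / mass  # rho = 1 - gamma*dt, lr = dt**2 / m
    x_l, v = 1.0, 0.0   # Langevin state: position and velocity
    x_m, dx = 1.0, 0.0  # Momentum state: position and displacement dx = dt * v
    for _ in range(steps):
        eps = rng.gauss(0.0, 1.0)
        x_l, v = langevin_step(x_l, v, dt, mass, gamma, eps)
        x_m, dx = momentum_step(x_m, dx, rho, lr, eps)
    return x_l, x_m


def linear_cooling(step, total_steps, rho0=0.99):
    """Hypothetical schedule: rho falls linearly from rho0 (close to 1) to exactly 0."""
    return rho0 * (1.0 - step / (total_steps - 1))


def cool_momentum(x0, lr, total_steps, rho0=0.99, sigma=1.0, seed=0):
    """Momentum updates with a decreasing coefficient, i.e. gradual cooling."""
    rng = random.Random(seed)
    x, dx = x0, 0.0
    for t in range(total_steps):
        rho = linear_cooling(t, total_steps, rho0)
        x, dx = momentum_step(x, dx, rho, lr, rng.gauss(0.0, sigma))
    return x


if __name__ == "__main__":
    print(compare_trajectories())  # two nearly identical final positions
    print(cool_momentum(x0=4.0, lr=0.01, total_steps=2000))
```

The linear schedule was chosen only for simplicity; any schedule that moves rho gradually from near 1 to 0 fits the description in the abstract. At the final step rho is 0, so the last updates reduce to plain stochastic gradient descent.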