Paper Title
On the Convergence of Nesterov's Accelerated Gradient Method in Stochastic Settings
Paper Authors
Paper Abstract
We study Nesterov's accelerated gradient method with constant step-size and momentum parameters in the stochastic approximation setting (unbiased gradients with bounded variance) and the finite-sum setting (where randomness is due to sampling mini-batches). To build better insight into the behavior of Nesterov's method in stochastic settings, we focus throughout on objectives that are smooth, strongly convex, and twice continuously differentiable. In the stochastic approximation setting, Nesterov's method converges to a neighborhood of the optimal point at the same accelerated rate as in the deterministic setting. Perhaps surprisingly, in the finite-sum setting, we prove that Nesterov's method may diverge with the usual choice of step-size and momentum, unless additional conditions on the problem related to conditioning and data coherence are satisfied. Our results shed light on why Nesterov's method may fail to converge or achieve acceleration in the finite-sum setting.
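To make the setting concrete, below is a minimal sketch of Nesterov's method with constant step-size and momentum, run with mini-batch stochastic gradients on a least-squares finite-sum objective. The problem instance, batch size, and random seed are illustrative assumptions, not taken from the paper; the step-size 1/L and the momentum (sqrt(L)-sqrt(mu))/(sqrt(L)+sqrt(mu)) are the usual deterministic choices the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite-sum least-squares objective: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2
# (hypothetical instance for illustration)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star

# Smoothness (L) and strong-convexity (mu) constants of the full objective
H = (A.T @ A) / n                      # Hessian of f
eigs = np.linalg.eigvalsh(H)           # ascending eigenvalues
L, mu = eigs[-1], eigs[0]

# Usual deterministic parameter choices for Nesterov's method
step = 1.0 / L                                                   # constant step-size
beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))   # constant momentum

def minibatch_grad(x, batch_size=10):
    """Unbiased stochastic gradient of f from a sampled mini-batch."""
    idx = rng.choice(n, size=batch_size, replace=False)
    Ab, bb = A[idx], b[idx]
    return Ab.T @ (Ab @ x - bb) / batch_size

x = np.zeros(d)
x_prev = x.copy()
for k in range(2000):
    y = x + beta * (x - x_prev)        # extrapolation (momentum) step
    g = minibatch_grad(y)              # stochastic gradient at the extrapolated point
    x_prev, x = x, y - step * g        # gradient step from y

print("distance to optimum:", np.linalg.norm(x - x_star))
```

Consistent with the abstract, with stochastic gradients the iterates should only be expected to reach a noise-dependent neighborhood of the optimum, and on poorly conditioned finite-sum instances these usual parameter choices may even diverge.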