Paper Title
Is Local SGD Better than Minibatch SGD?
Paper Authors
Paper Abstract
We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method. Its theoretical foundations are currently lacking, and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives, we prove that local SGD strictly dominates minibatch SGD and that accelerated local SGD is minimax optimal for quadratics; (2) for general convex objectives, we provide the first guarantee that at least sometimes improves over minibatch SGD; (3) we show that local SGD does not, in fact, dominate minibatch SGD, by presenting a lower bound on the performance of local SGD that is worse than the minibatch SGD guarantee.
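To make the comparison concrete, the following is a minimal sketch (not from the paper) of the two methods on a toy quadratic objective. Per communication round, minibatch SGD takes one step using M·K averaged stochastic gradients at a shared iterate, while local SGD lets each of M machines take K independent SGD steps before averaging the iterates. All function names, the toy objective, and the hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy quadratic objective: f(x) = E[0.5 * ||x - z||^2] with z ~ N(x_star, sigma^2 I),
# so a stochastic gradient at x is (x - z) for a fresh sample z.
x_star = np.array([1.0, -2.0])
sigma = 1.0

def stochastic_grad(x):
    z = x_star + sigma * rng.standard_normal(x.shape)
    return x - z

def minibatch_sgd(x0, lr, rounds, machines, local_steps):
    # One step per round, averaging machines * local_steps gradients at the shared iterate.
    x = x0.copy()
    for _ in range(rounds):
        g = np.mean([stochastic_grad(x) for _ in range(machines * local_steps)], axis=0)
        x = x - lr * g
    return x

def local_sgd(x0, lr, rounds, machines, local_steps):
    # Each machine runs local_steps independent SGD steps, then the iterates are averaged.
    x = x0.copy()
    for _ in range(rounds):
        local_iterates = []
        for _ in range(machines):
            y = x.copy()
            for _ in range(local_steps):
                y = y - lr * stochastic_grad(y)
            local_iterates.append(y)
        x = np.mean(local_iterates, axis=0)
    return x

x0 = np.zeros(2)
for name, algo in [("minibatch SGD", minibatch_sgd), ("local SGD", local_sgd)]:
    x = algo(x0, lr=0.1, rounds=20, machines=8, local_steps=10)
    print(f"{name}: distance to optimum = {np.linalg.norm(x - x_star):.4f}")
```

Both variants consume the same budget of M·K stochastic gradients per round, which is the regime in which the abstract's comparison is made; the difference is only whether those gradients are evaluated at one shared point or along M diverging local trajectories.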