Paper Title

Distributed Stochastic Non-Convex Optimization: Momentum-Based Variance Reduction

Paper Authors

Prashant Khanduri, Pranay Sharma, Swatantra Kafle, Saikiran Bulusu, Ketan Rajawat, Pramod K. Varshney

Paper Abstract

In this work, we propose a distributed algorithm for stochastic non-convex optimization. We consider a worker-server architecture in which a set of $K$ worker nodes (WNs), in collaboration with a server node (SN), jointly aim to minimize a global, potentially non-convex objective function. The objective function is assumed to be the sum of local objective functions available at each WN, with each node having access only to stochastic samples of its local objective function. In contrast to existing approaches, we employ a momentum-based "single loop" distributed algorithm that eliminates the need to compute large-batch gradients to achieve variance reduction. We propose two algorithms, one with an "adaptive" and the other with a "non-adaptive" learning rate. We show that the proposed algorithms achieve the optimal computational complexity while attaining linear speedup with the number of WNs. Specifically, the algorithms reach an $\epsilon$-stationary point $x_a$ with $\mathbb{E}\|\nabla f(x_a)\| \leq \tilde{O}(K^{-1/3}T^{-1/2} + K^{-1/3}T^{-1/3})$ in $T$ iterations, thereby requiring $\tilde{O}(K^{-1}\epsilon^{-3})$ gradient computations at each WN. Moreover, our approach does not assume identical data distributions across the WNs, making it general enough for federated learning applications.
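The key mechanism behind the "single loop" design is a momentum-based variance-reduced gradient estimator of the form $d_t = \nabla f(x_t;\xi_t) + (1-a)\,(d_{t-1} - \nabla f(x_{t-1};\xi_t))$, which corrects the previous estimator with a single gradient difference instead of a large batch. Below is a minimal Python/NumPy sketch of such a worker-server loop on a toy quadratic objective; the function names, the plain-averaging server step, and the fixed learning rate are illustrative assumptions, not the paper's exact adaptive or non-adaptive algorithms.

import numpy as np

def local_stoch_grad(x, xi):
    # Placeholder local stochastic gradient for a toy objective
    # f(x) = ||x||^2 / 2; xi models sampling noise. The same sample xi
    # is reused at two points, as the estimator requires.
    return x + xi

def momentum_vr_sketch(K=8, dim=10, T=100, eta=0.05, a=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    # Each worker initializes its estimator with one stochastic gradient.
    d = [local_stoch_grad(x, rng.standard_normal(dim)) for _ in range(K)]
    for _ in range(T):
        x_prev = x.copy()
        # Server averages the workers' estimators and takes a step.
        x = x - eta * np.mean(d, axis=0)
        for k in range(K):
            xi = rng.standard_normal(dim)         # one fresh sample per worker
            g_new = local_stoch_grad(x, xi)       # gradient at the new iterate
            g_old = local_stoch_grad(x_prev, xi)  # same sample, old iterate
            # Momentum-based variance reduction: correct the previous
            # estimator with one gradient difference -- no large batch.
            d[k] = g_new + (1.0 - a) * (d[k] - g_old)
    return x

Note that each worker evaluates only two stochastic gradients per iteration, which is how the single-loop design avoids the periodic large-batch checkpoints used by double-loop variance-reduction methods.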
