关于分散非凸优化的分歧

论文标题

关于分散非凸优化的分歧

On the Divergence of Decentralized Non-Convex Optimization

论文作者

Hong, Mingyi, Zeng, Siliang, Zhang, Junyu, Sun, Haoran

论文摘要

我们研究了一类分散算法的一类分散算法，其中$ n $代理共同优化了非凸目标$ f（u）：= 1/n \ sum_ {i = 1}^{n} f_i（u）$，而仅与邻居进行交流。这类问题已在建模许多信号处理和机器学习应用中变得很流行，并且已经提出了许多有效的算法。但是，通过构建一些反例，我们表明，当某些本地功能梯度$ \ nabla f_i $上的某些本地Lipschitz条件（LLC）不满足时，即使全球lipschitz条件（GLC）满足了Sum function $ f $ f $ f $ f $ f $ f $ f $ f $ flipChitz梯度。该观察结果提出了一个重要的开放问题：当不满足LLC甚至GLC时，如何设计分散的算法？为了解决上述问题，我们设计了一种称为多阶段梯度跟踪算法（Magenta）的一阶算法，该算法能够使用LLC和GLC计算固定解决方案。特别是，我们表明所提出的算法会收敛到某些$ε$ - 定位解决方案，其中精确的速率取决于各种算法和问题参数。特别是，如果本地函数$ f_i $是$ q $ th订单多项式，则速率变为$ \ MATHCAL {o}（1/ε^{q-1}）$。对于$ q = 2 $的特殊情况，每个$ f_i $都满足llc的特殊情况很紧张。据我们所知，这是研究分散的非凸优化问题的第一次尝试，而不是LLC和GLC。

We study a generic class of decentralized algorithms in which $N$ agents jointly optimize the non-convex objective $f(u):=1/N\sum_{i=1}^{N}f_i(u)$, while only communicating with their neighbors. This class of problems has become popular in modeling many signal processing and machine learning applications, and many efficient algorithms have been proposed. However, by constructing some counter-examples, we show that when certain local Lipschitz conditions (LLC) on the local function gradient $\nabla f_i$'s are not satisfied, most of the existing decentralized algorithms diverge, even if the global Lipschitz condition (GLC) is satisfied, where the sum function $f$ has Lipschitz gradient. This observation raises an important open question: How to design decentralized algorithms when the LLC, or even the GLC, is not satisfied? To address the above question, we design a first-order algorithm called Multi-stage gradient tracking algorithm (MAGENTA), which is capable of computing stationary solutions with neither the LLC nor the GLC. In particular, we show that the proposed algorithm converges sublinearly to certain $ε$-stationary solution, where the precise rate depends on various algorithmic and problem parameters. In particular, if the local function $f_i$'s are $Q$th order polynomials, then the rate becomes $\mathcal{O}(1/ε^{Q-1})$. Such a rate is tight for the special case of $Q=2$ where each $f_i$ satisfies LLC. To our knowledge, this is the first attempt that studies decentralized non-convex optimization problems with neither the LLC nor the GLC.

下载PDF全文

下载文献需遵守相关版权规定

论文标题