Paper Title

Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data

Paper Authors

Ahmet Alacaoglu, Hanbaek Lyu

Paper Abstract

We focus on analyzing the classical stochastic projected gradient methods under a general dependent data sampling scheme for constrained smooth nonconvex optimization. We show a worst-case rate of convergence $\tilde{O}(t^{-1/4})$ and complexity $\tilde{O}(\varepsilon^{-4})$ for achieving an $\varepsilon$-near stationary point, measured by the norm of the gradient of the Moreau envelope and the gradient mapping. While classical convergence guarantees require i.i.d. data sampling from the target distribution, we only require a mild mixing condition on the conditional distribution, which holds for a wide class of Markov chain sampling algorithms. This improves the existing complexity for constrained smooth nonconvex optimization with dependent data from $\tilde{O}(\varepsilon^{-8})$ to $\tilde{O}(\varepsilon^{-4})$ with a significantly simpler analysis. We illustrate the generality of our approach by deriving convergence results with dependent data for stochastic proximal gradient methods, the adaptive stochastic gradient algorithm AdaGrad, and the stochastic gradient method with heavy-ball momentum. As an application, we obtain the first online nonnegative matrix factorization algorithms for dependent data based on stochastic projected gradient methods, with adaptive step sizes and an optimal rate of convergence.
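The method analyzed in the abstract is the projected stochastic gradient update driven by dependent (Markov chain) samples rather than i.i.d. draws. The following minimal sketch illustrates one such loop; the quadratic per-sample loss, the two-state Markov chain sampler, the nonnegativity constraint, and the $1/\sqrt{t}$ step-size schedule are all illustrative assumptions, not the paper's exact setting.

```python
import numpy as np

def project_nonneg(x):
    # Euclidean projection onto the nonnegative orthant (a simple closed convex set).
    return np.maximum(x, 0.0)

def markov_sample(state, rng, stay_prob=0.9):
    # Two-state Markov chain: stay with probability `stay_prob`, switch otherwise.
    # Consecutive samples are dependent, but the chain mixes to its stationary law,
    # matching the mild mixing condition described in the abstract.
    return state if rng.random() < stay_prob else 1 - state

def stochastic_grad(x, state, data):
    # Gradient of the illustrative per-sample loss f(x; xi) = 0.5 * ||x - data[xi]||^2,
    # a stand-in for the paper's smooth (possibly nonconvex) objective.
    return x - data[state]

rng = np.random.default_rng(0)
data = [np.array([1.0, -2.0]), np.array([3.0, 0.5])]  # per-state data points (hypothetical)
x, state = np.zeros(2), 0
for t in range(1, 1001):
    state = markov_sample(state, rng)   # dependent (non-i.i.d.) sample from the chain
    eta = 0.5 / np.sqrt(t)              # 1/sqrt(t) step size, the regime behind the
                                        # O(t^{-1/4}) stationarity rate in the abstract
    x = project_nonneg(x - eta * stochastic_grad(x, state, data))

print(x)  # approaches the projection of the chain's stationary mean onto the orthant
```

For constrained problems, near-stationarity is commonly measured via the gradient mapping $G_\eta(x) = \eta^{-1}\big(x - \mathrm{proj}_X(x - \eta \nabla f(x))\big)$ or the gradient of the Moreau envelope, both referenced in the abstract; $G_\eta(x)$ coincides with $\nabla f(x)$ when the constraint is inactive, and the $\tilde{O}(t^{-1/4})$ rate refers to the decay of this quantity's norm.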
