Paper Title


Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization

Paper Authors

Chaobing Song, Yong Jiang, Yi Ma

Paper Abstract


In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named \emph{Variance Reduction via Accelerated Dual Averaging (VRADA)}. In both the general convex and strongly convex settings, VRADA can attain an $O\big(\frac{1}{n}\big)$-accurate solution in $O(n\log\log n)$ stochastic gradient evaluations, which improves the best-known result of $O(n\log n)$, where $n$ is the number of samples. Meanwhile, VRADA matches the lower bound of the general convex setting up to a $\log\log n$ factor, and matches the lower bounds in both regimes $n \le \Theta(\kappa)$ and $n \gg \kappa$ of the strongly convex setting, where $\kappa$ denotes the condition number. Besides improving the best-known results and matching all of the above lower bounds simultaneously, VRADA admits a more unified and simplified algorithmic implementation and convergence analysis for both the general convex and strongly convex settings. The underlying techniques, such as the novel initialization strategy in VRADA, may be of independent interest. Through experiments on real datasets, we show the good performance of VRADA over existing methods for large-scale machine learning problems.
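The abstract does not spell out VRADA's update rule, so as background only, the following is a minimal NumPy sketch of the standard variance-reduced gradient estimator that finite-sum methods of this family (e.g., SVRG) build on, applied to a least-squares finite sum. This is not the VRADA algorithm; the problem instance, step size, and epoch count are illustrative assumptions.

```python
import numpy as np

# Background sketch (not VRADA): the classic variance-reduced gradient
# estimator for a finite-sum objective F(x) = (1/n) * sum_i f_i(x),
# here with least-squares components f_i(x) = 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(0)
n, d = 1000, 20                       # illustrative problem size
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(x, i):
    """Gradient of the i-th component f_i at x."""
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    """Full gradient of F at x; costs n component-gradient evaluations."""
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
step = 0.1 / np.max(np.sum(A**2, axis=1))   # ~1/L_max, illustrative choice

for epoch in range(5):                # a few effective passes over the data
    snapshot = x.copy()               # reference point for variance reduction
    mu = full_grad(snapshot)          # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        # Unbiased estimator of grad F(x) whose variance shrinks as both
        # x and the snapshot approach the minimizer.
        g = grad_i(x, i) - grad_i(snapshot, i) + mu
        x -= step * g

print("objective:", 0.5 * np.mean((A @ x - b) ** 2))
```

An accelerated dual-averaging scheme such as VRADA combines an estimator of this kind with momentum and weighted averaging of past stochastic gradients; see the paper for the exact update and the initialization strategy mentioned above.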
