Paper Title
Multi-index Antithetic Stochastic Gradient Algorithm
Paper Authors
Paper Abstract
Stochastic Gradient Algorithms (SGAs) are ubiquitous in computational statistics, machine learning and optimisation. Recent years have brought an influx of interest in SGAs, and the non-asymptotic analysis of their bias is by now well developed. However, relatively little is known about the optimal choice of the random approximation of the gradient (e.g., mini-batching) in SGAs, as this relies on an analysis of the variance and is problem-specific. While there have been numerous attempts to reduce the variance of SGAs, these typically exploit a particular structure of the sampled distribution by requiring a priori knowledge of its density's mode. It is thus unclear how to adapt such algorithms to non-log-concave settings. In this paper, we construct a Multi-index Antithetic Stochastic Gradient Algorithm (MASGA) whose implementation is independent of the structure of the target measure and which achieves performance on par with Monte Carlo estimators that have access to unbiased samples from the distribution of interest. In other words, MASGA is an optimal estimator, from the mean-square-error versus computational-cost perspective, within the class of Monte Carlo estimators. We prove this fact rigorously for log-concave settings and verify it numerically for some examples where the log-concavity assumption is not satisfied.
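To illustrate the antithetic-sampling idea that the abstract refers to (and not the paper's MASGA construction itself), here is a minimal sketch of antithetic variance reduction for a plain Monte Carlo estimator. Pairing each Gaussian draw `z` with its mirror `-z` keeps the estimator unbiased while introducing negative correlation within each pair, which lowers the variance for monotone integrands. The function names and the test integrand `exp` are illustrative choices, not taken from the paper.

```python
import random
import statistics

def plain_mc(f, n, seed=0):
    # Standard Monte Carlo estimate of E[f(Z)], with Z ~ N(0, 1).
    rng = random.Random(seed)
    return statistics.fmean(f(rng.gauss(0, 1)) for _ in range(n))

def antithetic_mc(f, n, seed=0):
    # Antithetic estimate: each draw z is paired with its mirror -z.
    # Both z and -z have the same N(0, 1) law, so the estimator stays
    # unbiased; for monotone f the pair members are negatively
    # correlated, which reduces the variance at the same sample budget.
    rng = random.Random(seed)
    pairs = n // 2
    total = 0.0
    for _ in range(pairs):
        z = rng.gauss(0, 1)
        total += 0.5 * (f(z) + f(-z))
    return total / pairs
```

Repeating both estimators over independent seeds and comparing the empirical variances of the resulting estimates shows the reduction directly; MASGA applies a multi-index refinement of this kind of pairing to the stochastic gradient itself.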