Paper Title

Optimal Scaling for the Proximal Langevin Algorithm in High Dimensions

Paper Author

Pillai, Natesh S.

Paper Abstract

The Metropolis-adjusted Langevin algorithm (MALA) is a sampling algorithm that incorporates the gradient of the logarithm of the target density in its proposal distribution. In earlier joint work \citet{pill:stu:12}, the author extended the seminal work of \cite{Robe:Rose:98} and showed that, in stationarity, MALA applied to an $N$-dimensional approximation of the target takes ${\cal O}(N^{\frac13})$ steps to explore its target measure. It was also shown that the MALA algorithm is optimized at an average acceptance probability of $0.574$. In \citet{pere:16}, the author introduced the proximal MALA algorithm, in which the gradient of the log target density is replaced by a proximal function. In this paper, we show that for a wide class of twice-differentiable target densities, proximal MALA enjoys the same optimal scaling as MALA in high dimensions and also has an optimal average acceptance probability of $0.574$. The results of this paper thus yield a practically useful guideline: for smooth target densities where computing the gradient while implementing MALA is expensive, users may replace the gradient with the corresponding proximal function (which can often be computed relatively cheaply via convex optimization) \emph{without} losing any of the efficiency gains from optimal scaling. This confirms some of the empirical observations made in \cite{pere:16}.
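To make the gradient-to-proximal substitution concrete, the sketch below implements one possible proximal MALA step in Python. It is an illustration under stated assumptions, not the paper's construction: the target is taken to be a standard Gaussian so that the proximal map has a closed form, the step-size coupling $\lambda = \delta/2$ is one common convention, and the names `U`, `prox_U`, and `proximal_mala` are hypothetical.

```python
import numpy as np

# Minimal sketch (not the paper's code), assuming a standard Gaussian target
# pi(x) ∝ exp(-|x|^2/2), so the potential is U(x) = |x|^2/2 and its proximal
# map prox_{lam U}(x) = argmin_z { U(z) + |z - x|^2 / (2 lam) } = x / (1 + lam)
# has a closed form. For harder targets, prox_U would be computed by convex
# optimization, which is the setting the abstract describes.

def U(x):
    """Potential U = -log pi, up to an additive constant."""
    return 0.5 * np.dot(x, x)

def prox_U(x, lam):
    """Proximal map of U; closed form for the quadratic potential above."""
    return x / (1.0 + lam)

def proximal_mala(x0, n_steps, delta, rng=None):
    """Proximal MALA: MALA's drift grad log pi(x) is replaced by the
    Moreau-Yosida approximation (prox_{lam U}(x) - x) / lam, here with
    lam = delta / 2. Returns the final state and the acceptance rate."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    lam = delta / 2.0
    accepted = 0
    for _ in range(n_steps):
        # Proposal: y = x + (delta/2) * drift(x) + sqrt(delta) * Z,
        # exactly MALA's form with the proximal drift swapped in.
        drift_x = (prox_U(x, lam) - x) / lam          # ≈ grad log pi(x)
        mean_x = x + 0.5 * delta * drift_x
        y = mean_x + np.sqrt(delta) * rng.standard_normal(x.shape)
        drift_y = (prox_U(y, lam) - y) / lam
        mean_y = y + 0.5 * delta * drift_y
        # Metropolis-Hastings correction keeps pi exactly invariant.
        log_q_fwd = -np.sum((y - mean_x) ** 2) / (2.0 * delta)
        log_q_rev = -np.sum((x - mean_y) ** 2) / (2.0 * delta)
        log_alpha = (U(x) - U(y)) + (log_q_rev - log_q_fwd)
        if np.log(rng.uniform()) < log_alpha:
            x, accepted = y, accepted + 1
    return x, accepted / n_steps

# Demo: per the paper's result, tune delta so the acceptance rate is near the
# optimal 0.574; at the optimal step size the chain needs O(N^{1/3}) steps
# to explore an N-dimensional target.
x_final, acc_rate = proximal_mala(np.zeros(100), n_steps=50_000, delta=0.6)
print(f"acceptance rate ≈ {acc_rate:.3f}")
```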
