Paper Title

Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions

Paper Authors

Tianyu Wang, Yasong Feng

Paper Abstract

We prove convergence rates of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for Łojasiewicz functions. The SZGD algorithm iterates as \begin{align*} \mathbf{x}_{t+1} = \mathbf{x}_t - \eta_t \widehat{\nabla} f (\mathbf{x}_t), \qquad t = 0,1,2,3,\cdots , \end{align*} where $f$ is the objective function that satisfies the Łojasiewicz inequality with Łojasiewicz exponent $\theta$, $\eta_t$ is the step size (learning rate), and $ \widehat{\nabla} f (\mathbf{x}_t) $ is the approximate gradient estimated using zeroth-order information only. Our results show that $ \{ f (\mathbf{x}_t) - f (\mathbf{x}_\infty) \}_{t \in \mathbb{N} } $ can converge faster than $ \{ \| \mathbf{x}_t - \mathbf{x}_\infty \| \}_{t \in \mathbb{N} }$, regardless of whether the objective $f$ is smooth or nonsmooth.
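The abstract specifies only the SZGD update rule and that $\widehat{\nabla} f$ is built from zeroth-order (function-value) information. The sketch below is illustrative, not the paper's exact method: it assumes a standard two-point Gaussian-smoothing gradient estimator with smoothing radius `mu`, and the names `szgd`, `step_sizes`, and the example step-size schedule are hypothetical choices.

```python
import numpy as np

def szgd(f, x0, step_sizes, num_iters, mu=1e-4, rng=None):
    """Minimal sketch of Stochastic Zeroth-order Gradient Descent (SZGD).

    Assumption: the approximate gradient is the two-point Gaussian-smoothing
    estimator; the paper's abstract only says the gradient is estimated from
    zeroth-order information, so this estimator is an illustrative choice.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for t in range(num_iters):
        # Zeroth-order gradient estimate from two function evaluations
        # along a random Gaussian direction u.
        u = rng.standard_normal(x.shape)
        grad_hat = (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
        # SZGD iteration: x_{t+1} = x_t - eta_t * grad_hat
        x = x - step_sizes(t) * grad_hat
    return x

if __name__ == "__main__":
    # Example on the smooth Łojasiewicz function f(x) = ||x||^2,
    # with a decaying step-size schedule eta_t = 0.1 / sqrt(t + 1).
    f = lambda x: float(np.dot(x, x))
    x_final = szgd(f, x0=np.ones(5),
                   step_sizes=lambda t: 0.1 / (t + 1) ** 0.5,
                   num_iters=2000)
    print(f(x_final))
```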
