Paper Title

Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions

Paper Authors

Tianyu Wang, Yasong Feng

Paper Abstract

We prove convergence rates of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for Łojasiewicz functions. The SZGD algorithm iterates as \begin{align*} \mathbf{x}_{t+1} = \mathbf{x}_t - \eta_t \widehat{\nabla} f (\mathbf{x}_t), \qquad t = 0,1,2,3,\cdots , \end{align*} where $f$ is the objective function that satisfies the Łojasiewicz inequality with Łojasiewicz exponent $\theta$, $\eta_t$ is the step size (learning rate), and $ \widehat{\nabla} f (\mathbf{x}_t) $ is the approximate gradient estimated using zeroth-order information only. Our results show that $ \{ f (\mathbf{x}_t) - f (\mathbf{x}_\infty) \}_{t \in \mathbb{N} } $ can converge faster than $ \{ \| \mathbf{x}_t - \mathbf{x}_\infty \| \}_{t \in \mathbb{N} }$, regardless of whether the objective $f$ is smooth or nonsmooth.
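The abstract specifies only the SZGD update rule and that $\widehat{\nabla} f$ is built from zeroth-order (function-value) information. The sketch below is illustrative, not the paper's exact method: it assumes a standard two-point Gaussian-smoothing gradient estimator with smoothing radius `mu`, and the names `szgd`, `step_sizes`, and the example step-size schedule are hypothetical choices.

```python
import numpy as np

def szgd(f, x0, step_sizes, num_iters, mu=1e-4, rng=None):
    """Minimal sketch of Stochastic Zeroth-order Gradient Descent (SZGD).

    Assumption: the approximate gradient is the two-point Gaussian-smoothing
    estimator; the paper's abstract only says the gradient is estimated from
    zeroth-order information, so this estimator is an illustrative choice.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for t in range(num_iters):
        # Zeroth-order gradient estimate from two function evaluations
        # along a random Gaussian direction u.
        u = rng.standard_normal(x.shape)
        grad_hat = (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
        # SZGD iteration: x_{t+1} = x_t - eta_t * grad_hat
        x = x - step_sizes(t) * grad_hat
    return x

if __name__ == "__main__":
    # Example on the smooth Łojasiewicz function f(x) = ||x||^2,
    # with a decaying step-size schedule eta_t = 0.1 / sqrt(t + 1).
    f = lambda x: float(np.dot(x, x))
    x_final = szgd(f, x0=np.ones(5),
                   step_sizes=lambda t: 0.1 / (t + 1) ** 0.5,
                   num_iters=2000)
    print(f(x_final))
```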
