Paper Title

First order online optimisation using forward gradients in over-parameterised systems

Paper Authors

Behnam Mafakheri, Iman Shames, Jonathan H. Manton

Paper Abstract

The success of deep learning over the past decade relies mainly on gradient-based optimisation and backpropagation. This paper analyses the performance of first-order gradient-based optimisation algorithms, gradient descent and proximal gradient, with a time-varying non-convex cost function under the (proximal) Polyak-Łojasiewicz condition. Specifically, we focus on using the forward mode of automatic differentiation to compute gradients in fast-changing problems where calculating gradients with the backpropagation algorithm is either impossible or inefficient. Upper bounds on the tracking and asymptotic errors are derived for various cases, showing linear convergence to an optimal solution or to a neighbourhood of one, where the convergence rate decreases as the dimension of the problem grows. We show that for a solver with constrained computing resources, the number of forward-gradient iterations at each step can be a design parameter that trades off tracking performance against computing constraints.
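For readers unfamiliar with the forward-gradient estimator the abstract refers to, the sketch below shows the general idea in JAX: draw a random direction v from a standard normal distribution, compute the directional derivative of the cost along v with a single forward-mode pass (a Jacobian-vector product), and use (∇f(θ)·v) v as an unbiased estimate of ∇f(θ). This is a minimal illustration, not the paper's implementation; the quadratic cost, step size, and iteration count are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def forward_gradient(f, theta, key):
    # Sample a random tangent direction v ~ N(0, I).
    v = jax.random.normal(key, theta.shape)
    # One forward-mode pass yields the directional derivative
    # <grad f(theta), v> as a Jacobian-vector product.
    _, directional = jax.jvp(f, (theta,), (v,))
    # (grad f(theta) . v) * v is an unbiased estimate of grad f(theta).
    return directional * v

# Toy smooth cost; any cost satisfying the Polyak-Lojasiewicz
# condition fits the setting described in the abstract.
def cost(theta):
    return 0.5 * jnp.sum(theta ** 2)

theta = jnp.ones(10)
key = jax.random.PRNGKey(0)
step_size = 0.05  # illustrative value, not taken from the paper
for _ in range(200):
    key, subkey = jax.random.split(key)
    theta = theta - step_size * forward_gradient(cost, theta, subkey)
```

Each such estimate costs roughly one evaluation of the cost function, which is why the number of forward-gradient iterations per time step can serve as the computational budget knob mentioned in the abstract.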
