Paper Title

A Stochastic Bundle Method for Interpolating Networks

Paper Authors

Alasdair Paren, Leonard Berrada, Rudra P. K. Poudel, M. Pawan Kumar

Paper Abstract


We propose a novel method for training deep neural networks that are capable of interpolation, that is, driving the empirical loss to zero. At each iteration, our method constructs a stochastic approximation of the learning objective. The approximation, known as a bundle, is a pointwise maximum of linear functions. Our bundle contains a constant function that lower bounds the empirical loss. This enables us to compute an automatic adaptive learning rate, thereby providing an accurate solution. In addition, our bundle includes linear approximations computed at the current iterate and other linear estimates of the DNN parameters. The use of these additional approximations makes our method significantly more robust to its hyperparameters. Based on its desirable empirical properties, we term our method Bundle Optimisation for Robust and Accurate Training (BORAT). In order to operationalise BORAT, we design a novel algorithm for optimising the bundle approximation efficiently at each iteration. We establish the theoretical convergence of BORAT in both convex and non-convex settings. Using standard publicly available data sets, we provide a thorough comparison of BORAT to other single hyperparameter optimisation algorithms. Our experiments demonstrate BORAT matches the state-of-the-art generalisation performance for these methods and is the most robust.
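As a rough illustration (not the authors' released code), the sketch below shows the simplest instance of such a bundle update in PyTorch: a bundle of size two, consisting of the linearisation of the loss at the current iterate and the constant zero function that lower bounds the loss under interpolation. For this special case the proximal subproblem admits a closed-form, Polyak-style adaptive step size. The function name `borat_like_step`, the `max_lr` cap, and the `eps` guard are assumptions made for this sketch; the paper's general bundles of several linear pieces require the dedicated subproblem solver described in the abstract.

```python
import torch

def borat_like_step(params, loss, max_lr=0.1, eps=1e-8):
    """One update with a two-piece bundle (illustrative sketch only).

    The bundle is max(linearisation at the current iterate, 0), where 0
    is a valid lower bound on the loss when the network can interpolate.
    The resulting proximal step reduces to a clipped Polyak step.
    """
    grads = torch.autograd.grad(loss, params)
    grad_sq_norm = sum((g * g).sum() for g in grads)
    # Adaptive learning rate: gap between the current loss and the zero
    # lower bound, divided by the squared gradient norm, clipped at the
    # maximal step size max_lr (a hyperparameter of this sketch).
    step = torch.clamp(loss.detach() / (grad_sq_norm + eps), max=max_lr)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.sub_(step * g)
    return step.item()

# Example usage on a toy regression problem:
model = torch.nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
borat_like_step(list(model.parameters()), loss)
```

Note how the step size is computed from the loss value itself rather than fixed in advance: near interpolation the loss shrinks, and the effective learning rate decays automatically, which is the source of the "automatic adaptive learning rate" claimed in the abstract.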
