Paper Title

Obtaining Adjustable Regularization for Free via Iterate Averaging

Authors

Jingfeng Wu, Vladimir Braverman, Lin F. Yang

Abstract

Regularization is a crucial technique for avoiding overfitting in machine learning. To obtain the best performance, we usually train a model by tuning the regularization parameters, which becomes costly when a single round of training takes a significant amount of time. Recently, Neu and Rosasco showed that if we run stochastic gradient descent (SGD) on a linear regression problem, then by properly averaging the SGD iterates we obtain a regularized solution. Whether the same phenomenon can be achieved for other optimization problems and algorithms was left open. In this paper, we establish an averaging scheme that provably converts the iterates of SGD on an arbitrary strongly convex and smooth objective function into a solution of its regularized counterpart, with an adjustable regularization parameter. Our approach also applies to accelerated and preconditioned optimization methods. We further show empirically that the same scheme works on more general optimization objectives, including neural networks. In sum, we obtain adjustable regularization for free for a large class of optimization problems, resolving an open question raised by Neu and Rosasco.
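
To make the averaging idea concrete, below is a minimal sketch and not the paper's exact scheme: run one pass of SGD on a least-squares problem, store the iterates, and afterwards form weighted averages of them. The geometric weighting and its decay rate `beta` are hypothetical stand-ins for the adjustable regularization knob described in the abstract; the precise weights and their mapping to a regularization parameter are given in the paper and are not reproduced here.

```python
import numpy as np

# A minimal sketch, assuming a geometric weighting of stored SGD iterates.
# The paper's exact averaging weights and their correspondence to a
# regularization parameter may differ; this only illustrates the workflow.

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

lr, steps = 0.01, 2000
w = np.zeros(d)
iterates = [w.copy()]
for _ in range(steps):
    i = rng.integers(n)                      # sample one data point
    grad = (X[i] @ w - y[i]) * X[i]          # stochastic gradient of 0.5*(x_i^T w - y_i)^2
    w = w - lr * grad
    iterates.append(w.copy())

def averaged_solution(iterates, beta):
    """Weighted average of stored SGD iterates with geometric weights beta**t.

    `beta` is a hypothetical knob standing in for the adjustable
    regularization parameter; beta = 1 recovers the plain uniform average.
    """
    weights = np.array([beta ** t for t in range(len(iterates))])
    weights /= weights.sum()
    return weights @ np.array(iterates)      # shape (d,)

# One SGD run, several "regularization levels" read off afterwards by re-averaging.
for beta in (0.99, 0.999, 1.0):
    w_avg = averaged_solution(iterates, beta)
    print(f"beta={beta}: ||w_avg|| = {np.linalg.norm(w_avg):.3f}")
```

The point of the sketch is the "train once, re-average many times" workflow: the SGD run is performed once, and different regularization levels are obtained afterwards by re-averaging the stored iterates rather than by retraining.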
