Paper Title

Behavior of Limited Memory BFGS when Applied to Nonsmooth Functions and their Nesterov Smoothings

Authors

Azam Asl, Michael L. Overton

Abstract

The motivation to study the behavior of limited-memory BFGS (L-BFGS) on nonsmooth optimization problems is based on two empirical observations: the widespread success of L-BFGS in solving large-scale smooth optimization problems, and the effectiveness of the full BFGS method in solving small to medium-sized nonsmooth optimization problems, based on using a gradient, not a subgradient, oracle paradigm. We first summarize our theoretical results on the behavior of the scaled L-BFGS method with one update applied to a simple convex nonsmooth function that is unbounded below, stating conditions under which the method converges to a non-optimal point regardless of the starting point. We then turn to empirically investigating whether the same phenomenon holds more generally, focusing on a difficult problem of Nesterov, as well as eigenvalue optimization problems arising in semidefinite programming applications. We find that when applied to a nonsmooth function directly, L-BFGS, especially its scaled variant, often breaks down with a poor approximation to an optimal solution, in sharp contrast to full BFGS. Unscaled L-BFGS is less prone to breakdown but conducts far more function evaluations per iteration than scaled L-BFGS does, and thus it is slow. Nonetheless, it is often the case that both variants obtain better results than the provably convergent, but slow, subgradient method. On the other hand, when applied to Nesterov's smooth approximation of a nonsmooth function, scaled L-BFGS is generally much more efficient than unscaled L-BFGS, often obtaining good results even when the problem is quite ill-conditioned. Summarizing, for large-scale nonsmooth optimization problems for which full BFGS and other methods for nonsmooth optimization are not practical, it is often better to apply L-BFGS to a smooth approximation of a nonsmooth problem than to apply it directly to the nonsmooth problem.
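As a rough illustration of the abstract's main recommendation, the sketch below is a toy example, not the paper's experimental setup: it minimizes the nonsmooth convex objective f(x) = ||Ax - b||_1, once directly through a subgradient oracle and once through its Nesterov smoothing with a Euclidean prox term, which for the absolute value is the componentwise Huber function h_mu. The data A and b, the smoothing parameter mu, and the use of SciPy's L-BFGS-B (standing in for a generic L-BFGS implementation; it is not the scaled/unscaled variants compared in the paper) are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative random least-absolute-deviations data (not from the paper).
rng = np.random.default_rng(0)
m, n = 200, 50
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def f_nonsmooth(x):
    # Nonsmooth objective f(x) = ||Ax - b||_1.
    return np.abs(A @ x - b).sum()

def g_nonsmooth(x):
    # A subgradient of f (equal to the gradient wherever f is differentiable).
    return A.T @ np.sign(A @ x - b)

def f_smooth(x, mu):
    # Nesterov smoothing of |t| with Euclidean prox = Huber function h_mu(t):
    # t^2/(2*mu) for |t| <= mu, |t| - mu/2 otherwise.
    r = A @ x - b
    return np.where(np.abs(r) <= mu, r**2 / (2 * mu), np.abs(r) - mu / 2).sum()

def g_smooth(x, mu):
    # Gradient of the smoothed objective: h_mu'(t) = clip(t/mu, -1, 1).
    return A.T @ np.clip((A @ x - b) / mu, -1.0, 1.0)

x0 = np.zeros(n)
mu = 1e-3  # smaller mu: closer to f, but more ill-conditioned

direct = minimize(f_nonsmooth, x0, jac=g_nonsmooth,
                  method="L-BFGS-B", options={"maxiter": 1000})
smoothed = minimize(f_smooth, x0, args=(mu,), jac=g_smooth,
                    method="L-BFGS-B", options={"maxiter": 1000})

# Compare both runs on the original nonsmooth objective.
print("direct on f    :", f_nonsmooth(direct.x))
print("via smoothing  :", f_nonsmooth(smoothed.x))
```

In the spirit of the paper's findings, the direct run may terminate early at a poor point, since L-BFGS line searches and convergence tests assume smoothness, while the run on the Huber-smoothed surrogate typically makes steady progress; mu trades approximation accuracy against conditioning of the smoothed problem.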
