论文标题

在阴影中行走:下降方向的新观点,以最小化。

Walking in the Shadow: A New Perspective on Descent Directions for Constrained Minimization

论文作者

Mortagy, Hassan, Gupta, Swati, Pokutta, Sebastian

论文摘要

下降方向,例如向下降方向运动,包括向弗兰克·沃尔夫(Frank-Wolfe)的顶点,客场,面向客场和成对方向的运动,在条件梯度下降(CGD)变体中是一个重要的设计考虑因素。在这项工作中,我们试图揭开这些方向上运动对达到约束最小化的影响的影响。下降的最佳局部方向是负梯度投影的定向衍生物(即阴影)。我们表明,这个方向是最佳的步骤,在阴影中移动的连续时间动力学相当于投影梯度下降(PGD)的动力学,尽管离散化并非平凡。我们还表明,Frank-Wolfe(FW)顶点对应于使用负梯度方向的“无限”步骤投射到多层,从而在这些步骤上提供了新的视角。我们将这些洞察力结合到一种新型的Shadow-CG方法中,该方法使用FW和阴影步骤,同时享受线性收敛,并取决于其投影曲线中断点的数量,而不是金字塔宽度。我们在简单多型的断点数量上提供了线性结合,并根据刻面数来为一般多层型的当前缩放不变上限。我们体现了在各种应用程序中使用Shadow-CG计算的好处,同时提出了一个关于收紧一般多层断点数量的开放问题。

Descent directions such as movement towards Descent directions, including movement towards Frank-Wolfe vertices, away-steps, in-face away-steps and pairwise directions, have been an important design consideration in conditional gradient descent (CGD) variants. In this work, we attempt to demystify the impact of the movement in these directions towards attaining constrained minimizers. The optimal local direction of descent is the directional derivative (i.e., shadow) of the projection of the negative gradient. We show that this direction is the best away-step possible, and the continuous-time dynamics of moving in the shadow is equivalent to the dynamics of projected gradient descent (PGD), although it's non-trivial to discretize. We also show that Frank-Wolfe (FW) vertices correspond to projecting onto the polytope using an "infinite" step in the direction of the negative gradient, thus providing a new perspective on these steps. We combine these insights into a novel Shadow-CG method that uses FW and shadow steps, while enjoying linear convergence, with a rate that depends on the number of breakpoints in its projection curve, rather than the pyramidal width. We provide a linear bound on the number of breakpoints for simple polytopes and present scaling-invariant upper bounds for general polytopes based on the number of facets. We exemplify the benefit of using Shadow-CG computationally for various applications, while raising an open question about tightening the bound on the number of breakpoints for general polytopes.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源