Paper Title

How neural networks find generalizable solutions: Self-tuned annealing in deep learning

Paper Authors

Yu Feng, Yuhai Tu

Paper Abstract

Despite the tremendous success of the Stochastic Gradient Descent (SGD) algorithm in deep learning, little is known about how SGD finds generalizable solutions in the high-dimensional weight space. By analyzing the learning dynamics and loss function landscape, we discover a robust inverse relation between the weight variance and the landscape flatness (inverse of curvature) for all SGD-based learning algorithms. To explain the inverse variance-flatness relation, we develop a random landscape theory, which shows that the SGD noise strength (effective temperature) depends inversely on the landscape flatness. Our study indicates that SGD attains a self-tuned landscape-dependent annealing strategy to find generalizable solutions at the flat minima of the landscape. Finally, we demonstrate how these new theoretical insights lead to more efficient algorithms, e.g., for avoiding catastrophic forgetting.
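
The central measurement in the abstract pairs the variance of SGD weight fluctuations along principal directions with the landscape flatness (inverse of curvature) along those same directions. Below is a minimal, illustrative sketch of that measurement pipeline on a toy least-squares problem. This is not the authors' code: all names, the toy problem, and the finite-difference curvature estimate are our assumptions, and a convex toy need not reproduce the inverse relation the paper reports for deep networks.

```python
import numpy as np

# Illustrative sketch (not the authors' code): run SGD past convergence,
# then measure weight variance along PCA directions of the fluctuations
# and landscape flatness (inverse curvature) along the same directions.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)

def loss(w, xb, yb):
    r = xb @ w - yb
    return 0.5 * np.mean(r ** 2)

def grad(w, xb, yb):
    return xb.T @ (xb @ w - yb) / len(yb)

# SGD with minibatch noise; keep weight snapshots after reaching the minimum.
w = rng.normal(size=10)
lr, batch = 0.05, 20
snapshots = []
for step in range(5000):
    idx = rng.integers(0, len(y), size=batch)
    w -= lr * grad(w, X[idx], y[idx])
    if step > 2000:
        snapshots.append(w.copy())
W = np.array(snapshots)

# PCA of the weight fluctuations: eigenvalues of the covariance matrix
# give the variance sigma_i^2 along each principal component.
Wc = W - W.mean(axis=0)
var, vecs = np.linalg.eigh(Wc.T @ Wc / len(Wc))

# Flatness along each PCA direction, taken here as inverse curvature of
# the full-batch loss (central second difference; an assumed estimator).
w0 = W.mean(axis=0)
eps = 1e-3
flatness = []
for i in range(len(var)):
    v = vecs[:, i]
    curv = (loss(w0 + eps * v, X, y) - 2 * loss(w0, X, y)
            + loss(w0 - eps * v, X, y)) / eps ** 2
    flatness.append(1.0 / max(curv, 1e-12))

# The paper reports variance decreasing with flatness (roughly a power law)
# for deep networks; this convex toy only demonstrates the measurement.
for s2, F in sorted(zip(var, flatness)):
    print(f"variance = {s2:.3e}   flatness = {F:.3e}")
```

One design note: the abstract defines flatness as the inverse of curvature, so a second-derivative estimate along each PCA direction is the most direct reading; in a deep network one would evaluate the full-batch loss profile along each direction instead of relying on an exact quadratic form.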
