Paper Title
The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks: an Exact Characterization of the Optimal Solutions
Paper Authors
Paper Abstract
We prove that all globally optimal two-layer ReLU neural networks can be found by solving a convex optimization program with cone constraints. Our analysis is novel, characterizes all optimal solutions, and does not rely on the duality-based analysis that was recently used to lift neural network training into convex spaces. Given the set of solutions of our convex optimization program, we show how to construct exactly the entire set of optimal neural networks. We provide a detailed characterization of this optimal set and of its invariant transformations. As additional consequences of our convex perspective: (i) we establish that Clarke stationary points found by stochastic gradient descent correspond to the global optimum of a subsampled convex problem; (ii) we provide a polynomial-time algorithm for checking whether a neural network is a global minimum of the training loss; (iii) we provide an explicit construction of a continuous path between any neural network and the global minimum of its sublevel set; and (iv) we characterize the minimal size of the hidden layer so that the neural network optimization landscape has no spurious valleys. Overall, we provide a rich framework for studying the landscape of the neural network training loss through convexity.
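For context, here is a sketch of the kind of convex program the abstract refers to, in our own notation and following the convex reformulation line of work the abstract alludes to (Pilanci & Ergen, 2020); the exact program characterized in this paper may differ. Given a data matrix X, targets y, and regularization strength beta, the variables are pairs (v_i, w_i), one pair per ReLU activation pattern D_i of the data:

```latex
% A sketch in our notation (not the paper's exact program).
% D_i = diag(1[X u >= 0]) ranges over the ReLU activation patterns of X.
\min_{\{(v_i, w_i)\}_{i=1}^{P}}
  \Big\| \sum_{i=1}^{P} D_i X (v_i - w_i) - y \Big\|_2^2
  \;+\; \beta \sum_{i=1}^{P} \big( \|v_i\|_2 + \|w_i\|_2 \big)
\quad \text{s.t.} \quad
  (2D_i - I_n) X v_i \ge 0, \;\; (2D_i - I_n) X w_i \ge 0, \;\; i = 1, \dots, P.
```

The linear inequalities are the cone constraints mentioned in the abstract: they force each block of variables to respect its assigned activation pattern. In this style of reformulation, each nonzero optimal v_i (resp. w_i) maps back to a hidden neuron with positive (resp. negative) output weight, which is the sense in which solutions of the convex program yield optimal ReLU networks.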
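A hypothetical numerical sketch of such a program using cvxpy follows; the names (X, y, beta, Ds) and the random sampling of activation patterns are our illustrative choices, not taken from the paper, which instead characterizes the full solution set of the exact program:

```python
# Hypothetical sketch: solve a convex reformulation of two-layer ReLU
# training on a tiny synthetic problem. Not the paper's exact program.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d, beta = 10, 3, 1e-3
X = rng.standard_normal((n, d))          # data matrix
y = rng.standard_normal(n)               # regression targets

# Sample a subset of ReLU activation patterns D = diag(1[X u >= 0]);
# in general there are finitely but exponentially many such patterns.
patterns = {tuple((X @ rng.standard_normal(d) >= 0).astype(int))
            for _ in range(50)}
Ds = [np.diag(p) for p in patterns]

V = [cp.Variable(d) for _ in Ds]         # neurons with positive output weight
W = [cp.Variable(d) for _ in Ds]         # neurons with negative output weight
residual = sum(D @ X @ (v - w) for D, v, w in zip(Ds, V, W)) - y
reg = sum(cp.norm(v, 2) + cp.norm(w, 2) for v, w in zip(V, W))

constraints = []
for D, v, w in zip(Ds, V, W):            # cone (pattern-consistency) constraints
    G = (2 * D - np.eye(n)) @ X
    constraints += [G @ v >= 0, G @ w >= 0]

prob = cp.Problem(cp.Minimize(cp.sum_squares(residual) + beta * reg),
                  constraints)
prob.solve()                             # a second-order cone program
print("optimal objective:", prob.value)
```

The program is jointly convex in all (v_i, w_i), so any solver-reported optimum is global, in contrast to the nonconvex landscape of direct gradient training on the network weights.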