Paper Title
过度参数化神经网络的凸几何和二元性
Convex Geometry and Duality of Over-parameterized Neural Networks
Paper Authors
Paper Abstract
We develop a convex analytic approach to analyze finite-width two-layer ReLU networks. We first prove that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set, where simple solutions are encouraged via its convex geometrical properties. We then leverage this characterization to show that an optimal set of parameters yields linear spline interpolation for regression problems involving one-dimensional or rank-one data. We also characterize the classification decision regions in terms of a kernel matrix and minimum $\ell_1$-norm solutions. This is in contrast to the Neural Tangent Kernel, which is unable to explain predictions of finite-width networks. Our convex geometric characterization also provides intuitive explanations of hidden neurons as auto-encoders. In higher dimensions, we show that the training problem can be cast as a finite-dimensional convex problem with infinitely many constraints. We then apply certain convex relaxations and introduce a cutting-plane algorithm to globally optimize the network. We further analyze the exactness of the relaxations to provide conditions for convergence to a global optimum. Our analysis also shows that optimal network parameters can be characterized as interpretable closed-form formulas in some practically relevant special cases.
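To make the linear-spline claim concrete, below is a minimal numerical sketch (not code from the paper): it trains a small two-layer ReLU network with weight-decay regularization on a handful of one-dimensional points using plain gradient descent in NumPy, then compares the network's predictions to the piecewise-linear interpolant of the training data. The width, learning rate, regularization strength, and data are illustrative assumptions, and the match to the spline is only approximate, since finitely many gradient steps are a stand-in for the globally optimal regularized solution analyzed in the paper.

```python
# Minimal illustrative sketch (not the paper's code): train a two-layer ReLU
# network with weight decay on 1D data and compare it to the piecewise-linear
# interpolant of the training points. Hyperparameters are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([-1.0, -0.4, 0.1, 0.5, 1.0])    # 1D training inputs
y = np.array([0.2, -0.5, 0.3, 0.1, 0.8])     # training targets
m, lam, lr, steps = 50, 1e-3, 1e-2, 20000    # width, weight decay, step size, iterations

W = rng.normal(size=m)        # first-layer weights
b = rng.normal(size=m)        # first-layer biases
v = 0.1 * rng.normal(size=m)  # second-layer weights

for _ in range(steps):
    z = np.maximum(x[:, None] * W + b, 0.0)  # hidden activations, shape (n, m)
    r = z @ v - y                            # residuals
    # Gradients of 0.5*||r||^2 + 0.5*lam*(||W||^2 + ||v||^2)
    gv = z.T @ r + lam * v
    gz = np.outer(r, v) * (z > 0)            # backprop through the ReLU
    gW = x @ gz + lam * W
    gb = gz.sum(axis=0)
    W -= lr * gW; b -= lr * gb; v -= lr * gv

xt = np.linspace(-1.0, 1.0, 9)                  # test grid inside the data range
net = np.maximum(xt[:, None] * W + b, 0.0) @ v  # network predictions
spline = np.interp(xt, x, y)                    # linear-spline interpolant
print("network:", np.round(net, 3))
print("spline :", np.round(spline, 3))
```

With a small regularization weight, the trained network's predictions on the test grid should track the linear interpolant between the training points, which is the qualitative behavior the abstract attributes to optimal regularized solutions on one-dimensional data.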