Paper Title
Vector-output ReLU Neural Network Problems are Copositive Programs: Convex Analysis of Two Layer Networks and Polynomial-time Algorithms
Paper Authors
Paper Abstract
We describe the convex semi-infinite dual of the two-layer vector-output ReLU neural network training problem. This semi-infinite dual admits a finite-dimensional representation, but its support is over a convex set which is difficult to characterize. In particular, we demonstrate that the non-convex neural network training problem is equivalent to a finite-dimensional convex copositive program. Our work is the first to identify this strong connection between the global optima of neural networks and those of copositive programs. We thus demonstrate how neural networks implicitly attempt to solve copositive programs via semi-nonnegative matrix factorization, and draw key insights from this formulation. We describe the first algorithms for provably finding the global minimum of the vector-output neural network training problem, which are polynomial in the number of samples for a fixed data rank, yet exponential in the dimension. However, in the case of convolutional architectures, the computational complexity is exponential only in the filter size and polynomial in all other parameters. We describe the circumstances in which the global optimum of this neural network training problem can be found exactly with soft-thresholded SVD, and provide a copositive relaxation which is guaranteed to be exact for certain classes of problems, and which corresponds to the solution found by stochastic gradient descent in practice.
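As an illustrative sketch only (not taken from the paper), soft-thresholded SVD in its generic form is the proximal operator of the nuclear norm: each singular value is shrunk toward zero by a fixed threshold. The function name soft_thresholded_svd, the NumPy implementation, and the threshold parameter lam below are assumptions for this sketch; the precise conditions under which this operation recovers the global optimum of the training problem are stated in the paper itself.

import numpy as np

def soft_thresholded_svd(A, lam):
    # Singular-value soft-thresholding: the proximal operator of the
    # nuclear norm with weight lam. Each singular value sigma_i of A is
    # replaced by max(sigma_i - lam, 0) and the matrix is re-assembled.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    return (U * s_shrunk) @ Vt

# Hypothetical usage: shrink the singular values of a random matrix by 0.5.
A = np.random.randn(20, 10)
A_shrunk = soft_thresholded_svd(A, 0.5)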