Paper Title

MSE-Optimal Neural Network Initialization via Layer Fusion

Paper Authors

Ghods, Ramina, Lan, Andrew S., Goldstein, Tom, Studer, Christoph

Paper Abstract

Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks. However, the use of stochastic gradient descent combined with the nonconvexity of the underlying optimization problems renders parameter learning susceptible to initialization. To address this issue, a variety of methods that rely on random parameter initialization or knowledge distillation have been proposed in the past. In this paper, we propose FuseInit, a novel method to initialize shallower networks by fusing neighboring layers of deeper networks that are trained with random initialization. We develop theoretical results and efficient algorithms for mean-square error (MSE)-optimal fusion of neighboring dense-dense, convolutional-dense, and convolutional-convolutional layers. We show experiments for a range of classification and regression datasets, which suggest that deeper neural networks are less sensitive to initialization and shallower networks can perform better (sometimes as well as their deeper counterparts) if initialized with FuseInit.
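To make the dense-dense fusion idea concrete, the sketch below collapses two adjacent dense layers into a single layer that minimizes the MSE between the fused layer's output and the original two-layer output. This is a minimal illustration, not the paper's algorithm: FuseInit derives closed-form results, whereas this sketch approximates the MSE-optimal layer empirically via least squares over sampled Gaussian inputs. All dimensions, weights, and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and weights for the two dense layers to be fused.
d_in, d_hidden, d_out = 32, 64, 16
W1 = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
b1 = 0.1 * rng.standard_normal(d_hidden)
W2 = rng.standard_normal((d_out, d_hidden)) / np.sqrt(d_hidden)
b2 = 0.1 * rng.standard_normal(d_out)

relu = lambda z: np.maximum(z, 0.0)

# Sample inputs (a Gaussian input model is assumed here) and compute the
# two-layer output that the fused single layer should reproduce.
X = rng.standard_normal((10_000, d_in))
Y = relu(X @ W1.T + b1) @ W2.T + b2

# Least-squares fit of a single dense layer [W | b], minimizing the
# empirical MSE of (W x + b) against the two-layer output over the samples.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # absorb the bias
Theta, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
W_fused, b_fused = Theta[:-1].T, Theta[-1]

mse = np.mean((X_aug @ Theta - Y) ** 2)
print(f"fusion MSE: {mse:.4f}")
```

The resulting `W_fused` and `b_fused` would then serve as the initialization of the corresponding layer in the shallower network, which is subsequently fine-tuned as usual.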
