Paper Title
GradMax: Growing Neural Networks using Gradient Information
Authors
Abstract
The architecture and the parameters of neural networks are often optimized independently, which requires costly retraining of the parameters whenever the architecture is modified. In this work we instead focus on growing the architecture without requiring costly retraining. We present a method that adds new neurons during training without impacting what is already learned, while improving the training dynamics. We achieve the latter by maximizing the gradients of the new weights and find the optimal initialization efficiently by means of the singular value decomposition (SVD). We call this technique Gradient Maximizing Growth (GradMax) and demonstrate its effectiveness in a variety of vision tasks and architectures.
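The two key ideas in the abstract, growing without changing the learned function and picking the new incoming weights by SVD to maximize the gradient on the new outgoing weights, can be sketched in a few lines. The sketch below is a simplified illustration under stated assumptions (a two-layer linear network, a random stand-in for the upstream loss gradient `delta`), not the paper's full algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes (assumptions): d inputs, h hidden units, o outputs, n samples.
d, h, o, n = 8, 4, 3, 32
W1 = rng.normal(size=(h, d)) * 0.1   # incoming weights of the hidden layer
W2 = rng.normal(size=(o, h)) * 0.1   # outgoing weights
X = rng.normal(size=(n, d))          # batch of inputs
# delta: gradient of the loss w.r.t. the output layer's pre-activations,
# shape (n, o); a random stand-in here.
delta = rng.normal(size=(n, o))

# Function-preserving growth: the new neuron's OUTGOING weights start at
# zero, so the network output is unchanged at the moment of growth.
# GradMax then chooses the INCOMING weights w_in to maximize the gradient
# norm on those zero outgoing weights. With a linear activation the new
# neuron's activation is X @ w_in, and the gradient of the loss w.r.t.
# the new outgoing column is delta.T @ (X @ w_in) = G @ w_in, where
# G = delta.T @ X. Maximizing ||G @ w_in|| under a unit-norm constraint
# is solved by the top right singular vector of G, via the SVD.
G = delta.T @ X                      # shape (o, d)
U, S, Vt = np.linalg.svd(G, full_matrices=False)
w_in = Vt[0]                         # top right singular vector, unit norm

# Grow the layer: one new row in W1, one new (zero) column in W2.
W1_new = np.vstack([W1, w_in[None, :]])
W2_new = np.hstack([W2, np.zeros((o, 1))])

# The output is unchanged (zero outgoing weights)...
out_before = X @ W1.T @ W2.T
out_after = X @ W1_new.T @ W2_new.T
assert np.allclose(out_before, out_after)

# ...while the gradient on the new outgoing column attains the maximal
# norm over unit-norm choices of w_in, namely the top singular value.
grad_new_col = delta.T @ (X @ w_in)
assert np.isclose(np.linalg.norm(grad_new_col), S[0])
```

The zero initialization of the outgoing weights is what keeps "what is already learned" intact, while the SVD step gives the largest possible gradient signal so the new neuron starts learning immediately.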