Paper Title

Effect of the initial configuration of weights on the training and function of artificial neural networks

Paper Authors

Jesus, R. J., Antunes, M. L., da Costa, R. A., Dorogovtsev, S. N., Mendes, J. F. F., Aguiar, R. L.

Paper Abstract

The function and performance of neural networks are largely determined by the evolution of their weights and biases during training, from the initial configuration of these parameters to one of the local minima of the loss function. We perform a quantitative statistical characterization of the deviation of the weights of two-hidden-layer ReLU networks of various sizes, trained via Stochastic Gradient Descent (SGD), from their initial random configuration. We compare the evolution of the distribution function of this deviation with the evolution of the loss during training. We observe that successful training via SGD leaves the network in the close neighborhood of the initial configuration of its weights. For each initial weight of a link, we measure the distribution function of the deviation from this value after training, and we find how the moments of this distribution and its peak depend on the initial weight. We explore the evolution of these deviations during training and observe an abrupt increase within the overfitting region. This jump coincides with a similarly abrupt increase in the loss function. Our results suggest that SGD's ability to efficiently find local minima is restricted to the vicinity of the random initial configuration of the weights.
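The central measurement described in the abstract, comparing each trained weight with its value at initialization, is simple to reproduce in code. Below is a minimal PyTorch sketch, not the authors' implementation: it trains a small two-hidden-layer ReLU network with plain SGD on toy regression data (all layer sizes, the dataset, and hyperparameters are illustrative assumptions) and reports the first moments of the per-weight deviation from the initial random configuration.

```python
# Minimal sketch (not the authors' code): snapshot the weights at
# initialization, train with plain SGD, then examine the distribution of
# the per-weight deviation d_i = w_i(final) - w_i(initial).
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two-hidden-layer ReLU network; layer sizes are arbitrary placeholders.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# Snapshot the random initial configuration of weights and biases.
w0 = copy.deepcopy(model.state_dict())

# Toy regression data standing in for a real training set.
x = torch.randn(512, 20)
y = torch.randn(512, 1)

opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

# Per-parameter deviation from the initial configuration, flattened into
# one vector over all weights and biases.
dev = torch.cat([
    (p.detach() - w0[name]).flatten()
    for name, p in model.named_parameters()
])

# First moments of the deviation distribution; the paper tracks how these
# (and the peak of the distribution) evolve over the course of training.
print(f"mean |deviation| = {dev.abs().mean().item():.4f}")
print(f"std of deviation = {dev.std().item():.4f}")
print(f"max  |deviation| = {dev.abs().max().item():.4f}")
```

To reproduce the evolution the paper describes, one would record `dev` at regular intervals during training alongside the loss, rather than only at the end; the abstract reports that both exhibit a simultaneous abrupt increase once training enters the overfitting region.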
