Paper Title

Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias

Paper Authors

Axel Laborieux, Maxence Ernoult, Benjamin Scellier, Yoshua Bengio, Julie Grollier, Damien Querlioz

Paper Abstract

Equilibrium Propagation (EP) is a biologically-inspired algorithm for convergent RNNs with a local learning rule that comes with strong theoretical guarantees. The parameter updates of the neural network during the credit assignment phase have been shown mathematically to approach the gradients provided by Backpropagation Through Time (BPTT) when the network is infinitesimally nudged toward its target. In practice, however, training a network with the gradient estimates provided by EP does not scale to visual tasks harder than MNIST. In this work, we show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon and that cancelling it allows training deep ConvNets by EP. We show that this bias can be greatly reduced by using symmetric nudging (a positive nudging and a negative one). We also generalize previous EP equations to the case of cross-entropy loss (as opposed to squared error). As a result of these advances, we are able to achieve a test error of 11.7% on CIFAR-10 by EP, which approaches the one achieved by BPTT and provides a major improvement over the standard EP approach with same-sign nudging, which gives 86% test error. We also apply these techniques to train an architecture with asymmetric forward and backward connections, yielding a 13.2% test error. These results highlight EP as a compelling biologically-plausible approach to compute error gradients in deep neural networks.
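
To make the bias reduction concrete, here is a minimal sketch of the two estimators the abstract contrasts, written in LaTeX with notation assumed from standard EP formulations (the exact equations are in the paper): Φ denotes the primitive function of the network dynamics, θ the parameters, and s_*^β the equilibrium state reached under a nudging of strength β.

\[
\hat{\nabla}^{\beta}(\theta) \;=\; \frac{1}{\beta}\left(\frac{\partial \Phi}{\partial \theta}\bigl(\theta, s_*^{\beta}\bigr) \;-\; \frac{\partial \Phi}{\partial \theta}\bigl(\theta, s_*^{0}\bigr)\right) \qquad \text{(one-sided nudging, bias } O(\beta)\text{)}
\]
\[
\hat{\nabla}^{\mathrm{sym}}(\theta) \;=\; \frac{1}{2\beta}\left(\frac{\partial \Phi}{\partial \theta}\bigl(\theta, s_*^{\beta}\bigr) \;-\; \frac{\partial \Phi}{\partial \theta}\bigl(\theta, s_*^{-\beta}\bigr)\right) \qquad \text{(symmetric nudging, bias } O(\beta^{2})\text{)}
\]

Intuitively, the symmetric version is a centered finite difference: pairing a positive nudge with a negative one cancels the first-order term in β of the estimator's bias, which is what allows the gradient estimate to stay accurate at the finite nudging strengths used in practice.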
