Paper Title

Neural gradients are near-lognormal: improved quantized and sparse training

Authors

Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, Daniel Soudry

Abstract

While training can mostly be accelerated by reducing the time needed to propagate neural gradients back throughout the model, most previous works focus on the quantization/pruning of weights and activations. These methods are often not applicable to neural gradients, which have very different statistical properties. Distinguished from weights and activations, we find that the distribution of neural gradients is approximately lognormal. Considering this, we suggest two closed-form analytical methods to reduce the computational and memory burdens of neural gradients. The first method optimizes the floating-point format and scale of the gradients. The second method accurately sets sparsity thresholds for gradient pruning. Each method achieves state-of-the-art results on ImageNet. To the best of our knowledge, this paper is the first to (1) quantize the gradients to 6-bit floating-point formats, or (2) achieve up to 85% gradient sparsity -- in each case without accuracy degradation. Reference implementation accompanies the paper.
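
The second method's idea can be illustrated with a short sketch: if gradient magnitudes are approximately lognormal, the pruning threshold that achieves a target sparsity follows in closed form from the fitted parameters. The snippet below is only an illustrative sketch of that relationship, not the paper's reference implementation; the function name and the simulated gradient tensor are assumptions made for the example.

```python
# A minimal sketch (not the authors' reference implementation): under a
# lognormal assumption on |g|, the threshold that zeroes a target fraction
# of gradient entries has a closed form. Names below are illustrative.
import numpy as np
from scipy.stats import norm

def lognormal_sparsity_threshold(grads: np.ndarray, target_sparsity: float) -> float:
    """Return t such that P(|g| < t) = target_sparsity when ln|g| ~ N(mu, sigma^2)."""
    mags = np.abs(grads[grads != 0])            # lognormal fit ignores exact zeros
    log_mags = np.log(mags)
    mu, sigma = log_mags.mean(), log_mags.std()
    # P(|g| < t) = Phi((ln t - mu) / sigma)  =>  t = exp(mu + sigma * Phi^{-1}(s))
    return float(np.exp(mu + sigma * norm.ppf(target_sparsity)))

# Example: prune ~85% of a simulated lognormal gradient tensor.
rng = np.random.default_rng(0)
g = rng.lognormal(mean=-9.0, sigma=2.0, size=100_000) * rng.choice([-1.0, 1.0], 100_000)
t = lognormal_sparsity_threshold(g, 0.85)
g_sparse = np.where(np.abs(g) < t, 0.0, g)
print("achieved sparsity:", (g_sparse == 0).mean())   # ≈ 0.85
```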
