Paper Title
Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training
Paper Authors
Paper Abstract
The communication of gradients is costly when training deep neural networks with multiple devices in computer vision applications. In particular, the growing size of deep learning models leads to higher communication overheads that defy the ideal linear training speedup with respect to the number of devices. Gradient quantization is one of the common methods to reduce communication costs. However, it can introduce quantization error during training and degrade model performance. In this work, we derive the optimal condition of both binary and multi-level gradient quantization for \textbf{ANY} gradient distribution. Based on the optimal condition, we develop two novel quantization schemes: biased BinGrad and unbiased ORQ for binary and multi-level gradient quantization respectively, which dynamically determine the optimal quantization levels. Extensive experimental results on the CIFAR and ImageNet datasets with several popular convolutional neural networks demonstrate the superiority of our proposed methods.
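To make the setting concrete, the sketch below shows standard unbiased stochastic gradient quantization with uniform levels (QSGD-style compression before communication). It is an illustrative assumption, not the paper's BinGrad or ORQ schemes, which instead determine the quantization levels dynamically from the derived optimal condition; the function names and the PyTorch-based implementation are hypothetical.

```python
# Minimal sketch of unbiased stochastic gradient quantization with uniform
# levels (QSGD-style). The paper's ORQ chooses levels dynamically from its
# optimal condition; this only illustrates the quantize/dequantize round trip
# that reduces the number of bits communicated per gradient entry.
import torch

def quantize_unbiased(grad: torch.Tensor, num_levels: int = 4):
    """Map each entry's magnitude to one of `num_levels` uniform levels in
    [0, max|grad|] with stochastic rounding, so E[dequantized] = grad."""
    scale = grad.abs().max()
    if scale == 0:
        return torch.zeros_like(grad, dtype=torch.uint8), grad.sign().to(torch.int8), scale
    # Position of each normalized magnitude on the level grid [0, num_levels - 1].
    pos = grad.abs() / scale * (num_levels - 1)
    lower = pos.floor()
    # Round up with probability equal to the fractional part (gives unbiasedness).
    level = lower + (torch.rand_like(pos) < (pos - lower)).float()
    return level.to(torch.uint8), grad.sign().to(torch.int8), scale

def dequantize(level, sign, scale, num_levels: int = 4):
    """Reconstruct the unbiased gradient estimate from its compressed form."""
    return sign.float() * level.float() / (num_levels - 1) * scale

# Example: compress a gradient tensor before exchanging it between devices.
g = torch.randn(8)
lvl, sgn, s = quantize_unbiased(g, num_levels=4)
g_hat = dequantize(lvl, sgn, s, num_levels=4)
```

With uniform levels the quantization error depends only on the grid spacing; the paper's point is that placing the levels according to the gradient distribution (the optimal condition) reduces this error further.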