Paper Title
Accelerating DNN Training with Structured Data Gradient Pruning
Paper Authors
Paper Abstract
Weight pruning is a technique to make Deep Neural Network (DNN) inference more computationally efficient by reducing the number of model parameters over the course of training. However, most weight pruning techniques generally do not speed up DNN training and can even require more iterations to reach model convergence. In this work, we propose a novel Structured Data Gradient Pruning (SDGP) method that can speed up training without impacting model convergence. This approach enforces a specific sparsity structure, where only N out of every M elements in a matrix can be nonzero, making it amenable to hardware acceleration. Modern accelerators such as the Nvidia A100 GPU support this type of structured sparsity for 2 nonzeros per 4 elements in a reduction. Assuming hardware support for 2:4 sparsity, our approach can achieve a 15-25\% reduction in total training time without a significant impact on performance. Source code and pre-trained models are available at \url{https://github.com/BradMcDanel/sdgp}.
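To make the N:M sparsity structure described in the abstract concrete, below is a minimal PyTorch sketch of magnitude-based N:M pruning applied to a gradient tensor, keeping the N largest-magnitude values in every group of M contiguous elements. The function name `nm_prune` and the choice to form groups along the flattened last dimension are illustrative assumptions; the authors' actual implementation in the linked repository may differ (e.g., in group orientation or in rescaling the surviving gradients).

```python
import torch

def nm_prune(grad: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Illustrative N:M structured pruning: keep the N largest-magnitude
    values in each group of M elements and zero out the rest.
    Assumes grad.numel() is divisible by m."""
    groups = grad.reshape(-1, m)                       # view tensor as groups of M elements
    idx = groups.abs().topk(n, dim=1).indices          # indices of the N largest magnitudes per group
    mask = torch.zeros_like(groups).scatter_(1, idx, 1.0)
    return (groups * mask).reshape(grad.shape)

# Example: prune a 4x8 data-gradient tensor to 2:4 structured sparsity
g = torch.randn(4, 8)
print(nm_prune(g, n=2, m=4))
```

With 2:4 sparsity, half of the gradient values in every group of four are zeroed, which is the pattern that hardware such as the Nvidia A100 can exploit in sparse matrix operations.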