Paper Title

CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification

Paper Authors

Lirui Xiao, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, Shanghang Zhang

Paper Abstract

Mixed-precision quantization has been widely applied on deep neural networks (DNNs) as it leads to significantly better efficiency-accuracy tradeoffs compared to uniform quantization. Meanwhile, determining the exact precision of each layer remains challenging. Previous attempts at bit-level regularization and pruning-based dynamic precision adjustment during training suffer from noisy gradients and unstable convergence. In this work, we propose Continuous Sparsification Quantization (CSQ), a bit-level training method to search for mixed-precision quantization schemes with improved stability. CSQ stabilizes the bit-level mixed-precision training process with a bi-level gradual continuous sparsification on both the bit values of the quantized weights and the bit selection in determining the quantization precision of each layer. The continuous sparsification scheme enables fully differentiable training without gradient approximation while achieving an exact quantized model in the end. A budget-aware regularization of total model size enables the dynamic growth and pruning of each layer's precision towards a mixed-precision quantization scheme of the desired size. Extensive experiments show that CSQ achieves a better efficiency-accuracy tradeoff than previous methods on multiple models and datasets.
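
To make the mechanism described in the abstract more concrete, below is a minimal, illustrative PyTorch sketch of the bi-level continuous sparsification idea under our own assumptions: sigmoid gates sharpened by an annealed temperature `beta` applied to (a) each bit of the quantized weights and (b) the per-layer bit selection, plus a simple expected-bit-count penalty standing in for the budget-aware regularizer. All names (`BitLevelWeight`, `max_bits`, `size_penalty`, the unsigned, unscaled reconstruction) are hypothetical simplifications, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class BitLevelWeight(nn.Module):
    """Illustrative sketch of bi-level continuous sparsification.
    Names and details are assumptions, not the paper's code."""

    def __init__(self, weight_shape, max_bits=8):
        super().__init__()
        # Level 1: soft logits for each bit plane of the quantized weight.
        self.bit_logits = nn.Parameter(torch.zeros(max_bits, *weight_shape))
        # Level 2: soft logits gating which bit planes this layer keeps,
        # i.e. the layer's effective quantization precision.
        self.select_logits = nn.Parameter(torch.zeros(max_bits))
        self.max_bits = max_bits

    def forward(self, beta):
        # Continuous sparsification: sigmoid gates sharpened by a temperature
        # beta that is annealed upward during training, so the gates converge
        # to hard 0/1 values (an exact quantized model) while every step
        # remains fully differentiable, with no gradient approximation.
        bits = torch.sigmoid(beta * self.bit_logits)       # soft bit values
        select = torch.sigmoid(beta * self.select_logits)  # soft bit selection
        shape = (self.max_bits,) + (1,) * (bits.dim() - 1)
        powers = (2.0 ** torch.arange(self.max_bits, dtype=bits.dtype)).view(shape)
        # Simplified unsigned reconstruction: sum the gated bit planes.
        return (bits * select.view(shape) * powers).sum(dim=0)

    def size_penalty(self, beta):
        # Budget-aware regularizer: expected number of active bits in this
        # layer; summing over layers and penalizing the gap to a target budget
        # lets each layer's precision grow or shrink dynamically.
        return torch.sigmoid(beta * self.select_logits).sum()


# Usage sketch: anneal beta over training and add the bit budget to the loss.
layer = BitLevelWeight((16, 16), max_bits=8)
beta = 5.0  # would be annealed from ~1 toward a large value in practice
w = layer(beta)
loss = w.pow(2).mean() + 1e-3 * layer.size_penalty(beta)
loss.backward()
```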
