Paper Title

CrAM: A Compression-Aware Minimizer

Paper Authors

Alexandra Peste, Adrian Vladu, Eldar Kurtic, Christoph H. Lampert, Dan Alistarh

Paper Abstract

Deep neural networks (DNNs) often have to be compressed, via pruning and/or quantization, before they can be deployed in practical settings. In this work we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as pruning. Thus, dense models trained via CrAM should be compressible post-training, in a single step, without significant accuracy loss. Experimental results on standard benchmarks, such as residual networks for ImageNet classification and BERT models for language modelling, show that CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning: specifically, we can prune models in one-shot to 70-80% sparsity with almost no accuracy loss, and to 90% with reasonable ($\sim 1\%$) accuracy loss, which is competitive with gradual compression methods. Additionally, CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware. The code for reproducing the results is available at https://github.com/IST-DASLab/CrAM.
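
The abstract describes the optimizer only at a high level. As a rough illustration of the core idea, taking the loss gradient at a compressed copy of the weights and applying it to the dense weights, here is a minimal PyTorch sketch. It assumes one-shot magnitude pruning as the compression operator; the names `magnitude_prune_` and `compression_aware_step` are illustrative, not the authors' API (see the linked repository for the actual implementation).

```python
import torch
import torch.nn as nn

@torch.no_grad()
def magnitude_prune_(tensors, sparsity):
    """Zero out the smallest-magnitude `sparsity` fraction of each tensor, in place.
    Stands in for the compression operator (here: one-shot magnitude pruning)."""
    for w in tensors:
        k = int(w.numel() * sparsity)  # number of entries to zero
        if k == 0:
            continue
        thresh = w.abs().flatten().kthvalue(k).values  # k-th smallest |w|
        w.mul_((w.abs() > thresh).to(w.dtype))

def compression_aware_step(model, loss_fn, inputs, targets, lr=0.1, sparsity=0.7):
    """One compression-aware update (a sketch, not CrAM's exact rule):
    evaluate the loss gradient at a pruned copy of the weights, then apply
    that gradient to the stored dense weights, so the dense model learns to
    keep its loss stable under pruning."""
    params = [p for p in model.parameters() if p.requires_grad]
    dense = [p.detach().clone() for p in params]  # stash the dense weights
    # Prune only weight matrices/filters, as is conventional (biases are kept).
    magnitude_prune_([p.data for p in params if p.dim() > 1], sparsity)

    loss = loss_fn(model(inputs), targets)  # loss at the compressed point
    model.zero_grad()
    loss.backward()

    with torch.no_grad():  # SGD step on the dense weights with that gradient
        for p, w in zip(params, dense):
            p.data.copy_(w - lr * p.grad)
    return loss.item()

# Example: one step on random data with a tiny MLP.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))
compression_aware_step(model, nn.CrossEntropyLoss(), x, y, lr=0.1, sparsity=0.7)
```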
