Paper Title
Quantized Sparse Weight Decomposition for Neural Network Compression
Paper Authors
Paper Abstract
In this paper, we introduce a novel method of neural network weight compression. In our method, we store weight tensors as sparse, quantized matrix factors, whose product is computed on the fly during inference to generate the target model's weights. We use projected gradient descent methods to find quantized and sparse factorizations of the weight tensors. We show that this approach can be seen as a unification of weight SVD, vector quantization, and sparse PCA. Combined with end-to-end fine-tuning, our method exceeds or is on par with previous state-of-the-art methods in terms of the trade-off between accuracy and model size. Unlike vector quantization, our method is applicable to both moderate and extreme compression regimes.
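The abstract describes factorizing a weight tensor into sparse, quantized factors via projected gradient descent: alternate a gradient step on the reconstruction error with a projection of the factors back onto the set of quantized, sparse matrices. Below is a minimal PyTorch sketch of that idea for a single weight matrix. The specific projection choices (uniform quantization, magnitude-based pruning), the hyperparameters, and all function names are illustrative assumptions, not the paper's exact algorithm.

```python
# Minimal sketch of projected gradient descent for a quantized, sparse
# factorization W ~= U @ V. Projection details are assumptions, not the
# paper's exact method.
import torch

def project(x, num_levels=16, sparsity=0.5):
    """Project a factor onto quantized, sparse matrices: uniformly
    quantize to `num_levels` levels, then zero out the smallest-magnitude
    entries (hypothetical projection choices)."""
    # Uniform quantization over the tensor's value range.
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (num_levels - 1)
    q = torch.round((x - lo) / scale) * scale + lo
    # Magnitude-based sparsity: zero the smallest `sparsity` fraction.
    k = int(q.numel() * sparsity)
    if k > 0:
        thresh = q.abs().flatten().kthvalue(k).values
        q = torch.where(q.abs() > thresh, q, torch.zeros_like(q))
    return q

def factorize(W, rank=32, steps=500, lr=1e-2):
    """Find sparse, quantized factors U (m x rank) and V (rank x n)
    minimizing ||W - U V||_F^2 by projected gradient descent."""
    m, n = W.shape
    U = torch.randn(m, rank, requires_grad=True)
    V = torch.randn(rank, n, requires_grad=True)
    opt = torch.optim.SGD([U, V], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.norm(W - U @ V) ** 2
        loss.backward()
        opt.step()
        # Projection step: map factors back onto the feasible set.
        with torch.no_grad():
            U.copy_(project(U))
            V.copy_(project(V))
    return U.detach(), V.detach()

# At inference time, the dense weight is reconstructed on the fly:
#   W_hat = U @ V
```

Only the quantized, sparse factors U and V need to be stored, which is what yields the compression; the dense product is recomputed per forward pass, as the abstract notes.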