Paper Title

Block-wise Dynamic Sparseness

Paper Authors

Hadifar, Amir, Deleu, Johannes, Develder, Chris, Demeester, Thomas

Paper Abstract

Neural networks have achieved state-of-the-art performance across a wide variety of machine learning tasks, often with large and computation-heavy models. Inducing sparseness as a way to reduce the memory and computation footprint of these models has seen significant research attention in recent years. In this paper, we present a new method for dynamic sparseness, whereby part of the computations are omitted dynamically, based on the input. For efficiency, we combine the idea of dynamic sparseness with block-wise matrix-vector multiplications. In contrast to static sparseness, which permanently zeroes out selected positions in weight matrices, our method preserves the full network capabilities by potentially accessing any trained weights. Yet, matrix-vector multiplications are accelerated by omitting a pre-defined fraction of weight blocks from the matrix, based on the input. Experimental results on the task of language modeling, using recurrent and quasi-recurrent models, show that the proposed method can outperform a magnitude-based static sparseness baseline. In addition, our method achieves similar language modeling perplexities as the dense baseline, at half the computational cost at inference time.
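
To make the mechanism described in the abstract concrete, below is a minimal, illustrative sketch of a block-wise dynamically sparse matrix-vector product in Python/NumPy. It is not the authors' implementation: the gating rule (a toy linear gate followed by top-k block selection) and all names (block_sparse_matvec, gate_W, keep_fraction, block_size) are assumptions made purely for illustration; the paper's actual gating and training procedure may differ.

# Illustrative sketch of block-wise dynamic sparseness (not the authors' code).
# Assumptions: the weight matrix W is partitioned into block_size x block_size blocks,
# a simple linear gate scores each block given the input, and only the top
# keep_fraction of blocks (by gate score) participates in the matrix-vector product.

import numpy as np

def block_sparse_matvec(W, x, gate_W, keep_fraction=0.5, block_size=64):
    """Compute y ~= W @ x, skipping low-scoring weight blocks based on the input x.

    W:      (out_dim, in_dim) dense weight matrix, dims divisible by block_size.
    x:      (in_dim,) input vector.
    gate_W: (num_blocks, in_dim) parameters of a toy linear gate (illustrative).
    """
    out_dim, in_dim = W.shape
    n_row_blocks = out_dim // block_size
    n_col_blocks = in_dim // block_size
    num_blocks = n_row_blocks * n_col_blocks

    # Gate: one score per weight block, conditioned on the input vector.
    scores = gate_W @ x                       # shape (num_blocks,)
    k = max(1, int(keep_fraction * num_blocks))
    keep = np.argsort(scores)[-k:]            # indices of the k highest-scoring blocks
    mask = np.zeros(num_blocks, dtype=bool)
    mask[keep] = True

    y = np.zeros(out_dim)
    for b in range(num_blocks):
        if not mask[b]:
            continue                          # omitted block: its computation is skipped
        r, c = divmod(b, n_col_blocks)
        rows = slice(r * block_size, (r + 1) * block_size)
        cols = slice(c * block_size, (c + 1) * block_size)
        y[rows] += W[rows, cols] @ x[cols]    # only selected blocks contribute
    return y

# Example usage with random weights (hypothetical shapes).
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))
x = rng.standard_normal(256)
gate_W = rng.standard_normal(((256 // 64) ** 2, 256))
y = block_sparse_matvec(W, x, gate_W, keep_fraction=0.5)

The block granularity is the efficiency argument in the abstract: skipping whole blocks removes contiguous chunks of work, which maps onto dense matrix kernels far better than unstructured element-wise sparseness, while the input-dependent gate keeps every trained weight potentially reachable.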
