Paper Title
BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method
Paper Authors
Paper Abstract
Accelerating DNN execution on various resource-limited computing platforms has been a long-standing problem. Prior works utilize l1-based group lasso or dynamic regularization such as ADMM to perform structured pruning on DNN models to leverage parallel computing architectures. However, both the pruning dimensions and the pruning methods lack universality, which leads to degraded performance and limited applicability. To solve this problem, we propose a new block-based pruning framework that comprises a general and flexible structured pruning dimension as well as a powerful and efficient reweighted regularization method. Our framework is universal: it applies to both CNNs and RNNs, providing complete support for the two major kinds of computation-intensive layers (i.e., CONV and FC layers). To complete all aspects of the pruning-for-acceleration task, we also integrate compiler-based code optimization into our framework, enabling DNN inference in real time. To the best of our knowledge, this is the first weight pruning framework to achieve universal coverage of both CNNs and RNNs with real-time mobile acceleration and no accuracy compromise.
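To make the abstract's central ingredient concrete, below is a minimal NumPy sketch of block-based reweighted regularization in the spirit the abstract describes. It assumes the classic reweighted-l1 recipe (Candes, Wakin & Boyd) lifted to block granularity: a penalty R(W) = lam * sum_b alpha_b * ||W_b||_F^2, with alpha_b periodically reset to 1 / (||W_b||_F^2 + eps) so that already-small blocks are penalized harder and driven toward zero. The block size, hyperparameters, and all helper names here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def block_view(W, bh, bw):
    """View a 2-D weight matrix as a grid of non-overlapping (bh x bw) blocks."""
    H, C = W.shape
    assert H % bh == 0 and C % bw == 0, "matrix must tile evenly into blocks"
    return W.reshape(H // bh, bh, C // bw, bw).transpose(0, 2, 1, 3)

def block_norms2(W, bh, bw):
    """Squared Frobenius norm of every block, shape (H/bh, C/bw)."""
    return (block_view(W, bh, bw) ** 2).sum(axis=(2, 3))

def update_alpha(W, bh, bw, eps=0.1):
    """Reweighting step: small-norm blocks receive larger penalty weights."""
    return 1.0 / (block_norms2(W, bh, bw) + eps)

def penalty_grad(W, alpha, bh, bw, lam):
    """Gradient of lam * sum_b alpha_b * ||W_b||_F^2 with respect to W."""
    blocks = 2.0 * lam * alpha[..., None, None] * block_view(W, bh, bw)
    return blocks.transpose(0, 2, 1, 3).reshape(W.shape)

# Toy run: with no task loss, every block shrinks, but reweighting makes the
# initially-small block collapse far faster -- it becomes the prune candidate.
# In real training this gradient is simply added to the task-loss gradient.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
W[:4, :4] *= 0.1                         # one deliberately weak block
bh = bw = 4
alpha = np.ones((8 // bh, 8 // bw))
for step in range(150):
    W -= 0.05 * penalty_grad(W, alpha, bh, bw, lam=0.1)
    if step % 50 == 49:
        alpha = update_alpha(W, bh, bw)  # periodic reweighting
print(np.sqrt(block_norms2(W, bh, bw)))  # weak block is driven nearest to zero
```

Because the penalty acts on whole blocks rather than individual weights, the zeros it produces form regular tiles that a compiler backend can exploit for parallel execution, which is the link to the compiler-based code optimization the abstract mentions.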