Paper Title

DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks

Paper Authors

Yonggan Fu, Haichuan Yang, Jiayi Yuan, Meng Li, Cheng Wan, Raghuraman Krishnamoorthi, Vikas Chandra, Yingyan Celine Lin

Paper Abstract

Efficient deep neural network (DNN) models equipped with compact operators (e.g., depthwise convolutions) have shown great potential in reducing DNNs' theoretical complexity (e.g., the total number of weights/operations) while maintaining a decent model accuracy. However, existing efficient DNNs are still limited in fulfilling their promise in boosting real-hardware efficiency, due to their commonly adopted compact operators' low hardware utilization. In this work, we open up a new compression paradigm for developing real-hardware efficient DNNs, leading to boosted hardware efficiency while maintaining model accuracy. Interestingly, we observe that while some DNN layers' activation functions help DNNs' training optimization and achievable accuracy, they can be properly removed after training without compromising the model accuracy. Inspired by this observation, we propose a framework dubbed DepthShrinker, which develops hardware-friendly compact networks via shrinking the basic building blocks of existing efficient DNNs that feature irregular computation patterns into dense ones with much improved hardware utilization and thus real-hardware efficiency. Excitingly, our DepthShrinker framework delivers hardware-friendly compact networks that outperform both state-of-the-art efficient DNNs and compression techniques, e.g., a 3.06% higher accuracy and 1.53$\times$ throughput on Tesla V100 over SOTA channel-wise pruning method MetaPruning. Our codes are available at: https://github.com/facebookresearch/DepthShrinker.
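
To make the layer-shrinking idea concrete, below is a minimal sketch, not the authors' released code: it assumes a PyTorch inverted-residual-style block with illustrative channel sizes, and shows that once the activation functions inside such a block are removed, its 1x1 expand, 3x3 depthwise, and 1x1 project convolutions compose into a single dense 3x3 convolution.

```python
# Minimal sketch (illustrative, not the official DepthShrinker implementation):
# with the intermediate nonlinearities removed, three consecutive linear conv
# layers collapse into one dense 3x3 convolution.
import torch
import torch.nn as nn

c_in, c_mid, c_out = 16, 96, 24  # hypothetical channel sizes

# Inverted residual block with its intermediate activation functions removed.
expand  = nn.Conv2d(c_in, c_mid, kernel_size=1, bias=False)
depth   = nn.Conv2d(c_mid, c_mid, kernel_size=3, padding=1, groups=c_mid, bias=False)
project = nn.Conv2d(c_mid, c_out, kernel_size=1, bias=False)

# Merge the three linear operators into one dense 3x3 kernel:
#   K[o, i, :, :] = sum_m W_project[o, m] * W_expand[m, i] * W_depth[m, :, :]
w_expand  = expand.weight.squeeze(-1).squeeze(-1)    # (c_mid, c_in)
w_depth   = depth.weight.squeeze(1)                  # (c_mid, 3, 3)
w_project = project.weight.squeeze(-1).squeeze(-1)   # (c_out, c_mid)
merged_kernel = torch.einsum('om,mi,mkl->oikl', w_project, w_expand, w_depth)

merged = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False)
with torch.no_grad():
    merged.weight.copy_(merged_kernel)

# The merged dense conv reproduces the activation-free block's output.
x = torch.randn(1, c_in, 32, 32)
with torch.no_grad():
    y_block  = project(depth(expand(x)))
    y_merged = merged(x)
print(torch.allclose(y_block, y_merged, atol=1e-4))  # True
```

The resulting dense convolution replaces the original block's irregular computation pattern with a regular one, which is where the improved hardware utilization and real-hardware efficiency reported in the abstract come from.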
