Paper Title
To Filter Prune, or to Layer Prune, That Is The Question
Paper Authors
Paper Abstract
Recent advances in pruning of neural networks have made it possible to remove a large number of filters or weights without any perceptible drop in accuracy. The number of parameters and the number of FLOPs are the usually reported metrics for measuring the quality of pruned models. However, the gain in speed of these pruned models is often overlooked in the literature due to the complex nature of latency measurements. In this paper, we show the limitations of filter pruning methods in terms of latency reduction and propose the LayerPrune framework. LayerPrune presents a set of layer pruning methods, based on different criteria, that achieve higher latency reduction than filter pruning methods at similar accuracy. The advantage of layer pruning over filter pruning in terms of latency reduction stems from the fact that the former is not constrained by the original model's depth and thus allows for a larger range of latency reduction. For each filter pruning method we examine, we use the same filter importance criterion to compute a per-layer importance score in one shot. We then prune the least important layers and fine-tune the shallower model, which obtains comparable or better accuracy than its filter-pruned counterpart. This one-shot process allows layers to be removed from single-path networks such as VGG before fine-tuning, unlike iterative filter pruning, where a minimum number of filters per layer is required to preserve data flow, which constrains the search space. To the best of our knowledge, we are the first to examine the effect of pruning methods on the latency metric, rather than FLOPs, across multiple networks, datasets, and hardware targets. On a similar latency budget on the ImageNet dataset, LayerPrune also outperforms handcrafted architectures such as ShuffleNet, MobileNet, MNASNet, and ResNet18 by 7.3%, 4.6%, 2.8%, and 0.5%, respectively.
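As a rough illustration of the one-shot layer scoring step described in the abstract, the sketch below aggregates a filter importance criterion into a single score per convolutional layer and ranks the layers for removal. It is a minimal sketch under illustrative assumptions, not the authors' released code: the L1-norm criterion, the VGG-16 backbone, and the two-layer pruning budget are placeholders, since the paper evaluates several criteria and networks.

```python
# Minimal sketch of one-shot per-layer importance scoring (assumed criterion:
# mean L1 norm of a layer's filters; assumed backbone: VGG-16).
import torch
import torchvision

model = torchvision.models.vgg16()  # randomly initialized VGG-16 for illustration

# Score each conv layer in one shot, with no data pass.
layer_scores = {}
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        # weight shape: (out_channels, in_channels, kH, kW); one L1 norm per filter
        per_filter_l1 = module.weight.detach().abs().sum(dim=(1, 2, 3))
        layer_scores[name] = per_filter_l1.mean().item()

# Rank layers from least to most important; the lowest-scoring layers are the
# candidates to remove before a single fine-tuning run of the shallower model.
ranked = sorted(layer_scores.items(), key=lambda kv: kv[1])
num_layers_to_prune = 2  # hypothetical latency-driven budget
print("Layers proposed for removal:", [name for name, _ in ranked[:num_layers_to_prune]])
```

Any filter importance criterion that yields a score per filter could be plugged into the same loop; the point of the sketch is that the aggregation to a per-layer score, and hence the layer ranking, is obtained in a single pass before fine-tuning.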