Paper Title
FrostNet: Towards Quantization-Aware Network Architecture Search
Paper Authors
Paper Abstract
INT8 quantization has become one of the standard techniques for deploying convolutional neural networks (CNNs) on edge devices to reduce memory and computational resource usage. By analyzing the quantized performance of existing mobile-target network architectures, we raise the question of how much the network architecture itself matters for optimal INT8 quantization. In this paper, we present a new network architecture search (NAS) procedure to find a network that guarantees both full-precision (FLOAT32) and quantized (INT8) performance. We first propose critical but straightforward optimization methods that enable quantization-aware training (QAT): floating-point statistic assisting (StatAssist) and stochastic gradient boosting (GradBoost). By integrating gradient-based NAS with StatAssist and GradBoost, we discover a quantization-efficient network building block, the Frost bottleneck. Furthermore, we use the Frost bottleneck as the building block for hardware-aware NAS to obtain quantization-efficient networks, FrostNets, which show improved quantization performance compared to other mobile-target networks while maintaining competitive FLOAT32 performance. When quantized, our FrostNets achieve higher recognition accuracy than existing CNNs with comparable latency, owing to their higher latency reduction rate (65% on average).
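For context on the QAT setting the abstract refers to, the sketch below shows the standard eager-mode INT8 QAT flow in PyTorch (`torch.quantization.prepare_qat` / `convert`): fake-quantization modules simulate INT8 arithmetic during fine-tuning, and the model is then converted to a true INT8 model for edge deployment. This is a minimal illustration of generic QAT only; `TinyConvNet`, the random data, and all hyperparameters are placeholder assumptions, and the paper's own StatAssist and GradBoost techniques are not reproduced here since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    """Toy CNN for illustration; not a FrostNet or Frost bottleneck."""
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # FLOAT32 -> INT8 boundary
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1),
            nn.BatchNorm2d(8),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(8, 10)
        self.dequant = torch.quantization.DeQuantStub()  # INT8 -> FLOAT32 boundary

    def forward(self, x):
        x = self.quant(x)
        x = self.features(x).flatten(1)
        x = self.fc(x)
        return self.dequant(x)

model = TinyConvNet().train()
model.qconfig = torch.quantization.get_default_qat_qconfig('fbgemm')
torch.quantization.prepare_qat(model, inplace=True)  # insert fake-quant observers

# Brief fine-tuning on dummy data; fake quantization simulates INT8 rounding
# in the forward pass while gradients stay in floating point.
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(3):
    x = torch.randn(4, 3, 32, 32)
    y = torch.randint(0, 10, (4,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Fold observers into quantization parameters and emit a true INT8 model.
int8_model = torch.quantization.convert(model.eval())
```

The paper's contribution sits on top of this flow: StatAssist and GradBoost address the optimization difficulties of QAT, and the NAS procedure searches for architectures whose accuracy survives the FLOAT32-to-INT8 conversion step shown at the end of the sketch.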