Paper Title
AMED: Automatic Mixed-Precision Quantization for Edge Devices
Paper Authors
Paper Abstract
Quantized neural networks are well known for reducing latency, power consumption, and model size without significant harm to performance. This makes them highly appropriate for systems with limited resources and a low power budget. Mixed-precision quantization offers better utilization of customized hardware that supports arithmetic operations at different bitwidths. Existing quantization methods either aim to minimize the compression loss given a desired reduction or to optimize a dependent variable for a specified property of the model (such as FLOPs or model size); both lead to inefficient performance when deployed on specific hardware. More importantly, these methods assume that the loss manifold of a quantized model holds a global minimum that coincides with the global minimum of its full-precision counterpart. Challenging this assumption, we argue that the optimal minimum changes as the precision changes, and thus it is better to view quantization as a random process. This lays the foundation for a different approach to quantizing neural networks: during training, the model is quantized to different precisions, the bit allocation is cast as a Markov Decision Process, and an optimal bitwidth allocation is then found by measuring specified behaviors on a specific device via direct signals from the particular hardware architecture. By doing so, we avoid the basic assumption that the loss behaves the same way for a quantized model. Automatic Mixed-Precision Quantization for Edge Devices (dubbed AMED) demonstrates its superiority over current state-of-the-art schemes in terms of the trade-off between neural network accuracy and hardware efficiency, backed by a comprehensive evaluation.
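To make the bit-allocation view concrete, below is a minimal sketch in Python, not the paper's actual algorithm: the state is a per-layer bitwidth assignment, an action re-quantizes one layer, and the reward trades off an accuracy estimate against a latency signal that would, in practice, be measured directly on the target hardware. All names (BITWIDTHS, NUM_LAYERS, measure_latency, estimate_accuracy, reward, search) and the simple greedy search are illustrative assumptions, standing in for the hardware-in-the-loop optimization described in the abstract.

    # Toy sketch: per-layer bitwidth allocation viewed as a Markov Decision Process.
    # State: list of bitwidths, one per layer. Action: change one layer's bitwidth.
    # Reward: hypothetical accuracy estimate minus a weighted latency signal.
    import random

    BITWIDTHS = [2, 4, 8]   # candidate precisions per layer (assumption)
    NUM_LAYERS = 6          # toy model depth (assumption)

    def measure_latency(bits):
        """Stand-in for a direct hardware signal; here latency grows with bitwidth."""
        return sum(float(b) for b in bits)

    def estimate_accuracy(bits):
        """Stand-in for evaluating the quantized model; higher precision helps."""
        return sum(min(b, 8) / 8.0 for b in bits) / len(bits)

    def reward(bits, latency_weight=0.05):
        # Trade-off between task accuracy and hardware efficiency.
        return estimate_accuracy(bits) - latency_weight * measure_latency(bits)

    def search(steps=500, seed=0):
        rng = random.Random(seed)
        state = [8] * NUM_LAYERS                   # start at the highest precision
        best, best_r = list(state), reward(state)
        for _ in range(steps):
            layer = rng.randrange(NUM_LAYERS)      # action: re-quantize one layer
            candidate = list(state)
            candidate[layer] = rng.choice(BITWIDTHS)
            r = reward(candidate)
            if r >= reward(state):                 # greedy acceptance of the action
                state = candidate
            if r > best_r:
                best, best_r = list(candidate), r
        return best, best_r

    if __name__ == "__main__":
        allocation, score = search()
        print("bitwidth allocation:", allocation, "reward:", round(score, 3))

In this sketch the search is a simple greedy walk over the MDP's action space; the paper's method instead learns the allocation during training, but the interface is the same: propose a bitwidth assignment, query accuracy and a hardware signal, and use the combined reward to decide the next assignment.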