Paper Title

FIT: A Metric for Model Sensitivity

Authors

Ben Zandonati, Adrian Alan Pol, Maurizio Pierini, Olya Sirkin, Tal Kopetz

Abstract

Model compression is vital to the deployment of deep learning on edge devices. Low precision representations, achieved via quantization of weights and activations, can reduce inference time and memory requirements. However, quantifying and predicting the response of a model to the changes associated with this procedure remains challenging. This response is non-linear and heterogeneous throughout the network. Understanding which groups of parameters and activations are more sensitive to quantization than others is a critical stage in maximizing efficiency. For this purpose, we propose FIT. Motivated by an information geometric perspective, FIT combines the Fisher information with a model of quantization. We find that FIT can estimate the final performance of a network without retraining. FIT effectively fuses contributions from both parameter and activation quantization into a single metric. Additionally, FIT is fast to compute when compared to existing methods, demonstrating favourable convergence properties. These properties are validated experimentally across hundreds of quantization configurations, with a focus on layer-wise mixed-precision quantization.
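The abstract describes combining Fisher information with a model of quantization to score sensitivity without retraining. The sketch below is a minimal toy illustration of that general idea (not the paper's actual FIT implementation): for a small logistic model, a diagonal empirical-Fisher-style proxy (squared gradients) is weighted by the squared perturbation that uniform quantization induces on each parameter. All names (`quantize`, `fit_score`) and the single-layer setup are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a single-layer logistic model (illustrative only).
X = rng.normal(size=(64, 8))
w = rng.normal(size=8) * 0.5
y = (X @ rng.normal(size=8) > 0).astype(float)

def grad(w):
    # Gradient of the mean logistic loss w.r.t. the weights.
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y)

def quantize(w, bits):
    # Symmetric uniform quantizer over the weight range.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def fit_score(w, bits):
    # Crude diagonal-Fisher proxy (squared mean gradient) weighted by
    # the squared quantization perturbation of each parameter.
    g2 = grad(w) ** 2
    dw2 = (quantize(w, bits) - w) ** 2
    return float(np.sum(g2 * dw2))

score_8bit = fit_score(w, 8)
score_2bit = fit_score(w, 2)
```

Because the quantization step shrinks as the bit-width grows, the predicted degradation at 2 bits exceeds that at 8 bits, which is the kind of per-layer ranking a mixed-precision search would consume.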
