Paper Title
Post-training Quantization on Diffusion Models
Paper Authors
Paper Abstract
Denoising diffusion (score-based) generative models have recently achieved significant accomplishments in generating realistic and diverse data. These approaches define a forward diffusion process that transforms data into noise and a backward denoising process that samples data from noise. Unfortunately, the generation process of current denoising diffusion models is notoriously slow due to the lengthy iterative noise estimation, which relies on a cumbersome neural network. This prevents diffusion models from being widely deployed, especially on edge devices. Previous works accelerate the generation process of diffusion models (DMs) by finding shorter yet effective sampling trajectories. However, they overlook the cost of noise estimation with a heavy network in every iteration. In this work, we accelerate generation from the perspective of compressing the noise estimation network. Because retraining DMs is difficult, we exclude mainstream training-aware compression paradigms and instead introduce post-training quantization (PTQ) into DM acceleration. However, the output distributions of noise estimation networks change with the time step, making previous PTQ methods fail on DMs because they are designed for single-time-step scenarios. To devise a DM-specific PTQ method, we explore PTQ on DMs in three aspects: quantized operations, the calibration dataset, and the calibration metric. We summarize and use several observations derived from an all-inclusive investigation to formulate our method, which specifically targets the unique multi-time-step structure of DMs. Experimentally, our method can directly quantize full-precision DMs into 8-bit models while maintaining or even improving their performance in a training-free manner. Importantly, our method can serve as a plug-and-play module for other fast-sampling methods, e.g., DDIM. The code is available at https://github.com/42Shawn/PTQ4DM.
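To make the abstract's core idea concrete, below is a minimal, self-contained PyTorch sketch of post-training 8-bit weight quantization calibrated with inputs drawn from many diffusion time steps. Everything here (ToyEpsNet, quantize_tensor, ptq_weights, make_calib_set, the crude noise schedule) is a hypothetical illustration of the general PTQ-on-DM setup, not the authors' actual network, schedule, or calibration procedure; the real method lives in the linked repository.

import torch
import torch.nn as nn

class ToyEpsNet(nn.Module):
    """Stand-in for the noise-estimation network eps_theta(x_t, t)."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 64),
            nn.SiLU(),
            nn.Linear(64, dim),
        )

    def forward(self, x_t, t):
        # Feed a normalized time step as an extra input feature,
        # mimicking the time conditioning of real DM backbones.
        t_feat = t.float().unsqueeze(-1) / 1000.0
        return self.net(torch.cat([x_t, t_feat], dim=-1))

def quantize_tensor(x, num_bits=8):
    """Symmetric uniform quantization with a single per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return (x / scale).round().clamp(-qmax, qmax) * scale

@torch.no_grad()
def ptq_weights(model, num_bits=8):
    """Post-training quantization: replace weights in place, no retraining."""
    for m in model.modules():
        if isinstance(m, nn.Linear):
            m.weight.copy_(quantize_tensor(m.weight, num_bits))

@torch.no_grad()
def make_calib_set(n_samples=256, dim=16, T=1000):
    """Calibration inputs spanning many time steps. Noise levels are
    faked with a crude linear schedule, standing in for samples taken
    along a real denoising trajectory."""
    ts = torch.randint(0, T, (n_samples,))
    sigma = ts.float() / T
    x_t = torch.randn(n_samples, dim) * (1.0 + sigma.unsqueeze(-1))
    return x_t, ts

if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyEpsNet()
    x_t, ts = make_calib_set()
    with torch.no_grad():
        fp_out = model(x_t, ts)          # full-precision reference
        ptq_weights(model, num_bits=8)   # training-free quantization
        q_out = model(x_t, ts)
    print("mean |fp32 - int8| output error:", (fp_out - q_out).abs().mean().item())

A full PTQ pipeline would also quantize activations and choose scales by optimizing a calibration metric over such multi-time-step samples; the key point the abstract makes is that calibration data drawn from a single time step misrepresents the time-varying output distribution.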