Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models

Paper Title

Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models

Authors

Wu, Chen Henry; Motamed, Saman; Srivastava, Shaunak; De la Torre, Fernando

Abstract

Generative models (e.g., GANs, diffusion models) learn the underlying data distribution in an unsupervised manner. However, many applications of interest require sampling from a particular region of the output space or sampling evenly over a range of characteristics. For efficient sampling in these scenarios, we propose Generative Visual Prompt (PromptGen), a framework for distributional control over pre-trained generative models by incorporating knowledge of other off-the-shelf models. PromptGen defines control as energy-based models (EBMs) and samples images in a feed-forward manner by approximating the EBM with invertible neural networks, avoiding optimization at inference. Our experiments demonstrate how PromptGen can efficiently sample from several unconditional generative models (e.g., StyleGAN2, StyleNeRF, diffusion autoencoder, NVAE) in a controlled and/or de-biased manner using various off-the-shelf models: (1) with the CLIP model as control, PromptGen can sample images guided by text, (2) with image classifiers as control, PromptGen can de-bias generative models across a set of attributes or attribute combinations, and (3) with inverse graphics models as control, PromptGen can sample images of the same identity in different poses. (4) Finally, PromptGen reveals that the CLIP model shows a "reporting bias" when used as control, and PromptGen can further de-bias this controlled distribution in an iterative manner. The code is available at https://github.com/ChenWu98/Generative-Visual-Prompt.
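
The abstract's central idea, defining control as an EBM and approximating it with an invertible neural network, can be sketched in equations. This is only one plausible reading under stated assumptions, not the paper's exact formulation. All symbols are introduced here for illustration and do not appear in the abstract: a pre-trained generator G with latent prior p_z, an energy function E built from an off-the-shelf model, a weight λ, and an invertible network f_θ applied to noise ε.

```latex
% Assumed notation (not from the abstract): G, p_z, E, \lambda, f_\theta, \epsilon.
\begin{aligned}
% Controlled distribution in the latent space of the pre-trained generator
p(z) &= \frac{1}{Z}\, p_z(z)\, \exp\!\bigl(-\lambda\, E(G(z))\bigr), \\
% Distribution induced by the invertible network (change of variables)
z = f_\theta(\epsilon),\quad \epsilon \sim \mathcal{N}(0, I)
  \;&\Longrightarrow\;
  \log q_\theta(z) = \log \mathcal{N}(\epsilon;\, 0, I)
  - \log\left|\det \frac{\partial f_\theta(\epsilon)}{\partial \epsilon}\right|, \\
% Fitting q_\theta to the EBM by minimizing the reverse KL divergence
D_{\mathrm{KL}}\bigl(q_\theta \,\|\, p\bigr)
  &= \mathbb{E}_{\epsilon}\Bigl[\log q_\theta\bigl(f_\theta(\epsilon)\bigr)
  - \log p_z\bigl(f_\theta(\epsilon)\bigr)
  + \lambda\, E\bigl(G(f_\theta(\epsilon))\bigr)\Bigr] + \log Z .
\end{aligned}
```

Under this reading, log Z does not depend on θ and can be dropped during training. After training, a controlled sample is obtained in a single feed-forward pass, x = G(f_θ(ε)), which matches the abstract's claim that no optimization is needed at inference.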
