A Novel Sampling Scheme for Text- and Image-Conditional Image Synthesis in Quantized Latent Spaces

Paper Title

A Novel Sampling Scheme for Text- and Image-Conditional Image Synthesis in Quantized Latent Spaces

Authors

Dominic Rampas, Pablo Pernias, Marc Aubreville

Abstract

Recent advancements in the domain of text-to-image synthesis have culminated in a multitude of enhancements pertaining to quality, fidelity, and diversity. Contemporary techniques enable the generation of highly intricate visuals which rapidly approach near-photorealistic quality. Nevertheless, as progress is achieved, the complexity of these methodologies increases, consequently intensifying the comprehension barrier between individuals within the field and those external to it. In an endeavor to mitigate this disparity, we propose a streamlined approach for text-to-image generation, which encompasses both the training paradigm and the sampling process. Despite its remarkable simplicity, our method yields aesthetically pleasing images with few sampling iterations, allows for intriguing ways for conditioning the model, and imparts advantages absent in state-of-the-art techniques. To demonstrate the efficacy of this approach in achieving outcomes comparable to existing works, we have trained a one-billion parameter text-conditional model, which we refer to as "Paella". In the interest of fostering future exploration in this field, we have made our source code and models publicly accessible for the research community.
