IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning

Paper Title

IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning

Authors

Liu, Zhenhuan; Deng, Jincan; Li, Liang; Cai, Shaofei; Xu, Qianqian; Wang, Shuhui; Huang, Qingming

Abstract

Conditional image generation is an active research topic that includes text-to-image generation and image translation. Recently, image manipulation with linguistic instruction has brought new challenges for multimodal conditional generation. However, traditional conditional image generation models mainly focus on generating high-quality, visually realistic images and do not resolve the partial consistency between image and instruction. To address this issue, we propose an Increment Reasoning Generative Adversarial Network (IR-GAN), which aims to reason about the consistency between the visual increment in images and the semantic increment in instructions. First, we introduce word-level and instruction-level instruction encoders to learn the user's intention from history-correlated instructions as the semantic increment. Second, we embed the representation of the semantic increment into that of the source image to generate the target image, where the source image serves as an auxiliary reference. Finally, we propose a reasoning discriminator to measure the consistency between the visual increment and the semantic increment, which purifies the user's intention and guarantees the logical soundness of the generated target image. Extensive experiments and visualizations conducted on two datasets show the effectiveness of IR-GAN.
