Grounded and Controllable Image Completion by Incorporating Lexical Semantics

Paper Title

Grounded and Controllable Image Completion by Incorporating Lexical Semantics

Authors

Zhang, Shengyu; Jiang, Tan; Huang, Qinghao; Tan, Ziqi; Zhao, Zhou; Tang, Siliang; Yu, Jin; Yang, Hongxia; Yang, Yi; Wu, Fei

Abstract

In this paper, we present an approach, Lexical Semantic Image Completion (LSIC), that may have applications in art, design, and heritage conservation, among others. Existing image completion procedures are highly subjective because they consider only visual context, which may produce unpredictable results that are plausible but not faithful to grounded knowledge. To permit a completion process that is both grounded and controllable, we advocate generating results faithful to both the visual context and the lexical semantic context, i.e., a description of the holes or blank regions left in the image (e.g., a hole description). One major challenge for LSIC comes from modeling and aligning the structure of the visual-semantic context and translating across different modalities. We term this process structure completion, which is realized by multi-grained reasoning blocks in our model. Another challenge relates to unimodal bias, which occurs when the model generates plausible results without using the textual description. This can happen because the annotated captions for an image are often semantically equivalent in existing datasets, so there is effectively only one paired text for a masked image in training. We devise an unsupervised unpaired-creation learning path in addition to the over-explored paired-reconstruction path, as well as a multi-stage training strategy to mitigate the insufficiency of labeled data. We conduct extensive quantitative and qualitative experiments as well as ablation studies, which demonstrate the efficacy of the proposed LSIC.
