TiVGAN: Text to Image to Video Generation with Step-by-Step Evolutionary Generator

Paper Title

TiVGAN: Text to Image to Video Generation with Step-by-Step Evolutionary Generator

Authors

Doyeon Kim, Donggyu Joo, Junmo Kim

Abstract

Advances in technology have led to the development of methods that can create desired visual multimedia. In particular, image generation using deep learning has been extensively studied across diverse fields. In comparison, video generation, especially from conditional inputs, remains a challenging and less explored area. To narrow this gap, we aim to train our model to produce a video corresponding to a given text description. We propose a novel training framework, Text-to-Image-to-Video Generative Adversarial Network (TiVGAN), which evolves frame by frame and finally produces a full-length video. In the first phase, we focus on creating a high-quality single video frame while learning the relationship between the text and an image. As the steps proceed, our model is trained gradually on an increasing number of consecutive frames. This step-by-step learning process helps stabilize the training and enables the creation of high-resolution video based on conditional text descriptions. Qualitative and quantitative experimental results on various datasets demonstrate the effectiveness of the proposed method.
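
The step-by-step idea in the abstract (start from a single frame, then train on progressively longer runs of consecutive frames) can be illustrated with a small sketch. This is not the authors' implementation: the function names `frame_schedule` and `clips_for_stage`, the doubling growth factor, and the 16-frame target are all assumptions chosen for illustration, and the GAN training itself is omitted.

```python
from typing import Iterator, List, Sequence


def frame_schedule(total_frames: int, growth: int = 2) -> List[int]:
    """Return the clip length used at each training stage.

    Stage 0 uses a single frame (the text-to-image phase). Each later
    stage multiplies the clip length by `growth`, capped at `total_frames`.
    The growth factor is an assumption, not a value from the abstract.
    """
    if total_frames < 1 or growth < 2:
        raise ValueError("total_frames must be >= 1 and growth >= 2")
    lengths = [1]
    while lengths[-1] < total_frames:
        lengths.append(min(lengths[-1] * growth, total_frames))
    return lengths


def clips_for_stage(video: Sequence, num_frames: int) -> Iterator[Sequence]:
    """Yield every window of `num_frames` consecutive frames from `video`."""
    for start in range(len(video) - num_frames + 1):
        yield video[start:start + num_frames]


if __name__ == "__main__":
    video = list(range(16))  # stand-in for 16 decoded frames
    for stage, n in enumerate(frame_schedule(len(video))):
        first_clip = next(clips_for_stage(video, n))
        print(f"stage {stage}: {n} frame(s), first clip = {first_clip}")
```

In a real training loop, each stage would presumably reuse the generator weights from the previous stage and draw its real samples from `clips_for_stage`, so that the single-frame quality learned first carries over as the clips grow longer.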
