Paper Title
TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting
Paper Authors
Paper Abstract
We present a lightweight video motion retargeting approach TransMoMo that is capable of transferring motion of a person in a source video realistically to another video of a target person. Without using any paired data for supervision, the proposed method can be trained in an unsupervised manner by exploiting invariance properties of three orthogonal factors of variation including motion, structure, and view-angle. Specifically, with loss functions carefully derived based on invariance, we train an auto-encoder to disentangle the latent representations of such factors given the source and target video clips. This allows us to selectively transfer motion extracted from the source video seamlessly to the target video in spite of structural and view-angle disparities between the source and the target. The relaxed assumption of paired data allows our method to be trained on a vast amount of videos needless of manual annotation of source-target pairing, leading to improved robustness against large structural variations and extreme motion in videos. We demonstrate the effectiveness of our method over the state-of-the-art methods. Code, model and data are publicly available on our project page (https://yzhq97.github.io/transmomo).
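The abstract describes transferring the motion code from a source clip while keeping the target clip's structure and view-angle codes, with losses built on the invariance of each factor. The sketch below illustrates only that plumbing; every function here (`encode_motion`, `encode_structure`, `encode_view`, `decode`) is a hypothetical placeholder standing in for the paper's learned networks, and the toy "encoders" are not the actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the three factor encoders (the paper trains
# neural networks). Each maps a 2D keypoint sequence of shape
# (T frames, J joints, 2) to a flat code.
def encode_motion(seq):    return seq.mean(axis=1).ravel()  # time-varying part
def encode_structure(seq): return seq.std(axis=0).ravel()   # body proportions
def encode_view(seq):      return seq[0].ravel()            # camera-dependent part

def decode(z_motion, z_structure, z_view, shape):
    # A real decoder is a learned network; this placeholder only
    # demonstrates how the three codes would be recombined.
    return np.zeros(shape)

T, J = 16, 15
source = rng.normal(size=(T, J, 2))  # clip that drives the motion
target = rng.normal(size=(T, J, 2))  # clip that provides structure and view

retargeted = decode(
    encode_motion(source),     # motion taken from the source video
    encode_structure(target),  # body structure kept from the target
    encode_view(target),       # view-angle kept from the target
    shape=(T, J, 2),
)

# Invariance idea behind the unsupervised losses: perturbing one factor
# should leave the codes of the *other* factors unchanged. For example,
# rescaling limbs (a structure change, crudely faked here by a uniform
# scale) should not move the motion code; the gap below is a quantity
# training would drive toward zero.
scaled = 1.2 * source
motion_invariance_gap = np.linalg.norm(
    encode_motion(source) - encode_motion(scaled)
)
```

Analogous invariance terms would penalize changes to the structure code under view perturbations (e.g. rotation) and to the view code under structure perturbations, which is what lets the three factors disentangle without paired supervision.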