Paper Title
Unpaired Motion Style Transfer from Video to Animation
Paper Authors
Paper Abstract
Transferring the motion style from one animation clip to another, while preserving the motion content of the latter, has been a long-standing problem in character animation. Most existing data-driven approaches are supervised and rely on paired data, where motions with the same content are performed in different styles. In addition, these approaches are limited to transfer of styles that were seen during training. In this paper, we present a novel data-driven framework for motion style transfer, which learns from an unpaired collection of motions with style labels, and enables transferring motion styles not observed during training. Furthermore, our framework is able to extract motion styles directly from videos, bypassing 3D reconstruction, and apply them to the 3D input motion. Our style transfer network encodes motions into two latent codes, for content and for style, each of which plays a different role in the decoding (synthesis) process. While the content code is decoded into the output motion by several temporal convolutional layers, the style code modifies deep features via temporally invariant adaptive instance normalization (AdaIN). Moreover, while the content code is encoded from 3D joint rotations, we learn a common embedding for style from either 3D or 2D joint positions, enabling style extraction from videos. Our results are comparable to the state-of-the-art, despite not requiring paired training data, and outperform other methods when transferring previously unseen styles. To our knowledge, we are the first to demonstrate style transfer directly from videos to 3D animations - an ability which enables one to extend the set of style examples far beyond motions captured by MoCap systems.
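The abstract describes a decoder in which the content code passes through temporal convolutions while the style code modulates deep features through temporally invariant AdaIN. The snippet below is a minimal illustrative sketch of that idea, assuming PyTorch; the class names, layer sizes, and dimensions are hypothetical and are not the authors' implementation.

```python
# A minimal sketch (PyTorch assumed) of a style code modulating
# temporal-convolution features via temporally invariant AdaIN, in the spirit
# of the architecture described in the abstract. All names and dimensions
# here are illustrative, not the paper's actual network.
import torch
import torch.nn as nn

class TemporallyInvariantAdaIN(nn.Module):
    """Normalize each channel over time, then apply a scale/shift derived from the style code."""
    def __init__(self, style_dim, num_channels):
        super().__init__()
        # Map the style code to one (scale, shift) pair per feature channel.
        self.affine = nn.Linear(style_dim, 2 * num_channels)

    def forward(self, features, style_code):
        # features: (batch, channels, frames); style_code: (batch, style_dim)
        mean = features.mean(dim=2, keepdim=True)
        std = features.std(dim=2, keepdim=True) + 1e-5
        normalized = (features - mean) / std
        scale, shift = self.affine(style_code).chunk(2, dim=1)
        # Broadcast over the time axis: the same modulation is applied to every frame,
        # which is what makes the normalization "temporally invariant".
        return normalized * (1 + scale.unsqueeze(2)) + shift.unsqueeze(2)

class StylizedDecoderBlock(nn.Module):
    """One temporal-convolution block whose deep features are modified by AdaIN."""
    def __init__(self, channels, style_dim, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size, padding=kernel_size // 2)
        self.adain = TemporallyInvariantAdaIN(style_dim, channels)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, content_features, style_code):
        return self.act(self.adain(self.conv(content_features), style_code))

# Example: a hypothetical content code with 144 channels over 32 frames and a 64-dim style code.
block = StylizedDecoderBlock(channels=144, style_dim=64)
out = block(torch.randn(4, 144, 32), torch.randn(4, 64))
print(out.shape)  # torch.Size([4, 144, 32])
```

Because the style code only sets per-channel scales and shifts, the same modulation can be computed from a style embedding learned from either 3D or 2D joint positions, which is what allows style extracted from video to drive the 3D decoder.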