Paper Title
Do As I Do: Transferring Human Motion and Appearance between Monocular Videos with Spatial and Temporal Constraints
Paper Authors
Paper Abstract
Creating plausible virtual actors from images of real actors remains one of the key challenges in computer vision and computer graphics. Marker-less human motion estimation and shape modeling from images in the wild bring this challenge to the fore. Despite recent advances in view synthesis and image-to-image translation, currently available formulations are limited to transferring style alone and do not take into account the character's motion and shape, which are by nature intermingled in producing plausible human forms. In this paper, we propose a unifying formulation for transferring appearance and retargeting human motion from monocular videos that takes all of these aspects into account. Our method synthesizes new videos of people in a context different from the one where they were initially recorded. Unlike recent appearance transfer methods, our approach takes into account body shape, appearance, and motion constraints. The evaluation is performed through several experiments on publicly available real videos containing hard conditions. Our method transfers both human motion and appearance, outperforming state-of-the-art methods while preserving specific features of the motion that must be maintained (e.g., feet touching the floor, hands touching a particular object) and achieving the best scores on visual quality and appearance metrics such as Structural Similarity (SSIM) and Learned Perceptual Image Patch Similarity (LPIPS).
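For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how SSIM and LPIPS are commonly computed between a synthesized frame and a ground-truth frame, using scikit-image and the `lpips` package; the function names and frame variables are hypothetical, and this is not the authors' evaluation pipeline.

```python
# Minimal sketch: per-frame SSIM and LPIPS, assuming RGB uint8 frames of
# equal size. Not the paper's code; helper names here are illustrative.
import numpy as np
import torch
import lpips  # pip install lpips (Zhang et al.'s LPIPS implementation)
from skimage.metrics import structural_similarity as ssim


def to_lpips_tensor(img: np.ndarray) -> torch.Tensor:
    """HWC uint8 in [0, 255] -> NCHW float in [-1, 1], as LPIPS expects."""
    t = torch.from_numpy(img).float().permute(2, 0, 1).unsqueeze(0)
    return t / 127.5 - 1.0


def frame_metrics(real: np.ndarray, fake: np.ndarray, lpips_fn) -> tuple:
    # SSIM over the RGB channels (higher is better).
    s = ssim(real, fake, channel_axis=2, data_range=255)
    # LPIPS perceptual distance (lower is better).
    with torch.no_grad():
        d = lpips_fn(to_lpips_tensor(real), to_lpips_tensor(fake)).item()
    return s, d


# Usage with hypothetical frames:
# lpips_fn = lpips.LPIPS(net='alex')
# s, d = frame_metrics(real_frame, fake_frame, lpips_fn)
```

In video-synthesis evaluations such as this one, the two scores are typically averaged over all frames of the generated video against the corresponding ground-truth frames.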