Paper Title

Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data

Paper Authors

Yuxiao Zhou, Marc Habermann, Weipeng Xu, Ikhsanul Habibie, Christian Theobalt, Feng Xu

Abstract

We present a novel method for monocular hand shape and pose estimation at unprecedented runtime performance of 100fps and at state-of-the-art accuracy. This is enabled by a new learning-based architecture designed such that it can make use of all the sources of available hand training data: image data with either 2D or 3D annotations, as well as stand-alone 3D animations without corresponding image data. It features a 3D hand joint detection module and an inverse kinematics module which not only regresses 3D joint positions but also maps them to joint rotations in a single feed-forward pass. This output makes the method more directly usable for applications in computer vision and graphics compared to only regressing 3D joint positions. We demonstrate that our architectural design leads to a significant quantitative and qualitative improvement over the state of the art on several challenging benchmarks. Our model is publicly available for future research.
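
To make the two-stage design concrete, here is a minimal PyTorch sketch of the pipeline the abstract describes: a joint detection network regresses 3D joint positions from an image, and an inverse kinematics network maps those positions to joint rotations within the same feed-forward pass. The module names (JointDetNet, IKNet), the layer sizes, and the quaternion parameterization of rotations are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointDetNet(nn.Module):
    """Regresses 3D joint positions from a single RGB frame.

    The tiny convolutional backbone here is a placeholder; the actual
    detection network in the paper is far larger.
    """
    def __init__(self, num_joints: int = 21):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, num_joints * 3)
        self.num_joints = num_joints

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(image)).view(-1, self.num_joints, 3)

class IKNet(nn.Module):
    """Maps 3D joint positions to per-joint rotations in one forward pass.

    Because its input is just joint positions, a module like this can be
    trained on stand-alone 3D animation data with no corresponding images.
    """
    def __init__(self, num_joints: int = 21):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_joints * 3, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_joints * 4),  # one quaternion per joint
        )
        self.num_joints = num_joints

    def forward(self, joints_3d: torch.Tensor) -> torch.Tensor:
        quat = self.mlp(joints_3d.flatten(1)).view(-1, self.num_joints, 4)
        return F.normalize(quat, dim=-1)  # project onto unit quaternions

# The full pipeline is a single feed-forward pass: image -> joints -> rotations.
detnet, iknet = JointDetNet(), IKNet()
frame = torch.randn(1, 3, 128, 128)   # dummy input image
joints = detnet(frame)                # (1, 21, 3) 3D joint positions
rotations = iknet(joints)             # (1, 21, 4) unit quaternions
```

Note that because the IK module consumes only joint positions, it can in principle be trained on the stand-alone 3D animations mentioned in the abstract, independently of any image data, which is presumably what lets the architecture exploit all available training sources.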
