AvatarPoser: Articulated Full-Body Pose Tracking from Sparse Motion Sensing

Paper Title

AvatarPoser: Articulated Full-Body Pose Tracking from Sparse Motion Sensing

Authors

Jiaxi Jiang, Paul Streli, Huajian Qiu, Andreas Fender, Larissa Laich, Patrick Snape, Christian Holz

Abstract

Today's Mixed Reality head-mounted displays track the user's head pose in world space as well as the user's hands for interaction in both Augmented Reality and Virtual Reality scenarios. While this is adequate to support user input, it unfortunately limits users' virtual representations to just their upper bodies. Current systems thus resort to floating avatars, whose limitation is particularly evident in collaborative settings. To estimate full-body poses from the sparse input sources, prior work has incorporated additional trackers and sensors at the pelvis or lower body, which increases setup complexity and limits practical application in mobile settings. In this paper, we present AvatarPoser, the first learning-based method that predicts full-body poses in world coordinates using only motion input from the user's head and hands. Our method builds on a Transformer encoder to extract deep features from the input signals and decouples global motion from the learned local joint orientations to guide pose estimation. To obtain accurate full-body motions that resemble motion capture animations, we refine the arm joints' positions using an optimization routine with inverse kinematics to match the original tracking input. In our evaluation, AvatarPoser achieved new state-of-the-art results in evaluations on large motion capture datasets (AMASS). At the same time, our method's inference speed supports real-time operation, providing a practical interface to support holistic avatar control and representation for Metaverse applications.
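The abstract does not spell out how global motion is decoupled from the learned local joint orientations. One plausible reading, offered here only as an assumption, is that the network predicts joint rotations alone and the root (pelvis) position is then recovered from the tracked head position by forward kinematics along the root-to-head chain. The sketch below illustrates that idea in plain Python. The function names, the chain layout, and the bone offsets are hypothetical and are not taken from the paper.

```python
import math

def rot_z(theta):
    """3x3 rotation matrix about the z axis (angle in radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def head_offset_from_root(rotations, offsets):
    """Forward kinematics along a root-to-head chain (hypothetical layout).

    rotations[i]: rotation of joint i relative to its parent
                  (rotations[0] is the root's global orientation).
    offsets[i]:   rest-pose offset from joint i to joint i + 1,
                  expressed in joint i's local frame.
    Returns the head position relative to the root joint.
    """
    world_rot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    pos = [0.0, 0.0, 0.0]
    for rot, off in zip(rotations, offsets):
        world_rot = matmul(world_rot, rot)   # accumulate rotation down the chain
        step = matvec(world_rot, off)        # bone vector in world orientation
        pos = [p + s for p, s in zip(pos, step)]
    return pos

def root_position(head_pos_world, rotations, offsets):
    """Recover the root's world position from the tracked head position."""
    rel = head_offset_from_root(rotations, offsets)
    return [h - r for h, r in zip(head_pos_world, rel)]

if __name__ == "__main__":
    identity = rot_z(0.0)
    # Made-up bone offsets (metres) for a three-bone root-to-head chain.
    chain_offsets = [[0.0, 0.2, 0.0], [0.0, 0.3, 0.0], [0.0, 0.1, 0.0]]
    chain_rotations = [identity, identity, identity]
    print(root_position([1.0, 1.6, 0.5], chain_rotations, chain_offsets))
```

With this split, predicted rotations do not have to encode where the user stands in the room: the head tracker supplies the global position, and the skeleton's bone lengths carry it down to the root.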
