Paper Title

Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction

Paper Authors

Anil Armagan, Guillermo Garcia-Hernando, Seungryul Baek, Shreyas Hampali, Mahdi Rad, Zhaohui Zhang, Shipeng Xie, MingXiu Chen, Boshen Zhang, Fu Xiong, Yang Xiao, Zhiguo Cao, Junsong Yuan, Pengfei Ren, Weiting Huang, Haifeng Sun, Marek Hrúz, Jakub Kanis, Zdeněk Krňoul, Qingfu Wan, Shile Li, Linlin Yang, Dongheui Lee, Angela Yao, Weiguo Zhou, Sijia Mei, Yunhui Liu, Adrian Spurr, Umar Iqbal, Pavlo Molchanov, Philippe Weinzaepfel, Romain Brégier, Grégory Rogez, Vincent Lepetit, Tae-Kyun Kim

Paper Abstract

We study how well different types of approaches generalise in the task of 3D hand pose estimation under single hand scenarios and hand-object interaction. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is high-dimensional, it is inherently not feasible to cover the whole space densely, despite recent efforts in collecting large-scale training datasets. This sampling problem is even more severe when hands are interacting with objects and/or inputs are RGB rather than depth images, as RGB images also vary with lighting conditions and colors. To address these issues, we designed a public challenge (HANDS'19) to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. More precisely, HANDS'19 is designed (a) to evaluate the influence of both depth and color modalities on 3D hand pose estimation, under the presence or absence of objects; (b) to assess the generalisation abilities w.r.t. four main axes: shapes, articulations, viewpoints, and objects; (c) to explore the use of a synthetic hand model to fill the gaps of current datasets. Through the challenge, the overall accuracy has dramatically improved over the baseline, especially on extrapolation tasks, from 27 mm to 13 mm mean joint error. Our analyses highlight the impact of data pre-processing, ensemble approaches, the use of a parametric 3D hand model (MANO), and the choice of HPE method/backbone.
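The headline numbers above (27 mm vs. 13 mm) are mean joint errors: the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over all joints and frames. The following is a minimal sketch of that metric in Python, assuming a 21-joint hand skeleton with coordinates in millimetres; the function name `mean_joint_error` and the array shapes are illustrative, not taken from the challenge's released evaluation code.

```python
import numpy as np

def mean_joint_error(pred, gt):
    """Mean per-joint Euclidean error in millimetres.

    pred, gt: arrays of shape (num_frames, num_joints, 3),
    e.g. (N, 21, 3) for a 21-joint hand skeleton in mm.
    """
    assert pred.shape == gt.shape
    # Euclidean distance per joint, then average over joints and frames.
    per_joint = np.linalg.norm(pred - gt, axis=-1)  # (N, num_joints)
    return per_joint.mean()

# Toy usage: noisy predictions vs. ground truth for 100 frames.
rng = np.random.default_rng(0)
gt = rng.uniform(-80, 80, size=(100, 21, 3))    # joint positions in mm (hypothetical)
pred = gt + rng.normal(0, 5, size=gt.shape)     # ~5 mm Gaussian noise
print(f"mean joint error: {mean_joint_error(pred, gt):.2f} mm")
```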
