Paper Title

LiftFormer: 3D Human Pose Estimation using attention models

Paper Authors

Llopart, Adrian

Paper Abstract

Estimating the 3D position of human joints has become a widely researched topic in recent years. Special emphasis has gone into defining novel methods that extrapolate 2-dimensional data (keypoints) into 3D, namely predicting the root-relative coordinates of joints associated with human skeletons. The latest research trends have proven that Transformer Encoder blocks aggregate temporal information significantly better than previous approaches. Thus, we propose the usage of these models to obtain more accurate 3D predictions by leveraging temporal information using attention mechanisms on ordered sequences of human poses in videos. Our method consistently outperforms the previous best results from the literature on Human3.6M, both when using 2D keypoint predictor inputs, by 0.3 mm (44.8 MPJPE, a 0.7% improvement), and when using ground-truth inputs, by 2 mm (31.9 MPJPE, an 8.4% improvement). It also achieves state-of-the-art performance on the HumanEva-I dataset with 10.5 P-MPJPE (a 22.2% reduction). The number of parameters in our model is easily tunable and is smaller (9.5M) than that of current methodologies (16.95M and 11.25M) whilst still achieving better performance. Thus, our 3D lifting model's accuracy exceeds that of other end-to-end or SMPL approaches and is comparable to many multi-view methods.
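The MPJPE figures quoted in the abstract are the mean Euclidean distance between predicted and ground-truth 3D joint positions, computed root-relative and reported in millimetres. A minimal sketch of the metric (an illustration only, not the authors' evaluation code; array shapes are assumptions):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per Joint Position Error: mean Euclidean distance (in mm)
    between predicted and ground-truth root-relative 3D joints.

    pred, gt: arrays of shape (num_frames, num_joints, 3).
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy example: 2 frames of a 17-joint skeleton, every joint
# displaced by the vector (3, 0, 4) mm, i.e. 5 mm of error.
gt = np.zeros((2, 17, 3))
pred = gt + np.array([3.0, 0.0, 4.0])
print(mpjpe(pred, gt))  # → 5.0
```

P-MPJPE (used for HumanEva-I above) is the same measure after rigidly aligning the prediction to the ground truth via a Procrustes transformation, which removes global rotation, translation, and scale before the error is taken.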
