Paper Title
Optimising 2D Pose Representation: Improve Accuracy, Stability and Generalisability Within Unsupervised 2D-3D Human Pose Estimation
Authors
Abstract
This paper addresses the problem of 2D pose representation during unsupervised 2D to 3D pose lifting to improve the accuracy, stability and generalisability of 3D human pose estimation (HPE) models. All unsupervised 2D-3D HPE approaches provide the entire 2D kinematic skeleton to a model during training. We argue that this is sub-optimal and disruptive, as long-range correlations are induced between independent 2D key points and predicted 3D ordinates during training. To this end, we conduct the following study. With a maximum architecture capacity of 6 residual blocks, we evaluate the performance of 5 models, each of which represents a 2D pose differently during the adversarial unsupervised 2D-3D HPE process. Additionally, we show the correlations between 2D key points that are learned during training, highlighting the unintuitive correlations induced when an entire 2D pose is provided to a lifting model. Our results show that the optimal representation of a 2D pose is two independent segments, the torso and legs, with no shared features between the lifting networks. This approach decreased the average error by 20% on the Human3.6M dataset when compared to a model with a near-identical parameter count trained on the entire 2D kinematic skeleton. Furthermore, given the complex nature of adversarial learning, we show how this representation can also improve convergence during training, allowing an optimum result to be obtained more often.
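The segmented representation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the joint ordering, the exact joints assigned to each segment, or the network design, so everything named here is an assumption: the 17-joint index layout (a common Human3.6M convention), the `LEG_JOINTS` and `TORSO_JOINTS` partition, the hypothetical `LinearLifter` class (an untrained linear map standing in for a residual lifting network), and the choice to predict one depth value per joint. The adversarial training loop is omitted entirely.

```python
import random

# Assumed 17-joint ordering; the split into segments is illustrative only.
LEG_JOINTS = [0, 1, 2, 3, 4, 5, 6]                      # pelvis, hips, knees, ankles
TORSO_JOINTS = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]    # spine, head, arms


class LinearLifter:
    """Hypothetical stand-in for a lifting network.

    Maps a segment's flattened 2D key points to one depth value per joint.
    Weights are randomly initialised and untrained.
    """

    def __init__(self, num_joints, seed):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(2 * num_joints)]
            for _ in range(num_joints)
        ]

    def __call__(self, keypoints_2d):
        flat = [coord for xy in keypoints_2d for coord in xy]
        return [sum(w * v for w, v in zip(row, flat)) for row in self.weights]


def lift_pose(pose_2d, leg_lifter, torso_lifter):
    """Lift a 2D pose to 3D with two independent lifters that share no features."""
    pose_3d = [None] * len(pose_2d)
    for joints, lifter in ((LEG_JOINTS, leg_lifter), (TORSO_JOINTS, torso_lifter)):
        segment = [pose_2d[j] for j in joints]   # each lifter sees only its segment
        depths = lifter(segment)
        for j, z in zip(joints, depths):
            x, y = pose_2d[j]
            pose_3d[j] = (x, y, z)
    return pose_3d


leg_lifter = LinearLifter(len(LEG_JOINTS), seed=0)
torso_lifter = LinearLifter(len(TORSO_JOINTS), seed=1)
pose_2d = [(0.01 * j, -0.02 * j) for j in range(17)]
pose_3d = lift_pose(pose_2d, leg_lifter, torso_lifter)
```

Because each lifter receives only its own segment, moving a wrist key point cannot alter the predicted depth of any leg joint. This is the structural property the abstract relies on to avoid long-range correlations between independent key points.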