PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination

Paper title

PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision

Authors

Gong, Kehong; Li, Bingbing; Zhang, Jianfeng; Wang, Tao; Huang, Jing; Mi, Michael Bi; Feng, Jiashi; Wang, Xinchao

Abstract

Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervisions like consistency loss to guide the learning, which, inevitably, leads to inferior results in real-world scenarios with unseen poses. In this paper, we propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision, through a self-enhancing dual-loop learning framework. This is made possible via introducing a reinforcement-learning-based imitator, which is learned jointly with a pose estimator alongside a pose hallucinator; the three components form two loops during the training process, complementing and strengthening one another. Specifically, the pose estimator transforms an input 2D pose sequence to a low-fidelity 3D output, which is then enhanced by the imitator that enforces physical constraints. The refined 3D poses are subsequently fed to the hallucinator for producing even more diverse data, which are, in turn, strengthened by the imitator and further utilized to train the pose estimator. Such a co-evolution scheme, in practice, enables training a pose estimator on self-generated motion data without relying on any given 3D data. Extensive experiments across various benchmarks demonstrate that our approach yields encouraging results significantly outperforming the state of the art and, in some cases, even on par with results of fully-supervised methods. Notably, it achieves 89.1% 3D PCK on MPI-INF-3DHP under self-supervised cross-dataset evaluation setup, improving upon the previous best self-supervised methods by 8.6%. Code can be found at: https://github.com/Garfield-kh/PoseTriplet
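The dual-loop data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). Every name below (`ToyEstimator`, `imitate`, `hallucinate`, `project`, `co_evolve`) is hypothetical, and each component is a trivial stand-in: the "imitator" merely clamps joints above a floor plane instead of running a reinforcement-learning policy, the "hallucinator" adds random noise, and the "estimator" learns a single average depth. Only the order in which data moves between the three components is meant to mirror the text.

```python
import random
from typing import List, Tuple

Pose2D = List[Tuple[float, float]]
Pose3D = List[Tuple[float, float, float]]


def project(pose: Pose3D) -> Pose2D:
    """Toy orthographic projection (assumption): drop the depth coordinate."""
    return [(x, y) for x, y, _ in pose]


class ToyEstimator:
    """Placeholder 2D-to-3D lifter that predicts one learned constant depth."""

    def __init__(self) -> None:
        self.depth = 0.0

    def lift(self, pose2d: Pose2D) -> Pose3D:
        return [(x, y, self.depth) for x, y in pose2d]

    def fit(self, pairs: List[Tuple[Pose2D, Pose3D]]) -> None:
        # "Training" here is just averaging the depths of the generated 3D poses.
        depths = [z for _, pose3d in pairs for _, _, z in pose3d]
        if depths:
            self.depth = sum(depths) / len(depths)


def imitate(pose: Pose3D, floor: float = 0.0) -> Pose3D:
    """Stand-in for the imitator: enforce one physical constraint (no joint below the floor)."""
    return [(x, max(y, floor), z) for x, y, z in pose]


def hallucinate(pose: Pose3D, rng: random.Random, scale: float = 0.1) -> Pose3D:
    """Stand-in for the hallucinator: perturb joints to obtain a new, more diverse pose."""
    return [
        (x + rng.uniform(-scale, scale),
         y + rng.uniform(-scale, scale),
         z + rng.uniform(-scale, scale))
        for x, y, z in pose
    ]


def co_evolve(poses_2d: List[Pose2D], rounds: int = 3, seed: int = 0):
    """Run the two loops for a few rounds using only 2D input, no given 3D data."""
    rng = random.Random(seed)
    estimator = ToyEstimator()
    pairs: List[Tuple[Pose2D, Pose3D]] = []
    for _ in range(rounds):
        # Loop 1: estimator output (low fidelity) is refined by the imitator.
        refined = [imitate(estimator.lift(p)) for p in poses_2d]
        # Loop 2: hallucinated variations are again refined by the imitator.
        diverse = [imitate(hallucinate(p, rng)) for p in refined]
        # Self-generated 2D-3D pairs supervise the next round of the estimator.
        pairs = [(project(p), p) for p in refined + diverse]
        estimator.fit(pairs)
    return estimator, pairs
```

Calling `co_evolve([[(0.0, 1.0), (0.5, -0.2)]], rounds=2)` returns the toy estimator together with the last round's generated pairs: one refined and one hallucinated pose per input sequence.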
