PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting

Paper title

PoseGPT: Quantization-based 3D Human Motion Generation and Forecasting

Authors

Thomas Lucas, Fabien Baradel, Philippe Weinzaepfel, Grégory Rogez

Abstract

We address the problem of action-conditioned generation of human motion sequences. Existing work falls into two categories: forecast models conditioned on observed past motions, or generative models conditioned on action labels and duration only. In contrast, we generate motion conditioned on observations of arbitrary length, including none. To solve this generalized problem, we propose PoseGPT, an auto-regressive transformer-based approach which internally compresses human motion into quantized latent sequences. An auto-encoder first maps human motion to latent index sequences in a discrete space, and vice-versa. Inspired by the Generative Pretrained Transformer (GPT), we propose to train a GPT-like model for next-index prediction in that space; this allows PoseGPT to output distributions on possible futures, with or without conditioning on past motion. The discrete and compressed nature of the latent space allows the GPT-like model to focus on long-range signal, as it removes low-level redundancy in the input signal. Predicting discrete indices also alleviates the common pitfall of predicting averaged poses, a typical failure case when regressing continuous values, as the average of discrete targets is not a target itself. Our experimental results show that our proposed approach achieves state-of-the-art results on HumanAct12, a standard but small scale dataset, as well as on BABEL, a recent large scale MoCap dataset, and on GRAB, a human-object interactions dataset.
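The quantization idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 2-D codebook, the function names (`quantize`, `dequantize`, `sample_next_index`) and the hand-written probabilities are all hypothetical, chosen only to show how nearest-codebook lookup turns continuous latents into indices, and why sampling an index avoids the averaged-pose failure of regression.

```python
import random

# Hypothetical toy codebook: 4 code vectors in a 2-D latent space.
# A real model would learn a much larger codebook jointly with the auto-encoder.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents, codebook=CODEBOOK):
    """Map each continuous latent vector to the index of its nearest code."""
    return [
        min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
        for z in latents
    ]


def dequantize(indices, codebook=CODEBOOK):
    """Map indices back to code vectors (the input a decoder would receive)."""
    return [codebook[k] for k in indices]


def sample_next_index(probs, rng):
    """Sample one index from a categorical distribution over the codebook."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Stand-in for encoder outputs over three time steps.
latents = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]
indices = quantize(latents)
reconstruction = dequantize(indices)

# Two equally plausible futures, represented by codes 1 and 2.
future_a, future_b = CODEBOOK[1], CODEBOOK[2]

# Regressing a continuous value under a squared-error loss tends toward the mean,
# which here is a point that matches neither future.
mean_pose = tuple((a + b) / 2 for a, b in zip(future_a, future_b))

# A categorical prediction keeps both modes; sampling returns one valid code.
rng = random.Random(0)
sampled = sample_next_index([0.0, 0.5, 0.5, 0.0], rng)

print(indices, mean_pose, sampled)
```

In this sketch the probabilities passed to `sample_next_index` stand in for the output of the GPT-like model, which would predict them from the preceding indices and the action label.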
