Paper Title

Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios

Authors

German Barquero, Johnny Núñez, Zhen Xu, Sergio Escalera, Wei-Wei Tu, Isabelle Guyon, Cristina Palmero

Abstract

Human behavior forecasting during human-human interactions is of utmost importance to provide robotic or virtual agents with social intelligence. This problem is especially challenging for scenarios that are highly driven by interpersonal dynamics. In this work, we present the first systematic comparison of state-of-the-art approaches for behavior forecasting. To do so, we leverage whole-body annotations (face, body, and hands) from the very recently released UDIVA v0.5, which features face-to-face dyadic interactions. Our best attention-based approaches achieve state-of-the-art performance in UDIVA v0.5. We show that by autoregressively predicting the future with methods trained for the short-term future (<400ms), we outperform the baselines even for a considerably longer-term future (up to 2s). We also show that this finding holds when highly noisy annotations are used, which opens new horizons towards the use of weakly-supervised learning. Combined with large-scale datasets, this may help boost the advances in this field.
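The core finding above, that a forecaster trained only on the short-term future (<400ms) can outperform the baselines up to 2s, rests on autoregressive rollout: the model is applied repeatedly, each time conditioning on its own previous predictions. Below is a minimal sketch of that loop, assuming a hypothetical `model` callable and NumPy pose arrays of shape (frames, joints, 3); it is illustrative only, not the authors' implementation.

```python
import numpy as np

def autoregressive_rollout(model, context, total_frames):
    """Extend a short-term pose forecaster to a longer horizon.

    model        -- hypothetical callable: given a context window of poses
                    of shape (T, J, 3), returns the next few frames of
                    shape (S, J, 3), where S spans < 400 ms.
    context      -- observed poses, shape (T, J, 3).
    total_frames -- desired forecast length, e.g. 2 s worth of frames.
    """
    window = context
    predicted = []
    n = 0
    while n < total_frames:
        step = model(window)              # short-term prediction (S frames)
        predicted.append(step)
        n += step.shape[0]
        # Slide the window: append the prediction and keep the last T frames,
        # so the model conditions on its own output in the next iteration.
        window = np.concatenate([window, step], axis=0)[-context.shape[0]:]
    return np.concatenate(predicted, axis=0)[:total_frames]
```

For instance, at 25 fps a model that predicts 10 frames per call (400 ms) would be invoked five times to produce a 2 s (50-frame) forecast.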
