Paper Title
Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation
Paper Authors
Paper Abstract
This paper presents a novel framework for speech-driven gesture generation, applicable to virtual agents to enhance human-computer interaction. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates. We analyse different representations for the input (speech) and the output (motion) of the network through both objective and subjective evaluations. We also analyse the importance of smoothing the produced motion. Our results indicate that the proposed method improves on our baseline in terms of objective measures: for example, it better captures the motion dynamics and better matches the motion-speed distribution. Moreover, we performed user studies on two different datasets. The studies confirmed that our proposed method is perceived as more natural than the baseline, although this difference disappeared once both methods received appropriate post-processing: hip-centering and smoothing. We conclude that both motion representation and post-processing should be taken into account when designing an automatic gesture-generation method.
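To make the post-processing step concrete, the sketch below shows one plausible implementation of the two operations the abstract names, hip-centering and smoothing, applied to a generated motion sequence of 3D joint coordinates. The function name, the frame layout (frames × joints × 3), the hip-joint index, and the choice of a moving-average filter are all illustrative assumptions; the paper does not prescribe a specific filter here.

```python
import numpy as np


def post_process(motion: np.ndarray, hip_index: int = 0, window: int = 5) -> np.ndarray:
    """Hip-center and smooth a generated motion sequence.

    motion: array of shape (frames, joints * 3), flattened 3D coordinates
            per frame (a hypothetical layout, not taken from the paper).
    """
    frames = motion.shape[0]
    poses = motion.reshape(frames, -1, 3)

    # Hip-centering: express every joint relative to the hip joint,
    # removing global translation from the skeleton.
    poses = poses - poses[:, hip_index : hip_index + 1, :]

    # Smoothing: a simple moving average along the time axis
    # (an illustrative choice of low-pass filter).
    kernel = np.ones(window) / window
    smoothed = np.empty_like(poses)
    for j in range(poses.shape[1]):
        for d in range(3):
            smoothed[:, j, d] = np.convolve(poses[:, j, d], kernel, mode="same")

    return smoothed.reshape(frames, -1)
```

After this step the hip joint sits at the origin in every frame and high-frequency jitter in the remaining joints is attenuated, which is the kind of post-processing the user studies suggest can close the perceived-naturalness gap between methods.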