Paper Title

Gesticulator: A framework for semantically-aware speech-driven gesture generation

Authors

Taras Kucherenko, Patrik Jonell, Sanne van Waveren, Gustav Eje Henter, Simon Alexanderson, Iolanda Leite, Hedvig Kjellström

Abstract

During speech, people spontaneously gesticulate, which plays a key role in conveying information. Similarly, realistic co-speech gestures are crucial to enable natural and smooth interactions with social agents. Current end-to-end co-speech gesture generation systems use a single modality for representing speech: either audio or text. These systems are therefore confined to producing either acoustically-linked beat gestures or semantically-linked gesticulation (e.g., raising a hand when saying "high"): they cannot appropriately learn to generate both gesture types. We present a model designed to produce arbitrary beat and semantic gestures together. Our deep-learning based model takes both acoustic and semantic representations of speech as input, and generates gestures as a sequence of joint angle rotations as output. The resulting gestures can be applied to both virtual agents and humanoid robots. Subjective and objective evaluations confirm the success of our approach. The code and video are available at the project page https://svito-zar.github.io/gesticulator.
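The abstract describes a model that fuses acoustic and semantic speech representations and outputs gestures as a sequence of joint-angle rotations. A minimal NumPy sketch of that input/output interface is shown below; the feature dimensions and the single random linear map are hypothetical placeholders for the paper's actual deep network (see the project page for the real implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes: a 26-dim acoustic vector (e.g., spectral
# features) and a 768-dim semantic embedding (e.g., from a text encoder),
# both aligned per output frame. NUM_JOINTS is also illustrative.
AUDIO_DIM, TEXT_DIM, NUM_JOINTS = 26, 768, 15

# A single random linear layer stands in for the trained deep model.
W = rng.standard_normal((AUDIO_DIM + TEXT_DIM, NUM_JOINTS * 3)) * 0.01

def generate_gestures(audio_feats: np.ndarray, text_feats: np.ndarray) -> np.ndarray:
    """Map per-frame speech features to joint-angle rotations.

    audio_feats: (T, AUDIO_DIM) acoustic features per frame.
    text_feats:  (T, TEXT_DIM) semantic features per frame.
    Returns:     (T, NUM_JOINTS, 3) Euler-angle rotations per frame.
    """
    # Fuse both modalities by concatenation, then project to rotations.
    speech = np.concatenate([audio_feats, text_feats], axis=-1)
    rotations = speech @ W
    return rotations.reshape(-1, NUM_JOINTS, 3)

T = 100  # number of output motion frames
gestures = generate_gestures(
    rng.standard_normal((T, AUDIO_DIM)),
    rng.standard_normal((T, TEXT_DIM)),
)
print(gestures.shape)  # (100, 15, 3)
```

The point of the sketch is the interface, not the network: both modalities enter jointly, so a trained model can in principle associate beat gestures with acoustics and semantic gestures with the text embedding.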
