Shimon the Rapper: A Real-Time System for Human-Robot Interactive Rap Battles

Paper title

Shimon the Rapper: A Real-Time System for Human-Robot Interactive Rap Battles

Authors

Richard Savery, Lisa Zahray, Gil Weinberg

Abstract

We present a system for real-time lyrical improvisation between a human and a robot in the style of hip hop. Our system takes vocal input from a human rapper, analyzes the semantic meaning, and generates a response that is rapped back by a robot over a musical groove. Previous work with real-time interactive music systems has largely focused on instrumental output, and vocal interactions with robots have been explored, but not in a musical context. Our generative system includes custom methods for censorship, voice, rhythm, rhyming and a novel deep learning pipeline based on phoneme embeddings. The rap performances are accompanied by synchronized robotic gestures and mouth movements. Key technical challenges that were overcome in the system are developing rhymes, performing with low latency, and dataset censorship. We evaluated several aspects of the system through a survey of videos and sample text output. Analysis of comments showed that the overall perception of the system was positive. The model trained on our hip hop dataset was rated significantly higher than our metal dataset in coherence, rhyme quality, and enjoyment. Participants preferred outputs generated by a given input phrase over outputs generated from unknown keywords, indicating that the system successfully relates its output to its input.
