Paper title
Speak Like a Dog: Human to Non-human creature Voice Conversion
Paper authors
Paper abstract
This paper proposes a new voice conversion (VC) task, converting human speech to dog-like speech while preserving linguistic information, as an example of human to non-human creature voice conversion (H2NH-VC) tasks. Although most VC studies deal with human-to-human VC, H2NH-VC aims to convert human speech into non-human creature-like speech. Non-parallel VC makes H2NH-VC feasible, because it is impossible to collect a parallel dataset in which non-human creatures speak human language. In this study, we propose to use dogs as an example of a non-human creature target domain and define the "speak like a dog" task. To clarify the possibilities and characteristics of the "speak like a dog" task, we conducted a comparative experiment using existing representative non-parallel VC methods, varying the acoustic features (Mel-cepstral coefficients and Mel-spectrograms), network architectures (five different kernel-size settings), and training criteria (variational autoencoder (VAE)-based and generative adversarial network-based). Finally, the converted voices were evaluated using mean opinion scores for dog-likeness, sound quality, and intelligibility, together with the character error rate (CER). The experiment showed that using Mel-spectrograms improved the dog-likeness of the converted speech, but preserving linguistic information remained challenging. Challenges and limitations of the current VC methods for H2NH-VC are highlighted.
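For readers unfamiliar with the CER metric mentioned above, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein (edit) distance between a reference transcript and a recognized transcript, divided by the reference length. The function names `edit_distance` and `character_error_rate` are hypothetical and chosen for illustration. The abstract does not describe the paper's own evaluation pipeline, including how transcripts of the converted speech were obtained, so this should not be read as the authors' implementation.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a character of ref
                curr[j - 1] + 1,     # insert a character of hyp
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Illustrative strings only; these are not data from the paper.
    print(character_error_rate("hello", "hallo"))
```

A lower CER means the transcript of the converted speech is closer to the reference text, which is why it can serve as a proxy for how well linguistic information is preserved. Note that CER can exceed 1.0 when the hypothesis contains many inserted characters.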