Paper Title
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
Authors
Abstract
Accent plays a significant role in speech communication, affecting how easily speech is understood as well as conveying a speaker's identity. This paper introduces a novel and efficient framework for accented Text-to-Speech (TTS) synthesis based on a Conditional Variational Autoencoder (CVAE). The framework can synthesize the voice of a selected speaker and convert it to any desired target accent. Thorough experiments using both objective and subjective evaluations validate the effectiveness of the proposed framework. The results also show that the model performs remarkably well at manipulating accent in the synthesized speech. Overall, the proposed framework offers a promising avenue for future research on accented TTS.
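The abstract does not describe the model's internals. As a generic illustration of the CVAE building blocks it names, the following minimal sketch shows the reparameterization trick, the closed-form KL term of the training objective, and conditioning on an accent label. It assumes a diagonal-Gaussian posterior, a standard-normal prior, and a one-hot accent label concatenated to the latent vector before decoding. These choices and the helper names (`reparameterize`, `kl_to_standard_normal`, `condition_on_accent`) are hypothetical and are not taken from the paper.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    # Sample z = mu + sigma * eps with eps ~ N(0, 1), so the sample is a
    # deterministic function of (mu, log_var) plus external noise.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ),
    # the regularization term of the (C)VAE objective under a standard-normal prior.
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def condition_on_accent(z, accent_id, num_accents):
    # Hypothetical conditioning scheme: append a one-hot accent label to the
    # latent vector; the decoder would consume this concatenated input.
    one_hot = [0.0] * num_accents
    one_hot[accent_id] = 1.0
    return z + one_hot


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.2], [0.0, -1.0]
    z = reparameterize(mu, log_var, rng)
    print("KL:", kl_to_standard_normal(mu, log_var))
    # Converting accent, under this sketch's assumptions, means decoding the
    # same latent z with a different accent label.
    print(condition_on_accent(z, accent_id=0, num_accents=3))
    print(condition_on_accent(z, accent_id=2, num_accents=3))
```

In this sketch, keeping the speaker-related latent `z` fixed while swapping only the accent label is what would allow the same voice to be rendered in a different accent; whether the paper conditions its decoder this way is not stated in the abstract.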