Paper Title

Data Processing for Optimizing Naturalness of Vietnamese Text-to-speech System

Authors

Phung, Viet Lam; Kinh, Phan Huy; Dinh, Anh Tuan; Nguyen, Quoc Bao

Abstract

End-to-end text-to-speech (TTS) systems have proven highly successful when trained on a large amount of high-quality data recorded in an anechoic room with a high-quality microphone. Another approach is to use available sources of found data, such as radio broadcast news. We aim to optimize the naturalness of a TTS system trained on found data using a novel data processing method. The method includes 1) utterance selection and 2) prosodic punctuation insertion to prepare training data that can optimize the naturalness of TTS systems. We show that, using this data processing method, an end-to-end TTS system achieved a mean opinion score (MOS) of 4.1, compared to 4.3 for natural speech. We also show that punctuation insertion contributed the most to the result. To facilitate the research and development of TTS systems, we distribute the processed data of one speaker at https://forms.gle/6Hk5YkqgDxAaC2BU6.
