Paper Title
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis
Paper Authors
Abstract
This paper introduces Opencpop, a publicly available, high-quality Mandarin singing corpus designed for singing voice synthesis (SVS). The corpus consists of 100 popular Mandarin songs performed by a female professional singer. The audio files were recorded in studio quality at a sampling rate of 44,100 Hz, and the corresponding lyrics and musical scores are provided. All singing recordings have been phonetically annotated with phoneme boundaries and syllable (note) boundaries. To demonstrate the reliability of the released data and to provide a baseline for future research, we built deep neural network-based SVS baseline models and evaluated them with both objective metrics and subjective mean opinion score (MOS) tests. Experimental results show that the best SVS model trained on our database achieves a MOS of 3.70, indicating the reliability of the provided corpus. Opencpop is released through the open-source WeNet community, and the corpus, as well as synthesized demos, can be found on the project homepage.
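The abstract reports subjective quality as a mean opinion score. A MOS is conventionally the arithmetic mean of listener ratings on a 1 to 5 scale, and it is often reported together with a 95% confidence interval. The following minimal sketch illustrates that calculation. It is not the authors' evaluation script, and it assumes a normal approximation for the interval. The function name `mos_with_ci` and the sample ratings are hypothetical and are not data from the paper.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, CI half-width) for listener ratings on a 1-5 scale.

    Hypothetical helper: the interval uses a normal approximation,
    z * sample_stdev / sqrt(n), with z = 1.96 for roughly 95% coverage.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    m = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return m, half_width


# Hypothetical ratings for one system, for illustration only.
ratings = [4, 3, 4, 5, 3, 4, 3, 4]
m, ci = mos_with_ci(ratings)
print(f"MOS = {m:.2f} +/- {ci:.2f}")
```

In practice, ratings are usually pooled across all listeners and test utterances for each system before the score is computed. A wider interval signals that more ratings are needed before two systems can be reliably distinguished.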