Paper Title
NHSS: A Speech and Singing Parallel Database
Authors
Abstract
We present a database of parallel recordings of speech and singing, collected and released by the Human Language Technology (HLT) laboratory at the National University of Singapore (NUS), called the NUS-HLT Speak-Sing (NHSS) database. We release this database to the public to support research activities including, but not limited to, comparative studies of the acoustic attributes of speech and singing signals, cooperative synthesis of speech and singing voices, and speech-to-singing conversion. The database consists of recordings of sung vocals of English pop songs, the spoken counterparts of the song lyrics read by the singers in their natural reading manner, and manually prepared utterance-level and word-level annotations. The audio recordings in the NHSS database correspond to 100 songs sung and spoken by 10 singers, resulting in a total of 7 hours of audio data. There are 5 male and 5 female singers, each singing and reading the lyrics of 10 songs. In this paper, we discuss the design methodology of the database, analyse the similarities and dissimilarities in the characteristics of speech and singing voices, and provide some strategies to address the relationships between these characteristics for converting one to the other. We develop benchmark systems, which can be used as a reference for speech-to-singing alignment, spectral mapping, and conversion using the NHSS database.