Paper title
CN-Celeb: multi-genre speaker recognition
Authors
Abstract
Research on speaker recognition is extending to address vulnerability in in-the-wild conditions, among which genre mismatch is perhaps the most challenging: for instance, enrolling with read speech while testing with conversational or singing audio. This mismatch leads to complex and composite inter-session variations, both intrinsic (e.g., speaking style, physiological status) and extrinsic (e.g., recording device, background noise). Unfortunately, the few existing multi-genre corpora are not only limited in size but also recorded under controlled conditions, so they cannot support conclusive research on the multi-genre problem. In this work, we first publish CN-Celeb, a large-scale multi-genre corpus that includes in-the-wild speech utterances of 3,000 speakers in 11 different genres. Second, using this dataset, we conduct a comprehensive study of the multi-genre phenomenon, in particular the impact of the multi-genre challenge on speaker recognition and the performance gain when the new dataset is used for multi-genre training.