Title
Disentangled speaker and nuisance attribute embedding for robust speaker verification
Authors
Abstract
In recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, as with most classical embedding techniques, deep learning-based methods are known to suffer severe performance degradation when dealing with speech samples under different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector disentangled from the variability caused by nuisance attributes. The proposed framework was compared with conventional deep learning-based embedding methods using the RSR2015 and VoxCeleb1 datasets. Experimental results show that the proposed approach can extract speaker embeddings that are robust to channel and emotional variability.