Disentangled speaker and nuisance attribute embedding for robust speaker verification

Paper title

Disentangled speaker and nuisance attribute embedding for robust speaker verification

Authors

Kang, Woo Hyun, Mun, Sung Hwan, Han, Min Hyun, Kim, Nam Soo

Abstract

In recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, like most classical embedding techniques, deep learning-based methods are known to suffer severe performance degradation when dealing with speech samples under different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector disentangled from the variability caused by nuisance attributes. The proposed framework was compared with conventional deep learning-based embedding methods using the RSR2015 and VoxCeleb1 datasets. Experimental results show that the proposed approach can extract speaker embeddings that are robust to channel and emotional variability.
