Paper Title
DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion
Authors
Abstract
The widespread adoption of speech-based online services raises security and privacy concerns regarding the data that they use and share. If the data were compromised, attackers could exploit user speech to bypass speaker verification systems or even impersonate users. To mitigate this, we propose DeID-VC, a speaker de-identification system that converts a real speaker to pseudo speakers, thus removing or obfuscating the speaker-dependent attributes from a spoken voice. The key components of DeID-VC include a Variational Autoencoder (VAE) based Pseudo Speaker Generator (PSG) and a voice conversion Autoencoder (AE) under zero-shot settings. With the help of PSG, DeID-VC can assign unique pseudo speakers at speaker level or even at utterance level. Also, two novel learning objectives are added to bridge the gap between training and inference of zero-shot voice conversion. We present our experimental results with word error rate (WER) and equal error rate (EER), along with three subjective metrics to evaluate the generated output of DeID-VC. The results show that our method substantially improved intelligibility (WER 10% lower) and de-identification effectiveness (EER 5% higher) compared to our baseline. Code and listening demo: https://github.com/a43992899/DeID-VC
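The abstract names the Pseudo Speaker Generator but does not describe its internals. The sketch below is hypothetical and shows one way a VAE-based generator could assign pseudo speakers at the two granularities mentioned. It samples a latent vector z from the VAE's standard normal prior and decodes that vector into a speaker embedding. At speaker level, the sample is seeded deterministically from the speaker ID. At utterance level, it is drawn fresh for every call. The decoder is a fixed random linear map that stands in for a trained network. The names `decode`, `pseudo_speaker`, `SECRET_KEY`, `LATENT_DIM` and `EMBED_DIM` are illustrative assumptions, not part of the DeID-VC code.

```python
import hashlib
import hmac
import math
import random

LATENT_DIM = 4   # hypothetical latent size of the VAE
EMBED_DIM = 8    # hypothetical speaker-embedding size
SECRET_KEY = b"replace-with-a-private-key"  # hypothetical secret for speaker-level seeding

# Hypothetical stand-in for a trained VAE decoder: a fixed random linear map.
_init_rng = random.Random(0)
DECODER_W = [[_init_rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
             for _ in range(EMBED_DIM)]


def decode(z):
    """Map a latent vector z to a unit-norm pseudo speaker embedding."""
    out = [sum(w * zi for w, zi in zip(row, z)) for row in DECODER_W]
    norm = math.sqrt(sum(v * v for v in out)) or 1.0
    return [v / norm for v in out]


def sample_prior(rng):
    """Draw z ~ N(0, I), the standard normal prior of a VAE."""
    return [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]


def speaker_seed(speaker_id):
    """Derive a stable, key-dependent integer seed from a speaker ID."""
    digest = hmac.new(SECRET_KEY, speaker_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def pseudo_speaker(speaker_id, level="speaker", rng=None):
    """Return a pseudo speaker embedding at speaker or utterance granularity."""
    if level == "speaker":
        # Same real speaker -> same seed -> same pseudo speaker on every call.
        z = sample_prior(random.Random(speaker_seed(speaker_id)))
    elif level == "utterance":
        # Independent draw per call -> a new pseudo speaker for each utterance.
        z = sample_prior(rng if rng is not None else random.Random())
    else:
        raise ValueError(f"unknown level: {level!r}")
    return decode(z)
```

For example, `pseudo_speaker("alice")` returns the same embedding on every call. `pseudo_speaker("alice", level="utterance")` returns a different one each time. The per-speaker seed is keyed with HMAC rather than a plain hash, so someone who knows a speaker ID cannot regenerate that speaker's pseudo identity without the secret key. In a pipeline like the one described, the resulting embedding would presumably condition the voice conversion decoder in place of the real speaker's embedding.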
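EER is the operating point of a speaker verification system at which the false acceptance rate (FAR) equals the false rejection rate (FRR). For de-identification, a higher EER is better, because the verifier can no longer reliably match converted speech to the real speaker. The following minimal sketch approximates EER by sweeping every observed score as a decision threshold. It assumes similarity scores where a higher value means "same speaker". This is a generic illustration of the metric, not the paper's evaluation script, and the function name `equal_error_rate` is hypothetical.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from similarity scores (higher = more likely same speaker).

    Tries every observed score as a threshold and returns the average of FAR
    and FRR at the threshold where the two rates are closest.
    """
    if not genuine or not impostor:
        raise ValueError("need both genuine and impostor scores")
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine trials rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

Perfectly separated scores give an EER of 0, which means the verifier always identifies the speaker. Completely overlapping scores push the EER toward 0.5, which is the regime a de-identification system aims for. Production toolkits usually interpolate along the ROC curve instead of taking the nearest observed threshold.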