Paper Title

NWPU-ASLP System for the VoicePrivacy 2022 Challenge

Authors

Jixun Yao, Qing Wang, Li Zhang, Pengcheng Guo, Yuhao Liang, Lei Xie

Abstract

This paper presents the NWPU-ASLP speaker anonymization system for the VoicePrivacy 2022 Challenge. Our submission does not use any additional automatic speaker verification (ASV) model or x-vector pool. The system consists of four modules: a feature extractor, an acoustic model, an anonymization module, and a neural vocoder. First, the feature extractor extracts the phonetic posteriorgram (PPG) and pitch from the input speech signal. Then, we reserve a pseudo speaker ID in a speaker look-up table (LUT) and feed it into a speaker encoder to generate a pseudo speaker embedding that does not correspond to any real speaker. To ensure the pseudo speaker is distinguishable, we further average the embeddings of randomly selected speakers and concatenate the result, with weighting, with the pseudo speaker embedding to generate the anonymized speaker embedding. Finally, the acoustic model outputs the anonymized mel-spectrogram conditioned on the anonymized speaker embedding, and a modified version of HiFi-GAN transforms the mel-spectrogram into the anonymized speech waveform. Experimental results demonstrate the effectiveness of the proposed anonymization system.
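The anonymization step described in the abstract (averaging randomly selected speaker embeddings, then combining the result with the pseudo speaker embedding by weighted concatenation) can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the function names, the `weight` and `num_selected` parameters, and the choice to scale the two halves by `weight` and `1 - weight` before concatenating. It is not the authors' implementation.

```python
import random


def average_embeddings(embeddings):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(embeddings[0])
    count = len(embeddings)
    return [sum(vec[i] for vec in embeddings) / count for i in range(dim)]


def anonymize_embedding(pseudo_emb, speaker_pool, num_selected=2,
                        weight=0.5, seed=None):
    """Hypothetical weighted concatenation (assumed form, not from the paper).

    pseudo_emb   : embedding generated from the reserved pseudo speaker ID
    speaker_pool : embeddings of real speakers to draw from at random
    num_selected : how many speakers to average (assumed parameter)
    weight       : scale applied to the pseudo half; (1 - weight) is applied
                   to the averaged half (assumed weighting scheme)
    """
    rng = random.Random(seed)
    # Draw distinct speakers at random and average their embeddings.
    selected = rng.sample(speaker_pool, num_selected)
    averaged = average_embeddings(selected)
    # Scale each half, then concatenate into one longer vector.
    scaled_pseudo = [weight * x for x in pseudo_emb]
    scaled_avg = [(1.0 - weight) * x for x in averaged]
    return scaled_pseudo + scaled_avg


if __name__ == "__main__":
    pool = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pseudo = [2.0, 2.0]
    anonymized = anonymize_embedding(pseudo, pool, num_selected=3, weight=0.5)
    print(len(anonymized), anonymized)
```

In this sketch the output dimension is the sum of the two input dimensions, so a downstream acoustic model would need to accept the longer vector. Whether the actual system weights, projects, or normalizes the two parts differently is not stated in the abstract.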
