Learning to fool the speaker recognition

Paper title

Learning to fool the speaker recognition

Authors

Jiguo Li, Xinfeng Zhang, Jizheng Xu, Li Zhang, Yue Wang, Siwei Ma, Wen Gao

Abstract

Due to the widespread deployment of fingerprint, face, and speaker recognition systems, attacks on deep learning based biometric systems have drawn increasing attention. Previous research mainly studied attacks on vision-based systems, such as fingerprint and face recognition. Attacks on speaker recognition, however, have not yet been investigated, even though speaker recognition is widely used in daily life. In this paper, we attempt to fool a state-of-the-art speaker recognition model and present the \textit{speaker recognition attacker}, a lightweight model that fools a deep speaker recognition model by adding imperceptible perturbations to the raw speech waveform. We find that the speaker recognition system is also vulnerable to such attacks, and we achieve a high success rate on the non-targeted attack. We also present an effective method for optimizing the speaker recognition attacker to trade off the attack success rate against perceptual quality. Experiments on the TIMIT dataset show that we can achieve a sentence error rate of $99.2\%$ with an average SNR of $57.2\text{dB}$ and a PESQ of 4.2, while running faster than real time.
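
To make the SNR figure in the abstract more concrete, the following is a minimal sketch of how the signal-to-noise ratio of a perturbed waveform could be measured. It assumes the common definition, $10\log_{10}$ of clean-signal power over perturbation power; the abstract does not state the exact computation used in the paper. The function name `snr_db`, the synthetic sine tone standing in for speech, and the perturbation bound `eps` are all hypothetical, and the random noise here is only a placeholder for the learned perturbation.

```python
import math
import random


def snr_db(clean, perturbed):
    """Return the SNR in dB, treating (perturbed - clean) as the noise."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((p - x) ** 2 for x, p in zip(clean, perturbed))
    if noise_power == 0:
        return math.inf  # identical waveforms: no perturbation at all
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical stand-in for one second of 16 kHz speech: a 220 Hz sine tone.
rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 220 * n / rate) for n in range(rate)]

# Hypothetical perturbation: small uniform noise bounded by eps.
# A trained attacker would produce a structured perturbation instead.
rng = random.Random(0)
eps = 0.001
perturbed = [x + rng.uniform(-eps, eps) for x in clean]

print(f"SNR: {snr_db(clean, perturbed):.1f} dB")
```

A higher SNR means the perturbation carries less energy relative to the speech, which is why the abstract pairs the error rate with SNR and PESQ as measures of how perceptible the attack is.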
