Paper title

Subband modeling for spoofing detection in automatic speaker verification

Authors

Bhusan Chettri, Tomi Kinnunen, Emmanouil Benetos

Abstract

Spectrograms, time-frequency representations of audio signals, have found widespread use in neural network-based spoofing detection. While deep models are trained on the fullband spectrum of the signal, we argue that not all frequency bands are useful for these tasks. In this paper, we systematically investigate the impact of different subbands and their importance on replay spoofing detection on two benchmark datasets: ASVspoof 2017 v2.0 and ASVspoof 2019 PA. We propose a joint subband modelling framework that employs n different sub-networks to learn subband-specific features. These are later combined and passed to a classifier, and the whole network's weights are updated during training. Our findings on the ASVspoof 2017 dataset suggest that the most discriminative information appears to be in the first and the last 1 kHz frequency bands, and the joint model trained on these two subbands shows the best performance, outperforming the baselines by a large margin. However, these findings do not generalise to the ASVspoof 2019 PA dataset. This suggests that the datasets available for training these models do not reflect real-world replay conditions, pointing to a need for careful design of datasets for training replay spoofing countermeasures.
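
The joint subband idea described in the abstract can be illustrated with a minimal, framework-free Python sketch. It is not the authors' implementation: the 16 kHz sample rate, the 512-point FFT, the names `band_to_bins`, `SubNet` and `JointSubbandModel`, and the toy one-layer sub-networks are all assumptions made for illustration. The sketch only shows the forward pass, where each sub-network sees its own frequency band and the resulting features are concatenated for one classifier. Joint training, in which gradients from the classifier update every sub-network, is omitted and would normally be done with a deep learning framework.

```python
import math
import random


def band_to_bins(f_lo_hz, f_hi_hz, sample_rate, n_fft):
    """Map a band [f_lo_hz, f_hi_hz) to half-open STFT bin indices [lo, hi)."""
    hz_per_bin = sample_rate / n_fft
    n_bins = n_fft // 2 + 1
    lo = math.floor(f_lo_hz / hz_per_bin)
    hi = math.ceil(f_hi_hz / hz_per_bin)
    return max(lo, 0), min(hi, n_bins)


class SubNet:
    """Toy sub-network (assumption): time-average, then one linear layer + ReLU."""

    def __init__(self, n_bins, emb_dim, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_bins)]
                  for _ in range(emb_dim)]
        self.b = [0.0] * emb_dim

    def forward(self, subband):
        # subband: list of frames, each a list of magnitudes for this band only
        n_frames = len(subband)
        mean = [sum(frame[k] for frame in subband) / n_frames
                for k in range(len(subband[0]))]
        return [max(0.0, sum(wi * x for wi, x in zip(row, mean)) + b)
                for row, b in zip(self.w, self.b)]


class JointSubbandModel:
    """One sub-network per band; features are concatenated for one classifier."""

    def __init__(self, bands_hz, sample_rate, n_fft, emb_dim=4, seed=0):
        rng = random.Random(seed)
        self.slices = [band_to_bins(lo, hi, sample_rate, n_fft)
                       for lo, hi in bands_hz]
        self.subnets = [SubNet(hi - lo, emb_dim, rng) for lo, hi in self.slices]
        self.clf_w = [rng.uniform(-0.1, 0.1)
                      for _ in range(emb_dim * len(bands_hz))]
        self.clf_b = 0.0

    def forward(self, spec):
        # spec: list of frames, each with n_fft // 2 + 1 magnitude bins
        feats = []
        for (lo, hi), net in zip(self.slices, self.subnets):
            feats.extend(net.forward([frame[lo:hi] for frame in spec]))
        logit = sum(w * x for w, x in zip(self.clf_w, feats)) + self.clf_b
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid score in (0, 1)


# Demo on a random "spectrogram": first and last 1 kHz at an assumed 16 kHz rate.
demo_rng = random.Random(1)
spec = [[demo_rng.random() for _ in range(257)] for _ in range(10)]
model = JointSubbandModel([(0, 1000), (7000, 8000)], sample_rate=16000, n_fft=512)
score = model.forward(spec)
```

With these assumed settings each 1 kHz band covers 32 frequency bins, so the two sub-networks together see 64 of the 257 bins in every frame. Because the weights here are random and untrained, `score` carries no meaning beyond showing the data flow.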
