Paper Title
Phase-Aware Spoof Speech Detection Based on Res2Net with Phase Network
Paper Authors
Abstract
Spoof speech detection (SSD) is an essential countermeasure for automatic speaker verification systems. Although SSD with magnitude features in the frequency domain has shown promising results, phase information can also be important for capturing the artefacts of certain types of spoofing attacks. Thus, both magnitude and phase features must be considered to ensure generalization to diverse types of spoofing attacks. In this paper, we investigate why the feature-level fusion used in previous works fails. Through entropy analysis, we find that the difference in randomness between magnitude and phase features is large, which can disrupt feature-level fusion in the backend neural network. We therefore propose a phase network to reduce that difference. Our SSD system, a Res2Net equipped with the phase network, achieved significant performance improvement, specifically on the spoofing attack for which phase information is considered to be important. We also demonstrate our SSD system in both known-kind and unknown-kind SSD scenarios for practical applications.
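As a rough illustration of the entropy comparison mentioned in the abstract, the sketch below estimates the Shannon entropy of log-magnitude and phase values taken from a short-time Fourier analysis of a signal. It is not the authors' procedure: the abstract does not specify the entropy estimator, so the histogram-based estimate, the frame length, the hop size, the bin count, and all function names (`dft_frame`, `histogram_entropy`, `magnitude_phase_entropy`) are assumptions made only for this example.

```python
import cmath
import math
import random


def dft_frame(frame):
    """Naive DFT of one real-valued frame; returns bins 0..N/2."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def histogram_entropy(values, lo, hi, bins=16):
    """Shannon entropy in bits of values binned into equal-width bins on [lo, hi]."""
    if hi <= lo:
        return 0.0  # all values identical: no randomness to measure
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def magnitude_phase_entropy(signal, frame_len=64, hop=32, bins=16):
    """Return (log-magnitude entropy, phase entropy) over all frames and bins.

    Assumed setup for illustration: Hann-windowed frames and a histogram
    estimator; the paper's actual analysis may differ.
    """
    mags, phases = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len))
            for i, x in enumerate(frame)
        ]
        for c in dft_frame(windowed):
            mags.append(math.log(abs(c) + 1e-8))  # small floor avoids log(0)
            phases.append(cmath.phase(c))         # principal value in [-pi, pi]
    h_mag = histogram_entropy(mags, min(mags), max(mags), bins)
    h_phase = histogram_entropy(phases, -math.pi, math.pi, bins)
    return h_mag, h_phase


# Synthetic test signal (hypothetical): a sinusoid with a little Gaussian noise.
random.seed(0)
signal = [math.sin(2 * math.pi * 0.1 * t) + 0.01 * random.gauss(0, 1)
          for t in range(512)]
h_mag, h_phase = magnitude_phase_entropy(signal)
print(f"log-magnitude entropy: {h_mag:.3f} bits, phase entropy: {h_phase:.3f} bits")
```

Both estimates are bounded above by log2(bins), so the two numbers are directly comparable. A large gap between them on real speech features would correspond to the kind of randomness difference the abstract describes.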