Paper title
Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition
Authors
Abstract
Speech emotion recognition (SER) systems can achieve high accuracy when the training and test data are identically distributed, but this assumption is frequently violated in practice, and the performance of SER systems plummets under unforeseen data shifts. Designing robust models for accurate SER is challenging, which limits their use in practical applications. In this paper, we propose a deeper neural network architecture in which we fuse DenseNet, LSTM, and Highway Network to learn powerful discriminative features that are robust to noise. We also propose data augmentation with our network architecture to further improve robustness. We comprehensively evaluate the architecture coupled with data augmentation against (1) noise, (2) adversarial attacks, and (3) cross-corpus settings. Our evaluations on the widely used IEMOCAP and MSP-IMPROV datasets show promising results when compared with existing studies and state-of-the-art models.
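The abstract names a Highway Network as one of the fused components but does not say how it is configured. As a rough illustration only, the sketch below shows the generic highway gating rule, y = T(x) * H(x) + (1 - T(x)) * x, for a single layer in plain Python. It is not the authors' implementation: the function names, the tanh transform, and the toy weights are all assumptions made for this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(x, weights, bias):
    """Compute weights @ x + bias, with weights given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """One highway layer: y = T(x) * H(x) + (1 - T(x)) * x.

    H is the transform path (tanh here, an assumption for this sketch)
    and T is the sigmoid transform gate. Input and output sizes match,
    so the carry path (1 - T) can pass x through unchanged.
    """
    h = [math.tanh(v) for v in affine(x, w_h, b_h)]
    t = [sigmoid(v) for v in affine(x, w_t, b_t)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


# Toy usage with hypothetical weights: a strongly negative gate bias
# closes the transform gate, so the layer approximately copies its input.
x = [0.5, -1.0, 2.0]
zeros = [[0.0] * 3 for _ in range(3)]
carried = highway_layer(x, zeros, [0.3] * 3, zeros, [-30.0] * 3)
transformed = highway_layer(x, zeros, [0.3] * 3, zeros, [30.0] * 3)
```

With the gate closed, `carried` stays close to `x`; with the gate open, `transformed` is close to `tanh(0.3)` in every position. This carry path is what lets highway layers be stacked deeply without blocking the flow of information through the network.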