Paper Title
MMER: Multimodal Multi-task Learning for Speech Emotion Recognition
Authors
Abstract
In this paper, we propose MMER, a novel Multimodal Multi-task learning approach for Speech Emotion Recognition. MMER leverages a novel multimodal network based on early-fusion and cross-modal self-attention between text and acoustic modalities and solves three novel auxiliary tasks for learning emotion recognition from spoken utterances. In practice, MMER outperforms all our baselines and achieves state-of-the-art performance on the IEMOCAP benchmark. Additionally, we conduct extensive ablation studies and results analysis to prove the effectiveness of our proposed approach.
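The abstract's core mechanism is cross-modal self-attention, where one modality's representations attend over the other's. As a rough illustration only (not the paper's actual network, which combines early fusion with learned projections and multi-head attention over wav2vec 2.0 and text encoder features), here is a minimal single-head sketch in NumPy in which hypothetical text-token embeddings query hypothetical acoustic-frame embeddings:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_emb, audio_emb):
    """Single-head cross-modal attention (illustrative sketch).

    Text embeddings act as queries; acoustic embeddings act as
    keys and values, so each text token gathers a weighted summary
    of the audio frames. Real implementations add learned Q/K/V
    projections and multiple heads.
    """
    d_k = text_emb.shape[-1]
    scores = text_emb @ audio_emb.T / np.sqrt(d_k)  # (n_text, n_audio)
    weights = softmax(scores, axis=-1)              # rows sum to 1
    return weights @ audio_emb                      # (n_text, d_k)

# Toy inputs: 5 text tokens and 20 acoustic frames, both 64-dim.
text_emb = np.random.randn(5, 64)
audio_emb = np.random.randn(20, 64)
fused = cross_modal_attention(text_emb, audio_emb)
print(fused.shape)  # (5, 64)
```

The output keeps the text sequence length but carries acoustic information, which is what lets a downstream emotion classifier condition on both modalities jointly.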