Paper Title
ICANet: A Method of Short Video Emotion Recognition Driven by Multimodal Data
Paper Authors
Paper Abstract
With the rapid development of artificial intelligence and short videos, emotion recognition in short videos has become one of the most important research topics in human-computer interaction. At present, most emotion recognition methods still rely on a single modality. However, in daily life, people often disguise their real emotions, so the accuracy of single-modal emotion recognition is relatively poor. Moreover, similar emotions are not easy to distinguish. Therefore, we propose a new approach, called ICANet, that achieves multimodal short video emotion recognition by employing three different modalities, audio, video and optical flow, compensating for the limitations of a single modality and thereby improving the accuracy of emotion recognition in short videos. ICANet achieves an accuracy of 80.77% on the IEMOCAP benchmark, exceeding the SOTA methods by 15.89%.