Title
Your "Attention" Deserves Attention: A Self-Diversified Multi-Channel Attention for Facial Action Analysis
Authors
Abstract
Visual attention has been extensively studied for learning fine-grained features in both facial expression recognition (FER) and Action Unit (AU) detection. A broad range of previous research has explored how to use attention modules to localize detailed facial parts (e.g., facial action units), learn discriminative features, and learn inter-class correlations. However, few related works pay attention to the robustness of the attention module itself. Through experiments, we found that neural attention maps initialized with different feature maps yield diverse representations when learning to attend to the same Region of Interest (ROI). In other words, similar to general feature learning, the representational quality of attention maps greatly affects the performance of a model, which means that unconstrained attention learning involves considerable randomness. This uncertainty causes conventional attention learning to fall into sub-optimal solutions. In this paper, we propose a compact model that enhances the representational and focusing power of neural attention maps and learns the "inter-attention" correlation among refined attention maps, which we term the "Self-Diversified Multi-Channel Attention Network (SMA-Net)". The proposed method is evaluated on two benchmark databases (BP4D and DISFA) for AU detection and on four databases (CK+, MMI, BU-3DFE, and BP4D+) for facial expression recognition. It achieves superior performance compared to state-of-the-art methods.
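To make the multi-channel attention idea concrete, here is a minimal NumPy sketch of parallel spatial attention channels with a diversity penalty that discourages channels from attending to the same region. All names (`multi_channel_attention`, `diversity_penalty`) and the cosine-similarity form of the penalty are illustrative assumptions for exposition, not the authors' SMA-Net implementation.

```python
import numpy as np

def softmax2d(x):
    # Spatial softmax over an (H, W) map so each attention map sums to 1
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_channel_attention(feats, weights):
    """Sketch: feats (C, H, W), weights (K, C) -> K spatial attention maps.

    Each channel projects the features to a scalar map, normalizes it
    spatially, and weights the features; the K attended features are fused
    by averaging (an assumed fusion rule, for illustration only).
    """
    k = weights.shape[0]
    maps = np.stack([softmax2d(np.tensordot(weights[i], feats, axes=1))
                     for i in range(k)])            # (K, H, W)
    attended = maps[:, None] * feats[None]          # (K, C, H, W)
    return maps, attended.mean(axis=0)              # fused features (C, H, W)

def diversity_penalty(maps):
    """Penalize pairwise cosine similarity between attention channels,
    pushing the K maps toward distinct regions of interest."""
    k = maps.shape[0]
    flat = maps.reshape(k, -1)
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    gram = flat @ flat.T                            # (K, K) cosine similarities
    off = gram - np.eye(k)                          # keep only cross-channel terms
    return (off ** 2).mean()
```

Adding `diversity_penalty` to the training loss is one simple way to constrain attention learning so that differently initialized channels do not collapse onto identical representations, the failure mode the abstract describes.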