Paper Title
Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis
Paper Authors
Paper Abstract
Human language can be expressed through multiple sources of information known as modalities, including tone of voice, facial gestures, and spoken language. Recent multimodal learning methods, despite strong performance on human-centric tasks such as sentiment analysis and emotion recognition, are often black boxes with very limited interpretability. In this paper, we propose Multimodal Routing, which dynamically adjusts the weights between input modalities and output representations separately for each input sample. Multimodal Routing can identify the relative importance of both individual modalities and cross-modality features. Moreover, the weight assignment produced by routing allows us to interpret modality-prediction relationships not only globally (i.e., general trends over the whole dataset) but also locally for each single input sample, while maintaining competitive performance compared to state-of-the-art methods.
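To make the per-sample weight assignment described in the abstract concrete, below is a minimal sketch of a routing-by-agreement update between modality features (unimodal and cross-modal) and output concept representations. The function name, dimensions, number of iterations, and the exact update rule are illustrative assumptions, not the paper's precise formulation; the intent is only to show how routing weights can be computed separately for each input sample and then read off for interpretation.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def multimodal_routing(features, W, num_iters=3):
    """Illustrative dynamic routing between modality features and output concepts.

    features : (n_feat, d_in)   one row per unimodal / cross-modal feature of a sample
    W        : (n_feat, n_out, d_in, d_out)   per feature-concept projection matrices
    Returns the routing weights r (n_feat, n_out) and concept vectors (n_out, d_out).
    """
    n_feat, n_out = W.shape[0], W.shape[1]
    # Project every feature toward every output concept ("votes").
    votes = np.einsum('fi,foij->foj', features, W)        # (n_feat, n_out, d_out)
    logits = np.zeros((n_feat, n_out))                     # start from uniform routing
    r = softmax(logits, axis=1)
    for _ in range(num_iters):
        r = softmax(logits, axis=1)                        # per-feature weights over concepts
        concepts = (r[..., None] * votes).sum(axis=0)      # weighted aggregation per concept
        concepts /= np.linalg.norm(concepts, axis=-1, keepdims=True) + 1e-8
        # Increase the weight of feature-concept pairs whose votes agree with the concept.
        logits += np.einsum('foj,oj->fo', votes, concepts)
    return r, concepts


# Hypothetical usage: 3 unimodal + 3 bimodal features, 2 output concepts.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 16))
W = rng.normal(size=(6, 2, 16, 8)) * 0.1
r, concepts = multimodal_routing(feats, W)
print(r)  # local interpretation: this sample's feature-to-concept routing weights
```

In this sketch, inspecting `r` for one sample gives the local interpretation (which modality or cross-modality feature drove which output for that input), while averaging `r` over a dataset would give the global trends the abstract refers to.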