Paper Title
Uncertainty based Class Activation Maps for Visual Question Answering
Paper Authors
Paper Abstract
Understanding and explaining deep learning models is an imperative task. Towards this, we propose a method that obtains gradient-based certainty estimates that also provide visual attention maps. In particular, we address the visual question answering task. We incorporate modern probabilistic deep learning methods, which we further improve by using the gradients of these estimates. This has a two-fold benefit: a) certainty estimates that correlate better with misclassified samples, and b) improved attention maps that achieve state-of-the-art results in terms of correlation with human attention regions. The improved attention maps yield consistent improvements across various visual question answering methods. The proposed technique can therefore be thought of as a recipe for obtaining improved certainty estimates and explanations for deep learning models. We provide a detailed empirical analysis for the visual question answering task on all standard benchmarks, along with a comparison against state-of-the-art methods.