Paper Title
Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs
Paper Authors
Paper Abstract
To tackle increasingly complex tasks, it has become an essential ability of neural networks to learn abstract representations. These task-specific representations and, particularly, the invariances they capture turn neural networks into black box models that lack interpretability. To open such a black box, it is, therefore, crucial to uncover the different semantic concepts a model has learned as well as those that it has learned to be invariant to. We present an approach based on INNs that (i) recovers the task-specific, learned invariances by disentangling the remaining factor of variation in the data and that (ii) invertibly transforms these recovered invariances combined with the model representation into an equally expressive one with accessible semantic concepts. As a consequence, neural network representations become understandable by providing the means to (i) expose their semantic meaning, (ii) semantically modify a representation, and (iii) visualize individual learned semantic concepts and invariances. Our invertible approach significantly extends the abilities to understand black box models by enabling post-hoc interpretations of state-of-the-art networks without compromising their performance. Our implementation is available at https://compvis.github.io/invariances/ .
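The following is a minimal conceptual sketch (not the authors' implementation, which is linked above) of the core idea: an invertible network that maps a model representation z together with recovered invariances v into an equally expressive representation e whose coordinates are meant to expose semantic concepts, and that can be inverted exactly to recover (z, v). It assumes PyTorch and a RealNVP-style affine coupling architecture; all names (SemanticINN, AffineCoupling) and dimensionalities are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn


class AffineCoupling(nn.Module):
    """One RealNVP-style affine coupling block; exactly invertible."""

    def __init__(self, dim, hidden=256):
        super().__init__()
        assert dim % 2 == 0, "use an even latent dimensionality"
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),  # predicts scale and shift for the other half
        )

    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        scale, shift = self.net(a).chunk(2, dim=1)
        b = b * torch.exp(torch.tanh(scale)) + shift  # invertible affine transform
        return torch.cat([b, a], dim=1)               # swap halves so both get mixed

    def inverse(self, y):
        b, a = y.chunk(2, dim=1)
        scale, shift = self.net(a).chunk(2, dim=1)
        b = (b - shift) * torch.exp(-torch.tanh(scale))
        return torch.cat([a, b], dim=1)


class SemanticINN(nn.Module):
    """Invertible map t: (z, v) -> e, where z is the task-specific representation,
    v the recovered invariances, and e an equally expressive representation whose
    coordinates are intended to align with accessible semantic concepts."""

    def __init__(self, z_dim, v_dim, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [AffineCoupling(z_dim + v_dim) for _ in range(depth)]
        )

    def forward(self, z, v):
        e = torch.cat([z, v], dim=1)
        for block in self.blocks:
            e = block(e)
        return e

    def inverse(self, e):
        for block in reversed(self.blocks):
            e = block.inverse(e)
        return e  # concatenation of (z, v); split with torch.split if needed


if __name__ == "__main__":
    z = torch.randn(8, 64)   # e.g. features from a pretrained black-box classifier
    v = torch.randn(8, 64)   # recovered invariances (random stand-in here)
    t = SemanticINN(z_dim=64, v_dim=64)
    e = t(z, v)              # semantic representation, same dimensionality as (z, v)
    zv = t.inverse(e)        # exact reconstruction up to numerical precision
    print(torch.allclose(torch.cat([z, v], dim=1), zv, atol=1e-5))
```

Because the map is bijective, editing individual coordinates of e and inverting back gives semantically modified inputs to the downstream decoder without losing any information from z, which is what makes post-hoc interpretation possible without retraining or degrading the analyzed network.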