Neural Activation Patterns (NAPs): Visual Explainability of Learned Concepts

Paper Title

Neural Activation Patterns (NAPs): Visual Explainability of Learned Concepts

Authors

Alex Bäuerle, Daniel Jönsson, Timo Ropinski

Abstract

A key to deciphering the inner workings of neural networks is understanding what a model has learned. Promising methods for discovering learned features are based on analyzing activation values, whereby current techniques focus on analyzing high activation values to reveal interesting features on a neuron level. However, analyzing high activation values limits layer-level concept discovery. We present a method that instead takes into account the entire activation distribution. By extracting similar activation profiles within the high-dimensional activation space of a neural network layer, we find groups of inputs that are treated similarly. These input groups represent neural activation patterns (NAPs) and can be used to visualize and interpret learned layer concepts. We release a framework with which NAPs can be extracted from pre-trained models and provide a visual introspection tool that can be used to analyze NAPs. We tested our method with a variety of networks and show how it complements existing methods for analyzing neural network activation values.
