Paper Title

HINT: Hierarchical Neuron Concept Explainer

Authors

Andong Wang, Wei-Ning Lee, Xiaojuan Qi

Abstract

To interpret deep networks, one main approach is to associate neurons with human-understandable concepts. However, existing methods often ignore the inherent relationships between different concepts (e.g., dog and cat both belong to animals), and thus lose the chance to explain neurons responsible for higher-level concepts (e.g., animal). In this paper, we study hierarchical concepts inspired by the hierarchical cognition process of human beings. To this end, we propose HIerarchical Neuron concepT explainer (HINT) to effectively build bidirectional associations between neurons and hierarchical concepts in a low-cost and scalable manner. HINT enables us to systematically and quantitatively study whether and how the implicit hierarchical relationships of concepts are embedded into neurons, such as identifying collaborative neurons responsible for one concept and multimodal neurons for different concepts, at different semantic levels from concrete concepts (e.g., dog) to more abstract ones (e.g., animal). Finally, we verify the faithfulness of the associations using Weakly Supervised Object Localization, and demonstrate their applicability in various tasks such as discovering saliency regions and explaining adversarial attacks. Code is available at https://github.com/AntonotnaWang/HINT.
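To make the neuron-to-hierarchical-concept association concrete, below is a minimal Python sketch of the general idea only, not HINT's actual algorithm: leaf concept labels (dog, cat) are rolled up a hand-written hierarchy to abstract concepts (animal), and each neuron is scored per concept by comparing its mean activation on images containing that concept against the remaining images. The HIERARCHY table, the neuron_concept_scores function, and the random activations are all hypothetical stand-ins for the paper's learned associations.

import numpy as np

# Hypothetical two-level concept hierarchy: leaf concepts roll up to abstract ones.
HIERARCHY = {"dog": "animal", "cat": "animal", "car": "vehicle", "bus": "vehicle"}

def concept_labels(leaf_labels):
    """Expand each image's leaf concept into the set {leaf, ancestor}."""
    return [{leaf, HIERARCHY[leaf]} for leaf in leaf_labels]

def neuron_concept_scores(activations, leaf_labels):
    """Score each (neuron, concept) pair by how much more the neuron fires
    on images containing the concept than on images without it.

    activations: (num_images, num_neurons) array of pooled neuron responses.
    leaf_labels: list of leaf concept names, one per image.
    """
    labels = concept_labels(leaf_labels)
    concepts = sorted({c for s in labels for c in s})
    scores = {}
    for c in concepts:
        mask = np.array([c in s for s in labels])
        # Mean activation difference: images with the concept vs. without it.
        scores[c] = activations[mask].mean(axis=0) - activations[~mask].mean(axis=0)
    return scores  # scores[c][j]: association strength of neuron j with concept c

# Toy usage: 6 images, 4 neurons, random activations.
rng = np.random.default_rng(0)
acts = rng.random((6, 4))
labels = ["dog", "cat", "dog", "car", "bus", "cat"]
for concept, s in neuron_concept_scores(acts, labels).items():
    print(concept, "-> strongest neuron:", int(s.argmax()))

A neuron that scores highly for "animal" but for neither "dog" nor "cat" alone would be a candidate higher-level-concept neuron in this toy setting; neurons scoring highly for several concepts would be multimodal in the sense the abstract describes.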
