Paper Title
Fidelity of Interpretability Methods and Perturbation Artifacts in Neural Networks
Paper Authors
Paper Abstract
Despite the excellent performance of deep neural networks (DNNs) in image classification, detection, and prediction, characterizing how DNNs make a given decision remains an open problem, which has given rise to a number of interpretability methods. Post-hoc interpretability methods primarily aim to quantify the importance of input features with respect to the class probabilities. However, due to the lack of ground truth and the existence of interpretability methods with diverse operating characteristics, evaluating these methods is a crucial challenge. A popular approach to evaluating interpretability methods is to perturb the input features deemed important for a given prediction and observe the decrease in accuracy. However, the perturbation itself may introduce artifacts. We propose a method for estimating the impact of such artifacts on the fidelity estimation by utilizing model accuracy curves obtained by perturbing input features according to the Most Important First (MIF) and Least Important First (LIF) orders. Using a ResNet-50 trained on ImageNet, we demonstrate the proposed fidelity estimation for four popular post-hoc interpretability methods.
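The following is a minimal sketch (not the authors' code) of the perturbation-based evaluation described in the abstract: given an attribution map from an interpretability method, pixels are removed in Most Important First (MIF) or Least Important First (LIF) order and model accuracy is recorded after each removal step. The function name `perturbation_curve`, the per-channel mean baseline used as the "removed" pixel value, and the step count are illustrative assumptions, since the abstract does not specify these details.

```python
# Sketch of MIF/LIF perturbation curves for an ImageNet classifier (PyTorch).
import torch
import torchvision.models as models

def perturbation_curve(model, images, labels, attributions, order="MIF", steps=10):
    """Return model accuracy after progressively perturbing ranked pixels.

    images:       (N, C, H, W) input batch
    labels:       (N,) ground-truth class indices
    attributions: (N, H, W) per-pixel importance scores from an interpretability method
    order:        "MIF" removes the most important pixels first, "LIF" the least important
    """
    model.eval()
    n, c, h, w = images.shape
    flat = attributions.reshape(n, -1)
    # Rank pixels by importance: descending for MIF, ascending for LIF.
    ranking = flat.argsort(dim=1, descending=(order == "MIF"))
    # Assumed perturbation: replace removed pixels with the per-channel mean value.
    baseline = images.mean(dim=(2, 3), keepdim=True)
    perturbed = images.clone()
    pixels_per_step = (h * w) // steps
    accuracies = []
    with torch.no_grad():
        for step in range(steps + 1):
            preds = model(perturbed).argmax(dim=1)
            accuracies.append((preds == labels).float().mean().item())
            if step == steps:
                break
            # Replace the next block of ranked pixels with the baseline value.
            idx = ranking[:, step * pixels_per_step:(step + 1) * pixels_per_step]
            mask = torch.zeros(n, h * w, dtype=torch.bool)
            mask[torch.arange(n).unsqueeze(1), idx] = True
            mask = mask.view(n, 1, h, w).expand(-1, c, -1, -1)
            perturbed = torch.where(mask, baseline.expand_as(perturbed), perturbed)
    return accuracies

# Example usage with random placeholder inputs and attributions; in practice the
# attributions would come from a post-hoc interpretability method.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
x = torch.randn(4, 3, 224, 224)
y = torch.randint(0, 1000, (4,))
attr = torch.rand(4, 224, 224)
mif_curve = perturbation_curve(model, x, y, attr, order="MIF")
lif_curve = perturbation_curve(model, x, y, attr, order="LIF")
```

Comparing the MIF and LIF accuracy curves is what the abstract leverages: a faithful attribution should degrade accuracy much faster under MIF than under LIF, while the gap attributable to the perturbation itself (rather than feature importance) is what the proposed method aims to estimate.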