Paper Title

Fairwashing Explanations with Off-Manifold Detergent

Authors

Christopher J. Anders, Plamen Pasliev, Ann-Kathrin Dombrowski, Klaus-Robert Müller, Pan Kessel

Abstract

Explanation methods promise to make black-box classifiers more transparent. As a result, it is hoped that they can act as proof for a sensible, fair and trustworthy decision-making process of the algorithm and thereby increase its acceptance by the end-users. In this paper, we show both theoretically and experimentally that these hopes are presently unfounded. Specifically, we show that, for any classifier $g$, one can always construct another classifier $\tilde{g}$ which has the same behavior on the data (same train, validation, and test error) but has arbitrarily manipulated explanation maps. We derive this statement theoretically using differential geometry and demonstrate it experimentally for various explanation methods, architectures, and datasets. Motivated by our theoretical insights, we then propose a modification of existing explanation methods which makes them significantly more robust.
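The central claim, that $\tilde{g}$ can agree with $g$ on all data while its explanation map differs arbitrarily, can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are assumed to lie on the line $x_2 = 0$ in $\mathbb{R}^2$, the plain gradient is used as the explanation map, and the names `g`, `g_tilde`, `gradient`, and `project_to_tangent` are hypothetical. The final projection step is only one plausible way to make an explanation less sensitive to off-manifold changes; it is not presented as the authors' modification.

```python
# Toy assumption: every data point lies on the line x2 = 0 in R^2.

def g(x):
    """Original classifier score; depends only on the on-manifold coordinate x1."""
    x1, x2 = x
    return 2.0 * x1 - 1.0

def g_tilde(x, c=100.0):
    """Manipulated score; the extra term vanishes wherever x2 == 0."""
    x1, x2 = x
    return g(x) + c * x2

def gradient(f, x, eps=1e-6):
    """Central finite-difference gradient, used here as the explanation map."""
    grads = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads

def project_to_tangent(grad):
    """Keep only the component along the data manifold (the x1 axis here)."""
    return [grad[0], 0.0]

# All points satisfy x2 == 0, so g and g_tilde coincide on the data.
data = [(-1.0, 0.0), (0.25, 0.0), (0.75, 0.0), (2.0, 0.0)]
same_predictions = all((g(x) > 0) == (g_tilde(x) > 0) for x in data)

x0 = data[1]
grad_g = gradient(g, x0)              # off-manifold component is ~0
grad_g_tilde = gradient(g_tilde, x0)  # off-manifold component is ~c
proj = project_to_tangent(grad_g_tilde)

print("same predictions on data:", same_predictions)
print("explanation of g:        ", grad_g)
print("explanation of g_tilde:  ", grad_g_tilde)
print("projected explanation:   ", proj)
```

In this sketch the constant `c` can be chosen freely, so the off-manifold component of the explanation of `g_tilde` can be made as large as desired without changing any prediction on the data. Discarding the off-manifold component recovers the explanation of `g`, which is why restricting explanations to directions along the data is a natural candidate for a more robust variant.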
