Paper Title
Likelihood Landscapes: A Unifying Principle Behind Many Adversarial Defenses
Paper Authors
Paper Abstract
Convolutional Neural Networks have been shown to be vulnerable to adversarial examples, which are known to lie in subspaces close to those where normal data lie, yet are not naturally occurring and have low probability. In this work, we investigate the effect that defense techniques have on the geometry of the likelihood landscape, i.e., the likelihood of the input images under the trained model. We first propose a way to visualize the likelihood landscape by leveraging an energy-based model interpretation of discriminative classifiers. We then introduce a measure to quantify the flatness of the likelihood landscape. We observe that a subset of adversarial defense techniques produces a similar flattening effect on the likelihood landscape. We further explore directly regularizing towards a flat landscape to achieve adversarial robustness.
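Below is a minimal sketch, not the authors' released code, of the two ingredients the abstract names: the energy-based interpretation of a discriminative classifier, under which an unnormalized log-likelihood of an input can be read off as the log-sum-exp of its class logits, and a simple local-flatness probe of that likelihood around an input. The perturbation-sampling flatness measure, the radius `eps`, and the function names are illustrative assumptions, not necessarily the paper's exact definitions.

```python
import torch
import torch.nn as nn


def log_likelihood_proxy(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Unnormalized log p(x) = logsumexp_y f(x)[y] under the energy-based
    view of a discriminative classifier (up to an unknown normalizer)."""
    logits = model(x)                      # shape: (batch, num_classes)
    return torch.logsumexp(logits, dim=1)  # shape: (batch,)


def flatness_measure(model: nn.Module, x: torch.Tensor,
                     eps: float = 8 / 255, n_samples: int = 16) -> torch.Tensor:
    """Illustrative flatness probe (an assumption, not the paper's measure):
    average absolute change of the likelihood proxy under random
    perturbations of radius eps around each input."""
    base = log_likelihood_proxy(model, x)
    diffs = []
    for _ in range(n_samples):
        noise = torch.empty_like(x).uniform_(-eps, eps)
        perturbed = (x + noise).clamp(0.0, 1.0)
        diffs.append((log_likelihood_proxy(model, perturbed) - base).abs())
    return torch.stack(diffs, dim=0).mean(dim=0)  # lower = flatter landscape
```

In this spirit, the "directly regularizing towards a flat landscape" idea could be realized by adding a term such as `flatness_measure(model, x).mean()` to the training loss, so that the likelihood surface around training inputs is penalized for varying sharply.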