Paper Title
On Adaptive Attacks to Adversarial Example Defenses
Paper Authors
Paper Abstract
Adaptive attacks have (rightfully) become the de facto standard for evaluating defenses to adversarial examples. We find, however, that typical adaptive evaluations are incomplete. We demonstrate that thirteen defenses recently published at ICLR, ICML and NeurIPS---and chosen for illustrative and pedagogical purposes---can be circumvented despite attempting to perform evaluations using adaptive attacks. While prior evaluation papers focused mainly on the end result---showing that a defense was ineffective---this paper focuses on laying out the methodology and the approach necessary to perform an adaptive attack. We hope that these analyses will serve as guidance on how to properly perform adaptive attacks against defenses to adversarial examples, and thus will allow the community to make further progress in building more robust models.