Paper Title
Can Adversarial Training Be Manipulated By Non-Robust Features?
Paper Authors
Paper Abstract
Adversarial training, originally designed to resist test-time adversarial examples, has shown promise in mitigating training-time availability attacks. This defense ability, however, is challenged in this paper. We identify a novel threat model, named the stability attack, which aims to hinder robust availability by slightly manipulating the training data. Under this threat, we show that adversarial training using a conventional defense budget $\epsilon$ provably fails to provide test robustness in a simple statistical setting, where the non-robust features of the training data can be reinforced by $\epsilon$-bounded perturbations. Further, we analyze the necessity of enlarging the defense budget to counter stability attacks. Finally, comprehensive experiments demonstrate that stability attacks are harmful on benchmark datasets, and thus adaptive defenses are necessary to maintain robustness. Our code is available at https://github.com/TLMichael/Hypocritical-Perturbation.
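To make the threat model concrete, below is a minimal PyTorch sketch of an $\epsilon$-bounded stability attack of the kind the abstract describes: instead of ascending the loss as a standard PGD adversarial example would, it descends the loss on correctly labeled training data, reinforcing the non-robust features that already predict the label. This is an illustrative reconstruction under stated assumptions, not the authors' released implementation; the function name `hypocritical_perturbation` and the hyperparameter defaults (`eps`, `alpha`, `steps`) are hypothetical choices for exposition.

```python
import torch
import torch.nn.functional as F

def hypocritical_perturbation(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft an eps-bounded stability-attack perturbation for a training batch.

    Standard PGD *maximizes* the loss to fool the model; here the update
    *minimizes* it, making the examples "easier" and thereby reinforcing
    non-robust features. Hyperparameters are illustrative, not the paper's.
    """
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta -= alpha * grad.sign()               # descend the loss
            delta.clamp_(-eps, eps)                    # stay in the eps-ball
            delta.copy_((x + delta).clamp(0, 1) - x)   # keep pixels in [0, 1]
    return (x + delta).detach()
```

Under this reading of the threat model, a defender who adversarially trains with the conventional budget $\epsilon$ on such perturbed data can be misled, which is why the abstract argues for an adaptive defense with an enlarged budget $\epsilon' > \epsilon$.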