Paper Title

On the Benefits of Models with Perceptually-Aligned Gradients

Paper Authors

Gunjan Aggarwal, Abhishek Sinha, Nupur Kumari, Mayank Singh

Paper Abstract

Adversarially robust models have been shown to learn more robust and interpretable features than standard trained models. As shown in [\cite{tsipras2018robustness}], such robust models inherit useful interpretable properties: the gradient aligns perceptually well with images, and adding a large targeted adversarial perturbation yields an image resembling the target class. We perform experiments to show that interpretable and perceptually aligned gradients are present even in models that do not exhibit high robustness to adversarial attacks. Specifically, we perform adversarial training with attacks under different max-perturbation bounds. Adversarial training with a low max-perturbation bound results in models that have interpretable features with only a slight drop in performance on clean samples. In this paper, we leverage models with interpretable, perceptually-aligned features and show that adversarial training with a low max-perturbation bound can improve model performance on zero-shot and weakly supervised localization tasks.
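The training recipe referenced in the abstract, adversarial training under a small max-perturbation bound, can be illustrated with a minimal PyTorch sketch. The code below is a sketch under assumed settings: the epsilon of 2/255, the PGD step size and step count, and the toy linear model are illustrative choices, not values reported in the paper.

# Minimal sketch of adversarial training with a small L-infinity
# max-perturbation bound. Epsilon, step size, step count, and the toy
# model are illustrative assumptions, not the paper's exact settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=2/255, alpha=0.5/255, steps=7):
    """PGD attack that stays inside an L-infinity ball of radius eps around x."""
    x_adv = x.clone().detach()
    x_adv = x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)  # random start
    x_adv = torch.clamp(x_adv, 0.0, 1.0)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                       # ascend the loss
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)     # project to eps-ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)                      # keep valid pixel range
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y, eps=2/255):
    """One training step on adversarial examples generated with a low eps bound."""
    model.eval()                      # deterministic forward pass while crafting the attack
    x_adv = pgd_attack(model, x, y, eps=eps)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    # Toy example on random data; in practice this would be a real
    # architecture and dataset (e.g., a ResNet on an image benchmark).
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.rand(8, 3, 32, 32)          # images scaled to [0, 1]
    y = torch.randint(0, 10, (8,))
    print(adversarial_training_step(model, optimizer, x, y))

The key knob in this sketch is eps: the abstract's claim is that keeping it small preserves most clean accuracy while still yielding perceptually-aligned gradients, so in practice eps would be tuned to trade off clean performance against gradient interpretability.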
