LAS-AT: Adversarial Training with Learnable Attack Strategy

Paper Title

LAS-AT: Adversarial Training with Learnable Attack Strategy

Authors

Xiaojun Jia, Yong Zhang, Baoyuan Wu, Ke Ma, Jue Wang, Xiaochun Cao

Abstract

Adversarial training (AT) is always formulated as a minimax problem whose performance depends on the inner optimization, which involves generating adversarial examples (AEs). Most previous methods adopt Projected Gradient Descent (PGD) with manually specified attack parameters for AE generation. A combination of attack parameters can be referred to as an attack strategy. Several works have revealed that using a fixed attack strategy to generate AEs throughout the whole training phase limits model robustness, and have proposed exploiting different attack strategies at different training stages to improve robustness. However, these multi-stage hand-crafted attack strategies require considerable domain expertise, and the robustness improvement is limited. In this paper, we propose a novel adversarial training framework, dubbed LAS-AT, that introduces the concept of a "learnable attack strategy" and learns to automatically produce attack strategies that improve model robustness. Our framework is composed of a target network, which is trained on AEs to improve robustness, and a strategy network, which produces attack strategies to control AE generation. Experimental evaluations on three benchmark databases demonstrate the superiority of the proposed method. The code is released at https://github.com/jiaxiaojunQAQ/LAS-AT.
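
The abstract describes attack strategies only at a high level. The following is a minimal NumPy sketch, under stated assumptions, of how an attack strategy (here assumed to be the combination of L-inf perturbation budget, step size, and number of PGD iterations) controls AE generation inside an adversarial training loop. The linear logistic target model, the candidate values in EPS_CHOICES, STEP_CHOICES, and ITER_CHOICES, and the helpers AttackStrategy, sample_strategy, and adversarial_train_step are hypothetical illustrations, not the authors' code from the LAS-AT repository. The strategy network is replaced by fixed, uniform logits, so no strategy learning is shown.

```python
import numpy as np
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackStrategy:
    """One attack strategy: a combination of PGD attack parameters (assumed set)."""
    epsilon: float    # L-inf perturbation budget
    step_size: float  # PGD step size (alpha)
    num_steps: int    # number of PGD iterations


# Hypothetical candidate values; a strategy picks one value per parameter.
EPS_CHOICES = [2 / 255, 4 / 255, 8 / 255]
STEP_CHOICES = [1 / 255, 2 / 255]
ITER_CHOICES = [1, 5, 10]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_loss(w, b, x, y):
    """Mean binary cross-entropy of the linear model sigmoid(x @ w + b); y in {0, 1}."""
    p = sigmoid(x @ w + b)
    return float(-np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))


def pgd_attack(w, b, x, y, strategy, rng):
    """Inner maximization via L-inf PGD: random start, signed-gradient ascent, projection."""
    eps, alpha = strategy.epsilon, strategy.step_size
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)
    for _ in range(strategy.num_steps):
        p = sigmoid(x_adv @ w + b)
        grad_x = (p - y)[:, None] * w[None, :]  # gradient of each sample's loss w.r.t. its input
        x_adv = x_adv + alpha * np.sign(grad_x)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back onto the eps-ball around x
    return x_adv


def sample_strategy(logits, rng):
    """Sample one value per parameter from softmax(logits); stands in for a strategy network."""
    def pick(lg, choices):
        probs = np.exp(lg - lg.max())
        probs /= probs.sum()
        return choices[rng.choice(len(choices), p=probs)]
    return AttackStrategy(pick(logits["eps"], EPS_CHOICES),
                          pick(logits["step"], STEP_CHOICES),
                          pick(logits["iters"], ITER_CHOICES))


def adversarial_train_step(w, b, x, y, strategy, lr, rng):
    """Outer minimization: one gradient step on the loss evaluated at the generated AEs."""
    x_adv = pgd_attack(w, b, x, y, strategy, rng)
    err = sigmoid(x_adv @ w + b) - y
    return w - lr * x_adv.T @ err / len(y), b - lr * float(np.mean(err))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    y = (np.arange(n) % 2).astype(float)
    # Two Gaussian blobs centred at (1, 1) and (-1, -1).
    x = np.where(y[:, None] == 1, 1.0, -1.0) + 0.3 * rng.standard_normal((n, 2))
    # Fixed uniform logits in place of a learned strategy network (hypothetical).
    logits = {"eps": np.zeros(3), "step": np.zeros(2), "iters": np.zeros(3)}
    eval_strategy = AttackStrategy(8 / 255, 2 / 255, 10)

    w, b = np.zeros(2), 0.0
    loss_before = logistic_loss(w, b, pgd_attack(w, b, x, y, eval_strategy, rng), y)
    for _ in range(200):
        w, b = adversarial_train_step(w, b, x, y, sample_strategy(logits, rng), 0.5, rng)
    loss_after = logistic_loss(w, b, pgd_attack(w, b, x, y, eval_strategy, rng), y)
    print(f"robust loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Sampling a fresh strategy at every step is what separates this loop from fixed-strategy PGD-based AT. In LAS-AT the strategies come from a strategy network rather than a fixed distribution; how that network is parameterized and trained is not covered by this sketch.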
