Paper Title

Adversarial Fine-tune with Dynamically Regulated Adversary

Authors

Hou, Pengyue, Zhou, Ming, Han, Jie, Musilek, Petr, Li, Xingyu

Abstract

Adversarial training is an effective method to boost model robustness against malicious adversarial attacks. However, such improvement in robustness often comes at a significant cost to standard performance on clean images. In many real-world applications, such as health diagnosis and autonomous surgical robotics, standard performance is valued more highly than robustness against such extremely malicious attacks. This leads to the question: to what extent can we boost model robustness without sacrificing standard performance? This work tackles this problem and proposes a simple yet effective transfer-learning-based adversarial training strategy that disentangles the negative effects of adversarial samples on the model's standard performance. In addition, we introduce a training-friendly adversarial attack algorithm that facilitates the boost in adversarial robustness without introducing significant training complexity. Extensive experiments indicate that the proposed method outperforms previous adversarial training algorithms on this target: improving model robustness while preserving the model's standard performance on clean data.
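As background for the abstract, the generic adversarial-training loop it refers to alternates between crafting adversarial examples and updating the model on them. The sketch below is NOT the paper's method (which is a transfer-learning-based fine-tuning strategy with a dynamically regulated adversary); it is a minimal NumPy illustration of the standard baseline, using a logistic-regression model and the FGSM attack (perturbing inputs along the sign of the input gradient). All function names here are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """FGSM attack: move each input by eps along the sign of the
    input-gradient of the binary cross-entropy loss."""
    p = sigmoid(x @ w + b)
    # d(loss)/dx for a logistic model is (p - y) * w
    grad_x = (p - y)[:, None] * w[None, :]
    return x + eps * np.sign(grad_x)

def adversarial_train(x, y, eps=0.1, lr=0.5, steps=200, seed=0):
    """Min-max training loop: at each step, craft adversarial
    examples against the current model, then take a gradient
    step on the adversarial batch."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=x.shape[1])
    b = 0.0
    for _ in range(steps):
        x_adv = fgsm(x, y, w, b, eps)      # inner maximization
        p = sigmoid(x_adv @ w + b)
        w -= lr * (x_adv.T @ (p - y)) / len(y)  # outer minimization
        b -= lr * float(np.mean(p - y))
    return w, b
```

On a well-separated toy dataset, a model trained this way classifies both clean and eps-perturbed inputs correctly; the trade-off the abstract discusses appears when clean and adversarial distributions overlap, which this toy setup avoids.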
