Paper Title

Robustness through Cognitive Dissociation Mitigation in Contrastive Adversarial Training

Authors

Adir Rahamim, Itay Naeh

Abstract

In this paper, we introduce a novel neural network training framework that increases a model's robustness to adversarial attacks while maintaining high clean accuracy by combining contrastive learning (CL) with adversarial training (AT). We propose to improve model robustness to adversarial attacks by learning feature representations that are consistent under both data augmentations and adversarial perturbations. We leverage contrastive learning to improve adversarial robustness by treating an adversarial example as another positive example, and aim to maximize the similarity between random augmentations of data samples and their adversarial examples, while constantly updating the classification head in order to avoid cognitive dissociation between the classification head and the embedding space. This dissociation arises because CL updates the network only up to the embedding space, while freezing the classification head that is used to generate new positive adversarial examples. We validate our method, Contrastive Learning with Adversarial Features (CLAF), on the CIFAR-10 dataset, where it achieves higher robust accuracy and clean accuracy than alternative supervised and self-supervised adversarial learning methods.
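The core idea in the abstract, treating the adversarial example as an extra positive in a contrastive loss, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the temperature value, the averaging over the two positives, and the toy embeddings are assumptions, and the paper's actual method additionally generates the adversarial examples with the (continually updated) classification head.

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors given as lists of floats.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def contrastive_loss_with_adv(anchor, aug, adv, negatives, tau=0.5):
    """InfoNCE-style loss where BOTH the augmented view and the adversarial
    example count as positives for the anchor (the idea described in the
    abstract). `tau` is an assumed temperature hyperparameter."""
    pos_sims = [cosine_sim(anchor, p) / tau for p in (aug, adv)]
    neg_sims = [cosine_sim(anchor, n) / tau for n in negatives]
    denom = sum(math.exp(s) for s in pos_sims + neg_sims)
    # Average the -log softmax probability over the two positives.
    return sum(math.log(denom) - s for s in pos_sims) / len(pos_sims)

# Toy 2-D embeddings: when the augmented and adversarial views stay close
# to the anchor, the loss is lower than when they drift toward a negative.
anchor = [1.0, 0.0]
loss_aligned = contrastive_loss_with_adv(anchor, [1.0, 0.0], [0.9, 0.1], [[0.0, 1.0]])
loss_drifted = contrastive_loss_with_adv(anchor, [0.0, 1.0], [0.1, 0.9], [[1.0, 0.0]])
```

Minimizing such a loss pushes the anchor's embedding toward both its augmentation and its adversarial counterpart, which is what the paper means by representations that are "consistent under both data augmentations and adversarial perturbations."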
