Paper Title

Perceptual Adversarial Robustness: Defense Against Unseen Threat Models

Authors

Cassidy Laidlaw, Sahil Singla, Soheil Feizi

Abstract

A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception, used in the very definition of adversarial attacks that are imperceptible to human eyes. Most current attacks and defenses try to avoid this issue by considering restrictive adversarial threat models such as those bounded by $L_2$ or $L_\infty$ distance, spatial perturbations, etc. However, models that are robust against any of these restrictive threat models are still fragile against other threat models. To resolve this issue, we propose adversarial training against the set of all imperceptible adversarial examples, approximated using deep neural networks. We call this threat model the neural perceptual threat model (NPTM); it includes adversarial examples with a bounded neural perceptual distance (a neural network-based approximation of the true perceptual distance) to natural images. Through an extensive perceptual study, we show that the neural perceptual distance correlates well with human judgements of perceptibility of adversarial examples, validating our threat model. Under the NPTM, we develop novel perceptual adversarial attacks and defenses. Because the NPTM is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks. We test PAT on CIFAR-10 and ImageNet-100 against five diverse adversarial attacks. We find that PAT achieves state-of-the-art robustness against the union of these five attacks, more than doubling the accuracy over the next best model, without training against any of them. That is, PAT generalizes well to unforeseen perturbation types. This is vital in sensitive applications where a particular threat model cannot be assumed, and to the best of our knowledge, PAT is the first adversarial training defense with this property.
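
To make the "bounded neural perceptual distance" constraint concrete, here is a minimal, illustrative sketch of an LPIPS-style perceptual distance and the corresponding threat-model membership check: a candidate adversarial example is admissible under the NPTM if its neural perceptual distance to the clean image stays below a bound. This is not the authors' released implementation; the class FeatureExtractor, the function perceptual_distance, the choice of VGG-16 layers, and the bound eps are all assumptions made for illustration.

```python
# Illustrative sketch (assumptions noted above): an LPIPS-style neural
# perceptual distance and a neural perceptual threat model (NPTM) check.
import torch
import torch.nn.functional as F
import torchvision.models as models


class FeatureExtractor(torch.nn.Module):
    """Returns intermediate activations of a fixed, pretrained network."""

    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        # ReLU outputs treated as perceptual features (illustrative choice).
        self.layer_ids = {3, 8, 15, 22}

    def forward(self, x):
        feats = []
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.layer_ids:
                # Normalize each sample's flattened activations so that
                # every layer contributes on a comparable scale.
                feats.append(F.normalize(x.flatten(1), dim=1))
        return feats


def perceptual_distance(extractor, x, x_adv):
    """Rough LPIPS-style distance: L2 gap between normalized deep features."""
    fx, fy = extractor(x), extractor(x_adv)
    return sum(torch.norm(a - b, dim=1) for a, b in zip(fx, fy))


# NPTM membership check for a candidate perturbation.
# (ImageNet input normalization is omitted here for brevity.)
extractor = FeatureExtractor()
x = torch.rand(1, 3, 224, 224)                      # stand-in for a natural image
x_adv = (x + 0.03 * torch.randn_like(x)).clamp(0, 1)  # candidate adversarial example
eps = 0.5                                           # assumed perceptual bound
is_admissible = perceptual_distance(extractor, x, x_adv) <= eps
print(is_admissible)
```

Perceptual Adversarial Training (PAT), as described in the abstract, would then roughly correspond to generating adversarial examples under this perceptual-distance constraint at each training step and training on them, analogous to standard adversarial training but with the NPTM in place of an $L_2$ or $L_\infty$ ball.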
