Paper Title

A Neuro-Inspired Autoencoding Defense Against Adversarial Perturbations

Paper Authors

Can Bakiskan, Metehan Cekic, Ahmet Dundar Sezer, Upamanyu Madhow

Paper Abstract

Deep Neural Networks (DNNs) are vulnerable to adversarial attacks: carefully constructed perturbations to an image can seriously impair classification accuracy while remaining imperceptible to humans. While there has been a significant amount of research on defending against such attacks, most defenses based on systematic design principles have been defeated by appropriately modified attacks. For a fixed set of data, the most effective current defense is to train the network using adversarially perturbed examples. In this paper, we investigate a radically different, neuro-inspired defense mechanism, starting from the observation that human vision is virtually unaffected by adversarial examples designed for machines. We aim to reject ℓ∞-bounded adversarial perturbations before they reach a classifier DNN, using an encoder with characteristics commonly observed in biological vision: sparse overcomplete representations, randomness due to synaptic noise, and drastic nonlinearities. Encoder training is unsupervised, using standard dictionary learning. A CNN-based decoder restores the size of the encoder output to that of the original image, enabling the use of a standard CNN for classification. Our nominal design is to train the decoder and classifier together in standard supervised fashion, but we also consider unsupervised decoder training based on a regression objective (as in a conventional autoencoder), with separate supervised training of the classifier. Unlike adversarial training, all training is based on clean images. Our experiments on CIFAR-10 show performance competitive with state-of-the-art defenses based on adversarial training, and point to the promise of neuro-inspired techniques for the design of robust neural networks. In addition, we provide results for a subset of the ImageNet dataset to verify that our approach scales to larger images.
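To make the pipeline in the abstract concrete, below is a minimal sketch assuming a patch-based encoder built from a learned overcomplete dictionary, additive "synaptic" noise, and a hard top-K nonlinearity. All shapes and hyperparameters here (patch size, 512 atoms, k=32, noise_std) are illustrative assumptions rather than the paper's actual configuration, and the CNN decoder and classifier are stubbed with small fully connected layers for brevity.

```python
# A minimal sketch of the defense pipeline described above, assuming a
# patch-based encoder. Every name, patch size, and hyperparameter below is
# an illustrative guess, not the authors' configuration; the real decoder
# is a CNN, stubbed here with a single linear layer.
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import MiniBatchDictionaryLearning

# 1. Unsupervised encoder training: learn an overcomplete dictionary from
#    clean image patches (standard dictionary learning, no labels).
patch_dim, n_atoms = 8 * 8 * 3, 512            # 512 atoms >> 192 dims: overcomplete
patches = np.random.rand(2000, patch_dim)      # stand-in for real CIFAR-10 patches
dico = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=1.0,
                                   batch_size=256, max_iter=5, random_state=0)
dico.fit(patches)
D = torch.tensor(dico.components_, dtype=torch.float32)  # (n_atoms, patch_dim)

# 2. Neuro-inspired encoding: project onto the overcomplete dictionary,
#    add "synaptic" noise, then keep only the top-K coefficients per patch,
#    a drastic sparsifying nonlinearity meant to shed small perturbations.
def encode(x, D, k=32, noise_std=0.1):
    coeffs = x @ D.t()                                      # overcomplete projection
    coeffs = coeffs + noise_std * torch.randn_like(coeffs)  # synaptic randomness
    idx = coeffs.abs().topk(k, dim=1).indices
    mask = torch.zeros_like(coeffs).scatter_(1, idx, 1.0)
    return coeffs * mask                                    # hard top-K sparsification

# 3. Decoder restores the encoder output to input size so a standard CNN
#    classifier can follow; in the nominal design both are trained jointly,
#    supervised, on clean images only.
decoder = nn.Linear(n_atoms, patch_dim)
classifier = nn.Sequential(nn.Linear(patch_dim, 64), nn.ReLU(), nn.Linear(64, 10))

x = torch.rand(4, patch_dim)                   # four random "patches"
z = encode(x, D)
logits = classifier(decoder(z))
print(z.count_nonzero(dim=1), logits.shape)    # ~k nonzeros per row; (4, 10)
```

Under the abstract's alternative training regime, the decoder would instead be fit unsupervised with a reconstruction (regression) loss, as in a conventional autoencoder, followed by separate supervised training of the classifier; in either case no adversarially perturbed examples are used.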
