Title
Learning to Attack with Fewer Pixels: A Probabilistic Post-hoc Framework for Refining Arbitrary Dense Adversarial Attacks
Authors
Abstract
Deep neural network image classifiers are known to be susceptible to adversarial evasion attacks, which use carefully crafted images to mislead a classifier. Many adversarial attacks belong to the category of dense attacks, which generate adversarial examples by perturbing all the pixels of a natural image. To generate sparse perturbations, sparse attacks have recently been developed; these are usually standalone attacks derived by modifying a dense attack's algorithm with sparsity regularisations, which reduces attack efficiency. In this paper, we tackle this task from a different perspective: we select the most effective perturbations from those generated by a dense attack, motivated by our finding that a considerable portion of the perturbations a dense attack places on an image may contribute little to fooling the classifier. Accordingly, we propose a probabilistic post-hoc framework, trained via mutual information maximisation, that refines a given dense attack by significantly reducing the number of perturbed pixels while preserving its attack power. Given an arbitrary dense attack, the proposed model offers appealing compatibility, making the resulting adversarial images more realistic and less detectable with fewer perturbations. Moreover, our framework performs adversarial attacks much faster than existing sparse attacks.
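As a rough illustration of the post-hoc refinement idea described above, the sketch below zeroes out all but the k highest-magnitude pixels of a dense perturbation. This is a simplified stand-in, not the paper's method: the paper trains a probabilistic pixel-selection model with mutual information maximisation, whereas here plain magnitude ranking plays the role of the selector. The function name and shapes are illustrative assumptions.

```python
import numpy as np

def refine_dense_perturbation(delta, k):
    """Keep only the k pixels of a dense perturbation `delta` (H, W, C)
    with the largest per-pixel magnitude, zeroing the rest.

    Magnitude ranking is a hypothetical stand-in for the paper's learned
    probabilistic selection; it only illustrates the post-hoc
    "attack with fewer pixels" refinement step.
    """
    h, w, c = delta.shape
    mag = np.linalg.norm(delta.reshape(h * w, c), axis=1)  # per-pixel L2 magnitude
    keep = np.argsort(mag)[-k:]                            # indices of the top-k pixels
    mask = np.zeros(h * w, dtype=bool)
    mask[keep] = True
    return delta * mask.reshape(h, w, 1)                   # sparse perturbation

# Usage: refine a random "dense" perturbation down to 10 perturbed pixels.
rng = np.random.default_rng(0)
delta = rng.normal(scale=0.01, size=(8, 8, 3))
sparse = refine_dense_perturbation(delta, k=10)
print(int((np.abs(sparse).sum(axis=2) > 0).sum()))  # number of remaining perturbed pixels
```

In the actual framework, the binary mask would be sampled from a learned distribution conditioned on the image and the dense perturbation, so that the retained pixels are those that matter most for the attack rather than merely the largest ones.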