Paper Title

Towards Class-Oriented Poisoning Attacks Against Neural Networks

Paper Authors

Bingyin Zhao, Yingjie Lao

Paper Abstract

Poisoning attacks on machine learning systems compromise the model performance by deliberately injecting malicious samples in the training dataset to influence the training process. Prior works focus on either availability attacks (i.e., lowering the overall model accuracy) or integrity attacks (i.e., enabling specific instance-based backdoor). In this paper, we advance the adversarial objectives of the availability attacks to a per-class basis, which we refer to as class-oriented poisoning attacks. We demonstrate that the proposed attack is capable of forcing the corrupted model to predict in two specific ways: (i) classify unseen new images to a targeted "supplanter" class, and (ii) misclassify images from a "victim" class while maintaining the classification accuracy on other non-victim classes. To maximize the adversarial effect as well as reduce the computational complexity of poisoned data generation, we propose a gradient-based framework that crafts poisoning images with carefully manipulated feature information for each scenario. Using newly defined metrics at the class level, we demonstrate the effectiveness of the proposed class-oriented poisoning attacks on various models (e.g., LeNet-5, Vgg-9, and ResNet-50) over a wide range of datasets (e.g., MNIST, CIFAR-10, and ImageNet-ILSVRC2012) in an end-to-end training setting.
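To make the abstract's idea of gradient-based poison crafting concrete, below is a minimal PyTorch-style sketch of one plausible poison-generation loop for the "supplanter" scenario. The function craft_poison, the surrogate model, and the step schedule are illustrative assumptions for exposition only; the paper's actual objective and optimization details are not specified in the abstract.

```python
# Illustrative sketch (not the authors' exact algorithm): craft a poisoned image
# by gradient descent on a surrogate classifier so that its feature information
# is pulled toward a targeted "supplanter" class before injection into training data.
import torch
import torch.nn.functional as F

def craft_poison(surrogate_model, base_image, supplanter_label,
                 num_steps=100, step_size=0.01):
    """Perturb `base_image` (C x H x W, values in [0, 1]) so the surrogate
    model assigns it to `supplanter_label`; the result would then be added
    to the training set carrying that label."""
    surrogate_model.eval()
    poison = base_image.clone().detach().requires_grad_(True)
    target = supplanter_label.view(1)                  # shape (1,) for cross_entropy
    for _ in range(num_steps):
        logits = surrogate_model(poison.unsqueeze(0))  # add batch dimension
        loss = F.cross_entropy(logits, target)
        grad, = torch.autograd.grad(loss, poison)
        with torch.no_grad():
            poison -= step_size * grad.sign()          # move toward the supplanter class
            poison.clamp_(0.0, 1.0)                    # keep a valid image
    return poison.detach()

# Hypothetical usage: craft a poison that will be learned as class 3.
# poisoned = craft_poison(model, image, torch.tensor(3))
```

This sketch only covers the optimization of a single poisoned sample; the victim-class scenario and the class-level evaluation metrics described in the abstract would require additional machinery not shown here.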
