Paper Title
Deep Learning Backdoors
Paper Authors
Paper Abstract
Intuitively, a backdoor attack against Deep Neural Networks (DNNs) injects hidden malicious behavior into a DNN such that the backdoored model behaves legitimately on benign inputs, yet invokes a predefined malicious behavior when its input contains a malicious trigger. The trigger can take a plethora of forms, including a special object present in the image (e.g., a yellow pad), a shape filled with a custom texture (e.g., a logo with particular colors), or even an image-wide stylization with a special filter (e.g., images altered by the Nashville or Gotham filters). Such triggers are applied to the original image by replacing or perturbing a set of image pixels.
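
To make the last sentence concrete, the following is a minimal sketch of the two pixel-level mechanisms the abstract mentions: replacing a set of pixels with a localized patch trigger, and perturbing all pixels by blending in an image-wide pattern. The function names, parameters, and use of NumPy are illustrative assumptions and are not taken from the paper.

import numpy as np

def apply_patch_trigger(image: np.ndarray, patch: np.ndarray,
                        top: int = 0, left: int = 0) -> np.ndarray:
    # Illustrative sketch (names/parameters are assumptions, not from the paper).
    # Replaces a rectangular set of pixels with a trigger patch,
    # e.g., a small yellow square or a textured logo.
    poisoned = image.copy()
    h, w = patch.shape[:2]
    poisoned[top:top + h, left:left + w] = patch
    return poisoned

def apply_blended_trigger(image: np.ndarray, trigger: np.ndarray,
                          alpha: float = 0.1) -> np.ndarray:
    # Perturbs every pixel by alpha-blending a full-size trigger pattern
    # into the image, approximating a filter-style, image-wide trigger.
    poisoned = (1.0 - alpha) * image.astype(np.float64) + alpha * trigger.astype(np.float64)
    return poisoned.clip(0, 255).astype(image.dtype)

For instance, calling apply_patch_trigger with a small all-yellow patch reproduces a "yellow pad"-style trigger, while apply_blended_trigger with a low alpha approximates an image-wide stylization trigger.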