Paper Title
Deep Learning Backdoors
Paper Authors
Paper Abstract
Intuitively, a backdoor attack against Deep Neural Networks (DNNs) injects hidden malicious behavior into a DNN such that the backdoored model behaves legitimately on benign inputs, yet invokes a predefined malicious behavior when its input contains a malicious trigger. The trigger can take a plethora of forms, including a special object present in the image (e.g., a yellow pad), a shape filled with a custom texture (e.g., a logo with particular colors), or even an image-wide stylization with a special filter (e.g., images altered by the Nashville or Gotham filters). Such triggers are applied to the original image by replacing or perturbing a set of image pixels.
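
To make the last sentence concrete, the following is a minimal sketch of the two pixel-level mechanisms the abstract mentions: replacing a set of pixels with a localized patch trigger, and perturbing all pixels by blending in an image-wide pattern. The function names, parameters, and use of NumPy are illustrative assumptions and are not taken from the paper.

import numpy as np

def apply_patch_trigger(image: np.ndarray, patch: np.ndarray,
                        top: int = 0, left: int = 0) -> np.ndarray:
    # Illustrative sketch (names/parameters are assumptions, not from the paper).
    # Replaces a rectangular set of pixels with a trigger patch,
    # e.g., a small yellow square or a textured logo.
    poisoned = image.copy()
    h, w = patch.shape[:2]
    poisoned[top:top + h, left:left + w] = patch
    return poisoned

def apply_blended_trigger(image: np.ndarray, trigger: np.ndarray,
                          alpha: float = 0.1) -> np.ndarray:
    # Perturbs every pixel by alpha-blending a full-size trigger pattern
    # into the image, approximating a filter-style, image-wide trigger.
    poisoned = (1.0 - alpha) * image.astype(np.float64) + alpha * trigger.astype(np.float64)
    return poisoned.clip(0, 255).astype(image.dtype)

For instance, calling apply_patch_trigger with a small all-yellow patch reproduces a "yellow pad"-style trigger, while apply_blended_trigger with a low alpha approximates an image-wide stylization trigger.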