Paper Title
Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks
Paper Authors
Paper Abstract
Backdoor attacks mislead machine-learning models into outputting an attacker-specified class when presented with a specific trigger at test time. These attacks require poisoning the training data to compromise the learning algorithm, e.g., by injecting poisoning samples containing the trigger into the training set, along with the desired class label. Despite the increasing number of studies on backdoor attacks and defenses, the underlying factors affecting the success of backdoor attacks, along with their impact on the learning algorithm, are not yet well understood. In this work, we aim to shed light on this issue by unveiling that backdoor attacks induce a smoother decision function around the triggered samples -- a phenomenon we refer to as \textit{backdoor smoothing}. To quantify backdoor smoothing, we define a measure that evaluates the uncertainty associated with the predictions of a classifier around the input samples. Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks. We also provide preliminary evidence that backdoor triggers are not the only smoothing-inducing patterns: other artificial patterns can also be detected by our approach, paving the way towards understanding the limitations of current defenses and designing novel ones.
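The abstract describes the attack setup as injecting trigger-stamped samples, relabeled with the attacker's target class, into the training set. Below is a minimal sketch of that kind of BadNets-style poisoning; the function name `poison_dataset`, the corner-patch trigger, and the default poisoning rate are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def poison_dataset(X, y, target_label, trigger_value=1.0, size=3, rate=0.05, seed=0):
    """Stamp a small square trigger onto a random subset of the images in X
    (shape: N x H x W) and relabel them with the attacker-specified class.

    This is a generic sketch of trigger-based poisoning, not the exact
    procedure used in the paper.
    """
    rng = np.random.default_rng(seed)
    X_p, y_p = X.copy(), y.copy()
    # Pick a random fraction of the training set to poison.
    idx = rng.choice(len(X), size=int(rate * len(X)), replace=False)
    # Place the trigger patch in the bottom-right corner of each chosen image.
    X_p[idx, -size:, -size:] = trigger_value
    # Attach the attacker's desired class label to the poisoned samples.
    y_p[idx] = target_label
    return X_p, y_p
```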
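The paper's core quantity is a measure of prediction uncertainty in a neighborhood of an input sample. The abstract does not give its definition, so the sketch below is only one plausible instantiation: average prediction entropy over Gaussian-perturbed copies of the input. The hook `predict_proba` (a callable returning per-class probabilities), the Gaussian noise model, and the parameter `sigma` are all assumptions for illustration.

```python
import numpy as np

def neighborhood_uncertainty(predict_proba, x, sigma=0.1, n_samples=100, seed=0):
    """Estimate prediction uncertainty around x as the mean entropy of the
    classifier's output distribution over Gaussian-perturbed copies of x.
    Lower values indicate a smoother, more confident decision region.

    An illustrative proxy for the paper's measure, not its actual definition.
    """
    rng = np.random.default_rng(seed)
    # Sample perturbed copies of x with isotropic Gaussian noise.
    noise = rng.normal(scale=sigma, size=(n_samples,) + x.shape)
    probs = predict_proba(x[None] + noise)  # shape: (n_samples, n_classes)
    # Mean Shannon entropy of the predicted class distributions.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return entropy.mean()
```

Under this reading, backdoor smoothing would show up as `neighborhood_uncertainty` being markedly lower for a trigger-stamped input than for its clean counterpart, with the gap widening for more successful attacks.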