Paper Title
Backdoor Cleansing with Unlabeled Data
Paper Authors
Paper Abstract
Due to the increasing computational demand of Deep Neural Networks (DNNs), companies and organizations have begun to outsource the training process. However, externally trained DNNs can potentially be backdoored. It is crucial to defend against such attacks, i.e., to post-process a suspicious model so that its backdoor behavior is mitigated while its normal prediction power on clean inputs remains uncompromised. To remove abnormal backdoor behavior, existing methods mostly rely on additional labeled clean samples. However, such a requirement may be unrealistic, as the training data are often unavailable to end users. In this paper, we investigate the possibility of circumventing this barrier. We propose a novel defense method that does not require training labels. Through a carefully designed layer-wise weight re-initialization and knowledge distillation, our method can effectively cleanse the backdoor behaviors of a suspicious network with negligible compromise to its normal behavior. In experiments, we show that our method, trained without labels, is on par with state-of-the-art defense methods trained using labels. We also observe promising defense results even on out-of-distribution data. This makes our method very practical. Code is available at: https://github.com/luluppang/BCU.
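To make the abstract's core idea concrete, below is a minimal, hypothetical sketch of label-free backdoor cleansing via layer-wise re-initialization and knowledge distillation. The function names, the choice of which layers to re-initialize, the temperature, and the KL-based distillation loss are illustrative assumptions, not the authors' exact recipe; see the linked repository for the actual implementation.

```python
# Hypothetical sketch: cleanse a suspicious model by re-initializing some of
# its layers and distilling the original model's soft predictions on
# unlabeled data (no ground-truth labels are used anywhere).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def reinitialize_last_layers(model: nn.Module, num_layers: int = 2) -> nn.Module:
    """Return a copy of `model` whose last `num_layers` parameterized
    modules (Conv2d/Linear) are re-initialized. The layer-wise scheme
    here is an illustrative placeholder."""
    student = copy.deepcopy(model)
    layers = [m for m in student.modules()
              if isinstance(m, (nn.Conv2d, nn.Linear))]
    for layer in layers[-num_layers:]:
        layer.reset_parameters()
    return student


def distill_without_labels(teacher: nn.Module,
                           student: nn.Module,
                           unlabeled_loader,
                           epochs: int = 10,
                           temperature: float = 2.0,
                           lr: float = 1e-3,
                           device: str = "cpu") -> nn.Module:
    """Train `student` to match the teacher's softened predictions on
    unlabeled inputs; the backdoor trigger is never activated by clean
    data, so the student inherits only the normal behavior."""
    teacher.to(device).eval()
    student.to(device).train()
    optimizer = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for inputs in unlabeled_loader:          # batches of images only, no labels
            inputs = inputs.to(device)
            with torch.no_grad():
                teacher_probs = F.softmax(teacher(inputs) / temperature, dim=1)
            student_log_probs = F.log_softmax(student(inputs) / temperature, dim=1)
            loss = F.kl_div(student_log_probs, teacher_probs,
                            reduction="batchmean") * temperature ** 2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

In this sketch, the suspicious model serves as the teacher and its partially re-initialized copy as the student; because the unlabeled distillation set is clean, the teacher only ever exposes its normal predictions, which is the intuition behind cleansing without labels.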