Blacklight: Scalable Defense for Neural Networks against Query-Based Black-Box Attacks

Paper title

Blacklight: Scalable Defense for Neural Networks against Query-Based Black-Box Attacks

Authors

Huiying Li, Shawn Shan, Emily Wenger, Jiayun Zhang, Haitao Zheng, Ben Y. Zhao

Abstract

Deep learning systems are known to be vulnerable to adversarial examples. In particular, query-based black-box attacks do not require knowledge of the deep learning model, but can compute adversarial examples over the network by submitting queries and inspecting the returned outputs. Recent work has greatly improved the efficiency of these attacks, demonstrating their practicality on today's ML-as-a-service platforms. We propose Blacklight, a new defense against query-based black-box adversarial attacks. The fundamental insight driving our design is that, to compute adversarial examples, these attacks perform iterative optimization over the network, producing image queries that are highly similar in the input space. Blacklight detects query-based black-box attacks by detecting highly similar queries, using an efficient similarity engine operating on probabilistic content fingerprints. We evaluate Blacklight against eight state-of-the-art attacks, across a variety of models and image classification tasks. Blacklight identifies them all, often after only a handful of queries. By rejecting all detected queries, Blacklight prevents any attack from completing, even when attackers persist in submitting queries after an account ban or query rejection. Blacklight is also robust against several powerful countermeasures, including an optimal black-box attack that approximates white-box attacks in efficiency. Finally, we illustrate how Blacklight generalizes to other domains such as text classification.
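
The detection idea in the abstract (flag a query whose content fingerprint overlaps heavily with that of an earlier query) can be illustrated with a minimal Python sketch. The abstract does not specify how the fingerprints are built, so everything below is an assumption made for illustration: the quantize-then-hash-sliding-windows construction, the names `fingerprint`, `SimilarityDetector`, and `random_image`, and all parameter values (`window`, `quant`, `top_k`, `threshold`). The linear scan over past fingerprints is chosen for readability and is not meant to represent the efficient similarity engine the paper describes.

```python
import hashlib
import random


def fingerprint(pixels, window=20, quant=50, top_k=50):
    """Hypothetical fingerprint: quantize pixels, hash every sliding window,
    and keep the top_k lexicographically smallest hashes."""
    q = bytes(p // quant for p in pixels)  # pixels are ints in 0..255
    hashes = {hashlib.sha256(q[i:i + window]).hexdigest()
              for i in range(len(q) - window + 1)}
    return set(sorted(hashes)[:top_k])


class SimilarityDetector:
    """Flags a query if its fingerprint shares at least `threshold`
    hashes with the fingerprint of any earlier query (assumed rule)."""

    def __init__(self, threshold=25):
        self.threshold = threshold
        self.seen = []  # fingerprints of all earlier queries

    def is_attack_query(self, pixels):
        fp = fingerprint(pixels)
        flagged = any(len(fp & old) >= self.threshold for old in self.seen)
        self.seen.append(fp)
        return flagged


def random_image(seed, n=1024):
    """Toy flattened 'image' with reproducible pseudo-random pixel values."""
    rng = random.Random(seed)
    return [rng.choice((10, 60, 110, 160, 210)) for _ in range(n)]


detector = SimilarityDetector()
base = random_image(0)
nearby = [p + 2 for p in base]          # small noise, absorbed by quantization
nearby[100] = (nearby[100] + 100) % 256  # one larger change to a single pixel
other = random_image(1)                  # unrelated query

# Submit the queries in order: original, unrelated, lightly perturbed copy.
results = [detector.is_attack_query(x) for x in (base, other, nearby)]
```

In this toy run the first two queries pass, while the lightly perturbed copy is flagged because most of its retained hashes coincide with those of the original. Quantization makes the fingerprint insensitive to small pixel noise, and keeping only a subset of window hashes keeps each fingerprint compact; both choices are part of this sketch's assumptions rather than claims about the paper's design.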
