Paper Title

Label-Only Model Inversion Attacks via Boundary Repulsion

Paper Authors

Mostafa Kahla, Si Chen, Hoang Anh Just, Ruoxi Jia

Paper Abstract

Recent studies show that state-of-the-art deep neural networks are vulnerable to model inversion attacks, in which access to a model is abused to reconstruct the private training data of any given target class. Existing attacks rely on having access to either the complete target model (whitebox) or the model's soft labels (blackbox). However, no prior work has addressed the harder but more practical scenario in which the attacker has access only to the model's predicted label, without a confidence measure. In this paper, we introduce an algorithm, Boundary-Repelling Model Inversion (BREP-MI), to invert private training data using only the target model's predicted labels. The key idea of our algorithm is to evaluate the model's predicted labels over a sphere and then estimate the direction to reach the target class's centroid. Using the example of face recognition, we show that the images reconstructed by BREP-MI successfully reproduce the semantics of the private training data for various datasets and target model architectures. We compare BREP-MI with state-of-the-art whitebox and blackbox model inversion attacks, and the results show that, despite assuming less knowledge about the target model, BREP-MI outperforms the blackbox attack and achieves results comparable to the whitebox attack.
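
To make the sphere-sampling idea in the abstract concrete, below is a minimal sketch of one boundary-repulsion update step, not the paper's exact algorithm. The hard-label oracle `predict_label` is a hypothetical stand-in for the target classifier composed with a generative model, and the radius-doubling heuristic and step sizing are illustrative assumptions.

```python
import numpy as np

def brep_mi_step(z, predict_label, target, radius, n_points=32):
    """One boundary-repulsion update in latent space (illustrative sketch).

    predict_label: hypothetical hard-label oracle mapping a latent vector
    to the target model's predicted class (e.g., the classifier applied to
    a GAN sample G(z)); it returns labels only, no confidence scores.
    """
    # Sample unit directions uniformly on the sphere centered at z.
    dirs = np.random.randn(n_points, z.shape[0])
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

    # Directions whose sphere point is no longer classified as the target
    # class point toward the decision boundary.
    off_target = [u for u in dirs if predict_label(z + radius * u) != target]

    if not off_target:
        # The whole sphere lies inside the target region; grow the sphere
        # so the next query can still find the boundary (assumed heuristic).
        return z, radius * 2.0

    # Step opposite the mean boundary direction, i.e. away from the
    # boundary and toward the interior (centroid) of the target region.
    step = -np.mean(off_target, axis=0)
    return z + radius * step / (np.linalg.norm(step) + 1e-12), radius
```

Iterating this step from a latent vector that the model already assigns to the target class repeatedly pushes the estimate away from decision boundaries using label queries alone, which is why the attack needs neither gradients (whitebox) nor confidence scores (blackbox).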
