Paper Title

Improving Robustness to Model Inversion Attacks via Mutual Information Regularization

Authors

Tianhao Wang, Yuheng Zhang, Ruoxi Jia

Abstract

This paper studies defense mechanisms against model inversion (MI) attacks -- a type of privacy attacks aimed at inferring information about the training data distribution given the access to a target machine learning model. Existing defense mechanisms rely on model-specific heuristics or noise injection. While being able to mitigate attacks, existing methods significantly hinder model performance. There remains a question of how to design a defense mechanism that is applicable to a variety of models and achieves better utility-privacy tradeoff. In this paper, we propose the Mutual Information Regularization based Defense (MID) against MI attacks. The key idea is to limit the information about the model input contained in the prediction, thereby limiting the ability of an adversary to infer the private training attributes from the model prediction. Our defense principle is model-agnostic and we present tractable approximations to the regularizer for linear regression, decision trees, and neural networks, which have been successfully attacked by prior work if not attached with any defenses. We present a formal study of MI attacks by devising a rigorous game-based definition and quantifying the associated information leakage. Our theoretical analysis sheds light on the inefficacy of DP in defending against MI attacks, which has been empirically observed in several prior works. Our experiments demonstrate that MID leads to state-of-the-art performance for a variety of MI attacks, target models and datasets.
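The core idea of the defense — limiting the mutual information between the model input and the prediction — is commonly made tractable with a variational upper bound, as in information-bottleneck-style training: a stochastic representation z ~ N(mu, diag(exp(log_var))) is learned, and KL(q(z|x) || N(0, I)) upper-bounds the information about x carried by z. The sketch below illustrates that regularized objective in NumPy; the function names and the choice of a Gaussian bottleneck are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Per-example KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    This KL term is a standard variational upper bound on the mutual
    information I(X; Z) when Z is sampled from the Gaussian encoder.
    mu, log_var: arrays of shape (batch, dim).
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)

def mid_style_loss(task_loss, mu, log_var, beta=1e-2):
    """Hypothetical MID-style objective: task loss plus a weighted
    information penalty averaged over the batch. Larger beta trades
    model utility for less information leaked through predictions."""
    return task_loss + beta * np.mean(kl_to_standard_normal(mu, log_var))
```

When mu = 0 and log_var = 0 the bottleneck distribution equals the prior, the KL penalty vanishes, and the objective reduces to the task loss alone; tuning `beta` then controls the utility-privacy tradeoff discussed in the abstract.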
