Paper Title
Towards Adversarial Evaluations for Inexact Machine Unlearning
Paper Authors
Paper Abstract
Machine Learning models face increased concerns regarding the storage of personal user data and the adverse impacts of corrupted data such as backdoors or systematic bias. Machine Unlearning can address these concerns by allowing post-hoc deletion of affected training data from a learned model. Achieving this task exactly is computationally expensive; consequently, recent works have proposed inexact unlearning algorithms that solve the problem approximately, along with evaluation methods to test the effectiveness of these algorithms. In this work, we first outline some necessary criteria for evaluation methods and show that no existing evaluation satisfies them all. We then design a stronger black-box evaluation method, the Interclass Confusion (IC) test, which adversarially manipulates data during training to detect the insufficiency of unlearning procedures. We also propose two analytically motivated baseline methods (EU-k and CF-k) which outperform several popular inexact unlearning methods. Overall, we demonstrate how adversarial evaluation strategies can help analyze various unlearning phenomena, guiding the development of stronger unlearning algorithms.
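Below is a minimal sketch of the IC test's two ingredients as the abstract describes them, label manipulation before training and a confusion metric after unlearning, assuming a standard supervised classification setup with integer labels in NumPy arrays; the names `make_ic_forget_set`, `interclass_confusion`, and `n_confuse` are illustrative, not from the paper.

```python
import numpy as np

def make_ic_forget_set(labels, class_a, class_b, n_confuse, seed=0):
    """Pick n_confuse training examples from classes a and b and swap
    their labels (a <-> b). Returns the manipulated label array and the
    indices of the manipulated examples, which form the forget set S."""
    rng = np.random.default_rng(seed)
    pool = np.flatnonzero((labels == class_a) | (labels == class_b))
    forget_idx = rng.choice(pool, size=n_confuse, replace=False)
    confused = labels.copy()
    # Swap labels only on the selected subset; the rest stay clean.
    was_a = forget_idx[labels[forget_idx] == class_a]
    was_b = forget_idx[labels[forget_idx] == class_b]
    confused[was_a] = class_b
    confused[was_b] = class_a
    return confused, forget_idx

def interclass_confusion(preds, true_labels, class_a, class_b):
    """Fraction of class-a/class-b test examples the model maps to the
    *other* class of the pair. High residual confusion after unlearning
    S signals that S's influence was not actually removed."""
    mask = (true_labels == class_a) | (true_labels == class_b)
    wrong_pair = ((true_labels == class_a) & (preds == class_b)) | \
                 ((true_labels == class_b) & (preds == class_a))
    return wrong_pair[mask].mean()
```

In use, a model would be trained on the confused labels, the forget set `forget_idx` handed to the unlearning method under test, and the unlearned model's residual confusion compared against a model retrained from scratch without the forget set; a large gap flags insufficient unlearning.

The abstract names EU-k and CF-k without defining them; the sketch below follows one plausible reading in PyTorch, in which both methods freeze all but the last k blocks of the network and train only those blocks on the retained data: EU-k after re-initializing them (exact unlearning restricted to the head), CF-k by finetuning them as-is (relying on catastrophic forgetting). `train_head` is a caller-supplied, hypothetical training loop over retain-set batches.

```python
import torch.nn as nn

def _freeze(modules):
    """Disable gradients so only the last-k head is updated."""
    for m in modules:
        for p in m.parameters():
            p.requires_grad = False

def eu_k(model: nn.Sequential, k: int, train_head):
    """EU-k (sketch): freeze all but the last k blocks, re-initialize
    those blocks, then train them from scratch on the retain set only."""
    trunk, head = model[:-k], model[-k:]
    _freeze(trunk)
    for m in head.modules():
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()
    train_head(model)  # hypothetical training loop on retain data
    return model

def cf_k(model: nn.Sequential, k: int, train_head):
    """CF-k (sketch): same freezing, but keep the head's weights and
    finetune on the retain set, relying on catastrophic forgetting to
    erase the forget set's influence."""
    trunk, head = model[:-k], model[-k:]
    _freeze(trunk)
    train_head(model)
    return model
```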