Paper Title

Improving LIME Robustness with Smarter Locality Sampling

Authors

Sean Saito, Eugene Chua, Nicholas Capel, Rocco Hu

Abstract

Explainability algorithms such as LIME have enabled machine learning systems to adopt transparency and fairness, which are important qualities in commercial use cases. However, recent work has shown that an adversary can exploit LIME's naive sampling strategy to conceal biased, harmful behavior. We propose to make LIME more robust by training a generative adversarial network to sample more realistic synthetic data, which the explainer then uses to generate explanations. Our experiments show that, across three real-world datasets, the proposed method detects biased, adversarial behavior more accurately than vanilla LIME. This is achieved while maintaining comparable explanation quality, with up to 99.94% top-1 accuracy in some cases.
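The idea in the abstract can be illustrated with a minimal, self-contained sketch. It is a toy under stated assumptions, not the authors' implementation: the three features, the "manifold" rule x2 = x0 + x1, the hand-written `adversarial_model`, the kernel width, and the `realistic_sampler` (a stand-in for a trained GAN generator, which is not implemented here) are all hypothetical. The sketch fits a LIME-style weighted linear surrogate around one instance and compares which feature ranks first under naive perturbation sampling versus on-manifold sampling.

```python
import math
import random

FEATURES = ["sensitive", "unrelated", "derived"]  # hypothetical feature names


def adversarial_model(z):
    """Toy adversary: biased on realistic inputs, innocuous on off-manifold ones.

    Assumption: real data satisfy derived ~= sensitive + unrelated, so the
    adversary can detect perturbed samples by checking that relation.
    """
    on_manifold = abs(z[2] - (z[0] + z[1])) < 0.1
    return z[0] if on_manifold else z[1]


def naive_sampler(x, rng):
    """Perturb every feature independently, ignoring feature dependencies."""
    return [xi + rng.gauss(0, 1) for xi in x]


def realistic_sampler(x, rng):
    """Stand-in for a GAN generator: perturb, but stay close to the data manifold."""
    z0 = x[0] + rng.gauss(0, 1)
    z1 = x[1] + rng.gauss(0, 1)
    return [z0, z1, z0 + z1 + rng.uniform(-0.05, 0.05)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def explain(model, instance, sampler, rng, n_samples=2000, width=1.3):
    """Fit a locally weighted linear surrogate and return per-feature coefficients."""
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = sampler(instance, rng)
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, instance))
        rows.append([1.0] + z)                           # leading 1.0 is the intercept
        targets.append(model(z))
        weights.append(math.exp(-dist2 / width ** 2))    # assumed proximity kernel
    k = len(instance) + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, y, w in zip(rows, targets, weights))
            for i in range(k)]
    return solve(xtwx, xtwy)[1:]                         # drop the intercept


def top_feature(coefs):
    """Name of the feature with the largest absolute surrogate coefficient."""
    return max(zip(FEATURES, coefs), key=lambda pair: abs(pair[1]))[0]


rng = random.Random(0)
instance = [1.0, 0.0, 1.0]  # lies on the assumed manifold
naive_top = top_feature(explain(adversarial_model, instance, naive_sampler, rng))
realistic_top = top_feature(explain(adversarial_model, instance, realistic_sampler, rng))
print("naive sampling, top feature:      ", naive_top)
print("on-manifold sampling, top feature:", realistic_top)
```

In this toy setting, most naive perturbations fall off the assumed manifold, so the surrogate mostly observes the innocuous branch and ranks `unrelated` first. Samples that stay near the manifold expose the biased branch, and `sensitive` ranks first. The sampler is a plain function argument, so a trained generator could be substituted without changing `explain`.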
