Paper Title
Subverting Fair Image Search with Generative Adversarial Perturbations
Authors
Abstract
In this work we explore the intersection of fairness and robustness in the context of ranking: when a ranking model has been calibrated to achieve some definition of fairness, is it possible for an external adversary to make the ranking model behave unfairly without having access to the model or training data? To investigate this question, we present a case study in which we develop and then attack a state-of-the-art, fairness-aware image search engine using images that have been maliciously modified using a Generative Adversarial Perturbation (GAP) model. These perturbations attempt to cause the fair re-ranking algorithm to unfairly boost the rank of images containing people from an adversary-selected subpopulation. We present results from extensive experiments demonstrating that our attacks can successfully confer significant unfair advantage to people from the majority class relative to fairly-ranked baseline search results. We demonstrate that our attacks are robust across a number of variables, that they have close to zero impact on the relevance of search results, and that they succeed under a strict threat model. Our findings highlight the danger of deploying fair machine learning algorithms in the wild when (1) the data necessary to achieve fairness may be adversarially manipulated, and (2) the models themselves are not robust against attacks.
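To make the attack idea concrete, the following is a minimal, illustrative PyTorch sketch of a GAP-style attack, not the authors' implementation: a small generator network produces an L-infinity-bounded perturbation that, when added to an image, raises the score that a surrogate model assigns to an attacker-chosen attribute (the signal a fair re-ranker might rely on). The generator and scorer architectures, the SurrogateScorer name, and the epsilon budget are all assumptions for illustration; the paper's actual GAP model and fairness-aware search engine are not reproduced here.

import torch
import torch.nn as nn

EPSILON = 8 / 255  # assumed L-infinity perturbation budget

class PerturbationGenerator(nn.Module):
    # Maps an image to a small additive perturbation bounded by EPSILON.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),  # output in [-1, 1]
        )

    def forward(self, x):
        return EPSILON * self.net(x)  # scale into the L-infinity ball

class SurrogateScorer(nn.Module):
    # Hypothetical stand-in for a model that predicts the attribute the fair
    # re-ranker conditions on (e.g., an inferred-demographic score).
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

generator = PerturbationGenerator()
scorer = SurrogateScorer()  # in practice this would be pretrained and frozen
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)

def attack_step(images, target_score=1.0):
    # One training step: push the perturbed images toward the attacker's
    # target attribute score while keeping the perturbation bounded.
    delta = generator(images)
    adv = torch.clamp(images + delta, 0.0, 1.0)
    scores = scorer(adv)
    loss = ((scores - target_score) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random tensors standing in for search-index images.
batch = torch.rand(4, 3, 64, 64)
print(attack_step(batch))

The key design point this sketch illustrates is that the perturbation is generated per image (rather than optimized per query), so an adversary can modify images offline, without access to the deployed ranking model or its training data, which matches the strict threat model described in the abstract.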