Evaluating the Fairness Impact of Differentially Private Synthetic Data

Paper title

Evaluating the Fairness Impact of Differentially Private Synthetic Data

Authors

Blake Bullwinkel, Kristen Grabarz, Lily Ke, Scarlett Gong, Chris Tanner, Joshua Allen

Abstract

Differentially private (DP) synthetic data is a promising approach to maximizing the utility of data containing sensitive information. Due to the suppression of underrepresented classes that is often required to achieve privacy, however, it may be in conflict with fairness. We evaluate four DP synthesizers and present empirical results indicating that three of these models frequently degrade fairness outcomes on downstream binary classification tasks. We draw a connection between fairness and the proportion of minority groups present in the generated synthetic data, and find that training synthesizers on data that are pre-processed via a multi-label undersampling method can promote more fair outcomes without degrading accuracy.
