Towards Equal Opportunity Fairness through Adversarial Learning

Paper title

Towards Equal Opportunity Fairness through Adversarial Learning

Paper authors

Xudong Han, Timothy Baldwin, Trevor Cohn

Abstract

Adversarial training is a common approach for bias mitigation in natural language processing. Although most work on debiasing is motivated by equal opportunity, it is not explicitly captured in standard adversarial training. In this paper, we propose an augmented discriminator for adversarial training, which takes the target class as input to create richer features and more explicitly model equal opportunity. Experimental results over two datasets show that our method substantially improves over standard adversarial debiasing methods, in terms of the performance-fairness trade-off.
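
As a rough illustration of the two ideas the abstract names, the sketch below shows (1) one common way to quantify equal opportunity, as the gap in true positive rate between protected groups, and (2) a discriminator input that includes the target class alongside the hidden representation. The function names, the one-hot concatenation, and the toy data are assumptions made for illustration; the abstract does not specify the authors' architecture or evaluation metric.

```python
from collections import defaultdict


def equal_opportunity_gap(y_true, y_pred, groups, positive=1):
    """Return (gap, per-group TPR) for the given positive class.

    Equal opportunity asks that the true positive rate be the same
    for every protected group; the gap is max TPR minus min TPR.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == positive:  # only instances of the positive class count
            totals[g] += 1
            hits[g] += int(p == positive)
    tpr = {g: hits[g] / totals[g] for g in totals}
    return max(tpr.values()) - min(tpr.values()), tpr


def augment_discriminator_input(hidden, target, num_classes):
    """Hypothetical helper: append a one-hot target class to the hidden vector.

    This is one simple way to let a discriminator condition on the
    target class; it is an assumption, not the paper's exact design.
    """
    one_hot = [1.0 if c == target else 0.0 for c in range(num_classes)]
    return list(hidden) + one_hot


# Toy data (made up for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
groups = ["a", "a", "a", "a", "a", "b", "b", "b"]

gap, tpr = equal_opportunity_gap(y_true, y_pred, groups)
features = augment_discriminator_input([0.2, -0.7], target=1, num_classes=2)
```

On the toy data, group "a" has a true positive rate of 3/4 and group "b" has 1/2, so the gap is 0.25; a smaller gap means the classifier is closer to satisfying equal opportunity.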
