Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations

Paper Title

Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations

Authors

Jessica Dai, Sohini Upadhyay, Ulrich Aivodji, Stephen H. Bach, Himabindu Lakkaraju

Abstract

As post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to ensure that the quality of the resulting explanations is consistently high across various population subgroups, including minority groups. For instance, it should not be the case that explanations associated with instances belonging to a particular gender subgroup (e.g., female) are less accurate than those associated with other genders. However, there is little to no research that assesses whether such group-based disparities exist in the quality of the explanations output by state-of-the-art explanation methods. In this work, we address this gap by initiating the study of identifying group-based disparities in explanation quality. To this end, we first outline the key properties which constitute explanation quality and where disparities can be particularly problematic. We then leverage these properties to propose a novel evaluation framework which can quantitatively measure disparities in the quality of explanations output by state-of-the-art methods. Using this framework, we carry out a rigorous empirical analysis to understand if and when group-based disparities in explanation quality arise. Our results indicate that such disparities are more likely to occur when the models being explained are complex and highly non-linear. In addition, we also observe that certain post hoc explanation methods (e.g., Integrated Gradients, SHAP) are more likely to exhibit the aforementioned disparities. To the best of our knowledge, this work is the first to highlight and study the problem of group-based disparities in explanation quality. In doing so, our work sheds light on previously unexplored ways in which explanation methods may introduce unfairness in real-world decision-making.
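The abstract does not list the framework's concrete metrics. As a minimal sketch, assume that per-instance explanation quality is scored by a hypothetical local-fidelity metric, defined here as the fraction of random perturbations on which a linear explanation agrees in sign with the model. Also assume that disparity is the gap between the best-served and worst-served group's mean score. A group-based comparison could then look like the code below. The names `local_fidelity`, `quality_disparity` and `tangent_explanation`, the Gaussian perturbation scheme, and the max-min gap are illustrative assumptions, not the paper's definitions.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def local_fidelity(model: Model, weights: Sequence[float], intercept: float,
                   x: Sequence[float], n_samples: int = 200,
                   sigma: float = 0.1, seed: int = 0) -> float:
    """Hypothetical quality metric (an assumption, not the paper's): the
    fraction of Gaussian perturbations z ~ N(x, sigma^2 I) on which the linear
    explanation (weights, intercept) predicts the same sign as the model."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, sigma) for xi in x]
        surrogate = intercept + sum(w * zi for w, zi in zip(weights, z))
        agree += (surrogate >= 0) == (model(z) >= 0)
    return agree / n_samples


def quality_disparity(scores: Sequence[float],
                      groups: Sequence[str]) -> Tuple[Dict[str, float], float]:
    """Mean quality score per group and the max-min gap between group means
    (one possible disparity measure, assumed for illustration)."""
    by_group: Dict[str, List[float]] = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    group_means = {g: mean(v) for g, v in by_group.items()}
    gap = max(group_means.values()) - min(group_means.values())
    return group_means, gap


if __name__ == "__main__":
    # Toy non-linear model; each explanation is the model's first-order
    # Taylor expansion (tangent plane) at the explained point.
    def toy_model(z: Sequence[float]) -> float:
        return z[0] * z[1]

    def tangent_explanation(x: Sequence[float]) -> Tuple[List[float], float]:
        # f(x) + grad f(x) . (z - x) = x1*z0 + x0*z1 - x0*x1
        return [x[1], x[0]], -x[0] * x[1]

    points = [[1.0, 1.0], [1.5, 0.8], [0.05, 0.05], [0.1, -0.05]]
    groups = ["A", "A", "B", "B"]
    scores = []
    for x in points:
        w, b = tangent_explanation(x)
        scores.append(local_fidelity(toy_model, w, b, x))
    print(quality_disparity(scores, groups))
```

In the toy usage, group B's points sit close to where the model's output changes sign, so a single tangent-plane explanation is likely to disagree with the model more often than it does for group A. This is meant only as a small illustration of the abstract's observation that disparities become more likely when the explained model is highly non-linear.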
