Paper Title

Use-Case-Grounded Simulations for Explanation Evaluation

Paper Authors

Valerie Chen, Nari Johnson, Nicholay Topin, Gregory Plumb, Ameet Talwalkar

Paper Abstract

A growing body of research runs human subject evaluations to study whether providing users with explanations of machine learning models can help them with practical real-world use cases. However, running user studies is challenging and costly, and consequently each study typically only evaluates a limited number of different settings, e.g., studies often only evaluate a few arbitrarily selected explanation methods. To address these challenges and aid user study design, we introduce Use-Case-Grounded Simulated Evaluations (SimEvals). SimEvals involve training algorithmic agents that take as input the information content (such as model explanations) that would be presented to each participant in a human subject study, to predict answers to the use case of interest. The algorithmic agent's test set accuracy provides a measure of the predictiveness of the information content for the downstream use case. We run a comprehensive evaluation on three real-world use cases (forward simulation, model debugging, and counterfactual reasoning) to demonstrate that SimEvals can effectively identify which explanation methods will help humans for each use case. These results provide evidence that SimEvals can be used to efficiently screen an important set of user study design decisions, e.g., selecting which explanations should be presented to the user, before running a potentially costly user study.
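To make the SimEvals idea concrete, below is a minimal sketch, not the authors' implementation, of how an algorithmic agent could be trained and scored for the forward-simulation use case. Every specific choice here is an assumption made for illustration: synthetic scikit-learn data, a logistic regression as the model being explained, coefficient-times-feature attributions standing in for an explanation method, and a gradient-boosting classifier as the agent.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1. Train the model whose behavior study participants would reason about
#    (a stand-in "model to be explained"; assumed for illustration).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_model, X_study, y_model, _ = train_test_split(X, y, test_size=0.5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_model, y_model)

# 2. Assemble the "information content" a participant would see per instance:
#    here, the input features plus a simple local attribution
#    (coefficient * feature value), standing in for a real explanation method.
attributions = X_study * model.coef_
info_content = np.hstack([X_study, attributions])

# 3. Define the use-case label. For forward simulation, the agent must
#    predict the model's own output on each instance.
use_case_labels = model.predict(X_study)

# 4. Train the algorithmic agent on (information content -> use-case answer)
#    and report its held-out accuracy as the measure of predictiveness.
Xa_tr, Xa_te, ya_tr, ya_te = train_test_split(
    info_content, use_case_labels, test_size=0.3, random_state=0)
agent = GradientBoostingClassifier(random_state=0).fit(Xa_tr, ya_tr)
print("SimEval accuracy (inputs + explanation):", agent.score(Xa_te, ya_te))

# Baseline agent that sees the inputs only (no explanation), for comparison.
Xb_tr, Xb_te, yb_tr, yb_te = train_test_split(
    X_study, use_case_labels, test_size=0.3, random_state=0)
baseline = GradientBoostingClassifier(random_state=0).fit(Xb_tr, yb_tr)
print("SimEval accuracy (inputs only):         ", baseline.score(Xb_te, yb_te))
```

Comparing the agent's held-out accuracy with and without the explanation gives a rough screen of whether that explanation method is likely to help participants with the use case, which is the screening role the abstract describes for SimEvals.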
