Paper Title
Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?
Paper Authors
Paper Abstract
Algorithmic approaches to interpreting machine learning models have proliferated in recent years. We carry out human subject tests that are the first of their kind to isolate the effect of algorithmic explanations on a key aspect of model interpretability, simulatability, while avoiding important confounding experimental factors. A model is simulatable when a person can predict its behavior on new inputs. Through two kinds of simulation tests involving text and tabular data, we evaluate five explanation methods: (1) LIME, (2) Anchor, (3) Decision Boundary, (4) a Prototype model, and (5) a Composite approach that combines explanations from each method. Clear evidence of method effectiveness is found in very few cases: LIME improves simulatability in tabular classification, and our Prototype method is effective in counterfactual simulation tests. We also collect subjective ratings of explanations, but we do not find that ratings are predictive of how helpful explanations are. Our results provide the first reliable and comprehensive estimates of how explanations influence simulatability across a variety of explanation methods and data domains. We show that (1) we need to be careful about the metrics we use to evaluate explanation methods, and (2) there is significant room for improvement in current methods. All our supporting code, data, and models are publicly available at: https://github.com/peterbhase/InterpretableNLP-ACL2020
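For readers unfamiliar with the explanation formats named above, the sketch below illustrates the kind of LIME explanation a participant might study before attempting to simulate a text classifier. It is a minimal illustrative example, not the paper's experimental pipeline: the toy sentiment scorer is a hypothetical stand-in for the trained models used in the study, and only the calls to the `lime` package reflect its real API.

```python
# Minimal sketch (not the paper's code): generate a LIME explanation for a
# toy text classifier of the kind a user would study in a simulation test.
import numpy as np
from lime.lime_text import LimeTextExplainer

POSITIVE_WORDS = {"great", "good", "enjoyable", "love"}
NEGATIVE_WORDS = {"boring", "bad", "awful", "hate"}

def predict_proba(texts):
    """Hypothetical toy classifier: returns [p(negative), p(positive)] per text."""
    probs = []
    for text in texts:
        tokens = text.lower().split()
        score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
        p_pos = 1.0 / (1.0 + np.exp(-score))  # squash the word-count score into a probability
        probs.append([1.0 - p_pos, p_pos])
    return np.array(probs)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
example = "The plot was boring but the acting was great"
explanation = explainer.explain_instance(example, predict_proba, num_features=4)

# Each (word, weight) pair estimates how much that word pushes the prediction.
# In a simulation test, a user studies such explanations and then tries to
# predict the model's output on new inputs (simulatability).
print(explanation.as_list())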