Paper Title
Auditing and Debugging Deep Learning Models via Decision Boundaries: Individual-level and Group-level Analysis
Paper Authors
Paper Abstract
Deep learning models have been criticized for their lack of easy interpretation, which undermines confidence in their use for important applications. Nevertheless, they are consistently used in many applications that are consequential to people's lives, mostly because of their better performance. There is therefore a great need for computational methods that can explain, audit, and debug such models. Here, we use flip points to accomplish these goals for deep learning models with continuous output scores (e.g., computed by softmax) used in social applications. A flip point is any point that lies on the boundary between two output classes: for example, for a model with a binary yes/no output, a flip point is any input that generates equal scores for "yes" and "no". The flip point closest to a given input is of particular importance because it reveals the smallest change to the input that would change the model's classification, and we show that it is the solution to a well-posed optimization problem. Flip points also enable us to systematically study the decision boundaries of a deep learning classifier. The resulting insight into the decision boundaries of a deep model can clearly explain the model's output at the individual level, via an explanation report that is understandable by non-experts. We also develop a procedure to understand and audit model behavior towards groups of people. Flip points can also be used to alter the decision boundaries in order to improve undesirable behaviors. We demonstrate our methods by investigating several models trained on standard datasets used in social applications of machine learning. We also identify the features that are most responsible for particular classifications and misclassifications.
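To make the idea of a closest flip point concrete, the sketch below works through a deliberately simplified case. It is not the paper's method: it assumes a toy linear model with a sigmoid "yes" score, for which the decision boundary is the hyperplane w·x + b = 0 and the closest flip point is simply the orthogonal projection of the input onto that hyperplane. For a deep network the same quantity would have to be found by numerical optimization. The names `score_yes` and `closest_flip_point`, and the weights used, are hypothetical.

```python
import math

def score_yes(w, b, x):
    """Sigmoid score for the "yes" class of a toy linear model (assumed, not from the paper)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def closest_flip_point(w, b, x):
    """Closest point to x with equal "yes"/"no" scores.

    For a linear model this is the orthogonal projection of x onto
    the hyperplane w.x + b = 0, which has a closed form.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    return [xi - (z / norm_sq) * wi for wi, xi in zip(w, x)]

# Hypothetical weights and input, chosen only for illustration.
w, b = [3.0, 4.0], -5.0
x = [3.0, 4.0]

x_flip = closest_flip_point(w, b, x)

# The per-feature change needed to reach the boundary, and its size.
change = [f - xi for f, xi in zip(x_flip, x)]
distance = math.sqrt(sum(c * c for c in change))

print("score at x:         ", score_yes(w, b, x))
print("closest flip point: ", x_flip)
print("score at flip point:", score_yes(w, b, x_flip))
print("distance to flip:   ", distance)
```

In this toy setting the `change` vector plays the role described in the abstract: it shows which features would have to move, and by how much, for the classification to flip, and `distance` gives a rough measure of how far the input sits from the decision boundary.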