Paper Title
A Concept and Argumentation based Interpretable Model in High Risk Domains
Paper Authors
Abstract
Interpretability has become an essential topic for artificial intelligence in high-risk domains such as healthcare, banking, and security. For commonly used tabular data, traditional methods train end-to-end machine learning models on numerical and categorical data only and do not leverage human-understandable knowledge such as data descriptions. Yet mining human-level knowledge from tabular data and using it for prediction remains a challenge. We therefore propose a concept and argumentation based model (CAM) with two components: a novel concept mining method that obtains human-understandable concepts and their relations from both feature descriptions and the underlying data, and a quantitative argumentation-based method for knowledge representation and reasoning. As a result, CAM makes decisions based on human-level knowledge, and its reasoning process is intrinsically interpretable. Finally, to visualize the proposed interpretable model, we provide a dialogical explanation that contains the dominant reasoning paths within CAM. Experimental results on both an open-source benchmark dataset and a real-world business dataset show that (1) CAM is transparent and interpretable, and the knowledge inside CAM is coherent with human understanding; and (2) our interpretable approach achieves competitive results compared with other state-of-the-art models.
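The abstract does not spell out how quantitative argumentation produces a decision, so the following is a minimal illustrative sketch, not the paper's actual implementation. It assumes a standard quantitative bipolar argumentation setup (in the style of DF-QuAD gradual semantics): each argument carries a base score in [0, 1], and its final strength is weakened by attackers and strengthened by supporters. All names (`decision`, `good_credit`, `high_debt`) are hypothetical examples.

```python
# Illustrative sketch of quantitative bipolar argumentation
# (DF-QuAD-style aggregation); NOT the paper's actual algorithm.

def aggregate(scores):
    """Combine strengths in [0, 1] via probabilistic sum."""
    result = 0.0
    for s in scores:
        result = result + s - result * s
    return result

def strength(arg, base, attackers, supporters, memo=None):
    """Recursively compute the strength of `arg` in an acyclic framework."""
    if memo is None:
        memo = {}
    if arg in memo:
        return memo[arg]
    va = aggregate([strength(a, base, attackers, supporters, memo)
                    for a in attackers.get(arg, [])])
    vs = aggregate([strength(s, base, attackers, supporters, memo)
                    for s in supporters.get(arg, [])])
    b = base[arg]
    if va >= vs:
        result = b - b * (va - vs)        # attacks dominate: weaken
    else:
        result = b + (1 - b) * (vs - va)  # supports dominate: strengthen
    memo[arg] = result
    return result

# Toy example: a decision argument supported by one mined concept
# and attacked by another (hypothetical names and scores).
base = {"decision": 0.5, "good_credit": 0.8, "high_debt": 0.6}
supporters = {"decision": ["good_credit"]}
attackers = {"decision": ["high_debt"]}
print(round(strength("decision", base, attackers, supporters), 3))  # → 0.6
```

Because each final strength decomposes into the strengths of its attackers and supporters, the chain of dominant influences can be read off directly, which is the kind of reasoning path a dialogical explanation would surface.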