An explainable XGBoost-based approach towards assessing the risk of cardiovascular disease in patients with Type 2 Diabetes Mellitus

Paper title

An explainable XGBoost-based approach towards assessing the risk of cardiovascular disease in patients with Type 2 Diabetes Mellitus

Authors

Athanasiou, Maria; Sfrintzeri, Konstantina; Zarkogianni, Konstantia; Thanopoulou, Anastasia C.; Nikita, Konstantina S.

Abstract

Cardiovascular Disease (CVD) is an important cause of disability and death among individuals with Diabetes Mellitus (DM). International clinical guidelines for the management of Type 2 DM (T2DM) are founded on primary and secondary prevention and favor the evaluation of CVD-related risk factors to guide appropriate treatment initiation. CVD risk prediction models can provide valuable tools for optimizing the frequency of medical visits and for performing timely preventive and therapeutic interventions against CVD events. Integrating explainability modalities into these models can enhance human understanding of the reasoning process, maximize transparency, and build trust towards the models' adoption in clinical practice. The aim of the present study is to develop and evaluate an explainable personalized risk prediction model for fatal or non-fatal CVD incidence in individuals with T2DM. An explainable approach based on eXtreme Gradient Boosting (XGBoost) and the Tree SHAP (SHapley Additive exPlanations) method is deployed to calculate the 5-year CVD risk and to generate individual explanations of the model's decisions. Data from the 5-year follow-up of 560 patients with T2DM are used for development and evaluation purposes. The obtained results (AUC = 71.13%) indicate the potential of the proposed approach to handle the imbalanced nature of the dataset, while providing clinically meaningful insights into the ensemble model's decision process.
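
The "individual explanations" mentioned in the abstract rest on Shapley values: each feature receives a contribution, and the contributions sum to the difference between the patient's predicted risk and a reference prediction. The sketch below illustrates that additive property by brute-force enumeration on a toy scoring function. It is not the authors' model and not the Tree SHAP algorithm itself, which obtains such attributions efficiently for tree ensembles. The function `toy_risk`, the feature names, the thresholds, and the single reference patient are all hypothetical choices made only for illustration.

```python
from itertools import combinations
from math import factorial


def toy_risk(features):
    """Hypothetical 5-year risk score; NOT the model from the paper."""
    score = 0.05
    if features["hba1c"] > 7.0:
        score += 0.10
    if features["smoker"]:
        score += 0.08
    # Interaction term: elevated HbA1c matters more in older patients.
    if features["hba1c"] > 7.0 and features["age"] > 60:
        score += 0.07
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating all feature coalitions.

    Features outside a coalition are set to their baseline value
    (a simplifying assumption; cost grows exponentially with features).
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {name}) - value(coalition))
        phi[name] = total
    return phi


# Hypothetical patient and reference (baseline) patient.
patient = {"hba1c": 8.2, "smoker": True, "age": 67}
reference = {"hba1c": 6.5, "smoker": False, "age": 55}

phi = shapley_values(toy_risk, patient, reference)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.3f}")
```

The per-feature contributions add up to `toy_risk(patient) - toy_risk(reference)`, which is what lets a clinician read an explanation as a decomposition of one patient's predicted risk. The interaction term is split between `hba1c` and `age` rather than assigned to either one alone.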
