UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data

Paper title

UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data

Authors

Chacha Chen, Junjie Liang, Fenglong Ma, Lucas M. Glass, Jimeng Sun, Cao Xiao

Abstract

Successful health risk prediction demands both accuracy and reliability from the model. Existing predictive models mainly rely on mining electronic health records (EHR) with advanced deep learning techniques to improve accuracy. However, they ignore publicly available online health data, especially socioeconomic status, environmental factors, and detailed demographic information for each location, all of which are strong predictive signals and can augment precision medicine. To be reliable, a model needs to provide both an accurate prediction and an uncertainty score for that prediction. However, existing uncertainty estimation approaches often fail on the high-dimensional data found in multi-sourced data. To fill this gap, we propose the UNcertaInTy-based hEalth risk prediction (UNITE) model. Building upon an adaptive multimodal deep kernel and a stochastic variational inference module, UNITE provides accurate disease risk prediction and uncertainty estimation by leveraging multi-sourced health data, including EHR data, patient demographics, and public health data collected from the web. We evaluate UNITE on two real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH), a form of nonalcoholic fatty liver disease, and Alzheimer's disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms the best of various state-of-the-art baselines by up to 19%. We also show that UNITE can model meaningful uncertainties and can provide evidence-based clinical support by clustering similar patients.
