Monitoring machine learning (ML)-based risk prediction algorithms in the presence of confounding medical interventions

Paper title

Monitoring machine learning (ML)-based risk prediction algorithms in the presence of confounding medical interventions

Authors

Jean Feng, Alexej Gossmann, Gene Pennello, Nicholas Petrick, Berkman Sahiner, Romain Pirracchio

Abstract

Performance monitoring of machine learning (ML)-based risk prediction models in healthcare is complicated by the issue of confounding medical interventions (CMI): when an algorithm predicts a patient to be at high risk for an adverse event, clinicians are more likely to administer prophylactic treatment and alter the very target that the algorithm aims to predict. A simple approach is to ignore CMI and monitor only the untreated patients, whose outcomes remain unaltered. In general, ignoring CMI may inflate Type I error because (i) untreated patients disproportionally represent those with low predicted risk and (ii) evolution in both the model and clinician trust in the model can induce complex dependencies that violate standard assumptions. Nevertheless, we show that valid inference is still possible if one monitors conditional performance and if either conditional exchangeability or time-constant selection bias hold. Specifically, we develop a new score-based cumulative sum (CUSUM) monitoring procedure with dynamic control limits. Through simulations, we demonstrate the benefits of combining model updating with monitoring and investigate how over-trust in a prediction model may delay detection of performance deterioration. Finally, we illustrate how these monitoring methods can be used to detect calibration decay of an ML-based risk calculator for postoperative nausea and vomiting during the COVID-19 pandemic.
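For readers unfamiliar with CUSUM charts, the following is a minimal sketch of a classical one-sided CUSUM applied to calibration residuals (observed outcome minus predicted risk) among untreated patients. It is not the score-based procedure with dynamic control limits developed in the paper: the control limit here is fixed, and the function name `cusum_alarm_time` and the parameters `allowance` and `control_limit` are hypothetical choices made for illustration only.

```python
from typing import Optional, Sequence


def cusum_alarm_time(
    outcomes: Sequence[int],
    predicted_risks: Sequence[float],
    allowance: float = 0.05,
    control_limit: float = 3.0,
) -> Optional[int]:
    """Return the 0-based index of the first alarm, or None if no alarm fires.

    Illustrative assumption: a fixed control limit, unlike the dynamic
    control limits described in the abstract.
    """
    statistic = 0.0
    for t, (y, p) in enumerate(zip(outcomes, predicted_risks)):
        # The residual is positive when the event occurs despite a low predicted risk.
        residual = y - p
        # One-sided CUSUM recursion: accumulate excess residuals, floor at zero.
        statistic = max(0.0, statistic + residual - allowance)
        if statistic > control_limit:
            return t
    return None


# Risk is under-predicted: every patient has the event at predicted risk 0.25.
print(cusum_alarm_time([1] * 10, [0.25] * 10, allowance=0.25, control_limit=2.2))
```

As the abstract notes, naively applying a chart like this to untreated patients alone may inflate the Type I error, which is the gap the paper's conditional-performance monitoring is meant to address.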
