HOLMES: Health OnLine Model Ensemble Serving for Deep Learning Models in Intensive Care Units

Paper Title

HOLMES: Health OnLine Model Ensemble Serving for Deep Learning Models in Intensive Care Units

Authors

Hong, Shenda; Xu, Yanbo; Khare, Alind; Priambada, Satria; Maher, Kevin; Aljiffry, Alaa; Sun, Jimeng; Tumanov, Alexey

Abstract

Deep learning models have achieved expert-level performance in healthcare, with an exclusive focus on training accurate models. However, in many clinical environments such as the intensive care unit (ICU), real-time model serving is as important as accuracy, if not more so, because in the ICU patient care is both more urgent and more expensive. Clinical decisions and their timeliness therefore directly affect both the patient outcome and the cost of care. To make timely decisions, we argue that the underlying serving system must be latency-aware. To compound the challenge, health analytic applications often require a combination of models instead of a single model. This lets individual models specialize in different targets, multi-modal data, different prediction windows, and potentially personalized predictions. To address these challenges, we propose HOLMES, an online model ensemble serving framework for healthcare applications. HOLMES dynamically identifies the best-performing set of models to ensemble for the highest accuracy, while also satisfying sub-second latency constraints on end-to-end prediction. We demonstrate that HOLMES is able to navigate the accuracy/latency tradeoff efficiently, compose the ensemble, and serve the model ensemble pipeline. It scales to data streamed simultaneously from 100 patients, each producing waveform data at 250 Hz. HOLMES outperforms conventional offline batch-processed inference on the same clinical task in terms of accuracy and latency (by an order of magnitude). HOLMES is tested on a risk prediction task on pediatric cardiac ICU data, achieving above 95% prediction accuracy and sub-second latency in a 64-bed simulation.
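The abstract describes choosing, at serving time, the most accurate set of models whose combined prediction still meets a latency budget. The sketch below illustrates that accuracy/latency trade-off only. It does not reproduce the search HOLMES itself uses. It rests on several hypothetical assumptions:

- Ensemble members run sequentially, so their latencies add.
- The ensemble averages binary risk probabilities.
- A small labelled validation set is available.

Given those assumptions, it exhaustively searches model subsets. The model names, probabilities, latencies, and the helpers `ensemble_accuracy` and `select_ensemble` are all illustrative inventions.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

# Hypothetical validation data (illustrative values only): per-model
# positive-class probabilities, binary labels, and per-model latency in ms.
PROBS: Dict[str, Sequence[float]] = {
    "a": [0.9, 0.2, 0.4, 0.8, 0.3],
    "b": [0.6, 0.6, 0.7, 0.7, 0.2],
    "c": [0.4, 0.1, 0.9, 0.3, 0.1],
}
LABELS: Sequence[int] = [1, 0, 1, 1, 0]
LATENCY_MS: Dict[str, float] = {"a": 200.0, "b": 300.0, "c": 100.0}


def ensemble_accuracy(
    probs: Dict[str, Sequence[float]],
    members: Tuple[str, ...],
    labels: Sequence[int],
    threshold: float = 0.5,
) -> float:
    """Accuracy of an ensemble that averages its members' probabilities."""
    correct = 0
    for i, label in enumerate(labels):
        avg = sum(probs[m][i] for m in members) / len(members)
        pred = 1 if avg >= threshold else 0
        correct += int(pred == label)
    return correct / len(labels)


def select_ensemble(
    probs: Dict[str, Sequence[float]],
    latency_ms: Dict[str, float],
    labels: Sequence[int],
    budget_ms: float,
) -> Tuple[Tuple[str, ...], float]:
    """Return the most accurate subset whose summed latency fits the budget.

    Assumes members run sequentially (latency = sum of member latencies).
    Subsets are enumerated from smallest to largest and only a strictly
    better accuracy replaces the incumbent, so ties favour smaller subsets.
    Returns ((), -1.0) if no single model fits the budget.
    """
    best: Tuple[str, ...] = ()
    best_acc = -1.0
    names = sorted(probs)
    for k in range(1, len(names) + 1):
        for members in combinations(names, k):
            if sum(latency_ms[m] for m in members) > budget_ms:
                continue  # violates the end-to-end latency constraint
            acc = ensemble_accuracy(probs, members, labels)
            if acc > best_acc:
                best, best_acc = members, acc
    return best, best_acc


if __name__ == "__main__":
    print(select_ensemble(PROBS, LATENCY_MS, LABELS, budget_ms=600.0))
    print(select_ensemble(PROBS, LATENCY_MS, LABELS, budget_ms=250.0))
```

With these made-up numbers, a 600 ms budget selects `("a", "b")` with validation accuracy 1.0. Tightening the budget to 250 ms leaves room only for single models, and the search falls back to `("a",)` with accuracy 0.8.

Exhaustive search costs O(2^n) subset evaluations, so it only suits a small model zoo. A larger zoo would need a greedy or heuristic search, and a parallel-execution latency model (for example, the maximum of member latencies) would change which subsets fit the budget.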
