Trajectory Inspection: A Method for Iterative Clinician-Driven Design of Reinforcement Learning Studies

Paper Title

Trajectory Inspection: A Method for Iterative Clinician-Driven Design of Reinforcement Learning Studies

Authors

Ji, Christina X., Oberst, Michael, Kanjilal, Sanjat, Sontag, David

Abstract

Reinforcement learning (RL) has the potential to significantly improve clinical decision making. However, treatment policies learned via RL from observational data are sensitive to subtle choices in study design. We highlight a simple approach, trajectory inspection, to bring clinicians into an iterative design process for model-based RL studies. We identify where the model recommends unexpectedly aggressive treatments or expects surprisingly positive outcomes from its recommendations. Then, we examine clinical trajectories simulated with the learned model and policy alongside the actual hospital course. Applying this approach to recent work on RL for sepsis management, we uncover a model bias towards discharge, a preference for high vasopressor doses that may be linked to small sample sizes, and clinically implausible expectations of discharge without weaning off vasopressors. We hope that iterations of detecting and addressing the issues unearthed by our method will result in RL policies that inspire more confidence in deployment.
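The two steps in the abstract can be sketched for a tabular model. The first step flags states where the policy recommends unexpectedly aggressive treatment or the model expects surprisingly positive outcomes. The second step rolls out a simulated trajectory next to a recorded hospital course. This is a minimal illustration under stated assumptions, not the authors' code. It assumes the following:

- discretized patient states
- actions ordered by vasopressor dose
- a learned transition array T[s, a, s2]
- a greedy policy obtained by value iteration

The function names (greedy_q, flag_states, simulate), the thresholds min_dose_jump and min_value_gap, and every number in the toy model are hypothetical. The high-dose row is deliberately built to look like an estimate from very few patients.

```python
import numpy as np


def greedy_q(T, R, terminal, gamma=0.99, tol=1e-10, max_iter=10_000):
    """Value iteration on a learned tabular model (hypothetical helper).

    T[s, a, s2]: estimated probability of moving from s to s2 under action a.
    R[s2]: reward for arriving in s2 (here +1 discharge, -1 death).
    terminal: boolean mask of absorbing states whose future value is fixed at 0.
    """
    V = np.zeros(T.shape[0])
    for _ in range(max_iter):
        Q = T @ (R + gamma * V)  # expected reward-to-go, shape (S, A)
        V_new = np.where(terminal, 0.0, Q.max(axis=1))
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q


def flag_states(Q, clinician_action, terminal, min_dose_jump=1, min_value_gap=0.2):
    """Return the greedy policy and non-terminal states worth showing to clinicians.

    Assumes a larger action index means a higher vasopressor dose. A state is
    flagged if the policy's dose exceeds the clinicians' usual dose, or if the
    model expects a much better outcome from the policy's action than from
    the clinicians' usual action. Thresholds are hypothetical.
    """
    policy = Q.argmax(axis=1)
    idx = np.arange(Q.shape[0])
    aggressive = (policy - clinician_action) >= min_dose_jump
    optimistic = (Q[idx, policy] - Q[idx, clinician_action]) >= min_value_gap
    return policy, np.flatnonzero((aggressive | optimistic) & ~terminal)


def simulate(T, policy, start, terminal, horizon=20, seed=0):
    """Sample one state trajectory from the learned model, following the policy."""
    rng = np.random.default_rng(seed)
    states = [start]
    s = start
    for _ in range(horizon):
        if terminal[s]:
            break
        s = int(rng.choice(T.shape[0], p=T[s, policy[s]]))
        states.append(s)
    return states


# Toy model with hypothetical numbers (not from the paper):
# 4 states x 3 vasopressor dose levels (none, low, high).
names = ["hypotensive", "stable", "discharged", "died"]
terminal = np.array([False, False, True, True])
R = np.array([0.0, 0.0, 1.0, -1.0])
T = np.zeros((4, 3, 4))
T[0] = [[0.5, 0.2, 0.0, 0.3],   # hypotensive, no vasopressor
        [0.4, 0.5, 0.0, 0.1],   # hypotensive, low dose
        [0.0, 0.0, 1.0, 0.0]]   # hypotensive, high dose: "seen" in very few patients
T[1] = [[0.1, 0.3, 0.6, 0.0],
        [0.2, 0.4, 0.4, 0.0],
        [0.3, 0.4, 0.3, 0.0]]
T[2, :, 2] = 1.0  # discharge and death are absorbing
T[3, :, 3] = 1.0
clinician_action = np.array([1, 0, 0, 0])  # most common dose per state (hypothetical)

Q = greedy_q(T, R, terminal)
policy, flagged = flag_states(Q, clinician_action, terminal)

# A hypothetical recorded hospital course for one patient starting in a flagged state.
actual = [0, 0, 1, 1, 2]
sim = simulate(T, policy, start=actual[0], terminal=terminal)
print("flagged states:", [names[s] for s in flagged])
print("actual:   ", " -> ".join(names[s] for s in actual))
print("simulated:", " -> ".join(names[s] for s in sim))
```

In this toy model, the high-dose action in the hypotensive state was "observed" to lead straight to discharge. The policy therefore prefers it, and the simulated patient is discharged while still on high-dose vasopressors. Placing that rollout beside the recorded course is the kind of contrast a clinician can judge implausible. With real data, one might also count how many patients support each flagged (state, action) pair.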
