Treatment Policy Learning in Multiobjective Settings with Fully Observed Outcomes

Paper Title

Treatment Policy Learning in Multiobjective Settings with Fully Observed Outcomes

Authors

Soorajnath Boominathan, Michael Oberst, Helen Zhou, Sanjat Kanjilal, David Sontag

Abstract

In several medical decision-making problems, such as antibiotic prescription, laboratory testing can provide precise indications for how a patient will respond to different treatment options. This enables us to "fully observe" all potential treatment outcomes, but while present in historical data, these results are infeasible to produce in real-time at the point of the initial treatment decision. Moreover, treatment policies in these settings often need to trade off between multiple competing objectives, such as effectiveness of treatment and harmful side effects. We present, compare, and evaluate three approaches for learning individualized treatment policies in this setting: First, we consider two indirect approaches, which use predictive models of treatment response to construct policies optimal for different trade-offs between objectives. Second, we consider a direct approach that constructs such a set of policies without intermediate models of outcomes. Using a medical dataset of Urinary Tract Infection (UTI) patients, we show that all approaches learn policies that achieve strictly better performance on all outcomes than clinicians, while also trading off between different objectives. We demonstrate additional benefits of the direct approach, including flexibly incorporating other goals such as deferral to physicians on simple cases.
