Paper Title

An approximate dynamic programming approach to the admission control of elective patients

Authors

Jian Zhang, Mahjoub Dridi, Abdellah El Moudni

Abstract

In this paper, we propose an approximate dynamic programming (ADP) algorithm to solve a Markov decision process (MDP) formulation for the admission control of elective patients. To manage the elective patients from multiple specialties equitably and efficiently, we establish a waiting list and assign each patient a time-dependent dynamic priority score. Then, taking the random arrivals of patients into account, sequential decisions are made on a weekly basis: at the end of each week, we select from the waiting list the patients to be treated in the following week. By minimizing the cost function of the MDP over an infinite horizon, we seek the best trade-off between the patients' waiting times and the over-utilization of surgical resources. Considering the curses of dimensionality resulting from realistically sized problems, we first analyze the structural properties of the MDP and propose an algorithm that facilitates the search for the best actions. We then develop a novel reinforcement-learning-based ADP algorithm as the solution technique. Experimental results reveal that the proposed algorithms require far less computation time than conventional dynamic programming methods, and that they compute high-quality near-optimal policies for realistically sized problems.
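To make the weekly admission decision concrete, the toy model below sketches a heavily simplified version of such an MDP: the state is the waiting-list length, the action is how many patients to admit next week, and the cost combines waiting and over-utilization penalties. It is a minimal sketch, not the paper's formulation: the single-specialty state, the capped list, the arrival distribution, and all parameter names (`MAX_WAIT`, `REGULAR_CAP`, etc.) are illustrative assumptions, and plain tabular Q-learning stands in for the paper's reinforcement-learning-based ADP algorithm.

```python
import random

# Illustrative parameters (assumptions, not taken from the paper).
MAX_WAIT = 20      # waiting-list cap; state space is 0..MAX_WAIT patients
MAX_ADMIT = 6      # most patients treatable in one week
REGULAR_CAP = 4    # admissions beyond this incur an over-utilization cost
WAIT_COST = 1.0    # cost per patient left waiting one more week
OVER_COST = 5.0    # one-time cost per admission above regular capacity
GAMMA = 0.95       # discount factor for the infinite-horizon cost

def step(state, action, rng):
    """One weekly transition: admit `action` patients, then observe arrivals."""
    admitted = min(action, state)
    cost = (WAIT_COST * (state - admitted)
            + OVER_COST * max(0, admitted - REGULAR_CAP))
    arrivals = rng.choice([0, 1, 2, 3])   # stand-in for stochastic demand
    # Capping at MAX_WAIT (i.e., silently rejecting overflow) is a
    # simplification of this sketch, not a feature of the paper's model.
    next_state = min(MAX_WAIT, state - admitted + arrivals)
    return next_state, cost

def q_learning(episodes=5000, horizon=52, alpha=0.1, eps=0.1, seed=0):
    """Tabular Q-learning for the minimum-cost admission policy."""
    rng = random.Random(seed)
    Q = [[0.0] * (MAX_ADMIT + 1) for _ in range(MAX_WAIT + 1)]
    for _ in range(episodes):
        s = rng.randint(0, MAX_WAIT)
        for _ in range(horizon):
            if rng.random() < eps:                      # explore
                a = rng.randint(0, MAX_ADMIT)
            else:                                       # exploit (min cost)
                a = min(range(MAX_ADMIT + 1), key=lambda x: Q[s][x])
            s2, c = step(s, a, rng)
            target = c + GAMMA * min(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# Greedy (cost-minimizing) admission policy: admissions per waiting-list length.
policy = [min(range(MAX_ADMIT + 1), key=lambda a: Q[s][a])
          for s in range(MAX_WAIT + 1)]
```

The learned policy admits more patients as the waiting list grows, which mirrors the trade-off described in the abstract: waiting costs push admissions up, while the over-utilization penalty discourages exceeding regular capacity. The paper's actual method additionally exploits structural properties of the MDP to prune the action search and scales to multi-specialty, realistically sized states.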
