Paper Title
Partial Policy Iteration for L1-Robust Markov Decision Processes
Paper Authors
Paper Abstract
Robust Markov decision processes (MDPs) make it possible to compute reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially-known transition probabilities. Unfortunately, accounting for uncertainty in the transition probabilities significantly increases the computational complexity of solving robust MDPs, which severely limits their scalability. This paper describes new efficient algorithms for solving the common class of robust MDPs with s- and sa-rectangular ambiguity sets defined by weighted $L_1$ norms. We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs. We also propose fast methods for computing the robust Bellman operator in quasi-linear time, nearly matching the linear complexity of the non-robust Bellman operator. Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach, which uses linear programming solvers combined with robust value iteration.
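For reference, an sa-rectangular weighted-$L_1$ ambiguity set of the kind described in the abstract is typically written as $\mathcal{P}_{s,a} = \{ p \in \Delta^S : \|p - \bar{p}_{s,a}\|_{1,w_{s,a}} \le \kappa_{s,a} \}$, where $\bar{p}_{s,a}$ is the nominal transition distribution, $w_{s,a}$ a nonnegative weight vector, $\kappa_{s,a}$ an uncertainty budget, and $\|x\|_{1,w} = \sum_s w_s |x_s|$; the corresponding robust Bellman operator takes the form $(Tv)(s) = \max_{a} \min_{p \in \mathcal{P}_{s,a}} \left( r_{s,a} + \gamma\, p^\top v \right)$. The sketch below illustrates the inner minimization for a single state-action pair using a generic linear-programming formulation, i.e., the LP-solver baseline the abstract compares against, not the paper's quasi-linear algorithm; the function and variable names are illustrative and not taken from the paper.

import numpy as np
from scipy.optimize import linprog

def worst_case_value(v, p_bar, w, kappa, r=0.0, gamma=0.95):
    """Worst-case Bellman backup for one state-action pair:
        min_p  r + gamma * p @ v
        s.t.   p lies in the probability simplex and
               sum_i w_i * |p_i - p_bar_i| <= kappa   (weighted L1 ball).
    Solved as an LP with auxiliary variables t_i >= w_i * |p_i - p_bar_i|."""
    n = len(v)
    c = np.concatenate([gamma * v, np.zeros(n)])            # objective over x = [p, t]

    A_eq = np.concatenate([np.ones(n), np.zeros(n)]).reshape(1, -1)
    b_eq = np.array([1.0])                                   # probabilities sum to one

    I, W = np.eye(n), np.diag(w)
    A_ub = np.vstack([
        np.hstack([W, -I]),                                  #  w_i (p_i - p_bar_i) <= t_i
        np.hstack([-W, -I]),                                 # -w_i (p_i - p_bar_i) <= t_i
        np.concatenate([np.zeros(n), np.ones(n)])[None, :],  # sum_i t_i <= kappa
    ])
    b_ub = np.concatenate([w * p_bar, -w * p_bar, [kappa]])

    bounds = [(0.0, 1.0)] * n + [(0.0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return r + res.fun, res.x[:n]

# Small illustration: the adversary shifts probability mass toward low-value states.
value, p_worst = worst_case_value(v=np.array([1.0, 0.0, 2.0]),
                                  p_bar=np.array([0.5, 0.3, 0.2]),
                                  w=np.ones(3), kappa=0.3)

Solving one such LP per state-action pair inside every Bellman backup is what makes the LP-based baseline expensive; the abstract's claimed contribution is replacing it with a quasi-linear-time computation.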