Functional Stability of Discounted Markov Decision Processes Using Economic MPC Dissipativity Theory

Paper title

Functional Stability of Discounted Markov Decision Processes Using Economic MPC Dissipativity Theory

Authors

Kordabad, Arash Bahari; Gros, Sebastien

Abstract

This paper discusses the functional stability of closed-loop Markov Chains under optimal policies resulting from a discounted optimality criterion, forming Markov Decision Processes (MDPs). We investigate the stability of MDPs in the sense of probability measures (densities) underlying the state distributions and extend the dissipativity theory of Economic Model Predictive Control in order to characterize the MDP stability. This theory requires a so-called storage function satisfying a dissipativity inequality. In the probability measures space and for the discounted setting, we introduce new dissipativity conditions ensuring the MDP stability. We then use finite-horizon optimal control problems in order to generate valid storage functionals. In practice, we propose to use Q-learning to compute the storage functionals.
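
For orientation, the classical strict dissipativity condition from deterministic, undiscounted economic MPC, which the abstract says is extended here, is usually written as below. This is the textbook form only, not the paper's condition: the paper states its conditions on probability measures and in a discounted setting, and those are not reproduced here. All symbols are assumptions introduced for illustration: dynamics $x^+ = f(x,u)$, stage cost $\ell$, optimal steady state $(x_s, u_s)$, storage function $\lambda$, and a positive definite function $\rho$.

```latex
% Classical strict dissipativity (deterministic, undiscounted setting).
% Notation is illustrative and not taken from the paper.
\[
  \lambda\bigl(f(x,u)\bigr) - \lambda(x)
  \;\le\;
  \ell(x,u) - \ell(x_s,u_s) - \rho\bigl(\lVert x - x_s \rVert\bigr)
  \qquad \text{for all admissible } (x,u).
\]
```

Read this way, the storage function $\lambda$ can grow along a trajectory by no more than the supplied cost $\ell(x,u) - \ell(x_s,u_s)$, with a strict loss whenever $x \neq x_s$.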
