Paper Title

A Low-rank Approximation for MDPs via Moment Coupling

Authors

Zhang, Amy B. Z., Gurvich, Itai

Abstract

We introduce a framework to approximate a Markov Decision Process that stands on two pillars: state aggregation -- as the algorithmic infrastructure; and central-limit-theorem-type approximations -- as the mathematical underpinning of optimality guarantees. The theory is grounded in recent work by Braverman et al. (2020) that relates the solution of the Bellman equation to that of a PDE where, in the spirit of the central limit theorem, the transition matrix is reduced to its local first and second moments. Solving the PDE is $\textit{not}$ required by our method. Instead, we construct a "sister" (controlled) Markov chain whose two local transition moments are approximately identical to those of the focal chain. Because of this $\textit{moment matching}$, the original chain and its "sister" are coupled through the PDE, a coupling that facilitates optimality guarantees. Embedded into standard soft aggregation algorithms, moment matching provides a disciplined mechanism to tune the aggregation and disaggregation probabilities. The computational gains arise from reducing the effective state space from $N$ to $N^{\frac{1}{2}+\epsilon}$, as one might intuitively expect from approximations grounded in the central limit theorem.
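
To make the notion of moment matching concrete, the toy sketch below computes the local first and second moments of a one-dimensional chain and builds a coarser chain whose moments agree at interior states. It is an illustration only, not the construction used in the paper: the birth-death focal chain, the jump size of 2, and the helper names `local_moments`, `jump_chain`, and `matched_probs` are all assumptions introduced here, and the example has no controls or aggregation step.

```python
def local_moments(P, states):
    """Per-state first and second moments of the jump y - x under P."""
    drift = [sum(p * (y - x) for p, y in zip(row, states))
             for x, row in zip(states, P)]
    second = [sum(p * (y - x) ** 2 for p, y in zip(row, states))
              for x, row in zip(states, P)]
    return drift, second


def jump_chain(n, up, down, step=1):
    """Chain on 0..n-1 that jumps +step w.p. up and -step w.p. down.

    Hypothetical toy model: jumps that would leave the state space
    are replaced by staying put, so moments differ near the boundary.
    """
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        stay = 1.0
        if i + step < n:
            P[i][i + step] = up
            stay -= up
        if i - step >= 0:
            P[i][i - step] = down
            stay -= down
        P[i][i] = stay
    return P


def matched_probs(up, down, step):
    """Solve step*(a-b) = up-down and step^2*(a+b) = up+down for (a, b)."""
    m1 = up - down            # local first moment of the focal chain
    m2 = up + down            # local second moment of the focal chain
    a = 0.5 * (m2 / step ** 2 + m1 / step)
    b = 0.5 * (m2 / step ** 2 - m1 / step)
    if a < 0 or b < 0 or a + b > 1:
        raise ValueError("no valid probabilities match both moments")
    return a, b


n = 11
states = list(range(n))

focal = jump_chain(n, up=0.3, down=0.2, step=1)
a, b = matched_probs(0.3, 0.2, step=2)
sister = jump_chain(n, up=a, down=b, step=2)

mu_f, s2_f = local_moments(focal, states)
mu_s, s2_s = local_moments(sister, states)

mid = n // 2  # an interior state, away from boundary effects
print("drift:        ", mu_f[mid], mu_s[mid])
print("second moment:", s2_f[mid], s2_s[mid])
```

The two chains take different jumps yet share the same drift and second moment at interior states, which is the property the abstract says couples the focal chain and its "sister" through the PDE.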