价值迭代算法的强度多项式，用于计算折扣动态编程的几乎最佳策略

论文标题

价值迭代算法的强度多项式，用于计算折扣动态编程的几乎最佳策略

Strong Polynomiality of the Value Iteration Algorithm for Computing Nearly Optimal Policies for Discounted Dynamic Programming

论文作者

Feinberg, Eugene A., He, Gaojin

论文摘要

本说明提供了对按价值迭代计算所需的操作数量的上限，这是无限马可分子折扣马尔可夫决策过程的几乎最佳政策，并具有有限的状态和行动。对于给定的折现因子，奖励功能的大小以及与最佳性的所需关系，这些上限在州行动对的数量中是强烈的多项式，并且所提供的上限之一具有其属性，即它是折现因子价值的不良函数。

This note provides upper bounds on the number of operations required to compute by value iterations a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number of states and actions. For a given discount factor, magnitude of the reward function, and desired closeness to optimality, these upper bounds are strongly polynomial in the number of state-action pairs, and one of the provided upper bounds has the property that it is a non-decreasing function of the value of the discount factor.

下载PDF全文

下载文献需遵守相关版权规定

论文标题