论文标题
价值迭代算法的强度多项式,用于计算折扣动态编程的几乎最佳策略
Strong Polynomiality of the Value Iteration Algorithm for Computing Nearly Optimal Policies for Discounted Dynamic Programming
论文作者
论文摘要
本说明提供了对按价值迭代计算所需的操作数量的上限,这是无限马可分子折扣马尔可夫决策过程的几乎最佳政策,并具有有限的状态和行动。对于给定的折现因子,奖励功能的大小以及与最佳性的所需关系,这些上限在州行动对的数量中是强烈的多项式,并且所提供的上限之一具有其属性,即它是折现因子价值的不良函数。
This note provides upper bounds on the number of operations required to compute by value iterations a nearly optimal policy for an infinite-horizon discounted Markov decision process with a finite number of states and actions. For a given discount factor, magnitude of the reward function, and desired closeness to optimality, these upper bounds are strongly polynomial in the number of state-action pairs, and one of the provided upper bounds has the property that it is a non-decreasing function of the value of the discount factor.