论文标题
LIMSUP和LIMINF阈值目标的策略复杂性在可数的MDP中,并申请最佳预期收益
Strategy Complexity of Limsup and Liminf Threshold Objectives in Countable MDPs, with Applications to Optimal Expected Payoffs
论文作者
论文摘要
我们研究了马尔可夫决策过程(MDP),这些国家数量无数状态。 $ \ limsup $(分别$ \ liminf $)阈值目标是最大化直接看到奖励的无限序列的$ \ limsup $(resp。$ \ liminf $)的可能性。我们建立了这些目标的策略复杂性的完整图片,即$ \ varepsilon $ -optimal(最佳)策略所要求的内存上的上限和下限。然后,我们将这些结果应用于(Sudderth,经济学和金融的决策,2020年),以解决有关预期$ \ limsup $(resp。$ \ liminf $)回报的最佳策略的战略复杂性。
We study Markov decision processes (MDPs) with a countably infinite number of states. The $\limsup$ (resp. $\liminf$) threshold objective is to maximize the probability that the $\limsup$ (resp. $\liminf$) of the infinite sequence of directly seen rewards is non-negative. We establish the complete picture of the strategy complexity of these objectives, i.e., the upper and lower bounds on the memory required by $\varepsilon$-optimal (resp. optimal) strategies. We then apply these results to solve two open problems from (Sudderth, Decisions in Economics and Finance, 2020) about the strategy complexity of optimal strategies for the expected $\limsup$ (resp. $\liminf$) payoff.