Paper Title

Constrained Upper Confidence Reinforcement Learning

Paper Authors

Liyuan Zheng, Lillian J. Ratliff

Paper Abstract

Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning to settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well-motivated by a number of applications including exploration of unknown, potentially unsafe, environments. We present an algorithm C-UCRL and show that it achieves sub-linear regret ($O(T^{\frac{3}{4}}\sqrt{\log(T/\delta)})$) with respect to the reward while satisfying the constraints even while learning with probability $1-\delta$. Illustrative examples are provided.
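
As a minimal, non-authoritative sketch of the kind of planning step such an algorithm performs, the Python snippet below builds Hoeffding-style confidence bounds around empirical reward and cost estimates (optimistic for the reward, conservative for the cost) and, using the known transition kernel, solves a constrained planning LP over occupancy measures. The function name `plan_optimistic_policy`, the arrays `P`, `r_hat`, `c_hat`, `N`, the cost budget `tau`, and the average-reward occupancy-measure formulation are illustrative assumptions, not the paper's exact C-UCRL construction.

```python
# Illustrative sketch only: confidence-bound planning for a constrained MDP
# with a KNOWN transition kernel and UNKNOWN reward/cost means, as in the
# setting of the abstract above. Constants and names are assumptions.
import numpy as np
from scipy.optimize import linprog


def plan_optimistic_policy(P, r_hat, c_hat, N, tau, T, delta):
    """Solve a constrained planning LP with an optimistic (UCB) reward and a
    conservative (UCB) cost built from Hoeffding-style confidence radii."""
    S, A, _ = P.shape
    # Confidence radius per (s, a); shrinks as visit counts N grow.
    bonus = np.sqrt(np.log(2 * S * A * T / delta) / (2 * np.maximum(N, 1)))
    r_ucb = np.clip(r_hat + bonus, 0.0, 1.0)  # optimism for the reward
    c_ucb = np.clip(c_hat + bonus, 0.0, 1.0)  # conservatism for the cost

    n = S * A  # decision variable: stationary occupancy measure mu(s, a)
    # Flow conservation: sum_a mu(s', a) = sum_{s, a} P(s' | s, a) mu(s, a).
    A_eq = np.zeros((S + 1, n))
    for sp in range(S):
        for s in range(S):
            for a in range(A):
                A_eq[sp, s * A + a] -= P[s, a, sp]
        A_eq[sp, sp * A: sp * A + A] += 1.0
    A_eq[S, :] = 1.0  # mu is a probability distribution over (s, a)
    b_eq = np.zeros(S + 1)
    b_eq[S] = 1.0

    res = linprog(
        c=-r_ucb.ravel(),            # maximize the optimistic reward
        A_ub=c_ucb.reshape(1, -1),   # conservative cost must fit the budget
        b_ub=[tau],
        A_eq=A_eq, b_eq=b_eq,
        bounds=(0, None), method="highs",
    )
    assert res.success, "LP infeasible; a real algorithm needs a safe fallback"
    mu = res.x.reshape(S, A)
    # Normalize the occupancy measure into a stationary policy pi(a | s).
    return mu / np.maximum(mu.sum(axis=1, keepdims=True), 1e-12)


# Toy usage: 2 states, 2 actions, a random known kernel, fabricated estimates.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(2), size=(2, 2))       # known transition kernel
r_hat = rng.uniform(size=(2, 2))                 # empirical reward means
c_hat = rng.uniform(0.0, 0.3, size=(2, 2))       # empirical cost means
N = np.full((2, 2), 200)                         # visit counts so far
pi = plan_optimistic_policy(P, r_hat, c_hat, N, tau=0.6, T=1000, delta=0.1)
print("policy:\n", pi)
```

Planning against the upper confidence bound on the cost, rather than its empirical mean, is what makes the plan conservative: any policy feasible under `c_ucb` remains feasible under the true cost with high probability, which is how constraint satisfaction during learning is typically argued in this kind of setting.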
