Paper Title

Constrained Upper Confidence Reinforcement Learning

Paper Authors

Liyuan Zheng, Lillian J. Ratliff

Paper Abstract

Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning to settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known. Such a setting is well-motivated by a number of applications including exploration of unknown, potentially unsafe, environments. We present an algorithm C-UCRL and show that it achieves sub-linear regret ($O(T^{\frac{3}{4}}\sqrt{\log(T/\delta)})$) with respect to the reward while satisfying the constraints even while learning with probability $1-\delta$. Illustrative examples are provided.
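
As a minimal, non-authoritative sketch of the kind of planning step such an algorithm performs, the Python snippet below builds Hoeffding-style confidence bounds around empirical reward and cost estimates (optimistic for the reward, conservative for the cost) and, using the known transition kernel, solves a constrained planning LP over occupancy measures. The function name `plan_optimistic_policy`, the arrays `P`, `r_hat`, `c_hat`, `N`, the cost budget `tau`, and the average-reward occupancy-measure formulation are illustrative assumptions, not the paper's exact C-UCRL construction.

```python
# Illustrative sketch only: confidence-bound planning for a constrained MDP
# with a KNOWN transition kernel and UNKNOWN reward/cost means, as in the
# setting of the abstract above. Constants and names are assumptions.
import numpy as np
from scipy.optimize import linprog


def plan_optimistic_policy(P, r_hat, c_hat, N, tau, T, delta):
    """Solve a constrained planning LP with an optimistic (UCB) reward and a
    conservative (UCB) cost built from Hoeffding-style confidence radii."""
    S, A, _ = P.shape
    # Confidence radius per (s, a); shrinks as visit counts N grow.
    bonus = np.sqrt(np.log(2 * S * A * T / delta) / (2 * np.maximum(N, 1)))
    r_ucb = np.clip(r_hat + bonus, 0.0, 1.0)  # optimism for the reward
    c_ucb = np.clip(c_hat + bonus, 0.0, 1.0)  # conservatism for the cost

    n = S * A  # decision variable: stationary occupancy measure mu(s, a)
    # Flow conservation: sum_a mu(s', a) = sum_{s, a} P(s' | s, a) mu(s, a).
    A_eq = np.zeros((S + 1, n))
    for sp in range(S):
        for s in range(S):
            for a in range(A):
                A_eq[sp, s * A + a] -= P[s, a, sp]
        A_eq[sp, sp * A: sp * A + A] += 1.0
    A_eq[S, :] = 1.0  # mu is a probability distribution over (s, a)
    b_eq = np.zeros(S + 1)
    b_eq[S] = 1.0

    res = linprog(
        c=-r_ucb.ravel(),            # maximize the optimistic reward
        A_ub=c_ucb.reshape(1, -1),   # conservative cost must fit the budget
        b_ub=[tau],
        A_eq=A_eq, b_eq=b_eq,
        bounds=(0, None), method="highs",
    )
    assert res.success, "LP infeasible; a real algorithm needs a safe fallback"
    mu = res.x.reshape(S, A)
    # Normalize the occupancy measure into a stationary policy pi(a | s).
    return mu / np.maximum(mu.sum(axis=1, keepdims=True), 1e-12)


# Toy usage: 2 states, 2 actions, a random known kernel, fabricated estimates.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(2), size=(2, 2))       # known transition kernel
r_hat = rng.uniform(size=(2, 2))                 # empirical reward means
c_hat = rng.uniform(0.0, 0.3, size=(2, 2))       # empirical cost means
N = np.full((2, 2), 200)                         # visit counts so far
pi = plan_optimistic_policy(P, r_hat, c_hat, N, tau=0.6, T=1000, delta=0.1)
print("policy:\n", pi)
```

Planning against the upper confidence bound on the cost, rather than its empirical mean, is what makes the plan conservative: any policy feasible under `c_ucb` remains feasible under the true cost with high probability, which is how constraint satisfaction during learning is typically argued in this kind of setting.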
