Paper Title
Recursive Constraints to Prevent Instability in Constrained Reinforcement Learning
Paper Authors
Paper Abstract
We consider the challenge of finding a deterministic policy for a Markov decision process that uniformly (in all states) maximizes one reward subject to a probabilistic constraint over a different reward. Existing solutions do not fully address our precise problem definition, which nevertheless arises naturally in the context of safety-critical robotic systems. This class of problems is known to be hard; moreover, the combined requirements of determinism and uniform optimality can create learning instability. In this work, after describing and motivating our problem with a simple example, we present a suitable constrained reinforcement learning algorithm that uses recursive constraints to prevent learning instability. Our proposed approach admits an approximate form that improves efficiency and is conservative with respect to the constraint.
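For concreteness, the problem statement in the abstract can be formalized as the program sketched below. The notation (value functions $V_R$ and $V_C$, discount $\gamma$, threshold $p$, feasible set $\Pi_p$) is an illustrative choice of ours, not necessarily the paper's:

\[
\begin{aligned}
&\text{find a deterministic } \pi : S \to A \ \text{ such that } \ \forall s \in S:\quad
V_R^{\pi}(s) \;=\; \max_{\pi' \in \Pi_p} V_R^{\pi'}(s),\\
&\text{where}\quad
V_R^{\pi}(s) \;=\; \mathbb{E}^{\pi}\Big[\sum_{t \ge 0} \gamma^{t} R(s_t, a_t) \,\Big|\, s_0 = s\Big],
\qquad
\Pi_p \;=\; \big\{\pi' \text{ deterministic} \;:\; V_C^{\pi'}(s) \ge p \ \ \forall s \in S \big\},
\end{aligned}
\]

and $V_C^{\pi}$ is the value of a second, $\{0,1\}$-valued reward $C$, so that $V_C^{\pi}(s)$ is the probability that a trajectory started at $s$ satisfies the constraint event; this is the sense in which the constraint is probabilistic and "over a different reward". The uniformity requirement (optimality at every state, not just at a fixed initial state) combined with determinism distinguishes this from the classical constrained-MDP setting, where the objective is evaluated at an initial distribution and optimal policies are in general stochastic.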