Paper Title
Policy Gradients for Probabilistic Constrained Reinforcement Learning
Paper Authors
Paper Abstract
This paper considers the problem of learning safe policies in the context of reinforcement learning (RL). In particular, we consider the notion of probabilistic safety: that is, we aim to design policies that keep the state of the system within a safe set with high probability. This notion differs from the cumulative constraints often considered in the literature. The challenge of working with probabilistic safety is the lack of expressions for its gradient. Indeed, policy optimization algorithms rely on gradients of both the objective function and the constraints. To the best of our knowledge, this work is the first to provide such explicit gradient expressions for probabilistic constraints. Notably, the gradient of this family of constraints can be applied to various policy-based algorithms. We demonstrate empirically that probabilistic constraints can be handled in a continuous navigation problem.
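For reference, the probabilistic safety constraint described in the abstract is commonly formalized as a chance constraint over trajectories (a standard formulation; the safe set $\mathcal{S}$, tolerance $\delta$, and horizon $T$ below are generic symbols, not notation taken from the paper):

$$
\max_{\theta} \; \mathbb{E}_{\pi_\theta}\Big[\sum_{t=0}^{T} \gamma^t \, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{P}_{\pi_\theta}\big(s_t \in \mathcal{S} \ \text{for all}\ t = 0, \ldots, T\big) \ge 1 - \delta,
$$

in contrast to a cumulative constraint of the form $\mathbb{E}_{\pi_\theta}\big[\sum_{t} \gamma^t c(s_t, a_t)\big] \le C$.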
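One generic way to obtain a gradient for such a chance constraint is the score-function (log-derivative) identity, $\nabla_\theta \mathbb{P}_{\pi_\theta}(\text{safe}) = \mathbb{E}\big[\mathbf{1}\{\text{trajectory safe}\}\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\big]$. The minimal sketch below estimates this by Monte Carlo on a toy 1-D system; the dynamics, Gaussian policy, and safe set are hypothetical illustrations, not the paper's setup or its exact gradient expression.

```python
# Monte Carlo estimate of the gradient of a probabilistic safety constraint
#   P_theta( s_t in SafeSet for all t <= T )
# via the standard score-function identity:
#   grad P = E[ 1{trajectory safe} * sum_t grad log pi_theta(a_t | s_t) ].
# Environment, policy, and safe set are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def step(state, action):
    """Toy 1-D dynamics: a noisy integrator (illustrative only)."""
    return state + 0.1 * action + 0.05 * rng.standard_normal()

def is_safe(state):
    """Hypothetical safe set: |state| <= 1."""
    return abs(state) <= 1.0

def grad_log_pi(theta, state, action, sigma=0.5):
    """Score of a Gaussian policy a ~ N(theta * state, sigma^2)
    with respect to the scalar parameter theta."""
    return (action - theta * state) * state / sigma**2

def estimate_constraint_grad(theta, horizon=20, n_traj=1000, sigma=0.5):
    grad = 0.0
    for _ in range(n_traj):
        state, score, safe = 0.0, 0.0, True
        for _ in range(horizon):
            action = theta * state + sigma * rng.standard_normal()
            score += grad_log_pi(theta, state, action, sigma)
            state = step(state, action)
            safe = safe and is_safe(state)
        grad += float(safe) * score  # indicator weights the trajectory score
    return grad / n_traj

print(estimate_constraint_grad(theta=-0.5))
```

A plain estimator like this can in principle be plugged into any gradient-based constrained policy optimization scheme, though in practice it would need variance reduction to be useful.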