Paper Title
Infinite-Horizon Reach-Avoid Zero-Sum Games via Deep Reinforcement Learning
Paper Authors
Paper Abstract
In this paper, we consider the infinite-horizon reach-avoid zero-sum game problem, where the goal is to find a set in the state space, referred to as the reach-avoid set, such that the system, starting from any state therein, can be controlled to reach a given target set without violating constraints under the worst-case disturbance. We address this problem by designing a new value function with a contracting Bellman backup, whose super-zero level set, i.e., the set of states at which the value function is non-negative, recovers the reach-avoid set. Building on this, we prove that the proposed method can be adapted to compute the viability kernel, i.e., the set of states that can be controlled to satisfy given constraints, and the backward reachable set, i.e., the set of states that can be driven to a given target set. Finally, we alleviate the curse of dimensionality in high-dimensional problems by extending Conservative Q-Learning, a deep reinforcement learning technique, to learn a value function whose super-zero level set serves as a (conservative) approximation of the reach-avoid set. Our theoretical and empirical results suggest that the proposed method can reliably learn the reach-avoid set and the optimal control policy even with neural network approximation.
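To make the idea of a contracting reach-avoid backup concrete, the sketch below runs a discounted reach-avoid value iteration of the kind described in the abstract on a toy 1-D zero-sum problem. It is a minimal illustration, not the paper's exact construction: the target margin l(x) (non-negative inside the target set), the constraint margin g(x) (non-negative where the constraints hold), the toy dynamics, the action/disturbance grids, and the discount factor gamma are all illustrative assumptions. The super-zero level set of the converged value function is read off as an approximation of the reach-avoid set.

```python
# Minimal sketch of a discounted, contracting reach-avoid Bellman backup on a
# toy 1-D zero-sum game (illustrative only; not the paper's exact value function).
import numpy as np

# State grid on [-2, 2]; toy dynamics x' = x + (u + d) * dt
xs = np.linspace(-2.0, 2.0, 201)
dt = 0.05
controls = np.array([-1.0, 0.0, 1.0])   # controller's actions (maximizer)
disturbs = np.array([-0.3, 0.0, 0.3])   # disturbance actions (minimizer)
gamma = 0.99                            # discount factor makes the backup a contraction

l = 0.2 - np.abs(xs - 1.0)              # target margin: l(x) >= 0 iff |x - 1| <= 0.2
g = 1.5 - np.abs(xs)                    # safety margin: g(x) >= 0 iff |x| <= 1.5

V = np.minimum(l, g)                    # initialization
for _ in range(2000):
    # Max over controls of the worst-case (min over disturbances) next-state value
    Q = np.empty((len(controls), len(disturbs), len(xs)))
    for i, u in enumerate(controls):
        for j, d in enumerate(disturbs):
            x_next = np.clip(xs + (u + d) * dt, xs[0], xs[-1])
            Q[i, j] = np.interp(x_next, xs, V)
    inner = Q.min(axis=1).max(axis=0)
    # Discounted reach-avoid backup: reach the target while never violating constraints
    V_new = (1 - gamma) * np.minimum(l, g) + gamma * np.minimum(g, np.maximum(l, inner))
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = V_new

# Super-zero level set of the learned value function approximates the reach-avoid set
reach_avoid = xs[V >= 0]
print(f"approximate reach-avoid interval: [{reach_avoid.min():.2f}, {reach_avoid.max():.2f}]")
```

In the high-dimensional setting, the abstract replaces this tabular iteration with a neural value function trained by an extension of Conservative Q-Learning, so that the learned super-zero level set remains a (conservative) approximation of the reach-avoid set.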