Paper Title
Bayesian Soft Actor-Critic: A Directed Acyclic Strategy Graph Based Deep Reinforcement Learning
Paper Authors
Paper Abstract
Adopting reasonable strategies is challenging but crucial for an intelligent agent with limited resources working in hazardous, unstructured, and dynamic environments: good strategies improve the system's utility, decrease the overall cost, and increase the probability of mission success. This paper proposes a novel directed acyclic strategy graph decomposition approach based on Bayesian chaining, which separates an intricate policy into several simple sub-policies and organizes their relationships as a Bayesian strategy network (BSN). We integrate this approach into a state-of-the-art deep reinforcement learning (DRL) method, soft actor-critic (SAC), and build the corresponding Bayesian soft actor-critic (BSAC) model by organizing several sub-policies as a joint policy. We compare our method against state-of-the-art deep reinforcement learning algorithms on the standard continuous control benchmarks in the OpenAI Gym environment. The results demonstrate the promising potential of the BSAC method to significantly improve training efficiency.
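
The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It only shows the general chain-rule idea of writing a joint policy as a product of sub-policies, each conditioned on its parents in a directed acyclic graph. The three-node graph, the binary sub-actions, the bias values, and the names `PARENTS`, `BIAS`, `sub_policy`, `joint_log_prob`, and `joint_distribution` are all assumptions made for illustration; a real SAC-style policy would use continuous (for example Gaussian) sub-policies produced by neural networks.

```python
import itertools
import math

# Hypothetical strategy graph (assumption): a1 -> a2 and a1 -> a3.
PARENTS = {"a1": [], "a2": ["a1"], "a3": ["a1"]}
ORDER = ["a1", "a2", "a3"]          # a topological order of the graph
BIAS = {"a1": 0.0, "a2": -0.5, "a3": 0.5}  # toy per-node parameters
ACTIONS = [0, 1]                    # each sub-action is binary in this toy


def sub_policy(node, action, parent_actions, state):
    """Toy conditional probability pi_i(a_i | parents(a_i), s)."""
    # The logit depends on the state and on the actions chosen by the parents.
    logit = state + sum(parent_actions) + BIAS[node]
    p_one = 1.0 / (1.0 + math.exp(-logit))
    return p_one if action == 1 else 1.0 - p_one


def joint_log_prob(assignment, state):
    """Chain rule: log pi(a | s) = sum_i log pi_i(a_i | parents(a_i), s)."""
    total = 0.0
    for node in ORDER:
        parent_actions = [assignment[p] for p in PARENTS[node]]
        prob = sub_policy(node, assignment[node], parent_actions, state)
        total += math.log(prob)
    return total


def joint_distribution(state):
    """Enumerate every joint action with its probability under the factorization."""
    dist = {}
    for values in itertools.product(ACTIONS, repeat=len(ORDER)):
        assignment = dict(zip(ORDER, values))
        dist[values] = math.exp(joint_log_prob(assignment, state))
    return dist
```

Because each sub-policy is a proper conditional distribution, the probabilities returned by `joint_distribution` sum to one for any state, which is the property that lets several simple sub-policies be combined into a single joint policy.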