Paper Title
Bounded Robustness in Reinforcement Learning via Lexicographic Objectives
Paper Authors
Paper Abstract
Policy robustness in Reinforcement Learning may not be desirable at any cost: the alterations caused by robustness requirements to otherwise optimal policies should be explainable, quantifiable, and formally verifiable. In this work we study how policies can be made maximally robust to arbitrary observational noise by analysing how this noise alters them, via a stochastic linear operator interpretation of the disturbances, and we establish connections between robustness and properties of the noise kernel and of the underlying MDPs. We then construct sufficient conditions for policy robustness and propose a robustness-inducing scheme, applicable to any policy gradient algorithm, that formally trades off expected policy utility for robustness through lexicographic optimisation, while preserving convergence and sub-optimality guarantees in the policy synthesis.
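The abstract describes the lexicographic trade-off only at a high level. As a rough illustration of the general idea, the minimal sketch below uses one common projection-based realisation of lexicographic priorities on a toy problem: the primary objective (utility) is followed exactly, and the secondary objective (a robustness surrogate) may only move parameters along directions that do not oppose the primary gradient. This is not the paper's algorithm; the function names (lexicographic_step, grad_utility, grad_robustness) and the toy objectives are illustrative assumptions.

    # Sketch of a lexicographically prioritised two-objective ascent step,
    # NOT the paper's method. The robustness gradient is projected so it
    # never trades away utility, to first order.
    import numpy as np

    def lexicographic_step(theta, grad_utility, grad_robustness, lr=1e-2):
        """Follow the utility gradient, plus the non-conflicting part
        of the robustness gradient."""
        g1 = grad_utility(theta)
        g2 = grad_robustness(theta)
        if g1 @ g1 > 0.0:
            # Remove any component of g2 that opposes g1.
            g2 = g2 - min(0.0, g2 @ g1) / (g1 @ g1) * g1
        return theta + lr * (g1 + g2)

    # Toy problem: utility depends only on theta[0]; the robustness
    # surrogate pulls all parameters towards zero (e.g. a smoothness
    # regulariser). Both gradients below are of concave objectives.
    grad_u = lambda th: np.array([2.0 - th[0], 0.0])  # grad of -(th[0] - 2)**2 / 2
    grad_r = lambda th: -th                           # grad of -np.sum(th**2) / 2

    theta = np.array([0.0, 3.0])
    for _ in range(2000):
        theta = lexicographic_step(theta, grad_u, grad_r)
    print(theta)  # ~[2, 0]: utility fixes theta[0]; robustness zeroes theta[1]

In the output, theta[0] converges to the utility optimum while theta[1], which the utility objective ignores, is driven to zero by the secondary objective: robustness is improved only where it costs no utility. The paper's actual scheme embeds such a priority ordering inside a policy gradient loop with formal convergence and sub-optimality guarantees, which this first-order projection does not itself provide.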