Paper Title
A Demonstration of Issues with Value-Based Multiobjective Reinforcement Learning Under Stochastic State Transitions
Paper Author(s)
Paper Abstract
We report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning in the context of environments with stochastic state transitions. An example multiobjective Markov Decision Process (MOMDP) is used to demonstrate that under such conditions these approaches may be unable to discover the policy which maximises the Scalarised Expected Return (SER), and in fact may converge to a Pareto-dominated solution. We discuss several alternative methods which may be more suitable for maximising SER in MOMDPs with stochastic transitions.
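For readers unfamiliar with the criterion, a brief note (the notation $f$, $\mathbf{w}$, $\gamma$ and $\pi$ below is standard in the multiobjective RL literature and is assumed here for illustration, not taken from the abstract): under the Scalarised Expected Return criterion, a scalarisation function $f$ (often a linear weighting $\mathbf{w}^{\top}(\cdot)$) is applied to the expected vector-valued return of a policy $\pi$,

\[
\mathrm{SER}(\pi) \;=\; f\!\left( \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\,\mathbf{r}_t \right] \right),
\]

in contrast to the Expected Scalarised Return, $\mathrm{ESR}(\pi) = \mathbb{E}_{\pi}\!\left[ f\!\left( \sum_{t=0}^{\infty} \gamma^{t}\,\mathbf{r}_t \right) \right]$, where the scalarisation is applied inside the expectation. The issue reported in the abstract concerns maximisation of the former.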