Paper Title
Reinforcement Learning in Non-Markovian Environments
Paper Authors
Paper Abstract
Motivated by the novel paradigm developed by Van Roy and coauthors for reinforcement learning in arbitrary non-Markovian environments, we propose a related formulation and explicitly pin down the error caused by the non-Markovianity of observations when the Q-learning algorithm is applied to this formulation. Based on this observation, we propose that the criterion for agent design should be to seek good approximations of certain conditional laws. Inspired by classical stochastic control, we show that our problem reduces to the recursive computation of approximate sufficient statistics. This leads to an autoencoder-based scheme for agent design, which we then test numerically on partially observed reinforcement learning environments.
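As a rough illustration of the scheme the abstract describes, the sketch below (an assumption, not the authors' implementation; all module names, dimensions, and the single dummy transition are hypothetical) recursively updates an approximate sufficient statistic of the observation-action history with a recurrent encoder, trains it with an autoencoder-style reconstruction term aimed at the conditional law of the observations, and runs Q-learning on top of the learned statistic.

```python
# Illustrative sketch only: an assumed autoencoder-based agent in PyTorch,
# not the authors' implementation. Dimensions, module names, and the dummy
# transition below are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RecursiveStatistic(nn.Module):
    """Recursively updates a summary of the observation-action history."""

    def __init__(self, obs_dim, n_actions, stat_dim):
        super().__init__()
        self.cell = nn.GRUCell(obs_dim + n_actions, stat_dim)

    def forward(self, stat, obs, act_onehot):
        # New statistic = f(old statistic, new observation, last action).
        return self.cell(torch.cat([obs, act_onehot], dim=-1), stat)


class Decoder(nn.Module):
    """Reconstructs the latest observation from the statistic; training this
    term pushes the statistic toward an approximate sufficient statistic."""

    def __init__(self, stat_dim, obs_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(stat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, obs_dim))

    def forward(self, stat):
        return self.net(stat)


class QHead(nn.Module):
    """Q-values computed from the statistic, treated as a Markov state."""

    def __init__(self, stat_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(stat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, stat):
        return self.net(stat)


obs_dim, n_actions, stat_dim, gamma = 8, 4, 32, 0.99
encoder = RecursiveStatistic(obs_dim, n_actions, stat_dim)
decoder = Decoder(stat_dim, obs_dim)
qhead = QHead(stat_dim, n_actions)
params = [*encoder.parameters(), *decoder.parameters(), *qhead.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

# One update on a single dummy transition (obs, act, reward, next_obs).
obs, next_obs = torch.randn(1, obs_dim), torch.randn(1, obs_dim)
act, reward = torch.tensor([1]), torch.tensor([0.5])
act_onehot = F.one_hot(act, n_actions).float()

prev_stat = torch.zeros(1, stat_dim)                        # empty history
stat = encoder(prev_stat, obs, torch.zeros(1, n_actions))   # after seeing obs
next_stat = encoder(stat, next_obs, act_onehot)             # after act, next_obs

# Autoencoder-style criterion: the statistic should predict the observation.
recon_loss = F.mse_loss(decoder(next_stat), next_obs)

# Standard Q-learning target, using the learned statistics as states.
with torch.no_grad():
    target = reward + gamma * qhead(next_stat).max(dim=-1).values
q_pred = qhead(stat).gather(1, act.view(-1, 1)).squeeze(1)
td_loss = F.mse_loss(q_pred, target)

opt.zero_grad()
(recon_loss + td_loss).backward()
opt.step()
```

In this sketch the reconstruction loss and the temporal-difference loss are simply summed; how the two criteria are combined, and whether the decoder targets a full conditional law rather than a point prediction, are design choices the paper itself would specify.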