Paper Title
Randomized Policy Learning for Continuous State and Action MDPs
Paper Authors
Paper Abstract
Deep reinforcement learning methods have achieved state-of-the-art results in a variety of challenging, high-dimensional domains ranging from video games to locomotion. The key to this success has been the use of deep neural networks to approximate the policy and value function. However, substantial tuning of the network weights is required for good results. We instead use randomized function approximation. Such networks are not only cheaper to train than fully connected networks but also improve numerical performance. We present \texttt{RANDPOL}, a generalized policy iteration algorithm for MDPs with continuous state and action spaces, in which both the policy and the value function are represented with randomized networks. We also give finite-time guarantees on the performance of the algorithm. Finally, we demonstrate numerical performance on challenging environments and compare it with deep neural network based algorithms.
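To make the abstract's central idea concrete, below is a minimal sketch of randomized function approximation: hidden-layer weights are drawn at random and frozen, and only the linear output layer is fit. The class name `RandomizedNetwork`, the cosine features, and the ridge-regression fit are illustrative assumptions, not the paper's actual architecture.

```python
# A minimal sketch of randomized function approximation, as described in the
# abstract: random, frozen hidden-layer weights plus a trained linear output.
# All names here are illustrative, not taken from the paper.
import numpy as np

class RandomizedNetwork:
    """Random-feature network: fixed random hidden layer, trained linear output."""

    def __init__(self, input_dim, num_features=256, reg=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # Hidden weights and biases are sampled once and never updated.
        self.W = rng.normal(size=(input_dim, num_features))
        self.b = rng.uniform(-np.pi, np.pi, size=num_features)
        self.reg = reg                      # ridge regularization for the output fit
        self.theta = np.zeros(num_features)

    def features(self, X):
        # Fixed nonlinear random features (cosine features chosen for illustration).
        return np.cos(X @ self.W + self.b)

    def fit(self, X, y):
        # Only the output weights are trained: a single ridge-regression solve,
        # which is far cheaper than backpropagating through a deep network.
        Phi = self.features(X)
        A = Phi.T @ Phi + self.reg * np.eye(Phi.shape[1])
        self.theta = np.linalg.solve(A, Phi.T @ y)

    def predict(self, X):
        return self.features(X) @ self.theta
```

In a generalized policy iteration scheme like the one the abstract describes, a network of this kind could be refit to value (or policy) targets at each iteration; since each fit reduces to a linear solve, the per-iteration cost stays low compared with retraining a fully connected network.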