Paper Title
Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost
Paper Authors
Paper Abstract
We study the problem of reinforcement learning (RL) with low (policy) switching cost, a problem well-motivated by real-life RL applications in which deployments of new policies are costly and the number of policy updates must be low. In this paper, we propose a new algorithm based on stage-wise exploration and adaptive policy elimination that achieves a regret of $\widetilde{O}(\sqrt{H^4S^2AT})$ while requiring a switching cost of $O(HSA \log\log T)$. This is an exponential improvement over the best-known switching cost $O(H^2SA\log T)$ among existing methods with $\widetilde{O}(\mathrm{poly}(H,S,A)\sqrt{T})$ regret. In the above, $S$ and $A$ denote the numbers of states and actions in an $H$-horizon episodic Markov Decision Process model with unknown transitions, and $T$ is the number of steps. As a byproduct of our new techniques, we also derive a reward-free exploration algorithm with a switching cost of $O(HSA)$. Furthermore, we prove a pair of information-theoretic lower bounds which show that (1) any no-regret algorithm must have a switching cost of $\Omega(HSA)$; and (2) any $\widetilde{O}(\sqrt{T})$ regret algorithm must incur a switching cost of $\Omega(HSA\log\log T)$. Both our algorithms are thus optimal in their switching costs.