Paper Title
Final Iteration Convergence Bound of Q-Learning: Switching System Approach
Paper Authors
Paper Abstract
Q-learning is known as one of the fundamental reinforcement learning (RL) algorithms. Its convergence has been the focus of extensive research over the past several decades. Recently, a new finite-time error bound and analysis for Q-learning were introduced using a switching system framework. This approach views the dynamics of Q-learning as a discrete-time stochastic switching system. The prior study established a finite-time error bound on the averaged iterates using Lyapunov functions, offering further insights into Q-learning. While valuable, that analysis focuses on error bounds for the averaged iterate, which comes with an inherent disadvantage: it requires extra averaging steps, which can slow the convergence rate. Moreover, the final iterate, being the original form of Q-learning, is more commonly used and is often regarded as the more intuitive and natural output of most iterative algorithms. In this paper, we present a finite-time error bound on the final iterate of Q-learning based on the switching system framework. The proposed error bounds have different features from those in previous work and cover different scenarios. Finally, we expect the proposed results to provide additional insight into Q-learning via its connection with discrete-time switching systems, and to potentially offer a new template for the finite-time analysis of more general RL algorithms.
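To make the switching-system viewpoint concrete, the following is a minimal sketch of how the Q-learning recursion can be cast as a switched affine system; the symbols $D$, $P$, $R$, $\Pi_Q$, and $w_k$ are illustrative notation consistent with prior switching-system analyses of Q-learning, not necessarily the notation of this paper. The standard asynchronous update

$$Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + \alpha \big( r_k + \gamma \max_{a'} Q_k(s_{k+1}, a') - Q_k(s_k, a_k) \big)$$

can be written in vectorized form as

$$Q_{k+1} = Q_k + \alpha \big( D R + \gamma D P \Pi_{Q_k} Q_k - D Q_k + w_k \big),$$

where $D$ is the diagonal matrix of state-action visitation probabilities, $P$ the transition matrix, $R$ the reward vector, $\Pi_Q$ the matrix implementing the greedy action selection under $Q$, and $w_k$ a zero-mean (martingale-difference) noise term. Because $\Pi_{Q_k}$ can take only finitely many values, one per deterministic greedy policy, the recursion switches among finitely many affine subsystems, which is what makes Lyapunov-based finite-time analysis applicable.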