Paper Title
Exact Formulas for Finite-Time Estimation Errors of Decentralized Temporal Difference Learning with Linear Function Approximation
Paper Authors
Paper Abstract
In this paper, we consider the policy evaluation problem in multi-agent reinforcement learning (MARL) and derive exact closed-form formulas for the finite-time mean-squared estimation errors of decentralized temporal difference (TD) learning with linear function approximation. Our analysis hinges on the fact that the decentralized TD learning method can be viewed as a Markov jump linear system (MJLS). Standard MJLS theory can then be applied to quantify the mean and covariance matrix of the estimation error of the decentralized TD method at every time step. Various implications of our exact formulas for algorithm performance are also discussed. An interesting finding is that, under a necessary and sufficient stability condition, the mean-squared TD estimation error converges to an exact limit at a specific exponential rate.
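To make the setting concrete, below is a minimal sketch of decentralized TD(0) with linear function approximation in the form commonly studied for multi-agent policy evaluation: all agents observe a shared Markov chain, receive local rewards, and mix parameters through a doubly stochastic gossip matrix. This is not the paper's code; all names (`W`, `feats`, `alpha`, the ring topology, etc.) are illustrative assumptions.

```python
import numpy as np

# Sketch of decentralized TD(0) with linear function approximation.
# Assumed setup (not from the paper): a shared Markov chain observed by
# all agents, agent-specific rewards, and a consensus (gossip) step
# followed by a local TD(0) correction.

rng = np.random.default_rng(0)

n_states, n_feats, n_agents = 5, 3, 4
gamma, alpha, T = 0.9, 0.05, 2000

P = rng.dirichlet(np.ones(n_states), size=n_states)   # shared transition kernel
feats = rng.standard_normal((n_states, n_feats))      # feature map phi(s)
rewards = rng.standard_normal((n_agents, n_states))   # local rewards r_i(s)

# Doubly stochastic mixing matrix for a ring of agents (illustrative choice).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i + 1) % n_agents] = 0.25
    W[i, (i - 1) % n_agents] = 0.25

theta = np.zeros((n_agents, n_feats))  # one parameter vector per agent
s = 0
for t in range(T):
    s_next = rng.choice(n_states, p=P[s])
    phi, phi_next = feats[s], feats[s_next]
    # Consensus step: average neighbours' parameters via W.
    mixed = W @ theta
    # Local TD(0) correction with each agent's own reward.
    td_err = rewards[:, s] + gamma * (theta @ phi_next) - theta @ phi
    theta = mixed + alpha * td_err[:, None] * phi[None, :]
    s = s_next

print("max disagreement across agents:", np.max(np.abs(theta - theta.mean(axis=0))))
```

Stacking the agents' parameter errors turns this update into a linear recursion whose coefficient matrix depends on the current Markov state transition, which is exactly the MJLS structure the abstract refers to: standard MJLS theory then gives exact per-step recursions for the mean and covariance of the estimation error.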