Title
Single Time-scale Actor-critic Method to Solve the Linear Quadratic Regulator with Convergence Guarantees
Authors
Abstract
We propose a single time-scale actor-critic algorithm to solve the linear quadratic regulator (LQR) problem. A least-squares temporal difference (LSTD) method is applied to the critic, and a natural policy gradient method is used for the actor. We give a proof of convergence with sample complexity $\mathcal{O}(\varepsilon^{-1} \log(\varepsilon^{-1})^2)$. The technique used in the proof applies to general single time-scale bilevel optimization problems. We also validate the theoretical convergence results numerically.
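The abstract names only the ingredients: an LSTD critic, a natural-policy-gradient actor, and a single time scale. The Python sketch below shows one way these pieces can fit together. It is not the authors' algorithm. It uses a hypothetical scalar LQR instance with a discounted cost and noise-free dynamics, and the parameters `A`, `B`, `Q_COST`, `R_COST`, `GAMMA`, the step sizes `alpha` and `beta`, the relaxed critic update `theta <- theta + beta * (theta_hat - theta)`, and all function names are illustrative assumptions. The critic fits $Q_k(x,u) = \theta_{xx} x^2 + \theta_{xu} x u + \theta_{uu} u^2$ for the policy $u = -kx$ by LSTD-Q. The actor takes the natural-gradient step $k \leftarrow k - \alpha(\theta_{uu} k - \theta_{xu}/2)$. Critic and actor are each updated once per iteration with constant step sizes, rather than running the critic to convergence inside every actor step; that is the sense in which this sketch is "single time-scale".

```python
import random

# Hypothetical scalar LQR instance (illustrative values, not from the paper):
# dynamics x' = A*x + B*u (noise-free, an assumption), stage cost
# Q_COST*x^2 + R_COST*u^2, discount factor GAMMA.
A, B, Q_COST, R_COST, GAMMA = 1.0, 1.0, 1.0, 1.0, 0.9


def features(x, u):
    """Quadratic features: Q_k(x, u) = theta_xx*x^2 + theta_xu*x*u + theta_uu*u^2."""
    return [x * x, x * u, u * u]


def solve3(m, v):
    """Solve the 3x3 linear system m @ theta = v (Gaussian elimination, partial pivoting)."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(col + 1, 3):
            f = aug[i][col] / aug[col][col]
            for j in range(col, 4):
                aug[i][j] -= f * aug[col][j]
    theta = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        s = aug[i][3] - sum(aug[i][j] * theta[j] for j in range(i + 1, 3))
        theta[i] = s / aug[i][i]
    return theta


def lstd_q(k, rng, n_samples=50, explore=1.0):
    """LSTD-Q estimate of theta for the policy u = -k*x from fresh exploratory samples."""
    m = [[0.0] * 3 for _ in range(3)]
    v = [0.0] * 3
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        u = -k * x + explore * rng.gauss(0.0, 1.0)  # behaviour action with exploration
        x_next = A * x + B * u                       # noise-free transition
        cost = Q_COST * x * x + R_COST * u * u
        phi = features(x, u)
        phi_next = features(x_next, -k * x_next)    # next action from the evaluated policy
        for i in range(3):
            v[i] += phi[i] * cost
            for j in range(3):
                m[i][j] += phi[i] * (phi[j] - GAMMA * phi_next[j])
    return solve3(m, v)


def actor_critic(k0=0.0, iters=200, alpha=0.1, beta=0.5, seed=0):
    """Single-loop actor-critic: one critic step and one actor step per iteration."""
    rng = random.Random(seed)
    k = k0                                   # k0 must be stabilizing: GAMMA*(A - B*k0)**2 < 1
    theta = lstd_q(k, rng)                   # initialise the critic at the starting policy
    for _ in range(iters):
        # Critic: relaxed step toward the LSTD-Q solution for the current policy.
        target = lstd_q(k, rng)
        theta = [t + beta * (s - t) for t, s in zip(theta, target)]
        # Actor: natural-gradient step with E_k = theta_uu*k - theta_xu/2.
        _, theta_xu, theta_uu = theta
        k -= alpha * (theta_uu * k - 0.5 * theta_xu)
    return k


def riccati_gain(iters=1000):
    """Optimal discounted gain from iterating the scalar Riccati equation (for comparison)."""
    p = 0.0
    for _ in range(iters):
        p = (Q_COST + GAMMA * A * A * p
             - (GAMMA * A * B * p) ** 2 / (R_COST + GAMMA * B * B * p))
    return GAMMA * A * B * p / (R_COST + GAMMA * B * B * p)


if __name__ == "__main__":
    print("actor-critic gain:", actor_critic())
    print("Riccati gain:     ", riccati_gain())
```

For these illustrative parameters, the Riccati gain is about 0.588, and the actor-critic iterate approaches it. Because the transitions are noise-free, each LSTD-Q solve recovers the current policy's Q-function exactly, so this sketch says nothing about the sample complexity stated in the abstract. With additive process noise, the critic would need an extra constant feature, and its estimates would be noisy.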