Paper Title
Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms
Paper Authors
Paper Abstract
The actor-critic (AC) algorithm is a popular method for finding an optimal policy in reinforcement learning. In the infinite-horizon setting, finite-sample convergence rates for the AC and natural actor-critic (NAC) algorithms have been established recently, but only under independent and identically distributed (i.i.d.) sampling and a single-sample update at each iteration. In contrast, this paper characterizes the convergence rate and sample complexity of AC and NAC under Markovian sampling, with mini-batch data at each iteration and with the actor using a general policy class approximation. We show that the overall sample complexity for mini-batch AC to attain an $ε$-accurate stationary point improves on the best known sample complexity of AC by an order of $\mathcal{O}(ε^{-1}\log(1/ε))$, and that the overall sample complexity for mini-batch NAC to attain an $ε$-accurate globally optimal point improves on the existing sample complexity of NAC by an order of $\mathcal{O}(ε^{-1}/\log(1/ε))$. Moreover, the sample complexity of AC and NAC characterized in this work outperforms that of policy gradient (PG) and natural policy gradient (NPG) by a factor of $\mathcal{O}((1-γ)^{-3})$ and $\mathcal{O}((1-γ)^{-4}ε^{-1}/\log(1/ε))$, respectively. This is the first theoretical study establishing that AC and NAC attain an orderwise performance improvement over PG and NPG in the infinite-horizon setting due to the incorporation of the critic.
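To make the setting in the abstract concrete, the following is a minimal sketch of a mini-batch actor-critic loop under Markovian sampling. It is an illustration only, not the algorithm analyzed in the paper: the two-state toy MDP, the tabular softmax policy, the tabular TD(0) critic, the use of the TD error as the advantage estimate, and all step sizes and names are assumptions made for this example.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def step(state, action, rng):
    """Hypothetical toy MDP: the next state equals the action w.p. 0.9.

    The reward is 1 when the next state is 1, so action 1 is optimal.
    """
    next_state = action if rng.random() < 0.9 else 1 - action
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward


def minibatch_actor_critic(iterations=200, batch_size=32, gamma=0.9,
                           actor_lr=0.5, critic_lr=0.2, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor: softmax preferences per state
    value = [0.0, 0.0]                # critic: tabular state values
    state = 0                         # never reset: one continuing trajectory

    for _ in range(iterations):
        # Markovian sampling: the batch is a run of consecutive transitions
        # from the same trajectory, so the samples are not i.i.d.
        batch = []
        for _ in range(batch_size):
            probs = softmax(theta[state])
            action = 0 if rng.random() < probs[0] else 1
            next_state, reward = step(state, action, rng)
            batch.append((state, action, reward, next_state))
            state = next_state

        # Critic: one TD(0) update averaged over the mini-batch.
        td_sum = [0.0, 0.0]
        for s, a, r, s2 in batch:
            td_sum[s] += r + gamma * value[s2] - value[s]
        for s in range(2):
            value[s] += critic_lr * td_sum[s] / batch_size

        # Actor: policy-gradient step averaged over the same mini-batch,
        # using the TD error as the advantage estimate.
        grad = [[0.0, 0.0], [0.0, 0.0]]
        for s, a, r, s2 in batch:
            delta = r + gamma * value[s2] - value[s]
            probs = softmax(theta[s])
            for b in range(2):
                indicator = 1.0 if b == a else 0.0
                grad[s][b] += delta * (indicator - probs[b])
        for s in range(2):
            for b in range(2):
                theta[s][b] += actor_lr * grad[s][b] / batch_size

    return theta, value


if __name__ == "__main__":
    theta, value = minibatch_actor_critic()
    print("policy in state 1:", softmax(theta[1]))
    print("critic values:", value)
```

Averaging over a batch of consecutive transitions reduces the variance of both updates at the cost of more samples per iteration; the abstract's claim concerns how this trade-off affects the overall sample complexity, which this sketch does not attempt to measure.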