Reinforcement learning for traffic signal control in hybrid action space

Paper title

Reinforcement learning for traffic signal control in hybrid action space

Authors

Luo, Haoqing; Jin, Sheng

Abstract

The prevailing reinforcement-learning-based traffic signal control methods are typically staging-optimizable or duration-optimizable, depending on the action spaces. In this paper, we propose a novel control architecture, TBO, which is based on hybrid proximal policy optimization. To the best of our knowledge, TBO is the first RL-based algorithm to implement synchronous optimization of the staging and duration. Compared to discrete and continuous action spaces, hybrid action space is a merged search space, in which TBO better implements the trade-off between frequent switching and unsaturated release. Experiments are given to demonstrate that TBO reduces the queue length and delay by 13.78% and 14.08% on average, respectively, compared to the existing baselines. Furthermore, we calculate the Gini coefficients of the right-of-way to indicate TBO does not harm fairness while improving efficiency.
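The fairness check mentioned in the abstract relies on the Gini coefficient. As a minimal sketch, the standard Gini coefficient of a set of non-negative allocations can be computed as below. The function name `gini` and the reading of "right-of-way" as green time per movement are assumptions made for illustration; the abstract does not say how the authors measure right-of-way or which form of the formula they use.

```python
def gini(values):
    """Standard Gini coefficient of non-negative values (0 = perfectly equal).

    Illustrative only: the abstract does not specify how right-of-way
    is quantified, so the inputs here are hypothetical.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical green-time shares (seconds) for four movements.
equal_split = [10, 10, 10, 10]
one_movement_only = [0, 0, 0, 40]
print(gini(equal_split), gini(one_movement_only))
```

A value near 0 means the allocation is spread evenly across movements, while a value approaching (n - 1) / n means a single movement receives almost all of it.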
