Paper Title


IG-RL: Inductive Graph Reinforcement Learning for Massive-Scale Traffic Signal Control

Authors

François-Xavier Devailly, Denis Larocque, Laurent Charlin

Abstract


Scaling adaptive traffic-signal control involves dealing with combinatorial state and action spaces. Multi-agent reinforcement learning attempts to address this challenge by distributing control to specialized agents. However, specialization hinders generalization and transferability, and the computational graphs underlying neural-network architectures -- dominant in the multi-agent setting -- do not offer the flexibility to handle an arbitrary number of entities, which changes both between road networks and over time as vehicles traverse the network. We introduce Inductive Graph Reinforcement Learning (IG-RL), based on graph-convolutional networks, which adapts to the structure of any road network to learn detailed representations of traffic controllers and their surroundings. Our decentralized approach enables learning of a transferable adaptive traffic-signal-control policy. After being trained on an arbitrary set of road networks, our model can generalize to new road networks, traffic distributions, and traffic regimes with no additional training and a constant number of parameters, enabling greater scalability compared to prior methods. Furthermore, our approach can exploit the granularity of available data by capturing the (dynamic) demand at both the lane and the vehicle levels. The proposed method is tested on both road networks and traffic settings never experienced during training. We compare IG-RL to multi-agent reinforcement learning and domain-specific baselines. In both synthetic road networks and a larger experiment involving the control of the 3,971 traffic signals of Manhattan, we show that different instantiations of IG-RL outperform baselines.
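The key property behind the abstract's "constant number of parameters" claim -- a graph-convolutional layer shares one set of weights across all nodes, so the same trained model runs unchanged on road networks of any size -- can be sketched as follows. This is a minimal NumPy illustration under assumed simplifications (mean-aggregation, a single layer, random features), not the paper's actual IG-RL architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared weight matrix: the parameter count is fixed regardless of graph
# size, which is what lets an inductive model transfer to new road networks.
F_IN, F_OUT = 4, 8
W = rng.standard_normal((F_IN, F_OUT))

def gcn_layer(adj, feats):
    """One graph-convolution step: mean-aggregate each node's neighborhood
    (with self-loops), then apply the shared linear transform and a ReLU."""
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                  # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)   # node degrees (>= 1)
    h = (a_hat / deg) @ feats                # normalized neighborhood average
    return np.maximum(h @ W, 0.0)            # shared weights + ReLU

# The same layer (same W) runs on two graphs of different sizes,
# standing in for two road networks with different numbers of entities.
small = rng.integers(0, 2, size=(3, 3))
small = np.triu(small, 1); small = small + small.T     # symmetric adjacency
large = rng.integers(0, 2, size=(10, 10))
large = np.triu(large, 1); large = large + large.T

out_small = gcn_layer(small, rng.standard_normal((3, F_IN)))
out_large = gcn_layer(large, rng.standard_normal((10, F_IN)))
print(out_small.shape, out_large.shape)  # (3, 8) (10, 8)
```

Because `W` is the only learned parameter, nothing in the layer depends on the number of nodes; stacking such layers (and reading out per-traffic-signal Q-values) is the general pattern that makes an inductive, decentralized policy transferable across networks.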
