Paper Title

Graph Meta-Reinforcement Learning for Transferable Autonomous Mobility-on-Demand

Authors

Daniele Gammelli, Kaidi Yang, James Harrison, Filipe Rodrigues, Francisco C. Pereira, Marco Pavone

Abstract

Autonomous Mobility-on-Demand (AMoD) systems represent an attractive alternative to existing transportation paradigms, currently challenged by urbanization and increasing travel needs. By centrally controlling a fleet of self-driving vehicles, these systems provide mobility service to customers and are currently starting to be deployed in a number of cities around the world. Current learning-based approaches for controlling AMoD systems are limited to the single-city scenario, whereby the service operator is allowed to take an unlimited number of operational decisions within the same transportation system. However, real-world system operators can hardly afford to fully re-train AMoD controllers for every city they operate in, as this could result in a high number of poor-quality decisions during training, making the single-city strategy a potentially impractical solution. To address these limitations, we propose to formalize the multi-city AMoD problem through the lens of meta-reinforcement learning (meta-RL) and devise an actor-critic algorithm based on recurrent graph neural networks. In our approach, AMoD controllers are explicitly trained such that a small amount of experience within a new city will produce good system performance. Empirically, we show how control policies learned through meta-RL are able to achieve near-optimal performance on unseen cities by learning rapidly adaptable policies, thus making them more robust not only to novel environments, but also to distribution shifts common in real-world operations, such as special events, unexpected congestion, and dynamic pricing schemes.
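
The abstract describes an actor-critic controller built on recurrent graph neural networks, where each node of the graph corresponds to a city region. The sketch below is a minimal, hypothetical PyTorch illustration of what such an architecture could look like: a mean-aggregation graph convolution, a per-node GRU to carry memory across time steps, and a Dirichlet policy head that outputs a desired distribution of the fleet over regions. All class names, the hidden size, and the toy 4-region graph are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): a recurrent graph-network
# actor-critic for AMoD fleet rebalancing. Node features could be per-region
# quantities such as idle vehicles and estimated demand.
import torch
import torch.nn as nn


class GraphConv(nn.Module):
    """Mean-aggregation graph convolution: h'_i = ReLU(W1 h_i + W2 mean_j h_j)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, out_dim)
        self.lin_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (num_nodes, in_dim); adj: (num_nodes, num_nodes) binary adjacency
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neigh = adj @ x / deg  # mean over neighboring regions
        return torch.relu(self.lin_self(x) + self.lin_neigh(neigh))


class RecurrentGraphActorCritic(nn.Module):
    """GNN encoder + per-node GRU memory + actor/critic heads (illustrative)."""

    def __init__(self, node_dim, hidden_dim=64):
        super().__init__()
        self.gnn = GraphConv(node_dim, hidden_dim)
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)  # recurrence over time steps
        self.actor = nn.Linear(hidden_dim, 1)          # per-region concentration logit
        self.critic = nn.Linear(hidden_dim, 1)         # per-region value contribution

    def forward(self, x, adj, h):
        z = self.gnn(x, adj)                # (N, hidden)
        h = self.gru(z, h)                  # update per-region hidden state
        logits = self.actor(h).squeeze(-1)  # (N,)
        conc = nn.functional.softplus(logits) + 1e-3
        pi = torch.distributions.Dirichlet(conc)  # distribution over regions
        value = self.critic(h).sum()              # graph-level value estimate
        return pi, value, h


# Toy usage: one decision step on a 4-region city graph.
num_regions, feat_dim = 4, 3
adj = torch.tensor([[0, 1, 1, 0],
                    [1, 0, 1, 1],
                    [1, 1, 0, 1],
                    [0, 1, 1, 0]], dtype=torch.float)
model = RecurrentGraphActorCritic(node_dim=feat_dim)
h = torch.zeros(num_regions, 64)
pi, value, h = model(torch.randn(num_regions, feat_dim), adj, h)
action = pi.sample()  # fraction of the fleet to route to each region (sums to 1)
```

Because the policy and value heads act on per-node embeddings, the same weights can be applied to cities with different numbers of regions, which is what makes a graph-based controller a natural fit for the multi-city, transfer setting the abstract describes; the meta-RL training loop itself is not shown here.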
