Large-Scale Graph Reinforcement Learning in Wireless Control Systems

Paper Title

Large-Scale Graph Reinforcement Learning in Wireless Control Systems

Authors

Vinicius Lima, Mark Eisen, Konstantinos Gatsis, Alejandro Ribeiro

Abstract

Modern control systems routinely employ wireless networks to exchange information between spatially distributed plants, actuators and sensors. With wireless networks defined by random, rapidly changing transmission conditions that challenge assumptions commonly held in the design of control systems, proper allocation of communication resources is essential to achieve reliable operation. Designing resource allocation policies, however, is challenging, motivating recent works to successfully exploit deep learning and deep reinforcement learning techniques to design resource allocation and scheduling policies for wireless control systems (WCSs). As the number of learnable parameters in a neural network grows with the size of the input signal, deep reinforcement learning may fail to scale, limiting the immediate generalization of such scheduling and resource allocation policies to large-scale systems. The interference and fading patterns among plants and controllers in the network, however, induce a time-varying graph that can be used to construct policy representations based on graph neural networks (GNNs), with the number of learnable parameters now independent of the number of plants in the network. We further establish in the context of WCSs that, due to inherent invariance to graph permutations, the GNN is able to model scalable and transferable resource allocation policies, which are subsequently trained with primal-dual reinforcement learning. Numerical experiments show that the proposed graph reinforcement learning approach yields policies that not only outperform baseline solutions and deep reinforcement learning based policies in large-scale systems, but that can also be transferred across networks of varying size.
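The two structural claims in the abstract, that the number of learnable parameters does not depend on the number of plants and that the representation respects graph permutations, can be illustrated with a minimal NumPy sketch of a generic graph convolutional layer. This is not the authors' architecture or training code: the function `graph_filter_layer`, the helper `random_graph`, the filter order `K`, the feature sizes, and the random matrix `S` standing in for the interference and fading graph are all assumptions made for illustration. The permutation property is shown here as equivariance of a single layer's output.

```python
import numpy as np

def graph_filter_layer(S, X, H):
    """Hypothetical graph convolutional layer: tanh(sum_k S^k X H[k]).

    S: (n, n) graph matrix, X: (n, F_in) node features,
    H: (K, F_in, F_out) filter taps shared by every node.
    """
    Z = X
    out = Z @ H[0]
    for k in range(1, len(H)):
        Z = S @ Z              # diffuse features one more hop over the graph
        out = out + Z @ H[k]   # mix features with the k-th filter tap
    return np.tanh(out)

def random_graph(n, rng):
    """Assumed stand-in for a fading/interference graph with n plants."""
    return rng.random((n, n)) / n  # scaled so repeated products stay bounded

rng = np.random.default_rng(0)
K, F_in, F_out = 3, 2, 4
H = rng.standard_normal((K, F_in, F_out))  # parameter count is K * F_in * F_out

# The same parameters H apply to networks of different size.
for n in (5, 50):
    S = random_graph(n, rng)
    X = rng.standard_normal((n, F_in))
    assert graph_filter_layer(S, X, H).shape == (n, F_out)

# Relabeling the plants permutes the output rows in the same way.
n = 6
S = random_graph(n, rng)
X = rng.standard_normal((n, F_in))
P = np.eye(n)[rng.permutation(n)]  # permutation matrix
Y = graph_filter_layer(S, X, H)
Y_perm = graph_filter_layer(P @ S @ P.T, P @ X, H)
assert np.allclose(Y_perm, P @ Y)
```

Because `H` has shape `(K, F_in, F_out)` regardless of `n`, the same weights can be evaluated on a graph with 5 or 50 nodes, which is the mechanism behind transferring a policy across network sizes in this simplified setting.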
