Paper Title
Distributed Resource Scheduling for Large-Scale MEC Systems: A Multi-Agent Ensemble Deep Reinforcement Learning with Imitation Acceleration
Paper Authors
Abstract
We consider the optimization of distributed resource scheduling to minimize the sum of task latency and energy consumption for all Internet of Things devices (IoTDs) in a large-scale mobile edge computing (MEC) system. To address this problem, we propose a distributed intelligent resource scheduling (DIRS) framework, which combines centralized training relying on global information with distributed decision making by an agent deployed in each MEC server. More specifically, we first introduce a novel multi-agent ensemble-assisted distributed deep reinforcement learning (DRL) architecture, which simplifies the overall neural network structure of each agent by partitioning the state space and improves the performance of each individual agent by combining the decisions of all agents. Second, we apply action refinement to enhance the exploration ability of the proposed DIRS framework, where near-optimal state-action pairs are obtained by a novel Lévy flight search. Finally, an imitation acceleration scheme is presented to pre-train all the agents, which significantly accelerates the learning process of the proposed framework by learning professional experience from a small amount of demonstration data. Extensive simulations demonstrate that the proposed DIRS framework is efficient and outperforms existing benchmark schemes.
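The abstract does not specify how the Lévy flight search draws its exploration steps, but a standard way to sample Lévy-distributed steps is Mantegna's algorithm, where a step is the ratio of a Gaussian variable (with a β-dependent variance) to a second Gaussian raised to the power 1/β. The sketch below, a generic illustration and not the paper's actual method, shows such a sampler and a hypothetical `refine` helper that perturbs an action vector with Lévy steps, as an action-refinement search might:

```python
import math
import random

def levy_step(beta: float = 1.5) -> float:
    """Draw one Levy-flight step via Mantegna's algorithm.

    beta (1 < beta <= 2) controls the tail heaviness: occasional large
    jumps escape local optima while most steps stay small.
    """
    # Variance of the numerator Gaussian, chosen so u / |v|^(1/beta)
    # follows a Levy-stable distribution with index beta.
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
               ) ** (1 / beta)
    u = random.gauss(0.0, sigma_u)
    v = random.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def refine(action: list[float], scale: float = 0.1,
           beta: float = 1.5) -> list[float]:
    """Hypothetical refinement: perturb each action component with a
    scaled Levy step to generate a candidate near-optimal action."""
    return [a + scale * levy_step(beta) for a in action]
```

In an action-refinement loop, candidates produced by `refine` would be evaluated against the latency-plus-energy objective and the best state-action pairs kept for training.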