DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV

Paper title

DL-DRL: A double-level deep reinforcement learning approach for large-scale task scheduling of multi-UAV

Authors

Mao, Xiao; Cao, Zhiguang; Fan, Mingfeng; Wu, Guohua; Pedrycz, Witold

Abstract

Exploiting unmanned aerial vehicles (UAVs) to execute tasks is gaining popularity. To solve the underlying task scheduling problem, deep reinforcement learning (DRL) based methods demonstrate a notable advantage over conventional heuristics, as they rely less on hand-engineered rules. However, their decision space becomes prohibitively large as the problem scales up, which degrades computational efficiency. To alleviate this issue, we propose a double-level deep reinforcement learning (DL-DRL) approach based on a divide-and-conquer framework (DCF), in which we decompose the task scheduling of multiple UAVs into task allocation and route planning. In particular, we design an encoder-decoder structured policy network in our upper-level DRL model to allocate the tasks to different UAVs, and we use another attention-based policy network in our lower-level DRL model to construct the route for each UAV, with the objective of maximizing the number of executed tasks given the maximum flight distance of the UAV. To train the two models effectively, we design an interactive training strategy (ITS), which includes pre-training, intensive training, and alternate training. Experimental results show that our DL-DRL performs favorably against learning-based and conventional baselines, including OR-Tools, in terms of solution quality and computational efficiency. We also verify the generalization performance of our approach by applying it to larger instances of up to 1000 tasks. Moreover, an ablation study shows that our ITS helps achieve a balance between performance and training efficiency.
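The two-level decomposition described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the learned upper-level and lower-level policy networks are replaced here by simple hand-written stand-ins (an angular split for task allocation and a nearest-neighbour rule for route planning), and the function names, the single shared depot, and the requirement that each UAV return to the depot are all assumptions made for illustration. Only the overall structure (allocate first, then route each UAV under a maximum flight distance, and count executed tasks) follows the abstract.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def allocate_tasks(tasks: List[Point], depot: Point, num_uavs: int) -> List[List[int]]:
    """Upper level (stand-in for the learned allocation policy).

    Sorts tasks by polar angle around the depot and cuts the sorted list
    into contiguous chunks, one chunk per UAV.
    """
    order = sorted(
        range(len(tasks)),
        key=lambda i: math.atan2(tasks[i][1] - depot[1], tasks[i][0] - depot[0]),
    )
    chunk = math.ceil(len(order) / num_uavs)
    return [order[k * chunk:(k + 1) * chunk] for k in range(num_uavs)]


def plan_route(task_ids: List[int], tasks: List[Point], depot: Point,
               max_distance: float) -> List[int]:
    """Lower level (stand-in for the learned routing policy).

    Nearest-neighbour construction. Stops as soon as visiting the nearest
    remaining task and then returning to the depot (an assumption) would
    exceed the maximum flight distance.
    """
    route: List[int] = []
    pos, travelled = depot, 0.0
    remaining = set(task_ids)
    while remaining:
        # Break distance ties by task index so the result is deterministic.
        nxt = min(remaining, key=lambda i: (dist(pos, tasks[i]), i))
        step = dist(pos, tasks[nxt])
        if travelled + step + dist(tasks[nxt], depot) > max_distance:
            break
        route.append(nxt)
        travelled += step
        pos = tasks[nxt]
        remaining.remove(nxt)
    return route


def schedule(tasks: List[Point], depot: Point, num_uavs: int,
             max_distance: float) -> Tuple[List[List[int]], int]:
    """Divide and conquer: allocate tasks, route each UAV, count executed tasks."""
    groups = allocate_tasks(tasks, depot, num_uavs)
    routes = [plan_route(g, tasks, depot, max_distance) for g in groups]
    return routes, sum(len(r) for r in routes)
```

Calling `schedule(tasks, depot, num_uavs, max_distance)` returns one route per UAV together with the total number of executed tasks, which is the quantity the abstract says the approach maximizes. In the paper both levels are trained DRL policies, so this sketch only shows how the two subproblems fit together.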
