RTAW: An Attention Inspired Reinforcement Learning Method for Multi-Robot Task Allocation in Warehouse Environments

Paper Title

RTAW: An Attention Inspired Reinforcement Learning Method for Multi-Robot Task Allocation in Warehouse Environments

Authors

Aakriti Agrawal, Amrit Singh Bedi, Dinesh Manocha

Abstract

We present a novel reinforcement learning based algorithm for the multi-robot task allocation problem in warehouse environments. We formulate it as a Markov Decision Process and solve it via a novel deep multi-agent reinforcement learning method (called RTAW) with an attention-inspired policy architecture. Hence, our proposed policy network uses global embeddings that are independent of the number of robots/tasks. We use the proximal policy optimization algorithm for training and a carefully designed reward to obtain a converged policy. The converged policy ensures cooperation among different robots to minimize total travel delay (TTD), which ultimately improves the makespan for a sufficiently large task list. In our extensive experiments, we compare the performance of our RTAW algorithm to state-of-the-art methods such as myopic pickup distance minimization (greedy) and regret-based baselines on different navigation schemes. We show an improvement of up to 14% (25-1000 seconds) in TTD on scenarios with hundreds or thousands of tasks for different challenging warehouse layouts and task generation schemes. We also demonstrate the scalability of our approach by showing performance with up to 1000 robots in simulations.
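
To make the greedy baseline named in the abstract concrete, the following is a minimal sketch of myopic pickup distance minimization, not the authors' implementation. It assumes a grid warehouse with Manhattan distances that ignore obstacles and congestion, and that robots choose tasks one at a time in list order. The names `manhattan`, `greedy_pick`, and `allocate` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col) grid cell; a hypothetical layout encoding


def manhattan(a: Cell, b: Cell) -> int:
    """Grid distance that ignores obstacles and other robots (an assumption)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_pick(robot: Cell, pickups: List[Cell]) -> int:
    """Return the index of the pending task whose pickup cell is nearest."""
    if not pickups:
        raise ValueError("no pending tasks")
    return min(range(len(pickups)), key=lambda i: manhattan(robot, pickups[i]))


def allocate(robots: List[Cell], pickups: List[Cell]) -> List[Tuple[int, int]]:
    """Robots choose in list order; each takes its nearest remaining task."""
    remaining = list(range(len(pickups)))
    pairs = []
    for r, pos in enumerate(robots):
        if not remaining:
            break  # more robots than tasks: the rest stay idle
        best = min(remaining, key=lambda i: manhattan(pos, pickups[i]))
        remaining.remove(best)
        pairs.append((r, best))  # (robot index, task index)
    return pairs


if __name__ == "__main__":
    robots = [(0, 0), (5, 5)]
    pickups = [(4, 4), (1, 0), (0, 3)]
    print(allocate(robots, pickups))
```

Each choice here looks only at the current pickup distance, which is what makes the rule myopic: it does not account for where a robot ends up after drop-off or for what the other robots will need next. The abstract's claim is that a learned, cooperative policy improves on this kind of rule.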
