Paper Title


RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch

Paper Authors

Yiqin Tan, Pihe Hu, Ling Pan, Jiatai Huang, Longbo Huang

Paper Abstract


Training deep reinforcement learning (DRL) models usually requires high computation costs. Therefore, compressing DRL models possesses immense potential for training acceleration and model deployment. However, existing methods that generate small models mainly adopt the knowledge distillation-based approach by iteratively training a dense network. As a result, the training process still demands massive computing resources. Indeed, sparse training from scratch in DRL has not been well explored and is particularly challenging due to non-stationarity in bootstrap training. In this work, we propose a novel sparse DRL training framework, "the Rigged Reinforcement Learning Lottery" (RLx2), which builds upon gradient-based topology evolution and is capable of training a sparse DRL model based entirely on a sparse network. Specifically, RLx2 introduces a novel multi-step TD target mechanism with a dynamic-capacity replay buffer to achieve robust value learning and efficient topology exploration in sparse models. It also reaches state-of-the-art sparse training performance in several tasks, showing 7.5×–20× model compression with less than 3% performance degradation, and up to 20× and 50× FLOPs reduction for training and inference, respectively.
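
The abstract names two concrete mechanisms: gradient-based topology evolution of the sparse network and a multi-step TD target for robust value learning. Below is a minimal, self-contained sketch of what these can look like; it is not the authors' code, and the function names, the drop/grow criteria, and the 10% update fraction are illustrative assumptions in the style of RigL-like sparse training.

```python
import numpy as np

def evolve_mask(weights, grads, mask, update_frac=0.1):
    """One drop-and-grow step on a binary sparsity mask (RigL-style sketch):
    drop the active weights with the smallest magnitude and regrow the same
    number of inactive connections with the largest gradient magnitude,
    so the overall sparsity level stays fixed."""
    n_update = max(1, int(update_frac * int(mask.sum())))
    active = np.flatnonzero(mask == 1)
    inactive = np.flatnonzero(mask == 0)

    # Drop: deactivate the smallest-|w| active connections.
    drop = active[np.argsort(np.abs(weights.flat[active]))[:n_update]]
    # Grow: activate the largest-|grad| previously inactive connections.
    grow = inactive[np.argsort(-np.abs(grads.flat[inactive]))[:n_update]]

    mask.flat[drop] = 0
    mask.flat[grow] = 1
    weights.flat[grow] = 0.0  # newly grown connections start from zero
    return mask, weights

def n_step_td_target(rewards, bootstrap_value, gamma=0.99):
    """Multi-step TD target: discounted sum of the next n rewards plus a
    discounted bootstrap value estimated n steps ahead."""
    target = bootstrap_value
    for r in reversed(rewards):
        target = r + gamma * target
    return target
```

For instance, a 3-step target with rewards [1.0, 0.5, 0.2], discount 0.99, and a bootstrap value of 10.0 evaluates to roughly 11.39. The paper's dynamic-capacity replay buffer and the exact RLx2 update schedule are not reproduced here.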
