Paper Title
Hyperparameter Tuning for Deep Reinforcement Learning Applications
Paper Authors
Paper Abstract
Reinforcement learning (RL) applications, in which an agent can learn optimal behaviors simply by interacting with the environment, are quickly gaining tremendous success in a wide variety of settings, from controlling simple pendulums to managing complex data centers. However, setting the right hyperparameters can have a huge impact on the performance and reliability of the deployed solution, i.e., the inference models produced via RL and used for decision-making. Hyperparameter search is itself a laborious process that requires many iterations and is computationally expensive, in order to find the settings that produce the best neural network architectures. Compared to other neural network architectures, deep RL has not witnessed much hyperparameter tuning, due to its algorithmic complexity and the simulation platforms it requires. In this paper, we propose a distributed variable-length genetic algorithm framework that systematically tunes hyperparameters for various RL applications, improving the training time and robustness of the resulting architectures via evolution. We demonstrate the scalability of our approach on many RL problems (from simple gyms to complex applications) and compare it with a Bayesian approach. Our results show that, with more generations, the search finds optimal solutions that require fewer training episodes and are computationally cheaper, while being more robust for deployment. These results are an important step toward advancing deep reinforcement learning controllers for real-world problems.
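To make the core idea concrete, below is a minimal, self-contained sketch of a variable-length genetic algorithm for RL hyperparameter search. It is not the paper's actual framework: the genome layout, mutation rates, and helper names (random_genome, mutate, crossover, run_ga) are illustrative assumptions, and evaluate() is a toy fitness stub. In the paper's setting, evaluate() would train an RL agent with the candidate hyperparameters (with evaluations distributed across workers) and return a score based on reward and training cost.

import random

def random_genome():
    # Genome: scalar hyperparameters plus a variable-length list of
    # hidden-layer widths, so evolution can grow or shrink the network.
    return {
        "lr": 10 ** random.uniform(-5, -2),            # learning rate
        "gamma": random.uniform(0.9, 0.999),           # discount factor
        "layers": [random.choice([32, 64, 128, 256])
                   for _ in range(random.randint(1, 4))],
    }

def mutate(g):
    g = {**g, "layers": list(g["layers"])}             # copy before editing
    if random.random() < 0.3:
        g["lr"] *= 10 ** random.uniform(-0.5, 0.5)
    if random.random() < 0.3:
        g["gamma"] = min(0.999, max(0.9, g["gamma"] + random.uniform(-0.01, 0.01)))
    r = random.random()
    if r < 0.2 and len(g["layers"]) < 6:               # grow the architecture
        g["layers"].append(random.choice([32, 64, 128, 256]))
    elif r < 0.4 and len(g["layers"]) > 1:             # shrink the architecture
        g["layers"].pop(random.randrange(len(g["layers"])))
    return g

def crossover(a, b):
    # One-point crossover on the layer lists can produce offspring
    # whose length differs from both parents.
    cut_a = random.randint(0, len(a["layers"]))
    cut_b = random.randint(0, len(b["layers"]))
    return {
        "lr": random.choice([a["lr"], b["lr"]]),
        "gamma": random.choice([a["gamma"], b["gamma"]]),
        "layers": (a["layers"][:cut_a] + b["layers"][cut_b:]) or [64],
    }

def evaluate(genome):
    # Placeholder fitness. In practice: train an RL agent with these
    # hyperparameters on the target environment and return a score
    # combining mean episode reward and (negative) training cost.
    return -abs(genome["lr"] - 3e-4) - 0.01 * sum(genome["layers"]) / 64

def run_ga(pop_size=20, generations=10, elite=4):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=evaluate, reverse=True)
        parents = scored[:elite]                        # elitist selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - elite)]
        pop = parents + children
    return max(pop, key=evaluate)

if __name__ == "__main__":
    print("best genome:", run_ga())

The variable-length "layers" gene is the key design choice: unlike fixed-length encodings, it lets the search trade architecture size against training cost, which is what allows later generations to converge on cheaper yet more robust solutions.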