Paper Title
Soft Actor-Critic with Inhibitory Networks for Faster Retraining
Paper Authors
Paper Abstract
Reusing previously trained models is critical in deep reinforcement learning to speed up training of new agents. However, it is unclear how to acquire new skills when objectives and constraints are in conflict with previously learned skills. Moreover, when retraining, there is an intrinsic conflict between exploiting what has already been learned and exploring new skills. In soft actor-critic (SAC) methods, a temperature parameter can be dynamically adjusted to weight the action entropy and balance the explore $\times$ exploit trade-off. However, controlling a single coefficient can be challenging within the context of retraining, even more so when goals are contradictory. In this work, inspired by neuroscience research, we propose a novel approach using inhibitory networks to allow separate and adaptive state value evaluations, as well as distinct automatic entropy tuning. Ultimately, our approach allows for controlling inhibition to handle conflict between exploiting less risky, acquired behaviors and exploring novel ones to overcome more challenging tasks. We validate our method through experiments in OpenAI Gym environments.
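For context on the entropy tuning mentioned in the abstract: in standard SAC with automatic entropy adjustment (Haarnoja et al., 2018), the temperature $\alpha$ is learned by minimizing

$$
J(\alpha) = \mathbb{E}_{a_t \sim \pi_t}\!\left[ -\alpha \log \pi_t(a_t \mid s_t) - \alpha \bar{\mathcal{H}} \right],
$$

where $\bar{\mathcal{H}}$ is a fixed target entropy. This is the conventional SAC formulation, not a detail taken from the paper; our reading of the abstract is that the proposed inhibitory networks maintain separate value evaluations with distinct temperatures of this form, so that exploiting previously acquired behaviors and exploring novel ones can be weighted independently during retraining.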