Paper Title
An adaptive synchronization approach for weights of deep reinforcement learning
Paper Authors
Paper Abstract
Deep Q-Networks (DQN) is one of the most well-known methods of deep reinforcement learning; it uses deep learning to approximate the action-value function. Addressing numerous deep reinforcement learning challenges, such as the moving-target problem and the correlation between samples, is among the main advantages of this model. Although various extensions of DQN have appeared in recent years, they all use a method similar to DQN's to overcome the moving-target problem. Despite the advantages mentioned, synchronizing the network weights at a fixed step size, independent of the agent's behavior, may in some cases cause the loss of properly learned networks. These lost networks might have led to states with higher rewards and hence better samples stored in the replay memory for future training. In this paper, we address this problem in the DQN family and provide an adaptive approach for synchronizing the neural weights used in DQN. In this approach, the synchronization of weights is based on the recent behavior of the agent, which is measured by a criterion at the end of each interval. To test this approach, we modified the DQN and Rainbow methods with the proposed adaptive synchronization and compared the modified methods with their standard forms on well-known games; the results confirm the quality of our synchronization method.
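To make the synchronization mechanism described in the abstract concrete, below is a minimal sketch of gating the usual fixed-interval target-network copy with a behavior-based check. The class name `AdaptiveSync`, the moving-average-of-returns criterion, and the `min_improvement` threshold are illustrative assumptions, not the paper's actual criterion or implementation.

```python
from collections import deque


class AdaptiveSync:
    """Decide, at the end of each interval, whether to copy the online-network
    weights into the target network, based on the agent's recent behavior.
    (Hypothetical sketch; the paper's concrete criterion may differ.)"""

    def __init__(self, interval=1000, window=10, min_improvement=0.0):
        self.interval = interval              # steps between synchronization checks
        self.window = window                  # number of recent episode returns compared
        self.min_improvement = min_improvement
        self.returns = deque(maxlen=2 * window)
        self.steps = 0

    def record_return(self, episode_return):
        """Store the return of a finished episode."""
        self.returns.append(episode_return)

    def should_sync(self):
        """Call once per environment step. Returns True only when an interval
        ends AND recent returns have not degraded (assumed criterion)."""
        self.steps += 1
        if self.steps % self.interval != 0:
            return False
        if len(self.returns) < 2 * self.window:
            return True  # not enough history yet: fall back to fixed-interval sync
        history = list(self.returns)
        old_avg = sum(history[: self.window]) / self.window
        new_avg = sum(history[self.window :]) / self.window
        return new_avg - old_avg >= self.min_improvement


# Usage inside a DQN-style training loop (online/target networks are placeholders):
# sync = AdaptiveSync(interval=1000)
# ...
# if sync.should_sync():
#     target_net.load_state_dict(online_net.state_dict())  # PyTorch-style weight copy
```

The design intent mirrors the abstract: when the agent's recent behavior suggests the online network has degraded, the copy is skipped so a properly learned target network is not overwritten; otherwise the standard DQN synchronization proceeds.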