Paper Title
Visualizing the Loss Landscape of Actor Critic Methods with Applications in Inventory Optimization
Paper Authors
Paper Abstract
Continuous control is a widely applicable area of reinforcement learning. The dominant approaches in this area are actor-critic methods, which as a common practice optimize policy gradients of neural approximators. The focus of our study is to characterize the actor loss function, which is the essential component of the optimization. We exploit low-dimensional visualizations of the loss function and compare the loss landscapes of various algorithms. Furthermore, we apply our approach to multi-store dynamic inventory control, a notoriously difficult problem in supply chain operations, and explore the shape of the loss function associated with the optimal policy. We modeled and solved the problem using reinforcement learning while obtaining a loss landscape that favors optimality.
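To make the abstract's notion of a low-dimensional loss-landscape visualization concrete, the sketch below evaluates a loss along a single normalized random direction around a parameter vector. This is a minimal illustrative example, not the paper's implementation: the function name, the quadratic toy loss, and the simple norm-based rescaling (a crude stand-in for the filter-wise normalization commonly used in loss-landscape work) are all assumptions made here for demonstration.

```python
import numpy as np

def loss_landscape_1d(loss_fn, theta, n_points=21, radius=1.0, seed=0):
    """Evaluate loss_fn along a random normalized direction around theta.

    Returns (alphas, losses) with losses[i] = loss_fn(theta + alphas[i] * d),
    so losses can be plotted against alphas as a 1-D slice of the landscape.
    """
    rng = np.random.default_rng(seed)
    d = rng.standard_normal(theta.shape)
    # Rescale the direction to the magnitude of the parameters, so the
    # slice width is comparable across parameter scales (an assumed,
    # simplified normalization).
    d *= np.linalg.norm(theta) / (np.linalg.norm(d) + 1e-12)
    alphas = np.linspace(-radius, radius, n_points)
    losses = np.array([loss_fn(theta + a * d) for a in alphas])
    return alphas, losses

# Hypothetical toy loss: a quadratic bowl with minimum at theta_star.
theta_star = np.array([1.0, -2.0, 0.5])
toy_loss = lambda th: float(np.sum((th - theta_star) ** 2))

alphas, losses = loss_landscape_1d(toy_loss, theta_star.copy())
# Since theta equals the minimizer here, the slice bottoms out at alpha = 0.
```

In the paper's setting, `loss_fn` would instead evaluate the actor loss of a trained policy network, and a 2-D variant (two random directions) would yield the surface plots used for comparing algorithms.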