Title
Reinforcement Learning in Economics and Finance
Authors
Abstract
Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process through repeated experience. In a given environment, the agent's policy yields running and terminal rewards. As in online learning, the agent learns sequentially. As in multi-armed bandit problems, when the agent picks an action, it cannot infer ex post the rewards that other actions would have induced. In reinforcement learning, the agent's actions have consequences: they influence not only rewards but also future states of the world. The goal of reinforcement learning is to find an optimal policy, that is, a mapping from the states of the world to the set of actions that maximizes cumulative reward over the long term. Exploring might be sub-optimal over a short horizon but can lead to better long-term outcomes. Many optimal control problems, popular in economics for more than forty years, can be expressed in the reinforcement learning framework, and recent advances in computational science, in particular deep learning algorithms, can be used by economists to solve complex behavioral problems. In this article, we present a state-of-the-art review of reinforcement learning techniques and applications in economics, game theory, operations research, and finance.
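To make the ideas in the abstract concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration on a toy chain environment. Everything in it is an illustrative assumption and not taken from the paper: the five-state chain, the `step` dynamics, the reward values (0.02 running reward for moving left, 1.0 terminal reward at the right end), the discount factor `gamma`, and the hyperparameters. It is meant only to show how a policy (a mapping from states to actions) can be learned from repeated experience when a small short-term reward competes with a larger long-term one.

```python
import random

# Hypothetical toy environment: a chain of states 0..4, where state 4 is terminal.
N_STATES = 5
LEFT, RIGHT = 0, 1
GOAL = N_STATES - 1


def step(state, action):
    """Assumed dynamics: return (next_state, reward, done)."""
    if action == RIGHT:
        nxt = state + 1
        if nxt == GOAL:
            return nxt, 1.0, True        # terminal reward at the end of the chain
        return nxt, 0.0, False           # moving right pays nothing immediately
    return max(state - 1, 0), 0.02, False  # moving left pays a small running reward


def q_learning(episodes=2000, horizon=50, alpha=0.5, gamma=0.9,
               epsilon=0.5, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(horizon):
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice((LEFT, RIGHT))
            else:
                action = max((LEFT, RIGHT), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state = nxt
    return q


def greedy_policy(q):
    """Map each non-terminal state to the action with the highest Q-value."""
    return [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(GOAL)]


q_table = q_learning()
policy = greedy_policy(q_table)
print(policy)
```

In this assumed setup, moving left is the myopically attractive action because it pays immediately, while moving right pays only at the terminal state. With enough exploration, the learned greedy policy should prefer `RIGHT` in every non-terminal state, which illustrates how exploring can be costly in the short run yet lead to a better long-term strategy.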