Hedging using reinforcement learning: Contextual $k$-Armed Bandit versus $Q$-learning

Paper title

Hedging using reinforcement learning: Contextual $k$-Armed Bandit versus $Q$-learning

Authors

Loris Cannelli, Giuseppe Nuti, Marzio Sala, Oleg Szehr

Abstract

The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, such as in the model of Black, Scholes and Merton (BSM), is not only unrealistic but also undesirable due to high transaction costs. A variety of methods have been proposed to balance effective replication against losses in the incomplete market setting. With the rise of Artificial Intelligence (AI), AI-based hedgers have attracted considerable interest, with particular attention given to Recurrent Neural Network systems and variations of the $Q$-learning algorithm. From a practical point of view, sufficient samples for training such an AI can only be obtained from a simulator of the market environment. Yet if an agent is trained solely on simulated data, its run-time performance will primarily reflect the accuracy of the simulation, which leads to the classical problem of model choice and calibration. In this article, the hedging problem is viewed as an instance of a risk-averse contextual $k$-armed bandit problem, which is motivated by the simplicity and sample-efficiency of the architecture. This allows for realistic online model updates from real-world data. We find that the $k$-armed bandit model naturally fits the Profit and Loss formulation of hedging, providing a more accurate and sample-efficient approach than $Q$-learning and reducing to the Black-Scholes model in the absence of transaction costs and risks.
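
To make the bandit framing in the abstract concrete, the following is a minimal sketch of a risk-averse contextual $k$-armed bandit, not the authors' implementation. Everything in it is an assumption made for illustration: the class `RiskAverseContextualBandit`, the helpers `toy_pnl` and `train`, the mean-minus-variance utility, the epsilon-greedy exploration rule, and the toy one-period market in which each context corresponds to a claim with a known delta and the price move is standard normal. The arms are candidate hedge ratios, and the reward is the one-period Profit and Loss of the hedged position.

```python
import random


class RiskAverseContextualBandit:
    """Tabular epsilon-greedy bandit (illustrative assumption, not from the paper).

    Each (context, arm) pair is scored by mean(PnL) - risk_aversion * var(PnL).
    """

    def __init__(self, n_contexts, arms, risk_aversion=1.0, epsilon=0.3, seed=0):
        self.arms = list(arms)  # candidate hedge ratios
        self.risk_aversion = risk_aversion
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        k = len(self.arms)
        self.count = [[0] * k for _ in range(n_contexts)]
        self.mean = [[0.0] * k for _ in range(n_contexts)]
        self.m2 = [[0.0] * k for _ in range(n_contexts)]  # sum of squared deviations

    def utility(self, context, arm):
        n = self.count[context][arm]
        if n == 0:
            return float("inf")  # untried arms are selected first
        variance = self.m2[context][arm] / n
        return self.mean[context][arm] - self.risk_aversion * variance

    def greedy_arm(self, context):
        return max(range(len(self.arms)), key=lambda a: self.utility(context, a))

    def select(self, context):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.arms))
        return self.greedy_arm(context)

    def update(self, context, arm, pnl):
        # Welford's online update of the running mean and squared deviations.
        self.count[context][arm] += 1
        n = self.count[context][arm]
        delta = pnl - self.mean[context][arm]
        self.mean[context][arm] += delta / n
        self.m2[context][arm] += delta * (pnl - self.mean[context][arm])


def toy_pnl(hedge_ratio, claim_delta, rng, cost=0.0):
    """One-period PnL of a short claim hedged with `hedge_ratio` shares.

    Toy assumption: the claim moves linearly with the underlying, and `cost`
    is a proportional charge on the position size.
    """
    price_move = rng.gauss(0.0, 1.0)
    return (hedge_ratio - claim_delta) * price_move - cost * abs(hedge_ratio)


def train(rounds=30000, seed=0):
    arms = [0.0, 0.25, 0.5, 0.75, 1.0]
    claim_deltas = [0.25, 0.5, 0.75]  # one hypothetical claim per context
    env_rng = random.Random(seed + 1)
    bandit = RiskAverseContextualBandit(len(claim_deltas), arms, seed=seed)
    for _ in range(rounds):
        context = env_rng.randrange(len(claim_deltas))
        arm = bandit.select(context)
        pnl = toy_pnl(arms[arm], claim_deltas[context], env_rng)
        bandit.update(context, arm, pnl)
    return bandit, arms, claim_deltas


if __name__ == "__main__":
    bandit, arms, claim_deltas = train()
    for context, claim_delta in enumerate(claim_deltas):
        print(claim_delta, arms[bandit.greedy_arm(context)])
```

In this toy setting with zero cost, the hedge ratio equal to the claim's delta removes all PnL variance, so the greedy arm in each context should settle on that ratio. This loosely mirrors the abstract's remark that the bandit reduces to the Black-Scholes hedge when transaction costs and risks are absent. Setting `cost` above zero in `toy_pnl` would let one explore how friction shifts the preferred arm.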
