Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing

Paper Title

Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing

Authors

Jingwei Ji, Renyuan Xu, Ruihao Zhu

Abstract

Motivated by practical considerations in machine learning for financial decision-making, such as risk aversion and large action spaces, we consider risk-aware bandit optimization with applications in smart order routing (SOR). Specifically, based on preliminary observations of linear price impacts in the NASDAQ ITCH dataset, we initiate the study of risk-aware linear bandits. In this setting, we aim to minimize regret, which measures our performance deficit relative to the optimum, under the mean-variance metric when facing a set of actions whose rewards are linear functions of (initially) unknown parameters. Driven by the variance-minimizing globally-optimal (G-optimal) design, we propose the novel instance-independent Risk-Aware Explore-then-Commit (RISE) algorithm and the instance-dependent Risk-Aware Successive Elimination (RISE++) algorithm. We then rigorously analyze their near-optimal regret upper bounds and show that, by leveraging the linear structure, our algorithms can dramatically reduce regret compared to existing methods. Finally, we demonstrate the performance of the algorithms through extensive numerical experiments in the SOR setup, using both synthetic datasets and the NASDAQ ITCH dataset. Our results reveal that 1) the linear structure assumption is indeed well supported by the NASDAQ dataset; and, more importantly, 2) both RISE and RISE++ can significantly outperform competing methods in terms of regret, especially in complex decision-making scenarios.
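To make the notion of regret under a mean-variance metric concrete, here is a minimal sketch. It is not the RISE or RISE++ algorithm. It assumes the convention that an action's value is its mean reward minus `rho` times its reward variance (larger is better); the paper's exact definition and sign convention may differ. The names `mv_value`, `cumulative_regret`, `theta`, `variances`, and `rho` are hypothetical and chosen for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def mv_value(action, theta, variance, rho):
    """Mean-variance value of one action.

    Assumption: the mean reward is linear in the action, <theta, action>,
    and the value is mean - rho * variance. This is an illustrative
    convention, not necessarily the one used in the paper.
    """
    return dot(theta, action) - rho * variance


def cumulative_regret(chosen, actions, theta, variances, rho):
    """Sum over rounds of (best mean-variance value - value of chosen action).

    `chosen` is a list of action indices, one per round.
    """
    values = [mv_value(a, theta, s2, rho) for a, s2 in zip(actions, variances)]
    best = max(values)
    return sum(best - values[i] for i in chosen)


# Hypothetical toy instance: three actions in two dimensions.
theta = [1.0, 0.5]                      # unknown to a learner, known here
actions = [[1, 0], [0, 1], [1, 1]]
variances = [0.1, 0.05, 1.0]            # per-action reward variance
rho = 1.0                               # risk-aversion weight

# A learner that tries actions 2 and 1 before settling on action 0.
regret = cumulative_regret([2, 1, 0, 0], actions, theta, variances, rho)
print(regret)
```

In this toy instance the third action has the highest mean reward but also the highest variance, so it is not the best choice under the mean-variance value. That trade-off is what separates the risk-aware objective from plain expected-reward regret.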
