Title
Off-policy Learning for Remote Electrical Tilt Optimization
Authors
Abstract
We address the problem of Remote Electrical Tilt (RET) optimization using off-policy Contextual Multi-Armed-Bandit (CMAB) techniques. The goal in RET optimization is to control the orientation of the vertical tilt angle of the antenna to optimize Key Performance Indicators (KPIs) representing the Quality of Service (QoS) perceived by the users in cellular networks. Learning an improved tilt update policy is hard. On the one hand, coming up with a new policy in an online manner in a real network requires exploring tilt updates that have never been used before, and is operationally too risky. On the other hand, devising this policy via simulations suffers from the simulation-to-reality gap. In this paper, we circumvent these issues by learning an improved policy in an offline manner using existing data collected on real networks. We formulate the problem of devising such a policy using the off-policy CMAB framework. We propose CMAB learning algorithms to extract optimal tilt update policies from the data. We train and evaluate these policies on real-world 4G Long Term Evolution (LTE) cellular network data. Our policies show consistent improvements over the rule-based logging policy used to collect the data.
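The abstract does not spell out the learning algorithm. As a rough illustration of off-policy learning in the CMAB setting it describes, the sketch below trains a softmax policy on synthetic logged bandit data by gradient ascent on a clipped inverse propensity scoring (IPS) objective. The log format, the policy class, the synthetic data, and the IPS/clipping choices are assumptions made for illustration only; they are not the authors' method or data.

```python
# Illustrative sketch (not the paper's code): off-policy learning of a discrete
# tilt-update policy from logged bandit feedback via clipped inverse propensity
# scoring (IPS). Assumed log format (hypothetical): context x, action a
# (e.g., tilt down / no change / tilt up), logging propensity p = pi_0(a|x),
# and observed reward r (a KPI-based score).

import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic logged data standing in for real network logs (assumption) ---
n, d, n_actions = 5000, 8, 3                 # samples, context features, tilt actions
X = rng.normal(size=(n, d))                  # contexts (e.g., cell KPIs before the update)
A = rng.integers(0, n_actions, size=n)       # actions chosen by the logging policy
P = np.full(n, 1.0 / n_actions)              # logging propensities (uniform here)
true_w = rng.normal(size=(d, n_actions))
R = (X @ true_w)[np.arange(n), A] + 0.1 * rng.normal(size=n)  # observed rewards

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# --- Train a linear softmax policy by gradient ascent on the clipped IPS value ---
W = np.zeros((d, n_actions))
lr, clip = 0.05, 10.0
for _ in range(200):
    pi = softmax(X @ W)                                  # pi_theta(a | x) for all actions
    w_is = np.minimum(pi[np.arange(n), A] / P, clip)     # clipped importance weights
    # Score-function gradient of E_log[ w * r ]:
    # grad log pi_theta(a|x) w.r.t. the logits is (one_hot(a) - pi)
    grad_logits = -pi
    grad_logits[np.arange(n), A] += 1.0
    G = X.T @ (grad_logits * (w_is * R)[:, None]) / n    # slightly biased once weights clip
    W += lr * G

# IPS estimate of the learned policy's value on the logged data
ips_value = np.mean(np.minimum(softmax(X @ W)[np.arange(n), A] / P, clip) * R)
print(f"IPS estimate of learned policy value: {ips_value:.3f}")
```

Clipping the importance weights is one common way to control the variance of off-policy estimates when the learned policy drifts far from the logging policy; other estimators (e.g., self-normalized or doubly robust) could be substituted in the same loop.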