Paper Title
Automatic Curriculum Generation for Learning Adaptation in Networking
Paper Authors
Paper Abstract
As deep reinforcement learning (RL) showcases its strengths in networking and systems, its pitfalls have also come to public attention: when trained to handle a wide range of network workloads and previously unseen deployment environments, RL policies often exhibit suboptimal performance and poor generalizability. To tackle these problems, we present Genet, a new training framework for learning better RL-based network adaptation algorithms. Genet builds on curriculum learning, which has proved effective against similar issues in other domains where RL is extensively employed. At a high level, curriculum learning presents progressively more difficult environments during training, rather than choosing them randomly, so that the current RL model can keep making meaningful progress. Applying curriculum learning to networking is challenging, however, because it remains unknown how to measure the "difficulty" of a network environment. Instead of relying on handcrafted heuristics to determine an environment's difficulty level, our insight is to leverage traditional rule-based (non-RL) baselines: if the current RL model performs significantly worse than a baseline in a network environment, then the model has substantial potential to improve when further trained in that environment. Genet therefore automatically searches for environments where the current model falls significantly behind a traditional baseline scheme, and iteratively promotes these environments as training progresses. Evaluating Genet on three use cases (adaptive video streaming, congestion control, and load balancing), we show that Genet produces RL policies that outperform both regularly trained RL policies and traditional baselines in each context, not only under synthetic workloads but also in real environments.
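To make the environment-selection rule concrete, below is a minimal, self-contained Python sketch of a gap-driven curriculum loop of the kind the abstract describes. Everything in it is an illustrative assumption rather than Genet's actual code: the Env parameters, the sample_env, baseline_reward, policy_reward, and train_on helpers, and the toy "skill" scalar all stand in for a real network simulator, rule-based baseline, and RL trainer.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Env:
    """A synthetic network environment; these knobs are illustrative only."""
    bandwidth_mbps: float
    delay_ms: float
    loss_rate: float


def sample_env(rng: random.Random) -> Env:
    """Draw a candidate environment from the training configuration space."""
    return Env(
        bandwidth_mbps=rng.uniform(0.5, 100.0),
        delay_ms=rng.uniform(5.0, 500.0),
        loss_rate=rng.uniform(0.0, 0.05),
    )


def baseline_reward(env: Env) -> float:
    """Placeholder for the reward a rule-based (non-RL) baseline earns in env."""
    return env.bandwidth_mbps / (1.0 + env.delay_ms / 100.0) * (1.0 - env.loss_rate)


def policy_reward(policy: dict, env: Env) -> float:
    """Placeholder for rolling out the current RL policy in env."""
    return baseline_reward(env) * policy["skill"]


def train_on(policy: dict, envs: list[Env]) -> None:
    """Placeholder RL update; a real system would run e.g. policy-gradient
    training on trajectories collected in the promoted environments."""
    policy["skill"] = min(1.2, policy["skill"] + 0.005 * len(envs))


def curriculum_train(policy: dict, rounds: int = 10, n_candidates: int = 200,
                     top_k: int = 20, seed: int = 0) -> dict:
    """Gap-driven curriculum loop: sample many candidate environments, keep
    those where the policy trails the baseline the most (largest reward gap),
    train on them, and repeat with the improved policy."""
    rng = random.Random(seed)
    for _ in range(rounds):
        pool = [sample_env(rng) for _ in range(n_candidates)]

        def gap(e: Env) -> float:
            # Large positive gap = the baseline clearly beats the current
            # policy, i.e., substantial headroom for the policy to improve.
            return baseline_reward(e) - policy_reward(policy, e)

        promoted = sorted(pool, key=gap, reverse=True)[:top_k]
        train_on(policy, promoted)
    return policy


if __name__ == "__main__":
    final = curriculum_train({"skill": 0.5})
    print(f"toy policy skill after curriculum training: {final['skill']:.2f}")
```

The design point the sketch preserves is that environment "difficulty" is never measured directly; the reward gap relative to a rule-based baseline serves as a proxy for how much training headroom an environment offers the current model.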