Paper Title

Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework

Authors

Shunyu Liu, Kaixuan Chen, Na Yu, Jie Song, Zunlei Feng, Mingli Song

Abstract

Despite the promising results achieved, state-of-the-art interactive reinforcement learning schemes rely on passively receiving supervision signals from advisor experts, in the form of either continuous monitoring or pre-defined rules, which inevitably results in a cumbersome and expensive learning process. In this paper, we introduce a novel initiative advisor-in-the-loop actor-critic framework, termed Ask-AC, that replaces the unilateral advisor-guidance mechanism with a bidirectional learner-initiative one, and thereby enables a customized and efficacious message exchange between learner and advisor. At the heart of Ask-AC are two complementary components, namely the action requester and the adaptive state selector, which can be readily incorporated into various discrete actor-critic architectures. The former component allows the agent to actively seek advisor intervention in the presence of uncertain states, while the latter identifies unstable states potentially missed by the former, especially when the environment changes, and then learns to promote the ask action on such states. Experimental results on both stationary and non-stationary environments, and across different actor-critic backbones, demonstrate that the proposed framework significantly improves the learning efficiency of the agent and achieves performance on par with that obtained by continuous advisor monitoring.
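The learner-initiative ask mechanism can be illustrated with a minimal sketch. The entropy-based uncertainty test, the threshold value, and the `advisor` callable below are illustrative assumptions for a single decision step, not the paper's exact criterion or implementation:

```python
import numpy as np

def ask_ac_step(policy_probs, advisor, state, entropy_threshold=1.0, rng=None):
    """One decision step of a learner-initiative ask mechanism (sketch).

    If the policy's entropy over discrete actions exceeds a threshold
    (i.e., the state is uncertain), the agent asks the advisor for an
    action; otherwise it samples from its own policy.
    Returns (action, asked), where `asked` flags an advisor intervention.
    """
    rng = rng or np.random.default_rng()
    probs = np.asarray(policy_probs, dtype=float)
    # Shannon entropy of the action distribution as an uncertainty proxy.
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    if entropy > entropy_threshold:
        return advisor(state), True  # uncertain: request advisor action
    return rng.choice(len(probs), p=probs), False  # confident: act alone
```

For example, a near-uniform distribution over four actions has entropy ln 4 ≈ 1.39 and triggers an ask, while a sharply peaked distribution does not, so advisor queries concentrate on the states where the learner is least certain.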
