Paper title
Multi-Agent Deep Stochastic Policy Gradient for Event Based Dynamic Spectrum Access
Paper authors
Paper abstract
We consider the dynamic spectrum access (DSA) problem where $K$ Internet of Things (IoT) devices compete for $T$ time slots constituting a frame. Devices collectively monitor $M$ events where each event could be monitored by multiple IoT devices. Each device, when at least one of its monitored events is active, picks an event and a time slot to transmit the corresponding active event information. In the case where multiple devices select the same time slot, a collision occurs and all transmitted packets are discarded. In order to capture the fact that devices observing the same event may transmit redundant information, we consider the maximization of the average sum event rate of the system instead of the classical frame throughput. We propose a multi-agent reinforcement learning approach based on a stochastic version of Multi-Agent Deep Deterministic Policy Gradient (MADDPG) to access the frame by exploiting device-level correlation and time correlation of events. Through numerical simulations, we show that the proposed approach is able to efficiently exploit the aforementioned correlations and outperforms benchmark solutions such as standard multiple access protocols and the widely used Independent Deep Q-Network (IDQN) algorithm.
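The access model described above (devices picking an event and a slot, collisions dropping all packets in a slot, and credit given per distinct event rather than per packet) can be sketched as a small simulation. This is an illustrative reading of the abstract, not the paper's actual environment: the `monitors` structure, the random baseline policy standing in for the learned stochastic MADDPG policy, and the single-frame scope are all assumptions.

```python
import random

def simulate_frame(monitors, active, T, rng):
    """Simulate one frame of the DSA model sketched in the abstract.

    monitors: list of sets; monitors[k] is the set of events device k
              observes (hypothetical encoding of device-level correlation)
    active:   set of events currently active
    T:        number of time slots in the frame
    rng:      random.Random instance

    Returns the set of distinct events successfully delivered. Counting
    distinct events (not packets) mirrors the sum event rate objective:
    two devices reporting the same event add no extra value.
    """
    # Each device with at least one active monitored event picks an
    # (event, slot) pair uniformly at random -- a naive baseline policy,
    # not the learned stochastic policy from the paper.
    transmissions = {}  # slot -> list of events transmitted in that slot
    for events in monitors:
        eligible = list(events & active)
        if not eligible:
            continue  # device stays silent when none of its events is active
        ev = rng.choice(eligible)
        slot = rng.randrange(T)
        transmissions.setdefault(slot, []).append(ev)

    delivered = set()
    for slot, evs in transmissions.items():
        if len(evs) == 1:  # exactly one transmission: success
            delivered.add(evs[0])
        # two or more transmissions collide and all packets are discarded
    return delivered
```

Averaging `len(delivered)` over many frames and event realizations gives an empirical estimate of the sum event rate that the proposed method maximizes; the random policy here serves only as the kind of baseline the learned policy is compared against.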