论文标题
强大的离群臂识别
Robust Outlier Arm Identification
论文作者
论文摘要
我们研究了稳健的离群臂识别(ROAI)的问题,该问题是通过自适应地采样的奖励分布来识别其预期奖励与大多数人大大偏离大多数人的武器。我们使用预期奖励的中值和中值绝对偏差来计算离群阈值。与使用平均值和标准偏差相比,这是阈值的强大选择,因为即使在存在极端异常值的情况下,它也可以识别较高的臂。我们的设置不同于现有的纯探索问题,在该问题中,阈值被预先指定为给定值或等级。这在目的是确定有前途的项目的应用程序中很有用,但是该集合的基数尚不清楚,例如为新疾病寻找有希望的药物或识别人群青睐的物品。我们为ROAI提出了两种$δ$ -PAC算法,其中包括第一个用于离群检测的UCB式算法,并在其样品复杂性上得出上限。我们还证明了与对数因素相匹配的问题,最坏情况下的情况下的最低限制了,这表明我们的上限通常是不可解决的。实验结果表明,与最先进的算法相比,我们的算法既有稳定性又大约$ 5 $ x。
We study the problem of Robust Outlier Arm Identification (ROAI), where the goal is to identify arms whose expected rewards deviate substantially from the majority, by adaptively sampling from their reward distributions. We compute the outlier threshold using the median and median absolute deviation of the expected rewards. This is a robust choice for the threshold compared to using the mean and standard deviation, since it can identify outlier arms even in the presence of extreme outlier values. Our setting is different from existing pure exploration problems where the threshold is pre-specified as a given value or rank. This is useful in applications where the goal is to identify the set of promising items but the cardinality of this set is unknown, such as finding promising drugs for a new disease or identifying items favored by a population. We propose two $δ$-PAC algorithms for ROAI, which includes the first UCB-style algorithm for outlier detection, and derive upper bounds on their sample complexity. We also prove a matching, up to logarithmic factors, worst case lower bound for the problem, indicating that our upper bounds are generally unimprovable. Experimental results show that our algorithms are both robust and about $5$x sample efficient compared to state-of-the-art.