Paper title
Design based incomplete U-statistics
Paper authors
Paper abstract
U-statistics are widely used in fields such as economics, machine learning, and statistics. However, while they enjoy desirable statistical properties, they have an obvious drawback in that the computation becomes impractical as the data size $n$ increases. Specifically, the number of combinations, say $m$, that a U-statistic of order $d$ has to evaluate is $O(n^d)$. Many efforts have been made to approximate the original U-statistic using a small subset of combinations since Blom (1976), who referred to such an approximation as an incomplete U-statistic. To the best of our knowledge, all existing methods require $m$ to grow at least faster than $n$, albeit more slowly than $n^d$, in order for the corresponding incomplete U-statistic to be asymptotically efficient in terms of the mean squared error. In this paper, we introduce a new type of incomplete U-statistic that can be asymptotically efficient, even when $m$ grows more slowly than $n$. In some cases, $m$ is only required to grow faster than $\sqrt{n}$. Our theoretical and empirical results both show significant improvements in the statistical efficiency of the new incomplete U-statistic.
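To make the computational trade-off in the abstract concrete, the following is a minimal sketch contrasting a complete U-statistic, which averages a kernel over all $\binom{n}{d}$ subsets, with a simple incomplete U-statistic that averages over only $m$ randomly drawn subsets. This illustrates the classical random-subsampling idea only; it is not the design-based construction the paper proposes. The function names, the `seed` parameter, and the choice of the variance kernel are assumptions made for illustration.

```python
import itertools
import random


def complete_u_statistic(data, kernel, d):
    """Average `kernel` over every size-d subset of `data` (O(n^d) evaluations)."""
    total = 0.0
    count = 0
    for subset in itertools.combinations(data, d):
        total += kernel(*subset)
        count += 1
    return total / count


def incomplete_u_statistic(data, kernel, d, m, seed=0):
    """Average `kernel` over m size-d subsets drawn independently at random.

    Illustrative assumption: subsets are drawn uniformly at random, so the
    same subset may be drawn more than once. This is NOT the paper's design.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        subset = rng.sample(data, d)  # d distinct observations
        total += kernel(*subset)
    return total / m


def variance_kernel(x, y):
    """Order-2 symmetric kernel whose complete U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


# Usage: compare the two estimators on simulated standard normal data.
gen = random.Random(1)
sample = [gen.gauss(0.0, 1.0) for _ in range(200)]
full = complete_u_statistic(sample, variance_kernel, d=2)          # 19,900 kernel evaluations
approx = incomplete_u_statistic(sample, variance_kernel, d=2, m=500)  # 500 kernel evaluations
print(full, approx)
```

With $n = 200$ and $d = 2$, the complete statistic evaluates the kernel 19,900 times while the incomplete one uses 500 evaluations. The abstract's claim concerns how slowly $m$ may grow with $n$ while the resulting estimator remains asymptotically efficient in mean squared error.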