Paper Title

Spectral Regularized Kernel Two-Sample Tests

Authors

Omar Hagrass, Bharath K. Sriperumbudur, Bing Li

Abstract

Over the last decade, an approach that has gained a lot of popularity to tackle nonparametric testing problems on general (i.e., non-Euclidean) domains is based on the notion of reproducing kernel Hilbert space (RKHS) embedding of probability distributions. The main goal of our work is to understand the optimality of two-sample tests constructed based on this approach. First, we show the popular MMD (maximum mean discrepancy) two-sample test to be not optimal in terms of the separation boundary measured in Hellinger distance. Second, we propose a modification to the MMD test based on spectral regularization by taking into account the covariance information (which is not captured by the MMD test) and prove the proposed test to be minimax optimal with a smaller separation boundary than that achieved by the MMD test. Third, we propose an adaptive version of the above test which involves a data-driven strategy to choose the regularization parameter and show the adaptive test to be almost minimax optimal up to a logarithmic factor. Moreover, our results hold for the permutation variant of the test where the test threshold is chosen elegantly through the permutation of the samples. Through numerical experiments on synthetic and real data, we demonstrate the superior performance of the proposed test in comparison to the MMD test and other popular tests in the literature.
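
For orientation, the baseline that the abstract compares against, the MMD two-sample test with a permutation-chosen threshold, can be sketched as follows. This is a minimal illustration and not the authors' spectral regularized test: the Gaussian kernel, the `bandwidth` value, the number of permutations `n_perm`, and all function names are assumptions made here for the example.

```python
import numpy as np


def gaussian_gram(z, bandwidth):
    """Gram matrix of a Gaussian kernel on the rows of z (kernel choice is an assumption)."""
    sq = np.sum(z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * z @ z.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(K, idx_x, idx_y):
    """Unbiased estimate of squared MMD from a pooled Gram matrix K."""
    n, m = len(idx_x), len(idx_y)
    Kxx = K[np.ix_(idx_x, idx_x)]
    Kyy = K[np.ix_(idx_y, idx_y)]
    Kxy = K[np.ix_(idx_x, idx_y)]
    # Within-sample terms exclude the diagonal (i == j).
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * Kxy.mean()


def mmd_permutation_test(x, y, bandwidth=1.0, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value) for H0: P = Q."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    n, total = len(x), len(x) + len(y)
    K = gaussian_gram(z, bandwidth)  # computed once, reused for every permutation
    idx = np.arange(total)
    observed = mmd2_unbiased(K, idx[:n], idx[n:])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(total)  # reshuffle the pooled sample labels
        if mmd2_unbiased(K, perm[:n], perm[n:]) >= observed:
            exceed += 1
    # The +1 correction counts the observed labelling as one permutation.
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value
```

Calling `mmd_permutation_test(x, y)` on two arrays of shape `(n, d)` and `(m, d)` and rejecting when the returned p-value falls below the chosen level gives the permutation variant of the plain MMD test. The regularized test proposed in the paper additionally uses covariance information, which this sketch does not attempt to reproduce.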
