Paper Title
Low-rank Characteristic Tensor Density Estimation Part I: Foundations
Paper Authors
Abstract
Effective non-parametric density estimation is a key challenge in high-dimensional multivariate data analysis. In this paper, we propose a novel approach that builds upon tensor factorization tools. Any multivariate density can be represented by its characteristic function, via the Fourier transform. If the sought density is compactly supported, then its characteristic function can be approximated, within controllable error, by a finite tensor of leading Fourier coefficients, whose size depends on the smoothness of the underlying density. This tensor can be naturally estimated from observed realizations of the random vector of interest, via sample averaging. To circumvent the curse of dimensionality, we introduce a low-rank model of this characteristic tensor, which significantly improves the density estimate, especially for high-dimensional data and/or in the sample-starved regime. By virtue of the uniqueness of low-rank tensor decomposition, under certain conditions, our method enables learning the true data-generating distribution. We demonstrate the promising performance of the proposed method using several measured datasets.
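The sample-averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the data have been rescaled to the unit hypercube [0, 1]^N, that the leading coefficients are those with integer frequencies from -K to K in every dimension, and that the coefficient at frequency vector k is the expectation of exp(j 2π kᵀx); the sign convention, the function name `empirical_characteristic_tensor`, and the parameter `K` are all hypothetical choices made for illustration. The low-rank modeling step is not shown.

```python
import numpy as np

def empirical_characteristic_tensor(samples, K):
    """Sample-average estimate of a truncated characteristic tensor.

    samples: (M, N) array, assumed rescaled so every entry lies in [0, 1].
    K: assumed cutoff; frequencies -K..K are kept in each dimension.
    Returns a complex array with N axes, each of length 2K + 1.
    """
    samples = np.asarray(samples, dtype=float)
    M, N = samples.shape
    freqs = np.arange(-K, K + 1)
    tensor = np.zeros((2 * K + 1,) * N, dtype=complex)
    for x in samples:
        # Each sample contributes a rank-one tensor: the outer product
        # of one complex exponential vector per dimension.
        term = np.ones((), dtype=complex)
        for n in range(N):
            vec = np.exp(2j * np.pi * freqs * x[n])
            term = np.multiply.outer(term, vec)
        tensor += term
    return tensor / M

# Usage on synthetic data (uniform samples, chosen only for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
Phi = empirical_characteristic_tensor(X, K=2)
print(Phi.shape)
```

Because every sample contributes a rank-one term, the estimate is an average of M rank-one tensors. Its size grows as (2K + 1)^N, which is the curse of dimensionality that the low-rank model in the abstract is meant to address.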