论文标题
从嘈杂的观察结果中恢复数据排列:线性制度
Recovering Data Permutations from Noisy Observations: The Linear Regime
论文作者
论文摘要
本文考虑了嘈杂的数据结构恢复问题。目的是调查以下问题:鉴于对排列数据集的嘈杂观察,根据哪种原始数据排序?重点是根据各向同性高斯分布生成数据的方案,并且噪声是带有任意协方差矩阵的加性高斯。这个问题在假设检验框架内提出。目的是研究最佳解码器在数据大小中具有多项式复杂性的线性状态,并通过简单地计算噪声观察的无限依赖性线性函数来声明置换。本文的主要结果是根据噪声协方差矩阵的线性制度的完整表征。具体而言,表明该矩阵必须具有非常平坦的光谱,最多具有三种不同的特征值才能诱导线性状态。讨论了该结果的几个实际相关含义,也表征了线性制度中决策标准所产生的错误概率。核心技术组件包括使用线性代数和几何工具,例如施泰纳对称性。
This paper considers a noisy data structure recovery problem. The goal is to investigate the following question: Given a noisy observation of a permuted data set, according to which permutation was the original data sorted? The focus is on scenarios where data is generated according to an isotropic Gaussian distribution, and the noise is additive Gaussian with an arbitrary covariance matrix. This problem is posed within a hypothesis testing framework. The objective is to study the linear regime in which the optimal decoder has a polynomial complexity in the data size, and it declares the permutation by simply computing a permutation-independent linear function of the noisy observations. The main result of the paper is a complete characterization of the linear regime in terms of the noise covariance matrix. Specifically, it is shown that this matrix must have a very flat spectrum with at most three distinct eigenvalues to induce the linear regime. Several practically relevant implications of this result are discussed, and the error probability incurred by the decision criterion in the linear regime is also characterized. A core technical component consists of using linear algebraic and geometric tools, such as Steiner symmetrization.