论文标题
关于随机素描算法和Tracy-Widom定律
On randomized sketching algorithms and the Tracy-Widom law
论文作者
论文摘要
越来越多的工作探讨了将随机投影集成到数值线性代数的算法中。主要动机是降低处理大数据集的总体计算成本。适当选择的随机投影可用于将原始数据集嵌入较低维空间中,以便保留原始数据集的关键属性。这些算法通常称为草图算法,因为投影的数据集可以用作完整数据集的压缩表示。我们表明,随机矩阵理论,尤其是Tracy-Widom定律,可用于描述高数据制度中在$ n \ gg d $中描述素描算法的操作特征。渐近大样本结果特别感兴趣,因为这是素描对数据压缩最有用的制度。特别是,我们开发了生成随机子空间嵌入的成功率的渐近近似值以及迭代素描算法的收敛概率。我们在真正的大型高维数据集上测试了许多素描算法,发现渐近表达式可以准确地预测经验性能。
There is an increasing body of work exploring the integration of random projection into algorithms for numerical linear algebra. The primary motivation is to reduce the overall computational cost of processing large datasets. A suitably chosen random projection can be used to embed the original dataset in a lower-dimensional space such that key properties of the original dataset are retained. These algorithms are often referred to as sketching algorithms, as the projected dataset can be used as a compressed representation of the full dataset. We show that random matrix theory, in particular the Tracy-Widom law, is useful for describing the operating characteristics of sketching algorithms in the tall-data regime when $n \gg d$. Asymptotic large sample results are of particular interest as this is the regime where sketching is most useful for data compression. In particular, we develop asymptotic approximations for the success rate in generating random subspace embeddings and the convergence probability of iterative sketching algorithms. We test a number of sketching algorithms on real large high-dimensional datasets and find that the asymptotic expressions give accurate predictions of the empirical performance.