Paper Title
Perturbation Analysis of Randomized SVD and its Applications to Statistics

Authors

Yichi Zhang, Minh Tang

Abstract
Randomized singular value decomposition (RSVD) is a class of computationally efficient algorithms for computing the truncated SVD of large data matrices. Given an $m \times n$ matrix $\widehat{\mathbf M}$, the prototypical RSVD algorithm outputs an approximation of the $k$ leading left singular vectors of $\widehat{\mathbf{M}}$ by computing the SVD of $\widehat{\mathbf{M}} (\widehat{\mathbf M}^{\top} \widehat{\mathbf{M}})^{g} \mathbf G$; here $g \geq 1$ is an integer and $\mathbf G \in \mathbb{R}^{n \times \widetilde{k}}$ is a random Gaussian sketching matrix with $\widetilde{k} \geq k$. In this paper we derive upper bounds for the $\ell_2$ and $\ell_{2,\infty}$ distances between the exact left singular vectors $\widehat{\mathbf{U}}$ of $\widehat{\mathbf{M}}$ and its approximation $\widehat{\mathbf{U}}_g$ (obtained via RSVD), as well as entrywise error bounds when $\widehat{\mathbf{M}}$ is projected onto $\widehat{\mathbf{U}}_g \widehat{\mathbf{U}}_g^{\top}$. These bounds depend on the singular value gap and the number of power iterations $g$, and a smaller gap requires a larger value of $g$ to guarantee convergence of the $\ell_2$ and $\ell_{2,\infty}$ distances. We apply our theoretical results to settings where $\widehat{\mathbf{M}}$ is an additive perturbation of some unobserved signal matrix $\mathbf{M}$. In particular, we obtain nearly-optimal convergence rates and asymptotic normality for RSVD on three inference problems, namely, subspace estimation and community detection in random graphs, noisy matrix completion, and PCA with missing data.