Paper Title
Perturbation Analysis of Randomized SVD and its Applications to Statistics
Paper Authors
Paper Abstract
Randomized singular value decomposition (RSVD) is a class of computationally efficient algorithms for computing the truncated SVD of large data matrices. Given an $m \times n$ matrix $\widehat{\mathbf M}$, the prototypical RSVD algorithm outputs an approximation of the $k$ leading left singular vectors of $\widehat{\mathbf{M}}$ by computing the SVD of $\widehat{\mathbf{M}} (\widehat{\mathbf M}^{\top} \widehat{\mathbf{M}})^{g} \mathbf G$; here $g \geq 1$ is an integer and $\mathbf G \in \mathbb{R}^{n \times \widetilde{k}}$ is a random Gaussian sketching matrix with $\widetilde{k} \geq k$. In this paper we derive upper bounds for the $\ell_2$ and $\ell_{2,\infty}$ distances between the exact left singular vectors $\widehat{\mathbf{U}}$ of $\widehat{\mathbf{M}}$ and its approximation $\widehat{\mathbf{U}}_g$ (obtained via RSVD), as well as entrywise error bounds when $\widehat{\mathbf{M}}$ is projected onto $\widehat{\mathbf{U}}_g \widehat{\mathbf{U}}_g^{\top}$. These bounds depend on the singular value gap and the number of power iterations $g$, and a smaller gap requires larger values of $g$ to guarantee convergence of the $\ell_2$ and $\ell_{2,\infty}$ distances. We apply our theoretical results to settings where $\widehat{\mathbf{M}}$ is an additive perturbation of some unobserved signal matrix $\mathbf{M}$. In particular, we obtain nearly-optimal convergence rates and asymptotic normality for RSVD on three inference problems, namely, subspace estimation and community detection in random graphs, noisy matrix completion, and PCA with missing data.
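
The following is a minimal NumPy sketch of the prototypical RSVD procedure as described in the abstract (sketch with a Gaussian matrix, followed by $g$ power iterations and an SVD of the sketched matrix). It is an illustration under stated assumptions, not the authors' implementation; the function and variable names are our own.

```python
# Minimal sketch of prototypical RSVD with power iterations, assuming NumPy.
# Names (randomized_svd, M_hat, k_tilde) are illustrative, not from the paper.
import numpy as np

def randomized_svd(M_hat, k, k_tilde=None, g=1, rng=None):
    """Approximate the k leading left singular vectors of M_hat by
    computing the SVD of M_hat (M_hat^T M_hat)^g G, where G is an
    n x k_tilde Gaussian sketching matrix with k_tilde >= k."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = M_hat.shape
    k_tilde = k if k_tilde is None else k_tilde   # oversampling dimension

    G = rng.standard_normal((n, k_tilde))          # Gaussian sketching matrix
    Y = M_hat @ G                                  # Y = M_hat G
    for _ in range(g):                             # g power iterations:
        Y = M_hat @ (M_hat.T @ Y)                  # Y = M_hat (M_hat^T M_hat)^g G
    # Left singular vectors of the sketched matrix approximate those of M_hat.
    U_g, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U_g[:, :k]
```

In practice one would typically re-orthonormalize (e.g., via QR) between power iterations for numerical stability; the plain form above mirrors the expression $\widehat{\mathbf{M}} (\widehat{\mathbf M}^{\top} \widehat{\mathbf{M}})^{g} \mathbf G$ used in the abstract.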