Fast computation of latent correlations

Paper title

Fast computation of latent correlations

Authors

Grace Yoon, Christian L. Müller, Irina Gaynanova

Abstract

Latent Gaussian copula models provide a powerful means to perform multi-view data integration since these models can seamlessly express dependencies between mixed variable types (binary, continuous, zero-inflated) via latent Gaussian correlations. The estimation of these latent correlations, however, comes at considerable computational cost, which has prevented the routine use of these models on high-dimensional data. Here, we propose a new computational approach for estimating latent correlations via a hybrid multi-linear interpolation and optimization scheme. Our approach speeds up the current state-of-the-art computation by several orders of magnitude, thus allowing fast computation of latent Gaussian copula models even when the number of variables $p$ is large. We provide theoretical guarantees for the approximation error of our numerical scheme and support its excellent performance on simulated and real-world data. We illustrate the practical advantages of our method on high-dimensional sparse quantitative and relative abundance microbiome data as well as multi-view data from The Cancer Genome Atlas Project. Our method is implemented in the R package mixedCCA, available at https://github.com/irinagain/mixedCCA.
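
The idea of replacing a per-pair numerical inversion with interpolation on a precomputed grid can be sketched in one dimension. The sketch below is an illustration only, not the mixedCCA implementation: it assumes the standard Gaussian copula relation for two continuous variables, $\tau = (2/\pi)\arcsin(r)$, between Kendall's $\tau$ and the latent correlation $r$. That case has a closed-form inverse, which makes the interpolation error easy to check. All function names (`bridge`, `invert_by_bisection`, `build_grid`, `invert_by_interpolation`) and the grid size are hypothetical. Mixed variable types would add further grid dimensions, which is presumably where multi-linear rather than linear interpolation comes in.

```python
import math
from bisect import bisect_right


def bridge(r):
    """Kendall's tau implied by latent correlation r (continuous-continuous pair)."""
    return 2.0 / math.pi * math.asin(r)


def invert_by_bisection(tau, tol=1e-12):
    """Solve bridge(r) = tau by bisection; bridge is increasing on [-1, 1].

    Stands in for the per-pair optimization step that is expensive
    when repeated for every pair of variables.
    """
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bridge(mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def build_grid(n_nodes=201):
    """Precompute the inverse bridge function on an equally spaced tau grid."""
    taus = [-1.0 + 2.0 * i / (n_nodes - 1) for i in range(n_nodes)]
    rs = [invert_by_bisection(t) for t in taus]
    return taus, rs


def invert_by_interpolation(tau, taus, rs):
    """Approximate the inverse by linear interpolation between grid nodes."""
    tau = min(max(tau, taus[0]), taus[-1])  # clamp to the grid range
    i = min(max(bisect_right(taus, tau), 1), len(taus) - 1)
    t0, t1 = taus[i - 1], taus[i]
    w = (tau - t0) / (t1 - t0)
    return (1.0 - w) * rs[i - 1] + w * rs[i]


if __name__ == "__main__":
    taus, rs = build_grid()
    tau_hat = 0.3
    exact = math.sin(math.pi * tau_hat / 2.0)  # closed-form inverse
    approx = invert_by_interpolation(tau_hat, taus, rs)
    print(f"exact={exact:.6f} interpolated={approx:.6f}")
```

The grid is built once and reused for every pair, so each later lookup costs a binary search plus one linear blend instead of a full root-finding run. The trade-off is an approximation error that shrinks as the grid is refined.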
