Title
A new method for faster and more accurate inference of species associations from big community data
Authors
Abstract
1. Joint species distribution models (JSDMs) explain spatial variation in community composition through contributions of the environment, biotic associations, and possibly spatially structured residual covariance. They show great promise as a general analytical framework for community ecology and macroecology, but current JSDMs, even when approximated by latent variables, scale poorly to large datasets. This limits their usefulness for the big community datasets now emerging from, e.g., metabarcoding and metagenomics.
2. Here, we present a novel, more scalable JSDM (sjSDM) that avoids latent variables by using Monte Carlo integration of the joint JSDM likelihood, and that allows flexible elastic net regularization on all model components. We implemented sjSDM in PyTorch, a modern machine learning framework that can run calculations on both CPUs and GPUs. Using simulated communities with known species-species associations and different numbers of species and sites, we compare sjSDM with state-of-the-art JSDM implementations in terms of computational runtime and the accuracy of the inferred species-species and species-environment associations.
3. We find that sjSDM is orders of magnitude faster than existing JSDM algorithms (even when run on the CPU) and can be scaled to very large datasets. Despite the dramatically improved speed, sjSDM estimates species association structures more accurately than alternative JSDM implementations. We demonstrate the applicability of sjSDM to big community data with an eDNA case study comprising thousands of fungal operational taxonomic units (OTUs).
4. Our sjSDM approach makes it possible to fit JSDMs to large community datasets with hundreds or thousands of species, substantially extending the applicability of JSDMs in ecology. We provide our method in an R package to facilitate its use in practical data analysis.
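Point 2 rests on two ideas: estimating the joint likelihood of a site's presence/absence pattern by Monte Carlo integration instead of introducing latent variables, and adding an elastic net penalty to the model components. The sketch below illustrates both under stated assumptions; it is not the sjSDM code. It assumes a multivariate probit model (latent Z = mu + eps with eps ~ N(0, sigma), species present exactly when Z > 0), treats the off-diagonal entries of sigma as the species-species associations, and uses a glmnet-style penalty. All function names (`mc_site_loglik`, `elastic_net`, `penalized_nll`) are hypothetical, and it is written in pure Python for readability rather than in PyTorch.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def mc_site_loglik(y, mu, sigma, n_draws=5000, rng=None):
    """Monte Carlo estimate of log P(Y = y) for one site under a multivariate probit.

    Assumed model (hypothetical sketch): Z = mu + eps, eps ~ N(0, sigma),
    and species j is present (y_j = 1) exactly when Z_j > 0.
    """
    if rng is None:
        rng = random.Random(0)
    L = cholesky(sigma)
    n = len(y)
    hits = 0
    for _ in range(n_draws):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlated residuals: eps = L e, so Cov(eps) = L L^T = sigma.
        z = [mu[i] + sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(n)]
        # Count draws whose presence/absence pattern matches the observation.
        if all((zi > 0) == bool(yi) for zi, yi in zip(z, y)):
            hits += 1
    # Floor at one hit to avoid log(0) for very rare patterns.
    return math.log(max(hits, 1) / n_draws)


def elastic_net(weights, lam, alpha):
    """glmnet-style penalty: lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def penalized_nll(Y, X, beta, sigma, lam, alpha, n_draws=2000, seed=1):
    """Negative MC log-likelihood over sites plus elastic net on beta and associations.

    Y: sites x species (0/1); X: sites x covariates; beta: species x covariates.
    """
    rng = random.Random(seed)
    nll = 0.0
    for y, x in zip(Y, X):
        # Linear environmental predictor for each species at this site.
        mu = [sum(xk * bk for xk, bk in zip(x, beta_j)) for beta_j in beta]
        nll -= mc_site_loglik(y, mu, sigma, n_draws, rng)
    coefs = [b for beta_j in beta for b in beta_j]
    # Off-diagonal covariance entries stand in for species-species associations.
    assoc = [sigma[i][j] for i in range(len(sigma)) for j in range(i + 1, len(sigma))]
    return nll + elastic_net(coefs, lam, alpha) + elastic_net(assoc, lam, alpha)
```

As a sanity check, for two species with mu = [0, 0] and residual correlation 0.9, `math.exp(mc_site_loglik([1, 1], [0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], n_draws=20000))` should land near the analytic orthant probability 1/4 + asin(0.9)/(2π) ≈ 0.43, noticeably above the 0.25 expected for independent species. This is how residual covariance encodes co-occurrence. Fitting beta and sigma by gradient descent, as a PyTorch implementation would, requires a differentiable replacement for the hard indicator count used here; that step is deliberately omitted.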