A Scalable Partitioned Approach to Model Massive Nonstationary Non-Gaussian Spatial Datasets

Paper title

A Scalable Partitioned Approach to Model Massive Nonstationary Non-Gaussian Spatial Datasets

Authors

Benjamin Seiyon Lee, Jaewoo Park

Abstract

Nonstationary non-Gaussian spatial data are common in many disciplines, including climate science, ecology, epidemiology, and the social sciences. Examples include count data on disease incidence and binary satellite data on cloud mask (cloud/no-cloud). Modeling such datasets as stationary spatial processes can be unrealistic since they are collected over large heterogeneous domains (i.e., spatial behavior differs across subregions). Although several approaches have been developed for nonstationary spatial models, these have focused primarily on Gaussian responses. In addition, fitting nonstationary models to large non-Gaussian datasets is computationally prohibitive. To address these challenges, we propose a scalable algorithm for modeling such data by leveraging parallel computing in modern high-performance computing systems. We partition the spatial domain into disjoint subregions and fit locally nonstationary models using a carefully curated set of spatial basis functions. Then, we combine the local processes using a novel neighbor-based weighting scheme. Our approach scales well to massive datasets (e.g., 1 million samples) and can be implemented in nimble, a popular software environment for Bayesian hierarchical modeling. We demonstrate our method on simulated examples and two large real-world datasets pertaining to infectious diseases and remote sensing.
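
The partition, fit-locally, and combine workflow described in the abstract can be loosely sketched as follows. This is only an illustration under strong assumptions, not the authors' method: the regular grid partition, the per-cell Poisson rate used as a stand-in local model, the linear distance-decay weights, and all function names (`partition_grid`, `cell_centers`, `fit_local_rates`, `combine`) are hypothetical. The abstract does not specify the basis functions or the neighbor-based weighting scheme, and the paper's implementation uses nimble rather than Python.

```python
import numpy as np

def partition_grid(coords, n_side):
    """Assign points in the unit square to n_side * n_side disjoint cells."""
    idx = np.minimum((coords * n_side).astype(int), n_side - 1)
    return idx[:, 0] * n_side + idx[:, 1]

def cell_centers(n_side):
    """Centers of the grid cells, ordered to match partition_grid labels."""
    g = (np.arange(n_side) + 0.5) / n_side
    cx, cy = np.meshgrid(g, g, indexing="ij")
    return np.column_stack([cx.ravel(), cy.ravel()])

def fit_local_rates(counts, labels, n_cells):
    """Stand-in local model (assumption): one Poisson rate per cell.

    The maximum-likelihood estimate is the cell mean. Each cell is fitted
    independently, so in principle the fits could run in parallel.
    """
    sums = np.bincount(labels, weights=counts, minlength=n_cells)
    sizes = np.bincount(labels, minlength=n_cells)
    return sums / np.maximum(sizes, 1)

def combine(coords_new, centers, rates, radius):
    """Blend local estimates with weights that decay linearly with distance.

    Cells whose center lies farther than `radius` get zero weight, so only
    neighboring cells contribute. `radius` must exceed the largest distance
    from any point to its nearest center, or a row of weights would be zero.
    """
    d = np.linalg.norm(coords_new[:, None, :] - centers[None, :, :], axis=2)
    w = np.maximum(1.0 - d / radius, 0.0)
    w /= w.sum(axis=1, keepdims=True)
    return w @ rates

# Simulated count data whose mean varies smoothly across the domain.
rng = np.random.default_rng(0)
n_side = 4
coords = rng.random((2000, 2))
counts = rng.poisson(np.exp(1.0 + coords[:, 0]))

labels = partition_grid(coords, n_side)
rates = fit_local_rates(counts, labels, n_side * n_side)
smoothed = combine(coords, cell_centers(n_side), rates, radius=1.5 / n_side)
```

Because the weights are non-negative and sum to one for each location, every blended value is a convex combination of the local rates, which smooths the discontinuities that a hard partition would otherwise leave at cell boundaries.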
