Paper title
Graph-Based Spatial Segmentation of Health-Related Areal Data
Paper authors
Paper abstract
Smoothing is often used to improve the readability and interpretability of noisy areal data. However, there are many instances where the underlying quantity is discontinuous. In such cases, specific methods are needed to estimate the piecewise-constant spatial process. A well-known approach in this setting is to segment the signal along the adjacency graph, as the graph-based fused lasso does, but this method does not scale well to large graphs. This article introduces a new method for piecewise-constant spatial estimation that (i) is fast to compute on large graphs and (ii) yields sparser models than the fused lasso (for the same amount of regularization), giving estimates that are easier to interpret. We illustrate our method on simulated data and apply it to real data on overweight prevalence in the Netherlands. We identify healthy and unhealthy zones that cannot be explained by demographic or socio-economic characteristics. We find that our method is capable of identifying such zones and can assist policy makers with their health-improving strategies. The implementation of our method in R is publicly available at github.com/goepp/graphseg.
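To make the baseline mentioned in the abstract concrete, the sketch below evaluates the standard graph-based fused lasso objective (squared error plus an L1 penalty on differences across adjacent areas) and reads off the zones implied by a piecewise-constant estimate. It is an illustration only, not the authors' graphseg implementation: the function names `fused_lasso_objective` and `zones`, the five-area path graph, and all numeric values are hypothetical, and the new method proposed in the article is not reproduced here.

```python
from collections import defaultdict


def fused_lasso_objective(y, theta, edges, lam):
    """Graph fused lasso objective: 0.5 * squared error + lam * sum of |theta_i - theta_j| over edges."""
    fit = 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, theta))
    penalty = lam * sum(abs(theta[i] - theta[j]) for i, j in edges)
    return fit + penalty


def zones(theta, edges, tol=1e-9):
    """Return zones: connected components of the graph restricted to edges whose endpoints share a value."""
    parent = list(range(len(theta)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        if abs(theta[i] - theta[j]) <= tol:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(theta)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Hypothetical toy data: five areas on a path graph 0-1-2-3-4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
y = [1.0, 1.2, 0.8, 3.0, 3.2]        # noisy observations (made up)
theta = [1.0, 1.0, 1.0, 3.1, 3.1]    # a piecewise-constant candidate estimate

print(zones(theta, edges))
print(fused_lasso_objective(y, theta, edges, lam=0.5))
```

In this toy setting the penalty is paid only on the single edge where the estimate jumps, which is why a larger `lam` favors estimates with fewer zones. Sparser models, in the abstract's sense, correspond to fewer such jumps.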