Gaussian Graphical Model exploration and selection in high dimension low sample size setting

Paper title

Gaussian Graphical Model exploration and selection in high dimension low sample size setting

Authors

Lartigue, Thomas; Bottani, Simona; Baron, Stephanie; Colliot, Olivier; Durrleman, Stanley; Allassonnière, Stéphanie

Abstract

Gaussian Graphical Models (GGMs) are often used to describe the conditional correlations between the components of a random vector. In this article, we compare two families of GGM inference methods: nodewise edge selection and penalised likelihood maximisation. We demonstrate on synthetic data that, when the sample size is small, these two methods produce graphs with either too few or too many edges compared with the true graph. We therefore propose a composite procedure that explores a family of graphs with a nodewise numerical scheme and selects a candidate among them with an overall likelihood criterion. We demonstrate that, when the number of observations is small, this selection method yields graphs that are closer to the truth and that correspond to distributions with a lower KL divergence with respect to the real distribution than those obtained with the other two methods. Finally, we illustrate the value of our algorithm on two concrete cases: first on brain imaging data, then on biological nephrology data. In both cases, our results are more in line with current knowledge in each field.
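The composite procedure summarised above (explore a family of graphs with a nodewise scheme, then pick one with an overall likelihood criterion) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. Candidate graphs come from lasso neighbourhood selection in the style of Meinshausen and Bühlmann, combined with an "AND" rule over a hand-picked penalty grid. Each graph is refitted by constrained Gaussian maximum likelihood using the modified regression algorithm of Hastie, Tibshirani and Friedman (ESL, Algorithm 17.1). The "overall likelihood criterion" is assumed here to be the Gaussian log-likelihood on a held-out split. All function names (`nodewise_graph`, `fit_ggm_given_graph`, `explore_and_select`, `kl_gaussian`) are hypothetical, and the data are synthetic.

```python
import numpy as np


def soft_threshold(x, t):
    """Scalar soft-thresholding operator used by the lasso update."""
    return np.sign(x) * max(abs(x) - t, 0.0)


def lasso_cd(A, b, lam, n_iter=200):
    """Coordinate descent for min_w (1/2n)||b - A w||^2 + lam * ||w||_1."""
    n, d = A.shape
    w = np.zeros(d)
    col_sq = (A ** 2).sum(axis=0) / n
    r = b.astype(float)  # residual b - A w, with w = 0 initially
    for _ in range(n_iter):
        for j in range(d):
            rho = A[:, j] @ r / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            r -= A[:, j] * (w_new - w[j])
            w[j] = w_new
    return w


def nodewise_graph(X, lam):
    """Regress each variable on the others with a lasso penalty and keep an
    edge only if both regressions select it (assumed "AND" rule)."""
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for i in range(p):
        others = [k for k in range(p) if k != i]
        w = lasso_cd(X[:, others], X[:, i], lam)
        adj[i, others] = w != 0
    return adj & adj.T


def fit_ggm_given_graph(S, adj, n_sweeps=500, tol=1e-10):
    """Gaussian MLE of the covariance whose precision matrix is zero off the
    graph `adj` (modified regression algorithm, ESL Algorithm 17.1)."""
    p = S.shape[0]
    W = S.copy()  # the diagonal stays equal to diag(S) throughout
    for _ in range(n_sweeps):
        W_old = W.copy()
        for j in range(p):
            others = np.array([k for k in range(p) if k != j])
            mask = adj[j, others]  # neighbours of j among the other nodes
            W11 = W[np.ix_(others, others)]
            beta = np.zeros(p - 1)
            if mask.any():
                # Reduced system W11* beta* = s12*, restricted to the neighbours.
                beta[mask] = np.linalg.solve(W11[np.ix_(mask, mask)], S[others[mask], j])
            w12 = W11 @ beta
            W[others, j] = w12
            W[j, others] = w12
        if np.max(np.abs(W - W_old)) < tol:
            break
    return W


def gaussian_loglik(Sigma, S, n):
    """Log-likelihood of n centred samples with empirical covariance S under N(0, Sigma)."""
    p = S.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * n * (p * np.log(2 * np.pi) + logdet + np.trace(np.linalg.solve(Sigma, S)))


def kl_gaussian(Sigma0, Sigma1):
    """KL( N(0, Sigma0) || N(0, Sigma1) ) for zero-mean Gaussians."""
    p = Sigma0.shape[0]
    _, ld0 = np.linalg.slogdet(Sigma0)
    _, ld1 = np.linalg.slogdet(Sigma1)
    return 0.5 * (np.trace(np.linalg.solve(Sigma1, Sigma0)) - p + ld1 - ld0)


def explore_and_select(X_train, X_val, lambdas):
    """Explore one graph per penalty, refit each, keep the best held-out log-likelihood."""
    S_train = X_train.T @ X_train / X_train.shape[0]
    S_val = X_val.T @ X_val / X_val.shape[0]
    best = None
    for lam in lambdas:
        adj = nodewise_graph(X_train, lam)
        Sigma = fit_ggm_given_graph(S_train, adj)
        score = gaussian_loglik(Sigma, S_val, X_val.shape[0])
        if best is None or score > best[0]:
            best = (score, lam, adj, Sigma)
    return best


# Synthetic toy example: chain graph with a tridiagonal precision matrix.
rng = np.random.default_rng(0)
p = 6
Theta_true = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
Sigma_true = np.linalg.inv(Theta_true)
X = rng.multivariate_normal(np.zeros(p), Sigma_true, size=120)
X_tr, X_va = X[:60], X[60:]
score, lam, adj, Sigma_hat = explore_and_select(X_tr, X_va, [0.02, 0.05, 0.1, 0.2, 0.4])
print("selected lambda:", lam, "edges:", int(adj.sum()) // 2)
print("KL(true || fitted):", kl_gaussian(Sigma_true, Sigma_hat))
```

Separating exploration (cheap nodewise regressions) from selection (a likelihood computed on the refitted joint model) is what lets the final choice depend on the whole distribution rather than on each node in isolation. In the high-dimension, low-sample-size regime the constrained MLE only exists when the candidate graph is sparse enough relative to n, so the refit can fail for dense candidates; this toy example uses n > p to sidestep that issue.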