Paper title
Robust Bayesian variable selection for gene-environment interactions
Paper authors
Paper abstract
Gene-environment (G$\times$E) interactions have important implications for elucidating the etiology of complex diseases beyond the main genetic and environmental effects. Outliers and data contamination in disease phenotypes are commonly encountered in G$\times$E studies, which has led to the development of a broad spectrum of robust regularization methods. Within the Bayesian framework, however, the issue has not been addressed in existing studies. We develop a fully Bayesian robust variable selection method for G$\times$E interaction studies. The proposed method accommodates heavy-tailed errors and outliers in the response variable while conducting variable selection that accounts for structural sparsity. In particular, for robust sparse group selection, spike-and-slab priors are imposed at both the individual and group levels to identify important main and interaction effects robustly. An efficient Gibbs sampler is developed to facilitate fast computation. Extensive simulation studies, together with analyses of diabetes data with SNP measurements from the Nurses' Health Study and TCGA melanoma data with gene expression measurements, demonstrate the superior performance of the proposed method over multiple competing alternatives.
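As a rough illustration of the structure the abstract describes, a generic G$\times$E model with sparse group spike-and-slab priors might be written as follows. This is a sketch under assumed notation, not the paper's exact specification: the symbols $G_{ij}$, $E_{ik}$, $\alpha_k$, $b_{jl}$, $\phi_j$, $\gamma_{jl}$, $\tau_{jl}^2$, $\pi_0$, $\pi_1$ and $s$ are introduced here for illustration, and the Laplace error is only one common heavy-tailed choice.

$$
y_i = \sum_{k=1}^{q} E_{ik}\,\alpha_k + \sum_{j=1}^{p}\Big( G_{ij}\, b_{j0} + \sum_{k=1}^{q} G_{ij} E_{ik}\, b_{jk} \Big) + \epsilon_i,
\qquad \epsilon_i \sim \mathrm{Laplace}(0, s)
$$

$$
b_{jl} \mid \phi_j, \gamma_{jl} \;\sim\; \phi_j \gamma_{jl}\, \mathrm{N}(0, \tau_{jl}^2) + (1 - \phi_j \gamma_{jl})\, \delta_0,
\qquad \phi_j \sim \mathrm{Bernoulli}(\pi_0), \quad \gamma_{jl} \sim \mathrm{Bernoulli}(\pi_1)
$$

Here $b_{j0}$ is the main effect of the $j$-th genetic factor and $b_{j1}, \dots, b_{jq}$ are its interactions with the $q$ environmental factors, so each genetic factor forms one group. The group-level indicator $\phi_j$ switches the whole group on or off, the individual-level indicator $\gamma_{jl}$ selects effects within an active group, and $\delta_0$ denotes a point mass at zero.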