Paper title
Sparse additive models in high dimensions with wavelets
Paper authors
Paper abstract
In multivariate regression with many covariates, it is often reasonable to assume that only a small number of them carry predictive information. In some medical applications, for instance, it is believed that only a few genes out of thousands are responsible for cancers. In that case, the aim is not only to obtain a good fit but also to select the relevant covariates (genes). We propose to perform model selection with additive models in high dimensions (in both sample size and number of covariates). Our approach is computationally efficient thanks to fast wavelet transforms, does not rely on cross-validation, and solves a convex optimization problem for a prescribed penalty parameter, called the quantile universal threshold. We also propose a second rule, based on Stein unbiased risk estimation, geared towards prediction. We use Monte Carlo simulations and real data to compare various methods in terms of false discovery rate (FDR), true positive rate (TPR), and mean squared error. Our approach is the only one that handles high dimensions, and it has the best FDR-TPR trade-off.
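The selection criteria named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: for one fitted model, the false discovery proportion is the share of selected covariates that are irrelevant, and the true positive rate is the share of relevant covariates that were selected; FDR is then estimated by averaging the proportion over Monte Carlo runs. The function names `selection_metrics` and `average_metrics` are hypothetical, and the paper may aggregate its results differently.

```python
def selection_metrics(selected, relevant):
    """Return (false discovery proportion, true positive rate) for one run.

    selected: indices of covariates chosen by a selection method.
    relevant: indices of covariates that truly carry predictive information.
    """
    selected = set(selected)
    relevant = set(relevant)
    true_pos = len(selected & relevant)   # relevant covariates that were selected
    false_pos = len(selected - relevant)  # irrelevant covariates that were selected
    # Convention (an assumption here): an empty selection makes no false discoveries.
    fdp = false_pos / len(selected) if selected else 0.0
    tpr = true_pos / len(relevant) if relevant else 0.0
    return fdp, tpr


def average_metrics(runs):
    """Average the per-run metrics over Monte Carlo runs.

    runs: non-empty list of (selected, relevant) pairs, one per simulated data set.
    Returns (estimated FDR, average TPR).
    """
    metrics = [selection_metrics(sel, rel) for sel, rel in runs]
    n = len(metrics)
    fdr = sum(fdp for fdp, _ in metrics) / n
    tpr = sum(t for _, t in metrics) / n
    return fdr, tpr
```

For example, if a method selects covariates `[0, 1, 5]` when the relevant ones are `[0, 1, 2, 3]`, one of three selections is false and two of four relevant covariates are recovered, so `selection_metrics` returns a false discovery proportion of 1/3 and a true positive rate of 0.5.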