Paper Title

Component-wise Adaptive Trimming For Robust Mixture Regression

Authors

Wennan Chang, Xinyu Zhou, Yong Zang, Chi Zhang, Sha Cao

Abstract

Parameter estimation for mixture regression models using the expectation-maximization (EM) algorithm is highly sensitive to outliers. Here we propose a fast and efficient robust mixture regression algorithm, called the Component-wise Adaptive Trimming (CAT) method. We consider simultaneous outlier detection and robust parameter estimation to minimize the effect of outlier contamination. Robust mixture regression has many important applications, including human cancer genomics data, where the population often displays strong heterogeneity compounded by unwanted technological perturbations. Existing robust mixture regression methods remain vulnerable to outliers because they either conduct parameter estimation in the presence of outliers or rely on prior knowledge of the level of outlier contamination. CAT is implemented in the framework of classification expectation maximization (CEM), under which a natural definition of outliers can be derived. It applies a least trimmed squares (LTS) approach within each exclusive mixture component, so that the robustness problem is reduced from the mixture case to the simple linear regression case. The high breakdown point of the LTS approach allows us to avoid pre-specifying the trimming parameter. Compared with multiple existing algorithms, CAT is the most competitive: it can handle and adaptively trim outliers as well as heavy-tailed noise across different scenarios of simulated data and real genomic data. CAT has been implemented in the R package 'RobMixReg', available on CRAN.
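To make the idea in the abstract concrete, the following is a minimal Python sketch of component-wise trimming inside a classification-EM loop. It is an illustration under stated assumptions, not the RobMixReg implementation: the function names (`lts_fit`, `trimmed_component_fit`, `cat_sketch`), the approximate LTS solver based on random starts and concentration steps, the MAD-based scale, the 2.5-scale cutoff, and the threshold-based initial labels are all choices made here for illustration. The abstract does not specify any of these details.

```python
import numpy as np


def lts_fit(X, y, rng, n_starts=20, n_csteps=10):
    """Approximate LTS: random elemental starts refined by concentration steps."""
    n, p = X.shape
    h = (n + p + 1) // 2  # coverage that gives the highest breakdown point
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)
        beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        for _ in range(n_csteps):
            # Keep the h points with the smallest squared residuals and refit.
            keep = np.argsort((y - X @ beta) ** 2)[:h]
            beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta


def trimmed_component_fit(X, y, rng, cutoff):
    """LTS fit, then OLS refit on points within `cutoff` robust scales (assumed rule)."""
    beta = lts_fit(X, y, rng)
    res = y - X @ beta
    sigma = 1.4826 * np.median(np.abs(res)) + 1e-12  # MAD-based scale
    inlier = np.abs(res) <= cutoff * sigma
    beta = np.linalg.lstsq(X[inlier], y[inlier], rcond=None)[0]
    sigma = np.std(y[inlier] - X[inlier] @ beta) + 1e-12  # scale of retained points
    return beta, sigma


def cat_sketch(X, y, k, init_labels, n_iter=50, cutoff=2.5, seed=0):
    """Classification EM with a trimmed regression fit inside each component."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    labels = np.asarray(init_labels).copy()
    betas, sigmas, props = np.zeros((k, p)), np.ones(k), np.full(k, 1.0 / k)
    for _ in range(n_iter):
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size < 2 * p + 2:
                continue  # too few points: keep the previous estimate
            betas[j], sigmas[j] = trimmed_component_fit(
                X[members], y[members], rng, cutoff)
            props[j] = members.size / n
        # Classification step: hard-assign each point to its most likely component.
        z = (y[:, None] - X @ betas.T) / sigmas
        log_dens = np.log(props) - np.log(sigmas) - 0.5 * z ** 2
        new_labels = log_dens.argmax(axis=1)
        converged = np.array_equal(new_labels, labels)
        labels = new_labels
        if converged:
            break
    # A point is flagged when it is far from the component it belongs to.
    outlier = np.abs(z[np.arange(n), labels]) > cutoff
    return betas, sigmas, labels, outlier


# Synthetic demo: two regression lines plus ten gross outliers at y = 30.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
truth = np.repeat([0, 1], 100)
y = np.where(truth == 0, 2 * x, 10 - 2 * x) + rng.normal(0, 0.1, 200)
x = np.append(x, rng.uniform(0, 1, 10))
y = np.append(y, np.full(10, 30.0))
X = np.column_stack([np.ones_like(x), x])
init = (y > 5).astype(int)  # crude starting partition, adequate for this toy data
betas, sigmas, labels, outlier = cat_sketch(X, y, k=2, init_labels=init)
print(betas)
print(int(outlier.sum()), "points flagged as outliers")
```

Because each point is hard-assigned to exactly one component, the robust fit inside a component is an ordinary single-regression problem, which is the reduction the abstract describes. Note that no contamination level is passed in: the number of flagged points follows from the residual cutoff. The starting partition matters in practice, and a real analysis would likely try several initializations.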
