Paper title
Universal Robust Regression via Maximum Mean Discrepancy
Paper authors
Paper abstract
Many modern datasets are collected automatically and are thus easily contaminated by outliers. This has led to renewed interest in robust estimation, including new notions of robustness such as robustness to adversarial contamination of the data. However, most robust estimation methods are designed for a specific model. Notably, many methods were proposed recently to obtain robust estimators in linear models (or generalized linear models), and a few were developed for very specific settings, for example beta regression or sample selection models. In this paper we develop a new approach for robust estimation in arbitrary regression models, based on Maximum Mean Discrepancy minimization. We build two estimators, both of which are proven to be robust to Huber-type contamination. We obtain a non-asymptotic error bound for one of them and show that it is also robust to adversarial contamination, but this estimator is computationally more expensive to use in practice than the other one. As a by-product of our theoretical analysis of the proposed estimators, we derive new results on kernel conditional mean embedding of distributions, which are of independent interest.
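To give a feel for why minimizing a Maximum Mean Discrepancy can resist outliers, here is a toy sketch. It is not the paper's implementation and does not reproduce either of its two estimators exactly. It assumes a linear model through the origin, Gaussian noise with a known standard deviation, a Gaussian kernel on the response only, and a plain grid search; the names `mmd_criterion`, `fit_slope_mmd`, `noise_sd` and `bandwidth` are hypothetical. For each observation it computes the squared MMD between the model's conditional distribution N(theta * x_i, noise_sd^2) and the point mass at y_i, using the closed-form Gaussian kernel expectations, and averages over the sample.

```python
import numpy as np


def mmd_criterion(theta, x, y, noise_sd=0.5, bandwidth=1.0):
    """Average squared MMD between N(theta * x_i, noise_sd^2) and the point mass at y_i.

    Kernel (an assumption of this sketch): k(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2)).
    """
    g2, s2 = bandwidth ** 2, noise_sd ** 2
    resid = y - theta * x
    # E k(Y, Y') for independent Y, Y' ~ N(m, s2); constant in theta.
    model_term = np.sqrt(g2 / (g2 + 2.0 * s2))
    # E k(Y, y_i) for Y ~ N(theta * x_i, s2); decays quickly for large residuals,
    # so gross outliers contribute almost nothing to the criterion.
    cross_term = np.sqrt(g2 / (g2 + s2)) * np.exp(-resid ** 2 / (2.0 * (g2 + s2)))
    # k(y_i, y_i) = 1 for the Gaussian kernel.
    return np.mean(model_term - 2.0 * cross_term + 1.0)


def fit_slope_mmd(x, y, grid, **kwargs):
    """Grid search for the slope minimizing the averaged MMD criterion."""
    values = np.array([mmd_criterion(t, x, y, **kwargs) for t in grid])
    return grid[np.argmin(values)]


# Synthetic data: true slope 2, then 10% of the points replaced by gross outliers.
rng = np.random.default_rng(0)
n, true_slope = 200, 2.0
x = rng.uniform(-1.0, 1.0, size=n)
y = true_slope * x + rng.normal(0.0, 0.5, size=n)
x[:20] = 1.0
y[:20] = -20.0

# Ordinary least squares through the origin, for comparison.
theta_ols = np.sum(x * y) / np.sum(x * x)
theta_mmd = fit_slope_mmd(x, y, np.linspace(-5.0, 5.0, 1001))

print("OLS slope:", theta_ols)
print("MMD slope:", theta_mmd)
```

In this setup the least-squares slope is pulled far from the true value by the contaminated points, while the MMD-based slope stays near it, because the kernel term bounds the influence of any single observation. The bandwidth and the known noise level are tuning choices made for the illustration only.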