Paper Title
Automating Outlier Detection via Meta-Learning
Paper Authors
Paper Abstract
Given an unsupervised outlier detection (OD) task on a new dataset, how can we automatically select a good outlier detection method and its hyperparameter(s) (collectively called a model)? Thus far, model selection for OD has been a "black art", as any model evaluation is infeasible due to the lack of (i) hold-out data with labels, and (ii) a universal objective function. In this work, we develop the first principled data-driven approach to model selection for OD, called MetaOD, based on meta-learning. MetaOD capitalizes on the past performance of a large body of detection models on existing outlier detection benchmark datasets, and carries over this prior experience to automatically select an effective model to be employed on a new dataset without using any labels. To capture task similarity, we introduce specialized meta-features that quantify the outlying characteristics of a dataset. Through comprehensive experiments, we show the effectiveness of MetaOD in selecting a detection model that significantly outperforms the most popular outlier detectors (e.g., LOF and iForest) as well as various state-of-the-art unsupervised meta-learners, while being extremely fast. To foster reproducibility and further research on this new problem, we open-source our entire meta-learning system, benchmark environment, and testbed datasets.
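The abstract describes the general meta-learning recipe: embed each dataset with meta-features, measure task similarity in that space, and transfer past model performance from similar benchmark tasks. The following is a minimal illustrative sketch of that idea (not MetaOD's actual algorithm, which uses specialized outlier meta-features and collaborative-filtering-style performance estimation); all function and variable names here are hypothetical:

```python
import numpy as np

def select_model(meta_train, perf, meta_new):
    """Hypothetical nearest-neighbor sketch of meta-learned model selection.

    meta_train : (n_datasets, n_metafeatures) meta-features of benchmark datasets
    perf       : (n_datasets, n_models) past performance (e.g., average precision)
                 of each candidate model on each benchmark dataset
    meta_new   : (n_metafeatures,) meta-features of the new, unlabeled dataset

    Returns the index of the recommended model: the best past performer
    on the most similar benchmark task.
    """
    # Task similarity via Euclidean distance in meta-feature space
    dists = np.linalg.norm(meta_train - meta_new, axis=1)
    nearest = int(np.argmin(dists))
    # Transfer prior experience: reuse the winner on the nearest task
    return int(np.argmax(perf[nearest]))

# Toy example with 3 benchmark datasets and 3 candidate models
meta_train = np.array([[0.1, 2.0],
                       [0.9, 0.5],
                       [0.5, 1.0]])
perf = np.array([[0.7, 0.2, 0.5],
                 [0.3, 0.8, 0.4],
                 [0.6, 0.6, 0.9]])
meta_new = np.array([0.85, 0.6])  # closest to benchmark row 1

print(select_model(meta_train, perf, meta_new))  # → 1 (best model on row 1)
```

Note that no labels from the new dataset are needed at selection time: all supervision comes from the benchmark performance matrix, which is exactly what makes this applicable to unsupervised OD.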