Paper title

Categorical anomaly detection in heterogeneous data using minimum description length clustering

Authors

James Cheney, Xavier Gombau, Ghita Berrada, Sidahmed Benabderrahmane

Abstract

Fast and effective unsupervised anomaly detection algorithms have been proposed for categorical data based on the minimum description length (MDL) principle. However, they can be ineffective when detecting anomalies in heterogeneous datasets representing a mixture of different sources, such as security scenarios in which system and user processes have distinct behavior patterns. We propose a meta-algorithm for enhancing any MDL-based anomaly detection model to deal with heterogeneous data by fitting a mixture model to the data, via a variant of k-means clustering. Our experimental results show that using a discrete mixture model provides competitive performance relative to two previous anomaly detection algorithms, while mixtures of more sophisticated models yield further gains, on both synthetic datasets and realistic datasets from a security scenario.
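The abstract does not spell out the meta-algorithm, so the following is only a minimal sketch of the general idea of fitting a mixture by a k-means-style loop that uses code length in place of distance. It rests on several assumptions that are not taken from the paper: each cluster model treats attributes as independent categorical distributions with add-one smoothing, the cost of encoding the cluster index is ignored, initial models are fitted to k randomly chosen distinct rows, and the anomaly score is the shortest code length over all cluster models. The names `fit_model`, `code_length`, `mdl_kmeans`, and `anomaly_score` are hypothetical.

```python
import math
import random
from collections import Counter


def fit_model(rows, domains):
    """Fit one categorical distribution per attribute, with add-one smoothing."""
    n = len(rows)
    model = []
    for j, dom in enumerate(domains):
        counts = Counter(r[j] for r in rows)
        model.append({v: (counts[v] + 1) / (n + len(dom)) for v in dom})
    return model


def code_length(row, model):
    """Bits needed to encode a row under a model: -sum of log2 probabilities."""
    return -sum(math.log2(model[j][v]) for j, v in enumerate(row))


def mdl_kmeans(rows, k, iters=20, seed=0):
    """k-means-style loop: assign each row to the model that compresses it best.

    Assumes rows are tuples and that there are at least k distinct rows.
    """
    rng = random.Random(seed)
    domains = [sorted({r[j] for r in rows}) for j in range(len(rows[0]))]
    # Assumption: initialise each cluster model from a single distinct seed row.
    seeds = rng.sample(sorted(set(rows)), k)
    models = [fit_model([s], domains) for s in seeds]
    assign = None
    for _ in range(iters):
        new_assign = [
            min(range(k), key=lambda c: code_length(r, models[c])) for r in rows
        ]
        if new_assign == assign:
            break  # assignments are stable, so the models are too
        assign = new_assign
        models = [
            fit_model([r for r, a in zip(rows, assign) if a == c], domains)
            for c in range(k)
        ]
    return models, assign


def anomaly_score(row, models):
    """Assumed score: code length under the best-fitting cluster model.

    Only handles attribute values that appeared in the training rows.
    """
    return min(code_length(row, m) for m in models)


if __name__ == "__main__":
    # Toy data: two sources with distinct behavior patterns.
    system_rows = [("sys", "daemon", "tcp")] * 20
    user_rows = [("usr", "shell", "none")] * 20
    models, assign = mdl_kmeans(system_rows + user_rows, k=2)
    for row in [("sys", "daemon", "tcp"), ("sys", "shell", "tcp")]:
        print(row, round(anomaly_score(row, models), 2))
```

In this toy setting a row that mixes values from both sources needs more bits under either cluster model than a row typical of one source, which is the intuition behind scoring anomalies by compressed size. A single model fitted to the pooled data would assign every value a probability near one half and could not make that distinction as sharply.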
