Paper title
Evaluating resampling methods on a real-life highly imbalanced online credit card payments dataset
Paper authors
Paper abstract
Many of the difficulties in machine-learning-based credit card fraud detection stem from the class imbalance of transaction datasets. The number of frauds is tiny compared to the number of regular transactions, and this imbalance has been shown to degrade learning performance: at worst, the algorithm learns to classify every transaction as regular. Resampling methods and cost-sensitive approaches are known to be good candidates for addressing this imbalance. This paper evaluates numerous state-of-the-art resampling methods on a large real-life online credit card payments dataset. We show that they are inefficient, either because the methods are intractable or because the metrics do not exhibit substantial improvements. Our work contributes to this domain in two ways: (1) we compare many state-of-the-art resampling methods on a large-scale dataset, and (2) we use a real-life online credit card payments dataset.
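To make the imbalance problem concrete, the following minimal sketch shows why a classifier that labels every transaction as regular looks accurate while catching no fraud, and what the simplest resampling method (random undersampling) does to the class counts. The abstract does not name the resampling methods that were evaluated, so random undersampling is chosen here purely for illustration. The toy data (1000 transactions, 5 frauds) and the helper names `random_undersample`, `accuracy`, and `recall` are assumptions, not taken from the paper.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row and an equally sized random subset of the majority.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true frauds (label 1) that were predicted as fraud."""
    positives = sum(1 for t in y_true if t == 1)
    if positives == 0:
        return 0.0
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives


# Hypothetical toy data: 5 frauds (label 1) among 1000 transactions.
y = [1] * 5 + [0] * 995
X = [[float(i)] for i in range(len(y))]

# The degenerate classifier from the abstract: everything is "regular".
all_regular = [0] * len(y)
print("accuracy:", accuracy(y, all_regular))
print("recall:  ", recall(y, all_regular))

# After undersampling, the two classes are the same size.
X_res, y_res = random_undersample(X, y)
print("resampled size:", len(y_res), "frauds:", sum(y_res))
```

On this toy data the degenerate classifier is right on 995 of 1000 transactions yet detects none of the frauds, which is why accuracy alone is a misleading metric here. Undersampling balances the classes but discards most of the regular transactions, which is one reason a resampling method may fail to improve the metrics in practice.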