Paper title
Unsupervised anomaly detection for discrete sequence healthcare data
Paper authors
Abstract
Fraud in healthcare is widespread, as doctors can prescribe unnecessary treatments to increase bills. Insurance companies want to detect these anomalous, fraudulent bills and reduce their losses. Traditional fraud detection methods rely on expert rules and manual data processing. More recently, machine learning techniques have automated this process, but hand-labeled data is extremely costly and usually out of date. We propose a machine learning model that automates fraud detection in an unsupervised way. We consider two deep learning approaches: an LSTM neural network that predicts the next patient visit, and a seq2seq model. To normalize the produced anomaly scores, we propose an Empirical Distribution Function (EDF) approach, which lets the algorithm cope with high class imbalance. We validate the models on real data from Allianz consisting of sequences of patients' visits. The models provide state-of-the-art results for unsupervised anomaly detection applied to fraud detection in healthcare. Our EDF approach further improves the quality of the LSTM model.
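The abstract does not spell out how the EDF normalization is computed. One plausible reading, sketched here purely as an assumption, is that each raw anomaly score is replaced by the fraction of reference scores that do not exceed it, so that scores end up on a common [0, 1] scale. The names `fit_edf`, `reference` and `normalized` are hypothetical and the numbers are made up for illustration; this is not the authors' implementation.

```python
import numpy as np


def fit_edf(reference_scores):
    """Build an empirical distribution function from reference anomaly scores.

    Hypothetical helper: the returned function maps a raw score to the
    fraction of reference scores that are less than or equal to it.
    """
    sorted_ref = np.sort(np.asarray(reference_scores, dtype=float))
    n = sorted_ref.size

    def edf(scores):
        scores = np.asarray(scores, dtype=float)
        # side="right" counts reference values <= each score
        ranks = np.searchsorted(sorted_ref, scores, side="right")
        return ranks / n

    return edf


# Made-up raw scores, e.g. prediction errors of a sequence model
reference = [0.2, 0.5, 0.1, 0.9, 0.4]
edf = fit_edf(reference)

# Normalized scores lie in [0, 1]; values near 1 are the most unusual
normalized = edf([0.05, 0.4, 2.0])
print(normalized)
```

Under this reading, a normalized score close to 1 means the raw score exceeds almost every reference score, which makes thresholds comparable across groups whose raw score scales differ.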