A two-steps approach to improve the performance of Android malware detectors

Paper title

A two-steps approach to improve the performance of Android malware detectors

Authors

Nadia Daoudi, Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein

Abstract

The popularity of Android OS has made it an appealing target for malware developers. To evade detection, including by ML-based techniques, attackers invest in creating malware that closely resembles legitimate apps. In this paper, we propose GUIDED RETRAINING, a supervised representation learning-based method that boosts the performance of a malware detector. First, the dataset is split into "easy" and "difficult" samples, where difficulty is associated with the prediction probabilities yielded by a malware detector: for difficult samples, the probabilities indicate that the classifier is not confident in its predictions, which have high error rates. Then, we apply our GUIDED RETRAINING method to the difficult samples to improve their classification. For the subset of "easy" samples, the base malware detector makes the final predictions, since the error rate on that subset is low by construction. For the subset of "difficult" samples, we rely on GUIDED RETRAINING, which leverages both the correct predictions and the errors made by the base malware detector to guide the retraining process. GUIDED RETRAINING focuses on the difficult samples: it learns new embeddings of these samples using Supervised Contrastive Learning and trains an auxiliary classifier for the final predictions. We validate our method on four state-of-the-art Android malware detection approaches using over 265k malware and benign apps, and we demonstrate that GUIDED RETRAINING can reduce the prediction errors made by the malware detectors by up to 40.41%. Our method is generic and designed to enhance classification performance on a binary classification task. Consequently, it can be applied to other classification problems beyond Android malware detection.
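
The two-step routing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names and the thresholds 0.3 and 0.7 are hypothetical, since the abstract gives no concrete values, and the auxiliary probabilities are assumed to come from a classifier trained on contrastively learned embeddings, which is not shown here.

```python
def is_difficult(p_malware, low=0.3, high=0.7):
    """Flag a sample as 'difficult' when the base detector is unsure.

    low and high are hypothetical thresholds chosen for illustration.
    """
    return low < p_malware < high


def two_step_predict(base_probs, aux_probs, low=0.3, high=0.7):
    """Route each sample to the base detector or the auxiliary classifier.

    base_probs: malware probabilities from the base detector.
    aux_probs:  malware probabilities from the auxiliary classifier
                (assumed to be trained on the difficult samples only).
    Returns the final labels (1 = malware, 0 = benign) and the difficulty flags.
    """
    labels, difficult_flags = [], []
    for p_base, p_aux in zip(base_probs, aux_probs):
        difficult = is_difficult(p_base, low, high)
        # Easy samples keep the base prediction; difficult ones use the auxiliary one.
        p_final = p_aux if difficult else p_base
        labels.append(1 if p_final >= 0.5 else 0)
        difficult_flags.append(difficult)
    return labels, difficult_flags


# Toy probabilities, invented for illustration only.
base_probs = [0.02, 0.95, 0.55, 0.40]
aux_probs = [0.50, 0.50, 0.10, 0.90]
labels, difficult_flags = two_step_predict(base_probs, aux_probs)
```

In this toy run the first two samples are treated as easy and keep the base detector's decision, while the last two fall inside the uncertainty band and take the auxiliary classifier's decision instead.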
