Paper Title
VulBERTa: Simplified Source Code Pre-Training for Vulnerability Detection
Authors
Abstract
This paper presents VulBERTa, a deep learning approach to detect security vulnerabilities in source code. Our approach pre-trains a RoBERTa model with a custom tokenisation pipeline on real-world code from open-source C/C++ projects. The model learns a deep knowledge representation of code syntax and semantics, which we leverage to train vulnerability detection classifiers. We evaluate our approach on binary and multi-class vulnerability detection tasks across several datasets (VulDeePecker, Draper, ReVeal and µVulDeePecker) and benchmarks (CodeXGLUE and D2A). The evaluation results show that VulBERTa achieves state-of-the-art performance and outperforms existing approaches across different datasets, despite its conceptual simplicity and its limited cost in terms of training-data size and number of model parameters.
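The pipeline the abstract describes, a RoBERTa-style encoder topped with a classification head over tokenised functions, can be sketched with the Hugging Face `transformers` library. This is a minimal illustrative sketch, not the authors' implementation: the vocabulary size, model dimensions, and dummy inputs below are assumptions chosen only to make the example small and runnable.

```python
# Sketch (not the authors' code): a RoBERTa-style encoder with a
# classification head for binary vulnerability detection, as described
# in the abstract. All sizes here are illustrative assumptions; the
# real VulBERTa model is pre-trained on C/C++ code with a custom
# tokenisation pipeline before the classifier is trained.
import torch
from transformers import RobertaConfig, RobertaForSequenceClassification

config = RobertaConfig(
    vocab_size=50_000,      # assumed size of a custom code-token vocabulary
    hidden_size=64,         # tiny dimensions for a quick demo
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    num_labels=2,           # binary task: vulnerable vs. not vulnerable
)
model = RobertaForSequenceClassification(config)  # randomly initialised

# Dummy batch of token IDs standing in for four tokenised C/C++ functions.
input_ids = torch.randint(0, config.vocab_size, (4, 32))
logits = model(input_ids=input_ids).logits
print(logits.shape)  # one score per class for each function
```

In practice the encoder weights would come from the pre-training stage rather than random initialisation, and the classifier would be fine-tuned on a labelled vulnerability dataset; the multi-class variants in the evaluation simply use a larger `num_labels`.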