Title
An Empirical Study on Benchmarks of Artificial Software Vulnerabilities
Authors
Abstract
Recently, various techniques (e.g., fuzzing) have been developed for vulnerability detection. To evaluate these techniques, the community has been developing benchmarks of artificial vulnerabilities, owing to the lack of ground truth for real-world bugs. However, there are concerns that such artificial vulnerabilities do not represent reality and may lead to unreliable and misleading evaluation results. Unfortunately, research addressing these concerns is lacking. In this work, to understand how closely these benchmarks mirror reality, we perform an empirical study on three artificial vulnerability benchmarks, LAVA-M, Rode0day, and CGC (2,669 bugs in total), and a variety of real-world memory-corruption vulnerabilities (80 CVEs). Furthermore, we propose a model to depict the properties of memory-corruption vulnerabilities. Following this model, we conduct intensive experiments and data analyses. Our results reveal that while artificial benchmarks attempt to approach the real world, they still differ significantly from reality. Based on these findings, we propose a set of strategies to improve the quality of artificial benchmarks.