论文标题
BCFA:CFA规模的定制控制流量分析
BCFA: Bespoke Control Flow Analysis for CFA at Scale
论文作者
论文摘要
许多数据驱动的软件工程任务,例如发现编程模式,采矿API规范等,对控制流程图(CFG)进行量表分析。分析数百万个CFG可能是昂贵的,并且分析的性能在很大程度上取决于基本的CFG遍历策略。最先进的分析框架使用固定的遍历策略。我们认为,单个遍历策略不符合各种分析和CFG,并提出了定制控制流分析(BCFA)。鉴于对照流量分析(CFA)和大量CFG,BCFA为每个CFG选择了最有效的遍历策略。 BCFA通过分析CFA的代码来提取CFA的一组属性,并将其与CFG的属性相结合,例如分支因子和循环性,以选择最佳的遍历策略。我们已经在BOA中实施了BCFA,并使用一组代表性静态分析评估了BCFA,这些分析主要涉及遍历CFG和两个包含2.87亿和1.62亿CFG的大数据集。我们的结果表明,BCFA可以加速大规模分析1%-28%。此外,BCFA的开销低;小于0.2%,较低的错误预测率;小于0.01%。
Many data-driven software engineering tasks such as discovering programming patterns, mining API specifications, etc., perform source code analysis over control flow graphs (CFGs) at scale. Analyzing millions of CFGs can be expensive and performance of the analysis heavily depends on the underlying CFG traversal strategy. State-of-the-art analysis frameworks use a fixed traversal strategy. We argue that a single traversal strategy does not fit all kinds of analyses and CFGs and propose bespoke control flow analysis (BCFA). Given a control flow analysis (CFA) and a large number of CFGs, BCFA selects the most efficient traversal strategy for each CFG. BCFA extracts a set of properties of the CFA by analyzing the code of the CFA and combines it with properties of the CFG, such as branching factor and cyclicity, for selecting the optimal traversal strategy. We have implemented BCFA in Boa, and evaluated BCFA using a set of representative static analyses that mainly involve traversing CFGs and two large datasets containing 287 thousand and 162 million CFGs. Our results show that BCFA can speedup the large scale analyses by 1%-28%. Further, BCFA has low overheads; less than 0.2%, and low misprediction rate; less than 0.01%.