论文标题
与串联和换能器有效求解的字符串约束(技术报告)
String Constraints with Concatenation and Transducers Solved Efficiently (Technical Report)
论文作者
论文摘要
字符串分析是关于如何通过程序操纵字符串的推理问题。它具有许多应用程序,包括自动检测跨站点脚本(XSS)。流行的字符串分析技术包括符号执行,在其核心使用字符串(约束)求解器上。这样的求解器通常是关于在与串联操作员作为原子约束的串联理论中表达的约束的原因。近年来,研究人员开始认识到将所有替换的操作员和有限转换纳入与串联理论中的重要性。此类字符串操作通常对于Web应用程序中的XSS漏洞的推理至关重要,尤其是对于建模固定函数和隐式浏览器转换(例如InnerHTML)至关重要。 在本文中,我们提供了第一个可以理解涉及串联和有限转移的约束的弦求解器。此外,它具有几个重要片段(例如直线片段)的完整性和终止保证。本文中提出的主要挑战是该理论的最坏情况的复杂性。为此,我们提出了一种利用简洁的有限自动机作为字符串约束的简洁符号表示的方法。交替不仅可以在表示换能器的布尔组合时在太空中节省,而且还可以简要地表示传感器和串联组合。关于AFA语言空虚的推理需要在指数尺寸的图中进行状态空间探索,为此我们使用模型检查算法(例如IC3)。我们已经实施了算法,并证明了其在文献中XSS和其他示例中得出的基准上的功效。
String analysis is the problem of reasoning about how strings are manipulated by a program. It has numerous applications including automatic detection of cross-site scripting (XSS). A popular string analysis technique includes symbolic executions, which at their core use string (constraint) solvers. Such solvers typically reason about constraints expressed in theories over strings with the concatenation operator as an atomic constraint. In recent years, researchers started to recognise the importance of incorporating the replace-all operator and finite transductions in the theories of strings with concatenation. Such string operations are typically crucial for reasoning about XSS vulnerabilities in web applications, especially for modelling sanitisation functions and implicit browser transductions (e.g. innerHTML). In this paper, we provide the first string solver that can reason about constraints involving both concatenation and finite transductions. Moreover, it has a completeness and termination guarantee for several important fragments (e.g. straight-line fragment). The main challenge addressed in the paper is the prohibitive worst-case complexity of the theory. To this end, we propose a method that exploits succinct alternating finite automata as concise symbolic representations of string constraints. Alternation offers not only exponential savings in space when representing Boolean combinations of transducers, but also a possibility of succinct representation of otherwise costly combinations of transducers and concatenation. Reasoning about the emptiness of the AFA language requires a state-space exploration in an exponential-sized graph, for which we use model checking algorithms (e.g. IC3). We have implemented our algorithm and demonstrated its efficacy on benchmarks that are derived from XSS and other examples in the literature.