论文标题
字符串到图形近似匹配的复杂性问题
Complexity Issues of String to Graph Approximate Matching
论文作者
论文摘要
将查询字符串与有向图匹配的问题(其顶点都由字符串标记)在不同字段中的应用,从数据挖掘到计算生物学。已经考虑了问题的几种变体,具体取决于匹配是精确或近似的事实,在后一种情况下,考虑了编辑操作以及允许的何处。在本文中,我们介绍了近似匹配问题的复杂性的结果,其中编辑操作是符号替换,仅在图形标签上或图形标签和查询字符串上允许。我们介绍了一个问题的变体,该变体询问图表中是否存在一个代表带有任何数量编辑操作的查询字符串的路径,并且我们显示的是NP complete,即使标签具有长度为一,字母表为二进制。此外,当它通过输入字符串的长度和图形标签的长度进行参数化时,我们表明该问题是固定参数可拖动的,并且不太可能接受多项式内核。当仅在图形标签上允许编辑操作时,此问题的NP完整性会导致近似匹配的任何因素(在任何因素内)。此外,我们表明,当参数是编辑操作的数量时,即使对于与DAG距离的距离,我们所考虑的近似字符串匹配与图形的变体不是固定参数可进行的。后一个结果的降低使我们能够证明该变体的不合适性,其中可以在查询字符串和图形标签上同时应用编辑操作。
The problem of matching a query string to a directed graph, whose vertices are labeled by strings, has application in different fields, from data mining to computational biology. Several variants of the problem have been considered, depending on the fact that the match is exact or approximate and, in this latter case, which edit operations are considered and where are allowed. In this paper we present results on the complexity of the approximate matching problem, where edit operations are symbol substitutions and are allowed only on the graph labels or both on the graph labels and the query string. We introduce a variant of the problem that asks whether there exists a path in a graph that represents a query string with any number of edit operations and we show that is is NP-complete, even when labels have length one and in the case the alphabet is binary. Moreover, when it is parameterized by the length of the input string and graph labels have length one, we show that the problem is fixed-parameter tractable and it is unlikely to admit a polynomial kernel. The NP-completeness of this problem leads to the inapproximability (within any factor) of the approximate matching when edit operations are allowed only on the graph labels. Moreover, we show that the variants of approximate string matching to graph we consider are not fixed-parameter tractable, when the parameter is the number of edit operations, even for graphs that have distance one from a DAG. The reduction for this latter result allows us to prove the inapproximability of the variant where edit operations can be applied both on the query string and on graph labels.