A Data-Structure for Approximate Longest Common Subsequence of A Set of Strings

Paper title

A Data-Structure for Approximate Longest Common Subsequence of A Set of Strings

Author

Aghamolaei, Sepideh

Abstract

Given a set $I$ of $k$ strings, their longest common subsequence (LCS) is the longest string that is a subsequence of every string in $I$. A data-structure for this problem preprocesses $I$ so that the LCS of a set of query strings $Q$ with the strings of $I$ can be computed faster. Since the problem is NP-hard for arbitrary $k$, we allow an error in which some characters may be replaced by other characters. We define the approximation version of the problem with an extra input $m$, the length of the regular expression (regex) that describes the input. The approximation factor is the logarithm of the number of possibilities in the regex returned by the algorithm, divided by the logarithm of the number of possibilities in the regex with the minimum number of possibilities. We then use a tree data-structure to achieve sublinear-time LCS queries. We also explain how the idea can be extended to the longest increasing subsequence (LIS) problem.
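
To make the LCS definition above concrete, the following is a minimal Python sketch of an exact, brute-force LCS over a set of strings. It is not the tree data-structure or the approximation scheme of the paper, which the abstract does not specify in detail; the function names `is_subsequence` and `brute_force_lcs` are hypothetical and chosen only for illustration. The running time is exponential in the length of the shortest string, which is consistent with the hardness noted in the abstract and suggests why an approximate, preprocessed structure is of interest.

```python
from itertools import combinations


def is_subsequence(candidate: str, text: str) -> bool:
    """Return True if `candidate` can be obtained from `text` by deleting characters."""
    remaining = iter(text)
    # `ch in remaining` advances the iterator, so matches must appear in order.
    return all(ch in remaining for ch in candidate)


def brute_force_lcs(strings: list[str]) -> str:
    """Return one longest common subsequence of all strings (illustrative only)."""
    if not strings:
        return ""
    shortest = min(strings, key=len)
    # Try candidate lengths from longest to shortest; the first hit is an LCS.
    for length in range(len(shortest), 0, -1):
        for positions in combinations(range(len(shortest)), length):
            candidate = "".join(shortest[i] for i in positions)
            if all(is_subsequence(candidate, s) for s in strings):
                return candidate
    return ""


if __name__ == "__main__":
    print(brute_force_lcs(["abcde", "ace", "aXcYe"]))
```

Every subsequence of the shortest string is a candidate, so checking those candidates against all other strings is sufficient to find an exact answer for small inputs.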
