Paper title
Chaining with overlaps revisited
Paper authors
Paper abstract
Chaining algorithms aim to form a semi-global alignment of two sequences from a set of anchoring local alignments given as input. Depending on the optimization criteria and the exact definition of a chain, several $O(n \log n)$ time algorithms solve this problem optimally, where $n$ is the number of input anchors. In this paper, we focus on a formulation that allows the anchors to overlap in a chain. This formulation was studied by Shibuya and Kurochkin (WABI 2003), but their algorithm comes with no proof of correctness. We revisit and modify their algorithm to consider a strict definition of the precedence relation on anchors, and we add the derivation required to establish the correctness of the resulting algorithm, which runs in $O(n \log^2 n)$ time on anchors formed by exact matches. With the more relaxed definition of the precedence relation considered by Shibuya and Kurochkin, or when anchors are non-nested, such as matches of uniform length ($k$-mers), the algorithm takes $O(n \log n)$ time. We also establish a connection between chaining with overlaps and the widely studied longest common subsequence (LCS) problem.
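
To make the formulation concrete, the following is a minimal quadratic-time sketch of chaining with overlaps on exact-match anchors. It is not the $O(n \log^2 n)$ algorithm summarized above, only a baseline dynamic program. The anchor representation `(x, y, length)`, the function names, the exact form of the strict precedence test, and the scoring rule (total anchor length minus, for each consecutive pair, the larger of the two overlaps) are assumptions made for illustration and may differ from the definitions used in the paper.

```python
from typing import List, Tuple

# Assumed representation: an anchor (x, y, length) is an exact match,
# i.e. A[x:x+length] == B[y:y+length] for the two input sequences A and B.
Anchor = Tuple[int, int, int]


def precedes(p: Anchor, q: Anchor) -> bool:
    """Strict precedence (assumed): q starts and ends after p in both sequences."""
    px, py, pl = p
    qx, qy, ql = q
    return px < qx and py < qy and px + pl < qx + ql and py + pl < qy + ql


def overlap(p: Anchor, q: Anchor) -> int:
    """Larger of the overlaps of p and q in the two sequences (0 if disjoint)."""
    px, py, pl = p
    qx, qy, ql = q
    return max(0, px + pl - qx, py + pl - qy)


def chain_score(anchors: List[Anchor]) -> int:
    """Best chain score by an O(n^2) dynamic program over sorted anchors."""
    # If p precedes q then p starts earlier in A, so p sorts before q.
    order = sorted(anchors)
    best: List[int] = []  # best[j] = best score of a chain ending at order[j]
    for j, q in enumerate(order):
        score = q[2]  # chain consisting of q alone
        for i in range(j):
            p = order[i]
            if precedes(p, q):
                # Count only the part of q not already covered by p.
                score = max(score, best[i] + q[2] - overlap(p, q))
        best.append(score)
    return max(best, default=0)
```

Under these assumptions, two anchors of length 5 that overlap by 2 positions in both sequences contribute 8 to the score, and a nested anchor is never chained with the anchor that contains it, because the strict precedence test rejects the pair.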