计算内部字典匹配中的不同模式

论文标题

计算内部字典匹配中的不同模式

Counting Distinct Patterns in Internal Dictionary Matching

论文作者

Charalampopoulos, Panagiotis, Kociumaka, Tomasz, Mohamed, Manal, Radoszewski, Jakub, Rytter, Wojciech, Straszyński, Juliusz, Waleń, Tomasz, Zuba, Wiktor

论文摘要

我们考虑预处理$ n $的文本$ t $的问题和字典$ \ nathcal {d} $，以便能够有效地回答查询$ countdistinct（i，j）$，即给定$ i $和$ j $返回$ \ \ m nathcal {d} $的模式数量的问题。 J] $。字典是内部的，从某种意义上说，$ \ Mathcal {d} $中的每个模式都是$ t $的片段。这样，字典将空间与模式数量成正比$ d = | \ mathcal {d} | $而不是其总长度，这可能是$θ（n \ cdot d）$。 $ \ tilde {\ Mathcal {o}}（n+d）$ - 大小数据结构，该大小数据结构（I，i，j）$ QUERIES $ \ MATHCAL {o}（\ log n）$ - 大约在$ \ \ tilde {\ tilde {\ tilde {\ mathcal {o}}（o）$最近介绍了piNcess in Indist at in Indifal pornifation。在这里，我们提出了一个$ \ tilde {\ Mathcal {o}}（n+d）$ - 大小数据结构，该数据结构回答$ countdistinct（i，j）$ queries $ 2 $ 2 $ - 以$ \ tilde {\ Mathcal {\ Mathcal {o}}}（1）$时间。使用范围查询，对于任何$ m $，我们给出了$ \ tilde {\ mathcal {o}}}（\ min（nd/m，n^2/m^2）+d）$ - 大小数据结构，该数据结构回答$ countdistinct（i，j）$ QUERIES $ cOUST $ \ tilde in $ \ tilde {\ tilde {\ nathcal {\ nathcal {o）当字典由字符串的所有平方因子组成时，我们还考虑了特殊情况。我们设计了一个$ \ MATHCAL {O}（n \ log^2 n）$ - 大小数据结构，该结构使我们能够在文本片段中计数不同的正方形$ t [i \ mathinner {。

We consider the problem of preprocessing a text $T$ of length $n$ and a dictionary $\mathcal{D}$ in order to be able to efficiently answer queries $CountDistinct(i,j)$, that is, given $i$ and $j$ return the number of patterns from $\mathcal{D}$ that occur in the fragment $T[i \mathinner{.\,.} j]$. The dictionary is internal in the sense that each pattern in $\mathcal{D}$ is given as a fragment of $T$. This way, the dictionary takes space proportional to the number of patterns $d=|\mathcal{D}|$ rather than their total length, which could be $Θ(n\cdot d)$. An $\tilde{\mathcal{O}}(n+d)$-size data structure that answers $CountDistinct(i,j)$ queries $\mathcal{O}(\log n)$-approximately in $\tilde{\mathcal{O}}(1)$ time was recently proposed in a work that introduced internal dictionary matching [ISAAC 2019]. Here we present an $\tilde{\mathcal{O}}(n+d)$-size data structure that answers $CountDistinct(i,j)$ queries $2$-approximately in $\tilde{\mathcal{O}}(1)$ time. Using range queries, for any $m$, we give an $\tilde{\mathcal{O}}(\min(nd/m,n^2/m^2)+d)$-size data structure that answers $CountDistinct(i,j)$ queries exactly in $\tilde{\mathcal{O}}(m)$ time. We also consider the special case when the dictionary consists of all square factors of the string. We design an $\mathcal{O}(n \log^2 n)$-size data structure that allows us to count distinct squares in a text fragment $T[i \mathinner{.\,.} j]$ in $\mathcal{O}(\log n)$ time.

下载PDF全文

下载文献需遵守相关版权规定

论文标题