论文标题
计算内部字典匹配中的不同模式
Counting Distinct Patterns in Internal Dictionary Matching
论文作者
论文摘要
我们考虑预处理$ n $的文本$ t $的问题和字典$ \ nathcal {d} $,以便能够有效地回答查询$ countdistinct(i,j)$,即给定$ i $和$ j $返回$ \ \ m nathcal {d} $的模式数量的问题。 J] $。字典是内部的,从某种意义上说,$ \ Mathcal {d} $中的每个模式都是$ t $的片段。这样,字典将空间与模式数量成正比$ d = | \ mathcal {d} | $而不是其总长度,这可能是$θ(n \ cdot d)$。 $ \ tilde {\ Mathcal {o}}(n+d)$ - 大小数据结构,该大小数据结构(I,i,j)$ QUERIES $ \ MATHCAL {o}(\ log n)$ - 大约在$ \ \ tilde {\ tilde {\ tilde {\ mathcal {o}}(o)$最近介绍了piNcess in Indist at in Indifal pornifation。在这里,我们提出了一个$ \ tilde {\ Mathcal {o}}(n+d)$ - 大小数据结构,该数据结构回答$ countdistinct(i,j)$ queries $ 2 $ 2 $ - 以$ \ tilde {\ Mathcal {\ Mathcal {o}}}(1)$时间。使用范围查询,对于任何$ m $,我们给出了$ \ tilde {\ mathcal {o}}}(\ min(nd/m,n^2/m^2)+d)$ - 大小数据结构,该数据结构回答$ countdistinct(i,j)$ QUERIES $ cOUST $ \ tilde in $ \ tilde {\ tilde {\ nathcal {\ nathcal {o)当字典由字符串的所有平方因子组成时,我们还考虑了特殊情况。我们设计了一个$ \ MATHCAL {O}(n \ log^2 n)$ - 大小数据结构,该结构使我们能够在文本片段中计数不同的正方形$ t [i \ mathinner {。
We consider the problem of preprocessing a text $T$ of length $n$ and a dictionary $\mathcal{D}$ in order to be able to efficiently answer queries $CountDistinct(i,j)$, that is, given $i$ and $j$ return the number of patterns from $\mathcal{D}$ that occur in the fragment $T[i \mathinner{.\,.} j]$. The dictionary is internal in the sense that each pattern in $\mathcal{D}$ is given as a fragment of $T$. This way, the dictionary takes space proportional to the number of patterns $d=|\mathcal{D}|$ rather than their total length, which could be $Θ(n\cdot d)$. An $\tilde{\mathcal{O}}(n+d)$-size data structure that answers $CountDistinct(i,j)$ queries $\mathcal{O}(\log n)$-approximately in $\tilde{\mathcal{O}}(1)$ time was recently proposed in a work that introduced internal dictionary matching [ISAAC 2019]. Here we present an $\tilde{\mathcal{O}}(n+d)$-size data structure that answers $CountDistinct(i,j)$ queries $2$-approximately in $\tilde{\mathcal{O}}(1)$ time. Using range queries, for any $m$, we give an $\tilde{\mathcal{O}}(\min(nd/m,n^2/m^2)+d)$-size data structure that answers $CountDistinct(i,j)$ queries exactly in $\tilde{\mathcal{O}}(m)$ time. We also consider the special case when the dictionary consists of all square factors of the string. We design an $\mathcal{O}(n \log^2 n)$-size data structure that allows us to count distinct squares in a text fragment $T[i \mathinner{.\,.} j]$ in $\mathcal{O}(\log n)$ time.