论文标题

维修语法是斐波那契单词的最小语法

RePair Grammars are the Smallest Grammars for Fibonacci Words

论文作者

Mieno, Takuya, Inenaga, Shunsuke, Horiyama, Takashi

论文摘要

基于语法的压缩是一种无损失的数据压缩方案,它代表仅生成$ W $的无上下文语法代表给定的字符串$ W $。一般而言,虽然计算生成给定的字符串$ W $的最小语法是NP-HARD,但已经提出了许多在实践中运行良好的多项式语法压缩机。 Larsson和Moffat在1999年提出的维修是一种基于语法的压缩机,它递归地代替了字符串中最常见的Bigrams的所有可能发生。由于可以多种选择要替换最常见的大型群岛,因此维修的不同实现可能会导致不同的语法。在本文中,我们表明,最小的语法生成斐波那契单词$ f_k $可以完全以维修为特征,其中$ f_k $表示$ k $ -th fibonacci Word。也就是说,任何维修实施生成的$ f_k $的语法都是$ f_k $的最小语法,并且对于$ f_k $,任何其他语法都不是最小的。据我们所知,斐波那契词是第一个为修复是最佳修复的非平凡的无限弦。

Grammar-based compression is a loss-less data compression scheme that represents a given string $w$ by a context-free grammar that generates only $w$. While computing the smallest grammar which generates a given string $w$ is NP-hard in general, a number of polynomial-time grammar-based compressors which work well in practice have been proposed. RePair, proposed by Larsson and Moffat in 1999, is a grammar-based compressor which recursively replaces all possible occurrences of a most frequently occurring bigrams in the string. Since there can be multiple choices of the most frequent bigrams to replace, different implementations of RePair can result in different grammars. In this paper, we show that the smallest grammars generating the Fibonacci words $F_k$ can be completely characterized by RePair, where $F_k$ denotes the $k$-th Fibonacci word. Namely, all grammars for $F_k$ generated by any implementation of RePair are the smallest grammars for $F_k$, and no other grammars can be the smallest for $F_k$. To the best of our knowledge, Fibonacci words are the first non-trivial infinite family of strings for which RePair is optimal.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源