Paper Title

A Case for Partitioned Bloom Filters

Author

Almeida, Paulo Sérgio

Abstract

In a partitioned Bloom filter the $m$-bit vector is split into $k$ disjoint $m/k$-sized parts, one per hash function. Contrary to hardware designs, where they prevail, software implementations mostly adopt standard Bloom filters, considering partitioned filters slightly worse due to their slightly larger false positive rate (FPR). In this paper, by performing an in-depth analysis, we first show that the FPR advantage of standard Bloom filters is smaller than thought; more importantly, by studying the per-element FPR, we show that standard Bloom filters have weak spots in the domain: elements which will be tested as false positives much more frequently than expected. This is relevant in scenarios where an element is tested against many filters, e.g., in packet forwarding. Moreover, standard Bloom filters are prone to exhibiting extremely weak spots if naive double hashing is used, something that occurs in several, even mainstream, libraries. Partitioned Bloom filters exhibit a uniform distribution of the FPR over the domain and are robust to the naive use of double hashing, having no weak spots. Finally, by surveying several usages other than testing set membership, we point out the many advantages of having disjoint parts: they can be individually sampled, extracted, added or retired, leading to superior designs for, e.g., SIMD usage, size reduction, testing set disjointness, or duplicate detection in streams. Partitioned Bloom filters are better, and should replace the standard form, both in general purpose libraries and as the base for novel designs.
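The core structure described above ($k$ disjoint parts of $m/k$ bits, with hash function $i$ confined to part $i$) can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name `PartitionedBloomFilter`, the use of `hashlib.blake2b`, and the double-hashing scheme $g_i = h_1 + i \cdot h_2$ are hypothetical choices made here. The two helper functions compare the textbook approximation of the standard filter's FPR, $(1 - (1 - 1/m)^{kn})^k$, with the partitioned filter's $(1 - (1 - k/m)^n)^k$ under idealized independent hashing. Since $1 - k/m \le (1 - 1/m)^k$ (Bernoulli's inequality), the partitioned value is never smaller, consistent with the "slightly larger FPR" mentioned in the abstract.

```python
import hashlib
from collections.abc import Iterator


class PartitionedBloomFilter:
    """Hypothetical sketch: an m-bit vector split into k disjoint parts of
    m // k bits, where hash function i sets exactly one bit in part i."""

    def __init__(self, m: int, k: int) -> None:
        if m % k != 0:
            raise ValueError("this sketch assumes k divides m")
        self.k = k
        self.part_size = m // k          # m/k bits per part
        self.bits = bytearray(m)         # one byte per bit, for clarity

    def _positions(self, item: bytes) -> Iterator[int]:
        # Assumption: two 64-bit hashes taken from one blake2b digest and
        # combined by double hashing, g_i = h1 + i*h2 (illustrative choice).
        digest = hashlib.blake2b(item, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little")
        for i in range(self.k):
            # The offset i * part_size keeps hash i inside its own part.
            yield i * self.part_size + (h1 + i * h2) % self.part_size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: bytes) -> bool:
        # Reports membership only if the element's bit is set in every part.
        return all(self.bits[pos] for pos in self._positions(item))


def fpr_standard(m: int, k: int, n: int) -> float:
    # Textbook approximation for a standard filter: each of the k*n bit
    # settings misses a given bit with probability 1 - 1/m.
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def fpr_partitioned(m: int, k: int, n: int) -> float:
    # Each of the n elements sets exactly one bit in each part of m/k bits,
    # so a given bit stays zero with probability (1 - k/m)^n.
    return (1.0 - (1.0 - k / m) ** n) ** k


if __name__ == "__main__":
    bf = PartitionedBloomFilter(m=1024, k=4)
    for word in [b"alpha", b"beta", b"gamma"]:
        bf.add(word)
    print(b"alpha" in bf)  # True: Bloom filters have no false negatives
    print(fpr_standard(1024, 4, 100), fpr_partitioned(1024, 4, 100))
```

One consequence of confining each index to its own part: even in the degenerate case where $h_2$ is a multiple of $m/k$, the $k$ positions still land in $k$ different parts, whereas in a standard filter using $g_i \bmod m$ with $h_2 \equiv 0 \pmod m$ all $k$ positions would collapse onto a single bit. This is only one illustrative case of the naive double-hashing problem the abstract refers to.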
