论文标题
从数据到加速分类的表搜索程序:方法和实用准则
Learning from Data to Speed-up Sorted Table Search Procedures: Methodology and Practical Guidelines
论文作者
论文摘要
排序的表搜索过程是典型的查询工具,其广泛使用情况现在还包括Web应用程序,例如搜索引擎(Google Chrome)和AD BIDDING SYSTEMS(APPNEXUS)。以很少的太空成本来加速他们仍然是一个非常重要的成就。在这里,我们研究了哪种扩展机器学习技术可以通过系统的实验比较来加速,从而通过系统的实验比较分类的表搜索程序,具有不同的数据布局,并在此处开发了他们的同行。我们表征了后者可以利用后者的情况,这是对CPU和GPU计算的占据。我们的方法也有助于对学习数据结构的研究,这是一项最新的建议,以改善基本数据结构的时间/空间性能,例如B-Trees,Hash表,Bloom过滤器。的确,我们还形式化了一种学到的二分法分类表搜索程序的算法范式,该搜索程序自然地补充了这里提出的学习的范围,并且将大多数已知的排序表搜索过程描述为具有“学习阶段”,该过程近似于简单的线性回归。
Sorted Table Search Procedures are the quintessential query-answering tool, with widespread usage that now includes also Web Applications, e.g, Search Engines (Google Chrome) and ad Bidding Systems (AppNexus). Speeding them up, at very little cost in space, is still a quite significant achievement. Here we study to what extend Machine Learning Techniques can contribute to obtain such a speed-up via a systematic experimental comparison of known efficient implementations of Sorted Table Search procedures, with different Data Layouts, and their Learned counterparts developed here. We characterize the scenarios in which those latter can be profitably used with respect to the former, accounting for both CPU and GPU computing. Our approach contributes also to the study of Learned Data Structures, a recent proposal to improve the time/space performance of fundamental Data Structures, e.g., B-trees, Hash Tables, Bloom Filters. Indeed, we also formalize an Algorithmic Paradigm of Learned Dichotomic Sorted Table Search procedures that naturally complements the Learned one proposed here and that characterizes most of the known Sorted Table Search Procedures as having a "learning phase" that approximates Simple Linear Regression.