Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads

Paper Title

Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads

Authors

Jialin Ding, Vikram Nathan, Mohammad Alizadeh, Tim Kraska

Abstract

Filtering data based on predicates is one of the most fundamental operations for any modern data warehouse. Techniques to accelerate the execution of filter expressions include clustered indexes, specialized sort orders (e.g., Z-order), multi-dimensional indexes, and, for high selectivity queries, secondary indexes. However, these schemes are hard to tune and their performance is inconsistent. Recent work on learned multi-dimensional indexes has introduced the idea of automatically optimizing an index for a particular dataset and workload. However, the performance of that work suffers in the presence of correlated data and skewed query workloads, both of which are common in real applications. In this paper, we introduce Tsunami, which addresses these limitations to achieve up to 6X faster query performance and up to 8X smaller index size than existing learned multi-dimensional indexes, in addition to up to 11X faster query performance and 170X smaller index size than optimally-tuned traditional indexes.
