Paper Title
Online Aggregation based Approximate Query Processing: A Literature Survey
Paper Authors
Paper Abstract
Modern organizations use OLAP (Online Analytical Processing) intensively to perform ad hoc analysis of data, gaining insight for better decision making. OLAP performance is therefore crucial, yet supporting OLAP over large datasets is costly. Approximate query processing (AQP) was proposed to efficiently compute approximate answers that are as close as possible to the exact answer. Existing AQP techniques fall into two categories, online aggregation and offline synopsis generation, each with its own limitations and challenges. Online aggregation-based AQP progressively generates approximate results with error estimates (i.e., confidence intervals) until all data has been processed. In offline synopsis generation-based AQP, synopses are generated offline using a priori knowledge such as the query workload or data statistics; OLAP queries are later answered using these synopses. This paper surveys only online aggregation-based AQP. First, we discuss the research challenges in online aggregation-based AQP and summarize existing approaches that address them. We then discuss the advantages and limitations of existing online aggregation mechanisms. Lastly, we discuss open research challenges and opportunities for further advancing online aggregation research. Our goal is to help readers understand the current progress in online aggregation-based AQP and gain new insights into it.
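To make the idea of progressive results with confidence intervals concrete, the following is a minimal Python sketch of online aggregation for a single AVG query. It is an illustration under stated assumptions, not the method of any particular surveyed system: the function name `online_avg` and its parameters are hypothetical, the data is assumed to fit in memory, the scan order is randomized so that every prefix is a uniform sample, and the interval is a normal (central limit theorem) approximation with a finite population correction.

```python
import math
import random
from statistics import NormalDist


def online_avg(values, batch_size=1000, confidence=0.95, seed=0):
    """Yield (rows_seen, running_mean, half_width) after each batch.

    Hypothetical helper for illustration only. half_width is the
    half-length of an approximate confidence interval around the mean.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # random scan order: each prefix is a uniform sample

    # z-score for a two-sided interval at the requested confidence level
    z = NormalDist().inv_cdf(0.5 + confidence / 2)

    total = len(order)
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's algorithm)

    for x in order:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)

        if n % batch_size == 0 or n == total:
            if n > 1:
                sample_var = m2 / (n - 1)
                # finite population correction: shrinks to 0 at a full scan
                fpc = (total - n) / total
                half_width = z * math.sqrt(sample_var / n * fpc)
            else:
                # a single row gives no variance estimate unless it is all the data
                half_width = 0.0 if n == total else float("inf")
            yield n, mean, half_width
```

A caller can iterate over the generator and stop as soon as the interval is narrow enough, for example `for n, mean, hw in online_avg(data): ...` with a `break` once `hw` falls below a chosen threshold. When the scan runs to completion, the interval collapses to zero width and the estimate equals the exact average, which mirrors the "until all data has been processed" behavior described in the abstract.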