Paper Title

NeuroSketch: Fast and Approximate Evaluation of Range Aggregate Queries with Neural Networks

Authors

Sepanta Zeighami, Cyrus Shahabi, Vatsal Sharan

Abstract

Range aggregate queries (RAQs) are an integral part of many real-world applications, where fast, approximate answers to the queries are often desired. Recent work has studied answering RAQs using machine learning (ML) models, where a model of the data is learned to answer the queries. However, there is no theoretical understanding of why and when the ML-based approaches perform well. Furthermore, since the ML approaches model the data, they fail to capitalize on any query-specific information to improve performance in practice. In this paper, we focus on modeling "queries" rather than data and train neural networks to learn the query answers. This change of focus allows us to theoretically study our ML approach and provide a distribution- and query-dependent error bound for neural networks when answering RAQs. We confirm our theoretical results by developing NeuroSketch, a neural network framework to answer RAQs in practice. Extensive experiments on real-world, TPC-benchmark, and synthetic datasets show that NeuroSketch answers RAQs multiple orders of magnitude faster than the state of the art and with better accuracy.
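
The idea of modeling queries rather than data can be illustrated with a minimal sketch. This is not the NeuroSketch architecture or training procedure, which the abstract does not specify. The synthetic uniform dataset, the COUNT aggregate, the tiny one-hidden-layer network, and all function names below are assumptions chosen for illustration. The network takes a query's range bounds as input and is trained to reproduce the exact answer, so at inference time no data needs to be scanned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D dataset (assumption): 10,000 values drawn uniformly from [0, 1).
data = np.sort(rng.random(10_000))

def exact_count_fraction(lo, hi):
    """Exact answer to a range COUNT query: fraction of points in [lo, hi]."""
    left = np.searchsorted(data, lo, side="left")
    right = np.searchsorted(data, hi, side="right")
    return (right - left) / data.size

def sample_queries(n):
    """Draw n random ranges [lo, hi] inside [0, 1] with their exact answers."""
    lo = rng.random(n)
    width = rng.random(n) * (1.0 - lo)
    queries = np.stack([lo, lo + width], axis=1)
    answers = exact_count_fraction(queries[:, 0], queries[:, 1])
    return queries, answers

def init_params(hidden=16):
    """Small one-hidden-layer network: 2 inputs (lo, hi) -> hidden -> 1 output."""
    return {
        "W1": rng.normal(0.0, 0.5, (2, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, 1)),
        "b2": np.zeros(1),
    }

def predict(params, queries):
    """Approximate query answers with a forward pass; the data is never touched."""
    h = np.tanh(queries @ params["W1"] + params["b1"])
    return (h @ params["W2"] + params["b2"]).ravel()

def loss(params, queries, answers):
    """Half mean squared error between predicted and exact answers."""
    return 0.5 * np.mean((predict(params, queries) - answers) ** 2)

def train(params, queries, answers, lr=0.05, steps=2000):
    """Full-batch gradient descent on the half mean squared error."""
    n = len(answers)
    for _ in range(steps):
        h = np.tanh(queries @ params["W1"] + params["b1"])
        err = (h @ params["W2"] + params["b2"]).ravel() - answers
        grad_out = err[:, None] / n                          # dL/dprediction
        grad_h = (grad_out @ params["W2"].T) * (1.0 - h**2)  # backprop through tanh
        params["W2"] -= lr * (h.T @ grad_out)
        params["b2"] -= lr * grad_out.sum(axis=0)
        params["W1"] -= lr * (queries.T @ grad_h)
        params["b1"] -= lr * grad_h.sum(axis=0)
    return params

train_q, train_a = sample_queries(2000)
params = init_params()
initial_loss = loss(params, train_q, train_a)
params = train(params, train_q, train_a)
final_loss = loss(params, train_q, train_a)
print(f"loss before training: {initial_loss:.5f}, after: {final_loss:.5f}")
```

Calling `predict(params, np.array([[0.2, 0.5]]))` then returns an approximate answer for the range [0.2, 0.5] from the network alone. How close that is to the exact answer depends on the data distribution and on the query, which is the dependence the paper's error bound is described as capturing.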
