Paper title

POTUS: Predictive Online Tuple Scheduling for Data Stream Processing Systems

Authors

Xi Huang, Ziyu Shao, Yang Yang

Abstract

Most online service providers deploy their own data stream processing systems in the cloud to conduct large-scale and real-time data analytics. However, such systems, e.g., Apache Heron, often adopt naive scheduling schemes to distribute data streams (in units of tuples) among processing instances, which may result in workload imbalance and system disruption. Hence, a mismatch remains between the temporal variations of data streams and such inflexible scheduling scheme designs. Besides, the fundamental benefits of predictive scheduling to data stream processing systems also remain unexplored. In this paper, we focus on the problem of tuple scheduling with predictive service in Apache Heron. With a careful choice of granularity for system modeling and decision making, we formulate the problem as a stochastic network optimization problem and propose POTUS, an online predictive scheduling scheme that aims to minimize the response time of data stream processing by steering data streams in a distributed fashion. Theoretical analysis and simulation results show that POTUS achieves an ultra-low response time with a queue stability guarantee. Moreover, POTUS requires only a mild amount of future information to effectively reduce the response time, even under mis-prediction.
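The abstract frames tuple scheduling as a stochastic network optimization problem with a queue stability guarantee but does not spell out the per-slot decision rule. A standard way to solve such problems is Lyapunov drift-plus-penalty, and the Python sketch below assumes that technique: one upstream component routes each time slot's batch of tuples to the processing instance with the smallest weight Q_i + V * c_i, where Q_i is the instance's backlog and c_i is a hypothetical per-tuple transfer cost. This is not POTUS itself. The prediction window (pre-serving forecast tuples) and Heron's real grouping API are not modeled, and `route_slot`, `simulate`, `cost` and `V` are illustrative names.

```python
from typing import List, Tuple


def route_slot(backlog: List[int], cost: List[float], V: float) -> int:
    """Pick the instance minimizing the drift-plus-penalty weight Q_i + V * c_i.

    Assumption: the penalty is a hypothetical per-tuple transfer cost c_i.
    Ties go to the lowest index because min() returns the first minimum.
    """
    return min(range(len(backlog)), key=lambda i: backlog[i] + V * cost[i])


def simulate(arrivals: List[int], service: List[int], cost: List[float],
             V: float) -> Tuple[List[int], float, List[int]]:
    """Route each slot's batch of tuples, then let every instance serve."""
    backlog = [0] * len(service)
    total_cost = 0.0
    history = []  # total queued tuples across instances after each slot
    for a in arrivals:
        # Drift-plus-penalty: the whole batch goes to the minimum-weight instance.
        i = route_slot(backlog, cost, V)
        backlog[i] += a
        total_cost += a * cost[i]
        # Instance j processes at most service[j] tuples per slot.
        backlog = [max(q - s, 0) for q, s in zip(backlog, service)]
        history.append(sum(backlog))
    return backlog, total_cost, history


if __name__ == "__main__":
    arrivals = [4] * 100   # toy workload: 4 tuples arrive per slot
    service = [3, 3]       # two instances, 3 tuples/slot each
    cost = [0.0, 1.0]      # instance 1 is "remote" (hypothetical cost)
    for V in (0.0, 2.0):
        _, c, h = simulate(arrivals, service, cost, V)
        print(f"V={V}: total cost={c}, max backlog={max(h)}")
```

The knob V trades backlog against penalty, which is the usual drift-plus-penalty behavior: with these toy numbers, V = 2 halves the total transfer cost (100 vs 200) while the peak backlog grows from 1 to 3 tuples, and both queues stay bounded.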