Title

Boosting Cloud Data Analytics using Multi-Objective Optimization

Authors

Fei Song, Khaled Zaouk, Chenghao Lyu, Arnab Sinha, Qi Fan, Yanlei Diao, Prashant Shenoy

Abstract

Data analytics in the cloud has become an integral part of enterprise businesses. Big data analytics systems, however, still lack the ability to take user performance goals and budgetary constraints for a task, collectively referred to as task objectives, and automatically configure an analytic job to achieve these objectives. This paper presents a data analytics optimizer that can automatically determine a cluster configuration with a suitable number of cores, as well as other system parameters, that best meet the task objectives. At the core of our work is a principled multi-objective optimization (MOO) approach that computes a Pareto optimal set of job configurations to reveal tradeoffs between different user objectives, recommends a new job configuration that best explores such tradeoffs, and employs novel optimizations to enable such recommendations within a few seconds. We present efficient incremental algorithms based on the notion of a Progressive Frontier for realizing our MOO approach and implement them in a Spark-based prototype. Detailed experiments using benchmark workloads show that our MOO techniques provide a 2x to 50x speedup over existing MOO methods while offering good coverage of the Pareto frontier. Compared to OtterTune, a state-of-the-art performance tuning system, our approach recommends configurations that reduce the running time of the TPCx-BB benchmark by 26% to 49% while adapting to different application preferences on multiple objectives.
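The abstract rests on two ideas: computing a Pareto optimal set of job configurations, and recommending one configuration from that set according to user preferences. The sketch below illustrates only those two ideas under stated assumptions; it is not the paper's Progressive Frontier algorithm. It brute-force filters a hypothetical list of (latency, cost) pairs down to the non-dominated set, then picks the point closest to the utopia point (the best value of each objective) under user-supplied weights. The objective values, the weights, and the weighted utopia-nearest rule are all assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical objective vector for one job configuration: (latency_seconds, cost_units).
# Both objectives are minimized.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Brute-force O(n^2) filter: keep only non-dominated points, sorted by latency."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front)


def utopia_nearest(front: List[Objectives], weights: Objectives) -> Objectives:
    """Hypothetical recommendation rule: pick the front point with the smallest
    weighted squared distance to the utopia point, after min-max normalizing each objective."""
    lows = [min(p[i] for p in front) for i in range(2)]
    highs = [max(p[i] for p in front) for i in range(2)]

    def score(p: Objectives) -> float:
        total = 0.0
        for i in range(2):
            span = (highs[i] - lows[i]) or 1.0  # avoid dividing by zero on a flat objective
            norm = (p[i] - lows[i]) / span  # 0.0 means this point matches the utopia value
            total += weights[i] * norm ** 2
        return total

    return min(front, key=score)


if __name__ == "__main__":
    candidates = [(100.0, 8.0), (80.0, 12.0), (120.0, 6.0),
                  (90.0, 12.0), (150.0, 6.0), (60.0, 20.0)]
    front = pareto_front(candidates)
    print(front)  # [(60.0, 20.0), (80.0, 12.0), (100.0, 8.0), (120.0, 6.0)]
    print(utopia_nearest(front, (0.5, 0.5)))  # (80.0, 12.0): balanced preference
    print(utopia_nearest(front, (0.9, 0.1)))  # (60.0, 20.0): latency-focused preference
```

Changing the weights moves the recommendation along the frontier without recomputing it, which mirrors the abstract's point about adapting to different application preferences. A real system would avoid the quadratic filter and would not enumerate every configuration up front.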