Paper Title

Hippo: Taming Hyper-parameter Optimization of Deep Learning with Stage Trees

Paper Authors

Ahnjae Shin, Do Yoon Kim, Joo Seong Jeong, Byung-Gon Chun

Paper Abstract

Hyper-parameter optimization is crucial for pushing the accuracy of a deep learning model to its limits. A hyper-parameter optimization job, referred to as a study, involves numerous trials of training a model using different training knobs, and therefore is very computation-heavy, typically taking hours to days to finish. We observe that trials issued by hyper-parameter optimization algorithms often share common hyper-parameter sequence prefixes. Based on this observation, we propose Hippo, a hyper-parameter optimization system that removes redundancy in the training process to reduce the overall amount of computation significantly. Instead of executing each trial independently as in existing hyper-parameter optimization systems, Hippo breaks down the hyper-parameter sequences into stages and merges common stages to form a tree of stages (called a stage-tree), then executes a stage once per tree on a distributed GPU server environment. Hippo is applicable not only to single studies, but also to multi-study scenarios, where multiple studies of the same model and search space can be formulated as trees of stages. Evaluations show that Hippo's stage-based execution strategy outperforms trial-based methods such as Ray Tune for several models and hyper-parameter optimization algorithms, reducing GPU-hours and end-to-end training time significantly.
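
To make the prefix-sharing idea concrete, below is a minimal Python sketch (not Hippo's actual data structure or API; all names are hypothetical) of merging trials whose hyper-parameter sequences share common prefixes into a tree of stages, so that a shared stage appears, and would be trained, only once.

```python
# Minimal sketch: merge trials with common hyper-parameter sequence prefixes into a
# tree of stages. Illustrative only; this is not Hippo's implementation or API.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A stage = (hyper-parameter configuration, number of epochs it is applied for).
Stage = Tuple[tuple, int]

@dataclass
class StageNode:
    stage: Stage                                   # config + epoch count of this segment
    children: Dict[Stage, "StageNode"] = field(default_factory=dict)

    def insert(self, trial: List[Stage]) -> None:
        """Merge a trial (a sequence of stages) into the tree, reusing common prefixes."""
        if not trial:
            return
        head, rest = trial[0], trial[1:]
        if head not in self.children:              # no shared prefix left: open a new branch
            self.children[head] = StageNode(head)
        self.children[head].insert(rest)

    def count_stages(self) -> int:
        """Number of stages the tree would actually execute (the root is a sentinel)."""
        return sum(1 + child.count_stages() for child in self.children.values())

if __name__ == "__main__":
    cfg = lambda lr, bs: (("lr", lr), ("batch_size", bs))
    # Two hypothetical trials that share their first 30-epoch stage.
    trial_a = [(cfg(0.1, 128), 30), (cfg(0.01, 128), 30)]
    trial_b = [(cfg(0.1, 128), 30), (cfg(0.05, 256), 30)]

    root = StageNode(stage=((), 0))                # sentinel root, never executed
    root.insert(trial_a)
    root.insert(trial_b)

    # Trial-based execution runs 4 stages; the stage tree needs only 3.
    print("stages to execute:", root.count_stages())
```

In this toy example the saving comes from the shared first stage being trained once for the tree rather than once per trial, which is the source of the GPU-hour reduction the abstract describes.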
