Hyperparameter optimization of data-driven AI models on HPC systems

Paper title

Hyperparameter optimization of data-driven AI models on HPC systems

Authors

Wulff, Eric; Girone, Maria; Pata, Joosep

Abstract

In the European Center of Excellence in Exascale Computing "Research on AI- and Simulation-Based Engineering at Exascale" (CoE RAISE), researchers develop novel, scalable AI technologies towards Exascale. This work uses High Performance Computing resources to perform large-scale hyperparameter optimization with distributed training on multiple compute nodes. It is part of RAISE's work on data-driven use cases, which leverages AI and HPC cross-methods developed within the project. In response to the demand for parallelizable and resource-efficient hyperparameter optimization methods, advanced hyperparameter search algorithms are benchmarked. The evaluated algorithms, including Random Search, Hyperband and ASHA, are compared in terms of both accuracy and accuracy per unit of compute resources spent. As an example use case, a graph neural network model known as MLPF, developed for Machine-Learned Particle-Flow reconstruction in High Energy Physics, serves as the base model for optimization. Results show that hyperparameter optimization significantly increased the performance of MLPF and that this would not have been possible without access to large-scale High Performance Computing resources. It is also shown that, in the case of MLPF, the ASHA algorithm combined with Bayesian optimization gives the largest performance increase per unit of compute resources spent among the investigated algorithms.
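The abstract compares Random Search, Hyperband and ASHA by accuracy per unit of compute, but does not show where ASHA's savings come from. Below is a minimal, self-contained sketch of ASHA's asynchronous promotion rule, not the authors' implementation. Everything in it is an illustrative assumption: the toy objective `toy_loss` stands in for MLPF training, `lr` is the only hyperparameter, the settings `min_epochs=1`, `eta=3` and `max_rung=3` are arbitrary, and a sequential loop simulates parallel workers. The sampler `sample_random_lr` is plain random search; the ASHA-plus-Bayesian-optimization variant highlighted in the abstract would instead use a Bayesian optimizer that proposes configurations from past results.

```python
import math
import random


def toy_loss(config, epochs):
    """Hypothetical stand-in for validation loss after `epochs` epochs (not MLPF)."""
    # Toy model: best learning rate is 1e-3, and loss shrinks as the budget grows.
    return (math.log10(config["lr"]) + 3) ** 2 + 1.0 / epochs


def sample_random_lr(rng):
    """Random-search sampler; a Bayesian optimizer could be plugged in here instead."""
    return {"lr": 10 ** rng.uniform(-5, -1)}


class ASHA:
    """Minimal sketch of asynchronous successive halving (ASHA)."""

    def __init__(self, sample_config, min_epochs=1, eta=3, max_rung=3, seed=0):
        self.sample_config = sample_config
        self.min_epochs = min_epochs
        self.eta = eta
        self.max_rung = max_rung
        self.rng = random.Random(seed)
        self.rungs = [[] for _ in range(max_rung + 1)]        # (loss, trial_id) per rung
        self.promoted = [set() for _ in range(max_rung + 1)]  # trial_ids promoted out of each rung
        self.configs = {}                                     # trial_id -> hyperparameters

    def epochs_for(self, rung):
        # Budget grows geometrically with the rung: min_epochs * eta**rung.
        return self.min_epochs * self.eta ** rung

    def get_job(self):
        """Return (trial_id, rung) for the next free worker."""
        # Prefer promotions, checking the highest non-final rung first.
        for k in reversed(range(self.max_rung)):
            ranked = sorted(self.rungs[k])
            # Only the top 1/eta of the results seen so far are promotable.
            for _, tid in ranked[: len(ranked) // self.eta]:
                if tid not in self.promoted[k]:
                    self.promoted[k].add(tid)
                    return tid, k + 1
        # Nothing promotable: start a fresh configuration at the bottom rung.
        tid = len(self.configs)
        self.configs[tid] = self.sample_config(self.rng)
        return tid, 0

    def report(self, tid, rung, loss):
        self.rungs[rung].append((loss, tid))


# Simulate workers sequentially: each job is reported before the next is requested.
asha = ASHA(sample_random_lr, min_epochs=1, eta=3, max_rung=3, seed=0)
total_epochs = 0
for _ in range(60):
    tid, rung = asha.get_job()
    # Assumption: promoted trials resume from a checkpoint, so only the
    # epochs beyond the previous rung's budget are charged.
    prev = asha.epochs_for(rung - 1) if rung > 0 else 0
    total_epochs += asha.epochs_for(rung) - prev
    asha.report(tid, rung, toy_loss(asha.configs[tid], asha.epochs_for(rung)))

best_loss, best_tid = min(asha.rungs[asha.max_rung])
full_budget = len(asha.configs) * asha.epochs_for(asha.max_rung)
print(f"best lr={asha.configs[best_tid]['lr']:.2e}, loss={best_loss:.3f}")
print(f"epochs used: {total_epochs} vs {full_budget} to train every config fully")
```

Because most trials stop after one or a few epochs and only the top 1/eta of each rung is promoted, the epoch count stays far below the cost of training every sampled configuration to the full budget. Synchronous successive halving, as used inside Hyperband brackets, waits for a rung to complete before promoting, which can leave workers idle; ASHA promotes as soon as a configuration ranks in the top 1/eta of the results seen so far.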
