Paper title
Co-scheduling Ensembles of In Situ Workflows
Paper authors
Paper abstract
Molecular dynamics (MD) simulations are widely used to study large-scale molecular systems. HPC systems are ideal platforms for running these studies; however, reaching the simulation timescale necessary to detect rare processes is challenging, even with modern supercomputers. To overcome the timescale limitation, the simulation of a long MD trajectory is replaced by multiple short-range simulations that are executed simultaneously in an ensemble of simulations. Analyses are usually co-scheduled with these simulations so that, thanks to in situ techniques, the large volumes of data generated by the simulations can be processed efficiently at runtime. Executing a workflow ensemble of simulations and their in situ analyses requires efficient co-scheduling strategies and sophisticated management of computational resources so that the simulations and analyses do not slow each other down. In this paper, we propose an efficient method to co-schedule simulations and in situ analyses such that the makespan of the workflow ensemble is minimized. We present a novel approach to allocating resources for a workflow ensemble under resource constraints, using a theoretical framework that models the workflow ensemble's execution. We evaluate the proposed approach on various workflow ensemble configurations using an accurate simulator based on the WRENCH simulation framework. Results demonstrate the importance of co-scheduling simulations with the in situ analyses they are coupled to through data, so that both benefit from data locality; inefficient scheduling decisions can lead to a slowdown in makespan of up to a factor of 30.
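To make the objective concrete, the following is a minimal Python sketch of how the makespan of a workflow ensemble depends on whether each analysis is placed on the same node as the simulation that feeds it. It is a toy model and not the theoretical framework from the paper: the `Member` class, the `member_time` and `ensemble_makespan` functions, the assumption that an analysis runs right after its simulation, and all timing values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Member:
    """One ensemble member: a simulation coupled with one in situ analysis.

    All fields are hypothetical durations in seconds.
    """
    sim_time: float       # time spent in the simulation
    analysis_time: float  # time spent in the coupled analysis
    transfer_time: float  # extra cost of moving data to another node


def member_time(member: Member, co_located: bool) -> float:
    """Completion time of one member under a toy sequential model.

    Assumption: the analysis runs after the simulation, and the data
    transfer cost is paid only when the two are on different nodes.
    """
    transfer = 0.0 if co_located else member.transfer_time
    return member.sim_time + transfer + member.analysis_time


def ensemble_makespan(members: Sequence[Member],
                      placement: Sequence[bool]) -> float:
    """Makespan of the ensemble: the completion time of its slowest member.

    placement[i] is True when member i's analysis shares a node with
    its simulation.
    """
    if len(members) != len(placement):
        raise ValueError("one placement flag is required per member")
    return max(member_time(m, c) for m, c in zip(members, placement))


members = [Member(10.0, 2.0, 5.0), Member(8.0, 3.0, 20.0)]
local = ensemble_makespan(members, [True, True])     # 12.0
remote = ensemble_makespan(members, [False, False])  # 31.0
print(f"co-located: {local}, separated: {remote}")
```

In this toy setting the makespan is set by the slowest member, so a single poorly placed analysis is enough to delay the whole ensemble. The paper's framework additionally accounts for resource constraints, which this sketch leaves out.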