Title

Extension of the INFN Tier-1 on a HPC system

Authors

Boccali, Tommaso; Dal Pra, Stefano; Spiga, Daniele; Ciangottini, Diego; Zani, Stefano; Bozzi, Concezio; De Salvo, Alessandro; Valassi, Andrea; Noferini, Francesco; dell'Agnello, Luca; Stagni, Federico; Doria, Alessandra; Bonacorsi, Daniele

Abstract

The INFN Tier-1, located at CNAF in Bologna (Italy), is a center of the WLCG e-Infrastructure, supporting the four major LHC collaborations and more than 30 other INFN-related experiments. After multiple tests of elastically expanding CNAF compute power with Cloud resources (provided by Azure and Aruba, and within the framework of the HNSciCloud project), and building on the experience gained from the production-quality extension of the Tier-1 farm onto remote owned sites, the CNAF team, in collaboration with experts from the ALICE, ATLAS, CMS, and LHCb experiments, has been working to put into production an integrated HTC+HPC system with the PRACE CINECA center, located near Bologna. The extension will be implemented on the Marconi A2 partition, equipped with Intel Knights Landing (KNL) processors. A number of technical challenges were faced and solved in order to run successfully on low-RAM nodes and to overcome the closed environment (network, access, software distribution, ...) that HPC systems deploy compared with standard GRID sites. We show preliminary results from a large-scale integration effort using resources secured via the successful PRACE grant N. 2018194658, for 30 million KNL core hours.
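As a rough illustration of what the grant size and the "low RAM" constraint mean per node, the Python sketch below does the back-of-envelope arithmetic. It goes beyond what the abstract states. It assumes that each Marconi A2 node has one Intel Xeon Phi 7250 (68 cores, 4 hardware threads per core, 96 GB of DDR4, with the 16 GB of MCDRAM ignored). It also takes 2 GB per job slot as a typical WLCG reference value. All helper names are hypothetical.

```python
# Back-of-envelope sizing for the PRACE allocation described in the abstract.
# ASSUMPTIONS (not stated in the abstract): one Intel Xeon Phi 7250 per Marconi A2
# node, 68 physical cores, 4 hardware threads per core, 96 GB of DDR4 RAM
# (the 16 GB of on-package MCDRAM is ignored).
CORES_PER_NODE = 68
THREADS_PER_CORE = 4
RAM_PER_NODE_GB = 96.0

# ASSUMPTION: typical WLCG expectation of about 2 GB of RAM per job slot.
WLCG_RAM_PER_SLOT_GB = 2.0

# From the abstract: PRACE grant N. 2018194658, 30 million KNL core hours.
GRANT_CORE_HOURS = 30_000_000


def node_hours(core_hours: float, cores_per_node: int = CORES_PER_NODE) -> float:
    """Convert a core-hour budget into whole-node hours (hypothetical helper)."""
    return core_hours / cores_per_node


def ram_per_core_gb(ram_gb: float = RAM_PER_NODE_GB,
                    cores: int = CORES_PER_NODE) -> float:
    """Memory available per physical core if RAM is split evenly."""
    return ram_gb / cores


def ram_per_thread_gb(ram_gb: float = RAM_PER_NODE_GB,
                      cores: int = CORES_PER_NODE,
                      threads_per_core: int = THREADS_PER_CORE) -> float:
    """Memory available per hardware thread if every thread runs a job."""
    return ram_gb / (cores * threads_per_core)


def max_slots_at_memory(ram_gb: float = RAM_PER_NODE_GB,
                        per_slot_gb: float = WLCG_RAM_PER_SLOT_GB) -> int:
    """How many job slots fit on one node at a fixed memory per slot."""
    return int(ram_gb // per_slot_gb)


if __name__ == "__main__":
    print(f"{node_hours(GRANT_CORE_HOURS):,.0f} node-hours")
    print(f"{ram_per_core_gb():.2f} GB per core")
    print(f"{ram_per_thread_gb():.2f} GB per hardware thread")
    print(f"{max_slots_at_memory()} of {CORES_PER_NODE} cores usable at "
          f"{WLCG_RAM_PER_SLOT_GB} GB per slot")
```

Under these assumptions, the grant corresponds to roughly 441,000 node-hours. Each node also offers only about 1.4 GB per core, or about 0.35 GB per hardware thread. A 2 GB-per-slot layout would therefore fill just 48 of the 68 cores. This shows why the standard per-slot memory assumption of grid sites does not fit these nodes.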
