Paper Title

The Open MatSci ML Toolkit: A Flexible Framework for Machine Learning in Materials Science

Authors

Santiago Miret, Kin Long Kelvin Lee, Carmelo Gonzales, Marcel Nassar, Matthew Spellings

Abstract

We present the Open MatSci ML Toolkit: a flexible, self-contained, and scalable Python-based framework for applying deep learning models and methods to scientific data, with a specific focus on materials science and the OpenCatalyst dataset. Our toolkit provides: (1) a scalable machine learning workflow for materials science built on PyTorch Lightning, which enables seamless scaling across different computation capabilities (laptop, server, cluster) and hardware platforms (CPU, GPU, XPU); and (2) Deep Graph Library (DGL) support for rapid graph neural network prototyping and development. By publishing and sharing this toolkit with the research community via open-source release, we hope to: (1) lower the entry barrier for new machine learning researchers and practitioners who want to get started with the OpenCatalyst dataset, which presently comprises the largest computational materials science dataset; and (2) enable the scientific community to apply advanced machine learning tools to high-impact scientific challenges, such as modeling of materials behavior for clean energy applications. We demonstrate the capabilities of our framework by enabling three new equivariant neural network models for multiple OpenCatalyst tasks and arrive at promising results for compute scaling and model performance.
