Paper title
Crystal Transformer: Self-learning neural language model for Generative and Tinkering Design of Materials
Paper authors
Paper abstract
Self-supervised neural language models have recently achieved unprecedented success, from natural language processing to learning the languages of biological sequences and organic molecules. These models have demonstrated superior performance in generation, structure classification, and functional prediction for proteins and molecules with learned representations. However, most masking-based pre-trained language models are not designed for generative design, and their black-box nature makes it difficult to interpret their design logic. Here we propose BLMM Crystal Transformer, a neural-network-based probabilistic generative model for generative and tinkering design of inorganic materials. Our model is built on the blank filling language model for text generation and has demonstrated unique advantages in learning the "materials grammars" together with high-quality generation, interpretability, and data efficiency. It can generate chemically valid materials compositions with as high as 89.7% charge neutrality and 84.8% balanced electronegativity, which are more than 4 and 8 times higher, respectively, than a pseudo-random sampling baseline. The probabilistic generation process of BLMM allows it to recommend tinkering operations based on learned materials chemistry, which makes it useful for materials doping. Combined with the TCSP crystal structure prediction algorithm, we have applied our model to discover a set of new materials, as validated using DFT calculations. Our work thus brings generative artificial intelligence based on unsupervised transformer language models to inorganic materials. A user-friendly web app has been developed for computational materials doping and can be accessed freely at www.materialsatlas.org/blmtinker.
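
The charge-neutrality percentage quoted in the abstract is the fraction of generated compositions for which some assignment of common oxidation states sums to zero. The abstract does not describe how the authors compute it, so the following is only a minimal sketch of such a check. The names `OXIDATION_STATES`, `is_charge_neutral`, and `neutral_fraction` are hypothetical, the oxidation-state table is a small illustrative subset, and the sketch assumes one oxidation state per element (so mixed-valence compounds are not handled).

```python
from itertools import product

# Illustrative subset of common oxidation states (an assumption for this
# sketch); a real check would use a complete, curated table.
OXIDATION_STATES = {
    "Li": (1,),
    "Na": (1,),
    "Sr": (2,),
    "Ti": (2, 3, 4),
    "Fe": (2, 3),
    "O": (-2,),
    "Cl": (-1,),
}


def is_charge_neutral(composition):
    """Return True if one oxidation state per element can sum to zero.

    `composition` maps element symbols to integer atom counts,
    e.g. {"Sr": 1, "Ti": 1, "O": 3} for SrTiO3.
    """
    elements = list(composition)
    choices = [OXIDATION_STATES[el] for el in elements]
    # Try every combination of one oxidation state per element.
    for states in product(*choices):
        total = sum(q * composition[el] for q, el in zip(states, elements))
        if total == 0:
            return True
    return False


def neutral_fraction(compositions):
    """Fraction of compositions that pass the charge-neutrality check."""
    if not compositions:
        return 0.0
    passed = sum(is_charge_neutral(c) for c in compositions)
    return passed / len(compositions)
```

Applied to a batch of generated compositions, `neutral_fraction` would yield a value comparable in kind to the percentage reported above, although the exact figure depends on the oxidation-state table and on whether mixed valence is allowed.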