Paper Title

Supervised Pretraining for Molecular Force Fields and Properties Prediction

Authors

Xiang Gao, Weihao Gao, Wenzhi Xiao, Zhirui Wang, Chong Wang, Liang Xiang

Abstract

Machine learning approaches have become popular for molecular modeling tasks, including molecular force field and property prediction. Traditional supervised learning methods suffer from a scarcity of labeled data for particular tasks, which motivates the use of large-scale datasets built for other, related tasks. We propose to pretrain neural networks on a dataset of 86 million molecules, with atom charges and 3D geometries as inputs and molecular energies as labels. Experiments show that, compared to training from scratch, fine-tuning the pretrained model can significantly improve performance on seven molecular property prediction tasks and two force field tasks. We also demonstrate that the representations learned by the pretrained model contain adequate information about molecular structure: linear probing of the representations can predict many kinds of molecular information, including atom types, interatomic distances, classes of molecular scaffolds, and the existence of molecular fragments. Our results show that supervised pretraining is a promising research direction in molecular modeling.
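
The abstract refers to linear probing of frozen representations. As a rough illustration of that idea only, the sketch below fits a linear classifier on fixed feature vectors and measures how well a label can be read out from them. Everything in it is an assumption made for illustration: the features are synthetic stand-ins rather than outputs of the authors' pretrained model, the function names are hypothetical, and the probe is a closed-form ridge regression onto one-hot targets, which may differ from the probe used in the paper.

```python
import numpy as np


def fit_linear_probe(features, labels, num_classes, l2=1e-3):
    """Fit a ridge-regularized linear map from frozen features to one-hot labels."""
    n, d = features.shape
    x = np.hstack([features, np.ones((n, 1))])  # append a bias column
    y = np.eye(num_classes)[labels]             # one-hot targets, shape (n, num_classes)
    # Closed-form ridge solution: (X^T X + l2 * I)^-1 X^T Y
    return np.linalg.solve(x.T @ x + l2 * np.eye(d + 1), x.T @ y)


def predict_linear_probe(weights, features):
    """Predict the class with the largest linear score."""
    x = np.hstack([features, np.ones((features.shape[0], 1))])
    return np.argmax(x @ weights, axis=1)


# Synthetic stand-in for frozen representations (assumption: NOT real model outputs).
# Each hypothetical class, e.g. an atom type, gets its own cluster center.
rng = np.random.default_rng(0)
num_classes, dim, n = 3, 16, 600
centers = rng.normal(scale=3.0, size=(num_classes, dim))
labels = rng.integers(0, num_classes, size=n)
features = centers[labels] + rng.normal(size=(n, dim))

# Hold out part of the data so the probe is scored on unseen examples.
train_x, test_x = features[:400], features[400:]
train_y, test_y = labels[:400], labels[400:]

weights = fit_linear_probe(train_x, train_y, num_classes)
accuracy = float(np.mean(predict_linear_probe(weights, test_x) == test_y))
print(f"linear probe accuracy: {accuracy:.3f}")
```

Only the linear map is trained here, and the features stay fixed, so high probe accuracy suggests the information is already linearly accessible in the representation rather than learned by the probe itself.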
