Title

Autoencoding Undirected Molecular Graphs With Neural Networks

Authors

Jeppe Johan Waarkjær Olsen, Peter Ebert Christensen, Martin Hangaard Hansen, Alexander Rosenberg Johansen

Abstract

Discrete structure rules for validating molecular structures are usually limited to fulfillment of the octet rule or similar simple deterministic heuristics. We propose a model, inspired by language modeling from natural language processing, that can learn from a collection of undirected molecular graphs, enabling it to fit any underlying structure rule present in the collection. We introduce an adaptation of the popular Transformer model that can learn relationships between atoms and bonds. To our knowledge, this Transformer adaptation is the first model trained to solve the unsupervised task of recovering partially observed molecules. In this work, we assess how different degrees of information impact performance with respect to fitting the QM9 dataset, which conforms to the octet rule, and fitting the ZINC dataset, which contains hypervalent molecules and ions that require the model to learn a more complex structure rule. More specifically, we test a full discrete graph with bond order information, a full discrete graph with only connectivity, a bag-of-neighbors, a bag-of-atoms, and count-based unigram statistics. These results provide encouraging evidence that neural networks, even when given only connectivity, can learn arbitrary molecular structure rules specific to a dataset, as the Transformer adaptation surpasses a strong octet-rule baseline on the ZINC dataset.
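To make the task setup concrete, the sketch below illustrates two of the input representations named in the abstract (a bag-of-neighbors and a masked, partially observed molecule) on a toy undirected graph. All names and data structures here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the "recover a partially observed molecule" setup.
# A molecule is an undirected graph: atoms are nodes, bonds are edges.
from collections import Counter

# Ethanol (C-C-O), hydrogens omitted for brevity.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]  # single bonds between atom indices

def bag_of_neighbors(atoms, bonds):
    """For each atom, the multiset of element symbols it is bonded to."""
    neighbors = [Counter() for _ in atoms]
    for i, j in bonds:
        neighbors[i][atoms[j]] += 1
        neighbors[j][atoms[i]] += 1
    return neighbors

def mask_atom(atoms, index, mask_token="<mask>"):
    """Hide one atom label; the model's task is to recover it from context."""
    partial = list(atoms)
    partial[index] = mask_token
    return partial

print(bag_of_neighbors(atoms, bonds))
print(mask_atom(atoms, 2))  # ['C', 'C', '<mask>'] — the model must predict "O"
```

In the paper's setting, a Transformer-style model would be trained on many such masked graphs, so that the consistency rules of the dataset (octet rule on QM9, more complex rules on ZINC) are learned implicitly rather than hand-coded.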
