Paper Title
BatmanNet: Bi-branch Masked Graph Transformer Autoencoder for Molecular Representation
Paper Authors
Paper Abstract
Although substantial efforts have been made using graph neural networks (GNNs) for AI-driven drug discovery (AIDD), effective molecular representation learning remains an open challenge, especially when labeled molecules are scarce. Recent studies suggest that large GNN models pre-trained by self-supervised learning on unlabeled datasets transfer better to downstream molecular property prediction tasks. However, the approaches in these studies require multiple complex self-supervised tasks and large-scale datasets, making them time-consuming, computationally expensive, and difficult to pre-train end-to-end. Here, we design a simple yet effective self-supervised strategy to simultaneously learn local and global information about molecules, and further propose a novel bi-branch masked graph transformer autoencoder (BatmanNet) to learn molecular representations. BatmanNet features two tailored, complementary, and asymmetric graph autoencoders that reconstruct the missing nodes and edges, respectively, from a masked molecular graph. With this design, BatmanNet effectively captures the underlying structural and semantic information of molecules, thus improving the quality of molecular representations. BatmanNet achieves state-of-the-art results on 13 benchmark datasets across multiple drug discovery tasks, including molecular property prediction, drug-drug interaction, and drug-target interaction, demonstrating its great potential and superiority in molecular representation learning.
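For intuition only, below is a minimal, hypothetical PyTorch sketch of the bi-branch masked-autoencoder idea described in the abstract: a fraction of node and edge features is masked, the visible graph is encoded, and two lightweight decoders separately reconstruct the masked nodes and edges. All module names, dimensions, and architectural details here are illustrative assumptions, not the authors' BatmanNet implementation.

```python
# Illustrative sketch of a bi-branch masked graph autoencoder.
# All design choices (shared encoder, dimensions, mask ratio) are assumptions
# for demonstration and do not reproduce the paper's architecture.
import torch
import torch.nn as nn


class BiBranchMaskedGraphAE(nn.Module):
    """Mask a fraction of node/edge features, encode the visible graph,
    and reconstruct the masked entries with two small decoders."""

    def __init__(self, node_dim=32, edge_dim=16, hidden=64, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.node_proj = nn.Linear(node_dim, hidden)
        self.edge_proj = nn.Linear(edge_dim, hidden)
        # Transformer-style encoder over node + edge tokens (assumed shared).
        enc_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Two lightweight, asymmetric decoders: one branch per reconstruction target.
        self.node_decoder = nn.Linear(hidden, node_dim)
        self.edge_decoder = nn.Linear(hidden, edge_dim)

    def _mask(self, x):
        """Zero out a random subset of rows; return masked features and the hidden-row mask."""
        keep = torch.rand(x.size(0), device=x.device) > self.mask_ratio
        masked = x.clone()
        masked[~keep] = 0.0
        return masked, ~keep

    def forward(self, node_feats, edge_feats):
        # node_feats: (num_nodes, node_dim); edge_feats: (num_edges, edge_dim)
        masked_nodes, node_hidden = self._mask(node_feats)
        masked_edges, edge_hidden = self._mask(edge_feats)
        tokens = torch.cat([self.node_proj(masked_nodes),
                            self.edge_proj(masked_edges)], dim=0).unsqueeze(0)
        z = self.encoder(tokens).squeeze(0)
        z_nodes, z_edges = z[: node_feats.size(0)], z[node_feats.size(0):]
        # Each branch reconstructs only its own masked entries.
        node_loss = nn.functional.mse_loss(self.node_decoder(z_nodes)[node_hidden],
                                           node_feats[node_hidden])
        edge_loss = nn.functional.mse_loss(self.edge_decoder(z_edges)[edge_hidden],
                                           edge_feats[edge_hidden])
        return node_loss + edge_loss


# Toy usage: one random "molecular graph" with 10 atoms and 20 directed bonds.
model = BiBranchMaskedGraphAE()
loss = model(torch.randn(10, 32), torch.randn(20, 16))
loss.backward()
```

In this toy setup the two reconstruction losses are simply summed; how the actual model weights, routes, and decodes the node and edge branches is specific to the paper and not reproduced here.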