Paper Title

Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning

Authors

Daniel Flam-Shepherd, Alexander Zhigalin, Alán Aspuru-Guzik

Abstract

Machine learning has the potential to automate molecular design and drastically accelerate the discovery of new functional compounds. Towards this goal, generative models and reinforcement learning (RL) using string and graph representations have been successfully used to search for novel molecules. However, these approaches are limited since their representations ignore the three-dimensional (3D) structure of molecules. In fact, geometry plays an important role in many applications in inverse molecular design, especially in drug discovery. Thus, it is important to build models that can generate molecular structures in 3D space based on property-oriented geometric constraints. To address this, one approach is to generate molecules as 3D point clouds by sequentially placing atoms at locations in space -- this allows the process to be guided by physical quantities such as energy or other properties. However, this approach is inefficient as placing individual atoms makes the exploration unnecessarily deep, limiting the complexity of molecules that can be generated. Moreover, when optimizing a molecule, organic and medicinal chemists use known fragments and functional groups, not single atoms. We introduce a novel RL framework for scalable 3D design that uses a hierarchical agent to build molecules by placing molecular substructures sequentially in 3D space, thus attempting to build on the existing human knowledge in the field of molecular design. In a variety of experiments with different substructures, we show that our agent, guided only by energy considerations, can efficiently learn to produce molecules with over 100 atoms from many distributions including drug-like molecules, organic LED molecules, and biomolecules.
