Paper title
Ollivier persistent Ricci curvature (OPRC) based molecular representation for drug design
Paper authors
Paper abstract
Efficient molecular featurization is one of the major issues for machine learning models in drug design. Here we propose, for the first time, persistent Ricci curvature (PRC), and in particular Ollivier persistent Ricci curvature (OPRC), for molecular featurization and feature engineering. The filtration process from persistent homology is employed to generate a series of nested molecular graphs. The persistence and variation of Ollivier Ricci curvatures on these nested graphs are defined as the Ollivier persistent Ricci curvature. Moreover, persistent attributes, which are statistical and combinatorial properties of OPRCs during the filtration process, are used as molecular descriptors and combined with machine learning models, in particular gradient boosting trees (GBT). Our OPRC-GBT model is used to predict protein-ligand binding affinity, which is one of the key steps in drug design. Using the three most commonly used datasets from the well-established protein-ligand binding databank PDBbind, we extensively test our model and compare it with existing models. We find that our model outperforms all machine learning models with traditional molecular descriptors.
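The pipeline described in the abstract (filtration, Ollivier Ricci curvature on each nested graph, statistical summaries as descriptors) can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the filtration is a Euclidean distance cutoff on atom coordinates, graph distance is the unweighted hop count, the probability measure at each node is uniform over its neighbours with no mass left on the node itself, and the persistent attributes are just the minimum, maximum, and mean of the edge curvatures at each cutoff. All function names are hypothetical. The Wasserstein-1 distance is computed exactly by splitting both uniform measures into equal-mass atoms and solving the resulting assignment problem with the Hungarian algorithm. The gradient boosting step is omitted.

```python
import math
from itertools import combinations


def filtration_graph(points, cutoff):
    """Adjacency sets of the graph joining points at distance <= cutoff (assumed filtration)."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def hop_distances(adj, source):
    """Breadth-first hop counts from `source` to every reachable node."""
    dist = {source: 0}
    frontier = [source]
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    nxt.append(v)
        frontier = nxt
    return dist


def min_assignment_cost(cost):
    """Hungarian algorithm: minimum total cost of a perfect matching on a square matrix."""
    n = len(cost)
    inf = float("inf")
    u, v = [0] * (n + 1), [0] * (n + 1)
    match, way = [0] * (n + 1), [0] * (n + 1)  # match[j] = row matched to column j (1-based)
    for i in range(1, n + 1):
        match[0], j0 = i, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:  # flip the augmenting path back to column 0
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    return sum(cost[match[j] - 1][j - 1] for j in range(1, n + 1))


def ollivier_ricci(adj, x, y):
    """kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y) for an edge (x, y), so d(x, y) = 1."""
    nx, ny = sorted(adj[x]), sorted(adj[y])
    atoms = len(nx) * len(ny) // math.gcd(len(nx), len(ny))  # least common multiple
    # Split each uniform measure into `atoms` pieces of mass 1/atoms.
    src = [a for a in nx for _ in range(atoms // len(nx))]
    dst = [b for b in ny for _ in range(atoms // len(ny))]
    dist = {a: hop_distances(adj, a) for a in nx}
    cost = [[dist[a][b] for b in dst] for a in src]
    return 1 - min_assignment_cost(cost) / atoms


def oprc_features(points, cutoffs):
    """Concatenate (min, max, mean) of edge curvatures over the filtration (assumed attributes)."""
    features = []
    for r in cutoffs:
        adj = filtration_graph(points, r)
        kappas = [ollivier_ricci(adj, x, y) for x in adj for y in adj[x] if x < y]
        if kappas:
            features += [min(kappas), max(kappas), sum(kappas) / len(kappas)]
        else:
            features += [0.0, 0.0, 0.0]  # no edges yet at this cutoff
    return features


# Toy "molecule": four atoms on a unit square.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
features = oprc_features(square, cutoffs=[0.5, 1.1, 1.5])
print(features)
```

On this toy input the graph has no edges at cutoff 0.5, becomes a 4-cycle at 1.1 (every edge has curvature 0 under these conventions), and becomes the complete graph on four nodes at 1.5 (every edge has curvature 2/3). The change in curvature across cutoffs is the kind of information the persistent attributes are meant to capture. In a full pipeline the resulting vector would be passed to a gradient boosting regressor.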