Learning inducing points and uncertainty on molecular data by scalable variational Gaussian processes

Title

Learning inducing points and uncertainty on molecular data by scalable variational Gaussian processes

Authors

Tsitsvero, Mikhail; Jin, Mingoo; Lyalin, Andrey

Abstract

Uncertainty control and scalability to large datasets are the two main issues in deploying Gaussian process (GP) models within autonomous machine-learning-based prediction pipelines in materials science and chemistry. One way to address both issues is to introduce latent inducing-point variables and to choose an appropriate approximation of the marginal log-likelihood objective. Here, we empirically show that variational learning of the inducing points in a molecular descriptor space improves the prediction of energies and atomic forces on two molecular dynamics datasets. First, we show that variational GPs can learn to represent configurations of molecule types that were not present in the set of configurations used for initialization. We also compare alternative log-likelihood training objectives and variational distributions. Among the several approximate marginal log-likelihood objectives evaluated, we show that the predictive log-likelihood provides excellent uncertainty estimates at a slight cost in predictive quality. Furthermore, we extend our study to a large molecular crystal system, showing that variational GP models predict atomic forces well by efficiently learning a sparse representation of the dataset.
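
The abstract contrasts approximate marginal log-likelihood objectives for variational GPs with inducing points. As a rough illustration only, the sketch below assumes a standard sparse variational GP with a Gaussian likelihood, an RBF kernel, and a full-covariance Gaussian q(u) = N(m, S) over the inducing values. All function names, the toy 1-D data, and the hyperparameter values are hypothetical; this is not the authors' model, which works on molecular descriptors, energies, and atomic forces. It computes the ELBO, which averages the log-likelihood under q(f), and a predictive log-likelihood (PLL) objective, which takes the log of the averaged likelihood; both subtract the same KL term with weight 1.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel: variance * exp(-|a - b|^2 / (2 * lengthscale^2))
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def svgp_marginals(X, Z, m, S, jitter=1e-6):
    """Mean and variance of q(f_n) = ∫ p(f_n | u) q(u) du with q(u) = N(m, S)."""
    Kmm = rbf(Z, Z) + jitter * np.eye(len(Z))     # inducing-point covariance
    Knm = rbf(X, Z)                               # data-to-inducing covariance
    A = np.linalg.solve(Kmm, Knm.T).T             # A = Knm Kmm^{-1}
    mu = A @ m
    knn = np.ones(len(X))                         # k(x, x) = variance = 1 for the default rbf
    var = knn - np.sum(A * Knm, 1) + np.sum((A @ S) * A, 1)
    return mu, np.maximum(var, 0.0), Kmm          # clip tiny negative round-off

def kl_to_prior(m, S, Kmm):
    # KL( N(m, S) || N(0, Kmm) ) for the inducing variables u
    M = len(m)
    _, logdet_K = np.linalg.slogdet(Kmm)
    _, logdet_S = np.linalg.slogdet(S)
    trace_term = np.trace(np.linalg.solve(Kmm, S))
    quad_term = m @ np.linalg.solve(Kmm, m)
    return 0.5 * (trace_term + quad_term - M + logdet_K - logdet_S)

def objectives(X, y, Z, m, S, noise_var=0.1):
    mu, var, Kmm = svgp_marginals(X, Z, m, S)
    kl = kl_to_prior(m, S, Kmm)
    # ELBO: sum_n E_q[log N(y_n | f_n, noise_var)] - KL
    ell = -0.5 * np.log(2 * np.pi * noise_var) - 0.5 * ((y - mu) ** 2 + var) / noise_var
    elbo = ell.sum() - kl
    # PLL: sum_n log E_q[N(y_n | f_n, noise_var)] = sum_n log N(y_n | mu_n, var_n + noise_var), minus KL
    pvar = var + noise_var
    pll = (-0.5 * np.log(2 * np.pi * pvar) - 0.5 * (y - mu) ** 2 / pvar).sum() - kl
    return elbo, pll, kl

# Toy usage with hypothetical data: 200 noisy samples of sin(x), 10 inducing points.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
Z = np.linspace(-3, 3, 10)[:, None]
m = np.sin(Z[:, 0])
S = 0.05 * np.eye(10)
elbo, pll, kl = objectives(X, y, Z, m, S)
print(f"ELBO={elbo:.2f}  PLL={pll:.2f}  KL={kl:.2f}")
```

By Jensen's inequality each PLL data term is at least the corresponding ELBO term, so with the same KL weight the PLL value is never below the ELBO. The PLL also fits the predictive variance var + noise_var directly, which is one plausible reading of why it yields better-calibrated uncertainty. In practice Z, m, S, and the kernel hyperparameters would be optimized with automatic differentiation rather than fixed by hand as here.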
