Improving Molecular Design by Stochastic Iterative Target Augmentation

Paper title

Improving Molecular Design by Stochastic Iterative Target Augmentation

Authors

Kevin Yang, Wengong Jin, Kyle Swanson, Regina Barzilay, Tommi Jaakkola

Abstract

Generative models in molecular design tend to be richly parameterized, data-hungry neural models, as they must create complex structured objects as outputs. Estimating such models from data may be challenging due to the lack of sufficient training data. In this paper, we propose a surprisingly effective self-training approach for iteratively creating additional molecular targets. We first pre-train the generative model together with a simple property predictor. The property predictor is then used as a likelihood model for filtering candidate structures from the generative model. Additional targets are iteratively produced and used in the course of stochastic EM iterations to maximize the log-likelihood that the candidate structures are accepted. A simple rejection (re-weighting) sampler suffices to draw posterior samples since the generative model is already reasonable after pre-training. We demonstrate significant gains over strong baselines for both unconditional and conditional molecular design. In particular, our approach outperforms the previous state-of-the-art in conditional molecular design by over 10% in absolute gain. Finally, we show that our approach is useful in other domains as well, such as program synthesis.
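The loop described in the abstract (pre-train, sample candidates, filter them with the property predictor, retrain on the accepted targets) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the "generative model" is a smoothed categorical distribution over ten integers, `accepts` is a hypothetical stand-in for the property predictor, and `fit`, `augment`, and `acceptance_rate` are made-up names.

```python
import random
from collections import Counter

VOCAB = list(range(10))  # toy stand-in for the space of candidate structures


def fit(targets, smoothing=1.0):
    """Refit the toy generator: maximum-likelihood categorical with additive smoothing."""
    counts = Counter(targets)
    total = len(targets) + smoothing * len(VOCAB)
    return [(counts[v] + smoothing) / total for v in VOCAB]


def accepts(x):
    """Hypothetical property predictor used as an accept/reject filter."""
    return x >= 7


def acceptance_rate(probs):
    """Probability mass the generator places on candidates the filter accepts."""
    return sum(p for v, p in zip(VOCAB, probs) if accepts(v))


def augment(initial_targets, iterations=5, samples_per_iter=200, seed=0):
    """Alternate between sampling-and-filtering and refitting the generator."""
    rng = random.Random(seed)
    probs = fit(initial_targets)  # "pre-training" on the original targets
    history = [acceptance_rate(probs)]
    for _ in range(iterations):
        # Draw candidates from the current generator.
        candidates = rng.choices(VOCAB, weights=probs, k=samples_per_iter)
        # Rejection step: keep only candidates the predictor accepts.
        new_targets = [c for c in candidates if accepts(c)]
        # Refit on the original targets plus the newly accepted ones
        # (an assumption of this sketch about how targets are combined).
        probs = fit(list(initial_targets) + new_targets)
        history.append(acceptance_rate(probs))
    return probs, history


if __name__ == "__main__":
    probs, history = augment([7, 8, 9])
    print([round(rate, 3) for rate in history])
```

In this toy setting the acceptance rate after augmentation is higher than after pre-training alone, which mirrors the stated objective of maximizing the likelihood that candidates are accepted. A real system would replace the categorical distribution with a neural generator and the threshold with a learned property predictor.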
