Paper Title

A Batch Normalized Inference Network Keeps the KL Vanishing Away

Authors

Qile Zhu, Jianlin Su, Wei Bi, Xiaojiang Liu, Xiyao Ma, Xiaolin Li, Dapeng Wu

Abstract

Variational Autoencoder (VAE) is widely used as a generative model to approximate a model's posterior on latent variables by combining the amortized variational inference and deep neural networks. However, when paired with strong autoregressive decoders, VAE often converges to a degenerated local optimum known as "posterior collapse". Previous approaches consider the Kullback-Leibler divergence (KL) individually for each datapoint. We propose to let the KL follow a distribution across the whole dataset, and analyze that it is sufficient to prevent posterior collapse by keeping the expectation of the KL's distribution positive. Then we propose Batch Normalized-VAE (BN-VAE), a simple but effective approach to set a lower bound of the expectation by regularizing the distribution of the approximate posterior's parameters. Without introducing any new model component or modifying the objective, our approach can avoid posterior collapse effectively and efficiently. We further show that the proposed BN-VAE can be extended to conditional VAE (CVAE). Empirically, our approach surpasses strong autoregressive baselines on language modeling, text classification and dialogue generation, and rivals more complex approaches while keeping almost the same training time as VAE.
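To make the mechanism concrete: with a diagonal Gaussian posterior q(z|x) = N(μ, diag(σ²)) and a standard normal prior, the KL term is (1/2) Σ_i (μ_i² + σ_i² − log σ_i² − 1). Since σ_i² − log σ_i² − 1 ≥ 0, the expected KL over the dataset is at least (1/2) Σ_i E[μ_i²]. Batch-normalizing μ with a fixed scale γ (and, in this sketch, a shift frozen at 0) pins E[μ_i²] at γ², so the expected KL is bounded below by a positive constant and cannot vanish. The following PyTorch sketch illustrates the idea; the layer sizes, the value of γ, and the decision to freeze the BN shift are illustrative assumptions rather than the authors' exact configuration.

    import torch
    import torch.nn as nn

    class BNVAEEncoder(nn.Module):
        """Sketch of a Gaussian encoder whose posterior mean is batch-normalized."""

        def __init__(self, input_dim, latent_dim, gamma=0.5):
            super().__init__()
            self.feature = nn.Linear(input_dim, 2 * latent_dim)      # hypothetical feature layer
            self.mu_head = nn.Linear(2 * latent_dim, latent_dim)
            self.logvar_head = nn.Linear(2 * latent_dim, latent_dim)
            # BatchNorm over the posterior means: with the scale fixed to gamma and the
            # shift fixed to 0, each latent dimension has E[mu_i^2] = gamma^2 across the
            # batch, giving E[KL] >= latent_dim * gamma^2 / 2.
            self.mu_bn = nn.BatchNorm1d(latent_dim)
            with torch.no_grad():
                self.mu_bn.weight.fill_(gamma)
                self.mu_bn.bias.zero_()
            self.mu_bn.weight.requires_grad = False
            self.mu_bn.bias.requires_grad = False

        def forward(self, x):
            # x: (batch, input_dim); BatchNorm needs batch size > 1 in training mode.
            h = torch.tanh(self.feature(x))
            mu = self.mu_bn(self.mu_head(h))       # normalized posterior mean
            logvar = self.logvar_head(h)           # posterior log-variance
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
            # Per-sample KL(q(z|x) || N(0, I)); its batch mean stays bounded away from 0.
            kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=-1)
            return z, kl

In this sketch the decoder and the ELBO objective are left untouched; only the normalization of the posterior mean is added, which is consistent with the abstract's claim that training time stays close to that of a plain VAE.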
