Paper Title


Enhancing variational generation through self-decomposition

Paper Authors

Andrea Asperti, Laura Bugo, Daniele Filippini

Paper Abstract


In this article we introduce the notion of Split Variational Autoencoder (SVAE), whose output $\hat{x}$ is obtained as a weighted sum $\sigma \odot \hat{x_1} + (1-\sigma) \odot \hat{x_2}$ of two generated images $\hat{x_1}, \hat{x_2}$, where $\sigma$ is a {\em learned} compositional map. The composing images $\hat{x_1}, \hat{x_2}$, as well as the $\sigma$-map, are automatically synthesized by the model. The network is trained as a standard Variational Autoencoder with a negative log-likelihood loss between training and reconstructed images. No additional loss is required for $\hat{x_1}$, $\hat{x_2}$, or $\sigma$, nor any form of human tuning. The decomposition is nondeterministic, but follows two main schemes, which we may roughly categorize as either \say{syntactic} or \say{semantic}. In the first case, the map tends to exploit the strong correlation between adjacent pixels, splitting the image into two complementary high-frequency sub-images. In the second case, the map typically focuses on the contours of objects, splitting the image into interesting variations of its content, with more marked and distinctive features. In this case, according to empirical observations, the Fréchet Inception Distance (FID) of $\hat{x_1}$ and $\hat{x_2}$ is usually lower (hence better) than that of $\hat{x}$, which clearly suffers from being the average of the former. In a sense, an SVAE forces the Variational Autoencoder to make choices, in contrast with its intrinsic tendency to {\em average} between alternatives with the aim of minimizing the reconstruction loss towards a specific sample. According to the FID metric, our technique, tested on typical datasets such as MNIST, CIFAR10, and CelebA, allows us to outperform all previous purely variational architectures (not relying on normalizing flows).
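The abstract fully specifies the compositional mechanism, so a minimal sketch is straightforward to reconstruct. The PyTorch code below is a hypothetical illustration, not the authors' implementation: the MLP encoder/decoder, the 28x28 single-channel input (MNIST-like), and `latent_dim` are assumed choices; only the three-headed output ($\hat{x_1}$, $\hat{x_2}$, $\sigma$), the composed reconstruction, and the single negative log-likelihood plus KL objective follow the abstract.

```python
# Minimal sketch of a Split VAE (SVAE), assuming MNIST-like 28x28 inputs.
# Illustrative only: encoder/decoder widths and latent_dim are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitVAE(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        # Encoder producing mean and log-variance of q(z|x).
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        # Single decoder trunk with three output heads:
        # two candidate images x1, x2 and the compositional sigma-map.
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU())
        self.head_x1 = nn.Linear(256, 28 * 28)
        self.head_x2 = nn.Linear(256, 28 * 28)
        self.head_sigma = nn.Linear(256, 28 * 28)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        d = self.dec(z)
        x1 = torch.sigmoid(self.head_x1(d)).view(-1, 1, 28, 28)
        x2 = torch.sigmoid(self.head_x2(d)).view(-1, 1, 28, 28)
        sigma = torch.sigmoid(self.head_sigma(d)).view(-1, 1, 28, 28)
        # The only supervised output: x_hat = sigma*x1 + (1-sigma)*x2.
        x_hat = sigma * x1 + (1 - sigma) * x2
        return x_hat, x1, x2, sigma, mu, logvar

def loss_fn(x, x_hat, mu, logvar):
    # Negative log-likelihood (Bernoulli) on the composed image only;
    # no extra loss terms on x1, x2 or sigma, as stated in the abstract.
    nll = F.binary_cross_entropy(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return nll + kl
```

Note that gradients reach $\hat{x_1}$, $\hat{x_2}$, and $\sigma$ only through the composed output $\hat{x}$, which is what allows the model to discover the decomposition on its own, with no additional supervision.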
