Title
Auto-Encoding for Shared Cross Domain Feature Representation and Image-to-Image Translation
Authors
Abstract
Image-to-image translation is a subset of computer vision and pattern recognition problems where the goal is to learn a mapping between input images of domain $\mathbf{X}_1$ and output images of domain $\mathbf{X}_2$. Current methods use neural networks with an encoder-decoder structure to learn a mapping $G:\mathbf{X}_1 \to \mathbf{X}_2$ such that the distributions of images from $\mathbf{X}_2$ and $G(\mathbf{X}_1)$ are identical, where $G(\mathbf{X}_1) = d_G (f_G (\mathbf{X}_1))$, $f_G (\cdot)$ is referred to as the encoder, and $d_G(\cdot)$ is referred to as the decoder. Currently, methods that also compute an inverse mapping $F:\mathbf{X}_2 \to \mathbf{X}_1$ use a separate encoder-decoder pair $d_F (f_F (\mathbf{X}_2))$, or at least a separate decoder $d_F (\cdot)$, to do so. Here we introduce a method to perform cross-domain image-to-image translation across multiple domains using a single encoder-decoder architecture. We use an auto-encoder network which, given an input image $\mathbf{X}_1$, first computes a latent domain encoding $Z_d = f_d (\mathbf{X}_1)$ and a latent content encoding $Z_c = f_c (\mathbf{X}_1)$, where the domain encoding $Z_d$ and the content encoding $Z_c$ are independent. A decoder network $g(Z_d,Z_c)$ then produces a reconstruction of the original image, $\mathbf{\widehat{X}}_1=g(Z_d,Z_c )\approx \mathbf{X}_1$. Ideally, the domain encoding $Z_d$ contains no information about the content of the image, and the content encoding $Z_c$ contains no information about the domain of the image. We use this property of the encodings to find the mapping across domains $G:\mathbf{X}_1 \to \mathbf{X}_2$ by simply changing the domain encoding $Z_d$ in the decoder's input: $G(\mathbf{X}_1 )=g(f_d (\mathbf{x}_2^i ),f_c (\mathbf{X}_1))$, where $\mathbf{x}_2^i$ is the $i^{th}$ observation of $\mathbf{X}_2$.
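The translation rule in the abstract can be sketched in a few lines of code. This is a minimal illustration only: the paper's encoders $f_d$, $f_c$ and decoder $g$ are trained neural networks, whereas here they are stand-in random linear maps; all dimensions, weights, and function names are assumptions chosen for the sketch, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (illustrative; the paper does not specify dimensions).
IMG_DIM, D_DIM, C_DIM = 8, 2, 4

# Stand-in weights: in the paper these would be learned so that the
# domain code Z_d and content code Z_c are independent.
W_d = rng.normal(size=(D_DIM, IMG_DIM))          # f_d: image -> Z_d
W_c = rng.normal(size=(C_DIM, IMG_DIM))          # f_c: image -> Z_c
W_g = rng.normal(size=(IMG_DIM, D_DIM + C_DIM))  # g: (Z_d, Z_c) -> image

def f_d(x):
    """Domain encoder: Z_d = f_d(X)."""
    return W_d @ x

def f_c(x):
    """Content encoder: Z_c = f_c(X)."""
    return W_c @ x

def g(z_d, z_c):
    """Decoder over the concatenated codes: g(Z_d, Z_c)."""
    return W_g @ np.concatenate([z_d, z_c])

def reconstruct(x):
    """Auto-encoding path: X_hat = g(f_d(X), f_c(X)) ~ X after training."""
    return g(f_d(x), f_c(x))

def translate(x1, x2_i):
    """Cross-domain mapping G(X_1) = g(f_d(x_2^i), f_c(X_1)):
    keep x1's content code, swap in the domain code of an
    observation x2_i from the target domain."""
    return g(f_d(x2_i), f_c(x1))

x1 = rng.normal(size=IMG_DIM)  # an image from domain X_1 (flattened)
x2 = rng.normal(size=IMG_DIM)  # an observation x_2^i from domain X_2

x_hat = reconstruct(x1)        # same shape as the input image
x_translated = translate(x1, x2)
print(x_translated.shape)      # (8,)
```

Note that a single decoder serves both reconstruction and translation; only the domain code fed into `g` changes, which is the point of the single encoder-decoder design.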