Paper title

Disentangling images with Lie group transformations and sparse coding

Authors

Ho Yin Chau, Frank Qiu, Yubei Chen, Bruno Olshausen

Abstract

Discrete spatial patterns and their continuous transformations are two important regularities contained in natural signals. Lie groups and representation theory are mathematical tools that have been used in previous works to model continuous image transformations. On the other hand, sparse coding is an important tool for learning dictionaries of patterns in natural signals. In this paper, we combine these ideas in a Bayesian generative model that learns to disentangle spatial patterns and their continuous transformations in a completely unsupervised manner. Images are modeled as a sparse superposition of shape components followed by a transformation that is parameterized by n continuous variables. The shape components and transformations are not predefined, but are instead adapted to learn the symmetries in the data, with the constraint that the transformations form a representation of an n-dimensional torus. Training the model on a dataset consisting of controlled geometric transformations of specific MNIST digits shows that it can recover these transformations along with the digits. Training on the full MNIST dataset shows that it can learn both the basic digit shapes and the natural transformations such as shearing and stretching that are contained in this data.
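
The generative structure described in the abstract (a sparse superposition of shape components, followed by a transformation driven by n continuous variables that forms a representation of an n-dimensional torus) can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the authors' implementation: it uses one standard way to construct a torus representation, T(s) = W R(s) Wᵀ with W orthogonal and R(s) block-diagonal with 2x2 rotations whose angles are integer combinations of s. The names `W`, `omega`, `Phi`, `alpha`, the array sizes, and the random values are all hypothetical; in the paper these quantities would be learned from data.

```python
import numpy as np

def torus_representation(W, omega, s):
    """Return T(s) = W R(s) W^T (assumed parameterization, see lead-in).

    W     : (D, D) orthogonal matrix, D even.
    omega : (D // 2, n) integer frequencies, one row per 2x2 rotation block.
    s     : (n,) continuous transformation parameters.
    """
    D = W.shape[0]
    angles = omega @ s                    # one rotation angle per 2x2 block
    c, sn = np.cos(angles), np.sin(angles)
    idx = np.arange(0, D, 2)              # top-left index of each block
    R = np.zeros((D, D))
    R[idx, idx] = c
    R[idx, idx + 1] = -sn
    R[idx + 1, idx] = sn
    R[idx + 1, idx + 1] = c
    return W @ R @ W.T

def generate_image(W, omega, Phi, alpha, s):
    """Sparse superposition Phi @ alpha, followed by the transformation T(s)."""
    return torus_representation(W, omega, s) @ (Phi @ alpha)

# Hypothetical sizes: D pixels, K shape components, n transformation variables.
D, K, n = 16, 5, 2
rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.standard_normal((D, D)))   # random orthogonal basis
omega = rng.integers(-3, 4, size=(D // 2, n))      # integer frequencies
Phi = rng.standard_normal((D, K))                  # stand-in for a learned dictionary
alpha = np.zeros(K)
alpha[[1, 3]] = [1.0, -0.5]                        # sparse code: two active components

image = generate_image(W, omega, Phi, alpha, s=np.array([0.3, 1.2]))
```

Because the frequencies in `omega` are integers, T(s) is 2π-periodic in every coordinate of s and satisfies T(s1) T(s2) = T(s1 + s2), which is what makes it a representation of the n-torus in this sketch.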
