Generative random latent features models and statistics of natural images

Paper title

Generative random latent features models and statistics of natural images

Authors

Fleig, Philipp; Nemenman, Ilya

Abstract

Complex, multivariable systems are often analyzed by grouping their constituent units into components, sometimes referred to as latent features, which afford physical or biological interpretation. However, many different types of latent features and data decompositions can be defined a priori, and one typically uses trial and error to determine a decomposition that is natural to the system and its data. It is highly desirable to develop a principled understanding of which decomposition is appropriate for a given data set. In this work, we take a step in this direction and argue that sample-sample correlations in the data carry important information to this effect. To this end, we construct a generative random latent feature matrix model of large data based on linear mixing of latent features. A key ingredient of our model is that we allow for statistical dependence between the mixing coefficients, and we argue that the model captures characteristic properties found in many types of natural data. Latent dimensionality and correlation patterns of the data are controlled by only two model parameters. The model's data patterns include (overlapping) clusters, sparse mixing, and constrained (non-negative) mixing. We describe the characteristic correlation and eigenvalue distributions of each pattern. Finally, we fit the model to correlation data from natural images and find a near-perfect match with the sparse mixing regime of our model. This finding is in line with the well-known sparse coding structure of natural scene images and provides information about the appropriate data decomposition, namely a sparse coding scheme. We believe that our work will deliver similar insights for diverse data from biological systems.
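To make the idea of reading sample-sample correlations concrete, here is a minimal toy sketch in plain Python. It is an assumption-laden stand-in, not the authors' generative model: each sample is a linear mixture of randomly chosen Gaussian latent feature vectors, and "sparse mixing" simply means each sample uses only `k_active` of the `n_features` latent features (the only dependence between mixing coefficients is this shared choice of active set). The function names `make_data`, `pearson`, `sample_correlations`, and `near_zero_fraction` and the parameter `k_active` are hypothetical. The sketch only contrasts the distribution of off-diagonal sample-sample correlations for sparse versus dense mixing; it does not compute eigenvalue spectra or fit anything.

```python
import math
import random


def make_data(n_samples, n_features, dim, k_active, seed=0):
    """Toy linear mixing (hypothetical, not the paper's model): each sample is a
    Gaussian-weighted sum of k_active latent features chosen at random out of
    n_features. k_active == n_features gives dense mixing; small k_active is sparse."""
    rng = random.Random(seed)
    # Latent features: independent Gaussian vectors of length dim.
    features = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_features)]
    samples = []
    for _ in range(n_samples):
        active = rng.sample(range(n_features), k_active)  # which features this sample mixes
        x = [0.0] * dim
        for k in active:
            w = rng.gauss(0.0, 1.0)  # mixing coefficient for feature k
            f = features[k]
            for d in range(dim):
                x[d] += w * f[d]
        samples.append(x)
    return samples


def pearson(u, v):
    """Pearson correlation between two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def sample_correlations(samples):
    """Upper-triangle (off-diagonal) entries of the sample-sample correlation matrix."""
    n = len(samples)
    return [pearson(samples[i], samples[j]) for i in range(n) for j in range(i + 1, n)]


def near_zero_fraction(corrs, tol=0.1):
    """Fraction of correlations whose magnitude is below tol."""
    return sum(1 for c in corrs if abs(c) < tol) / len(corrs)


if __name__ == "__main__":
    sparse = near_zero_fraction(sample_correlations(make_data(60, 20, 400, k_active=2)))
    dense = near_zero_fraction(sample_correlations(make_data(60, 20, 400, k_active=20)))
    print(f"near-zero fraction, sparse mixing: {sparse:.2f}")
    print(f"near-zero fraction, dense mixing:  {dense:.2f}")
```

In this toy setting, two sparsely mixed samples usually share no active feature, so most of their correlations sit near zero, whereas densely mixed samples share all features and their correlations spread out more widely. That qualitative difference in the correlation distribution is the kind of signature the abstract proposes to exploit; the specific generator and threshold here are illustrative choices only.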
