Paper Title
MHVAE: a Human-Inspired Deep Hierarchical Generative Model for Multimodal Representation Learning
Paper Authors
Paper Abstract
Humans are able to create rich representations of their external reality. Their internal representations allow for cross-modality inference, where available perceptions can induce the perceptual experience of missing input modalities. In this paper, we contribute the Multimodal Hierarchical Variational Auto-encoder (MHVAE), a hierarchical multimodal generative model for representation learning. Inspired by human cognitive models, the MHVAE is able to learn modality-specific distributions for an arbitrary number of modalities, as well as a joint-modality distribution responsible for cross-modality inference. We formally derive the model's evidence lower bound and propose a novel methodology to approximate the joint-modality posterior, based on modality-specific representation dropout. We evaluate the MHVAE on standard multimodal datasets. Our model performs on par with other state-of-the-art generative models on joint-modality reconstruction from arbitrary input modalities and on cross-modality inference.
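The abstract does not spell out the model's factorization, so as a point of reference, the sketch below shows the evidence lower bound for one plausible hierarchical structure: a joint-modality latent z_c on top of modality-specific latents z_1..z_M, with per-modality encoders feeding a joint posterior. This is an assumed factorization for illustration, not the paper's exact derivation.

```latex
% Assumed generative and inference factorizations (illustrative only):
%   p(x_{1:M}, z_{1:M}, z_c) = p(z_c) \prod_{m} p(z_m \mid z_c)\, p(x_m \mid z_m)
%   q(z_{1:M}, z_c \mid x_{1:M}) = q(z_c \mid z_{1:M}) \prod_{m} q(z_m \mid x_m)
\begin{aligned}
\log p(x_{1:M}) \;\ge\; \mathcal{L}
  &= \sum_{m=1}^{M} \mathbb{E}_{q}\big[\log p(x_m \mid z_m)\big]
   \;-\; \mathbb{E}_{q}\!\left[\log \frac{q(z_c \mid z_{1:M})}{p(z_c)}\right] \\
  &\quad\;-\; \sum_{m=1}^{M} \mathbb{E}_{q}\!\left[\log \frac{q(z_m \mid x_m)}{p(z_m \mid z_c)}\right],
\end{aligned}
```

where q denotes the full approximate posterior. The first term rewards per-modality reconstruction, while the remaining terms regularize the joint-modality and modality-specific posteriors toward their respective priors.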
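The modality-specific representation dropout is likewise described only at a high level. The following is a minimal PyTorch sketch of one way such a mechanism could work: entire modality representations are randomly zeroed before being fused into the joint-modality posterior, so the model learns to infer the joint latent from partial inputs. The names (JointPosterior, fuse, p_drop) and the concatenation-based fusion are hypothetical, not the paper's architecture.

```python
import torch
import torch.nn as nn


class JointPosterior(nn.Module):
    """Joint-modality Gaussian posterior with modality-specific
    representation dropout (illustrative sketch, assumed design)."""

    def __init__(self, num_modalities: int, rep_dim: int,
                 joint_dim: int, p_drop: float = 0.5):
        super().__init__()
        self.p_drop = p_drop
        # Map the concatenated modality representations to the mean and
        # log-variance of the joint-modality posterior.
        self.fuse = nn.Linear(num_modalities * rep_dim, 2 * joint_dim)

    def forward(self, reps):
        # reps: list of (batch, rep_dim) tensors, one per modality encoder.
        if self.training:
            masked = []
            for r in reps:
                # Zero out an entire modality's representation per sample,
                # so the fused posterior learns to cope with missing inputs.
                keep = (torch.rand(r.size(0), 1, device=r.device)
                        > self.p_drop).float()
                masked.append(r * keep)
            reps = masked
        h = torch.cat(reps, dim=-1)
        mu, logvar = self.fuse(h).chunk(2, dim=-1)
        return mu, logvar


# Usage: fuse two modality representations and sample the joint latent.
joint = JointPosterior(num_modalities=2, rep_dim=16, joint_dim=8)
reps = [torch.randn(4, 16), torch.randn(4, 16)]
mu, logvar = joint(reps)
z_c = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
```

At test time no representations are dropped; a genuinely missing modality can instead be zero-masked explicitly, mirroring the training-time dropout, which is what would enable cross-modality inference from arbitrary subsets of inputs under this design.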