Paper Title
MHVAE: a Human-Inspired Deep Hierarchical Generative Model for Multimodal Representation Learning
Paper Authors
Paper Abstract
Humans are able to create rich representations of their external reality. Their internal representations allow for cross-modality inference, where available perceptions can induce the perceptual experience of missing input modalities. In this paper, we contribute the Multimodal Hierarchical Variational Auto-encoder (MHVAE), a hierarchical multimodal generative model for representation learning. Inspired by human cognitive models, the MHVAE is able to learn modality-specific distributions for an arbitrary number of modalities, as well as a joint-modality distribution responsible for cross-modality inference. We formally derive the model's evidence lower bound and propose a novel methodology to approximate the joint-modality posterior, based on modality-specific representation dropout. We evaluate the MHVAE on standard multimodal datasets. Our model performs on par with other state-of-the-art generative models on joint-modality reconstruction from arbitrary input modalities and on cross-modality inference.
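The abstract does not spell out the model's factorization, so as a point of reference, the sketch below shows the evidence lower bound for one plausible hierarchical structure: a joint-modality latent z_c on top of modality-specific latents z_1..z_M, with per-modality encoders feeding a joint posterior. This is an assumed factorization for illustration, not the paper's exact derivation.

```latex
% Assumed generative and inference factorizations (illustrative only):
%   p(x_{1:M}, z_{1:M}, z_c) = p(z_c) \prod_{m} p(z_m \mid z_c)\, p(x_m \mid z_m)
%   q(z_{1:M}, z_c \mid x_{1:M}) = q(z_c \mid z_{1:M}) \prod_{m} q(z_m \mid x_m)
\begin{aligned}
\log p(x_{1:M}) \;\ge\; \mathcal{L}
  &= \sum_{m=1}^{M} \mathbb{E}_{q}\big[\log p(x_m \mid z_m)\big]
   \;-\; \mathbb{E}_{q}\!\left[\log \frac{q(z_c \mid z_{1:M})}{p(z_c)}\right] \\
  &\quad\;-\; \sum_{m=1}^{M} \mathbb{E}_{q}\!\left[\log \frac{q(z_m \mid x_m)}{p(z_m \mid z_c)}\right],
\end{aligned}
```

where q denotes the full approximate posterior. The first term rewards per-modality reconstruction, while the remaining terms regularize the joint-modality and modality-specific posteriors toward their respective priors.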
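The modality-specific representation dropout is likewise described only at a high level. The following is a minimal PyTorch sketch of one way such a mechanism could work: entire modality representations are randomly zeroed before being fused into the joint-modality posterior, so the model learns to infer the joint latent from partial inputs. The names (JointPosterior, fuse, p_drop) and the concatenation-based fusion are hypothetical, not the paper's architecture.

```python
import torch
import torch.nn as nn


class JointPosterior(nn.Module):
    """Joint-modality Gaussian posterior with modality-specific
    representation dropout (illustrative sketch, assumed design)."""

    def __init__(self, num_modalities: int, rep_dim: int,
                 joint_dim: int, p_drop: float = 0.5):
        super().__init__()
        self.p_drop = p_drop
        # Map the concatenated modality representations to the mean and
        # log-variance of the joint-modality posterior.
        self.fuse = nn.Linear(num_modalities * rep_dim, 2 * joint_dim)

    def forward(self, reps):
        # reps: list of (batch, rep_dim) tensors, one per modality encoder.
        if self.training:
            masked = []
            for r in reps:
                # Zero out an entire modality's representation per sample,
                # so the fused posterior learns to cope with missing inputs.
                keep = (torch.rand(r.size(0), 1, device=r.device)
                        > self.p_drop).float()
                masked.append(r * keep)
            reps = masked
        h = torch.cat(reps, dim=-1)
        mu, logvar = self.fuse(h).chunk(2, dim=-1)
        return mu, logvar


# Usage: fuse two modality representations and sample the joint latent.
joint = JointPosterior(num_modalities=2, rep_dim=16, joint_dim=8)
reps = [torch.randn(4, 16), torch.randn(4, 16)]
mu, logvar = joint(reps)
z_c = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
```

At test time no representations are dropped; a genuinely missing modality can instead be zero-masked explicitly, mirroring the training-time dropout, which is what would enable cross-modality inference from arbitrary subsets of inputs under this design.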