Paper Title
How to Understand Masked Autoencoders
Paper Authors
Paper Abstract
"Masked Autoencoders (MAE) Are Scalable Vision Learners" revolutionizes the self-supervised learning method in that it not only achieves the state-of-the-art for image pre-training, but is also a milestone that bridges the gap between visual and linguistic masked autoencoding (BERT-style) pre-trainings. However, to our knowledge, to date there are no theoretical perspectives to explain the powerful expressivity of MAE. In this paper, we, for the first time, propose a unified theoretical framework that provides a mathematical understanding for MAE. Specifically, we explain the patch-based attention approaches of MAE using an integral kernel under a non-overlapping domain decomposition setting. To help the research community to further comprehend the main reasons of the great success of MAE, based on our framework, we pose five questions and answer them with mathematical rigor using insights from operator theory.