Paper Title
VMLoc: Variational Fusion For Learning-Based Multimodal Camera Localization
Paper Authors
Paper Abstract
Recent learning-based approaches have achieved impressive results in the field of single-shot camera localization. However, how best to fuse multiple modalities (e.g., image and depth) and how to handle degraded or missing inputs remain less well studied. In particular, we note that previous approaches to deep fusion do not perform significantly better than models employing a single modality. We conjecture that this is due to naive feature-space fusion via summation or concatenation, which fails to account for the different strengths of each modality. To address this, we propose an end-to-end framework, termed VMLoc, that fuses different sensor inputs into a common latent space through a variational Product-of-Experts (PoE) followed by attention-based fusion. Unlike previous multimodal variational works, which directly adapt the objective function of the vanilla variational auto-encoder, we show how camera localization can be accurately estimated through an unbiased objective function based on importance weighting. Our model is extensively evaluated on RGB-D datasets, and the results demonstrate its efficacy. The source code is available at https://github.com/kaichen-z/VMLoc.
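To make the abstract's fusion mechanism concrete, below is a minimal sketch of Gaussian Product-of-Experts fusion, the standard way of merging per-modality variational posteriors into one latent distribution. The tensor shapes, function name, and the prior-expert convention are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
import torch

def product_of_experts(mu, logvar, eps=1e-8):
    """Fuse K modality posteriors N(mu_k, var_k) into a single Gaussian.

    mu, logvar: tensors of shape (K, batch, latent_dim).
    A standard-normal "prior expert" is prepended, so the fused posterior
    stays well defined when a modality is missing (simply drop its row).
    """
    prior_mu = torch.zeros_like(mu[:1])
    prior_logvar = torch.zeros_like(logvar[:1])  # logvar 0 -> N(0, I)
    mu = torch.cat([prior_mu, mu], dim=0)
    logvar = torch.cat([prior_logvar, logvar], dim=0)

    # Product of Gaussians: precisions add; means combine precision-weighted.
    precision = torch.exp(-logvar) + eps          # 1 / sigma_k^2
    fused_var = 1.0 / precision.sum(dim=0)
    fused_mu = fused_var * (mu * precision).sum(dim=0)
    return fused_mu, torch.log(fused_var)
```

The abstract's "objective function based on importance weighting" suggests an IWAE-style bound over K latent samples. A hedged sketch of such a loss is given below, assuming Gaussian reparameterization and a pose log-likelihood term; the argument names are hypothetical.

```python
import math

def importance_weighted_loss(log_p_pose, log_p_z, log_q_z):
    """IWAE-style bound with K importance samples.

    Each argument has shape (K, batch): per-sample pose log-likelihood,
    log prior density, and log posterior density of the sampled latents.
    Returns the negative bound, averaged over the batch.
    """
    log_w = log_p_pose + log_p_z - log_q_z        # unnormalized log-weights
    k = log_w.shape[0]
    bound = torch.logsumexp(log_w, dim=0) - math.log(k)
    return -bound.mean()
```

With K = 1 this reduces to a standard ELBO-style objective; larger K tightens the bound, which is the usual motivation for importance weighting in variational models.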