Co-visual pattern augmented generative transformer learning for automobile geo-localization

Paper title

Co-visual pattern augmented generative transformer learning for automobile geo-localization

Authors

Jianwei Zhao, Qiang Zhai, Pengbo Zhao, Rui Huang, Hong Cheng

Abstract

Geolocation is a fundamental component of route planning and navigation for unmanned vehicles, but GNSS-based geolocation fails under denial-of-service conditions. Cross-view geo-localization (CVGL) aims to estimate the geographical location of a ground-level camera by matching its image against a large set of geo-tagged aerial (e.g., satellite) images. It has received considerable attention but remains extremely challenging because of the drastic appearance differences between aerial and ground views. Existing methods extract global representations of the different views primarily with Siamese-like architectures, but seldom take the benefits of interaction between the views into account. In this paper, we present a novel approach for CVGL that combines cross-view knowledge generation with transformers, namely mutual generative transformer learning (MGTL). Specifically, starting from the initial representations produced by the backbone network, MGTL develops two separate generative sub-modules, one that generates aerial-aware knowledge from ground-view semantics and one that does the reverse, and exploits their mutual benefits through the attention mechanism. Moreover, to better capture the co-visual relationships between aerial and ground views, we introduce a cascaded attention masking algorithm to further boost accuracy. Extensive experiments on challenging public benchmarks, i.e., CVACT and CVUSA, demonstrate the effectiveness of the proposed method, which sets new records compared with existing state-of-the-art models.
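The abstract describes two sub-modules that exchange information between views through attention. The sketch below is only an illustration of that general idea, not the authors' implementation: it uses plain single-head scaled dot-product cross-attention with no learned projections, and the function names and toy token values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product cross-attention (assumption: no learned
    query/key/value projections, unlike a real transformer layer).

    Each query token is replaced by an attention-weighted average of the
    tokens from the other view.
    """
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        out.append([
            sum(w * kv[d] for w, kv in zip(weights, keys_values))
            for d in range(dim)
        ])
    return out


# Hypothetical toy features: 2 ground-view tokens and 3 aerial-view tokens.
ground_tokens = [[1.0, 0.0], [0.0, 1.0]]
aerial_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Ground queries attend to aerial tokens, and vice versa.
aerial_aware_ground = cross_attention(ground_tokens, aerial_tokens)
ground_aware_aerial = cross_attention(aerial_tokens, ground_tokens)
```

In this sketch each view keeps its own token count, and only the content of each token is mixed with information from the other view. The cascaded attention masking mentioned in the abstract is not modeled here.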
