Paper Title

Spatiotemporal Transformer Attention Network for 3D Voxel Level Joint Segmentation and Motion Prediction in Point Cloud

Paper Authors

Zhensong Wei, Xuewei Qi, Zhengwei Bai, Guoyuan Wu, Saswat Nayak, Peng Hao, Matthew Barth, Yongkang Liu, Kentaro Oguchi

Paper Abstract

Environment perception tasks, including detection, classification, tracking, and motion prediction, are key enablers for automated driving systems and intelligent transportation applications. Fueled by advances in sensing technologies and machine learning techniques, LiDAR-based sensing systems have become a promising solution. The current challenges of this solution are how to effectively combine different perception tasks into a single backbone and how to efficiently learn spatiotemporal features directly from point cloud sequences. In this research, we propose a novel spatiotemporal attention network based on a transformer self-attention mechanism for joint semantic segmentation and motion prediction within a point cloud at the voxel level. The network is trained to simultaneously output the voxel-level class and predicted motion by learning directly from a sequence of point cloud datasets. The proposed backbone includes both a temporal attention module (TAM) and a spatial attention module (SAM) to learn and extract complex spatiotemporal features. This approach has been evaluated with the nuScenes dataset, and promising performance has been achieved.
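
The abstract describes a backbone in which a temporal attention module (TAM) and a spatial attention module (SAM) feed two voxel-level heads, one for class labels and one for predicted motion. The following is a minimal PyTorch sketch of that overall structure, assuming voxelized bird's-eye-view (BEV) input frames; the channel counts, number of attention heads, mean-pooling temporal fusion, and prediction horizon are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a TAM + SAM backbone with
# voxel-level classification and motion heads. All shapes and sizes are assumptions.
import torch
import torch.nn as nn


class TemporalAttentionModule(nn.Module):
    """Self-attention across the T frames at every BEV cell (assumed design)."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):            # x: (B, T, C, H, W)
        b, t, c, h, w = x.shape
        seq = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)  # one sequence per cell
        out, _ = self.attn(seq, seq, seq)
        seq = self.norm(seq + out)
        return seq.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)


class SpatialAttentionModule(nn.Module):
    """Self-attention over the H*W grid of the temporally fused feature map (assumed design)."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):            # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)                        # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)
        seq = self.norm(seq + out)
        return seq.transpose(1, 2).reshape(b, c, h, w)


class SpatiotemporalAttentionNet(nn.Module):
    """Joint voxel-level segmentation and motion prediction (illustrative sizes)."""
    def __init__(self, in_channels=13, channels=64, num_classes=5, horizon=3):
        super().__init__()
        self.embed = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.tam = TemporalAttentionModule(channels)
        self.sam = SpatialAttentionModule(channels)
        self.cls_head = nn.Conv2d(channels, num_classes, 1)       # per-voxel class logits
        self.motion_head = nn.Conv2d(channels, 2 * horizon, 1)    # per-voxel future (dx, dy)

    def forward(self, frames):       # frames: (B, T, in_channels, H, W) voxelized BEV sequence
        b, t, c, h, w = frames.shape
        feat = self.embed(frames.reshape(b * t, c, h, w)).reshape(b, t, -1, h, w)
        feat = self.tam(feat).mean(dim=1)                         # fuse the temporal axis
        feat = self.sam(feat)
        return self.cls_head(feat), self.motion_head(feat)


if __name__ == "__main__":
    net = SpatiotemporalAttentionNet()
    x = torch.randn(1, 5, 13, 64, 64)      # toy 5-frame BEV sequence
    cls_logits, motion = net(x)
    print(cls_logits.shape, motion.shape)  # (1, 5, 64, 64), (1, 6, 64, 64)
```

Both heads share the same attended feature map, which is one simple way to realize the "single backbone for multiple perception tasks" idea highlighted in the abstract; the actual fusion of TAM and SAM outputs in the paper may differ.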
