Paper Title

Efficient Video Instance Segmentation via Tracklet Query and Proposal

Authors

Jialian Wu, Sudhir Yarram, Hui Liang, Tian Lan, Junsong Yuan, Jayan Eledath, Gerard Medioni

Abstract

Video Instance Segmentation (VIS) aims to simultaneously classify, segment, and track multiple object instances in videos. Recent clip-level VIS methods take a short video clip as input each time and show stronger performance than frame-level VIS (tracking-by-segmentation), as more temporal context from multiple frames is utilized. Yet, most clip-level methods are neither end-to-end learnable nor real-time. These limitations are addressed by the recent VIS transformer (VisTR), which performs VIS end-to-end within a clip. However, VisTR suffers from long training time due to its frame-wise dense attention. In addition, VisTR is not fully end-to-end learnable across multiple video clips, as it requires a hand-crafted data association to link instance tracklets between successive clips. This paper proposes EfficientVIS, a fully end-to-end framework with efficient training and inference. At its core are tracklet query and tracklet proposal, which associate and segment regions of interest (RoIs) across space and time through an iterative query-video interaction. We further propose a correspondence learning scheme that makes tracklet linking between clips end-to-end learnable. Compared to VisTR, EfficientVIS requires 15x fewer training epochs while achieving state-of-the-art accuracy on the YouTube-VIS benchmark. Meanwhile, our method enables whole-video instance segmentation in a single end-to-end pass without any data association.
