TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model

Paper Title

TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model

Authors

Pang, Bo; Li, Yizhuo; Zhang, Yifan; Li, Muchen; Lu, Cewu

Abstract

Multi-object tracking is a fundamental vision problem that has been studied for a long time. As deep learning has brought excellent performance to object detection algorithms, Tracking by Detection (TBD) has become the mainstream tracking framework. Despite the success of TBD, this two-step method is too complicated to train in an end-to-end manner and induces many challenges as well, such as insufficient exploration of video spatial-temporal information, vulnerability when facing object occlusion, and excessive reliance on detection results. To address these challenges, we propose a concise end-to-end model, TubeTK, which needs only one-step training by introducing the "bounding-tube" to indicate the temporal-spatial locations of objects in a short video clip. TubeTK provides a novel direction for multi-object tracking, and we demonstrate its potential to solve the above challenges without bells and whistles. We analyze the performance of TubeTK on several MOT benchmarks and provide empirical evidence that TubeTK can overcome occlusions to some extent without any ancillary technologies such as Re-ID. Compared with other methods that adopt private detection results, our one-stage end-to-end model achieves state-of-the-art performance even though it adopts no ready-made detection results. We hope that the proposed TubeTK model can serve as a simple but strong alternative for the video-based MOT task. The code and models are available at https://github.com/BoPang1996/TubeTK.
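The abstract describes the bounding-tube only as a way to indicate an object's temporal-spatial location within a short clip. The Python below is a minimal sketch of how such a structure could be represented. It rests on two assumptions that are not stated in the abstract: a tube is parameterized by three key boxes (at its start, middle, and end frames) with linear interpolation between them, and the overlap of two tubes is measured as the mean per-frame box IoU over the union of their frame spans. The names `BoundingTube`, `box_iou`, and `tube_iou` are hypothetical and do not come from the released TubeTK code.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class BoundingTube:
    """Hypothetical bounding-tube: three key boxes, linearly interpolated.

    Assumption for illustration: the tube spans frames t_start..t_end and
    is fully determined by its boxes at t_start, t_mid, and t_end.
    """
    t_start: int
    t_mid: int
    t_end: int
    box_start: Box
    box_mid: Box
    box_end: Box

    def box_at(self, t: int) -> Box:
        """Return the interpolated box at frame t (must lie inside the tube)."""
        if not self.t_start <= t <= self.t_end:
            raise ValueError(f"frame {t} outside tube span [{self.t_start}, {self.t_end}]")
        # Choose the segment (start->mid or mid->end) that contains t.
        if t <= self.t_mid:
            t0, t1, b0, b1 = self.t_start, self.t_mid, self.box_start, self.box_mid
        else:
            t0, t1, b0, b1 = self.t_mid, self.t_end, self.box_mid, self.box_end
        # Linear interpolation weight within the segment (guard zero-length segments).
        alpha = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
        return tuple(a + alpha * (b - a) for a, b in zip(b0, b1))

    def boxes(self) -> Dict[int, Box]:
        """Expand the tube into one box per frame."""
        return {t: self.box_at(t) for t in range(self.t_start, self.t_end + 1)}


def box_iou(a: Box, b: Box) -> float:
    """Standard intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(p: BoundingTube, q: BoundingTube) -> float:
    """Hypothetical tube overlap: mean per-frame IoU over the union of frames.

    Frames covered by only one of the two tubes contribute an IoU of 0.
    """
    bp, bq = p.boxes(), q.boxes()
    frames = set(bp) | set(bq)
    total = sum(box_iou(bp[t], bq[t]) for t in frames if t in bp and t in bq)
    return total / len(frames)


if __name__ == "__main__":
    tube = BoundingTube(0, 2, 4, (0, 0, 10, 10), (2, 0, 12, 10), (4, 0, 14, 10))
    print(tube.box_at(1))  # (1.0, 0.0, 11.0, 10.0)
    print(tube_iou(tube, tube))  # 1.0
```

Describing a short trajectory with a handful of key boxes keeps the representation compact. Measuring overlap over the union of frames penalizes tubes that agree spatially but cover different time spans, which is one plausible way to compare tubes when linking them across clips.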
