Paper Title

XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

Paper Authors

Cheng, Ho Kei; Schwing, Alexander G.

Paper Abstract

We present XMem, a video object segmentation architecture for long videos with unified feature memory stores inspired by the Atkinson-Shiffrin memory model. Prior work on video object segmentation typically only uses one type of feature memory. For videos longer than a minute, a single feature memory model tightly links memory consumption and accuracy. In contrast, following the Atkinson-Shiffrin model, we develop an architecture that incorporates multiple independent yet deeply-connected feature memory stores: a rapidly updated sensory memory, a high-resolution working memory, and a compact thus sustained long-term memory. Crucially, we develop a memory potentiation algorithm that routinely consolidates actively used working memory elements into the long-term memory, which avoids memory explosion and minimizes performance decay for long-term prediction. Combined with a new memory reading mechanism, XMem greatly exceeds state-of-the-art performance on long-video datasets while being on par with state-of-the-art methods (that do not work on long videos) on short-video datasets. Code is available at https://hkchengrex.github.io/XMem.
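As a rough illustration of the consolidation idea described in the abstract, below is a minimal Python sketch. It is a hypothetical simplification, not the paper's implementation: each memory element here carries a usage counter updated by attention-style reads, and consolidation promotes the most actively used older working-memory elements into a bounded long-term store, evicting the least-used entries when long-term capacity is exceeded. XMem itself condenses high-usage candidates into compact prototypes and also maintains a sensory memory, which is omitted here; the class name MemoryStoreSketch, the capacities, and the top-quarter retention rule are all illustrative assumptions.

```python
import numpy as np

class MemoryStoreSketch:
    """Illustrative two-store memory in the spirit of XMem.

    Hypothetical simplification: elements are (key, value) pairs with a
    usage counter; consolidation keeps the most-read older elements.
    The paper's method instead condenses candidates into prototypes.
    """

    def __init__(self, working_capacity=50, long_term_capacity=500,
                 num_recent_kept=10):
        self.working = []    # list of {"key", "value", "usage"}
        self.long_term = []
        self.working_capacity = working_capacity
        self.long_term_capacity = long_term_capacity
        self.num_recent_kept = num_recent_kept

    def add_working(self, key, value):
        """Insert one frame's features; consolidate when over capacity."""
        self.working.append({"key": key, "value": value, "usage": 0.0})
        if len(self.working) > self.working_capacity:
            self.consolidate()

    def read(self, query):
        """Attention-style read over working + long-term memory.

        The softmax weights double as the usage signal, so elements
        that are actively attended to accumulate higher usage scores.
        """
        entries = self.working + self.long_term
        keys = np.stack([e["key"] for e in entries])      # (N, D)
        values = np.stack([e["value"] for e in entries])  # (N, D)
        scores = keys @ query                             # (N,)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        for e, w in zip(entries, weights):
            e["usage"] += float(w)
        return weights @ values                           # (D,)

    def consolidate(self):
        """Promote actively used older elements into long-term memory."""
        old = self.working[:-self.num_recent_kept]
        recent = self.working[-self.num_recent_kept:]
        old.sort(key=lambda e: e["usage"], reverse=True)
        self.long_term.extend(old[:max(1, len(old) // 4)])  # keep top quarter
        self.working = recent
        # Bound long-term memory by evicting the least-used elements,
        # so total memory no longer grows with video length.
        if len(self.long_term) > self.long_term_capacity:
            self.long_term.sort(key=lambda e: e["usage"], reverse=True)
            self.long_term = self.long_term[:self.long_term_capacity]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    store = MemoryStoreSketch()
    for _ in range(200):  # one (key, value) pair per video frame
        store.add_working(rng.normal(size=64), rng.normal(size=64))
        _ = store.read(rng.normal(size=64))
    print(len(store.working), len(store.long_term))
```

The point of the sketch is the design choice the abstract highlights: because the long-term store is capacity-bounded and fed only by consolidation, total memory use is decoupled from video length, while frequently attended elements survive long enough to support long-term prediction.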
