Paper Title


VCT: A Video Compression Transformer

Authors

Fabian Mentzer, George Toderici, David Minnen, Sung-Jin Hwang, Sergi Caelles, Mario Lucic, Eirikur Agustsson

Abstract


We show how transformers can be used to vastly simplify neural video compression. Previous methods have been relying on an increasing number of architectural biases and priors, including motion prediction and warping operations, resulting in complex models. Instead, we independently map input frames to representations and use a transformer to model their dependencies, letting it predict the distribution of future representations given the past. The resulting video compression transformer outperforms previous methods on standard video compression data sets. Experiments on synthetic data show that our model learns to handle complex motion patterns such as panning, blurring and fading purely from data. Our approach is easy to implement, and we release code to facilitate future research.
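The core idea in the abstract can be sketched in a few lines: encode each frame to tokens independently, then use a temporal model to predict the distribution of the current frame's tokens given the previous frame's tokens; the cross-entropy of that prediction is the bitrate paid to entropy-code the frame. The sketch below is a hypothetical illustration, not the authors' released code: the per-frame encoder and the predictor (a transformer in the paper, a trivial "copy the past" prior here) are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8    # size of the discrete token alphabet (assumed)
TOKENS = 16  # tokens per frame (assumed)

def encode_frame(frame_id):
    """Stand-in for a per-frame encoder: each frame maps to tokens
    independently of all other frames."""
    return rng.integers(0, VOCAB, size=TOKENS)

def predict_dist(prev_tokens):
    """Stand-in for the temporal transformer: returns a predicted
    distribution over the vocabulary for each token of the next frame.
    Here, a uniform prior slightly sharpened toward the previous
    frame's token -- a crude 'copy the past' model."""
    probs = np.full((TOKENS, VOCAB), 1.0 / VOCAB)
    probs[np.arange(TOKENS), prev_tokens] += 0.5
    return probs / probs.sum(axis=1, keepdims=True)

frames = [encode_frame(t) for t in range(3)]
total_bits = 0.0
for t in range(1, len(frames)):
    dist = predict_dist(frames[t - 1])
    # bits needed to entropy-code frame t under the predicted distribution
    p = dist[np.arange(TOKENS), frames[t]]
    total_bits += -np.log2(p).sum()

print(f"bits for {len(frames) - 1} predicted frames: {total_bits:.1f}")
```

The better the predictor matches the true token statistics, the fewer bits are spent; in the paper, replacing the toy prior with a transformer conditioned on past representations is what removes the need for explicit motion prediction and warping.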
