Cross Modal Compression: Towards Human-comprehensible Semantic Compression

Paper Title


Cross Modal Compression: Towards Human-comprehensible Semantic Compression

Authors

Jiguo Li, Chuanmin Jia, Xinfeng Zhang, Siwei Ma, Wen Gao

Abstract


Traditional image/video compression aims to reduce transmission/storage cost while keeping signal fidelity as high as possible. However, with the growing demand for machine analysis and semantic monitoring in recent years, semantic fidelity, rather than signal fidelity, is becoming another emerging concern in image/video compression. Building on recent advances in cross modal translation and generation, in this paper we propose cross modal compression (CMC), a semantic compression framework for visual data. CMC transforms highly redundant visual data (such as images and videos) into a compact, human-comprehensible domain (such as text, sketches, semantic maps, and attributes) while preserving the semantics. Specifically, we first formulate the CMC problem as a rate-distortion optimization problem. Second, we investigate its relationship with traditional image/video compression and recent feature compression frameworks, showing how CMC differs from these prior frameworks. Then we propose a novel paradigm for CMC to demonstrate its effectiveness. Qualitative and quantitative results show that the proposed CMC achieves encouraging reconstruction results at an ultra-high compression ratio, with better compression performance than the widely used JPEG baseline.
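To make the "ultra-high compression ratio" concrete, here is a back-of-the-envelope sketch. It is not the paper's experimental setup. It assumes a 256×256 image stored as uncompressed 8-bit RGB and uses a hypothetical one-sentence caption as the compact, human-comprehensible representation. It then compares the two payload sizes. The helper names (`raw_rgb_bytes`, `bits_per_pixel`, `compression_ratio`) and the caption are illustrative only.

```python
"""Hypothetical illustration of why a text representation of an image yields
a very high compression ratio. The image size, the caption, and the raw-RGB
baseline are assumptions, not the CMC paper's setup."""


def raw_rgb_bytes(width: int, height: int, channels: int = 3) -> int:
    """Size in bytes of an uncompressed image with 8 bits per channel."""
    return width * height * channels


def bits_per_pixel(payload_bytes: int, width: int, height: int) -> float:
    """Average number of transmitted bits per image pixel."""
    return payload_bytes * 8 / (width * height)


def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Original size divided by compressed size (higher means more compression)."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return original_bytes / compressed_bytes


if __name__ == "__main__":
    width, height = 256, 256  # assumed image resolution
    caption = "a brown dog runs across a grassy field"  # hypothetical caption
    caption_bytes = len(caption.encode("utf-8"))  # plain UTF-8, no entropy coding

    raw = raw_rgb_bytes(width, height)
    print(f"raw image: {raw} bytes")
    print(f"caption:   {caption_bytes} bytes")
    print(f"bpp:       {bits_per_pixel(caption_bytes, width, height):.4f}")
    print(f"ratio:     {compression_ratio(raw, caption_bytes):.0f}x")
```

With these assumed numbers, the 38-byte caption stands in for 196,608 raw bytes. That is a ratio of roughly 5,174:1, or about 0.0046 bits per pixel. The sketch deliberately ignores entropy coding of the text and the size of the decoder/generator. It also ignores that the reconstruction is only semantically, not pixel-wise, faithful to the input, so signal-fidelity metrics such as PSNR are not the right yardstick here.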
