Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules

Paper title

Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules

Authors

Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

Abstract

Image compression is a fundamental research field, and many well-known compression standards have been developed over the past decades. Recently, learned compression methods have shown rapid development with promising results. However, there is still a performance gap between learned compression algorithms and the reigning compression standards, especially in terms of the widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We have found that accurate entropy models for rate estimation largely affect the optimization of network parameters and thus affect the rate-distortion performance. Therefore, we propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which can achieve a more accurate and flexible entropy model. Besides, we take advantage of recent attention modules and incorporate them into the network architecture to enhance the performance. Experimental results demonstrate that our proposed method achieves state-of-the-art performance compared to existing learned compression methods on both Kodak and high-resolution datasets. To our knowledge, our approach is the first work to achieve comparable performance with the latest compression standard, Versatile Video Coding (VVC), regarding PSNR. More importantly, our approach generates more visually pleasant results when optimized by MS-SSIM. The project page is at https://github.com/ZhengxueCheng/Learned-Image-Compression-with-GMM-and-Attention
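
To make the phrase "discretized Gaussian Mixture Likelihoods" concrete, the following is a minimal sketch of how such a likelihood could be evaluated for a single quantized latent value. It is not taken from the authors' code: it assumes latents are rounded to integers, so that each value's probability is the mixture's mass on the unit-width bin around it, and every function and parameter name here (normal_cdf, discretized_gmm_likelihood, estimated_bits, floor) is hypothetical. In a real codec the weights, means, and scales would be predicted by the network; here they are plain lists.

```python
import math


def normal_cdf(x, mean, scale):
    """Cumulative distribution function of a Gaussian with the given mean and scale."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def discretized_gmm_likelihood(y_hat, weights, means, scales, floor=1e-9):
    """Probability mass a Gaussian mixture assigns to the bin [y_hat - 0.5, y_hat + 0.5].

    Assumption: the latent was quantized by rounding to the nearest integer.
    The weights are expected to be non-negative and to sum to 1.
    """
    mass = 0.0
    for w, mu, s in zip(weights, means, scales):
        mass += w * (normal_cdf(y_hat + 0.5, mu, s) - normal_cdf(y_hat - 0.5, mu, s))
    # A small floor (an illustrative choice) keeps the logarithm below finite.
    return max(mass, floor)


def estimated_bits(y_hat, weights, means, scales):
    """Estimated code length in bits for one quantized latent value."""
    return -math.log2(discretized_gmm_likelihood(y_hat, weights, means, scales))


if __name__ == "__main__":
    # Illustrative two-component mixture; the numbers are made up.
    weights, means, scales = [0.3, 0.7], [-1.2, 2.5], [0.8, 1.5]
    for y in (-1, 0, 2, 8):
        print(y, round(estimated_bits(y, weights, means, scales), 3))
```

Values near a mixture mean receive high probability and therefore few estimated bits, while values far from every component cost more. This is the sense in which a more accurate entropy model lowers the estimated rate.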

Paper title