Paper Title

Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

Authors

Peng Jin, Jinfa Huang, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David A. Clifton, Jie Chen

Abstract

Most video-and-language representation learning approaches employ contrastive learning, e.g., CLIP, to project the video and text features into a common latent space according to the semantic similarities of text-video pairs. However, such learned shared latent spaces are often not optimal, and the modality gap between visual and textual representations cannot be fully eliminated. In this paper, we propose Expectation-Maximization Contrastive Learning (EMCL) to learn compact video-and-language representations. Specifically, we use the Expectation-Maximization algorithm to find a compact set of bases for the latent space, where the features can be concisely represented as linear combinations of these bases. Such feature decomposition of video-and-language representations reduces the rank of the latent space, resulting in increased representing power for the semantics. Extensive experiments on three benchmark text-video retrieval datasets prove that our EMCL can learn more discriminative video-and-language representations than previous methods and significantly outperforms previous state-of-the-art methods across all metrics. More encouragingly, the proposed method can be applied to boost the performance of existing approaches, either as a jointly-trained layer or as an out-of-the-box inference module with no extra training, making it easy to incorporate into any existing method.
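The basis-finding idea in the abstract can be illustrated with a minimal NumPy sketch. This is a hypothetical reconstruction, not the paper's implementation: it assumes a softmax soft-assignment E-step and a responsibility-weighted-average M-step (as in EM-style attention methods), and all names (`emcl_bases`, `tau`, `K`) are placeholders. The point is that re-expressing features as linear combinations of K learned bases yields a reconstruction whose rank is at most K, i.e., a more compact latent space.

```python
import numpy as np

def emcl_bases(X, K=8, iters=4, tau=0.05):
    """EM-style estimation of K compact bases for features X of shape (N, D).

    Hypothetical sketch: E-step = softmax soft-assignment of features to
    bases, M-step = responsibility-weighted mean of features. The paper's
    exact update rules may differ.
    """
    rng = np.random.default_rng(0)
    mu = rng.standard_normal((K, X.shape[1]))
    mu /= np.linalg.norm(mu, axis=1, keepdims=True)
    for _ in range(iters):
        # E-step: responsibility of each basis for each feature
        z = np.exp(X @ mu.T / tau)
        z /= z.sum(axis=1, keepdims=True)
        # M-step: re-estimate bases as responsibility-weighted means
        mu = z.T @ X
        mu /= np.linalg.norm(mu, axis=1, keepdims=True)
    # Re-express features as linear combinations of the K bases; the
    # reconstruction z @ mu has rank at most K (< D), i.e. a compact space.
    z = np.exp(X @ mu.T / tau)
    z /= z.sum(axis=1, keepdims=True)
    return mu, z @ mu

# Toy usage on random L2-normalized features
rng = np.random.default_rng(1)
X = rng.standard_normal((32, 16))
X /= np.linalg.norm(X, axis=1, keepdims=True)
mu, X_rec = emcl_bases(X)
print(X_rec.shape)  # (32, 16)
```

Because the reconstruction is a product of an (N, K) assignment matrix and a (K, D) basis matrix, its rank is bounded by K regardless of the ambient dimension D, which is the rank-reduction effect the abstract describes.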
