Paper title
Online Learnable Keyframe Extraction in Videos and its Application with Semantic Word Vector in Action Recognition
Paper authors
Paper abstract
Video processing has become a popular research direction in computer vision because of applications such as video summarization and action recognition. Recently, deep learning-based methods have achieved impressive results in action recognition. However, these methods must process a full video sequence to recognize an action, even though most of its frames are similar to one another and not essential for recognizing that action. These non-essential frames also increase the computational cost and can confuse an action recognition method. In contrast, the important frames, called keyframes, not only help in recognizing an action but also reduce the processing time of each video sequence for classification and for other applications such as summarization. Moreover, current video processing methods have not yet been demonstrated in an online fashion. Motivated by the above, we propose an online learnable module for keyframe extraction. This module can be used to select key-shots in a video and can thus be applied to video summarization. The extracted keyframes can be used as input to any deep learning-based classification model to recognize actions. We also propose a plugin module that uses a semantic word vector as input along with the keyframes, and a novel train/test strategy for the classification models. To the best of our knowledge, this is the first time such an online module and train/test strategy have been proposed. Experiments on many commonly used video summarization and action recognition datasets show impressive results with the proposed module.