Paper Title
Semantic-Aware Pretraining for Dense Video Captioning
Authors
Abstract
This report describes the details of our approach for the event dense-captioning task in ActivityNet Challenge 2021. We present a semantic-aware pretraining method for dense video captioning, which empowers the learned features to recognize high-level semantic concepts. Diverse video features of different modalities are fed into an event captioning module to generate accurate and meaningful sentences. Our final ensemble model achieves a 10.00 METEOR score on the test set.