Paper Title
A Dataset and Benchmarks for Multimedia Social Analysis
Paper Authors
Abstract
We present a new publicly available dataset with the goal of advancing multi-modality learning by offering vision and language data within the same context. This is achieved by obtaining data from a social media website with posts containing multiple paired images/videos and text, along with comment trees containing images/videos and/or text. With a total of 677k posts, 2.9 million post images, 488k post videos, 1.4 million comment images, 4.6 million comment videos, and 96.9 million comments, data from different modalities can be jointly used to improve performance on a variety of tasks such as image captioning, image classification, next frame prediction, sentiment analysis, and language modeling. We present a wide range of statistics for our dataset. Finally, we provide baseline performance analysis for one of the regression tasks using pre-trained models and several fully connected networks.
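The abstract does not specify the baseline architecture in detail. The following is a minimal sketch of one plausible setup, assuming a frozen ImageNet-pretrained ResNet-50 as the feature extractor and a small stack of fully connected layers regressing a scalar target; the actual backbone, head sizes, and regression target used in the paper may differ.

```python
import torch
import torch.nn as nn
from torchvision import models

class BaselineRegressor(nn.Module):
    """Pre-trained image encoder followed by a fully connected head that
    regresses a scalar target (hypothetical example: a post engagement score)."""

    def __init__(self, hidden_dims=(512, 128)):
        super().__init__()
        # Frozen ResNet-50 backbone used purely as a feature extractor (assumption).
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        backbone.fc = nn.Identity()          # expose the 2048-d pooled features
        for p in backbone.parameters():
            p.requires_grad = False
        self.backbone = backbone

        # Fully connected regression head trained on top of the frozen features.
        layers, in_dim = [], 2048
        for h in hidden_dims:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))  # scalar regression output
        self.head = nn.Sequential(*layers)

    def forward(self, images):
        with torch.no_grad():
            feats = self.backbone(images)
        return self.head(feats).squeeze(-1)

model = BaselineRegressor()
dummy = torch.randn(4, 3, 224, 224)          # a batch of 4 RGB images
print(model(dummy).shape)                    # torch.Size([4])
```

In this kind of baseline, only the fully connected head is trained (e.g., with an MSE loss), which keeps training cheap while still exploiting the visual features of the pre-trained encoder.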