Paper Title

Large-scale Gender/Age Prediction of Tumblr Users

Authors

Yao Zhan, Changwei Hu, Yifan Hu, Tejaswi Kasturi, Shanmugam Ramasamy, Matt Gillingham, Keith Yamamoto

Abstract

Tumblr, a leading content provider and social media platform, attracts 371 million monthly visits and hosts 280 million blogs and 53.3 million daily posts. This popularity gives advertisers a great opportunity to promote their products through sponsored posts. However, targeting specific demographic groups with ads is challenging, because Tumblr does not require user information such as gender and age during registration. Hence, to support ad targeting, it is essential to predict users' demographics from rich content such as posts, images and social connections. In this paper, we propose graph-based and deep learning models for age and gender prediction, which take into account user activities and content features. For the graph-based models, we propose two approaches, network embedding and label propagation, to generate connection features as well as to infer users' demographics directly. For the deep learning models, we leverage a convolutional neural network (CNN) and a multilayer perceptron (MLP) to predict users' age and gender. Experimental results on a real Tumblr daily dataset, with hundreds of millions of active users and billions of following relations, demonstrate that our approaches significantly outperform the baseline model, improving accuracy by a relative 81% for age, and AUC and accuracy by 5% for gender.
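The abstract names label propagation over the following graph as one way to infer demographics directly. The snippet below is a minimal, generic sketch of that idea and not the authors' implementation: the function name `propagate_labels`, the user IDs, the two gender classes and the undirected treatment of follow edges are all assumptions made for illustration. Users with known labels stay fixed, and every other user repeatedly takes the average of its neighbors' class scores.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, classes, iterations=20):
    """Toy label propagation (hypothetical helper, not the paper's algorithm).

    edges:   iterable of (user_a, user_b) follow relations, treated as undirected
    seeds:   dict mapping some users to a known class label
    classes: list of all possible class labels
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Seed users start with a one-hot score; everyone else starts uniform.
    uniform = {c: 1.0 / len(classes) for c in classes}
    scores = {}
    for node in neighbors:
        if node in seeds:
            scores[node] = {c: 1.0 if c == seeds[node] else 0.0 for c in classes}
        else:
            scores[node] = dict(uniform)

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                # Known labels are clamped and never overwritten.
                updated[node] = scores[node]
            else:
                # Unlabeled users take the mean score of their neighbors.
                updated[node] = {
                    c: sum(scores[n][c] for n in nbrs) / len(nbrs) for c in classes
                }
        scores = updated

    # Predict the highest-scoring class for every user in the graph.
    return {node: max(classes, key=lambda c: scores[node][c]) for node in scores}


# Hypothetical four-user chain: u1 - u2 - u3 - u4, with the two ends labeled.
example_edges = [("u1", "u2"), ("u2", "u3"), ("u3", "u4")]
example_seeds = {"u1": "female", "u4": "male"}
predicted = propagate_labels(example_edges, example_seeds, ["female", "male"])
print(predicted)
```

In this toy chain each unlabeled user ends up with the label of the closer seed. A production system at the scale described in the abstract would need a distributed graph framework, edge weighting and a convergence check, none of which are shown here.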
