Paper Title

How Biased is the Population of Facebook Users? Comparing the Demographics of Facebook Users with Census Data to Generate Correction Factors

Paper Authors

Ribeiro, Filipe N., Benevenuto, Fabrício, Zagheni, Emilio

Paper Abstract

Censuses around the world are key sources of data to guide government investments and public policies. However, these sources are very expensive to obtain and are collected relatively infrequently. Over the last decade, there has been growing interest in the use of data from social media to complement traditional data sources. However, social media users are not representative of the general population. Thus, analyses based on social media data require statistical adjustments, such as post-stratification, in order to remove the bias and make solid statistical claims. These adjustments are possible only when we have information about the frequency of demographic groups using social media. These data, when compared with official statistics, enable researchers to produce appropriate statistical correction factors. In this paper, we leverage the Facebook advertising platform to compile the equivalent of an aggregate-level census of Facebook users. Our compilation includes the population distribution for seven demographic attributes, such as gender and age, at different geographic levels for the US. By comparing the Facebook counts with official reports provided by the US Census and Gallup, we found very high correlations, especially for political leaning and race. We also identified instances where official statistics may be underestimating population counts, as in the case of immigration. We use the information collected to calculate bias correction factors for all computed attributes in order to evaluate the extent to which different demographic groups are more or less represented on Facebook. We provide the first comprehensive analysis for assessing biases in Facebook users across several dimensions. This information can be used to generate bias-adjusted population estimates and demographic counts in a timely way and at fine geographic granularity in between data releases of official statistics.
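
The abstract does not give the formula behind its correction factors, so the following is only a minimal sketch of one common way to define them: the ratio of the census count to the platform count for each demographic group, followed by a simple post-stratified average. The function names `correction_factors` and `post_stratify`, the age groups, and all numbers are hypothetical and are not taken from the paper.

```python
def correction_factors(census_counts, platform_counts):
    """Return census/platform count ratios per group.

    Assumption (not from the paper): a factor above 1 means the group is
    under-represented on the platform, below 1 means over-represented.
    """
    factors = {}
    for group, census in census_counts.items():
        platform = platform_counts.get(group, 0)
        if platform <= 0:
            raise ValueError(f"no platform users recorded for group {group!r}")
        factors[group] = census / platform
    return factors


def post_stratify(group_means, census_counts):
    """Weight each group's mean by its share of the census population."""
    total = sum(census_counts.values())
    return sum(group_means[g] * census_counts[g] / total for g in census_counts)


# Made-up illustrative counts (in thousands), not real census or Facebook data.
census = {"18-24": 300, "25-54": 1200, "55+": 900}
platform = {"18-24": 360, "25-54": 1200, "55+": 600}

factors = correction_factors(census, platform)

# Hypothetical per-group rates measured among platform users.
rates = {"18-24": 0.2, "25-54": 0.4, "55+": 0.6}
adjusted_rate = post_stratify(rates, census)

print(factors)
print(adjusted_rate)
```

In this sketch the oldest group gets the largest factor because it has fewer platform users than census residents, so its responses count for more in the adjusted estimate.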
