Paper title

A Bayesian algorithm for sample selection bias correction

Author

Astuti, Valerio

Abstract

In this paper we present a technique for coupling non-traditional data with statistics based on survey data, in order to partially correct the bias produced by non-random sample selection. All major social media platforms represent huge samples of the general population, generated by a self-selection process. This implies that they are not representative of the general public, and that extrapolating conclusions drawn from these samples to the whole population is problematic. We present an algorithm that integrates these massive data with data from traditional sources, which are less extensive but more reliable. This integration makes it possible to exploit the best of both worlds: the level of detail of typical "big data" sources and the representativeness of a carefully designed sample survey.
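The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea, not the paper's method. It uses plain post-stratification as a stand-in: group-level averages come from the large self-selected sample, while the group shares come from a representative survey. It assumes selection depends only on the observed group. The function names, the groups, and all numbers are hypothetical.

```python
from collections import defaultdict


def naive_mean(big_sample):
    """Unweighted mean of the self-selected sample (biased if groups are skewed)."""
    return sum(value for _, value in big_sample) / len(big_sample)


def poststratified_mean(big_sample, survey_shares):
    """Reweight per-group means from the big sample by survey-based group shares.

    big_sample: list of (group, value) pairs from the non-random source.
    survey_shares: dict mapping group -> population share from the survey.
    Assumption: within a group, the big sample behaves like the population.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, value in big_sample:
        sums[group] += value
        counts[group] += 1

    # Only groups observed in the big sample can contribute; renormalise shares.
    covered = {g: s for g, s in survey_shares.items() if counts[g] > 0}
    total_share = sum(covered.values())
    if total_share == 0:
        raise ValueError("no overlap between big-sample groups and survey groups")

    return sum(
        (share / total_share) * (sums[group] / counts[group])
        for group, share in covered.items()
    )


# Hypothetical data: young users are over-represented in the big sample.
sample = [("young", 1.0)] * 8 + [("old", 0.0)] * 2
shares = {"young": 0.3, "old": 0.7}  # hypothetical survey-based population shares

print(naive_mean(sample))                   # skewed towards the young group
print(poststratified_mean(sample, shares))  # pulled back towards survey shares
```

In this toy case the unweighted mean reflects the over-represented group, while the reweighted estimate is driven by the survey shares. A fully Bayesian treatment, as the title suggests, would additionally place priors on these quantities and propagate their uncertainty, which this sketch does not attempt.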
