Paper title

How average is average? Temporal patterns in human behaviour as measured by mobile phone data -- or why chose Thursdays

Authors

Marina Toger, Ian Shuttleworth, John Östh

Abstract

Mobile phone data -- with file sizes scaling into terabytes -- easily overwhelm the computational capacity available to some researchers. Moreover, for ethical reasons, data access is often granted only to particular subsets, restricting analyses to cover single days, weeks, or geographical areas. Consequently, it is frequently impossible to set a particular analysis or event in its context and know how typical it is, compared to other days, weeks or months. This is important for academic referees questioning research on mobile phone data and for the analysts in deciding how to sample, how much data to process, and which events are anomalous. All these issues require an understanding of variability in Big Data to answer the question of how average is average? This paper provides a method, using a large mobile phone dataset, to answer these basic but necessary questions. We show that file size is a robust proxy for the activity level of phone users by profiling the temporal variability of the data at an hourly, daily and monthly level. We then apply time-series analysis to isolate temporal periodicity. Finally, we discuss confidence limits to anomalous events in the data. We recommend an analytical approach to mobile phone data selection which suggests that ideally data should be sampled across days, across working weeks, and across the year, to obtain a representative average. However, where this is impossible, the temporal variability is such that specific weekdays' data can provide a fair picture of other days in their general structure.
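
The profiling and anomaly steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes hourly file sizes are already available as (timestamp, size) pairs, it replaces the paper's time-series analysis with a simple per-slot mean and standard deviation, and the 3-sigma band is an assumed stand-in for the confidence limits. The function names slot_profile and flag_anomalies are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def slot_profile(hourly_sizes):
    """Mean and sample standard deviation of file size per (weekday, hour) slot.

    hourly_sizes: iterable of (datetime, size) pairs, one per hour.
    File size stands in for user activity, as the abstract proposes.
    """
    slots = defaultdict(list)
    for ts, size in hourly_sizes:
        slots[(ts.weekday(), ts.hour)].append(size)
    # stdev needs at least two observations; fall back to 0.0 otherwise.
    return {
        slot: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for slot, values in slots.items()
    }


def flag_anomalies(hourly_sizes, profile, z=3.0):
    """Return timestamps whose size lies more than z standard deviations
    from the mean of their own (weekday, hour) slot.

    The z=3.0 default is an assumption, not a value taken from the paper.
    """
    flagged = []
    for ts, size in hourly_sizes:
        mu, sd = profile[(ts.weekday(), ts.hour)]
        if sd > 0 and abs(size - mu) > z * sd:
            flagged.append(ts)
    return flagged
```

Because each slot's mean and deviation include the point under test, a single outlier among n observations can reach a z-score of at most (n - 1) / sqrt(n). With only a few weeks of data no hour can exceed a 3-sigma band, which is one concrete reason to sample across many weeks when that is possible.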
