Paper title
A deep mixture density network for outlier-corrected interpolation of crowd-sourced weather data
Paper authors
Paper abstract
As the costs of sensors and associated IT infrastructure decrease, as exemplified by the Internet of Things, increasing volumes of observational data are becoming available for use by environmental scientists. However, as the number of available observation sites increases, so too does the opportunity for data quality issues to emerge, particularly given that many of these sensors do not have the benefit of official maintenance teams. To realise the value of crowd-sourced 'Internet of Things' type observations for environmental modelling, we require approaches that can automate the detection of outliers during the data modelling process so that they do not contaminate the true distribution of the phenomena of interest. To this end, we present a Bayesian deep learning approach for spatio-temporal modelling of environmental variables with automatic outlier detection. Our approach implements a Gaussian-uniform mixture density network whose dual purposes (modelling the phenomenon of interest, and learning to classify and ignore outliers) are achieved simultaneously, each by a specifically designed branch of our neural network. For our example application, we use the Met Office's Weather Observation Website data, an archive of observations from around 1,900 privately run and unofficial weather stations across the British Isles. Using data on surface air temperature, we demonstrate how our deep mixture model approach enables the modelling of a highly skilled spatio-temporal temperature distribution without contamination from spurious observations. We hope that adoption of our approach will help unlock the potential of incorporating a wider range of observation sources, including from crowd-sourcing, into future environmental models.
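To make the Gaussian-uniform mixture idea concrete, the following is a minimal sketch of the kind of likelihood such a network might be trained on. It is not the authors' implementation. The names `mu`, `sigma`, and `pi_outlier` are hypothetical stand-ins for the outputs of the two network branches, and the uniform bounds `y_min` and `y_max` are assumed fixed limits that cover every observed value.

```python
import numpy as np


def _log_components(y, mu, sigma, pi_outlier, y_min, y_max):
    """Log-weighted densities of the Gaussian (inlier) and uniform (outlier) parts."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    pi_outlier = np.asarray(pi_outlier, dtype=float)
    # Gaussian log-density of each observation around the predicted mean.
    log_gauss = -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2
    # Uniform log-density; assumes every y lies inside [y_min, y_max].
    log_unif = -np.log(y_max - y_min)
    log_inlier = np.log1p(-pi_outlier) + log_gauss
    log_outlier = np.log(pi_outlier) + log_unif
    return log_inlier, log_outlier


def gaussian_uniform_nll(y, mu, sigma, pi_outlier, y_min, y_max):
    """Per-observation negative log-likelihood of the two-component mixture."""
    log_inlier, log_outlier = _log_components(y, mu, sigma, pi_outlier, y_min, y_max)
    # logaddexp sums the two components in log space without underflow.
    return -np.logaddexp(log_inlier, log_outlier)


def outlier_posterior(y, mu, sigma, pi_outlier, y_min, y_max):
    """Posterior probability that each observation came from the uniform part."""
    log_inlier, log_outlier = _log_components(y, mu, sigma, pi_outlier, y_min, y_max)
    return np.exp(log_outlier - np.logaddexp(log_inlier, log_outlier))


# Illustrative values only: one plausible reading and one spurious one (deg C).
temps = np.array([10.0, 45.0])
probs = outlier_posterior(temps, mu=10.0, sigma=1.0, pi_outlier=0.05,
                          y_min=-40.0, y_max=60.0)
```

In this sketch, an observation far from the predicted mean gets almost all of its likelihood from the flat uniform component, so it exerts little pull on `mu` and `sigma` during fitting. That is one way a model can learn to ignore outliers while still modelling the phenomenon of interest.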