Paper Title
Empirical Study on Airline Delay Analysis and Prediction
Paper Authors
Paper Abstract
Big data analytics is the logical analysis of very large-scale datasets. Such analysis can strengthen an organization and improve its decision-making process. In this article, we present Airline Delay Analysis and Prediction, which analyzes airline datasets in combination with a weather dataset. We consider various attributes when analyzing flight delay, for example day, airline, cloud cover, and temperature. Moreover, we present rigorous experiments on several machine learning models for predicting flight delay, namely logistic regression with L2 regularization, Gaussian Naive Bayes, K-Nearest Neighbors, a Decision Tree classifier, and a Random Forest model. The Random Forest model achieves an accuracy of 82% with a flight delay threshold of 15 minutes. The analysis uses data from 1987 to 2008; the models are trained on data from 2000 to 2007, and the predictions are validated on 2008 data. The Random Forest model also achieves a recall of 99%.
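The evaluation setup described in the abstract (a 15-minute delay threshold, training on 2000 to 2007, validation on 2008, and reporting accuracy alongside recall) can be illustrated with a minimal sketch. This is not the authors' code. The `Flight` record, its field names, the toy values, and the hard-coded predictions are all hypothetical, and treating a delay of exactly 15 minutes as "delayed" is an assumption, since the abstract does not say whether the threshold is inclusive.

```python
from dataclasses import dataclass

DELAY_THRESHOLD_MIN = 15  # threshold stated in the abstract


@dataclass
class Flight:
    # Hypothetical record layout; the real dataset has many more columns.
    year: int
    delay_min: float


def is_delayed(flight, threshold=DELAY_THRESHOLD_MIN):
    """Binary label. Assumption: a delay equal to the threshold counts as delayed."""
    return flight.delay_min >= threshold


def temporal_split(flights, train_years=range(2000, 2008), test_year=2008):
    """Train on 2000-2007 and hold out 2008, mirroring the split in the abstract."""
    train = [f for f in flights if f.year in train_years]
    test = [f for f in flights if f.year == test_year]
    return train, test


def accuracy_and_recall(y_true, y_pred):
    """Accuracy = correct / total; recall = TP / (TP + FN) for the positive class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Toy data, invented purely for illustration.
flights = [
    Flight(1999, 30), Flight(2003, 5), Flight(2007, 20),
    Flight(2008, 15), Flight(2008, 3), Flight(2008, 40), Flight(2008, 0),
]
train, test = temporal_split(flights)
y_true = [is_delayed(f) for f in test]
# Hypothetical model output; a real pipeline would fit a classifier on `train`.
y_pred = [True, True, True, False]
accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

In this toy case recall is higher than accuracy because the hypothetical predictions catch every positive flight at the cost of one false alarm. A gap in the same direction appears between the 99% recall and 82% accuracy reported above, although the abstract does not state which class the recall figure refers to.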