Prediction of the Number of COVID-19 Confirmed Cases Based on K-Means-LSTM

Paper title

Prediction of the Number of COVID-19 Confirmed Cases Based on K-Means-LSTM

Authors

Vadyala, Shashank Reddy; Betgeri, Sai Nethra; Sherer, Eric A.; Amritphale, Amod

Abstract

COVID-19 is a pandemic disease that began to spread rapidly in the US, with the first case detected on January 19, 2020, in Washington State. March 9, 2020, and then increased rapidly with total cases of 25,739 as of April 20, 2020. The COVID-19 pandemic is so unnerving that it is difficult to understand how any one person will be affected by the virus. According to the U.S. Centers for Disease Control and Prevention (CDC), most people with coronavirus (81%) will have few to mild symptoms, but others may rely on a ventilator to breathe or may be unable to breathe at all. SEIR models have broad applicability in predicting population-level outcomes for a variety of diseases. However, many researchers use these models without validating the necessary hypotheses. Far too many researchers "overfit" the data by using too many predictor variables and small sample sizes to create models. Models developed this way are unlikely to withstand a validity check on a separate population or region, and without attempting such validation the researcher remains unaware that overfitting has occurred. In this paper, we present a combined algorithm that uses XGBoost, K-means, and long short-term memory (LSTM) neural networks, with region-based similar-day feature selection, to construct a prediction model (i.e., K-Means-LSTM) for short-term forecasting of COVID-19 cases in Louisiana, USA. A weighted k-means algorithm based on extreme gradient boosting is used to evaluate the similarity between the forecast days and past days. The results show that the K-Means-LSTM method has higher accuracy, with an RMSE of 601.20, whereas the SEIR model has an RMSE of 3615.83.
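The similar-day selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each day is described by a small feature vector, and that the per-feature weights (here the made-up values `[0.8, 0.2]`) stand in for importances that would come from a gradient-boosting model. The feature names, the data, and the helper names `weighted_distance`, `weighted_kmeans`, and `similar_days` are all hypothetical. Feature scaling and the LSTM stage are omitted.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with a per-feature weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def weighted_kmeans(points, weights, k, n_iter=50):
    """Lloyd's algorithm under the weighted distance.

    Centroids start at the first k points so the result is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the weighted distance.
        new_labels = [
            min(range(k), key=lambda c: weighted_distance(p, centroids[c], weights))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

def similar_days(history, target, weights, k):
    """Return indices of past days that fall in the target day's cluster."""
    labels, centroids = weighted_kmeans(history, weights, k)
    target_cluster = min(
        range(k), key=lambda c: weighted_distance(target, centroids[c], weights)
    )
    return [i for i, lab in enumerate(labels) if lab == target_cluster]

# Hypothetical data: [previous-day new cases, mobility index] per past day.
history = [
    [10.0, 0.90],
    [200.0, 0.40],
    [12.0, 0.80],
    [210.0, 0.50],
    [15.0, 0.85],
    [190.0, 0.45],
]
weights = [0.8, 0.2]   # assumed feature importances, not taken from the paper
target = [205.0, 0.5]  # features of the day to be forecast

print(similar_days(history, target, weights, k=2))
```

In a pipeline of the kind the abstract describes, the returned indices would select the past days used to train the forecasting model for the target day. In this sketch they are only printed.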
