A Comparative Analysis of Forecasting Models on COVID-19

Erol Genevois, Müjde; Cedolin, Michele

doi:10.1007/978-3-030-91851-4_8

Part of the book series: International Series in Operations Research & Management Science (ISOR, volume 326)

Abstract

COVID-19 has spread all around the world, causing more than a million deaths and over 50 million confirmed cases. Forecasting these numbers is vital for adequately preparing health care capacity and for governments to take the necessary decisions. This study aims to predict the evolution of COVID-19 figures using alternative statistical models, namely Holt-Winters, ARIMA, and ARIMAX, applied to the time series of different parameters of the disease, such as daily cases, daily deaths, and the stringency index. The study considers the Johns Hopkins University worldwide epidemiological data and the ten countries with the highest number of cases, along with China. The fitted time series and the 10-day projections achieved a high level of accuracy, which is reported with alternative error metrics and compared across countries. Holt-Winters is the best-performing model, while ARIMAX gives the worst accuracy. Moreover, the coefficient of determination and the Bayesian information criterion were found to be unsuitable when used alone, and scale-independent metrics should be employed when the data ranges differ. The results of this study can serve as benchmarks for other studies, and the projections may be used for medical, economic, and social precautions and preparations.
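
The point about scale dependence can be illustrated with a small sketch. It assumes RMSE as an example of a scale-dependent metric and MAPE as an example of a scale-independent one (the chapter's full metric set is not restated here), and it uses hypothetical numbers rather than the study's data. Multiplying both the observations and the forecasts by 100 multiplies RMSE by 100 but leaves MAPE unchanged.

```python
import numpy as np


def rmse(actual, forecast):
    """Root mean squared error, expressed in the units of the data (scale-dependent)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


def mape(actual, forecast):
    """Mean absolute percentage error in percent (scale-independent).

    Undefined when an actual value is zero, e.g. a day with no reported deaths.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(100.0 * np.mean(np.abs((actual - forecast) / actual)))


# Hypothetical daily cases for a small country and a large country whose
# figures (observations and forecasts alike) are exactly 100 times larger.
small_actual = np.array([100, 120, 130, 150, 160])
small_forecast = np.array([110, 115, 140, 145, 170])
large_actual = 100 * small_actual
large_forecast = 100 * small_forecast

print("RMSE:", rmse(small_actual, small_forecast), rmse(large_actual, large_forecast))
print("MAPE:", mape(small_actual, small_forecast), mape(large_actual, large_forecast))
```

Ranking countries by RMSE would therefore largely reflect their case volumes, whereas MAPE compares relative accuracy; MAPE has its own weakness in that it breaks down on zero counts.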

1 Introduction

Since December 2019, the world has been fighting COVID-19, which started in Wuhan, China, and spread to more than 100 countries. According to data collected by the World Health Organization (WHO)¹, as of February 14, 2021, there had been 108,153,741 confirmed cases of COVID-19, of which 2,381,295 resulted in death. Some countries are going through a second wave, others have started vaccinating, and governments have responded to the global pandemic with different measures.

Beyond clinical research, academics have approached the COVID-19 problem in different ways. As the pandemic spread and a year of living with COVID-19 passed, the social and economic impacts of the virus became critical (Bruns et al., 2020). Diagnosing the virus with artificial intelligence image-processing techniques also gained importance (Bhattacharya et al., 2021), and similarities between SARS and other epidemics were investigated (Peeri et al., 2020). Some studies concentrated on estimating cases and deaths per country, and a significant forecasting literature has formed.

Because no data were available at the start of the epidemic, prediction and projection were difficult. However, the spread of COVID-19 is highly dangerous and calls for strict plans and government policies. Forecasting confirmed cases and deaths for the coming days is therefore crucial for managing health resource capacity and putting the necessary protection procedures in place. Consequently, this study applies alternative forecasting models, namely Holt-Winters, ARIMA, and ARIMAX, to the daily reported COVID-19 confirmed cases and deaths of the 10 most affected countries and China, and reports their accuracy with alternative error metrics.
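
As a rough illustration of the first of these models, the sketch below implements additive Holt-Winters smoothing followed by a 10-day forecast. It is not the chapter's implementation: the weekly season length of 7, the simple initialisation from the first two seasons, the fixed smoothing parameters (in practice they are estimated by minimising in-sample error), and the synthetic series are all assumptions made for illustration.

```python
import numpy as np


def holt_winters_additive(y, season_length, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing followed by a `horizon`-step forecast.

    Recursions (m = season_length):
        level_t = alpha * (y_t - s_{t-m}) + (1 - alpha) * (level_{t-1} + trend_{t-1})
        trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
        s_t     = gamma * (y_t - level_t) + (1 - gamma) * s_{t-m}
    Forecast: level_n + h * trend_n + latest seasonal term for that weekday.
    """
    y = np.asarray(y, dtype=float)
    m = season_length
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Simple initialisation (an assumption): level from the first season,
    # trend from the change between the first two seasonal means, and
    # seasonal terms as deviations of the first season from its mean.
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    seasonal = y[:m] - level  # seasonal[i] holds the latest term for position i

    for t, obs in enumerate(y):
        s_prev = seasonal[t % m]
        prev_level = level
        level = alpha * (obs - s_prev) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (obs - level) + (1 - gamma) * s_prev

    n = len(y)
    steps = np.arange(1, horizon + 1)
    return level + steps * trend + seasonal[(n + steps - 1) % m]


# Hypothetical series: 6 weeks of daily cases with a linear trend and an
# additive weekly reporting pattern (offsets sum to zero).
days = np.arange(42)
weekly_offsets = np.array([60, 40, 20, 0, -20, -70, -30])
cases = 1000 + 15 * days + weekly_offsets[days % 7]
forecast = holt_winters_additive(cases, season_length=7, alpha=0.5,
                                 beta=0.1, gamma=0.3, horizon=10)
print(np.round(forecast, 1))
```

The weekly season is a common choice for daily COVID-19 counts because reporting often dips at weekends; libraries such as `statsmodels` offer a Holt-Winters implementation that also optimises the smoothing parameters.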

The rest of the study is organized as follows: the second section reviews the literature, the third section presents the employed methods and their application, and the fourth section gives the concluding remarks and discussion.

2 COVID-19 Forecasting Literature Review

Forecasting the outcomes of a pandemic is important for governments so that they can take the necessary restriction measures while preparing the appropriate health infrastructure (e.g., intensive care units for COVID-19 patients). Despite inconsistencies in their data, countries shared them with the public and the WHO. Many researchers used these data (worldwide or for specific countries) to forecast cases, deaths, and recoveries. Over the last year, a solid forecasting literature has been built in which researchers have used a variety of approaches, including machine learning, statistical, and epidemiological models. Most of these articles focus on the case and death data (daily or cumulative) of a selected country or group of countries, while others forecast worldwide data, with varying forecasting horizons and training scales.

Al-Qaness et al. (2020) proposed an Adaptive Neuro-Fuzzy Inference System (ANFIS) that uses an enhanced flower pollination algorithm (FPA), aided by the Salp Swarm Algorithm (SSA), to forecast the confirmed cases in China for the next 10 days. Ankarali et al. (2020) employed ARIMA, Simple Exponential Smoothing, Holt’s Two-Parameter Model, and Brown’s Double Exponential Smoothing Model to produce 10-day forecasts of cumulative cases, cumulative deaths, daily cases, daily deaths, cumulative recoveries, and active cases for the 25 countries whose cumulative cases exceeded 1000 as of March 15. Ayinde et al. (2020) focused on the Nigerian data set and forecast 2 months of confirmed, discharged, and death cases using classic, quadratic, cubic, and quartic versions of linear regression, logarithmic regression, logistic regression, and exponential linear regression. Ayyoubzadeh et al. (2020) predicted COVID-19 cases in Iran from Google Trends data. They employed linear regression and long short-term memory (LSTM) models and found a strong correlation with keywords such as “hand sanitizer,” “handwashing,” and “antiseptic.”
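
Brown’s double exponential smoothing, one of the models used by Ankarali et al. (2020), can be sketched as follows. This is a textbook formulation rather than their code; initialising both smoothed series at the first observation, the value of alpha, and the hypothetical case counts are assumptions.

```python
def brown_double_exponential(y, alpha, horizon):
    """Brown's double exponential smoothing with an m-step-ahead linear forecast.

    s1_t = alpha * y_t + (1 - alpha) * s1_{t-1}     (single smoothing)
    s2_t = alpha * s1_t + (1 - alpha) * s2_{t-1}    (smoothing of s1)
    forecast_{t+m} = (2*s1_t - s2_t) + m * alpha / (1 - alpha) * (s1_t - s2_t)
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    values = [float(v) for v in y]
    # Initialise both smoothed series at the first observation (an assumption).
    s1 = s2 = values[0]
    for obs in values[1:]:
        s1 = alpha * obs + (1 - alpha) * s1
        s2 = alpha * s1 + (1 - alpha) * s2
    level = 2 * s1 - s2                       # de-lagged level estimate
    trend = alpha / (1 - alpha) * (s1 - s2)   # slope estimate
    return [level + m * trend for m in range(1, horizon + 1)]


# Hypothetical cumulative case counts growing by roughly 50 per day.
cumulative = [1000, 1048, 1103, 1149, 1201, 1252, 1298, 1351, 1400, 1452]
print(brown_double_exponential(cumulative, alpha=0.4, horizon=3))
```

Unlike Holt’s two-parameter model, Brown’s method uses a single smoothing constant for both level and trend, which makes it simpler to tune but less flexible.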

Benvenuto et al. (2020) employed ARIMA to forecast the next 2 days of COVID-19 confirmed cases and indicated that “if the virus does not develop new mutations, the number of cases should reach a plateau.” Crokidakis (2020) employed a susceptible–infectious–quarantined–recovered (SIQR) model to estimate the confirmed cases, the ratio of infectious individuals, the reproduction number, and the epidemic doubling time in Brazil. Dandekar and Barbastathis (2020) built a neural-network-aided quarantine control model to test the impact of strict quarantine measures on the reproduction number in Wuhan. Their simulation results showed that rigid quarantine measures helped China curb the number of new cases. Hernandez-Matamoros et al. (2020) built ARIMA models to forecast total cases per million, grouping countries by continent. Hu et al. (2020) used modified autoencoders to model the numbers of cumulative and newly confirmed cases. They highlighted the large difference that immediate versus late interventions make to total active cases and suggested that cases would end by January 10, 2021 under immediate, aggressive intervention. Ibrahim et al. (2021) employed a variational LSTM autoencoder that combines historical data with urban characteristics and stringency index measures to forecast the spread of the coronavirus across the globe one day and 10 days ahead. Ivorra et al. (2020) developed a new θ-SEIHRD model that accounts for the characteristics of COVID-19, in order to identify the unknown parameters of the pandemic that fit China’s total cases, estimate the reproduction rate, and find the maximum number of hospitalized people.
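
The ARIMA idea used by Benvenuto et al. (2020) and Hernandez-Matamoros et al. (2020) can be sketched in its simplest special case, ARIMA(p,1,0), fitted by ordinary least squares on the first differences (conditional least squares). Full ARIMA models with moving-average terms, and ARIMAX models with exogenous regressors such as a stringency index, are usually estimated by maximum likelihood, for example with the `ARIMA` class in `statsmodels`, which accepts an `exog` argument. The function names and the synthetic series below are hypothetical.

```python
import numpy as np


def fit_arima_p10(y, p):
    """Fit ARIMA(p,1,0) by conditional least squares.

    Model on first differences d_t = y_t - y_{t-1}:
        d_t = c + phi_1 * d_{t-1} + ... + phi_p * d_{t-p} + e_t
    Returns the coefficient vector [c, phi_1, ..., phi_p].
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    d = np.diff(np.asarray(y, dtype=float))
    # Each row: intercept, then lags d_{t-1}, ..., d_{t-p}.
    X = np.array([np.r_[1.0, d[t - p:t][::-1]] for t in range(p, len(d))])
    coef, *_ = np.linalg.lstsq(X, d[p:], rcond=None)
    return coef


def forecast_arima_p10(y, coef, horizon):
    """Iterate the fitted difference equation forward and re-integrate to levels."""
    p = len(coef) - 1
    y = np.asarray(y, dtype=float)
    d = list(np.diff(y))
    level = y[-1]
    forecasts = []
    for _ in range(horizon):
        lags = np.array(d[-p:][::-1])  # d_{t-1}, ..., d_{t-p}
        step = coef[0] + coef[1:] @ lags
        d.append(step)
        level += step  # undo the differencing
        forecasts.append(level)
    return np.array(forecasts)


# Hypothetical series whose daily changes follow a noisy AR(1) process.
rng = np.random.default_rng(0)
diffs = [50.0]
for _ in range(59):
    diffs.append(10 + 0.6 * diffs[-1] + rng.normal(scale=5))
series = 1000 + np.cumsum(diffs)
coef = fit_arima_p10(series, p=1)
print("c, phi_1:", np.round(coef, 3))
print("10-day forecast:", np.round(forecast_arima_p10(series, coef, horizon=10)))
```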

Jia et al. (2020) employed the logistic, Bertalanffy, and Gompertz models to estimate the new cases and death toll in China; among these mathematical models, the logistic model fit best. Kafieh et al. (2020) trained alternative machine learning models, such as random forest, multi-layer perceptron, LSTM with regular and extended features, and multivariate LSTM, to estimate the daily numbers of confirmed, death, and recovered COVID-19 cases. Kolozsvari et al. (2020) used recurrent neural networks with LSTM units to build prediction models for the daily infections per 100,000 inhabitants in 17 countries, highlighting the effect of the epidemic’s repeated peaks. Kumar et al. (2020) employed ARIMA and the Richards model to estimate new cases, new deaths, and recovery rates in India. Liu et al. (2020) used related internet search activity in a combined mechanistic and machine learning model to estimate real-time COVID-19 cases in the Chinese provinces. Liu et al. (2021) modeled the coronavirus in China, South Korea, Italy, Germany, and the UK and simulated their new cases under different scenarios. Milhinhos and Costa (2020) employed nonlinear regression to estimate the active cases and total deaths in Portugal and built a comparative model with South Korea, outlining the similarities. Pandey et al. (2020) employed SEIR and a regression model to predict the expected cases in India over the following 2 weeks.
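
The growth curves mentioned above can be written down directly. The sketch below defines the logistic, Gompertz, and Richards curves for cumulative cases (the Bertalanffy model is omitted), checks that the Richards curve with shape parameter a = 1 reduces to the logistic, and recovers the logistic rate and inflection time by linear regression on ln(K/C - 1), assuming the final size K is known. These parameterisations and the synthetic data are assumptions and are not necessarily those used by Jia et al. (2020) or Kumar et al. (2020).

```python
import numpy as np


def logistic(t, K, r, tm):
    """Logistic cumulative-case curve: final size K, growth rate r, inflection time tm."""
    return K / (1.0 + np.exp(-r * (t - tm)))


def gompertz(t, K, r, tm):
    """Gompertz curve; its inflection (at C = K/e) occurs at time tm."""
    return K * np.exp(-np.exp(-r * (t - tm)))


def richards(t, K, r, tm, a):
    """Richards curve, solving dC/dt = r*C*(1 - (C/K)**a); a = 1 gives the logistic."""
    return K * (1.0 + a * np.exp(-a * r * (t - tm))) ** (-1.0 / a)


def fit_logistic_known_size(t, cumulative, K):
    """Estimate r and tm, assuming K is known, from ln(K/C - 1) = -r*t + r*tm.

    Requires 0 < C < K for every observation.
    """
    z = np.log(K / np.asarray(cumulative, dtype=float) - 1.0)
    slope, intercept = np.polyfit(t, z, 1)
    r = -slope
    return r, intercept / r


# Hypothetical noise-free cumulative cases over 60 days.
t = np.arange(60, dtype=float)
cases = logistic(t, K=50_000, r=0.2, tm=30)
print(fit_logistic_known_size(t, cases, K=50_000))              # approximately (0.2, 30.0)
print(np.allclose(richards(t, 50_000, 0.2, 30, a=1.0), cases))  # Richards with a = 1
```

With real, noisy counts and an unknown K, all parameters would normally be estimated jointly by nonlinear least squares rather than by this linearisation.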

Petropoulos and Makridakis (2020) employed exponential smoothing to forecast global confirmed cases, deaths, and recoveries with a 10-day forecasting horizon. Roosa et al. (2020) used the generalized logistic growth model (GLM) and the Richards model to produce 5-, 10-, and 15-day forecasts of cumulative cases in China. Sameni (2020) employed an SEIR compartmental model to estimate the propagation of the disease, simulating seven different scenarios to estimate the reproduction and fatality rates of COVID-19. Xu et al. (2020) used the SEIQRPD model, which divides the population into susceptible, exposed, infectious, quarantined, recovered, insusceptible, and deceased individuals, to estimate COVID-19 cases in the USA. Yang et al. (2020) used an SEIR model, aided by an LSTM trained on the 2003 SARS data, to predict the peaks and sizes of the COVID-19 epidemic in China. Yonar et al. (2020) employed exponential smoothing and ARIMA to forecast the number of COVID-19 cases in the G8 countries. Table 1 summarizes the existing literature by country (forecasting target), employed method, and forecasting horizon.

Table 1 COVID-19 forecasting literature
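
Several of the reviewed studies (Pandey et al., 2020; Sameni, 2020; Yang et al., 2020) build on the SEIR compartmental model. The following is a minimal forward-Euler sketch of the textbook SEIR equations, dS/dt = -βSI/N, dE/dt = βSI/N - σE, dI/dt = σE - γI, dR/dt = γI. The parameter values are hypothetical and are not taken from any of the cited works, whose models add further compartments or learned components.

```python
import numpy as np


def simulate_seir(N, E0, I0, beta, sigma, gamma, days, steps_per_day=10):
    """Forward-Euler integration of the SEIR model.

    dS/dt = -beta*S*I/N,   dE/dt = beta*S*I/N - sigma*E,
    dI/dt = sigma*E - gamma*I,   dR/dt = gamma*I.
    Returns an array of shape (days + 1, 4) holding S, E, I, R at the end of each day.
    """
    dt = 1.0 / steps_per_day
    S, E, I, R = float(N - E0 - I0), float(E0), float(I0), 0.0
    history = [(S, E, I, R)]
    for _ in range(days):
        for _ in range(steps_per_day):
            infections = beta * S * I / N * dt  # flow S -> E
            onsets = sigma * E * dt             # flow E -> I
            removals = gamma * I * dt           # flow I -> R
            S -= infections
            E += infections - onsets
            I += onsets - removals
            R += removals
        history.append((S, E, I, R))
    return np.array(history)


# Hypothetical parameters: 5.2-day latent period, 10-day infectious period,
# basic reproduction number R0 = beta / gamma = 3.
N = 1_000_000
traj = simulate_seir(N, E0=10, I0=5, beta=0.3, sigma=1 / 5.2, gamma=0.1, days=300)
peak_day = int(np.argmax(traj[:, 2]))
print("peak day:", peak_day, "peak infectious:", round(traj[peak_day, 2]))
print("final susceptible share:", traj[-1, 0] / N)
```

Because every flow leaves one compartment and enters another, S + E + I + R stays equal to N, which is a useful sanity check on any compartmental implementation.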
