1 Introduction

Since December 2019, the world has been battling COVID-19, which emerged in Wuhan, China and spread to more than 100 countries. According to data collected by the World Health Organization (WHO) (Footnote 1), as of February 14th, 2021, there had been 108,153,741 confirmed cases of COVID-19, including 2,381,295 deaths. While some countries are going through a second wave, others have started the vaccination process, and governments have responded to the global pandemic with different measures.

Beyond clinical research, academics have approached the COVID-19 problem in different ways. As the pandemic spread and a year of living with COVID-19 passed, the social and economic impacts of the virus became critical (Bruns et al., 2020). Diagnosing the virus with artificial intelligence image processing techniques gained importance (Bhattacharya et al., 2021). Similarities between COVID-19, SARS, and other epidemics were investigated (Peeri et al., 2020). Other studies concentrated on estimating the cases and deaths per country, and a significant forecasting literature was formed.

Because no data was available at the start of the epidemic, prediction and projection were difficult. However, the spread of COVID-19 is highly dangerous and necessitates strict plans and government policies. Forecasting confirmed cases and deaths for the coming days is therefore crucial in order to manage health resource capacities and put the necessary protection procedures in place. Consequently, this study applies alternative forecasting models to the daily reported COVID-19 confirmed cases and deaths of the ten most affected countries and China. It employs the Holt-Winters, ARIMA, and ARIMAX models and reports their accuracy in alternative error metrics.

The rest of the study is organized as follows: the second section reviews the literature, the third section presents the employed methods with their application, and the fourth section gives the concluding remarks and discussion.

2 COVID-19 Forecasting Literature Review

Forecasting the outcomes of a pandemic is important for governments in order to take the necessary restriction measures while preparing the appropriate health infrastructure (e.g. intensive care units for COVID-19 cases). Countries shared their data, despite its inconsistencies, with the public and the WHO. Many researchers employed this data (worldwide or for specific countries) to forecast cases, deaths, and recovery numbers. In the last year, a solid forecasting literature was built in which researchers used alternative approaches such as machine learning, statistical, and epidemiological models. Most articles focused on the case and death data (daily or cumulative) of a selected country or group of countries, while some aimed to forecast the worldwide data, with varying forecasting horizons and training set lengths.

Al-Qaness et al. (2020) proposed an Adaptive Neuro-Fuzzy Inference System (ANFIS), which uses an enhanced flower pollination algorithm (FPA) with the help of the Salp Swarm Algorithm (SSA), to forecast the confirmed cases in China for the upcoming 10 days. Ankarali et al. (2020) employed ARIMA, Simple Exponential Smoothing, Holt’s Two Parameter Model, and Brown’s Double Exponential Smoothing Model to forecast 10 days of cumulative cases, cumulative deaths, daily cases, daily deaths, cumulative recovered cases, and active cases for the 25 countries that had exceeded 1000 cumulative cases by March 15. Ayinde et al. (2020) focused on the Nigerian data set and forecast 2 months of confirmed cases, discharged cases, and deaths using classic, quadratic, cubic, and quartic versions of linear regression, logarithmic regression, logistic regression, and exponential linear regression. Ayyoubzadeh et al. (2020) predicted the COVID-19 cases in Iran using Google Trends data. They employed linear regression and long short-term memory (LSTM) models and obtained a strong correlation for keywords like “hand sanitizer,” “handwashing,” and “antiseptic.”

Benvenuto et al. (2020) employed ARIMA to forecast the next 2 days of COVID-19 confirmed cases and indicated that “if the virus does not develop new mutations, the number of cases should reach a plateau.” Crokidakis (2020) employed a susceptible–infectious–quarantined–recovered (SIQR) model to estimate the confirmed cases, the ratio of infectious individuals, the reproduction number, and the epidemic doubling time for Brazil. Dandekar and Barbastathis (2020) built a neural-network-aided quarantine control model to test the impact of strict quarantine measures on the reproduction number in Wuhan. Their simulation results showed that rigid quarantine measures helped China reduce its new case numbers. Hernandez-Matamoros et al. (2020) built ARIMA models to forecast total case numbers per million, grouping countries by continent. Hu et al. (2020) used modified autoencoders to model the cumulative and newly confirmed cases. They outlined the immense difference between immediate and late interventions in terms of total active cases and suggested a case ending time of January 10, 2021 under immediate aggressive intervention. Ibrahim et al. (2021) employed a variational Long Short-Term Memory (LSTM) autoencoder, which uses historical data together with urban characteristics and stringency index measures, to forecast the spread of the coronavirus across the globe one day and 10 days ahead. Ivorra et al. (2020) developed a new θ-SEIHRD model that incorporates the characteristics of COVID-19 in order to identify the unknown parameters of the pandemic by fitting the total cases of China, to estimate the reproduction rate, and to find the maximum number of hospitalized people.

Jia et al. (2020) employed the Logistic, Bertalanffy, and Gompertz models to estimate the new cases and death toll of China. Among these mathematical models, the Logistic model was the best fitting one. Kafieh et al. (2020) trained alternative machine learning models, namely random forest, multi-layer perceptron, LSTM with regular and extended features, and multivariate LSTM, to estimate the daily numbers of confirmed, death, and recovered COVID-19 cases. Kolozsvari et al. (2020) used recurrent neural networks with LSTM units to create prediction models of 17 countries’ daily infection numbers per 100,000 inhabitants, outlining the effect of the repeated peaks of the epidemic. Kumar et al. (2020) employed ARIMA and the Richards model to estimate new cases, new deaths, and recovery rates for India. Liu et al. (2020) used related internet search activity in their combined mechanistic and machine learning model to estimate the real-time COVID-19 cases of the Chinese provinces. Liu et al. (2021) modeled the coronavirus in China, South Korea, Italy, Germany, and the UK and simulated their new cases under different scenarios. Milhinhos and Costa (2020) employed nonlinear regression to estimate the active cases and total deaths of Portugal and built a comparative model with South Korea, outlining the similarities. Pandey et al. (2020) employed SEIR and a regression model to predict the expected cases in India within 2 weeks.

Petropoulos and Makridakis (2020) employed exponential smoothing to forecast global confirmed cases, deaths, and recoveries with a forecasting horizon of 10 days. Roosa et al. (2020) used the generalized logistic growth model (GLM) and the Richards model to estimate the cumulative cases of China 5, 10, and 15 days ahead. Sameni (2020) employed SEIR and compartmental models to estimate the propagation, simulating seven different scenarios to find the reproduction and fatality rates of COVID-19. Xu et al. (2020) used the SEIQRPD model, which divides the population into susceptible, exposed, infectious, quarantined, recovered, insusceptible, and deceased individuals, to estimate the USA COVID-19 cases. Yang et al. (2020) used the SEIR model, aided by an LSTM trained on SARS 2003 data, to predict COVID-19 peaks and sizes in China. Yonar et al. (2020) employed exponential smoothing and ARIMA to forecast the number of COVID-19 cases of the G8 countries. Table 1 summarizes the existing literature by country (forecasting target), employed method, and forecasting horizon.

Table 1 COVID-19 forecasting literature

As can be observed from Table 1, most of the studies focused on a single country’s data, with a forecasting horizon ranging from 2 to 15 days, while some articles aim only to fit the training data set. Along with the epidemiological models, regression models are widely used in the literature. The literature indicates that statistical models, which form the largest share of the surveyed studies, are simple but effective tools to forecast COVID-19 numbers. Machine learning models such as LSTM or ANFIS and epidemiological compartmental models such as SIR, SEIR, and SIQR are the other common approaches.

In this study, double exponential smoothing (Holt-Winters), ARIMA, and its exogenous-variable extension, ARIMAX, are employed to fit and forecast the daily case and daily death numbers of the selected countries and the global data.

3 Methodology and Application

3.1 Data Characteristics

The data employed in this study is available at the Coronavirus Resource Center on the Johns Hopkins University website (Footnote 2). A total of 192 countries are dealing with the virus; however, in this study, only the ten countries with the highest case numbers on February 9th, 2021 and China are considered. The dates of the first cases differ among countries, and the reporting of these early cases is somewhat problematic. Therefore, the first day of the training set for each country is the day on which its cumulative cases reached 100, which is considered a more robust option. The training set length thus differs by country, and every training set ends on January 30th, 2021. The remaining days are held out for the forecasting part. Table 2 shows the countries with their cumulative case and death numbers, the initial date with the data length, and the fatality rate.
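The training window rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ procedure: the function name `split_series`, its parameters, and the toy numbers are assumptions introduced here, while the threshold of 100 cumulative cases and the January 30th, 2021 cut-off come from the text.

```python
from datetime import date, timedelta


def split_series(dates, cumulative_cases, threshold=100, train_end=date(2021, 1, 30)):
    """Split one country's series into training and hold-out parts.

    Training starts on the first day with cumulative cases >= threshold and
    ends on train_end (inclusive); later days form the hold-out set.
    """
    start = next(i for i, c in enumerate(cumulative_cases) if c >= threshold)
    pairs = list(zip(dates, cumulative_cases))
    train = [(d, c) for d, c in pairs[start:] if d <= train_end]
    test = [(d, c) for d, c in pairs[start:] if d > train_end]
    return train, test


if __name__ == "__main__":
    # Toy numbers for illustration only, not real data.
    days = [date(2021, 1, 26) + timedelta(days=i) for i in range(8)]
    totals = [50, 99, 100, 150, 200, 260, 300, 350]
    train, test = split_series(days, totals)
    print(train[0][0], len(train), len(test))
```

Because the start day depends on each country’s own cumulative count, the training sets produced this way have different lengths, which matches the data lengths reported per country.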

As Table 2 shows, the countries have been battling the virus since March 2020. The USA is the most affected country by case and death numbers. The average fatality rate is 2.2%, while the maximum fatality rates are observed in China and Italy, with 4.8% and 3.5%, respectively. The minimum fatality rates are in India and Turkey, with 1.4% and 1.6%, which may be linked to the average population age of these countries. China is the country where the virus originated. After approximately 1 month, the virus spread to Europe, starting with Italy, France, the UK, Germany, and Spain. Finally, it reached Russia and Turkey. The countries affected last had more time to prepare, while countries like Italy, which was among the first affected, experienced more difficulties in the initial days of the spread. The countries show some similarities; however, they all have different COVID-19 wave lengths and population properties. The appendix charts of daily cases vs daily deaths per country can be consulted to investigate these differences. To evaluate the resemblance between the case and death time series mathematically, a correlation test between each country’s data and the world data is performed. Because the initial days differ, the test is applied to the first 316 days of the pandemic. The resulting correlations for new cases and new deaths are reported in Table 3.

Table 3 shows that for most countries, the country-level new cases are correlated with the worldwide new cases; however, this does not hold for the daily death numbers. China acts as an outlier in every respect. The Spain data is corrupted, since it contains negative values for new deaths and new cases along with zeroes. In terms of case numbers, the USA, the UK, Germany, and Russia are highly correlated with the World. In terms of death numbers, no country is linearly correlated with the worldwide numbers. To sum up, worldwide data is not an explanatory variable that yields better results for individual country data. The countries’ time series show obvious differences; therefore, they should be examined separately, each with its own model configuration. The figures in this study show the worldwide data, and the figures for the other countries can be found in the appendix. Figure 1 shows the daily new case and daily death numbers of the World and the USA.
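A country-versus-world correlation over a fixed window can be sketched as follows. The text does not name the correlation statistic, so the Pearson coefficient is an assumption here (chosen because the discussion speaks of linear correlation); the function names and the assumption that both series are already aligned day by day are also introduced only for illustration.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_with_world(country, world, window=316):
    """Correlate the first `window` days of a country series with the world series."""
    return pearson(country[:window], world[:window])
```

Restricting both series to the same leading window is what makes the coefficients comparable across countries whose data lengths differ.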

Table 2 COVID-19 details of the countries
Table 3 Correlation test results
Fig. 1 Training data (left: World; right: USA)

3.2 Error Metrics

Alternative error metrics are employed in the COVID-19 forecasting literature. The statistical studies all report $R^2$, the coefficient of determination, which represents the proportion of the variance of the dependent variable that is explained by the regression. RMSE and BIC are the other metrics, used by Ankarali et al. (2020), Kumar et al. (2020), and Yonar et al. (2020). In this study, the results are provided according to these metrics. However, RMSE and BIC are scale dependent and are not suitable for comparing forecasting accuracy across countries. Therefore, the results are also reported in SMAPE and MAPE. The formulae of the employed metrics are provided next.

  • Bayesian Information Criterion (BIC)

The BIC, or Schwarz information criterion (SIC), is a criterion for model selection based on the likelihood function, like the AIC (Schwarz, 1978). Its general form is

$$ \mathrm{BIC}=k\ln (n)-2\ln \Big(\hat{L}\Big) $$
(1)

where k is the number of estimated parameters, n is the number of observations, and $\hat{L}$ is the maximized value of the likelihood function. Lower values indicate a preferred model.

  • Root Mean Squared Error (RMSE)

RMSE, or root mean-squared deviation (RMSD), is the square root of the average of the squared errors. It is scale dependent and highly sensitive to outliers, so it is difficult to interpret across a set of time series with different magnitudes.

$$ \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^n{\left({Y}_i-{\hat{Y}}_i\right)}^2} $$
(2)
  • Symmetric Mean Absolute Percentage Error (SMAPE)

SMAPE is an accuracy measure based on percentage errors, where the absolute difference between the actual value $A_t$ and the forecast value $F_t$ is divided by half the sum of their absolute values. This ratio is summed over every t and divided by the number of fitted points n.

$$ \mathrm{SMAPE}=\frac{100\%}{n}{\sum}_{t=1}^n\left|\frac{F_t-{A}_t}{\left(\left|{A}_t\right|+\left|{F}_t\right|\right)/2}\right| $$
(3)

The main advantages of SMAPE are its interpretability (under Eq. (3), values are bounded between 0% and 200%) and its scale independence, which are necessities when dealing with multiple time series. Its drawback is that the metric is undefined when the actual and forecast values are both zero, because the denominator vanishes, and it becomes unstable when both are close to zero.

  • Mean Absolute Percentage Error (MAPE)

MAPE, or mean absolute percentage deviation (MAPD), is a prediction accuracy measure where the difference between the actual value $A_t$ and the forecast value $F_t$ is divided by the actual value. The absolute value of this ratio is summed over every t and divided by the number of fitted points n. The result may be multiplied by 100% to express it as a percentage error. MAPE is undefined when an actual value is zero.

$$ \mathrm{MAPE}=\frac{1}{n}{\sum}_{t=1}^n\left|\frac{A_t-{F}_t}{A_t}\right| $$
(4)
  • Coefficient of Determination ($R^2$)

The coefficient of determination, denoted $R^2$, is a widely used goodness-of-fit measure in regression statistics, based on the proportion of variance in the dependent variable that is explained by the independent variable. In simple linear regression, it equals the square of the correlation coefficient.
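The metrics of this section can be sketched in plain Python as below. This is an illustrative sketch rather than the code used in the study: the function names are introduced here, SMAPE and MAPE are returned in percent, the handling of zero denominators (NaN for SMAPE, infinity for MAPE) is an assumption, and $R^2$ is computed as one minus the ratio of residual to total sum of squares, which is one common definition and may differ from the variant used by the authors.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error, Eq. (2)."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def smape(actual, forecast):
    """Symmetric MAPE in percent, Eq. (3); NaN if any actual/forecast pair is (0, 0)."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        if denom == 0:
            return math.nan
        total += abs(f - a) / denom
    return 100.0 * total / len(actual)


def mape(actual, forecast):
    """MAPE in percent, Eq. (4) times 100; treated as infinite if any actual value is 0."""
    if any(a == 0 for a in actual):
        return math.inf
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def r_squared(actual, forecast):
    """1 - SS_res / SS_tot (an assumed, commonly used definition)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

A single day with zero reported deaths is enough to make `mape` infinite, whereas `smape` stays finite as long as the forecast for that day is non-zero; this is the practical reason for reporting both.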

3.3 Holt-Winters Model

Holt-Winters is a statistical model that employs exponential smoothing to encode past values, and it is used both to fit the training data and to forecast. When the data is not stationary, in other words when there is a trend in the data, simple exponential smoothing is insufficient, and double exponential smoothing, or the Holt-Winters model, becomes necessary (Holt, 1957). The COVID-19 data does not yet show any seasonality; therefore, the seasonal component of the model is not included. With this adjustment, the method comprises a forecast equation and two smoothing equations, one for the level $l_t$ and one for the trend $b_t$, with corresponding smoothing parameters α and β between 0 and 1. The component form of the Holt-Winters model is

$$ {\hat{y}}_{t+h\mid t}={l}_t+h{b}_t $$
(5)
$$ {l}_t=\alpha {y}_t+\left(1-\alpha \right)\left({l}_{t-1}+{b}_{t-1}\right) $$
(6)
$$ {b}_t=\beta \left({l}_t-{l}_{t-1}\right)+\left(1-\beta \right){b}_{t-1} $$
(7)

The equations are implemented in MS Excel, and the parameters are estimated with the generalized reduced gradient (GRG) nonlinear solver, which follows the slope of the objective function (the selected error metric, to be minimized) as the input values change and declares optimality when the partial derivatives are zero (Abadie, 1978). Table 4 gives the accuracy results in $R^2$, RMSE, SMAPE, and MAPE.
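The recursion in Eqs. (5) to (7) and the parameter search can be sketched in Python. Two points are assumptions made only for this sketch: the initial level and trend are set from the first two observations, and a coarse grid search over α and β stands in for the GRG solver of MS Excel, so the parameters it finds need not match those of the study. All function names are hypothetical.

```python
def holt_fit(y, alpha, beta):
    """One-step-ahead fits for y[1:], plus the final level and trend."""
    level, trend = y[0], y[1] - y[0]          # assumed initialization
    fitted = []
    for obs in y[1:]:
        fitted.append(level + trend)                               # Eq. (5), h = 1
        new_level = alpha * obs + (1 - alpha) * (level + trend)    # Eq. (6)
        trend = beta * (new_level - level) + (1 - beta) * trend    # Eq. (7)
        level = new_level
    return fitted, level, trend


def holt_forecast(level, trend, horizon):
    """Eq. (5): h-step-ahead forecasts from the last level and trend."""
    return [level + h * trend for h in range(1, horizon + 1)]


def smape(actual, forecast):
    """SMAPE in percent; a (0, 0) pair is counted as zero error in this sketch."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        total += abs(f - a) / denom if denom else 0.0
    return 100.0 * total / len(actual)


def grid_search(y, step=0.05):
    """Return (score, alpha, beta, level, trend) minimizing the fitting SMAPE."""
    grid = [round(i * step, 10) for i in range(1, int(round(1 / step)) + 1)]
    best = None
    for alpha in grid:
        for beta in grid:
            fitted, level, trend = holt_fit(y, alpha, beta)
            score = smape(y[1:], fitted)
            if best is None or score < best[0]:
                best = (score, alpha, beta, level, trend)
    return best


if __name__ == "__main__":
    series = [10, 12, 15, 14, 18, 21, 22, 26]   # toy numbers, not real data
    score, alpha, beta, level, trend = grid_search(series)
    print(alpha, beta, holt_forecast(level, trend, 10))
```

Replacing `smape` in `grid_search` with another error function reproduces the idea of solving one model per objective metric, which is why the optimal α and β can differ between the RMSE, SMAPE, and MAPE runs.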

Table 4 Holt-Winters accuracy results and parameters

As Table 4 shows, three alternative double exponential smoothing models are solved for each time series, minimizing the RMSE, SMAPE, and MAPE, respectively. The choice of objective error metric strongly affects the parameters α and β and the accuracy of the model. RMSE is a scale-dependent measure; thus it is not suitable for comparison. For the case predictions, SMAPE ranges between 0.9% (Russia) and 11.3% (France). For the death predictions, the maximum SMAPE is 12.3% (Spain, whose data is corrupted with negative values) and the minimum SMAPE is 1.6% (Turkey). For the world data, SMAPE and MAPE values are around 2.5%. For all the time series, $R^2$ is about 99.99%, indicating a strong correlation but offering poor discriminating power between models. The SMAPE and MAPE values show the suitability of the model for the COVID-19 data set. The fitting charts for the World are in Fig. 2.

Fig. 2 Fitting curves for the World (left: new cases; right: new deaths)

The forecasts obtained with the parameters optimized for SMAPE are given in Tables 5 and 6 for daily cases and daily deaths, respectively.

Table 5 Daily case forecasting (Holt-Winters)
Table 6 Daily death forecasting (Holt-Winters)

3.4 ARIMA

The ARIMA model describes a univariate time series as a combination of autoregressive (AR) and moving average (MA) terms, which capture the autocorrelation within the series. The order of integration denotes how many times the series has been differenced to obtain a stationary series. An ARIMA(p,d,q) model, where p is the number of autoregressive lags, d is the degree of differencing, and q is the number of moving average lags, can be written as:

$$ {\Delta }^d{y}_t={\sum}_{i=1}^p{\varphi}_i{\Delta }^d{y}_{t-i}+{\sum}_{j=1}^q{\theta}_j{\epsilon}_{t-j}+{\epsilon}_t,\kern0.5em {\epsilon}_t\sim N\left(0,{\sigma}^2\right) $$
(8)

The (p,d,q) orders of the model are found by an iterative algorithm that minimizes the Bayesian information criterion (BIC), taking the autocorrelation values into account. The sample autocorrelation function (ACF) and partial autocorrelation function (PACF) for the World are given in Fig. 3.
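The idea of choosing a model order by minimizing the BIC can be illustrated with a deliberately simplified sketch. It is not the iterative algorithm used in the study, whose details are not given: the sketch differences the series d times, fits pure AR(p) models (q = 0) by ordinary least squares, and scores each candidate with Eq. (1) under a Gaussian likelihood. The function names and the parameter count (intercept, AR coefficients, and error variance) are assumptions.

```python
import math
import random


def difference(series, d=1):
    """Apply the differencing operator d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ar(z, p, max_p):
    """Least-squares AR(p) with intercept; the first max_p points are dropped
    so that every candidate order is scored on the same sample."""
    rows = [[1.0] + [z[t - i] for i in range(1, p + 1)] for t in range(max_p, len(z))]
    target = z[max_p:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    coef = solve(xtx, xty)
    rss = sum((y - sum(c * x for c, x in zip(coef, r))) ** 2 for r, y in zip(rows, target))
    n = len(target)
    # Gaussian case: -2 ln(L) = n * (ln(2 * pi * rss / n) + 1); k + 1 counts sigma^2 too.
    bic = (k + 1) * math.log(n) + n * (math.log(2 * math.pi * rss / n) + 1)
    return coef, bic


def select_order(y, d, max_p):
    """Return the AR order with the lowest BIC and the BIC of every candidate."""
    z = difference(y, d)
    scores = {p: fit_ar(z, p, max_p)[1] for p in range(max_p + 1)}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    random.seed(1)
    increments = [random.gauss(0, 1) for _ in range(300)]   # synthetic data
    walk = [sum(increments[:i + 1]) for i in range(len(increments))]
    print(select_order(walk, d=1, max_p=7))
```

A full ARIMA search would also vary d and q and estimate the moving average terms by maximum likelihood, which requires a dedicated time series library; the sketch only shows how the BIC penalty trades goodness of fit against the number of lags.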

Fig. 3 ACF and PACF charts (World)

ARIMA configurations and results for the new cases and new deaths are given in Tables 7 and 8, respectively. These and the following tables report the results in five different error metrics. BIC is a goodness-of-fit measure that evaluates the performance of the selected model relative to other candidate models. $R^2$ represents the proportion of the variance of the dependent variable that is explained by the model. RMSE is the square root of the mean of the squared errors. The existing literature reports results with these three metrics; however, BIC and RMSE are scale dependent and are not suitable for comparing accuracy across countries. Therefore, in this study, the scale-independent error metric MAPE is also calculated. The “Inf” values of MAPE arise from its instability when the time series is at or near zero. To overcome this problem, the symmetric version, SMAPE, is considered.

Table 7 ARIMA results for the new cases
Table 8 ARIMA results for the new deaths

When the configurations of the models are investigated, most of the series show non-stationary behavior, which is also supported by the augmented Dickey–Fuller test. For the new cases, the algorithm does not integrate the Turkey and India data, and for the daily deaths it does not differ between the India and Spain data. A second degree of differencing is only required for the new cases of Russia, the UK, and Italy, and for the new deaths of Spain. In general, a lag order of seven is selected by the model for the autoregressive and moving average degrees. However, when the data is decomposed, the seasonality is found to be approximately 0; therefore, a SARIMA model is not necessary. China gives the maximum error values, and the reliability of its reported figures is often debated publicly; therefore, China is excluded from the commentary below due to data instability. Most of the countries fit the ARIMA model quite well. The focus of the study is not to decrease the errors as much as possible but to provide an easy and fast fitting and forecasting solution and to offer researchers and readers a comparative platform for discussion.

$R^2$ values greater than 99% show that the model explains nearly all of the variance. The RMSE values may be used for each country to interpret fitting and forecasting intervals. Model performance across countries is compared using SMAPE. For the daily case numbers, the lowest SMAPEs are for Russia with 0.79% and Turkey with 2.59%. France and Spain are the worst fitting countries, with 19.09% and 15.21%, respectively. The remaining countries and the world are within acceptable limits, with SMAPE ranging between 1% and 7%. The fitting of the death numbers is not as successful as that of the new cases. In Table 8, the worst fitting countries are India and Spain with 37.41% and 36.77%, respectively. Turkey (1.47%) and Brazil (3.71%) are the best fitting countries using ARIMA. The countries may be grouped in alternative ways. One way is to consider how close a country’s fitting error is to the world’s fitting error. Countries whose SMAPEs are numerically close to the world’s can be considered coherent countries. When the SMAPE is much lower, the country may be labeled negative coherent, and when it is much higher, positive coherent; the latter is where the need for more sophisticated models arises. Table 9 gives this classification. Spain’s data set is corrupted and contains negative values along with zeroes, which is reflected directly in the model results.

Table 9 Classification of the countries by coherence to the world

The ARIMA 10-day forecasting outcomes are given in Tables 10 and 11 for daily cases and daily deaths, respectively.

Table 10 Daily case forecasting (ARIMA)
Table 11 Daily death forecasting (ARIMA)

Tables 10 and 11 show that the world daily case and death numbers reached a steady plateau in the first days of February 2021. USA and UK case and death numbers are decreasing, while the situation is worsening for India and Brazil. Turkey and Russia have a slightly negative slope, meaning that their numbers seem to be decreasing.

3.5 ARIMAX

ARIMAX is an extension of the ARIMA model in which suitable explanatory variables are incorporated into the fitting and forecasting problem. In practice, these additional exogenous variables X turn the univariate model into a multivariate time series model and can improve the prediction performance. An ARIMAX(p,d,q) model for a time series $y_t$ with M exogenous series $X_{m,t}$ can be written as

$$ {\Delta }^d{y}_t={\sum}_{i=1}^p{\varphi}_i{\Delta }^d{y}_{t-i}+{\sum}_{j=1}^q{\theta}_j{\epsilon}_{t-j}+{\sum}_{m=1}^M{\beta}_m{X}_{m,t}+{\epsilon}_t,\kern0.5em {\epsilon}_t\sim N\left(0,{\sigma}^2\right) $$
(9)

New cases and new deaths are correlated time series, and each may be a meaningful exogenous variable for fitting and forecasting the other. Another relevant series is the stringency index of the countries. The stringency index reflects the strictness of government responses and is calculated as a function of school and workplace closures, cancellation of public events, restrictions on public gatherings, closures of public transport, stay-at-home requirements, public information campaigns, restrictions on internal movement, and international travel controls (Footnote 3). To test the usefulness of these exogenous variables, the Granger-causality test is applied among the time series.

The Granger-causality test is a statistical hypothesis test to determine the usefulness of one time series for forecasting another (Granger, 1969). A time series X is said to Granger-cause Y when it provides statistically significant information about the future values of Y. The notation is

$$ p\left[Y\left(t+1\right)\in A|I(t)\right]\ne p\left[Y\left(t+1\right)\in A|{I}_{-X}(t)\right] $$
(10)

where p is probability, A is an arbitrary non-empty set, and $I(t)$ and $I_{-X}(t)$ denote the information available as of time t in the entire universe and in the modified universe from which X is excluded, respectively. In this study, the test is employed to detect for which series ARIMAX can be employed. In total, six hypotheses are built: “case” Granger-causes “deaths” and vice versa, “case” Granger-causes “stringency index” and vice versa, and “deaths” Granger-cause “stringency index” and vice versa.

Tables 12, 13, and 14 show the results of these tests, where an h value of 1 indicates that the null hypothesis of no Granger-causality is rejected at a p-value lower than 0.05, that is, a Granger-cause effect between the time series is supported.
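In practice, the test is commonly carried out by comparing two nested regressions. The formulation below is this standard regression form and is given only as an illustration; the lag length m, the coefficient symbols, and the use of an F-test are assumptions, since the implementation used in the study is not specified. The restricted model explains Y by its own lags, and the unrestricted model adds the lags of X:

$$ Y_t=a_0+\sum_{i=1}^m a_i Y_{t-i}+u_t \qquad \text{(restricted)} $$

$$ Y_t=a_0+\sum_{i=1}^m a_i Y_{t-i}+\sum_{i=1}^m b_i X_{t-i}+e_t \qquad \text{(unrestricted)} $$

The null hypothesis that X does not Granger-cause Y is $b_1=\dots =b_m=0$. With $\mathrm{RSS}_r$ and $\mathrm{RSS}_u$ denoting the residual sums of squares of the two models and T the number of usable observations, the test statistic is

$$ F=\frac{\left(\mathrm{RSS}_r-\mathrm{RSS}_u\right)/m}{\mathrm{RSS}_u/\left(T-2m-1\right)} $$

which follows an F distribution with m and T − 2m − 1 degrees of freedom under the null hypothesis. An h value of 1 then corresponds to a p-value of this statistic below 0.05.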

Table 12 The Granger-causality test results on case to death and vice versa
Table 13 The Granger-causality test results on case to stringency and vice versa
Table 14 The Granger-causality test results on death to stringency and vice versa

The first hypothesis is based on the idea of a strong correlation between the case and death numbers. However, as can be observed from Table 12, “case” Granger-causes the “death” numbers for only seven countries. Similarly, the “death” numbers can be employed to estimate the “case” numbers for only seven countries. In addition, these are not the same countries, so this Granger-cause effect cannot be generalized across countries; therefore, it is not included in the ARIMAX model.

The second hypothesis is based on the effect of government restrictions on the case numbers and vice versa. Although this idea makes sense in theory, the test finds little statistical support for it. “Case” Granger-causes the stringency index in only two countries, and the stringency index has a significant effect on the estimation of the “case” numbers in only three countries. These three countries are modeled with ARIMAX to measure the impact on the forecasting accuracy.

As is clear from Table 14, deaths do not Granger-cause the stringency index in any country; however, in the reverse direction, the stringency index Granger-causes the death numbers for all the countries except China, and it should therefore be used as an exogenous variable in ARIMAX to increase the forecasting accuracy. Based on the Granger-causality tests, the results of the ARIMAX model for the new cases are given in Table 15.

Table 15 ARIMAX scores on new cases (stringency index as exogenous variable)

The SMAPE values of the ARIMA model for India, Russia, and Italy were 6.19%, 7.9%, and 3.22%, respectively. The ARIMAX results show that the only significant contribution of the stringency index to the estimation process is obtained for India, with an added value of 3.76%. This can be taken as a warning not to employ more complex models when the forecasting accuracy is already satisfactory.

The Granger-cause effect between the stringency index and new deaths is common across countries. Table 16 shows the results of the ARIMAX model in which the stringency index is used as an exogenous variable to predict the new deaths.

Table 16 ARIMAX scores on new deaths (stringency index as exogenous variable)

Spain gives the worst performance. When the Spain data is investigated, negative values of new deaths are observed. This corruption of the data set is reflected directly in the results. Therefore, this country requires a data cleaning process rather than a more sophisticated model. MAPE does not perform well because of the near-zero values. The UK is not suitable for fitting with ARIMAX, with a SMAPE of 72.25%, which is far greater than that of the simple ARIMA process (Table 17).

Table 17 Daily death forecasting (ARIMAX)

Tables 18 and 19 show that Holt-Winters outperforms the other models for the new cases and new deaths, except for Spain. For the new deaths, ARIMAX overfits and should not be used in the prediction of the COVID-19 numbers.

Table 18 Comparative results (new death)
Table 19 Comparative results (new case)

The ARIMA and Holt-Winters models may be used for fitting and forecasting the cases and deaths, and they can serve as benchmarks for alternative forecasting methods. Figures 4 and 5 plot the 10-day forecasting outcomes of the employed models against the test data for the World, the USA, and the UK.

Fig. 4 Forecasting world COVID-19 data (left: new cases; right: new deaths)

Fig. 5 Forecasting daily deaths (left: USA; right: UK)

4 Conclusion

COVID-19 studies form an ongoing literature across different branches. This study is among the first efforts to compile the forecasting research. With COVID-19 having completed its first year, the study employs a satisfactorily large training data set and reports the accuracy of simple but successful statistical forecasting models on a total of 24 time series (12 for new cases and 12 for new deaths). It employs three models, namely Holt-Winters, ARIMA, and ARIMAX, with five error metrics: BIC, $R^2$, RMSE, SMAPE, and MAPE. All the models provide satisfactory results, with percentage errors generally lower than 10% and $R^2$ of approximately 99.9%, showing the power of regression-based models. In general, Holt-Winters (here, double exponential smoothing) outperforms ARIMA, and despite the introduction of an exogenous variable in the estimation process, ARIMAX is the lowest performing model, though still with acceptable results for most of the countries (see Figs. 4 and 5).

The correlation of the most affected countries’ data with the world data is calculated. The Granger-causality tests show the importance of selecting the correct exogenous variable. Statistically, the new cases and new deaths are correlated; however, in the estimation process they cannot be used as auxiliary inputs for each other. The stringency index, which summarizes government measures to combat the virus, does not have a statistically significant effect on the case numbers; however, it has a Granger-cause effect on the death numbers.

With the available data set and all the parameter details, this study provides reproducible results, which may be used by other researchers as benchmarks. Future researchers may classify the countries according to their response to statistical models and, with more focused attention, for example through data cleaning or machine learning approaches, improve the fitting and forecasting accuracy. Finding a meaningful exogenous variable for the estimation would help to increase the ARIMAX performance.