Keywords

Introduction

The global spread of the COVID-19 pandemic has caused a considerable loss of life, and the pandemic has been regarded as the world’s largest economic and health disaster since World War II. According to the World Health Organization (WHO), the SARS-CoV-2 virus has infected approximately 200 million people globally. The virus spreads from person to person through respiratory transmission, and human movement enhances its transmissibility, leaving the whole population vulnerable. On May 17, confirmed COVID-19 cases numbered 4,799,266 worldwide, and 316,520 people had died of the disease. Because no vaccines or drugs were available at the start of the outbreak, countries took varied approaches to containing it. The most common responses included strict lockdowns, partial lockdowns, the closure of all educational institutions, and the cancellation of all flights. Because of the link between human movement and viral transmissibility, governments throughout the world implemented restrictions to avoid crowds, such as mandatory face masks, social distancing, and the closure of public transit and restaurants. Although such regulations slowed the spread, the emergence of lethal mutations continued to put public health at risk.

Rising patient numbers frequently leave medical supplies scarce, straining healthcare systems and personnel in many nations. Understanding the nature of the spread and accurately projecting its patterns is therefore one of the most important factors in containing and controlling it. Reliable forecasts of COVID-19 spread trends can help anticipate outbreaks and improve government readiness to combat the pandemic. Furthermore, precise forecasting can indicate whether an implemented strategy is reducing the burden on a country’s healthcare system.

This tumultuous environment of epidemic outbreaks raised many broad questions without definite answers: Will the coronavirus persist until a vaccine is developed, or will it die out after a certain period? How long will it take medical experts to develop an effective drug or vaccine? How many people will this epidemic affect? What is the likelihood of death or recovery among infected patients? Does it differ across age groups and regions of the world, and if so, why? How effective is lockdown at reducing the spread? What are the negative consequences of lockdown, and how long can various countries afford it?

Over the last decade, machine learning (ML) has established itself as a distinct academic discipline by tackling many highly complex real-world challenges. Existing research has attempted to forecast daily deaths using traditional methods and deep learning methods, such as long short-term memory (LSTM) networks and their variants. The mean squared error (MSE) and mean absolute error (MAE) are commonly used to evaluate the predictive capability of such models. In this study, a recurrent neural network, a type of deep learning model, is used to anticipate the pandemic trend in India by forecasting the number of new cases. India was chosen because, according to healthcare professionals, it is one of the ten most severely affected countries in the world. Furthermore, the LSTM model built here outperforms several previously published models; the work therefore uses it to forecast COVID-19 cases one week in advance.

The rest of the paper is organized as follows: section “Literature review” discusses related research in this field, section “Materials and methods” describes the dataset and details the proposed system, section “Results and discussion” discusses the experimental results, and section “Conclusion” concludes the study.

Literature Review

Machine learning models are widely employed to study the COVID-19 pandemic from various medical perspectives, including the impact of antibodies [1], the analysis of chest X-ray and chest CT images [2, 3], mutations [4], and the forecasting of pandemic trends.

The authors of [5] predicted the numbers of confirmed, recovered, and fatal COVID-19 cases 60 days ahead in 16 highly affected nations. They used seasonal autoregressive integrated moving average (SARIMA) and autoregressive integrated moving average (ARIMA) models and found the SARIMA model to be more realistic than the ARIMA model. Da Silva et al. [6] compared a univariate ARIMA model with a proposed hybrid model for the number of infections in the 27 most affected cities in Brazil. Their experiments showed that the hybrid ensemble model outperformed the single model by 26.73%.

India’s rapid case growth has attracted considerable research interest. Swaraj et al. [7] built a hybrid model for predicting the COVID-19 epidemic in India that combined ARIMA with a nonlinear autoregressive neural network (NAR); compared with ARIMA alone, the hybrid model considerably reduced the error metrics. Wadhwa et al. [8] forecast the number of active cases across India 3 months ahead using a linear regression (LR) model. Khan et al. [9] implemented various machine learning models to estimate when the number of cases in India would stop growing and to examine policy restrictions; according to their findings, the GPR model surpassed the other models with an accuracy of 95%. Using daily new confirmed cases in Russia, Peru, and Iran, Wang et al. [9] created an LSTM model to estimate pandemic trends over 150 days. A Bayesian model was applied to publicly available global data to assess the impact of lockdowns on COVID-19 transmission in five nations with high incidence (India, Brazil, Russia, the USA, and the UK); it concluded that lifting the lockdowns would considerably accelerate the outbreak in Brazil, India, and Russia.

In [10], a Poisson autoregression model was used to predict confirmed and recovered COVID-19 cases in Jakarta. With a mean absolute percentage error (MAPE) below 20%, the results suggest that the technique delivers adequate forecasting accuracy, and it performed better for pandemic prediction than traditional approaches such as ARIMA, exponential smoothing, BATS, and Prophet. However, the prediction quality of the Poisson autoregression technique still needs improvement. In [11], four regression models, ARIMA, multilayer perceptron (MLP), LSTM, and feedforward neural network (FNN), were used to forecast the spread of COVID-19, and the LSTM model achieved the highest forecast accuracy.

In [12], several models, including the susceptible-infected-recovered (SIR) model, linear regression, polynomial regression, support vector regression (SVR), and LSTM, were examined for projecting COVID-19 cases in Saudi Arabia and Bahrain. The results show that SVR gives the best predictions on confirmed-case data from Saudi Arabia, whereas LR outperforms the other models on confirmed-case data from Bahrain.

Materials and Methods

Description of Dataset

The data for this study were taken from the official website of the Government of India, https://www.mohfw.gov.in/. The dataset contains the daily numbers of newly confirmed COVID-19 cases, cured cases, and deaths for each state. The Ministry of Health and Family Welfare (MoHFW), India, regularly updates the confirmed cases, cured cases, new cases, and deaths, and the website provides state-wise statistics for all of these parameters. Daily statistics are available for 560 days, from January 30, 2020, to August 11, 2021, giving 18,110 records observed for different states on different days. This dataset has been used to analyze state-wise trends. Data from August 12, 2021, up to the date of this article were obtained from the Johns Hopkins University Coronavirus Resource Center, whose data are published on GitHub and updated daily. The records are split 65:35 into training and validation sets; records for 450 days are used for training, and the remaining records are used for validation. A time step of seven days is used because COVID-19 spread changes significantly from one week to the next. Figure 1a–c shows heat maps of state-wise recovered cases, confirmed cases, and deaths in India, plotted from the MoHFW data. Figure 2 depicts the total confirmed, recovered, and active cases and deaths for each state, and Fig. 3 shows the top ten states by number of confirmed cases.
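The preparation described above (a chronological training/validation split and input windows of seven time steps) can be sketched as follows. This is a minimal sketch under assumptions, not the authors' code: the conclusion states only that the data were normalized, so min-max scaling fitted on the training part is an assumption, and the helper names `chronological_split`, `minmax_fit`, `minmax_transform`, and `make_windows` are hypothetical. The split point is a parameter (the text uses 450 days).

```python
import numpy as np

def chronological_split(series, n_train):
    """Split a time series into training and validation parts without shuffling."""
    series = np.asarray(series, dtype=float)
    return series[:n_train], series[n_train:]

def minmax_fit(train):
    """Compute scaling bounds on the training part only, so validation data do not leak in."""
    return float(train.min()), float(train.max())

def minmax_transform(x, lo, hi):
    """Scale values to [0, 1] using bounds learned from the training part."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def make_windows(series, time_steps=7):
    """Turn a 1-D series into (samples, time_steps, 1) inputs and next-day targets."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for start in range(len(series) - time_steps):
        X.append(series[start:start + time_steps])   # seven consecutive days
        y.append(series[start + time_steps])         # the day that follows them
    return np.array(X)[..., np.newaxis], np.array(y)

# Usage with a hypothetical stand-in for daily confirmed cases.
daily = np.arange(1, 21, dtype=float)
train, val = chronological_split(daily, n_train=13)
lo, hi = minmax_fit(train)
X_train, y_train = make_windows(minmax_transform(train, lo, hi), time_steps=7)
print(X_train.shape, y_train.shape)  # (6, 7, 1) (6,)
```

The trailing feature axis gives the (samples, time steps, features) layout that Keras recurrent layers expect.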

Fig. 1
Three maps of India represent the data of state-wise recovered cases, confirmed cases, and deaths from COVID-19. The maps indicate that Maharashtra has the highest number of cases and deaths among all the states.

Plots from dataset. (a) State-wise recovered cases, (b) state-wise confirmed cases, (c) state-wise deaths

Fig. 2
A line graph of confirmed, recovered, active, and death cases in different states of India. Maharashtra has the highest values in all categories.

Total infected cases till August 11, 2021

Fig. 3
A table represents the confirmed, deaths, cured and active cases for 10 states of India. Maharashtra tops the table with 1928603 confirmed cases, and Rajasthan is at the bottom with 307554 cases.

Top ten states with respect to active cases (till August 11, 2021)

Forecasting COVID-19 with Recurrent Neural Network

The analysis of underlying patterns in time series data has been seen as a key way to solve a range of forecasting problems, such as stock market forecasting, traffic planning and management, and weather prediction. In healthcare applications, time series forecasting models are used to predict the spread of disease, estimate survival and mortality rates, and evaluate the risk posed by a disease over time.

Conventional time series models, e.g., ARIMA and exponential smoothing, are appropriate for short-term forecasting. Long-term forecasting involves uncovering the underlying trends in the data and the effects of associations among related parameters to estimate future values [13]. Because they demand tremendous computation, conventional techniques are limited in handling high-dimensional data and complex functions [14].
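For contrast with the deep learning models used later, simple exponential smoothing, one of the conventional models named above, fits in a few lines. This sketch is illustrative only and is not part of the proposed method; the function name and the hand-picked `alpha` are hypothetical. Its forecast is flat for every future step, so it cannot follow a sustained trend over a longer horizon.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after smoothing the whole series.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, initialised with the first value.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = float(series[0])
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level   # blend the new observation into the level
    return level                                  # the same value is forecast for every future step

print(simple_exponential_smoothing([10, 12, 11, 13], alpha=0.5))  # 12.0
```

With `alpha=1` the forecast is simply the last observation; smaller values average over a longer history.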

Deep learning models are now widely employed in forecasting problems [15], owing to their ability to learn input-output mappings and to support multiple inputs and outputs. In particular, recurrent neural networks (RNNs) can handle the sequence dependency that exists between inputs. However, in a standard RNN, the gradients propagated back through the hidden layers over many time steps tend to either vanish or explode. Long short-term memory (LSTM) was designed to tackle this gradient problem and has been employed successfully in various domains [16].
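A deliberately simplified recurrence shows why the gradient vanishes or explodes. In the scalar linear RNN assumed here, h_t = w · h_{t-1} + x_t, backpropagation through time multiplies the gradient by the recurrent weight w at every step. A nonlinear RNN also multiplies by the activation derivative at each step. The function name is hypothetical.

```python
def gradient_through_time(w, steps):
    """d h_T / d h_0 for the scalar linear recurrence h_t = w * h_{t-1} + x_t.

    Each backward step multiplies the gradient by w, so the result equals w ** steps.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= w
    return grad

for w in (0.5, 1.0, 1.5):
    print(w, gradient_through_time(w, steps=50))
```

Over 50 steps the three weights give roughly 8.9e-16 (vanishing), 1.0, and 6.4e8 (exploding). The additive cell-state update of the LSTM is what lets gradients pass through many steps without this repeated shrinking.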

ADF (Augmented Dickey-Fuller) Test

Time series forecasting models give better predictions on stationary data. As a preliminary step, we therefore checked the stationarity of the dataset using the augmented Dickey-Fuller (ADF) test, whose null hypothesis is that the series has a unit root, that is, that it is nonstationary. The test results are interpreted through the p-value. For the COVID-19 dataset, the p-value exceeds 5% (Fig. 4), so the null hypothesis cannot be rejected and the series is nonstationary.
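The regression behind the test can be sketched with NumPy. This is a simplified, non-augmented Dickey-Fuller statistic with a constant term, not the full ADF procedure. The full test adds lagged difference terms and reports MacKinnon p-values; statsmodels provides it as `statsmodels.tsa.stattools.adfuller`, which returns the statistic, the p-value, and the 1%, 5%, and 10% critical values. The function name, the synthetic series, and the rounded critical value of -2.86 are assumptions for illustration.

```python
import numpy as np

# Approximate asymptotic 5% critical value of the Dickey-Fuller test with a constant;
# finite-sample values are slightly more negative.
DF_CRITICAL_5PCT = -2.86

def dickey_fuller_stat(series):
    """Simplified (non-augmented) Dickey-Fuller t-statistic with a constant term.

    Fits dy_t = alpha + gamma * y_{t-1} + e_t by least squares and returns
    gamma / se(gamma). A value below DF_CRITICAL_5PCT rejects the unit-root null.
    """
    y = np.asarray(series, dtype=float)
    dy = np.diff(y)                                   # dy_t = y_t - y_{t-1}
    lagged = y[:-1]                                   # y_{t-1}
    X = np.column_stack([np.ones_like(lagged), lagged])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance of (alpha, gamma)
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)
growing = 0.5 * t**2 + rng.normal(0, 1, t.size)   # hypothetical cumulative-style series
white_noise = rng.normal(0, 1, 500)               # a stationary series
print(dickey_fuller_stat(growing) > DF_CRITICAL_5PCT)      # True: unit root not rejected
print(dickey_fuller_stat(white_noise) < DF_CRITICAL_5PCT)  # True: unit root rejected
```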

Fig. 4
A text panel showing the ADF statistic as 0.627317, the p-value as 0.988270, and three critical values at 1%, 5%, and 10%.

ADF statistics for the COVID-19 dataset

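To make the dataset stationary, a lag-1 difference was applied, replacing each value with its change from the previous day. The ADF statistics after differencing are shown in Fig. 5; the p-value of 0.000767 is well below 5%, so the differenced series is stationary.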
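Lag-1 differencing, and the inverse step needed to turn forecasts made on differences back into case counts, can be sketched as below. The function names are hypothetical, and the inversion shown applies only to lag 1.

```python
import numpy as np

def difference(series, lag=1):
    """y'_t = y_t - y_{t-lag}; the result is `lag` elements shorter than the input."""
    series = np.asarray(series, dtype=float)
    return series[lag:] - series[:-lag]

def invert_difference(last_observed, diffs):
    """Rebuild levels from lag-1 differences, starting after `last_observed`."""
    return last_observed + np.cumsum(np.asarray(diffs, dtype=float))

cumulative = np.array([100.0, 130.0, 170.0, 220.0])  # hypothetical cumulative counts
d = difference(cumulative)                            # [30., 40., 50.]
restored = invert_difference(cumulative[0], d)        # [130., 170., 220.]
```

A model trained on differenced data predicts changes. Adding those changes cumulatively to the last known value recovers the forecast on the original scale.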

Fig. 5
A text panel showing the ADF statistic as -4.160818, the p-value as 0.000767, and three critical values at 1%, 5%, and 10%.

ADF statistics for the COVID-19 dataset after performing lag difference technique

LSTM

RNNs are a key deep learning technique for extracting the temporal correlations hidden in time series data [17]. They carry one or more hidden states forward in time and can forecast with better accuracy than traditional methods [18,19,20]. Their major disadvantage is the vanishing gradient problem [21]. LSTM, a variant of the RNN, was developed to address this shortcoming by regulating the gradient flow [16]. LSTMs learn long-range dependencies hidden in the data through memory cells (LSTM cells). The structure of an LSTM cell is shown in Fig. 6.

Fig. 6
A flow diagram of an LSTM cell, showing four combinations of inputs to the forget gate, input gate, input update, and output gate. The final outputs are C_t and h_t.

LSTM cell

The dependencies and temporal correlations of the input are captured in the LSTM cell through a series of gates, namely the forget gate, input gate, and output gate, together with sigmoid and hyperbolic tangent activation functions. The computation at each gate of the LSTM cell is given by the equations below [22].

$$ \textrm{Input}\ \textrm{gate}\ {i}_t=\sigma \left({W}_{xi}{x}_t+{W}_{hi}{h}_{t-1}+{W}_{ci}{c}_{t-1}+{b}_i\right) $$
(1)
$$ \textrm{Forget}\ \textrm{gate}\ {f}_t=\sigma \left({W}_{xf}{x}_t+{W}_{hf}{h}_{t-1}+{W}_{cf}{c}_{t-1}+{b}_f\right) $$
(2)
$$ \textrm{Output}\ \textrm{gate}\ {o}_t=\sigma \left({W}_{xo}{x}_t+{W}_{ho}{h}_{t-1}+{W}_{co}{c}_t+{b}_o\right) $$
(3)
$$ \textrm{Cell}\ \textrm{state}\ {c}_t={f}_t{c}_{t-1}+{i}_t\tanh \left({W}_{xc}{x}_t+{W}_{hc}{h}_{t-1}+{b}_c\right) $$
(4)
$$ \textrm{Hidden}\ \textrm{state}\ {h}_t={o}_t\tanh \left({c}_t\right) $$
(5)

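where σ is the logistic sigmoid function and tanh is the hyperbolic tangent function; W and b denote weight matrices and bias vectors, and products of gate activations with states are taken element-wise. This paper implements the LSTM variants described in the following sections.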
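A NumPy sketch of one forward step of the cell in Eqs. (1)–(5) makes the order of computation explicit. The output gate in Eq. (3) looks at the updated cell state, so the cell state is computed first. The weights are random placeholders, and the peephole terms (the weights acting on the cell state) are implemented as full matrices exactly as the equations are written. Keras' standard `LSTM` layer, which the experiments use, has no peephole connections, so this sketch is illustrative only. The dictionary key names and helper functions are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, seed=0):
    """Small random weights; key names mirror the subscripts in Eqs. (1)-(5)."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "ifoc":                                   # input, forget, output, candidate
        p[f"W_x{g}"] = rng.normal(0, 0.1, (n_hidden, n_in))
        p[f"W_h{g}"] = rng.normal(0, 0.1, (n_hidden, n_hidden))
        p[f"b_{g}"] = np.zeros(n_hidden)
    for g in "ifo":                                    # peephole weights on the cell state
        p[f"W_c{g}"] = rng.normal(0, 0.1, (n_hidden, n_hidden))
    return p

def lstm_step(x_t, h_prev, c_prev, p):
    """One forward step of the peephole LSTM cell in Eqs. (1)-(5)."""
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["W_ci"] @ c_prev + p["b_i"])  # (1)
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["W_cf"] @ c_prev + p["b_f"])  # (2)
    c_t = f_t * c_prev + i_t * np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])  # (4)
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["W_co"] @ c_t + p["b_o"])     # (3) uses c_t
    h_t = o_t * np.tanh(c_t)                                                            # (5)
    return h_t, c_t

# Usage: run the cell over a hypothetical 7-day window of scaled values.
params = init_params(n_in=1, n_hidden=4)
h, c = np.zeros(4), np.zeros(4)
for value in [0.10, 0.20, 0.15, 0.30, 0.25, 0.40, 0.35]:
    h, c = lstm_step(np.array([value]), h, c, params)
```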

Stacked LSTM

In a stacked LSTM, multiple LSTM layers are stacked together, as depicted in Fig. 7. Each intermediate LSTM layer outputs a sequence, one output for each time step rather than a single output for all input time steps, and this sequence is fed to the next LSTM layer. The computation in layer L is given in Eqs. (6), (7), (8), (9) and (10) [22], where the input to layer L at time t is the hidden state of layer L − 1 (for the first layer, the input x_t).

Fig. 7
A diagram of a stacked LSTM, showing inputs x_{t−1}, x_t, x_{t+1}, …, x_{t+n} flowing through several LSTM layers to outputs y_{t−1}, y_t, y_{t+1}, …, y_{t+n}.

Stacked LSTM

$$ \textrm{Input}\ \textrm{gate}\ {i}_t^L=\sigma \left({W}_{ih}^L{h}_{t-1}^L+{W}_{ix}^L{h}_t^{L-1}+{b}_i^L\right) $$
(6)
$$ \textrm{Forget}\ \textrm{gate}\ {f}_t^L=\sigma \left({W}_{fh}^L{h}_{t-1}^L+{W}_{fx}^L{h}_t^{L-1}+{b}_f^L\right) $$
(7)
$$ \textrm{Output}\ \textrm{gate}\ {o}_t^L=\sigma \left({W}_{oh}^L{h}_{t-1}^L+{W}_{ox}^L{h}_t^{L-1}+{b}_o^L\right) $$
(8)
$$ \textrm{Cell}\ \textrm{state}\ {c}_t^L={f}_t^L{c}_{t-1}^L+{i}_t^L\tanh \left({W}_{ch}^L{h}_{t-1}^L+{W}_{cx}^L{h}_t^{L-1}+{b}_c^L\right) $$
(9)
$$ \textrm{Hidden}\ \textrm{state}\ {h}_t^L={o}_t^L\tanh \left({c}_t^L\right) $$
(10)
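The stacking in Eqs. (6)–(10) can be sketched in NumPy: every layer returns its hidden state at every time step, and that whole sequence becomes the input of the next layer. In Keras the same behaviour comes from `return_sequences=True` on every LSTM layer except possibly the last. This sketch uses tanh for the candidate and the output, as the equations do, whereas the proposed Keras models configure ReLU activations. The layer sizes, random weights, and helper names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layer(n_in, n_hidden, rng):
    """Weights for one layer; gate blocks are stacked in the order i, f, o, candidate."""
    return {
        "W_x": rng.normal(0, 0.1, (4 * n_hidden, n_in)),      # acts on h_t^{L-1}
        "W_h": rng.normal(0, 0.1, (4 * n_hidden, n_hidden)),  # acts on h_{t-1}^L
        "b": np.zeros(4 * n_hidden),
    }

def lstm_layer(inputs, p):
    """Run one LSTM layer (Eqs. 6-10) and return its hidden state at every time step."""
    n = p["W_h"].shape[1]
    h, c, outputs = np.zeros(n), np.zeros(n), []
    for x_t in inputs:
        z = p["W_x"] @ x_t + p["W_h"] @ h + p["b"]
        i, f, o = sigmoid(z[:n]), sigmoid(z[n:2 * n]), sigmoid(z[2 * n:3 * n])
        c = f * c + i * np.tanh(z[3 * n:])                  # Eq. (9)
        h = o * np.tanh(c)                                  # Eq. (10)
        outputs.append(h)
    return np.array(outputs)                                # shape (time_steps, n_hidden)

def stacked_lstm(inputs, layers):
    """Feed the full output sequence of each layer into the next layer."""
    seq = inputs
    for p in layers:
        seq = lstm_layer(seq, p)
    return seq

rng = np.random.default_rng(0)
window = rng.random((7, 1))                   # hypothetical 7-day window, one feature
layers = [init_layer(1, 8, rng), init_layer(8, 8, rng)]
seq = stacked_lstm(window, layers)
print(seq.shape)                              # (7, 8): one output per time step
last = seq[-1]                                # what a final layer without return_sequences keeps
```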

Bidirectional LSTM

Unlike a standard LSTM, which processes inputs only forward in time, a bidirectional LSTM uses information from both directions by running one LSTM from past to future and another from future to past, as shown in Fig. 8.

Fig. 8
A diagram of a bidirectional LSTM, showing inputs x_{t−1}, x_t, x_{t+1}, …, x_{t+n} flowing through two LSTM layers, one per direction, to outputs y_{t−1}, y_t, y_{t+1}, …, y_{t+n}.

Bidirectional LSTM

The computations of the backward layer at each stage are given below [22]; the forward layer follows Eqs. (6)–(10).

$$ \textrm{Input}\ \textrm{gate}\ {i}_t^{\leftarrow L}=\sigma \left({W}_{\leftarrow ih}^L{h}_{t+1}^{\leftarrow L}+{W}_{\leftarrow ix}^L{h}_t^{L-1}+{b}_{\leftarrow i}^L\right) $$
(11)
$$ \textrm{Forget}\ \textrm{gate}\ {f}_t^{\leftarrow L}=\sigma \left({W}_{\leftarrow fh}^L{h}_{t+1}^{\leftarrow L}+{W}_{\leftarrow fx}^L{h}_t^{L-1}+{b}_{\leftarrow f}^L\right) $$
(12)
$$ \textrm{Output}\ \textrm{gate}\ {o}_t^{\leftarrow L}=\sigma \left({W}_{\leftarrow oh}^L{h}_{t+1}^{\leftarrow L}+{W}_{\leftarrow ox}^L{h}_t^{L-1}+{b}_{\leftarrow o}^L\right) $$
(13)
$$ \textrm{Cell}\ \textrm{state}\ {c}_t^{\leftarrow L}={f}_t^{\leftarrow L}{c}_{t+1}^{\leftarrow L}+{i}_t^{\leftarrow L}\tanh \left({W}_{\leftarrow ch}^L{h}_{t+1}^{\leftarrow L}+{W}_{\leftarrow cx}^L{h}_t^{L-1}+{b}_{\leftarrow c}^L\right) $$
(14)
$$ \textrm{Hidden}\ \textrm{state}\ {h}_t^{\leftarrow L}={o}_t^{\leftarrow L}\tanh \left({c}_t^{\leftarrow L}\right) $$
(15)

The network output combines the hidden states of both directions and is given by

$$ \textrm{Output}\ {y}_t={W}_{hy}^{\leftarrow }{h}_t^{\leftarrow }+{W}_{hy}^{\to }{h}_t^{\to }+{b}_y $$
(16)
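Eq. (16) can be checked with a small NumPy sketch. One LSTM reads the window from past to future and a second reads it from future to past. The backward states are flipped back so that row t refers to time t, and the two are combined with separate output weights. The weights are random and untrained, and the helper names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_lstm(inputs, W_x, W_h, b):
    """Run one LSTM over `inputs` in the given order; gate blocks ordered i, f, o, candidate."""
    n = W_h.shape[1]
    h, c, states = np.zeros(n), np.zeros(n), []
    for x_t in inputs:
        z = W_x @ x_t + W_h @ h + b
        i, f, o = sigmoid(z[:n]), sigmoid(z[n:2 * n]), sigmoid(z[2 * n:3 * n])
        c = f * c + i * np.tanh(z[3 * n:])
        h = o * np.tanh(c)
        states.append(h)
    return np.array(states)

def bidirectional_output(inputs, fwd, bwd, W_hy_bwd, W_hy_fwd, b_y):
    """Eq. (16) at every time step: y_t = W_hy^<- h_t^<- + W_hy^-> h_t^-> + b_y."""
    h_fwd = run_lstm(inputs, *fwd)               # past -> future
    h_bwd = run_lstm(inputs[::-1], *bwd)[::-1]   # future -> past, flipped so row t is time t
    return h_bwd @ W_hy_bwd.T + h_fwd @ W_hy_fwd.T + b_y

rng = np.random.default_rng(1)
n_in, n_hidden, T = 1, 6, 7

def random_lstm_params():
    return (rng.normal(0, 0.5, (4 * n_hidden, n_in)),
            rng.normal(0, 0.5, (4 * n_hidden, n_hidden)),
            np.zeros(4 * n_hidden))

fwd, bwd = random_lstm_params(), random_lstm_params()
W_hy_bwd = rng.normal(0, 0.5, (1, n_hidden))
W_hy_fwd = rng.normal(0, 0.5, (1, n_hidden))
b_y = np.zeros(1)
x = rng.random((T, n_in))                     # hypothetical 7-step window
y = bidirectional_output(x, fwd, bwd, W_hy_bwd, W_hy_fwd, b_y)
print(y.shape)                                # (7, 1)
```

Because of the backward pass, the output at the first time step already depends on the last input, which a forward-only LSTM cannot achieve.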

The Proposed Models

LSTM Model

Three LSTM models were built for this study and evaluated on the dataset. The first model, based on stacked LSTM, is shown in Fig. 9. It has an input layer, two LSTM hidden layers, a fully connected layer, and an output layer. The input time sequence length is set to 7, reflecting the significance of a week of COVID-19 data. Both LSTM hidden layers have 150 units and a rectified linear unit (ReLU) activation function. The fully connected layer has 64 neurons, and the output layer is a dense layer with 1 neuron. The second model is identical to the first except for an additional dropout layer (dropout probability 0.5) after the first LSTM hidden layer.
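The two stacked architectures can be written down in Keras roughly as follows. This is a sketch, not the authors' code. The layer sizes, ReLU activations, sequence length, and dropout rate come from the description above. The activation of the 64-neuron dense layer, the Adam optimizer, and the MSE loss are assumptions, because the hyperparameter values of Table 1 are not given in the text. The architecture is kept as a plain list so it can be inspected without TensorFlow; `build_model` imports TensorFlow only when called. The function names are hypothetical.

```python
def stacked_lstm_spec(time_steps=7, n_features=1, dropout_rate=None):
    """Input shape and layer list for the stacked LSTM; dropout_rate=0.5 gives the second model."""
    layers = [("LSTM", {"units": 150, "activation": "relu", "return_sequences": True})]
    if dropout_rate is not None:
        layers.append(("Dropout", {"rate": dropout_rate}))   # after the first LSTM layer
    layers += [
        ("LSTM", {"units": 150, "activation": "relu"}),      # returns only the last time step
        ("Dense", {"units": 64, "activation": "relu"}),      # activation is an assumption
        ("Dense", {"units": 1}),                             # next-day confirmed cases
    ]
    return (time_steps, n_features), layers

def build_model(input_shape, layers):
    """Build and compile the Keras model; requires TensorFlow."""
    from tensorflow import keras  # lazy import so the spec can be inspected without TensorFlow
    model = keras.Sequential(
        [keras.Input(shape=input_shape)]
        + [getattr(keras.layers, name)(**kwargs) for name, kwargs in layers]
    )
    model.compile(optimizer="adam", loss="mse")  # assumed settings, not taken from Table 1
    return model

input_shape, layers = stacked_lstm_spec(dropout_rate=0.5)
# model = build_model(input_shape, layers)
# model.fit(X_train, y_train, epochs=150, validation_data=(X_val, y_val))  # hypothetical arrays
```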

Fig. 9
A diagram of the stacked LSTM model, showing the input and output values of its layers in 5 tables.

Proposed stacked LSTM model

The hyperparameters set for both models are summarized in Table 1.

Table 1 Hyperparameters of the proposed model

Bidirectional LSTM Model

A bidirectional LSTM model with the architecture shown in Fig. 10 was implemented. It has an input layer, two bidirectional LSTM hidden layers, a fully connected layer, and an output layer. As in the stacked LSTM model, the input time sequence length is set to 7. Both bidirectional hidden layers use LSTM layers with 300 units and a rectified linear unit (ReLU) activation function. The fully connected layer has 150 neurons, and the output layer is a dense layer with 1 neuron.
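A corresponding Keras sketch of the bidirectional model follows, under the same assumptions as before: Adam and MSE, and a ReLU activation for the 150-neuron dense layer, none of which is stated in the text. The text also does not say whether the 300 units are per direction or in total; here each direction gets 300. `keras.layers.Bidirectional` concatenates the forward and backward outputs by default (`merge_mode="concat"`). A dense layer applied to that concatenation computes W_← h_← + W_→ h_→ + b, which has the same form as Eq. (16). The function names are hypothetical.

```python
def bidirectional_spec(time_steps=7, n_features=1):
    """Input shape, per-layer LSTM settings, and dense layer sizes for the bidirectional model."""
    lstm_settings = [
        {"units": 300, "activation": "relu", "return_sequences": True},  # passes the full sequence on
        {"units": 300, "activation": "relu"},                            # keeps only the last step
    ]
    return (time_steps, n_features), lstm_settings, [150, 1]

def build_bidirectional_model(input_shape, lstm_settings, dense_units):
    """Build and compile the Keras model; requires TensorFlow."""
    from tensorflow import keras  # lazy import so the spec can be inspected without TensorFlow
    layers = [keras.Input(shape=input_shape)]
    for kwargs in lstm_settings:
        # Wraps a forward and a backward copy of the LSTM and concatenates their outputs
        layers.append(keras.layers.Bidirectional(keras.layers.LSTM(**kwargs)))
    layers.append(keras.layers.Dense(dense_units[0], activation="relu"))  # activation assumed
    layers.append(keras.layers.Dense(dense_units[1]))                     # single forecast value
    model = keras.Sequential(layers)
    model.compile(optimizer="adam", loss="mse")  # assumed settings
    return model

input_shape, lstm_settings, dense_units = bidirectional_spec()
# model = build_bidirectional_model(input_shape, lstm_settings, dense_units)
```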

Fig. 10
A diagram of the bidirectional LSTM model, showing the input and output values of its layers in 5 tables.

Proposed bidirectional LSTM model

Results and Discussion

This section discusses the performance of the three proposed LSTM variants, stacked LSTM, stacked LSTM + dropout, and bidirectional LSTM, on the Indian COVID-19 dataset. All models were trained on the same training data and evaluated on the same validation data, and all of them forecast the confirmed-cases attribute of the dataset. Confirmed COVID-19 cases for a 1-year period (2021–2022) and a 2-year period (2020–2022) are plotted in Fig. 11a, b, respectively.

Fig. 11
A set of 2 graphs. Graph a represents confirmed COVID cases from April 2021 to April 2022 with an increasing trend. Graph b represents the number of confirmed, cured, and death cases since January 2020.

Confirmed cases (a) for 1 year period, (b) for 2 years period

Figure 11a shows rises in two periods: one beginning in January 2021 and peaking in March 2021, and one beginning in January 2022 and peaking in February 2022. These two peaks correspond to the second and third waves in India, respectively. The second wave started in January 2021, peaked in March 2021, and declined in June 2021; the third wave, driven by the Omicron variant, started in January 2022, peaked in February 2022, and declined in March 2022. Figure 11b shows the start of the first wave (January 2020) and the overall trend over 2020–2022.

Figure 12 shows the trends in active, cured, and death cases over 2020–2022 for the three most severely affected states, Maharashtra, Karnataka, and Tamil Nadu. Although the counts differ from state to state, all three states follow broadly the same trend throughout the 2-year period.

Fig. 12
A set of 9 graphs represents the active, cured, and death cases in Maharashtra, Karnataka, and Tamil Nadu. Maharashtra nearly surpasses both states in all three categories.

Trend of active, cured, and death in top three states

The three proposed models, stacked LSTM, LSTM with a dropout layer, and bidirectional LSTM, were trained on the training dataset and evaluated on the validation dataset. All experiments were run on a GPU with 16 GB of memory using the Keras framework with the TensorFlow backend.

The proposed models were trained for 50, 100, 150, and 500 epochs. The training and validation losses at 150 epochs are shown in Fig. 13a (stacked LSTM), Fig. 13b (bidirectional LSTM), and Fig. 14 (LSTM + dropout). Loss curves show whether a model underfits, overfits, or fits the data well. An underfitting model has high bias: its training loss stays high and does not decrease as more data is seen, indicating that it cannot learn from the training data. An overfitting model has high variance: it performs well on the training data but poorly on unseen data, meaning that it does not generalize.

Fig. 13
A set of 2 graphs showing the training and validation loss for the LSTM and bidirectional LSTM models. Both graphs show multiple spikes in validation loss.

(a) Plot of training/validation loss for LSTM, (b) training/validation loss for bidirectional LSTM

Fig. 14
A graph of loss versus epochs showing the training and validation loss for LSTM + dropout, with multiple spikes in validation loss.

Training/validation loss for LSTM + dropout

For the stacked LSTM, both training and validation losses are high when few training samples have been seen and decrease as more samples are used. The two curves also follow the same path with only a small gap between them, indicating that the stacked LSTM model fits the data well and can generalize to unseen data.

For the bidirectional LSTM, the validation loss is much higher than the training loss and spikes at several batches of data points. This indicates that the bidirectional model overfits and does not generalize to new data.

The loss curves of the stacked LSTM + dropout model behave similarly to those of the first model, but the validation loss is higher.

Error measures of the proposed models were calculated using the root mean squared error (RMSE) and the mean absolute percentage error (MAPE) and are listed in Table 2.
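For reference, the error measures used here (RMSE and MAPE) and the MSE and MAE mentioned in the introduction follow the standard definitions below. This is a plain-Python sketch with hypothetical function names. MAPE is undefined when an actual value is zero, which can occur with daily new-case counts.

```python
import math

def _check(actual, predicted):
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")

def mse(actual, predicted):
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))   # same units as the case counts

def mae(actual, predicted):
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual, predicted = [100, 200, 400], [110, 190, 400]   # hypothetical counts
print(rmse(actual, predicted), mape(actual, predicted))
```

Here the errors are 10, -10, and 0, so RMSE is √(200/3) ≈ 8.16 and MAPE is (10% + 5% + 0%) / 3 = 5%.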

Table 2 Performance measure

As the stacked LSTM and LSTM + dropout models performed better, these two models were used for forecasting. They forecast the confirmed cases for 7 days from June 21, 2022, to June 28, 2022, as shown in Fig. 15a, b, respectively.
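The text does not state how the 7-day forecasts were produced. One common approach, assumed here, is recursive forecasting: each one-day prediction is appended to the input window and used to predict the following day. In the sketch, `predict_one` is a hypothetical stand-in for a trained model. If the model was trained on scaled or differenced data, the forecasts must be inverted before they are compared with actual counts.

```python
import numpy as np

def recursive_forecast(predict_one, history, horizon=7, time_steps=7):
    """Forecast `horizon` days by feeding each prediction back into the input window.

    predict_one maps a window of shape (time_steps,) to the next value; with a trained
    Keras model it could be lambda w: float(model.predict(w.reshape(1, -1, 1))[0, 0]).
    """
    window = list(np.asarray(history, dtype=float)[-time_steps:])
    forecasts = []
    for _ in range(horizon):
        nxt = float(predict_one(np.array(window)))
        forecasts.append(nxt)
        window = window[1:] + [nxt]   # slide: drop the oldest day, append the prediction
    return np.array(forecasts)

# Usage with a hypothetical predictor that adds 1 to the last value.
forecast = recursive_forecast(lambda w: w[-1] + 1, history=np.arange(1, 8))
print(forecast)                        # [ 8.  9. 10. 11. 12. 13. 14.]
```

Recursive forecasting lets errors accumulate over the horizon. The alternative, direct forecasting, would train a model to output all seven days at once.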

Fig. 15
A set of 2 graphs showing actual and predicted confirmed cases for the next 7 days using the LSTM and LSTM + dropout models. The predicted values follow a decreasing trend.

Forecasting using (a) LSTM model, (b) LSTM + dropout model

Conclusion

This study proposed three LSTM-variant models to forecast confirmed COVID-19 cases in India. The data were collected from the Government of India website and Johns Hopkins University, preprocessed, normalized, and split into training and testing datasets. The first model is a stacked LSTM with an input layer, two hidden LSTM layers, a dense layer, and an output layer. The second model adds a dropout layer to the first model. The third model is a bidirectional LSTM. The performance of the proposed models was evaluated on the test dataset using MAPE and RMSE. The findings revealed that the proposed stacked LSTM outperforms the other models and is best suited to the Indian COVID-19 dataset.