
1 Introduction

The COVID-19 pandemic began in 2019 and spread throughout the entire world. Many organizations and individuals have claimed that the pandemic originated in China, although China disputes this claim. Since 2019, populations have been affected by the coronavirus and its variants, which have posed one of the biggest threats to the economy, healthcare, and governance. At the beginning of the epidemic, identifying infected patients was difficult and slow. Deep learning (DL) is one of the possible solutions that can be combined with clinical tests to allow accurate detection of infection cases and to automate the initial steps [1]. The spread rate of COVID-19 depends on time, situation, weather, population, and lifestyle, and varies considerably by region. Many strategies have been proposed for locating and assessing the prevailing extent of infection. When an epidemic spreads across a region or country over time, particularly under climate cycle variations or changes in viral transmission over the period, the resulting records exhibit non-linear characteristics [2]. Analyses of diverse COVID-19 data sets have established that infection cases grew rapidly over certain periods of time. Many mathematical models are used to predict and evaluate the progression of confirmed cases [3], and several modeling, estimation, and forecasting approaches have been adopted to deal with this pandemic.

The DL approach Recurrent Neural Network (RNN) has been used to predict possible infection rates. In supervised learning, the challenge is to train on the inputs so that the ML mechanism learns a pattern that captures the relationships in a complex data set and maps them to a predefined set of outputs [4]. An Artificial Neural Network (ANN) consists of one or more layers, and DL neural networks are composed of ANNs, so RNN algorithms share this layered structure. When using DL to predict the infection rate of the next coronavirus wave, choosing the correct algorithm is difficult. Deep learning methods can relate the structure and pattern of such data to its non-linearity while limiting algorithmic complexity; LSTM, for example, has been used in time-series forecasting [5]. Infection rates and climatic conditions varied within the selected countries. Splitting the data accordingly shrinks the data sets, and GRU, a streamlined variant of RNN that carries transformed data forward to the next iteration, is expected to perform well in prediction on such data. Forecasting COVID-19 transmission is more likely to succeed if the input data have temporal components and the approach is not based on typical regression methods [6].

Many researchers have analyzed COVID-19 time-series data; careful analysis of time-series data aids in making new decisions that are important for public awareness. Gated recurrent units (GRU) and long short-term memory (LSTM) are similar, with some differences in their computational components. Both perform well, but differences emerge as the data set becomes smaller or larger. Many researchers have shown that DL gives satisfactory results in time-series forecasting, and several works in the literature on time-series analysis of COVID-19 data follow a similar process.

Chowdhury et al. [7] focused on finding a suitable machine learning algorithm that can predict daily new COVID-19 cases with higher accuracy. They used an adaptive neuro-fuzzy inference system (ANFIS) and LSTM to forecast newly infected cases in Bangladesh; in that study, LSTM showed a favorable result on a scenario-based model, with a MAPE of 4.51, an RMSE of 6.55, and correlation coefficient −0.75. Liao et al. [8] reported a COVID-19 prediction model based on a time-dependent + SIRVE model. GRU forecasting accuracy was notable, and they showed that the single-day prediction accuracy rate improved by 51% compared with the best existing single deep learning predictions. Shahid et al. [2] compared forecast models, assessing LSTM, GRU, and Bi-LSTM for time-series prediction of confirmed cases, deaths, and recoveries in ten countries affected by COVID-19; among them, Bi-LSTM predicted best, but GRU also showed good accuracy. Arun Kumar et al. [9] proposed state-of-the-art DL Recurrent Neural Network (RNN) models with GRU and LSTM cells to predict the country-wise cumulative confirmed cases, cumulative recovered cases, and cumulative fatalities, and showed that the individual models vary in their results across the 10 countries: for some countries LSTM gave satisfactory results, and for others GRU gave good accuracy. Engelbrecht and Scholes [10] tested for seasonal climate sensitivity in observed COVID-19 infection data and showed that if the disease has a substantial seasonal dependence and herd immunity is not established during the first peak season of an outbreak, a seasonality-sensitive second surge of infections is likely about one year after the original outbreak.

The remainder of this paper presents mathematical equations, data visualization graphs, calculated data, and graphical figures to give a clearer understanding of the dual application of deep neural network techniques, which produced a satisfactory outcome. Seasonal changes have a significant impact on new cases, where a range of temperatures represents weather and season.

2 Related Work

Chowdhury et al. [7] focused on finding a suitable machine learning algorithm that can predict daily new COVID-19 cases with higher accuracy. They used an adaptive neuro-fuzzy inference system (ANFIS) and LSTM to forecast newly infected cases in Bangladesh; in that study, LSTM showed a favorable result on a scenario-based model, with a MAPE of 4.51, an RMSE of 6.55, and correlation coefficient −0.75. Liao et al. [8] reported a COVID-19 prediction model based on a time-dependent + SIRVE model. This model combines DL technology with the mathematical modeling of infectious diseases and forecasts the parameters of the mathematical model by fusing DL time-series prediction methods. In their results, GRU forecasting accuracy was notable, and the accuracy rate improved by 51% compared with the best existing single deep learning predictions. Shahid et al. [2] compared forecast models, assessing LSTM, GRU, and Bi-LSTM for time-series prediction of confirmed cases, deaths, and recoveries in ten countries affected by COVID-19. Among them, Bi-LSTM predicted best, but GRU also showed good accuracy; the model ranking from best to worst performance in their scenario was Bi-LSTM, LSTM, GRU, SVR, and ARIMA, with Bi-LSTM producing the lowest MAE and RMSE values of 0.0070 and 0.0077, respectively. Arun Kumar et al. [9] proposed state-of-the-art DL Recurrent Neural Network (RNN) models with Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) cells to predict the country-wise cumulative confirmed cases and cumulative recovered cases and to forecast the future trends of COVID-19, and showed variations in the results across 10 countries: for some countries LSTM gave satisfactory results, and for others GRU gave good accuracy.
Engelbrecht and Scholes [10] tested for seasonal climate sensitivity in observed COVID-19 infection data and showed that if the disease has a substantial seasonal dependence and herd immunity is not established during the first peak season of an outbreak, a seasonality-sensitive second surge of infections is likely about one year after the original outbreak.

3 Methodology

The workflow applied in this study is displayed in Fig. 1. Table 1 outlines the computed accuracy of the DL models. The mathematical equations used are given in Eqs. (1) to (7). The data visualizations and predicted results are presented in Figs. 2 to 10 and Table 1. The whole working process is described to give a clearer understanding of this research.

Fig. 1. Methodology diagram and working process.

3.1 COVID-19 Data Set

The data sets were assembled from multiple sources. The final data sets used in this work were compiled from “OurWorldInData (OWID)” [11], which provides publicly accessible daily datasets, and “NASA Prediction of Worldwide Energy Resources” [12], which supplied daily datasets. The data, spanning April 2020 to March 2022 (600 days), were ordered in a time-series format by day, month, and year. The essential parameters from “OWID” [11] are new cases, new deaths, and new tests. Different temperature parameters were collected from “NASA” for the given latitude and longitude. We used five highly populated countries listed on the “WorldOmeters” [13] website. For clearer observation of the data, a common parameter called date is present in all of the separate datasets.

3.1.1 Data Preprocessing

This research concentrates on seasonal COVID-19 cases, where a range of temperatures indicates a season, and the data set has temperature and date columns. The time-series forecasting centers on the seasonal effect, using a column of new cases taken from the “OWID” data set. The temperature T2M (temperature at 2 m) was collected from “NASA”. The highly populated countries in this research are Bangladesh, the Philippines, Mexico, Vietnam, and Indonesia, as listed on the “worldometer” population page. We cleaned the noisy data and filled the NaN values by linear interpolation, then used the mean value to fill the remaining gaps, using a pandas data frame. For forecasting, the data set was divided into 5 different sections, because this research forecasts the seasonal effect of COVID-19 on individual countries. This research focuses on 3 different seasons: summer, winter, and spring.
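The two-step gap filling described above can be sketched as follows. This is a minimal pure-Python stand-in for the pandas operations, not the code used in the study: the function name `fill_missing` and the sample numbers are hypothetical, and it assumes that "filling linearly" means interpolating between the nearest known neighbours, with the mean of the known values used for any gap that has no neighbour on one side.

```python
import math


def fill_missing(values):
    """Fill NaN entries in a daily series.

    Step 1: linear interpolation inside gaps bounded by two known values.
    Step 2: the mean of the known values for anything still missing
            (for example, gaps at the very start or end of the series).
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if not math.isnan(v)]
    if not known:
        raise ValueError("series has no known values")

    # Step 1: walk over consecutive pairs of known positions.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = filled[left] + weight * (filled[right] - filled[left])

    # Step 2: whatever is still NaN gets the mean of the known values.
    mean = sum(values[i] for i in known) / len(known)
    return [mean if math.isnan(v) else v for v in filled]


nan = float("nan")
daily_cases = [nan, 10.0, nan, nan, 40.0, nan]  # made-up numbers
print(fill_missing(daily_cases))
```

In this toy series the two inner gaps are interpolated between 10 and 40, while the first and last entries fall back to the mean of the known values.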

The main data set has date, new cases, temperature, season, and location columns. The countries are mostly in the Asia region and have a similar range of temperatures. The list of highly populated countries was taken from the “WorldOmeters” website. Temperature was collected by supplying the latitude and longitude of each country, obtained from Google: Bangladesh 23.6850° N, 90.3563° E; Mexico 23.6345° N, 102.5528° W; Philippines 12.8797° N, 121.7740° E; Vietnam 14.0583° N, 108.2772° E; Indonesia 0.7893° S, 113.9213° E. The data set is separated by season: March, April, May, and June are considered summer; July, August, September, and October are considered spring; and November, December, January, and February are considered winter. The dates were converted from string to timestamp format. Data normalization is one of the most important steps before training LSTM and GRU models. In this research, MinMaxScaler is used for normalization. MinMaxScaler maps the training inputs into the range [0, 1], as shown in Eq. (1): for each variable, the minimum value becomes 0 and the maximum value becomes 1. Normalization avoids scaling problems when training and testing the models (Table 1).

$$ \mathrm{Scale} = \frac{\mathrm{Input} - \min(\mathrm{Input})}{\max(\mathrm{Input}) - \min(\mathrm{Input})} $$
(1)

The training set is the input and the scaled values are the output; every training set passes through this scaling equation for normalization.
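As an illustration of the season labelling and of Eq. (1), the sketch below maps a date string to the season grouping given in this section and rescales a list of values to the range 0 to 1. It uses only the Python standard library rather than MinMaxScaler itself. The names `season_of` and `min_max_scale` are hypothetical, the ISO date format is an assumption, and mapping a constant series to 0.0 is a choice made here to avoid division by zero.

```python
from datetime import datetime

# Month-to-season grouping as stated in the text.
SEASON_BY_MONTH = {
    3: "summer", 4: "summer", 5: "summer", 6: "summer",
    7: "spring", 8: "spring", 9: "spring", 10: "spring",
    11: "winter", 12: "winter", 1: "winter", 2: "winter",
}


def season_of(date_string):
    """Convert a 'YYYY-MM-DD' string to a timestamp and return its season."""
    month = datetime.strptime(date_string, "%Y-%m-%d").month
    return SEASON_BY_MONTH[month]


def min_max_scale(values):
    """Apply Eq. (1): (input - min) / (max - min) for every value."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant series carries no range, so map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


print(season_of("2021-07-15"))          # a July date falls in "spring" here
print(min_max_scale([10.0, 20.0, 30.0]))  # made-up temperatures
```

The minimum and maximum should be taken from the training data only and then reused for the test data, so that the test set does not leak into the scaling.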

3.2 Deep Learning Models Details

DL is one of the significant methods of forecasting. Forecasting is a difficult task for traditional programs, and DL has been shown to significantly improve prediction on both structured and unstructured data [14]. Real-time data are rather hard to process: the work begins with locating statistical data files, continues with transforming them into training and test sets, and ends with applying an RNN and representing the results through visual analysis [15]. In this research, DL RNN models were applied to time-series forecasting.

An RNN maps target vectors from the entire history of past inputs. Compared with traditional models of sequence data, RNNs are therefore simpler for modeling the dynamics of continuous sequence data. In general, an RNN forms connections between units in directed cycles and remembers past inputs through its internal state. A deep hidden-to-output function helps to disentangle the factors of variation in the hidden state, making it easier to predict the output and to summarize the history of preceding inputs more efficiently [16]. Stochastic gradients tend to vanish or explode, which makes it hard to keep track of long-term dependencies with a simple RNN. To overcome the vanishing or exploding gradient problem, RNNs with LSTM and GRU cells have been developed [17]; in these cells, unnecessary data are removed and useful data are stored in the memory cell for the next iteration.

3.2.1 Long-Short-Term-Memory (LSTM)

LSTM copes well with vanishing and exploding gradients. In the RNN model, problems occur when the network is unrolled over a large amount of data, because the memory unit then fills up with unnecessary data. To avoid this, LSTM was introduced with a memory unit called the cell state, shown in Eq. (2).

$$ \text{Cell state}_{t} = \left( \text{input gate} * \text{new candidate} \right) + \left( \text{forget gate} * \text{Cell state}_{t-1} \right) + b $$
(2)

The input, output, and forget gates use the sigmoid activation function σ, and the new candidate uses the tanh activation function; all four share the general form shown in Eq. (3).

$$ \sigma/\tanh\left( W \cdot X_{t} + U \cdot h_{t-1} + b \right) $$
(3)

In LSTM, the weights are updated continually in each layer, with new weights generated automatically from the calculated correction value. The model then produces a new state, as shown in Eq. (4).

$$ \text{New state}_{t} = \text{output gate} * \tanh\left( \text{Cell state}_{t} \right) $$
(4)

Here W and U are weights, b is the bias, X_{t} is the input at time step t, Cell state_{t-1} is the previous cell state fed back as input, and h_{t-1} is the previous hidden state fed back as input.
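To make the gate arithmetic concrete, here is a minimal scalar sketch of one LSTM time step. It follows the standard LSTM formulation, in which the new hidden state is the output gate multiplied by tanh of the new cell state, and it is not the implementation used in this study. The function name `lstm_step`, the parameter layout, and the optional `cell_bias` argument (included only to mirror the b term of Eq. (2)) are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params, cell_bias=0.0):
    """One scalar LSTM step.

    params maps 'i' (input gate), 'f' (forget gate), 'o' (output gate)
    and 'g' (new candidate) to a (W, U, b) triple, as in Eq. (3).
    Returns the new hidden state and the new cell state.
    """
    def pre_activation(name):
        w, u, b = params[name]
        return w * x + u * h_prev + b

    input_gate = sigmoid(pre_activation("i"))
    forget_gate = sigmoid(pre_activation("f"))
    output_gate = sigmoid(pre_activation("o"))
    candidate = math.tanh(pre_activation("g"))

    # Eq. (2): blend the new candidate with the previous cell state.
    c_new = input_gate * candidate + forget_gate * c_prev + cell_bias
    # Standard LSTM output: the output gate filters tanh of the cell state.
    h_new = output_gate * math.tanh(c_new)
    return h_new, c_new


# Hypothetical parameters: all zeros, so every gate sits at 0.5.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("i", "f", "o", "g")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
print(h, c)
```

With all-zero parameters the forget gate keeps half of the previous cell state and the candidate contributes nothing, which makes the roles of the gates easy to check by hand.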

3.2.2 Gated-Recurrent-Unit (GRU)

GRU has two major gates that act as switches: the reset gate and the update gate. Each gate outputs a value between 0 and 1, with a value near 0 closing the gate and a value near 1 opening it. The reset gate determines how much of the past information is discarded [18]. GRU and LSTM are similar, but GRU has only two gating layers, the reset gate and the update gate, instead of three [19]. In GRU, the roles of the LSTM input and forget gates are merged into the update gate, and there is no separate output gate; each gate has the form shown in Eq. (5) in each hidden state.

$$ \sigma \left( W \cdot \left[ h_{t-1}, X_{t} \right] + b \right) $$
(5)

Here [h_{t-1}, X_{t}] is the concatenation of the previous hidden state and the current input. GRU blends new memory content with the previous state through the update gate; because it has fewer gates, GRU is computationally simpler than LSTM.
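The following scalar sketch shows one GRU time step under one common convention, in which the new state is (1 - z) times the previous state plus z times the candidate, where z is the update gate. Some references swap the roles of z and (1 - z), and the source does not say which convention was used, so this is an assumption. The function name `gru_step` and the parameter layout are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, params):
    """One scalar GRU step.

    params maps 'z' (update gate), 'r' (reset gate) and 'h' (candidate)
    to a (W, U, b) triple. Returns the new hidden state.
    """
    wz, uz, bz = params["z"]
    wr, ur, br = params["r"]
    wh, uh, bh = params["h"]

    update_gate = sigmoid(wz * x + uz * h_prev + bz)
    reset_gate = sigmoid(wr * x + ur * h_prev + br)
    # The reset gate scales how much of the previous state feeds the candidate.
    candidate = math.tanh(wh * x + uh * (reset_gate * h_prev) + bh)
    # Convention assumed here: z near 1 favours the new candidate.
    return (1.0 - update_gate) * h_prev + update_gate * candidate


# Hypothetical parameters: all zeros, so both gates sit at 0.5.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("z", "r", "h")}
print(gru_step(x=1.0, h_prev=2.0, params=zero_params))
```

Compared with an LSTM step, there is no separate cell state and one fewer gate, which is where the lower computational cost comes from.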

3.3 Training and Testing

Training data were defined as the daily records between February 24, 2020 and December 12, 2021, and the models were trained on this data set in chunks of 60 days. The testing set was defined from December 23, 2021 to February 24, 2022. In the X component, the 60-day sequences used for prediction were added, and in the Y component, the new infection cases were added, for both the training and testing datasets. Two features, temperature and new cases, were considered as parameters and used as the features of the training and testing sets. This procedure was applied to the datasets of all selected countries.
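One plausible way to build such 60-day chunks is a sliding window, sketched below. The source does not state how each window is paired with a target, so this sketch assumes one-step-ahead forecasting: each input holds 60 consecutive days of the two features, and the target is the new-case count of the following day. The function name `make_windows` and the sample rows are hypothetical.

```python
def make_windows(rows, window=60):
    """Build (X, y) pairs from date-ordered rows.

    rows:   list of (new_cases, temperature) tuples, one per day.
    X[k]:   `window` consecutive rows starting at day k.
    y[k]:   new cases on the day right after that window (assumption).
    """
    if len(rows) <= window:
        raise ValueError("need more rows than the window length")
    inputs, targets = [], []
    for start in range(len(rows) - window):
        inputs.append(rows[start:start + window])
        targets.append(rows[start + window][0])
    return inputs, targets


# Made-up series: day index as the case count, constant temperature.
rows = [(float(day), 25.0) for day in range(65)]
X, y = make_windows(rows, window=60)
print(len(X), len(X[0]), y[0])
```

Each entry of X has the shape (time steps, features) expected by recurrent layers, here 60 time steps with 2 features.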

3.3.1 Prediction Accuracy Measurement

Mean Squared Logarithmic Error (MSLE) and Root Mean Squared Logarithmic Error (RMSLE) were used to measure the loss of the prepared models. These regression metrics measure forecasting performance by showing the difference between the real value and the forecast value, as shown in Eq. (6). A particular feature of MSLE is that 1 is added to the actual and forecast values before the natural log is taken, which avoids taking the log of possible 0 values. MSLE error measurements were used in the validation and testing stages [20]. RMSLE is simply the square root of MSLE, as shown in Eq. (7).

$$ \text{MSLE} = \frac{1}{T}\sum_{i = 1}^{T} \left( \log\left( F_{i} + 1 \right) - \log\left( R_{i} + 1 \right) \right)^{2} $$
(6)
$$ \text{RMSLE} = \sqrt{\frac{1}{T}\sum_{i = 1}^{T} \left( \log\left( F_{i} + 1 \right) - \log\left( R_{i} + 1 \right) \right)^{2}} $$
(7)

where T is the total number of observations, Fi is the forecast value and Ri is the real value for observation i, and log(x) is the natural logarithm.
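Eqs. (6) and (7) translate directly into a few lines of code. The sketch below is a plain-Python illustration with hypothetical function names `msle` and `rmsle`; it assumes that forecasts and real values are non-negative, so that the logarithm is defined.

```python
import math


def msle(forecast, real):
    """Mean squared logarithmic error, Eq. (6)."""
    if len(forecast) != len(real) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(
        (math.log(f + 1.0) - math.log(r + 1.0)) ** 2
        for f, r in zip(forecast, real)
    )
    return total / len(forecast)


def rmsle(forecast, real):
    """Root mean squared logarithmic error, Eq. (7)."""
    return math.sqrt(msle(forecast, real))


# Made-up daily case counts; note that zeros are handled by the +1 shift.
predicted = [0.0, 120.0, 300.0]
observed = [0.0, 100.0, 350.0]
print(msle(predicted, observed), rmsle(predicted, observed))
```

Because the error is measured on a log scale, a forecast that is off by a given ratio is penalized equally whether the counts are in the tens or in the thousands, which suits case counts that vary widely between seasons.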

4 Result

To understand the infection rate, the data were examined along two parameters: country and seasonal effect. COVID-19 data were visualized using several chart types, including bar charts (Figs. 2 and 8), a pie chart (Fig. 9), and line plots (Figs. 3, 4, 5, 6 and 7). The COVID-19 transmission rate as a function of seasonal changes is depicted in Fig. 2. This graph shows that spring and winter have the highest numbers of infections. Each country is visualized separately on 3 axes, namely new cases, date, and temperature, to give a distinct understanding. Bangladesh’s data reveal that in spring, from 2020-06 to 2020-09 and from 2021-06 to 2021-09, the infection rate peaked under temperate weather conditions of 28 °C, as shown in Fig. 3. In summer, from 2020-02 to 2020-05 and from 2021-02 to 2021-05, the infection rate was lower, at temperatures of 30 °C or above.

Fig. 2. Infected country over 3 years.

2020-07 to 2020-10 and again in 2021-08 to 2021-09 spring when the temperate weather condition was 28 °C to 29 °C. In the winter season, the infection rate also rose, but in summer, from 2020-02 to 2020-05 and from 2021-02 to 2021-05, the infection rate was lower, at temperatures of 30 °C or above (shown in Fig. 7). The remaining countries have a critical situation in the spring season and a lower infection rate at summer temperatures (as shown in Figs. 5, 6 and 8).

For the Philippines, the infection rate surged in the spring season; the bar chart (see Fig. 8) indicates that daily infection rates were high in the winter and spring seasons. The data indicate that the beginning of the spring season and the middle of the winter season are notable for the wide spread of COVID-19 infection.

Fig. 3. Bangladesh new case observation with temperature.

Fig. 4. Mexico new case observation with temperature.

Fig. 5. Vietnam new case observation with temperature.

Fig. 6. Philippines new case observation with temperature.

Fig. 7. Indonesia new case observation with temperature.

During the winter season, newly infected cases in Mexico are 100000 and above, whereas in the summer season the number of infected cases in Mexico is below 20000, as shown in Fig. 8. The other countries follow the same pattern in their infection rates (Fig. 8). The positivity rate slowed down at summer temperatures and evidently increased in winter and spring. Warmer, humid climates appear to have less SARS-CoV-2 viral spread, although, given the observational design of the research and its inherent potential for bias, the certainty of this evidence is low [14].

Fig. 8. Seasonal new case observation with temperature.

The pie chart of infection percentages (Fig. 9) was created from the new cases that occurred during the time period. The most infected country is Indonesia, with 28.1% of the infections, followed by Vietnam with 15.0%; the percentages for the other countries are given in Fig. 9. Here, the percentage was calculated from the mean value of each country's daily new cases. Among the selected countries, we can again see that a higher number of people were infected in the spring season.
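One way to read this calculation is that each country's share is its mean daily new cases divided by the sum of those means over all selected countries. The sketch below illustrates that reading only; the function name `infection_shares`, the country labels, and the numbers are made up, and the source does not confirm the exact formula.

```python
def infection_shares(daily_cases_by_country):
    """Return each country's percentage share of mean daily new cases."""
    means = {
        country: sum(cases) / len(cases)
        for country, cases in daily_cases_by_country.items()
    }
    total = sum(means.values())
    return {country: 100.0 * mean / total for country, mean in means.items()}


# Made-up daily counts for two placeholder countries.
sample = {"Country A": [10.0, 30.0], "Country B": [60.0, 100.0]}
print(infection_shares(sample))
```

The shares always sum to 100, so they describe the distribution of cases among the selected countries rather than the fraction of each population that was infected.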

Fig. 9. Country-wise COVID-19 infection percentage.

5 Discussion

After visualizing past data (Figs. 3, 4, 5, 6 and 7), we observed a relationship between season and the spread of COVID-19. A variety of seasonal temperature ranges was selected, and two different RNN techniques, LSTM and GRU, with the ReLU activation function and the Adam optimizer (Table 1), were applied to find the best result for different countries. In machine learning, Adam has been found to be robust and well suited to optimization problems [21]. Table 1 reports results for 100 epochs with batch sizes of 32 and 64, as these batch sizes are suitable for GRU and LSTM.

Fig. 10. Graphical representations of COVID-19 forecasting.

According to the MSLE and RMSLE evaluation (Table 1), the errors for Bangladesh were 3.903 and 1.975 for LSTM and 3.470 and 1.862 for GRU, as shown in Fig. 10. For Indonesia, LSTM has an MSLE of 8.700 and an RMSLE of 2.949, and GRU has an MSLE of 11.836 and an RMSLE of 3.440. From these observations, GRU performs better than the LSTM model for Bangladesh on both MSLE and RMSLE, where lower values are better, whereas for Indonesia the reported values favor LSTM (Fig. 10).

Table 1. COVID-19 LSTM and GRU result

6 Conclusions and Future Work

People continue to be infected by COVID-19, which remains dangerous because of its prevalence. The purpose of this research is a clear visualization of the new cases that occur in each season and a performance measurement of DL models. Our research indicates that COVID-19 has a seasonal effect. Analysis of the data showed that the same temperature has different effects on cases in different locations, so the relationship between temperature and newly confirmed cases is difficult to characterize. Nevertheless, there is a seasonal effect on new cases, and each season has a particular range of temperatures. We conclude that within the temperature range of the summer season, the spread of coronavirus is much slower and the virus appears less active. The highest infection counts occurred in the temperate weather conditions of spring and the beginning of the winter season. The DL RNN models show good results on sequential data across different model and data configurations, but accuracy varies. LSTM performs well on large data sets, but because the data were broken into smaller sets, GRU is used here. As a result, we found that GRU has better accuracy and faster computation. The outcome indicates that in the temperate weather of the winter and spring seasons the effect on COVID-19 is considerable and the temperature range of these seasons is notable, while summer temperatures pose less danger.

The accuracy of the models in this paper could be higher if more data were available. These models are trained with two features: new cases and temperature. Based on our findings, it may be possible to artificially maintain summer-season temperatures, or other natural conditions, in living rooms, offices, organizations, institutions, etc., which could potentially reduce infection rates in other seasons. Predicting upcoming waves is left as future work.