1 Introduction

The virus that causes COVID-19 most likely originated in an animal, but it is now spreading from person to person. Overall, COVID-19 is a challenge faced by multiple disciplines such as medicine, defense, finance, telecommunication, and information technology. The virus is thought to spread mainly between people who are in close contact with one another (within about 6 ft) through respiratory droplets produced when an infected person coughs or sneezes. It may also be possible for a person to get COVID-19 by touching a surface or object that has the virus on it and then touching their own mouth, nose, or possibly their eyes, but this is not thought to be the main way the virus spreads. Patients with COVID-19 have had mild to severe respiratory illness with symptoms of:

  • Fever

  • Cough

  • Shortness of breath

The World Health Organization (WHO) declared the 2019–20 coronavirus outbreak a Public Health Emergency of International Concern (PHEIC) on 30 January 2020 (WHO 2020; Mahtani 2020) and a pandemic on 11 March 2020. Evidence of local transmission of the disease has been found in many countries across all six WHO regions (World Health Organization 2020a, b).

2 Related Work

Xu et al. (2020) examined the pathological characteristics of a patient who died from severe infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) by postmortem biopsies.

Novel (2020) reported the results of a descriptive, exploratory analysis of all cases diagnosed as of February 11, 2020.

Chen et al. (2020) aimed to evaluate the clinical characteristics of COVID-19 in pregnancy and the intrauterine vertical transmission potential of COVID-19 infection.

Liu et al. (2020) reviewed the basic reproduction number (R0) of the COVID-19 virus. R0 is an indication of the transmissibility of a virus, representing the average number of new infections generated by an infectious person in a totally naive population.

Pan et al. (2020) studied the changes in chest CT findings associated with COVID-19 pneumonia from initial diagnosis until patient recovery.

Sujatha et al. (2020a, b) used linear regression, vector autoregression, and multilayer perceptron techniques to predict COVID-2019 cases in India using a Kaggle dataset.

Iwendi et al. (2020) applied a boosted random forest algorithm for COVID-2019 prediction.

3 Dataset Description

We have used the COVID-2019 dataset (KP 2020) from Kaggle, which contains data from January 22, 2020 to March 26, 2020. The dataset covers more than 180 countries, with attributes such as Province/State, Country/Region, Latitude, Longitude, Date, Confirmed, and Deaths. From this dataset we have concentrated on India’s data, which comprises 65 instances. As we have taken only India’s data, we have discarded the latitude and longitude and considered only the dates together with the confirmed, death, and recovered cases. We forecast the future effects of the COVID-19 pandemic in India through time series analysis, correlation analysis, the Granger test, and GMDH.
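The filtering step described above can be sketched as follows. This is only an illustration: the inline CSV text stands in for the Kaggle file, its values are made up, and the function name `load_country_series` is hypothetical.

```python
import csv
import io

# Tiny inline sample standing in for the Kaggle CSV; the values are illustrative only.
SAMPLE_CSV = """Province/State,Country/Region,Latitude,Longitude,Date,Confirmed,Deaths
,India,21.0,78.0,2020-03-25,10,1
,Italy,43.0,12.0,2020-03-25,20,2
,India,21.0,78.0,2020-03-26,12,1
"""


def load_country_series(csv_text, country="India"):
    """Return (date, confirmed, deaths) tuples for one country, dropping location columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (row["Date"], int(row["Confirmed"]), int(row["Deaths"]))
        for row in reader
        if row["Country/Region"] == country
    ]
    return sorted(rows)  # ISO-formatted dates sort chronologically as strings


india = load_country_series(SAMPLE_CSV)
```

Keeping only the date and the case counts mirrors the choice to discard latitude and longitude once a single country is selected.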

4 Experimental Results

For experimentation we have used the Group Method of Data Handling (GMDH). This method was introduced by Ivakhnenko in 1966 and has been improved and extended over the years. The GMDH algorithm connects the inputs to the outputs with high-order polynomial networks, which are essentially feed-forward, multilayered neural networks (NN) (Onwubolu 2009). Here, the nodes are hidden units and the activation polynomial coefficients are weights, which are estimated by ordinary least squares regression (Ghanadzadeh et al. 2012). Recently, the use of such self-organizing networks has led to successful application of GMDH-type algorithms in a wide range of areas in engineering and science (Ahmadi et al. 2007; Abdolrahimi et al. 2014; Pazuki and Kakhki 2013; Atashrouz et al. 2015; Najafzadeh 2015). GMDH is a polynomial-based model. According to the GMDH approach, each layer can be obtained from a quadratic polynomial function. In this way, the input variables are mapped to the output variable. The main objective is to find the function f that maps the input variables to the output variable. Thus, the output variable (Xi) can be written in terms of the input variables in the following form:

$$ {X}_i=f\left({Y}_{i1},{Y}_{i2},{Y}_{i3}\dots {Y}_{in}\right),\kern1.25em i=\left(1,2,3,\dots, M\right) $$
(5.1)

where, Ys are input variables. The structure of the GMDH can be obtained using the minimization of an objective function. The objective function ω can be written as:

$$ \omega =\sum \limits_{i=1}^M{\left[\left(\mathrm{X}\left({Y}_{i1},{Y}_{i2},{Y}_{i3}\dots {Y}_{in}\right)\right)-{X}_i\right]}^2 $$
(5.2)

where Xi is the actual data and X(Yi1, Yi2, …, Yin) is the corresponding model output (Naderpour et al. 2019).
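A minimal sketch of the objective function (5.2) is given below. It assumes the common two-input quadratic form for a GMDH node, which the text does not spell out; the weights and data points are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def quadratic_node(y1, y2, w):
    """Quadratic polynomial of two inputs (assumed form of a GMDH node)."""
    return (w[0] + w[1] * y1 + w[2] * y2
            + w[3] * y1 * y2 + w[4] * y1 ** 2 + w[5] * y2 ** 2)


def objective(model, inputs, actual):
    """Sum of squared differences between model output and actual data, as in Eq. (5.2)."""
    return sum((model(*y) - x) ** 2 for y, x in zip(inputs, actual))


weights = [1.0, 2.0, 0.0, 0.0, 0.0, 0.5]       # hypothetical weights
inputs = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)]  # hypothetical (Y_i1, Y_i2) pairs
actual = [1.0, 5.0, 5.0]                       # hypothetical X_i values

omega = objective(lambda a, b: quadratic_node(a, b, weights), inputs, actual)
```

Here the first two points are reproduced exactly and the third is off by 0.5, so the objective evaluates to 0.25.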

GMDH (Ivakhnenko 1971; Dorn et al. 2012) is known as a self-organizing deep learning technique for time-series problems. It is widely used in many fields, for example forecasting, data mining, optimization, and pattern recognition. A GMDH-based NN can be regarded as a polynomial NN. In contrast to other networks, the structure of the GMDH network changes continuously during the training process. Some advantages of the GMDH network are self-organization during training, high accuracy in forecasting, and suitability for high-order nonlinear systems (Nguyen et al. 2019).

GMDH comprises parametric, clusterization, analogues complexing, rebinarization, and probability algorithms. This inductive approach is based on sorting through gradually more complicated models and selecting the best solution by the minimum of an external criterion. Not only polynomials but also nonlinear functions, probabilistic functions, or clusterizations are used as basic models.

GMDH methods can be useful because:

  • The number of layers and of neurons in the hidden layers, the model structure, and other optimal hyperparameters are determined automatically.

  • It guarantees that the most accurate or unbiased models will be found; the method does not miss the best solution while sorting through all variants (in the given class of functions).

  • Any nonlinear functions that may influence the output variable can be used as input variables.

  • It automatically finds interpretable relationships in the data and selects effective input variables.

  • GMDH sorting algorithms are rather simple to implement in software (GMDH n.d.).
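The idea of sorting through gradually more complicated models and keeping the one that minimizes an external criterion can be sketched as below. This is a deliberate simplification, not the GMDH software's procedure: candidate models are plain polynomials of increasing degree, the external criterion is the squared error on observations held out from fitting, and all function names and data are hypothetical.

```python
def solve_least_squares(rows, targets):
    """Least-squares weights via the normal equations and Gauss-Jordan elimination."""
    n = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, targets))]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def squared_error(weights, ts, ys):
    """Sum of squared errors of a polynomial with the given weights."""
    return sum((sum(w * t ** k for k, w in enumerate(weights)) - y) ** 2
               for t, y in zip(ts, ys))


def select_by_external_criterion(ts, ys, max_degree=2):
    """Fit on even-indexed points, score on odd-indexed points, keep the best degree."""
    train_t, train_y = ts[0::2], ys[0::2]
    valid_t, valid_y = ts[1::2], ys[1::2]
    best = None
    for degree in range(max_degree + 1):  # gradually more complicated models
        rows = [[t ** k for k in range(degree + 1)] for t in train_t]
        weights = solve_least_squares(rows, train_y)
        error = squared_error(weights, valid_t, valid_y)  # external criterion
        if best is None or error < best[0]:
            best = (error, degree, weights)
    return best


ts = [0.5 * i for i in range(8)]
ys = [t ** 2 for t in ts]  # synthetic quadratic data
best_error, best_degree, best_weights = select_by_external_criterion(ts, ys)
```

Because the criterion is computed on data the model was not fitted to, an overly simple model is rejected for its large held-out error rather than by inspection of the training fit.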

We have used the GMDH software and the Orange Data Mining software (Bioinformatics Laboratory 2020) for conducting our experiments. Granger test and correlation analysis is conducted using and time series analysis is conducted using (NNS 2020). Data science begins with collecting the data, followed by consolidating, investigating, understanding, and finally presenting it as valuable information. In the consolidating and investigating phase, as part of gaining knowledge, one should know the nature of the attributes and of the values that make up the dataset. Data visualization helps in gaining insight into the dataset, followed by applying the required process of classification, association, or clustering based on the problem scope (Van Der Aalst 2016; Keller et al. 1994; Sui 2019; Mirkin 2019a). Table 5.1 provides the statistics of the dataset considered for forecasting. The absence of missing values is an added advantage that helps in good prediction (Evans et al. 2007; Peat and Barton 2008).

Table 5.1 Statistics of INDIA COVID-19

A time series plot depicts the nature of the variables considered for the experiment. It helps to recognize trends in the data over time. It is intuitive and provides quick insight into how the data change over the time span. A line chart representation is often used as it is clear and informative (Wen 2019). A line chart can also show the change of multiple variables of the dataset over time in a single view.

Figure 5.1 shows the data for the variables confirmed cases, death cases, and recovered cases of the INDIA COVID-19 dataset on a normalized scale. Applying correlation analysis to the dataset in the Orange data mining tool gives insight into the correlation among the attributes. High correlation and a minimal false discovery rate are the pattern seen in this dataset (Mirkin 2019b). The date evidently plays a major role in the time series representation. Table 5.2 provides insight into the correlation between pairs of features along with the measures.
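The two operations mentioned above, plotting on a normalized scale and measuring correlation between attributes, can be sketched as follows. The sketch assumes min-max normalization and the Pearson coefficient, neither of which is stated in the text, and the two short series are made-up values rather than the India data.

```python
import math


def min_max_normalize(values):
    """Rescale a series to the range [0, 1] (assumes the values are not all equal)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


confirmed = [3, 5, 28, 30, 31]  # illustrative values only
recovered = [0, 0, 3, 3, 3]     # illustrative values only

normalized_confirmed = min_max_normalize(confirmed)
r = pearson(confirmed, recovered)
```

Normalizing each series separately lets variables with very different magnitudes share one line chart, while the correlation coefficient itself is unaffected by such rescaling.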

Fig. 5.1 Time series plot of INDIA COVID-19

Table 5.2 Correlation of INDIA COVID-19 features

The Granger test is a statistical method for determining whether one time series is useful in forecasting another; here it is applied to the features considered in the experiment. It is commonly known as Granger causality, after Clive Granger, who proposed it. It finds application in various fields such as economics and neuroscience. At 95% confidence, the result for this dataset is interpreted as confirmed cases leading recovered cases by four time steps. For the current 2 months of time series data not much lag can be inferred, but as the length of the data grows the lag may be skewed positively or negatively (Wen et al. 2019; Ghysels et al. 2016). Figure 5.2 shows the Granger test with a maximum lag of 5.
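The mechanics of the Granger test can be sketched as a comparison of two regressions: one predicting a series from its own lags only, and one that also includes lags of the other series. The sketch below computes only the F statistic and omits the p-value lookup that a confidence statement requires; it is not the procedure used by the Orange tool, and the synthetic series and function names are hypothetical.

```python
def solve_least_squares(rows, targets):
    """Least-squares weights via the normal equations and Gauss-Jordan elimination."""
    n = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, targets))]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def residual_sum_of_squares(rows, targets):
    """Fit by least squares and return the sum of squared residuals."""
    w = solve_least_squares(rows, targets)
    return sum((sum(wi * ri for wi, ri in zip(w, r)) - y) ** 2
               for r, y in zip(rows, targets))


def granger_f_statistic(x, y, lag):
    """F statistic for the hypothesis that lags of x help to predict y."""
    times = range(lag, len(y))
    targets = [y[t] for t in times]
    # Restricted model: intercept plus lags of y only.
    restricted = [[1.0] + [y[t - k] for k in range(1, lag + 1)] for t in times]
    # Unrestricted model: additionally includes lags of x.
    unrestricted = [row + [x[t - k] for k in range(1, lag + 1)]
                    for row, t in zip(restricted, times)]
    rss_r = residual_sum_of_squares(restricted, targets)
    rss_u = residual_sum_of_squares(unrestricted, targets)
    dof = len(targets) - (2 * lag + 1)
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic data: y follows x with a delay of one step plus a small disturbance.
x = [float((i * 7) % 11) for i in range(30)]
noise = [0.1 * ((2 * i) % 3 - 1) for i in range(30)]
y = [0.0] + [x[i - 1] + noise[i] for i in range(1, 30)]

f_stat = granger_f_statistic(x, y, lag=1)
```

A large F statistic indicates that adding the lagged values of the first series reduces the prediction error of the second far more than chance would suggest.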

Fig. 5.2 Granger test on INDIA COVID-19

Curve fitting is the process of constructing a curve from a mathematical function that gives the best fit to the given data points, possibly subject to constraints. A fitted curve assists in data visualization (Mudelsee 2019; Guest 2012; Maddams 1980). The generated function also helps in inferring values where data are missing. The best fit may be a straight line or a curve. The criterion value provides insight into the quality of the fit. Figures 5.3, 5.4, and 5.5 illustrate the curve fits for the selected variables confirmed, death, and recovered, respectively.

Fig. 5.3 Curve fitting of confirmed cases in INDIA COVID-19

Fig. 5.4 Curve fitting of death cases in INDIA COVID-19

Fig. 5.5 Curve fitting of recovered cases in INDIA COVID-19

Various measures help in understanding the accuracy of the fitted curve. The mathematical function for each variable determines the value at any point of the curve. Equations (5.3), (5.4), and (5.5) represent the functions of the confirmed, deaths, and recovered features, respectively.

$$ {Y}_1=2.51235+{\mathrm{time}}^2\ast \left(-1.52766\right)+{\mathrm{time}}^3\ast 0.515268+{\mathrm{time}}^4\ast \left(-0.0698097\right)+{\mathrm{time}}^5\ast 0.00501064+{\mathrm{time}}^6\ast \left(-0.000210204\right)+{\mathrm{time}}^7\ast 5.31865\times {10}^{-6}+{\mathrm{time}}^8\ast \left(-7.98039\times {10}^{-8}\right)+{\mathrm{time}}^9\ast 6.531\times {10}^{-10}+{\mathrm{time}}^{10}\ast \left(-2.243\times {10}^{-12}\right) $$
(5.3)
$$ {Y}_1=-0.0657009+{\mathrm{time}}^5\ast 3.21508\times {10}^{-7}+{\mathrm{time}}^7\ast \left(-3.58273\times {10}^{-10}\right)+{\mathrm{time}}^9\ast 1.77937\times {10}^{-13}+{\mathrm{time}}^{10}\ast \left(-1.67276\times {10}^{-15}\right) $$
(5.4)
$$ {Y}_1=0.0795093+{\mathrm{time}}^4\ast \left(-1.35655\times {10}^{-5}\right)+{\mathrm{time}}^6\ast 1.22536\times {10}^{-7}+{\mathrm{time}}^7\ast \left(-7.69865\times {10}^{-9}\right)+{\mathrm{time}}^8\ast 1.98941\times {10}^{-10}+{\mathrm{time}}^9\ast \left(-2.38232\times {10}^{-12}\right)+{\mathrm{time}}^{10}\ast 1.09322\times {10}^{-14} $$
(5.5)
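As an illustration of how such a fitted function yields a value at any point of the curve, the sketch below evaluates Eq. (5.4). It reads each term as a coefficient multiplied by a power of time, which is an assumption about the notation; the helper name `evaluate_fit` is hypothetical, and the unit of time is not specified in the text.

```python
# Power of time -> coefficient, transcribed from Eq. (5.4) under the stated reading.
DEATHS_FIT = {
    0: -0.0657009,
    5: 3.21508e-07,
    7: -3.58273e-10,
    9: 1.77937e-13,
    10: -1.67276e-15,
}


def evaluate_fit(coefficients, time):
    """Evaluate a sparse polynomial given as a {power: coefficient} mapping."""
    return sum(c * time ** p for p, c in coefficients.items())


value_at_origin = evaluate_fit(DEATHS_FIT, 0)
```

Storing only the nonzero terms keeps the code close to the printed equation, in which several powers of time are absent.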

In ascending order of fit quality, the variables rank as deaths, recovered, and confirmed, with criterion values of 0.0406, 0.0297, and 0.01181 for the respective functions. Figure 5.6 graphically presents the mean absolute error (MAE), root mean square error (RMSE), coefficient of determination, and correlation of the model fit for the three variables of the dataset.
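The accuracy measures named above follow standard definitions, sketched below for reference. The actual and predicted values are made up for the example and are not taken from the fitted models.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]     # illustrative values only
predicted = [1.0, 2.0, 3.0, 6.0]  # illustrative values only

scores = (mae(actual, predicted), rmse(actual, predicted), r_squared(actual, predicted))
```

For this example the single error of 2 gives an MAE of 0.5, an RMSE of 1.0, and a coefficient of determination of 0.2, which shows how RMSE penalizes one large error more heavily than MAE does.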

Fig. 5.6 Accuracy of model fit

We have conducted our experiments in GMDH based on four core algorithms, namely combinatorial (quick), stepwise forward selection, stepwise mixed selection, and GMDH neural network.

4.1 Combinatorial (Quick) Approach

The traditional combinatorial GMDH method generates models of all possible input variable combinations and chooses the best model from the generated set according to a chosen selection criterion (Anastasakis and Mort 2001). For the experiment with the combinatorial (quick) method, the parameters used are: reorder observations as pseudo-random; validation strategy as k-fold validation, twofold; validation criterion as RMSE.balance; variable ranking as correlation; drop variables as rank 5; additional variables as xi.xj; and return best models as 100, in time series mode.
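The combinatorial idea can be sketched as an exhaustive search over input subsets scored by two-fold validation RMSE. This is a simplified illustration under stated assumptions, not the GMDH software's implementation: the models are linear with an intercept, the folds are the even- and odd-indexed observations rather than a pseudo-random reordering, and the feature names and data are hypothetical.

```python
from itertools import combinations


def solve_least_squares(rows, targets):
    """Least-squares weights via the normal equations and Gauss-Jordan elimination."""
    n = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, targets))]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def two_fold_rmse(rows, targets):
    """Average validation RMSE over two folds (even- and odd-indexed observations)."""
    n = len(rows)
    folds = [list(range(0, n, 2)), list(range(1, n, 2))]
    total = 0.0
    for valid in folds:
        train = [i for i in range(n) if i not in valid]
        w = solve_least_squares([rows[i] for i in train], [targets[i] for i in train])
        sq = sum((sum(wi * ri for wi, ri in zip(w, rows[i])) - targets[i]) ** 2
                 for i in valid)
        total += (sq / len(valid)) ** 0.5
    return total / len(folds)


def combinatorial_search(features, targets):
    """Score every non-empty subset of inputs and return (best score, best subset)."""
    names = sorted(features)
    best = None
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            rows = [[1.0] + [features[name][i] for name in subset]
                    for i in range(len(targets))]
            score = two_fold_rmse(rows, targets)
            if best is None or score < best[0]:
                best = (score, subset)
    return best


features = {
    "a": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "b": [1.0, 4.0, 0.0, 2.0, 5.0, 1.0, 3.0, 0.0],
}
targets = [2.0 + 3.0 * v for v in features["a"]]  # depends on "a" only

best_score, best_subset = combinatorial_search(features, targets)
```

The number of subsets grows exponentially with the number of inputs, which is one reason a tool may rank and drop variables before the search.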

$$ {Y}_1\left[t\right]=-1497.69+\sqrt[3]{\mathrm{Confirmed}\left[t-8\right]}\ast 413.182 $$
(5.6)

Using function (5.6), with a system-generated criterion value of 0.00153, the system predicts the confirmed cases. Figure 5.7 shows the predicted values for confirmed cases.
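To show how function (5.6) turns past observations into a forecast, the sketch below evaluates it directly. It reads the "cubert" term as the cube root of the confirmed count eight steps earlier, which is an assumption about the tool's notation; the function names are hypothetical and the input series in the usage line is made up.

```python
def predict_confirmed(confirmed_lag8):
    """Eq. (5.6): -1497.69 + 413.182 * cube root of Confirmed[t-8] (non-negative input)."""
    return -1497.69 + 413.182 * confirmed_lag8 ** (1.0 / 3.0)


def forecast(series, lag=8):
    """Apply the model at every time step t for which series[t - lag] is available."""
    return [predict_confirmed(series[t - lag]) for t in range(lag, len(series) + lag)]


# Made-up series of ten observations; yields values for the steps 8 to 17.
example = forecast([1000.0] * 10)
```

Because the model depends only on the value eight steps back, it can produce values up to eight steps beyond the last observation without feeding predictions back into itself.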

$$ {Y}_1\left[t\right]=1.84925+\sqrt[3]{\mathrm{Deaths}\left[t-5\right]}\ast \left(-2.82812\right)+\sqrt[3]{\mathrm{Deaths}\left[t-9\right]}\ast \left(-2.50114\right)+\mathrm{cycle}\ast 1.17415 $$
(5.7)

Fig. 5.7 Confirmed case prediction graph plot

Using function (5.7), with a system-generated criterion value of 0.11036, the system predicts the death cases. Figure 5.8 shows the predicted values for death cases.

$$ {Y}_1\left[t\right]=-98.82+\mathrm{cycle}\ast 1.44 $$
(5.8)

Fig. 5.8 Death case prediction graph plot

Using function (5.8), with a system-generated criterion value of 0.17355, the system predicts the recovered cases. Figure 5.9 shows the predicted values for recovered cases.

Fig. 5.9 Recovered case prediction graph plot

Tables 5.3 and 5.4 show the predicted values and post-processed results of confirmed, death, and recovered cases based on the combinatorial (quick) algorithm.

Table 5.3 Forecast based on combinatorial (quick) approach
Table 5.4 Post-processed results by combinatorial (quick) algorithm

4.2 Stepwise Forward Selection Approach

Forward selection is a type of stepwise regression that starts with an empty model and adds variables one at a time. In each forward step, the variable that gives the single best improvement to the model is added (Glen 2019). For the experiment with the stepwise forward selection method, the parameters used are: reorder observations as pseudo-random; validation strategy as k-fold validation, twofold; validation criterion as RMSE.balance; variable ranking as correlation; drop variables as rank 100; no additional variables; limit model complexity as 200; and return best models as 100, in time series mode.
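The forward step described above can be sketched as a greedy loop. The sketch makes several assumptions that are not taken from the text: models are linear with an intercept, candidates are scored by RMSE on held-out odd-indexed observations, and the loop stops when the best addition improves the score by less than a small threshold (`min_gain`, a hypothetical parameter). The data are synthetic.

```python
def solve_least_squares(rows, targets):
    """Least-squares weights via the normal equations and Gauss-Jordan elimination."""
    n = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, targets))]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def holdout_rmse(rows, targets):
    """Fit on even-indexed observations and report RMSE on odd-indexed ones."""
    train = range(0, len(rows), 2)
    valid = range(1, len(rows), 2)
    w = solve_least_squares([rows[i] for i in train], [targets[i] for i in train])
    sq = sum((sum(wi * ri for wi, ri in zip(w, rows[i])) - targets[i]) ** 2
             for i in valid)
    return (sq / len(valid)) ** 0.5


def forward_selection(features, targets, min_gain=1e-9):
    """Start from the intercept-only model and add one variable per step."""
    def score(names):
        rows = [[1.0] + [features[n][i] for n in names] for i in range(len(targets))]
        return holdout_rmse(rows, targets)

    selected, remaining = [], sorted(features)
    best = score(selected)
    while remaining:
        candidate_score, name = min((score(selected + [n]), n) for n in remaining)
        if best - candidate_score <= min_gain:
            break  # no addition gives a worthwhile improvement
        selected.append(name)
        remaining.remove(name)
        best = candidate_score
    return selected, best


features = {
    "a": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "b": [1.0, 4.0, 0.0, 2.0, 5.0, 1.0, 3.0, 0.0],
}
targets = [2.0 + 3.0 * v for v in features["a"]]  # depends on "a" only

selected, final_score = forward_selection(features, targets)
```

Unlike the exhaustive combinatorial search, this procedure evaluates only one new candidate per remaining variable at each step, so its cost grows much more slowly with the number of inputs.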

$$ {Y}_1\left[t\right]=-1497.69+\sqrt[3]{\mathrm{Confirmed}\left[t-8\right]}\ast 413.182 $$
(5.9)

Using function (5.9), with a system-generated criterion value of 0.00153, the system predicts the confirmed cases. Figure 5.10 shows the predicted values for confirmed cases.

$$ {Y}_1\left[t\right]=1.84925+\mathrm{cycle}\ast 1.17415+\sqrt[3]{\mathrm{Deaths}\left[t-5\right]}\ast \left(-2.82812\right)+\sqrt[3]{\mathrm{Deaths}\left[t-9\right]}\ast \left(-2.50114\right) $$
(5.10)

Fig. 5.10 Confirmed case graph plot

Using function (5.10), with a system-generated criterion value of 0.11036, the system predicts the death cases. Figure 5.11 shows the predicted values for death cases.

$$ {Y}_1\left[t\right]=-98.82+\mathrm{cycle}\ast 1.44 $$
(5.11)

Fig. 5.11 Death case graph plot

Using function (5.11), with a system-generated criterion value of 0.17355, the system predicts the recovered cases. Figure 5.12 shows the predicted values for recovered cases.

Fig. 5.12 Recovered case prediction graph plot

Tables 5.5 and 5.6 show the predicted values and post-processed results of confirmed, death, and recovered cases based on the stepwise forward selection algorithm.

Table 5.5 Forecast based on stepwise forward selection approach
Table 5.6 Post-processed results by stepwise forward selection algorithm

4.3 Stepwise Mixed Selection Approach

The mixed stepwise variable selection procedure considers both adding and removing one variable at each step and takes the best step. In the mixed algorithm we could, for example, add one variable, then add or remove another, and then remove the first variable added (CMU Statistics 2015). For the experiment with the stepwise mixed selection method, the parameters used are: reorder observations as pseudo-random; validation strategy as k-fold validation, twofold; validation criterion as RMSE.balance; variable ranking as correlation; drop variables as rank 100; no additional variables; limit model complexity as 200; and return best models as 100, in time series mode.

$$ {Y}_1\left[t\right]=304.496+{N}_3\ast {N}_2\ast 0.000803145 $$
(5.12)
$$ {N}_2\left[t\right]=301.867+{N}_4\ast {N}_3\ast 0.000810118 $$
(5.13)
$$ {N}_3\left[t\right]=298.562+{N}_5\ast {N}_4\ast 0.000818998 $$
(5.14)
$$ {N}_4\left[t\right]=296.225+{N}_6\ast {N}_5\ast 0.000825283 $$
(5.15)
$$ {N}_5\left[t\right]=71.2228+\mathrm{time}\ast {N}_6\ast 0.0161384 $$
(5.16)
$$ {N}_6\left[t\right]=-1591.51+\mathrm{time}\ast \mathrm{cycle}\ast 0.00773787 $$
(5.17)

Using functions (5.12–5.17), with a system-generated criterion value of 0.0921, the system predicts the confirmed cases. Figure 5.13 shows the predicted values for confirmed cases.
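Because each node in this model is defined in terms of the nodes after it, the prediction is obtained by evaluating the equations from the innermost node outward. The sketch below does this for the six equations of the confirmed-case model. The variables time and cycle are produced by the software and their exact definitions are not given in the text, so they are simply passed in as numbers; the function name is hypothetical.

```python
def predict_confirmed_mixed(time, cycle):
    """Evaluate the nested nodes N6 -> N5 -> ... -> N2 -> Y1 for given time and cycle."""
    n6 = -1591.51 + time * cycle * 0.00773787
    n5 = 71.2228 + time * n6 * 0.0161384
    n4 = 296.225 + n6 * n5 * 0.000825283
    n3 = 298.562 + n5 * n4 * 0.000818998
    n2 = 301.867 + n4 * n3 * 0.000810118
    y1 = 304.496 + n3 * n2 * 0.000803145
    return {"N6": n6, "N5": n5, "N4": n4, "N3": n3, "N2": n2, "Y1": y1}


# Usage with placeholder inputs; real values of time and cycle come from the tool.
nodes = predict_confirmed_mixed(0.0, 0.0)
```

Returning every intermediate node rather than only the final output makes it easier to compare the hand evaluation against the values reported by the software.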

$$ {Y}_1\left[t\right]=1.84925+\mathrm{cycle}\ast 1.17415+\sqrt[3]{\mathrm{Deaths}\left[t-5\right]}\ast \left(-2.82812\right)+\sqrt[3]{\mathrm{Deaths}\left[t-9\right]}\ast \left(-2.50114\right) $$
(5.18)

Fig. 5.13 Confirmed case prediction graph plot

Using function (5.18), with a system-generated criterion value of 0.11036, the system predicts the death cases. Figure 5.14 shows the predicted values for death cases.

$$ {Y}_1\left[t\right]=-98.82+\mathrm{cycle}\ast 1.44 $$
(5.19)

Fig. 5.14 Death case prediction graph plot

Using function (5.19), with a system-generated criterion value of 0.17355, the system predicts the recovered cases. Figure 5.15 shows the predicted values for recovered cases.

Fig. 5.15 Recovered case prediction graph plot

Tables 5.7 and 5.8 show the predicted values and post-processed results of confirmed, death, and recovered cases based on the stepwise mixed selection algorithm.

Table 5.7 Forecast based on stepwise mixed selection approach
Table 5.8 Post-processed results by stepwise mixed selection algorithm

4.4 GMDH Neural Network Approach

The GMDH neural network solves time series forecasting and data mining tasks by building artificial neural networks and applying them to the data. Neural network forecasting is more flexible than ordinary linear or polynomial approximations and is therefore more accurate. With neural networks one can find and take into account nonlinear relationships in the data and build a candidate model with high predictive quality (NNS 2020). For the experiment with the GMDH neural network method, the parameters used are: reorder observations as pseudo-random; validation strategy as k-fold validation, twofold; validation criterion as RMSE.balance; variable ranking as correlation; drop variables as rank 100; neuron function as a + xi (linear); maximum number of layers as 33; and initial layer width as 1, in time series mode.

$$ {Y}_1\left[t\right]=-1497.69+\sqrt[3]{\mathrm{Confirmed}\left[t-8\right]}\ast 413.182 $$
(5.20)

Using function (5.20), with a system-generated criterion value of 0.0015284, the system predicts the confirmed cases. Figure 5.16 shows the predicted values for confirmed cases.

$$ {Y}_1\left[t\right]=0.426828+\sqrt[3]{\mathrm{Deaths}\left[t-6\right]}\ast {N}_3\ast 0.142041+{N}_3\ast 0.740675 $$
(5.21)
$$ {N}_3\left[t\right]=-0.219251+\sqrt[3]{\mathrm{Deaths}\left[t-5\right]}\ast {N}_4\ast 0.491739+{N}_4\ast 0.957112 $$
(5.22)
$$ {N}_4\left[t\right]=0.438937+\sqrt[3]{\mathrm{Deaths}\left[t-9\right]}\ast {N}_5\ast 0.126892+{N}_5\ast 0.797122 $$
(5.23)
$$ {N}_5\left[t\right]=0.671779+\sqrt[3]{\mathrm{Deaths}\left[t-1\right]}\ast \mathrm{cycle}\ast 0.437579 $$
(5.24)

Fig. 5.16 Confirmed case prediction graph plot

Using functions (5.21–5.24), with a system-generated criterion value of 0.031344, the system predicts the death cases. Figure 5.17 shows the predicted values for death cases.

$$ {Y}_1\left[t\right]=-98.82+\mathrm{cycle}\ast 1.44 $$
(5.25)

Fig. 5.17 Death case prediction graph plot

Using function (5.25), with a system-generated criterion value of 0.17355, the system predicts the recovered cases. Figure 5.18 shows the predicted values for recovered cases.

Fig. 5.18 Recovered case prediction graph plot

Tables 5.9 and 5.10 show the predicted values and post-processed results of confirmed, death, and recovered cases based on the GMDH-NN algorithm.

Table 5.9 Forecast based on GMDH-NN approach
Table 5.10 Post-processed results by GMDH-NN approach

5 Comparison Between the Algorithms Based on MAE, RMSE, SD, Correlation

Figure 5.19 compares the algorithms used on parameters such as correlation, SD, MAE, and RMSE. Based on the comparison, it is clear that the stepwise mixed algorithm gives a good prediction result for confirmed cases.

Fig. 5.19 Comparing algorithms for confirmed cases based on various parameters

Figure 5.20 compares the algorithms used on parameters such as correlation, SD, MAE, and RMSE. Based on the comparison, it is clear that the GMDH-NN algorithm gives a good prediction result for death cases.

Fig. 5.20 Comparing algorithms for death cases based on various parameters

Figure 5.21 compares the algorithms used on parameters such as correlation, SD, MAE, and RMSE. Based on the comparison, it is clear that the GMDH-NN algorithm gives a good prediction result for recovered cases.

Fig. 5.21 Comparing algorithms for recovered cases based on various parameters

6 Conclusion

As this disease has been declared a pandemic, the present study will help researchers to understand the impact of the outbreak. We have used combinatorial (quick), stepwise forward selection, stepwise mixed selection, and GMDH neural network algorithms to predict the spread of the disease in India. The mathematical functions given for each approach provide insight into the resulting predictions. From the parametric comparisons it is clear that the GMDH-NN algorithm provides good accuracy in our case. The post-processed results give the accuracy of the fitted models. COVID-19 offers a broad spectrum of future work in various disciplines.