1 Introduction

Coronavirus disease in the form of COVID-19 posed a pandemic threat to most countries of the world in 2020, and it is spreading at an alarming rate in almost all of them. It originated in Wuhan, Hubei province, People's Republic of China [1], at the end of December 2019, when 41 cases described as ‘pneumonia of unknown reasons’ were reported by the Wuhan Municipal Health Committee [2]. On January 01, 2020, the seafood wholesale market in Wuhan was announced as the epicenter of the COVID-19 outbreak and was ordered to close. A few days later, human-to-human transmission was reported in Wuhan [3]. By the first week of February 2020, the outbreak had caused more than 24,000 confirmed cases and 494 deaths and had spread to 25 countries around the world. The outbreak became a major concern in Italy, Spain and Iran, followed by England, the USA, Russia, the UAE, Australia, Canada, Singapore, India and many other countries. It is growing day by day and poses a major threat to the healthcare systems of different countries. As of June 13, 2020, the total number of cases around the world is 7,756,905, with 428,576 deaths and 3,974,422 recoveries [4]. The most affected states and union territories in India are Maharashtra, Tamil Nadu, Delhi, Gujarat, Uttar Pradesh, Rajasthan and West Bengal. About 98% of the patients have mild symptoms or are asymptomatic, and the remaining 2% are in a critical or serious condition with acute respiratory problems. The rapid outbreak in India is also very alarming and had spread to almost every state by May 2020. The total number of cases in India up to June 13, 2020, is 3,09,603, with 8,890 deaths and 1,54,330 recoveries [5].

The first case of COVID-19 in India was reported on 30 January 2020 in Kerala. The number of infections then increased gradually (1,251 on March 30, including 32 deaths and 102 recovered cases). In response, the Indian government implemented international travel bans, the Janata curfew and a strict lockdown throughout the country from 25 March onward. The major events in connection with the COVID-19 outbreak are listed in Table 1.

Table 1 Key events and decisions taken by Indian government

India has the second-largest population in the world after China, with a very high population density and limited infrastructure and healthcare systems to cater to the very large demands of COVID-19 patients. On the other hand, factors such as a warmer climate and humidity [6,7,8,9], a large proportion of young people in the population, and possible immunity due to BCG vaccination [8] may favor India. Most of the infected patients in India are asymptomatic or have mild symptoms, which is quite unusual compared with European and North American countries. India imposed an early lockdown, following China, in view of its favorable effect in controlling the final epidemic size. However, the huge population (almost 1.35 billion) and the high population density pose significant challenges to mitigating the spread of COVID-19 without affecting economic and social life. In this context, it is very important to consider the huge demand on healthcare systems (ICU beds, PPE, ventilators, oxygen generators, etc.), to enforce social distancing and to avoid large-scale spread through community transmission. Prediction of the possible COVID-19 outbreak is therefore very important for policy decisions regarding the healthcare system, lockdown and social distancing. The spread in India is concentrated in specific hotspot areas, especially densely populated ones, where timely decisions on lockdown, social distancing and rapid diagnosis of infected cases play a major role in preventing a pandemic situation.

Mathematical modeling is frequently used in epidemiology to predict the outbreaks of different diseases [10, 11]. Infectious disease models aim at understanding the mechanisms that influence the spread of diseases and at predicting disease transmission. Mathematical models are widely used to evaluate the potential impact of different control measures for pandemic diseases and to guide the public health policy decisions of governments. Different models used earlier to predict the nature of Ebola outbreaks in African countries are reported in the literature [12,13,14]. The models used to predict pandemic disease outbreaks are mainly of two types: statistical models [15] and mechanistic models [16]. Forecasting by mathematical modeling for dengue, influenza and chikungunya is also reported in the literature [17,18,19]. In all models, a balance must be struck between obtaining precise forecasts and accounting for all uncertainties, both in the data and in the dynamics of transmission.

Mathematical modeling is used to understand the dynamics of the pandemic in its early stages and to predict the rate of spread from infector to infectee. The basic reproduction number (R0) measures the average number of secondary infections generated by a primary case in a fully susceptible population. A schematic diagram of the basic reproduction number with a value of 2.0 is shown in Fig. 1.

Fig. 1 A schematic diagram of basic reproduction number

Liu et al. [20] reviewed and listed the reproduction numbers reported in the literature indexed in PubMed, bioRxiv and Google Scholar. Their paper covered twelve studies from the period January 1, 2020–February 7, 2020. The estimates of the reproduction number ranged from 1.49 to 6.49, with a mean of 3.28, a median of 2.79 and an interquartile range (IQR) of 1.16. They attribute this large variation to the estimation method adopted, such as stochastic methods, mathematical methods and the exponential growth method. Zhang et al. [21] estimated the value of R0 and the probable outbreak dynamics on the Diamond Princess cruise ship.

In this study, data obtained from the Ministry of Health and Family Welfare (MoHFW), Government of India, for the period from 2 March to 1 April are used to estimate the R0 of COVID-19 by applying different statistical models. No new cases of COVID-19 were detected in India from January 31 to March 1, 2020; therefore, the samples used in the modeling start from March 2, 2020, after which a gradual increase in infected cases is observed. The ‘projections’ package in R is used to get an idea of the possible epidemic trend in India.

2 Materials and Methods

The incidence data are taken from the Ministry of Health and Family Welfare (MoHFW) of the Government of India and from the COVID-19 tracker in India [5], which tracks COVID-19 cases country-wide. To estimate the reproduction number, three serial interval (SI) distributions reported in the literature are used: (1) by Li et al. [22]; (2) by Nishiura et al. [23]; and (3) by Du et al. [24].

Two R packages, ‘earlyR’ [25] and ‘R0’ [26, 27], are used to estimate the basic reproduction number (R0). The ‘earlyR’ package estimates this number using the maximum likelihood (ML) method as illustrated by Cori et al. [28]. The ‘R0’ package offers five different methods to estimate the number: (1) from attack rate, (2) maximum likelihood, (3) exponential growth rate, (4) Bayesian approach and (5) time-dependent reproduction number. In this paper, three methods from the ‘R0’ package are used: (1) exponential growth rate (EG), (2) maximum likelihood (ML) and (3) time-dependent (TD) reproduction number. The maximum likelihood (ML) method in the ‘R0’ package follows the algorithm proposed by White and Pagano [29], the exponential growth rate method follows the paper by Wallinga and Lipsitch [30], and the time-dependent method is the one proposed by Wallinga and Teunis [31]. Nouvellet et al. [32] presented a simple statistical approach to forecast near-future incidence. In their model, the daily incidence \( I_{t} \) is approximated by the renewal equation and is assumed to follow a Poisson process, \( I_{t} \sim \text{Pois}\left( R_{t} \sum_{s = 0}^{t} I_{t - s}\, \omega_{s} \right) \), where \( \omega \) is the serial interval distribution and \( R_{t} \) is the instantaneous reproduction number. Their method is implemented in the R package named ‘projections’ [33].
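The renewal equation can be made concrete with a small sketch. The Python code below is only an illustration and is not the ‘projections’ package: the function names are hypothetical, the serial interval is passed as a list of daily weights for lags of 1, 2, ... days (so the weight at lag zero is assumed to be zero), and only the Poisson mean is computed rather than a random draw.

```python
def expected_incidence(incidence, si_pmf, r_t):
    """Poisson mean for the next day: r_t * sum over lags of I_{t-s} * w_s.

    incidence: past daily counts, oldest first.
    si_pmf:    serial interval weights for lags 1, 2, ... days (assumed w_0 = 0).
    r_t:       reproduction number assumed for the next day.
    """
    total = 0.0
    for lag, weight in enumerate(si_pmf, start=1):
        if lag > len(incidence):
            break  # no observations that far back
        total += incidence[-lag] * weight
    return r_t * total


def project_mean(incidence, si_pmf, r_t, n_days):
    """Iterate the renewal equation forward, feeding each mean back in."""
    series = list(incidence)
    for _ in range(n_days):
        series.append(expected_incidence(series, si_pmf, r_t))
    return series[len(incidence):]


# Made-up numbers, for illustration only.
print(expected_incidence([1, 2, 4], [0.5, 0.3, 0.2], 2.0))
print(project_mean([1], [1.0], 2.0, 3))
```

With these made-up inputs, the first call weights the three most recent days by 0.5, 0.3 and 0.2 and then scales the sum by the reproduction number.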

2.1 Serial Interval

To estimate the reproduction number, it is necessary to know the generation time, which is the time lag between infection in primary cases (infectors) and in secondary cases (infectees). It is generally approximated by the serial interval (SI), which is defined as the time lag between the onset of symptoms in primary cases and in secondary cases. The serial interval is assumed to have a gamma distribution. In their analysis of the early outbreak of COVID-19 in the city of Wuhan, China, Li et al. [22] estimated that the SI has a mean of 7.5 days and an SD of 3.4 days. However, Nishiura et al. [23] estimated the mean and standard deviation of the SI to be 4.7 days (95% CI: [3.7, 6.0]) and 2.9 days, respectively, based on a dataset of 28 infector/infectee pairs. Du et al. [24] obtained a similar SI distribution, with a mean of 3.96 days (95% CI: [3.53, 4.39]) and an SD of 4.75 days (95% CI: [4.46, 5.07]). Their dataset consists of 468 COVID-19 transmission events.
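As an illustration of how a reported SI mean and SD translate into a gamma distribution, the sketch below uses the method of moments (shape = mean²/SD², scale = SD²/mean). The discretization to whole days by evaluating the density at days 1 to 30 and renormalizing is a simplifying assumption made here; it is not necessarily what the R packages do. The function names are hypothetical.

```python
import math


def gamma_shape_scale(mean, sd):
    """Method-of-moments gamma parameters from a mean and standard deviation."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return shape, scale


def gamma_pdf(x, shape, scale):
    """Gamma density at x > 0, computed on the log scale for stability."""
    log_pdf = ((shape - 1.0) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def discretized_si(mean, sd, max_days=30):
    """Daily SI weights for lags 1..max_days (assumed simple discretization)."""
    shape, scale = gamma_shape_scale(mean, sd)
    weights = [gamma_pdf(day, shape, scale) for day in range(1, max_days + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# SI reported by Li et al.: mean 7.5 days, SD 3.4 days.
shape, scale = gamma_shape_scale(7.5, 3.4)
si_weights = discretized_si(7.5, 3.4)
print(round(shape, 3), round(scale, 3), len(si_weights))
```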

3 Results

3.1 Analysis of COVID-19 in India

3.1.1 Epidemic Curve and Preliminary Analysis

The epidemic curve of incidence cases in India from March 2, 2020 to April 1, 2020, is shown in Fig. 2, together with an exponential model fitted to the curve. The fitted curve shows that daily cases double approximately every four days.
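The link between an exponential fit and the doubling time can be sketched as follows. If daily cases grow as exp(r·t), the doubling time is ln 2 / r, so a doubling time of about four days corresponds to r ≈ 0.17 per day. The code below is a minimal sketch that assumes an ordinary least-squares fit of log daily cases against time and skips days with zero cases; the paper does not state its fitting procedure, and the data here are synthetic.

```python
import math


def fit_growth_rate(daily_cases):
    """Least-squares slope of log(cases) against day index (zero days skipped)."""
    points = [(t, math.log(c)) for t, c in enumerate(daily_cases) if c > 0]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    return sxy / sxx


def doubling_time(growth_rate):
    """Days needed for incidence to double at a constant exponential rate."""
    return math.log(2) / growth_rate


# Synthetic series that doubles every four days (not the MoHFW data).
synthetic = [2 ** (t / 4) for t in range(20)]
rate = fit_growth_rate(synthetic)
print(rate, doubling_time(rate))
```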

Fig. 2 Epidemic curve of COVID-19 in India (March 02, 2020–April 01, 2020)

3.1.2 Estimation of R0

Table 2 shows our estimates of R0, along with the 95% confidence intervals, for the three reported SI distributions and the R packages used in the analysis. The estimated values of R0 range from 1.53 to 3.25, with a mean of 2.18. It seems that the ‘R0’ package tends to overestimate the reproduction number slightly. Also, the R0 value estimated with an SI mean of 7.5 days and SD of 3.4 days is higher than that obtained with the other SI data. The mean value obtained in this analysis (i.e., 2.18) falls within the range of values reported by the WHO.

Table 2 Estimation of R0 with confidence interval

Figure 3 shows the daily observed incidence and the model-predicted incidence using the ML and EG methods. The fitted incidence is then used to estimate R0. It can be seen from Fig. 3 that the model fits the observed incidence data quite well.

Fig. 3 Observed incidence and model predicted incidence using ML and EG

The likely values of the basic reproduction number estimated with the ‘earlyR’ package are shown in Fig. 4.

Fig. 4 Estimation of basic reproduction number (R0), ‘earlyR’ package

As the reported serial interval distributions differ, a sensitivity analysis of the R0 value with respect to the SI is also carried out. The serial interval is assumed to have a gamma distribution. The mean and standard deviation of the SI are varied over ranges of 1–7 days and 2–5 days, respectively, and R0 is then estimated using the ML method. Figure 5a shows the three serial interval distributions considered in the present study. Figure 5b depicts the sensitivity of R0 to the SI mean and standard deviation. From the sensitivity analysis, it can be seen that R0 has a maximum value of 2.9 when the SI mean and SD are seven days and two days, respectively.
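The direction of this sensitivity can be illustrated with a closed-form relation. The sketch below does not use the ML method applied in the study; it swaps in the exponential growth relation of Wallinga and Lipsitch, R = 1/M(−r), which for a gamma-distributed interval with shape k and scale θ gives R = (1 + rθ)^k. The growth rate is an assumed value taken from a four-day doubling time, so the numbers it produces are not comparable with Table 2 or Fig. 5b.

```python
import math


def r_from_growth_rate(growth_rate, si_mean, si_sd):
    """Reproduction number from growth rate r for a gamma serial interval.

    Uses R = 1 / M(-r) with the gamma moment generating function,
    which simplifies to (1 + r * scale) ** shape.
    """
    shape = (si_mean / si_sd) ** 2
    scale = si_sd ** 2 / si_mean
    return (1.0 + growth_rate * scale) ** shape


# Assumed growth rate: doubling every four days.
growth_rate = math.log(2) / 4.0

# Same grid as in the text: SI mean 1-7 days, SD 2-5 days.
grid = {
    (mean, sd): r_from_growth_rate(growth_rate, mean, sd)
    for mean in range(1, 8)
    for sd in range(2, 6)
}
largest = max(grid, key=grid.get)
print(largest)
```

For a fixed growth rate, this relation gives a larger R for a longer mean SI and a smaller R for a wider SD, so the largest value on the grid occurs at a mean of seven days and an SD of two days, the same corner of the grid where the ML-based analysis finds its maximum.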

Fig. 5 (a) Serial interval distribution; (b) sensitivity of R0 to the serial interval distribution

Figure 6 shows the time-dependent reproduction number (R(t)) over the period from March 14, 2020 to June 10, 2020, together with the epidemic curve for this period. The SI mean and SD are assumed to be 7.5 days and 3.4 days, respectively. The start of lockdown, i.e., March 25, 2020, is also marked on the figure. It can be seen that the reproduction number falls from April 15 onward, mainly due to travel restrictions, the closure of public places and the imposition of lockdown. Over the last twenty days or so, the effective reproduction number has come down to nearly 1.22 as an effect of the lockdown. However, to contain the spread of the virus, the reproduction number must be brought below one.

Fig. 6 Epidemic curve and the time-dependent reproduction number in India

3.2 Forecasting Near-Future Incidences in India

An attempt has been made to forecast near-future incidences up to August 12 using the R package ‘projections’ developed by Jombart et al. [33]. The first forty-five days of data starting from March 14 are used for learning the model. The reproduction number in the renewal equation is assumed to be constant during the projection. It is obtained as the average of the last seven days of the time-dependent reproduction number, while the serial interval has a mean and standard deviation of 7.5 and 3.4 days, respectively. Figure 7a shows the predicted daily incidences in India from April 28 to August 12. The actual daily incidences from April 28 to June 10 are also shown in Fig. 7a. It can be seen that the cases predicted by the present methodology fit the observed cases quite well. Figure 7b shows the predicted cumulative incidences up to August 12. With the present epidemic trajectory, the predictions are as follows: on 15 June, cumulative incidences are 305,477 (range: 291,402–319,738); on 25 June, 492,903 (range: 469,785–517,156); on 05 July, 781,432 (range: 741,211–821,212); on 15 July, 1,225,636 (range: 1,159,404–1,289,513); on 25 July, 1,909,548 (range: 1,802,053–2,011,688); and on 01 August, 2,597,702 (range: 2,447,797–2,738,595). It may be noted that these are conservative estimates, which assume that the present rate of infection persists. If the lockdown restrictions are eased, the reproduction number is expected to increase, which will result in a larger number of COVID-19 cases in India.
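The projection procedure described above (a constant reproduction number taken as a seven-day average, repeated Poisson draws from the renewal equation, and a range over simulated trajectories) can be sketched in a few lines. This is a hedged illustration and not the ‘projections’ package: the function names, the reproduction numbers, the incidence history and the serial interval weights are all made up, the Poisson sampler switches to a normal approximation for large means, and the "range" is assumed here to be the minimum and maximum across simulations.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Poisson sample: Knuth's method for small means, normal approx. otherwise."""
    if lam <= 0:
        return 0
    if lam < 30:
        threshold = math.exp(-lam)
        count, product = 0, 1.0
        while True:
            product *= rng.random()
            if product <= threshold:
                return count
            count += 1
    return max(0, round(rng.gauss(lam, math.sqrt(lam))))


def simulate_cumulative(history, si_pmf, r_const, n_days, n_sims=200, seed=1):
    """Simulate cumulative NEW cases over n_days with a constant R."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_sims):
        series, running, cumulative = list(history), 0, []
        for _ in range(n_days):
            # Renewal equation mean; si_pmf holds weights for lags 1, 2, ...
            lam = r_const * sum(series[-lag] * w
                                for lag, w in enumerate(si_pmf, start=1)
                                if lag <= len(series))
            new_cases = poisson_draw(rng, lam)
            series.append(new_cases)
            running += new_cases
            cumulative.append(running)
        runs.append(cumulative)
    return runs


def summarize(runs):
    """Per-day (min, median, max) of cumulative cases across simulations."""
    return [(min(day), statistics.median(day), max(day)) for day in zip(*runs)]


# Made-up inputs, for illustration only.
recent_rt = [1.30, 1.28, 1.25, 1.24, 1.22, 1.21, 1.20]
r_const = statistics.mean(recent_rt[-7:])  # average of the last seven days
runs = simulate_cumulative([10, 12, 15, 18], [0.2, 0.5, 0.3], r_const, n_days=10)
print(summarize(runs)[-1])
```

Because the reproduction number is held fixed, any later change in transmission (for example, after restrictions are eased) is not reflected in trajectories produced this way.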

Fig. 7 (a) Predicted daily incidence and actually observed incidence in India; (b) predicted cumulative daily incidence in India

3.3 Analysis of COVID-19 in States and UT of India

Analysis of the COVID-19 cases is also carried out for the seven worst-affected Indian states, namely Maharashtra, Tamil Nadu, Delhi, Gujarat, Uttar Pradesh, Rajasthan and West Bengal. A map of Indian states and the distribution of active COVID-19 cases are shown in Fig. 8. A probable projection of COVID-19 cases in those states is given in Table 3. It may be noted that the predictions are based on the assumption that the reproduction number remains the same over the prediction horizon, which may not hold, particularly for long-term projections.

Fig. 8 COVID-19 active cases in India as of June 10, 2020 [5]

Table 3 Prediction of COVID-19 cases in India and Indian states

It may be noted that data from the first 45 days starting from March 14, 2020, are used for the future prediction of COVID-19 cases.

Maharashtra

Maharashtra has the highest number of COVID-19 cases in India. As of June 10, the total number of confirmed cases in Maharashtra is 1,04,568, with 51,379 active cases, 49,346 recovered cases and 3,830 deaths. The epidemic curve and the time-dependent reproduction number in Maharashtra are shown in Fig. 9. On a positive note, the effective reproduction number in Maharashtra has come down to 1.08 on June 10, 2020.

Fig. 9 Epidemic curve and the time-dependent reproduction number in Maharashtra

Figure 10a shows the predicted daily incidences in Maharashtra from April 28 to August 12, whereas Fig. 10b shows the cumulative projection in Maharashtra during that period. The predictions for Maharashtra are as follows: On 30 June, cumulative incidences are 1,81,648 (range: 1,68,149–1,97,195), and on 15 July, cumulative incidences are 3,41,105 (range: 3,13,764–3,70,958).

Fig. 10 (a) Predicted daily incidence and actually observed incidence in Maharashtra; (b) predicted cumulative daily incidence in Maharashtra

Tamil Nadu

Tamil Nadu has the second-highest number of COVID-19 cases in India. As of June 13, Tamil Nadu has recorded a total of 42,687 confirmed cases, with 18,881 active cases, 23,490 recovered cases and 397 deaths. Among the seven worst-affected states considered in this study, Tamil Nadu has the lowest death rate among infected cases. The majority of cases are concentrated in and around Chennai, which is considered to be the epicenter in Tamil Nadu. However, even after the end of lockdown, the time-dependent reproduction number is still on the higher side. On June 10, 2020, the reproduction number in Tamil Nadu is 1.49, which is higher than in the other Indian states and the national average. A slight upward trend in the time-dependent reproduction number can also be observed (Fig. 11).

Fig. 11 Epidemic curve and the time-dependent reproduction number in Tamil Nadu

Plots indicating future incidences in Tamil Nadu are shown in Fig. 12a and b. It is predicted that cumulative cases may reach 1,19,781 (range: 97,668–1,53,513) on 30 June and 3,61,632 (range: 2,94,560–4,65,707) on 15 July. However, it may be noted that the high numbers predicted for Tamil Nadu can be attributed to its higher reproduction number. It is quite possible that the actual number of cases will be much lower, as the infection rate may go down due to control measures.

Fig. 12 (a) Predicted daily incidence and actually observed incidence in Tamil Nadu; (b) predicted cumulative daily incidence in Tamil Nadu

Fig. 13 Epidemic curve and the time-dependent reproduction number in Delhi

Delhi

Delhi is the third most affected state in India. As of 13 June, there are 38,958 confirmed COVID-19 cases, with 22,742 active cases, 14,945 recovered cases and 1,271 deaths. At the end of 10 June, the time-dependent reproduction number in Delhi is 1.19 (Fig. 13).

The prediction of future incidences for Delhi is shown in Figs. 14a and b. The predictions for Delhi are as follows: On 30 June, cumulative incidences are 83,585 (range: 68,370–97,017); on 15 July cumulative incidences are 1,80,666 (range: 1,46,961–2,09,867).

Fig. 14 (a) Predicted daily incidence and actually observed incidence in Delhi; (b) predicted cumulative daily incidence in Delhi

Gujarat

There are 23,079 confirmed COVID-19 cases in Gujarat as of June 13, 2020, with 5,739 active cases, 15,891 recovered cases and 1,449 deaths. It can be observed from Fig. 15 that the time-dependent reproduction number in Gujarat is 1.24 on June 10, 2020. For a brief period in the last week of May, this number had gone below 1.0, but it has since increased again.

Fig. 15 Epidemic curve and the time-dependent reproduction number in Gujarat

The prediction of future incidences in Gujarat is shown in Figs. 16a and b. From the simulation, it is predicted that cumulative cases in Gujarat may reach 26,802 (range: 23,705–30,779) on 30 June and 37,829 (range: 32,761–43,559) on July 15.

Fig. 16 (a) Predicted daily incidence and actually observed incidence in Gujarat; (b) predicted cumulative daily incidence in Gujarat

Uttar Pradesh

In Uttar Pradesh, there are 13,118 confirmed cases of COVID-19 as of 13 June, comprising 4,858 active cases, 7,875 recovered cases and 385 deaths. The epidemic curve of Uttar Pradesh and the variation of the reproduction number with time are shown in Fig. 17. The reproduction number on 10 June is 1.34. A slight upward trend in the time-dependent reproduction number can also be seen from June 01 onward.

Fig. 17 Epidemic curve and the time-dependent reproduction number in Uttar Pradesh

The predicted daily incidence and cumulative incidences in Uttar Pradesh are shown in Fig. 18a and b, respectively. It is estimated that, if the current rate of infection persists, cumulative cases may reach 21,320 [range: 23,705–30,779] on June 30 and 35,580 [range: 27,453–43,777] on July 15.

Fig. 18 (a) Predicted daily incidence and actually observed incidence in Uttar Pradesh; (b) predicted cumulative daily incidence in Uttar Pradesh

Rajasthan

For the state of Rajasthan, there are 12,068 confirmed cases of COVID-19, with 2,785 active cases, 9,011 recovered cases and 272 deaths as of June 13, 2020. The epidemic curve and the time-dependent reproduction number for Rajasthan are shown in Fig. 19. R(t) shows a downward trend from the week preceding June 01, and its value on June 10 is 1.09.

Fig. 19 Epidemic curve and the time-dependent reproduction number in Rajasthan

The predicted daily incidence and cumulative incidences in Rajasthan are shown in Fig. 20a and b, respectively. The predictions for Rajasthan are as follows: On 30 June, cumulative incidences are 17,706 (range: 13,275–23,073); on 15 July, cumulative incidences are 28,354 (range: 20,979–38,153).

Fig. 20 (a) Predicted daily incidence and actually observed incidence in Rajasthan; (b) predicted cumulative daily incidence in Rajasthan

West Bengal

West Bengal is one of the most populous states in India. As of June 13, 2020, there are 10,698 confirmed cases, with 5,693 active cases, 4,542 recovered cases and 463 deaths. The epidemic curve and the variation of the reproduction number for West Bengal are shown in Fig. 21. A downward trend in the reproduction number from 01 June can be observed in the figure. Its value on 10 June is nearly 1.31.

Fig. 21 Epidemic curve and the time-dependent reproduction number in West Bengal

The predicted daily incidence and cumulative incidences in West Bengal are shown in Fig. 22a and b, respectively. The predictions for West Bengal are as follows: On 30 June, cumulative incidences are 19,772 (range: 13,584–26,044); on 15 July, cumulative incidences are 42,864 (range: 29,526–57,359).

Fig. 22 (a) Predicted daily incidence and actually observed incidence in West Bengal; (b) predicted cumulative daily incidence in West Bengal

4 Discussion

In this study, the basic reproduction number of the novel coronavirus disease (COVID-19) is estimated from the early outbreak of the disease in India. The estimates vary from 1.53 to 3.25, with a mean of 2.18. This variation arises from the method used to estimate the number. An attempt is also made to predict near-future incidences in India and in different Indian states. However, it must be noted that these predictions serve as general guidelines rather than absolute certainties. The predictions for India and its states can help the central and state governments formulate policy for the near-future healthcare system in terms of COVID hospitals, doctors, health workers, ICU beds, ventilators, PPE, etc. to fight the COVID-19 pandemic. They can also help in taking decisions on further lockdown in hotspot/epicenter areas, infrastructure for COVID patients, economic and social issues, etc.

Financial support & sponsorship: None.

Conflicts of Interest: None.