1 Introduction

The pandemic of the novel coronavirus disease (COVID-19), caused by the SARS-CoV-2 virus, has proven to be one of the most serious health crises in recent human history. As of this writing, over half a billion infection cases have been confirmed [1], with millions more probably having gone undetected owing to a myriad of reasons, from a lack of sufficient testing to asymptomatic infections to poor reporting practices. Sadly, more than 6.3 million lives worldwide have thus far been lost to the disease. The global fight against COVID-19 has been made more difficult by the resurgence of infections after periods of relative control of the disease spread, thus giving rise to successive ‘epidemic waves’ [2, 3] in most countries.

Several epidemics, especially those caused by respiratory viruses, are known to occur in repeated, seasonal patterns [4,5,6]. The COVID-19 pandemic has shown a much more complex behavior in that the successive waves of infections in a given population group stem not so much from seasonal effects but rather from the intricate interaction between the virus propagation dynamics and the population behavior in response to interventions (or lack thereof) by the local health authorities [7]. Indeed, when restrictions are lifted (or poorly adhered to), transmission tends to increase; and, conversely, when control measures are reintroduced, the transmission rate declines. It is thus important to be able to identify, in a quantitative and reliable manner, the occurrence of distinct waves in a given epidemic dataset, as this information can help researchers and health authorities to surmise the impact of control measures (and their relaxation) on the overall epidemic evolution. Having reliable mathematical models and numerical algorithms to describe epidemic curves with multiple waves is therefore an important step in this endeavor. Several models have been considered in the literature to describe COVID-19 curves with multiple waves, such as compartmental [8,9,10] and growth [7, 11] models with time-dependent parameters, among others. These models, albeit satisfactory in many cases, have the disadvantage that their defining ordinary differential equations (ODEs) need to be integrated numerically, which makes fitting the model to empirical data more cumbersome.

In this paper, we consider a different class of models, called the pathway approach, where the growth rate for the quantity of interest, say, the total number of cases or deaths, is given as a known, explicit function of time. The pathway approach was originally introduced to describe, in a unified manner, a large family of probability distributions [12, 13], but recently it has been applied to one-wave COVID-19 curves of cases and deaths [14]. Here, we extend the model to multiwave epidemic curves by assuming that the model parameters become time dependent, so as to reflect the changes in the underlying epidemiological conditions associated with the successive waves of infection. The specific time dependency of the model parameters is given by a multistep logistic-like function with N plateaus, where N is the number of waves, whereby each plateau represents the parameter value during the corresponding wave. We apply our time-dependent pathway model to COVID-19 mortality curves from ten selected countries, exhibiting from two to five waves, and show that the model is in excellent agreement with the empirical data for all countries considered.

The pathway approach considered here has several advantages. First, the model turns out to be quite flexible and capable of capturing the various distinct wave-like patterns present in empirical data. Second, it allows us to perform numerical fits directly on the daily data without any ODE integration involved, as the theoretical daily curve is written as an explicit function of time. This is particularly relevant for multiwave epidemics, as fits based on numerical integration of ODEs for such cases are more computationally demanding and more prone to parameter uncertainties. Third, from the fitted theoretical curve, it is an easy matter to determine its maxima and minima and thus locate the starting and peak dates for each wave. Furthermore, the model is able to describe not only the main waves of a given epidemic curve but also smaller dynamical structures (e.g., sub-waves and ‘shoulders’) inside a main wave, as will be seen later.

2 Data

Here, we focus exclusively on mortality data from COVID-19, instead of infection cases. The reason for this choice is the difficulty of estimating the actual number of people infected by SARS-CoV-2, since the confirmed cases represent only an unknown fraction of the total number of infections. In this scenario, the number of deaths attributed to COVID-19 is a somewhat more reliable measure to describe the dynamics of the epidemic [15].

As our main aim here is to analyze the successive waves of the COVID-19 epidemic in different countries, we have selected a representative set of countries that have undergone, until the maximum date considered here, namely March 3, 2022, from two up to five waves of infections. More specifically, we have analyzed the COVID-19 mortality curves for the following ten countries: Austria, Brazil, Bulgaria, Canada, Croatia, Italy, Netherlands, Slovakia, South Africa, and USA. The data used in this study were obtained from the database made publicly available by the Johns Hopkins University [16], which lists in automated fashion the number of the confirmed cases and deaths per country.

3 Methods

In this section, we introduce the pathway model and discuss its main mathematical aspects. We begin with the case where the model parameters are constant in time, which applies to single-wave epidemic curves. Then, we allow the model parameters to become time dependent, so as to capture multiple-wave effects.

3.1 Standard pathway model

We describe the time evolution of the daily number of cases or deaths (dD/dt) in the epidemic by means of the pathway model (PM), defined by the following ordinary differential equation:

$$\begin{aligned} \frac{dD}{dt}=\frac{Ct^{\alpha }}{\left[ 1+\beta (q-1)t^\gamma \right] ^{1/(q-1)}},\quad C,\alpha ,\beta ,\gamma >0,\quad q>1, \end{aligned}$$
(1)

where t is the time elapsed since the last day prior to the first death, such that \(t=0\) represents the day before the first death occurred. As indicated above, the aggregated quantity D(t) could represent the cumulative number of either detected infection cases or deaths attributed to COVID-19. But here, we shall focus exclusively on death curves, for they are less affected by underreporting [15] and hence present a more reliable measure to study multiple waves of infections, which is our main goal here. The dynamical role played by the different parameters in (1) will be discussed later on. For now, it suffices to say that in order to ensure that D(t) reaches a finite value for \(t\rightarrow \infty \) (rather than growing indefinitely, which is not epidemiologically sensible), we must require that

$$\begin{aligned} {\gamma }-(q-1)(\alpha +1)>0. \end{aligned}$$
(2)
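As a minimal sketch (not the code used for the fits reported here), the right-hand side of (1) and condition (2) can be evaluated directly in Python. The function names `pathway_daily` and `has_finite_plateau` are illustrative assumptions; the parameter values are the ones quoted for Fig. 1.

```python
import numpy as np

def pathway_daily(t, C, alpha, q, beta, gamma):
    """Right-hand side of Eq. (1): daily curve dD/dt of the pathway model."""
    t = np.asarray(t, dtype=float)
    return C * t**alpha / (1.0 + beta * (q - 1.0) * t**gamma) ** (1.0 / (q - 1.0))

def has_finite_plateau(alpha, q, gamma):
    """Condition (2): gamma - (q - 1)(alpha + 1) > 0."""
    return gamma - (q - 1.0) * (alpha + 1.0) > 0.0

# Constant parameters quoted for Fig. 1.
params = dict(C=1e-3, alpha=3.3, q=1.4, beta=1e-5, gamma=3.0)

t = np.arange(0, 301)               # days since the day before the first death
daily = pathway_daily(t, **params)  # explicit in t, no ODE solver needed
finite = has_finite_plateau(params["alpha"], params["q"], params["gamma"])
```

Because the daily curve is explicit in t, evaluating it on a grid of days is a single vectorized call.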
Fig. 1

Plots of the daily (a) and cumulative (b) curves for the pathway model with the following (constant) parameters: \(C=10^{-3}\), \(\alpha =3.3\), \(q=1.4\), \(\beta =10^{-5}\), and \(\gamma =3\)

It is worth noting that the right-hand side of (1), with \(q>1\), corresponds to the generalized type-2 beta density function, which is one of the limiting cases of the so-called pathway approach to describe certain complex systems [12, 13]. In the pathway approach, the parameter q can be varied so as to produce a rather extensive family of probability distributions, ranging from the generalized type-1 beta functions (\(q<1\)) to the generalized type-2 beta functions (\(q>1\)), while also recovering in the limit \(q\rightarrow 1\) the generalized gamma distribution and other related distributions [13]. In epidemic dynamics, which is our main focus here, the relevant range is \(q>1\). It is nonetheless worth noting that the generalized type-1 beta functions (\(q<1\)) also find applications in the context of mathematical epidemiology. In this case, however, the pathway approach is formulated as a generalized logistic model, where the growth rate is written in terms of the cumulative variable D itself (rather than explicitly in terms of the time t). In other words, replacing t with D in the right-hand side of (1) and considering \(q<1\), one obtains the so-called beta logistic model, which has been applied to epidemic COVID-19 curves with one and multiple epidemic waves [7, 17].

Viewing the right-hand side of (1) as a probability density function can be useful to understand the underlying epidemic growth process. In this perspective, Eq. (1) says that the growth rate of the disease, \(\dot{D}(t)\), where dot denotes time derivative, counted in number of cases or deaths at a given time t, can be viewed as proportional to the probability that a new case or death may occur at time t. For a typical epidemic outbreak, one then expects that the probability is quite low in the beginning of the outbreak, then it increases rather sharply until reaching a peak, after which the probability of infection should decrease, possibly with a long tail. The PM, as defined in (1), provides a flexible approach to describe this generic behavior of an epidemic outbreak. Another noteworthy aspect of the pathway approach is the fact that its probability density function can be derived by optimizing a generalized entropy measure [18], which provides an interesting way to justify the PM as an effective macroscopic description of an underlying agent-based dynamics.

Fig. 2

Plots of the daily (a) and cumulative (b) curves for the pathway model with \(N=3\) waves, where the time-dependent parameters are as shown in Fig. 3

Fig. 3

Plots of the model parameters \(\{C(t),\alpha (t),q(t),\beta (t),\gamma (t)\}\) used in the example with three waves (\(N=3\)) shown in Fig. 2. The respective plateau parameters entering function (14) are as follows: \(C_1=1.84 \times 10^{-12}\), \(C_2=2.24 \times 10^{-15}\), \(C_3=1.59\times 10^{-16}\), \(\alpha _1=7.19\), \(\alpha _2=7.67\), \(\alpha _3=7.89\), \(q_1=2.11\), \(q_2=1.23\), \(q_3=1.75\), \(\beta _1=1.45\times 10^{-6}\), \(\beta _2=9.70\), \(\beta _3=6.05\), \(\gamma _1=2.23\), \(\gamma _2=0.0826\), \(\gamma _3=1.05\), \(t_1=166\), \(\rho _1=0.125\), \(t_2=375\), and \(\rho _2=0.180\). The plot of C(t) in the upper left panel is shown in a semi-log scale

A direct integration of (1) yields the following analytic expression for the cumulative number of deaths up to time t:

$$\begin{aligned} D(t)=\frac{Ct^{\alpha +1}}{\alpha +1}\, {}_2F_1\left( \frac{\alpha +1}{\gamma }, \frac{1}{q-1}; 1+\frac{\alpha +1}{\gamma }; -\beta (q-1)t^\gamma \right) , \end{aligned}$$
(3)

where \({_2 F_1}(a,b;c;x)\) is the Gauss hypergeometric function. The fact that the PM has an explicit analytic solution is an important property, especially for applications in one-wave epidemic curves, for it allows a direct fit of the model to cumulative curves, where fluctuations are smaller in comparison with daily curves; see Sect. 4.1.
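The closed form (3) can be checked against a direct quadrature of (1). The sketch below assumes SciPy's `scipy.special.hyp2f1` for the Gauss hypergeometric function and `scipy.integrate.quad` for the reference integral; the function names and the test times are illustrative choices, and the parameters are those of Fig. 1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import hyp2f1

def pathway_daily(t, C, alpha, q, beta, gamma):
    """Right-hand side of Eq. (1)."""
    t = np.asarray(t, dtype=float)
    return C * t**alpha / (1.0 + beta * (q - 1.0) * t**gamma) ** (1.0 / (q - 1.0))

def pathway_cumulative(t, C, alpha, q, beta, gamma):
    """Eq. (3): cumulative curve D(t) via the Gauss hypergeometric function."""
    t = np.asarray(t, dtype=float)
    a = (alpha + 1.0) / gamma
    b = 1.0 / (q - 1.0)
    z = -beta * (q - 1.0) * t**gamma
    return C * t ** (alpha + 1.0) / (alpha + 1.0) * hyp2f1(a, b, a + 1.0, z)

p = (1e-3, 3.3, 1.4, 1e-5, 3.0)  # (C, alpha, q, beta, gamma) as in Fig. 1

# Compare the closed form with numerical integration of Eq. (1).
checks = []
for t_end in (10.0, 50.0, 100.0):
    numeric, _ = quad(pathway_daily, 0.0, t_end, args=p)
    checks.append((t_end, float(pathway_cumulative(t_end, *p)), numeric))
```

Each tuple in `checks` holds the time, the value from (3), and the quadrature value, so the two can be compared side by side.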

Figure 1a shows a plot of the function \(\dot{D}(t)\), as given by the right-hand side of (1) for the following parameter values: \(C=1\times 10^{-3}\), \(\alpha =3.3\), \(q=1.4\), \(\beta =1\times 10^{-5}\) and \(\gamma =3\). Note that the shape of the curve \(\dot{D}(t)\) has the expected behavior described above. One important feature of the curve \(\dot{D}(t)\) defined by (1) is its asymmetry around the peak, which reflects the fast initial growth of the epidemic, followed by a slower decay of the daily number of deaths/cases after the peak, as seen in Fig. 1a. In terms of the cumulative curve D(t), shown in Fig. 1b, this dynamics translates as follows: The curve displays a rapid early rise, followed by a nearly linear growth regime around the inflection point (which corresponds to the peak of the daily curve), after which the growth profile decelerates and starts to approach a plateau, which corresponds to the total number of cases/deaths at the end of the epidemic (assuming there is only one wave of infections). The different regimes of the growth dynamics mentioned above are governed by the different parameters of the model, as discussed next.

First, taking the limit \(t\rightarrow 0\) in (1) yields

$$\begin{aligned} \frac{dD}{dt}\approx Ct^{\alpha }, \end{aligned}$$
(4)

so that the cumulative curve has a polynomial early growth:

$$\begin{aligned} D(t)\approx {At^{\mu }},\quad t\rightarrow 0, \end{aligned}$$
(5)

where \(A=C/(\alpha +1)\) and \(\mu =\alpha +1\). Incidentally, Eq. (4) can be used to estimate the order of magnitude of the constant C, which as we shall see is the smallest parameter in the model. If we define \(t_{b}\) as the approximate day on which the fully developed epidemic growth begins (roughly 2 weeks after the first death), then \(C \approx M/t_{b}^{\alpha }\), where \(M=D^{\prime }(t_b)\) is the number of daily deaths at time \(t_b\). Setting \(M=100\) and \(t_b=14\), we can see that \(C\lesssim 10^{-10}\) for values of \(\alpha \gtrsim 10\). A similar analysis shows that the parameter \(\beta \) is also rather small, being comparable to or sometimes even smaller than C, while the exponents \(\alpha \), q, and \(\gamma \) are typically of the order of unity. Controlling the numerical errors associated with such small values of C and \(\beta \) (in comparison with the other parameters) is one of the challenges of our fitting algorithm; see below. Now, differentiating (1) and setting the result to zero, one finds that the inflection point \(t_c\) of the cumulative curve (corresponding to the peak of the daily curve), defined by \(\ddot{D}(t_c)=0\), is given by

$$\begin{aligned} t_c=\left( \frac{\alpha }{\beta [\gamma -\alpha (q-1)]}\right) ^{1/\gamma } . \end{aligned}$$
(6)
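As a quick numerical check of (6), the sketch below compares the closed-form peak time with the maximizer of the daily curve found by a bounded scalar search. The helper names and the search interval are illustrative assumptions; the parameters are those of Fig. 1, for which (6) gives \(t_c\approx 58.1\) days.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def pathway_daily(t, C, alpha, q, beta, gamma):
    """Right-hand side of Eq. (1)."""
    return C * t**alpha / (1.0 + beta * (q - 1.0) * t**gamma) ** (1.0 / (q - 1.0))

def peak_time(alpha, q, beta, gamma):
    """Eq. (6); meaningful only when gamma - alpha (q - 1) > 0."""
    return (alpha / (beta * (gamma - alpha * (q - 1.0)))) ** (1.0 / gamma)

C, alpha, q, beta, gamma = 1e-3, 3.3, 1.4, 1e-5, 3.0  # Fig. 1 values

t_c = peak_time(alpha, q, beta, gamma)

# Independent check: maximize the daily curve on an assumed interval [1, 300].
res = minimize_scalar(
    lambda t: -pathway_daily(t, C, alpha, q, beta, gamma),
    bounds=(1.0, 300.0),
    method="bounded",
)
t_c_numeric = res.x
```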

In view of (6), we can alternatively write (1) as

$$\begin{aligned} \frac{dD}{dt}=\frac{Ct^{\alpha }}{\left[ 1+\left( \frac{\alpha (q-1)}{\gamma -\alpha (q-1)}\right) \left( \frac{t}{t_c}\right) ^\gamma \right] ^{1/(q-1)}}. \end{aligned}$$
(7)

Let us now analyze the large-time behavior of D(t), i.e., for \(t\gg t_c\). First, we compute the final plateau, K, of the cumulative epidemic curve, where \(K=\lim _{t\rightarrow \infty }D(t)\). Writing \(K=\int _0^\infty \dot{D}(t)dt\), where \(\dot{D}(t)\) is as in (1), and performing the integration yields

$$\begin{aligned} K = \frac{C}{\gamma \left[ \beta (q-1)\right] ^{(\alpha +1)/\gamma }}\frac{\Gamma (\frac{1}{q-1}- \frac{\alpha +1}{\gamma })\Gamma (\frac{\alpha +1}{\gamma })}{\Gamma (\frac{1}{q-1})}. \end{aligned}$$
(8)

Now taking the limit \(t\rightarrow \infty \) in (1), one obtains

$$\begin{aligned} \frac{dD}{dt}\approx \frac{C t^{\alpha -\gamma (q-1)^{-1}}}{\left[ \beta (q-1)\right] ^{1/(q-1)}} , \end{aligned}$$
(9)

which upon integration yields

$$\begin{aligned} D(t)\approx K - \frac{B}{ t^{\nu }}, \quad t\rightarrow \infty , \end{aligned}$$
(10)

where

$$\begin{aligned} B = \frac{(q-1)C}{\left[ \beta (q-1)\right] ^{1/(q-1)}\left[ \gamma -{(q-1)}(\alpha +1)\right] }, \end{aligned}$$
(11)

and

$$\begin{aligned} \nu = \frac{\gamma }{q-1}- \alpha -1, \end{aligned}$$
(12)

with condition (2) ensuring that \(\nu >0\).
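The plateau (8) and the tail law (10) to (12) lend themselves to a direct numerical check. The following sketch, with illustrative function names and the Fig. 1 parameters, compares K with a quadrature of (1) over the whole half-line and compares the deaths still to come after a late time T with \(B/T^{\nu }\). The split point 500 and the value of T are arbitrary assumptions of the sketch.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as gamma_fn

def pathway_daily(t, C, alpha, q, beta, gamma):
    """Right-hand side of Eq. (1)."""
    return C * t**alpha / (1.0 + beta * (q - 1.0) * t**gamma) ** (1.0 / (q - 1.0))

def plateau(C, alpha, q, beta, gamma):
    """Eq. (8): final plateau K of the cumulative curve."""
    a = (alpha + 1.0) / gamma
    b = 1.0 / (q - 1.0)
    return C / (gamma * (beta * (q - 1.0)) ** a) * gamma_fn(b - a) * gamma_fn(a) / gamma_fn(b)

def tail_coefficients(C, alpha, q, beta, gamma):
    """Eqs. (11) and (12): amplitude B and exponent nu of the power-law approach."""
    nu = gamma / (q - 1.0) - alpha - 1.0
    B = (q - 1.0) * C / (
        (beta * (q - 1.0)) ** (1.0 / (q - 1.0)) * (gamma - (q - 1.0) * (alpha + 1.0))
    )
    return B, nu

p = (1e-3, 3.3, 1.4, 1e-5, 3.0)  # (C, alpha, q, beta, gamma) as in Fig. 1
K = plateau(*p)
B, nu = tail_coefficients(*p)

# K from quadrature, split into a finite head and an infinite tail.
head, _ = quad(pathway_daily, 0.0, 500.0, args=p)
tail, _ = quad(pathway_daily, 500.0, np.inf, args=p)
K_numeric = head + tail

# Eq. (10) says K - D(T) should approach B / T**nu for large T.
T = 5000.0
remaining, _ = quad(pathway_daily, T, np.inf, args=p)
predicted_remaining = B / T**nu
```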

Equation (10) thus shows that the PM has the important property that the cumulative curve approaches the plateau as a power law, rather than exponentially fast as would be predicted by, say, the standard compartmental models of the SIR type. Recent analyses [17] have shown that the COVID-19 fatality curves for many countries do indeed display such a polynomially slow approach to the plateau (during the first wave of the disease). Hence it is important to consider theoretical models with power-law behavior. Several such models have been discussed in the literature, including generalized logistic models [17] and compartmental models with nonlinear incidence rate [19]. One advantage of the pathway approach, as already mentioned, is that it is formulated explicitly in time, making its power-law behavior rather manifest.

For completeness, we note that in the limit \(q\rightarrow 1\), Eq. (1) becomes

$$\begin{aligned} \frac{dD}{dt}=Ct^{\alpha }\exp \left( -\beta t^\gamma \right) , \end{aligned}$$
(13)

showing that in this limit, the daily curve has a stretched-exponential decay after the peak, whereas in the general case (\(q>1\)), it has a power-law tail, as shown in (9).
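The limit \(q\rightarrow 1\) can also be illustrated numerically. The sketch below, under the assumption of the Fig. 1 values for the remaining parameters and an arbitrary time window, measures the largest relative gap between (1) and (13) as q approaches 1.

```python
import numpy as np

def pathway_daily(t, C, alpha, q, beta, gamma):
    """Right-hand side of Eq. (1)."""
    return C * t**alpha / (1.0 + beta * (q - 1.0) * t**gamma) ** (1.0 / (q - 1.0))

def stretched_exponential_daily(t, C, alpha, beta, gamma):
    """Eq. (13): the q -> 1 limit of Eq. (1)."""
    return C * t**alpha * np.exp(-beta * t**gamma)

C, alpha, beta, gamma = 1e-3, 3.3, 1e-5, 3.0
t = np.linspace(1.0, 100.0, 200)  # window chosen for illustration only
limit = stretched_exponential_daily(t, C, alpha, beta, gamma)

# Largest relative deviation from the limit curve for q approaching 1.
gaps = [
    float(np.max(np.abs(pathway_daily(t, C, alpha, q, beta, gamma) - limit) / limit))
    for q in (1.1, 1.01, 1.001, 1.0001)
]
```

The gap shrinks as q approaches 1, but on any fixed q > 1 it grows with t, since the power-law tail of (9) eventually dominates the stretched exponential.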

From the preceding analysis, a few words can be said about the role that each parameter plays in defining the overall shape of the curve D(t). First, it is obvious from (4) that the parameter \(\alpha \) controls the early growth regime, as it defines the exponent of the polynomial growth at the beginning of the outbreak. Second, note that, although the parameter \(\gamma \) enters into the expression for the exponent \(\nu \) in (12), it is clear that \(\nu \) is more sensitive to variations in the parameter q. Besides, the power-law behavior shown in (10) is only possible for \(q>1\), with \(q\rightarrow 1\) yielding a stretched-exponential behavior, as mentioned above. It can thus be said that q is mainly responsible for controlling the power-law saturation regime, while \(\gamma \) adds flexibility to the model as it contributes to the asymmetry of the cumulative (daily) curve around the inflection point (peak). Third, one sees from (6) that the parameter \(\beta \) sets the time scale for the location of the inflection point \(t_c\). Finally, let us consider the role of the pre-factor C in (1). In statistical applications of the pathway approach, where the right-hand side of (1) is a probability density function, C is a normalizing constant which can be obtained in terms of the other parameters. In a similar vein, as the ‘area’ under the daily curve \(\dot{D}(t)\) yields the total number of deaths/cases at the end of the epidemic, it is natural to expect that C should relate to the value of the plateau K of the cumulative curve, which is indeed the case as shown in (8). As the value of K is not known a priori, we must therefore take C as a free parameter to be estimated from the numerical fit of the model to the data.

As described above, the set of parameters of the pathway model can capture a rich class of dynamical behaviors and therein lies the model’s power and flexibility. The pathway approach has, however, the drawback that the model parameters are not easily interpreted in terms of standard epidemiological concepts, as in the case of compartmental models. In this context, it should be noted that some logistic-like growth models can be ‘mapped’ onto respective compartmental models [7, 20, 21], whereby the parameters of the former models can be put into correspondence (albeit in a coupled and nonlinear manner) with the parameters of the latter. A similar comparative study between the pathway approach and compartmental models is therefore an interesting topic for future research.

The pathway approach in its probabilistic version has been applied to a great variety of physical phenomena, for instance in astrophysics and statistical mechanics [13]. More recently, this approach was also used to model epidemic dynamics in the context of the COVID-19 pandemic [14], where the main idea was to predict the day on which the peak of the curves of active cases and daily deaths would be reached in various countries. Here, we shall pursue further the epidemiological application of the PM by extending it to the case of multiple waves of infections, as described next.

3.2 Multiple-wave model

To describe epidemic curves with multiple waves of infections, we shall continue to use the PM, as given by (1), but now we assume that all model parameters are time dependent, which we indicate by writing C(t), \(\alpha (t)\), q(t), \(\beta (t)\), and \(\gamma (t)\). Furthermore, to capture the distinct growth regimes corresponding to the successive waves, we propose that these parameters evolve in time according to the following generalized logistic function

$$\begin{aligned} \zeta (t) = \zeta _1 + \sum _{i=1}^{N-1}\frac{\left( \zeta _{i+1}-\zeta _i\right) }{2}\left[ 1+\tanh \left( \frac{\rho _i(t-t_i)}{2}\right) \right] , \nonumber \\ \end{aligned}$$
(14)

where \(\zeta (t)\) stands for any of the model parameters, that is, \(\zeta (t)\in \{C(t), \alpha (t), q(t), \beta (t), \gamma (t)\}\), and N is the number of waves. The function given in (14) describes a curve with N plateaus, whose values are denoted by the constants \(\zeta _i\), \(i=1,\ldots ,N\), where each plateau represents the corresponding parameter value during the i-th infection wave. The constants \(t_i\), \(i=1,\ldots ,N-1\), determine the transition times between successive waves, whereas the constants \(\rho _i\) characterize how rapidly this transition takes place, so that the larger the value of \(\rho _i\), the faster the transition to the next wave regime. The transition times, \(t_i\), and transition rates, \(\rho _i\), are assumed to be the same for all model parameters. This is justified because an overall change in the epidemic dynamics, as a result, say, of the adoption or relaxation of control measures, is expected to affect simultaneously all epidemiological parameters [17]. Thus, in effective models based on a single ODE, such as the pathway approach or growth models, changes in the epidemic dynamics as a result of both pharmacological and non-pharmacological interventions are reflected in time variations in the model parameters. Conversely, one may mimic the implementation of control measures by introducing a priori changes in the model parameters at some point in time and studying their impact on the future evolution of the epidemic curve [15].
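A minimal sketch of the multistep function (14) is given below. The function name `multistep` and its argument layout are assumptions of the sketch; the example values are the plateau values of \(\alpha \) and the transition parameters quoted in the caption of Fig. 3.

```python
import numpy as np

def multistep(t, plateaus, t_trans, rho):
    """Eq. (14): N plateaus joined by N - 1 smooth tanh (logistic) steps."""
    t = np.asarray(t, dtype=float)
    plateaus = np.asarray(plateaus, dtype=float)
    zeta = np.full_like(t, plateaus[0])
    for i in range(len(plateaus) - 1):
        # Each step rises from 0 to 1 around t_trans[i] at rate rho[i].
        step = 0.5 * (1.0 + np.tanh(0.5 * rho[i] * (t - t_trans[i])))
        zeta = zeta + (plateaus[i + 1] - plateaus[i]) * step
    return zeta

# alpha(t) for the three-wave example, values from the caption of Fig. 3.
alpha_plateaus = [7.19, 7.67, 7.89]
t_trans = [166.0, 375.0]
rho = [0.125, 0.180]

t = np.arange(0, 601)
alpha_t = multistep(t, alpha_plateaus, t_trans, rho)
```

When the transitions are well separated, the function sits at the midpoint between two consecutive plateaus at \(t=t_i\); here `alpha_t[166]` is 7.43 to within rounding.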

In the case of time-dependent parameters, an analytical solution for the PM is no longer possible. Thus, in order to obtain the cumulative curve D(t), one must resort to a numerical integration of (1), with the parameters \(\{C(t), \alpha (t), q(t), \beta (t), \gamma (t)\}\) described by their respective transition functions as given in (14).

Figure 2a shows an example plot of a daily curve obtained from the PM (1) for three waves, i.e., \(N=3\), with the corresponding cumulative curve being shown in Fig. 2b. The time dependencies of the model parameters \(\{C(t),\alpha (t),q(t),\beta (t),\gamma (t)\}\) for the example given in Fig. 2 are shown in Fig. 3, where the specific plateau values entering function (14) for each model parameter are given in the figure caption. In Fig. 3, one clearly sees the three plateaus for each of the model parameters, where each such plateau gives rise to a corresponding wave in the epidemic curves shown in Fig. 2. Also indicated in Fig. 3 are the corresponding transition times \(t_i\), for \(i=1,2\), between the i-th and the \((i+1)\)-th waves. The temporal width, \(\varDelta t^{(i)}\), of each transition region is dictated by the corresponding transition rate \(\rho _i\) (assuming that the transition times \(t_i\) are well separated). An estimate of the width \(\varDelta t^{(i)}\) can be obtained by using a linear approximation for each transition region centered at the midpoint between the two consecutive plateaus, in which case one finds that \(\varDelta t^{(i)}=4/\rho _i\) [7].
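The construction described above can be sketched as follows: every parameter in (1) is replaced by its multistep function (14), the daily curve is evaluated on a time grid, and the cumulative curve is obtained by numerical integration. The helper names, the time grid, and the use of `scipy.integrate.cumulative_trapezoid` are assumptions of this sketch. The plateau values are copied from the caption of Fig. 3, and no claim is made that the sketch reproduces Fig. 2 in detail.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

def multistep(t, plateaus, t_trans, rho):
    """Eq. (14) for a single model parameter."""
    zeta = np.full_like(t, plateaus[0])
    for i in range(len(plateaus) - 1):
        step = 0.5 * (1.0 + np.tanh(0.5 * rho[i] * (t - t_trans[i])))
        zeta = zeta + (plateaus[i + 1] - plateaus[i]) * step
    return zeta

def multiwave_daily(t, plateaus, t_trans, rho):
    """Eq. (1) with all five parameters made time dependent through Eq. (14)."""
    C, alpha, q, beta, gamma = (
        multistep(t, plateaus[name], t_trans, rho)
        for name in ("C", "alpha", "q", "beta", "gamma")
    )
    return C * t**alpha / (1.0 + beta * (q - 1.0) * t**gamma) ** (1.0 / (q - 1.0))

# Plateau values quoted in the caption of Fig. 3 (N = 3).
plateaus = {
    "C": [1.84e-12, 2.24e-15, 1.59e-16],
    "alpha": [7.19, 7.67, 7.89],
    "q": [2.11, 1.23, 1.75],
    "beta": [1.45e-6, 9.70, 6.05],
    "gamma": [2.23, 0.0826, 1.05],
}
t_trans = [166.0, 375.0]
rho = [0.125, 0.180]

t = np.linspace(0.0, 500.0, 5001)
daily = multiwave_daily(t, plateaus, t_trans, rho)

# No closed form exists here, so D(t) is obtained by numerical integration.
cumulative = cumulative_trapezoid(daily, t, initial=0.0)

# Linear estimate of the transition widths, Delta t = 4 / rho_i.
widths = [4.0 / r for r in rho]
```

For the quoted rates, the width estimate gives 32 days for the first transition and about 22 days for the second.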

One sees from (14) that, for a given N, the N-wave model has \(7N-2\) free parameters, corresponding to the N plateaus for each of the five model parameters \(\{C,\alpha ,q,\beta ,\gamma \}\), together with the \(2(N-1)\) parameters \(t_i\) and \(\rho _i\), \(i=1,\ldots ,N-1\), describing the transition regions between successive waves. In applying a model with such a large number of parameters to empirical data, one must be careful to avoid overfitting. Below we describe a fitting procedure that aims at minimizing this risk.

3.3 Data analysis

In all numerical fits reported here, we employed the Levenberg-Marquardt algorithm to solve the nonlinear least-squares optimization problem, as implemented in the SciPy package for Python. In the case of single-wave epidemic curves, the empirical data can be fitted with the PM with constant parameters. Since in this case the model can be integrated exactly for the cumulative number D(t), we prefer to fit the analytic solution given in (3) to the cumulative data, where the level of noise is considerably smaller than that in the daily curve. For this reason, the fits presented in Sect. 4.1 for one-wave curves were performed on the respective cumulative empirical data using the exact solution (3). In such cases, for each given dataset, we need to determine five parameters, namely \(\{C,\alpha ,q,\beta ,\gamma \}\).
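A hedged sketch of such a single-wave fit is shown below. It is not the routine used for the results reported here: the data are synthetic and noiseless (generated from (3) with the Fig. 1 parameters), the solver is called through `scipy.optimize.least_squares` with `method="lm"`, and C and \(\beta \) are fitted through their base-10 logarithms, which is a choice of this sketch to tame the disparity in parameter magnitudes.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.special import hyp2f1

def pathway_cumulative(t, C, alpha, q, beta, gamma):
    """Eq. (3): closed-form cumulative curve for constant parameters."""
    a = (alpha + 1.0) / gamma
    z = -beta * (q - 1.0) * t**gamma
    return C * t ** (alpha + 1.0) / (alpha + 1.0) * hyp2f1(a, 1.0 / (q - 1.0), a + 1.0, z)

def residuals(theta, t, data):
    """Model minus data; theta = (log10 C, alpha, q, log10 beta, gamma)."""
    log10_C, alpha, q, log10_beta, gamma = theta
    model = pathway_cumulative(t, 10.0**log10_C, alpha, q, 10.0**log10_beta, gamma)
    return model - data

# Synthetic, noiseless "observations" (an assumption; real data would be
# the cumulative death counts of a country truncated to one wave).
t = np.arange(1.0, 201.0)
data = pathway_cumulative(t, 1e-3, 3.3, 1.4, 1e-5, 3.0)

# Starting point: C = 1e-3, alpha = 4, q = 1.4, beta = 1e-5, gamma = 3.
theta0 = np.array([-3.0, 4.0, 1.4, -5.0, 3.0])
cost0 = 0.5 * np.sum(residuals(theta0, t, data) ** 2)

fit = least_squares(residuals, theta0, args=(t, data), method="lm")
C_fit, beta_fit = 10.0 ** fit.x[0], 10.0 ** fit.x[3]
```

Note that SciPy's Levenberg-Marquardt option does not accept parameter bounds, so a fit with range restrictions would need a different solver or a reparametrization.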

For epidemic curves with multiple waves, which is our main focus here, the model parameters become time dependent, as discussed in Sect. 3.2, and the model no longer admits an explicit solution for the cumulative count D(t). In such cases, it is more convenient to perform the fits on the empirical data for the daily number of deaths, as the function \(\dot{D}(t)\) is given explicitly by (1) and (14), thus rendering the numerical analysis easier to apply to the daily data (rather than to the cumulative counts). Thus, all numerical fits presented in Sects. 4.2–4.5 for COVID-19 curves with multiple waves were performed on the daily curves.

Owing to the large number of parameters for the multiwave model (we recall that there are \(7N-2\) free parameters for N waves), the parameter estimates returned by the fitting procedure are quite sensitive to their respective initial guesses. In particular, we noticed that the fitting procedure is especially sensitive to the initial guesses for the transition times \(t_i\) between successive waves. For instance, in some cases, one obtains a visually good fit to the data, but with large errors in some of the parameters, possibly indicating overfitting. We have therefore implemented some control measures to reduce overfitting, as discussed below.

First, we have imposed some range restrictions on the parameters \(\{C_i, \alpha _i, q_i, \beta _i, \gamma _i\}\), such that their respective lower bounds are as in (1), with an additional upper bound for \(\{q_i\}\) equal to 3. Second, in the fitting procedure, we require that condition (2) be satisfied for the parameters of the last wave, namely

$$\begin{aligned} {\gamma _N}-(q_N-1)(\alpha _N+1)>0. \end{aligned}$$
(15)

As discussed in Sect. 3.1, this is necessary to ensure that D(t) reaches a finite value as \(t\rightarrow \infty \). Third, we have carefully selected the initial guesses for the fitting parameters, as follows. As mentioned before, the quality of the fits is quite sensitive to the choice of the initial guesses for the transition times \(t_i\) but less so for the other parameters. In view of this fact, we have chosen to fix the initial guesses for the parameters at \(\{C_i, \alpha _i, q_i, \beta _i, \gamma _i\}=\{1\times 10^{-3},4,1.4,1\times 10^{-5},3\}\), for \(i=1,2,\ldots ,N\), as well as the initial guesses for the transition rates at \(\rho _i=0.1\), for \(i=1,2,\ldots ,N-1\), whereas the initial guesses for the transition times \(t_i\) are randomly selected in intervals whose bounds are chosen a priori by visual inspection of the empirical curve. This way of selecting the initial guesses has proven quite satisfactory and in general yields very good fits. As already mentioned, because of the large discrepancy in orders of magnitude of the parameters \(C_i\) and \(\beta _i\) relative to the other parameters, the errors in the fitting parameters are not reliably estimated by our routine. Nonetheless, the excellent agreement between the theoretical curves and the empirical data in all fits presented here indicates that our numerical procedure yields dependable results.
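The three control measures can be sketched for \(N=2\) as follows. This is an illustration under stated assumptions, not the routine used for the reported fits: the data are synthetic (hypothetical two-wave parameters with 5% multiplicative noise), the bounded solver is SciPy's trust-region method (`method="trf"`) with `x_scale="jac"`, the upper bounds on \(\alpha \), \(\gamma \) and \(\rho \) and the window for \(t_1\) are invented for the example, and condition (15) is checked after each fit rather than imposed inside the solver.

```python
import numpy as np
from scipy.optimize import least_squares

N = 2  # number of waves in this sketch

def multistep(t, plateaus, t_trans, rho):
    """Eq. (14) for a single model parameter."""
    zeta = np.full_like(t, plateaus[0])
    for i in range(N - 1):
        step = 0.5 * (1.0 + np.tanh(0.5 * rho[i] * (t - t_trans[i])))
        zeta = zeta + (plateaus[i + 1] - plateaus[i]) * step
    return zeta

def multiwave_daily(theta, t):
    """Eqs. (1) and (14). theta holds the 7N - 2 parameters in the order
    C_1..C_N, alpha_1.., q_1.., beta_1.., gamma_1.., t_1.., rho_1.. ."""
    rows = theta[:5 * N].reshape(5, N)
    t_trans, rho = theta[5 * N:6 * N - 1], theta[6 * N - 1:]
    C, alpha, q, beta, gamma = (multistep(t, row, t_trans, rho) for row in rows)
    return C * t**alpha / (1.0 + beta * (q - 1.0) * t**gamma) ** (1.0 / (q - 1.0))

def satisfies_condition_15(theta):
    """Condition (15) for the plateau values of the last wave."""
    _, alpha_N, q_N, _, gamma_N = theta[:5 * N].reshape(5, N)[:, -1]
    return gamma_N - (q_N - 1.0) * (alpha_N + 1.0) > 0.0

# Synthetic daily data from hypothetical two-wave parameters.
rng = np.random.default_rng(0)
t = np.arange(1.0, 401.0)
theta_true = np.array([1e-3, 1e-5, 3.3, 3.3, 1.4, 1.4, 1e-5, 1e-7, 3.0, 3.0, 180.0, 0.1])
data = multiwave_daily(theta_true, t) * (1.0 + 0.05 * rng.standard_normal(t.size))

# Range restrictions: positivity and 1 < q <= 3 follow the text; the
# remaining upper bounds and the t_1 window are assumptions of the sketch.
t_window = (120.0, 240.0)
lower = np.array([0.0] * (2 * N) + [1.0 + 1e-6] * N + [0.0] * (2 * N)
                 + [t_window[0]] * (N - 1) + [1e-3] * (N - 1))
upper = np.array([np.inf] * N + [20.0] * N + [3.0] * N + [np.inf] * N + [20.0] * N
                 + [t_window[1]] * (N - 1) + [1.0] * (N - 1))

def initial_guess():
    """Fixed plateau and rate guesses; random transition time in the window."""
    plateaus = np.repeat([1e-3, 4.0, 1.4, 1e-5, 3.0], N)
    t_start = np.sort(rng.uniform(*t_window, size=N - 1))
    return np.concatenate([plateaus, t_start, np.full(N - 1, 0.1)])

def residuals(theta):
    return multiwave_daily(theta, t) - data

cost0 = 0.5 * np.sum(residuals(initial_guess()) ** 2)

# Multi-start: repeat the fit from several random t_1 and keep the best one.
fits = [
    least_squares(residuals, initial_guess(), bounds=(lower, upper),
                  method="trf", x_scale="jac", max_nfev=300)
    for _ in range(4)
]
admissible = [f for f in fits if satisfies_condition_15(f.x)]
best = min(admissible or fits, key=lambda f: f.cost)  # falls back to all fits
```

Since the plateau guesses are identical for both waves, the starting curve does not depend on the random \(t_1\); the transition time only begins to matter once the solver has moved the plateaus apart.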

Fig. 4

Left panels: cumulative number of deaths (green circles) attributed to COVID-19 for a Brazil, up to October 22, 2020, c Italy, up to August 10, 2020, and e USA, up to June 27, 2020. The solid curves are the best fits by the standard pathway model with one wave. Right panels: Daily number of deaths for the same countries as in the corresponding left panels, where the empirical data are indicated by green circles and the solid curve represents the time derivative of the respective theoretical curve in the left panels. The peaks of the daily curves are indicated by the black dots

4 Results

As of this writing, most countries around the world have exhibited at least three waves of COVID-19 [22], with just a few countries having only two major waves, and hardly any with merely a single wave. As our main aim in this paper is to illustrate the application of the PM to epidemic curves with multiple waves, we have chosen a representative sample of COVID-19 fatality curves from countries that exhibit up to five waves. More specifically, we have selected a total of 10 countries, as follows: one country (Slovakia) with two waves; two countries (Brazil and Bulgaria) with three waves; three countries (Croatia, Netherlands, and South Africa) with four waves; and four countries (Austria, Canada, Italy, and USA) with five waves. For all selected countries above, we have analyzed data up to March 3, 2022.

For completeness, we have also included examples of applications of the PM to single-wave epidemic curves. Since to this day practically every country has experienced at least a second wave of COVID-19 infections, in order to obtain single-wave curves we need to truncate the empirical data at a suitable date before the second wave had started. To exemplify such cases, we have selected three countries and respective maximum dates, as follows: Brazil, up to October 22, 2020; Italy, up to August 10, 2020; and the USA, up to June 27, 2020. Below, we start by showing model fits for one-wave curves, after which we present results for multiple waves.

4.1 Examples with one wave

In the left panels of Fig. 4, we show as green circles the empirical data for the cumulative number of COVID-19 deaths in Brazil (up to October 22, 2020), Italy (up to August 10, 2020), and the USA (up to June 27, 2020); while the black solid lines correspond to the fitted curves obtained from the exact solution (3) for the PM. The right panels in Fig. 4 show the empirical daily counts of deaths in green circles, while the solid curves represent the theoretical daily curves given by the right-hand side of (1). The black dots on the theoretical daily curves indicate the peaks of the first wave. In all fits shown, one sees a remarkable agreement between the theoretical model and the empirical data. The fit parameters for the plots shown in Fig. 4 are given in Table 2. In the remainder of this section, we present examples of recent COVID-19 epidemic curves for several countries where multiple waves of infections developed.

4.2 Example with two waves

We recall that for curves with multiple waves, the PM no longer admits an exact solution for the cumulative curve D(t). In such cases, it is more convenient to fit the daily data with the theoretical daily curve \(\dot{D}(t)\), as it is given explicitly by the right-hand side of (1), with the model parameters as in (14). In Fig. 5a, we show the empirical data (green circles) of the daily number of COVID-19 deaths for Slovakia, which has thus far developed only two main waves of COVID-19. The black solid line in this figure corresponds to the fitted curve obtained from the multiwave PM with \(N=2\), where the fitted parameters are given in the figure caption. In Fig. 5b, we show the corresponding cumulative curves, where the empirical data are indicated by green circles and the solid black curve was obtained by numerical integration of Eqs. (1) and (14), using the parameter values obtained from the fit in the left panel.

Fig. 5

a Daily number of deaths (green circles) attributed to COVID-19 for Slovakia up to March 3, 2022. The solid curve is the best fit by the pathway model with two waves (\(N=2\)), yielding the following parameters: \(C_1=4.39 \times 10^{-13}, \alpha _1=4.59, q_1=2.05, \beta _1=1.14 \times 10^{-4}, \gamma _1=3.95, C_2=1.97 \times 10^{-3}, \alpha _2=4.09, q_2=1.00, \beta _2=6.60 \times 10^{-6}, \gamma _2=2.25, \rho _1=0.0214\), and \(t_1=416\). The black dots indicate the maxima and minima of the daily theoretical curve. b Cumulative number of deaths, where the empirical data are indicated by green circles and the solid curve is obtained by numerically integrating the theoretical curve in the left panel

Table 1 Peak and starting dates of the main epidemic waves for all countries studied, as obtained from the maxima and minima of the respective theoretical daily curves shown in Figs. 5, 6, 7, 8

It is interesting to note that the first main wave in Slovakia came to an almost complete stop before a resurgence of the disease. This is indicated by the near-zero ‘trough’ separating the two main waves in Fig. 5a, which corresponds to a near-horizontal, extended intermediate plateau in the cumulative curve in Fig. 5b. Capturing such a wide flat trough with an overall smooth curve is not an easy task; nonetheless, the PM does a remarkable job in fitting Slovakia’s daily curve, as seen in Fig. 5a. The agreement between theory and data in the cumulative curves, see Fig. 5b, is also very good.

Fig. 6

Left panels: Daily number of deaths (green circles) attributed to COVID-19 for a Brazil and c Bulgaria, up to March 3, 2022. The solid curves are the best fits by the pathway model with three waves (\(N=3\)). The black dots indicate the maxima and minima of the daily theoretical curve that correspond to the peak and starting dates of the main waves. Right panels: Cumulative number of deaths for the same countries as in the corresponding left panels, where the empirical data are indicated by green circles and the solid curves are obtained by numerically integrating the theoretical curves in the left panels

Fig. 7

Left panels: Daily number of deaths (green circles) attributed to COVID-19 for a Croatia, c the Netherlands, and e South Africa, up to March 3, 2022. The solid curves are the best fits by the pathway model with four waves (\(N=4\)), while the black dots indicate the maxima and minima corresponding to the peak and starting dates of the main waves. Right panels: Cumulative number of deaths for the same countries as in the corresponding left panels, where the empirical data are indicated by green circles and the solid curves are obtained by numerically integrating the theoretical curves in the left panels

Fig. 8

Left panels: Daily number of deaths (green circles) attributed to COVID-19 for a Austria, c Italy, e USA, and g Portugal, up to March 3, 2022. The solid curves are the best fits by the pathway model with five waves (\(N=5\)). The black dots indicate the maxima and minima that correspond to the peak and starting dates of the main waves. Right panels: Cumulative number of deaths for the same countries as in the corresponding left panels, where the empirical data are indicated by green circles and the solid curve represents the curves obtained by numerically integrating the theoretical curves in the left panels

From the best-fit model, one can obtain relevant information about the epidemic evolution, such as the peak and starting dates of each wave, as represented by the maxima and minima of the theoretical daily curve, which are indicated by black dots in Fig. 5a. The calendar dates for such important characteristic points of the COVID-19 epidemic in Slovakia are given in Table 1.

4.3 Countries with three waves

In Fig. 6, we show the empirical data (green circles) of the daily number of COVID-19 deaths for Brazil and Bulgaria, two countries that have experienced three major waves of the pandemic. The black solid lines in this figure correspond to the fitted curves obtained from the multiwave PM (1), where the time dependency of the parameters is as given in (14) for \(N=3\). (The parameter values of the theoretical curve for the fits shown in Fig. 6 are given in Table 3.)

One sees from the left panels of Fig. 6 that in spite of large fluctuations in the empirical daily data, the theoretical curves give a very good description of the data evolution. This, in turn, translates into an excellent agreement between theory and data for the cumulative curves, as shown in the right panels of Fig. 6, where the green data are the cumulative death counts and the black solid lines are obtained by numerical integration of the theoretical curves in the corresponding left panels. The starting and peak dates for the successive epidemic waves in each country shown in Fig. 6, corresponding to the black dots indicated in the left panels of the figure, are given in Table 1.

4.4 Countries with four waves

In the left panels of Fig. 7, we show the empirical data (green circles) for the daily number of COVID-19 deaths for Croatia, the Netherlands, and South Africa, superimposed with the fitted curves (black solid lines) obtained using the four-wave version of the model given by (1), with the parameters varying in time according to (14) with \(N=4\). (The parameter values of the theoretical curves are shown in Table 4.) As before, the right panels of Fig. 7 show the empirical (green circles) and theoretical (black solid lines) cumulative curves obtained from the daily curves shown in the corresponding left panels. Once again, one sees a very good agreement between theory and data for both daily and cumulative curves. The peak and starting dates for the successive epidemic waves in each country shown in Fig. 7 are again given in Table 1.

4.5 Countries with five waves

In the left panels of Fig. 8, we show the empirical data (green circles) and theoretical fits (solid black lines) of the multiwave PM with \(N=5\) for the daily number of COVID-19 deaths for Austria, Italy, Portugal, and the USA. (The parameter values of the theoretical curves are shown in Table 5.) The right panels in the figure show the empirical and theoretical cumulative curves for the corresponding daily curves in the left panels. Again, the same conclusions as before can be drawn about the efficiency of our model in fitting the empirical data, even for epidemic curves with complex evolution patterns and a larger number of waves of infections, such as those shown in Fig. 8. The peak and starting dates of the epidemic waves for the countries shown in Fig. 8 are also given in Table 1.

5 Discussion

We have seen above that the PM, as defined in (1) and (14), is a versatile model that is able to capture with a considerable degree of fidelity the complex patterns of COVID-19 epidemic curves with multiple waves of infection. An important advantage of the pathway approach to bear in mind is that it yields an explicit model, meaning that the expression for the daily curve, represented by \(\dot{D}(t)\) in (1), is written explicitly in terms of the time variable, which makes it quite convenient for direct fits to the daily empirical data. Contrast this with the usual growth and compartmental models, where the respective differential equations are given in terms of the accumulated variables, thus requiring a numerical integration of the model before it can be fitted to the data [7]. A minor conceptual downside of the pathway formulation is that it cannot be formally rewritten as a standard growth model, since the dependency of the growth rate \(\dot{D}\) on the cumulative quantity D is not known a priori nor can it be easily obtained (if possible at all), although it could be easily computed numerically. This makes the epidemiological interpretation of the parameters of the PM less direct than, say, those of compartmental and growth models [7, 17]; see Sect. 3.1. Nonetheless, the PM has proven very effective in describing epidemics with complex multiwave behavior, as shown here, and as such it contributes an important tool to the mathematical epidemiology toolbox.

Another important aspect of the PM is that it has a built-in complexity that stems from the multistep logistic function (14) used for the time dependency of the parameters. This allows for a rich behavior that is not easily anticipated from the model’s equation of motion. For example, in addition to the N main peaks (for a given N), the model is also able to capture ‘subwaves’ (i.e., smaller peaks) and ‘shoulders’ (i.e., flatter portions) near a main peak. Examples of these interesting features can be seen, for instance, in Figs. 6a, 6c, 8c, and 8e.

As already mentioned, from the fitted theoretical curve for a given location, one can extract relevant information about the dynamical evolution of the disease in the chosen location. For example, from the fitted theoretical daily curve, it is a simple matter to determine its maxima and minima, where the former points correspond to the waves’ peaks, when the epidemic was at its worst periods; while the latter indicate periods when an epidemic wave has subsided, meaning that some control of the disease spread had by then been attained, after which a resurgence of infections takes place (probably owing to relaxation of control measures), thus characterizing the beginning of a new wave. Obtaining reliable estimates for the starting and peak dates for each successive wave is important for researchers and health authorities [23,24,25], as it allows for a direct comparison of the occurrence of peaks and troughs in an epidemic curve with the timing of containment measures, thus making it possible to study in a more quantitative fashion the efficacy of interventions and the (possibly negative) impact of their relaxation.
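Locating the maxima and minima of a sampled daily curve can be sketched as below. This is an illustration under stated assumptions, not the procedure used to produce Table 1: `toy_daily_curve` is a hypothetical two-wave curve (not the PM), and `find_extrema` is a hypothetical helper that only detects strict interior extrema on a daily grid.

```python
import math


def find_extrema(values):
    """Return (maxima, minima): indices of strict interior local extrema."""
    maxima, minima = [], []
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] > values[i + 1]:
            maxima.append(i)  # candidate wave peak
        elif values[i] < values[i - 1] and values[i] < values[i + 1]:
            minima.append(i)  # candidate start of the next wave
    return maxima, minima


def toy_daily_curve(t):
    """Hypothetical two-wave daily curve (NOT the pathway model)."""
    first = 50.0 * math.exp(-((t - 100.0) / 30.0) ** 2)
    second = 80.0 * math.exp(-((t - 400.0) / 40.0) ** 2)
    return first + second


days = list(range(601))
curve = [toy_daily_curve(t) for t in days]
peaks, troughs = find_extrema(curve)
```

Because the search is applied to the smooth theoretical curve rather than to the noisy daily counts, each extremum found this way corresponds to a genuine turning point of the fit. Day indices can then be converted to calendar dates by adding them to the first date of the dataset.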

It is important to point out that the type of information extracted from mathematical models, such as the PM, cannot be easily obtained (at least not with the same degree of accuracy) from a mere visual inspection of either the raw empirical data or its moving-average smoothed version [11, 26]. Indeed, the large fluctuations in the daily data make usual smoothing procedures less reliable for such purposes. Mathematical models are therefore required to obtain a sound quantitative description of the epidemic dynamics. Another important aspect of the PM that we would like to point out is its applicability to the daily number of both deaths and infection cases. For instance, a combined analysis of cases and deaths within the pathway approach was performed for the first COVID-19 wave in several countries [14]. A similar analysis could in principle be extended to multiple waves; this is numerically more demanding, however, and is left for future studies.

6 Conclusion

In this paper, we have studied the dynamics of multiple waves of COVID-19 infections by means of a generalized pathway model with time-dependent parameters. The pathway approach used here is formulated explicitly in time by writing the growth rate of the relevant epidemic variable (in our case, number of deaths attributed to COVID-19) as a prescribed function of time, more specifically a type-2 beta function [12]. The explicit timewise nature of the model allows it to be fitted directly to daily data without the need for any ODE integration [14].

Here, we have extended the pathway model to the case of epidemic curves with multiple waves by assuming that the original model parameters are time dependent, so as to capture the successive acceleration-deceleration-reacceleration regimes (waves) of the disease. More concretely, in order to describe an epidemic curve with N waves, we assumed that the parameters vary in time according to a multistep logistic function with N plateaus, where each plateau represents the parameter value during the respective wave [7]. We have applied the model to the daily number of COVID-19 deaths for ten selected countries—all exhibiting multiple waves of infections (ranging from two to five). Our results show that the model is very efficient in modeling such complex epidemic data.
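A multistep logistic function with N plateaus can be sketched as follows. The functional form in the docstring is an assumption made for illustration, since Eq. (14) is not reproduced in this part of the text. The function name and argument names are hypothetical. The numerical values in the usage line are those quoted for \(\alpha _1\), \(\alpha _2\), \(\rho _1\), and \(t_1\) in the caption of Fig. 5.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def multistep_logistic(t, plateaus, rates, switch_times):
    """Step a parameter smoothly through N plateau values.

    Assumed form (an illustration, not copied from Eq. (14)):
        p(t) = p_1 + sum_{j=1}^{N-1} (p_{j+1} - p_j) * sigmoid(rho_j * (t - t_j))
    """
    if not (len(rates) == len(switch_times) == len(plateaus) - 1):
        raise ValueError("need N plateaus and N-1 rates and switch times")
    value = plateaus[0]
    for j, (rho, t_j) in enumerate(zip(rates, switch_times)):
        # Each term switches on the jump from plateau j to plateau j+1.
        value += (plateaus[j + 1] - plateaus[j]) * _sigmoid(rho * (t - t_j))
    return value


def alpha(t):
    """Two-plateau example using the alpha, rho, and t_1 values of the Fig. 5 caption."""
    return multistep_logistic(t, [4.59, 4.09], [0.0214], [416.0])
```

In this sketch the parameter stays close to its first plateau well before \(t_1\), passes through the midpoint of the two plateaus at \(t=t_1\), and approaches the second plateau afterwards, with \(\rho _1\) controlling how sharp the transition is.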

From the fitted model, important characteristic points of the epidemic evolution, such as the starting and peak dates for each wave, can be easily obtained. We have argued that this type of information should be helpful in analyzing the effectiveness of both pharmacological and non-pharmacological interventions. For instance, it is expected that the time for an epidemic wave to reach a peak should correlate positively with the delay in adopting control measures and negatively with their strength [15]. Conversely, it is reasonable to expect that the beginning of a reacceleration regime (new wave) is in part due to the relaxation of control measures, to the appearance of a new pathogen variant, or to both. Similarly, the start of vaccination should help to tame an epidemic wave. As the main goal of the present paper was to introduce the pathway approach for multiwave epidemics and show its efficiency in describing COVID-19 data, a more detailed study of the effectiveness of intervention measures along the lines outlined above is left for future work. As a concluding remark, we note that although the pathway approach presented here was focused on the COVID-19 pandemic, the model can be readily applied to any infectious disease, old or new, thus opening new possibilities of applications.