1 Introduction

Mathematical models are indispensable tools for understanding the dynamics of infectious disease spread and for informing public health interventions. The incidence rate, an essential characteristic representing the transmission mechanism, encompasses the frequency of new infections in a susceptible population over a given period of time [1]. Stochastic epidemic models, in particular, are able to capture the randomness and variability inherent in disease transmission processes, providing a more realistic representation of epidemiological phenomena [2]. In the field of epidemiology, the principle of mass action is often applied to infectious disease models, suggesting that infection spreads via a bilinear incidence function, usually represented by \(\beta S I\), where \(\beta \) is the transmission rate, S is the number of susceptible individuals, and I is the number of infected individuals. This bilinear form assumes that each contact between a susceptible and an infected individual has an equal probability of transmitting the disease, which implies homogeneous mixing within the population [3]. However, this assumption has limitations, as it does not account for the heterogeneity in contact patterns that can significantly affect disease transmission dynamics. Real-world interactions are influenced by various factors such as age, social behavior, and community structure, which can lead to different transmission rates among different groups within the population [4, 5]. Moreover, the principle of mass action does not consider the varying intensity and duration of contacts, which can also impact the likelihood of transmission [6, 7]. The use of a general contact function, denoted as g(S), in stochastic epidemic models offers several advantages over the simple bilinear form. It allows for the incorporation of more complex and realistic patterns of contact among individuals, reflecting the heterogeneity and stratification of real-world populations. 
For example, g(S) can be tailored to account for varying susceptibility across different population segments or changes in contact rates due to behavioral interventions. This can lead to a more accurate representation of the transmission process, improving the predictive power of the model and informing more effective intervention strategies. Additionally, a general contact function can be tailored to specific diseases and their modes of transmission, whether it be direct contact, indirect contact, or vector-borne, providing a flexible framework that can be adapted to various epidemiological scenarios [6, 8,9,10,11]. By moving beyond the limitations of the principle of mass action, epidemiologists can better understand the nuances of disease spread and design targeted control measures that take into account the complex nature of human interactions and disease ecology.

In this paper, we present an extension of the traditional SIR model [12,13,14,15], which only considers susceptible, infected, and recovered individuals, by adding a category for unreported cases of infection. The main purpose of the SIUR (Susceptible, Infected-reported, Unreported-infected, and Recovered) model is to gain a better picture of disease transmission by acknowledging that not all infected individuals are reported or identified, which could have a significant effect on the spread and control of an epidemic. The inclusion of an unreported infected compartment (U) allows us to capture the hidden spread of the disease, which is critical for understanding the full scope of an epidemic. Many infectious diseases, including COVID-19, exhibit a substantial number of asymptomatic or mildly symptomatic cases that go undetected but still contribute to transmission. This model is particularly useful for understanding the spread of infectious diseases and the impact of unreported cases on the dynamics of an epidemic. In the absence of unreported cases, several studies have investigated the dynamics of models incorporating different incidence rate functions. For example, the bilinear incidence form has been examined in studies by Lahrouz et al. [16, 17] and Tornatore et al. [18]. The saturated functional response has been explored in works by Lan et al. [19] and Wang et al. [20], focusing on stationary and Turing patterns. Additionally, the frequency-dependent functional response has been studied by Li et al. [21], while other functional response forms such as the Beddington-DeAngelis response, investigated by Ji et al. [22] and Salman et al. [23], and the Crowley-Martin response, examined by Jan et al. [24] in the context of HIV dynamics, have been the focus of scholarly attention. 
These investigations have contributed to a deeper understanding of the dynamics described by stochastic models, shedding light on their behavior under different incidence rate specifications and enriching the body of knowledge pertaining to infectious disease modeling.

Our paper is organized as follows. In Sect. 2, we describe the proposed epidemic model and explain its infectious mechanism. In Sect. 3, we examine the mathematical properties of our stochastic epidemiological model, including the existence and positivity of solutions and the identification of a critical threshold parameter, \({\mathcal {R}}_s\), which governs the long-term dynamics of the system. In Sect. 4, we explore the dynamics of the model through numerical simulations. We begin with synthetic data simulations that illustrate and validate our theoretical findings, showcasing disease extinction and persistence. We then examine the impact of the susceptible contact function g(S) and propose a selection method for choosing this function based on real-world scenarios. We also calibrate our model with real data on the daily incident cases and the hospital admissions of COVID-19 in the UK. Finally, we conclude by discussing potential avenues for future research.

2 The SIUR epidemic model

The SIUR model extends the traditional SIR framework by incorporating unreported cases of infection, which are crucial for understanding the full scope of disease transmission. The infectious mechanism in this model is described through the interaction between susceptible, reported infected, and unreported infected populations. The incidence term \(\beta g(S) \left( I+U\right) /\left( 1+aI^{p}\right) \) reflects the likelihood of susceptible individuals contracting the disease based on the combined presence of reported and unreported infected individuals. This formulation allows the model to capture the effects of underreporting on the overall dynamics of disease spread, highlighting the significant impact of unreported cases on public health strategies. The model consists of the following system of nonlinear stochastic differential equations:

$$\begin{aligned} \left\{ \begin{array}{lll} d S &{}= \left[ \mu _1 -\mu _1 S -\beta g(S) \frac{ \left( I+U\right) }{1+aI^{p}} +\omega R \right] dt \\ &{}\quad -\sigma g(S)\frac{ \left( I+U\right) }{1+aI^{p}} dB(t), \\ d I&{}= \left[ -(\mu _1+ \alpha + \delta )I + \nu \beta g(S)\frac{ \left( I+U\right) }{1+aI^{p}} \right] dt\\ &{}\quad +\sigma \nu g(S) \frac{\left( I+U\right) }{1+aI^{p}} dB(t), \\ d U &{}= \left[ -(\mu _1+ \alpha + \delta )U+ (1-\nu ) \beta g(S)\frac{ \left( I+U\right) }{1+aI^{p}} \right] dt \\ &{}\quad +\sigma (1-\nu ) g(S)\frac{ \left( I+U\right) }{1+aI^{p}} dB(t), \\ d R &{}= \left[ -(\mu _1+\omega )R + \delta (I+ U) \right] dt. \end{array}\right. \nonumber \\ \end{aligned}$$
(1)

The positive constants \(\mu _1, \alpha , \delta \), and \(\omega \) denote the birth and death rate, the disease-induced death rate, the recovery rate of both reported and unreported infected individuals, and the rate of losing immunity, respectively. The expression \(\beta g(S) \left( I+U\right) /\left( 1+aI^{p}\right) \) is the rate at which susceptible individuals contract the infection; it combines the transmission rate (\(\beta \)), the general contact function (g(S)), and the joint impact of reported (I) and unreported (U) infected individuals. The denominator \(1+aI^{p}\) adjusts the infection rate based on reported cases, with a and p being positive constants that modulate the influence of reported infections on transmission. The parameter \(\nu \) indicates the proportion of new infections that are reported. The function g is assumed to be continuously differentiable with \(g(S) \ge 0\) and \(g(0) = 0\). Additionally, B(t) denotes a Brownian motion, and \(\sigma > 0\) represents the intensity of environmental noise affecting the infection coefficient \(\beta \).

The SDE (1) can alternatively be formulated using the approach outlined in [25]. Given any initial value \(z_0:=Z(0) =(s, i, u, r)\) and a sufficiently small time increment \(\Delta t > 0\), we posit that the solution \(Z(t) = (S(t), I(t), U(t), R(t))\) forms a Markov process whose conditional mean and conditional variance are given, respectively, by

$$\begin{aligned}{} & {} {{\mathbb {E}}}[Z(t+\Delta t)-Z(t) \mid Z=Z(0)]\\{} & {} \quad \approx \left[ \begin{array}{c} \mu _1 -\mu _1 S -\beta g(S) \frac{ \left( I+U\right) }{1+aI^{p}} +\omega R \\ -(\mu _1+ \alpha + \delta )I + \nu \beta g(S)\frac{ \left( I+U\right) }{1+aI^{p}} \\ -(\mu _1+ \alpha + \delta )U+ (1-\nu ) \beta g(S)\frac{ \left( I+U\right) }{1+aI^{p}} \\ -(\mu _1+\omega )R + \delta (I+ U) \end{array}\right] \Delta t, \end{aligned}$$

and

$$\begin{aligned}{} & {} {\text {Var}}[Z(t+\Delta t)-Z(t) \mid Z=Z(0)]\\{} & {} \quad \approx \left[ \begin{array}{c} \sigma ^2 g(S)^2 \left( \frac{ \left( I+U\right) }{1+aI^{p}} \right) ^2 \\ \sigma ^2 \nu ^2 g(S)^2 \left( \frac{ \left( I+U\right) }{1+aI^{p}} \right) ^2 \\ \sigma ^2 (1-\nu )^2 g(S)^2 \left( \frac{ \left( I+U\right) }{1+aI^{p}} \right) ^2\\ 0 \end{array}\right] \Delta t. \end{aligned}$$
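The conditional mean and variance above can be evaluated directly at a given state. The following is a minimal sketch, assuming a parameter dictionary `par` and a user-supplied contact function `g`; the function names `incidence_factor` and `conditional_moments` are hypothetical and are not taken from the paper.

```python
def incidence_factor(S, I, U, a, p, g):
    """Shared factor g(S) (I + U) / (1 + a I^p) appearing in system (1)."""
    return g(S) * (I + U) / (1.0 + a * I**p)


def conditional_moments(z, par, g, dt):
    """Approximate conditional mean and variance of Z(t + dt) - Z(t) at state z."""
    S, I, U, R = z
    f = incidence_factor(S, I, U, par["a"], par["p"], g)
    removal = par["mu1"] + par["alpha"] + par["delta"]
    # Drift vector of system (1), one entry per compartment (S, I, U, R).
    drift = [
        par["mu1"] - par["mu1"] * S - par["beta"] * f + par["omega"] * R,
        -removal * I + par["nu"] * par["beta"] * f,
        -removal * U + (1.0 - par["nu"]) * par["beta"] * f,
        -(par["mu1"] + par["omega"]) * R + par["delta"] * (I + U),
    ]
    # Diffusion coefficients multiplying dB(t); the R equation carries no noise.
    diffusion = [
        -par["sigma"] * f,
        par["sigma"] * par["nu"] * f,
        par["sigma"] * (1.0 - par["nu"]) * f,
        0.0,
    ]
    mean = [b * dt for b in drift]
    var = [s * s * dt for s in diffusion]
    return mean, var
```

For instance, calling `conditional_moments((0.8, 0.1, 0.05, 0.05), par, lambda s: s, 0.01)` with the parameter values of Sect. 4.1.1 returns two lists of length four, the last variance entry being zero because R is not directly perturbed.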

3 Qualitative properties of the model

Throughout this section, we work on a filtered probability space \({\mathbb {S}} = (\Gamma , {\mathcal {F}}, \{{\mathcal {F}}_{t}\}_{t\ge 0}, {\mathbb {P}})\) whose filtration \(\{{\mathcal {F}}_{t}\}_{t\ge 0}\) satisfies the usual conditions: it is increasing and right-continuous, and \({\mathcal {F}}_0\) contains all \({\mathbb {P}}\)-null sets. We write \({\mathbb {R}}_{+}^4:=[0, \infty )^4 \) for the non-negative orthant. To analyze system (1), we first define the set \(\Omega \) as follows:

$$\begin{aligned} \Omega =\left\{ X \in {\mathbb {R}}^{4}_{+};\,\, \dfrac{\mu _1}{\mu _1+ \alpha }< X_1+X_2+X_3+ X_4 < 1\right\} . \end{aligned}$$

Next, we present the following theorem.

Theorem 1

The subsequent results are established.

i):

For any \(z_0 \in \Omega \), there exists a unique global solution to the SDE (1), such that

$$\begin{aligned} {\mathbb {P}} \left\{ Z(t)= \left( S(t), I(t), U(t), R(t)\right) \in \Omega \quad \forall t \ge 0 \right\} =1. \end{aligned}$$
ii):

For any \(\theta > 0\), there exist two positive constants \(C_1\) and \(C_2\) such that the solution Z(t) of the system (1) satisfies:

$$\begin{aligned}{} & {} {\mathbb {E}} \left[ \left( S(t)+ I(t) +U(t) +R(t) \right) ^{\theta +1 } \right] \nonumber \\{} & {} \quad \le \left( \left( s+i+u+r \right) ^{\theta +1 } - \frac{C_2}{C_1} \right) e^{-C_1 t} + \frac{C_2}{C_1}.\nonumber \\ \end{aligned}$$
(2)

Proof

i) The set \(\Omega \) is almost surely positively invariant under the dynamics of system (1). This assertion follows from standard arguments, and a detailed proof can be found in [14, 26].

For ii), we introduce the Lyapunov function

$$\begin{aligned} \psi _1 (S,I,U,R)=\left( S+ I +U+R \right) ^{\theta +1 }, \end{aligned}$$

where the parameter \( \theta > 0\) will be determined subsequently. Calculating the differential operator associated with system (1), we obtain

$$\begin{aligned} {\mathcal {L}} \psi _1= & {} (\theta +1) \left( S+ I+U+R \right) ^{\theta } \\{} & {} \quad \left( \mu _1 - \mu _1 \left( S+ I+U+R \right) - \alpha \left( I +U \right) \right) \\{} & {} +\frac{\theta (\theta +1)}{2} \left( S+ I +U+R \right) ^{\theta -1 } \sigma ^2 g(S)^2 \\{} & {} \left( \frac{ \left( I+U\right) }{1+aI^{p}} \right) ^2 \\\le & {} \mu _1 (\theta +1) \left( S+ I +U+R \right) ^\theta + (\theta +1) \\{} & {} \quad \left( S{+} I {+}U{+}R \right) ^{\theta -1} \Big ( {-} \mu _1 \left( S{+} I {+}U{+}R \right) ^2 \\{} & {} + \frac{\theta }{2} \sigma ^2 g(S)^2 \left( \frac{ \left( I+U\right) }{1+aI^{p}} \right) ^2 \Big ). \end{aligned}$$

Defining \(M= \underset{(S,I,U,R)}{\sup } \frac{ g(S)^2 \left( I+U\right) ^2 }{\left( 1+aI^{p}\right) ^2 \left( S+ I +U+ R\right) ^2}\), we get:

$$\begin{aligned} {\mathcal {L}} \psi _1\le & {} \mu _1 (\theta +1) \left( S+ I +U+R \right) ^\theta + (\theta +1) \nonumber \\{} & {} \quad \left( - \mu _1 + \frac{\theta }{2} \sigma ^2 M \right) \psi _1. \end{aligned}$$
(3)

Now, choosing \( \theta < \frac{2 \mu _1}{\sigma ^2 M}\), we let \( C_1:= \mu _1 - \frac{\theta }{2} \sigma ^2\,M \) and

$$\begin{aligned}{} & {} C_2:= \underset{(S,I,U,R)}{\sup } \left\{ \mu _1 (\theta +1) \left( S+ I +U+R \right) ^\theta \right. \\{} & {} \quad \left. - \theta C_1 \left( S+ I +U+R \right) ^{\theta +1}\right\} . \end{aligned}$$

We deduce from (3) that

$$\begin{aligned} {\mathcal {L}} \psi _1(S,I,U,R) + C_1 \psi _1(S,I,U,R)\le & {} C_2 < +\infty .\nonumber \\ \end{aligned}$$
(4)
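For completeness, inequality (4) can be obtained from (3) in one step. Writing \(N=S+I+U+R\) (a shorthand used only here) and noting that \(-\mu _1 + \frac{\theta }{2} \sigma ^2 M = -C_1\), we have

$$\begin{aligned} {\mathcal {L}} \psi _1 + C_1 \psi _1&\le \mu _1 (\theta +1) N^{\theta } - (\theta +1) C_1 N^{\theta +1} + C_1 N^{\theta +1} \\&= \mu _1 (\theta +1) N^{\theta } - \theta C_1 N^{\theta +1} \le C_2, \end{aligned}$$

and \(C_2\) is finite because \(C_1 > 0\) makes the term of highest degree in N negative.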

Define the stopping time \(\tau _{\varepsilon } = \inf \left\{ t \ge 0, S(t)\right. \left. + I(t) +U(t) +R(t) \ge \varepsilon \right\} \). Using (4) and the Itô formula, we have

$$\begin{aligned} \begin{aligned} {\mathbb {E}}&{\left[ e^{C_1\left( t \wedge \tau _\varepsilon \right) } \psi _1 \left( S(t \wedge \tau _\varepsilon ), I(t \wedge \tau _\varepsilon ), U(t \wedge \tau _\varepsilon ), R(t \wedge \tau _\varepsilon ) \right) \right] } \\&= \psi _1(s,i,u,r)+{\mathbb {E}}\left[ \int _0^{t \wedge \tau _\varepsilon } e^{C_1 v}\left[ {\mathcal {L}} \psi _1\left( S(v),I(v), \right. \right. \right. \\&\quad \left. \left. \left. U(v), R(v) \right) +C_1 \psi _1\left( S(v), I(v), U(v), R(v) \right) \right] d v \right] \\&\le \psi _1(s,i,u,r) +\frac{C_2}{C_1}\left( e^{C_1 t \wedge \tau _\varepsilon }-1\right) . \end{aligned} \end{aligned}$$

Letting \(\varepsilon \rightarrow + \infty \) and applying the Fatou Lemma we get

$$\begin{aligned}{} & {} {\mathbb {E}} \left[ \left( S(t)+ I(t) +U(t) +R(t) \right) ^{\theta +1 } \right] \\{} & {} \quad \le \left( \left( s+i+u+r \right) ^{\theta +1 } - \frac{C_2}{C_1} \right) e^{-C_1 t} + \frac{C_2}{C_1}. \end{aligned}$$

The proof is now complete. \(\square \)

Next, we aim to analyze the dynamics of the stochastic system (1). To proceed, we define the following threshold:

$$\begin{aligned} {\mathcal {R}}_{s} = \frac{\beta M_s}{ \mu _1 + \alpha + \delta + \frac{1}{2}\sigma ^2\frac{ M_s^2}{(1+a)^2}}, \end{aligned}$$

where \(M_s= \underset{S \in (0,1)}{\sup } g(S)\).
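The threshold can be evaluated numerically for a given contact function. The sketch below is illustrative only: the function name `stochastic_threshold` and the grid-based approximation of \(M_s\) are assumptions made here, not part of the paper. Since g is continuous, the supremum over (0, 1) equals the maximum over [0, 1], which is approximated on a uniform grid.

```python
def stochastic_threshold(beta, mu1, alpha, delta, sigma, a, g, grid_size=1000):
    """Return (R_s, M_s) for a continuous contact function g on [0, 1]."""
    # Approximate M_s = sup_{S in (0,1)} g(S) by the maximum of g on a grid
    # that includes both endpoints (valid because g is continuous).
    M_s = max(g(j / grid_size) for j in range(grid_size + 1))
    # Denominator of R_s: removal rates plus the noise correction term.
    denom = mu1 + alpha + delta + 0.5 * sigma**2 * M_s**2 / (1.0 + a) ** 2
    return beta * M_s / denom, M_s
```

A grid is used so that the same helper works for contact functions whose maximum is not attained at S = 1; for monotone choices such as \(g(S)=S^p\), the value \(M_s = g(1)\) could be used directly.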

First, we show that if the stochastic threshold \( {\mathcal {R}}_{s}\) is greater than 1, then for any initial value \(z_0 \in \Omega \), the probability distribution of the solution Z(t) converges exponentially to an invariant distribution \(\pi \) on \(\Omega \). In other words, the levels of susceptible, infected, unreported, and recovered individuals eventually reach a stable positive state. For this purpose, let us define \( \parallel \cdot , \cdot \parallel _{TV} \) to be the total variation norm on the space \(({\mathbb {R}}_+^n, {\mathcal {B}}({\mathbb {R}}_+^n)) \) as:

$$\begin{aligned} \parallel \phi , \varphi \parallel _{TV} = \underset{A \in {\mathcal {B}}({\mathbb {R}}_+^n)}{\sup } |\phi (A) - \varphi (A) |, \end{aligned}$$

where \({\mathcal {B}}({\mathbb {R}}_+^n)\) denotes the Borel measurable subsets of \({\mathbb {R}}_+^n\).

Theorem 2

For every initial value \(z_0 \in \Omega \), if \({\mathcal {R}}_{s}> 1\) and \(\beta > \frac{1}{2} \sigma ^2 M_s\), then there exists an invariant probability measure \(\pi \) on \(\Omega \) and \(\eta > 0\) such that

$$\begin{aligned} \lim _{t \rightarrow \infty } e^{\eta t}\left\| P(t, z_0, \cdot )-\pi (\cdot )\right\| _{T V}=0, \end{aligned}$$

where \(P(t, z_0, \cdot )\) is the transition probability of Z(t) starting from \(z_0 \).

Proof

Consider the Lyapunov function

$$\begin{aligned} \psi _2(S,I,U,R)= \left( I+U\right) ^{k_0}. \end{aligned}$$

By the Itô formula, we get

$$\begin{aligned} d \psi _2= & {} k_0 \left( I+U\right) ^{k_0} \left( -(\mu _1 +\alpha + \delta ) + \beta \frac{g(S)}{1+ a I^p} \right. \nonumber \\{} & {} \quad \left. + \frac{1}{2} (k_0 -1) \sigma ^2 \frac{g^2(S)}{(1+aI^p)^2} \right) dt \nonumber \\{} & {} + k_0 \left( I+U\right) ^{k_0} \sigma \frac{g(S)}{1+ a I^p} dB_t \nonumber \\\le & {} k_0 \left( I+U\right) ^{k_0} \left( -(\mu _1 + \alpha + \delta ) + \beta M_s \right. \nonumber \\{} & {} \quad \left. + \frac{1}{2} |k_0 -1|\sigma ^2 M_s^2 \right) dt + k_0 \left( I+U\right) ^{k_0} \sigma M_s dB_t \nonumber \\:= & {} C_3 \psi _2 dt + k_0 \sigma M_s \psi _2 dB_t, \end{aligned}$$
(5)

where \(C_3 = k_0 \left( -(\mu _1 + \alpha + \delta ) + \beta M_s + \frac{1}{2} |k_0 -1|\right. \left. \sigma ^2 M_s^2 \right) \) and in the last inequality we use the fact that \( (1+aI^p) \ge 1\).

Integrating Eq. (5), taking the expectation on both sides, and applying the Gronwall inequality, we derive the following result:

$$\begin{aligned} {\mathbb {E}}\left( \psi _2(S,I,U,R)\right)\le & {} \psi _2\left( s,i,u,r\right) e^{C_3 t}. \end{aligned}$$
(6)

On the other hand and through the Itô formula, we have

$$\begin{aligned} ln \left( I(t)+U(t)\right)= & {} ln \left( i+u\right) - G(t), \end{aligned}$$
(7)

where

$$\begin{aligned} G(t)= & {} - \int _{0}^t {\mathcal {L}} \left( ln(I(v)+U(v)) \right) dv \\{} & {} \quad - \int _{0}^t \sigma \frac{g(S)}{1+ a I^p} dB_v, \end{aligned}$$

and \({\mathcal {L}} \left( ln(I+U) \right) \) is given by

$$\begin{aligned}{} & {} {\mathcal {L}} \left( ln(I+U) \right) = -(\mu _1 +\alpha + \delta ) + \beta \frac{g(S)}{1+ a I^p} \nonumber \\{} & {} \quad -\frac{1}{2} \sigma ^2 \frac{g^2(S)}{(1+aI^p)^2} \nonumber \\= & {} \beta M_s -(\mu _1 +\alpha + \delta ) - \frac{1}{2} \frac{\sigma ^2 M_s^2}{(1+a)^2} +\beta \frac{g(S)}{1+ a I^p} \nonumber \\{} & {} \quad -\frac{1}{2} \sigma ^2 \frac{g^2(S)}{(1+aI^p)^2} -\beta M_s + \frac{1}{2} \frac{\sigma ^2 M_s^2}{(1+a)^2} \nonumber \\= & {} \beta M_s \left( 1- \frac{1}{{\mathcal {R}}_s} \right) +\frac{g(S)}{1+ a I^p} \left( \beta - \frac{1}{2} \sigma ^2 \frac{g(S)}{1+ a I^p} \right) \nonumber \\{} & {} + \frac{1}{2} \frac{\sigma ^2 }{(1+a)^2} \left( M_s - \frac{(1+a)^2 \beta }{\sigma ^2} \right) ^2 + \frac{1}{2} \frac{(1+a)^2 \beta ^2 }{\sigma ^2}. \end{aligned}$$

Since \(1+a I^p \ge 1, g(S) \le M_s\), and \(\beta > \frac{1}{2} \sigma ^2 M_s\), we conclude that

$$\begin{aligned} {\mathcal {L}} \left( ln(I+U) \right)\ge & {} \beta M_s \left( 1- \frac{1}{{\mathcal {R}}_s} \right) +\frac{g(S)}{1+ a I^p}\\{} & {} \left( \beta - \frac{1}{2} \sigma ^2 M_s \right) \nonumber \\ {}\ge & {} \beta M_s \left( 1- \frac{1}{{\mathcal {R}}_s} \right) := C_4 > 0. \end{aligned}$$

Hence, there exists a \(t_0 >0\) such that for any \(T>t_0\) we have

$$\begin{aligned} {\mathbb {E}}\left[ G(T)\right]\le & {} - C_4 T. \end{aligned}$$
(8)

From Eqs. (6) and (7), we derive

$$\begin{aligned}{} & {} {\mathbb {E}}\left[ e^{2 G(T)} \right] + {\mathbb {E}}\left[ e^{-2 G(T)}\right] {=} \left( i+u\right) ^2 {\mathbb {E}} \left[ \left( I+U \right) ^{-2} \right] \nonumber \\{} & {} \qquad + \left( i+u\right) ^{-2} {\mathbb {E}} \left[ \left( I+U \right) ^{2} \right] \nonumber \\{} & {} \quad \le 2 e^{C_3 T}. \end{aligned}$$

By [27, Lemma 3.4], the log-Laplace transform \(ln {\mathbb {E}} \left[ e^{\theta G(T)} \right] \) satisfies the following inequality

$$\begin{aligned} ln {\mathbb {E}} \left[ e^{\theta G(T)} \right]\le & {} \theta {\mathbb {E}} \left[ G(T) \right] + \theta ^2 C_5, \quad \theta \in [0,1), \end{aligned}$$
(9)

for some \(C_5 < \infty \).

By considering \(\theta \) to be sufficiently small, it can be inferred from Eqs. (8) and (9) that

$$\begin{aligned} {\mathbb {E}}\left[ e^{\theta G(T)} \right]\le & {} e^{-\theta C_4 T}, \end{aligned}$$

and

$$\begin{aligned}{} & {} {\mathbb {E}}\left[ (I(T) + U(T) )^{-\theta } \right] \nonumber \\{} & {} \quad \le (i+u)^{-\theta } e^{-\theta C_4 T}, \quad \theta \in [0,1). \end{aligned}$$
(10)

Now, consider the Lyapunov \(C^2\)-function

$$\begin{aligned} \psi _3(S,I,U,R)= & {} \psi _1(S,I,U,R) + \psi _2(S,I,U,R),\\{} & {} \quad \theta \in [0,1). \end{aligned}$$

Then, using (2) and Eq.(10), we can deduce that

$$\begin{aligned}{} & {} {\mathbb {E}}\left( \psi _3\left( S(T),I(T),U(T),R(T)\right) \right) \\{} & {} \qquad = {\mathbb {E}}\left( \psi _1\left( S(T),I(T),U(T),R(T)\right) \right) \\{} & {} \qquad +{\mathbb {E}}\left( \psi _2\left( S(T),I(T),U(T),R(T)\right) \right) \\{} & {} \quad \le \left( \psi _1(s,i,u,r) - \frac{C_2}{C_1} \right) e^{-C_1 T}\\{} & {} \qquad + \frac{C_2}{C_1} + \psi _2(s,i,u,r) e^{-\theta C_4 T} \\{} & {} \quad \le \xi _T \psi _3(s,i,u,r) + \frac{C_2}{C_1} \left( 1- e^{-C_1 T}\right) , \end{aligned}$$

where \( \xi _T = \max \{ e^{-C_1 T}, e^{-\theta C_4 T} \}\) and \(\lim _{T\rightarrow \infty } \xi _T = 0\).

Given that the Markov process (S(t), I(t), U(t), R(t)) is irreducible, and the transition probability function \(P(t, z_0, \cdot )\) possesses a smooth density, it can be deduced from [28, 29] that there exist positive constants \(\eta > 0\) and \(C > 0 \) such that

$$\begin{aligned} \left\| P(t, z_0, \cdot )-\pi (\cdot )\right\| _{T V}\le & {} C \psi _3(s,i,u,r) e^{-\eta t}. \end{aligned}$$

Moreover, letting \(t\rightarrow \infty \), we get

$$\begin{aligned} \lim _{t\rightarrow \infty } \left\| P(t, z_0, \cdot )-\pi (\cdot )\right\| _{T V} = 0. \end{aligned}$$

Therefore, the proof is completed. \(\square \)

Subsequently, our objective is to establish the stochastic asymptotic stability of the disease-free equilibrium \(E_0=(1,0,0,0)\) of system (1) when the threshold \({\mathcal {R}}_s\) is less than 1. Additionally, we aim to demonstrate that under this condition, the population sizes of infected, unreported, and recovered individuals exponentially diminish to zero, while the population of susceptible individuals eventually stabilizes at a positive level.

Theorem 3

Consider the stochastic system (1) with initial condition in \(\Omega \). If \({\mathcal {R}}_{s}< 1\) and \(M_s \sigma ^2 < \beta (1+a)^2\), then the following results hold:

i):

The disease-free equilibrium \(E_0 \) of (1) is stochastically asymptotically stable.

ii):

I(t), U(t) and R(t) tend to zero exponentially with probability 1, i.e.,

$$\begin{aligned} \lim _{t \rightarrow \infty } I(t) = \lim _{t \rightarrow \infty } U(t)=\lim _{t \rightarrow \infty } R(t)=0, \quad a.s., \end{aligned}$$

and

$$\begin{aligned} \lim _{t \rightarrow \infty } S(t) =1, \quad a.s. \end{aligned}$$

Proof

i) Let \(z_0 \in \Omega \). Introduce the positive-definite function

$$\begin{aligned} \psi _3(S,I,U,R)= & {} \frac{1}{2}\theta _1 \left( 1-S\right) ^2 + \frac{1}{k} \\{} & {} \left( I+U\right) ^k + \frac{1}{2} \theta _2 R^2, \end{aligned}$$

where \(\theta _1, \theta _2\) and k are real positive constants to be chosen carefully later on. We have

$$\begin{aligned}{} & {} {\mathcal {L}}\psi _3 = - \mu _1\theta _1 \left( 1-S\right) ^2 + \theta _1 \beta \frac{g(S)}{1+ a I^p}\\{} & {} \quad \qquad \left( I+U\right) \left( 1-S\right) - \omega \theta _1 R \left( 1-S\right) \\{} & {} \quad \qquad + \frac{1}{2} \theta _1 \sigma ^2 \frac{ g^2(S)}{\left( 1+ a I^p\right) ^2}\left( I+U\right) ^2\\{} & {} \quad \qquad -(\mu _1 +\alpha + \delta ) \left( I+U\right) ^k +\beta \frac{g(S) }{1+ a I^p} \left( I+U\right) ^k \\{} & {} \quad \quad +\frac{1}{2} (k-1) \sigma ^2 \frac{g^2(S)}{(1+aI^p)^2} \left( I+U\right) ^k \\{} & {} \quad \qquad - \theta _2 \left( \mu _1 + \omega \right) R^2 + \theta _2 \delta \left( I+U\right) R. \end{aligned}$$

Using \(g(S) \le M_s, 1\le 1+a I^p \le 1+ a\), and \(I+U \le 1-S\), we estimate

$$\begin{aligned} {\mathcal {L}}\psi _3\le & {} - \mu _1\theta _1 \left( 1-S\right) ^2 + \theta _1 \beta M_s \left( I+U\right) \left( 1-S\right) \\{} & {} - \omega \theta _1 R \left( I+U\right) + \frac{1}{2} \theta _1 \sigma ^2 M_s^2 \left( I+U\right) ^2 \\{} & {} -(\mu _1 +\alpha + \delta ) \left( I+U\right) ^k +\beta g(S) \left( I+U\right) ^k \\{} & {} - \frac{1}{2} \sigma ^2 \frac{g^2(S)}{(1+a)^2} \left( I+U\right) ^k + \frac{k}{2} \sigma ^2 M_s^2 \left( I+U\right) ^k \\ {}{} & {} - \theta _2 \left( \mu _1 + \omega \right) R^2 + \theta _2 \delta \left( I+U\right) R. \end{aligned}$$

Selecting \(k \in (0,2)\), and employing that

$$\begin{aligned} \left( I+U\right) \left( 1-S\right)\le & {} \frac{\varepsilon }{2}\left( 1-S\right) ^2 \\{} & {} + \frac{1}{2 \varepsilon } \left( I+U\right) ^k, \quad \text {for all } \varepsilon > 0, \end{aligned}$$

we get

$$\begin{aligned}{} & {} {\mathcal {L}}\psi _3 \le - \theta _1 \left( \mu _1 - \frac{\varepsilon }{2}\beta M_s\right) \left( 1-S\right) ^2 \\{} & {} \qquad + \bigg ( -(\mu _1 +\alpha + \delta ) + \beta g(S) \\{} & {} \qquad - \frac{1}{2} \sigma ^2 \frac{g^2(S)}{(1+a)^2}+ \frac{k}{2} \sigma ^2 M_s^2 \\{} & {} \qquad + \frac{1}{2}\theta _1 \left( \frac{\beta M_s}{2 \varepsilon } + \sigma ^2 M_s^2 \right) \bigg ) \left( I+U\right) ^k \\{} & {} \qquad - \theta _2 \left( \mu _1 + \omega \right) R^2+ \left( \theta _2 \delta - \theta _1 \omega \right) \left( I+U\right) R \\{} & {} \quad := - \theta _1 \left( \mu _1 - \frac{\varepsilon }{2}\beta M_s\right) \left( 1-S\right) ^2 \\{} & {} \qquad + \bigg (\varphi (g(S)) + \frac{k}{2} \sigma ^2 M_s^2 \\{} & {} \qquad + \frac{1}{2}\theta _1 \left( \frac{\beta M_s}{2 \varepsilon } + \sigma ^2 M_s^2 \right) \bigg ) \left( I+U\right) ^k \\{} & {} \qquad - \theta _2 \left( \mu _1 + \omega \right) R^2+ \left( \theta _2 \delta - \theta _1 \omega \right) \left( I+U\right) R, \end{aligned}$$

where \(\varphi (X) = - \frac{1}{2} \frac{\sigma ^2 }{(1+a)^2}X^2 + \beta X-(\mu _1 +\alpha + \delta )\). From the assumption \(M_s \sigma ^2 <\beta (1+a)^2\), it is straightforward to verify that \( \varphi (X)\) is increasing on \((0,M_s)\). Thus,

$$\begin{aligned} \varphi (g(S)) \le \varphi (M_s)=\beta M_s \left( 1- \frac{1}{{\mathcal {R}}_s}\right) < 0. \end{aligned}$$
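Two elementary steps used above can be made explicit. First, the bound on \(\left( I+U\right) \left( 1-S\right) \) is Young's inequality combined with \(0 \le I+U \le 1\) on \(\Omega \) and \(k < 2\):

$$\begin{aligned} \left( I+U\right) \left( 1-S\right) \le \frac{\varepsilon }{2}\left( 1-S\right) ^2 + \frac{1}{2 \varepsilon } \left( I+U\right) ^2 \le \frac{\varepsilon }{2}\left( 1-S\right) ^2 + \frac{1}{2 \varepsilon } \left( I+U\right) ^k. \end{aligned}$$

Second, the monotonicity of \(\varphi \) on \((0, M_s)\) follows from the sign of its derivative under the assumption \(M_s \sigma ^2 < \beta (1+a)^2\):

$$\begin{aligned} \varphi '(X) = \beta - \frac{\sigma ^2}{(1+a)^2} X > \beta - \frac{\sigma ^2 M_s}{(1+a)^2} > 0, \quad X \in (0, M_s). \end{aligned}$$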

Thereby, we can choose \(\theta _1, \theta _2\) and \(\varepsilon \) such that

$$\begin{aligned}{} & {} \mu _1 - \frac{\varepsilon }{2}\beta M_s > 0, \quad \beta M_s \left( 1- \frac{1}{{\mathcal {R}}_s}\right) + \frac{k}{2} \sigma ^2 M_s^2 \\{} & {} \quad + \frac{1}{2}\theta _1 \left( \frac{\beta M_s}{2 \varepsilon } + \sigma ^2 M_s^2 \right) <0, \end{aligned}$$

and

$$\begin{aligned} \theta _2 \delta - \theta _1 \omega < 0. \end{aligned}$$

Hence,

$$\begin{aligned} {\mathcal {L}}\psi _3\le & {} 0. \end{aligned}$$

Therefore, the equilibrium state \(E_0\) is stochastically stable for system (1).

ii) By the Itô formula, we derive from the second and third equations of system (1)

$$\begin{aligned} d ln\left( I(t) + U(t) \right)= & {} {\mathcal {L}}\left( ln\left( I(t) + U(t) \right) \right) dt \nonumber \\{} & {} + \sigma \frac{g(S)}{1+ a I^p} dB_t, \end{aligned}$$
(11)

where

$$\begin{aligned} {\mathcal {L}} \left( ln(I+U) \right)= & {} -(\mu _1 +\alpha + \delta ) + \beta \frac{g(S)}{1+ a I^p} \\{} & {} -\frac{1}{2} \sigma ^2 \frac{g^2(S)}{(1+aI^p)^2} \\\le & {} -(\mu _1 +\alpha + \delta ) + \beta g(S)\\{} & {} - \frac{1}{2} \frac{\sigma ^2 }{(1+a)^2} g^2(S) \\:= & {} \varphi (g(S)). \end{aligned}$$

Recall from the proof of i) that

$$\begin{aligned} \varphi (g(S)) \le \varphi (M_s)=\beta M_s \left( 1- \frac{1}{{\mathcal {R}}_s}\right) . \end{aligned}$$

Consequently,

$$\begin{aligned} {\mathcal {L}} \left( ln(I+U) \right)\le & {} \beta M_s \left( 1- \frac{1}{{\mathcal {R}}_s}\right) :=C_5. \end{aligned}$$
(12)

Integrating Eq. (11) from 0 to t, using (12), and dividing both sides by t, we obtain

$$\begin{aligned} \frac{ln\left( I(t) + U(t) \right) }{t}\le & {} \frac{ln\left( i+u\right) }{t} + C_5 \nonumber \\{} & {} + \frac{1}{t} \int _{0}^t \sigma \frac{g(S)}{1+ a I^p} dB_v. \end{aligned}$$
(13)

Given that \(g(S)\le M_s\) and \(1+aI^p \ge 1\), and employing the law of large numbers for martingales, we deduce

$$\begin{aligned} \limsup _{t\rightarrow \infty } \frac{1}{t} \int _{0}^t \sigma \frac{g(S)}{1+ a I^p} dB_v=0 \quad a.s. \end{aligned}$$

Therefore,

$$\begin{aligned} \limsup _{t\rightarrow \infty } \frac{ln\left( I(t) + U(t) \right) }{t} \le C_5< 0 \quad a.s., \end{aligned}$$

which ensures

$$\begin{aligned} \lim _{t \rightarrow \infty } I(t) = \lim _{t \rightarrow \infty } U(t)=0 \quad a.s. \end{aligned}$$
(14)

The remaining steps of the proof follow a similar approach as used in [14, 30, 31].

This completes the proof. \(\square \)

Fig. 1 The impact of the infection force p on the dynamics of disease transmission Z(t) starting from \(z_0=(0.8,0.1,0.05,0.05)\) when \({\mathcal {R}}_s < 1\)

4 Numerical study

In this section, we investigate the proposed model numerically under various scenarios. We use the Milstein higher-order method [32] to implement a numerical scheme for the system. We examine the impact of key parameters, including the infection force characterized by p and the choice of the susceptible contact function g(S), on synthetic and real datasets. The discretization scheme of system (1) takes the following form:

$$\begin{aligned} \left\{ \begin{array}{lllll} S_{k+1}&{}=S_{k}+\big (\mu _1 -\mu _1 S_k -\beta g(S_k) \frac{ \left( I_k+U_k\right) }{1+aI_k^{p}} +\omega R_k \big ) h \\ &{}\quad - \sigma g(S_k) \frac{ \left( I_k+U_k\right) }{1+aI_k^{p}} \eta _{k} \sqrt{h} \\ &{}\quad -\frac{\sigma ^2}{2}g(S_k) \frac{ \left( I_k+U_k\right) }{1+aI_k^{p}} \left( \eta _{k}^2 -1\right) h, \\ I_{k+1}&{}=I_{k} + \big (-(\mu _1+ \alpha + \delta )I_k + \nu \beta g(S_k)\frac{ \left( I_k+U_k\right) }{1+aI_k^{p}}\big ) h \\ &{}\quad + \sigma \nu g(S_k) \frac{ \left( I_k+U_k\right) }{1+aI_k^{p}} \eta _{k} \sqrt{h} \\ &{}\quad +\frac{\sigma ^2}{2} \nu g(S_k) \frac{ \left( I_k+U_k\right) }{1+aI_k^{p}} \left( \eta _{k}^2 -1\right) h, \\ U_{k+1}&{}=U_{k}+ \big (-(\mu _1+ \alpha + \delta )\\ &{}\quad U_k+ (1-\nu ) \beta g(S_k)\frac{ \left( I_k+U_k\right) }{1+aI_k^{p}} \big ) h \\ &{}\quad + \sigma (1-\nu ) g(S_k) \frac{ \left( I_k+U_k\right) }{1+aI_k^{p}} \eta _{k} \sqrt{h} \\ &{}\quad +\frac{\sigma ^2}{2}(1-\nu ) g(S_k) \frac{ \left( I_k+U_k\right) }{1+aI_k^{p}} \left( \eta _{k}^2 -1\right) h, \\ R_{k+1}&{}=R_{k} + \big (-(\mu _1+\omega )R_k + \delta (I_k+ U_k)\big ) h. \end{array} \right. \end{aligned}$$

where \(\eta _{k}\) are mutually independent N(0, 1) random variables for \(k=1,2,\ldots \), and h denotes the time step.

Fig. 2 The impact of the infection force p on the dynamics of disease transmission Z(t) starting from \(z_0=(0.8,0.1,0.05,0.05)\) when \({\mathcal {R}}_s > 1\)

Fig. 3 The density functions corresponding to S(t), I(t), U(t) and R(t) for \(p=2\) based on 1000 simulations
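The discretization scheme stated in this section can be sketched in a few lines of code. The following is a minimal illustration that transcribes the update rules as written, assuming a parameter dictionary `par` and a contact function `g`; the function name `simulate_siur`, the seeding convention, and the clipping of I inside the power (a numerical guard that is not part of the scheme) are assumptions made here.

```python
import math
import random


def simulate_siur(z0, par, g, h, n_steps, seed=0):
    """Simulate one sample path of system (1) with the stated discretization."""
    rng = random.Random(seed)
    S, I, U, R = z0
    path = [(S, I, U, R)]
    removal = par["mu1"] + par["alpha"] + par["delta"]
    for _ in range(n_steps):
        eta = rng.gauss(0.0, 1.0)  # eta_k ~ N(0, 1), independent across steps
        # Incidence factor g(S)(I+U)/(1+a I^p); max(I, 0) is only a guard
        # against fractional powers of a slightly negative iterate.
        f = g(S) * (I + U) / (1.0 + par["a"] * max(I, 0.0) ** par["p"])
        noise = par["sigma"] * f * eta * math.sqrt(h)
        corr = 0.5 * par["sigma"] ** 2 * f * (eta * eta - 1.0) * h
        S_new = S + (par["mu1"] - par["mu1"] * S - par["beta"] * f
                     + par["omega"] * R) * h - noise - corr
        I_new = I + (-removal * I + par["nu"] * par["beta"] * f) * h \
            + par["nu"] * (noise + corr)
        U_new = U + (-removal * U + (1.0 - par["nu"]) * par["beta"] * f) * h \
            + (1.0 - par["nu"]) * (noise + corr)
        R_new = R + (-(par["mu1"] + par["omega"]) * R + par["delta"] * (I + U)) * h
        S, I, U, R = S_new, I_new, U_new, R_new
        path.append((S, I, U, R))
    return path
```

Running the function with different seeds and collecting the final states gives an empirical approximation of the distribution of Z(t) at a fixed time; the step size h and the number of paths are choices left to the user.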

4.1 Synthetic data

We first present numerical simulations that illustrate and validate our theoretical findings through several examples.

4.1.1 Extinction of diseases

We illustrate the conditions for disease extinction stated in Theorem 3 (Fig. 1). We set \(\mu _1 = 0.1\), \(\alpha = 0.05\), \(\delta = 0.2\), \(\omega = 0.02\), \(\beta = 0.375\), \(\nu = 0.7\), \(a = 0.1\), \(p \in \{0.5, 1, 2, 3\}\), \(\sigma = 0.3\), and take \(g(S) = S^p\), for which \(M_s = 1\). The extinction conditions \({\mathcal {R}}_s < 1\) and \(M_s \sigma ^2 < \beta (1 + a)^2\) are then satisfied, so the disease cannot sustain itself in the population: on average, each infected individual transmits the disease to fewer than one other individual during the infectious period, and the disease gradually declines until it is eradicated. Furthermore, the power p of the infection force is inversely related to the incidence of the disease: as p increases, the incidence decreases and the disease spreads more slowly through the population. This highlights the role of p in modulating transmission dynamics and its relevance for epidemiological modeling and control strategies.
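As a quick arithmetic check of the second extinction condition, using only the values quoted above (\(\sigma = 0.3\), \(\beta = 0.375\), \(a = 0.1\), \(M_s = 1\)):

$$\begin{aligned} M_s \sigma ^2 = 1 \times 0.3^2 = 0.09 < \beta (1+a)^2 = 0.375 \times 1.1^2 = 0.45375. \end{aligned}$$

The first condition, \({\mathcal {R}}_s < 1\), depends on the expression of \({\mathcal {R}}_s\) given with the theoretical results and is not re-evaluated here.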

4.1.2 Disease persistence

Next, we examine the dynamics of the SIURS epidemic model for different values of the power parameter p. We keep the parameter settings and the function g(S) of the previous example, except that we set \(\beta = 0.4\), \(a = 0.2\), and \(\sigma = 0.2\). For these values, the basic reproduction number \({\mathcal {R}}_s\) satisfies \({\mathcal {R}}_s > 1\), and \(\beta > \frac{1}{2} \sigma ^2 M_s\), so the infectious disease persists in the population. This agrees with Theorem 2, which gives conditions for the persistence of the disease (Fig. 2). Moreover, the system is positively recurrent and admits a unique stationary distribution, shown in Fig. 3. As expected, as p increases, the incidence of the disease decreases and it spreads more slowly through the population.

4.1.3 The impact of the susceptible contact function g(S)

In this section, we explore how the choice of the susceptible contact function g(S) affects the behavior of the solution to the system (1). We compare the dynamics resulting from three distinct functions:

$$\begin{aligned} g_1(S) = S^2, \quad g_2(S) = b-be^{-S}, \quad \text {and} \quad g_3(S) = \frac{c S}{1+S}, \end{aligned}$$

where b and c represent positive constants. Table 1 presents the modified parameters for each scenario tested. The remaining parameters are held constant at \(\mu _1=0.1, \alpha =0.05, \delta =0.2, \omega =0.02, \nu =0.7\) and \(p=2\).

Table 1 List of parameters for the different scenarios tested

We now examine how the choice of the susceptible contact function g(S) affects the behavior of system (1). Comparing the dynamics obtained with different functions shows how the system responds to variations in g(S), which directly influences the transmission dynamics of the disease (4). The function g(S) plays a crucial role in determining the direction in which the epidemic evolves (through \({\mathcal {R}}_s\)) and the rate at which susceptible individuals become infected. For instance, with \(g_1(S) = S^2\), the transmission rate increases quadratically with the susceptible population S, potentially leading to rapid disease spread in densely populated regions. By contrast, \(g_2(S) = b - be^{-S}\) introduces a more nuanced transmission pattern: b sets both the slope of \(g_2\) at \(S=0\) and its upper bound, and the exponential term makes the rate level off as S grows. This function may represent scenarios where preventive measures are implemented gradually or where the effectiveness of interventions diminishes over time. Finally, \(g_3(S) = \frac{cS}{1+S}\) has a saturating effect on transmission: the rate increases approximately linearly for small S and saturates toward c as S becomes large. This function captures scenarios where disease transmission reaches a plateau due to factors such as limited contact opportunities or immunity buildup within the population.

Analyzing the response of system (1) to these different functions sheds light on the interplay between population dynamics and disease transmission, and can inform public health strategies and interventions aimed at controlling the spread of infectious diseases. Selecting a susceptible contact function g(S) that accurately captures the transmission dynamics is therefore essential: it directly influences the behavior of the system and the rate at which the disease spreads within a population.
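The qualitative shapes of the three contact functions compared in this subsection can be checked numerically. The sketch below is illustrative only: the values b = 1 and c = 1 are assumptions (the values actually used are listed in Table 1), and the large arguments S = 10 and S = 100 lie outside the range of the model, where S is a proportion, and serve only to expose the saturation levels.

```python
import numpy as np

def g1(S):
    return S ** 2

def g2(S, b=1.0):  # b = 1 is an assumed value
    return b - b * np.exp(-S)

def g3(S, c=1.0):  # c = 1 is an assumed value
    return c * S / (1.0 + S)

functions = (("g1", g1), ("g2", g2), ("g3", g3))

# Values on [0, 1], plus two large arguments that reveal the upper bounds b and c.
S = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 10.0, 100.0])
for name, g in functions:
    print(name, np.round(g(S), 4))

# Forward-difference slope at S = 0: close to 0 for g1, to b for g2 and to c for g3.
eps = 1e-6
slopes = {name: (g(eps) - g(0.0)) / eps for name, g in functions}
print(slopes)
```

All three functions vanish at S = 0; they differ in how fast they grow near zero and in whether they are bounded, which is what drives the different trajectories.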

In the next subsection, we propose a selection method for the function g. Our objective is to identify the function that best describes the interplay between population susceptibility and disease transmission dynamics, and thereby to improve our understanding of the factors driving disease spread and the accuracy of models for predicting and managing infectious disease outbreaks.

Fig. 4 Evolution of Z(t) for different susceptible contact functions g(S)

4.1.4 Selection method for the susceptible contact function g(S)

In real-world public health data, the numbers of infected and recovered individuals are commonly observed on a daily or weekly basis. Building on this, we propose to select the function g(S) by minimizing the Mean Squared Error (MSE) between the actual observations and the data simulated with several candidate functions g. The goal is to identify the function that yields the lowest error and therefore the best fit to the observed data. Mathematically, we express this as:

$$\begin{aligned} MSE(g) = {\mathbb {E}} \left[ (I_{obs}-I_{sim}(g))^2 \right] + {\mathbb {E}} \left[ (R_{obs}-R_{sim}(g))^2 \right] , \end{aligned} \tag{15}$$

and

$$\begin{aligned} g^*= \underset{g \in {\mathcal {C}}^1\left( (0,1),{\mathbb {R}}^+\right) }{argmin} MSE(g). \end{aligned}$$

To achieve this, we generate 1000 paths of the \(Z_g(t)\) process described by (1) for \(g(S) = S\), starting from \(z_0\), with varying horizon times H using the parameters from scenario 1 in Table 1. Next, for each candidate function \(g_i(S)\), we simulate \(Z_{g_i}\) and estimate the MSE \(\varepsilon _H\) using the Monte Carlo method for different values of H. In this example, we consider the following set of candidate functions:

$$\begin{aligned} g_1(S)&= S^{1/2}, \quad g_2(S) = S, \quad g_3(S) = S^{3/2}, \\ g_4(S)&= S^{2}, \quad g_5(S) = 1-e^{-S}, \quad g_6(S) = \frac{S}{1+S}. \end{aligned}$$

In summary, the proposed selection method consists of the following four steps:

(i) Simulate the system for \(g(S) = S\) and record the 1000 trajectories of the I and R compartments (these represent the real observations in this case).

(ii) For each candidate function \(g_i(S)\) (\(i = 1\) to 6), simulate the corresponding system using the same initial conditions and the same parameter values, and record the 1000 trajectories of the I and R compartments.

(iii) Calculate the MSE between the simulated and real observations for each function \(g_i(S)\) and a given horizon H, using the Monte Carlo method.

(iv) Repeat steps (ii) and (iii) for \(H = 100, 500, 1000, 2000\).
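The four steps above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the parameter values are those of Sect. 4.1.2 (the "scenario 1" values of Table 1 are not reproduced here), H is interpreted as a final time with step h = 0.1, the Monte Carlo estimate of (15) averages over paths and time points, and the number of paths and the horizons are reduced to keep the run short. The names `simulate_IR`, `mse` and `select_g` are hypothetical.

```python
import numpy as np

# Assumed parameter values and initial state (see the lead-in).
PARAMS = dict(mu1=0.1, alpha=0.05, delta=0.2, omega=0.02,
              beta=0.4, nu=0.7, a=0.2, p=2, sigma=0.2)
Z0 = (0.8, 0.1, 0.05, 0.05)

# Candidate contact functions; S is clipped at 0 so that fractional powers stay
# defined (a numerical safeguard added for this sketch).
CANDIDATES = {
    "S^(1/2)": lambda S: np.clip(S, 0.0, None) ** 0.5,
    "S": lambda S: S,
    "S^(3/2)": lambda S: np.clip(S, 0.0, None) ** 1.5,
    "S^2": lambda S: S ** 2,
    "1-exp(-S)": lambda S: 1.0 - np.exp(-S),
    "S/(1+S)": lambda S: S / (1.0 + S),
}

def simulate_IR(g, horizon, n_paths, h=0.1, seed=0, q=PARAMS):
    """Simulate n_paths trajectories at once; return the I and R paths (time x path)."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(horizon / h))
    S, I, U, R = (np.full(n_paths, v) for v in Z0)
    I_path = np.empty((n_steps + 1, n_paths))
    R_path = np.empty_like(I_path)
    I_path[0], R_path[0] = I, R
    removal = q["mu1"] + q["alpha"] + q["delta"]
    for k in range(n_steps):
        eta = rng.standard_normal(n_paths)
        inc = g(S) * (I + U) / (1.0 + q["a"] * I ** q["p"])
        noise = (q["sigma"] * inc * eta * np.sqrt(h)
                 + 0.5 * q["sigma"] ** 2 * inc * (eta ** 2 - 1.0) * h)
        # Tuple assignment: every right-hand side uses the values of step k.
        S, I, U, R = (
            S + (q["mu1"] - q["mu1"] * S - q["beta"] * inc + q["omega"] * R) * h - noise,
            I + (-removal * I + q["nu"] * q["beta"] * inc) * h + q["nu"] * noise,
            U + (-removal * U + (1 - q["nu"]) * q["beta"] * inc) * h + (1 - q["nu"]) * noise,
            R + (-(q["mu1"] + q["omega"]) * R + q["delta"] * (I + U)) * h,
        )
        I_path[k + 1], R_path[k + 1] = I, R
    return I_path, R_path

def mse(obs, sim):
    """Monte Carlo estimate of (15), averaged over paths and time points."""
    (I_obs, R_obs), (I_sim, R_sim) = obs, sim
    return float(np.mean((I_obs - I_sim) ** 2) + np.mean((R_obs - R_sim) ** 2))

def select_g(horizon, n_paths=200, h=0.1):
    """Steps (i)-(iii): reference paths with g(S) = S, then the error of each candidate."""
    obs = simulate_IR(CANDIDATES["S"], horizon, n_paths, h, seed=1)
    errors = {name: mse(obs, simulate_IR(g, horizon, n_paths, h, seed=2))
              for name, g in CANDIDATES.items()}
    return min(errors, key=errors.get), errors

# Step (iv), with reduced horizons for illustration.
for H in (10, 50):
    best, errors = select_g(H)
    print(H, best, {name: round(err, 5) for name, err in errors.items()})
```

The reference and candidate simulations use different seeds here, so even the correct candidate has a non-zero error coming from the noise alone; what distinguishes the candidates is the additional bias of the wrong ones.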

Fig. 5 MSE \(\varepsilon _H\) for the different g(S) functions tested

As H increases, the Mean Squared Error (MSE) tends to decrease. This observation is consistent with the notion that as we observe the system over a longer time horizon, the simulated trajectories tend to converge towards a more accurate representation of the true dynamics. Additionally, the MSE values indicate that the method effectively selects the most suitable function, \(g_2(S)\), as the one with the lowest error across different horizon values (Fig. 5). This suggests that \(g_2(S)\) provides the best fit to the observed data and captures the underlying dynamics of the system most accurately among the candidate functions tested.

This approach allows us to choose the g(S) function that most accurately captures the transmission dynamics observed in real-world data. By aligning the simulated outcomes with empirical observations, we can enhance the predictive power of our models and better understand the underlying mechanisms driving disease transmission. This selection method facilitates the development of more reliable and informative models for studying and managing infectious diseases.

4.2 COVID-19 data in the UK

COVID-19, caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), has emerged as one of the most significant health crises of the twenty-first century worldwide [31, 33, 34]. Originating in Wuhan, China, in late 2019, the outbreak swiftly spread across the globe, prompting the World Health Organization (WHO) to declare it a global pandemic on March 11, 2020 [35]. In the United Kingdom (UK), the first cases were identified on January 31, 2020, marking the onset of a rapid escalation in cases throughout February and March [36]. By December 2020, the UK had endured over 50,000 deaths and 230,000 hospital admissions due to COVID-19, despite estimates suggesting that less than 20% of the population had been exposed to the virus. As of December 31, 2020, the population of the UK stood at 68,602,259, serving as the initial value for the susceptible population in epidemiological modeling [37]. The birth rate was estimated at 2371 per day [37], contributing to the constant influx of individuals into the susceptible class.

We calibrated our SIURS model to capture the temporal dynamics of daily new reported cases of SARS-CoV-2 and hospital admissions in the UK over the period from January 31, 2020, to May 20, 2022 [38] (Fig. 6). During this period, the number of daily reported infections peaked at 226,524 in March 2022, with an average of 25,587.23 daily infections and 10,053.23 daily hospital admissions.

Fig. 6 New daily reported cases and hospital admissions in the UK (31/01/2020–20/05/2022)

One of the central hurdles in mathematical modeling studies is the precise estimation of model parameters. By scrutinizing available literature, clinical studies and research investigating the progression of the COVID-19 pandemic in the UK, we have derived estimates for specific model parameters. Table 2 provides these estimates along with their respective sources.

Table 2 List of parameter values for the SIURS model (1) of COVID-19 transmission in the UK based on model calibration, relevant literature and clinical studies

To address the remaining parameters, namely a and the function g, we opt for a systematic approach. We employ the power function \(g(S) = S^p\) and explore a range of values for the exponent p from 0 to 10, and for the parameter a from 0 to 2. This selection process aligns with the methodology outlined in Sect. 4.1.4, where we use a fitting procedure to calibrate the model against observed daily COVID-19 incidences and hospital cases. The objective is to locally minimize the Mean Squared Error (MSE), thus refining the accuracy of the model. The optimal values of p and a that yield the lowest MSE (15) are found to be \(p=1.3\) and \(a=0.85\).
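A grid search of this kind can be sketched as follows. The example is deliberately simplified and should not be read as the calibration actually performed: it replaces the stochastic model with its noise-free version (\(\sigma = 0\)) so that the result is deterministic, it uses the synthetic parameter values of Sect. 4.1.2 instead of those of Table 2, and it fits a synthetic trajectory generated at a known grid point rather than the UK data. The grid resolution and the names `simulate_mean_field`, `mse` and `grid_search` are assumptions.

```python
import numpy as np

def simulate_mean_field(p, a, n_steps=300, h=0.1):
    """Noise-free (sigma = 0) version of the scheme with g(S) = S^p; returns (I, R) over time."""
    mu1, alpha, delta, omega, beta, nu = 0.1, 0.05, 0.2, 0.02, 0.4, 0.7  # assumed values
    S, I, U, R = 0.8, 0.1, 0.05, 0.05
    out = np.empty((n_steps + 1, 2))
    out[0] = I, R
    for k in range(n_steps):
        # max(., 0) keeps the powers defined if an iterate dips below zero (sketch-only guard).
        inc = max(S, 0.0) ** p * (I + U) / (1.0 + a * max(I, 0.0) ** p)
        S, I, U, R = (
            S + (mu1 - mu1 * S - beta * inc + omega * R) * h,
            I + (-(mu1 + alpha + delta) * I + nu * beta * inc) * h,
            U + (-(mu1 + alpha + delta) * U + (1.0 - nu) * beta * inc) * h,
            R + (-(mu1 + omega) * R + delta * (I + U)) * h,
        )
        out[k + 1] = I, R
    return out

def mse(observed, simulated):
    """Sum of the mean squared errors of the I and R columns, mirroring (15)."""
    return float(np.mean((observed - simulated) ** 2, axis=0).sum())

def grid_search(observed, p_grid, a_grid):
    """Evaluate the loss on the full (p, a) grid and return the minimizer and its loss."""
    losses = np.array([[mse(observed, simulate_mean_field(p, a)) for a in a_grid]
                       for p in p_grid])
    i, j = np.unravel_index(np.argmin(losses), losses.shape)
    return p_grid[i], a_grid[j], losses[i, j]

# Ranges from the text (p in [0, 10], a in [0, 2]); the grid spacing is an assumption.
p_grid = np.linspace(0.0, 10.0, 21)
a_grid = np.linspace(0.0, 2.0, 11)

# Synthetic stand-in for the observed data, generated at a known grid point.
observed = simulate_mean_field(p_grid[4], a_grid[2])
best_p, best_a, best_loss = grid_search(observed, p_grid, a_grid)
print(best_p, best_a, best_loss)
```

Recovering the generating grid point with zero loss only confirms that the search is wired correctly. With real data the minimum is non-zero, and a finer grid or a local optimizer started from the best grid point would be a natural refinement.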

After fitting our stochastic SIURS model to the real data (Fig. 7), we observe that the model accurately captures the observed trends, especially in the initial stages of the pandemic (from January 31, 2020, to mid-2021). Thereafter, however, significant deviations between the fitted curve and the observations become apparent, which can be attributed to the emergence of new variants of SARS-CoV-2. More generally, these discrepancies reflect the difficulty of modeling COVID-19: they stem from the natural variability in human behavior, the evolution of health interventions over time, and the inherent dynamics of virus transmission. Moreover, there are latent variables whose effects on the trajectory of the epidemic are not fully understood. Spatial factors such as population density, patterns of inter-regional mobility, and differences in intervention strategies between regions add further complexity. Finally, the stochastic nature of the disease process, combined with potential flaws or gaps in data collection, amplifies these discrepancies. Despite these challenges, our model represents a significant step forward in understanding disease dynamics, offering valuable insights for future analyses and the development of more effective strategies for managing the pandemic.

Fig. 7 The model (1) fitted curve compared to reported COVID-19 cases in the UK: January 31, 2020, to May 20, 2022

5 Conclusion and perspectives

In conclusion, our study offers a comprehensive analysis of a stochastic epidemiological model that integrates unreported cases and incorporates a general susceptible contact function. We have highlighted the pivotal role of the critical threshold parameter in determining the long-term dynamics of infectious diseases. By conducting extensive numerical simulations and calibrating the model with real-world data from the UK, we have demonstrated its robustness in accurately capturing phenomena such as disease extinction and persistence. Our findings emphasize the critical influence of the susceptible contact function g(S) in optimizing the model’s predictive accuracy for practical applications.

Further research could explore the effectiveness of various intervention strategies, such as vaccination campaigns, social distancing measures, and healthcare capacity improvements. By simulating the effect of these interventions in the model, researchers can gain valuable insights into their potential to mitigate the spread of infectious diseases and inform evidence-based decision-making in public health policy. Moreover, extending the model to encompass multiple geographic regions or populations would permit the analysis of regional disparities in disease transmission and intervention effectiveness. This extension could enable policymakers to tailor interventions to specific local contexts, allocate resources more efficiently, and address the specific challenges faced by different communities. Additionally, investigating the long-term dynamics and evolutionary processes of infectious diseases, such as the emergence of new variants, holds significant promise for improving our understanding of epidemic trajectories. By integrating evolutionary modeling techniques into the framework, researchers can more accurately predict and respond to emerging infectious disease threats, ultimately contributing to the development of more resilient public health strategies.