1 Introduction

Regression is an important method in inference and prediction problems. Since the invention of the method of least squares by Legendre (1805), regression analysis has been widely used to investigate relationships between independent variables and the dependent variable. Statisticians also combine the method of maximum likelihood, popularized by Wilks (1938), with the method of least squares. As regression analysis developed, the t test (Student 1908) and the F test (Fisher 1925) were introduced to test regression hypotheses. The likelihood ratio test, proposed by Neyman and Pearson (1933), also shows promising results in most cases. Furthermore, starting with linear regression (Galton 1885), mathematicians have invented numerous nonlinear regression models to fit different scenarios. Well-known examples of nonlinear regression models include the logistic regression model, the polynomial regression model, etc. This paper analyzes another important nonlinear regression model: the Gompertz regression model.

The Gompertz regression model was invented by Benjamin Gompertz in 1825 to elaborate his law of human mortality (Gompertz 1825). The Gompertz model has an "S-shape" feature that enables it to be widely applied in many biological systems to determine the growth of a population, because it can accurately describe a relatively low growth rate in the early and late stages as well as rapid growth in the intermediate stage (Nguimkeu 2014). Although the Gompertz model shares similar properties with the logistic model, the Gompertz curve is asymmetric about its inflection point while the logistic curve is symmetric, so one model might outperform the other in some circumstances (Nguimkeu 2014). For instance, Laird used the Gompertz model to describe the growth of tumors (Laird 1964), and Zwietering and other biologists investigated the bacterial growth curve based on the Gompertz model (Zwietering et al. 1990). It is therefore important to further analyze the properties of the Gompertz model to better describe related biological features.

Most of the time, statisticians treat observations as precise data and ignore the fact that data points might be acquired in a non-random way, which can lead to inaccurate inferences or predictions. For instance, in a large biological system it is usually impossible to measure the whole population precisely. Scientists often use the capture–recapture method to estimate the population size, which may result in biased estimates. To handle scenarios like this, Tanaka et al. (1982) first proposed a fuzzy linear regression model, which was later modified and improved by Corral and Gil (1984). In addition to the method proposed by Tanaka et al., uncertainty theory has been shown to provide more accurate and reliable results in some cases (Liu 2012). Uncertainty theory can thus be used to investigate the relationship between independent variables and the dependent variable with uncertain observed data.

In uncertainty theory, the unknown parameters in models can be estimated by the principle of least squares (Yao and Liu 2018). Yang and Liu (2017) proposed uncertain time series analysis and estimated the unknown parameters using this principle. In addition, Lio and Liu (2018) derived methods to find the unknown parameters, forecast value, and confidence interval in several uncertain regression models. In this article, we follow their process to analyze the properties of the uncertain Gompertz regression model. In Sect. 2, we introduce some preliminary knowledge of uncertainty theory. In Sects. 3 and 4, we provide the method to find the unknown parameters of the Gompertz regression model and its residual analysis. In Sect. 5, the confidence interval of the uncertain Gompertz regression model is derived. In Sect. 6, we provide a numerical example to illustrate the application of the uncertain Gompertz regression model. Finally, some conclusions are drawn in Sect. 7.

2 Preliminaries

According to Liu (2009), many surveys have shown that subjective uncertainty cannot be modeled by fuzziness and hence cannot be processed by possibility theory. To deal with subjective uncertainty, Liu (2007) proposed uncertainty theory to better capture the properties of imprecise observations that rely on degrees of belief. In this section, we provide some concepts and theorems of uncertainty theory that will be useful in the following analysis of the Gompertz regression model.

Definition 1

(Liu 2007) Let \(\mathcal {L}\) be a \(\sigma \)-algebra on a nonempty set \(\varGamma \). Then \((\varGamma ,\mathcal {L})\) is a measurable space, and each element \(\varLambda \) in \(\mathcal {L}\) is called an event. A set function \(\mathcal {M}: \mathcal {L}\rightarrow [0,1]\) is called an uncertain measure if it satisfies the following axioms:

Axiom 1

(Normality Axiom) \(\mathcal {M}\{\varGamma \}=1\) for the universal set \(\varGamma \).

Axiom 2

(Duality Axiom) \(\mathcal {M}\{\varLambda \}+\mathcal {M}\{\varLambda ^{c}\}=1\) for any event \(\varLambda \).

Axiom 3

(Subadditivity Axiom) For every countable sequence of events \(\varLambda _1, \varLambda _2, \ldots ,\) we have

$$\begin{aligned} \displaystyle \mathcal {M}\left\{ \bigcup _{i=1}^{\infty }\varLambda _i\right\} \le \sum _{i=1}^{\infty }\mathcal {M}\{\varLambda _i\}. \end{aligned}$$

Let \(\varGamma \) be a nonempty set, let \(\mathcal {L}\) be a \(\sigma \)-algebra over \(\varGamma \), and let \(\mathcal {M}\) be an uncertain measure. Then the triplet \((\varGamma ,\mathcal {L},\mathcal {M})\) is called an uncertainty space. Furthermore, the product uncertain measure \(\mathcal {M}\) satisfies the following axiom:

Axiom 4

(Product Axiom) (Liu 2009) Let \((\varGamma _k,\mathcal {L}_k,\mathcal {M}_k)\) be uncertainty spaces for \(k=1, 2, \ldots \). The product uncertain measure \(\mathcal {M}\) is an uncertain measure satisfying

$$\begin{aligned} \displaystyle \mathcal {M}\left\{ \prod _{k=1}^\infty \varLambda _k\right\} =\bigwedge _{k=1}^\infty \mathcal {M}_k\{\varLambda _k\} \end{aligned}$$

where \(\varLambda _k\) are arbitrarily chosen events from \(\mathcal {L}_k\) for \(k=1, 2, \ldots \), respectively.

Definition 2

(Liu 2007) An uncertain variable \(\xi \) is a measurable function from an uncertainty space \((\varGamma , \mathcal {L}, \mathcal {M})\) to the set of real numbers, i.e., for any Borel set B of real numbers, the set

$$\begin{aligned} \{ \xi \in B \} = \{ \gamma \in \varGamma \ | \ \xi (\gamma ) \in B \} \end{aligned}$$

is an event.

We define the uncertainty distribution \(\varPhi \) of an uncertain variable \(\xi \) as \(\varPhi (x) = \mathcal {M}\{ \xi \le x \}\) for any real number x. An uncertainty distribution \(\varPhi (x)\) is called regular if it is a continuous and strictly increasing function with respect to x where \(0<\varPhi (x)<1\), and

$$\begin{aligned} \displaystyle \lim _{x \rightarrow -\infty } \varPhi (x) = 0, \qquad \lim _{x \rightarrow \infty } \varPhi (x) = 1. \end{aligned}$$

Let \(\xi \) be an uncertain variable with regular uncertainty distribution \(\varPhi (x)\). Then the inverse function \(\varPhi ^{-1}(\alpha )\) is called the inverse uncertainty distribution of \(\xi \) (Liu 2010).

Now we introduce some special uncertainty distributions.

An uncertain variable \(\xi \) is called linear, denoted by \(\mathcal {L}(a,b)\), if it has an uncertainty distribution

$$\begin{aligned} \varPhi (x) = \left\{ \begin{array}{ll} \displaystyle 0, &{} \quad \text{ if }\; x \le a \\ \displaystyle (x - a)/(b - a), &{} \quad \text{ if }\; a < x \le b \\ \displaystyle 1, &{} \quad \text{ if }\; x > b \end{array} \right. \end{aligned}$$

where a and b are real numbers with \(a<b\). The inverse uncertainty distribution of \(\mathcal {L}(a,b)\) is

$$\begin{aligned} \varPhi ^{-1}(\alpha ) = (1-\alpha )a + \alpha b. \end{aligned}$$

An uncertain variable \(\xi \) is called zigzag, denoted by \(\mathcal {Z}(a,b,c)\), if it has an uncertainty distribution

$$\begin{aligned} \varPhi (x) = \left\{ \begin{array}{ll} \displaystyle 0, &{} \quad \text{ if }\; x \le a \\ \displaystyle (x - a)/[2(b - a)], &{} \quad \text{ if }\; a< x \le b \\ \displaystyle (x + c - 2b)/[2(c - b)], &{} \quad \text{ if }\; b < x \le c \\ \displaystyle 1, &{} \quad \text{ if }\; x > c \end{array} \right. \end{aligned}$$

where a, b and c are real numbers with \(a<b<c\). Then the inverse uncertainty distribution of \(\mathcal {Z}(a,b,c)\) is

$$\begin{aligned} \varPhi ^{-1}(\alpha ) = \left\{ \begin{array}{ll} (1-2\alpha )a+2\alpha b, &{}\quad \text{ if }\; \alpha < 0.5 \\ (2-2\alpha )b+(2\alpha -1)c, &{}\quad \text{ if }\;\alpha \ge 0.5. \end{array} \right. \end{aligned}$$

An uncertain variable \(\xi \) is called normal, denoted by \(\mathcal {N}(e,\sigma )\), if it has an uncertainty distribution

$$\begin{aligned} \varPhi (x)=\left( 1+ \exp \left( \frac{\pi (e-x)}{\sqrt{3}\sigma } \right) \right) ^{-1}, \quad x \in \mathfrak {R}\end{aligned}$$

where e and \(\sigma \) are real numbers with \(\sigma >0\). Then the inverse uncertainty distribution of \(\mathcal {N}(e, \sigma )\) is

$$\begin{aligned} \varPhi ^{-1}(\alpha )=e+ \frac{\sigma \sqrt{3}}{\pi } \ln \frac{\alpha }{1-\alpha }. \end{aligned}$$
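For quick numerical reference, the three inverse uncertainty distributions above translate directly into code. The following is a minimal sketch (the function names are ours):

```python
import math

def linear_inv(alpha, a, b):
    """Inverse uncertainty distribution of the linear variable L(a, b)."""
    return (1 - alpha) * a + alpha * b

def zigzag_inv(alpha, a, b, c):
    """Inverse uncertainty distribution of the zigzag variable Z(a, b, c)."""
    if alpha < 0.5:
        return (1 - 2 * alpha) * a + 2 * alpha * b
    return (2 - 2 * alpha) * b + (2 * alpha - 1) * c

def normal_inv(alpha, e, sigma):
    """Inverse uncertainty distribution of the normal variable N(e, sigma)."""
    return e + (sigma * math.sqrt(3) / math.pi) * math.log(alpha / (1 - alpha))
```

For example, `linear_inv(0.5, a, b)` returns the midpoint \((a+b)/2\), and `normal_inv(0.5, e, sigma)` returns e, the medians of the respective distributions.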

Definition 3

(Liu 2009) Uncertain variables \(\xi _1, \xi _2, \ldots , \xi _n\) are independent if

$$\begin{aligned} \displaystyle \mathcal {M}\left\{ \bigcap _{i=1}^{n} (\xi _{i} \in B_{i}) \right\} = \bigwedge _{i=1}^{n} \mathcal {M}\left\{ \xi _{i} \in B_{i} \right\} \end{aligned}$$

for any Borel sets \(B_{1}, B_{2}, \ldots , B_{n}\) of real numbers.

Let \(\xi _{1}, \xi _{2}, \ldots , \xi _{n}\) be independent uncertain variables with regular uncertainty distributions \(\varPhi _{1}, \varPhi _{2}, \ldots , \varPhi _{n}\), respectively. If \(f(x_{1}, x_{2}, \ldots , x_{n})\) is a strictly monotone function, then the inverse uncertainty distribution of the uncertain variable \(f(\xi _{1}, \xi _{2}, \ldots , \xi _{n})\) can be calculated by the following theorem (Liu 2010).

Theorem 1

(Liu 2010) Let \(\xi _{1}, \xi _{2}, \ldots , \xi _{n}\) be independent uncertain variables with regular uncertainty distributions \(\varPhi _{1}, \varPhi _{2}, \ldots , \varPhi _{n}\), respectively. If a function \(f(\xi _{1}, \xi _{2}, \ldots , \xi _{n})\) is strictly increasing with respect to \(\xi _{1}, \xi _{2}, \ldots , \xi _{m}\) and strictly decreasing with respect to \(\xi _{m+1}, \xi _{m+2}, \ldots , \xi _{n}\), then the uncertain variable \(\xi = f(\xi _{1}, \xi _{2}, \ldots , \xi _{n})\) has an inverse uncertainty distribution

$$\begin{aligned}&\displaystyle \varPsi ^{-1}(\alpha ) \\&\quad = f(\varPhi ^{-1}_{1}(\alpha ), \ldots , \varPhi ^{-1}_{m}(\alpha ), \varPhi ^{-1}_{m+1}(1-\alpha ), \ldots , \varPhi ^{-1}_{n}(1-\alpha )). \end{aligned}$$
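To illustrate Theorem 1, consider \(f(\xi _{1}, \xi _{2}) = \xi _{1} - \xi _{2}\), which is strictly increasing in \(\xi _{1}\) and strictly decreasing in \(\xi _{2}\), so \(\varPsi ^{-1}(\alpha ) = \varPhi _{1}^{-1}(\alpha ) - \varPhi _{2}^{-1}(1-\alpha )\). A minimal sketch for linear uncertain variables (the helper is redefined so the snippet is self-contained; names are ours):

```python
def linear_inv(alpha, a, b):
    # Inverse uncertainty distribution of the linear variable L(a, b)
    return (1 - alpha) * a + alpha * b

def diff_inv(alpha, ab1, ab2):
    """Inverse uncertainty distribution of xi1 - xi2 by Theorem 1:
    increasing direction uses alpha, decreasing direction uses 1 - alpha."""
    return linear_inv(alpha, *ab1) - linear_inv(1 - alpha, *ab2)
```

For two independent \(\mathcal {L}(0,2)\) variables, the median of the difference is 0, as expected by symmetry.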

Definition 4

(Liu 2007) Let \(\xi \) be an uncertain variable. The expected value of \(\xi \) is defined as

$$\begin{aligned} \displaystyle E[\xi ] = \int _{0}^{+\infty } \mathcal {M}\{\xi \ge x \}\mathrm {d} x - \int _{-\infty }^{0} \mathcal {M}\{\xi \le x \}\mathrm {d} x \end{aligned}$$

provided that at least one of the two integrals is finite.

Definition 5

(Liu 2007) Let \(\xi \) be an uncertain variable with finite expected value e. The variance of \(\xi \) is

$$\begin{aligned} V[\xi ]=E\left[ (\xi -e)^{2}\right] . \end{aligned}$$

Theorem 2

(Liu 2010) If \(\xi \) is an uncertain variable with regular uncertainty distribution \(\varPhi \), then we have

$$\begin{aligned} E[\xi ]= & {} \int _{0}^{1} \varPhi ^{-1}(\alpha ) \mathrm{d}\alpha , \end{aligned}$$
(1)
$$\begin{aligned} E[\xi ^{2}]= & {} \int _{0}^{1} (\varPhi ^{-1}(\alpha ))^{2} \mathrm{d}\alpha , \end{aligned}$$
(2)
$$\begin{aligned} V[\xi ]= & {} \int _{0}^{1} (\varPhi ^{-1}(\alpha )-e)^{2} \mathrm{d}\alpha . \end{aligned}$$
(3)
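Equations (1) and (3) reduce the expected value and variance to one-dimensional integrals over \(\alpha \), so they can be approximated by any quadrature rule. A sketch using a midpoint Riemann sum, checked against the closed forms \(E[\xi ]=(a+b)/2\) and \(V[\xi ]=(b-a)^{2}/12\) for a linear variable (names are ours):

```python
def linear_inv(alpha, a, b):
    # Inverse uncertainty distribution of the linear variable L(a, b)
    return (1 - alpha) * a + alpha * b

def expected_value(inv, n=100_000):
    """E[xi] = integral of inv(alpha) over (0, 1), Eq. (1), midpoint rule."""
    return sum(inv((i + 0.5) / n) for i in range(n)) / n

def variance(inv, n=100_000):
    """V[xi] = integral of (inv(alpha) - e)^2 over (0, 1), Eq. (3)."""
    e = expected_value(inv, n)
    return sum((inv((i + 0.5) / n) - e) ** 2 for i in range(n)) / n
```

For \(\xi \sim \mathcal {L}(0,6)\) this returns an expected value of 3 and a variance of 3, matching the closed forms.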

3 Uncertain Gompertz regression model

Let \((x_{1},x_{2},\ldots ,x_{p})\) be a vector of independent variables, and let y be the dependent variable. If the relationship between \((x_{1},x_{2},\ldots ,x_{p})\) and y can be expressed by a function f, then the model is generally written as

$$\begin{aligned} y=f(x_{1},x_{2},\ldots ,x_{p}|{\varvec{\beta }})+\epsilon \end{aligned}$$
(4)

where \({\varvec{\beta }}\) is a vector of unknown parameters, and \(\epsilon \) is a disturbance term. Suppose we have a set of imprecisely observed data,

$$\begin{aligned} (\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip}, \tilde{y}_{i}), \quad i=1,2,\ldots ,n \end{aligned}$$
(5)

where \(\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip},\tilde{y}_{i}\) are uncertain variables with uncertainty distributions \(\varPhi _{i1}, \varPhi _{i2}, \ldots , \varPhi _{ip}, \varPsi _{i}\), \(i=1,2,\ldots ,n\), respectively. In order to perform prediction or inference, it is necessary to find the vector of unknown parameters \({\varvec{\beta }}\). Since it is impossible to find the precise values of the parameters, we are interested in obtaining an estimate of \({\varvec{\beta }}\), denoted by \({\varvec{\beta }}^{*}\), based on the imprecisely observed data.

As proposed by Yao and Liu (2018), the least squares estimate of \({\varvec{\beta }}\) in the regression model (4) can be obtained by solving the minimization problem below:

$$\begin{aligned} \min \limits _{{\varvec{\beta }}} \sum \limits _{i=1}^{n} E[(\tilde{y}_{i} - f(\tilde{x}_{i1},\tilde{x}_{i2},\ldots , \tilde{x}_{ip}|{\varvec{\beta }}))^{2}]. \end{aligned}$$
(6)

If we denote the optimal solution of the minimization problem (6) as \({\varvec{\beta }}^{*}\), then the fitted regression model is given as

$$\begin{aligned} y=f(x_{1},x_{2},\ldots ,x_{p}|{\varvec{\beta }}^{*}). \end{aligned}$$
(7)

Definition 6

The Gompertz regression model is defined as:

$$\begin{aligned} y = \beta _{0}\exp (-\beta _{1} \exp (-\beta _{2}x)) + \epsilon , \quad \beta _{0}>0,\beta _{1}>0,\beta _{2}>0 \end{aligned}$$
(8)

where \(\beta _{0},\beta _{1},\beta _{2}\) are parameters.

The Gompertz regression model is widely used in biological systems to model the population growth of a given species, as it can accurately capture the characteristics of population growth in nature.

Theorem 3

Let \((\tilde{x}_{i}, \tilde{y}_{i}), i=1,2,\ldots ,n\), be a set of imprecisely observed data, where \(\tilde{x}_{i}, \tilde{y}_{i}\) are independent uncertain variables with regular uncertainty distributions \(\varPhi _{i}, \varPsi _{i}, i=1,2,\ldots , n\), respectively. Then the least squares estimate of \(\beta _{0}, \beta _{1}\) and \(\beta _{2}\) in the Gompertz regression model is the optimal solution of the following minimization problem:

$$\begin{aligned}&\min \limits _{\beta _{0},\beta _{1},\beta _{2}} \sum \limits _{i=1}^{n} \int _{0}^{1} \nonumber \\&\quad \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2} \varPhi _{i}^{-1}(1-\alpha ))) \right) ^{2} \mathrm{d}\alpha . \end{aligned}$$
(9)

Proof

By Eq. (6), the least squares estimate of \(\beta _{0}, \beta _{1}\) and \(\beta _{2}\) in the Gompertz regression model can be obtained by solving the minimization problem

$$\begin{aligned} \min \limits _{\beta _{0},\beta _{1},\beta _{2}} \sum \limits _{i=1}^{n} E \left[ \left( \tilde{y}_{i} - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2}\tilde{x}_{i})) \right) ^{2} \right] . \end{aligned}$$
(10)

Since the function

$$\begin{aligned} \tilde{y}_{i} - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2}\tilde{x}_{i})) \end{aligned}$$
(11)

is strictly increasing with respect to \(\tilde{y}_{i}\) and strictly decreasing with respect to \(\tilde{x}_{i}\) for each i, Theorem 1 implies that the inverse uncertainty distribution of function (11) is

$$\begin{aligned} F_{i}^{-1}(\alpha ) = \varPsi _{i}^{-1}(\alpha ) - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2} \varPhi _{i}^{-1}(1-\alpha ))). \end{aligned}$$

Then according to Theorem 2, we have

$$\begin{aligned}&E \left[ \left( \tilde{y}_{i} - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2}\tilde{x}_{i})) \right) ^{2} \right] \\&\quad = \int _{0}^{1} \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2} \varPhi _{i}^{-1}(1-\alpha ))) \right) ^{2} \mathrm{d}\alpha . \end{aligned}$$

Thus the minimization problem (10) is equivalent to

$$\begin{aligned}&\min \limits _{\beta _{0},\beta _{1},\beta _{2}} \sum \limits _{i=1}^{n} \int _{0}^{1} \\&\quad \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2} \varPhi _{i}^{-1}(1-\alpha ))) \right) ^{2} \mathrm{d}\alpha . \end{aligned}$$

The theorem is verified.
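The minimization problem (9) has no closed-form solution, but the objective is cheap to evaluate after discretizing \(\alpha \), so the least squares estimate can be obtained with any numerical optimizer. A sketch of the discretized objective for observations with linear uncertainty distributions (synthetic intervals; function names are ours):

```python
import math

def linear_inv(alpha, a, b):
    # Inverse uncertainty distribution of the linear variable L(a, b)
    return (1 - alpha) * a + alpha * b

def gompertz(x, b0, b1, b2):
    # Gompertz regression function from Eq. (8)
    return b0 * math.exp(-b1 * math.exp(-b2 * x))

def lse_objective(beta, x_intervals, y_intervals, n_alpha=200):
    """Discretized objective (9): for each observation, integrate the squared
    difference Psi_i^{-1}(alpha) - g(Phi_i^{-1}(1 - alpha)) over alpha."""
    b0, b1, b2 = beta
    total = 0.0
    for (xa, xb), (ya, yb) in zip(x_intervals, y_intervals):
        for j in range(n_alpha):
            alpha = (j + 0.5) / n_alpha
            resid = (linear_inv(alpha, ya, yb)
                     - gompertz(linear_inv(1 - alpha, xa, xb), b0, b1, b2))
            total += resid * resid / n_alpha
    return total
```

The resulting function can then be passed to a general-purpose optimizer such as `scipy.optimize.minimize`, with the positivity of \(\beta _{0},\beta _{1},\beta _{2}\) enforced through bounds.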

4 Residual analysis

In the regression model (4), there is always a disturbance term \(\epsilon \), because it is usually impossible for the model to fit each observation point perfectly. The disturbance term indicates the distance by which the dependent variable y may deviate from the regression function. Because of the nature of the disturbance, each observation has a different value of \(\epsilon \), and hence we are only interested in finding an estimate of \(\epsilon \) for the given set of imprecisely observed data

$$\begin{aligned} (\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip}, \tilde{y}_{i}), \quad i=1,2,\ldots ,n. \end{aligned}$$
(12)

For each i, the difference between the observed value \(\tilde{y}_{i}\) and \(f(\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip}|{\varvec{\beta }}^{*})\) represents the distance between the observation and the regression model and hence serves as an estimate of the disturbance term \(\epsilon \). Thus, we adopt the following definition:

Definition 7

(Lio and Liu 2018) Let \((\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip},\tilde{y}_{i}), \quad i=1,2,\ldots ,n\) be a set of imprecisely observed data, and let the fitted regression model be

$$\begin{aligned} y = f(x_{i1}, x_{i2}, \ldots , x_{ip}|{\varvec{\beta }}^{*}). \end{aligned}$$
(13)

Then for each \(i \ (i=1,2,\ldots ,n)\), the term

$$\begin{aligned} \hat{\epsilon }_{i} = \tilde{y}_{i} - f(\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip}|{\varvec{\beta }}^{*}) \end{aligned}$$
(14)

is called the ith residual.

In uncertainty theory, because the data are imprecisely observed, the ith residual \(\hat{\epsilon }_{i}\) is also an uncertain variable. Because each observation has a different disturbance term, we use the average of the expected values of the residuals:

$$\begin{aligned} \hat{e} = \frac{1}{n} \sum \limits _{i=1}^{n}E[\hat{\epsilon }_{i}] \end{aligned}$$
(15)

to estimate the expected value of \(\epsilon \), and

$$\begin{aligned} \hat{\sigma }^{2} = \frac{1}{n} \sum \limits _{i=1}^{n}E[(\hat{\epsilon }_{i} - \hat{e})^{2}] \end{aligned}$$
(16)

to estimate the variance, where \(\hat{\epsilon }_{i}\) is the ith residual, \(i=1,2,\ldots ,n\).

Theorem 4

Let \((\tilde{x}_{i}, \tilde{y}_{i}), i=1,2,\ldots ,n\) be a set of imprecisely observed data, where \(\tilde{x}_{i}, \tilde{y}_{i}\) are independent uncertain variables with regular uncertainty distributions \(\varPhi _{i}, \varPsi _{i}, i=1,2,\ldots , n\), respectively, and the fitted Gompertz regression model is denoted as

$$\begin{aligned} y = \beta _{0}^{*}\exp (-\beta _{1}^{*} \exp (-\beta _{2}^{*}x)), \quad \beta _{0}^{*}>0,\beta _{1}^{*}>0,\beta _{2}^{*}>0 \end{aligned}$$
(17)

Then the estimated expected value of \(\epsilon \) is

$$\begin{aligned}&\hat{e} = \frac{1}{n} \sum \limits _{i=1}^{n} \int _{0}^{1} \nonumber \\&\quad \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0}^{*} \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} \varPhi _{i}^{-1}(1-\alpha ))) \right) \mathrm{d}\alpha \end{aligned}$$
(18)

and the estimated variance is

$$\begin{aligned}&\hat{\sigma }^{2} = \frac{1}{n} \sum \limits _{i=1}^{n} \int _{0}^{1} \nonumber \\&\quad \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0}^{*} \right. \nonumber \\&\left. \quad \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} \varPhi _{i}^{-1}(1-\alpha ))) - \hat{e} \right) ^{2} \mathrm{d}\alpha . \end{aligned}$$
(19)

Proof

Since the function

$$\begin{aligned} \tilde{y}_{i} - \beta _{0}^{*}\exp (-\beta _{1}^{*} \exp (-\beta _{2}^{*}\tilde{x}_{i})), \quad \beta _{0}^{*}>0,\beta _{1}^{*}>0,\beta _{2}^{*}>0 \nonumber \quad (*) \end{aligned}$$

is strictly increasing with respect to \(\tilde{y}_{i}\) and strictly decreasing with respect to \(\tilde{x}_{i}\) for each i, Theorem 1 implies that the inverse uncertainty distribution of the function \((*)\) is

$$\begin{aligned} F_{i}^{-1}(\alpha ) = \varPsi _{i}^{-1}(\alpha ) - \beta _{0}^{*} \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} \varPhi _{i}^{-1}(1-\alpha ))). \end{aligned}$$

By the definitions of the residuals, the expected value, and the variance of uncertain variables, together with Theorem 2, the theorem follows immediately.
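Equations (18) and (19) are again one-dimensional integrals, so \(\hat{e}\) and \(\hat{\sigma }^{2}\) can be computed with the same discretization used for the estimation step. A sketch assuming observations with linear uncertainty distributions (function names are ours):

```python
import math

def linear_inv(alpha, a, b):
    # Inverse uncertainty distribution of the linear variable L(a, b)
    return (1 - alpha) * a + alpha * b

def gompertz(x, b0, b1, b2):
    # Fitted Gompertz regression function, Eq. (17)
    return b0 * math.exp(-b1 * math.exp(-b2 * x))

def residual_moments(beta, x_intervals, y_intervals, n_alpha=400):
    """Estimated expected value (18) and variance (19) of the disturbance term."""
    b0, b1, b2 = beta
    n = len(x_intervals)

    def f_inv(i, alpha):
        # Inverse uncertainty distribution of the ith residual
        (xa, xb), (ya, yb) = x_intervals[i], y_intervals[i]
        return (linear_inv(alpha, ya, yb)
                - gompertz(linear_inv(1 - alpha, xa, xb), b0, b1, b2))

    grid = [(j + 0.5) / n_alpha for j in range(n_alpha)]
    e_hat = sum(f_inv(i, a) for i in range(n) for a in grid) / (n * n_alpha)
    var_hat = sum((f_inv(i, a) - e_hat) ** 2
                  for i in range(n) for a in grid) / (n * n_alpha)
    return e_hat, var_hat
```

When the y-intervals are centered on the fitted curve, \(\hat{e}\) is close to zero, as in the numerical example of Sect. 6.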

5 Forecast value and confidence interval

The main purpose of regression is to make predictions given a new observation of the independent variable. Suppose \(\tilde{x}_{p}\) is a new observation of the independent variable, where \(\tilde{x}_{p}\) is an uncertain variable with regular uncertainty distribution \(\varPhi _{p}\). Since the model has been constructed from the previous imprecisely observed data \((\tilde{x}_{i},\tilde{y}_{i}), \ i=1,2,\ldots ,n\), we can then make predictions and derive the forecast value for the new dependent variable \(\tilde{y}_{p}\).

For example, suppose scientists want to investigate the population of wild lions on a new plain. From a previous imprecisely observed data set, the scientists choose the area of the plain as the independent variable and the population of wild lions as the dependent variable. An uncertain Gompertz regression model can then be constructed from these observations. By obtaining the data from the new plain, the scientists can use the model to predict its population of wild lions.

Table 1 Imprecisely observed data, where \(\mathcal {L}(a,b)\) represents a linear uncertain variable

Mathematically, we can denote the fitted uncertain Gompertz regression model constructed by using previous observations as:

$$\begin{aligned} y = \beta _{0}^{*}\exp (-\beta _{1}^{*} \exp (-\beta _{2}^{*}x)), \quad \beta _{0}^{*}>0,\beta _{1}^{*}>0, \beta _{2}^{*}>0 \end{aligned}$$
(20)

and assume the disturbance term \(\epsilon \) has estimated expected value \(\hat{e}\) and variance \(\hat{\sigma }^{2}\) and is independent of \(\tilde{x}_{p}\). Then the forecast uncertain variable of y with respect to \(\tilde{x}_{p}\) is determined by

$$\begin{aligned} \hat{y} = \beta _{0}^{*}\exp (-\beta _{1}^{*} \exp (-\beta _{2}^{*}\tilde{x}_{p})) + \epsilon . \end{aligned}$$
(21)

An intuitive first attempt is to find a point estimate for the new y. The forecast value of y is defined as

$$\begin{aligned} \mu = \beta _{0}^{*}E[\exp (-\beta _{1}^{*} \exp (-\beta _{2}^{*}\tilde{x}_{p}))] + \hat{e}, \end{aligned}$$
(22)

or we can use the integral to express \(\mu \):

$$\begin{aligned} \mu = \int _{0}^{1} (\beta _{0}^{*} \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} \varPhi _{p}^{-1}(1-\alpha ))))\mathrm{d}\alpha + \hat{e}. \end{aligned}$$
(23)

The forecast value of y represents the expected value of the forecast uncertain variable \(\hat{y}\). Assume that the disturbance term \(\epsilon \) has a normal uncertainty distribution \(\mathcal {N}(\hat{e},\hat{\sigma })\) with inverse uncertainty distribution \(\varPhi ^{-1}(\alpha )\), i.e.,

$$\begin{aligned} \varPhi ^{-1}(\alpha )=\hat{e} + \frac{\hat{\sigma }\sqrt{3}}{\pi } \ln \frac{\alpha }{1-\alpha }. \end{aligned}$$
(24)

We can then derive the inverse uncertainty distribution of \(\hat{y}\):

$$\begin{aligned} \hat{\varPsi }^{-1}(\alpha ) = \beta _{0}^{*} \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} \varPhi _{p}^{-1}(\alpha ))) + \varPhi ^{-1}(\alpha ). \end{aligned}$$
(25)

The uncertainty distribution \(\hat{\varPsi }\) of \(\hat{y}\) can then be obtained by inverting \(\hat{\varPsi }^{-1}\).

The forecast value \(\mu \) is a point estimate of y. However, we are more interested in finding a range of values within which y is likely to fall. Hence, the confidence interval of the uncertain Gompertz regression model is proposed. Although we may lose some precision in the prediction, we gain confidence in its correctness. Let \(\alpha \) denote the confidence level, which indicates the belief degree that the corresponding confidence interval contains the true value of y. We can derive the confidence interval by finding the minimum value b such that

$$\begin{aligned} \hat{\varPsi }(\mu +b)-\hat{\varPsi }(\mu -b) \ge \alpha . \end{aligned}$$
(26)

Since

$$\begin{aligned} \mathcal {M}\{ \mu -b \le \hat{y} \le \mu +b \} \ge \hat{\varPsi }(\mu +b)-\hat{\varPsi }(\mu -b), \end{aligned}$$
(27)

it follows that \(\mathcal {M}\{ \mu -b \le \hat{y} \le \mu +b \} \ge \alpha \). Thus the \(\alpha \) confidence interval of y can be represented as \([\mu -b,\mu +b]\), which can be simplified as

$$\begin{aligned} \mu \pm b. \end{aligned}$$
(28)
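The minimal b in (26) can be found numerically: evaluate \(\hat{\varPsi }\) by bisecting its strictly increasing inverse, and then bisect on b itself. A sketch assuming \(\hat{\varPsi }^{-1}\) is available as a function (names are ours):

```python
def dist_from_inv(inv, y, lo=1e-9, hi=1 - 1e-9, iters=80):
    """Evaluate the distribution Psi(y) by bisecting its strictly
    increasing inverse inv(alpha)."""
    if y <= inv(lo):
        return 0.0
    if y >= inv(hi):
        return 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if inv(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def min_half_width(inv, mu, conf, b_hi=1.0, iters=60):
    """Smallest b with Psi(mu + b) - Psi(mu - b) >= conf, as in Eq. (26)."""
    cover = lambda b: dist_from_inv(inv, mu + b) - dist_from_inv(inv, mu - b)
    while cover(b_hi) < conf:       # grow the bracket until it covers
        b_hi *= 2
    b_lo = 0.0
    for _ in range(iters):
        mid = (b_lo + b_hi) / 2
        if cover(mid) >= conf:
            b_hi = mid
        else:
            b_lo = mid
    return b_hi
```

As a sanity check, for \(\hat{y} \sim \mathcal {L}(0,1)\) with \(\mu = 0.5\) and confidence level 0.9, the coverage is \(2b\), so the minimal half-width is 0.45.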

6 Numerical example

In this section, we use an example to illustrate how the uncertain Gompertz regression model makes predictions for a new imprecisely observed value of the independent variable. Furthermore, we provide a confidence interval to make a stronger and more reliable prediction compared with the point estimate.

The data concern spore germination: the independent variable is the number of days from the beginning of the experiment, and the dependent variable is the germination rate. The data can only be imprecisely observed because of the experimental conditions, so an uncertain Gompertz regression model should be applied in this circumstance.

We use \((\tilde{x}_{i},\tilde{y}_{i}), i=1,2,\ldots ,18\), to denote the set of imprecisely observed spore germination data, where \(\tilde{x}_{i}\) and \(\tilde{y}_{i}\) are independent uncertain variables with linear uncertainty distributions \(\varPhi _{i}\) and \(\varPsi _{i}\), respectively; \(\tilde{x}_{i}\) represents the day and \(\tilde{y}_{i}\) the germination rate. The specific data are provided in Table 1.

We may use the uncertain Gompertz regression model to forecast the germination rate for any given day. The uncertain Gompertz regression model is given as

$$\begin{aligned} y = \beta _{0}\exp (-\beta _{1} \exp (-\beta _{2}x)) + \epsilon . \end{aligned}$$
(29)

In order to obtain the least squares estimate of \(\beta _{0}, \beta _{1}\) and \(\beta _{2}\) in the Gompertz regression model, we need to solve the minimization problem (6), i.e.,

$$\begin{aligned} \min \limits _{\beta _{0},\beta _{1},\beta _{2}} \sum \limits _{i=1}^{n} E \left[ \left( \tilde{y}_{i} - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2}\tilde{x}_{i})) \right) ^{2} \right] , \end{aligned}$$
(30)

or equivalently,

$$\begin{aligned}&\min \limits _{\beta _{0},\beta _{1},\beta _{2}} \sum \limits _{i=1}^{n} \int _{0}^{1} \nonumber \\&\quad \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2} \varPhi _{i}^{-1}(1-\alpha ))) \right) ^{2} \mathrm{d}\alpha . \end{aligned}$$
(31)

From Theorem 3, we can obtain the least squares estimate

$$\begin{aligned} (\beta _{0}^{*},\beta _{1}^{*},\beta _{2}^{*})=(86.0493,11.9398,0.2391). \end{aligned}$$
(32)

The fitted Gompertz regression model is then

$$\begin{aligned} y = 86.0493\exp (-11.9398 \exp (-0.2391x)). \end{aligned}$$
(33)

From Eq. (18), i.e.,

$$\begin{aligned}&\hat{e} = \frac{1}{18} \sum \limits _{i=1}^{18} \int _{0}^{1} \nonumber \\&\quad \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0}^{*} \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} \varPhi _{i}^{-1}(1-\alpha ))) \right) \mathrm{d}\alpha , \end{aligned}$$
(34)

we can obtain the estimated expected value of the disturbance term \(\epsilon \), and from Eq. (19), i.e.,

$$\begin{aligned}&\hat{\sigma }^{2} = \frac{1}{18} \sum \limits _{i=1}^{18} \int _{0}^{1} \nonumber \\&\quad \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0}^{*} \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} \varPhi _{i}^{-1}(1-\alpha ))) - \hat{e} \right) ^{2} \mathrm{d}\alpha , \end{aligned}$$
(35)

we can obtain the estimated variance of the disturbance term \(\epsilon \). The estimated expected value and variance of \(\epsilon \) are

$$\begin{aligned} \hat{e}=0.0000, \quad \hat{\sigma }^{2}=3.5725, \end{aligned}$$
(36)

respectively. Now suppose

$$\begin{aligned} \tilde{x}_{p} \sim \mathcal {L}(24.0,26.0) \end{aligned}$$
(37)

is a new imprecisely observed day. Assuming that \(\tilde{x}_{p}\) is independent of \(\epsilon \), we can then obtain the forecast uncertain variable of the dependent variable y

$$\begin{aligned} \hat{y} = 86.0493\exp (-11.9398 \exp (-0.2391\tilde{x}_{p})) + \epsilon , \end{aligned}$$
(38)

and the forecast value of y is 83.4837 which can be obtained from Eq. (22), i.e.,

$$\begin{aligned} \mu = \beta _{0}^{*}E[\exp (-\beta _{1}^{*} \exp (-\beta _{2}^{*}\tilde{x}_{p}))] + \hat{e}. \end{aligned}$$
(39)

For the confidence level \(\alpha =95\%\), if we suppose that the disturbance term \(\epsilon \) is a normal uncertain variable, then

$$\begin{aligned} b=5.7271 \end{aligned}$$
(40)

is the minimum value for Eq. (26), i.e.,

$$\begin{aligned} \hat{\varPsi }(\mu +b)-\hat{\varPsi }(\mu -b) \ge 95 \% . \end{aligned}$$
(41)

Here \(\hat{\varPsi }\) is the uncertainty distribution of \(\hat{y}\), and its inverse can be calculated by

$$\begin{aligned} \hat{\varPsi }^{-1}(\alpha ) = \beta _{0}^{*} \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} (24(1-\alpha )+26\alpha ))) + \varPhi ^{-1}(\alpha ) \end{aligned}$$
(42)

where \(\varPhi ^{-1}(\alpha )\) is the inverse uncertainty distribution of the normal uncertain variable \(\mathcal {N}(\hat{e}, \hat{\sigma })\), and \(24(1-\alpha )+26\alpha \) is the inverse uncertainty distribution of \(\tilde{x}_{p} \sim \mathcal {L}(24.0,26.0)\). The \(95\%\) confidence interval of the dependent variable y is then

$$\begin{aligned} 83.4837 \pm 5.7271. \end{aligned}$$
(43)
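As a rough check, the forecast value (39) can be reproduced by discretizing the integral in Eq. (23) with the fitted parameters (32) and \(\tilde{x}_{p} \sim \mathcal {L}(24.0, 26.0)\); small deviations from 83.4837 are expected because the published parameters are rounded. A sketch (function names are ours):

```python
import math

# Fitted parameters from Eq. (32)
B0, B1, B2 = 86.0493, 11.9398, 0.2391

def gompertz(x):
    # Fitted Gompertz regression function, Eq. (33)
    return B0 * math.exp(-B1 * math.exp(-B2 * x))

def forecast_value(a, b, e_hat=0.0, n=10_000):
    """mu = integral over alpha of g(Phi_p^{-1}(alpha)) plus e_hat,
    for x_p ~ L(a, b), following Eq. (23)."""
    return sum(gompertz((1 - (j + 0.5) / n) * a + ((j + 0.5) / n) * b)
               for j in range(n)) / n + e_hat

mu = forecast_value(24.0, 26.0)
```

The computed value lands close to the paper's 83.4837 and well inside the range \([g(24), g(26)]\) of the fitted curve over the observed interval.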

7 Conclusion

This article introduced the uncertain Gompertz regression model, together with the method to obtain the least squares estimates of its parameters, the model's forecast value and confidence interval, and its residual analysis. We also provided a numerical example of spore germination to illustrate the application of the uncertain Gompertz regression model to a real-life problem.