Uncertain Gompertz regression model with imprecise observations

Hu, Zeyu; Gao, Jinwu

doi:10.1007/s00500-018-3611-1


Focus
Published: 02 November 2018

Volume 24, pages 2543–2549, (2020)

Soft Computing

Abstract

Regression is widely applied in many fields. Regardless of the type of regression, we often assume that the observations are precise. However, in real-life circumstances, this assumption is met only sometimes, which means that traditional regression methods can yield significantly imprecise or biased predictions. Consequently, uncertain regression models might provide more accurate and meaningful results under these circumstances. In this article, we provide the residual analysis of the uncertain Gompertz regression model, as well as the corresponding forecast value and confidence interval. Finally, we give a numerical example of the uncertain Gompertz regression model.

1 Introduction

Regression is an important method in inference and prediction problems. Since the invention of the method of least squares by Legendre (1805), regression analysis has been widely used to investigate relationships between independent variables and a dependent variable. Statisticians also use the method of maximum likelihood, popularized by Wilks (1938), together with the method of least squares. With the development of regression analysis, the t test (Student 1908) and the F test (Fisher 1925) were introduced to test regression hypotheses. The likelihood ratio test, proposed by Neyman and Pearson (1933), also shows promising results in most cases. Furthermore, starting from linear regression (Galton 1885), mathematicians have devised numerous nonlinear regression models to fit different scenarios. Well-known examples include the logistic regression model and the polynomial regression model. This paper analyzes another important nonlinear regression model: the Gompertz regression model.

The Gompertz regression model was introduced by Benjamin Gompertz in 1825 to describe his law of human mortality (Gompertz 1825). The Gompertz model has an “S-shape” that makes it widely applicable to population growth in biological systems, because it describes a relatively low growth rate in the early and late stages together with rapid growth in the intermediate stage (Nguimkeu 2014). Although the Gompertz model shares similar properties with the logistic model, it is asymmetric about its inflection point while the logistic model is symmetric, so one model might outperform the other in some circumstances (Nguimkeu 2014). For instance, Laird used the Gompertz model to describe tumor growth (Laird 1964), and Zwietering et al. investigated the bacterial growth curve based on the Gompertz model (Zwietering et al. 1990). It is therefore important to further analyze the properties of the Gompertz model in order to better describe related biological features.

Most of the time, statisticians treat observations as precise data and ignore the fact that data points might be acquired in a non-random way, which might lead to inaccurate inference or predictions. For instance, in a large biological system, it is usually impossible to measure the whole population precisely. Scientists often use the capture–recapture method to estimate the population size, which might result in biased estimates. To handle scenarios like this, Tanaka et al. (1982) first proposed a fuzzy linear regression model, which was later modified and improved by Corral and Gil (1984). In addition to the method proposed by Tanaka et al., uncertainty theory has been shown to provide more accurate and reliable results in some cases (Liu 2012). Within uncertainty theory, uncertain regression analysis investigates the relationship between independent variables and a dependent variable when the observed data are uncertain.

In uncertainty theory, the unknown parameters in models can be estimated by the principle of least squares (Yao and Liu 2018). Yang and Liu (2017) proposed uncertain time series analysis and estimated the unknown parameters using the principle of least squares. In addition, Lio and Liu (2018) derived methods for finding the unknown parameters, forecast value, and confidence interval in several uncertain regression models. In this article, we follow their process to analyze the properties of the uncertain Gompertz regression model. In Sect. 2, we introduce some preliminary knowledge of uncertainty theory. In Sects. 3 and 4, we provide the method for finding the unknown parameters in the Gompertz regression model and its residual analysis. In Sect. 5, the forecast value and confidence interval of the uncertain Gompertz regression model are provided. In Sect. 6, we provide a numerical example to illustrate the application of the uncertain Gompertz regression model. Finally, some conclusions are drawn in Sect. 7.

2 Preliminaries

According to Liu, many surveys showed that subjective uncertainty cannot be modeled by fuzziness and hence cannot be processed by possibility theory (Liu 2009). In order to deal with subjective uncertainty, Liu proposed uncertainty theory to better capture the properties of imprecise observations that rely on degrees of belief (Liu 2007). In this section, we will provide some concepts and theorems of uncertainty theory which will be useful in the following analysis of Gompertz regression model.

Definition 1

(Liu 2007) Let $\mathcal {L}$ be a $\sigma $-algebra on a nonempty set $\varGamma $. Then $(\varGamma ,\mathcal {L})$ is a measurable space, and each element $\varLambda $ in $\mathcal {L}$ is called an event. A set function $\mathcal {M}: \mathcal {L}\rightarrow [0,1]$ is called an uncertain measure if it satisfies the following axioms:

Axiom 1

(Normality Axiom) $\mathcal {M}\{\varGamma \}=1$ for the universal set $\varGamma $.

Axiom 2

(Duality Axiom) $\mathcal {M}\{\varLambda \}+\mathcal {M}\{\varLambda ^{c}\}=1$ for any event $\varLambda $.

Axiom 3

(Subadditivity Axiom) For every countable sequence of events $\varLambda _1, \varLambda _2, \ldots ,$ we have

$$\begin{aligned} \displaystyle \mathcal {M}\left\{ \bigcup _{i=1}^{\infty }\varLambda _i\right\} \le \sum _{i=1}^{\infty }\mathcal {M}\{\varLambda _i\}. \end{aligned}$$

Let $\varGamma $ be a nonempty set, let $\mathcal {L}$ be a $\sigma $-algebra over $\varGamma $, and let $\mathcal {M}$ be an uncertain measure. Then the triplet $(\varGamma ,\mathcal {L},\mathcal {M})$ is called an uncertainty space. Furthermore, the product uncertain measure $\mathcal {M}$ satisfies the following axiom:

Axiom 4

(Product Axiom) (Liu 2009) Let $(\varGamma _k,\mathcal {L}_k,\mathcal {M}_k)$ be uncertainty spaces for $k=1, 2, \ldots $. The product uncertain measure $\mathcal {M}$ is an uncertain measure satisfying

$$\begin{aligned} \displaystyle \mathcal {M}\left\{ \prod _{k=1}^\infty \varLambda _k\right\} =\bigwedge _{k=1}^\infty \mathcal {M}_k\{\varLambda _k\} \end{aligned}$$

where $\varLambda _k$ are arbitrarily chosen events from $\mathcal {L}_k$ for $k=1, 2, \ldots $, respectively.

Definition 2

(Liu 2007) An uncertain variable $\xi $ is a measurable function from an uncertainty space $(\varGamma , \mathcal {L}, \mathcal {M})$ to the set of real numbers, i.e., for any Borel set B of real numbers, the set

$$\begin{aligned} \{ \xi \in B \} = \{ \gamma \in \varGamma \ | \ \xi (\gamma ) \in B \} \end{aligned}$$

is an event.

We define the uncertainty distribution $\varPhi $ of an uncertain variable $\xi $ as $\varPhi (x) = \mathcal {M}\{ \xi \le x \}$ for any real number x. An uncertainty distribution $\varPhi (x)$ is called regular if it is a continuous and strictly increasing function with respect to x where $0<\varPhi (x)<1$, and

$$\begin{aligned} \displaystyle \lim _{x \rightarrow -\infty } \varPhi (x) = 0, \qquad \lim _{x \rightarrow \infty } \varPhi (x) = 1. \end{aligned}$$

Let $\xi $ be an uncertain variable with regular uncertainty distribution $\varPhi (x)$. Then the inverse function $\varPhi ^{-1}(\alpha )$ is called the inverse uncertainty distribution of $\xi $ (Liu 2010).

Now we introduce some special uncertainty distributions.

An uncertain variable $\xi $ is called linear, denoted by $\mathcal {L}(a,b)$, if it has an uncertainty distribution

$$\begin{aligned} \varPhi (x) = \left\{ \begin{array}{ll} \displaystyle 0, &{} \quad \text{ if }\; x \le a \\ \displaystyle (x - a)/(b - a), &{} \quad \text{ if }\; a < x \le b \\ \displaystyle 1, &{} \quad \text{ if }\; x > b \end{array} \right. \end{aligned}$$

where a and b are real numbers with $a<b$. The inverse uncertainty distribution of $\mathcal {L}(a,b)$ is

$$\begin{aligned} \varPhi ^{-1}(\alpha ) = (1-\alpha )a + \alpha b. \end{aligned}$$

An uncertain variable $\xi $ is called zigzag, denoted by $\mathcal {Z}(a,b,c)$, if it has an uncertainty distribution

$$\begin{aligned} \varPhi (x) = \left\{ \begin{array}{ll} \displaystyle 0, &{} \quad \text{ if }\; x \le a \\ \displaystyle (x - a)/[2(b - a)], &{} \quad \text{ if }\; a< x \le b \\ \displaystyle (x + c - 2b)/[2(c - b)], &{} \quad \text{ if }\; b < x \le c \\ \displaystyle 1, &{} \quad \text{ if }\; x > c \end{array} \right. \end{aligned}$$

where a, b and c are real numbers with $a<b<c$. Then the inverse uncertainty distribution of $\mathcal {Z}(a,b,c)$ is

$$\begin{aligned} \varPhi ^{-1}(\alpha ) = \left\{ \begin{array}{ll} (1-2\alpha )a+2\alpha b, &{}\quad \text{ if }\; \alpha < 0.5 \\ (2-2\alpha )b+(2\alpha -1)c, &{}\quad \text{ if }\;\alpha \ge 0.5. \end{array} \right. \end{aligned}$$

An uncertain variable $\xi $ is called normal, denoted by $\mathcal {N}(e,\sigma )$, if it has an uncertainty distribution

$$\begin{aligned} \varPhi (x)=\left( 1+ \exp \left( \frac{\pi (e-x)}{\sqrt{3}\sigma } \right) \right) ^{-1}, x \in \mathfrak {R}\end{aligned}$$

where e and $\sigma $ are real numbers with $\sigma >0$. Then the inverse uncertainty distribution of $\mathcal {N}(e, \sigma )$ is

$$\begin{aligned} \varPhi ^{-1}(\alpha )=e+ \frac{\sigma \sqrt{3}}{\pi } \ln \frac{\alpha }{1-\alpha }. \end{aligned}$$
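The three inverse uncertainty distributions above translate directly into code. The following is a minimal sketch in Python; the function names (`linear_inv`, `zigzag_inv`, `normal_inv`, `normal_cdf`) are illustrative choices, not names from the source.

```python
import math

def linear_inv(alpha, a, b):
    """Inverse uncertainty distribution of the linear variable L(a, b)."""
    return (1.0 - alpha) * a + alpha * b

def zigzag_inv(alpha, a, b, c):
    """Inverse uncertainty distribution of the zigzag variable Z(a, b, c)."""
    if alpha < 0.5:
        return (1.0 - 2.0 * alpha) * a + 2.0 * alpha * b
    return (2.0 - 2.0 * alpha) * b + (2.0 * alpha - 1.0) * c

def normal_inv(alpha, e, sigma):
    """Inverse uncertainty distribution of N(e, sigma); requires 0 < alpha < 1."""
    return e + sigma * math.sqrt(3.0) / math.pi * math.log(alpha / (1.0 - alpha))

def normal_cdf(x, e, sigma):
    """Uncertainty distribution of the normal uncertain variable N(e, sigma)."""
    return 1.0 / (1.0 + math.exp(math.pi * (e - x) / (math.sqrt(3.0) * sigma)))
```

Evaluating `normal_cdf(normal_inv(alpha, e, sigma), e, sigma)` should return `alpha` up to rounding, which is a quick consistency check between the distribution and its inverse.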

Definition 3

(Liu 2009) Uncertain variables $\xi _1, \xi _2, \ldots , \xi _n$ are independent if

$$\begin{aligned} \displaystyle \mathcal {M}\left\{ \bigcap _{i=1}^{n} (\xi _{i} \in B_{i}) \right\} = \bigwedge _{i=1}^{n} \mathcal {M}\left\{ \xi _{i} \in B_{i} \right\} \end{aligned}$$

for any Borel sets $B_{1}, B_{2}, \ldots , B_{n}$ of real numbers.

Let $\xi _{1}, \xi _{2}, \ldots , \xi _{n}$ be independent uncertain variables with regular uncertainty distributions $\varPhi _{1}, \varPhi _{2}, \ldots , \varPhi _{n}$, respectively. If $f(x_{1}, x_{2}, \ldots , x_{n})$ is strictly monotone in each argument, then the inverse uncertainty distribution of the uncertain variable $f(\xi _{1}, \xi _{2}, \ldots , \xi _{n})$ can be calculated by the following theorem (Liu 2010).

Theorem 1

(Liu 2010) Let $\xi _{1}, \xi _{2}, \ldots , \xi _{n}$ be independent uncertain variables with regular uncertainty distributions $\varPhi _{1}, \varPhi _{2}, \ldots , \varPhi _{n}$, respectively. If a function $f(x_{1}, x_{2}, \ldots , x_{n})$ is strictly increasing with respect to $x_{1}, x_{2}, \ldots , x_{m}$ and strictly decreasing with respect to $x_{m+1}, x_{m+2}, \ldots , x_{n}$, then the uncertain variable $\xi = f(\xi _{1}, \xi _{2}, \ldots , \xi _{n})$ has an inverse uncertainty distribution

$$\begin{aligned}&\displaystyle \varPsi ^{-1}(\alpha ) \\&\quad = f(\varPhi ^{-1}_{1}(\alpha ), \ldots , \varPhi ^{-1}_{m}(\alpha ), \varPhi ^{-1}_{m+1}(1-\alpha ), \ldots , \varPhi ^{-1}_{n}(1-\alpha )). \end{aligned}$$
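As an illustration of Theorem 1, the sketch below builds the inverse uncertainty distribution of $f(\xi_1, \xi_2) = \xi_1 - \xi_2$ for two independent linear uncertain variables. The helper `inverse_of_function` and the chosen distributions $\mathcal{L}(1,3)$ and $\mathcal{L}(0,2)$ are assumptions made for this example only.

```python
def linear_inv(alpha, a, b):
    """Inverse uncertainty distribution of the linear variable L(a, b)."""
    return (1.0 - alpha) * a + alpha * b

def inverse_of_function(f, increasing_invs, decreasing_invs):
    """Inverse distribution of f(xi_1, ..., xi_n) following Theorem 1.

    increasing_invs: inverse distributions of the arguments in which f increases.
    decreasing_invs: inverse distributions of the arguments in which f decreases.
    The arguments of f must be ordered increasing-first, as in the theorem.
    """
    def psi_inv(alpha):
        args = [inv(alpha) for inv in increasing_invs]
        args += [inv(1.0 - alpha) for inv in decreasing_invs]
        return f(*args)
    return psi_inv

# xi_1 ~ L(1, 3) enters increasingly, xi_2 ~ L(0, 2) enters decreasingly.
diff_inv = inverse_of_function(
    lambda x1, x2: x1 - x2,
    [lambda a: linear_inv(a, 1.0, 3.0)],
    [lambda a: linear_inv(a, 0.0, 2.0)],
)
```

Here `diff_inv(alpha)` simplifies to $4\alpha - 1$, so the difference is itself the linear uncertain variable $\mathcal{L}(-1, 3)$.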

Definition 4

(Liu 2007) Let $\xi $ be an uncertain variable. The expected value of $\xi $ is defined as

$$\begin{aligned} \displaystyle E[\xi ] = \int _{0}^{+\infty } \mathcal {M}\{\xi \ge x \}\mathrm {d} x - \int _{-\infty }^{0} \mathcal {M}\{\xi \le x \}\mathrm {d} x \end{aligned}$$

provided that at least one of the two integrals is finite.

Definition 5

(Liu 2007) Let $\xi $ be an uncertain variable with finite expected value e. The variance of $\xi $ is

$$\begin{aligned} V[\xi ]=E\left[ (\xi -e)^{2}\right] . \end{aligned}$$

Theorem 2

(Liu 2010) If $\xi $ is an uncertain variable with regular uncertainty distribution $\varPhi $, then we have

$$\begin{aligned} E[\xi ]= & {} \int _{0}^{1} \varPhi ^{-1}(\alpha ) \mathrm{d}\alpha , \end{aligned}$$

(1)

$$\begin{aligned} E[\xi ^{2}]= & {} \int _{0}^{1} (\varPhi ^{-1}(\alpha ))^{2} \mathrm{d}\alpha , \end{aligned}$$

(2)

$$\begin{aligned} V[\xi ]= & {} \int _{0}^{1} (\varPhi ^{-1}(\alpha )-e)^{2} \mathrm{d}\alpha . \end{aligned}$$

(3)
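Formulas (1) and (3) reduce the expected value and variance to one-dimensional integrals over $(0,1)$, which are easy to approximate numerically. The sketch below uses a midpoint rule; the quadrature scheme and the function names are assumptions of this example, not part of the source.

```python
def integrate01(g, n=2000):
    """Midpoint-rule approximation of the integral of g over (0, 1)."""
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) for k in range(n))

def expected_value(phi_inv, n=2000):
    """E[xi] from the inverse uncertainty distribution, as in Eq. (1)."""
    return integrate01(phi_inv, n)

def variance(phi_inv, n=2000):
    """V[xi] from the inverse uncertainty distribution, as in Eq. (3)."""
    e = expected_value(phi_inv, n)
    return integrate01(lambda a: (phi_inv(a) - e) ** 2, n)

# Example: the linear uncertain variable L(2, 6).
lin_inv = lambda alpha: (1.0 - alpha) * 2.0 + alpha * 6.0
```

For $\mathcal{L}(a,b)$ the integrals can be done by hand, giving $E[\xi]=(a+b)/2$ and, from Eq. (3), $V[\xi]=(b-a)^2/12$, which the numerical values for `lin_inv` should match closely.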

3 Uncertain Gompertz regression model

Let $(x_{1},x_{2},\ldots ,x_{p})$ be a vector of independent variables, and let y be the dependent variable. If the relationship between $(x_{1},x_{2},\ldots ,x_{p})$ and y can be expressed by a function, f, then the model is generally expressed as

$$\begin{aligned} y=f(x_{1},x_{2},\ldots ,x_{p}|{\varvec{\beta }})+\epsilon \end{aligned}$$

(4)

where ${\varvec{\beta }}$ is a vector of unknown parameters, and $\epsilon $ is a disturbance term. Suppose we have a set of imprecisely observed data,

$$\begin{aligned} (\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip}, \tilde{y}_{i}), \quad i=1,2,\ldots ,n \end{aligned}$$

(5)

where $\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip},\tilde{y}_{i}$ are uncertain variables with uncertainty distributions $\varPhi _{i1}, \varPhi _{i2}, \ldots , \varPhi _{ip}, \varPsi _{i}, i=1,2,\ldots ,n$, respectively. In order to perform prediction or inference, it is necessary to find the vector of unknown parameters, ${\varvec{\beta }}$. Since it is impossible to find the precise value of the parameters, we are interested in obtaining an estimation of ${\varvec{\beta }}$, denoted by ${\varvec{\beta }}^{*}$, based on the imprecisely observed data.

As proposed by Yao and Liu (2018), the least squares estimate of ${\varvec{\beta }}$ in the regression model (4) can be obtained by solving the minimization problem below:

$$\begin{aligned} \min \limits _{{\varvec{\beta }}} \sum \limits _{i=1}^{n} E[(\tilde{y}_{i} - f(\tilde{x}_{i1},\tilde{x}_{i2},\ldots , \tilde{x}_{ip}|{\varvec{\beta }}))^{2}]. \end{aligned}$$

(6)

If we denote the optimal solution of the minimization problem (6) as ${\varvec{\beta }}^{*}$, then the fitted regression model is given as

$$\begin{aligned} y=f(x_{1},x_{2},\ldots ,x_{p}|{\varvec{\beta }}^{*}). \end{aligned}$$

(7)

Definition 6

The Gompertz regression model is defined as:

$$\begin{aligned} y = \beta _{0}\exp (-\beta _{1} \exp (-\beta _{2}x)) + \epsilon , \quad \beta _{0}>0,\beta _{1}>0,\beta _{2}>0 \end{aligned}$$

(8)

where $\beta _{0},\beta _{1},\beta _{2}$ are parameters.

The Gompertz regression model is widely used in biological systems to describe the population growth of a given species. It can accurately capture the characteristics of population growth in nature.
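To make the shape of the curve in Definition 6 concrete, the sketch below evaluates the deterministic part of model (8) and its inflection point. The inflection formula is not stated in the source; it follows from setting the second derivative to zero, which gives $x^{*}=\ln (\beta _{1})/\beta _{2}$ and $y^{*}=\beta _{0}/e$. The parameter values in the example are hypothetical.

```python
import math

def gompertz(x, b0, b1, b2):
    """Deterministic part of model (8): b0 * exp(-b1 * exp(-b2 * x))."""
    return b0 * math.exp(-b1 * math.exp(-b2 * x))

def inflection_point(b0, b1, b2):
    """Point of fastest growth, obtained from y'' = 0 (assumes b1, b2 > 0)."""
    x_star = math.log(b1) / b2
    return x_star, b0 / math.e

# Hypothetical parameters: asymptote 100, displacement 5, growth rate 0.5.
x_star, y_star = inflection_point(100.0, 5.0, 0.5)
```

The inflection occurs at about 36.8% of the asymptote $\beta _{0}$, rather than at 50% as for the logistic curve, which is one way to see that the Gompertz curve is not symmetric about its inflection point.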

Theorem 3

Let $(\tilde{x}_{i}, \tilde{y}_{i}), i=1,2,\ldots ,n$, be a set of imprecisely observed data, where $\tilde{x}_{i}, \tilde{y}_{i}$ are independent uncertain variables with regular uncertainty distributions $\varPhi _{i}, \varPsi _{i}, i=1,2,\ldots , n$, respectively. Then the least squares estimate of $\beta _{0}, \beta _{1}$ and $\beta _{2}$ in the Gompertz regression model is the optimal solution of the following minimization problem:

$$\begin{aligned}&\min \limits _{\beta _{0},\beta _{1},\beta _{2}} \sum \limits _{i=1}^{n} \int _{0}^{1} \nonumber \\&\quad \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2} \varPhi _{i}^{-1}(1-\alpha ))) \right) ^{2} \mathrm{d}\alpha . \end{aligned}$$

(9)

Proof

By Eq. (6), the least squares estimate of $\beta _{0}, \beta _{1}$ and $\beta _{2}$ in the Gompertz regression model can be obtained by solving the minimization problem

$$\begin{aligned} \min \limits _{\beta _{0},\beta _{1},\beta _{2}} \sum \limits _{i=1}^{n} E \left[ \left( \tilde{y}_{i} - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2}\tilde{x}_{i})) \right) ^{2} \right] . \end{aligned}$$

(10)

Since the function

$$\begin{aligned} \tilde{y}_{i} - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2}\tilde{x}_{i})) \end{aligned}$$

(11)

is strictly increasing with respect to $\tilde{y}_{i}$ and strictly decreasing with respect to $\tilde{x}_{i}$ for each i, it follows from Theorem 1 that the inverse uncertainty distribution of function (11) is

$$\begin{aligned} F_{i}^{-1}(\alpha ) = \varPsi _{i}^{-1}(\alpha ) - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2} \varPhi _{i}^{-1}(1-\alpha ))). \end{aligned}$$

Then according to Theorem 2, we have

$$\begin{aligned}&E \left[ \left( \tilde{y}_{i} - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2}\tilde{x}_{i})) \right) ^{2} \right] \\&\quad = \int _{0}^{1} \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2} \varPhi _{i}^{-1}(1-\alpha ))) \right) ^{2} \mathrm{d}\alpha . \end{aligned}$$

Thus the minimization problem (10) is equivalent to

$$\begin{aligned}&\min \limits _{\beta _{0},\beta _{1},\beta _{2}} \sum \limits _{i=1}^{n} \int _{0}^{1} \\&\quad \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2} \varPhi _{i}^{-1}(1-\alpha ))) \right) ^{2} \mathrm{d}\alpha . \end{aligned}$$

The theorem is verified.
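The objective in Theorem 3 can be approximated numerically once the inverse distributions of the observations are known. The sketch below assumes that every observation is a linear uncertain variable, stored as a pair `(a, b)` for $\mathcal{L}(a,b)$, and uses a midpoint rule for the integral; the data layout and function names are assumptions of this example.

```python
import math

def gompertz(x, b0, b1, b2):
    """Deterministic part of the Gompertz model."""
    return b0 * math.exp(-b1 * math.exp(-b2 * x))

def linear_inv(alpha, a, b):
    """Inverse uncertainty distribution of L(a, b)."""
    return (1.0 - alpha) * a + alpha * b

def objective(beta, xs, ys, n=200):
    """Midpoint approximation of the sum of integrals in problem (9).

    xs, ys: lists of (a, b) pairs describing linear observations L(a, b).
    """
    b0, b1, b2 = beta
    h = 1.0 / n
    total = 0.0
    for (xa, xb), (ya, yb) in zip(xs, ys):
        for k in range(n):
            alpha = (k + 0.5) * h
            y = linear_inv(alpha, ya, yb)        # Psi_i^{-1}(alpha)
            x = linear_inv(1.0 - alpha, xa, xb)  # Phi_i^{-1}(1 - alpha)
            total += h * (y - gompertz(x, b0, b1, b2)) ** 2
    return total
```

A general-purpose optimizer can then minimize `objective` over $(\beta _{0},\beta _{1},\beta _{2})$ subject to positivity; for instance, `scipy.optimize.minimize` with the Nelder-Mead method would be one option, although the source does not say which solver the authors used.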

4 Residual analysis

In the regression model (4), there is always a disturbance term, $\epsilon $, because it is usually impossible for our model to perfectly fit each observation point. The disturbance term indicates the distance that dependent variable y may deviate from the regression. Because of the nature of disturbance, each observation has a different value of $\epsilon $, and hence we are only interested in finding an estimation of $\epsilon $ for the given set of imprecisely observed data

$$\begin{aligned} (\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip}, \tilde{y}_{i}), \quad i=1,2,\ldots ,n. \end{aligned}$$

(12)

For each i, the difference between the observed value $\tilde{y}_{i}$ and $f(\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip}|{\varvec{\beta }}^{*})$ represents the distance between the observation and our regression model and hence serves as an estimate of the disturbance term $\epsilon $. Thus, we use the following definition:

Definition 7

(Lio and Liu 2018) Let $(\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip},\tilde{y}_{i}), \quad i=1,2,\ldots ,n$ be a set of imprecisely observed data, and let the fitted regression model be

$$\begin{aligned} y = f(x_{1}, x_{2}, \ldots , x_{p}|{\varvec{\beta }}^{*}). \end{aligned}$$

(13)

Then for each $i \ (i=1,2,\ldots ,n)$, the term

$$\begin{aligned} \hat{\epsilon }_{i} = \tilde{y}_{i} - f(\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip}|{\varvec{\beta }}^{*}) \end{aligned}$$

(14)

is called the ith residual.

In uncertainty theory, because the data are imprecisely observed, the ith residual $\hat{\epsilon }_{i}$ is also an uncertain variable. Because each observation has a different disturbance term, we use the average of the expected values of residuals:

$$\begin{aligned} \hat{e} = \frac{1}{n} \sum \limits _{i=1}^{n}E[\hat{\epsilon }_{i}] \end{aligned}$$

(15)

to estimate the expected value of $\epsilon $, and

$$\begin{aligned} \hat{\sigma }^{2} = \frac{1}{n} \sum \limits _{i=1}^{n}E[(\hat{\epsilon }_{i} - \hat{e})^{2}] \end{aligned}$$

(16)

to estimate the variance, where $\hat{\epsilon }_{i}$ are the ith residuals, $i=1,2,\ldots ,n$, respectively.

Theorem 4

Let $(\tilde{x}_{i}, \tilde{y}_{i}), i=1,2,\ldots ,n$ be a set of imprecisely observed data, where $\tilde{x}_{i}, \tilde{y}_{i}$ are independent uncertain variables with regular uncertainty distributions $\varPhi _{i}, \varPsi _{i}, i=1,2,\ldots , n$, respectively, and the fitted Gompertz regression model is denoted as

$$\begin{aligned} y = \beta _{0}^{*}\exp (-\beta _{1}^{*} \exp (-\beta _{2}^{*}x)), \quad \beta _{0}^{*}>0,\beta _{1}^{*}>0,\beta _{2}^{*}>0 \end{aligned}$$

(17)

Then the estimated expected value of $\epsilon $ is

$$\begin{aligned}&\hat{e} = \frac{1}{n} \sum \limits _{i=1}^{n} \int _{0}^{1} \nonumber \\&\quad \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0}^{*} \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} \varPhi _{i}^{-1}(1-\alpha ))) \right) \mathrm{d}\alpha \end{aligned}$$

(18)

and the estimated variance is

$$\begin{aligned}&\hat{\sigma }^{2} = \frac{1}{n} \sum \limits _{i=1}^{n} \int _{0}^{1} \nonumber \\&\quad \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0}^{*} \right. \nonumber \\&\left. \quad \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} \varPhi _{i}^{-1}(1-\alpha ))) - \hat{e} \right) ^{2} \mathrm{d}\alpha . \end{aligned}$$

(19)

Proof

Since the function

$$\begin{aligned} \tilde{y}_{i} - \beta _{0}^{*}\exp (-\beta _{1}^{*} \exp (-\beta _{2}^{*}\tilde{x}_{i})), \quad \beta _{0}^{*}>0,\beta _{1}^{*}>0,\beta _{2}^{*}>0 \nonumber \quad (*) \end{aligned}$$

is strictly increasing with respect to $\tilde{y}_{i}$ and strictly decreasing with respect to $\tilde{x}_{i}$ for each i, it follows from Theorem 1 that the inverse uncertainty distribution of the function $(*)$ is

$$\begin{aligned} F_{i}^{-1}(\alpha ) = \varPsi _{i}^{-1}(\alpha ) - \beta _{0}^{*} \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} \varPhi _{i}^{-1}(1-\alpha ))). \end{aligned}$$

Substituting $F_{i}^{-1}$ into the definitions (15) and (16) and applying Theorem 2 to each expected value yields Eqs. (18) and (19), which proves the theorem.
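Equations (18) and (19) can be evaluated with the same kind of quadrature as the least squares objective. The sketch below again assumes linear observations stored as `(a, b)` pairs and a midpoint rule; the sample data and parameter values are hypothetical and serve only to show the calling convention.

```python
import math

def gompertz(x, b0, b1, b2):
    """Fitted Gompertz curve."""
    return b0 * math.exp(-b1 * math.exp(-b2 * x))

def linear_inv(alpha, a, b):
    """Inverse uncertainty distribution of L(a, b)."""
    return (1.0 - alpha) * a + alpha * b

def residual_stats(beta, xs, ys, n=400):
    """Return (e_hat, sigma2_hat) following Eqs. (18) and (19)."""
    h = 1.0 / n
    alphas = [(k + 0.5) * h for k in range(n)]

    def resid_inv(alpha, xi, yi):
        # Inverse distribution of the ith residual (Theorem 1).
        return linear_inv(alpha, *yi) - gompertz(linear_inv(1.0 - alpha, *xi), *beta)

    m = len(xs)
    e_hat = sum(h * resid_inv(a, xi, yi)
                for xi, yi in zip(xs, ys) for a in alphas) / m
    sigma2_hat = sum(h * (resid_inv(a, xi, yi) - e_hat) ** 2
                     for xi, yi in zip(xs, ys) for a in alphas) / m
    return e_hat, sigma2_hat

# Hypothetical data: three linear observations and illustrative parameters.
sample_xs = [(1.0, 2.0), (4.0, 5.0), (8.0, 9.0)]
sample_ys = [(0.5, 1.5), (20.0, 24.0), (70.0, 76.0)]
e_hat, sigma2_hat = residual_stats((100.0, 5.0, 0.3), sample_xs, sample_ys)
```

Note that $\hat{\sigma }^{2}$ includes the spread caused by the imprecision of the observations themselves, so it is positive even when the curve passes through the centre of every observation.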

5 Forecast value and confidence interval

The main purpose of regression is to make a prediction given a new observation of the independent variables. Suppose $\tilde{x}_{p}$ is a new observation of the independent variable, where $\tilde{x}_{p}$ is an uncertain variable with regular uncertainty distribution $\varPhi _{p}$. Since the model has been constructed by using previous imprecisely observed data $(\tilde{x}_{i},\tilde{y}_{i}), \ i=1,2,\ldots ,n$, we can then make predictions and derive the forecast value for the new dependent variable $\tilde{y}_{p}$.

For example, suppose scientists want to investigate the population of wild lions in a new plain. From a previous imprecisely observed data set, the scientists choose the area of the plain as the independent variable and the population of wild lions as the dependent variable. Then an uncertain Gompertz regression model can be constructed from these observations. Once data from the new plain are obtained, the model can be used to predict the population of wild lions there.

Table 1 Imprecisely observed data where $\mathcal {L}(a,b)$ represents linear uncertain variable


Mathematically, we can denote the fitted uncertain Gompertz regression model constructed by using previous observations as:

$$\begin{aligned} y = \beta _{0}^{*}\exp (-\beta _{1}^{*} \exp (-\beta _{2}^{*}x)), \quad \beta _{0}^{*}>0,\beta _{1}^{*}>0, \beta _{2}^{*}>0 \end{aligned}$$

(20)

and the residual $\epsilon $ has estimated expected value $\hat{e}$ and variance $\hat{\sigma }^{2}$, and is independent of $\tilde{x}_{p}$. Then the forecast uncertain variable of y with respect to $\tilde{x}_{p}$ is determined by

$$\begin{aligned} \hat{y} = \beta _{0}^{*}\exp (-\beta _{1}^{*} \exp (-\beta _{2}^{*}\tilde{x}_{p})) + \epsilon . \end{aligned}$$

(21)

A natural first step is to find a point estimate of the new y. The forecast value of y is defined as

$$\begin{aligned} \mu = \beta _{0}^{*}E[\exp (-\beta _{1}^{*} \exp (-\beta _{2}^{*}\tilde{x}_{p}))] + \hat{e}, \end{aligned}$$

(22)

or, expressing $\mu $ as an integral over the inverse uncertainty distribution $\varPhi _{p}^{-1}$ of $\tilde{x}_{p}$:

$$\begin{aligned} \mu = \int _{0}^{1} \beta _{0}^{*} \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} \varPhi _{p}^{-1}(\alpha )))\mathrm{d}\alpha + \hat{e}. \end{aligned}$$

(23)

(23)

The forecast value of y represents the expected value of the forecast uncertain variable $\hat{y}$. Assume that the disturbance term $\epsilon $ has a normal uncertainty distribution $\mathcal {N}(\hat{e},\hat{\sigma })$ with inverse uncertainty distribution $\varPhi ^{-1}(\alpha )$, i.e.,

$$\begin{aligned} \varPhi ^{-1}(\alpha )=\hat{e} + \frac{\hat{\sigma }\sqrt{3}}{\pi } \ln \frac{\alpha }{1-\alpha }. \end{aligned}$$

(24)

Since $\hat{y}$ is strictly increasing with respect to both $\tilde{x}_{p}$ and $\epsilon $, Theorem 1 gives the inverse uncertainty distribution of $\hat{y}$:

$$\begin{aligned} \hat{\varPsi }^{-1}(\alpha ) = \beta _{0}^{*} \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} \varPhi _{p}^{-1}(\alpha ))) + \varPhi ^{-1}(\alpha ), \end{aligned}$$

(25)

where $\varPhi _{p}^{-1}$ is the inverse uncertainty distribution of $\tilde{x}_{p}$. The uncertainty distribution $\hat{\varPsi }$ of $\hat{y}$ is then obtained by inverting $\hat{\varPsi }^{-1}$.

The forecast value, $\mu $, is a point estimate of y. However, we are more interested in finding a range of values within which y might fall. Hence, the confidence interval of the uncertain Gompertz regression model is proposed. Although we might lose some precision in the prediction, we are more confident about its correctness. Let $\alpha $ denote the confidence level, which indicates the belief degree that the corresponding confidence interval contains the true value of y. We can derive the confidence interval by finding the minimum value b such that

$$\begin{aligned} \hat{\varPsi }(\mu +b)-\hat{\varPsi }(\mu -b) \ge \alpha . \end{aligned}$$

(26)

Since

$$\begin{aligned} \mathcal {M}\{ \mu -b \le \hat{y} \le \mu +b \} \ge \hat{\varPsi }(\mu +b)-\hat{\varPsi }(\mu -b), \end{aligned}$$

(27)

it follows that $\mathcal {M}\{ \mu -b \le \hat{y} \le \mu +b \} \ge \alpha $. Thus the $\alpha $ confidence interval of y can be represented as $[\mu -b,\mu +b]$, which can be simplified as

$$\begin{aligned} \mu \pm b. \end{aligned}$$

(28)
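The search for the minimum b can be sketched numerically. The code below assumes that the forecast variable is increasing in both $\tilde{x}_{p}$ and $\epsilon $, so that Theorem 1 evaluates both inverse distributions at the same $\alpha $, and it recovers $\hat{\varPsi }$ from $\hat{\varPsi }^{-1}$ by bisection. All function names and the parameter values in the example are hypothetical.

```python
import math

def gompertz(x, b0, b1, b2):
    """Fitted Gompertz curve."""
    return b0 * math.exp(-b1 * math.exp(-b2 * x))

def normal_inv(alpha, e, sigma):
    """Inverse uncertainty distribution of N(e, sigma), 0 < alpha < 1."""
    return e + sigma * math.sqrt(3.0) / math.pi * math.log(alpha / (1.0 - alpha))

def make_forecast_inv(beta, xp_inv, e_hat, sigma_hat):
    """Inverse distribution of y_hat = gompertz(x_p) + epsilon (assumed form)."""
    def psi_inv(alpha):
        return gompertz(xp_inv(alpha), *beta) + normal_inv(alpha, e_hat, sigma_hat)
    return psi_inv

def distribution(psi_inv, y, tol=1e-12):
    """Psi(y), found by bisection because psi_inv is strictly increasing."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi_inv(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def forecast_value(beta, xp_inv, e_hat, n=2000):
    """Midpoint approximation of the forecast value mu."""
    h = 1.0 / n
    return h * sum(gompertz(xp_inv((k + 0.5) * h), *beta) for k in range(n)) + e_hat

def half_width(psi_inv, mu, level, tol=1e-9):
    """Smallest b with Psi(mu + b) - Psi(mu - b) >= level (requires level < 1)."""
    def coverage(b):
        return distribution(psi_inv, mu + b) - distribution(psi_inv, mu - b)
    hi = 1.0
    while coverage(hi) < level:  # grow the bracket until it covers the level
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:         # coverage is nondecreasing in b
        mid = 0.5 * (lo + hi)
        if coverage(mid) >= level:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical inputs: x_p ~ L(4, 6), illustrative parameters and residual estimates.
beta = (100.0, 5.0, 0.5)
xp_inv = lambda a: (1.0 - a) * 4.0 + a * 6.0
psi_inv = make_forecast_inv(beta, xp_inv, 0.0, 2.0)
mu = forecast_value(beta, xp_inv, 0.0)
b = half_width(psi_inv, mu, 0.95)
```

The bisection relies only on monotonicity of $\hat{\varPsi }^{-1}$, so it applies unchanged if $\tilde{x}_{p}$ follows a zigzag or normal uncertainty distribution instead of a linear one.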

6 Numerical example

In this section, we use an example to illustrate how the uncertain Gompertz regression model makes a prediction for a new imprecisely observed value of the independent variable. Furthermore, we provide a confidence interval, which gives a more reliable prediction than the point estimate alone.

The data concern spore germination, where the independent variable is the number of days since the beginning of the experiment and the dependent variable is the germination rate. The data can only be imprecisely observed because of the experimental conditions, and hence an uncertain Gompertz regression model is appropriate in this circumstance.

We use $(\tilde{x}_{i},\tilde{y}_{i}), i=1,2,\ldots ,18$ to denote the set of imprecisely observed spore germination data, where $\tilde{x}_{i},\tilde{y}_{i}$ are independent uncertain variables with linear uncertainty distributions, $\varPhi _{i}, \varPsi _{i}$, respectively. $\tilde{x}_{i}$ represents the days and $\tilde{y}_{i}$ represents the germination rate. The specific data are provided in Table 1.

We may use uncertain Gompertz regression model to forecast the germination rate for any given day. The uncertain Gompertz regression model is given as

$$\begin{aligned} y = \beta _{0}\exp (-\beta _{1} \exp (-\beta _{2}x)) + \epsilon . \end{aligned}$$

(29)

In order to obtain the least squares estimate of $\beta _{0}, \beta _{1}$ and $\beta _{2}$ in the Gompertz regression model, we need to solve the minimization problem (6), i.e.,

$$\begin{aligned} \min \limits _{\beta _{0},\beta _{1},\beta _{2}} \sum \limits _{i=1}^{n} E \left[ \left( \tilde{y}_{i} - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2}\tilde{x}_{i})) \right) ^{2} \right] , \end{aligned}$$

(30)

or equivalently,

$$\begin{aligned}&\min \limits _{\beta _{0},\beta _{1},\beta _{2}} \sum \limits _{i=1}^{n} \int _{0}^{1} \nonumber \\&\quad \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0} \exp (-\beta _{1}\exp (-\beta _{2} \varPhi _{i}^{-1}(1-\alpha ))) \right) ^{2} \mathrm{d}\alpha . \end{aligned}$$

(31)

From Theorem 3, we can obtain the least squares estimate

$$\begin{aligned} (\beta _{0}^{*},\beta _{1}^{*},\beta _{2}^{*})=(86.0493,11.9398,0.2391). \end{aligned}$$

(32)

The fitted Gompertz regression model is then

$$\begin{aligned} y = 86.0493\exp (-11.9398 \exp (-0.2391x)). \end{aligned}$$

(33)

From Eq. (18), i.e.,

$$\begin{aligned}&\hat{e} = \frac{1}{18} \sum \limits _{i=1}^{18} \int _{0}^{1} \nonumber \\&\quad \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0}^{*} \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} \varPhi _{i}^{-1}(1-\alpha ))) \right) \mathrm{d}\alpha , \end{aligned}$$

(34)

we can obtain the estimated expected value of the disturbance term $\epsilon $, and from Eq. (19), i.e.,

$$\begin{aligned}&\hat{\sigma }^{2} = \frac{1}{18} \sum \limits _{i=1}^{18} \int _{0}^{1} \nonumber \\&\quad \left( \varPsi _{i}^{-1}(\alpha ) - \beta _{0}^{*} \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} \varPhi _{i}^{-1}(1-\alpha ))) - \hat{e} \right) ^{2} \mathrm{d}\alpha , \end{aligned}$$

(35)

we can obtain the variance of the disturbance term $\epsilon $. The estimated expected value and variance of $\epsilon $ are

$$\begin{aligned} \hat{e}=0.0000, \quad \hat{\sigma }^{2}=3.5725, \end{aligned}$$

(36)

respectively. Now suppose

$$\begin{aligned} \tilde{x}_{p} \sim \mathcal {L}(24.0,26.0) \end{aligned}$$

(37)

is a new imprecisely observed value of the day. Assuming that the new day $\tilde{x}_{p}$ is independent of $\epsilon $, we can then obtain the forecast uncertain variable of the dependent variable y

$$\begin{aligned} \hat{y} = 86.0493\exp (-11.9398 \exp (-0.2391\tilde{x}_{p})) + \epsilon , \end{aligned}$$

(38)

and the forecast value of y is 83.4837 which can be obtained from Eq. (22), i.e.,

$$\begin{aligned} \mu = \beta _{0}^{*}E[\exp (-\beta _{1}^{*} \exp (-\beta _{2}^{*}\tilde{x}_{p}))] + \hat{e}. \end{aligned}$$

(39)

For the confidence level $\alpha =95\%$, if we suppose that the disturbance term $\epsilon $ is a normal uncertain variable, then

$$\begin{aligned} b=5.7271 \end{aligned}$$

(40)

is the minimum value for Eq. (26), i.e.,

$$\begin{aligned} \hat{\varPsi }(\mu +b)-\hat{\varPsi }(\mu -b) \ge 95 \% . \end{aligned}$$

(41)

$\hat{\varPsi }$ is the uncertainty distribution of $\hat{y}$ and can be calculated from

$$\begin{aligned} \hat{\varPsi }^{-1}(\alpha ) = \beta _{0}^{*} \exp (-\beta _{1}^{*}\exp (-\beta _{2}^{*} (24(1-\alpha )+26\alpha ))) + \varPhi ^{-1}(\alpha ) \end{aligned}$$

(42)

where $\varPhi ^{-1}(\alpha )$ is the inverse uncertainty distribution of the normal uncertain variable $\mathcal {N}(\hat{e}, \hat{\sigma })$. The $95\%$ confidence interval of the dependent variable y is then

$$\begin{aligned} 83.4837 \pm 5.7271. \end{aligned}$$

(43)
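The point forecast can be checked directly from the fitted parameters in Eq. (32), since it needs only the new observation $\tilde{x}_{p} \sim \mathcal {L}(24.0,26.0)$ and $\hat{e}=0$. The sketch below evaluates the fitted curve at the interval midpoint and also approximates the expected value over the interval by a midpoint rule; the quadrature and the variable names are assumptions of this example, and the residual-based quantities are not recomputed because the data of Table 1 are not reproduced here.

```python
import math

BETA = (86.0493, 11.9398, 0.2391)  # fitted values quoted in Eq. (32)

def gompertz(x, b0, b1, b2):
    """Fitted Gompertz curve of Eq. (33)."""
    return b0 * math.exp(-b1 * math.exp(-b2 * x))

def linear_inv(alpha, a, b):
    """Inverse uncertainty distribution of L(a, b)."""
    return (1.0 - alpha) * a + alpha * b

def forecast_value(beta, a, b, e_hat=0.0, n=2000):
    """Midpoint approximation of the expected curve value for x_p ~ L(a, b)."""
    h = 1.0 / n
    total = sum(gompertz(linear_inv((k + 0.5) * h, a, b), *beta) for k in range(n))
    return h * total + e_hat

curve_at_midpoint = gompertz(25.0, *BETA)         # curve evaluated at day 25
mu_quadrature = forecast_value(BETA, 24.0, 26.0)  # expected value over [24, 26]
```

With these parameters the fitted curve at day 25 evaluates to approximately 83.48, in line with the forecast value quoted above. The quadrature result lies slightly below that figure, because the curve is concave on $[24, 26]$ (its inflection point $\ln (\beta _{1}^{*})/\beta _{2}^{*}$ is near day 10).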

7 Conclusion

This article introduced the uncertain Gompertz regression model, together with the method for obtaining the least squares estimates of its parameters, the residual analysis of the model, and the model’s forecast value and confidence interval. We also provided a numerical example on spore germination to illustrate the application of the uncertain Gompertz regression model to a real-life problem.

References

Corral N, Gil MA (1984) The minimum inaccuracy fuzzy estimation: an extension of the maximum likelihood principle. Stochastica 8:63–81
Fisher RA (1925) Statistical methods for research workers. Oliver and Boyd, Edinburgh
Galton F (1885) Regression towards mediocrity in hereditary stature. J Anthropol Inst 15:246–263
Gompertz B (1825) On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philos Trans B 115:513–585
Laird AK (1964) Dynamics of tumor growth. Br J Cancer 18:490–502
Legendre AM (1805) New methods for the determination of the orbits of comets. Firmin Didot, Paris
Lio W, Liu B (2018) Residual and confidence interval for uncertain regression model with imprecise observations. J Intell Fuzzy Syst 35:2573–83
Liu B (2007) Uncertainty theory, 2nd edn. Springer, Berlin
Liu B (2009) Some research problems in uncertainty theory. J Uncertain Syst 3:3–10
Liu B (2010) Uncertainty theory: a branch of mathematics for modeling human uncertainty. Springer, Berlin
Liu B (2012) Why is there a need for uncertainty theory. J Uncertain Syst 6:3–10
Neyman J, Pearson ES (1933) On the problem of the most efficient tests of statistical hypotheses. Philos Trans R Soc A Math Phys Eng Sci 231:289–337
Nguimkeu P (2014) A simple selection test between the Gompertz and logistic growth models. Technol Forecast Soc Change 88:98–105
Student (1908) The probable error of a mean. Biometrika 6:1–25
Tanaka H, Uejima S, Asai K (1982) Linear regression analysis with fuzzy model. IEEE Trans Syst Man Cybern 12:903–907
Wilks SS (1938) The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann Math Stat 9:60–62
Yang X, Liu B (2017) Uncertain time series analysis with imprecise observations. Fuzzy Optim Decis Mak (in press)
Yao K, Liu B (2018) Uncertain regression analysis: an approach for imprecise observations. Soft Comput 22:5579–82
Zwietering MH, Jongenburger I, Rombouts FM, van 't Riet K (1990) Modeling of the bacterial growth curve. Appl Environ Microbiol 56:1975–81

Acknowledgements

This work was supported by National Natural Science Foundation of China (Grant No. 61374082).

Author information

Authors and Affiliations

Department of Mathematics, University of California, Los Angeles, 90024, USA
Zeyu Hu
Uncertain Systems Lab, School of Mathematics, Renmin University of China, Beijing, 100872, China
Jinwu Gao

Corresponding author

Correspondence to Jinwu Gao.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Ethical Approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Communicated by Y. Ni.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hu, Z., Gao, J. Uncertain Gompertz regression model with imprecise observations. Soft Comput 24, 2543–2549 (2020). https://doi.org/10.1007/s00500-018-3611-1

Published: 02 November 2018
Issue Date: February 2020
DOI: https://doi.org/10.1007/s00500-018-3611-1
