Abstract
Regression is widely applied in many fields. Regardless of the type of regression, we often assume that the observations are precise. However, in real-life circumstances this assumption is met only some of the time, which means that traditional regression methods can produce significantly imprecise or biased predictions. In such circumstances, uncertain regression models might provide more accurate and meaningful results. In this article, we provide the residual analysis of the uncertain Gompertz regression model, as well as the corresponding forecast value and confidence interval. Finally, we give a numerical example of the uncertain Gompertz regression model.
1 Introduction
Regression is an important method in inference and prediction problems. Since the invention of the method of least squares by Legendre (1805), regression analysis has been widely used to investigate relationships between independent variables and a dependent variable. Statisticians also use the method of maximum likelihood, popularized by Wilks (1938), together with the method of least squares. With the development of regression analysis, the t test (Student 1908) and the F test (Fisher 1925) were introduced to test regression hypotheses. The likelihood ratio test, proposed by Neyman and Pearson (1933), also shows promising results in most cases. Furthermore, starting from linear regression (Galton 1885), mathematicians have developed numerous nonlinear regression models to fit different scenarios. Well-known examples include the logistic regression model and the polynomial regression model. This paper analyzes another important nonlinear regression model: the Gompertz regression model.
The Gompertz regression model was introduced by Benjamin Gompertz in 1825 to elaborate his law of human mortality (Gompertz 1825). The Gompertz model has an S-shaped curve, which has led to its wide application in biological systems for describing population growth: it accurately captures a relatively low growth rate in the early and late stages and rapid growth in the intermediate stage (Nguimkeu 2014). Although the Gompertz model shares similar properties with the logistic model, it is asymmetric about its inflection point while the logistic model is symmetric, so one model might outperform the other in some circumstances (Nguimkeu 2014). For instance, Laird used the Gompertz model to describe tumor growth (Laird 1964), and Zwietering et al. investigated the bacterial growth curve based on the Gompertz model (Zwietering et al. 1990). It is therefore important to analyze the properties of the Gompertz model further in order to better describe such biological features.
Most of the time, statisticians treat observations as precise data and ignore the fact that data points might be acquired in a non-random way, which might lead to inaccurate inference or predictions. For instance, in a large biological system it is usually impossible to measure the whole population precisely. Scientists usually use the capture–recapture method to estimate the population size, which might result in biased estimates. To handle scenarios like this, Tanaka et al. (1982) first proposed a fuzzy linear regression model, which was later modified and improved by Corral and Gil (1984). In addition to the method proposed by Tanaka et al., uncertainty theory has been shown to provide more accurate and reliable results in some cases (Liu 2012). Uncertain regression analysis, built on uncertainty theory, investigates the relationship between independent variables and a dependent variable when the observed data are uncertain.
In uncertainty theory, the unknown parameters in models can be estimated by the principle of least squares (Yao and Liu 2018). Yang and Liu (2017) proposed uncertain time series analysis and estimated the unknown parameters using the principle of least squares. In addition, Lio and Liu (2018) derived methods for finding the unknown parameters, forecast value, and confidence interval in several uncertain regression models. In this article, we follow their approach to analyze the properties of the uncertain Gompertz regression model. In Sect. 2, we introduce some preliminary knowledge of uncertainty theory. In Sects. 3 and 4, we provide the method for finding the unknown parameters in the Gompertz regression model and its residual analysis. In Sect. 5, the forecast value and confidence interval of the uncertain Gompertz regression model are provided. In Sect. 6, we provide a numerical example to illustrate the application of the uncertain Gompertz regression model. Finally, some conclusions are made in Sect. 7.
2 Preliminaries
According to Liu, many surveys showed that subjective uncertainty cannot be modeled by fuzziness and hence cannot be processed by possibility theory (Liu 2009). In order to deal with subjective uncertainty, Liu proposed uncertainty theory to better capture the properties of imprecise observations that rely on degrees of belief (Liu 2007). In this section, we will provide some concepts and theorems of uncertainty theory which will be useful in the following analysis of Gompertz regression model.
Definition 1
(Liu 2007) Let \(\mathcal {L}\) be a \(\sigma \)-algebra on a nonempty set \(\varGamma \). Then \((\varGamma ,\mathcal {L})\) is a measurable space, and each element \(\varLambda \) in \(\mathcal {L}\) is called an event. A set function \(\mathcal {M}: \mathcal {L}\rightarrow [0,1]\) is called an uncertain measure if it satisfies the following axioms:
Axiom 1
(Normality Axiom) \(\mathcal {M}\{\varGamma \}=1\) for the universal set \(\varGamma \).
Axiom 2
(Duality Axiom) \(\mathcal {M}\{\varLambda \}+\mathcal {M}\{\varLambda ^{c}\}=1\) for any event \(\varLambda \).
Axiom 3
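(Subadditivity Axiom) For every countable sequence of events \(\varLambda _1, \varLambda _2, \ldots ,\) we have
\[ \mathcal {M}\left\{ \bigcup _{i=1}^{\infty }\varLambda _i\right\} \le \sum _{i=1}^{\infty }\mathcal {M}\{\varLambda _i\}. \]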
Let \(\varGamma \) be a nonempty set, let \(\mathcal {L}\) be a \(\sigma \)-algebra over \(\varGamma \), and let \(\mathcal {M}\) be an uncertain measure. Then the triplet \((\varGamma ,\mathcal {L},\mathcal {M})\) is called an uncertainty space. Furthermore, the product uncertain measure \(\mathcal {M}\) satisfies the following axiom:
Axiom 4
(Product Axiom) (Liu 2009) Let \((\varGamma _k,\mathcal {L}_k,\mathcal {M}_k)\) be uncertainty spaces for \(k=1, 2, \ldots \). The product uncertain measure \(\mathcal {M}\) is an uncertain measure satisfying
\[ \mathcal {M}\left\{ \prod _{k=1}^{\infty }\varLambda _k\right\} = \bigwedge _{k=1}^{\infty }\mathcal {M}_k\{\varLambda _k\}, \]
where \(\varLambda _k\) are arbitrarily chosen events from \(\mathcal {L}_k\) for \(k=1, 2, \ldots \), respectively.
Definition 2
(Liu 2007) An uncertain variable \(\xi \) is a measurable function from an uncertainty space \((\varGamma , \mathcal {L}, \mathcal {M})\) to the set of real numbers, i.e., for any Borel set B of real numbers, the set
\[ \{\xi \in B\} = \{\gamma \in \varGamma \mid \xi (\gamma )\in B\} \]
is an event.
We define the uncertainty distribution \(\varPhi \) of an uncertain variable \(\xi \) as \(\varPhi (x) = \mathcal {M}\{ \xi \le x \}\) for any real number x. An uncertainty distribution \(\varPhi (x)\) is called regular if it is a continuous and strictly increasing function with respect to x where \(0<\varPhi (x)<1\), and
\[ \lim _{x\rightarrow -\infty }\varPhi (x)=0,\quad \lim _{x\rightarrow +\infty }\varPhi (x)=1. \]
Let \(\xi \) be an uncertain variable with regular uncertainty distribution \(\varPhi (x)\). Then the inverse function \(\varPhi ^{-1}(\alpha )\) is called the inverse uncertainty distribution of \(\xi \) (Liu 2010).
Now we introduce some special uncertainty distributions.
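An uncertain variable \(\xi \) is called linear, denoted by \(\mathcal {L}(a,b)\), if it has an uncertainty distribution
\[ \varPhi (x)= {\left\{ \begin{array}{ll} 0, &{} \text {if } x\le a\\ \dfrac{x-a}{b-a}, &{} \text {if } a\le x\le b\\ 1, &{} \text {if } x\ge b, \end{array}\right. } \]
where a and b are real numbers with \(a<b\). The inverse uncertainty distribution of \(\mathcal {L}(a,b)\) is
\[ \varPhi ^{-1}(\alpha )=(1-\alpha )a+\alpha b. \]
An uncertain variable \(\xi \) is called zigzag, denoted by \(\mathcal {Z}(a,b,c)\), if it has an uncertainty distribution
\[ \varPhi (x)= {\left\{ \begin{array}{ll} 0, &{} \text {if } x\le a\\ \dfrac{x-a}{2(b-a)}, &{} \text {if } a\le x\le b\\ \dfrac{x+c-2b}{2(c-b)}, &{} \text {if } b\le x\le c\\ 1, &{} \text {if } x\ge c, \end{array}\right. } \]
where a, b and c are real numbers with \(a<b<c\). Then the inverse uncertainty distribution of \(\mathcal {Z}(a,b,c)\) is
\[ \varPhi ^{-1}(\alpha )= {\left\{ \begin{array}{ll} (1-2\alpha )a+2\alpha b, &{} \text {if } \alpha <0.5\\ (2-2\alpha )b+(2\alpha -1)c, &{} \text {if } \alpha \ge 0.5. \end{array}\right. } \]
An uncertain variable \(\xi \) is called normal, denoted by \(\mathcal {N}(e,\sigma )\), if it has an uncertainty distribution
\[ \varPhi (x)=\left( 1+\exp \left( \frac{\pi (e-x)}{\sqrt{3}\sigma }\right) \right) ^{-1},\quad x\in \mathfrak {R}, \]
where e and \(\sigma \) are real numbers with \(\sigma >0\). Then the inverse uncertainty distribution of \(\mathcal {N}(e, \sigma )\) is
\[ \varPhi ^{-1}(\alpha )=e+\frac{\sigma \sqrt{3}}{\pi }\ln \frac{\alpha }{1-\alpha }. \]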
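The linear, zigzag, and normal uncertain variables introduced above are easy to evaluate numerically. The following is a minimal Python sketch, assuming the standard forms of these distributions from Liu (2010); the function names are illustrative and do not come from the paper.

```python
import math


def linear_inv(alpha, a, b):
    """Inverse uncertainty distribution of the linear variable L(a, b), a < b."""
    return (1 - alpha) * a + alpha * b


def zigzag_inv(alpha, a, b, c):
    """Inverse uncertainty distribution of the zigzag variable Z(a, b, c), a < b < c."""
    if alpha < 0.5:
        return (1 - 2 * alpha) * a + 2 * alpha * b
    return (2 - 2 * alpha) * b + (2 * alpha - 1) * c


def normal_cdf(x, e, sigma):
    """Uncertainty distribution of the normal variable N(e, sigma), sigma > 0."""
    z = math.pi * (e - x) / (math.sqrt(3) * sigma)
    if z > 700:  # exp(z) would overflow; the distribution value is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def normal_inv(alpha, e, sigma):
    """Inverse uncertainty distribution of N(e, sigma) for 0 < alpha < 1."""
    return e + sigma * math.sqrt(3) / math.pi * math.log(alpha / (1 - alpha))
```

A quick consistency check is that `normal_cdf(normal_inv(alpha, e, sigma), e, sigma)` returns `alpha` up to rounding for any `alpha` strictly between 0 and 1.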
Definition 3
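(Liu 2009) Uncertain variables \(\xi _1, \xi _2, \ldots , \xi _n\) are independent if
\[ \mathcal {M}\left\{ \bigcap _{i=1}^{n}(\xi _i\in B_i)\right\} =\bigwedge _{i=1}^{n}\mathcal {M}\{\xi _i\in B_i\} \]
for any Borel sets \(B_{1}, B_{2}, \ldots , B_{n}\) of real numbers.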
Let \(\xi _{1}, \xi _{2}, \ldots , \xi _{n}\) be independent uncertain variables with regular uncertainty distributions \(\varPhi _{1}, \varPhi _{2}, \ldots , \varPhi _{n}\), respectively. If \(f(x_{1}, x_{2}, \ldots , x_{n})\) is a strictly monotone function, then the inverse uncertainty distribution of the uncertain variable \(f(\xi _{1}, \xi _{2}, \ldots , \xi _{n})\) can be calculated by the following theorem (Liu 2010).
Theorem 1
(Liu 2010) Let \(\xi _{1}, \xi _{2}, \ldots , \xi _{n}\) be independent uncertain variables with regular uncertainty distributions \(\varPhi _{1}, \varPhi _{2}, \ldots , \varPhi _{n}\), respectively. If a function \(f(x_{1}, x_{2}, \ldots , x_{n})\) is strictly increasing with respect to \(x_{1}, x_{2}, \ldots , x_{m}\) and strictly decreasing with respect to \(x_{m+1}, x_{m+2}, \ldots , x_{n}\), then the uncertain variable \(\xi = f(\xi _{1}, \xi _{2}, \ldots , \xi _{n})\) has an inverse uncertainty distribution
\[ \varPhi ^{-1}(\alpha )=f\left( \varPhi _{1}^{-1}(\alpha ),\ldots ,\varPhi _{m}^{-1}(\alpha ),\varPhi _{m+1}^{-1}(1-\alpha ),\ldots ,\varPhi _{n}^{-1}(1-\alpha )\right) . \]
Definition 4
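(Liu 2007) Let \(\xi \) be an uncertain variable. The expected value of \(\xi \) is defined as
\[ E[\xi ]=\int _{0}^{+\infty }\mathcal {M}\{\xi \ge x\}\,\mathrm {d}x-\int _{-\infty }^{0}\mathcal {M}\{\xi \le x\}\,\mathrm {d}x, \]
provided that at least one of the two integrals is finite.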
Definition 5
(Liu 2007) Let \(\xi \) be an uncertain variable with finite expected value e. The variance of \(\xi \) is
\[ V[\xi ]=E[(\xi -e)^{2}]. \]
Theorem 2
(Liu 2010) If \(\xi \) is an uncertain variable with regular uncertainty distribution \(\varPhi \), then we have
\[ E[\xi ]=\int _{0}^{1}\varPhi ^{-1}(\alpha )\,\mathrm {d}\alpha . \]
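Theorems 1 and 2 combine into a simple numerical recipe: build the inverse uncertainty distribution of a monotone function of independent uncertain variables, then integrate it over (0, 1). The sketch below illustrates this for the difference of two linear uncertain variables; the distributions \(\mathcal {L}(2,6)\) and \(\mathcal {L}(1,3)\), the midpoint rule, and all function names are assumptions chosen for illustration only.

```python
def linear_inv(alpha, a, b):
    """Inverse uncertainty distribution of the linear variable L(a, b)."""
    return (1 - alpha) * a + alpha * b


def expected_value(inv_dist, steps=10_000):
    """Approximate E[xi] = integral over (0, 1) of inv_dist(alpha) (midpoint rule)."""
    h = 1.0 / steps
    return sum(inv_dist((k + 0.5) * h) for k in range(steps)) * h


# Hypothetical independent variables xi1 ~ L(2, 6) and xi2 ~ L(1, 3).
# f(x1, x2) = x1 - x2 is increasing in x1 and decreasing in x2, so Theorem 1
# pairs alpha (for xi1) with 1 - alpha (for xi2).
def diff_inv(alpha):
    return linear_inv(alpha, 2, 6) - linear_inv(1 - alpha, 1, 3)


e_diff = expected_value(diff_inv)
print(e_diff)
```

Here the inverse distribution of the difference simplifies to \(6\alpha -1\), so the integral can be checked by hand against \(E[\xi _1]-E[\xi _2]=4-2\).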
3 Uncertain Gompertz regression model
Let \((x_{1},x_{2},\ldots ,x_{p})\) be a vector of independent variables, and let y be the dependent variable. If the relationship between \((x_{1},x_{2},\ldots ,x_{p})\) and y can be expressed by a function f, then the model is generally expressed as
\[ y=f(x_{1},x_{2},\ldots ,x_{p}|{\varvec{\beta }})+\epsilon , \tag{4} \]
where \({\varvec{\beta }}\) is a vector of unknown parameters, and \(\epsilon \) is a disturbance term. Suppose we have a set of imprecisely observed data,
\[ (\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip},\tilde{y}_{i}),\quad i=1,2,\ldots ,n, \]
where \(\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip},\tilde{y}_{i}\) are uncertain variables with uncertainty distributions \(\varPhi _{i1}, \varPhi _{i2}, \ldots , \varPhi _{ip}, \varPsi _{i}, i=1,2,\ldots ,n\), respectively. In order to perform prediction or inference, it is necessary to find the vector of unknown parameters, \({\varvec{\beta }}\). Since it is impossible to find the precise value of the parameters, we are interested in obtaining an estimate of \({\varvec{\beta }}\), denoted by \({\varvec{\beta }}^{*}\), based on the imprecisely observed data.
As proposed by Yao and Liu (2018), the least squares estimate of \({\varvec{\beta }}\) in the regression model (4) can be obtained by solving the minimization problem below:
\[ \min _{{\varvec{\beta }}}\sum _{i=1}^{n}E\left[ \left( \tilde{y}_{i}-f(\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip}|{\varvec{\beta }})\right) ^{2}\right] . \tag{6} \]
If we denote the optimal solution of the minimization problem (6) as \({\varvec{\beta }}^{*}\), then the fitted regression model is given as
\[ y=f(x_{1},x_{2},\ldots ,x_{p}|{\varvec{\beta }}^{*}). \]
Definition 6
The Gompertz regression model is defined as:
\[ y=\beta _{0}\exp \left( -\beta _{1}\exp (-\beta _{2}x)\right) +\epsilon , \]
where \(\beta _{0},\beta _{1},\beta _{2}\) are parameters.
The Gompertz regression model is widely used in biological systems to describe the population growth of a given species. It can accurately capture the characteristics of population growth in nature.
Theorem 3
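Let \((\tilde{x}_{i}, \tilde{y}_{i}), i=1,2,\ldots ,n\), be a set of imprecisely observed data, where \(\tilde{x}_{i}, \tilde{y}_{i}\) are independent uncertain variables with regular uncertainty distributions \(\varPhi _{i}, \varPsi _{i}, i=1,2,\ldots , n\), respectively. Then the least squares estimate of \(\beta _{0}, \beta _{1}\) and \(\beta _{2}\) in the Gompertz regression model is the optimal solution of the following minimization problem:
\[ \min _{\beta _{0},\beta _{1},\beta _{2}}\sum _{i=1}^{n}\int _{0}^{1}\left( \varPsi _{i}^{-1}(\alpha )-\beta _{0}\exp \left( -\beta _{1}\exp \left( -\beta _{2}\varPhi _{i}^{-1}(1-\alpha )\right) \right) \right) ^{2}\mathrm {d}\alpha . \]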
Proof
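By Eq. (6), the least squares estimate of \(\beta _{0}, \beta _{1}\) and \(\beta _{2}\) in the Gompertz regression model can be obtained by solving the minimization problem
\[ \min _{\beta _{0},\beta _{1},\beta _{2}}\sum _{i=1}^{n}E\left[ \left( \tilde{y}_{i}-\beta _{0}\exp \left( -\beta _{1}\exp (-\beta _{2}\tilde{x}_{i})\right) \right) ^{2}\right] . \tag{10} \]
For positive parameters \(\beta _{0}, \beta _{1}, \beta _{2}\), the function
\[ \tilde{y}_{i}-\beta _{0}\exp \left( -\beta _{1}\exp (-\beta _{2}\tilde{x}_{i})\right) \tag{11} \]
is strictly increasing with respect to \(\tilde{y}_{i}\) and strictly decreasing with respect to \(\tilde{x}_{i}\) for each i. According to Theorem 1, the inverse uncertainty distribution of function (11) is
\[ F_{i}^{-1}(\alpha )=\varPsi _{i}^{-1}(\alpha )-\beta _{0}\exp \left( -\beta _{1}\exp \left( -\beta _{2}\varPhi _{i}^{-1}(1-\alpha )\right) \right) . \]
Then according to Theorem 2, we have
\[ E\left[ \left( \tilde{y}_{i}-\beta _{0}\exp \left( -\beta _{1}\exp (-\beta _{2}\tilde{x}_{i})\right) \right) ^{2}\right] =\int _{0}^{1}\left( \varPsi _{i}^{-1}(\alpha )-\beta _{0}\exp \left( -\beta _{1}\exp \left( -\beta _{2}\varPhi _{i}^{-1}(1-\alpha )\right) \right) \right) ^{2}\mathrm {d}\alpha . \]
Thus the minimization problem (10) is equivalent to
\[ \min _{\beta _{0},\beta _{1},\beta _{2}}\sum _{i=1}^{n}\int _{0}^{1}\left( \varPsi _{i}^{-1}(\alpha )-\beta _{0}\exp \left( -\beta _{1}\exp \left( -\beta _{2}\varPhi _{i}^{-1}(1-\alpha )\right) \right) \right) ^{2}\mathrm {d}\alpha . \]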
The theorem is verified.
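The objective in Theorem 3 can be evaluated numerically once the inverse uncertainty distributions of the observations are known. The following is a minimal sketch, assuming every observation has a linear uncertainty distribution, the Gompertz curve has the common form \(\beta _{0}\exp (-\beta _{1}\exp (-\beta _{2}x))\) with positive parameters, and the integral is approximated by the midpoint rule. The data set and all names are hypothetical and are not the paper's data.

```python
import math


def gompertz(x, b0, b1, b2):
    """Gompertz curve b0 * exp(-b1 * exp(-b2 * x)) (assumed form, positive parameters)."""
    return b0 * math.exp(-b1 * math.exp(-b2 * x))


def linear_inv(alpha, a, b):
    """Inverse uncertainty distribution of the linear variable L(a, b)."""
    return (1 - alpha) * a + alpha * b


def objective(beta, data, steps=200):
    """Sum over i of the integral of (Psi_i^{-1}(a) - g(Phi_i^{-1}(1 - a)))^2 over (0, 1)."""
    b0, b1, b2 = beta
    h = 1.0 / steps
    total = 0.0
    for (xa, xb), (ya, yb) in data:
        for k in range(steps):
            alpha = (k + 0.5) * h
            y = linear_inv(alpha, ya, yb)      # Psi_i^{-1}(alpha)
            x = linear_inv(1 - alpha, xa, xb)  # Phi_i^{-1}(1 - alpha)
            total += (y - gompertz(x, b0, b1, b2)) ** 2 * h
    return total


# Hypothetical data: narrow linear intervals centred on a known Gompertz curve.
true_beta = (100.0, 5.0, 0.5)
data = []
for x in range(1, 11):
    y = gompertz(x, *true_beta)
    data.append(((x - 0.01, x + 0.01), (y - 0.1, y + 0.1)))

print(objective(true_beta, data))         # small: intervals are centred on the curve
print(objective((90.0, 5.0, 0.5), data))  # larger: beta_0 is misspecified
```

The least squares estimate would then be obtained by passing `objective` to a general-purpose numerical minimizer over the three parameters; the choice of minimizer is left open here.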
4 Residual analysis
In the regression model (4), there is always a disturbance term, \(\epsilon \), because it is usually impossible for the model to fit each observation point perfectly. The disturbance term indicates how far the dependent variable y may deviate from the regression function. Because of the nature of the disturbance, each observation has a different value of \(\epsilon \), and hence we are only interested in finding an estimate of \(\epsilon \) for the given set of imprecisely observed data
\[ (\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip},\tilde{y}_{i}),\quad i=1,2,\ldots ,n. \]
For each i, the difference between the observed value \(\tilde{y}_{i}\) and \(f(\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip}|{\varvec{\beta }}^{*})\) represents the distance between the observation and the fitted regression model and hence serves as an estimate of the disturbance term \(\epsilon \). This motivates the following definition:
Definition 7
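(Lio and Liu 2018) Let \((\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip},\tilde{y}_{i}), \quad i=1,2,\ldots ,n\) be a set of imprecisely observed data, and let the fitted regression model be
\[ y=f(x_{1},x_{2},\ldots ,x_{p}|{\varvec{\beta }}^{*}). \]
Then for each \(i \ (i=1,2,\ldots ,n)\), the term
\[ \hat{\epsilon }_{i}=\tilde{y}_{i}-f(\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots , \tilde{x}_{ip}|{\varvec{\beta }}^{*}) \]
is called the ith residual.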
In uncertainty theory, because the data are imprecisely observed, the ith residual \(\hat{\epsilon }_{i}\) is also an uncertain variable. Because each observation has a different disturbance term, we use the average of the expected values of the residuals,
\[ \hat{e}=\frac{1}{n}\sum _{i=1}^{n}E[\hat{\epsilon }_{i}], \tag{18} \]
to estimate the expected value of \(\epsilon \), and
\[ \hat{\sigma }^{2}=\frac{1}{n}\sum _{i=1}^{n}E\left[ (\hat{\epsilon }_{i}-\hat{e})^{2}\right] \tag{19} \]
to estimate the variance, where \(\hat{\epsilon }_{i}\) are the ith residuals, \(i=1,2,\ldots ,n\), respectively.
Theorem 4
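Let \((\tilde{x}_{i}, \tilde{y}_{i}), i=1,2,\ldots ,n\) be a set of imprecisely observed data, where \(\tilde{x}_{i}, \tilde{y}_{i}\) are independent uncertain variables with regular uncertainty distributions \(\varPhi _{i}, \varPsi _{i}, i=1,2,\ldots , n\), respectively, and let the fitted Gompertz regression model be denoted as
\[ y=\beta _{0}^{*}\exp \left( -\beta _{1}^{*}\exp (-\beta _{2}^{*}x)\right) . \]
Then the estimated expected value of \(\epsilon \) is
\[ \hat{e}=\frac{1}{n}\sum _{i=1}^{n}\int _{0}^{1}\left( \varPsi _{i}^{-1}(\alpha )-\beta _{0}^{*}\exp \left( -\beta _{1}^{*}\exp \left( -\beta _{2}^{*}\varPhi _{i}^{-1}(1-\alpha )\right) \right) \right) \mathrm {d}\alpha \]
and the estimated variance is
\[ \hat{\sigma }^{2}=\frac{1}{n}\sum _{i=1}^{n}\int _{0}^{1}\left( \varPsi _{i}^{-1}(\alpha )-\beta _{0}^{*}\exp \left( -\beta _{1}^{*}\exp \left( -\beta _{2}^{*}\varPhi _{i}^{-1}(1-\alpha )\right) \right) -\hat{e}\right) ^{2}\mathrm {d}\alpha . \]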
Proof
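For positive parameters \(\beta _{0}^{*}, \beta _{1}^{*}, \beta _{2}^{*}\), the function
\[ \hat{\epsilon }_{i}=\tilde{y}_{i}-\beta _{0}^{*}\exp \left( -\beta _{1}^{*}\exp (-\beta _{2}^{*}\tilde{x}_{i})\right) \tag{$*$} \]
is strictly increasing with respect to \(\tilde{y}_{i}\) and strictly decreasing with respect to \(\tilde{x}_{i}\) for each i. According to Theorem 1, the inverse uncertainty distribution of the function \((*)\) is
\[ F_{i}^{-1}(\alpha )=\varPsi _{i}^{-1}(\alpha )-\beta _{0}^{*}\exp \left( -\beta _{1}^{*}\exp \left( -\beta _{2}^{*}\varPhi _{i}^{-1}(1-\alpha )\right) \right) . \]
The theorem then follows from the definitions of the estimated expected value and variance of the residuals together with Theorem 2.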
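The two estimates in Theorem 4 are averages of one-dimensional integrals and can be approximated directly. The sketch below assumes linear uncertainty distributions for all observations, the Gompertz form \(\beta _{0}\exp (-\beta _{1}\exp (-\beta _{2}x))\), and a midpoint rule; `residual_estimates` and the sample values are hypothetical.

```python
import math


def gompertz(x, b0, b1, b2):
    """Gompertz curve b0 * exp(-b1 * exp(-b2 * x)) (assumed form)."""
    return b0 * math.exp(-b1 * math.exp(-b2 * x))


def linear_inv(alpha, a, b):
    """Inverse uncertainty distribution of the linear variable L(a, b)."""
    return (1 - alpha) * a + alpha * b


def residual_estimates(beta, data, steps=2000):
    """Return (e_hat, var_hat), the estimated expected value and variance of epsilon."""
    b0, b1, b2 = beta
    n = len(data)
    h = 1.0 / steps
    alphas = [(k + 0.5) * h for k in range(steps)]

    def resid_inv(alpha, obs):
        # Inverse distribution of the i-th residual: increasing in y, decreasing in x.
        (xa, xb), (ya, yb) = obs
        return linear_inv(alpha, ya, yb) - gompertz(linear_inv(1 - alpha, xa, xb), b0, b1, b2)

    e_hat = sum(resid_inv(a, obs) for obs in data for a in alphas) * h / n
    var_hat = sum((resid_inv(a, obs) - e_hat) ** 2 for obs in data for a in alphas) * h / n
    return e_hat, var_hat


# Hypothetical fitted parameters and two imprecise observations ((x interval), (y interval)).
beta_star = (100.0, 5.0, 0.5)
sample = [((3.9, 4.1), (49.0, 53.0)), ((5.9, 6.1), (76.0, 80.0))]
e_hat, var_hat = residual_estimates(beta_star, sample)
print(e_hat, var_hat)
```

The square root of `var_hat` is the \(\hat{\sigma }\) that a normal uncertainty distribution \(\mathcal {N}(\hat{e},\hat{\sigma })\) for the disturbance term would use.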
5 Forecast value and confidence interval
The main purpose of regression is to make predictions given a new observation of the independent variables. Suppose \(\tilde{x}_{p}\) is a new observation of the independent variable, where \(\tilde{x}_{p}\) is an uncertain variable with regular uncertainty distribution \(\varPhi _{p}\). Since the model has been constructed from the previous imprecisely observed data \((\tilde{x}_{i},\tilde{y}_{i}), \ i=1,2,\ldots ,n\), we can make predictions and derive the forecast value for the new dependent variable \(\tilde{y}_{p}\).
For example, suppose scientists want to investigate the population of wild lions in a new plain. From a previous imprecisely observed data set, the scientists choose the area of the plain as the independent variable and the population of wild lions as the dependent variable. An uncertain Gompertz regression model can then be constructed from these observations. By obtaining the data from the new plain, we can use the model to predict the population of wild lions there.
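Mathematically, we can denote the fitted uncertain Gompertz regression model constructed from the previous observations as
\[ y=\beta _{0}^{*}\exp \left( -\beta _{1}^{*}\exp (-\beta _{2}^{*}x)\right) , \]
where the residual \(\epsilon \) has estimated expected value \(\hat{e}\) and variance \(\hat{\sigma }^{2}\), and is independent of \(\tilde{x}_{p}\). Then the forecast uncertain variable of y with respect to \(\tilde{x}_{p}\) is determined by
\[ \hat{y}=\beta _{0}^{*}\exp \left( -\beta _{1}^{*}\exp (-\beta _{2}^{*}\tilde{x}_{p})\right) +\epsilon . \]
A natural first step is to find a point estimate for the new y. The forecast value of y is defined as
\[ \mu =\beta _{0}^{*}E\left[ \exp \left( -\beta _{1}^{*}\exp (-\beta _{2}^{*}\tilde{x}_{p})\right) \right] +\hat{e}, \tag{22} \]
or, for positive \(\beta _{1}^{*}\) and \(\beta _{2}^{*}\), we can use the integral to express \(\mu \):
\[ \mu =\beta _{0}^{*}\int _{0}^{1}\exp \left( -\beta _{1}^{*}\exp \left( -\beta _{2}^{*}\varPhi _{p}^{-1}(\alpha )\right) \right) \mathrm {d}\alpha +\hat{e}. \]
The forecast value of y represents the expected value of the forecast uncertain variable \(\hat{y}\). Assume that the disturbance term \(\epsilon \) has a normal uncertainty distribution \(\mathcal {N}(\hat{e},\hat{\sigma })\) with inverse uncertainty distribution \(\varPhi ^{-1}(\alpha )\), i.e.,
\[ \varPhi ^{-1}(\alpha )=\hat{e}+\frac{\hat{\sigma }\sqrt{3}}{\pi }\ln \frac{\alpha }{1-\alpha }. \]
We can then derive the inverse uncertainty distribution of \(\hat{y}\):
\[ \hat{\varPsi }^{-1}(\alpha )=\beta _{0}^{*}\exp \left( -\beta _{1}^{*}\exp \left( -\beta _{2}^{*}\varPhi _{p}^{-1}(\alpha )\right) \right) +\varPhi ^{-1}(\alpha ). \]
The uncertainty distribution \(\hat{\varPsi }\) of \(\hat{y}\) can then be obtained by inverting \(\hat{\varPsi }^{-1}\).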
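The forecast value and the inverse uncertainty distribution of \(\hat{y}\) can be sketched in a few lines. This example assumes a linear uncertainty distribution for the new observation \(\tilde{x}_{p}\), a normal uncertain disturbance term, positive Gompertz parameters (so that the curve is increasing in x), and a midpoint rule for the integral. The parameter values and function names are hypothetical.

```python
import math


def gompertz(x, b0, b1, b2):
    """Gompertz curve b0 * exp(-b1 * exp(-b2 * x)) (assumed form)."""
    return b0 * math.exp(-b1 * math.exp(-b2 * x))


def linear_inv(alpha, a, b):
    """Inverse uncertainty distribution of the linear variable L(a, b)."""
    return (1 - alpha) * a + alpha * b


def normal_inv(alpha, e, sigma):
    """Inverse uncertainty distribution of the normal variable N(e, sigma)."""
    return e + sigma * math.sqrt(3) / math.pi * math.log(alpha / (1 - alpha))


def forecast_value(beta, x_new, e_hat, steps=10_000):
    """mu = integral over (0, 1) of g(Phi_p^{-1}(alpha)) d(alpha) + e_hat."""
    xa, xb = x_new
    h = 1.0 / steps
    integral = sum(gompertz(linear_inv((k + 0.5) * h, xa, xb), *beta) for k in range(steps)) * h
    return integral + e_hat


def forecast_inv(alpha, beta, x_new, e_hat, sigma_hat):
    """Inverse uncertainty distribution of y_hat = g(x_new) + epsilon, 0 < alpha < 1."""
    xa, xb = x_new
    return gompertz(linear_inv(alpha, xa, xb), *beta) + normal_inv(alpha, e_hat, sigma_hat)


# Hypothetical fitted model, residual estimates, and new observation x_p ~ L(4, 5).
beta_star = (100.0, 5.0, 0.5)
mu = forecast_value(beta_star, (4.0, 5.0), e_hat=0.2)
print(mu, forecast_inv(0.9, beta_star, (4.0, 5.0), 0.2, 1.5))
```

Both terms of `forecast_inv` use the same \(\alpha \) because the forecast variable is increasing in \(\tilde{x}_{p}\) and in \(\epsilon \) under the positivity assumption.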
The forecast value, \(\mu \), is a point estimate of y. However, we are more interested in finding a range of values within which y might fall. Hence, the confidence interval of the uncertain Gompertz regression model is proposed. Although an interval is less precise than a point estimate, we can be more confident that the prediction is correct. Let \(\alpha \) denote the confidence level, which indicates the belief degree that the corresponding confidence interval contains the true value of y. We can derive the confidence interval by finding the minimum value b such that
\[ \hat{\varPsi }(\mu +b)-\hat{\varPsi }(\mu -b)\ge \alpha . \tag{26} \]
Since
\[ \mathcal {M}\{ \mu -b \le \hat{y} \le \mu +b \} \ge \hat{\varPsi }(\mu +b)-\hat{\varPsi }(\mu -b), \]
it follows that \(\mathcal {M}\{ \mu -b \le \hat{y} \le \mu +b \} \ge \alpha \). Thus the \(\alpha \) confidence interval of y can be represented as \([\mu -b,\mu +b]\), which can be simplified as
\[ \mu \pm b. \]
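Finding the minimum half-width b is a one-dimensional search, because the coverage \(\hat{\varPsi }(\mu +b)-\hat{\varPsi }(\mu -b)\) is nondecreasing in b. The sketch below is one possible numerical approach and not necessarily the authors' procedure: it evaluates \(\hat{\varPsi }\) by bisection on a strictly increasing inverse distribution and then bisects on b. The function names, the upper bound `b_max`, and the tolerances are assumptions.

```python
import math


def cdf_from_inverse(inv, y, tol=1e-12):
    """Evaluate Psi(y) by bisection on alpha, given a strictly increasing inverse inv."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if inv(mid) <= y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def half_width(inv, mu, level, b_max=1e6, tol=1e-9):
    """Smallest b (up to tol) with Psi(mu + b) - Psi(mu - b) >= level."""

    def coverage(b):
        return cdf_from_inverse(inv, mu + b) - cdf_from_inverse(inv, mu - b)

    lo, hi = 0.0, b_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid) >= level:
            hi = mid
        else:
            lo = mid
    return hi


def normal_inv(alpha, e, sigma):
    """Inverse uncertainty distribution of the normal variable N(e, sigma)."""
    return e + sigma * math.sqrt(3) / math.pi * math.log(alpha / (1 - alpha))


# Illustration with a purely normal forecast variable N(0, 1) and mu = 0.
b = half_width(lambda a: normal_inv(a, 0.0, 1.0), mu=0.0, level=0.95)
print(b)
```

For the Gompertz forecast, `inv` would be the inverse uncertainty distribution \(\hat{\varPsi }^{-1}\) of \(\hat{y}\) and `mu` the forecast value. In the normal-only illustration the result can be checked against the closed form \(\frac{\sqrt{3}}{\pi }\ln 39\), which follows from \(2\varPhi (b)-1=0.95\).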
6 Numerical example
In this section, we use an example to illustrate how the uncertain Gompertz regression model makes a prediction for a new, imprecisely observed value of the independent variable. Furthermore, we provide a confidence interval, which gives a more reliable prediction than the point estimate alone.
The data concern spore germination, where the independent variable is the number of days since the beginning of the experiment and the dependent variable is the germination rate. Because of the experimental conditions, the data can only be observed imprecisely, and hence an uncertain Gompertz regression model is appropriate.
We use \((\tilde{x}_{i},\tilde{y}_{i}), i=1,2,\ldots ,18\) to denote the set of imprecisely observed spore germination data, where \(\tilde{x}_{i},\tilde{y}_{i}\) are independent uncertain variables with linear uncertainty distributions \(\varPhi _{i}, \varPsi _{i}\), respectively. Here \(\tilde{x}_{i}\) represents the day and \(\tilde{y}_{i}\) represents the germination rate. The specific data are provided in Table 1.
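We may use the uncertain Gompertz regression model to forecast the germination rate for any given day. The uncertain Gompertz regression model is given as
\[ y=\beta _{0}\exp \left( -\beta _{1}\exp (-\beta _{2}x)\right) +\epsilon . \]
In order to obtain the least squares estimate of \(\beta _{0}, \beta _{1}\) and \(\beta _{2}\) in the Gompertz regression model, we need to solve the minimization problem (6), i.e.,
\[ \min _{\beta _{0},\beta _{1},\beta _{2}}\sum _{i=1}^{18}E\left[ \left( \tilde{y}_{i}-\beta _{0}\exp \left( -\beta _{1}\exp (-\beta _{2}\tilde{x}_{i})\right) \right) ^{2}\right] , \]
or equivalently,
\[ \min _{\beta _{0},\beta _{1},\beta _{2}}\sum _{i=1}^{18}\int _{0}^{1}\left( \varPsi _{i}^{-1}(\alpha )-\beta _{0}\exp \left( -\beta _{1}\exp \left( -\beta _{2}\varPhi _{i}^{-1}(1-\alpha )\right) \right) \right) ^{2}\mathrm {d}\alpha . \]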
From Theorem 3, we can obtain the least squares estimate
The fitted Gompertz regression model is then
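From Eq. (18), i.e.,
\[ \hat{e}=\frac{1}{18}\sum _{i=1}^{18}E[\hat{\epsilon }_{i}], \]
we can obtain the estimated expected value of the disturbance term \(\epsilon \), and from Eq. (19), i.e.,
\[ \hat{\sigma }^{2}=\frac{1}{18}\sum _{i=1}^{18}E\left[ (\hat{\epsilon }_{i}-\hat{e})^{2}\right] , \]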
we can obtain the variance of the disturbance term \(\epsilon \). The estimated expected value and variance of \(\epsilon \) are
respectively. Now suppose
is a new imprecisely observed day. Assuming the new day \(\tilde{x}_{p}\) is independent of \(\epsilon \), we can then obtain the forecast uncertain variable of the dependent variable y
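and the forecast value of y is 83.4837, which can be obtained from Eq. (22), i.e.,
\[ \mu =\beta _{0}^{*}E\left[ \exp \left( -\beta _{1}^{*}\exp (-\beta _{2}^{*}\tilde{x}_{p})\right) \right] +\hat{e}. \]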
For the confidence level \(\alpha =95\%\), if we suppose that the disturbance term \(\epsilon \) is a normal uncertain variable, then
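is the minimum value for Eq. (26), i.e.,
\[ \hat{\varPsi }(\mu +b)-\hat{\varPsi }(\mu -b)\ge 95\%. \]
Here \(\hat{\varPsi }\) is the uncertainty distribution of \(\hat{y}\) and can be calculated from its inverse
\[ \hat{\varPsi }^{-1}(\alpha )=\beta _{0}^{*}\exp \left( -\beta _{1}^{*}\exp \left( -\beta _{2}^{*}\varPhi _{p}^{-1}(\alpha )\right) \right) +\varPhi ^{-1}(\alpha ), \]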
where \(\varPhi ^{-1}(\alpha )\) is the inverse uncertainty distribution of normal uncertain variable \(\mathcal {N}(\hat{e}, \hat{\sigma })\). The \(95\%\) confidence interval of dependent variable y is then
7 Conclusion
This article introduced the uncertain Gompertz regression model, together with the least squares estimation of its parameters, its residual analysis, and its forecast value and confidence interval. We also provided a numerical example of spore germination to illustrate the application of the uncertain Gompertz regression model to a real-life problem.
References
Corral N, Gil MA (1984) The minimum inaccuracy fuzzy estimation: an extension of the maximum likelihood principle. Stochastica 8:63–81
Fisher RA (1925) Statistical methods for research workers. Oliver and Boyd, Edinburgh
Galton F (1885) Regression towards mediocrity in hereditary stature. J Anthropol Inst 15:246–263
Gompertz B (1825) On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philos Trans B 115:513–585
Laird AK (1964) Dynamics of tumor growth. Br J Cancer 18:490–502
Legendre AM (1805) New methods for the determination of the orbits of comets. Firmin Didot, Paris
Lio W, Liu B (2018) Residual and confidence interval for uncertain regression model with imprecise observations. J Intell Fuzzy Syst 35:2573–2583
Liu B (2007) Uncertainty theory, 2nd edn. Springer, Berlin
Liu B (2009) Some research problems in uncertainty theory. J Uncertain Syst 3:3–10
Liu B (2010) Uncertainty theory: a branch of mathematics for modeling human uncertainty. Springer, Berlin
Liu B (2012) Why is there a need for uncertainty theory. J Uncertain Syst 6:3–10
Neyman J, Pearson ES (1933) On the problem of the most efficient tests of statistical hypotheses. Philos Trans R Soc A Math Phys Eng Sci 231:289–337
Nguimkeu P (2014) A simple selection test between the Gompertz and logistic growth models. Technol Forecast Soc Change 88:98–105
Student (1908) The probable error of a mean. Biometrika 6:1–25
Tanaka H, Uejima S, Asai K (1982) Linear regression analysis with fuzzy model. IEEE Trans Syst Man Cybern 12:903–907
Wilks SS (1938) The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann Math Stat 9:60–62
Yang X, Liu B (2017) Uncertain time series analysis with imprecise observations. Fuzzy Optim Decis Mak (in press)
Yao K, Liu B (2018) Uncertain regression analysis: an approach for imprecise observations. Soft Comput 22:5579–5582
Zwietering MH, Jongenburger I, Rombouts FM, van 't Riet K (1990) Modeling of the bacterial growth curve. Appl Environ Microbiol 56:1875–1881
Acknowledgements
This work was supported by National Natural Science Foundation of China (Grant No. 61374082).
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Ethical Approval
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Communicated by Y. Ni.
Cite this article
Hu, Z., Gao, J. Uncertain Gompertz regression model with imprecise observations. Soft Comput 24, 2543–2549 (2020). https://doi.org/10.1007/s00500-018-3611-1
DOI: https://doi.org/10.1007/s00500-018-3611-1