Abstract
The multivariate regression model is a mathematical tool for estimating the relationships between a set of explanatory variables and a set of response variables. In some cases, the observed data are imprecise. To model such imprecise data, we can employ uncertainty theory and design an uncertain regression model that regards the data as uncertain variables. Parameter estimation is an important topic in the uncertain regression model. In this paper, we explore a method of parameter estimation based on the principle of least squares for the multivariate uncertain regression model, which contains more than one response variable and treats both explanatory variables and response variables as uncertain variables. In addition, when new explanatory variables are given, we propose an approach to obtain the forecast value and the confidence interval of the response variables. Finally, a numerical example of the multivariate uncertain regression model is presented.
1 Introduction
In order to understand the relationships among many factors, people need to impose structure on those factors. Usually, we build a regression model to describe how changes in some variables (explanatory variables) affect other variables (response variables). If the regression model contains only one response variable, we call it a multiple regression model. Furthermore, if we want to study the relationships between explanatory variables and more than one response variable, there are two approaches. One is to establish a multiple regression model for each response variable and all explanatory variables, and to consider those models independently. The other is to design a multivariate regression model including all response variables and explanatory variables, which takes the relationships among the response variables into consideration. The latter is often preferable, because we frequently meet cases where the correlation among the response variables is high. For example, suppose we want to study the relationships between the systolic and diastolic blood pressure of a patient (response variables) and his gender, body temperature, and heart rate (explanatory variables). Since systolic blood pressure is highly correlated with diastolic blood pressure, it is inadvisable to model them separately. Thus, a multivariate regression model is more reasonable. In fact, the very reason for employing a multivariate model is to incorporate the relationships among the response variables.
In the statistical domain, the relationship between each response variable and the explanatory variables is expressed by a function, and is therefore called a functional relationship. For example, in the multivariate linear regression model, all the functions are linear. Generally speaking, in the multivariate regression model, the functional relationships should be determined in advance through experience, although the functions contain unknown parameters. A simple, well-tried model is not only easier to remember but can also inspire new ideas; the process of modeling is the process of understanding the world. In statistics, the most widely used model is the linear model. Galton (1886) first used the term “regression” for a simple linear regression model studying the relationship between children's heights and their parents' heights. Twelve years later, Yule (1897) introduced regression into the statistical domain. Fama et al. (1969) used event study methodology in the multivariate regression model to study the effect of new information on asset prices. In recent years, Ganesh et al. (2018) presented an individual regression method to predict the \(\mathrm{PM}_{2.5}\) concentration, and Krishnamurthy et al. (2019) used support vector regression to calculate the Lyapunov exponents of short time series.
In the multivariate regression model, it is vital to estimate the unknown parameters based on the given observations. Multivariate least squares estimation is the most widely used method, generalized by Aitken (1935) and developed by Watson (1967). Besides, multivariate least absolute estimation (Gentle 1977; Bilodeau and Brenner 1999), maximum likelihood estimation (Anderson 1951), and least distance estimation (Bai et al. 1990) are other common methods of point estimation. However, those methods do not consider the relationships among the response variables. In order to take these relationships into consideration, Breiman and Friedman (1997) proposed a restrained multivariate least squares estimation via canonical analysis, and Jhun and Choi (2009) presented a bootstrapping least distance estimation for the multivariate regression model.
Note that the explanatory variables and the response variables in the traditional regression model are assumed to be observed precisely. However, in some cases the observations cannot be precise. For example, data on factories' carbon emissions or social benefit over some period are collected in an imprecise way. How do we model such imprecise data? Liu (2012) suggested employing uncertainty theory to model imprecisely observed data given by domain experts. Uncertainty theory was founded by Liu (2007) and refined by Liu (2009) based on the normality, duality, subadditivity, and product axioms, in order to deal with belief degrees arising from human uncertainty. A regression model based on uncertainty theory is called an uncertain regression model, in which the imprecisely observed data are regarded as uncertain variables. Estimating the unknown parameters in the uncertain regression model is an important topic. On the one hand, for the uncertain multiple regression model with only one response variable, many scholars have proposed methods such as least squares estimation (Yao and Liu 2018), least absolute deviations estimation (Liu and Yang 2019), and maximum likelihood estimation (Lio and Liu 2019). In addition, Lio and Liu (2018) explored interval estimation for predicting response variables. On the other hand, if the uncertain regression model includes more than one response variable, we call it a multivariate uncertain regression model. Song and Fu (2018) applied least squares estimation to a multivariate uncertain regression model in which only the observed data of the response variables are imprecise. In this paper, we study the multivariate uncertain regression model in which the observed data of both the explanatory variables and the response variables are imprecise. Our work mainly includes parameter estimation, residual analysis, forecast values, and confidence intervals.
The rest of the paper is organized as follows: In Sect. 2, we propose the multivariate uncertain regression model and estimate its parameters. In Sect. 3, we analyze the residuals based on those estimations. In Sect. 4, a confidence interval is suggested to forecast the response variables when new explanatory variables are given. In Sect. 5, we provide an example to show the application of the multivariate uncertain regression model. Finally, some conclusions are drawn in Sect. 6.
2 Multivariate uncertain regression model
Assume \((x_1,x_2,\ldots ,x_p)\) is a vector of explanatory variables and \((y_1,y_2,\ldots ,y_q)\) is a vector of response variables. The functional relationships between \(y_j\) and \(x_1,\) \(x_2,\ldots ,x_p\) are assumed to be expressed by the multivariate regression model
where \({\varvec{\beta }}_{{\varvec{j}}}=(\beta _{0j},\beta _{1j},\ldots ,\beta _{pj})^T\) are vectors of unknown parameters and \(\varepsilon _j\) are disturbance terms for \(j=1,2,\) \(\ldots ,q\).
In the traditional model, we assume that both \((x_1,x_2,\ldots ,x_p)\) and \((y_1,y_2,\ldots ,y_q)\) are precisely observable. However, the observations we can obtain are imprecise in some cases, and thus those observations should be characterized as uncertain variables. Assume that we have the observed data of \(x_1,x_2,\ldots ,x_p\) and \(y_1,y_2,\ldots ,y_q\) as follows,
where \({\widetilde{x}}_{i1},{\widetilde{x}}_{i2},\ldots ,{\widetilde{x}}_{ip},{\widetilde{y}} _{i1},{\widetilde{y}} _{i2},\ldots ,{\widetilde{y}} _{iq}\) are uncertain variables with uncertainty distributions \(\varPhi _{i1},\varPhi _{i2},\ldots ,\varPhi _{ip},\varPsi _{i1},\) \(\varPsi _{i2},\ldots ,\varPsi _{iq}\) for \(i=1,2,\ldots ,n\), respectively. For simplicity, denote
The solution of the minimization problem
is the least squares estimate of \({\varvec{\beta }}\) in the multivariate regression model (1). Denote the optimal solution by
Then the fitted regression model is
Theorem 1
Suppose the imprecisely observed data \({\widetilde{x}}_{i1},{\widetilde{x}}_{i2},\ldots ,{\widetilde{x}}_{ip},{\widetilde{y}}_{i1},{\widetilde{y}}_{i2},\ldots ,{\widetilde{y}}_{iq}\), \(i=1,2,\ldots ,n\), are independent uncertain variables with regular uncertainty distributions \(\varPhi _{i1},\varPhi _{i2},\ldots ,\varPhi _{ip},\varPsi _{i1},\varPsi _{i2},\ldots ,\varPsi _{iq}\), \(i=1,2,\ldots ,n\), respectively. Then the least squares estimate of
in the multivariate linear regression model
is the solution of the following problem:
where
for \(i=1,2,\ldots ,n,j=1,2,\ldots ,q\) and \(k=1,2,\ldots ,p\).
Proof
The least squares estimate of \({\varvec{\beta }}\) in the linear regression model is the optimal solution of the following problem,
Since \({\widetilde{x}}_{i1},{\widetilde{x}}_{i2},\ldots ,{\widetilde{x}}_{ip},{\widetilde{y}} _{i1},{\widetilde{y}} _{i2},\ldots ,{\widetilde{y}} _{iq},i=1,2,\ldots ,n\) are independent, we can obtain that \({\widetilde{y}}_{ij}- \beta _{0j}-\sum _{k=1}^p \beta _{kj}{\widetilde{x}}_{ik}\) have the inverse uncertainty distributions
for \(i=1,2,\ldots ,n,j=1,2,\ldots ,q\), respectively. Thus,
\(i=1,2,\ldots ,n\), \(j=1,2,\ldots ,q\). Therefore, the least squares estimate of \({\varvec{\beta }}\) in the multivariate linear regression model is the solution of the minimization problem,
The theorem is proved. \(\square \)
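The reduction in Theorem 1 turns the uncertain least squares problem into an ordinary minimization over deterministic integrals, which can be solved numerically. The sketch below is one plausible implementation under stated assumptions, not the paper's own code: the data (which are illustrative, not from the paper) are linear uncertain variables given as intervals, the integral over \((0,1)\) is discretized on a uniform grid, and the residual's inverse uncertainty distribution evaluates \(\varPsi _{ij}^{-1}\) at \(\alpha \) and \(\varPhi _{ik}^{-1}\) at \(1-\alpha \) or \(\alpha \) according to the sign of \(\beta _{kj}\), following the operational law for independent regular uncertain variables.

```python
import numpy as np
from scipy.optimize import minimize

# Linear uncertainty distribution L(a, b): inverse is (1 - alpha) * a + alpha * b.
def lin_inv(a, b, alpha):
    return (1.0 - alpha) * a + alpha * b

# Illustrative imprecise data (NOT the paper's Table 1): each observation is an
# interval [a, b] read as a linear uncertain variable L(a, b).
X = np.array([[[0.9, 1.1], [1.8, 2.2]],
              [[1.9, 2.1], [2.7, 3.3]],
              [[2.8, 3.2], [3.9, 4.1]],
              [[3.9, 4.1], [4.8, 5.2]]])      # shape (n, p, 2)
Y = np.array([[[2.8, 3.2]],
              [[4.7, 5.3]],
              [[6.8, 7.2]],
              [[8.7, 9.3]]])                   # shape (n, q, 2)
n, p, _ = X.shape
q = Y.shape[1]
alphas = np.linspace(0.01, 0.99, 99)           # grid discretizing the integral

def objective(theta):
    beta = theta.reshape(p + 1, q)             # rows: beta_0j, beta_1j, ..., beta_pj
    total = 0.0
    for i in range(n):
        for j in range(q):
            # Inverse distribution of the residual (operational law):
            # Psi_ij^{-1}(alpha) - beta_0j - beta_kj * Phi_ik^{-1}(1 - alpha)
            # when beta_kj >= 0, and Phi_ik^{-1}(alpha) when beta_kj < 0.
            h = lin_inv(*Y[i, j], alphas) - beta[0, j]
            for k in range(p):
                b_kj = beta[k + 1, j]
                lvl = 1.0 - alphas if b_kj >= 0 else alphas
                h -= b_kj * lin_inv(*X[i, k], lvl)
            total += np.mean(h ** 2)           # approximates the integral on (0, 1)
    return total

beta_star = minimize(objective, np.zeros((p + 1) * q), method="Nelder-Mead",
                     options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-10}).x
print(beta_star.reshape(p + 1, q))
```

Because the sign conditions make the objective only piecewise smooth in \(\beta \), a derivative-free method such as Nelder–Mead is a safe default here.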
Theorem 2
Suppose the imprecisely observed data \({\widetilde{x}}_{i1},{\widetilde{x}}_{i2},\ldots ,{\widetilde{x}}_{ip},{\widetilde{y}}_{i1},{\widetilde{y}}_{i2},\ldots ,{\widetilde{y}}_{iq}\), \(i=1,2,\ldots ,n\), are independent uncertain variables with regular uncertainty distributions \(\varPhi _{i1},\varPhi _{i2},\ldots ,\varPhi _{ip},\varPsi _{i1},\varPsi _{i2},\ldots ,\varPsi _{iq}\), \(i=1,2,\ldots ,n\), respectively. Then the least squares estimate of
in the multivariate asymptotic regression model
is the optimal solution of the following problem:
Proof
The least squares estimate of \({\varvec{\beta }}\) in the asymptotic regression model is actually the optimal solution of the minimization problem,
Since \({\widetilde{x}}_i,{\widetilde{y}} _{i1},{\widetilde{y}} _{i2},\ldots ,{\widetilde{y}} _{iq},i=1,2,\ldots ,n\) are independent, we can obtain that \(\displaystyle {\widetilde{y}}_{ij}-\beta _{0j}+\beta _{1j}\exp {(-\beta _{2j}{\widetilde{x}}_i)}\) have the inverse uncertainty distributions
for \(i=1,2,\ldots ,n,j=1,2,\ldots ,q\), respectively. Thus,
\(i=1,2,\ldots ,n,j=1,2,\ldots ,q\). Therefore, the least squares estimate of \({\varvec{\beta }}\) in the multivariate asymptotic regression model is the solution of the minimization problem,
The theorem is proved. \(\square \)
Theorem 3
Suppose the imprecisely observed data \({\widetilde{x}}_{i1},{\widetilde{x}}_{i2},\ldots ,{\widetilde{x}}_{ip},{\widetilde{y}}_{i1},{\widetilde{y}}_{i2},\ldots ,{\widetilde{y}}_{iq}\), \(i=1,2,\ldots ,n\), are independent uncertain variables with regular uncertainty distributions \(\varPhi _{i1},\varPhi _{i2},\ldots ,\varPhi _{ip},\varPsi _{i1},\varPsi _{i2},\ldots ,\varPsi _{iq}\), \(i=1,2,\ldots ,n\), respectively. Then the least squares estimate of
in the multivariate Michaelis-Menten regression model
is the optimal solution of the following problem:
Proof
The least squares estimate of \({\varvec{\beta }}\) in the Michaelis-Menten regression model is the optimal solution of the minimization problem,
Since \({\widetilde{x}}_i,{\widetilde{y}} _{i1},{\widetilde{y}} _{i2},\ldots ,{\widetilde{y}} _{iq}\) are independent, we can obtain that \(\displaystyle {\widetilde{y}}_{ij}-\frac{\beta _{1j}{\widetilde{x}}_i}{\beta _{2j}+{\widetilde{x}}_i}\) have the inverse uncertainty distributions
for \(i=1,2,\ldots ,n,j=1,2,\ldots ,q\), respectively. Thus,
\(i=1,2,\ldots ,n,j=1,2,\ldots ,q\). Therefore, the least squares estimate of \({\varvec{\beta }}\) in the multivariate Michaelis-Menten regression model is the solution of the minimization problem,
The theorem is proved. \(\square \)
3 Multivariate residual analysis
In the regression model (1), there is a disturbance term \({\varvec{\varepsilon }}=(\varepsilon _1, \varepsilon _2, \ldots , \varepsilon _q)^{\mathrm{T}}\). It is difficult to discover the disturbance term \({\varvec{\varepsilon }}\) exactly since the term changes for each observation. Thus, we are concerned about how to estimate \({\varvec{\varepsilon }}\) based on imprecisely observed data, \({\widetilde{x}}_{i1},{\widetilde{x}}_{i2},\ldots ,\) \({\widetilde{x}}_{ip},{\widetilde{y}} _{i1},{\widetilde{y}} _{i2},\ldots ,{\widetilde{y}} _{iq},i=1,2,\ldots ,n\).
Definition 1
Assume the fitted regression model is
and \({\widetilde{x}}_{i1},{\widetilde{x}}_{i2},\ldots ,{\widetilde{x}}_{ip},{\widetilde{y}} _{i1},{\widetilde{y}} _{i2},\ldots ,{\widetilde{y}} _{iq}, i=1,2,\ldots ,n\) are imprecisely observed data. Then we call
the i-th residual for each i \((i=1,2,\ldots ,n)\).
Suppose that the disturbance term \({\varvec{\varepsilon }}=(\varepsilon _1, \varepsilon _2, \ldots , \) \(\varepsilon _q)^{\mathrm{T}}\) is an uncertain vector. Then, for each j \((j=1,2,\ldots ,q)\), we use the average of the expected values of residuals, i.e.,
to estimate the expected values of the disturbance term \(\varepsilon _j\), and
to estimate the variances. Then, we call
the vectors of the estimated expected values and estimated variances of disturbance term \({\varvec{\varepsilon }}\), respectively.
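These two estimates can be computed directly once the residuals' inverse uncertainty distributions are known. Below is a minimal numerical sketch (the helper name `residual_moments` and the toy linear residual distributions are illustrative, not from the paper), using the identity \(E[\xi ]=\int _0^1 \varUpsilon ^{-1}(\alpha )\,\mathrm{d}\alpha \) for the expected value and reading the variance estimate as the average of \(\int _0^1 (\varUpsilon ^{-1}(\alpha )-{\hat{e}})^2\,\mathrm{d}\alpha \) over the residuals, which is the usual stipulation when only the distributions are known.

```python
import numpy as np

alphas = np.linspace(0.005, 0.995, 199)        # grid discretizing (0, 1)

def residual_moments(h_inv_list):
    """Estimate the disturbance term's expected value and variance from the
    residuals' inverse uncertainty distributions (one callable per observation).

    e_hat    : average over i of E[residual_i] = integral of H_i^{-1}(alpha)
    sig2_hat : average over i of integral of (H_i^{-1}(alpha) - e_hat)^2
    """
    vals = np.array([h(alphas) for h in h_inv_list])   # shape (n, len(alphas))
    e_hat = vals.mean()                                # mean over alpha and i
    sig2_hat = ((vals - e_hat) ** 2).mean(axis=1).mean()
    return e_hat, sig2_hat

# Toy check: three residuals distributed L(-1, 1), L(-0.5, 1.5), L(-1.5, 0.5);
# the inverse of L(a, b) is (1 - alpha) * a + alpha * b.
h_list = [lambda a: -1 + 2 * a, lambda a: -0.5 + 2 * a, lambda a: -1.5 + 2 * a]
e_hat, sig2_hat = residual_moments(h_list)
print(e_hat, sig2_hat)
```

For this toy data the three expected values \(0, 0.5, -0.5\) average to \({\hat{e}}=0\), and the variance estimate lands near \(0.5\).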
Theorem 4
Suppose the imprecisely observed data \({\widetilde{x}}_{i1},{\widetilde{x}}_{i2},\ldots ,{\widetilde{x}}_{ip},{\widetilde{y}}_{i1},{\widetilde{y}}_{i2},\ldots ,{\widetilde{y}}_{iq}\), \(i=1,2,\ldots ,n\), are independent uncertain variables with regular uncertainty distributions \(\varPhi _{i1},\varPhi _{i2},\ldots ,\varPhi _{ip},\varPsi _{i1},\varPsi _{i2},\ldots ,\varPsi _{iq}\), \(i=1,2,\ldots ,n\), respectively. Let the fitted multivariate linear regression model be
Then the vector of estimated expected values of the disturbance term \({\varvec{\varepsilon }}\) is
and the vector of estimated variances is
where
for \(i=1,2,\ldots ,n,j=1,2,\ldots ,q\) and \(k=1,2,\ldots ,p\).
Proof
Since the inverse uncertainty distributions of \( {\widetilde{y}}_{ij}- \beta _{0j}^{*}-\sum _{k=1}^p \beta _{kj}^{*}{\widetilde{x}}_{ik}\) are
for \(i=1,2,\ldots ,n,j=1,2,\ldots ,q\), respectively, Theorem 4 holds immediately. \(\square \)
Theorem 5
Suppose the imprecisely observed data \({\widetilde{x}}_{i1},{\widetilde{x}}_{i2},\ldots ,{\widetilde{x}}_{ip},{\widetilde{y}}_{i1},{\widetilde{y}}_{i2},\ldots ,{\widetilde{y}}_{iq}\), \(i=1,2,\ldots ,n\), are independent uncertain variables with regular uncertainty distributions \(\varPhi _{i1},\varPhi _{i2},\ldots ,\varPhi _{ip},\varPsi _{i1},\varPsi _{i2},\ldots ,\varPsi _{iq}\), \(i=1,2,\ldots ,n\), respectively. Let the fitted multivariate asymptotic regression model be
Then the vector of estimated expected values of the disturbance term \({\varvec{\varepsilon }}\) is
and the vector of estimated variances is
Proof
Since the inverse uncertainty distributions of \(\displaystyle {\widetilde{y}}_{ij}-\beta _{0j}^{*}+\beta _{1j}^{*}\exp {(-\beta _{2j}^{*}{\widetilde{x}}_i)}\) are
for \(i=1,2,\ldots ,n,j=1,2,\ldots ,q\), respectively, Theorem 5 holds immediately. \(\square \)
Theorem 6
Suppose the imprecisely observed data \({\widetilde{x}}_{i1},{\widetilde{x}}_{i2},\ldots ,{\widetilde{x}}_{ip},{\widetilde{y}}_{i1},{\widetilde{y}}_{i2},\ldots ,{\widetilde{y}}_{iq}\), \(i=1,2,\ldots ,n\), are independent uncertain variables with regular uncertainty distributions \(\varPhi _{i1},\varPhi _{i2},\ldots ,\varPhi _{ip},\varPsi _{i1},\varPsi _{i2},\ldots ,\varPsi _{iq}\), \(i=1,2,\ldots ,n\), respectively. Let the fitted multivariate Michaelis-Menten regression model be
Then the vector of estimated expected values of the disturbance term \({\varvec{\varepsilon }}\) is
and the vector of estimated variances is
Proof
Since the inverse uncertainty distributions of \(\displaystyle {\widetilde{y}}_{ij}-\frac{\beta _{1j}{\widetilde{x}}_i}{\beta _{2j}+{\widetilde{x}}_i}\) are
for \(i=1,2,\ldots ,n,j=1,2,\ldots ,q\), respectively, Theorem 6 holds immediately. \(\square \)
4 Forecast value and confidence interval
In Sects. 2 and 3, we obtain the least squares estimate \({\varvec{\beta }}^{*}\) and the estimates \({\hat{{\varvec{e}}}}\) and \(\hat{{\varvec{\sigma }}}^{{\varvec{2}}}\) of the expected value and variance of the disturbance term \({\varvec{\varepsilon }}\) based on the imprecisely observed data \(({\widetilde{x}}_{i1},{\widetilde{x}}_{i2},\ldots ,{\widetilde{x}}_{ip},{\widetilde{y}}_{i1},{\widetilde{y}}_{i2},\ldots ,{\widetilde{y}}_{iq})\), \(i=1,2,\ldots ,n\). Based on this work, we are interested in forecasting the response vector for a new explanatory vector. Assume that \({\varvec{{\widetilde{x}}}}=({\widetilde{x}}_{1}, {\widetilde{x}}_{2}, \ldots , {\widetilde{x}}_{p})^{\mathrm{T}}\) is a vector of new explanatory variables, where \({\widetilde{x}}_{1},{\widetilde{x}}_{2},\ldots ,{\widetilde{x}}_{p}\) are independent uncertain variables with regular uncertainty distributions \(\varPhi _1,\varPhi _2,\ldots ,\varPhi _p\), respectively. Although the relationship between the uncertain explanatory vector and the uncertain response vector may be complicated, it is still valuable to apply a linear regression model to the data. Suppose that the fitted linear regression model is
and the disturbance term \({\varvec{\varepsilon }}\) has the estimated expected value \({\hat{{\varvec{e}}}}\) and variance \(\hat{{\varvec{\sigma }}}^{{\varvec{2}}}\), and is independent of \({\widetilde{x}}_{1},{\widetilde{x}}_{2},\) \(\ldots ,{\widetilde{x}}_{p}\). Then the forecast uncertain vector of \({\varvec{y}}=(y_1,\) \( y_2, \ldots , y_q)^{\mathrm{T}}\) with respect to \(({\widetilde{x}}_{1},{\widetilde{x}}_{2},\ldots ,{\widetilde{x}}_{p})\) is determined by
For each j \((j=1,2,\ldots ,q)\), a single value of \(y_j\) should be estimated from the forecast uncertain vector, and it is natural to define the forecast value of \(y_j\) as
which is the expected value of the forecast uncertain variable \({\hat{y}}_j\). Then we write
as the forecast value of \({\varvec{y}}\).
Furthermore, in Eq. (4), assume that the disturbances \(\varepsilon _1,\varepsilon _2,\ldots ,\varepsilon _q\) follow the same type of distribution, although their expected values and variances may differ across equations. In particular, if \(\varepsilon _1,\varepsilon _2,\ldots ,\varepsilon _q\) are normal uncertain variables \({\mathcal {N}}({\hat{e}}_1,{\hat{\sigma }}_1),{\mathcal {N}}({\hat{e}}_2,{\hat{\sigma }}_2),\ldots ,{\mathcal {N}}({\hat{e}}_q,{\hat{\sigma }}_q)\), respectively, then the inverse uncertainty distributions of \({\hat{y}}_j\) are determined by
where
and \(\phi _j^{-1}(\alpha )\) are the inverse uncertainty distributions of \(\varepsilon _j\), i.e.,
for \(j=1,2,\ldots ,q,k=1,2,\ldots ,p\), respectively. Then the uncertainty distributions \({\hat{\varPsi }}_{j}\) of \({\hat{y}}_j\) can be obtained by \({\hat{\varPsi }}_{j}^{-1}\), \(j=1,2,\ldots ,q\), respectively.
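Once \({\hat{\varPsi }}_{j}^{-1}\) is available, the forecast value \(\mu _j\) is simply its integral over \((0,1)\). A small numerical sketch follows; the toy model \({\hat{y}}=2{\widetilde{x}}+\varepsilon \) and all of its numbers are illustrative assumptions, and the inverse distribution used for \(\varepsilon \) is that of Liu's normal uncertain variable \({\mathcal {N}}(e,\sigma )\).

```python
import numpy as np

def normal_inv(e, sigma, alpha):
    # Inverse distribution of a normal uncertain variable N(e, sigma) in
    # uncertainty theory: e + (sigma * sqrt(3) / pi) * ln(alpha / (1 - alpha)).
    return e + sigma * np.sqrt(3) / np.pi * np.log(alpha / (1.0 - alpha))

def forecast_value(y_hat_inv, n_grid=9999):
    # mu = E[y_hat] = integral over (0, 1) of the inverse distribution,
    # approximated by a midpoint rule on a uniform grid.
    alphas = (np.arange(n_grid) + 0.5) / n_grid
    return np.mean(y_hat_inv(alphas))

# Toy forecast: y_hat = 2 * x_tilde + eps with x_tilde ~ L(1, 3) (inverse
# 1 + 2 * alpha) and eps ~ N(0.1, 0.5); the coefficient 2 is positive, so the
# operational law evaluates both inverses at the same level alpha.
y_inv = lambda a: 2 * (1 + 2 * a) + normal_inv(0.1, 0.5, a)
mu = forecast_value(y_inv)
print(mu)   # E[2 * x_tilde] + E[eps] = 4 + 0.1
```

Here the expected value decomposes linearly, so the numerical integral should recover \(4.1\) up to rounding.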
For each j \((j=1,2,\ldots ,q)\), the forecast value \(\mu _j\) is a point estimate of \(y_j\). However, a single point estimate can hardly be claimed to be accurate, since each component of the uncertain vector \({\varvec{y}}\) is not a precise value. An interval estimate, such as \(3\sim 4\), is more convincing: although the range is wider, its reliability is clearly higher. Thus, we propose the confidence interval to estimate \({\varvec{y}}\).
Taking \(\alpha \) (e.g., 95\(\%\)) as a confidence level, we are interested in finding the minimum values \(b_j\) such that
\(j=1,2,\ldots ,q\), respectively. Since
the \(\alpha \) confidence intervals of \(y_j\) are suggested as \([\mu _j-b_j,\mu _j+b_j]\), which can be abbreviated as \(\mu _j\pm b_j\), \(j=1,2,\ldots ,q\), respectively. Denote
Then, the \(\alpha \) confidence interval of \({\varvec{y}}\) is written as
which represents the set
and the confidence interval covers \({\varvec{y}}\) with chance \(\alpha \).
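Finding each \(b_j\) amounts to a one-dimensional search: since the distribution \({\hat{\varPsi }}_{j}\) is increasing, the smallest \(b\) with \({\hat{\varPsi }}_{j}(\mu _j+b)-{\hat{\varPsi }}_{j}(\mu _j-b)\ge \alpha \) can be located by bisection, following the interval construction of Lio and Liu (2018). A hedged sketch with an assumed normal disturbance (function names and all numbers are illustrative):

```python
import numpy as np

def normal_cdf(e, sigma, x):
    # Uncertainty distribution of a normal uncertain variable N(e, sigma).
    return 1.0 / (1.0 + np.exp(np.pi * (e - x) / (np.sqrt(3) * sigma)))

def min_half_width(cdf, mu, level=0.95, hi=1e6, tol=1e-10):
    # Smallest b with cdf(mu + b) - cdf(mu - b) >= level, found by bisection
    # (the covered measure is increasing in b, so bisection applies).
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mu + mid) - cdf(mu - mid) >= level:
            hi = mid
        else:
            lo = mid
    return hi

e, sigma = 4.1, 0.5
b = min_half_width(lambda x: normal_cdf(e, sigma, x), mu=e, level=0.95)
print(e - b, e + b)   # the 95% confidence interval mu +/- b
```

Because this toy distribution is symmetric about \(\mu \), the bisection answer can be checked against the closed form \(b=(\sigma \sqrt{3}/\pi )\ln 39\) for the \(95\%\) level.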
5 Numerical example
In this section, we design a numerical example to show the estimation of unknown parameters, residual analysis, forecast value and confidence interval in multivariate uncertain regression model.
Consider the linear regression model
Denote
Suppose imprecisely observed data \({\widetilde{x}}_{i1},{\widetilde{x}}_{i2},{\widetilde{x}}_{i3},{\widetilde{x}}_{i4},{\widetilde{y}} _{i1},\) \({\widetilde{y}} _{i2},{\widetilde{y}} _{i3}\), \(i=1,2,\ldots ,21\) are independent uncertain variables, where \({\widetilde{x}}_{i1},{\widetilde{x}}_{i2},{\widetilde{x}}_{i3},{\widetilde{x}}_{i4},{\widetilde{y}} _{i1},{\widetilde{y}} _{i2},{\widetilde{y}} _{i3}\) have linear uncertainty distributions, \(\varPhi _{i1},\varPhi _{i2},\varPhi _{i3},\varPhi _{i4},\varPsi _{i1},\varPsi _{i2},\varPsi _{i3}\), respectively. See the data in Table 1.
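Since the example's observations are linear uncertain variables, the only ingredients the estimation needs are the distribution \({\mathcal {L}}(a,b)\) and its inverse. A quick sketch of both (Table 1's actual values are not reproduced here):

```python
import numpy as np

def linear_cdf(a, b, x):
    # Uncertainty distribution of L(a, b): 0 below a, 1 above b, linear between.
    return np.clip((np.asarray(x, dtype=float) - a) / (b - a), 0.0, 1.0)

def linear_inv(a, b, alpha):
    # Inverse uncertainty distribution: (1 - alpha) * a + alpha * b.
    return (1.0 - alpha) * a + alpha * b

# The expected value of L(a, b) is (a + b) / 2, recovered here by numerically
# integrating the inverse distribution over (0, 1).
alphas = (np.arange(1000) + 0.5) / 1000
print(linear_inv(2.0, 4.0, alphas).mean())   # approximately 3.0
```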
First, we estimate the unknown parameters. That is, we should solve the following problem,
It follows from Theorem 1 that problem (7) can be transformed into the equivalent form
where
for \(i=1,2,\ldots ,21\), \(j=1,2,3\) and \(k=1,2,3,4\). Then we can obtain the least squares estimate
Thus, the fitted multivariate linear regression model is
It follows from Theorem 4 that we obtain the vectors of the estimated expected values and estimated variances of disturbance term, i.e.,
respectively. Now assume
is a new uncertain explanatory vector and \({\widetilde{x}}_{1},{\widetilde{x}}_{2},{\widetilde{x}}_{3},{\widetilde{x}}_{4},\) \(\varepsilon _1,\varepsilon _2,\varepsilon _3\) are independent. Then the forecast uncertain vector of \({\varvec{y}}=(y_1, y_2, y_3 )^{\mathrm{T}}\) is
Hence, it follows from equation (5) that the forecast value of \({\varvec{y}}\) is
In order to obtain confidence interval of \({\varvec{y}}\), we take a confidence level \(\alpha =95\%\) and suppose \(\varepsilon _1,\varepsilon _2,\varepsilon _3\) are normal uncertain variables \({\mathcal {N}}({\hat{e}}_1,{\hat{\sigma }}_1),{\mathcal {N}}({\hat{e}}_2,{\hat{\sigma }}_2),{\mathcal {N}}({\hat{e}}_3,{\hat{\sigma }}_3)\), respectively. It follows from equation (6) that we can calculate
such that for each j \((j=1,2,3)\), \(b_j\) is the minimum value satisfying
where \({\hat{\varPsi }}_{j}\) is the uncertainty distribution of \({\hat{y}}_j\). Thus, the \(95\%\) confidence interval of the vector of the response variables \({\varvec{y}}\) is
6 Conclusions
This paper studied the multivariate uncertain regression model, which contains more than one response variable and treats both explanatory variables and response variables as uncertain variables, since the observed data are imprecise in some cases. Based on those data, we estimated the unknown parameters by the principle of least squares in different multivariate regression models, such as the multivariate linear, multivariate asymptotic, and multivariate Michaelis-Menten regression models. In order to analyze the disturbance terms in the model, we proposed the concept of residuals and designed the vectors of estimated expected values and variances of the disturbance terms. Furthermore, we showed how to forecast the response variables when a set of new explanatory variables is given. In the future, we will try to take the relationships among the response variables into consideration.
References
Aitken A (1935) On least squares and linear combinations of observations. Proc R Soc Edinb 55:42–48
Anderson T (1951) Estimating linear restrictions on regression coefficients for multivariate normal distributions. Ann Math Stat 22(3):327–351
Bai Z, Chen X, Miao B, Radhakrishna R (1990) Asymptotic theory of least distances estimate in multivariate linear models. Statistics 21(4):503–519
Bilodeau M, Brenner D (1999) Theory of multivariate statistics. Springer, New York
Breiman L, Friedman J (1997) Predicting multivariate responses in multiple linear regression. J R Stat Soc 59(1):3–54
Fama E, Fisher L, Jensen M, Roll R (1969) The adjustment of stock prices to new information. Int Econ Rev 10(1):1–21
Galton F (1886) Regression towards mediocrity in hereditary stature. J Anthropol Inst GB Ireland 15:246–263
Ganesh S, Arulmozhivarman P, Tatavarti V (2018) Prediction of \(\mathrm{PM}_{2.5}\) using an ensemble of artificial neural networks and regression models. J Ambient Intell Humaniz Comput 8:1–11
Gentle J (1977) Least absolute values estimation: an introduction. Commun Stat Simul Comput 6(4):313–328
Jhun M, Choi I (2009) Bootstrapping least distance estimator in the multivariate regression model. Comput Stat Data Anal 53(12):4221–4227
Krishnamurthy K, Manoharan S, Swaminathan R (2019) A Jacobian approach for calculating the Lyapunov exponents of short time series using support vector regression. J Ambient Intell Humaniz Comput. https://doi.org/10.1007/s12652-019-01525-6
Lio W, Liu B (2018) Residual and confidence interval for uncertain regression model with imprecise observations. J Intell Fuzzy Syst 35(2):2573–2583
Lio W, Liu B (2019) Maximum likelihood estimation for uncertain regression analysis. Technical Report
Liu B (2007) Uncertainty theory, 2nd edn. Springer, Berlin
Liu B (2009) Some research problems in uncertainty theory. J Uncert Syst 3(1):3–10
Liu B (2012) Why is there a need for uncertainty theory. J Uncert Syst 6(1):3–10
Liu Z, Yang Y (2019) Least absolute deviations estimation for uncertain regression with imprecise observations. Fuzzy Optim Decis Mak. https://doi.org/10.1007/s10700-019-09312-w
Song Y, Fu Z (2018) Uncertain multivariable regression model. Soft Comput 22(17):5861–5866
Watson G (1967) Linear least squares regression. Ann Math Stat 38(6):1679–1699
Yao K, Liu B (2018) Uncertain regression analysis: an approach for imprecise observations. Soft Comput 22(17):5579–5582
Yule G (1897) On the theory of correlation. J R Stat Soc 60(4):812–854
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61873329).
Ye, T., Liu, Y. Multivariate uncertain regression model with imprecise observations. J Ambient Intell Human Comput 11, 4941–4950 (2020). https://doi.org/10.1007/s12652-020-01763-z