1 Introduction

Errors-in-variables (EIV) models and total least squares (TLS) have been substantially investigated theoretically for more than a century (see, e.g., Adcock 1877; Kummell 1879; Pearson 1901; Deming 1931, 1934; Gerhold 1969; Golub and Loan 1980; Huffel and Vandewalle 1991; Markovsky and Huffel 2007; Schaffrin and Wieser 2008; Xu et al. 2012). Recently, they have been applied to solve a wide variety of science and engineering problems (see, e.g., Huffel and Vandewalle 1991; Markovsky and Huffel 2007; Schaffrin and Wieser 2008; Schaffrin and Felus 2009). The TLS method is designed to optimally estimate the unknown parameters \({\varvec{\beta }}\) in the following (linear or linearized) EIV model:

$$\begin{aligned} \mathbf {y} = \mathbf {A} {\varvec{\beta }} + {\varvec{\epsilon }}, \end{aligned}$$
(1)

(see, e.g., Seber and Wild 1989; Huffel and Vandewalle 1991), where both \(\mathbf {y}\) and \(\mathbf {A}\) are measured with random errors, which are collected in an \(n\)-dimensional vector \({\varvec{\epsilon }}\) and an \((n\times m)\) matrix \({\varvec{\epsilon }}_A\), respectively, and \({\varvec{\beta }}\) is an \(m\)-dimensional deterministic vector of unknown parameters to be estimated. If \(\mathbf {A}\) in the EIV model (1) is deterministically given, then (1) reduces to a standard linear model (see, e.g., Searle 1971).

When TLS is used to estimate the unknown vector \({\varvec{\beta }}\) in (1), one simultaneously takes the stochastic nature of both measurements \(\mathbf {y}\) and \(\mathbf {A}\) into account instead of treating \(\mathbf {A}\) as if it were fixed. However, the sense in which TLS is optimal has been interpreted differently, either within the framework of approximation theory or from the point of view of statistical estimation theory. For the former, the reader may be referred to Golub and Loan (1980) and to the generalized TLS method by Huffel and Vandewalle (1989) and Markovsky and Huffel (2007), because the positive definite weight matrix used there is not the inverse of a variance–covariance matrix in the statistical sense, as pointed out by Schaffrin and Wieser (2008) (see also Xu et al. 2012). In this paper, we will interpret the TLS approach strictly from the statistical point of view. Under the statistical framework, one often assumes zero means for the random errors of both \(\mathbf {y}\) and \(\mathbf {A}\) in the EIV model (1). The variance–covariance matrices of the measured data \(\mathbf {y}\) and \(\mathbf {A}\) are either assumed to be \(\mathbf {I}\sigma ^2\) and \(\mathbf {I}_a\sigma ^2\) implicitly (see, e.g., Pearson 1901) or \(\mathbf {W}\!_y^{-1}\sigma ^2\) and \(\mathbf {W}\!_a^{-1}\sigma ^2\) explicitly (see, e.g., Deming 1931, 1934; Gerhold 1969; Schaffrin and Wieser 2008; Xu et al. 2012), where \(\mathbf {I}_a\) is an identity matrix with the dimension corresponding to the vectorized error matrix of \(\mathbf {A}\), and \(\mathbf {W}\!_y\) and \(\mathbf {W}\!_a\) are the given weight matrices of \(\mathbf {y}\) and \(\mathbf {A}\), respectively. Here \(\sigma ^2\) is a positive (unknown) scalar, which is commonly called the variance of unit weight in the geodetic literature. In general, \(\mathbf {y}\) is also assumed to be stochastically independent of \(\mathbf {A}\), though such an assumption is not absolutely necessary.

In practice, one often encounters problems in the physical, statistical and engineering sciences in which: (i) \(\mathbf {y}\) and \(\mathbf {A}\) in (1) are not necessarily of the same type of measurements. In this case, it is inappropriate and/or unreasonable to assume that the variance–covariance matrices of \(\mathbf {y}\) and \(\mathbf {A}\) are known up to an unknown but identical positive scalar \(\sigma ^2\); and (ii) the elements within each of \(\mathbf {y}\) and \(\mathbf {A}\) are not necessarily of the same type of measurements either. Even if \(\mathbf {y}\) (and/or \(\mathbf {A}\)) is of the same type of measurements, its elements may be measured with instruments of different accuracy. For problems of this kind, the corresponding stochastic model is more appropriately described as follows:

$$\begin{aligned}&{\varvec{\Sigma }}_y = \sum \limits _{i=1}^{m_y} \mathbf {U}_{iy}\sigma _{iy}^2, \end{aligned}$$
(2a)
$$\begin{aligned}&{\varvec{\Sigma }}_a =\sum \limits _{i=1}^{m_a} \mathbf {U}\!_{ia} \sigma _{ia}^2, \end{aligned}$$
(2b)

where \({\varvec{\Sigma }}_y\) and \({\varvec{\Sigma }}_a\) are the variance–covariance matrices of \(\mathbf {y}\) and \(\mathrm vec (\mathbf {A})\), respectively. Here \(\mathrm vec (\mathbf {A})\) denotes the vectorization of \(\mathbf {A}\) (see, e.g., Magnus and Neudecker 1988). All the matrices \(\mathbf {U}_{iy}\) and \(\mathbf {U}\!_{ia}\) are known and positive semi-definite, and \(\sigma _{iy}^2 \, (i=1,2,\ldots ,m_y)\) and \(\sigma _{ia}^2 \, (i=1,2,\ldots ,m_a)\) are the (unknown) variance components of \(\mathbf {y}\) and \(\mathrm vec (\mathbf {A})\), respectively. One may also further consider the correlation between \(\mathbf {y}\) and \(\mathbf {A}\) in the stochastic model (2) (see, e.g., Fang 2011; Snow 2012). Since the inclusion of such correlation should not create any additional theoretical difficulty, it will not be further pursued in this paper.

Although the general EIV model (1) with the stochastic model (2) is more appropriate for describing a wide range of real-life EIV problems, little has been done to simultaneously estimate the model parameters and the variance components. Limited related work has been done to estimate both parameters and variance components in spatial and generalized mixed effects models with measurement errors, using the maximum likelihood and quasi-likelihood methods (Wang et al. 1998; Li et al. 2009) and the restricted maximum likelihood and pseudo-likelihood methods (Wang and Davidian 1996). Since the EIV model (1) is not the starting model of these publications, these researchers used likelihood-based methods instead of the TLS principle.

In this paper, we will focus on the weighted TLS estimation of the unknown vector \({\varvec{\beta }}\) in the functional EIV model (1) and on the estimation of the variance components of both \({\varvec{\epsilon }}\) and \({\varvec{\epsilon }}_A\) in the stochastic model (2). Obviously, the stochastic model (2) differs significantly from any conventional variance component model in the sense that there exist no direct redundant observations of \(\mathbf {A}\) for an independent assessment of \({\varvec{\epsilon }}_A\). From this point of view, we are interested in investigating the following questions: (i) whether the variance components of (2) can be estimated; (ii) whether the estimation of variance components exhibits any special numerical and/or statistical behavior; and (iii) what the finite sample biases of the estimated variance components are. Regarding (iii), Wang and Davidian (1996), Wang et al. (1998) and Li et al. (2009) performed an asymptotic bias analysis of the so-called naive estimator of the variance components of the measurements \(\mathbf {y}\) for spatial and generalized mixed effects models with measurement errors on the basis of two assumptions: (1) the measurement errors of \(\mathbf {A}\) are simply ignored, as if \(\mathbf {A}\) were known exactly; and (2) the number of measurements \(\mathbf {y}\) tends to infinity. They did not derive the finite sample biases of the likelihood-based estimates of the variance components. Unlike Wang and Davidian (1996), Wang et al. (1998) and Li et al. (2009), our emphasis in this paper will be on a finite sample bias analysis of the estimated variance components in connection with the EIV models (1) and (2), fully taking the random errors of \(\mathbf {A}\) into account.

To answer the questions posed above, we will start with quadratic forms of the residuals of the measured data \(\mathbf {y}\) and \(\mathbf {A}\) to estimate the variance components. The paper is thus organized as follows. Section 2 will first reformulate the EIV model as a nonlinear adjustment problem and use the weighted TLS method to estimate the parameters \({\varvec{\beta }}\). Different solution methods will be presented. In Sect. 3, we will first discuss adapting variance component estimation to nonlinear models in general and then adapting the minimum norm quadratic unbiased estimation (MINQUE) method to the nonlinear EIV model in particular. Two special structures of variance components will also be discussed. We will show in Sect. 4 that a certain structure of EIV variance components, which is commonly assumed and often encountered in practice, is not estimable. In other cases, even if the variance components are estimable, their estimation in the EIV models (1) and (2) can become unstable. This instability will then be demonstrated through numerical simulations. To warrant a high quality and stable estimation of variance components, regularization may be needed; this deserves a separate paper and will not be pursued here any further. Unlike the asymptotic bias analysis by Wang and Davidian (1996), Wang et al. (1998) and Li et al. (2009), we will derive the finite sample biases of the estimated variance components in Sect. 5, provided that they are estimable.

2 Parameter estimation

2.1 Reformulation of the EIV models (1) and (2) as a nonlinear Gauss–Markoff model with variance components

To investigate the variance component estimation of the functional EIV model (1), we follow Xu et al. (2012) to equivalently rewrite the functional EIV model (1) as the following nonlinear Gauss–Markoff model:

$$\begin{aligned} \mathbf {y} = \overline{\mathbf {A}} {\varvec{\beta }} + {\varvec{\epsilon }},\end{aligned}$$
(3a)
$$\begin{aligned} \mathbf {A}=\overline{\mathbf {A}}+{\varvec{\epsilon }}_A, \end{aligned}$$
(3b)

which can also be equivalently rewritten in vector form as follows:

$$\begin{aligned}&\mathbf {y} = ({\varvec{\beta }}^T\otimes \mathbf {I}_n) \overline{\mathbf {a}} + {\varvec{\epsilon }},\end{aligned}$$
(4a)
$$\begin{aligned}&\mathbf {a}=\overline{\mathbf {a}}+{\varvec{\epsilon }}_a, \end{aligned}$$
(4b)

or in an even more compact form:

$$\begin{aligned} \left[ \begin{array}{c} \mathbf {y} \\ \mathbf {a} \end{array} \right] = \left[ \begin{array}{c} ({\varvec{\beta }}^T\otimes \mathbf {I}_n) \overline{\mathbf {a}} \\ \overline{\mathbf {a}} \end{array} \right] + \left[ \begin{array}{c} {\varvec{\epsilon }} \\ {\varvec{\epsilon }}_a \end{array} \right] , \end{aligned}$$
(5)

where \(\overline{\mathbf {A}}\) is the expectation of \(\mathbf {A}\) in the sense that each element of \(\overline{\mathbf {A}}\) is the expectation of the corresponding element of \(\mathbf {A}\), the elements of \({\varvec{\epsilon }}_A\) are the random errors of the observed matrix \(\mathbf {A}\), \(\otimes \) stands for the Kronecker product, \(\mathbf {I}_n\) is an \((n\,\times \, n)\) identity matrix, \(\mathbf {a}=\mathrm vec (\mathbf {A})\), \(\overline{\mathbf {a}}=\mathrm vec (\overline{\mathbf {A}})\) and \({\varvec{\epsilon }}_a=\mathrm vec ({\varvec{\epsilon }}_A)\). Schaffrin and Snow (2010) reformulated the functional EIV model (1) as a system of nonlinear Gauss–Helmert condition equations. It is obvious from (4a) that the expectation of \(\mathbf {y}\) is a vector of nonlinear functions of both \({\varvec{\beta }}\) and \(\overline{\mathbf {a}}\). If the elements of \(\mathbf {A}\) are not functionally independent or if some of its elements are deterministic, we can follow Xu et al. (2012) and reformulate the functional EIV model (1) as a partial EIV model.
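As a quick numerical check of the vectorized form (4a), the following minimal NumPy sketch (with hypothetical dimensions of our own choice) verifies the Kronecker identity \(({\varvec{\beta }}^T\otimes \mathbf {I}_n)\,\overline{\mathbf {a}} = \overline{\mathbf {A}}\,{\varvec{\beta }}\) that underlies the reformulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3                                   # hypothetical dimensions
A_bar = rng.standard_normal((n, m))
beta = rng.standard_normal(m)

a_bar = A_bar.reshape(-1, order="F")          # vec(A_bar), column stacking
lhs = np.kron(beta.reshape(1, -1), np.eye(n)) @ a_bar   # (beta^T (x) I_n) vec(A_bar)
rhs = A_bar @ beta

print(np.allclose(lhs, rhs))                  # True
```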

In the remainder of this paper, we will focus on the nonlinear model (3) or (5). The stochastic model for \({\varvec{\epsilon }}\) and \({\varvec{\epsilon }}_a\) is the same as formulated in (2), which can be equivalently represented in one matrix for both the measurements \(\mathbf {y}\) and \(\mathbf {A}\) as follows

$$\begin{aligned} {\varvec{\Sigma }}&= \left[ \begin{array}{cc} {\varvec{\Sigma }}_y &{} \mathbf {0} \\ \mathbf {0} &{} {\varvec{\Sigma }}_a \end{array} \right] \nonumber \\&= \left[ \begin{array}{cc} \sum \limits _{i=1} ^{m_y} \mathbf {U}_{iy} \sigma _{iy}^2 &{} \mathbf {0} \\ \mathbf {0} &{} \sum \limits _{i=1} ^{m_a} \mathbf {U}\!_{ia} \sigma _{ia}^2 \end{array} \right] \nonumber \\&= \sum \limits _{i=1} ^{m_y} \left[ \begin{array}{cc} \mathbf {U}_{iy} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {0} \end{array} \right] \sigma _{iy}^2 + \sum \limits _{i=1}^{m_a} \left[ \begin{array}{cc} \mathbf {0} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {U}_{ia} \end{array} \right] \sigma _{ia}^2 \nonumber \\&= \sum \limits _{i=1} ^{m_y} \mathbf {U}_i\sigma _{iy}^2 +\sum \limits _{i=1} ^{m_a} \mathbf {U}_{i+m_y}\sigma _{ia}^2 \nonumber \\&= \sum \limits _{i=1} ^{m_y+m_a}\mathbf {U}_i\sigma _i^2, \end{aligned}$$
(6)

where the matrices \(\mathbf {U}_i\) corresponding to \(\sigma _{iy}^2\) and \(\sigma _{ia}^2\) are given, respectively, as follows:

$$\begin{aligned} \mathbf {U}_i = \left[ \begin{array}{cc} \mathbf {U}_{iy} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {0} \end{array} \right] , \end{aligned}$$

for \(i=1,2,\ldots ,m_y\) and

$$\begin{aligned} \mathbf {U}_i = \left[ \begin{array}{cc} \mathbf {0} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {U}_{ia} \end{array} \right] , \end{aligned}$$

for \(i=(1+m_y),(2+m_y),\ldots ,(m_y+m_a)\). In this paper, we will assume that both \({\varvec{\Sigma }}_y\) and \({\varvec{\Sigma }}_a\) are invertible. Nevertheless, this assumption is not strictly necessary. For example, if \({\varvec{\Sigma }}_a\) is singular, one can follow Xu et al. (2012) and reformulate the original EIV model into a partial EIV model, in which the new variance–covariance matrix is restricted to the independent random elements of \(\mathbf {A}\) and becomes invertible. With the reformulation (5) and the stochastic model (6), it has become clear that the EIV model (1) can be treated mathematically as a nonlinear Gauss–Markoff model with unknown variance components; this is the most natural way to handle the estimation of variance components in EIV models, since conventional nonlinear estimation theory and methods can then be adapted directly. In other words, one can now freely use any numerical method to compute the parameter estimates of the nonlinear model and freely adopt any appropriate method to estimate the variance components.
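To make the block structure of (6) concrete, the following sketch (hypothetical helper name and dimensions of our own choice) assembles the joint variance–covariance matrix \({\varvec{\Sigma }}\) and the padded matrices \(\mathbf {U}_i\) from given cofactor matrices and variance components:

```python
import numpy as np
from scipy.linalg import block_diag

def assemble_sigma(U_y_list, sig2_y, U_a_list, sig2_a):
    """Build the joint variance-covariance matrix (6) and the padded U_i."""
    n = U_y_list[0].shape[0]
    na = U_a_list[0].shape[0]
    U_full = [block_diag(U, np.zeros((na, na))) for U in U_y_list] + \
             [block_diag(np.zeros((n, n)), U) for U in U_a_list]
    sig2 = np.concatenate([np.atleast_1d(sig2_y), np.atleast_1d(sig2_a)])
    Sigma = sum(s * U for s, U in zip(sig2, U_full))
    return Sigma, U_full

# hypothetical example: n = 4 observations y, A with m = 2 columns
n, m = 4, 2
U_y_list = [np.eye(n)]                         # a single variance component for y
U_a_list = [np.kron(np.diag(e), np.eye(n))     # one component per column of A, cf. (27)
            for e in np.eye(m)]
Sigma, U_full = assemble_sigma(U_y_list, [1.0], U_a_list, [0.5, 2.0])
print(Sigma.shape)                             # (12, 12)
```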

2.2 The weighted TLS estimate of the parameters without linearization

A procedure of variance component estimation almost always consists of two iteratively looped steps: one to compute the parameters and the other to compute the variance components. Given a set of initial values \(\sigma _{0iy}^2 \, (i=1,2,\ldots ,m_y)\) and \(\sigma _{0ia}^2 \, (i=1,2,\ldots ,m_a)\) for the unknown variance components in (2), we can solve the following weighted TLS minimization problem:

$$\begin{aligned} \mathrm min: \quad \,\, S({\varvec{\beta }}, \overline{\mathbf {a}})&= [ ({\varvec{\beta }}^T\otimes \mathbf {I}_n) \overline{\mathbf {a}} - \mathbf {y}]^T{\varvec{\Sigma }}_{0y}^{-1}[ ({\varvec{\beta }}^T\otimes \mathbf {I}_n) \overline{\mathbf {a}} - \mathbf {y}]\nonumber \\&+ (\overline{\mathbf {a}} - \mathbf {a})^T{\varvec{\Sigma }}_{0a}^{-1}(\overline{\mathbf {a}} - \mathbf {a}), \end{aligned}$$
(7)

to estimate the parameters \({\varvec{\beta }}\) and \(\overline{\mathbf {a}}\), where \({\varvec{\Sigma }}_{0y}\) and \({\varvec{\Sigma }}_{0a}\) are the initial variance–covariance matrices of \(\mathbf {y}\) and \(\mathbf {a}\), with the unknown variance components replaced by the corresponding given initial values \(\sigma _{0iy}^2 \, (i=1,2,\ldots ,m_y)\) and \(\sigma _{0ia}^2 \, (i=1,2,\ldots ,m_a)\), respectively.

Differentiating the objective function \(S({\varvec{\beta }}, \overline{\mathbf {a}})\) of (7) with respect to \({\varvec{\beta }}\) and \(\overline{\mathbf {a}}\) yields:

$$\begin{aligned} \frac{\partial S({\varvec{\beta }},\overline{\mathbf {a}})}{\partial {\varvec{\beta }}}&= 2 \overline{\mathbf {A}}^{{ T}}{\varvec{\Sigma }}_{0y}^{-1} (\overline{\mathbf {A}}{\varvec{\beta }} - \mathbf {y}),\end{aligned}$$
(8a)
$$\begin{aligned} \frac{\partial S({\varvec{\beta }},\overline{\mathbf {a}})}{\partial \overline{\mathbf {a}}}&= 2 {\varvec{\Sigma }}_{0a}^{-1}(\overline{\mathbf {a}}-\mathbf {a})\nonumber \\&+ 2 ({\varvec{\beta }}\otimes \mathbf {I}_n){\varvec{\Sigma }}_{0y}^{-1}\{({\varvec{\beta }}^T\otimes \mathbf {I}_n)\overline{\mathbf {a}}- \mathbf {y}\}. \end{aligned}$$
(8b)

By equating the partial derivatives (8) to zero, we can obtain the weighted TLS estimates of \({\varvec{\beta }}\) and \(\overline{\mathbf {a}}\), which are, respectively, denoted by \(\hat{{\varvec{\beta }}}\) and \(\hat{\overline{\mathbf {a}}}\) (and accordingly \(\hat{\overline{\mathbf {A}}}\)). They are given as follows:

$$\begin{aligned} \hat{{\varvec{\beta }}}&= \Big [(\hat{\overline{\mathbf {A}}})^T {\varvec{\Sigma }}_{0y}^{-1} \hat{\overline{\mathbf {A}}}\Big ]^{-1} (\hat{\overline{\mathbf {A}}})^T {\varvec{\Sigma }}_{0y}^{-1}\mathbf {y},\end{aligned}$$
(9a)
$$\begin{aligned} \hat{\overline{\mathbf {a}}}&= [{\varvec{\Sigma }}_{0a}^{-1}+\hat{{\varvec{\beta }}}(\hat{{\varvec{\beta }}})^T\otimes {\varvec{\Sigma }}_{0y}^{-1}]^{-1}[ {\varvec{\Sigma }}_{0a}^{-1}\mathbf {a} + \hat{{\varvec{\beta }}}\otimes ({\varvec{\Sigma }}_{0y}^{-1}\mathbf {y})]. \end{aligned}$$
(9b)

Since the dimension of \({\varvec{\Sigma }}_{0a}\) is much larger than that of \({\varvec{\Sigma }}_{0y}\), it is not desirable to directly invert the normal matrix \([{\varvec{\Sigma }}_{0a}^{-1}+\hat{{\varvec{\beta }}}(\hat{{\varvec{\beta }}})^T\otimes {\varvec{\Sigma }}_{0y}^{-1}]\) in (9b). Using the formula for the inverse of a sum of matrices (Magnus and Neudecker 1988), we can rewrite the inverse of this normal matrix as:

$$\begin{aligned}&[{\varvec{\Sigma }}_{0a}^{-1}+\hat{{\varvec{\beta }}}(\hat{{\varvec{\beta }}})^T\otimes {\varvec{\Sigma }}_{0y}^{-1}]^{-1}\\&\quad = {\varvec{\Sigma }}_{0a} - {\varvec{\Sigma }}_{0a}(\hat{{\varvec{\beta }}}\otimes \mathbf {I}_n)\mathbf {E}^{-1} ((\hat{{\varvec{\beta }}})^T\otimes \mathbf {I}_n){\varvec{\Sigma }}_{0a}, \end{aligned}$$

where

$$\begin{aligned} \mathbf {E} ={\varvec{\Sigma }}_{0y} + ((\hat{{\varvec{\beta }}})^T\otimes \mathbf {I}_n){\varvec{\Sigma }}_{0a}(\hat{{\varvec{\beta }}}\otimes \mathbf {I}_n). \end{aligned}$$
(9c)

As a result, (9b) can finally be simplified after some technical derivations as follows

$$\begin{aligned} \hat{\overline{\mathbf {a}}} = \mathbf {a} + {\varvec{\Sigma }}_{0a}(\hat{{\varvec{\beta }}}\otimes \mathbf {I}_n) \mathbf {E}^{-1} ( \mathbf {y} - \mathbf {A}\hat{{\varvec{\beta }}} ). \end{aligned}$$
(9d)

Obviously, computing \(\hat{\overline{\mathbf {a}}}\) with (9d), in conjunction with (9a), should be much more efficient than with (9b).
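The equivalence of (9b) and (9d) is easily checked numerically; the following sketch (a hypothetical small example of our own choice) compares the direct solution (9b) with the reduced form (9d) based on \(\mathbf {E}\) of (9c):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 2                                    # hypothetical small example
Sy = 0.5 * np.eye(n)                           # Sigma_0y
Sa = np.diag(rng.uniform(0.1, 1.0, n * m))     # Sigma_0a
y = rng.standard_normal(n)
a = rng.standard_normal(n * m)                 # vec(A)
beta = rng.standard_normal(m)
A = a.reshape((n, m), order="F")
B = np.kron(beta.reshape(-1, 1), np.eye(n))    # beta (x) I_n

# direct solution (9b)
lhs = np.linalg.inv(Sa) + np.kron(np.outer(beta, beta), np.linalg.inv(Sy))
rhs = np.linalg.solve(Sa, a) + np.kron(beta, np.linalg.solve(Sy, y))
a_bar_direct = np.linalg.solve(lhs, rhs)

# reduced form (9d) with E of (9c)
E = Sy + B.T @ Sa @ B
a_bar_reduced = a + Sa @ B @ np.linalg.solve(E, y - A @ beta)

print(np.allclose(a_bar_direct, a_bar_reduced))   # True
```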

The weighted TLS estimates of \({\varvec{\beta }}\) and \(\overline{\mathbf {a}}\) in (9) are solved numerically by iteration. Such an algorithm is known to be a variant of the Gauss–Newton method (see, e.g., Dennis and Schnabel 1996; Xu et al. 2012) and converges linearly. If both \({\varvec{\Sigma }}_y\) and \({\varvec{\Sigma }}_a\) are identity matrices of different orders, then the estimate of \({\varvec{\beta }}\) can be elegantly obtained by solving an eigenvalue problem for the eigenvector associated with the smallest eigenvalue (see, e.g., Pearson 1901; Golub and Loan 1980); otherwise, alternative algorithms can be found in, e.g., Huffel and Vandewalle (1991), Markovsky and Huffel (2007) and Schaffrin and Wieser (2008). We should note that the weighted TLS estimates (9) of \({\varvec{\beta }}\) and \(\overline{\mathbf {a}}\) depend on \(\sigma _{0iy}^2 \, (i=1,2,\ldots ,m_y)\) and \(\sigma _{0ia}^2 \, (i=1,2,\ldots ,m_a)\). Thus, they have to be solved iteratively, using the updated estimates of the variance components to be described in Sect. 3.
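The iteration just described can be sketched as follows (a minimal NumPy implementation of (9a), (9c) and (9d); the helper name weighted_tls, the start values and the convergence tolerance are our own choices):

```python
import numpy as np

def weighted_tls(y, A, Sigma0y, Sigma0a, tol=1e-12, max_iter=100):
    """Iterate (9a), (9c), (9d) for the weighted TLS estimates of beta and vec(A_bar)."""
    n, m = A.shape
    a = A.reshape(-1, order="F")                          # vec(A)
    Sy_inv = np.linalg.inv(Sigma0y)
    beta = np.linalg.solve(A.T @ Sy_inv @ A, A.T @ Sy_inv @ y)   # LS start value
    a_bar = a.copy()
    for _ in range(max_iter):
        A_bar = a_bar.reshape((n, m), order="F")
        # (9a): beta from the current A_bar
        beta_new = np.linalg.solve(A_bar.T @ Sy_inv @ A_bar, A_bar.T @ Sy_inv @ y)
        # (9c): E = Sigma_0y + (beta^T (x) I_n) Sigma_0a (beta (x) I_n)
        B = np.kron(beta_new.reshape(-1, 1), np.eye(n))   # beta (x) I_n
        E = Sigma0y + B.T @ Sigma0a @ B
        # (9d): update of vec(A_bar)
        a_bar_new = a + Sigma0a @ B @ np.linalg.solve(E, y - A @ beta_new)
        converged = (np.max(np.abs(beta_new - beta)) < tol and
                     np.max(np.abs(a_bar_new - a_bar)) < tol)
        beta, a_bar = beta_new, a_bar_new
        if converged:
            break
    return beta, a_bar
```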

2.3 Alternative iterative solution of the nonlinear Gauss–Markoff model through linearization

An alternative way to solve the EIV model (5), once it has been formulated as a nonlinear Gauss–Markoff model, is based on a linear approximation to the nonlinear model; this linearization procedure is well known and elegantly documented in the widely accessible mathematical literature (see, e.g., Marquardt 1963; Bates and Watts 1980; Seber and Wild 1989; Björck 1996; Dennis and Schnabel 1996; Fletcher 2000). Geodesists may be more familiar with the publication by Pope (1972), which is based on the procedure of Marquardt (1963). Applying the Taylor expansion to the nonlinear observation equations (5) at the approximate values \({\varvec{\beta }}_i\) and \(\overline{\mathbf {a}}_i\) of the \(i\)th iteration and truncating it at the linear term, one obtains the following linearized observation equations:

$$\begin{aligned} \left[ \begin{array}{c} \Delta \mathbf {y} \\ \Delta \mathbf {a} \end{array} \right]&= \left[ \begin{array}{c} \mathbf {y} - \overline{\mathbf {A}}_i {\varvec{\beta }}_i \\ \mathbf {a} - \overline{\mathbf {a}}_i \end{array} \right] \nonumber \\&\approx \left[ \begin{array}{l@{\quad }l} \overline{\mathbf {A}}_i &{} {\varvec{\beta }}^T_i\otimes \mathbf {I}_n \\ \mathbf {0} &{} \mathbf {I}_a \end{array} \right] \left[ \begin{array}{c} \delta {\varvec{\beta }}_i \\ \delta \overline{\mathbf {a}}_i \end{array} \right] + \left[ \begin{array}{c} {\varvec{\epsilon }} \\ {\varvec{\epsilon }}_a \end{array} \right] , \end{aligned}$$
(10)

where \(\delta {\varvec{\beta }}_i={\varvec{\beta }}-{\varvec{\beta }}_i\) and \(\delta \overline{\mathbf {a}}_i=\overline{\mathbf {a}} -\overline{\mathbf {a}}_i\). As a result, the well-known weighted LS estimates of \(\delta {\varvec{\beta }}_i\) and \(\delta \overline{\mathbf {a}}_i\) can be readily obtained; they will be denoted by \(\delta \hat{{\varvec{\beta }}}_i\) and \(\delta \hat{\overline{\mathbf {a}}}_i\), respectively. These estimates are then used to compute the approximate values \({\varvec{\beta }}_{i+1} = {\varvec{\beta }}_i+\delta \hat{{\varvec{\beta }}}_i\) and \(\overline{\mathbf {a}}_{i+1}=\overline{\mathbf {a}}_i +\delta \hat{\overline{\mathbf {a}}}_i\) for the next iteration, starting from some initial values \({\varvec{\beta }}_0\) and \(\overline{\mathbf {a}}_0\). The iteration is terminated when \(|\delta \hat{{\varvec{\beta }}}_s|<e\) and \(|\delta \hat{\overline{\mathbf {a}}}_s|<e\) at the \(s\)th step of the iteration, where \(e\) is some predefined very small positive quantity. Thus, the final weighted LS estimates are \(\hat{{\varvec{\beta }}} = {\varvec{\beta }}_{s}+\delta \hat{{\varvec{\beta }}}_s\) and \(\hat{\overline{\mathbf {a}}}=\overline{\mathbf {a}}_{s} +\delta \hat{\overline{\mathbf {a}}}_s\); they should be essentially equal to the corresponding estimates given in Sect. 2.2.
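A corresponding sketch of the linearized iteration (10) is given below (hypothetical helper name; the same inputs as in the previous sketch are assumed); at each step the weighted LS corrections \(\delta {\varvec{\beta }}_i\) and \(\delta \overline{\mathbf {a}}_i\) are computed from the linearized design matrix:

```python
import numpy as np
from scipy.linalg import block_diag

def tls_gauss_newton(y, A, Sigma0y, Sigma0a, tol=1e-12, max_iter=100):
    """Solve the nonlinear Gauss-Markoff model (5) through the linearization (10)."""
    n, m = A.shape
    a = A.reshape(-1, order="F")
    W = np.linalg.inv(block_diag(Sigma0y, Sigma0a))        # joint weight matrix
    beta = np.linalg.lstsq(A, y, rcond=None)[0]            # crude start values
    a_bar = a.copy()
    for _ in range(max_iter):
        A_bar = a_bar.reshape((n, m), order="F")
        # design matrix of (10) at the current approximate values
        J = np.block([[A_bar, np.kron(beta.reshape(1, -1), np.eye(n))],
                      [np.zeros((n * m, m)), np.eye(n * m)]])
        dl = np.concatenate([y - A_bar @ beta, a - a_bar])  # misclosure vector
        dx = np.linalg.solve(J.T @ W @ J, J.T @ W @ dl)     # weighted LS corrections
        beta = beta + dx[:m]
        a_bar = a_bar + dx[m:]
        if np.max(np.abs(dx)) < tol:
            break
    return beta, a_bar
```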

We may note that during the iteration, say at the ith iteration, one may compute the following quantities:

$$\begin{aligned} \left[ \begin{array}{c} \hat{{\varvec{\epsilon }}}_i \\ \hat{{\varvec{\epsilon }}}_{ai} \end{array} \right] = \left[ \begin{array}{c} \mathbf {y} - \overline{\mathbf {A}}_i {\varvec{\beta }}_i \\ \mathbf {a} - \overline{\mathbf {a}}_i \end{array} \right] - \left[ \begin{array}{cc} \overline{\mathbf {A}}_i &{} {\varvec{\beta }}^T_i\otimes \mathbf {I}_n \\ \mathbf {0} &{} \mathbf {I}_a \end{array} \right] \left[ \begin{array}{c} \delta \hat{{\varvec{\beta }}}_i \\ \delta \hat{\overline{\mathbf {a}}}_i \end{array} \right] . \end{aligned}$$
(11)

We must note that \(\hat{{\varvec{\epsilon }}}_i\) and \(\hat{{\varvec{\epsilon }}}_{ai}\) are not the residuals of the measurements \(\mathbf {y}\) and \(\mathbf {a}\) in the least squares sense. Actually, as is well known (see, e.g., Marquardt 1963; Bates and Watts 1988; Seber and Wild 1989; Björck 1996; Dennis and Schnabel 1996; Fletcher 2000), when the iteration described above converges mathematically, then \(\delta \hat{{\varvec{\beta }}}_s=\mathbf {0}\) and \(\delta \hat{\overline{\mathbf {a}}}_s=\mathbf {0}\). In other words, when the solutions converge to the weighted LS estimates \(\hat{{\varvec{\beta }}}\) and \(\hat{\overline{\mathbf {a}}}\), the weighted LS residuals are given by

$$\begin{aligned} \left[ \begin{array}{c} \hat{{\varvec{\epsilon }}} \\ \hat{{\varvec{\epsilon }}}_a \end{array} \right] = \left[ \begin{array}{c} \mathbf {y} - \hat{\overline{\mathbf {A}}} \hat{{\varvec{\beta }}} \\ \mathbf {a} - \hat{\overline{\mathbf {a}}} \end{array} \right] . \end{aligned}$$
(12)

Since residuals are almost always denoted by the letter \(\mathbf {r}\) in the widely accessible mathematical and statistical literature cited above, we will follow this convention and denote the residuals of \(\mathbf {y}\) and \(\mathbf {a}\) by \(\mathbf {r}_y\) and \(\mathbf {r}_a\), respectively. Thus, we have

$$\begin{aligned}&\mathbf {r}_y = \mathbf {y} - \hat{\overline{\mathbf {A}}} \hat{{\varvec{\beta }}} = \hat{{\varvec{\epsilon }}},\end{aligned}$$
(13a)
$$\begin{aligned}&\mathbf {r}_a = \mathbf {a} - \hat{\overline{\mathbf {a}}} = \hat{{\varvec{\epsilon }}}_a. \end{aligned}$$
(13b)

3 MINQUE estimation of variance components

For a linear model with a deterministic design matrix \(\mathbf {A}\) and the stochastic model (2a), there exist a number of well-established methods to estimate the variance components of (2a), such as the Helmert method (Helmert 1907; Schaffrin 1983; Grafarend 1985), minimum norm quadratic unbiased estimation (MINQUE) (Rao 1971a; Rao and Kleffe 1988), minimum variance quadratic unbiased estimation (see, e.g., Rao 1971b; LaMotte 1973; Koch 1999), maximum likelihood and marginal/restricted maximum likelihood methods (see, e.g., Kubik 1966, 1970; Hartley and Rao 1967; Patterson and Thompson 1975; Koch 1986), as well as the least squares method of variance component estimation (Pukelsheim 1976; Teunissen and Amiri-Simkooei 2008). The first three classes of methods are intuitively based on quadratic forms of the measurements to construct an estimator of the variance components, while the maximum likelihood and marginal maximum likelihood methods have to assume that the joint probability distribution of the measurements is given. In contrast, the least squares method is purely algebraic in nature. Pukelsheim (1976) used the residuals of the measurements to form a derived linear model with the variance components as the unknown model parameters. The corresponding estimator of the variance components is unbiased. Under a further assumption on the fourth moments of the measurement errors, Pukelsheim (1976) proved that the least squares estimator of the variance components is also of minimum variance (see also Teunissen and Amiri-Simkooei 2008). Pukelsheim and Styan (1978) further extended the least squares method of variance component estimation to a multivariate linear model. For more details on variance component estimation, the reader is referred to Helmert (1907), Rao and Kleffe (1988), Searle et al. (1992) and Koch (1999) in the case of linear models, and to Xu et al. (2006) (see also Koch and Kusche 2007; Xu et al. 2007b), Xu (2009) and Eshagh (2010, 2011) in the case of ill-posed linear models.

In this paper, we will focus on the MINQUE method for the variance component estimation with the EIV models (1) and (2). We will assume neither any probability distributions nor knowledge of the fourth moments of the random errors of \(\mathbf {y}\) and \(\mathbf {A}\). Although the MINQUE method is reported to possess almost all good properties of variance component estimation (if estimable) in the linear model, such as invariance, minimum norm and, under the extra assumption of normal distributions, minimum variance (see, e.g., Rao 1971b; LaMotte 1973; Pukelsheim and Styan 1978; Rao and Kleffe 1988; Wulff and Birkes 2005), these nice statistical properties are no longer valid here, since the EIV model (1) is essentially nonlinear. From this point of view, one may choose any method for variance component estimation, for two reasons: (i) no method of variance component estimation in linear models has been mathematically proved to perform best in all situations; and (ii) the optimal statistical properties of any variance component estimation method established for linear models are generally not valid for nonlinear models.

3.1 Adapting variance component estimation methods to nonlinear models

For linear models of the form \(\mathbf {y}=\mathbf {A}{\varvec{\beta }} +{\varvec{\epsilon }}\) with a non-random design matrix \(\mathbf {A}\), \(E({\varvec{\epsilon }})=\mathbf {0}\) and \(E\{{\varvec{\epsilon }}{\varvec{\epsilon }}^T\}={\varvec{\Sigma }}=\sum _{i=1}^p \mathbf {U}_i\sigma _i^2\), most variance component estimation methods lead to a set of equations \(\mathbf {S}{\varvec{\sigma }}=\mathbf {q}\) to be solved for \({\varvec{\sigma }}=[\sigma _1^2,\sigma _2^2,\ldots ,\sigma _p^2]\), where \(\mathbf {S}\) and \(\mathbf {q}\) are functions of the positive (semi-)definite matrices \(\mathbf {U}_i\), the design matrix \(\mathbf {A}\) and the observations \(\mathbf {y}\) or the residual estimates \(\mathbf {r}=\mathbf {y}-\mathbf {A}\hat{{\varvec{\beta }}}\). In particular, the design matrix \(\mathbf {A}\) enters through the projection matrix:

$$\begin{aligned} \mathbf {Z}\!_A = \mathbf {A}(\mathbf {A}^T{\varvec{\Sigma }}_{0}^{-1} \mathbf {A})^{-1}\mathbf {A}^T{\varvec{\Sigma }}_{0}^{-1} = \mathbf {A} \mathbf {N}^{-1}\mathbf {A}^T{\varvec{\Sigma }}_{0}^{-1}, \end{aligned}$$
(14)

on the range (column space) \(\mathcal {R}(\mathbf {A})\) of \(\mathbf {A}\), where \(\mathbf {N}=\mathbf {A}^T{\varvec{\Sigma }}_{0}^{-1} \mathbf {A}\) and \({\varvec{\Sigma }}_{0}=\sum _{i=1}^p\mathbf {U}_i\sigma _{i0}^2\) is based on some approximate values \(\sigma _{i0}^2\) of \(\sigma _i^2\). For example, in the case of the MINQUE method, we have the elements \(s_{ij}\) of \(\mathbf {S}\) and those \(q_i\) of \(\mathbf {q}\) as follows:

$$\begin{aligned} s_{ij}&= \mathrm tr \{{\varvec{\Sigma }}_{0}^{-1}(\mathbf {I}-\mathbf {Z}\!_A)\mathbf {U}_i {\varvec{\Sigma }}_{0}^{-1}(\mathbf {I}-\mathbf {Z}\!_A)\mathbf {U}_j\} \nonumber \\&= \mathrm tr \{ \mathbf {P}\mathbf {U}_i\mathbf {P}\mathbf {U}_j\} \end{aligned}$$
(15a)

for \(i,j=1,2,\ldots ,p\), and

$$\begin{aligned} q_i = \mathbf {r}^T{\varvec{\Sigma }}_{0}^{-1} \mathbf {U}_i {\varvec{\Sigma }}_{0}^{-1}\mathbf {r}, \end{aligned}$$
(15b)

for \(i=1,2,\ldots ,p\), where

$$\begin{aligned} \mathbf {P}&= {\varvec{\Sigma }}_{0}^{-1}(\mathbf {I} -\mathbf {Z}\!_A) \nonumber \\&= {\varvec{\Sigma }}_{0}^{-1}(\mathbf {I}-\mathbf {A}\mathbf {N}^{-1} \mathbf {A}^T{\varvec{\Sigma }}_{0}^{-1}) \nonumber \\&= {\varvec{\Sigma }}_{0}^{-1}(\mathbf {I}-\mathbf {H} {\varvec{\Sigma }}_{0}^{-1}) \nonumber \\&= {\varvec{\Sigma }}_{0}^{-1} \mathbf {R}, \end{aligned}$$
(15c)

with \(\mathbf {H}=\mathbf {A}\mathbf {N}^{-1}\mathbf {A}^T\) and \(\mathbf {R}=\mathbf {I}-\mathbf {H} {\varvec{\Sigma }}_{0}^{-1}.\) Here \(\mathrm tr (\cdot )\) stands for the trace of a square matrix.
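For a linear model, the assembly of \(\mathbf {S}\) and \(\mathbf {q}\) according to (14) and (15) can be sketched as follows (hypothetical helper name minque_system of our own choice); the same routine will be reused below with the design matrix (17) of the linearized EIV model:

```python
import numpy as np

def minque_system(A, r, U_list, Sigma0):
    """Elements of S (15a) and q (15b) for a linear model y = A beta + eps."""
    S0_inv = np.linalg.inv(Sigma0)
    N = A.T @ S0_inv @ A
    H = A @ np.linalg.solve(N, A.T)                        # H = A N^{-1} A^T
    P = S0_inv @ (np.eye(A.shape[0]) - H @ S0_inv)         # P of (15c)
    p = len(U_list)
    S = np.empty((p, p))
    q = np.empty(p)
    for i, Ui in enumerate(U_list):
        q[i] = r @ S0_inv @ Ui @ S0_inv @ r                # (15b)
        for j, Uj in enumerate(U_list):
            S[i, j] = np.trace(P @ Ui @ P @ Uj)            # (15a)
    return S, q
```

The variance components then follow from solving \(\mathbf {S}{\varvec{\sigma }}=\mathbf {q}\), with \({\varvec{\Sigma }}_{0}\) rebuilt from the previous estimates and the procedure repeated until convergence.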

In the linear case, \(\mathbf {A}\hat{{\varvec{\beta }}}=\mathbf {Z}\!_A\mathbf {y}\) is the weighted LS estimate of the observables \(\mathbf {A}{\varvec{\beta }}\). It is the closest point to \(\mathbf {y}\) from the linear subspace \(\mathcal {R}(\mathbf {A})\). In the case of a nonlinear model \(\mathbf {y}=\mathbf {f}({\varvec{\beta }})+{\varvec{\epsilon }}\), the least squares solution seeks the estimate \(\hat{{\varvec{\beta }}}\) such that \(\mathbf {f}(\hat{{\varvec{\beta }}})=\mathbf {Z}_{\mathcal {M}}(\mathbf {y})\) is the closest point to \(\mathbf {y}\) from the nonlinear manifold \(\mathcal {M}=\{\mathbf {f}({\varvec{\beta }})|{\varvec{\beta }}\in R^m\}\subset R^n\). In a small neighborhood of \(\mathbf {f}(\hat{{\varvec{\beta }}})\), the manifold \(\mathcal {M}\) is close to the linear manifold spanned by the columns of the matrix:

$$\begin{aligned} \mathbf {A}(\hat{{\varvec{\beta }}})=\left. \frac{\partial \mathbf {f}({\varvec{\beta }})}{\partial {\varvec{\beta }}^T}\right| _{{\varvec{\beta }}=\hat{{\varvec{\beta }}}} \end{aligned}$$

and the nonlinear projection \(\mathbf {Z}\!_{\mathcal {M}}(\mathbf {y})\) gives (approximately) the same value as \( \mathbf {Z}\!_{A(\hat{\beta })}\mathbf {y}\), where

$$\begin{aligned} \mathbf {Z}\!_{A(\hat{\beta })} = \mathbf {A}(\hat{{\varvec{\beta }}})\{ [\mathbf {A}(\hat{{\varvec{\beta }}})]^T{\varvec{\Sigma }}_{0}^{-1} \mathbf {A}(\hat{{\varvec{\beta }}})\}^{-1}[\mathbf {A}(\hat{{\varvec{ \beta }}})]^T{\varvec{\Sigma }}_{0}^{-1}. \end{aligned}$$
(16)

This means that a variance component estimation method can be adapted to a nonlinear model within a local approximation around \(\mathbf {f}(\hat{{\varvec{\beta }}})\), by simply replacing \(\mathbf {Z}\!_A\) with \(\mathbf {Z}\!_{A(\hat{\beta })}\) in the equations \(\mathbf {S}{\varvec{\sigma }}=\mathbf {q}\), and \((\mathbf {y}-\mathbf {A}\hat{{\varvec{\beta }}})\) with its nonlinear counterpart \(\mathbf {r}=\mathbf {y}-\mathbf {f}(\hat{{\varvec{\beta }}})\). It must be strongly emphasized that the method of linearization used here for obtaining the least squares solution \(\hat{{\varvec{\beta }}}\), and hence the residuals \(\mathbf {r}\) and the matrix \(\mathbf {A}(\hat{{\varvec{\beta }}})\), is irrelevant to the suggested adaptation. We have already given two alternative methods in Sects. 2.2 and 2.3.

We should also note that although the matrix \(\mathbf {S}\) is computed at the point \(\hat{{\varvec{\beta }}}\), it is mathematically admissible to treat \(\hat{{\varvec{\beta }}}\) as approximate values used only to linearize the nonlinear model, exactly as done, for example, by Marquardt (1963) and Pope (1972). In other words, it is appropriate to treat the matrix \(\mathbf {S}\) as non-random. Actually, one must clearly distinguish two basic concepts: the numerical solution and the statistical evaluation of the solution. From the point of view of the numerical solution, the matrix \(\mathbf {S}\) is simply a consequence of the linearization used to iterate towards the solution of a nonlinear model. One can avoid the matrix \(\mathbf {S}\) completely by using other numerical methods such as simulated annealing, genetic algorithms and/or deterministic global optimization methods. On the other hand, to statistically evaluate the accuracy of the optimal solution, it is incorrect to treat the matrix \(\mathbf {S}\) as random and then apply the error propagation law to compute the accuracy of the solution; instead, one must rely on the differential geometry of the original nonlinear model, as well documented in, for example, Bates and Watts (1980, 1988) and Seber and Wild (1989) for the error estimates of the weighted LS solution in nonlinear models.

3.2 Adapting the MINQUE method to the EIV model

The MINQUE method for the estimation of variance components was proposed by Rao (1971a) (see also Rao and Kleffe 1988; Searle et al. 1992) to minimize the Euclidean norm of \(\mathbf {B}\) in the quadratic form \(\mathbf {y}^T\mathbf {B}\mathbf {y}\) of the measurements in the standard linear model with a deterministic design/coefficient matrix \(\mathbf {A}\), subject to constraints such as invariance and unbiasedness. Since the functional EIV model (4) is nonlinear, the MINQUE method is not directly applicable to estimate the variance components in (2) and, as a result, does not necessarily possess all its optimal properties established for linear Gauss–Markoff models. From this point of view, the choice of the MINQUE method in this paper is basically due to the convenience of using the positive (semi-)definite matrices \(\mathbf {U}_{iy}\) and \(\mathbf {U}_{ia}\) to naturally construct quadratic forms.

For adapting the MINQUE method to the EIV model (5), we need only to note that the design matrix in (10) is accordingly given by

$$\begin{aligned} \mathbf {A}(\hat{{\varvec{\beta }}},\hat{\overline{\mathbf {a}}}) = \left[ \begin{array}{l@{\quad }l} \hat{\overline{\mathbf {A}}} &{} (\hat{{\varvec{\beta }}})^T \otimes \mathbf {I}_n \\ \mathbf {0} &{} \mathbf {I}_a \end{array} \right] . \end{aligned}$$
(17)

With the linearized EIV model (10) and the stochastic model (2), we can now adapt the MINQUE method to estimate the variance components. For brevity of notation, we collect the variance components of \(\mathbf {y}\) in (2a) and those of \(\mathbf {a}\) (or \(\mathbf {A}\)) in (2b) into two vectors \({\varvec{\sigma }}\!_y\) and \({\varvec{\sigma }}\!_a\), respectively. The residuals \(\mathbf {r}_y\) and \(\mathbf {r}_a\) of \(\mathbf {y}\) and \(\mathbf {a}\) from the nonlinear weighted TLS adjustment have been defined in (13a) and (13b), respectively.

More specifically, we compute all the corresponding matrices such as \(\mathbf {N}\), \(\mathbf {Z}_A\), \(\mathbf {P}\), \(\mathbf {H}\) and \(\mathbf {R}\) in (15) with the new design matrix \(\mathbf {A}(\hat{{\varvec{\beta }}},\hat{\overline{\mathbf {a}}})\) of (17) and denote the corresponding matrix \(\mathbf {P}\) of (15c) by:

$$\begin{aligned} \mathbf {P} = \left[ \begin{array}{cc} \mathbf {P}_y &{} \mathbf {P}_{ya} \\ \mathbf {P}_{ay} &{} \mathbf {P}_a \end{array} \right] . \end{aligned}$$

As a result, the corresponding MINQUE estimates of \({\varvec{\sigma }}\!_y\) and \({\varvec{\sigma }}\!_a\) are derived by solving the following system of linear equations:

$$\begin{aligned} \left[ \begin{array}{cc} \mathbf {S}_y &{} \mathbf {S}_{ya} \\ \mathbf {S}_{ay} &{} \mathbf {S}_a \end{array} \right] \left[ \begin{array}{c} \hat{{\varvec{\sigma }}}\!_y \\ \hat{{\varvec{\sigma }}}\!_a \end{array} \right] = \left[ \begin{array}{c} \mathbf {q}_y \\ \mathbf {q}_a \end{array} \right] , \end{aligned}$$
(18)

where the elements of \(\mathbf {S}_y\), \(\mathbf {S}_a\) and \(\mathbf {S}_{ya}\) are computed, respectively, by the following equations:

$$\begin{aligned} s_y^{ij}&= \mathrm tr \{ \mathbf {P}\mathbf {U}_i\mathbf {P} \mathbf {U}_j\} \nonumber \\&= \mathrm tr \left\{ \left[ \begin{array}{cc} \mathbf {P}_y &{} \mathbf {P}_{ya} \\ \mathbf {P}_{ay} &{} \mathbf {P}_a \end{array} \right] \left[ \begin{array}{cc} \mathbf {U}_{iy} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {0} \end{array} \right] \left[ \begin{array}{cc} \mathbf {P}_y &{} \mathbf {P}_{ya} \\ \mathbf {P}_{ay} &{} \mathbf {P}_a \end{array} \right] \left[ \begin{array}{cc} \mathbf {U}_{jy} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {0} \end{array} \right] \right\} \nonumber \\&= \mathrm tr (\mathbf {P}_y\mathbf {U}_{iy}\mathbf {P}_y\mathbf {U}_{jy}), \end{aligned}$$
(19a)

for \(i, j=1, 2,\ldots , m_y\),

$$\begin{aligned} s_a^{ij}&= \mathrm tr \{ \mathbf {P}\mathbf {U}_i\mathbf {P}\mathbf {U}_j\} \nonumber \\&= \mathrm tr \left\{ \left[ \begin{array}{cc} \mathbf {P}_y &{} \mathbf {P}_{ya} \\ \mathbf {P}_{ay} &{} \mathbf {P}_a \end{array} \right] \left[ \begin{array}{cc} \mathbf {0} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {U}_{ia} \end{array} \right] \left[ \begin{array}{cc} \mathbf {P}_y &{} \mathbf {P}_{ya} \\ \mathbf {P}_{ay} &{} \mathbf {P}_a \end{array} \right] \left[ \begin{array}{cc} \mathbf {0} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {U}_{ja} \end{array} \right] \right\} \nonumber \\&= \mathrm tr (\mathbf {P}_a\mathbf {U}_{ia}\mathbf {P}_a \mathbf {U}_{ja}), \end{aligned}$$
(19b)

for \(i, j=1, 2,\ldots , m_a\), and

$$\begin{aligned} s_{ya}^{ij}&= \mathrm tr \{ \mathbf {P}\mathbf {U}_i\mathbf {P}\mathbf {U}_j\} \nonumber \\&= \mathrm tr \left\{ \left[ \begin{array}{cc} \mathbf {P}_y &{} \mathbf {P}_{ya} \\ \mathbf {P}_{ay} &{} \mathbf {P}_a \end{array} \right] \left[ \begin{array}{cc} \mathbf {U}_{iy} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {0} \end{array} \right] \left[ \begin{array}{cc} \mathbf {P}_y &{} \mathbf {P}_{ya} \\ \mathbf {P}_{ay} &{} \mathbf {P}_a \end{array} \right] \left[ \begin{array}{cc} \mathbf {0} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {U}_{ja} \end{array} \right] \right\} \nonumber \\&= \mathrm tr (\mathbf {P}_{ay} \mathbf {U}_{iy}\mathbf {P}_{ya} \mathbf {U}_{ja}), \end{aligned}$$
(19c)

for \(i=1, 2,\ldots , m_y\) and \(j=1, 2,\ldots , m_a\). The matrices \(\mathbf {P}_y\), \(\mathbf {P}_a\) and \(\mathbf {P}_{ya}\) are respectively equal to

$$\begin{aligned}&\mathbf {P}_y = {\varvec{\Sigma }}_{0y}^{-1} - {\varvec{\Sigma }}_{0y}^{-1}\mathbf {H}_y{\varvec{\Sigma }}_{0y}^{-1},\\&\mathbf {P}_a = {\varvec{\Sigma }}_{0a}^{-1} - {\varvec{\Sigma }}_{0a}^{-1}\mathbf {H}_a{\varvec{\Sigma }}_{0a}^{-1}, \end{aligned}$$

and

$$\begin{aligned} \mathbf {P}_{ya} = - {\varvec{\Sigma }}_{0y}^{-1}\mathbf {H}_{ya}{\varvec{\Sigma }}_{0a}^{-1}. \end{aligned}$$

The matrices \(\mathbf {H}_y\), \(\mathbf {H}_a\) and \(\mathbf {H}_{ya}\) \((=\mathbf {H}_{ay}^T)\) have been given in the Appendix. The elements of the two (sub-)vectors \(\mathbf {q}_y\) and \(\mathbf {q}_a\) on the right hand side of (18) are defined as

$$\begin{aligned} q_y^i&= \left[ \begin{array}{c} \mathbf {r}_y \\ \mathbf {r}_a \end{array} \right] ^T \left[ \begin{array}{cc} {\varvec{\Sigma }}_{0y}^{-1} &{} \mathbf {0} \\ \mathbf {0} &{} {\varvec{\Sigma }}_{0a}^{-1} \end{array} \right] \left[ \begin{array}{cc} \mathbf {U}_{iy} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {0} \end{array} \right] \left[ \begin{array}{cc} {\varvec{\Sigma }}_{0y}^{-1} &{} \mathbf {0} \\ \mathbf {0} &{} {\varvec{\Sigma }}_{0a}^{-1} \end{array} \right] \left[ \begin{array}{c} \mathbf {r}_y \\ \mathbf {r}_a \end{array} \right] \nonumber \\&= \mathbf {r}_y^T{\varvec{\Sigma }}_{0y}^{-1}\mathbf {U}_{iy}{\varvec{ \Sigma }}_{0y}^{-1}\mathbf {r}_y, \end{aligned}$$
(20a)

for \(i=1, 2,\ldots , m_y\), and

$$\begin{aligned} q_a^i&= \left[ \begin{array}{c} \mathbf {r}_y \\ \mathbf {r}_a \end{array} \right] ^T \left[ \begin{array}{cc} {\varvec{\Sigma }}_{0y}^{-1} &{} \mathbf {0} \\ \mathbf {0} &{} {\varvec{\Sigma }}_{0a}^{-1} \end{array} \right] \left[ \begin{array}{cc} \mathbf {0} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {U}_{ia} \end{array} \right] \left[ \begin{array}{cc} {\varvec{\Sigma }}_{0y}^{-1} &{} \mathbf {0} \\ \mathbf {0} &{} {\varvec{\Sigma }}_{0a}^{-1} \end{array} \right] \left[ \begin{array}{c} \mathbf {r}_y \\ \mathbf {r}_a \end{array} \right] \nonumber \\&= \mathbf {r}_a^T{\varvec{\Sigma }}_{0a}^{-1}\mathbf {U}_{ia}{\varvec{ \Sigma }}_{0a}^{-1}\mathbf {r}_a, \end{aligned}$$
(20b)

for \(i=1, 2,\ldots , m_a\).

Substituting (19) and (20) into (18), we can readily obtain the MINQUE estimates of the variance components \({\varvec{\sigma }}\!_y\) and \({\varvec{\sigma }}\!_a\):

$$\begin{aligned} \left[ \begin{array}{c} \hat{{\varvec{\sigma }}}\!_y \\ \hat{{\varvec{\sigma }}}\!_a \end{array} \right] = \left[ \begin{array}{cc} \mathbf {S}_y &{} \mathbf {S}_{ya} \\ \mathbf {S}_{ay} &{} \mathbf {S}_a \end{array} \right] ^{-1} \left[ \begin{array}{c} \mathbf {q}_y \\ \mathbf {q}_a \end{array} \right] , \end{aligned}$$
(21)

if the coefficient matrix of (18) is regular or invertible.

Before closing this subsection, we make four remarks on the MINQUE estimates of the variance components \({\varvec{\sigma }}\!_y\) and \({\varvec{\sigma }}\!_a\). (i) Although we have derived the estimates of the variance components through the linearization of the nonlinear EIV model, an alternative approach is to directly construct MINQUE-like quadratic forms with the nonlinear residuals, expand the nonlinear residuals to the linear approximation, apply the expectation operators to the quadratic forms with the linearized residuals and finally derive the equations (18) by removing the expectation operators, as can be clearly seen in Sect. 5; (ii) following the method of estimability analysis by Xu et al. (2007a), the total number of variance components \({\varvec{\sigma }}\!_y\) and \({\varvec{\sigma }}\!_a\) cannot be larger than \((n-m)\); otherwise, the coefficient matrix in (18) would not be regular anymore and, as a result, \({\varvec{\sigma }}\!_y\) and \({\varvec{\sigma }}\!_a\) would not be estimable; (iii) the MINQUE estimates of \({\varvec{\sigma }}\!_y\) and \({\varvec{\sigma }}\!_a\) are exact only under the assumption of linear models. Since the original EIV model (1), or equivalently (4), is essentially nonlinear, we can only adapt the MINQUE method to estimate \({\varvec{\sigma }}\!_y\) and \({\varvec{\sigma }}\!_a\), as given by (21). Thus, from the theoretical point of view, all the good statistical properties of the MINQUE estimates of variance components in the linear model, such as invariance, unbiasedness and minimum variance (in the case of normal distributions), no longer hold in the nonlinear EIV model (4); and (iv) if the variance components are estimable, one can start with an initial set of variance components \(\sigma _{0iy}^2 \, (i=1,2,\ldots ,m_y)\) and \(\sigma _{0ia}^2 \, (i=1,2,\ldots ,m_a)\), and then use (18) to compute the iterative MINQUE estimates of \({\varvec{\sigma }}\!_y\) and \({\varvec{\sigma }}\!_a\) numerically, with the elements of \(\mathbf {q}_y\) and \(\mathbf {q}_a\) given by (20).
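To illustrate remark (iv), the following sketch (reusing the hypothetical helpers weighted_tls and minque_system defined in the earlier sketches) carries out one outer iteration of (17)-(21): it solves the weighted TLS problem with the current variance components, forms the residuals (13), builds the design matrix (17) and solves (18):

```python
import numpy as np
from scipy.linalg import block_diag

def eiv_minque_step(y, A, U_y_list, U_a_list, sig2_y, sig2_a):
    """One outer iteration of the variance component estimation (18)-(21)."""
    n, m = A.shape
    Sigma0y = sum(s * U for s, U in zip(sig2_y, U_y_list))
    Sigma0a = sum(s * U for s, U in zip(sig2_a, U_a_list))
    # weighted TLS with the current variance components (Sect. 2.2)
    beta, a_bar = weighted_tls(y, A, Sigma0y, Sigma0a)
    A_bar = a_bar.reshape((n, m), order="F")
    r = np.concatenate([y - A_bar @ beta,
                        A.reshape(-1, order="F") - a_bar])        # residuals (13)
    # design matrix (17) of the linearized EIV model
    J = np.block([[A_bar, np.kron(beta.reshape(1, -1), np.eye(n))],
                  [np.zeros((n * m, m)), np.eye(n * m)]])
    # padded cofactor matrices U_i of (6) and the MINQUE system (18)
    U_full = [block_diag(U, np.zeros((n * m, n * m))) for U in U_y_list] + \
             [block_diag(np.zeros((n, n)), U) for U in U_a_list]
    S, q = minque_system(J, r, U_full, block_diag(Sigma0y, Sigma0a))
    sig2 = np.linalg.solve(S, q)                                  # (21), if S is regular
    return beta, sig2[:len(sig2_y)], sig2[len(sig2_y):]
```

Repeating eiv_minque_step with the newly estimated variance components gives the iterative MINQUE procedure described in remark (iv).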

3.3 Two special structures of variance components

The first special structure of variance components assumes only two variance components in (2), one for the measurements \(\mathbf {y}\) and the other for the measured matrix data \(\mathbf {a}\). As a result, the variance component model (2) can now be rewritten as follows:

$$\begin{aligned} {\varvec{\Sigma }}_y&= \mathbf {W}_y^{-1} \sigma _y^2,\\ {\varvec{\Sigma }}_a&= \mathbf {W}_a^{-1} \sigma _a^2, \end{aligned}$$
(22)

where \(\mathbf {W}_y\) and \(\mathbf {W}_a\) are the weight matrices of \(\mathbf {y}\) and \(\mathbf {a}\), respectively.

With the stochastic structure (22), and according to Horn et al. (1975) and Rao and Kleffe (1988), it is easy to prove that when the MINQUE estimates of \(\sigma _y^2\) and \(\sigma _a^2\) obtained from (18) in association with the linearized observation model (10) converge, they can, after some (light) derivations, be equivalently written in the following simplified form:

$$\begin{aligned} \widehat{\sigma _y^2} = \mathbf {r}_y^T\mathbf {W}_y\mathbf {r}_y / \mathrm tr (\mathbf {R}_y), \end{aligned}$$
(23a)

to estimate the variance component \(\sigma _y^2\), and

$$\begin{aligned} \widehat{\sigma _a^2} = \mathbf {r}_a^T\mathbf {W}_a\mathbf {r}_a / \mathrm tr (\mathbf {R}_a), \end{aligned}$$
(23b)

to estimate the variance component \(\sigma _a^2\). \(\mathbf {R}_y\) and \(\mathbf {R}_a\) have been defined in (49) of the Appendix, except that the initial values of \(\sigma _y^2\) and \(\sigma _a^2\) are now replaced by \(\widehat{\sigma _y^2}\) and \(\widehat{\sigma _a^2}\), respectively. Actually, the two quantities \(\mathrm tr (\mathbf {R}_y)\) and \(\mathrm tr (\mathbf {R}_a)\) serve as the degrees of freedom of \(\widehat{\sigma _y^2}\) and \(\widehat{\sigma _a^2}\), respectively, and are often called the numbers of redundant observations in the geodetic literature. Since both \(\mathbf {R}_y\) and \(\mathbf {R}_a\) contain \(\widehat{\sigma _y^2}\) and \(\widehat{\sigma _a^2}\), the estimates of the variance components given by (23) have to be computed iteratively, as sketched below.
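A sketch of this iteration for the two-component structure (22) is given below (reusing the hypothetical weighted_tls helper). Note that \(\mathrm tr (\mathbf {R}_y)\) and \(\mathrm tr (\mathbf {R}_a)\) are computed here as the traces of the \(y\)- and \(a\)-blocks of \(\mathbf {R}=\mathbf {I}-\mathbf {H}{\varvec{\Sigma }}_{0}^{-1}\) from (15c); this is our reading of (49) in the Appendix and should be checked against it:

```python
import numpy as np
from scipy.linalg import block_diag

def two_component_vce(y, A, W_y, W_a, sig2_y0=1.0, sig2_a0=1.0,
                      tol=1e-8, max_iter=50):
    """Iterate the simplified estimators (23a) and (23b) for the structure (22)."""
    n, m = A.shape
    sig2_y, sig2_a = sig2_y0, sig2_a0
    for _ in range(max_iter):
        Sigma0y = np.linalg.inv(W_y) * sig2_y
        Sigma0a = np.linalg.inv(W_a) * sig2_a
        beta, a_bar = weighted_tls(y, A, Sigma0y, Sigma0a)
        A_bar = a_bar.reshape((n, m), order="F")
        r_y = y - A_bar @ beta
        r_a = A.reshape(-1, order="F") - a_bar
        # redundancy matrix R = I - H Sigma_0^{-1}, cf. (15c)
        J = np.block([[A_bar, np.kron(beta.reshape(1, -1), np.eye(n))],
                      [np.zeros((n * m, m)), np.eye(n * m)]])
        Sigma0 = block_diag(Sigma0y, Sigma0a)
        H = J @ np.linalg.solve(J.T @ np.linalg.solve(Sigma0, J), J.T)
        R = np.eye(n + n * m) - H @ np.linalg.inv(Sigma0)
        new_y = (r_y @ W_y @ r_y) / np.trace(R[:n, :n])           # (23a)
        new_a = (r_a @ W_a @ r_a) / np.trace(R[n:, n:])           # (23b)
        if abs(new_y - sig2_y) < tol and abs(new_a - sig2_a) < tol:
            return beta, new_y, new_a
        sig2_y, sig2_a = new_y, new_a
    return beta, sig2_y, sig2_a
```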

The second special structure of variance components assumes one variance component for the measured matrix data \(\mathbf {a}\) only and further assumes that the measurements \(\mathbf {y}\) consist of a number of independent subgroups or different types of measurements. The corresponding variance component model (2) can then be rewritten as follows:

$$\begin{aligned} {\varvec{\Sigma }}_y = \left[ \begin{array}{llll} \mathbf {W}_1^{-1}\sigma _{1y}^2 &{} \mathbf {0} &{} \cdots &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {W}_2^{-1}\sigma _{2y}^2 &{} \cdots &{} \mathbf {0} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ \mathbf {0} &{} \mathbf {0} &{} \cdots &{} \mathbf {W}_{m_y}^{-1} \sigma _{m_yy}^2 \end{array} \right] , \end{aligned}$$
(24a)
$$\begin{aligned} {\varvec{\Sigma }}_a = \mathbf {W}_a^{-1} \sigma _a^2, \end{aligned}$$
(24b)

where \(\mathbf {W}_i\) are the weight matrices of the corresponding subgroups or types of measurements \(\mathbf {y}\), respectively, and \(\sigma _{iy}^2 \, (i=1,2,\ldots ,m_y)\) are the corresponding variance components of \(\mathbf {y}\).

Following the same rationale as in the derivation of (23), we can readily obtain the estimates of the variance components \(\sigma _{iy}^2 \, (i=1,2,\ldots ,m_y)\) and \(\sigma _a^2\) below:

$$\begin{aligned} \widehat{\sigma _{iy}^2} = \mathbf {r}_{iy}^T\mathbf {W}_i\mathbf {r}_{iy} / \mathrm tr (\mathbf {R}_{iy}), \end{aligned}$$
(25a)

for \(i=1,2,\ldots ,m_y\), and

$$\begin{aligned} \widehat{\sigma _a^2} = \mathbf {r}_a^T\mathbf {W}_a\mathbf {r}_a / \mathrm tr (\mathbf {R}_a), \end{aligned}$$
(25b)

where \(\mathbf {r}_{iy}\) and \(\mathbf {R}_{iy}\) are the residual sub-vectors of \(\mathbf {r}_y\) and the diagonal submatrices of \(\mathbf {R}_y\), corresponding to the ith subgroup or type of measurements \(\mathbf {y}\), respectively.

4 Estimability and stability analysis

From the point of view of parameter estimation, the EIV model (1) can be equivalently reformulated as a nonlinear Gauss–Markoff model. However, as far as the variance components are concerned, the combination of the functional model (1) with the stochastic model (2) is very special. Looking at the second part (3b) or (4b) of the EIV models, we clearly see that the measurements \(\mathbf {A}\) contribute no direct redundant measurements to the estimation of the variance components of \(\mathbf {A}\). In this section, we will investigate how this peculiarity affects the estimation of the variance components described in (2a) and (2b). We emphasize that the following proof of the inestimability of variance components, in association with the EIV model (4) and the stochastic model (27), depends solely on quadratic forms of the residuals of the measurements \(\mathbf {y}\) and \(\mathbf {A}\); the proof is independent of the methods used for variance component estimation, even though the MINQUE and/or Helmert methods may be formally applied to derive an estimator of the variance components, as is the case in Sect. 3.

4.1 Estimability analysis

For a standard linear model, namely, the linear model (1) with a deterministic (or non-random) coefficient matrix \(\mathbf {A}\), together with the stochastic model (2a), a linear function \(\mathbf {f}^T{\varvec{\sigma }}\) of the variance components is said to be unbiasedly estimable, if there exists a quadratic estimator \(\mathbf {y}^T\mathbf {B}\mathbf {y}\) such that

$$\begin{aligned} E(\mathbf {y}^T\mathbf {B}\mathbf {y}) = \mathbf {f}^T{\varvec{\sigma }}, \end{aligned}$$
(26)

where \(\mathbf {f}\) is a given vector, \({\varvec{\sigma }}\) is the vector consisting of all the variance components of \(\mathbf {y}\) to be estimated, and \(\mathbf {B}\) is symmetric and should be independent of both \({\varvec{\beta }}\) and \({\varvec{\sigma }}\) (or \({\varvec{\Sigma }}_y\)). Simply speaking, the variance components are estimable if they can be uniquely determined from the derived normal equations for the variance components; in other words, if the normal matrix of the linear equations for the variance components is regular (invertible). For more details on the concept of estimability, the reader is referred to Pincus (1974), Schaffrin (1983), Rao and Kleffe (1988) and Xu et al. (2007a).

In this section, the question of our concern is whether the variance components of the EIV model (1) can be uniquely determined from the given measurements \(\mathbf {y}\) and \(\mathbf {A}\). To start with an estimability analysis, let us assume the commonly encountered stochastic structure (2a) for the measurements \(\mathbf {y}\) and further assume that the elements in each column of \(\mathbf {A}\) are of the same type of measurements with the same accuracy. In other words, the variance structure of the elements in the \(j\)th column of \(\mathbf {A}\) can be represented by one unknown variance component, i.e., \({\varvec{\Sigma }}_{aj}=\mathbf {I}\,\sigma ^2_{ja}\). Accordingly, the stochastic model (2b) for the random matrix \(\mathbf {A}\) can be rewritten as follows:

$$\begin{aligned} {\varvec{\Sigma }}_a = \sum \limits _{i=1} ^m \mathbf {U}_{ia} \sigma _{ia}^2 = \mathrm diag (\sigma _{ia}^2)\otimes \mathbf {I}_n, \end{aligned}$$
(27)

where \(\mathbf {U}_{ia}\) is a block-diagonal matrix whose \(i\)th diagonal block is equal to the identity matrix \(\mathbf {I}_n\) and whose remaining blocks are zero, and \(\mathrm diag (\sigma _{ia}^2)\) is a diagonal matrix whose \(i\)th diagonal element is equal to \(\sigma _{ia}^2\).

Under the assumption of (27) and given a set of initial values \(\sigma _{0iy}^2 \, (i=1,2,\ldots ,m_y)\) and \(\sigma _{0ia}^2 \, (i=1,2,\ldots ,m)\), we can obtain

$$\begin{aligned} {\varvec{\Sigma }}_{0a}(\hat{{\varvec{\beta }}}\otimes \mathbf {I}_n) =( \mathrm diag (\sigma _{0ia}^2)\otimes \mathbf {I}_n)(\hat{{\varvec{\beta }}}\otimes \mathbf {I}_n) = \mathbf {d}_{0\beta }\otimes \mathbf {I}_n,\nonumber \\ \end{aligned}$$
(28a)

where

$$\begin{aligned} \mathbf {d}_{0\beta } = \mathrm diag (\sigma _{0ia}^2)\hat{{\varvec{\beta }}} = \left[ \begin{array}{c} \sigma _{01a}^2\hat{\beta }_1 \\ \sigma _{02a}^2\hat{\beta }_2 \\ \vdots \\ \sigma _{0ma}^2\hat{\beta }_m \end{array} \right] . \end{aligned}$$

Inserting (28a) into (9c) yields

$$\begin{aligned} \mathbf {E} = {\varvec{\Sigma }}_{0y} + c_{\beta } \mathbf {I}_n, \end{aligned}$$
(28b)

where

$$\begin{aligned} c_{\beta } =\mathbf {d}_{0\beta }^T\hat{{\varvec{\beta }}} = \sum \limits _{i=1} ^m \sigma _{0ia}^2\hat{\beta }_i^2. \end{aligned}$$

Substituting (28a) and (28b) into (9d), we can readily obtain the weighted TLS estimate for each column of \(\overline{\mathbf {A}}\), which is denoted by \(\hat{\overline{\mathbf {a}}}_i\) and given as follows:

$$\begin{aligned} \hat{\overline{\mathbf {a}}}_i = \mathbf {a}_i + \sigma _{0ia}^2\hat{\beta }_i\mathbf {E}^{-1} ( \mathbf {y} - \mathbf {A}\hat{{\varvec{\beta }}} ) \end{aligned}$$
(29)

for \(i=1,\,2,\,\ldots ,\,m\), where \(\hat{\overline{\mathbf {a}}}_i\) and \(\mathbf {a}_i\) are the ith columns of \(\hat{\overline{\mathbf {A}}}\) and \(\mathbf {A}\), respectively. As a result, the residual vector for each column of \(\mathbf {A}\) is equal to

$$\begin{aligned} \mathbf {r}_{ia} = - \sigma _{0ia}^2\hat{\beta }_i\mathbf {E}^{-1} ( \mathbf {y} - \mathbf {A}\hat{{\varvec{\beta }}} ) \end{aligned}$$
(30)

for \(i=1,\,2,\,\ldots ,\,m\). It is surprising to see from (30) that all the residual vectors \(\mathbf {r}_{ia}\) of the columns of \(\mathbf {A}\) are proportional to the transformed vector \(\mathbf {E}^{-1} ( \mathbf {y} - \mathbf {A}\hat{{\varvec{\beta }}} )\). In other words, under the stochastic models (2a) and (27), the correction vector to each column of \(\mathbf {A}\) is proportional to \(\mathbf {E}^{-1} ( \mathbf {y} - \mathbf {A}\hat{{\varvec{\beta }}} )\), with a coefficient that depends on the initial value of the variance component and the estimated parameter associated with the corresponding column of \(\mathbf {A}\), i.e., \(\sigma _{0ia}^2\) and \(\hat{\beta }_i\). More precisely, if we denote the ratios of the corresponding elements among the residual vectors \(\mathbf {r}_{ia}\) by \(\mathbf {r}_{1a}:\mathbf {r}_{2a}:\ldots :\mathbf {r}_{ma}\), then we have

$$\begin{aligned} \mathbf {r}_{1a}:\mathbf {r}_{2a}:\ldots :\mathbf {r}_{ma} = (\sigma _{01a}^2\hat{\beta }_1):(\sigma _{02a}^2\hat{\beta }_2) :\ldots : (\sigma _{0ma}^2\hat{\beta }_m).\nonumber \\ \end{aligned}$$
(31)

Since the residual vectors \(\mathbf {r}_{ia}\) of (30) are the starting point of any variance component estimation, and taking into account the following equality:

$$\begin{aligned} E(\mathbf {r}_a^T{\varvec{\Sigma }}_{0a}^{-1}\mathbf {U}_{ia}{\varvec{ \Sigma }}_{0a}^{-1}\mathbf {r}_a) = E(\mathbf {r}_{ia}^T\mathbf {r}_{ia})/\sigma _{0ia}^4, \end{aligned}$$

we have

$$\begin{aligned}&E(\mathbf {r}_{1a}^T\mathbf {r}_{1a}/\sigma _{01a}^4) : E(\mathbf {r}_{2a}^T\mathbf {r}_{2a}/\sigma _{02a}^4):\ldots : E(\mathbf {r}_{ma}^T\mathbf {r}_{ma}/\sigma _{0ma}^4)\nonumber \\&\quad = \beta _1^2:\beta _2^2 :\ldots : \beta _m^2, \end{aligned}$$
(32)

up to the second-order approximation of \({{\varvec{\epsilon }}}\) and \({{\varvec{\epsilon }}}_a\). The ratio (32) is mathematically equivalent to saying that the submatrix \(\mathbf {S}_a\) in the system of equations for the variance components is singular. Thus, we can immediately conclude from (30), (31) and (32) that the variance components under the combination of the stochastic models (2a) and (27) are not estimable. This result of inestimability indicates that we have no way of gaining any knowledge about such an EIV stochastic model from the measurements \(\mathbf {y}\) and \(\mathbf {A}\). The result remains valid even if there are only two unknown variance components for the elements of \(\mathbf {A}\), as long as each group of elements is of the same accuracy. The proof is trivial, since one can simply add constraints on \(\sigma _{ia}^2 \, (i=1,2,\ldots ,m)\) in (27).
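The proportionality (31), and hence the singularity of \(\mathbf {S}_a\), can be illustrated numerically with a small simulation (hypothetical data of our own choice, reusing the weighted_tls sketch of Sect. 2.2):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 30, 3
A_true = rng.standard_normal((n, m))
beta_true = np.array([1.0, -2.0, 0.5])
sig2_y_true, sig2_a_true = 0.04, np.array([0.01, 0.09, 0.25])

y = A_true @ beta_true + rng.normal(0.0, np.sqrt(sig2_y_true), n)
A = A_true + rng.standard_normal((n, m)) * np.sqrt(sig2_a_true)   # column-wise errors

Sigma0y = 0.05 * np.eye(n)                      # some initial variance components
sig2_a0 = np.array([0.02, 0.05, 0.30])
Sigma0a = np.kron(np.diag(sig2_a0), np.eye(n))  # stochastic model (27)

beta_hat, a_bar_hat = weighted_tls(y, A, Sigma0y, Sigma0a)
res_A = (A.reshape(-1, order="F") - a_bar_hat).reshape((n, m), order="F")

# by (30)/(31), column i of res_A is proportional to sig2_a0[i] * beta_hat[i];
# after rescaling, all m columns are (numerically) identical
print(res_A / (sig2_a0 * beta_hat))
```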

If we further assume \({\varvec{\Sigma }}_y = \sigma _y^2\mathbf {I}_n\), then (29) and (30), respectively, become

$$\begin{aligned} \hat{\overline{\mathbf {a}}}_i = \mathbf {a}_i + \frac{\sigma _{0ia}^2\hat{\beta }_i}{\sigma _{0y}^2+c_{\beta }} ( \mathbf {y} - \mathbf {A}\hat{{\varvec{\beta }}} ) \end{aligned}$$
(33)

and

$$\begin{aligned} \mathbf {r}_{ia} = - \frac{\sigma _{0ia}^2\hat{\beta }_i}{\sigma _{0y}^2+c_{\beta }} ( \mathbf {y} - \mathbf {A}\hat{{\varvec{\beta }}} ) \end{aligned}$$
(34)

for \(i=1,\,2,\,\ldots ,\,m\). Obviously, the residual vectors (34) also satisfy the ratio relation (31).

For the EIV model of regression type, namely,

$$\begin{aligned} \mathbf {y} = \mathbf {1}\beta _0 + \mathbf {A}_1 {\varvec{\beta }}_1 + {\varvec{\epsilon }}, \end{aligned}$$
(35)

where \(\mathbf {y}\) and \({\varvec{\epsilon }}\) have been defined as in (1), \(\mathbf {1}\) is a column vector with each element equal to unity, \(\beta _0\) is a scalar and \({\varvec{\beta }}_1\) an \((m\times 1)\) vector of unknown parameters to be estimated, and the elements of \(\mathbf {A}_1\) are measured with random errors, our result of inestimability remains valid, if \(\mathbf {A}_1\) has the same stochastic structure as defined by (27) (with a varying number of variance components \(\sigma _{ia}^2\) between two and \(m\)). The proof can be completed by following the partial EIV formulation of Xu et al. (2012) and then using the same rationale as used to prove the inestimability in this section. Since the derivations are purely technical, we do not repeat them here.

4.2 Stability analysis

For arbitrary positive semi-definite matrices \(\mathbf {U}_{iy}\) and \(\mathbf {U}_{ia}\) in the stochastic models (2a) and (2b), we cannot obtain elegant inestimability results such as those given above for the stochastic model (27). Nevertheless, after taking (9d) into account, we can rewrite the residuals (13) of the measurements \(\mathbf {y}\) and \(\mathbf {a}\) as follows:

$$\begin{aligned} \mathbf {r}_y = \mathbf {y} - \hat{\overline{\mathbf {A}}} \hat{{\varvec{\beta }}}, \end{aligned}$$
(36a)
$$\begin{aligned} \mathbf {r}_a = -{\varvec{\Sigma }}_{0a}(\hat{{\varvec{\beta }}}\otimes \mathbf {I}_n)\mathbf {E}^{-1} ( \mathbf {y} - \mathbf {A}\hat{{\varvec{\beta }}} ). \end{aligned}$$
(36b)

Since \(\hat{\overline{\mathbf {A}}}\) is essentially the sum of \(\mathbf {A}\) and its correction matrix \(\delta \!\mathbf {A}\), which can be directly reconstructed from \(-\mathbf {r}_a\), i.e., \(\mathrm{vec} (\delta \!\mathbf {A})=-\mathbf {r}_a\), we can rewrite (36a) as

$$\begin{aligned} \mathbf {r}_y = \mathbf {y} - \mathbf {A} \hat{{\varvec{\beta }}} - \delta \!\mathbf {A}\hat{{\varvec{\beta }}}. \end{aligned}$$
(36c)

Obviously, if \(\delta \!\mathbf {A}\) is small, then both the residual vectors \(\mathbf {r}_y\) and \(\mathbf {r}_a\) will be dominated by \((\mathbf {y} - \mathbf {A} \hat{{\varvec{\beta }}})\). It is also clear from (36b) that the \((n\times m)\) residual elements of \(\mathbf {A}\) are completely determined by only the \(n\) elements of \((\mathbf {y} - \mathbf {A} \hat{{\varvec{\beta }}})\), which are further related to the residuals \(\mathbf {r}_y\) of \(\mathbf {y}\). From this point of view, the condition of the coefficient matrix in (18) for the estimation of variance components may not be very good.

In this case, even if the variance components of the stochastic models (2a) and (2b) are theoretically estimable, we would like to know whether they can be accurately estimated. A proper measure of this aspect of the estimation is the stability of the estimated variance components or, more precisely, the condition number of the coefficient matrix of the linear equations (18). Actually, if the coefficient matrix of (18) has zero eigenvalues, then the variance components are not estimable. Obviously, the condition number can be computed as soon as the functional EIV model (1) and its corresponding stochastic models (2a) and (2b) are given. To give the reader an impression of stability, we will simulate some examples and show their condition numbers in this section.

In fact, before starting the research on this topic, we had in mind an application of the methods developed in this paper to the Atlantic sea surface temperature data first reported by Mann and Emanuel (2006) and used by Schaffrin and Wieser (2008). With the above estimability analysis, it becomes clear that we cannot gain any knowledge of the stochastic nature of this data set, since a reasonable assumption for data of this type would be the stochastic model (27). Dam deformation analysis is another example of this type. Thus, we will instead use problems of this type with only two model parameters \(\beta _1\) and \(\beta _2\) to demonstrate the stability of the linear equations (18).

More specifically, the simulated examples can be represented by the following equation:

$$\begin{aligned} y_i = a_{i1} \beta _1 + a_{i2} \beta _2 + \epsilon _i, \quad \,\, (i=1,\,2,\,\ldots ,\,n). \end{aligned}$$
(37)

We assume three variance components, namely, one for \(y_i\), one for \(a_{i1}\) and one for \(a_{i2}\). The number of measurements \(\mathbf {y}\) is set to 120. For each experiment, we randomly generate the two true values of the parameters \(\beta _1\) and \(\beta _2\) from the interval \([-5.0,\, 5.0]\) and the true values of \(a_{i1}\) and \(a_{i2}\) from \([0.1,\, 100.0]\). Although the stochastic model (27) has been shown to be inestimable, we first use this simulated example to numerically demonstrate the inestimability. For this purpose, we use the uniform distribution over \([0.1,\, 1.0]\) to generate the weights for \(y_i\) and assume the stochastic model \({{\varvec{\Sigma }}}_{ai}=\mathbf {I}_n\sigma _{ia}^2 \, (i=1,2)\) for the random elements of each column of \(\mathbf {A}\). Applying the MINQUE method to the simulated example, we obtain the following system of equations for the estimation of the variance components \(\sigma _y^2\), \(\sigma _{1a}^2\) and \(\sigma _{2a}^2\), namely,

$$\begin{aligned}&\left[ \begin{array}{l@{\quad }l@{\quad }l} 33.900162 &{} 0.725844 &{} 3.113569 \\ 0.725844 &{} 3.935471 &{} 16.881530 \\ 3.113569 &{} 16.881530 &{} 72.414734 \end{array} \right] \left[ \begin{array}{c} \widehat{\sigma _y^2} \\ \widehat{\sigma _{1a}^2} \\ \widehat{\sigma _{2a}^2} \end{array} \right] \nonumber \\&\qquad =\left[ \begin{array}{r} 184.037631 \\ 926.564864 \\ 3974.577300 \end{array} \right] . \end{aligned}$$
(38)

It is easy to show that the coefficient matrix of (38), denoted by \(\mathbf {S}\), is numerically singular, with the three eigenvalues being equal to \(0.000000\), \(33.660731\) and \(76.589635\), respectively. The elements of the last two rows of \(\mathbf {S}\), together with the last two elements of the constant vector on the right hand side of (38), satisfy the following relationship:

$$\begin{aligned} \frac{\mathbf {S}(3,1)}{\mathbf {S}(2,1)} = \frac{\mathbf {S}(3,2)}{\mathbf {S}(2,2)} = \frac{\mathbf {S}(3,3)}{\mathbf {S}(2,3)} = \frac{3974.577300}{926.564864} = 4.28958. \end{aligned}$$
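
These values can be verified directly from the numbers printed in (38); a minimal numerical check (Python/numpy) of both the numerically zero eigenvalue and the constant row ratio reads:

import numpy as np

S = np.array([[33.900162,  0.725844,  3.113569],
              [ 0.725844,  3.935471, 16.881530],
              [ 3.113569, 16.881530, 72.414734]])
q = np.array([184.037631, 926.564864, 3974.577300])

print(np.linalg.eigvalsh(S))      # approximately [0.0, 33.660731, 76.589635]
print(S[2] / S[1], q[2] / q[1])   # all ratios approximately 4.28958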

Now we will use non-identity weight matrices for \(a_{i1}\) and \(a_{i2}\) to show the instability aspect of variance component estimation in the EIV model (1). The weight matrices for \(y_i\), \(a_{i1}\) and \(a_{i2}\) are assumed to be diagonal and generated using uniform distributions. More specifically, the weights for \(y_i\) are drawn from \([0.1,\, 1.0]\) and those for \(a_{i1}\) and \(a_{i2}\) from \([0.1,\, 100.0]\). The experiment is repeated 200 times. Since we may have no prior knowledge of the values of the variance components, we obtain the first set of 200 condition numbers by setting the initial values of the variance components to unity. For the purpose of comparison, we also use the first iteration result of the componentwise positive variance estimators (25) to obtain a set of better approximate values for the variance components and then compute the 200 condition numbers of the coefficient matrix of the linear equations (18). These two sets of condition numbers, in logarithmic scale, are shown in Fig. 1, with the blue solid line corresponding to the first set of approximate values and the dotted red line to the second set. It is clear from the blue solid line of this figure that the coefficient matrix of (18) can often be ill-conditioned, but if very good approximate values of the variance components are available, the condition of the coefficient matrix can be significantly improved. Nevertheless, the coefficient matrix can still become rather ill-conditioned from time to time. We should like to warn, however, that a small condition number may not automatically mean a good estimation of the variance components, since all the residuals of \(\mathbf {y}\) and \(\mathbf {A}\) are determined either by \(( \mathbf {y} - \hat{\overline{\mathbf {A}}}\hat{{\varvec{\beta }}} )\) or \(( \mathbf {y} - \mathbf {A}\hat{{\varvec{\beta }}} )\).
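
The following sketch outlines this experiment in Python/numpy. Only the bookkeeping (random diagonal weight matrices and condition numbers) is shown; the function build_minque_matrix, which would assemble the coefficient matrix of (18) from the simulated data and the approximate variance components, is a hypothetical placeholder and not part of this paper:

import numpy as np

def condition_numbers(build_minque_matrix, n=120, runs=200, seed=1):
    # build_minque_matrix(Wy, Wa1, Wa2, sigma0) is assumed to return the
    # (3 x 3) coefficient matrix of Eq. (18) for one simulated data set.
    rng = np.random.default_rng(seed)
    logcond = np.empty(runs)
    for k in range(runs):
        Wy  = np.diag(rng.uniform(0.1, 1.0, size=n))    # diagonal weights of y_i
        Wa1 = np.diag(rng.uniform(0.1, 100.0, size=n))  # diagonal weights of a_i1
        Wa2 = np.diag(rng.uniform(0.1, 100.0, size=n))  # diagonal weights of a_i2
        sigma0 = np.ones(3)                             # approximate variance components
        S = build_minque_matrix(Wy, Wa1, Wa2, sigma0)
        logcond[k] = np.log10(np.linalg.cond(S))
    return logcond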

Fig. 1 Condition numbers of the coefficient matrix from the MINQUE estimation of variance components with two different sets of approximate variance components

5 Bias analysis

The weighted TLS estimate of \({\varvec{\beta }}\) in the EIV model (1) is a solution to a nonlinear system of equations, as can be seen in Sect. 2.2 and from the various algorithms to solve TLS problems (see, e.g., Pearson 1901; Gerhold 1969; Golub and Loan 1980; Huffel and Vandewalle 1991; Markovsky and Huffel 2007; Schaffrin and Wieser 2008; Fang 2011; Xu et al. 2012), even if the stochastic model for both \(\mathbf {y}\) and \(\mathbf {A}\) is exactly given without any unknown variance and/or covariance parameters. In fact, the reformulated EIV model (4) is a nonlinear adjustment/regression model with peculiar features. Bates and Watts (1980) developed geometrical measures to diagnose the nonlinearity of a nonlinear regression model; applications in geodesy and geophysics can be found, e.g., in Teunissen (1989b). For a general nonlinear regression model, the statistical consequences of model nonlinearity have been substantially investigated. Approximate confidence regions and the bias of the nonlinear LS estimate of \({{\varvec{\beta }}}\) were worked out for a general nonlinear regression model by Beale (1960) and Box (1971), respectively. Clarke (1980) derived the first and second moments of the nonlinear LS estimate of \({{\varvec{\beta }}}\). For more details on nonlinear regression, the reader may refer to, e.g., Ratkowsky (1983) and Seber and Wild (1989). The first and second moments of the nonlinear LS estimate can also be found in the geodetic literature (see, e.g., Teunissen 1989a). Xu et al. (2012) applied the theory and methods of Beale (1960) and Box (1971) to investigate the statistical aspects of parameter estimation in the nonlinear EIV model (4).

To the best of our knowledge, however, nothing has been done to analyze the finite sample statistical aspects of variance component estimation in a nonlinear regression/adjustment model. Wang and Davidian (1996), Wang et al. (1998) and Li et al. (2009) carried out an asymptotic bias analysis of the naive estimator of the variance components of \(\mathbf {y}\). Nevertheless, these authors ignored the effect of the measurement errors of \(\mathbf {A}\) and assumed that the number of measurements \(\mathbf {y}\) was infinite. In this paper, we will directly follow the weighted TLS method to handle the EIV model (1) and carry out the finite sample bias analysis. Following the same rationale as in Xu et al. (2012), we know that the estimates of the variance components, as derived in Sect. 3, cannot be unbiased. We will analyze the local biases of the approximate MINQUE estimates (21) of the variance components \({\varvec{\sigma }}\!_y\) and \({\varvec{\sigma }}\!_a\), if they are estimable. In the remainder of this section, we will assume: (i) that both \(\mathbf {y}\) and \(\mathbf {A}\) are normally distributed, and (ii) that \(\mathbf {y}\) and \(\mathbf {A}\) are statistically independent.

By saying that an estimator of variance components is locally unbiased, we follow the standard definition in the literature of variance component estimation (see, e.g., Rao and Kleffe 1988; Koch 1999) to mean that the estimator is unbiased under a given set of initial values of the variance components. To derive the biases of the approximate MINQUE estimates of the variance components \({\varvec{\sigma }}\!_y\) and \({\varvec{\sigma }}\!_a\), we apply the expectation operator to (21):

$$\begin{aligned} E\left\{ \left[ \begin{array}{c} \hat{{\varvec{\sigma }}}\!_y \\ \hat{{\varvec{\sigma }}}\!_a \end{array} \right] \right\} = \left[ \begin{array}{cc} \mathbf {S}_y &{} \mathbf {S}_{ya} \\ \mathbf {S}_{ay} &{} \mathbf {S}_a \end{array} \right] ^{-1} E\left\{ \left[ \begin{array}{c} \mathbf {q}_y \\ \mathbf {q}_a \end{array} \right] \right\} \end{aligned}$$
(39)

if the coefficient matrix is invertible, where the elements of \(\mathbf {q}_y\) and \(\mathbf {q}_a\) are given by (20a) and (20b), respectively. To analyze the biases of the estimated variance components under the given initial values \(\sigma _{0iy}^2 \, (i=1,2,\ldots ,m_y)\) and \(\sigma _{0ia}^2 \, (i=1,2,\ldots ,m_a)\), we will have to focus on the two quadratic forms \(q_y^i\) of (20a) and \(q_a^i\) of (20b), which further depend on the residuals \(\mathbf {r}_y\) of (13a) and \(\mathbf {r}_a\) of (13b), respectively. Obviously, in the case of a linear model, the MINQUE estimates of variance components are also unbiased (Rao and Kleffe 1988).

5.1 Expanding the residuals \(\mathbf {r}_y\) and \(\mathbf {r}_a\) in terms of \({\varvec{\epsilon }}\) and \({\varvec{\epsilon }}_a\)

We now expand the residuals \(\mathbf {r}_y\) and \(\mathbf {r}_a\) of (13) at the true values of \({\varvec{\beta }}\) and \(\overline{\mathbf {a}}\) and truncate the Taylor expansions at the second-order approximation of \({\varvec{\epsilon }}\) and \({\varvec{\epsilon }}_a\). In other words, all the terms of the expansion with third- and higher-order partial derivatives are neglected. As a result, we have

$$\begin{aligned} \left[ \begin{array}{c} \mathbf {r}_y \\ \mathbf {r}_a \end{array} \right]&= \left[ \begin{array}{c} \mathbf {y} - \hat{\overline{\mathbf {A}}} \hat{{\varvec{\beta }}} \\ \mathbf {a} - \hat{\overline{\mathbf {a}}} \end{array} \right] \nonumber \\&= {\varvec{\epsilon }}_{ya} - \mathbf {F}({\varvec{\beta }}, \overline{\mathbf {a}}) {\varvec{\phi }} - \frac{1}{2}\mathbf {J}\mathbf {K}{\varvec{\epsilon }}_{ya}, \end{aligned}$$
(40a)

where

$$\begin{aligned}&{\varvec{\epsilon }}_{ya} = \left[ \begin{array}{c} {\varvec{\epsilon }} \\ {\varvec{\epsilon }}_a \end{array} \right] ,\end{aligned}$$
(40b)
$$\begin{aligned}&\mathbf {F}({\varvec{\beta }}, \overline{\mathbf {a}}) = \mathbf {F} = \left[ \begin{array}{l@{\quad }l} \overline{\mathbf {A}} &{} ({\varvec{\beta }}^T\otimes \mathbf {I}_n) \\ \mathbf {0} &{} \mathbf {I}_a \end{array} \right] ,\end{aligned}$$
(40c)
$$\begin{aligned}&{\varvec{\phi }} = \mathbf {K}{\varvec{\epsilon }}_{ya} + \mathbf {q}_{{\beta \overline{a}}},\end{aligned}$$
(40d)
$$\begin{aligned}&\mathbf {K} = \mathbf {N}^{-1} \mathbf {F}^T{\varvec{\Sigma }}^{-1},\end{aligned}$$
(40e)
$$\begin{aligned}&\mathbf {J} = \{ \mathbf {G}_1\mathbf {K}{\varvec{\epsilon }}_{ya}, \mathbf {G}_2\mathbf {K}{\varvec{\epsilon }}_{ya},\ldots ,\mathbf {G}_n \mathbf {K}{\varvec{\epsilon }}_{ya}, \mathbf {0},\ldots , \mathbf {0}\}^T, \end{aligned}$$
(40f)

and the matrices \(\mathbf {G}_i\) of the second partial derivatives from the first nonlinear observation equations (4a) can be readily written as follows:

$$\begin{aligned} \mathbf {G}_i&= \frac{\partial ^2 y_i}{\partial \left[ \begin{array}{c} {\varvec{\beta }} \\ \overline{\mathbf {a}} \end{array} \right] \partial ({\varvec{\beta }}^T, \,\, \overline{\mathbf {a}}^T)} \nonumber \\&= \left[ \begin{array}{cc} \mathbf {0} &{} ( \mathbf {I}_m \otimes \mathbf {e}_i ) \\ ( \mathbf {I}_m \otimes \mathbf {e}_i^T ) &{} \mathbf {0} \end{array} \right] \end{aligned}$$
(40g)

for \(i=1,\,2,\,\ldots ,\,n\). Here \(\mathbf {e}_i=(0,\,\ldots \,0, 1,\,0,\ldots \, ,0)\) is an \(n\)-dimensional natural row vector, i.e., all the elements of \(\mathbf {e}_i\) are equal to zero, except for the ith element, which is equal to unity. Except for the first \(n\) non-zero terms, all the remaining terms of \(\mathbf {J}\) are equal to zero; the reason is that the expectation functions of \(\mathbf {a}\) in (4b) are linear with respect to \(\overline{\mathbf {a}}\) and do not involve \({\varvec{\beta }}\), so that the corresponding second partial derivatives with respect to \(\overline{\mathbf {a}}\) and \({\varvec{\beta }}\) all vanish. We should note that when expanding the residuals \(\mathbf {r}_y\) and \(\mathbf {r}_a\) of (13), one should not first represent their Taylor expansions up to the second-order approximation in terms of the estimated errors of \({\varvec{\beta }}\) and \(\overline{\mathbf {a}}\) and then use the second-order approximations of these errors \({\varvec{\phi }}\) in (40d) to derive the final truncated expansion (40a); otherwise, one will end up with an expansion of the residuals \(\mathbf {r}_y\) and \(\mathbf {r}_a\) that correctly uses the second-order partial derivatives with respect to \({\varvec{\epsilon }}\) and \({\varvec{\epsilon }}_a\) but incorrectly contains third- and fourth-order terms of \({\varvec{\epsilon }}\) and \({\varvec{\epsilon }}_a\), as is the case in Box (1971).
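
The block structure of (40g) is straightforward to reproduce with Kronecker products; a minimal sketch (Python/numpy, with illustrative dimensions only and 0-based indexing) is:

import numpy as np

n, m = 4, 2

def G(i):
    # second-derivative matrix of y_i in Eq. (40g); e_i is the ith natural row vector
    e_i = np.zeros((1, n)); e_i[0, i] = 1.0
    upper = np.kron(np.eye(m), e_i)   # I_m (x) e_i, of size m x (m n)
    return np.block([[np.zeros((m, m)), upper],
                     [upper.T, np.zeros((m * n, m * n))]])

G0 = G(0)
print(G0.shape, np.allclose(G0, G0.T))   # (m + m n) square and symmetric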

5.2 Expectations of the quadratic forms \(\mathbf {q}_y\) and \(\mathbf {q}_a\)

To compute the expectations of \(\mathbf {q}_y\) and \(\mathbf {q}_a\) in (20), we insert the second-order truncated residuals (40a) into the quadratic forms (20a) and (20b) and apply the expectation operator to each of them. In the case of (20a), under the assumption of normal distribution, we have

$$\begin{aligned} E(q_y^i)&= E( \mathbf {r}_y^T{\varvec{\Sigma }}_{0y}^{-1}\mathbf {U}_{iy}{\varvec{\Sigma }}_{0y}^{-1}\mathbf {r}_y) \nonumber \\&= E\Big \{\big ({\varvec{\epsilon }}-\mathbf {H}_y{\varvec{\Sigma }}_{0y}^{-1}{\varvec{\epsilon }}-\mathbf {H}_{ya}{\varvec{\Sigma }}_{0a}^{-1}{\varvec{\epsilon }}_a-\mathbf {d}_y\big )^T\nonumber \\&\times {\varvec{\Sigma }}_{0y}^{-1}\mathbf {U}_{iy}{\varvec{\Sigma }}_{0y}^{-1}\big ({\varvec{\epsilon }}-\mathbf {H}_y{\varvec{\Sigma }}_{0y}^{-1}{\varvec{\epsilon }}-\mathbf {H}_{ya}{\varvec{\Sigma }}_{0a}^{-1}{\varvec{\epsilon }}_a-\mathbf {d}_y\big )\Big \} \nonumber \\&= E\Big \{\big ({\varvec{\epsilon }}-\mathbf {H}_y{\varvec{\Sigma }}_{0y}^{-1}{\varvec{\epsilon }}-\mathbf {H}_{ya}{\varvec{\Sigma }}_{0a}^{-1}{\varvec{\epsilon }}_a\big )^T{\varvec{\Sigma }}_{0y}^{-1} \mathbf {U}_{iy}{\varvec{\Sigma }}_{0y}^{-1}\nonumber \\&\times \big ({\varvec{\epsilon }}-\mathbf {H}_y{\varvec{\Sigma }}_{0y}^{-1}{\varvec{\epsilon }}-\mathbf {H}_{ya}{\varvec{\Sigma }}_{0a}^{-1}{\varvec{\epsilon }}_a\big ) \Big \}+ u_{y\sigma }^i \nonumber \\&= E\Big \{\big ({\varvec{\epsilon }}-\mathbf {H}_y{\varvec{\Sigma }}_{0y}^{-1}{\varvec{\epsilon }}\big )^T{\varvec{\Sigma }}_{0y}^{-1}\mathbf {U}_{iy}{\varvec{\Sigma }}_{0y}^{-1}\big ({\varvec{\epsilon }} -\mathbf {H}_y{\varvec{\Sigma }}_{0y}^{-1}{\varvec{\epsilon }}\big ) \Big \} \nonumber \\&+ E\Big \{\big (\mathbf {H}_{ya}{\varvec{\Sigma }}_{0a}^{-1}{\varvec{\epsilon }}_a\big )^T{\varvec{\Sigma }}_{0y}^{-1}\mathbf {U}_{iy}{\varvec{\Sigma }}_{0y}^{-1}\mathbf {H}_{ya} {\varvec{\Sigma }}_{0a}^{-1}{\varvec{\epsilon }}_a\Big \}\nonumber \\&+ u_{y\sigma }^i \nonumber \\&= \mathrm tr \{\mathbf {P}_y\mathbf {U}_{iy}\mathbf {P}_y{\varvec{\Sigma }}_y\}\nonumber \\&+ \mathrm tr \Big \{{\varvec{\Sigma }}_{0a}^{-1}\mathbf {H}_{ay}{\varvec{\Sigma }}_{0y}^{-1}\mathbf {U}_{iy}{\varvec{\Sigma }}_{0y}^{-1}\mathbf {H}_{ya}{\varvec{\Sigma }}_{0a}^{-1}{\varvec{\Sigma }}_a \Big \}+ u_{y\sigma }^i \nonumber \\&= \mathbf {s}_y^i{\varvec{\sigma }}_y + \mathbf {s}_{ya}^i {\varvec{\sigma }}_a + u_{y\sigma }^i \end{aligned}$$
(41a)

for \(i=1, 2,\ldots , m_y\), where

$$\begin{aligned} u_{y\sigma }^i&= E\Big \{ \mathbf {d}_y^T{\varvec{\Sigma }}_{0y}^{-1} \mathbf {U}_{iy}{\varvec{\Sigma }}_{0y}^{-1}\mathbf {d}_y \Big \} \nonumber \\&= \mathrm tr \Big \{{\varvec{\Sigma }}_{0y}^{-1} \mathbf {U}_{iy}{\varvec{\Sigma }}_{0y}^{-1}E(\mathbf {d}_y\mathbf {d}_y^T)\Big \}, \end{aligned}$$
(41b)
$$\begin{aligned} \mathbf {d}_y = \mathbf {F}_y \mathbf {q}_{{\beta \overline{a}}} + \frac{1}{2}\mathbf {J}_y\mathbf {K}{\varvec{\epsilon }}_{ya}, \end{aligned}$$
(41c)

\(\mathbf {F}_y=[\overline{\mathbf {A}},\, ({\varvec{\beta }}^T\otimes \mathbf {I}_n)]\), \(\mathbf {J}_y\) is the column vector of dimension \(n\) corresponding to the first \(n\) non-zero elements of \(\mathbf {J}\) in (40f), and \(\mathbf {s}_y^i\) and \( \mathbf {s}_{ya}^i\) are the ith row vectors of \(\mathbf {S}_y\) and \(\mathbf {S}_{ya}\) in Eq. (21), respectively.
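
The two trace terms in (41a) rest on the standard identity \(E({\varvec{\epsilon }}^T\mathbf {M}{\varvec{\epsilon }})=\mathrm tr (\mathbf {M}{\varvec{\Sigma }})\) for a zero-mean random vector with variance–covariance matrix \({\varvec{\Sigma }}\) (Searle 1971). A quick Monte Carlo sketch (Python/numpy, with arbitrary small dimensions) illustrates the identity:

import numpy as np

rng = np.random.default_rng(2)
n = 5
M = rng.normal(size=(n, n)); M = M + M.T                   # arbitrary symmetric matrix
L = rng.normal(size=(n, n)); Sigma = L @ L.T + np.eye(n)   # positive definite covariance

eps = rng.multivariate_normal(np.zeros(n), Sigma, size=200000)
mc = np.mean(np.einsum('ki,ij,kj->k', eps, M, eps))  # Monte Carlo estimate of E(eps^T M eps)
print(mc, np.trace(M @ Sigma))                       # the two numbers nearly agree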

Following the approach of bias analysis by Box (1971) (see also Xu and Shimada 2000), we can further write the quadratic term \(\mathbf {q}_{{\beta \overline{a}}}\) in (40d) as follows

$$\begin{aligned} \mathbf {q}_{{\beta \overline{a}}} = -\frac{1}{2}\mathbf {K}\mathbf {J}\mathbf {K}{\varvec{\epsilon }}_{ya} + \mathbf {Q} \mathbf {C}^T ( {\varvec{\Sigma }}^{-1} - {\varvec{\Sigma }}^{-1}\mathbf {H}{\varvec{\Sigma }}^{-1}){\varvec{\epsilon }}_{ya}, \end{aligned}$$
(42)

where the matrix \(\mathbf {C}\) is the first-order correction to the matrix \(\mathbf {F}(\hat{{\varvec{\beta }}}, \hat{\overline{\mathbf {a}}})\) expanded at the point \(({\varvec{\beta }}, \overline{\mathbf {a}})\) and is given by

$$\begin{aligned} \mathbf {C} = \left[ \begin{array}{c} \mathbf {C}_y \\ \mathbf {C}_a \end{array} \right] . \end{aligned}$$

\(\mathbf {C}_a=\mathbf {0}\), since the measurements \(\mathbf {a}\) are not functions of \({\varvec{\beta }}\), and the submatrix \(\mathbf {C}_y\) is equal to

$$\begin{aligned} \mathbf {C}_y = [ \mathbf {C}_y^1\mathbf {K}{\varvec{\epsilon }}_{ya}, \mathbf {C}_y^2\mathbf {K}{\varvec{\epsilon }}_{ya},\ldots , \mathbf {C}_y^{(n+1)m}\mathbf {K}{\varvec{\epsilon }}_{ya}]. \end{aligned}$$

Here, the first \(m\) submatrices \(\mathbf {C}_y^i\) \((i=1,2,\ldots ,m)\) and the remaining \((n\times m)\) submatrices \(\mathbf {C}_y^j\) \((j=m+1,\,m+2,\ldots ,(n+1)m)\) correspond to \({\varvec{\beta }}\) and \(\overline{\mathbf {a}}\), respectively, and are computed as follows:

$$\begin{aligned} \mathbf {C}_y^i = \frac{\partial ^2 \mathbf {y}}{\partial \beta _i \partial ({\varvec{\beta }}, \overline{\mathbf {a}})^T} = [\mathbf {0}, \, ( \mathbf {e}^T_i\otimes \mathbf {I}_n)] \end{aligned}$$

for \(i=1,2,\ldots ,m\), and

$$\begin{aligned} \mathbf {C}_y^{m+j} = \frac{\partial ^2 \mathbf {y}}{\partial \overline{a}_j \partial ({\varvec{\beta }}, \overline{\mathbf {a}})^T} = [(\mathbf {I}_m\otimes \mathbf {I}_n)\mathbf {e}_j, \, \mathbf {0}] \end{aligned}$$

for \(j=1,2,\ldots ,(n\times m)\).

To compute \(u_{y\sigma }^i\) of (41b), we first need to compute the expectation of \(\mathbf {d}_y\mathbf {d}_y^T\), namely,

$$\begin{aligned} E(\mathbf {d}_y\mathbf {d}_y^T)&= E(\mathbf {F}_y\mathbf {q}_{{\beta \overline{a}}}\mathbf {q}_{{\beta \overline{a}}}^T\mathbf {F}^T_y) + \frac{1}{2}E(\mathbf {F}_y \mathbf {q}_{{\beta \overline{a}}}{\varvec{\epsilon }}_{ya}^T\mathbf {K}^T \mathbf {J}_y^T) \nonumber \\&+\, \frac{1}{2}E(\mathbf {J}_y\mathbf {K}{\varvec{\epsilon }}_{ya}\mathbf {q}_{{\beta \overline{a}}}^T\mathbf {F}^T_y) \nonumber \\&+\, \frac{1}{4}E(\mathbf {J}_y\mathbf {K}{\varvec{\epsilon }}_{ya}{\varvec{ \epsilon }}_{ya}^T\mathbf {K}^T\mathbf {J}_y^T) \nonumber \\&= \mathbf {D}_{y1} + \mathbf {D}_{y2} + \mathbf {D}_{y3} + \mathbf {D}_{y4}, \end{aligned}$$
(43)

where

$$\begin{aligned}&\mathbf {D}_{y1} = E(\mathbf {F}_y\mathbf {q}_{{\beta \overline{a}}}\mathbf {q}_{{\beta \overline{a}}}^T\mathbf {F}^T_y),\\&\mathbf {D}_{y2} = \frac{1}{2}E(\mathbf {F}_y \mathbf {q}_{{\beta \overline{a}}}{\varvec{\epsilon }}_{ya}^T\mathbf {K}^T\mathbf {J}_y^T),\\&\mathbf {D}_{y3} = \mathbf {D}_{y2}^T,\\&\mathbf {D}_{y4} = \frac{1}{4}E(\mathbf {J}_y\mathbf {K}{\varvec{\epsilon }}_{ya} {\varvec{\epsilon }}_{ya}^T\mathbf {K}^T\mathbf {J}_y^T). \end{aligned}$$

Because both \({\varvec{\epsilon }}\) and \({\varvec{\epsilon }}_a\) have been assumed to be normally distributed with zero mean and to be statistically independent, we can use the statistical results on quadratic forms (Searle 1971) and obtain, after some technical derivations and keeping in mind that \((\mathbf {I}-{\varvec{\Sigma }}^{-1}\mathbf {H}){\varvec{\Sigma }}^{-1} \mathbf {F}=\mathbf {0}\), the four matrices \(\mathbf {D}_{y1}\), \(\mathbf {D}_{y2}\), \(\mathbf {D}_{y3}\) and \(\mathbf {D}_{y4}\) of (43) as follows:

$$\begin{aligned} \mathbf {D}_{y1}&= E(\mathbf {F}_y\mathbf {q}_{{\beta \overline{a}}}\mathbf {q}_{{\beta \overline{a}}}^T\mathbf {F}^T_y) \nonumber \\&= \mathbf {F}_yE(\mathbf {q}_{{\beta \overline{a}}}\mathbf {q}_{{\beta \overline{a}}}^T)\mathbf {F}^T_y \nonumber \\&= \frac{1}{4}\mathbf {F}_y\mathbf {K}\mathbf {M}_q \mathbf {K}^T\mathbf {F}_y^T,\end{aligned}$$
(44a)
$$\begin{aligned} \mathbf {D}_{y2}&= -\frac{1}{4}\overline{\mathbf {A}}\mathbf {Q}_{\beta } \overline{\mathbf {A}}^{{ T}}{\varvec{\Sigma }}^{-1}_y\mathbf {M}_{jy} \nonumber \\&- \frac{1}{4}({\varvec{\beta }}^T\otimes \mathbf {I}_n)\mathbf {Q}_{\overline{\mathbf {a}}\beta } \overline{\mathbf {A}}^{{ T}}{\varvec{\Sigma }}^{-1}_y\mathbf {M}_{jy},\end{aligned}$$
(44b)
$$\begin{aligned} \mathbf {D}_{y4}&= \frac{1}{4}\mathbf {M}_{jy}, \end{aligned}$$
(44c)

where \(\mathbf {M}_q\) is given by

$$\begin{aligned} \mathbf {M}_q = \left[ \begin{array}{cc} \mathbf {M}_{jy} &{} \mathbf {0} \\ \mathbf {0} &{} \mathbf {0} \end{array} \right] , \end{aligned}$$
(44d)

which is a square matrix with dimension equal to the total number of elements of \({\varvec{\beta }}\) and \(\mathbf {A}\), and \(\mathbf {M}_{jy}\) is an \((n\,\times \, n)\) matrix with its elements given by

$$\begin{aligned} \mathbf {M}_{jy}(i,j) = \mathrm tr (\mathbf {G}_i\mathbf {Q}\mathbf {G}_j\mathbf {Q}), \quad \mathrm{for} \quad i, j = 1,2,\ldots , n. \end{aligned}$$
(44e)

Finally, by inserting (43) and (44) into (41b), we can compute each \(u_{y\sigma }^i\) \((i=1,2,\ldots ,m_y)\).
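
In terms of bookkeeping, the matrix \(\mathbf {M}_{jy}\) of (44e) is assembled directly from the matrices \(\mathbf {G}_i\) of (40g) and the matrix \(\mathbf {Q}\); a minimal sketch (Python/numpy, assuming the \(\mathbf {G}_i\) are available as a list G_list and that Q has the matching dimension) is:

import numpy as np

def M_jy(G_list, Q):
    # Eq. (44e): M_jy(i, j) = tr(G_i Q G_j Q) for i, j = 1, 2, ..., n
    n = len(G_list)
    GQ = [G @ Q for G in G_list]   # precompute the products G_i Q
    M = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            M[i, j] = np.trace(GQ[i] @ GQ[j])
    return M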

In a similar manner to (41a), we compute the expectation \(E(q_a^i)\) of the ith element of \(\mathbf {q}_a\) as follows:

$$\begin{aligned} E(q_a^i)&= E( \mathbf {r}_a^T{\varvec{\Sigma }}_{0a}^{-1}\mathbf {U}_{ia}{\varvec{\Sigma }}_{0a}^{-1}\mathbf {r}_a) \nonumber \\&= E\Big \{\big ({\varvec{\epsilon }}_a-\mathbf {H}_a{\varvec{\Sigma }}_{0a}^{-1}{\varvec{\epsilon }}_a-\mathbf {H}_{ay}{\varvec{\Sigma }}_{0y}^{-1}{\varvec{\epsilon }}\big )^T{\varvec{\Sigma }}_{0a}^{-1} \mathbf {U}_{ia}{\varvec{\Sigma }}_{0a}^{-1}\big ({\varvec{\epsilon }}_a\nonumber \\&-\mathbf {H}_a{\varvec{\Sigma }}_{0a}^{-1}{\varvec{\epsilon }}_a-\mathbf {H}_{ay}{\varvec{\Sigma }}_{0y}^{-1}{\varvec{\epsilon }}\big )\Big \}+ u_{a\sigma }^i \nonumber \\&= E\Big \{\big ({\varvec{\epsilon }}_a-\mathbf {H}_a{\varvec{\Sigma }}_{0a}^{-1}{\varvec{\epsilon }}_a\big )^T{\varvec{\Sigma }}_{0a}^{-1}\mathbf {U}_{ia}{\varvec{\Sigma }}_{0a}^{-1}\big ({\varvec{\epsilon }}_a\nonumber \\&-\mathbf {H}_a{\varvec{\Sigma }}_{0a}^{-1}{\varvec{\epsilon }}_a\big ) \Big \} \nonumber \\&+ E\Big \{\big (\mathbf {H}_{ay}{\varvec{\Sigma }}_{0y}^{-1}{\varvec{ \epsilon }}\big )^T{\varvec{\Sigma }}_{0a}^{-1} \mathbf {U}_{ia}{\varvec{\Sigma }}_{0a}^{-1}\mathbf {H}_{ay}{\varvec{ \Sigma }}_{0y}^{-1}{\varvec{\epsilon }} \Big \} + u_{a\sigma }^i \nonumber \\&= \mathrm tr \{\mathbf {P}_a\mathbf {U}_{ia}\mathbf {P}_a{\varvec{\Sigma }}_a\}\nonumber \\&+ \mathrm tr \Big \{{\varvec{\Sigma }}_{0y}^{-1}\mathbf {H}_{ya}{\varvec{\Sigma }}_{0a}^{-1}\mathbf {U}_{ia}{\varvec{\Sigma }}_{0a}^{-1}\mathbf {H}_{ay}{\varvec{ \Sigma }}_{0y}^{-1}{\varvec{\Sigma }}_y \Big \}\nonumber \\&+ u_{a\sigma }^i \nonumber \\&= \mathbf {s}_{ay}^i{\varvec{\sigma }}_y + \mathbf {s}_a^i{\varvec{\sigma }}_a+ u_{a\sigma }^i \end{aligned}$$
(45a)

for \(i=1, 2,\ldots , m_a\), where \(\mathbf {F}_a=[\mathbf {0}, \mathbf {I}_a]\) and \(u_{a\sigma }^i\) is given by

$$\begin{aligned} u_{a\sigma }^i&= E\{(\mathbf {F}_a\mathbf {q}_{{\beta \overline{a}}})^T\mathbf {F}_a\mathbf {q}_{{\beta \overline{a}}}\} \nonumber \\&= \frac{1}{4} \mathrm tr \left\{ \left[ \begin{array}{cc} \mathbf {0} &{} \mathbf {0} \\ \mathbf {0} &{} {\varvec{\Sigma }}_{0a}^{-1} \mathbf {U}_{ia}{\varvec{ \Sigma }}_{0a}^{-1} \end{array} \right] \mathbf {K}\mathbf {M}_q\mathbf {K}^T \right\} . \end{aligned}$$
(45b)

5.3 Biases of the estimated variance components

By collecting the expectation of \(\mathbf {q}_y\) in (41a) and that of \(\mathbf {q}_a\) in (45a) together and writing them in matrix form, we have

$$\begin{aligned} E \left\{ \begin{array}{c} \mathbf {q}_y \\ \mathbf {q}_a \end{array} \right\} = \left[ \begin{array}{cc} \mathbf {S}_y &{} \mathbf {S}_{ya} \\ \mathbf {S}_{ay} &{} \mathbf {S}_a \end{array} \right] \left[ \begin{array}{c} {\varvec{\sigma }}_y \\ {\varvec{\sigma }}_a \end{array} \right] + \left[ \begin{array}{c} \mathbf {u}_{y\sigma } \\ \mathbf {u}_{a\sigma } \end{array} \right] , \end{aligned}$$
(46)

where the elements of both \(\mathbf {u}_{y\sigma }\) and \(\mathbf {u}_{a\sigma }\) have been derived and given by (41b) and (45b), respectively. It is now immediately clear that the approximate MINQUE estimate (21) of the variance components is biased. The biases of \(\hat{{\varvec{\sigma }}}\!_y\) and \(\hat{{\varvec{\sigma }}}\!_a\), denoted, respectively, by \(\mathbf {b}_{y\sigma }\) and \(\mathbf {b}_{a\sigma }\), are given as follows:

$$\begin{aligned} \left[ \begin{array}{c} \mathbf {b}_{y\sigma } \\ \mathbf {b}_{a\sigma } \end{array} \right]&= E \left\{ \begin{array}{c} \hat{{\varvec{\sigma }}}\!_y \\ \hat{{\varvec{\sigma }}}\!_a \end{array} \right\} - \left[ \begin{array}{c} {\varvec{\sigma }}_y \\ {\varvec{\sigma }}_a \end{array} \right] \nonumber \\&= \left[ \begin{array}{cc} \mathbf {S}_y &{} \mathbf {S}_{ya} \\ \mathbf {S}_{ay} &{} \mathbf {S}_a \end{array} \right] ^{-1} E \left\{ \begin{array}{c} \mathbf {q}_y \\ \mathbf {q}_a \end{array} \right\} - \left[ \begin{array}{c} {\varvec{\sigma }}_y \\ {\varvec{\sigma }}_a \end{array} \right] \nonumber \\&= \left[ \begin{array}{cc} \mathbf {S}_y &{} \mathbf {S}_{ya} \\ \mathbf {S}_{ay} &{} \mathbf {S}_a \end{array} \right] ^{-1} \left[ \begin{array}{c} \mathbf {u}_{y\sigma } \\ \mathbf {u}_{a\sigma } \end{array} \right] , \end{aligned}$$
(47)

if the coefficient matrix is invertible. Obviously, if this coefficient matrix is ill-conditioned, i.e., if its condition number is very large as shown in Fig. 1, the biases of the estimated variance components can become significantly large.
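
Numerically, once the coefficient matrix of (21) and the vector built from \(\mathbf {u}_{y\sigma }\) and \(\mathbf {u}_{a\sigma }\) are available, evaluating (47) is a single linear solve; the following minimal sketch (Python/numpy, with a deliberately ill-conditioned toy matrix that is not taken from the paper) also illustrates how a large condition number amplifies small values of \(\mathbf {u}\) into large biases:

import numpy as np

def variance_component_biases(S, u):
    # Eq. (47): bias = S^{-1} u, with S the stacked coefficient matrix of (21)
    return np.linalg.solve(S, u)

S = np.array([[2.0, 1.999],
              [1.999, 2.0]])       # toy, nearly singular coefficient matrix
u = np.array([0.010, 0.011])       # small positive u-values
print(np.linalg.cond(S))                # large condition number (about 4e3)
print(variance_component_biases(S, u))  # biases of order 0.5, strongly amplified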

We should like to remark that the values \(u_{y\sigma }^i\) in (41b) and \(u_{a\sigma }^i\) in (45b) are always positive. As a result, all the biases of the componentwise positive estimators of variance components, as discussed in Sect. 3.3, are obviously positive as well. In other words, the approximate MINQUE (componentwise positive) estimators of variance components statistically produce larger values for all the variance components. While the asymptotic biases of the naive estimator of the variance components of \(\mathbf {y}\) are due to the fact that the random errors of \(\mathbf {A}\) are ignored in the construction of such an estimator (Wang and Davidian 1996; Wang et al. 1998; Li et al. 2009), the finite sample bias analysis of variance components in this paper is valid for the variance components of both \(\mathbf {y}\) and \(\mathbf {A}\) and shows that the biases originate from the model nonlinearity induced by the random errors of \(\mathbf {A}\). Finally, we would like to note that the biases of the estimated variance components will result in an incorrect determination of the weights of the measurements. As a consequence, the variance–covariance matrices of the weighted TLS estimates of \({\varvec{\beta }}\) and \(\overline{\mathbf {a}}\) will be affected, to an extent that depends on the sizes of the biases of the estimated variance components. If the biases of the estimated variance components are significantly amplified due to instability, the corresponding weight matrices of the measurements will simply be erroneous, and the formal variance–covariance matrices of the weighted TLS estimates \(\hat{{\varvec{\beta }}}\) and \(\hat{\overline{\mathbf {a}}}\) cannot properly reflect their accuracy any more.

6 Concluding remarks

Total least squares has been substantially developed for more than a century and has found a wide variety of applications, either from the point of view of approximation (see, e.g., Golub and Loan 1980; Huffel and Vandewalle 1991) or from the statistical point of view (see, e.g., Pearson 1901; Deming 1931, 1934; Schaffrin and Wieser 2008; Xu et al. 2012). Very often, one assumes that the variance–covariance matrices of \(\mathbf {y}\) and \(\mathbf {A}\) are given, or at least, are given up to an unknown variance of unit weight. In practice, however, we have to solve problems in physical, statistical and engineering sciences in which \(\mathbf {y}\) and \(\mathbf {A}\) in (1) are not necessarily of the same types of measurements. As a result, we have to simultaneously estimate the EIV model parameters \({\varvec{\beta }}\) and the variance components of the EIV stochastic models. We have proved that the variance components in the EIV stochastic models are not estimable, if all the elements of \(\mathbf {A}\) are random without any functional constraints and can be classified into at least two groups of data of the same accuracy. The same is true for the EIV model of regression type (35). This result of inestimability statistically implies that we can do nothing to gain any knowledge on such EIV stochastic models from the measurements themselves. From this point of view, to gain knowledge on the EIV stochastic models, one will have to directly collect repeated measurements, or the like, of the elements of \(\mathbf {A}\). If the variance components are estimable, we have derived the MINQUE estimates of the variance components in EIV models. For block-wise \(\mathbf {U}_{iy}\) and \(\mathbf {U}_{ia}\), we have also derived the componentwise positive estimates of the variance components. The estimation of variance components in the EIV model may be unstable, however, as confirmed by the simulated numerical examples. Finally, we have worked out the finite sample biases of the variance components, if they are estimable. As a result of equation instability, the biases of the estimated variance components could be significantly amplified by a large condition number.