1 Introduction

The Errors-In-Variables (EIV) Model has recently received much attention since, following Golub and van Loan (1980), it can be treated in its nonlinear form by a least-squares approach that they coined “Total Least-Squares adjustment”. It eventually leads to a (generalized) eigenvalue problem that needs to be solved in lieu of the sequence of normal equations that would result from a traditional “Least-Squares adjustment” within iteratively linearized models. The latter approach dates back at least to Helmert (1907), but was also used by Deming (1931, 1934) for the approximation of curves and, more recently, by Neitzel (2010) to determine the parameters of a similarity transformation.

In contrast, the nonlinear Total Least-Squares (TLS) approach which, in its original formulation, could tolerate only “element-wise weighting” and thus only diagonal weight matrices, has since been generalized in several steps by Schaffrin and Wieser (2008), Fang (2011), and Mahboub (2012) to now accept any positive-definite weight matrices. This development will be presented in the following Sect. 2, thereby showing how the more specialized algorithms can be derived from the more general ones by simplification.

Moreover, it should be noted that progress has also been made towards the use of positive-semidefinite dispersion matrices in TLS adjustment, which may be handled as described by Schaffrin et al. (2014). These cases are quite relevant whenever the random error matrix needs to show a certain pattern or structure after the adjustment. Due to the limited space, these advanced methods will not be discussed below.

Instead, attention will be paid to a triplet of classical nonlinear models that all can be constructed to be equivalent to the EIV-Model and, furthermore, may undergo a sequence of Least-Squares adjustments via iterative linearization which, in the end, converge to the very same TLS solution. This will be the theme in Sect. 3 although many details have to be left out; for those, see Schaffrin (2015).

2 Nonlinear TLS Adjustment in an EIV-Model

2.1 Fang’s Algorithm

Let the EIV-Model be defined by

$$ y=\underset{n\times m}{\underbrace{\left(A-{E}_A\right)}}\,\xi +{e}_y,\qquad \mathrm{rk}\,A=m<n, $$
(1a)
$$ e:=\left[\begin{array}{c} {e}_y \\ {e}_A:={\it vec}\,{E}_A \end{array}\right]\sim \left(\left[\begin{array}{c} 0 \\ 0 \end{array}\right],\kern0.5em {\sigma}_o^2\left[\begin{array}{cc} \underset{n\times n}{P_y^{-1}} & 0 \\ 0 & \underset{nm\times nm}{P_A^{-1}} \end{array}\right]=:{\sigma}_o^2{P}^{-1}\right) $$
(1b)

where

  • y is the \( n\times 1 \) observation vector;

  • A is the \( n\times m \) (random) coefficient matrix with full column rank (aka “data matrix”);

  • \( E_A \) is the \( n\times m \) (unknown) random error matrix associated with A;

  • ξ is the \( m\times 1 \) (unknown) parameter vector;

  • \( e_y \) is the \( n\times 1 \) (unknown) random error vector associated with y;

  • \( e_A \) is the \( nm\times 1 \) vectorial form of the matrix \( E_A \);

  • Q is the \( n\left(m+1\right)\times n\left(m+1\right) \) block-diagonal pos.-def. cofactor matrix;

  • \( P:={Q}^{-1} \) is the corresponding block-diagonal pos.-def. weight matrix;

  • \( {\sigma}_o^2 \) is the (unknown) variance component (unit-free);

  • \( \mathrm{C}\mathrm{o}\mathrm{v}\left\{e_y, {\it vec}E_A\right\}=0 \) for the sake of simplicity.

The model generalizes the one used by Schaffrin and Wieser (2008) where a Kronecker product structure for

$$ {Q}_A={P}_A^{-1}={Q}_o\otimes {Q}_x $$
(2)

was assumed, as well as the one used by Golub and van Loan (1980) who only allowed diagonal cofactor matrices with

$$ {Q}_o:={I}_m,\kern1em {Q}_x:={Q}_y= {\it Diag}\left({p}_1^{-1},\ldots,{p}_n^{-1}\right)={P}_y^{-1}. $$
(3)

The objective of a nonlinear Total Least-Squares (TLS) adjustment is now defined by the principle

$$ {e}_y^T{P}_y{e}_y+{e}_A^T{P}_A{e}_A= \min .\kern1.25em \mathrm{s}.\mathrm{t}.\kern0.75em \left(1\mathrm{a}\right), $$
(4)

which can be given the equivalent form of a Lagrange target function, namely:

$$ \begin{array}{l}\phi \left({e}_y,{e}_A,\xi, \lambda \right):={e}_y^T{P}_y{e}_y+{e}_A^T{P}_A{e}_A+\\ {}\kern2.75em +2{\lambda}^T\left[y-A\xi -{e}_y+\left({\xi}^T\otimes {I}_n\right){e}_A\right]=\mathrm{stationary}.\end{array} $$
(5)

Consequently, the Euler-Lagrange necessary conditions result in the following system of nonlinear “normal equations”:

$$ \frac{1}{2}\frac{\partial \phi }{\partial {e}_y}={P}_y{\tilde{e}}_y-\widehat{\lambda}\dot{=}0 $$
(6a)
$$ \frac{1}{2}\frac{\partial \phi }{\partial {e}_A}={P}_A{\tilde{e}}_A+\left(\widehat{\xi}\otimes {I}_n\right)\widehat{\lambda}\dot{=}0, $$
(6b)
$$ \frac{1}{2}\frac{\partial \phi }{\partial \xi }=-{\left(A-{\tilde{E}}_A\right)}^T\widehat{\lambda}\dot{=}0, $$
(6c)
$$ \frac{1}{2}\frac{\partial \phi }{\partial \lambda }=y-A\widehat{\xi}-{\tilde{e}}_y+\left({\widehat{\xi}}^T\otimes {I}_n\right){\tilde{e}}_A\dot{=}0, $$
(6d)

which still needs to be reduced by partial elimination; note that the sufficient condition for a minimum is fulfilled since

$$ \frac{1}{2}\frac{\partial^2\phi }{\partial \left[\begin{array}{c} {e}_y \\ {e}_A \end{array}\right]\partial \left[\begin{array}{cc} {e}_y^T & {e}_A^T \end{array}\right]}=\left[\begin{array}{cc} {P}_y & 0 \\ 0 & {P}_A \end{array}\right]\kern0.5em \mathrm{is}\ \mathrm{pos.}\hbox{-}\mathrm{def.} $$
(7)

Now, (6a, b) are transformed to provide the residual vectors through

$$ {\tilde{e}}_y={Q}_y\widehat{\lambda}\kern0.5em \mathrm{and}\kern0.5em {\tilde{e}}_A=-{Q}_A\left(\widehat{\xi}\otimes {I}_n\right)\widehat{\lambda} $$
(8a)

so that (6d) can be rewritten as

$$ y-A\widehat{\xi}=\left[{Q}_y+{\left(\widehat{\xi}\otimes {I}_n\right)}^T{Q}_A\left(\widehat{\xi}\otimes {I}_n\right)\right]\cdot \widehat{\lambda}=:{Q}_1\cdot \widehat{\lambda}, $$
(8b)

with \( {Q}_1={Q}_1\left(\widehat{\xi}\right) \) being nonsingular, thus leading to

$$ \widehat{\lambda}={Q}_1^{-1}\left(y-A\widehat{\xi}\right) $$
(9)

and, together with (6c), to the system

$$ \left[\begin{array}{cc}\hfill {Q}_1\hfill & \hfill \left(A-{\tilde{E}}_A\right)\hfill \\ {}\hfill {\left(A-{\tilde{E}}_A\right)}^T\hfill & \hfill 0\hfill \end{array}\right]\left[\begin{array}{c}\hfill \widehat{\lambda}\hfill \\ {}\hfill \widehat{\xi}\hfill \end{array}\right]=\left[\begin{array}{c}\hfill y-{\tilde{E}}_A\widehat{\xi}\hfill \\ {}\hfill 0\hfill \end{array}\right] $$
(10)

Obviously, the estimated parameter vector is now obtained as in Fang (2011, p.27) via

$$ \begin{array}{ll} \widehat{\xi}={\left[{\left(A-{\tilde{E}}_A\right)}^T{Q}_1^{-1}\left(A-{\tilde{E}}_A\right)\right]}^{-1}{\left(A-{\tilde{E}}_A\right)}^T{Q}_1^{-1}\ \cdot\\ \hspace*{18pt} \cdot\left(y-{\tilde{E}}_A\widehat{\xi}\right)\end{array} $$
(11)

and allows updates for \( Q_1 \), \( \widehat{\lambda} \), and \( {\tilde{E}}_A \), from which a new estimate \( \widehat{\xi} \) results.

The Total Sum of weighted Squared Residuals (TSSR) may now readily be computed from

$$ \begin{array}{ll}{\tilde{e}}_y^T{P}_y{\tilde{e}}_y{+}{\tilde{e}}_A^T{P}_A{\tilde{e}}_A{=}{\widehat{\lambda}}^T\!\left[{Q}_y{+}{\left(\widehat{\xi}\otimes {I}_n\right)}^T{Q}_A\left(\widehat{\xi}\otimes {I}_n\right)\right]\widehat{\lambda}\,{=}\\ {}={\widehat{\lambda}}^T{Q}_1\widehat{\lambda}={\widehat{\lambda}}^T\left(y-A\widehat{\xi}\right)=:\mathrm{TSSR}\end{array} $$
(12)

so that a suitable variance component estimate may be obtained through

$$ {\widehat{\sigma}}_o^2={\widehat{\lambda}}^T\left(y-A\widehat{\xi}\right)/\left(n-m\right)=\mathrm{TSSR}/\left(n-m\right) $$
(13)

as the redundancy in model (1a, b) is still \( n-m \).
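
For illustration only, a minimal NumPy sketch of this iteration (Eqs. 8a–13) could look as follows; the function name, the choice of the ordinary weighted LESS as starting value, and the convergence tolerance are assumptions of this sketch and not part of Fang’s original algorithm.

```python
import numpy as np

def tls_fang(y, A, Qy, QA, tol=1e-12, max_iter=100):
    """Minimal sketch of Fang's TLS iteration, Eqs. (8a)-(13); not the original code."""
    n, m = A.shape
    # start, e.g., from the ordinary weighted LESS (i.e. E_A = 0)
    xi = np.linalg.solve(A.T @ np.linalg.solve(Qy, A),
                         A.T @ np.linalg.solve(Qy, y))
    EA = np.zeros((n, m))
    for _ in range(max_iter):
        B = np.kron(xi.reshape(-1, 1), np.eye(n))        # (xi_hat ⊗ I_n), size nm x n
        Q1 = Qy + B.T @ QA @ B                           # Eq. (8b)
        lam = np.linalg.solve(Q1, y - A @ xi)            # Eq. (9)
        eA = -QA @ B @ lam                               # Eq. (8a), i.e. vec of E_A_tilde
        EA = eA.reshape((n, m), order="F")               # de-vectorize column-wise
        Abar = A - EA
        N = Abar.T @ np.linalg.solve(Q1, Abar)
        c = Abar.T @ np.linalg.solve(Q1, y - EA @ xi)    # Eq. (11)
        xi_new = np.linalg.solve(N, c)
        if np.linalg.norm(xi_new - xi) < tol * (1.0 + np.linalg.norm(xi)):
            xi = xi_new
            break
        xi = xi_new
    # refresh the Lagrange multipliers once more for TSSR and the variance component
    B = np.kron(xi.reshape(-1, 1), np.eye(n))
    Q1 = Qy + B.T @ QA @ B
    lam = np.linalg.solve(Q1, y - A @ xi)
    TSSR = float(lam @ (y - A @ xi))                     # Eq. (12)
    sigma2_hat = TSSR / (n - m)                          # Eq. (13)
    return xi, EA, sigma2_hat
```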

Alternatively, system (10) can be given the asymmetric form

$$ \left[\begin{array}{cc}\hfill {Q}_1\hfill & \hfill A\hfill \\ {}\hfill {\left(A-{\tilde{E}}_A\right)}^T\hfill & \hfill 0\hfill \end{array}\right]\left[\begin{array}{c}\hfill \widehat{\lambda}\hfill \\ {}\hfill \widehat{\xi}\hfill \end{array}\right]=\left[\begin{array}{c}\hfill y\hfill \\ {}\hfill 0\hfill \end{array}\right] $$
(14)

which would then provide the estimated parameter vector through

$$ \widehat{\xi}={\left[{\left(A-{\tilde{E}}_A\right)}^T{Q}_1^{-1}A\right]}^{-1}{\left(A-{\tilde{E}}_A\right)}^T{Q}_1^{-1}y $$
(15)

and should lead to a similar iteration as before. Note that (15) also appears as formula (21) in Xu et al. (2012), but essentially represents a variant of Fang’s algorithm; also, cf. Fang (2013) where further alternatives are presented.

2.2 Mahboub’s Algorithm

On the other hand, combining (9) with (6c) leads to the following sequence of identities:

$$ \begin{array}{l}{A}^T{Q}_1^{-1}\left(y-A\widehat{\xi}\right){=}{A}^T\widehat{\lambda}{=}{\tilde{E}}_A^T\widehat{\lambda}{=}\left({\widehat{\lambda}}^T\otimes {I}_m\right){\it vec}\left({\tilde{E}}_A^T\right)=\\ {}=\left({\widehat{\lambda}}^T\otimes {I}_m\right)\cdot \left(K{\tilde{e}}_A\right)=\left({I}_m\otimes {\widehat{\lambda}}^T\right){\tilde{e}}_A=\\ {}=-\left[{\left({I}_m\otimes \widehat{\lambda}\right)}^T{Q}_A\left(\widehat{\xi}\otimes {I}_n\right)\right]\widehat{\lambda}=:-{R}_1\cdot \widehat{\lambda}=\\ {}=-{R}_1\cdot {Q}_1^{-1}\left(y-A\widehat{\xi}\right)\end{array} $$
(16)

where K denotes an \( nm\times nm \) “commutation matrix”, also known as a “vec-permutation matrix”; for more details, see Magnus and Neudecker (2007).

Obviously, (16) translates into the estimated parameter vector

$$ \widehat{\xi}={\left[\left({A}^T+{R}_1\right){Q}_1^{-1}A\right]}^{-1}\left({A}^T+{R}_1\right){Q}_1^{-1}y $$
(17a)

with \( {R}_1={R}_1\left(\widehat{\xi},\widehat{\lambda}\right) \) and, from (16), with

$$ {R}_1\widehat{\lambda}=-{\tilde{E}}_A^T\widehat{\lambda} $$
(17b)

without necessarily implying that \( {R}_1=-{\tilde{E}}_A^T \). Therefore, the sequence of solutions to (15) may differ from the sequence of solutions to (17a) when iteratively updating \( Q_1 \), \( \widehat{\lambda} \), and \( R_1 \), before a new parameter vector estimate \( \widehat{\xi} \) can be found; yet the ultimate convergence points will be the same.

Again, the TSSR can be computed from (12) which will lead to the variance component estimate in (13).
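
Again for illustration only, a minimal NumPy sketch of Mahboub’s update (17a), with \( R_1 \) formed as in (16), might read as follows; the starting value and the stopping rule are assumptions of this sketch.

```python
import numpy as np

def tls_mahboub(y, A, Qy, QA, tol=1e-12, max_iter=100):
    """Minimal sketch of Mahboub's iteration, Eqs. (16)-(17a); not the original code."""
    n, m = A.shape
    xi = np.linalg.solve(A.T @ np.linalg.solve(Qy, A),
                         A.T @ np.linalg.solve(Qy, y))
    for _ in range(max_iter):
        B = np.kron(xi.reshape(-1, 1), np.eye(n))        # (xi_hat ⊗ I_n)
        Q1 = Qy + B.T @ QA @ B                           # Eq. (8b)
        lam = np.linalg.solve(Q1, y - A @ xi)            # Eq. (9)
        L = np.kron(np.eye(m), lam.reshape(-1, 1))       # (I_m ⊗ lam_hat), size nm x m
        R1 = L.T @ QA @ B                                # R_1 as defined via Eq. (16), m x n
        N = (A.T + R1) @ np.linalg.solve(Q1, A)
        c = (A.T + R1) @ np.linalg.solve(Q1, y)
        xi_new = np.linalg.solve(N, c)                   # Eq. (17a)
        if np.linalg.norm(xi_new - xi) < tol * (1.0 + np.linalg.norm(xi)):
            xi = xi_new
            break
        xi = xi_new
    return xi
```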

2.3 A New Variant of Mahboub’s Algorithm

After giving (16) the form

$$ {A}^T{Q}_1^{-1}\left(y-A\widehat{\xi} \right)=-\left[{\left({I}_m\otimes \widehat{\lambda}\right)}^T{Q}_A\left({I}_m\otimes \widehat{\lambda}\right)\right]\widehat{\xi}, $$
(18a)

the estimated parameter vector may also be obtained from

$$ \widehat{\xi}={\left[{A}^T{Q}_1^{-1}A-{\left({I}_m\otimes \widehat{\lambda}\right)}^T{Q}_A\left({I}_m\otimes \widehat{\lambda}\right)\right]}^{-1}{A}^T{Q}_1^{-1}y $$
(18b)

thus allowing updates for \( Q_1 \) and \( \widehat{\lambda} \). This algorithm will be further explored in the near future.
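
A single update step according to (18b) might be sketched as follows; the helper function is hypothetical, and \( Q_1 \) as well as \( \widehat{\lambda} \) are refreshed between steps exactly as in Sects. 2.1–2.2.

```python
import numpy as np

def tls_variant_step(y, A, Qy, QA, xi, lam):
    """One update of xi_hat according to Eq. (18b), given the current xi_hat, lam_hat (a sketch)."""
    n, m = A.shape
    B = np.kron(xi.reshape(-1, 1), np.eye(n))            # (xi_hat ⊗ I_n)
    Q1 = Qy + B.T @ QA @ B                               # Eq. (8b)
    L = np.kron(np.eye(m), lam.reshape(-1, 1))           # (I_m ⊗ lam_hat)
    N = A.T @ np.linalg.solve(Q1, A) - L.T @ QA @ L      # bracketed matrix in Eq. (18b)
    return np.linalg.solve(N, A.T @ np.linalg.solve(Q1, y))
```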

2.4 The Schaffrin–Wieser Algorithm

This algorithm was designed for the somewhat more special case where the cofactor matrix \( Q_A \) can be split into a Kronecker product, indicating that all columns of \( E_A \) have cofactor matrices proportional to each other. This implies

$$ {Q}_A={Q}_o\otimes {Q}_x\Rightarrow {Q}_1={Q}_y+\left({\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {Q}_x $$
(19)

and thus

$$ \begin{array}{l}{A}^T{Q}_1^{-1}\left(y-A\widehat{\xi}\right)={A}^T{\left[{Q}_y+\left({\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {Q}_x\right]}^{-1}\cdot\\{}\cdot\left(y-A\widehat{\xi}\right)={A}^T\widehat{\lambda}={\tilde{E}}_A^T\widehat{\lambda}\end{array} $$
(20)

with

$$ {\tilde{e}}_A=-\left({Q}_o\widehat{\xi}\otimes {Q}_x\right)\widehat{\lambda}=-{\it vec}\left({Q}_x\widehat{\lambda}{\widehat{\xi}}^T{Q}_o\right)={\it vec}{\tilde{E}}_A. $$
(21)

(20) and (21) together generate the identity

$$ \begin{array}{l}{A}^T{\left[{Q}_y+\left({\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {Q}_x\right]}^{-1}\left(y-A\widehat{\xi}\right)=\\ {}\kern6em =-{Q}_o\widehat{\xi}\cdot \left({\widehat{\lambda}}^T{Q}_x\widehat{\lambda}\right)=:-{Q}_o\widehat{\xi}\cdot \widehat{{\textit v}}\ \end{array} $$
(22a)

suggesting the iteration

$$ \begin{array}{l}\widehat{\xi}={\left({A}^T{\left[{Q}_y+\left({\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {Q}_x\right]}^{-1}A-{Q}_o\widehat{{\textit v}}\right)}^{-1}\cdot \\ {}\kern6em \cdot {A}^T{\left[{Q}_y+\left({\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {Q}_x\right]}^{-1}y\ \end{array} $$
(22b)

with

$$\begin{array}{rcl} \widehat{{\textit v}}:&=&\left({\widehat{\lambda}}^T{Q}_x\widehat{\lambda}\right)\kern0.75em \mathrm{and}\kern0.5em \widehat{\lambda}:={\left[{Q}_y+\left({\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {Q}_x\right]}^{-1} \cdot\\[12pt] &&\cdot\left(y-A\widehat{\xi}\right)\end{array}\\[-24pt]$$
(22c)

while (12) and (13) generate first the TSSR and then a suitable variance component estimate.
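
A minimal NumPy sketch of the Schaffrin–Wieser iteration (22b, c) under the Kronecker structure (19) might look as follows; the starting value and the stopping rule are, again, assumptions of this sketch rather than part of the original algorithm.

```python
import numpy as np

def tls_schaffrin_wieser(y, A, Qy, Qo, Qx, tol=1e-12, max_iter=100):
    """Minimal sketch of the Schaffrin-Wieser iteration (Eqs. 19, 22b-c) for Q_A = Qo ⊗ Qx."""
    n, m = A.shape
    xi = np.linalg.solve(A.T @ np.linalg.solve(Qy, A),
                         A.T @ np.linalg.solve(Qy, y))
    for _ in range(max_iter):
        Q1 = Qy + float(xi @ Qo @ xi) * Qx               # Eq. (19)
        lam = np.linalg.solve(Q1, y - A @ xi)            # Eq. (22c)
        nu = float(lam @ Qx @ lam)                       # nu_hat = lam_hat' Qx lam_hat
        N = A.T @ np.linalg.solve(Q1, A) - nu * Qo
        c = A.T @ np.linalg.solve(Q1, y)
        xi_new = np.linalg.solve(N, c)                   # Eq. (22b)
        if np.linalg.norm(xi_new - xi) < tol * (1.0 + np.linalg.norm(xi)):
            xi = xi_new
            break
        xi = xi_new
    return xi
```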

2.5 The Golub-van-Loan Algorithm

Now, the condition (19) is further specialized to

$$ {Q}_x:={Q}_y\Rightarrow {Q}_1=\left(1+{\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {Q}_y $$
(23a)

and

$$ \widehat{\lambda}={\left(1+{\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)}^{-1}{Q}_y^{-1}\left(y-A\widehat{\xi}\right) $$
(23b)

so that (22a) becomes

$$ {A}^T{Q}_y^{-1}\left(y{-}A\widehat{\xi}\right){=}-{Q}_o\widehat{\xi}\cdot \widehat{{\textit v}}\left(1{+}{\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)=:{-}{Q}_o\widehat{\xi}\cdot {\sigma}_{\min}^2 $$
(24a)

with

$$ \begin{array}{l}{\sigma}_{\min}^2=\left({\widehat{\lambda}}^T{Q}_y\widehat{\lambda}\right)\left(1+{\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)=\\ {}\kern1.75em ={\left(y-A\widehat{\xi}\right)}^T{Q}_y^{-1}\left(y-A\widehat{\xi}\right)/\left(1+{\widehat{\xi}}^T{Q}_o\widehat{\xi}\right),\end{array} $$
(24b)

and this, from (24a, b), becomes

$$ \begin{array}{l}{\sigma}_{\min}^2\cdot \left(1{+}{\widehat{\xi}}^T{Q}_o\widehat{\xi}\right){=}{y}^T{Q}_y^{-1}\left(y{-}A\widehat{\xi}\right){+}\left({\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {\sigma}_{\min}^2{\Rightarrow} \\ {}\Rightarrow {\sigma}_{\min}^2={y}^T{Q}_y^{-1}\left(y-A\widehat{\xi}\right)=\mathrm{TSSR}\end{array} $$
(24c)

(24a) and (24c) allow the problem to be rephrased as a generalized eigenvalue problem, specifically as:

$$ \left[\begin{array}{cc}\hfill {A}^T{Q}_y^{-1}A\hfill & \hfill {A}^T{Q}_y^{-1}y\hfill \\ {}\hfill {y}^T{Q}_y^{-1}A\hfill & \hfill {y}^T{Q}_y^{-1}y\hfill \end{array}\right]\left[\begin{array}{c}\hfill \widehat{\xi}\hfill \\ {}\hfill -1\hfill \end{array}\right]=\left[\begin{array}{cc}\hfill {Q}_o\hfill & \hfill 0\hfill \\ {}\hfill 0\hfill & \hfill 1\hfill \end{array}\right]\left[\begin{array}{c}\hfill \widehat{\xi}\hfill \\ {}\hfill -1\hfill \end{array}\right]\cdot {\sigma}_{\min}^2 $$
(25)

with the variance component estimate

$$ {\widehat{\sigma}}_o^2={\sigma}_{\min}^2/\left(n-m\right) $$
(26)

The original situation, treated by Golub and van Loan (1980), was characterized by the further specializations

$$ {Q}_o:={I}_m\kern0.5em \mathrm{and}\kern0.5em {Q}_y:=\mathrm{Diag}\left({p}_1^{-1},\ldots,{p}_n^{-1}\right)={P}^{-1} $$
(27)

which, in turn, lead to the standard eigenvalue problem

$$ \left[\begin{array}{cc}\hfill {A}^TPA\hfill & \hfill {A}^TPy\hfill \\ {}\hfill {y}^TPA\hfill & \hfill {y}^TPy\hfill \end{array}\right]\left[\begin{array}{c}\hfill \widehat{\xi}\hfill \\ {}\hfill -1\hfill \end{array}\right]=\left[\begin{array}{c}\hfill \widehat{\xi}\hfill \\ {}\hfill -1\hfill \end{array}\right]\cdot {\sigma}_{\min}^2 $$
(28)

whose solution provides the Total Least-Squares Solution (TLSS).
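
The following NumPy sketch solves the standard eigenvalue problem (28) for the element-wise weighted case (27); the generalized problem (25) could be treated analogously with a generalized symmetric-definite eigensolver. The function name and interface are assumptions of this sketch.

```python
import numpy as np

def tls_golub_van_loan(y, A, p):
    """Sketch of the TLS solution via the eigenvalue problem (28).

    p contains the positive weights p_1,...,p_n; Q_o = I_m is assumed,
    i.e. the original element-wise weighted case of Golub/van Loan (1980).
    """
    n, m = A.shape
    P = np.diag(p)                                  # P = Diag(p_1,...,p_n)
    C = np.column_stack([A, y])                     # augmented data matrix [A | y]
    M = C.T @ P @ C                                 # [[A'PA, A'Py], [y'PA, y'Py]]
    w, V = np.linalg.eigh(M)                        # eigenvalues in ascending order
    v = V[:, 0]                                     # eigenvector of the smallest eigenvalue
    xi = -v[:m] / v[m]                              # rescale so that the last entry equals -1
    sigma2_min = w[0]                               # = TSSR, cf. Eq. (24c)
    sigma2_hat = sigma2_min / (n - m)               # Eq. (26)
    return xi, sigma2_hat
```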

In the next section, a few equivalent models will be presented for which, traditionally, an identical weighted LEast-Squares Solution (LESS) would have been found after iterative linearization.

3 Traditional Models, Equivalent to the EIV-Model

3.1 The Nonlinear Gauss–Helmert Model

Here, the new vectors

$$ Y:={\it vec}\left[\left.y\right|A\right]\kern1em \mathrm{and}\kern1em e:={\it vec}\left[\left.{e}_y\right|{E}_A\right] $$
(29)

are introduced. Then,

$$ \underset{\bar{\mkern6mu}}{b}\left(\mu :=\underset{n\left(m+1\right)\times 1}{\underbrace{Y-e}},\xi \right):=\left(y-{e}_y\right)-\left(A-{E}_A\right)\xi =0, $$
(30a)
$$ e\sim \left(0,{\sigma}_o^2\left[\begin{array}{cc}\hfill {Q}_y\hfill & \hfill 0\hfill \\ {}\hfill 0\hfill & \hfill {Q}_A\hfill \end{array}\right]={\sigma}_o^2Q\right), $$
(30b)

with the nonlinear vector-valued function

$$ \underset{\bar{\mkern6mu}}{b}:{R}^{\left(n+1\right)\left(m+1\right)-1}\to {R}^n, $$
(30c)

due to the term \( {E}_A\cdot \xi \), forms an equivalent Gauss–Helmert Model that would traditionally be linearized for an iterative Least-Squares adjustment.

The truncated Taylor series, following Pope (1972), then reads:

$$ 0=\underset{\bar{\mkern6mu}}{b}\left(\mu, \xi \right)\approx \underset{\bar{\mkern6mu}}{b}\left({\mu}_o,{\xi}_o\right)+{\left.\frac{\partial \underset{\bar{\mkern6mu}}{b}\left(\mu, \xi \right)}{\partial \left[\left.{\mu}^T\right|{\xi}^T\right]}\right|}_{\mu_o,{\xi}_o}\cdot \left[\begin{array}{c}\hfill \mu -{\mu}_o\hfill \\ {}\hfill \xi -{\xi}_o\hfill \end{array}\right] $$
(31)

with suitable approximations \( {\xi}_o \) and \( {\mu}_o:=Y-\underset{\sim }{0} \) where \( \underset{\sim }{0} \) here denotes a “stochastic zero vector” of size \( n\left(m+1\right)\times 1 \). This leads first to

$$ 0\approx \underset{\bar{\mkern6mu}}{b}\left({\mu}_o,{\xi}_o\right)+{\left.\frac{\partial \underset{\bar{\mkern6mu}}{b}}{\partial {\mu}^T}\right|}_{\mu_o,{\xi}_o}\cdot \left(\underset{\sim }{0}-e\right)+{\left.\frac{\partial \underset{\bar{\mkern6mu}}{b}}{\partial {\xi}^T}\right|}_{\mu_o,{\xi}_o}\cdot \left(\xi -{\xi}_o\right), $$
(32)

then to

$$\begin{array}{rcl}\displaystyle \underset{\bar{\mkern6mu}}{b}\left({\mu}_o,{\xi}_o\right)&+&{\left.\displaystyle \frac{\partial \underset{\bar{\mkern6mu}}{b}}{\partial {\mu}^T}\right|}_{\mu_o,{\xi}_o}\cdot \underset{\sim }{0}\approx {\left.\displaystyle\frac{\partial \underset{\bar{\mkern6mu}}{b}}{\partial {\mu}^T}\right|}_{\mu_o,{\xi}_o}\cdot e-{\left.\displaystyle\frac{\partial \underset{\bar{\mkern6mu}}{b}}{\partial {\xi}^T}\right|}_{\mu_o,{\xi}_o}\cdot\\[21pt] &&\cdot \left(\xi -{\xi}_o\right),\end{array}\\[-24pt] $$
(33)

and eventually to the linearized Gauss–Helmert Model:

$$ {w}_o:=\underset{\bar{\mkern6mu}}{b}\left(Y={\mu}_o+\underset{\sim }{0},{\xi}_o\right)\approx {B}^{(o)}\cdot e+{A}^{(o)}\cdot \left(\xi -{\xi}_o\right), $$
(34a)
$$ {B}^{(o)}:=\left[\left.{I}_n\right|-\left({\xi}_o^T\otimes {I}_n\right)\right],\kern1em {A}^{(o)}:=A-\underset{\sim }{0}, $$
(34b)
$$ e\sim \left(0,{\sigma}_o^2Q={\sigma}_o^2\left[\begin{array}{cc} {Q}_y & 0 \\ 0 & {Q}_A \end{array}\right]\right). $$
(34c)

Note that the weighted LEast-Squares Solution (LESS) is now being formed through the normal equations

$$ \left[{\left({A}^{(o)}\right)}^T{\left({Q}_1^{(o)}\right)}^{-1}{A}^{(o)}\right]\cdot {\widehat{\xi}}^{(1)}={\left({A}^{(o)}\right)}^T{\left({Q}_1^{(o)}\right)}^{-1}\left(y-\underset{\sim }{0}\cdot {\xi}_o\right) $$
(35a)

with

$$ {Q}_1^{(o)}:={B}^{(o)}Q{\left({B}^{(o)}\right)}^T={Q}_y+{\left({\xi}_o\otimes {I}_n\right)}^T{Q}_A\left({\xi}_o\otimes {I}_n\right), $$
(35b)

and the residual vectors through

$$ \begin{array}{rcl}{\tilde{e}}^{(1)}:=\left[\begin{array}{c}\hfill {\tilde{e}}_y^{(1)}\hfill \\ {}\hfill {\tilde{e}}_A^{(1)}\hfill \end{array}\right]&=&\left[\begin{array}{c}\hfill {Q}_y\hfill \\ {}\hfill -{Q}_A\left({\widehat{\xi}}^{(1)}\otimes {I}_n\right)\hfill \end{array}\right]{\left({Q}_1^{(o)}\right)}^{-1}\cdot\\[20pt]&&\cdot\left(y-\underset{\sim }{0}\cdot {\xi}_o-{A}^{(o)}\cdot {\widehat{\xi}}^{(1)}\right)\end{array}\\[-24pt] $$
(35c)

Looking at the next and all the following iteration steps, it becomes clear that this represents one specific iterative solver of Fang’s TLS normal equations (11).

For more details, see Fang (2011, ch. 4.4), Snow (2012, ch. 4), and the forthcoming OSU-Report by Schaffrin (2015), as well as Neitzel (2010) for a specific application.
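
As an illustration, one linearization step (34a)–(35c) might be coded as follows, where the “stochastic zero” is represented by the current approximation of \( {\tilde{E}}_A \) (zero in the first step); repeating this step reproduces the iteration described above. This is a sketch under these assumptions, not the original implementation.

```python
import numpy as np

def ghm_tls_step(y, A, Qy, QA, xi_o, EA_o):
    """One step of the iteratively linearized Gauss-Helmert Model, Eqs. (34a)-(35c) (a sketch).

    xi_o, EA_o are the current approximations of xi and E_A (EA_o = 0 at the
    first step); the returned values are fed back until convergence.
    """
    n, m = A.shape
    A_o = A - EA_o                                        # A^(o), Eq. (34b)
    B_o = np.kron(xi_o.reshape(-1, 1), np.eye(n))         # (xi_o ⊗ I_n)
    Q1_o = Qy + B_o.T @ QA @ B_o                          # Eq. (35b)
    N = A_o.T @ np.linalg.solve(Q1_o, A_o)
    c = A_o.T @ np.linalg.solve(Q1_o, y - EA_o @ xi_o)    # right-hand side of Eq. (35a)
    xi_1 = np.linalg.solve(N, c)
    # residual vectors, Eq. (35c)
    t = np.linalg.solve(Q1_o, y - EA_o @ xi_o - A_o @ xi_1)
    e_y = Qy @ t
    e_A = -QA @ np.kron(xi_1.reshape(-1, 1), np.eye(n)) @ t
    EA_1 = e_A.reshape((n, m), order="F")                 # E_A_tilde from its vec form
    return xi_1, EA_1, e_y
```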

3.2 The Nonlinear Gauss–Markov Model

In this case, the expectation of the data matrix A is introduced as a new \( n\times m \) “parameter matrix”

$$ {\varXi}_A:=A-{E}_A\kern0.75em \mathrm{with}\kern0.5em {\xi}_A:={\it vec}{\varXi}_A, $$
(36)

leading to the equivalent Gauss-Markov Model

$$\begin{array}{rcl} y={\left(\xi \otimes {I}_n\right)}^T\cdot {\xi}_A&+&{e}_y=:\underset{\bar{\mkern6mu}}{a}\left(\xi, {\xi}_A\right)+{e}_y,\\[10pt]&& {e}_y\sim \left(0,{\sigma}_o^2{Q}_y\right),\end{array}\\[-24pt] $$
(37a)

with the nonlinear vector-valued function

$$ \underset{\bar{\mkern6mu}}{a}:{R}^{\left(n+1\right)m}\to {R}^n $$
(37b)

due to the term \( {\varXi}_A\cdot \xi \). The linearization of model (37a, b) with respect to the approximations \( {\xi}_o \) and \( {\xi}_A^{(o)}:={\it vec}\left({A}^{(o)}\right)={\it vec}\left(A-\underset{\sim }{0}\right) \), where \( \underset{\sim }{0} \) now denotes a “stochastic zero matrix” of size \( n\times m \), then leads first to

$$ {\xi}_A-{\xi}_A^{(o)}=\underset{\sim }{0}-{e}_A,\kern1em {e}_A\sim \left(0,{\sigma}_o^2{Q}_A\right), $$
(38a)
$$ \begin{array}{l} y-{e}_y\approx \underset{\bar{\mkern6mu}}{a}\left({\xi}_o,{\xi}_A^{(o)}\right)+{\left.\frac{\partial \underset{\bar{\mkern6mu}}{a}\left(\xi, {\xi}_A\right)}{\partial \left[{\xi}^T,{\xi}_A^T\right]}\right|}_{\xi_o,{\xi}_A^{(o)}}\cdot \left[\begin{array}{c} \xi -{\xi}_o \\ {\xi}_A-{\xi}_A^{(o)} \end{array}\right]= \\ {}\kern2em ={A}^{(o)}\cdot {\xi}_o+{A}^{(o)}\cdot \left(\xi -{\xi}_o\right)+{\left({\xi}_o\otimes {I}_n\right)}^T\left({\xi}_A-{\xi}_A^{(o)}\right), \end{array} $$
(38b)

and finally to the linearized Gauss–Markov Model

$$ \left[\begin{array}{c} y-{A}^{(o)}\cdot {\xi}_o \\ \underset{\sim }{0} \end{array}\right]=\left[\begin{array}{cc} {A}^{(o)} & {\left({\xi}_o\otimes {I}_n\right)}^T \\ 0 & {I}_{nm} \end{array}\right]\left[\begin{array}{c} \xi -{\xi}_o \\ {\xi}_A-{\xi}_A^{(o)} \end{array}\right]+\left[\begin{array}{c} {e}_y \\ {e}_A \end{array}\right], $$
(39a)
$$ \left[\begin{array}{c}\hfill {e}_y\hfill \\ {}\hfill {e}_A\hfill \end{array}\right]\sim \left(\left[\begin{array}{c}\hfill 0\hfill \\ {}\hfill 0\hfill \end{array}\right],\kern0.5em {\sigma}_o^2\left[\begin{array}{cc}\hfill {Q}_y\hfill & \hfill 0\hfill \\ {}\hfill 0\hfill & \hfill {Q}_A\hfill \end{array}\right]={\sigma}_o^2Q\right). $$
(39b)

After a number of further manipulations, the weighted LESS for model (39a, b) can be shown to fulfill the “normal equations”

$$ \left[{\left({A}^{(o)}\right)}^T{\left({Q}_1^{(o)}\right)}^{-1}{A}^{(o)}\right]\cdot {\widehat{\xi}}^{(1)}={\left({A}^{(o)}\right)}^T{\left({Q}_1^{(o)}\right)}^{-1}y $$
(40a)

with

$$ \begin{array}{l}{Q}_1^{(o)}:=\Big[{Q}_y^{-1}-{Q}_y^{-1}{\left({\xi}_o\otimes {I}_n\right)}^T\cdot \\ {}\kern2.75em \cdot {\left[{Q}_A^{-1}+\left({\xi}_o{\xi}_o^T\otimes {Q}_y^{-1}\right)\right]}^{-1}\left({\xi}_o\otimes {I}_n\right){Q}_y^{-1}\Big]{}^{-1}\end{array} $$
(40b)
$$ ={Q}_y+{\left({\xi}_o\otimes {I}_n\right)}^T{Q}_A\left({\xi}_o\otimes {I}_n\right) $$
(40c)

which nicely corresponds to (35a, b). More details can be found in the forthcoming OSU-Report by Schaffrin (2015).
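
The equivalence of (40b) and (40c) rests on the matrix-inversion (Sherman–Morrison–Woodbury) lemma and can be checked numerically in a few lines of NumPy; the toy dimensions and random diagonal cofactor matrices below are arbitrary choices for this check.

```python
import numpy as np

rng = np.random.default_rng(0)                            # toy setup to check (40b) = (40c)
n, m = 6, 2
xi_o = rng.standard_normal(m)
Qy = np.diag(rng.uniform(0.5, 2.0, n))                    # pos.-def. cofactor matrices
QA = np.diag(rng.uniform(0.5, 2.0, n * m))
B = np.kron(xi_o.reshape(-1, 1), np.eye(n))               # (xi_o ⊗ I_n)

Q1_direct = Qy + B.T @ QA @ B                             # Eq. (40c)
inner = np.linalg.inv(QA) + np.kron(np.outer(xi_o, xi_o), np.linalg.inv(Qy))
Q1_lemma = np.linalg.inv(                                 # Eq. (40b)
    np.linalg.inv(Qy)
    - np.linalg.inv(Qy) @ B.T @ np.linalg.inv(inner) @ B @ np.linalg.inv(Qy)
)
print(np.allclose(Q1_direct, Q1_lemma))                   # True
```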

3.3 The Model of Direct Observations with Nonlinear Constraints

Now, the expectation of the observation vector y is introduced as just another parameter vector \( {\xi}_y \) of size \( n\times 1 \) so that the new model combines the direct observation equations

$$ \begin{array}{l}\left[\begin{array}{c}\hfill y\hfill \\ {}\hfill {\it vec}A\hfill \end{array}\right]=\left[\begin{array}{c}\hfill {\xi}_y\hfill \\ {}\hfill {\xi}_A\hfill \end{array}\right]+\left[\begin{array}{c}\hfill {e}_y\hfill \\ {}\hfill {e}_A\hfill \end{array}\right],\kern1em \\ {}\kern3em \left[\begin{array}{c}\hfill {e}_y\hfill \\ {}\hfill {e}_A\hfill \end{array}\right]\sim \left(\left[\begin{array}{c}\hfill 0\hfill \\ {}\hfill 0\hfill \end{array}\right],\kern0.5em {\sigma}_o^2\left[\begin{array}{cc}\hfill {Q}_y\hfill & \hfill 0\hfill \\ {}\hfill 0\hfill & \hfill {Q}_A\hfill \end{array}\right]={\sigma}_o^2Q\right),\vspace*{6pt}\end{array} $$
(41a)

with the nonlinear constraints

$$ {\xi}_y-{\varXi}_A\cdot \xi =0 \vspace*{6pt}$$
(41b)

which might be linearized into

$$ \left[\begin{array}{ccc}\hfill \left.{A}^{(o)}\right|\hfill & \hfill \left.-{I}_n\right|\hfill & \hfill {\left({\xi}_o\otimes {I}_n\right)}^T\hfill \end{array}\right]\left[\begin{array}{c}\hfill \xi -{\xi}_o\hfill \\ {}\hfill {\xi}_y-{\xi}_y^{(o)}\hfill \\ {}\hfill {\xi}_A-{\xi}_A^{(o)}\hfill \end{array}\right]=0. \vspace*{6pt}$$
(42)

In the already mentioned OSU-Report by Schaffrin (2015), it will be shown how the resulting iterative LESS’s do converge to the Total Least-Squares Solution.

For another take on this model, refer to Donevska et al. (2011) who stress the equivalence to orthogonal regression as applied by Deming (1931, 1934).

4 Conclusions

It has been clarified that the TLS approach to the EIV-Model requires a genuinely nonlinear treatment of the model. A number of different algorithms have been presented to generate the Total Least-Squares Solution from a certain set of nonlinear normal equations. A triplet of conventional nonlinear models has also been considered, suggesting that the LEast-Squares Solutions from iterative linearization do converge to the nonlinear TLS-Solution in all three cases. Most of the details, however, will be published in a forthcoming OSU-Report, due to the space restrictions for these Proceedings.