1 Introduction

The Errors-In-Variables (EIV) Model has recently received much attention since, following Golub and van Loan (1980), it can be treated in its nonlinear form by a least-squares approach that they coined “Total Least-Squares adjustment”. It eventually leads to a (generalized) eigenvalue problem that needs to be solved in lieu of the sequence of normal equations that would result from a traditional “Least-Squares adjustment” within iteratively linearized models. The latter approach dates back at least to Helmert (1907), but was also used by Deming (1931, 1934) for the approximation of curves and, more recently, by Neitzel (2010) to determine the parameters of a similarity transformation.

In contrast, the nonlinear Total Least-Squares (TLS) approach which, in its original formulation, could tolerate only “element-wise weighting” and thus only diagonal weight matrices, has since been generalized in several steps by Schaffrin and Wieser (2008), Fang (2011), and Mahboub (2012) to now accept any positive-definite weight matrices. This development will be presented in the following Sect. 2, thereby showing how the more specialized algorithms can be derived from the more general ones by simplification.

Moreover, it should be noted that progress has also been made towards the use of positive-semidefinite dispersion matrices in TLS adjustment, which may be handled as described by Schaffrin et al. (2014). These cases are quite relevant whenever the random error matrix needs to show a certain pattern or structure after the adjustment. Due to the limited space, these advanced methods will not be discussed below.

Instead, attention will be paid to a triplet of classical nonlinear models that all can be constructed to be equivalent to the EIV-Model and, furthermore, may undergo a sequence of Least-Squares adjustments via iterative linearization which, in the end, converge to the very same TLS solution. This will be the theme in Sect. 3 although many details have to be left out; for those, see Schaffrin (2015).

2 Nonlinear TLS Adjustment in an EIV-Model

2.1 Fang’s Algorithm

Let the EIV-Model be defined by

$$ y=\underset{n\times m}{\underbrace{\left(A-{E}_A\right)}}\,\xi +{e}_y,\qquad \mathrm{rk}\,A=m<n, $$
(1a)
$$ e:=\left[\begin{array}{c} {e}_y \\ {e}_A:={\it vec}\,{E}_A \end{array}\right]\sim \left(\left[\begin{array}{c} 0 \\ 0 \end{array}\right],\kern0.5em {\sigma}_o^2\left[\begin{array}{cc} \underset{n\times n}{P_y^{-1}} & 0 \\ 0 & \underset{nm\times nm}{P_A^{-1}} \end{array}\right]=:{\sigma}_o^2{P}^{-1}\right) $$
(1b)

where

  • y is the \( n\times 1 \) observation vector;

  • A is the \( n\times m \) (random) coefficient matrix with full column rank (aka “data matrix”);

  • \( E_A \) is the \( n\times m \) (unknown) random error matrix associated with A;

  • ξ is the \( m\times 1 \) (unknown) parameter vector;

  • \( e_y \) is the \( n\times 1 \) (unknown) random error vector associated with y;

  • \( e_A \) is the \( nm\times 1 \) vectorial form of the matrix \( E_A \);

  • Q is the \( n\left(m+1\right)\times n\left(m+1\right) \) block-diagonal pos.-def. cofactor matrix;

  • \( P:={Q}^{-1} \) is the corresponding block-diagonal pos.-def. weight matrix;

  • \( {\sigma}_o^2 \) is the (unknown) variance component (unit-free);

  • \( \mathrm{C}\mathrm{o}\mathrm{v}\left\{e_y, {\it vec}E_A\right\}=0 \) for the sake of simplicity.

The model generalizes the one used by Schaffrin and Wieser (2008) where a Kronecker product structure for

$$ {Q}_A={P}_A^{-1}={Q}_o\otimes {Q}_x $$
(2)

was assumed, as well as the one used by Golub and van Loan (1980) who only allowed diagonal cofactor matrices with

$$ {Q}_o:={I}_m,\kern1em {Q}_x:={Q}_y= {\it Diag}\left({p}_1^{-1},\ldots,{p}_n^{-1}\right)={P}_y^{-1}. $$
(3)

The objective of a nonlinear Total Least-Squares (TLS) adjustment is now defined by the principle

$$ {e}_y^T{P}_y{e}_y+{e}_A^T{P}_A{e}_A= \min .\kern1.25em \mathrm{s}.\mathrm{t}.\kern0.75em \left(1\mathrm{a}\right), $$
(4)

which can be given the equivalent form of a Lagrange target function, namely:

$$ \begin{array}{l}\phi \left({e}_y,{e}_A,\xi, \lambda \right):={e}_y^T{P}_y{e}_y+{e}_A^T{P}_A{e}_A+\\ {}\kern2.75em +2{\lambda}^T\left[y-A\xi -{e}_y+\left({\xi}^T\otimes {I}_n\right){e}_A\right]=\mathrm{stationary}.\end{array} $$
(5)

Consequently, the Euler-Lagrange necessary conditions result in the following system of nonlinear “normal equations”:

$$ \frac{1}{2}\frac{\partial \phi }{\partial {e}_y}={P}_y{\tilde{e}}_y-\widehat{\lambda}\dot{=}0 $$
(6a)
$$ \frac{1}{2}\frac{\partial \phi }{\partial {e}_A}={P}_A{\tilde{e}}_A+\left(\widehat{\xi}\otimes {I}_n\right)\widehat{\lambda}\dot{=}0, $$
(6b)
$$ \frac{1}{2}\frac{\partial \phi }{\partial \xi }=-{\left(A-{\tilde{E}}_A\right)}^T\widehat{\lambda}\dot{=}0, $$
(6c)
$$ \frac{1}{2}\frac{\partial \phi }{\partial \lambda }=y-A\widehat{\xi}-{\tilde{e}}_y+\left({\widehat{\xi}}^T\otimes {I}_n\right){\tilde{e}}_A\dot{=}0, $$
(6d)

which still needs to be reduced by partial elimination; note that the sufficient condition for a minimum is fulfilled since

$$ \frac{1}{2}\frac{\partial^2\phi }{\partial \left[\begin{array}{c} {e}_y \\ {e}_A \end{array}\right]\partial \left[\begin{array}{cc} {e}_y^T & {e}_A^T \end{array}\right]}=\left[\begin{array}{cc} {P}_y & 0 \\ 0 & {P}_A \end{array}\right]\kern0.5em \mathrm{is}\ \mathrm{pos.}\hbox{-}\mathrm{def.} $$
(7)

Now, (6a, b) are transformed to provide the residual vectors through

$$ {\tilde{e}}_y={Q}_y\widehat{\lambda}\kern0.5em \mathrm{and}\kern0.5em {\tilde{e}}_A=-{Q}_A\left(\widehat{\xi}\otimes {I}_n\right)\widehat{\lambda} $$
(8a)

so that (6d) can be rewritten as

$$ y-A\widehat{\xi}=\left[{Q}_y+{\left(\widehat{\xi}\otimes {I}_n\right)}^T{Q}_A\left(\widehat{\xi}\otimes {I}_n\right)\right]\cdot \widehat{\lambda}=:{Q}_1\cdot \widehat{\lambda}, $$
(8b)

with \( {Q}_1={Q}_1\left(\widehat{\xi}\right) \) being nonsingular, thus leading to

$$ \widehat{\lambda}={Q}_1^{-1}\left(y-A\widehat{\xi}\right) $$
(9)

and, together with (6c), to the system

$$ \left[\begin{array}{cc}\hfill {Q}_1\hfill & \hfill \left(A-{\tilde{E}}_A\right)\hfill \\ {}\hfill {\left(A-{\tilde{E}}_A\right)}^T\hfill & \hfill 0\hfill \end{array}\right]\left[\begin{array}{c}\hfill \widehat{\lambda}\hfill \\ {}\hfill \widehat{\xi}\hfill \end{array}\right]=\left[\begin{array}{c}\hfill y-{\tilde{E}}_A\widehat{\xi}\hfill \\ {}\hfill 0\hfill \end{array}\right] $$
(10)

Obviously, the estimated parameter vector is now obtained as in Fang (2011, p.27) via

$$ \begin{array}{ll} \widehat{\xi}={\left[{\left(A-{\tilde{E}}_A\right)}^T{Q}_1^{-1}\left(A-{\tilde{E}}_A\right)\right]}^{-1}{\left(A-{\tilde{E}}_A\right)}^T{Q}_1^{-1}\ \cdot\\ \hspace*{18pt} \cdot\left(y-{\tilde{E}}_A\widehat{\xi}\right)\end{array} $$
(11)

and allows updates for \( Q_1 \), \( \widehat{\lambda} \), and \( {\tilde{E}}_A \), from which a new estimate \( \widehat{\xi} \) results.

The Total Sum of weighted Squared Residuals (TSSR) may now readily be computed from

$$ \begin{array}{ll}{\tilde{e}}_y^T{P}_y{\tilde{e}}_y{+}{\tilde{e}}_A^T{P}_A{\tilde{e}}_A{=}{\widehat{\lambda}}^T\!\left[{Q}_y{+}{\left(\widehat{\xi}\otimes {I}_n\right)}^T{Q}_A\left(\widehat{\xi}\otimes {I}_n\right)\right]\widehat{\lambda}\,{=}\\ {}={\widehat{\lambda}}^T{Q}_1\widehat{\lambda}={\widehat{\lambda}}^T\left(y-A\widehat{\xi}\right)=:\mathrm{TSSR}\end{array} $$
(12)

so that a suitable variance component estimate may be obtained through

$$ {\widehat{\sigma}}_o^2={\widehat{\lambda}}^T\left(y-A\widehat{\xi}\right)/\left(n-m\right)=\mathrm{TSSR}/\left(n-m\right) $$
(13)

as the redundancy in model (1a, b) is still \( n-m \).
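
For illustration only, a minimal NumPy sketch of this iteration (Eqs. 8a–13) could look as follows; the function name, the choice of the ordinary weighted LESS as starting value, and the convergence tolerance are assumptions of this sketch and not part of Fang’s original algorithm.

```python
import numpy as np

def tls_fang(y, A, Qy, QA, tol=1e-12, max_iter=100):
    """Minimal sketch of Fang's TLS iteration, Eqs. (8a)-(13); not the original code."""
    n, m = A.shape
    # start, e.g., from the ordinary weighted LESS (i.e. E_A = 0)
    xi = np.linalg.solve(A.T @ np.linalg.solve(Qy, A),
                         A.T @ np.linalg.solve(Qy, y))
    EA = np.zeros((n, m))
    for _ in range(max_iter):
        B = np.kron(xi.reshape(-1, 1), np.eye(n))        # (xi_hat ⊗ I_n), size nm x n
        Q1 = Qy + B.T @ QA @ B                           # Eq. (8b)
        lam = np.linalg.solve(Q1, y - A @ xi)            # Eq. (9)
        eA = -QA @ B @ lam                               # Eq. (8a), i.e. vec of E_A_tilde
        EA = eA.reshape((n, m), order="F")               # de-vectorize column-wise
        Abar = A - EA
        N = Abar.T @ np.linalg.solve(Q1, Abar)
        c = Abar.T @ np.linalg.solve(Q1, y - EA @ xi)    # Eq. (11)
        xi_new = np.linalg.solve(N, c)
        if np.linalg.norm(xi_new - xi) < tol * (1.0 + np.linalg.norm(xi)):
            xi = xi_new
            break
        xi = xi_new
    # refresh the Lagrange multipliers once more for TSSR and the variance component
    B = np.kron(xi.reshape(-1, 1), np.eye(n))
    Q1 = Qy + B.T @ QA @ B
    lam = np.linalg.solve(Q1, y - A @ xi)
    TSSR = float(lam @ (y - A @ xi))                     # Eq. (12)
    sigma2_hat = TSSR / (n - m)                          # Eq. (13)
    return xi, EA, sigma2_hat
```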

Alternatively, system (10) can be given the asymmetric form

$$ \left[\begin{array}{cc}\hfill {Q}_1\hfill & \hfill A\hfill \\ {}\hfill {\left(A-{\tilde{E}}_A\right)}^T\hfill & \hfill 0\hfill \end{array}\right]\left[\begin{array}{c}\hfill \widehat{\lambda}\hfill \\ {}\hfill \widehat{\xi}\hfill \end{array}\right]=\left[\begin{array}{c}\hfill y\hfill \\ {}\hfill 0\hfill \end{array}\right] $$
(14)

which would then provide the estimated parameter vector through

$$ \widehat{\xi}={\left[{\left(A-{\tilde{E}}_A\right)}^T{Q}_1^{-1}A\right]}^{-1}{\left(A-{\tilde{E}}_A\right)}^T{Q}_1^{-1}y $$
(15)

and should lead to a similar iteration as before. Note that (15) also appears as formula (21) in Xu et al. (2012), but essentially represents a variant of Fang’s algorithm; also, cf. Fang (2013) where further alternatives are presented.

2.2 Mahboub’s Algorithm

On the other hand, combining (9) with (6c) leads to the following sequence of identities:

$$ \begin{array}{l}{A}^T{Q}_1^{-1}\left(y-A\widehat{\xi}\right){=}{A}^T\widehat{\lambda}{=}{\tilde{E}}_A^T\widehat{\lambda}{=}\left({\widehat{\lambda}}^T\otimes {I}_m\right){\it vec}\left({\tilde{E}}_A^T\right)=\\ {}=\left({\widehat{\lambda}}^T\otimes {I}_m\right)\cdot \left(K{\tilde{e}}_A\right)=\left({I}_m\otimes {\widehat{\lambda}}^T\right){\tilde{e}}_A=\\ {}=-\left[{\left({I}_m\otimes \widehat{\lambda}\right)}^T{Q}_A\left(\widehat{\xi}\otimes {I}_n\right)\right]\widehat{\lambda}=:-{R}_1\cdot \widehat{\lambda}=\\ {}=-{R}_1\cdot {Q}_1^{-1}\left(y-A\widehat{\xi}\right)\end{array} $$
(16)

where K denotes an \( nm\times nm \) “commutation matrix”, also known as a “vec-permutation matrix”; for more details, see Magnus and Neudecker (2007).

Obviously, (16) translates into the estimated parameter vector

$$ \widehat{\xi}={\left[\left({A}^T+{R}_1\right){Q}_1^{-1}A\right]}^{-1}\left({A}^T+{R}_1\right){Q}_1^{-1}y $$
(17a)

with \( {R}_1={R}_1\left(\widehat{\xi},\widehat{\lambda}\right) \) and, from (16), with

$$ {R}_1\widehat{\lambda}=-{\tilde{E}}_A^T\widehat{\lambda} $$
(17b)

without necessarily implying that \( {R}_1=-{\tilde{E}}_A^T \). Therefore, the sequence of solutions to (15) may differ from the sequence of solutions to (17a) when iteratively updating \( Q_1 \), \( \widehat{\lambda} \), and \( R_1 \), before a new parameter vector estimate \( \widehat{\xi} \) can be found; yet the ultimate convergence points will be the same.

Again, the TSSR can be computed from (12) which will lead to the variance component estimate in (13).
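
Again for illustration only, a minimal NumPy sketch of Mahboub’s update (17a), with \( R_1 \) formed as in (16), might read as follows; the starting value and the stopping rule are assumptions of this sketch.

```python
import numpy as np

def tls_mahboub(y, A, Qy, QA, tol=1e-12, max_iter=100):
    """Minimal sketch of Mahboub's iteration, Eqs. (16)-(17a); not the original code."""
    n, m = A.shape
    xi = np.linalg.solve(A.T @ np.linalg.solve(Qy, A),
                         A.T @ np.linalg.solve(Qy, y))
    for _ in range(max_iter):
        B = np.kron(xi.reshape(-1, 1), np.eye(n))        # (xi_hat ⊗ I_n)
        Q1 = Qy + B.T @ QA @ B                           # Eq. (8b)
        lam = np.linalg.solve(Q1, y - A @ xi)            # Eq. (9)
        L = np.kron(np.eye(m), lam.reshape(-1, 1))       # (I_m ⊗ lam_hat), size nm x m
        R1 = L.T @ QA @ B                                # R_1 as defined via Eq. (16), m x n
        N = (A.T + R1) @ np.linalg.solve(Q1, A)
        c = (A.T + R1) @ np.linalg.solve(Q1, y)
        xi_new = np.linalg.solve(N, c)                   # Eq. (17a)
        if np.linalg.norm(xi_new - xi) < tol * (1.0 + np.linalg.norm(xi)):
            xi = xi_new
            break
        xi = xi_new
    return xi
```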

2.3 A New Variant of Mahboub’s Algorithm

After giving (16) the form

$$ {A}^T{Q}_1^{-1}\left(y-A\widehat{\xi} \right)=-\left[{\left({I}_m\otimes \widehat{\lambda}\right)}^T{Q}_A\left({I}_m\otimes \widehat{\lambda}\right)\right]\widehat{\xi}, $$
(18a)

the estimated parameter vector may also be obtained from

$$ \widehat{\xi}={\left[{A}^T{Q}_1^{-1}A-{\left({I}_m\otimes \widehat{\lambda}\right)}^T{Q}_A\left({I}_m\otimes \widehat{\lambda}\right)\right]}^{-1}{A}^T{Q}_1^{-1}y $$
(18b)

thus allowing updates for \( Q_1 \) and \( \widehat{\lambda} \). This algorithm will be further explored in the near future.
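
A single update step according to (18b) might be sketched as follows; the helper function is hypothetical, and \( Q_1 \) as well as \( \widehat{\lambda} \) are refreshed between steps exactly as in Sects. 2.1–2.2.

```python
import numpy as np

def tls_variant_step(y, A, Qy, QA, xi, lam):
    """One update of xi_hat according to Eq. (18b), given the current xi_hat, lam_hat (a sketch)."""
    n, m = A.shape
    B = np.kron(xi.reshape(-1, 1), np.eye(n))            # (xi_hat ⊗ I_n)
    Q1 = Qy + B.T @ QA @ B                               # Eq. (8b)
    L = np.kron(np.eye(m), lam.reshape(-1, 1))           # (I_m ⊗ lam_hat)
    N = A.T @ np.linalg.solve(Q1, A) - L.T @ QA @ L      # bracketed matrix in Eq. (18b)
    return np.linalg.solve(N, A.T @ np.linalg.solve(Q1, y))
```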

2.4 The Schaffrin–Wieser Algorithm

This algorithm was designed for the somewhat more special case where the cofactor matrix \( Q_A \) can be split into a Kronecker product, indicating that all columns of \( E_A \) have cofactor matrices proportional to each other. This implies

$$ {Q}_A={Q}_o\otimes {Q}_x\Rightarrow {Q}_1={Q}_y+\left({\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {Q}_x $$
(19)

and thus

$$ \begin{array}{l}{A}^T{Q}_1^{-1}\left(y-A\widehat{\xi}\right)={A}^T{\left[{Q}_y+\left({\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {Q}_x\right]}^{-1}\cdot\\{}\cdot\left(y-A\widehat{\xi}\right)={A}^T\widehat{\lambda}={\tilde{E}}_A^T\widehat{\lambda}\end{array} $$
(20)

with

$$ {\tilde{e}}_A=-\left({Q}_o\widehat{\xi}\otimes {Q}_x\right)\widehat{\lambda}=-{\it vec}\left({Q}_x\widehat{\lambda}{\widehat{\xi}}^T{Q}_o\right)={\it vec}{\tilde{E}}_A. $$
(21)

(20) and (21) together generate the identity

$$ \begin{array}{l}{A}^T{\left[{Q}_y+\left({\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {Q}_x\right]}^{-1}\left(y-A\widehat{\xi}\right)=\\ {}\kern6em =-{Q}_o\widehat{\xi}\cdot \left({\widehat{\lambda}}^T{Q}_x\widehat{\lambda}\right)=:-{Q}_o\widehat{\xi}\cdot \widehat{{\textit v}}\ \end{array} $$
(22a)

suggesting the iteration

$$ \begin{array}{l}\widehat{\xi}={\left({A}^T{\left[{Q}_y+\left({\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {Q}_x\right]}^{-1}A-{Q}_o\widehat{{\textit v}}\right)}^{-1}\cdot \\ {}\kern6em \cdot {A}^T{\left[{Q}_y+\left({\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {Q}_x\right]}^{-1}y\ \end{array} $$
(22b)

with

$$\begin{array}{rcl} \widehat{{\textit v}}:&=&\left({\widehat{\lambda}}^T{Q}_x\widehat{\lambda}\right)\kern0.75em \mathrm{and}\kern0.5em \widehat{\lambda}:={\left[{Q}_y+\left({\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {Q}_x\right]}^{-1} \cdot\\[12pt] &&\cdot\left(y-A\widehat{\xi}\right)\end{array}\\[-24pt]$$
(22c)

while (12) and (13) generate first the TSSR and then a suitable variance component estimate.
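
A minimal NumPy sketch of the Schaffrin–Wieser iteration (22b, c) under the Kronecker structure (19) might look as follows; the starting value and the stopping rule are, again, assumptions of this sketch rather than part of the original algorithm.

```python
import numpy as np

def tls_schaffrin_wieser(y, A, Qy, Qo, Qx, tol=1e-12, max_iter=100):
    """Minimal sketch of the Schaffrin-Wieser iteration (Eqs. 19, 22b-c) for Q_A = Qo ⊗ Qx."""
    n, m = A.shape
    xi = np.linalg.solve(A.T @ np.linalg.solve(Qy, A),
                         A.T @ np.linalg.solve(Qy, y))
    for _ in range(max_iter):
        Q1 = Qy + float(xi @ Qo @ xi) * Qx               # Eq. (19)
        lam = np.linalg.solve(Q1, y - A @ xi)            # Eq. (22c)
        nu = float(lam @ Qx @ lam)                       # nu_hat = lam_hat' Qx lam_hat
        N = A.T @ np.linalg.solve(Q1, A) - nu * Qo
        c = A.T @ np.linalg.solve(Q1, y)
        xi_new = np.linalg.solve(N, c)                   # Eq. (22b)
        if np.linalg.norm(xi_new - xi) < tol * (1.0 + np.linalg.norm(xi)):
            xi = xi_new
            break
        xi = xi_new
    return xi
```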

2.5 The Golub-van-Loan Algorithm

Now, the condition (19) is further specialized to

$$ {Q}_x:={Q}_y\Rightarrow {Q}_1=\left(1+{\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {Q}_y $$
(23a)

and

$$ \widehat{\lambda}={\left(1+{\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)}^{-1}{Q}_y^{-1}\left(y-A\widehat{\xi}\right) $$
(23b)

so that (22a) becomes

$$ {A}^T{Q}_y^{-1}\left(y{-}A\widehat{\xi}\right){=}-{Q}_o\widehat{\xi}\cdot \widehat{{\textit v}}\left(1{+}{\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)=:{-}{Q}_o\widehat{\xi}\cdot {\sigma}_{\min}^2 $$
(24a)

with

$$ \begin{array}{l}{\sigma}_{\min}^2=\left({\widehat{\lambda}}^T{Q}_y\widehat{\lambda}\right)\left(1+{\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)=\\ {}\kern1.75em ={\left(y-A\widehat{\xi}\right)}^T{Q}_y^{-1}\left(y-A\widehat{\xi}\right)/\left(1+{\widehat{\xi}}^T{Q}_o\widehat{\xi}\right),\end{array} $$
(24b)

and this, from (24a, b), becomes

$$ \begin{array}{l}{\sigma}_{\min}^2\cdot \left(1{+}{\widehat{\xi}}^T{Q}_o\widehat{\xi}\right){=}{y}^T{Q}_y^{-1}\left(y{-}A\widehat{\xi}\right){+}\left({\widehat{\xi}}^T{Q}_o\widehat{\xi}\right)\cdot {\sigma}_{\min}^2{\Rightarrow} \\ {}\Rightarrow {\sigma}_{\min}^2={y}^T{Q}_y^{-1}\left(y-A\widehat{\xi}\right)=\mathrm{TSSR}\end{array} $$
(24c)

(24a) and (24c) allow the problem to be rephrased as a generalized eigenvalue problem, specifically as:

$$ \left[\begin{array}{cc}\hfill {A}^T{Q}_y^{-1}A\hfill & \hfill {A}^T{Q}_y^{-1}y\hfill \\ {}\hfill {y}^T{Q}_y^{-1}A\hfill & \hfill {y}^T{Q}_y^{-1}y\hfill \end{array}\right]\left[\begin{array}{c}\hfill \widehat{\xi}\hfill \\ {}\hfill -1\hfill \end{array}\right]=\left[\begin{array}{cc}\hfill {Q}_o\hfill & \hfill 0\hfill \\ {}\hfill 0\hfill & \hfill 1\hfill \end{array}\right]\left[\begin{array}{c}\hfill \widehat{\xi}\hfill \\ {}\hfill -1\hfill \end{array}\right]\cdot {\sigma}_{\min}^2 $$
(25)

with the variance component estimate

$$ {\widehat{\sigma}}_o^2={\sigma}_{\min}^2/\left(n-m\right) $$
(26)

The original situation, treated by Golub and van Loan (1980), was characterized by the further specializations

$$ {Q}_o:={I}_m\kern0.5em \mathrm{and}\kern0.5em {Q}_y:=\mathrm{Diag}\left({p}_1^{-1},\ldots,{p}_n^{-1}\right)={P}^{-1} $$
(27)

which, in turn, lead to the standard eigenvalue problem

$$ \left[\begin{array}{cc}\hfill {A}^TPA\hfill & \hfill {A}^TPy\hfill \\ {}\hfill {y}^TPA\hfill & \hfill {y}^TPy\hfill \end{array}\right]\left[\begin{array}{c}\hfill \widehat{\xi}\hfill \\ {}\hfill -1\hfill \end{array}\right]=\left[\begin{array}{c}\hfill \widehat{\xi}\hfill \\ {}\hfill -1\hfill \end{array}\right]\cdot {\sigma}_{\min}^2 $$
(28)

whose solution provides the Total Least-Squares Solution (TLSS).
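
The following NumPy sketch solves the standard eigenvalue problem (28) for the element-wise weighted case (27); the generalized problem (25) could be treated analogously with a generalized symmetric-definite eigensolver. The function name and interface are assumptions of this sketch.

```python
import numpy as np

def tls_golub_van_loan(y, A, p):
    """Sketch of the TLS solution via the eigenvalue problem (28).

    p contains the positive weights p_1,...,p_n; Q_o = I_m is assumed,
    i.e. the original element-wise weighted case of Golub/van Loan (1980).
    """
    n, m = A.shape
    P = np.diag(p)                                  # P = Diag(p_1,...,p_n)
    C = np.column_stack([A, y])                     # augmented data matrix [A | y]
    M = C.T @ P @ C                                 # [[A'PA, A'Py], [y'PA, y'Py]]
    w, V = np.linalg.eigh(M)                        # eigenvalues in ascending order
    v = V[:, 0]                                     # eigenvector of the smallest eigenvalue
    xi = -v[:m] / v[m]                              # rescale so that the last entry equals -1
    sigma2_min = w[0]                               # = TSSR, cf. Eq. (24c)
    sigma2_hat = sigma2_min / (n - m)               # Eq. (26)
    return xi, sigma2_hat
```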

In the next section, a few equivalent models will be presented for which, traditionally, an identical weighted LEast-Squares Solution (LESS) would have been found after iterative linearization.

3 Traditional Models, Equivalent to the EIV-Model

3.1 The Nonlinear Gauss–Helmert Model

Here, the new vectors

$$ Y:={\it vec}\left[\left.y\right|A\right]\kern1em \mathrm{and}\kern1em e:={\it vec}\left[\left.{e}_y\right|{E}_A\right] $$
(29)

are introduced. Then,

$$ \underset{\bar{\mkern6mu}}{b}\left(\mu :=\underset{n\left(m+1\right)\times 1}{\underbrace{Y-e}},\xi \right):=\left(y-{e}_y\right)-\left(A-{E}_A\right)\xi =0, $$
(30a)
$$ e\sim \left(0,{\sigma}_o^2\left[\begin{array}{cc}\hfill {Q}_y\hfill & \hfill 0\hfill \\ {}\hfill 0\hfill & \hfill {Q}_A\hfill \end{array}\right]={\sigma}_o^2Q\right), $$
(30b)

with the nonlinear vector-valued function

$$ \underset{\bar{\mkern6mu}}{b}:{R}^{\left(n+1\right)\left(m+1\right)-1}\to {R}^n, $$
(30c)

due to the term \( {E}_A\cdot \xi \), forms an equivalent Gauss–Helmert Model that would traditionally be linearized for an iterative Least-Squares adjustment.

The truncated Taylor series, following Pope (1972), then reads:

$$ 0=\underset{\bar{\mkern6mu}}{b}\left(\mu, \xi \right)\approx \underset{\bar{\mkern6mu}}{b}\left({\mu}_o,{\xi}_o\right)+{\left.\frac{\partial \underset{\bar{\mkern6mu}}{b}\left(\mu, \xi \right)}{\partial \left[\left.{\mu}^T\right|{\xi}^T\right]}\right|}_{\mu_o,{\xi}_o}\cdot \left[\begin{array}{c}\hfill \mu -{\mu}_o\hfill \\ {}\hfill \xi -{\xi}_o\hfill \end{array}\right] $$
(31)

with suitable approximations \( {\xi}_o \) and \( {\mu}_o:=Y-\underset{\sim }{0} \) where \( \underset{\sim }{0} \) here denotes a “stochastic zero vector” of size \( n\left(m+1\right)\times 1 \). This leads first to

$$ 0\approx \underset{\bar{\mkern6mu}}{b}\left({\mu}_o,{\xi}_o\right)+{\left.\frac{\partial \underset{\bar{\mkern6mu}}{b}}{\partial {\mu}^T}\right|}_{\mu_o,{\xi}_o}\cdot \left(\underset{\sim }{0}-e\right)+{\left.\frac{\partial \underset{\bar{\mkern6mu}}{b}}{\partial {\xi}^T}\right|}_{\mu_o,{\xi}_o}\cdot \left(\xi -{\xi}_o\right), $$
(32)

then to

$$\begin{array}{rcl}\displaystyle \underset{\bar{\mkern6mu}}{b}\left({\mu}_o,{\xi}_o\right)&+&{\left.\displaystyle \frac{\partial \underset{\bar{\mkern6mu}}{b}}{\partial {\mu}^T}\right|}_{\mu_o,{\xi}_o}\cdot \underset{\sim }{0}\approx {\left.\displaystyle\frac{\partial \underset{\bar{\mkern6mu}}{b}}{\partial {\mu}^T}\right|}_{\mu_o,{\xi}_o}\cdot e-{\left.\displaystyle\frac{\partial \underset{\bar{\mkern6mu}}{b}}{\partial {\xi}^T}\right|}_{\mu_o,{\xi}_o}\cdot\\[21pt] &&\cdot \left(\xi -{\xi}_o\right),\end{array}\\[-24pt] $$
(33)

and eventually to the linearized Gauss–Helmert Model:

$$ {w}_o:=\underset{\bar{\mkern6mu}}{b}\left(Y={\mu}_o+\underset{\sim }{0},{\xi}_o\right)\approx {B}^{(o)}\cdot e+{A}^{(o)}\cdot \left(\xi -{\xi}_o\right), $$
(34a)
$$ {B}^{(o)}:=\left[\left.{I}_n\right|-\left({\xi}_o^T\otimes {I}_n\right)\right],\kern1em {A}^{(o)}:=A-\underset{\sim }{0}, $$
(34b)
$$ e\sim \left(0,{\sigma}_o^2Q={\sigma}_o^2\left[\begin{array}{cc} {Q}_y & 0 \\ 0 & {Q}_A \end{array}\right]\right). $$
(34c)

Note that the weighted LEast-Squares Solution (LESS) is now being formed through the normal equations

$$ \left[{\left({A}^{(o)}\right)}^T{\left({Q}_1^{(o)}\right)}^{-1}{A}^{(o)}\right]\cdot {\widehat{\xi}}^{(1)}={\left({A}^{(o)}\right)}^T{\left({Q}_1^{(o)}\right)}^{-1}\left(y-\underset{\sim }{0}\cdot {\xi}_o\right) $$
(35a)

with

$$ {Q}_1^{(o)}:={B}^{(o)}Q{\left({B}^{(o)}\right)}^T={Q}_y+{\left({\xi}_o\otimes {I}_n\right)}^T{Q}_A\left({\xi}_o\otimes {I}_n\right), $$
(35b)

and the residual vectors through

$$ \begin{array}{rcl}{\tilde{e}}^{(1)}:=\left[\begin{array}{c}\hfill {\tilde{e}}_y^{(1)}\hfill \\ {}\hfill {\tilde{e}}_A^{(1)}\hfill \end{array}\right]&=&\left[\begin{array}{c}\hfill {Q}_y\hfill \\ {}\hfill -{Q}_A\left({\widehat{\xi}}^{(1)}\otimes {I}_n\right)\hfill \end{array}\right]{\left({Q}_1^{(o)}\right)}^{-1}\cdot\\[20pt]&&\cdot\left(y-\underset{\sim }{0}\cdot {\xi}_o-{A}^{(o)}\cdot {\widehat{\xi}}^{(1)}\right)\end{array}\\[-24pt] $$
(35c)

Looking at the next and all the following iteration steps, it becomes clear that this represents one specific iterative solver of Fang’s TLS normal equations (11).

For more details, see Fang (2011, ch. 4.4), Snow (2012, ch. 4), and the forthcoming OSU-Report by Schaffrin (2015), as well as Neitzel (2010) for a specific application.
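
As an illustration, one linearization step (34a)–(35c) might be coded as follows, where the “stochastic zero” is represented by the current approximation of \( {\tilde{E}}_A \) (zero in the first step); repeating this step reproduces the iteration described above. This is a sketch under these assumptions, not the original implementation.

```python
import numpy as np

def ghm_tls_step(y, A, Qy, QA, xi_o, EA_o):
    """One step of the iteratively linearized Gauss-Helmert Model, Eqs. (34a)-(35c) (a sketch).

    xi_o, EA_o are the current approximations of xi and E_A (EA_o = 0 at the
    first step); the returned values are fed back until convergence.
    """
    n, m = A.shape
    A_o = A - EA_o                                        # A^(o), Eq. (34b)
    B_o = np.kron(xi_o.reshape(-1, 1), np.eye(n))         # (xi_o ⊗ I_n)
    Q1_o = Qy + B_o.T @ QA @ B_o                          # Eq. (35b)
    N = A_o.T @ np.linalg.solve(Q1_o, A_o)
    c = A_o.T @ np.linalg.solve(Q1_o, y - EA_o @ xi_o)    # right-hand side of Eq. (35a)
    xi_1 = np.linalg.solve(N, c)
    # residual vectors, Eq. (35c)
    t = np.linalg.solve(Q1_o, y - EA_o @ xi_o - A_o @ xi_1)
    e_y = Qy @ t
    e_A = -QA @ np.kron(xi_1.reshape(-1, 1), np.eye(n)) @ t
    EA_1 = e_A.reshape((n, m), order="F")                 # E_A_tilde from its vec form
    return xi_1, EA_1, e_y
```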

3.2 The Nonlinear Gauss–Markov Model

In this case, the expectation of the data matrix A is introduced as a new \( n\times m \) “parameter matrix”

$$ {\varXi}_A:=A-{E}_A\kern0.75em \mathrm{with}\kern0.5em {\xi}_A:={\it vec}{\varXi}_A, $$
(36)

leading to the equivalent Gauss-Markov Model

$$\begin{array}{rcl} y={\left(\xi \otimes {I}_n\right)}^T\cdot {\xi}_A&+&{e}_y=:\underset{\bar{\mkern6mu}}{a}\left(\xi, {\xi}_A\right)+{e}_y,\\[10pt]&& {e}_y\sim \left(0,{\sigma}_o^2{Q}_y\right),\end{array}\\[-24pt] $$
(37a)

with the nonlinear vector-valued function

$$ \underset{\bar{\mkern6mu}}{a}:{R}^{\left(n+1\right)m}\to {R}^n $$
(37b)

due to the term \( {\varXi}_A\cdot \xi \). The linearization of model (37a, b) with respect to the approximations \( {\xi}_o \) and \( {\xi}_A^{(o)}:={\it vec}\left({A}^{(o)}\right)={\it vec}\left(A-\underset{\sim }{0}\right) \), where \( \underset{\sim }{0} \) now denotes a “stochastic zero matrix” of size \( n\times m \), then leads first to

$$ {\xi}_A-{\xi}_A^{(o)}=\underset{\sim }{0}-{e}_A,\kern1em {e}_A\sim \left(0,{\sigma}_o^2{Q}_A\right), $$
(38a)
$$ \begin{array}{l} y-{e}_y\approx \underset{\bar{\mkern6mu}}{a}\left({\xi}_o,{\xi}_A^{(o)}\right)+{\left.\frac{\partial \underset{\bar{\mkern6mu}}{a}\left(\xi, {\xi}_A\right)}{\partial \left[{\xi}^T,{\xi}_A^T\right]}\right|}_{\xi_o,{\xi}_A^{(o)}}\cdot \left[\begin{array}{c} \xi -{\xi}_o \\ {\xi}_A-{\xi}_A^{(o)} \end{array}\right]= \\ {}\kern2em ={A}^{(o)}\cdot {\xi}_o+{A}^{(o)}\cdot \left(\xi -{\xi}_o\right)+{\left({\xi}_o\otimes {I}_n\right)}^T\left({\xi}_A-{\xi}_A^{(o)}\right), \end{array} $$
(38b)

and finally to the linearized Gauss–Markov Model

$$ \left[\begin{array}{c} y-{A}^{(o)}\cdot {\xi}_o \\ \underset{\sim }{0} \end{array}\right]=\left[\begin{array}{cc} {A}^{(o)} & {\left({\xi}_o\otimes {I}_n\right)}^T \\ 0 & {I}_{nm} \end{array}\right]\left[\begin{array}{c} \xi -{\xi}_o \\ {\xi}_A-{\xi}_A^{(o)} \end{array}\right]+\left[\begin{array}{c} {e}_y \\ {e}_A \end{array}\right], $$
(39a)
$$ \left[\begin{array}{c}\hfill {e}_y\hfill \\ {}\hfill {e}_A\hfill \end{array}\right]\sim \left(\left[\begin{array}{c}\hfill 0\hfill \\ {}\hfill 0\hfill \end{array}\right],\kern0.5em {\sigma}_o^2\left[\begin{array}{cc}\hfill {Q}_y\hfill & \hfill 0\hfill \\ {}\hfill 0\hfill & \hfill {Q}_A\hfill \end{array}\right]={\sigma}_o^2Q\right). $$
(39b)

After a number of further manipulations, the weighted LESS for model (39a, b) can be shown to fulfill the “normal equations”

$$ \left[{\left({A}^{(o)}\right)}^T{\left({Q}_1^{(o)}\right)}^{-1}{A}^{(o)}\right]\cdot {\widehat{\xi}}^{(1)}={\left({A}^{(o)}\right)}^T{\left({Q}_1^{(o)}\right)}^{-1}y $$
(40a)

with

$$ \begin{array}{l}{Q}_1^{(o)}:=\Big[{Q}_y^{-1}-{Q}_y^{-1}{\left({\xi}_o\otimes {I}_n\right)}^T\cdot \\ {}\kern2.75em \cdot {\left[{Q}_A^{-1}+\left({\xi}_o{\xi}_o^T\otimes {Q}_y^{-1}\right)\right]}^{-1}\left({\xi}_o\otimes {I}_n\right){Q}_y^{-1}\Big]{}^{-1}\end{array} $$
(40b)
$$ ={Q}_y+{\left({\xi}_o\otimes {I}_n\right)}^T{Q}_A\left({\xi}_o\otimes {I}_n\right) $$
(40c)

which nicely corresponds to (35a, b). More details can be found in the forthcoming OSU-Report by Schaffrin (2015).
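
The equivalence of (40b) and (40c) rests on the matrix-inversion (Sherman–Morrison–Woodbury) lemma and can be checked numerically in a few lines of NumPy; the toy dimensions and random diagonal cofactor matrices below are arbitrary choices for this check.

```python
import numpy as np

rng = np.random.default_rng(0)                            # toy setup to check (40b) = (40c)
n, m = 6, 2
xi_o = rng.standard_normal(m)
Qy = np.diag(rng.uniform(0.5, 2.0, n))                    # pos.-def. cofactor matrices
QA = np.diag(rng.uniform(0.5, 2.0, n * m))
B = np.kron(xi_o.reshape(-1, 1), np.eye(n))               # (xi_o ⊗ I_n)

Q1_direct = Qy + B.T @ QA @ B                             # Eq. (40c)
inner = np.linalg.inv(QA) + np.kron(np.outer(xi_o, xi_o), np.linalg.inv(Qy))
Q1_lemma = np.linalg.inv(                                 # Eq. (40b)
    np.linalg.inv(Qy)
    - np.linalg.inv(Qy) @ B.T @ np.linalg.inv(inner) @ B @ np.linalg.inv(Qy)
)
print(np.allclose(Q1_direct, Q1_lemma))                   # True
```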

3.3 The Model of Direct Observations with Nonlinear Constraints

Now, the expectation of the observation vector y is introduced as just another parameter vector \( {\xi}_y \) of size \( n\times 1 \) so that the new model combines the direct observation equations

$$ \begin{array}{l}\left[\begin{array}{c}\hfill y\hfill \\ {}\hfill {\it vec}A\hfill \end{array}\right]=\left[\begin{array}{c}\hfill {\xi}_y\hfill \\ {}\hfill {\xi}_A\hfill \end{array}\right]+\left[\begin{array}{c}\hfill {e}_y\hfill \\ {}\hfill {e}_A\hfill \end{array}\right],\kern1em \\ {}\kern3em \left[\begin{array}{c}\hfill {e}_y\hfill \\ {}\hfill {e}_A\hfill \end{array}\right]\sim \left(\left[\begin{array}{c}\hfill 0\hfill \\ {}\hfill 0\hfill \end{array}\right],\kern0.5em {\sigma}_o^2\left[\begin{array}{cc}\hfill {Q}_y\hfill & \hfill 0\hfill \\ {}\hfill 0\hfill & \hfill {Q}_A\hfill \end{array}\right]={\sigma}_o^2Q\right),\vspace*{6pt}\end{array} $$
(41a)

with the nonlinear constraints

$$ {\xi}_y-{\varXi}_A\cdot \xi =0 \vspace*{6pt}$$
(41b)

which might be linearized into

$$ \left[\begin{array}{ccc}\hfill \left.{A}^{(o)}\right|\hfill & \hfill \left.-{I}_n\right|\hfill & \hfill {\left({\xi}_o\otimes {I}_n\right)}^T\hfill \end{array}\right]\left[\begin{array}{c}\hfill \xi -{\xi}_o\hfill \\ {}\hfill {\xi}_y-{\xi}_y^{(o)}\hfill \\ {}\hfill {\xi}_A-{\xi}_A^{(o)}\hfill \end{array}\right]=0. \vspace*{6pt}$$
(42)

In the already mentioned OSU-Report by Schaffrin (2015), it will be shown how the resulting iterative LESS’s do converge to the Total Least-Squares Solution.

For another take on this model, refer to Donevska et al. (2011) who stress the equivalence to orthogonal regression as applied by Deming (1931, 1934).

4 Conclusions

It has been clarified that the TLS approach to the EIV-Model requires a genuinely nonlinear treatment of the model. A number of different algorithms have been presented to generate the Total Least-Squares Solution from a certain set of nonlinear normal equations. A triplet of conventional nonlinear models has also been considered, suggesting that the LEast-Squares Solutions from iterative linearization do converge to the nonlinear TLS-Solution in all three cases. Most of the details, however, will be published in a forthcoming OSU-Report, due to the space restrictions for these Proceedings.