1 Introduction

When outliers are present in a data set, a least-squares (LS) adjustment may not be possible or may produce poor or invalid results (Wolf and Ghilani 1997). Many approaches have been developed to mitigate or even eliminate the deteriorating effect of outlying observations on the parameter estimates (Cook 1977; Koch 1999; Monhor and Verö 2011), although there is no universally accepted definition of an outlier (Barnett and Lewis 1994; Monhor and Takemoto 2005; Monhor and Verö 2011).

There are two essential approaches to counter the corrupting effects of outliers: the conventional outlier detection test procedures developed in the geodetic literature (Baarda 1968; Pope 1976) and robust methods (Huber 1981; Hampel et al. 1986; Rousseeuw and Leroy 1987; Koch 1999; Yang 1999; Hekimoglu and Koch 2000; Xu 2005; Hekimoglu 2005). However, the conventional test procedures are only applicable under the assumption that no more than one outlier is present. In the case of multiple outliers, the most practical strategy is to employ the iterative data snooping presented by Kok (1984), whilst procedures for detecting all outliers at once have also been proposed (Hadi and Simonoff 1993; Snow and Schaffrin 2003; Baselga 2011).

To evaluate the influence of one or more observations on the adjustment outputs, deletion diagnostics have been extensively adopted (Cook 1977, 1979; Chatterjee and Hadi 1988). There are two ways to implement the diagnostics: the underlying observation(s) can be deleted either explicitly or implicitly. The explicit approach is the case-deletion model, whereas the implicit one is referred to as the mean-shift outlier model (Hekimoglu et al. 2012). The aim of this contribution is twofold: first, to prove the equivalence of these two methods; second, to address the influence of outlying observations on the quality measures.

The paper is organized as follows: the equivalence of the two multiple outlier detection models is investigated, followed by computational considerations in implementing the mean-shift outlier model. Furthermore, theoretical analyses show that the precision, the Minimal Detectable Bias (MDB) measure and the Dilution of Precision (DOP) metric are all over-optimistic when outlying observations should have been taken into account but were neglected.

2 Model description

Let us consider a linear Gauss-Markov model defined by Koch (1999)

$$ E(\boldsymbol{L}) = \boldsymbol{AX}\quad \mbox{with}\ \operatorname {Cov}(\boldsymbol{L}) = \sigma _{0}^{2}\boldsymbol{P}^{ -1}, $$
(1)

where L is the n×1 vector of observations, A the n×u design matrix with full column rank, and X the u×1 vector of unknowns. \(\sigma _{0}^{2}\) is the a priori variance factor of unit weight, and P the symmetric positive-definite weight matrix. Whenever necessary, the observations are assumed to be normally distributed.

Then, the (weighted) LS estimate of the unknowns in model Eq. (1) reads (Koch 1999)

$$ \hat{\boldsymbol{X}} = \bigl(\boldsymbol{A}^{T}\boldsymbol{PA} \bigr)^{ - 1}\boldsymbol{A}^{T}\boldsymbol{PL} $$
(2)

The corresponding residual vector is readily obtained as

$$ \boldsymbol{V} = \boldsymbol{L} - \boldsymbol{A}\hat{\boldsymbol{X}} = \boldsymbol{RL} $$
(3)

where \(\boldsymbol{R} = \boldsymbol{I}_{n} - \boldsymbol{A}(\boldsymbol{A}^{T}\boldsymbol{PA})^{-1}\boldsymbol{A}^{T}\boldsymbol{P}\) maps the original observational vector onto the residual vector as a result of the LS adjustment (Schaffrin 1997; Guo et al. 2011). The matrix R plays an important role in linear adjustment techniques since it contains extremely useful information (Huber 1981; Guo et al. 2007, 2010). One can easily verify that R is idempotent and has the following useful properties

$$ \boldsymbol{R}^{T}\boldsymbol{P} = \boldsymbol{PR} = \boldsymbol{R}^{T}\boldsymbol{PR},\qquad \boldsymbol{RA} = \boldsymbol{O},\qquad \boldsymbol{A}^{T}\boldsymbol{PR} = \boldsymbol{O} $$
(4)

The weighted sum of squares of the LS residuals reads

$$ \varOmega = \boldsymbol{V}^{T}\boldsymbol{PV} = \boldsymbol{L}^{T} \boldsymbol{PRL} $$
(5)
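
As a quick numerical illustration of Eqs. (2)-(5), the following minimal NumPy sketch builds a small synthetic adjustment problem (the dimensions, the random design matrix A, the diagonal weight matrix P and the observation vector L are all illustrative assumptions, not data from this paper) and checks the properties of R stated in Eq. (4):

```python
import numpy as np

rng = np.random.default_rng(42)
n, u = 8, 3                            # number of observations and unknowns
A = rng.normal(size=(n, u))            # design matrix with full column rank
P = np.diag(rng.uniform(0.5, 2.0, n))  # positive-definite weight matrix
L = rng.normal(size=n)                 # observation vector

N_inv = np.linalg.inv(A.T @ P @ A)
X_hat = N_inv @ A.T @ P @ L            # Eq. (2): weighted LS estimate
R = np.eye(n) - A @ N_inv @ A.T @ P
V = R @ L                              # Eq. (3): residual vector

# Eq. (4): R is idempotent, RA = O, and P R is symmetric
assert np.allclose(R @ R, R)
assert np.allclose(R @ A, 0)
assert np.allclose(R.T @ P, P @ R)

Omega = V @ P @ V                      # Eq. (5): weighted SSR
assert np.allclose(Omega, L @ P @ R @ L)
```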

3 Multiple outlier detection models

As is well known, the LS method is very susceptible to outliers (Wolf and Ghilani 1997; Koch 1999; Guo et al. 2010). There are two procedures to implement the deletion diagnostics, namely, the case-deletion model and the mean-shift outlier model.

Let us assume the \(i_{1}\)th, \(i_{2}\)th, …, \(i_{m}\)th observations are to be deleted, while the \(i_{m+1}\)th, \(i_{m+2}\)th, …, \(i_{n}\)th observations are the remaining ones.

3.1 Mean-shift outlier model

For convenience we introduce the following notations,

$$ \boldsymbol{H}_{b} = (\boldsymbol{h}_{i_{1}}, \boldsymbol{h}_{i_{2}}, \ldots,\boldsymbol{h}_{i_{m}}),\qquad \boldsymbol{H}_{r} = (\boldsymbol{h}_{i_{m + 1}}, \boldsymbol{h}_{i_{m + 2}}, \ldots,\boldsymbol{h}_{i_{n}}) $$
(6)

where \(\boldsymbol{h}_{i}\) denotes the \(i\)th n-dimensional canonical unit vector, having a 1 as its ith entry and zeros otherwise. It can be seen that \((\boldsymbol{H}_{b},\boldsymbol{H}_{r})\) is a permutation matrix (Strang and Borre 1997). Since a permutation matrix is orthogonal, one can obtain

$$ (\boldsymbol{H}_{b},\boldsymbol{H}_{r}) ( \boldsymbol{H}_{b},\boldsymbol{H}_{r})^{T} = \boldsymbol{H}_{b}\boldsymbol{H}_{b}^{T} + \boldsymbol{H}_{r}\boldsymbol{H}_{r}^{T} = \boldsymbol{I}_{n} $$
(7)

and

$$(\boldsymbol{H}_{b},\boldsymbol{H}_{r})^{T}( \boldsymbol{H}_{b},\boldsymbol{H}_{r}) = \left ( \begin{array}{c@{\quad}c} \boldsymbol{H}_{b}^{T}\boldsymbol{H}_{b} & \boldsymbol{H}_{b}^{T}\boldsymbol{H}_{r} \\[3pt] \boldsymbol{H}_{r}^{T}\boldsymbol{H}_{b} & \boldsymbol{H}_{r}^{T}\boldsymbol{H}_{r} \end{array} \right ) = \boldsymbol{I}_{n} $$

it follows immediately that

$$ \boldsymbol{H}_{b}^{T}\boldsymbol{H}_{b} = \boldsymbol{I}_{m},\qquad \boldsymbol{H}_{b}^{T} \boldsymbol{H}_{r} = \boldsymbol{O},\qquad \boldsymbol{H}_{r}^{T} \boldsymbol{H}_{r} = \boldsymbol{I}_{n - m} $$
(8)

Accordingly, the corresponding mean-shift outlier model reads

$$ E(\boldsymbol{L}) = \boldsymbol{AX} + \boldsymbol{H}_{b}\boldsymbol{\nabla}\quad \mbox{with }\operatorname {Cov}(\boldsymbol{L}) = \sigma _{0}^{2} \boldsymbol{P}^{ - 1}, $$
(9)

in which \((\boldsymbol{A},\boldsymbol{H}_{b})\) is of full column rank.

Based on the LS principle, one can obtain the following normal equation:

$$ \left ( \begin{array}{c@{\quad}c} \boldsymbol{A}^{T}\boldsymbol{PA} & \boldsymbol{A}^{T}\boldsymbol{PH}_{b} \\ \boldsymbol{H}_{b}^{T}\boldsymbol{PA} & \boldsymbol{H}_{b}^{T}\boldsymbol{PH}_{b} \end{array} \right )\left ( \begin{array}{c} \hat{\boldsymbol{X}}_{\boldsymbol{\nabla}} \\ \hat{\boldsymbol{\nabla}} \end{array} \right ) = \left ( \begin{array}{c} \boldsymbol{A}^{T}\boldsymbol{PL} \\ \boldsymbol{H}_{b}^{T}\boldsymbol{PL} \end{array} \right ) $$
(10)

Solving Eq. (10) and introducing

$$ \boldsymbol{R}_{\boldsymbol{H}_{b}} = \boldsymbol{I}_{n} - \boldsymbol{H}_{b}\bigl(\boldsymbol{H}_{b}^{T} \boldsymbol{PH}_{b}\bigr)^{ - 1}\boldsymbol{H}_{b}^{T} \boldsymbol{P} $$
(11)

we have

$$ \left \{ \begin{array}{l} \hat{\boldsymbol{X}}_{\boldsymbol{\nabla}} = \bigl(\boldsymbol{A}^{T} \cdot \boldsymbol{PR}_{\boldsymbol{H}_{b}} \cdot \boldsymbol{A}\bigr)^{ - 1}\boldsymbol{A}^{T} \cdot \boldsymbol{PR}_{\boldsymbol{H}_{b}} \cdot \boldsymbol{L} \\[6pt] \hat{\boldsymbol{\nabla}} = \bigl(\boldsymbol{H}_{b}^{T}\boldsymbol{PH}_{b}\bigr)^{ - 1}\boldsymbol{H}_{b}^{T}\boldsymbol{P}(\boldsymbol{L} - \boldsymbol{A}\hat{\boldsymbol{X}}_{\boldsymbol{\nabla}} ) \end{array} \right . $$
(12)

It can be verified that \(\boldsymbol{R}_{\boldsymbol{H}_{b}}\) is idempotent and has the following useful properties

$$ \boldsymbol{R}_{\boldsymbol{H}_{b}}^{T}\boldsymbol{PR}_{\boldsymbol{H}_{b}} = \boldsymbol{PR}_{\boldsymbol{H}_{b}} = \boldsymbol{R}_{\boldsymbol{H}_{b}}^{T} \boldsymbol{P},\qquad \boldsymbol{R}_{\boldsymbol{H}_{b}}\boldsymbol{H}_{b} = \boldsymbol{O},\qquad \boldsymbol{H}_{b}^{T} \boldsymbol{PR}_{\boldsymbol{H}_{b}} = \boldsymbol{O} $$
(13)

The corresponding residual vector is

$$ \boldsymbol{V}_{\boldsymbol{\nabla}} = \boldsymbol{L} - \boldsymbol{A}\hat{\boldsymbol{X}}_{\boldsymbol{\nabla}} - \boldsymbol{H}_{b}\hat{\boldsymbol{\nabla}} = \boldsymbol{R}_{\boldsymbol{H}_{b}}(\boldsymbol{L} - \boldsymbol{A}\hat{\boldsymbol{X}}_{\boldsymbol{\nabla}} ) $$
(14)

and thus

$$ \hat{\sigma} _{\boldsymbol{\nabla}} ^{2} = \frac{\varOmega _{\boldsymbol{\nabla}}}{n - (m + u)} $$
(15)

with

$$ \varOmega _{\boldsymbol{\nabla}} = \boldsymbol{V}_{\boldsymbol{\nabla}} ^{T} \boldsymbol{PV}_{\boldsymbol{\nabla}} = (\boldsymbol{L} - \boldsymbol{A}\hat{\boldsymbol{X}}_{\boldsymbol{\nabla}} )^{T}\boldsymbol{PR}_{\boldsymbol{H}_{b}}( \boldsymbol{L} - \boldsymbol{A}\hat{\boldsymbol{X}}_{\boldsymbol{\nabla}} ) $$
(16)
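
The following sketch illustrates Eqs. (11)-(16) on the same kind of synthetic problem (the index set idx_b of suspected outliers and all matrices are hypothetical choices for the sketch); it forms \(\boldsymbol{R}_{\boldsymbol{H}_{b}}\), solves Eq. (12) and verifies the properties in Eq. (13):

```python
import numpy as np

rng = np.random.default_rng(42)
n, u, m = 8, 3, 2                 # observations, unknowns, suspected outliers
A = rng.normal(size=(n, u))
P = np.diag(rng.uniform(0.5, 2.0, n))
L = rng.normal(size=n)

idx_b = [1, 4]                    # indices i_1, ..., i_m of the suspected outliers
H_b = np.eye(n)[:, idx_b]         # Eq. (6): selected columns of the identity matrix

# Eq. (11)
R_Hb = np.eye(n) - H_b @ np.linalg.inv(H_b.T @ P @ H_b) @ H_b.T @ P

# Eq. (12): estimates of the unknowns and of the shift parameters
X_shift = np.linalg.solve(A.T @ P @ R_Hb @ A, A.T @ P @ R_Hb @ L)
grad = np.linalg.solve(H_b.T @ P @ H_b, H_b.T @ P @ (L - A @ X_shift))

# Eq. (13): R_Hb is idempotent and annihilates H_b
assert np.allclose(R_Hb @ R_Hb, R_Hb)
assert np.allclose(R_Hb @ H_b, 0)

V_shift = R_Hb @ (L - A @ X_shift)            # Eq. (14)
Omega_shift = V_shift @ P @ V_shift           # Eq. (16)
sigma2_shift = Omega_shift / (n - (m + u))    # Eq. (15)
```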

3.2 Multiple case-deletion model

Under the same condition, the multiple case-deletion model reads

$$ E\bigl(\boldsymbol{H}_{r}^{T}\boldsymbol{L}\bigr) = \boldsymbol{H}_{r}^{T}\boldsymbol{AX}\quad \mbox{with}\ \operatorname {Cov}\bigl(\boldsymbol{H}_{r}^{T}\boldsymbol{L}\bigr) = \sigma _{0}^{2}\boldsymbol{H}_{r}^{T} \boldsymbol{P}^{ -1}\boldsymbol{H}_{r}, $$
(17)

with which one can obtain the LS estimator as follows

$$ \hat{\boldsymbol{X}}_{r} = \bigl(\boldsymbol{A}^{T} \boldsymbol{H}_{r} \cdot \boldsymbol{P}_{r} \cdot \boldsymbol{H}_{r}^{T}\boldsymbol{A}\bigr)^{ - 1} \boldsymbol{A}^{T}\boldsymbol{H}_{r} \cdot \boldsymbol{P}_{r} \cdot \boldsymbol{H}_{r}^{T} \boldsymbol{L} $$
(18)

where

$$ \boldsymbol{P}_{r} = \bigl(\boldsymbol{H}_{r}^{T} \boldsymbol{P}^{ - 1}\boldsymbol{H}_{r}\bigr)^{ - 1} $$
(19)

The permutation matrix \((\boldsymbol{H}_{b},\boldsymbol{H}_{r})\) is invertible. Therefore, one can obtain

$$ \boldsymbol{P}^{ - 1} = (\boldsymbol{H}_{b}, \boldsymbol{H}_{r})\bigl[(\boldsymbol{H}_{b}, \boldsymbol{H}_{r})^{T}\boldsymbol{P}(\boldsymbol{H}_{b}, \boldsymbol{H}_{r})\bigr]^{ - 1}(\boldsymbol{H}_{b}, \boldsymbol{H}_{r})^{T} $$
(20)

which in combination with Eq. (8) yields

$$ \boldsymbol{P}_{r} = \boldsymbol{H}_{r}^{T}\boldsymbol{PH}_{r} - \boldsymbol{H}_{r}^{T}\boldsymbol{PH}_{b}\bigl(\boldsymbol{H}_{b}^{T}\boldsymbol{PH}_{b}\bigr)^{ - 1}\boldsymbol{H}_{b}^{T}\boldsymbol{PH}_{r} = \boldsymbol{H}_{r}^{T}\boldsymbol{PR}_{\boldsymbol{H}_{b}}\boldsymbol{H}_{r} $$
(21)

By virtue of Eqs. (7), (13), (19) and (21), we have

$$ \boldsymbol{H}_{r}\boldsymbol{P}_{r}\boldsymbol{H}_{r}^{T} = \boldsymbol{PR}_{\boldsymbol{H}_{b}} $$
(22)

It follows that

$$ \hat{\boldsymbol{X}}_{r} = \hat{\boldsymbol{X}}_{\boldsymbol{\nabla}} $$
(23)

The weighted sum of squares of the LS residuals in this multiple case-deletion model reads

$$ \varOmega _{r} = \bigl(\boldsymbol{H}_{r}^{T}\boldsymbol{L} - \boldsymbol{H}_{r}^{T}\boldsymbol{A}\hat{\boldsymbol{X}}_{r}\bigr)^{T}\boldsymbol{P}_{r}\bigl(\boldsymbol{H}_{r}^{T}\boldsymbol{L} - \boldsymbol{H}_{r}^{T}\boldsymbol{A}\hat{\boldsymbol{X}}_{r}\bigr) = (\boldsymbol{L} - \boldsymbol{A}\hat{\boldsymbol{X}}_{\boldsymbol{\nabla}} )^{T}\boldsymbol{PR}_{\boldsymbol{H}_{b}}(\boldsymbol{L} - \boldsymbol{A}\hat{\boldsymbol{X}}_{\boldsymbol{\nabla}} ) = \varOmega _{\boldsymbol{\nabla}} $$

and thus

$$ \hat{\sigma} _{r}^{2} = \frac{\varOmega _{r}}{(n - m) - u} = \hat{\sigma} _{\boldsymbol{\nabla}} ^{2} $$
(24)

It can be seen from Eqs. (23) and (24) that the mean-shift outlier model is equivalent to the multiple case-deletion model. In other words, the adjustment outputs are identical no matter whether the (potential) outliers are deleted explicitly or implicitly, even when the removed observations are correlated with the remaining ones.
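
The equivalence is easy to verify numerically. The sketch below (again with hypothetical A, L and a deliberately non-diagonal weight matrix P, so that the deleted observations are correlated with the remaining ones) checks Eqs. (22)-(24) under these assumed inputs:

```python
import numpy as np

rng = np.random.default_rng(7)
n, u, m = 10, 3, 2
A = rng.normal(size=(n, u))
B = rng.normal(size=(n, n))
P = B @ B.T + n * np.eye(n)       # non-diagonal SPD weights: correlated case
L = rng.normal(size=n)

idx_b = [0, 5]                    # indices of the deleted observations
I_n = np.eye(n)
H_b, H_r = I_n[:, idx_b], np.delete(I_n, idx_b, axis=1)

# Mean-shift estimate, Eqs. (11)-(12)
R_Hb = I_n - H_b @ np.linalg.inv(H_b.T @ P @ H_b) @ H_b.T @ P
X_shift = np.linalg.solve(A.T @ P @ R_Hb @ A, A.T @ P @ R_Hb @ L)

# Case-deletion estimate, Eqs. (17)-(19)
P_r = np.linalg.inv(H_r.T @ np.linalg.inv(P) @ H_r)
A_r, L_r = H_r.T @ A, H_r.T @ L
X_del = np.linalg.solve(A_r.T @ P_r @ A_r, A_r.T @ P_r @ L_r)

assert np.allclose(H_r @ P_r @ H_r.T, P @ R_Hb)   # Eq. (22)
assert np.allclose(X_shift, X_del)                # Eq. (23)

# Eq. (24): identical weighted SSR, hence identical variance factor estimates
V_shift = R_Hb @ (L - A @ X_shift)
V_del = L_r - A_r @ X_del
assert np.allclose(V_shift @ P @ V_shift, V_del @ P_r @ V_del)
```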

3.3 Computational consideration

With Eq. (12), one has to deal with two matrix inversions of orders u and m, as opposed to the two matrix inversions of orders u and n − m in Eq. (18). Therefore, Eq. (12) outperforms Eq. (18) in terms of computational efficiency, since in most applications the number of outliers m is small relative to the number of original observations n.

However, the computational burden can be further reduced by taking the partitioned structure of the normal matrix in Eq. (10) into account. In fact, the normal equation (10) can also be solved as

$$ \left \{ \begin{array}{l} \hat{\boldsymbol{\nabla}} = \bigl(\boldsymbol{H}_{b}^{T}\boldsymbol{PRH}_{b}\bigr)^{ - 1}\boldsymbol{H}_{b}^{T}\boldsymbol{PRL} \\[6pt] \hat{\boldsymbol{X}}_{\boldsymbol{\nabla}} = \bigl(\boldsymbol{A}^{T}\boldsymbol{PA}\bigr)^{ - 1}\boldsymbol{A}^{T}\boldsymbol{P}(\boldsymbol{L} - \boldsymbol{H}_{b}\hat{\boldsymbol{\nabla}} ) \end{array} \right . $$
(25)

or in more explicit form

$$ \left \{ \begin{array}{l} \hat{\boldsymbol{\nabla}} = \bigl(\boldsymbol{H}_{b}^{T}\boldsymbol{PRH}_{b}\bigr)^{ - 1}\boldsymbol{H}_{b}^{T}\boldsymbol{PV} \\[6pt] \hat{\boldsymbol{X}}_{\boldsymbol{\nabla}} = \hat{\boldsymbol{X}} - \bigl(\boldsymbol{A}^{T}\boldsymbol{PA}\bigr)^{ - 1}\boldsymbol{A}^{T}\boldsymbol{PH}_{b}\hat{\boldsymbol{\nabla}} \end{array} \right . $$
(26)

with which we obtain

$$ \boldsymbol{V}_{\boldsymbol{\nabla}} = \boldsymbol{R}(\boldsymbol{L} - \boldsymbol{H}_{b}\hat{\boldsymbol{\nabla}} ) $$
(27)

and

$$ \varOmega _{\boldsymbol{\nabla}} = \boldsymbol{L}^{T}\boldsymbol{PRL} - \boldsymbol{L}^{T}\boldsymbol{PRH}_{b}\bigl(\boldsymbol{H}_{b}^{T}\boldsymbol{PRH}_{b}\bigr)^{ - 1}\boldsymbol{H}_{b}^{T}\boldsymbol{PRL} = \varOmega - \boldsymbol{V}^{T}\boldsymbol{PH}_{b}\bigl(\boldsymbol{H}_{b}^{T}\boldsymbol{PRH}_{b}\bigr)^{ - 1}\boldsymbol{H}_{b}^{T}\boldsymbol{PV} $$
(28)

Apparently, in this situation only the extra inversion of the m×m normal matrix \(\boldsymbol{H}_{b}^{T}\boldsymbol{PRH}_{b}\) is required. As a by-product, the estimate of the vector of the disturbance parameters is also obtained with Eq. (26). From the computational point of view, this is a sufficient reason for choosing the mean-shift outlier model over the case-deletion model.
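
A sketch of this computational shortcut follows, reusing the full-model quantities \((\boldsymbol{A}^{T}\boldsymbol{PA})^{-1}\), R and V that are already available from the original adjustment (all inputs again illustrative); it also cross-checks the result against the direct solution of Eq. (12):

```python
import numpy as np

rng = np.random.default_rng(3)
n, u, m = 12, 4, 2
A = rng.normal(size=(n, u))
P = np.diag(rng.uniform(0.5, 2.0, n))
L = rng.normal(size=n)

# Quantities already available from the original adjustment, Eqs. (2)-(3)
N_inv = np.linalg.inv(A.T @ P @ A)
X_hat = N_inv @ A.T @ P @ L
R = np.eye(n) - A @ N_inv @ A.T @ P
V = R @ L

idx_b = [2, 9]                    # suspected outliers (illustrative)
H_b = np.eye(n)[:, idx_b]

# Eq. (26): only an extra m x m inverse is required
M = H_b.T @ P @ R @ H_b
grad = np.linalg.solve(M, H_b.T @ P @ V)
X_shift = X_hat - N_inv @ A.T @ P @ H_b @ grad

V_shift = R @ (L - H_b @ grad)    # Eq. (27)

# Eq. (28): updated weighted SSR without re-forming the reduced model
l_b = H_b.T @ P @ R @ L
Omega_shift = L @ P @ R @ L - l_b @ np.linalg.solve(M, l_b)
assert np.allclose(Omega_shift, V_shift @ P @ V_shift)

# Cross-check against the direct solution of Eq. (12)
R_Hb = np.eye(n) - H_b @ np.linalg.inv(H_b.T @ P @ H_b) @ H_b.T @ P
assert np.allclose(X_shift, np.linalg.solve(A.T @ P @ R_Hb @ A, A.T @ P @ R_Hb @ L))
```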

4 Quality assessment of outlying observations

With the Sherman-Morrison-Woodbury-Schur formula (Strang and Borre 1997), we have

$$ \bigl(\boldsymbol{A}^{T}\boldsymbol{PR}_{\boldsymbol{H}_{b}}\boldsymbol{A}\bigr)^{ - 1} = \bigl(\boldsymbol{A}^{T}\boldsymbol{PA}\bigr)^{ - 1} + \bigl(\boldsymbol{A}^{T}\boldsymbol{PA}\bigr)^{ - 1}\boldsymbol{A}^{T}\boldsymbol{PH}_{b}\bigl(\boldsymbol{H}_{b}^{T}\boldsymbol{PRH}_{b}\bigr)^{ - 1}\boldsymbol{H}_{b}^{T}\boldsymbol{PA}\bigl(\boldsymbol{A}^{T}\boldsymbol{PA}\bigr)^{ - 1} $$
(29)

This formula quantifies the apparent increase in precision when the outlying observations should have been taken into account but were neglected, under the assumption that the a priori variance factor is known (Schaffrin 1997).

Since the second term of Eq. (29) is a positive semi-definite quadratic form, it follows that

$$ \bigl[\bigl(\boldsymbol{A}^{T}\boldsymbol{PR}_{\boldsymbol{H}_{b}} \boldsymbol{A}\bigr)^{ - 1}\bigr]_{ii} \ge \bigl[\bigl( \boldsymbol{A}^{T}\boldsymbol{PA}\bigr)^{ - 1} \bigr]_{ii},\quad i = 1,2, \ldots,u $$
(30)

This inequality shows that all types of DOP metrics (Strang and Borre 1997) are over-optimistic if the outliers are ignored, even when the outlying observations are correlated with the remaining ones.
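
Both Eq. (29) and the inequality Eq. (30) can be confirmed numerically; the following sketch does so with hypothetical inputs, including a non-diagonal weight matrix for the correlated case:

```python
import numpy as np

rng = np.random.default_rng(11)
n, u, m = 10, 3, 2
A = rng.normal(size=(n, u))
B = rng.normal(size=(n, n))
P = B @ B.T + n * np.eye(n)       # non-diagonal SPD weights: correlated case

idx_b = [4, 7]
H_b = np.eye(n)[:, idx_b]

N_inv = np.linalg.inv(A.T @ P @ A)
R = np.eye(n) - A @ N_inv @ A.T @ P
R_Hb = np.eye(n) - H_b @ np.linalg.inv(H_b.T @ P @ H_b) @ H_b.T @ P

# Eq. (29): Sherman-Morrison-Woodbury-Schur expansion
lhs = np.linalg.inv(A.T @ P @ R_Hb @ A)
Q = N_inv @ A.T @ P @ H_b
rhs = N_inv + Q @ np.linalg.inv(H_b.T @ P @ R @ H_b) @ Q.T
assert np.allclose(lhs, rhs)

# Eq. (30): every diagonal entry of the cofactor matrix can only grow,
# so DOP values computed while ignoring the outliers are over-optimistic
assert np.all(np.diag(lhs) >= np.diag(N_inv) - 1e-12)
```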

After some matrix manipulation, it follows that

$$ \varOmega _{r} = \boldsymbol{L}^{T}\boldsymbol{H}_{r} \cdot \boldsymbol{P}_{r}\boldsymbol{R}_{r} \cdot \boldsymbol{H}_{r}^{T}\boldsymbol{L} $$
(31)

where \(\boldsymbol{R}_{r} = \boldsymbol{I}_{n - m} - \boldsymbol{H}_{r}^{T}\boldsymbol{A} \cdot (\boldsymbol{A}^{T}\boldsymbol{H}_{r}\boldsymbol{P}_{r}\boldsymbol{H}_{r}^{T}\boldsymbol{A})^{ - 1} \cdot \boldsymbol{A}^{T}\boldsymbol{H}_{r} \cdot \boldsymbol{P}_{r}\) denotes the redundancy matrix of the reduced model.

By virtue of Eqs. (28) and (31), and since the two quadratic forms \(\varOmega _{\boldsymbol{\nabla}}\) and \(\varOmega _{r}\) are equal for any realization of the random observational vector L, we have

$$ \boldsymbol{H}_{r} \cdot \boldsymbol{P}_{r} \boldsymbol{R}_{r} \cdot \boldsymbol{H}_{r}^{T} = \boldsymbol{PR} - \boldsymbol{PRH}_{b}\bigl(\boldsymbol{H}_{b}^{T} \boldsymbol{PRH}_{b}\bigr)^{ - 1}\boldsymbol{H}_{b}^{T} \boldsymbol{PR} $$
(32)

Obviously, the kth observation in the multiple case-deletion model is just the \(i_{m+k}\)th one in the original linear Gauss–Markov model. Consequently, we get

$$ \boldsymbol{H}_{r}^{T}\boldsymbol{h}_{i_{m + k}} = \tilde{\boldsymbol{h}}_{k} $$
(33)

where \(\tilde{\boldsymbol{h}}_{k}\) denotes the kth (n − m)-dimensional canonical unit vector, with a 1 as its kth entry and zeros otherwise.

Baarda's w-test statistic for the kth observation in the multiple case-deletion model reads (Baarda 1968)

$$ \tilde{w}_{k} = \frac{\tilde{\boldsymbol{h}}_{k}^{T}\boldsymbol{P}_{r} \boldsymbol{R}_{r}\boldsymbol{H}_{r}^{T}\boldsymbol{L}}{\sigma _{0}\sqrt{\tilde{\boldsymbol{h}}_{k}^{T}\boldsymbol{P}_{r}\boldsymbol{R}_{r}\tilde{\boldsymbol{h}}_{k}}} \sim \mathcal{N}(0, 1) $$
(34)

The corresponding MDB measure is given by

$$ \sigma _{0}\sqrt{\frac{\lambda _{0}}{\tilde{\boldsymbol{h}}_{k}^{T}\boldsymbol{P}_{r}\boldsymbol{R}_{r}\tilde{\boldsymbol{h}}_{k}}} = \sigma _{0} \sqrt{\frac{\lambda _{0}}{\boldsymbol{h}_{i_{m + k}}^{T}\boldsymbol{H}_{r}\boldsymbol{P}_{r}\boldsymbol{R}_{r}\boldsymbol{H}_{r}^{T}\boldsymbol{h}_{i_{m + k}}}} $$
(35)

which in combination with Eq. (32) yields

$$ \sigma _{0}\sqrt{\frac{\lambda _{0}}{\tilde{\boldsymbol{h}}_{k}^{T}\boldsymbol{P}_{r}\boldsymbol{R}_{r}\tilde{\boldsymbol{h}}_{k}}} \ge \sigma _{0} \sqrt{\frac{\lambda _{0}}{\boldsymbol{h}_{i_{m + k}}^{T}\boldsymbol{PRh}_{i_{m + k}}}} $$
(36)

This indicates that the MDB measures of all the remaining observations become larger once the outlying observations are deleted; equivalently, the MDBs computed while neglecting the outliers are over-optimistic.
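
The w-test statistics of Eq. (34) and the MDB comparison of Eqs. (35)-(36) can likewise be evaluated numerically. In the sketch below, the non-centrality parameter lam0 = 17.07 is only an illustrative choice (a value commonly associated with α = 0.001 and 80 % power), and all other inputs are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
n, u, m = 10, 3, 2
A = rng.normal(size=(n, u))
P = np.diag(rng.uniform(0.5, 2.0, n))
L = rng.normal(size=n)

sigma0 = 1.0                      # a priori standard deviation of unit weight
lam0 = 17.07                      # illustrative non-centrality parameter lambda_0

idx_b = [0, 3]                    # indices of the deleted (outlying) observations
I_n = np.eye(n)
H_b, H_r = I_n[:, idx_b], np.delete(I_n, idx_b, axis=1)

# Full-model matrix P R; reduced-model P_r and R_r (Eqs. 19 and 31)
N_inv = np.linalg.inv(A.T @ P @ A)
PR = P @ (I_n - A @ N_inv @ A.T @ P)
P_r = np.linalg.inv(H_r.T @ np.linalg.inv(P) @ H_r)
A_r = H_r.T @ A
PrRr = P_r @ (np.eye(n - m) - A_r @ np.linalg.inv(A_r.T @ P_r @ A_r) @ A_r.T @ P_r)

# Eq. (34): w-test statistics of the remaining observations
w = (PrRr @ H_r.T @ L) / (sigma0 * np.sqrt(np.diag(PrRr)))

# Eqs. (35)-(36): each reduced-model MDB is at least the full-model one
remaining = [j for j in range(n) if j not in idx_b]
for k, i in enumerate(remaining):
    mdb_reduced = sigma0 * np.sqrt(lam0 / PrRr[k, k])
    mdb_full = sigma0 * np.sqrt(lam0 / PR[i, i])
    assert mdb_reduced >= mdb_full - 1e-12
```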

5 Conclusions

Both the case-deletion model and the mean-shift outlier model can be employed to perform multiple-deletion diagnostics for linear models. The advantage of the case-deletion model is its intuitive appeal, since the suspicious observations are removed explicitly. The mean-shift outlier model, in which the underlying observations are deleted implicitly, has found wider acceptance because of its computational simplicity. Nevertheless, the two models are mathematically equivalent. Under the assumption that the a priori variance factor is known, theoretical analyses indicate that the precision, the MDB measures and all kinds of DOP metrics are over-optimistic when outliers are neglected.