1 Introduction

The Kalman filter (KF) is widely used in processing kinematic geodetic measurements (Bogatin and Kogoj 2008). For linear systems with Gaussian noise, the KF is optimal in almost every conceivable sense (Simon 2006). Gaussianity is often adopted because of its tractability and good asymptotic performance, but exact Gaussianity is an idealistic assumption in many practical cases. In fact, Gaussianity is only approximate for small samples, especially when outliers/biases are not negligible. As a minimizer of the 2-norm of the error, the KF is rather sensitive to deviations from the assumed Gaussianity, which is well known as its lack of robustness or reliability. While robustness/reliability can be rather general concepts, only robustness against uncertainties in the noise probability distribution is considered here, i.e., distributional robustness (Huber and Ronchetti 2009).

In the geodetic literature, there are two kinds of approaches to address non-Gaussianity and/or outliers/biases (Lehmann 2013a): test-based outlier detection methods and robust-statistics-based methods.

For the test-based outlier detection methods, in his pioneering work, Baarda (1968) proposed using hypothesis tests to detect outliers in geodetic measurements. While the a priori variance is used in (Baarda 1968), a test using the estimated variance is introduced in (Pope 1976). Extending the theory of Baarda (1968) and Pope (1976), Teunissen developed the recursive detection, identification, and adaptation (DIA) theory (Teunissen 1990a, b; Teunissen and Salzmann 1989). In DIA, the detection process uses an overall test to decide whether some kind of bias or outlier is present; the identification process decides in which channel of the measurement and/or process vector and at which epoch the bias/outlier occurred; and in the adaptation process, the detected bias/outlier is corrected or discarded. DIA methods have found wide application in the geodetic community, e.g., to detect GNSS pseudorange outliers, phase cycle slips, or ionospheric disturbances (Teunissen 1998; Teunissen and De Bakker 2013), station coordinate discontinuities (Perfetti 2006), etc. Also, by inverting the power function of the test, the minimal detectable bias/outlier, a kind of reliability measure, can be derived, which can be used to evaluate the strength of the measurement model (De Jong 2000; Koch 2015; Teunissen 1998) and hence to guide the design of the measurement model (Salzmann 1991). Some possible difficulties in using DIA include the following. First, appropriate alternative hypotheses must be carefully chosen; this may be the most non-trivial task in DIA (Teunissen 1990a). The construction of alternative hypotheses in the identification process depends heavily on the specific problem to be solved and directly determines how the detected unusual measurement is adapted in the adaptation process (Lehmann 2013b). Second, the critical values of the test statistics in the detection and/or identification process are often hard to determine because of the complex distributions of the test statistics; sometimes a numerical method, e.g., the Monte Carlo method, can be used instead of analytical methods (Lehmann 2012). Third, the application of DIA in the multiple-outlier case needs further investigation.

For the robust-statistics-based methods, the well-developed discipline of robust statistics, which aims to robustify conventional statistical inference methods such as estimation (Huber and Ronchetti 2009), has been successfully used in geodesy (Guo 2013; Hekimoğlu et al. 2011; Třasák and Štroner 2014). The starting point, which makes it different from the test-based methods, is to merely make the method insensitive to outliers or other statistical uncertainties. In other words, it does not try to detect/identify/correct the uncertainties but only to resist them. It is natural to borrow concepts and methods from robust statistics to analyze and improve the robustness of the KF. Indeed, the Bayesian estimator, of which the KF can be seen as a special case, was robustified using the celebrated M-estimator in (Yang 1991). A robust KF for rank-deficient measurement models was studied in (Koch and Yang 1998), focusing on obtaining the initial estimates at the start of the filtering. A rank-deficient model together with process model uncertainties was addressed recently in (Chang and Liu 2015), focusing on obtaining a more suitable initial estimate for iteratively solving the M-estimation problem; also in (Chang and Liu 2015), the influence function of the KF is introduced and derived to evaluate the robustness of a KF-based approach. An adaptively robust KF is proposed in (Yang et al. 2001) to address uncertainties in both the process and measurement models. M-estimator-based robust KFs have also been developed in the framework of nonlinear KFs, e.g., the unscented KF (Karlgaard 2015).

Note that these two categories overlap to some extent. First of all, both aim to robustify the methods, though through different approaches. In the robust statistics methods, e.g., the M-estimator, some kind of detection and adaptation (down-weighting) can safely be considered to exist (Lehmann 2013a). Combinations of the two are also possible; e.g., in (Lehmann 2013a) it is stated that “Robust estimation procedures can also be considered as preparatory tools for improved outlier testing”.

A robust KF using a Chi square test to detect outliers in the measurements is studied in (Chang 2014); in this approach, the Mahalanobis distance of the measurement under the assumed Gaussian distribution is used as the test statistic. This approach bears features of both of the above methods. Outliers arising from statistical uncertainties are detected using a Chi square test, the same as the detection process of DIA in its local test form. However, the measurement corresponding to a detected outlier is down-weighted to make the estimate insensitive to it; this follows the lines of robust statistics, though one can also view it as one kind of adaptation in DIA. In (Chang 2014), only one total Mahalanobis distance of all measurement elements is calculated and only one scaling factor is introduced to inflate the overall covariance matrix of the innovation vector, so in its original form the approach cannot efficiently address uncertainties in only part of the measurement channels (Chang 2014). It was mentioned in (Chang 2014) that this problem can be fixed by implementing a sequential measurement update, i.e., processing the vectorial measurement element by element. This idea is detailed and further explored in the current work. In addition to addressing uncertainties in part of, or even individual, measurement channels, the idea has some by-products; e.g., superior numerical stability can be expected because no matrix inversion is involved. More importantly, accuracy can be further improved, especially by carefully choosing the order in which the elements of the measurement vector are processed. We attribute this improvement to the higher statistical efficiency gained. More specifically, after part of the elements have been processed, a better estimate (better than the prediction) is obtained, which serves as a better reference for detecting outliers when processing the remaining elements. A better reference increases the probability of correctly resisting the outlying measurement elements and retaining the good ones, which means a higher statistical efficiency.

The remainder of the paper is organized as follows. The method is presented in Sect. 2. The accuracy improvement is illustrated with a simulated example in Sect. 3. Some concluding remarks are given in Sect. 4.

2 Method

In the first subsection, after presenting the basic formulae of the KF, the approach for detecting and resisting outliers in the measurements is introduced. In the second subsection, the sequential implementation of the measurement update with this outlier handling method is derived, emphasizing a novel strategy for ordering the elements of the measurement vector.

2.1 Kalman filter and the resistance of outliers

The problem studied is represented as the following discrete-time state space model,

$$ \varvec{x}_{k} = \varvec{Fx}_{k - 1} + \varvec{w}_{k - 1} $$
(1)
$$ \varvec{y}_{k} = \varvec{Hx}_{k} + \varvec{v}_{k} $$
(2)

where \( \varvec{x}_{k} \) and \( \varvec{y}_{k} \) are the n- and m-dimensional state and measurement vectors at the kth epoch, F and H are the transition and design matrices with appropriate dimensions, and \( \varvec{w}_{k} \) and \( \varvec{v}_{k} \) are the process and measurement noises, which are assumed to be zero-mean Gaussian with nominal covariance matrices Q and R respectively. Note that in this study the real distribution of \( \varvec{v}_{k} \) can deviate from this assumption. Assume the initial estimate at epoch 0 is \( \widehat{\varvec{x}}_{0\left| 0 \right.} \) with associated covariance estimate \( \varvec{P}_{{\widehat{\varvec{x}}_{0\left| 0 \right.} ,\widehat{\varvec{x}}_{0\left| 0 \right.} }} \). At any epoch, say k, we have the following KF formulae.

$$ \widehat{\varvec{x}}_{{k\left| {k - 1} \right.}} = \varvec{F}\widehat{\varvec{x}}_{{k - 1\left| {k - 1} \right.}} $$
(3)
$$ \varvec{P}_{{\widehat{\varvec{x}}_{{k\left| {k - 1} \right.}} ,\widehat{\varvec{x}}_{{k\left| {k - 1} \right.}} }} = \varvec{FP}_{{\widehat{\varvec{x}}_{{k - 1\left| {k - 1} \right.}} ,\widehat{\varvec{x}}_{{k - 1\left| {k - 1} \right.}} }} \varvec{F}^{T} + \varvec{Q} $$
(4)
$$ \widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} = \varvec{H}\widehat{\varvec{x}}_{{k\left| {k - 1} \right.}} $$
(5)
$$ \varvec{P}_{{\widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} ,\widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} }} = \varvec{HP}_{{\widehat{\varvec{x}}_{{k\left| {k - 1} \right.}} ,\widehat{\varvec{x}}_{{k\left| {k - 1} \right.}} }} \varvec{H}^{T} + \varvec{R} $$
(6)
$$ \varvec{K}_{k} = \varvec{P}_{{\widehat{\varvec{x}}_{{k\left| {k - 1} \right.}} ,\widehat{\varvec{x}}_{{k\left| {k - 1} \right.}} }} \varvec{H}^{T} \left( {\varvec{P}_{{\widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} ,\widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} }} } \right)^{ - 1} $$
(7)
$$ \widehat{\varvec{x}}_{k\left| k \right.} = \widehat{\varvec{x}}_{{k\left| {k - 1} \right.}} + \varvec{K}_{k} \left( {\tilde{\varvec{y}}_{k} - \widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} } \right) $$
(8)
$$ \varvec{P}_{{\widehat{\varvec{x}}_{k\left| k \right.} ,\widehat{\varvec{x}}_{k\left| k \right.} }} = \varvec{P}_{{\widehat{\varvec{x}}_{{k\left| {k - 1} \right.}} ,\widehat{\varvec{x}}_{{k\left| {k - 1} \right.}} }} - \varvec{K}_{k} \varvec{P}_{{\widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} ,\widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} }} \varvec{K}_{k}^{T} $$
(9)

where \( \widehat{\varvec{x}}_{i\left| j \right.} \) represents the estimate of the state vector at the ith epoch using measurements up to the jth epoch; specifically, \( \widehat{\varvec{x}}_{{k\left| {k - 1} \right.}} \) and \( \widehat{\varvec{x}}_{k\left| k \right.} \) are also called the a priori and a posteriori estimates; \( \varvec{P}_{{\varvec{a},\varvec{b}}} \) represents the (cross-)covariance matrix between a and b; \( \tilde{\varvec{y}}_{k} \), a non-random constant, is the actual measurement, i.e., a realization of \( \varvec{y}_{k} \); \( \varvec{K}_{k} \) is the gain matrix which combines \( \widehat{\varvec{x}}_{{k\left| {k - 1} \right.}} \) and \( \tilde{\varvec{y}}_{k} - \widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} \) linearly to obtain \( \widehat{\varvec{x}}_{k\left| k \right.} \). Note that \( \varvec{e}_{k} = \tilde{\varvec{y}}_{k} - \widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} \) is often called the innovation vector (Kailath et al. 2000), whose covariance equals \( \varvec{P}_{{\widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} ,\widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} }} \) because \( \tilde{\varvec{y}}_{k} \) is non-random.
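To make the recursion concrete, the following is a minimal NumPy sketch of Eqs. (3)–(9); the function and variable names are our own choices mirroring the notation above, not code from the original paper.

```python
# Minimal sketch of the KF recursion, Eqs. (3)-(9); names are illustrative.
import numpy as np

def kf_predict(x, P, F, Q):
    """Time update, Eqs. (3)-(4)."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, y_tilde, H, R):
    """Measurement update, Eqs. (5)-(9)."""
    y_pred = H @ x_pred                      # Eq. (5), predicted measurement
    P_yy = H @ P_pred @ H.T + R              # Eq. (6), innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(P_yy)   # Eq. (7), gain matrix
    x_post = x_pred + K @ (y_tilde - y_pred) # Eq. (8)
    P_post = P_pred - K @ P_yy @ K.T         # Eq. (9)
    return x_post, P_post
```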

Under the Gaussian assumption, \( \varvec{y}_{k} \) should be Gaussian with mean \( \widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} \) and covariance \( \varvec{P}_{{\widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} ,\widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} }} \), and the squared Mahalanobis distance of \( \varvec{y}_{k} \) should be Chi square distributed with m degrees of freedom, i.e.,

$$ \gamma \left( {\varvec{y}_{k} } \right) = d^{2} = \left( {\varvec{y}_{k} - \widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} } \right)^{T} \left( {\varvec{P}_{{\widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} ,\widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} }} } \right)^{ - 1} \left( {\varvec{y}_{k} - \widehat{\varvec{y}}_{{k\left| {k - 1} \right.}} } \right) \sim \chi_{m}^{2} $$
(10)

So a Chi square test can be used to judge whether an actual measurement is a realization of \( \varvec{y}_{k} \) under the Gaussian assumption. Let the null hypothesis be that \( \varvec{v}_{k} \) is Gaussian. For a given significance level α and the corresponding upper α-quantile \( \chi_{m,\alpha }^{2} \), we have

$$ \Pr \left[ {\gamma \left( {\varvec{y}_{k} } \right) > \chi_{m,\alpha }^{2} } \right] = \alpha $$
(11)

if the null hypothesis holds, where Pr[·] denotes the probability of an event. Given a rather small α, if \( \gamma \left( {\tilde{\varvec{y}}_{k} } \right) > \chi_{m,\alpha }^{2} \), the null hypothesis is rejected, wrongly so only with probability α, and in this case \( \tilde{\varvec{y}}_{k} \) is deemed an outlier. This shows how outliers are detected. For a detected outlier, its contribution to the measurement update should be decreased, and this is achieved by inflating the covariance of the innovation vector. The inflation factor can be calculated as

$$ \kappa = \frac{{\gamma \left( {\tilde{\varvec{y}}_{k} } \right)}}{{\chi_{m,\alpha }^{2} }} $$
(12)

This shows how the detected outlier is resisted.
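As a sketch of this detection-and-inflation step, reusing the structure of the update above and assuming SciPy's chi2 for the quantile, the robust update of (Chang 2014) might read:

```python
# Sketch of the outlier test and covariance inflation, Eqs. (10)-(12).
import numpy as np
from scipy.stats import chi2

def robust_kf_update(x_pred, P_pred, y_tilde, H, R, alpha=0.05):
    y_pred = H @ x_pred
    P_yy = H @ P_pred @ H.T + R
    e = y_tilde - y_pred                       # innovation vector
    gamma = e @ np.linalg.solve(P_yy, e)       # squared Mahalanobis distance, Eq. (10)
    threshold = chi2.ppf(1.0 - alpha, df=len(y_tilde))  # upper alpha-quantile
    if gamma > threshold:                      # Eq. (11): null hypothesis rejected
        P_yy = (gamma / threshold) * P_yy      # inflate by kappa, Eq. (12)
    K = P_pred @ H.T @ np.linalg.inv(P_yy)
    x_post = x_pred + K @ e
    P_post = P_pred - K @ P_yy @ K.T
    return x_post, P_post
```

Note that a single inflation suffices: after scaling by κ, the recomputed statistic equals the threshold exactly, so no iteration is needed.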

As in Eq. (10), only one statistic is calculated to judge whether the overall measurement vector is an outlier, and as in Eq. (12), only one scaling factor is introduced to inflate the whole covariance matrix. If only part of, or even a single element of, the measurement is outlying, the overall measurement vector may be deemed outlying, which means that some good measurement elements may be mistaken for outliers. Conversely, an outlying element may escape detection because it is masked by other good elements, which means that some outlying measurement elements may be mistaken for good ones. Either mistaking good measurement elements for outlying ones or mistaking outlying ones for good ones results in a loss of statistical efficiency and hence in lower accuracy of the estimates.

2.2 Implementing sequential measurement update in robust Kalman filter

As mentioned in (Chang 2014), outliers in individual measurement channels can be efficiently addressed by implementing the sequential measurement update of the KF, i.e., only a single measurement element is processed in each measurement update, and hence there are m measurement updates in one recursion at any epoch. In every measurement update, the outlier detection and resistance described above is performed.

If the measurement elements are cross-correlated, we first de-correlate them through the Cholesky decomposition. One reviewer raised doubts about this de-correlation approach, as it may spread outliers over different observations. Unfortunately, the authors have no sounder solution for now and will work in depth on this issue in the future. Let

$$ \varvec{R} = \varvec{LL}^{T} $$
(13)

From Eq. (2), we have

$$ \varvec{L}^{ - 1} \varvec{y}_{k} = \varvec{L}^{ - 1} \varvec{Hx}_{k} + \varvec{L}^{ - 1} \varvec{v}_{k} $$
(14)

Let \( \bar{\varvec{y}}_{k} = \varvec{L}^{ - 1} \varvec{y}_{k} \), \( \bar{\varvec{H}} = \varvec{L}^{ - 1} \varvec{H} \), and \( \bar{\varvec{v}}_{k} = \varvec{L}^{ - 1} \varvec{v}_{k} \); we then have the de-correlated measurement equation

$$ \bar{\varvec{y}}_{k} = \bar{\varvec{H}}\varvec{x}_{k} + \bar{\varvec{v}}_{k} $$
(15)
$$ \text{cov} \left[ {\bar{\varvec{v}}_{k} } \right] = \varvec{L}^{ - 1} \varvec{RL}^{ - T} = \varvec{L}^{ - 1} \varvec{LL}^{T} \varvec{L}^{ - T} = \varvec{I} $$
(16)

Note that for this measurement equation, the actual measurement is accordingly \( \tilde{\bar{\varvec{y}}}_{k} = \varvec{L}^{ - 1} \tilde{\varvec{y}}_{k} \).
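A small sketch of the de-correlation in Eqs. (13)–(16), assuming SciPy's triangular solver so that \( \varvec{L}^{ - 1} \) is never formed explicitly (a standard numerical choice, not prescribed by the paper):

```python
# Sketch of the Cholesky de-correlation, Eqs. (13)-(16).
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def decorrelate(y_tilde, H, R):
    L = cholesky(R, lower=True)                        # Eq. (13): R = L L^T
    y_bar = solve_triangular(L, y_tilde, lower=True)   # L^{-1} y
    H_bar = solve_triangular(L, H, lower=True)         # L^{-1} H
    return y_bar, H_bar   # noise of transformed model has covariance I, Eq. (16)
```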

Let

$$ \bar{\varvec{H}} = \left[ {\bar{\varvec{h}}_{1}^{T} \quad \bar{\varvec{h}}_{2}^{T} \quad \cdots \quad \bar{\varvec{h}}_{m}^{T} } \right]^{T} $$
(17)

In the jth measurement update at the kth epoch, where \( \bar{\varvec{h}}_{j} \) is the jth row of \( \bar{\varvec{H}} \), we have

$$ \hat{\bar{y}}_{k,j} = \bar{\varvec{h}}_{j} \hat{\varvec{x}}_{k,j - 1} $$
(18)
$$ \sigma_{{\hat{\bar{y}}_{k,j} }}^{2} = \bar{\varvec{h}}_{j} \varvec{P}_{{\hat{\varvec{x}}_{k,j - 1} ,\hat{\varvec{x}}_{k,j - 1} }} \bar{\varvec{h}}_{j}^{T} + 1 $$
(19)

where \( \hat{\varvec{x}}_{k,j - 1} \) is the estimate after processing the (j − 1)th measurement element, with \( \hat{\varvec{x}}_{k,0} = \hat{\varvec{x}}_{{k\left| {k - 1} \right.}} \), and \( \hat{\bar{y}}_{k,j} \) is the prediction of the jth element of \( \bar{\varvec{y}}_{k} \). A Chi square test with one degree of freedom is carried out to judge whether \( \tilde{\bar{y}}_{k,j} \) is an outlier. The statistic is calculated as

$$ \gamma \left( {\tilde{\bar{y}}_{k,j} } \right) = \frac{{\left( {\tilde{\bar{y}}_{k,j} - \hat{\bar{y}}_{k,j} } \right)^{2} }}{{\sigma_{{\hat{\bar{y}}_{k,j} }}^{2} }} $$
(20)

For a given significance level α and the corresponding upper α-quantile \( \chi_{1,\alpha }^{2} \), we check whether

$$ \gamma \left( {\tilde{\bar{y}}_{k,j} } \right) > \chi_{1,\alpha }^{2} $$
(21)

If Eq. (21) holds, \( \tilde{\bar{y}}_{k,j} \) is deemed an outlier, and the following scaling factor is calculated to inflate the variance in Eq. (19), i.e.,

$$ \kappa = \frac{{\gamma \left( {\tilde{\bar{y}}_{k,j} } \right)}}{{\chi_{1,\alpha }^{2} }} $$
(22)
$$ \kappa \sigma_{{\hat{\bar{y}}_{k,j} }}^{2} \to \sigma_{{\hat{\bar{y}}_{k,j} }}^{2} $$
(23)

The estimate is then updated as

$$ \varvec{K} = \frac{{\varvec{P}_{{\hat{\varvec{x}}_{k,j - 1} ,\hat{\varvec{x}}_{k,j - 1} }} \bar{\varvec{h}}_{j}^{T} }}{{\sigma_{{\hat{\bar{y}}_{k,j} }}^{2} }} $$
(24)
$$ \hat{\varvec{x}}_{k,j} = \hat{\varvec{x}}_{k,j - 1} + \varvec{K}\left( {\tilde{\bar{y}}_{k,j} - \hat{\bar{y}}_{k,j} } \right) $$
(25)
$$ \varvec{P}_{{\hat{\varvec{x}}_{k,j} ,\hat{\varvec{x}}_{k,j} }} = \varvec{P}_{{\hat{\varvec{x}}_{k,j - 1} ,\hat{\varvec{x}}_{k,j - 1} }} - \sigma_{{\hat{\bar{y}}_{k,j} }}^{2} \varvec{KK}^{T} $$
(26)

It is well known that \( \hat{\varvec{x}}_{k,j} \) is a more accurate estimate than \( \hat{\varvec{x}}_{k,j - 1} \), because \( \varvec{P}_{{\hat{\varvec{x}}_{k,j} ,\hat{\varvec{x}}_{k,j} }} \le \varvec{P}_{{\hat{\varvec{x}}_{k,j - 1} ,\hat{\varvec{x}}_{k,j - 1} }} \), i.e., \( \varvec{P}_{{\hat{\varvec{x}}_{k,j - 1} ,\hat{\varvec{x}}_{k,j - 1} }} - \varvec{P}_{{\hat{\varvec{x}}_{k,j} ,\hat{\varvec{x}}_{k,j} }} \) is positive semi-definite, as shown explicitly in Eq. (26). So in checking the (j + 1)th measurement element, the reference information used, i.e., \( \hat{\varvec{x}}_{k,j} \), is better than \( \hat{\varvec{x}}_{k,j - 1} \), and of course even better than \( \hat{\varvec{x}}_{{k\left| {k - 1} \right.}} \). Better reference information implies an increased probability of detecting outliers, so it is more probable that the outlying elements are correctly resisted and the good ones retained.

In implementing the above robust sequential measurement update, it should be apparent that the more reliable a processed measurement element is, the more accurate the estimate becomes. So the more reliable elements should be processed first in order to obtain even better reference information for checking the remaining, more dubious elements. The Mahalanobis distances (or their squares) of the individual elements represent, to some extent, the relative qualities of these elements: the smaller an element's Mahalanobis distance, the more reliable it is deemed. So, given the current reference information, the Mahalanobis distances of all the remaining elements are calculated, and the element with the smallest distance is processed in the next measurement update; a sketch of this procedure is given below. Of course this sorting increases the computation, but the accuracy improvement may justify it in some cases.
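The following sketch renders the whole sequential update of Eqs. (18)–(26) together with the smallest-distance-first ordering; it is our own illustrative rendering under the stated assumptions (unit nominal variances after de-correlation), not the paper's Table 1 verbatim.

```python
# Sketch of the robust sequential measurement update with ordering.
import numpy as np
from scipy.stats import chi2

def robust_sequential_update(x, P, y_bar, H_bar, alpha=0.05):
    """y_bar, H_bar are the de-correlated measurement and design matrix."""
    threshold = chi2.ppf(1.0 - alpha, df=1)   # one degree of freedom
    remaining = list(range(len(y_bar)))
    while remaining:
        # Pick the remaining element with the smallest squared Mahalanobis
        # distance relative to the current (best available) estimate.
        gammas = []
        for j in remaining:
            h = H_bar[j]
            s2 = h @ P @ h + 1.0                         # Eq. (19)
            gammas.append((y_bar[j] - h @ x) ** 2 / s2)  # Eq. (20)
        idx = int(np.argmin(gammas))
        j, gamma = remaining.pop(idx), gammas[idx]
        h = H_bar[j]
        s2 = h @ P @ h + 1.0
        if gamma > threshold:                 # Eq. (21): deemed an outlier
            s2 *= gamma / threshold           # Eqs. (22)-(23): inflate variance
        K = P @ h / s2                        # Eq. (24), a gain vector
        x = x + K * (y_bar[j] - h @ x)        # Eq. (25)
        P = P - s2 * np.outer(K, K)           # Eq. (26)
    return x, P
```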

The algorithm is depicted in Table 1 in the form of pseudo code.

Table 1 Outlier detection and resistance in sequential measurement update

3 An illustrative example

Assume an object moving forward without side slip in the horizontal plane; the forward distance and velocity are of interest, and the north and east positions are measured.

A constant velocity model is assumed, so we have

$$ \left[ {\begin{array}{*{20}c} {\dot{p}} \\ {\dot{v}} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 0 & 1 \\ 0 & 0 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} p \\ v \\ \end{array} } \right] + \left[ {\begin{array}{*{20}c} 0 \\ a \\ \end{array} } \right] $$
(27)

where p and v are the forward position and velocity to be estimated, and a is the forward acceleration, assumed to be white Gaussian noise with variance \( \sigma_{a}^{2} \). Given the integration interval τ, Eq. (27) is discretized to obtain the process equation

$$ \left[ {\begin{array}{*{20}c} {p_{k} } \\ {v_{k} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 1 & \tau \\ 0 & 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {p_{k - 1} } \\ {v_{k - 1} } \\ \end{array} } \right] + \varvec{w}_{k - 1} $$
(28)

with

$$ \varvec{Q}\text{ = cov}\left[ {\varvec{w}_{k - 1} } \right] = \sigma_{a}^{2} \left[ {\begin{array}{*{20}c} {\frac{{\tau^{3} }}{3}} & {\frac{{\tau^{2} }}{2}} \\ {\frac{{\tau^{2} }}{2}} & \tau \\ \end{array} } \right] $$
(29)

The measurement equation is

$$ \left[ {\begin{array}{*{20}c} {p_{N,k} } \\ {p_{E,k} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\cos \theta } & 0 \\ {\sin \theta } & 0 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {p_{k} } \\ {v_{k} } \\ \end{array} } \right] + \left[ {\begin{array}{*{20}c} {\varepsilon_{k} } \\ {\xi_{k} } \\ \end{array} } \right] $$
(30)

where θ is the heading angle. The measurement noises \( \varepsilon_{k} \) and \( \xi_{k} \) are assumed to follow the gross error model, proposed in (Huber 1964) to formulate a full “neighborhood” of an assumed parametric model; the probability density function is

$$ f = \left( {1 - \mu } \right)f_{0} + \mu f_{\text{c}} $$
(31)

where f, f 0, and f c represent the real, the nominal, and the contaminating distributions respectively, and μ is called the contaminating ratio. When μ = 0, the nominal distribution represents the real distribution exactly. In this study, both f 0 and f c are assumed to be zero-mean Gaussian but with different variances; in this case, the gross error model in Eq. (31) is also called a Gaussian mixture model.

The following three cases are studied.

Case 1

NoUn: uncertainty exists in neither of the two channels, i.e., the nominal distributions of both \( \varepsilon_{k} \) and \( \xi_{k} \) represent their real distributions;

Case 2

UnOn: uncertainty exists in only one channel, say the north channel;

Case 3

UnBo: uncertainties exist in both channels.

Let f 0 ~ N(0, 1) and f c ~ N(0, 10²); if uncertainty exists, let μ = 0.1, otherwise μ = 0.
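To make the setup concrete, the following sketch builds the matrices of Eqs. (28)–(30) and samples the mixture noise of Eq. (31). The values of τ, θ, and σ_a are illustrative assumptions (the text does not fix them), and the contaminating standard deviation of 10 follows \( \sigma_{{f_{c} }}^{2} = 100 \) as used in Eq. (32) below.

```python
# Sketch of the simulation model; tau, theta, sigma_a are assumed values.
import numpy as np

rng = np.random.default_rng(0)
tau, theta, sigma_a = 1.0, np.deg2rad(30.0), 0.1
F = np.array([[1.0, tau], [0.0, 1.0]])                       # Eq. (28)
Q = sigma_a**2 * np.array([[tau**3 / 3, tau**2 / 2],
                           [tau**2 / 2, tau]])               # Eq. (29)
H = np.array([[np.cos(theta), 0.0], [np.sin(theta), 0.0]])   # Eq. (30)

def mixture_noise(mu, size):
    """Sample Eq. (31): N(0,1) w.p. 1-mu, N(0,10^2) w.p. mu."""
    contaminated = rng.random(size) < mu
    return np.where(contaminated, rng.normal(0.0, 10.0, size),
                    rng.normal(0.0, 1.0, size))

# Case 2 (UnOn): mu = 0.1 in the north channel, mu = 0 in the east channel.
eps = mixture_noise(0.1, 1000)   # north channel noise
xi = mixture_noise(0.0, 1000)    # east channel noise
```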

Three approaches are compared in all three cases:

  • Approach 1 KF, the standard KF;

  • Approach 2 RKF, the robust KF proposed in (Chang 2014), also introduced in Sect. 2.1. In this approach let α = 0.05, so \( \chi_{{2,{\kern 1pt} {\kern 1pt} {\kern 1pt} 0.05}}^{2} = 5.99 \).

  • Approach 3 RKFs, the robust KF with sequential measurement update proposed in this work, illustrated in Table 1. In this approach let α = 0.05, so \( \chi_{{1,{\kern 1pt} {\kern 1pt} {\kern 1pt} 0.05}}^{2} = 3.84 \).

Monte Carlo experiments with 10,000 independent runs are conducted. The mean squared errors (over the Monte Carlo runs) of the estimates at each epoch by the three approaches in the three cases are calculated and depicted in the figures. The overall root mean squared errors (over both Monte Carlo runs and epochs) of the estimates by the three approaches in the three cases are summarized in Table 2.
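A compact sketch of one Monte Carlo run, reusing the functions sketched in Sect. 2 and the model above (the harness is ours, for illustration only):

```python
# Sketch of one Monte Carlo run: simulate, filter, accumulate squared errors.
import numpy as np

def run_once(n_epochs, mu_n, mu_e, update_fn):
    x_true = np.zeros(2)
    x_hat, P = np.zeros(2), np.eye(2)   # assumed initial estimate/covariance
    sq_err = np.zeros((n_epochs, 2))
    for k in range(n_epochs):
        x_true = F @ x_true + rng.multivariate_normal(np.zeros(2), Q)
        y = H @ x_true + np.array([mixture_noise(mu_n, 1)[0],
                                   mixture_noise(mu_e, 1)[0]])
        x_hat, P = kf_predict(x_hat, P, F, Q)
        x_hat, P = update_fn(x_hat, P, y)   # KF, RKF, or RKFs update
        sq_err[k] = (x_hat - x_true) ** 2
    return sq_err

# Both channels have unit nominal variance and are uncorrelated, so R = I and
# no de-correlation is needed; for RKFs one can pass, e.g.,
#   update_fn = lambda x, P, y: robust_sequential_update(x, P, y, H)
# Averaging sq_err over the runs gives the per-epoch mean squared errors in
# the figures; averaging over epochs as well gives the values in Table 2.
```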

Table 2 Root mean squared errors in position and velocity estimations by three approaches, i.e., Kalman filter (KF), robust Kalman filter (RKF), and robust Kalman filter with sequential measurement update (RKFs), in three cases, i.e., with no uncertainty (NoUn), with uncertainties in one channel (UnOn), and with uncertainties in both channels (UnBo)

In Figs. 1 and 2, it is found that all three approaches perform well in the first case. As the KF is optimal in this case, the comparable performance of the two robust approaches validates their statistical efficiency. More specifically, there is inevitably some probability that good measurements are mistaken for outliers, but this probability is rather low. The high efficiency is achieved by deliberately selecting a rather small significance level α.

Fig. 1 Mean squared errors in the position estimates in Case 1 (NoUn)

Fig. 2 Mean squared errors in the velocity estimates in Case 1 (NoUn)

From Figs. 3 and 4, we see that the performance of the KF degrades significantly, which clearly shows its lack of robustness. The two robust approaches, however, still provide relatively good estimates in this case. Their higher accuracy compared to that of the KF is due to their robustness. Note that despite the robustness, the root mean squared errors of the two robust approaches still increase compared to the first case; this has nothing to do with robustness but is because the equivalent accuracy of the measurements decreases compared to the first case. In the first case, the standard deviation of the measurement noise (north channel) is 1 m, while the equivalent standard deviation (north channel) in the second case is

$$ \bar{\sigma } = \sqrt {\left( {1 - \mu } \right)\sigma_{{f_{0} }}^{2} + \mu \sigma_{{f_{c} }}^{2} } = \sqrt {0.9 \times 1 + 0.1 \times 100} \approx 3.3 > 1 $$
(32)
Fig. 3 Mean squared errors in the position estimates in Case 2 (UnOn)

Fig. 4 Mean squared errors in the velocity estimates in Case 2 (UnOn)

From Figs. 3 and 4, we can also see that the RKFs approach performs slightly better than the RKF approach; the superiority of RKFs is clear in Table 2. As explained previously, this is due to the higher statistical efficiency of RKFs compared to RKF.

From Figs. 5 and 6, we observe that in this case too, the two robust approaches outperform the standard KF, owing to their robustness. Again, the performance of the two robust approaches decreases compared to the second case, which is caused by the increased standard deviation of the measurement noise in the east channel. RKFs performs slightly better than RKF, as clearly demonstrated in Table 2; again, this is due to the relatively higher statistical efficiency of the former.

Fig. 5 Mean squared errors in the position estimates in Case 3 (UnBo)

Fig. 6 Mean squared errors in the velocity estimates in Case 3 (UnBo)

4 Concluding remarks

The distributional uncertainty in the measurement noise, or more specifically the non-Gaussianity of its distribution, is addressed by employing a robust KF based on a Chi square test to detect outliers and on innovation covariance inflation to resist the detected outliers. By implementing the so-called sequential measurement update of the robust KF, we achieve several merits: the ability to address outliers in some of, or even individual, measurement channels; higher numerical stability, because no matrix inversion is needed; and, most importantly, higher accuracy, because of a higher statistical efficiency in detecting outliers. The higher statistical efficiency is brought about by a higher probability of correctly detecting the outlying measurement elements and retaining the good ones.

It must be admitted that the proposed method inevitably increases the computation, mainly because a hypothesis test must be performed for every measurement element and a sorting process must be carried out to select the element for the next processing step; however, we believe that in some cases the accuracy gained may justify the extra computation.