1 Introduction

Kalman filtering has been extensively applied in many science and engineering disciplines, such as target tracking, control, signal processing, communication, navigation and robotics [16]. The Kalman filter is the optimal state estimator in terms of minimum mean square error for a linear state-space model with Gaussian state and measurement noises. A priori knowledge of the noise statistics (the mean vectors and covariance matrices of state and measurement noises) is required to implement the Kalman filter. The performance of the Kalman filter relies heavily on the prior noise statistics, and its estimation accuracy degrades dramatically when inaccurate or wrong prior noise statistics are used [13]. However, in a range of practical applications, the noise statistics may be unknown or even time-varying. For example, in an integrated navigation system based on a strap-down inertial navigation system (SINS) and the global positioning system (GPS), the state and measurement noises are induced by the measurement errors of the inertial measurement units (IMUs) and of GPS, respectively, and the biases and observation accuracy of the IMUs and GPS may vary with the environment, which results in unknown and time-varying mean vectors and covariance matrices of state and measurement noises [3,4,5, 14, 19].

To address the filtering problem of a linear Gaussian state-space model with inaccurate noise statistics, a large number of adaptive Kalman filters (AKFs) have been proposed based on different methods. The Sage-Husa AKF (SHAKF) is based on the covariance matching method, in which the noise statistics are recursively estimated using the maximum a posteriori criterion [2, 17]. Although the SHAKF can estimate the mean vectors and covariance matrices of state and measurement noises simultaneously, it guarantees neither convergence to the correct noise covariance matrices nor the positive definiteness of the estimated noise covariance matrices [6]. As a result, the SHAKF is often found to halt its operation due to filtering divergence or numerical instability. The innovation-based AKF (IAKF) is based on a maximum likelihood rule, and it estimates the noise covariance matrices using the innovation sequence, which is a white process in the Kalman filtering framework [14]. In general, a large window of innovations is required by the IAKF to achieve reliable and accurate estimates of the noise covariance matrices, so its performance may degrade for rapidly varying noise covariance matrices [11]. Moreover, the positive definiteness of the estimated noise covariance matrices is not guaranteed by the IAKF, which may result in numerical instability. The multiple-model AKF (MMAKF) is based on the Bayesian rule: the state and measurement noises are modeled by Gaussian distributions with different noise statistics, a bank of Kalman filters is run with these different noise statistics, and the state estimate of the MMAKF is a weighted sum of the estimates of the individual Kalman filters [12]. Unfortunately, in practical applications, a large number of Gaussian distributions must be selected to model the state and measurement noises to guarantee the performance of the MMAKF, which leads to substantial computational complexity [18].

Recently, the expectation-maximization-based AKF (EMAKF) and the variational Bayesian-based AKF (VBAKF) have been proposed to deal with the filtering problem of a Gaussian state-space model with inaccurate noise covariance matrices, in which the prediction error covariance matrix and the measurement noise covariance matrix, rather than the state and measurement noise covariance matrices, are jointly estimated [6, 7]. In the EMAKF, approximate maximum likelihood estimates of the prediction error covariance matrix and the measurement noise covariance matrix are obtained based on the expectation-maximization (EM) algorithm [7]. In the VBAKF, on the other hand, the state vector, the prediction error covariance matrix and the measurement noise covariance matrix are jointly inferred using the variational Bayesian (VB) approach, and the posterior probability density functions (PDFs) of the prediction error covariance matrix and the measurement noise covariance matrix are approximated by inverse-Wishart (IW) distributions [6]. Although the EMAKF and VBAKF exhibit satisfactory performance in the case of time-varying and inaccurate noise covariance matrices, their estimation accuracy degrades when the mean vectors and covariance matrices of state and measurement noises are both time-varying and inaccurate.

In this paper, a new AKF is proposed for a linear Gaussian state-space model with inaccurate noise statistics. Both the prior joint PDF of the one-step prediction and corresponding prediction error covariance matrix and the joint PDF of the mean vector and covariance matrix of measurement noise are modeled by Normal-inverse-Wishart (NIW) distributions, from which a new NIW-based hierarchical Gaussian state-space model is constructed. The state vector, the one-step prediction and corresponding prediction error covariance matrix, and the mean vector and covariance matrix of measurement noise are jointly estimated based on the VB approach. The proposed AKF and existing state-of-the-art AKFs are applied to a problem of target tracking with inaccurate noise statistics. Simulation results demonstrate that the proposed AKF has better estimation accuracy than existing state-of-the-art AKFs.

The remainder of this paper is organized as follows. Section 2 gives the notations that are used in this paper. Section 3 presents the problem formulation. In Sect. 4, a new NIW-based hierarchical Gaussian state-space model is constructed, based on which a new AKF is proposed using the VB approach. In Sect. 5, the proposed AKF and existing state-of-the-art AKFs are compared by a target tracking example, and simulation results are given. Concluding remarks are given in Sect. 6.

2 Notations

Throughout this paper, we denote \({\varvec{z}}_{i:j}\triangleq \{{\varvec{z}}_k|i\le k\le j\}\); \(\mathrm {N}({\varvec{\mu }}, {\varvec{\varSigma }})\) and \(\mathrm {N}(\cdot ; {\varvec{\mu }}, {\varvec{\varSigma }})\) denote, respectively, the multivariate Gaussian distribution and Gaussian PDF with mean vector \({\varvec{\mu }}\) and covariance matrix \({\varvec{\varSigma }}\); \(\mathrm {NiW}(\cdot ,\cdot ;{\varvec{\mu }},\gamma ,\delta ,{\varvec{\varLambda }})\) denotes the NIW PDF with mean vector \({\varvec{\mu }}\), prior confidence parameter \(\gamma \), degrees of freedom (dof) parameter \(\delta \), and scale matrix \({\varvec{\varLambda }}\); \(\mathrm {IW}(\cdot ;\mu , {\varvec{\varSigma }})\) denotes the IW PDF with dof parameter \(\mu \) and inverse scale matrix \({\varvec{\varSigma }}\); \(\mathrm {E}_{x}[\cdot ]\) denotes the expectation operator with respect to the PDF of random variable x; \(\log \) denotes the natural logarithm; \({\varvec{I}}_{n}\) denotes the \(n\times n\) identity matrix; \({\varvec{0}}_{n\times 1}\) denotes the n-dimensional zero vector; the superscript “\(-1\)” denotes the inverse operation of a matrix; the superscript “\(\mathrm {T}\)” denotes the transpose operation of a vector or matrix; and \(|\cdot |\) and \(\mathrm {tr}(\cdot )\) denote the determinant and trace operations of a matrix, respectively.

3 Problem Formulation

Consider the following discrete-time linear stochastic system as shown by the state-space model [8]

$$\begin{aligned} {\varvec{x}}_{k}= & {} {\varvec{F}}_{k-1}{\varvec{x}}_{k-1}+{\varvec{w}}_{k-1}, \end{aligned}$$
(1)
$$\begin{aligned} {\varvec{z}}_{k}= & {} {\varvec{H}}_{k}{\varvec{x}}_{k}+{\varvec{v}}_{k}, \end{aligned}$$
(2)

where (1) and (2) are, respectively, the state and measurement equations, k is the discrete-time index, \({\varvec{x}}_k\in \mathbb {R}^{n}\) is the state vector, \({\varvec{z}}_k\in \mathbb {R}^{m}\) is the measurement vector, \({\varvec{F}}_{k}\in \mathbb {R}^{n{\times }n}\) is the state transition matrix, \({\varvec{H}}_{k}\in \mathbb {R}^{m{\times }n}\) is the observation matrix, and \({\varvec{w}}_{k}\in \mathbb {R}^{n}\) and \({\varvec{v}}_{k}\in \mathbb {R}^{m}\) are, respectively, the state and measurement noise vectors. The state and measurement noises are assumed to have stationary Gaussian distributions, i.e., \({\varvec{w}}_{k}\sim \mathrm {N}({\varvec{q}},{\varvec{Q}})\) and \({\varvec{v}}_{k}\sim \mathrm {N}({\varvec{r}},{\varvec{R}})\), where \({\varvec{q}}\) and \({\varvec{Q}}\) are, respectively, the mean vector and covariance matrix of state noise, and \({\varvec{r}}\) and \({\varvec{R}}\) are, respectively, the mean vector and covariance matrix of measurement noise. The initial state vector \({\varvec{x}}_{0}\) is assumed to have a Gaussian distribution with mean vector \({\hat{{\varvec{x}}}}_{0|0}\) and covariance matrix \({\varvec{P}}_{0|0}\). Moreover, \({\varvec{x}}_{0}\), \({\varvec{w}}_{k}\) and \({\varvec{v}}_{j}\) are assumed to be mutually uncorrelated for any time samples j and k.
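For concreteness, the state-space model (1)–(2) can be simulated in a few lines of Python. The constant-velocity dynamics and all noise statistics below are arbitrary illustrative choices, not values taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-state constant-velocity model; all values are arbitrary.
n, m, T = 2, 1, 100
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition matrix F_k
H = np.array([[1.0, 0.0]])               # observation matrix H_k
q, Q = np.zeros(n), 0.01 * np.eye(n)     # state noise mean and covariance
r, R = np.zeros(m), 0.25 * np.eye(m)     # measurement noise mean and covariance

x = np.zeros(n)                          # initial state x_0
xs, zs = [], []
for _ in range(T):
    x = F @ x + rng.multivariate_normal(q, Q)   # state equation, Eq. (1)
    z = H @ x + rng.multivariate_normal(r, R)   # measurement equation, Eq. (2)
    xs.append(x)
    zs.append(z)
xs, zs = np.array(xs), np.array(zs)
```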

For the linear Gaussian state-space model formulated in (1)–(2), the Kalman filter is often employed to estimate the unknown state vector \({\varvec{x}}_{k}\) based on the available measurement information \({\varvec{z}}_{1:k}\), the model parameters \(\left\{ {\varvec{F}}_{k-1}, {\varvec{H}}_{k}\right\} \), and the noise statistics \(\left\{ {\varvec{q}},{\varvec{Q}},{\varvec{r}},{\varvec{R}}\right\} \). The recursive Kalman filter is composed of a time update and a measurement update, which are formulated as follows [16]:

Time update

$$\begin{aligned} {\hat{{\varvec{x}}}}_{k|k-1}= & {} {\varvec{F}}_{k-1}{\hat{{\varvec{x}}}}_{k-1|k-1}+{\varvec{q}}, \end{aligned}$$
(3)
$$\begin{aligned} {\varvec{P}}_{k|k-1}= & {} {\varvec{F}}_{k-1}{\varvec{P}}_{k-1|k-1}{\varvec{F}}_{k-1}^{\mathrm {T}}+{\varvec{Q}}, \end{aligned}$$
(4)

Measurement update

$$\begin{aligned} {\varvec{K}}_{k}= & {} {\varvec{P}}_{k|k-1}{\varvec{H}}_{k}^{\mathrm {T}}({\varvec{H}}_{k} {\varvec{P}}_{k|k-1}{\varvec{H}}_{k}^{\mathrm {T}}+{\varvec{R}})^{-1}, \end{aligned}$$
(5)
$$\begin{aligned} {\hat{{\varvec{x}}}}_{k|k}= & {} {\hat{{\varvec{x}}}}_{k|k-1}+{\varvec{K}}_{k} \left( {\varvec{z}}_{k}-{\varvec{H}}_{k}{\hat{{\varvec{x}}}}_{k|k-1}-{\varvec{r}}\right) , \end{aligned}$$
(6)
$$\begin{aligned} {\varvec{P}}_{k|k}= & {} \left( {\varvec{I}}_{n}-{\varvec{K}}_{k}{\varvec{H}}_{k}\right) {\varvec{P}}_{k|k-1}, \end{aligned}$$
(7)

where \({\hat{{\varvec{x}}}}_{k|k-1}\) and \({\varvec{P}}_{k|k-1}\) denote, respectively, the one-step prediction and the corresponding prediction error covariance matrix, \({\varvec{K}}_{k}\) denotes the Kalman gain, and \({\hat{{\varvec{x}}}}_{k|k}\) and \({\varvec{P}}_{k|k}\) are, respectively, the state estimate vector and the corresponding estimation error covariance matrix.
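The time and measurement updates (3)–(7) translate directly into code. The following is a minimal sketch, assuming NumPy arrays of compatible dimensions; the function name is our own.

```python
import numpy as np

def kalman_step(x, P, F, H, q, Q, r, R, z):
    """One recursion of the Kalman filter, Eqs. (3)-(7)."""
    # Time update, Eqs. (3)-(4)
    x_pred = F @ x + q
    P_pred = F @ P @ F.T + Q
    # Measurement update, Eqs. (5)-(7)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)   # Kalman gain, Eq. (5)
    x_upd = x_pred + K @ (z - H @ x_pred - r)                # state estimate, Eq. (6)
    P_upd = (np.eye(len(x)) - K @ H) @ P_pred                # error covariance, Eq. (7)
    return x_upd, P_upd
```

In the scalar case with \(F=H=P=Q=R=1\) and zero noise means, a measurement \(z=2\) yields \(\hat{x}_{k|k}=4/3\) and \(P_{k|k}=2/3\), which can serve as a quick hand check.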

The Kalman filter is the minimum mean square error state estimator for a linear Gaussian state-space model with accurate noise statistics \(\left\{ {\varvec{q}},{\varvec{Q}},{\varvec{r}},{\varvec{R}}\right\} \). Its performance degrades severely when inaccurate mean vectors and covariance matrices of state and measurement noises are used. Unfortunately, accurate noise statistics may be unavailable in some engineering applications, such as target tracking. For example, inaccurate noise statistics may be induced by severe maneuvering when an agile target is tracked. Therefore, there is a great demand for a new AKF that can operate with inaccurate noise statistics. Next, a new AKF will be proposed based on the NIW distribution using the VB approach.

4 Main Results

4.1 A New NIW-Based Hierarchical Gaussian State-Space Model

Since the measurement noise vector \({\varvec{v}}_{k}\) has a Gaussian distribution with mean vector \({\varvec{r}}\) and covariance matrix \({\varvec{R}}\), using the measurement model (2), the likelihood PDF is formulated as

$$\begin{aligned} p({\varvec{z}}_{k}|{\varvec{x}}_{k},{\varvec{r}},{\varvec{R}})=\mathrm {N}({\varvec{z}}_{k}; {\varvec{H}}_{k}{\varvec{x}}_{k}+{\varvec{r}},{\varvec{R}}). \end{aligned}$$
(8)

In the Kalman filtering framework, the one-step prediction PDF is updated as Gaussian, i.e.,

$$\begin{aligned} p({\varvec{x}}_{k}|{\hat{{\varvec{x}}}}_{k|k-1}, {\varvec{P}}_{k|k-1}, {\varvec{z}}_{1:k-1})=\mathrm {N}({\varvec{x}}_{k};{\hat{{\varvec{x}}}}_{k|k-1}, {\varvec{P}}_{k|k-1}). \end{aligned}$$
(9)

To resist the uncertainties of noise statistics, the state vector \({\varvec{x}}_{k}\), the one-step prediction \({\hat{{\varvec{x}}}}_{k|k-1}\), the prediction error covariance matrix \({\varvec{P}}_{k|k-1}\), and the mean vector \({\varvec{r}}\) and covariance matrix \({\varvec{R}}\) of measurement noise will be jointly inferred. To this end, joint conjugate prior distributions for \(\left\{ {\hat{{\varvec{x}}}}_{k|k-1},{\varvec{P}}_{k|k-1}\right\} \) and \(\left\{ {\varvec{r}},{\varvec{R}}\right\} \) first need to be selected, since conjugacy guarantees that the posterior distribution has the same functional form as the prior distribution. In Bayesian statistics, an NIW distribution is usually used as the joint conjugate prior for the mean vector and covariance matrix of a Gaussian distribution [15]. Since both \(\left\{ {\hat{{\varvec{x}}}}_{k|k-1},{\varvec{P}}_{k|k-1}\right\} \) and \(\left\{ {\varvec{r}},{\varvec{R}}\right\} \) are the mean vectors and covariance matrices of the Gaussian PDFs in (8)–(9), their joint prior distributions \(p({\hat{{\varvec{x}}}}_{k|k-1},{\varvec{P}}_{k|k-1}|{\varvec{z}}_{1:k-1})\) and \(p({\varvec{r}},{\varvec{R}}|{\varvec{z}}_{1:k-1})\) are chosen as NIW PDFs, i.e.,

$$\begin{aligned}&p({\hat{{\varvec{x}}}}_{k|k-1},{\varvec{P}}_{k|k-1}|{\varvec{z}}_{1:k-1}) =\mathrm {NiW}({\hat{{\varvec{x}}}}_{k|k-1},{\varvec{P}}_{k|k-1};{\varvec{u}}_{k}, \alpha _{k},\omega _{k},{\varvec{\varSigma }}_{k}), \end{aligned}$$
(10)
$$\begin{aligned}&p({\varvec{r}},{\varvec{R}}|{\varvec{z}}_{1:k-1})=\mathrm {NiW}({\varvec{r}},{\varvec{R}}; {\varvec{\lambda }}_{k},\beta _{k},\nu _{k},{\varvec{\varDelta }}_{k}), \end{aligned}$$
(11)

where \({\varvec{u}}_{k}\), \(\alpha _{k}\), \(\omega _{k}\) and \({\varvec{\varSigma }}_{k}\) are, respectively, the mean vector, prior confidence parameter, dof parameter and scale matrix of \(p({\hat{{\varvec{x}}}}_{k|k-1},{\varvec{P}}_{k|k-1}|{\varvec{z}}_{1:k-1})\); \({\varvec{\lambda }}_{k}\), \(\beta _{k}\), \(\nu _{k}\) and \({\varvec{\varDelta }}_{k}\) are, respectively, the mean vector, prior confidence parameter, dof parameter and scale matrix of \(p({\varvec{r}},{\varvec{R}}|{\varvec{z}}_{1:k-1})\); and the NIW PDF \(\mathrm {NiW}({\varvec{a}},{\varvec{A}};{\varvec{\mu }},\gamma ,\delta ,{\varvec{\varLambda }})\) can be written as [1]

$$\begin{aligned} \mathrm {NiW}({\varvec{a}},{\varvec{A}};{\varvec{\mu }},\gamma ,\delta ,{\varvec{\varLambda }}) =\mathrm {N}({\varvec{a}};{\varvec{\mu }},{\varvec{A}}/\gamma )\mathrm {IW}({\varvec{A}};\delta ,{\varvec{\varLambda }}). \end{aligned}$$
(12)

Utilizing (12) in (10)–(11) gives

$$\begin{aligned}&p({\hat{{\varvec{x}}}}_{k|k-1},{\varvec{P}}_{k|k-1}|{\varvec{z}}_{1:k-1})= \mathrm {N}({\hat{{\varvec{x}}}}_{k|k-1};{\varvec{u}}_{k},{\varvec{P}}_{k|k-1}/\alpha _{k})\times \mathrm {IW}({\varvec{P}}_{k|k-1};\omega _{k},{\varvec{\varSigma }}_{k}) \nonumber \\ \end{aligned}$$
(13)
$$\begin{aligned}&p({\varvec{r}},{\varvec{R}}|{\varvec{z}}_{1:k-1})=\mathrm {N}({\varvec{r}}; {\varvec{\lambda }}_{k},{\varvec{R}}/\beta _{k})\mathrm {IW}({\varvec{R}};\nu _{k},{\varvec{\varDelta }}_{k}). \end{aligned}$$
(14)

Employing (13)–(14), the joint prior PDFs for \(\left\{ {\hat{{\varvec{x}}}}_{k|k-1},{\varvec{P}}_{k|k-1}\right\} \) and \(\left\{ {\varvec{r}},{\varvec{R}}\right\} \) can be rewritten in the following hierarchical Gaussian forms

$$\begin{aligned}&p({\hat{{\varvec{x}}}}_{k|k-1}|{\varvec{P}}_{k|k-1},{\varvec{z}}_{1:k-1})=\mathrm {N} ({\hat{{\varvec{x}}}}_{k|k-1};{\varvec{u}}_{k},{\varvec{P}}_{k|k-1}/\alpha _{k}), \end{aligned}$$
(15)
$$\begin{aligned}&p({\varvec{P}}_{k|k-1}|{\varvec{z}}_{1:k-1})=\mathrm {IW}({\varvec{P}}_{k|k-1}; \omega _{k},{\varvec{\varSigma }}_{k}), \end{aligned}$$
(16)
$$\begin{aligned}&p({\varvec{r}}|{\varvec{R}},{\varvec{z}}_{1:k-1})=\mathrm {N}({\varvec{r}}; {\varvec{\lambda }}_{k},{\varvec{R}}/\beta _{k}), \end{aligned}$$
(17)
$$\begin{aligned}&p({\varvec{R}}|{\varvec{z}}_{1:k-1})=\mathrm {IW}({\varvec{R}};\nu _{k},{\varvec{\varDelta }}_{k}). \end{aligned}$$
(18)

To capture the prior information of \(\left\{ {\hat{{\varvec{x}}}}_{k|k-1},{\varvec{P}}_{k|k-1}\right\} \), the mean values of \({\hat{{\varvec{x}}}}_{k|k-1}\) and \({\varvec{P}}_{k|k-1}\) are, respectively, set to the nominal one-step prediction \({\hat{{\varvec{x}}}}_{k|k-1}^{*}\) and the nominal prediction error covariance matrix \({\varvec{P}}_{k|k-1}^{*}\), i.e.,

$$\begin{aligned} {\varvec{u}}_{k}= & {} {\hat{{\varvec{x}}}}_{k|k-1}^{*}={\varvec{F}}_{k-1}{\hat{{\varvec{x}}}}_{k-1|k-1}+{\varvec{q}}^{*}, \end{aligned}$$
(19)
$$\begin{aligned} \frac{{\varvec{\varSigma }}_{k}}{\omega _{k}-n-1}= & {} {\varvec{P}}_{k|k-1}^{*} ={\varvec{F}}_{k-1}{\varvec{P}}_{k-1|k-1}{\varvec{F}}_{k-1}^{\mathrm {T}}+{\varvec{Q}}^{*}, \end{aligned}$$
(20)

where \({\varvec{q}}^{*}\) and \({\varvec{Q}}^{*}\) are, respectively, the nominal mean vector and covariance matrix of state noise. Let

$$\begin{aligned} \omega _{k}=\tau +n+1, \end{aligned}$$
(21)

where \(\tau \) is a tuning parameter satisfying \(\tau >0\).

Substituting (21) in (20) yields

$$\begin{aligned} {\varvec{\varSigma }}_{k}=\tau {\varvec{P}}_{k|k-1}^{*}. \end{aligned}$$
(22)

To capture the prior information of \(\left\{ {\varvec{r}},{\varvec{R}}\right\} \), the mean value of \({\varvec{r}}\) is set to the nominal mean vector \({\varvec{r}}^{*}\), and the dof parameter \(\nu _{k}\) and scale matrix \({\varvec{\varDelta }}_{k}\) are set to the previous estimates, i.e.,

$$\begin{aligned} {\varvec{\lambda }}_{k}={\varvec{r}}^{*}, \quad \nu _{k}={\hat{\nu }}_{k-1|k-1}, \quad {\varvec{\varDelta }}_{k}={\hat{{\varvec{\varDelta }}}}_{k-1|k-1}, \end{aligned}$$
(23)

where \({\hat{\nu }}_{k-1|k-1}\) and \({\hat{{\varvec{\varDelta }}}}_{k-1|k-1}\) denote, respectively, the estimates of the dof parameter and scale matrix of \({\varvec{R}}\) at time \(k-1\), and the initial dof parameter \(\nu _{0}\) and scale matrix \({\varvec{\varDelta }}_{0}\) satisfy \(\frac{{\varvec{\varDelta }}_{0}}{\nu _{0}-m-1}={\varvec{R}}^{*}\) and \(\nu _{0}>m+1\) with \({\varvec{R}}^{*}\) denoting the nominal measurement noise covariance matrix.
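The hyperparameter choices (19)–(23) can be collected into a small helper. A minimal sketch with a function name of our own, assuming NumPy arrays; note that by construction the prior mean of \({\varvec{P}}_{k|k-1}\) equals the nominal value \({\varvec{P}}_{k|k-1}^{*}\).

```python
import numpy as np

def build_niw_priors(x_est, P_est, F, q_star, Q_star, r_star,
                     tau, nu_prev, Delta_prev):
    """Set the NIW prior hyperparameters of Eqs. (19)-(23)."""
    n = len(x_est)
    u_k = F @ x_est + q_star                 # Eq. (19): nominal one-step prediction
    P_star = F @ P_est @ F.T + Q_star        # nominal prediction error covariance
    omega_k = tau + n + 1                    # Eq. (21)
    Sigma_k = tau * P_star                   # Eq. (22)
    lam_k, nu_k, Delta_k = r_star, nu_prev, Delta_prev   # Eq. (23)
    return u_k, omega_k, Sigma_k, lam_k, nu_k, Delta_k
```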

Fig. 1

Diagram of the constructed NIW-based hierarchical Gaussian state-space model

Equations (8)–(9) and (15)–(23) constitute a new NIW-based hierarchical Gaussian state-space model, whose diagram is illustrated in Fig. 1. Next, the state vector, the one-step prediction and corresponding prediction error covariance matrix, and the mean vector and covariance matrix of measurement noise, i.e., \({\varvec{\varPsi }}_{k}\triangleq \{{\varvec{x}}_{k},{\hat{{\varvec{x}}}}_{k|k-1},{\varvec{P}}_{k|k-1},{\varvec{r}},{\varvec{R}}\}\), will be jointly estimated based on the constructed hierarchical Gaussian state-space model using the VB approach, from which a new AKF with inaccurate noise statistics will be proposed.

4.2 Variational Approximations of Posterior PDFs

To infer the state vector, the one-step prediction and corresponding prediction error covariance matrix, and the mean vector and covariance matrix of measurement noise simultaneously, the joint posterior PDF \(p({\varvec{\varPsi }}_{k}|{\varvec{z}}_{1:k})\) needs to be calculated, since it provides a full description of the parameter set \({\varvec{\varPsi }}_{k}\). Unfortunately, an analytical solution for the joint posterior PDF is unavailable for the constructed hierarchical Gaussian state-space model because the required multidimensional integrals do not admit closed-form solutions. In this paper, the VB approach is employed to obtain a freeform factored approximate solution for the joint posterior PDF \(p({\varvec{\varPsi }}_{k}|{\varvec{z}}_{1:k})\), i.e., [9, 10]

$$\begin{aligned} p({\varvec{\varPsi }}_{k}|{\varvec{z}}_{1:k}){\approx }q({\varvec{x}}_{k}) q({\hat{{\varvec{x}}}}_{k|k-1})q({\varvec{P}}_{k|k-1})q({\varvec{r}})q({\varvec{R}}), \end{aligned}$$
(24)

where \(q(\cdot )\) denotes the approximate posterior PDF.

Based on the VB approach, these approximate posterior PDFs are obtained by minimizing the Kullback–Leibler divergence (KLD) between the approximate posterior PDF and the true posterior PDF, i.e., [9, 10]

$$\begin{aligned}&\left\{ q({\varvec{x}}_{k}), q({\hat{{\varvec{x}}}}_{k|k-1}), q({\varvec{P}}_{k|k-1}), q({\varvec{r}}), q({\varvec{R}})\right\} =\arg \min \nonumber \\&\mathrm {KLD}\left( q({\varvec{x}}_{k})q({\hat{{\varvec{x}}}}_{k|k-1}) q({\varvec{P}}_{k|k-1})q({\varvec{r}})q({\varvec{R}})||p({\varvec{\varPsi }}_{k}|{\varvec{z}}_{1:k})\right) , \nonumber \\ \end{aligned}$$
(25)

where \(\mathrm {KLD}\left( q({\varvec{x}})||p({\varvec{x}})\right) \triangleq \int q({\varvec{x}})\log \frac{q({\varvec{x}})}{p({\varvec{x}})}d{\varvec{x}}\) denotes the KLD operation between PDFs \(q({\varvec{x}})\) and \(p({\varvec{x}})\).

The optimal solution for (25) can be formulated as [9, 10]

$$\begin{aligned} \log q({\varvec{\theta }})=\mathrm {E}_{{\varvec{\varPsi }}_{k}^{(-{\varvec{\theta }})}}[\log p({\varvec{\varPsi }}_{k},{\varvec{z}}_{1:k})]+c_{{\varvec{\theta }}}, \end{aligned}$$
(26)

where \({\varvec{\theta }}\) is an arbitrary element of \({\varvec{\varPsi }}_{k}\), \({\varvec{\varPsi }}_{k}^{(-{\varvec{\theta }})}\) is the set of all elements in \({\varvec{\varPsi }}_{k}\) except \({\varvec{\theta }}\), and \(c_{{\varvec{\theta }}}\) denotes a constant with respect to the variable \({\varvec{\theta }}\).

However, an analytical solution for (26) is unavailable since the variational parameters are mutually coupled. The fixed-point iteration approach is often employed to obtain an approximate solution for (26). That is, the approximate posterior PDF \(q({\varvec{\theta }})\) of an arbitrary element \({\varvec{\theta }}\) is updated as \(q^{(i+1)}({\varvec{\theta }})\) at the \((i+1)\)th iteration by using the approximate posterior PDFs \(q^{(i+1)}({\varvec{\gamma }})\) and \(q^{(i)}({\varvec{\varPsi }}_{k}^{(-\{{\varvec{\theta }},{\varvec{\gamma }}\})})\) to calculate the expectation in (26), where \({\varvec{\gamma }}\) comprises the elements that have already been updated at the \((i+1)\)th iteration. The iterations converge to a local optimum of (26).

According to the Bayesian theorem and using the conditional independence properties of the constructed NIW-based hierarchical Gaussian state-space model, the joint PDF \(p({\varvec{\varPsi }}_{k},{\varvec{z}}_{1:k})\) can be formulated as

$$\begin{aligned} p({\varvec{\varPsi }}_{k},{\varvec{z}}_{1:k})= & {} p({\varvec{z}}_{k}|{\varvec{x}}_{k},{\varvec{r}},{\varvec{R}}) p({\varvec{x}}_{k}|{\varvec{z}}_{1:k-1}, {\hat{{\varvec{x}}}}_{k|k-1}, {\varvec{P}}_{k|k-1})p({\hat{{\varvec{x}}}}_{k|k-1}|{\varvec{P}}_{k|k-1},{\varvec{z}}_{1:k-1}) \nonumber \\&\quad p({\varvec{P}}_{k|k-1}|{\varvec{z}}_{1:k-1})p({\varvec{r}}|{\varvec{R}},{\varvec{z}}_{1:k-1}) p({\varvec{R}}|{\varvec{z}}_{1:k-1})p({\varvec{z}}_{1:k-1}). \end{aligned}$$
(27)

Substituting (8)–(9) and (15)–(18) in (27) yields

$$\begin{aligned}&p({\varvec{\varPsi }}_{k},{\varvec{z}}_{1:k})=\mathrm {N}({\varvec{z}}_{k};{\varvec{H}}_{k}{\varvec{x}}_{k} +{\varvec{r}},{\varvec{R}})\mathrm {N}({\varvec{x}}_{k};{\hat{{\varvec{x}}}}_{k|k-1}, {\varvec{P}}_{k|k-1})\mathrm {N}({\hat{{\varvec{x}}}}_{k|k-1};{\varvec{u}}_{k},{\varvec{P}}_{k|k-1}/\alpha _{k}) \nonumber \\&\quad \mathrm {IW}({\varvec{P}}_{k|k-1};\omega _{k},{\varvec{\varSigma }}_{k})\mathrm {N}({\varvec{r}}; {\varvec{\lambda }}_{k},{\varvec{R}}/\beta _{k})\mathrm {IW}({\varvec{R}};\nu _{k}, {\varvec{\varDelta }}_{k})p({\varvec{z}}_{1:k-1}). \end{aligned}$$
(28)

Letting \({\varvec{\theta }}={\hat{{\varvec{x}}}}_{k|k-1}\) and \({\varvec{\theta }}={\varvec{r}}\) in turn and using (28) in (26), \(q^{(i+1)}({\hat{{\varvec{x}}}}_{k|k-1})\) and \(q^{(i+1)}({\varvec{r}})\) are updated as Gaussian PDFs, i.e.,

$$\begin{aligned} q^{(i+1)}({\hat{{\varvec{x}}}}_{k|k-1})= & {} \mathrm {N}({\hat{{\varvec{x}}}}_{k|k-1}; {\hat{{\varvec{u}}}}_{k}^{(i+1)},{\hat{{\varvec{U}}}}_{k}^{(i+1)}), \end{aligned}$$
(29)
$$\begin{aligned} q^{(i+1)}({\varvec{r}})= & {} \mathrm {N}({\varvec{r}};{\hat{{\varvec{\lambda }}}}_{k}^{(i+1)}, {\hat{{\varvec{\varOmega }}}}_{k}^{(i+1)}), \end{aligned}$$
(30)

where the mean vectors \({\hat{{\varvec{u}}}}_{k}^{(i+1)}\) and \({\hat{{\varvec{\lambda }}}}_{k}^{(i+1)}\) and covariance matrices \({\hat{{\varvec{U}}}}_{k}^{(i+1)}\) and \({\hat{{\varvec{\varOmega }}}}_{k}^{(i+1)}\) are given by

$$\begin{aligned} {\hat{{\varvec{u}}}}_{k}^{(i+1)}= & {} \frac{\alpha _{k}{\varvec{u}}_{k}+{\hat{{\varvec{x}}}}_{k|k}^{(i)}}{\alpha _{k}+1}, \qquad \qquad {\hat{{\varvec{U}}}}_{k}^{(i+1)}=\frac{{\varvec{\bar{P}}}_{k|k-1}^{(i)}}{\alpha _{k}+1}, \end{aligned}$$
(31)
$$\begin{aligned} {\hat{{\varvec{\lambda }}}}_{k}^{(i+1)}= & {} \frac{\beta _{k}{\varvec{\lambda }}_{k} +\left( {\varvec{z}}_{k}-{\varvec{H}}_{k}{\hat{{\varvec{x}}}}_{k|k}^{(i)}\right) }{\beta _{k}+1}, \quad {\hat{{\varvec{\varOmega }}}}_{k}^{(i+1)}=\frac{{\varvec{\bar{R}}}_{k}^{(i)}}{\beta _{k}+1}, \end{aligned}$$
(32)

and the modified prediction error covariance matrix \({\varvec{\bar{P}}}_{k|k-1}^{(i)}\) and modified measurement noise covariance matrix \({\varvec{\bar{R}}}_{k}^{(i)}\) are formulated as

$$\begin{aligned} {\varvec{\bar{P}}}_{k|k-1}^{(i)}=\left\{ \mathrm {E}^{(i)}[{\varvec{P}}_{k|k-1}^{-1}]\right\} ^{-1}, \quad {\varvec{\bar{R}}}_{k}^{(i)}=\left\{ \mathrm {E}^{(i)}\left[ {\varvec{R}}^{-1}\right] \right\} ^{-1}, \end{aligned}$$
(33)

where the proofs of (29)–(33) are given in “Appendix A”.
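Equations (31)–(32) are simple convex combinations of the prior parameters and the current iterates. A minimal sketch with a function name of our own, treating the modified covariance matrices \({\varvec{\bar{P}}}_{k|k-1}^{(i)}\) and \({\varvec{\bar{R}}}_{k}^{(i)}\) of (33) as already computed:

```python
import numpy as np

def update_gaussian_factors(u_k, alpha_k, lam_k, beta_k, H, z,
                            x_post, P_bar, R_bar):
    """Update q(x_hat_{k|k-1}) and q(r) via Eqs. (31)-(32).

    x_post is the current state estimate x_hat_{k|k}^{(i)}; P_bar and
    R_bar are the modified covariance matrices of Eq. (33).
    """
    u_hat = (alpha_k * u_k + x_post) / (alpha_k + 1.0)              # Eq. (31)
    U_hat = P_bar / (alpha_k + 1.0)
    lam_hat = (beta_k * lam_k + (z - H @ x_post)) / (beta_k + 1.0)  # Eq. (32)
    Omega_hat = R_bar / (beta_k + 1.0)
    return u_hat, U_hat, lam_hat, Omega_hat
```

As \(\alpha_k\to\infty\) the update pins \({\hat{{\varvec{u}}}}_{k}^{(i+1)}\) to the prior mean \({\varvec{u}}_{k}\), while \(\alpha_k\to 0\) trusts the current state estimate instead; \(\beta_k\) plays the same role for the measurement noise mean.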

Letting \({\varvec{\theta }}={\varvec{P}}_{k|k-1}\) and \({\varvec{\theta }}={\varvec{R}}\) in turn and exploiting (28) in (26), \(q^{(i+1)}({\varvec{P}}_{k|k-1})\) and \(q^{(i+1)}({\varvec{R}})\) are updated as IW PDFs, i.e.,

$$\begin{aligned} q^{(i+1)}({\varvec{P}}_{k|k-1})= & {} \mathrm {IW}({\varvec{P}}_{k|k-1}; {\hat{\omega }}_{k}^{(i+1)},{\hat{{\varvec{\varSigma }}}}_{k}^{(i+1)}), \end{aligned}$$
(34)
$$\begin{aligned} q^{(i+1)}({\varvec{R}})= & {} \mathrm {IW}({\varvec{R}};{\hat{\nu }}_{k}^{(i+1)}, {\hat{{\varvec{\varDelta }}}}_{k}^{(i+1)}), \end{aligned}$$
(35)

where the dof parameters \({\hat{\omega }}_{k}^{(i+1)}\) and \({\hat{\nu }}_{k}^{(i+1)}\) and inverse scale matrices \({\hat{{\varvec{\varSigma }}}}_{k}^{(i+1)}\) and \({\hat{{\varvec{\varDelta }}}}_{k}^{(i+1)}\) are given by

$$\begin{aligned} {\hat{\omega }}_{k}^{(i+1)}= & {} \omega _{k}+2, \end{aligned}$$
(36)
$$\begin{aligned} {\hat{{\varvec{\varSigma }}}}_{k}^{(i+1)}= & {} {\varvec{A}}_{k}^{(i+1)}+{\varvec{B}}_{k}^{(i+1)} +{\varvec{\varSigma }}_{k}, \end{aligned}$$
(37)
$$\begin{aligned} {\hat{\nu }}_{k}^{(i+1)}= & {} \nu _{k}+2, \end{aligned}$$
(38)
$$\begin{aligned} {\hat{{\varvec{\varDelta }}}}_{k}^{(i+1)}= & {} {\varvec{C}}_{k}^{(i+1)}+{\varvec{D}}_{k}^{(i+1)} +{\varvec{\varDelta }}_{k}, \end{aligned}$$
(39)

and the auxiliary parameters \({\varvec{A}}_{k}^{(i+1)}\), \({\varvec{B}}_{k}^{(i+1)}\), \({\varvec{C}}_{k}^{(i+1)}\) and \({\varvec{D}}_{k}^{(i+1)}\) are, respectively, given by

$$\begin{aligned} {\varvec{A}}_{k}^{(i+1)}= & {} \mathrm {E}^{(i)}\left[ \left( {\varvec{x}}_{k}-{\hat{{\varvec{x}}}}_{k|k-1} \right) \left( {\varvec{x}}_{k}-{\hat{{\varvec{x}}}}_{k|k-1}\right) ^{\mathrm {T}}\right] , \end{aligned}$$
(40)
$$\begin{aligned} {\varvec{B}}_{k}^{(i+1)}= & {} \alpha _{k}\mathrm {E}^{(i)}\left[ \left( {\hat{{\varvec{x}}}}_{k|k-1} -{\varvec{u}}_{k}\right) ({\hat{{\varvec{x}}}}_{k|k-1}-{\varvec{u}}_{k})^{\mathrm {T}}\right] , \end{aligned}$$
(41)
$$\begin{aligned} {\varvec{C}}_{k}^{(i+1)}= & {} \mathrm {E}^{(i)}\left[ \left( {\varvec{z}}_{k} -{\varvec{H}}_{k}{\varvec{x}}_{k}-{\varvec{r}}\right) \left( {\varvec{z}}_{k}-{\varvec{H}}_{k}{\varvec{x}}_{k} -{\varvec{r}}\right) ^{\mathrm {T}}\right] , \end{aligned}$$
(42)
$$\begin{aligned} {\varvec{D}}_{k}^{(i+1)}= & {} \beta _{k}\mathrm {E}^{(i)} \left[ \left( {\varvec{r}}-{\varvec{\lambda }}_{k}\right) \left( {\varvec{r}}-{\varvec{\lambda }}_{k} \right) ^{\mathrm {T}}\right] , \end{aligned}$$
(43)

where the proofs of (34)–(43) are given in “Appendix B”.

Letting \({\varvec{\theta }}={\varvec{x}}_{k}\) and using (28) in (26), \(q^{(i+1)}({\varvec{x}}_{k})\) is updated as a Gaussian PDF, i.e.,

$$\begin{aligned} q^{(i+1)}({\varvec{x}}_{k})=\mathrm {N}({\varvec{x}}_{k};{\hat{{\varvec{x}}}}_{k|k}^{(i+1)}, {\varvec{P}}_{k|k}^{(i+1)}), \end{aligned}$$
(44)

where the mean vector \({\hat{{\varvec{x}}}}_{k|k}^{(i+1)}\) and covariance matrix \({\varvec{P}}_{k|k}^{(i+1)}\) are given by

$$\begin{aligned} {\varvec{K}}_{k}^{(i+1)}= & {} {\varvec{\bar{P}}}_{k|k-1}^{(i+1)}{\varvec{H}}_{k}^{\mathrm {T}} ({\varvec{H}}_{k}{\varvec{\bar{P}}}_{k|k-1}^{(i+1)} {\varvec{H}}_{k}^{\mathrm {T}}+{\varvec{\bar{R}}}_{k}^{(i+1)})^{-1}, \end{aligned}$$
(45)
$$\begin{aligned} {\hat{{\varvec{x}}}}_{k|k}^{(i+1)}= & {} {\hat{{\varvec{u}}}}_{k}^{(i+1)}+{\varvec{K}}_{k}^{(i+1)} ({\varvec{z}}_{k}-{\hat{{\varvec{\lambda }}}}_{k}^{(i+1)}-{\varvec{H}}_{k}{\hat{{\varvec{u}}}}_{k}^{(i+1)}), \end{aligned}$$
(46)
$$\begin{aligned} {\varvec{P}}_{k|k}^{(i+1)}= & {} \left( {\varvec{I}}_{n}-{\varvec{K}}_{k}^{(i+1)}{\varvec{H}}_{k}\right) {\varvec{\bar{P}}}_{k|k-1}^{(i+1)}, \end{aligned}$$
(47)

where \({\varvec{K}}_{k}^{(i+1)}\) denotes the modified Kalman gain, and the proofs of (44)–(47) are given in “Appendix C”.

After \(N\) fixed-point iterations, the approximate posterior PDFs of the state vector, the one-step prediction and corresponding prediction error covariance matrix, and the mean vector and covariance matrix of measurement noise are, respectively, updated as

$$\begin{aligned}&q({\varvec{x}}_{k})\approx \mathrm {N}({\varvec{x}}_{k};\hat{{\varvec{x}}}_{k|k}^{(N)}, {\varvec{P}}_{k|k}^{(N)})=\mathrm {N}({\varvec{x}}_{k};\hat{{\varvec{x}}}_{k|k},{\varvec{P}}_{k|k}), \end{aligned}$$
(48)
$$\begin{aligned}&q({\hat{{\varvec{x}}}}_{k|k-1})\approx \mathrm {N}({\hat{{\varvec{x}}}}_{k|k-1}; {\hat{{\varvec{u}}}}_{k}^{(N)},{\hat{{\varvec{U}}}}_{k}^{(N)})=\mathrm {N} ({\hat{{\varvec{x}}}}_{k|k-1};{\hat{{\varvec{u}}}}_{k|k},{\hat{{\varvec{U}}}}_{k|k}), \end{aligned}$$
(49)
$$\begin{aligned}&q({\varvec{P}}_{k|k-1})\approx \mathrm {IW}({\varvec{P}}_{k|k-1};{\hat{\omega }}_{k}^{(N)}, {\hat{{\varvec{\varSigma }}}}_{k}^{(N)})=\mathrm {IW}({\varvec{P}}_{k|k-1}; {\hat{\omega }}_{k|k},{\hat{{\varvec{\varSigma }}}}_{k|k}), \end{aligned}$$
(50)
$$\begin{aligned}&q({\varvec{r}})\approx \mathrm {N}({\varvec{r}};{\hat{{\varvec{\lambda }}}}_{k}^{(N)}, {\hat{{\varvec{\varOmega }}}}_{k}^{(N)})=\mathrm {N}({\varvec{r}};{\hat{{\varvec{\lambda }}}}_{k|k}, {\hat{{\varvec{\varOmega }}}}_{k|k}), \end{aligned}$$
(51)
$$\begin{aligned}&q({\varvec{R}})\approx \mathrm {IW}({\varvec{R}};{\hat{\nu }}_{k}^{(N)}, {\hat{{\varvec{\varDelta }}}}_{k}^{(N)})=\mathrm {IW}({\varvec{R}};{\hat{\nu }}_{k|k}, {\hat{{\varvec{\varDelta }}}}_{k|k}). \end{aligned}$$
(52)

To update the approximate posterior PDFs, we need to calculate the expectations in (33) and (40)–(43). Using (29)–(30), (34)–(35) and (44), the required expectations in (33) and (40)–(43) are calculated as follows

$$\begin{aligned} {\varvec{\bar{P}}}_{k|k-1}^{(i+1)}= & {} {\hat{{\varvec{\varSigma }}}}_{k}^{(i+1)} \bigg /\left( {\hat{\omega }}_{k}^{(i+1)}-n-1\right) , \end{aligned}$$
(53)
$$\begin{aligned} {\varvec{\bar{R}}}_{k}^{(i+1)}= & {} {\hat{{\varvec{\varDelta }}}}_{k}^{(i+1)}\bigg /\left( {\hat{\nu }}_{k}^{(i+1)} -m-1\right) , \end{aligned}$$
(54)
$$\begin{aligned} {\varvec{A}}_{k}^{(i+1)}= & {} {\varvec{P}}_{k|k}^{(i)}+{\hat{{\varvec{U}}}}_{k}^{(i+1)} +\left( {\hat{{\varvec{x}}}}_{k|k}^{(i)}-{\hat{{\varvec{u}}}}_{k}^{(i+1)}\right) \left( {\hat{{\varvec{x}}}}_{k|k}^{(i)} -{\hat{{\varvec{u}}}}_{k}^{(i+1)}\right) ^{\mathrm {T}}, \end{aligned}$$
(55)
$$\begin{aligned} {\varvec{B}}_{k}^{(i+1)}= & {} \alpha _{k}{\hat{{\varvec{U}}}}_{k}^{(i+1)}+\alpha _{k} \left( {\hat{{\varvec{u}}}}_{k}^{(i+1)}-{\varvec{u}}_{k}\right) \left( {\hat{{\varvec{u}}}}_{k}^{(i+1)} -{\varvec{u}}_{k}\right) ^{\mathrm {T}}, \end{aligned}$$
(56)
$$\begin{aligned} {\varvec{C}}_{k}^{(i+1)}= & {} {\hat{{\varvec{\varOmega }}}}_{k}^{(i+1)}+\left( {\varvec{z}}_{k} -{\varvec{H}}_{k}{\hat{{\varvec{x}}}}_{k|k}^{(i)}-{\hat{{\varvec{\lambda }}}}_{k}^{(i+1)}\right) \left( {\varvec{z}}_{k}-{\varvec{H}}_{k}{\hat{{\varvec{x}}}}_{k|k}^{(i)} -{\hat{{\varvec{\lambda }}}}_{k}^{(i+1)}\right) ^{\mathrm {T}}\nonumber \\&+{\varvec{H}}_{k}{\varvec{P}}_{k|k}^{(i)}{\varvec{H}}_{k}^{\mathrm {T}}, \end{aligned}$$
(57)
$$\begin{aligned} {\varvec{D}}_{k}^{(i+1)}= & {} \beta _{k}{\hat{{\varvec{\varOmega }}}}_{k}^{(i+1)} +\beta _{k}\left( {\hat{{\varvec{\lambda }}}}_{k}^{(i+1)}-{\varvec{\lambda }}_{k}\right) \left( {\hat{{\varvec{\lambda }}}}_{k}^{(i+1)}-{\varvec{\lambda }}_{k}\right) ^{\mathrm {T}}, \end{aligned}$$
(58)

where the proofs of (55)–(58) are given in “Appendix D”.
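As a concrete illustration, the expectations (53)–(58) amount to a few lines of linear algebra once the variational parameters from the current iteration are available. The sketch below is in Python/NumPy rather than the MATLAB used in Sect. 5, and all argument names are our own shorthand for the corresponding variational parameters (e.g., `u_prior` for \({\varvec{u}}_{k}\), `lam_prior` for \({\varvec{\lambda }}_{k}\)):

```python
import numpy as np

def expectations(Sigma_hat, omega_hat, Delta_hat, nu_hat,
                 x_post, P_post, u_hat, U_hat, lam_hat, Omega_hat,
                 u_prior, lam_prior, z, H, alpha, beta):
    """Evaluate the expectations (53)-(58) for one variational iteration (sketch)."""
    n = Sigma_hat.shape[0]
    m = Delta_hat.shape[0]
    # (53)-(54): inverse-Wishart means of P_{k|k-1} and R
    P_bar = Sigma_hat / (omega_hat - n - 1)
    R_bar = Delta_hat / (nu_hat - m - 1)
    # (55): spread of the filtered state about the modified prediction
    d = x_post - u_hat
    A = P_post + U_hat + np.outer(d, d)
    # (56): prior-weighted spread of the modified prediction about its prior mean
    e = u_hat - u_prior
    B = alpha * U_hat + alpha * np.outer(e, e)
    # (57): measurement residual spread plus propagated state uncertainty
    resid = z - H @ x_post - lam_hat
    C = Omega_hat + np.outer(resid, resid) + H @ P_post @ H.T
    # (58): prior-weighted spread of the measurement-noise mean
    f = lam_hat - lam_prior
    D = beta * Omega_hat + beta * np.outer(f, f)
    return P_bar, R_bar, A, B, C, D
```

By construction each returned matrix is symmetric, and A–D are positive semi-definite, which is the property used later in the numerical-stability argument.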

Table 1 One-time step of the proposed AKF

The proposed AKF operates recursively by combining the variational approximations of the posterior PDFs in (29)–(32), (34)–(39) and (44)–(52) with the calculations of the expectations in (53)–(58). The implementation pseudo-code for one time step of the proposed AKF is given in Table 1.
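Within each iteration, once the modified moments are in hand, the state update reduces to a standard Kalman measurement update that uses the bias-corrected innovation and the modified innovation covariance \(({\varvec{H}}_{k}{\varvec{\bar{P}}}_{k|k-1}{\varvec{H}}_{k}^{\mathrm {T}}+{\varvec{\bar{R}}}_{k})\). A minimal NumPy sketch of that inner update (variable names are ours, not Table 1's):

```python
import numpy as np

def vb_measurement_update(u_hat, P_bar, lam_hat, R_bar, z, H):
    """Kalman-type measurement update with modified moments (sketch).

    u_hat:   modified one-step prediction
    P_bar:   modified prediction error covariance
    lam_hat: modified measurement-noise mean
    R_bar:   modified measurement-noise covariance
    """
    S = H @ P_bar @ H.T + R_bar           # modified innovation covariance
    K = P_bar @ H.T @ np.linalg.inv(S)    # filter gain
    x_post = u_hat + K @ (z - H @ u_hat - lam_hat)  # bias-corrected innovation
    P_post = P_bar - K @ S @ K.T          # posterior covariance
    return x_post, P_post
```

This is the update that is repeated N times per step, each time with re-estimated moments, which is where the extra cost over a standard Kalman filter comes from.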

4.3 Parameter Selection of the Proposed AKF

To implement the proposed AKF, the nominal mean vectors \({\varvec{q}}^{*}\) and \({\varvec{r}}^{*}\) and covariance matrices \({\varvec{Q}}^{*}\) and \({\varvec{R}}^{*}\) of the state and measurement noises, the a priori confidence parameters \(\alpha _{k}\) and \(\beta _{k}\), the tuning parameter \(\tau \), and the initial dof parameter \(\nu _{0}\) need to be selected in advance.

Firstly, we derive the specific forms of the modified one-step prediction \({\hat{{\varvec{u}}}}_{k}^{(i+1)}\), the modified measurement noise mean vector \({\hat{{\varvec{\lambda }}}}_{k}^{(i+1)}\), the modified prediction error covariance matrix \({\varvec{\bar{P}}}_{k|k-1}^{(i+1)}\) and the modified measurement noise covariance matrix \({\varvec{\bar{R}}}_{k}^{(i+1)}\) at the \((i+1)\)th iteration. Substituting (19) and (23) in (31)–(32) gives

$$\begin{aligned} {\hat{{\varvec{u}}}}_{k}^{(i+1)}= & {} \frac{\alpha _{k}\left( {\varvec{F}}_{k-1} {\hat{{\varvec{x}}}}_{k-1|k-1}+{\varvec{q}}^{*}\right) +{\hat{{\varvec{x}}}}_{k|k}^{(i)}}{\alpha _{k}+1}, \end{aligned}$$
(59)
$$\begin{aligned} {\hat{{\varvec{\lambda }}}}_{k}^{(i+1)}= & {} \frac{\beta _{k}{\varvec{r}}^{*} +\left( {\varvec{z}}_{k}-{\varvec{H}}_{k}{\hat{{\varvec{x}}}}_{k|k}^{(i)}\right) }{\beta _{k}+1}. \end{aligned}$$
(60)

Using (36)–(37) and (53) in (33), the modified prediction error covariance matrix \({\varvec{\bar{P}}}_{k|k-1}^{(i+1)}\) can be rewritten as

$$\begin{aligned} {\varvec{\bar{P}}}_{k|k-1}^{(i+1)}=\frac{{\varvec{A}}_{k}^{(i+1)}+{\varvec{B}}_{k}^{(i+1)} +{\varvec{\varSigma }}_{k}}{\omega _{k}-n+1}, \end{aligned}$$
(61)

where \({\varvec{A}}_{k}^{(i+1)}\) and \({\varvec{B}}_{k}^{(i+1)}\) are, respectively, given by (55)–(56).

Substituting (20)–(22) in (61) yields

$$\begin{aligned} {\varvec{\bar{P}}}_{k|k-1}^{(i+1)}=\frac{\tau \left( {\varvec{F}}_{k-1} {\varvec{P}}_{k-1|k-1}{\varvec{F}}_{k-1}^{\mathrm {T}}+{\varvec{Q}}^{*}\right) +{\varvec{A}}_{k}^{(i+1)} +{\varvec{B}}_{k}^{(i+1)}}{\tau +2}. \end{aligned}$$
(62)

Exploiting (38)–(39) and (52), we have

$$\begin{aligned} {{\hat{\nu }}}_{k|k}= & {} \nu _{k}+2, \end{aligned}$$
(63)
$$\begin{aligned} {\hat{{\varvec{\varDelta }}}}_{k|k}= & {} {\varvec{C}}_{k}^{(N)}+{\varvec{D}}_{k}^{(N)}+{\varvec{\varDelta }}_{k}. \end{aligned}$$
(64)

Substituting (23) in (63)–(64) results in

$$\begin{aligned} {{\hat{\nu }}}_{k|k}= & {} {{\hat{\nu }}}_{k-1|k-1}+2, \end{aligned}$$
(65)
$$\begin{aligned} {\hat{{\varvec{\varDelta }}}}_{k|k}= & {} {\hat{{\varvec{\varDelta }}}}_{k-1|k-1}+{\varvec{C}}_{k}^{(N)}+{\varvec{D}}_{k}^{(N)}. \end{aligned}$$
(66)

According to the recursive relation in (65)–(66), \({\hat{\nu }}_{k-1|k-1}\) and \({\hat{{\varvec{\varDelta }}}}_{k-1|k-1}\) can be calculated as

$$\begin{aligned} {{\hat{\nu }}}_{k-1|k-1}= & {} \nu _{0}+2(k-1), \end{aligned}$$
(67)
$$\begin{aligned} {\hat{{\varvec{\varDelta }}}}_{k-1|k-1}= & {} (\nu _{0}-m-1){\varvec{R}}^{*} +\sum \limits _{j=1}^{k-1}\left[ {\varvec{C}}_{j}^{(N)}+{\varvec{D}}_{j}^{(N)}\right] , \end{aligned}$$
(68)

where \({\hat{\nu }}_{0|0}=\nu _{0}\) and \({\hat{{\varvec{\varDelta }}}}_{0|0}={\varvec{\varDelta }}_{0}=(\nu _{0}-m-1){\varvec{R}}^{*}\) are used in (67)–(68).

Employing (23) and (67)–(68) in (38)–(39) yields

$$\begin{aligned} {\hat{\nu }}_{k}^{(i+1)}= & {} \nu _{0}+2k, \end{aligned}$$
(69)
$$\begin{aligned} {\hat{{\varvec{\varDelta }}}}_{k}^{(i+1)}= & {} (\nu _{0}-m-1){\varvec{R}}^{*}+{\varvec{C}}_{k}^{(i+1)} +{\varvec{D}}_{k}^{(i+1)}+\sum \limits _{j=1}^{k-1}\left[ {\varvec{C}}_{j}^{(N)} +{\varvec{D}}_{j}^{(N)}\right] . \end{aligned}$$
(70)

Substituting (69)–(70) in (54), \({\varvec{\bar{R}}}_{k}^{(i+1)}\) can be reformulated as

$$\begin{aligned} {\varvec{\bar{R}}}_{k}^{(i+1)}= & {} \left\{ (\nu _{0}-m-1){\varvec{R}}^{*}+{\varvec{C}}_{k}^{(i+1)} +{\varvec{D}}_{k}^{(i+1)}+\right. \left. \sum \limits _{j=1}^{k-1}\left[ {\varvec{C}}_{j}^{(N)}+{\varvec{D}}_{j}^{(N)}\right] \right\} / \nonumber \\&\quad \left[ (\nu _{0}-m-1)+2k\right] . \end{aligned}$$
(71)
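The recursions (65)–(71) are cheap to implement: \({\hat{\nu }}\) grows by two per time step and \({\hat{{\varvec{\varDelta }}}}\) accumulates the innovation terms \({\varvec{C}}_{k}+{\varvec{D}}_{k}\). A scalar-measurement sketch (\(m=1\); the values of \({\varvec{C}}_{k}+{\varvec{D}}_{k}\) passed in are hypothetical):

```python
import numpy as np

def r_bar_sequence(nu0, R_star, cd_list):
    """Run the recursions (65)-(66) and return R_bar per (71) after each step.

    nu0:     initial dof parameter
    R_star:  nominal measurement noise variance (scalar, m = 1)
    cd_list: sequence of C_k + D_k values (hypothetical innovation terms)
    """
    m = 1
    Delta_hat = (nu0 - m - 1) * R_star       # Delta_{0|0}, cf. below (68)
    out = []
    for k, cd in enumerate(cd_list, start=1):
        Delta_hat += cd                      # Eq. (66): accumulate C_k + D_k
        nu_hat = nu0 + 2 * k                 # Eq. (69)
        out.append(Delta_hat / (nu_hat - m - 1))  # Eqs. (54)/(71)
    return out
```

Note that the denominator \((\nu _{0}-m-1)+2k\) in (71) grows with k, so the weight of the nominal \({\varvec{R}}^{*}\) decays as more measurements arrive.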

Next, we discuss the effects of the parameters \({\varvec{q}}^{*}\), \({\varvec{r}}^{*}\), \({\varvec{Q}}^{*}\), \({\varvec{R}}^{*}\), \(\alpha _{k}\), \(\beta _{k}\), \(\tau \) and \(\nu _{0}\) on the proposed AKF. It is observed from (59)–(60), (62) and (71) that the modified one-step prediction \({\hat{{\varvec{u}}}}_{k}^{(i+1)}\) is a weighted sum of the prior information \(\left( {\varvec{F}}_{k-1}{\hat{{\varvec{x}}}}_{k-1|k-1}+{\varvec{q}}^{*}\right) \) and the innovation \({\hat{{\varvec{x}}}}_{k|k}^{(i)}\) with weights \(\alpha _{k}\) and 1, respectively; the modified measurement noise mean vector \({\hat{{\varvec{\lambda }}}}_{k}^{(i+1)}\) is a weighted sum of the prior information \({\varvec{r}}^{*}\) and the innovation \(\left( {\varvec{z}}_{k}-{\varvec{H}}_{k}{\hat{{\varvec{x}}}}_{k|k}^{(i)}\right) \) with weights \(\beta _{k}\) and 1, respectively; the modified prediction error covariance matrix \({\varvec{\bar{P}}}_{k|k-1}^{(i+1)}\) is a weighted sum of the prior information \(\left( {\varvec{F}}_{k-1}{\varvec{P}}_{k-1|k-1}{\varvec{F}}_{k-1}^{\mathrm {T}}+{\varvec{Q}}^{*}\right) \) and the innovation \(\left( {\varvec{A}}_{k}^{(i+1)}+{\varvec{B}}_{k}^{(i+1)}\right) \) with weights \(\tau \) and 2, respectively; and the modified measurement noise covariance matrix \({\varvec{\bar{R}}}_{k}^{(i+1)}\) is a weighted sum of the prior information \({\varvec{R}}^{*}\) and the innovation \(\left\{ {\varvec{C}}_{k}^{(i+1)}+{\varvec{D}}_{k}^{(i+1)}+\sum \nolimits _{j=1}^{k-1}\left[ {\varvec{C}}_{j}^{(N)}+{\varvec{D}}_{j}^{(N)}\right] \right\} \) with weights \(\nu _{0}-m-1\) and 2k, respectively. Thus, the parameters \({\varvec{q}}^{*}\), \({\varvec{r}}^{*}\), \({\varvec{Q}}^{*}\) and \({\varvec{R}}^{*}\) determine the accuracy of the prior information, while the parameters \(\alpha _{k}\), \(\beta _{k}\), \(\tau \) and \(\nu _{0}\) determine the confidence placed in the prior information.
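The weighting interpretation of (59) can be made concrete with hypothetical numbers: as \(\alpha _{k}\rightarrow 0\) the modified prediction follows the data-driven term, while a large \(\alpha _{k}\) pins it to the prior prediction.

```python
import numpy as np

def modified_prediction(alpha, prior_pred, data_term):
    """Eq. (59): prior/data blend controlled by the confidence parameter alpha_k."""
    return (alpha * prior_pred + data_term) / (alpha + 1.0)

# Hypothetical numbers for illustration only
prior_pred = np.array([10.0, 0.0])   # F_{k-1} x_{k-1|k-1} + q*
data_term  = np.array([13.0, 3.0])   # current filtered estimate x_{k|k}^{(i)}

# alpha_k -> 0 trusts the data; alpha_k -> infinity trusts the prior
```

The same reading applies to \(\beta _{k}\) in (60), to \(\tau \) in (62), and to \(\nu _{0}\) in (71).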

Since the VB approach only guarantees the local convergence of the variational iterations, accurate prior information is necessary. To this end, the nominal parameters \({\varvec{q}}^{*}\), \({\varvec{r}}^{*}\), \({\varvec{Q}}^{*}\) and \({\varvec{R}}^{*}\) need to be close to the true values \({\varvec{q}}\), \({\varvec{r}}\), \({\varvec{Q}}\) and \({\varvec{R}}\), respectively. In this paper, the nominal mean vectors and covariance matrices of the state and measurement noises are suggested to be selected as \({\varvec{q}}^{*}=[q_{1}, \ldots , q_{i}, \ldots , q_{n}]^{\mathrm {T}}\), \({\varvec{r}}^{*}=[r_{1}, \ldots , r_{j}, \ldots , r_{m}]^{\mathrm {T}}\), \({\varvec{Q}}^{*}=\mathrm {diag}[Q_{1}, \ldots , Q_{i}, \ldots , Q_{n}]\) and \({\varvec{R}}^{*}=\mathrm {diag}[R_{1}, \ldots , R_{j}, \ldots , R_{m}]\), where \(Q_{i}>0\) and \(R_{j}>0\). The explicit selections of the parameters \(q_{i}\), \(r_{j}\), \(Q_{i}\) and \(R_{j}\) depend on the practical application, and approximate values of these parameters are available in many practical applications. The selections of the parameters \(\alpha _{k}\), \(\beta _{k}\), \(\tau \) and \(\nu _{0}\) rely heavily on the selections of the nominal parameters \({\varvec{q}}^{*}\), \({\varvec{r}}^{*}\), \({\varvec{Q}}^{*}\) and \({\varvec{R}}^{*}\) because the latter determine the accuracy of the prior information.

Finally, we study the numerical stability of the proposed AKF with the selected parameters. Since the nominal covariance matrices of the state and measurement noises are, respectively, set as \({\varvec{Q}}^{*}=\mathrm {diag}[Q_{1}, \ldots , Q_{i}, \ldots , Q_{n}]\) and \({\varvec{R}}^{*}=\mathrm {diag}[R_{1}, \ldots , R_{j}, \ldots , R_{m}]\) with \(Q_{i}>0\) and \(R_{j}>0\), both \({\varvec{Q}}^{*}\) and \({\varvec{R}}^{*}\) are positive definite matrices, i.e.,

$$\begin{aligned} {\varvec{Q}}^{*}>{\varvec{0}}, \qquad {\varvec{R}}^{*}>{\varvec{0}}. \end{aligned}$$
(72)

According to (40)–(43), the auxiliary parameters \({\varvec{A}}_{k}^{(i+1)}\), \({\varvec{B}}_{k}^{(i+1)}\), \({\varvec{C}}_{k}^{(i+1)}\) and \({\varvec{D}}_{k}^{(i+1)}\) are all positive semi-definite matrices, i.e.,

$$\begin{aligned} {\varvec{A}}_{k}^{(i+1)}\ge {\varvec{0}}, \quad {\varvec{B}}_{k}^{(i+1)}\ge {\varvec{0}}, \quad {\varvec{C}}_{k}^{(i+1)}\ge {\varvec{0}}, \quad {\varvec{D}}_{k}^{(i+1)}\ge {\varvec{0}}. \end{aligned}$$
(73)

Utilizing \({\varvec{P}}_{k-1|k-1}>{\varvec{0}}\), \(\tau >0\), \(\nu _{0}-m-1>0\) and (72)–(73) in (62) and (71) yields

$$\begin{aligned} {\varvec{\bar{P}}}_{k|k-1}^{(i+1)}>{\varvec{0}}, \qquad {\varvec{\bar{R}}}_{k}^{(i+1)}>{\varvec{0}}. \end{aligned}$$
(74)

We can see from (74) that both the modified prediction error covariance matrix \({\varvec{\bar{P}}}_{k|k-1}^{(i+1)}\) and the modified measurement noise covariance matrix \({\varvec{\bar{R}}}_{k}^{(i+1)}\) are positive definite, so the modified innovation covariance matrix \(({\varvec{H}}_{k}{\varvec{\bar{P}}}_{k|k-1}^{(i+1)}{\varvec{H}}_{k}^{\mathrm {T}}+{\varvec{\bar{R}}}_{k}^{(i+1)})\) in (45) is also positive definite. Thus, the proposed AKF is numerically stable with the selected parameters.
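In an implementation, this positive-definiteness property can be verified numerically via a Cholesky factorization, which succeeds exactly when a symmetric matrix is positive definite. A small sanity-check helper (our own utility, not part of the filter):

```python
import numpy as np

def is_positive_definite(M):
    """Return True iff the symmetric matrix M is positive definite."""
    try:
        np.linalg.cholesky(M)  # fails for indefinite or singular matrices
        return True
    except np.linalg.LinAlgError:
        return False
```

Such a guard is a cheap way to detect the kind of numerical instability reported for the SHAKF before it causes divergence.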

Remark 1

The estimation accuracy and computational complexity of the proposed filter are determined by the number of iterations N. Using more iterations yields better estimation accuracy at the cost of higher computational complexity, so the choice of N is a tradeoff between the two. In practical engineering applications, the selection of N depends on the users' requirements: if high estimation accuracy is required, a large N should be selected; conversely, if good real-time performance is required, an appropriately small N should be chosen, i.e., the computational complexity can be reduced by sacrificing some estimation accuracy.

5 Simulation Study

The proposed AKF and existing state-of-the-art adaptive Kalman filters are tested and compared in a maneuvering target tracking example. The target moves with a constant velocity in a plane, and it is tracked using a constant velocity model with noise-corrupted position observations as measurement vectors. The Cartesian coordinates and corresponding velocities form the state vector, i.e., \({\varvec{x}}_{k}\triangleq [x_{k}\;y_{k}\;\dot{x}_{k}\;\dot{y}_{k}]^{\mathrm {T}}\), where \(x_{k}\), \(y_{k}\), \(\dot{x}_{k}\) and \(\dot{y}_{k}\) denote the Cartesian coordinates and corresponding velocities, respectively. The discrete-time linear state-space model is given by (1)–(2), and the state transition matrix \({\varvec{F}}_{k}\) and measurement matrix \({\varvec{H}}_{k}\) are given by [8]

$$\begin{aligned} {\varvec{F}}_{k}=\left[ \begin{array}{cc} {\varvec{I}}_{2} &{} \quad \varDelta t{\varvec{I}}_{2} \\ {\varvec{0}} &{} \quad {\varvec{I}}_{2} \end{array}\right] , \qquad \qquad {\varvec{H}}_{k}=\left[ \begin{array}{cc} {\varvec{I}}_{2} &{} \quad {\varvec{0}} \end{array}\right] , \end{aligned}$$
(75)

where the sampling interval is \(\varDelta t=1\,\mathrm {s}\).
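With \(\varDelta t = 1\,\mathrm{s}\), the matrices in (75) can be assembled directly; a NumPy sketch:

```python
import numpy as np

dt = 1.0
I2 = np.eye(2)
Z2 = np.zeros((2, 2))

F = np.block([[I2, dt * I2],   # positions advanced by velocity * dt
              [Z2, I2]])       # velocities unchanged
H = np.block([[I2, Z2]])       # only the two positions are measured
```

Applying F to a state vector advances each position by the matching velocity component, and H picks out the position components.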

The state and measurement noises are assumed to have stationary Gaussian distributions, i.e., \({\varvec{w}}_{k}\sim \mathrm {N}({\varvec{q}},{\varvec{Q}})\) and \({\varvec{v}}_{k}\sim \mathrm {N}({\varvec{r}},{\varvec{R}})\), where the true mean vectors and covariance matrices are given by

$$\begin{aligned} {\varvec{q}}= & {} [1\;\;1\;\;1\;\;1]^{\mathrm {T}}, \quad {\varvec{Q}}=3\left[ \begin{array}{cc}\frac{\varDelta {t}^{3}}{3}{\varvec{I}}_{2} &{}\quad \frac{\varDelta {t}^{2}}{2}{\varvec{I}}_{2}\\ \frac{\varDelta {t}^{2}}{2}{\varvec{I}}_{2} &{}\quad \varDelta {t}{\varvec{I}}_{2}\end{array}\right] , \end{aligned}$$
(76)
$$\begin{aligned} {\varvec{r}}= & {} [1\;\;1]^{\mathrm {T}}, \quad {\varvec{R}}= \left[ \begin{array}{cc} 10 &{}\quad 5 \\ 5 &{}\quad 10 \end{array}\right] . \end{aligned}$$
(77)
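With \(\varDelta t = 1\,\mathrm{s}\), the true noise statistics (76)–(77) evaluate to constant matrices; in particular, the discretized constant-velocity process noise covariance \({\varvec{Q}}\) is symmetric positive definite. A NumPy sketch:

```python
import numpy as np

dt = 1.0
I2 = np.eye(2)

q = np.ones(4)                                   # true state-noise mean, Eq. (76)
Q = 3.0 * np.block([[dt**3 / 3 * I2, dt**2 / 2 * I2],
                    [dt**2 / 2 * I2, dt * I2]])  # true state-noise covariance
r = np.ones(2)                                   # true measurement-noise mean, Eq. (77)
R = np.array([[10.0, 5.0],
              [5.0, 10.0]])                      # true measurement-noise covariance
```

Both Q and R differ substantially from the nominal values used below (\({\varvec{Q}}^{*}={\varvec{I}}_{4}\), \({\varvec{R}}^{*}=100{\varvec{I}}_{2}\)), which is what makes the example a test of adaptivity.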

In this simulation, the nominal mean vectors and covariance matrices of state and measurement noises are, respectively, set as \({\varvec{q}}^{*}={\varvec{0}}_{4\times 1}\), \({\varvec{r}}^{*}={\varvec{0}}_{2\times 1}\), \({\varvec{Q}}^{*}={\varvec{I}}_{4}\) and \({\varvec{R}}^{*}=100{\varvec{I}}_{2}\). To show the effectiveness and superiority of the proposed AKF, we compare the performance of the Kalman filter with nominal noise statistics (KFNNS), the Kalman filter with true noise statistics (KFTNS), the existing SHAKF [2], the existing VBAKF for estimating \({\varvec{P}}_{k|k-1}\) and \({\varvec{R}}\) (VBAKF-PR) [6], and the proposed AKF with inaccurate noise statistics. Note that, in this simulation, the existing SHAKF often diverges, so its simulation results are not given in the following analysis and comparison. The algorithm parameters of the existing VBAKF-PR are selected as: the tuning parameter \(\tau =3\) and the forgetting factor \(\rho =1\). The algorithm parameters of the proposed AKF are set as: the tuning parameter \(\tau =3\), the a priori confidence parameters \(\alpha _{k}=0.5\) and \(\beta _{k}=3\), and the initial dof parameter \(\nu _{0}=4\). Moreover, in the existing VBAKF-PR, the mean vectors of state and measurement noises are, respectively, set as \({\varvec{q}}^{*}\) and \({\varvec{r}}^{*}\) due to its inability to estimate unknown mean vectors. The true initial state vector is \({\varvec{x}}_{0}=[100,100,10,10]^{\mathrm {T}}\), the initial estimation error covariance matrix is \({\varvec{P}}_{0|0}=10{\varvec{I}}_{4}\), and the initial state estimate \({\hat{{\varvec{x}}}}_{0|0}\) is randomly drawn from \(\mathrm {N}({\varvec{x}}_{0},{\varvec{P}}_{0|0})\). The number of measurements is 200, the number of iterations is set as \(N=10\), and 1000 independent Monte Carlo runs are performed. All Kalman filtering algorithms are coded in MATLAB and run on a computer with an Intel Core i7-6500U CPU at 2.50 GHz.

The root mean square errors (RMSEs) and the averaged root mean square errors (ARMSEs) of position and velocity are utilized to evaluate the estimation accuracy of the state vector. We define the RMSE and ARMSE of position as follows

$$\begin{aligned} \left\{ \begin{array}{l} \mathrm {RMSE_{pos}}=\sqrt{\frac{1}{M}\sum \limits _{j=1}^{M}\left( (x_{k}^{j} -\hat{x}_{k|k}^{j})^{2}+(y_{k}^{j}-\hat{y}_{k|k}^{j})^{2}\right) } \\ \mathrm {ARMSE_{pos}}=\sqrt{\frac{1}{MT}\sum \limits _{k=1}^{T} \sum \limits _{j=1}^{M}\left( (x_{k}^{j}-\hat{x}_{k|k}^{j})^{2}+(y_{k}^{j} -\hat{y}_{k|k}^{j})^{2}\right) } \end{array}\right. , \end{aligned}$$
(78)

where \((x_{k}^{j},y_{k}^{j})\) and \((\hat{x}_{k|k}^{j},\hat{y}_{k|k}^{j})\) denote, respectively, the true position and the filtered position estimate at the j-th Monte Carlo run, \(T=200\) denotes the number of simulation steps, and \(M=1000\) denotes the total number of Monte Carlo runs. The RMSE and ARMSE of velocity, denoted by \(\mathrm {RMSE_{vel}}\) and \(\mathrm {ARMSE_{vel}}\), are formulated similarly.
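The metrics in (78) are straightforward to compute from Monte Carlo error arrays; a NumPy sketch (array shapes are our convention: per-run errors at one step for the RMSE, and a time-by-run grid for the ARMSE):

```python
import numpy as np

def rmse_pos(x_err, y_err):
    """RMSE over M runs at one time step; x_err, y_err have shape (M,)."""
    return np.sqrt(np.mean(x_err**2 + y_err**2))

def armse_pos(x_err, y_err):
    """ARMSE over T steps and M runs; x_err, y_err have shape (T, M)."""
    return np.sqrt(np.mean(x_err**2 + y_err**2))
```

Because `np.mean` averages over all elements, the same expression realizes both the \(1/M\) and \(1/(MT)\) normalizations in (78).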

The square root of normalized Euclidean norm (SRNEN) and averaged SRNEN (ASRNEN) are employed to evaluate the estimation accuracy of the one-step prediction and measurement noise mean vector. We define the SRNEN and ASRNEN of the one-step prediction as follows

$$\begin{aligned} \left\{ \begin{array}{l} \mathrm {SRNEN}_{x}\triangleq \left( \frac{1}{nM}\sum \limits _{j=1}^{M} \Vert {\hat{{\varvec{x}}}}_{k|k-1}^{j}-{\hat{{\varvec{x}}}}_{\mathrm {o},k|k-1}^{j} \Vert _{2}^{2}\right) ^{\frac{1}{4}} \\ \mathrm {ASRNEN}_{x}\triangleq \left( \frac{1}{nMT}\sum \limits _{k=1}^{T} \sum \limits _{j=1}^{M}\Vert {\hat{{\varvec{x}}}}_{k|k-1}^{j}-{\hat{{\varvec{x}}}}_{\mathrm {o}, k|k-1}^{j}\Vert _{2}^{2}\right) ^{\frac{1}{4}} \end{array}\right. , \end{aligned}$$
(79)

where \(\Vert \cdot \Vert _{2}\) denotes the Euclidean norm, \({\hat{{\varvec{x}}}}_{k|k-1}^{j}\) denotes the estimate of the one-step prediction at the j-th Monte Carlo run, and \({\hat{{\varvec{x}}}}_{\mathrm {o},k|k-1}^{j}\) denotes the accurate one-step prediction at the j-th Monte Carlo run provided by the KFTNS. The SRNEN and ASRNEN of the measurement noise mean vector, denoted by \(\mathrm {SRNEN}_{r}\) and \(\mathrm {ASRNEN}_{r}\), are formulated similarly.
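Note the quartic root in (79): the quantity inside the parentheses is a mean of squared norms, so raising it to the power 1/4 gives the square root of a normalized norm. A minimal sketch for one time step:

```python
import numpy as np

def srnen(x_est, x_ref):
    """SRNEN at one time step per (79); x_est, x_ref have shape (M, n)."""
    M, n = x_est.shape
    return (np.sum((x_est - x_ref) ** 2) / (n * M)) ** 0.25
```

Averaging the inner sum over time as well (dividing by nMT) gives the ASRNEN.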

Fig. 2

RMSEs of position and velocity

Fig. 3

SRNENs of \({\hat{{\varvec{x}}}}_{k|k-1}\) and \({\varvec{r}}\)

Fig. 4

SRNFNs of \({\varvec{P}}_{k|k-1}\) and \({\varvec{R}}\)

Fig. 5

ARMSEs of position and velocity for different numbers of iterations \(N=1,2,\ldots ,20\)

Fig. 6

ASRNENs of \({\hat{{\varvec{x}}}}_{k|k-1}\) and \({\varvec{r}}\) for different numbers of iterations \(N=1,2,\ldots ,20\)

Fig. 7

ASRNFNs of \({\varvec{P}}_{k|k-1}\) and \({\varvec{R}}\) for different numbers of iterations \(N=1,2,\ldots ,20\)

The square root of the normalized Frobenius norm (SRNFN) and the averaged SRNFN (ASRNFN) are used to evaluate the estimation accuracy of the prediction error and measurement noise covariance matrices. The SRNFN and ASRNFN of the prediction error covariance matrix are defined as follows [6]

$$\begin{aligned} \left\{ \begin{array}{l} \mathrm {SRNFN}_{P}\triangleq \left( \frac{1}{n^{2}M}\sum \limits _{j=1}^{M}\Vert {\hat{{\varvec{P}}}}_{k|k-1}^{j}-{\varvec{P}}_{\mathrm {o},k|k-1}^{j}\Vert _{F}^{2}\right) ^{\frac{1}{4}}\\ \mathrm {ASRNFN}_{P}\triangleq \left( \frac{1}{n^{2}MT}\sum \limits _{k=1}^{T} \sum \limits _{j=1}^{M}\Vert {\hat{{\varvec{P}}}}_{k|k-1}^{j}-{\varvec{P}}_{\mathrm {o},k|k-1}^{j}\Vert _{F}^{2} \right) ^{\frac{1}{4}} \end{array}\right. , \end{aligned}$$
(80)

where \(\Vert \cdot \Vert _{F}\) denotes the Frobenius norm, \({\hat{{\varvec{P}}}}_{k|k-1}^{j}\) denotes the estimate of the prediction error covariance matrix at the j-th Monte Carlo run, and \({\varvec{P}}_{\mathrm {o},k|k-1}^{j}\) denotes the accurate prediction error covariance matrix at the j-th Monte Carlo run provided by the KFTNS. The SRNFN and ASRNFN of the measurement noise covariance matrix, denoted by \(\mathrm {SRNFN}_{R}\) and \(\mathrm {ASRNFN}_{R}\), are formulated similarly.
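The SRNFN in (80) is the matrix analogue of the SRNEN, with the Frobenius norm and an \(n^{2}\) normalization; a one-step sketch:

```python
import numpy as np

def srnfn(P_est, P_ref):
    """SRNFN at one time step per (80); P_est, P_ref have shape (M, n, n)."""
    M, n, _ = P_est.shape
    # Frobenius norm squared is the sum of squared entries
    return (np.sum((P_est - P_ref) ** 2) / (n ** 2 * M)) ** 0.25
```

As with (79), averaging over time as well (dividing by \(n^{2}MT\)) gives the ASRNFN.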

Figures 2, 3 and 4 show, respectively, the RMSEs of position and velocity, the SRNENs of the one-step prediction \({\hat{{\varvec{x}}}}_{k|k-1}\) and measurement noise mean vector \({\varvec{r}}\), and the SRNFNs of the prediction error covariance matrix \({\varvec{P}}_{k|k-1}\) and measurement noise covariance matrix \({\varvec{R}}\), where the black and blue lines coincide in the bottom subfigure of Fig. 3. It is seen from Fig. 2 that the proposed AKF has smaller RMSEs of position and velocity than the existing KFNNS and VBAKF-PR, and that it achieves almost the same RMSEs of position as the optimal KFTNS. We can also see from Figs. 3 and 4 that the proposed AKF has smaller SRNENs and SRNFNs than the existing KFNNS and VBAKF-PR. Furthermore, the implementation times of the proposed AKF, the existing KFNNS and the existing VBAKF-PR for a single time step are, respectively, 0.63 ms, 0.02 ms and 0.46 ms. Thus, compared with the existing KFNNS and VBAKF-PR, the proposed AKF estimates the statistical parameters \(\left\{ {\hat{{\varvec{x}}}}_{k|k-1}, {\varvec{r}}, {\varvec{P}}_{k|k-1}, {\varvec{R}}\right\} \) more accurately, which results in improved state estimation accuracy at the cost of higher computational complexity.

Fig. 8

RMSEs of position and velocity for different a priori confidence parameters \(\alpha _{k}=0.5, 1.0, 1.5, 2.0, 2.5, 3.0\)

Fig. 9

RMSEs of position and velocity for different a priori confidence parameters \(\beta _{k}=2, 3, 4, 5, 6, 7\)

Figures 5, 6 and 7 show, respectively, the ARMSEs of position and velocity, the ASRNENs of \({\hat{{\varvec{x}}}}_{k|k-1}\) and \({\varvec{r}}\), and the ASRNFNs of \({\varvec{P}}_{k|k-1}\) and \({\varvec{R}}\) for different numbers of iterations \(N=1,2,\ldots ,20\), where the black and blue lines coincide in the bottom subfigure of Fig. 6. It can be observed from Figs. 5, 6 and 7 that the proposed AKF has smaller ARMSEs, ASRNENs and ASRNFNs than the existing KFNNS and VBAKF-PR when \(N\ge 3\), and that it has almost the same ARMSEs of position as the optimal KFTNS when \(N\ge 5\). Thus, the proposed AKF has better accuracy for jointly estimating position, velocity and the statistical parameters \(\left\{ {\hat{{\varvec{x}}}}_{k|k-1}, {\varvec{r}}, {\varvec{P}}_{k|k-1}, {\varvec{R}}\right\} \) when \(N\ge 3\). Moreover, we can observe from Figs. 5, 6 and 7 that the ARMSEs, ASRNENs and ASRNFNs of the proposed AKF all converge when \(N\ge 10\). Therefore, \(N=10\) iterations are enough to guarantee the performance of the proposed AKF.

Fig. 10

RMSEs of position and velocity for different tuning parameters \(\tau =1, 2, 3, 4, 5, 6\)

Fig. 11

RMSEs of position and velocity for different initial dof parameters \(\nu _{0}=3.1, 3.5, 4.0, 4.5, 5.0, 5.5\)

Figures 8, 9, 10 and 11 show the RMSEs of position and velocity for different a priori confidence parameters \(\alpha _{k}=0.5, 1.0, 1.5, 2.0, 2.5, 3.0\) and \(\beta _{k}=2, 3, 4, 5, 6, 7\), different tuning parameters \(\tau =1, 2, 3, 4, 5, 6\), and different initial dof parameters \(\nu _{0}=3.1, 3.5, 4.0, 4.5, 5.0, 5.5\). It is seen from Fig. 8 that the proposed AKF with any of the selected \(\alpha _{k}\) has better estimation accuracy than the existing KFNNS and VBAKF-PR, and that \(\alpha _{k}=0.5\) achieves the best estimation accuracy. It can be seen from Figs. 9 and 10 that the proposed AKFs with the selected \(\beta _{k}\) and \(\tau \) have almost identical estimation accuracy and are more accurate than the existing KFNNS and VBAKF-PR. We can see from Fig. 11 that the proposed AKFs with the selected initial dof parameters have almost identical position estimation accuracy and better estimation accuracy than the existing KFNNS and VBAKF-PR.

6 Conclusion

In this paper, a new AKF was proposed for a linear Gaussian state-space model with inaccurate noise statistics based on the VB approach. To resist the uncertainties of the noise statistics, the state vector, the one-step prediction and corresponding prediction error covariance matrix, and the mean vector and covariance matrix of the measurement noise were jointly estimated using the VB approach. The prior joint PDF of the one-step prediction and corresponding prediction error covariance matrix and the prior joint PDF of the mean vector and covariance matrix of the measurement noise were both selected as NIW distributions, based on which the posterior PDFs of these unknown parameters were, respectively, approximated by Gaussian and IW distributions using the VB approach. Simulation results of a target tracking example illustrated that the proposed AKF achieves better estimation accuracy than existing state-of-the-art AKFs, at the cost of higher computational complexity, because it estimates the one-step prediction, the corresponding prediction error covariance matrix, and the mean vector and covariance matrix of the measurement noise more accurately than those filters.