1 Introduction

The classical Kalman filter (KF) is a minimum mean-square-error (MSE) estimator for linear systems under Gaussian noise. Improved KFs have been extensively studied to meet the demands of current, more complex systems. For instance, a KF accounting for noise correlated one epoch apart outperforms the classical KF when the measurement depends on the previous state [9]. In applications such as navigation, target tracking, and localization, noise plays an important role in the KF, but its statistics are usually unknown and outliers may appear. Huang et al. [11] found that the KF is suboptimal for non-Gaussian process or measurement noise with outliers, as the required Gaussian assumptions are not satisfied. To overcome this problem, the contaminated Gaussian distribution [18] considers outliers in noise modeling, and recursion is based on the Bayesian framework with a first-order approximation of the prior distribution of the state. Another approach to model the effect of outliers is to use the Student’s t-distribution [19], which has heavier tails than the Gaussian distribution, assigning non-negligible probabilities to outliers and enabling filters to deal with outliers natively [1]. A linear filter based on the Student’s t-distribution has been proposed assuming that the predicted state and measurement both follow Student’s t-distributions, and the posterior probability density function (PDF) is approximated by a Student’s t-distribution by adjusting matrix parameters and the degrees of freedom (DOF) [15]. However, this scheme is limited to a specific noise model in which the process and measurement t-distributions have the same number of DOF. Huang and Zhang [8] propose a robust stochastic cubature filter based on the Student’s t-distribution, modeling heavy-tailed process and measurement noises as t-distributions with varying DOF, thus extending stochastic numerical integration beyond the Gaussian assumption.

Recently, estimation using a Student-t noise model has been combined with the expectation maximization algorithm [10] and the variational Bayes (VB) approach [11, 24]. Expectation maximization iteratively determines the maximum likelihood estimates of parameters for models with latent variables. Its estimation accuracy is related to the number of available observations, and it is often considered a special case of the VB approach when the approximate density \( q\left( \theta \right) \) of parameter \( \theta \) satisfies \( q\left( \theta \right) = \delta \left( {\theta - \theta^{ * } } \right) \), with \( \theta^{ * } \) being the true value and δ representing a delta function [21]. In fact, the VB approach is an advanced Bayes estimator in the presence of latent variables [11, 24]. Compared with expectation maximization, in which only the single most probable value is estimated by maximum likelihood [4], the VB approach provides posterior distributions of both the parameters and the latent variables [13], making it an effective alternative to traditional Bayesian methods and expectation maximization [22]. Huang et al. [11] derive a hierarchical Gaussian estimation algorithm based on the Student’s t-distribution in the presence of heavy-tailed process and measurement noise. They combine VB principles with the conventional KF, consider the coupling between the state and the noise variance, and estimate the state and parameters online by alternating recursion and updating. The VB approach has been successfully used in many other research fields, such as continuous-discrete stochastic dynamic systems [2] and nonlinear dynamical systems [3, 16].

The mean values of the noise parameters are also important for determining the Gaussian or Student’s t-distribution; in applications such as navigation [7] and odometry [14], the mean is usually nonzero and time-varying due to factors such as sensor offset. Mean estimation has been investigated and theoretically developed using approaches such as maximum a posteriori with one-step smoothing [7] and the VB approach [25]. In [25], a Gaussian mixture measurement model is utilized, but details on the derivation are not presented. From the perspective of Bayesian inference, the mean is treated as a random variable coupled with the state and variance, and thus the posterior distribution deduced from the VB approach differs from the previous expressions without a mean parameter, as detailed in [11, 24].

On the other hand, lower bounds indicate performance limitations, and hence they can be used to determine whether performance requirements can be satisfied [20, 26]. Specifically, the Bayesian Cramer–Rao lower bounds (CRLBs), also called posterior CRLBs, are used to evaluate performance limitations for unknown random parameters with known prior distributions. Although several variations of the Bayesian CRLB are available, lower bounds for linear approximate estimation with the VB approach have seldom been studied.

Motivated by the above-mentioned aspects, in this paper we use the Student’s t-distribution to model non-Gaussian measurement noise considering mean parameters. First, the VB approach estimates the distributions of the parameters and state of a linear system, where the conjugate prior PDFs of mean and scale matrix are modeled as Gaussian–Wishart distributions. Then, asymptotic Bayesian CRLBs for VB estimators of the linear filter under the Student-t measurement noise are derived to establish the lower bound of error variance for multivariate linear estimation. Both the state and parameters are considered random, and the hybrid derivation rule of the CRLB is employed.

The rest of this paper is organized as follows. The problem formulation is detailed in Sect. 2, and the estimation of parameters and state under Student-t measurement distribution is derived in Sect. 3. Then, the asymptotic Bayesian CRLBs for the estimated parameters and state are provided in Sect. 4. Simulation and comparison results on the stochastic resonator model and the integrated strapdown inertial navigation system (SINS)/Doppler velocity log (DVL) shipborne test are presented in Sect. 5. Finally, we draw conclusions in Sect. 6.

2 Problem Formulation

Consider the following state-space model, which is applicable to integrated navigation and target tracking [7]:

$$ \left\{ {\begin{array}{*{20}l} {\varvec{x}_{k} = \varvec{F}_{k - 1} \varvec{x}_{k - 1} + \varvec{G}_{k - 1} \varvec{w}_{{k{ - 1}}} } \hfill \\ {\varvec{z}_{k} = \varvec{H}_{k} \varvec{x}_{k} + \varvec{v}_{k} } \hfill \\ \end{array} } \right., $$
(1)

where \( \varvec{x}_{k} \in \text{R}^{n} \) is the system state, \( \varvec{z}_{k} \in \text{R}^{m} \) is the measurement data, \( \varvec{F}_{k} \in \text{R}^{n \times n} \) is the state transition matrix, \( \varvec{G}_{k} \in \text{R}^{n \times p} \) is the system noise matrix, and \( \varvec{w}_{k} \in \text{R}^{p} \) is the process noise with distribution \( p\left( {\varvec{w}_{k} } \right) = N\left( {\varvec{w}_{k} \left| {{\mathbf{0}}_{p \times 1} ,\varvec{Q}_{k} } \right.} \right) \), where \( N\left( { \cdot \left| {\varvec{\mu}_{k} ,{\varvec{\Sigma}}_{k} } \right.} \right) \) denotes the Gaussian PDF with mean vector \( \varvec{\mu}_{k} \) and covariance matrix \( {\varvec{\Sigma}}_{k} \), whose inverse is called the scale matrix. In addition, \( \varvec{H}_{k} \in \text{R}^{m \times n} \) is the measurement matrix, and all the matrices in Eq. (1) are assumed to be known. The initial state \( \varvec{x}_{0} \) follows the Gaussian distribution \( p\left( {\varvec{x}_{0} } \right) = N\left( {\varvec{x}_{0} \left| {\hat{\varvec{x}}_{0\left| 0 \right.} ,\varvec{P}_{0\left| 0 \right.} } \right.} \right) \), and \( \varvec{x}_{0} \), \( \varvec{w}_{k} \), and \( \varvec{v}_{k} \) are mutually independent. We model measurement noise \( \varvec{v}_{k} \in \text{R}^{m} \) with a symmetric Student’s t-distribution, whose PDF is given by [6]

$$ p\left( {\varvec{v}_{k} } \right) = \text{S}\left( {\varvec{v}_{k} \left| {\varvec{\mu}_{k} ,\varvec{\varLambda}_{k} ,\nu_{k} } \right.} \right) = \frac{{\Gamma \left( {\left( {\nu_{k} + m} \right)/2} \right)}}{{\Gamma \left( {\nu_{k} /2} \right)}}\frac{{\left| {\varvec{\varLambda}_{k} } \right|^{1/2} }}{{(\pi \nu_{k} )^{m/2} }}\left[ {1 + \frac{1}{{\nu_{k} }}(\varvec{v}_{k} -\varvec{\mu}_{k} )^{\text{T}}\varvec{\varLambda}_{k} (\varvec{v}_{k} -\varvec{\mu}_{k} )} \right]^{{ - (\nu_{k} + m)/2}} , $$
(2)

where \( \text{S}\left( { \cdot \left| {\varvec{\mu}_{k} ,{\varvec{\Lambda}}_{k} ,\nu_{k} } \right.} \right) \) denotes the PDF of the Student’s t-distribution with mean vector \( \varvec{\mu}_{k} \), scale matrix \( {\varvec{\Lambda}}_{k} \), and DOF \( \nu_{k} \); \( \left| \cdot \right| \) and \( \Gamma \left( \cdot \right) \) represent the determinant of a square matrix and the Gamma function, respectively. Note that \( \left( {{\varvec{\Lambda}}_{k} } \right)^{ - 1} \) is the nominal covariance of the t-distribution.
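For illustration, the density in Eq. (2) can be evaluated numerically. The following minimal Python sketch (the function name and the NumPy/SciPy dependencies are our choices for illustration, not part of the original formulation) computes the log-PDF directly from Eq. (2):

```python
import numpy as np
from scipy.special import gammaln

def student_t_logpdf(v, mu, Lam, nu):
    """Log-PDF of the multivariate Student's t-distribution of Eq. (2).

    v, mu: (m,) arrays; Lam: (m, m) scale matrix; nu: DOF.
    inv(Lam) is the nominal covariance; the actual covariance is
    nu / (nu - 2) * inv(Lam) for nu > 2.
    """
    m = v.shape[0]
    d = v - mu
    quad = d @ Lam @ d                      # (v - mu)^T Lam (v - mu)
    _, logdet = np.linalg.slogdet(Lam)      # log |Lam|
    return (gammaln((nu + m) / 2) - gammaln(nu / 2)
            + 0.5 * logdet - (m / 2) * np.log(np.pi * nu)
            - ((nu + m) / 2) * np.log1p(quad / nu))
```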

As the Student’s t-distribution can be expressed as an infinite mixture of Gaussian distributions with common mean and variances scaled by the Gamma distribution, likelihood PDF \( p\left( {\varvec{z}_{k} \left| {\varvec{x}_{k} } \right.} \right)\; \) is expressed as [8]

$$ p\left( {\varvec{z}_{k} \left| {\varvec{x}_{k} } \right.} \right) = \int\limits_{0}^{ + \infty } {N\left( {\varvec{z}_{k} \left| {\varvec{H}_{k} \varvec{x}_{k} +\varvec{\mu}_{k} ,\left( {u_{k}\varvec{\varLambda}_{k} } \right)^{ - 1} } \right.} \right) \times G\left( {u_{k} \left| {\nu_{k} /2,\nu_{k} /2} \right.} \right){\text{d}}u_{k} } , $$
(3)

where \( u_{k} \) is an introduced latent variable, \( G\left( { \cdot \left| {\alpha ,\beta } \right.} \right) \) is the Gamma distribution PDF with shape parameter \( \alpha \) and rate parameter \( \beta \), and the PDF of \( u_{k} \) is given by

$$ p\left( {u_{k} } \right) = G\left( {u_{k} \left| {\alpha ,\beta } \right.} \right) = \frac{{\beta^{\alpha } }}{{\Gamma \left( \alpha \right)}}\left( {u_{k} } \right)^{\alpha - 1} \exp \left\{ { - u_{k} \beta } \right\}. $$
(4)

Likelihood PDF \( p\left( {\varvec{z}_{k} \left| {\varvec{x}_{k} } \right.} \right)\; \) can be rewritten in the hierarchical Gaussian form:

$$ \begin{aligned} p\left( {\varvec{z}_{k} \left| {\varvec{x}_{k} ,\varvec{\mu}_{k} ,u_{k} ,\varvec{\varLambda}_{k} } \right.} \right) & = N\left( {\varvec{z}_{k} \left| {\varvec{H}_{k} \varvec{x}_{k} +\varvec{\mu}_{k} ,\left( {u_{k}\varvec{\varLambda}_{k} } \right)^{ - 1} } \right.} \right), \\ p\left( {u_{k} \left| {\nu_{k} } \right.} \right) & = G\left( {u_{k} \left| {\nu_{k} /2,\nu_{k} /2} \right.} \right). \\ \end{aligned} $$
(5)
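The two-stage form of Eq. (5) also gives a direct sampling recipe, which provides a useful numerical check of the mixture representation in Eq. (3). A sketch with illustrative values (all names and values are ours): for \( \nu_{k} > 2 \), the sample covariance of the generated noise should approach the Student-t covariance \( \nu_{k} /(\nu_{k} - 2) \cdot ({\varvec{\Lambda}}_{k})^{-1} \).

```python
import numpy as np

rng = np.random.default_rng(0)
m, nu = 2, 5.0
mu = np.array([0.5, -1.0])
Lam = np.array([[2.0, 0.3], [0.3, 1.0]])     # scale matrix
cov_nominal = np.linalg.inv(Lam)             # nominal covariance (Lam)^{-1}

# Eq. (5): u ~ Gamma(nu/2, rate nu/2), then v | u ~ N(mu, (u * Lam)^{-1}).
n = 200_000
u = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)   # rate nu/2 = scale 2/nu
v = mu + rng.multivariate_normal(np.zeros(m), cov_nominal, size=n) / np.sqrt(u)[:, None]

print(np.cov(v.T))                 # ~ nu / (nu - 2) * inv(Lam)
print(nu / (nu - 2) * cov_nominal)
```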

Parameter estimation under the Student-t measurement model can thus be formulated under the Bayesian inference framework. One-step predicted PDF \( p\left( {\varvec{x}_{k} \left| {\varvec{z}_{1:k - 1} } \right.} \right) \) is assumed to follow a Gaussian distribution:

$$ p\left( {\varvec{x}_{k} \left| {\varvec{z}_{1:k - 1} } \right.} \right) = N\left( {\varvec{x}_{k} \left| {\hat{\varvec{x}}_{{k\left| {k - 1} \right.}} ,\varvec{P}_{{k\left| {k - 1} \right.}} } \right.} \right). $$
(6)

The prior distribution for DOF \( \nu_{k} \) is assumed to follow a Gamma distribution [11]:

$$ p\left( {\nu_{k} } \right) = G\left( {\nu_{k} \left| {a_{{k\left| {k - 1} \right.}} ,b_{{k\left| {k - 1} \right.}} } \right.} \right). $$
(7)

As given in [23], the joint conjugate prior distribution for mean vector \( \varvec{\mu}_{k} \) and scale matrix \( {\varvec{\Lambda}}_{k} \) follows a Gaussian–Wishart distribution and factorizes into its constituent distributions:

$$ p\left( {\varvec{\mu}_{k} ,{\varvec{\Lambda}}_{k} } \right) = p\left( {\varvec{\mu}_{k} \left| {{\varvec{\Lambda}}_{k} } \right.} \right)p\left( {{\varvec{\Lambda}}_{k} } \right) = N\left( {\varvec{\mu}_{k} \left| {\varvec{\eta}_{{k\left| {k - 1} \right.}} ,\left( {\beta_{{k\left| {k - 1} \right.}} {\varvec{\Lambda}}_{k} } \right)^{ - 1} } \right.} \right)W\left( {{\varvec{\Lambda}}_{k} \left| {\lambda_{{k\left| {k - 1} \right.}} ,{\mathbf{U}}_{{k\left| {k - 1} \right.}} } \right.} \right), $$
(8)

where \( \beta_{{k\left| {k - 1} \right.}} \) is the precision factor and \( W\left( { \cdot \left| {\lambda_{{k\left| {k - 1} \right.}} ,{\mathbf{U}}_{{k\left| {k - 1} \right.}} } \right.} \right) \) denotes the Wishart distribution with DOF \( \lambda_{{k\left| {k - 1} \right.}} \) and scale matrix \( {\mathbf{U}}_{{k\left| {k - 1} \right.}} \) [6]. In the sequel, we investigate a VB approach that approximates the posterior densities of the measurement parameters and state for the system in Eq. (1) by minimizing the Kullback–Leibler divergence.

3 Estimation Using VB Approach

VB is an iterative approach, widely studied in machine learning and statistics, that approximates the posterior distribution by a factored distribution. The parameters of the approximating factored distribution are adjusted by minimizing the Kullback–Leibler divergence between the selected density and the true density. Approximate posterior density \( q\left( {\theta_{k} } \right) \) of parameter \( \theta_{k} \) is determined by [6]

$$ q\left( {\theta_{k} } \right) \propto \exp \left( {E_{{q\left( {\theta_{l \ne k} } \right)}} \left[ {\ln p\left( {\varvec{X},\varvec{\varTheta}} \right)} \right]} \right), $$
(9)

where \( E_{{q\left( {\theta_{l \ne k} } \right)}} \left[ \cdot \right] \) denotes the expectation with respect to each \( \theta_{l} \) such that \( l \ne k \), \( \ln \left( \cdot \right) \) denotes the natural logarithm, and \( q\left( {\theta_{k} } \right) \) is the variational posterior distribution (VPD) of \( \theta_{k} \).

Given that VB requires the joint distribution of the measurements and parameters, the joint PDF is factorized as

$$ p\left( {\varvec{x}_{k} ,\varvec{\mu}_{k} ,u_{k} ,{\varvec{\Lambda}}_{k} ,\nu_{k} \left| {\varvec{z}_{1:k} } \right.} \right) \propto p\left( {\varvec{x}_{k} \left| {\varvec{z}_{1:k - 1} } \right.} \right)p\left( {\varvec{\mu}_{k} \left| {{\varvec{\Lambda}}_{k} } \right.} \right)p\left( {{\varvec{\Lambda}}_{k} } \right)p\left( {u_{k} \left| {\nu_{k} } \right.} \right)p\left( {\nu_{k} \left| {\varvec{z}_{1:k - 1} } \right.} \right)p\left( {\varvec{z}_{k} \left| {\varvec{x}_{k} ,\varvec{\mu}_{k} ,u_{k} ,{\varvec{\Lambda}}_{k} } \right.} \right). $$
(10)

Considering Eqs. (5), (6), and (7), joint posterior PDF \( p\left( {\varvec{x}_{k} ,\varvec{\mu}_{k} ,u_{k} ,{\varvec{\Lambda}}_{k} ,\nu_{k} \left| {\varvec{z}_{1:k} } \right.} \right) \) in Eq. (10) has no analytical solution; thus, a variational method is utilized. Let

$$ p\left( {\varvec{x}_{k} ,\varvec{\mu}_{k} ,u_{k} ,{\varvec{\Lambda}}_{k} ,\nu_{k} \left| {\varvec{z}_{1:k} } \right.} \right) \approx q\left( {\varvec{x}_{k} } \right)q\left( {\varvec{\mu}_{k} } \right)q\left( {u_{k} } \right)q\left( {{\varvec{\Lambda}}_{k} } \right)q\left( {\nu_{k} } \right), $$
(11)

where \( q\left( {\varvec{x}_{k} } \right) \), \( q\left( {\varvec{\mu}_{k} } \right) \), \( q\left( {u_{k} } \right) \), \( q\left( {{\varvec{\Lambda}}_{k} } \right) \), and \( q\left( {\nu_{k} } \right) \) denote, respectively, the approximate posterior PDFs of state \( \varvec{x}_{k} \), mean \( \varvec{\mu}_{k} \), auxiliary parameter \( u_{k} \), scale matrix \( {\varvec{\Lambda}}_{k} \), and DOF \( \nu_{k} \). These VPDs are coupled to each other and cannot be directly derived from Eq. (11), but fixed-point iteration [11] can be used to solve them.

3.1 State and Parameter Estimation Under Student-t Measurement Noise

The optimal solution in Eq. (9) and the PDF decomposition in Eq. (11) allow us to calculate the approximate posterior PDF of \( \varvec{x}_{k} \) as

$$ \ln q_{i} \left( {\varvec{x}_{k} } \right) = - \frac{1}{2}{\rm E}_{{q_{i - 1} \left(\varvec{\mu}\right)q_{i - 1} \left( {\varvec{\Lambda}} \right)}} \left[ {\varvec{\xi}_{k}^{T} \left( {u_{k}^{{\left( {i - 1} \right)}} {\varvec{\Lambda}}_{k} } \right)\varvec{\xi}_{k} } \right] - \frac{1}{2}\left( {\varvec{x}_{k} - \hat{\varvec{x}}_{{k\left| {k - 1} \right.}} } \right)^{\text{T}} {\mathbf{P}}_{{k\left| {k - 1} \right.}}^{ - 1} \left( {\varvec{x}_{k} - \hat{\varvec{x}}_{{k\left| {k - 1} \right.}} } \right) + C_{\varvec{x}} , $$
(12)

where \( \varvec{\xi}_{k} = \varvec{z}_{k} - \varvec{H}_{k} \varvec{x}_{k} -\varvec{\mu}_{k} \), \( u_{k}^{{\left( {i - 1} \right)}} = E_{{q_{i - 1} \left( {u_{k} } \right)}} \left[ {u_{k} } \right] \), and \( q_{i - 1} \left( \cdot \right) \) represents the posterior density at the \( \left( {i - 1} \right) \)-th iteration. Taking expectations with respect to the parameters enclosed in square brackets in Eq. (12) and collecting the terms independent of \( \varvec{x}_{k} \) into constant \( C_{\varvec{x}} \) yields the simplified expression:

$$ \ln q_{i} \left( {\varvec{x}_{k} } \right) = - \frac{1}{2}\varvec{\zeta}_{k}^{\text{T}} \left( {u_{k}^{{\left( {i - 1} \right)}} E_{{q_{i - 1} \left( {{\varvec{\Lambda}}_{k} } \right)}} \left[ {{\varvec{\Lambda}}_{k} } \right]} \right)\varvec{\zeta}_{k} - \frac{1}{2}\left( {\varvec{x}_{k} - \hat{\varvec{x}}_{{k\left| {k - 1} \right.}} } \right)^{\text{T}} {\mathbf{P}}_{{k\left| {k - 1} \right.}}^{ - 1} \left( {\varvec{x}_{k} - \hat{\varvec{x}}_{{k\left| {k - 1} \right.}} } \right) + C_{\varvec{x}} , $$
(13)

where \( \varvec{\zeta}_{k} = \varvec{z}_{k} - \varvec{H}_{k} \varvec{x}_{k} - E_{{q_{i - 1} \left(\varvec{\mu}\right)}} \left[ {\varvec{\mu}_{k} } \right] \). Equation (13) is quadratic in \( \varvec{x}_{k} \), and hence \( q_{i} \left( {\varvec{x}_{k} } \right) \) is Gaussian, i.e., \( q_{i} \left( {\varvec{x}_{k} } \right) = N\left( {\varvec{x}_{k} \left| {\hat{\varvec{x}}_{k\left| k \right.}^{\left( i \right)} ,\varvec{P}_{k\left| k \right.}^{\left( i \right)} } \right.} \right) \), with \( \hat{\varvec{x}}_{k\left| k \right.}^{\left( i \right)} \) and \( \varvec{P}_{k\left| k \right.}^{\left( i \right)} \) being the first two moments of the distribution, obtained as

$$ \hat{\varvec{x}}_{k\left| k \right.}^{\left( i \right)} = \hat{\varvec{x}}_{{k\left| {k - 1} \right.}} + {\mathbf{K}}_{k} \left( {\varvec{z}_{k} - \varvec{H}_{k} \hat{\varvec{x}}_{{k\left| {k - 1} \right.}} - E_{{q_{i - 1} \left( {\varvec{\mu}_{k} } \right)}} \left[ {\varvec{\mu}_{k} } \right]} \right), $$
(14)
$$ \varvec{P}_{k\left| k \right.}^{\left( i \right)} = \varvec{P}_{{k\left| {k - 1} \right.}} - {\mathbf{K}}_{k} \varvec{H}_{k} \varvec{P}_{{k\left| {k - 1} \right.}} , $$
(15)

with Kalman gain

$$ {\mathbf{K}}_{k} = \varvec{P}_{{k\left| {k - 1} \right.}} \varvec{H}_{k}^{T} \left( {\left( {u_{k}^{{\left( {i - 1} \right)}} E_{{q_{i - 1} \left( {{\varvec{\Lambda}}_{k} } \right)}} \left[ {{\varvec{\Lambda}}_{k} } \right]} \right)^{ - 1} + \varvec{H}_{k} \varvec{P}_{{k\left| {k - 1} \right.}} \varvec{H}_{k}^{T} } \right)^{ - 1} . $$
(16)
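Equations (14)-(16) retain the structure of the standard KF measurement update, with the noise covariance replaced by \( ( u_{k}^{(i-1)} E_{q_{i-1}({\varvec{\Lambda}}_{k})}[{\varvec{\Lambda}}_{k}] )^{-1} \) and the innovation shifted by the current mean estimate. A minimal Python sketch (function and argument names are ours, for illustration):

```python
import numpy as np

def vb_state_update(x_pred, P_pred, z, H, E_Lam, u_prev, E_mu):
    """State update of Eqs. (14)-(16), using the current variational
    expectations E_Lam = E[Lam_k], u_prev = E[u_k], E_mu = E[mu_k]."""
    R_eff = np.linalg.inv(u_prev * E_Lam)          # effective noise covariance
    S = H @ P_pred @ H.T + R_eff                   # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)            # Kalman gain, Eq. (16)
    x_upd = x_pred + K @ (z - H @ x_pred - E_mu)   # Eq. (14)
    P_upd = P_pred - K @ H @ P_pred                # Eq. (15)
    return x_upd, P_upd, K
```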

To derive the VPD of \( {\varvec{\Lambda}}_{k} \), the logarithm of the joint VPD of \( \left( {\varvec{\mu}_{k} ,{\varvec{\Lambda}}_{k} } \right) \) is first computed as

$$ \begin{aligned} \ln q_{i} \left( {\varvec{\mu}_{k} ,{\varvec{\Lambda}}_{k} } \right) & = E_{{q_{i - 1} \left( {u_{k} } \right)\,q_{i - 1} \left( {\nu_{k} } \right){\kern 1pt} q_{i} \left( {\varvec{x}_{k} } \right)}} \left[ {\ln p\left( {\varvec{z}_{k} \left| {\varvec{x}_{k} ,\varvec{\mu}_{k} ,u_{k} ,{\varvec{\Lambda}}_{k} } \right.} \right)} \right.\left. { + \ln p\left( {\varvec{\mu}_{k} \left| {{\varvec{\Lambda}}_{k} } \right.} \right) + \ln p\left( {{\varvec{\Lambda}}_{k} } \right)} \right] + C \\ & = \frac{1}{2}E_{{q_{i - 1} \left( {u_{k} } \right)}} \left[ {u_{k}^{m} } \right]\ln \left| {{\varvec{\Lambda}}_{k} } \right| + \frac{1}{2}\ln \left| {\beta_{{k\left| {k - 1} \right.}} u_{k}^{{\left( {i - 1} \right)}} {\varvec{\Lambda}}_{k} } \right| + \frac{{\lambda_{{k\left| {k - 1} \right.}} - m - 1}}{2}\ln \left| {{\varvec{\Lambda}}_{k} } \right| \\ & \quad - \,\frac{{u_{k}^{{\left( {i - 1} \right)}} }}{2}E_{{q_{i} \left( {\varvec{x}_{k} } \right)}} \left[ {\varvec{\xi}_{k}^{T} {\varvec{\Lambda}}_{k}\varvec{\xi}_{k} } \right] - \frac{1}{2}\text{Tr}\left[ {\left( {{\mathbf{U}}_{{k\left| {k - 1} \right.}} } \right)^{ - 1} {\varvec{\Lambda}}_{k} } \right] \\ & \quad - \,\frac{1}{2}\left( {\varvec{\mu}_{k} -\varvec{\eta}_{{k\left| {k - 1} \right.}} } \right)^{\text{T}} \left( {\beta_{{k\left| {k - 1} \right.}} u_{k}^{{\left( {i - 1} \right)}} {\varvec{\Lambda}}_{k} } \right)\left( {\varvec{\mu}_{k} -\varvec{\eta}_{{k\left| {k - 1} \right.}} } \right) + C_{{\varvec{\mu}_{k} ,\varLambda_{k} }} . \\ \end{aligned} $$
(17)

Then, approximate densities \( q_{i} \left( {\varvec{\mu}_{k} } \right) \) and \( q_{i} \left( {{\varvec{\Lambda}}_{k} } \right) \) are derived; using the conditional PDF \( p\left( {\varvec{\mu}_{k} \left| {{\varvec{\Lambda}}_{k} } \right.} \right) \) in Eq. (8), the logarithm of the approximate posterior distribution of \( \varvec{\mu}_{k} \) is given by

$$ \begin{aligned} \ln q_{i} (\varvec{\mu}_{k} |{\varvec{\Lambda}}_{k} ) & = E_{{q_{i - 1} \left( {u_{k} } \right)\,q_{i - 1} \left( {\nu_{k} } \right){\kern 1pt} q_{i} \left( {\varvec{x}_{k} } \right)}} \left[ {\ln p\left( {\varvec{z}_{k} \left| {\varvec{x}_{k} ,\varvec{\mu}_{k} ,u_{k} ,{\varvec{\Lambda}}_{k} } \right.} \right)} \right. + \left. {\ln p\left( {\varvec{\mu}_{k} \left| {{\varvec{\Lambda}}_{k} } \right.} \right)} \right] + C \\ & { = } - \frac{{u_{k}^{{\left( {i - 1} \right)}} }}{2}E_{{q_{i} \left( {\varvec{x}_{k} } \right)}} \left[ {\varvec{\xi}_{k}^{T} {\varvec{\Lambda}}_{k}\varvec{\xi}_{k} } \right] + \frac{1}{2}E_{{q_{i - 1} \left( {u_{k} } \right)}} \left[ {\ln \left| {u_{k} {\varvec{\Lambda}}_{k} } \right|} \right] + \frac{1}{2}\ln \left| {\beta_{{k\left| {k - 1} \right.}} {\varvec{\Lambda}}_{k} } \right| \\ & \quad - \,\frac{1}{2}\left( {\varvec{\mu}_{k} -\varvec{\eta}_{{k\left| {k - 1} \right.}} } \right)^{\text{T}} \left( {\beta_{{k\left| {k - 1} \right.}} u_{k}^{{\left( {i - 1} \right)}} {\varvec{\Lambda}}_{k} } \right)\left( {\varvec{\mu}_{k} -\varvec{\eta}_{{k\left| {k - 1} \right.}} } \right) + C \\ & = - \frac{1}{2}\varvec{\mu}_{k}^{T} \left( {u_{k}^{{\left( {i - 1} \right)}} + \beta_{{k\left| {k - 1} \right.}} u_{k}^{{\left( {i - 1} \right)}} } \right){\varvec{\Lambda}}_{k}\varvec{\mu}_{k} \\ & \quad + \,\varvec{\mu}_{k}^{T} {\varvec{\Lambda}}_{k} \left[ {u_{k}^{{\left( {i - 1} \right)}} \left( {\varvec{z}_{k} - \varvec{H}_{k} {\rm E}_{{q_{i} \left( {\varvec{x}_{k} } \right)}} \left[ {\varvec{x}_{k} } \right]} \right) + \beta_{{k\left| {k - 1} \right.}} u_{k}^{{\left( {i - 1} \right)}}\varvec{\eta}_{{k\left| {k - 1} \right.}} } \right] + C_{\varvec{\mu}} , \\ \end{aligned} $$
(18)

which is quadratic in \( \varvec{\mu}_{k} \), and hence \( q_{i} \left( {\varvec{\mu}_{k} \left| {{\varvec{\Lambda}}_{k} } \right.} \right) \) is Gaussian:

$$ q_{i} \left( {\varvec{\mu}_{k} \left| {{\varvec{\Lambda}}_{k} } \right.} \right) = N\left( {\varvec{\mu}_{k} \left| {\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} ,\left( {\beta_{k\left| k \right.}^{\left( i \right)} {\varvec{\Lambda}}_{k} } \right)^{ - 1} } \right.} \right), $$
(19)

with parameters

$$ \beta_{k\left| k \right.}^{\left( i \right)} = \beta_{{k\left| {k - 1} \right.}} + u_{k}^{{\left( {i - 1} \right)}} , $$
(20)
$$ \varvec{\eta}_{k\left| k \right.}^{\left( i \right)} =\varvec{\eta}_{{k\left| {k - 1} \right.}} + u_{k}^{{\left( {i - 1} \right)}} \left( {\beta_{k\left| k \right.}^{\left( i \right)} } \right)^{ - 1} \left( {\varvec{z}_{k} - \varvec{H}_{k} \hat{\varvec{x}}_{k\left| k \right.}^{\left( i \right)} -\varvec{\eta}_{{k\left| {k - 1} \right.}} } \right). $$
(21)

Note that if \( u_{k}^{{\left( {i - 1} \right)}} \equiv 1 \) in Eqs. (20) and (21), these equations become the recursive formula for the mean parameter deduced by VB under a Gaussian measurement distribution.
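In code, Eqs. (20)-(21) form a compact update of the Gaussian hyperparameters of the mean (a sketch; names are ours):

```python
def vb_mean_update(beta_pred, eta_pred, z, H, x_upd, u_prev):
    """Mean hyperparameter update of Eqs. (20)-(21)."""
    beta_upd = beta_pred + u_prev                                          # Eq. (20)
    eta_upd = eta_pred + (u_prev / beta_upd) * (z - H @ x_upd - eta_pred)  # Eq. (21)
    return beta_upd, eta_upd
```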

The logarithm of posterior distribution \( q_{i} \left( {{\varvec{\Lambda}}_{k} } \right) \) is obtained as the difference between Eqs. (17) and (18):

$$ \begin{aligned} \ln q_{i} \left({{\varvec{\Lambda}}_{k}} \right) =& - \frac{m}{2}\ln \left| {\beta_{k\left| k \right.}} \right| - \frac{1}{2}\ln \left| {{\varvec{\Lambda}}_{k}} \right| - \frac{{\lambda_{{k\left| {k - 1} \right.}} - m - 1}}{2}\ln \left| {{\varvec{\Lambda}}_{k}} \right| \\ &\quad - \,\frac{1}{2}u_{k}^{({i - 1})}\text{Tr}\left\{\left[{E_{{q_{i} \left( {{\varvec{x}}_{k} } \right)}} } \left[{{\varvec{\xi}}_{k} {\varvec{\xi}}_{k}^{T}} \right] + \left({\mathbf{U}}_{{k| {k - 1}}} \right)^{- 1} + \beta_{{k| {k - 1}}}\right.\right.\\ &\quad\left.\left. \cdot \left({\varvec{\mu}_{k} - {\varvec{\eta}}_{{k| {k - 1}}}} \right)\left({{\varvec{\mu}}_{k} - {\varvec{\eta}}_{{k| {k - 1}}}} \right)^{\rm T} - \beta_{k| k}^{\left(i \right)} \left({{\varvec{\mu}}_{k} -\varvec{\eta}_{k| k}^{(i)}} \right)\left({{\varvec{\mu}}_{k} - {\varvec{\eta}}_{k| k}^{(i)}} \right)^{\rm T} \right]{\varvec{\Lambda}}_{k} \right\} + C_{{\varvec{\Lambda}}}.\end{aligned} $$
(22)

Owing to the conjugate prior, the VPD of \( {\varvec{\Lambda}}_{k} \) can be conveniently written as a Wishart distribution:

$$ q_{i} \left( {{\varvec{\Lambda}}_{k} } \right) = \text{W}\left( {{\varvec{\Lambda}}_{k} \left| {\lambda_{k\left| k \right.}^{\left( i \right)} ,{\mathbf{U}}_{k\left| k \right.}^{\left( i \right)} } \right.} \right), $$
(23)

where the hyperparameters are obtained by substituting Eqs. (20) and (21) into Eq. (22):

$$ \lambda_{k\left| k \right.}^{\left( i \right)} = \lambda_{{k\left| {k - 1} \right.}} + 1, $$
(24)
$$ \left( {{\mathbf{U}}_{k\left| k \right.}^{\left( i \right)} } \right)^{ - 1} = \left( {{\mathbf{U}}_{{k\left| {k - 1} \right.}} } \right)^{ - 1} + u_{k}^{{\left( {i - 1} \right)}} \varvec{H}_{k} \varvec{P}_{k\left| k \right.}^{\left( i \right)} \varvec{H}_{k}^{T} + u_{k}^{{\left( {i - 1} \right)}} \left( {1 - \left( {\beta_{k\left| k \right.}^{\left( i \right)} } \right)^{ - 1} u_{k}^{{\left( {i - 1} \right)}} } \right)\left( {\varvec{z}_{k} - \varvec{H}_{k} \hat{\varvec{x}}_{k\left| k \right.}^{\left( i \right)} -\varvec{\eta}_{{k\left| {k - 1} \right.}} } \right)\left( {\varvec{z}_{k} - \varvec{H}_{k} \hat{\varvec{x}}_{k\left| k \right.}^{\left( i \right)} -\varvec{\eta}_{{k\left| {k - 1} \right.}} } \right)^{\text{T}} . $$
(25)

Note that if \( u_{k}^{{\left( {i - 1} \right)}} \) on the right-hand side of Eq. (25) equals one, this equation becomes a recursive form for the scale matrix of the corresponding Gaussian distribution.
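Similarly, Eqs. (24)-(25) translate directly into a Wishart hyperparameter update (a sketch; names are ours):

```python
import numpy as np

def vb_scale_update(lam_pred, U_pred_inv, z, H, x_upd, P_upd,
                    eta_pred, beta_upd, u_prev):
    """Wishart hyperparameter update of Eqs. (24)-(25)."""
    lam_upd = lam_pred + 1                                             # Eq. (24)
    r = z - H @ x_upd - eta_pred                                       # residual
    U_upd_inv = (U_pred_inv
                 + u_prev * H @ P_upd @ H.T
                 + u_prev * (1 - u_prev / beta_upd) * np.outer(r, r))  # Eq. (25)
    return lam_upd, np.linalg.inv(U_upd_inv)
```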

VPD \( q_{i} \left( {u_{k} } \right) \) is obtained as

$$ \begin{aligned} \ln q_{i} \left( {u_{k} } \right) & = E_{{q_{i} \left( {\varvec{\mu}_{k} } \right)\,q_{i} \left( {\varvec{\varLambda}_{k} } \right){\kern 1pt} q_{i} \left( {\varvec{x}_{k} } \right)q_{i - 1} \left( {\nu_{k} } \right)}} \left[ {\ln p\left( {\varvec{z}_{k} \left| {\varvec{x}_{k} ,\varvec{\mu}_{k} ,u_{k} ,\varvec{\varLambda}_{k} } \right.} \right)} \right.\left. { + \ln p\left( {u_{k} \left| {\nu_{k} } \right.} \right)} \right] + C_{u} \\ & = \left( {\frac{{E_{{q_{i - 1} \left( {\nu_{k} } \right)}} \left[ {\nu_{k} } \right]}}{2} - 1} \right)\ln u_{k} - \frac{{E_{{q_{i - 1} \left( {\nu_{k} } \right)}} \left[ {\nu_{k} } \right]}}{2}u_{k} + \frac{m}{2}\ln u_{k}\\ &\quad + \frac{1}{2}\ln \left| {\varvec{\varLambda}_{k} } \right| - \frac{1}{2}E_{{q_{i} \left( {\varvec{\mu}_{k} } \right)\,q_{i} \left( {\varvec{\varLambda}_{k} } \right){\kern 1pt} q_{i} \left( {\varvec{x}_{k} } \right)}} \left[ {\varvec{\xi}_{k}^{\text{T}} \left( {u_{k}\varvec{\varLambda}_{k} } \right)\varvec{\xi}_{k} } \right] + C_{u} \\ & = \left( {\frac{{E_{{q_{i - 1} \left( {\nu_{k} } \right)}} \left[ {\nu_{k} } \right] - 2 + m}}{2}} \right)\ln u_{k} - \frac{{E_{{q_{i - 1} \left( {\nu_{k} } \right)}} \left[ {\nu_{k} } \right]}}{2}u_{k}\\ &\quad - \frac{1}{2}E_{{q_{i} \left( {\varvec{\mu}_{k} } \right)\,q_{i} \left( {\varvec{\varLambda}_{k} } \right){\kern 1pt} q_{i} \left( {\varvec{x}_{k} } \right)}} \left[ {\varvec{\xi}_{k}^{\text{T}} \left( {u_{k}\varvec{\varLambda}_{k} } \right)\varvec{\xi}_{k} } \right] + C_{u} . \\ \end{aligned} $$
(26)

Owing to the conjugate prior, \( q_{i} \left( {u_{k} } \right) \) follows the Gamma distribution \( q_{i} \left( {u_{k} } \right) = G\left( {u_{k} \left| {\nu_{k}^{1\left( i \right)} ,\nu_{k}^{2\left( i \right)} } \right.} \right) \), where

$$ \nu_{k}^{1\left( i \right)} = \frac{1}{2}\left( {E_{{q_{i - 1} \left( {\nu_{k} } \right)}} \left[ {\nu_{k} } \right] + m} \right), $$
(27)
$$ \nu_{k}^{2\left( i \right)} = \frac{1}{2}\left( {E_{{q_{i - 1} \left( {\nu_{k} } \right)}} \left[ {\nu_{k} } \right] + E_{{q_{i} \left( {\varvec{\mu}_{k} } \right)\,q_{i} \left( {\varvec{\varLambda}_{k} } \right){\kern 1pt} q_{i} \left( {\varvec{x}_{k} } \right)}} \left[ {\varvec{\xi}_{k}^{\text{T}}\varvec{\varLambda}_{k}\varvec{\xi}_{k} } \right]} \right). $$
(28)

The computation of \( E_{{q_{i} \left( {\varvec{\mu}_{k} } \right)\,q_{i} \left( {\varvec{\varLambda}_{k} } \right){\kern 1pt} q_{i} \left( {\varvec{x}_{k} } \right)}} \left[ {\varvec{\xi}_{k}^{\text{T}}\varvec{\varLambda}_{k}\varvec{\xi}_{k} } \right] \) is given by

$$ \begin{aligned} & E_{{q_{i} \left( {\varvec{\mu}_{k} } \right)\,q_{i} \left( {\varvec{\varLambda}_{k} } \right){\kern 1pt} q_{i} \left( {\varvec{x}_{k} } \right)}} \left[ {\left( {\varvec{z}_{k} - \varvec{H}_{k} \varvec{x}_{k} -\varvec{\mu}_{k} } \right)^{\text{T}}\varvec{\varLambda}_{k} \left( {\varvec{z}_{k} - \varvec{H}_{k} \varvec{x}_{k} -\varvec{\mu}_{k} } \right)} \right] \\ & \text{ = Tr}\left[ {E_{{q_{i} \left( {\varvec{\varLambda}_{k} } \right)}} \left[ {\varvec{\varLambda}_{k} } \right]\left( {\varvec{y}_{k} \varvec{y}_{k}^{T} - \varvec{y}_{k} \left( {\hat{\varvec{x}}_{k\left| k \right.}^{\left( i \right)} } \right)^{\text{T}} \varvec{H}_{k}^{T} - \varvec{H}_{k} \hat{\varvec{x}}_{k\left| k \right.}^{\left( i \right)} \varvec{y}_{k}^{T} } \right.} \right. \\ & \quad \left. {\left. { + \,\varvec{H}_{k} \left( {D_{{q_{i} \left( {\varvec{x}_{k} } \right)}} \left[ {\varvec{x}_{k} } \right] + \hat{\varvec{x}}_{k\left| k \right.}^{\left( i \right)} \left( {\hat{\varvec{x}}_{k\left| k \right.}^{\left( i \right)} } \right)^{\text{T}} } \right)\varvec{H}_{k}^{T} } \right)} \right] + m\left( {\beta_{k\left| k \right.}^{\left( i \right)} } \right)^{ - 1} , \\ \end{aligned} $$
(29)

where \( \varvec{y}_{k} = \varvec{z}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} \) and \( D_{{q_{i} \left( {\varvec{x}_{k} } \right)}} \left[ \cdot \right] \) denotes the covariance of \( \varvec{x}_{k} \) under \( q_{i} \left( {\varvec{x}_{k} } \right) \).
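Note that the terms inside the trace in Eq. (29) collapse to a residual form, since \( \varvec{y}_{k} \varvec{y}_{k}^{T} - \varvec{y}_{k} (\hat{\varvec{x}}_{k\left| k \right.}^{(i)})^{\text{T}} \varvec{H}_{k}^{T} - \varvec{H}_{k} \hat{\varvec{x}}_{k\left| k \right.}^{(i)} \varvec{y}_{k}^{T} + \varvec{H}_{k} \hat{\varvec{x}}_{k\left| k \right.}^{(i)} (\hat{\varvec{x}}_{k\left| k \right.}^{(i)})^{\text{T}} \varvec{H}_{k}^{T} = (\varvec{y}_{k} - \varvec{H}_{k} \hat{\varvec{x}}_{k\left| k \right.}^{(i)})(\varvec{y}_{k} - \varvec{H}_{k} \hat{\varvec{x}}_{k\left| k \right.}^{(i)})^{\text{T}} \). A sketch of the resulting update of \( q_{i}(u_{k}) \), Eqs. (27)-(29), follows (names are ours):

```python
import numpy as np

def vb_u_update(E_nu, m, z, H, x_upd, P_upd, E_Lam, eta_upd, beta_upd):
    """Gamma parameters of q_i(u_k), Eqs. (27)-(28), with the expected
    quadratic form of Eq. (29) in residual form."""
    r = (z - eta_upd) - H @ x_upd                       # y_k - H_k x_hat
    quad = (np.trace(E_Lam @ (np.outer(r, r) + H @ P_upd @ H.T))
            + m / beta_upd)                             # Eq. (29)
    nu1 = 0.5 * (E_nu + m)                              # Eq. (27)
    nu2 = 0.5 * (E_nu + quad)                           # Eq. (28)
    return nu1, nu2
```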

Next, we derive approximate density \( q_{i} \left( {\nu_{k} } \right) \) by using the conditional PDF of \( u_{k} \) in Eq. (5) and the prior distribution in Eq. (7):

$$ \begin{aligned} \ln q_{i} \left( {\nu_{k} } \right) & = E_{{q_{i} \left( {u_{k} } \right)}} \left[ {\ln p\left( {u_{k} \left| {\nu_{k} } \right.} \right) + \ln p\left( {\nu_{k} } \right)} \right] + C_{{\nu_{k} }} \\ & = \frac{{\nu_{k} }}{2}\ln \frac{{\nu_{k} }}{2} - \ln\Gamma \left( {\frac{{\nu_{k} }}{2}} \right) + \left( {\frac{{\nu_{k} }}{2} - 1} \right)E_{{q_{i} \left( {u_{k} } \right)}} \left[ {\ln u_{k} } \right] - \frac{{\nu_{k} }}{2}E_{{q_{i} \left( {u_{k} } \right)}} \left[ {u_{k} } \right] \\ & \quad + \,\left( {a_{{k\left| {k - 1} \right.}} - 1} \right)\ln \nu_{k} - b_{{k\left| {k - 1} \right.}} \nu_{k} + C_{{\nu_{k} }} . \\ \end{aligned} $$
(30)

Using Stirling’s approximation \( \ln\Gamma \left( {\frac{{\nu_{k} }}{2}} \right) \approx \frac{{\nu_{k} - 1}}{2}\ln \frac{{\nu_{k} }}{2} - \frac{{\nu_{k} }}{2} \) in Eq. (30), the variational estimator of DOF \( \nu_{k} \) follows Gamma distribution \( q_{i} \left( {\nu_{k} } \right) = \text{G}\left( {\nu_{k} \left| {a_{k\left| k \right.}^{\left( i \right)} ,b_{k\left| k \right.}^{\left( i \right)} } \right.} \right) \), where

$$ a_{k\left| k \right.}^{\left( i \right)} = a_{{k\left| {k - 1} \right.}} + \frac{1}{2}, $$
(31)
$$ b_{k\left| k \right.}^{\left( i \right)} = b_{{k\left| {k - 1} \right.}} + \frac{1}{2}u_{k}^{\left( i \right)} - \frac{1}{2}E_{{q_{i} \left( {u_{k} } \right)}} \left[ {\ln u_{k} } \right] - \frac{1}{2}. $$
(32)

As indicated above, the approximate posterior distribution of each variable depends on the expected values of some of the others. The expectations of the parameters are given by

$$ \begin{aligned} & E_{{q_{i} \left( {\varvec{\varLambda}_{k} } \right)}} \left[ {\varvec{\varLambda}_{k} } \right] = \lambda_{k\left| k \right.}^{\left( i \right)} {\mathbf{U}}_{k\left| k \right.}^{\left( i \right)} , \\ & E_{{q_{i} \left( {\varvec{\mu}_{k} } \right)}} \left[ {\varvec{\mu}_{k} } \right] =\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} , \\ & E_{{q_{i} \left( {u_{k} } \right)}} \left[ {u_{k} } \right] = u_{k}^{\left( i \right)} = {{\nu_{k}^{1\left( i \right)} } / {\nu_{k}^{2\left( i \right)} }}, \\ & E_{{q_{i} \left( {u_{k} } \right)}} \left[ {\ln u_{k} } \right] =\Psi \left( {\nu_{k}^{1\left( i \right)} } \right) - \ln \nu_{k}^{2\left( i \right)} , \\ & E_{{q_{i} \left( {\nu_{k} } \right)}} \left[ {\nu_{k} } \right] = {{a_{k\left| k \right.}^{\left( i \right)} } / {b_{k\left| k \right.}^{\left( i \right)} }}, \\ \end{aligned} $$
(33)

where \( \Psi \left( \cdot \right) \) denotes the digamma function, i.e., the first derivative of the natural logarithm of the Gamma function.
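These expectations map to standard special functions; a sketch assuming SciPy's digamma (names are ours):

```python
import numpy as np
from scipy.special import digamma

def vb_expectations(lam_upd, U_upd, eta_upd, nu1, nu2, a_upd, b_upd):
    """Moments of Eq. (33)."""
    E_Lam = lam_upd * U_upd
    E_mu = eta_upd
    E_u = nu1 / nu2
    E_log_u = digamma(nu1) - np.log(nu2)
    E_nu = a_upd / b_upd
    return E_Lam, E_mu, E_u, E_log_u, E_nu
```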

The selection of the number of iterations \( N \) is discussed next, as this parameter determines the estimation accuracy and computation time. Here, we use a stopping criterion based on the difference between two consecutive estimates:

$$ \delta_{k} = \left\{ {\begin{array}{*{20}l} { < 10^{ - 6} } \hfill & {\text{terminate}\,\,{\text{process}}} \hfill \\ {\text{otherwise}} \hfill & {{\text{continue}}\,\,{\text{iterative}}\,\,{\text{process}}} \hfill \\ \end{array} } \right., $$
(34)

where the difference is given by \( \delta_{k} = \left\| {\hat{\varvec{x}}_{k\left| k \right.}^{\left( i \right)} - \hat{\varvec{x}}_{k\left| k \right.}^{{\left( {i - 1} \right)}} } \right\|^{2} \).

3.2 Hyperparameter Update

The state update is the same as that of the standard KF. Considering only the update of the unknown parameters, similar to [16, 24], we introduce factor \( \rho \) that indicates the statistics of fluctuation. Uncertainty is reduced as \( \rho \) approaches zero, and hence, for consistency, we select a value of \( \rho \) that is not very small. The update of the noise parameters is expressed as

$$ \begin{aligned} \lambda_{{k\left| {k - 1} \right.}} = \rho \lambda_{{k - 1\left| {k - 1} \right.}} , \hfill \\ \beta_{{k\left| {k - 1} \right.}} = \rho \beta_{{k - 1\left| {k - 1} \right.}} , \hfill \\ a_{{k\left| {k - 1} \right.}} = \rho a_{{k - 1\left| {k - 1} \right.}} , \hfill \\ b_{{k\left| {k - 1} \right.}} = \rho b_{{k - 1\left| {k - 1} \right.}} , \hfill \\\varvec{\eta}_{{k\left| {k - 1} \right.}} = \rho\varvec{\eta}_{{k - 1\left| {k - 1} \right.}} , \hfill \\ {\mathbf{U}}_{{k\left| {k - 1} \right.}} = \frac{1}{\rho }{\mathbf{U}}_{{k - 1\left| {k - 1} \right.}} . \hfill \\ \end{aligned} $$
(35)
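In code, Eq. (35) is a one-line spread of the posterior hyperparameters (a sketch; names are ours):

```python
def spread_hyperparams(rho, lam, beta, a, b, eta, U):
    """Time update of the noise hyperparameters, Eq. (35)."""
    return rho * lam, rho * beta, rho * a, rho * b, rho * eta, U / rho
```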

3.3 Filter Algorithm

We summarize the estimation process of the proposed filter in Algorithm 1 and illustrate the hierarchical Gaussian model in Fig. 1.

Fig. 1 Diagram of the proposed Student-t hierarchical Gaussian state-space model

Note that if latent variable \( u_{k} \equiv 1 \) and the recursive expression for \( \nu_{k} \) is omitted, the proposed algorithm degenerates into estimation under Gaussian measurement noise. In addition, if the recursive expressions for \( \beta_{k\left| k \right.} \) and \( \varvec{\eta}_{k} \) are omitted, the proposed algorithm degenerates into estimation without a mean parameter under Student-t measurement noise [24].
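Since Algorithm 1 is not reproduced here, the following sketch outlines one filtering epoch under our own naming, combining the update sketches of Sects. 3.1 and 3.2 (assumed to be in scope) with the stopping rule of Eq. (34); it is an illustration of the recursion, not a verbatim transcription of Algorithm 1:

```python
import numpy as np
from scipy.special import digamma

def vb_filter_epoch(x_prev, P_prev, z, F, G, Q, H, hyper, rho, m,
                    max_iter=50, tol=1e-6):
    """One epoch of the proposed filter (sketch). `hyper` holds the
    posterior hyperparameters (lam, beta, a, b, eta, U) from time k-1."""
    # Prediction: standard KF time update plus the spread of Eq. (35).
    x_pred = F @ x_prev
    P_pred = F @ P_prev @ F.T + G @ Q @ G.T
    lam, beta, a, b, eta, U = spread_hyperparams(rho, *hyper)
    U_inv = np.linalg.inv(U)

    # Fixed-point iteration over the coupled VPDs.
    E_u, E_nu, E_mu, E_Lam = 1.0, a / b, eta, lam * U
    x_upd, P_upd = x_pred, P_pred
    for _ in range(max_iter):
        x_old = x_upd
        x_upd, P_upd, _ = vb_state_update(x_pred, P_pred, z, H, E_Lam, E_u, E_mu)
        beta_upd, eta_upd = vb_mean_update(beta, eta, z, H, x_upd, E_u)
        lam_upd, U_upd = vb_scale_update(lam, U_inv, z, H, x_upd, P_upd,
                                         eta, beta_upd, E_u)
        E_Lam = lam_upd * U_upd
        nu1, nu2 = vb_u_update(E_nu, m, z, H, x_upd, P_upd, E_Lam,
                               eta_upd, beta_upd)
        E_u, E_log_u = nu1 / nu2, digamma(nu1) - np.log(nu2)
        a_upd = a + 0.5                               # Eq. (31)
        b_upd = b + 0.5 * E_u - 0.5 * E_log_u - 0.5   # Eq. (32)
        E_nu, E_mu = a_upd / b_upd, eta_upd
        if np.sum((x_upd - x_old) ** 2) < tol:        # Eq. (34)
            break
    return x_upd, P_upd, (lam_upd, beta_upd, a_upd, b_upd, eta_upd, U_upd)
```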

4 Error Bounds of VB Estimator

The fundamental CRLB sets a lower limit on the MSE for any estimator \( \hat{\theta }\left( \varvec{z} \right) \). Specifically, the Bayesian Cramer–Rao inequality [12, 20] shows that the MSE of any estimator \( \hat{\theta }\left( \varvec{z} \right) \) is lower bounded by

$$ E_{{p\left( {\varvec{z,}\theta } \right)}} \left\{ {\left( {\hat{\theta }\left( \varvec{z} \right) - \theta } \right)\left( {\hat{\theta }\left( \varvec{z} \right) - \theta } \right)^{\text{T}} } \right\} \ge \varvec{J}^{ - 1} , $$
(36)

where \( \theta \) is an \( r \)-dimensional estimated random parameter and \( \hat{\theta }\left( \varvec{z} \right) \) is an estimate of \( \theta \). For square matrices \( \varvec{A} \) and \( \varvec{B} \), \( \varvec{A} \ge \varvec{B} \) indicates that \( \varvec{A} - \varvec{B} \) is a positive semidefinite matrix. \( \varvec{J} \) is the \( r \times r \) (Fisher) information matrix, computed by taking the expectation with respect to joint density \( p\left( {\varvec{z},\theta } \right) \), and can be written as

$$ \varvec{J} = E\left[ { - \nabla_{\varvec{\varTheta}} \nabla_{\varvec{\varTheta}}^{T} \log p_{{\varvec{z},\theta }} \left( {\varvec{Z},\varvec{\varTheta}} \right)} \right], $$
(37)

where \( \nabla_{\varvec{\varTheta}} = \left[ {\frac{\partial }{{\partial\varvec{\varTheta}_{1} }}, \ldots ,\frac{\partial }{{\partial\varvec{\varTheta}_{r} }}} \right]^{\text{T}} \). As \( p\left( \varvec{Z} \right) \) is the integral of \( p_{{\varvec{z},\theta }} \left( {\varvec{Z},\varvec{\varTheta}} \right) \) over \( \varvec{\varTheta} \), it does not depend on \( \varvec{\varTheta} \), and therefore we obtain the following alternative expression for the information matrix:

$$ \varvec{J} = E\left[ { - \nabla_{\varvec{\varTheta}} \nabla_{\varvec{\varTheta}}^{T} \log p_{{\theta \left| \varvec{z} \right.}} \left( {\varvec{\varTheta}\left| \varvec{Z} \right.} \right)} \right]. $$
(38)

The traditional Bayesian CRLB for sequential estimation requires a double integration over the parameter and every measurement at each iteration, thus being computationally intensive. A posterior CRLB and a conditional posterior CRLB were investigated in [20] and [26], respectively, for a general multidimensional discrete-time filtering problem, providing a recursive approach for calculating the sequential Bayesian CRLB.

We now introduce the VB-CRLB to set the lower bound on the performance of the VB estimator given the measurements \( \varvec{Z}_{k} \) up to time \( k \). As the Bayesian approach considers random state \( \varvec{x} \) and parameter \( \varvec{\theta} \), the logarithm of the joint PDF of \( \varvec{x}_{k} \) and \( \varvec{\theta}_{k} \) is computed using VB and factorized as

$$ \ln p\left( {\varvec{\theta}_{k} ,\varvec{x}_{k} \left| {\varvec{Z}_{k} } \right.} \right) \approx \ln q\left( {\varvec{\theta}_{k} \left| {\varvec{Z}_{k} } \right.} \right) + \ln q\left( {\varvec{x}_{k} \left| {\varvec{\theta}_{k} ,\varvec{Z}_{k} } \right.} \right), $$
(39)

where \( q_{i} \left( {\varvec{x}_{k} \left| {\varvec{Z}_{k} } \right.} \right) \) and \( q_{i} \left( {\varvec{\theta}_{k} \left| {\varvec{Z}_{k} } \right.} \right) \) are not exact marginal distributions of joint PDF \( p\left( {\varvec{x}_{k} ,\varvec{\theta}_{k} \left| {\varvec{Z}_{k} } \right.} \right) \) but factors whose product approximates the joint PDF, as indicated in Eq. (11). Assuming that parameter \( \varvec{\theta} \) is decomposed into two parts as \( \left[ {\varvec{\theta}_{\alpha }^{T} \;\varvec{\theta}_{\beta }^{T} } \right]^{\text{T}} \), information matrix \( \varvec{J} \) can be decomposed as the corresponding block matrix:

$$ \varvec{J} = \left[ {\begin{array}{*{20}c} {\varvec{J}_{\alpha \alpha } } & {\varvec{J}_{\alpha \beta } } \\ {\varvec{J}_{\beta \alpha } } & {\varvec{J}_{\beta \beta } } \\ \end{array} } \right]. $$
(40)

For clarity, the Student-t nominal covariance \( \left( {\varvec{\varLambda}_{k} } \right)^{ - 1} \) should be expressed in terms of Gaussian covariance \( {\varvec{\Sigma}}_{k} \) by introducing multiplier \( 1/u_{k}^{\left( i \right)} \):

$$ {\varvec{\Sigma}}_{k} = {{\left( {{\varvec{\Lambda}}_{k} } \right)^{ - 1} } / {u_{k}^{\left( i \right)} }}, $$
(41)

with expectation

$$ {\hat{\varvec{\Sigma}}}_{k}^{\left( i \right)} = E_{{q_{i} \left( {{\varvec{\Sigma}}_{k} } \right)}} \left[ {{\varvec{\Sigma}}_{k} } \right] = {{\left( {E_{{q_{i} \left( {\varvec{\varLambda}_{k} } \right)}} \left[ {{\varvec{\Lambda}}_{k} } \right]} \right)^{ - 1} } / {u_{k}^{\left( i \right)} }}. $$
(42)

For parameters \( \varvec{\mu} \) and \( {\varvec{\Sigma}} \), by Eqs. (19), (23), and (41), joint VPD \( q_{i} \left( {\varvec{\mu}_{k} ,{\varvec{\Sigma}}_{k} } \right) = NW\left( {\varvec{\mu}_{k} ,{\varvec{\Sigma}}_{k} \left| {\beta_{k\left| k \right.}^{\left( i \right)} ,\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} ,\lambda_{k\left| k \right.}^{\left( i \right)} ,{\mathbf{U}}_{k\left| k \right.}^{\left( i \right)} } \right.} \right) \) is given by

$$\begin{aligned} q_{i} \left( {\varvec{\mu}_{k} ,{\varvec{\Sigma}}_{k} } \right) & \propto \left| {\left( {{\varvec{\Sigma}}_{k} } \right)^{ - 1} } \right|^{{\left( {\lambda_{k\left| k \right.}^{\left( i \right)} - m} \right)/2}}\\ &\quad \times \exp \left\{ { - \frac{1}{2}\text{Tr}\left[ {\left( {u_{k}^{\left( i \right)} } \right)^{ - 1} \left( {\left( {{\mathbf{U}}_{k\left| k \right.}^{\left( i \right)} } \right)^{ - 1} + \beta_{k\left| k \right.}^{\left( i \right)} \left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)\left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)^{\text{T}} } \right)\left( {{\varvec{\Sigma}}_{k} } \right)^{ - 1} } \right]} \right\}.\end{aligned} $$
(43)

Let parameters \( \varvec{\theta}_{1} =\varvec{\mu}_{k} \) and \( \varvec{\theta}_{2} = {\varvec{\Sigma}}_{k} \). By Eqs. (38) and (40), we have

$$ \varvec{J}_{11,k\left( i \right)} = - E_{{q_{i} \left( {\varvec{\mu}_{k} ,{\varvec{\Sigma}}_{k} } \right)}} \left[ {\nabla_{{\varvec{\mu}_{k}^{T} }} \left[ {\nabla_{{\varvec{\mu}_{k} }} \ln q_{i} \left( {\varvec{\mu}_{k} ,{\varvec{\Sigma}}_{k} } \right)} \right]} \right], $$
(44)

and omitting terms independent of \( \varvec{\mu} \), entry \( \varvec{J}_{11,k\left( i \right)} \) can be rewritten as

$$ \varvec{J}_{11,k\left( i \right)} = - E_{{q_{i} \left( {\varvec{\mu}_{k} ,\varvec{\varSigma}_{k} } \right)}} \left[ {\nabla_{{\varvec{\mu}_{k}^{T} }} \left[ {\nabla_{{\varvec{\mu}_{k} }} \left( { - \frac{1}{2}\left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)^{\text{T}} \left( {\beta_{k\left| k \right.}^{\left( i \right)} {\varvec{\Lambda}}_{k} } \right)\left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)} \right)} \right]} \right], $$
(45)

which is simplified as

$$ \varvec{J}_{11,k\left( i \right)} = E_{{q_{i} \left( {\varvec{\mu}_{k} \left| {\varvec{\varLambda}_{k} } \right.} \right)}} \left[ {E_{{q_{i} \left( {\varvec{\varLambda}_{k} } \right)}} \left[ {\beta_{k\left| k \right.}^{\left( i \right)} {\varvec{\Lambda}}_{k} } \right]} \right]. $$
(46)

Substituting \( E_{{q_{i} \left( {\varvec{\varLambda}_{k} } \right)}} \left[ {{\varvec{\Lambda}}_{k} } \right] = \lambda_{k\left| k \right.}^{\left( i \right)} {\mathbf{U}}_{k\left| k \right.}^{\left( i \right)} \) into Eq. (46) yields

$$ \varvec{J}_{11,k\left( i \right)} = \beta_{k\left| k \right.}^{\left( i \right)} \lambda_{k\left| k \right.}^{\left( i \right)} {\mathbf{U}}_{k\left| k \right.}^{\left( i \right)} , $$
(47)

indicating that the derived asymptotic Bayesian CRLB for \( \varvec{\mu}_{k} \) is \( ABCLB_{i} \left( {\varvec{\mu}_{k} } \right) = \left( {\varvec{J}_{11,k\left( i \right)} } \right)^{ - 1} \).
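Numerically, this bound is just the inverse of Eq. (47); a one-line sketch (names are ours):

```python
import numpy as np

def abclb_mu(beta_upd, lam_upd, U_upd):
    """Asymptotic Bayesian CRLB for mu_k: inverse of J_11 in Eq. (47)."""
    return np.linalg.inv(beta_upd * lam_upd * U_upd)
```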

Next, the first off-diagonal entry is computed by

$$ \varvec{J}_{21,k\left( i \right)} = - E_{{q_{i} \left( {\varvec{\mu}_{k} ,{\varvec{\Sigma}}_{k} } \right)}} \left[ {\nabla_{{\varvec{\mu}_{k}^{T} }} \left[ {\nabla_{{{\varvec{\Sigma}}_{k} }} \left( {\ln q_{i} \left( {\varvec{\mu}_{k} ,{\varvec{\Sigma}}_{k} } \right)} \right)} \right]} \right], $$
(48)

which is simplified by omitting terms unrelated to \( {\varvec{\Sigma}}_{k} \) as

$$\begin{aligned} \varvec{J}_{21,k\left( i \right)} &= - E_{{q_{i} \left( {\varvec{\mu}_{k} ,{\varvec{\Sigma}}_{k} } \right)}} \left[ {\nabla_{{\varvec{\mu}_{k}^{T} }} \left[ {\nabla_{{{\varvec{\Sigma}}_{k} }} \left( {\frac{{\lambda_{k\left| k \right.}^{\left( i \right)} - m}}{2}\ln \left| {{\varvec{\Sigma}}_{k}^{ - 1} } \right|} \right.} \right.} \right.\\ &\quad \left. {\left. {\left. { - \frac{1}{2}\text{Tr}\left( {\left[ {\left( {{\mathbf{U}}_{k\left| k \right.}^{\left( i \right)} } \right)^{ - 1} + \beta_{k\left| k \right.}^{\left( i \right)} \left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)\left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)^{T} } \right]\left( {u_{k}^{\left( i \right)} } \right)^{ - 1} {\varvec{\Sigma}}_{k}^{ - 1} } \right)} \right)} \right]} \right]. \end{aligned}$$
(49)

Let us recall some rules on derivatives and traces of matrices that facilitate the calculation of derivatives of function \( \ln q_{i} \left( {\varvec{\mu}_{k} ,{\varvec{\Sigma}}_{k} } \right) \) presented in Eq. (49). Let \( \varvec{A} \) and \( \varvec{B} \) be invertible matrices and \( \varvec{X} \) a column vector, then [5]

$$ \begin{aligned} \frac{\partial }{{\partial \varvec{A}}}\ln \left| {\varvec{A}^{ - 1} } \right| & = - \frac{\partial }{{\partial \varvec{A}}}\ln \left| \varvec{A} \right| = - \frac{1}{{\left| \varvec{A} \right|}}\frac{{\partial \left| \varvec{A} \right|}}{{\partial \varvec{A}}} = - \left( {\varvec{A}^{ - 1} } \right)^{\text{T}} , \\ \text{Tr}\left( {\varvec{AB}} \right) & = \text{Tr}\left( {\varvec{BA}} \right), \\ \frac{\partial }{{\partial \varvec{A}}}\text{Tr}\left( {\varvec{BA}^{ - 1} } \right) & = - \left( {\varvec{A}^{ - 1} \varvec{BA}^{ - 1} } \right)^{\text{T}} . \\ \end{aligned} $$
(50)

The derivative of a matrix product with respect to a vector is given by

$$ \frac{{\partial \varvec{A}\left( \varvec{X} \right)\varvec{B}\left( \varvec{X} \right)}}{{\partial \varvec{X}}} = \frac{{\partial \varvec{A}\left( \varvec{X} \right)}}{{\partial \varvec{X}}}\varvec{B}\left( \varvec{X} \right) + \left[ {\varvec{I}_{n} \otimes \varvec{A}\left( \varvec{X} \right)} \right]\frac{{\partial \varvec{B}\left( \varvec{X} \right)}}{{\partial \varvec{X}}}, $$
(51)

where \( \otimes \) denotes the Kronecker product and \( \varvec{I}_{n} \) denotes the identity matrix of dimension n.

Using the rules in Eq. (50), Eq. (49) is computed as

$$ \varvec{J}_{21,k\left( i \right)} = \frac{1}{2}E_{{q_{i} \left( {\varvec{\mu}_{k} ,{\varvec{\Sigma}}_{k} } \right)}} \left[ {\nabla_{{\varvec{\mu}_{k}^{T} }} \left[ {{\mathbf{T}}\left( {{\varvec{\Sigma}}_{k} } \right) + {\varvec{\Sigma}}_{k}^{ - 1} \cdot \left( {u_{k}^{\left( i \right)} } \right)^{ - 1} \beta_{k\left| k \right.}^{\left( i \right)} \varvec{S}\left( {\varvec{\mu}_{k} } \right) \cdot {\varvec{\Sigma}}_{k}^{ - 1} } \right]} \right], $$
(52)

where \( {\mathbf{T}}\left( {{\varvec{\Sigma}}_{k} } \right) = \left( {m - \lambda_{k\left| k \right.}^{\left( i \right)} } \right)\left( {{\varvec{\Sigma}}_{k}^{ - 1} } \right)^{\text{T}} + \left( {u_{k}^{\left( i \right)} } \right)^{ - 1} {\varvec{\Sigma}}_{k}^{ - 1} \left( {{\mathbf{U}}_{k\left| k \right.}^{\left( i \right)} } \right)^{ - 1} {\varvec{\Sigma}}_{k}^{ - 1} \) and \( \varvec{S}\left( {\varvec{\mu}_{k} } \right) = \left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)\left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)^{\text{T}} \) denote the statistics about \( {\varvec{\Sigma}}_{k} \) and \( \varvec{\mu}_{k} \), respectively. Using the matrix rule in Eq. (51), Eq. (52) is obtained as

$$ \begin{aligned} \varvec{J}_{21,k\left( i \right)} & = \frac{1}{2}E_{{q_{i} \left( {{\varvec{\Sigma}}_{k} } \right)}} \left( {\left( {u_{k}^{\left( i \right)} } \right)^{ - 1} \beta_{k\left| k \right.}^{\left( i \right)} \left( {\varvec{I}_{m} \otimes {\varvec{\Sigma}}_{k}^{ - 1} } \right)_{mm \times mm} } \right. \\ &\quad \cdot E_{{q_{i} \left( {\varvec{\mu}_{k} \left| {{\varvec{\Sigma}}_{k} } \right.} \right)}} \left[ {\left( {\left( {{\text{vec}}\left( {\varvec{I}_{m} } \right)} \right)_{mm \times 1} \otimes \left( {\left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)^{\text{T}} } \right)_{1 \times m} } \right.} \right. \\ & \left. {\left. {\quad + \,\left( {\varvec{I}_{m} \otimes \left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)} \right)_{mm \times m} } \right)} \right]\left. { \cdot {\varvec{\Sigma}}_{k}^{ - 1} } \right), \\ \end{aligned} $$
(53)

where subscript \( a \times b \) denotes the dimension of the corresponding matrix and \( {\text{vec}}\left( {\varvec{I}_{m} } \right) \) is a column vector obtained by stacking the consecutive columns of \( \varvec{I}_{m} \). Note that \( E_{{q_{i} \left( {\varvec{\mu}_{k} \left| {\varSigma_{k} } \right.} \right)}} \left[ {\left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)^{\text{T}} } \right] = E_{{q_{i} \left( {\varvec{\mu}_{k} \left| {{\varvec{\Sigma}}_{k} } \right.} \right)}} \left[ {\left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)} \right] = 0 \), and \( \varvec{J}_{12,k\left( i \right)} \) has a similar term. Hence, the off-diagonal entries are both zero, i.e., \( \varvec{J}_{21,k\left( i \right)} = {\mathbf{0}}_{mm \times m} \) and \( \varvec{J}_{12,k\left( i \right)} = {\mathbf{0}}_{m \times mm} \). As a result, information matrix \( \varvec{J} \) is a block diagonal matrix, and the asymptotic Bayesian CRLB is its inverse.

As entry \( \varvec{J}_{22,k\left( i \right)} \) is mathematically intractable for a general matrix \( {\varvec{\Sigma}}_{k} \), we consider \( {\varvec{\Sigma}}_{k} \) to be a diagonal matrix, and hence \( {\varvec{\Lambda}}_{k} \) is also diagonal:

$$ \left( {{\varvec{\Lambda}}_{k} } \right)^{ - 1} = {\text{diag}}\left( {\left[ {\begin{array}{*{20}c} {\bar{\sigma }_{k,1}^{2} ,} & {\bar{\sigma }_{k,2}^{2} ,} & { \cdots ,} & {\bar{\sigma }_{k,m}^{2} } \\ \end{array} } \right]} \right), $$
(54)

where the prior distribution of element \( \bar{\sigma }_{k,j}^{2} \) follows the inverse Gamma distribution \( p\left( {\bar{\sigma }_{k,j}^{2} } \right) = {\text{IG}}\left( {\bar{\sigma }_{k,j}^{2} \left| {\bar{\kappa }_{{k\left| {k - 1} \right.,j}} ,\bar{\gamma }_{{k\left| {k - 1} \right.,j}} } \right.} \right) \), which leads to the VPD being the inverse Gamma distribution

$$ q_{i} \left( {\bar{\sigma }_{k,j}^{2} } \right) = {\text{IG}}\left( {\bar{\sigma }_{k,j}^{2} \left| {\bar{\kappa }_{k\left| k \right.,j}^{\left( i \right)} ,\bar{\gamma }_{k\left| k \right.,j}^{\left( i \right)} } \right.} \right) $$
(55)

with parameters

$$ \begin{aligned} \bar{\kappa }_{k\left| k \right.,j}^{\left( i \right)} & = \bar{\kappa }_{{k\left| {k - 1} \right.,j}} + \frac{1}{2}, \\ \bar{\gamma }_{k\left| k \right.,j}^{\left( i \right)} & = \bar{\gamma }_{{k\left| {k - 1} \right.,j}} + \frac{{u_{k}^{{\left( {i - 1} \right)}} }}{2}\left[ {\varvec{H}_{k} \varvec{P}_{k\left| k \right.}^{\left( i \right)} \varvec{H}_{k}^{T} } \right]_{jj} + \frac{{u_{k}^{{\left( {i - 1} \right)}} }}{2}\left[ {1 - \left( {\beta_{k\left| k \right.}^{\left( i \right)} } \right)^{ - 1} } \right] \\ & \quad \times \,\left[ {\left( {\varvec{z}_{k} - \varvec{H}_{k} \hat{\varvec{x}}_{k\left| k \right.}^{\left( i \right)} -\varvec{\eta}_{{k\left| {k - 1} \right.}} } \right)\left( {\varvec{z}_{k} - \varvec{H}_{k} \hat{\varvec{x}}_{k\left| k \right.}^{\left( i \right)} -\varvec{\eta}_{{k\left| {k - 1} \right.}} } \right)^{T} } \right]_{jj} , \\ \end{aligned} $$
(56)

and predictions \( \bar{\kappa }_{k + 1\left| k \right.,j} = \rho \bar{\kappa }_{k\left| k \right.,j} \) and \( \bar{\gamma }_{k + 1\left| k \right.,j} = \rho \bar{\gamma }_{k\left| k \right.,j} \). In this case, the mean of individual parameter \( \varvec{\mu}_{k,j} \) follows distribution

$$ q_{i} \left( {\varvec{\mu}_{k,j} \left| {\bar{\sigma }_{k,j}^{2} } \right.} \right) = N\left( {\varvec{\mu}_{k,j} \left| {\varvec{\eta}_{k\left| k \right.,j}^{\left( i \right)} ,\left( {\beta_{k\left| k \right.}^{\left( i \right)} } \right)^{ - 1} \bar{\sigma }_{k\left( i \right),j}^{2} } \right.} \right), $$
(57)

resulting in \( \varvec{J}_{11,k\left( i \right),j} = \beta_{k\left| k \right.}^{\left( i \right)} \left( {E\left[ {\bar{\sigma }_{k\left( i \right),j}^{2} } \right]} \right)^{ - 1} = \beta_{k\left| k \right.}^{\left( i \right)} {{\left( {\bar{\kappa }_{k\left| k \right.,j}^{\left( i \right)} - 1} \right)} / {\bar{\gamma }_{k\left| k \right.,j}^{\left( i \right)} }} \). The lower bound for the estimation variance associated with the mean of individual parameter \( \varvec{\mu}_{k,j} \) is given by

$$ ABCLB_{i} \left( {\varvec{\mu}_{k,j} } \right) = \varvec{J}_{11,k\left( i \right),j}^{ - 1} = {{\bar{\gamma }_{k\left| k \right.,j}^{\left( i \right)} } / {\left( {\beta_{k\left| k \right.}^{\left( i \right)} \left( {\bar{\kappa }_{k\left| k \right.,j}^{\left( i \right)} - 1} \right)} \right)}}, $$
(58)

and the lower bound on the variance of the individual parameter is computed from

$$ \varvec{J}_{22,k\left( i \right)} = - E_{{q_{i} \left( {\varvec{\mu}_{k} ,\varvec{\varSigma}_{k} } \right)}} \left[ {\nabla_{{\varvec{\varSigma}_{k} }} \left[ {\nabla_{{\varvec{\varSigma}_{k} }} \left( {\ln q_{i} \left( {\varvec{\mu}_{k} ,{\varvec{\Sigma}}_{k} } \right)} \right)} \right]} \right]_{mm \times mm} , $$
(59)

with joint variational posterior distribution being

$$\begin{aligned} q_{i} \left( {\varvec{\mu}_{k} ,{\varvec{\Sigma}}_{k} } \right)& \propto \left[ {\prod\limits_{j = 1}^{m} {\left( {\sigma_{k,j}^{2} } \right)^{{ - \left( {\kappa_{k\left| k \right.,j}^{\left( i \right)} + \frac{3}{2}} \right)}} } } \right]\\ & \quad \times \exp \left\{ { - \frac{1}{{2u_{k}^{{\left( {i - 1} \right)}} }}\sum\limits_{j = 1}^{m} {\frac{{\beta_{k\left| k \right.}^{\left( i \right)} \left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)_{j}^{2} + 2\bar{\gamma }_{k\left| k \right.,j}^{\left( i \right)} }}{{\sigma_{k,j}^{2} }}} } \right\}, \end{aligned}$$
(60)

where \( \sigma_{k,j}^{2} \) is the \( j \)-th diagonal element of matrix \( {\varvec{\Sigma}}_{k} \), distributed according to \( q_{i} \left( {\sigma_{k,j}^{2} } \right) = {\text{IG}}\left( {\sigma_{k,j}^{2} \left| {\kappa_{k\left| k \right.,j}^{\left( i \right)} ,\gamma_{k\left| k \right.,j}^{\left( i \right)} } \right.} \right) \), with \( \kappa_{k\left| k \right.,j}^{\left( i \right)} = \bar{\kappa }_{k\left| k \right.,j}^{\left( i \right)} \) and \( \gamma_{k\left| k \right.,j}^{\left( i \right)} = {{\bar{\gamma }_{k\left| k \right.,j}^{\left( i \right)} } / {u_{k}^{{\left( {i - 1} \right)}} }} \). As \( \varvec{J}_{22,k\left( i \right)} \) is a block diagonal matrix involving \( m \) subblocks, the nonzero \( jj \)-th element of the \( j \)-th subblock is

$$\begin{aligned} \varvec{J}_{22,k\left( i \right)}^{jj} &= E_{{q_{i} \left( {\varvec{\mu}_{k,j} ,\sigma_{k,j}^{2} } \right)}} \left[ {\nabla_{{\sigma_{k,j}^{2} }} \left[ {\nabla_{{\sigma_{k,j}^{2} }} \left( {\vphantom{\frac{{\beta_{k\left| k \right.}^{\left( i \right)} \left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)_{j}^{2} + 2\bar{\gamma }_{k\left| k \right.,j}^{\left( i \right)} }}{{\sigma_{k,j}^{2} }}}\left( {\kappa_{k\left| k \right.,j}^{\left( i \right)} + \frac{3}{2}} \right)\ln \sigma_{k,j}^{2} } \right.} \right.} \right. \\ &\quad \quad + \left. {\left. {\left. {\frac{1}{{2u_{k}^{{\left( {i - 1} \right)}} }}\frac{{\beta_{k\left| k \right.}^{\left( i \right)} \left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)_{j}^{2} + 2\bar{\gamma }_{k\left| k \right.,j}^{\left( i \right)} }}{{\sigma_{k,j}^{2} }}} \right)} \right]} \right] \end{aligned} $$
(61)
$$ \begin{aligned} & = E_{{q_{i} \left( {\sigma_{k,j}^{2} } \right)}} \left[ {E_{{q_{i} \left( {\varvec{\mu}_{k,j} \left| {\sigma_{k,j}^{2} } \right.} \right)}} \left[ { - \frac{{2\kappa_{k\left| k \right.,j}^{\left( i \right)} + 3}}{{2\left( {\sigma_{k,j}^{2} } \right)^{2} }} + \frac{{\left( {\beta_{k\left| k \right.}^{\left( i \right)} \left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)_{j}^{2} + 2\bar{\gamma }_{k\left| k \right.,j}^{\left( i \right)} } \right)}}{{u_{k}^{{\left( {i - 1} \right)}} \left( {\sigma_{k,j}^{2} } \right)^{3} }}} \right]} \right] \\ & = - \frac{1}{2}E_{{q_{i} \left( {\sigma_{k,j}^{2} } \right)}} \left[ {\frac{{2\kappa_{k\left| k \right.,j}^{\left( i \right)} + 3}}{{\left( {\sigma_{k,j}^{2} } \right)^{2} }}} \right] + E_{{q_{i} \left( {\sigma_{k,j}^{2} } \right)}} \left[ {E_{{q\left( {\varvec{\mu}_{k,j} \left| {\sigma_{k,j}^{2} } \right.} \right)}} \left[ {\frac{{\left( {\beta_{k\left| k \right.}^{\left( i \right)} \left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)_{j}^{2} + 2\bar{\gamma }_{k\left| k \right.,j}^{\left( i \right)} } \right)}}{{u_{k}^{{\left( {i - 1} \right)}} \left( {\sigma_{k,j}^{2} } \right)^{3} }}} \right]} \right]. \\ \end{aligned} $$
(62)

Using the conditional covariance of \( \varvec{\mu}_{k,j} \) in Eq. (57), we obtain

$$ E_{{q\left( {\varvec{\mu}_{k,j} \left| {\sigma_{k,j}^{2} } \right.} \right)}} \left[ {\beta_{k\left| k \right.}^{\left( i \right)} \left( {\varvec{\mu}_{k} -\varvec{\eta}_{k\left| k \right.}^{\left( i \right)} } \right)_{j}^{2} } \right] = \beta_{k\left| k \right.}^{\left( i \right)} D\left[ {\varvec{\mu}_{k,j} } \right] = u_{k}^{{\left( {i - 1} \right)}} \sigma_{k,j}^{2} , $$
(63)

which, when substituted into Eq. (62), simplifies \( \varvec{J}_{22,k\left( i \right)}^{jj} \) to

$$ \varvec{J}_{22,k\left( i \right)}^{jj} = - \frac{1}{2}E_{{q_{i} \left( {\sigma_{k,j}^{2} } \right)}} \left[ {\frac{{2\kappa_{k\left| k \right.,j}^{\left( i \right)} + 1}}{{\left( {\sigma_{k,j}^{2} } \right)^{2} }}} \right] + E_{{q_{i} \left( {\sigma_{k,j}^{2} } \right)}} \left[ {\frac{{2\gamma_{k\left| k \right.,j}^{\left( i \right)} }}{{\left( {\sigma_{k,j}^{2} } \right)^{3} }}} \right]. $$
(64)

Note that if a random variable \( \alpha \) follows the inverse Gamma distribution \( {\text{IG}}\left( {\alpha \left| {c,d} \right.} \right) \), where \( c \) and \( d \) are the shape and scale parameters, respectively, the following general result holds for its negative moments:

$$ E_{{\text{IG}\left( \alpha \right)}} \left[ {\frac{1}{{\alpha^{n} }}} \right] = \frac{{\Gamma \left( {c + n} \right)}}{{d^{n} \Gamma \left( c \right)}}. $$
(65)

Setting \( \alpha = \sigma_{k,j}^{2} \), \( c = \kappa_{k\left| k \right.,j}^{\left( i \right)} \), \( d = \gamma_{k\left| k \right.,j}^{\left( i \right)} \), and \( n = 2 \), we obtain

$$ E_{{q_{i} \left( {\sigma_{k,j}^{2} } \right)}} \left[ {\frac{1}{{\left( {\sigma_{k,j}^{2} } \right)^{2} }}} \right] = \frac{{\Gamma \left( {\kappa_{k\left| k \right.,j}^{\left( i \right)} + 2} \right)}}{{\Gamma \left( {\kappa_{k\left| k \right.,j}^{\left( i \right)} } \right)\left( {\gamma_{k\left| k \right.,j}^{\left( i \right)} } \right)^{2} }} = \frac{{\kappa_{k\left| k \right.,j}^{\left( i \right)} \left( {\kappa_{k\left| k \right.,j}^{\left( i \right)} + 1} \right)}}{{\left( {\gamma_{k\left| k \right.,j}^{\left( i \right)} } \right)^{2} }}. $$
(66)

Similarly, for \( n = 3 \) we obtain

$$ E_{{q_{i} \left( {\sigma_{k,j}^{2} } \right)}} \left[ {\frac{{2\gamma_{k\left| k \right.,j}^{\left( i \right)} }}{{\left( {\sigma_{k,j}^{2} } \right)^{3} }}} \right] = \frac{{2\kappa_{k\left| k \right.,j}^{\left( i \right)} \left( {\kappa_{k\left| k \right.,j}^{\left( i \right)} + 1} \right)\left( {\kappa_{k\left| k \right.,j}^{\left( i \right)} + 2} \right)}}{{\left( {\gamma_{k\left| k \right.,j}^{\left( i \right)} } \right)^{2} }}. $$
(67)

Substituting Eqs. (66) and (67) into Eq. (64) and performing simple algebraic manipulation, we obtain

$$ \varvec{J}_{22,k\left( i \right)}^{jj} = \frac{{\kappa_{k\left| k \right.,j}^{\left( i \right)} \left( {\kappa_{k\left| k \right.,j}^{\left( i \right)} + 1} \right)\left( {2\kappa_{k\left| k \right.,j}^{\left( i \right)} + 7} \right)}}{{2\left( {\gamma_{k\left| k \right.,j}^{\left( i \right)} } \right)^{2} }}. $$
(68)

Hence, the asymptotic Bayesian CRLB on \( \sigma_{k,j}^{2} \) is \( ABCLB_{i} \left( {\sigma_{k,j}^{2} } \right) = \left( {\varvec{J}_{22,k\left( i \right)}^{jj} } \right)^{ - 1} . \)
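
As a sanity check, the moment identity of Eq. (65), the Fisher information of Eq. (68), and the mean bound of Eq. (58) can be verified numerically. The following Python sketch is our illustration, not part of the original derivation: it samples from the Normal-inverse-Gamma structure implied by Eq. (60), i.e., \( q_{i} \left( {\sigma_{k,j}^{2} } \right) = {\text{IG}}\left( {\kappa ,\bar{\gamma }/u} \right) \) and \( q_{i} \left( {\varvec{\mu}_{k,j} \left| {\sigma_{k,j}^{2} } \right.} \right) = N\left( {\eta ,u\sigma^{2} /\beta } \right) \), with arbitrary hyperparameter values.

```python
# Monte Carlo sanity check (illustrative, not the authors' code) of
# Eqs. (58), (65), and (68); all hyperparameter values are arbitrary.
import numpy as np
from scipy.stats import invgamma
from scipy.special import gamma as gamma_fn

rng = np.random.default_rng(0)
kappa, gamma_bar, u, beta, eta = 5.0, 2.0, 1.3, 2.0, 0.0
gam = gamma_bar / u                  # scale of q_i(sigma_{k,j}^2)

sigma2 = invgamma.rvs(kappa, scale=gam, size=1_000_000, random_state=rng)

# Eq. (65): E[alpha^{-n}] = Gamma(c + n) / (d^n * Gamma(c))
for n in (2, 3):
    mc = np.mean(sigma2 ** (-n))
    closed = gamma_fn(kappa + n) / (gam ** n * gamma_fn(kappa))
    print(f"n={n}: Monte Carlo {mc:.4f}, closed form {closed:.4f}")

# Eq. (68) and its inverse, the bound on sigma_{k,j}^2
J22 = kappa * (kappa + 1) * (2 * kappa + 7) / (2 * gam ** 2)
print("ABCLB(sigma^2):", 1.0 / J22)

# Eq. (58): the marginal variance of mu_j under the Normal-inverse-Gamma
# posterior should match gamma_bar / (beta * (kappa_bar - 1))
mu = rng.normal(eta, np.sqrt(u * sigma2 / beta))
print("Var(mu):", mu.var(), "vs ABCLB(mu):", gamma_bar / (beta * (kappa - 1)))
```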

To determine the asymptotic Bayesian CRLB of parameter \( \nu \), we use the logarithm of the VPD in Eq. (30). By Eq. (38), the VB information value of \( \nu \) is

$$ {\text{VBI}}_{k\left( i \right)} \left( {\nu_{k} } \right) = - E_{{q_{i} \left( {\nu_{k} } \right)}} \left[ {\frac{{\partial^{2} \ln q_{i} \left( {\nu_{k} } \right)}}{{\partial \nu_{k}^{2} }}} \right]. $$
(69)

Omitting the terms not containing \( \nu_{k} \), the first derivative of \( \ln q_{i} \left( {\nu_{k} } \right) \) is given by

$$ \frac{{\partial \ln q_{i} \left( {\nu_{k} } \right)}}{{\partial \nu_{k} }} = \frac{1}{2}\ln \nu_{k} - \frac{{\Gamma^{\prime } \left( {\frac{{\nu_{k} }}{2}} \right)}}{{\Gamma \left( {\frac{{\nu_{k} }}{2}} \right)}} + \frac{{a_{{k\left| {k - 1} \right.}} - 1}}{{\nu_{k} }}, $$
(70)

and its second derivative by

$$ \frac{{\partial^{2} \ln q_{i} \left( {\nu_{k} } \right)}}{{\partial \nu_{k}^{2} }} = \frac{1}{2}\frac{1}{{\nu_{k} }} + \frac{{1 - a_{{k\left| {k - 1} \right.}} }}{{\nu_{k}^{2} }} -\Psi ^{{\prime }} \left( {\frac{{\nu_{k} }}{2}} \right), $$
(71)

where \( \Psi^{\prime } \left( \cdot \right) \) denotes the first derivative of the digamma function (i.e., the trigamma function). Then, substituting \( a_{k\left| k \right.}^{\left( i \right)} \) from Eq. (31) into Eq. (71), Eq. (69) becomes

$$ {\text{VBI}}_{k\left( i \right)} \left( {\nu_{k} } \right) = - \frac{1}{2}E_{{q_{i} \left( {\nu_{k} } \right)}} \left[ {\frac{1}{{\nu_{k} }}} \right] - E_{{q_{i} \left( {\nu_{k} } \right)}} \left[ {\frac{{\frac{3}{2} - a_{k\left| k \right.}^{\left( i \right)} }}{{\nu_{k}^{2} }}} \right] + E_{{q_{i} \left( {\nu_{k} } \right)}} \left[ {\Psi^{\prime } \left( {\frac{{\nu_{k} }}{2}} \right)} \right]. $$
(72)

Using the negative moments of the Gamma distribution, we obtain

$$ E_{{q_{i} \left( {\nu_{k} } \right)}} \left[ {\frac{1}{{\nu_{k} }}} \right] = \frac{{b_{k\left| k \right.}^{\left( i \right)} }}{{a_{k\left| k \right.}^{\left( i \right)} - 1}},\quad E_{{q_{i} \left( {\nu_{k} } \right)}} \left[ {\frac{1}{{\nu_{k}^{2} }}} \right] = \frac{{\left( {b_{k\left| k \right.}^{\left( i \right)} } \right)^{2} }}{{\left( {a_{k\left| k \right.}^{\left( i \right)} - 1} \right)\left( {a_{k\left| k \right.}^{\left( i \right)} - 2} \right)}}, $$
(73)

which, substituted into Eq. (72), results in

$$ {\text{VBI}}_{k\left( i \right)} \left( {\nu_{k} } \right) = E_{{q_{i} \left( {\nu_{k} } \right)}} \left[ {\Psi^{\prime } \left( {\frac{{\nu_{k} }}{2}} \right)} \right] - \frac{1}{2}\frac{{b_{k\left| k \right.}^{\left( i \right)} }}{{a_{k\left| k \right.}^{\left( i \right)} - 1}} - \frac{{\left( {b_{k\left| k \right.}^{\left( i \right)} } \right)^{2} \left( {\frac{3}{2} - a_{k\left| k \right.}^{\left( i \right)} } \right)}}{{\left( {a_{k\left| k \right.}^{\left( i \right)} - 1} \right)\left( {a_{k\left| k \right.}^{\left( i \right)} - 2} \right)}}. $$
(74)

Hence, the asymptotic Bayesian CRLB on \( \nu \) is \( ABCLB_{i} \left( {\nu_{k} } \right) = 1/{\text{VBI}}_{k\left( i \right)} \left( {\nu_{k} } \right) \).
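
This bound can be evaluated numerically once \( a_{k\left| k \right.}^{\left( i \right)} \) and \( b_{k\left| k \right.}^{\left( i \right)} \) are available. The sketch below is an illustration under the assumption that \( q_{i} \left( {\nu_{k} } \right) \) is a Gamma density with shape \( a \) and rate \( b \), consistent with the moments in Eq. (73); the numeric values are arbitrary, and the trigamma expectation in Eq. (74) is approximated by Monte Carlo.

```python
# Evaluate Eq. (74): closed-form moment terms plus a Monte Carlo
# approximation of E[trigamma(nu/2)] over q_i(nu) = Gamma(a, rate b).
# Requires a > 2 for the moments of Eq. (73) to exist.
import numpy as np
from scipy.special import polygamma

rng = np.random.default_rng(0)
a, b = 6.0, 1.5                          # arbitrary illustrative values

nu = rng.gamma(shape=a, scale=1.0 / b, size=1_000_000)
trigamma_term = np.mean(polygamma(1, nu / 2.0))   # E[Psi'(nu/2)]

vbi = (trigamma_term
       - 0.5 * b / (a - 1.0)                            # -(1/2) E[1/nu]
       - b ** 2 * (1.5 - a) / ((a - 1.0) * (a - 2.0)))  # -(3/2 - a) E[1/nu^2]
print("ABCLB(nu):", 1.0 / vbi)
```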

For parameter \( \varvec{x}_{k} \), when the variational posterior \( q\left( {\varvec{\theta}\left| \varvec{Z} \right.} \right) \) converges to the true posterior \( p\left( {\varvec{\theta}\left| \varvec{Z} \right.} \right) \), \( \hat{\varvec{x}}_{k\left| k \right.}^{\left( i \right)} \) asymptotically attains the asymptotic Bayesian CRLB corresponding to the following Fisher information matrix:

$$ \varvec{J}_{k\left( i \right)}^{{\varvec{xx}}} = - E_{{q_{i} \left( \varvec{x} \right)q_{i - 1} \left( \theta \right)}} \left[ {\frac{{\partial^{2} \ln q\left( {\varvec{\theta}_{k} ,\varvec{x}_{k} ,\varvec{z}_{k} } \right)}}{{\partial \varvec{x}_{k} \partial \varvec{x}_{k}^{T} }}} \right] = - E_{{q_{i} \left( \varvec{x} \right)q_{i - 1} \left( \theta \right)}} \left[ {\frac{{\partial^{2} \ln q\left( {\varvec{x}_{k} \left| {\varvec{\theta}_{k} ,\varvec{z}_{k} } \right.} \right)}}{{\partial \varvec{x}_{k} \partial \varvec{x}_{k}^{T} }}} \right], $$
(75)

which is a function of \( q_{i - 1} \left( {\varvec{\theta}_{k} } \right) \) only, given the quadratic nature of \( \ln q\left( {\varvec{x}_{k} \left| {\varvec{\theta}_{k} ,\varvec{z}_{k} } \right.} \right) \) in \( \varvec{x}_{k} \); hence, by Eq. (15) we obtain

$$ \varvec{J}_{k\left( i \right)}^{{\varvec{xx}}} = E_{{q_{i - 1} \left( \theta \right)}} \left[ {\left( {\varvec{P}_{k\left| k \right.}^{\left( i \right)} \left( {\varvec{\theta}_{k} } \right)} \right)^{ - 1} } \right], $$
(76)

where \( \varvec{\theta}_{k} \) includes \( u_{k} \) and \( {\varvec{\Lambda}}_{k} \), as shown in Eq. (16). Then, we obtain \( ABCLB_{i} \left( {\varvec{x}_{k} } \right) = \left( {\varvec{J}_{k\left( i \right)}^{{\varvec{xx}}} } \right)^{ - 1} \), which is identical to the derived covariance matrix of \( \varvec{x}_{k} \), indicating that the VB update for \( \varvec{x}_{k} \) extends minimum MSE estimation to the case in which the exact estimate is intractable owing to model parameter inaccuracies.
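
In practice, the expectation in Eq. (76) rarely has a closed form, but it can be approximated by averaging inverse covariances over draws of \( \varvec{\theta}_{k} \). The sketch below is a generic illustration; `sample_theta` and `posterior_cov` are hypothetical stand-ins for a sampler of \( q_{i - 1} \left( {\varvec{\theta}_{k} } \right) \) and the map \( \varvec{\theta}_{k} \mapsto \varvec{P}_{k\left| k \right.}^{\left( i \right)} \left( {\varvec{\theta}_{k} } \right) \).

```python
# Monte Carlo approximation of Eq. (76) and the resulting ABCLB on x_k.
# `sample_theta` and `posterior_cov` are hypothetical user-supplied callables.
import numpy as np

def abclb_x(sample_theta, posterior_cov, n_draws=10_000):
    """Return ABCLB_i(x_k) = (E_{q_{i-1}(theta)}[P_{k|k}(theta)^{-1}])^{-1}."""
    acc = None
    for _ in range(n_draws):
        p = posterior_cov(sample_theta())   # P_{k|k} for one draw of theta
        inv = np.linalg.inv(p)
        acc = inv if acc is None else acc + inv
    return np.linalg.inv(acc / n_draws)     # invert the averaged information
```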

5 Results and Discussion

We conducted a numerical simulation to test the performance of the proposed method and verify the derived asymptotic Bayesian CRLB. Then, a SINS/DVL integrated navigation experiment was conducted to further evaluate the proposed filter.

5.1 Stochastic Resonator Model

For the first simulation, we used a randomly drifting stochastic resonator [17], a typical signal detection model widely used for sensor measurements in areas such as navigation information acquisition, spread spectrum communication, and biological instrumentation. The state-space model is expressed as

$$\begin{aligned} \left[ {\begin{array}{*{20}c} {\varvec{x}_{k,1} } \\ {\varvec{x}_{k,2} } \\ {\varvec{x}_{k,3} } \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 1 & 0 & 0 \\ 0 & {\cos \left( {\omega \Delta t} \right)} & {\frac{{\sin \left( {\omega \Delta t} \right)}}{\omega }} \\ 0 & { - \omega \sin \left( {\omega \Delta t} \right)} & {\cos \left( {\omega \Delta t} \right)} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {\varvec{x}_{k - 1,1} } \\ {\varvec{x}_{k - 1,2} } \\ {\varvec{x}_{k - 1,3} } \\ \end{array} } \right] + \varvec{w}_{k} , \end{aligned}$$
(77)

and the measurement model as

$$\begin{aligned} \varvec{z}_{k} = \left[ {\begin{array}{*{20}c} 0 & 0 & 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {\varvec{x}_{k,1} } \\ {\varvec{x}_{k,2} } \\ {\varvec{x}_{k,3} } \\ \end{array} } \right] + \varvec{v}_{k} , \end{aligned} $$
(78)

where \( \varvec{w}_{k} = \left[ {\begin{array}{*{20}c} {\varvec{w}_{k,1} } & {\varvec{w}_{k,2} } & {\varvec{w}_{k,3} } \\ \end{array} } \right]^{\text{T}} \), the measurement noise \( \varvec{v}_{k} \) is an unknown random variable, the sampling time is \( \Delta t = 0.4 \) s, and the angular velocity is ω = 0.05 rad/s. The initial state was assumed to follow Gaussian distribution \( \varvec{x}_{0,i} \sim N\left( {0,1} \right) \) for \( i = 1,2,3 \). In addition, the process noise was modeled as zero-mean Gaussian with covariance matrix \( \varvec{Q}_{0} = {\text{diag}}\left( {\left[ {\begin{array}{*{20}c} {0.01} & {0.01} & {0.01} \\ \end{array} } \right]} \right) \), and the measurement noise was modeled as a Student’s t-distribution.

We evaluated the robustness of the proposed algorithm considering Gaussian measurement noise with varying mean and variance over time. Specifically, for \( t \in \left[ {1\,{\text{s}},100\,{\text{s}}} \right] \) and \( t \in \left[ {601\,{\text{s}},1000\,{\text{s}}} \right] \), \( \varvec{v}\left( t \right) \sim N\left( {{\mathbf{0}},\varvec{R}} \right) \) with \( \varvec{R} = 1 \), whereas for \( t \in \left[ {101\,{\text{s}},600\,{\text{s}}} \right] \), \( \varvec{v}\left( t \right) \sim N\left( {6,25 \times \varvec{R}} \right) \). Then, we performed state and parameter estimation for the mean and variance of the measurement noise using the proposed algorithm. We set the number of iterations for convergence to 10.
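
For reproducibility, the simulation setup can be summarized in code. The following sketch transcribes Eqs. (77)-(78) and the noise schedule above; the implementation itself is our illustration, not the authors' code.

```python
# One realization of the stochastic resonator of Eqs. (77)-(78) with the
# time-varying Gaussian measurement noise described in the text.
import numpy as np

rng = np.random.default_rng(0)
dt, w = 0.4, 0.05                      # sampling time [s], angular velocity [rad/s]
F = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(w * dt), np.sin(w * dt) / w],
              [0.0, -w * np.sin(w * dt), np.cos(w * dt)]])
H = np.array([[0.0, 0.0, 1.0]])
Q0 = np.diag([0.01, 0.01, 0.01])       # process noise covariance

x = rng.standard_normal(3)             # x_0 ~ N(0, I)
xs, zs = [], []
for k in range(int(1000 / dt)):        # simulate 1000 s
    t = (k + 1) * dt
    x = F @ x + rng.multivariate_normal(np.zeros(3), Q0)
    if 101.0 <= t <= 600.0:
        v = rng.normal(6.0, 5.0)       # N(6, 25 R) with R = 1, i.e., std 5
    else:
        v = rng.normal(0.0, 1.0)       # N(0, R)
    xs.append(x.copy())
    zs.append((H @ x).item() + v)
```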

Figure 2 shows the estimated and true mean, indicating that the proposed filter can gradually track the true value from 100 to 600 s. Moreover, when the distribution changes at 600 s, the filter adapts quickly. Figure 3 depicts the variance estimate and compares it with that obtained from the VB-KF proposed in [17]. When the true variance changes, the proposed filter follows the change more quickly than the VB-KF. However, when the variance remains unchanged from 100 to 600 s, the estimates of the proposed filter oscillate more intensely than those of the VB-KF, indicating lower estimation consistency in this respect.

Fig. 2 True mean and estimated mean by applying the proposed filter to a stochastic resonator model

Fig. 3 True variance and estimated variance by applying the proposed filter to a stochastic resonator model

Figure 4 and Table 1, respectively, show the estimation error curves and root MSE (RMSE) of the states. For constant Gaussian noise, the state estimation results using the VB methods outperform those using the conventional KF, and the performance of the proposed filter is superior to that of the VB-KF, which can be attributed to the inclusion of all noise parameters in the estimation of the proposed filter.

Fig. 4 State error from applying the proposed filter and similar methods to a stochastic resonator model

Table 1 RMSE of states by applying the proposed filter and similar methods to a stochastic resonator model

Figures 5 and 6 show the calculated RMSE of the mean and variance estimates, respectively, together with the theoretical \( \sqrt {\varvec{ABCLB}} \). The results verify that the RMSE closely approaches the asymptotic Bayesian CRLB of the posterior distributions of mean and variance under Gaussian and even non-Gaussian noise. Note that the ABCLB provides an online MSE performance bound for parameter estimation.

Fig. 5 Mean and corresponding \( \sqrt {\varvec{ABCLB}} \) of simulations

Fig. 6 Variance and corresponding \( \sqrt {\varvec{ABCLB}} \) of simulations

5.2 Experiments on Navigation Data

We also evaluated the proposed filter on a SINS/DVL shipborne test for velocity estimation and compared it to the conventional KF and VB-KF. The experimental platform is composed of a SINS, a DVL, and a GPS receiver. The body angular rate and specific force were measured by gyroscopes and accelerometers, respectively, at a rate of 100 Hz, and GPS data were sampled at 1 Hz to provide accurate position and velocity information for the integrated SINS/DVL.

The filter was indirectly applied by considering navigation parameter errors as system state variables and using output correction to modify the SINS parameters. The dynamic model of the SINS/DVL navigation error is given by

$$ \dot{\varvec{X}}\left( t \right) = \varvec{A}\left( t \right)\varvec{X}\left( t \right) + \varvec{B}(t)\varvec{w}\left( t \right), $$
(79)

where state vector \( \varvec{X} \) is composed of seven navigation error variables, \( \varvec{A} \) is the state transition matrix of the system, and \( \varvec{B} \) is the noise matrix, as detailed in [7]. Variable \( \varvec{w} \) represents the zero-mean Gaussian process noise with covariance matrix \( \varvec{Q} \), which models the inertial sensor biases of the accelerometers and gyroscopes. The state and noise vectors are given by

$$ \begin{aligned} \varvec{X}\left( t \right) & = \left[ {\begin{array}{*{20}c} {\delta L} & {\delta \lambda } & {\delta V_{e} } & {\delta V_{n} } & {\varphi_{e} } & {\varphi_{n} } & {\varphi_{u} } \\ \end{array} } \right]^{\text{T}} \\ \varvec{w}\left( t \right) & = \left[ {\begin{array}{*{20}c} 0 & 0 & {w_{ax} } & {w_{ay} } & {w_{gx} } & {w_{gy} } & {w_{gz} } \\ \end{array} } \right]^{\text{T}} , \\ \end{aligned} $$
(80)

where \( \delta L \) and \( \delta \lambda \) are the latitude and longitude errors, respectively, \( \delta V_{e} \) and \( \delta V_{n} \) are velocity errors, subscripts e, n, and u denote the east, north, and up components in the navigation frame, \( \varphi_{e} \), \( \varphi_{n} \), and \( \varphi_{u} \) are the pitch, roll, and heading errors, respectively, and \( x,y \), and \( z \) denote the right, front, and up components in the body frame, respectively. The transformation from the body frame to the navigation frame is given by direction cosine matrix \( C_{b}^{n} \) in [7].

The measurement model adopts a loosely coupled scheme: the measurements are the level velocity differences between the SINS and the DVL, formulated as

$$ \varvec{z}\left( t \right) = \varvec{Hx}\left( t \right) + \varvec{v}\left( t \right) = \left[ {\begin{array}{*{20}c} {v_{e}^{\text{SINS}} } \\ {v_{n}^{\text{SINS}} } \\ \end{array} } \right] - \left[ {\begin{array}{*{20}c} {v_{e}^{\text{DVL}} } \\ {v_{n}^{\text{DVL}} } \\ \end{array} } \right], $$
(81)

where \( \varvec{z} \) is the measurement vector, \( \varvec{H} = \left[ {\begin{array}{*{20}c} {{\mathbf{0}}_{2 \times 2} } & {\varvec{I}_{2 \times 2} } & {{\mathbf{0}}_{2 \times 3} } \\ \end{array} } \right] \) is the measurement matrix, and \( \varvec{v} \) is the measurement noise. The discretized form of dynamic Eq. (79) has the same form as the model in Eq. (1). The diagram of the SINS/DVL integrated navigation is shown in Fig. 7.
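
The measurement construction is straightforward; the following sketch illustrates Eq. (81), with placeholder velocity values (the numbers are hypothetical).

```python
# Loosely coupled measurement of Eq. (81): SINS-minus-DVL level velocities,
# with H selecting the velocity-error states of the 7-dimensional state.
import numpy as np

H = np.hstack([np.zeros((2, 2)), np.eye(2), np.zeros((2, 3))])  # 2 x 7

v_sins = np.array([6.02, -5.58])   # east/north velocity from SINS [m/s]
v_dvl = np.array([6.00, -5.60])    # east/north velocity from DVL [m/s]
z = v_sins - v_dvl                 # measurement vector z(t)
```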

Fig. 7 Diagram of integrated navigation system for evaluating the proposed filter (IMU, inertial measurement unit)

When measurements were available, we used the filter output to correct the SINS solutions. The voyage data were collected from a sea vehicle in the East China Sea along the trajectory of a long-endurance test lasting over 7 h. The SINS trajectory and the velocities from the DVL are shown in Figs. 8 and 9, respectively. The performance parameters of the gyroscopes and accelerometers are listed in Table 2.

Fig. 8 SINS trajectory of the sea vehicle for the filter experiments

Fig. 9 Velocity from the DVL after 2 h of sailing for the filter experiments

Table 2 Parameters of gyroscopes and accelerometers used to obtain data for testing the proposed filter

We tested only the SINS for the first 2 h and then integrated the DVL and GPS data. The DVL provides velocities with an accuracy of ±1% of the speed and updates at 10 Hz, which was also the filter update rate when measurements were available. In addition, the sea state during hours 4–5 of the voyage was rough, leading to the oscillations in the velocity measurements shown in Fig. 9, where non-Gaussian noise appears.

The initial latitude and longitude of the sea vehicle were 31.25°N and 121.76°E, respectively. The initial east and north velocities were 6 and − 5.6 m/s (i.e., heading south). The number of iterative updates of the VB-KF and the proposed algorithm was set to \( N = 10 \). The initial filter parameters were selected as \( \hat{\varvec{x}}_{0\left| 0 \right.} = {\mathbf{0}}_{7 \times 1} \), \( \varvec{P}_{0\left| 0 \right.} = {\text{diag}}\left\{ {\left( {1000\,{\text{m}}} \right)^{2} ,\left( {1000\,{\text{m}}} \right)^{2} ,\left( {0.1\,{\text{m/s}}} \right)^{2} ,\left( {0.1\,{\text{m/s}}} \right)^{2} ,\left( {0.1^{ \circ } } \right)^{2} ,\left( {0.1^{ \circ } } \right)^{2} ,\left( {0.1^{ \circ } } \right)^{2} } \right\} \), \( \varvec{\mu}_{0} = {\mathbf{0}}_{2 \times 1} \), \( \varvec{R}_{0} = {\text{diag}}\left\{ {\left( {0.5\,{\text{m/s}}} \right)^{2} ,\left( {0.5\,{\text{m/s}}} \right)^{2} } \right\} \), and \( \varvec{Q} = {\text{diag}}\left\{ {\left( {100\,\upmu{\text{g}}} \right)^{2} ,\left( {100\,\upmu{\text{g}}} \right)^{2} ,\left( {0.01^{ \circ } /{\text{h}}} \right)^{2} ,\left( {0.01^{ \circ } /{\text{h}}} \right)^{2} ,\left( {0.01^{ \circ } /{\text{h}}} \right)^{2} } \right\} \). For the proposed filter and VB-KF, the initial hyperparameters were set to \( \lambda_{0\left| 0 \right.} = 0.1 \), \( {\mathbf{U}}_{0\left| 0 \right.} = {\text{diag}}\left\{ {\left( {1\,{\text{m/s}}} \right)^{2} ,\left( {1\,{\text{m/s}}} \right)^{2} } \right\} \), \( a_{0\left| 0 \right.} = b_{0\left| 0 \right.} = 0.12 \), \( \varvec{\eta}_{0\left| 0 \right.} = \left[ {\begin{array}{*{20}c} {0.1\,{\text{m/s}}} & {0.1\,{\text{m/s}}} \\ \end{array} } \right]^{\text{T}} \), and \( \beta_{0\left| 0 \right.} = 2 \); the forgetting factor was set to \( \rho = 1 - \exp \left( { - 4} \right) \). The velocity errors of the three methods are compared in Fig. 10, and the corresponding RMSE values and running times are listed in Table 3.
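
For convenience, these settings can be transcribed as follows (a sketch; the conversions of µg and degree-based units to SI are our assumptions and should be checked against the original implementation).

```python
# Initialization constants for the SINS/DVL experiment, transcribed from the
# text. Unit conversions (100 ug, deg, deg/h) to SI are our assumption.
import numpy as np

DEG = np.pi / 180.0
G = 9.8                                      # gravity [m/s^2], assumed

x0 = np.zeros(7)                             # initial state estimate
P0 = np.diag([1000.0**2, 1000.0**2,          # position errors [m^2]
              0.1**2, 0.1**2,                # velocity errors [(m/s)^2]
              (0.1 * DEG)**2, (0.1 * DEG)**2, (0.1 * DEG)**2])  # attitude [rad^2]
mu0 = np.zeros(2)
R0 = np.diag([0.5**2, 0.5**2])               # [(m/s)^2]
# Per Eq. (80), the first two states carry no process noise; Q lists the
# accelerometer (100 ug) and gyroscope (0.01 deg/h) bias intensities.
Q = np.diag([(100e-6 * G)**2, (100e-6 * G)**2,
             (0.01 * DEG / 3600)**2, (0.01 * DEG / 3600)**2,
             (0.01 * DEG / 3600)**2])

# Hyperparameters for the proposed filter and VB-KF
lam0 = 0.1
U0 = np.diag([1.0**2, 1.0**2])               # [(m/s)^2]
a0 = b0 = 0.12
eta0 = np.array([0.1, 0.1])                  # [m/s]
beta0 = 2.0
rho = 1.0 - np.exp(-4.0)                     # forgetting factor
N_ITER = 10                                  # VB iterations per update
```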

Fig. 10 Velocity error comparison of evaluated methods applied on navigation data

Table 3 State RMSE and average running time of evaluated methods applied on navigation data

From Figs. 9 and 10 and Table 3, we can see that for Gaussian noise, the standard KF shows the best velocity estimation performance. However, the RMSE during hours 4–5 shows that the KF is sensitive to noise outliers and tends to diverge over some periods. In contrast, the two VB-based methods are more stable and perform better than the KF under outliers. For the east velocity, the proposed filter slightly outperforms the VB-KF, whereas for the north velocity, the VB-KF is slightly superior. Regarding average running time (Table 3), the proposed algorithm takes approximately 2–3 times the running time of the VB-KF and 7–8 times that of the KF, with all methods executed on a computer with an Intel Core i3-4170 CPU at 3.70 GHz.

6 Conclusion

We propose a linear approximate filter with parameter estimation under a Student-t measurement model. Specifically, we derive a variational recursive formula to determine the state and noise parameters. Then, the sequential variational CRLB is derived for the estimation variance of the proposed algorithm. Experiments on a simulated stochastic resonator and on measured SINS/DVL navigation data demonstrate that, under varying measurement noise parameters, the proposed method outperforms the comparison methods, and the derived CRLB agrees with the lower bound of the estimation variance.