1 Introduction

According to Turner and Rasmussen (2012), three of the most widely used approximate solutions to the problem of nonlinear filtering are the unscented Kalman filter (UKF; Julier and Uhlmann 1997), the extended Kalman filter (EKF; Maybeck 1979) and the cubature Kalman filter (CKF; Arasaratnam and Haykin 2009).

The EKF has been the most popular nonlinear filter of the last decades (Psiaki 2013). It is nonetheless known that the first-order Taylor series approximations used by the filter can hinder its applicability in situations where nonlinearities are significant, or when the necessary Jacobians are difficult to obtain. In spite of its known shortcomings, the EKF is still popular, probably due to its very low computational cost.

The CKF was described by its authors as being numerically more accurate and stable than the UKF [see Section VII of Arasaratnam and Haykin (2009)]. It is known, however, that the CKF reduces to a particular case of the UKF (Turner and Rasmussen 2012), namely the one obtained with \(\alpha =1\), \(\beta =0\) and \(\kappa = 0\).

The UKF is known to be able to achieve better estimation performance than the EKF, although both filters display the same order of computational complexity (Wan and van der Merwe 2000). To yield better state estimates than the EKF, the UKF resorts to the unscented transformation (UT; Julier and Industries 2002) to propagate the mean and covariance of the state estimate through the nonlinear functions of the dynamic system. To work well, the UT demands the tuning of its three scalar parameters (\(\alpha ,~\beta \) and \(\kappa \)), a difficult task given the lack of comprehensive theoretical guidance for picking the parameter values. A common tuning strategy, given in Julier et al. (2000), is to make \(\kappa = 3 - n_{x}\), where \(n_x\) is the dimension of the state vector. We call the UKF tuned with this heuristic the default UKF (UkfD). This strategy, though, may produce poor results.

To fulfill the need for more effective tuning strategies, a number of recently published works approach the tuning of the UT as a numerical optimization problem.

Abbeel et al. (2005) propose to use machine learning algorithms (Barber 2012) to automatically tune the covariance matrices of both the process and measurement noises of a Kalman filter. Despite not dealing with the UT, this reference is important because it approaches the tuning of the noise covariance matrices as an optimization problem guided by the maximization of the likelihood of the measurements taken from the dynamical system. The maximization of such likelihood was later used in a number of UT tuning works (Dunik et al. 2012; Turner and Rasmussen 2012; Straka et al. 2014).

In Sakai and Kuroda (2010), values for both the UT parameters and the noise covariance matrices are obtained by offline optimization. The optimization process, nonetheless, demands the availability of highly accurate measurements of the state vector or at least of part of it, reducing the applicability of the approach.

An online adaptation strategy of the scaling parameter \(\kappa \) is proposed in Dunik et al. (2010, 2012), Straka et al. (2012, 2014) and Scardua and da Cruz (2015). Under the proposed strategy, the scaling parameter \(\kappa \) is selected from a discrete set of possible values each time a measurement is received by the filter, while keeping \(\alpha =1\) and \(\beta =0\). The chosen value is the one that maximizes a given performance criterion, such as the likelihood of the measurements received by the filter. The proposed strategy is claimed to be able to cope with changes in the operating point of the dynamical system, but the selection of values for \(\kappa \) demands the execution of extra measurement update steps, entailing additional computational cost at runtime.

The machinery of model-based optimization (Forrester et al. 2008) was used for tuning all three UT parameters by Turner and Rasmussen (2010) and Turner and Rasmussen (2012). The optimization seeks to find values for the UT parameters that maximize the popular upper confidence bound (UCB) criterion (Cox and John 1997). The only data needed from the original dynamical system are the measurements which would be regularly available for the filter. The proposed approach does not increase the computational cost of the UKF, but confidence bound criteria are known to be sensitive to the tuning of the parameter that controls the exploration/exploitation trade-off (Jones 2001). Moreover, the algorithms involved in model-based optimization can be relatively complex to implement. We call UkfO the UKF that stems from this optimization approach.

We also approach the tuning of the UT parameters as an optimization problem. We propose a search method that is based on ideas of the bootstrap filter (BSF; Gordon et al. 1993). The optimization process is guided solely by noisy measurements of the nonlinear dynamic system that is to be filtered. The search space is a cube in the Euclidean space, with each dimension corresponding to one of the UT parameters. The points in the cube are called particles. The tuning is performed offline, thus incurring no extra cost for the filter during runtime. We call PfUkf the UKF tuned with the proposed optimizer.

The rest of this paper is organized as follows. Section 2 gives the necessary background on the UT, showing how the UT parameters influence the estimates produced by the UKF. Section 3 describes how the tuning of the UT parameters was approached as an optimization problem. Section 4 justifies why a direct search algorithm would be suitable to the tuning of the UT. Section 5 describes the optimization algorithm developed for the task. Section 6 describes the numerical testing of the filters, and Sect. 7 presents the final comments.

2 The Unscented Kalman Filter

The discrete-time nonlinear filtering problem consists in estimating \(p(\mathbf {x}_{k}|\mathbf {y}_{1:k})\), the marginal distribution of the current state \(\mathbf {x}_{k}\) of a discrete-time dynamic system, given the sequence of measurements \(\mathbf {y}_{1:k}=\{\mathbf {y}_{1},\ldots ,\mathbf {y}_{k}\}\).

The sequential Bayesian filter (Candy 2009) provides the optimal solution to the discrete-time nonlinear filtering problem. To this end, the filter alternates between prediction and correction steps. The prediction step estimates \(p(\mathbf {x}_k|\mathbf {y}_{1:k-1})\), the conditional probability distribution of the state at time instant k, given both the measurements available at time instant \(k-1\) and the transition probability \(p(\mathbf {x_k|x_{k-1}})\), which is determined by the dynamic model. The correction step uses measurement \(\mathbf {y}_k\) and the distribution \(p(\mathbf {y_{k}|x_{k}})\) given by the measurement model to generate \(p(\mathbf {x}_k|\mathbf {y}_{1:k})\), the corrected conditional probability distribution of \(\mathbf {x}_k\). The sequential Bayesian filter equations are

$$\begin{aligned} p(\mathbf {x}_k|\mathbf {y}_{1:k-1})= & {} \int p(\mathbf {x}_k|\mathbf {x}_{k-1})p(\mathbf {x}_{k-1}|\mathbf {y}_{1:k-1})\mathrm{d}\mathbf {x}_{k-1} \nonumber \\ p(\mathbf {y}_k|\mathbf {y}_{1:k-1})= & {} \int p(\mathbf {y}_k|\mathbf {x}_k)p(\mathbf {x}_k|\mathbf {y}_{1:k-1})\mathrm{d}\mathbf {x}_k \nonumber \\ p(\mathbf {x}_k|\mathbf {y}_{1:k})= & {} \frac{p(\mathbf {y}_k|\mathbf {x}_k)p(\mathbf {x}_k|\mathbf {y}_{1:k-1})}{p(\mathbf {y}_k|\mathbf {y}_{1:k-1})}\,. \end{aligned}$$
(1)
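For a scalar state, the recursion in (1) can be evaluated numerically by replacing the integrals with sums over a fixed grid. The following is a minimal sketch of that idea, assuming additive Gaussian noises; the function names (`norm_pdf`, `bayes_step`), the grid limits and the toy linear model are illustrative choices, not part of the cited references.

```python
import numpy as np

def norm_pdf(z, std):
    """Density of a zero-mean Gaussian with standard deviation std."""
    return np.exp(-0.5 * (z / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def bayes_step(grid, post_prev, y, f, h, q_std, r_std):
    """One prediction/correction cycle of (1) on a uniform grid."""
    dx = grid[1] - grid[0]
    # Prediction: integrate p(x_k | x_{k-1}) p(x_{k-1} | y_{1:k-1}) over x_{k-1}
    trans = norm_pdf(grid[:, None] - f(grid)[None, :], q_std)  # trans[i, j] = p(x_i | x_j)
    prior = trans @ post_prev * dx
    # Correction: multiply by the likelihood and divide by p(y_k | y_{1:k-1})
    unnormalized = norm_pdf(y - h(grid), r_std) * prior
    evidence = unnormalized.sum() * dx
    return unnormalized / evidence, evidence

# Toy linear-Gaussian check (hypothetical numbers): x_k = 0.9 x_{k-1} + w, y_k = x_k + v
grid = np.linspace(-10.0, 10.0, 801)
dx = grid[1] - grid[0]
post0 = norm_pdf(grid, 1.0)                      # x_0 ~ N(0, 1)
post, evidence = bayes_step(grid, post0, 1.0,
                            lambda x: 0.9 * x, lambda x: x, 0.5, 0.5)
post_mean = np.sum(grid * post) * dx
```

In this linear case the Kalman filter gives a posterior mean of \(1.06/1.31\), which the grid solution should approach. The cost of the grid grows exponentially with the state dimension, which is one reason approximate filters are used in practice.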

A closed-form solution to (1) can only be found in specific situations. One such situation arises when the dynamic system is linear with additive Gaussian noise. In this case, the required distributions are Gaussian and the solution is the Kalman filter [Section 5.3 of Candy (2009)]. When the dynamic system is nonlinear, it is necessary to resort to approximate solutions.

One type of approximate solution to (1) is obtained when the probability distributions involved are assumed to be Gaussian. This approach yields the so-called Gaussian filters (Wu et al. 2006). It then suffices to calculate the necessary means and covariances, in order to completely specify such distributions.

Gaussian filters feature a general form, from which stem a number of popular nonlinear filters (Wu et al. 2006), such as the EKF, the CKF and the UKF. To provide further context to the present discussion, let us define that the discrete-time nonlinear dynamic system model used in the nonlinear filtering problem is given by

$$\begin{aligned} \mathbf {x}_k= & {} f(\mathbf {x}_{k-1}) + \mathbf {w}_{k-1}\nonumber \\ \mathbf {y}_{k}= & {} h(\mathbf {x}_k) + \mathbf {v}_k \,, \end{aligned}$$
(2)

where \(f(\cdot )\) and \(h(\cdot )\) are known functions, \(\mathbf {w}_{k-1}\sim \mathcal {N}(\mathbf {0},\mathbf {Q})\) is the process noise, and \(\mathbf {v}_{k}\sim \mathcal {N}(\mathbf {0},\mathbf {R})\) is the measurement noise. Both noises are white and uncorrelated with each other and with the initial system state.

The general Gaussian filter uses a Kalman-like structure to completely address the estimation task for system (2) (Wu et al. 2006). The equations of the additive noise general Gaussian filter are [Algorithm 6.3 of Sarkka (2013)]:

Prediction

$$\begin{aligned} \mathbf {m}_{k}^{-}= & {} \int f\left( \mathbf {x}_{k-1}\right) \mathcal {N}\left( \mathbf {x}_{k-1}|\mathbf {m}_{k-1},\mathbf {P}_{k-1}\right) \mathrm{d}\mathbf {x}_{k-1}\nonumber \\ \mathbf {P}_{k}^{-}= & {} \int \left( f\left( \mathbf {x}_{k-1}\right) -\mathbf {m}_{k}^{-}\right) \left( f\left( \mathbf {x}_{k-1}\right) -\mathbf {m}_{k}^{-}\right) ^{T} \nonumber \\&\mathcal {N} \left( \mathbf {x}_{k-1}|\mathbf {m}_{k-1},\mathbf {P}_{k-1}\right) \mathrm{d}\mathbf {x}_{k-1} + \mathbf {Q}_{k-1} \end{aligned}$$
(3)

Update

$$\begin{aligned} {\varvec{\mu }}_{k}= & {} \int h\left( \mathbf {x}_{k}\right) \mathcal {N}\left( \mathbf {x}_{k}|\mathbf {m}_{k}^{-},\mathbf {P}_{k}^{-}\right) \mathrm{d}\mathbf {x}_{k}\nonumber \\ \mathbf {S}_{k}= & {} \int \left( h\left( \mathbf {x}_{k}\right) -{\varvec{\mu }}_{k}\right) \left( h\left( \mathbf {x}_{k}\right) -{\varvec{\mu }}_{k}\right) ^{T}\mathcal {N}\left( \mathbf {x}_{k}|\mathbf {m}_{k}^{-},\mathbf {P}_{k}^{-}\right) \mathrm{d}\mathbf {x}_{k}\nonumber \\&+\,\mathbf {R}_{k}\nonumber \\ \mathbf {C}_{k}= & {} \int \left( \mathbf {x}_{k}-\mathbf {m}_{k}^{-}\right) \left( h\left( \mathbf {x}_{k}\right) -{\varvec{\mu }}_{k}\right) ^{T}\mathcal {N}\left( \mathbf {x}_{k}|\mathbf {m}_{k}^{-},\mathbf {P}_{k}^{-}\right) \mathrm{d}\mathbf {x}_{k}\nonumber \\ \mathbf {K}_k= & {} \mathbf {C}_k \mathbf {S}_{k}^{-1}\nonumber \\ \mathbf {m}_k= & {} \mathbf {m}_k^{-} + \mathbf {K}_{k}\left( \mathbf {y}_k - {\varvec{\mu }}_k\right) \nonumber \\ \mathbf {P}_k= & {} \mathbf {P}_{k}^{-}-\mathbf {K}_{k}\mathbf {S}_{k}\mathbf {K}_{k}^{T} \end{aligned}$$
(4)
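Once the three moment integrals in (4) have been approximated by some rule, the rest of the update is plain linear algebra. A minimal NumPy sketch of the last three lines of (4) follows; the function name `gaussian_update` and the numerical values are illustrative assumptions.

```python
import numpy as np

def gaussian_update(m_pred, P_pred, y, mu, S, C):
    """Gain, corrected mean and corrected covariance from (4).

    mu, S and C are the predicted measurement mean, its covariance and
    the state-measurement cross-covariance, however they were obtained
    (cubature rule, Taylor series, UT, ...).
    """
    # K = C S^{-1}, computed with a linear solve instead of an explicit inverse
    K = np.linalg.solve(S.T, C.T).T
    m = m_pred + K @ (y - mu)
    P = P_pred - K @ S @ K.T
    return m, P

# Linear check: for h(x) = H x the integrals are exact,
# so the result coincides with the Kalman filter update.
m_pred = np.array([1.0, 0.0])
P_pred = np.array([[2.0, 0.3], [0.3, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
y = np.array([1.4])
mu = H @ m_pred
S = H @ P_pred @ H.T + R
C = P_pred @ H.T
m, P = gaussian_update(m_pred, P_pred, y, mu, S, C)
```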

Different approaches to solving the moment integrals in (3) and (4) give rise to different Gaussian filters (Wu et al. 2006). Cubature rules give rise to the CKF. Approximating functions \(f(\cdot )\) and \(h(\cdot )\) with first-order Taylor series gives rise to the first-order EKF, and using the UT to approximately calculate those moment integrals gives rise to the UKF.

The UKF thus uses the UT to approximately compute the mean and covariance of a random variable \(\varvec{y}\) that results from a nonlinear transformation \(\varvec{h}(\cdot )\) of a Gaussian random variable \(\varvec{x}\sim \mathcal {N}(\bar{\varvec{x}},\varvec{P}_x)\), with \(\varvec{x}\in \mathfrak {R}^{n_x}\).

To this end, the dimensionless UT parameters \(\{\alpha ,\beta ,\kappa \}\in \mathfrak {R}^{+}\) are used to calculate a set of \(2n_x + 1\) deterministic points (sigma points) and weights. The sigma points are given by

$$\begin{aligned} {\varvec{\mathcal {X}}}_{0}= & {} {\varvec{\bar{x}}}\nonumber \\ {\varvec{\mathcal {X}}}_{i}= & {} {\varvec{\bar{x}}}+\left( \sqrt{(n_x + \lambda )\mathbf {P_x}} \right) _{i}\quad i = 1,\ldots ,n_x\nonumber \\ {\varvec{\mathcal {X}}}_{i}= & {} {\varvec{\bar{x}}}-\left( \sqrt{(n_x + \lambda )\mathbf {P_x}} \right) _{i-n_x} \quad i = n_x + 1,\ldots ,2n_x, \end{aligned}$$
(5)

where \((\cdot )_i\) denotes the ith column of the matrix and \(\lambda \) is a scaling parameter given by

$$\begin{aligned} \lambda = \alpha ^{2}(n_x + \kappa ) - n_x \ . \end{aligned}$$
(6)

The corresponding weights are

$$\begin{aligned} w_{0}^{(m)}= & {} \frac{\lambda }{n_x + \lambda } \nonumber \\ w_{0}^{(c)}= & {} \frac{\lambda }{n_x + \lambda } + \left( 1-\alpha ^2 + \beta \right) \nonumber \\ w_{i}^{(m)}= & {} w_{i}^{(c)} = \frac{1}{2(n_x + \lambda )}, \quad i = 1,\ldots ,2n_x\,. \end{aligned}$$
(7)
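As a quick check of (6) and (7), consider the two parameter choices mentioned in Sect. 1. The check uses the usual UT weight \(1/(2(n_x+\lambda ))\) for each of the \(2n_x\) outer sigma points and, for the default heuristic, assumes \(\alpha =1\) and \(\beta =0\), since only \(\kappa \) is specified there:

$$\begin{aligned} \alpha =1,\ \beta =0,\ \kappa =0:&\quad \lambda = 0, \quad w_{0}^{(m)} = w_{0}^{(c)} = 0, \quad w_{i}^{(m)} = w_{i}^{(c)} = \frac{1}{2n_x} \nonumber \\ \alpha =1,\ \beta =0,\ \kappa =3-n_x:&\quad \lambda = 3-n_x, \quad w_{0}^{(m)} = w_{0}^{(c)} = 1-\frac{n_x}{3}, \quad w_{i}^{(m)} = w_{i}^{(c)} = \frac{1}{6} \end{aligned}$$

In both cases the mean weights sum to one. The first case discards the central point and leaves \(2n_x\) equally weighted points at \({\varvec{\bar{x}}}\pm \left( \sqrt{n_x\mathbf {P_x}}\right) _{i}\), which is the point set used by the CKF. The second case keeps \(n_x+\lambda =3\) regardless of the state dimension, and the central weight becomes negative when \(n_x>3\).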

Sigma points \({\varvec{\mathcal {X}}}_{\varvec{i}}\) are then submitted to the nonlinear transformation \(\varvec{h}(\cdot )\), yielding

$$\begin{aligned} {\varvec{\mathcal {Y}}}_{i} = \varvec{h}({\varvec{\mathcal {X}}}_{i}) \quad i = 0,\ldots ,2n_x. \end{aligned}$$
(8)

Finally, the first two moments of \(\varvec{y}\) are approximated as

$$\begin{aligned} {\bar{\varvec{y}}}\approx & {} \sum _{i=0}^{2n_x}w_{i}^{(m)}{\varvec{\mathcal {Y}}}_i \nonumber \\ {\varvec{P}}_{\varvec{y}}\approx & {} \sum _{i=0}^{2n_x}w_{i}^{(c)} \left( {\varvec{\mathcal {Y}}}_i-\varvec{\bar{y}}\right) \left( {\varvec{\mathcal {Y}}}_i-\varvec{\bar{y}}\right) ^{T}. \end{aligned}$$
(9)
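Equations (5) to (9) translate almost line by line into code. The sketch below is one possible NumPy implementation, assuming a Cholesky factor as the matrix square root and the weight \(1/(2(n_x+\lambda ))\) for the outer points; the function name `unscented_transform` and the test values are illustrative.

```python
import numpy as np

def unscented_transform(h, x_mean, P_x, alpha=1.0, beta=0.0, kappa=0.0):
    """Approximate the mean and covariance of y = h(x), x ~ N(x_mean, P_x)."""
    n = x_mean.size
    lam = alpha**2 * (n + kappa) - n                      # Eq. (6)
    # Matrix square root via Cholesky: L @ L.T = (n + lam) * P_x
    L = np.linalg.cholesky((n + lam) * P_x)
    # Eq. (5): central point, then plus/minus the columns of the square root
    X = np.vstack([x_mean, x_mean + L.T, x_mean - L.T])   # shape (2n + 1, n)
    # Eq. (7): weights of the outer points first, then the central point
    w_m = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    w_c = w_m.copy()
    w_m[0] = lam / (n + lam)
    w_c[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    Y = np.array([np.atleast_1d(h(x)) for x in X])        # Eq. (8)
    y_mean = w_m @ Y                                      # Eq. (9)
    D = Y - y_mean
    P_y = (w_c[:, None] * D).T @ D
    return y_mean, P_y

# For a linear map the UT is exact, whatever the parameter values
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([0.5, -1.0])
x_mean = np.array([1.0, 2.0])
P_x = np.array([[1.0, 0.2], [0.2, 0.5]])
y_mean, P_y = unscented_transform(lambda x: A @ x + b, x_mean, P_x,
                                  alpha=1.0, beta=2.0, kappa=1.0)
```

The sketch requires \(n_x+\lambda >0\) so that the Cholesky factorization exists. For nonlinear \(h(\cdot )\) the returned moments change with \(\alpha \), \(\beta \) and \(\kappa \), which is what makes the tuning matter.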

The results yielded by the UT thus depend heavily on the values of \(\alpha \), \(\beta \) and \(\kappa \). Accordingly, the quality of the UKF state estimates also depends heavily on such values. Properly tuning the UT parameters is hence essential to obtaining good UKF performance.

3 Approaching the Tuning of the UKF as an Optimization Problem

The UT tuning process described in this article seeks to improve the state estimation performance of the UKF without imposing extra computational costs to the filter. Moreover, the tuning process must be able to deal with the fact that it would be difficult to obtain an explicit model of the UKF applied to a particular dynamic system. These requirements lead us to resort to an offline optimization strategy that could cope with black-box functions.

To perform the optimization of the UT parameters, it would thus be necessary to conceive a scalar goal function \(g(\cdot )\) that could properly weight points of the UT domain. The weight of a given point should be an indication of the state estimation performance of the UKF when tuned with that point. The tuning process would then consist in the maximization of a scalar goal function \(g({\varvec{\theta }})\), where \({\varvec{\theta }}\) is a point in the UT parameter space.

The tuning process described in this article adopts the goal function used in Turner and Rasmussen (2010), namely the likelihood of the measurements that serve as training data, which is to be maximized. The maximization of this likelihood was also applied to the offline tuning of the EKF noise covariance matrices in Abbeel et al. (2005) and to the online tuning of the \(\kappa \) UT parameter in Dunik et al. (2010).

For a given point \({\varvec{\theta }}\), this goal function runs a UKF tuned with \({\varvec{\theta }}\) on one or more time series

$$\begin{aligned} \mathbf {Y}_{i}=\left\{ \mathbf {y}_{i}(1),\mathbf {y}_{i}(2),\ldots ,\mathbf {y}_{i}(T) \right\} , \quad i=1,\ldots ,n_Y \end{aligned}$$
(10)

where the \(\mathbf {y}_{i}(t)\) are the sequential noisy measurements of the dynamic system state that comprise time series \(\mathbf {Y}_{i}\).

For each \(\mathbf {Y}_i\), the means and covariances of the measurement predictions yielded by the UKF tuned with \(\varvec{\theta }\) are used to calculate the log of the conditional likelihood \(p(\mathbf {Y}_{i}|{\varvec{\theta }})\), which is given by

$$\begin{aligned} \mathrm {log}\,p\left( \mathbf {Y}_{i}|{\varvec{\theta }}\right) = \frac{\sum _{t=1}^{T}\mathrm {log}\left( \mathcal {N}\left( \mathbf {y}_{i}(t) |\hat{\mathbf {y}}_{i}(t),\mathbf {S}_{i}(t)\right) \right) }{T} \,, \end{aligned}$$
(11)

where \(\hat{\mathbf {y}}_{i}(t)\) and \(\mathbf {S}_{i}(t)\) are, respectively, the mean and covariance of the measurement estimates yielded by the UKF tuned with \({\varvec{\theta }}\). The sequence of calculations performed by the goal function is shown in Algorithm 1. The goal function receives the UT parameters \({\varvec{\theta }}\), the training time series \(\mathbf {Y}=\{\mathbf {Y}_{1},\ldots ,\mathbf {Y}_{n_Y} \}\) and both the mean \(\mathbf {M}_{0}\) and covariance \(\mathbf {P}_{0}\) of the initial state. It then returns the conditional log-likelihood of \(\mathbf {Y}\).

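[Algorithm 1: goal function returning the conditional log-likelihood of \(\mathbf {Y}\) for given UT parameters \({\varvec{\theta }}\)]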
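The per-series quantity in (11) can be sketched as follows, assuming the UKF has already produced the predicted measurement means and covariances for every time step. The function names and the toy numbers are illustrative. The sketch covers only (11): how the values of several training series are combined inside Algorithm 1 is not shown here.

```python
import numpy as np

def gaussian_logpdf(y, mean, cov):
    """log N(y | mean, cov) for a one-dimensional array y."""
    d = y - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (y.size * np.log(2.0 * np.pi) + logdet + maha)

def mean_log_likelihood(Y, Y_hat, S):
    """Eq. (11): average over t of log N(y(t) | y_hat(t), S(t)).

    Y, Y_hat: arrays of shape (T, n_y); S: array of shape (T, n_y, n_y).
    """
    terms = [gaussian_logpdf(y, y_hat, s) for y, y_hat, s in zip(Y, Y_hat, S)]
    return sum(terms) / len(terms)

# Toy values: three scalar measurements, unit predicted variances
Y = np.array([[0.0], [1.0], [2.0]])
Y_hat = np.array([[0.0], [1.0], [1.0]])
S = np.ones((3, 1, 1))
value = mean_log_likelihood(Y, Y_hat, S)
```

A tuning that makes the filter overconfident (small \(\mathbf {S}_{i}(t)\) with large prediction errors) is penalized through the quadratic term, while an overly cautious one is penalized through the log-determinant.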

It is important to note that the maximization of the log-likelihood of the training data \(\mathbf {Y}\) does not guarantee the maximization of the state estimation performance of the filter, since no information about the true states that gave rise to the measurements is used (Turner and Rasmussen 2010). It would be more convenient to maximize the likelihood of the true states that gave rise to the measurements, but in many cases, it would be costly or even impossible to obtain such ground-truth data. Fortunately, the results described in Sect. 6 show that it is possible to significantly improve the performance of the UKF without having to obtain ground-truth data.

4 Stochastic Tuning of the UKF

As seen in Sect. 3, the tuning of the UT parameters will be approached as the optimization of a black-box function. Though its analytic form is unknown, a black-box function can be evaluated to yield its value, entailing the adoption of gradient-free optimization algorithms, such as pattern search (Hooke and Jeeves 1961) or genetic algorithms (Goldberg 1989). Nonetheless, existing gradient-free optimizers usually require a large number of function evaluations, and Algorithm 1 may be costly to evaluate in a number of situations, such as when the time series that comprise the training data are long or when the dynamic model (2) takes significant computational time to run.

A better alternative would thus be to conceive a direct search algorithm able to find a good tuning of the UT parameters without the need to perform too many evaluations of the goal function. This can be accomplished by assuming that the goal function shows some degree of smoothness.

If this is the case, then it is reasonable to expect that two points in the input space that are close to each other are likely to have similar values of the goal function. Hence, the value of the goal function at a given point can be estimated by the values of the goal function at points that are near. This property means that close to points where the value of the goal function is high, there are points where the value of the goal function is similarly high, or maybe even higher.

As a result, an optimization algorithm that searches close to points that display high values of the goal function described by Algorithm 1 would have the potential to find points where this goal function displays even higher values. Nonetheless, to minimize the chance of getting stuck at poor local maxima, the optimizer must implement some sort of exploration of the input space.

To balance the conflicting goals of searching near points where the goal function is known to be high (exploitation of what is already known) and searching in regions of the input space where there are few or no samples at all (exploration of the input space), an optimization algorithm could evaluate the goal function at points sampled from a given probability distribution. This probability distribution should be conceived in a way that it would produce points located near and also points located not so near existing samples. The former points would allow the algorithm to exploit what is already known about the shape of the goal function, while the latter points would allow the algorithm to explore the input space.

5 The Optimizer

The proposed tuning algorithm is based on ideas of the BSF (Gordon et al. 1993). The BSF approximates the probability distribution of the system state \(\mathbf {x}_k\) at time step k by a set of random samples (particles). The mean and covariance of the state estimate \(\hat{\mathbf {x}}_k\) yielded by the filter at time step k are, respectively, given by the mean and covariance of the existing particles at time step k. The filter works sequentially, performing two basic steps, a prediction step and an update step.

To briefly describe how the BSF works when applied to system (2), let us assume that, at time step \(k-1\), the set of particles is \(\mathbf {X}_{k-1}=\{\mathbf {x}_{k-1}(i): i=1,\ldots ,N\}\), where \(\mathbf {x}_{k-1}(i)\) is the ith particle in \(\mathbf {X}_{k-1}\). This set of particles is a set of random samples from the probability distribution \(p(\mathbf {x}_{k-1}|\mathbf {D}_{k-1})\), where \(\mathbf {D}_{k-1}=\{\mathbf {y}_{i},i=1,\dots ,k-1\}\) is the set of measurements available at \(k-1\).

The prediction step is aimed at predicting the next state, that is, at obtaining samples from \(p(\mathbf {x}_{k}|\mathbf {D}_{k-1})\). To this end, each particle in \(\mathbf {X}_{k-1}\) is transformed by the state transition function \(f(\cdot )\), yielding prior samples \(\mathbf {x}_{k}^{-}(i)=f(\mathbf {x}_{k-1}(i))+\mathbf {w}_{k-1}(i)\), with each \(\mathbf {w}_{k-1}(i)\) sampled independently from the probability distribution of the system process noise.

The update step uses measurement \(\mathbf {y}_k\) to correct the prior samples. To this end, normalized weights are attributed to the prior samples, so that the weight \(q_i\) associated with prior sample \(\mathbf {x}_{k}^{-}(i)\) is proportional to the conditional likelihood \(p(\mathbf {y}_k|\mathbf {x}_{k}^{-}(i))\). Each weight \(q_i\) is seen as the probability mass associated with the prior sample \(\mathbf {x}_{k}^{-}(i)\) and is hence used to form a discrete probability distribution over the prior samples. The corrected samples (particles) \(\mathbf {x}_{k}(i)\) are then obtained by sampling from this discrete distribution, so that the probability that a given prior particle will be resampled is equal to its weight.
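The prediction and update steps just described can be sketched for a scalar system with additive Gaussian noises as follows. The function name `bsf_step`, the toy linear model and the number of particles are illustrative assumptions; plain multinomial resampling is used for brevity.

```python
import numpy as np

def bsf_step(particles, y, f, h, q_std, r_std, rng):
    """One prediction/update cycle of the bootstrap filter (scalar state)."""
    # Prediction: push each particle through f and add its own noise draw
    prior = f(particles) + rng.normal(0.0, q_std, size=particles.shape)
    # Update: weights proportional to the Gaussian likelihood p(y | prior sample)
    log_w = -0.5 * ((y - h(prior)) / r_std) ** 2
    w = np.exp(log_w - log_w.max())        # subtract the max for numerical stability
    w /= w.sum()
    # Resampling: each prior sample is drawn with probability equal to its weight
    idx = rng.choice(particles.size, size=particles.size, p=w)
    return prior[idx]

# Toy example (hypothetical numbers): x_k = x_{k-1} + w, y_k = x_k + v
rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=2000)           # samples of x_0 ~ N(0, 1)
posterior = bsf_step(particles, 0.5, lambda x: x, lambda x: x, 0.1, 0.1, rng)
```

In this toy case the exact posterior mean is about 0.495, and the mean of the returned particles should be close to it. The state estimate of the filter is the mean of `posterior`, and its covariance is the sample covariance of the same particles.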

The operations performed by the proposed tuning algorithm resemble those of the update step of the BSF. At each iteration of the proposed optimizer, the particles associated with the \(N_s\) highest values of the goal function are selected, forming a subset \(\varvec{\varTheta }^{\star }\). The values of the goal function corresponding to the points in \(\varvec{\varTheta }^{\star }\) are then normalized to the interval [0, 1], so that the points in \(\varvec{\varTheta }^{\star }\) and the normalized weights form a discrete probability distribution. A population \(\varvec{\varTheta }_\mathrm{best}\) is then formed by randomly sampling \(N_s\) points from this discrete distribution. In Sect. 6, the resampling is performed according to the stratified resampling strategy (Douc and Cappe 2005).
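Stratified resampling splits the unit interval into as many equal strata as there are samples to draw and takes one uniform draw inside each stratum. A minimal sketch of this standard scheme is given below; the function name `stratified_resample` is illustrative, and the details may differ from the implementation used in Sect. 6.

```python
import numpy as np

def stratified_resample(weights, rng):
    """Return the indices selected by stratified resampling."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                              # make the weights sum to one
    n = w.size
    u = (np.arange(n) + rng.random(n)) / n       # one uniform draw per stratum
    cdf = np.cumsum(w)
    cdf[-1] = 1.0                                # guard against round-off
    # Index of the first cumulative weight that exceeds each draw
    return np.searchsorted(cdf, u, side="right")

rng = np.random.default_rng(0)
idx = stratified_resample([0.5, 0.5, 0.0, 0.0], rng)   # -> [0, 0, 1, 1] for any seed
```

Compared with independent multinomial draws, the strata limit how far the number of copies of each particle can deviate from its expected value, which reduces the variance introduced by the resampling step.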

This resampling strategy steadily increases the number of copies of particles with high weights and decreases the number of copies of particles with low weights. This can be seen as a way to exploit the information conveyed by the existing particles, but it does not bring any new information to the optimization process. To bring new information to the optimization process, the algorithm samples \(N_s\) new particles from the distribution

$$\begin{aligned} p(\varvec{\theta }_\mathrm{new}) = \mathcal {N}(\mathbf {M}_\mathrm{best},\mathbf {S}_\mathrm{best}), \end{aligned}$$
(12)

where \(\mathbf {M}_\mathrm{best}\) and \(\mathbf {S}_\mathrm{best}\) are, respectively, the mean and covariance of \(\varvec{\varTheta }_\mathrm{best}\). The normal distribution was chosen because the UKF is a Gaussian filter.

The goal function is evaluated at each new particle \(\varvec{\theta }_\mathrm{new}\), providing it with a weight. The new particles and their weights are then added to the current particle population, thus bringing new information to the optimization process.

Probability distribution (12) is interesting because it can yield samples located within varying distances of the best existing particles. Samples that are close to existing particles allow the tuning algorithm to exploit existing information about the goal function, while points that are farther away allow the algorithm to explore the input space.

A more systematic description of these general tuning steps is given in Algorithm 2, which receives as inputs the range of possible values for the particles, the budget of evaluations of the goal function, a set \(\varvec{\varTheta }=\{\varvec{\theta }_{1},\ldots ,\varvec{\theta }_{N}\}\) of N initial particles and the number \(N_s\) of new particles generated at each iteration.

[Algorithm 2: particle-based tuning of the UT parameters]
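The steps described in this section can be sketched as the loop below. This is a reconstruction from the prose only, not the authors' Algorithm 2, and several details are assumptions made for the sketch: the weights are shifted to be non-negative and scaled to sum to one, multinomial resampling replaces stratified resampling, a small jitter keeps the covariance usable when the resampled population collapses, and new particles are clipped to the search cube. The names `tune`, `toy_goal` and the 3 x 3 x 3 initial grid are hypothetical.

```python
import numpy as np

def tune(goal, theta0, lower, upper, budget, n_s, rng):
    """Particle-based search for the maximizer of a black-box goal function."""
    thetas = np.array(theta0, dtype=float)               # initial particles
    values = np.array([goal(t) for t in thetas])
    used = len(thetas)
    while used < budget:
        # Keep the n_s particles with the highest goal values
        best = np.argsort(values)[-n_s:]
        # Assumption: shift to non-negative values and scale to sum to one
        w = values[best] - values[best].min()
        w = w / w.sum() if w.sum() > 0 else np.full(len(best), 1.0 / len(best))
        # Resample n_s points from the discrete distribution (multinomial here)
        picked = thetas[rng.choice(best, size=n_s, p=w)]
        # Eq. (12): Gaussian fitted to the resampled population, plus jitter
        mean = picked.mean(axis=0)
        cov = np.cov(picked, rowvar=False) + 1e-6 * np.eye(thetas.shape[1])
        n_new = min(n_s, budget - used)
        new = rng.multivariate_normal(mean, cov, size=n_new)
        new = np.clip(new, lower, upper)                 # assumption: stay inside the cube
        thetas = np.vstack([thetas, new])
        values = np.concatenate([values, [goal(t) for t in new]])
        used += n_new
    return thetas[np.argmax(values)], values.max()

# Toy run with a smooth stand-in for the goal function (hypothetical optimum)
lower = np.array([0.01, 0.0, 0.0])
upper = np.array([4.0, 4.0, 5.0])
target = np.array([1.0, 2.0, 0.5])

def toy_goal(theta):
    return -np.sum((theta - target) ** 2)

grid = [np.linspace(lo, hi, 3) for lo, hi in zip(lower, upper)]
theta0 = np.array(np.meshgrid(*grid)).reshape(3, -1).T   # 27 initial particles
best_theta, best_value = tune(toy_goal, theta0, lower, upper,
                              budget=100, n_s=10, rng=np.random.default_rng(0))
```

In an actual tuning run, `toy_goal` would be replaced by the goal function of Sect. 3, so that each evaluation runs a UKF over the training data.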

6 Numerical Experiments

All numerical experiments were executed according to the following steps:

  1. The training data \(\mathbf {Y}^\mathrm{train}=\{\mathbf {Y}_{1}^\mathrm{train},\ldots ,\mathbf {Y}_{N_\mathrm{tr}}^\mathrm{train} \}\) consisted of \(N_\mathrm{tr}\) time series generated by Monte Carlo simulation of the nonlinear system;

  2. The tuning for PfUkf was obtained by feeding Algorithm 2 with \(\mathbf {Y}^\mathrm{train}\) and the other necessary parameters;

  3. The tuning for UkfO was obtained by feeding the optimizer provided in Turner (2014) with \(\mathbf {Y}^\mathrm{train}\) and the other necessary parameters;

  4. The test data \(\mathbf {Y}^\mathrm{test}=\{\mathbf {Y}^\mathrm{test}_{1},\ldots ,\mathbf {Y}^\mathrm{test}_{N_\mathrm{ts}}\}\) consisted of \(N_\mathrm{ts}\) different time series generated by Monte Carlo simulation of the nonlinear system;

  5. The testing of the filters consisted in assessing the quality of the state estimates yielded by the filters on the true states corresponding to \(\mathbf {Y}^\mathrm{test}\).

In all experiments, the optimizations corresponding to steps 2 and 3 used the same set of 27 initial samples used in Turner and Rasmussen (2010). The maximum number of evaluations of the goal function (budget) was 100, and the range of possible values for the UT parameters was

$$\begin{aligned} 0.01\le & {} \alpha \le 4 \nonumber \\ 0\le & {} \beta \le 4 \nonumber \\ 0\le & {} \kappa \le 5\,. \end{aligned}$$
(13)

The numerical experiments also assumed that the noise covariance matrices \(\mathbf {Q}\) and \(\mathbf {R}\) were known, and the free parameter \(N_{s}\) of Algorithm 2 was always set to \(N_{s}= 10\). The distribution of the initial state of the filters was the same distribution used to generate the training and test data.

6.1 Benchmarks

Three commonly adopted nonlinear filtering problems are used in the numerical experiments.

6.1.1 Kitagawa

Variations of the Kitagawa (1996) model have been used as benchmark for nonlinear filters (Gordon et al. 1993; Turner and Rasmussen 2010, 2012). The model equations are

$$\begin{aligned} x_{k+1}= & {} 0.5x_{k} + \frac{25x_{k}}{(1+x_{k}^{2})}+w_k, \quad w_{k}\sim \mathcal {N}(0,0.2^{2})\nonumber \\ y_{k}= & {} 5\mathrm {sin}(2x_{k})+v_{k}, \quad v_{k}\sim \mathcal {N}(0,0.01^{2})\,. \end{aligned}$$
(14)

For the numerical experiment with Kitagawa, the training data consisted of a single time series of 1000 sequential measurements, and the test data consisted of 500 independent time series of 10 sequential measurements each. The initial state distribution for both the training and test data was \(\mathbf {x}_{0}\sim \mathcal {N}(0,0.5^{2})\).
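A training series of this kind can be generated with a few lines of code. The sketch below simulates model (14) with the noise levels and initial distribution stated above; the function name `simulate_kitagawa` is illustrative, and it is an assumption of the sketch that the first stored measurement is taken after one state transition.

```python
import numpy as np

def simulate_kitagawa(T, rng, x0_std=0.5, q_std=0.2, r_std=0.01):
    """Generate T true states and noisy measurements from model (14)."""
    x = rng.normal(0.0, x0_std)                  # draw the initial state x_0
    xs, ys = np.empty(T), np.empty(T)
    for k in range(T):
        # State transition with additive process noise
        x = 0.5 * x + 25.0 * x / (1.0 + x**2) + rng.normal(0.0, q_std)
        xs[k] = x
        # Measurement with additive measurement noise
        ys[k] = 5.0 * np.sin(2.0 * x) + rng.normal(0.0, r_std)
    return xs, ys

# One training series of 1000 measurements
x_train, y_train = simulate_kitagawa(1000, np.random.default_rng(0))
```

Only `y_train` would be passed to the tuning procedure; the true states `x_train` are kept aside and are needed only when state errors are computed on test data.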

6.1.2 Sinusoid

The Sinusoid problem (Turner and Rasmussen 2010, 2012) is given by

$$\begin{aligned} x_{k+1}= & {} 3\sin (x_{k}) + w_{k}, \quad w_{k} \sim \mathcal {N}\left( 0,0.1^2\right) \end{aligned}$$
(15)
$$\begin{aligned} y_{k}= & {} \sigma (x_{k}/3)+ v_k, \quad v_{k} \sim \mathcal {N}\left( 0,0.1^2\right) , \end{aligned}$$
(16)

where \(\sigma (\cdot )\) represents the logistic sigmoid.

For the numerical experiment with Sinusoid, the training data consisted of a single time series of 1000 sequential measurements, and the test data consisted of 500 independent time series of 100 sequential measurements each. The initial state distribution for both the training and test data was \(\mathbf {x}_{0}\sim \mathcal {N}(0,1^{2})\).

6.1.3 Bearings Only

Different versions of the Bearings Only problem are also popular benchmarks (Dunik et al. 2012; Bar-Shalom et al. 2002). The model represents the problem of tracking a moving target based solely on measurements provided by one angular sensor. In this paper, the system state is the position of the target in the Cartesian plane. The model equations are

$$\begin{aligned} \mathbf {x}_{k+1}= & {} \begin{bmatrix} 0.9&0\\ 0&1 \end{bmatrix}\mathbf {x}_{k}+\mathbf {w}_k , \quad \mathbf {w}_{k}\sim \mathcal {N}(\mathbf {0,Q})\nonumber \\ y_k= & {} \mathrm {tan}^{-1}\left( \frac{x_{2,k}-\sin (k)}{x_{1,k}-\cos (k)} \right) +v_k,\quad v_k ~\sim \mathcal {N}(0,R),\nonumber \\ \end{aligned}$$
(17)

where

$$\begin{aligned} \mathbf {Q}= & {} \begin{bmatrix} 0.1&0.01\\ 0.01&0.1 \end{bmatrix}\nonumber \\ R= & {} 0.025\,. \end{aligned}$$
(18)

For the numerical experiment with Bearings Only, the training data consisted of a single time series of 1000 sequential measurements, and the test data consisted of 500 independent time series of 100 sequential measurements each. The distribution of the initial state for both the training and test data was

$$\begin{aligned} \mathbf {x}_{0}\sim \mathcal {N}\left( \begin{bmatrix} 20\\ 5 \end{bmatrix},\begin{bmatrix} 0.1&0\\ 0&0.1 \end{bmatrix}\right) . \end{aligned}$$
(19)
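The Bearings Only model in (17) to (19) can be simulated in the same way as the scalar benchmarks. In the sketch below, the function name `simulate_bearings` is illustrative, \(R\) is treated as a variance, the time index is assumed to start at \(k=1\), and the inverse tangent is applied to the ratio exactly as written in (17).

```python
import numpy as np

def simulate_bearings(T, rng):
    """Generate T true states and bearing measurements from (17)-(19)."""
    F = np.array([[0.9, 0.0], [0.0, 1.0]])
    Q = np.array([[0.1, 0.01], [0.01, 0.1]])
    R = 0.025                                    # measurement-noise variance
    x = rng.multivariate_normal([20.0, 5.0], 0.1 * np.eye(2))   # Eq. (19)
    xs, ys = np.empty((T, 2)), np.empty(T)
    for k in range(1, T + 1):
        x = F @ x + rng.multivariate_normal(np.zeros(2), Q)
        # Angle computed with the offsets (cos k, sin k) of Eq. (17)
        bearing = np.arctan((x[1] - np.sin(k)) / (x[0] - np.cos(k)))
        xs[k - 1] = x
        ys[k - 1] = bearing + rng.normal(0.0, np.sqrt(R))
    return xs, ys

# One test series of 100 measurements
x_test, y_test = simulate_bearings(100, np.random.default_rng(0))
```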

6.2 Performance Criteria

The performances of the different filters are assessed with the root-mean-squared error (RMSE) and the mean absolute error (MAE) of the estimated states. Considering that the prediction errors in each component i of the state vector \(\mathbf {x}\) at time step k are given by

$$\begin{aligned} e_{i,k} = \hat{x}_{i,k} - x_{i,k}\,, \end{aligned}$$
(20)

where \(\hat{x}_{i,k}\) and \(x_{i,k}\) are, respectively, the predicted and the real value of the ith component of \(\mathbf {x}\) at time step k, the equations of the error criteria are

$$\begin{aligned} \mathrm {RMSE} = \sqrt{\frac{1}{n_T}\sum _{k=1}^{n_T}\sum _{i=1}^{n_x} e_{i,k}^{2}} \end{aligned}$$
(21)

and

$$\begin{aligned} \mathrm {MAE}=\frac{1}{n_T}\sum _{k=1}^{n_T} \sum _{i=1}^{n_x} |e_{i,k}|\,, \end{aligned}$$
(22)

where \(n_T\) is the number of time steps of the series of true data.

The two criteria provide different views of the estimation performances. In MAE, all errors are equally weighted, while in RMSE, bigger errors have more impact than small errors. We also employ boxplots of the squared state estimation errors, in order to provide further assessment of the variability of the predictions yielded by the filters.
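The difference between the two criteria can be seen on a small example. The sketch below implements (21) and (22) for arrays of shape \((n_T, n_x)\) and compares two error patterns with the same total absolute error; the function names and the numbers are illustrative.

```python
import numpy as np

def rmse(x_hat, x_true):
    """Eq. (21): squared errors summed over components, averaged over time."""
    e = np.asarray(x_hat) - np.asarray(x_true)   # shape (n_T, n_x)
    return np.sqrt(np.sum(e**2) / e.shape[0])

def mae(x_hat, x_true):
    """Eq. (22): absolute errors summed over components, averaged over time."""
    e = np.asarray(x_hat) - np.asarray(x_true)
    return np.sum(np.abs(e)) / e.shape[0]

x_true = np.zeros((2, 1))
even = np.array([[2.0], [2.0]])      # two moderate errors
uneven = np.array([[4.0], [0.0]])    # one large error, one exact estimate

mae_even, mae_uneven = mae(even, x_true), mae(uneven, x_true)
rmse_even, rmse_uneven = rmse(even, x_true), rmse(uneven, x_true)
```

Both patterns have an MAE of 2, but the RMSE rises from 2 to about 2.83 when the same total error is concentrated in a single time step.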

6.3 Results

The state estimation errors for all three benchmark problems are shown in Tables 1, 2 and 3. The measurement estimation errors are shown in Tables 4, 5 and 6. The UT parameters used in the UKF filters are shown in Tables 7, 8 and 9.

Table 1 State errors computed from 500 Monte Carlo simulations for Kitagawa
Table 2 State errors computed from 500 Monte Carlo simulations for Sinusoid
Table 3 State errors computed from 500 Monte Carlo simulations for Bearings
Table 4 Measurement errors computed from 500 Monte Carlo simulations for Kitagawa
Table 5 Measurement errors computed from 500 Monte Carlo simulations for Sinusoid
Table 6 Measurement errors computed from 500 Monte Carlo simulations for Bearings
Table 7 Kitagawa—UT parameters computed from a budget of 100 points and 1 training time series
Table 8 Sinusoid—UT parameters computed from a budget of 100 points and 1 training time series
Table 9 Bearings—UT parameters computed from a budget of 100 points and 1 training time series

The variability of the root-squared errors of the state estimates yielded by the filters is shown in Figs. 1, 2 and 3.

Fig. 1 Kitagawa
Fig. 2 Sinusoid
Fig. 3 Bearings only

6.4 Analysis

Tables 1, 2 and 3, together with Figs. 1, 2 and 3, show that PfUkf performed better than the other filters in Kitagawa and Bearings only. In Sinusoid, its state estimation performance was on a par with that of UkfO. Except for UkfD in Sinusoid, the state estimation performances of the remaining filters were significantly worse than those of the tuned filters.

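As for the measurement estimation performances, Tables 4, 5 and 6 show that there can be some disagreement with the state estimation performances. This is expected, since the goal function maximizes the likelihood of the measurements and uses no information about the true states (Sect. 3).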

7 Conclusion

We have presented a particle-based optimization algorithm for automatically tuning the UT parameters. The main advantage of the proposed approach is that it can significantly enhance the state estimation performance of the UKF without imposing extra computational cost to the filter at runtime.

Numerical experiments showed that the proposed algorithm yielded better state estimation results than the EKF, the CKF and UkfD. The results were also comparable to or better than those of UkfO, which is tuned by a recently published model-based optimization algorithm. The proposed optimizer, though, is much easier to implement and computationally simpler than that algorithm.