Synonyms

MHE

Introduction

In state estimation, we consider a dynamic system from which measurements are available. In discrete time, the system description is

$$\displaystyle{ x^{+} = f(x,w)\qquad y = h(x) + v }$$
(1)

The state of the system is \(x \in \mathbb{R}^{n}\), the measurement is \(y \in \mathbb{R}^{p}\), and the notation x+ means x at the next sample time. A control input u may be included in the model, but it is considered a known variable, and its inclusion is irrelevant to state estimation, so we suppress it in the model under consideration here. We receive the measurement y from the sensor, but the process disturbance \(w \in \mathbb{R}^{g}\), the measurement disturbance \(v \in \mathbb{R}^{p}\), and the system initial state x(0) are considered unknown variables.

The goal of state estimation is to construct or estimate the trajectory of x from only the measurements y. Note that for control purposes, we are usually interested in the estimate of the state at the current time, T, rather than the entire trajectory over the time interval [0, T]. In the moving horizon estimation (MHE) method, we use optimization to achieve this goal. We have two sources of error: the state transition is affected by an unknown process disturbance (or noise), w, and the measurement process is affected by another disturbance, v. In the MHE approach, we formulate the optimization objective to minimize the size of these errors, thus finding a trajectory of the state that comes close to satisfying the (error-free) model while still fitting the measurements.
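To make the setup concrete, the following is a minimal sketch of model (1) for an assumed scalar linear example f(x, w) = ax + w, h(x) = x; the coefficient, noise levels, and number of samples are illustrative choices, not part of the general formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar instance of model (1): f(x, w) = a*x + w, h(x) = x.
# The coefficient and noise standard deviations below are assumed values.
a = 0.9                    # state transition coefficient
T = 50                     # number of measurements
x = np.empty(T + 1)        # true (unknown) state trajectory
y = np.empty(T)            # measurements available to the estimator
x[0] = 1.0                 # unknown true initial state

for k in range(T):
    v = 0.2 * rng.standard_normal()   # measurement disturbance v
    w = 0.1 * rng.standard_normal()   # process disturbance w
    y[k] = x[k] + v                   # y = h(x) + v
    x[k + 1] = a * x[k] + w           # x+ = f(x, w)
```

The estimator sees only the sequence y; the trajectory x and the disturbances w, v play the role of the unknown variables above.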

First, we define some notation necessary to distinguish the system variables from the estimator variables. We have already introduced the system variables (x, w, y, v). In the estimator optimization problem, these have corresponding decision variables, which we denote by the Greek letters (χ, ω, η, ν). The relationships between these variables are

$$\displaystyle{ \chi ^{+} = f(\chi,\omega )\qquad y = h(\chi ) +\nu }$$
(2)

and they are depicted in Fig. 1. Notice that ν measures the gap between the model prediction η = h(χ) and the measurement y. The optimal decision variables are denoted \((\hat{x},\hat{w},\hat{y},\hat{v})\), and these optimal decisions are the estimates provided by the state estimator.

Moving Horizon Estimation, Fig. 1

The state, measured output, and disturbance variables appearing in the state estimation optimization problem. The state trajectory (gray circles in lower half) is to be reconstructed given the measurements (black circles in upper half)

Full Information Estimation

The full information objective function is

$$\displaystyle{ V _{T}(\chi (0),\boldsymbol{\omega }) =\ell _{x}\big(\chi (0) -\overline{x}_{0}\big) +\sum _{ i=0}^{T-1}\ell_{ i}(\omega (i),\nu (i)) }$$
(3)

subject to (2), in which T is the current time, \(\boldsymbol{\omega }\) is the estimated sequence of process disturbances, (ω(0), …, ω(T − 1)), y(i) is the measurement at time i, and \(\overline{x}_{0}\) is the prior, i.e., available, value of the initial state. Full information here means that we use all the data on the time interval [0, T] to estimate the state (or state trajectory) at time T. The stage cost \(\ell_{i}(\omega,\nu)\) penalizes the model disturbance and the fitting error, the two error sources that we reconcile in all state estimation problems.

The full information estimator is then defined as the solution to

$$\displaystyle{ \min _{\chi (0),\boldsymbol{\omega }}V _{T}(\chi (0),\boldsymbol{\omega }) }$$
(4)

The solution to the optimization exists for all \(T \in \mathbb{I}_{\geq 0}\) under mild continuity assumptions and choice of stage cost. Many choices of (positive, continuous) stage costs \(\ell_{x}(\cdot)\) and \(\ell_{i}(\cdot)\) are possible, providing a rich class of estimation problems that can be tailored to different applications. Because the system model (1) and cost function (3) are so general, it is perhaps best to start off by specializing them to see the connection to some classic results.
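As an illustration of problem (4), the sketch below solves full information estimation for an assumed scalar linear model f(x, w) = ax + w, h(x) = x with quadratic stage costs; all numerical values (a, q, r, p0, the measurement sequence) are invented for the example. Because the model is linear and the costs quadratic, the problem reduces to linear least squares.

```python
import numpy as np

# Hypothetical scalar linear instance of (3)-(4): f(x, w) = a*x + w, h(x) = x,
# quadratic stage costs with weights q (on omega), r (on nu), prior weight p0.
a, q, r, p0 = 0.9, 10.0, 5.0, 1.0
xbar0 = 0.0                                  # prior value of the initial state
y = np.array([1.0, 0.8, 0.9, 0.7, 0.6])     # assumed measurements
T = len(y)

# Decision vector z = (chi(0), omega(0), ..., omega(T-1)).  With linear
# dynamics and quadratic costs, (4) is the least-squares problem
# minimize ||A z - b||^2, each row weighted by the sqrt of its cost weight.
A = np.zeros((1 + 2 * T, T + 1))
b = np.zeros(1 + 2 * T)
A[0, 0] = np.sqrt(p0)                        # prior penalty l_x
b[0] = np.sqrt(p0) * xbar0
for i in range(T):
    A[1 + i, 1 + i] = np.sqrt(q)             # penalty on omega(i)
    # chi(i) = a^i chi(0) + sum_j a^(i-1-j) omega(j), so the fitting
    # residual sqrt(r)*(chi(i) - y(i)) is linear in z.
    A[1 + T + i, 0] = np.sqrt(r) * a ** i
    for j in range(i):
        A[1 + T + i, 1 + j] = np.sqrt(r) * a ** (i - 1 - j)
    b[1 + T + i] = np.sqrt(r) * y[i]

z, *_ = np.linalg.lstsq(A, b, rcond=None)
chi0_hat, omega_hat = z[0], z[1:]            # optimal decisions (estimates)
```

Note how the number of decision variables is T + 1 and grows with the data record, which is the issue the moving horizon formulation addresses.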

Related Problem: The Kalman Filter

If we specialize to the linear dynamic model f(x, w) = Ax + Gw, h(x) = Cx, and let x(0), w, and v be independent, normally distributed random variables, the classic Kalman filter is known to be the statistically optimal estimator, i.e., the Kalman filter produces the state estimate that maximizes the conditional probability of x(T) given y(0), …, y(T). The full information estimator is equivalent to the Kalman filter given the linear model assumption and the following choice of quadratic stage costs

$$\displaystyle\begin{array}{rcl} \ell_{x}\big(\chi (0) -\overline{x}_{0}\big)& =& (1/2)\left \|\chi (0) -\overline{x}_{0}\right \|_{P_{0}^{-1}}^{2} {}\\ \ell_{i}(\omega,\nu )& =& (1/2)\Big(\left \|\omega \right \|_{Q^{-1}}^{2} + \left \|\nu \right \|_{R^{-1}}^{2}\Big) {}\\ \end{array}$$

in which the random variable x(0) is assumed to have mean \(\overline{x}_{0}\) and variance P0, and the random variables w and v are assumed zero mean with variances Q and R, respectively. The Kalman filter is also a recursive solution to the state estimation problem, so that only the current mean \(\hat{x}\) and variance P of the conditional density need to be stored, instead of the entire history of measurements y(i), i = 0, …, T. This computational efficiency is critical for success in online applications to processes with short time scales that require fast processing.
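The recursion itself is short; the following is a sketch of one step for the linear model f(x, w) = Ax + Gw, h(x) = Cx. The function name and the ordering (measurement update, then time update) are implementation choices for illustration.

```python
import numpy as np

def kalman_step(xhat, P, y, A, G, C, Q, R):
    """One Kalman filter recursion for x+ = A x + G w, y = C x + v."""
    # Measurement update: correct the prior with the innovation y - C xhat.
    S = C @ P @ C.T + R                      # innovation covariance
    K = P @ C.T @ np.linalg.inv(S)           # Kalman gain
    xhat = xhat + K @ (y - C @ xhat)
    P = P - K @ C @ P
    # Time update: propagate mean and covariance through the model.
    xhat = A @ xhat
    P = A @ P @ A.T + G @ Q @ G.T
    return xhat, P
```

Only (x̂, P) are carried from step to step, which is the recursive storage advantage noted above.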

But if we consider nonlinear models, the maximization of conditional density is usually an intractable problem, especially in online applications. So, MHE becomes a natural alternative for nonlinear models or if an application calls for hard constraints to be imposed on the estimated variables.

Moving the Horizon

An obvious problem with solving the full information optimization problem is that the number of decision variables grows linearly with time T, which quickly renders the problem intractable for continuous processes that have no final time. A natural alternative to full information is to consider instead a finite moving horizon of the most recent N measurements. Figure 2 displays this idea. The initial condition χ(0) is now replaced by the initial state in the horizon, χ(T − N), and the decision variable sequence of process disturbances is now just the last N variables \(\boldsymbol{\omega }= (\omega (T - N),\ldots,\omega (T - 1))\). Now, the big question remaining is what to do about the neglected, past data. This question is strongly related to what penalty to use on the initial state in the horizon, χ(T − N). If we make this initial state a free variable, that is equivalent to completely discounting the past data. If we wish to retain some of the influence of the past data and keep the moving horizon estimation problem close to the full information problem, then we must choose an appropriate penalty for the initial state. We discuss this problem next.

Moving Horizon Estimation, Fig. 2

Schematic of the moving horizon estimation problem

Arrival Cost. When the time is less than or equal to the horizon length, i.e., T ≤ N, we can simply do full information estimation. So we assume throughout that T > N. For T > N, we express the MHE objective function as

$$\displaystyle\begin{array}{rcl} \hat{V }_{T}(\chi (T - N),\boldsymbol{\omega })& =& \Gamma _{T-N}(\chi (T - N)) {}\\ & & +\sum _{i=T-N}^{T-1}\ell_{ i}(\omega (i),\nu (i)) {}\\ \end{array}$$

subject to (2). The MHE problem is defined to be

$$\displaystyle{ \min _{\chi (T-N),\boldsymbol{\omega }}\hat{V }_{T}(\chi (T - N),\boldsymbol{\omega }) }$$
(5)

in which \(\boldsymbol{\omega }=\{\omega (T - N),\ldots,\omega (T - 1)\}\) and the hat on V distinguishes the MHE objective function from full information. The designer must now choose this prior weighting \(\Gamma _{k}(\cdot )\) for k > N.
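A minimal sketch of one MHE step (5) for an assumed scalar linear example (f(x, w) = ax + w, h(x) = x), with a simple quadratic prior weighting Γ centered at a stored prior estimate; the weights, horizon length, and function name are illustrative choices, not a prescribed design.

```python
import numpy as np

# Hypothetical scalar instance of the MHE problem (5): f(x, w) = a*x + w,
# h(x) = x, horizon N, quadratic prior weighting Gamma with weight p
# centered at a stored prior estimate xbar.
a, q, r, p, N = 0.9, 10.0, 5.0, 1.0, 3

def mhe_step(y_window, xbar):
    """Solve (5) over the last N measurements; return the estimate of x(T)."""
    # Decision vector z = (chi(T-N), omega(T-N), ..., omega(T-1)); with
    # quadratic costs and linear dynamics this is linear least squares.
    A = np.zeros((1 + 2 * N, N + 1))
    b = np.zeros(1 + 2 * N)
    A[0, 0] = np.sqrt(p)                             # prior weighting Gamma
    b[0] = np.sqrt(p) * xbar
    for i in range(N):
        A[1 + i, 1 + i] = np.sqrt(q)                 # penalty on omega
        A[1 + N + i, 0] = np.sqrt(r) * a ** i        # fitting error nu
        for j in range(i):
            A[1 + N + i, 1 + j] = np.sqrt(r) * a ** (i - 1 - j)
        b[1 + N + i] = np.sqrt(r) * y_window[i]
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    chi, omega = z[0], z[1:]
    for w in omega:                                  # roll out to time T
        chi = a * chi + w
    return chi
```

Note that the problem size is fixed at N + 1 decision variables regardless of T; only the data window and the prior center move forward in time.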

To think about how to choose this prior weighting, it is helpful to first think about solving the full information problem by breaking it into two non-overlapping sequences of decision variables: the decision variables in the time interval corresponding to the neglected data, (ω(0), ω(1), …, ω(T − N − 1)), and those in the time interval corresponding to the considered data in the horizon, (ω(T − N), …, ω(T − 1)). If we optimize over the first sequence of variables and store the solution as a function of the terminal state χ(T − N), we have defined what is known as the arrival cost. This is the optimal cost to arrive at a given state value.

Definition 1 (arrival cost)

The (full information) arrival cost is defined for k ≥ 1 as

$$\displaystyle{ Z_{k}(x) =\min _{\chi (0),\boldsymbol{\omega }}V _{k}(\chi (0),\boldsymbol{\omega }) }$$
(6)

subject to (2) and \(\chi (k;\chi (0),\boldsymbol{\omega }) = x\).

Notice the terminal constraint that χ at time k ends at value x. Given this arrival cost function, we can then solve the full information problem by optimizing over the remaining decision variables. What we have described is simply the dynamic programming strategy for optimizing over a sum of stage costs with a dynamic model (Bertsekas 1995).
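For the linear-Gaussian specialization discussed earlier, the dynamic programming recursion can be carried out in closed form and the arrival cost is itself quadratic (see Rao et al. 2003 for the precise statement); schematically,

$$\displaystyle{ Z_{k}(x) =\hat{ V }_{k}^{0} + (1/2)\left \|x -\hat{ x}(k)\right \|_{P_{k}^{-1}}^{2} }$$

in which \(\hat{x}(k)\) and Pk are the mean and covariance propagated by the Kalman filter and \(\hat{V }_{k}^{0}\) is the optimal cost. This closed form is what makes exact arrival cost updates available in the linear unconstrained case and unavailable in general.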

We have the following important equivalence.

Lemma 1 (MHE and full information estimation)

The MHE problem (5) is equivalent to the full information problem (4) for the choice \(\Gamma _{k}(\cdot ) = Z_{k}(\cdot )\) for all k > N and N ≥ 1.

Using dynamic programming to decompose the full information problem into an MHE problem with an arrival cost penalty is conceptually important to understand the structure of the problem, but it doesn’t yet provide us with an implementable estimation strategy because we cannot compute and store the arrival cost when the model is nonlinear or other constraints are present in the problem. But if we are not too worried about the optimality of the estimator and are mainly interested in other properties, such as stability of the estimator, we can find simpler design methods for choosing the weighting \(\Gamma _{k}(\cdot )\). We address this issue next.

Estimator Properties: Stability

An estimator is termed stable if small disturbances (w, v) lead to small estimate errors \(x -\hat{ x}\) as time increases. Precise definitions of this basic idea are available elsewhere (Rawlings and Ji 2012), but this basic notion is sufficient for the purposes of this overview. In applications, properties such as stability and insensitivity to model errors are usually more important than optimality. It is possible for a filter to be optimal and still not stable. In the linear system context, this cannot happen for “nice” systems. Such nice systems are classified as detectable. Again, the precise definition of detectability for the linear case is available in standard references (Kwakernaak and Sivan 1972). Defining detectability for nonlinear systems is a more delicate affair, but useful definitions are becoming available for the nonlinear case as well (Sontag and Wang 1997).

If we lower our sights, do not insist that MHE be equivalent to full information estimation, and require only that it be a stable estimator, then the key result is that the prior penalty \(\Gamma _{k}(\cdot )\) need only be chosen no larger than the arrival cost, as shown in Fig. 3. See Rawlings and Mayne (2009, Theorem 4.20) for a precise statement of this result. Of course this condition includes the flat arrival cost, which does not penalize the initial state in the horizon at all. So neglecting the past data completely leads to a stable estimator for detectable systems. If we want to improve on this performance, we can increase the prior penalty, and we are guaranteed to retain stability as long as we stay below the upper limit set by the arrival cost.

Moving Horizon Estimation, Fig. 3

Arrival cost \(Z_{k}(x)\), underbounding prior weighting \(\Gamma _{k}(x)\), and MHE optimal value \(\hat{V }_{k}^{0}\); for all x and k > N, \(Z_{k}(x) \geq \Gamma _{k}(x) \geq \hat{ V }_{k}^{0}\), and \(Z_{k}(\hat{x}(k)) = \Gamma _{k}(\hat{x}(k)) =\hat{ V }_{k}^{0}\)

Related Problem: Statistical Sampling

MHE is based on optimizing an objective function that bears some relationship to the conditional probability of the state (trajectory) given the measurements. As discussed in the section on the Kalman filter, if the system is linear with normally distributed noise, this relationship can be made exact, and MHE is therefore an optimal statistical estimator. But in the nonlinear case, the objective function is chosen with engineering judgment and is only a surrogate for the conditional probability. By contrast, sampling methods such as particle filtering are designed to sample the conditional density also in the nonlinear case. The mean and variance of the samples then provide estimates of the mean and variance of the conditional density of interest. In the limit of infinitely many samples, these methods are exact. The efficiency of the sampling methods depends strongly on the model and the dimension of the state vector n, however. The efficiency of the sampling strategy is particularly important for online use of state estimators. Rawlings and Bakshi (2006) and Rawlings and Mayne (2009, pp. 329–355) provide some comparisons of particle filtering with MHE and also describe some hybrid methods combining MHE and particle filtering.
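For comparison, a bootstrap particle filter step is easy to sketch for a scalar example of model (1); the model, noise levels, and particle count below are assumed for illustration, not a production design.

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_pf_step(particles, y, a=0.9, q=0.1, r=0.2):
    """One bootstrap particle filter step for the assumed scalar example
    x+ = a*x + w, y = x + v, with w ~ N(0, q) and v ~ N(0, r)."""
    # Propagate each particle through the model with sampled process noise.
    particles = a * particles + np.sqrt(q) * rng.standard_normal(len(particles))
    # Weight each particle by the measurement likelihood p(y | x).
    weights = np.exp(-0.5 * (y - particles) ** 2 / r)
    weights /= weights.sum()
    # Resample to concentrate particles where the conditional density is large.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```

The sample mean and variance of the returned particles approximate the conditional mean and variance; accuracy improves with the number of particles but degrades with state dimension, which is the efficiency issue noted above.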

Summary and Future Directions

MHE is one of few state estimation methods that can be applied to nonlinear models for which properties such as estimator stability can be established (Rao et al. 2003; Rawlings and Mayne 2009). The required online solution of an optimization problem is computationally demanding in some applications but can provide significant benefits in estimator accuracy and rate of convergence (Patwardhan et al. 2012). Current topics for MHE theoretical research include treating bounded rather than convergent disturbances and establishing properties of suboptimal MHE (Rawlings and Ji 2012). The current main focus for MHE applied research involves reducing the online computational complexity to reliably handle challenging large dimensional, nonlinear applications (Kuhl et al. 2011; Lopez-Negrete and Biegler 2012; Zavala and Biegler 2009; Zavala et al. 2008).

Recommended Reading

Moving horizon estimation has by this point a fairly extensive literature; a recent overview is provided in Rawlings and Mayne (2009, pp. 356–357). The following references provide either (i) general background required to understand MHE theory and its relationship to other methods or (ii) computational methods for solving the real-time MHE optimization problem or (iii) challenging nonlinear applications that demonstrate benefits and probe the current limits of MHE implementations.