
1 Introduction

Finding a state-space model of a system and an associated optimal observer/Kalman filter gain in the presence of unknown process and measurement noises is a well-known and important system identification problem. The identification problem is nonlinear because it involves the product of the system matrix and the state, both of which are unknown. A method known as Observer/Kalman filter identification (OKID) bypassed the need to determine the system states by working through an intermediate set of parameters called Observer/Kalman filter Markov parameters [1]. These parameters are related to input-output data linearly, and can thus be found by a simple linear least-squares solution. The nonlinear step is handled by the Eigensystem Realization Algorithm (ERA), which offers an analytical (exact) solution to the problem of recovering a state-space model representation from the identified Markov parameters [2]. Under appropriate conditions, OKID identifies a state-space model of the system and an associated Kalman filter gain that is optimal with respect to the unknown process and measurement noises embedded in the input-output data. There are numerous extensions of the observer-based method [3–5], such as residual whitening [6], and methods that are derived from interaction matrices [7].

Another important class of methods that solves this problem is known as “subspace methods” [8, 9]. These methods put the emphasis on the recovery of the system states from input-output measurements by various oblique projection techniques. A fundamental attraction of this approach is that once the states are known, the identification of the state-space model becomes linear. These methods are also capable of recovering the optimal Kalman filter gain when the input-output measurements are corrupted by unknown process and measurement noises. There are numerous variations of the subspace technique [10–12]. The method used in this paper for comparison purposes is N4SID [13].

This paper describes a recently developed class of methods which can be referred to as “superspace methods”. A superstate vector is made up of input and output measurements. These superstate vectors are treated as the states of the system, and used directly in the identification of the state-space model and an associated Kalman filter gain. The superspace method bypasses the need to recover the states of the system as required in a subspace method. It also sidesteps the need to work through the Markov parameters as in OKID-based methods. In this paper, we further show that in the space of the superstates, the system matrices that define the state portion of the Kalman filter are made up entirely of 1’s and 0’s. They do not need to be identified, and even more interestingly, they are independent of the system dynamics and the process and measurement noise statistics. All system dynamics are carried in the output measurement portion of the Kalman filter. When model reduction is applied, the dynamics of the system return to the state portion of the model as one would expect in a state-space model. The superspace idea has also recently been applied successfully to the bilinear system identification problem [14].

2 Mathematical Formulation

The state-space identification problem has many related or equivalent forms. We will use one that is both common in the literature and convenient for stating our algorithm.

2.1 Problem Statement

Suppose a set of input-output measurements, \(\{u_{0},u_{1},\cdots \,,u_{s}\}\), \(\{y_{0},y_{1},\cdots \,,y_{s}\}\), of an m-input, q-output system corrupted by unknown process and measurement noises, is given. The objective of the problem is to find a state-space representation {A, B, C, D} and a steady-state Kalman filter gain K such that the input-output measurements are related to each other by the following innovation form of the state-space model,

$$\displaystyle{ \hat{x}_{k+1} = A\hat{x}_{k} + \mathit{Bu}_{k} + \mathit{Ke}_{k} }$$
(1)
$$\displaystyle{ y_{k} = C\hat{x}_{k} + \mathit{Du}_{k} + e_{k} }$$
(2)

where \(\hat{x}_{k}\) denotes the (unknown) Kalman filter state, K is the corresponding (unknown) steady-state Kalman filter gain, and \(e_{k}\) is the (unknown) Kalman filter residual that is unique for the given system, input-output measurements, and noises in the system, \(e_{k} = y_{k} -\hat{y}_{k} = y_{k} - (C\hat{x}_{k} + \mathit{Du}_{k})\). The innovation form is derived from the conventional process form,

$$\displaystyle{ \hat{x}_{k+1} = A\hat{x}_{k} + \mathit{Bu}_{k} + w_{k} }$$
(3)
$$\displaystyle{ y_{k} = C\hat{x}_{k} + \mathit{Du}_{k} + n_{k}\ }$$
(4)

The process noise \(w_{k}\) and measurement noise \(n_{k}\) are assumed to be independent, white, zero-mean, and Gaussian. The Kalman filter gain K is a function of the system state-space model and the covariances of the process and measurement noises. In the identification problem, only input-output measurements are known. The noise covariances are unknown.
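As a concrete reading of (1) and (2), the innovation-form model can be simulated directly. The sketch below is illustrative only; the function name, the zero initial state, and the array layout are assumptions, not part of the paper:

```python
import numpy as np

def simulate_innovation_form(A, B, C, D, K, u, e):
    """Simulate the innovation form (1)-(2): given inputs u_k and a white
    residual sequence e_k, produce the outputs y_k.  u and e are arrays of
    shape (N, m) and (N, q); the initial state is assumed zero."""
    x = np.zeros(A.shape[0])
    ys = []
    for uk, ek in zip(u, e):
        yk = C @ x + D @ uk + ek           # Eq. (2): y_k = C x_k + D u_k + e_k
        x = A @ x + B @ uk + K @ ek        # Eq. (1): x_{k+1} = A x_k + B u_k + K e_k
        ys.append(yk)
    return np.array(ys)
```

Any consistent set {A, B, C, D, K} can be passed in; the white residual sequence \(e_k\) plays the role of the innovations embedded in measured data.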

2.2 A Superstate Vector Definition

A superstate vector z k is defined from input-output data as follows,

$$\displaystyle{ z_{k} = \left [\begin{array}{*{20}c} v_{k-p}\\ \vdots \\ v_{k-2} \\ v_{k-1}\\ \end{array} \right ]\,\,\,\,\,\,\,\,\,\,\,\,\,v_{k} = \left [\begin{array}{*{20}c} u_{k} \\ y_{k}\\ \end{array} \right ] }$$
(5)

From the given input-output measurements, these superstates can be easily created, and used in the subsequent superspace identification method.
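The construction in (5) can be sketched in a few lines. This is an illustrative helper only (the function name and array layout are assumptions), stacking the p most recent input-output pairs, oldest on top:

```python
import numpy as np

def superstate(u, y, k, p):
    """Superstate z_k of Eq. (5): stacked v_j = [u_j; y_j] for j = k-p, ..., k-1.
    u and y are arrays of shape (N, m) and (N, q); requires k >= p."""
    v = np.hstack([u, y])           # row j is v_j = [u_j; y_j], shape (N, m+q)
    return v[k - p:k].reshape(-1)   # stack v_{k-p}, ..., v_{k-1} into one vector

# Example with hypothetical dimensions m = 1, q = 1, p = 3:
u = np.arange(10.0).reshape(-1, 1)
y = 2.0 * u
z5 = superstate(u, y, k=5, p=3)     # contains v_2, v_3, v_4
```

The resulting vector has dimension \((m+q)p\), which is the superstate dimension used throughout the algorithm.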

3 A Superspace Identification Algorithm

A superspace identification algorithm is summarized below.

  • Choose a value for p; the superstate vector dimension is then (m + q)p, which should be larger than the true minimum state dimension of the system being identified.

  • Form the following matrix Z from the available input-output data,

    $$\displaystyle{ Z = \left [\begin{array}{*{20}{c}} v_{0} & v_{1} & \cdots & v_{s-p} &v_{s-p+1} \\ v_{1} & v_{2} & \cdots &v_{s-p+1} & v_{s-p+2}\\ \vdots & \vdots & \vdots & \vdots & \vdots \\ v_{p-2} & v_{p-1} & \cdots & v_{s-2} & v_{s-1} \\ v_{p-1} & v_{p} &\cdots & v_{s-1} & v_{s}\end{array} \right ] }$$
    (6)

    Define \(Z_{0}\) as Z with its last column removed, and \(Z_{1}\) as Z with its first column removed. Furthermore, define

    $$\displaystyle{ U_{p} = \left [\begin{array}{*{20}c} u_{p}&u_{p+1} & \cdots &u_{s}\\ \end{array} \right ]\,\,\,\,\,\,\,\,\,Y _{p} = \left [\begin{array}{*{20}c} y_{p}&y_{p+1} & \cdots &y_{s}\\ \end{array} \right ]\,\,\,\,\,\,\,\,V _{p} = \left [\frac{U_{p}} {Y _{p}}\right ] }$$
    (7)
  • Solve for \(\bar{A}^{{\ast}}\), \(\bar{B}^{{\ast}}\), \(C^{{\ast}}\), and \(D^{{\ast}}\) by least-squares from

    $$\displaystyle{ Z_{1} =\bar{ A}^{{\ast}}Z_{ 0} +\bar{ B}^{{\ast}}V _{ p} }$$
    (8)
    $$\displaystyle{ Y _{p} = C^{{\ast}}Z_{ 0} + D^{{\ast}}U_{ p} + E_{p}^{{\ast}} }$$
    (9)

    It turns out that there is no need to solve for \(\bar{A}^{{\ast}}\), \(\bar{B}^{{\ast}}\) from (8), as they are simply matrices made up of 0’s and 1’s,

    $$\displaystyle{ \bar{A}^{{\ast}} = \left [\frac{\left.0_{(p-1)b\times b}\,\,\,\,\right \vert \,\,I_{(p-1)b\times (p-1)b}} {\left.0_{b\times b}\,\,\,\,\right \vert \,\,0_{b\times (p-1)b}} \right ]\,\,\,\,\,\,\,\,\,\bar{B}^{{\ast}} = \left [\frac{0_{(p-1)b\times b}} {I_{b\times b}} \right ] }$$
    (10)

    where \(b = m + q\). A key point to observe here is that \(\bar{A}^{{\ast}}\), \(\bar{B}^{{\ast}}\) are completely independent of the system dynamics and the process and measurement noise statistics. Information about the system is completely contained in \(C^{{\ast}}\) and \(D^{{\ast}}\), which can be solved for by least-squares,

    $$\displaystyle{ \left [\left.C^{{\ast}}\,\,\right \vert \,\,\,D^{{\ast}}\right ] = Y _{ p}\left [\frac{Z_{0}} {U_{p}}\right ]^{\dag } }$$
    (11)

    The † denotes the Moore-Penrose pseudo-inverse. \(Z_{1}\) is not needed in (11), but will be used later in establishing the optimality of the algorithm.

  • Lastly, a representation of {A, B, C, D, K} denoted by \(\{A^{{\ast}},B^{{\ast}},C^{{\ast}},D^{{\ast}},K^{{\ast}}\}\) can be recovered from \(\bar{A}^{{\ast}}\), \(\bar{B}^{{\ast}}\), \(C^{{\ast}}\), and \(D^{{\ast}}\) based on the following relationship,

    $$\displaystyle{ \bar{A}^{{\ast}} = A^{{\ast}}- K^{{\ast}}C^{{\ast}}\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\bar{B}^{{\ast}} = \left [\left.B^{{\ast}}- K^{{\ast}}D^{{\ast}}\,\,\right \vert \,\,K^{{\ast}}\right ] }$$
    (12)

    Recall that \(C^{{\ast}}\) and \(D^{{\ast}}\) are obtained from (11), and \(\bar{A}^{{\ast}}\) and \(\bar{B}^{{\ast}}\) are given in (10). The matrix \(\bar{B}^{{\ast}}\) has two partitions according to (12): the left partition is \(B^{{\ast}}- K^{{\ast}}D^{{\ast}}\) of dimensions \(\mathit{pb} \times m\), and the right partition is \(K^{{\ast}}\) of dimensions \(\mathit{pb}\times q\). Because \(\bar{B}^{{\ast}}\) is made up entirely of 0’s and 1’s and known beforehand, these partitions are known. \(B^{{\ast}}\) can be recovered from the left partition of \(\bar{B}^{{\ast}}\) because \(K^{{\ast}}\) and \(D^{{\ast}}\) are known. Similarly, \(A^{{\ast}}\) can be recovered from \(\bar{A}^{{\ast}}\) because \(K^{{\ast}}\) and \(C^{{\ast}}\) are known. Having obtained a full set of \(\{A^{{\ast}},B^{{\ast}},C^{{\ast}},D^{{\ast}},K^{{\ast}}\}\), standard model reduction techniques can be applied to reduce its dimension to the correct minimum state dimension of the system being identified.
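The steps above can be sketched in code. This is a minimal illustration, not the authors' implementation: the function name and array layout are assumptions, and no model reduction is performed:

```python
import numpy as np

def superspace_id(u, y, p):
    """Sketch of the superspace algorithm, Eqs. (6)-(12).
    u: (N, m) inputs, y: (N, q) outputs, with N = s + 1 samples.
    Returns (A, B, C, D, K) in the superstate space of dimension p*(m+q)."""
    N, m = u.shape
    q = y.shape[1]
    b = m + q
    v = np.hstack([u, y]).T                     # shape (b, N); column k is v_k

    # Data matrix Z of Eq. (6): column k is [v_k; v_{k+1}; ...; v_{k+p-1}]
    Z = np.vstack([v[:, i:N - p + 1 + i] for i in range(p)])
    Z0, Z1 = Z[:, :-1], Z[:, 1:]                # Z without last / first column
    Up, Yp = u[p:].T, y[p:].T                   # Eq. (7)

    # Abar*, Bbar* of Eq. (10): known 0/1 shift matrices, nothing to identify
    Abar = np.zeros((p * b, p * b))
    Abar[:(p - 1) * b, b:] = np.eye((p - 1) * b)
    Bbar = np.zeros((p * b, b))
    Bbar[(p - 1) * b:, :] = np.eye(b)

    # C*, D* by least-squares, Eq. (11)
    CD = Yp @ np.linalg.pinv(np.vstack([Z0, Up]))
    C, D = CD[:, :p * b], CD[:, p * b:]

    # Recover A*, B*, K* from Eq. (12)
    K = Bbar[:, m:]                             # right partition of Bbar*
    B = Bbar[:, :m] + K @ D                     # left partition is B* - K* D*
    A = Abar + K @ C                            # Abar* = A* - K* C*
    return A, B, C, D, K
```

Note that the state-equation matrices are written down, not solved for; the only least-squares computation is the one in (11).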

4 Optimality of Superspace Identification

We now establish the optimality of the combination \(\{A^{{\ast}},B^{{\ast}},C^{{\ast}},D^{{\ast}},K^{{\ast}}\}\) by proving that the Markov parameters of this combination match the Markov parameters of the optimal Kalman filter given in (1) and (2). First, we eliminate \(e_{k}\) from the state portion of the Kalman filter by solving for \(e_{k}\) in (2) and substituting it into (1) to produce

$$\displaystyle{ \hat{x}_{k+1} =\bar{ A}\hat{x}_{k} +\bar{ B}v_{k}\ }$$
(13)
$$\displaystyle{ y_{k} = C\hat{x}_{k} + \mathit{Du}_{k} + e_{k}\ }$$
(14)

where \(\bar{A} = A -\mathit{KC}\), \(\bar{B} = \left [\left.B -\mathit{KD}\,\right \vert \,K\right ]\), and \(v_{k}\) is defined in (5). Define

$$\displaystyle{ \hat{X}_{p} = \left [\begin{array}{*{20}{c}} \hat{x}_{p}&\cdots &\hat{x}_{s} \end{array} \right ]\,\,\,\,\,\,\,\,\,\,\,\,\hat{X}_{p+1} = \left [\begin{array}{*{20}{c}} \hat{x}_{p+1} & \cdots &\hat{x}_{s+1} \end{array} \right ]\,\,\,\,\,\,\,\,\,\,\,\,E_{p} = \left [\begin{array}{*{20}{c}} e_{p}&\cdots &e_{s} \end{array} \right ] }$$
(15)

Equations (13) and (14) can be written for all available time steps,

$$\displaystyle{ \hat{X}_{p+1} =\bar{ A}\hat{X}_{p} +\bar{ B}V _{p}\ }$$
(16)
$$\displaystyle{ Y _{p} = C\hat{X}_{p} + \mathit{DU}_{p} + E_{p}\ }$$
(17)

where \(V _{p} =\,\, \left [\begin{array}{*{20}c} v_{p}&v_{p+1} & \cdots &v_{s}\\ \end{array} \right ].\) Next, we express \(\hat{X}_{p}\) and \(\hat{X}_{p+1}\) in terms of input and output measurements. As long as p is sufficiently large such that \(\bar{A}^{p} \approx 0\), then by repeated substitution, the Kalman filter state can be expressed in terms of input and output measurements, and when packaged together, can be put in the form,

$$\displaystyle{ \left [\begin{array}{*{20}{c}} \hat{x}_{p}&\hat{x}_{p+1} & \cdots &\hat{x}_{s+1} \end{array} \right ] = \left [\begin{array}{*{20}{c}} \bar{A}^{p-1}\bar{B}&\cdots &\bar{A}\bar{B}&\bar{B} \end{array} \right ]\left [\begin{array}{*{20}{c}} v_{0} & v_{1} & \cdots &v_{s-p+1} \\ v_{1} & v_{2} & \cdots &v_{s-p+2}\\ \vdots & \vdots & \vdots & \vdots \\ v_{p-1} & v_{p}&\cdots & v_{s} \end{array} \right ] }$$
(18)

Define \(\mathcal{C}_{p} = \left [\bar{A}^{p-1}\bar{B}\,,\,\,\,\ldots,\,\,\,\bar{A}\bar{B},\,\,\,\bar{B}\right ]\). It follows from (18) that

$$\displaystyle{ \hat{X}_{p} = \mathcal{C}_{p}Z_{0}\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\hat{X}_{p+1} = \mathcal{C}_{p}Z_{1} }$$
(19)

where \(Z_{0}\) and \(Z_{1}\) are the two partitions of Z as previously defined in (6). Substituting (19) into (16) produces

$$\displaystyle{ \mathcal{C}_{p}Z_{1} =\bar{ A}\mathcal{C}_{p}Z_{0} +\bar{ B}V _{p} }$$
(20)
$$\displaystyle{ Y _{p} = C\mathcal{C}_{p}Z_{0} + \mathit{DU}_{p} + E_{p} }$$
(21)

This is the relation that the ideal optimal system matrices need to satisfy with the given input-output measurements. Notice that \(E_{p}\) here is made up of the innovation sequence \(\{e_{k}\}\). In the superspace algorithm, we impose (8) and (9). To relate (20) and (21) to (8) and (9), premultiply (8) by \(\mathcal{C}_{p}\) to produce

$$\displaystyle{ \mathcal{C}_{p}Z_{1} = \mathcal{C}_{p}\bar{A}^{{\ast}}Z_{ 0} + \mathcal{C}_{p}\bar{B}^{{\ast}}V _{ p} }$$
(22)
$$\displaystyle{ Y _{p} = C^{{\ast}}Z_{ 0} + D^{{\ast}}U_{ p} + E_{p}^{{\ast}} }$$
(23)

In the superspace identification algorithm, \(C^{{\ast}}\) and \(D^{{\ast}}\) are solved from (11) by least-squares. This step ensures that the residual \(E_{p}^{{\ast}}\) is minimized and orthogonal to the input and output data. These are the conditions that the optimal Kalman filter residual must satisfy, hence \(E_{p}^{{\ast}} = E_{p}\). Furthermore, if the input-output data set is sufficiently rich such that the matrix formed by \(Z_{0}\) and \(V_{p}\) is full rank, we also have

$$\displaystyle{ \bar{A}\mathcal{C}_{p} = \mathcal{C}_{p}\bar{A}^{{\ast}}\ }$$
(24)
$$\displaystyle{ \bar{B} = \mathcal{C}_{p}\bar{B}^{{\ast}}\ }$$
(25)
$$\displaystyle{ C\mathcal{C}_{p} = C^{{\ast}}\ }$$
(26)
$$\displaystyle{ D = D^{{\ast}}\ }$$
(27)

As long as p is sufficiently large such that \(\bar{A}^{p} \approx 0\), our choices of \(\bar{A}^{{\ast}}\) and \(\bar{B}^{{\ast}}\) in (10) indeed satisfy (24) and (25). Furthermore, it can be shown that the Markov parameters of the identified Kalman filter match the Markov parameters of the optimal Kalman filter. For example, the first Markov parameter can be shown to match,

$$\displaystyle{ C^{{\ast}}\bar{B}^{{\ast}} = (C\mathcal{C}_{ p})\bar{B}^{{\ast}} = C(\mathcal{C}_{ p}\bar{B}^{{\ast}}) = C\bar{B}\ }$$
(28)

Similarly, the second Markov parameter can also be shown to match,

$$\displaystyle{ C^{{\ast}}\bar{A}^{{\ast}}\bar{B}^{{\ast}} = C(\mathcal{C}_{ p}\bar{A}^{{\ast}})\bar{B}^{{\ast}} = C(\bar{A}\mathcal{C}_{ p})\bar{B}^{{\ast}} = C\bar{A}(\mathcal{C}_{ p}\bar{B}^{{\ast}}) = C\bar{A}\bar{B}\ }$$
(29)

and so on. The two sets of system matrices have the same Markov parameters. This result establishes the optimality of the identified Kalman filter.
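The identities (24) and (25) behind this argument can be verified numerically. The sketch below is illustrative only: the matrices \(\bar{A}\) and \(\bar{B}\) and the dimensions are arbitrary assumptions, chosen so that \(\bar{A}^{p}\approx 0\) holds:

```python
import numpy as np

# Hypothetical example matrices (n = 2, b = m + q = 2), chosen only for illustration
p, b = 30, 2
Abar = np.array([[0.2, 0.1],
                 [0.0, 0.3]])           # eigenvalues 0.2, 0.3: Abar**p is negligible
Bbar = np.array([[1.0, 0.5],
                 [0.0, 1.0]])

# C_p = [Abar^(p-1) Bbar, ..., Abar Bbar, Bbar]
blocks, M = [], Bbar.copy()
for _ in range(p):
    blocks.insert(0, M)                 # prepend, so the highest power comes first
    M = Abar @ M
Cp = np.hstack(blocks)                  # shape (2, p*b)

# The 0/1 shift matrices of Eq. (10)
Abar_star = np.zeros((p * b, p * b))
Abar_star[:(p - 1) * b, b:] = np.eye((p - 1) * b)
Bbar_star = np.zeros((p * b, b))
Bbar_star[(p - 1) * b:, :] = np.eye(b)

# Eq. (25) holds exactly; Eq. (24) holds up to the neglected Abar^p Bbar term
print(np.allclose(Cp @ Bbar_star, Bbar))                  # True
print(np.allclose(Abar @ Cp, Cp @ Abar_star, atol=1e-8))  # True
```

The first check is exact because \(\bar{B}^{{\ast}}\) simply selects the last block of \(\mathcal{C}_{p}\); the second differs from equality only by the \(\bar{A}^{p}\bar{B}\) term in the first block column, which vanishes as p grows.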

We discuss briefly how the superspace method differs from the subspace method. Although there are several variants of the subspace identification method, the well-known N4SID is used for the present discussion. The N4SID method first estimates the optimal states by an oblique projection, then estimates all the system state-space matrices by least-squares. The superspace method bypasses the state estimation step by forming the superstates, and uses them directly in the identification. The least-squares calculation is used to obtain the coefficients of the measurement equation. The state portion of the Kalman filter is made up of 1’s and 0’s, and does not need to be computed. More fundamentally, in deriving the subspace method, the process form of the state-space model is used to establish the input-output relationship, whereas in the superspace method the innovation form, which involves the Kalman filter, is used. The subspace method is thus process-form oriented, whereas the superspace method is innovation-form oriented. A consequence of using the innovation form is that the superspace identification method recovers the steady-state Kalman filter gain simultaneously with the system state-space model, whereas in the subspace method, the Kalman filter gain is typically computed as an additional step.

5 Illustrative Examples

This section demonstrates the effectiveness of the superspace identification method using both simulated and experimental data. A comparison of the results with those obtained by the subspace N4SID algorithm is also provided.

5.1 A Simulated System

The following system is driven by random input excitation, and the resultant output is recorded for system identification,

$$\displaystyle{A = \left [\begin{array}{*{20}{c}} -0.35&-0.5\\ 1 & 0 \end{array} \right ]\,\,\,\,\,\,\,\,\,B = \left [\begin{array}{*{20}{c}} 1\\ 0 \end{array} \right ]\,\,\,\,\,\,\,\,\,C = \left [\begin{array}{*{20}{c}} 0&0.5 \end{array} \right ]\,\,\,\,\,\,\,\,\,D = 0}$$

In order to show the convergent behavior, the input-output data record is deliberately chosen to be large (\(2^{16}\) samples). The input and measurement noise covariances are \(Q = 0.025\) and \(R = 0.025\), corresponding to the input and measurement noises shown in Figs. 1 and 2. Using p = 8 in the identification, the final model order is reduced to 2. Figure 3 shows the Kalman filter Markov parameters, \(C\left (A -\mathit{KC}\right )^{k}B\) and \(C\left (A -\mathit{KC}\right )^{k}K\), constructed from the identified state-space model and the identified Kalman filter gain by both methods. These Markov parameters are compared to those of the optimal Kalman filter whose gain is computed from perfect knowledge of the system model and the input and measurement noise covariances. Figure 4 shows the residuals of the identified Kalman filters and the state-space models by both methods (superspace and N4SID) matching the residuals of the optimal filter and of the truth model point-wise. The results confirm that both methods produce optimal identification results as expected.

Fig. 1 Input data with added noise

Fig. 2 Output data with added noise

Fig. 3 Kalman filter Markov parameters by two identification methods (superspace and N4SID) compared to the Kalman filter Markov parameters computed from perfect knowledge of the system model and noise statistics

Fig. 4 Comparison of identified filter residuals to optimal Kalman residual (left), and identified state-space model residuals to truth model residual (right)

5.2 CD Player Arm

A set of experimental data of a CD player arm is used in this example [15]. The system has two inputs and two outputs. The input-output data record used for identification is shown in Figs. 5 and 6 (2,048 samples). Using p = 6 in the identification, the final system order is reduced to 12 before the identification results are compared. Figure 7 shows a comparison of the identified Kalman filter outputs to the measured outputs. Figure 8 shows a comparison of the identified state-space model outputs to the measured outputs. Overall, both methods appear to capture the dynamics of the mechanism relatively well with this set of input-output data.

Fig. 5 Input signals of CD player arm

Fig. 6 Output signals of CD player arm

Fig. 7 Comparison of Kalman filter outputs to measured outputs by two identification methods (superspace and N4SID)

Fig. 8 Comparison of state-space model outputs to measured outputs by two identification methods (superspace and N4SID)

6 Conclusions

A superspace method for identification of a system state-space model and its associated Kalman filter gain has been formulated. It is found that in the space of the superstates, which are vectors of past input and output measurements, the matrices that define the state portion of the Kalman filter are made up entirely of 0’s and 1’s. These matrices are known in advance, and do not need to be identified. Because they are known in advance, they are completely independent of the actual system dynamics and the noise statistics. This is a highly intriguing and counter-intuitive result. Moreover, in the superstate space, the system dynamics are contained in the measurement equation, not the state portion, of the Kalman filter. When model reduction is applied, the actual system dynamics return to the state portion of the reduced-order model as one would expect in a state-space model. Optimality of the proposed superspace identification method is also established in theory and confirmed in numerical simulation. The Kalman filter identified from input-output measurements by the superspace technique is found to match the optimal Kalman filter derived from perfect knowledge of the system and of the noise statistics, both in their Markov parameters and in their output residuals. When applied to experimental data of a CD arm mechanism, the method produces excellent results when compared to an established subspace identification method.