1 Introduction

Evolutionary phenomena can be formally described as continuous dynamical models governed by partial differential equations (PDEs). The continuous nature of these physical models comes equipped with analytical results that enable efficient discrete approximation in space and in time. In particular, methods such as finite elements or finite differences bridge the continuous analytical laws of the physical world with computational science [1]. Data science, on the other hand, enables model discovery when the identification feature is considered [2]. Quantifying these equivalences, in combination with the stochastic nature of real-world applications, is at the heart of the digital-twin concept [3]. Spatial discretization of PDEs in many cases yields a continuous-in-time system of ordinary differential equations (ODEs) described by the operators \((\textbf{F},~\textbf{G})\), which can be approximated with Carleman linearization by a bilinear system [4]. We present the single-input single-output (SISO) case, with the continuous operators denoted by the subscript "c."

$$\begin{aligned} \boldsymbol{\varSigma }:\left\{ \begin{aligned} \dot{{{\textbf {x}}}}(t)&=\textbf{F}({{\textbf {x}}}(t))+\textbf{G}({{\textbf {x}}}(t))u(t)\\ y(t)&={{\textbf {H}}}{{\textbf {x}}}(t),~{{\textbf {x}}}_0=\textbf{0},~t\ge 0. \end{aligned}\right. \xrightarrow [\boldsymbol{\varSigma }\approx \boldsymbol{\varSigma }_{bil}]{\text {Carleman}}\boldsymbol{\varSigma }_{bil}:\left\{ \begin{aligned} \dot{{{\textbf {x}}}}(t)&=\textbf{A}_c{{\textbf {x}}}(t)+\textbf{N}_c{{\textbf {x}}}(t)u(t)+\textbf{B}_c u(t)\\ y(t)&=\textbf{C}_c{{\textbf {x}}}(t),~{{\textbf {x}}}_0={\textbf {0}},~t\ge 0. \end{aligned}\right. \end{aligned}$$
(1)

If the original system has dimension n, then, since Carleman linearization [4] preserves terms up to the quadratic one \({{\textbf {x}}}(t)\otimes {{\textbf {x}}}(t)\), the dimension of the resulting bilinear system \((\textbf{A}_c,~\textbf{N}_c,~\textbf{B}_c,~\textbf{C}_c)\) increases to \(N=n^2+n\).
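To make the lifting concrete, here is a sketch under the assumption of a quadratic drift \(\textbf{F}({{\textbf {x}}})=\textbf{A}_1{{\textbf {x}}}+\textbf{A}_2({{\textbf {x}}}\otimes {{\textbf {x}}})\) and an affine input map \(\textbf{G}({{\textbf {x}}})=\textbf{B}_0+\textbf{B}_1{{\textbf {x}}}\) (the symbols \(\textbf{A}_1,\textbf{A}_2,\textbf{B}_0,\textbf{B}_1\) are ours, not from [4]). With the lifted state \({{\textbf {z}}}=[{{\textbf {x}}};~{{\textbf {x}}}\otimes {{\textbf {x}}}]\in {\mathbb R}^{n^2+n}\), and after dropping the cubic terms generated by \(\tfrac{d}{dt}({{\textbf {x}}}\otimes {{\textbf {x}}})=\dot{{{\textbf {x}}}}\otimes {{\textbf {x}}}+{{\textbf {x}}}\otimes \dot{{{\textbf {x}}}}\), one arrives at the bilinear structure of Eq. (1):

$$\begin{aligned} \dot{{{\textbf {z}}}}\approx \underbrace{\left[ \begin{array}{cc} \textbf{A}_1 & \textbf{A}_2\\ \textbf{0} & \textbf{A}_1\otimes \textbf{I}+\textbf{I}\otimes \textbf{A}_1 \end{array}\right] }_{\textbf{A}_c}{{\textbf {z}}}+\underbrace{\left[ \begin{array}{cc} \textbf{B}_1 & \textbf{0}\\ \textbf{B}_0\otimes \textbf{I}+\textbf{I}\otimes \textbf{B}_0 & \textbf{B}_1\otimes \textbf{I}+\textbf{I}\otimes \textbf{B}_1 \end{array}\right] }_{\textbf{N}_c}{{\textbf {z}}}\,u(t)+\underbrace{\left[ \begin{array}{c} \textbf{B}_0\\ \textbf{0} \end{array}\right] }_{\textbf{B}_c}u(t). \end{aligned}$$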

Data-driven methods can be classified into two general categories. The first provides prediction through regression techniques such as neural networks (NNs) from machine learning (ML), while the second has its roots in system theory and allows model discovery [2, 5]. Generally, NNs are sensitive to parameter tuning and lack model interpretability due to their inherent “black-box” structure [6], whereas the system-theoretic methods construct interpretable models and can explain the hidden dynamics. ML models learn features by composing nonlinear activation functions and rely mainly on the backpropagation algorithm to adjust the network weights during training. Consequently, the prediction is expressed as a function of the training data points (finite memory). Until recently, ML and system identification (SI) techniques were developed independently, but in recent years great effort has been invested in establishing common ground [7].

The authors in [8] have extended the subspace realization theory from linear to bilinear systems. For example, in applications that concern chemical processes, the controls are flow rates and, from first principles, e.g., mass and heat balances, they appear in the system equations as products with the state variables. Therefore, the bilinear equation has the physical form \(M\dot{{{\textbf {x}}}}=\sum _i{{\textbf {q}}}_i{{\textbf {x}}}_i-\sum _m {{\textbf {q}}}_m{{\textbf {x}}}_m\), with \({{\textbf {q}}}\) the inputs and \({{\textbf {x}}}\) the state. The authors of [9] construct bilinear systems with white-noise input based on an iterative deterministic-stochastic subspace approach. The author in [10] uses the properties of the linear model obtained when the bilinear system is subjected to a constant input; constant inputs transform the bilinear model into an equivalent linear model [11].

In Sect. (2), we introduce the theory of bilinear realization, explaining in detail the data acquisition procedure that computes the bilinear Markov parameters entering the bilinear Hankel matrix. Further, we present a concise algorithm that achieves bilinear identification, illustrated with two examples. In Sect. (3), we train a neural network with a single i/o data sequence to mimic the unknown simulator and combine it with the bilinear realization theory. As a result, we construct a bilinear model from a single i/o sequence with slightly better fit performance compared with another state-of-the-art bilinear SI approach. Finally, we provide the conclusion and outlook in Sect. (4).

2 The Bilinear Realization Framework

In the case of linear systems, Ho and Kalman [12] provided the mathematical foundations for realizing linear systems from i/o data. In the nonlinear case, and with the specific aim of identifying nonlinear systems, Isidori [13] extended these results to the bilinear case, and Al Baiyat [14] provided an SVD-based algorithm.

Time discretization as in [15] of the SISO bilinear system in Eq. (1) with sampling time \(\varDelta t\) results in fully discrete models defined at the time instances \(0<\varDelta t<2\varDelta t<\cdots <k\varDelta t\), with \({{\textbf {x}}}(k\varDelta t)={{\textbf {x}}}_k\) and \(u(k\varDelta t)=u_k\) for \(k=0,\ldots ,m-1\)

$$\begin{aligned} \boldsymbol{\varSigma }_{\text {disc}} :\left\{ \begin{aligned} {{\textbf {x}}}_{k+1}&={{\textbf {A}}}{{\textbf {x}}}_k+{{\textbf {N}}}{{\textbf {x}}}_k u_k+{{\textbf {B}}}u_k,\\ y_k&={{\textbf {C}}}{{\textbf {x}}}_k,~{{\textbf {x}}}_0={\textbf {0}}. \end{aligned}\right. \end{aligned}$$
(2)

The discrete-time system in Eq. (2) has state dimension N, so \({{\textbf {x}}}\in {\mathbb R}^N\), and the operators have dimensions \({{\textbf {A}}},{{\textbf {N}}}\in {\mathbb R}^{N\times N},~{{\textbf {B}}},{{\textbf {C}}}^T\in {\mathbb R}^{N}\). We have assumed homogeneous initial conditions and a zero feed-forward term (i.e., \({{\textbf {D}}}=0\)). As far as the authors are aware, the forward Euler scheme is the only numerical scheme that preserves the bilinear structure in a discrete set-up, at the cost of conditional stability. A more sophisticated scheme [16] can exactly interpolate the continuous model at the sampling points but is restricted to a subclass of bilinear systems. Therefore, a good choice in terms of stability is the backward Euler scheme from [15], which preserves the bilinear structure asymptotically; the transformation that leads to the discrete system is given in Eq. (3):

$$\begin{aligned} \begin{aligned} &\phi :~{{\textbf {A}}}=({{\textbf {I}}}-\varDelta t{{\textbf {A}}}_c)^{-1},~{{\textbf {N}}}=\varDelta t({{\textbf {I}}}-\varDelta t{{\textbf {A}}}_c)^{-1}{{\textbf {N}}}_c,~{{\textbf {B}}}=\varDelta t({{\textbf {I}}}-\varDelta t{{\textbf {A}}}_c)^{-1}{{\textbf {B}}}_c,~{{\textbf {C}}}={{\textbf {C}}}_c,\\ &\boldsymbol{\varSigma }_{b}^{c}:({{\textbf {A}}}_c,{{\textbf {N}}}_c,{{\textbf {B}}}_c,{{\textbf {C}}}_c)\overset{\phi ^{-1}}{\leftrightarrow }\boldsymbol{\varSigma }_{b}^{d}:({{\textbf {A}}},{{\textbf {N}}},{{\textbf {B}}},{{\textbf {C}}}) \end{aligned} \end{aligned}$$
(3)
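A minimal NumPy sketch of the map \(\phi \) (function name ours), assuming the continuous operators are available as arrays:

```python
import numpy as np

def backward_euler(Ac, Nc, Bc, Cc, dt):
    """Transformation (3): continuous (Ac, Nc, Bc, Cc) -> discrete (A, N, B, C)."""
    E = np.linalg.inv(np.eye(Ac.shape[0]) - dt * Ac)  # (I - dt*Ac)^(-1)
    return E, dt * (E @ Nc), dt * (E @ Bc), Cc
```

The inverse map \(\phi ^{-1}\) follows by solving Eq. (3) for the continuous operators, e.g., \({{\textbf {A}}}_c=({{\textbf {I}}}-{{\textbf {A}}}^{-1})/\varDelta t\).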

Definition 1

The reachability matrix \( \mathcal{R} _{n}=\left[ \begin{array}{ccc} {{\textbf {R}}}_{1} & \cdots & {{\textbf {R}}}_{n} \end{array}\right] \) is defined recursively from the relation \({{\textbf {R}}}_j=\left[ \begin{array}{cc} {{\textbf {A}}}{{\textbf {R}}}_{j-1}~~&~~{{\textbf {N}}}{{\textbf {R}}}_{j-1}\end{array}\right] ,~j=2,\ldots ,n,~{{\textbf {R}}}_1={{\textbf {B}}}\).

Then, the state space of the bilinear system is spanned by the states reachable from the origin if and only if \(\text {rank}( \mathcal{R} _n)=n\).

Definition 2

The observability matrix \( \mathcal{O} _{n}=\left[ \begin{array}{ccc} {{\textbf {O}}}_{1} & \cdots & {{\textbf {O}}}_{n} \end{array}\right] ^T\) is defined recursively from the relation \({{\textbf {O}}}_j^T=\left[ \begin{array}{cc} {{\textbf {O}}}_{j-1}{{\textbf {A}}}~~&~~{{\textbf {O}}}_{j-1}{{\textbf {N}}}\end{array}\right] ^T,~j=2,\ldots ,n,~{{\textbf {O}}}_1={{\textbf {C}}}\).

Then the state space of the bilinear system is observable if and only if \(\text {rank}( \mathcal{O} _n)=n\). The following Def. (3) allows a concise representation of the i/o relation.

Definition 3

\({{\textbf {u}}}_j(h)=\left[ \begin{array}{c} {{\textbf {u}}}_{j-1}(h) \\ {{\textbf {u}}}_{j-1}(h)u(h+j-1) \end{array}\right] ,~j=2,\ldots ,~{{\textbf {u}}}_1(h)=u(h)\).

Let \(\{{{\textbf {w}}}_1,{{\textbf {w}}}_2,\ldots ,{{\textbf {w}}}_j,\ldots \}\) be an infinite sequence of row vectors with \({{\textbf {w}}}_j\in {\mathbb R}^{1\times 2^{j-1}}\), defined as \({{\textbf {w}}}_j={{\textbf {C}}}{{\textbf {R}}}_{j},~j=1,2,\ldots \)
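The three recursions above translate directly into code. The following NumPy sketch (function names ours) builds the blocks of Definitions 1 and 2, the input vectors of Definition 3, and the row vectors \({{\textbf {w}}}_j\) for a given discrete model; these helpers are reused in the sketches that follow.

```python
import numpy as np

def reachability(A, N, B, depth):
    """R_1 = B,  R_j = [A R_{j-1}, N R_{j-1}]  (Definition 1)."""
    blocks = [B.reshape(-1, 1)]
    for _ in range(1, depth):
        R = blocks[-1]
        blocks.append(np.hstack([A @ R, N @ R]))
    return blocks                      # list [R_1, ..., R_depth]

def observability(A, N, C, depth):
    """O_1 = C,  O_j = [O_{j-1} A; O_{j-1} N]  (Definition 2)."""
    blocks = [C.reshape(1, -1)]
    for _ in range(1, depth):
        O = blocks[-1]
        blocks.append(np.vstack([O @ A, O @ N]))
    return blocks

def input_vectors(u, depth):
    """u_1(h) = u(h),  u_j(h) = [u_{j-1}(h); u_{j-1}(h) u(h+j-1)]  (Definition 3)."""
    vecs = {(1, h): np.array([u[h]]) for h in range(len(u))}
    for j in range(2, depth + 1):
        for h in range(len(u) - j + 1):
            prev = vecs[(j - 1, h)]
            vecs[(j, h)] = np.concatenate([prev, prev * u[h + j - 1]])
    return vecs                        # dictionary keyed by (j, h)

def markov_vectors(A, N, B, C, depth):
    """w_j = C R_j, a row vector of length 2^(j-1)."""
    return [C.reshape(1, -1) @ R for R in reachability(A, N, B, depth)]
```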

The state response of the system in Eq. (2), starting from the state \({{\textbf {x}}}_0={\textbf {0}}\) at time \(k=0\) under a given input function, can be expressed as:

$$\begin{aligned} \begin{aligned} {{\textbf {x}}}_1&={{\textbf {B}}}u_0\triangleq {{\textbf {R}}}_1{{\textbf {u}}}_1(0),\\ {{\textbf {x}}}_2&={{\textbf {A}}}{{\textbf {R}}}_{1}{{\textbf {u}}}_{1}(0)+{{\textbf {N}}}{{\textbf {R}}}_{1}{{\textbf {u}}}_{1}(0)u(1)+{{\textbf {B}}}u(1)\triangleq {{\textbf {R}}}_{2}{{\textbf {u}}}_{2}(0)+{{\textbf {R}}}_{1}{{\textbf {u}}}_{1}(1),\\ \vdots \\ {{\textbf {x}}}_k&=\sum _{j=1}^{k}{{\textbf {R}}}_{j}{{\textbf {u}}}_{j}(k-j),~k=1,2,\ldots ; \end{aligned} \end{aligned}$$
(4)

Finally, after multiplying Eq. (4) from the left with the row vector \({{\textbf {C}}}\), the zero-state input-output map of the system in Eq. (2) can be written as:

$$\begin{aligned} y_k=\sum _{j=1}^{k}{{\textbf {w}}}_j{{\textbf {u}}}_j(k-j),~k=1,2,\ldots ; \end{aligned}$$
(5)
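As a sanity check, the expansion in Eq. (5) can be verified numerically against a direct forward simulation of Eq. (2). The matrices below are those of Example 1 in Sect. (2.3), used here purely as a test case, together with the helper functions sketched after Definition 3:

```python
def simulate(A, N, B, C, u):
    """Forward simulation of the discrete bilinear system (2) from x_0 = 0."""
    x = np.zeros(A.shape[0])
    ys = []
    for uk in u:
        x = A @ x + (N @ x) * uk + B.flatten() * uk
        ys.append(float(C.reshape(1, -1) @ x))
    return np.array(ys)                # [y_1, ..., y_k]

A = np.array([[0.9, 0.0], [0.0, 0.8]])
N = np.array([[0.1, 0.2], [0.3, 0.4]])
B = np.array([1.0, 0.0])
C = np.array([1.0, 1.0])

u = np.random.default_rng(0).standard_normal(6)
w = markov_vectors(A, N, B, C, len(u))
uv = input_vectors(u, len(u))
y = simulate(A, N, B, C, u)
for k in range(1, len(u) + 1):         # Eq. (5): y_k = sum_j w_j u_j(k-j)
    assert np.isclose(y[k - 1], sum(float(w[j - 1] @ uv[(j, k - j)])
                                    for j in range(1, k + 1)))
```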

2.1 The Bilinear Markov Parameters

The bilinear Markov (invariant) parameters are encoded in the vectors \(\{{{\textbf {w}}}_j\}\), \(j\in {\mathbb Z}_+\); they are invariant quantities of the bilinear system in connection with the input-output relation. Making use of Def. (3), we can write

$$\begin{aligned} \underbrace{\left[ \begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_k \end{array}\right] }_{{{\textbf {Y}}}}=\underbrace{\left[ \begin{array}{cccc} {{\textbf {u}}}_{1}^T(0) &{} 0 &{} \cdots &{} 0 \\ {{\textbf {u}}}_{1}^T(1) &{} {{\textbf {u}}}_{2}^T(0) &{} \cdots &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {{\textbf {u}}}_{1}^T(k-1) &{} {{\textbf {u}}}_{2}^T(k-2) &{} \cdots &{} {{\textbf {u}}}_{k}^T(0) \end{array}\right] }_{{{\textbf {U}}}}\cdot \underbrace{\left[ \begin{array}{c} {{\textbf {w}}}_1^T \\ {{\textbf {w}}}_2^T \\ \vdots \\ {{\textbf {w}}}_k^T\end{array}\right] }_{{{\textbf {W}}}}, \end{aligned}$$
(6)

where the dimensions are: \({{\textbf {Y}}}\in {\mathbb R}^{k\times 1}\), \({{\textbf {U}}}\in {\mathbb R}^{k\times m}\), and \({{\textbf {W}}}\in {\mathbb R}^{m\times 1}\).

The least squares problem filled with k time steps remains under-determined for all \(k\in \{2,3,\ldots \}\), since \(m=2^k-1\) bilinear Markov parameters are activated; thus, we must deal with k equations in \(2^k-1\) unknowns. Solving an under-determined system is possible, but the solutions are infinitely many, and regularization schemes cannot easily lead to identification. Therefore, one way to uniquely identify the bilinear Markov parameters and determine the solution vector \({{\textbf {W}}}\) is to solve a coupled least squares system assembled from several simulations of the original system.

To uniquely determine the \((2^k-1)\) parameters, the matrix \({{\textbf {U}}}\) must have full column rank. This can be accomplished by appending rows from additional experiments to the matrix \({{\textbf {U}}}\) until the augmented matrix \(\hat{{{\textbf {U}}}}\) has at least as many rows as columns. Thus, we need at least \(2^{k-1}\) independent simulations of the original system. That is exactly the bottleneck expected for nonlinear identification frameworks that deal with time-domain data; later, we will relax this condition in a novel way using NNs. Equation (7) describes the coupled linear least squares system with \(d=2^{k-1}\) independent simulations that provides the unique solution \( \mathcal{W} \) containing the bilinear Markov parameters.

$$\begin{aligned} \underbrace{\left[ \begin{array}{ccc} {{\textbf {Y}}}_{1}&\cdots &{{\textbf {Y}}}_{d} \end{array}\right] ^T}_{\hat{{{\textbf {Y}}}}}=\underbrace{\left[ \begin{array}{ccc} {{\textbf {U}}}_{1}&\cdots &{{\textbf {U}}}_{d} \end{array}\right] ^T}_{\hat{{{\textbf {U}}}}}\cdot \mathcal{W} \end{aligned}$$
(7)

Hence, we repeat the simulation d times, each time obtaining k equations, with the \(i^{\text {th}}\) simulation yielding \({{\textbf {Y}}}_i=\left[ \begin{array}{cccc} y_1^{(i)} & y_2^{(i)} & \cdots & y_k^{(i)} \end{array}\right] ^T\) and accordingly \({{\textbf {U}}}_i\); concatenating all the lower triangular matrices gives the real matrix \(\hat{{{\textbf {U}}}}\) of dimension \(k2^{k-1}\times (2^k-1)\). To enforce that \(\hat{{{\textbf {U}}}}\) also has full column rank, one choice is to use white inputs (sampled from a Gaussian distribution) for the simulations; the use of white inputs is widespread in SI. Still, a careful choice of deterministic inputs can make the inversion exact and recover the bilinear Markov parameters. Since \(\text {rank}(\hat{{{\textbf {U}}}})=2^k-1\), the unique least squares solution is \( \mathcal{W} =\hat{{{\textbf {U}}}}^{\dagger }\hat{{{\textbf {Y}}}}\in {\mathbb R}^{2^k-1}\), where \(\dagger \) denotes the Moore-Penrose pseudoinverse. The vector \( \mathcal{W} \) contains the \(2^k-1\) bilinear Markov parameters, from which a generalized Hankel matrix can be computed.
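The assembly of Eqs. (6)-(7) can be sketched as follows (NumPy, names ours); the `simulator` argument stands for any black box returning \(y_1,\ldots ,y_k\), e.g., the `simulate` helper above:

```python
def data_matrix(u, k):
    """Lower block-triangular matrix U of Eq. (6) for one input sequence."""
    uv = input_vectors(u, k)
    U = np.zeros((k, 2**k - 1))
    for row in range(1, k + 1):        # equation for y_row
        col = 0
        for j in range(1, row + 1):    # blocks u_j(row - j), j = 1, ..., row
            U[row - 1, col:col + 2**(j - 1)] = uv[(j, row - j)]
            col += 2**(j - 1)
    return U

def bilinear_markov_parameters(simulator, k, seed=0):
    """Solve the stacked least squares system (7) with d = 2^(k-1) experiments."""
    rng = np.random.default_rng(seed)
    Us, Ys = [], []
    for _ in range(2**(k - 1)):
        u = rng.standard_normal(k)     # white-noise excitation
        Us.append(data_matrix(u, k))
        Ys.append(simulator(u))
    W, *_ = np.linalg.lstsq(np.vstack(Us), np.hstack(Ys), rcond=None)
    return W                           # the 2^k - 1 bilinear Markov parameters
```

For the toy system above, `bilinear_markov_parameters(lambda u: simulate(A, N, B, C, u), 4)` recovers the fifteen parameters listed in Example 1 below.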

2.2 The Bilinear Hankel Matrix

The bilinear Hankel matrix, denoted \( \mathcal{H} _b\), is defined as the product of the two infinite observability and reachability matrices \( \mathcal{O} ,~ \mathcal{R} \),

$$\begin{aligned} \mathcal{H} _b= \mathcal{O} \mathcal{R} =\left[ \begin{array}{c} {{\textbf {C}}}\\ {{\textbf {C}}}{{\textbf {A}}}\\ {{\textbf {C}}}{{\textbf {N}}}\\ \vdots \end{array}\right] \left[ \begin{array}{cccc} {{\textbf {B}}}& {{\textbf {A}}}{{\textbf {B}}}& {{\textbf {N}}}{{\textbf {B}}}& \cdots \end{array}\right] =\left[ \begin{array}{cccc} {{\textbf {C}}}{{\textbf {B}}}&{} {{\textbf {C}}}{{\textbf {A}}}{{\textbf {B}}}&{} {{\textbf {C}}}{{\textbf {N}}}{{\textbf {B}}}&{} \cdots \\ {{\textbf {C}}}{{\textbf {A}}}{{\textbf {B}}}&{} {{\textbf {C}}}{{\textbf {A}}}^2{{\textbf {B}}}&{} {{\textbf {C}}}{{\textbf {A}}}{{\textbf {N}}}{{\textbf {B}}}&{} \cdots \\ {{\textbf {C}}}{{\textbf {N}}}{{\textbf {B}}}&{} {{\textbf {C}}}{{\textbf {N}}}{{\textbf {A}}}{{\textbf {B}}}&{} {{\textbf {C}}}{{\textbf {N}}}^2{{\textbf {B}}}&{} \cdots \\ \vdots &{} \vdots &{} \vdots &{} \ddots \end{array}\right] \end{aligned}$$
(8)

Equation (8) reveals the connection with the bilinear Markov parameters \( \mathcal{W} ={{\textbf {C}}} \mathcal{R} \), which appear in the first row of \( \mathcal{H} _b\). In general, the construction of the bilinear Hankel matrix is described in [13] with the partial and complete realization theorems, along with the partitions \( \mathcal{S} ^{A},~ \mathcal{S} ^{N}\) [14].
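One way to implement the reshuffling from the flattened vector \( \mathcal{W} \) into \( \mathcal{H} _b\) and its shifted partitions is to index rows and columns by words over \(\{{{\textbf {A}}},{{\textbf {N}}}\}\), since every entry equals \({{\textbf {C}}}(\text {word}){{\textbf {B}}}\). A square-truncated NumPy sketch (names and truncation ours):

```python
def words(levels):
    """Row words (letters appended right, as in O_j) and column words
    (letters prepended left, as in R_j), up to the given level."""
    rows, cols, pr, pc = [''], [''], [''], ['']
    for _ in range(levels - 1):
        pr = [w + s for s in 'AN' for w in pr]
        pc = [s + w for s in 'AN' for w in pc]
        rows, cols = rows + pr, cols + pc
    return rows, cols

def markov(W, word):
    """Look up C (word) B in the flattened Markov vector W: the entries of
    w_j start at offset 2^(j-1) - 1, ordered by the binary code A=0, N=1."""
    idx = int(word.replace('A', '0').replace('N', '1'), 2) if word else 0
    return W[2**len(word) - 1 + idx]

def hankel(W, levels):
    """H_b and the shifted partitions S^A, S^N (requires that W contains
    the words of length up to 2*(levels - 1) + 1)."""
    rows, cols = words(levels)
    Hb = np.array([[markov(W, v + w) for w in cols] for v in rows])
    SA = np.array([[markov(W, v + 'A' + w) for w in cols] for v in rows])
    SN = np.array([[markov(W, v + 'N' + w) for w in cols] for v in rows])
    return Hb, SA, SN
```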

2.3 Bilinear Realization Algorithm

Input: Input-output time-domain data from a system \(u\rightarrow \boxed {\boldsymbol{\varSigma }?}\rightarrow y\).

Output: A minimal bilinear system \(({{\textbf {A}}}_r,{{\textbf {N}}}_r,{{\textbf {B}}}_r,{{\textbf {C}}}_r)\) of low dimension r such that \(\boldsymbol{\varSigma }_r\approx \boldsymbol{\varSigma }\).

1.

Excite the system \(\boldsymbol{\varSigma }\) k times with \({{\textbf {u}}}_m\sim \mathcal {N} (\mu ,\sigma )\) and collect \({{\textbf {y}}}_m\), where \(k=2^{m-1}\).

    1st simulation

\([u_{1}(1)\cdots u_{1}(m)]\rightarrow \boxed {\boldsymbol{\varSigma }}\rightarrow [y_{1}(1)\cdots y_{1}(m)]={{\textbf {Y}}}_1\), and \({{\textbf {U}}}_1\) as in Definition 3.

\(\vdots \)

    kth simulation

    \([u_{k}(1)\cdots u_{k}(m)]\rightarrow \boxed {\boldsymbol{\varSigma }}\rightarrow [y_{k}(1)\cdots y_{k}(m)]={{\textbf {Y}}}_k\), and \({{\textbf {U}}}_k\) as in Definition 3.

2.

    Identify the \((2^m-1)\) bilinear Markov parameters by solving the system in (7).

3.

    Construct the bilinear Hankel matrix \( \mathcal{H} _b\) and the sub-matrices \( \mathcal{S} ^{{{\textbf {A}}}},~ \mathcal{S} ^{{{\textbf {N}}}}\).

4.

Compute \([{{\textbf {U}}},\boldsymbol{\varSigma },{{\textbf {V}}}]=\text {SVD}( \mathcal{H} _b)\) and truncate with respect to the decay of the singular values (\(r\ll n\)); the reduced/identified bilinear model \(({{\textbf {A}}}_r,{{\textbf {N}}}_r,{{\textbf {B}}}_r,{{\textbf {C}}}_r)\) is constructed as

$$\begin{aligned} {{\textbf {A}}}_r&=\boldsymbol{\varSigma }^{-1/2}{{\textbf {U}}}^T \mathcal{S} ^{{{\textbf {A}}}}{{\textbf {V}}}\boldsymbol{\varSigma }^{-1/2}\end{aligned}$$
(9)
$$\begin{aligned} {{\textbf {N}}}_r&=\boldsymbol{\varSigma }^{-1/2}{{\textbf {U}}}^T \mathcal{S} ^{{{\textbf {N}}}}{{\textbf {V}}}\boldsymbol{\varSigma }^{-1/2}\end{aligned}$$
(10)
$$\begin{aligned} {{\textbf {B}}}_r&=\boldsymbol{\varSigma }^{1/2}{{\textbf {V}}}^{T}~~\rightarrow ~~\text {1st column}\end{aligned}$$
(11)
$$\begin{aligned} {{\textbf {C}}}_r&={{\textbf {U}}}\boldsymbol{\varSigma }^{1/2}~~\rightarrow ~~\text {1st row} \end{aligned}$$
(12)
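Step 4 in code, as a minimal sketch (the truncation level r is chosen from the singular value decay):

```python
def realize(Hb, SA, SN, r):
    """Order-r bilinear realization from the SVD of H_b, Eqs. (9)-(12)."""
    U, s, Vt = np.linalg.svd(Hb)
    U, s, Vt = U[:, :r], s[:r], Vt[:r, :]
    S_m, S_p = np.diag(s**-0.5), np.diag(s**0.5)   # Sigma^(-1/2), Sigma^(1/2)
    Ar = S_m @ U.T @ SA @ Vt.T @ S_m
    Nr = S_m @ U.T @ SN @ Vt.T @ S_m
    Br = (S_p @ Vt)[:, 0]                          # 1st column of Sigma^(1/2) V^T
    Cr = (U @ S_p)[0, :]                           # 1st row of U Sigma^(1/2)
    return Ar, Nr, Br, Cr
```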

Example 1

(A toy system) Let the following bilinear system of order 2 be

$$\begin{aligned} {{\textbf {A}}}=\left[ \begin{array}{cc} 0.9 & 0.0\\ 0.0 & 0.8 \end{array}\right] ,~{{\textbf {N}}}=\left[ \begin{array}{cc} 0.1 & 0.2\\ 0.3 & 0.4 \end{array}\right] ,~{{\textbf {B}}}=\left[ \begin{array}{c} 1.0\\ 0.0 \end{array}\right] ,~{{\textbf {C}}}=\left[ \begin{array}{c} 1.0\\ 1.0 \end{array}\right] ^T. \end{aligned}$$
(13)

Applying the algorithm of Sect. (2.3) with \(m=4\), we can recover the \(2^m-1=15\) bilinear Markov parameters. The solution of the system in Eq. (7) is:

$$\begin{aligned} \mathcal{W} =\left[ \begin{array}{ccccccccccccccc} 1.0 & 0.9 & 0.4 & 0.81 & 0.33 & 0.36 & 0.22 & 0.729 & 0.273 & 0.297 & 0.183 & 0.324 & 0.18 & 0.198 & 0.118 \end{array}\right] \end{aligned}$$

By reshuffling the vector \( \mathcal{W} \), we can form the \( \mathcal{H} _b\) matrix and the shifted versions \( \mathcal{S} ^{{{\textbf {A}}}},~ \mathcal{S} ^{{{\textbf {N}}}}\) as described above. The Hankel matrix (3 rows and 7 columns displayed) along with its shifted versions is \(\left\{ \mathcal{H} _b,~ \mathcal{S} ^A,~ \mathcal{S} ^N\right\} :=\)

$$\begin{aligned}\left\{ \left[ \begin{array}{ccccccc} 1.0 & 0.9 & 0.4 & 0.81 & 0.33 & 0.36 & 0.22\\ 0.9 & 0.81 & 0.33 & 0.729 & 0.273 & 0.297 & 0.183\\ 0.4 & 0.36 & 0.22 & 0.324 & 0.18 & 0.198 & 0.118 \end{array}\right] ,~\left[ \begin{array}{ccc} 0.9 & 0.81 & 0.33\\ 0.81 & 0.729 & 0.273\\ 0.36 & 0.324 & 0.18 \end{array}\right] ,~\left[ \begin{array}{ccc} 0.4 & 0.36 & 0.22\\ 0.33 & 0.297 & 0.183\\ 0.22 & 0.198 & 0.118 \end{array}\right] \right\} . \end{aligned}$$

In Fig. (1), the 3rd normalized singular value reaches machine precision, \(\sigma _{3}/\sigma _{1}=5.2501\cdot 10^{-17}\), which is the criterion for choosing the order of the fitted system; here it coincides with the minimal order of the underlying bilinear system. Therefore, we construct a bilinear model of order \(r=2\), and the realization obtained is equivalent to the original (minimal) one up to a coordinate (similarity) transformation. Other ways of constructing reduced models from Hankel (and, more generally, Loewner) matrices rely on the CUR (cross-approximation-based) decomposition scheme, as in [17].

$$\begin{aligned} {{\textbf {A}}}_r=\left[ \begin{array}{cc} 0.89394 & 0.11305\\ 0.0050328 & 0.80606 \end{array}\right] ,~{{\textbf {N}}}_r=\left[ \begin{array}{cc} 0.41116 & -0.2281\\ -0.24782 & 0.088841 \end{array}\right] ,~{{\textbf {B}}}_r=\left[ \begin{array}{c} -1.0001\\ -0.053577 \end{array}\right] ,~{{\textbf {C}}}_r=\left[ \begin{array}{c} -1.0001\\ 0.0040101 \end{array}\right] ^T. \end{aligned}$$
(14)
Fig. 1. Left: the singular value decay of the bilinear Hankel matrix. Right: the response to the input \(u_k=1/(k+1),~k=0,1,\ldots \), certifying that all models are equivalent.
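For reference, the whole pipeline of Sect. (2.3) on this toy system condenses to a few lines with the helpers sketched earlier (the final check reproduces the model equivalence certified in Fig. (1)):

```python
W = bilinear_markov_parameters(lambda u: simulate(A, N, B, C, u), k=4)
Hb, SA, SN = hankel(W, levels=2)       # 3 x 3 truncation: words '', 'A', 'N'
Ar, Nr, Br, Cr = realize(Hb, SA, SN, r=2)

u_test = 1.0 / (np.arange(10) + 1.0)   # the input u_k = 1/(k+1) of Fig. 1
print(np.allclose(simulate(A, N, B, C, u_test),
                  simulate(Ar, Nr, Br, Cr, u_test)))   # True
```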

Example 2

(The viscous Burgers’ equation example) Following [15], spatial semi-discretization and the Carleman linearization technique yield a bilinear system of dimension \(N=30^2+30=930\). The viscosity parameter is \(\nu =0.1\), the sampling time is \(\varDelta t=0.1\), and with \(2^{m-1}=512\) independent random inputs of length \(m=10\) each, we construct a database of 5,120 points. Solving Eq. (7), we get the bilinear Markov parameters, and the bilinear Hankel matrix is constructed. In the left pane of Fig. (2), the decay of the bilinear Hankel singular values captures the nonlinear nature of the Burgers’ equation, while the linear Hankel framework captures only the linear minimal response. It is evident in the right pane of Fig. (2) that, after using the inverse transformation \(\phi ^{-1}\) from Eq. (3), the reduced continuous-time bilinear model of order \(r=18\) performs well, producing an error \(O(10^{-5})\), while the linear fit is far off.

Fig. 2. Left pane: the recommended reduced bilinear model with \(\varDelta t=0.1\) is of order \(r=18\), where \(\sigma _{19}/\sigma _1=1.18\cdot 10^{-12}\). Right pane: the response to \(u_1=(1+\cos (2\pi t))e^{-t},~t\in [0,2.5]\), \(u_2=2\,\text {sawtooth}(8\pi t),~t\in [2.5,3.75]\), \(u_3=0\), compared with a continuous bilinear identification method based on the Loewner framework in both frequency- and time-domain approaches [5, 18].

3 From a Single Data Sequence to Bilinear Realization

Bilinear realization as in [13] requires repeated data-assimilation simulations in the time domain. In many cases, however, the data from a simulated system are available only as a single i/o sequence [9]. Using a NARX-net-based model, the expensive repeated simulations can be avoided in a real engineering environment: such models learn from a single data sequence and can predict the output behavior under different excitations. That is precisely where the NARX-net architecture plays the role of a surrogate simulator. Then, by constructing an NN-based model [19] and combining it with the realization theory in [13], a state-space bilinear model as in (2) can be constructed. Compared to the NARX model itself, a state-space model is beneficial because it rests on the classical nonlinear realization theory with many known results, especially for bilinear systems and in the directions of stability, approximation, and control.
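A minimal sketch of the surrogate idea, assuming scikit-learn's MLPRegressor as the NARX regression core (layer sizes, lag count, and all names here are illustrative, not the exact architecture of Example 3):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

LAGS = 20                              # past samples fed to the network

def lagged_features(u, y):
    """NARX structure: predict y_k from [u_{k-LAGS..k-1}, y_{k-LAGS..k-1}]."""
    X = [np.r_[u[k - LAGS:k], y[k - LAGS:k]] for k in range(LAGS, len(y))]
    return np.array(X), y[LAGS:]

# train once on the single measured sequence (u_meas, y_meas):
# net = MLPRegressor(hidden_layer_sizes=(20, 20, 20), max_iter=5000)
# net.fit(*lagged_features(u_meas, y_meas))

def surrogate(u_new, net):
    """Close the loop on the trained net to answer a fresh excitation."""
    u = np.r_[np.zeros(LAGS), u_new]
    y = [0.0] * LAGS
    for k in range(LAGS, len(u)):
        feat = np.r_[u[k - LAGS:k], y[k - LAGS:k]].reshape(1, -1)
        y.append(float(net.predict(feat)[0]))
    return np.array(y[LAGS:])
# 'surrogate' then replaces the physical system in step 1 of Sect. (2.3)
```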

Example 3

(Heat exchanger) The process is a liquid-saturated steam heat exchanger, where water is heated by pressurized saturated steam through a copper tube. The input variable is the liquid flow rate, and the output variable is the outlet liquid temperature. The sampling time is 1 s, and the number of samples is 4,000. More details can be found in [20], and the data set can be downloaded from the Database for the Identification of Systems (DaISy): https://homes.esat.kuleuven.be/~tokka/daisydata.html.

Fig. 3. Comparison and model fit of the proposed NARX-net bilinear model (15) with the subspace method from [9] for the same reduced order \((r=3)\).

$$\begin{aligned} \left\{ \begin{aligned} \dot{{{\textbf {x}}}}(t)&=\left[ \begin{array}{ccc} 0.9164 & 0.09167 & -0.1847\\ -0.2663 & -0.1515 & 0.1232\\ -0.07227 & 0.4778 & 0.3571 \end{array}\right] {{\textbf {x}}}(t)+\left[ \begin{array}{ccc} 0.02717 & 0.5169 & 0.5555\\ -0.09674 & 0.5467 & 0.5696\\ 0.1878 & -0.06846 & -1.981 \end{array}\right] {{\textbf {x}}}(t)u(t)+\\ &\quad +\left[ \begin{array}{c} 2.9063\\ 2.909\\ -0.16088 \end{array}\right] u(t)+\left[ \begin{array}{c} -1.073\\ -1.074\\ 0.05938 \end{array}\right] ,\\ y(t)&=\left[ \begin{array}{ccc} -0.7852 & 0.7794 & -0.05203 \end{array}\right] {{\textbf {x}}}(t)+96.9358,~{{\textbf {x}}}(0)={\textbf {0}},~t\ge 0. \end{aligned}\right. \end{aligned}$$
(15)

Figure 3 illustrates the superiority of the proposed method in terms of accuracy. From the single i/o data sequence, a neural network (NN) with 3 layers and 20 lags was trained using the same training data as in [9] (1,000 points). The trained NN was used in the bilinear realization algorithm to generate more data, and a stable reduced bilinear model of order \(r=3\), shown in Eq. (15), was successfully constructed. The original noisy data were explained with a lower mean percentage error (\(\text {MPE}=0.56\%\)) than the subspace method over the entire data set. Another architecture, such as the NARMAX model, belongs to a subclass of bilinear systems and would filter out some nonlinear features without achieving such a good MPE.

4 Conclusion

In conclusion, NN architectures form a superclass of the NARMAX models used in classical robust identification theory. Consequently, NN models share with the Carleman linearization scheme the same strong property of being able to approximate general nonlinear systems. Finally, NNs and realization theory successfully bridge data science with computational science to build reliable, interpretable nonlinear models. Different NN architectures (such as recurrent NNs, DeepONets, etc.) in combination with other realization frameworks (such as the Loewner framework) and other types of nonlinearities (such as quadratic-bilinear) are left for future research.