1 Introduction

Evolutionary phenomena can be formally described as continuous dynamical models governed by partial differential equations (PDEs). The continuous nature of these physical models comes equipped with analytical results that enable efficient discrete approximation in space and in time. In particular, methods such as finite elements or finite differences bridge the continuous analytical laws of the physical world with computational science [1]. Data science, on the other hand, enables model discovery when the identification feature is considered [2]. Quantifying these equivalences, in combination with the stochastic nature of real-world applications, is at the heart of the digital-twin concept [3]. Spatial discretization of PDEs in many cases yields a continuous-in-time system of ordinary differential equations (ODEs) described by the operators \((\textbf{F},~\textbf{G})\), which can be approximated with Carleman linearization by a bilinear system [4]. We present the single-input single-output (SISO) case, with the continuous operators denoted by the subscript "c."

$$\begin{aligned} \boldsymbol{\varSigma }:\left\{ \begin{aligned} \dot{{{\textbf {x}}}}(t)&=\textbf{F}({{\textbf {x}}}(t))+\textbf{G}({{\textbf {x}}}(t))u(t)\\ y(t)&={{\textbf {H}}}{{\textbf {x}}}(t),~{{\textbf {x}}}_0=\textbf{0},~t\ge 0. \end{aligned}\right. \xrightarrow [\boldsymbol{\varSigma }\approx \boldsymbol{\varSigma }_{bil}]{\text {Carleman}}\boldsymbol{\varSigma }_{bil}:\left\{ \begin{aligned} \dot{{{\textbf {x}}}}(t)&=\textbf{A}_c{{\textbf {x}}}(t)+\textbf{N}_c{{\textbf {x}}}(t)u(t)+\textbf{B}_c u(t)\\ y(t)&=\textbf{C}_c{{\textbf {x}}}(t),~{{\textbf {x}}}_0={\textbf {0}},~t\ge 0. \end{aligned}\right. \end{aligned}$$
(1)

If the original system has dimension n, then, since Carleman linearization [4] preserves terms up to the quadratic one \({{\textbf {x}}}(t)\otimes {{\textbf {x}}}(t)\), the dimension of the resulting bilinear system \((\textbf{A}_c,~\textbf{N}_c,~\textbf{B}_c,~\textbf{C}_c)\) increases to \(N=n^2+n\).
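To make the lifting concrete, here is a sketch under the assumption of a quadratic drift \(\textbf{F}({{\textbf {x}}})=\textbf{A}_1{{\textbf {x}}}+\textbf{A}_2({{\textbf {x}}}\otimes {{\textbf {x}}})\) and an affine input map \(\textbf{G}({{\textbf {x}}})=\textbf{B}_0+\textbf{B}_1{{\textbf {x}}}\) (the symbols \(\textbf{A}_1,\textbf{A}_2,\textbf{B}_0,\textbf{B}_1\) are ours, not from [4]). With the lifted state \({{\textbf {z}}}=[{{\textbf {x}}};~{{\textbf {x}}}\otimes {{\textbf {x}}}]\in {\mathbb R}^{n^2+n}\), and after dropping the cubic terms generated by \(\tfrac{d}{dt}({{\textbf {x}}}\otimes {{\textbf {x}}})=\dot{{{\textbf {x}}}}\otimes {{\textbf {x}}}+{{\textbf {x}}}\otimes \dot{{{\textbf {x}}}}\), one arrives at the bilinear structure of Eq. (1):

$$\begin{aligned} \dot{{{\textbf {z}}}}\approx \underbrace{\left[ \begin{array}{cc} \textbf{A}_1 & \textbf{A}_2\\ \textbf{0} & \textbf{A}_1\otimes \textbf{I}+\textbf{I}\otimes \textbf{A}_1 \end{array}\right] }_{\textbf{A}_c}{{\textbf {z}}}+\underbrace{\left[ \begin{array}{cc} \textbf{B}_1 & \textbf{0}\\ \textbf{B}_0\otimes \textbf{I}+\textbf{I}\otimes \textbf{B}_0 & \textbf{B}_1\otimes \textbf{I}+\textbf{I}\otimes \textbf{B}_1 \end{array}\right] }_{\textbf{N}_c}{{\textbf {z}}}\,u(t)+\underbrace{\left[ \begin{array}{c} \textbf{B}_0\\ \textbf{0} \end{array}\right] }_{\textbf{B}_c}u(t). \end{aligned}$$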

Data-driven methods can be classified into two general categories. The first provides prediction through regression techniques such as neural networks (NNs) from machine learning (ML), while the second has its roots in system theory and allows model discovery [2, 5]. Generally, NNs are sensitive to parameter tuning and lack model interpretability due to their inherent “black-box” structure [6], whereas the system-theoretic methods construct interpretable models and can explain the hidden dynamics. ML models learn features by composing nonlinear activation functions and rely mainly on the backpropagation algorithm to adjust the network weights during training. Consequently, the prediction is expressed as a function of the training data points (finite memory). Until recently, ML and system identification (SI) techniques were developed independently, but in recent years great effort has been invested in establishing common ground [7].

The authors in [8] have extended the subspace realization theory from linear to bilinear systems. For example, in applications that concern chemical processes, the controls are flow rates and, from first principles, e.g., mass and heat balances, they appear in the system equations as products with the state variables. Therefore, the bilinear equation has the physical form \(M\dot{{{\textbf {x}}}}=\sum _i{{\textbf {q}}}_i{{\textbf {x}}}_i-\sum _m {{\textbf {q}}}_m{{\textbf {x}}}_m\), with \({{\textbf {q}}}\) the inputs and \({{\textbf {x}}}\) the state. The authors of [9] construct bilinear systems with white-noise input based on an iterative deterministic-stochastic subspace approach. The author in [10] uses the properties of the linear model obtained when the bilinear system is subjected to a constant input; constant inputs transform the bilinear model into an equivalent linear model [11].

In Sect. (2), we introduce the theory of bilinear realization, explaining in detail the data acquisition procedure that computes the bilinear Markov parameters entering the bilinear Hankel matrix. Further, we present a concise algorithm that achieves bilinear identification, illustrated with two examples. In Sect. (3), we train a neural network with a single i/o data sequence to mimic the unknown simulator and combine it with the bilinear realization theory. As a result, we construct a bilinear model from a single i/o sequence with slightly better fit performance compared with another state-of-the-art bilinear SI approach. Finally, we provide the conclusion and outlook in Sect. (4).

2 The Bilinear Realization Framework

In the case of linear systems, Ho and Kalman [12] provided the mathematical foundations for realizing linear systems from i/o data. In the nonlinear case, and with the specific aim of identifying nonlinear systems, Isidori [13] extended these results to the bilinear case, and Al Baiyat [14] provided an SVD-based algorithm.

Time discretization as in [15] of the SISO bilinear system in Eq. (1) with sampling time \(\varDelta t\) results in fully discrete models defined at the time instances \(0<\varDelta t<2\varDelta t<\cdots <k\varDelta t\), with \({{\textbf {x}}}(k\varDelta t)={{\textbf {x}}}_k\) and \(u(k\varDelta t)=u_k\) for \(k=0,\ldots ,m-1\)

$$\begin{aligned} \boldsymbol{\varSigma }_{\text {disc}} :\left\{ \begin{aligned} {{\textbf {x}}}_{k+1}&={{\textbf {A}}}{{\textbf {x}}}_k+{{\textbf {N}}}{{\textbf {x}}}_k u_k+{{\textbf {B}}}u_k,\\ y_k&={{\textbf {C}}}{{\textbf {x}}}_k,~{{\textbf {x}}}_0={\textbf {0}}. \end{aligned}\right. \end{aligned}$$
(2)

The discrete-time system in Eq. (2) has state dimension N, so \({{\textbf {x}}}\in {\mathbb R}^N\), and the operators have dimensions \({{\textbf {A}}},{{\textbf {N}}}\in {\mathbb R}^{N\times N},~{{\textbf {B}}},{{\textbf {C}}}^T\in {\mathbb R}^{N}\). We have assumed homogeneous initial conditions and a zero feed-forward term (i.e., \({{\textbf {D}}}=0\)). As far as the authors are aware, the forward Euler scheme is the only numerical scheme that preserves the bilinear structure in a discrete set-up, at the cost of conditional stability. A more sophisticated scheme [16] can exactly interpolate the continuous model at the sampling points but is restricted to a subclass of bilinear systems. Therefore, a good choice in terms of stability is the backward Euler scheme from [15], which preserves the bilinear structure asymptotically; the transformation that leads to the discrete system is given in Eq. (3):

$$\begin{aligned} \begin{aligned} &\phi :~{{\textbf {A}}}=({{\textbf {I}}}-\varDelta t{{\textbf {A}}}_c)^{-1},~{{\textbf {N}}}=\varDelta t({{\textbf {I}}}-\varDelta t{{\textbf {A}}}_c)^{-1}{{\textbf {N}}}_c,~{{\textbf {B}}}=\varDelta t({{\textbf {I}}}-\varDelta t{{\textbf {A}}}_c)^{-1}{{\textbf {B}}}_c,~{{\textbf {C}}}={{\textbf {C}}}_c,\\ &\boldsymbol{\varSigma }_{b}^{c}:({{\textbf {A}}}_c,{{\textbf {N}}}_c,{{\textbf {B}}}_c,{{\textbf {C}}}_c)\overset{\phi ^{-1}}{\leftrightarrow }\boldsymbol{\varSigma }_{b}^{d}:({{\textbf {A}}},{{\textbf {N}}},{{\textbf {B}}},{{\textbf {C}}}) \end{aligned} \end{aligned}$$
(3)
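A minimal NumPy sketch of the map \(\phi \) (function name ours), assuming the continuous operators are available as arrays:

```python
import numpy as np

def backward_euler(Ac, Nc, Bc, Cc, dt):
    """Transformation (3): continuous (Ac, Nc, Bc, Cc) -> discrete (A, N, B, C)."""
    E = np.linalg.inv(np.eye(Ac.shape[0]) - dt * Ac)  # (I - dt*Ac)^(-1)
    return E, dt * (E @ Nc), dt * (E @ Bc), Cc
```

The inverse map \(\phi ^{-1}\) follows by solving Eq. (3) for the continuous operators, e.g., \({{\textbf {A}}}_c=({{\textbf {I}}}-{{\textbf {A}}}^{-1})/\varDelta t\).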

Definition 1

The reachability matrix \( \mathcal{R} _{n}=\left[ \begin{array}{ccc} {{\textbf {R}}}_{1} & \cdots & {{\textbf {R}}}_{n} \end{array}\right] \) is defined recursively from the relation \({{\textbf {R}}}_j=\left[ \begin{array}{cc} {{\textbf {A}}}{{\textbf {R}}}_{j-1}~~&~~{{\textbf {N}}}{{\textbf {R}}}_{j-1}\end{array}\right] ,~j=2,\ldots ,n,~{{\textbf {R}}}_1={{\textbf {B}}}\).

Then, the state space of the bilinear system is spanned by the states reachable from the origin if and only if \(\text {rank}( \mathcal{R} _n)=n\).

Definition 2

The observability matrix \( \mathcal{O} _{n}=\left[ \begin{array}{ccc} {{\textbf {O}}}_{1} & \cdots & {{\textbf {O}}}_{n} \end{array}\right] ^T\) is defined recursively from the relation \({{\textbf {O}}}_j^T=\left[ \begin{array}{cc} {{\textbf {O}}}_{j-1}{{\textbf {A}}}~~&~~{{\textbf {O}}}_{j-1}{{\textbf {N}}}\end{array}\right] ^T,~j=2,\ldots ,n,~{{\textbf {O}}}_1={{\textbf {C}}}\).

Then the state space of the bilinear system is observable if and only if \(\text {rank}( \mathcal{O} _n)=n\). The following Def. (3) allows a concise representation of the i/o relation.

Definition 3

\({{\textbf {u}}}_j(h)=\left[ \begin{array}{c} {{\textbf {u}}}_{j-1}(h) \\ {{\textbf {u}}}_{j-1}(h)u(h+j-1) \end{array}\right] ,~j=2,\ldots ,~{{\textbf {u}}}_1(h)=u(h)\).

Let \(\{{{\textbf {w}}}_1,{{\textbf {w}}}_2,\ldots ,{{\textbf {w}}}_j,\ldots \}\) be an infinite sequence of row vectors with \({{\textbf {w}}}_j\in {\mathbb R}^{1\times 2^{j-1}}\), defined as \({{\textbf {w}}}_j={{\textbf {C}}}{{\textbf {R}}}_{j},~j=1,2,\ldots \)
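The three recursions above translate directly into code. The following NumPy sketch (function names ours) builds the blocks of Definitions 1 and 2, the input vectors of Definition 3, and the row vectors \({{\textbf {w}}}_j\) for a given discrete model; these helpers are reused in the sketches that follow.

```python
import numpy as np

def reachability(A, N, B, depth):
    """R_1 = B,  R_j = [A R_{j-1}, N R_{j-1}]  (Definition 1)."""
    blocks = [B.reshape(-1, 1)]
    for _ in range(1, depth):
        R = blocks[-1]
        blocks.append(np.hstack([A @ R, N @ R]))
    return blocks                      # list [R_1, ..., R_depth]

def observability(A, N, C, depth):
    """O_1 = C,  O_j = [O_{j-1} A; O_{j-1} N]  (Definition 2)."""
    blocks = [C.reshape(1, -1)]
    for _ in range(1, depth):
        O = blocks[-1]
        blocks.append(np.vstack([O @ A, O @ N]))
    return blocks

def input_vectors(u, depth):
    """u_1(h) = u(h),  u_j(h) = [u_{j-1}(h); u_{j-1}(h) u(h+j-1)]  (Definition 3)."""
    vecs = {(1, h): np.array([u[h]]) for h in range(len(u))}
    for j in range(2, depth + 1):
        for h in range(len(u) - j + 1):
            prev = vecs[(j - 1, h)]
            vecs[(j, h)] = np.concatenate([prev, prev * u[h + j - 1]])
    return vecs                        # dictionary keyed by (j, h)

def markov_vectors(A, N, B, C, depth):
    """w_j = C R_j, a row vector of length 2^(j-1)."""
    return [C.reshape(1, -1) @ R for R in reachability(A, N, B, depth)]
```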

The state response of the system in Eq. (2), starting from the state \({{\textbf {x}}}_0={\textbf {0}}\) at time \(k=0\) under a given input function, can be expressed as:

$$\begin{aligned} \begin{aligned} {{\textbf {x}}}_1&={{\textbf {B}}}u_0\triangleq {{\textbf {R}}}_1{{\textbf {u}}}_1(0),\\ {{\textbf {x}}}_2&={{\textbf {A}}}{{\textbf {R}}}_{1}{{\textbf {u}}}_{1}(0)+{{\textbf {N}}}{{\textbf {R}}}_{1}{{\textbf {u}}}_{1}(0)u(1)+{{\textbf {B}}}u(1)\triangleq {{\textbf {R}}}_{2}{{\textbf {u}}}_{2}(0)+{{\textbf {R}}}_{1}{{\textbf {u}}}_{1}(1),\\ \vdots \\ {{\textbf {x}}}_k&=\sum _{j=1}^{k}{{\textbf {R}}}_{j}{{\textbf {u}}}_{j}(k-j),~k=1,2,\ldots ; \end{aligned} \end{aligned}$$
(4)

Finally, after multiplying Eq. (4) from the left with the row vector \({{\textbf {C}}}\), the zero-state input-output map of the system in Eq. (2) can be written as:

$$\begin{aligned} y_k=\sum _{j=1}^{k}{{\textbf {w}}}_j{{\textbf {u}}}_j(k-j),~k=1,2,\ldots ; \end{aligned}$$
(5)
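As a sanity check, the expansion in Eq. (5) can be verified numerically against a direct forward simulation of Eq. (2). The matrices below are those of Example 1 in Sect. (2.3), used here purely as a test case, together with the helper functions sketched after Definition 3:

```python
def simulate(A, N, B, C, u):
    """Forward simulation of the discrete bilinear system (2) from x_0 = 0."""
    x = np.zeros(A.shape[0])
    ys = []
    for uk in u:
        x = A @ x + (N @ x) * uk + B.flatten() * uk
        ys.append(float(C.reshape(1, -1) @ x))
    return np.array(ys)                # [y_1, ..., y_k]

A = np.array([[0.9, 0.0], [0.0, 0.8]])
N = np.array([[0.1, 0.2], [0.3, 0.4]])
B = np.array([1.0, 0.0])
C = np.array([1.0, 1.0])

u = np.random.default_rng(0).standard_normal(6)
w = markov_vectors(A, N, B, C, len(u))
uv = input_vectors(u, len(u))
y = simulate(A, N, B, C, u)
for k in range(1, len(u) + 1):         # Eq. (5): y_k = sum_j w_j u_j(k-j)
    assert np.isclose(y[k - 1], sum(float(w[j - 1] @ uv[(j, k - j)])
                                    for j in range(1, k + 1)))
```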

2.1 The Bilinear Markov Parameters

The bilinear Markov (invariant) parameters are encoded in the vectors \(\{{{\textbf {w}}}_j\}\), \(j\in {\mathbb Z}_+\); they are invariant quantities of the bilinear system in connection with the input-output relation. Making use of Def. (3), we can write

$$\begin{aligned} \underbrace{\left[ \begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_k \end{array}\right] }_{{{\textbf {Y}}}}=\underbrace{\left[ \begin{array}{cccc} {{\textbf {u}}}_{1}^T(0) &{} 0 &{} \cdots &{} 0 \\ {{\textbf {u}}}_{1}^T(1) &{} {{\textbf {u}}}_{2}^T(0) &{} \cdots &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {{\textbf {u}}}_{1}^T(k-1) &{} {{\textbf {u}}}_{2}^T(k-2) &{} \cdots &{} {{\textbf {u}}}_{k}^T(0) \end{array}\right] }_{{{\textbf {U}}}}\cdot \underbrace{\left[ \begin{array}{c} {{\textbf {w}}}_1^T \\ {{\textbf {w}}}_2^T \\ \vdots \\ {{\textbf {w}}}_k^T\end{array}\right] }_{{{\textbf {W}}}}, \end{aligned}$$
(6)

where the dimensions are: \({{\textbf {Y}}}\in {\mathbb R}^{k\times 1}\), \({{\textbf {U}}}\in {\mathbb R}^{k\times m}\), and \({{\textbf {W}}}\in {\mathbb R}^{m\times 1}\).

The least squares problem filled with k time steps remains under-determined for all \(k\in \{2,3,\ldots \}\), since \(m=2^k-1\) bilinear Markov parameters are activated; thus, we must deal with k equations in \(2^k-1\) unknowns. Solving an under-determined system is possible, but the solutions are infinitely many, and regularization schemes cannot easily lead to identification. Therefore, one way to uniquely identify the bilinear Markov parameters and determine the solution vector \({{\textbf {W}}}\) is to solve a coupled least squares system assembled from several simulations of the original system.

To uniquely determine the \((2^k-1)\) parameters, the matrix \({{\textbf {U}}}\) must have full column rank. This can be accomplished by appending rows from additional experiments to the matrix \({{\textbf {U}}}\) until the augmented matrix \(\hat{{{\textbf {U}}}}\) has at least as many rows as columns. Thus, we need at least \(2^{k-1}\) independent simulations of the original system. That is exactly the bottleneck expected for nonlinear identification frameworks that deal with time-domain data; later, we will relax this condition in a novel way using NNs. Equation (7) describes the coupled linear least squares system with \(d=2^{k-1}\) independent simulations that provides the unique solution \( \mathcal{W} \) containing the bilinear Markov parameters.

$$\begin{aligned} \underbrace{\left[ \begin{array}{ccc} {{\textbf {Y}}}_{1}&\cdots &{{\textbf {Y}}}_{d} \end{array}\right] ^T}_{\hat{{{\textbf {Y}}}}}=\underbrace{\left[ \begin{array}{ccc} {{\textbf {U}}}_{1}&\cdots &{{\textbf {U}}}_{d} \end{array}\right] ^T}_{\hat{{{\textbf {U}}}}}\cdot \mathcal{W} \end{aligned}$$
(7)

Hence, we repeat the simulation d times, each time obtaining k equations, with the \(i^{\text {th}}\) simulation yielding \({{\textbf {Y}}}_i=\left[ \begin{array}{cccc} y_1^{(i)} & y_2^{(i)} & \cdots & y_k^{(i)} \end{array}\right] ^T\) and accordingly \({{\textbf {U}}}_i\); concatenating all the lower triangular matrices gives the real matrix \(\hat{{{\textbf {U}}}}\) of dimension \(k2^{k-1}\times (2^k-1)\). To enforce that \(\hat{{{\textbf {U}}}}\) also has full column rank, one choice is to use white inputs (sampled from a Gaussian distribution) for the simulations; the use of white inputs is widespread in SI. Still, a careful choice of deterministic inputs can make the inversion exact and recover the bilinear Markov parameters. Since \(\text {rank}(\hat{{{\textbf {U}}}})=2^k-1\), the unique least squares solution is \( \mathcal{W} =\hat{{{\textbf {U}}}}^{\dagger }\hat{{{\textbf {Y}}}}\in {\mathbb R}^{2^k-1}\), where \(\dagger \) denotes the Moore-Penrose pseudoinverse. The vector \( \mathcal{W} \) contains the \(2^k-1\) bilinear Markov parameters, from which a generalized Hankel matrix can be computed.
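The assembly of Eqs. (6)-(7) can be sketched as follows (NumPy, names ours); the `simulator` argument stands for any black box returning \(y_1,\ldots ,y_k\), e.g., the `simulate` helper above:

```python
def data_matrix(u, k):
    """Lower block-triangular matrix U of Eq. (6) for one input sequence."""
    uv = input_vectors(u, k)
    U = np.zeros((k, 2**k - 1))
    for row in range(1, k + 1):        # equation for y_row
        col = 0
        for j in range(1, row + 1):    # blocks u_j(row - j), j = 1, ..., row
            U[row - 1, col:col + 2**(j - 1)] = uv[(j, row - j)]
            col += 2**(j - 1)
    return U

def bilinear_markov_parameters(simulator, k, seed=0):
    """Solve the stacked least squares system (7) with d = 2^(k-1) experiments."""
    rng = np.random.default_rng(seed)
    Us, Ys = [], []
    for _ in range(2**(k - 1)):
        u = rng.standard_normal(k)     # white-noise excitation
        Us.append(data_matrix(u, k))
        Ys.append(simulator(u))
    W, *_ = np.linalg.lstsq(np.vstack(Us), np.hstack(Ys), rcond=None)
    return W                           # the 2^k - 1 bilinear Markov parameters
```

For the toy system above, `bilinear_markov_parameters(lambda u: simulate(A, N, B, C, u), 4)` recovers the fifteen parameters listed in Example 1 below.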

2.2 The Bilinear Hankel Matrix

The bilinear Hankel matrix, denoted \( \mathcal{H} _b\), is defined as the product of the two infinite observability and reachability matrices \( \mathcal{O} ,~ \mathcal{R} \),

$$\begin{aligned} \mathcal{H} _b= \mathcal{O} \mathcal{R} =\left[ \begin{array}{c} {{\textbf {C}}}\\ {{\textbf {C}}}{{\textbf {A}}}\\ {{\textbf {C}}}{{\textbf {N}}}\\ \vdots \end{array}\right] \left[ \begin{array}{cccc} {{\textbf {B}}}& {{\textbf {A}}}{{\textbf {B}}}& {{\textbf {N}}}{{\textbf {B}}}& \cdots \end{array}\right] =\left[ \begin{array}{cccc} {{\textbf {C}}}{{\textbf {B}}}&{} {{\textbf {C}}}{{\textbf {A}}}{{\textbf {B}}}&{} {{\textbf {C}}}{{\textbf {N}}}{{\textbf {B}}}&{} \cdots \\ {{\textbf {C}}}{{\textbf {A}}}{{\textbf {B}}}&{} {{\textbf {C}}}{{\textbf {A}}}^2{{\textbf {B}}}&{} {{\textbf {C}}}{{\textbf {A}}}{{\textbf {N}}}{{\textbf {B}}}&{} \cdots \\ {{\textbf {C}}}{{\textbf {N}}}{{\textbf {B}}}&{} {{\textbf {C}}}{{\textbf {N}}}{{\textbf {A}}}{{\textbf {B}}}&{} {{\textbf {C}}}{{\textbf {N}}}^2{{\textbf {B}}}&{} \cdots \\ \vdots &{} \vdots &{} \vdots &{} \ddots \end{array}\right] \end{aligned}$$
(8)

Equation (8) reveals the connection with the bilinear Markov parameters \( \mathcal{W} ={{\textbf {C}}} \mathcal{R} \), which appear in the first row of \( \mathcal{H} _b\). In general, the construction of the bilinear Hankel matrix is described in [13] with the partial and complete realization theorems, along with the partitions \( \mathcal{S} ^{A},~ \mathcal{S} ^{N}\) [14].
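One way to implement the reshuffling from the flattened vector \( \mathcal{W} \) into \( \mathcal{H} _b\) and its shifted partitions is to index rows and columns by words over \(\{{{\textbf {A}}},{{\textbf {N}}}\}\), since every entry equals \({{\textbf {C}}}(\text {word}){{\textbf {B}}}\). A square-truncated NumPy sketch (names and truncation ours):

```python
def words(levels):
    """Row words (letters appended right, as in O_j) and column words
    (letters prepended left, as in R_j), up to the given level."""
    rows, cols, pr, pc = [''], [''], [''], ['']
    for _ in range(levels - 1):
        pr = [w + s for s in 'AN' for w in pr]
        pc = [s + w for s in 'AN' for w in pc]
        rows, cols = rows + pr, cols + pc
    return rows, cols

def markov(W, word):
    """Look up C (word) B in the flattened Markov vector W: the entries of
    w_j start at offset 2^(j-1) - 1, ordered by the binary code A=0, N=1."""
    idx = int(word.replace('A', '0').replace('N', '1'), 2) if word else 0
    return W[2**len(word) - 1 + idx]

def hankel(W, levels):
    """H_b and the shifted partitions S^A, S^N (requires that W contains
    the words of length up to 2*(levels - 1) + 1)."""
    rows, cols = words(levels)
    Hb = np.array([[markov(W, v + w) for w in cols] for v in rows])
    SA = np.array([[markov(W, v + 'A' + w) for w in cols] for v in rows])
    SN = np.array([[markov(W, v + 'N' + w) for w in cols] for v in rows])
    return Hb, SA, SN
```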

2.3 Bilinear Realization Algorithm

Input: Input-output time-domain data from a system \(u\rightarrow \boxed {\boldsymbol{\varSigma }?}\rightarrow y\).

Output: A minimal bilinear system \(({{\textbf {A}}}_r,{{\textbf {N}}}_r,{{\textbf {B}}}_r,{{\textbf {C}}}_r)\) of low dimension r such that \(\boldsymbol{\varSigma }_r\approx \boldsymbol{\varSigma }\).

1.

Excite the system \(\boldsymbol{\varSigma }\) k times with \({{\textbf {u}}}_m\sim \mathcal {N} (\mu ,\sigma )\) and collect \({{\textbf {y}}}_m\), where \(k=2^{m-1}\).

    1st simulation

\([u_{1}(1)\cdots u_{1}(m)]\rightarrow \boxed {\boldsymbol{\varSigma }}\rightarrow [y_{1}(1)\cdots y_{1}(m)]={{\textbf {Y}}}_1\), and \({{\textbf {U}}}_1\) as in Definition 3.

\(\vdots \)

    kth simulation

    \([u_{k}(1)\cdots u_{k}(m)]\rightarrow \boxed {\boldsymbol{\varSigma }}\rightarrow [y_{k}(1)\cdots y_{k}(m)]={{\textbf {Y}}}_k\), and \({{\textbf {U}}}_k\) as in Definition 3.

2.

    Identify the \((2^m-1)\) bilinear Markov parameters by solving the system in (7).

3.

    Construct the bilinear Hankel matrix \( \mathcal{H} _b\) and the sub-matrices \( \mathcal{S} ^{{{\textbf {A}}}},~ \mathcal{S} ^{{{\textbf {N}}}}\).

4.

Compute \([{{\textbf {U}}},\boldsymbol{\varSigma },{{\textbf {V}}}]=\text {SVD}( \mathcal{H} _b)\) and truncate with respect to the decay of the singular values (\(r\ll n\)); the reduced/identified bilinear model \(({{\textbf {A}}}_r,{{\textbf {N}}}_r,{{\textbf {B}}}_r,{{\textbf {C}}}_r)\) is constructed as

$$\begin{aligned} {{\textbf {A}}}_r&=\boldsymbol{\varSigma }^{-1/2}{{\textbf {U}}}^T \mathcal{S} ^{{{\textbf {A}}}}{{\textbf {V}}}\boldsymbol{\varSigma }^{-1/2}\end{aligned}$$
(9)
$$\begin{aligned} {{\textbf {N}}}_r&=\boldsymbol{\varSigma }^{-1/2}{{\textbf {U}}}^T \mathcal{S} ^{{{\textbf {N}}}}{{\textbf {V}}}\boldsymbol{\varSigma }^{-1/2}\end{aligned}$$
(10)
$$\begin{aligned} {{\textbf {B}}}_r&=\boldsymbol{\varSigma }^{1/2}{{\textbf {V}}}^{T}~~\rightarrow ~~\text {1st column}\end{aligned}$$
(11)
$$\begin{aligned} {{\textbf {C}}}_r&={{\textbf {U}}}\boldsymbol{\varSigma }^{1/2}~~\rightarrow ~~\text {1st row} \end{aligned}$$
(12)
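Step 4 in code, as a minimal sketch (the truncation level r is chosen from the singular value decay):

```python
def realize(Hb, SA, SN, r):
    """Order-r bilinear realization from the SVD of H_b, Eqs. (9)-(12)."""
    U, s, Vt = np.linalg.svd(Hb)
    U, s, Vt = U[:, :r], s[:r], Vt[:r, :]
    S_m, S_p = np.diag(s**-0.5), np.diag(s**0.5)   # Sigma^(-1/2), Sigma^(1/2)
    Ar = S_m @ U.T @ SA @ Vt.T @ S_m
    Nr = S_m @ U.T @ SN @ Vt.T @ S_m
    Br = (S_p @ Vt)[:, 0]                          # 1st column of Sigma^(1/2) V^T
    Cr = (U @ S_p)[0, :]                           # 1st row of U Sigma^(1/2)
    return Ar, Nr, Br, Cr
```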

Example 1

(A toy system) Let the following bilinear system of order 2 be

$$\begin{aligned} {{\textbf {A}}}=\left[ \begin{array}{cc} 0.9 & 0.0\\ 0.0 & 0.8 \end{array}\right] ,~{{\textbf {N}}}=\left[ \begin{array}{cc} 0.1 & 0.2\\ 0.3 & 0.4 \end{array}\right] ,~{{\textbf {B}}}=\left[ \begin{array}{c} 1.0\\ 0.0 \end{array}\right] ,~{{\textbf {C}}}=\left[ \begin{array}{c} 1.0\\ 1.0 \end{array}\right] ^T. \end{aligned}$$
(13)

Applying the algorithm of Sect. (2.3) with \(m=4\), we can recover the \(2^m-1=15\) bilinear Markov parameters. The solution of the system in Eq. (7) is:

$$\begin{aligned} \mathcal{W} =\left[ \begin{array}{ccccccccccccccc} 1.0 & 0.9 & 0.4 & 0.81 & 0.33 & 0.36 & 0.22 & 0.729 & 0.273 & 0.297 & 0.183 & 0.324 & 0.18 & 0.198 & 0.118 \end{array}\right] \end{aligned}$$

By reshuffling the vector \( \mathcal{W} \), we can form the \( \mathcal{H} _b\) matrix and the shifted versions \( \mathcal{S} ^{{{\textbf {A}}}},~ \mathcal{S} ^{{{\textbf {N}}}}\) as described above. The Hankel matrix (3 rows and 7 columns displayed) along with its shifted versions is \(\left\{ \mathcal{H} _b,~ \mathcal{S} ^A,~ \mathcal{S} ^N\right\} :=\)

$$\begin{aligned}\left\{ \left[ \begin{array}{ccccccc} 1.0 & 0.9 & 0.4 & 0.81 & 0.33 & 0.36 & 0.22\\ 0.9 & 0.81 & 0.33 & 0.729 & 0.273 & 0.297 & 0.183\\ 0.4 & 0.36 & 0.22 & 0.324 & 0.18 & 0.198 & 0.118 \end{array}\right] ,~\left[ \begin{array}{ccc} 0.9 & 0.81 & 0.33\\ 0.81 & 0.729 & 0.273\\ 0.36 & 0.324 & 0.18 \end{array}\right] ,~\left[ \begin{array}{ccc} 0.4 & 0.36 & 0.22\\ 0.33 & 0.297 & 0.183\\ 0.22 & 0.198 & 0.118 \end{array}\right] \right\} . \end{aligned}$$

In Fig. (1), the 3rd normalized singular value reaches machine precision, \(\sigma _{3}/\sigma _{1}=5.2501\cdot 10^{-17}\), which is the criterion for choosing the order of the fitted system; here it coincides with the minimal order of the underlying bilinear system. Therefore, we construct a bilinear model of order \(r=2\), and the realization obtained is equivalent to the original (minimal) one up to a coordinate (similarity) transformation. Other ways of constructing reduced models from Hankel (and, more generally, Loewner) matrices rely on the CUR (cross-approximation-based) decomposition scheme, as in [17].

$$\begin{aligned} {{\textbf {A}}}_r=\left[ \begin{array}{cc} 0.89394 & 0.11305\\ 0.0050328 & 0.80606 \end{array}\right] ,~{{\textbf {N}}}_r=\left[ \begin{array}{cc} 0.41116 & -0.2281\\ -0.24782 & 0.088841 \end{array}\right] ,~{{\textbf {B}}}_r=\left[ \begin{array}{c} -1.0001\\ -0.053577 \end{array}\right] ,~{{\textbf {C}}}_r=\left[ \begin{array}{c} -1.0001\\ 0.0040101 \end{array}\right] ^T. \end{aligned}$$
(14)
Fig. 1. Left: the singular value decay of the bilinear Hankel matrix. Right: the response to the input \(u_k=1/(k+1),~k=0,1,\ldots \), certifying that all models are equivalent.
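For reference, the whole pipeline of Sect. (2.3) on this toy system condenses to a few lines with the helpers sketched earlier (the final check reproduces the model equivalence certified in Fig. (1)):

```python
W = bilinear_markov_parameters(lambda u: simulate(A, N, B, C, u), k=4)
Hb, SA, SN = hankel(W, levels=2)       # 3 x 3 truncation: words '', 'A', 'N'
Ar, Nr, Br, Cr = realize(Hb, SA, SN, r=2)

u_test = 1.0 / (np.arange(10) + 1.0)   # the input u_k = 1/(k+1) of Fig. 1
print(np.allclose(simulate(A, N, B, C, u_test),
                  simulate(Ar, Nr, Br, Cr, u_test)))   # True
```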

Example 2

(The viscous Burgers’ equation example) Following [15], spatial semi-discretization and the Carleman linearization technique yield a bilinear system of dimension \(N=30^2+30=930\). The viscosity parameter is \(\nu =0.1\), the sampling time is \(\varDelta t=0.1\), and with \(2^{m-1}=512\) independent random inputs of length \(m=10\) each, we construct a database of 5,120 points. Solving Eq. (7), we get the bilinear Markov parameters, and the bilinear Hankel matrix is constructed. In the left pane of Fig. (2), the decay of the bilinear Hankel singular values captures the nonlinear nature of the Burgers’ equation, while the linear Hankel framework captures only the linear minimal response. It is evident in the right pane of Fig. (2) that, after using the inverse transformation \(\phi ^{-1}\) from Eq. (3), the reduced continuous-time bilinear model of order \(r=18\) performs well, producing an error \(O(10^{-5})\), while the linear fit is far off.

Fig. 2. Left pane: the recommended reduced bilinear model with \(\varDelta t=0.1\) is of order \(r=18\), where \(\sigma _{19}/\sigma _1=1.18\cdot 10^{-12}\). Right pane: the response to \(u_1=(1+\cos (2\pi t))e^{-t},~t\in [0,2.5]\), \(u_2=2\,\text {sawtooth}(8\pi t),~t\in [2.5,3.75]\), \(u_3=0\), compared with a continuous bilinear identification method based on the Loewner framework in both frequency- and time-domain approaches [5, 18].

3 From a Single Data Sequence to Bilinear Realization

Bilinear realization as in [13] requires repeated data-assimilation simulations in the time domain. In many cases, however, the data from a simulated system are available only as a single i/o sequence [9]. Using a NARX-net-based model, the expensive repeated simulations can be avoided in a real engineering environment: such models learn from a single data sequence and can predict the output behavior under different excitations. That is precisely where the NARX-net architecture plays the role of a surrogate simulator. Then, by constructing an NN-based model [19] and combining it with the realization theory in [13], a state-space bilinear model as in (2) can be constructed. Compared to the NARX model itself, a state-space model is beneficial because it rests on the classical nonlinear realization theory with many known results, especially for bilinear systems and in the directions of stability, approximation, and control.
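A minimal sketch of the surrogate idea, assuming scikit-learn's MLPRegressor as the NARX regression core (layer sizes, lag count, and all names here are illustrative, not the exact architecture of Example 3):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

LAGS = 20                              # past samples fed to the network

def lagged_features(u, y):
    """NARX structure: predict y_k from [u_{k-LAGS..k-1}, y_{k-LAGS..k-1}]."""
    X = [np.r_[u[k - LAGS:k], y[k - LAGS:k]] for k in range(LAGS, len(y))]
    return np.array(X), y[LAGS:]

# train once on the single measured sequence (u_meas, y_meas):
# net = MLPRegressor(hidden_layer_sizes=(20, 20, 20), max_iter=5000)
# net.fit(*lagged_features(u_meas, y_meas))

def surrogate(u_new, net):
    """Close the loop on the trained net to answer a fresh excitation."""
    u = np.r_[np.zeros(LAGS), u_new]
    y = [0.0] * LAGS
    for k in range(LAGS, len(u)):
        feat = np.r_[u[k - LAGS:k], y[k - LAGS:k]].reshape(1, -1)
        y.append(float(net.predict(feat)[0]))
    return np.array(y[LAGS:])
# 'surrogate' then replaces the physical system in step 1 of Sect. (2.3)
```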

Example 3

(Heat exchanger) The process is a liquid-saturated steam heat exchanger, where water is heated by pressurized saturated steam through a copper tube. The input variable is the liquid flow rate, and the output variable is the outlet liquid temperature. The sampling time is 1 s, and the number of samples is 4,000. More details can be found in [20], and the data set can be downloaded from the Database for the Identification of Systems (DaISy): https://homes.esat.kuleuven.be/~tokka/daisydata.html.

Fig. 3. Comparison and model fit of the proposed NARX-net bilinear model (15) with the subspace method from [9] for the same reduced order \((r=3)\).

$$\begin{aligned} \left\{ \begin{aligned} \dot{{{\textbf {x}}}}(t)&=\left[ \begin{array}{ccc} 0.9164 & 0.09167 & -0.1847\\ -0.2663 & -0.1515 & 0.1232\\ -0.07227 & 0.4778 & 0.3571 \end{array}\right] {{\textbf {x}}}(t)+\left[ \begin{array}{ccc} 0.02717 & 0.5169 & 0.5555\\ -0.09674 & 0.5467 & 0.5696\\ 0.1878 & -0.06846 & -1.981 \end{array}\right] {{\textbf {x}}}(t)u(t)+\\ &\quad +\left[ \begin{array}{c} 2.9063\\ 2.909\\ -0.16088 \end{array}\right] u(t)+\left[ \begin{array}{c} -1.073\\ -1.074\\ 0.05938 \end{array}\right] ,\\ y(t)&=\left[ \begin{array}{ccc} -0.7852 & 0.7794 & -0.05203 \end{array}\right] {{\textbf {x}}}(t)+96.9358,~{{\textbf {x}}}(0)={\textbf {0}},~t\ge 0. \end{aligned}\right. \end{aligned}$$
(15)

Figure 3 illustrates the superiority of the proposed method in terms of accuracy. From the single i/o data sequence, a neural network (NN) with 3 layers and 20 lags was trained using the same training data as in [9] (1,000 points). The trained NN was used in the bilinear realization algorithm to generate more data, and a stable reduced bilinear model of order \(r=3\), shown in Eq. (15), was successfully constructed. The original noisy data were explained with a lower mean percentage error (\(\text {MPE}=0.56\%\)) than the subspace method over the entire data set. Another architecture, such as the NARMAX model, belongs to a subclass of bilinear systems and would filter out some nonlinear features without achieving such a good MPE.

4 Conclusion

In conclusion, NN architectures form a superclass of the NARMAX models used in classical robust identification theory. Consequently, NN models share with the Carleman linearization scheme the same strong property of being able to approximate general nonlinear systems. Finally, NNs and realization theory successfully bridge data science with computational science to build reliable, interpretable nonlinear models. Different NN architectures (such as recurrent NNs, DeepONets, etc.) in combination with other realization frameworks (such as the Loewner framework) and other types of nonlinearities (such as quadratic-bilinear) are left for future research.