1 Introduction

In the natural sciences, evolutionary phenomena can be modeled as dynamical systems. An ever-increasing need for improved approximation accuracy has motivated the inclusion of more involved and detailed features in the modeling process, thus inevitably leading to large-scale dynamical systems [3]. To cope with the resulting computational burden, efficient numerical methods heavily rely on model reduction. Model reduction methods can be classified into two broad categories, namely SVD-based and Krylov-based (moment matching) methods.

The most prominent among the SVD-based methods is balanced truncation (BT). In general, balancing methods are based on the computation of controllability and observability gramians and lead to the elimination of state variables that are difficult to reach and to observe. Despite the high computational cost of solving the associated matrix Lyapunov equations, the advantages of balancing methods include the preservation of stability and an a priori computable error bound. For more details on these topics, as well as on other model reduction methods not treated here (e.g., proper orthogonal decomposition (POD) and reduced basis (RB) methods), we refer the reader to the book [3] and the surveys [9, 15].

One way to perform model reduction is by employing tangential interpolation. These methods are known as rational Krylov methods or moment-matching methods. Krylov-based methods are numerically efficient and have lower computational cost but, in general, the preservation of other properties (e.g., stability or passivity) is not automatic. For an extensive study of interpolatory model reduction, we refer the reader to the recent book [4]. In what follows, we will consider exclusively interpolatory model reduction methods and, in particular, the Loewner framework (LF). For recent surveys on the LF, see [2, 6, 29]. The sensitivity of the LF to noise was already discussed in [19, 30].

When only input-output data are available, data-driven methods, such as the Loewner framework (LF), dynamic mode decomposition (DMD) [37], sparse identification of nonlinear dynamics with control (SINDYc) [27], vector fitting (VF) [23], Hankel [25] or subspace methods [8, 24, 26], moment matching [36], and operator inference [13, 14, 32], remain the only feasible approaches for recovering the hidden information.

DMD-based methods represent viable alternatives that require state-derivative estimations.

Since the underlying dynamical system acts as a black box, model identification tools are important for assessing the reliability of the discovered models (i.e., stability, prediction). At the same time, these discovered models might have large dimension and hence are not suitable for fast numerical simulation and control. The LF is a direct data-driven interpolatory method able to identify and reduce models derived directly from measurements. For measured data in the frequency domain, the LF is well established for linear and nonlinear systems (e.g., bilinear or quadratic-bilinear systems); see [5, 22]. In the case of time-domain data, the LF was already applied to approximating linear models [21, 24, 31]. As the aim of this paper is to extend the identification and reduction procedure to the class of bilinear systems from time-domain data, we start our analysis by introducing the mathematical description of the input-u(t) to output-y(t) relation as depicted in Fig. 1. The differential and algebraic operators are denoted with \(\mathbf{f}\) and \(\mathbf{z}\), respectively. To achieve this goal, all the important steps from nonlinear system theory and interpolatory model reduction are summarized.

Fig. 1 Mathematical formalism for evolutionary phenomena

1.1 Outline of the Paper

The rest of the paper is organized as follows:

  • Section 2 contains a brief description of system theory starting from the linear case followed by extensions to the nonlinear case by means of the Volterra series representation. The single-input and single-output case is addressed for both frequency- and time-domain representations.

  • Section 3 introduces the Loewner framework as an interpolatory tool for model approximation; the results that are presented here actually set the foundation for identification and reduction of linear time-invariant systems.

  • Section 4 introduces a special class of nonlinear systems, namely bilinear systems. The theoretical discussion for analyzing such systems starts with the growing exponential approach and the derivation of the generalized frequency response functions (GFRFs), up to the case where a double-tone input is assumed. In addition, the kernel separation strategy for improving the measurements and the linear identification/reduction part are presented, together with a concise algorithm that summarizes the method.

  • Section 5 presents the numerical experiments performed in order to illustrate the practical applicability of the newly proposed method. This section includes both a simple (low-dimensional) example and a large-scale example, compared to another state-of-the-art method.

  • Section 6 presents the concluding remarks and also some potential future developments of the current method.

2 System Theory Preliminaries

In this section, we will briefly present some important material from system theory starting from the linear case.

2.1 Linear Systems

Consider SISO linear, time-invariant systems with n internal variables (called “states” whenever the matrix \(\mathbf{E}\) is non-singular).

$$\begin{aligned} \mathbf{\Sigma }_{l}:~\left\{ \begin{aligned} \mathbf{E}\,\dot{\mathbf{x}}(t)&=\mathbf{A}\mathbf{x}(t)+\mathbf{b}u (t),\quad \\ y (t)&=\mathbf{c}\mathbf{x}(t),~t\ge 0, \end{aligned}\right. \end{aligned}$$
(1)

where \(\mathbf{E},~\mathbf{A}\in {\mathbb R}^{n\times n},~\mathbf{b}\in {\mathbb R}^{n\times 1},~\mathbf{c}\in {\mathbb R}^{1\times n}\). In the sequel, we restrict our attention to systems with an invertible matrix \(\mathbf{E}\) and a zero d-term (\(d=0\)) in the output equation. Using the convolution-integral notation, with the time-domain linear kernel h(t) being the impulse response of the system, the explicit solution can be written as

$$\begin{aligned} y(t)=\mathbf{c}e^{\mathbf{A}t}\mathbf{x}(0)+(h*u)(t),~t\ge 0, \end{aligned}$$
(2)

where multiplication with \(\mathbf{E}^{-1}\) from the left has been performed in the differential part of Eq. (1). Also, we keep the same notation for the resulting matrix \(\mathbf{A}\) and vector \(\mathbf{b}\). By assuming zero initial conditions and applying the Laplace transform, we obtain the transfer function description:

$$\begin{aligned} H(s)=\frac{Y(s)}{U(s)}=\mathbf{c}(s\mathbf{I}-\mathbf{A})^{-1}\mathbf{b},~s\in {\mathbb C}, \end{aligned}$$
(3)

where Y(s) and U(s) stand for the output and the input in the frequency domain, respectively.
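As a simple illustration, the following minimal sketch (our own, not code from the paper) evaluates the transfer function of the descriptor system in Eq. (1) at a given complex frequency; for \(\mathbf{E}=\mathbf{I}\) it reduces exactly to Eq. (3).

```python
# Minimal sketch (illustrative): evaluate H(s) = c (sE - A)^{-1} b for the
# SISO descriptor system of Eq. (1); with E = I this is exactly Eq. (3).
import numpy as np

def transfer_function(E, A, b, c, s):
    # solve (sE - A) x = b instead of forming the inverse explicitly
    return (c @ np.linalg.solve(s * E - A, b)).item()
```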

2.2 Nonlinear Systems

A large class of nonlinear systems can be described by means of the Volterra-Wiener approach [35]. Other relevant works on nonlinear systems and nonlinear modeling/identification include Schetzen (1980), Chen and Billings (1989), and Boyd and Chua (1985), among others.

The aim of this study is to identify and reduce special types of nonlinear systems (such as bilinear systems) from time-domain measurements. By knowing only the input and the simulated or measured output in the time domain, as in Fig. 2, we will identify the hidden model. In such situations, where only snapshots are available, a nonlinear fit of a special type will be developed beyond the well-established linear fit.

Fig. 2 The input-output mapping from the data-driven perspective with the unknown system \(\mathbf{\Sigma }\). Specific structures of the unknown system can be assumed/inspired by the physical problem. For instance, if the underlying physical phenomenon is fluid flow inside a control volume, quadratic models should be constructed, e.g., [22]

2.2.1 Approximation of Nonlinear Systems (Volterra Series)

The input-output relationship for a wide class of nonlinear systems [35] can be approximated by a Volterra series for sufficiently high N as

$$\begin{aligned} y(t)=\sum _{n=1}^{N}y_{n}(t),~~y_{n}(t)=\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }h_{n}(\tau _{1},\ldots ,\tau _{n})\prod _{i=1}^{n}u(t-\tau _{i})d\tau _{i}, \end{aligned}$$
(4)

where \(h_{n}(\tau _{1},\ldots ,\tau _{n})\) is a real-valued function of \(\tau _{1}, \ldots ,\tau _{n}\) known as the nth-order Volterra kernel.

Definition 1

The nth-order generalized frequency response function (GFRF) is defined as

$$\begin{aligned} H_{n}(j\omega _{1},\ldots , j\omega _{n})=\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }h_{n}(\tau _{1},\ldots ,\tau _{n})e^{\left( -j\sum _{i=1}^{n}\omega _{i}\tau _{i}\right) }d\tau _{1}\cdots d\tau _{n}, \end{aligned}$$
(5)

which is the multidimensional Fourier transform of \(h_{n}(\tau _{1},\ldots ,\tau _{n})\).

By applying the inverse Fourier transform to the nth-order GFRF in Eq. (5), the nth term of Eq. (4) can be written as

$$\begin{aligned} y_{n}(t)=\frac{1}{{(2\pi )}^n}\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }H_{n}(j\omega _{1},\ldots , j\omega _{n})\prod _{i=1}^{n}U(j\omega _{i})e^{j(\omega _{1}+\cdots +\omega _{n})t}d\omega _{i}. \end{aligned}$$
(6)

The nth Volterra operator is defined as

$$\begin{aligned} V_{n}(u_1,u_2,...,u_n)=\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }h_{n}(\tau _{1},...,\tau _{n})\prod _{i=1}^{n}u_{i}(t-\tau _{i})d\tau _{i}, \end{aligned}$$
(7)

so that \(y_{n}=V_{n}(u,u,...,u)\) holds true.

> Homogeneity of the Volterra operator

The map \(u(t)\rightarrow y_{n}(t)\) is homogeneous of degree n, that is, \(\alpha u\rightarrow \alpha ^n y_{n}\), \(\alpha \in {\mathbb C}\). Each Volterra kernel \(h_{n}\) determines a symmetric multilinear operator. Small amplitudes (e.g., \(|\alpha |<\epsilon \)) allow ordering the nonlinear terms in such a way that terms with large powers of the amplitude (\(\alpha ^n\)) become negligible. That is precisely the sense in which weakly nonlinear systems are approximated by Volterra series.

2.2.2 A Single-Tone Input

Consider the excitation of a system with a single-tone input, i.e., an input consisting of two complex exponentials as in Eq. (8). Such inputs are typically used in chemical engineering applications; see [33].

$$\begin{aligned} u(t)=A\cos (\omega t)=\left( \frac{A}{2}\right) e^{j\omega t}+\left( \frac{A}{2}\right) e^{-j\omega t}. \end{aligned}$$
(8)

By using the above input in Eq. (4), we can derive the first Volterra term with \(n=1\) as

$$\begin{aligned} \begin{aligned} y_{1}(t)&=\int _{-\infty }^{\infty }h_{1}(\tau _{1})[u(t-\tau _{1})]d\tau _{1}\\&=\frac{A}{2}e^{j\omega t}\underbrace{\int _{-\infty }^{\infty }h_{1}(\tau _{1})e^{-j\omega \tau _{1}}d\tau _{1}}_{H_{1}(j\omega )}+\frac{A}{2}e^{-j\omega t}\underbrace{\int _{-\infty }^{\infty }h_{1}(\tau _{1})e^{j\omega \tau _{1}}d\tau _{1}}_{H_{1}(-j\omega )}\Rightarrow \\ y_{1}(t)&=\frac{A}{2}\bigg (e^{j\omega t}H_{1}(j\omega )+e^{-j\omega t}H_{1}(-j\omega )\bigg ). \end{aligned} \end{aligned}$$
(9)

Similarly, for the second term, we can derive

$$\begin{aligned} y_{2}(t)=\bigg (\frac{A}{2}\bigg )^2\bigg [e^{2j\omega t}H_{2}(j\omega ,j\omega )+2e^{0}H_{2}(j\omega ,-j\omega )+e^{-2j\omega t}H_{2}(-j\omega ,-j\omega )\bigg ]. \end{aligned}$$
(10)

Remark 1

(Conjugate symmetry): \(H_{2}^*(j\omega ,-j\omega )=H_{2}(-j\omega ,j\omega ),~\forall \omega \in {\mathbb R}\).

The input amplitude is A, the angular frequency is \(\omega \), the imaginary unit is \(\mathrm {j}\), the first-order response function is \(H_{1}(j\omega )\), and \(H_{n}(j\omega ,...,j\omega )\), for \(n\ge 2\), are the higher order FRFs or GFRFs. Then, the nth Volterra term can be written as

$$\begin{aligned} y_{n}(t)=\left( \frac{A}{2}\right) ^{n}\sum _{p+q=n}{}^{n}C_{q}H_{n}^{p,q}(j\omega )e^{j\omega _{p,q}t},~\omega _{p,q}=(p-q)\omega . \end{aligned}$$
(11)

where the following notations have been used:

$$\begin{aligned} H_{n}^{p,q}(j\omega )=H_{n}(\underbrace{j\omega ,...,j\omega }_{p-times};\underbrace{-j\omega ,...,-j\omega }_{q-times}),~\omega _{p,q}=(p-q)\omega ,~{}^{n}C_{q}=\frac{n!}{q!(n-q)!}. \end{aligned}$$
(12)

2.2.3 Time-Domain Representation of Harmonics

The mth harmonic in the time domain can be computed by collecting the coefficients of identical exponentials in Eq. (11) and by setting \(p-q=m\), with \(p=m+i-1\) and \(q=i-1\). Hence, it follows that

$$\begin{aligned} y_{m^{th}}(t)=\sum _{i=1}^{\infty }\left( \frac{A}{2}\right) ^{m+2i-2}{}^{m+2i-2}C_{i-1}H_{m+2i-2}^{m+i-1,i-1}(j\omega )e^{jm\omega t}. \end{aligned}$$
(13)

2.2.4 Frequency-Domain Representation of Harmonics

The mth harmonic in the frequency domain is obtained by applying the single-sided Fourier transform to Eq. (13):

$$\begin{aligned} Y_{m^{th}}(jm\omega )=\sum _{i=1}^{\infty }\left( \frac{A}{2}\right) ^{m+2i-2}{}^{m+2i-2}C_{i-1}H_{m+2i-2}^{m+i-1,i-1}(j\omega )\delta (jm\omega ), \end{aligned}$$
(14)

where \(\delta (\cdot )\) is the Dirac delta distribution. When a single-tone input excites a nonlinear dynamical system, the steady-state frequency response is characterized by a spectrum with higher harmonics (as can be seen, for example, in Fig. 3). This behavior is not observed in the linear case, where only one harmonic appears at the input frequency.
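Assuming the steady-state record spans (approximately) an integer number of periods of the base frequency, the following minimal sketch (our own illustrative processing step, not code from the paper) extracts the DC term and the first m harmonics \(Y_{m^{th}}\) from the single-sided FFT spectrum; these are the quantities used later for estimating the GFRFs.

```python
# Minimal sketch (assumption: y is a steady-state record sampled with step dt):
# extract the DC term and the first m harmonics from the single-sided spectrum.
import numpy as np

def harmonics(y, dt, w0, m):
    N = len(y)
    Y = np.fft.rfft(y) / N                      # single-sided spectrum coefficients
    freqs = 2 * np.pi * np.fft.rfftfreq(N, dt)  # angular frequency grid
    out = [Y[0]]                                # DC term
    for k in range(1, m + 1):
        idx = np.argmin(np.abs(freqs - k * w0)) # bin closest to the k-th harmonic
        out.append(2 * Y[idx])                  # factor 2 restores one-sided amplitude
    return out                                  # [Y_0, Y_1(jw0), ..., Y_m(jm w0)]
```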

Fig. 3 An instance of the single-sided power spectrum for a single-tone input with \(\omega =1\). The underlying system is nonlinear and, as a result, higher harmonics appear, together with a DC (direct current, i.e., non-periodic) term

3 The Loewner Framework

We start with an account of the Loewner framework (LF) in the linear case [2, 6, 29]. The LF is an interpolatory method that seeks reduced models whose transfer function matches that of the original system at selected interpolation points. An important attribute is that it provides a trade-off between accuracy of fit and complexity of the model. It constructs models from given frequency data in a straightforward manner. In the case of SISO systems, we have the rational scalar interpolation problem to solve.

Consider a given set of complex data as

$$ \{\left( s_k,f_k\right) \in {\mathbb C}\times {\mathbb C}:k=1,\ldots ,2n\}. $$

We partition the data in two disjoint sets:

$$ \mathbf{S}=[\underbrace{s_1,\ldots ,s_n}_{\mu },\underbrace{s_{n+1},\ldots ,s_{2n}}_{\lambda }],~\mathbf{F}=[\underbrace{f_1,\ldots ,f_n}_{{\mathbb V}},\underbrace{f_{n+1},\ldots ,f_{2n}}_{{\mathbb W}}], $$

where \(\mu _i=s_i\), \(\lambda _i=s_{n+i}\), \(v_{i}=f_i\), \(w_{i}=f_{n+i}\) for \(i=1,\ldots ,n\).

The objective is to find \(H(s)\in {\mathbb C}\), such that

$$\begin{aligned} H(\mu _i)=v_{i},~i=1,\ldots ,n,~\text {and}~H(\lambda _j)=w_{j},~j=1,\ldots ,n. \end{aligned}$$
(15)

The left dataset is denoted as

$$\begin{aligned} \mathbf{M}=\left[ \mu _1,\cdots ,\mu _n\right] \in {\mathbb C}^{1\times n},~~{\mathbb V}=\left[ v_1,\cdots ,v_n\right] ^{T}\in {\mathbb C}^{n\times 1}, \end{aligned}$$
(16)

while the right dataset as

$$\begin{aligned} \boldsymbol{\Lambda }=\left[ \lambda _1,\cdots ,\lambda _n\right] ^{T}\in {\mathbb C}^{n\times 1},~~{\mathbb W}=[w_1,\cdots , w_n]\in {\mathbb C}^{1\times n}. \end{aligned}$$
(17)

Interpolation points are determined by the problem or are selected to achieve given model reduction goals. For ways of choosing the interpolation grids and of partitioning the data into the left and right sets, we refer the reader to the recent survey [29].

3.1 The Loewner Matrix

Given a row array of complex numbers \((\mu _j,v_j)\), \(j=1,\ldots ,{n}\), and a column array, \((\lambda _i,w_i)\), \(i=1,\ldots ,{n},\) (with \(\lambda _i\) and the \(\mu _j\) mutually distinct) the associated Loewner matrix \({\mathbb L}\) and the shifted Loewner matrix \({{{\mathbb L}_s}}\) are defined as

$$ {\mathbb L}\!=\!\left[ \!\begin{array}{ccc} \frac{v_1-w_1}{\mu _1-\lambda _1} &{} \cdots &{} \frac{v_1-w_{n}}{\mu _1-\lambda _{n}} \\ \vdots &{} \ddots &{} \vdots \\ \frac{v_n-w_1}{\mu _n-\lambda _1} &{} \cdots &{} \frac{v_n-w_{n}}{\mu _n-\lambda _{n}} \end{array}\!\right] \!\in \!{\mathbb C}^{n\times n},~ {{{\mathbb L}_s}}\!=\!\left[ \!\begin{array}{ccc} \frac{\mu _1v_1-\lambda _1w_1}{\mu _1-\lambda _1} &{} \cdots &{} \frac{\mu _1v_1-\lambda _{n}w_{n}}{\mu _1-\lambda _{n}} \\ \vdots &{} \ddots &{} \vdots \\ \frac{\mu _n v_n-\lambda _1 w_1}{\mu _n-\lambda _1} &{} \cdots &{} \frac{\mu _n v_n-\lambda _{n}w_{n}}{\mu _n-\lambda _{n}} \end{array}\!\right] \!\in \!{\mathbb C}^{n\times n}. $$
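As an illustration of these definitions, the following minimal sketch (our own implementation, not the authors' code) assembles \({\mathbb L}\) and \({{\mathbb L}_s}\) from the partitioned data via broadcasting.

```python
# Minimal sketch: assemble the Loewner and shifted Loewner matrices from the
# left data (mu_j, v_j) and right data (lambda_i, w_i).
import numpy as np

def loewner_matrices(mu, v, lam, w):
    mu = np.asarray(mu).reshape(-1, 1)    # column of left points
    v = np.asarray(v).reshape(-1, 1)
    lam = np.asarray(lam).reshape(1, -1)  # row of right points
    w = np.asarray(w).reshape(1, -1)
    L = (v - w) / (mu - lam)              # Loewner matrix
    Ls = (mu * v - lam * w) / (mu - lam)  # shifted Loewner matrix
    return L, Ls
```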

Definition 2

If g is rational, i.e., \(g(s)=\frac{p(s)}{q(s)}\), for appropriate polynomials p, q, the McMillan degree or the complexity of g is \(\text{ deg }\,g=\max \{\text{ deg }(p),\text{ deg }(q)\}\).

Now, if \(w_i=g(\lambda _i)\) and \(v_j=g(\mu _j)\) are samples of a rational function g, the main property of Loewner matrices asserts the following.

Theorem 1

[2] Let \({\mathbb L}\) be as above, built from k row and q column data points. If \( k,q\ge \mathrm{deg}\,g\), then \(\,\mathrm{rank}\, {\mathbb L}= {\deg }\,g\).

In other words, the rank of \({\mathbb L}\) encodes the complexity of the underlying rational function g. Furthermore, the same result holds for matrix-valued functions g.

3.2 Construction of Interpolants

If the pencil  \(({{{\mathbb L}_s}},\,{\mathbb L})\)  is regular, then   \(\mathbf{E}=-{\mathbb L},~~ \mathbf{A}=-{{{\mathbb L}_s}},~~ \mathbf{b}={\mathbb V},~~ \mathbf{c}={\mathbb W}\),  is a minimal realization of an interpolant for the data, i.e., \(H(s)={\mathbb W}({{{\mathbb L}_s}}-s{\mathbb L})^{-1}{\mathbb V}\). Otherwise, as shown in [2], the problem in Eq. (15) has a solution provided that

$$\begin{aligned} \text{ rank }\,\left[ s\,{\mathbb L}-{{{\mathbb L}_s}}\right] =\text{ rank }\,\left[ {\mathbb L},\ \, {{{\mathbb L}_s}}\right] = \text{ rank }\,\left[ \!\begin{array}{c}{\mathbb L}\\ {{{\mathbb L}_s}}\end{array}\!\right] \!= {r}, \end{aligned}$$

for all  \(s\in \{\mu _i\}\cup \{\lambda _j\}\).  Consider then the thin SVDs:

$$\begin{aligned} \left[ {\mathbb L},\ \, {{{\mathbb L}_s}}\right] =\mathbf{Y}\widehat{\Sigma }_{ {r}}\tilde{\mathbf{X}}^*,~~ \left[ \begin{array}{c}{\mathbb L}\\ {{{\mathbb L}_s}}\end{array}\right] = {\tilde{\mathbf{Y}}}\Sigma _{ {r}} \mathbf{X}^*, \end{aligned}$$

where  \(\widehat{\Sigma }_{ {r}}\), \(\Sigma _{ {r}}\) \(\in \) \({\mathbb R}^{{ {r}}\times {r}}\),  \(\mathbf{Y}\in {\mathbb C}^{n\times {r}}\), \(\mathbf{X}\) \(\in \) \({\mathbb C}^{n \times {r}}\), \(\tilde{\mathbf{Y}} \in {\mathbb C}^{2n\times {r}}\), \(\tilde{\mathbf{X}}\) \(\in \) \({\mathbb C}^{2n \times {r}}\).

Remark 2

r can be chosen as the numerical rank (as opposed to the exact rank) of the Loewner pencil.

Theorem 2

The quadruple \((\tilde{\mathbf{A}},\tilde{\mathbf{b}},\tilde{\mathbf{c}},\tilde{\mathbf{E}})\) of sizes \( {r}\times {r}\),  \( {r}\times {1}\),  \( {1}\times r\), and \(r\times {r}\), respectively, given by

$$\begin{aligned} \tilde{\mathbf{E}} = -\mathbf{Y}^T{\mathbb L}\mathbf{X},~~\tilde{\mathbf{A}} = -\mathbf{Y}^T{{{\mathbb L}_s}}\mathbf{X}, ~~\tilde{\mathbf{b}} = \mathbf{Y}^T{\mathbb V},~~ \tilde{\mathbf{c}} = {\mathbb W}\mathbf{X}, \end{aligned}$$

is a descriptor realization of an (approximate) interpolant of the data with McMillan degree \(r=rank({\mathbb L})\), where \(\tilde{H}(s)=\tilde{\mathbf{c}}(s\tilde{\mathbf{E}}-\tilde{\mathbf{A}})^{-1}\tilde{\mathbf{b}}\).
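A minimal sketch of Theorem 2, reusing loewner_matrices from the previous sketch: the projection matrices are the dominant left/right singular vectors of the stacked Loewner pencil. We use conjugate transposes here; the construction of real-valued models mentioned below requires an additional transformation.

```python
def loewner_realization(mu, v, lam, w, r):
    L, Ls = loewner_matrices(mu, v, lam, w)
    # SVDs of [L, Ls] (row space) and [L; Ls] (column space), truncated to order r
    Y, _, _ = np.linalg.svd(np.hstack([L, Ls]), full_matrices=False)
    _, _, Xh = np.linalg.svd(np.vstack([L, Ls]), full_matrices=False)
    Y, X = Y[:, :r], Xh.conj().T[:, :r]
    E = -Y.conj().T @ L @ X
    A = -Y.conj().T @ Ls @ X
    b = Y.conj().T @ np.asarray(v).reshape(-1, 1)
    c = np.asarray(w).reshape(1, -1) @ X
    return E, A, b, c          # reduced model: H_r(s) = c (sE - A)^{-1} b
```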

For more details on the construction/identification of linear systems with the LF, we refer the reader to [4, 6, 29] where both the SISO and MIMO cases are addressed together with other more technical aspects (e.g., how to impose the construction of real-valued models, etc.).

4 The Special Case of Bilinear Systems

In recent years, projection-based Krylov methods have been extensively applied to model reduction of bilinear systems. We mention the contributions [1, 5, 7, 10–12, 17, 20, 34] and the references therein.

Scalar (SISO) bilinear systems are described by the set of matrices \(\mathbf{\Sigma }_{b}=(\mathbf{A},\mathbf{N},\mathbf{b},\mathbf{c},\mathbf{E})\) and are characterized by the following equations:

$$\begin{aligned} \mathbf{\Sigma }_{b}: \left\{ \begin{aligned} \mathbf{E}\dot{\mathbf{x}}(t)&=\mathbf{A}\mathbf{x}(t)+\mathbf{N}\mathbf{x}(t)u(t)+\mathbf{b}u(t),\\ y(t)&=\mathbf{c}\mathbf{x}(t), \end{aligned}\right. \end{aligned}$$
(18)

where \(\mathbf{E},\mathbf{A},\mathbf{N}\in {\mathbb R}^{n\times n}\), \(\mathbf{b}\in {\mathbb R}^{n\times 1}\), \(\mathbf{c}\in {\mathbb R}^{1\times n}\), and \(\mathbf{x}\in {\mathbb R}^{n\times 1},u,y\in {\mathbb R}\). In what follows, we restrict our analysis to systems with non-singular \(\mathbf{E}\) matrices (e.g., identity matrix).

4.1 The Growing Exponential Approach

The properties of the growing exponential approach can be adapted readily to the problem of finding transfer functions for constant-parameter (stationary) state equations. Let us consider the bilinear model in Eq. (18) with zero initial conditions. A single-tone input with amplitude \(A<1\) is considered as in Eq. (8).

$$\begin{aligned} u(t)=A\cos (\omega t)=\frac{A}{2}e^{j\omega t}+\frac{A}{2}e^{-j\omega t}=a e^{j\omega t}+a e^{-j\omega t}, \end{aligned}$$
(19)

where \(a=A/2\) and \(a\in (0,\epsilon )\) with \(0<\epsilon <1/2\) and for all \(t\ge 0\). The steady-state solution for the differential equation in Eq. (18) can be written as follows:

$$\begin{aligned} \mathbf{x}(t)=\sum _{p,q\in {\mathbb N}}^{\infty }\mathbf{G}_{n}^{p,q}(\underbrace{j\omega ,\ldots ,j\omega }_{p-times},\underbrace{-j\omega ,...,-j\omega }_{q-times})a^{p+q}e^{j\omega (p-q)t}. \end{aligned}$$
(20)

The symbol \(\mathbf{G}_{n}^{p,q}\) denotes the nth input-to-state frequency response containing p times the frequency \(\omega \) and q times the frequency \(-\omega \). By substituting into Eq. (18) and collecting the terms with the same exponential (i.e., \(e^{j\omega _m t}\)), we can derive the input-to-state frequency responses \(\mathbf{G}_n\) for every n as follows:

$$\begin{aligned} \begin{aligned}&\sum _{p,q\in {\mathbb N}}^{\infty }\left( j\omega (p-q)\mathbf{E}-\mathbf{A}\right) \mathbf{G}_{n}^{p,q}a^{p+q}e^{j\omega (p-q)t}=\mathbf{b}(ae^{j\omega t}+ae^{-j\omega t})+\\&+\mathbf{N}\left( \sum _{p,q\in {\mathbb N}}^{\infty }\mathbf{G}_{n}^{p,q}a^{p+q+1}e^{j\omega (p+1-q)t}+\sum _{p,q\in {\mathbb N}}^{\infty }\mathbf{G}_{n}^{p,q}a^{p+q+1}e^{j\omega (p-q-1)t}\right) . \end{aligned} \end{aligned}$$

For the first choices of p and q with \(p+q\le 2\), namely \((1,0),(0,1),(2,0),(0,2),(1,1)\), denoting the resolvent by \(\boldsymbol{\Phi }(j\omega )=\left( j\omega \mathbf{E}-\mathbf{A}\right) ^{-1}\in {\mathbb C}^{n\times n}\) and writing c.t. for the conjugate terms, we derive the first set of terms

$$\begin{aligned} \begin{aligned}&\boldsymbol{\Phi }(j\omega )^{-1}\mathbf{G}_{1}^{1,0}ae^{j\omega t}+\boldsymbol{\Phi }(2j\omega )^{-1}\mathbf{G}_{2}^{2,0}a^2e^{2j\omega t}+\boldsymbol{\Phi }(0)^{-1}\mathbf{G}_{2}^{1,1}a^2+c.t.+\cdots =\\&\mathbf{N}\mathbf{G}_{1}^{1,0}a^2e^{2j\omega t}+\mathbf{N}\mathbf{G}_{2}^{2,0}a^3e^{3j\omega t}+\mathbf{N}\mathbf{G}_{2}^{1,1}a^3e^{j\omega t}+c.t.+\cdots +\mathbf{b}ae^{j\omega t}+c.t. \end{aligned} \end{aligned}$$

Collecting terms with the same exponentials and the same powers of the amplitude, we compute the first and the second time/input-invariant GFRFs:

$$\begin{aligned} \begin{aligned} \mathbf{G}_{1}^{1,0}(j\omega )&=\boldsymbol{\Phi }(j\omega )\mathbf{b},\\ \mathbf{G}_{2}^{2,0}(j\omega )&=\boldsymbol{\Phi }(2j\omega )\mathbf{N}\mathbf{G}_{1}^{1,0}=\boldsymbol{\Phi }(2j\omega )\mathbf{N}\boldsymbol{\Phi }(j\omega )\mathbf{b}. \end{aligned} \end{aligned}$$
(21)

Then, by induction, the input-to-state transfer functions \(\mathbf{G}_n\) are

$$\begin{aligned} \begin{aligned}&\mathbf{G}_{n}^{n,0}(j\omega )=\boldsymbol{\Phi }(nj\omega )\mathbf{N}\boldsymbol{\Phi }((n-1)j\omega )\mathbf{N}\cdots \mathbf{N}\boldsymbol{\Phi }(j\omega )\mathbf{b},\\&\mathbf{G}_{n}^{0,n}(j\omega )=\boldsymbol{\Phi }(-nj\omega )\mathbf{N}\boldsymbol{\Phi }(-(n-1)j\omega )\mathbf{N}\cdots \mathbf{N}\boldsymbol{\Phi }(-j\omega )\mathbf{b},\\&\mathbf{G}_{n}^{p,q}(j\omega )=\boldsymbol{\Phi }((p-q)j\omega )\mathbf{N}\left[ \mathbf{G}_{n-1}^{p,q-1}(j\omega )+\mathbf{G}_{n-1}^{p-1,q}(j\omega )\right] ,~p,q\ge 1, \end{aligned} \end{aligned}$$
(22)

for \(n\ge 1\) and \(p+q=n\). By multiplying with the output vector \(\mathbf{c}\), we can further derive the input-output generalized frequency response functions (GFRFs) as

$$\begin{aligned} \begin{aligned}&H_{n}^{n,0}(j\omega )=\mathbf{c}\boldsymbol{\Phi }(nj\omega )\mathbf{N}\boldsymbol{\Phi }((n-1)j\omega )\mathbf{N}\cdots \mathbf{N}\boldsymbol{\Phi }(j\omega )\mathbf{b},\\&H_{n}^{0,n}(j\omega )=\mathbf{c}\boldsymbol{\Phi }(-nj\omega )\mathbf{N}\boldsymbol{\Phi }(-(n-1)j\omega )\mathbf{N}\cdots \mathbf{N}\boldsymbol{\Phi }(-j\omega )\mathbf{b},\\&H_{n}^{p,q}(j\omega )=\mathbf{c}\boldsymbol{\Phi }((p-q)j\omega )\mathbf{N}\left[ \mathbf{G}_{n-1}^{p,q-1}(j\omega )+\mathbf{G}_{n-1}^{p-1,q}(j\omega )\right] ,~p,q\ge 1. \end{aligned} \end{aligned}$$
(23)

At this point, we can write the Volterra series by using the above specific structure of the GFRFs that were derived with the growing exponential approach for the bilinear case. An important property to notice is that the nth kernel is a multivariate function of order n. It is obvious that the identification of the nth-order FRF involves an n-dimensional frequency space. For that reason, next, we derive the general second symmetric kernel for the bilinear case with a double-tone input. Consider:

$$\begin{aligned} u(t)=A_{1}\cos (\omega _1 t)+A_{2}\cos (\omega _2 t)=\sum _{i=1}^{2}\alpha _{i}(e^{j\omega _i t}+e^{-j\omega _i t}), \end{aligned}$$
(24)

where \(\alpha _1=\frac{A_1}{2}\) and \(\alpha _2=\frac{A_2}{2}\). In that case, with the growing exponential approach the state solution in steady state is

$$\begin{aligned} \mathbf{x}(t)=\sum _{m_{1},\ldots ,m_{4}\in {\mathbb N}}^{\infty }\mathbf{G}_{n}^{m_1,m_2,m_3,m_4}\alpha _1^{m_1+m_2}\alpha _2^{m_3+m_4}e^{j((m_1-m_2)\omega _1+(m_3-m_4)\omega _2)t}. \end{aligned}$$
(25)

We are looking for the input-to-state frequency response \(\mathbf{G}(j\omega _1,j\omega _2)\). By substituting into the bilinear model in Eq. (18) and collecting the appropriate terms, while at the same time using the symmetry \(\mathbf{G}(j\omega _1,j\omega _2)=\mathbf{G}(j\omega _2,j\omega _1)\), we conclude that

$$\begin{aligned} \mathbf{G}_{2}(j\omega _1,j\omega _2) {=}\frac{1}{2}\left[ \!(j\omega _1+j\omega _2)\mathbf{E}-\mathbf{A}\!\right] ^{-1}\mathbf{N}\left[ \!\left( j\omega _1\mathbf{E}-\mathbf{A}\right) ^{-1}\mathbf{b}+\left( j\omega _2\mathbf{E}-\mathbf{A}\right) ^{-1}\mathbf{b}\!\right] , \end{aligned}$$
(26)

By using the resolvent notation and multiplying with \(\mathbf{c}\), we derive the second-order symmetric generalized frequency response function as

$$\begin{aligned} H_{2}(j\omega _1,j\omega _2)=\frac{1}{2}\mathbf{c}\boldsymbol{\Phi }(j\omega _1+j\omega _2)\mathbf{N}\left[ \boldsymbol{\Phi }(j\omega _1)\mathbf{b}+\boldsymbol{\Phi }(j\omega _2)\mathbf{b}\right] . \end{aligned}$$
(27)
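The following sketch (our own, illustrative) evaluates the first GFRF and the symmetric second GFRF of Eqs. (21) and (27) directly through the resolvent \(\boldsymbol{\Phi }\); these are the "theoretical" kernel values used for comparison in Sect. 5.

```python
# Minimal sketch: evaluate H1(jw) and the symmetric H2(jw1, jw2) of Eqs. (21), (27)
# for a bilinear system (E, A, N, b, c).
import numpy as np

def Phi(E, A, s):
    return np.linalg.inv(s * E - A)            # resolvent (sE - A)^{-1}

def H1(E, A, b, c, w):
    return (c @ Phi(E, A, 1j * w) @ b).item()

def H2_sym(E, A, N, b, c, w1, w2):
    x = Phi(E, A, 1j * w1) @ b + Phi(E, A, 1j * w2) @ b
    return 0.5 * (c @ Phi(E, A, 1j * (w1 + w2)) @ N @ x).item()
```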

4.2 The Kernel Separation Method

One way to deduce Volterra kernels is by means of interpolation. This problem is equivalent to that of estimating the coefficients of a polynomial from noisy evaluations. The interpolation scheme builds a linear system with a Vandermonde matrix, which is invertible since the amplitudes are distinct and nonzero. The inverse of a Vandermonde matrix can be computed explicitly, and there are stable ways to solve these equations [16]. The recently proposed method in [18] addresses the exponential ill-conditioning of the Vandermonde matrix with Arnoldi orthogonalization. The mth harmonic in the frequency domain is derived by applying a (single-sided) Fourier transform. More precisely, the explicit formulation is as follows:

$$\begin{aligned} \begin{aligned} Y_{m^{th}}(jm\omega )&=\sum _{i=1}^{\infty }\underbrace{\left( \frac{A}{2}\right) ^{m+2i-2}{}^{m+2i-2}C_{i-1}}_{\alpha ^{m+2i-2}}H_{m+2i-2}^{m+i-1,i-1}(j\omega )\delta (jm\omega )\\&=\sum _{i=1}^{\infty }\alpha ^{m+2i-2}H_{m+2(i-1)}^{m+i-1,i-1}(j\omega )\delta (jm\omega ). \end{aligned} \end{aligned}$$
(28)

We simplify the notation in order to reveal the adaptive method that will help us estimate the GFRFs up to a specific order. Next, we write the linear system of equations that connects the harmonic information with the higher order Volterra kernels as follows:

$$\begin{aligned} \begin{aligned} \underbrace{\left[ \begin{array}{c} Y_{0}(0j\omega ) \\[1mm] Y_{1}(1j\omega ) \\[1mm]Y_{2}(2j\omega ) \\[1mm]Y_{3}(3j\omega ) \\[1mm] \vdots \\[1mm] Y_{m}(mj\omega )\end{array}\right] }_{\mathbf{Y}_{(\alpha ,\omega )}}=\Bigg \{\underbrace{\left[ \begin{array}{cccc} {\alpha ^{0}} &{} {\alpha ^{2}} &{} \alpha ^{4} &{} \dots \\[1mm] {\alpha ^{1}}&{} {\alpha ^{3}} &{} \alpha ^{5} &{} \dots \\[1mm] {\alpha ^{2}}&{} \alpha ^{4} &{} \alpha ^{6} &{} \dots \\[1mm]{\alpha ^{3}}&{} \alpha ^{5} &{} \alpha ^{7} &{} \dots \\[1mm] \vdots &{} \vdots &{} \vdots &{} \vdots \\[1mm] \alpha ^{m}&{} \alpha ^{m+2} &{} \alpha ^{m+4} &{} \dots \end{array}\right] }_{\mathbf{M}_{\alpha }}&\odot \underbrace{\left[ \begin{array}{cccc} H_{0}^{0,0}&{} H_{2}^{1,1} &{} H_{4}^{2,2} &{} \dots \\[1mm] H_{1}^{1,0}&{} H_{3}^{2,1} &{} H_{5}^{3,2} &{}\dots \\[1mm] H_{2}^{2,0}&{} H_{4}^{3,1} &{} H_{6}^{4,2} &{} \dots \\[1mm]H_{3}^{3,0}&{} H_{5}^{4,1} &{} H_{7}^{5,2} &{} \dots \\[1mm] \vdots &{} \vdots &{} \vdots &{} \vdots \\ H_{n}^{n,0}&{} H_{n+2}^{n+1,1} &{} H_{n+4}^{n+2,2} &{} \dots \end{array}\right] }_{\mathbf{P}_{\omega }}\Bigg \}\underbrace{\left[ \begin{array}{c} 1 \\[1mm] 1\\[1mm] 1 \\[1mm]1 \\[1mm] \vdots \\[1mm] 1 \end{array}\right] }_\mathbf{e _{n+1,1}}. \end{aligned} \end{aligned}$$
(29)

By introducing the Hadamard product notation and by substituting the \(\delta \)'s with ones, we can compactly rewrite the above system in the following form:

$$\begin{aligned} \mathbf{Y}_{(\alpha ,\omega )}=\left[ \mathbf{M}_{\alpha }\odot \mathbf{P}_{\omega }\right] \cdot \mathbf{e} _{n+1,1}. \end{aligned}$$
(30)

The above system makes explicit the level of approximation that we want to achieve. Note that the frequency response \(\mathbf{Y}\) depends on both the amplitude and the frequency, while the right-hand side of Eq. (30) reveals the separation of these two quantities. Since higher order Volterra kernels are neglected, the measurement set is effectively corrupted by (truncation-induced) noise.

> Kernel separation and stage \(\ell \)-approximation

For a given system, the procedure consists in exciting it with a single-tone input. By varying the driving frequency as well as the amplitude, we can approximate the GFRFs by solving the resulting systems in the least-squares sense (minimizing the 2-norm of the residual).

$$\begin{aligned} \mathbf{Y}_{m+1,\ell }(jm\omega ,\alpha _{\ell })=\left[ \mathbf{M}_{m+1,\ell }(\alpha _{\ell })\odot \mathbf{P}_{m+1,\ell }(jm\omega )\right] \cdot \mathbf{e} _{n+1,1}. \end{aligned}$$
(31)

The m-“direction” gives us the threshold up to the specific harmonic that we measure, while the \(\ell \)-“direction” gives us the level of kernel separation that we want to achieve. For instance, for the second-stage approximation, \(\ell =2\) and we assume \(Y_{m}\approx 0\) for all \(m>\ell =2\), i.e., \(m=3,4,\ldots \)
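As a sketch of the amplitude-sweep idea (our own simplified formulation, with the binomial factors absorbed into the unknowns), for a fixed frequency and harmonic index m one can stack the measured harmonics over several amplitudes and solve the Vandermonde-type system of Eq. (31) in the least-squares sense.

```python
# Minimal sketch: stage-ell separation for one harmonic row of Eq. (29).
import numpy as np

def separate_kernels(alphas, Ym, m, ell):
    """alphas: swept amplitudes a = A/2; Ym: measured m-th harmonics, one per amplitude.

    Returns estimates of the products nCq * H_n^{p,q} (n = m, m+2, ..., m+2(ell-1))
    appearing in the m-th harmonic row of Eq. (29); the binomial factor can be
    divided out afterwards.
    """
    powers = [m + 2 * i for i in range(ell)]
    M = np.array([[a ** p for p in powers] for a in alphas], dtype=float)
    coeffs, *_ = np.linalg.lstsq(M, np.asarray(Ym), rcond=None)
    return coeffs
```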

4.3 Identification of the Matrix \(\mathbf{N}\)

The difference between linear and bilinear models is the presence of the product between the input and the state, which is scaled by the matrix \(\mathbf{N}\). As the LF is able to identify the linear part (\(\mathbf{A},\mathbf{b},\mathbf{c},\mathbf{E}\)) of the bilinear model, the only thing that remains is the identification of the matrix \(\mathbf{N}\). The matrix \(\mathbf{N}\) enters linearly in the following kernels (as \(\mathbf{E}\) has been assumed invertible, for simplicity we set \(\mathbf{E}=\mathbf{I}\)):

  • With a single-tone input the kernel \(H_{2}^{1,1}\) can be written as

    $$\begin{aligned} H_{2}(j\omega _1,-j\omega _1)=\frac{1}{2}\mathbf{c}\left( -\mathbf{A}\right) ^{-1}\mathbf{N}\left( (j\omega _1\mathbf{I}-\mathbf{A})^{-1}\mathbf{b}+(-j\omega _1\mathbf{I}-\mathbf{A})^{-1}\mathbf{b}\right) \end{aligned}$$
    (32)

    and the kernel \(H_{2}^{2,0}\) as

    $$\begin{aligned} H_{2}(j\omega _1,j\omega _1)=\mathbf{c}\left( 2j\omega _1\mathbf{I}-\mathbf{A}\right) ^{-1}\mathbf{N}(j\omega _1\mathbf{I}-\mathbf{A})^{-1}\mathbf{b}. \end{aligned}$$
    (33)
  • While with a double-tone input the general kernel \(H_{2}\) can be written as

    $$\begin{aligned} H_{2}(j\omega _1,j\omega _2)=\frac{1}{2}\mathbf{c}\bigg ((j\omega _1+j\omega _2)\mathbf{I}-\mathbf{A}\bigg )^{-1}\mathbf{N}\bigg ((j\omega _1\mathbf{I}-\mathbf{A})^{-1}\mathbf{b}+(j\omega _2\mathbf{I}-\mathbf{A})^{-1}\mathbf{b}\bigg ). \end{aligned}$$
    (34)

We introduce the following notation:

$$\begin{aligned} \begin{aligned} \mathcal {O}(j\omega _1,j\omega _2)&=\frac{1}{2}\mathbf{c}\bigg ((j\omega _1+j\omega _2)\mathbf{I}-\mathbf{A}\bigg )^{-1}\in {\mathbb C}^{1\times n},\\ \mathcal {R}(j\omega _1,j\omega _2)&=\bigg ((j\omega _1\mathbf{I}-\mathbf{A})^{-1}\mathbf{b}+(j\omega _2\mathbf{I}-\mathbf{A})^{-1}\mathbf{b}\bigg )\in {\mathbb C}^{n\times 1}. \end{aligned} \end{aligned}$$
(35)

Then, Eq. (34) can be compactly rewritten as

$$\begin{aligned} H_{2}(j\omega _1,j\omega _2)=\mathcal {O}(j\omega _1,j\omega _2)\mathbf{N}\mathcal {R}(j\omega _1,j\omega _2). \end{aligned}$$
(36)

Assume that k measurements of the function \(H_2\) are available for k different pairs \((\omega _1,\omega _2)\). By vectorizing with respect to the measurement set, we have for the kth measurement:

$$\begin{aligned} \underbrace{H_{2}(j\omega _{1}^{(k)},j\omega _{2}^{(k)})}_{\mathbf{Y}^{(k)}}=\underbrace{\mathcal {O}(j\omega _{1}^{(k)},j\omega _{2}^{(k)})}_{\mathcal {O}_{1,n}^{(k)}}\underbrace{\mathbf{N}}_{n\times n}\underbrace{\mathcal {R}(j\omega _{1}^{(k)},j\omega _{2}^{(k)})}_{\mathcal {R}_{n,1}^{(k)}}, \end{aligned}$$
$$\begin{aligned} \text {For all }k\text { measurements}\rightarrow \mathbf{Y}_{(1:k,1)}=\underbrace{\left( \mathcal {O}_{(1,n)}^{(k)}\otimes \mathcal {R}_{(1,n)}^{T(k)}\right) }_{(1:k,n^2)}{\underbrace{vec\left( \mathbf{N}\right) }_{(1:n^2,1)}}. \end{aligned}$$
(37)

Note that Eqs. (32), (33), (34) can be equivalently rewritten as the single linear matrix equation given in Eq. (37). By filling out the above matrix \(\left[ \mathcal {O}\otimes \mathcal {R}^T\right] \) with the information from \(H_{2}(j\omega _1,-j\omega _1)\) and from \(H_{2}(j\omega _1,j\omega _1)\) as well, the solution can be improved. Hence, we are able to solve Eq. (37) with full rank and identify the matrix \(\mathbf{N}\). All the symmetry properties of the kernels are appropriately used, e.g., conjugate-real symmetry. For n denoting the dimension of the bilinear model and k the number of measurements, we have the following two cases:

  1. \(k<n^2\): underdetermined \(\rightarrow \) least-squares (LS) solution (minimizing the 2-norm), as in [28];

  2. \(k\ge n^2\): determined (rank completion) \(\rightarrow \) identification of \(\mathbf{N}\).

Proposition 1

Let \(\Sigma _{b}=(\mathbf{A},\mathbf{N},\mathbf{b},\mathbf{c},\mathbf{E})\) be a bilinear system of dimension n for which the linear subsystem \(\Sigma _{l}=(\mathbf{A},\mathbf{b},\mathbf{c},\mathbf{E})\) is fully controllable and observable. Then, for \(k\ge n^2\) measurements so that \((j\omega _{1}^{(k)},j\omega _{2}^{(k)})\) are distinct complex pairs with \((\omega _{1}^{(k)},\omega _{2}^{(k)})\in {\mathbb R}_{+}^{2}\) and \(\omega _{1}^{(k)}\ne \omega _{2}^{(k)}\), the following holds:

$$\begin{aligned} rank\Bigg (\underbrace{\left[ \begin{array}{c} \mathcal {O}^{(1)}\otimes \mathcal {R}^{T(1)}\\ \mathcal {O}^{(2)}\otimes \mathcal {R}^{T(2)}\\ \vdots \\ \mathcal {O}^{(k)}\otimes \mathcal {R}^{T(k)} \end{array}\right] }_{(1:k\ge n^2,n^2)}\Bigg )=n^2. \end{aligned}$$
(38)

As the above result indicates, one needs at least \(n^2\) measurements to identify the matrix \(\mathbf{N}\) corresponding to a bilinear system of dimension n.
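A minimal sketch of the least-squares identification of \(\mathbf{N}\) from Eq. (37), assuming \(\mathbf{E}=\mathbf{I}\) and a fitted linear model from the LF; the row-wise vectorization used below matches the Kronecker structure \(\mathcal {O}\otimes \mathcal {R}^T\), and stacking real and imaginary parts enforces a real-valued \(\mathbf{N}\).

```python
# Minimal sketch: build one row O^(k) kron R^(k)^T per measured H2 value and
# solve the (real) least-squares problem of Eq. (37) for vec(N).
import numpy as np

def identify_N(A, b, c, pairs, H2_meas):
    n = A.shape[0]
    rows, vals = [], []
    for (w1, w2), h2 in zip(pairs, H2_meas):
        O = 0.5 * c @ np.linalg.inv(1j * (w1 + w2) * np.eye(n) - A)   # 1 x n
        R = (np.linalg.solve(1j * w1 * np.eye(n) - A, b)
             + np.linalg.solve(1j * w2 * np.eye(n) - A, b))           # n x 1
        rows.append(np.kron(O, R.T).ravel())                          # 1 x n^2
        vals.append(h2)
    K, y = np.vstack(rows), np.asarray(vals)
    # real LS problem: stack real and imaginary parts to obtain a real-valued N
    Kr = np.vstack([K.real, K.imag])
    yr = np.concatenate([y.real, y.imag])
    vecN, *_ = np.linalg.lstsq(Kr, yr, rcond=None)
    return vecN.reshape(n, n)   # row-wise reshape, consistent with kron(O, R^T)
```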

4.4 A Separation Strategy for the Second Kernel

To identify the nth Volterra kernel, we need an n-tone input signal. As we want to identify the second kernel, the input signal needs to be chosen as the double-tone of Eq. (24). The propagating harmonics are \(e^{(j(m_1-m_2)\omega _1+j(m_3-m_4)\omega _2)t}\) or, more compactly, \(e^{(\pm kj\omega _1\pm lj\omega _2)t}\), where \(k,l\in {\mathbb N}\). The aim is to differentiate the \((\omega _1+\omega _2)\) harmonic from the other harmonics. More precisely, we want the following to hold:

$$\begin{aligned} \omega _1+\omega _2\ne k\omega _1+l\omega _2,~\forall (k,l)\in {\mathbb Z}\times {\mathbb Z}\setminus {\{1,1\}}. \end{aligned}$$
(39)

Suppose \(\omega _2=\phi \omega _1,~\phi \in {\mathbb R}\). The values of \(\phi \) for which Eq. (39) is violated satisfy

$$\begin{aligned} \begin{aligned} \omega _1+\phi \omega _1&=k\omega _1+l\phi \omega _1\Rightarrow 1+\phi =k+l\phi \Rightarrow \phi =\frac{k-1}{1-l},~k,l\in {\mathbb Z}\setminus \{1\}. \end{aligned} \end{aligned}$$
(40)

By choosing \(\phi \) so that the equality in Eq. (40) does not hold for any pair with harmonic mixing index \(m=k+l\), the harmonic \((\omega _1+\omega _2)\) becomes uniquely defined in the frequency spectrum up to the mth kernel.

To visualize this feature, we choose \(\omega _1=1\), and \(\omega _2=\omega _1\phi =\phi \), for harmonic mixing index \(m=4\). Then, the constraints of \(\phi \) are depicted in Fig. 4 with blue dots.

Next, in the left pane of Fig. 5, a \(\phi \) constraint that produces commensurate harmonics is depicted, with the second and the third kernel contributing to the same harmonic. In the right pane, the harmonic at \((\omega _1+\omega _2)\) is uniquely defined by the second kernel up to the mixing order \(m=4\).

Fig. 4 This figure shows the constraints on \(\phi \) (e.g., \(\phi =0,1/3,1/2,1,2,3,\ldots \), etc.). By choosing \(\phi \)'s between the blue dots, we construct frequency bandwidths with a unique \((\omega _1+\omega _2)\)

Fig. 5 Left pane: overlapping kernels contributing to the same harmonic for the invalid choice \(\phi =0.5\). Right pane: uniquely defined harmonic at \((\omega _1+\omega _2)\) for the valid choice \(\phi =1.5\). Here, it holds that \(m=k+l\)

The next result allows us to construct frequency-sweeping schemes to get enough measurements of \(H_{2}(j\omega _1,j\omega _2)\). For every \(\omega _1>0\), the following should hold:

$$\begin{aligned} \omega _2\in \left( \phi _{i-1}\omega _1,\phi _{i}\omega _1\right) ,~i=1,\ldots \end{aligned}$$
(41)

where \(\phi _i\) are the constraints (see Fig. 4 blue dots).
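A small sketch of this selection rule (our own check, with the assumption that the search over integer pairs is bounded by \(|k|+|l|\le m\)): test whether a candidate ratio \(\phi =\omega _2/\omega _1\) collides with one of the forbidden values \((k-1)/(1-l)\) of Eq. (40) up to mixing order m.

```python
def phi_is_valid(phi, m, tol=1e-9):
    """Return False if phi hits a forbidden value (k-1)/(1-l) within mixing order m."""
    for k in range(-m, m + 1):
        for l in range(-m, m + 1):
            if l == 1 or (k, l) == (1, 1) or abs(k) + abs(l) > m:
                continue
            if abs(phi - (k - 1) / (1 - l)) < tol:
                return False
    return True

# e.g., phi_is_valid(0.5, 4) -> False (commensurate), phi_is_valid(1.5, 4) -> True
```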

Remark 3

Note that, in the proposed framework, the separation of the kernels that contribute to the \((\omega _1+\omega _2)\) harmonic is enforced only up to a specific mixing order m. We do not offer a general solution to this separation problem for multi-tone inputs, although techniques have been introduced, such as in [16]. There, it was also stated that a full separation of harmonics is, in general, not possible.

4.5 The Loewner-Volterra Algorithm for Time-Domain Bilinear Identification and Reduction

We start with a set of single-tone inputs \(u(t)=\alpha _\ell \cos (\omega _1^{(i)} t),~i=1,...,k\), with \(\alpha _\ell <1\). For those k measurements, we can estimate the linear kernel \(H_{1}(j\omega _1^{(i)})\), as well as \(H_{2}(j\omega _{1}^{(i)},j\omega _{1}^{(i)})\) and \(H_{2}(j\omega _1^{(i)},-j\omega _1^{(i)})\), by simply measuring the first harmonic \(\mathbf{Y}_{1}\), the second harmonic \(\mathbf{Y}_{2}\), and the DC term \(\mathbf{Y}_{0}\) from the frequency spectrum, as shown in Fig. 3. To improve the accuracy of the estimations of the aforementioned kernels, we can further upgrade to an \(\ell \)-stage approximation by varying the amplitude \(\alpha _{\ell }\), as explained in Sect. 4.2. This is necessary whenever higher harmonics are considered to be numerically nonzero, hence meaningful. The reason is that the first harmonic is then corrupted by noise introduced by the term \(H_{3}^{2,1}\) and the rest of the terms which appear on the second row of the matrix \(\mathbf{P}_\omega \) in Eq. (29).

Since the LF reveals the underlying order r of the linear system, the value of k should be at least equal to 2r. We can then decide on the order r of the reduced system by analyzing the singular value decay. Up to this step, we have identified the linear part with the LF, and we have filled the LS problem in Eq. (37) with measurements from the diagonal of the second kernel and from the axis perpendicular to the diagonal, \((\omega _1,-\omega _1)\). Those measurements contribute to the problem, but yield an underdetermined (rank-deficient) LS problem.

We need more measurements of \(H_{2}\) to reach the full-rank \((r^2)\) solution that will lead to the identification of \(\mathbf{N}\). So, we proceed by measuring \(H_{2}\) off the diagonal (\(\omega _{1}\ne \omega _{2}\)) with a double-tone input \(u(t)=\alpha _{\ell }\cos (\omega _{1}^{(k)} t)+\beta _{\ell }\cos (\omega _{2}^{(k)} t)\), for a set of up to \(r^2\) frequency pairs \((\omega _1,\omega _2)\). The kernel separation problem for the frequency pair \((\omega _1,\omega _2)\) now appears. To deal with it, we follow the solution proposed in Sect. 4.4 (up to a mixing degree). Last, we solve the real full-rank LS problem described in Eq. (37) by using all the symmetry properties of these kernels (i.e., real symmetry, conjugate symmetry, and the fact that \(H_{2}(j\omega _1,j\omega _2)=H_{2}(j\omega _2,j\omega _1)\)). An algorithm that summarizes the above procedure is presented below.

(Algorithm) The Loewner-Volterra algorithm for bilinear identification and reduction from time-domain data. (A minimal code sketch of the main steps is given after the algorithm.)

Input/Data acquisition: Use as control input the signals: \(u(t)=\alpha _{\ell }\cos (\omega _{1}^{(k)} t)+\beta _{\ell }\cos (\omega _{2}^{(k)} t),~t\ge 0\), by sweeping the small amplitudes \((<1)\) and a particular range of frequencies.

Output: A bilinear system of dimension-r: \(\Sigma _{b_r}:\left( \mathbf{A}_{r},\mathbf{N}_{r},\mathbf{b}_{r},\mathbf{c}_{r},\mathbf{E}_{r}\right) \)

  1.

    Apply one-tone input u(t) with \(\beta _{\ell }=0\), \(\omega _{1}^{(k)}\) for \(k=1,\ldots ,n\), and collect the snapshots y(t) in steady state.

  2.

    Apply Fourier transform and collect the following measurements:

    • DC term: \(~Y_{O}(0\cdot j\omega _{1}^{(k)})\),

    • 1st harmonic: \(Y_{I}(1\cdot j\omega _{1}^{(k)})\),

    • 2nd harmonic: \(Y_{II}(2\cdot j\omega _{1}^{(k)})\),

      \(\vdots \)

    • mth harmonic: \(Y_{m^{th}}(m\cdot j\omega _{1}^{(k)})\) (last numerically nonzero harmonic).

  3.

    If the second harmonic or higher harmonics are nonzero, the system is nonlinear. By sweeping the amplitude and using the adaptive scheme (stage \(\ell \)-approximation) in Eq. (30), the estimations of the first and the second kernels can be improved. If the second and higher harmonics are equal to zero, the bilinear matrix \(\mathbf{N}\) remains zero and the underlying system is linear.

  4.

    Apply the linear LF (see Algorithm 1 in [29]) using the measurements (e.g., \(H_{1}(j\omega _{1}^{(k)})\approx 2Y_{I}(j\omega _{1}^{(k)})/\alpha _{\ell }\) for the second-stage approximation, where \(Y_{m}\approx 0\) for \(m>2\)) and obtain the order-r linear model.

  5.

    If the system is nonlinear, fitting a bilinear matrix \(\mathbf{N}\) will improve the accuracy. Apply the two-tone input \(u(t)=\alpha _{\ell }\cos (\omega _{1}^{(k)} t)+\beta _{\ell }\cos (\omega _{2}^{(k)} t)\) to get enough measurements (up to \(r^2\)) to produce a full-rank LS problem. Measure the \((\omega _1+\omega _2)\) harmonic as explained in Sect. 4.4 and get the estimations for the second kernel as \(H_{2}(j\omega _{1}^{(k)},j\omega _{2}^{(k)})\approx 2Y_{II}(j\omega _{1}^{(k)},j\omega _{2}^{(k)})/(\alpha _{\ell }\beta _{\ell })\).

  6.

    Solve the full-rank least-squares problem as described in Eq. (37) and compute the real-valued bilinear matrix \(\mathbf{N}\). When the inversion is not exact due to numerical issues, the least-squares solution is obtained with a thresholding SVD.
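The following is a minimal end-to-end sketch of steps 1-4 above, reusing the helper functions sketched earlier (harmonics, loewner_realization, identify_N); all names are ours. The time-stepping is a simple explicit Euler solver for illustration only (the paper reports using a backward Euler scheme).

```python
import numpy as np

def simulate_bilinear(E, A, N, b, c, u, dt, T):
    """Explicit-Euler simulation of Eq. (18) for a callable input u(t); returns y samples."""
    n = A.shape[0]
    x = np.zeros((n, 1))
    Einv = np.linalg.inv(E)
    y = []
    for t in np.arange(0.0, T, dt):
        ut = u(t)
        x = x + dt * (Einv @ (A @ x + N @ x * ut + b * ut))
        y.append((c @ x).item())
    return np.array(y)

def estimate_H1(E, A, N, b, c, omegas, amp=0.01, dt=1e-4, periods=50):
    """Steps 1-3: single-tone probing, FFT, second-stage estimate of H1(jw)."""
    H1_meas = []
    for w in omegas:
        T = periods * 2 * np.pi / w
        y = simulate_bilinear(E, A, N, b, c, lambda t: amp * np.cos(w * t), dt, T)
        Yh = harmonics(y[len(y) // 2:], dt, w, 1)   # discard the transient half
        H1_meas.append(Yh[1] / amp)                 # first harmonic ~ amp * H1(jw)
    return np.array(H1_meas)

# Step 4 (sketch): split the estimated H1 samples into left/right data and call
# loewner_realization(mu, v, lam, w, r). Steps 5-6: call identify_N with two-tone
# measurements of H2 at off-diagonal frequency pairs.
```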

4.6 Computational Effort of the Proposed Method

In this section, we discuss the computational effort of the proposed method by analyzing each step. We comment on its applicability to large-scale problems and its relation to real-world scenarios.

Simulation of processes with harmonic inputs constitutes a classical technique applied in many engineering applications; data acquisition in the time domain is a common procedure. Nevertheless, using advanced electronic devices such as vector network analyzers (VNAs), frequency-domain data can also be obtained directly. Applied to frequency-domain data obtained from VNAs, the Loewner framework offers an excellent identification and reduction tool in the linear case (with many applications in electrical, mechanical, or civil engineering). In the context of the current paper, we deal with time-domain data for a special class of nonlinear problems.

For the purpose of identifying and reducing bilinear systems from time-domain measurements, the most expensive procedure is that of data collection. This is done by simulating time-domain models with Euler's method (bilinear models such as the ones approximating Burgers' equation). Nevertheless, the heavy computational cost of simulating large-dimensional systems in the time domain can be alleviated using parallel processing (e.g., on multiple computational clusters). The process of estimating transfer function values by computing the Fourier transform remains robust. In addition, the LF can adaptively detect the decay of the singular values, and hence the procedure can be terminated at a specific reduced order \(r\ll n\).

In the beginning, a linear system of reduced dimension r is fitted using the LF. For the rest of the proposed algorithm, we use the lower dimension r to our advantage, and hence the method remains robust. The next step is to compute the matrix \(\mathbf{N}\) that characterizes the nonlinearity of bilinear systems. As the fitted linear system is of dimension r, we need to determine exactly \(r^2\) unknowns (the entries of the matrix \(\mathbf{N}\)). As presented in Sect. 4.3, this boils down to solving a full-rank LS problem that can be easily dealt with.

The aim of the newly proposed method is to accurately train bilinear models from time-domain data. We offer a first-step approach toward the complete identification of such systems within the Volterra series approximation framework. In many cases, large-scale systems are sparse (due to spatial-domain semi-discretization) and hence reduction techniques can be applied. The new method deals with the inherent redundancies through the linear subsystem (compression by means of the SVD). Afterward, it updates the nonlinear behavior by introducing an appropriate low-dimensional bilinear matrix that improves the overall approximation. Note also that the new method relies on the controllability/observability of the fitted linear system. Additionally, noise values up to a particular threshold can be handled, as presented in Sect. 5; further analysis of noise-related issues is left for future research.

Table 1 Measurements of the first (linear) kernel

5 Numerical Examples

Example 1

(Identifying a low-order bilinear toy example) The aim of this experiment is to identify a simple bilinear model from time-domain measurements. Consider the following controllable/observable bilinear model as in Eq. (18), of dimension 2, with a non-symmetric matrix \(\mathbf{N}\), zero initial conditions, and matrices

$$\begin{aligned} \mathbf{E}=\left[ \begin{array}{cc} 1 &{} 0\\ 0 &{} 1 \end{array}\right] ,~\mathbf{A}=\left[ \begin{array}{cc} -1 &{} -10\\ 10 &{} -1 \end{array}\right] ,~\mathbf{N}=\left[ \begin{array}{cc} 1 &{} -2\\ 3 &{} -4 \end{array}\right] ,~\mathbf{B}=\left[ \begin{array}{c} 1\\ 1 \end{array}\right] ,~ \mathbf{C}=\left[ \begin{array}{cc} 1&1 \end{array}\right] . \end{aligned}$$
(42)

We simulate the system in the time domain with the input \(u(t)=A\cos (\omega t)\), amplitude \(A=0.01\), frequencies \(\omega \in \left[ \begin{array}{cccc} 0.5&1&1.5&2\end{array}\right] 2\pi \), and time step \(dt=1e-4\). Next, the second-stage approximation results for the linear kernel \(\tilde{H}_{1}\), in comparison with the theoretical values of \(H_{1}\), are presented in Table 1.
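As a usage illustration (assuming the helper sketches given earlier), the toy system of Eq. (42) can be probed as follows; the printed estimates can be compared with the theoretical values \(\mathbf{c}(j\omega \mathbf{I}-\mathbf{A})^{-1}\mathbf{b}\) reported in Table 1.

```python
import numpy as np

# toy bilinear system of Eq. (42)
E = np.eye(2)
A = np.array([[-1.0, -10.0], [10.0, -1.0]])
N = np.array([[1.0, -2.0], [3.0, -4.0]])
b = np.array([[1.0], [1.0]])
c = np.array([[1.0, 1.0]])

omegas = 2 * np.pi * np.array([0.5, 1.0, 1.5, 2.0])   # probing frequencies
H1_est = estimate_H1(E, A, N, b, c, omegas, amp=0.01, dt=1e-4)
H1_true = [transfer_function(E, A, b, c, 1j * w) for w in omegas]
print(H1_est, H1_true)
```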

With the estimations of the linear transfer function and by using the LF as the data-driven identification and reduction tool for linear systems, we identify the linear system \((\tilde{\mathbf{A}},\tilde{\mathbf{b}},\tilde{\mathbf{c}},\tilde{\mathbf{E}})\). We stopped at the fourth measurement because the underlying system is of second order (McMillan degree 2). Otherwise, more measurements would be needed to obtain a sufficient decay of the singular values, as shown in Fig. 6. The singular value decay offers a choice for the reduction order. Since the simulation of the system is performed with time step \(dt=1e-4\), the singular values with magnitude below that threshold are neglected.

Fig. 6 The singular value decay of the LF as a fundamental characterization of the McMillan degree of the underlying linear system. Here, a truncation of order \(r=2\) is recommended; the second-stage approximation gave \(\sigma _{3}/\sigma _{1}=4.721\cdot 10^{-5}\), while for the noise-free case the third singular value reaches machine precision

The linear system of order \(r=2\) constructed from the theoretical noise-free measurements (subscript “t”) appears next:

$$\begin{aligned} \tilde{\mathbf{A}}_t=\left[ \begin{array}{cc} -1.4513 &{} -8.8181\\ 11.363 &{} -0.54868 \end{array}\right] ,~\tilde{\mathbf{B}}_t=\left[ \begin{array}{c} -0.92979\\ 1.3967 \end{array}\right] ,~ \tilde{\mathbf{C}}_t=\left[ \begin{array}{cc} -0.76857&0.9203 \end{array}\right] , \end{aligned}$$
(43)

while using the measured data with the second-stage approximation results in the following:

$$\begin{aligned} \tilde{\mathbf{A}}=\left[ \begin{array}{cc} -1.458 &{} -8.8137\\ 11.367 &{} -0.55162 \end{array}\right] ,~\tilde{\mathbf{B}}=\left[ \begin{array}{c} -0.9342\\ 1.4 \end{array}\right] ,~ \tilde{\mathbf{C}}=\left[ \begin{array}{cc} -0.7675&0.91611 \end{array}\right] . \end{aligned}$$
(44)

> Identified linear dynamics

Even if the coordinate system is different, one crucial qualitative check is to compute the poles and zeros of the linear transfer function. For the system identified from the theoretical (noise-free) measurements, the poles and the zero are exactly those of the original: \(\tilde{p}_t=-1\pm 10{}\mathrm {i}\) and \(\tilde{z}_t=-1\), while for the second-stage approximation of the linear system, the corresponding results are \(\tilde{p}=-1.0048\pm 9.9989{}\mathrm {i},~\tilde{z}=-1.0042\).

At this point, we have recovered the linear part of the bilinear system up to an accuracy limited by the truncation of the Volterra series. The inexact simulation of the continuous system (performed with the finite time step \(dt=1e-4\)) and the finite Fourier accuracy still led to quite accurate results, with a perturbation of order \(\sim O(1e-3)\) when comparing with the theoretical poles and zero. We proceed by collecting the measurements of the second kernel. Table 2 contains measurements of the second kernel with a single-tone input.

Table 2 Measurements of the \(H_{2}\) on the diagonal and perpendicular to the diagonal

We could obtain an \(\mathbf{N}\) by solving the least-squares problem, minimizing only the 2-norm as in [28]. That approach, however, was not aimed at the identification of the matrix \(\mathbf{N}\); the new approach presented here works toward the identification of bilinear systems.

> Can we identify the matrix \(\mathbf{N}\)?

The improvement relies on resolving the rank deficiency produced when the least-squares solution is computed without taking into account measurements off the diagonal of the second kernel \(H_{2}\). By filling in the least-squares problem in Eq. (37) with these extra equations, as Proposition 1 indicates, the problem upgrades to a full-rank inversion and the answer is affirmative.

Back to our introductory example, the rank of the least-squares problem is less than \(r^2=4\). So, we need to increase the rank. We take measurements (\(\le 4\)) off the diagonal of the second kernel by using the input \(u(t)=A_{1}\cos (\omega _{1}t)+B_{1}\cos (\omega _{2}t)\). Table 3 includes the theoretical and measured results.

The full-rank least-squares solution gave for the theoretical noise-free case and for the second-stage approximation the following results, respectively:

$$\begin{aligned} \tilde{\mathbf{N}}_t=\left[ \begin{array}{cc} -4.1542 &{} -2.0998\\ 3.236 &{} 1.1542 \end{array}\right] ,~\tilde{\mathbf{N}}=\left[ \begin{array}{cc} -4.1557 &{} -2.1084\\ 3.2284 &{} 1.1513 \end{array}\right] \end{aligned}$$
(45)

> Coordinate transformation

By transforming all the matrices to the same coordinate system as in [26], we obtain the following:

  • Noise-free case—exact identification

    $$\begin{aligned} \breve{\mathbf{A}}_t=\left[ \begin{array}{cc} -1.0 &{} -10.0\\ 10.0 &{} -1.0 \end{array}\right] ,~\breve{\mathbf{N}}_t=\left[ \begin{array}{cc} 1.0 &{} -2.0\\ 3.0 &{} -4.0 \end{array}\right] ,~\breve{\mathbf{B}}_t=\left[ \begin{array}{c} 1.0\\ 1.0 \end{array}\right] ,~ \breve{\mathbf{C}}_t=\left[ \begin{array}{c} 1.0 \\ 1.0 \end{array}\right] ^T. \end{aligned}$$
    (46)
  • Simulated case—approximated identification

    $$\begin{aligned} \breve{\mathbf{A}}=\left[ \begin{array}{cc} -1.0037 &{} -9.9941\\ 10.004 &{} -1.0059 \end{array}\right] ,~\breve{\mathbf{N}}=\left[ \begin{array}{cc} 0.99525 &{} -1.997\\ 3.006 &{} -3.9997 \end{array}\right] ,~\breve{\mathbf{B}}=\left[ \begin{array}{c} 0.99925\\ 1.0003 \end{array}\right] ,~ \breve{\mathbf{C}}=\left[ \begin{array}{c} 1.0 \\ 1.0 \end{array}\right] ^T. \end{aligned}$$
    (47)

Next, in Fig. 7, evaluation results for the linear and the second-order generalized transfer functions are presented.

Finally, time-domain simulations of each system are performed in Fig. 8 with a larger amplitude than the probing one.

Table 3 Measurements of the second kernel (off the diagonal)
Fig. 7 The identified first and second kernels with the second-stage approximation, in comparison with the theoretical kernels

Fig. 8 The evaluation of the models with order \(r=2\) performed with input \(u(t)=\cos (t),~t\in [0,20]\). The noise-free case has reached machine precision

Fig. 9 The first and the second kernel evaluations in comparison with the originals

Fig. 10 Time-domain simulation for the Burgers' equation example; the viscosity parameter \(\nu \) is set to 1 and the dimension of the semi-discretized model is chosen to be 420. A comparison among the identified/reduced bilinear model of order \(r=2\), the linear model, and the frequency-domain Loewner bilinear model is depicted. The input is chosen as \(u(t)=(1+2\cos (2\pi t))e^{-t},t\in [0,2.5],~u(t)=4\text {sawtooth}(8\pi t),t\in [2.5,3.75],~u(t)=0,t\in [3.75,5]\)

Example 2

(Time-domain reduction of the Burgers' equation) This example illustrates the bilinear modeling and reduction concepts proposed in [5] for the viscous Burgers' equation, here using time-domain simulations. We simulate the system with 40 measurements at \(\omega _{k}=j2\pi [0.1,0.2,\ldots ,4]\). We present the corresponding results with the initial system dimension \(n=420\) reduced by the proposed method to order \(r=2\), with the first normalized neglected singular value being \(\sigma _{3}/\sigma _{1}=4.6255\cdot 10^{-4}\). As the order was chosen as \(r=2\), the reduced bilinear matrix \(\tilde{\mathbf{N}}\) was computed using measurements at \(\omega _{1}=j2\pi [0.2,0.4]\) and \(\omega _{2}=j2\pi [0.3,0.6]\). In Fig. 9, evaluation results are presented.

Lastly, in Fig. 10, a time-domain simulation reveals that the proposed method can improve the accuracy by fitting a nonlinear model. Table 4 contains approximation results both in the frequency domain and in the time domain. For the example presented (dimension reduction from \(n=420\) to \(r=2\)), we offer a comparison of the newly proposed method (Time-LoewBil) with another method, i.e., the frequency-domain bilinear Loewner framework introduced in [5] (Freq-LoewBil). The common frequency grid was selected as described above, while the sampled values of the transfer functions (in the frequency domain) were corrupted with white noise. The noise magnitude was selected to match the noise level introduced by performing time-domain simulations with a time step of \(dt=1e-4\).

Table 4 Summary of the results from the two examples with Time-LoewBil and comparison with [5] for Burgers’ Example 2 of dimension \(n=420\)

Remark 4

(Computational cost for the discretized Burgers' model of dimension 420) The proposed time-domain Loewner bilinear method uses measurements corresponding to symmetric transfer functions. Such values can be directly inferred from time-domain data by processing the spectral domain, i.e., by computing the FFT of the observed output signals for oscillatory input signals. All experiments were performed on a computer with 12 GB RAM and an Intel(R) Core(TM) i7-10510U CPU running at 1.80 GHz, 2304 Mhz, 4 Cores, 8 Logical Processors. To simulate a system of dimension 420, each measurement took \({\sim }3\) min. So, the data acquisition cost was in the range of 1 to 2 h, whereas the identification/reduction part was almost immediate. The proposed method performs efficiently for moderate dimensions; for large-scale problems, the computational issues that appear belong to the class of “embarrassingly parallel” tasks; as the simulations are independent of each other, one can easily speed up the whole process by using parallel clusters.

Remark 5

(Discussion and comparison between the two methods) In what follows, we will state the pluses and minuses of the two methods applied for the second numerical example. The frequency Loewner bilinear framework (Freq-LoewBil)

  • Pluses: recovers the original bilinear system with high accuracy, incorporates linear and nonlinear transfer function measurements in a coupled way (“all at once”), can be easily extended to cope with higher order regular kernels, can also be viewed as a Petrov-Galerkin projection-based moment-matching approach.

  • Minuses: It is not completely clear how to measure/obtain the frequency-domain data needed for this method; it uses measurements of regular transfer functions which cannot be (directly) inferred from time-domain simulations.

The time-Loewner bilinear framework (Time-LoewBil)

  • Pluses: It uses measurements corresponding to symmetric transfer functions. Such values can be directly inferred from time-domain data by processing the spectral domain, i.e., by computing the FFT of the observed output signals for oscillatory input signals.

  • Minuses: The fitted bilinear model is as good as the fitted linear model (it relies on the linear fit). As opposed to the first method, it fits the linear and nonlinear parts separately (not “all at once”). It introduces additional errors due to conversion from the time domain to the frequency domain. The latter disadvantage could also occur for the method in [5], provided that “regular transfer function” measurements could be successfully inferred from time-domain data.

6 Conclusion

The proposed method offers approximate bilinear system identification from time-domain measurements, since it is not possible to measure the corresponding kernels exactly. An adaptive scheme that improves the estimation of the kernels was presented. Our proposed method uses only input-output measurements without requiring state-space access. What makes this algorithm feasible is the combination of the data-driven Loewner framework with the nonlinear Volterra series framework.

We have shown that, for the noise-free case, the proposed method achieves system identification from time-domain measurements through the symmetric kernels. Further study is required to quantify the effects of the noise introduced by the truncation of the Volterra series (in the \(\ell \)-stage approximation). All the time-domain numerical simulations have been implemented by means of the backward Euler approximation scheme, which confirms that the method can handle some level of numerical noise. Higher-order time-stepping methods, e.g., Runge-Kutta, can offer a significant improvement of the results and reduce the influence of numerical noise.

The variational approach is a theoretical method to derive regular kernels, which are appropriate for system identification purposes [35]. However, these kernels do not have a direct physical meaning, i.e., they cannot be directly measured from time-domain simulations. This is not an issue for the growing exponential approach: the transfer functions derived by this method can be measured from time-domain data. The difficulty in combining both derivations, i.e., symmetric and regular, is also explained by the n-dimensional integral that connects them through the triangular kernels. Extensions to the MIMO case and to other nonlinearity structures, e.g., quadratic or quadratic-bilinear, are promising endeavors that will be the subject of future research.