1 Introduction

Large-scale models arise in all branches of engineering and the sciences. For instance, spatial discretizations of partial differential equations (PDEs) using finite volume [127, 178] or finite element methods [57, 59], complex fluid dynamics models [40, 129, 170], VLSI systems [125, 173], disease models [147], and micro- or nano-electro-magnetic systems [113, 142] are all systems of scientific interest that are large-scale in nature. The need for higher accuracy demands finer detail in these models, which inevitably increases the computational complexity. Furthermore, such systems must often be simulated repeatedly for optimization and control, placing an untenable burden on computational resources. Model order reduction (MOR) aims to replace such high-fidelity systems with much cheaper models that accurately replicate the input–output behavior of the original system while preserving dominant properties such as stability and passivity [1, 10, 49, 119]. The resulting surrogate models can then be used reliably to carry out intensive simulations and to develop faster controllers for real-time applications [5, 61, 107, 131, 135, 146, 155].

One of the most successful classes of MOR methods for reducing large-scale systems is the interpolatory, or moment-matching, method [7, 9, 10, 12]. These methods produce a reduced-order model that interpolates the full-order system’s response at predefined interpolation points. Their success stems mainly from their numerical stability and from the fact that they avoid dense matrix transformations. These methods are also closely related to rational Krylov subspace methods. Excellent reviews can be found in Refs. [21, 28, 65, 68, 69, 72].

This review aims to provide a complete understanding of moment-matching based model reduction methods and presents the most significant developments in this direction. To introduce the problem to first-time readers, we begin with moment-matching methods for linear systems in Sect. 2, where we discuss moment-matching techniques in the frequency and time domains. We also explain the tangential interpolation problem for multiple-input multiple-output systems and the notion of moments in second-order dynamical systems. We then point out some open issues in these methods and survey recent advances in this direction. In Sect. 3, we present a comprehensive moment-matching based model reduction technique for nonlinear systems, based on the recently developed notion of steady-state interpolation. To illustrate the idea, we present numerical results in which a reduced model captures the highly oscillatory dynamics of a MEMS switch. We also discuss some of the prominent nonlinear MOR techniques based on Krylov subspace methods and present some open problems. Finally, we provide concluding remarks.

Throughout this manuscript, we study MOR via moment matching for a general class of continuous-time systems described in the state-space representation:

$$\begin{aligned} \frac{d}{dt}\varvec{h}(\varvec{x}(t))&=\varvec{f} (\varvec{x}(t))+\varvec{Bu}(t)\quad \varvec{x}(0)=\varvec{x_{0}} \end{aligned}$$
(1a)
$$\begin{aligned} \varvec{y}(t)&=\varvec{C}\varvec{x}(t) \end{aligned}$$
(1b)

where \(\varvec{x}(t) \in \mathbb {R}^{n}\) is known as the state vector in control-system parlance; it is the vector of unknowns, whose entries are called internal variables or state variables. Matrices \(\varvec{B}\in \mathbb {R}^{n \times m}\) and \(\varvec{C}\in \mathbb {R}^{p \times n}\) are the input and output matrices respectively, whereas \(\varvec{f},\varvec{h} : \mathbb {R}^{n} \rightarrow \mathbb {R}^n\) are two nonlinear mappings. Vectors \(\varvec{u}(t) \in \mathbb {R}^{m}\) and \(\varvec{y}(t) \in \mathbb {R}^{p}\) are the inputs and outputs of the system. System (1) represents a multiple-input multiple-output (MIMO) system with \(m,p\ll n\). For single-input single-output (SISO) systems, \(m=p=1\), the matrices \(\varvec{B}\), \(\varvec{C}\) become vectors \(\varvec{b}\) and \(\varvec{c}^T\), and the vectors \(\varvec{u}\) and \(\varvec{y}\) become scalars u and y, respectively. Furthermore, if \(\varvec{f}(\cdot )\) and \(\varvec{h}(\cdot )\) are linear mappings, then system (1) is called a linear dynamical system, and otherwise a nonlinear (control-affine) system. Thus, system (1) essentially represents a large-scale initial value problem that we are interested in solving over some predefined time span, in the presence of an excitation function \(\varvec{u}(t)\).

The most common approach to obtaining a reduced-order model of system (1) is via projection [39, 104, 171, 180], i.e., by projecting the dynamics of the full-order model onto a lower-dimensional manifold with far fewer degrees of freedom \((r\ll n)\), followed by re-projecting the reduced dynamics onto the original space to obtain the approximation. The approximation ansatz is \(\varvec{x}(t)=\varvec{Vx_{r}}(t)\), with \(\varvec{x_{r}}(t) \in \mathbb {R}^{r}\), where the columns of \(\varvec{V} \in \mathbb {R}^{n \times r}\) form a basis for the reduced subspace. Substituting this approximation into Eq. (1) yields the over-determined system of equations:

$$\begin{aligned} \frac{d}{dt}\varvec{h}(\varvec{Vx_{r}}(t))&=\varvec{f} (\varvec{Vx_{r}}(t))+\varvec{Bu}(t)+\varvec{r}(t) \end{aligned}$$
(2a)
$$\begin{aligned} \varvec{y}(t)&=\varvec{C}\varvec{Vx_{r}}(t) \end{aligned}$$
(2b)

where \(\varvec{r}(t)\) is the residual vector due to the subspace approximation. In order to make this a well-posed problem, the Petrov-Galerkin condition is then imposed, i.e., \(\varvec{W}^{T}\varvec{r}(t)=0\). This gives:

$$\begin{aligned} \frac{d}{dt}\varvec{W^{T}h}(\varvec{Vx_{r}}(t))&=\varvec{W^{T}f}(\varvec{Vx_{r}}(t))+\varvec{W^{T}Bu}(t) \end{aligned}$$
(3a)
$$\begin{aligned} \varvec{y}(t)&=\varvec{C}\varvec{Vx_{r}}(t) \end{aligned}$$
(3b)

where it is usually required that \(\varvec{W^{T}V}=\varvec{I}\), so that \(\varvec{VW^{T}}\) is a projector onto an r-dimensional subspace. The various reduction methods differ in the choice of the projection matrices \(\varvec{V}\) and \(\varvec{W}\); the common aim is that the response of the reduced model Eq. (3) be “close” to that of the high-fidelity model Eq. (1).
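To make the projection mechanics concrete, the following Python/NumPy sketch assembles the reduced matrices for the linear special case of Eq. (3), using a Galerkin choice \(\varvec{W}=\varvec{V}\); the random matrices and the random basis are purely illustrative, not a recommended choice of subspace.

```python
import numpy as np

# Sketch of Galerkin projection (W = V) for a linear system
# E x' = A x + B u, y = C x. Matrix names follow the text; the
# random test data is illustrative only.
rng = np.random.default_rng(0)
n, m, p, r = 50, 2, 2, 10

A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))  # stable-ish system matrix
E = np.eye(n)
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

# Any orthonormal V illustrates the mechanics; here a random one via QR.
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
W = V  # one-sided (Galerkin) choice, so W^T V = I automatically

Er, Ar = W.T @ E @ V, W.T @ A @ V   # reduced descriptor and system matrices
Br, Cr = W.T @ B, C @ V             # reduced input and output matrices

assert np.allclose(W.T @ V, np.eye(r))  # W^T V = I as required in the text
assert Ar.shape == (r, r) and Br.shape == (r, m) and Cr.shape == (p, r)
```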

In what follows, we discuss the moment-matching technique in linear time-invariant systems followed by nonlinear systems.

2 Moment Matching for Linear-Time Invariant Systems

Many mathematical and physical systems can be modeled as linear time-invariant (LTI) systems. To introduce the idea of MOR via moment matching, consider a large-scale system of differential equations resulting from the spatial discretization of PDEs, represented in the continuous-time state-space form:

$$\begin{aligned} \varSigma _{L}:{\left\{ \begin{array}{ll} \varvec{E}\dot{\varvec{x}}(t)=\varvec{Ax}(t)+\varvec{Bu}(t) \\ \quad \varvec{y}(t)=\varvec{C}\varvec{x}(t) \end{array}\right. } \end{aligned}$$
(4)

where n is the order of the state-space model (\(\varSigma _{L}\)), i.e., the states \(\varvec{x}(t) \in \mathbb {R}^{n}\) span an n-dimensional space, \(\varvec{E} \in \mathbb {R}^{n \times n}\) is the descriptor matrix, \(\varvec{A} \in \mathbb {R}^{n\times n}\) is the system matrix, \(\varvec{B} \in \mathbb {R}^{n\times m}\) is the input matrix, and \(\varvec{C} \in \mathbb {R}^{p\times n}\) is the output matrix. The LTI system in Eq. (4) is called regular if the matrix \(\varvec{E}\) is nonsingular, and a descriptor system otherwise. When \(\varvec{E}\) is singular, Eq. (4) represents a system of differential-algebraic equations (DAEs) rather than ordinary differential equations (ODEs), which is more difficult to solve.

For LTI systems considered here, the impulse response is given as:

$$\begin{aligned} {\hbar }(t)=\varvec{C}e^{(\varvec{E}^{-1}\varvec{A})t}\varvec{E}^{-1}\varvec{B} \end{aligned}$$
(5)

and the frequency domain transfer function, which is a \(p \times m\) matrix-valued rational function, is obtained by taking the Laplace transform of the impulse response given as:

$$\begin{aligned} \varvec{H}(s)&=\mathcal {L}({\hbar }(t))=\int _{0}^{\infty }{\hbar }(t)e^{-st}dt\\ {}&=\varvec{C}(s\varvec{E}-\varvec{A})^{-1}\varvec{B} \end{aligned}$$
(6)

which can also be written as

$$\begin{aligned} \varvec{H}(s)=-\varvec{C}(\varvec{I}-s\varvec{A}^{-1}\varvec{E})^{-1}\varvec{A}^{-1}\varvec{B} \end{aligned}$$
(7)
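The equivalence of the two forms in Eqs. (6) and (7) follows from factoring \(s\varvec{E}-\varvec{A}=-\varvec{A}(\varvec{I}-s\varvec{A}^{-1}\varvec{E})\), and is easy to confirm numerically. The following sketch uses small random matrices (illustrative only):

```python
import numpy as np

# Check that the transfer-function forms in Eqs. (6) and (7) agree
# on a small random example (illustrative data only).
rng = np.random.default_rng(1)
n = 8
A = rng.standard_normal((n, n)) - 5 * np.eye(n)   # shift keeps A invertible
E = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

s = 0.7
H1 = C @ np.linalg.solve(s * E - A, B)                        # Eq. (6)
H2 = -C @ np.linalg.solve(np.eye(n) - s * np.linalg.solve(A, E),
                          np.linalg.solve(A, B))              # Eq. (7)
assert np.allclose(H1, H2)
```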

Using the Neumann series, the inverse in the above expression can be expanded as:

$$\begin{aligned} (\varvec{I}-s\varvec{A}^{-1}\varvec{E})^{-1}=\sum _{i=0}^{\infty }(\varvec{A}^{-1}\varvec{E}s)^{i} \end{aligned}$$
(8)

so that \(\varvec{H}(s)\) admits the Taylor expansion

$$\begin{aligned} \varvec{H}(s)&=-\varvec{CA}^{-1}\varvec{B}-\varvec{C}\varvec{A}^{-1}\varvec{E}\varvec{A}^{-1}\varvec{B}s- \cdots \\&\quad -\varvec{C}(\varvec{A}^{-1}\varvec{E})^{i}\varvec{A}^{-1}\varvec{B}s^{i}-\cdots \end{aligned}$$
(9)

Definition 1

[7] The moments of the LTI system Eq. (4) about the expansion point \(s=0\) are the negatives of the coefficients of the Taylor series Eq. (9) expanded about \(s=0\), and are given as

$$\begin{aligned} \varvec{\eta }_{i}=\varvec{C}(\varvec{A}^{-1}\varvec{E})^{i}\varvec{A}^{-1}\varvec{B}\quad i=0,1,\ldots \end{aligned}$$
(10)

The moments of system (4) at the point \(s=0\) are thus, up to scaling, the successive derivatives of the transfer function \(\varvec{H}(s)\), i.e.,

$$\begin{aligned} \varvec{\eta }_{i}^{0}=-\frac{1}{i!}\frac{d^{i}\varvec{H}(s)}{ds^{i}}\bigg |_{s=0} \end{aligned}$$
(11)

If the expansion is performed about a point other than \(s=0\), e.g. at \(s=\sigma\), then the moments of (4) about \(s=\sigma\) are computed by replacing the matrix \(\varvec{A}\) with \((\varvec{A}-\sigma \varvec{E})\) in Eq. (10), i.e.,

$$\begin{aligned} \varvec{\eta }_{i}^{\sigma }= \varvec{C}((\varvec{A}-\sigma \varvec{E})^{-1}\varvec{E})^{i}(\varvec{A}-\sigma \varvec{E})^{-1}\varvec{B} \end{aligned}$$
(12)

for \(i=0,1,\ldots\) and where it is assumed that the matrix pencil \((\varvec{A}-\sigma \varvec{E})\) is nonsingular.
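The shifted moments of Eq. (12) require only solves with \((\varvec{A}-\sigma \varvec{E})\), and the derivative characterization implies \(\varvec{\eta }_{0}^{\sigma }=-\varvec{H}(\sigma )\) and \(\varvec{\eta }_{1}^{\sigma }=-\varvec{H}'(\sigma )\). A sketch with illustrative random data (SISO for simplicity):

```python
import numpy as np

# Moments about a shift sigma, per Eq. (12):
#   eta_i = C ((A - sigma E)^{-1} E)^i (A - sigma E)^{-1} B.
# We check eta_0 = -H(sigma) and eta_1 = -H'(sigma) (finite difference).
rng = np.random.default_rng(2)
n = 10
A = rng.standard_normal((n, n)) - 4 * np.eye(n)
E = np.eye(n)
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

def H(s):
    return (C @ np.linalg.solve(s * E - A, B)).item()

sigma = 1.5
K = A - sigma * E
eta0 = (C @ np.linalg.solve(K, B)).item()
eta1 = (C @ np.linalg.solve(K, E @ np.linalg.solve(K, B))).item()

assert np.isclose(eta0, -H(sigma))
h = 1e-6
dH = (H(sigma + h) - H(sigma - h)) / (2 * h)   # central difference
assert np.isclose(eta1, -dH, rtol=1e-4, atol=1e-8)
```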

Definition 2

[7] The coefficients of the expansion of the transfer function \(\varvec{H}(s)\) about \(s=\infty\) (i.e., in powers of \(s^{-1}\)) are defined as the Markov parameters of the LTI system (4) and are given as:

$$\begin{aligned} \varvec{M}_{i}=\varvec{C}(\varvec{E}^{-1}\varvec{A})^{i}\varvec{E}^{-1}\varvec{B} \quad i=0,1,\ldots \end{aligned}$$
(13)

Furthermore, the ith Markov parameter equals the ith derivative of the impulse response of system (4) at \(t=0\), i.e.,

$$\begin{aligned} \varvec{M}_{i}=\frac{d^{i}{\hbar }(t)}{dt^{i}}\bigg |_{t=0} \end{aligned}$$
(14)

This implies that the first Markov parameter \(\varvec{M}_{0}\) is the system’s impulse response at \(t=0\).
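The Markov parameters of Eq. (13) can be generated directly from \(\varvec{E}^{-1}\varvec{A}\) and \(\varvec{E}^{-1}\varvec{B}\); the following sketch (illustrative random data) also checks the \(t=0\) interpretation:

```python
import numpy as np

# Markov parameters per Eq. (13): M_i = C (E^{-1}A)^i E^{-1}B.
# M_0 should equal the impulse response C exp((E^{-1}A)t) E^{-1}B at t = 0,
# since exp(0) = I. Illustrative random data only.
rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n))
E = np.eye(n) + 0.05 * rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

Einv_A = np.linalg.solve(E, A)
Einv_B = np.linalg.solve(E, B)
M = [C @ np.linalg.matrix_power(Einv_A, i) @ Einv_B for i in range(3)]

# impulse response at t=0 is C E^{-1} B, i.e. M_0
assert np.allclose(M[0], C @ Einv_B)
# M_1 is the first time-derivative of the impulse response at t=0
assert np.allclose(M[1], C @ Einv_A @ Einv_B)
```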

2.1 Moment Matching in Frequency Domain

Model reduction via moment matching seeks a reduced-order model whose moments match those of the full-order model at certain frequencies of interest. If the moments are matched about \(s=0\), the reduced-order model is called a Padé approximant, and the problem is known as Padé approximation. If the moments are matched at some other point \(s=\sigma\), the problem is known as shifted Padé approximation, and when the Markov parameters are matched, the problem is termed partial realization. When it is desired to match moments at more than one expansion point, i.e., over specific frequency intervals, we speak of the multipoint Padé or rational interpolation problem. Thus, model reduction via moment matching amounts to constructing a reduced-order model (ROM) given as:

$$\begin{aligned} \varSigma _{LR}: {\left\{ \begin{array}{ll} \varvec{E_{r}}\dot{\varvec{x}_{r}}(t)=\varvec{A_{r}x_{r}}(t)+\varvec{B_{r}u}(t) \\ \quad \varvec{y_{r}}(t)=\varvec{C_{r}}\varvec{x_{r}}(t) \end{array}\right. } \end{aligned}$$
(15)

where \(\varvec{x_{r}}(t) \in \mathbb {R}^{r}\) with \(r\ll n\), and where the reduced system matrices are calculated using the projector \(\varvec{VW}^{T}\) as:

$$\begin{aligned} \varvec{E_{r}}&=\varvec{W}^{T}\varvec{EV},\quad \varvec{A_{r}}=\varvec{W}^{T}\varvec{AV}\\ \varvec{B_r}&=\varvec{W}^{T}\varvec{B},\quad \varvec{C_{r}}=\varvec{CV} \end{aligned}$$
(16)

such that the moments of \(\varvec{H}(s)\) match those of the reduced system’s transfer function \(\varvec{H_{r}}(s)\), where

$$\begin{aligned} \varvec{H_{r}}(s)=-\varvec{C_{r}}(\varvec{I}-s\varvec{A_{r}}^{-1}\varvec{E_{r}})^{-1}\varvec{A_{r}}^{-1}\varvec{B_{r}} \end{aligned}$$
(17)

This raises the question of how to select the expansion points and how to obtain the projection matrices \(\varvec{V}\) and \(\varvec{W}\). This is explained next.

2.1.1 SISO Case

For SISO systems, Eq. (4) simplifies to

$$\begin{aligned} \varSigma _{\text {SISO}}: {\left\{ \begin{array}{ll} \varvec{E}\dot{\varvec{x}}(t)=\varvec{Ax}(t)+\varvec{bu}(t) \\ \quad \varvec{y}(t)=\varvec{c}^{T}\varvec{x}(t) \end{array}\right. } \end{aligned}$$
(18)

and its ROM via Eq. (16) is given as:

$$\begin{aligned} \varvec{W}^{T}\varvec{EV}\dot{\varvec{x_{r}}}(t)&=\varvec{W}^{T}\varvec{AV}\varvec{x_{r}}(t)+\varvec{W}^{T}\varvec{bu}(t) \\ \quad \varvec{y_{r}}(t)&=\varvec{c}^{T}\varvec{Vx_{r}}(t) \end{aligned}$$
(19)

where \(\varvec{W}^{T}\varvec{b} \in \mathbb {R}^{r}\) and \(\varvec{c}^{T}\varvec{V} \in \mathbb {R}^{r}\).

Theorem 1

[7, 71, 84] In order to match r moments between the full-order system (18) and reduced model (19) at the expansion point \(s=0\), it is required that the columns of projection matrix \(\varvec{V}\) used in Eq. (16) form a basis for the Krylov subspace \(\mathcal {K}_{r}(\varvec{A}^{-1}\varvec{E},\varvec{A}^{-1}\varvec{b})\). Furthermore the matrix \(\varvec{W}\) is selected such that the matrix \(\varvec{A_{r}}\) is nonsingular.

Remark 1

The subspace \(\mathcal {K}_{r}(\varvec{A}^{-1}\varvec{E},\varvec{A}^{-1}\varvec{b})\) is known as input Krylov subspace and the reduced scheme is known as the one-sided Krylov subspace method.

A typical choice in the one-sided Krylov subspace method is \(\varvec{W}=\varvec{V}\). This has the advantage of preserving stability and passivity of the reduced model for certain classes of large-scale models.
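Theorem 1 can be sketched numerically as follows: build a basis of \(\mathcal {K}_{r}(\varvec{A}^{-1}\varvec{E},\varvec{A}^{-1}\varvec{b})\), orthonormalize it, take \(\varvec{W}=\varvec{V}\), and compare the first r moments of the full and reduced models. The random test matrices below are illustrative only.

```python
import numpy as np

# One-sided Krylov sketch of Theorem 1: V spans K_r(A^{-1}E, A^{-1}b),
# W = V; the first r moments about s = 0 should then coincide.
rng = np.random.default_rng(4)
n, r = 30, 5
A = rng.standard_normal((n, n)) - 6 * np.eye(n)
E = np.eye(n) + 0.1 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
c = rng.standard_normal(n)

# Build the Krylov sequence and orthonormalize it via QR
v = np.linalg.solve(A, b)
K = [v]
for _ in range(r - 1):
    v = np.linalg.solve(A, E @ v)
    K.append(v)
V, _ = np.linalg.qr(np.column_stack(K))
W = V

Er, Ar = W.T @ E @ V, W.T @ A @ V
br, cr = W.T @ b, V.T @ c

def moments(A_, E_, b_, c_, k):
    # eta_i = c^T (A^{-1}E)^i A^{-1} b, i = 0..k-1
    x = np.linalg.solve(A_, b_)
    out = []
    for _ in range(k):
        out.append(c_ @ x)
        x = np.linalg.solve(A_, E_ @ x)
    return np.array(out)

assert np.allclose(moments(A, E, b, c, r), moments(Ar, Er, br, cr, r))
```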

Theorem 2

[7, 71, 84] In order to match 2r moments between the full-order system (18) and reduced model (19) at the expansion point \(s=0\), it is required that the columns of projection matrices \(\varvec{V}\)and \(\varvec{W}\) used in Eq. (16) form the basis for the Krylov subspaces given as \(\mathcal {K}_{r}(\varvec{A}^{-1}\varvec{E},\varvec{A}^{-1}\varvec{b})\) and \(\mathcal {K}_{r}(\varvec{A}^{-T}\varvec{E}^{T},\varvec{A}^{-T}\varvec{c})\) respectively, where \(\varvec{A}\) and \(\varvec{A_{r}}\) are assumed to be invertible.

This reduction scheme is known as the two-sided Krylov subspace method; it matches twice as many moments as the one-sided method.

Theorem 3

[7, 71, 84] In order to match 2r moments between the full-order system (18) and reduced model (19) at the expansion point \(s=\sigma\), it is required that the columns of projection matrices \(\varvec{V}\)and \(\varvec{W}\) used in Eq. (16) form the basis for the Krylov subspaces given as \(\mathcal {K}_{r}((\varvec{A}-\sigma \varvec{E})^{-1}\varvec{E},\varvec{(}{A}-\sigma \varvec{E})^{-1}\varvec{b})\) and \(\mathcal {K}_{r}((\varvec{A}-\sigma \varvec{E})^{-T}\varvec{E}^{T},(\varvec{A}-\sigma \varvec{E})^{-T}\varvec{c})\) respectively, where \(\varvec{A}\) and \(\varvec{A_{r}}\) are assumed to be invertible.

Remark 2

This is achieved by substituting the matrix \(\varvec{A}\) by \((\varvec{A}-\sigma \varvec{E})\) in the respective Krylov subspaces as explained earlier.

Sometimes one is interested in capturing the high-speed dynamics of the system at hand, which is achieved by matching moments at high frequencies (\(s \rightarrow \infty\)), i.e., by matching some of the Markov parameters.

Theorem 4

Let \(m_{1} \in \mathbb {Z}\) with \(0 \le m_{1} \le r\). Selecting the matrix \(\varvec{V}\) used in Eq. (16) as a basis for the Krylov subspace \(\mathcal {K}_{r}(\varvec{A}^{-1}\varvec{E},(\varvec{E}^{-1}\varvec{A})^{m_{1}}\varvec{A}^{-1}\varvec{b})\) matches the first \(m_{1}\) Markov parameters and the first \(r-m_{1}\) moments of systems (18) and (19), respectively.

Similarly, the matrix \(\varvec{W}\) is chosen such that \(\varvec{A_{r}}\) and \(\varvec{E_{r}}\) are nonsingular. It follows immediately that the number of matched terms can be doubled by selecting suitable input and output Krylov subspaces.

Table 1 Various examples of MOR by moment matching in SISO systems

Theorem 5

Let \(m_{1},m_{2} \in \mathbb {Z}\) with \(0\le m_{1},m_{2}\le r\). Selecting the matrices \(\varvec{V}\) and \(\varvec{W}\), used in Eq. (16), as bases for the Krylov subspaces \(\mathcal {K}_{r}(\varvec{A}^{-1}\varvec{E},(\varvec{E}^{-1}\varvec{A})^{m_{1}}\varvec{A}^{-1}\varvec{b})\) and \(\mathcal {K}_{r}(\varvec{A}^{-T}\varvec{E}^{T},(\varvec{E}^{-T}\varvec{A})^{m_{2}}\varvec{A}^{-T}\varvec{c})\) respectively, matches the first \(m_{1}+m_{2}\) Markov parameters and the first \(2r-m_{1}-m_{2}\) moments of systems (18) and (19), respectively.

Now, if the aim is to match moments at multiple expansion points: \(\sigma _{1},\sigma _{2},\ldots ,\sigma _{k}\), then k different Krylov subspaces are to be constructed. In this case, the projection matrix is obtained by the union of all the respective Krylov subspaces such that a universal basis is found.

Theorem 6

The first \(r_{i}\) moments about \(\sigma _{i}\) are matched between systems (18) and (19) respectively, by selecting the matrix \(\varvec{V}\), used in Eq. (16), as follows:

$$\begin{aligned} \bigcup _{i=1}^{k}\mathcal {K}_{r_{i}}((\varvec{A}-\sigma _{i}\varvec{E})^{-1}\varvec{E},(\varvec{A}-\sigma _{i}\varvec{E})^{-1}\varvec{b}) \subseteq colspan(\varvec{V}) \end{aligned}$$
(20)

where it is assumed that (\(\varvec{A}-\sigma _{i}\varvec{E}\)) and (\(\varvec{A_{r}}-\sigma _{i}\varvec{E_{r}}\)) are both nonsingular and \(\varvec{W}\) is an arbitrary full-rank matrix.

Similar to the previous cases, a two-sided method matches twice as many moments about each point \(\sigma _{i}\). All the theorems mentioned above are summarized in Table 1.
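The multipoint construction of Theorem 6 can be sketched by stacking one Krylov vector per shift and orthonormalizing; the reduced model then interpolates \(\varvec{H}(s)\) at every shift. Shifts and random data below are illustrative only.

```python
import numpy as np

# Multipoint sketch of Theorem 6 (one moment per shift, SISO case):
# columns (A - sigma_i E)^{-1} b span V; interpolation at each shift follows.
rng = np.random.default_rng(5)
n = 40
A = rng.standard_normal((n, n)) - 8 * np.eye(n)
E = np.eye(n)
b = rng.standard_normal(n)
c = rng.standard_normal(n)

shifts = [0.0, 1.0, 3.0]
cols = [np.linalg.solve(A - s * E, b) for s in shifts]
V, _ = np.linalg.qr(np.column_stack(cols))
W = V  # one-sided; a two-sided choice would double the moments per shift

Er, Ar = W.T @ E @ V, W.T @ A @ V
br, cr = W.T @ b, V.T @ c

def H(s):   # full transfer function
    return c @ np.linalg.solve(s * E - A, b)

def Hr(s):  # reduced transfer function
    return cr @ np.linalg.solve(s * Er - Ar, br)

for s in shifts:
    assert np.isclose(H(s), Hr(s))
```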

So far, it is not clear how to numerically obtain the projection matrices \(\varvec{V}\) and \(\varvec{W}\). An early attempt was the Asymptotic Waveform Evaluation (AWE) method [145], in which the moments were calculated explicitly rather than through the matrices \(\varvec{V}\) and \(\varvec{W}\). The method became prominent due to its ability to reduce RC interconnect models containing thousands of variables. Later, a multi-point version of the method was proposed in Ref. [53]. AWE-based methods, however, suffer from numerical instability: the explicitly computed moment vectors quickly become linearly dependent as they converge to a dominant eigenvector of \(\varvec{A}\). This shortcoming was first pointed out by Gallivan et al. [76] and later by Feldmann and Freund [62]. This led to the development of implicit moment-matching methods, the first significant contribution being the Arnoldi method [13, 69]. This method employs a one-sided projection to iteratively construct a set of orthonormal vectors satisfying \(\varvec{V}^{T}\varvec{V}=\varvec{I}\): at every iteration, a new vector is generated that is orthogonal to all previous ones. This results in an upper Hessenberg structure of the matrix \(\varvec{A_{r}}\), and the vector \(\varvec{b_{r}}\) becomes a multiple of the first unit vector. However, this algorithm can also generate a linearly dependent set of basis vectors for relatively large values of r. This is normally avoided by deflating the redundant columns of \(\varvec{V}\) to retain reduced models of a certain accuracy [54], as demonstrated by Salimbahrami [156] using a modified Gram-Schmidt orthogonalization scheme.

Another key contribution towards implicit moment matching was the two-sided Lanczos method [62, 117], also known as the Padé via Lanczos (PVL) method. This method constructs two sequences of basis vectors spanning the respective input and output Krylov subspaces and satisfying \(\varvec{W}^{T}\varvec{V}=\varvec{I}\), resulting in a tridiagonal structure of the matrix \(\varvec{A_{r}}\). Apart from matching moments, the method was initially proposed for obtaining reduced models based on the computation of eigenvalues [136]. Further work in this direction led to the partial realization via Lanczos method of Gragg and Lindquist [81]. The method was also extended to MIMO systems by Aliaga et al. [4] (see also Refs. [108, 109]). Boley [41] addressed the loss of biorthogonality in the classical Lanczos method. Later, new results appeared in the areas of stability retention [85], error analysis [101], and control [41, 174]. However, these studies did not present any new projection structure for rational interpolation.

To retain passivity in the reduced models, the passive reduced-order interconnect macromodelling algorithm (PRIMA) was proposed by Odabasioglu et al. [137] and demonstrated on linear RLC systems. To match moments at multiple expansion points, the rational Lanczos method and the dual Arnoldi method were proposed by Grimme et al. [84]. Furthermore, the issue of unstable partial realizations in classical Krylov methods was addressed by the restarting techniques proposed in Ref. [85] and by the implicitly restarted dual Arnoldi method of Jaimoukha and Kasenally [102].

2.1.2 MIMO Case

For the case of MIMO systems, (given in Eq. (4)), the block Krylov subspaces are defined as:

$$\begin{aligned} \mathcal {K}_{r}(\varvec{A},\varvec{B})=span(\varvec{B},\varvec{AB},\ldots ,\varvec{A}^{r-1}\varvec{B}) \end{aligned}$$
(21)

The block subspace for the m starting vectors (columns of \(\varvec{B}\)) can be viewed as the union of m Krylov subspaces, one for each starting vector [156]. Thus, by using block Krylov subspaces, all of the above theorems for matching moments/Markov parameters can be generalized to MIMO systems. For instance, the block versions \(\mathcal {K}_{r}(\varvec{A}^{-1}\varvec{E},\varvec{A}^{-1}\varvec{B})\) and \(\mathcal {K}_{r}(\varvec{A}^{-T}\varvec{E}^{T},\varvec{A}^{-T}\varvec{C}^{T})\) generalize Theorem 2. Consequently, \(\frac{r}{m}\) moments are matched in the one-sided method and \(\frac{r}{m}+\frac{r}{p}\) in the two-sided method.
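A block Krylov basis is obtained by iterating on whole blocks of columns instead of single vectors. The following sketch (illustrative random data) builds three blocks of \(\mathcal {K}(\varvec{A}^{-1}\varvec{E},\varvec{A}^{-1}\varvec{B})\) and checks that the zeroth block moment is matched:

```python
import numpy as np

# Block Krylov sketch for the MIMO case: A^{-1}B seeds m Krylov
# sequences at once; each iteration appends a whole n-by-m block.
rng = np.random.default_rng(6)
n, m, p = 30, 2, 2
A = rng.standard_normal((n, n)) - 6 * np.eye(n)
E = np.eye(n)
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

blocks = [np.linalg.solve(A, B)]
for _ in range(2):  # three blocks in total -> reduced order r = 3*m
    blocks.append(np.linalg.solve(A, E @ blocks[-1]))
V, _ = np.linalg.qr(np.hstack(blocks))
W = V

Ar, Er = W.T @ A @ V, W.T @ E @ V
Br, Cr = W.T @ B, C @ V

# zeroth block moment matches: C A^{-1} B = Cr Ar^{-1} Br
assert np.allclose(C @ np.linalg.solve(A, B),
                   Cr @ np.linalg.solve(Ar, Br))
```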

2.1.3 Tangential Interpolation Problem

The notion of “interpolation” for MIMO systems implies that the interpolating matrix-valued rational function \(\varvec{H_{r}}(s)\) matches the original function \(\varvec{H}(s)\) to within some predefined error. This would, in effect, require \(p\times m\) interpolation conditions for each interpolation point, resulting in a large reduced order r even for moderate input and output dimensions m and p. Thus, for MIMO systems, it is desirable that the interpolating function match the original function only along certain specific, or tangent, directions. Consequently, this involves selecting interpolation directions as well as interpolation points.

Tangential interpolation based moment matching for MIMO systems was presented in many studies such as in Refs. [7, 11, 21, 62, 71, 76, 139]. It is the extension of rational Krylov methods to MIMO systems when all tangent directions are the same. The tangential interpolation problem for descriptor systems was also proposed by Gugercin et al. [91] and for index one descriptor system by Antoulas et al. [11].

Definition 3

Let \(\varvec{w}_{i} \in \mathbb {C}^{m}\) be a (non-trivial) right tangent direction. We define \(\varvec{H_{r}}(s)\) to be a right tangential interpolant of \(\varvec{H}(s)\) at \(s=\sigma _{i}\) along \(\varvec{w}_{i}\) if

$$\begin{aligned} \varvec{H}(\sigma _{i})\varvec{w}_{i}=\varvec{H_{r}}(\sigma _{i})\varvec{w}_{i} \end{aligned}$$
(22)

Similarly, the left tangential interpolant is defined as:

Definition 4

Let \(\varvec{v}_{i} \in \mathbb {C}^{p}\) be a (non-trivial) left tangent direction. Then we define \(\varvec{H_{r}}(s)\) to be a left tangential interpolant of \(\varvec{H}(s)\) at \(s=\mu _{i}\) along \(\varvec{v}_{i}\) if

$$\begin{aligned} \varvec{v}_{i}^{T}\varvec{H}(\mu _{i})=\varvec{v}_{i}^{T}\varvec{H_{r}}(\mu _{i}) \end{aligned}$$
(23)

Remark 3

Given a set of r left interpolation points \(\{ \mu _{i}\}_{i=1}^{r}\), r left tangential directions \(\{\varvec{v}_{i}\}_{i=1}^{r}\), r right interpolation points \(\{ \sigma _{i}\}_{i=1}^{r}\), and r right tangential directions \(\{\varvec{w}_{i}\}_{i=1}^{r}\), we can formulate the model reduction problem via tangential interpolation as finding a degree-r reduced transfer function \(\varvec{H_{r}}(s)\) such that Eqs. (22) and (23) hold for \(i=1,2,\ldots ,r\).

Definition 5

Given a set of r left and r right tangential directions as \(\varvec{v}_{i}\) and \(\varvec{w}_{i}\) respectively, we define \(\varvec{H_{r}}(s)\) to be a bitangential Hermite interpolant of \(\varvec{H}(s)\), if \(\varvec{H_{r}}(s)\) satisfies both Eqs. (22) and (23). In addition to this, it is required that

$$\begin{aligned} \varvec{v}_{i}^{T}\frac{d\varvec{H}(s)}{ds}\bigg |_{s=\sigma _{i}}\varvec{w}_{i}=\varvec{v}_{i}^{T}\frac{d\varvec{H_{r}}(s)}{ds}\bigg |_{s=\sigma _{i}}\varvec{w}_{i} \end{aligned}$$
(24)

holds for \(i=1,2,\ldots ,r\).

Theorem 7

[77] If the columns of matrices \(\varvec{V}\) and \(\varvec{W}\) used in Eq. (16) are selected as:

$$\begin{aligned}&(\sigma \varvec{E}-\varvec{A})^{-1}\varvec{Bw}\in Ran(\varvec{V}) \end{aligned}$$
(25a)
$$\begin{aligned}&(\varvec{v}^{T}\varvec{C}(\mu \varvec{E}-\varvec{A})^{-1})^{T}\in Ran(\varvec{W}) \end{aligned}$$
(25b)

then the following tangential interpolation conditions are satisfied

$$\begin{aligned} \varvec{H}(\sigma )\varvec{w}&=\varvec{H_{r}}(\sigma )\varvec{w} \end{aligned}$$
(26a)
$$\begin{aligned} \varvec{v}^{T}\varvec{H}(\mu )&=\varvec{v}^{T}\varvec{H_{r}}(\mu ) \end{aligned}$$
(26b)

with the assumption that \((\sigma \varvec{E}-\varvec{A})\) and \((\mu \varvec{E}-\varvec{A})\) are invertible. Furthermore, if Eqs. (25) hold with \(\sigma =\mu\), then the bitangential Hermite condition, i.e.,

$$\begin{aligned} \varvec{v}^{T}\frac{d\varvec{H}(s)}{ds}\bigg |_{s=\sigma }\varvec{w}=\varvec{v}^{T}\frac{d\varvec{H_{r}}(s)}{ds}\bigg |_{s=\sigma }\varvec{w} \end{aligned}$$
(27)

is satisfied as well.

Remark 4

Theorem 7 shows that a left or right tangential interpolation condition can be enforced without explicitly computing the values that are interpolated.

The idea can be easily extended to the case of r interpolation points, i.e., given a set of r left interpolation points \(\{ \mu _{i}\}_{i=1}^{r}\), r right interpolation points \(\{ \sigma _{i}\}_{i=1}^{r}\), r left tangent directions \(\{\varvec{v}_{i}\}_{i=1}^{r}\), and r right tangent directions \(\{\varvec{w}_{i}\}_{i=1}^{r}\), we can construct the matrices \(\varvec{V}\) and \(\varvec{W}\) that satisfy the Lagrange tangential interpolation conditions in Eqs. (22) and (23), and also the bitangential Hermite interpolation condition in Eq. (24) for \(\sigma _{i}=\mu _{i}\), as follows:

$$\begin{aligned} {[}(\sigma _{1}\varvec{E}-\varvec{A})^{-1}\varvec{B}\varvec{w}_{1},\ldots ,(\sigma _{r}\varvec{E}-\varvec{A})^{-1}\varvec{B}\varvec{w}_{r}]= \varvec{V} \end{aligned}$$
(28)

and

$$\begin{aligned} {[}(\mu _{1}\varvec{E}-\varvec{A})^{-T}\varvec{C}^{T}\varvec{v}_{1},\ldots ,(\mu _{r}\varvec{E}-\varvec{A})^{-T}\varvec{C}^{T}\varvec{v}_{r}]=\varvec{W} \end{aligned}$$
(29)
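The constructions in Eqs. (28) and (29) use one linear solve per interpolation pair, and the resulting Petrov-Galerkin projection interpolates tangentially at every point. A sketch with illustrative random data and arbitrarily chosen points/directions:

```python
import numpy as np

# Tangential interpolation sketch (Eqs. (28)-(29)): one column of V per
# right pair (sigma_i, w_i), one column of W per left pair (mu_i, v_i).
rng = np.random.default_rng(7)
n, m, p, r = 30, 3, 2, 4
A = rng.standard_normal((n, n)) - 7 * np.eye(n)
E = np.eye(n)
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

sig = np.array([0.5, 1.0, 2.0, 4.0])   # right interpolation points
mu = np.array([0.3, 0.9, 1.7, 3.5])    # left interpolation points
Wdir = rng.standard_normal((m, r))     # right tangent directions w_i
Vdir = rng.standard_normal((p, r))     # left tangent directions v_i

V = np.column_stack([np.linalg.solve(sig[i] * E - A, B @ Wdir[:, i])
                     for i in range(r)])
W = np.column_stack([np.linalg.solve((mu[i] * E - A).T, C.T @ Vdir[:, i])
                     for i in range(r)])

Er, Ar = W.T @ E @ V, W.T @ A @ V
Br, Cr = W.T @ B, C @ V

def H(s):
    return C @ np.linalg.solve(s * E - A, B)

def Hr(s):
    return Cr @ np.linalg.solve(s * Er - Ar, Br)

for i in range(r):
    assert np.allclose(H(sig[i]) @ Wdir[:, i], Hr(sig[i]) @ Wdir[:, i])  # Eq. (22)
    assert np.allclose(Vdir[:, i] @ H(mu[i]), Vdir[:, i] @ Hr(mu[i]))    # Eq. (23)
```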

The scheme can also be used for higher-order Hermite interpolation, as follows:

Theorem 8

Let \(\sigma ,\mu \in \mathbb {C}\) be interpolation points and \(\varvec{v}\in \mathbb {C}^{p}\), \(\varvec{w} \in \mathbb {C}^{m}\) (nontrivial) tangent directions, and let the matrix pencils \((\sigma \varvec{E}-\varvec{A})\) and \((\mu \varvec{E}-\varvec{A})\) be invertible. If the columns of the matrix \(\varvec{V}\) used in Eq. (16) are constructed as

$$\begin{aligned} ((\sigma \varvec{E}-\varvec{A})^{-1}\varvec{E})^{k-1}(\sigma \varvec{E}-\varvec{A})^{-1}\varvec{Bw}\in Ran(\varvec{V}) \end{aligned}$$
(30)

for \(k=1,\ldots ,N\), then

$$\begin{aligned} \frac{d^{j}\varvec{H}(s)}{ds^{j}}\bigg |_{s=\sigma } \varvec{w}= \frac{d^{j}\varvec{H_{r}}(s)}{ds^{j}}\bigg |_{s=\sigma } \varvec{w} \quad \text {for}~j=0,\ldots ,N-1 \end{aligned}$$
(31)

and if the columns of matrix \(\varvec{W}\) used in Eq. (16) are constructed as

$$\begin{aligned} ((\mu \varvec{E}-\varvec{A})^{-T}\varvec{E}^{T})^{k-1} (\mu \varvec{E}-A)^{-T}\varvec{C}^{T} \varvec{v}\in Ran(\varvec{W}) \end{aligned}$$
(32)

for \(k=1,\ldots ,M\), then

$$\begin{aligned} \varvec{v}^{T} \frac{d^{j}\varvec{H}(s)}{ds^{j}}\bigg |_{s=\mu } =\varvec{v}^{T} \frac{d^{j}\varvec{H_{r}}(s)}{ds^{j}}\bigg |_{s=\mu } \quad \text {for}~j=0,\ldots ,M-1 \end{aligned}$$
(33)

and if both Eqs. (30) and (32) hold with \(\sigma =\mu\), then

$$\begin{aligned} \varvec{v}^{T}\frac{d^{j}\varvec{H}(s)}{ds^{j}}\bigg |_{s=\sigma }\varvec{w}=\varvec{v}^{T}\frac{d^{j}\varvec{H_{r}}(s)}{ds^{j}}\bigg |_{s=\sigma }\varvec{w} \end{aligned}$$
(34)

for \(j=0,\ldots ,N+M-1\)

The main benefit of interpolatory model reduction methods is that they avoid solving large-scale Lyapunov or Riccati equations, making them a convenient reduction framework for large-scale systems. The dominant cost is solving the shifted linear systems; these are sparse in most cases and can be solved using direct methods (e.g., sparse Gaussian elimination). For systems with millions of degrees of freedom, iterative solution methods may be preferred for obtaining the reduction bases \(\varvec{V}\) and \(\varvec{W}\).
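In the sparse direct setting, one factorization of \((\sigma \varvec{E}-\varvec{A})\) per shift can be reused for every right-hand side at that shift. A minimal sketch with SciPy's sparse LU on an illustrative 1D Laplacian:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Cost sketch: the dominant work is solving shifted systems
# (sigma*E - A) x = b. Factor once per shift, reuse per Krylov vector.
n = 2000
main = -2.0 * np.ones(n)
off = np.ones(n - 1)
A = sp.diags([off, main, off], [-1, 0, 1], format='csc')  # 1D Laplacian stencil
E = sp.identity(n, format='csc')
b = np.ones(n)

sigma = 1.0
lu = spla.splu((sigma * E - A).tocsc())  # sparse LU, one factorization per shift
x = lu.solve(b)                          # cheap triangular solves per new RHS

assert np.allclose((sigma * E - A) @ x, b)
```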

2.1.4 Second-Order Systems

While modeling certain electrical or mechanical systems, one has to deal with models in second-order form. The model reduction techniques for such systems aim to construct reduced models that preserve the second-order structure [27, 123, 124, 158, 169]. To define the generalized notion of moments, consider a large-scale, MIMO, second-order, state-space model given as:

$$\begin{aligned}&\varvec{F}\varvec{\ddot{x}}(t)+\varvec{D\dot{x}}(t)+\varvec{Gx}(t)=\varvec{Zu}(t) \end{aligned}$$
(35a)
$$\begin{aligned}&\varvec{y}(t)=\varvec{Lx}(t) \end{aligned}$$
(35b)

where the matrices \(\varvec{F},\varvec{D}\) and \(\varvec{G}\) represent the mass, damping and stiffness matrices, respectively; the equivalent first-order realization has dimension 2n, with m inputs and p outputs. Since the mass, stiffness, and damping matrices of most practical systems are symmetric and positive definite, it is often recommended to represent system (35) in the transformed state-space form:

$$\begin{aligned}&\underbrace{\begin{bmatrix} -\varvec{G} & \varvec{0}\\ \varvec{0} & \varvec{F} \end{bmatrix}}_{\varvec{E}}\underbrace{\begin{bmatrix} \dot{\varvec{x}}(t)\\ \ddot{\varvec{x}}(t) \end{bmatrix}}_{\dot{\mathbf {x}}}= \underbrace{\begin{bmatrix} \varvec{0} & -\varvec{G}\\ -\varvec{G} & -\varvec{D} \end{bmatrix}}_{\varvec{A}} \underbrace{\begin{bmatrix} \varvec{x}(t)\\ \dot{\varvec{x}}(t) \end{bmatrix}}_{\mathbf {x}} + \underbrace{\begin{bmatrix} \varvec{0} \\ \varvec{Z} \end{bmatrix}}_{\varvec{B}} \varvec{u}(t) \end{aligned}$$
(36a)
$$\begin{aligned}&\varvec{y}(t)=\underbrace{\begin{bmatrix} \varvec{L}&\varvec{0} \end{bmatrix}}_{\varvec{C}}\underbrace{\begin{bmatrix} \varvec{x}(t)\\ \dot{\varvec{x}}(t) \end{bmatrix}}_{\mathbf {x}} \end{aligned}$$
(36b)

As a result, the symmetry and definiteness of the matrix \(\varvec{E}\) follow from those of \(\varvec{F}\) and \(\varvec{G}\), and the symmetry of the matrix \(\varvec{A}\) follows from those of \(\varvec{G}\) and \(\varvec{D}\).
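Assembling the first-order realization of Eq. (36) from the second-order data is a direct block construction; the sketch below uses illustrative random SPD matrices and also checks the zeroth moment \(-\varvec{LG}^{-1}\varvec{Z}\) (consistent with Eq. (37) for \(i=0\)):

```python
import numpy as np

# Assemble E, A, B, C of Eq. (36) from second-order data F, D, G, Z, L.
# Random SPD matrices stand in for mass/damping/stiffness (illustrative).
rng = np.random.default_rng(8)
n, m, p = 10, 2, 1
T = rng.standard_normal((n, n)); F = T @ T.T + n * np.eye(n)  # mass, SPD
T = rng.standard_normal((n, n)); D = T @ T.T + n * np.eye(n)  # damping, SPD
T = rng.standard_normal((n, n)); G = T @ T.T + n * np.eye(n)  # stiffness, SPD
Z = rng.standard_normal((n, m))
L = rng.standard_normal((p, n))

O = np.zeros((n, n))
E = np.block([[-G, O], [O, F]])
A = np.block([[O, -G], [-G, -D]])
B = np.vstack([np.zeros((n, m)), Z])
C = np.hstack([L, np.zeros((p, n))])

# E inherits symmetry from F, G; A is symmetric via G, D
assert np.allclose(E, E.T) and np.allclose(A, A.T)
# zeroth moment C A^{-1} B equals the second-order formula -L G^{-1} Z
eta0 = C @ np.linalg.solve(A, B)
assert np.allclose(eta0, -L @ np.linalg.solve(G, Z))
```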

Definition 6

The ith moment of system (35) around the expansion point \(s=0\) is defined as:

$$\begin{aligned} \varvec{\eta }_{i}^{0}=\begin{bmatrix} \varvec{0}&-\varvec{LG}^{-1} \end{bmatrix}\begin{bmatrix} \varvec{0}&-\varvec{FG}^{-1}\\ \varvec{I}&-\varvec{DG}^{-1} \end{bmatrix}^{i}\begin{bmatrix} \varvec{0} \\ \varvec{Z} \end{bmatrix} \end{aligned}$$
(37)

It is hereby assumed that the matrix \(\varvec{G}\) is invertible, which ensures that the matrix \(\varvec{A}\) remains nonsingular. The Markov parameters of system (35) are defined as:

$$\begin{aligned} \varvec{M}_{i}=\begin{bmatrix} \varvec{0}&\varvec{LF}^{-1} \end{bmatrix}\begin{bmatrix} -\varvec{DF}^{-1}&\varvec{I}\\ -\varvec{GF}^{-1}&\varvec{0} \end{bmatrix}^{i}\begin{bmatrix} \varvec{Z}\\ \varvec{0} \end{bmatrix} \end{aligned}$$
(38)

provided that the matrix \(\varvec{F}\) is invertible.
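For readers who want to verify Definition 6 numerically, the block formula (37) can be checked against the Taylor coefficients of \(\varvec{H}(s)=\varvec{L}(\varvec{G}+s\varvec{D}+s^{2}\varvec{F})^{-1}\varvec{Z}\) at \(s=0\), which obey the recurrence \(\varvec{G}\varvec{m}_{i}+\varvec{D}\varvec{m}_{i-1}+\varvec{F}\varvec{m}_{i-2}=\varvec{0}\) with \(\varvec{m}_{0}=\varvec{G}^{-1}\varvec{Z}\). The Python sketch below (the random SPD test matrices are our own assumption) confirms that the two agree up to the overall sign fixed by the convention of Definition 6:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 6, 2, 2

def spd(k):
    # random symmetric positive-definite test matrix (our own assumption)
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

F, D, G = spd(n), spd(n), spd(n)      # mass, damping, stiffness
Z = rng.standard_normal((n, m))       # input matrix
L = rng.standard_normal((p, n))       # output matrix
Gi = np.linalg.inv(G)

def moment(i):
    # i-th moment at s = 0 via the block formula of Eq. (37)
    row = np.hstack([np.zeros((p, n)), -L @ Gi])
    blk = np.block([[np.zeros((n, n)), -F @ Gi],
                    [np.eye(n),        -D @ Gi]])
    col = np.vstack([np.zeros((n, m)), Z])
    return row @ np.linalg.matrix_power(blk, i) @ col

# Taylor coefficients L m_i of H(s) = L (G + s D + s^2 F)^{-1} Z at s = 0,
# from the recurrence G m_i + D m_{i-1} + F m_{i-2} = 0
ms = [Gi @ Z, -Gi @ D @ Gi @ Z]
for _ in range(2, 6):
    ms.append(-Gi @ (D @ ms[-1] + F @ ms[-2]))

for i in range(6):
    assert np.allclose(moment(i), -L @ ms[i])   # equal up to the sign convention
```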

In order to evaluate the projection basis and hence the reduced second-order model, we first define the second-order Krylov subspace as follows:

Definition 7

[157] Given two matrices \(\varvec{X}_{1},\varvec{X}_{2} \in \mathbb {R}^{n \times n}\) and a matrix \(\varvec{H}_{1} \in \mathbb {R}^{n \times m}\) whose columns are the starting vectors, the second-order Krylov subspace is defined as:

$$\begin{aligned} \mathcal {K}_{r}(\varvec{X}_{1},\varvec{X}_{2},\varvec{H}_{1})=\text {colspan}\{\varvec{Q}_{0},\varvec{Q}_{1},\ldots ,\varvec{Q}_{r-1}\} \end{aligned}$$
(39)

where

$$\begin{aligned} {\left\{ \begin{array}{ll} \varvec{Q}_{0}=\varvec{H}_{1},~\varvec{Q}_{1}=\varvec{X}_{1}\varvec{Q}_{0} \\ \varvec{Q}_{i}=\varvec{X}_{1}\varvec{Q}_{i-1}+\varvec{X}_{2}\varvec{Q}_{i-2},~i=2,3,\ldots \end{array}\right. } \end{aligned}$$
(40)
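The recurrence (40) translates directly into code. The sketch below generates the basic vector sequence only; a practical implementation would use a second-order Arnoldi (SOAR)-type procedure that orthogonalizes and deflates the \(\varvec{Q}_{i}\) on the fly, which we omit here:

```python
import numpy as np

def second_order_krylov(X1, X2, H1, r):
    """Basic block sequence Q_0, ..., Q_{r-1} of Eq. (40), without deflation."""
    Q = [H1, X1 @ H1]
    for _ in range(2, r):
        Q.append(X1 @ Q[-1] + X2 @ Q[-2])
    return np.hstack(Q[:r])

# Tiny usage example with hypothetical 3x3 data:
X1, X2 = 2.0 * np.eye(3), np.eye(3)
H1 = np.ones((3, 1))
K = second_order_krylov(X1, X2, H1, 3)
# Q0 = 1, Q1 = 2*1 = 2, Q2 = 2*2 + 1 = 5 in every component
```

In practice the stacked matrix is subsequently orthonormalized, e.g. via a QR decomposition, before being used as a projection basis.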

Definition 8

The second-order input Krylov subspace is then defined as:

$$\begin{aligned} \mathcal {K}_{r_{1}}(-\varvec{G}^{-1}\varvec{D},-\varvec{G}^{-1}\varvec{F},-\varvec{G}^{-1}\varvec{Z}) \end{aligned}$$
(41)

and the second-order output Krylov subspace is defined as:

$$\begin{aligned} \mathcal {K}_{r_{2}}(-\varvec{G}^{-T}\varvec{D}^{T},-\varvec{G}^{-T}\varvec{F}^{T},-\varvec{G}^{-T}\varvec{L}^{T}) \end{aligned}$$
(42)

Similar to first-order systems, reduction techniques using only one second-order Krylov subspace are known as one-sided Krylov subspace methods, and techniques involving both second-order Krylov subspaces are known as two-sided Krylov subspace methods. Now consider the approximation ansatz \(\varvec{x}(t)=\varvec{Vx_{r}}(t)\), \(\varvec{V}\in \mathbb {R}^{n \times r}\). Substituting this approximation into Eq. (35) and pre-multiplying by \(\varvec{W}^{T}\), with \(\varvec{W} \in \mathbb {R}^{n \times r}\), yields a second-order reduced model of dimension 2r given as:

$$\begin{aligned}&\varvec{F_{r}}\varvec{\ddot{x_{r}}}(t)+\varvec{D_{r} \dot{x_{r}}}(t)+\varvec{G_{r}x_{r}}(t)=\varvec{Z_{r}u}(t) \end{aligned}$$
(43a)
$$\begin{aligned}&\varvec{y_{r}}(t)=\varvec{L_{r}x_{r}}(t) \end{aligned}$$
(43b)

where

$$\begin{aligned} \varvec{F_{r}}&=\varvec{W}^{T}\varvec{F}\varvec{V},\quad \varvec{D_{r}}=\varvec{W}^{T}\varvec{D}\varvec{V},\quad \varvec{G_{r}}=\varvec{W}^{T}\varvec{G}\varvec{V}, \\ \varvec{Z_{r}}&=\varvec{W}^{T}\varvec{Z},\quad \varvec{L_{r}}=\varvec{L}\varvec{V}. \end{aligned}$$
(44)

Using the formulation in Eq. (36), the second-order reduced model (43) is expressed in a transformed state-space given as:

$$\begin{aligned} {\left\{ \begin{array}{ll} \underbrace{\begin{bmatrix} -\varvec{W}^{T}\varvec{GV}&\varvec{0}\\ \varvec{0}&\varvec{W}^{T}\varvec{FV} \end{bmatrix}}_{\varvec{E_{r}}}\underbrace{\begin{bmatrix} \dot{\varvec{x}}_{r}(t)\\ \ddot{\varvec{x}}_{r}(t) \end{bmatrix}}_{\dot{\mathbf {x}}_{r}} \\ \quad =\underbrace{\begin{bmatrix} \varvec{0}&-\varvec{W}^{T}\varvec{GV}\\ -\varvec{W}^{T}\varvec{GV}&-\varvec{W}^{T}\varvec{DV} \end{bmatrix}}_{\varvec{A_{r}}} \underbrace{\begin{bmatrix} \varvec{x_{r}}(t)\\ \dot{\varvec{x}}_{r}(t)\end{bmatrix}}_{\mathbf {x_{r}}} + \underbrace{\begin{bmatrix}\varvec{0} \\ \varvec{W}^{T}\varvec{Z} \end{bmatrix}}_{\varvec{B_{r}}} \varvec{u}(t) \\ \varvec{y_{r}}(t)=\underbrace{\begin{bmatrix} \varvec{LV}&\varvec{0}\end{bmatrix}}_{\varvec{C_{r}}}\underbrace{\begin{bmatrix} \varvec{x_{r}}(t)\\ \dot{\varvec{x}}_{r}(t)\end{bmatrix}}_{\mathbf {x_{r}}} \end{array}\right. } \end{aligned}$$
(45)

where the corresponding projection matrices for the transformed model (36) are defined as:

$$\begin{aligned} \varvec{\tilde{V}}=\begin{bmatrix} \varvec{V}&\varvec{0}\\ \varvec{0}&\varvec{V} \end{bmatrix},\quad \varvec{\tilde{W}}= \begin{bmatrix} \varvec{W}&\varvec{0}\\ \varvec{0}&\varvec{W} \end{bmatrix} \end{aligned}$$
(46)

This formulation preserves the structure of the full model. To obtain a projection basis matching any desired moments, all the previously mentioned theorems generalize to the second-order case in a similar fashion.

Theorem 9

[157, 158] In order to match the first \(r_{1}+ r_{2}\) moments between the original second-order model (35) and the reduced second-order model (43), the columns of the projection matrices \(\varvec{V}\) and \(\varvec{W}\) used in Eq. (44) must form bases of the second-order input and output Krylov subspaces (41) and (42), respectively. It is also assumed that the matrices \(\varvec{G}\) and \(\varvec{G}_{r}\) are invertible.
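The one-sided variant of Theorem 9 (with \(\varvec{W}=\varvec{V}\)) can be verified numerically: projecting onto a basis of the second-order input Krylov subspace (41) of dimension \(r_{1}\) reproduces the first \(r_{1}\) moments. A Python sketch with hypothetical SPD test matrices; the moments are compared through the Taylor coefficients of the transfer function, which is equivalent up to sign conventions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r1 = 8, 3

def spd(k):
    # random symmetric positive-definite test matrix (our own assumption)
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

F, D, G = spd(n), spd(n), spd(n)          # mass, damping, stiffness (test data)
Z = rng.standard_normal((n, 1))           # SISO input vector
L = rng.standard_normal((1, n))           # output row
Gi = np.linalg.inv(G)

# Second-order input Krylov subspace of Eq. (41) via the recurrence (40)
Q = [-Gi @ Z, Gi @ D @ Gi @ Z]
for _ in range(2, r1):
    Q.append(-Gi @ (D @ Q[-1] + F @ Q[-2]))
V = np.linalg.qr(np.hstack(Q[:r1]))[0]
W = V                                      # one-sided (Galerkin) projection

Fr, Dr, Gr = W.T @ F @ V, W.T @ D @ V, W.T @ G @ V   # Eq. (44)
Zr, Lr = W.T @ Z, L @ V

def taylor_moments(F, D, G, Z, L, k):
    # Taylor coefficients of L (G + s D + s^2 F)^{-1} Z at s = 0
    Gi = np.linalg.inv(G)
    m = [Gi @ Z, -Gi @ D @ Gi @ Z]
    for _ in range(2, k):
        m.append(-Gi @ (D @ m[-1] + F @ m[-2]))
    return [(L @ v).item() for v in m[:k]]

full = taylor_moments(F, D, G, Z, L, r1)
red = taylor_moments(Fr, Dr, Gr, Zr, Lr, r1)
assert np.allclose(full, red)              # the first r1 moments coincide
```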

Similarly, rational interpolation can be achieved as follows:

Theorem 10

[157, 158] In order to match the first \(r_{i}\) moments about the interpolation points \(s_{i}\), \(i=1,\ldots ,k_{2}\), of the original and reduced models, the matrices \(\varvec{V}\) and \(\varvec{W}\) must be chosen as follows:

$$\begin{aligned} span (\varvec{V})=\bigcup _{i=1}^{k_{1}}\mathcal {K}_{r_{i}}\big (&-(\varvec{G}+s_{i}\varvec{D}+s_{i}^{2}\varvec{F})^{-1}(\varvec{D}+2s_{i}\varvec{F}),\\&-(\varvec{G}+s_{i}\varvec{D}+s_{i}^{2}\varvec{F})^{-1}\varvec{F},\\&-(\varvec{G}+s_{i}\varvec{D}+s_{i}^{2}\varvec{F})^{-1}\varvec{Z}\big ) \end{aligned}$$

and

$$\begin{aligned} span (\varvec{W})=\bigcup _{i=k_{1}+1}^{k_{2}}\mathcal {K}_{r_{i}}\big (&-(\varvec{G}+s_{i}\varvec{D}+s_{i}^{2}\varvec{F})^{-T}(\varvec{D}+2s_{i}\varvec{F})^{T},\\&-(\varvec{G}+s_{i}\varvec{D}+s_{i}^{2}\varvec{F})^{-T}\varvec{F}^{T},\\&-(\varvec{G}+s_{i}\varvec{D}+s_{i}^{2}\varvec{F})^{-T}\varvec{L}^{T}\big ) \end{aligned}$$

The first successful attempt at reducing second-order models came from Meyer and Srinivasan [130] and involved the evaluation of the free-velocity and zero-velocity gramians. This was followed by the much-improved second-order balanced truncation method of Chahlaoui et al. [48]. However, balancing for second-order systems is not recommended from a numerical perspective. Instead, faster iterative schemes based on Arnoldi or Lanczos procedures are more suitable for reducing systems in second-order form. An early attempt in this direction came from Su and Craig [169], and later from Bastian and Haase [27], in which Krylov subspace methods were employed. However, these methods could not match nonzero moments, and the number of moments matched was smaller than for the classical Krylov subspace methods. Consequently, several papers appeared based on similar ideas [72, 120]. These methods obtained reduced models by applying different projection mappings to an equivalent state-space model while preserving its structure. The authors of Ref. [158] proposed two different techniques for reducing second-order systems by first reducing an equivalent state-space model and then converting the reduced model back to second-order form. This matches twice as many moments as the second-order Krylov methods proposed in Refs. [23, 124, 157].

2.2 Moment Matching in Time-Domain

So far, we have discussed the frequency-domain notion of moment matching for SISO and MIMO LTI systems. We saw that the reduced model obtained using Krylov subspaces yields a local approximation of the frequency response. However, this cannot guarantee a good overall approximation of the impulse response. Gunupudi and Nakhla [92] were among the first to present a scheme to match the first derivatives of the time response of the full-order model with those of the reduced one. Later, Wang et al. [179] presented a passive model order reduction method based on a Chebyshev expansion of the impulse response, so as to match the transient responses of the full model and the reduced-order model. A Laguerre polynomial expansion based reduction framework was presented in Ref. [52] and later extended and further developed in Ref. [58]. However, the major contribution in this direction came from Astolfi [15], who also extended the idea to nonlinear systems [16,17,18]. In the following, we describe the time-domain notion of moment matching based on Refs. [15,16,17].

Definition 9

The moments \(\varvec{\bar{\eta _{i}}}(\sigma )\) of the impulse response \({\hbar }(t)\) around a point \(s=\sigma\) are given by weighted integrals over the time function and satisfy

$$\begin{aligned} \varvec{\bar{\eta _{i}}}(\sigma )&=\int _{0}^{\infty }\tau ^{i}e^{-\sigma \tau } \hbar (\tau ) d\tau \end{aligned}$$
(47a)
$$\begin{aligned}&=\int _{0}^{\infty } \tau ^{i}\varvec{C}e^{(\varvec{E}^{-1}\varvec{A}-\sigma \varvec{I})\tau }\varvec{E}^{-1}\varvec{B}d\tau \end{aligned}$$
(47b)
$$\begin{aligned}&=i!~\varvec{C}((\sigma \varvec{E}-\varvec{A})^{-1}\varvec{E})^{i}(\sigma \varvec{E}-\varvec{A})^{-1}\varvec{B} \end{aligned}$$
(47c)

Remark 5

Thus, the time-domain moments \(\varvec{\bar{\eta _{i}}}(\sigma )\) and the frequency-domain moments \(\varvec{\eta _{i}}(\sigma )\) differ only by a factor:

$$\begin{aligned} \varvec{\eta _{i}}(\sigma )=\frac{(-1)^{i}}{i!}\varvec{\bar{\eta _{i}}}(\sigma ) \end{aligned}$$
(48)

This result is used to develop the notion of moments in terms of steady-state response of system (4) interconnected with a linear signal generator.
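Remark 5 is easy to check numerically: by Eq. (48), the quantities \((-1)^{i}\,\varvec{\bar{\eta }_{i}}(\sigma )/i!\) are precisely the Taylor coefficients of \(\varvec{H}(s)=\varvec{C}(s\varvec{E}-\varvec{A})^{-1}\varvec{B}\) around \(s=\sigma\). A short Python sketch with random test data (all matrices are our own assumptions):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)) - 3 * np.eye(n)   # hypothetical test matrices
E = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))
sigma, delta = 1.0, 1e-3

K = np.linalg.inv(sigma * E - A)

def eta_bar(i):
    # time-domain moments, Eq. (47c)
    return factorial(i) * (C @ np.linalg.matrix_power(K @ E, i) @ K @ B).item()

H = lambda s: (C @ np.linalg.solve(s * E - A, B)).item()

# Frequency-domain moments (48) act as Taylor coefficients of H at sigma
taylor = sum((-1) ** i / factorial(i) * eta_bar(i) * delta ** i for i in range(8))
assert abs(taylor - H(sigma + delta)) < 1e-10
```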

Definition 10

[14] The 0-moment of system (4) at \(s=\sigma\) can also be defined as

$$\begin{aligned} \varvec{\eta _{0}}(\sigma )=\varvec{CV} \end{aligned}$$
(49)

where \(\varvec{V}\) uniquely solves the linear Sylvester equation given as:

$$\begin{aligned} \varvec{AV}+\varvec{B}=\sigma \varvec{EV} \end{aligned}$$
(50)

Proposition 1

[16] The moments of system (4), \(\varvec{\eta }_{0}(\sigma ), \varvec{\eta }_{1}(\sigma ),\ldots ,\varvec{\eta }_{k}(\sigma )\) can be uniquely determined by the elements of matrix \(\varvec{CV}\) i.e., there exists a one-to-one relation between the moments of system (4) and the elements of matrix \(\varvec{CV}\), with \(\varvec{V}\) being a unique solution of the following Sylvester equation

$$\begin{aligned} \varvec{EV\Xi }- \varvec{AV}=\varvec{B\Psi } \end{aligned}$$
(51)

where \(\varvec{\Xi } \in \mathbb {R}^{r \times r}\) is any non-derogatory matrix and it is assumed that the pair (\(\varvec{\Psi },\varvec{\Xi }\)) is observable. Now, consider the following exogenous linear system (also known as signal generator):

$$\begin{aligned} \dot{\varvec{\zeta }}(t)&=\varvec{\Xi }\varvec{\zeta }(t) \quad \varvec{\zeta }(0)\ne 0 \end{aligned}$$
(52a)
$$\begin{aligned} \varvec{u}(t)&=\varvec{\Psi \zeta }(t) \end{aligned}$$
(52b)

with \(\varvec{\zeta }(t) \in \mathbb {R}^{r}\), \(\lambda (\varvec{E}^{-1}\varvec{A}) \cap \lambda (\varvec{\Xi }) = \emptyset\), and the triple \((\varvec{\Xi ,\Psi ,\zeta }(0))\) assumed to be minimal, i.e., the pair \((\varvec{\Psi },\varvec{\Xi })\) is observable and the pair \((\varvec{\Xi },\varvec{\zeta }(0))\) is controllable, so that the generated input signal \(\varvec{u}(t)\) is persistently exciting [20, 122].
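The Sylvester equation (51) can be solved with standard tools. The sketch below uses SciPy's `solve_sylvester`, which solves \(aX+Xb=q\), so Eq. (51) is first multiplied through by \(\varvec{E}^{-1}\); as a consistency check, the columns of \(\varvec{V}\) recover the 0-moment at \(s=i\omega\) for an oscillatory signal generator of the form (52). All matrices are hypothetical test data:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(3)
n, omega = 6, 2.0
E = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A = rng.standard_normal((n, n)) - 3 * np.eye(n)
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

Xi = np.array([[0.0, omega], [-omega, 0.0]])   # signal generator dynamics, Eq. (52)
Psi = np.array([[1.0, 0.0]])                   # output map of the generator

# Eq. (51): E V Xi - A V = B Psi  <=>  (-E^{-1}A) V + V Xi = E^{-1} B Psi
Ei = np.linalg.inv(E)
V = solve_sylvester(-Ei @ A, Xi, Ei @ B @ Psi)
assert np.allclose(E @ V @ Xi - A @ V, B @ Psi)   # residual of Eq. (51)

# The complex combination of the two columns yields the 0-moment at s = i*omega
z = V[:, [0]] + 1j * V[:, [1]]
eta0 = C @ np.linalg.solve(1j * omega * E - A, B)
assert np.allclose(C @ z, eta0)
```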

Theorem 11

[16, 17] Consider system (4) and \(\sigma \in \mathbb {C}\). Assume \(\lambda (\varvec{E^{-1}}\varvec{A}) \cap \lambda (\varvec{\Xi }) = \emptyset\). Let \(\varvec{V}\) satisfy the Sylvester equation (51), and let \(\varvec{W}\) be such that \(\det (\varvec{W}^{T}\varvec{EV}) \ne 0\). Consider the interconnection of the exogenous system (52) with system (4):

$$\begin{aligned} \varvec{E}\dot{\varvec{x}}(t)&=\varvec{Ax}(t)+\varvec{B}\varvec{\Psi }(e^{\varvec{\Xi }t}\varvec{\zeta (0)}) \end{aligned}$$
(53a)
$$\begin{aligned} \quad \varvec{y}(t)&=\varvec{C}\varvec{x}(t) \end{aligned}$$
(53b)

where the pair (\(\varvec{\Psi },\varvec{\Xi }\)) is observable. Then, the (well-defined) steady-state of the output of the said interconnection can be uniquely determined by the respective moments \(\varvec{\eta }_{0}(\sigma ),\varvec{\eta }_{1}(\sigma ),\ldots ,\varvec{\eta }_{k}(\sigma )\).

Fig. 1: Time-domain illustration of linear moment-matching in terms of steady-state response matching

Remark 6

Connecting a linear signal generator to the full-order system (4) amounts to driving the system with the exponential inputs \(\varvec{u}(t)=\varvec{\Psi }e^{\varvec{\Xi }t}\varvec{\zeta }(0)\). The (well-defined) steady state of the interconnected system is given as:

$$\begin{aligned} \varvec{y}(t)=\underbrace{\varvec{C}e^{\varvec{E}^{-1}\varvec{A}t}(\varvec{x}(0)-\varvec{V\zeta }(0))}_{\varvec{y}_{p}(t)} + \underbrace{\varvec{CV\zeta }(t)}_{\varvec{y}_{h}(t)} \end{aligned}$$
(54)

where \(\varvec{y}_{h}(t)\) represents the steady-state response of the system; for an asymptotically stable system, as considered here, the transient response \(\varvec{y}_{p}(t)\) decays to zero as \(t \rightarrow \infty\).

Using Theorem 11, the reduced-order model via moment-matching can be obtained as follows:

Proposition 2

[17] Consider the full-order model (4) and the system described in Eq. (15). Fix \(\varvec{\Psi }\) and \(\varvec{\Xi }\) such that the pair (\(\varvec{\Psi },\varvec{\Xi }\)) is observable. Also assume that \(\lambda (\varvec{E^{-1}}\varvec{A}) \cap \lambda (\varvec{\Xi }) = \emptyset\). Then, the reduced model (15) matches the moments of system (4) at \((\varvec{\Xi ,\Psi ,\zeta }(0))\) if

$$\begin{aligned} \varvec{CV}=\varvec{C_{r}V_{r}} \end{aligned}$$
(55)

where \(\varvec{V_{r}}\) uniquely solves the following Sylvester equation

$$\begin{aligned} \varvec{E_{r}V_{r}\Xi }-\varvec{A_{r}V_{r}}=\varvec{B_{r}\Psi } \end{aligned}$$
(56)

Thus, moment-matching in the time domain amounts to matching the steady-state response of the full-order model with that of the reduced model when both are excited by inputs from the signal generator. For inputs other than those generated by the signal generator, the steady-state response is merely interpolated.

The time-domain interpretation of moment-matching as interpolation of the steady-state response is depicted in Fig. 1. This steady-state notion of moment-matching is particularly useful because it permits defining moments even for systems that do not admit a transfer function representation, such as time-varying systems.
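Proposition 2 also suggests a direct numerical construction: if \(\varvec{V}\) solves the Sylvester equation (51) and a Galerkin projection \(\varvec{W}=\varvec{V}\) is applied, then \(\varvec{V_{r}}=\varvec{I}\) solves the reduced equation (56), so \(\varvec{C_{r}V_{r}}=\varvec{CV}\) and the moments match by construction. A Python sketch (the test matrices are our own assumption):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(4)
n, omega = 7, 2.0
E = np.eye(n)
A = rng.standard_normal((n, n)) - 4 * np.eye(n)
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))
Xi = np.array([[0.0, omega], [-omega, 0.0]])   # signal generator, Eq. (52)
Psi = np.array([[1.0, 0.0]])

# Full-order Sylvester equation (51), here with E = I
V = solve_sylvester(-A, Xi, B @ Psi)
W = V                                           # Galerkin projection
Er, Ar, Br, Cr = W.T @ E @ V, W.T @ A @ V, W.T @ B, C @ V

# Reduced Sylvester equation (56)
Vr = solve_sylvester(-np.linalg.solve(Er, Ar), Xi, np.linalg.solve(Er, Br @ Psi))
assert np.allclose(Vr, np.eye(2))               # V_r = I by construction
assert np.allclose(C @ V, Cr @ Vr)              # moment matching, Eq. (55)
```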

2.3 Issues and Recent Advancements in Linear Moment Matching

Model order reduction using Krylov subspace methods provides an indispensable platform for reducing large-scale problems; however, many issues remain concerning the automatic generation of the reduced model. Although various algorithms have been developed in the past, they apply only to specific classes of problems or under specific conditions. Next, we highlight the major relevant problems and provide an overview of some of the significant achievements in this direction.

2.3.1 Choice of Expansion Point(s)

The choice and number of interpolation points, or shifts, in Krylov subspace methods is an important factor dictating the quality of the approximation. To address this issue, much work has been carried out over the past years. In order for the reduced system to minimize the \(\mathcal {H}_{2}\)-norm error, various optimality conditions have been formulated either in terms of rational interpolation conditions [29,30,31, 45, 88, 90, 111, 128, 175] or in terms of Sylvester and Lyapunov equations [38, 93, 168, 183, 186]. For SISO systems, the interpolation conditions were first proposed by Meier and Luenberger [128]. Gugercin et al. proposed the iterative rational Krylov algorithm (IRKA) [88, 89], which produces reduced models satisfying the first-order necessary conditions for \(\mathcal {H}_{2}\) optimality by selecting the interpolation points as the mirror images of the poles of the reduced system. The idea was later extended to MIMO systems in Refs. [45, 90, 175]. Van Dooren et al. derived the optimality conditions for the case of repeated poles of \(\varvec{H_{r}}(s)\) [175]. Generally speaking, IRKA has been a significant success in obtaining optimal reduced models and finds applications in psychophysiology [105], the optimal cooling of steel profiles [90], and many other areas. Recently, the method was also extended to the reduction of bilinear systems by Benner and Breiten [34]. The convergence of IRKA is guaranteed a priori in some situations [67], although it fails in some cases [67, 90], and the choice of initial starting values remains an open question.

Druskin and Simoncini [56] proposed an adaptive computation of interpolation points for rational Krylov methods. Though less accurate than IRKA, this method has a low computational cost. It was followed by the SPARK algorithm of Panzer et al. [140], in which the choice of interpolation point(s), as well as the order of the reduced model, is selected adaptively. Similar work was also presented in Ref. [66]. Based on the binary search principle, Bollhöfer and Bodendiek [42] presented adaptive rules for selecting the shifts. They showed a general scheme for selecting both the shifts and the moments by combining the adaptive shift-selection scheme with the adaptive moment-selection method presented earlier in Ref. [118]. However, the choice of the reduced-order dimension remains an open issue.

Besides the appropriate choice of interpolation points, certain applications require that the interpolation be carried out in particular frequency regions of interest, which calls for weighting certain frequencies more than others. To allow frequency weighting, weighted \(\mathcal {H}_{2}\) model reduction was first proposed by Halevi [93, 168] as the solution of Riccati and Lyapunov equations. More efficient versions of this scheme were introduced for SISO systems by Anić et al. [6] and for MIMO systems by Breiten et al. [44].

All the methods described above require a degree of heuristics, as the user has to pick and try several interpolation points and decide what works satisfactorily for the application at hand. This is due to the absence of global error bounds for Krylov subspace methods, discussed next. However, a general guideline (as given in Ref. [58]) is the following: selecting \(s=0\) improves the steady-state accuracy, since the DC gains of the full-order and reduced-order models are matched; selecting \(s\rightarrow \infty\) yields a reduced system that better approximates the transient response of the full-order system; and selecting multiple interpolation points across the frequency spectrum leads to a better approximation over a broader frequency band.
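These guidelines are straightforward to experiment with. The sketch below builds a one-sided rational Krylov basis from a handful of arbitrarily chosen shifts (our own assumption) and confirms that the reduced transfer function interpolates the full one at every shift, including the matched DC gain at \(s=0\):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 12
E = np.eye(n)
A = rng.standard_normal((n, n)) - 3 * np.eye(n)
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))
H = lambda s: (C @ np.linalg.solve(s * E - A, B)).item()

shifts = [0.0, 1.0j, 10.0j]                       # trial interpolation points
cols = [np.linalg.solve(s * E - A, B) for s in shifts]
raw = np.hstack([np.hstack([c.real, c.imag]) for c in cols])
V = np.linalg.qr(raw)[0]                          # real orthonormal basis
Er, Ar, Br, Cr = V.T @ E @ V, V.T @ A @ V, V.T @ B, C @ V
Hr = lambda s: (Cr @ np.linalg.solve(s * Er - Ar, Br)).item()

for s in shifts:
    # one-sided projection interpolates H at each shift (0th moment matched)
    assert abs(H(s) - Hr(s)) <= 1e-8 * max(1.0, abs(H(s)))
```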

2.3.2 Global Error Indicator

Another major open issue in Krylov-based reduction is the lack of a global error bound between the full-order system and the reduced one. Earlier results in this direction led to local error bounds for the transfer function, valid only over a certain frequency range [25, 84]. Later, some heuristic error indicators were presented in Refs. [32, 84]: it was proposed that the error between the full-order model and the reduced model is approximately equal to that between two successive reduced-order models. However, no proof was provided. A deducible error estimator was first presented by Konkel et al. [110].
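The heuristic indicator of Refs. [32, 84] is simple to set up: one compares two reduced models of successive orders and uses their mutual deviation as a proxy for the unknown true error. A sketch with a hypothetical single-shift Krylov ROM:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 30
E = np.eye(n)
A = rng.standard_normal((n, n)) - 6 * np.eye(n)
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

def rom(r, s0=1.0):
    # one-sided Krylov ROM matching r moments at the (assumed) shift s0
    Ks = np.linalg.inv(s0 * E - A)
    cols = [Ks @ B]
    for _ in range(r - 1):
        cols.append(Ks @ E @ cols[-1])
    V = np.linalg.qr(np.hstack(cols))[0]
    return V.T @ E @ V, V.T @ A @ V, V.T @ B, C @ V

def tf(Er, Ar, Br, Cr, s):
    return (Cr @ np.linalg.solve(s * Er - Ar, Br)).item()

ws = np.logspace(-1, 2, 60)
H = np.array([tf(E, A, B, C, 1j * w) for w in ws])
H5 = np.array([tf(*rom(5), 1j * w) for w in ws])
H7 = np.array([tf(*rom(7), 1j * w) for w in ws])

true_err = np.max(np.abs(H - H5))          # needs the full model (expensive)
indicator = np.max(np.abs(H7 - H5))        # heuristic: successive ROMs only
```

The indicator typically tracks the order of magnitude of the true error but, as noted above, carries no guarantee.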

An a posteriori error indicator was presented by Feng and Benner [63], whereby a greedy algorithm determines the next interpolation point based on the largest error. However, the method is applicable only to some special LTI systems in which the matrix \(\varvec{E}\) in Eq. (4) is symmetric positive definite. A similar requirement is imposed in the method of Panzer et al. [141], whereby global \(\mathcal {H}_{2}\) and \(\mathcal {H}_{\infty }\) error bounds are derived; furthermore, it is also required that \(\varvec{A}+\varvec{A}^{T}>0\). Another study, by Wolf et al. [182], proposed a gramian-based output error bound. However, that method is not computationally practical for large-scale systems because it involves the explicit computation of the observability gramian.

Thus, we conclude that the calculation of an exact global error bound in Krylov-based reduction methods requires involving the full-order system, which becomes practically challenging in large-scale settings. As such, this remains an open area of research.

2.3.3 Preserving Stability and Passivity

One of the most fundamental requirements of any reduction scheme is that stability and passivity be preserved in the reduced model, for the surrogate model to be meaningful. It is well known that Krylov subspace-based reduced-order modeling techniques do not preserve stability or passivity in general. However, there have been some appreciable efforts in this direction. The methods proposed in Refs. [70, 106] offer guaranteed stability and passivity of the reduced models if the original system is passive. Another set of methods was presented in Refs. [24, 85, 102]; these are based on post-processing schemes in which the unstable poles of the reduced system are removed using explicitly restarted Lanczos and Arnoldi methods. Inspired by the relation between Löwner and Pick matrices, interpolation-based passivity-preserving methods were proposed in Refs. [8, 167].

Similar to the previous discussion, these methods are confined to a special class of LTI systems and extend only to one-sided Krylov methods. The interpolation-based methods are numerically more costly than classical Krylov-based MOR. As for the restarted algorithms, they apply to SISO systems only. Furthermore, after removing the unstable poles, these methods do not preserve the moment-matching property and, as such, are not always guaranteed to yield a stable reduced model within a finite number of restarts, which makes this an open problem to be addressed.

3 Moment Matching for Nonlinear Systems

Since most practical dynamical systems are inherently nonlinear, a great deal of research effort has been devoted over the past decades to reducing large-scale nonlinear systems. Special classes of large-scale systems such as bilinear systems, DAE systems and Hamiltonian systems have been successfully reduced; see, e.g., Lall et al. [116], Soberg et al. [166], Al-Baiyat et al. [3] and Fujimoto [73]. Polynomial approximation of weakly nonlinear systems has been carried out by Chen [51], Rewienski and White [153, 154] and Benner [33]. Later, the idea of reducing certain nonlinear systems by transforming them into quadratic-bilinear form was proposed by Benner and Breiten [36], Gu [87] and Antoulas et al. [9]. Methods based on variational analysis have also been proposed [35, 87, 144].

The concept of balancing for nonlinear systems was first proposed in Scherpen [164]. This further encouraged the development of energy-based methods and the notion of Hankel operator, see e.g., Gray and Mesko [82], Scherpen and Gray [163], Scherpen and Van-der-Schaft [165] and Fujimoto and Scherpen [74, 75]. Methods based on proper orthogonal decomposition (POD) were proposed by Willcox and Peraire [181], Kunisch and Volkwein [114], Hinze and Volkwein [94], Kunisch and Volkwein [115], Grepl et al. [83] and Astrid et al. [19]. Krylov subspace methods for special nonlinear system classes have also been studied in Refs. [43, 80].

The relation between moments and the Sylvester equation was first established by Gallivan et al. [78, 79]. Based on this, the notion of nonlinear moments was first presented in Astolfi [14], followed by Ref. [15]. Since then, methods for linear and, especially, nonlinear moment-matching (NLMM) have been developed in many publications. A class of (nonlinear) parameterized ROMs achieving moment matching was defined in Ref. [96]. A two-sided, nonlinear moment-matching theory was developed by Ionescu and Astolfi [97, 98]. The problem of MOR via moment-matching for linear and nonlinear differential time-delay systems with discrete and distributed delays was studied by Scarciotti and Astolfi [160]. In addition, the online estimation of moments of linear and nonlinear systems from input/output measurements has been proposed in Refs. [161, 162]. These studies presented new algorithms that construct ROMs asymptotically matching the moments of an unknown nonlinear system by solving a recursive, moving-window, least-squares estimation problem using input/output snapshot measurements. In this context, the authors of Ref. [176] developed a practical, simulation-free NLMM algorithm in which certain numerical simplifications avoid the expensive solution of the Sylvester PDE. Recently, Rafiq and Bazaz proposed a nonlinear MOR framework combining NLMM with dynamic mode decomposition (DMD) [152]. The framework computes the offline projection matrix \(\varvec{V}\) by the application of NLMM, whereas the underlying nonlinearity is approximated via DMD modes. Also, a parametric nonlinear MOR method based on NLMM has been proposed in Ref. [151] (see also [148,149,150]). Faedo et al. [60] recently extended this idea to wave energy systems. Furthermore, an optimal \(\mathcal {H}_{2}\)-norm MOR method using time-domain moment-matching was proposed by Necoara and Ionescu [133].

As mentioned before, this manuscript focuses on interpolatory/moment-matching-based MOR methods, and as such, we will not discuss data-driven/snapshots based methods such as POD.

3.1 Moments of a Nonlinear System

Similar to the notion of moments for linear systems described in Sect. 2.2, we now revisit the nonlinear enhancement of moment-matching for large-scale systems. The idea is based on concepts emerging from the output regulation of nonlinear systems [112], center manifold theory [47] and the steady-state response of nonlinear systems [46, 99, 100].

Fig. 2: Time-domain illustration of nonlinear moment-matching in terms of steady-state response matching between FOM and ROM

Consider a large-scale, nonlinear, time-invariant, exponentially stable, MIMO state-space model of the form

$$\begin{aligned} \varvec{E}\frac{d\varvec{x}(t)}{dt}&=\varvec{f}(\varvec{x}(t),\varvec{u}(t)),\quad \varvec{x}(0)=\varvec{x}_{0} \end{aligned}$$
(57a)
$$\begin{aligned} \varvec{y}(t)&=\varvec{g }(\varvec{x}(t)) \end{aligned}$$
(57b)

with a non-singular descriptor matrix \(\varvec{E} \in \mathbb {R}^{n\times n}\), the vectors \(\varvec{x}(t) \in \mathbb {R}^{n}, \varvec{u}(t) \in \mathbb {R}^{m}, \varvec{y}(t) \in \mathbb {R}^{p}\), and \(\varvec{f}(\varvec{x},\varvec{u}):\mathbb {R}^{n}\times \mathbb {R}^{m}\rightarrow \mathbb {R}^{n}\) and \(\varvec{g}(\varvec{x}):\mathbb {R}^{n}\rightarrow \mathbb {R}^{p}\) nonlinear, vector-valued functions such that \(\varvec{f}(\varvec{0},\varvec{0})=\varvec{0}\) and \(\varvec{g}(\varvec{0})=\varvec{0}\). The zero equilibrium (calculated from \(\varvec{0}=\varvec{f}(\varvec{x},\varvec{0})\)) is assumed to be locally exponentially stable.

Consider the following nonlinear exogenous (signal generator) system:

$$\begin{aligned} \dot{\varvec{\zeta }}(t)&=\varvec{\varpi }(\varvec{\zeta }(t)) \quad \varvec{\zeta }(0)=\varvec{\zeta }_{0}\ne 0 \end{aligned}$$
(58a)
$$\begin{aligned} \varvec{u}(t)&=\varvec{\varrho }(\varvec{\zeta }(t)) \end{aligned}$$
(58b)

where \(\varvec{\varpi }:\mathbb {R}^{r}\rightarrow \mathbb {R}^{r}\) and \(\varvec{\varrho }:\mathbb {R}^{r}\rightarrow \mathbb {R}^{m}\) are smooth mappings such that \(\varvec{\varpi }(\varvec{0})=\varvec{0}\) and \(\varvec{\varrho }(\varvec{0})=\varvec{0}\). It is hereby assumed that the exogenous system (58) is observable at \((\varvec{\varrho },\varvec{\varpi },\varvec{\zeta _{0}})\), i.e., output trajectories corresponding to distinct initial conditions do not coincide. It is also assumed that the point \(\varvec{\zeta _{0}}\) is a stable equilibrium (in the ordinary sense of Lyapunov), so that the inputs generated by such a system remain bounded. Furthermore, the input signal \(\varvec{u}(t)\) is assumed to be persistently exciting in time [99]; this implies that every point \(\varvec{\zeta ^{o}}\) is Poisson stable, so that no trajectory decays to zero as \(t \rightarrow \infty\). When both conditions are met, the signal generator is said to be "neutrally stable" [99], i.e., if the output of such a signal generator is fed to the input of the nonlinear system (57), the steady-state output of the interconnection is guaranteed to be well-defined. The signal generator thus captures the requirement that one is interested in the behavior of Eq. (57) only in specific circumstances. The interconnection of the exogenous system (58) with the nonlinear system (57) is given as:

$$\begin{aligned} \varvec{E}\frac{d\varvec{x}(t)}{dt}&=\varvec{f} (\varvec{x}(t),\varvec{\varrho }(\varvec{\zeta }(t))) \end{aligned}$$
(59a)
$$\begin{aligned} \varvec{y}(t)&=\varvec{g }(\varvec{x}(t)) \end{aligned}$$
(59b)

The steady-state of such an interconnection is given as:

$$\begin{aligned} \varvec{y}(t)= \underbrace{\varvec{g\tau }(\varvec{x}_{0}-\varvec{v}(\varvec{\zeta _{0}}))}_{\varvec{y_{p}}(t)} +\underbrace{\varvec{gv}(\varvec{\zeta }(t))}_{\varvec{y_{h}}(t)} \end{aligned}$$
(60)

where \(\varvec{v}(\varvec{\zeta })\) is a local mapping defined in a neighborhood of \(\varvec{\zeta }=0\). Since the nonlinear system (57) is assumed to be exponentially stable, the transient response decays exponentially, i.e., \(\lim \limits _{t \rightarrow \infty }\varvec{y_{p}}(t)=0\), and \(\varvec{y_{ss}}(t)=\varvec{y_{h}}(t)=\varvec{g}(\varvec{v}(\varvec{\zeta }(t)))\) is the steady-state response of the interconnected system. Interconnecting the nonlinear system (57) with the signal generator (58) corresponds to exciting the nonlinear system with user-defined inputs.

In order to provide a generalized notion of moments for nonlinear systems, the following assumption is required (as provided in Ref. [17]).

Assumption: The unique mapping \(\varvec{v}(\varvec{\zeta })\) solves the nonlinear Sylvester equation given as:

$$\begin{aligned} \varvec{E}\dfrac{\partial \varvec{v}(\varvec{\zeta })}{\partial \varvec{\zeta }}\varvec{\varpi }(\varvec{\zeta })=\varvec{f}(\varvec{v}(\varvec{\zeta }),\varvec{\varrho }(\varvec{\zeta })) \end{aligned}$$
(61)

By center manifold theory, the interconnected system (59) possesses a locally well-defined invariant manifold at \((\varvec{x_{eq}},\varvec{\zeta _{eq}})=(\varvec{0},\varvec{0})\), given as \(\mathcal {M}=\{ (\varvec{x},\varvec{\zeta }) \in \mathbb {R}^{n+r}:\varvec{x}=\varvec{v}(\varvec{\zeta })\}\), where \(\varvec{v}(\varvec{\zeta })\) solves Eq. (61).

Definition 11

[16, 17] The moment of system (57) at \(\varvec{\varpi }(\varvec{\zeta })\) is defined by the function \(\varvec{g}(\varvec{v}(\varvec{\zeta }))\) under the aforesaid assumptions. Furthermore, the 0th nonlinear moment of system (57) at \((\varvec{\varpi }(\varvec{\zeta }(t)),\varvec{\varrho }(\varvec{\zeta }(t)),\varvec{\zeta _{0}})\) is related to the (locally well-defined) steady-state response \(\varvec{y_{ss}}(t)=\varvec{g}(\varvec{v}(\varvec{\zeta }(t)))\), where the mapping \(\varvec{v}(\varvec{\zeta })\) solves the nonlinear Sylvester equation (61).

Theorem 12

[16, 17] The moments of the nonlinear system (57) at \((\varvec{\varpi }(\varvec{\zeta }),\varvec{\varrho }(\varvec{\zeta }),\varvec{\zeta _{0}})\) coincide with the (well-defined) steady-state response of the output of the interconnected system Eq. (59) when the aforesaid assumptions hold.

Thus, similar to linear systems, a nonlinear reduced model can be defined that achieves moment-matching with the original system (57) at \((\varvec{\varpi }(\varvec{\zeta }),\varvec{\varrho }(\varvec{\zeta }),\varvec{\zeta _{0}})\).

Definition 12

[16, 17] A reduced-order model defined as:

$$\begin{aligned} \varvec{E_{r}}\frac{d\varvec{x_{r}}(t)}{dt}&=\varvec{f_{r}}(\varvec{x_{r}}(t),\varvec{u}(t)),\quad \varvec{x_{r}}(0)=\varvec{x_{r}}_{0} \end{aligned}$$
(62a)
$$\begin{aligned} \varvec{y_{r}}(t)&=\varvec{g_{r}}(\varvec{x_{r}}(t)) \end{aligned}$$
(62b)

of dimension \(r~(r\ll n)\) matches moments with the full-order system (57) at \((\varvec{\varpi }(\varvec{\zeta }),\varvec{\varrho }(\varvec{\zeta }),\varvec{\zeta _{0}})\) if the reduced nonlinear Sylvester equation given as:

$$\begin{aligned} \varvec{E_{r}}\dfrac{\partial \varvec{v_{r}}(\varvec{\zeta })}{\partial \varvec{\zeta }}\varvec{\varpi }(\varvec{\zeta })=\varvec{f_{r}}(\varvec{v_{r}}(\varvec{\zeta }),\varvec{\varrho }(\varvec{\zeta })) \end{aligned}$$
(63)

is uniquely solved by the mapping \(\varvec{v_{r}}(\varvec{\zeta })\) such that

$$\begin{aligned} \varvec{g}(\varvec{v}(\varvec{\zeta }))=\varvec{g_{r}}(\varvec{v_{r}}(\varvec{\zeta })) \end{aligned}$$
(64)

Remark 7

Thus, moment-matching for nonlinear systems can be interpreted as the exact matching of the steady-state response of the FOM and ROM (cf. Fig. 2).
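This steady-state interpretation can be made concrete by a small numerical experiment: simulate the interconnection of a signal generator with an exponentially stable nonlinear system and read off the output once the transient has died out. The sketch below uses a hypothetical scalar system \(\dot{x}=-x-x^{3}+u\) and a harmonic signal generator; the system and all names are illustrative assumptions, not taken from the reviewed works.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical toy example: a scalar exponentially stable nonlinear system
# x' = -x - x**3 + u, y = x, driven by a harmonic signal generator
# zeta' = S @ zeta, u = L @ zeta (so u(t) = cos(t) for this choice).
S = np.array([[0.0, 1.0], [-1.0, 0.0]])   # varpi(zeta) = S zeta
L = np.array([1.0, 0.0])                   # varrho(zeta) = L zeta
zeta0 = np.array([1.0, 0.0])

def interconnected(t, w):
    """Interconnected system (signal generator + nonlinear plant)."""
    zeta, x = w[:2], w[2]
    u = L @ zeta
    return np.concatenate([S @ zeta, [-x - x**3 + u]])

# Simulate past the transient; the tail of y(t) = g(x(t)) approximates
# the 0th nonlinear moment g(v(zeta(t))) along the generator's trajectory.
sol = solve_ivp(interconnected, (0.0, 60.0), np.concatenate([zeta0, [0.0]]),
                dense_output=True, rtol=1e-9, atol=1e-12)
t_tail = np.linspace(50.0, 60.0, 200)
y_ss = sol.sol(t_tail)[2]                  # steady-state output samples
print(y_ss.max(), y_ss.min())              # bounded periodic response
```

Since the generator here is periodic, the steady-state output is periodic with the same period, which is exactly the response a reduced model must reproduce to match moments at this generator.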

3.1.1 Illustration of Nonlinear Moment-Matching on MEMS Micro-Switch Model

In order to illustrate the idea of nonlinear moment-matching based MOR, we examine the time-dependent, large-amplitude dynamics of a highly nonlinear squeeze-film damping problem involving electrostatic, mechanical and fluidic components [121, 132, 138, 184]. This nonlinear benchmark model is used as a pressure sensor due to its extreme sensitivity to the surrounding atmospheric conditions [103, 172, 185]. A finite-difference discretization scheme with M points along the length and N points along the width yields a system of nonlinear ODEs of the form of Eq. (57) with dimension \((2+M)N\). The test signal (input excitation) is chosen as \(u_{\text {test}}=(3\cos (2\omega t )+7\cos (0.5\omega t))^{2}\) with \(\omega =2\pi \times 10^{4}\), and the output is taken as the center-point deflection of the beam. The full-order model of size \(n=374\) is integrated using an implicit Euler scheme, which resulted in a simulation time of 287.17 s. The reduced-order model of size \(r=15\) is constructed using the scheme proposed in Ref. [176], which involves solving a simplified version of the Sylvester equation (61). The signal generator used in the training phase is chosen as \(\dot{\varvec{\zeta }}(t)=16.5\sin (\pi t)+11\times 10^{3}\), whereas the test signal is kept the same. Figure 3 shows the full-order and the reduced-order systems' responses along with the relative error profile. It is seen that the reduced-order model effectively captures the oscillatory dynamics of the original system with a simulation time of 8.31 s.

Fig. 3

a Comparison of the output responses of FOM and ROM for the MEMS model, b error profile of the reduced model

Unlike linear moment-matching, where the interpolation points are implicitly/explicitly defined in the frequency range of interest, nonlinear moment-matching involves matching/interpolating the steady-state responses of the FOM and ROM when both are excited by user-defined signal generator(s). For a more detailed analysis of choosing the appropriate signal generator for a system at hand and the numerical aspects involved, we refer the reader to Ref. [176].
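The implicit Euler integration used for the full-order simulation above solves, at each step, the nonlinear equation \(\varvec{x}_{k+1}=\varvec{x}_{k}+h\varvec{f}(\varvec{x}_{k+1},\varvec{u}_{k+1})\), typically by Newton's method. A minimal self-contained sketch (assuming \(\varvec{E}=\varvec{I}\); the toy test system and all names are illustrative assumptions):

```python
import numpy as np

def implicit_euler(f, jac, x0, u, h, n_steps):
    """Integrate x' = f(x, u(t)) with the implicit Euler scheme.

    Each step solves x_new - x - h*f(x_new, u_k) = 0 by Newton's method;
    jac(x, u) is the Jacobian of f with respect to x.
    """
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    I = np.eye(len(x))
    for k in range(1, n_steps + 1):
        uk = u(k * h)
        x_new = x.copy()                      # predictor: previous state
        for _ in range(20):                   # Newton iterations
            res = x_new - x - h * f(x_new, uk)
            if np.linalg.norm(res) < 1e-12:
                break
            x_new -= np.linalg.solve(I - h * jac(x_new, uk), res)
        x = x_new
        traj.append(x.copy())
    return np.array(traj)

# Toy usage: stiff scalar decay x' = -50 x + u with u = 1 settles to 1/50.
f = lambda x, u: -50.0 * x + u
jac = lambda x, u: np.array([[-50.0]])
traj = implicit_euler(f, jac, [1.0], lambda t: np.array([1.0]), 0.01, 500)
print(traj[-1])   # approaches the steady state 0.02
```

The unconditional stability of implicit Euler is what makes it a common choice for stiff discretized MEMS models, at the price of one Newton solve per step.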

3.2 Other Nonlinear MOR Methods

Besides the time-domain moment-matching method for nonlinear systems described above, there are other reduction schemes that use Krylov subspaces to obtain the projection basis. One of the simplest is the quadratic method [51]; a more accurate alternative is the bilinearization method by Bai et al. [22, 143]. These methods are extensions of linear moment-matching methods and fall in the category of frequency-domain MOR methods. They either approximate the underlying nonlinearity by a polynomial of low degree [22, 51, 64, 143, 144] or transform the system into a quadratic-bilinear system [35, 86, 87]. Furthermore, there exist methods that involve efficient evaluation of the nonlinear function \(\varvec{f}(.)\), such as the empirical interpolation method by Barrault et al. [26], the discrete empirical interpolation method by Chaturantabut and Sorensen [50], the best point interpolation method by Nguyen et al. [134] and the missing point estimation by Astrid et al. [19]. Here we discuss some of the methods most effective in practice.

3.2.1 Quadratic Method

Proposed by Chen [51], this method approximates the nonlinear function \(\varvec{f}(.)\) by a Taylor expansion written in Kronecker-product form in \(\varvec{x}\), which is given as:

$$\begin{aligned} \varvec{f}(\varvec{x}(t))=&\varvec{f}(0)+\varvec{A_{1}x}(t)+\varvec{A_{2}}(\varvec{x}(t)\otimes \varvec{x}(t))\\&+\varvec{A_{3}}(\varvec{x}(t)\otimes \varvec{x}(t)\otimes \varvec{x}(t))+\cdots \end{aligned}$$
(65)

where the matrices \(\varvec{A_{i}} \in \mathbb {R}^{n \times n^{i}}\) denote the \(i\)th-order derivatives of \(\varvec{f}\) at \(\varvec{x}=0\), written in Kronecker form. The quadratic system thus obtained is given as:

$$\begin{aligned} \dot{\varvec{x}}(t)&=\varvec{f}(0)+\varvec{A_{1}x}(t)+\varvec{A_{2}}(\varvec{x}(t)\otimes \varvec{x}(t))+\varvec{Bu}(t)\\ \varvec{y}(t)&=\varvec{Cx}(t) \end{aligned}$$
(66)

The projection matrix \(\varvec{V}\), in this method, is obtained as the basis of the Krylov subspace given as:

$$\begin{aligned} \text {colspan}(\varvec{V})=\mathcal {K}_{r}(\varvec{A_{1}}^{-1},\varvec{A_{1}}^{-1}\varvec{B}) \end{aligned}$$
(67)

and the reduced quadratic model is obtained by using the one-sided approximation ansatz \(\varvec{x}\approx \varvec{Vx}_{r}\) in Eq. (66) given as:

$$\begin{aligned} \dot{\varvec{x}_{r}}(t)&=\varvec{V^{T}A_{1}Vx_{r}}(t)+\varvec{V^{T}A_{2}}(\varvec{Vx_{r}}(t)\otimes \varvec{Vx_{r}}(t))\\ {}&+\varvec{V^{T}Bu}(t)\\ \varvec{y}(t)&=\varvec{CVx_{r}}(t) \end{aligned}$$
(68)

It is assumed that \(\varvec{f}(0)=0\); otherwise, this term is treated as part of the input signal.
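A minimal sketch of this procedure: build an orthonormal basis of the Krylov subspace of Eq. (67) by an Arnoldi-type process and form the reduced matrices of Eq. (68) by one-sided (Galerkin) projection. The random test system below and the helper names are hypothetical.

```python
import numpy as np

def arnoldi(Ainv_mul, b, r):
    """Orthonormal basis of K_r(A^{-1}, A^{-1} b) via an Arnoldi process."""
    v = Ainv_mul(b)
    V = [v / np.linalg.norm(v)]
    for _ in range(r - 1):
        w = Ainv_mul(V[-1])
        for vi in V:                       # modified Gram-Schmidt
            w -= (vi @ w) * vi
        nrm = np.linalg.norm(w)
        if nrm < 1e-12:
            break                          # subspace exhausted early
        V.append(w / nrm)
    return np.column_stack(V)

# Hypothetical random quadratic system (stable A1), Eq. (66)-style data.
rng = np.random.default_rng(0)
n, r = 50, 6
A1 = -np.eye(n) + 0.1 * rng.standard_normal((n, n))
A2 = 0.01 * rng.standard_normal((n, n * n))   # quadratic term, Kronecker form
B = rng.standard_normal(n)

V = arnoldi(lambda x: np.linalg.solve(A1, x), B, r)   # basis of Eq. (67)
A1r = V.T @ A1 @ V                                    # reduced matrices, Eq. (68)
Br = V.T @ B
# Reduced quadratic term acts as V^T A2 (V x_r kron V x_r):
f2r = lambda xr: V.T @ (A2 @ np.kron(V @ xr, V @ xr))
```

Note that the quadratic term is never reduced to a small stored matrix here; evaluating it still involves the lifted Kronecker product, which is one of the known costs of this approach.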

3.2.2 Bilinearization Method

Based on the Carleman linearization of nonlinear systems [159], the bilinear form was proposed in Refs. [22, 143]. It uses the terms of the series in Eq. (65) up to the quadratic term to approximate the nonlinear function as:

$$\begin{aligned} \varvec{f}(\varvec{x}(t))\approx \varvec{f}(0)+\varvec{A_{1}x}(t)+\varvec{A_{2}}(\varvec{x}(t)\otimes \varvec{x}(t)) \end{aligned}$$

As such the bilinear, input-affine system is obtained as follows:

$$\begin{aligned} \dot{\varvec{x}}_{\otimes }(t)&=\varvec{A}_{\otimes }\varvec{x}_{\otimes }+\varvec{N}_{\otimes }\varvec{x}_{\otimes }\varvec{u}(t)+\varvec{B}_{\otimes }\varvec{u}(t) \end{aligned}$$
(69a)
$$\begin{aligned} \varvec{y}(t)&=\varvec{C}_{\otimes }\varvec{x}_{\otimes } \end{aligned}$$
(69b)

where

$$\begin{aligned} \varvec{x}_{\otimes }=\begin{bmatrix} \varvec{x}(t)\\ \varvec{x}(t)\otimes \varvec{x}(t) \end{bmatrix},\quad \varvec{A}_{\otimes }=\begin{bmatrix} \varvec{A_{1}} & \varvec{A_{2}}\\ \varvec{0} & \varvec{A_{1}}\otimes \varvec{I}+\varvec{I}\otimes \varvec{A_{1}} \end{bmatrix},\\ \varvec{N}_{\otimes }=\begin{bmatrix} \varvec{0} & \varvec{0}\\ \varvec{B}\otimes \varvec{I}+\varvec{I}\otimes \varvec{B} & \varvec{0} \end{bmatrix},\quad \varvec{B}_{\otimes }=\begin{bmatrix} \varvec{B}\\ \varvec{0} \end{bmatrix},\quad \varvec{C}_{\otimes }^{T}=\begin{bmatrix} \varvec{C}^{T}\\ \varvec{0} \end{bmatrix} \end{aligned}$$

After transforming the nonlinear system to the bilinear form of Eq. (69), there are several choices for calculating the projection basis. One particular choice was proposed in Ref. [143]. It was shown that, after constructing a series of Krylov subspaces for given \(r_{1},r_{2},\ldots ,r_{K}\), as follows:

$$\begin{aligned} \text {colspace}(\varvec{V_{1}})=\mathcal {K}_{r_{1}}(\varvec{A}_{\otimes }^{-1},\varvec{B}_{\otimes }) \end{aligned}$$
(70)

and

$$\begin{aligned} \text {colspace}(\varvec{V_{k}})=\mathcal {K}_{r_{k}}(\varvec{A}_{\otimes }^{-1},\varvec{N}_{\otimes }\varvec{V_{k-1}}) \end{aligned}$$
(71)

for \(1<k\le K\), the matrix \(\varvec{V}\) is calculated by taking the union of all the subspaces spanned by the columns of \(\varvec{V_{k}}\), i.e.,

$$\begin{aligned} \text {colspan}(\varvec{V})=\bigcup _{k=1}^{K}\{\varvec{V_{k}}\} \end{aligned}$$
(72)

Another choice of the transformation matrix \(\varvec{V}\) was proposed in Ref. [22], which matches as many multimoments as possible for the given reduced order and is given as:

$$\begin{aligned} \text {colspace}(\varvec{V_{1}})=\mathcal {K}_{r_{1}}(\varvec{A}_{\otimes }^{-1},\varvec{A}_{\otimes }^{-1}\varvec{B}_{\otimes }) \end{aligned}$$
(73)

and for \(k>1\)

$$\begin{aligned} \text {colspace}(\varvec{V_{k}})=\mathcal {K}_{r_{k}}(\varvec{A}_{\otimes }^{-1},\varvec{A}_{\otimes }^{-1}\varvec{N}_{\otimes }\varvec{V_{k-1}}) \end{aligned}$$
(74)

and the final transformation matrix \(\varvec{V}\) is constructed in a similar fashion, i.e.,

$$\begin{aligned} \text {colspan}(\varvec{V})=\bigcup _{k=1}^{K}\{\varvec{V_{k}}\} \end{aligned}$$
(75)

After constructing the matrix \(\varvec{V}\), the reduced bilinear system is then obtained by using the approximation \(\varvec{x}_{\otimes }=\varvec{V}\hat{\varvec{x}}_{\otimes }\) in system (69) given as:

$$\begin{aligned} \dot{\hat{\varvec{x}}}_{\otimes }(t)&=\hat{\varvec{A}}_{\otimes }\hat{\varvec{x}}_{\otimes }+\hat{\varvec{N}}_{\otimes }\hat{\varvec{x}}_{\otimes }\varvec{u}(t)+\hat{\varvec{B}}_{\otimes }\varvec{u}(t) \end{aligned}$$
(76a)
$$\begin{aligned} \varvec{y}(t)&=\hat{\varvec{C}}_{\otimes }\hat{\varvec{x}}_{\otimes } \end{aligned}$$
(76b)

where \(\hat{\varvec{A}}_{\otimes }=\varvec{V}^{T}\varvec{A}_{\otimes }\varvec{V}\), \(\hat{\varvec{N}}_{\otimes }=\varvec{V}^{T}\varvec{N}_{\otimes }\varvec{V}\), \(\hat{\varvec{B}}_{\otimes }=\varvec{V}^{T}\varvec{B}_{\otimes }\) and \(\hat{\varvec{C}}_{\otimes }=\varvec{C}_{\otimes }\varvec{V}\).
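The lifted matrices of Eq. (69) can be assembled directly with Kronecker products. The sketch below mirrors the block structure above for single-input, single-output quadratic data; the helper name `carleman_bilinear` and the small random test system are assumptions for illustration.

```python
import numpy as np

def carleman_bilinear(A1, A2, B, C):
    """Assemble the lifted bilinear system (69) from quadratic data.

    x_kron = [x; x kron x]; shapes: A1 (n,n), A2 (n,n^2), B (n,), C (n,).
    """
    n = A1.shape[0]
    I = np.eye(n)
    N = n + n * n                       # lifted state dimension
    A_k = np.zeros((N, N))
    A_k[:n, :n] = A1
    A_k[:n, n:] = A2
    A_k[n:, n:] = np.kron(A1, I) + np.kron(I, A1)
    N_k = np.zeros((N, N))
    N_k[n:, :n] = np.kron(B.reshape(-1, 1), I) + np.kron(I, B.reshape(-1, 1))
    B_k = np.concatenate([B, np.zeros(n * n)])
    C_k = np.concatenate([C, np.zeros(n * n)])
    return A_k, N_k, B_k, C_k

# Sanity check on a small hypothetical system (n = 3, lifted dim 12).
n = 3
rng = np.random.default_rng(1)
A1 = -np.eye(n)
A2 = 0.1 * rng.standard_normal((n, n * n))
B = rng.standard_normal(n)
C = rng.standard_normal(n)
A_k, N_k, B_k, C_k = carleman_bilinear(A1, A2, B, C)
print(A_k.shape, N_k.shape)   # (12, 12) (12, 12)
```

The quadratic growth of the lifted dimension (\(n+n^{2}\)) is visible even in this toy case, which is why the subsequent Krylov projection of Eqs. (70)-(75) is essential in practice.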

Another way of reducing bilinear systems is balanced truncation (BT) for bilinear systems, originally developed by Achar and Nakhla [2] and later extended by Condon and Ivanov [126] and by Benner et al. [37]. This method is similar to the standard BT method for linear systems. Although it outperforms moment-matching methods in terms of approximation quality, it involves solving generalized Lyapunov equations, which is computationally demanding.

3.2.3 Trajectory Piecewise Linear (TPWL) Method

The TPWL method replaces the polynomial approximation of the nonlinear function with a piecewise-linear one, which offers more robustness when dealing with strong nonlinearities. The idea is to linearize the nonlinear function \(\varvec{f}(.)\) at a number of linearization points \(\varvec{x}_{i}\ (i=0,1,\ldots ,l)\) along the trajectory generated by some training input. The nonlinear function \(\varvec{f}(.)\) is then approximated by the weighted sum of these linear models, i.e., \(\varvec{f}(\varvec{x}(t))\approx \sum _{i=0}^{l}w_{i}(\varvec{f}(\varvec{x}_{i})+\varvec{A}_{i}(\varvec{x}(t)-\varvec{x}_{i}))\), and the original system is represented as:

$$\begin{aligned} \dot{\varvec{x}}(t)&=\sum _{i=0}^{l}w_{i}(\varvec{x}(t))\varvec{f}(\varvec{x}_{i})\\&+\sum _{i=0}^{l}w_{i}(\varvec{x}(t))\varvec{A}_{i}(\varvec{x}(t)-\varvec{x}_{i})+\varvec{Bu}(t)\\ \varvec{y}(t)&=\varvec{Cx}(t) \end{aligned}$$
(77)

where \(w_{i}(\varvec{x}),\ i=0,1,\ldots ,l\), are state-dependent weights and \(\varvec{A}_{i}\) is the Jacobian matrix of \(\varvec{f}(.)\) at \(\varvec{x}_{i}\). The projection matrix \(\varvec{V}\), in this case, is obtained as an orthonormal basis of the following Krylov subspace, which spans the reduced state-space:

$$\begin{aligned} \text {colspan}(\varvec{V})=\text {span}\{\varvec{A}_{0}^{-1}\varvec{B},\ldots ,\varvec{A}_{0}^{-(r-1)}\varvec{B}\} \end{aligned}$$
(78)

The reduction is then performed by using the approximation ansatz \(\varvec{x}(t)=\varvec{Vx_{r}}(t)\) in Eq. (77).

$$\begin{aligned} \dot{\varvec{x}_{r}}(t)&=\left( \sum _{i=0}^{l} \varvec{V}^{T}\varvec{A}_{i}\varvec{V} \tilde{w}_{i}(\varvec{x_{r}}(t))\right) \varvec{x_{r}}(t)\\&+\sum _{i=0}^{l}\varvec{V}^{T}(\varvec{f}(\varvec{x}_{i})-\varvec{A}_{i}\varvec{x}_{i})\tilde{w}_{i}(\varvec{x_{r}}(t))+\varvec{V}^{T}\varvec{Bu}(t)\\ \varvec{y}(t)&=\varvec{CVx_{r}}(t) \end{aligned}$$

Here, the weights \(\tilde{w}_{i}\) depend on \(\varvec{x_{r}}\) and can be computed from the distances \(||\varvec{x_{r}}-\varvec{x_{r}}_{i}||\) of the projected linearization points \(\varvec{x_{r}}_{i}=\varvec{V}^{T}\varvec{x}_{i}\) from the current reduced state \(\varvec{x_{r}}\). An error bound for TPWL was proposed in Ref. [154], along with some discussion on stability and passivity preservation. A downside of this method, however, is the choice of an appropriate training input. The reduced model loses accuracy if the training inputs differ significantly from the actual inputs, because the computed trajectory then departs from the actual behavior of the state vector \(\varvec{x}(t)\).
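The distance-based weighting and the resulting reduced right-hand side can be sketched as follows. The normalized exponential weights are one common heuristic choice; the parameter `beta`, the helper names, and the two-point test data are assumptions for illustration, not taken from Ref. [154].

```python
import numpy as np

def tpwl_weights(xr, xr_pts, beta=25.0):
    """Normalized exponential weights from distances to the projected
    linearization points (a common heuristic choice)."""
    d = np.array([np.linalg.norm(xr - xi) for xi in xr_pts])
    w = np.exp(-beta * d / (d.min() + 1e-30))
    return w / w.sum()

def tpwl_rhs(xr, u, Ar_list, cr_list, Br, xr_pts):
    """Reduced TPWL right-hand side: weighted sum of reduced linear
    models, with cr_i = V^T (f(x_i) - A_i x_i); SISO input assumed."""
    w = tpwl_weights(xr, xr_pts)
    out = Br * u
    for wi, Ari, cri in zip(w, Ar_list, cr_list):
        out = out + wi * (Ari @ xr + cri)
    return out

# Hypothetical 2-point example in a reduced space of dimension 2.
Ar_list = [np.array([[-1.0, 0.0], [0.0, -2.0]]),
           np.array([[-1.5, 0.1], [0.0, -1.0]])]
cr_list = [np.zeros(2), np.zeros(2)]
xr_pts = [np.zeros(2), np.array([1.0, 0.0])]
Br = np.array([1.0, 0.5])
rhs = tpwl_rhs(np.array([0.2, 0.1]), 1.0, Ar_list, cr_list, Br, xr_pts)
print(rhs)
```

Near a linearization point the nearest model dominates (its weight approaches one), so the reduced dynamics smoothly blend between the precomputed linear models as the state moves along the trajectory.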

Continuing this line of work, a piecewise-polynomial approximation was proposed by Dong and Roychowdhury [55] instead of a linear approximation for each piece. Similar ideas later emerged in various studies, such as Ref. [177]. A more detailed review of piecewise-linearization based reduction methods can be found in Ref. [95].

3.3 Issues in Nonlinear MOR

Although nonlinear MOR methods have seen prominent success, some unresolved issues remain, listed as follows:

1. A common issue among all nonlinear reduction methods is the lack of practical global error bounds for the reduced-order model.

2. The nonlinear moment-matching based MOR method presented in Refs. [15,16,17,18] involves the a priori solution of the nonlinear, state-dependent, Sylvester-like PDE (61), which is computationally expensive.

3. The choice of the signal generator in NLMM [176] is heuristic and requires repeated simulations.

4. There is no a priori knowledge of acceptable training inputs and linearization points in the TPWL method.

5. An optimal choice for the number of multimoments in the bilinearization methods is not known.

4 Concluding Remarks

The idea of automatically extracting reduced-order models from large-scale dynamical systems is a central challenge. By and large, reduced-order modeling via moment-matching techniques for linear systems is relatively well explored, and a suite of efficient algorithms is at our disposal. However, constructing optimal approximations of large-scale models while preserving passivity or stability remains elusive. Furthermore, the systems arising in the modeling of complex fluids or in the simulation of electronic circuits are inherently nonlinear, so employing nonlinear reduction techniques directly on these systems is desirable. Although the nonlinear model order reduction area has witnessed some notable achievements in the past few years, certain issues (as discussed) remain challenging. Moment-matching-based MOR methods have a long history. Numerous modifications and advancements have appeared from time to time to make this technique a more robust platform for treating problems in large-scale settings. Some of these modifications are currently a work in progress, whereas others are yet to happen.

In this survey, we have reviewed the model order reduction problem from the moment-matching perspective. Both frequency- and time-domain notions of the moment-matching technique are discussed. We have pointed out the differences and similarities among various methods, and highlighted several open problems for both linear and nonlinear reduction methods.