1 Introduction

In many mathematical and engineering applications, a phenomenon of interest can be summarized in different ways. For instance, to describe the state of a two-dimensional incompressible fluid flow, one can either record velocity and pressure fields or stream function and vorticity (Hirsch 2007). Furthermore, these states can often be approximated using a low-dimensional set of proper orthogonal decomposition (POD) modes (Holmes et al. 1998), a set of dynamic modes (Schmid 2010; Schmid et al. 2012), or a finite collection of Lagrangian particles (Monaghan 1992). A mathematical example is the linear time-invariant (LTI) system provided by \(\varvec{x}(n+1) = \varvec{A} \varvec{x}(n)\), where \(\varvec{x}(n)\) is the system state at the nth timestep. Written as such, the evolution of \(\varvec{x}\) is governed by the eigenvalues of \(\varvec{A}\). One could also consider the invertible but nonlinear change in variables, \(\varvec{z}(n) = \varvec{T}(\varvec{x}(n))\), which generates a nonlinear evolution law for \(\varvec{z}\). Both approaches (i.e., \(\varvec{x}\) or \(\varvec{z}\)) describe the same fundamental behavior, yet one description may be preferable to others. For example, solving an LTI system is almost certainly preferable to evolving a nonlinear system from a computational standpoint.

In general, one measures (or computes) the state of a system using a set of scalar observables, which are functions defined on state space, and watches how the values of these functions evolve in time. As we will show shortly, one can write an evolution law for the dynamics of this set of observables, and if they happen to be “rich enough,” reconstruct the original system state from the observations. Because the properties of this new dynamical system depend on our choice of variables (observables), it would be highly desirable if one could find a set of observables whose dynamics appear to be governed by a linear evolution law. If such a set could be identified, the dynamics would be completely determined by the spectrum of the evolution operator. Furthermore, this could enable the simple yet effective algorithms designed for linear systems, for example controller design (Todorov 2007; Stengel 2012) or stability analysis (Mauroy and Mezic 2013; Lehoucq et al. 1998), to be applied to nonlinear systems.

Mathematically, the evolution of observables of the system state is governed by the Koopman operator (Koopman and Neumann 1932; Koopman 1931; Budišić et al. 2012; Rowley et al. 2009), which is a linear but infinite-dimensional operator that is defined for a given dynamical system. Of particular interest here is the “slow” subspace of the Koopman operator, which is the span of the eigenfunctions associated with eigenvalues near the unit circle in discrete time (or near the imaginary axis in continuous time). These eigenvalues and eigenfunctions capture the long-term dynamics of observables that appear after the fast transients have subsided and could serve as a low-dimensional approximation of the otherwise infinite-dimensional operator when a spectral gap, which clearly delineates the “fast” and “slow” temporal dynamics, is present. In addition to the eigenvalues and eigenfunctions, the final element of Koopman spectral analysis is the set of Koopman modes for the full-state observable (i.e., the identity operator) (Budišić et al. 2012; Rowley et al. 2009) which are vectors that enable us to reconstruct the state of the system as a linear combination of the Koopman eigenfunctions. Overall, the “tuples” of Koopman eigenfunctions, eigenvalues, and modes enable us to: (a) transform state space so that the dynamics appear to be linear, (b) determine the temporal dynamics of the linear system, and (c) reconstruct the state of the original system from our new linear representation. In principle, this framework is quite broadly applicable and useful even for problems with multiple attractors that cannot be accurately approximated using models based on local linearization.

There are several algorithms in the literature that can computationally approximate subsets of these quantities. Three examples are numerical implementations of generalized Laplace analysis (GLA) (Budišić et al. 2012; Mauroy and Mezić 2012; Mauroy et al. 2013), the Ulam Galerkin method (Froyland et al. 2014; Bollt and Santitissadeekorn 2013), and dynamic mode decomposition (DMD) (Schmid 2010; Tu et al. 2014; Rowley et al. 2009). None of these techniques require explicit governing equations, so all, in principle, can be applied directly to data. GLA can approximate both the Koopman modes and eigenfunctions, but it requires knowledge of the eigenvalues to do so (Budišić et al. 2012; Mauroy and Mezić 2012; Mauroy et al. 2013). The Ulam Galerkin method has been used to approximate the eigenfunctions and eigenvalues (Froyland et al. 2014), though it is more frequently used to generate finite-dimensional approximations of the Perron–Frobenius operator, which is the adjoint of the Koopman operator. Finally, DMD has been used to approximate the Koopman modes and eigenvalues (Rowley et al. 2009; Tu et al. 2014), but not the Koopman eigenfunctions.

Even in pairs instead of triplets, approximations of these quantities are useful. DMD and its variants (Wynn et al. 2013; Chen et al. 2012; Jovanović et al. 2014) have been successfully used to analyze nonlinear fluid flows using data from both experiments and computation (Schmid 2010; Muld et al. 2012; Seena and Sung 2011). GLA and similar methods have been applied to extract meaningful spatio-temporal structures using sensor data from buildings and power systems (Eisenhower et al. 2010; Susuki and Mezić 2011, 2012, 2014). Finally, the Ulam Galerkin method has been used to identify coherent structures and almost invariant sets (Froyland et al. 2007; Froyland and Padberg 2009; Froyland 2005) based on the singular value decomposition of (a slight modification of) the Perron–Frobenius operator.

In this manuscript, we present a data-driven method that approximates the leading Koopman eigenfunctions, eigenvalues, and modes from a data set of successive “snapshot” pairs and a dictionary of observables that spans a subspace of the space of scalar observables. There are many possible ways to choose this dictionary, and it could be composed of polynomials, Fourier modes, spectral elements, or other sets of functions of the full-state observable. We will argue that this approach is an extension of DMD that can produce better approximations of the Koopman eigenfunctions; as such, we refer to it as extended dynamic mode decomposition (EDMD). One regime where the behavior of both EDMD and DMD can be formally analyzed and contrasted is the limit of large data. Following the definition of DMD in Tu et al. (2014), we consider the case where this large data set consists of a distribution of snapshot pairs rather than a single time series. In this regime, we will show that the numerical approximation of the Koopman eigenfunctions generated by EDMD converges to the numerical approximation we would obtain from a Galerkin method (Boyd 2013) in that the residual is orthogonal to the subspace spanned by the elements of the dictionary. With finite amounts of data, we will demonstrate the effectiveness of EDMD on two deterministic examples: one that highlights the quantitative accuracy of the method and the other a more practical application.

Because EDMD is an entirely data-driven procedure, it can also be applied to data from stochastic systems without any algorithmic changes. If the underlying system is a Markov process, we will show that EDMD approximates the eigenfunctions of the Kolmogorov backward equation (Givon et al. 2004; Bagheri 2014), which has been called the stochastic Koopman operator (SKO) (Mezić 2005). Once again, we will demonstrate the effectiveness of the EDMD procedure when the amount of data is limited by applying it to two stochastic examples: the first to test the accuracy of the method and the second to highlight a potential application of EDMD as a nonlinear manifold learning technique. In the latter example, we highlight two forms of model reduction: reduction that occurs when the dynamics of the system state are constrained to a low-dimensional manifold and reduction that occurs when the evolution of the statistical moments of the stochastic dynamical system is effectively low dimensional.

In the remainder of the manuscript, we will detail the EDMD algorithm and show (when mathematically possible) or demonstrate through examples that it accurately approximates the leading Koopman eigenfunctions, eigenvalues, and modes for both deterministic and stochastic sets of data. In particular, in Sect. 2, the EDMD algorithm will be presented, and we will prove that it converges to a Galerkin approximation of the Koopman operator given a sufficiently large amount of data. In Sect. 3, we detail three choices of dictionary that we have found to be effective in a broad set of applications. In Sect. 4, we will demonstrate that the EDMD approximation can be accurate even with finite amounts of data and can yield useful parameterizations of common dynamical structures in problems with multiple basins of attraction when the underlying system is deterministic. In Sect. 5, we experiment by applying EDMD to stochastic data and show it approximates the eigenfunctions of the SKO for Markov processes. Though the interpretation of the eigenfunctions now differs, we demonstrate that they can still be used to accomplish useful tasks such as the parameterization of nonlinear manifolds. Finally, some brief concluding remarks are given in Sect. 6.

2 Dynamic Mode Decomposition and the Koopman Operator

Our ambition in this section is to establish the connection between the Koopman operator and what we call EDMD. To accomplish this, we will define the Koopman operator in Sect. 2.1. Using this definition, we will outline the EDMD algorithm in Sect. 2.2 and then show how it can be used to approximate the Koopman eigenvalues, eigenfunctions, and modes. Next, in Sect. 2.3, we will prove that the EDMD method almost surely converges to a Galerkin method in the limit of large data. Finally, in Sect. 2.4, we will highlight the connection between the EDMD algorithm and standard DMD.

2.1 The Koopman Operator

Because the Koopman operator is central to all that follows, we will define it along with the properties relevant to EDMD in this subsection. No new mathematical results are presented here; our only objective is to include for completeness the terms and definitions we will require later in the paper. In this manuscript, we consider the case where the dynamics of interest are governed by an autonomous, discrete-time dynamical system \(({\mathcal {M}}, n, \varvec{F})\), where \({\mathcal {M}}\subseteq \mathbb {R}^N\) is the state space, \(n\in \mathbb {Z}\) is (discrete) time, and \(\varvec{F}:{\mathcal {M}}\rightarrow {\mathcal {M}}\) is the evolution operator. Unlike \(\varvec{F}\), which acts on \(\varvec{x} \in {\mathcal {M}}\), the Koopman operator, \(\mathcal {K}\), acts on functions of state space, \(\psi \in {\mathcal {F}}\) with \(\psi :{\mathcal {M}}\rightarrow \mathbb {C}\). The action of the Koopman operator is

$$\begin{aligned} \mathcal {K}\psi = \psi \circ \varvec{F}, \end{aligned}$$
(1)

where \(\circ \) denotes the composition of \(\psi \) with \(\varvec{F}\). We stress once again that the Koopman operator maps functions of state space to functions of state space and not states to states (Koopman and Neumann 1932; Koopman 1931; Budišić et al. 2012).

In essence, the Koopman operator defines a new dynamical system, \(({\mathcal {F}}, n, \mathcal {K})\), that governs the evolution of observables, \(\psi \in {\mathcal {F}}\), in discrete time. In what follows, we assume that \({\mathcal {F}}= L^2({\mathcal {M}}, \rho )\), where \(\rho \) is a positive, single-valued analytic function with \(\Vert \rho \Vert _{\mathcal {M}}= \int _{\mathcal {M}}\rho (\varvec{x}) \;d\varvec{x} = 1\), but not necessarily an invariant measure of the underlying dynamical system. This assumption, which has been made before in the literature (Budišić et al. 2012; Koopman 1931), is required so that the inner products in the Galerkin-like method we will present can be taken. Because it acts on functions, \(\mathcal {K}\) is infinite dimensional even when \(\varvec{F}\) is finite dimensional, but provided that \({\mathcal {F}}\) is a vector space, it is also linear even when \(\varvec{F}\) is nonlinear. The infinite-dimensional nature of the Koopman operator is potentially problematic, but if it can, practically, be truncated without too great a loss of accuracy (e.g., if the system has multiple timescales), then the result would be a linear and finite-dimensional approximation. Therefore, the promise of the Koopman approach is to take the tools developed for linear systems and apply them to the dynamical system defined by the Koopman operator, thus obtaining a linear approximation of a nonlinear system without directly linearizing around a particular fixed point.

The dynamical system defined by \(\varvec{F}\) and the one defined by \(\mathcal {K}\) are two different parameterizations of the same fundamental behavior. The link between these parameterizations is the identity operator or “full-state observable,” \(\varvec{g}(\varvec{x}) = \varvec{x}\), together with \(\{(\mu _k, \varphi _k, \varvec{v}_k)\}_{k=1}^{N_K}\), the set of \({N_K}\) “tuples” of Koopman eigenvalues, eigenfunctions, and modes required to reconstruct the full state. Note that \({N_K}\) could (and often will) be infinite. Although \(\varvec{g}\) is a vector-valued observable, each component of it is a scalar-valued observable, i.e., \(g_i\in {\mathcal {F}}\) where \(g_i\) is the ith component of \(\varvec{g}\). Assuming \(g_i\) is in the span of our set of \({N_K}\) eigenfunctions, \(g_i = \sum _{k=1}^{N_K} v_{ik}\varphi _k\) with \(v_{ik}\in \mathbb {C}\). Then, \(\varvec{g}\) can be obtained by “stacking” these weights into vectors (i.e., \(\varvec{v}_k = [v_{1k}, v_{2k},\ldots , v_{Nk}]^T\)). As a result,

$$\begin{aligned} \varvec{x} = \varvec{g}(\varvec{x}) = \sum _{k=1}^{N_K} \varvec{v}_k \varphi _k(\varvec{x}), \end{aligned}$$
(2)

where \(\varvec{v}_k\) is the kth Koopman mode and \(\varphi _k\) is the kth Koopman eigenfunction. In doing this, we have assumed that each of the scalar observables that comprise \(\varvec{g}\) is in the subspace of \({\mathcal {F}}\) spanned by our \({N_K}\) eigenfunctions, but we have not assumed that the eigenfunctions form a basis for \({\mathcal {F}}\). In some applications, the Koopman operator will have a continuous spectrum, which, as shown in Mezić (2005), would introduce an additional term in (2). The system state at future times can be obtained either by directly evolving \(\varvec{x}\) or by evolving the full-state observable through the Koopman operator:

$$\begin{aligned} \varvec{F}(\varvec{x})&= \left( \mathcal {K}\varvec{g}\right) (\varvec{x}) = \sum _{k=1}^{N_K} \varvec{v}_k (\mathcal {K}\varphi _k)(\varvec{x}) = \sum _{k=1}^{N_K} \mu _k \varvec{v}_k \varphi _k(\varvec{x}). \end{aligned}$$
(3)

This representation of \(\varvec{F}(\varvec{x})\) is particularly advantageous because the dynamics associated with each eigenfunction are determined by its corresponding eigenvalue.
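
In particular, because \(\mathcal {K}\varphi _k = \mu _k\varphi _k\), applying (3) repeatedly gives the state after n steps directly,

$$\begin{aligned} \varvec{F}^n(\varvec{x}) = \left( \mathcal {K}^n\varvec{g}\right) (\varvec{x}) = \sum _{k=1}^{N_K} \mu _k^n \varvec{v}_k \varphi _k(\varvec{x}), \end{aligned}$$

so the contribution of each eigenfunction grows or decays geometrically at a rate set by \(|\mu _k|\); this is why the eigenvalues near the unit circle govern the “slow” dynamics discussed in Sect. 1.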

Figure 1 shows a commutative diagram that acts as a visual summary of this section. The top row shows the direct evolution of states, \(\varvec{x} \in {\mathcal {M}}\), governed by \(\varvec{F}\); the bottom row shows the evolution of observables, \(\psi \in {\mathcal {F}}\), governed by the Koopman operator. Although \(\varvec{F}\) and \(\mathcal {K}\) act on different spaces, they encapsulate the same dynamics. For example, once given a state \(\varvec{x}\), to compute \((\mathcal {K}\psi )(\varvec{x})\) one could either take the observable \(\psi \), apply \(\mathcal {K}\), and evaluate it at \(\varvec{x}\) (the bottom route), or use \(\varvec{F}\) to compute \(\varvec{F}(\varvec{x})\) and then evaluate \(\psi \) at this updated position (the top route). Similarly, to compute \(\varvec{F}(\varvec{x})\), one could either apply \(\varvec{F}\) to \(\varvec{x}\) (the top route) or apply \(\mathcal {K}\) to the full-state observable and evaluate \((\mathcal {K}\varvec{g})(\varvec{x})\) (the bottom route). As a result, one can either choose to work with a finite-dimensional, nonlinear system or an infinite-dimensional, linear system depending upon which “path” is simpler/more useful for a given problem.

Fig. 1

A cartoon of the Koopman operator and how it relates to the underlying dynamical system. The top path updates the state, \(\varvec{x}\in {\mathcal {M}}\), using the evolution operator \(\varvec{F}\). The bottom path updates the observables, \(\psi \in {\mathcal {F}}\), using the Koopman operator, \(\mathcal {K}\). Here, both dynamical systems are autonomous, so the (discrete) time, \(n\in \mathbb {Z}\), does not appear explicitly. The connection between the states and observables is through the full-state observable \(\varvec{g}(\varvec{x}) = \varvec{x}\). By writing \(\varvec{g}\) in terms of the Koopman eigenfunctions, we replace the complicated evolution of \(\varvec{x}\) with the straightforward, linear evolution of the \({\varphi _i}\). To reconstruct \(\varvec{x}\), we superimpose the Koopman eigenfunctions evaluated at a point, which satisfy \((\mathcal {K}\varphi _{i})(\varvec{x}) = \mu _i\varphi _i(\varvec{x})\), using the Koopman modes as shown in (3). As a result, these two “paths” commute, and one can either solve a finite-dimensional but nonlinear problem (the top path) or an infinite-dimensional but linear problem (the bottom path) if one can compute the Koopman eigenvalues, eigenfunctions, and modes

We should note that for autonomous systems of ordinary differential equations (ODEs), such as

$$\begin{aligned} \dot{\varvec{x}} = \varvec{f}(\varvec{x}), \end{aligned}$$
(4)

there is a semigroup of Koopman operators, \(\mathcal {K}_{\Delta t}\), each of which is associated with the flow map for the time interval \(\Delta t\) (Budišić et al. 2012; Santitissadeekorn and Bollt 2007). The infinitesimal generator of this semigroup is

$$\begin{aligned} \hat{\mathcal {K}} \psi = \varvec{f} \cdot \nabla \psi , \end{aligned}$$
(5)

which we will refer to as the continuous-time Koopman operator. If \(\varphi _k\) is the kth eigenfunction of \(\hat{\mathcal {K}}\) associated with the eigenvalue \(\lambda _k\), then the future value of some vector-valued observable \(\varvec{g}\) can be written as

$$\begin{aligned} \varvec{g}(\varvec{x}(t + \Delta t)) = (\mathcal {K}_{\Delta t} \varvec{g})(\varvec{x}(t)) = \sum _{k=1}^{N_K} e^{\lambda _k \Delta t}\varvec{v}_k \varphi _k(\varvec{x}(t)), \end{aligned}$$
(6)

which is similar in form to (3), but now “fast” and “slow” are determined by the real part of \(\lambda _k\) rather than its absolute value.

In what follows, we will not have access to the “right-hand side” function, \(\varvec{f}\), and therefore cannot approximate \(\hat{\mathcal {K}}\) directly. However, if the discrete-time dynamical system \(\varvec{F}\) is the flow map associated with \(\varvec{f}\) for a fixed time interval \(\Delta t\) (i.e., \(\mathcal {K}= \mathcal {K}_{\Delta t}\)), then (3) and (6) are equivalent with \(\mu _k = e^{\lambda _k\Delta t}\). As a result, although we will be approximating the Koopman operator associated with discrete-time dynamical systems, we will often present our results in terms of \(\lambda _k\) rather than \(\mu _k\) when the underlying system is a flow rather than a map.

2.2 Extended Dynamic Mode Decomposition

In this subsection, we outline extended dynamic mode decomposition (EDMD), which is a method that approximates the Koopman operator and therefore the Koopman eigenvalue, eigenfunction, and mode tuples defined in Sect. 2.1. The EDMD procedure requires two prerequisites: (a) a data set of snapshot pairs, i.e., \(\{(\varvec{x}_m, \varvec{y}_m)\}_{m=1}^M\) that we will organize as a pair of data sets,

$$\begin{aligned} \varvec{X} = \begin{bmatrix} \varvec{x}_1&\varvec{x}_2&\ldots&\varvec{x}_M \end{bmatrix}, \qquad \varvec{Y} = \begin{bmatrix} \varvec{y}_1&\varvec{y}_2&\ldots&\varvec{y}_M \end{bmatrix}, \end{aligned}$$
(7)

where \(\varvec{x}_i\in {\mathcal {M}}\) and \(\varvec{y}_i\in {\mathcal {M}}\) are snapshots of the system state with \(\varvec{y}_i = \varvec{F}(\varvec{x}_i)\), and (b) a dictionary of observables, \({\mathcal {D}}=\{\psi _1, \psi _2, \ldots , \psi _{N_K}\}\) where \(\psi _i\in {\mathcal {F}}\), whose span we denote as \({\mathcal {F}}_{{\mathcal {D}}}\subset {\mathcal {F}}\); for brevity, we also define the vector-valued function \(\varvec{\Psi }:{\mathcal {M}}\rightarrow \mathbb {C}^{1\times N_K}\) where

$$\begin{aligned} \varvec{\Psi }(\varvec{x}) = \begin{bmatrix} \psi _1(\varvec{x})&\psi _2(\varvec{x})&\ldots&\psi _{N_K}(\varvec{x}) \end{bmatrix} . \end{aligned}$$
(8)

The data set needed is typically constructed from multiple short bursts of simulation or from experimental data. For example, if the data were given as a single time series, then for a given snapshot \(\varvec{x}_i\), \(\varvec{y}_i = \varvec{F}(\varvec{x}_i)\) is the next snapshot in the time series. In the original definition of DMD (Schmid 2010; Rowley et al. 2009), the data provided to the algorithm was assumed to come in the form of a single time series. This restriction was relaxed in  Tu et al. (2014), where it was shown that a generalization of DMD that requires only a data set of snapshot pairs would produce equivalent results. Because EDMD is based on this definition of DMD, we will assume only that \(\varvec{x}_i\) and \(\varvec{y}_i\) are related by the dynamics even if the entire data set is in the form of a time series. However, if the data are in the form of a time series, then EDMD could also be understood using the Krylov-based arguments used to justify “standard” DMD (Rowley et al. 2009). Furthermore, the optimal choice of dictionary elements remains an open question, but a short discussion including some pragmatic choices will be given in Sect. 3. For now, we assume that \({\mathcal {D}}\) is “rich enough” to accurately approximate a few of the leading Koopman eigenfunctions.
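
For instance, if the data happen to be available as a single time series whose columns are successive states, the snapshot pairs in (7) can be assembled as in the following sketch (the file name and array shapes are hypothetical; pairs from multiple short bursts of simulation can be concatenated column-wise in the same format):

```python
import numpy as np

# Hypothetical (N, M+1) array whose columns are successive states z_0, z_1, ..., z_M.
Z = np.load("snapshots.npy")

# Snapshot pairs for EDMD, cf. Eq. (7): y_m = F(x_m) is the column that follows x_m.
X, Y = Z[:, :-1], Z[:, 1:]
```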

2.2.1 Approximating the Koopman Operator and its Eigenfunctions

Now we seek to generate \(\varvec{K}\in \mathbb {C}^{{N_K} \times {N_K}}\), a finite-dimensional approximation of \(\mathcal {K}\). By definition, a function \(\psi \in {\mathcal {F}}_{{\mathcal {D}}}\) can be written as

$$\begin{aligned} \psi (\varvec{x}) = \sum _{k=1}^{N_K} a_k\psi _k(\varvec{x}) = \varvec{\Psi }(\varvec{x})\varvec{a}, \end{aligned}$$
(9)

the linear superposition of the \(N_K\) elements in the dictionary with the weights \(\varvec{a}\). Note that unlike in (2), where \(N_K\) could be infinite, \(N_K\) is finite in both (9) and all that follows. Because \({\mathcal {F}}_{{\mathcal {D}}}\) is typically not an invariant subspace of \(\mathcal {K}\),

$$\begin{aligned} (\mathcal {K}\psi )(\varvec{x}) = (\varvec{\Psi }\circ \varvec{F})(\varvec{x}) \varvec{a} = \varvec{\Psi }(\varvec{x})(\varvec{K}\varvec{a}) + r(\varvec{x}) \end{aligned}$$
(10)

which includes the residual term \(r\in {\mathcal {F}}\). To determine \(\varvec{K}\), we will minimize

$$\begin{aligned} \begin{aligned} J&= \frac{1}{2}\sum _{m=1}^M |r(\varvec{x}_m)|^2 \\&= \frac{1}{2}\sum _{m=1}^M \left| ((\varvec{\Psi }\circ \varvec{F})(\varvec{x}_m) - \varvec{\Psi }(\varvec{x}_m)\varvec{K})\varvec{a} \right| ^2 \\&= \frac{1}{2}\sum _{m=1}^M \left| (\varvec{\Psi }(\varvec{y}_m) - \varvec{\Psi }(\varvec{x}_m)\varvec{K})\varvec{a} \right| ^2 \\ \end{aligned} \end{aligned}$$
(11)

where \(\varvec{x}_m\) is the mth snapshot in \(\varvec{X}\), and \(\varvec{y}_m = \varvec{F}(\varvec{x}_m)\) is the mth snapshot in \(\varvec{Y}\). Equation (11) is a least-squares problem and therefore cannot have multiple isolated local minima; it must have either a unique global minimizer or a continuous family (or families) of minimizers. As a result, regularization (here via the truncated singular value decomposition) may be required to ensure the solution is unique, and the \(\varvec{K}\) that minimizes (11) is:

$$\begin{aligned} \varvec{K} \triangleq \varvec{G}^+\varvec{A}, \end{aligned}$$
(12)

where \(+\) denotes the pseudoinverse and

$$\begin{aligned} \varvec{G}&= \frac{1}{M}\sum _{m=1}^M \varvec{\Psi }(\varvec{x}_m)^* \varvec{\Psi }(\varvec{x}_m),\end{aligned}$$
(13a)
$$\begin{aligned} \varvec{A}&= \frac{1}{M}\sum _{m=1}^M \varvec{\Psi }(\varvec{x}_m)^* \varvec{\Psi }(\varvec{y}_m), \end{aligned}$$
(13b)

with \(\varvec{K},\varvec{G},\varvec{A}\in \mathbb {C}^{{N_K} \times {N_K}}\). As a result, \(\varvec{K}\) is a finite-dimensional approximation of \(\mathcal {K}\) that maps \(\psi \in {\mathcal {F}}_{{\mathcal {D}}}\) to some other \(\hat{\psi }\in {\mathcal {F}}_{{\mathcal {D}}}\) by minimizing the residuals at the data points. As a consequence, if \(\varvec{\xi }_j\) is the jth eigenvector of \(\varvec{K}\) with the eigenvalue \(\mu _j\), then the EDMD approximation of an eigenfunction of \(\mathcal {K}\) is

$$\begin{aligned} \varphi _j(\varvec{x}) = \varvec{\Psi }(\varvec{x})\varvec{\xi }_j. \end{aligned}$$
(14)

Finally, in many applications the discrete-time data in \(\varvec{X}\) and \(\varvec{Y}\) are generated by a continuous-time process with a sampling interval of \(\Delta t\). If this is the case, we define \(\lambda _j = \frac{\ln (\mu _j)}{\Delta t}\) to approximate the eigenvalues of the continuous-time system. In the remainder of the manuscript, we denote the eigenvalues of \(\varvec{K}\) with the \(\mu _j\) and (when applicable) the approximation of the corresponding continuous-time eigenvalues as \(\lambda _j\). Although both embody the same information, one choice is often more natural for a specific problem.

2.2.2 Computing the Koopman Modes

Next, we will compute approximations of the Koopman modes for the full-state observable using EDMD. Recall that the Koopman modes are the weights needed to express the full state in the Koopman eigenfunction basis. As such, we will proceed in two steps: First, we will express the full-state observable using the elements of \({\mathcal {D}}\); then, we will find a mapping from the elements of \({\mathcal {D}}\) to the numerically computed eigenfunctions. Applying these two steps in sequence expresses the full-state observable as a linear combination of the Koopman eigenfunctions, and the weights in this expansion are, by definition, the Koopman modes for the full-state observable.

Recall that the full-state observable, \(\varvec{g}(\varvec{x}) = \varvec{x}\), is a vector-valued observable (e.g., \(\varvec{g}:{\mathcal {M}}\rightarrow \mathbb {R}^N\)) that can be generated by “stacking” N scalar-valued observables, \(g_i:{\mathcal {M}}\rightarrow \mathbb {R}\), as follows:

$$\begin{aligned} \varvec{g}(\varvec{x}) = \begin{bmatrix} g_1(\varvec{x})\\ g_2(\varvec{x}) \\ \vdots \\ g_N(\varvec{x}) \end{bmatrix} = \begin{bmatrix} \varvec{e}_1^*\varvec{x} \\ \varvec{e}_2^*\varvec{x} \\ \vdots \\ \varvec{e}_N^* \varvec{x} \end{bmatrix}, \end{aligned}$$
(15)

where \(\varvec{e}_i\) is the ith unit vector in \(\mathbb {R}^N\). At this time, we conveniently assume that all \(g_i(\varvec{x})\in {\mathcal {F}}_{{\mathcal {D}}}\) so that \(g_i(\varvec{x}) = \sum _{k=1}^{N_K} \psi _k(\varvec{x}) b_{k,i} = \varvec{\Psi }(\varvec{x})\varvec{b}_i\), where \(\varvec{b}_i\) is some appropriate vector of weights. If this is not the case, approximate Koopman modes can be computed by projecting \(g_i\) onto \({\mathcal {F}}_{\mathcal {D}}\), though the accuracy and usefulness of this fit clearly depend on the choice of \({\mathcal {D}}\). To avoid this issue, we take \(g_i\in {\mathcal {F}}_{\mathcal {D}}\) for \(i=1,\ldots ,N\) in all examples that follow. In either case, the entire vector-valued observable can be expressed (or approximated) in this manner as

$$\begin{aligned} \varvec{g}(\varvec{x}) = \varvec{B}^T \varvec{\Psi }(\varvec{x})^T = (\varvec{\Psi }(\varvec{x})\varvec{B})^T, \qquad \varvec{B} = \begin{bmatrix} \varvec{b}_1&\varvec{b}_2&\ldots&\varvec{b}_N \end{bmatrix}, \end{aligned}$$
(16)

where \(\varvec{B}\in \mathbb {C}^{N_K\times N}\).

Next, we will express the \(\psi _i\) in terms of all the \(\varphi _i\), which are our numerical approximations of the Koopman eigenfunctions. For notational convenience, we define the vector-valued function \(\varvec{\Phi }:{\mathcal {M}}\rightarrow \mathbb {C}^{1\times N_K}\), where

$$\begin{aligned} \varvec{\Phi }(\varvec{x}) = \begin{bmatrix} \varphi _1(\varvec{x})&\varphi _2(\varvec{x})&\ldots&\varphi _{N_K}(\varvec{x}) \end{bmatrix}. \end{aligned}$$
(17)

Using (14) and (17), this function can also be written as

$$\begin{aligned} \varvec{\Phi }(\varvec{x}) = \varvec{\Psi }(\varvec{x})\varvec{\Xi } , \qquad \varvec{\Xi }= \begin{bmatrix} \varvec{\xi }_1&\varvec{\xi }_2&\ldots&\varvec{\xi }_{N_K} \end{bmatrix}, \end{aligned}$$
(18)

where \(\varvec{\xi }_i\in \mathbb {C}^{N_K}\) is the ith eigenvector of \(\varvec{K}\) associated with \(\mu _i\). Therefore, we can determine the \(\psi _i\) as functions of the \(\varphi _i\) by inverting \(\varvec{\Xi }\). Because \(\varvec{\Xi }\) is a matrix of eigenvectors, its inverse is

$$\begin{aligned} \varvec{\Xi }^{-1} = \varvec{W}^* = \begin{bmatrix} \varvec{w}_1&\varvec{w}_2&\ldots&\varvec{w}_{N_K} \end{bmatrix}^*, \end{aligned}$$
(19)

where \(\varvec{w}_i\) is the ith left eigenvector of \(\varvec{K}\) also associated with \(\mu _i\) (i.e., \(\varvec{w}_i^*\varvec{K} = \varvec{w}_i^*\mu _i\)), appropriately scaled so \(\varvec{w}_i^*\varvec{\xi }_i = 1\). Combining (16), (18), and (19) and performing some slight algebraic manipulation, we find that

$$\begin{aligned} \varvec{g}(\varvec{x}) = \varvec{V} \varvec{\Phi }(\varvec{x})^T = \sum _{k=1}^{N_K} \varvec{v}_k \varphi _k, \qquad \varvec{V} = \begin{bmatrix} \varvec{v}_1&\varvec{v}_2&\ldots&\varvec{v}_{N_K} \end{bmatrix} =\left( \varvec{W}^*\varvec{B}\right) ^T, \end{aligned}$$
(20)

where \(\varvec{v}_i = (\varvec{w}_i^*\varvec{B})^T\) is the ith Koopman mode. This is the formula for the Koopman modes that we desired.

2.2.3 Algorithm Summary

In summary, the EDMD procedure requires the user to supply the following:

  1. A data set of snapshot pairs, \(\{(\varvec{x}_m, \varvec{y}_m)\}_{m=1}^M\).

  2. A set of functions, \({\mathcal {D}}= \{\psi _1, \psi _2, \ldots , \psi _{N_K}\}\), that will be used to approximate the Koopman eigenfunctions; for brevity, we define the vector-valued function, \(\varvec{\Psi }\), in (8) that contains these dictionary elements.

  3. A matrix \(\varvec{B}\), defined in (16), which contains the weights required to reconstruct the full-state observable using the elements of \({\mathcal {D}}\).

With these prerequisites, the procedure to obtain the Koopman eigenvalues, modes, and eigenfunctions is as follows:

  1. Loop through the data set of snapshot pairs to form the matrices \(\varvec{G}\) and \(\varvec{A}\) using (13); these computations can be performed in parallel and are compatible with the MapReduce framework (Dean and Ghemawat 2008).

  2. Form \(\varvec{K} \triangleq \varvec{G}^+\varvec{A}\). Then, compute its eigenvalues, \(\mu _i\), eigenvectors, \(\varvec{\xi }_i\), and left eigenvectors, \(\varvec{w}_i\).

  3. Compute the set of Koopman modes by setting \(\varvec{v}_i \triangleq (\varvec{w}_i^*\varvec{B})^T\), where \(\varvec{v}_i\) is the ith Koopman mode.

As a result of this procedure, we obtain the \(\mu _i\), which are our approximations of the Koopman eigenvalues; the \(\varvec{v}_i\), which are our approximations of the corresponding Koopman modes; and the vectors \(\varvec{\xi }_i\), which allow the ith Koopman eigenfunction to be approximated using (14).
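
As a concrete illustration of these three steps, a minimal implementation might look like the sketch below (written in Python/NumPy purely for illustration; the function and variable names are our own, and no attempt is made at the parallel evaluation of \(\varvec{G}\) and \(\varvec{A}\) mentioned above):

```python
import numpy as np
from scipy.linalg import eig, pinv

def edmd(PsiX, PsiY, B=None):
    """Minimal EDMD sketch.

    PsiX, PsiY : (M, N_K) arrays whose m-th rows are Psi(x_m) and Psi(y_m).
    B          : optional (N_K, N) weight matrix from Eq. (16).
    Returns the eigenvalues mu_i, the eigenvectors xi_i (columns of Xi),
    the scaled left eigenvectors w_i (columns of W), and, if B is given,
    the Koopman modes v_i (columns of V).
    """
    M = PsiX.shape[0]
    G = PsiX.conj().T @ PsiX / M                 # Eq. (13a), step (1)
    A = PsiX.conj().T @ PsiY / M                 # Eq. (13b), step (1)
    K = pinv(G) @ A                              # Eq. (12), step (2)
    mu, W, Xi = eig(K, left=True, right=True)
    # Rescale the left eigenvectors so that w_i^* xi_i = 1, as in Eq. (19).
    W = W / np.sum(W.conj() * Xi, axis=0).conj()
    V = (W.conj().T @ B).T if B is not None else None   # Eq. (20), step (3)
    return mu, Xi, W, V
```

The jth approximate eigenfunction is then evaluated through (14), i.e., \(\varphi _j(\varvec{x}) = \varvec{\Psi }(\varvec{x})\varvec{\xi }_j\), and, for data sampled from a flow with step \(\Delta t\), the continuous-time eigenvalues are recovered as \(\lambda _j = \ln (\mu _j)/\Delta t\).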

Ultimately, the EDMD procedure is a regression procedure and can be applied to any set of snapshot pairs without modification. However, the contribution of this manuscript is to show how the regression problem above relates to the Koopman operator, and making this connection requires some knowledge about the process that created the data. If the underlying system is (effectively) a discrete-time, autonomous dynamical system (or, equivalently, an autonomous flow sampled at a fixed interval of \(\Delta t\)), then the results of this section hold. If the underlying system is stochastic, then a connection with the Koopman operator still exists, but will be discussed later in Sect. 5. Furthermore, even some non-autonomous systems could, in principle, be analyzed using EDMD by augmenting the state vector to include time; this, however, will be left for future work.

2.3 Convergence of the EDMD Algorithm to a Galerkin Method

In this subsection, we relate EDMD to the Galerkin methods one would use to approximate the Koopman operator with complete information about the underlying dynamical system. In this context, a Galerkin method is a weighted-residual method where the residual, as defined in (10), is orthogonal to the span of \({\mathcal {D}}\). In particular, we show that the EDMD approximation of the Koopman operator converges to the approximation that would be obtained from a Galerkin method in the large-data limit. In this manuscript, the large-data limit is when \(M\rightarrow \infty \) and the elements of \(\varvec{X}\) are drawn independently from a distribution on \({\mathcal {M}}\) with the probability density \(\rho \). Note that one method of generating a more complex \(\rho \) (e.g., not a Gaussian or uniform distribution) is to randomly sample points from a single, infinitely long trajectory. In this case, \(\rho \) is one of the natural measures associated with the underlying dynamical system. We also assume that \({\mathcal {F}}=L^2({\mathcal {M}}, \rho )\). The first assumption defines a process for adding new data points to our set and could be replaced with other sampling schemes. The second assumption is required so that the inner products in the Galerkin method converge, which is relevant for problems where \({\mathcal {M}}=\mathbb {R}^N\).

If EDMD were a Galerkin method, then the entries of \(\varvec{G}\) and \(\varvec{A}\) in (12) would be defined as

$$\begin{aligned} \begin{aligned} \hat{\varvec{G}}_{ij}&= \int _{{\mathcal {M}}} \psi _i^*(\varvec{x})\psi _j(\varvec{x})\rho (\varvec{x})\;\mathrm{d}\varvec{x} = \left\langle \psi _i, \psi _j\right\rangle _\rho , \\ \hat{\varvec{A}}_{ij}&= \int _{{\mathcal {M}}} \psi _i^*(\varvec{x})\psi _j(\varvec{F}(\varvec{x}))\rho (\varvec{x})\;\mathrm{d}\varvec{x} = \left\langle \psi _i, \mathcal {K}\psi _j\right\rangle _\rho , \end{aligned} \end{aligned}$$
(21)

where \(\left\langle p,q\right\rangle _\rho = \int _{{\mathcal {M}}} p^*(\varvec{x})q(\varvec{x})\rho (\varvec{x}) \; \mathrm{d}\varvec{x}\) is the inner product and the finite-dimensional Galerkin approximation of the Koopman operator would be \(\hat{\varvec{K}} = \hat{\varvec{G}}^{-1}\hat{\varvec{A}}\). The performance of this method certainly depends upon the choice of \(\psi _j\) and \(\rho \), but it is nevertheless a Galerkin method as the residual would be orthogonal to \({\mathcal {F}}_{\mathcal {D}}\) (Boyd 2013; Trefethen 2000). There are non-trivial questions about what sets of \(\psi \) and what measures, \(\rho \), are required if the Galerkin method is to generate a useful approximation of the Koopman operator (e.g., when can we “trust” our eigenfunctions if \(\rho \) is compactly supported but \({\mathcal {M}}=\mathbb {R}^N\)?), but they are beyond the scope of this manuscript and will be the focus of future work.

For a finite M, the ijth element of \(\varvec{G}\) is

$$\begin{aligned} \varvec{G}_{ij} \triangleq \frac{1}{M}\sum _{m=1}^M\psi _i^*(\varvec{x}_m)\psi _j(\varvec{x}_m), \end{aligned}$$
(22a)

so the ijth element of \(\varvec{G}\) is the sample mean of \(\psi _i^*(\varvec{x})\psi _j(\varvec{x})\) evaluated on the data in \(\varvec{X}\). Similarly,

$$\begin{aligned} \varvec{A}_{ij} \triangleq \frac{1}{M}\sum _{m=1}^M\psi _i^*(\varvec{x}_m)\psi _j(\varvec{y}_m). \end{aligned}$$
(22b)

When M is finite, (21) is approximated by (22). However, by the strong law of large numbers, the sample means almost surely converge to their expected values as the number of samples, M, goes to infinity. For this system, the expectations can be written as

$$\begin{aligned} \begin{aligned} \lim _{M\rightarrow \infty } \varvec{G}_{ij}&= \int _{{\mathcal {M}}} \psi _i^*(\varvec{x})\psi _j(\varvec{x})\rho (\varvec{x})\;\mathrm{d}\varvec{x} = \left\langle \psi _i, \psi _j\right\rangle _\rho = \hat{\varvec{G}}_{ij}, \\ \lim _{M\rightarrow \infty } \varvec{A}_{ij}&= \int _{{\mathcal {M}}} \psi _i^*(\varvec{x})\psi _j(\varvec{F}(\varvec{x}))\rho (\varvec{x})\;\mathrm{d}\varvec{x} = \left\langle \psi _i, \mathcal {K}\psi _j\right\rangle _\rho = \hat{\varvec{A}}_{ij}, \end{aligned} \end{aligned}$$
(23)

which reintroduces the integrals in (21). As a result, the entries of \(\varvec{A}\) and \(\varvec{G}\) converge to their analytically determined values, and therefore, the output of the EDMD procedure will converge to the output of a Galerkin method. With randomly distributed initial data, the needed integrals are computed using Monte Carlo integration, and the rate of convergence will be \(\mathcal {O}(M^{-1/2})\). Other sampling choices, such as placing points on a uniform grid, effectively use different quadrature rules and could therefore obtain a better rate of convergence.
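
As a simple illustration of this quadrature viewpoint, the convergence of a single entry of \(\varvec{G}\) can be checked directly; the sketch below uses two arbitrarily chosen (hypothetical) dictionary elements and one-dimensional, normally distributed samples, so that \(\rho \) is a Gaussian density:

```python
import numpy as np

rng = np.random.default_rng(0)
psi_i = np.cos      # two hypothetical dictionary elements, chosen only so that
psi_j = np.cos      # the limiting inner product is easy to evaluate analytically

for M in (10**2, 10**4, 10**6):
    x = rng.standard_normal(M)
    G_ij = np.mean(np.conj(psi_i(x)) * psi_j(x))     # sample mean, Eq. (22a)
    # G_ij converges to <psi_i, psi_j>_rho = (1 + exp(-2)) / 2 as M grows,
    # with a Monte Carlo error that decays like M**(-1/2).
    print(M, G_ij)
```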

Implicit in this argument is the fact that \(\varvec{x}_m\) and \(\varvec{y}_m\) are snapshots of the system state, which implies that a snapshot, say \(\varvec{x}_m\), cannot map to “two different places.” However, there are many cases where \(\varvec{x}_m\) and \(\varvec{y}_m\) are only partial measurements of the system state, in which case this could happen. For example, \(\varvec{x}_m\) and \(\varvec{y}_m\) could consist of N elements of a vector in \(\mathbb {R}^{N_0}\) where \(N_0 > N\), and so some state measurements have simply been neglected. This is quite common and occurs when the data have been compressed using POD or other related techniques. Another common example is where \(\varvec{x}_m\) and \(\varvec{y}_m\) are snapshots of the full state, but the underlying system is periodically forced; in this case, the missing information is the “phase” of the forcing.

Because the Koopman operator acts on scalar observables, one could still consider the EDMD procedure as an approximation of a Koopman operator using a set of functions, \(\psi _k\), that are constant in the “missing” components. As a result, if the true eigenfunctions are nearly constant in these directions [an assumption that is justifiable in some cases, such as fast-slow systems (Froyland et al. 2014)], then the missing information may have only a small impact on the resulting eigenfunctions. However, this assumption clearly does not hold in general, so care must be taken in order to have as “complete” a set of measurements as possible.

2.4 Relationship with DMD

When M is not large, EDMD will not be an accurate Galerkin method because the quadrature errors generated by the Monte Carlo integrator will be significant, and so the residual will probably not be orthogonal to \({\mathcal {F}}_{{\mathcal {D}}}\). However, it is still formally an extension of DMD, which has empirically been shown to yield meaningful results even without exhaustive data sets. In this section, we show that EDMD is equivalent to DMD for a very specific, and restrictive, choice of \({\mathcal {D}}\), in the sense that EDMD and DMD produce the same set of eigenvalues and modes for any set of snapshot pairs.

Because there are many conceptually equivalent but mathematically different definitions of DMD, the one we adopt here is taken from  Tu et al. (2014), which defines the DMD modes as the eigenvectors of the matrix

$$\begin{aligned} \varvec{K}_{\mathrm{DMD}}\triangleq \varvec{Y}\varvec{X}^+, \end{aligned}$$
(24)

where the jth mode is associated with the jth eigenvalue of \(\varvec{K}_{\mathrm{DMD}}\), \(\mu _j\). \(\varvec{K}_{\mathrm{DMD}}\) is constructed using the data matrices in (7), where \(+\) again denotes the pseudoinverse. This definition is a generalization of preexisting DMD algorithms (Schmid 2010; Schmid et al. 2011) and does not require the data to be in the form of a single time series. All that is required are states and their images (for a map) or their updated positions after a fixed interval in time of \(\Delta t\) (for a flow). However, in the event that they are in the form of a single time series, then as discussed in  Tu et al. (2014), this generalization is related to the Krylov method presented in  Rowley et al. (2009).

Now, we will prove that the Koopman modes computed using EDMD are equivalent to DMD modes, if the dictionary used for EDMD consists of scalar observables of the form \(\psi _i(\varvec{x}) = \varvec{e}_i^*\varvec{x}\) for \(i=1,\ldots ,N\). This is the special (if relatively restrictive) choice of dictionary alluded to earlier. In particular, we show that the ith Koopman mode, \(\varvec{v}_i\), is also an eigenvector of \(\varvec{K}_{\mathrm{DMD}}\), hence a DMD mode. Because the elements of the full-state observable are the dictionary elements, \(\varvec{B}=\varvec{I}\) in (16). Then, the Koopman modes are the complex conjugates of the left eigenvectors of \(\varvec{K}\), so \(\varvec{v}_i^T = \varvec{w}_i^*\). Furthermore, \(\varvec{G}^T = \frac{1}{M}\varvec{X}\varvec{X}^*\) and \(\varvec{A}^T = \frac{1}{M}\varvec{Y}\varvec{X}^*\). Then,

$$\begin{aligned} \varvec{K}^T = \varvec{A}^T\varvec{G}^{T+} = \varvec{Y}\varvec{X}^*\left( \varvec{X}\varvec{X}^*\right) ^+ = \varvec{Y}\varvec{X}^+ = \varvec{K}_{\mathrm{DMD}}. \end{aligned}$$
(25)

Therefore, \(\varvec{K}_{\mathrm{DMD}}\varvec{v}_i = (\varvec{v}_i^T\varvec{K}_{\mathrm{DMD}}^T)^T = (\varvec{w}_i^*\varvec{K})^T = (\mu _i \varvec{w}_i^*)^T = \mu _i \varvec{v}_i\), and all the Koopman modes computed by EDMD are eigenvectors of \(\varvec{K}_{\mathrm{DMD}}\) and are thus the DMD modes. Once again, the choice of dictionary is critical; EDMD and DMD are equivalent only for this very specific \({\mathcal {D}}\), and other choices of \({\mathcal {D}}\) will enable EDMD to generate different (and potentially more useful) results.
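
This equivalence is also easy to verify numerically. The following sketch (with randomly generated stand-in data, which suffices because the identity (25) holds for any set of snapshot pairs) forms both matrices for the linear dictionary \(\psi _i(\varvec{x}) = \varvec{e}_i^*\varvec{x}\):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 3, 50
X = rng.standard_normal((N, M))          # stand-in data; in practice y_m = F(x_m)
Y = rng.standard_normal((N, M))

K_dmd = Y @ np.linalg.pinv(X)            # Eq. (24)

# EDMD with the dictionary Psi(x) = x^T, so Psi(x_m)^* is the m-th column of conj(X).
G = X.conj() @ X.T / M                   # Eq. (13a)
A = X.conj() @ Y.T / M                   # Eq. (13b)
K = np.linalg.pinv(G) @ A                # Eq. (12)

assert np.allclose(K.T, K_dmd)           # Eq. (25)
```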

Conceptually, DMD can be thought of as producing an approximation of the Koopman eigenfunctions using the set of linear monomials as basis functions for \({\mathcal {F}}_{{\mathcal {D}}}\), which is analogous to a one-term Taylor expansion. For problems where the eigenfunctions can be approximated accurately using linear monomials (e.g., in some small neighborhood of a stable fixed point), DMD will produce an accurate local approximation of the Koopman eigenfunctions. However, this is certainly not the case for all systems (particularly beyond the region of validity for local linearization). EDMD can be thought of as an extension of DMD that retains additional terms in the expansion, where these additional terms are determined by the elements of \({\mathcal {D}}\). The quality of the resulting approximation is governed by \({\mathcal {F}}_{{\mathcal {D}}}\) and, therefore, depends upon the choice of \({\mathcal {D}}\). In all the examples that follow, the dictionaries we use will be strict supersets of the dictionary chosen implicitly by DMD. Assuming this “richer” dictionary and using the argument presented in  Tu et al. (2014), it is easy to show that if DMD is able to exactly recover a Koopman mode and eigenvalue from a given set of data, then EDMD will also exactly recover that Koopman mode and eigenvalue. However, in many applications the tuples of interest cannot be exactly recovered. In these cases, the idea is that the more extensive dictionary used by EDMD will produce better approximations of the leading Koopman tuples because \({\mathcal {F}}_{{\mathcal {D}}}\) is larger and therefore better able to represent the eigenfunctions of interest.

3 The Choice of the Dictionary

As with all spectral methods, the accuracy and rate of convergence of EDMD depend on \({\mathcal {D}}\), whose elements span the subspace of observables, \({\mathcal {F}}_{\mathcal {D}}\subset {\mathcal {F}}\). Possible choices for the elements of \({\mathcal {D}}\) include polynomials (Boyd 2013), Fourier modes (Trefethen 2000), radial basis functions (Wendland 1999), and spectral elements (Karniadakis and Sherwin 2013), but the optimal choice of basis functions likely depends on both the underlying dynamical system and the sampling strategy used to obtain the data. Any of these sets is, in principle, a useful choice for \({\mathcal {D}}\), though some care must be taken on infinite domains to ensure that any needed inner products will converge.

Table 1 Some commonly used dictionaries of functions and the application where they are, in our experience, best suited

Choosing \({\mathcal {D}}\) for EDMD is, in some cases, more difficult than selecting a set of basis functions for use in a standard spectral method because the domain on which the underlying dynamical system is defined, \({\mathcal {M}}\), is not necessarily known. Typically, we can define \(\Omega \supset {\mathcal {M}}\) so that it contains all the data in \(\varvec{X}\) and \(\varvec{Y}\); e.g., pick \(\Omega \) to be a “box” in \(\mathbb {R}^N\) that contains every snapshot in \(\varvec{X}\) and \(\varvec{Y}\). Next, we choose the elements of \({\mathcal {D}}\) to be a basis for \(\tilde{{\mathcal {F}}}_{{\mathcal {D}}}\subset \tilde{{\mathcal {F}}}\), where \(\tilde{{\mathcal {F}}}\) is the space of functions that map \(\Omega \rightarrow \mathbb {C}\). Because \({\mathcal {F}}\subset \tilde{{\mathcal {F}}}\), this choice of \({\mathcal {D}}\) can be used in the EDMD procedure, but there is no guarantee that the elements of \({\mathcal {D}}\) form a basis for \({\mathcal {F}}_{{\mathcal {D}}}\) as there may be redundancies. The potential for these redundancies and the numerical issues they generate is why regularization, and hence the pseudoinverse (Hansen 1990), is required in (12). An example of these redundancies and their effects is given in the Appendix.

Although the optimal choice of \({\mathcal {D}}\) is unknown, there are three choices that are broadly applicable in our experience. They are: Hermite polynomials, radial basis functions (RBFs), and discontinuous spectral elements. The Hermite polynomials are the simplest of the three sets and are best suited to problems defined on \(\mathbb {R}^N\) if the data in \(\varvec{X}\) are normally distributed. The observables that comprise \({\mathcal {D}}\) are products of the Hermite polynomials in a single dimension (e.g., \(H_1(x)H_2(y)H_0(z)\), where \(H_i\) is the ith Hermite polynomial and \(\varvec{x} = (x,y,z)\)). This set of basis functions is simple to implement and conceptually related to approximating the Koopman eigenfunctions with a Taylor expansion. Furthermore, because they are orthogonal with respect to Gaussian weights, \(\varvec{G}\) will be diagonal if the \(\varvec{x}_m\) are drawn from a normal distribution, which can be beneficial numerically.
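
A sketch of such a dictionary is given below (the helper name, its interface, and the ordering of the dictionary elements are our own choices; NumPy's “physicists'” Hermite polynomials are used):

```python
import numpy as np
from itertools import product
from numpy.polynomial.hermite import hermval

def hermite_dictionary(max_degree, dim):
    """Tensor-product Hermite dictionary: products H_{i_1}(x_1)...H_{i_dim}(x_dim)
    with each degree between 0 and max_degree. Returns a callable psi such that,
    for a (dim, M) array of snapshots X, psi(X) is the (M, (max_degree+1)**dim)
    matrix whose m-th row is Psi(x_m)."""
    multi_indices = list(product(range(max_degree + 1), repeat=dim))
    coeffs = np.eye(max_degree + 1)          # row i selects the polynomial H_i

    def psi(X):
        cols = []
        for idx in multi_indices:
            col = np.ones(X.shape[1])
            for d, i in enumerate(idx):
                col = col * hermval(X[d, :], coeffs[i])
            cols.append(col)
        return np.column_stack(cols)

    return psi
```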

An alternative to the Hermite polynomials is a basis of discontinuous spectral elements. To use this basis, we define a set of \(B_N\) boxes, \(\{\mathcal {B}_i\}_{i=1}^{B_N}\), such that \(\cup _{i=1}^{B_N}\mathcal {B}_i \supset {\mathcal {M}}\). Then, on each of the \(\mathcal {B}_i\), we define \(K_i\) (suitably transformed) Legendre polynomials. For example, in one dimension, each basis function is of the form

$$\begin{aligned} \psi _{ij}(x) = {\left\{ \begin{array}{ll} L_j(\xi ) &{} \quad x\in \mathcal {B}_i, \\ 0 &{} \quad \text { otherwise}, \end{array}\right. } \end{aligned}$$
(26)

where \(L_j\) is the jth Legendre polynomial and \(\xi \) is x transformed such that the “edges” of the box are at \(\xi = \pm 1\). The advantage of this basis is that \(\varvec{G}\) will be block-diagonal and therefore easy to invert even if a very large number of basis functions are employed.

With a fixed amount of data, an equally difficult task is choosing the \(\mathcal {B}_i\); the number and arrangement of the \(\mathcal {B}_i\) strike a balance between the span of the basis functions (i.e., h-type convergence), which increases as the number of boxes is increased, and the accuracy of the quadrature rule, which decreases because smaller boxes contain fewer data points. To generate a covering of \({\mathcal {M}}\), we use a method similar to the one used by GAIO (Dellnitz et al. 2001). Initially, all the data (i.e., \(\varvec{X}\) and \(\varvec{Y}\)) are contained within a single user-selected box, \(\mathcal {B}_1^{(0)}\). If this box contains more than a pre-specified number of data points, it is subdivided into \(2^N\) domains of equal Lebesgue measure (e.g., in one dimension, \(\mathcal {B}_1^{(0)} = \mathcal {B}_1^{(1)} \cup \mathcal {B}_2^{(1)}\)). We then proceed recursively: if any of the \(\mathcal {B}_i^{(1)}\) contain more than a pre-specified number of points, then they too are subdivided; this proceeds until no box has an “overly large” number of data points. Any \(\mathcal {B}_i^{(j)}\) that do not contain any data points are pruned, which after j iterates leaves the set of subdomains, \(\{\mathcal {B}_i^{(j)}\}\), on which we define the Legendre polynomials. The resulting functions are compactly supported and can be evaluated efficiently using \(2^N\) trees, where N is the dimension of a snapshot. Finally, the higher-order polynomials used here allow for more rapid p-type convergence if the eigenfunctions happen to be smooth.
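
A bare-bones version of this recursive subdivision (ignoring the tree-based bookkeeping needed for efficient evaluation, and with simplified handling of points that fall on box boundaries) might look like the following sketch:

```python
import numpy as np

def subdivide(points, lo, hi, max_points):
    """Recursively split the box [lo, hi) into 2^N children of equal Lebesgue
    measure until no box holds more than max_points of the (M, N) array of
    points; children containing no data are pruned. Returns the leaf boxes
    as a list of (lo, hi) pairs."""
    if points.shape[0] <= max_points:
        return [(lo, hi)]
    mid, N, boxes = 0.5 * (lo + hi), len(lo), []
    for corner in range(2 ** N):
        bits = np.array([(corner >> d) & 1 for d in range(N)], dtype=bool)
        child_lo = np.where(bits, mid, lo)
        child_hi = np.where(bits, hi, mid)
        mask = np.all((points >= child_lo) & (points < child_hi), axis=1)
        if mask.any():                       # prune empty children
            boxes += subdivide(points[mask], child_lo, child_hi, max_points)
    return boxes
```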

The final choice is a set of radial basis functions (RBFs), which draws on previous work on “mesh-free” methods (Liu 2010). Because these methods do not require a computational grid or mesh, they are particularly effective for problems where \({\mathcal {M}}\) has what might be called a complex geometry. Many different RBFs could be effective, but one particularly useful set of RBFs are the thin plate splines (Wendland 1999; Belytschko et al. 1996) because they do not require the scaling parameter that other RBFs (e.g., Gaussians) do. However, we still must choose the “centers” about which the RBFs are defined, which we do by applying k-means clustering (Bishop 2006), with a pre-specified value of k, to the combined data set. Although we make no claims of optimality, in our examples the density of the RBF centers appears to be directly related to the density of data points, which is, intuitively, a reasonable way to distribute the RBF centers, as regions with more samples will also have more spatial resolution.
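
A corresponding sketch for this choice is given below (again, the helper name is ours; a constant function is appended so that \(\psi = 1\) is in the dictionary):

```python
import numpy as np
from scipy.cluster.vq import kmeans

def thin_plate_dictionary(data, k):
    """Thin-plate-spline RBF dictionary with centers chosen by k-means on the
    combined (N, M) snapshot data. Returns a callable psi such that psi(X) is
    an (M, k+1) matrix whose m-th row is Psi(x_m)."""
    centers, _ = kmeans(data.T.astype(float), k)
    def psi(X):
        # squared distances from each snapshot (column of X) to each center
        d2 = ((X.T[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        with np.errstate(divide="ignore", invalid="ignore"):
            tps = np.where(d2 > 0.0, 0.5 * d2 * np.log(d2), 0.0)   # r^2 log(r)
        return np.column_stack([np.ones(X.shape[1]), tps])
    return psi
```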

There are, of course, other dictionaries that may prove more effective in other circumstances. For example, basis functions defined in polar coordinates are useful when limit cycles or other periodic orbits are present as they mimic the form of the Koopman eigenfunctions for simple limit cycles (Bagheri 2013). How to choose the best set of functions is an important, yet open, question; fortunately, the EDMD method often produces useful results even with the relatively naïve choices presented in this section and summarized in Table  1.

4 Deterministic Data and the Koopman Eigenfunctions

Most applications of DMD assume that the data sets were generated by a deterministic dynamical system. In Sect. 2, we showed that, as long as the dictionary can accurately approximate the leading Koopman eigenfunctions, EDMD produces an approximation of the Koopman eigenfunctions, eigenvalues, and modes with large amounts of data (the regime in which EDMD is equivalent to a Galerkin method). In this section, we demonstrate that EDMD can produce accurate approximations of the Koopman eigenfunctions, eigenvalues, and modes with relatively limited amounts of data by applying the method to two illustrative examples. The first is a discrete-time linear system, for which the eigenfunctions, eigenvalues, and modes are known analytically, and serves as a test case for the method. The second is the unforced Duffing equation. Our goal there is to demonstrate that the approximate Koopman eigenfunctions obtained via EDMD have the potential to serve as a data-driven parameterization of a system with multiple basins of attraction.

4.1 A Linear Example

4.1.1 The Governing Equation, Data, and Analytically Obtained Eigenfunctions

One system where the Koopman eigenfunctions are known analytically is a simple LTI system of the form

$$\begin{aligned} \varvec{x}(n+1) = \varvec{J} \varvec{x}(n), \end{aligned}$$
(27)

with \(\varvec{x}(n)\in \mathbb {R}^N\) and \(\varvec{J}\in \mathbb {R}^{N\times N}\). It is clear that an eigendecomposition yields complete information about the underlying system provided \(\varvec{J}\) has a complete set of eigenvectors. Because the underlying dynamics are linear, it should not be surprising that the Koopman approach contains the eigendecomposition of \(\varvec{J}\).

To show this, note that the action of the Koopman operator for this problem is

$$\begin{aligned} \mathcal {K}\psi (\varvec{x}) = \psi (\varvec{J}\varvec{x}), \end{aligned}$$
(28)

where \(\psi \in {\mathcal {F}}\). Assuming \(\varvec{J}\) has a complete set of eigenvectors, it will have N left eigenvectors, \(\varvec{w}_i\), that satisfy \(\varvec{w}_i^*\varvec{J} = \mu _i\varvec{w}_i^*\), where the ith left eigenvector is associated with the eigenvalue \(\mu _i\). Then, the function

$$\begin{aligned} \varphi _{n_1, n_2, \ldots , n_N}(\varvec{x}) = \prod _{i=1}^N (\varvec{w}_i^*\varvec{x})^{n_i} \end{aligned}$$
(29)

is an eigenfunction of the Koopman operator with the eigenvalue \(\prod _{i=1}^N \mu _i^{n_i}\) for \(n_i\in \mathbb {N}\). This is a well-known result (see, e.g.,  Rowley et al. 2009), so we show the proof of this only for the sake of completeness. We proceed directly:

$$\begin{aligned} \begin{aligned} \mathcal {K}\varphi _{n_1, n_2, \ldots , n_N}(\varvec{x})&= \prod _{i=1}^N (\varvec{w}_i^*\varvec{J}\varvec{x})^{n_i} = \prod _{i=1}^N \mu _i^{n_i}(\varvec{w}_i^*\varvec{x})^{n_i} \\&= \left( \prod _{i=1}^N \mu _i^{n_i}\right) \varphi _{n_1, n_2, \ldots , n_N}(\varvec{x}), \end{aligned} \end{aligned}$$

by making use of (28) and the definition of \(\varvec{w}_i\) as a left eigenvector. Then, the representation of the full-state observable in terms of the Koopman modes and eigenfunctions (i.e., (2)) is

$$\begin{aligned} \varvec{x} = \sum _{i=1}^N \varvec{v}_i \left( \varvec{w}_i^*\varvec{x}\right) , \end{aligned}$$
(30)

where the ith Koopman mode, \(\varvec{v}_i\), is the ith eigenvector of \(\varvec{J}\) suitably scaled so that \(\varvec{w}_i^*\varvec{v}_i = 1\). This is identical to writing \(\varvec{x}\) in terms of the eigenvectors of \(\varvec{J}\); inner products with the left eigenvectors determine the component in each direction, and the (right) eigenvectors allow the full state to be reconstructed. As a concrete example, consider

$$\begin{aligned} \varvec{x}(n+1) = \begin{bmatrix} 0.9&-0.1 \\ 0.0&0.8 \end{bmatrix}\varvec{x}(n) = \varvec{J}\varvec{x}(n), \end{aligned}$$
(31)

where \(\varvec{x}(n) = [x_n, y_n]^T\). From (29), the Koopman eigenfunctions and eigenvalues are

$$\begin{aligned} \varphi _{ij}(x, y) = \left( \frac{x - y}{\sqrt{2}}\right) ^iy^j, \qquad \mu _{ij} = (0.9)^i(0.8)^j, \end{aligned}$$
(32)

for \(i,j\in \mathbb {N}\), where the factor of \(1/\sqrt{2}\) was added to normalize the left eigenvectors of \(\varvec{J}\). Figure 2 shows the first eight nontrivial eigenfunctions sorted by their associated eigenvalue. The zeroth eigenfunction, \(\varphi _{00}(\varvec{x}) = 1\) with \(\mu _{00} = 1\), was omitted because it is always an eigenfunction and will be recovered by EDMD if \(\psi =1\) is included as a dictionary element.

Fig. 2

Pseudocolor plots of the first eight Koopman eigenfunctions, as ordered by their corresponding eigenvalue, plotted using their analytical expressions, (32), along with the associated eigenvalue. Note that the eigenfunctions were scaled so that \(\Vert \varphi _i\Vert _\infty =1\) on the domain shown

To apply the EDMD procedure, one needs both data and a dictionary of observables. The data in \(\varvec{X}\) consist of 100 normally distributed initial conditions, \(\varvec{x}_i\); their images, \(\varvec{y}_i = \varvec{J}\varvec{x}_i\), are aggregated in the matrix \(\varvec{Y}\), so \(\varvec{X},\varvec{Y}\in \mathbb {R}^{2\times 100}\). The dictionary, \({\mathcal {D}}\), is chosen to contain products of Hermite polynomials in x and in y up to fourth order in each variable, i.e.,

$$\begin{aligned} \begin{aligned} \mathcal {D}&= \{\psi _0, \psi _1, \psi _2, \ldots \} \\&= \begin{matrix} \{H_0(x)H_0(y), &H_1(x)H_0(y), &H_2(x)H_0(y), &H_3(x)H_0(y), &H_4(x)H_0(y), \\ H_0(x)H_1(y), &H_1(x)H_1(y), &H_2(x)H_1(y), &H_3(x)H_1(y), &H_4(x)H_1(y), \\ H_0(x)H_2(y), &H_1(x)H_2(y), &H_2(x)H_2(y), &H_3(x)H_2(y), &H_4(x)H_2(y), \\ H_0(x)H_3(y), &H_1(x)H_3(y), &H_2(x)H_3(y), &H_3(x)H_3(y), &H_4(x)H_3(y), \\ H_0(x)H_4(y), &H_1(x)H_4(y), &H_2(x)H_4(y), &H_3(x)H_4(y), &H_4(x)H_4(y)\}, \end{matrix} \end{aligned} \end{aligned}$$
(33)

so the function \(\psi _i\) is the product of a Hermite polynomial in x of degree \((i \mod 5)\) and a Hermite polynomial in y of degree \(\left\lfloor \frac{i}{5}\right\rfloor \). The Hermite polynomials were chosen because they are orthogonal with respect to the Gaussian weight function, \(\rho (\varvec{x}) = e^{-\Vert \varvec{x}\Vert ^2}\), implicit in the normally distributed sampling strategy used here, and because they can represent the leading Koopman eigenfunctions in this problem.
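To make this setup concrete, a minimal EDMD sketch for this example is given below: it assembles a Hermite dictionary, forms the matrices \(\varvec{G}\) and \(\varvec{A}\), and extracts the eigenvalues of the resulting finite-dimensional operator. The use of NumPy's probabilists' Hermite polynomials (for which \(He_1(x)=x\)), the random seed, and all variable names are our own illustrative choices rather than the original implementation.

# A minimal EDMD sketch for the linear map (31) with a Hermite dictionary of
# the form (33); all parameter choices below are illustrative.
import numpy as np
from numpy.polynomial.hermite_e import hermeval

J = np.array([[0.9, -0.1],
              [0.0,  0.8]])
rng = np.random.default_rng(0)
M = 100
X = rng.standard_normal((2, M))          # snapshots x_i
Y = J @ X                                # images y_i = J x_i

def psi(data):
    """Evaluate all 25 dictionary elements He_p(x) He_q(y), p, q = 0, ..., 4."""
    x, y = data
    cols = []
    for q in range(5):                   # degree in y, i.e., floor(i / 5)
        for p in range(5):               # degree in x, i.e., i mod 5
            cp = np.zeros(p + 1); cp[p] = 1.0
            cq = np.zeros(q + 1); cq[q] = 1.0
            cols.append(hermeval(x, cp) * hermeval(y, cq))
    return np.stack(cols, axis=1)        # M x 25 matrix of dictionary values

PsiX, PsiY = psi(X), psi(Y)
G = PsiX.conj().T @ PsiX / M             # Gram matrix
A = PsiX.conj().T @ PsiY / M
K = np.linalg.pinv(G) @ A                # finite-dimensional approximation of the Koopman operator

mu, Xi = np.linalg.eig(K)                # eigenvalues and eigenfunction coefficients
order = np.argsort(-np.abs(mu))
mu, Xi = mu[order], Xi[:, order]
# the leading eigenvalues should approximate 1, 0.9, 0.81, 0.8, 0.729, 0.72, ...
print(np.round(mu[:9].real, 6))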

4.1.2 Results

Figure 3 shows the same eight eigenfunctions computed using the EDMD method. Overall, there is (as expected) excellent quantitative agreement, both in the eigenvalues and in the eigenfunctions, with the analytical results presented in Fig. 2. On the domain shown, the eigenvalues are accurate to ten digits, and the maximum pointwise difference between the true and computed eigenfunction is \(10^{-6}\). In this problem, standard DMD also generates highly accurate approximations of \(\varphi _{10}\) and \(\varphi _{01}\) along with their associated eigenvalues, but will not produce any of the other eigenfunctions; the standard choice of the dictionary contains only linear terms and, therefore, cannot reproduce eigenfunctions with constant terms or any nonlinear terms. As a result, expanding the basis allows EDMD to capture more of the Koopman eigenfunctions than standard DMD could. These additional eigenfunctions are not necessary to reconstruct the full-state observable of an LTI system, but are in principle needed in nonlinear settings.

Fig. 3

Pseudocolor plots of the first eight non-trivial eigenfunctions of the Koopman operator, ordered by their corresponding eigenvalue, computed using the EDMD procedure. The eigenfunctions obtained using EDMD were scaled such that \(\Vert \varphi _i\Vert _\infty =1\) on the domain shown. With this scaling, there is excellent agreement between these results and those presented in Fig. 2

This level of accuracy is due in large part to the fact that the first nine eigenfunctions lie in \({\mathcal {F}}_{{\mathcal {D}}}\), the subspace of observables spanned by \({\mathcal {D}}\). When this is not the case, the result is either a missing or an erroneous eigenfunction, like the examples shown in Fig. 4. The eigenfunction \(\varphi _{50}=\left( \frac{x-y}{\sqrt{2}}\right) ^5\) with \(\mu = 0.9^5 = 0.59049\) is not captured by EDMD because the dictionary chosen here lacks the needed fifth-order monomials in x and y; this is analogous to how DMD misses the second Koopman eigenfunction due to a lack of quadratic terms.

Fig. 4

A subset of the spectrum of the Koopman operator and the EDMD-computed approximation. As shown here, there are clearly errors in the spectrum farther away from the unit circle (which is not contained in the plotting region). The center plot shows an example of a “missing” eigenfunction that is not captured by EDMD; this eigenfunction is proportional to \((x-y)^5\not \in {\mathcal {F}}_{{\mathcal {D}}}\) and cannot be represented with our basis functions. The right plot shows an example of an “erroneous” eigenfunction that appears because \({\mathcal {F}}_{{\mathcal {D}}}\) is not an invariant subspace of the Koopman operator

The erroneous eigenfunction appears because \({\mathcal {F}}_{{\mathcal {D}}}\) is not invariant with respect to the action of the Koopman operator. In particular, \(\varphi _{50}\) contains the term \(yx^4\) whose image \(\mathcal {K}(yx^4)\not \in {\mathcal {F}}_{{\mathcal {D}}}\) because \(x^5,y^5\not \in {\mathcal {F}}_{{\mathcal {D}}}\). In most applications, there are small components of the eigenfunction that cannot be represented in the dictionary chosen, which results in errors in the eigenfunction such as the one seen here. Even in the limit of infinite data, we would compute the eigenfunctions of \(\mathcal {P}_{{\mathcal {F}}_{\mathcal {D}}}\mathcal {K}\), where \(\mathcal {P}_{{\mathcal {F}}_{\mathcal {D}}}\) is the projection onto \({\mathcal {F}}_{{\mathcal {D}}}\), rather than the eigenfunctions of \(\mathcal {K}\). To verify that this is not a legitimate eigenfunction, we added \(H_5(x)\) and \(H_5(y)\) to \({\mathcal {D}}\), which removes this erroneous eigenfunction.

In our experience, erroneous eigenfunctions tend to appear in one of two places. Sometimes they are associated with one of the slowly decaying eigenvalues, in which case they often have low mode energies (i.e., the value of the corresponding Koopman eigenfunction is small when the corresponding Koopman mode is normalized). Otherwise, they typically form a “cloud” of rapidly decaying eigenvalues, which we ignore because we are primarily interested in the leading Koopman eigenvalues. One pragmatic method for determining whether an eigenvalue is spurious is to apply the EDMD procedure to subsets of the data and compare the resulting eigenvalues; while all of the eigenvalues will be perturbed, the erroneous ones tend to have larger fluctuations. Unfortunately, missing eigenvalues are more difficult to detect without prior knowledge, but they tend to live in the same part of the complex plane as the erroneous eigenvalues, as both are caused by a lack of dictionary elements. In practice, we take the cloud of erroneous eigenvalues as the cutoff point below which the EDMD method cannot be trusted to produce accurate results.
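The subset-comparison heuristic mentioned above can be sketched as follows; the helper name edmd_eigenvalues, the 50 % subsample size, the number of trials, and the crude matching of eigenvalues by sorted order are all assumptions made for illustration.

# Sketch: rerun EDMD on random halves of the data and flag eigenvalues whose
# fluctuation across reruns is large (a crude indicator of spurious eigenvalues).
import numpy as np

def edmd_eigenvalues(PsiX, PsiY):
    M = PsiX.shape[0]
    G = PsiX.conj().T @ PsiX / M
    A = PsiX.conj().T @ PsiY / M
    return np.sort_complex(np.linalg.eigvals(np.linalg.pinv(G) @ A))

def eigenvalue_spread(PsiX, PsiY, trials=20, seed=1):
    rng = np.random.default_rng(seed)
    M = PsiX.shape[0]
    runs = []
    for _ in range(trials):
        idx = rng.choice(M, size=M // 2, replace=False)
        runs.append(edmd_eigenvalues(PsiX[idx], PsiY[idx]))
    runs = np.array(runs)
    # eigenvalues with a large spread across subsets are candidates for being spurious
    return np.abs(runs - runs.mean(axis=0)).max(axis=0)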

Finally, we compute the Koopman modes for the full-state observable. Using the ordering of the dictionary elements given in (33), the weight matrix \(\varvec{B}\) in (16), needed to compute the Koopman modes for the full-state observable \(\varvec{g}(\varvec{x})^T = [x, y]\), is

$$\begin{aligned} \varvec{B} = \begin{bmatrix} 0&0 \\ 1&0 \\ 0&0 \\ 0&0 \\ 0&0 \\ 0&1 \\ 0&0 \\ \vdots&\vdots \end{bmatrix}. \end{aligned}$$
(34)

This \(\varvec{B}\), in conjunction with (20), yields a numerical approximation of a Koopman mode for each of the Koopman eigenvalues computed previously. The Koopman mode associated with \(\mu _{10} = 0.9\) is \(\varvec{v}_{10} = [0, -\sqrt{2}]^T\), while the Koopman mode associated with \(\mu _{01} = 0.8\) is \(\varvec{v}_{01} = [-1, -1]^T\); again, these are the eigenvectors of \(\varvec{J}\). The contribution of the other numerically computed eigenfunctions to the reconstruction of the full-state observable is negligible (i.e., \(\Vert \varvec{v}_k\Vert \approx 10^{-11}\) for \(k\ne 1,3\)), so the Koopman/EDMD analysis reduces to an eigenvalue/eigenvector decomposition once numerical errors are taken into consideration.
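Continuing the sketch from Sect. 4.1.1, the mode computation described above might look like the following; it assumes the matrix of eigenvectors Xi and the dictionary evaluation psi from that sketch, and it relies on the probabilists' Hermite normalization \(He_1(x) = x\) so that the nonzero entries of \(\varvec{B}\) are ones, as in (34).

# Sketch: Koopman modes for the full-state observable g(x)^T = [x, y].
# With He_1(x) = x, the weight matrix B of (34) has ones in (zero-based) rows 1
# and 5, corresponding to psi_1 = He_1(x) and psi_5 = He_1(y).
import numpy as np

B = np.zeros((25, 2))
B[1, 0] = 1.0
B[5, 1] = 1.0

# The rows of inv(Xi) are the left eigenvectors w_j^*, normalized so that
# w_j^* xi_j = 1; the jth Koopman mode is then v_j^T = w_j^* B.
V = np.linalg.inv(Xi) @ B

# Reconstruction check: sum_j phi_j(x) v_j^T should recover [x, y].
Phi = psi(X) @ Xi                       # eigenfunction values at the data points
print(np.max(np.abs(Phi @ V - X.T)))    # should be small (the expansion reproduces [x, y])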

Although EDMD reveals a richer set of Koopman eigenfunctions that are analytically known to exist, their associated Koopman modes are zero, and hence, they can be neglected. Our goal in presenting this example is not to demonstrate any new phenomenon, but rather to show that there is good quantitative agreement between the analytically obtained Koopman modes, eigenvalues, and eigenfunctions and the approximations produced by EDMD. Furthermore, it allowed us to highlight the types of errors that appear when \({\mathcal {F}}_{{\mathcal {D}}}\) is not an invariant subspace of \(\mathcal {K}\), which results in erroneous eigenfunctions, or when the dictionary is missing elements, which results in missing eigenfunctions.

4.2 The Duffing Equation

In this section, we will compute the Koopman eigenfunctions for the unforced Duffing equation, which for the parameter regime of interest here has two stable spirals and a saddle point whose stable manifold defines the boundary between the basins of attraction. Following  Mauroy et al. (2013) and the references contained therein, the eigenvalues of the linearizations about the fixed points in the system are known to be a subset of the Koopman eigenvalues, and for each stable spiral, the magnitude and phase of the associated Koopman eigenfunction parameterize the relevant basin of attraction. Additionally, because basins of attraction are forward invariant sets, there will be two eigenfunctions with \(\lambda = 0\), each of which is supported on one of the two basins of attraction in this system (or, equivalently, there will be a trivial eigenfunction and another eigenfunction with \(\lambda =0\) whose level sets denote the basins of attraction). Ultimately, we are not interested in recovering highly accurate eigenfunctions in this example. Instead, we will demonstrate that the eigenfunctions computed by EDMD are accurate enough that they can be used to identify and parameterize the basins of attraction that are present in this problem for the region of interest.

The governing equations for the unforced Duffing equation are

$$\begin{aligned} \ddot{x} = -\delta \dot{x} - x(\beta + \alpha x^2), \end{aligned}$$
(35)

which we will study using the parameters \(\delta = 0.5\), \(\beta = -1\), and \(\alpha = 1\). In this regime, there are two stable spirals at \(x = \pm 1\) with \(\dot{x} = 0\), and a saddle at \(x = \dot{x} = 0\), so almost every initial condition, except for those on the stable manifold of the saddle, is drawn to one of the spirals. Despite the existence of an unstable equilibrium, the continuous-time Koopman eigenvalues required to reconstruct the full-state observable all have non-positive real parts, which can be understood intuitively by considering the value of \(\lim _{t\rightarrow \infty } \varvec{e}_i^*\varvec{x}(t)\) for the ith unit vector \(\varvec{e}_i\). For any initial condition \(\varvec{x}(0)\), the values of these observables will relax to either 0 or \(\pm 1\) depending on i and the initial condition. This relaxation implies that the eigenvalues used in the expansion in (6) should not have positive real parts; otherwise, exponential growth would be observed.

This is shown in Gaspard and Tasaki (2001) and Gaspard et al. (1995) for the pitchfork and Hopf bifurcations, where the eigenfunctions and eigendistributions can be obtained analytically. Unlike those examples, we do not have analytical expressions for the eigenfunctions (or eigendistributions) here, but we will appeal to the same intuition. In particular, we expect EDMD to approximate the Koopman eigenfunctions associated with the attractors at \((\pm 1, 0)\) because those quantities, at least away from the saddle, can lie in or near the subspace spanned by our dictionary. There will also be eigenvalues associated with the unstable equilibrium at the origin, but they are paired with eigendistributions [e.g., Dirac delta distributions and their derivatives (Gaspard and Tasaki 2001; Gaspard et al. 1995)], which are not in the span of any dictionary we will choose and therefore could not be recovered even if we had a large quantity of data near that equilibrium.

Therefore, we use a set of data that consists of \(10^3\) trajectories with 11 samples each with a sampling interval of \(\Delta t = 0.25\) (i.e., \(\varvec{X},\varvec{Y}\in \mathbb {R}^{2\times 10^4}\)), and initial conditions uniformly distributed over \(x,\dot{x}\in [-2, 2]\). With this sampling rate and initialization scheme, many trajectories will approach the stable spirals, but few will have (to numerical precision) reached the fixed points. As a result, the basins of attraction cannot be determined by observing the last snapshot in a given trajectory. Instead, EDMD will be used to “stitch” together this ensemble of trajectories to form a single coherent picture.

However, because there are multiple basins of attraction, the leading eigenfunctions appear to be discontinuous (Mauroy et al. 2013) and supported only on the appropriate basin of attraction. In principle, our computation could be done “all at once” using a single \({\mathcal {D}}\) and applying EDMD to the complete data set. To enforce the compactly supported nature of the eigenfunctions regardless of which dictionary we use, we will proceed in a two-tiered fashion. First, the basins of attraction will be identified using all of the data and a dictionary with support everywhere we have data. Once we have identified these basins, both state space and the data will be partitioned into subdomains based on the numerically identified basins. The EDMD procedure will then be run on each subdomain and the corresponding partitioned data set individually.

4.2.1 Locating Basins of Attraction

Figure 5 highlights the first step: partitioning of state space into basins of attraction. We used a dictionary consisting of the constant function and 1,000 radial basis functions (RBFs) (the thin plate splines described in Sect. 3), where k-means clustering (Bishop 2006) on the full data set was used to choose the RBF centers. RBFs were chosen here because of the geometry of the computational domain; indeed, RBFs are often a fundamental component of “mesh-free” methods that avoid the non-trivial task of generating a computational mesh (Liu 2010).
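One way such a data set and dictionary might be assembled is sketched below; the choice of integrator, the scikit-learn k-means call, the thin-plate-spline form \(r^2\log r\), and all parameter values are illustrative assumptions rather than the exact setup used here.

# Sketch: trajectory data for the Duffing system (35) and a dictionary made of
# the constant function plus thin-plate-spline RBFs centered at k-means centers.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def duffing(t, z, delta=0.5, beta=-1.0, alpha=1.0):
    x, xdot = z
    return [xdot, -delta * xdot - x * (beta + alpha * x**2)]

rng = np.random.default_rng(0)
dt, n_steps, n_traj = 0.25, 10, 1000
trajs = []
for z0 in rng.uniform(-2.0, 2.0, size=(n_traj, 2)):
    sol = solve_ivp(duffing, (0.0, n_steps * dt), z0, t_eval=np.arange(n_steps + 1) * dt)
    trajs.append(sol.y)
X = np.hstack([s[:, :-1] for s in trajs])          # 2 x 10^4 matrix of snapshots
Y = np.hstack([s[:, 1:] for s in trajs])           # their images one sampling interval later

centers = KMeans(n_clusters=1000, n_init=3, random_state=0).fit(X.T).cluster_centers_

def psi(data):
    """Constant function plus thin-plate-spline RBFs r^2 log(r) about the centers."""
    r = cdist(data.T, centers)
    rbf = np.zeros_like(r)
    mask = r > 0.0
    rbf[mask] = r[mask]**2 * np.log(r[mask])
    return np.hstack([np.ones((data.shape[1], 1)), rbf])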

Fig. 5

(left) A plot of the first non-trivial eigenfunction, which has an eigenvalue of \(\lambda =-10^{-3}\), obtained from \(10^3\) randomly initialized trajectories, each consisting of ten steps taken with \(\Delta t = 0.25\). This eigenfunction should be constant in each basin of attraction and have \(\lambda = 0\), so EDMD is generating an approximation of the eigenfunction rather than a fully converged solution. (center) The numerically computed eigenfunction on the left evaluated at the points where we had data. (right) A plot of the misclassified points (less than 0.5 % of the data); each of these points lies (to the eye) on the boundary between the invariant sets

The leading (continuous-time) eigenvalue is \(\lambda _0 = -10^{-14}\), which corresponds to the constant function. The second eigenfunction, shown in the leftmost image of Fig. 5, has \(\lambda _1 = -10^{-3}\), which should be considered an approximation of zero. The discrepancy between the numerically computed eigenfunction and the theoretical one is due to the choice of the dictionary. The analytical eigenfunction possesses a discontinuity on the edge of the basin of attraction (i.e., the stable manifold of the saddle point at the origin), but discontinuous functions are not in the space spanned by RBFs. Therefore, the numerically computed approximation “blurs” this edge as shown in Fig. 5.

The scatter plot in the center of Fig. 5 shows the data points colored by the first non-trivial eigenfunction. There is good qualitative agreement between the numerically computed basin of attraction and the actual basin. By computing the mean value of \(\varphi _1\) on the data and using that value as the threshold that determines which basin of attraction a point belongs to, the EDMD approach misclassifies only 46 of the \(10^4\) data points, an error of less than 0.5 %, as shown by the rightmost plot. As a result, the leading eigenfunctions computed by EDMD are sufficiently accurate to produce a meaningful partition of the data.
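In code, this thresholding step might look like the following, where phi1 is assumed to hold the (real part of the) first non-trivial eigenfunction evaluated at the columns of X from the sketch above.

# Sketch: partition the data by thresholding the first non-trivial eigenfunction
# at its mean value over the data points.
threshold = phi1.mean()
in_basin_1 = phi1 > threshold                       # boolean label for each data point
X1, Y1 = X[:, in_basin_1], Y[:, in_basin_1]         # data assigned to one basin
X2, Y2 = X[:, ~in_basin_1], Y[:, ~in_basin_1]       # data assigned to the other basin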

4.2.2 Parameterizing a Basin of Attraction

Now that the basins of attraction have been identified, the next task is to develop a coordinate system or parameterization of the individual basins. To do so, we will use the eigenfunctions associated with the eigenvalues of the system linearization about the corresponding fixed point. Because these fixed points are spirals, this parameterization can be realized using the amplitude and phase of one member of the complex-conjugate pair of eigenfunctions. To approximate these eigenfunctions, we first partition our data into two sets, as mentioned above, using the leading Koopman eigenfunctions (including the misclassified data points). On each subset of the data, the k-means procedure was run again to select a new set of 1000 RBF centers, and this “adjusted” basis along with the constant function comprised the \({\mathcal {D}}\) used by EDMD. Figure 6 shows the amplitude and phase of the eigenfunction with eigenvalue closest to \(-0.25 + 1.392\imath \) computed using the data in each basin of attraction. The computed eigenvalues agree favorably with the analytically obtained eigenvalue; the basin of the spiral at (1, 0) has the eigenvalue \(-0.237 + 1.387\imath \), and the basin of the spiral at \((-1, 0)\) has the eigenvalue \(-0.24 + 1.35\imath \).

Fig. 6

(top) Amplitude and phase of the Koopman eigenfunction with \(\lambda = -0.237 + 1.387i\) (analytically, \(-0.250 + 1.392i\)) for the stable spiral at (1, 0). (bottom) The same pair of plots for the spiral at \((-1, 0)\). In all four images, the data points are colored by the value of the eigenfunction. Near the equilibria, the level sets of the eigenfunctions are included as a guide for the eye. Although there are errors near the “edge” of the basin of attraction, the amplitude and phase of this eigenfunction can serve as a polar coordinate system for the corresponding basin of attraction

Figure 6 demonstrates that the amplitude and phase of a Koopman eigenfunction form something analogous to an “action–angle” parameterization of the basin of attraction. Due to the nonlinearity in the Duffing oscillator, this parameterization is more complicated than an appropriately shifted polar coordinate system and is, therefore, not the parameterization that would be generated by linearization about either \((\pm 1, 0)\). The level sets of the amplitude of this eigenfunction are the so-called isostables (Mauroy et al. 2013). One feature predicted in that manuscript is that the 0-level set of the isostables is the fixed point in the basin of attraction; this feature is reflected in Fig. 6 by the blue region near the fixed points at \((\pm 1, 0)\), which corresponds to small values of the eigenfunction. Additionally, a singularity in the phase can be observed there. The EDMD approach produces noticeable numerical errors near the edges of the basin. These errors can be due to a lack of data or to the singularities in the eigenfunctions that can occur at unstable fixed points (Mauroy and Mezic 2013).

In this section, we applied the EDMD procedure to deterministic systems and showed that it produces an approximation of the Koopman operator. With a sensible choice of data and \({\mathcal {D}}\), we showed that EDMD generates a quantitatively accurate approximation of the Koopman eigenvalues, eigenfunctions, and modes for the linear example. In the second example, we used the Koopman eigenfunctions to identify and parameterize the basins of attraction of the Duffing equation. Although the EDMD approximation of the eigenfunctions could be made more accurate with more data, it is still accurate enough to serve as an effective parameterization. As a result, the EDMD method can be useful with limited quantities of data and should be considered an enabling technology for data-driven approximations of the Koopman eigenvalues, eigenfunctions, and modes.

5 Stochastic Data and the Kolmogorov Backward Equation

The EDMD approach is entirely data driven and will produce an output regardless of the nature of the data given to it. However, if the results of EDMD are to be meaningful, then certain assumptions must be made about the dynamical system that produced the data used. In the previous section, it was assumed that the data were generated by a deterministic dynamical system; as a result, EDMD produced approximations of the tuples of Koopman eigenfunctions, eigenvalues, and modes.

Another interesting case to consider is when the underlying dynamical system is a Markov process, such as a stochastic differential equation (SDE). For such systems, the evolution of an observable is governed by the Kolmogorov backward (KB) equation [33], whose “right-hand side” has been called the “stochastic Koopman operator” (SKO) (Mezić 2005). In this section, we will show that EDMD produces approximations of the eigenfunctions, eigenvalues, and modes of the SKO if the underlying dynamical system happens to be a Markov process.

To accomplish this, we will prove that the EDMD method converges to a Galerkin method in the large-data limit. After that, we will demonstrate its accuracy with finite amounts of data by applying it to the model problem of a one-dimensional SDE with a double-well potential, where the SKO eigenfunctions can be computed using standard numerical methods.

Another proposed application of the Koopman operator is for the purposes of model reduction, which was first explored in  Mezić (2005) and later work such as  Froyland et al. (2014). Model reduction based on the Koopman eigenfunctions is equally applicable in both deterministic and stochastic settings, but we choose to present it for stochastic systems to highlight the similarities between EDMD and manifold learning techniques such as diffusion maps (Nadler et al. 2005; Coifman and Lafon 2006). In particular, we apply EDMD to an SDE defined on a “Swiss roll,” which is a nonlinear manifold often used to test manifold learning methods (Lee and Verleysen 2007). The purpose of this example is twofold: First, we show that a data-driven parameterization of the Swiss roll can be obtained using EDMD, and second, we show that this parameterization will preferentially capture “slow” dynamics on that manifold before the “fast” dynamics when the noise is made anisotropic.

5.1 EDMD with Stochastic Data

For a discrete-time Markov process,

$$\begin{aligned} \varvec{x} \mapsto \varvec{F}(\varvec{x};\varvec{\omega }), \end{aligned}$$

the SKO (Mezić 2005) is defined as

$$\begin{aligned} (\mathcal {K}\psi )(\varvec{x})={\mathbb {E}}[\psi (\varvec{F}(\varvec{x};\varvec{\omega }))], \end{aligned}$$
(36)

where \(\varvec{\omega }\in \Omega _s\) is an element of the probability space \(\Omega _s\), with probability measure P, associated with the stochastic dynamics, \({\mathbb {E}}\) denotes the expected value over that space, and \(\psi \in {\mathcal {F}}\) is a scalar observable. The SKO  (Mezić 2005) takes an observable of the full system state and returns the conditional expectation of the observable “one timestep in the future.” Note that this definition is compatible with the deterministic Koopman operator because \({\mathbb {E}}[\psi (\varvec{F}(\varvec{x}))]=\psi (\varvec{F}(\varvec{x}))\) if \(\varvec{F}\) is deterministic.

As with the deterministic case, we assume the snapshots in \(\varvec{X}\in \mathbb {R}^{N\times M}\) were generated by randomly placing initial conditions on \({\mathcal {M}}\) with density \(\rho (\varvec{x})\) and that M is sufficiently large. Once again, \(\rho \) does not need to be an invariant measure of the underlying dynamical system; it is simply the sampling density of the data. Due to the stochastic nature of the system, there are two probability spaces involved: one related to the samples in \(\varvec{X}\) and another for the stochastic dynamics. Because our system has “process” rather than “measurement” noise, the \(\varvec{x}_i\) are known exactly, and the interpretation of the Gram matrix, \(\varvec{G}\), remains unchanged. Therefore,

$$\begin{aligned} \lim _{M\rightarrow \infty } {\varvec{G}}_ {i,j}=\int _{{\mathcal {M}}}\psi _{i}^*(\varvec{x})\psi _{j}(\varvec{x})\rho (\varvec{x})\mathrm{d}\varvec{x}=\left\langle \psi _{i},\psi _{j}\right\rangle _{\rho }, \end{aligned}$$

by the law of large numbers when M is large enough. This is identical to the deterministic case. However, the definition of \(\varvec{A}\) will change. Assuming that the choices of \(\varvec{\omega }\) and \(\varvec{x}\) are independent,

$$\begin{aligned} \begin{aligned} \lim _{M\rightarrow \infty }\varvec{A}_{i,j}&={\mathbb {E}}[\psi _{i}^*(\mathcal {K}\psi _{j})] =\int _{{\mathcal {M}}\times \Omega _{s}}\psi _{i}^*(\varvec{x})\psi _{j}(\varvec{F}(\varvec{x},\varvec{\omega }))\rho (\varvec{x})\; \mathrm{d}\varvec{x}\,\mathrm{d}P(\varvec{\omega }) \\&=\int _{{\mathcal {M}}}\psi _{i}^*(\varvec{x}){\mathbb {E}}[\psi _{j}(\varvec{F}(\varvec{x},\varvec{\omega }))]\rho (\varvec{x})\, \mathrm{d}\varvec{x} =\left\langle \psi _{i},\mathcal {K}\psi _{j}\right\rangle _{\rho }. \end{aligned} \end{aligned}$$

The elements of \(\varvec{A}\) now contain a second integral over the probability space that pertains to the stochastic dynamics, which produces the expectation of the observable in the expression above.

The accuracy of the resulting method will depend on the dictionary, \({\mathcal {D}}\), the manifold on which the dynamical system is defined, the data, and the dynamics used to generate it. One interesting special case is when the basis functions are indicator functions supported on “boxes.” When this is the case, EDMD is equivalent to the widely used Ulam Galerkin method (Bollt and Santitissadeekorn 2013; Dellnitz et al. 2001). This equivalence is lost for other choices of \({\mathcal {D}}\) and \(\rho \), but as we will demonstrate in the subsequent sections, EDMD can produce accurate approximations of the eigenfunctions for many other choices of these quantities.
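For concreteness, an indicator-function dictionary of this type might be constructed as follows; the two-dimensional domain \([-1,1]^2\) and the grid resolution are illustrative assumptions.

# Sketch: a dictionary of indicator functions on a uniform grid of boxes over
# [-1, 1]^2; with this choice, EDMD reduces to an Ulam-type box-transition
# construction.
import numpy as np

n_boxes = 16
edges = np.linspace(-1.0, 1.0, n_boxes + 1)

def psi_boxes(data):
    ix = np.clip(np.digitize(data[0], edges) - 1, 0, n_boxes - 1)
    iy = np.clip(np.digitize(data[1], edges) - 1, 0, n_boxes - 1)
    out = np.zeros((data.shape[1], n_boxes**2))
    out[np.arange(data.shape[1]), ix * n_boxes + iy] = 1.0   # one indicator per box
    return out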

The “stochastic Koopman modes” can then be computed using (20), but they too must be reinterpreted as the weights needed to reconstruct the expected value of the full-state observable using the eigenfunctions of the SKO. Due to the stochastic nature of the dynamics, the Koopman modes can no longer exactly specify the state of the system. However, they can be used as approximations of the Koopman modes that would be obtained in the “noise-free” limit when some appropriate restrictions are placed on the nature of the noise and the underlying dynamical system. Indeed, these are the modes we are truly computing when we apply DMD or EDMD to experimental data, which by its very nature contains some noise.

5.2 A Stochastic Differential Equation with a Double-Well Potential

In this section, we will show that the EDMD procedure is capable of accurately approximating the eigenfunctions of the stochastic Koopman operator by applying it to an SDE with a double-well potential. Although we do not have analytical solutions for the eigenfunctions, the problem is simple enough that we can accurately compute them using standard numerical methods.

5.2.1 The Double-Well Problem and Data

First, consider an SDE with a double-well potential. Let the governing equation for this system be

$$\begin{aligned} \mathrm{d}x = -\nabla U(x)\,\mathrm{d}t + \sigma \,\mathrm{d} W_t, \end{aligned}$$
(37)

where x is the state, \(-\nabla U(x)\) is the drift, \(\sigma \) is the (constant) diffusion coefficient, and \(W_t\) is a Wiener process. Furthermore, no-flux boundary conditions are imposed at \(x = \pm 1\). For this problem, we let \(U(x) = -2 (x^2 - 1)^2x^2\), as shown in Fig. 7.

Fig. 7

Double-well potential \(U(x) = -2(x^2 - 1)^2x^2\)

The Fokker–Planck equation associated with this SDE is

$$\begin{aligned} \frac{\partial \rho (x,t)}{\partial t} = - \frac{\partial }{\partial x}\left( -\frac{\partial U}{\partial x} \rho (x,t)\right) + \frac{\sigma ^2}{2}\frac{\partial ^2\rho (x,t)}{\partial x^2} = \hat{\mathcal {P}}\rho , \end{aligned}$$
(38)

where \(\rho \) is a probability density with \(\partial _x\rho (x,t)\big |_{x=\pm 1} = 0\) due to the no-flux boundary conditions we impose, and \(\hat{\mathcal {P}}\) is the infinitesimal generator of the semigroup of Perron–Frobenius operators. The adjoint of the Perron–Frobenius operator determines the Kolmogorov backward equation and thus defines the generator of the semigroup of stochastic Koopman operators, \(\hat{\mathcal {K}} = \hat{\mathcal {P}}^\dagger \), which we refer to as the continuous-time stochastic Koopman operator. In the remainder of this section, we focus on this continuous-time SKO rather than on the discrete-time analog used in Sect. 2. For this example,

$$\begin{aligned} \hat{\mathcal {K}}\psi = -\frac{\partial U}{\partial x}\frac{\partial \psi }{\partial x} + \frac{\sigma ^2}{2}\frac{\partial ^2 \psi }{\partial x^2} \end{aligned}$$
(39)

with Neumann boundary conditions, \(\partial _x\psi \big |_{x=\pm 1} = 0\). To directly approximate the Koopman eigenfunctions, (39) is discretized in space using a second-order finite-difference scheme with 1024 interior points. The eigenvalues and eigenvectors of the resulting finite-dimensional approximation of the Koopman operator will be used to validate the EDMD computations.
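A minimal finite-difference construction of this generator is sketched below; the cell-centered grid and the mirror-ghost-cell treatment of the Neumann boundary conditions are our own choices and may differ in detail from the discretization used to generate the reference results.

# Sketch: finite-difference approximation of the generator (39) with Neumann
# boundary conditions on a cell-centered grid over [-1, 1].
import numpy as np

def generator_matrix(n=1024, sigma=1.0):
    h = 2.0 / n
    x = -1.0 + (np.arange(n) + 0.5) * h                   # cell centers in (-1, 1)
    drift = 4.0 * x * (x**2 - 1.0) * (3.0 * x**2 - 1.0)   # -U'(x) for U = -2 x^2 (x^2 - 1)^2
    L = np.zeros((n, n))
    for i in range(n):
        im, ip = max(i - 1, 0), min(i + 1, n - 1)          # mirror ghost cells enforce Neumann BCs
        L[i, im] += sigma**2 / (2 * h**2) - drift[i] / (2 * h)
        L[i, i]  += -sigma**2 / h**2
        L[i, ip] += sigma**2 / (2 * h**2) + drift[i] / (2 * h)
    return x, L

x_grid, L = generator_matrix()
evals = np.sort(np.linalg.eigvals(L).real)[::-1]           # leading (least negative) eigenvalues first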

The data are \(10^6\) initial points on \(x\in [-1, 1]\) drawn from a uniform distribution, which constitute \(\varvec{X}\), and their positions after \(\Delta t = 0.1\), which constitute \(\varvec{Y}\). The evolution of each initial condition was accomplished through \(10^2\) steps of the Euler–Maruyama method (Higham 2001; Kloeden and Platen 1992) with a timestep of \(10^{-3}\) using the double-well potential in Fig. 7. Note that only the initial and final points in this trajectory were retained, so we have access to \(10^6\) and not \(10^8\) snapshot pairs. The dictionary chosen is a discontinuous spectral element basis that splits \(x\in [-1,1]\) into four equally sized subdomains with up to tenth-order Legendre polynomials on each subdomain (see Sect. 3) for a total of forty degrees of freedom.
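The snapshot pairs themselves can be generated with a few lines of Euler–Maruyama time stepping; reflecting the state at \(x=\pm 1\) is one simple (assumed) way to realize the no-flux boundary condition in the simulation, and all parameter values below are illustrative.

# Sketch: generate (X, Y) snapshot pairs for the double-well SDE (37) with
# Euler-Maruyama; only the initial and final states are retained.
import numpy as np

def drift(x):
    return 4.0 * x * (x**2 - 1.0) * (3.0 * x**2 - 1.0)     # -U'(x)

def euler_maruyama(x0, sigma, dt=1e-3, n_steps=100, seed=2):
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_steps):
        x = x + drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        x = np.where(x > 1.0, 2.0 - x, x)                  # reflect at the right boundary
        x = np.where(x < -1.0, -2.0 - x, x)                # reflect at the left boundary
    return x

M, sigma = 10**6, 0.5                                      # reduce M for a quick test
rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=M)
Y = euler_maruyama(X, sigma)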

5.2.2 Recovering the Koopman Eigenfunctions and Eigenvalues

Because the Koopman operator is infinite dimensional, we will clearly be unable to approximate all of the tuples. Instead, we focus on the leading (i.e., most slowly decaying) tuples, which govern the long-term dynamics of the underlying system. In this example, we seek to demonstrate that our approximation is: (a) quantitatively accurate and (b) valid over a range of coefficients, \(\sigma \), and not solely in the small (or large) noise limits.

Figure 8 shows the first six eigenvalues obtained using a finite-difference discretization of the Koopman operator and the EDMD approximation as a function of \(\sigma \). In this problem, the constant function \(\varphi _0=1\) is always an eigenfunction of the Koopman operator with \(\lambda _0 = 0\) for all values of \(\sigma \). Because \({\mathcal {D}}\) contains the constant function, it should be no surprise that the EDMD method is able to identify it as an eigenfunction. Although trivial, the existence of this eigenfunction is a “sanity check” for the method.

Fig. 8

(left) First six eigenvalues of the stochastic Koopman operator obtained with a finite-difference discretization of \(\mathcal {K}\). (right) The first six eigenvalues obtained with the EDMD approach. In both plots, a marker is placed every tenth data point. While there is good quantitative agreement between the true eigenvalues and those obtained with EDMD, some small “noise” due to quadrature errors does appear in the right plot as \(\sigma \rightarrow 0\)

More interesting is the first non-trivial eigenfunction, which has the eigenvalue \(\lambda _1\). The change in \(\varphi _1\) as a function of \(\sigma \) is shown in Fig. 9. As with the eigenvalue, there is good agreement between EDMD and the directly computed eigenfunctions at different values of \(\sigma \). For all values of \(\sigma \), \(\varphi _1\) is an odd function; what changes is how rapidly \(\varphi _1\) transitions from its maximum to its minimum. When \(\sigma \) is small, this transition is rapid, and \(\varphi _1\) approaches a step function as \(\sigma \rightarrow 0\). As \(\sigma \) grows, this eigenfunction is “smoothed out” and the transition becomes slower. In the limit as \(\sigma \rightarrow \infty \), the dynamics of the problem are dominated by the diffusion term, and \(\varphi _1\) will be proportional to \(\cos (\pi x/L)\), as implied by the rightmost plot in the figure.

Fig. 9

A comparison of the leading non-trivial eigenfunction computed with the finite-difference method and EDMD for: a \(\sigma = 0.2\), b \(\sigma = 0.5\), and c \(\sigma = 1.0\). As with the eigenvalues, there is excellent agreement between the “true” and data-driven approximations of this eigenfunction, though there are small quantitative differences in the approximation

In many system-identification algorithms [e.g.,  Juang (1994)], one often constructs deterministic governing equations from inherently stochastic data (either due to measurement or due to process noise). Similarly, methods like DMD have been applied to noisy sets of data to produce an approximation of the Koopman modes and eigenvalues with the assumption that the underlying system is deterministic. In this example, this is equivalent to using the output of EDMD with data taken with \(0<\sigma \ll 1\) as an approximation of the Koopman tuples that would be obtained with \(\sigma = 0\).

For certain tuples, this is a reasonable approach. As \(\sigma \rightarrow 0\), \(\lambda _3\), \(\lambda _4\), \(\varphi _3\), and \(\varphi _4\) are good approximations of their deterministic counterparts. In particular, \(\varphi _3\) and \(\varphi _4\) are one-to-one with their associated basins of attraction and appear to possess a zero at the stable fixed point. However, these approximate eigenfunctions lack some important features, such as a singularity at \(x =0\) that occurs due to the unstable fixed point there. Therefore, both eigenfunctions are good approximations of their \(\sigma = 0\) counterparts, but cannot be “trusted” in the vicinity of an unstable fixed point.

For other tuples, even a small amount of noise can be important. Consider the “slowest” nonzero eigenvalue, \(\lambda _2\), which appears to approach \(-4\) as \(\sigma \rightarrow 0\), but is not obtained by the EDMD method when \(\sigma = 0\). Formally, the existence of an eigenvalue of \(-4\) is not surprising. The fixed point at \(x=0\) is unstable with \(\lambda = 4\), and in continuous time, if (\(\lambda _n\), \(\varphi _n\)) is an eigenvalue/eigenfunction pair, then (\(k\lambda _n\), \(\varphi ^k_n\)) is, at least formally, an eigenvalue/eigenfunction pair for any scalar k. Using an argument similar to  Matkowsky and Schuss (1981), it can be shown that \(\varphi _2(x) = C_0\exp (-4x^2/\sigma ^2) + \mathcal {O}(\sigma ^2)\) as \(\sigma \rightarrow 0\), where \(C_0\) is chosen to normalize \(\varphi _2\). However, this approaches a delta function as \(\sigma \rightarrow 0\) and therefore leaves the subspace of observables spanned by our dictionary. When this occurs, this tuple appears to “vanish,” which is why it does not appear in the \(\sigma = 0\) limit. As a result, when applying methods like EDMD or DMD to noisy data, the spectrum of the finite-dimensional approximation is not necessarily a good approximation of the spectrum that would be obtained with noise-free data. Some of the tuples, such as those containing \(\varphi _1\), \(\varphi _3\), and \(\varphi _4\), have eigenvalues that closely approximate the ones found in the deterministic problem. However, others, such as the tuple containing \(\varphi _2\), do not. Furthermore, the only way to determine that \(\lambda _2\) can be neglected is to examine the eigenfunction directly. Thus, when we apply methods like DMD/EDMD to noisy data with the purpose of using the spectrum to determine the timescales and behaviors of the underlying system, we must keep in mind that not all of the eigenvalues obtained with noisy data will be present if “clean” data are used instead.

5.2.3 Rate of Convergence

Among other things, the performance of the EDMD method is dependent upon the number of snapshots provided to it, the distribution of the data, the underlying dynamical system, and the dictionary. In this section, we examine the convergence of EDMD to a Galerkin method as the number of snapshots increases in order to provide some intuition about the “usefulness” of the eigenfunctions obtained without an exhaustive amount of data. To do so, we generated a larger set of data consisting of \(10^7\) initial conditions chosen from a spatially uniform distribution for the case with \(\sigma = 1\). Each initial condition was propagated using the Euler–Maruyama method described in the previous section, but only the initial and terminal states were retained. Then, we applied EDMD using the same dictionary to subsets of the data, computed the leading nontrivial eigenvalue and eigenfunction, and compared the results to the “true” leading eigenfunction and eigenvalue computed using a finite-difference approximation of the stochastic Koopman operator.

Figure 10 shows the convergence of the leading non-trivial eigenvalue and eigenfunction as a function of the number of snapshots, M. In the rightmost plot, we define the error as \(\Vert \varphi _{1,\text {EDMD}} - \varphi _{1,\text {True}}\Vert \) after both eigenfunctions have been normalized so that \(\Vert \varphi _{1,\text {EDMD}}\Vert _2 = \Vert \varphi _{1,\text {True}}\Vert _2\). As expected, EDMD is inaccurate when M is small (here, \(M<100\)); there is not enough data to accurately approximate the scalar products. For \(M > 10^3\), the eigenfunction produced by EDMD has the right shape, and the eigenvalue approaches its true value. For \(M > 10^4\), there is no “visible” difference in the leading eigenvalue, and the error in the leading eigenfunction is less than \(10^{-3}\).
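In outline, this convergence study can be reproduced as follows; edmd_eigfun is a hypothetical helper that returns the leading non-trivial eigenvalue and the eigenfunction values at a fixed set of evaluation points, and phi_true denotes the finite-difference reference solution.

# Sketch of the convergence study: rerun EDMD on nested subsets of the data and
# record the error in the leading non-trivial eigenfunction.
import numpy as np

Ms = np.unique(np.logspace(2, 7, 20).astype(int))
errors = []
for M in Ms:
    lam1, phi1 = edmd_eigfun(X[:M], Y[:M])               # leading non-trivial pair (assumed helper)
    phi1 = phi1 * (np.linalg.norm(phi_true) / np.linalg.norm(phi1))   # match 2-norms
    phi1 = phi1 * np.sign(phi1 @ phi_true)               # fix the arbitrary sign of the eigenfunction
    errors.append(np.linalg.norm(phi1 - phi_true))

slope = np.polyfit(np.log(Ms), np.log(errors), 1)[0]     # expect a slope of roughly -1/2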

Fig. 10

(left) Plot of the first non-trivial eigenvalue as a function of the number of data points with \(\sigma = 1\). The red dashed line denotes the “exact” value computed using direct numerical methods. Although the EDMD approximation is poor with \(M<100\), it quickly becomes more accurate as M is increased. (right) Plot of the error, i.e., \(\Vert \varphi _{1,\text {EDMD}} - \varphi _{1,\text {True}}\Vert \), as a function of M. Because the scalar products in (12) are evaluated using Monte Carlo integration, the method converges like \(\mathcal {O}(M^{-1/2})\) as shown by the fit on the right (Color figure online)

To quantify the rate of convergence, we fit a line to the plot of error versus M in the right panel of Fig. 10. As expected, EDMD converges like \(M^{-0.49}\), which is very close to the predicted value of \(\mathcal {O}(M^{-0.5})\) associated with Monte Carlo integration. Because this problem is stochastic, we cannot increase the rate of convergence by uniform sampling (the integral over the probability space associated with the stochastic dynamics will still converge like \(\mathcal {O}(M^{-1/2})\)), even though that is a simple method for enhancing the rate of convergence for deterministic problems.

To provide some intuition about what the eigenfunctions with \(\sigma =1\) look like, Fig. 11 plots the leading nontrivial eigenfunction for \(M=1623\), 14384, and 1128837. Ultimately, the purpose of EDMD is to produce effective approximations of the leading Koopman eigenfunctions. As shown in this subsection, EDMD is qualitatively accurate even at the smallest values of M, but there are clearly some numerical issues at the edges of the domain and near \(x=0\), where discontinuities in the numerically computed eigenfunctions can occur with our choice of dictionary. To obtain a more quantitatively accurate solution, additional data points are required. When \(M=14384\), the numerical issues at the boundaries and the discontinuity at \(x=0\) have diminished. As shown in the plot with \(M = 1128837\), this process continues until the EDMD eigenfunction is visually identical to the true eigenfunction. In principle, highly accurate approximations of the leading Koopman eigenfunctions using EDMD are possible, but because EDMD for stochastic systems is a Monte Carlo method, the relatively slow rate of convergence may make the amount of data required to obtain this level of accuracy infeasible in practice.

Fig. 11

Comparison of the EDMD approximation of the leading non-trivial Koopman eigenfunction with \(\sigma =1\) for \(M=1623\), 14384, and 1128837 with the “exact” solution obtained using direct numerical methods. The EDMD solution is similar to the exact solution even when \(M\sim 10^3\), but we would not characterize it as quantitatively accurate until \(M > 10^4\). Beyond \(M>10^5\), EDMD and the exact solution are visually identical

5.3 Parameterizing Nonlinear Manifolds and Reducing Stochastic Dynamics

In this section, we will briefly demonstrate how the EDMD method can be used to parameterize nonlinear manifolds and reduce stochastic differential equations defined on those manifolds. Everything done here could also be done for a deterministic system; we chose to use an SDE rather than an ODE only to highlight the similarities between EDMD and nonlinear manifold learning techniques such as diffusion maps, and not because of any restriction on the Koopman approach. We proceed in two steps: First, we will show that data from an SDE defined on the Swiss roll, which is a nonlinear manifold often used as a test of nonlinear manifold learning techniques (Nadler et al. 2005, 2006; Coifman and Lafon 2006; Lee and Verleysen 2007), in conjunction with the EDMD procedure, can generate a data-driven parameterization of that manifold. For this first example, isotropic diffusion is used, so there is no “fast” or “slow” solution component that can be meaningfully neglected. Instead, we will show that the leading eigenfunctions are one-to-one with the “length” and “width” of the Swiss roll. Then, we alter the SDE and introduce a “fast” component by making the diffusion term anisotropic. In this case, EDMD will “pick” the slower components before the faster ones. We should stress that in this application, EDMD could be used as a replacement for (rather than in conjunction with) other methods such as diffusion maps (DMAPs) and its variants (Nadler et al. 2005, 2006; Coifman and Lafon 2006; Dsilva et al. 2013). Although there are advantages to combining diffusion maps and the Koopman approach (Budisic and Mezic 2012), exploration of those approaches is beyond the scope of this manuscript.

5.3.1 Parameterizing a Nonlinear Manifold with a Diffusion Process

For this example, the data are generated by a diffusion process on a rectangular domain,

$$\begin{aligned} \mathrm{d}\varvec{s}=2\,\mathrm{d}\varvec{W}_t, \end{aligned}$$
(40)

where \(\varvec{s}=(s_1,s_2)\) is the state and \(\varvec{W}_t\) is a two-dimensional Wiener process, with \(s_1\in [0,3\pi ]\) and \(s_2\in [0,2\pi ]\). No-flux boundary conditions are imposed at the domain edges. If one had access to the true variables, the (continuous-time) SKO could be written as

$$\begin{aligned} \hat{\mathcal {K}}\psi = 2\partial _{s_1}^2\psi + 2\partial _{s_2}^2\psi , \end{aligned}$$
(41)

also with no-flux boundary conditions; in this particular problem, the SKO is self-adjoint and therefore equivalent to the Perron–Frobenius operator. The eigenfunctions are \(\varphi _{ij} = \cos \left( is_1/3\right) \cos \left( js_2/2\right) \) with the eigenvalues \(\lambda _{ij} = -2(i^2/9+j^2/4)\). Note that the leading eigenfunctions, \(\varphi _{1,0}\) and \(\varphi _{0,1}\), are \(\cos \left( s_1/3\right) \) and \(\cos \left( s_2/2\right) \), which are one-to-one with \(s_1\) and \(s_2\) on \([0, 3\pi ]\) and \([0, 2\pi ]\), respectively, and could be used to parameterize state space if \(s_1\) and \(s_2\) were not known.

In this example, the true data (i.e., the state expressed in terms of \(s_1\) and \(s_2\)) on the rectangle are mapped onto a “Swiss roll” via the transformation

$$\begin{aligned} \varvec{g}(\varvec{s})=\begin{bmatrix}(s_1+0.1)\cos (s_1)\\ s_2\\ (s_1+0.1)\sin (s_1) \end{bmatrix}, \end{aligned}$$
(42)

which, among other things, has introduced a new spatial dimension. In all that follows, the EDMD approach is applied to the three-dimensional, transformed variables and not the two-dimensional, true variables. Our objective here is to determine, directly from the data, a two-parameter description of what initially appears to be three-dimensional data.

The data given to EDMD were generated by \(10^4\) initial conditions uniformly distributed in \(s_1\) and \(s_2\) that were evolved for a total time of \(\Delta t = 0.1\) using the Euler–Maruyama method with 100 timesteps. Then, both the initial and terminal states of the system were mapped into three dimensions using (42). Next, a dictionary must be defined. However, \({\mathcal {M}}\) is unknown (indeed, parameterizing \({\mathcal {M}}\) is the entire point), so the domain \(\Omega \) is taken to be the “box” in \(\mathbb {R}^3\) such that \(x\in [-3\pi -0.1, 3\pi +0.1]\), \(y\in [0, 2\pi ]\), and \(z\in [-3\pi -0.1, 3\pi +0.1]\). In this larger domain, a spectral element basis consisting of 4096 rectangular subdomains (16 each in x, y, and z) with up to linear polynomials in each subdomain is employed. Because \({\mathcal {M}}\subset \Omega \), extraneous and redundant functions are expected, and \(\varvec{G}\) is often ill conditioned.
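A sketch of this data-generation step, including the mapping (42) into three dimensions, is given below; the reflecting boundary treatment, the random seed, and the variable names are illustrative assumptions.

# Sketch: diffusion on the rectangle (40) followed by the Swiss-roll map (42).
# Reflection at the rectangle edges is an assumed realization of the no-flux
# boundary conditions.
import numpy as np

def reflect(s, lo, hi):
    s = np.where(s > hi, 2 * hi - s, s)
    return np.where(s < lo, 2 * lo - s, s)

def swiss_roll(s):
    s1, s2 = s
    return np.stack([(s1 + 0.1) * np.cos(s1), s2, (s1 + 0.1) * np.sin(s1)])

rng = np.random.default_rng(4)
n, dt, n_steps = 10**4, 1e-3, 100
s = np.stack([rng.uniform(0, 3 * np.pi, n), rng.uniform(0, 2 * np.pi, n)])
s0 = s.copy()
for _ in range(n_steps):
    s = s + 2.0 * np.sqrt(dt) * rng.standard_normal(s.shape)
    s[0] = reflect(s[0], 0.0, 3 * np.pi)
    s[1] = reflect(s[1], 0.0, 2 * np.pi)

X3, Y3 = swiss_roll(s0), swiss_roll(s)     # EDMD sees only the three-dimensional coordinates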

Figure 12 shows the transformed data colored by the first and second non-trivial eigenfunctions of the Koopman operator. Unlike many of the previous examples, there are clear differences between the analytical and computed eigenvalues and eigenfunctions. However, the first eigenfunction is one-to-one with the “arclength” along the Swiss roll (i.e., the \(s_1\) direction), and the second eigenfunction is one-to-one with the “width” (i.e., the \(s_2\) direction). Furthermore, the first two eigenvalues obtained with EDMD, -0.234 and -0.491, arrange the eigenfunctions in the correct order and compare favorably with the true eigenvalues of \(-2/9\) and \(-1/2\). Therefore, while our computed Koopman eigenfunctions may not be highly accurate, they are accurate enough to parameterize the nonlinear manifold, which was the goal.

Fig. 12

First two non-trivial Koopman eigenfunctions for the diffusion process on the “Swiss roll” using \(3\times 10^4\) data points. The first eigenfunction is one-to-one with \(s_1\) (the “length” of the Swiss roll), and the second eigenfunction is one-to-one with \(s_2\). As a result, they could act as a data-driven parameterization of this nonlinear manifold. The eigenvalues associated with these eigenfunctions are \(-0.234\) and \(-0.491\) compared to the theoretical values of \(-\frac{2}{9}\) and \(-\frac{1}{2}\), respectively

The procedure for incorporating new data points is straightforward: The embedding for any \(\tilde{\varvec{x}}\in {\mathcal {M}}\) can be obtained by evaluating the relevant eigenfunctions at \(\tilde{\varvec{x}}\). It should be stressed that although the \(\varphi \) are defined on \(\Omega \), their values are only meaningful on (or very near) \({\mathcal {M}}\) because that is where the dynamical system is defined. Therefore, these new points must be elements of \({\mathcal {M}}\) if the resulting embedding is to have any meaning.
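Concretely, if xi_1 and xi_2 denote the EDMD coefficient vectors of the two leading non-trivial eigenfunctions and psi is the dictionary evaluation used above (hypothetical names from our earlier sketches), embedding new points amounts to a single matrix product:

# Sketch: embed new points (columns of x_new, assumed to lie on or near the
# manifold) by evaluating the two leading non-trivial eigenfunctions there.
import numpy as np

embedding = np.real(psi(x_new) @ np.column_stack([xi_1, xi_2]))   # one (phi_1, phi_2) pair per point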

5.3.2 Reducing Multiscale Dynamics

In the previous example, the noise was isotropic, so the dynamics were equally “fast” in both directions. Because we are interested in the most slowly decaying eigenfunctions, the eigenfunction that is one-to-one with \(s_1\) is more important because it decays more slowly than the eigenfunction that is one-to-one with \(s_2\). In this example, we introduce anisotropic diffusion and therefore create “fast” and “slow” directions on the nonlinear manifold. The purpose of this example is to show that EDMD will “reorder” its ranking of the eigenfunctions and recover the slower component before the faster one if the level of anisotropy is large enough.

Our particular example is

$$\begin{aligned} \mathrm{d}s_1 = \frac{2}{\epsilon }\,\mathrm{d}W_1, \end{aligned}$$
(43a)
$$\begin{aligned} \mathrm{d}s_2 = 2\,\mathrm{d}W_2, \end{aligned}$$
(43b)

with \(\epsilon =0.1\), which is again transformed onto the same Swiss roll. Although the domain is larger in the \(s_1\) direction than in the \(s_2\) direction, the dynamics of \(s_1\) are now significantly faster than those of \(s_2\) due to a much larger amount of “noise.” Because the two random processes in (43) are independent, the eigenfunctions themselves should not change. However, the eigenvalues associated with each of the eigenfunctions should.

Figure 13 shows how EDMD captures this difference in the underlying diffusion process. Before, the first two non-trivial eigenfunctions were one-to-one with \(s_1\) and \(s_2\), respectively; now, the first is one-to-one with \(s_2\), and the second is a higher harmonic (but still only a function of \(s_2\)). The eigenfunction that is one-to-one with \(s_1\) still exists, but it is no longer associated with a leading eigenvalue. Analytically, its eigenvalue is \(-20/9\), though EDMD computes a value of \(-2.3\). Therefore, it is no longer a leading eigenfunction, but it is still approximated by the EDMD procedure.

Fig. 13

First two non-trivial Koopman eigenfunctions for the multiscale diffusion process in (43) on the “Swiss roll” computed using \(3\times 10^4\) data points. The eigenvalues associated with these eigenfunctions are \(-0.504\) (analytically, \(-1/2\)) and \(-2.1\) (analytically, \(-2\)). In contrast to Fig. 12, these eigenfunctions are a parameterization of the “width” of the Swiss roll (\(s_2\)) and a higher harmonic. As a result, the eigenfunctions computed by EDMD take into account both the geometry of the underlying manifold and the dynamics defined on it

This section explored the application of the EDMD method to data taken from a Markov process. Algorithmically, the method remains the same regardless of how the data were generated, but as demonstrated here, EDMD computes an approximation of the tuples associated with the SKO rather than the deterministic Koopman operator. To demonstrate the effectiveness of EDMD, we applied the method to a simple SDE with a double-well potential, and to an SDE defined on a Swiss roll, which is a nonlinear manifold often used as a benchmark for manifold learning techniques. One advantage of the Koopman approach for applications such as manifold learning or model reduction is that the Koopman tuples take into account both the geometry of the manifold, through the eigenfunction and mode, and the dynamics, through the eigenvalue. As a result, the approach taken here is aware of both geometry and dynamics and does not focus solely on one or the other.

6 Conclusions

In this manuscript, we presented a data-driven method that computes approximations of the Koopman eigenvalues, eigenfunctions, and modes (what we call Koopman tuples) directly from a set of snapshot pairs. We refer to this method as extended dynamic mode decomposition (EDMD). The finite-dimensional approximation generated by EDMD is the solution to a least-squares problem and converges to a Galerkin method with a large amount of data. While the usefulness of the Galerkin method depends on the sampling density and dictionary selected, several “common sense” choices of both appear to produce useful results.

We demonstrated the effectiveness of the method with four examples: two dealt with deterministic data and two with stochastic data. First, we applied EDMD to a linear system where the Koopman eigenfunctions are known analytically. Direct comparison of the EDMD eigenfunctions with the analytical values demonstrated that EDMD can be highly accurate with the proper choice of data and dictionary. Next, we applied EDMD to the unforced Duffing equation, for which the Koopman eigenfunctions are not known explicitly. Although more data would increase the accuracy of the resulting eigenfunctions, they appeared to be accurate enough to effectively partition the domain of interest and parameterize the resulting partitions.

The final two examples used data generated by Markov processes. First, we applied EDMD to data taken from an SDE with a double-well potential and demonstrated the accuracy of the method by comparing those results with a direct numerical approximation of the stochastic Koopman operator over a range of diffusion parameters. Next, we applied EDMD to data from a diffusion process on a “Swiss roll,” which is a nonlinear manifold commonly used as an example for nonlinear dimensionality reduction. Similar to those methods (see e.g.,  Coifman and Lafon (2006) and Lee and Verleysen (2007)), EDMD generated an effective parameterization of the manifold using the leading eigenfunctions. By making the diffusion anisotropic, we then demonstrated that EDMD extracts a parameterization that is dynamically, rather than only geometrically, meaningful. Due to the simplicity of this problem, the eigenfunctions remain unchanged despite the anisotropy; the difference appears in the temporal evolution of the eigenfunctions, which is dictated by the corresponding set of eigenvalues. As a result, the purpose of that example was to show that EDMD “ordered” the eigenvalues of each tuple differently.

The Koopman operator governs the evolution of observables defined on the state space of a dynamical system. By judiciously selecting how we observe our system, we can generate linear models that are valid on all of (or, at least, a larger subset of) state space rather than just some small neighborhood of a fixed point; this could allow algorithms designed for linear systems to be applied even in nonlinear settings. However, the tuples of eigenvalues, eigenfunctions, and modes required to do so are decidedly non-trivial to compute. Data-driven methods such as EDMD have the potential to allow accurate approximations of these quantities to be computed without knowledge of the underlying dynamics or geometry. As a result, they could be a practical method for enabling Koopman-based analysis and model reduction in large nonlinear systems.