1 Introduction

Recently, there has been growing interest in the spectral expansion-based methodology that has enabled progress in data-driven analysis of high-dimensional nonlinear systems. The research was initiated in Mezić (2005) and Mezić and Banaszuk (2004), using the composition (Koopman) operator representation originally defined in Koopman (1931). The framework is being used for model reduction, identification, prediction, data assimilation and control of deterministic (e.g., Budisić et al. 2012; Mauroy and Goncalves 2017; Rowley et al. 2009; Williams et al. 2015; Brunton et al. 2016b; Giannakis et al. 2015; Korda and Mezić 2016) as well as stochastic dynamical systems (e.g., Takeishi et al. 2017; Riseth and Taylor-King 2017; Wu and Noé 2017). This has propelled the theory to wide use in a diverse set of applications such as fluid dynamics (Sharma et al. 2016; Glaz et al. 2016), power grid dynamics (Raak et al. 2016), neurodynamics (Brunton et al. 2016a), energy efficiency (Georgescu and Mezić 2015), molecular dynamics (Wu et al. 2017) and data fusion (Williams et al. 2015).

Numerical methods for approximation of the spectral properties of the Koopman operator have been considered since the inception of the data-driven analysis of dissipative dynamical systems (Mezić and Banaszuk 2004; Mezić 2005). These belong to the class of generalized Laplace analysis (GLA) methods (Mezić 2013). An alternative line of algorithms, called the dynamic mode decomposition (DMD) algorithm (Schmid 2010; Rowley et al. 2009), has also been advanced, enabling concurrent data-driven determination of approximate eigenvalues and eigenvectors of the underlying DMD operator. Examples of DMD-type algorithms include (1) the companion matrix method proposed by Rowley et al. (2009), (2) the SVD-based DMD developed by Schmid (2010), (3) the Exact DMD method introduced by Tu et al. (2014) and (4) the Extended DMD (EDMD) of Williams et al. (2015). The relationship between these methods and the spectral properties of the Koopman operator was first noticed in Rowley et al. (2009), based on the spectral expansion developed in Mezić (2005). However, rigorous results in this direction are sparse. Williams et al. (2015) provided a result, the corollary of which is that the spectrum of the EDMD approximation is contained in the spectrum of the Koopman operator, provided the observables belong to a finite-dimensional invariant subspace of the Koopman operator and the data matrix is of the same rank. The work of Arbabi and Mezić (2016) suggested that an alternative assumption to the finite rank is that the number of sampling points M goes to infinity, even though the results of Arbabi and Mezić (2016) still implicitly rely on a finite-dimensional assumption.

The work (Klus et al. 2016) showed that, assuming either independent identically distributed (iid) or ergodic sampling from a measure \(\mu \), the EDMD operator on N observables constructed using M samples, \(\mathcal {K}_{N,M}\), converges as \(M\rightarrow \infty \) to \(\mathcal {K}_N\), where \(\mathcal {K}_N\) is the \(L_2(\mu )\)-orthogonal projection of the action of the Koopman operator on the finite-dimensional subspace of observables. In this work, we show that \(\mathcal {K}_N\) converges to \(\mathcal {K}\) in the strong operator topology. As a result, predictions of a given observable obtained using \(\mathcal {K}_N\) or \(\mathcal {K}_{N,M}\) over any finite prediction horizon converge in the \(L_2(\mu )\) norm (as N or N and M tend to infinity) to its true values. In addition, we show that, as \(N\rightarrow \infty \), accumulation points of the spectra of \(\mathcal {K}_N\) correspond to eigenvalues of the Koopman operator and the associated eigenfunctions converge weakly to an eigenfunction of the Koopman operator, provided that the weak limit of the eigenfunctions is nonzero. The results hold in a very general setting with minimal underlying assumptions. In particular, we do not assume that the finite-dimensional subspace of observables is invariant under the action of the Koopman operator or that the dynamics is measure preserving.

As a by-product of our results, we propose an analytic version of the EDMD algorithm which allows one to construct \(\mathcal {K}_N\) directly, without the need for sampling, under the assumption that the transition mapping of the dynamical system is known analytically and the N-dimensional subspace of observables is such that the integrals of the products of the observables precomposed with the transition mapping can be evaluated in closed form. This method is not immediately useful for large-scale data-driven applications that the EDMD was originally designed for, but it may prove useful in control applications (e.g., Korda and Mezić 2016), where the model is often known, or for numerical studies of Koopman operator approximations on classical examples, eliminating the sampling error in both cases.
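To make this concrete, here is a minimal sketch (in Python with NumPy; the map, measure and basis are hypothetical choices for illustration, not taken from this paper): for \(T(x) = x^2\) on \([0,1]\) with \(\mu \) the Lebesgue measure and monomial basis \(\psi _k(x) = x^{k-1}\), both \(\int \psi _i\psi _l\,\mathrm{d}\mu = 1/(i+l-1)\) and \(\int \psi _i(\psi _j\circ T)\,\mathrm{d}\mu = 1/(i+2j-2)\) are available in closed form, so a matrix representation of \(\mathcal {K}_N\) can be assembled without any sampling:

```python
import numpy as np

N = 4
i = np.arange(1, N + 1)

# Gram matrix of the monomial basis on [0,1] (a Hilbert matrix):
# M_mu[i,l] = int_0^1 x^(i-1) x^(l-1) dx = 1/(i+l-1)
M_mu = 1.0 / (i[:, None] + i[None, :] - 1)

# B[i,j] = int_0^1 x^(i-1) (x^2)^(j-1) dx = 1/(i + 2j - 2)
B = 1.0 / (i[:, None] + 2 * i[None, :] - 2)

# column j of C holds the L2(mu)-projection coefficients of psi_j o T
# onto F_N, i.e., the matrix of K_N in the monomial basis
C = np.linalg.solve(M_mu, B)

# sanity check: psi_1 o T = 1 = psi_1 and psi_2 o T = x^2 = psi_3 already
# lie in F_N, so the first two columns are exact coordinate vectors
print(np.round(C[:, 0], 6), np.round(C[:, 1], 6))
```

The third and fourth columns involve \(x^4\) and \(x^6\), which do not lie in \(\mathcal {F}_4\), so there the columns contain genuine (nontrivial) projection coefficients.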

Finally, we analyze convergence of \(\mathcal {K}_{N,N}\), i.e., the situation where the number of samples M and the number of observables N are equal. Under the additional assumptions that the sample points lie on the same trajectory and the mapping T is a homeomorphism, we prove convergence along a subsequence to weak eigenfunctions (or eigendistributions in the sense of Gelfand and Shilov 1964) of the Koopman operator, which also turn out to be eigenmeasures of the Perron–Frobenius operator.

The paper is organized as follows: In Sect. 2, we introduce the setting of EDMD. In Sect. 3, we show that EDMD is an orthogonal projection of the action of the Koopman operator on a finite-dimensional subspace of observables with respect to the empirical measure supported on the sample points drawn from a measure \(\mu \). In Sect. 4, we show that this projection converges to the \(L_2(\mu )\)-projection of the action of the Koopman operator. In Sect. 5, we analyze the convergence of the EDMD approximations as the dimension of the subspace N goes to infinity, showing convergence in the strong operator topology and convergence of the eigenvalues along a subsequence, together with weak convergence of the associated eigenfunctions. In Sect. 6, we show convergence of finite-horizon predictions. Section 7 describes the analytic construction of \(\mathcal {K}_N\). Section 8 contains results for the case when \(M=N\), where only convergence to weak eigenfunctions, along a subsequence, is proven. We conclude in Sect. 9.

1.1 Notation

The spaces of real and complex numbers are denoted by \(\mathbb {R}\) and \(\mathbb {C}\) with \(\mathbb {R}^{n\times k}\) and \(\mathbb {C}^{n\times k}\) denoting the corresponding real and complex \(n\times k\) matrices; the real and complex column vectors are denoted by \(\mathbb {R}^n := \mathbb {R}^{n\times 1}\) and \(\mathbb {C}^n := \mathbb {C}^{n\times 1}\). The complex conjugate of \(a\in \mathbb {C}\) is denoted by \(\overline{a}\). Given a matrix \(A\in \mathbb {C}^{n\times k}\), \(A^\top \) denotes its transpose and \(A^{{\mathsf {H}}}\) denotes its Hermitian transpose (i.e., \(A_{i,j}^\top = A_{j,i}\) and \(A^{{\mathsf {H}}}_{i,j} = \overline{A_{j,i}}\)). The Moore–Penrose pseudoinverse of a matrix A is denoted by \(A^\dagger \). The Frobenius norm of a matrix A is denoted by \(\Vert A \Vert _F = \sqrt{ \sum _{i,j}{ |A_{i,j}|^2 }}\). Given a vector \(c\in \mathbb {C}^n\), the symbol \(\Vert c\Vert _2 := \sqrt{\sum _{i}|c_i|^2}\) denotes its Euclidean norm.

2 Extended Dynamic Mode Decomposition

We consider a discrete time dynamical system

$$\begin{aligned} x^+ = T(x) \end{aligned}$$
(1)

with \(T{:}\,\mathcal {M}\rightarrow \mathcal {M}\), where \(\mathcal {M}\) is a topological space, and we assume that we are given snapshots of data

$$\begin{aligned} \varvec{X} = [x_1,\ldots , x_M],\quad \varvec{Y} = [y_1,\ldots , y_M] \end{aligned}$$
(2)

satisfying \(y_i = T(x_i)\). We do not assume that the data points lie on a single trajectory of (1).

Given a vector space \(\mathcal {F}\) of observables \(\psi {:}\,\mathcal {M}\rightarrow \mathbb {C}\) such that \(\psi \circ T \in \mathcal {F}\) for every \(\psi \in \mathcal {F}\), we define the Koopman operator \(\mathcal {K}{:}\,\mathcal {F}\rightarrow \mathcal {F}\) by

$$\begin{aligned} \mathcal {K}\psi = \psi \circ T, \end{aligned}$$

where \(\circ \) denotes the pointwise function composition. Given a set of linearly independent basis functions \(\psi _i \in \mathcal {F}\), \(i=1,\ldots ,N\), and defining

$$\begin{aligned} \mathcal {F}_N := \mathrm {span}\left\{ \psi _1,\ldots ,\psi _N\right\} , \end{aligned}$$
(3)

the EDMD constructs a finite-dimensional approximation \(\mathcal {K}_{N,M}{:}\,\mathcal {F}_N\rightarrow \mathcal {F}_N\) of the Koopman operator by solving the least-squares problem

$$\begin{aligned} \min _{A\in \mathbb {C}^{N\times N}} \Vert A\varvec{\psi }(\varvec{X}) - \varvec{\psi }(\varvec{Y}) \Vert _F^2 = \min _{A\in \mathbb {C}^{N\times N}} \sum _{i=1}^M \Vert A\varvec{\psi }(x_i) - \varvec{\psi }(y_i) \Vert _2^2, \end{aligned}$$
(4)

where

$$\begin{aligned} \varvec{\psi }(\varvec{X}) = [\varvec{\psi }(x_1),\ldots , \varvec{\psi }(x_M)],\quad \varvec{\psi }(\varvec{Y}) = [\varvec{\psi }(y_1),\ldots , \varvec{\psi }(y_M)] \end{aligned}$$

and

$$\begin{aligned} \varvec{\psi }(x) = [\psi _1(x),\ldots ,\psi _N(x)]^\top . \end{aligned}$$

Denoting

$$\begin{aligned} A_{N,M} = \varvec{\psi }(\varvec{Y})\varvec{\psi }(\varvec{X})^\dagger , \end{aligned}$$
(5)

a solution to (4), the finite-dimensional approximation of the Koopman operator

$$\begin{aligned} \mathcal {K}_{N,M}{:}\,\mathcal {F}_N \rightarrow \mathcal {F}_N \end{aligned}$$

is then defined by

$$\begin{aligned} \mathcal {K}_{N,M} \phi = c_\phi ^{{\mathsf {H}}}A_{N,M}\varvec{\psi }\end{aligned}$$
(6)

for any \(\phi = c_\phi ^{{\mathsf {H}}}\varvec{\psi }\), \(c_\phi \in \mathbb {C}^N\) (i.e., for any \(\phi \in \mathcal {F}_N\)). The operator \(\mathcal {K}_{N,M}\) will be referred to as the EDMD operator.
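The construction (4)–(6) can be sketched in a few lines; a minimal illustration assuming NumPy, with a hypothetical linear map and monomial dictionary (chosen so that \(\mathcal {F}_N\) is Koopman-invariant and the result is easy to verify):

```python
import numpy as np

rng = np.random.default_rng(0)
T = lambda x: 0.9 * x                     # hypothetical dynamics x+ = 0.9 x

def psi(x, N=4):
    # dictionary psi_k(x) = x^(k-1), stacked as an N x M matrix
    return np.vander(np.atleast_1d(x), N, increasing=True).T

M = 200
X = rng.uniform(-1.0, 1.0, M)             # sample points x_1, ..., x_M
Y = T(X)                                  # snapshots y_i = T(x_i)

# A_{N,M} = psi(Y) psi(X)^dagger, eq. (5)
A = psi(Y) @ np.linalg.pinv(psi(X))

# for this T, psi_k o T = 0.9^(k-1) psi_k, so F_N is Koopman-invariant and
# A recovers the diagonal matrix diag(1, 0.9, 0.81, 0.729)
print(np.round(np.diag(A), 4))
```

The eigenvalues of \(A\) are then \(\{1, 0.9, 0.9^2, 0.9^3\}\), i.e., eigenvalues of the Koopman operator restricted to this invariant subspace; for a generic dictionary no such invariance holds, which is precisely the situation analyzed in the rest of the paper.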

3 EDMD as \(L_2\) Projection

To the best of our knowledge, the results of this section and Sect. 4 were first obtained in Klus et al. (2016, Section 3.4) and hinted at already in Williams et al. (2015). Here, we rephrase these results in a form more suitable for our purposes.

From here on, we assume that \(\mathcal {F}= L_2(\mu )\), where \(\mu \) is a given positive measure on \(\mathcal {M}\). This assumption in particular implies that the basis functions \(\psi _i\) in (3) belong to \(L_2(\mu )\) and hence \(\mathcal {F}_N\) is a closed (but not necessarily invariant) subspace of \(L_2(\mu )\). Note that the measure \(\mu \) is not required to be invariant for (1) and hence the Koopman operator \(\mathcal {K}\) is not necessarily unitary. In practical applications, the measure \(\mu \) will typically be the uniform measure on \(\mathcal {M}\) or another measure from which samples can be drawn efficiently.

We recall that given an arbitrary positive measure \(\nu \) on \(\mathcal {M}\), the space \(L_2(\nu )\) is the Hilbert space of all measurable functions \(\phi {:}\,\mathcal {M}\rightarrow \mathbb {C}\) satisfying

$$\begin{aligned} \Vert \phi \Vert _{L_2(\nu )} := \sqrt{\int _{\mathcal {M}} |\phi (x)|^2\,\mathrm{d}\nu (x)} < \infty . \end{aligned}$$

Assuming \(\mathcal {F}_N \) is a closed subspace of \(L_2(\nu )\), the \(L_2(\nu )\)-projection of a function \(\phi \in L_2(\nu )\) onto \(\mathcal {F}_N \subset L_2(\nu )\) is defined by

$$\begin{aligned} P_N^\nu \phi = \mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{f\in \mathcal {F}_N} \Vert f - \phi \Vert _{L_2(\nu )}= \mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{f\in \mathcal {F}_N} \int _{\mathcal {M}} |f - \phi |^2\,\mathrm{d}\nu =\mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{c\in \mathbb {C}^N} \int _{\mathcal {M}} |c^{{\mathsf {H}}}\varvec{\psi }- \phi |^2\,\mathrm{d}\nu . \end{aligned}$$
(7)

Now, given the data points \(x_1,\ldots ,x_M\) from (2), we define the empirical measure \(\hat{\mu }_M\) by

$$\begin{aligned} \hat{\mu }_M = \frac{1}{M}\sum _{i=1}^M \delta _{x_i}, \end{aligned}$$
(8)

where \(\delta _{x_i}\) is the Dirac measure at \(x_i\). In particular, the integral of a function \(\phi \) with respect to \(\hat{\mu }_M\) is given by

$$\begin{aligned} \int _\mathcal {M}\phi (x)\mathrm{d}{\hat{\mu }}_M(x) = \frac{1}{M}\sum _{i=1}^M \phi (x_i). \end{aligned}$$

We remark that the EDMD subspace \(\mathcal {F}_N\) defined in (3) is a closed subspace of both \(L_2(\hat{\mu }_M)\) and \(L_2(\mu )\) and hence the projections \(P_N^{\mu }\) and \(P_N^{{\hat{\mu }}_M}\) are well defined.

Now, we are ready to state the following characterization of \(\mathcal {K}_{N,M} \phi \):

Theorem 1

Let \({\hat{\mu }}_M\) denote the empirical measure (8) associated to the sample points \(x_1,\ldots , x_M\) and assume that the \(N\times N\) matrix

$$\begin{aligned} M_{{\hat{\mu }}_M} =\frac{1}{M}\sum _{i=1}^M \varvec{\psi }(x_i)\varvec{\psi }(x_i)^{{\mathsf {H}}}= \int _{\mathcal {M}} \varvec{\psi }\varvec{\psi }^{{\mathsf {H}}}\,\mathrm{d}{\hat{\mu }}_M \end{aligned}$$
(9)

is invertible. Then, for any \(\phi \in \mathcal {F}_N\)

$$\begin{aligned} \mathcal {K}_{N,M}\phi = P_{N}^{\hat{\mu }_M} \mathcal {K}\phi = \mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{f \in \mathcal {F}_N} \Vert f - \mathcal {K}\phi \Vert _{L_2({\hat{\mu }}_M )}, \end{aligned}$$
(10)

i.e.,

$$\begin{aligned} \mathcal {K}_{N,M} = P_N^{{\hat{\mu }}_M} \mathcal {K}_{|\mathcal {F}_N}, \end{aligned}$$
(11)

where \(\mathcal {K}_{|\mathcal {F}_N}{:}\,\mathcal {F}_N\rightarrow \mathcal {F}\) is the restriction of the Koopman operator to \(\mathcal {F}_N\).

Proof

Since the matrix \(M_{{\hat{\mu }}_M}\) is invertible, the least-squares problem (4) has a unique solution given by

$$\begin{aligned} a_i = \left( \sum _{j=1}^M \varvec{\psi }(x_j)\varvec{\psi }(x_j)^{{\mathsf {H}}}\right) ^{-1}\sum _{j=1}^M \varvec{\psi }(x_j)\overline{ \psi _i(y_j)}, \end{aligned}$$

where \(a_i^{{\mathsf {H}}}\in \mathbb {C}^{1\times N}\) is the ith row of \(A_{N,M}\). Therefore,

$$\begin{aligned} A_{N,M}^{{\mathsf {H}}}= \left( \sum _{j=1}^M \varvec{\psi }(x_j)\varvec{\psi }(x_j)^{{\mathsf {H}}}\right) ^{-1}\sum _{j=1}^M \varvec{\psi }(x_j)\varvec{\psi }(y_j)^{{\mathsf {H}}}. \end{aligned}$$

On the other hand, analyzing the minimization problem on the right-hand side of (10), we get for any \(\phi = c_\phi ^{{\mathsf {H}}}\varvec{\psi }\)

$$\begin{aligned} \mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{f \in \mathcal {F}_N} \Vert f - \mathcal {K}\phi \Vert _{L_2({\hat{\mu }}_M )} = \mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{c\in \mathbb {C}^N} \frac{1}{M}\sum _{i=1}^M \left| c^{{\mathsf {H}}}\varvec{\psi }(x_i) - c_\phi ^{{\mathsf {H}}}\varvec{\psi }(y_i)\right| ^2 \end{aligned}$$

with the unique minimizer (since the minimized function is strictly convex in c)

$$\begin{aligned} c = \left( \sum _{j=1}^M \varvec{\psi }(x_j)\varvec{\psi }(x_j)^{{\mathsf {H}}}\right) ^{-1}\sum _{j=1}^M \varvec{\psi }(x_j)\varvec{\psi }(y_j)^{{\mathsf {H}}}c_\phi = A_{N,M}^{{\mathsf {H}}}c_\phi . \end{aligned}$$

Hence, \(\mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{f \in \mathcal {F}_N} \Vert f - \mathcal {K}\phi \Vert _{L_2({\hat{\mu }}_M )} = c^{{\mathsf {H}}}\varvec{\psi }= c_\phi ^{{\mathsf {H}}}A_{N,M}\varvec{\psi }= \mathcal {K}_{N,M}\phi \) as desired, where we used (6) in the last equality. \(\square \)

Theorem 1 says that for any function \(\phi \in \mathcal {F}_N\), the EDMD operator \(\mathcal {K}_{N,M}\) computes the \(L_2({\hat{\mu }}_M)\)-orthogonal projection of \(\mathcal {K}\phi \) on the subspace spanned by \(\psi _1,\ldots ,\psi _N\).
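Theorem 1 can be checked numerically; a small sketch (hypothetical map and dictionary, assuming NumPy) comparing the EDMD coefficients \(A_{N,M}^{{\mathsf {H}}}c_\phi \) with a direct least-squares solution of the projection problem in (10):

```python
import numpy as np

rng = np.random.default_rng(1)
T = lambda x: np.sin(2.0 * x)             # hypothetical nonlinear map
N = 5
psi = lambda x: np.vander(np.atleast_1d(x), N, increasing=True).T

M = 300
X = rng.uniform(-1.0, 1.0, M)
Y = T(X)
PsiX, PsiY = psi(X), psi(Y)
A = PsiY @ np.linalg.pinv(PsiX)           # eq. (5)

c_phi = rng.standard_normal(N)            # phi = c_phi^H psi, a random observable

# coefficients of K_{N,M} phi according to (6): A^H c_phi
c_edmd = A.T @ c_phi                      # real data, so A^H = A^T

# coefficients of the empirical L2 projection of K phi = phi o T onto F_N:
# minimize sum_i |c^H psi(x_i) - c_phi^H psi(y_i)|^2 directly
c_proj, *_ = np.linalg.lstsq(PsiX.T, c_phi @ PsiY, rcond=None)

print(np.allclose(c_edmd, c_proj, atol=1e-8))
```

Since \(M_{{\hat{\mu }}_M}\) is invertible here (M = 300 distinct points, 5 monomials), the projection problem has a unique minimizer and the two coefficient vectors agree up to roundoff.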

Remark 1

If the assumption that the matrix \(M_{{\hat{\mu }}_M}\) is invertible does not hold, then the solution to the projection problem on the right-hand side of (10) may not be unique. The action of the EDMD operator \(\mathcal {K}_{N,M}\) [which is defined uniquely by (5) and (6)] then selects one solution to the projection problem. In concrete terms, we have

$$\begin{aligned} \mathcal {K}_{N,M}\phi \in \mathop {\mathrm{Arg}\,\mathrm{min}}\limits _{f\in \mathcal {F}_N}\Vert f - \mathcal {K}\phi \Vert _{L_2({\hat{\mu }}_M)}, \end{aligned}$$

where \(\mathop {\mathrm{Arg}\,\mathrm{min}}\limits _{f\in \mathcal {F}_N}\Vert f - \mathcal {K}\phi \Vert _{L_2({\hat{\mu }}_M)}\) denotes the set of all minimizers of \(\Vert f - \mathcal {K}\phi \Vert _{L_2({\hat{\mu }}_M)}\) among \(f\in \mathcal {F}_N\).

4 Convergence of \(\mathcal {K}_{N,M}\) as \(M\rightarrow \infty \)

The first step in understanding convergence of EDMD is to understand the convergence of \(\mathcal {K}_{N,M}\) as the number of samples M tends to infinity. In this section, we prove that

$$\begin{aligned} \mathcal {K}_{N,M} \rightarrow \mathcal {K}_N, \end{aligned}$$

where

$$\begin{aligned} \mathcal {K}_N = P_N^\mu \mathcal {K}_{|\mathcal {F}_N}, \end{aligned}$$
(12)

provided that the samples \(x_1,\ldots ,x_M\) are drawn independently from a given probability distribution \(\mu \) (e.g., the uniform distribution if \(\mathcal {M}\) is compact or a Gaussian if \(\mathcal {M}= \mathbb {R}^n\)).

Assumption 1

(\(\mu \) independence) The basis functions \(\psi _1,\ldots ,\psi _N\) are such that

$$\begin{aligned} \mu \left\{ x\in \mathcal {M}\mid c^{{\mathsf {H}}}\varvec{\psi }(x)=0 \right\} = 0 \end{aligned}$$

for all nonzero \(c \in \mathbb {C}^N\).

This is a natural assumption ensuring that the measure \(\mu \) is not supported on a zero level set of a linear combination of the basis functions used. It is satisfied whenever the support of \(\mu \) is equal to \(\mathcal {M}\) and the basis functions are among the most commonly used ones, such as polynomials or radial basis functions with unbounded support (e.g., Gaussians, thin plate splines). This assumption in particular implies that the matrix \(M_{{\hat{\mu }}_M}\) defined in (9) is invertible with probability one for \(M \ge N\) if the \(x_j\)'s are iid samples from \(\mu \).
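For instance, with a monomial dictionary any N distinct sample points already give a full-rank \(\varvec{\psi }(\varvec{X})\) (a transposed Vandermonde matrix), so \(M_{{\hat{\mu }}_M}\) is invertible as soon as \(M \ge N\); a small sketch under these hypothetical choices, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
psi = lambda x: np.vander(np.atleast_1d(x), N, increasing=True).T

# M = N iid samples from mu = U[0,1]; they are distinct with probability one,
# so psi(X) is a (transposed) Vandermonde matrix of full rank N
X = rng.uniform(0.0, 1.0, N)
PsiX = psi(X)
M_hat = PsiX @ PsiX.conj().T / N          # empirical Gram matrix, eq. (9)

print(np.linalg.matrix_rank(M_hat) == N)  # invertible already at M = N
```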

Lemma 1

If Assumption 1 holds, then for any \(\phi \in \mathcal {F}\) we have with probability one

$$\begin{aligned} \lim _{M\rightarrow \infty }\Vert P_N^{{\hat{\mu }}_M} \phi - P_N^{\mu } \phi \Vert = 0, \end{aligned}$$
(13)

where \(\Vert \cdot \Vert \) is any norm on \(\mathcal {F}_N\) (all norms on \(\mathcal {F}_N\) are equivalent since \(\mathcal {F}_N\) is finite dimensional).

Proof

We have

$$\begin{aligned} P_N^{\mu } \phi&= \mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{f\in \mathcal {F}_N} \int _{\mathcal {M}} |f - \phi |^2\, \mathrm{d}\mu = \varvec{\psi }^{{\mathsf {H}}}\mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{c\in \mathbb {C}^N} \int _{\mathcal {M}} |c^{{\mathsf {H}}}\varvec{\psi }- \phi |^2\, \mathrm{d}\mu \\&=\varvec{\psi }^{{\mathsf {H}}}\mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{c\in \mathbb {C}^N} \left[ c^{{\mathsf {H}}}M_{\mu } c - 2{\mathrm{Re}}\big \{c^{{\mathsf {H}}}b_{\mu ,\phi }\big \} \right] , \end{aligned}$$

where

$$\begin{aligned} M_\mu = \int _{\mathcal {M}} \varvec{\psi }\varvec{\psi }^{{\mathsf {H}}}\,\mathrm{d}\mu \in \mathbb {C}^{N\times N},\quad b_{\mu ,\phi } = \int _{\mathcal {M}}\varvec{\psi }\overline{\phi }\,\mathrm{d}\mu \in \mathbb {C}^N \end{aligned}$$

and we dropped the constant term in the last equality since it does not influence the minimizer. By Assumption 1, the matrix \(M_{\mu }\) is invertible and, being always Hermitian positive semidefinite, it is therefore Hermitian positive definite. Hence, the unique minimizer is \(c = M_{\mu }^{-1}b_{\mu ,\phi }\). Hence,

$$\begin{aligned} P_N^{\mu }\phi = b_{\mu ,\phi }^{{\mathsf {H}}}M_{\mu }^{-1}\varvec{\psi }. \end{aligned}$$

On the other hand, the same computation shows that

$$\begin{aligned} P_N^{{\hat{\mu }}_M}\phi = b_{{\hat{\mu }}_M,\phi }^{{\mathsf {H}}}M_{{\hat{\mu }}_M}^{-1}\varvec{\psi }\end{aligned}$$

with

$$\begin{aligned} b_{{\hat{\mu }}_M,\phi } = \int _{\mathcal {M}}\varvec{\psi }\overline{\phi }\,\mathrm{d}{\hat{\mu }}_M = \frac{1}{M}\sum _{i=1}^M \varvec{\psi }(x_i)\overline{\phi (x_i)} \end{aligned}$$

and with the matrix \(M_{{\hat{\mu }}_M}\) defined in (9) guaranteed to be Hermitian positive definite by Assumption 1 with probability one for \(M\ge N\). The result then follows by the strong law of large numbers which ensures that

$$\begin{aligned} \lim _{M\rightarrow \infty } \left( b_{{\hat{\mu }}_M,\phi }^{{\mathsf {H}}}M_{{\hat{\mu }}_M}^{-1}\right) = b_{\mu ,\phi }^{{\mathsf {H}}}M_{\mu }^{-1} \end{aligned}$$

with probability one since the matrix function \(A\mapsto A^{-1}\) is continuous and the samples \(x_i\) are iid. \(\square \)
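A hedged numerical illustration of Lemma 1 (hypothetical choices: \(\mu = U[0,1]\), monomial basis, \(\phi (x) = x^4\), assuming NumPy), comparing the empirical coefficients \(b_{{\hat{\mu }}_M,\phi }^{{\mathsf {H}}}M_{{\hat{\mu }}_M}^{-1}\) with their analytic limit:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 3
i = np.arange(1, N + 1)
psi = lambda x: np.vander(np.atleast_1d(x), N, increasing=True).T

# analytic quantities for mu = U[0,1] and phi(x) = x^4:
M_mu = 1.0 / (i[:, None] + i[None, :] - 1)   # Gram (Hilbert) matrix
b_mu = 1.0 / (i + 4)                          # int_0^1 x^(i-1) x^4 dx
c_exact = np.linalg.solve(M_mu, b_mu)         # coefficients of P_N^mu phi

errs = []
for M in (100, 100000):
    X = rng.uniform(0.0, 1.0, M)
    PsiX = psi(X)
    M_hat = PsiX @ PsiX.T / M                 # empirical Gram, eq. (9)
    b_hat = PsiX @ (X ** 4) / M               # empirical b vector
    errs.append(np.linalg.norm(np.linalg.solve(M_hat, b_hat) - c_exact))

# the Monte-Carlo error shrinks roughly like 1/sqrt(M)
print(errs[1] < errs[0])
```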

Theorem 2

If Assumption 1 holds, then we have with probability one for all \(\phi \in \mathcal {F}_N\)

$$\begin{aligned} \lim _{M\rightarrow \infty }\Vert \mathcal {K}_{N,M} \phi - \mathcal {K}_N \phi \Vert = 0, \end{aligned}$$
(14)

where \(\Vert \cdot \Vert \) is any norm on \(\mathcal {F}_N\). In particular,

$$\begin{aligned} \lim _{M\rightarrow \infty }\Vert \mathcal {K}_{N,M} - \mathcal {K}_N \Vert = 0, \end{aligned}$$
(15)

where \(\Vert \cdot \Vert \) is any operator norm and

$$\begin{aligned} \lim _{M\rightarrow \infty }\mathrm {dist}\big (\sigma (\mathcal {K}_{N,M}),\sigma (\mathcal {K}_N)\big ) = 0, \end{aligned}$$
(16)

where \(\sigma (\cdot ) \subset \mathbb {C}\) denotes the spectrum of an operator and \(\mathrm {dist}(\cdot ,\cdot )\) the Hausdorff metric on subsets of \(\mathbb {C}\).

Proof

For any fixed \(\phi \in \mathcal {F}_N\), we have by Theorem 1

$$\begin{aligned} \mathcal {K}_{N,M} \phi = P_{N}^{{\hat{\mu }}_M} \mathcal {K}\phi = P_{N}^{{\hat{\mu }}_M} (\phi \circ T). \end{aligned}$$

By definition of \(\mathcal {K}_N \phi \), we have \(\mathcal {K}_N \phi =P_N^\mu (\phi \circ T) \) and therefore (14) follows from Lemma 1 with probability one. Since \(\mathcal {F}_N\) is finite dimensional, (14) holds with probability one for all basis functions of \(\mathcal {F}_N\) and hence by linearity for all \(\phi \in \mathcal {F}_N\). Convergence in the operator norm (15) and spectral convergence (16) follow from (14) since the operators \(\mathcal {K}_{N,M}\) and \(\mathcal {K}_N\) are finite dimensional. \(\square \)

Theorem 2 tells us that in order to understand the convergence of \(\mathcal {K}_{N,M}\) to \(\mathcal {K}\) it is sufficient to understand the convergence of \(\mathcal {K}_N\) to \(\mathcal {K}\). This convergence is analyzed in Sect. 5.

4.1 Ergodic Sampling

The assumption that the samples \(x_1,\ldots ,x_M\) are drawn independently from the distribution \(\mu \) can be replaced by the assumption that \((T,\mathcal {M},\mu )\) is ergodic and the samples \(x_1,\ldots ,x_M\) are the iterates of the dynamical system starting from some initial condition \(x\in \mathcal {M}\), i.e., \(x_i = T^i(x)\). Provided that Assumption 1 holds, both Lemma 1 and Theorem 2 hold without change; the statement “with probability one” is now interpreted with respect to drawing the initial condition x from the distribution \(\mu \). The proofs follow exactly the same argument; only the strong law of large numbers is replaced by Birkhoff's ergodic theorem in Lemma 1.
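As a hedged illustration of the ergodic variant (the logistic map is our choice here, not an example from this paper): \(T(x)=4x(1-x)\) is ergodic with respect to the arcsine measure \(\mathrm{d}\mu = \mathrm{d}x/(\pi \sqrt{x(1-x)})\) on [0, 1], whose first two moments are 1/2 and 3/8, so Birkhoff averages along a single trajectory play the role of the iid sample averages above:

```python
import numpy as np

T = lambda x: 4.0 * x * (1.0 - x)   # logistic map, ergodic w.r.t. arcsine measure

x = 0.123                           # a generic initial condition
traj = np.empty(200000)
for k in range(traj.size):
    x = T(x)
    traj[k] = x

# Birkhoff averages approximate the moments int x dmu = 1/2, int x^2 dmu = 3/8
print(abs(traj.mean() - 0.5), abs((traj ** 2).mean() - 0.375))
```

With the trajectory in place of iid samples, the empirical Gram matrix and b vector of Lemma 1 are formed exactly as before.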

5 Convergence of \(\mathcal {K}_N\) to \(\mathcal {K}\)

In this section, we investigate convergence of \(\mathcal {K}_N\) to \(\mathcal {K}\) in the limit as N goes to infinity. Since the operator \(\mathcal {K}_N\) is defined on \(\mathcal {F}_N\) rather than on \(\mathcal {F}\), we extend it to all of \(\mathcal {F}\) by precomposing with \(P_N^\mu \), i.e., we study the convergence of \(\mathcal {K}_N P_N^\mu = P_N^\mu \mathcal {K}P_N^\mu {:}\,\mathcal {F}\rightarrow \mathcal {F}\) to \(\mathcal {K}{:}\,\mathcal {F}\rightarrow \mathcal {F}\) as \(N\rightarrow \infty \). Note that as far as the spectrum is concerned, precomposing with \(P_N^\mu \) just adds a zero to the spectrum.

To simplify notation, in what follows we denote the \(L_2(\mu )\) norm of a function f by

$$\begin{aligned} \Vert f\Vert := \Vert f\Vert _{L_2(\mu )} = \sqrt{\int _{\mathcal {M}} |f|^2\,\mathrm{d}\mu } \end{aligned}$$

and the usual \(L_2(\mu )\) inner product by

$$\begin{aligned} \langle f,g\rangle := \int _{\mathcal {M}} f\overline{g}\,\mathrm{d}\mu . \end{aligned}$$

5.1 Preliminaries

Before stating our results, we recall several concepts from functional analysis and operator theory.

Definition 1

(Bounded operator) An operator \(\mathcal {A}{:}\,\mathcal {F}\rightarrow \mathcal {F}\) defined on a Hilbert space \(\mathcal {F}\) is bounded if

$$\begin{aligned} \Vert \mathcal {A}\Vert := \sup \limits _{f \in \mathcal {F},\; \Vert f \Vert =1} \Vert \mathcal {A}f \Vert < \infty . \end{aligned}$$

The quantity \(\Vert \mathcal {A}\Vert \) is referred to as the norm of \(\mathcal {A}\).

Definition 2

(Convergence in strong operator topology) A sequence of bounded operators \(\mathcal {A}_i{:}\,\mathcal {F}\rightarrow \mathcal {F}\) defined on a Hilbert space \(\mathcal {F}\) converges strongly (or in the strong operator topology) to an operator \(\mathcal {A}{:}\,\mathcal {F}\rightarrow \mathcal {F}\) if

$$\begin{aligned} \lim _{i\rightarrow \infty } \Vert \mathcal {A}_i g - \mathcal {A}g\Vert = 0 \end{aligned}$$
(17)

for all \(g \in \mathcal {F}\).

Definition 3

(Weak convergence) A sequence of elements \(f_i \in \mathcal {F}\) of a Hilbert space \(\mathcal {F}\) converges weakly to \(f \in \mathcal {F}\), denoted \(f_i \xrightarrow {w}f\), if

$$\begin{aligned} \lim _{i\rightarrow \infty } \langle f_i, g\rangle = \langle f,g\rangle \end{aligned}$$
(18)

for all \(g \in \mathcal {F}\).

We emphasize that Definition 2 pertains to convergence of operators defined on \(\mathcal {F}\), whereas Definition 3 pertains to convergence of elements of \(\mathcal {F}\). We also remark that strong convergence \(f_i \rightarrow f\) (i.e., \(\Vert f_i - f\Vert \rightarrow 0\)) implies weak convergence, but not vice versa. Similarly, convergence in the strong operator topology implies convergence in the weak operator topology (i.e., \(\mathcal {A}_i g \xrightarrow {w}\mathcal {A}g\) for all g), but does not imply convergence in the operator norm (i.e., \(\Vert \mathcal {A}_i - \mathcal {A}\Vert \rightarrow 0\)).

In our setting of \(\mathcal {F}= L_2(\mu )\), the statements (17) and (18) translate to the requirements that, respectively,

$$\begin{aligned} \lim _{i\rightarrow \infty } \sqrt{\int _{\mathcal {M}} | \mathcal {A}_i g - \mathcal {A}g |^2\,\mathrm{d}\mu } = 0\qquad \mathrm {and}\qquad \lim _{i\rightarrow \infty } \int _{\mathcal {M}} f_i\overline{g} \,\mathrm{d}\mu = \int _{\mathcal {M}} f\overline{g} \,\mathrm{d}\mu \end{aligned}$$

for all \(g\in L_2(\mu )\).

For the remainder of this work, we invoke the following assumption:

Assumption 2

The following conditions hold:

  1. The Koopman operator \(\mathcal {K}{:}\,\mathcal {F}\rightarrow \mathcal {F}\) is bounded.

  2. The observables \(\psi _1,\ldots ,\psi _N\) defining \(\mathcal {F}_N\) are selected from a given orthonormal basis of \(\mathcal {F}\), i.e., \((\psi _i)_{i=1}^\infty \) is an orthonormal basis of \(\mathcal {F}\).

The first part of the assumption holds, for instance, when T is invertible and Lipschitz with a Lipschitz inverse and \(\mu \) is the Lebesgue measure on \(\mathcal {M}\) (or any measure absolutely continuous w.r.t. the Lebesgue measure with bounded and strictly positive density). The second part of the assumption is nonrestrictive, as any countable dense subset of \(\mathcal {F}\) can be turned into an orthonormal basis using the Gram–Schmidt process.
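For instance, orthonormalizing the monomials with respect to \(\mu = U[-1,1]\) yields the normalized Legendre polynomials; a small sketch (assuming NumPy; these choices are illustrative, and Gram–Schmidt is implemented here via the Cholesky factor of the exact Gram matrix):

```python
import numpy as np

N = 4
k = np.arange(1, N + 1)
S = k[:, None] + k[None, :]

# Gram matrix of psi_k(x) = x^(k-1) in L2(mu), mu = U[-1,1]:
# G[i,j] = int x^(i+j-2) dmu = 1/(i+j-1) if i+j is even, else 0
G = np.where(S % 2 == 0, 1.0 / (S - 1.0), 0.0)

# Gram-Schmidt in matrix form: with G = L L^H, the functions e = L^{-1} psi
# satisfy <e_i, e_j> = (L^{-1} G L^{-H})_{ij} = delta_{ij}
L = np.linalg.cholesky(G)
C = np.linalg.inv(L)                  # row i: coefficients of e_i in the psi_k

print(np.allclose(C @ G @ C.T, np.eye(N)))      # orthonormality
print(np.round(C[1], 6))                        # e_2 = sqrt(3) * x, as expected
```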

5.2 Convergence in Strong Operator Topology

In this section, we prove convergence in the strong operator topology (Definition 2) of \(\mathcal {K}_N P_N^\mu \) to \(\mathcal {K}\). First, we need the following immediate lemma:

Lemma 2

If \((\psi _i)_{i=1}^\infty \) forms an orthonormal basis of \(\mathcal {F}= L_2(\mu )\), then \(P_N^\mu \) converges strongly to the identity operator I and in addition \(\Vert I - P_N^\mu \Vert \le 1 \) for all N.

Proof

Let \(\phi = \sum _{i=1}^\infty c_i\psi _i\) with \(\Vert \phi \Vert = 1\). Then, by Parseval’s identity \(\sum _{i=1}^\infty |c_i|^2 = 1\) and

$$\begin{aligned} \Vert P_N^\mu \phi - \phi \Vert = \Bigg \Vert \sum _{i=N+1}^\infty c_i\psi _i \Bigg \Vert = \sqrt{\sum _{i=N+1}^\infty |c_i|^2} \rightarrow 0 \end{aligned}$$

with \(\sum _{i=N+1}^\infty |c_i|^2 \le 1\) for all N. \(\square \)

Now, we are ready to prove strong convergence of \(P_N^\mu \mathcal {K}P_N^\mu \) to \(\mathcal {K}\).

Theorem 3

If Assumption 2 holds, then the sequence of operators \(\mathcal {K}_NP_N^\mu =P_N^\mu \mathcal {K}P_N^\mu \) converges strongly to \(\mathcal {K}\) as \(N\rightarrow \infty \), i.e.,

$$\begin{aligned} \lim _{N\rightarrow \infty } \int _{\mathcal {M}} | \mathcal {K}_N P_N^\mu \phi - \mathcal {K}\phi |^2\,\mathrm{d}\mu = 0 \end{aligned}$$

for all \(\phi \in \mathcal {F}\).

Proof

Let \(\phi \in \mathcal {F}\) be given. Then, writing \(\phi =P_N^\mu \phi + (I -P_N^\mu )\phi \) we have

$$\begin{aligned} \Vert P_N^\mu \mathcal {K}P_N^\mu \phi - \mathcal {K}\phi \Vert&= \Vert \big (P_N^\mu -I\big )\mathcal {K}P_N^\mu \phi + \mathcal {K}\big (P_N^\mu -I\big )\phi \Vert \\&\le \Vert \big (P_N^\mu -I\big )\mathcal {K}P_N^\mu \phi \Vert + \Vert \mathcal {K}\Vert \Vert ( I - P_N^\mu )\phi \Vert \\&\le \Vert \big (P_N^\mu -I\big )\mathcal {K}\phi \Vert + \Vert \big (P_N^\mu -I\big )\Vert \Vert \mathcal {K}P_N^\mu \phi -\mathcal {K}\phi \Vert \\&\quad +\,\Vert \mathcal {K}\Vert \Vert \big ( I - P_N^\mu \big )\phi \Vert \rightarrow 0 \end{aligned}$$

by Lemma 2 and by the fact that \(\mathcal {K}P_N^\mu \phi \rightarrow \mathcal {K}\phi \) since \(\mathcal {K}\) is continuous by Assumption 2. \(\square \)

5.3 Weak Spectral Convergence

Unfortunately, strong convergence does not in general guarantee convergence of the spectra of the operators. This is guaranteed only if the operators converge in the operator norm. An important exception to this is the case of \(\mathcal {F}_N\) being an invariant subspace, i.e., \(\mathcal {K}f \in \mathcal {F}_N\) for all \(f\in \mathcal {F}_N\), in which case the spectra of \(\mathcal {K}_N\) and \(\mathcal {K}_{|\mathcal {F}_N}\) coincide. Here, however, we do not assume that \(\mathcal {F}_N\) is invariant and prove certain spectral convergence results in a weak sense. In particular, we prove convergence of the eigenvalues of \(\mathcal {K}_N\) along a subsequence and weak convergence of the associated eigenfunctions (see Definition 3), provided that the weak limit of the eigenfunctions is nonzero.

Theorem 4

If Assumption 2 holds and \(\lambda _N\) is a sequence of eigenvalues of \(\mathcal {K}_N\) with the associated normalized eigenfunctions \(\phi _N \in \mathcal {F}_N\), \(\Vert \phi _N \Vert =1\), then there exists a subsequence \((\lambda _{N_i},\phi _{N_i})\) such that

$$\begin{aligned} \lim _{i\rightarrow \infty }\lambda _{N_i} = \lambda ,\quad \phi _{N_i} \xrightarrow {w}\phi , \end{aligned}$$

where \(\lambda \in \mathbb {C}\) and \(\phi \in \mathcal {F}\) are such that \(\mathcal {K}\phi = \lambda \phi \). In particular, if \(\Vert \phi \Vert \ne 0\), then \(\lambda \) is an eigenvalue of \(\mathcal {K}\) with eigenfunction \(\phi \).

Proof

First, observe that since \(\mathcal {K}_N \phi _N = \lambda _N \phi _N\) with \(\phi _N \in \mathcal {F}_N\), we also have \(P_N^\mu \mathcal {K}P_N^\mu \phi _N = \lambda _N \phi _N \). Hence, \(|\lambda _N| \le \Vert P_N^\mu \mathcal {K}P_N^\mu \Vert \le \Vert \mathcal {K}\Vert < \infty \) by Assumption 2 and the fact that \(\Vert P_N^\mu \Vert \le 1 \). Therefore, the sequence \(\lambda _N\) is bounded. Since \(\phi _N\) is normalized and hence bounded, by weak sequential compactness of the unit ball of a Hilbert space [which follows from the Banach–Alaoglu theorem (Rudin 1973, Theorem 3.15) and the Eberlein–Šmulian theorem (Dunford and Schwartz 1971)], there exists a subsequence \((\lambda _{N_i},\phi _{N_i})\) such that \(\lambda _{N_i}\rightarrow \lambda \) and \(\phi _{N_i}\xrightarrow {w}\phi \).

It remains to prove that \((\lambda ,\phi )\) is an eigenvalue–eigenfunction pair of \(\mathcal {K}\). For ease of notation, set \(\lambda _{i} = \lambda _{N_i}\) and \(\phi _i = \phi _{N_i}\). Denote \(\hat{\mathcal {K}}_i = \mathcal {K}_{N_i} P_{N_i}^\mu = P_{N_i}^\mu \mathcal {K}P_{N_i}^\mu \) and observe that \({\hat{\mathcal {K}}}_i \phi _i = \lambda _i\phi _i\) for all i. Then, we have

$$\begin{aligned} \mathcal {K}\phi = {\hat{\mathcal {K}}}_i(\phi -\phi _i) + (\mathcal {K}-{\hat{\mathcal {K}}}_i)\phi + {\hat{\mathcal {K}}}_i\phi _i. \end{aligned}$$

Taking the inner product with an arbitrary \(f\in \mathcal {F}\) and using the fact that \({\hat{\mathcal {K}}}_i\phi _i = \lambda _i\phi _i\), we get

$$\begin{aligned} \langle \mathcal {K}\phi ,f \rangle = \langle {\hat{\mathcal {K}}}_i(\phi -\phi _i),f \rangle +\langle (\mathcal {K}-{\hat{\mathcal {K}}}_i)\phi ,f\rangle + \langle \lambda _i\phi _i ,f \rangle . \end{aligned}$$

Now, the second term on the right-hand side \(\langle (\mathcal {K}-{\hat{\mathcal {K}}}_i)\phi ,f\rangle \rightarrow 0\) since \({\hat{\mathcal {K}}}_i\) converges strongly to \(\mathcal {K}\) by Theorem 3. The last term \(\langle \lambda _i\phi _i,f \rangle \rightarrow \langle \lambda \phi ,f\rangle \) since \(\lambda _i\rightarrow \lambda \) and \(\phi _i \xrightarrow {w}\phi \). It remains to show that the first term converges to zero. We have

$$\begin{aligned} \big \langle {\hat{\mathcal {K}}}_i(\phi -\phi _i),f \big \rangle = \big \langle P_{N_i}^\mu \mathcal {K}P_{N_i}^\mu (\phi -\phi _i),f \big \rangle = \big \langle \mathcal {K}\big (P_{N_i}^\mu \phi -\phi _i\big ), P_{N_i}^\mu f \big \rangle , \end{aligned}$$

where we used the fact that \(P_{N_i}^\mu \) is self-adjoint and \(\phi _i \in \mathcal {F}_{N_i}\) and hence \(P_{N_i}^\mu \phi _i = \phi _i\). Denote \(h_i := \mathcal {K}(P_{N_i}^\mu \phi -\phi _i)\). We will show that \(h_i\xrightarrow {w}0\). Indeed, denoting \(\mathcal {K}^\star \) the adjoint of \(\mathcal {K}\), we have

$$\begin{aligned} \big \langle \mathcal {K}\big (P_{N_i}^\mu \phi -\phi _i\big ),f\big \rangle= & {} \big \langle \big (P_{N_i}^\mu \phi -\phi + \phi -\phi _i\big ),\mathcal {K}^\star f\big \rangle = \big \langle P_{N_i}^\mu \phi -\phi ,\,\mathcal {K}^\star f\big \rangle \\&+\,\big \langle \phi -\phi _i,\,\mathcal {K}^\star f\big \rangle \rightarrow 0, \end{aligned}$$

since \(P_{N_i}^\mu \) converges strongly to the identity (Lemma 2) and \(\phi _i \xrightarrow {w}\phi \). Finally, we show that \(\langle h_i, P_{N_i}^\mu f\rangle \rightarrow 0\). We have

$$\begin{aligned} \big \langle h_i, P_{N_i}^\mu f\big \rangle = \big \langle h_i, P_{N_i}^\mu f - f\big \rangle + \langle h_i, f\rangle . \end{aligned}$$

The second term goes to zero since \(h_i\xrightarrow {w}0\). For the first term, we have

$$\begin{aligned} \big \langle h_i, P_{N_i}^\mu f - f\big \rangle \le \Vert h_i\Vert \Vert P_{N_i}^\mu f - f \Vert \rightarrow 0 \end{aligned}$$

since \(P_{N_i}^\mu \) converges strongly to the identity operator (Lemma 2) and \(h_i\) is bounded since \(\mathcal {K}\) is bounded by Assumption 2, \(\Vert P_{N_i}^\mu \Vert \le 1\) and \(\Vert \phi _i \Vert \le 1\). Therefore, we conclude that

$$\begin{aligned} \langle \mathcal {K}\phi ,f \rangle = \lim _{i\rightarrow \infty } \langle \lambda _i \phi _i,f\rangle = \langle \lambda \phi ,f \rangle \end{aligned}$$

for all \(f \in \mathcal {F}\). Therefore, \(\mathcal {K}\phi = \lambda \phi \). \(\square \)

Example

As an example demonstrating the importance of the assumption that the weak limit of \(\phi _N\) is nonzero, consider \(\mathcal {M}= [0,1]\), \(T(x) = x\) and \(\mu \) the Lebesgue measure on [0, 1]. In this setting, the Koopman operator \(\mathcal {K}{:}\,L_2(\mu )\rightarrow L_2(\mu )\) is the identity operator, whose spectrum is the singleton \(\sigma (\mathcal {K})=\{1\}\). However, given any \(\lambda \in \mathbb {C}\) and the sequence of functions \(\phi _N = \sqrt{2}\sin (2\pi N x)\), we have

$$\begin{aligned} \mathcal {K}\phi _N - \lambda \phi _N = \phi _N - \lambda \phi _N = (1-\lambda )\sqrt{2}\sin (2\pi N x)\xrightarrow {w}0 \end{aligned}$$

with \(\Vert \phi _N\Vert ^2 = \int _{0}^1 2\sin ^2(2\pi N x)\,dx = 1\). Therefore, if \(\phi _N\) were the eigenfunctions of \(\mathcal {K}_N\) with eigenvalues \(\lambda _N\rightarrow \lambda \ne 1\), then the sequence \(\lambda _N\) would converge to a spurious eigenvalue \(\lambda \). Fortunately, in this case, we have \(\sigma (\mathcal {K}_N) = \{ 1\}\) and hence no spurious eigenvalues exist; however, in general, we cannot rule out this possibility, at least not as far as the statement of Theorem 4 goes. See Fig. 1 for illustration.

This example, with the highly oscillatory functions \(\phi _N\), may motivate practical considerations in detecting spurious eigenvalues, e.g., using Sobolev-type (pseudo)norms \(\int _{\mathcal {M}} \Vert \nabla \phi _N\Vert ^2\,\mathrm{d}\mu \) or other measures of oscillation. See, e.g., Giannakis (2016) for the use of Sobolev norms in the context of Koopman data analysis and forecasting.
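The behavior of these oscillatory functions can be checked numerically. The sketch below is not part of the paper's development; the test observable \(f = e^x\) and the grid resolution are arbitrary choices. It confirms that \(\Vert \phi _N\Vert _{L_2}\) stays at one, the weak pairings \(\langle \phi _N,f\rangle \) decay, and the Sobolev seminorm grows proportionally to N:

```python
import numpy as np

# phi_N(x) = sqrt(2) sin(2 pi N x) on [0, 1]; integrals with respect to
# the Lebesgue measure are approximated by averages over a uniform grid
x = np.linspace(0.0, 1.0, 200001)

def phi(N):
    return np.sqrt(2.0) * np.sin(2.0 * np.pi * N * x)

f = np.exp(x)  # an arbitrary smooth test observable (our choice)

norms, pairings, sobolevs = [], [], []
for N in (1, 10, 100):
    p = phi(N)
    norms.append(np.sqrt(np.mean(p**2)))        # L2 norm: stays ~ 1
    pairings.append(np.mean(p * f))             # weak pairing <phi_N, f>: decays
    sobolevs.append(np.sqrt(np.mean(np.gradient(p, x)**2)))  # grows ~ 2 pi N
```

The Sobolev seminorm grows without bound while the weak pairings vanish, which is exactly the contrast a practical spurious-eigenvalue filter of the kind mentioned above would exploit.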

As an immediate corollary of Theorem 4, we get:

Fig. 1

Graph of the functions \(\phi _N(x) = \sqrt{2}\sin (2\pi Nx)\) for \(N = 1\) and \(N=10\). These functions satisfy \(\Vert \phi _N\Vert _{L_2}=1\) and \(\phi _N\xrightarrow {w}0\) as \(N\rightarrow \infty \). If such \(\phi _N\) happen to be eigenfunctions of \(\mathcal {K}_N\) with eigenvalues \(\lambda _N\), then the accumulation points of the sequence \((\lambda _N)_{N=1}^\infty \) need not be eigenvalues of the Koopman operator

Corollary 1

If Assumption 2 holds and \(\lambda _{N,M}\) is a sequence of eigenvalues of \(\mathcal {K}_{N,M}\) with the associated normalized eigenfunctions \(\phi _{N,M} \in \mathcal {F}_N\), \(\Vert \phi _{N,M} \Vert =1\), then there exists a subsequence \((\lambda _{N_i,M_j},\phi _{N_i,M_j})\) such that with probability one

$$\begin{aligned} \lim _{i\rightarrow \infty }\lim _{j\rightarrow \infty }\lambda _{N_i,M_j} = \lambda ,\quad \lim _{i\rightarrow \infty }\lim _{j\rightarrow \infty }\langle \phi _{N_i,M_j},f\rangle = \langle \phi ,f\rangle , \end{aligned}$$

for all \(f \in \mathcal {F}\), where \(\lambda \in \mathbb {C}\) and \(\phi \in \mathcal {F}\) are such that \(\mathcal {K}\phi = \lambda \phi \). In particular, if \(\Vert \phi \Vert \ne 0\), then \(\lambda \) is an eigenvalue of \(\mathcal {K}\) with eigenfunction \(\phi \).

Proof

First, notice that since \(\mathcal {K}_{N,M} \rightarrow \mathcal {K}_N\) in the operator norm (Theorem 2) and \(\Vert \mathcal {K}_N\Vert \le \Vert \mathcal {K}\Vert < \infty \), the sequence \(\lambda _{N,M}\) is bounded. Since the \(\phi _{N,M}\) are normalized and hence bounded, we can extract a subsequence \((\lambda _{N,M_j},\phi _{N,M_j})\) such that \(\lim _{j\rightarrow \infty } \lambda _{N,M_j}=\lambda _N \in \mathbb {C}\) and \(\lim _{j\rightarrow \infty } \phi _{N,M_j} = \phi _N \in \mathcal {F}_N\) (strong convergence, as \(\mathcal {F}_N\) is finite-dimensional). Then,

$$\begin{aligned} \mathcal {K}_N\phi _N = (\mathcal {K}_N - \mathcal {K}_{N,M_j})\phi _N + \mathcal {K}_{N,M_j}(\phi _N-\phi _{N,M_j}) + \mathcal {K}_{N,M_j}\phi _{N,M_j}. \end{aligned}$$

Since \(\mathcal {K}_{N,M_j}\) converges strongly to \(\mathcal {K}_N\) with probability one (Theorem 2) and since \(\phi _{N,M_j}\) converges strongly to \(\phi _N\), the first two terms go to zero with probability one as j tends to infinity. The last term is equal to \(\lambda _{N,M_j} \phi _{N,M_j}\) and hence necessarily \(\mathcal {K}_N\phi _N = \lambda _N \phi _N\), \(\Vert \phi _N\Vert =1\), with probability one. The result then follows from Theorem 4. \(\square \)

6 Implications for Finite-Horizon Predictions

One of the main roles of an approximation to the Koopman operator is to provide a prediction of the evolution of a given observable. Whereas accurate predictions over an infinite-time horizon cannot in general be expected from the EDMD approximation of the Koopman operator, a prediction over any finite horizon is asymptotically exact, as \(N\rightarrow \infty \), when the prediction error is measured in the \(L_2(\mu )\) norm:

Theorem 5

Let \(f\in \mathcal {F}^n\) be a given (vector) observable and let Assumption 2 hold. Then, for any \(\varOmega \in \mathbb {N}\) we have

$$\begin{aligned} \lim _{N\rightarrow \infty } \sup _{i\in \{1,\ldots ,\varOmega \}} \Vert (\mathcal {K}_N)^i P_N^\mu f - \mathcal {K}^i f \Vert = 0. \end{aligned}$$
(19)

In particular, if \(f\in \mathcal {F}_{N_0}^n\) for some \(N_0 \in \mathbb {N}\), then

$$\begin{aligned} \lim _{N\rightarrow \infty } \sup _{i\in \{1,\ldots ,\varOmega \}} \Vert (\mathcal {K}_N)^i f - \mathcal {K}^i f \Vert = 0. \end{aligned}$$
(20)

Proof

We proceed by induction. Let \(f\in \mathcal {F}\). For \(\varOmega =1\), the result is exactly Theorem 3. Let the result hold for some \(\varOmega \in \mathbb {N}\). It is sufficient to prove that \(\Vert (\mathcal {K}_N)^{\varOmega +1} P_N^\mu f - \mathcal {K}^{\varOmega +1} f \Vert \rightarrow 0\) as \(N\rightarrow \infty \). We have

$$\begin{aligned} \Vert (\mathcal {K}_N)^{\varOmega +1} P_N^\mu f - \mathcal {K}^{\varOmega +1} f \Vert&= \Vert \mathcal {K}_N(\mathcal {K}_N)^{\varOmega } P_N^\mu f - \mathcal {K}\mathcal {K}^{\varOmega } f \Vert = \Vert \mathcal {K}_N g_N - \mathcal {K}g \Vert \\&\le \Vert \mathcal {K}_N g- \mathcal {K}g \Vert + \Vert \mathcal {K}_N(g_N - g) \Vert \\&\le \Vert \mathcal {K}_N g- \mathcal {K}g \Vert + \Vert \mathcal {K}\Vert \Vert g_N - g \Vert , \end{aligned}$$

where \(g_N = (\mathcal {K}_N)^{\varOmega } P_N^\mu f\) and \(g = \mathcal {K}^{\varOmega } f\); in the last inequality we used the fact that \(\Vert \mathcal {K}_N\Vert \le \Vert \mathcal {K}\Vert \). The term \(\Vert \mathcal {K}_N g- \mathcal {K}g \Vert \) tends to zero by Theorem 3, whereas the term \(\Vert g_N - g \Vert \rightarrow 0\) by the induction hypothesis. This proves (19) for a scalar observable \(f\in \mathcal {F}\). The general result with a vector-valued observable \(f\in \mathcal {F}^n\) follows by applying the same reasoning to each component of f. The result (20) follows from (19) since if \(f\in \mathcal {F}_{N_0}^n\) for some \(N_0\in \mathbb {N}\), then \(P_N^\mu f = f\) for \(N\ge N_0\). \(\square \)

To be more specific about the practical use of \(\mathcal {K}_N\) for prediction, assume that \(f\in \mathcal {F}_{N_0}^n\) for some \(N_0\in \mathbb {N}\). Then, for all \(N\ge N_0\) there exists a matrix \(C_N\in \mathbb {R}^{n \times N}\) such that \(f = C_N\varvec{\psi }_N\), where \(\varvec{\psi }_N = [\psi _1,\ldots ,\psi _N]^\top \) are the observables used in EDMD. Assume that an initial state \(x_0\) is given, that the values of the observables \(\varvec{\psi }_N(x_0)\) are known, and that we wish to predict the value of the observable f at the state \(x_i = T^i(x_0)\), i.e., i steps ahead in the future. Using \(\mathcal {K}_N\), this prediction is given by \(C_N A_N^i \varvec{\psi }_N(x_0)\), where

$$\begin{aligned} A_N = \lim _{M\rightarrow \infty } A_{N,M} \end{aligned}$$

with \(A_{N,M}\) defined in (5). Theorem 5 then says that

$$\begin{aligned} \lim _{N\rightarrow \infty } \int _{\mathcal {M}} \Vert C_N A_N^i \varvec{\psi }_N - f\circ T^i \Vert ^2_2\,\mathrm{d}\mu = 0\quad \forall \, i\in \mathbb {N}. \end{aligned}$$
(21)

A typical application of Theorem 5 is the prediction of the future state x of the dynamical system (1) with a finite-dimensional state space \(\mathcal {M}\subset \mathbb {R}^n\). In this case, one simply sets \(f(x) = x\). A crucial feature of the predictor obtained in this way is its linearity in the “lifted state” \(z=\varvec{\psi }_N(x)\), allowing linear tools to address a nonlinear problem. This concept was successfully applied to model predictive control in Korda and Mezić (2016) and to state estimation in Surana and Banaszuk (2016).
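The linear-predictor construction \(C_N A_N^i \varvec{\psi }_N(x_0)\) can be sketched in code. In the snippet below, the map \(x^+ = 2x^2-1\) (the example of Sect. 7), the Legendre dictionary of degree eight, the sample size, and the use of the sampled matrix \(A_{N,M}\) as a finite-M stand-in for \(A_N\) are all illustrative choices, not prescribed by the text. Since \(f\circ T^i\) is a polynomial of degree \(2^i\) and the dictionary contains polynomials up to degree eight, the prediction here is exact (up to rounding) for \(i\le 3\):

```python
import numpy as np

T = lambda x: 2.0 * x**2 - 1.0   # illustrative map (example of Sect. 7)
n_basis = 9                      # dictionary: Legendre polynomials P_0, ..., P_8

def psi(x):
    # dictionary psi_N evaluated column-wise: shape (n_basis, number of points)
    return np.polynomial.legendre.legvander(np.atleast_1d(x), n_basis - 1).T

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, 10000)   # samples from mu = uniform on [-1, 1]
PX, PY = psi(X), psi(T(X))
A = PY @ np.linalg.pinv(PX)         # sampled EDMD matrix A_{N,M}, proxy for A_N

C = np.zeros((1, n_basis)); C[0, 1] = 1.0  # f(x) = x = P_1(x), so f = C psi

x0, x_true, preds, trues = 0.3, 0.3, [], []
for i in range(1, 4):
    x_true = T(x_true)                       # true i-step iterate of T
    preds.append((C @ np.linalg.matrix_power(A, i) @ psi(x0)).item())
    trues.append(x_true)
```

The prediction is linear in the lifted state \(z = \varvec{\psi }_N(x_0)\); only the matrix power \(A^i\) changes with the horizon, which is what makes the predictor amenable to linear control and estimation tools.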

Remark 2

If \(A_{N,M}\) is used instead of \(A_N\) in Theorem 5 and Eq. (21), then the same convergence results hold with a double limit, first taking the number of samples M to infinity and then the number of basis functions N. In particular, we get

$$\begin{aligned} \lim _{N\rightarrow \infty } \lim _{M\rightarrow \infty } \int _{\mathcal {M}} \Vert C_N A_{N,M}^i \varvec{\psi }_N - f\circ T^i \Vert ^2_2\,\mathrm{d}\mu = 0\quad \forall \, i\in \mathbb {N}. \end{aligned}$$
(22)

7 Analytic EDMD

The results of the previous sections suggest a variation of the EDMD algorithm, provided that the mapping T is known in closed form and that the basis functions \(\psi _i\) are such that the integrals \(\int _{\mathcal {M}} \psi _i \overline{\psi _j}\,\mathrm{d}\mu \) and \(\int _{\mathcal {M}} (\psi _i\circ T)\overline{\psi _j}\,\mathrm{d}\mu \) can be computed analytically. This is the case in particular if T and \(\psi _i\) are simple functions such as multivariate polynomials or trigonometric functions and \(\mu \) is the uniform distribution over a simple domain \(\mathcal {M}\) such as a box or a ball, or, e.g., a Gaussian distribution over \(\mathbb {R}^n\).

Provided that such analytical evaluation is possible, one can circumvent the sampling step of EDMD and construct \(\mathcal {K}_N\) directly, rather than \(\mathcal {K}_{N,M}\). Indeed, define

$$\begin{aligned} A_N = M_{T,\mu } M_{\mu }^{-1}, \end{aligned}$$
(23)

where

$$\begin{aligned} M_{\mu } = \int _{\mathcal {M}} \varvec{\psi }\varvec{\psi }^{{\mathsf {H}}}\,\mathrm{d}\mu ,\quad M_{T,\mu } = \int _{ \mathcal {M}}(\varvec{\psi }\circ T)\varvec{\psi }^{{\mathsf {H}}}\,\mathrm{d}\mu . \end{aligned}$$

Then, the operator from \(\mathcal {F}_N\) to \(\mathcal {F}_N\) defined by \(c^{{\mathsf {H}}}\varvec{\psi }\mapsto c^{{\mathsf {H}}}A_N\varvec{\psi }\) is exactly \(\mathcal {K}_N = P_N^\mu \mathcal {K}_{|\mathcal {F}_N}\).

Theorem 6

If the matrix \(M_\mu \) is invertible, then for any \(\phi = c_\phi ^{{\mathsf {H}}}\varvec{\psi }\in \mathcal {F}_N\) we have

$$\begin{aligned} c_\phi ^{{\mathsf {H}}}A_N\varvec{\psi }= \mathcal {K}_N\phi . \end{aligned}$$

Proof

Given \(\phi = c_\phi ^{{\mathsf {H}}}\varvec{\psi }\), we get

$$\begin{aligned} \mathcal {K}_N\phi= & {} P_N^\mu \mathcal {K}\phi = \varvec{\psi }^{{\mathsf {H}}}\mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{c\in \mathbb {C}^N}\int _{\mathcal {M}}\vert c^{{\mathsf {H}}}\varvec{\psi }- c_\phi ^{{\mathsf {H}}}(\varvec{\psi }\circ T)\vert ^2\,\mathrm{d}\mu \\= & {} \varvec{\psi }^{{\mathsf {H}}}\mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{c\in \mathbb {C}^N}\big [ c^{{\mathsf {H}}}M_{\mu } c - 2{\mathrm{Re}}\big \{c^{{\mathsf {H}}}M_{T,\mu }^{{\mathsf {H}}}c_{\phi }\big \}\big ] \end{aligned}$$

with the unique minimizer \(c = M_{\mu }^{-1}M_{T,\mu }^{{\mathsf {H}}}c_\phi \) (since \(M_\mu \) is Hermitian positive semidefinite by construction and invertible by assumption, it is positive definite, and hence the minimized function is strictly convex). Therefore, as desired,

$$\begin{aligned} \mathcal {K}_N\phi = c^{{\mathsf {H}}}\varvec{\psi }= c_\phi ^{{\mathsf {H}}}M_{T,\mu } M_{\mu }^{-1}\varvec{\psi }= c_\phi ^{{\mathsf {H}}}A_N \varvec{\psi }. \end{aligned}$$

\(\square \)

Example

In order to demonstrate the use of Analytic EDMD, we compare the spectra of \(\mathcal {K}_N\) and \(\mathcal {K}_{N,M}\) for various values of M. The system considered is the logistic map

$$\begin{aligned} x^+ = 2x^2 - 1,\quad x \in [-1,1]. \end{aligned}$$

The measure \(\mu \) is taken to be the uniform distribution on \([-\,1,1]\), which is not invariant; hence, the dynamics is not measure preserving. The finite-dimensional subspace \(\mathcal {F}_N\) is the space of all polynomials of degree no more than eight. For numerical stability, we chose as a basis of this subspace the Legendre polynomials, scaled such that they are orthonormal with respect to the uniform measure on \([-\,1,1]\). The spectra of \(\mathcal {K}_N\) and \(\mathcal {K}_{N,M}\) for \(M=10^2\), \(M=10^3\) and \(M=10^5\) are depicted in Fig. 2. We observe that a relatively large number of samples M is required to obtain an accurate approximation of the spectrum of \(\mathcal {K}_{N}\). This example demonstrates that special care must be taken in practice when drawing conclusions about spectral quantities based on a computation with a small number of samples. On the other hand, Fig. 3 suggests that, at least in this example, predictions generated by \(\mathcal {K}_{N,M}\) are less affected by sampling. Indeed, even for \(M=100\) the prediction accuracy of \(\mathcal {K}_{N,M}\) is comparable to that of \(\mathcal {K}_N\), and for \(M=1000\) the two predictions almost coincide.
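A sketch of this construction is given below, with two assumptions of our own: Gauss–Legendre quadrature stands in for closed-form evaluation of the entries of \(M_\mu \) and \(M_{T,\mu }\) (for these polynomial integrands a 64-node rule is exact up to rounding), and the orthonormal polynomial basis is built explicitly from Legendre polynomials. The eigenvalue 1, corresponding to the constant observable, must appear in both spectra:

```python
import numpy as np

T = lambda x: 2.0 * x**2 - 1.0   # the map of the example
deg = 8

def psi(x):
    # Legendre polynomials scaled to be orthonormal with respect to the
    # uniform probability measure on [-1, 1]
    V = np.polynomial.legendre.legvander(np.atleast_1d(x), deg)
    return (V * np.sqrt(2.0 * np.arange(deg + 1) + 1.0)).T

# 64-node Gauss-Legendre quadrature integrates polynomials up to degree 127
# exactly, so M_mu and M_{T,mu} below are exact up to rounding -- a stand-in
# for closed-form evaluation of the integrals
nodes, w = np.polynomial.legendre.leggauss(64)
w = w / 2.0                      # weights for the uniform probability measure

P, PT = psi(nodes), psi(T(nodes))
M_mu = (P * w) @ P.T             # Gram matrix: ~ identity, by orthonormality
M_Tmu = (PT * w) @ P.T
A_N = M_Tmu @ np.linalg.inv(M_mu)

# sampled EDMD matrix A_{N,M}; its spectrum approaches that of A_N as M grows
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, 100000)
A_NM = psi(T(X)) @ np.linalg.pinv(psi(X))

eig_N = np.linalg.eigvals(A_N)
eig_NM = np.linalg.eigvals(A_NM)
```

Comparing the two eigenvalue sets for several sample sizes M reproduces the qualitative picture described above: the bulk of the sampled spectrum drifts toward that of \(A_N\) only slowly as M grows.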

Fig. 2

Comparison of the spectra of \(\mathcal {K}_N\) (blue circles) computed using Analytic EDMD and the spectra of \(\mathcal {K}_{N,M}\) for different values of M (red crosses)

Fig. 3

Comparison of predictions generated using \(\mathcal {K}_N\) and using \(\mathcal {K}_{N,M}\) for different values of M

8 Convergence of \(\mathcal {K}_{N,N}\)

In this section, we investigate what happens when the number of basis functions N and the number of samples M are increased simultaneously. We treat the special case \(M = N\), for which interesting conclusions can be drawn. Set therefore \(M = N\), denote by \(\lambda _N = \lambda _{N,N}\) any eigenvalue of \(\mathcal {K}_{N,N}\) and by \(\phi _N = \phi _{N,N} \in \mathcal {F}_N\), \(\Vert \phi _N\Vert _{C(\mathcal {M})} =1\), the associated eigenfunction, where \(\Vert \phi \Vert _{C(\mathcal {M})} = \sup _{x\in \mathcal {M}} |\phi (x)|\); such a normalization is possible if the basis functions \(\psi _i\) are continuous and \(\mathcal {M}\) compact, which we assume in this section. First notice that, assuming \(M_{{\hat{\mu }}_N}\) is invertible, for \(N=M\) the system of equations

$$\begin{aligned} \varvec{\psi }(\varvec{Y}) = A\varvec{\psi }(\varvec{X}) \end{aligned}$$

with the unknown \(A\in \mathbb {R}^{N\times N}\) has a solution and hence the minimum in the least-squares problem (4) is zero. In other words, for any \(f\in \mathcal {F}_N\), the EDMD operator \(\mathcal {K}_{N,N}\) applied to f matches the value of the Koopman operator applied to f at the sample points \(x_1,\ldots ,x_N\):

$$\begin{aligned} (\mathcal {K}f)(x_i) = (\mathcal {K}_{N,N}f)(x_i) \end{aligned}$$

for all \(f\in \mathcal {F}_N\). In particular, applying this relation to the eigenfunctions \(\phi _N\) of \(\mathcal {K}_{N,N}\) yields

$$\begin{aligned} (\phi _N\circ T)(x_i) = \lambda _N \phi _N(x_i). \end{aligned}$$
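This interpolation property is easy to verify numerically. In the sketch below, the map, the Legendre dictionary and the Chebyshev-node sample set are our own illustrative choices (any N points making \(\varvec{\psi }(\varvec{X})\) invertible would do; Chebyshev nodes merely keep the matrix well conditioned). The residual of \((\mathcal {K}f)(x_i) - (\mathcal {K}_{N,N}f)(x_i)\) over the sample points vanishes up to rounding:

```python
import numpy as np

T = lambda x: 2.0 * x**2 - 1.0   # illustrative map (as in the example of Sect. 7)
N = 9

def psi(x):
    # dictionary evaluated column-wise: shape (N, number of points)
    return np.polynomial.legendre.legvander(np.atleast_1d(x), N - 1).T

# exactly N sample points, chosen as Chebyshev nodes for good conditioning
X = np.cos((2.0 * np.arange(N) + 1.0) * np.pi / (2.0 * N))
PX, PY = psi(X), psi(T(X))
A = PY @ np.linalg.inv(PX)       # psi(Y) = A psi(X) solved exactly, M = N

c = np.arange(1.0, N + 1.0)      # some f = c^T psi in F_N (arbitrary coefficients)
Kf_at_X = c @ psi(T(X))          # (K f)(x_i) = f(T(x_i))
KNNf_at_X = c @ A @ psi(X)       # (K_{N,N} f)(x_i)
residual = float(np.max(np.abs(Kf_at_X - KNNf_at_X)))
```

Note that the match holds only at the sample points; off the samples, \(\mathcal {K}_{N,N}f\) and \(\mathcal {K}f\) generally differ, which is what the functional-analytic argument that follows must contend with.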

Multiplying the eigenfunction relation \((\phi _N\circ T)(x_i) = \lambda _N \phi _N(x_i)\) by \(h(x_i)\) for an arbitrary \(h\in C(\mathcal {M})\) and integrating with respect to the empirical measure (8) supported on the sample points, we get

$$\begin{aligned} \int _{\mathcal {M}} h\cdot (\phi _{N}\circ T)\, \mathrm{d}{\hat{\mu }}_{N} = \lambda _{N} \int _{\mathcal {M}} h \phi _{N} \, \mathrm{d}{\hat{\mu }}_{N}. \end{aligned}$$
(24)

Define the linear functional \(L_{N}{:}\,C(\mathcal {M})\rightarrow \mathbb {C}\) by

$$\begin{aligned} L_N(h) = \int _{\mathcal {M}} h\, \phi _{N}\, \mathrm{d}\hat{\mu }_{N}, \end{aligned}$$

and

$$\begin{aligned} (\mathcal {K}L_N)(h) = \int _{\mathcal {M}} h\cdot (\phi _{N}\circ T)\, \mathrm{d}{\hat{\mu }}_{N}. \end{aligned}$$

With this notation, the relationship (24) becomes

$$\begin{aligned} \mathcal {K}L_N = \lambda _{N} L_N. \end{aligned}$$

Since \(\Vert \phi _{N}\Vert _{C(\mathcal {M})} = 1\), we have \(\Vert L_N \Vert = \sup _{h\in C(\mathcal {M})} \frac{|L_N(h)|}{\Vert h \Vert _{C(\mathcal {M})}} \le 1\) and \(\Vert \mathcal {K}L_N \Vert \le 1\). Therefore, assuming separability of \(C(\mathcal {M})\), by the Banach–Alaoglu theorem (e.g., Rudin 1973, Theorems 3.15, 3.17) there exists a subsequence along which these functionals converge in the weak\(^\star \) topology to some functionals \(L \in C(\mathcal {M})^\star \) and \(\mathcal {K}L\in C(\mathcal {M})^\star \) satisfying

$$\begin{aligned} \mathcal {K}L = \lambda L, \end{aligned}$$

where \(\lambda \) is an accumulation point of \(\lambda _N\). Furthermore, by the Riesz representation theorem the bounded linear functionals L and \(\mathcal {K}L\) can be represented by complex-valued measures \(\nu \) and \(\mathcal {K}\nu \) on \(\mathcal {M}\) satisfying

$$\begin{aligned} \mathcal {K}\nu = \lambda \nu . \end{aligned}$$

We remark that \(\mathcal {K}L\) and \(\mathcal {K}\nu \) are here merely symbols for the weak\(^\star \) limit of \(\mathcal {K}L_N\) and its representation as a measure; in particular, the functional \(\mathcal {K}L\) is not necessarily of the form \((\mathcal {K}L)(h)=\int _{\mathcal {M}} h\cdot (\rho \circ T)\,\mathrm{d}\mu \) for some function \(\rho \).

To gain more insight into \(\mathcal {K}\nu \) (and hence \(\mathcal {K}L\)), we need to impose additional assumptions on the structure of the problem. In particular, we assume that the mapping \(T{:}\,\mathcal {M}\rightarrow \mathcal {M}\) is a homeomorphism and that the points \(x_1,\ldots ,x_N\) lie on a single trajectory, i.e., \(x_{i+1} = T(x_i)\). With these assumptions, Eq. (24) reads

$$\begin{aligned} \frac{1}{N} \sum _{i=1}^N h(x_i)\phi _N(x_{i+1}) = \lambda _{N} \frac{1}{N} \sum _{i=1}^N h(x_i)\phi _N(x_i), \end{aligned}$$
(25)

where we set \(x_{N+1}:= T(x_N)\). The left-hand side of (25) is

$$\begin{aligned} \frac{1}{N} \sum _{i=1}^N h(x_i)\phi _N(x_{i+1})&= \frac{1}{N} \sum _{i=1}^N h(T^{-1} x_i)\phi _N(x_i) + \frac{1}{N}\big [h(x_N)\phi _N(x_{N+1})\nonumber \\&\quad -\,h(T^{-1}x_1)\phi _N(x_1)\big ]\nonumber \\&= \int _{\mathcal {M}} h\circ T^{-1} \, \mathrm{d}\nu _N + \frac{1}{N}\big [h(x_N)\phi _N(x_{N+1}) - h(T^{-1}x_1)\phi _N(x_1)\big ], \end{aligned}$$
(26)

where \(\nu _N\) is the measure \(\phi _N \mathrm{d}{\hat{\mu }}_N\). Setting \(\xi _N:=h(x_N)\phi _N(x_{N+1}) - h(T^{-1}x_1)\phi _N(x_1)\), the relation (25) becomes

$$\begin{aligned} \int _{\mathcal {M}} h\circ T^{-1} \, \mathrm{d}\nu _N + \frac{1}{N}\xi _N = \lambda _N\int _{\mathcal {M}} h\,\mathrm{d}\nu _N. \end{aligned}$$

Since h is bounded on \(\mathcal {M}\) (h is continuous and \(\mathcal {M}\) compact) and \(\Vert \phi _N\Vert _{C(\mathcal {M})} = 1\), the term \(\xi _N\) is bounded; in addition, \(h\circ T^{-1}\) is continuous since T is a homeomorphism by assumption. Therefore, taking the limit on both sides along a subsequence such that \(\nu _{N_i}\rightarrow \nu \) weakly, \({\hat{\mu }}_{N_i}\rightarrow \mu \) weakly and \(\lambda _{N_i}\rightarrow \lambda \), we obtain

$$\begin{aligned} \int _{\mathcal {M}} h\circ T^{-1} \, \mathrm{d}\nu = \lambda \int _{\mathcal {M}} h\,\mathrm{d}\nu \end{aligned}$$
(27)

for all \(h\in C(\mathcal {M})\).

8.1 Weak Eigenfunctions/Eigendistributions

To understand relation (27), note that a completely analogous computation to (26) shows that the measure \(\mu \) is invariant and therefore the \(L_2(\mu )\)-adjoint \(\mathcal {K}^\star \) of the Koopman operator [viewed as an operator from \(L_2(\mu ) \) to \(L_2(\mu )\)] is given by

$$\begin{aligned} \mathcal {K}^\star f = f\circ T^{-1}. \end{aligned}$$

To see this, write

$$\begin{aligned} \langle \mathcal {K}f,g \rangle= & {} \int _{\mathcal {M}} (f\circ T) \overline{g} \,\mathrm{d}\mu = \int _{\mathcal {M}} (f\circ T)\overline{(g\circ T^{-1}\circ T)} \,\mathrm{d}\mu \\= & {} \int _{\mathcal {M}} f \cdot \overline{(g\circ T^{-1} )}\,\mathrm{d}\mu = \langle f, \mathcal {K}^\star g\rangle , \end{aligned}$$

which means the operator \(g\mapsto g\circ T^{-1}\) is indeed the \(L_2(\mu )\)-adjoint of \(\mathcal {K}\). The relation (27) then becomes

$$\begin{aligned} \int _{\mathcal {M}} \mathcal {K}^\star h \, \mathrm{d}\nu = \lambda \int _{\mathcal {M}} h\,\mathrm{d}\nu \end{aligned}$$
(28)

or

$$\begin{aligned} L(\mathcal {K}^\star h) = \lambda L(h). \end{aligned}$$
(29)

Functionals of the form (29) were called “generalized eigenfunctions” by Gelfand and Shilov (1964); here, we prefer to call them “weak eigenfunctions” or “eigendistributions” in order to avoid confusion with generalized eigenfunctions viewed as an extension of the notion of generalized eigenvectors from linear algebra. The measure \(\nu \) in (28) is then called “eigenmeasure.” Here, again, we emphasize the requirement that the limiting functional L (or the measure \(\nu \)) be nonzero in order for these objects to be called eigenfunctionals/eigenmeasures.
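The adjoint formula \(\mathcal {K}^\star h = h\circ T^{-1}\) underlying (28) can be sanity checked on a simple measure-preserving homeomorphism. The circle rotation below is our own illustrative choice (it is not the setting of the rest of this section), and the \(L_2\) pairings are approximated by midpoint sums:

```python
import numpy as np

alpha = np.sqrt(2.0) - 1.0            # irrational rotation number
T = lambda x: (x + alpha) % 1.0       # measure-preserving homeomorphism of the circle
Tinv = lambda x: (x - alpha) % 1.0    # its inverse

x = (np.arange(200000) + 0.5) / 200000   # midpoint grid on [0, 1)
f = lambda t: np.sin(2.0 * np.pi * t) + t**2         # arbitrary test observables
g = lambda t: np.cos(2.0 * np.pi * t) * np.exp(-t)

lhs = np.mean(f(T(x)) * g(x))         # <K f, g>
rhs = np.mean(f(x) * g(Tinv(x)))      # <f, g o T^{-1}> = <f, K* g>
gap = abs(lhs - rhs)                  # should vanish up to quadrature error
```

The two pairings agree up to the quadrature error of the midpoint sums, consistent with the change-of-variables argument in the text.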

8.2 Eigenmeasures of Perron–Frobenius

We also observe an interesting connection to eigenmeasures of the Perron–Frobenius operator. To see this, set \(h:= g\circ T\) in (27) to obtain \( \int _{\mathcal {M}}g\,\mathrm{d}\nu = \lambda \int _{\mathcal {M}} g\circ T\,\mathrm{d}\nu \) or, provided that \(\lambda \ne 0\),

$$\begin{aligned} \int _{\mathcal {M}} g\circ T\,\mathrm{d}\nu = \frac{1}{\lambda } \int _{\mathcal {M}}g\,\mathrm{d}\nu . \end{aligned}$$
(30)

In other words, if nonzero, the measure \(\nu \) is an eigenmeasure of the Perron–Frobenius operator with eigenvalue \(1/\lambda \). Here, the Perron–Frobenius operator \(\mathcal {P}{:}\,M(\mathcal {M})\rightarrow M(\mathcal {M})\), where \(M(\mathcal {M})\) is the space of all complex-valued measures on \(\mathcal {M}\), is defined for every \(\eta \in M(\mathcal {M})\) and every Borel set A by

$$\begin{aligned} (\mathcal {P}\eta )(A) = \eta (T^{-1}(A)). \end{aligned}$$

The results of Sect. 8 are summarized in the following theorem:

Theorem 7

Suppose that \(\mathcal {M}\) is a compact metric space, T is a homeomorphism, \(\mathcal {K}{:}\,L_2(\mu )\rightarrow L_2(\mu )\) is bounded, the observables \(\psi _1,\ldots ,\psi _N\) are continuous and the sample points \(x_1,\ldots , x_N\) satisfy \(x_{i+1} = T(x_i)\) for all \(i\in \{1,\ldots ,N-1\}\). Let \(\lambda _N\) be a bounded sequence of eigenvalues of \(\mathcal {K}_{N,N}\), let \(\phi _N\), \(\Vert \phi _N\Vert _{C(\mathcal {M})}=1\), be the associated normalized eigenfunctions and denote \(\nu _N = \phi _N \mathrm{d}{\hat{\mu }}_N\). Then, there exists a subsequence \((N_i)_{i=1}^\infty \) such that \(\nu _{N_i}\) and \({\hat{\mu }}_{N_i}\) converge weakly to complex-valued measures \(\nu \in M(\mathcal {M})\) and \(\mu \in M(\mathcal {M})\), respectively, and \(\lim _{i\rightarrow \infty } \lambda _{N_i } = \lambda \in \mathbb {C}\), such that

$$\begin{aligned} \int _{\mathcal {M}} h\circ T^{-1} \, \mathrm{d}\nu = \lambda \int _{\mathcal {M}} h\,\mathrm{d}\nu \quad \forall \,h\in C(\mathcal {M}). \end{aligned}$$

In addition, the measure \(\mu \) is invariant under the action of T and

$$\begin{aligned} \int _{\mathcal {M}} \mathcal {K}^\star h \, \mathrm{d}\nu = \lambda \int _{\mathcal {M}} h\,\mathrm{d}\nu \quad \forall \,h\in C(\mathcal {M}), \end{aligned}$$

where \(\mathcal {K}^\star \) is the \(L_2(\mu )\) adjoint of \(\mathcal {K}\), i.e., if nonzero, \(\nu \) is a weak eigenfunction (or eigendistribution) of the Koopman operator. Furthermore, if \(\lambda \ne 0\), then

$$\begin{aligned} \int _{\mathcal {M}} h\circ T\,\mathrm{d}\nu = \frac{1}{\lambda } \int _{\mathcal {M}}h\,\mathrm{d}\nu \quad \forall \,h\in C(\mathcal {M}), \end{aligned}$$

i.e., if nonzero, \(\nu \) is an eigenmeasure of the Perron–Frobenius operator with eigenvalue \(1/\lambda \).

9 Conclusions

This paper analyzes the convergence of the EDMD operator \(\mathcal {K}_{N,M}\), where M is the number of samples and N the number of observables used in EDMD. It was proven in Klus et al. (2016) that as \(M\rightarrow \infty \), the operator \(\mathcal {K}_{N,M}\) converges to \(\mathcal {K}_N\), the orthogonal projection of the action of the Koopman operator onto the span of the observables used in EDMD. We analyzed the convergence of \(\mathcal {K}_N\) as \(N\rightarrow \infty \), obtaining convergence in the strong operator topology to the Koopman operator, as well as weak convergence of the associated eigenfunctions along a subsequence together with the associated eigenvalues. In particular, any accumulation point of the spectra of \(\mathcal {K}_N\) corresponding to a nonzero weak accumulation point of the eigenfunctions lies in the point spectrum of the Koopman operator \(\mathcal {K}\). In addition, we proved convergence of finite-horizon predictions obtained using \(\mathcal {K}_N\) in the \(L_2\) norm, a result important for practical applications such as forecasting, estimation and control. Finally, we analyzed the convergence of \(\mathcal {K}_{N,N}\) (i.e., the situation where the number of samples and the number of basis functions are equal) under the assumption that the sample points lie on a single trajectory. In this case, one obtains convergence, along a subsequence, to a weak eigenfunction (or eigendistribution) of the Koopman operator, provided that the weak limit is nonzero. This eigendistribution turns out to be also an eigenmeasure of the Perron–Frobenius operator. As a by-product of these results, we proposed an algorithm that, under some assumptions, allows one to construct \(\mathcal {K}_N\) directly, without the need for sampling, thereby eliminating the sampling error.

Future work should focus on non-asymptotic analysis, e.g., on selecting the subspace \(\mathcal {F}_N\) such that \(\Vert \mathcal {K}_{N} - \mathcal {K}_{|\mathcal {F}_N} \Vert \) is minimized and at the same time such that \(\mathcal {F}_N\) is rich enough in the sense of containing observables of practical interest (e.g., the state observable). This line of research was already investigated in the context of stochastic systems in Wu and Noé (2017), providing an interesting and actionable method for selecting \(\mathcal {F}_N\).