1 Introduction

Recently, there has been growing interest in the spectral expansion-based methodology that has enabled progress in data-driven analysis of high-dimensional nonlinear systems. The research was initiated in Mezić (2005) and Mezić and Banaszuk (2004), using the composition (Koopman) operator representation originally defined in Koopman (1931). The framework is being used for model reduction, identification, prediction, data assimilation and control of deterministic (e.g., Budisić et al. 2012; Mauroy and Goncalves 2017; Rowley et al. 2009; Williams et al. 2015; Brunton et al. 2016b; Giannakis et al. 2015; Korda and Mezić 2016) as well as stochastic dynamical systems (e.g., Takeishi et al. 2017; Riseth and Taylor-King 2017; Wu and Noé 2017). This has propelled the theory to wide use in a diverse set of applications such as fluid dynamics (Sharma et al. 2016; Glaz et al. 2016), power grid dynamics (Raak et al. 2016), neurodynamics (Brunton et al. 2016a), energy efficiency (Georgescu and Mezić 2015), molecular dynamics (Wu et al. 2017) and data fusion (Williams et al. 2015).

Numerical methods for approximation of the spectral properties of the Koopman operator have been considered since the inception of the data-driven analysis of dissipative dynamical systems (Mezić and Banaszuk 2004; Mezić 2005). These belong to the class of generalized Laplace analysis (GLA) methods (Mezić 2013). An alternative line of algorithms, called the dynamic mode decomposition (DMD) algorithm (Schmid 2010; Rowley et al. 2009), has also been advanced, enabling concurrent data-driven determination of approximate eigenvalues and eigenvectors of the underlying DMD operator. Examples of DMD-type algorithms include (1) the companion matrix method proposed by Rowley et al. (2009), (2) the SVD-based DMD developed by Schmid (2010), (3) the Exact DMD method introduced by Tu et al. (2014) and (4) the Extended DMD (EDMD) of Williams et al. (2015). The relationship between these methods and the spectral properties of the Koopman operator was first noticed in Rowley et al. (2009), based on the spectral expansion developed in Mezić (2005). However, rigorous results in this direction are sparse. Williams et al. (2015) provided a result, the corollary of which is that the spectrum of the EDMD approximation is contained in the spectrum of the Koopman operator, provided the observables belong to a finite-dimensional invariant subspace of the Koopman operator and the data matrix is of the same rank. The work of Arbabi and Mezić (2016) suggested that an alternative assumption to the finite rank is that the number of sampling points M goes to infinity, even though the results of Arbabi and Mezić (2016) still implicitly rely on a finite-dimensional assumption.

The work (Klus et al. 2016) showed that, assuming either independent identically distributed (iid) or ergodic sampling from a measure \(\mu \), the EDMD operator on N observables constructed using M samples, \(\mathcal {K}_{N,M}\), converges as \(M\rightarrow \infty \) to \(\mathcal {K}_N\), where \(\mathcal {K}_N\) is the \(L_2(\mu )\)-orthogonal projection of the action of the Koopman operator on the finite-dimensional subspace of observables. In this work, we show that \(\mathcal {K}_N\) converges to \(\mathcal {K}\) in the strong operator topology. As a result, predictions of a given observable obtained using \(\mathcal {K}_N\) or \(\mathcal {K}_{N,M}\) over any finite prediction horizon converge in the \(L_2(\mu )\) norm (as N or N and M tend to infinity) to its true values. In addition, we show that, as \(N\rightarrow \infty \), accumulation points of the spectra of \(\mathcal {K}_N\) correspond to eigenvalues of the Koopman operator and the associated eigenfunctions converge weakly to an eigenfunction of the Koopman operator, provided that the weak limit of the eigenfunctions is nonzero. The results hold in a very general setting with minimal underlying assumptions. In particular, we do not assume that the finite-dimensional subspace of observables is invariant under the action of the Koopman operator or that the dynamics is measure preserving.

As a by-product of our results, we propose an analytic version of the EDMD algorithm which allows one to construct \(\mathcal {K}_N\) directly, without the need for sampling, under the assumption that the transition mapping of the dynamical system is known analytically and the N-dimensional subspace of observables is such that the integrals of the products of the observables precomposed with the transition mapping can be evaluated in closed form. This method is not immediately useful for large-scale data-driven applications that the EDMD was originally designed for, but it may prove useful in control applications (e.g., Korda and Mezić 2016), where the model is often known, or for numerical studies of Koopman operator approximations on classical examples, eliminating the sampling error in both cases.
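To make this concrete, here is a minimal sketch (in Python with NumPy; the map, measure and basis are hypothetical choices for illustration, not taken from this paper): for \(T(x) = x^2\) on \([0,1]\) with \(\mu \) the Lebesgue measure and monomial basis \(\psi _k(x) = x^{k-1}\), both \(\int \psi _i\psi _l\,\mathrm{d}\mu = 1/(i+l-1)\) and \(\int \psi _i(\psi _j\circ T)\,\mathrm{d}\mu = 1/(i+2j-2)\) are available in closed form, so a matrix representation of \(\mathcal {K}_N\) can be assembled without any sampling:

```python
import numpy as np

N = 4
i = np.arange(1, N + 1)

# Gram matrix of the monomial basis on [0,1] (a Hilbert matrix):
# M_mu[i,l] = int_0^1 x^(i-1) x^(l-1) dx = 1/(i+l-1)
M_mu = 1.0 / (i[:, None] + i[None, :] - 1)

# B[i,j] = int_0^1 x^(i-1) (x^2)^(j-1) dx = 1/(i + 2j - 2)
B = 1.0 / (i[:, None] + 2 * i[None, :] - 2)

# column j of C holds the L2(mu)-projection coefficients of psi_j o T
# onto F_N, i.e., the matrix of K_N in the monomial basis
C = np.linalg.solve(M_mu, B)

# sanity check: psi_1 o T = 1 = psi_1 and psi_2 o T = x^2 = psi_3 already
# lie in F_N, so the first two columns are exact coordinate vectors
print(np.round(C[:, 0], 6), np.round(C[:, 1], 6))
```

The third and fourth columns involve \(x^4\) and \(x^6\), which do not lie in \(\mathcal {F}_4\), so there the columns contain genuine (nontrivial) projection coefficients.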

Finally, we analyze convergence of \(\mathcal {K}_{N,N}\), i.e., the situation where the number of samples M and the number of observables N are equal. Under the additional assumptions that the sample points lie on the same trajectory and the mapping T is a homeomorphism, we prove convergence along a subsequence to weak eigenfunctions (or eigendistributions in the sense of Gelfand and Shilov 1964) of the Koopman operator, which also turn out to be eigenmeasures of the Perron–Frobenius operator.

The paper is organized as follows: In Sect. 2, we introduce the setting of EDMD. In Sect. 3, we show that EDMD is an orthogonal projection of the action of the Koopman operator on a finite-dimensional subspace of observables with respect to the empirical measure supported on the sample points drawn from a measure \(\mu \). In Sect. 4, we show that this projection converges to the \(L_2(\mu )\)-projection of the action of the Koopman operator. In Sect. 5, we analyze the convergence of the EDMD approximations as the dimension of the subspace N goes to infinity, showing convergence in the strong operator topology and convergence of the eigenvalues along a subsequence, together with weak convergence of the associated eigenfunctions. In Sect. 6, we show convergence of finite-horizon predictions. Section 7 describes the analytic construction of \(\mathcal {K}_N\). Section 8 contains results for the case when \(M=N\), where only convergence to weak eigenfunctions, along a subsequence, is proven. We conclude in Sect. 9.

1.1 Notation

The spaces of real and complex numbers are denoted by \(\mathbb {R}\) and \(\mathbb {C}\) with \(\mathbb {R}^{n\times k}\) and \(\mathbb {C}^{n\times k}\) denoting the corresponding real and complex \(n\times k\) matrices; the real and complex column vectors are denoted by \(\mathbb {R}^n := \mathbb {R}^{n\times 1}\) and \(\mathbb {C}^n := \mathbb {C}^{n\times 1}\). The complex conjugate of \(a\in \mathbb {C}\) is denoted by \(\overline{a}\). Given a matrix \(A\in \mathbb {C}^{n\times k}\), \(A^\top \) denotes its transpose and \(A^{{\mathsf {H}}}\) denotes its Hermitian transpose (i.e., \(A_{i,j}^\top = A_{j,i}\) and \(A^{{\mathsf {H}}}_{i,j} = \overline{A_{j,i}}\)). The Moore–Penrose pseudoinverse of a matrix A is denoted by \(A^\dagger \). The Frobenius norm of a matrix A is denoted by \(\Vert A \Vert _F = \sqrt{ \sum _{i,j}{ |A_{i,j}|^2 }}\). Given a vector \(c\in \mathbb {C}^n\), the symbol \(\Vert c\Vert _2 := \sqrt{\sum _{i}|c_i|^2}\) denotes its Euclidean norm.

2 Extended Dynamic Mode Decomposition

We consider a discrete time dynamical system

$$\begin{aligned} x^+ = T(x) \end{aligned}$$
(1)

with \(T{:}\,\mathcal {M}\rightarrow \mathcal {M}\), where \(\mathcal {M}\) is a topological space, and we assume that we are given snapshots of data

$$\begin{aligned} \varvec{X} = [x_1,\ldots , x_M],\quad \varvec{Y} = [y_1,\ldots , y_M] \end{aligned}$$
(2)

satisfying \(y_i = T(x_i)\). We do not assume that the data points lie on a single trajectory of (1).

Given a vector space \(\mathcal {F}\) of observables \(\psi {:}\,\mathcal {M}\rightarrow \mathbb {C}\) such that \(\psi \circ T \in \mathcal {F}\) for every \(\psi \in \mathcal {F}\), we define the Koopman operator \(\mathcal {K}{:}\,\mathcal {F}\rightarrow \mathcal {F}\) by

$$\begin{aligned} \mathcal {K}\psi = \psi \circ T, \end{aligned}$$

where \(\circ \) denotes the pointwise function composition. Given a set of linearly independent basis functions \(\psi _i \in \mathcal {F}\), \(i=1,\ldots ,N\), and defining

$$\begin{aligned} \mathcal {F}_N := \mathrm {span}\left\{ \psi _1,\ldots ,\psi _N\right\} , \end{aligned}$$
(3)

the EDMD constructs a finite-dimensional approximation \(\mathcal {K}_{N,M}{:}\,\mathcal {F}_N\rightarrow \mathcal {F}_N\) of the Koopman operator by solving the least-squares problem

$$\begin{aligned} \min _{A\in \mathbb {C}^{N\times N}} \Vert A\varvec{\psi }(\varvec{X}) - \varvec{\psi }(\varvec{Y}) \Vert _F^2 = \min _{A\in \mathbb {C}^{N\times N}} \sum _{i=1}^M \Vert A\varvec{\psi }(x_i) - \varvec{\psi }(y_i) \Vert _2^2, \end{aligned}$$
(4)

where

$$\begin{aligned} \varvec{\psi }(\varvec{X}) = [\varvec{\psi }(x_1),\ldots , \varvec{\psi }(x_M)],\quad \varvec{\psi }(\varvec{Y}) = [\varvec{\psi }(y_1),\ldots , \varvec{\psi }(y_M)] \end{aligned}$$

and

$$\begin{aligned} \varvec{\psi }(x) = [\psi _1(x),\ldots ,\psi _N(x)]^\top . \end{aligned}$$

Denoting

$$\begin{aligned} A_{N,M} = \varvec{\psi }(\varvec{Y})\varvec{\psi }(\varvec{X})^\dagger , \end{aligned}$$
(5)

a solution to (4), the finite-dimensional approximation of the Koopman operator

$$\begin{aligned} \mathcal {K}_{N,M}{:}\,\mathcal {F}_N \rightarrow \mathcal {F}_N \end{aligned}$$

is then defined by

$$\begin{aligned} \mathcal {K}_{N,M} \phi = c_\phi ^{{\mathsf {H}}}A_{N,M}\varvec{\psi }\end{aligned}$$
(6)

for any \(\phi = c_\phi ^{{\mathsf {H}}}\varvec{\psi }\), \(c_\phi \in \mathbb {C}^N\) (i.e., for any \(\phi \in \mathcal {F}_N\)). The operator \(\mathcal {K}_{N,M}\) will be referred to as the EDMD operator.
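The construction (4)–(6) can be sketched in a few lines; a minimal illustration assuming NumPy, with a hypothetical linear map and monomial dictionary (chosen so that \(\mathcal {F}_N\) is Koopman-invariant and the result is easy to verify):

```python
import numpy as np

rng = np.random.default_rng(0)
T = lambda x: 0.9 * x                     # hypothetical dynamics x+ = 0.9 x

def psi(x, N=4):
    # dictionary psi_k(x) = x^(k-1), stacked as an N x M matrix
    return np.vander(np.atleast_1d(x), N, increasing=True).T

M = 200
X = rng.uniform(-1.0, 1.0, M)             # sample points x_1, ..., x_M
Y = T(X)                                  # snapshots y_i = T(x_i)

# A_{N,M} = psi(Y) psi(X)^dagger, eq. (5)
A = psi(Y) @ np.linalg.pinv(psi(X))

# for this T, psi_k o T = 0.9^(k-1) psi_k, so F_N is Koopman-invariant and
# A recovers the diagonal matrix diag(1, 0.9, 0.81, 0.729)
print(np.round(np.diag(A), 4))
```

The eigenvalues of \(A\) are then \(\{1, 0.9, 0.9^2, 0.9^3\}\), i.e., eigenvalues of the Koopman operator restricted to this invariant subspace; for a generic dictionary no such invariance holds, which is precisely the situation analyzed in the rest of the paper.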

3 EDMD as \(L_2\) Projection

To the best of our knowledge, the results of this section and Sect. 4 were first obtained in Klus et al. (2016, Section 3.4) and hinted at already in Williams et al. (2015). Here, we rephrase these results in a form more suitable for our purposes.

From here on, we assume that \(\mathcal {F}= L_2(\mu )\), where \(\mu \) is a given positive measure on \(\mathcal {M}\). This assumption in particular implies that the basis functions \(\psi _i\) in (3) belong to \(L_2(\mu )\) and hence \(\mathcal {F}_N\) is a closed (but not necessarily invariant) subspace of \(L_2(\mu )\). Note that the measure \(\mu \) is not required to be invariant for (1) and hence the Koopman operator \(\mathcal {K}\) is not necessarily unitary. In practical applications, the measure \(\mu \) will typically be the uniform measure on \(\mathcal {M}\) or another measure from which samples can be drawn efficiently.

We recall that given an arbitrary positive measure \(\nu \) on \(\mathcal {M}\), the space \(L_2(\nu )\) is the Hilbert space of all measurable functions \(\phi {:}\,\mathcal {M}\rightarrow \mathbb {C}\) satisfying

$$\begin{aligned} \Vert \phi \Vert _{L_2(\nu )} := \sqrt{\int _{\mathcal {M}} |\phi (x)|^2\,\mathrm{d}\nu (x)} < \infty . \end{aligned}$$

Assuming \(\mathcal {F}_N \) is a closed subspace of \(L_2(\nu )\), the \(L_2(\nu )\)-projection of a function \(\phi \in L_2(\nu )\) onto \(\mathcal {F}_N \subset L_2(\nu )\) is defined by

$$\begin{aligned} P_N^\nu \phi = \mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{f\in \mathcal {F}_N} \Vert f - \phi \Vert _{L_2(\nu )}= \mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{f\in \mathcal {F}_N} \int _{\mathcal {M}} |f - \phi |^2\,\mathrm{d}\nu =\mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{c\in \mathbb {C}^N} \int _{\mathcal {M}} |c^{{\mathsf {H}}}\varvec{\psi }- \phi |^2\,\mathrm{d}\nu . \end{aligned}$$
(7)

Now, given the data points \(x_1,\ldots ,x_M\) from (2), we define the empirical measure \(\hat{\mu }_M\) by

$$\begin{aligned} \hat{\mu }_M = \frac{1}{M}\sum _{i=1}^M \delta _{x_i}, \end{aligned}$$
(8)

where \(\delta _{x_i}\) is the Dirac measure at \(x_i\). In particular, the integral of a function \(\phi \) with respect to \(\hat{\mu }_M\) is given by

$$\begin{aligned} \int _\mathcal {M}\phi (x)\mathrm{d}{\hat{\mu }}_M(x) = \frac{1}{M}\sum _{i=1}^M \phi (x_i). \end{aligned}$$

We remark that the EDMD subspace \(\mathcal {F}_N\) defined in (3) is a closed subspace of both \(L_2(\hat{\mu }_M)\) and \(L_2(\mu )\) and hence the projections \(P_N^{\mu }\) and \(P_N^{{\hat{\mu }}_M}\) are well defined.

Now, we are ready to state the following characterization of \(\mathcal {K}_{N,M} \phi \):

Theorem 1

Let \({\hat{\mu }}_M\) denote the empirical measure (8) associated to the sample points \(x_1,\ldots , x_M\) and assume that the \(N\times N\) matrix

$$\begin{aligned} M_{{\hat{\mu }}_M} =\frac{1}{M}\sum _{i=1}^M \varvec{\psi }(x_i)\varvec{\psi }(x_i)^{{\mathsf {H}}}= \int _{\mathcal {M}} \varvec{\psi }\varvec{\psi }^{{\mathsf {H}}}\,\mathrm{d}{\hat{\mu }}_M \end{aligned}$$
(9)

is invertible. Then, for any \(\phi \in \mathcal {F}_N\)

$$\begin{aligned} \mathcal {K}_{N,M}\phi = P_{N}^{\hat{\mu }_M} \mathcal {K}\phi = \mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{f \in \mathcal {F}_N} \Vert f - \mathcal {K}\phi \Vert _{L_2({\hat{\mu }}_M )}, \end{aligned}$$
(10)

i.e.,

$$\begin{aligned} \mathcal {K}_{N,M} = P_N^{{\hat{\mu }}_M} \mathcal {K}_{|\mathcal {F}_N}, \end{aligned}$$
(11)

where \(\mathcal {K}_{|\mathcal {F}_N}{:}\,\mathcal {F}_N\rightarrow \mathcal {F}\) is the restriction of the Koopman operator to \(\mathcal {F}_N\).

Proof

Since the matrix \(M_{{\hat{\mu }}_M}\) is invertible, the least-squares problem (4) has a unique solution given by

$$\begin{aligned} a_i = \left( \sum _{j=1}^M \varvec{\psi }(x_j)\varvec{\psi }(x_j)^{{\mathsf {H}}}\right) ^{-1}\sum _{j=1}^M \varvec{\psi }(x_j)\overline{ \psi _i(y_j)}, \end{aligned}$$

where \(a_i^{{\mathsf {H}}}\in \mathbb {C}^{1\times N}\) is the ith row of \(A_{N,M}\). Therefore,

$$\begin{aligned} A_{N,M}^{{\mathsf {H}}}= \left( \sum _{j=1}^M \varvec{\psi }(x_j)\varvec{\psi }(x_j)^{{\mathsf {H}}}\right) ^{-1}\sum _{j=1}^M \varvec{\psi }(x_j)\varvec{\psi }(y_j)^{{\mathsf {H}}}. \end{aligned}$$

On the other hand, analyzing the minimization problem on the right-hand side of (10), we get for any \(\phi = c_\phi ^{{\mathsf {H}}}\varvec{\psi }\)

$$\begin{aligned} \mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{f \in \mathcal {F}_N} \Vert f - \mathcal {K}\phi \Vert _{L_2({\hat{\mu }}_M )} = \mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{c\in \mathbb {C}^N} \frac{1}{M}\sum _{i=1}^M \left| c^{{\mathsf {H}}}\varvec{\psi }(x_i) - c_\phi ^{{\mathsf {H}}}\varvec{\psi }(y_i)\right| ^2 \end{aligned}$$

with the unique minimizer (since the minimized function is strictly convex in c)

$$\begin{aligned} c = \left( \sum _{j=1}^M \varvec{\psi }(x_j)\varvec{\psi }(x_j)^{{\mathsf {H}}}\right) ^{-1}\sum _{j=1}^M \varvec{\psi }(x_j)\varvec{\psi }(y_j)^{{\mathsf {H}}}c_\phi = A_{N,M}^{{\mathsf {H}}}c_\phi . \end{aligned}$$

Hence, \(\mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{f \in \mathcal {F}_N} \Vert f - \mathcal {K}\phi \Vert _{L_2({\hat{\mu }}_M )} = c^{{\mathsf {H}}}\varvec{\psi }= c_\phi ^{{\mathsf {H}}}A_{N,M}\varvec{\psi }= \mathcal {K}_{N,M}\phi \) as desired, where we used (6) in the last equality. \(\square \)

Theorem 1 says that for any function \(\phi \in \mathcal {F}_N\), the EDMD operator \(\mathcal {K}_{N,M}\) computes the \(L_2({\hat{\mu }}_M)\)-orthogonal projection of \(\mathcal {K}\phi \) on the subspace spanned by \(\psi _1,\ldots ,\psi _N\).
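Theorem 1 can be checked numerically; a small sketch (hypothetical map and dictionary, assuming NumPy) comparing the EDMD coefficients \(A_{N,M}^{{\mathsf {H}}}c_\phi \) with a direct least-squares solution of the projection problem in (10):

```python
import numpy as np

rng = np.random.default_rng(1)
T = lambda x: np.sin(2.0 * x)             # hypothetical nonlinear map
N = 5
psi = lambda x: np.vander(np.atleast_1d(x), N, increasing=True).T

M = 300
X = rng.uniform(-1.0, 1.0, M)
Y = T(X)
PsiX, PsiY = psi(X), psi(Y)
A = PsiY @ np.linalg.pinv(PsiX)           # eq. (5)

c_phi = rng.standard_normal(N)            # phi = c_phi^H psi, a random observable

# coefficients of K_{N,M} phi according to (6): A^H c_phi
c_edmd = A.T @ c_phi                      # real data, so A^H = A^T

# coefficients of the empirical L2 projection of K phi = phi o T onto F_N:
# minimize sum_i |c^H psi(x_i) - c_phi^H psi(y_i)|^2 directly
c_proj, *_ = np.linalg.lstsq(PsiX.T, c_phi @ PsiY, rcond=None)

print(np.allclose(c_edmd, c_proj, atol=1e-8))
```

Since \(M_{{\hat{\mu }}_M}\) is invertible here (M = 300 distinct points, 5 monomials), the projection problem has a unique minimizer and the two coefficient vectors agree up to roundoff.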

Remark 1

If the assumption that the matrix \(M_{{\hat{\mu }}_M}\) is invertible does not hold, then the solution to the projection problem on the right-hand side of (10) may not be unique. The action of the EDMD operator \(\mathcal {K}_{N,M}\) [which is defined uniquely by (5) and (6)] then selects one solution to the projection problem. In concrete terms, we have

$$\begin{aligned} \mathcal {K}_{N,M}\phi \in \mathop {\mathrm{Arg}\,\mathrm{min}}\limits _{f\in \mathcal {F}_N}\Vert f - \mathcal {K}\phi \Vert _{L_2({\hat{\mu }}_M)}, \end{aligned}$$

where \(\mathop {\mathrm{Arg}\,\mathrm{min}}\limits _{f\in \mathcal {F}_N}\Vert f - \mathcal {K}\phi \Vert _{L_2({\hat{\mu }}_M)}\) denotes the set of all minimizers of \(\Vert f - \mathcal {K}\phi \Vert _{L_2({\hat{\mu }}_M)}\) among \(f\in \mathcal {F}_N\).

4 Convergence of \(\mathcal {K}_{N,M}\) as \(M\rightarrow \infty \)

The first step in understanding convergence of EDMD is to understand the convergence of \(\mathcal {K}_{N,M}\) as the number of samples M tends to infinity. In this section, we prove that

$$\begin{aligned} \mathcal {K}_{N,M} \rightarrow \mathcal {K}_N, \end{aligned}$$

where

$$\begin{aligned} \mathcal {K}_N = P_N^\mu \mathcal {K}_{|\mathcal {F}_N}, \end{aligned}$$
(12)

provided that the samples \(x_1,\ldots ,x_M\) are drawn independently from a given probability distribution \(\mu \) (e.g., the uniform distribution if \(\mathcal {M}\) is compact or a Gaussian if \(\mathcal {M}= \mathbb {R}^n\)).

Assumption 1

(\(\mu \) independence) The basis functions \(\psi _1,\ldots ,\psi _N\) are such that

$$\begin{aligned} \mu \left\{ x\in \mathcal {M}\mid c^{{\mathsf {H}}}\varvec{\psi }(x)=0 \right\} = 0 \end{aligned}$$

for all nonzero \(c \in \mathbb {C}^N\).

This is a natural assumption ensuring that the measure \(\mu \) is not supported on a zero level set of a linear combination of the basis functions used. It is satisfied whenever the support of \(\mu \) is equal to \(\mathcal {M}\) and the basis functions are among the most commonly used ones, such as polynomials or radial basis functions with unbounded support (e.g., Gaussians, thin plate splines). This assumption in particular implies that the matrix \(M_{{\hat{\mu }}_M}\) defined in (9) is invertible with probability one for \(M \ge N\) if the \(x_j\)'s are iid samples from \(\mu \).
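For instance, with a monomial dictionary any N distinct sample points already give a full-rank \(\varvec{\psi }(\varvec{X})\) (a transposed Vandermonde matrix), so \(M_{{\hat{\mu }}_M}\) is invertible as soon as \(M \ge N\); a small sketch under these hypothetical choices, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
psi = lambda x: np.vander(np.atleast_1d(x), N, increasing=True).T

# M = N iid samples from mu = U[0,1]; they are distinct with probability one,
# so psi(X) is a (transposed) Vandermonde matrix of full rank N
X = rng.uniform(0.0, 1.0, N)
PsiX = psi(X)
M_hat = PsiX @ PsiX.conj().T / N          # empirical Gram matrix, eq. (9)

print(np.linalg.matrix_rank(M_hat) == N)  # invertible already at M = N
```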

Lemma 1

If Assumption 1 holds, then for any \(\phi \in \mathcal {F}\) we have with probability one

$$\begin{aligned} \lim _{M\rightarrow \infty }\Vert P_N^{{\hat{\mu }}_M} \phi - P_N^{\mu } \phi \Vert = 0, \end{aligned}$$
(13)

where \(\Vert \cdot \Vert \) is any norm on \(\mathcal {F}_N\) (all norms on \(\mathcal {F}_N\) are equivalent since \(\mathcal {F}_N\) is finite dimensional).

Proof

We have

$$\begin{aligned} P_N^{\mu } \phi&= \mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{f\in \mathcal {F}_N} \int _{\mathcal {M}} |f - \phi |^2\, \mathrm{d}\mu = \varvec{\psi }^{{\mathsf {H}}}\mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{c\in \mathbb {C}^N} \int _{\mathcal {M}} |c^{{\mathsf {H}}}\varvec{\psi }- \phi |^2\, \mathrm{d}\mu \\&=\varvec{\psi }^{{\mathsf {H}}}\mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{c\in \mathbb {C}^N} \left[ c^{{\mathsf {H}}}M_{\mu } c - 2{\mathrm{Re}}\big \{c^{{\mathsf {H}}}b_{\mu ,\phi }\big \} \right] , \end{aligned}$$

where

$$\begin{aligned} M_\mu = \int _{\mathcal {M}} \varvec{\psi }\varvec{\psi }^{{\mathsf {H}}}\,\mathrm{d}\mu \in \mathbb {C}^{N\times N},\quad b_{\mu ,\phi } = \int _{\mathcal {M}}\varvec{\psi }\overline{\phi }\,\mathrm{d}\mu \in \mathbb {C}^N \end{aligned}$$

and we dropped the constant term in the last equality since it does not influence the minimizer. By Assumption 1, the matrix \(M_{\mu }\) is invertible and, being always Hermitian positive semidefinite, it is therefore Hermitian positive definite. Hence, the unique minimizer is \(c = M_{\mu }^{-1}b_{\mu ,\phi }\). Hence,

$$\begin{aligned} P_N^{\mu }\phi = b_{\mu ,\phi }^{{\mathsf {H}}}M_{\mu }^{-1}\varvec{\psi }. \end{aligned}$$

On the other hand, the same computation shows that

$$\begin{aligned} P_N^{{\hat{\mu }}_M}\phi = b_{{\hat{\mu }}_M,\phi }^{{\mathsf {H}}}M_{{\hat{\mu }}_M}^{-1}\varvec{\psi }\end{aligned}$$

with

$$\begin{aligned} b_{{\hat{\mu }}_M,\phi } = \int _{\mathcal {M}}\varvec{\psi }\overline{\phi }\,\mathrm{d}{\hat{\mu }}_M = \frac{1}{M}\sum _{i=1}^M \varvec{\psi }(x_i)\overline{\phi (x_i)} \end{aligned}$$

and with the matrix \(M_{{\hat{\mu }}_M}\) defined in (9) guaranteed to be Hermitian positive definite by Assumption 1 with probability one for \(M\ge N\). The result then follows by the strong law of large numbers which ensures that

$$\begin{aligned} \lim _{M\rightarrow \infty } \left( b_{{\hat{\mu }}_M,\phi }^{{\mathsf {H}}}M_{{\hat{\mu }}_M}^{-1}\right) = b_{\mu ,\phi }^{{\mathsf {H}}}M_{\mu }^{-1} \end{aligned}$$

with probability one since the matrix function \(A\mapsto A^{-1}\) is continuous and the samples \(x_i\) are iid. \(\square \)
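A hedged numerical illustration of Lemma 1 (hypothetical choices: \(\mu = U[0,1]\), monomial basis, \(\phi (x) = x^4\), assuming NumPy), comparing the empirical coefficients \(b_{{\hat{\mu }}_M,\phi }^{{\mathsf {H}}}M_{{\hat{\mu }}_M}^{-1}\) with their analytic limit:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 3
i = np.arange(1, N + 1)
psi = lambda x: np.vander(np.atleast_1d(x), N, increasing=True).T

# analytic quantities for mu = U[0,1] and phi(x) = x^4:
M_mu = 1.0 / (i[:, None] + i[None, :] - 1)   # Gram (Hilbert) matrix
b_mu = 1.0 / (i + 4)                          # int_0^1 x^(i-1) x^4 dx
c_exact = np.linalg.solve(M_mu, b_mu)         # coefficients of P_N^mu phi

errs = []
for M in (100, 100000):
    X = rng.uniform(0.0, 1.0, M)
    PsiX = psi(X)
    M_hat = PsiX @ PsiX.T / M                 # empirical Gram, eq. (9)
    b_hat = PsiX @ (X ** 4) / M               # empirical b vector
    errs.append(np.linalg.norm(np.linalg.solve(M_hat, b_hat) - c_exact))

# the Monte-Carlo error shrinks roughly like 1/sqrt(M)
print(errs[1] < errs[0])
```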

Theorem 2

If Assumption 1 holds, then we have with probability one for all \(\phi \in \mathcal {F}_N\)

$$\begin{aligned} \lim _{M\rightarrow \infty }\Vert \mathcal {K}_{N,M} \phi - \mathcal {K}_N \phi \Vert = 0, \end{aligned}$$
(14)

where \(\Vert \cdot \Vert \) is any norm on \(\mathcal {F}_N\). In particular,

$$\begin{aligned} \lim _{M\rightarrow \infty }\Vert \mathcal {K}_{N,M} - \mathcal {K}_N \Vert = 0, \end{aligned}$$
(15)

where \(\Vert \cdot \Vert \) is any operator norm and

$$\begin{aligned} \lim _{M\rightarrow \infty }\mathrm {dist}\big (\sigma (\mathcal {K}_{N,M}),\sigma (\mathcal {K}_N)\big ) = 0, \end{aligned}$$
(16)

where \(\sigma (\cdot ) \subset \mathbb {C}\) denotes the spectrum of an operator and \(\mathrm {dist}(\cdot ,\cdot )\) the Hausdorff metric on subsets of \(\mathbb {C}\).

Proof

For any fixed \(\phi \in \mathcal {F}_N\), we have by Theorem 1

$$\begin{aligned} \mathcal {K}_{N,M} \phi = P_{N}^{{\hat{\mu }}_M} \mathcal {K}\phi = P_{N}^{{\hat{\mu }}_M} (\phi \circ T). \end{aligned}$$

By definition of \(\mathcal {K}_N \phi \), we have \(\mathcal {K}_N \phi =P_N^\mu (\phi \circ T) \) and therefore (14) follows from Lemma 1 with probability one. Since \(\mathcal {F}_N\) is finite dimensional, (14) holds with probability one for all basis functions of \(\mathcal {F}_N\) and hence by linearity for all \(\phi \in \mathcal {F}_N\). Convergence in the operator norm (15) and spectral convergence (16) follow from (14) since the operators \(\mathcal {K}_{N,M}\) and \(\mathcal {K}_N\) are finite dimensional. \(\square \)

Theorem 2 tells us that in order to understand the convergence of \(\mathcal {K}_{N,M}\) to \(\mathcal {K}\) it is sufficient to understand the convergence of \(\mathcal {K}_N\) to \(\mathcal {K}\). This convergence is analyzed in Sect. 5.

4.1 Ergodic Sampling

The assumption that the samples \(x_1,\ldots ,x_M\) are drawn independently from the distribution \(\mu \) can be replaced by the assumption that \((T,\mathcal {M},\mu )\) is ergodic and the samples \(x_1,\ldots ,x_M\) are the iterates of the dynamical system starting from some initial condition \(x\in \mathcal {M}\), i.e., \(x_i = T^i(x)\). Provided that Assumption 1 holds, both Lemma 1 and Theorem 2 hold without change; the statement “with probability one” is now interpreted with respect to drawing the initial condition x from the distribution \(\mu \). The proofs follow exactly the same argument; only the strong law of large numbers is replaced by Birkhoff's ergodic theorem in Lemma 1.
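As a hedged illustration of the ergodic variant (the logistic map is our choice here, not an example from this paper): \(T(x)=4x(1-x)\) is ergodic with respect to the arcsine measure \(\mathrm{d}\mu = \mathrm{d}x/(\pi \sqrt{x(1-x)})\) on [0, 1], whose first two moments are 1/2 and 3/8, so Birkhoff averages along a single trajectory play the role of the iid sample averages above:

```python
import numpy as np

T = lambda x: 4.0 * x * (1.0 - x)   # logistic map, ergodic w.r.t. arcsine measure

x = 0.123                           # a generic initial condition
traj = np.empty(200000)
for k in range(traj.size):
    x = T(x)
    traj[k] = x

# Birkhoff averages approximate the moments int x dmu = 1/2, int x^2 dmu = 3/8
print(abs(traj.mean() - 0.5), abs((traj ** 2).mean() - 0.375))
```

With the trajectory in place of iid samples, the empirical Gram matrix and b vector of Lemma 1 are formed exactly as before.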

5 Convergence of \(\mathcal {K}_N\) to \(\mathcal {K}\)

In this section, we investigate convergence of \(\mathcal {K}_N\) to \(\mathcal {K}\) in the limit as N goes to infinity. Since the operator \(\mathcal {K}_N\) is defined on \(\mathcal {F}_N\) rather than on \(\mathcal {F}\), we extend it to all of \(\mathcal {F}\) by precomposing with \(P_N^\mu \), i.e., we study the convergence of \(\mathcal {K}_N P_N^\mu = P_N^\mu \mathcal {K}P_N^\mu {:}\,\mathcal {F}\rightarrow \mathcal {F}\) to \(\mathcal {K}{:}\,\mathcal {F}\rightarrow \mathcal {F}\) as \(N\rightarrow \infty \). Note that as far as the spectrum is concerned, precomposing with \(P_N^\mu \) just adds a zero to the spectrum.

To simplify notation, in what follows we denote the \(L_2(\mu )\) norm of a function f by

$$\begin{aligned} \Vert f\Vert := \Vert f\Vert _{L_2(\mu )} = \sqrt{\int _{\mathcal {M}} |f|^2\,\mathrm{d}\mu } \end{aligned}$$

and the usual \(L_2(\mu )\) inner product by

$$\begin{aligned} \langle f,g\rangle := \int _{\mathcal {M}} f\overline{g}\,\mathrm{d}\mu . \end{aligned}$$

5.1 Preliminaries

Before stating our results, we recall several concepts from functional analysis and operator theory.

Definition 1

(Bounded operator) An operator \(\mathcal {A}{:}\,\mathcal {F}\rightarrow \mathcal {F}\) defined on a Hilbert space \(\mathcal {F}\) is bounded if

$$\begin{aligned} \Vert \mathcal {A}\Vert := \sup \limits _{f \in \mathcal {F},\; \Vert f \Vert =1} \Vert \mathcal {A}f \Vert < \infty . \end{aligned}$$

The quantity \(\Vert \mathcal {A}\Vert \) is referred to as the norm of \(\mathcal {A}\).

Definition 2

(Convergence in strong operator topology) A sequence of bounded operators \(\mathcal {A}_i{:}\,\mathcal {F}\rightarrow \mathcal {F}\) defined on a Hilbert space \(\mathcal {F}\) converges strongly (or in the strong operator topology) to an operator \(\mathcal {A}{:}\,\mathcal {F}\rightarrow \mathcal {F}\) if

$$\begin{aligned} \lim _{i\rightarrow \infty } \Vert \mathcal {A}_i g - \mathcal {A}g\Vert = 0 \end{aligned}$$
(17)

for all \(g \in \mathcal {F}\).

Definition 3

(Weak convergence) A sequence of elements \(f_i \in \mathcal {F}\) of a Hilbert space \(\mathcal {F}\) converges weakly to \(f \in \mathcal {F}\), denoted \(f_i \xrightarrow {w}f\), if

$$\begin{aligned} \lim _{i\rightarrow \infty } \langle f_i, g\rangle = \langle f,g\rangle \end{aligned}$$
(18)

for all \(g \in \mathcal {F}\).

We emphasize that Definition 2 pertains to convergence of operators defined on \(\mathcal {F}\), whereas Definition 3 pertains to convergence of elements of \(\mathcal {F}\). We also remark that strong convergence \(f_i \rightarrow f\) (i.e., \(\Vert f_i - f\Vert \rightarrow 0\)) implies weak convergence, but not vice versa. Similarly, convergence in the strong operator topology implies convergence in the weak operator topology (i.e., \(\mathcal {A}_i g \xrightarrow {w}\mathcal {A}g\) for all g), but does not imply convergence in the operator norm (i.e., \(\Vert \mathcal {A}_i - \mathcal {A}\Vert \rightarrow 0\)).

In our setting of \(\mathcal {F}= L_2(\mu )\), the statements (17) and (18) translate to the requirements that, respectively,

$$\begin{aligned} \lim _{i\rightarrow \infty } \sqrt{\int _{\mathcal {M}} | \mathcal {A}_i g - \mathcal {A}g |^2\,\mathrm{d}\mu } = 0\qquad \mathrm {and}\qquad \lim _{i\rightarrow \infty } \int _{\mathcal {M}} f_i\overline{g} \,\mathrm{d}\mu = \int _{\mathcal {M}} f\overline{g} \,\mathrm{d}\mu \end{aligned}$$

for all \(g\in L_2(\mu )\).

For the remainder of this work, we invoke the following assumption:

Assumption 2

The following conditions hold:

  1. The Koopman operator \(\mathcal {K}{:}\,\mathcal {F}\rightarrow \mathcal {F}\) is bounded.

  2. The observables \(\psi _1,\ldots ,\psi _N\) defining \(\mathcal {F}_N\) are selected from a given orthonormal basis of \(\mathcal {F}\), i.e., \((\psi _i)_{i=1}^\infty \) is an orthonormal basis of \(\mathcal {F}\).

The first part of the assumption holds, for instance, when T is invertible and Lipschitz with a Lipschitz inverse and \(\mu \) is the Lebesgue measure on \(\mathcal {M}\) (or any measure absolutely continuous w.r.t. the Lebesgue measure with bounded and strictly positive density). The second part of the assumption is nonrestrictive, as any countable dense subset of \(\mathcal {F}\) can be turned into an orthonormal basis using the Gram–Schmidt process.
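For instance, orthonormalizing the monomials with respect to \(\mu = U[-1,1]\) yields the normalized Legendre polynomials; a small sketch (assuming NumPy; these choices are illustrative, and Gram–Schmidt is implemented here via the Cholesky factor of the exact Gram matrix):

```python
import numpy as np

N = 4
k = np.arange(1, N + 1)
S = k[:, None] + k[None, :]

# Gram matrix of psi_k(x) = x^(k-1) in L2(mu), mu = U[-1,1]:
# G[i,j] = int x^(i+j-2) dmu = 1/(i+j-1) if i+j is even, else 0
G = np.where(S % 2 == 0, 1.0 / (S - 1.0), 0.0)

# Gram-Schmidt in matrix form: with G = L L^H, the functions e = L^{-1} psi
# satisfy <e_i, e_j> = (L^{-1} G L^{-H})_{ij} = delta_{ij}
L = np.linalg.cholesky(G)
C = np.linalg.inv(L)                  # row i: coefficients of e_i in the psi_k

print(np.allclose(C @ G @ C.T, np.eye(N)))      # orthonormality
print(np.round(C[1], 6))                        # e_2 = sqrt(3) * x, as expected
```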

5.2 Convergence in Strong Operator Topology

In this section, we prove convergence in the strong operator topology (Definition 2) of \(\mathcal {K}_N P_N^\mu \) to \(\mathcal {K}\). First, we need the following immediate lemma:

Lemma 2

If \((\psi _i)_{i=1}^\infty \) forms an orthonormal basis of \(\mathcal {F}= L_2(\mu )\), then \(P_N^\mu \) converges strongly to the identity operator I and in addition \(\Vert I - P_N^\mu \Vert \le 1 \) for all N.

Proof

Let \(\phi = \sum _{i=1}^\infty c_i\psi _i\) with \(\Vert \phi \Vert = 1\). Then, by Parseval’s identity \(\sum _{i=1}^\infty |c_i|^2 = 1\) and

$$\begin{aligned} \Vert P_N^\mu \phi - \phi \Vert = \Bigg \Vert \sum _{i=N+1}^\infty c_i\psi _i \Bigg \Vert = \sqrt{\sum _{i=N+1}^\infty |c_i|^2} \rightarrow 0 \end{aligned}$$

with \(\sum _{i=N+1}^\infty |c_i|^2 \le 1\) for all N. \(\square \)

Now, we are ready to prove strong convergence of \(P_N^\mu \mathcal {K}P_N^\mu \) to \(\mathcal {K}\).

Theorem 3

If Assumption 2 holds, then the sequence of operators \(\mathcal {K}_NP_N^\mu =P_N^\mu \mathcal {K}P_N^\mu \) converges strongly to \(\mathcal {K}\) as \(N\rightarrow \infty \), i.e.,

$$\begin{aligned} \lim _{N\rightarrow \infty } \int _{\mathcal {M}} | \mathcal {K}_N P_N^\mu \phi - \mathcal {K}\phi |^2\,\mathrm{d}\mu = 0 \end{aligned}$$

for all \(\phi \in \mathcal {F}\).

Proof

Let \(\phi \in \mathcal {F}\) be given. Then, writing \(\phi =P_N^\mu \phi + (I -P_N^\mu )\phi \) we have

$$\begin{aligned} \Vert P_N^\mu \mathcal {K}P_N^\mu \phi - \mathcal {K}\phi \Vert&= \Vert \big (P_N^\mu -I\big )\mathcal {K}P_N^\mu \phi + \mathcal {K}\big (P_N^\mu -I\big )\phi \Vert \\&\le \Vert \big (P_N^\mu -I\big )\mathcal {K}P_N^\mu \phi \Vert + \Vert \mathcal {K}\Vert \Vert ( I - P_N^\mu )\phi \Vert \\&\le \Vert \big (P_N^\mu -I\big )\mathcal {K}\phi \Vert + \Vert \big (P_N^\mu -I\big )\Vert \Vert \mathcal {K}P_N^\mu \phi -\mathcal {K}\phi \Vert \\&\quad +\,\Vert \mathcal {K}\Vert \Vert \big ( I - P_N^\mu \big )\phi \Vert \rightarrow 0 \end{aligned}$$

by Lemma 2 and by the fact that \(\mathcal {K}P_N^\mu \phi \rightarrow \mathcal {K}\phi \) since \(\mathcal {K}\) is continuous by Assumption 2. \(\square \)

5.3 Weak Spectral Convergence

Unfortunately, strong convergence does not in general guarantee convergence of the spectra of the operators. This is guaranteed only if the operators converge in the operator norm. An important exception to this is the case of \(\mathcal {F}_N\) being an invariant subspace, i.e., \(\mathcal {K}f \in \mathcal {F}_N\) for all \(f\in \mathcal {F}_N\), in which case the spectra of \(\mathcal {K}_N\) and \(\mathcal {K}_{|\mathcal {F}_N}\) coincide. Here, however, we do not assume that \(\mathcal {F}_N\) is invariant and prove certain spectral convergence results in a weak sense. In particular, we prove convergence of the eigenvalues of \(\mathcal {K}_N\) along a subsequence and weak convergence of the associated eigenfunctions (see Definition 3), provided that the weak limit of the eigenfunctions is nonzero.

Theorem 4

If Assumption 2 holds and \(\lambda _N\) is a sequence of eigenvalues of \(\mathcal {K}_N\) with the associated normalized eigenfunctions \(\phi _N \in \mathcal {F}_N\), \(\Vert \phi _N \Vert =1\), then there exists a subsequence \((\lambda _{N_i},\phi _{N_i})\) such that

$$\begin{aligned} \lim _{i\rightarrow \infty }\lambda _{N_i} = \lambda ,\quad \phi _{N_i} \xrightarrow {w}\phi , \end{aligned}$$

where \(\lambda \in \mathbb {C}\) and \(\phi \in \mathcal {F}\) are such that \(\mathcal {K}\phi = \lambda \phi \). In particular, if \(\Vert \phi \Vert \ne 0\), then \(\lambda \) is an eigenvalue of \(\mathcal {K}\) with eigenfunction \(\phi \).

Proof

First, observe that since \(\mathcal {K}_N \phi _N = \lambda _N \phi _N\) with \(\phi _N \in \mathcal {F}_N\), we also have \(P_N^\mu \mathcal {K}P_N^\mu \phi _N = \lambda _N \phi _N \). Hence, \(|\lambda _N| \le \Vert P_N^\mu \mathcal {K}P_N^\mu \Vert \le \Vert \mathcal {K}\Vert < \infty \) by Assumption 2 and the fact that \(\Vert P_N^\mu \Vert \le 1 \). Therefore, the sequence \(\lambda _N\) is bounded. Since \(\phi _N\) is normalized and hence bounded, by weak sequential compactness of the unit ball of a Hilbert space [which follows from the Banach–Alaoglu theorem (Rudin 1973, Theorem 3.15) and the Eberlein–Šmulian theorem (Dunford and Schwartz 1971)], there exists a subsequence \((\lambda _{N_i},\phi _{N_i})\) such that \(\lambda _{N_i}\rightarrow \lambda \) and \(\phi _{N_i}\xrightarrow {w}\phi \).

It remains to prove that \((\lambda ,\phi )\) is an eigenvalue–eigenfunction pair of \(\mathcal {K}\). For ease of notation, set \(\lambda _{i} = \lambda _{N_i}\) and \(\phi _i = \phi _{N_i}\). Denote \(\hat{\mathcal {K}}_i = \mathcal {K}_{N_i} P_{N_i}^\mu = P_{N_i}^\mu \mathcal {K}P_{N_i}^\mu \) and observe that \({\hat{\mathcal {K}}}_i \phi _i = \lambda _i\phi _i\) for all i. Then, we have

$$\begin{aligned} \mathcal {K}\phi = {\hat{\mathcal {K}}}_i(\phi -\phi _i) + (\mathcal {K}-{\hat{\mathcal {K}}}_i)\phi + {\hat{\mathcal {K}}}_i\phi _i. \end{aligned}$$

Taking the inner product with an arbitrary \(f\in \mathcal {F}\) and using the fact that \({\hat{\mathcal {K}}}_i\phi _i = \lambda _i\phi _i\), we get

$$\begin{aligned} \langle \mathcal {K}\phi ,f \rangle = \langle {\hat{\mathcal {K}}}_i(\phi -\phi _i),f \rangle +\langle (\mathcal {K}-{\hat{\mathcal {K}}}_i)\phi ,f\rangle + \langle \lambda _i\phi _i ,f \rangle . \end{aligned}$$

Now, the second term on the right-hand side \(\langle (\mathcal {K}-{\hat{\mathcal {K}}}_i)\phi ,f\rangle \rightarrow 0\) since \({\hat{\mathcal {K}}}_i\) converges strongly to \(\mathcal {K}\) by Theorem 3. The last term \(\langle \lambda _i\phi _i,f \rangle \rightarrow \langle \lambda \phi ,f\rangle \) since \(\lambda _i\rightarrow \lambda \) and \(\phi _i \xrightarrow {w}\phi \). It remains to show that the first term converges to zero. We have

$$\begin{aligned} \big \langle {\hat{\mathcal {K}}}_i(\phi -\phi _i),f \big \rangle = \big \langle P_{N_i}^\mu \mathcal {K}P_{N_i}^\mu (\phi -\phi _i),f \big \rangle = \big \langle \mathcal {K}\big (P_{N_i}^\mu \phi -\phi _i\big ), P_{N_i}^\mu f \big \rangle , \end{aligned}$$

where we used the fact that \(P_{N_i}^\mu \) is self-adjoint and \(\phi _i \in \mathcal {F}_{N_i}\) and hence \(P_{N_i}^\mu \phi _i = \phi _i\). Denote \(h_i := \mathcal {K}(P_{N_i}^\mu \phi -\phi _i)\). We will show that \(h_i\xrightarrow {w}0\). Indeed, denoting \(\mathcal {K}^\star \) the adjoint of \(\mathcal {K}\), we have

$$\begin{aligned} \big \langle \mathcal {K}\big (P_{N_i}^\mu \phi -\phi _i\big ),f\big \rangle= & {} \big \langle \big (P_{N_i}^\mu \phi -\phi + \phi -\phi _i\big ),\mathcal {K}^\star f\big \rangle = \big \langle P_{N_i}^\mu \phi -\phi ,\,\mathcal {K}^\star f\big \rangle \\&+\,\big \langle \phi -\phi _i,\,\mathcal {K}^\star f\big \rangle \rightarrow 0, \end{aligned}$$

since \(P_{N_i}^\mu \) converges strongly to the identity (Lemma 2) and \(\phi _i \xrightarrow {w}\phi \). Finally, we show that \(\langle h_i, P_{N_i}^\mu f\rangle \rightarrow 0\). We have

$$\begin{aligned} \big \langle h_i, P_{N_i}^\mu f\big \rangle = \big \langle h_i, P_{N_i}^\mu f - f\big \rangle + \langle h_i, f\rangle . \end{aligned}$$

The second term goes to zero since \(h_i\xrightarrow {w}0\). For the first term, we have

$$\begin{aligned} \big \langle h_i, P_{N_i}^\mu f - f\big \rangle \le \Vert h_i\Vert \Vert P_{N_i}^\mu f - f \Vert \rightarrow 0 \end{aligned}$$

since \(P_{N_i}^\mu \) converges strongly to the identity operator (Lemma 2) and \(h_i\) is bounded since \(\mathcal {K}\) is bounded by Assumption 2, \(\Vert P_{N_i}^\mu \Vert \le 1\) and \(\Vert \phi _i \Vert \le 1\). Therefore, we conclude that

$$\begin{aligned} \langle \mathcal {K}\phi ,f \rangle = \lim _{i\rightarrow \infty } \langle \lambda _i \phi _i,f\rangle = \langle \lambda \phi ,f \rangle \end{aligned}$$

for all \(f \in \mathcal {F}\). Therefore, \(\mathcal {K}\phi = \lambda \phi \). \(\square \)

Example

As an example demonstrating the importance of the assumption that the weak limit of \(\phi _N\) is nonzero, consider \(\mathcal {M}= [0,1]\), \(T(x) = x\) and \(\mu \) the Lebesgue measure on [0, 1]. In this setting, the Koopman operator \(\mathcal {K}{:}\,L_2(\mu )\rightarrow L_2(\mu )\) is the identity operator, whose spectrum is the singleton \(\sigma (\mathcal {K})=\{1\}\). However, given any \(\lambda \in \mathbb {C}\) and the sequence of functions \(\phi _N = \sqrt{2}\sin (2\pi N x)\), we have

$$\begin{aligned} \mathcal {K}\phi _N - \lambda \phi _N = \phi _N - \lambda \phi _N = (1-\lambda )\sqrt{2}\sin (2\pi N x)\xrightarrow {w}0 \end{aligned}$$

with \(\Vert \phi _N\Vert ^2 = \int _{0}^1 2\sin ^2(2\pi N x)\,dx = 1\). Therefore, if \(\phi _N\) were the eigenfunctions of \(\mathcal {K}_N\) with eigenvalues \(\lambda _N\rightarrow \lambda \ne 1\), then the sequence \(\lambda _N\) would converge to a spurious eigenvalue \(\lambda \). Fortunately, in this case, we have \(\sigma (\mathcal {K}_N) = \{ 1\}\) and hence no spurious eigenvalues exist; however, in general, we cannot rule out this possibility, at least not as far as the statement of Theorem 4 goes. See Fig. 1 for illustration.

This example, with the highly oscillatory functions \(\phi _N\), may motivate practical considerations in detecting spurious eigenvalues, e.g., using Sobolev-type (pseudo)norms \(\int _{\mathcal {M}} \Vert \nabla \phi _N\Vert ^2\,\mathrm{d}\mu \) or other measures of oscillation. See, e.g., Giannakis (2016) for the use of Sobolev norms in the context of Koopman data analysis and forecasting.
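The behavior of these oscillatory functions can be checked numerically. The sketch below is not part of the paper's development; the test observable \(f = e^x\) and the grid resolution are arbitrary choices. It confirms that \(\Vert \phi _N\Vert _{L_2}\) stays at one, the weak pairings \(\langle \phi _N,f\rangle \) decay, and the Sobolev seminorm grows proportionally to N:

```python
import numpy as np

# phi_N(x) = sqrt(2) sin(2 pi N x) on [0, 1]; integrals with respect to
# the Lebesgue measure are approximated by averages over a uniform grid
x = np.linspace(0.0, 1.0, 200001)

def phi(N):
    return np.sqrt(2.0) * np.sin(2.0 * np.pi * N * x)

f = np.exp(x)  # an arbitrary smooth test observable (our choice)

norms, pairings, sobolevs = [], [], []
for N in (1, 10, 100):
    p = phi(N)
    norms.append(np.sqrt(np.mean(p**2)))        # L2 norm: stays ~ 1
    pairings.append(np.mean(p * f))             # weak pairing <phi_N, f>: decays
    sobolevs.append(np.sqrt(np.mean(np.gradient(p, x)**2)))  # grows ~ 2 pi N
```

The Sobolev seminorm grows without bound while the weak pairings vanish, which is exactly the contrast a practical spurious-eigenvalue filter of the kind mentioned above would exploit.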

As an immediate corollary of Theorem 4, we get:

Fig. 1

Graph of the functions \(\phi _N(x) = \sqrt{2}\sin (2\pi Nx)\) for \(N = 1\) and \(N=10\). These functions satisfy \(\Vert \phi _N\Vert _{L_2}=1\) and \(\phi _N\xrightarrow {w}0\) as \(N\rightarrow \infty \). If such \(\phi _N\) happen to be eigenfunctions of \(\mathcal {K}_N\) with eigenvalues \(\lambda _N\), then the accumulation points of the sequence \((\lambda _N)_{N=1}^\infty \) need not be eigenvalues of the Koopman operator

Corollary 1

If Assumption 2 holds and \(\lambda _{N,M}\) is a sequence of eigenvalues of \(\mathcal {K}_{N,M}\) with the associated normalized eigenfunctions \(\phi _{N,M} \in \mathcal {F}_N\), \(\Vert \phi _{N,M} \Vert =1\), then there exists a subsequence \((\lambda _{N_i,M_j},\phi _{N_i,M_j})\) such that with probability one

$$\begin{aligned} \lim _{i\rightarrow \infty }\lim _{j\rightarrow \infty }\lambda _{N_i,M_j} = \lambda ,\quad \lim _{i\rightarrow \infty }\lim _{j\rightarrow \infty }\langle \phi _{N_i,M_j},f\rangle = \langle \phi ,f\rangle , \end{aligned}$$

for all \(f \in \mathcal {F}\), where \(\lambda \in \mathbb {C}\) and \(\phi \in \mathcal {F}\) are such that \(\mathcal {K}\phi = \lambda \phi \). In particular, if \(\Vert \phi \Vert \ne 0\), then \(\lambda \) is an eigenvalue of \(\mathcal {K}\) with eigenfunction \(\phi \).

Proof

First, notice that since \(\mathcal {K}_{N,M} \rightarrow \mathcal {K}_N\) in the operator norm (Theorem 2) and \(\Vert \mathcal {K}_N\Vert \le \Vert \mathcal {K}\Vert < \infty \), the sequence \(\lambda _{N,M}\) is bounded. Since the \(\phi _{N,M}\) are normalized and hence bounded, we can extract a subsequence \((\lambda _{N,M_j},\phi _{N,M_j})\) such that \(\lim _{j\rightarrow \infty } \lambda _{N,M_j}=\lambda _N \in \mathbb {C}\) and \(\lim _{j\rightarrow \infty } \phi _{N,M_j} = \phi _N \in \mathcal {F}_N\) (strong convergence, as \(\mathcal {F}_N\) is finite-dimensional). Then,

$$\begin{aligned} \mathcal {K}_N\phi _N = (\mathcal {K}_N - \mathcal {K}_{N,M_j})\phi _N + \mathcal {K}_{N,M_j}(\phi _N-\phi _{N,M_j}) + \mathcal {K}_{N,M_j}\phi _{N,M_j}. \end{aligned}$$

Since \(\mathcal {K}_{N,M_j}\) converges strongly to \(\mathcal {K}_N\) with probability one (Theorem 2) and since \(\phi _{N,M_j}\) converges strongly to \(\phi _N\), the first two terms go to zero with probability one as j tends to infinity. The last term is equal to \(\lambda _{N,M_j} \phi _{N,M_j}\) and hence necessarily \(\mathcal {K}_N\phi _N = \lambda _N \phi _N\), \(\Vert \phi _N\Vert =1\), with probability one. The result then follows from Theorem 4. \(\square \)

6 Implications for Finite-Horizon Predictions

One of the main roles of an approximation to the Koopman operator is to provide a prediction of the evolution of a given observable. Whereas accurate predictions over an infinite-time horizon cannot in general be expected from the EDMD approximation of the Koopman operator, a prediction over any finite horizon is asymptotically exact, as \(N\rightarrow \infty \), when the prediction error is measured in the \(L_2(\mu )\) norm:

Theorem 5

Let \(f\in \mathcal {F}^n\) be a given (vector) observable and let Assumption 2 hold. Then, for any \(\varOmega \in \mathbb {N}\) we have

$$\begin{aligned} \lim _{N\rightarrow \infty } \sup _{i\in \{1,\ldots ,\varOmega \}} \Vert (\mathcal {K}_N)^i P_N^\mu f - \mathcal {K}^i f \Vert = 0. \end{aligned}$$
(19)

In particular, if \(f\in \mathcal {F}_{N_0}^n\) for some \(N_0 \in \mathbb {N}\), then

$$\begin{aligned} \lim _{N\rightarrow \infty } \sup _{i\in \{1,\ldots ,\varOmega \}} \Vert (\mathcal {K}_N)^i f - \mathcal {K}^i f \Vert = 0. \end{aligned}$$
(20)

Proof

We proceed by induction. Let \(f\in \mathcal {F}\). For \(\varOmega =1\), the result is exactly Theorem 3. Let the result hold for some \(\varOmega \in \mathbb {N}\). It is sufficient to prove that \(\Vert (\mathcal {K}_N)^{\varOmega +1} P_N^\mu f - \mathcal {K}^{\varOmega +1} f \Vert \rightarrow 0\) as \(N\rightarrow \infty \). We have

$$\begin{aligned} \Vert (\mathcal {K}_N)^{\varOmega +1} P_N^\mu f - \mathcal {K}^{\varOmega +1} f \Vert&= \Vert \mathcal {K}_N(\mathcal {K}_N)^{\varOmega } P_N^\mu f - \mathcal {K}\mathcal {K}^{\varOmega } f \Vert = \Vert \mathcal {K}_N g_N - \mathcal {K}g \Vert \\&\le \Vert \mathcal {K}_N g- \mathcal {K}g \Vert + \Vert \mathcal {K}_N(g_N - g) \Vert \\&\le \Vert \mathcal {K}_N g- \mathcal {K}g \Vert + \Vert \mathcal {K}\Vert \Vert g_N - g \Vert , \end{aligned}$$

where \(g_N = (\mathcal {K}_N)^{\varOmega } P_N^\mu f\) and \(g = \mathcal {K}^{\varOmega } f\); in the last inequality we used the fact that \(\Vert \mathcal {K}_N\Vert \le \Vert \mathcal {K}\Vert \). The term \(\Vert \mathcal {K}_N g- \mathcal {K}g \Vert \) tends to zero by Theorem 3, whereas the term \(\Vert g_N - g \Vert \rightarrow 0\) by the induction hypothesis. This proves (19) for a scalar observable \(f\in \mathcal {F}\). The general result with a vector-valued observable \(f\in \mathcal {F}^n\) follows by applying the same reasoning to each component of f. The result (20) follows from (19) since if \(f\in \mathcal {F}_{N_0}^n\) for some \(N_0\in \mathbb {N}\), then \(P_N^\mu f = f\) for \(N\ge N_0\). \(\square \)

To be more specific about the practical use of \(\mathcal {K}_N\) for prediction, assume that \(f\in \mathcal {F}_{N_0}^n\) for some \(N_0\in \mathbb {N}\). Then, for all \(N\ge N_0\) there exists a matrix \(C_N\in \mathbb {R}^{n \times N}\) such that \(f = C_N\varvec{\psi }_N\), where \(\varvec{\psi }_N = [\psi _1,\ldots ,\psi _N]^\top \) are the observables used in EDMD. Assume that an initial state \(x_0\) is given, that the values of the observables \(\varvec{\psi }_N(x_0)\) are known, and that we wish to predict the value of the observable f at the state \(x_i = T^i(x_0)\), i.e., i steps ahead in the future. Using \(\mathcal {K}_N\), this prediction is given by \(C_N A_N^i \varvec{\psi }_N(x_0)\), where

$$\begin{aligned} A_N = \lim _{M\rightarrow \infty } A_{N,M} \end{aligned}$$

with \(A_{N,M}\) defined in (5). Theorem 5 then says that

$$\begin{aligned} \lim _{N\rightarrow \infty } \int _{\mathcal {M}} \Vert C_N A_N^i \varvec{\psi }_N - f\circ T^i \Vert ^2_2\,\mathrm{d}\mu = 0\quad \forall \, i\in \mathbb {N}. \end{aligned}$$
(21)

A typical application of Theorem 5 is the prediction of the future state x of the dynamical system (1) with a finite-dimensional state space \(\mathcal {M}\subset \mathbb {R}^n\). In this case, one simply sets \(f(x) = x\). A crucial feature of the predictor obtained in this way is its linearity in the “lifted state” \(z=\varvec{\psi }_N(x)\), allowing linear tools to address a nonlinear problem. This concept was successfully applied to model predictive control in Korda and Mezić (2016) and to state estimation in Surana and Banaszuk (2016).
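The linear-predictor construction \(C_N A_N^i \varvec{\psi }_N(x_0)\) can be sketched in code. In the snippet below, the map \(x^+ = 2x^2-1\) (the example of Sect. 7), the Legendre dictionary of degree eight, the sample size, and the use of the sampled matrix \(A_{N,M}\) as a finite-M stand-in for \(A_N\) are all illustrative choices, not prescribed by the text. Since \(f\circ T^i\) is a polynomial of degree \(2^i\) and the dictionary contains polynomials up to degree eight, the prediction here is exact (up to rounding) for \(i\le 3\):

```python
import numpy as np

T = lambda x: 2.0 * x**2 - 1.0   # illustrative map (example of Sect. 7)
n_basis = 9                      # dictionary: Legendre polynomials P_0, ..., P_8

def psi(x):
    # dictionary psi_N evaluated column-wise: shape (n_basis, number of points)
    return np.polynomial.legendre.legvander(np.atleast_1d(x), n_basis - 1).T

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, 10000)   # samples from mu = uniform on [-1, 1]
PX, PY = psi(X), psi(T(X))
A = PY @ np.linalg.pinv(PX)         # sampled EDMD matrix A_{N,M}, proxy for A_N

C = np.zeros((1, n_basis)); C[0, 1] = 1.0  # f(x) = x = P_1(x), so f = C psi

x0, x_true, preds, trues = 0.3, 0.3, [], []
for i in range(1, 4):
    x_true = T(x_true)                       # true i-step iterate of T
    preds.append((C @ np.linalg.matrix_power(A, i) @ psi(x0)).item())
    trues.append(x_true)
```

The prediction is linear in the lifted state \(z = \varvec{\psi }_N(x_0)\); only the matrix power \(A^i\) changes with the horizon, which is what makes the predictor amenable to linear control and estimation tools.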

Remark 2

If \(A_{N,M}\) is used instead of \(A_N\) in Theorem 5 and Eq. (21), then the same convergence results hold with a double limit, first taking the number of samples M to infinity and then the number of basis functions N. In particular, we get

$$\begin{aligned} \lim _{N\rightarrow \infty } \lim _{M\rightarrow \infty } \int _{\mathcal {M}} \Vert C_N A_{N,M}^i \varvec{\psi }_N - f\circ T^i \Vert ^2_2\,\mathrm{d}\mu = 0\quad \forall \, i\in \mathbb {N}. \end{aligned}$$
(22)

7 Analytic EDMD

The results of the previous sections suggest a variation of the EDMD algorithm, provided that the mapping T is known in closed form and that the basis functions \(\psi _i\) are such that the integrals \(\int _{\mathcal {M}} \psi _i \overline{\psi _j}\,\mathrm{d}\mu \) and \(\int _{\mathcal {M}} (\psi _i\circ T)\overline{\psi _j}\,\mathrm{d}\mu \) can be computed analytically. This is the case in particular if T and \(\psi _i\) are simple functions such as multivariate polynomials or trigonometric functions and \(\mu \) is the uniform distribution over a simple domain \(\mathcal {M}\) such as a box or a ball, or, e.g., a Gaussian distribution over \(\mathbb {R}^n\).

Provided that such analytical evaluation is possible, one can circumvent the sampling step of EDMD and construct \(\mathcal {K}_N\) directly, rather than \(\mathcal {K}_{N,M}\). Indeed, define

$$\begin{aligned} A_N = M_{T,\mu } M_{\mu }^{-1}, \end{aligned}$$
(23)

where

$$\begin{aligned} M_{\mu } = \int _{\mathcal {M}} \varvec{\psi }\varvec{\psi }^{{\mathsf {H}}}\,\mathrm{d}\mu ,\quad M_{T,\mu } = \int _{ \mathcal {M}}(\varvec{\psi }\circ T)\varvec{\psi }^{{\mathsf {H}}}\,\mathrm{d}\mu . \end{aligned}$$

Then, the operator from \(\mathcal {F}_N\) to \(\mathcal {F}_N\) defined by \(c^{{\mathsf {H}}}\varvec{\psi }\mapsto c^{{\mathsf {H}}}A_N\varvec{\psi }\) is exactly \(\mathcal {K}_N = P_N^\mu \mathcal {K}_{|\mathcal {F}_N}\).

Theorem 6

If the matrix \(M_\mu \) is invertible, then for any \(\phi = c_\phi ^{{\mathsf {H}}}\varvec{\psi }\in \mathcal {F}_N\) we have

$$\begin{aligned} c_\phi ^{{\mathsf {H}}}A_N\varvec{\psi }= \mathcal {K}_N\phi . \end{aligned}$$

Proof

Given \(\phi = c_\phi ^{{\mathsf {H}}}\varvec{\psi }\), we get

$$\begin{aligned} \mathcal {K}_N\phi= & {} P_N^\mu \mathcal {K}\phi = \varvec{\psi }^{{\mathsf {H}}}\mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{c\in \mathbb {C}^N}\int _{\mathcal {M}}\vert c^{{\mathsf {H}}}\varvec{\psi }- c_\phi ^{{\mathsf {H}}}(\varvec{\psi }\circ T)\vert ^2\,\mathrm{d}\mu \\= & {} \varvec{\psi }^{{\mathsf {H}}}\mathop {{\mathrm{arg}}\,{\mathrm{min}}}\limits _{c\in \mathbb {C}^N}\big [ c^{{\mathsf {H}}}M_{\mu } c - 2{\mathrm{Re}}\big \{c^{{\mathsf {H}}}M_{T,\mu }^{{\mathsf {H}}}c_{\phi }\big \}\big ] \end{aligned}$$

with the unique minimizer \(c = M_{\mu }^{-1}M_{T,\mu }^{{\mathsf {H}}}c_\phi \) (since \(M_\mu \) is Hermitian positive semidefinite by construction and invertible by assumption, it is positive definite, and hence the minimized function is strictly convex). Therefore, as desired,

$$\begin{aligned} \mathcal {K}_N\phi = c^{{\mathsf {H}}}\varvec{\psi }= c_\phi ^{{\mathsf {H}}}M_{T,\mu } M_{\mu }^{-1}\varvec{\psi }= c_\phi ^{{\mathsf {H}}}A_N \varvec{\psi }. \end{aligned}$$

\(\square \)

Example

In order to demonstrate the use of Analytic EDMD, we compare the spectra of \(\mathcal {K}_N\) and \(\mathcal {K}_{N,M}\) for various values of M. The system considered is the logistic map

$$\begin{aligned} x^+ = 2x^2 - 1,\quad x \in [-1,1]. \end{aligned}$$

The measure \(\mu \) is taken to be the uniform distribution on \([-\,1,1]\), which is not invariant; hence, the dynamics is not measure preserving. The finite-dimensional subspace \(\mathcal {F}_N\) is the space of all polynomials of degree no more than eight. For numerical stability, we chose as a basis of this subspace the Legendre polynomials, scaled such that they are orthonormal with respect to the uniform measure on \([-\,1,1]\). The spectra of \(\mathcal {K}_N\) and \(\mathcal {K}_{N,M}\) for \(M=10^2\), \(M=10^3\) and \(M=10^5\) are depicted in Fig. 2. We observe that a relatively large number of samples M is required to obtain an accurate approximation of the spectrum of \(\mathcal {K}_{N}\). This example demonstrates that special care must be taken in practice when drawing conclusions about spectral quantities based on a computation with a small number of samples. On the other hand, Fig. 3 suggests that, at least in this example, predictions generated by \(\mathcal {K}_{N,M}\) are less affected by sampling. Indeed, even for \(M=100\) the prediction accuracy of \(\mathcal {K}_{N,M}\) is comparable to that of \(\mathcal {K}_N\), and for \(M=1000\) the two predictions almost coincide.
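A sketch of this construction is given below, with two assumptions of our own: Gauss–Legendre quadrature stands in for closed-form evaluation of the entries of \(M_\mu \) and \(M_{T,\mu }\) (for these polynomial integrands a 64-node rule is exact up to rounding), and the orthonormal polynomial basis is built explicitly from Legendre polynomials. The eigenvalue 1, corresponding to the constant observable, must appear in both spectra:

```python
import numpy as np

T = lambda x: 2.0 * x**2 - 1.0   # the map of the example
deg = 8

def psi(x):
    # Legendre polynomials scaled to be orthonormal with respect to the
    # uniform probability measure on [-1, 1]
    V = np.polynomial.legendre.legvander(np.atleast_1d(x), deg)
    return (V * np.sqrt(2.0 * np.arange(deg + 1) + 1.0)).T

# 64-node Gauss-Legendre quadrature integrates polynomials up to degree 127
# exactly, so M_mu and M_{T,mu} below are exact up to rounding -- a stand-in
# for closed-form evaluation of the integrals
nodes, w = np.polynomial.legendre.leggauss(64)
w = w / 2.0                      # weights for the uniform probability measure

P, PT = psi(nodes), psi(T(nodes))
M_mu = (P * w) @ P.T             # Gram matrix: ~ identity, by orthonormality
M_Tmu = (PT * w) @ P.T
A_N = M_Tmu @ np.linalg.inv(M_mu)

# sampled EDMD matrix A_{N,M}; its spectrum approaches that of A_N as M grows
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, 100000)
A_NM = psi(T(X)) @ np.linalg.pinv(psi(X))

eig_N = np.linalg.eigvals(A_N)
eig_NM = np.linalg.eigvals(A_NM)
```

Comparing the two eigenvalue sets for several sample sizes M reproduces the qualitative picture described above: the bulk of the sampled spectrum drifts toward that of \(A_N\) only slowly as M grows.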

Fig. 2

Comparison of the spectra of \(\mathcal {K}_N\) (blue circles) computed using Analytic EDMD and the spectra of \(\mathcal {K}_{N,M}\) for different values of M (red crosses)

Fig. 3

Comparison of predictions generated using \(\mathcal {K}_N\) and using \(\mathcal {K}_{N,M}\) for different values of M

8 Convergence of \(\mathcal {K}_{N,N}\)

In this section, we investigate what happens when the number of basis functions N and the number of samples M are increased simultaneously. We treat the special case \(M = N\), for which interesting conclusions can be drawn. Set therefore \(M = N\), denote by \(\lambda _N = \lambda _{N,N}\) any eigenvalue of \(\mathcal {K}_{N,N}\) and by \(\phi _N = \phi _{N,N} \in \mathcal {F}_N\), \(\Vert \phi _N\Vert _{C(\mathcal {M})} =1\), the associated eigenfunction, where \(\Vert \phi \Vert _{C(\mathcal {M})} = \sup _{x\in \mathcal {M}} |\phi (x)|\); such a normalization is possible if the basis functions \(\psi _i\) are continuous and \(\mathcal {M}\) compact, which we assume in this section. First notice that, assuming \(M_{{\hat{\mu }}_N}\) is invertible, for \(N=M\) the system of equations

$$\begin{aligned} \varvec{\psi }(\varvec{Y}) = A\varvec{\psi }(\varvec{X}) \end{aligned}$$

with the unknown \(A\in \mathbb {R}^{N\times N}\) has a solution and hence the minimum in the least-squares problem (4) is zero. In other words, for any \(f\in \mathcal {F}_N\), the EDMD operator \(\mathcal {K}_{N,N}\) applied to f matches the value of the Koopman operator applied to f at the sample points \(x_1,\ldots ,x_N\):

$$\begin{aligned} (\mathcal {K}f)(x_i) = (\mathcal {K}_{N,N}f)(x_i) \end{aligned}$$

for all \(f\in \mathcal {F}_N\). In particular, applying this relation to the eigenfunctions \(\phi _N\) of \(\mathcal {K}_{N,N}\) yields

$$\begin{aligned} (\phi _N\circ T)(x_i) = \lambda _N \phi _N(x_i). \end{aligned}$$
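This interpolation property is easy to verify numerically. In the sketch below, the map, the Legendre dictionary and the Chebyshev-node sample set are our own illustrative choices (any N points making \(\varvec{\psi }(\varvec{X})\) invertible would do; Chebyshev nodes merely keep the matrix well conditioned). The residual of \((\mathcal {K}f)(x_i) - (\mathcal {K}_{N,N}f)(x_i)\) over the sample points vanishes up to rounding:

```python
import numpy as np

T = lambda x: 2.0 * x**2 - 1.0   # illustrative map (as in the example of Sect. 7)
N = 9

def psi(x):
    # dictionary evaluated column-wise: shape (N, number of points)
    return np.polynomial.legendre.legvander(np.atleast_1d(x), N - 1).T

# exactly N sample points, chosen as Chebyshev nodes for good conditioning
X = np.cos((2.0 * np.arange(N) + 1.0) * np.pi / (2.0 * N))
PX, PY = psi(X), psi(T(X))
A = PY @ np.linalg.inv(PX)       # psi(Y) = A psi(X) solved exactly, M = N

c = np.arange(1.0, N + 1.0)      # some f = c^T psi in F_N (arbitrary coefficients)
Kf_at_X = c @ psi(T(X))          # (K f)(x_i) = f(T(x_i))
KNNf_at_X = c @ A @ psi(X)       # (K_{N,N} f)(x_i)
residual = float(np.max(np.abs(Kf_at_X - KNNf_at_X)))
```

Note that the match holds only at the sample points; off the samples, \(\mathcal {K}_{N,N}f\) and \(\mathcal {K}f\) generally differ, which is what the functional-analytic argument that follows must contend with.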

Multiplying the eigenfunction relation \((\phi _N\circ T)(x_i) = \lambda _N \phi _N(x_i)\) by \(h(x_i)\) for an arbitrary \(h\in C(\mathcal {M})\) and integrating with respect to the empirical measure (8) supported on the sample points, we get

$$\begin{aligned} \int _{\mathcal {M}} h\cdot (\phi _{N}\circ T)\, \mathrm{d}{\hat{\mu }}_{N} = \lambda _{N} \int _{\mathcal {M}} h \phi _{N} \, \mathrm{d}{\hat{\mu }}_{N}. \end{aligned}$$
(24)

Define the linear functional \(L_{N}{:}\,C(\mathcal {M})\rightarrow \mathbb {C}\) by

$$\begin{aligned} L_N(h) = \int _{\mathcal {M}} h\, \phi _{N}\, \mathrm{d}\hat{\mu }_{N}, \end{aligned}$$

and

$$\begin{aligned} (\mathcal {K}L_N)(h) = \int _{\mathcal {M}} h\cdot (\phi _{N}\circ T)\, \mathrm{d}{\hat{\mu }}_{N}. \end{aligned}$$

With this notation, the relationship (24) becomes

$$\begin{aligned} \mathcal {K}L_N = \lambda _{N} L_N. \end{aligned}$$

Since \(\Vert \phi _{N}\Vert _{C(\mathcal {M})} = 1\), we have \(\Vert L_N \Vert = \sup _{h\in C(\mathcal {M})} \frac{|L_N(h)|}{\Vert h \Vert _{C(\mathcal {M})}} \le 1\) and \(\Vert \mathcal {K}L_N \Vert \le 1\). Therefore, assuming separability of \(C(\mathcal {M})\), by the Banach–Alaoglu theorem (e.g., Rudin 1973, Theorems 3.15, 3.17) there exists a subsequence along which these functionals converge in the weak\(^\star \) topology to some functionals \(L \in C(\mathcal {M})^\star \) and \(\mathcal {K}L\in C(\mathcal {M})^\star \) satisfying

$$\begin{aligned} \mathcal {K}L = \lambda L, \end{aligned}$$

where \(\lambda \) is an accumulation point of \(\lambda _N\). Furthermore, by the Riesz representation theorem the bounded linear functionals L and \(\mathcal {K}L\) can be represented by complex-valued measures \(\nu \) and \(\mathcal {K}\nu \) on \(\mathcal {M}\) satisfying

$$\begin{aligned} \mathcal {K}\nu = \lambda \nu . \end{aligned}$$

We remark that \(\mathcal {K}L\) and \(\mathcal {K}\nu \) are here merely symbols for the weak\(^\star \) limit of \(\mathcal {K}L_N\) and its representation as a measure; in particular, the functional \(\mathcal {K}L\) is not necessarily of the form \((\mathcal {K}L)(h)=\int _{\mathcal {M}} h\cdot (\rho \circ T)\,\mathrm{d}\mu \) for some function \(\rho \).

To gain more insight into \(\mathcal {K}\nu \) (and hence \(\mathcal {K}L\)), we need to impose additional assumptions on the structure of the problem. In particular, we assume that the mapping \(T{:}\,\mathcal {M}\rightarrow \mathcal {M}\) is a homeomorphism and that the points \(x_1,\ldots ,x_N\) lie on a single trajectory, i.e., \(x_{i+1} = T(x_i)\). With these assumptions, Eq. (24) reads

$$\begin{aligned} \frac{1}{N} \sum _{i=1}^N h(x_i)\phi _N(x_{i+1}) = \lambda _{N} \frac{1}{N} \sum _{i=1}^N h(x_i)\phi _N(x_i), \end{aligned}$$
(25)

where we set \(x_{N+1}:= T(x_N)\). The left-hand side of (25) is

$$\begin{aligned} \frac{1}{N} \sum _{i=1}^N h(x_i)\phi _N(x_{i+1})&= \frac{1}{N} \sum _{i=1}^N h(T^{-1} x_i)\phi _N(x_i) + \frac{1}{N}\big [h(x_N)\phi _N(x_{N+1})\nonumber \\&\quad -\,h(T^{-1}x_1)\phi _N(x_1)\big ]\nonumber \\&= \int _{\mathcal {M}} h\circ T^{-1} \, \mathrm{d}\nu _N + \frac{1}{N}\big [h(x_N)\phi _N(x_{N+1}) - h(T^{-1}x_1)\phi _N(x_1)\big ], \end{aligned}$$
(26)

where \(\nu _N\) is the measure \(\phi _N \mathrm{d}{\hat{\mu }}_N\). Setting \(\xi _N:=h(x_N)\phi _N(x_{N+1}) - h(T^{-1}x_1)\phi _N(x_1)\), the relation (25) becomes

$$\begin{aligned} \int _{\mathcal {M}} h\circ T^{-1} \, \mathrm{d}\nu _N + \frac{1}{N}\xi _N = \lambda _N\int _{\mathcal {M}} h\,\mathrm{d}\nu _N. \end{aligned}$$

Since h is bounded on \(\mathcal {M}\) (h is continuous and \(\mathcal {M}\) compact) and \(\Vert \phi _N\Vert _{C(\mathcal {M})} = 1\), the term \(\xi _N\) is bounded; in addition, \(h\circ T^{-1}\) is continuous since T is a homeomorphism by assumption. Therefore, taking the limit on both sides along a subsequence such that \(\nu _{N_i}\rightarrow \nu \) weakly, \({\hat{\mu }}_{N_i}\rightarrow \mu \) weakly and \(\lambda _{N_i}\rightarrow \lambda \), we obtain

$$\begin{aligned} \int _{\mathcal {M}} h\circ T^{-1} \, \mathrm{d}\nu = \lambda \int _{\mathcal {M}} h\,\mathrm{d}\nu \end{aligned}$$
(27)

for all \(h\in C(\mathcal {M})\).

8.1 Weak Eigenfunctions/Eigendistributions

To understand relation (27), note that a completely analogous computation to (26) shows that the measure \(\mu \) is invariant and therefore the \(L_2(\mu )\)-adjoint \(\mathcal {K}^\star \) of the Koopman operator [viewed as an operator from \(L_2(\mu ) \) to \(L_2(\mu )\)] is given by

$$\begin{aligned} \mathcal {K}^\star f = f\circ T^{-1}. \end{aligned}$$

To see this, write

$$\begin{aligned} \langle \mathcal {K}f,g \rangle= & {} \int _{\mathcal {M}} (f\circ T) \overline{g} \,\mathrm{d}\mu = \int _{\mathcal {M}} (f\circ T)\overline{(g\circ T^{-1}\circ T)} \,\mathrm{d}\mu \\= & {} \int _{\mathcal {M}} f \cdot \overline{(g\circ T^{-1} )}\,\mathrm{d}\mu = \langle f, \mathcal {K}^\star g\rangle , \end{aligned}$$

which means the operator \(g\mapsto g\circ T^{-1}\) is indeed the \(L_2(\mu )\)-adjoint of \(\mathcal {K}\). The relation (27) then becomes

$$\begin{aligned} \int _{\mathcal {M}} \mathcal {K}^\star h \, \mathrm{d}\nu = \lambda \int _{\mathcal {M}} h\,\mathrm{d}\nu \end{aligned}$$
(28)

or

$$\begin{aligned} L(\mathcal {K}^\star h) = \lambda L(h). \end{aligned}$$
(29)

Functionals of the form (29) were called “generalized eigenfunctions” by Gelfand and Shilov (1964); here, we prefer to call them “weak eigenfunctions” or “eigendistributions” in order to avoid confusion with generalized eigenfunctions viewed as an extension of the notion of generalized eigenvectors from linear algebra. The measure \(\nu \) in (28) is then called “eigenmeasure.” Here, again, we emphasize the requirement that the limiting functional L (or the measure \(\nu \)) be nonzero in order for these objects to be called eigenfunctionals/eigenmeasures.
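The adjoint formula \(\mathcal {K}^\star h = h\circ T^{-1}\) underlying (28) can be sanity checked on a simple measure-preserving homeomorphism. The circle rotation below is our own illustrative choice (it is not the setting of the rest of this section), and the \(L_2\) pairings are approximated by midpoint sums:

```python
import numpy as np

alpha = np.sqrt(2.0) - 1.0            # irrational rotation number
T = lambda x: (x + alpha) % 1.0       # measure-preserving homeomorphism of the circle
Tinv = lambda x: (x - alpha) % 1.0    # its inverse

x = (np.arange(200000) + 0.5) / 200000   # midpoint grid on [0, 1)
f = lambda t: np.sin(2.0 * np.pi * t) + t**2         # arbitrary test observables
g = lambda t: np.cos(2.0 * np.pi * t) * np.exp(-t)

lhs = np.mean(f(T(x)) * g(x))         # <K f, g>
rhs = np.mean(f(x) * g(Tinv(x)))      # <f, g o T^{-1}> = <f, K* g>
gap = abs(lhs - rhs)                  # should vanish up to quadrature error
```

The two pairings agree up to the quadrature error of the midpoint sums, consistent with the change-of-variables argument in the text.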

8.2 Eigenmeasures of Perron–Frobenius

We also observe an interesting connection to eigenmeasures of the Perron–Frobenius operator. To see this, set \(h:= g\circ T\) in (27) to obtain \( \int _{\mathcal {M}}g\,\mathrm{d}\nu = \lambda \int _{\mathcal {M}} g\circ T\,\mathrm{d}\nu \) or, provided that \(\lambda \ne 0\),

$$\begin{aligned} \int _{\mathcal {M}} g\circ T\,\mathrm{d}\nu = \frac{1}{\lambda } \int _{\mathcal {M}}g\,\mathrm{d}\nu . \end{aligned}$$
(30)

In other words, if nonzero, the measure \(\nu \) is an eigenmeasure of the Perron–Frobenius operator with eigenvalue \(1/\lambda \). Here, the Perron–Frobenius operator \(\mathcal {P}{:}\,M(\mathcal {M})\rightarrow M(\mathcal {M})\), where \(M(\mathcal {M})\) is the space of all complex-valued measures on \(\mathcal {M}\), is defined for every \(\eta \in M(\mathcal {M})\) and every Borel set A by

$$\begin{aligned} (\mathcal {P}\eta )(A) = \eta (T^{-1}(A)). \end{aligned}$$

The results of Sect. 8 are summarized in the following theorem:

Theorem 7

Suppose that \(\mathcal {M}\) is a compact metric space, T is a homeomorphism, \(\mathcal {K}{:}\,L_2(\mu )\rightarrow L_2(\mu )\) is bounded, the observables \(\psi _1,\ldots ,\psi _N\) are continuous and the sample points \(x_1,\ldots , x_N\) satisfy \(x_{i+1} = T(x_i)\) for all \(i\in \{1,\ldots ,N-1\}\). Let \(\lambda _N\) be a bounded sequence of eigenvalues of \(\mathcal {K}_{N,N}\), let \(\phi _N\), \(\Vert \phi _N\Vert _{C(\mathcal {M})}=1\), be the associated normalized eigenfunctions and denote \(\nu _N = \phi _N \mathrm{d}{\hat{\mu }}_N\). Then, there exists a subsequence \((N_i)_{i=1}^\infty \) such that \(\nu _{N_i}\) and \({\hat{\mu }}_{N_i}\) converge weakly to complex-valued measures \(\nu \in M(\mathcal {M})\) and \(\mu \in M(\mathcal {M})\), respectively, and \(\lim _{i\rightarrow \infty } \lambda _{N_i } = \lambda \in \mathbb {C}\), such that

$$\begin{aligned} \int _{\mathcal {M}} h\circ T^{-1} \, \mathrm{d}\nu = \lambda \int _{\mathcal {M}} h\,\mathrm{d}\nu \quad \forall \,h\in C(\mathcal {M}). \end{aligned}$$

In addition, the measure \(\mu \) is invariant under the action of T and

$$\begin{aligned} \int _{\mathcal {M}} \mathcal {K}^\star h \, \mathrm{d}\nu = \lambda \int _{\mathcal {M}} h\,\mathrm{d}\nu \quad \forall \,h\in C(\mathcal {M}), \end{aligned}$$

where \(\mathcal {K}^\star \) is the \(L_2(\mu )\) adjoint of \(\mathcal {K}\), i.e., if nonzero, \(\nu \) is a weak eigenfunction (or eigendistribution) of the Koopman operator. Furthermore, if \(\lambda \ne 0\), then

$$\begin{aligned} \int _{\mathcal {M}} h\circ T\,\mathrm{d}\nu = \frac{1}{\lambda } \int _{\mathcal {M}}h\,\mathrm{d}\nu \quad \forall \,h\in C(\mathcal {M}), \end{aligned}$$

i.e., if nonzero, \(\nu \) is an eigenmeasure of the Perron–Frobenius operator with eigenvalue \(1/\lambda \).

9 Conclusions

This paper analyzes the convergence of the EDMD operator \(\mathcal {K}_{N,M}\), where M is the number of samples and N the number of observables used in EDMD. It was proven in Klus et al. (2016) that as \(M\rightarrow \infty \), the operator \(\mathcal {K}_{N,M}\) converges to \(\mathcal {K}_N\), the orthogonal projection of the action of the Koopman operator onto the span of the observables used in EDMD. We analyzed the convergence of \(\mathcal {K}_N\) as \(N\rightarrow \infty \), obtaining convergence in the strong operator topology to the Koopman operator, as well as weak convergence of the associated eigenfunctions along a subsequence together with the associated eigenvalues. In particular, any accumulation point of the spectra of \(\mathcal {K}_N\) corresponding to a nonzero weak accumulation point of the eigenfunctions lies in the point spectrum of the Koopman operator \(\mathcal {K}\). In addition, we proved convergence of finite-horizon predictions obtained using \(\mathcal {K}_N\) in the \(L_2\) norm, a result important for practical applications such as forecasting, estimation and control. Finally, we analyzed the convergence of \(\mathcal {K}_{N,N}\) (i.e., the situation where the number of samples and the number of basis functions are equal) under the assumption that the sample points lie on a single trajectory. In this case, one obtains convergence, along a subsequence, to a weak eigenfunction (or eigendistribution) of the Koopman operator, provided that the weak limit is nonzero. This eigendistribution turns out to be also an eigenmeasure of the Perron–Frobenius operator. As a by-product of these results, we proposed an algorithm that, under some assumptions, allows one to construct \(\mathcal {K}_N\) directly, without the need for sampling, thereby eliminating the sampling error.

Future work should focus on non-asymptotic analysis, e.g., on selecting the subspace \(\mathcal {F}_N\) such that \(\Vert \mathcal {K}_{N} - \mathcal {K}_{|\mathcal {F}_N} \Vert \) is minimized and at the same time such that \(\mathcal {F}_N\) is rich enough in the sense of containing observables of practical interest (e.g., the state observable). This line of research was already investigated in the context of stochastic systems in Wu and Noé (2017), providing an interesting and actionable method for selecting \(\mathcal {F}_N\).