1 Introduction

Dynamic mode decomposition (DMD) has emerged as an effective method for extracting fundamental governing principles from high-dimensional time series data. The method has been employed successfully in the field of fluid dynamics, where DMD methods have demonstrated an ability to determine dynamic modes, also known as “Koopman modes,” which agree with Proper Orthogonal Decomposition (POD) analyses (cf. Budišić et al. 2012; Črnjarić-Žic et al. 2020; Kutz et al. 2016; Mezić 2005, 2013; Williams et al. 2015a, b). However, DMD methods employing Koopman operators do not address continuous time dynamical systems directly. Instead, current DMD methods analyze discrete time proxies of continuous time systems (Kutz et al. 2016). The discretization process constrains Koopman-based DMD methods to systems that are forward complete (Bittracher et al. 2015). The objective of the present manuscript is to develop DMD methods that avoid discretization of continuous time dynamical systems, while providing convergence results that are stronger than Koopman-based DMD and applicable to a broader class of dynamical systems.

The connection between Koopman operators and DMD relies on the idea that a finite dimensional nonlinear dynamical system can be expressed as a linear operator over an infinite dimensional space. The linear representation enables treatment of the nonlinear system via tools from the theory of linear systems and linear operators. The idea of lifting finite dimensional nonlinear systems into infinite dimensional linear ones has been successfully utilized in the literature to achieve various identification and control objectives; however, a few fundamental limitations severely restrict the class of systems for which the connection between Koopman operators and DMD can be established via lifting to infinite dimensions. In particular, this article focuses on the following limitations.

Existence of Koopman Operators in Continuous Time Systems Consider the continuous time dynamical system given as \(\dot{x} = 1 + x^2\). Discretization of this system with time step 1 yields the discrete dynamics \(x_{i+1} = F(x_i) := \tan (1+\arctan (x_i)).\) It should be immediately apparent that F is not well defined over \({\mathbb {R}}\). In fact, through the consideration of \(x_i = \tan (\pi /2 - 1)\) it can be seen that \(F(x_i)\) is undefined. Since the symbol for a Koopman operator must be defined over the entire domain, there is no well-defined Koopman operator arising from this discretization. Note that the example above is not anecdotal. In addition to commonly used examples in classical works, such as Khalil (2002), mass-action kinetics in thermodynamics (Haddad 2019, Section 6.3), chemical reactions (Tóth et al. 2018, Section 8.4), and species populations (Hallam and Levin 2012, Section 4.2) often give rise to such models. In general, unless the solutions of the continuous time dynamics are constrained to be forward complete, (for example, by assuming that the dynamical systems are globally Lipschitz continuous (Coddington and Levinson 1955, Chapter 1)) the resultant Koopman operator cannot be expected to be well-defined. This observation is validated by Bittracher et al. (2015), but otherwise conditions on the dynamics are largely absent from the literature.
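To make the failure concrete, the following sketch (illustrative only, using the exact flow \(x(t) = \tan (t+\arctan (x_0))\) of the example above) evaluates the discrete-time map F near the problematic point and shows it diverging:

```python
import math

def F(x):
    # Discrete-time proxy of x' = 1 + x^2 with unit time step:
    # the exact flow is x(t) = tan(t + arctan(x0)).
    return math.tan(1.0 + math.atan(x))

x_star = math.tan(math.pi / 2 - 1)  # the problematic initial condition

# As x approaches x* from below, F(x) diverges: the solution starting at
# x* escapes to infinity at time exactly 1, so F(x*) is undefined.
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, F(x_star - eps))
```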

Boundedness of Koopman Operators Even in the case of globally Lipschitz models, results regarding convergence of the DMD operator to the Koopman operator rely on the assumption that the Koopman operator is bounded over a specified RKHS (cf. Korda and Mezić 2018). Boundedness of composition operators, like the Koopman operator, has been an active area of study in the operator theory community. Indeed, it turns out there are very few bounded composition operators over many function spaces. A canonical example is in the study of the Bargmann-Fock space, where only affine symbols yield bounded composition operators and, of those, the compact operators arise from \(F(z) = a z + b\) with \(|a| < 1\) (Carswell et al. 2003). A similar result holds for the native RKHS of the exponential dot product kernel and the native RKHS of the Gaussian radial basis function kernel (Gonzalez et al. 2021, Theorem 1). The implication of these results is that Koopman operators arising from the discretization of continuous time nonlinear systems cannot generally be expected to be bounded.

Practical Utility of Convergence Results In the DMD literature, convergence of the DMD operator to the Koopman operator is typically established in the strong operator topology (SOT). However, as noted in Korda and Mezić (2018), since SOT convergence is the topology of pointwise convergence (Pedersen 2012), it is not sufficient to justify use of the DMD operator to interpolate or extrapolate the system behavior from a collection of samples. Furthermore, by selecting a complete set of observables and adding them one at a time to a finite rank representation of the Koopman operator, pointwise convergence is to be expected. This is a restatement of the more general result that all bounded operators may be approximated by finite rank operators in SOT, which itself is a specialization of a much broader result for topological vector spaces (cf. Pedersen 2012, pg. 172). While Korda and Mezić (2018) also provides theoretically interesting insights into convergence of the eigenvalues and the eigenvectors of the DMD operator to eigenvalues and eigenfunctions of the Koopman operator along a subsequence, without the means to identify the convergent subsequences, practical utility of subsequential convergence is limited. In contrast, norm convergence is uniform convergence for operators, and yields a bound on the error over the kernels corresponding to the entire data set. Thus, a meaningful convergence result would arise from the norm convergence of finite rank representations to Koopman operators. However, this result is only possible for compact Koopman operators, which are virtually nonexistent in applications of interest.

A subset of Liouville operators, called Koopman generators, have been studied as limits of Koopman operators in works such as Cvitanovic et al. (2005), Das and Giannakis (2020), Froyland et al. (2014), Giannakis (2019), Giannakis and Das (2020) and Giannakis et al. (2018). Since Koopman generators are limits of Koopman operators, they also require the assumption of forward completeness on the dynamical system. This discussion brings into question the impact of various approaches to the study of continuous time dynamical systems through discretization and Koopman operators, which all rely on the compactness, boundedness, or existence of Koopman operators.

The present work sidesteps the limiting process, and as a result, the assumptions regarding existence of Koopman operators, through the use of “occupation kernels”. Specifically, occupation kernels shift the burden of approximation from operators to the estimation of occupation kernels from time-series data, which requires much less theoretical overhead. Consequently, Liouville operators may be directly examined via occupation kernels, while avoiding limiting relations involving Koopman operators that might not be well defined for a particular discretization of a continuous time nonlinear dynamical system. As a result, the use of Liouville operators in a DMD routine allows for the study of dynamics that are locally rather than globally Lipschitz.

The action of the adjoint of a Liouville operator on an occupation kernel provides the input-output relationships that enable DMD of time series data. For the adjoint of a Liouville operator to be well defined, the operator must be densely defined over the underlying RKHS (Rosenfeld et al. 2019a, b). As a result, the exact class of dynamical systems that may be studied using Liouville operators depends on the selection of the RKHS. However, the requirement that the Liouville operator must be densely defined is not overly restrictive. For example, on the real valued Bargmann-Fock space, Liouville operators are densely defined for a wide range of dynamics that are expressible as real entire functions (including polynomials, exponentials, sines, and cosines).

Perhaps the strongest case for Liouville operators is the fact that they can be “scaled” to generate compact operators. Section 3 of this paper introduces the idea of scaled Liouville operators as variants of Liouville operators that are compact for a large class of dynamical systems over the Bargmann-Fock space. Scaled Liouville operators make slight adjustments to the data by scaling the trajectories by a single parameter \(a\) with \(|a| < 1\). Through the selection of \(a\) close to 1, scaled Liouville operators yield compact operators that are numerically indistinguishable from the corresponding unbounded Liouville operators over a given compact workspace. More importantly, the DMD procedure performed on scaled Liouville operators yields a sequence of finite rank operators that converge in norm to the scaled Liouville operators (see Theorem 2).

Practical Benefits of the Developed Method In addition to the theoretical benefits of Liouville operators detailed above, there are several practical benefits that arise from the use of occupation kernels and Liouville operators. Quadrature techniques, such as Simpson’s rule, allow for the efficient estimation of occupation kernels while mitigating signal noise (Rosenfeld et al. 2019a, b), and also provide a robust estimation of the action of Liouville operators on occupation kernels. Furthermore, as snapshots are being integrated into trajectories for the generation of occupation kernels, the method presented in this manuscript can naturally incorporate irregularly sampled data.
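As an illustrative sketch of the quadrature step (not code from the manuscript; the Gaussian kernel choice and the helper names are assumptions), an occupation kernel evaluation \(\varGamma _\gamma (x) = \int _0^T K(x,\gamma (t)) dt\) can be estimated from regularly sampled snapshots with composite Simpson's rule:

```python
import numpy as np

def gaussian_kernel(x, y, mu=1.0):
    # Gaussian RBF kernel K(x, y) = exp(-||x - y||^2 / mu)
    return np.exp(-np.sum((x - y) ** 2) / mu)

def occupation_kernel_eval(x, samples, dt, kernel=gaussian_kernel):
    """Estimate Gamma_gamma(x) = int_0^T K(x, gamma(t)) dt from regularly
    sampled snapshots of a trajectory using composite Simpson's rule."""
    vals = np.array([kernel(x, s) for s in samples])
    n = len(vals) - 1          # number of subintervals (must be even)
    w = np.ones(n + 1)
    w[1:-1:2], w[2:-1:2] = 4.0, 2.0
    return dt / 3.0 * w @ vals

# Example: gamma(t) = t on [0, 1]; the estimate matches int_0^1 exp(-t^2) dt.
ts = np.linspace(0.0, 1.0, 101)
samples = [np.array([t]) for t in ts]
est = occupation_kernel_eval(np.array([0.0]), samples, ts[1] - ts[0])
```

Because the snapshots enter only through the quadrature weights, irregularly sampled data can be accommodated by swapping in a quadrature rule for nonuniform nodes.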

In DMD, a large finite dimensional representation of the linear operator is constructed from data (i.e., snapshots) using a collection of observables. A subsystem of relatively small rank is then determined via a singular value decomposition (SVD) and approximation of the linear operator by the small rank subsystem is supported by a direct mapping between the eigenfunctions of the former and the eigenvectors of the latter (Williams et al. 2015b). The fact that the rank of the smaller subsystem is typically in agreement with the number of snapshots, which can be considerably smaller than the number of observables, makes DMD particularly useful when there is a small number of snapshots of a high dimensional system. However, direct application of DMD to high dimensional systems sampled at high frequencies still poses a significant computational challenge, where many snapshots may have to be discarded to produce a computationally tractable problem, as was done in Kutz et al. (2016, Example 2.3). Such systems include mechanical systems with high sampling frequencies (Cichella et al. 2015; Walters et al. 2018), and neurobiological systems recorded via electroencephalography (EEG) where the typical sampling frequencies are of the order of 500 Hz (Gruss and Keil 2019). The methods in the present manuscript replace snapshots with integrals of trajectories of the system. The use of trajectories instead of individual snapshots reduces the dimensionality of the problem without discarding any data.

The developed algorithm also obviates the need for the truncated SVD that is utilized throughout the DMD literature. For example, in Williams et al. (2015b) the truncated SVD is leveraged to convert from a feature space representation of the action of the Koopman operators to an approximation of the Gram matrix and an “interaction matrix.” This stands in opposition to the spirit of the “kernel trick,” where kernel functions are a means to avoid any direct interface with feature space. Following Rosenfeld and Kamalapurkar (2021), the presented algorithm is given purely with respect to the occupation kernels, and the resultant methods are considerably simpler than what is seen in Williams et al. (2015b).

A Comparison with Similar Literature Liouville operators are studied in the context of DMD procedures using limiting definitions in works such as Klus et al. (2020). The manuscript (Klus et al. 2020), which was posted to arXiv around the same time as the first draft of this manuscript, approaches the Koopman generator through Galerkin methods. While signs that the field is expanding beyond Koopman operators are encouraging, the authors of Klus et al. (2020) still adopt the limiting definitions of the Koopman generator in their work, which is an artifact from ergodic theory. Quantities similar to occupation kernels have been studied in the literature previously, in the form of occupation measures and time averaging functionals. Occupation kernels and occupation measures both represent the same functional over different spaces. Occupation measures are in the dual space of the Banach space of continuous functions, while occupation kernels are functions in a RKHS. As such, functions in the RKHS may be estimated through projections onto the span of occupation kernels, and this fact is leveraged in Sect. 4, where finite rank representations of the Liouville operators arise from the matrix representation of a projection operator. Occupation kernels are also distinct from time averaging functionals, where the latter is the average of a sum of iterated applications of the Koopman operator to an observable. As a result, in contrast with occupation kernels, whose definition is independent of Koopman and Liouville operators, time averaging functionals are only useful for the study of globally Lipschitz dynamics.

The relevant preliminary concepts for the theoretical underpinnings of the approach taken in the present manuscript are reviewed in Sect. 2. This includes definitions and properties of RKHSs as well as densely defined operators and their adjoints.

2 Technical Preliminaries

2.1 Reproducing Kernel Hilbert Spaces

Definition 1

A reproducing kernel Hilbert space (RKHS) over a set X is a Hilbert space, H, of functions from X to \({\mathbb {R}}\) such that for each \(x \in X\), the evaluation functional \(E_x g := g(x)\) is bounded.

By the Riesz representation theorem, corresponding to each \(x \in X\), there is a function \(k_x \in H\), such that for all \(g \in H\), \(\langle g, k_x \rangle _H = g(x)\). The kernel function corresponding to H is given as \(K(x,y) = \langle k_y, k_x \rangle _H.\) The kernel function is a positive definite function in the sense that for any finite number of points \(\{ c_1, c_2, \ldots , c_M \} \subset X\), the corresponding Gram matrix \([K(c_i,c_j)]_{i,j=1}^{M}\) is positive semi-definite. The Gram matrix arises in many contexts in machine learning, such as in support vector machines (cf. Hastie et al. 2005). Particular to the subject matter of this manuscript, the Gram matrix plays a pivotal role in the construction of the kernel-based extended DMD method of Williams et al. (2015b) and the occupation kernel approach presented herein.
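For instance, a minimal sketch (with the Gaussian radial basis function kernel introduced later in this section; the helper names are hypothetical) that builds a Gram matrix and confirms it is positive semi-definite:

```python
import numpy as np

def gram_matrix(points, mu=1.0):
    """Gram matrix [K(c_i, c_j)] for the Gaussian RBF kernel
    K(x, y) = exp(-||x - y||^2 / mu)."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.exp(-np.sum(diffs ** 2, axis=-1) / mu)

C = np.random.default_rng(0).normal(size=(6, 2))  # six points in R^2
G = gram_matrix(C)
eigs = np.linalg.eigvalsh(G)  # all eigenvalues are nonnegative (PSD)
```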

The Aronszajn-Moore theorem states that there is a unique correspondence between RKHSs and positive definite kernel functions (Aronszajn 1950). That is, the RKHS may be constructed directly from the kernel function itself, or the kernel function may be determined by a RKHS through the Riesz representation theorem. When the RKHS is obtained from the kernel function, it is frequently referred to as the native space of that kernel function (Wendland 2004).

RKHSs interact with function theoretic operators, such as Koopman (composition) operators (Jury 2007; Luery 2013; Williams et al. 2015b), multiplication operators (Rosenfeld 2015a, b), and Toeplitz operators (Rosenfeld 2016), in many nontrivial ways. For example, the kernel functions themselves play the role of eigenfunctions for the adjoints of multiplication operators (Szafraniec 2000), and when the function corresponding to a Koopman operator has a fixed point at \(c \in X\), the kernel function centered at that point (i.e., \(K(\cdot ,c) \in H\)) is an eigenfunction for the adjoint of the Koopman operator (Cowen and MacCluer 1995). The kernel functions also lie in the domain of the adjoint of densely defined Koopman operators, as will be demonstrated in Sect. 2.2.

For machine learning applications, kernel functions are frequently used for dimensionality reduction by expressing the inner product of data cast into a high-dimensional feature space as evaluation of the kernel function itself (Steinwart and Christmann 2008; Hastie et al. 2005). Specifically, a feature map corresponding to a RKHS is given as the mapping \(x \mapsto \varPsi (x) := (\varPsi _1(x),\varPsi _2(x),\ldots )^T \in \ell ^2({\mathbb {N}})\) for \(x \in X\) such that \(K(x,y) = \langle \varPsi (y), \varPsi (x) \rangle _{\ell ^2({\mathbb {N}})}\). That is, the kernel function may be expressed as

$$\begin{aligned} K(x,y) = \sum _{m=1}^\infty \varPsi _m(x) \varPsi _m(y). \end{aligned}$$

The feature space expression for a function \( g \in H \) is given as \( {\mathbf {g}} = (g_1, g_2, \ldots )^T \in \ell ^2({\mathbb {N}}) \) so that \( g(x) = \langle {\mathbf {g}}, \varPsi (x) \rangle _{\ell ^2({\mathbb {N}})} = \langle g, K(\cdot ,x) \rangle _H \). This representation of inner products of vectors in a feature space as evaluation of a kernel function is central to the usage of kernel methods in data science, where the feature space is generally unknown but may be accessed through the kernel function. The approach taken in Williams et al. (2015b) uses the feature space as the fundamental basis for their representation and obtains kernel functions through a truncated SVD, whereas the present work avoids the invocation of the feature space and the truncated SVD.

The most frequently employed RKHS in machine learning applications is the native space of the Gaussian radial basis function kernel. The Gaussian radial basis function kernel is given as \(K(x,y) = \exp \left( -\frac{1}{\mu } \Vert x-y\Vert _2^2 \right) \), and it is a positive definite function over \({\mathbb {R}}^n\) for all n.

Another important kernel is the exponential dot product kernel, \(K(x,y) = \exp \left( \frac{1}{\mu } x^Ty \right) \), which is also a positive definite function over \({\mathbb {R}}^n\). The exponential dot product kernel is significant because its native space is the Bargmann-Fock space, where bounded Koopman operators have been completely classified. Another significant feature, which will be leveraged in this manuscript, is that polynomials are dense in the Bargmann-Fock space with respect to the Hilbert space norm.
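For the exponential dot product kernel, the feature map of the preceding paragraphs can be written explicitly: in one dimension, \(\varPsi _m(x) = x^m/\sqrt{\mu ^m m!}\) gives \(K(x,y) = \sum _m \varPsi _m(x)\varPsi _m(y) = \exp (xy/\mu )\). The following sketch (helper names are hypothetical) confirms that truncated feature-space inner products recover the kernel value:

```python
import math

def exp_kernel(x, y, mu=1.0):
    # exponential dot product kernel in one dimension
    return math.exp(x * y / mu)

def features(x, M, mu=1.0):
    # Psi_m(x) = x^m / sqrt(mu^m m!), so K(x, y) = <Psi(y), Psi(x)>
    return [x ** m / math.sqrt(mu ** m * math.factorial(m)) for m in range(M)]

x, y = 0.7, -0.3
approx = sum(a * b for a, b in zip(features(x, 20), features(y, 20)))
err = abs(approx - exp_kernel(x, y))  # truncation error is tiny
```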

2.2 Adjoints of Densely Defined Liouville Operators

In the study of operators, the theory concerning bounded operators is the most complete (cf. Pedersen 2012; Folland 2013). A bounded operator over a Hilbert space is a linear operator \(W: H \rightarrow H\) such that \(\Vert Wg \Vert _H \le C \Vert g \Vert _H\) for some \(C > 0\). The smallest C that satisfies \( \left\| W g \right\| _{H} \le C \left\| g \right\| _{H} \) for all \(g \in H\) is the norm of W and written as \(\Vert W \Vert \). A classical theorem in operator theory states that the collection of bounded operators is precisely the collection of continuous operators over a Hilbert space (or more generally a Banach space) (Folland 2013, Chapter 5).

Unbounded operators over a Hilbert space are linear operators given as \(W: {\mathcal {D}}(W) \rightarrow H\), where \({\mathcal {D}}(W)\) is the domain contained within H on which the operator W is defined (Pedersen 2012, Chapter 5). When the domain of W is dense in H, W is said to be a densely defined operator over H. While unbounded operators are by definition discontinuous, closed operators over a Hilbert space satisfy weaker limiting relations. That is, an operator W is closed if, whenever \(\{ g_m \}_{m=1}^\infty \subset {\mathcal {D}}(W)\) is a sequence such that \(g_m \rightarrow g \in H\) and \(Wg_m \rightarrow h \in H\), it follows that \(g \in {\mathcal {D}}(W)\) and \(Wg = h\) (Pedersen 2012, Chapter 5). The Closed Graph Theorem states that if W is a closed operator such that \({\mathcal {D}}(W) = H\), then W is bounded.

Lemma 1

Given a RKHS, H, consisting of continuously differentiable functions, a Liouville operator with symbol f, \(A_f: {\mathcal {D}}(A_f) \rightarrow H\), is defined as \(A_f g := \nabla g \cdot f\), where g resides in the canonical domain

$$\begin{aligned}{\mathcal {D}}(A_f) := \{ g \in H : \nabla g \cdot f \in H \}. \end{aligned}$$

With this domain, \(A_f\) is closed over RKHSs that are composed of continuously differentiable functions.

Proof

Liouville operators were demonstrated to be closed in Rosenfeld et al. (2019a). \(\square \)

The closedness of Koopman operators is well known in the study of RKHS, where they are more commonly known as composition operators (cf. Jury 2007; Luery 2013). Beyond the limit relations provided by closed operators, the closedness of an unbounded operator plays a significant role in the study of the adjoints of unbounded operators (Pedersen 2012, Chapter 5).

Definition 2

For a densely defined operator W, let

$$\begin{aligned} {\mathcal {D}}(W^*) := \{ h \in H : g \mapsto \langle Wg, h \rangle _H \text { is bounded on }{\mathcal {D}}(W) \}.\end{aligned}$$

Since \({\mathcal {D}}(W)\) is dense in H, the functional \(g \mapsto \langle Wg, h \rangle _H\) uniquely extends to H, and as such, for each \(h \in {\mathcal {D}}(W^*)\) the Riesz representation theorem guarantees a function \(W^*h \in H\) such that \(\langle Wg, h \rangle _H = \langle g, W^*h \rangle _H\), for all \(g\in {\mathcal {D}}(W)\). The adjoint of the operator W is thus given as \(W^* : {\mathcal {D}}(W^*) \rightarrow H\) via the assignment \(h \mapsto W^*h\).

Since the adjoint of a closed operator over a Hilbert space is densely defined (Pedersen 2012, Proposition 5.1.7), the adjoints of Liouville operators with domains given as in Lemma 1 are densely defined. Specific members of the domain of the respective adjoints may be identified, and these functions will be utilized in the characterization of the DMD methods in the subsequent sections. To characterize the interaction between the trajectories of a dynamical system and the Liouville operator, the notion of occupation kernels must be introduced (cf. Rosenfeld et al. 2019a).

Definition 3

Let X be a metric space, \(\gamma :[0,T] \rightarrow X\) be an essentially bounded measurable trajectory, and let H be a RKHS over X consisting of continuous functions. Then the functional \(g \mapsto \int _0^T g(\gamma (t)) dt\) is bounded, and the Riesz representation theorem guarantees a function \(\varGamma _{\gamma } \in H\) such that

$$\begin{aligned} \langle g, \varGamma _\gamma \rangle _H = \int _0^T g(\gamma (t)) dt \end{aligned}$$

for all \(g \in H\). The function \(\varGamma _{\gamma }\) is the occupation kernel corresponding to \(\gamma \) in H.

Lemma 2

If \(f: {\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) is the dynamics for a dynamical system, and if \(\gamma : [0,T] \rightarrow {\mathbb {R}}^n\) is a trajectory satisfying \({\dot{\gamma }}(t) = f(\gamma (t))\) in the Carathéodory sense, then \(\varGamma _\gamma \in {\mathcal {D}}(A_f^*)\) and \(A_f^* \varGamma _{\gamma } = K(\cdot , \gamma (T)) - K(\cdot ,\gamma (0)).\)

Proof

This lemma was established in Rosenfeld et al. (2019a). \(\square \)
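The identity in Lemma 2 reflects the fundamental theorem of calculus: \(\langle A_f g, \varGamma _\gamma \rangle _H = \int _0^T \nabla g(\gamma (t)) \cdot f(\gamma (t)) dt = g(\gamma (T)) - g(\gamma (0))\). A quick numerical sanity check (the dynamics and observable below are illustrative choices, not from the manuscript):

```python
import numpy as np

# Along a trajectory of x' = f(x), the integral of
# (A_f g)(gamma(t)) = grad g(gamma(t)) . f(gamma(t))
# telescopes to g(gamma(T)) - g(gamma(0)).
f = lambda x: -x                    # illustrative linear dynamics
gamma = lambda t: 2.0 * np.exp(-t)  # solution with gamma(0) = 2
g = lambda x: x ** 2
grad_g = lambda x: 2.0 * x

T = 1.5
ts = np.linspace(0.0, T, 2001)
vals = grad_g(gamma(ts)) * f(gamma(ts))
lhs = np.sum((vals[:-1] + vals[1:]) / 2.0) * (ts[1] - ts[0])  # trapezoid rule
rhs = g(gamma(T)) - g(gamma(0))
```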

For Liouville operators, several examples can be demonstrated where particular symbols produce densely defined operators over the Bargmann-Fock space. In particular, since polynomials are dense in the Bargmann-Fock space, for polynomial dynamics, f, the function \(A_f g = \nabla g \cdot f\) is a polynomial whenever g is a polynomial. Hence, polynomial dynamical systems correspond to densely defined Liouville operators over the Bargmann-Fock space, and it should be noted that this is not a complete characterization of the densely defined Liouville operators over this space. For other RKHSs, different classes of dynamics correspond to densely defined operators, requiring independent evaluation for each RKHS.
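As a concrete one-dimensional illustration (using NumPy's polynomial helpers; the function name is hypothetical), the action \(A_f g = \nabla g \cdot f = g' f\) maps polynomials to polynomials:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def liouville_apply(g_coefs, f_coefs):
    """A_f g = g' * f for a 1-D polynomial symbol f and polynomial
    observable g (coefficients in increasing degree order)."""
    return P.polymul(P.polyder(g_coefs), f_coefs)

g = [0, 0, 0, 1]  # g(x) = x^3
f = [0, 0, 1]     # f(x) = x^2
out = liouville_apply(g, f)  # A_f g = 3x^2 * x^2 = 3x^4
```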

3 A Compact Variation of the Liouville Operator

One of the drawbacks of employing either the Koopman operator or the Liouville operator for DMD is that the finite rank matrices produced by the method are strictly heuristic representations of the generally unbounded operators. An important question to address is whether a DMD procedure may be produced using a compact operator instead of the densely defined operators discussed so far. This section presents a class of compact operators for use in DMD applied to continuous time systems. The compactness and boundedness of the operators will depend on the selection of the RKHS and the dynamics of the system. The Bargmann-Fock space will be utilized in this section, and the compactness assumption will be demonstrated to hold for a large class of dynamics.

Definition 4

Let H be a RKHS over \({\mathbb {R}}^n\), \(a \in {\mathbb {R}}\) with \(|a| < 1\), and let the scaled Liouville operator with symbol \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\),

$$\begin{aligned} A_{f,a} : {\mathcal {D}}(A_{f,a}) \rightarrow H, \end{aligned}$$

be given as \(A_{f,a} g(x) = a\nabla g(ax) \cdot f(x)\) for all \(x \in {\mathbb {R}}^n\) and

$$\begin{aligned} g \in {\mathcal {D}}(A_{f,a}) = \{ h \in H : a\nabla h(ax) \cdot f(x) \in H\}. \end{aligned}$$

From the definition of scaled Liouville operators, if \(\gamma : [0,T] \rightarrow {\mathbb {R}}^n\) is a trajectory satisfying \({\dot{\gamma }} = f(\gamma )\), then

$$\begin{aligned} \int _0^T A_{f,a} g(\gamma (t)) dt = \int _0^T a \nabla g(a\gamma (t)) \cdot f(\gamma (t)) dt = \langle A_{f,a} g, \varGamma _\gamma \rangle _{H}. \end{aligned}$$

The following proposition then follows from arguments similar to the proof of Lemma 2.

Proposition 1

For \(\gamma : [0,T] \rightarrow {\mathbb {R}}^n\), such that \({\dot{\gamma }} = f(\gamma )\), \(\varGamma _{\gamma } \in {\mathcal {D}}(A_{f,a}^*)\) and

$$\begin{aligned} A_{f,a}^* \varGamma _{\gamma } = K(\cdot ,a\gamma (T)) - K(\cdot ,a\gamma (0)).\end{aligned}$$
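Analogously to Lemma 2, the pairing \(\langle A_{f,a} g, \varGamma _\gamma \rangle _H = g(a\gamma (T)) - g(a\gamma (0))\) follows from the chain rule, since \(\frac{d}{dt} g(a\gamma (t)) = a\nabla g(a\gamma (t)) \cdot f(\gamma (t))\). A numerical sanity check (the dynamics, observable, and scaling below are illustrative choices, not from the manuscript):

```python
import numpy as np

# Integrating (A_{f,a} g)(gamma(t)) = a grad g(a gamma(t)) . f(gamma(t))
# along a trajectory of x' = f(x) yields g(a gamma(T)) - g(a gamma(0)),
# matching the scaled kernel difference in Proposition 1.
a = 0.9
f = lambda x: -x
gamma = lambda t: 2.0 * np.exp(-t)  # solution of x' = -x with x(0) = 2
g = lambda x: np.exp(x)             # illustrative observable
grad_g = lambda x: np.exp(x)

T = 1.0
ts = np.linspace(0.0, T, 4001)
vals = a * grad_g(a * gamma(ts)) * f(gamma(ts))
lhs = np.sum((vals[:-1] + vals[1:]) / 2.0) * (ts[1] - ts[0])  # trapezoid rule
rhs = g(a * gamma(T)) - g(a * gamma(0))
```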

Theorem 1 and Corollary 1 demonstrate that for the Bargmann-Fock space, a large class of dynamics correspond to compact scaled Liouville operators.

Theorem 1

Let \(F^2({\mathbb {R}}^n)\) be the Bargmann-Fock space of real valued functions, which is the native space for the exponential dot product kernel, \(K(x,y) = \exp (x^Ty)\), \(a \in {\mathbb {R}}\) with \(|a| < 1\), and let \(A_{f,a}\) be the scaled Liouville operator with symbol \(f:{\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\). There exists a collection of coefficients, \(\{ C_\alpha \}_{\alpha }\), indexed by the multi-index \(\alpha \), such that if f is representable by a multi-variate power series, \(f(x) = \sum _{\alpha } f_\alpha x^\alpha \), satisfying

$$\begin{aligned} \sum _{\alpha } |f_{\alpha }| C_\alpha < \infty , \end{aligned}$$

then \(A_{f,a}\) is bounded and compact over \(F^2({\mathbb {R}}^n)\).

Proof

The proof has been relegated to the appendix to ease exposition. \(\square \)

Corollary 1

If f is a multi-variate polynomial, then \(A_{f,a}\) is bounded and compact over \(F^2({\mathbb {R}}^n)\) for all \(|a|<1\).

The compactness of scaled Liouville operators (over the Bargmann-Fock space) is critical for norm convergence of DMD methods. For bounded Koopman operators, results such as Korda and Mezić (2018) obtain convergence in the strong operator topology (SOT) of the DMD operator to the Koopman operator. SOT convergence is only pointwise convergence over a Hilbert space, and does not provide any generalization guarantees in the learning sense. Norm convergence on the other hand gives a uniform bound on the error estimates for all functions in the Hilbert space. Specifically, in this paper, the data-driven finite rank representation of the scaled Liouville operator, given in Sect. 4, is shown to converge, in norm, to the scaled Liouville operator.

While scaled Liouville operators are not identical to Liouville operators, the selection of the parameter \(a\) close to 1 can be used to limit the difference between their finite rank representations to within machine precision. Hence, the decomposition of scaled Liouville operators is computationally indistinguishable from that of the Liouville operator for \(a\) sufficiently close to 1.

4 Occupation Kernel Dynamic Mode Decomposition

4.1 Finite Rank Representation of the Liouville Operator

With the relevant theoretical background presented, this section develops the Occupation Kernel-based DMD method for continuous time systems. Let K be the kernel function for a RKHS, H, over \({\mathbb {R}}^n\) consisting of continuously differentiable functions. Let \(\dot{x} = f(x)\) be a dynamical system corresponding to a densely defined Liouville operator, \(A_f\), over H. Suppose that \(\{ \gamma _i : [0,T_i] \rightarrow {\mathbb {R}}^n \}_{i=1}^M\) is a collection of trajectories satisfying \({\dot{\gamma }}_i = f(\gamma _i)\). There is a corresponding collection of occupation kernels, \(\alpha := \{ \varGamma _{\gamma _i} \}_{i=1}^M \subset H\), given as \(\varGamma _{\gamma _i}(x) := \int _0^{T_i} K(x,\gamma _i(t)) dt.\) For each \(\gamma _i\), the action of \(A_f^*\) on the corresponding occupation kernel is given by \(A_f^* \varGamma _{\gamma _i} = K(\cdot , \gamma _i(T_{i})) - K(\cdot , \gamma _i(0))\).

Thus, when \(\alpha \) is selected as an ordered basis for a vector space, the action of \(A_f^*\) is known on \({{\,\mathrm{span}\,}}(\alpha )\). The objective of the DMD procedure is to construct a matrix representation of the action of \(A_f^*\) on the finite dimensional vector space spanned by \(\alpha \), followed by projection onto \({{\,\mathrm{span}\,}}(\alpha )\).

Let \(w_1,\cdots ,w_M\) be the coefficients for the projection of a function \(g \in H\) onto \({{\,\mathrm{span}\,}}(\alpha ) \subset H\), written as \(P_\alpha g = \sum _{i=1}^{M} w_i\varGamma _{\gamma _{i}}\). Using the fact that

$$\begin{aligned} \langle g , \varGamma _{\gamma _j}\rangle _H = \langle P_\alpha g , \varGamma _{\gamma _j}\rangle _H = \begin{pmatrix} \langle \varGamma _{\gamma _1} , \varGamma _{\gamma _j}\rangle _H&\cdots&\langle \varGamma _{\gamma _M} , \varGamma _{\gamma _j}\rangle _H \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_M \end{pmatrix}, \end{aligned}$$

for all \( j = 1, \cdots , M \), the coefficients \( w_1, \cdots , w_M \) may be obtained through the solution of the following linear system:

$$\begin{aligned} \begin{pmatrix} \langle \varGamma _{\gamma _1}, \varGamma _{\gamma _1} \rangle _H &{} \cdots &{} \langle \varGamma _{\gamma _M}, \varGamma _{\gamma _1} \rangle _H\\ \vdots &{} \ddots &{} \vdots \\ \langle \varGamma _{\gamma _1}, \varGamma _{\gamma _M} \rangle _H &{} \cdots &{} \langle \varGamma _{\gamma _M}, \varGamma _{\gamma _M} \rangle _H \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_M \end{pmatrix} = \begin{pmatrix} \langle g, \varGamma _{\gamma _1} \rangle _H\\ \vdots \\ \langle g, \varGamma _{\gamma _M} \rangle _H \end{pmatrix}, \end{aligned}$$
(1)

where each of the inner products may be expressed as either single or double integrals as

$$\begin{aligned} \langle \varGamma _{\gamma _j}, \varGamma _{\gamma _i} \rangle _H = \int _0^{T_i} \int _0^{T_j} K(\gamma _i(\tau ),\gamma _j(t)) dt d\tau \text {, and } \langle g, \varGamma _{\gamma _i} \rangle _H = \int _0^{T_i} g(\gamma _i(t))dt. \end{aligned}$$
(2)

Furthermore, if \(h = \sum _{i=1}^{M} v_i\varGamma _{\gamma _{i}} \in {{\,\mathrm{span}\,}}(\alpha )\) for some coefficients \(\{v_i\}_{i=1}^M\subset {\mathbb {R}}\), then \( A_f^* h \in H \), and it follows that

$$\begin{aligned} \langle A_f^* h,\varGamma _{\gamma _j} \rangle= & {} \left\langle \sum _{i=1}^{M} v_i A_f^*\varGamma _{\gamma _{i}},\varGamma _{\gamma _j} \right\rangle _H \nonumber \\= & {} \begin{pmatrix}\left\langle A_f^*\varGamma _{\gamma _{1}},\varGamma _{\gamma _j} \right\rangle _H,\cdots ,\left\langle A_f^*\varGamma _{\gamma _{M}},\varGamma _{\gamma _j} \right\rangle _H\end{pmatrix}\begin{pmatrix}v_1\\ \vdots \\ v_M\end{pmatrix}, \end{aligned}$$
(3)

for all \(j = 1,\cdots ,M\). Using (1) and (3), the coefficients \( \{w_i\}_{i=1}^M \) in the projection of \( A_f^* h \) onto \( {{\,\mathrm{span}\,}}(\alpha ) \) can be expressed as

$$\begin{aligned} \begin{pmatrix} w_1 \\ \vdots \\ w_M \end{pmatrix}= & {} \begin{pmatrix} \langle \varGamma _{\gamma _1}, \varGamma _{\gamma _1} \rangle _H &{} \cdots &{} \langle \varGamma _{\gamma _M}, \varGamma _{\gamma _1} \rangle _H\\ \vdots &{} \ddots &{} \vdots \\ \langle \varGamma _{\gamma _1}, \varGamma _{\gamma _M} \rangle _H &{} \cdots &{} \langle \varGamma _{\gamma _M}, \varGamma _{\gamma _M} \rangle _H \end{pmatrix}^{-1}\\&\times \begin{pmatrix} \langle A_f^* \varGamma _{\gamma _1}, \varGamma _{\gamma _1} \rangle _H &{} \cdots &{} \langle A_f^* \varGamma _{\gamma _M}, \varGamma _{\gamma _1} \rangle _H\\ \vdots &{} \ddots &{} \vdots \\ \langle A_f^* \varGamma _{\gamma _1}, \varGamma _{\gamma _M} \rangle _H &{} \cdots &{} \langle A_f^* \varGamma _{\gamma _M}, \varGamma _{\gamma _M} \rangle _H \end{pmatrix} \begin{pmatrix} v_1 \\ \vdots \\ v_M \end{pmatrix}. \end{aligned}$$

Lemma 2 then yields the finite rank representation for \( P_\alpha A_f^* \), restricted to the occupation kernel basis, \( {{\,\mathrm{span}\,}}(\alpha ) \), as

$$\begin{aligned}{}[P_\alpha A_f^*]_{\alpha }^{\alpha } = G^{-1}{\mathcal {I}}, \end{aligned}$$
(4)

where

$$\begin{aligned} G:=\begin{pmatrix} \langle \varGamma _{\gamma _1}, \varGamma _{\gamma _1} \rangle _H &{} \cdots &{} \langle \varGamma _{\gamma _1}, \varGamma _{\gamma _M} \rangle _H\\ \vdots &{} \ddots &{} \vdots \\ \langle \varGamma _{\gamma _M}, \varGamma _{\gamma _1} \rangle _H &{} \cdots &{} \langle \varGamma _{\gamma _M}, \varGamma _{\gamma _M} \rangle _H \end{pmatrix} \end{aligned}$$
(5)

is the Gram matrix of occupation kernels and

$$\begin{aligned} {{\mathcal {I}}:=\begin{pmatrix} \langle K(\cdot ,\gamma _1(T_1)) - K(\cdot , \gamma _1(0)), \varGamma _{\gamma _1} \rangle _H &{} \cdots &{} \langle K(\cdot ,\gamma _M(T_M)) - K(\cdot , \gamma _M(0)), \varGamma _{\gamma _1} \rangle _H\\ \vdots &{} \ddots &{} \vdots \\ \langle K(\cdot ,\gamma _1(T_1)) - K(\cdot , \gamma _1(0)), \varGamma _{\gamma _M} \rangle _H &{} \cdots &{} \langle K(\cdot ,\gamma _M(T_M)) - K(\cdot , \gamma _M(0)), \varGamma _{\gamma _M} \rangle _H \end{pmatrix}} \end{aligned}$$
(6)

is the interaction matrix.
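Equations (4) through (6) translate directly into a small amount of linear algebra once the integrals are discretized. The sketch below (Python; the dynamics \(\dot{x} = -x\), the Gaussian kernel, and all parameter values are illustrative assumptions) assembles the Gram matrix G and the interaction matrix \({\mathcal {I}}\), using the reproducing property \(\langle K(\cdot ,y), \varGamma _{\gamma _j}\rangle _H = \varGamma _{\gamma _j}(y)\) to reduce each entry of \({\mathcal {I}}\) to a single integral:

```python
import numpy as np

def kernel(x, y, mu=0.1):
    # Gaussian kernel; mu is an illustrative width parameter
    return np.exp(-np.sum((x - y) ** 2) / mu)

def trapezoid(values, dx):
    return dx * (values.sum() - 0.5 * (values[0] + values[-1]))

def occ_kernel_eval(y, gamma, dt):
    # Gamma_gamma(y) = int_0^T K(y, gamma(t)) dt, cf. (11)
    return trapezoid(np.array([kernel(y, x) for x in gamma]), dt)

# Trajectories of xdot = -x from several initial conditions (illustrative data)
dt = 0.01
t = np.arange(0.0, 0.5 + dt / 2, dt)
trajs = [x0 * np.exp(-t).reshape(-1, 1) for x0 in (0.25, 0.5, 0.75, 1.0)]
M = len(trajs)

# Gram matrix (5): double integrals of the kernel along trajectory pairs
G = np.array([[trapezoid(np.array([occ_kernel_eval(x, gj, dt) for x in gi]), dt)
               for gi in trajs] for gj in trajs])

# Interaction matrix (6): by the reproducing property, the (j, i) entry is
# Gamma_{gamma_j}(gamma_i(T_i)) - Gamma_{gamma_j}(gamma_i(0))
I_mat = np.array([[occ_kernel_eval(gi[-1], gj, dt) - occ_kernel_eval(gi[0], gj, dt)
                   for gi in trajs] for gj in trajs])

# Finite rank representation (4): [P_alpha A_f^*] = G^{-1} I
A_star = np.linalg.solve(G, I_mat)
```

Per (8), the corresponding representation of \(P_\alpha A_f\) is obtained by transposing \({\mathcal {I}}\) before the solve.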

DMD requires a finite-rank representation of \( P_{\alpha }A_f \), instead of \( P_{\alpha }A_f^* \). Similar to the development above, Lemma 2 can be used to generate a finite rank representation of \( P_{\alpha }A_f \) under the following additional assumption.

Assumption 1

The occupation kernels are in the domain of the Liouville operator, i.e., \(\alpha \subset {\mathcal {D}}(A_f)\).

Given \(h = \sum _{i=1}^{M} v_i\varGamma _{\gamma _{i}} \in {{\,\mathrm{span}\,}}(\alpha )\) for some coefficients \( \{v_i\}_{i=1}^M \subset {\mathbb {R}} \), Assumption 1 implies that \(A_f h\in H\) and

$$\begin{aligned}&\left\langle A_f h,\varGamma _{\gamma _j}\right\rangle _H = \sum _{i=1}^M v_i\left\langle A_f\varGamma _{\gamma _i},\varGamma _{\gamma _j}\right\rangle _H = \sum _{i=1}^M v_i\left\langle \varGamma _{\gamma _i}, A_f^* \varGamma _{\gamma _j}\right\rangle _H \nonumber \\&\quad = \left( \left\langle \varGamma _{\gamma _1}, A_f^* \varGamma _{\gamma _j}\right\rangle _H,\ldots ,\left\langle \varGamma _{\gamma _M}, A_f^* \varGamma _{\gamma _j}\right\rangle _H\right) \begin{pmatrix} v_1\\ \vdots \\ v_M \end{pmatrix}. \end{aligned}$$
(7)

Lemma 2 then yields a finite rank representation of \( P_\alpha A_f \), restricted to \({{\,\mathrm{span}\,}}(\alpha ) \) as

$$\begin{aligned}{}[P_\alpha A_f]_\alpha ^\alpha = G^{-1}{\mathcal {I}}^T. \end{aligned}$$
(8)

4.2 Dynamic Mode Decomposition

Suppose that \(\lambda _i\) is the eigenvalue for the eigenvector \(v_i := (v_{i1}, v_{i2}, \ldots , v_{iM})^T\), \(i=1,\ldots ,M\), of \([P_\alpha A_f]_{\alpha }^\alpha \). The eigenvector \(v_i\) can be used to construct a normalized eigenfunction of \( P_{\alpha } A_f \) restricted to \({{\,\mathrm{span}\,}}(\alpha )\), given as \(\varphi _i = \frac{1}{N_i} \sum _{j=1}^M v_{ij} \varGamma _{\gamma _j}\), where \(N_i := \sqrt{v_i^\dagger G v_i}\), and \((\cdot )^\dagger \) denotes the conjugate transpose. Let V be the matrix of coefficients of the normalized eigenfunctions, arranged so that each column corresponds to an eigenfunction.
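The normalization step can be sketched as follows (Python; the matrices G and A below are illustrative stand-ins rather than quantities computed from data):

```python
import numpy as np

# Illustrative stand-ins for the Gram matrix G and the finite rank
# representation [P_alpha A_f]_alpha^alpha (not derived from trajectories)
G = np.array([[2.0, 0.5],
              [0.5, 1.0]])
A = np.linalg.solve(G, np.array([[1.0, 0.2],
                                 [0.1, 0.8]]))

# Eigendecomposition of the finite rank representation
lambdas, V = np.linalg.eig(A)

# Normalize each eigenvector so that phi_i = (1/N_i) sum_j v_ij Gamma_{gamma_j}
# has unit norm in H, with N_i = sqrt(v_i^dagger G v_i)
for i in range(V.shape[1]):
    v = V[:, i]
    N = np.sqrt(np.real(v.conj() @ G @ v))
    V[:, i] = v / N
```

After normalization, each column \(v_i\) of V satisfies \(v_i^\dagger G v_i = 1\), i.e., \(\Vert \varphi _i\Vert _H = 1\).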

The DMD procedure begins by expressing the identity function, also known as the full state observable, \(g_{id}(x) := x \in {\mathbb {R}}^n\) as a linear combination of the eigenfunctions of \(A_f\), i.e., \(g_{id}(x) = \lim _{M\rightarrow \infty } \sum _{i=1}^M \xi _{i,M} \varphi _i(x)\). For a fixed M, the identity function can be approximated using the Liouville modes \(\xi _i \in {\mathbb {R}}^n\) as \(g_{id}(x) \approx \sum _{i=1}^M \xi _i \varphi _i(x)\). The j-th row of the matrix \(\xi = (\xi _1 \cdots \xi _M)\) is obtained as

$$\begin{aligned} \begin{pmatrix}(\xi _1)_j&\cdots&(\xi _M)_j\end{pmatrix} = \left( \begin{pmatrix} \langle \varphi _1,\varphi _1 \rangle _H &{} \cdots &{} \langle \varphi _1,\varphi _M\rangle _H\\ \vdots &{} \ddots &{} \vdots \\ \langle \varphi _M,\varphi _1\rangle _H &{} \cdots &{} \langle \varphi _M,\varphi _M \rangle _H \end{pmatrix}^{-1} \begin{pmatrix} \langle (x)_j, \varphi _1 \rangle _H \\ \vdots \\ \langle (x)_j, \varphi _M \rangle _H \end{pmatrix}\right) ^T, \end{aligned}$$

where \((x)_j\) is viewed as the functional mapping \(x \in {\mathbb {R}}^n\) to its j-th coordinate. By examining the inner products \( \left\langle g_{id},\varGamma _{\gamma _i}\right\rangle _H \), for \(i=1,\ldots ,M\), the matrix \(\xi \) may be expressed as

$$\begin{aligned} \xi = \begin{pmatrix} \int _0^{T_1} \gamma _{1}(t) \mathrm {d}t&\cdots&\int _0^{T_M} \gamma _{M}(t) \mathrm {d}t \end{pmatrix}\left( V^T G \right) ^{-1}. \end{aligned}$$
(9)

Given a trajectory \(x(\cdot )\) satisfying \(\dot{x} = f(x)\), each eigenfunction of \(A_f\) satisfies \({\dot{\varphi }}_i(x(t)) = \lambda _i \varphi _i(x(t))\) and hence, \(\varphi _i(x(t)) = \varphi _i(x(0)) e^{\lambda _i t}\), and the following data driven model is obtained:

$$\begin{aligned} x(t) \approx \sum _{i=1}^M \xi _i \varphi _i(x(0)) e^{\lambda _i t}, \end{aligned}$$
(10)

where

$$\begin{aligned} \varphi _i(x(0)) = \frac{1}{N_i} \sum _{j=1}^M v_{ij} \varGamma _{\gamma _j}(x(0)) = \frac{1}{N_i} \sum _{j=1}^M v_{ij} \int _0^{T_j} K\left( x(0),\gamma _j(t)\right) dt. \end{aligned}$$
(11)

The resultant DMD procedure is summarized in Algorithm 1.
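The procedure of Algorithm 1 can be sketched end to end. The following Python sketch (the test system \(\dot{x} = -x\), the Gaussian kernel, and all parameter values are illustrative assumptions, not the authors' implementation) assembles G and \({\mathcal {I}}\), forms the finite rank representation (8), normalizes the eigenfunctions, computes the Liouville modes via (9), and reconstructs a trajectory via (10) and (11):

```python
import numpy as np

# --- data: trajectories of xdot = -x (illustrative choice of system)
mu, dt = 0.1, 0.01
t = np.arange(0.0, 0.5 + dt / 2, dt)
trajs = [x0 * np.exp(-t).reshape(-1, 1) for x0 in (0.25, 0.5, 0.75, 1.0)]
M = len(trajs)

def K(x, y):
    return np.exp(-np.sum((x - y) ** 2) / mu)

def trap(v, dx):
    return dx * (v.sum() - 0.5 * (v[0] + v[-1]))

def occ(y, g):  # Gamma_g(y) = int_0^T K(y, g(t)) dt, cf. (11)
    return trap(np.array([K(y, x) for x in g]), dt)

# Gram matrix (5) and interaction matrix (6)
G = np.array([[trap(np.array([occ(x, gj) for x in gi]), dt) for gi in trajs]
              for gj in trajs])
I_mat = np.array([[occ(gi[-1], gj) - occ(gi[0], gj) for gi in trajs]
                  for gj in trajs])

# Finite rank representation (8) and its normalized eigenfunctions
A = np.linalg.solve(G, I_mat.T)
lambdas, V = np.linalg.eig(A)
V = V.astype(complex)
for i in range(M):
    V[:, i] /= np.sqrt(np.real(V[:, i].conj() @ G @ V[:, i]))

# Liouville modes (9): row vector of trajectory integrals times (V^T G)^{-1}
traj_ints = np.array([[trap(g[:, 0], dt) for g in trajs]])  # 1 x M
xi = traj_ints @ np.linalg.inv(V.T @ G)

# Reconstruction (10)-(11); the imaginary part (conjugate-pair residue)
# is discarded at the end
def reconstruct(x0, times):
    phi0 = np.array([sum(V[j, i] * occ(np.atleast_1d(x0), trajs[j])
                         for j in range(M)) for i in range(M)])
    return np.real(np.array([(xi * phi0 * np.exp(lambdas * s)).sum()
                             for s in times]))

x_hat = reconstruct(0.6, t)
```

The sketch mirrors the steps of Algorithm 1 but makes no claim of matching the experimental settings of Sect. 5.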


4.3 Modifications for the Scaled Liouville Operator DMD Method

Since Liouville operators are not generally compact, convergence, as \(M\rightarrow \infty \), of the finite rank representation \( P_\alpha A_f \) to the Liouville operator \(A_f\) cannot be guaranteed. However, convergence of the finite rank representation can be established for scaled Liouville operators, and under Assumption 1 the approximations obtained via DMD are provably cogent. We call the approach taken here Scaled Liouville DMD (SL-DMD).

By Theorem 2, for an infinite collection of trajectories \(\{ \gamma _{i}\}_{i=1}^\infty \) with a dense collection of corresponding occupation kernels, \(\{ \varGamma _{\gamma _{i}}\}_{i=1}^\infty \subset H\), the resultant sequence of finite rank operators \(P_{\alpha _M}A_{f,a} P_{\alpha _M}\) converges to \(A_{f,a}\), where \(\alpha _M := \{ \varGamma _{\gamma _{1}},\ldots , \varGamma _{\gamma _{M}}\}\). Consequently, the spectrum of \([P_{\alpha _M}A_{f,a}]_{\alpha _M}^{\alpha _M}\), the finite rank representation of \( P_{\alpha _M} A_{f,a} \), restricted to \({{\,\mathrm{span}\,}}(\alpha _M)\), converges to that of \(A_{f,a}\).

Furthermore, when a is sufficiently close to 1 and the observed trajectories contained in a compact set are perturbed to within machine precision, the finite rank representations of \(A_{f,a}\) and \(A_{f}\) are computationally indistinguishable.

DMD using scaled Liouville operators is similar to the unscaled case. In particular, recall that for \(|a| < 1\) and f as above, \(A^*_{f,a} \varGamma _{\gamma _{i}} = K(\cdot ,a\gamma _i(T_i)) - K(\cdot ,a\gamma _i(0)).\) Hence, a finite rank representation of \(A_{f,a}\), obtained from restricting and projecting to \({{\,\mathrm{span}\,}}(\alpha )\), is given as

$$\begin{aligned}{}[P_\alpha A_{f,a}]_{\alpha }^{\alpha } = G^{-1}{\mathcal {I}}_a^T, \end{aligned}$$

where

$$\begin{aligned} {\mathcal {I}}_a:=\begin{pmatrix} \langle K(\cdot ,a\gamma _1(T_1)) - K(\cdot , a\gamma _1(0)), \varGamma _{\gamma _1} \rangle _H &{} \cdots &{} \langle K(\cdot ,a\gamma _M(T_M)) - K(\cdot , a\gamma _M(0)), \varGamma _{\gamma _1} \rangle _H\\ \vdots &{} \ddots &{} \vdots \\ \langle K(\cdot ,a\gamma _1(T_1)) - K(\cdot , a\gamma _1(0)), \varGamma _{\gamma _M} \rangle _H &{} \cdots &{} \langle K(\cdot ,a\gamma _M(T_M)) - K(\cdot , a\gamma _M(0)), \varGamma _{\gamma _M} \rangle _H \end{pmatrix}. \end{aligned}$$

The approximate normalized eigenfunctions, \(\{ \varphi _{i,a} \}_{i=1}^M\), for \(A_{f,a}\) may then be obtained in an identical fashion as for the Liouville operator.
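Computationally, the only change relative to the unscaled procedure is that the boundary evaluation points \(\gamma _i(0)\) and \(\gamma _i(T_i)\) are scaled by a before the kernel is evaluated. A minimal comparison (Python; the trajectories, kernel, and parameter values are illustrative assumptions):

```python
import numpy as np

mu, dt, a = 0.1, 0.01, 0.99
t = np.arange(0.0, 0.5 + dt / 2, dt)
trajs = [x0 * np.exp(-t).reshape(-1, 1) for x0 in (0.5, 1.0)]

def K(x, y):
    return np.exp(-np.sum((x - y) ** 2) / mu)

def occ(y, g):  # Gamma_g(y) = int_0^T K(y, g(t)) dt (trapezoidal rule)
    vals = np.array([K(y, x) for x in g])
    return dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

def interaction(scale):
    # entries <K(., s*gamma_i(T_i)) - K(., s*gamma_i(0)), Gamma_{gamma_j}>_H,
    # evaluated via the reproducing property
    return np.array([[occ(scale * gi[-1], gj) - occ(scale * gi[0], gj)
                      for gi in trajs] for gj in trajs])

I_1 = interaction(1.0)   # unscaled interaction matrix (6)
I_a = interaction(a)     # scaled version entering [P_alpha A_{f,a}]
```

For a close to 1 and trajectories in a compact set, the two matrices differ only slightly, consistent with the observation above that the finite rank representations become computationally indistinguishable.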

Thus, the expression of the full state observable, \(g_{id}\), in terms of the eigenfunctions yields \(g_{id}(x) \approx \sum _{i=1}^M \xi _{i,a} \varphi _{i,a}(x)\) with (scaled) Liouville modes \(\xi _{i,a}\).

As the eigenfunctions satisfy

$$\begin{aligned}{\dot{\varphi }}_{i,a}(a x(t)) = a \nabla \varphi _{i,a}(a x(t)) f(x(t)) = A_{f,a} \varphi _{i,a}(x(t)) = \lambda _{i,a} \varphi _{i,a}(x(t)), \end{aligned}$$

it can be seen that \(\varphi _{i,a}(x(t)) \ne e^{t\lambda _{i,a}} \varphi _{i,a}(x(0)).\) When a is close to 1, it can be demonstrated that \(\varphi _{i,a}(x(t))\) is very nearly equal to \(e^{t\lambda _{i,a}} \varphi _{i,a}(x(0)),\) and the error can be controlled when x(t) remains in a compact domain or workspace.

Proposition 2

Let H be a RKHS of twice continuously differentiable functions over \({\mathbb {R}}^n\), f be Lipschitz continuous, and suppose that \(\varphi _{i,a}\) is an eigenfunction of \(A_{f,a}\) with eigenvalue \(\lambda _{i,a}\). Let D be a compact subset of \({\mathbb {R}}^n\) that contains x(t) for all \(0< t < T\). In this setting, if \(\lambda _{i,a} \rightarrow \lambda _{i,1}\) and \(\varphi _{i,a}(x(0)) \rightarrow \varphi _{i,1}(x(0))\) as \(a \rightarrow 1^-\), then

$$\begin{aligned}\sup _{0 \le t \le T} \Vert \varphi _{i,a}(x(t)) - e^{\lambda _{i,a}t}\varphi _{i,a}(x(0))\Vert _2 \rightarrow 0.\end{aligned}$$

Proof

The proof has been relegated to the appendix to ease exposition. \(\square \)

Thus, under the hypothesis of Proposition 2, for a sufficiently close to 1, a data-driven model for a trajectory x satisfying \(\dot{x} = f(x)\) is established as

$$\begin{aligned} x(t) \approx \sum _{i=1}^M \xi _{i,a} \varphi _{i,a}(x(0)) e^{\lambda _{i,a} t}. \end{aligned}$$
(12)

The principal advantage of using scaled Liouville operators is that these operators are compact over the Bargmann-Fock space for a large collection of nonlinear dynamics. Moreover, the sequence of finite rank operators obtained through the DMD procedure achieves norm convergence when the sequence of recorded trajectories corresponds to a collection of occupation kernels that is dense in the Hilbert space.

Theorem 2

Let \(|a| < 1\). Suppose that \(\{ \gamma _{i}:[0,T_i] \rightarrow {\mathbb {R}}^n \}_{i=1}^\infty \) is a sequence of trajectories satisfying \({\dot{\gamma }} = f(\gamma )\) for a dynamical system f corresponding to a compact scaled Liouville operator, \(A_{f,a}\). If the collection of functions, \(\{ \varGamma _{\gamma _i} \}_{i=1}^\infty \) is dense in the Bargmann-Fock space, then the sequence of operators \(\{ P_{\alpha _M} A_{f,a} P_{\alpha _M} \}_{M=1}^\infty \) converges to \(A_{f,a}\) in the norm topology, where \(\alpha _M = \{ \varGamma _{\gamma _1}, \ldots , \varGamma _{\gamma _M} \}\).

Proof

The proof has been relegated to the appendix to ease exposition. \(\square \)

5 Numerical Experiments

This section includes two collections of numerical experiments solved using the methods of the paper. The first surrounds the problem of flow across a cylinder, which has become a classic example for DMD and thus provides a benchmark for comparison of the present method with kernel-based extended DMD. There it is demonstrated that the scaled Liouville modes and the Liouville modes are very similar.

The second experiment performs a decomposition using electroencephalography (EEG) data, which has been sampled at 250 Hz over a period of 8 seconds. The high sampling frequency gives a large number of snapshots, which then leads to a high-dimensional learning problem when using the snapshots alone. The purpose of this experiment is to demonstrate how the Liouville operator based DMD can incorporate the large number of snapshots to generate Liouville modes without discarding data.

5.1 Flow Across a Cylinder

This experiment utilizes the data set from Kutz et al. (2016), which includes snapshots of flow velocity and flow vorticity generated from a computational fluid dynamics simulation. The data correspond to the wake behind a circular cylinder, and the Reynolds number for this flow is 100. The simulation was generated with time steps of \(\varDelta t = 0.02\) seconds and ultimately sampled every \(10 \varDelta t\) seconds, yielding 151 snapshots. Each snapshot of the system is a vector of dimension 89,351. More details may be found in Kutz et al. (2016, Chapter 2).

Figure 1 presents the Liouville modes obtained from the cylinder vorticity data set, where the collection of 151 snapshots was subdivided into 147 trajectories, each of length 5. This figure should be compared with Fig. 2, which presents the scaled Liouville modes, with parameter \(a = 0.99\), corresponding to the same data set. The modes were generated using the Gaussian kernel with \(\mu = 500\). Figure 3 compares snapshots of the true vorticity against vorticity reconstructed using the unscaled and scaled Liouville DMD models in (10) and (12), respectively.

Fig. 1

This figure presents the real and imaginary parts of a selection of ten Liouville modes determined by the continuous time DMD method given in the present manuscript corresponding to the vorticity of a flow across a cylinder (data available in Kutz et al. 2016)

Fig. 2

This figure presents the real and imaginary parts of a selection of five scaled Liouville DMD modes for the cylinder wake vorticity data in Kutz et al. (2016). The difference between these modes and the modes in Fig. 1 was anticipated for several reasons: the selection of \(a=0.99\) is expected to result in slightly different modes, and there is no consistent method of ordering the Liouville modes, as the significance of each mode depends not only on its magnitude, but also on the associated eigenvector and initial value

Fig. 3

Snapshots of the true flow compared with reconstruction via the Liouville DMD model in (10) and the scaled Liouville DMD model in (12) with \(a=0.99\)

5.2 SsVEP Dataset

This experiment uses data from an electroencephalography (EEG) recording of the visual cortex of one human participant during the active viewing of flickering images (Gruss and Keil 2019). By modulating the luminance or contrast of an image at a constant rate (e.g., 12 Hz), image flickering reliably evokes the steady state visually evoked potential (SsVEP) in early visual cortex (Regan 1989; Petro et al. 2017), reflecting entrainment of neuronal oscillations at the same driving frequency. The SsVEP in the current data was evoked by a pattern-reversal Gabor patch flickering at 12 Hz (i.e., contrast-modulated) for a trial length of 7 seconds, with the greatest signal strength originating from the occipital pole (Oz) of a 129-electrode cap. Data was sampled at 500 Hz and band-pass filtered online from 0.5-48 Hz and offline from 3-40 Hz, with 53 trials retained for this individual after artifact rejection. Of these trials, the first 40 were used in the continuous time DMD method, and each trial was subdivided into 50 trajectories. SsVEP data have the advantage of an exceedingly high signal-to-noise ratio and high phase coherence due to the oscillatory nature of the signal, making them ideally suited for signal detection algorithms, such as brain-computer interfaces (Bakardjian et al. 2010; Bin et al. 2009; Middendorf et al. 2000).

In this setting each independent trial can be used as a trajectory for a single occupation kernel. This differs from the implementation of Koopman-based DMD, where most often each snapshot corresponds to a single trajectory. The continuous time DMD method was performed using the Gaussian kernel function with \(\mu =50\).

Figure 4 presents the obtained eigenvalues, and Fig. 5 gives the log-scaled spectrum obtained from the eigenvectors. It can be seen that the spectrum has strong peaks near the 12 Hz range, which suggests that the continuous time DMD procedure using occupation kernels can extract frequency information without using shifted copies of the trajectories as in Kutz et al. (2016).
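For reference, a continuous time eigenvalue \(\lambda = \sigma + i\omega \) corresponds to an oscillation at \(\omega /(2\pi )\) Hz, which is how peaks in the spectrum are associated with the 12 Hz driving frequency. A minimal sketch (the eigenvalues below are illustrative, not the values computed from the SsVEP data):

```python
import numpy as np

# Continuous time eigenvalues lambda = sigma + i*omega oscillate at
# omega / (2*pi) Hz; a 12 Hz SsVEP component corresponds to
# Im(lambda) = 2*pi*12. These eigenvalues are illustrative only.
lambdas = np.array([-0.5 + 2j * np.pi * 12.0,
                    -0.5 - 2j * np.pi * 12.0,
                    -1.0 + 2j * np.pi * 6.0])
freqs_hz = np.abs(lambdas.imag) / (2.0 * np.pi)
```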

For this example, the resultant dimensionality of Koopman-based DMD makes the analysis of this data set intractable without discarding a significant number of samples.

Fig. 4

Eigenvalues corresponding to the SsVEP dataset from Gruss and Keil (2019). This plot is on the complex plane, where the vertical axis indicates the imaginary part of the eigenvalue, and the horizontal axis indicates the real part

Fig. 5

Rescaled spectrum obtained from the SsVEP dataset. This does not quite correspond to the spectrum that would be computed through the Fourier transform. However, note the significant peak around 12 Hz, which corresponds to the SsVEP

6 Discussion

6.1 Unboundedness of Liouville and Koopman Operators

Traditional DMD approaches aim to estimate a continuous nonlinear dynamical system by first selecting a fixed time-step and then investigating the induced discretized dynamics through the Koopman operator. The algorithm developed in this manuscript estimates the continuous nonlinear dynamics directly by employing occupation kernels, which represent trajectories via an integration functional that interfaces with the Liouville operator. That is, the principal advantage realized through DMD using Liouville operators and occupation kernels over that of kernel-based DMD and the Koopman operator is that the resulting finite-rank representation corresponds to a continuous time system rather than a discrete time proxy. This is significant, since not all continuous time systems can be discretized for use with the Koopman operator framework. Moreover, through employment of scaled Liouville operators, many dynamical systems yield a compact operator over the Bargmann-Fock space, which allows for norm convergence of DMD procedures.

Liouville operators are unbounded in most cases due to the inclusion of the gradient in their definition. Koopman operators are also unbounded in all but a few cases. In the specific instance where the selected kernel function is the exponential dot product kernel, Koopman operators are only bounded if the dynamics are affine (cf. Carswell et al. 2003). In contrast, large classes of both Liouville and Koopman operators are densely defined and closed operators over RKHSs. Thus, connections between DMD and Koopman/Liouville operators need to generally rely on the theory of unbounded operators.

6.2 Finite Rank Representations

Since Liouville operators are generally unbounded, convergence of the finite rank representation (in the norm topology) of the method in Sect. 4 cannot be established for most selections of f. Moreover, the selection of observables on which the operator is applied must come from the functions that reside in the domain of the Liouville operator, \({\mathcal {D}}(A_f)\). As bounded Koopman operators are rare as well, the need for care in the selection of observables is shared by both operators. In the design of the algorithm of this manuscript, an additional assumption was made where the domain of the Liouville operator was required to contain the occupation kernels corresponding to the observed trajectories. It should be noted that even if the occupation kernels are not in the domain of the Liouville operator, they are always in the domain of the adjoint of the Liouville operator, as long as the Liouville operator is closed and densely defined. As a result, an alternative DMD algorithm may be designed using the action of the adjoint on the occupation kernels. Interestingly, as evidenced by (4) and (8), the only adjustment to the algorithm in this setting is transposition of the matrix \({\mathcal {I}}\).

6.3 Approximating the Full State Observable

The decomposition of the full state observable relies strongly on selection of the RKHS. In the case of the Bargmann-Fock space, \(x \mapsto (x)_i\) is a function in the space for each \(i=1,\ldots ,n\). However, this is not the case for the native space of the Gaussian radial basis function kernel, which does not contain any polynomials. In both cases, the spaces are universal, which means that any continuous function may be arbitrarily well estimated by a function in the space with respect to the supremum norm over a compact subset. Thus, it is not expected that a good approximation of the full state observable will hold over all of \({\mathbb {R}}^n\), but a sufficiently small estimation error is possible over a compact workspace.

6.4 Scaled Liouville Operators

One advantage of the Liouville approach to DMD is that the Liouville operators may be readily modified to generate a compact operator through the so-called scaled Liouville operator. A large class of dynamics correspond to compact scaled Liouville operators, while Koopman operators cannot be modified in a similar fashion. The availability of this compact modification indicates that, on an operator theoretic level, the study of nonlinear dynamical systems through Liouville operators allows for more flexibility.

The experiments presented in Sect. 5 demonstrate that the Liouville modes obtained with the continuous time DMD procedure using Liouville operators and occupation kernels are similar in form to the Koopman modes obtained using kernel-based extended DMD (Williams et al. 2015b). Moreover, occupation kernels allow for trajectories to be utilized as a fundamental unit of data, which can reduce the dimensionality of the learning problem while retaining some fidelity that would be otherwise lost through discarding data.

6.5 Time Varying Systems

The present framework can be adapted to handle time varying systems of the form \(\dot{x} = f(t,x)\) with little adjustment. In particular, an analysis of this system may be achieved through state augmentation, where time is included as a state variable as \(z = [ t, x^T]^T,\) which leads to an adjusted dynamical system given as \(\dot{z} = [1, f(t,x)^T]^T\). Hence, the analysis of time varying dynamics is included in the present approach.
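The augmentation can be sketched directly (Python; the example dynamics \(f(t,x) = -x + \sin (t)\) and the hand-rolled RK4 integrator are illustrative assumptions):

```python
import numpy as np

def f(t, x):
    # illustrative time varying dynamics: xdot = -x + sin(t)
    return -x + np.sin(t)

def f_aug(z):
    # augmented autonomous dynamics: z = [t, x^T]^T, zdot = [1, f(t, x)^T]^T
    return np.concatenate(([1.0], f(z[0], z[1:])))

def rk4_step(z, h):
    # classical fourth-order Runge-Kutta step for the augmented system
    k1 = f_aug(z)
    k2 = f_aug(z + 0.5 * h * k1)
    k3 = f_aug(z + 0.5 * h * k2)
    k4 = f_aug(z + h * k3)
    return z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Integrate the augmented system; the first state recovers time itself
h, steps = 0.01, 200
z = np.array([0.0, 1.0])   # t = 0, x(0) = 1
for _ in range(steps):
    z = rk4_step(z, h)
```

After integration, the first component of z equals the elapsed time, and the remaining components carry the state, so the augmented trajectories can be fed to the occupation kernel machinery unchanged.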

6.6 Strong Operator Topology Convergence Versus Norm Convergence

One of the major contributions of this manuscript is the definition of the scaled Liouville operators, which, for certain selections of a, are compact over the native space of the exponential dot product kernel. This compactness enables the norm convergence of DMD routines, where the finite rank operators constructed for DMD are essentially operator level interpolants.

Presently, the best convergence results for DMD methods are SOT convergence results (Korda and Mezić 2018). SOT convergence yields pointwise convergence of operators in that a sequence of bounded operators, \(T_m\), converges to T in SOT if and only if \(T_m g \rightarrow Tg\) for all \(g \in H\). This mode of convergence is limited, and not entirely appropriate for spectral methods like DMD, where the only guarantees provided are that the spectrum of the limiting operator may be obtained as a subsequence of the members of the spectrum of the sequence of operators under consideration. Thus, as observed by the authors of Korda and Mezić (2018), infinitely many operators \(T_m\), from the sequence of operators converging in SOT to T, may not be part of the subsequence for convergence, and as a result, may have dramatically different spectra from T. Moreover, the convergence result for Koopman-based DMD is a special case of a more general theorem, which implies that finite rank operators are dense in the collection of bounded operators with respect to the SOT (Pedersen 2012).

In contrast, norm convergence of operators is much stronger: if two operators are close in norm, then their spectra are also close. Hence, convergence in norm of a sequence of bounded operators, \(T_m\), to an operator T guarantees the convergence of the spectra. Since DMD is a method where finite rank operators are designed to represent an unknown operator, the only operators amenable to norm convergence are compact operators. Compactness of the scaled Liouville operators thus allows for norm convergence of the finite rank approximations, generated for DMD, to the respective scaled Liouville operators. As a result, convergence of the estimated spectra to the spectra of the scaled Liouville operators is also established. More information concerning spectral theory and operator theory in general can be found in Pedersen (2012).

Scaled Liouville operators are compact over the native space of the exponential dot product kernel for a wide range of dynamical systems, including all polynomial dynamical systems. In contrast, every Koopman operator corresponding to a discretization of the trivial dynamics \(\dot{x} = 0\) is the identity operator (for any selection of underlying function space), which is not compact when the underlying function space is an infinite dimensional Hilbert space.

7 Conclusions

In this paper, the notion of occupation kernels is leveraged to enable spectral analysis of the Liouville operator via DMD. A family of scaled Liouville operators is introduced and shown to be compact, which allows for norm convergence of the DMD procedure. Two examples are presented, one from fluid dynamics and another from EEG, which demonstrate reconstruction of trajectories, approximation of the spectrum, and a comparison of Liouville and scaled Liouville DMD.

The method presented here provides a new approach to DMD and builds the operator theoretic foundations for spectral decomposition of continuous time dynamical systems. By targeting the DMD procedure towards Liouville operators, which include Koopman generators as a proper subset, continuous time dynamical systems are modeled directly, without discretization. Moreover, by obviating the limiting process used in the definition of Koopman generators, in favor of direct formulation via Liouville operators, the requirement of forward completeness is relaxed and the resulting methods are applicable to a much broader class of dynamical systems.