1 Introduction

The numerical solution of complex systems of differential equations plays an important role in many areas such as molecular dynamics, fluid dynamics, mechanical as well as electrical engineering, and physics. These systems often exhibit multi-scale behavior which can be due to the coupling of subsystems with different time scales—for instance, fast electrical and slow mechanical components—or due to the intrinsic properties of the system itself—for instance, the fast vibrations and slow conformational changes of molecules. Analyzing such problems using transfer operator-based methods is often infeasible or prohibitively expensive from a computational point of view due to the so-called curse of dimensionality. One possibility to avoid this is to project the dynamics of the high-dimensional system onto a lower-dimensional space and to then analyze the reduced system representing, for instance, only the relevant slow dynamics, see, for example, Pérez-Hernández et al. 2013; Froyland et al. 2014.

In this paper, we will introduce different methods such as time-lagged independent component analysis (TICA) (Molgedey and Schuster 1994; Pérez-Hernández et al. 2013; Schwantes and Pande 2013) and dynamic mode decomposition (DMD) (Schmid 2010; Chen et al. 2012; Tu et al. 2014; Kutz et al. 2016) to identify the dominant dynamics using only simulation data or experimental data. It was shown that these methods are related to Koopman operator approximation techniques (Koopman 1931; Mezić 2005; Rowley et al. 2009; Budišić et al. 2012). Extensions of the aforementioned methods called the variational approach of conformation dynamics (VAC) (Noé and Nüske 2013; Nüske et al. 2014, 2016) developed mainly for reversible molecular dynamics problems and extended dynamic mode decomposition (EDMD) (Williams et al. 2015a, b; Klus et al. 2016) can be used to compute eigenfunctions, eigenvalues, and eigenmodes of the Koopman operator (and its adjoint, the Perron–Frobenius operator). Interestingly, although the underlying ideas, derivations, and intended applications of these methods differ, the resulting algorithms share a lot of similarities. The goal of this paper is to show the equivalence of different data-driven methods which have been widely used by the dynamical systems, fluid dynamics, and molecular dynamics communities, but under different names. Hence, extensions, generalizations, and algorithms developed for one method can be carried over to its counterparts, resulting in a unified theory and set of tools. An alternative approach to data-driven model reduction—also related to transfer operators and their generators—would be to use diffusion maps (Nadler et al. 2006; Coifman et al. 2008; Ferguson et al. 2011; Giannakis 2015). Manifold learning methods, however, are beyond the scope of this paper.

The outline of the paper is as follows: Sect. 2 briefly introduces transfer operators and the concept of reversibility. In Sect. 3, different data-driven methods for the approximation of the eigenvalues, eigenfunctions, and eigenmodes of transfer operators will be described. The theoretical background and the derivation of these methods will be outlined in Sect. 4. Section 5 addresses open problems and lists possible future work.

2 Transfer Operators and Reversibility

In the literature, the term transfer operator is sometimes used in different contexts. In this section, we will briefly introduce the Perron–Frobenius operator, the Perron–Frobenius operator with respect to the equilibrium density, and the Koopman operator. All these three operators are, according to our definition, transfer operators.

2.1 Guiding Example

Our paper will deal with data-driven methods to analyze both stochastic and deterministic dynamical systems. To illustrate the concepts of transfer operators and their spectral components, we first introduce a simple stochastic dynamical system that will be revisited throughout the paper.

Example 2.1

Consider the following one-dimensional Ornstein–Uhlenbeck process, given by an Itô stochastic differential equation of the form:

$$\begin{aligned} \mathrm {d}\varvec{X}_{t}=-\alpha D\,\varvec{X}_{t}\,\mathrm {d}t+\sqrt{2D}\,\mathrm {d}\varvec{W}_{t}. \end{aligned}$$

Here, \(\{W_{t}\}_{t\ge 0}\) is a one-dimensional standard Wiener process (Brownian motion), the parameter \(\alpha \) is the friction coefficient, and \(D=\beta ^{-1}\) is the diffusion coefficient. The stochastic forcing usually models physical effects, most often thermal fluctuations, and it is customary to call \(\beta \) the inverse temperature.

The transition density of the Ornstein–Uhlenbeck process, i.e., the conditional probability density to find the process near y a time \(\tau \) after it had been at x, is given by

$$\begin{aligned} p_{\tau }(x,y)=\frac{1}{\sqrt{2\,\pi \,\sigma ^{2}(\tau )}}\exp \left( -\frac{\left( y-x\,e^{-\alpha D\tau }\right) ^{2}}{2\,\sigma ^{2}(\tau )}\right) , \end{aligned}$$
(1)

where \(\sigma ^{2}(\tau )=\alpha ^{-1}\left( 1-e^{-2\alpha D\tau }\right) \). Figure 1a shows the transition densities for different values of \(\tau \). More details can be found in Pavliotis (2014). For complex dynamical systems, the transition density is not known explicitly, but must be estimated from simulation or measurement data.

In this work we will describe the dynamics of a system in terms of dynamical operators such as the propagator \(\mathcal {P}_{\tau }\), which is defined by the transition density \(p_{\tau }(x,y)\) and propagates a probability density of Brownian walkers in time by

$$\begin{aligned} p_{t+\tau }(x)=\mathcal {P}_{\tau }\,p_{t}(x). \end{aligned}$$

See Fig. 1b for the time evolution of the Ornstein–Uhlenbeck process initiated from a localized starting condition. It can be seen that the distribution spreads out and converges toward a Gaussian distribution, which is then invariant in time. For this simple dynamical system we can give the equation for the invariant density explicitly:

$$\begin{aligned} \pi (x)=\frac{1}{\sqrt{2\,\pi \,\alpha ^{-1}}}\exp \left( -\frac{x^{2}}{2\,\alpha ^{-1}}\right) , \end{aligned}$$
(2)

which is a Gaussian whose variance is decreasing with increasing friction and decreasing temperature. \(\square \)
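
The quantities in this example are simple enough to check numerically. The following minimal Python sketch (using \(\alpha =4\) and \(D=0.25\) as in Fig. 1; the grid-based quadrature is our own illustrative choice) evaluates the transition density (1), applies the propagator to a peaked initial density, and measures the deviation from the invariant density (2):

```python
import numpy as np

alpha, D = 4.0, 0.25   # parameters as in Fig. 1

def transition_density(x, y, tau):
    """Transition density p_tau(x, y) of the Ornstein-Uhlenbeck process, Eq. (1)."""
    sigma2 = (1.0 - np.exp(-2.0 * alpha * D * tau)) / alpha
    return np.exp(-(y - x * np.exp(-alpha * D * tau)) ** 2 / (2.0 * sigma2)) \
        / np.sqrt(2.0 * np.pi * sigma2)

def invariant_density(x):
    """Invariant density pi(x), Eq. (2)."""
    return np.sqrt(alpha / (2.0 * np.pi)) * np.exp(-0.5 * alpha * x ** 2)

# Apply the propagator P_tau to a peaked initial density by discretizing the integral
# (P_tau p)(x) = int p_tau(y, x) p(y) dy on an equidistant grid.
x = np.linspace(-2.0, 2.0, 401)
dx = x[1] - x[0]
p0 = invariant_density((x - 1.0) / 0.1) / 0.1             # narrow peak centered at x = 1
for tau in (0.1, 1.0, 10.0):
    P = transition_density(x[:, None], x[None, :], tau)   # P[i, j] = p_tau(x_i, x_j)
    p_t = dx * P.T @ p0                                   # quadrature approximation of P_tau p0
    print(tau, np.max(np.abs(p_t - invariant_density(x))))
```

The printed deviation decreases rapidly with \(\tau \), in line with the relaxation toward \(\pi \) shown in Fig. 1b.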

Fig. 1

a Transition density function of the Ornstein–Uhlenbeck process for different values of \(\tau \). If \(\tau \) is small, values starting in x will stay close to it. For larger values of \(\tau \), the influence of the starting point x is negligible. Densities converge to the equilibrium density, denoted by \(\pi \). Here, \(\alpha =4\) and \(D=0.25\). b Evolution of the probability to find the dynamical system at any point x over time t, after starting with a peaked distribution at \(t=0\). We show the resulting distributions at times \(t=0.1\), \(t=1\), and \(t=10\). The system relaxes toward the stationary density \(\pi (x)\)

2.2 Transfer Operators

Let \(\{\mathbf {X}_{t}\}_{t\ge 0}\) be a time-homogeneous stochastic process defined on the bounded state space \(\mathbb {X}\subset \mathbb {R}^{d}\). It can be genuinely stochastic or it might as well be deterministic, such that there is a flow map \(\Phi _{\tau }:\mathbb {X}\rightarrow \mathbb {X}\) with \(\Phi _{\tau }(\mathbf {X}_{t})=\mathbf {X}_{t+\tau }\) for \(\tau \ge 0\). Let the measure \(\mathbb {P}\) denote the law of the process \(\{\mathbf {X}_{t}\}_{t\ge 0}\) that we will study in terms of its statistical transition properties. To this end, under some mild regularity assumptions which are satisfied by Itô diffusions with smooth coefficients (Hopf 1954; Krengel 1985), we can give the following definition.

Definition 2.2

The transition density function \(p_{\tau }:\mathbb {X}\times \mathbb {X}\rightarrow [0,\,\infty ]\) of a process \(\{\mathbf {X}_{t}\}_{t\ge 0}\) is defined by

$$\begin{aligned} \mathbb {P}[\mathbf {X}_{t+\tau }\in \mathbb {A}\,\vert \,\mathbf {X}_{t}=x]=\intop _{\mathbb {A}}p_{\tau }(x,y)\,\mathrm {d}y, \end{aligned}$$

for every measurable set \(\mathbb {A}.\) Here and in what follows, \(\mathbb {P}[\,\cdot \,\vert \,\mathfrak {E}]\) denotes probabilities conditioned on the event \(\mathfrak {E}\). That is, \(p_{\tau }(x,y)\) is the conditional probability density of \(\mathbf {X}_{t+\tau }=y\) given that \(\mathbf {X}_{t}=x\).

For \(1\le r\le \infty \), the spaces \(L^{r}(\mathbb {X})\) denote the usual spaces of r-Lebesgue integrable functions, which are Banach spaces with the corresponding norms \(\Vert \cdot \Vert _{L^{r}}\).

Definition 2.3

Let \(p_{t}\in L^{1}(\mathbb {X})\) be the probability density and \(f_{t}\in L^{\infty }(\mathbb {X})\) an observable of the system. For a given lag time \(\tau \):

  (a)

    The Perron–Frobenius operator or propagator \(\mathcal {P}_{\tau }:L^{1}(\mathbb {X})\rightarrow L^{1}(\mathbb {X})\) is defined by

    $$\begin{aligned} \mathcal {P}_{\tau }p_{t}(x)=\intop _{\mathbb {X}}p_{\tau }(y,x)\,p_{t}(y)\,\mathrm {d}y. \end{aligned}$$
  (b)

    The Koopman operator \(\mathcal {K}_{\tau }:L^{\infty }(\mathbb {X})\rightarrow L^{\infty }(\mathbb {X})\) is defined by

    $$\begin{aligned} \mathcal {K}_{\tau }f_{t}(x)=\intop _{\mathbb {X}}p_{\tau }(x,y)\,f_{t}(y)\,\mathrm {d}y=\mathbb {E}[f_{t}(\mathbf {X}_{t+\tau })\,\vert \,\mathbf {X}_{t}=x]. \end{aligned}$$

Both \(\mathcal {P}_{\tau }\) and \(\mathcal {K}_{\tau }\) are linear but infinite-dimensional operators which are adjoint to each other with respect to the standard duality pairing \(\langle \cdot ,\,\cdot \rangle \), defined by \(\langle f,\,g\rangle =\int _{\mathbb {X}}f(x)\,g(x)\,\mathrm{d}x\). The homogeneity of the stochastic process \(\{\mathbf {X}_{t}\}_{t\ge 0}\) implies the so-called semigroup property of the operators, i.e., \(\mathcal {P}_{\tau +\sigma }=\mathcal {P_{\tau }}\mathcal {P}_{\sigma }\) and \(\mathcal {K}_{\tau +\sigma }=\mathcal {K_{\tau }}\mathcal {K_{\sigma }}\) for \(\tau ,\sigma \ge 0\). In other words, these operators describe time-stationary Markovian dynamics. While the Perron–Frobenius operator describes the evolution of densities, the Koopman operator describes the evolution of observables. For the analysis of the long-term behavior of dynamical systems, densities that remain unchanged by the dynamics play an important role (one can think of the concept of ergodicity).

Definition 2.4

A density \(\pi \) is called an invariant density or equilibrium density if \(\mathcal {P}_{\tau }\,\pi =\pi \). That is, the equilibrium density \(\pi \) is an eigenfunction of the Perron–Frobenius operator \(\mathcal {P}_{\tau }\) with corresponding eigenvalue 1.

In what follows, \(L_{\pi }^{r}(\mathbb {X})\) denotes the weighted \(L^{r}\)-space of functions f such that \(\Vert f\Vert _{L_{\pi }^{r}}:=\int _{\mathbb {X}}|f(x)|^{r}\pi (x)\,\mathrm{d}x<\infty \). While one can consider the evolution of densities with respect to any density, we are particularly interested in the evolution with respect to the equilibrium density. From this point on, we assume there is a unique invariant density. This assumption is typically satisfied for molecular dynamics applications, where the invariant density is given by the Boltzmann distribution.

Definition 2.5

Let \(L_{\pi }^{1}(\mathbb {X})\ni u_{t}(x)=\pi (x)^{-1}\,p_{t}(x)\) be a probability density with respect to the equilibrium density \(\pi \). Then the Perron–Frobenius operator (propagator) with respect to the equilibrium density, denoted by \( \mathcal {T}_\tau \), is defined by

$$\begin{aligned} \mathcal {T}_{\tau }u_{t}(x) = \intop _{\mathbb {X}}\frac{\pi (y)}{\pi (x)}\,p_{\tau }(y,x)\,u_{t}(y)\,\mathrm {d}y. \end{aligned}$$

The operators \(\mathcal {P}_{\tau }\) and \(\mathcal {K}_{\tau }\) can be defined on other spaces \( L^r \) and \( L^{r'} \), with \( r \ne 1 \) and \( r' \ne \infty \), see (Baxter and Rosenthal 1995; Klus et al. 2016) for more details. By defining the weighted duality pairing \(\langle f,g\rangle _{\pi }=\int _{\mathbb {X}}f(x)\,g(x)\,\pi (x)\,\mathrm{d}x\) for \(f\in L_{\pi }^{r}(\mathbb {X})\) and \(g\in L_{\pi }^{r'}(\mathbb {X})\), where \(\frac{1}{r}+\frac{1}{r'}=1\), \(\mathcal {T}_{\tau }\) defined on \(L_{\pi }^{r'}(\mathbb {X})\) is the adjoint of \(\mathcal {K}_{\tau }\) defined on \(L_{\pi }^{r}(\mathbb {X})\) with respect to \(\langle \cdot ,\,\cdot \rangle _{\pi }\):

$$\begin{aligned} \langle \mathcal {K}_{\tau }f,\,g\rangle _{\pi } = \langle f,\,\mathcal {T}_{\tau }g\rangle _{\pi }. \end{aligned}$$

For more details, see (Lasota and Mackey 1994; Schütte et al. 1999; Froyland et al. 2013; Noé and Nüske 2013; Nüske et al. 2014; Klus et al. 2016; Wu et al. 2017). The two operators \(\mathcal {P}_{\tau }\) and \(\mathcal {T}_{\tau }\) are often referred to as forward operators, whereas \(\mathcal {K}_{\tau }\) is also called the backward operator, as they are the solution operators of the forward (Fokker–Planck) and backward Kolmogorov equations (Lasota and Mackey 1994, Section 11), respectively.

2.3 Spectral Decomposition of Transfer Operators

In what follows, let \(\mathcal {A}_{\tau }\) denote one of the transfer operators defined above, i.e., \(\mathcal {P}_{\tau }\), \(\mathcal {T}_{\tau }\), or \(\mathcal {K}_{\tau }\). We are particularly interested in computing eigenvalues \(\lambda _{\ell }(\tau ) \in \mathbb {C}\) and eigenfunctions \(\varphi _{\ell }:\mathbb {X}\rightarrow \mathbb {C}\) of transfer operators, i.e.,

$$\begin{aligned} \mathcal {A}_{\tau }\varphi _{\ell } = \lambda _{\ell }(\tau ) \,\varphi _{\ell }. \end{aligned}$$

Note that the eigenvalues depend on the lag time \( \tau \). For the sake of simplicity, we will often omit this dependency. The eigenvalues and eigenfunctions of transfer operators contain important information about the global properties of the system such as metastable sets or fast and slow processes and can also be used as reduced coordinates, see (Dellnitz and Junge 1999; Schütte et al. 1999; Mezić 2005; Schütte and Sarich 2013; Froyland et al. 2013, 2014) and references therein.

Example 2.6

The eigenvalues \(\lambda _{\ell }\) and eigenfunctions \(\varphi _{\ell }\) of \(\mathcal {K_{\tau }}\) associated with the Ornstein–Uhlenbeck process introduced in Example 2.1 are given by

$$\begin{aligned} \lambda _{\ell }(\tau )=e^{-\alpha D\,(\ell -1)\,\tau },\quad \varphi _{\ell }(x)=\frac{1}{\sqrt{(\ell -1)!}}\,H_{\ell -1}\left( \sqrt{\alpha }\,x\right) ,\quad \ell =1,2,\dots , \end{aligned}$$

where \(H_{\ell }\) denotes the \(\ell \)th probabilists’ Hermite polynomial (Pavliotis 2014). The eigenfunctions of \(\mathcal {P}_{\tau }\) are given by the eigenfunctions of \(\mathcal {K}_{\tau }\) multiplied by the equilibrium density \(\pi \), see also Fig. 2. \(\square \)
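
For reference, these analytic eigenpairs can be evaluated directly with NumPy's probabilists' Hermite polynomials; the sketch below again assumes \(\alpha =4\), \(D=0.25\), and \(\tau =1\):

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval

alpha, D, tau = 4.0, 0.25, 1.0   # illustrative parameter values

def ou_eigenvalue(l):
    """Koopman eigenvalue lambda_l(tau) of the Ornstein-Uhlenbeck process."""
    return np.exp(-alpha * D * (l - 1) * tau)

def ou_eigenfunction(l, x):
    """Koopman eigenfunction phi_l(x) = H_{l-1}(sqrt(alpha) x) / sqrt((l-1)!)."""
    coeffs = np.zeros(l)
    coeffs[-1] = 1.0   # select the probabilists' Hermite polynomial H_{l-1}
    return hermeval(np.sqrt(alpha) * x, coeffs) / np.sqrt(factorial(l - 1))

x = np.linspace(-2.0, 2.0, 5)
for l in range(1, 5):
    print(l, ou_eigenvalue(l), ou_eigenfunction(l, x))
```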

Fig. 2

Dominant eigenfunctions of the Ornstein–Uhlenbeck process computed analytically (dotted lines) and using VAC/EDMD (solid lines). a Eigenfunctions of the Perron–Frobenius operator \(\mathcal {P}_{\tau }\). b Eigenfunctions of the Koopman operator \(\mathcal {K_{\tau }}\)

In addition to the eigenvalues and eigenfunctions, an essential part of the Koopman operator analysis is the set of Koopman modes for the so-called full-state observable \(g(x)=x\). The Koopman modes are vectors that, together with the eigenvalues and eigenfunctions, allow us to reconstruct and to propagate the system’s state (Williams et al. 2015a). More precisely, assume that each component \(g_{i}\) of the full-state observable, i.e., \(g_{i}(x)=x_{i}\) for \(i=1,\,\dots ,\,d\), can be written in terms of the eigenfunctions as \(g_{i}(x)=\sum _{\ell }\varphi _{\ell }(x)\,\eta _{i\ell }\). Defining the Koopman modes by \(\eta _{\ell }=[\eta _{1\ell },\dots ,\,\eta _{d\ell }]^{T}\), we obtain \(g(x) = x=\sum _{\ell }\varphi _{\ell }(x)\,\eta _{\ell }\) and thus

$$\begin{aligned} \mathcal {K}_{\tau }g(x) = \mathbb {E}[g(\mathbf {X}_{\tau })\,\vert \,\mathbf {X}_{0}=x] = \mathbb {E}[\mathbf {X}_{\tau }\,\vert \,\mathbf {X}_{0}=x] = \sum _{\ell }\lambda _{\ell }(\tau )\,\varphi _{\ell }(x)\,\eta _{\ell }. \end{aligned}$$
(3)

For vector-valued functions, the Koopman operator is defined to act componentwise. In order to be able to compute eigenvalues, eigenfunctions, and eigenmodes numerically, we project the infinite-dimensional operators onto finite-dimensional spaces spanned by a given set of basis functions. This will be described in detail in Sect. 4.

2.4 Reversibility

We briefly recapitulate the properties of reversible systems. For many applications, including commonly used molecular dynamics models, the dynamics in full phase space are known to be reversible.

Definition 2.7

A system is said to be reversible if the so-called detailed balance condition is fulfilled, i.e., it holds for all \(x,y\in \mathbb {X}\) that

$$\begin{aligned} \pi (x)\,p_{\tau }(x,y)=\pi (y)\,p_{\tau }(y,x). \end{aligned}$$
(4)

Example 2.8

The Ornstein–Uhlenbeck process is reversible. It is straightforward to verify that (4) is fulfilled by the transition density (1) and the stationary density (2) for all values of x, y, and \(\tau \). More generally, Smoluchowski equations for a d-dimensional system of the form

$$\begin{aligned} \mathrm {d}\mathbf {X}_{t} =-D\nabla V(\mathbf {X}_{t})\,\mathrm {d}t+\sqrt{2D}\,\mathrm {d}\mathbf {W}_{t} \end{aligned}$$

with dimensionless potential V(x) are reversible. The stationary density is then given by \(\pi \varpropto \exp (-V(x))\) (Leimkuhler et al. 2006). \(\square \)

As a result of the detailed balance condition, the Koopman operator \(\mathcal {K}_{\tau }\) and the Perron–Frobenius operator with respect to the equilibrium density, \(\mathcal {T}_{\tau }\), are identical (hence also self-adjoint):

$$\begin{aligned} \mathcal {K_{\tau }}f=\intop _{\mathbb {X}}p_{\tau }(x,y)\,f(y)\,\mathrm {d}y=\intop _{\mathbb {X}}\frac{\pi (y)}{\pi (x)}\,p_{\tau }(y,x)\,f(y)\,\mathrm {d}y=\mathcal {T}_{\tau }\,f. \end{aligned}$$

Moreover, both \(\mathcal {K}_{\tau }\) and \(\mathcal {P}_{\tau }\) become self-adjoint with respect to the stationary density, i.e.,

$$\begin{aligned} \langle \mathcal {P}_{\tau }f,\,g\rangle _{\pi ^{-1}}&=\langle f,\mathcal {\,P}_{\tau }g\rangle _{\pi ^{-1}},\\ \langle \mathcal {K_{\tau }}f,\,g\rangle _{\pi }&=\langle f,\,\mathcal {K}_{\tau }g\rangle _{\pi }. \end{aligned}$$

Hence, the eigenvalues \(\lambda _{\ell }\) are real and the eigenfunctions \(\varphi _{\ell }\) of \(\mathcal {K}_{\tau }\) form an orthogonal basis with respect to \(\langle \cdot ,\,\cdot \rangle _{\pi }\). That is, the eigenfunctions can be scaled so that \(\langle \varphi _{\ell },\,\varphi _{\ell '}\rangle _{\pi }=\delta _{\ell \ell '}\). Furthermore, the leading eigenvalue \(\lambda _{1}\) is the only eigenvalue with absolute value 1 and we obtain

$$\begin{aligned} 1=\lambda _{1}>\lambda _{2}\ge \lambda _{3}\ge \dots , \end{aligned}$$

see, for example, Nüske et al. (2014). We can then expand a function \(f\in L_{\pi }^{2}(\mathbb {X})\) in terms of the eigenfunctions as \(f=\sum _{\ell =1}^{\infty }\langle f,\,\varphi _{\ell }\rangle _{\pi }\,\varphi _{\ell }\) such that

$$\begin{aligned} \mathcal {K}_{\tau }f=\sum _{\ell =1}^{\infty }\lambda _{\ell }(\tau )\,\langle f,\,\varphi _{\ell }\rangle _{\pi }\,\varphi _{\ell }. \end{aligned}$$
(5)

Furthermore, the eigenvalues decay exponentially as \(\lambda _{\ell }(\tau )=\exp (-\kappa _{\ell }\tau )\), with relaxation rate \(\kappa _{\ell }\) and relaxation timescale \(t_{\ell }=\kappa _{\ell }^{-1}\). Thus, for a sufficiently large lag time \(\tau \), the fast relaxation processes have decayed and (5) can be approximated by finitely many terms. The propagator \(\mathcal {P}_{\tau }\) has the same eigenvalues and the eigenfunctions \(\tilde{\varphi }_{\ell }\) are given by \(\tilde{\varphi }_{\ell }(x)=\pi (x)\,\varphi _{\ell }(x)\).

3 Data-Driven Approximation of Transfer Operators

In this section, we will describe different data-driven methods to identify the dominant dynamics of dynamical systems and to compute eigenfunctions of transfer operators associated with the system, namely TICA and DMD as well as VAC and EDMD. A formal derivation of methods to compute finite-dimensional approximations of transfer operators, resulting in the aforementioned methods, will be given in Sect. 4. Although TICA can be regarded as a special case of VAC, and DMD as a special case of EDMD, these methods are often used in different settings. With the aid of TICA, for instance, it is possible to identify the main slow coordinates and to project the dynamics onto the resulting reduced space, which can then be discretized using conventional Markov state models (a special case of VAC or EDMD, respectively, see Sect. 3.5). We will introduce the original methods—TICA and DMD—first and then extend these methods to the more general case. Since many publications use a different notation, we start with the required basic definitions.

In what follows, let \(x_{i},y_{i}\in \mathbb {R}^{d}\), \(i=1,\dots ,m\), be a set of pairs of d-dimensional data vectors, where \(x_{i}=\mathbf {X}_{t_{i}}\) and \(y_{i}=\mathbf {X}_{t_{i}+\tau }\). Here, the underlying dynamical system is not necessarily known; the vectors \(x_{i}\) and \(y_{i}\) can simply be measurement data or data from a black-box simulation. In matrix form, this can be written as

$$\begin{aligned} X=\begin{bmatrix}x_{1}&x_{2}&\cdots&x_{m}\end{bmatrix}\quad \text {and}\quad Y=\begin{bmatrix}y_{1}&y_{2}&\cdots&y_{m}\end{bmatrix}, \end{aligned}$$
(6)

with \(X,\,Y\in \mathbb {R}^{d\times m}\). If one long trajectory \(\{z_{0},\,z_{1},\,z_{2},\,\dots \}\) of a dynamical system is given, i.e., \(z_{i}=\mathbf {X}_{t_{0}+h\,i}\), where h is the step size and \(\tau =n_{\tau }\,h\) the lag time, we obtain

$$\begin{aligned} X=\begin{bmatrix}z_{0}&z_{1}&\cdots&z_{m-1}\end{bmatrix}\quad \text {and}\quad Y=\begin{bmatrix}z_{n_{\tau }}&z_{n_{\tau }+1}&\cdots&z_{n_{\tau }+m-1}\end{bmatrix}. \end{aligned}$$

That is, in this case Y is simply X shifted by the lag time \(\tau \). Naturally, if more than one trajectory is given, the data matrices X and Y can be concatenated.
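
In code, assembling X and Y from a single trajectory amounts to a shift by \(n_{\tau }\) steps; a minimal sketch (the array and function names are our own):

```python
import numpy as np

def data_matrices(traj, n_tau):
    """traj: array of shape (d, N) with snapshots as columns; n_tau: lag time in steps."""
    X = traj[:, :traj.shape[1] - n_tau]   # z_0, ..., z_{m-1}
    Y = traj[:, n_tau:]                   # z_{n_tau}, ..., z_{n_tau + m - 1}
    return X, Y
```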

In addition to the data, VAC and EDMD require a set of uniformly bounded basis functions or observables, given by \(\{\psi _{1},\,\psi _{2},\,\dots ,\,\psi _{k}\}\subset L^{\infty }(\mathbb {X})\). Since \(\mathbb {X}\) is assumed to be bounded, we have \(\psi _i\in L^r(\mathbb {X})\) for all \(i=1,\ldots ,k\) and \(1\le r\le \infty \). The basis functions could, for instance, be monomials, indicator functions, radial basis functions, or trigonometric functions. The optimal choice of basis functions remains an open problem and depends strongly on the system. If the set of basis functions is not sufficient to represent the eigenfunctions, the results will be inaccurate. A too large set of basis functions, on the other hand, might lead to ill-conditioned matrices and overfitting. Cross-validation strategies have been developed to detect overfitting (McGibbon and Pande 2015).

For a basis \(\psi _{i}\), \(i=1,\,\dots ,\,k\), define \(\psi :\mathbb {R}^{d}\rightarrow \mathbb {R}^{k}\) to be the vector-valued function given by

$$\begin{aligned} \psi (x) = [\psi _{1}(x),\,\psi _{2}(x),\,\dots ,\,\psi _{k}(x)]^{T}. \end{aligned}$$
(7)

The goal then is to find the best approximation of a given transfer operator in the space spanned by these basis functions. This will be explained in detail in Sect. 4. In addition to the data matrices X and Y, we will need the transformed data matrices

$$\begin{aligned} \Psi _{X} = \begin{bmatrix} \psi (x_{1})&\psi (x_{2})&\dots&\psi (x_{m}) \end{bmatrix} \quad \text {and} \quad \Psi _{Y} = \begin{bmatrix} \psi (y_{1})&\psi (y_{2})&\dots&\psi (y_{m}) \end{bmatrix}. \end{aligned}$$
(8)
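
For a one-dimensional state space and a monomial dictionary, the transformed data matrices (8) can be assembled as follows (the choice of basis is purely illustrative):

```python
import numpy as np

def monomial_basis(X, k):
    """Evaluate psi(x) = [1, x, ..., x^{k-1}]^T for every column of X (here d = 1)."""
    return np.vander(X.ravel(), k, increasing=True).T   # shape (k, m)

# Psi_X, Psi_Y = monomial_basis(X, 11), monomial_basis(Y, 11)
```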

3.1 Time-Lagged Independent Component Analysis

Time-lagged independent component analysis (TICA) was introduced in Molgedey and Schuster (1994) as a solution to the blind source separation problem, where the correlation matrix and the time-delayed correlation matrix are used to separate superimposed signals. The term TICA was introduced later (Hyvärinen et al. 2001). TDSEP (Ziehe and Müller 1998), an extension of TICA, is popular in the machine learning community. It was shown only recently that TICA is a special case of VAC: it computes the optimal linear projection for approximating the slowest relaxation processes and, as such, provides an approximation of the leading eigenvalues and eigenfunctions of transfer operators (Pérez-Hernández et al. 2013). TICA is now a popular dimension reduction technique in the field of molecular dynamics (Pérez-Hernández et al. 2013; Schwantes and Pande 2013). That is, TICA is used as a preprocessing step to reduce the size of the state space by projecting the dynamics onto the main coordinates. The time-lagged independent components are required (a) to be uncorrelated and (b) to maximize the autocovariances at lag time \(\tau \), see (Hyvärinen et al. 2001; Pérez-Hernández et al. 2013) for more details. Assuming that the system is reversible, the TICA coordinates are the eigenfunctions of \(\mathcal {T}_{\tau }\) (equivalently, of \(\mathcal {K}_{\tau }\)) projected onto the space spanned by linear basis functions, i.e., \(\psi (x)=x\).

Let \(C(\tau )\) be the time-lagged covariance matrix defined by

$$\begin{aligned} C_{ij}(\tau ) = \langle \mathbf {X}_{t,i}\,\mathbf {X}_{t+\tau ,j}\rangle _{t} = \mathbb {E}_{\pi }\left[ \mathbf {X}_{t,i}\,\mathbf {X}_{t+\tau ,j}\right] . \end{aligned}$$

Given data X and Y as defined above, estimators \(C_{0}\) and \(C_{\tau }\) for the covariance matrices C(0) and \(C(\tau )\) can be computed as

$$\begin{aligned} \begin{aligned} C_{0}&=\tfrac{1}{m-1}\sum _{k=1}^{m}x_{k}\,x_{k}^{T}=\tfrac{1}{m-1}XX^{T},\\ C_{\tau }&=\tfrac{1}{m-1}\sum _{k=1}^{m}x_{k}\,y_{k}^{T}=\tfrac{1}{m-1}XY^{T}. \end{aligned} \end{aligned}$$
(9)

The time-lagged independent components are defined to be solutions of the eigenvalue problem

$$\begin{aligned} C_{\tau }\,\xi _{\ell }=\lambda _{\ell }\,C_{0}\,\xi _{\ell }\quad \text {or}\quad C_{0}^{+}C_{\tau }\,\xi _{\ell }=\lambda _{\ell }\,\xi _{\ell }, \end{aligned}$$
(10)

respectively. In what follows, let \(M_{\mathrm{TICA}}=C_{0}^{+}C_{\tau }\), where \(C_{0}^{+}\) denotes the Moore–Penrose pseudo-inverse of \(C_{0}\).

In applications, often the symmetrized estimators

$$\begin{aligned} C_{0} = \tfrac{1}{2m-2}(XX^{T}+YY^{T}) \quad \text {and} \quad C_{\tau }=\tfrac{1}{2m-2}(XY^{T}+YX^{T}) \end{aligned}$$

are used so that the resulting TICA coordinates become real-valued. This corresponds to averaging over the trajectory and the time-reversed trajectory. Note that this symmetrization can introduce a large estimator bias that affects the dominant spectrum of (10) if the process is non-stationary or if the distribution of the data is far from the equilibrium of the process. In the latter case, a reweighting procedure can be applied to obtain weighted versions of the estimators (9) and thus reduce this bias (Wu et al. 2017).

Example 3.1

Let us illustrate the idea behind TICA with a simple example. Consider the data shown in Fig. 3, which were generated by a stochastic process that typically spends a long time in one of the two clusters before jumping to the other. We are interested in finding these metastable sets. Standard principal component analysis (PCA) leads to the coordinate shown in red, whereas TICA—shown in black—takes time information into account and is thus able to identify the slow direction of the system correctly. Projecting the system onto the x-coordinate will preserve the slow process while eliminating the fast stochastic noise. \(\square \)

Fig. 3

The difference between PCA and TICA. The top and bottom plots show the x- and y-components of the system, respectively; the plot on the right shows the resulting main principal component vector and the main TICA coordinate

Algorithm 1 (AMUSE algorithm for computing the TICA eigenvalues and coordinates)

The TICA coordinates can be computed using AMUSE (Tong et al. 1990) as shown in Algorithm 1. Instead of computing a singular value decomposition of the data matrix X in step 1, an eigenvalue decomposition of the covariance matrix \(XX^{T}\) could be computed, which is more efficient if \(m\gg d\), but less accurate. The vectors \(\xi _{\ell }\) computed by AMUSE are solutions of the eigenvalue problem (10), since

$$\begin{aligned} M_{\mathrm{TICA}}\,\xi _{\ell }&=\left( XX^{T}\right) ^{+}XY^{T}U\Sigma ^{-1}w_{\ell }\\&=U\Sigma ^{-1}\tilde{X}\tilde{Y}^{T}w_{\ell }\\&=\lambda _{\ell }\,U\Sigma ^{-1}w_{\ell }\\&=\lambda _{\ell }\,\xi _{\ell }. \end{aligned}$$

In the second line, we used the fact that \((XX^{T})^{+}=U\Sigma ^{-2}U^{T}\) and in the third that \(w_{\ell }\) is an eigenvector of \(\overline{M}_{\mathrm{TICA}}=\tilde{X}\tilde{Y}^{T}\).
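
The following sketch implements the AMUSE procedure outlined above; it assumes mean-free data matrices X and Y of full rank, and the function name is our own:

```python
import numpy as np

def amuse_tica(X, Y):
    """TICA via AMUSE for data matrices X, Y of shape (d, m), assumed mean-free and full rank."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # step 1: reduced SVD of X
    X_tilde = np.diag(1.0 / s) @ U.T @ X               # whitened data matrices
    Y_tilde = np.diag(1.0 / s) @ U.T @ Y
    M_bar = X_tilde @ Y_tilde.T                        # reduced matrix, cf. Remark 3.2
    lam, W = np.linalg.eig(M_bar)                      # eigenvalues and eigenvectors w_l
    Xi = U @ np.diag(1.0 / s) @ W                      # TICA coordinates xi_l = U Sigma^{-1} w_l
    order = np.argsort(-lam.real)                      # sort by (real part of) the eigenvalues
    return lam[order], Xi[:, order]
```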

3.2 Dynamic Mode Decomposition

Dynamic mode decomposition (DMD) was developed by the fluid dynamics community as a tool to identify coherent structures in fluid flows (Schmid 2010). Since its introduction, several variants and extensions have been proposed, see (Chen et al. 2012; Tu et al. 2014; Jovanović et al. 2014; Brunton et al. 2015; Klus et al. 2016). A review of the applications of Koopman operator theory in fluid mechanics can be found in Mezić (2013). DMD can be viewed as a combination of a PCA in the spatial domain and a Fourier analysis in the frequency domain (Brunton et al. 2016). It can be shown that the DMD modes are the Koopman modes for the set of basis functions defined by \(\psi (x)=x\). Given again data X and Y as above, the idea behind DMD is to assume that there exists a linear operator \(M_{\mathrm{DMD}}\) such that \(y_{i}=M_{\mathrm{DMD}}\,x_{i}\). Since the underlying dynamical system is in general nonlinear, this equation cannot be fulfilled exactly and we want to compute the matrix \(M_{\mathrm{DMD}}\) in such a way that the Frobenius norm of the deviation is minimized, i.e.,

$$\begin{aligned} \min \left\| Y-M_{\mathrm{DMD}}X \right\| _{F}. \end{aligned}$$
(11)

The solution of this minimization problem is given by

$$\begin{aligned} M_{\mathrm{DMD}}=YX^{+}=\big (YX^{T}\big )\big (XX^{T}\big )^{+}=C_{\tau }^{T}C_{0}^{+}=M_{\mathrm{TICA}}^{T}. \end{aligned}$$
(12)

The eigenvalues and eigenvectors of \(M_{\mathrm{DMD}}\) are called DMD eigenvalues and modes, respectively. That is, we are solving

$$\begin{aligned} M_{\mathrm{DMD}}\,\xi _{\ell }=\lambda _{\ell }\,\xi _{\ell }. \end{aligned}$$

The above equations already illustrate the close relationship with TICA, cf. (10). The DMD modes are the right eigenvectors of \(M_{\mathrm{DMD}}\), whereas the TICA coordinates are defined to be the right eigenvectors of the transposed matrix \(M_{\mathrm{TICA}}\). Hence, the TICA coordinates are the left eigenvectors of the DMD matrix and the DMD modes the left eigenvectors of the TICA matrix. This is consistent with the results that will be presented in the VAC and EDMD subsections below: The TICA coordinates represent the Koopman eigenfunctions projected onto the space spanned by linear basis functions, i.e., \(\psi (x)=x\), while the DMD modes are the corresponding Koopman modes.

Similar to AMUSE, the DMD modes and eigenvalues can be obtained without explicitly computing \(M_{\mathrm{DMD}}\) by using a reduced singular value decomposition of X. Standard and exact DMD are presented in Algorithm 2. The standard DMD modes are simply the exact DMD modes projected onto the range of the matrix X, see Tu et al. (2014).

Algorithm 2 (standard and exact DMD)
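
A compact sketch of both variants is given below; it is based on the reduced matrix \(\overline{M}_{\mathrm{DMD}}=U^{T}YV\Sigma ^{-1}\) (see also Remark 3.2), and the implementation details are our own:

```python
import numpy as np

def dmd(X, Y, variant="exact"):
    """Standard or exact DMD for data matrices X, Y of shape (d, m)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # reduced SVD of X
    M_bar = U.T @ Y @ Vt.T @ np.diag(1.0 / s)          # reduced matrix U^T M_DMD U
    lam, W = np.linalg.eig(M_bar)                      # DMD eigenvalues
    if variant == "standard":
        Phi = U @ W                                    # standard (projected) DMD modes
    else:
        Phi = Y @ Vt.T @ np.diag(1.0 / s) @ W / lam    # exact DMD modes (nonzero eigenvalues assumed)
    return lam, Phi
```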

Remark 3.2

TICA and standard DMD are closely related. When comparing with the AMUSE formulation, we obtain

$$\begin{aligned} \overline{M}_{\mathrm{TICA}}=\tilde{X}\tilde{Y}^{T}=\Sigma ^{-1}U^{T}XY^{T}U\Sigma ^{-1}=\Sigma \,U{}^{T}M_{\mathrm{TICA}}\,U\Sigma ^{-1}=:W\Lambda W^{-1} \end{aligned}$$

and

$$\begin{aligned} \overline{M}_{\mathrm{DMD}}=U^{T}YV\Sigma ^{-1}=U^{T}M_{\mathrm{DMD}}\,U=U^{T}M_{\mathrm{TICA}}^{T}U=:\tilde{W}\tilde{\Lambda }\tilde{W}^{-1}. \end{aligned}$$

The TICA coordinates are given by \(\Xi =U\Sigma ^{-1}W\) and the standard DMD modes by \(\tilde{\Xi }=U\tilde{W}\) so that—except for the scaling \(\Sigma ^{-1}\)—AMUSE and standard DMD use the same projection; the main difference is that the former computes the eigenvectors of \(M_{\mathrm{TICA}}\) and the latter the eigenvectors of the transposed matrix \(M_{\mathrm{TICA}}^{T}\). As a result, AMUSE could be rewritten to compute the DMD modes if we define \(\overline{M}'_{\mathrm{DMD}} = \tilde{Y}\tilde{X}^{T} = \Sigma ^{-1}U^{T}YX^{T}U\Sigma ^{-1}\) in step 3 of the algorithm and \(\xi _{\ell }=U\Sigma \,w_{\ell }\) in step 5, where \(w_{\ell }\) now denotes the eigenvectors of \(\overline{M}'_{\mathrm{DMD}}\).

3.3 Variational Approach of Conformation Dynamics

The variational approach of conformation dynamics (VAC) (Noé and Nüske 2013; Nüske et al. 2014, 2016) is a generalization of the frequently used Markov state modeling framework that allows arbitrary basis functions and is similar to the variational approach in quantum mechanics (Nüske et al. 2014). As described above, VAC and EDMD (see below) require—in addition to the data—a set of basis functions (also called dictionary), given by \(\psi \). The variational approach is defined only for reversible systems—EDMD does not require this restriction—and computes eigenfunctions of \(\mathcal {T}_{\tau }\) or \(\mathcal {K}_{\tau }\), respectively. Using the data matrices \(\Psi _{X}\) and \(\Psi _{Y}\) defined in (8), the covariance matrices \(C_{0}\) and \(C_{\tau }\), cf. (9), can be estimated for the transformed data as

$$\begin{aligned} \begin{aligned} C_{0}&=\tfrac{1}{m-1}\sum _{k=1}^{m}\psi (x_{k})\,\psi (x_{k})^{T}=\tfrac{1}{m-1}\Psi _{X}\Psi _{X}^{T},\\ C_{\tau }&=\tfrac{1}{m-1}\sum _{k=1}^{m}\psi (x_{k})\,\psi (y_{k})^{T}=\tfrac{1}{m-1}\Psi _{X}\Psi _{Y}^{T}. \end{aligned} \end{aligned}$$

In what follows, let \(M_{\mathrm{VAC}}=C_{0}^{+}C_{\tau }\) for the transformed data matrices \(\Psi _{X}\) and \(\Psi _{Y}\). The matrix \(M_{\mathrm{VAC}}\) can be regarded as a finite-dimensional approximation of \(\mathcal {K}_{\tau }\) (or, equivalently, of \(\mathcal {T}_{\tau }\), since the system is assumed to be reversible; the derivation is shown in Sect. 4). Eigenfunctions of the operator can then be approximated by the eigenvectors of the matrix \(M_{\mathrm{VAC}}\). Let \(\xi _{\ell }\) be an eigenvector of \(M_{\mathrm{VAC}}\), i.e.,

$$\begin{aligned} M_{\mathrm{VAC}}\,\xi _{\ell }=\lambda _{\ell }\,\xi _{\ell }, \end{aligned}$$

and \(\varphi _{\ell }(x)=\xi _{\ell }^{*}\psi (x)\), where \(^{*}\) denotes the conjugate transpose. Since

$$\begin{aligned} \mathcal {K}_{\tau }\varphi _{\ell }(x)\approx (M_{\mathrm{VAC}}\,\xi _{\ell })^{*}\,\psi (x)=\lambda _{\ell }\,\xi _{\ell }^{*}\,\psi (x)=\lambda _{\ell }\,\varphi _{\ell }(x), \end{aligned}$$

we obtain an approximation of the eigenfunctions of \(\mathcal {K}_{\tau }\). The derivation will be described in detail in Sect. 4.

3.4 Extended Dynamic Mode Decomposition

Extended dynamic mode decomposition (EDMD), a generalization of DMD, can be used to compute finite-dimensional approximations of the Koopman operator, its eigenvalues, eigenfunctions, and eigenmodes (Williams et al. 2015a, b). It was shown in Klus et al. (2016) that EDMD can be extended to also approximate eigenfunctions of the Perron–Frobenius operator with respect to the density underlying the data points. With the notation introduced above, the minimization problem (11) for the transformed data matrices \(\Psi _{X}\) and \(\Psi _{Y}\) can be written as

$$\begin{aligned} \min \left\| \Psi _{Y}-M_{\mathrm{EDMD}}\Psi _{X} \right\| _{F}. \end{aligned}$$
(13)

The solution—see also (12)—is given by

$$\begin{aligned} M_{\mathrm{EDMD}}=\Psi _{Y}\Psi _{X}^{+}=\big (\Psi _{Y}\Psi _{X}^{T}\big )\big (\Psi _{X}\Psi _{X}^{T}\big )^{+}=C_{\tau }^{T}\,C_{0}^{+}=M_{\mathrm{VAC}}^{T}. \end{aligned}$$

That is, instead of assuming a linear relationship between the data matrices X and Y, EDMD aims at finding a linear relationship between the transformed data matrices \(\Psi _{X}\) and \(\Psi _{Y}\). Eigenfunctions of the Koopman operator are then given by the left eigenvectors \(\xi _{\ell }\) of \(M_{\mathrm{EDMD}}\), i.e.,

$$\begin{aligned} \varphi _{\ell }(x)=\xi _{\ell }^{*}\,\psi (x). \end{aligned}$$

The derivation of EDMD can be found in Sect. 4. Since the left eigenvectors of \(M_{\mathrm{EDMD}}\) are the right eigenvectors of \(M_{\mathrm{VAC}}\), VAC and EDMD are equivalent as they compute exactly the same eigenvalue and eigenfunction approximations for a given data and basis set.

As shown in Klus et al. (2016), EDMD can also be used to approximate the Perron–Frobenius operator as follows:

$$\begin{aligned} \tilde{M}_{\mathrm{EDMD}} = \big (\Psi _{X}\Psi _{Y}^{T}\big )\big (\Psi _{X}\Psi _{X}^{T}\big )^{+}=C_{\tau }\,C_{0}^{+}. \end{aligned}$$

It is important to note that the Perron–Frobenius operator is computed with respect to the density underlying the data matrices. That is, if X is sampled from a uniform distribution, we obtain the eigenfunctions of the Perron–Frobenius operator \(\mathcal {P}_{\tau }\). If we, on the other hand, use one long trajectory, the underlying density converges to the equilibrium density \(\pi \) and we obtain the eigenfunctions of the Perron–Frobenius operator with respect to the equilibrium density, denoted by \(\mathcal {T}_{\tau }\). An approach to compute the equilibrium density from off-equilibrium data is proposed in Wu et al. (2017).

Example 3.3

Let us consider the Ornstein–Uhlenbeck process introduced in Example 2.1. Here, \(\alpha =4\) and \(D=0.25\). The lag time is defined to be \(\tau =1\). We generated \(10^{5}\) uniformly distributed test points in \([-2,\,2]\) and used a basis comprising monomials of order up to 10. With the aid of EDMD, we computed the dominant eigenfunctions of the Perron–Frobenius operator \(\mathcal {P}_{\tau }\) and the Koopman operator \(\mathcal {K}_{\tau }\) (which is identical to \(\mathcal {T}_{\tau }\) here due to reversibility). The results are shown in Fig. 2. The corresponding eigenvalues are given by

$$\begin{aligned} \lambda _{1}(\tau )=1.00,\quad \lambda _{2}(\tau )=0.37,\quad \lambda _{3}(\tau )=0.13,\quad \lambda _{4}(\tau )=0.049, \end{aligned}$$

which is a good approximation of the analytically computed eigenvalues (Example 2.6). \(\square \)
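
A minimal sketch reproducing this computation is shown below; as an illustrative shortcut, the data pairs are drawn directly from the exact transition density (1) instead of integrating the SDE numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, D, tau, m, k = 4.0, 0.25, 1.0, 100_000, 11   # parameters of Example 3.3

# Uniformly distributed test points and their images after the lag time tau,
# sampled from the exact transition density (1).
x = rng.uniform(-2.0, 2.0, m)
sigma2 = (1.0 - np.exp(-2.0 * alpha * D * tau)) / alpha
y = x * np.exp(-alpha * D * tau) + np.sqrt(sigma2) * rng.standard_normal(m)

psi = lambda z: np.vander(z, k, increasing=True).T   # monomials 1, z, ..., z^10
Psi_X, Psi_Y = psi(x), psi(y)

C_0 = Psi_X @ Psi_X.T / (m - 1)
C_tau = Psi_X @ Psi_Y.T / (m - 1)
M_vac = np.linalg.pinv(C_0) @ C_tau                  # finite-dimensional approximation of K_tau

lam = np.sort(np.linalg.eigvals(M_vac).real)[::-1]
print(np.round(lam[:4], 3))                          # should be close to 1, 0.37, 0.13, 0.05
```

Up to sampling and projection errors, the leading eigenvalues agree with the analytic values \(e^{-(\ell -1)}\) from Example 2.6.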

In order to approximate the Koopman modes, let \(\varphi (x)=\left[ \varphi _{1}(x),\,\dots ,\,\varphi _{k}(x)\right] ^{T}\) be the vector of eigenfunctions and

$$\begin{aligned} \Xi = \begin{bmatrix}\xi _{1}&\xi _{2}&\dots&\xi _{k}\end{bmatrix} \end{aligned}$$

the matrix that contains all left eigenvectors of \(M_{\mathrm{EDMD}}\). Furthermore, define \(B\in \mathbb {R}^{d\times k}\) such that \(g(x)=B\,\psi (x)\). That is, the full-state observable is written in terms of the basis functions. Since \(\varphi (x)=\Xi ^{*}\,\psi (x)\), this leads to \(g(x)=B\,\psi (x)=B\,(\Xi ^{*})^{-1}\varphi (x)\). Thus, the \(\ell \)th column vector of the matrix \(\varvec{\eta } = B\,(\Xi ^{*})^{-1}\) represents the Koopman mode \(\eta _{\ell }\) required for the reconstruction of the dynamical system, see (3).
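
In matrix form, this mode computation reduces to a single linear solve; a short sketch with the variable names used above (B and \(\Xi \) are assumed to be assembled elsewhere):

```python
import numpy as np

def koopman_modes(B, Xi):
    """Koopman modes: columns of eta = B (Xi^*)^{-1}, where the columns of Xi are the left
    eigenvectors of M_EDMD and B expresses the full-state observable as g(x) = B psi(x)."""
    return B @ np.linalg.inv(Xi.conj().T)
```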

3.5 Relationships with Other Methods

For particular choices of basis functions, VAC and EDMD are equivalent to other methods (see also Nüske et al. 2016; Klus et al. 2016):

  1.

    If we choose \(\psi (x)=x\), we obtain TICA and DMD, respectively. That is, the TICA coordinates are the eigenfunctions of the Koopman operator projected onto the space spanned by linear basis functions and the DMD modes are the corresponding Koopman modes. (Note that in this case \({\varvec{B}}={\varvec{I}}\) and the matrix \(\varvec{\eta } = (\Xi ^{*})^{-1}\) contains the right eigenvectors of \(M_{\mathrm{EDMD}}\).) In many applications of TICA, the basis functions are modified to have zero mean. For reversible processes, this eliminates the stationary eigenvalue \(\lambda _{1}=1\) and its eigenfunction \(\varphi _{1}\equiv 1\). The largest eigenpair then approximates the slowest dynamical eigenvalue and eigenfunction, respectively.

  2.

    If the set of basis functions comprises indicator functions \(\mathbbm {1}_{A_{1}},\,\dots ,\,\mathbbm {1}_{A_{k}}\) for a given decomposition of the state space into disjoint sets \(A_{1},\dots ,A_{k}\), VAC and EDMD result in Ulam’s method (Ulam 1960) and thus a Markov state model (MSM).

These relationships are shown in Fig. 4. Detailed examples illustrating the use of VAC and EDMD can be found in Nüske et al. (2014), Williams et al. (2015a), Klus et al. (2016).

Fig. 4

Relationships between data-driven methods. While VAC was derived for reversible dynamical systems, the derivation of EDMD covers non-reversible dynamics as well

3.6 Examples

3.6.1 Double Gyre

Let us consider the autonomous double gyre, which was introduced in Shadden et al. (2005), given by the SDE

$$\begin{aligned} \begin{aligned} \mathrm {d}\mathbf {X}_{t}&=-\pi \,A\,\sin (\pi \,\mathbf {X}_{t})\,\cos (\pi \,\mathbf {Y}_{t})\,\mathrm {d}t + \varepsilon \,\mathrm {d}\mathbf {W}_{t,1},\\ \mathrm {d}\mathbf {Y}_{t}&=\phantom {-}\pi \,A\,\cos (\pi \,\mathbf {X}_{t})\,\sin (\pi \,\mathbf {Y}_{t})\,\mathrm {d}t + \varepsilon \,\mathrm {d}\mathbf {W}_{t,2} \end{aligned} \end{aligned}$$

on the domain \(\mathbb {X}=[0,2]\times [0,1]\) with reflecting boundary. For \(\varepsilon =0\), there is no transport between the left half and the right half of the domain and both subdomains are invariant sets with measure \(\frac{1}{2}\), cf. (Froyland and Padberg 2009; Froyland and Padberg-Gehle 2014). For \(\varepsilon >0\), there is a small amount of transport due to diffusion and the subdomains are almost invariant. For the Koopman operator \(\mathcal {\mathcal {K}_{\tau }}\), this means that for \(\varepsilon =0\) the characteristic functions \(\tilde{\varphi }_{1}=\mathbbm {1}_{[0,1]\times [0,1]}\) and \(\tilde{\varphi }_{2}=\mathbbm {1}_{[1,2]\times [0,1]}\) are both eigenfunctions with corresponding eigenvalue 1. If, on the other hand, \(\varepsilon >0\), then the two-dimensional eigenspace subdivides into two one-dimensional eigenspaces with eigenvalues \(\lambda _{1}=1\) and \(\lambda _{2}=1-\mathcal {O}(\varepsilon )\) and eigenfunctions \(\varphi _{1}=\mathbbm {1}_{[0,2]\times [0,1]}\) and \(\varphi _{2}\approx \tilde{\varphi }_{1}-\tilde{\varphi }_{2}\). Typical trajectories of the system are shown in Fig. 5a. Using the parameters \(A=0.25\) and \(\varepsilon =0.05\), we integrated \(10^{5}\) randomly generated test points using the Euler–Maruyama scheme with step size \(h=10^{-3}\).
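
The data generation can be sketched as follows; the per-step reflection at the boundary is our own simplification, and fewer test points are used to keep the example fast:

```python
import numpy as np

A, eps, h = 0.25, 0.05, 1e-3          # parameters and step size as in the text
rng = np.random.default_rng(42)

def reflect(z, lower, upper):
    """Reflect values that left [lower, upper] back into the interval (one bounce per step)."""
    z = np.where(z < lower, 2 * lower - z, z)
    return np.where(z > upper, 2 * upper - z, z)

def euler_maruyama(x, y, n_steps):
    x, y = x.copy(), y.copy()
    for _ in range(n_steps):
        bx = -np.pi * A * np.sin(np.pi * x) * np.cos(np.pi * y)   # drift, x-component
        by = np.pi * A * np.cos(np.pi * x) * np.sin(np.pi * y)    # drift, y-component
        x = x + h * bx + eps * np.sqrt(h) * rng.standard_normal(x.shape)
        y = y + h * by + eps * np.sqrt(h) * rng.standard_normal(y.shape)
        x, y = reflect(x, 0.0, 2.0), reflect(y, 0.0, 1.0)
    return x, y

# Random initial points and their images after the lag time tau = 3 (i.e., 3000 steps);
# 10^4 points instead of 10^5 to keep the sketch fast.
x0, y0 = rng.uniform(0.0, 2.0, 10_000), rng.uniform(0.0, 1.0, 10_000)
x_tau, y_tau = euler_maruyama(x0, y0, 3000)
```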

For the computation of the eigenfunctions, we chose a set of radial basis functions, whose centers were given by the midpoints of an equidistant \(50\times 25\) box discretization, and a lag time \(\tau =3\). The resulting nontrivial leading eigenfunctions of the Koopman operator computed with EDMD are shown in Fig. 5b. The two almost invariant sets are clearly visible. The eigenfunctions of the Perron–Frobenius operator exhibit similar patterns (but “rotating” in the opposite direction).

Fig. 5

a Typical trajectories (of different lengths) of the double gyre system for \(\varepsilon =0.05\). The initial states are marked by dots. Due to the diffusion term, particles can cross the separatrix (dashed line). The gray lines show the trajectories with the same initial conditions for \(\varepsilon =0\). b Leading eigenfunctions of the Koopman operator associated with the double gyre system computed using EDMD

Fig. 6

Illustration of standard EDMD workflow in molecular dynamics using the deca alanine model system. a Leading implied timescales \(t_m\) (in nanoseconds) as estimated by TICA as a function of the lag time. b Effective dimension M selected by applying the criterion of total kinetic variance to the TICA eigenvalues. c Leading implied timescales \(t_{m}\) estimated by a Markov state model after projecting the data onto the first M TICA eigenvectors and discretizing this data set into 50 states using k-means. d Simple visualization of effective coarse grained dynamics. All MSM states are assigned to two macrostates using the PCCA algorithm. An overlay of representative structures from both macrostates shows that the dynamics between them corresponds to helix formation. Macrostates are drawn proportionally to their stationary probability

3.6.2 Deca Alanine

As a second example, we illustrate what has become a typical workflow for the application of VAC/EDMD in molecular dynamics, using deca alanine as a model system. Deca alanine is a small peptide comprising ten alanine residues, and it has been used as a test system many times before. Here, we analyze equilibrium simulations of \(3\,\upmu \mathrm{s}\) total simulation time using the Amber03 force field, see (Nüske et al. 2014, 2016) for the detailed simulation setup. Important quantities for our analysis are the leading implied timescales

$$\begin{aligned} t_m = -\frac{\tau }{\log |\lambda _{m}|}, \end{aligned}$$
(14)

for \( m=2,3,\ldots \). Implied timescales are independent of the lag time (Pazy 1983, Theorem 2.2.4). However, if they are estimated using (14) and an approximation to the eigenvalues \(\lambda _{m}\) obtained from VAC/EDMD, the timescales will be underestimated (see Sect. 4.2.1) and the error will decrease as a function of the lag time (Djurdjevac et al. 2012). Approximate convergence of implied timescales with increasing lag time has become a standard model validation criterion in molecular dynamics (Prinz et al. 2011).
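
Given estimated eigenvalues for a fixed lag time, the implied timescales (14) are obtained directly; a minimal sketch:

```python
import numpy as np

def implied_timescales(eigenvalues, tau):
    """Implied timescales t_m = -tau / log|lambda_m|; the stationary eigenvalue is discarded."""
    lam = np.sort(np.abs(np.asarray(eigenvalues)))[::-1][1:]
    return -tau / np.log(lam)
```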

In the first step, a set of internal molecular coordinates is extracted from the simulation data, to which TICA is applied. In our example, we select all 16 backbone dihedral angles as internal molecular coordinates. Figure 6a shows the first five implied timescales estimated by TICA as a function of the lag time \(\tau \).

Next, a first dimension reduction is performed, where the data are projected onto the leading M TICA eigenvectors. The number M is selected by the criterion of total kinetic variance, that is, M is the smallest number such that the cumulative sum of the first M squared eigenvalues exceeds 95 per cent of the total sum of squared eigenvalues (Noé and Clementi 2015). Figure 6b shows the resulting dimension M as a function of the lag time.
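
A sketch of this selection rule (the function name and the handling of the eigenvalue ordering are our own):

```python
import numpy as np

def select_dimension(tica_eigenvalues, fraction=0.95):
    """Smallest M such that the first M squared eigenvalues account for `fraction` of the total."""
    ev2 = np.sort(np.asarray(tica_eigenvalues) ** 2)[::-1]
    cumulative = np.cumsum(ev2) / np.sum(ev2)
    return int(np.searchsorted(cumulative, fraction)) + 1
```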

As a third step, the reduced data set is discretized by application of a clustering method. In our case, we use k-means clustering to assign the data to 50 discrete states. A Markov state model (MSM, equivalent to Ulam’s method, see above) is estimated from the discretized time series. We show the first five implied timescales from the MSM in Fig. 6c and observe that estimates improve compared to the TICA approximations. Also, timescale estimates converge for lag times \(\tau \ge 4\,\mathrm {ns}\).

Finally, we use the converged model at lag time \(\tau =4\,\mathrm {ns}\) for further analysis. As the slowest implied timescale \(t_{2}\) dominates all others, and as it is the only one which is larger than the lag time used for analysis (indicated by the gray line in Fig. 6c), we attempt to extract a two-state model that captures the essential dynamics. We employ the PCCA+ algorithm (Deuflhard and Weber 2005; Röblitz and Weber 2013) to coarse grain all MSM states into two macrostates. Inspection of randomly selected trajectory frames belonging to each macrostate reveals that the slow dynamical process in the data corresponds to the formation of a helix, see Fig. 6d. It should be noted that this coarse graining works well for visualization purposes, but some details need to be taken into account. In fact, PCCA performs a fuzzy assignment of MSM states to macrostates, where each MSM state belongs to each macrostate with a certain membership in [0, 1]. We simply assign each MSM state to the macrostate with maximal membership here. Alternatively, we could also use a hidden Markov model (HMM) to perform the coarse graining (Noé et al. 2013).

4 Derivations

In this section, we will show how VAC and EDMD as well as their respective special cases TICA and DMD can be derived and how these methods are related to eigenfunctions and eigenmodes of transfer operators.

4.1 General Dynamical Systems

Let us begin with general, not necessarily reversible dynamical systems. In order to be able to compute eigenfunctions of transfer operators numerically, the infinite-dimensional operators are projected onto a finite-dimensional space. We will briefly outline how the EDMD minimization problem (13) leads to an approximation of the Koopman operator.

Theorem 4.1

Let the process \(\{\mathbf {X}_{t}\}_{t\ge 0}\) be Feller-continuous. Let \(\psi _{i}\), \(i=1,\,\dots ,\,k\), be a set of at least piecewise continuous basis functions spanning the finite-dimensional linear space \(\mathbb {V}\). Let the empirical distribution of the data points \(x_{1},x_{2},\ldots \) converge weakly to the density \(\rho \). Then the minimization problem

$$\begin{aligned} \min _{K\in \mathbb {R}^{k\times k}}\frac{1}{m}\sum _{j=1}^{m}\left\| \psi (y_{j})-K^{T}\psi (x_{j}) \right\| _{2}^{2} \end{aligned}$$

converges, as \(m\rightarrow \infty \), almost surely to \(\min _{\hat{K}}\sum _{i=1}^{k}\Vert \mathcal {\mathcal {K_{\tau }}}\psi _{i}-\hat{K}\psi _{i}\Vert _{\rho }^{2}\), where the minimization is over all linear mappings \(\hat{K}:\mathbb {V}\rightarrow \mathbb {V}\).

Proof

Let \(f=\sum _{i=1}^{k}a_{i}\,\psi _{i}=a^{T}\psi \in \mathbb {V}\) be an arbitrary function, where \(a=[a_{1},\,\dots ,\,a_{k}]^{T}\). For a single data point \(x_{j}\), we have for a linear mapping \(\hat{K}:\mathbb {V}\rightarrow \mathbb {V}\) with matrix representation \(K\in \mathbb {R}^{k\times k}\) that

$$\begin{aligned} \hat{K}f(x_{j})=\sum _{i=1}^{k}(K\,a)_{i}\,\psi _{i}(x_{j})=a^{T}K^{T}\psi (x_{j}). \end{aligned}$$

Here, the ith column of the matrix K corresponds to \( \hat{K} \psi _i \). Thus, we obtain

$$\begin{aligned}&\frac{1}{m}\sum _{j=1}^{m}\left\| \psi (y_{j})-K^{T}\psi (x_{j}) \right\| _{2}^{2}\\&\quad = \sum _{i=1}^{k}\frac{1}{m}\sum _{j=1}^{m}(\psi _{i}(y_{j})-\hat{K}\psi _{i}(x_{j}))^{2}\\&\quad {\mathop {\longrightarrow }\limits ^{m\rightarrow \infty }} \sum _{i=1}^{k}\int _{\mathbb {X}}(\mathbb {E}[\psi _{i}(\mathbf {X}_{\tau })\,\vert \,\mathbf {X}_{0}=x]-\hat{K}\psi _{i}(x))^{2}\rho (x)\,dx\\&\quad = \sum _{i=1}^{k}\Vert \mathcal {\mathcal {K_{\tau }}}\psi _{i}-\hat{K}\psi _{i}\Vert _{\rho }^{2}, \end{aligned}$$

where the convergence for \(m\rightarrow \infty \) is almost sure. From the second line to the third we used that the \(y_{j}\) are realizations of the random variables \(\mathbf {X}_{\tau }\) given \(\mathbf {X}_{0}=x_{j}\), that \(\mathbf {X}_{\tau }\) is a Feller-continuous process, that the \(\psi _{i}\) are (piecewise) continuous functions, and that the sampling process of \(x_{j}\) is independent of the noise process that decides over \(\mathbf {X}_{\tau }\) given \(\mathbf {X}_{0}=x_{j}\). \(\square \)

With the aid of the data matrices \(\Psi _{X}\) and \(\Psi _{Y}\) defined in (8), this minimization problem can be written as

$$\begin{aligned} \min \left\| \Psi _{Y}-K^{T}\Psi _{X} \right\| _{F}^{2}, \end{aligned}$$

which is identical to (13), where now \(K^{T}=M_{\mathrm{EDMD}}\). Thus, the transposed EDMD matrix \(M_{\mathrm{EDMD}}^{T}\) is an approximation of the Koopman operator. A similar setup allows for the approximation of the Perron–Frobenius operator with respect to the data point density \(\rho \). For details, we refer to (Klus et al. 2016, Appendix A). Note, however, that although the Perron–Frobenius and Koopman operators are adjoint, the matrix representation of the discrete Perron–Frobenius operator will in general not just be the transpose of the matrix K, unless the ansatz functions \(\psi _{i}\) are orthonormal with respect to \(\langle \cdot ,\,\cdot \rangle _{\rho }\).

If the dynamical system is deterministic, we can already interpret the minimization (13) for finite values of m. As shown, e.g., in Klus et al. (2016), Korda and Mezić (2017), the solution of (13) is a Petrov–Galerkin projection of the Koopman operator on the ansatz space \(\mathbb {V}\).

4.2 Reversible Dynamical Systems

Let us now assume that the system is reversible. That is, it holds that \(\pi (x)\,p_{\tau }(x,y)=\pi (y)\,p_{\tau }(y,x)\) for all x and y.

4.2.1 Variational Principle for the Rayleigh Trace

We can also derive a variational formulation for the first M eigenvalues of the Koopman operator \(\mathcal {K}_{\tau }\) in the reversible setting. It is a standard result for self-adjoint operators on a Hilbert space with bounded eigenvalue spectrum, see, for example, Bandle (1980):

Proposition 4.2

Assume that \(1=\lambda _{1}>\lambda _{2}\ge \ldots \ge \lambda _{M}\) are the dominant eigenvalues of the Koopman operator \(\mathcal {K}_{\tau }\) on \(L_{\pi }^{2}\). Then

$$\begin{aligned} \begin{aligned} \sum _{\ell =1}^{M}\lambda _{\ell }&=\sup \sum _{\ell =1}^{M}\langle \mathcal {K}_{\tau }v_{\ell },v_{\ell }\rangle _{\pi },\\ \langle v_{\ell },v_{\ell ^{'}}\rangle _{\pi }&=\delta _{\ell \ell ^{'}} \end{aligned} \end{aligned}$$
(15)

That is, the sum of the first M eigenvalues is the maximum of the Rayleigh trace, i.e., the sum on the right-hand side of (15), over all selections of M orthonormal functions \(v_{\ell }\). The maximum is attained for the first M eigenfunctions \(\varphi _{1},\ldots ,\varphi _{M}\).

Proof

The M-dimensional space \(\mathbb {V}\) spanned by the functions \(v_{\ell }\) must contain a normalized element \(u_{M}\) which is orthogonal to the first \(M-1\) eigenfunctions \(\varphi _{\ell }\), i.e., \(\langle u_{M},\varphi _{\ell }\rangle _{\pi } = 0\), \(\ell =1,\ldots ,M-1\), and \( \left\| u_M \right\| _\pi = 1 \). By the standard Rayleigh principle for self-adjoint operators

$$\begin{aligned} \langle \mathcal {K}_{\tau }u_{M},u_{M}\rangle _{\pi }\le \lambda _{M}. \end{aligned}$$

Next, determine a normalized element \(u_{M-1}\) of the orthogonal complement of \(u_{M}\) in \(\mathbb {V}\) with \(\langle u_{M-1},\varphi _{\ell }\rangle _{\pi }=0\), \(\ell =1,\ldots ,M-2\). Again, we can invoke the Rayleigh principle to find

$$\begin{aligned} \langle \mathcal {K}_{\tau }u_{M-1},u_{M-1}\rangle _{\pi }\le \lambda _{M-1}. \end{aligned}$$

Repeating this argument another \(M-2\) times provides an orthonormal basis \(u_{1},\ldots ,u_{M}\) of the space \(\mathbb {V}\) such that

$$\begin{aligned} \sum _{\ell =1}^{M} \langle \mathcal {K}_{\tau }u_{\ell },u_{\ell }\rangle _{\pi } \le \sum _{\ell =1}^{M}\lambda _{\ell }. \end{aligned}$$

As the Rayleigh trace is independent of the choice of orthonormal basis for the subspace \(\mathbb {V}\), and the space itself was arbitrary, this proves (15). Clearly, the maximum is attained for the first M eigenfunctions. \(\square \)

Proposition 4.2 motivates the variational approach developed in Noé and Nüske (2013), Nüske et al. (2014) to maximize the Rayleigh trace restricted to some fixed space of ansatz functions:

Proposition 4.3

Let \(\mathbb {V}\) be a space of k linearly independent ansatz functions \(\psi _{i}\) given by a dictionary as above. The set of \(M\le k\) mutually orthonormal functions \(f^{\ell }=\sum _{i=1}^{k}a_{i}^{\ell }\,\psi _{i}\) which maximize the Rayleigh trace of the Koopman operator restricted to \(\mathbb {V}\) is given by the first M eigenvectors of the generalized eigenvalue problem

$$\begin{aligned} C_{\tau }\,a^{\ell }=\hat{\lambda }_{\ell }\,C_{0}\,a^{\ell }, \end{aligned}$$
(16)

where \(a^{\ell }=\left( a_{i}^{\ell }\right) _{i=1}^{k}\), and the matrices \(C_{\tau },\,C_{0}\) are given by

$$\begin{aligned} \begin{aligned} (C_{\tau })_{ij}&=\langle \mathcal {K}_{\tau }\psi _{i},\,\psi _{j}\rangle _{\pi },\\ (C_{0})_{ij}&=\langle \psi _{i},\,\psi _{j}\rangle _{\pi }. \end{aligned} \end{aligned}$$

Proof

First, note that for any functions \(f=\sum _{i=1}^{k}a_{i}\,\psi _{i}\) and \(g=\sum _{i=1}^{k}b_{i}\,\psi _{i}\), we have that

$$\begin{aligned} \begin{aligned} \langle \mathcal {K}_{\tau }f,g\rangle _{\pi }&=a^{T}C_{\tau }\,b,\\ \langle f,g\rangle _{\pi }&=a^{T}C_{0}\,b. \end{aligned} \end{aligned}$$

Let us assume that the ansatz functions are mutually orthonormal, i.e., \(C_{0}=I\). Then, maximization of the Rayleigh trace is equivalent to finding M vectors \(a^{\ell }\), such that \(\left( a^{\ell }\right) ^{T}a^{\ell ^{'}}=\delta _{\ell \ell ^{'}}\) and

$$\begin{aligned} \sum _{\ell =1}^{M}\left( a^{\ell }\right) ^{T}C_{\tau }\,a^{\ell }=\sum _{\ell =1}^{M}\langle C_{\tau }\,a^{\ell },a^{\ell }\rangle \end{aligned}$$

is maximal. By Proposition 4.2 applied to the operator \(C_{\tau }\) on \(\mathbb {R}^{k}\), the vectors \(a^{\ell }\) are given by the first M eigenvectors of \(C_{\tau }\). In the general case, transform the basis functions into a set of mutually orthonormal functions \(\tilde{\psi }_{i}\) via \(\tilde{\psi }_{i}=\sum _{j=1}^{k}C_{0}^{-1/2}(j,i)\,\psi _{j}\). For the transformed basis, we need to compute the eigenvectors \(\tilde{a}^{\ell }\) of

$$\begin{aligned} C_{0}^{-1/2}C_{\tau }C_{0}^{-1/2}\,\tilde{a}^{\ell }=\hat{\lambda }_{\ell }\,\tilde{a}^{\ell }. \end{aligned}$$

This is equivalent to the generalized eigenvalue problem (16); the relation between the eigenvectors is given by

$$\begin{aligned} a^{\ell } = C_{0}^{-1/2}\,\tilde{a}^{\ell }. \end{aligned}$$

\(\square \)
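
In practice, the transformation used at the end of the proof translates into a few lines of linear algebra. The sketch below assumes that \(C_{0}\) is symmetric positive definite and that \(C_{\tau }\) is symmetric (reversible setting), so that a symmetric eigensolver can be used:

```python
import numpy as np

def vac_eigenpairs(C_0, C_tau):
    """Solve C_tau a = lambda C_0 a via the whitening transformation C_0^{-1/2}."""
    d, V = np.linalg.eigh(C_0)
    C_0_inv_sqrt = V @ np.diag(d ** -0.5) @ V.T          # symmetric inverse square root of C_0
    lam, a_tilde = np.linalg.eigh(C_0_inv_sqrt @ C_tau @ C_0_inv_sqrt)
    order = np.argsort(-lam)
    A = C_0_inv_sqrt @ a_tilde                           # columns a_l = C_0^{-1/2} a_tilde_l
    return lam[order], A[:, order]
```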

5 Conclusion

In this review paper, we established connections between different data-driven model reduction and transfer operator approximation methods developed independently by the dynamical systems, fluid dynamics, machine learning, and molecular dynamics communities. Although the derivations of these methods differ, we have shown that the resulting algorithms share many similarities.

DMD, TICA, and MSMs are popular methods to approximate the dynamics of high-dimensional systems. Due to their simple basis functions, they yield relatively coarse approximations, but when only a few spectral components are required, the approximation error can be controlled by choosing sufficiently large lag times \(\tau \) (Sarich et al. 2010). The more general methods VAC and EDMD are better suited to obtain accurate approximations of eigenfunctions. However, to ensure such an accurate approximation, one would have to deploy multiple basis functions in all coordinates and their combinations, which is infeasible for high-dimensional systems and would also lead to overfitting when estimating the eigenfunctions of the Koopman operator from a finite data set (McGibbon and Pande 2015).

A natural way to mitigate these problems is an iterative or “deep” approach in which the subspace of the dynamical system that requires a high resolution of basis functions is found by multiple successive analysis steps. A common approach is to first reduce the dimension by an inexpensive method such as TICA, in order to have a relatively low-dimensional space in which the eigenfunctions are approximated with a higher-resolution method. Another possibility is to exploit low-rank tensor approximations of transfer operators and their eigenfunctions. Tensor- and sparse-grid-based reformulations of some of the methods described in this paper can be found in Nüske et al. (2016), Klus and Schütte (2016), Klus et al. (2016), and in Junge and Koltai (2009), respectively. The efficiency of these tensor decomposition approaches depends strongly on the coupling structure; strong coupling between different variables typically leads to high ranks. Furthermore, some tensor formats also depend on the ordering of variables and a permutation of the variables’ indices would lead to different tensor decompositions. Yet another approach might be to exploit sparsity-promoting methods using \(L_{1}\)-regularization techniques. Basis functions that are not required to represent the eigenfunctions of an operator can thus be eliminated and refined adaptively. Moreover, dictionary-learning methods could be applied to learn a basis set and to adapt the dictionary to specific data (Mairal et al. 2009). Future work includes evaluating and combining different dimensionality reduction, tensor decomposition, and sparsification methods to mitigate the curse of dimensionality.