1 Introduction

Many mathematical models of dynamical systems across the sciences are based on ordinary and stochastic differential equations (ODEs and SDEs, respectively) with a large number of degrees of freedom, often with dynamics at very different timescales. These systems pose several significant challenges to their simulation and understanding, which often require collecting a large number of long trajectories to capture the wide variety of possible behaviors of the system. These challenges include: (1) the high dimension of the state space and the corresponding large number of equations; (2) many fast/stiff modes, corresponding to very rapid fluctuations (e.g., solvent molecules around a protein); and (3) metastability, with trajectories dwelling for long times in certain regions of state space (metastable states), with rare transitions between them.

These challenges often compound in a single system, making the large scale (in state space and in time) phenomena of the system, which are often of interest in applications, difficult to capture and study. Some of the properties that one often wishes to capture include the invariant manifold, around which trajectories lie; the stationary distribution, describing the large-time distribution of the system in state space; the residence times describing the distribution (or its expectation) of the time spent in a metastable state M before leaving it, once started in M; the transition rates and transition paths, containing information about the expected time and most likely paths followed by the system when transitioning between metastable states; the reaction coordinates, representing low-dimensional observables whose dynamics is approximately Markovian and predictive of transitions between metastable states; and the leading eigenvalues and eigenvectors of the generator of the process, related to transition rates and reaction coordinates, respectively (Coifman et al. 2005, 2008; Husic and Pande 2018; Klus et al. 2018; Rohrdanz et al. 2013; Bittracher et al. 2018; Kutz et al. 2016; Weinan and Eric 2004; Leimkuhler and Matthews 2015; Legoll and Lelièvre 2010, 2012; Givon et al. 2004; Alexander and Giannakis 2020).

These objects of interest intertwine geometry and dynamics and are the focus of this work: we aim to jointly estimate crucial geometric objects, such as a low-dimensional invariant manifold \( {\mathcal {M}}_{{\epsilon }}\), and the effective dynamics of the system at and above a given timescale, via estimated stochastic equations that average out complex, high-dimensional aspects of the dynamics below that timescale. This reduced model can be simulated much faster, with low-dimensional equations and time-steps much larger than those needed by a simulator of the original system.

As with any type of model reduction, loss of information is in general unavoidable, with possibly dramatic consequences, including loss of Markovianity and loss of accuracy in the predictions of the reduced model, especially at large times. Our approach aims to mitigate these problems, at least for a suitable class of systems. We consider the problem of nonlinear model reduction for stochastic systems that, while presenting the above challenges, have features that are possibly redeeming, if appropriately exploited: fast and slow modes of evolution, with a non-negligible separation of timescales; a low-dimensional invariant manifold, onto which the dynamics may be projected by averaging the fast modes while preserving information about the large-scale/time phenomena; and fast modes that may be linearized, but may be high-dimensional, have large magnitude, and vary in direction relative to the invariant manifold. The objects we need to estimate are nonlinear: the invariant manifold, the corresponding reduction map onto it, and the effective stochastic equations on it.

The modality in which we have access to ground-truth trajectory data is important for algorithmic, statistical and computational considerations. We assume access to the system via a black-box simulator \(\mathcal {S}\) taking time-steps \(\delta t\lesssim \epsilon \) (to resolve the fast modes, whose derivative is of order \(1/\epsilon \)), which we can call to obtain only short trajectories, of length-in-time of order \(\tau \gg \epsilon \), where \(\tau \) is typically of the order of the relaxation timescale of the fast modes. From a given initial condition, we use the simulator \(\mathcal {S}\) to obtain a burst of N short paths, each of time length \({{O}}(\tau )\). This now classical setup (Frewen et al. 2009a) enables trivial parallelization, across initial conditions and across paths from each initial condition, and is well-suited to applications (Liu et al. 2015; Kim et al. 2015; Leimkuhler and Matthews 2015; Dietrich et al. 2021). A crucial problem is how initial conditions for these bursts are made available. When they cannot be chosen by the estimation procedure, they may be modeled as randomly drawn, ideally from a probability measure that is reasonably well-distributed over the state space: it is then straightforward to estimate how many initial conditions need to be sampled to guarantee coverage with high probability (see, e.g., Crosskey and Maggioni 2017). When the initial conditions may be chosen by the estimation procedure, a natural exploration–exploitation dilemma arises: should one refine the estimates in a region of space already populated by initial conditions, by sampling more initial conditions and paths there, or generate paths from initial conditions “outside” the parts of state space already visited? And how should the latter be generated? Especially when the state space is high-dimensional and the dynamics of interest is along a low-dimensional invariant manifold, it is not trivial to sample new initial conditions; even more so in applications, where many physical constraints are often extremely complex and unknown. We develop a simple approach, called “exploration mode”, discussed below, that addresses both the exploration–exploitation dilemma and the problem of generating new initial conditions “outside” the already-visited state space; all that is needed is a small number (e.g., 1) of initial conditions. All our numerical examples are run in “exploration mode”.
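The coverage estimate mentioned above can be made concrete with a union bound. The Python sketch below is an illustration only and rests on assumptions not made in this work: the state space is partitioned into `n_cells` regions (a hypothetical parameter), each of \(\mu _0\)-mass at least `p_min`. With n i.i.d. initial conditions, the probability that some region receives none is at most \(n_{\text {cells}}(1-p_{\min })^n\le n_{\text {cells}}e^{-p_{\min } n}\), which is at most \(\delta \) once \(n\ge \log (n_{\text {cells}}/\delta )/p_{\min }\).

```python
import math

import numpy as np


def samples_for_coverage(p_min: float, n_cells: int, delta: float) -> int:
    """Smallest n with n_cells * exp(-p_min * n) <= delta (union bound).

    Hypothetical setting: every cell of a fixed partition has mu0-mass >= p_min.
    """
    return math.ceil(math.log(n_cells / delta) / p_min)


def empirical_miss_rate(n_cells: int, n_samples: int, trials: int, seed: int = 0) -> float:
    """Fraction of trials in which n_samples uniform draws over n_cells miss some cell."""
    rng = np.random.default_rng(seed)
    misses = 0
    for _ in range(trials):
        hits = rng.integers(0, n_cells, size=n_samples)
        if np.unique(hits).size < n_cells:
            misses += 1
    return misses / trials
```

For example, with 100 equally likely regions and \(\delta =0.05\), `samples_for_coverage(0.01, 100, 0.05)` returns 761, and `empirical_miss_rate(100, 761, trials=2000)` should land near, and typically below, 0.05; the bound is nearly tight here because simultaneous misses of two regions are rare.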

From the observations of N paths from an initial condition, we locally estimate the invariant manifold \( {\mathcal {M}}_{{\epsilon }}\), the effective local directions of the fast modes and an oblique projection along them onto \( {\mathcal {M}}_{{\epsilon }}\), and an effective drift and diffusion coefficient to be used for an effective Itô diffusion on \( {\mathcal {M}}_{{\epsilon }}\). In these steps, it is crucial to avoid the curse of dimensionality, which would demand a number of observations exponential in the high dimension D of the state space. We achieve this by using simple parametric models for these local estimators, and prove that the sampling requirements scale favorably, linearly in D. We then piece together these local estimators of the effective dynamics at timescale \(\tau \) to obtain a global estimate of \( {\mathcal {M}}_{{\epsilon }}\), of a nonlinear projection onto \( {\mathcal {M}}_{{\epsilon }}\), and of a process on \( {\mathcal {M}}_{{\epsilon }}\) called ATLAS.

The ATLAS stochastic process \(({\textbf{z}_t^{\mathcal {A}}})_{t\ge 0}\) takes place on the estimated \( {\mathcal {M}}_{{\epsilon }}\) and aims to reproduce, at timescale \(\tau \) and above, the dynamics of the original process on \( {\mathcal {M}}_{{\epsilon }}\), after averaging out the fast modes. The ATLAS process may be simulated much more efficiently than the original process, as it is lower-dimensional and can be simulated with time-steps of size \(\tau \gg {\epsilon }\) instead of size \(\lesssim \epsilon \) as in \(\mathcal {S}\). We demonstrate this construction numerically on several systems that display different salient features: nonlinear slow manifolds, lack of a global map to globally linear slow and fast variables, linear and curvilinear fast modes, and the presence or absence of a clear separation of timescales between fast and slow modes.

As mentioned above, in many applications (e.g., in molecular dynamics) a “large enough” set of initial conditions at which to collect bursts of paths is not known. A key contribution of this work is a construction of ATLAS in “exploration mode”, where we initially construct ATLAS from a very small number of initial conditions and update it on the fly by collecting new bursts of simulations from \(\mathcal {S}\), started at automatically well-chosen initial conditions, whenever ATLAS trajectories leave the current “domain of competency” of ATLAS, which keeps growing as exploration proceeds. This yields an increasing family of ATLASes, each consistent with the previous ones and with the original dynamics, on ever-increasing subsets of the state space, without over-sampling already explored regions. The ability to efficiently and consistently explore the effective state space of the system is crucial for ATLAS; it is achieved with techniques very different from existing ones, which are typically based on biasing the dynamics, trading fidelity to the original dynamics for exploration (Chiavazzo et al. 2017; Frewen et al. 2009b; Tribello et al. 2014; Zheng et al. 2013; Chen et al. 2015).

All our numerical experiments are run exclusively in exploration mode and demonstrate that ATLAS accurately reproduces features of the dynamics at medium and large timescales and enables the efficient construction of Markov state models (MSMs), and of approximations of important observables, such as eigenvalues and eigenvectors of the generator of the dynamics. These in turn may be used for further reduction of the dynamics at very large timescales, estimating transition rates, and yielding low-dimensional embeddings of \( {\mathcal {M}}_{{\epsilon }}\).

2 Fast–Slow SDEs with Slow Nonlinear Manifolds

A classical model of fast–slow SDEs is

$$\begin{aligned} \left\{ \begin{aligned}&\textrm{d}\textbf{x}_t = g(\textbf{x}_t, \textbf{y}_t) \textrm{d}t + G(\textbf{x}_t, \textbf{y}_t)\, \textrm{d}U_t \\&\textrm{d}\textbf{y}_t = \frac{1}{{\epsilon }} f(\textbf{x}_t, \textbf{y}_t) \textrm{d}t + \frac{1}{\sqrt{{\epsilon }}}F(\textbf{x}_t, \textbf{y}_t)\, \textrm{d}V_t \end{aligned} \right. , \end{aligned}$$
(2.1)

where \({\epsilon }>0\) is a small parameter determining the separation of timescales, \(\textbf{y}_t\in {\mathbb {R}}^{D-d}\) and \(\textbf{x}_t\in {\mathbb {R}}^{d}\) are, respectively, the fast and slow variables, and \((U_t)_{t\ge 0}\), \((V_t)_{t\ge 0}\) are independent Wiener processes in \({\mathbb {R}}^d\) and \({\mathbb {R}}^{D-d}\), respectively. The drift coefficients \(f, g\) and the diffusion coefficients \(F, G\) are assumed to be regular, e.g., twice differentiable. Systems governed by this type of equations have been extensively studied (Pavliotis and Stuart 2008; Berglund and Gentz 2006; van Kampen 1981; Gardiner 2009). We are interested in the situation, common in applications, where the ambient dimension D is much larger than the “intrinsic” dimension d of the slow variables. The time-step \(\delta t\) in the original simulator \(\mathcal {S}\) of Eq. (2.1) is typically \(\lesssim {\epsilon }\) to ensure accuracy and stability of the numerical scheme, making simulation computationally onerous. This constraint on the time-step applies, in general, only to explicit schemes, motivating continued research on implicit schemes, which, however, still appear to be computationally prohibitive in high dimensions. While the techniques we introduce also apply to ODEs with minor changes (for example, substituting bursts of stochastic paths by bursts of deterministic paths started at stochastically perturbed initial conditions), we focus on SDEs to streamline the presentation. Also, recall that fast–slow ODEs may be approximated by SDEs, at least in the limit \({\epsilon }\rightarrow 0\) (Pavliotis and Stuart 2008).
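To illustrate why explicit schemes need \(\delta t\lesssim \epsilon \), consider a hypothetical linear instance of Eq. (2.1) with \(g=-\textbf{x}\), \(f=\textbf{x}-\textbf{y}\), and constant diffusion coefficients (these choices are ours, for illustration only). The deterministic part of the explicit Euler update multiplies the fast deviation \(\textbf{y}-\textbf{x}\) by \(1-\delta t/\epsilon \) at each step (up to an \(O(\delta t)\) forcing from the slow variable), so the scheme is unstable as soon as \(\delta t>2\epsilon \). A minimal Euler–Maruyama sketch:

```python
import numpy as np


def simulate_toy_fast_slow(x0, y0, eps, dt, n_steps, sigma=0.0, seed=0):
    """Explicit Euler-Maruyama for a hypothetical linear instance of Eq. (2.1):
        dx = -x dt + sigma dU,
        dy = (1/eps)(x - y) dt + (sigma/sqrt(eps)) dV.
    Returns the final state (x, y)."""
    rng = np.random.default_rng(seed)
    x, y = float(x0), float(y0)
    sqrt_dt = np.sqrt(dt)
    for _ in range(n_steps):
        dU, dV = rng.normal(0.0, sqrt_dt, size=2)  # Wiener increments over dt
        # Tuple assignment: both updates use the state at the start of the step.
        x, y = (x - x * dt + sigma * dU,
                y + (x - y) * dt / eps + sigma / np.sqrt(eps) * dV)
    return x, y
```

With \(\epsilon =0.01\) and `sigma=0`, taking `dt=0.005` yields a final `y` that tracks the slowly decaying `x`, while `dt=0.03` (so \(|1-\delta t/\epsilon |=2\)) makes `y` grow roughly like \(2^{k}\) with the step count k.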

2.1 The Slow Manifold and Averaged Equations on it

The fast variable \(\textbf{y}\) is assumed to relax, at a timescale \(O(\tau )\), and stay close to the slow manifold \( {\mathcal {M}}^{\textbf{x}}_{0}:=\{(\textbf{x},\textbf{y}^\star (\textbf{x})): f(\textbf{x},\textbf{y}^\star (\textbf{x}))=0 \}\) of the corresponding deterministic system (with \(F,G\equiv 0\)), if the slow manifold is asymptotically stable. Geometric singular perturbation theory implies the existence of an invariant manifold \( {\mathcal {M}}^{\textbf{x}}_{{\epsilon }}\), close to \( {\mathcal {M}}^{\textbf{x}}_{0}\), see Berglund and Gentz (2006), Berglund and Gentz (2003), Kuehn (2015) and Appendix A.1. Under suitable further conditions, one then obtains a reduced set of equations on \( {\mathcal {M}}^{\textbf{x}}_{{\epsilon }}\), with the drift and diffusion coefficients on \( {\mathcal {M}}^{\textbf{x}}_{{\epsilon }}\) obtained by locally averaging, at each \(\textbf{x}_0\in {\mathcal {M}}^{\textbf{x}}_{{\epsilon }} \), those in Eq. (2.1) against the conditional invariant measure \(\nu (\textbf{y}|\textbf{x}=\textbf{x}_0)\) of the fast modes (Pavliotis and Stuart 2008; Berglund and Gentz 2006). The technical assumptions needed can be far from trivial, e.g., often G is assumed independent of \(\textbf{y}\) (Yu and Veretennikov 1991; Givon et al. 2006; Givon 2007). The reduced equations are

$$\begin{aligned} \begin{aligned} \textrm{d}\bar{\textbf{x}}_t = \bar{g}(\bar{\textbf{x}}_t)\textrm{d}t + \bar{G}(\bar{\textbf{x}}_t) \textrm{d}U_t,\qquad \bar{\textbf{y}}_t = {\bar{\textbf{y}}}(\bar{\textbf{x}}_t,{\epsilon }), \end{aligned} \end{aligned}$$
(2.2)

which define a process on the invariant manifold \( {\mathcal {M}}^{\textbf{x}}_{{\epsilon }}\), for small \(\epsilon \) (see Appendix A.3 for details). Having averaged out the fast variables, the reduced dynamics deliberately loses information about the details of the dynamics of the fast variables and about phenomena below the timescale \(\tau \), but yields a low-dimensional process (in the regime of interest \(d\ll D\)) that reproduces the effective dynamics of the original system on \( {\mathcal {M}}^{\textbf{x}}_{{\epsilon }}\) at timescales of order \(\tau \) and, ideally, beyond.
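As a sketch of the averaging step (not of the estimator used by ATLAS), take hypothetical scalar fast dynamics \(\textrm{d}\textbf{y}_t=\frac{1}{\epsilon }(\textbf{x}-\textbf{y}_t)\textrm{d}t+\frac{\sigma }{\sqrt{\epsilon }}\textrm{d}V_t\) with \(\textbf{x}\) frozen, whose conditional invariant measure is \(\nu (\textbf{y}|\textbf{x})={\mathcal {N}}(\textbf{x},\sigma ^2/2)\). The averaged drift \(\bar{g}(\textbf{x})=\int g(\textbf{x},\textbf{y})\,\nu (\textrm{d}\textbf{y}|\textbf{x})\) can then be estimated by averaging g along simulated fast paths after they relax. The fast process is integrated with its exact Ornstein–Uhlenbeck transition, so the step size is not limited by stability.

```python
import numpy as np


def averaged_drift_mc(g, x, eps, sigma, n_chains=4000, n_burn=50, n_avg=100, seed=0):
    """Monte Carlo estimate of g_bar(x) = E_{nu(y|x)}[g(x, y)] with x frozen.

    Hypothetical fast dynamics: dy = (1/eps)(x - y) dt + (sigma/sqrt(eps)) dV,
    an Ornstein-Uhlenbeck process with invariant law nu(y|x) = N(x, sigma**2 / 2).
    """
    rng = np.random.default_rng(seed)
    h = eps                                # one relaxation time per step
    decay = np.exp(-h / eps)               # exact OU mean contraction over h
    stat_var = sigma**2 / 2.0              # variance of nu(y|x)
    step_std = np.sqrt(stat_var * (1.0 - decay**2))
    y = np.zeros(n_chains)                 # start away from equilibrium
    total = 0.0
    for k in range(n_burn + n_avg):
        y = x + (y - x) * decay + step_std * rng.standard_normal(n_chains)
        if k >= n_burn:                    # average only after relaxation
            total += g(x, y).mean()
    return total / n_avg
```

For \(g(\textbf{x},\textbf{y})=-\textbf{y}^3\), the Gaussian moment identity \({\mathbb {E}}[\textbf{y}^3]=m^3+3ms^2\) for \(\textbf{y}\sim {\mathcal {N}}(m,s^2)\) gives \(\bar{g}(\textbf{x})=-(\textbf{x}^3+\tfrac{3}{2}\sigma ^2\textbf{x})\), i.e. \(-2.5\) at \(\textbf{x}=1\), \(\sigma =1\); the Monte Carlo estimate should agree within a few hundredths. The result does not depend on \(\epsilon \), which only sets how quickly the fast variable relaxes.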

2.2 Nonlinear Observations and Unknown Slow/Fast Variables

In the model in Eq. (2.1), the slow and fast variables are the given, linear, and orthogonal coordinates \(\textbf{x}\) and \(\textbf{y}\). In applications, however, the slow and fast variables are typically not known a priori and need to be identified, and in general they are neither linear nor orthogonal (Wechselberger 2020).

This motivates the following observation model. We view the system in Eq. (2.1) as a black-box latent local model. Black-box because the equations are not available to us. Latent because we do not have access to \(\textbf{x}\) and \(\textbf{y}\), but to observations \(\textbf{z}\), ranging in \(\Omega \subseteq \mathbb {R}^{D}\), which can be mapped to latent variables \((\textbf{x},\textbf{y})\in \mathbb {R}^D\) satisfying Eq. (2.1). Local because such a map is not global, but is in fact realized by a collection of charts \(\{(\mathcal {U}_\alpha ,\varphi _\alpha )\}_\alpha \), consisting of open neighborhoods \(\{\mathcal {U}_\alpha \}_\alpha \) covering \(\Omega \) and smooth maps \(\varphi _\alpha :\mathcal {U}_\alpha \rightarrow \mathbb {R}^D\), each invertible on its range and such that \(\varphi _\alpha \circ \varphi _{\alpha '}^{-1}\) is smooth where defined, so that for every point \(\textbf{z}\in \Omega \) the local latent variables \((\textbf{x},\textbf{y})=\varphi _\alpha (\textbf{z})\) satisfy Eq. (2.1), for any \(\mathcal {U}_\alpha \ni \textbf{z}\) (at least one such \(\mathcal {U}_\alpha \) exists since \(\{\mathcal {U}_\alpha \}_\alpha \) covers \(\Omega \)). Geometrically, this is the natural setup for expressing that the observations \(\textbf{z}\) lie on a manifold parametrized by a system of charts (called an atlas in differential geometry). This geometric perspective is here merged with the dynamics, through the condition that the local parametrizations map the dynamics of the observed variables to a dynamics where the latent variables follow the model equations (2.1). This model is inspired by, and generalizes, that of Singer et al. (2009) and Wechselberger (2020), where the aim was to discover an embedding of the underlying slow variables via the lowest-frequency eigenfunctions of an estimated generator of the whole process, and not necessarily a parametrization of the invariant manifold, nor effective equations on it. That approach is applicable to a broader class of processes than ours, for example when the fast modes are highly nonlinear; however, this comes at the price of falling victim to the curse of dimensionality, requiring sampling paths from an exponentially large number of initial conditions (this is not discussed in Singer et al. 2009, but it could be derived). We shall instead estimate the slow variables and effective equations directly, rather than first learning the detailed behavior of the high-dimensional fast variables and subsequently reducing the dynamics via eigendecompositions.

In the observed variables \(\textbf{z}\), the process \((\textbf{x}_t,\textbf{y}_t)_{t\ge 0}\) maps to a process \((\textbf{z}_t)_{t\ge 0}\), where slow and fast variables are in general nonlinearly mixed, instead of being linear and orthogonal as in Eq. (2.1). The slow variables \(\textbf{z}^{\text {slw}}\) will lie on a nonlinear invariant manifold \( {\mathcal {M}}_{{\epsilon }}\) of dimension d; trajectories will lie in a domain of concentration around \( {\mathcal {M}}_{{\epsilon }}\) that we model as a non-self-intersecting tube around \( {\mathcal {M}}_{{\epsilon }}\). \( {\mathcal {M}}_{{\epsilon }} \) is close to a slow manifold \( {\mathcal {M}}_{0}\) (which in general is not the image of \( {\mathcal {M}}^{\textbf{x}}_{0} \) under the maps \(\varphi _\alpha ^{-1}\)). Locally around the initial point \(\textbf{z}_0\in {\mathcal {M}}_{{\epsilon }} \), one may linearize the equations for \((\textbf{z}_t)_{0\le t\lesssim \tau (\textbf{z}_0)}\) to a form similar to Eq. (2.1), with \(\textbf{x}\) replaced by slow variables \(\textbf{z}^{\text {slw}}\) and \(\textbf{y}\) replaced by fast variables \(\textbf{z}^{\text {fst}}\). Under the same linearization, \(\textbf{z}^{\text {slw}}\) is approximated as lying in the tangent space \(\smash {T_{\textbf{z}_0} {\mathcal {M}}_{{\epsilon }}}:=\mathrm {span(col(} U^{{\textrm{slw}}}_{d}))\) to \( {\mathcal {M}}_{{\epsilon }}\) at \(\textbf{z}_0\), and \(\textbf{z}^{\text {fst}}\) is approximated as lying in \(\smash {\mathbb {V}^{\text {fst}}_{\textbf{z}_0}:=\mathrm {span(col(} V^{{\textrm{fst}}}_{D-d}))}\). The slow and fast directions \(\smash { U^{{\textrm{slw}}}_{d}, V^{{\textrm{fst}}}_{D-d}}\) in general vary, smoothly, with \(\textbf{z}_0\). 
\(( {\mathcal {M}}_{{\epsilon }}-\textbf{z}_0)\) is locally a graph of a function \(\overline{\textbf{z}}^{\text {fst}}_{\epsilon }(\cdot ;\textbf{z}_0): \smash {T_{\textbf{z}_0} {\mathcal {M}}_{{\epsilon }} \rightarrow \mathbb {V}^{\text {fst}}_{\textbf{z}_0}}\) over the slow variables \(\textbf{z}^{\text {slw}}\). We can then proceed to the reduction to equations in the slow variables \(\textbf{z}^{\text {slw}}\) only, in a form similar to Eq. (2.2), by averaging the fast variables at a prescribed timescale \(\tau \), obtaining a reduced process on \( {\mathcal {M}}_{{\epsilon }}\).

2.3 Structure of the Local Reduced Effective Equations

We have assumed that the deviation of the fast variable from the invariant manifold \( {\mathcal {M}}_{{\epsilon }}\) lies, exactly or approximately, in \(\mathbb {V}^{\text {fst}}_{\textbf{z}_0}\); in this subspace let it be given by \({{\varvec{\xi }}_t:= \textbf{z}^{\text {fst}}_t- \overline{\textbf{z}}^{\text {fst}}_{\epsilon }(P(\textbf{z}^{\text {slw}}_t);\textbf{z}_0)}\) with \(t\lesssim \tau (\textbf{z}_0)\), where P is the projection onto \(\smash {T_{\textbf{z}_0} {\mathcal {M}}_{{\epsilon }}}\) with kernel \(\mathbb {V}^{\text {fst}}_{\textbf{z}_0}\). The idea of averaging in fast–slow systems (Pavliotis and Stuart 2008; Freidlin et al. 2012; Givon et al. 2004; Berglund and Gentz 2006; van Kampen 1981) exploits the timescale separation between slow and fast variables: at the separation timescale \(\tau (\textbf{z}_0)\), the dynamics of \({\varvec{\xi }}_t\) conditioned on \(\textbf{z}^{\text {slw}}_t=\textbf{z}_0\) reaches its quasi-equilibrium distribution \(\nu ({\varvec{\xi }}|\textbf{z}^{\text {slw}}_t=\textbf{z}_0)\), which we approximate by a \((D-d)\)-dimensional Gaussian distribution \({\mathcal {N}}(0, \Xi (\textbf{z}_0))\). If the trace of \(\Xi (\textbf{z}_0)\) is large, the fast oscillations of \({\varvec{\xi }}_t\) around \( {\mathcal {M}}_{{\epsilon }}\) have large expected amplitude.
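The projection P and the quasi-equilibrium covariance \(\Xi (\textbf{z}_0)\) have simple linear-algebra counterparts. The sketch below assumes that bases `U_slw` of \(T_{\textbf{z}_0}{\mathcal {M}}_{\epsilon }\) and `V_fst` of \(\mathbb {V}^{\text {fst}}_{\textbf{z}_0}\) are already given (estimating them is the hard part and is not shown), and that \({\mathcal {M}}_{\epsilon }\) is locally flat at \(\textbf{z}_0\), so that the graph function \(\overline{\textbf{z}}^{\text {fst}}_{\epsilon }\) vanishes and \({\varvec{\xi }}=(I-P)(\textbf{z}-\textbf{z}_0)\). Writing \(\textbf{z}-\textbf{z}_0=U\textbf{a}+V\textbf{b}\), P keeps \(U\textbf{a}\) and discards \(V\textbf{b}\).

```python
import numpy as np


def oblique_projector(U_slw, V_fst):
    """Projection onto span(U_slw) with kernel span(V_fst).

    U_slw: (D, d) basis of the tangent space; V_fst: (D, D - d) basis of the
    fast subspace. The two must be transversal, so [U_slw, V_fst] is invertible.
    """
    M = np.hstack([U_slw, V_fst])
    d = U_slw.shape[1]
    # Rows of M^{-1} give the coordinates (a, b) of z in the basis [U, V]; keep a.
    return U_slw @ np.linalg.inv(M)[:d, :]


def fast_covariance(samples, z0, P):
    """Sample covariance of xi = (I - P)(z - z0) over the rows of `samples`.

    Assumes M_eps is locally flat at z0 (graph function of the fast variables = 0).
    """
    D = P.shape[0]
    dev = (samples - z0) @ (np.eye(D) - P).T
    return np.cov(dev, rowvar=False)
```

For a fast direction at 60° to a one-dimensional tangent in \(\mathbb {R}^2\), this P is idempotent, fixes the tangent and annihilates the fast direction, whereas the orthogonal projection onto the tangent would not annihilate it.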

We discuss in Sect. 3 how \(\tau (\textbf{z}_0)\) may be estimated from observations of short trajectories. We assume throughout, only in order to simplify the presentation, that \(\tau (\textbf{z}_0)\) can be chosen to be the same at all locations \(\textbf{z}_0\): we simply denote it by \(\tau \) and assume it is given. By stochastic averaging, in these coordinates, the reduced stochastic dynamics of the slow variables is obtained by averaging the drift and diffusion terms against this quasi-equilibrium distribution, leading to reduced SDEs

$$\begin{aligned} \textrm{d}\bar{\textbf{z}}^{\text {slw}}_t = b(\bar{\textbf{z}}^{\text {slw}}_t)\textrm{d}t + H(\bar{\textbf{z}}^{\text {slw}}_t)\textrm{d}U_t, \end{aligned}$$
(2.3)

similar to those in Eq. (2.2). These SDEs may be viewed in intrinsic coordinates, or in Cartesian coordinates in the ambient space \(\mathbb {R}^D\), with \(\textbf{z}^{\text {slw}}_t\in \mathbb {R}^D\) but on \( {\mathcal {M}}_{{\epsilon }}\), \(b\in \mathbb {R}^D\) a vector field on \( {\mathcal {M}}_{{\epsilon }}\), and \(H\in \mathbb {R}^{D\times d}\) acting on a Wiener process \(U_t\) in \(\mathbb {R}^d\).

Classical stochastic averaging considers \(\epsilon \rightarrow 0\), and the separation timescale \(\tau \) is typically of order \({\epsilon }\); see, e.g., Givon et al. (2006), Givon (2007), Liu (2010) and Zhang et al. (2018). Here, instead, motivated by applications, we consider \(\epsilon \) fixed, unknown and unused, and \(\tau \) fixed, known or estimated, larger than \({\epsilon }\), and independent thereof. The dynamics of \(\textbf{z}^{\text {slw}}_t\) is low-dimensional, taking place on \( {\mathcal {M}}_{{\epsilon }}\), and represents the reduced effective dynamics at timescales \(\tau \) and beyond, having averaged out the fast transients of the high-dimensional process \(({\varvec{\xi }}_t)_{t\ge 0}\) at timescales \(\lesssim \tau \). Simulating \(\textbf{z}^{\text {slw}}_t\) requires a time-step independent of \(\epsilon \), often much larger than \({\epsilon }\), and dependent only on the regularity of \( {\mathcal {M}}_{{\epsilon }}\) and of the effective drift b and diffusion coefficient H on \( {\mathcal {M}}_{{\epsilon }}\).
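The text treats \(\tau \) as known or estimated; Sect. 3.3 describes how it is estimated in this work. Purely as an illustration of the kind of information a burst carries, and not as the procedure of Sect. 3.3, the sketch below fits an exponential decay to the ensemble mean of a scalar fast deviation started away from equilibrium; a separation timescale could then be taken as a user-chosen multiple of the fitted relaxation time. It assumes the fast deviation relaxes approximately like an Ornstein–Uhlenbeck process, and `simulate_ou_burst` is a hypothetical synthetic data generator.

```python
import numpy as np


def simulate_ou_burst(xi0, eps, sigma, h, n_steps, n_paths, seed=0):
    """Synthetic burst: exact OU steps for d xi = -(xi/eps) dt + (sigma/sqrt(eps)) dV.

    Returns (times, paths) with paths of shape (n_paths, n_steps + 1)."""
    rng = np.random.default_rng(seed)
    decay = np.exp(-h / eps)
    step_std = np.sqrt(sigma**2 / 2.0 * (1.0 - decay**2))
    paths = np.empty((n_paths, n_steps + 1))
    paths[:, 0] = xi0
    for k in range(n_steps):
        paths[:, k + 1] = paths[:, k] * decay + step_std * rng.standard_normal(n_paths)
    return h * np.arange(n_steps + 1), paths


def relaxation_time_from_burst(times, xi_paths, frac=0.5):
    """Fit |mean_n xi_t| ~ C exp(-t / t_rel) by least squares on the log scale.

    Only the first `frac` of the window is used, before the ensemble mean
    reaches the sampling-noise floor.
    """
    mean = np.abs(xi_paths.mean(axis=0))
    k = max(2, int(frac * len(times)))
    slope, _ = np.polyfit(times[:k], np.log(mean[:k]), 1)
    return -1.0 / slope
```

On synthetic bursts with relaxation time \(\epsilon =0.01\), 5000 paths and \(\delta t=\epsilon /10\), the fit should recover \(\epsilon \) up to a small relative error; restricting the fit to the early part of the window avoids the regime where the ensemble mean is dominated by sampling noise.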

Our goal is to estimate a process, called ATLAS, that approximates \(\bar{\textbf{z}}^{\text {slw}}_t\), on an estimated \( \hat{{\mathcal {M}}}_{{\epsilon }}\), given observations of bursts of short trajectories; in all our examples the simulator of ATLAS will take time-steps exactly equal to \(\tau \).
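Since the effective coefficients in Eq. (2.3) do not depend on \(\epsilon \), a generic explicit scheme can advance the reduced SDE with step \(\tau \). The following Euler–Maruyama sketch is a textbook scheme for an SDE of the form (2.3) with user-supplied callables `b` and `H` (hypothetical names); it is not the ATLAS simulator, which, as described below, works with glued local charts on \( \hat{{\mathcal {M}}}_{{\epsilon }}\).

```python
import numpy as np


def euler_maruyama_reduced(b, H, z0, tau, n_steps, rng):
    """Euler-Maruyama for dz = b(z) dt + H(z) dU with fixed step tau.

    b: callable R^m -> R^m (drift); H: callable R^m -> R^{m x d} (diffusion);
    z0: initial state of length m. Returns an (n_steps + 1, m) array.
    """
    z = np.asarray(z0, dtype=float).copy()
    path = [z.copy()]
    for _ in range(n_steps):
        Hz = H(z)
        dU = rng.normal(0.0, np.sqrt(tau), size=Hz.shape[1])  # Wiener increment over tau
        z = z + b(z) * tau + Hz @ dU
        path.append(z.copy())
    return np.array(path)
```

With `b = lambda z: -z` and zero diffusion, each step multiplies the state by \(1-\tau \). Covering a time horizon T costs \(T/\tau \) steps, compared with roughly \(T/\epsilon \) or more for an explicit simulator of the original system.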

2.4 ATLAS: Learning a Reduced Effective Model

Given observations of multiple bursts of short trajectories, of time length \({{O}}(\tau )\), around each of a collection of initial points \(\{\textbf{z}^{{l}}_0\}_{l=1,\dots ,L}\subset \mathbb {R}^D\), we estimate: the local slow variables, by estimating a point \(\textbf{z}^l\) on \( {\mathcal {M}}_{{\epsilon }}\) and a local tangent space \(T_{\textbf{z}^l} {\mathcal {M}}_{{\epsilon }} \) to \( {\mathcal {M}}_{{\epsilon }}\) at \(\textbf{z}^l\); a subspace \(\mathbb {V}^{\text {fst}}_{\textbf{z}^l}\) transversal to \( {\mathcal {M}}_{{\epsilon }}\) at \(\textbf{z}^l\) containing the linearized fast modes \({\varvec{\xi }}\); and effective drift and diffusion coefficients for \(\textbf{z}^{\text {slw}}_t\) in \(T_{\textbf{z}^l} {\mathcal {M}}_{{\epsilon }} \), as in Eq. (2.3), for the effective dynamics of the slow variables around \(\textbf{z}^l\) at the timescale \(\tau \). These objects are completely local, around each \(\textbf{z}^l\). Since a global reduction step bringing the equations to the standard form Eq. (2.1) may not be possible, for example because of global topological obstructions (e.g., a slow manifold consisting of a circle cannot be mapped globally to a linear coordinate), in the spirit of the very definition of manifolds and their atlases we will “glue” together the estimated local charts and equations into a set of charts and smoothly coordinated equations, generating a process, called ATLAS, and a corresponding simulator for obtaining global paths on the estimated invariant manifold. Given enough data and under suitable assumptions on the dynamics, ATLAS consistently and accurately estimates the local dynamics and its statistics; we also demonstrate in our numerical experiments that important long-time observables, including the stationary distribution and mean residence times in regions of state space, metastable and not, are accurately estimated by ATLAS.
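The circle example can be made explicit with two angle charts. In the sketch below (an illustration of the chart conditions of Sect. 2.2, not part of ATLAS; the function names are hypothetical), each chart is defined on the unit circle minus one point. On their overlap the transition map is smooth, equal to the identity on the upper arc and to a shift by \(2\pi \) on the lower arc, so the charts are compatible, yet neither covers the whole circle.

```python
import numpy as np


def chart_a(p):
    """Angle in (-pi, pi); a chart on the unit circle minus the point (-1, 0)."""
    return np.arctan2(p[1], p[0])


def chart_b(p):
    """Angle in (0, 2*pi); a chart on the unit circle minus the point (1, 0)."""
    return np.mod(np.arctan2(p[1], p[0]), 2.0 * np.pi)


def transition_b_from_a(theta):
    """chart_b o chart_a^{-1} on the overlap, theta in (-pi, 0) or (0, pi).

    Identity on the upper arc and a shift by 2*pi on the lower arc: smooth with
    derivative 1 on each component, with the offset jumping between components.
    """
    return np.where(theta > 0, theta, theta + 2.0 * np.pi)
```

The jump of the offset between the two components of the overlap is exactly what prevents a single linear coordinate on the circle, and it is harmless once dynamics are expressed chart by chart.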

Algorithm 1

High-level pseudo-code for ATLAS construction

3 ATLAS Construction

During construction, ATLAS is assumed to have access to a black-box simulator \(\mathcal {S}\), that takes as input an initial condition \(\textbf{z}_0\) and a time \(t_0\), typically \({{O}}(\tau )\), and returns a path \((\textbf{z}_t)_{t\in [0,t_0]}\), driven by the latent equations as in Eq. (2.1). The construction proceeds in multiple steps, see Algorithm 1.

Before describing the details, we present an example.

Example: Fast/slow system around a pinched sphere. This system is used as a reference throughout our discussion, construction and testing of ATLAS. The process is an Itô diffusion on a “smoothly pinched” two-dimensional sphere centered at the origin (the invariant manifold \( \hat{{\mathcal {M}}}_{{\epsilon }} \subseteq \mathbb {R}^3\), see Fig. 2), perturbed by very rapid fluctuations in the radial direction. These fast modes are (a) large (equal to a significant fraction of the reach of \( {\mathcal {M}}_{{\epsilon }}\)); (b) a.e. not orthogonal to \( {\mathcal {M}}_{{\epsilon }}\), see Figs. 1 (step 3), and 7; and (c) may be approximated by a radial Ornstein–Uhlenbeck (O-U) process. For these reasons, a local PCA of an ensemble of short trajectories would fail to estimate the local tangent plane to \( {\mathcal {M}}_{{\epsilon }}\). Given \(\tau \), at least as large as the timescale of relaxation of the fast modes, the correct effective equations on \( {\mathcal {M}}_{{\epsilon }}\) should be obtained by averaging along an appropriate oblique (radial, in this case) projection onto \( {\mathcal {M}}_{{\epsilon }}\).
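The failure of local PCA in this regime is easy to reproduce in a toy setting. The hypothetical two-dimensional sketch below (the directions, spreads and names are ours, not those of the pinched-sphere system) draws points with a small spread along the true tangent \(\textbf{e}_1\), mimicking short-time slow motion, and a large spread along a direction oblique to it, mimicking the fast modes; the leading principal direction then follows the fast modes rather than the tangent.

```python
import numpy as np


def pca_tangent_angle(n=5000, slow_std=0.05, fast_std=1.0, fast_angle_deg=60.0, seed=0):
    """Angle (degrees) between the leading PCA direction of a toy point cloud
    and the true tangent direction e1.

    Toy local picture in R^2: small spread along the tangent e1 (short-time
    slow motion) plus large spread along a direction oblique to e1 (fast modes).
    """
    rng = np.random.default_rng(seed)
    tangent = np.array([1.0, 0.0])
    a = np.deg2rad(fast_angle_deg)
    fast_dir = np.array([np.cos(a), np.sin(a)])
    pts = (slow_std * rng.standard_normal((n, 1)) * tangent
           + fast_std * rng.standard_normal((n, 1)) * fast_dir)
    _, eigvecs = np.linalg.eigh(np.cov(pts, rowvar=False))  # ascending eigenvalues
    top = eigvecs[:, -1]                                    # leading principal direction
    cos_angle = min(1.0, abs(float(top @ tangent)))
    return float(np.degrees(np.arccos(cos_angle)))
```

With the default parameters the leading direction lies roughly 60° away from the tangent; swapping the spreads (`slow_std=1.0`, `fast_std=0.05`) makes PCA recover the tangent almost exactly, which is the regime where PCA-based tangent estimation is adequate.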

Fig. 1

High-level overview of the steps in the construction of ATLAS, for a system exhibiting fast large oscillations around an invariant manifold \( {\mathcal {M}}_{{\epsilon }}\) shaped as a pinched sphere. 1. Sample initial conditions, according to a given probability measure. Typically initial conditions (ICs) do not lie on \( {\mathcal {M}}_{{\epsilon }}\) (see zoom-in portion: IC represented by orange dot is not on \( {\mathcal {M}}_{{\epsilon }}\)), nor are they well-distributed throughout state space. A sample path of the original system is shown, which oscillates around \( {\mathcal {M}}_{{\epsilon }}\) with large amplitude. 2. Simulate short bursts: from each IC, in parallel, a burst of short trajectories, of time length comparable to the relaxation timescale \(\tau \) of the fast modes, is obtained from a black-box simulator of the original system (if \(\tau \) is not given, it may be estimated from these trajectories; in this example \(\tau =200\delta t\)). In this example the fast modes have large amplitude, but our technique will correctly determine fast and slow directions. 3. Estimate local geometry and dynamics: from the trajectories in each burst, the local geometry of \( {\mathcal {M}}_{{\epsilon }}\) (including a landmark \(\hat{\textbf{z}}^{{l}}\) on \( \hat{{\mathcal {M}}}_{{\epsilon }}\), an affine tangent space \(\smash {\hat{T}_{\hat{\textbf{z}}^{{l}}} {\mathcal {M}}_{{\epsilon }} }\), and an oblique projection \(\hat{P}^{{ l}} {({\textbf{z}})} \) onto it with kernel \(\smash {\mathrm {span(col(}\hat{V}^{{l,\textrm{fst}}}_{D-d}}\)), as well as parameters of the reduced effective dynamics (drift \(\hat{\textbf{b}}^{{l}}\) and diffusion coefficient \({\hat{\Lambda }^{{l}}_d}\)), are estimated. 4. Glue into global ATLAS simulator: the local geometry and dynamics estimators are glued together into a global ATLAS, with an associated simulator of a reduced effective ATLAS process \(({\textbf{z}_t^{\mathcal {A}}})_{t\ge 0}\). 
We display estimated local tangent ellipses, corresponding to balls in a diffusion-induced local metric, and a sample path from the ATLAS simulator. However, at this stage ATLAS may still not cover \( {\mathcal {M}}_{{\epsilon }}\): in “exploration mode” it will simulate paths that stop when reaching the boundary of the current “domain of expertise” (orange point), collect new bursts at those locations (as in step 2), update ATLAS with the estimators from the new region (as in step 3), and then resume the simulation.

We remark that neither the original system (in \(\mathbb {R}^3\)) nor the slow system on \( {\mathcal {M}}_{{\epsilon }}\) is driven by overdamped Langevin equations: the drift is not the gradient of a potential, the diffusion coefficient is not constant, and the process is not reversible. Therefore, methods based on approximations by overdamped Langevin equations, such as those in Coifman et al. (2005), Coifman et al. (2008), Singer et al. (2009) and Rohrdanz et al. (2011), would be biased, and likely inaccurate. The effective dynamics on \( {\mathcal {M}}_{{\epsilon }}\) has two high-probability regions, separated by regions of large volume where the drift is small compared to the diffusion (“entropic barriers”), which could make standard approximations inaccurate (Bicout and Szabo 2000).
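Whether a drift can be written as \(-\nabla V\) can be probed locally: a gradient field has a symmetric Jacobian, so a nonzero antisymmetric part of the Jacobian at any point rules out a potential (symmetry everywhere on a simply connected domain is also sufficient). The Python sketch below checks this with central finite differences; it addresses only the gradient structure of the drift, not the other overdamped Langevin conditions (constant diffusion, reversibility) listed above, and the function names are hypothetical.

```python
import numpy as np


def jacobian_fd(b, z, h=1e-6):
    """Central finite-difference Jacobian of a vector field b: R^n -> R^n at z."""
    z = np.asarray(z, dtype=float)
    n = z.size
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (b(z + e) - b(z - e)) / (2.0 * h)
    return J


def asymmetry(b, z):
    """Frobenius norm of the antisymmetric part of Db(z).

    A value clearly above zero means b is not a gradient field near z."""
    J = jacobian_fd(b, z)
    return np.linalg.norm(J - J.T) / 2.0
```

For the gradient field \(-\nabla V\) with \(V=(x+y)^2/2\) the asymmetry vanishes up to rounding, while for the rotation field \((-y,x)\) it equals \(\sqrt{2}\) at every point.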

To give some intuition about local geometric and dynamical quantities that play a fundamental role in this system, and more generally in the systems that motivate our constructions, we show in Fig. 7 a portion of \( \hat{{\mathcal {M}}}_{{\epsilon }}\) around a point \(\textbf{z}_0\), a corresponding trajectory of the system started at \(\textbf{z}_0\), and several key directions in \(\mathbb {R}^3\): the normal to \( \hat{{\mathcal {M}}}_{{\epsilon }}\) at \(\textbf{z}_0\), and the estimated effective direction of the fast modes at timescale \(\tau \), which differs significantly from the normal direction. We also depict the direction of the estimated effective (Itô) drift, which, as expected, is not tangent to \( \hat{{\mathcal {M}}}_{{\epsilon }}\), and in fact is far from it. These depicted objects are exactly those estimated in the ATLAS construction from local bursts of simulations.

Finally, the global geometric approximation of \( {\mathcal {M}}_{{\epsilon }}\) and effective ATLAS process are assembled. An ATLAS path, used for exploration, is shown in Fig. 1; the accuracy of ATLAS is demonstrated in various metrics, from the geometric approximation of \( {\mathcal {M}}_{{\epsilon }}\) to the approximation of effective drift and diffusion coefficients, to the accuracy of estimation of statistics of the process such as mean residence time in relatively small regions of state space and in metastable states (see Sect. 7.1).

Fig. 2

Representation of the estimated pinched sphere \( \hat{{\mathcal {M}}}_{{\epsilon }}\), together with the landmarks \(\{\hat{\textbf{z}}^{{l}}\}_l\) (blue dots) and their local connectivity graph, all as constructed by ATLAS during a long exploration. The dynamics in this case is not reversible, and the fast modes have large standard deviation (comparable to the reach of \( \hat{{\mathcal {M}}}_{{\epsilon }}\)) and are a.e. not orthogonal to \( \hat{{\mathcal {M}}}_{{\epsilon }}\) (see Fig. 7). The surface color is the norm of the effective drift as a function on \( \hat{{\mathcal {M}}}_{{\epsilon }}\). The landmarks are about \(10^{-2}\)-close to \( \hat{{\mathcal {M}}}_{{\epsilon }}\), see Table 4. The regions around the poles are very rarely visited, as the drift is unbounded there, creating a repulsion; in this visualization we have truncated those regions, where the landmarks become increasingly dense. The landmarks marked in red and cyan represent the high-probability regions.

3.1 Main Steps in the Construction

We are given access to a black-box simulator \(\mathcal {S}\) of the process \(\textbf{z}_t\), a probability measure \(\mu _0\) on the state space of the system, a separation timescale \(\tau \), and a dimension d for the invariant manifold \( {\mathcal {M}}_{{\epsilon }}\). We will discuss later how to proceed in the very important case when \(\mu _0\) is not provided, or is insufficient (“exploration mode”, Sect. 4), and how to estimate \(\tau \) (Sect. 3.3) and d (Appendix C). The output is ATLAS, consisting of a process \(({\textbf{z}_t^{\mathcal {A}}})_{t\ge 0}\) and a corresponding simulator, which approximate the effective dynamics of \(\textbf{z}_t\) on \( {\mathcal {M}}_{{\epsilon }}\) at the timescale \(\tau \) and beyond.

We sample L initial conditions \(\{\textbf{z}^{{l}}_0\}_{l=1,\dots ,L}\sim _{\text {i.i.d.}}\mu _0\), and for each l we use \(\mathcal {S}\) to obtain a burst \(\mathcal {B}^l\) of N trajectories \(\{\textbf{z}^{{l,n}}_t\}_{n=1,\dots ,N}\), each of time length \({{O}}(\tau )\) and starting at \(\textbf{z}^{{l}}_0\). The time-step \(\delta t\) of \(\mathcal {S}\) is typically \(\lesssim {\epsilon }\ll \tau \), so we may think of the output of \(\mathcal {S}\) as if it were in continuous time. For each l, we now focus on the local construction around \(\textbf{z}^{{l}}_0\), given the single burst \(\mathcal {B}^l\). At timescale \(\tau \), the invariant manifold is locally approximated by an estimated effective tangent space \(\hat{T}_{\hat{\textbf{z}}^{{l}}} {\mathcal {M}}_{{\epsilon }} \) of dimension d at a suitably estimated point \(\hat{\textbf{z}}^{{l}}\); the deviation \({\varvec{\xi }}_t^0\) from \( {\mathcal {M}}_{{\epsilon }}\) reaches equilibrium before time \(\tau \); and we approximate the dynamics of the slow variable \((\textbf{z}^{\text {slw}}_t)_{t\ge 0}\) on \( {\mathcal {M}}_{{\epsilon }}\) around \(\hat{\textbf{z}}^{{l}}\) by an Itô diffusion process on \(\hat{T}_{\hat{\textbf{z}}^{{l}}} {\mathcal {M}}_{{\epsilon }} \) as in (2.3). This requires us to estimate an affine oblique projection \(\hat{P}^{{ l}} \) along the fast modes and onto \(\hat{T}_{\hat{\textbf{z}}^{{l}}} {\mathcal {M}}_{{\epsilon }} \), an effective drift \(\hat{\textbf{b}}^{{l}}\) in \(\mathbb {R}^D\) (in the Itô formulation, the drift is in general not tangent to \( {\mathcal {M}}_{{\epsilon }}\), see Figs. 1 and 7), and an effective diffusion coefficient \({\hat{\Lambda }^{{l}}_d}\) in \(\hat{T}_{\hat{\textbf{z}}^{{l}}} {\mathcal {M}}_{{\epsilon }} \).
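To make the notion of a burst concrete, the sketch below stands in for the black-box simulator \(\mathcal {S}\) with a hypothetical two-dimensional toy SDE (not one of the paper's examples): a slow coordinate with constant drift and a fast Ornstein–Uhlenbeck coordinate with relaxation time \(\epsilon \), integrated by Euler–Maruyama with \(\delta t\lesssim \epsilon \). All names (`simulate_burst`, `toy_drift`, `toy_sigma`) are illustrative; ATLAS itself only needs the resulting array of paths.

```python
import numpy as np

def simulate_burst(drift, sigma, z0, N, n_steps, dt, rng):
    """Simulate N Euler-Maruyama paths of dz = drift(z) dt + sigma dW, all started at z0.

    Returns an array of shape (n_steps + 1, N, D): (time index, path index, coordinate).
    """
    z0 = np.asarray(z0, dtype=float)
    traj = np.empty((n_steps + 1, N, z0.size))
    traj[0] = z0                                   # every path in the burst starts at z0
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=(N, z0.size))
        traj[k + 1] = traj[k] + drift(traj[k]) * dt + sigma * dW
    return traj

# Hypothetical toy system: slow x with unit drift, fast y relaxing on timescale eps.
eps = 0.01

def toy_drift(z):
    return np.stack([np.ones(len(z)), -z[:, 1] / eps], axis=1)

toy_sigma = np.array([0.1, np.sqrt(2.0 / eps)])    # diagonal noise amplitudes

rng = np.random.default_rng(0)
burst = simulate_burst(toy_drift, toy_sigma, z0=[0.0, 3.0], N=2000,
                       n_steps=500, dt=1e-3, rng=rng)
```

Here the burst lasts 0.5 time units, i.e., fifty relaxation times of the fast coordinate, so its initial offset is forgotten and its spread settles near unit variance, while the slow coordinate drifts by about 0.5.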

3.2 Local Low-Order Moments of the Dynamics

The behavior of the time-dependent mean and covariance of the process started at \(\textbf{z}^{{l}}_0\) reveals crucial local properties of the geometry of the dynamics and of the slow/fast manifolds: at times t comparable to \(\tau \) (typically, neither much smaller nor much larger), we assume that the following approximations hold:

$$\begin{aligned} \begin{aligned}&\textbf{m}^{{l}}_t:= {\mathbb {E}}\left[ \textbf{z}^{{l}}_t \mid \textbf{z}^{{l}}_0 \right] = \textbf{z}^{l,\text {slw}}_0 +\textbf{b}^{{l}}t +{{O}}({\epsilon }), \\&C(\textbf{z}^{{l}}_t|\textbf{z}^{{l}}_0):= {\text {cov}}(\textbf{z}^{{l}}_t|\textbf{z}^{{l}}_0) = \Gamma ^{{l}} + \Lambda ^{{l}}t + {{O}}({\epsilon }), \end{aligned} \end{aligned}$$
(3.1)

where \(\Gamma ^{{l}}\succeq 0\) has rank \(D-d\) and represents an averaging, at timescale \(\tau \), of the covariance of the fast modes \(\Xi (\textbf{z}^{l,\text {slw}}_0)\), and \(\Lambda ^{{l}}=H^{{l}}(H^{{l}})^T\succeq 0\) has rank d and is the diffusivity of the effective reduced slow dynamics at \(\textbf{z}^{{l}}_0\). The spans of \(\Lambda ^{{l}}\) and \(\Gamma ^{{l}}\) approximate, respectively, the tangent space \(T_{\textbf{z}^{l,\text {slw}}_0} {\mathcal {M}}_{{\epsilon }} \) to \( {\mathcal {M}}_{{\epsilon }}\) at \(\textbf{z}^{l,\text {slw}}_0\) and the fast subspace \(\mathbb {V}^{\text {fst}}_{\textbf{z}^{l,\text {slw}}_0}\), which will be the kernel of a projection (in general not orthogonal) onto \(T_{\textbf{z}^{l,\text {slw}}_0} {\mathcal {M}}_{{\epsilon }} \). These expressions result from averaging the fast modes at timescale \(\tau \); in particular, at this timescale the effective reduced slow dynamics is (approximately) memoryless.

The quantities above are unknown, and we estimate them from the observations in the burst \(\mathcal {B}^l\):

$$\begin{aligned} \begin{aligned} {\hat{\textbf{m}}^{{l}}_{t}} \!:=\! \frac{1}{N}\sum _{n=1}^N \textbf{z}^{{ l,n}}_t,\,\, \hat{C}^{{l}}_t\!:= \!\frac{1}{N-1}\sum _{n=1}^N (\textbf{z}^{{ l,n}}_t\!\!\!\!-{\hat{\textbf{m}}^{{l}}_{t}})(\textbf{z}^{{ l,n}}_t\!\!\!\!-{\hat{\textbf{m}}^{{l}}_{t}})^T\!. \end{aligned} \end{aligned}$$
(3.2)

These empirical quantities are consistent estimators of the true local mean and covariance of the process, with an error of order \(\sqrt{\frac{d+d_f}{N}}\), where d is the dimension of \( {\mathcal {M}}_{{\epsilon }} \) and \(d_f\) is the number of fast modes of large amplitude; we discuss this further in Sect. 5.
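The estimators in Eq. (3.2) vectorize directly over the time grid. Below is a minimal NumPy sketch, assuming the burst is stored as an array indexed by (time, path, coordinate); the array layout and the name `burst_moments` are our own choices, not part of ATLAS.

```python
import numpy as np

def burst_moments(burst):
    """Empirical mean and covariance (Eq. 3.2) at every time index of a burst.

    burst: array of shape (T, N, D). Returns means (T, D) and covs (T, D, D),
    using the unbiased 1/(N-1) normalization.
    """
    burst = np.asarray(burst, dtype=float)
    N = burst.shape[1]
    means = burst.mean(axis=1)                        # m_hat_t, averaged over the N paths
    centered = burst - means[:, None, :]              # z_t^{l,n} - m_hat_t
    covs = np.einsum("tni,tnj->tij", centered, centered) / (N - 1)
    return means, covs
```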

We are now ready to introduce the ATLAS construction, which proceeds in the three main steps of Algorithm 1, described in Sects. 3.3, 3.4 and 3.5, respectively; see Appendix B for further details.

3.3 Estimation of Local Parameters of the Effective Dynamics

From each burst \(\mathcal {B}_ l\), \( l=1,\dots ,L\), of short simulations we compute several key quantities for constructing an approximation to the invariant manifold \( {\mathcal {M}}_{{\epsilon }}\) and to the reduced effective stochastic dynamics on it. The relationships in Eq. (3.1) suggest that the local effective drift and diffusion coefficients for the local slow variables may be estimated from \({\hat{\textbf{m}}^{{l}}_{t}}\) and \(\hat{C}^{{l}}_t\) for \(t\approx \tau \). The diffusion coefficient should also yield an estimate of the local tangent plane to \( {\mathcal {M}}_{{\epsilon }}\), which gives the local slow variables.

Fig. 3

Estimating \(\tau \) from a burst of trajectories: \(||{\hat{\textbf{m}}^{{l}}_{t}} ||\) and \({\textrm{tr}}(\hat{C}^{{l}}_t)\) should both behave as linear functions of t at the timescale of interest, as per Eq. (3.1)

3.3.1 Separation Timescale \(\tau \)

When not given, we estimate \(\tau \) from the behavior of \(||{\hat{\textbf{m}}^{{l}}_{t}}||\) and \({\textrm{tr}}(\hat{C}^{{l}}_t)\) as functions of t: for each burst, we obtain the time interval on which these two quantities behave linearly, as per Eq. (3.1) (see Fig. 3). We let \([\tau _{\min {}},\tau _{\max {}}]\) be the intersection of these intervals over all l’s, which is nonempty since we assume that there exists a common relaxation time \(\tau \) of the fast modes, valid throughout the invariant manifold (our techniques do extend to a location-dependent \(\tau ^{{l}}\)).
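The text does not specify how the linear regime is detected. One possible heuristic, which is our assumption rather than the authors' procedure, slides a short window along t, accepts windows on which a least-squares line fits within a relative tolerance, and returns the longest run of accepted windows; the per-burst intervals are then intersected to obtain \([\tau _{\min },\tau _{\max }]\). The functions `linear_interval` and `common_interval` and the parameters `window` and `tol` are hypothetical.

```python
import numpy as np

def linear_interval(t, y, window=10, tol=1e-2):
    """Hypothetical heuristic: longest time interval on which y(t) is close to linear.

    A window of `window` samples passes if the max residual of its least-squares line
    is at most tol times the range of y on that window. Returns (t_start, t_end) of the
    longest run of consecutive passing windows, or None if no window passes.
    """
    t, y = np.asarray(t, float), np.asarray(y, float)
    n_win = len(t) - window + 1
    passes = np.zeros(n_win, dtype=bool)
    for i in range(n_win):
        ts, ys = t[i:i + window], y[i:i + window]
        coef = np.polyfit(ts, ys, 1)
        resid = np.abs(ys - np.polyval(coef, ts)).max()
        passes[i] = resid <= tol * max(ys.max() - ys.min(), 1e-12)
    best_len, run_start, best_run = 0, None, None
    for i, ok in enumerate(passes):
        if not ok:
            run_start = None
            continue
        if run_start is None:
            run_start = i
        if i - run_start + 1 > best_len:
            best_len, best_run = i - run_start + 1, (run_start, i)
    if best_run is None:
        return None
    return t[best_run[0]], t[best_run[1] + window - 1]

def common_interval(intervals):
    """Intersect the per-burst intervals into [tau_min, tau_max]."""
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    if lo > hi:
        raise ValueError("no common linear regime across bursts")
    return lo, hi
```

In practice one would apply `linear_interval` to both \(||{\hat{\textbf{m}}^{{l}}_{t}}||\) and \({\textrm{tr}}(\hat{C}^{{l}}_t)\) for every burst and pass all the resulting intervals to `common_interval`.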

3.3.2 Drift Coefficient of the Effective Dynamics

The estimated drift \(\hat{\textbf{b}}^{{ l}}\) is obtained as the slope (in t) of \(\textbf{m}^{{l}}_t\) in Eq. (3.1), via a least-squares linear regression: with \(\{t_m\}_{m=1}^M\) equispaced in \([\tau _{\min },\tau _{\max }]\),

$$\begin{aligned} \hat{\textbf{b}}^{{l}}:= \frac{\sum _{m=1}^M({\hat{\textbf{m}}^{{l}}_{t_m}} -\bar{\textbf{m}}^{{l}}_M)(t_m-\bar{t}_M)}{\sum _{m=1}^M(t_m-\bar{t}_M)^2}, \end{aligned}$$
(3.3)

where \(\bar{\textbf{m}}^{{l}}_M:= \frac{1}{M}\sum _{m=1}^M\!\hat{\textbf{m}}^{{l}}_{t_m}\) and \(\overline{t}_M=\frac{1}{M}\sum _{m=1}^M\! t_m\). Figures 2 and 7 show the norm and direction of \(\hat{\textbf{b}}^{{l}}\) for the pinched sphere system.
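Eq. (3.3) is the ordinary least-squares slope of the empirical means against time. A minimal NumPy sketch (the name `ols_slope` is ours); because it regresses along the first axis only, the same routine also computes the matrix-valued slope of the covariances in Eq. (3.4).

```python
import numpy as np

def ols_slope(ts, values):
    """Least-squares slope in t of values[m] ~ a + slope * ts[m] (Eqs. 3.3 and 3.4).

    values may have shape (M, D) (means) or (M, D, D) (covariances);
    the slope has shape values.shape[1:].
    """
    ts = np.asarray(ts, float)
    values = np.asarray(values, float)
    tc = ts - ts.mean()                      # t_m - t_bar_M
    vc = values - values.mean(axis=0)        # m_hat_{t_m} - m_bar_M
    return np.tensordot(tc, vc, axes=(0, 0)) / np.sum(tc ** 2)
```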

3.3.3 Diffusion Coefficient of the Effective Dynamics and Local Slow Variables

Similarly, the local diffusivity \(\Lambda ^{{ l}}\) of the slow effective dynamics is estimated as the slope (in t) of \(C(\textbf{z}^{{l}}_t|\textbf{z}^{{l}}_0)\) in Eq. (3.1):

$$\begin{aligned} \hat{\Lambda }^{{l}}:= \frac{\sum _{m=1}^M(\hat{C}^{{l}}_{t_m} -\bar{C}^{{l}}_M)(t_m-\bar{t}_M)}{\sum _{m=1}^M(t_m-\bar{t}_M)^2}\,, \end{aligned}$$
(3.4)

where \(\bar{C}^{{l}}_M = \frac{1}{M}\sum _{m=1}^M\hat{C}^{{l}}_{t_m}\). While \(\hat{\Lambda }^{{l}}\) is typically not low-rank, for N large enough its top d singular values may be, with high probability (w.h.p.), well separated from the others, yielding an estimate of the intrinsic dimension of the invariant manifold (a dynamics-driven analogue of Multiscale SVD; Little et al. 2017); this is the case in our examples (see Figs. 7, 8, 9). We project \(\hat{\Lambda }^{{l}}\) onto the space of rank d matrices by truncated SVD:

$$\begin{aligned} {\hat{\Lambda }^{{l}}_d}:=\textrm{Proj}_{\textrm{rk}(d)} \hat{\Lambda }^{{l}}=\hat{U}^{{l,\textrm{slw}}}_{d} \hat{\Sigma }^{l}_{d}(\hat{U}^{{l,\textrm{slw}}}_{d})^T, \end{aligned}$$
(3.5)

where \(\textrm{Proj}_{\textrm{rk}(d)} \) denotes the projection onto rank d matrices (positive semidefinite in this case), \(\smash {\hat{U}^{{l,\textrm{slw}}}_{d} \in \mathbb {R}^{D\times d}}\) has orthonormal columns, and \(\hat{\Sigma }^{l}_{d}\in \mathbb {R}^{d\times d}\) is diagonal with the top d singular values of \(\hat{\Lambda }^{{l}}\). Let \(\hat{H}^{{l}}_d \!\!:=({\hat{\Lambda }^{{l}}_d})^\frac{1}{2}\) be the (positive semidefinite) square root of \({\hat{\Lambda }^{{l}}_d}\).
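A sketch of Eqs. (3.4) and (3.5), assuming the empirical covariances \(\hat{C}^{{l}}_{t_m}\) are stacked in an array of shape (M, D, D). As a design choice of this sketch, the rank-d truncation uses a symmetric eigendecomposition of the symmetrized slope and keeps the d largest eigenvalues; this coincides with the truncated SVD of Eq. (3.5) whenever \(\hat{\Lambda }^{{l}}\) is positive semidefinite, and it clips small negative eigenvalues that sampling noise may produce. Function names are illustrative.

```python
import numpy as np

def ols_slope(ts, values):
    """Least-squares slope in t along the first axis (Eq. 3.4 for covariances)."""
    ts = np.asarray(ts, float)
    values = np.asarray(values, float)
    tc = ts - ts.mean()
    return np.tensordot(tc, values - values.mean(axis=0), axes=(0, 0)) / np.sum(tc ** 2)

def truncated_diffusivity(ts, covs, d):
    """Lambda_hat (Eq. 3.4), its rank-d truncation (Eq. 3.5), U_d, and H_d = Lambda_d^{1/2}."""
    lam = ols_slope(ts, covs)
    lam = 0.5 * (lam + lam.T)                      # symmetrize against round-off
    evals, evecs = np.linalg.eigh(lam)             # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:d]              # indices of the top-d eigenpairs
    s_d = np.clip(evals[idx], 0.0, None)           # guard against tiny negative values
    U_d = evecs[:, idx]                            # D x d, orthonormal columns
    lam_d = U_d @ np.diag(s_d) @ U_d.T
    H_d = U_d @ np.diag(np.sqrt(s_d)) @ U_d.T      # positive semidefinite square root
    return lam, lam_d, U_d, H_d
```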

3.3.4 Covariance of the Fast Dynamics

While \({\hat{\Lambda }^{{l}}_d}\) suffices to estimate a local tangent plane to \( {\mathcal {M}}_{{\epsilon }}\), the affine projection \(P^{{ l}}\) of the fast dynamics onto that plane, consistent with the dynamics, requires more information, as it is typically not an orthogonal projection. To estimate the kernel of \(P^{{ l}}\), i.e., the set of directions “along which” the dynamics near \(\hat{\textbf{z}}^{{l}}\) should be projected, we first estimate the covariance matrix \(\Gamma ^{{l}}\) in Eq. (3.1) by

$$\begin{aligned} \hat{\Gamma }^{{l}}:=\bar{C}^{{l}}_M - \hat{\Lambda }^{{l}}\bar{t}_M\,, \end{aligned}$$
(3.6)

and then take the estimated fast directions to be the span of the \(D-d\) eigenvectors of \(\hat{\Gamma }^{{l}}\) with largest eigenvalues, which we collect as the orthonormal columns of a matrix \(\hat{V}^{{l,\textrm{fst}}}_{D-d} \). See Figs. 1 and 7 for the case of the pinched sphere system. Since \(\sigma _{D-d+1}(\Gamma ^{{l}})=0\) (see Eq. 3.1), we have \(\sigma _{D-d+1}(\hat{\Gamma }^{{l}})\ll \sigma _{D-d}(\hat{\Gamma }^{{l}})\) w.h.p., for N large enough. In practice, not all \(D-d\) directions need correspond to fast modes of significant amplitude, and we may truncate at the first \(d_f\le D-d\) significant eigenvectors (e.g., in the oscillating half-moon system below).
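Eq. (3.6) and the choice of fast directions take a few lines of NumPy. This is a sketch, assuming \(\hat{\Lambda }^{{l}}\) has already been estimated (as in Eq. 3.4) and that `n_fast` is either \(D-d\) or the truncation level \(d_f\); the function name is illustrative.

```python
import numpy as np

def fast_directions(ts, covs, lam_hat, n_fast):
    """Gamma_hat = C_bar_M - Lambda_hat * t_bar_M (Eq. 3.6) and its top n_fast eigenvectors."""
    ts = np.asarray(ts, float)
    gamma = np.asarray(covs, float).mean(axis=0) - lam_hat * ts.mean()
    gamma = 0.5 * (gamma + gamma.T)                # symmetrize against round-off
    evals, evecs = np.linalg.eigh(gamma)
    idx = np.argsort(evals)[::-1][:n_fast]         # largest eigenvalues first
    return gamma, evecs[:, idx], evals[idx]        # V_fst has orthonormal columns
```

The returned eigenvalues make it easy to inspect the gap between \(\sigma _{D-d}(\hat{\Gamma }^{{l}})\) and \(\sigma _{D-d+1}(\hat{\Gamma }^{{l}})\) before choosing `n_fast`.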

3.4 Construction of a Sketch of the Invariant Manifold \( {\mathcal {M}}_{{\epsilon }}\)

We now utilize the quantities estimated above to construct a sketch \( \hat{{\mathcal {M}}}_{{\epsilon }}\) of the invariant manifold, consisting of a set of portions of well-distributed affine approximate tangent planes.

3.4.1 Landmarks

The initial conditions \(\{\textbf{z}^{{l}}_0\}_{ l=1}^{{L}}\) of the bursts \(\{\mathcal {B}_ l\}_{ l=1}^{{L}}\) are not assumed to be on the unknown \( {\mathcal {M}}_{{\epsilon }}\), nor well-distributed on it. We construct a set of points, called landmarks, on our estimate of \( {\mathcal {M}}_{{\epsilon }}\). From Eq. (3.1), replacing the quantities involved by their empirical counterparts estimated above, for each \( l=1,\dots ,L\) we define the landmark \(\hat{\textbf{z}}^{{l}}\) as

$$\begin{aligned} \hat{\textbf{z}}^{{l}} := \bar{\textbf{m}}^{{l}}_M-\hat{\textbf{b}}^{{l}}\bar{t}_M\,. \end{aligned}$$
(3.7)
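Eq. (3.7) says that the landmark is the intercept at \(t=0\) of the fitted line \(\bar{\textbf{m}}^{{l}}_M + \hat{\textbf{b}}^{{l}}(t-\bar{t}_M)\), i.e., an estimate of \(\textbf{z}^{l,\text {slw}}_0\) in Eq. (3.1). A minimal sketch (the function name is hypothetical), which recomputes the slope of Eq. (3.3) so as to be self-contained:

```python
import numpy as np

def landmark(ts, means):
    """z_hat = m_bar_M - b_hat * t_bar_M (Eq. 3.7); also returns the drift b_hat (Eq. 3.3)."""
    ts = np.asarray(ts, float)
    means = np.asarray(means, float)
    tc = ts - ts.mean()
    b_hat = np.tensordot(tc, means - means.mean(axis=0), axes=(0, 0)) / np.sum(tc ** 2)
    return means.mean(axis=0) - b_hat * ts.mean(), b_hat
```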

3.4.2 Local Tangent Plane to \( {\mathcal {M}}_{{\epsilon }}\)

We obtain an approximate tangent space \(\hat{T}_{\hat{\textbf{z}}^{{l}}} {\mathcal {M}}_{{\epsilon }} \) at \(\hat{\textbf{z}}^{{l}}\):

$$\begin{aligned} \hat{T}_{\hat{\textbf{z}}^{{l}}} {\mathcal {M}}_{{\epsilon }} := \textrm{span}(\textrm{cols}(\hat{U}^{{l,\textrm{slw}}}_{d})). \end{aligned}$$
(3.8)

3.4.3 Dynamics-Driven Oblique Projections onto a Local Tangent Plane

We obtain the oblique affine projection \(\hat{P}^{{ l}} \) onto \(\hat{T}_{\hat{\textbf{z}}^{{l}}} {\mathcal {M}}_{{\epsilon }} \) centered at \(\hat{\textbf{z}}^{{l}}\):

$$\begin{aligned} \hat{P}^{{ l}} {({\textbf{z}})} :=\hat{U}^{{l,\textrm{slw}}}_{d} (\hat{U}^{{l,\textrm{slw}}}_{d})^T({\hat{E}^{l}}({\hat{E}^{l}})^T)^\dag (\textbf{z}-\hat{\textbf{z}}^{{l}})+\hat{\textbf{z}}^{{l}}\,, \end{aligned}$$
(3.9)

with kernel \(\textrm{span}(\textrm{cols}(\hat{V}^{{l,\textrm{fst}}}_{D-d}))\), where \({\hat{E}^{l}}:=[\hat{U}^{{l,\textrm{slw}}}_{d},\hat{V}^{{l,\textrm{fst}}}_{D-d} ]\) and \(\dag \) is the Moore–Penrose inverse.
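The following sketch builds the linear part of Eq. (3.9) with NumPy's Moore–Penrose pseudo-inverse `np.linalg.pinv`. On a small, made-up example in \(\mathbb {R}^3\) one can check the defining properties: fast directions are mapped to the landmark, tangent directions are left unchanged, and the map is idempotent but, in general, not symmetric (i.e., oblique). The name `oblique_projection` is ours.

```python
import numpy as np

def oblique_projection(U, V, z_l):
    """Affine projection of Eq. (3.9) onto z_l + span(U), along span(V).

    Returns the map P and its linear part A, so that P(z) = A (z - z_l) + z_l.
    """
    U, V, z_l = (np.asarray(a, float) for a in (U, V, z_l))
    E = np.hstack([U, V])                          # E = [U_slw, V_fst]
    A = U @ U.T @ np.linalg.pinv(E @ E.T)          # linear part of P_hat^l

    def P(z):
        return A @ (np.asarray(z, float) - z_l) + z_l

    return P, A
```

Note that the orthogonal projection would instead use \(\hat{U}\hat{U}^T\), which differs from `A` whenever the fast directions are not orthogonal to the tangent plane.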

3.4.4 A Dynamics-Adapted Metric

We introduce a quasi-metric adapted to the dynamics; then we discard “redundant” landmarks that are too close to others in order to create a well-distributed set of landmarks, which, together with the approximate tangent planes estimated above, gives a parsimonious sketch of the estimated invariant manifold \( \hat{{\mathcal {M}}}_{{\epsilon }}\).

Consider the local Mahalanobis similarity based on the quadratic form associated with the effective diffusivity \({\hat{\Lambda }^{{l}}_d}\)

$$\begin{aligned} \hat{\tilde{\rho }} ^2(\textbf{z}, \hat{\textbf{z}}^{{l}}):= \frac{1}{\chi ^2_d(p) }{ (\hat{P}^{{ l}} {({\textbf{z}})} - \hat{\textbf{z}}^{{l}})^T ({\hat{\Lambda }^{{l}}_d})^\dag (\hat{P}^{{ l}} {({\textbf{z}})} - \hat{\textbf{z}}^{{l}})}, \end{aligned}$$
(3.10)

for \(\textbf{z}\) such that \(\Vert \textbf{z}-\hat{\textbf{z}}^{{l}}\Vert \lesssim R\sqrt{\tau }\); otherwise we set \(\hat{\tilde{\rho }} (\textbf{z}, \hat{\textbf{z}}^{{l}})=R\sqrt{\tau }\). In practice, we set \(R=10\). Here \(\chi ^2_{d}(p)\) is the quantile at level p of the \(\chi ^2\) distribution with d degrees of freedom (we set \(p=0.95\) throughout).

Unlike the Euclidean distance, \(\hat{\tilde{\rho }} \) accounts for the anisotropy of the dynamics on \( \hat{{\mathcal {M}}}_{{\epsilon }}\); a similar distance, without the oblique projection, was used with different objectives, for example, in Coifman et al. (2005) for manifold learning, and in Singer et al. (2009) in the context of dynamical systems. We symmetrize \(\hat{\tilde{\rho }} \) on \( \hat{{\mathcal {M}}}_{{\epsilon }} \) by letting \(\hat{\rho } (\hat{\textbf{z}}^{{l'}}, \hat{\textbf{z}}^{{l}}):=\max \{\hat{\tilde{\rho }} (\hat{\textbf{z}}^{{l'}}, \hat{\textbf{z}}^{{l}}),\hat{\tilde{\rho }} (\hat{\textbf{z}}^{{l}}, \hat{\textbf{z}}^{{l'}})\}\). A \(\sqrt{t}\)-neighborhood of \(\hat{\textbf{z}}^{{l}}\) is defined as \(B(\hat{\textbf{z}}^{{l}}, \sqrt{t}):= \{\textbf{z}\in T_{\hat{\textbf{z}}^{{l}}}{\mathcal {M}}_{\epsilon }: \hat{\tilde{\rho }} (\textbf{z}, \hat{\textbf{z}}^{{l}})< \sqrt{t}\}\): it approximates the set of points reachable from \(\hat{\textbf{z}}^{{l}}\) in time \(\approx t\) with probability at least p. This distance disregards the drift term; this choice reduces the asymmetry in the definition of \(\hat{\tilde{\rho }} \) and in the quasi-metric property, and is reasonable for diffusion-dominated dynamics (see Appendix A.1). Figure 1 (4th inset) visualizes the ellipsoids corresponding to the quadratic form induced by \({\hat{\Lambda }^{{l}}_d}\) for the pinched sphere system.
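A sketch of Eq. (3.10) and of the symmetrized \(\hat{\rho }\), assuming each chart is stored as a tuple (landmark, linear part of \(\hat{P}^{{ l}}\), pseudo-inverse of \({\hat{\Lambda }^{{l}}_d}\)); this storage format and the function names are our own. The quantile \(\chi ^2_d(p)\) is passed in as a number: with SciPy available, `scipy.stats.chi2.ppf(0.95, d)` returns it (about 3.8415 for \(d=1\)).

```python
import numpy as np

def rho_tilde(z, z_l, A_l, lam_d_pinv, chi2_q, tau, R=10.0):
    """Dynamics-adapted quasi-distance of Eq. (3.10), capped at R*sqrt(tau) far from z_l.

    A_l: linear part of the oblique projection P_hat^l; lam_d_pinv: pseudo-inverse
    of Lambda_hat_d^l; chi2_q: the chi-square quantile chi2_d(p).
    """
    dz = np.asarray(z, float) - z_l
    if np.linalg.norm(dz) > R * np.sqrt(tau):
        return R * np.sqrt(tau)
    x = A_l @ dz                                   # P_hat^l(z) - z_hat^l
    return float(np.sqrt(max(x @ lam_d_pinv @ x, 0.0) / chi2_q))

def rho_hat(chart_a, chart_b, chi2_q, tau, R=10.0):
    """Symmetrized distance between two landmarks: max of the two one-sided values."""
    za, Aa, La = chart_a
    zb, Ab, Lb = chart_b
    return max(rho_tilde(za, zb, Ab, Lb, chi2_q, tau, R),
               rho_tilde(zb, za, Aa, La, chi2_q, tau, R))
```

With this normalization, a point z belongs to the neighborhood \(B(\hat{\textbf{z}}^{{l}}, \sqrt{t})\) exactly when `rho_tilde(z, ...) < np.sqrt(t)`.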

3.4.5 A Well-Distributed Net of Landmarks

We now reduce the number of landmarks to a near-minimal number that, together with their corresponding neighborhoods of radius \({{O}}(\sqrt{\tau })\), still covers the estimated invariant manifold \( \hat{{\mathcal {M}}}_{{\epsilon }}\), by discarding landmarks that are too close to each other. Before this process, we assume that the collection of \(\sqrt{\tau /2}\)-neighborhoods of the landmarks \(\{\hat{\textbf{z}}^{{l}}\}_{l=1}^L\) covers the invariant manifold \( \hat{{\mathcal {M}}}_{{\epsilon }}\). While this typically cannot be ascertained in practice, by default ATLAS is run in “exploration mode”, which augments \( \hat{{\mathcal {M}}}_{{\epsilon }}\) on the fly (see Sect. 4); in that case our arguments here apply to the current \( \hat{{\mathcal {M}}}_{{\epsilon }}\) during exploration. For \(l=1,2,\ldots \), we remove every \(\hat{\textbf{z}}^{{l'}}\) with \(l'>l\) and \(\hat{\rho } (\hat{\textbf{z}}^{{l}}, \hat{\textbf{z}}^{{l'}})\le (1-{1}/{\sqrt{2}})\hat{\kappa }\sqrt{\tau }\), where \(\hat{\kappa }\) is a scaling constant; in practice, we set \(\hat{\kappa }=1\). When this procedure terminates, we are left with \(L'\le L\) landmarks (unused landmarks are discarded). We assume, without loss of generality, that \(\smash {\{\hat{\textbf{z}}^{{l}}\}_{ l=1}^{{L'}}}\) is the reduced collection of landmarks and, to simplify the notation, we let \(L'=L\) in what follows. Under suitable assumptions, these landmarks (1) are well separated: for \( l\ne l'\), \(\hat{\rho } (\hat{\textbf{z}}^{{ l'}}, \hat{\textbf{z}}^{{l}}) \gtrsim \sqrt{\tau }\); (2) provide an \({{O}}(\hat{\kappa }\sqrt{\tau })\)-cover of \( {\mathcal {M}}_{{\epsilon }}\), in the sense that for any \(\textbf{z}\in \hat{{\mathcal {M}}}_{{\epsilon }} \) there exists l such that \(\hat{\tilde{\rho }} (\textbf{z}, \hat{\textbf{z}}^{{l}})\lesssim \sqrt{\tau }\); (3) are well distributed, in the sense that for any \(\hat{\textbf{z}}^{{l}}\in \mathcal {T}\), where \(\mathcal {T}\) is any connected component of the invariant manifold \( {\mathcal {M}}_{{\epsilon }}\) satisfying suitable constraints, there exists \( l'\) such that \(\hat{\rho } (\hat{\textbf{z}}^{{ l'}}, \hat{\textbf{z}}^{{l}})\lesssim \sqrt{\tau }\). The constants \(R\) and \(\hat{\kappa }\) used above could be made explicit, but they depend on quantities typically unknown in practice, such as the curvature of \( {\mathcal {M}}_{{\epsilon }} \).

These steps define a collection of charts, each centered at one of the landmarks \(\hat{\textbf{z}}^{{l}}\), with an associated oblique projection \(\hat{P}^{{ l}} \) with range \(\hat{T}_{\hat{\textbf{z}}^{{l}}} {\mathcal {M}}_{{\epsilon }} \), an effective drift \(\hat{\textbf{b}}^{{l}}\in \mathbb {R}^D\) and an effective diffusivity \({\hat{\Lambda }^{{l}}_d}\in \hat{T}_{\hat{\textbf{z}}^{{l}}} {\mathcal {M}}_{{\epsilon }} \). The reader may wish to revisit Figs. 1, 2, 4 and 7, which visualize these objects.
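The greedy pruning of Sect. 3.4.5 operates only on pairwise \(\hat{\rho }\) values. A minimal sketch, assuming these are precomputed in an \(L\times L\) matrix; as an interpretive choice of this sketch, a landmark that has already been removed does not remove others. The name `prune_landmarks` is hypothetical.

```python
import numpy as np

def prune_landmarks(rho, kappa=1.0, tau=1.0):
    """Greedy net: keep landmark l, drop each later l' with rho[l, l'] <= (1 - 1/sqrt(2)) kappa sqrt(tau).

    rho: L x L symmetric matrix of rho_hat distances. Returns indices of kept landmarks.
    """
    rho = np.asarray(rho, float)
    thr = (1.0 - 1.0 / np.sqrt(2.0)) * kappa * np.sqrt(tau)
    alive = np.ones(len(rho), dtype=bool)
    for l in range(len(rho)):
        if not alive[l]:
            continue                               # a discarded landmark removes nothing
        later = np.arange(l + 1, len(rho))
        alive[later[rho[l, later] <= thr]] = False
    return np.flatnonzero(alive)
```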

3.5 The ATLAS Process and Simulator for the Reduced Effective (Slow) Dynamics

The final step is to smoothly connect both the geometric and dynamic objects estimated so far at the landmarks, in order to obtain a smooth effective invariant manifold \( \hat{{\mathcal {M}}}_{{\epsilon }}\) and an Itô process constrained on it, with the fast modes averaged at the timescale \(\tau \), together with a numerical scheme for its simulation. This smoothing is achieved by a weighted average: for \(\textbf{z}\in \mathbb {R}^D\), we let \(w^{{ l}}(\textbf{z}):=\exp (-\hat{\tilde{\rho }} (\textbf{z}, \hat{\textbf{z}}^{{l}})/\sqrt{\tau })\) and \(\mathcal {N}_\tau ^{\mathcal {A}}(\textbf{z}):=\{\hat{\textbf{z}}^{{l}}\,:\, \hat{\tilde{\rho }} (\textbf{z},\hat{\textbf{z}}^{{l}})\le 2\hat{\kappa }C\sqrt{\tau }\}\), \(Z(\textbf{z}):=\sum _{ l\in \mathcal {N}_\tau ^{\mathcal {A}}(\textbf{z})}w^{{ l}}(\textbf{z})\), and then define:

$$\begin{aligned} \hat{P}^{\mathcal {A}}(\textbf{z}):= & {} \frac{1}{Z(\textbf{z})}\sum \nolimits _{l\in \mathcal {N}_\tau ^{\mathcal {A}}(\textbf{z})} \hat{P}^{{ l}} {({\textbf{z}})} w^{{l}}(\textbf{z}), \end{aligned}$$
(3.11)
$$\begin{aligned} \hat{\textbf{b}}^{\mathcal {A}}(\textbf{z}):= & {} \frac{1}{Z(\textbf{z})}\sum \nolimits _{l\in \mathcal {N}_\tau ^{\mathcal {A}}(\textbf{z})} \hat{\textbf{b}}^{{l}}w^{{l}}(\textbf{z}), \end{aligned}$$
(3.12)
$$\begin{aligned} \hat{\Lambda }^{\mathcal {A}}(\textbf{z}):= & {} \frac{1}{Z(\textbf{z})}\sum \nolimits _{l\in \mathcal {N}_\tau ^{\mathcal {A}}(\textbf{z})} {\hat{\Lambda }^{{l}}_d}w^{{l}}(\textbf{z}), \end{aligned}$$
(3.13)
$$\begin{aligned} {\hat{H}^{\mathcal {A}}} _d(\textbf{z}):= & {} (\textrm{Proj}_{\textrm{rk}(d)} \hat{\Lambda }^{\mathcal {A}}(\textbf{z}))^\frac{1}{2}. \end{aligned}$$
(3.14)

This defines the ATLAS stochastic process \({\textbf{z}_t^{\mathcal {A}}}\), on the estimated invariant manifold \( \hat{{\mathcal {M}}}_{{\epsilon }}:=\hat{P}^{\mathcal {A}}(\mathbb {R}^D),\) as the Itô diffusion with drift \(\hat{\textbf{b}}^{\mathcal {A}}\) and diffusion coefficient \({\hat{H}^{\mathcal {A}}} _d\). To simulate this process we use the Euler–Maruyama scheme with re-projection on \( \hat{{\mathcal {M}}}_{{\epsilon }} \), with time-step \(\lambda \tau \) and \(\Delta W_{\lambda \tau }\sim {\mathcal {N}}(0, \lambda \tau I_d)\):

$$\begin{aligned} \textbf{z}^{\mathcal {A}}_{t+\lambda \tau } = \hat{P}^{\mathcal {A}}\big ({\textbf{z}_t^{\mathcal {A}}}+ \hat{\textbf{b}}^{\mathcal {A}}({\textbf{z}_t^{\mathcal {A}}}) \lambda \tau + {\hat{H}^{\mathcal {A}}} _d({\textbf{z}_t^{\mathcal {A}}})\Delta W_{\lambda \tau }\big ), \end{aligned}$$
(3.15)

In all our experiments, \(\lambda =1\), i.e., the time-step of the ATLAS simulator is equal to the timescale \(\tau \); in particular, it is independent of, and may be much larger than, the time-step \(\delta t\) of the original black-box simulator, which is typically \(\lesssim \epsilon \).
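Eqs. (3.11)–(3.15) combine into a short simulation loop. The sketch below rests on several assumptions of ours: charts are dicts holding a callable projection, a drift vector and a rank-d diffusivity; `cutoff` stands in for \(2\hat{\kappa }C\sqrt{\tau }\); and, instead of the \(D\times D\) square root in Eq. (3.14), it returns a \(D\times d\) factor F with \(FF^T=\textrm{Proj}_{\textrm{rk}(d)}\hat{\Lambda }^{\mathcal {A}}\), so that \(F\Delta W_{\lambda \tau }\) with \(\Delta W_{\lambda \tau }\sim {\mathcal {N}}(0,\lambda \tau I_d)\) has covariance \(\lambda \tau FF^T\), as the noise term in Eq. (3.15) should. The final lines run it on a toy manifold (the unit circle), not one of the paper's examples.

```python
import numpy as np

def atlas_coefficients(z, charts, rho_fn, tau, cutoff, d):
    """Weighted averages of Eqs. (3.11)-(3.13) at z, plus a rank-d factor of the diffusivity.

    charts: dicts with keys 'P' (callable P_hat^l), 'b' (drift) and 'Lam' (Lambda_hat_d^l);
    rho_fn(z, chart) returns rho_tilde(z, z_hat^l).
    """
    rhos = np.array([rho_fn(z, c) for c in charts])
    nbrs = np.flatnonzero(rhos <= cutoff)             # neighborhood N_tau^A(z)
    if nbrs.size == 0:
        raise ValueError("z lies outside the current atlas")
    w = np.exp(-rhos[nbrs] / np.sqrt(tau))
    w /= w.sum()                                      # normalize by Z(z)
    P = sum(wi * charts[i]["P"](z) for wi, i in zip(w, nbrs))
    b = sum(wi * charts[i]["b"] for wi, i in zip(w, nbrs))
    Lam = sum(wi * charts[i]["Lam"] for wi, i in zip(w, nbrs))
    evals, evecs = np.linalg.eigh(0.5 * (Lam + Lam.T))
    idx = np.argsort(evals)[::-1][:d]                 # rank-d truncation, as in Eq. (3.14)
    F = evecs[:, idx] * np.sqrt(np.clip(evals[idx], 0.0, None))  # D x d, F F^T = Proj_rk(d) Lam
    return P, b, Lam, F

def atlas_path(z0, P, b, F, tau, n_steps, rng, lam=1.0):
    """Euler-Maruyama with re-projection, Eq. (3.15), with time-step h = lam * tau."""
    h = lam * tau
    z = np.asarray(z0, float)
    path = [z]
    for _ in range(n_steps):
        Fz = F(z)
        dW = rng.normal(0.0, np.sqrt(h), size=Fz.shape[1])   # Delta W ~ N(0, h I_d)
        z = P(z + b(z) * h + Fz @ dW)                        # step, then re-project
        path.append(z)
    return np.array(path)

# Usage on a toy manifold (the unit circle in R^2), not one of the paper's examples:
rng = np.random.default_rng(3)
circle = atlas_path([1.0, 0.0], lambda z: z / np.linalg.norm(z), lambda z: np.zeros(2),
                    lambda z: 0.3 * np.array([[-z[1]], [z[0]]]), tau=0.1,
                    n_steps=200, rng=rng)
```

Raising an error when the neighborhood is empty is a design choice of this sketch: it is exactly the situation in which exploration mode (Sect. 4) would launch a new burst.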

3.6 Refinements to the Estimation Procedure

When the initial conditions for the bursts are far away from the invariant manifold \( {\mathcal {M}}_{{\epsilon }}\) (e.g., in the oscillating half-moon example below), or the timescale separation in the original system is not very large (e.g., in the butane example below), it may take longer than \(\tau \) to relax onto \( {\mathcal {M}}_{{\epsilon }}\). In these cases we perform multiple rounds of the estimation phase, starting each round with initial conditions for the bursts given by the landmarks estimated in the previous round. This refinement process stops when the relative differences of the estimated parameters are within 5% (in our examples, this is achieved in fewer than 10 rounds).

One further step of refinement, after the above ones, may be needed when the linear approximation of the geometry or of the dynamics is not locally accurate, because \( {\mathcal {M}}_{{\epsilon }}\) has large curvature or the effective drift term has a large gradient (e.g., in the oscillating half-moons and butane examples below). In this case, since the estimated drift term is computed over the time window \([\tau _{\min {}},\tau _{\max {}}]\), the landmarks in the final stage are refined to be \(\hat{\textbf{z}}^{{l}}:= \bar{\textbf{m}}^{{l}}_M-\hat{\textbf{b}}^{{l}}(\bar{t}_M - \tau _{\min {}})\), instead of Eq. (3.7).
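A sketch of the multi-round refinement loop and of the final-stage landmark formula. The callable `estimate`, which would run bursts from given initial conditions and return the estimated parameters (including the new landmarks under the key `"landmark"`), is hypothetical, and reading “relative differences within 5%” as a per-parameter relative change in Euclidean norm is our assumption.

```python
import numpy as np

def refine_rounds(estimate, z_init, rtol=0.05, max_rounds=10):
    """Repeat the estimation, restarting bursts at the previous round's landmarks.

    estimate(z) -> dict of numpy arrays (hypothetical), including 'landmark'.
    Stops once every parameter changes by at most rtol in relative norm.
    """
    params = estimate(np.asarray(z_init, float))
    for rounds in range(2, max_rounds + 1):
        new = estimate(params["landmark"])
        rel = max(np.linalg.norm(new[k] - params[k]) / max(np.linalg.norm(params[k]), 1e-12)
                  for k in params)
        params = new
        if rel <= rtol:
            return params, rounds
    return params, max_rounds

def refined_landmark(m_bar, b_hat, t_bar, tau_min):
    """Final-stage landmark m_bar_M - b_hat (t_bar_M - tau_min), used instead of Eq. (3.7)."""
    return np.asarray(m_bar, float) - np.asarray(b_hat, float) * (t_bar - tau_min)
```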

4 ATLAS in Exploration Mode

In the construction of ATLAS presented above, initial conditions for the bursts of simulation were sampled from a provided measure \(\mu _0\) on the state space. Ideally, such a measure is well distributed on \( {\mathcal {M}}_{{\epsilon }}\), e.g., close to the stationary distribution of the effective reduced process, or at least such that the local means \(\bar{\textbf{m}}^{{l}}_M\) of bursts started i.i.d. from \(\mu _0\) are well distributed on \( {\mathcal {M}}_{{\epsilon }}\). However, this is too much to hope for in many practical situations, and it is highly desirable to be able to handle more general \(\mu _0\)’s.

There are at least two, not mutually exclusive, ways of proceeding. The first one is to use any of many existing techniques aimed at efficiently sampling the effective state space, to obtain a \(\mu _0\). The literature on this subject is vast (see, among many others, Chiavazzo et al. 2017; Frewen et al. 2009b; Tribello et al. 2014; Zheng et al. 2013; Chen et al. 2015). These techniques often bias the dynamics to ensure rapid exploration, yielding samples with good coverage of the effective state space. In some remarkable cases the samples from this biased process allow one to recover statistics of the original dynamics, e.g., the stationary distribution in the case of MCMC. However, even when the stationary distribution is recovered, the biased dynamics often does not preserve other dynamical properties, such as mean residence times or transition rates, which are important in applications. Yet other techniques in this broad family require a target statistic to be specified, and are then designed to estimate that statistic accurately, but in general no others. There is typically a strong tension between speeding up exploration and being able to correct the biased sampling to obtain consistent estimates of the statistics of interest. Crucially, this tension does not arise in ATLAS: \(\mu _0\) needs to have good coverage, but need not bear any relationship to the dynamics; it is only used as a starting point for ATLAS, which then consistently estimates the effective dynamics, and from it the stationary distribution and many other dynamical properties, such as mean residence times and transition rates, as demonstrated in the numerical experiments in Sect. 7.

The second way is to extend ATLAS to run in exploration mode: after a first round of learning starting from a \(\mu _0\) with poor coverage, ATLAS runs trajectories until they exit the current, partial estimate of the invariant manifold, and updates itself, accurately but efficiently, by simulating new paths of the original process at those new exit locations. In detail, suppose we have constructed ATLAS \(\mathcal {A}_1\) from a set of bursts \(\{\mathcal {B}^l\}_{l=1}^{{L}}\), thereby obtaining the process \(({\textbf{z}_t^{\mathcal {A}_{1}}})_{t\ge 0}\) on \( \hat{{\mathcal {M}}}_{{\epsilon }}^{1}\), perhaps from a small number L (even \(L=1\)) of initial conditions that are poorly distributed in the state space (e.g., supported in one or a few metastable states). While simulating \(({\textbf{z}_t^{\mathcal {A}_{1}}})_{t\ge 0}\), ATLAS checks whether the distance between \({\textbf{z}_t^{\mathcal {A}_{1}}}\) and its closest landmark, in the \(\hat{\tilde{\rho }}\) “metric”, is larger than some threshold \(d_{\text {thr}}={{O}}(\sqrt{\tau })\): if this is the case, then \({\textbf{z}_t^{\mathcal {A}_{1}}}\) is beyond the current “domain of expertise” of ATLAS \(\mathcal {A}_1\). We then stop that process, run a new burst \(\mathcal {B}_*\) of simulations from this “exit point” \({\textbf{z}_t^{\mathcal {A}_{1}}}\), estimate the local quantities of the dynamics there, and add a new landmark \(\hat{\textbf{z}}^{{L+1}}\), with its associated tangent plane, projection, and estimated effective drift and diffusion coefficients. This local and efficient update yields a new ATLAS process \(({\textbf{z}_t^{\mathcal {A}_{2}}})_{t\ge 0}\) on an “enlarged” \( \hat{{\mathcal {M}}}_{{\epsilon }}^{2}\).
This procedure is repeated, creating ATLAS processes that capture ever-increasing approximations \( \hat{{\mathcal {M}}}_{{\epsilon }}^{1} \subset \dots \subset \hat{{\mathcal {M}}}_{{\epsilon }}^{k} \subset \dots \) of \( {\mathcal {M}}_{{\epsilon }}\), discovering rarer and rarer events, until a given computational budget (for example, expressed in terms of the total number of bursts used, the most expensive component of the construction) is exhausted, or long-enough trajectories are simulated with \(({\textbf{z}_t^{\mathcal {A}_{k}}})_{t\ge 0}\) without leaving \( \hat{{\mathcal {M}}}_{{\epsilon }}^{k}\). See details in Appendix B. All our numerical experiments are performed with ATLAS in exploration mode.
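A schematic of the exploration loop described above. The ATLAS simulator step, the nearest-landmark distance and the chart-building burst are all passed in as hypothetical callables, and the budget is expressed as a maximum number of new bursts, one of the options mentioned above; this is a sketch of the control flow, not the authors' implementation.

```python
import numpy as np

def explore(atlas_step, nearest_distance, add_chart, z0, d_thr, n_steps, max_bursts):
    """Exploration mode: simulate the current ATLAS and grow it at exit points.

    atlas_step(z): one step of the current ATLAS simulator (hypothetical callable).
    nearest_distance(z): rho_tilde distance from z to its closest landmark.
    add_chart(z): runs a new burst at z and adds the resulting chart to the atlas.
    """
    z = np.asarray(z0, float)
    path, bursts = [z], 0
    for _ in range(n_steps):
        if nearest_distance(z) > d_thr:   # z left the current "domain of expertise"
            if bursts >= max_bursts:      # computational budget exhausted
                break
            add_chart(z)                  # new landmark, tangent plane, drift, diffusivity
            bursts += 1
        z = atlas_step(z)                 # continue with the updated ATLAS
        path.append(z)
    return np.array(path), bursts
```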

We provide a pertinent visualization in step 4 of Fig. 1: at stage k of exploration, we represent \( \hat{{\mathcal {M}}}_{{\epsilon }}^{k}\) and a trajectory of \(({\textbf{z}_t^{\mathcal {A}_{k}}})_{t\ge 0}\) which at some point leaves \( \hat{{\mathcal {M}}}_{{\epsilon }}^{k}\) (orange dot in the figure). The current ATLAS \(\mathcal {A}_k\) stops there, obtains a new burst of paths starting at that location, extracts local estimates of the geometric and dynamical quantities needed, obtains \( \hat{{\mathcal {M}}}_{{\epsilon }}^{k+1}\), and then continues. In the figure, the path is linearly interpolated between the points \((\mathbf{z^{\mathcal {A}}}(p\lambda \tau ))_p\); note that, as expected, the ATLAS time-step \(\lambda \tau \) (here, and in the other examples, \(\lambda =1\)) leads to steps of length comparable to that of the axes of the diffusion ellipsoids representing the level set \(\tau \) of the quadratic form induced by \({\hat{\Lambda }^{{l}}_d}\).

Note that these additions happen during ATLAS simulations (with their large \(\lambda \tau \) time-steps), taking advantage of the computational speed-up achieved by the ATLAS simulator. Therefore, in real clock time, the regions of \( {\mathcal {M}}_{{\epsilon }}\) that are rarely visited by the original simulator are more likely to be sampled quickly in ATLAS exploration mode. Note how this differs from existing techniques based on ideas of importance sampling: at no point is the dynamics of ATLAS biased, and at every stage the dynamics is consistent with the underlying effective dynamics, by construction. This procedure can be very effective, measured in real clock time, at discovering relatively rare events, such as transitions between metastable states, while updating ATLAS seamlessly. We also note that this procedure is parallelizable across multiple paths of the current ATLAS, provided one checks that the regions being added are far enough from each other to avoid simulating bursts at nearby locations: such “conflicts” are likely rare in high-dimensional state spaces, at least until ATLAS has explored the vast majority of the state space.

Finally, it is certainly possible to apply a multitude of techniques and heuristics (e.g., see Chiavazzo et al. 2017 and references therein) to bias the ATLAS simulator itself during exploration, i.e., to combine the two approaches described in this section: once new regions are explored with a biased ATLAS, and charts are created (with new local bursts initialized in those regions) and incorporated into ATLAS, the ATLAS simulator can be run in unbiased mode and will be consistent with the effective dynamics. This decoupling of exploration and consistent estimation of the dynamics is a crucial property of ATLAS, and it is very efficient, as the information from the expensive simulations of bursts of trajectories is fully reused. This strategy, together with parallel learning across multiple paths, is particularly helpful when the effective dynamics has a large number of metastable states.

5 Properties of ATLAS

5.1 Avoiding the Curse of Dimension

The input to ATLAS is random: both the L initial conditions and the N paths in each burst are random. It is natural to ask how many short trajectories per burst are needed to ensure that the random error of all the local quantities estimated by ATLAS is small w.h.p. In particular, it is important to assess how N should scale with the dimensions D of the state space, d of \( {\mathcal {M}}_{{\epsilon }}\), and \(d_f\) of the fast modes with large magnitude. Using concentration inequalities for high-dimensional vectors and matrices that are concentrated near low-dimensional spaces (Vershynin 2018), it is possible to show that the approximation error of the empirical estimates \(\hat{\textbf{b}}^{{l}}\), \({\hat{\Lambda }^{{l}}_d}\) and \(\hat{\textbf{z}}^{{l}}\) is smaller than \(\eta \), with high probability, as soon as \(N \gtrsim {d(d+d_f)}/{\eta ^2}\).

The approximation error of the direction of the fast modes appears to require a larger sample size: e.g., \(N \gtrsim {(d+d_f)^2\ln {D}}/{\eta ^2}\) samples appear to be needed to obtain, w.h.p., \(\Vert \sin \Theta (\hat{V}^{{l,\textrm{fst}}}_{D-d}, V^{{l,\textrm{fst}}}_{D-d})\Vert _{\textrm{F}}\le \eta \). It is worth noting, though, that this still depends only very weakly (logarithmically) on D and only quadratically on \(d+d_f\), and it is still quite benign as soon as \(d_f\ll D\).
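The error metric \(\Vert \sin \Theta (\cdot ,\cdot )\Vert _{\textrm{F}}\) can be computed from principal angles: the cosines of the principal angles between two subspaces are the singular values of \(Q_1^TQ_2\), where \(Q_1, Q_2\) are orthonormal bases of the two subspaces. A minimal NumPy sketch (the function name is ours), assuming both inputs have the same number of columns:

```python
import numpy as np

def sin_theta_fro(V_hat, V):
    """Frobenius norm of sin Theta between the column spans of two D x k matrices."""
    Q1, _ = np.linalg.qr(np.asarray(V_hat, float))    # orthonormal basis of span(V_hat)
    Q2, _ = np.linalg.qr(np.asarray(V, float))        # orthonormal basis of span(V)
    cos = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), 0.0, 1.0)
    return float(np.sqrt(np.sum(1.0 - cos ** 2)))
```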

To summarize, there is no curse of dimensionality (i.e., no requirement of a number of samples exponential in D) when estimating the local quantities above.

5.2 Robustness to Model Error and Nonlinearities in the Fast Modes

The type of stochastic systems for which ATLAS is expected to perform well has been described in Sect. 2. In particular, we assumed the local existence of an invariant manifold in the observed space, on which the slow dynamics \((\textbf{z}^{\text {slw}}_t)_{t\ge 0}\) takes place, and such that the fast dynamics \((\textbf{z}^{\text {fst}}_t)_{0\le t\le \tau }\) conditioned on \(\textbf{z}^{\text {slw}}_t=\textbf{z}_0\) is approximately an Ornstein–Uhlenbeck (O-U) process on the subspace \(\smash {\mathbb {V}^{\text {fst}}_{\textbf{z}_0}}\). In the latent space, the SDEs are those in Eq. (2.1), and one then locally linearizes the observation map \(\varphi \) (or, rather, \(\varphi _\alpha \), as in Sect. 2) to obtain approximate local equations in the \(\textbf{z}\) variables. ATLAS averages the observed process \((\textbf{z}_t)_{t\ge 0}\) at timescale \(\tau \) to obtain the reduced effective dynamics on \( \hat{{\mathcal {M}}}_{{\epsilon }}\).

ATLAS is quite robust to violations of these assumptions. One reason is that ATLAS mainly uses information at timescale \(\tau \): details of the original process \(\textbf{z}_t\) below that timescale, such as nonlinearities or lack of regularity, are averaged out, possibly leading to effective processes at that timescale that are amenable to approximation by ATLAS. In a forthcoming work, we prove results in this direction, under technical conditions on the coefficients \(f, g, F, G\) of the SDEs Eq. (2.1), on the regularity of the map \(\varphi \), and on the stationary measure \(\nu ({\varvec{\xi }}|\textbf{z}^{\text {slw}}_t=\textbf{z}_0)\) of the (fast) displacement process \({\varvec{\xi }}\) conditioned on \(\textbf{z}_0\).

This robustness is reflected in the results for the second and third examples in Sect. 7, both of which are significant perturbations of the basic model. In the “oscillating half-moons” example, a high-dimensional analogue of an example considered in Singer et al. (2009) and Dsilva et al. (2016), the fast displacement process \({\varvec{\xi }}\) is nonlinear and is not constrained to an affine subspace, but lies on a curved, “half-moon”-shaped manifold. In the butane model, the fast mode has both large and small nonlinear components; its slow manifold is also highly curved, and the effective drift has a large gradient. Nevertheless, ATLAS provides accurate estimates of the behavior of these systems, both at timescale \(\tau \) and at very large timescales, including the stationary distribution, mean residence times and transition rates between metastable states.

5.3 Computational Complexity and Simulation Speed-Up

The input data to ATLAS are the bursts of trajectories \(\{\mathcal {B}_ l\}_{l=1}^L\), of time length \(\approx \tau \). The cost of obtaining one time-step from the black-box simulator is at least of order \(D^2\), and the time-step \(\delta t\) of the simulator needs to be \(\lesssim \epsilon \) in order to resolve the fast modes. The total number of short paths collected is #landmarks\(\times \)#paths per landmark\(=L\times N\). Therefore the total computational cost of obtaining the bursts is at least \({{O}}(\frac{\tau }{\epsilon }D^2 LN/c)\), where c is the number of parallel cores. Constructing ATLAS requires \({{O}}(D^2dN)\) calculations to estimate local means, covariances, effective drift, effective diffusion coefficients, landmarks and tangent planes, and \({{O}}(C^d D L \log L)\) calculations to construct and organize the landmarks using, for example, cover trees (Beygelzimer et al. 2006). A time-step of the ATLAS simulator as in Eq. (3.15), which has time length \(\lambda \tau \), has cost \({{O}}(C^d Dd^2)\) by using an iterative SVD combining Eqs. (3.13) and (3.14), where \(C^d\) corresponds to the number of landmarks in \(\mathcal {N}_\tau ^{\mathcal {A}}(\textbf{z})\). Therefore, simulating a path of time length T has computational cost \({{O}}(D^2T/\epsilon )\) with the original simulator, and \({{O}}(C^d Dd^2 T/\tau )\) with ATLAS, i.e., a speed-up by a factor of order \(D\tau /(C^d d^2\epsilon )\). This is a dramatic speed-up when \(\epsilon \ll \tau \) and \(d\ll D\), which is especially valuable when many long paths are needed to estimate dynamical quantities of interest.
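As a quick sanity check of the cost comparison above, the sketch below evaluates the two leading-order costs for a path of time length T. All constants hidden in the \({{O}}(\cdot )\) terms are set to 1, and the numerical values of D, d, C, \(\epsilon \), \(\tau \) and T are hypothetical, chosen only for illustration.

```python
def simulation_costs(D, d, C, eps, tau, T):
    """Leading-order costs (all O(.) constants set to 1) of a path of length T."""
    cost_original = D**2 * T / eps          # ~T/eps steps, each costing ~D^2
    cost_atlas = C**d * D * d**2 * T / tau  # ~T/tau steps, each costing ~C^d D d^2
    return cost_original, cost_atlas, cost_original / cost_atlas


# Hypothetical values: D = 1000, d = 2, C = 4, eps = 1e-3, tau = 0.1, T = 1e4.
orig, atlas, speedup = simulation_costs(D=1000, d=2, C=4, eps=1e-3, tau=0.1, T=1e4)
print(f"original ~ {orig:.1e}, ATLAS ~ {atlas:.1e}, speed-up ~ {speedup:.3g}")
```

For these values the ratio is about \(1.6\times 10^3\), which equals \(D\tau /(C^d d^2\epsilon )\); note that it is independent of T, so the gain per path is the same however long the paths are.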

6 Applications of ATLAS

6.1 Estimation of Large-Time Dynamical Properties

ATLAS may be used to simulate long paths efficiently, and therefore estimate important properties of the system, such as its stationary distribution, residence times from certain regions of state space (e.g., metastable states), and transition rates between them. Our numerical experiments in Sect. 7 show that such large-time quantities may be estimated accurately by ATLAS, even when run in exploration mode. Note that ATLAS is constructed using only local information, at timescale \(\tau \), that may be easily collected in parallel; yet the effects of the multiple estimation and numerical simulation errors do not appear to compound in these estimates of large-time quantities (Crosskey and Maggioni 2017).

6.2 ATLAS, Approximate Generators, Eigenfunctions and Eigenvalues

The ATLAS process \(({\textbf{z}_t^{\mathcal {A}}})_{t\ge 0}\) may be used to approximate the generator of the effective slow dynamics and its spectral components, including eigenvalues and eigenvectors, especially the low-frequency ones. It may serve as a black box for matrix–vector multiplication in iterative eigensolvers. More generally, ATLAS may be used to compute approximations of \(\mathbb {E}[h(\textbf{z}^{\text {slw}}_t)]\) for sufficiently regular observables h.
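The idea of using a simulator as a black-box matrix–vector product can be sketched with SciPy's `LinearOperator` and the ARPACK wrapper `eigs`. In this minimal sketch, a 1D overdamped double well integrated by Euler–Maruyama is a hypothetical stand-in for the ATLAS simulator, and observables h are represented by their values on a grid with linear interpolation; endpoints are simulated once and reused (common random numbers) so that the Monte Carlo estimate of \(h\mapsto \mathbb {E}[h(\textbf{z}_t)\,|\,\textbf{z}_0]\) is a fixed linear map.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigs


# Hypothetical stand-in for a reduced simulator: dX = -V'(X) dt + sigma dW,
# V(x) = (x^2 - 1)^2, integrated by Euler-Maruyama.
def simulate(x0, t, dt, sigma, rng):
    x = np.array(x0, dtype=float)
    for _ in range(int(round(t / dt))):
        x += -4.0 * x * (x**2 - 1.0) * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x


grid = np.linspace(-2.0, 2.0, 81)   # points where observables are evaluated
n_paths, lag = 200, 0.5             # Monte Carlo paths per grid point, lag time
rng = np.random.default_rng(0)
# Endpoints are simulated once and reused, so the operator below is linear and fixed.
ends = simulate(np.repeat(grid[:, None], n_paths, axis=1), lag, 0.01, 0.7, rng)


def transfer_matvec(h):
    # (P h)(x_i) ~ mean over paths of h(X_lag) given X_0 = x_i; h is linearly
    # interpolated between grid points (np.interp clamps outside the grid).
    h = np.ravel(h)
    return np.interp(ends, grid, h).mean(axis=1)


P = LinearOperator((grid.size, grid.size), matvec=transfer_matvec, dtype=float)
vals, vecs = eigs(P, k=3, which="LM")  # only matrix-vector products are used
order = np.argsort(-np.abs(vals))
vals, vecs = vals[order], vecs[:, order]
print(np.round(vals.real, 4))  # leading eigenvalue is 1 (constants are preserved)
```

Since linear interpolation weights are nonnegative and sum to one, the implied matrix is row-stochastic, so its leading eigenvalue is 1 with a constant eigenvector; the next eigenvalues, close to 1, reflect the slow transitions between the two wells.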

6.3 Markov State Models (MSMs) from ATLAS

In MSMs (Husic and Pande 2018) one constructs (1) a partition of state space \(\{C_k\}_{k=1}^K\) and (2) a Markov transition matrix \(P\in \mathbb {R}^{K\times K}\), with \(P_{kk'}\) being the probability of transitioning from \(C_k\) to \(C_{k'}\) in one MSM time-step. MSMs may be either “large-timescale MSMs”, where each \(C_k\) corresponds to a metastable state and the MSM models the rare transitions between them, or “small-timescale MSMs”, where K is large and the \(C_k\)’s are small regions of state space.

Large-timescale MSMs may be constructed if the metastable states are known and a large number of transitions between them are observed. Since these transitions are rare, by definition of metastability, this construction is very expensive in general; however, ATLAS can help identify metastable states and estimate transition rates efficiently.

Small-timescale MSMs are very flexible tools: as \(K\rightarrow \infty \) the transition matrix P approximates, in a suitable sense, the generator of the process, and the convergence is (under suitable assumptions) strong enough to guarantee convergence of the slow eigenfunctions of P to those of the generator of the process. These eigenfunctions, and the corresponding eigenvalues, yield important information about the process, including its metastable states. However, the construction of the local clusters \(C_k\) is crucial, and many recipes exist (Pérez-Hernández et al. 2013; Kutz et al. 2016). This is a challenging task, typically cursed by the ambient dimension D. Many existing techniques require, in order to be of any practical value, a priori knowledge of a suitable small number of slow variables on which the process is projected, and in which the construction of the \(C_k\)’s is performed (Husic and Pande 2018; Klus et al. 2018). In particular, we are not aware of techniques for efficiently constructing the \(C_k\)’s when there are many fast modes, possibly with large amplitude. In this context, ATLAS naturally constructs small-timescale MSMs (at timescale \(\tau \)) in a principled and well-organized fashion, with soft instead of hard partitions, which may diminish the memory effect. ATLAS also uses dynamics-adapted oblique projections and the corresponding estimated local invariant manifold to reduce the dimension, without needing slow variables as inputs. In our experiments, the \(C_k\)’s in the MSM correspond to the Voronoi cells of the landmarks in the \(\hat{\tilde{\rho }} \) “metric” (Footnote 2), and the transition matrix is estimated by running ATLAS trajectories of length \({{O}}(\tau )\) (see Appendix B). We may use small-timescale MSMs to compute approximate slow eigenfunctions and eigenvalues of the system, estimate the number and locations of metastable states, and then construct large-timescale MSMs.
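The basic counting construction behind an MSM on Voronoi cells can be sketched as follows. This is a simplified illustration under stated assumptions, not the ATLAS procedure: it uses hard Voronoi cells in the Euclidean distance (ATLAS uses soft partitions and the \(\hat{\tilde{\rho }}\) “metric”), and the function names `voronoi_labels` and `msm_transition_matrix` are hypothetical.

```python
import numpy as np


def voronoi_labels(points: np.ndarray, landmarks: np.ndarray) -> np.ndarray:
    """Index of the nearest landmark for each point (Euclidean distance)."""
    d2 = ((points[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def msm_transition_matrix(trajs, landmarks: np.ndarray, lag: int = 1) -> np.ndarray:
    """Row-stochastic MSM matrix from transition counts between Voronoi cells."""
    K = landmarks.shape[0]
    counts = np.zeros((K, K))
    for traj in trajs:  # each traj is an (n_steps, D) array sampled at the MSM time-step
        labels = voronoi_labels(np.asarray(traj, dtype=float), landmarks)
        np.add.at(counts, (labels[:-lag], labels[lag:]), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # Cells with no observed outgoing transitions are made absorbing,
    # so that every row of P still sums to 1.
    no_exits = np.flatnonzero(row_sums[:, 0] == 0)
    P[no_exits, no_exits] = 1.0
    return P


# Usage: two landmarks on a line and one short trajectory.
landmarks = np.array([[0.0], [1.0]])
traj = np.array([[0.1], [-0.2], [0.9], [1.1], [0.2]])
P = msm_transition_matrix([traj], landmarks)
print(P)
print(np.sort(np.linalg.eigvals(P).real)[::-1])  # MSM eigenvalues, largest first
```

In practice one would collect many short trajectories started near each landmark, and examine the eigenvalues of P closest to 1 and their eigenvectors to locate metastable states.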

7 Numerical Experiments

We construct ATLAS for three model systems: “pinched sphere”, “oscillating half-moons” and “butane model”. We evaluate its performance in multiple ways. First, we compare it against analytically derived reduced models, with analytical approximations to the slow manifold \( {\mathcal {M}}_{0}\), effective drift and diffusion coefficients (see Appendix C). For the first two examples, the effective dynamics are calculated in the limit \({\epsilon }\rightarrow 0\); for butane, the effective dynamics are chosen to be the dihedral angle dynamics. It is important to remark that these are not the true effective dynamics on the invariant manifold \( {\mathcal {M}}_{{\epsilon }}\) at timescale \(\tau \), which is what ATLAS approximates, and that the latter are not amenable to analytical calculation for finite \({\epsilon }\). With this caveat, we regard these analytical approximations as sufficient for a first check on the quality of the ATLAS process for the local statistics, and report in Table 4 the estimation errors for drift, diffusion, invariant manifold and tangent spaces between ATLAS and these analytically derived reduced models (details in Appendix D).

We also study the accuracy of ATLAS in estimating key medium- and large-time statistics of the dynamics, in particular the stationary distribution, mean residence times (MRTs) and transition rates for metastable states, and MRTs in regions of state space that are not necessarily metastable. In each example, we repeat the construction of ATLAS 10 times to assess the variability due to the randomness of the observed data.

We visualize the invariant manifold \( \hat{{\mathcal {M}}}_{{\epsilon }}\) for each example, as well as key quantities including the stationary distribution and eigenfunctions of MSMs; in these plots we use suitable parametrizations (which, of course, were neither used by nor known to ATLAS). Further details and figures for the models are available in Appendix C.

Fig. 4

Pinched sphere visualized in the plane of the parametrization \((\phi ,\theta )\) of \( {\mathcal {M}}_{{\epsilon }}\). Top: landmarks and their neighborhoods \(B(\hat{\textbf{z}}^{{l}}, \sqrt{\tau })\). We also show two sets \(S_{\text {red}}\) and \(S_{\text {cyan}}\) around the fixed points in the reduced deterministic system, marked with red crosses in the other insets. Center: eigenfunction \(\varphi _1\) of the MSM, with eigenvalue \(\lambda _1=1\). Bottom: eigenfunction \(\varphi _2\), with eigenvalue \(\lambda _2=0.9995\)

7.1 Pinched Sphere System

We start with the pinched sphere system, introduced in Sect. 3. Its governing equations, expressed in spherical (latent) coordinates, are

$$\begin{aligned} \begin{aligned}&\textrm{d}r = -\frac{c_1(r- R(\theta ))}{{\epsilon }r}\textrm{d}t + \frac{c_2}{\sqrt{{\epsilon }}r} \textrm{d}W_1 \\&\textrm{d}\theta = \frac{c_3\cos (3\theta )}{r\sin (\theta )} \textrm{d}t+\frac{c_4\sin (\theta )}{r} \textrm{d}W_2 \\&\textrm{d}\phi = \frac{c_5\sin (\phi +\theta )}{r} \textrm{d}t + \frac{c_6}{r}\textrm{d}W_3. \end{aligned} \end{aligned}$$
(7.1)

The fast variable is the radial coordinate r; the slow variables are the angles \(\phi ,\theta \). The slow manifold (in the limit \({\epsilon }\rightarrow 0\)) is the surface \(r=R(\theta )\), with \(R(\theta )= \sqrt{a_1+a_2\cos ^2(\theta )}\), visualized in Fig. 2. The observations \(\textbf{z}\) are in Cartesian coordinates, each of which contains a mix of nonlinearly coupled slow and fast components. Note that the drift diverges near the poles, creating a strong repulsion, and is relatively small in other wide regions of the state space, creating entropic barriers (Bicout and Szabo 2000).

The dominant local PCA mode only captures the fast direction, due to its large amplitude, and fails to identify the slow variables, which are also not orthogonal to the fast ones. ATLAS successfully estimates that the invariant manifold is two-dimensional, and identifies the separation timescale \(\tau \) (see Appendix C.1 and Fig. 7). ATLAS yields an accurate estimation of the effective drift and diffusion terms, as well as of \( {\mathcal {M}}_{{\epsilon }}\) (see Table 4). We visualize in Fig. 4 the \(\sqrt{\tau }\)-neighborhoods over \( {\mathcal {M}}_{{\epsilon }}\) (unwrapped in the \((\phi ,\theta )\) coordinates for clarity), reflecting the ellipsoids associated with the diffusion coefficient. When exploration is terminated, as expected, the only regions not covered are those around the north and south poles, which are very rarely visited.

Table 1 Mean residence times (MRTs) for pinched sphere

The TICA method (Molgedey and Schuster 1994; Pérez-Hernández et al. 2013) is global and indicates that all observed coordinates are important; in particular, the common approach of constructing MSMs in the TICA coordinates would be cursed by the ambient dimension. Here we construct MSMs using ATLAS. The top two eigenfunctions of the transition matrix of an MSM constructed from ATLAS on \( \hat{{\mathcal {M}}}_{{\epsilon }}\) are visualized in Fig. 4. The first eigenfunction \(\varphi _1\) is (up to rescaling) the invariant distribution; the level set \(\varphi _2=0\) partitions the state space into two metastable states \(M_1\) and \(M_2\). We also let \(C_1:=\{\varphi _2>+0.02\}\) and \(C_2:=\{\varphi _2<-0.02\}\); initial conditions for paths used in the computation of mean residence times (MRTs) are drawn from \(S_{\text {cyan}}:=\{\varphi _2>0.05\}\) and \(S_{\text {red}}:=\{\varphi _2<-0.05\}\) (see Figs. 2 and 4), where \(\varphi _1\) is large.

In Table 1 we report the accuracy of ATLAS in estimating the MRTs in \(M_1, C_1\) (resp., \(M_2, C_2\)) starting from the set \(S_{\text {cyan}}\) (resp., \(S_{\text {red}}\)). ATLAS yields \(\le 2\%\) relative error for these quantities, with a runtime at least 6 times smaller than that of the original simulator \(\mathcal {S}\); the training time is about 21 h. Since the transition rates between metastable states are determined by the mean residence times for double-well systems, they are also very accurate. Using orthogonal projections instead of the ATLAS oblique projections leads to a significant loss of accuracy in long-time observables (e.g., exit times from \(M_1, M_2\)). The estimated \(L^1\)-norm of the difference between the densities of the invariant distributions of the original and ATLAS simulators is \( 0.107\pm 0.009\).
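An MRT of the kind reported in Table 1 can be estimated from simulated paths by recording, for each path started in the initial set (e.g., \(S_{\text {cyan}}\subset M_1\)), the first time it leaves the metastable set. The sketch below is a minimal illustration under simplifying assumptions: membership in the set is given as a boolean array sampled every dt, paths that never leave are simply skipped (which biases the estimate downward if paths are too short), and the rate is taken as the reciprocal of the MRT, a crude two-state approximation. The name `mean_exit_time` is hypothetical.

```python
import numpy as np


def mean_exit_time(in_set_paths, dt: float) -> float:
    """Mean first exit time from a set, over paths started inside it.

    in_set_paths: iterable of boolean arrays; entry n is True if the path is in
    the set at time n * dt. Paths that never leave are censored and skipped.
    """
    times = []
    for path in in_set_paths:
        outside = np.flatnonzero(~np.asarray(path, dtype=bool))
        if outside.size > 0:
            times.append(outside[0] * dt)  # first sampled time outside the set
    if not times:
        raise ValueError("no path left the set; use longer paths")
    return float(np.mean(times))


paths = [np.array([True, True, True, False]), np.array([True, False, True, True])]
mrt = mean_exit_time(paths, dt=0.5)  # exits at t = 1.5 and t = 0.5
rate = 1.0 / mrt  # crude two-state rate estimate, assuming a double-well system
print(mrt, rate)
```

Exit times are resolved only up to the sampling interval dt, so dt should be small compared with the MRTs being estimated.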

7.2 Oscillating Half-Moons

This is a multiscale stochastic system in \(\smash {\mathbb {R}^2\times \mathbb {R}^{18}}\) that generalizes the one in Singer et al. (2009) to high dimensions. Its governing equations in latent coordinates are:

$$\begin{aligned} \begin{aligned}&\textrm{d}\theta = \left( a_1 + a_2\sin (2\theta ) +a_3\cos (\theta )\right) \textrm{d}t + a_4 \textrm{d}W_1, \\&\textrm{d}r_1 = \frac{b_1}{{\epsilon }}\left( 1-r_1\right) \textrm{d}t + \frac{b_2}{\sqrt{{\epsilon }}} \textrm{d}W_2, \\&\textrm{d}u_i = \frac{b_3}{{\epsilon }}(-u_i)\textrm{d}t + \frac{b_4}{\sqrt{{\epsilon }}} \textrm{d}W_i, \ \ \mathrm{{i}}=3,{\dots , 20}. \end{aligned} \end{aligned}$$
(7.2)

The observations, in Cartesian coordinates in \(\mathbb {R}^{20}\), are given by

$$\begin{aligned} \begin{aligned} z_1 = r_1\cos (\theta +r_1-1), z_2= r_1\sin (\theta +r_1-1),z_i= r_1+u_i, \end{aligned} \end{aligned}$$
(7.3)

for \(i=3,\ldots , 20\). The dynamics of the angle \(\theta \) is that of an uneven double-well system with metastable states \(M_{\text {Left}}\) and \(M_{\text {Right}}\) around \(\theta =\pm \pi /2\). The radial variable \(r_1\) and the \(u_i\)’s evolve as O–U processes. The fast variables \(r_1,u_3,\dots ,u_{20}\) are nonlinearly coupled in the observed Cartesian coordinates.
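The latent model (7.2) and the observation map (7.3) translate directly into a short simulator. The sketch below uses the Euler–Maruyama scheme with \(\delta t\ll \epsilon \); the constants \(a_1,\dots ,a_4\), \(b_1,\dots ,b_4\), \(\epsilon \) and the initial condition are hypothetical placeholders, not the values used in our experiments.

```python
import numpy as np


def observe_half_moons(theta, r1, u):
    """Observation map of Eq. (7.3): latent (theta, r1, u_3..u_20) -> z in R^20."""
    u = np.asarray(u, dtype=float)
    z = np.empty(2 + u.size)
    z[0] = r1 * np.cos(theta + r1 - 1.0)
    z[1] = r1 * np.sin(theta + r1 - 1.0)
    z[2:] = r1 + u  # z_i = r1 + u_i, i = 3, ..., 20
    return z


def simulate_half_moons(n_steps=1000, dt=1e-4, eps=0.01,
                        a=(0.1, 1.0, 0.5, 0.5), b=(1.0, 0.3, 1.0, 0.3), seed=0):
    """Euler-Maruyama for Eq. (7.2) with hypothetical constants a1..a4, b1..b4."""
    a1, a2, a3, a4 = a
    b1, b2, b3, b4 = b
    rng = np.random.default_rng(seed)
    theta, r1, u = np.pi / 2, 1.0, np.zeros(18)
    zs = np.empty((n_steps + 1, 20))
    zs[0] = observe_half_moons(theta, r1, u)
    for n in range(1, n_steps + 1):
        dW = np.sqrt(dt) * rng.standard_normal(20)  # increments of W_1, ..., W_20
        # The three equations are decoupled in the latent space.
        theta += (a1 + a2 * np.sin(2 * theta) + a3 * np.cos(theta)) * dt + a4 * dW[0]
        r1 += b1 / eps * (1.0 - r1) * dt + b2 / np.sqrt(eps) * dW[1]
        u = u - b3 / eps * u * dt + b4 / np.sqrt(eps) * dW[2:]
        zs[n] = observe_half_moons(theta, r1, u)
    return zs


zs = simulate_half_moons()
print(zs.shape)
```

Although the latent equations are decoupled, the term \(r_1-1\) inside the angle of \(z_1, z_2\) couples the fast radial mode to the slow angle in observed space, producing the half-moon-shaped fast oscillations.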

A typical trajectory exhibits fast oscillations with a half-moon shape, far from radial, while evolving slowly along the circular slow manifold, driven by the double-well potential and by diffusion along it (see Appendix C.2).

Fig. 5

Oscillating half-moons. A short illustrative trajectory of time length \({1 \times 10^{2}}\) is plotted in \((z_1,z_2)\), colored according to the time t. The landmarks (black dots), their neighborhoods (red lines) and effective drift directions (gray arrows) in \((z_1,z_2)\) are plotted on the left. On the right, the smoothed histograms of trajectories of time length \(8 \times 10^{6}\) generated by \(\mathcal {S}\) and by the ATLAS simulator, projected with ATLAS’s projection, are plotted in the angle coordinate \(\theta \)

Local PCA again fails to detect the slow manifold (see Fig. 8). Notwithstanding the nonlinearity of the fast modes, ATLAS accurately identifies the invariant manifold and the effective dynamics on it, see Table 4. While the relative errors of the estimated drift and covariance matrix seem large (\(32\%\) and \(11\%\), resp.), they drop to \(19\%\) and \(6\%\) (resp.) if the error is measured only in the first two important coordinates, since the error in the other 18 dimensions does not contribute to effective observations. The invariant distribution estimated by ATLAS is very close to the one estimated by the original simulator \(\mathcal {S}\) (see Fig. 5), with an estimated \(L^1\)-norm of the difference of their densities of \(0.098\pm 0.006\). The main reason for the small translational bias in the estimate of the stationary distribution is that the fast modes do not fully relax at timescale \(\tau \), and increasing \(\tau \) is not an option in this case due to the high curvature of \( {\mathcal {M}}_{{\epsilon }}\). As reported in Table 2, the estimated MRTs in the metastable states are quite accurate, and so are the transition rates. The training time for ATLAS is about 17 h; the runtime for estimating the large-time quantities above is less than half that of the original simulator \(\mathcal {S}\).
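The \(L^1\) distance between two stationary densities, estimated from samples of long trajectories (e.g., of the angle \(\theta \)), can be approximated with histograms on shared bins. This is a minimal sketch assuming plain histograms, whereas the figures use smoothed histograms and kernel estimates; the function name `l1_density_distance` is hypothetical.

```python
import numpy as np


def l1_density_distance(x, y, bins=50, value_range=None) -> float:
    """Approximate ||p_x - p_y||_{L^1} from 1D samples via histograms on shared bins."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if value_range is None:  # common range so that no sample of either set is dropped
        value_range = (min(x.min(), y.min()), max(x.max(), y.max()))
    px, edges = np.histogram(x, bins=bins, range=value_range, density=True)
    py, _ = np.histogram(y, bins=edges, density=True)
    return float(np.sum(np.abs(px - py) * np.diff(edges)))


# Usage: two independent samples from the same distribution give a small distance.
rng = np.random.default_rng(0)
print(l1_density_distance(rng.standard_normal(100_000), rng.standard_normal(100_000)))
```

The value lies between 0 and 2, with 2 attained for densities with disjoint supports; for finite samples it also contains a statistical error that decreases as the trajectories get longer.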

Table 2 Mean residence time (MRT) for oscillating half-moons

7.3 Butane Model

This is a model for the butane molecule, inspired by molecular dynamics (Legoll and Lelièvre 2012; Schappals et al. 2017), in the form of overdamped Langevin equations in \(\mathbb {R}^6\) (see Appendix C.3). The dihedral angle \(\phi \), which determines the distance between the two outer carbon groups, is usually considered to be the slow variable. TICA, however, flags two coordinates, \(x_4\) and \(z_4\), as important; in the plane that they span, three metastable states \(M_{\text {trans}}\), \(M_{\text {bot-cis}}\) and \(M_{\text {top-cis}}\), concentrated around a circular \( {\mathcal {M}}_{{\epsilon }}\), are apparent (see Fig. 6). ATLAS identifies that the slow variable is one-dimensional, and accurately estimates the tangent line direction and \( {\mathcal {M}}_{{\epsilon }}\). The relative error of the estimated drift in the \((x_4, z_4)\) plane is on average \(9\%\), vs. \(20\%\) in all 6 dimensions, as reported in Table 4. The 5 fast variables are almost orthogonal to the slow variable (as suggested in Legoll and Lelièvre 2012), so we expect local orthogonal projections to work as well as the oblique ones. The top three eigenfunctions of an MSM estimated by the ATLAS simulator identify these three metastable regions on the slow manifold, see Fig. 6 and Appendix C.3. The invariant distribution of the ATLAS process has a density very close, on \( \hat{{\mathcal {M}}}_{{\epsilon }}\), to the one generated by the original simulator, with an estimated \(L^1\)-norm of the difference of the densities of \(0.060\pm 0.013\). The results reported in Table 3 show that the mean residence times in the three metastable states, estimated with ATLAS, are within 4% relative error, with a runtime of about \(68\%\) of that of the original simulator. All estimated reaction rate constants are within \(5\%\) relative error. The training time of ATLAS is about 13 h.
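For reference, a dihedral angle is computed from four points (for butane, the four carbon positions in a full 3D description) by projecting the two outer bonds onto the plane orthogonal to the central bond. The sketch below uses one common sign convention, in which cis corresponds to 0 and trans to \(\pm \pi \); this is an assumption, and the convention behind \(\phi \) in the reduced \(\mathbb {R}^6\) model of Appendix C.3 may differ by a sign or an offset. The function name `dihedral_angle` is hypothetical.

```python
import numpy as np


def dihedral_angle(p0, p1, p2, p3) -> float:
    """Dihedral angle (radians, in [-pi, pi]) defined by four points p0-p1-p2-p3."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)  # unit vector along the central bond
    b2 = p3 - p2
    # Project the outer bonds onto the plane orthogonal to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.arctan2(y, x))


# Usage: cis, gauche and trans configurations around the central bond p1 -> p2.
p0, p1, p2 = (0, 1, 0), (0, 0, 0), (1, 0, 0)
for p3 in [(1, 1, 0), (1, 0, 1), (1, -1, 0)]:
    print(p3, np.degrees(dihedral_angle(p0, p1, p2, p3)))
```

Using `arctan2` rather than `arccos` of the normalized dot product keeps the sign of the angle, so the two cis-like regions on either side of the trans state are distinguished.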

Fig. 6

Butane. The points of a sample trajectory of time length 20, simulated by the original simulator, are shown as blue dots in \((x_4, z_4)\). The three data clusters correspond to the \(M_{\text {top-cis}}\) (upper left), \(M_{\text {bot-cis}}\) (lower left) and \(M_{\text {trans}}\) (right) metastable states. The landmarks (black dots), their neighborhoods (red lines) and effective drift directions (gray arrows) in \((x_4, z_4)\) are plotted. In the upper right, the top three eigenfunctions of the transition matrix are plotted in the coordinate \(\phi \), with \(\lambda _1=1, \lambda _2=0.9999\) and \(\lambda _3=0.9999\). In the bottom right, kernel density estimates of the invariant distribution from trajectories of time length 500, generated by the original simulator and by the ATLAS simulator, are plotted in the coordinate \(\phi \)

Table 3 Mean residence time (MRT) and reaction rates for butane
Table 4 Summary of error analysis

8 Conclusion

We have introduced a nonlinear nonparametric technique for the reduction of fast-slow stochastic systems that, given a timescale \(\tau \) and access to short trajectories from a black-box simulator, estimates an invariant manifold and an effective stochastic process on it, called ATLAS, which averages the original system below the timescale \(\tau \). The simulator for ATLAS has a time-step of order \(\tau \), typically much larger than the time-step \(\delta t\) of the original simulator (which depends on the fastest timescale), and is intrinsically low-dimensional, making it possible to efficiently compute many long paths of the effective dynamics and approximations to important quantities, such as stationary distributions, mean residence times, and transition rates. We have shown that, under suitable conditions, the estimation of ATLAS is not cursed by the dimension of the state space, and that ATLAS is robust to certain model errors.

This technique significantly extends the one introduced in Crosskey and Maggioni (2017) by (1) correctly handling large fast modes, instead of only very small fast oscillations around a slow manifold, which could be estimated by local PCA, (2) correctly handling fast modes that are not orthogonal to the slow manifold, and (3) smoothly interpolating all estimated geometric and dynamical quantities, which increases the accuracy of the estimation. Last but not least, it is designed to run efficiently in exploration mode, without loss of accuracy.

The literature on model reduction, averaging and homogenization is vast, see, e.g., Pavliotis and Stuart (2008), Hartmann et al. (2020), Husic and Pande (2018), Maria Bruna et al. (2014) and Givon et al. (2004). Unlike existing techniques, here we do not require: previous knowledge of reaction coordinates or of the slow variables, which we estimate directly; linearity of the slow variables (as in PCA/PODs Holmes et al. 2012); that the fast modes are small (as in local PCA/PODs Holmes et al. 2012 or DMD Rowley et al. 2009; Kutz et al. 2016 or TICA Molgedey and Schuster 1994; Pérez-Hernández et al. 2013), nor that they are orthogonal to the slow manifold, nor that they can be globally defined (as in manifold learning techniques such as Coifman et al. (2008), Rohrdanz et al. (2011), Singer et al. (2009) and many others), which either requires the absence of even simple topological obstructions (loops) or requires a possibly arbitrarily large number of additional coordinates. We also do not require sampling long trajectories, and in exploration mode we do not require a set of sufficiently well-behaved initial conditions, unlike exploration techniques such as Chiavazzo et al. (2017) (and references therein). These techniques can fail (and do in our examples) to correctly parametrize the invariant manifold, or the effective dynamics, or both, or would be cursed by the dimension of the state space. Our ATLAS algorithm consistently and accurately estimates the effective dynamics and its invariant manifold in an exploration scheme, which by itself is useful in many cases. Our reduction onto the estimated invariant manifold is nonlinear, and the estimation of both the invariant manifold and the Itô diffusion is locally parametric, in order to reduce the local sample size required for a given accuracy, but globally nonparametric.

The setting of our work, where a latent slow–fast system in a natural linear coordinate system is observed through a nonlinear observation map, is inspired by the works (Singer et al. 2009; Dsilva et al. 2016). These works start with a latent model significantly simpler than that in Eq. (2.1), and their objective is to learn the map back to the latent space, or at least to the slow variables in the latent space, from bursts of trajectories in observed space. That problem is tackled under significantly stronger assumptions on the latent system, and the approach is typically cursed by the ambient dimension, mainly because it seeks the reduction to slow variables after having constructed an approximation to the full system in the state space. In our work we first locally estimate a reduced system, and do so parsimoniously, by using a rather minimal set of parametric tools, and avoiding the curse of dimensionality.

Extensions to higher-order equations, such as Langevin equations, to more general local models and nonlinear open neighborhoods, incorporating symmetries and conserved quantities, considering non-Gaussian noise, and combination with rare-event sampling techniques are currently being explored.