1 Introduction

This paper studies the long time dynamics of fluid-like equations that are kept out of equilibrium. Among the simplest examples of fluid models displaying interesting out-of-equilibrium behavior (such as fluxes across scales) are the two-dimensional Euler and incompressible Navier–Stokes equations. On the 2-dimensional torus \(\mathbb {T}\), i.e., \(\mathbb {T}:=[0,2\pi ]^2\) with periodic boundary conditions, the Navier–Stokes equations, which model the flow of an incompressible fluid, are

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _tu+(u\cdot \nabla )u = -\nabla p +F + \nu \Delta u\,,\\ {{\,\textrm{div}\,}}(u) :=\nabla \cdot u = 0\,, \end{array}\right. } \end{aligned}$$
(1.1)

where \(u:\mathbb {T}\times \mathbb {R}\rightarrow \mathbb {R}^2\) is the fluid velocity, \(p:\mathbb {T}\times \mathbb {R}\rightarrow \mathbb {R}\) the fluid pressure,

$$\begin{aligned} (u\cdot \nabla )u&= (u_1\partial _1u_1+u_2\partial _2u_1, u_1\partial _1u_2+u_2\partial _2u_2), \quad \text {and}\quad \Delta u= (\partial _1^2 u_1 + \partial _2^2 u_1, \partial _1^2 u_2 + \partial _2^2 u_2)\,. \end{aligned}$$

Here \(u=(u_1,u_2)\) and \(\partial _j:=\partial _{x_j}\). The viscosity \(\nu >0\) measures the strength of the dissipation introduced by the Laplacian \(\Delta \), and \(F(x,t)\) is an external driving force whose role is to keep the system from relaxing to the trivial state \(u\equiv 0\).

By balancing the dissipative effect of \(\Delta u\), the forcing term allows the system to establish an out-of-equilibrium steady state. Such statistical equilibria often develop fluxes across scales, a phenomenon whose study is an active area of research. Often F is taken to live on only a few scales so that the flux out of those scales can be studied [18, 24, 29, 40]. In practice, the forcing \(F(x,t)\) is usually taken to be stochastic in space and time, with a stationary distribution that is typically white in time [14, 18, 20, 24]. A common choice in the literature is \(F(x,t)=\sum _k \psi _k(x) \dot{W}_k(t)\), where each \(\psi _k(x)\) is a fixed spatial forcing and \(\{ \dot{W}_k(t) \} \) is a collection of mutually independent white-in-time noise terms, written here as the formal derivatives of Brownian motions. Stochastic forcing serves multiple purposes in these settings. On the one hand, as already mentioned, it provides the energetic excitation which keeps the system out of equilibrium and allows for the establishment of a nontrivial statistical steady state. On the other hand, it provides local agitation which, modulo certain constraints, ensures the existence of a unique statistical steady state to which the system converges for most initial conditions. In other words, it guarantees the forcing is sufficiently varied and generic to ensure convergence to a single long time statistical behavior of the system, largely independent of the system’s initial configuration.

This paper studies a class of processes, introduced in the next section, injecting randomness into the fluid models of interest while separating in a simple way the various roles served by noise in previous works. In particular, the randomness is used primarily to ensure that when the dynamics is sufficiently generic, unique ergodicity\(^{1}\) holds for a broad class of initial conditions. This will free one to use a much less disruptive class of forcing to keep the system out of equilibrium. More specifically, the class of models introduced below has a number of desirable properties:

  (1) They allow one to separate the effect of forcing, which keeps the system out of equilibrium, and stochastic agitation, which ensures the system has a unique long time statistical behavior.

  (2) The stochastic agitation is strongly non-reversible since it is constructed from dynamics which only flow in the directions the original dynamics could already move.

  (3) The stochastic agitation preserves the conserved quantities of the original dynamics. This allows the properties of the (stochastic) conservative dynamics to be studied directly rather than only as a limit of the forced-dissipated dynamics.

  (4) The model dynamics will be constructed as the composition of simple dynamics, isolating particular nonlinear interactions which are relatively intuitive and can be explicitly analyzed.

By balancing between preservation of fundamental macroscopic properties of the original dynamics as in (3) and simplicity of the fundamental building blocks in our model dynamics as in (4), we expect the stochastic models introduced in this paper will provide meaningful physical and dynamical insight into nonequilibrium steady states of models such as (1.1).

Our decomposition into fundamental building blocks is partially motivated by the classical stylized models of dynamics studied in depth at the dawn of the theory of dynamical systems. Examples include the doubling map, quadratic maps, the Hénon map, the Smale horseshoe, and extended systems like coupled map lattices (see [16, 27] and references therein). The form of the decomposition is also motivated by the recent progress in proving ergodic properties of piecewise deterministic Markov processes (PDMPs) and their success as modeling and sampling tools. See for example [2,3,4, 6, 7, 9, 13, 15, 17, 33,34,35, 38, 44].

1.1 A class of stochastic models

We now introduce the general idea underlying the class of stochastic models, called random splitting, that we study in this paper. A more systematic definition of these models is deferred to Sect. 2. Consider an ordinary differential equation (ODE)

$$\begin{aligned} \dot{x}&= V(x) = \sum _{k=1}^n V_k(x)\,, \end{aligned}$$
(1.2)

where \(n\in \mathbb {N}\) and V and \(\{V_k\}_{k=1}^n\) are vector fields on \(\mathbb {R}^d\). In what follows, we choose the \(V_k\) so that the dynamics

$$\begin{aligned} \dot{x}&= V_k(x) \end{aligned}$$
(1.3)

are in some sense simpler than the dynamics corresponding to (1.2). We then approximate the solution \(\Psi _t :x(0) \mapsto x(t)\) of (1.2) with compositions of the solution maps \(\varphi ^{(k)}_t :x(0) \mapsto x(t)\) of (1.3). This procedure is known as operator splitting in the numerical analysis literature and is often used in numerical simulations of various ordinary, partial, and stochastic differential equations [1, 8, 10, 11, 22, 32, 39, 46, 47]. Typically, the goal is to leverage the fact that each of the dynamics in (1.3) is more computationally tractable than (1.2) to construct an efficient and accurate numerical method. A variant of these models was also explored in the thesis [51].

Here our goal is related but slightly different. Specifically, instead of evolving each \(\varphi ^{(k)}\) for a fixed time h as in traditional operator splitting methods, we evolve each of the \(\varphi ^{(k)}\) for a random time with mean h. Repeated composition then produces dynamics on \(\mathcal {O}(1)\) times. The evolution times for each \(\varphi ^{(k)}\), and over each cycle, will be identically distributed and mutually independent, which implies our models are Markovian. As in the numerical analysis context, we hope to leverage the simplified nature of each \(\varphi ^{(k)}\), obtained from (1.3), to gain insight into the complex dynamics of the composition of maps. We will also see that as the mean evolution time \(h \rightarrow 0\), the random splitting associated to (1.2) will almost surely converge to the deterministic dynamics \(\Psi _t\) on finite time intervals. However, we are most interested in studying the random splitting in its own right and not strictly as an approximation of (1.2). We will be particularly interested in its long time behavior and in a qualitative understanding of the stationary dynamics the random splitting produces when \(h>0\). More specifically, the property of the system we aim to establish is codified in the following standard definition from the theory of Markov processes; the supporting definition of invariant measure is given in the first paragraph of Sect. 3 after the transition kernel of random splitting is explicitly introduced.

Definition 1.1

A Markov process on a manifold \(\mathcal {X}\) is uniquely ergodic on \(\mathcal {X}\) if its transition kernel admits exactly one invariant probability measure on \(\mathcal {X}\).

We note that the definition of the set \(\mathcal X\) where the above property holds can be quite delicate. While in general there might not exist a d-dimensional manifold \(\mathcal X\) in \(\mathbb R^d\) on which the random splitting is uniquely ergodic (see for example Remark 1.6), in the examples below we will identify a family of manifolds of lowest co-dimension where the above definition applies.

Remark 1.2

The set of invariant probability measures for a Markov transition kernel is convex, and the extremal points of this set are precisely the ergodic invariant measures [12, 23]. In particular, if the transition kernel admits exactly one invariant measure, then it is necessarily extremal and therefore ergodic. This explains the use of the term ergodic in Definition 1.1.

1.2 Two motivating examples

In this paper, we consider two motivating examples: a conservative version of the Lorenz-96 model and Galerkin approximations of the vorticity formulation of the 2D Euler equations. We then use these analyses to study the full Lorenz-96 model and Galerkin approximations of the vorticity formulation of 2D Navier–Stokes.

1.2.1 Lorenz-96.

Fix \(n\ge 4\) and let \(\{e_k\}_{k=1}^n\) denote the standard basis of \(\mathbb {R}^n\). The Lorenz-96 model is

$$\begin{aligned} \dot{x}&= \sum _{k=1}^n\big ((x_{k+1}-x_{k-2})x_{k-1}- \nu x_k+F_k\big )e_k \end{aligned}$$
(1.4)

for \(x\in \mathbb {R}^n\), \(\nu >0\), and nonnegative constants \(F_k\), where the indices are periodized via the identities \(x_{-1}:=x_{n-1}\), \(x_0:=x_n\), and \(x_{n+1}:=x_1\). The \(-\nu x_k\) term in (1.4) represents dissipation in the kth coordinate and \(F_k\) is a forcing constant. Initially, we study a variant of Lorenz-96, called conservative Lorenz-96, obtained by removing the dissipation and forcing terms from Lorenz-96. That is,

$$\begin{aligned} \dot{x}&= V(x) :=\sum _{k=1}^n(x_{k+1}-x_{k-2})x_{k-1}e_k\,. \end{aligned}$$
(1.5)

We sometimes refer to the original Lorenz-96 model as the forced Lorenz-96 model to emphasize the forcing (though the dissipation is equally important). For conservative Lorenz-96, we will decompose V into a collection of simple rotations by observing that

$$\begin{aligned} V(x)&= \sum _{k=1}^n V_k(x) \end{aligned}$$
(1.6)

where \(V_k(x):=(x_{k+1}e_k-x_ke_{k+1})x_{k-1}\). The dynamics given by \(\dot{x}=V_k(x)\) are easy to understand on their own; any complex behavior comes from interactions of the rotations. Importantly, each \(V_k\) is chosen to conserve, like V, the system’s energy, which for Lorenz-96 is defined to be the square of the usual Euclidean norm, \(\Vert x\Vert ^2:=\sum _{k=1}^n x_k^2\).
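The decomposition (1.6) can be checked numerically. The following is a minimal sketch, not the authors' code: the helper names `wrap`, `coord`, `flow_k`, and `energy` are hypothetical, and the closed-form flow assumes \(n\ge 3\) so that \(x_{k-1}\) is not moved by \(V_k\). It sums the \(V_k\), compares the result with \(V\), and verifies that the flow of a single \(V_k\), which is a rotation of the \((x_k,x_{k+1})\) plane at angular speed \(x_{k-1}\), leaves \(\Vert x\Vert ^2\) unchanged.

```python
import math

def wrap(k, n):
    """Periodic index convention of (1.4): map any integer k into {1, ..., n}."""
    return (k - 1) % n + 1

def coord(x, k):
    """Return x_k with 1-based, periodized indexing."""
    return x[wrap(k, len(x)) - 1]

def V(x):
    """Conservative Lorenz-96 vector field (1.5)."""
    n = len(x)
    return [(coord(x, k + 1) - coord(x, k - 2)) * coord(x, k - 1)
            for k in range(1, n + 1)]

def V_k(x, k):
    """Single term V_k(x) = (x_{k+1} e_k - x_k e_{k+1}) x_{k-1} from (1.6)."""
    n = len(x)
    out = [0.0] * n
    out[wrap(k, n) - 1] = coord(x, k + 1) * coord(x, k - 1)
    out[wrap(k + 1, n) - 1] = -coord(x, k) * coord(x, k - 1)
    return out

def flow_k(x, k, t):
    """Exact flow of x' = V_k(x): rotate (x_k, x_{k+1}) by the angle x_{k-1} * t."""
    n = len(x)
    i, j = wrap(k, n) - 1, wrap(k + 1, n) - 1
    theta = coord(x, k - 1) * t  # x_{k-1} is constant along this flow (n >= 3)
    c, s = math.cos(theta), math.sin(theta)
    y = list(x)
    y[i] = c * x[i] + s * x[j]
    y[j] = -s * x[i] + c * x[j]
    return y

def energy(x):
    """Squared Euclidean norm, the conserved energy of conservative Lorenz-96."""
    return sum(v * v for v in x)

x = [1.0, -0.5, 2.0, 0.3, -1.2]  # arbitrary test point with n = 5
total = [sum(V_k(x, k)[i] for k in range(1, 6)) for i in range(5)]
print(max(abs(a - b) for a, b in zip(total, V(x))))  # discrepancy in (1.6)
print(abs(energy(flow_k(x, 3, 0.7)) - energy(x)))    # energy drift of one rotation
```

Both printed quantities should be at the level of floating-point rounding. The explicit rotation formula is what makes each piece of the splitting "easy to understand on its own".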

1.2.2 2D Euler.

Returning to (1.1), we begin by defining the scalar vorticity \(q(x,t)={{\,\textrm{curl}\,}}u(x,t)\) of the velocity field \(u(x,t)\). Initially, we will consider the Euler equations which are obtained from (1.1) by taking \(\nu =F=0\). Writing the equation for the jth Fourier mode \(q_j \in \mathbb {C}\), defined by \(q(x,t)=\sum _j q_j(t) e_j(x)\) for \(e_j(x):=e^{ix\cdot j}\), and \(j \in \{ j\in \mathbb {Z}^2: |j| < N, j \ne 0\}\), we have

$$\begin{aligned} \dot{q_j} = -\sum _{j+k+\ell =0} C_{k\ell } \bar{q}_k\bar{q}_\ell \end{aligned}$$
(1.7)

for a constant \(C_{k\ell }\) defined in Sect. 6.1. We will see that this system has two conserved quantities, the enstrophy, \(\sum _j |q_j|^2\), and the energy, \(\sum _j |j|^{-2} |q_j|^2\). Notice that the definition of energy differs between this equation and the Lorenz-96 model.

As in the Lorenz-96 model, we introduce the simpler dynamics \(\dot{q}= V_{j k \ell }(q)\) where \(V_{j k \ell }(q) =- C_{k\ell } \bar{q}_k \bar{q}_\ell e_j- C_{j\ell } \bar{q}_j\bar{q}_\ell e_k- C_{jk} \bar{q}_j \bar{q}_k e_\ell \) and observe that

$$\begin{aligned} V(q)=\sum _{j+k+\ell =0} V_{jk\ell }(q)\,. \end{aligned}$$

We will see in Sect. 6 that with this choice of splitting the dynamics \(\dot{q}= V_{jk\ell }(q)\), like the original system V(q), preserves the important physical quantities of enstrophy and energy.

Remark 1.3

In Sect. 6, we further simplify these complex-valued dynamics by projecting onto a real basis. The current choice is sufficient for an introductory discussion.

Remark 1.4

Our results do not focus on establishing minimal hypoellipticity assumptions for our models of Lorenz-96, 2D Euler and 2D Navier–Stokes; the stochastic agitation we use is more global than the minimal hypoellipticity forcing considered in [18, 24]. We hope this will allow us to progress further than with previous models while preserving much of the physically interesting dynamics.

Remark 1.5

It is important to emphasize that, with regard to unique ergodicity, the main role of the forcing, when included, is only to destroy the fixed points and other low-dimensional invariant structures of the original flows and not to provide the stochastic mixing which ensures the existence of a unique, ergodic measure to which the system’s statistics converge. The stochastic mixing is largely provided by the random splitting and is in contrast to the results in [5, 18, 19, 24, 29,30,31].

Remark 1.6

When considering conservative versions of our split dynamics (those without any explicit dissipation or body forcing), we cannot expect there to be a unique invariant measure for the system. In particular, since the dynamics will be constrained to level sets of the conserved quantities, there will be at least one invariant measure per level set. Furthermore, we will see that even on such constraint level sets there can be multiple ergodic invariant measures. Most will correspond to fixed points of the original dynamics and other lower-dimensional invariant structures. However, we will see in the two examples considered that when our family of switched vector fields is sufficiently rich, there will be a unique ergodic invariant measure which is absolutely continuous with respect to the volume measure on the level set. This implies that in these examples, there is a unique ergodic invariant measure concentrated on a set of full measure inside each constraint level set. In this sense, we will demonstrate a form of uniqueness which aligns with the form of unique ergodicity often proven in the smooth deterministic dynamics setting, i.e., that there is only one invariant measure absolutely continuous with respect to the setting’s natural Lebesgue measure.

1.3 Organization of paper

In Sect. 2, we introduce random splitting and its state spaces, called \(\mathcal {V}\)-orbits. In Sect. 3, we give conditions for random splitting to be uniquely ergodic on a \(\mathcal {V}\)-orbit. In Sect. 4, we show under general conditions that random splitting converges to its deterministic counterpart (1.2) on finite time intervals both in terms of its transition kernel and almost surely as the average time step h goes to zero. In Sects. 5 and 6, we construct random splittings of conservative Lorenz-96 and Galerkin approximations of 2D Euler and apply the preceding results to show these splittings are uniquely ergodic and converge on finite time intervals as \(h \rightarrow 0\). In doing so we show each system has a unique invariant measure that is absolutely continuous (with respect to the volume measure) on the set defined by a given choice of the conserved quantities. In Sect. 7, we consider the Lorenz-96 and Euler models when fixed forcing and dissipation are added. When appropriate dissipation is chosen, the latter model corresponds to a random splitting of Galerkin approximations of 2D Navier–Stokes. We again construct random splittings of these models, prove convergence, and show that if the forcing is not aligned with the equations’ invariant structures (such as fixed points) then both randomly split Lorenz-96 and Galerkin approximations of 2D Navier–Stokes have a unique invariant measure and the distribution starting from any initial condition converges exponentially to this measure.

2 Random Splitting in a General Setting

Let \({\mathcal {V}:=}\{V_k\}_{k=1}^n\) be a family of complete\(^{2}\), \(\mathcal {C}^2\) vector fields\(^{3}\) on \(\mathbb {R}^d\) and set

$$\begin{aligned} V&:=\sum _{k=1}^n V_k\,. \end{aligned}$$
(2.1)

Denote the flow of \(\dot{x}=V(x)\) by \(\Psi \) and the flow of \(\dot{x}=V_k(x)\) by \(\varphi ^{(k)}\). \(\Psi \) is the true dynamics. To construct a random dynamics approximating \(\Psi \), fix \(h>0\), let \(\tau =(\tau _k)_{k=1}^\infty \) be a sequence of independent exponential random variables with mean 1, and set \(h\tau :=(h\tau _k)_{k=1}^\infty \). The approximating dynamics, henceforth referred to as the random splitting associated to \(\mathcal {V}\) or just random splitting for short, is the Markov chain \(\{\Phi ^m_{h\tau }\}_{m=0}^\infty \) defined by \(\Phi ^0_{h\tau }:=I\) and, for \(m>0\),

$$\begin{aligned} \Phi ^m_{h\tau }&:=\varphi ^{(n)}_{h\tau _{mn}}\circ \cdots \circ \varphi ^{(1)}_{h\tau _{(m-1)n+1}}(\Phi ^{m-1}_{h\tau }), \end{aligned}$$
(2.2)

where I is the identity on \(\mathbb {R}^d\), \(\Phi :=\varphi ^{(n)}\circ \cdots \circ \varphi ^{(1)}\), and \(\Phi ^m\) is the m-fold composition of \(\Phi \). Note that \(h\tau _k\overset{\scriptscriptstyle {iid}}{\sim }\text {Exp}(1/h)\). Therefore, starting from the current step, the next step of the chain is obtained by following each \(V_k\) for \(\text {Exp}(1/h)\) time in order from \(k=1\) to n. The chain is Markovian because the random times are independent. Its transition kernel \(P_h\) acts on measurable functions \(f:\mathbb {R}^d\rightarrow \mathbb {R}\) via

$$\begin{aligned} P_hf(x)&= \mathbb {E}\big (f(\Phi _{h\tau }(x))\big ) = \int _{\mathbb {R}^n_+}f(\Phi _{ht}(x))e^{-\sum _{k=1}^n t_k} dt \end{aligned}$$
(2.3)

where \(\mathbb {R}_+:=(0,\infty )\), \(t=(t_1,\dots ,t_n)\), and \(dt=dt_1\cdots dt_n\).
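The construction (2.2) and the kernel (2.3) can be sketched in a few lines. The example below is an illustration under stated assumptions, not the paper's implementation: the two flows are rotations of \(\mathbb {R}^3\) about the first and second coordinate axes (a toy family chosen here because its flows are explicit and preserve the Euclidean norm), and the function names `random_splitting_step`, `random_splitting_chain`, and `estimate_P_h` are hypothetical.

```python
import math
import random

def rotate_about_e1(x, t):
    """Flow of x' = e1 x x: rotation about the first axis by angle t."""
    c, s = math.cos(t), math.sin(t)
    return (x[0], c * x[1] - s * x[2], s * x[1] + c * x[2])

def rotate_about_e2(x, t):
    """Flow of x' = e2 x x: rotation about the second axis by angle t."""
    c, s = math.cos(t), math.sin(t)
    return (c * x[0] + s * x[2], x[1], -s * x[0] + c * x[2])

def random_splitting_step(flows, x, h, rng):
    """One step of (2.2): follow each flow in order for an independent h * Exp(1) time."""
    for flow in flows:
        t = h * rng.expovariate(1.0)  # h * tau_k, which has mean h
        x = flow(x, t)
    return x

def random_splitting_chain(flows, x0, h, m, rng):
    """Return the states Phi^0, ..., Phi^m of the chain started at x0."""
    chain = [x0]
    for _ in range(m):
        chain.append(random_splitting_step(flows, chain[-1], h, rng))
    return chain

def estimate_P_h(f, flows, x, h, rng, samples=20000):
    """Monte Carlo estimate of P_h f(x) = E f(Phi_{h tau}(x)) from (2.3)."""
    total = sum(f(random_splitting_step(flows, x, h, rng)) for _ in range(samples))
    return total / samples

rng = random.Random(0)  # fixed seed only for reproducibility of the sketch
flows = [rotate_about_e1, rotate_about_e2]
chain = random_splitting_chain(flows, (1.0, 0.0, 0.0), h=0.5, m=100, rng=rng)
print(chain[-1])
print(estimate_P_h(lambda x: x[0], flows, (1.0, 0.0, 0.0), 0.5, rng))
```

For this toy family, starting from \(x=(1,0,0)\) the first rotation fixes x and the second sends its first coordinate to \(\cos (h\tau _2)\), so \(P_hf(x)=\mathbb {E}\cos (h\tau _2)=1/(1+h^2)\) for \(f(x)=x_1\). This gives a closed-form value against which the Monte Carlo estimate can be compared.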

Remark 2.1

Throughout this paper the superscripts k in \(\varphi ^{(k)}\) and subscripts k in \(V_k\) are understood to be taken modulo n if \(k\ \text {mod}\ n\ne 0\) and to be n otherwise. For example, if \(n=3\),

$$\begin{aligned} \varphi ^{(6)}\circ \varphi ^{(5)}\circ \varphi ^{(4)}\circ \varphi ^{(3)}\circ \varphi ^{(2)}\circ \varphi ^{(1)}&= \varphi ^{(3)}\circ \varphi ^{(2)}\circ \varphi ^{(1)}\circ \varphi ^{(3)}\circ \varphi ^{(2)}\circ \varphi ^{(1)}. \end{aligned}$$

Also, the t in \(\Phi ^m_t\) is always a sequence \(t=(t_1,\dots , t_{mn})\) or, more generally, \(t=(t_k)_{k=1}^\infty \), so that

$$\begin{aligned} \Phi ^m_t(x)&= \varphi ^{(n)}_{t_{mn}}\circ \cdots \circ \varphi ^{(1)}_{t_1}(x)\,. \end{aligned}$$

Note that the above is a composition of mn flows, as in (2.2).
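The index convention of this remark amounts to the following small sketch. The names `cyclic_index` and `compose_split` are hypothetical, and each flow is assumed to be a function of the form `flow(x, t)`.

```python
def cyclic_index(k, n):
    """Index convention of Remark 2.1: k mod n, except that multiples of n map to n."""
    r = k % n
    return r if r != 0 else n

def compose_split(flows, x, times):
    """Apply phi^(1)_{t_1} first, then phi^(2)_{t_2}, ..., cycling through the n flows."""
    n = len(flows)
    for k, t in enumerate(times, start=1):
        x = flows[cyclic_index(k, n) - 1](x, t)
    return x

# Indices 1..6 with n = 3 cycle through the three flows twice.
print([cyclic_index(k, 3) for k in range(1, 7)])
```

With `len(times) == m * n`, `compose_split` evaluates \(\Phi ^m_t(x)\) as a composition of mn flows, applying the rightmost flow in the displayed formula first.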

Remark 2.2

All results in this paper remain true if at each step we randomly permute indices in the composition \(\Phi \). That is, given a current state x, the next step is \(\varphi ^{(\sigma (n))}_{h\tau _n}\circ \cdots \circ \varphi ^{(\sigma (1))}_{h\tau _1}(x)\) where \(\sigma \) is a random permutation of \(\{1,\dots ,n\}\). This yields both additional randomness and an avenue to higher order approximations of the true dynamics [10, 11, 32, 46, 47]. We forgo this more general setting however to keep exposition more approachable and notationally light.

Remark 2.3

The times are assumed exponentially distributed for convenience. All results extend to any distribution on \([0,\infty )\) with positive density on \((0,\varepsilon )\) for some \(\varepsilon >0\) and exponential tails. The second condition, which is not sharp, guarantees sufficient concentration of averages of random flow times \(\tau _i\) in Lemmas A.3 and A.4 and is required for the convergence results as \(h \rightarrow 0\) in Sect. 4. The first condition is used in Sects. 5 and 6 to guarantee sufficient flexibility in the trajectories of the split systems of interest to establish the global irreducibility needed for ergodicity.

2.1 \(\mathcal {V}\)-Orbits

Throughout this paper we often restrict attention to certain subsets of \(\mathbb {R}^d\) affiliated with the family of vector fields \(\mathcal {V}\). Specifically, for each x in \(\mathbb {R}^d\) define the \(\mathcal {V}\)-orbit of x by

$$\begin{aligned} \mathcal {X}(x)&:=\big \{ \Phi ^m_t(x) : m\ge 0, t\in \mathbb {R}^{mn}\big \}. \end{aligned}$$
(2.4)

This is the set of points in \(\mathbb {R}^d\) that can be reached by the split dynamics starting from x in any finite number of steps and over arbitrary positive and negative times. \(\mathcal {X}(x)\) is well-defined since the \(V_k\) are complete. Furthermore, since the time vectors t in (2.4) admit coordinates that are 0,

$$\begin{aligned} \mathcal {X}(x)&= \big \{\varphi ^{(i_m)}_{t_{i_m}}\circ \cdots \circ \varphi ^{(i_1)}_{t_{i_1}}(x):m\in \mathbb {N}, 1\le i_j\le n, t_{i_j}\in \mathbb {R}\big \}. \end{aligned}$$

Hence (2.4) agrees with the definition of \(\mathcal {V}\)-orbits from control theory [26, 50]. The collection \(\{\mathcal {X}(x):x\in \mathbb {R}^d\}\) partitions \(\mathbb {R}^d\) and if the random splitting \(\{\Phi ^m_{h\tau }\}\) associated to \(\mathcal {V}\) starts in \(\mathcal {X}(x)\) then it stays in \(\mathcal {X}(x)\) for all time. Therefore the \(\{\Phi ^m_{h\tau }\}\) previously defined on \(\mathbb {R}^d\) also defines a Markov chain on \(\mathcal {X}(x)\) whenever it starts in \(\mathcal {X}(x)\), and its transition kernel \(P_h\) acts on measurable functions \(f:\mathcal {X}(x)\rightarrow \mathbb {R}\) as in (2.3). When x is arbitrary or clear from context, we denote \(\mathcal {X}(x)\) by \(\mathcal {X}\). A classic result from geometric control theory, sometimes called the orbit theorem, says if every \(V_k\) in \(\mathcal {V}\) is \(\mathcal {C}^r\) for some \(1\le r\le \infty \) (respectively, analytic\(^{4}\)), then every \(\mathcal {X}\) is an immersed \(\mathcal {C}^r\) (respectively, analytic) submanifold of \(\mathbb {R}^d\) [26]. In particular, each \(\mathcal {X}\) has a Riemannian structure induced by the Euclidean structure on \(\mathbb {R}^d\) and an associated volume form, henceforth denoted \(\lambda \), sometimes called Hausdorff or Lebesgue measure on \(\mathcal {X}\), which serves as our reference measure on \(\mathcal {X}\).

Remark 2.4

The orbit theorem says every \(\mathcal {V}\)-orbit \(\mathcal {X}\) is an immersed but not necessarily embedded submanifold of \(\mathbb {R}^d\). For example, \(\mathcal {X}\) can be a “figure-eight” curve in \(\mathbb {R}^2\) [37, Example 5.19]. Nevertheless, every \(\mathcal {X}\) is a manifold with a volume form induced by the ambient Euclidean structure, and every vector field in \(\mathcal {V}\) restricts to a vector field on \(\mathcal {X}\) by construction. In particular, \(\{V_i(x):V_i\in \mathcal {V}\}\) is a set of vectors in the tangent space \(T_x\mathcal {X}\) for every x in \(\mathcal {X}\). Throughout this paper submanifold will mean immersed submanifold without further qualification. See [36, 37] for more on immersed and embedded submanifolds in general, and [26] for more on \(\mathcal {V}\)-orbits in particular.

3 Ergodicity

Let \(\mathcal {V}:=\{V_k\}_{k=1}^n\) be a family of complete, \(\mathcal {C}^2\) vector fields on \(\mathbb {R}^d\) as before and fix a p-dimensional \(\mathcal {V}\)-orbit \(\mathcal {X}\). Also fix \(h>0\) and let \(P_h\) be the transition kernel of the associated random splitting on \(\mathcal {X}\). A measure \(\mu \) on \(\mathcal {X}\) is \(P_h\)-invariant if \(\mu P_h=\mu \) where \(\mu P_h\) is defined by

$$\begin{aligned} \mu P_h f&:=\int _\mathcal {X} P_hf(x)\mu (dx) \end{aligned}$$
(3.1)

for all bounded, measurable functions \(f:\mathcal {X}\rightarrow \mathbb {R}\). The main result of this section is

Theorem 3.1

If there exists \(x_*\) in \(\mathcal {X}\) such that for all x in \(\mathcal {X}\) there is an m in \(\mathbb N\) and t in \(\mathbb R_+^{mn}\) with \(\Phi ^m(x,t)=x_*\) and \(D_t\Phi ^m(x,t):T_{t}\mathbb {R}^{mn}_+\rightarrow T_{x_*}\mathcal {X}\) surjective, then \(P_h\) has at most one invariant measure on \(\mathcal {X}\). Moreover, if such a measure exists, it is absolutely continuous with respect to the volume form on \(\mathcal {X}\).

Here \(T_{x_*}\mathcal {X}\) is the tangent space of \(\mathcal {X}\) at \(x_*\). The proof of Theorem 3.1 follows from the classical minorization condition [25, 41, 43, 48] given by the following result, which appears in [6, Lemma 6.3].

Lemma 3.2

Let \(p\le m\) and let \(F:\mathcal {X}\times U\rightarrow \mathcal {X}\) be \(\mathcal {C}^1\), where U is an open subset of \(\mathbb {R}^m\). Suppose \(\tau \) is a U-valued random variable with continuous density \(\rho \). If for some \((x,t)\) in \(\mathcal {X}\times U\) the map \(D_tF(x,t)\) is surjective and \(\rho \) is bounded below by \(c_0>0\) on a neighborhood of t, then there exists a constant \(c>0\) and neighborhoods \(U_x\) of x and \(U_*\) of \(x_*:=F(x,t)\) such that

$$\begin{aligned} \mathbb {P}\big (F(y,\tau )\in B\big )&\ge c\lambda (B\cap U_*) \end{aligned}$$
(3.2)

for all y in \(U_x\) and B in the Borel \(\sigma \)-algebra \(\mathcal {B}(\mathcal {X})\) of \(\mathcal {X}\) (recall \(\lambda \) is the volume form on \(\mathcal {X}\)).

Remark 3.3

In our setting, \(U=\mathbb {R}^{mn}_+\), \(F=\Phi ^m:\mathcal {X}\times \mathbb {R}^{mn}_+\rightarrow \mathcal {X}\), and \(\tau =(\tau _1,\dots ,\tau _{mn})\) with the \(\tau _k\) independent exponential random variables with mean h. In this case, if \(x_*=\Phi ^m(x,t)\) for some t with \(D_t\Phi ^m(x,t)\) surjective, then Lemma 3.2 guarantees the existence of a constant \(c>0\) and neighborhoods \(U_x\) of x and \(U_*\) of \(x_*\) such that, for all y in \(U_x\) and B in \(\mathcal {B}(\mathcal {X})\),

$$\begin{aligned} P_h^m(y,B)&\ge c\lambda (B\cap U_*)\,. \end{aligned}$$
(3.3)

Proof of Theorem 3.1

The proof is by contradiction. Suppose \(\mu _1\) and \(\mu _2\) are distinct \(P_h\)-invariant probability measures. Assume without loss of generality both \(\mu _i\) are ergodic and therefore mutually singular [12, 28]. Then there exist disjoint measurable sets \(A_1\) and \(A_2\) partitioning \(\mathcal {X}\) such that \(\mu _i(B)=\mu _i(B\cap A_i)\) for all B in \(\mathcal {B}(\mathcal {X})\). Fix \(x_i\) in the support of \(\mu _i\) so, by definition, \(\mu _i\) gives positive measure to every neighborhood of \(x_i\). By hypothesis and Remark 3.3 there exist \(c_i>0\), \(m_i\in \mathbb {N}\), and neighborhoods \(U_i\) of \(x_i\) and \(U_*\) of \(x_*\) such that \(P_h^{m_i}(x,\cdot )\ge c_i\lambda (\cdot \cap U_*)\) for all x in \(U_i\). So,

$$\begin{aligned} \mu _i(B)&= \mu _iP_h^{m_i}(B) \ge \int _{U_i} P_h^{m_i}(x,B)\mu _i(dx) \ge c_i\lambda (B\cap U_*)\mu _i(U_i) \end{aligned}$$
(3.4)

for all B in \(\mathcal {B}(\mathcal {X})\). In particular, \(\mu _i(B)=0\) implies \(\lambda (B\cap U_*)=0\) since \(c_i\) and \(\mu _i(U_i)\) are strictly positive. But \(\mu _1(A_2\cap U_*)=\mu _2(A_1\cap U_*)=0\) and hence

$$\begin{aligned} 0&< \lambda (U_*) = \lambda (A_1\cap U_*)+\lambda (A_2\cap U_*) = 0, \end{aligned}$$

which is a contradiction. Absolute continuity of the \(P_h\)-invariant measure \(\mu \), provided it exists, follows from uniqueness together with the fact that the absolutely continuous part, \(\mu _{ac}\), and singular part, \(\mu _s\), of \(\mu \) are \(P_h\)-invariant whenever \(\mu \) is [6, Proposition 2.7]. Specifically, since \(\mu _{ac}\) and \(\mu _s\) are \(P_h\)-invariant and there can be at most one \(P_h\)-invariant probability measure, either \(\mu _{ac}\) or \(\mu _s\) is identically zero. Since \(\mu _{ac}\) is nonzero by (3.4), it follows that \(\mu _s=0\) and therefore \(\mu =\mu _{ac}\). \(\square \)

Remark 3.4

The invariant measure \(\mu \), which we defined as a fixed point of the left action of the Markov semigroup generated by \(P_h\), is often called a stationary measure. This is because the sequence of random variables generated by the Markov process starting from an initial condition distributed according to \(\mu \) is stationary. This terminology helps distinguish it from an invariant measure of the skew flow \((x, \tau )\mapsto (\Phi _{h\tau }(x), \vartheta \tau )\), where the shift \(\vartheta \) is defined by \(\vartheta : (\tau _1,\tau _2,\dots ) \mapsto (\tau _{n+1},\tau _{n+2},\dots )\). The skew perspective captures more information about the dynamics and is preferred for many questions. However, we will not pursue it here as it complicates the simple picture we explore in this paper.

3.1 The Lie bracket condition

Let \(\mathfrak {X}(\mathcal {X})\) be the Lie algebra of smooth vector fields on \(\mathcal {X}\) and assume throughout this subsection the vector fields in \(\mathcal {V}\) are smooth. Then the smallest subalgebra \({{\,\textrm{Lie}\,}}(\mathcal {V})\) of \(\mathfrak {X}(\mathcal {X})\) containing \(\mathcal {V}\) is well-defined, and for each x in \(\mathcal {X}\) the collection \({{\,\textrm{Lie}\,}}_x(\mathcal {V}):=\{V(x):V\in {{\,\textrm{Lie}\,}}(\mathcal {V})\}\) is a subspace of the tangent space \(T_x\mathcal {X}\) at x.

Definition 3.5

The Lie bracket condition holds at x in \(\mathcal {X}\) if \({{\,\textrm{Lie}\,}}_x(\mathcal {V})=T_x\mathcal {X}\).
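To illustrate Definition 3.5, consider a toy family that is not one of the paper's examples: the two rotation fields \(V_1(x)=e_1\times x\) and \(V_2(x)=e_2\times x\) on \(\mathbb {R}^3\), whose \(\mathcal {V}\)-orbits are spheres centered at the origin for \(x\ne 0\). The sketch below uses the hypothetical helpers `rotation_field` and `lie_bracket`, and adopts the sign convention \([V,W]=DW\,V-DV\,W\); the opposite sign spans the same space. It shows that at \(x=e_1\) the fields alone span only one direction, while adding their bracket spans the whole tangent plane.

```python
def cross(a, b):
    """Cross product of two vectors in R^3."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rotation_field(axis):
    """Vector field x -> axis x x, the generator of rotations about `axis`."""
    return lambda x: cross(axis, x)

def lie_bracket(V, W, x, eps=1e-4):
    """[V, W](x) = DW(x) V(x) - DV(x) W(x), with derivatives by central differences."""
    def directional(F, d):
        xp = [xi + eps * di for xi, di in zip(x, d)]
        xm = [xi - eps * di for xi, di in zip(x, d)]
        return [(p - m) / (2 * eps) for p, m in zip(F(xp), F(xm))]
    dWv = directional(W, V(x))
    dVw = directional(V, W(x))
    return [a - b for a, b in zip(dWv, dVw)]

V1 = rotation_field([1.0, 0.0, 0.0])
V2 = rotation_field([0.0, 1.0, 0.0])
x = [1.0, 0.0, 0.0]

print(V1(x))                   # vanishes at x = e1
print(V2(x))                   # tangent to the unit sphere at x
print(lie_bracket(V1, V2, x))  # a second, independent tangent direction
```

Since \(V_2(x)\) and \([V_1,V_2](x)\) are linearly independent and both orthogonal to x, they span the two-dimensional tangent space of the sphere at x, so the Lie bracket condition holds there even though \(V_1(x)=0\).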

The Lie bracket condition is called the weak bracket condition in [6] and Condition B in [2]. Both papers also consider a strong bracket condition (Condition A) which is used for results about continuous time Markov processes and is therefore not needed here. The Lie bracket condition has the following important consequence. Note \(\mathbb {R}_+:=(0,\infty )\) throughout this paper.

Theorem 3.6

If the Lie bracket condition holds at a point \(x_*\) in \(\mathcal {X}\) then for every neighborhood U of \(x_*\) in \(\mathcal {X}\) and every \(T>0\) there exists an x in U, an m in \(\mathbb N\), and a t in \(\mathbb {R}^{mn}_+\) such that \(\sum _{k=1}^{mn} t_k\le T\), \(\Phi ^m(x_*,t)=x\), and \(s\mapsto \Phi ^m(x_*,s)\) is a submersion at t, i.e. \(D_t\Phi ^m(x_*,t):T_t\mathbb {R}^{mn}\rightarrow T_x\mathcal {X}\) is surjective.

A version of Theorem 3.6 appears as Theorem 3.1 in [26]; the equivalent version given here is better suited to random splitting and other classes of piecewise deterministic Markov processes. See Theorem 5 in [2] and Theorem 4.4 in [6] and their corresponding discussions for details. Intuitively, Theorem 3.6 says that if the Lie bracket condition holds at \(x_*\) then, as a consequence of surjectivity, the random splitting can move in any infinitesimal direction from \(x_*\) in arbitrarily small positive times. The next result is an immediate consequence of Theorems 3.1 and 3.6.

Corollary 3.7

Suppose there is an \(x_*\) in \(\mathcal {X}\) at which the Lie bracket condition holds and such that for every x in \(\mathcal {X}\) there is an \(m \in \mathbb N\) and a \(t \in \mathbb {R}^{mn}_+\) satisfying \(\Phi ^m(x,t)=x_*\). Then \(P_h\) has at most one invariant measure on \(\mathcal {X}\). Furthermore, if such a measure exists, it is absolutely continuous with respect to the volume form \(\lambda \).

One benefit of Corollary 3.7 is that it replaces the need to check the surjectivity assumption of Theorem 3.1, which can be challenging in practice, with the verification of the Lie bracket condition. The next result provides a further convenience in the analytic setting which will be used in the specific examples considered below. See [26, 45] for further discussion and proof.

Theorem 3.8

Suppose the vector fields in \(\mathcal {V}\) are analytic. If the Lie bracket condition holds at any point in \(\mathcal {X}\), then it holds at every point in \(\mathcal {X}\).

Corollary 3.9

Suppose the vector fields in \(\mathcal {V}\) are analytic and there is a point \(x_*\) in \(\mathcal {X}\) such that for every x in \(\mathcal {X}\) there is an \(m \in \mathbb N\) and a \(t \in \mathbb {R}^{mn}_+\) satisfying \(\Phi ^m(x,t)=x_*\). If the Lie bracket condition holds at any point in \(\mathcal {X}\), then \(P_h\) has at most one invariant measure on \(\mathcal {X}\). Furthermore, if such a measure exists, it is absolutely continuous with respect to the volume form \(\lambda \).

Proof

Since the vector fields are analytic and the Lie bracket condition holds at one point in \(\mathcal {X}\), it also holds at \(x_*\) by Theorem 3.8 (Nagano’s theorem). The result follows by Corollary 3.7. \(\square \)

4 Convergence as Mean Time Step Goes to Zero

A well-known result in the operator splitting literature is that the error incurred in approximating \(\Psi \) by the deterministic splitting scheme \(\Phi _h=\varphi ^{(n)}_h\circ \cdots \circ \varphi ^{(1)}_h\) is \(\mathcal {O}(h)\) [39]. That is, \(\Phi _h\) converges to the true dynamics \(\Psi \) at worst linearly in h as \(h\rightarrow 0\). In this section we give analogous results for random splitting; the plural “results” reflects that with randomness come several different notions of convergence. Specifically, we give two main results. First, as in the deterministic case, the transition kernel \(P_h\) of random splitting converges to the true dynamics linearly in h as \(h\rightarrow 0\). Second, random splitting converges almost surely to the true dynamics as \(h\rightarrow 0\). Each case requires a slightly different notion of \(\mathcal {O}(h)\). These statements are made precise in Theorems 4.1 and 4.5, respectively, but to make sense of them we first introduce the appropriate setting.

The following assumption on \(\mathcal {V}\)-orbits is used throughout this section.

Assumption 1

\(\mathcal {X}(x)\) is bounded for each x in \(\mathbb {R}^d\).

Since the vector fields \(V_k\) are assumed \(\mathcal {C}^2\), Assumption 1 implies the \(V_k\) are bounded with bounded first and second derivatives on every \(\mathcal {X}\). In particular,

$$\begin{aligned} C_*(x_0)&:=\sup _{x\in \mathcal {X}(x_0)}\left\{ \Vert V_k(x)\Vert , \Vert DV_k(x)\Vert , \Vert D^2V_k(x)\Vert : 1\le k\le n\right\} < \infty \,, \end{aligned}$$
(4.1)

where \(\Vert V_k(x)\Vert \) is the usual Euclidean norm, \(\Vert DV_k(x)\Vert \) is the operator norm of the linear map \(DV_k(x):\mathbb {R}^d\rightarrow \mathbb {R}^d\), and \(\Vert D^2V_k(x)\Vert \) is the operator norm of the bilinear map \(D^2V_k(x):\mathbb {R}^d\times \mathbb {R}^d\rightarrow \mathbb {R}^d\).

For a positive integer k let \(\mathcal {C}^k(\mathcal {X})\) be the space of k-times continuously differentiable functions \(f:\mathcal {X}\rightarrow \mathbb {R}\). For f in \(\mathcal {C}^k(\mathcal {X})\) and \(\ell \le k\), the \(\ell \)th derivative \(D^\ell f(x)\) of f at x is a multilinear operator from \(\otimes _1^\ell T_x\mathcal {X}\) to \(\mathbb {R}\). The operator norm of \(D^\ell f(x)\) is then

$$\begin{aligned} \Vert D^\ell f(x)\Vert&:=\sup _{\Vert \eta \Vert =1}\left\{ |D^\ell f(x)\eta |\right\} \,, \end{aligned}$$

where \(\eta \in \otimes _1^\ell T_x\mathcal {X}\). Defining \(D^0 f(x) :=f(x)\), this in turn induces a norm on \(\mathcal {C}^k(\mathcal {X})\) given by

$$\begin{aligned} \Vert f\Vert _k&:=\sup _{x\in \mathcal {X}}\left\{ \Vert D^\ell f(x)\Vert : 0\le \ell \le k\right\} . \end{aligned}$$

The corresponding operator norm is denoted \(\Vert \cdot \Vert _{k\rightarrow k}\). More generally, for any k and \(\ell \) define a norm \(\Vert \cdot \Vert _{k\rightarrow \ell }\) on the space of linear operators \(L:\mathcal {C}^k(\mathcal {X})\rightarrow \mathcal {C}^\ell (\mathcal {X})\) by

$$\begin{aligned} \Vert L\Vert _{k\rightarrow \ell }&:=\sup _{\Vert f\Vert _k=1} \Vert Lf\Vert _\ell \,. \end{aligned}$$

We make frequent use of the submultiplicity of \(\Vert \cdot \Vert _{k\rightarrow \ell }\). Namely, if A and B are bounded linear operators from \(\mathcal {C}^j(\mathcal {X})\) to \(\mathcal {C}^k(\mathcal {X})\) and from \(\mathcal {C}^k(\mathcal {X})\) to \(\mathcal {C}^\ell (\mathcal {X})\), respectively, then

$$\begin{aligned} \Vert BA\Vert _{j\rightarrow \ell }&\le \Vert B\Vert _{k\rightarrow \ell }\Vert A\Vert _{j\rightarrow k}\,. \end{aligned}$$

The results below are stated in terms of semigroups of the flows \(\Psi \) and \(\varphi ^{(j)}\), which are \(\mathcal {C}^2\) by assumption. Hence for all \(k\le 2\) the semigroup \(\{S_t\}_{t\ge 0}\) corresponding to \(\Psi \) acts on \(f\in \mathcal {C}^k(\mathcal {X})\) via

$$\begin{aligned} S_tf(x)&= e^{tV}f(x) = f(\Psi _t(x)) \end{aligned}$$
(4.2)

and, similarly, the semigroup \(\{\widetilde{S}^{(j)}_t\}_{t\ge 0}\) corresponding to \(\varphi ^{(j)}\) is given by

$$\begin{aligned} \widetilde{S}^{(j)}_tf(x)&= e^{tV_j}f(x) = f(\varphi ^{(j)}_t(x))\,. \end{aligned}$$
(4.3)

In particular, m steps of random splitting corresponds to \(\widetilde{S}^m_{h\tau } :=\widetilde{S}^{(1)}_{h\tau _1}\cdots \widetilde{S}^{(mn)}_{h\tau _{mn}}\) with superscripts taken as in Remark 2.2. The transition kernel \(P^m_h\) and semigroup composition \(\widetilde{S}^m_{h\tau }\) are related via

$$\begin{aligned} P^m_hf=\mathbb {E}(f(\Phi ^m_{h\tau }))=\mathbb {E}(\widetilde{S}^m_{h\tau }f)\,. \end{aligned}$$
(4.4)

With the above notation we now present the two main results of this section, Theorems 4.1 and 4.5, which follow from Lemmas 4.2 and 4.6, respectively. The full proofs of both lemmas are given in the Appendix, but we discuss the general idea behind each at the end of this section.

Theorem 4.1

Suppose Assumption 1 holds and fix \(t>0\). There exists a constant C(t), depending on t but not on h, such that for all sufficiently small h satisfying \(mh=t\) for some \(m\in \mathbb {N}\),

$$\begin{aligned} \Vert P_h^m-S_t\Vert _{2\rightarrow 0}&\le C(t)h. \end{aligned}$$
(4.5)

Lemma 4.2

If Assumption 1 holds then there exists a constant C such that

$$\begin{aligned} \Vert P_h-S_h\Vert _{2\rightarrow 0}&\le Ch^2 \end{aligned}$$
(4.6)

for all h sufficiently small.

Recalling from (4.4) that \(P_h=\mathbb {E}(\widetilde{S}_{h\tau }^1)\), informally Lemma 4.2 states that the average difference between one step of random splitting and the true dynamics is \(\mathcal {O}(h^2)\) for sufficiently small h. For any finite time interval [0, t] we can leverage this result to approximate \(S_t\) by successive steps of \(P_h\). Specifically, choose h sufficiently small so that (4.6) holds and there exists an integer m with \(mh=t\). Then the composition \(P^m_h\) corresponds to \(\mathcal {O}(1/h)\) steps of \(P_h\). Consequently, since the difference between \(P_h\) and \(S_h\) is \(\mathcal {O}(h^2)\), the difference between \(P^m_h\) and \(S_t\) is \(\mathcal {O}(h)\).

A possible interpretation of \(\mathcal {O}(h^p)\) is given in Theorem 4.1 and Equation (4.5). This choice matched the particular results being proved. In Theorem 4.5 and Lemma 4.6 below, we chose to quantify the error in another fashion, though the same order of magnitude statements hold true. The same basic reasoning can be used to prove the following.

Remark 4.3

As we have made minimal assumptions on the splitting, we will only be able to deduce that \(P_h-S_h= \mathcal {O}(h^2)\). In specific examples, it is often possible to arrange the splitting so that \(P_h-S_h= \mathcal {O}(h^p)\) with \(p > 2\). An example of a higher order splitting is Strang splitting [39]. Alternatively, higher order can also be obtained by fully randomizing the order [10] or randomly choosing between one ordering and its reverse [32, 46, 47].

Proof of Theorem 4.1

Let h be sufficiently small that (4.6) holds and such that \(mh=t\) for some \(m\in \mathbb {N}\). The quantity of interest can be written as the following telescoping sum:

$$\begin{aligned} P^m_h-S_t&= \sum _{k=1}^m P^{k-1}_h(P_h-S_h)S_{h(m-k)}\,. \end{aligned}$$
(4.7)

For any k and continuous function f with \(\Vert f\Vert _0=1\),

$$\begin{aligned} \Vert P^k_h f\Vert _0&\le \mathbb {E}\left( \big \Vert f\big (\Phi ^k_{h\tau }\big )\big \Vert _0\right) = 1\,. \end{aligned}$$

Hence \(\Vert P^k_h\Vert _{0\rightarrow 0}=1\). Similarly, since \(mh=t\) implies \(h(m-k)\le t\) for \(k\ge 0\) and \(\mathcal {X}\) is bounded by Assumption 1 (so \(\Psi \) and its first and second derivatives are bounded on \(\mathcal {X}\), uniformly on [0, t]),

$$\begin{aligned} \Vert S_{h(m-k)}\Vert _{2\rightarrow 2}&\le K(t) \end{aligned}$$

for some K(t) depending on t but not h. Hence, by submultiplicity, (4.7), and Lemma 4.2, we have

$$\begin{aligned} \Vert P^m_h-S_t\Vert _{2\rightarrow 0}&\le \sum _{k=1}^m \Vert P^{k-1}_h\Vert _{0\rightarrow 0}\Vert P_h-S_h\Vert _{2\rightarrow 0}\Vert S_{h(m-k)}\Vert _{2\rightarrow 2} \le K(t)\sum _{k=1}^m Ch^2 = C(t)h\,, \end{aligned}$$

where \(C(t) :=tK(t)C\), with C the constant from (4.6) in Lemma 4.2; the factor t arises from \(\sum _{k=1}^m Ch^2 = mCh^2 = Cth\).\(\square \)
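The mechanism of this proof can be checked numerically in a finite-dimensional stand-in. The following is a minimal sketch and not a computation from the paper: the operators \(S_h\) and \(P_h\) are replaced by hypothetical 2×2 matrices built from nilpotent generators A and B (so that A+B generates a planar rotation), only \(\mathbb {E}\tau _k=1\) and independence of the \(\tau _k\) are used, and the maximum absolute entry stands in for \(\Vert \cdot \Vert _{2\rightarrow 0}\). The shear flows of A and B have unbounded orbits, so this toy model does not satisfy Assumption 1; it only illustrates how the telescoping identity (4.7) turns an \(\mathcal {O}(h^2)\) one-step error into an \(\mathcal {O}(h)\) error at time t.

```python
import math

# Toy stand-ins (assumptions, not from the paper):
#   A = [[0, 1], [0, 0]],  B = [[0, 0], [-1, 0]],  so exp(t(A+B)) is a rotation.
# Since A^2 = B^2 = 0 we have exp(sA) = I + sA, hence E[exp(h*tau*A)] = I + hA
# for any flow-time distribution with E[tau] = 1.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matsub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

def matpow(X, m):
    out = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(m):
        out = matmul(out, X)
    return out

def norm(X):
    """Maximum absolute entry, used here in place of an operator norm."""
    return max(abs(v) for row in X for v in row)

def S(t):
    """Exact map exp(t(A+B)), the analogue of S_t."""
    c, s = math.cos(t), math.sin(t)
    return [[c, s], [-s, c]]

def P(h):
    """Mean one-step map E[exp(h tau_1 A) exp(h tau_2 B)] = (I + hA)(I + hB)."""
    return matmul([[1.0, h], [0.0, 1.0]], [[1.0, 0.0], [-h, 1.0]])

def local_error(h):
    """Analogue of the left side of (4.6)."""
    return norm(matsub(P(h), S(h)))

def global_error(t, m):
    """Analogue of the left side of (4.5) with h = t/m."""
    return norm(matsub(matpow(P(t / m), m), S(t)))

def telescoping_gap(t, m):
    """Difference between the two sides of the matrix analogue of (4.7)."""
    h = t / m
    Ph, Sh = P(h), S(h)
    total = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(1, m + 1):
        term = matmul(matmul(matpow(Ph, k - 1), matsub(Ph, Sh)), S(h * (m - k)))
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return norm(matsub(matsub(matpow(Ph, m), S(t)), total))

if __name__ == "__main__":
    for m in (50, 100, 200):
        h = 1.0 / m
        # Both ratios should stay roughly constant as h decreases.
        print(m, local_error(h) / h**2, global_error(1.0, m) / h)
```

In this sketch the printed ratios `local_error(h) / h**2` and `global_error(1.0, m) / h` should remain of order one as m grows, mirroring (4.6) and (4.5) respectively.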

Remark 4.4

Theorem 4.1 had the relation \(h=t/m\), while in the almost-sure results below we will take \(h=t/m^2\) (note we explicitly write \(t/m^2\), making no reference to the variable h). The reason, loosely speaking, is that the transition kernel depends only on the expectation of the randomness, while the almost-sure results additionally depend on fluctuations of the randomness about its mean. For example, Lemma 4.6 prepares for an application of the Borel-Cantelli lemma by establishing the summability of probabilities of “large” fluctuations over sets of \(\mathcal {O}(m)=\mathcal {O}(1/\sqrt{h})\) cycles. This is discussed in more detail at the end of this section and worked out in full in the Appendix.

Theorem 4.5

Suppose Assumption 1 holds and fix \(t>0\). Then for any \(\varepsilon > 0\),

$$\begin{aligned} \mathbb P \left( \limsup _{m \rightarrow \infty } \Vert \widetilde{S}^{m^2}_{t\tau /m^2}-S_t\Vert _{2\rightarrow 0} > \varepsilon \right) = 0\,. \end{aligned}$$
(4.8)

Lemma 4.6

Suppose Assumption 1 holds and fix \(t>0\). Then for any \(\varepsilon > 0\),

$$\begin{aligned} \sum _{m=1}^\infty \mathbb {P}\left( \Vert \widetilde{S}^m_{t\tau /m^2}-S_{t/m}\Vert _{2\rightarrow 0} > \tfrac{\varepsilon }{m}\right)&< \infty \,. \end{aligned}$$
(4.9)

Remark 4.7

There is a relationship between Theorems 4.1 and 4.5 and the averaging results from Wentzell-Freidlin theory, e.g., [21, Theorem 2.1, Chapter 7]. This theorem builds on local results like Lemmas 4.2 and 4.6. Since our averaging is that of a deterministic, cyclic process, the calculations can be more explicit and more precise. We are able to prove using simple calculations that the local error is \(\mathcal O(h^2)\), which leads to \(\mathcal O(h)\) error over order one times. Typical soft averaging results prove a local error of o(h) and then simply conclude that the order one error goes to 0. Of course, more careful calculations are possible in the averaging setting. However, the simple structure of our problems, where the only randomness is in the switching times and not the orderings, allows for the direct, straightforward proofs we have presented.

Proof of Theorem 4.5

By the Borel-Cantelli Lemma it suffices to show

$$\begin{aligned} \sum _{m=1}^\infty \mathbb {P}\left( \Vert \widetilde{S}^{m^2}_{t\tau /m^2}-S_t\Vert _{2\rightarrow 0} > \varepsilon \right)&< \infty \,. \end{aligned}$$

Consider the telescoping sum

$$\begin{aligned} \widetilde{S}^{m^2}_{t\tau /m^2}-S_t&= \sum _{k=1}^m \widetilde{S}^{(k-1)}_{t\tau /m^2}\left( \widetilde{S}^m_{t\tau /m^2}-S_{t/m}\right) S_{(m-k)t/m}\,. \end{aligned}$$
(4.10)

For any k and continuous function f with \(\Vert f\Vert _0=1\),

$$\begin{aligned} \big \Vert \widetilde{S}^k_{t\tau /m^2}f \big \Vert _0&= \big \Vert f\big (\Phi ^k_{t\tau /m^2}\big )\big \Vert _0 = 1\,. \end{aligned}$$

Hence \(\Vert \widetilde{S}^{(k-1)}_{t\tau /m^2}\Vert _{0\rightarrow 0}=1\). Similarly, since \((m-k)t/m\le t\) for \(k\ge 0\) and \(\mathcal {X}\) is bounded by Assumption 1 (so \(\Psi \) and its first and second derivatives are bounded on \(\mathcal {X}\), uniformly on [0, t]),

$$\begin{aligned} \Vert S_{(m-k)t/m}\Vert _{2\rightarrow 2}&\le K(t) \end{aligned}$$

for some K(t) depending on t but not on m. Hence, by submultiplicity and (4.10), we have

$$\begin{aligned} \big \Vert \widetilde{S}^{m^2}_{t\tau /m^2}-S_t\big \Vert _{2\rightarrow 0}&\le K(t)\sum _{k=1}^m \big \Vert \widetilde{S}^m_{t\tau /m^2}-S_{t/m}\big \Vert _{2\rightarrow 0} = K(t)m\big \Vert \widetilde{S}^m_{t\tau /m^2}-S_{t/m}\big \Vert _{2\rightarrow 0}\,, \end{aligned}$$

and hence by Lemma 4.6,

$$\begin{aligned} \sum _{m=1}^\infty \mathbb {P}\left( \big \Vert \widetilde{S}^{m^2}_{t\tau /m^2}-S_t\big \Vert _{2\rightarrow 0}> \varepsilon \right)&\le \sum _{m=1}^\infty \mathbb {P}\left( \big \Vert \widetilde{S}^m_{t\tau /m^2}-S_{t/m}\big \Vert _{2\rightarrow 0} > \tfrac{\varepsilon }{K(t)m}\right) < \infty . \end{aligned}$$

\(\square \)

We conclude this section by sketching the proofs of Lemmas 4.2 and 4.6, which are inspired by ideas from [10, 11] and given in full detail in the Appendix. In what follows we set \(\widetilde{S}_{h\tau }:=\widetilde{S}^1_{h\tau }\) and define \(\widetilde{S}^{(i,j)}_{h\tau }:=\widetilde{S}^{(i)}_{h\tau _i}\cdots \widetilde{S}^{(j)}_{h\tau _j}\). Consider first Lemma 4.2. Differentiating \(\widetilde{S}_{h\tau }\) in h gives

$$\begin{aligned} \partial _h\widetilde{S}_{h\tau }&= \sum _{k=1}^n \tau _k e^{h\tau _1V_1}\cdots e^{h\tau _{k-1}V_{k-1}}V_k e^{h\tau _kV_k}\cdots e^{h\tau _nV_n} = \sum _{k=1}^n \tau _k \widetilde{S}^{(1,k-1)}_{h\tau }V_k\widetilde{S}^{(k,n)}_{h\tau }. \end{aligned}$$

Next, commute \(\widetilde{S}^{(1,k-1)}_{h\tau }\) and \(V_k\) via the Lie bracket \([\widetilde{S}^{(1,k-1)}_{h\tau }, V_k]:=\widetilde{S}^{(1,k-1)}_{h\tau }V_k-V_k\widetilde{S}^{(1,k-1)}_{h\tau }\) to get

$$\begin{aligned} \partial _h\widetilde{S}_{h\tau }&= \sum _{k=1}^n \tau _kV_k\widetilde{S}_{h\tau }+\sum _{k=1}^n \tau _k[\widetilde{S}^{(1,k-1)}_{h\tau }, V_k]\widetilde{S}^{(k,n)}_{h\tau } = V\widetilde{S}_{h\tau }+(V_\tau -V)\widetilde{S}_{h\tau }+E_{h\tau } \end{aligned}$$

where \(V_\tau :=\sum _{k=1}^n \tau _kV_k\) and \(E_{h\tau }:=\sum _{k=1}^n \tau _k[\widetilde{S}^{(1,k-1)}_{h\tau }, V_k]\widetilde{S}^{(k,n)}_{h\tau }\). So, by variation of constants,

$$\begin{aligned} \widetilde{S}_{h\tau }-S_h&= \int _0^h S_{h-r}(V_\tau -V)\widetilde{S}_{r\tau } dr+\int _0^hS_{h-r}E_{r\tau } dr. \end{aligned}$$
(4.11)

Loosely speaking, the first integrand is \(\mathcal {O}(h)\) because

$$\begin{aligned} \mathbb {E}(V_\tau -V) = \sum _{k=1}^n \mathbb {E}(\tau _k-1)V_k = 0 \end{aligned}$$
(4.12)

cancels first order terms from the full expression, \(S_{h-r}(V_\tau -V)\widetilde{S}_{r\tau }\). On the other hand the second integrand is \(\mathcal {O}(h)\) because the bracket terms in \(E_{h\tau }\) also cancel first order terms (most of the work of the proof in the Appendix is making these two statements precise). Thus, integrating these \(\mathcal {O}(h)\) terms over the interval (0, h), the difference on the right side of (4.11) is \(\mathcal {O}(h^2)\) as claimed.

The proof of Lemma 4.6 is structurally similar to the one sketched above in that it again begins with an application of variation of constants. However, in this case our analysis aims to establish a concentration estimate and therefore cannot rely solely on the vanishing first moment as in (4.12). Instead, morally speaking, we expect the desired estimate to hold because of the averaging of iid flow times \(\tau _i\) in the analogue of (4.12). In order to capture such averaging, we cannot limit our analysis to one cycle, but have to consider a variation of constants estimate on \(m\gg 1\) such cycles:

$$\begin{aligned} \widetilde{S}_{h\tau }^m-S_{mh}&= \int _0^h S_{m(h-r)} (V_\tau -V)\widetilde{S}_{r\tau }^m dr+\int _0^hS_{m(h-r)} E_{r\tau }^{(m)} dr \end{aligned}$$
(4.13)

where now \(E_{r\tau }^{(m)} :=\sum _{k=1}^{mn} \tau _k[\widetilde{S}^{(1,k-1)}_{r\tau }, V_k]\widetilde{S}^{(k,mn)}_{r\tau }\). Note that the second term contains \(\mathcal O(m^2)\) commutators, each contributing \(\mathcal O(h^2)\) as in the previous analysis. On the other hand, once integrated, the difference in the first integral, \(\sum _{k=1}^{mn} (\tau _k-1)V_k\), scales as \(\mathcal O(\sqrt{m}h )\) by the central limit theorem for iid random variables. In order to have both terms decay faster than \(\mathcal O(1/m)\) we choose \(m\sim \mathcal O(1/\sqrt{h})\), whence the relation \(h=t/m^2\).

5 Conservative Lorenz-96

In this section, we apply results of the previous sections to the conservative Lorenz-96 model introduced in Sect. 1.2. There we noted the vector field V in (1.5) splits as (1.6) where the flow of each \(V_k\) is a rotation. Specifically, each flow \(\varphi ^{(k)}\) of the splitting vector fields

$$\begin{aligned} V_k(x)&= (x_{k+1}e_k-x_ke_{k+1})x_{k-1} \end{aligned}$$
(5.1)

is a rotation in the \((x_k,x_{k+1})\)-plane with angular velocity \(x_{k-1}\) and therefore preserves Euclidean norm, which we refer to as the energy of the system. Throughout this section \(\mathcal {V}\) denotes the family of splitting vector fields corresponding to (5.1). By the preceding remarks every \(\mathcal {V}\)-orbit lies on a sphere centered at the origin in \(\mathbb {R}^n\). In particular, we have

Proposition 5.1

All the finite time convergence results of Sect. 4 apply to the random splitting (1.6) of conservative Lorenz-96 starting from any initial condition.

Proof

The splitting vector fields are smooth and Assumption 1 is satisfied since every \(\mathcal {V}\)-orbit lies on a sphere, so the conclusions of Theorems 4.1 and 4.5 both hold.\(\square \)
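To make Proposition 5.1 concrete, the following minimal sketch simulates one realization of the random splitting with \(h=t/m^2\) over \(m^2\) cycles and compares it with a reference solution. Several ingredients are assumptions made only for illustration: the flow times are sampled as iid exponentials with mean 1, the cycle order is fixed as \(1,\dots ,n\), indices are periodic and 0-based, a fine RK4 integration of \(\dot{x}_k=(x_{k+1}-x_{k-2})x_{k-1}\) (the sum of the fields (5.1)) stands in for the exact flow \(\Psi _t\), and the Euclidean distance between end points is used instead of the operator norm in (4.8). All function names are hypothetical. We take \(n=4\) because for \(n=3\) the periodic indexing gives \(x_{k+1}=x_{k-2}\) and the summed field vanishes.

```python
import math
import random

def rotate(x, k, t):
    """Exact time-t flow of V_k in (5.1): a rotation of (x_k, x_{k+1})
    with angular velocity x_{k-1} (periodic, 0-based indices)."""
    n = len(x)
    i, j, w = k % n, (k + 1) % n, x[(k - 1) % n]
    c, s = math.cos(w * t), math.sin(w * t)
    y = list(x)
    y[i] = c * x[i] + s * x[j]
    y[j] = -s * x[i] + c * x[j]
    return y

def random_splitting(x, h, cycles, rng):
    """Apply `cycles` cycles of the split flows with random times h * tau."""
    n = len(x)
    for _ in range(cycles):
        for k in range(n):
            # Assumption: tau is exponential with mean 1.
            x = rotate(x, k, h * rng.expovariate(1.0))
    return x

def field(x):
    """Sum of the splitting fields (5.1)."""
    n = len(x)
    return [(x[(i + 1) % n] - x[(i - 2) % n]) * x[(i - 1) % n] for i in range(n)]

def rk4(x, t, steps):
    """Reference solution; a numerical stand-in for the exact flow."""
    dt = t / steps
    for _ in range(steps):
        k1 = field(x)
        k2 = field([a + 0.5 * dt * b for a, b in zip(x, k1)])
        k3 = field([a + 0.5 * dt * b for a, b in zip(x, k2)])
        k4 = field([a + dt * b for a, b in zip(x, k3)])
        x = [a + dt * (p + 2 * q + 2 * r + s) / 6
             for a, p, q, r, s in zip(x, k1, k2, k3, k4)]
    return x

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def splitting_error(x0, t, m, seed=0):
    """Return (distance to the reference at time t, norm of the split state)."""
    rng = random.Random(seed)
    approx = random_splitting(x0, t / m**2, m**2, rng)
    exact = rk4(x0, t, 2000)
    return norm([a - b for a, b in zip(approx, exact)]), norm(approx)

if __name__ == "__main__":
    x0 = [0.6, -0.3, 0.5, 0.2]
    for m in (5, 10, 20, 40):
        print(m, splitting_error(x0, 0.5, m))
```

Each split flow is an exact rotation, so the second returned value should equal \(\Vert x_0\Vert \) up to rounding, in line with the energy conservation used to verify Assumption 1. The first value is random; for a single seed it need not decrease monotonically in m.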

5.1 Ergodicity

A complicating feature of the conservative Lorenz-96 dynamics is that it has fixed points. Specifically, a point x in \(\mathbb {R}^n\) is a fixed point of (1.5) if and only if \(\sum _{k=1}^n (x_k^2+x_{k+1}^2)x_{k-1}^2=0\). For a 2-sphere embedded in \(\mathbb {R}^3\) these are precisely the 6 points of intersection of the sphere with the standard coordinate axes. In higher dimensions, these fixed points lie on submanifolds that in general have dimension greater than 0 and in particular are no longer isolated. Nevertheless, nonfixed points cannot reach fixed points in finite time; in fact, the following result shows there is precisely one \(\mathcal {V}\)-orbit on each sphere that contains all the nonfixed points on that sphere.

Proposition 5.2

If x is a nonfixed point of the conservative Lorenz-96 equations, then

$$\begin{aligned} \mathcal {X}(x)&= \mathcal {X} :=\bigg \{y\in \mathbb {R}^n: \Vert y\Vert = R\ \text {and}\ \sum _{k=1}^n (y_k^2+y_{k+1}^2)y_{k-1}^2\ne 0\bigg \}, \end{aligned}$$
(5.2)

where \(R=\Vert x\Vert \). Furthermore the random splitting of conservative Lorenz-96 is uniquely ergodic on \(\mathcal {X}\): for all \(h>0\) the volume form \(\lambda \) is the unique \(P_h\)-invariant probability measure on \(\mathcal {X}\).

Corollary 5.3

For all \(h>0\) the volume form \(\lambda \) on \(S^{n-1}(R):=\{x\in \mathbb {R}^n:\Vert x\Vert =R\}\) is the unique ergodic \(P_h\)-invariant probability measure on \(S^{n-1}(R)\) that is absolutely continuous with respect to \(\lambda \).

Proof

\(\mathcal {X}\) in (5.2) is the complement of a closed, measure zero subset of \(S^{n-1}(R)\). Thus \(\lambda \) on \(\mathcal {X}\) agrees with the volume form, also denoted \(\lambda \), on \(S^{n-1}(R)\). In particular, \(\lambda \) is an ergodic invariant measure on \(S^{n-1}(R)\) by Proposition 5.2. Since ergodic invariant measures are mutually singular, see e.g. [23], any other ergodic invariant measure on \(S^{n-1}(R)\) must be singular with respect to \(\lambda \).\(\square \)

Proof of Proposition 5.2

Let x be a nonfixed point with \(\Vert x\Vert =R\). We first prove x can be mapped via the split dynamics to \(x_*:=(R/\sqrt{n},\dots , R/\sqrt{n})\). Since x is a nonfixed point, i.e. \(\sum _{k=1}^n (x_k^2+x_{k+1}^2)x_{k-1}^2\ne 0\), there exists k such that \(x_{k-1}\ne 0\) and \(x_k\) or \(x_{k+1}\) is nonzero. Now, since \(\varphi ^{(k)}\) is a rotation in the \((x_k,x_{k+1})\)-plane with angular velocity \(x_{k-1}\), there is a \(t_k\) such that both k and \(k+1\) coordinates of \(\varphi ^{(k)}(x,t_k)\) are nonzero. By the same argument there is a \(t_{k+1}\) such that the k, \(k+1\), and \(k+2\) coordinates of \(x^{(k+1)}=\varphi ^{(k+1)}(\varphi ^{(k)}(x,t_k),t_{k+1})\) are nonzero. Continuing this way, we see x can be made to have nonzero coordinates in a finite number of steps.

Now since \(\Vert x\Vert =R\), there exists an index k such that \(|x_k|\ge R/\sqrt{n}\). If \(k=n\), rotate in the \((n-1,n)\)-plane so that the nth coordinate of x becomes \(R/\sqrt{n}\). If \(k<n\), rotate in the \((k,k+1)\)-plane so that the \(k+1\) coordinate of x becomes \(R/\sqrt{n}\), then rotate in the \((k+1,k+2)\)-plane so that the \(k+2\) coordinate of x becomes \(R/\sqrt{n}\), and so on until the nth-coordinate of x becomes \(R/\sqrt{n}\). Such rotations are always possible because all coordinates of x are nonzero by the preceding argument. Thus, whether \(k=n\) or \(k<n\) we can evolve x via the split dynamics so that its last coordinate, \(x_n\), is \(R/\sqrt{n}\). In particular, there now must exist an index \(k<n\) such that \(|x_k|\ge R/\sqrt{n}\). By the same procedure, and without disturbing the last coordinate, we can use rotations to make the \(n-1\) coordinate of x equal \(R/\sqrt{n}\). Iterating this process maps x to \(x_*\) in a finite number of steps. Since x was arbitrary it follows that every nonfixed point with norm R belongs to the same orbit, which is precisely the set \(\mathcal {X}\) defined in (5.2).

Next we prove there is at most one \(P_h\)-invariant measure on \(\mathcal {X}\). First note that since the split dynamics are all rotations, the above procedure mapping any arbitrary x in \(\mathcal {X}\) to \(x_*\) can be done using strictly positive times. Furthermore, by direct observation, the matrix of splitting vector fields

$$\begin{aligned} \big (V_1(x)\ \ V_2(x)\ \ \cdots \ \ V_n(x)\big )\,, \end{aligned}$$

whose kth column is the vector \(V_k(x)\) from (5.1), has rank \(n-1\) whenever all \(x_k\) are nonzero. In particular, since \(\mathcal {X}\) is an open subset of the sphere of radius R and therefore itself an \((n-1)\)-dimensional manifold, the splitting vector fields \(V_k\) span \(T_{x_*}\mathcal {X}\). Hence \({{\,\textrm{Lie}\,}}_{x_*}(\mathcal {V})=T_{x_*}\mathcal {X}\). By Corollary 3.7, \(P_h\) has at most one invariant measure on \(\mathcal {X}\).
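The rank claim can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: it assembles the vectors \(V_k(x)\) from (5.1) with periodic, 0-based indices and computes the rank of the resulting matrix by Gaussian elimination with a fixed tolerance. The helper names `splitting_fields` and `rank` are hypothetical.

```python
import math

def splitting_fields(x):
    """Rows are V_k(x) = x_{k-1} (x_{k+1} e_k - x_k e_{k+1}) from (5.1),
    with periodic, 0-based indices."""
    n = len(x)
    rows = []
    for k in range(n):
        v = [0.0] * n
        v[k] = x[(k + 1) % n] * x[(k - 1) % n]
        v[(k + 1) % n] = -x[k] * x[(k - 1) % n]
        rows.append(v)
    return rows

def rank(rows, tol=1e-12):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    a = [list(r) for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        p = max(range(r, n_rows), key=lambda i: abs(a[i][c]))
        if abs(a[p][c]) <= tol:
            continue  # no usable pivot in this column
        a[r], a[p] = a[p], a[r]
        for i in range(r + 1, n_rows):
            f = a[i][c] / a[r][c]
            for j in range(c, n_cols):
                a[i][j] -= f * a[r][j]
        r += 1
    return r

if __name__ == "__main__":
    n, R = 5, 1.0
    x_star = [R / math.sqrt(n)] * n
    print(rank(splitting_fields(x_star)))                      # point x_* of the proof
    print(rank(splitting_fields([1.0, 0.0, 0.0, 0.0, 0.0])))   # a fixed point
```

The rank of a matrix does not depend on whether the \(V_k(x)\) are arranged as rows or columns, so storing them as rows here is only a convenience.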

We next show Lebesgue measure, \({{\,\textrm{Leb}\,}}\), in \(\mathbb R^n\) is \(P_h\)-invariant. Let \(S^{n-1}(R)\) denote the sphere of radius R in \(\mathbb {R}^n\) and let \({{\,\textrm{Leb}\,}}^{(k)}_t:=(\varphi ^{(k)}_t)_\#{{\,\textrm{Leb}\,}}\) be the pushforward of \({{\,\textrm{Leb}\,}}\) by \(\varphi ^{(k)}_t\). Since the \(V_k\) in (5.1) are divergence free, the continuity equation, intended in the weak sense, becomes

$$\begin{aligned} 0&= \partial _t{{\,\textrm{Leb}\,}}^{(k)}_t + {{\,\textrm{div}\,}}\left( V_k{{\,\textrm{Leb}\,}}^{(k)}_t\right) = \partial _t{{\,\textrm{Leb}\,}}^{(k)}_t + \nabla {{\,\textrm{Leb}\,}}^{(k)}_t\cdot V_k\,. \end{aligned}$$
(5.3)

The latter is a transport equation with constant initial condition \({{\,\textrm{Leb}\,}}^{(k)}_0\equiv 1\) and hence \({{\,\textrm{Leb}\,}}^{(k)}_t={{\,\textrm{Leb}\,}}\) for all t. Because the trajectories of all \(V_k\) conserve the energy \(\Vert x\Vert \), we fiber \(\mathbb R^n\) using spherical coordinates \((r, \vartheta ) \in \mathbb R_+ \times S^{n-1}(R)\). In these coordinates, we have that \(V_k(r, \vartheta ) = 0\, \partial _r + r v_k(\vartheta ) \nabla _\vartheta \) and by a change of coordinates of the divergence operator the stationarity equation becomes

$$\begin{aligned} 0 = {{\,\textrm{div}\,}}\left( V_k(x)\lambda (x)\right) = u(r) w(\vartheta ) {{\,\textrm{div}\,}}_\vartheta (\lambda (r, \vartheta ) v_k(\vartheta ))\,, \end{aligned}$$
(5.4)

where \({{\,\textrm{div}\,}}_\vartheta \) denotes the angular terms of the divergence in spherical coordinates, and \(u(r), w(\vartheta )\) result from the change of variables. Hence, we can factor the solution \(\lambda (r, \vartheta ) = \bar{\lambda }(\vartheta |r) \cdot \mu _R(d r) = \bar{\lambda }(\vartheta ) \cdot \mu _R(d r)\), where \( \bar{\lambda }(\vartheta |r)\) is the conditional density of Lebesgue measure on a fiber. The measure \(\bar{\lambda }\) solves \(w(\vartheta ) {{\,\textrm{div}\,}}_\vartheta (\bar{\lambda }(\vartheta ) v_k(\vartheta ) ) = 0\) and is therefore invariant under the flows \(\varphi _t^{(k)}\). By rotational symmetry of \({{\,\textrm{Leb}\,}}\), we must have that \(\bar{\lambda }(\vartheta )\) is the volume form on \(S^{n-1}(R)\). And since \(\mathcal {X}\) is a full-measure open subset of \(S^{n-1}(R)\), the volume form \(\lambda \) on \(\mathcal {X}\) is just the restriction of \(\bar{\lambda }\) to \(\mathcal {X}\). Thus \(\lambda \) is also invariant under the flows and is therefore the unique \(P_h\)-invariant measure on \(\mathcal {X}\).\(\square \)

6 Galerkin Approximations of 2D Euler

The 2D Euler equations on the torus \(\mathbb {T}\) are obtained from the 2D Navier–Stokes equations (1.1) by dropping the dissipative and forcing terms:

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _tu+(u\cdot \nabla )u = -\nabla p \\ {{\,\textrm{div}\,}}(u) :=\nabla \cdot u = 0 \end{array}\right. } \end{aligned}$$
(6.1)

where, as before, \(u:\mathbb {T}\times \mathbb {R}\rightarrow \mathbb {R}^2\) is the fluid velocity, \(p:\mathbb {T}\times \mathbb {R}\rightarrow \mathbb {R}\) the fluid pressure, and

$$\begin{aligned} (u\cdot \nabla )u&= (u_1\partial _1u_1+u_2\partial _2u_1, u_1\partial _1u_2+u_2\partial _2u_2)\,. \end{aligned}$$

In this section we construct a convenient random splitting of (6.1). To do so we first write (6.1) in vorticity form and apply the Fourier transform. This yields an infinite system of ODEs which we truncate to systems of arbitrary finite size, referred to throughout as Galerkin approximations. Finally, we split these Galerkin approximations and apply the results of Sects. 3 and 4 to the associated random splitting.

6.1 Constructing the splitting

The vorticity formulation of (6.1) is obtained by taking the curl of velocity. Specifically, setting \(q:={{\,\textrm{curl}\,}}(u):=\partial _2u_1-\partial _1u_2\), equation (6.1) becomes

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _tq+(\mathcal {K}q\cdot \nabla )q = 0 \,,\\ {{\,\textrm{div}\,}}(q) = 0\,, \end{array}\right. } \end{aligned}$$
(6.2)

where \(\mathcal {K}:=\nabla ^\perp (-\Delta )^{-1}\) with \(\nabla ^\perp :=(\partial _2,-\partial _1)\). To express (6.2) in Fourier space, set \(\mathbb {Z}^2_\infty :=\mathbb {Z}^2\setminus \{(0,0)\}\) and let \(\{e_j\}_{j\in \mathbb {Z}^2_\infty }\) be the orthonormal basis of \(L^2(\mathbb {T},\mathbb {R})\) given by \(e_j(x):=(2\pi )^{-1}\exp (ix\cdot j)\). Then \(q(x,t)=\sum _{j\in \mathbb {Z}^2_\infty }q_j(t)e_j(x)\) where

$$\begin{aligned} q_j(t)&:=\langle q,e_j\rangle _{L^2} =\int _{\mathbb {T}} q(x,t)\overline{e}_j(x) dx \end{aligned}$$

is the jth Fourier mode of q. Here \(\langle \cdot ,\cdot \rangle _{L^2}\) is the standard inner product on \(L^2(\mathbb {T},\mathbb {R})\) with \(\overline{e}_j\) denoting the complex conjugate of \(e_j\). The jth Fourier mode of \((\mathcal {K}q\cdot \nabla )q\) is

$$\begin{aligned} \langle (\mathcal {K}q\cdot \nabla )q,e_j\rangle _{L^2}&= \sum _{k+\ell =j} C_{k\ell }q_kq_\ell \end{aligned}$$

where

$$\begin{aligned} C_{k\ell } :=\frac{\langle k,\ell ^\perp \rangle }{4\pi }\bigg (\frac{1}{|k|^2}-\frac{1}{|\ell |^2}\bigg ) \end{aligned}$$
(6.3)

with \(\langle \cdot ,\cdot \rangle \) the standard inner product in \(\mathbb R^2\), \(\ell ^\perp :=(\ell _2,-\ell _1)\), and \(|\ell |^2:=\ell _1^2+\ell _2^2\). Therefore

$$\begin{aligned} \sum _j \dot{q}_je_j&= \partial _t q = -(\mathcal {K}q\cdot \nabla )q = -\sum _j\bigg (\sum _{k+\ell =j} C_{k\ell }q_kq_\ell \bigg )e_j \end{aligned}$$

and hence \(\dot{q}_j=-\sum _{k+\ell =j} C_{k\ell }q_kq_\ell \). Moreover, since q is real-valued,

$$\begin{aligned} \sum _j q_je_j&= q = \overline{q} = \sum _j \overline{q}_je_{-j} \end{aligned}$$

which gives \(q_j=\overline{q}_{-j}\). In particular,

$$\begin{aligned} \dot{q}_j&= \dot{\overline{q}}_{-j} = -\sum _{j+k+\ell =0} C_{k\ell }\overline{q}_k\overline{q}_\ell \,. \end{aligned}$$

Writing each Fourier mode \(q_j=a_j+ib_j\) in terms of real and imaginary parts then gives

$$\begin{aligned} \dot{a}_j+i\dot{b}_j = \dot{q}_j&= -\sum _{j+k+\ell =0} C_{k\ell }(a_k-ib_k)(a_\ell -ib_\ell ) \\&= \sum _{j+k+\ell =0} C_{k\ell }(b_kb_\ell -a_ka_\ell )+i\sum _{j+k+\ell =0} C_{k\ell }(a_kb_\ell +a_\ell b_k)\,. \end{aligned}$$

Thus the Fourier modes of solutions to the Euler equation in vorticity form satisfy

$$\begin{aligned} \left\{ \begin{aligned} \dot{a}_j&= \sum _{j+k+\ell =0} C_{k\ell }(b_kb_\ell -a_ka_\ell ) \\ \dot{b}_j&= \sum _{j+k+\ell =0} C_{k\ell }(a_kb_\ell +a_\ell b_k) \end{aligned}\right. \end{aligned}$$
(6.4)

for all \(j\in \mathbb {Z}^2_\infty \). While (6.4) could be studied as is, notice the constraint \(q_{-j}=\overline{q}_j\) implies \(a_{-j}=a_j\) and \(b_{-j}=-b_j\), which introduces redundancy in (6.4). Therefore we restrict to the subset

$$\begin{aligned} \mathbb {Z}^2_+&:=\{j\in \mathbb {Z}^2 : j_2>0\}\cup \{j\in \mathbb {Z}^2 : j_2=0\ \text {and}\ j_1>0\}\,. \end{aligned}$$

Specifically, by straightforward computation together with the identities \(a_{-j}=a_j\), \(b_{-j}=-b_j\), and \(C_{k\ell }=C_{-k,-\ell }=-C_{-k,\ell }=-C_{k,-\ell }\), the system (6.4) can be re-expressed as

$$\begin{aligned} \left\{ \begin{aligned} \dot{a}_j =&\sum _{j+k-\ell =0} C_{k\ell }(a_ka_\ell +b_kb_\ell )+\sum _{j-k-\ell =0} C_{k\ell }(b_kb_\ell -a_ka_\ell ) \\ \dot{b}_j =&\sum _{j+k-\ell =0} C_{k\ell }(a_k b_\ell -b_k a_\ell )-\sum _{j-k-\ell =0} C_{k\ell }(a_kb_\ell +b_k a_\ell ) \end{aligned}\right. \end{aligned}$$
(6.5)

for all \(j\in \mathbb {Z}^2_+\) with each sum running over all pairs \(k,\ell \in \mathbb {Z}^2_+\) satisfying the specified identity. To split (6.5) note that for any \(j,k,\ell \in \mathbb {Z}^2_+\) satisfying \(j+k-\ell =0\) (and hence \(\ell -j-k=0\)) we can isolate from the above sums exactly 6 equations involving only these indices:

$$\begin{aligned} \begin{aligned} \dot{a}_j&= C_{k\ell }(a_ka_\ell +b_kb_\ell )\,, \qquad \dot{a}_k = C_{j\ell }(a_ja_\ell +b_jb_\ell )\,, \qquad \dot{a}_\ell = C_{jk}(b_jb_k-a_ja_k)\,, \\ \dot{b}_j&= C_{k\ell }(a_kb_\ell -b_k a_\ell )\,, \qquad \dot{b}_k = C_{j\ell }(a_jb_\ell -b_j a_\ell )\,, \qquad \, \dot{b}_\ell = -C_{jk}(a_jb_k+b_j a_k)\,. \end{aligned}\nonumber \\ \end{aligned}$$
(6.6)

For reasons to be made clear shortly, we recombine (6.6) into 4 groups of 3 equations:

$$\begin{aligned} {\left\{ \begin{array}{ll} \dot{a}_j = C_{k\ell }a_ka_\ell \\ \dot{a}_k = C_{j\ell }a_ja_\ell \\ \dot{a}_\ell = -C_{jk}a_ja_k \end{array}\right. } {\left\{ \begin{array}{ll} \dot{a}_j = C_{k\ell }b_kb_\ell \\ \dot{b}_k = C_{j\ell }a_jb_\ell \\ \dot{b}_\ell = -C_{jk}a_jb_k \end{array}\right. } {\left\{ \begin{array}{ll} \dot{b}_j = C_{k\ell }a_kb_\ell \\ \dot{a}_k = C_{j\ell }b_jb_\ell \\ \dot{b}_\ell = -C_{jk}b_ja_k \end{array}\right. } {\left\{ \begin{array}{ll} \dot{b}_j = -C_{k\ell }b_ka_\ell \\ \dot{b}_k = -C_{j\ell }b_ja_\ell \\ \dot{a}_\ell = C_{jk}b_jb_k \end{array}\right. }\,. \end{aligned}$$
(6.7)

Let \(V_{a_ja_ka_\ell }\), \(V_{a_jb_kb_\ell }\), \(V_{b_ja_kb_\ell }\), and \(V_{b_jb_ka_\ell }\) be the vector fields associated to the equations of (6.7) from left to right. For example, \(V_{a_ja_ka_\ell }\) is the vector field on \(\mathbb {R}^\infty \) mapping the \(a_j\) coordinate to \(C_{k\ell }a_ka_\ell \), the \(a_k\) coordinate to \(C_{j\ell }a_ja_\ell \), the \(a_\ell \) coordinate to \(-C_{jk}a_ja_k\), and all other coordinates to 0. These are the splitting vector fields. Our sought-after splitting is

$$\begin{aligned} V&= \sum _{j+k-\ell =0}V_{a_ja_ka_\ell }+V_{a_jb_kb_\ell }+V_{b_ja_kb_\ell }+V_{b_jb_ka_\ell }\,, \end{aligned}$$
(6.8)

where V is the vector field associated to (6.5). As noted earlier, our focus will be on finite truncations of the infinite-dimensional system (6.5). Thus we fix an integer \(N\ge 2\) and define the \(N\text {th}\) Galerkin approximation of (6.5) to be (6.5) with indices restricted to the set

$$\begin{aligned} \mathbb {Z}^2_N&:=\big \{j\in \mathbb {Z}^2_+ : \max \{|j_1|, |j_2|\}\le N\big \}\,. \end{aligned}$$

The splitting (6.8) remains valid in this finite-dimensional setting, bearing in mind that now all indices lie in \(\mathbb {Z}^2_N\). By a slight abuse of notation, we denote the finite-dimensional counterpart of V by V and similarly for the splitting vector fields. Thus our family of splitting vector fields is

$$\begin{aligned} \mathcal {V}&=\left\{ V_{a_ja_ka_\ell }, V_{a_jb_kb_\ell }, V_{b_ja_kb_\ell }, V_{b_jb_ka_\ell } : j,k,\ell \in \mathbb {Z}^2_N \text { and } j+k-\ell =0\right\} . \end{aligned}$$
(6.9)

Since \(\mathbb {Z}^2_N\) has cardinality \(2N(N+1)\) and each index \(j\in \mathbb {Z}^2_N\) has an associated \(a_j\) and \(b_j\) coordinate, these are all vector fields on \(\mathbb {R}^n\), where throughout this section we set \(n:=4N(N+1)\). We also abuse notation by conflating elements j in \(\mathbb {Z}^2_N\) with elements j in \(\{1,\dots ,n/2\}\), which can be formalized via any bijection between the two sets. Moreover, we denote elements of \(\mathbb {R}^n\) by \(q=(a_j,b_j)_{j=1}^{n/2}\). This reflects that the \(a_j\) and \(b_j\) coordinates of q in \(\mathbb {R}^n\) are in one-to-one correspondence with the real and imaginary parts of the jth mode of q.
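The count \(2N(N+1)\) and the dimension \(n=4N(N+1)\) can be checked by direct enumeration. The sketch below is illustrative only: it assumes that \(\mathbb {Z}^2_+\) is the half-lattice consisting of indices with \(j_2>0\), together with those with \(j_2=0\) and \(j_1>0\) (a convention not restated in this section), and the function names are hypothetical.

```python
def half_lattice(N):
    """Indices j = (j1, j2) with max(|j1|, |j2|) <= N in the ASSUMED half-lattice:
    j2 > 0, or j2 == 0 and j1 > 0."""
    return [(j1, j2)
            for j2 in range(N + 1)
            for j1 in range(-N, N + 1)
            if j2 > 0 or j1 > 0]


def dimension(N):
    """Each index carries one a_j and one b_j coordinate."""
    return 2 * len(half_lattice(N))


if __name__ == "__main__":
    for N in (2, 3, 4):
        print(N, len(half_lattice(N)), dimension(N))
```

Under this assumption the row \(j_2=0\) contributes N indices and each of the N rows with \(j_2>0\) contributes \(2N+1\), which gives \(N+N(2N+1)=2N(N+1)\).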

Remark 6.1

There are many possible splittings of a given equation. For the Euler equations, we chose this particular splitting so that both energy and enstrophy are conserved while the dynamics of each splitting vector field remain relatively easy to understand. We could have further decomposed the three-dimensional dynamics in the above splitting into a number of two-dimensional dynamics, similar in spirit to the decomposition into rotations used in Lorenz-96. However, such a decomposition could conserve only one of the energy and the enstrophy, not both.

6.2 Conservation and convergence

The conservative Lorenz-96 dynamics discussed in Sect. 5 conserves Euclidean norm (energy in that case) and therefore remains on whichever sphere it starts on. So too do the flows of each of the splitting vector fields (5.1). We now show a similar thing is true for Galerkin approximations of 2D Euler. Define the energy and enstrophy of \(q=(a_j,b_j)_{j=1}^{n/2}\) by

$$\begin{aligned} E(q) :=\sum _{j\in \mathbb {Z}^2_N}\frac{a_j^2+b_j^2}{|j|^2} \qquad \text {and}\qquad \mathcal {E}(q) :=\sum _{j\in \mathbb {Z}^2_N} a_j^2+b_j^2\,, \end{aligned}$$
(6.10)

respectively (note the aforementioned conflation of j in \(\mathbb {Z}^2_N\) and \(j\in \{1,\dots ,n/2\}\) in the summations). Straightforward computation shows that for all \(j,k,\ell \in \mathbb {Z}^2_N\) with \(j+k-\ell =0\),

$$\begin{aligned} C_{k\ell }+C_{j\ell }-C_{jk}&= \frac{C_{k\ell }}{|j |^2}+\frac{C_{j\ell }}{|k |^2}-\frac{C_{jk}}{|\ell |^2} = 0\,, \end{aligned}$$

which in turn implies that under the dynamics (6.5),

$$\begin{aligned} \partial _tE(q)&= \partial _t\mathcal {E}(q) = 0 \end{aligned}$$

for all \(q\in \mathbb {R}^n\). That is, both energy and enstrophy are conserved by the true dynamics and the set

$$\begin{aligned} \mathcal {Q}_0(E,\mathcal {E}) :=\big \{q\in \mathbb {R}^n : E(q)=E,\ \mathcal {E}(q)=\mathcal {E} \big \}\,. \end{aligned}$$
(6.11)

is invariant under (6.5). This is a well-established property of the 2D Euler equations. Moreover, if we flow by \(V_{a_ja_ka_\ell }\) starting from q for any \(j,k,\ell \in \mathbb {Z}^2_N\) with \(j+k-\ell = 0\), then

$$\begin{aligned} \tfrac{1}{2}\partial _tE(q)&= \frac{a_j\dot{a}_j}{|j|^2}+\frac{a_k\dot{a}_k}{|k|^2}+\frac{a_\ell \dot{a}_\ell }{|\ell |^2} = \bigg (\frac{C_{k\ell }}{|j |^2}+\frac{C_{j\ell }}{|k |^2}-\frac{C_{jk}}{|\ell |^2}\bigg )a_ja_ka_\ell = 0\,, \end{aligned}$$

and similarly \(\partial _t\mathcal {E}(q)=0\). The same computation shows energy and enstrophy are conserved by all of the splitting vector fields in \(\mathcal {V}\), which provides the motivation for recombining (6.6) as (6.7) in the first place. In particular, we have

Proposition 6.2

All of the finite time convergence results of Sect. 4 apply to the random splitting (6.8) of every Galerkin approximation of 2D Euler starting from any initial condition.

Proof

The splitting vector fields are smooth and Assumption 1 is satisfied since every \(\mathcal {V}\)-orbit lies on a sphere, so the conclusions of Theorems 4.1 and 4.5 both hold.\(\square \)
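The coefficient identity underlying the conservation laws of this subsection can be checked numerically. The following sketch assumes the form \(C_{jk}=\frac{\langle j,k^\perp \rangle }{4\pi }\big (\frac{1}{|j|^2}-\frac{1}{|k|^2}\big )\) with \(j^\perp =(j_2,-j_1)\), as read off from (6.30), and the same assumed half-lattice convention for \(\mathbb {Z}^2_+\) (indices with \(j_2>0\), or \(j_2=0\) and \(j_1>0\)). All function names are hypothetical.

```python
import math
from itertools import product


def half_lattice(N):
    # ASSUMED convention for Z^2_+: j2 > 0, or j2 == 0 and j1 > 0.
    return {(j1, j2)
            for j2 in range(N + 1)
            for j1 in range(-N, N + 1)
            if j2 > 0 or j1 > 0}


def norm2(j):
    return j[0] ** 2 + j[1] ** 2


def C(j, k):
    # ASSUMED coefficient, read off from (6.30): <j, k^perp>/(4 pi) * (1/|j|^2 - 1/|k|^2),
    # with k^perp = (k2, -k1).
    cross = j[0] * k[1] - j[1] * k[0]
    return cross / (4 * math.pi) * (1 / norm2(j) - 1 / norm2(k))


def max_defect(N):
    """Largest violation of both identities over all triples with j + k = l."""
    lattice = half_lattice(N)
    worst = 0.0
    for j, k in product(lattice, repeat=2):
        l = (j[0] + k[0], j[1] + k[1])
        if l not in lattice:
            continue
        plain = C(k, l) + C(j, l) - C(j, k)
        weighted = C(k, l) / norm2(j) + C(j, l) / norm2(k) - C(j, k) / norm2(l)
        worst = max(worst, abs(plain), abs(weighted))
    return worst


if __name__ == "__main__":
    print(max_defect(3))
```

Under these assumptions the defect is zero up to floating-point rounding, which is the numerical counterpart of the statement that each splitting vector field conserves both quantities in (6.10).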

6.3 Ergodicity

Fix energy and enstrophy values E and \(\mathcal {E}\) and set \(\mathcal {Q}_0:=\mathcal {Q}_0(E,\mathcal {E})\). The set \(\mathcal {Q}_0\) is an \((n-2)\)-dimensional submanifold of \(\mathbb {R}^n\) where, recall, \(n:=4N(N+1)\); denote its volume form by \(\lambda \). As with conservative Lorenz-96, the \(N\text {th}\) Galerkin approximation of 2D Euler has points q in \(\mathcal Q_0\) whose \(\mathcal {V}\)-orbits are not dense in \(\mathcal Q_0\). For example, any q with exactly one nonzero coordinate is a fixed point of (6.5) and of all the equations (6.7). In this subsection we characterize these points and prove there is exactly one \(\mathcal {V}\)-orbit \(\mathcal {Q}\) on \(\mathcal {Q}_0\) such that \(\lambda (\mathcal {Q})=1\). By a slight abuse of notation we denote the restriction of \(\lambda \) to \(\mathcal {Q}\) by \(\lambda \) as well. We then show there exists a unique \(P_h\)-invariant measure on \(\mathcal {Q}\) – and hence on \(\mathcal {Q}_0\) – that is absolutely continuous with respect to \(\lambda \) on \(\mathcal {Q}_0\).

To make the above statements precise, we begin by enumerating the coordinates of \(q \in \mathbb R^n\) by extending the indices \(j\in \mathbb Z_N^2\) with an element \(\chi \in \{+, -\}\) which denotes the real (\(+\)) or imaginary (−) part of the corresponding mode. Then, for \(\textbf{j} = (j, \chi ) \in \mathbb Z_N^2\times \{+,-\}\), we define the type of such coordinates via the function \(\textrm{T}(\textbf{j}) = \chi \) so that \(q_{\textbf{j}}\) is identified with \(a_j\) if \(\textrm{T}(\textbf{j}) = +\) and with \(b_j\) if \(\textrm{T}(\textbf{j}) = -\). For \(q\in \mathbb R^n\), denote by

$$\begin{aligned} \mathcal A(q):=\big \{\textbf{j} \in \mathbb Z_N^2\times \{+,-\}~:~q_{\textbf{j}}\ne 0\big \} \end{aligned}$$

the set of “active” coordinates. To streamline our analysis, we define the following operation to expand the set \(\mathcal A\):

$$\begin{aligned} \mathcal A\oplus {\varvec{\ell }} :={\left\{ \begin{array}{ll} \mathcal A\cup \{{\varvec{\ell }}\}\qquad &{}\text {if } \ell \in \{j+k,j-k\}\cap \mathbb Z_N^2\text { for }{} \textbf{j}, \textbf{k} \in \mathcal A, C_{jk}\ne 0, \textrm{T}(\textbf{j})\cdot \textrm{T}(\textbf{k}) = \textrm{T}({\varvec{\ell }})\,,\\ \mathcal A&{} \text {else}\,, \end{array}\right. } \end{aligned}$$
(6.12)

where \(\textrm{T}(\textbf{j})\cdot \textrm{T}(\textbf{k})\) is \(+\) if \(\textrm{T}(\textbf{j})= \textrm{T}(\textbf{k})\) and − if \(\textrm{T}(\textbf{j})\ne \textrm{T}(\textbf{k})\). This operation corresponds to extending the nonzero coordinates of q from \({\varvec{j}},{\varvec{k}}\) to \({\varvec{\ell }}\) by letting a triple \(\iota = {\varvec{j} \varvec{k} \varvec{\ell }}\) interact.

We assume that the initial condition is sufficiently nondegenerate, as stated in the following assumption similar to the one made in [24, Thm. 2.1].

Definition 6.3

A point q in \(\mathcal {Q}_0\) is nondegenerate if there exist \(M \in \mathbb N\), \(j^*\in \mathbb Z_N^2\) with \(|j^*|^2>1\), and an ordered set of indices \(({\varvec{\ell }}_i)_{i=1}^M\) in \(\mathbb Z_N^2\times \{+,-\}\) such that

$$\begin{aligned} \big \{(1,0,+),(0,1,+), (j^*,-)\big \}\subseteq \big ((\mathcal A(q)\oplus \varvec{\ell }_1)\oplus \varvec{\ell }_2\big ) \dots \oplus \varvec{\ell }_M. \end{aligned}$$
(6.13)

Definition 6.4

A point in \(\mathbb {R}^n\) is generic if all of its coordinates are nonzero.

Remark 6.5

Every point with all coordinates nonzero is a nonfixed point of conservative Lorenz-96; similarly, every generic point in \(\mathcal {Q}_0\) is nondegenerate. However, comparing (6.13) with (5.2), we see the conditions defining nondegenerate points in \(\mathcal {Q}_0\) are more complicated than the easily characterized nonfixed points of conservative Lorenz-96. The difference is that, unlike spheres in conservative Lorenz-96, there are proper subspaces of \(\mathcal {Q}_0\) which are invariant for our splitting of the Euler dynamics but are not fixed points. One such subspace is the collection of purely real points; another is the purely imaginary points.
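Because the operation \(\oplus \) in (6.12) only ever enlarges the active set, condition (6.13) can be tested by computing the closure of \(\mathcal A(q)\) under \(\oplus \). The sketch below is a minimal illustration under two assumptions not restated in this section: that \(\mathbb {Z}^2_+\) consists of indices with \(j_2>0\), or \(j_2=0\) and \(j_1>0\), and that \(C_{jk}\ne 0\) exactly when \(\langle j,k^\perp \rangle \ne 0\) and \(|j|\ne |k|\) (the form of \(C_{jk}\) suggested by (6.30)). Extended indices are encoded as pairs `((j1, j2), "+")` or `((j1, j2), "-")`; all names are hypothetical.

```python
from itertools import product


def half_lattice(N):
    # ASSUMED convention for Z^2_+: j2 > 0, or j2 == 0 and j1 > 0.
    return {(j1, j2)
            for j2 in range(N + 1)
            for j1 in range(-N, N + 1)
            if j2 > 0 or j1 > 0}


def norm2(j):
    return j[0] ** 2 + j[1] ** 2


def couples(j, k):
    # ASSUMED criterion for C_{jk} != 0: nonparallel indices of different length.
    cross = j[0] * k[1] - j[1] * k[0]
    return cross != 0 and norm2(j) != norm2(k)


def closure(active, N):
    """Smallest superset of `active` that is stable under the operation in (6.12)."""
    lattice = half_lattice(N)
    A = set(active)
    grew = True
    while grew:
        grew = False
        for (j, tj), (k, tk) in product(list(A), repeat=2):
            if not couples(j, k):
                continue
            t = "+" if tj == tk else "-"  # T(j) . T(k)
            for l in ((j[0] + k[0], j[1] + k[1]), (j[0] - k[0], j[1] - k[1])):
                if l in lattice and (l, t) not in A:
                    A.add((l, t))
                    grew = True
    return A


def is_nondegenerate(active, N):
    """Check (6.13): both base modes of type + and some (j*, -) with |j*|^2 > 1."""
    A = closure(active, N)
    base = ((1, 0), "+") in A and ((0, 1), "+") in A
    star = any(t == "-" and norm2(j) > 1 for j, t in A)
    return base and star
```

For instance, a set of purely real active coordinates never produces a coordinate of type \(-\), since the product of two \(+\) types is again \(+\); this matches the invariant set of purely real points mentioned in the remark above.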

The following analogs of Proposition 5.2 and Corollary 5.3 are the main results of this subsection.

Proposition 6.6

Every nondegenerate point in \(\mathcal {Q}_0\) belongs to the same \(\mathcal {V}\)-orbit, \(\mathcal {Q}\), and for all \(h>0\) there exists a unique \(P_h\)-invariant probability measure on \(\mathcal {Q}\). Furthermore, this unique invariant measure is absolutely continuous with respect to the volume form on \(\mathcal {Q}\).

Proof

By Proposition 6.9 there is a \(q^*\) in \(\mathcal {Q}_0\) such that every nondegenerate point in \(\mathcal {Q}_0\) belongs to the \(\mathcal {V}\)-orbit \(\mathcal {Q}:=\mathcal {Q}(q^*)\), and for every q in \(\mathcal {Q}\) there is an \(m \in \mathbb N\) and a \(t \in \mathbb {R}^{mn}_+\) satisfying \(\Phi ^m(q,t)=q^*\). By Lemma 6.15 the splitting vector fields span the tangent space of \(\mathcal {Q}\) at generic points; in particular, the Lie bracket condition holds at every generic point. Thus, since the vector fields in \(\mathcal {V}\) are analytic, Corollary 3.9 implies \(P_h\) has at most one invariant probability measure on \(\mathcal {Q}\), which is necessarily the one identified by Lemma 6.14.\(\square \)

Corollary 6.7

For all \(h>0\) the measure from Proposition 6.6 is the unique \(P_h\)-invariant ergodic probability measure on \(\mathcal {Q}_0\) that is absolutely continuous with respect to the volume form on \(\mathcal {Q}_0\).

Proof

Let \(\lambda \) denote the volume form on \(\mathcal {Q}_0\). Since \(\mathcal {Q}\) contains all generic points in \(\mathcal {Q}_0\), it is an open subset of \(\mathcal {Q}_0\) satisfying \(\lambda (\mathcal {Q})=1\). In particular, the unique invariant measure on \(\mathcal {Q}\) from Proposition 6.6 is an ergodic invariant measure on \(\mathcal {Q}_0\). Since ergodic invariant measures are mutually singular, see e.g. [23], any other ergodic invariant measure on \(\mathcal {Q}_0\) must be singular with respect to \(\lambda \).\(\square \)

Remark 6.8

Continuing in the spirit of Remark 6.1, we observe the splitting in (6.7) splits \(q_j\) into its real and imaginary parts. We could have chosen another basis of \(\mathbb {C}\) and even randomized over this choice for each evolution of an interacting triple \((j,k,\ell )\). More explicitly, if we define \(e(\vartheta )=\cos (\vartheta )+i \sin (\vartheta )\) then \(e(\vartheta )\) and \(e(\vartheta +\frac{\pi }{2})\) form an orthonormal basis of \(\mathbb {C}\) for any \(\vartheta \). Then we can derive a system analogous to (6.7) by setting \(q_\ell = a_\ell ^\vartheta e(\vartheta ) + b_\ell ^\vartheta e(\vartheta +\frac{\pi }{2})\). As the form is similar to (6.7), the results of the paper extend to this system. In particular, by randomizing the choice of \(\vartheta \) for each such triple \((j,k,\ell )\), we can relax the characterization of nondegenerate points in Definition 6.3 by destroying some of the invariant structures discussed in Remark 6.5 which obstruct controllability starting from some initial conditions.

6.3.1 Controllability.

In this section, we prove controllability of the dynamics (6.7). By conservation of energy and enstrophy, the \(\mathcal {V}\)-orbit of an initial condition \(q^{(0)}\) in \(\mathcal {Q}_0\) is contained in \(\mathcal {Q}_0\). Recalling the definition of extended indices in Sect. 6.3, we define the set of interacting coordinate triples

$$\begin{aligned} \mathcal I:=\big \{(\textbf{j},\textbf{k},{\varvec{\ell }}) \in (\mathbb Z_N^2\times \{+,-\})^3~:~ j+k=\ell ,~(C_{jk}, C_{j \ell }, C_{k\ell })\ne (0,0,0),~\textrm{T}(\textbf{j})\cdot \textrm{T}(\textbf{k})=\textrm{T}({\varvec{\ell }})\big \}. \end{aligned}$$

Then, for any such triple of interacting indices \(\iota \in \mathcal I\) we denote by \(\varphi _{t}^\iota ~:~\mathcal Q_0 \rightarrow \mathcal Q_0\) the flow of the ODEs (6.7) evolving the corresponding coordinates. The dynamics we consider is then obtained by cycling through the set \(\mathcal I\) in a fixed or random order. For any \(\iota \in \mathcal I\) we denote by \(\Phi _{t}^\iota ~:~\mathcal Q_0\rightarrow \mathcal Q_0\) the flow of (6.7) after one such full cycle where the flow times are chosen as

$$\begin{aligned} \tau ^\xi ={\left\{ \begin{array}{ll}t\qquad &{}\text {if } \xi = \iota \,,\\ 0 &{} \text {else}\,, \end{array}\right. } \end{aligned}$$
(6.14)

so that for any \(q \in \mathcal Q_0\), \(\Phi _{t}^\iota (q) = \varphi _{t}^\iota (q)\).

Let \(q^* = (a_j^*,b_j^*)_{j=1}^{n/2}\) be the point in \(\mathcal {Q}_0\) defined as follows:

$$\begin{aligned} q_{(1,0)}^* = q_{(0,1)}^* = (a^*,0) \,,\qquad q_{(N,N)}^* = (0,b^*)\,, \end{aligned}$$
(6.15)

for \(a^*, b^*\ge 0\) and \(q_{j}^* = (0,0)\) for all other \(j\in \mathbb Z_N^2\). We show below that for any nondegenerate initial condition \(q^{(0)} \in \mathcal Q_0\) the system can be driven to this unique point \(q^*\).

Proposition 6.9

For any nondegenerate point \(q^{(0)} = (a_j^{(0)},b_j^{(0)})_{j=1}^{n/2}\) in \(\mathcal {Q}_0\) there exists \(M \in \mathbb N\) and a joint sequence of transition times and coordinate triples \(\{(\iota (m),\tau (m))\}_{m = 1}^M\) such that

$$\begin{aligned} \Phi _{\tau (M)}^{\iota (M)}\circ \dots \circ \Phi _{\tau (1)}^{\iota (1)} (q^{(0)}) = q^*\,. \end{aligned}$$
(6.16)

Thus every nondegenerate point belongs to the same orbit, \(\mathcal {Q}:=\mathcal {Q}(q^*)\). Furthermore, for every q in \(\mathcal {Q}\) there is an \(m \in \mathbb N\) and a \(t \in \mathbb {R}^{mn}_+\) such that \(\Phi ^m(q,t)=q^*\).

Recall from Remark 2.3 that the only property of the exponential distribution used in this proof is the fact that it has a density around 0, which allows us to choose the flow of some of the split vector fields to be the identity as, e.g., in (6.14). This comment also applies to the proof of Proposition 5.2 in the previous section. We further note that, since the trajectories of each of the \(\varphi ^{\iota (m)}\) in the above proposition are periodic (see Lemmas 6.11 and 6.12), each of these transformations can be inverted by choosing complementary transition times to \(\tau (m)\). Inverting the order of the transformations yields the converse statement:

Corollary 6.10

For any nondegenerate point \(q^{(0)} = (a_j^{(0)},b_j^{(0)})_{j=1}^{n/2}\) in \(\mathcal {Q}_0\) there exists \(M \in \mathbb N\) and a joint sequence of transition times and coordinate triples \(\{(\tilde{\iota }(m),\tilde{\tau }(m))\}_{m = 1}^M\) such that

$$\begin{aligned} \Phi _{\tilde{\tau }(M)}^{\tilde{\iota }(M)}\circ \dots \circ \Phi _{\tilde{\tau }(1)}^{\tilde{\iota }(1)} (q^*) = q^{(0)}\,. \end{aligned}$$

While Corollary 6.10 will not be used in the remainder of the paper, it offers an alternative to Theorem 3.8 in proving that, when applying Corollary 3.7, it is sufficient to verify that the Lie bracket condition holds at any point in \(\mathcal Q\), not necessarily at \(q^*\).

Proof of Proposition 6.9

We prove the first statement by first evolving the initial condition \(q^{(0)}\) into a sufficiently nondegenerate state \(q^{(1)}\), and then by sequentially shrinking the set of active components of the coordinate vector q to the ones listed in (6.15). We realize this program by following, in order, the sequence of steps described below, represented schematically in Fig. 1:

  1. (0)

    If it is not the case at initialization, Lemma B.1 shows that we can “prepare” our state by evolving \(q^{(0)}\) into \(q^{(1)}\) such that

    $$\begin{aligned} a_{(1,0)}^{(1)}, b_{(1,0)}^{(1)},a_{(0,1)}^{(1)}, b_{(0,1)}^{(1)},a_{(1,1)}^{(1)},b_{(1,1)}^{(1)}\ne 0\,, \end{aligned}$$
    (6.17)

    as represented in Fig. 1a.

  2. (1)

    As shown in Lemma B.2, we can then transform \(q^{(1)}\) into \(q^{(2)}\) with the property

    $$\begin{aligned} q_{j}^{(2)}=(0,0)\qquad \text {for all } j\in \mathbb Z_N^2\setminus \{(0,1),(1,0),(1,1),(N,N),(-N,N)\}\,, \end{aligned}$$
    (6.18)

    as represented in Fig. 1b, and

    $$\begin{aligned} a_{(1,0)}^{(2)}, b_{(1,0)}^{(2)}, a_{(0,1)}^{(2)}, b_{(0,1)}^{(2)},a_{(1,1)}^{(2)},b_{(1,1)}^{(2)} \ne 0\,. \end{aligned}$$
    (6.19)
  3. (2)

    Lemma B.3 shows that we can then “transfer” the amplitude from modes \(a_{(-N,N)}\), \(b_{(-N,N)}\), \(a_{(N,N)}\) to mode \(b_{(N,N)}\) i.e., we can reach a state \(q^{(3)}\) that satisfies

    $$\begin{aligned} q_{j}^{(3)}=(0,0)\qquad \qquad \,&\text {for all } j\in \mathbb Z_N^2\setminus \{(0,1),(1,0),(1,1),(N,N)\}\,, \end{aligned}$$
    (6.20)
    $$\begin{aligned} q_{(N,N)}^{(3)}=(0,b_{(N,N)}^{(3)})\quad&\text {with }b_{(N,N)}^{(3)} \ge 0\,. \end{aligned}$$
    (6.21)

    This state is represented in Fig. 1c.

  4. (3)

    Finally, Lemma B.5 shows that we can “transfer” the amplitude from modes \(a_{(1,1)}\), \(b_{(1,1)}\), \(b_{(0,1)}\) and \(b_{(1,0)}\) to modes \(a_{(0,1)}, a_{(1,0)}, b_{(N,N)}\) so that, after the transfer, \(a_{(0,1)} = a_{(1,0)}\) and \(a_{(0,1)},a_{(1,0)}, b_{(N,N)}>0\) i.e., we reach the unique state \(q^*\) from (6.15) (represented in Fig. 1d).

This proves the first part of Proposition 6.9, which immediately implies nondegenerate points in \(\mathcal {Q}_0\) belong to \(\mathcal {Q}=\mathcal {Q}(q^*)\). Let q be any point in \(\mathcal {Q}\). By definition there exist m and t in \(\mathbb {R}^{mn}\) such that

$$\begin{aligned} \Phi ^m(q,t)&= \varphi ^{(n)}_{t_{mn}}\circ \cdots \circ \varphi ^{(1)}_{t_1}(q) = q^*. \end{aligned}$$

Note that the times \(t_i\) may be negative; however, by Lemmas 6.11 and 6.12 each \(\varphi ^{(i)}\) is periodic. Thus for every \(t_i\le 0\) there exists a \(t_i'>0\) such that \(\varphi ^{(i)}_{t_i}(q') = \varphi ^{(i)}_{t_i'}(q')\) for all \(q'\) in \(\mathcal {Q}\). Let \(t'\) be t with all \(t_i \le 0\) replaced by \(t_i'\). Then \(t'\) is in \(\mathbb {R}^{mn}_+\) and \(\Phi ^m(q,t')=\Phi ^m(q,t)=q^*\).\(\square \)

Fig. 1. Representation of the state of the network in a generic initial state (a), and after step 1 (b), step 2 (c), and step 3 (d) of the procedure in the proof of Proposition 6.9. In the above pictures, each point corresponds to a mode, i.e., an element of \(\mathbb {Z}^2_N\), while the color of each circle represents the real/complex value of the corresponding mode: zero (white, no circle), purely imaginary (red), purely real (blue), or having both nonvanishing real and imaginary parts (green).

Defining similarly to (6.12) the operation of removing a coordinate from the set \(\mathcal A\)

$$\begin{aligned} \mathcal A\ominus {\varvec{\ell }} = {\left\{ \begin{array}{ll} \mathcal A{\setminus } \{{\varvec{\ell }}\}&{}\quad \text {if } \ell \in \{j+k,j-k\}\cap \mathbb Z_N^2\text { for }{} \textbf{j}, \textbf{k} \in \mathcal A, C_{jk}\ne 0, \textrm{T}(\textbf{j})\cdot \textrm{T}(\textbf{k}) = \textrm{T}({\varvec{\ell }}),\\ \mathcal A&{} \quad \text {else}, \end{array}\right. } \end{aligned}$$
(6.22)

we now proceed to construct (sequences of) times \(\tau \) and interacting triples \(\iota \) such that the transformations \(\Phi _\tau ^{\iota }\) of q implement the operations \(\oplus , \ominus \) from (6.12), (6.22) through the flow of (6.7), i.e., such that \(\mathcal A(q) \oplus {\varvec{\ell }} = \mathcal A(\Phi _\tau ^{\iota }(q))\) or \(\mathcal A(q) \ominus {\varvec{\ell }} = \mathcal A(\Phi _\tau ^{\iota }(q))\) respectively. To do so we separate the possible interactions between the modes into two types:

$$\begin{aligned} a)&\quad \iota = {\varvec{j} \varvec{k} \varvec{\ell }}\in \mathcal I~:~ |j|\ne |k|\ne |\ell |\,, \nonumber \\ b)&\quad \iota = {\varvec{j} \varvec{k} \varvec{\ell }}\in \mathcal I~:~ |j|= |k|\ne |\ell |\,. \end{aligned}$$
(6.23)

Note that these two types of interactions are exhaustive, since if \(|j|=|k|=|\ell |\), \(C_{j\ell } = C_{jk} = C_{k\ell } = 0\).

The following preparatory lemmas describe the properties of these two types of interactions that we will leverage throughout our proof. The first one shows that for interactions of type a), ordering the indices so that \(|j|<|k|<|\ell |\), it is always possible to activate all modes \({\varvec{j}}, {\varvec{k}}, {\varvec{\ell }}\) or to distribute the amplitude of the k-mode to the j and \(\ell \)-modes reaching, in finite time, a state with \(q_{\varvec{k}}=0\). As we show in the proof below, while such a point with \(q_{\varvec{k}}=0\) always exists on the orbits of (6.7), this point is reachable in finite time for \(\iota = {\varvec{j} \varvec{k} \varvec{\ell }}\in \mathcal I\) with \(|j|< |k|< |\ell |\) only if

$$\begin{aligned} E_{\iota }(q) \ne |k|^2 \mathcal E_{\iota }(q)\,, \end{aligned}$$
(6.24)

where \(E_{\iota }(q)\) and \(\mathcal E_{\iota }(q)\) denote, respectively, the enstrophy and the energy of the coordinates in \(\iota \in \mathcal I\) (note that the roles of the two symbols are interchanged relative to (6.10)):

$$\begin{aligned} E_{\iota }(q) :=\sum _{{\varvec{\ell }}\in \iota } |q_{\varvec{\ell }}|^2\,,\qquad \mathcal E_{\iota }(q) :=\sum _{{\varvec{\ell }}\in \iota } \frac{|q_{\varvec{\ell }}|^2}{|\ell |^2}\,. \end{aligned}$$
(6.25)
Fig. 2. Orbits \(\mathcal Q_\iota \) of (6.7) (in red) corresponding in (A) to various values of the energy \(\mathcal E_\iota (q)\) on the sphere of constant enstrophy \(E_\iota (q)\) and in (B) to various values of the enstrophy \(E_\iota (q)\) on the ellipsoid of constant energy \(\mathcal E_\iota (q)\). The axes are, sequentially, \(q_{\varvec{k}}, q_{\varvec{j}}, q_{\varvec{\ell }}\). The orbit with a degenerate point at the pole of the sphere or ellipsoid corresponds to values of \(E_\iota , \mathcal E_\iota \) violating (6.24).

In the following lemma and throughout the section, we abuse notation slightly by defining \(\text {sign}(x) = +1\) for \(x \in [0,\infty )\) and \(-1\) otherwise.

Lemma 6.11

Fix \(\iota = {\varvec{j} \varvec{k} \varvec{\ell }}\in \mathcal I\) with \(|j|<|k|<|\ell |\). Let q be a nondegenerate point in \(\mathcal {Q}_0\) satisfying (6.24) and let \(q_{\varvec{l}}= 0\) for at most one index \({\varvec{l}}\in \{{\varvec{j}},{\varvec{k}},{\varvec{\ell }}\}\). Then the orbit of \(V_\iota \) is periodic and there exist \(\tau _-^\iota , \tau _+^\iota \ge 0\) such that

  1. (a)

    \(\varphi _{\tau _-^\iota }^\iota (q) = q'\) with \(q_{\varvec{k}}' = 0\), \(\textrm{sign}(q_{\varvec{j}}) = \textrm{sign}(q_{\varvec{j}}')\) and \(\textrm{sign}(q_{\varvec{\ell }}) = \textrm{sign}(q_{\varvec{\ell }}')\),

  2. (b)

    \(\varphi _{\tau _+^\iota }^\iota (q) = q''\) with \(q_{\varvec{j}}'', q_{\varvec{k}}'', q_{\varvec{\ell }}'' \ne 0\), \(\textrm{sign}(q_{\varvec{j}}) = \textrm{sign}(q_{\varvec{j}}'')\) and \(\textrm{sign}(q_{\varvec{\ell }}) = \textrm{sign}(q_{\varvec{\ell }}'')\).

Furthermore, if \(|j|^2 \mathcal E_{\iota }(q)< E_{\iota }(q)< |k|^2 \mathcal E_{\iota }(q)\), there exists \(\tau _=^\iota \ge 0\) such that

  1. (c)

    \(\varphi _{\tau _=^\iota }^\iota (q) = q'''\) with \(q_{\varvec{\ell }}''' = 0\), \(\textrm{sign}(q_{\varvec{j}}) = \textrm{sign}(q_{\varvec{j}}''')\) and \(\textrm{sign}(q_{\varvec{k}}) = \textrm{sign}(q_{\varvec{k}}''')\).

Proof

We consider the intersection between the sphere and the ellipse corresponding to the enstrophy and the energy in the coordinates \(\iota = {\varvec{j} \varvec{k} \varvec{\ell }}\in \mathcal I\) of interest, resulting in the set

$$\begin{aligned} \mathcal Q_\iota :=\left\{ (q_{\varvec{j}}',q_{\varvec{k}}',q_{\varvec{\ell }}') \in \mathbb R^3~:~|q_{\varvec{j}}'|^2+|q_{\varvec{k}}'|^2+|q_{\varvec{\ell }}'|^2 = E_{\iota }(q), \frac{|q_{\varvec{j}}'|^2}{|j|^2}+\frac{|q_{\varvec{k}}'|^2}{|k|^2}+\frac{|q_{\varvec{\ell }}'|^2}{|\ell |^2} = \mathcal E_{\iota }(q)\right\} \,. \end{aligned}$$
(6.26)

This set is represented in Fig. 2. We observe that this set has exactly 2 disjoint connected components when \(|j|^2 \mathcal E_{\iota }(q)< E_{\iota }(q)< |k|^2 \mathcal E_{\iota }(q)\) or when \(|k|^2 \mathcal E_{\iota }(q)< E_{\iota }(q)< |\ell |^2 \mathcal E_{\iota }(q)\). These components are diffeomorphic to \(S^1\). By continuity the dynamics are confined to one such component of \(\mathcal Q_\iota \). Furthermore, \(|\dot{q}|^2\) is uniformly bounded away from 0 on each such component: the fixed points of (6.7) must have at least two coordinates vanishing, which cannot be realized on the curves of interest. Therefore the dynamics on these sets are periodic.

We start by proving part (b) of the lemma. If \(q_{\varvec{j}}, q_{\varvec{k}}, q_{\varvec{\ell }}\ne 0\) the result follows by choosing \(\tau _+^\iota =0\). Else, if \(q_{\varvec{l}}=0\) for \({\varvec{l}}\in \iota \) the result follows immediately choosing \(\tau _+^\iota \) small enough by combining the continuity of the flow \(\Phi _t^\iota \) and the fact that \(\dot{q}_{\varvec{l}}= C_{l'l''}q_{{\varvec{l}}'}q_{{\varvec{l}}''}\ne 0\) for \(\{{\varvec{l}}',{\varvec{l}}''\} = \iota \setminus \{{\varvec{l}}\}\).

To prove part (a) we consider the cases where \(|j|^2 \mathcal E_{\iota }(q)< E_{\iota }(q)< |k|^2 \mathcal E_{\iota }(q)\) and \(|k|^2 \mathcal E_{\iota }(q)< E_{\iota }(q)< |\ell |^2 \mathcal E_{\iota }(q)\) separately. In the first case, we see that there is no point \(q \in \mathcal Q_\iota \) with \(q_{\varvec{j}}= 0\): if that were the case we would have

$$\begin{aligned} E_{\iota }(q) = q_{\varvec{k}}^2+q_{\varvec{\ell }}^2 = |k|^2\left( \frac{q_{\varvec{k}}^2}{|k|^2}+\frac{q_{\varvec{\ell }}^2}{|k|^2}\right) > |k|^2 \mathcal E_\iota (q)\,, \end{aligned}$$
(6.27)

contradicting our assumption. Consequently the points \((p_{\varvec{j}},0,p_{\varvec{\ell }}), (p_{\varvec{j}},0,-p_{\varvec{\ell }})\) with \(p_{\varvec{\ell }}>0\), \(\textrm{sign}(p_{\varvec{j}})=\textrm{sign}(q_{\varvec{j}})\) and

$$\begin{aligned} p_{\varvec{j}}^2 + p_{\varvec{\ell }}^2 = E_\iota (q)\,,\qquad \frac{p_{\varvec{j}}^2}{|j|^2} + \frac{p_{\varvec{\ell }}^2}{|\ell |^2} = \mathcal E_\iota (q)\,, \end{aligned}$$
(6.28)

belong to the same connected component as q and by the lower bound on the velocity on this connected component both these points are reachable in finite time from q. This also proves part (c) by continuity of the dynamics. The second case where \(|k|^2 \mathcal E_{\iota }(q)< E_{\iota }(q)< |\ell |^2 \mathcal E_{\iota }(q)\) can be handled analogously: in this case we have \(\mathcal Q_\iota \cap \{q_{\varvec{\ell }}=0\} = \emptyset \) and we can reach \((p_{\varvec{j}},0,p_{\varvec{\ell }}), (-p_{\varvec{j}},0,p_{\varvec{\ell }})\) with \(p_{\varvec{j}}>0\), \(\textrm{sign}(p_{\varvec{\ell }})=\textrm{sign}(q_{\varvec{\ell }})\) in finite time.\(\square \)
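The two ingredients of this proof, conservation of both invariants along the flow and the explicit location of the points with \(q_{\varvec{k}}=0\), can be illustrated numerically. The sketch below integrates the first system in (6.7) with a classical fourth-order Runge–Kutta step (a numerical stand-in for the exact flow, not part of the argument) and solves the linear system (6.28) for \(p_{\varvec{j}}^2\) and \(p_{\varvec{\ell }}^2\). It assumes the form of \(C_{jk}\) suggested by (6.30); the triple, the initial point, and all function names are hypothetical choices made for illustration.

```python
import math


def norm2(m):
    return m[0] ** 2 + m[1] ** 2


def C(j, k):
    # ASSUMED coefficient, read off from (6.30).
    cross = j[0] * k[1] - j[1] * k[0]
    return cross / (4 * math.pi) * (1 / norm2(j) - 1 / norm2(k))


def rhs(q, c):
    # First system in (6.7): (a_j, a_k, a_l) with c = (C_kl, C_jl, C_jk).
    qj, qk, ql = q
    ckl, cjl, cjk = c
    return (ckl * qk * ql, cjl * qj * ql, -cjk * qj * qk)


def rk4_step(q, c, dt):
    def shift(u, v, s):
        return tuple(a + s * b for a, b in zip(u, v))
    k1 = rhs(q, c)
    k2 = rhs(shift(q, k1, dt / 2), c)
    k3 = rhs(shift(q, k2, dt / 2), c)
    k4 = rhs(shift(q, k3, dt), c)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * d + e)
                 for x, a, b, d, e in zip(q, k1, k2, k3, k4))


def invariants(q, idx):
    # (E_iota, calE_iota) as defined in (6.25): plain and |l|^-2-weighted sums of squares.
    plain = sum(x * x for x in q)
    weighted = sum(x * x / norm2(m) for x, m in zip(q, idx))
    return plain, weighted


def zero_k_point_squares(plain, weighted, j, l):
    # Solve the 2x2 linear system (6.28) for (p_j^2, p_l^2).
    nj, nl = norm2(j), norm2(l)
    pj2 = nj * (nl * weighted - plain) / (nl - nj)
    pl2 = nl * (plain - nj * weighted) / (nl - nj)
    return pj2, pl2


def run(q0, idx, dt=0.01, steps=5000):
    j, k, l = idx
    c = (C(k, l), C(j, l), C(j, k))
    q = tuple(q0)
    for _ in range(steps):
        q = rk4_step(q, c, dt)
    return q


if __name__ == "__main__":
    idx = ((1, 0), (1, 1), (2, 1))  # |j| < |k| < |l| and j + k = l
    q0 = (1.0, 0.5, 0.3)
    print(invariants(q0, idx), invariants(run(q0, idx), idx))
    print(zero_k_point_squares(*invariants(q0, idx), idx[0], idx[2]))
```

The closed form used in `zero_k_point_squares` shows that both squares are positive exactly when \(|j|^2 \mathcal E_{\iota }(q)< E_{\iota }(q)< |\ell |^2 \mathcal E_{\iota }(q)\), so the points in (6.28) exist in both cases treated above.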

The following lemma considers interactions of type b) in (6.23). Recalling the definition \(j^\perp :=(j_2, -j_1)\) we show that interactions with \(|j|=|k|\ne |\ell |\) leave component \({\varvec{\ell }}\) fixed and move \({\varvec{j}}, {\varvec{k}}\) in a circle at constant angular speed.

Lemma 6.12

Fix an unordered interacting triple \(\iota = {\varvec{j} \varvec{k} \varvec{\ell }}\) with \(|k| = |j|\) and \(q_{\varvec{\ell }}\ne 0\). For all \(\vartheta \) in \([0,2\pi )\) there exists \(t\ge 0\) such that \(\varphi _{t}^\iota (q) = q'\) with \((q_{\varvec{j}}',q_{\varvec{k}}') = \sqrt{q_{\varvec{j}}^2+q_{\varvec{k}}^2}(\cos (\vartheta ), \sin (\vartheta ))\) and \(q_{\varvec{\ell }}' = q_{\varvec{\ell }}\,\).

Corollary 6.13

Fix an (unordered) interacting triple \(\iota = {\varvec{j} \varvec{k} \varvec{\ell }}\in \mathcal I\) with \(|k| = |j|\) and let \(q_{\varvec{\ell }}, q_{\varvec{k}}\ne 0\). Then there exist \(\tau _+^{\iota }, \tau _-^{\iota } \ge 0 \) such that \((\varphi _{\tau _+^\iota }^{\iota } (q))_{\textbf{j}} > 0\) and \((\varphi _{\tau _-^\iota }^{\iota } (q))_{\textbf{j}} = 0\) .

Proof of Lemma 6.12

Recall from (6.3) that if \(|j|=|k|\ne |\ell |\) we have \(C_{jk} = 0\). Since \(|k| = |j|\) by assumption, this implies \(\dot{q}_{\varvec{\ell }}= 0\) and hence \(q'_{\varvec{\ell }}= q_{\varvec{\ell }}\). Moreover, since an interacting triple satisfies \(\ell = j+k\), we have

$$\begin{aligned} {\langle k^\perp ,\ell \rangle } = {\langle k^\perp ,k+j\rangle } = {\langle k^\perp ,j\rangle } = {\langle (k + j)^\perp - j^\perp ,j\rangle } = {\langle \ell ^\perp ,j\rangle } = -{\langle j^\perp ,\ell \rangle }\,, \end{aligned}$$
(6.29)

so that

$$\begin{aligned} C_{k\ell } = \frac{\langle k,\ell ^\perp \rangle }{4\pi }\bigg (\frac{1}{|k|^2}-\frac{1}{|\ell |^2}\bigg ) = -\frac{\langle j,\ell ^\perp \rangle }{4\pi }\bigg (\frac{1}{|j|^2}-\frac{1}{|\ell |^2}\bigg )=-C_{j\ell }\,. \end{aligned}$$
(6.30)

This implies that the dynamics of the vector \(\tilde{q} :=(q_{\varvec{j}},q_{\varvec{k}})\) can be written as \(\dot{\tilde{q}} = \tilde{C} \tilde{q}^{\perp }\) for \(\tilde{C} :=C_{j\ell }q_{\varvec{\ell }}\ne 0\), proving the claim.\(\square \)
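As a worked instance of this argument, consider the first system in (6.7) with \(|j|=|k|\), so that \(C_{jk}=0\) and \(C_{k\ell }=-C_{j\ell }\). The symbols \(\omega \), r and \(\theta _0\) below are introduced for illustration only: set \(\omega :=C_{j\ell }a_\ell \), which is constant along the flow and assumed nonzero, and write \((a_j(0),a_k(0)) = r(\cos \theta _0,\sin \theta _0)\). The equations then reduce to a planar rotation,

$$\begin{aligned} \dot{a}_j = -\omega a_k\,,\quad \dot{a}_k = \omega a_j \qquad \Longrightarrow \qquad \big (a_j(t),a_k(t)\big ) = r\big (\cos (\theta _0+\omega t),\sin (\theta _0+\omega t)\big )\,. \end{aligned}$$

A prescribed angle \(\vartheta \) is therefore attained at any \(t\ge 0\) with \(\theta _0+\omega t \equiv \vartheta \pmod {2\pi }\), and the flow is periodic with period \(2\pi /|\omega |\). The remaining three systems in (6.7) are expected to reduce in the same way, up to signs.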

6.3.2 Existence of invariant measure.

As with conservative Lorenz-96, each vector field of the 2D Euler splitting is divergence free and so Lebesgue measure in \(\mathbb {R}^n\) is invariant. Consequently, we have

Lemma 6.14

Let \(\lambda \) denote the Lebesgue measure on \(\mathbb R^n\). The measure obtained by conditioning \(\lambda \) to lie on \(\mathcal Q \subset \mathcal Q_0(E,\mathcal {E})\) (or equivalently conditioned to lie on \(\mathcal Q_0(E,\mathcal {E})\)) is \(P_h\)-invariant.

Proof

As in the proof of Proposition 5.2 we have that Lebesgue measure in \(\mathbb R^n\) is \(P_h\)-invariant. Since the vector fields \(V_k\) defined in (6.9) are divergence free, the continuity equation (see Footnote 7) reads

$$\begin{aligned} \partial _t\lambda + {{\,\textrm{div}\,}}\left( V_k\lambda \right) = \partial _t\lambda + \nabla \lambda \cdot V_k = 0\,. \end{aligned}$$
(6.31)

Because each flow \(\varphi ^{(k)}\) conserves energy E and enstrophy \(\mathcal E\), we locally fiber \(\mathbb R^n\) using coordinates \((E, \mathcal E, \vartheta ) \in \mathbb R_+\times \mathbb R_+ \times \mathbb R^{n-2}\). In these coordinates, we have \(V_k(E, \mathcal E, \vartheta ) = 0\, \partial _E + 0\, \partial _{\mathcal E} + v_k(E, \mathcal E, \vartheta ) \nabla _\vartheta \) so by a change of coordinates of the divergence operator the stationary equation becomes

$$\begin{aligned} 0 = {{\,\textrm{div}\,}}\left( V_k(x)\lambda (x)\right) = u(E, \mathcal E, \vartheta ) {{\,\textrm{div}\,}}_\vartheta (\lambda (E, \mathcal E, \vartheta ) v_k(E, \mathcal E, \vartheta ))\,, \end{aligned}$$
(6.32)

where \({{\,\textrm{div}\,}}_\vartheta \) denotes the “angular” terms of the divergence in \((E, \mathcal E, \vartheta )\)-coordinates, and \(u(E, \mathcal E, \vartheta )\) results from the change of variables. Hence, we can factor the solution \(\lambda (E, \mathcal E, \vartheta ) = \bar{\lambda }(\vartheta |E, \mathcal E) \cdot \lambda ^\perp (E, \mathcal E)\), where \( \bar{\lambda }(\vartheta |E, \mathcal E)\) is the conditional density of Lebesgue measure on a fiber, solving \(u(E, \mathcal E, \vartheta ) {{\,\textrm{div}\,}}_\vartheta (\bar{\lambda }(\vartheta |E, \mathcal E) v_k(E, \mathcal E, \vartheta )) = 0\) for any choice of \(E/(2N^2)< \mathcal E < E\). This proves the invariance of \(\bar{\lambda }(\vartheta |E, \mathcal E)\) under the flow map for any value of the flow times \(\tau \). The stationarity of \(\bar{\lambda }(\vartheta )\) under \(P_h\) follows immediately as in Proposition 5.2.\(\square \)
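The divergence-free property used in this proof can be verified directly from (6.7). For the first splitting vector field, with distinct indices \(j,k,\ell \), each component is independent of the coordinate it drives, so that

$$\begin{aligned} {{\,\textrm{div}\,}}V_{a_ja_ka_\ell } = \partial _{a_j}\big (C_{k\ell }a_ka_\ell \big )+\partial _{a_k}\big (C_{j\ell }a_ja_\ell \big )+\partial _{a_\ell }\big (-C_{jk}a_ja_k\big ) = 0\,. \end{aligned}$$

The same term-by-term computation applies to the other three systems in (6.7), since in each of them the right-hand side of the equation for a given coordinate does not involve that coordinate.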

6.3.3 Spanning.

For \(j,k,\ell \in \mathbb {Z}^2_N\) with \(j+k-\ell =0\) define \(M_{jk\ell }\) to be the matrix

(6.33)

and let \(M_{jk\ell }'\) and \(M_{jk\ell }''\) be the 4-by-4 and 2-by-4 matrices consisting of the bottom four and bottom two rows of \(M_{jk\ell }\), respectively. Straightforward Gaussian elimination shows that \(M_{jk\ell }\), \(M_{jk\ell }'\), and \(M_{jk\ell }''\) have ranks 4, 3, and 2, respectively, whenever \(C_{jk}\), \(C_{j\ell }\), \(C_{k\ell }\), \(a_j\), \(b_j\), \(a_k\), \(b_k\), \(a_\ell \), and \(b_\ell \) are nonzero.

Recalling that a point \(q\in \mathbb {R}^n\) is generic if all its coordinates are nonzero, we have

Lemma 6.15

The family of vector fields

$$\begin{aligned} \mathcal {V}&:=\big \{V_{a_ja_ka_\ell }, V_{a_jb_kb_\ell }, V_{b_ja_kb_\ell }, V_{b_jb_ka_\ell } : j,k,\ell \in \mathbb {Z}^2_N\ \text {and}\ j+k-\ell =0\big \} \end{aligned}$$

span \(T_q\mathcal {Q}\) at every generic point q in \(\mathcal {Q}\).

Proof

Fix a generic point q in \(\mathcal {Q}\). The main idea of the proof is to choose an enumeration of \(\mathbb {Z}^2_N\) and a subset of vector fields from \(\mathcal {V}\) so that the matrix made up of these vector fields evaluated at q is in a convenient form whose rank is readily deduced. Formally, the enumeration is the bijection \(F:\mathbb {Z}^2_N\rightarrow \{1,\dots ,2N(N+1)\}\) given by

$$\begin{aligned} F(j)&:={\left\{ \begin{array}{ll} 1 &{} \quad j=(1,0)\,, \\ 5+N &{} \quad j=(2,0)\,, \\ j_1+N(2N+1) &{} \quad j=(j_1,0)\ \text {with}\ j_1>2\,, \\ j_1+2+N &{} \quad j=(j_1,1)\ \text {with}\ j_1<3\,, \\ j_1+3+N &{} \quad j=(j_1,1)\ \text {with}\ j_1\ge 3\,, \\ j_1+2-N+(2N+1)j_2 &{} \quad j=(j_1,j_2)\ \text {with}\ j_2>1\,. \end{array}\right. } \end{aligned}$$

Figure 2 gives this enumeration in the case \(N=4\). Informally, F starts at (1, 0), then counts lattice points from left to right along the horizontal line \(y=1\) until the point (2, 1), which corresponds to \(4+N\). It then assigns \(5+N\) to (2, 0) and continues counting along the line \(y=1\). From there it moves up to the lines \(y=2\), \(y=3\), and so on, counting from left to right along each. Finally, it goes back down to the line \(y=0\) and counts the remaining indices from left to right.
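The enumeration can be checked mechanically. The following is a minimal Python sketch that transcribes the piecewise definition of F and tests that it is a bijection onto \(\{1,\dots ,2N(N+1)\}\). It assumes, since the definition is not restated in this section, that \(\mathbb {Z}^2_N\) consists of the points \((j_1,0)\) with \(1\le j_1\le N\) together with the points \((j_1,j_2)\) with \(|j_1|\le N\) and \(1\le j_2\le N\); the function names `lattice`, `enumerate_index`, and `is_bijection` are ours.

```python
def lattice(N):
    """Assumed index set Z^2_N: upper half of the square [-N, N]^2, origin excluded."""
    pts = [(j1, 0) for j1 in range(1, N + 1)]
    pts += [(j1, j2) for j2 in range(1, N + 1) for j1 in range(-N, N + 1)]
    return pts


def enumerate_index(j, N):
    """Direct transcription of the piecewise definition of F(j)."""
    j1, j2 = j
    if j == (1, 0):
        return 1
    if j == (2, 0):
        return 5 + N
    if j2 == 0:  # remaining points (j1, 0) with j1 > 2
        return j1 + N * (2 * N + 1)
    if j2 == 1:
        return j1 + 2 + N if j1 < 3 else j1 + 3 + N
    return j1 + 2 - N + (2 * N + 1) * j2  # rows with j2 > 1


def is_bijection(N):
    """True if F hits every integer in 1..2N(N+1) exactly once."""
    values = sorted(enumerate_index(j, N) for j in lattice(N))
    return values == list(range(1, 2 * N * (N + 1) + 1))
```

For instance, `enumerate_index((2, 1), N)` returns \(4+N\) and `enumerate_index((2, 0), N)` returns \(5+N\), matching the informal description above.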

The motivation for F is that all horizontally-adjacent indices \((j_1,j_2)\) and \((j_1+1,j_2)\) form an interacting triple together with (1, 0). Fix for the moment an integer \(y>1\) and consider the yth horizontal line of \(\mathbb {Z}^2_N\); that is, the points with second coordinate y. These are outlined by red blocks in Fig. 3. By the preceding remarks we can choose the vector fields corresponding to the horizontally-adjacent indices and concatenate them column-wise to get the block matrix

$$\begin{aligned} B_y :=\left( \begin{array}{c|c|c|c} \widetilde{M}_{{y}} &{} * &{} * &{} * \\ \hline 0 &{} M_{{y, -N+2}}'' &{} * &{} * \\ \hline 0 &{} 0 &{} \ddots &{} * \\ \hline 0 &{} 0 &{} 0 &{} M_{{y,N}}'' \end{array}\right) . \end{aligned}$$

Here, slightly abusing notation, each \(M_{y,i}''\) is the 2-by-4 matrix consisting of the bottom two rows of (6.33) for the indices \(j = (1,0), k= (i-1,y), \ell = (i,y)\) and

$$\begin{aligned} \widetilde{M}_y&:=\begin{pmatrix} C_{j\ell }a_ja_\ell &{} \quad 0 &{} \quad C_{j\ell }b_jb_\ell &{} \quad 0 &{} \quad 0 &{} \quad 0 \\ 0 &{} \quad C_{j\ell }a_jb_\ell &{} \quad 0 &{} \quad -C_{j\ell }b_ja_\ell &{} \quad 0 &{} \quad 0 \\ -C_{jk}a_ja_k &{} \quad 0 &{} \quad 0 &{} \quad C_{jk}b_jb_k &{} \quad -C_{j'k'}a_{j'}a_{k'} &{} \quad 0 \\ 0 &{} \quad -C_{jk}a_jb_k &{} \quad -C_{jk}b_ja_k &{} \quad 0 &{} \quad 0 &{} \quad -C_{j'k'}a_{j'}b_{k'} \end{pmatrix} \end{aligned}$$

where \(j=(1,0)\), \(k=(-N,y)\), \(\ell =(-N+1,y)\), \(j'=(0,1)\), and \(k'=(-N+1,y-1)\). This is \(M'\) with two columns from the interacting triple \((0,1), (-N+1,y-1), (-N+1,y)\) adjoined to the end. Note that these adjoined columns also contribute entries in the coordinates corresponding to (0, 1) and \((-N+1,y-1)\), but these come before all indices in the yth row for our ordering. With the two adjoined columns, \(\widetilde{M}_y\) has rank 4 at any generic point. Further, since each \(M_{{y,i}}''\) has rank 2, each \(B_y\) has rank \(4+2(2N-1) = 4N+2\). This establishes spanning of the red blocks in Fig. 3.

For the blue block we perform a similar procedure to the one above to get

$$\begin{aligned} B_1 :=\left( \begin{array}{c|c|c|c|c|c} M_{123} &{} * &{} * &{} * &{} * &{} * \\ \hline 0 &{} M_{{1,-N+2}}'' &{} * &{} * &{} * &{} * \\ \hline 0 &{} 0 &{} \ddots &{} * &{} * &{} * \\ \hline 0 &{} 0 &{} 0 &{} \widehat{M} &{} * &{} * \\ \hline 0 &{} 0 &{} 0 &{} 0 &{} \ddots &{} * \\ \hline 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} M_{{1,N}}'' \end{array}\right) \end{aligned}$$

where \(M_{123}\) is the matrix from (6.33) for the interacting triple \((1,0), (-N,1), (-N+1,1)\), each \(M''\) is as before, and \(\widehat{M}\) is the 6-by-8 matrix

$$\begin{aligned} \widehat{M}&:=\left( \begin{array}{ccc|cc} &{} \begin{array}{lll} &{} \quad &{} \quad \\ &{} \quad M_{1,N+3,N+4}' &{} \quad \\ &{} \quad &{} \quad \end{array} &{} \quad &{} \quad \begin{array}{llll} 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 \\ 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 \\ --- &{} \quad --- &{} \quad --- &{} \quad --- \end{array} \\ &{} \begin{array}{llll} --- &{} \quad --- &{} \quad --- &{} \quad --- \\ 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 \\ 0 &{} \quad 0 &{} \quad 0 &{} \quad 0 \\ \end{array} &{} &{} \begin{array}{lll} &{} &{} \\ &{} M_{N+2,N+4,N+5}' &{} \\ &{} &{} \end{array} \end{array}\right) \end{aligned}$$

located at the rows corresponding to \(N+3, N+4\), and \(N+5\). The reason for \(\widehat{M}\), and for considering the blue block separately, is that \(C_{jk}=0\) when \(j=(1,0)\) and \(k=(0,1)\). At a generic point the matrix \(\widehat{M}\) has rank 6. Since \(M_{123}\) has rank 4 and each of the \(2N-3\) remaining \(M''\) blocks has rank 2, the matrix \(B_1\) has rank \(4+6+2(2N-3) = 4N+4\).

Finally, none of the indices of the green block interact with (1, 0) since the \(C_{jk}\) are all 0 in this case. However, by an entirely similar procedure to the one above, we can use the interactions between (0, 1), (x, 0), and (x, 1) for \(x>2\) to get a rank \(2(N-2)\) block matrix for the coordinates of the last \(N-2\) indices, of the form

$$\begin{aligned} B_{N+1} :=\left( \begin{array}{c|c|c|c} \tilde{M}_{{0,3}}'' &{} * &{} * &{} * \\ \hline 0 &{} \tilde{M}_{{0,4}}'' &{} * &{} * \\ \hline 0 &{} 0 &{} \ddots &{} * \\ \hline 0 &{} 0 &{} 0 &{} \tilde{M}_{{0,N}}'' \end{array}\right) \end{aligned}$$

where \(\tilde{M}_{{0,x}}'' = M_{(0,1),(x,0),(x,1)}''\) for \(M_{jk\ell }''\) consisting of the two bottom rows of (6.33). Combining the above results we observe that there is an ordering of indices and vector fields such that the matrix whose columns consist of these vector fields has the form

$$\begin{aligned} B :=\left( \begin{array}{c|c|c|c} B_1 &{} * &{} * &{} * \\ \hline 0 &{} B_2 &{} * &{} * \\ \hline 0 &{} 0 &{} \ddots &{} * \\ \hline 0 &{} 0 &{} 0 &{} B_{N+1} \end{array}\right) . \end{aligned}$$

Moreover, B has rank

$$\begin{aligned} \text {rank}(B)&= \text {rank}(B_1) + \text {rank}(B_{N+1}) + \sum _{y=2}^N \text {rank}(B_y) = 4N(N+1)-2 = n-2 \end{aligned}$$

at every generic point in \(\mathcal {Q}\). Now since the dynamics conserve energy and enstrophy, every tangent vector to \(\mathcal {Q}\) is perpendicular to the normal vectors for these two quantities which are linearly independent at every generic point. Therefore the maximum dimension of \(T_q\mathcal {Q}\) is \(n-2\), and by the above argument we have shown the vector fields \(\mathcal {V}\) span \(T_q\mathcal {Q}\) at q. \(\square \)
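The rank count in the last display is bookkeeping over the diagonal blocks. As a sanity check, the Python sketch below adds the block ranks stated in the proof (rank \(4N+4\) for the blue block \(B_1\), rank \(4N+2\) for each red block \(B_y\) with \(2\le y\le N\), and rank \(2(N-2)\) for the green block \(B_{N+1}\)) and compares the total with \(n-2=4N(N+1)-2\). The function names are ours.

```python
def block_ranks(N):
    """Ranks of the diagonal blocks of B as stated in the proof of Lemma 6.15."""
    blue = 4 + 6 + 2 * (2 * N - 3)                  # B_1
    reds = [4 + 2 * (2 * N - 1) for _ in range(2, N + 1)]  # B_2, ..., B_N
    green = 2 * (N - 2)                             # B_{N+1}
    return blue, reds, green


def total_rank(N):
    """Rank of the block upper-triangular matrix B: the sum of the block ranks."""
    blue, reds, green = block_ranks(N)
    return blue + sum(reds) + green
```

Since each diagonal block has full row rank, the rank of the block upper-triangular matrix is the sum of the block ranks, which is what `total_rank` computes.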

Fig. 3. Ordering of \(\mathbb {Z}^2_N\) when \(N=4\)

7 Adding Forcing and Dissipation: Lorenz-96 and 2D Navier–Stokes

In this section we add dissipation and fixed body forcing to both conservative Lorenz-96 and Galerkin approximations of 2D Euler by introducing a new vector field

$$\begin{aligned} V_0(x)= -\nu \Lambda x + F \end{aligned}$$
(7.1)

to the splittings constructed in Sects. 5 and 6, where \(\nu >0\) is an arbitrary constant, F a fixed nonzero vector with nonnegative entries, and \(\Lambda \) a linear operator satisfying

$$\begin{aligned} \Lambda x \cdot x \ge \alpha \Vert x\Vert ^2 \end{aligned}$$
(7.2)

for some \(\alpha >0\). For the remainder of this section we consider random splittings associated to families of complete, smooth vector fields \(\mathcal {V}=\{V_k\}_{k=0}^n\) on \(\mathbb {R}^d\) satisfying the following assumption.

Assumption 2

\(V_0\) is as in (7.1) and the flows of the other \(V_k\) conserve Euclidean norm.

Fix \(h>0\) and let \(P_h\) be the transition kernel of a random splitting satisfying Assumption 2. When \(\Lambda \) is the identity matrix, the addition of \(V_0\) to the splitting of conservative Lorenz-96 gives a splitting of the full Lorenz-96 model, (1.4), while for 2D Euler the resulting \(V_0\) corresponds to a friction or drag term sometimes called Ekman damping. When \(\Lambda \) is diagonal with diagonal entry \(|k|^2\) in the spots associated to \(a_k\) and \(b_k\) (see Footnote 8), which corresponds to a Laplacian written in Fourier space, the addition of \(V_0\) to the splitting of 2D Euler gives a splitting of 2D Navier–Stokes, (1.1).

Note that the dissipative part of \(V_0\) in (7.1) depends linearly on x whereas the forcing is constant. Thus dissipation dominates forcing for sufficiently large x and, since the remaining vector fields are conservative, the splitting dynamics cannot grow too large. Specifically, letting \(\Phi _{h\tau }\) be as in (2.2) but with the solution \(\varphi ^{(0)}\) of \(\dot{x}=V_0(x)\) appended to the beginning of each cycle, we have

Lemma 7.1

Under Assumption 2 for any initial x and \(m>0\),

$$\begin{aligned} \Vert \Phi _{h\tau }^m(x)\Vert ^2 \le \Vert x\Vert ^2e^{-\nu \alpha h \sum _{k=0}^{m-1} \tau _{k(n+1)}} + \frac{1}{\nu ^2\alpha ^2}\Vert F\Vert ^2\left( 1 - e^{-\nu \alpha h \sum _{k=0}^{m-1} \tau _{k(n+1)}}\right) . \end{aligned}$$
(7.3)

Proof

Letting \(\varphi =\varphi ^{(0)}\), we have

$$\begin{aligned} \partial _t\Vert \varphi _t\Vert ^2&= 2\langle F,\varphi _t\rangle - 2\nu \langle \Lambda \varphi _t,\varphi _t\rangle \\&\le \frac{1}{\nu \alpha }\Vert F\Vert ^2+\nu \alpha \Vert \varphi _t\Vert ^2-2\nu \alpha \Vert \varphi _t\Vert ^2 = \frac{1}{\nu \alpha }\Vert F\Vert ^2-\nu \alpha \Vert \varphi _t\Vert ^2, \end{aligned}$$

where the inequality follows from (7.2) and \(2\langle F,\varphi _t\rangle \le (\nu \alpha )^{-1}\Vert F\Vert ^2+\nu \alpha \Vert \varphi _t\Vert ^2\). Solving

$$\begin{aligned} \dot{y} = \frac{1}{\nu \alpha }\Vert F\Vert ^2-\nu \alpha y \end{aligned}$$

from \(y(0)=\Vert x\Vert ^2\) together with the comparison theorem for ODEs [42] then gives

$$\begin{aligned} \Vert \varphi _t(x)\Vert ^2&\le \Vert x\Vert ^2 e^{-\nu \alpha t}+\frac{1}{\nu ^2\alpha ^2}\Vert F\Vert ^2\left( 1-e^{-\nu \alpha t}\right) \end{aligned}$$

for all time. Furthermore, since \(\varphi ^{(k)}\) conserves norm for \(1\le k\le n\), the above implies

$$\begin{aligned} \Vert \Phi _{h\tau }(x)\Vert ^2&= \Vert \varphi ^{(n)}_{h\tau _n}\circ \cdots \circ \varphi ^{(0)}_{h\tau _0}(x)\Vert ^2\\&= \Vert \varphi ^{(0)}_{h\tau _0}(x)\Vert ^2 \le \Vert x\Vert ^2 e^{-\nu \alpha h\tau _0}+\frac{1}{\nu ^2\alpha ^2}\Vert F\Vert ^2\left( 1-e^{-\nu \alpha h\tau _0}\right) . \end{aligned}$$

The result then follows by straightforward induction on the number of cycles, namely m.\(\square \)
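The bound of Lemma 7.1 can be illustrated numerically on a toy splitting. The sketch below is not one of the models of this paper: it assumes a three-dimensional system with diagonal \(\Lambda \), so that the flow of \(V_0\) is explicit, and uses two planar rotations as stand-ins for the norm-conserving fields. All parameter values and the names `flow_v0`, `rotate`, and `check_bound` are illustrative choices of ours. After each cycle it compares \(\Vert \Phi ^m_{h\tau }(x)\Vert ^2\) with the right-hand side of the bound, where the exponent accumulates the times spent along \(V_0\).

```python
import math
import random


def flow_v0(x, t, nu, lam, F):
    """Exact flow of V_0(x) = -nu*Lambda*x + F for diagonal Lambda = diag(lam)."""
    out = []
    for xi, li, Fi in zip(x, lam, F):
        fixed = Fi / (nu * li)  # coordinate of the fixed point of V_0
        out.append(fixed + math.exp(-nu * li * t) * (xi - fixed))
    return out


def rotate(x, i, j, t):
    """Flow of a rotation field in the (i, j) plane; it conserves the Euclidean norm."""
    y = list(x)
    c, s = math.cos(t), math.sin(t)
    y[i] = c * x[i] - s * x[j]
    y[j] = s * x[i] + c * x[j]
    return y


def check_bound(m=50, h=0.1, nu=0.5, lam=(1.0, 2.0, 3.0), F=(1.0, 0.0, 2.0), seed=0):
    """Return True if the Lemma 7.1 bound holds after each of m random cycles."""
    rng = random.Random(seed)
    alpha = min(lam)  # Lambda x . x >= alpha |x|^2 for diagonal Lambda
    x = [5.0, -3.0, 4.0]
    norm0_sq = sum(v * v for v in x)
    F_sq = sum(v * v for v in F)
    v0_time = 0.0  # sum of the exponential times spent along V_0
    for _ in range(m):
        tau = [rng.expovariate(1.0) for _ in range(3)]
        x = flow_v0(x, h * tau[0], nu, lam, F)
        x = rotate(x, 0, 1, h * tau[1])
        x = rotate(x, 1, 2, h * tau[2])
        v0_time += tau[0]
        decay = math.exp(-nu * alpha * h * v0_time)
        bound = norm0_sq * decay + F_sq / (nu * alpha) ** 2 * (1.0 - decay)
        if sum(v * v for v in x) > bound + 1e-9:  # small floating-point slack
            return False
    return True
```

Because the rotations leave the norm unchanged, only the \(V_0\) steps enter the bound, which is exactly the mechanism used in the induction.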

Remark 7.2

The convergence results of Sect. 4 do not directly apply to Lorenz-96 and Galerkin approximations of 2D Navier–Stokes since \(\mathcal {V}\)-orbits are generally unbounded in both models. However, Lemma 7.1 implies that any splitting starting from x whose vector fields satisfy Assumption 2 will lie inside the ball of radius \((\Vert x\Vert ^2+(\nu \alpha )^{-2}\Vert F\Vert ^2)^{1/2}\) centered at the origin for all nonnegative times. In particular, since the splitting vector fields are smooth, a bound analogous to (4.1) holds for all x in the ball \(B_r(0)\) of radius r centered at the origin in the ambient Euclidean space. Thus all convergence results of Sect. 4 hold for these random splittings when \(\mathcal {C}^k(\mathcal {X})\) is replaced by \(\mathcal {C}^k_r(\mathcal {X})\), the space of k-times continuously differentiable functions that vanish outside \(B_r(0)\). Intuitively, this says that for any initial condition x, the trajectories of a random splitting satisfying Assumption 2 will converge on average and almost surely as \(h\rightarrow 0\) to the trajectory of the true dynamics starting from x.

Corollary 7.3

The Euclidean norm is a Lyapunov function for \(P_h\). That is, there exist constants \(K\ge 0\) and \(\gamma \in (0,1)\) such that for all \(x\in \mathbb {R}^d\),

$$\begin{aligned} \left( P_h\Vert \cdot \Vert \right) (x)&\le \gamma \Vert x\Vert +K. \end{aligned}$$

Proof

By Lemma 7.1, specifically \(\Vert \Phi _{ht}(x)\Vert \le \Vert x\Vert e^{-\frac{1}{2}\nu \alpha h t_0}+(\nu \alpha )^{-1}\Vert F\Vert \), we have

$$\begin{aligned} \left( P_h\Vert \cdot \Vert \right) (x)&= \int _{\mathbb {R}^{n+1}_+} \Vert \Phi _{ht}(x)\Vert e^{-\sum t_k} dt \le \frac{1}{1+\frac{1}{2}\nu \alpha h}\Vert x\Vert +\frac{1}{\nu \alpha }\Vert F\Vert \end{aligned}$$

for any x. The result follows with \(K=(\nu \alpha )^{-1}\Vert F\Vert \) and \(\gamma =(1+\tfrac{1}{2}\nu \alpha h)^{-1}\).\(\square \)
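For completeness, the two elementary steps behind this estimate can be written out; this is our expansion of the argument rather than an additional claim. Taking square roots in Lemma 7.1 with \(m=1\), discarding the factor \(1-e^{-\nu \alpha h t_0}\le 1\), and using \(\sqrt{a+b}\le \sqrt{a}+\sqrt{b}\) gives the pointwise bound on \(\Vert \Phi _{ht}(x)\Vert \). Since that bound depends on t only through \(t_0\), and the exponential weights in the remaining variables integrate to one,

$$\begin{aligned} \int _{\mathbb {R}^{n+1}_+} e^{-\frac{1}{2}\nu \alpha h t_0}\, e^{-\sum t_k}\, dt = \int _0^\infty e^{-\left( 1+\frac{1}{2}\nu \alpha h\right) t_0}\, dt_0 = \frac{1}{1+\frac{1}{2}\nu \alpha h}\,. \end{aligned}$$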

7.1 Ergodicity

We now present a variation of Theorem 3.1, namely Theorem 7.4, which simplifies verification of ergodicity in the present setting. Recall from Sects. 5 and 6 that one of the difficulties in verifying Theorem 3.1 was proving controllability, i.e., the existence of a distinguished point \(x_*\) that could be reached by the splitting dynamics in finite time from any other point. With the addition of dissipation, the fixed point \(\nu ^{-1}\Lambda ^{-1}F\) of \(\dot{x}=V_0(x)\) is a natural candidate for \(x_*\) and, as we will see, the fact that it is globally attracting obviates several technicalities associated with controllability in the conservative cases discussed above.

Theorem 7.4

Suppose Assumption 2 holds and set \(x_*=\nu ^{-1}\Lambda ^{-1}F\). If there exist \(m \ge 0 \) and t in \(\mathbb {R}^{m(n+1)}_+\) such that the Lie bracket condition holds at \(\widetilde{x}:=\Phi ^m_{ht}(x_*)\), then \(P_h\) has a unique invariant measure \(\mu \) for all \(h>0\). Furthermore, there exist \(C>0\) and \(\gamma \) in (0, 1) such that for all x in \(\mathbb {R}^d\),

$$\begin{aligned} \Vert P_h^m(x,\cdot )-\mu \Vert&\le C\gamma ^m \end{aligned}$$
(7.4)

where \(\Vert \cdot \Vert \) is the norm on probability measures induced by the weighted supremum norm \(\Vert f\Vert :=\sup _x |f(x)|/(1+\Vert x\Vert )\) on bounded measurable functions \(f:\mathbb {R}^d\rightarrow \mathbb {R}\).

The proof of Theorem 7.4 uses the following lemmas. The first, due to Krylov-Bogolubov, is a standard result from the theory of Markov processes [23]. The second, which follows from Lemma 3.2 and Theorem 3.6, is from [6, Theorem 4.4]. For the statement of Lemma 7.5, recall a transition kernel P on \(\mathbb {R}^d\) is Feller if Pf is continuous whenever \(f:\mathbb {R}^d\rightarrow \mathbb {R}\) is continuous and bounded. Also, a sequence of probability measures \(\{\mu _m\}\) on \(\mathbb {R}^d\) is tight if for every \(\varepsilon >0\) there exists a compact subset K of \(\mathbb {R}^d\) such that \(\mu _m(K)\ge 1-\varepsilon \) for all m.

Lemma 7.5

Let P be a Feller probability transition kernel on \(\mathbb {R}^d\). If there exists x in \(\mathbb {R}^d\) such that \(\{P^m(x,\cdot )\}_{m=0}^\infty \) is tight, then P has an invariant probability measure.

Lemma 7.6

Suppose \(\Phi ^m_{ht}(x)=\widetilde{x}\) and the Lie bracket condition holds at \(\widetilde{x}\). Then there exists a \(c>0\), an \(\widetilde{m}\), and neighborhoods \(U_x\) of x and \(\widetilde{U}\) of \(\widetilde{x}\) such that for all y in \(U_x\) and B in \(\mathcal {B}(\mathcal {X})\),

$$\begin{aligned} P_h^{\widetilde{m}}(y,B)&\ge c\lambda \left( B\cap \widetilde{U}\right) . \end{aligned}$$

The following proof is another instance of the rather classical idea, dating at least back to the split chains of Nummelin [49] and work of Meyn and Tweedie [43], that the existence of a globally accessible point at which the dynamics is continuous in the right sense implies the transition densities converge to a unique equilibrium measure. If the return to the globally accessible point has finite expectation, then mixing is exponential. The same basic structure of the SDE version of our system was leveraged in [18] to prove exponential mixing (see also [41]). In the closely related PDMP setting, analogous results are found in [38] in a specific example and [7] in a more general context.

Proof of Theorem 7.4

We first prove existence. Continuity of \(\Phi _{ht}\) immediately implies \(P_h\) is Feller. Furthermore, Lemma 7.1 implies that random splitting starting from any x is constrained to lie in a compact subset of \(\mathbb {R}^d\), namely the closed ball of radius \((\Vert x\Vert ^2+(\nu \alpha )^{-2}\Vert F\Vert ^2)^{1/2}\) centered at the origin. Thus, for any x, the sequence \(\{P_h^m(x,\cdot )\}_{m=0}^\infty \) is tight and existence follows from Lemma 7.5.

Next we prove uniqueness. The hypothesis and Lemma 7.6 together imply the existence of \(c>0\), \(\widetilde{m}\), and neighborhoods \(U_*\) of \(x_*\) and \(\widetilde{U}\) of \(\widetilde{x}\) such that

$$\begin{aligned} P_h^{\widetilde{m}}(x,B)&\ge c\lambda \left( B\cap \widetilde{U}\right) \end{aligned}$$
(7.5)

for all \(x\in U_*\) and Borel sets B. Also, positive-definiteness of \(\Lambda \) implies

$$\begin{aligned} \Vert \varphi ^{(0)}_t(x) - x_*\Vert&\le e^{-\nu \alpha t} \Vert x-x_*\Vert \end{aligned}$$

for any \(x\in \mathbb {R}^d\) and \(t\ge 0\). In particular, for any open ball \(B_r\) of radius r centered at the origin, there exists \(T_0>0\) such that \(\varphi ^{(0)}_{ht}(B_r)\) is properly contained in \(U_*\) whenever \(ht>T_0\). Since the \(\varphi ^{(k)}\) are continuous, there then exist \(T_k>0\) for \(1\le k\le n\) such that \(\Phi _{ht}(x)=\varphi ^{(n)}_{ht_n}\circ \cdots \circ \varphi ^{(0)}_{ht_0}(x)\in U_*\) for all \(x\in B_r\) whenever \(ht_0>T_0\) and \(ht_k\in (0,T_k)\) for \(1\le k\le n\). So, for any \(x\in B_r\),

$$\begin{aligned} P_h(x, U_*)&\ge \int _0^{T_n/h}\cdots \int _0^{T_1/h}\int _{T_0/h}^\infty {{\,\mathrm{\mathbbm {1}}\,}}_{U_*}\left( \Phi _{ht}(x)\right) e^{-\sum t_k} dt = e^{-T_0/h}\prod _{k=1}^n \left( 1-e^{-T_k/h}\right) > 0 \end{aligned}$$

and hence \(\inf _{x\in B_r} P_h(x, U_*)>0\).
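The contraction of \(\varphi ^{(0)}\) toward \(x_*\) used above follows from a one-line energy estimate, which we spell out as a sketch. Write \(z_t:=\varphi ^{(0)}_t(x)-x_*\) (our notation). Since \(V_0(x_*)=0\), we have \(\dot{z}_t=-\nu \Lambda z_t\), and (7.2) gives

$$\begin{aligned} \partial _t\Vert z_t\Vert ^2 = -2\nu \langle \Lambda z_t,z_t\rangle \le -2\nu \alpha \Vert z_t\Vert ^2, \quad \text {hence}\quad \Vert z_t\Vert \le e^{-\nu \alpha t}\Vert z_0\Vert \,. \end{aligned}$$

In particular, every point of \(\varphi ^{(0)}_s(B_r)\) lies within distance \(e^{-\nu \alpha s}(r+\Vert x_*\Vert )\) of \(x_*\), which is how a threshold \(T_0\) depending only on r and \(U_*\) can be chosen.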

As in the proof of Theorem 3.1, suppose toward a contradiction that \(\mu _1\) and \(\mu _2\) are distinct \(P_h\)-ergodic probability measures and that \(A_1\) and \(A_2\) are disjoint measurable sets partitioning \(\mathbb {R}^d\) with \(\mu _i(B)=\mu _i(B\cap A_i)\) for all Borel sets B. Fix \(x_i\) in the support of \(\mu _i\), let r be sufficiently large that \(x_1,x_2\in B_r\), and set \(\kappa :=\inf _{x\in B_r} P_h(x,U_*)>0\). Then by (7.5) for any Borel set B,

$$\begin{aligned} \mu _i(B)= & {} \mu _i P_h^{\widetilde{m}+1}(B) = \int _{\mathbb {R}^d}\int _{\mathbb {R}^d} P_h^{\widetilde{m}}(y,B)P_h(x,dy)\mu _i(dx) \nonumber \\\ge & {} \int _{B_r}\int _{U_*} P_h^{\widetilde{m}}(y,B)P_h(x,dy)\mu _i(dx) \ge \kappa c\lambda \left( B\cap \widetilde{U}\right) \mu _i\left( B_r\right) . \end{aligned}$$
(7.6)

In particular, \(\mu _i(B)=0\) implies \(\lambda (B\cap \widetilde{U})=0\) since c, \(\kappa \), and \(\mu _i(B_r)\) are all strictly positive (the latter because \(B_r\) is an open set containing both \(x_1\) and \(x_2\) which were chosen to be in the supports of \(\mu _1\) and \(\mu _2\), respectively). But \(\mu _1(A_2\cap \widetilde{U})=\mu _2(A_1\cap \widetilde{U})=0\) and so we obtain the contradiction

$$\begin{aligned} 0&< \lambda \left( \widetilde{U}\right) = \lambda \left( A_1\cap \widetilde{U}\right) +\lambda \left( A_2\cap \widetilde{U}\right) = 0, \end{aligned}$$

which concludes the proof of uniqueness.

Finally, for the exponential convergence statement (7.4), we have from (7.6) that for any \(r>0\),

$$\begin{aligned} \inf _{x\in B_r} P_h^{\widetilde{m}+1}(x,B)&\ge \kappa c\lambda \left( B\cap \widetilde{U}\right) \end{aligned}$$

for all Borel sets B. That is, the transition probabilities \(P_h^{\widetilde{m}+1}(x,\cdot )\) are minorized uniformly over \(B_r\) by a constant multiple of the probability measure \(\widetilde{\lambda }:=\lambda (\widetilde{U})^{-1}\lambda (\cdot \cap \widetilde{U})\). Exponential convergence then follows from Corollary 7.3 upon taking \(r>2K/(1-\gamma )\). See for example Theorem 1.2 in [25].\(\square \)

Corollary 7.7

Consider the random splitting of Lorenz-96 associated to the vector fields \(\{V_k\}_{k=0}^n\), where \(V_0(x)=-\nu x+F\) and \(\{V_k\}_{k=1}^n\) are the splitting vector fields of conservative Lorenz-96 from Sect. 5. If \(x_*:=\nu ^{-1}F\) is not a fixed point of conservative Lorenz-96, i.e., \(\nu ^2\sum _{k=1}^n (F_k^2+F_{k+1}^2)F_{k-1}^2\ne 0\), then the random splitting has a unique, and hence ergodic, invariant measure on \(\mathbb R^n\) and the dynamics converge to this measure at an exponential rate in the sense of (7.4).

Proof

The determinant of the n-by-n matrix

is

$$\begin{aligned} x_1x_{n-1}x_n\left( \prod _{k=2}^{n-2}x_k^2\right) \left( \nu ^2\Vert x\Vert ^2-\langle F,x\rangle \right) . \end{aligned}$$

So the \(\{V_k\}_{k=0}^n\) span \(\mathbb {R}^n\) at every x with nonzero coordinates and satisfying \(\nu ^2\Vert x\Vert ^2\ne \langle F,x\rangle \). In particular, since \(x_*\) is not a fixed point of conservative Lorenz-96, we showed in the proof of Proposition 5.2 that \(x_*\) can be moved via the splitting dynamics to some \(\widetilde{x}\) with nonzero coordinates. Finally, by rotating slightly more on the last step if necessary, we can also guarantee \(\nu ^2\Vert \widetilde{x}\Vert ^2\ne \langle F,\widetilde{x}\rangle \). Thus the Lie bracket condition holds at \(\widetilde{x}\) and the result follows by Theorem 7.4.\(\square \)

Corollary 7.8

Fix \(N\ge 2\) and set \(n=4N(N+1)\). Consider the random splitting of the \(N\text {th}\) Galerkin approximation of 2D Navier–Stokes associated to \(\{V_k\}_{k=0}^n\), where \(V_0(x)=-\nu \Lambda x+F\) with \(\Lambda \) the n-by-n diagonal matrix corresponding to the Laplacian discussed at the beginning of this section, and \(\{V_k\}_{k=1}^n\) the splitting vector fields of 2D Euler from Sect. 6. If F is nondegenerate in the sense of Definition 6.3, then the random splitting has a unique, and hence ergodic, invariant measure and the dynamics converge to this measure at an exponential rate in the sense of (7.4).

Proof

Recall in this case \(V_0(x)=-\nu \Lambda x+F\) where \(\Lambda \) is the diagonal matrix with diagonal entry \(|k|^2\) in the slots corresponding to the coordinates \(a_k\) and \(b_k\). Fix \(j,k,\ell \in \mathbb {Z}^2_N\) with \(j+k-\ell =0\) and let W be one of the vector fields \(V_{a_ja_ka_\ell }\), \(V_{a_jb_kb_\ell }\), \(V_{b_ja_kb_\ell }\), or \(V_{b_jb_ka_\ell }\). Letting e.g. \((x_j,x_k,x_\ell )=(a_j,a_k,a_\ell )\) when \(W=V_{a_ja_ka_\ell }\) and similarly for the other cases, direct computation yields

figure a

where \([V_0,W]_j(x)\) is the component of \([V_0,W]\) corresponding to the component \(x_j\) of x, and similarly for \([V_0,W]_k\) and \([V_0,W]_\ell \). As in the 2D Euler case, Gaussian elimination shows that the 6-by-6 matrix (see (6.33) for an explicit form of the middle 4 columns)

figure b

has rank 6 at every generic point q in \(\mathbb {R}^n\) (see Footnote 9). Thus \(V_0\) and \([V_0,W]\) add two new directions to the splitting vector fields of 2D Euler and by an entirely similar argument to the spanning argument in Sect. 6.3.3 we have that the Lie bracket condition holds at every such q. Furthermore, since F is nondegenerate the controllability argument of Sect. 6.3.1 implies \(x_*\) can be evolved via the split dynamics to a generic point. The result then follows by Theorem 7.4.\(\square \)
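As a sketch of the direct computation in the proof above, assume (these are our assumptions, chosen to agree with the sign pattern in Remark 7.9) that the nonzero components of W are \(W_j=C_{k\ell }x_kx_\ell \), \(W_k=C_{j\ell }x_jx_\ell \), \(W_\ell =-C_{jk}x_jx_k\), and that the bracket convention is \([V,W]=DW\,V-DV\,W\). With \(V_0(x)=-\nu \Lambda x+F\) and \(\Lambda \) diagonal with entries \(|j|^2\), \(|k|^2\), \(|\ell |^2\) in the relevant slots, one then obtains

$$\begin{aligned}&[V_0,W]_j(x) = C_{k\ell }\left( F_kx_\ell +F_\ell x_k-\nu \left( |k|^2+|\ell |^2-|j|^2\right) x_kx_\ell \right) , \\&[V_0,W]_k(x) = C_{j\ell }\left( F_jx_\ell +F_\ell x_j-\nu \left( |j|^2+|\ell |^2-|k|^2\right) x_jx_\ell \right) , \\&[V_0,W]_\ell (x) = -C_{jk}\left( F_jx_k+F_k x_j-\nu \left( |j|^2+|k|^2-|\ell |^2\right) x_jx_k\right) . \end{aligned}$$

Under these assumptions, replacing each of \(|j|^2\), \(|k|^2\), \(|\ell |^2\) by 1 recovers the expressions for Ekman damping.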

Remark 7.9

A very similar argument to the one above proves unique ergodicity for Ekman damping as well, i.e., when \(\Lambda \) is the identity matrix on \(\mathbb {R}^n\). In this case the components of \([V_0,W]\) computed in the proof of Corollary 7.8 become

$$\begin{aligned}&[V_0,W]_j(x) = C_{k\ell }\left( F_kx_\ell +F_\ell x_k-\nu x_kx_\ell \right) , \\&[V_0,W]_k(x) = C_{j\ell }\left( F_jx_\ell +F_\ell x_j-\nu x_jx_\ell \right) , \\&[V_0,W]_\ell (x) = -C_{jk}\left( F_jx_k+F_k x_j-\nu x_jx_k\right) , \end{aligned}$$

and the rest of the argument goes through unchanged.
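The bracket formulas of this remark can be checked with exact integer arithmetic. The Python sketch below rests on two assumptions of ours that are not spelled out in this section: that the nonzero components of W are \((C_{k\ell }x_kx_\ell ,\ C_{j\ell }x_jx_\ell ,\ -C_{jk}x_jx_k)\), and that the bracket convention is \([V,W]=DW\,V-DV\,W\). It restricts to the three coordinates \((x_j,x_k,x_\ell )\); the names `bracket` and `remark_formula` are hypothetical.

```python
def bracket(x, F, C, lam, nu):
    """[V0, W] = DW*V0 - DV0*W on the coordinates (x_j, x_k, x_l).

    V0(x) = -nu*diag(lam)*x + F and, by assumption,
    W(x) = (C_kl*x_k*x_l, C_jl*x_j*x_l, -C_jk*x_j*x_k).
    """
    xj, xk, xl = x
    Ckl, Cjl, Cjk = C
    v0 = [F[i] - nu * lam[i] * x[i] for i in range(3)]
    W = [Ckl * xk * xl, Cjl * xj * xl, -Cjk * xj * xk]
    DW = [[0, Ckl * xl, Ckl * xk],      # Jacobian of W
          [Cjl * xl, 0, Cjl * xj],
          [-Cjk * xk, -Cjk * xj, 0]]
    DW_v0 = [sum(DW[r][c] * v0[c] for c in range(3)) for r in range(3)]
    # DV0 = -nu*diag(lam), so -DV0*W = nu*diag(lam)*W
    return [DW_v0[r] + nu * lam[r] * W[r] for r in range(3)]


def remark_formula(x, F, C, nu):
    """Closed form stated in Remark 7.9 (Lambda equal to the identity)."""
    xj, xk, xl = x
    Fj, Fk, Fl = F
    Ckl, Cjl, Cjk = C
    return [Ckl * (Fk * xl + Fl * xk - nu * xk * xl),
            Cjl * (Fj * xl + Fl * xj - nu * xj * xl),
            -Cjk * (Fj * xk + Fk * xj - nu * xj * xk)]
```

Calling `bracket` with `lam=(1, 1, 1)` and comparing with `remark_formula` on integer inputs gives an exact check; other choices of `lam` cover a general diagonal \(\Lambda \).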