5.1 Introduction

Climate studies require model simulations over periods from centuries to millennia, which are only affordable if ocean models are kept relatively coarse. Many of them stay at a resolution of about one degree and need to parameterize the effect of unresolved mesoscale eddies and smaller-scale motions. The issue of mesoscale eddy parameterization attracts continuing interest, as exemplified by recent studies on eddy potential vorticity fluxes (Marshall and Adcroft 2010; Eden 2010; Ringler and Gent 2011; Marshall et al. 2012). With increasing computational power, eddy-permitting (barely resolving the first baroclinic Rossby radius) or eddy-resolving models are becoming feasible for climate studies, too, so that mesoscale dynamics will gradually be resolved. Nonetheless, as the first baroclinic Rossby radius varies widely (with values below 10 km in high latitudes), even eddy-resolving models will not necessarily represent eddy dynamics with the same skill everywhere unless their resolution is on the scale of a few kilometers. Combining resolved mesoscale dynamics in some parts of the ocean with parameterized dynamics elsewhere is an interesting possibility, but cautionary results by Hallberg (2013) indicate that the transition from parameterized eddies to resolved eddies can introduce problems of its own.

Even though state-of-the-art eddy-permitting or eddy-resolving models simulate the mesoscale dynamics with a certain skill, they still use some form of explicit and/or implicit viscosity, thought to represent the effect of unresolved small-scale subgrid dynamics. The motivation is based on the picture of quasigeostrophic turbulence (Charney 1971), which indicates that the direct cascade of enstrophy has to be removed at the grid scale to prevent the enstrophy from piling up and causing code instability. Fox-Kemper and Menemenlis (2008) discuss common approaches used in oceanographic practice, in particular the Smagorinsky or Leith parameterizations in either harmonic or biharmonic implementation. While these ideas appear plausible, there are no solid theoretical arguments for them, especially outside the limits of applicability of quasigeostrophic theory, which is questionable at grid scale.

The detailed form of the subgrid operators (e.g., Laplacian vs. biharmonic viscosity), however, is known to impact the large-scale dynamics such as the path and separation of the Gulf Stream (Hecht et al. 2008b). Moreover, removal of enstrophy at the grid scale is accompanied by energy dissipation. For example, Danilov (2005) has shown that a direct enstrophy cascade in two-dimensional turbulence is always associated with a noticeable direct energy cascade, resulting in dissipation at finite resolution. Jansen and Held (2014) point out that the popular biharmonic viscosity operator suppresses resolved eddy motion in models where the separation between the mesoscale and the grid scale is insufficient. This reduces the ability of the flow to drain eddy kinetic energy from the available potential energy (APE), thereby distorting the entire energy cycle. The effect is most pronounced for eddy-permitting models where the grid scale and the scale of APE release are not well separated. It is also important for eddy-resolving models as the first baroclinic Rossby radius may locally drop below grid scale. The remedies are less immediate and open for investigation; searching for them is the main aim of ongoing studies. There is growing interest in this topic in the community as eddy-permitting models are now beginning to be used in climate research, so that the question of how to make them more realistic becomes pressing; see, e.g., Jansen et al. (2015), Berloff (2015), Cooper and Zanna (2015), and Cooper (2017).

Analyzing the effects of spectral pileup and backscatter of eddy energy in response to common subgrid parameterizations is rather straightforward for simple two-dimensional flows with a prescribed kinetic energy production rate (Graham and Ringler 2013), but the question remains open in the context of more realistic dynamics which includes the effects of baroclinicity and where the geometry of boundaries and topography makes spectral analysis only locally applicable. Moreover, in real flows the balance between energy production and dissipation ceases to be local, which further complicates the situation. It is not clear how the subgrid operators affect the energy exchange between balanced (quasigeostrophic) and non-balanced motions as the resolution increases. More broadly, the mathematical side of subgrid parameterization as used in oceanographic tasks needs a firmer basis which would dictate a scale- and frame-invariant structure for admissible parameterizations.

Fig. 5.1 Effect of momentum advection discretization on the relative vorticity field in a baroclinically unstable channel flow (top: vector-invariant form, bottom: flux form; near-surface snapshots are shown). Mesh resolution varies (1/36 degree in the central part and coarser elsewhere). Observed scales and amplitude of small eddies in the central part differ substantially between the two schemes due to the difference in implicit dissipation and discretization residual. The variance of vertical velocity (not shown) is substantially lower for the flux form, modifying the APE to eddy KE conversion rate. The figure is based on simulations reported in Danilov and Wang (2015).

It is important to note that the dynamics on scales close to the grid scale is affected not only by explicit subgrid parameterizations, but also by details of the discretization of momentum advection (see Figure 5.1). For example, high-order upwind transport algorithms based on the flux form of the advection operator have implicit numerical dissipation of the same order of magnitude as typical explicit dissipation (see, e.g., Mohammadi-Aragh et al. 2015). Further, there is evidence for a numerical (Hollingsworth) instability associated with the vector-invariant form of momentum advection which creates noise in the vertical velocity field and thus influences the APE to kinetic energy conversion; see the discussion in Gassmann (2013) and Danilov and Wang (2015). Understanding the effects induced by these or other numerical details on the energy balance and accounting for their interplay with subgrid parameterizations is a necessary element on the road to rigorous analysis.

The need to explore the interplay between resolution, parameterized subgrid, and spurious numerical dissipation is particularly important for future earth system models employing multi-resolution technology, for example models based on FESOM (Wang et al. 2014) or ICON (Korn 2017). Recent results point to the retardation of eddy saturation when the upstream resolution is coarse (Danilov and Wang 2015). In multi-resolution models on unstructured meshes, subgrid momentum closures are also needed to stabilize against spurious numerical modes appearing on staggered triangular meshes (Danilov 2013), which adds numerical complexity.

For all the reasons mentioned, the question of how to return the overdissipated energy to the resolved scales is of central importance when working at eddy-permitting resolutions. This is known as the energy backscatter problem. On coarser meshes, one needs to additionally parameterize the contribution from mesoscale eddies. In both cases, there is growing interest in stochastic parameterizations. Stochastic parameterizations have been successfully used to maintain sufficient variance in ensemble forecasts (Palmer et al. 2009). However, energy and momentum consistency, especially over long simulation timescales, have not received as much attention (cf. the discussion in Franzke et al. 2015).

For the momentum closure problem in the ocean, stochastic parameterizations hold promise far beyond the idea of pure dissipation pursued by traditional deterministic subgrid parameterizations and also beyond downgradient parameterizations for unresolved mesoscale eddies. In the ocean context, systematic work on stochastic parameterizations is rather recent: Duan and Nadiga (2007), Mana and Zanna (2014), Jansen and Held (2014), Grooms et al. (2015b), Cooper and Zanna (2015), Cooper (2017), and Berloff (2015) all implement backscatter as stochastic forcing acting on the resolved flow, showing the potential of the approach, but also raising questions about the structure of this forcing and the choice of parameters. Stochastic backscatter can be implemented in a purely statistical way; more sophisticated approaches seek to include dynamical information, for example by shaping the backscatter forcing according to the nonlinear self-interaction derived from elementary solutions to the tangent linear equation (Berloff 2015, 2016).

Further open questions pertain to finding a more general mathematical framework and generalizations away from a quasigeostrophic setting toward the full primitive equations; work in this direction is at the very beginning. Related work on stochastic LES closures for the Navier–Stokes equations was done by San (2014) for the two-dimensional problem in vorticity form and by Xun and Wang (2014) for channel flow in three dimensions. Jansen and Held (2014) show that backscatter for the two-layer quasigeostrophic equations can be parameterized both stochastically and deterministically, with very similar results. Their approach is generalized to a simplified primitive equation isopycnal model in the presence of topography in Jansen et al. (2015). This is a very valuable step, providing one promising starting point and baseline benchmark.

Concerning subgrid dynamics, in most of the approaches cited, the model is either entirely local or is based on global energy constraints and thus couples the energy budget over the entire domain. There are, however, early attempts at “second-order closures” by Daly and Harlow (1970) and Deardorff (1973), where the Reynolds tensor is treated as a prognostic variable and closure conditions have to be supplied for the higher-order moments. Schumann (1975) suggested a model with a single scalar transport equation for the subgrid energy. However, the algebraic closure relations for the diagnostic subgrid contributions are complicated and subject to solvability constraints; see Schmidt and Schumann (1989) and Schumann (1991). These ideas have been revisited subsequently by Schiestel and Dejoan (2005) and Chaouat (2012), but the problem remains open.

To summarize, the main open questions are:

  • Find a suitable mathematical framework for subgrid momentum parameterization with minimized spurious energy dissipation.

  • Implement practical backscatter algorithms in primitive equation ocean circulation models.

Any progress will have substantial impact on the energetic consistency of existing and future climate models.

This chapter aims at an elementary introduction to the circle of questions outlined above. Our intent is not to give ready and complete answers, but to highlight the issues and survey some of the emerging approaches.

We begin with a brief summary of the concept of subgrid momentum closures in Section 5.2. In Section 5.3, we review theoretical ideas on quasigeostrophic turbulence, with a brief summary of ocean mesoscale and submesoscale turbulence. Our main goal is to emphasize that the notion of “subgrid” scale, as related to ocean modeling, depends on the resolution, which complicates the question of subgrid closures.

In Sections 5.4 and 5.5, we review several proposed parameterizations. The first is the approach by Jansen and Held (2014), which is based on a local subgrid energy budget and an essentially empirical backscatter term which may be either deterministic or stochastic. The second, more sophisticated, but also more expensive and less easily generalized approach is due to Grooms and Majda (2013, 2014), who replace the Reynolds stress term with a stochastic process and explicitly evolve the local subgrid statistics in a local micro-cell attached to each grid box. The last emerging closure scheme is due to Mana and Zanna (2014); it was initially introduced semi-empirically, but later justified under precise assumptions by Grooms and Zanna (2017). The section closes with a brief review of \(\alpha \)-models, which provide a framework for regularizing fluid equations without adding dissipation and which may possibly be interpreted as a nonlinear remapping of wavenumbers.

Section 5.6 offers concluding remarks and some very brief pointers to the literature for further directions beyond those covered so far.

5.2 Subgrid Momentum Closures

To fix concepts, let us focus on the momentum equations for a homogeneous incompressible or Boussinesq rotating ideal fluid,

$$\begin{aligned} \partial _t {\varvec{u}}+ \varvec{\nabla }\cdot ({\varvec{u}}\otimes {\varvec{u}})&+ 2 \varvec{\varOmega }\times {\varvec{u}}+ \rho ^{-1} \, \varvec{\nabla }p = {\varvec{F}}+ D {\varvec{u}}\,, \end{aligned}$$
(5.1a)
$$\begin{aligned}&\quad \varvec{\nabla }\cdot {\varvec{u}}= 0 \,, \end{aligned}$$
(5.1b)

where \({\varvec{u}}\) is the three-dimensional velocity field, \(\varvec{\varOmega }\) the rotation vector, \(\rho \) the constant density, and p the pressure. All force terms are subsumed into \({\varvec{F}}\). In particular, the system can represent the Boussinesq equations when augmented by thermodynamic equation(s) and with \({\varvec{F}}\) representing all other forces including buoyancy. The operator D represents dissipation through physical processes or prior modeling. This equation may be read either as a partial differential equation (PDE) or as a fine-scale numerical approximation thereof.

In analogy with classical large eddy simulation, we introduce a coarsened velocity field \(\overline{{\varvec{u}}}\). We assume very little about the coarsening process other than that it is linear and commutes with time differentiation. In the classical PDE setting, \(\overline{{\varvec{u}}}\) may be obtained from \({\varvec{u}}\) by convolution with a filter kernel. However, the more interesting point of view is that \(\overline{{\varvec{u}}}\) represents the solution of a modified numerical model at lower resolution. Then, \(\overline{{\varvec{u}}}\) satisfies the equation

$$\begin{aligned} \partial _t \overline{{\varvec{u}}}+ \overline{\varvec{\nabla }}\cdot (\overline{{\varvec{u}}}\otimes \overline{{\varvec{u}}})&+ 2 \varvec{\varOmega }\times \overline{{\varvec{u}}}+ \rho ^{-1} \, \overline{\varvec{\nabla }} \;\! \overline{p} = {\varvec{R}}({\varvec{u}}) + \overline{{\varvec{F}}}+ \overline{D} \overline{{\varvec{u}}}\,, \end{aligned}$$
(5.2a)
$$\begin{aligned}&\quad \overline{\varvec{\nabla }}\cdot \overline{{\varvec{u}}}= 0 \,, \end{aligned}$$
(5.2b)

with eddy source term

$$\begin{aligned} {\varvec{R}}({\varvec{u}}) = \overline{\varvec{\nabla }}\cdot (\overline{{\varvec{u}}}\otimes \overline{{\varvec{u}}}) - \overline{\varvec{\nabla }\cdot ({\varvec{u}}\otimes {\varvec{u}})} + \overline{D {\varvec{u}}} - \overline{D} \overline{{\varvec{u}}}\end{aligned}$$
(5.3)

where \(\overline{\varvec{\nabla }}\) and \(\overline{D}\) denote the coarsened gradient or divergence operator and coarsened dissipation operator, respectively. Thinking of coarsening as a change of numerical resolution, we do not assume that coarsening commutes with the fine-scale operators even though this is often true for convolution coarsening on the continuum. However, we have made two minor simplifying assumptions: First, we have commuted the coarsening operation into the Coriolis term, which is exactly true on the f-plane and approximately true for a slowly varying Coriolis parameter, and second, we are assuming that the flow is incompressible at the coarse level with \(\overline{p}\) denoting the implied coarsened pressure. (So, \(\overline{p}\) is not obtained by convolution of p with the filter, but is chosen to enforce incompressibility of the coarse velocities.)
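To make the structure of (5.3) concrete, the following is a minimal one-dimensional sketch, not taken from the chapter. It assumes a periodic grid, block averaging as the (linear) coarsening, centered differences for both the fine and the coarse derivative, and \(D = 0\); the function names `ddx`, `coarsen`, and `eddy_source` are hypothetical. It evaluates the 1D analog \(R = \overline{\partial }_x(\overline{u}\,\overline{u}) - \overline{\partial _x (u\,u)}\) for a smooth field and for a field carrying a small-scale component that the coarse grid cannot represent.

```python
import numpy as np

def ddx(f, dx):
    """Centered difference on a periodic grid with spacing dx."""
    return (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * dx)

def coarsen(f, r):
    """Linear coarsening: block average over r neighboring fine cells."""
    return f.reshape(-1, r).mean(axis=1)

def eddy_source(u, dx, r):
    """1D analog of (5.3) with D = 0 (illustrative assumption)."""
    ubar = coarsen(u, r)
    coarse_term = ddx(ubar * ubar, r * dx)     # coarse operator on coarse field
    fine_term = coarsen(ddx(u * u, dx), r)     # coarsened fine-scale tendency
    return coarse_term - fine_term

n, r = 256, 8                 # fine grid size and coarsening ratio (assumed)
dx = 2.0 * np.pi / n
x = np.arange(n) * dx

u_smooth = np.sin(x)                              # resolved on the coarse grid
u_eddy = np.sin(x) + 0.5 * np.sin(24.0 * x)       # adds an unresolved scale

R_smooth = eddy_source(u_smooth, dx, r)
R_eddy = eddy_source(u_eddy, dx, r)
print("max |R|, smooth field:", np.abs(R_smooth).max())
print("max |R|, eddying field:", np.abs(R_eddy).max())
```

For the smooth field, \(R\) is small but nonzero because the fine and coarse difference operators do not commute with the block average; for the eddying field, it is much larger. This mirrors the remark above that coarsening is not assumed to commute with the fine-scale operators.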

When considering the full Boussinesq system, the transport equations for potential temperature and other thermodynamic quantities need to be coarsened similarly (Aluie and Kurien 2011). For fully compressible flows, it is more natural to coarse-grain the product \(\rho {\varvec{u}}\), thus modifying the expression for \({\varvec{R}}({\varvec{u}})\) above; see, e.g., Aluie (2013). Additional complications arise with nonlinear equations of state; see, e.g., Eden (2016).

The modeling task is now the following: Find a closure or subscale model \(\overline{{\varvec{R}}}(\overline{{\varvec{u}}})\) which correlates highly with the true \({\varvec{R}}({\varvec{u}})\). The closure may be deterministic or include stochastic terms to reduce bias; it may also include infinitesimal or finite memory. If the momentum equation is coupled to thermodynamics, the same considerations apply to each of the prognostic equations.

One of the most elementary concerns is the distribution and flux of energy across scales, as the statistical behavior of the solution depends on it. In the next section, we illustrate the issues relevant to stratified turbulence in the ocean, concentrating mainly on the quasigeostrophic equations. They allow us to explore the main features of large-scale rotating stratified flow and are also used in most of the recent theoretical studies in the field.

5.3 Quasigeostrophic Turbulence and Ocean Eddies

In this section, we review the energetics of large-scale quasigeostrophic (QG) turbulence. In contrast to three-dimensional turbulence where energy cascades to small scales, QG turbulence is distinguished by an inverse cascade of barotropic kinetic energy to large scales and a cascade of enstrophy to small scales. Thus, it is often said that numerical schemes are required to provide a sink for QG enstrophy at grid scale without dissipating energy. In the following, we explain how and under which conditions this picture arises, but also point out its limitations when used in the context of ocean circulation models.

We emphasize that the classical picture of QG turbulence is strictly valid only under the assumption that QG enstrophy is dissipated at scales much smaller than the forcing scale and that turbulence remains geostrophic across all scales. In stratified flow, the main source of barotropic kinetic energy is the conversion of available potential energy via baroclinic instability. The most unstable baroclinic modes occur close to the first internal Rossby radius of deformation \(L_\mathrm{d}\). As we move to even smaller scales in a full model, the ageostrophic or non-balanced component of the flow increases and the quasigeostrophic approximation becomes inaccurate. Thus, there is only a finite small range of scales between \(L_\mathrm{d}\) and the scale where ageostrophy starts to be important; at smaller scales, the energy cascade is direct. Moreover, since this direct cascade acts as an energy sink, there must also be some downscale flux of energy across the geostrophic range to feed it.

All of the quasigeostrophic models are able to capture only the part of the dynamics that stays close to geostrophic balance, often referred to as mesoscale eddies. In the real ocean, the scales at which ageostrophic effects are becoming important are rather close to \(L_\mathrm{d}\) (see, e.g., Callies and Ferrari 2013) so that the presence of ageostrophic motions, often accompanying smaller-size submesoscale eddies, may have a significant impact on the direction of the energy transfer across scales. An additional complication arises from the fact that there is also a direct cascade of available potential energy which implies that the forcing of the barotropic cascade does not only take place near the most baroclinically unstable mode, but is distributed across a wider range of scales.

In the following, we begin with the simplest concepts in a purely two-dimensional setting, then move to a two-layer model, and finally discuss the continuously stratified quasigeostrophic equations.

5.3.1 Two-Dimensional Turbulence

The very basic notions of rotating turbulence and ocean mesoscale eddies can be introduced in the framework of two-dimensional quasigeostrophic dynamics.

The barotropic quasigeostrophic equations are reviewed, e.g., in Franzke et al. (2019). In the beta-plane approximation and written in terms of the relative vorticity \(\zeta \), they read

$$\begin{aligned}&\partial _t \zeta + [\psi ,\zeta ] + \beta \, \partial _x \psi = F + D\zeta \,, \end{aligned}$$
(5.4a)
$$\begin{aligned}&\qquad \qquad \zeta = \varDelta \psi \,, \end{aligned}$$
(5.4b)

where brackets denote the Jacobian operator \([\psi ,\zeta ] = \varvec{\nabla }^\bot \psi \cdot \varvec{\nabla }\zeta \) with \(\varvec{\nabla }^\bot =(-\partial _y,\partial _x)\), \(\psi \) denotes the stream function, \(\beta \) is the beta-parameter, D is a dissipation operator to be specified below, and F is the forcing. The forcing F is maintained by baroclinic or barotropic instabilities evolving at some intermediate scales.Footnote 1

To begin, we set \(\beta =0\). Further, to simplify the discussion, we non-dimensionalize the horizontal length scale and consider (5.4) on the doubly periodic domain \(\mathbb {T}^2 = [0, 2 \pi ]^2\), so that we can pass to the Fourier representationFootnote 2 where

$$\begin{aligned} \zeta _{\varvec{k}}= \frac{1}{2\pi } \int _{\mathbb {T}^2} \mathrm{e}^{- {\mathrm i}{\varvec{k}}\cdot {\varvec{x}}} \, \zeta ({\varvec{x}}) \, \mathrm{d}{\varvec{x}}\end{aligned}$$
(5.5)

for \({\varvec{k}}\in \mathbb {Z}^2\). It is useful to separate the dissipation operator D into “infrared” and “ultraviolet” parts that effectively act on large (\(D_{\mathrm i}\)) and small (\(D_{\mathrm u}\)) scales. For simplicity, we assume that these operators are diagonal in Fourier space, so that the transformed vorticity equation (5.4a) takes the form

$$\begin{aligned} \partial _t \zeta _{{\varvec{k}}} + J_{{\varvec{k}}} = D_{\mathrm i}({\varvec{k}}) \, \zeta _{{\varvec{k}}} + D_{\mathrm u}({\varvec{k}}) \, \zeta _{{\varvec{k}}} + F_{{\varvec{k}}} \end{aligned}$$
(5.6)

where, writing \(p = |{\varvec{p}}|\) and likewise for the other wavenumber vectors, the Jacobian term is described by

$$\begin{aligned} J_{\varvec{k}}= \frac{1}{2\pi } \sum _{{\varvec{k}}= {\varvec{p}}+{\varvec{q}}} \frac{{\varvec{p}}^\bot \cdot {\varvec{q}}}{p^2} \, \zeta _{\varvec{p}}\, \zeta _{\varvec{q}}\,. \end{aligned}$$
(5.7)

In the absence of dissipation and forcing, equation (5.6) conserves energy

$$\begin{aligned} E = \sum _{{\varvec{k}}\in \mathbb {Z}^2} E_{\varvec{k}}= -\frac{1}{2} \sum _{{\varvec{k}}\in \mathbb {Z}^2} \psi _{{\varvec{k}}}^* \, \zeta _{{\varvec{k}}} \end{aligned}$$
(5.8)

and enstrophy

$$\begin{aligned} Z = \sum _{{\varvec{k}}\in \mathbb {Z}^2} Z_{\varvec{k}}= \frac{1}{2} \sum _{{\varvec{k}}\in \mathbb {Z}^2} \zeta _{{\varvec{k}}}^* \, \zeta _{{\varvec{k}}} \,, \end{aligned}$$
(5.9)

with star denoting the complex conjugate.
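The conservation statement can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it uses NumPy's unnormalized FFT instead of the \(1/(2\pi )\) convention of (5.5), which rescales all sums but does not affect whether they vanish, and it band-limits a random stream function so that all products are free of aliasing on the chosen grid. The grid size, cutoff, and variable names are arbitrary choices. It evaluates the Jacobian pseudo-spectrally and verifies that the summed modal contributions \(\mathfrak {R}[\psi ^*_{{\varvec{k}}} J_{{\varvec{k}}}]\) and \(\mathfrak {R}[\zeta ^*_{{\varvec{k}}} J_{{\varvec{k}}}]\) vanish to roundoff.

```python
import numpy as np

n, kmax = 48, 10                      # grid size and cutoff; 3*kmax < n avoids aliasing
k = np.fft.fftfreq(n, d=1.0 / n)      # integer wavenumbers on [0, 2*pi)^2
kx, ky = np.meshgrid(k, k, indexing="ij")

# Random band-limited stream function (Hermitian symmetry is preserved
# because the mask is symmetric under k -> -k).
rng = np.random.default_rng(0)
psi_hat = np.fft.fft2(rng.standard_normal((n, n)))
psi_hat[(np.abs(kx) > kmax) | (np.abs(ky) > kmax)] = 0.0
psi_hat[0, 0] = 0.0
zeta_hat = -(kx**2 + ky**2) * psi_hat           # zeta = Laplacian(psi), cf. (5.4b)

def to_grid(f_hat):
    """Inverse transform of a Hermitian-symmetric spectrum to a real field."""
    return np.fft.ifft2(f_hat).real

# [psi, zeta] = -psi_y * zeta_x + psi_x * zeta_y, evaluated in physical space
jac = (to_grid(-1j * ky * psi_hat) * to_grid(1j * kx * zeta_hat)
       + to_grid(1j * kx * psi_hat) * to_grid(1j * ky * zeta_hat))
jac_hat = np.fft.fft2(jac)

T_k = (np.conj(psi_hat) * jac_hat).real         # modal energy transfer
S_k = -(np.conj(zeta_hat) * jac_hat).real       # modal enstrophy transfer

energy_residual = abs(T_k.sum()) / np.abs(T_k).sum()
enstrophy_residual = abs(S_k.sum()) / np.abs(S_k).sum()
print("relative energy residual:   ", energy_residual)
print("relative enstrophy residual:", enstrophy_residual)
```

The individual \(T_{{\varvec{k}}}\) are generally nonzero; only their sum vanishes. Without the band limit, a pseudo-spectral product would alias and the residuals would no longer be at roundoff level.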

The presence of two integrals imposes constraints on how energy and enstrophy are transferred in spectral space. The energy balance in each mode \({\varvec{k}}\) is obtained by multiplying (5.6) by \(\psi _{{\varvec{k}}}^*\) and taking the real part, so that

$$\begin{aligned} \partial _t E_{{\varvec{k}}} = T_{{\varvec{k}}} + 2 \, D_{\mathrm i}({\varvec{k}}) \, E_{{\varvec{k}}} + 2 \, D_{\mathrm u}({\varvec{k}}) \, E_{{\varvec{k}}} + P_{{\varvec{k}}} \,, \end{aligned}$$
(5.10)

where \(P_{{\varvec{k}}} = -\mathfrak {R}[\psi ^*_{{\varvec{k}}} \, F_{{\varvec{k}}}]\) is the rate of energy pumping and \(T_{{\varvec{k}}} = \mathfrak {R}[\psi ^*_{{\varvec{k}}} \, J_{{\varvec{k}}}]\) is the rate of nonlinear energy transfer into mode \({{\varvec{k}}}\).Footnote 3 Using (5.7), we can write

$$\begin{aligned} T_{{\varvec{k}}} = \sum _{\{{\varvec{p}}, {\varvec{q}}\} :{\varvec{k}}+{\varvec{p}}+{\varvec{q}}= 0} T({\varvec{k}}| {\varvec{p}}{\varvec{q}}) \,, \end{aligned}$$
(5.11)

where

$$\begin{aligned} T({\varvec{k}}| {\varvec{p}}{\varvec{q}}) = \frac{1}{2\pi } \, {\varvec{p}}^\bot \cdot {\varvec{q}}\, (q^2 - p^2) \, \mathfrak {R}[\psi _{\varvec{k}}\, \psi _{\varvec{p}}\, \psi _{\varvec{q}}] \end{aligned}$$
(5.12)

denotes the rate of energy transfer into mode \({\varvec{k}}\) from modes \(\{{\varvec{p}},{\varvec{q}}\}\) and the sum in (5.11) is taken over un-ordered sets \(\{{\varvec{p}},{\varvec{q}}\}\).

Summing up all the \(T_{\varvec{k}}\), we obtain the overall rate of nonlinear energy transfer T. Clearly, \(T=0\) as the rates of sending and receiving energy must balance across all modes. Following Fjørtoft (1953), we sort the terms in this sum according to membership in resonant triads of modes

$$\begin{aligned} \mathscr {S} = \{ \{ {\varvec{k}}, {\varvec{p}}, {\varvec{q}}\} :{{\varvec{k}}}+{\varvec{p}}+{\varvec{q}}=0 \} \,, \end{aligned}$$
(5.13)

so that

$$\begin{aligned} T = \sum _{\{{\varvec{k}}, {\varvec{p}}, {\varvec{q}}\} \in \mathscr {S}} ( T({\varvec{k}}| {\varvec{p}}{\varvec{q}}) + T({\varvec{p}}| {\varvec{k}}{\varvec{q}}) + T({\varvec{q}}| {\varvec{k}}{\varvec{p}}) ) \,. \end{aligned}$$
(5.14)

Within each triad, \({\varvec{k}}^{\perp } \cdot {\varvec{q}}=- {\varvec{p}}^{\perp } \cdot {\varvec{q}}\). This directly impliesFootnote 4 that

$$\begin{aligned} T({\varvec{p}}| {\varvec{k}}{\varvec{q}}) = -\frac{q^2-k^2}{q^2-p^2} \, T({\varvec{k}}| {\varvec{p}}{\varvec{q}}) \end{aligned}$$
(5.15a)

and

$$\begin{aligned} T({\varvec{q}}| {\varvec{k}}{\varvec{p}}) = -\frac{k^2-p^2}{q^2-p^2} \, T({\varvec{k}}| {\varvec{p}}{\varvec{q}}) \,. \end{aligned}$$
(5.15b)

These identities constrain the transfer of energy within the triad: If \(p<k<q\) and mode \({{\varvec{k}}}\) loses energy by interacting with modes \({\varvec{p}}\) and \({\varvec{q}}\), the two other modes gain energy; vice versa, if mode \({{\varvec{k}}}\) gains energy in this triad interaction, then modes \({\varvec{p}}\) and \({\varvec{q}}\) lose energy. The same holds true for enstrophy. In other words, nonlinear interactions between three modes always transfer energy and enstrophy either from or to the central component.
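As a short check of these statements, using only (5.15) and the relation \(Z_{\varvec{k}}= k^2 \, E_{\varvec{k}}\) that follows from (5.4b), (5.8), and (5.9), the energy and enstrophy transfers within a single triad sum to zero:

$$\begin{aligned} T({\varvec{k}}| {\varvec{p}}{\varvec{q}}) + T({\varvec{p}}| {\varvec{k}}{\varvec{q}}) + T({\varvec{q}}| {\varvec{k}}{\varvec{p}})&= \Bigl ( 1 - \frac{q^2-k^2}{q^2-p^2} - \frac{k^2-p^2}{q^2-p^2} \Bigr ) \, T({\varvec{k}}| {\varvec{p}}{\varvec{q}}) = 0 \,, \\ k^2 \, T({\varvec{k}}| {\varvec{p}}{\varvec{q}}) + p^2 \, T({\varvec{p}}| {\varvec{k}}{\varvec{q}}) + q^2 \, T({\varvec{q}}| {\varvec{k}}{\varvec{p}})&= \frac{k^2 (q^2-p^2) - p^2 (q^2-k^2) - q^2 (k^2-p^2)}{q^2-p^2} \, T({\varvec{k}}| {\varvec{p}}{\varvec{q}}) = 0 \,. \end{aligned}$$

For \(p<k<q\), both prefactors in (5.15) are negative, which gives the sign pattern described above. A fraction \((q^2-k^2)/(q^2-p^2)\) of the energy exchanged by mode \({\varvec{k}}\) is associated with the longer wave \({\varvec{p}}\), and the remaining fraction \((k^2-p^2)/(q^2-p^2)\) with the shorter wave \({\varvec{q}}\).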

The total transfer \(T_{\varvec{k}}\) involves all triads this mode participates in and cannot be predicted without additional arguments. Consider first the case without forcing and dissipation, and define the energy wavenumber \(k_\mathrm{e}\) as the centroid of the spectral energy density E(k):

$$\begin{aligned} k_\mathrm{e}= \frac{1}{E} \sum _k k \, E(k) \,, \end{aligned}$$
(5.16)

where we assume that the distribution of energy is isotropic in wavenumber space with E(k) denoting the energy in the shell \(k = |{\varvec{k}}|\). The second moment

$$\begin{aligned} I = \sum _k (k-k_\mathrm{e})^2 \, E(k) = Z - k_\mathrm{e}^2 \, E \end{aligned}$$
(5.17)

is expected to increase with time if energy spreads over wavenumbers. This is natural to expect for any energy spectrum that is initially spectrally localized. Conservation of energy and enstrophy implies that

$$\begin{aligned} \frac{\mathrm{d}I}{\mathrm{d}t} = - E \, \frac{\mathrm{d}k_\mathrm{e}^2}{\mathrm{d}t} \,, \end{aligned}$$
(5.18)

so when energy spreads over wavenumbers, \(k_\mathrm{e}\) decreases; i.e., energy moves to larger scales. Similarly, it can be shown that the enstrophy centroid moves downscale if and only if the second moment of enstrophy indicates a spread of the enstrophy distribution; see Vallis (2006) for details. This consideration indicates that if two-dimensional freely evolving turbulence develops cascades, we should expect an inverse energy cascade and a direct enstrophy cascade. It does not mean that there is no energy transfer to small scales or enstrophy transfer to large scales; it only means that on average energy tends to go upscale and enstrophy tends to go downscale.
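A three-shell toy example, constructed here purely for illustration (the shell wavenumbers 2, 4, 8 and the helper `moments` are assumptions, not from the text), shows the centroid argument at work. All energy starts in the shell \(k=4\) and is then redistributed to \(k=2\) and \(k=8\) in the unique way that preserves both E and Z.

```python
import numpy as np

def moments(k, E_k):
    """Return total energy E, enstrophy Z, centroid k_e (5.16), and spread I (5.17)."""
    k = np.asarray(k, dtype=float)
    E_k = np.asarray(E_k, dtype=float)
    E = E_k.sum()
    Z = (k**2 * E_k).sum()                 # uses Z(k) = k^2 E(k)
    k_e = (k * E_k).sum() / E
    I = ((k - k_e) ** 2 * E_k).sum()
    return E, Z, k_e, I

k = np.array([2.0, 4.0, 8.0])
initial = np.array([0.0, 1.0, 0.0])        # all energy in shell k = 4
spread = np.array([0.8, 0.0, 0.2])         # same E and Z, spread over k = 2 and k = 8

E0, Z0, ke0, I0 = moments(k, initial)
E1, Z1, ke1, I1 = moments(k, spread)
print("before:", E0, Z0, ke0, I0)
print("after: ", E1, Z1, ke1, I1)
```

In this example, 80% of the energy ends up at the larger scale \(k=2\), while 80% of the enstrophy (12.8 of 16) ends up at the smaller scale \(k=8\); the centroid moves from \(k_\mathrm{e}=4\) to \(k_\mathrm{e}=3.2\) as the spread I grows from 0 to \(Z - k_\mathrm{e}^2 E = 5.76\).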

In practice, turbulent flows are forced–dissipative systems.Footnote 5 They can reach a statistically steady state if dissipation is present at both spectral ends, as is envisioned in (5.6). Although the cause of infrared dissipation is not immediately apparent, in many cases its role can be efficiently played by bottom friction.Footnote 6 We thus return to the forced–dissipative case and consider the idealized situation when the forcing F is spectrally localized to a small interval around a forcing wavenumber \(k_{\mathrm f}\), infrared dissipation is localized to wavenumbers \(k < k_{\mathrm i}\), and ultraviolet dissipation is localized to \(k>k_{\mathrm u}\), with \(k_{\mathrm i}< k_{\mathrm f}< k_{\mathrm u}\).Footnote 7 Assuming statistical stationarity, the mean rate of energy injection \(\varepsilon \) is balanced by the mean rate of energy dissipation in the infrared \(\varepsilon _{\mathrm i}\) and the mean rate of energy dissipation in the ultraviolet \(\varepsilon _{\mathrm u}\), i.e.,

$$\begin{aligned} \varepsilon = \varepsilon _{\mathrm i}+ \varepsilon _{\mathrm u}\,. \end{aligned}$$
(5.19)

Likewise, writing \(\eta \) to denote the mean rate of enstrophy injection near wavenumber \(k_{\mathrm f}\), we balance with the mean enstrophy dissipation rates \(\eta _{\mathrm i}\) and \(\eta _{\mathrm u}\) in the respective dissipation ranges, so that

$$\begin{aligned} \eta = k_{\mathrm f}^2 \, \varepsilon = \eta _{\mathrm i}+ \eta _{\mathrm u}\,. \end{aligned}$$
(5.20)

Noting that \(\eta _{\mathrm i}\le k_{\mathrm i}^2 \, \varepsilon _{\mathrm i}\), we estimate

$$\begin{aligned} \eta \ge \eta _{\mathrm u}= k_{\mathrm f}^2 \, \varepsilon - \eta _{\mathrm i}\ge k_{\mathrm f}^2 \, \varepsilon - k_{\mathrm i}^2 \, \varepsilon _{\mathrm i}\ge (k_{\mathrm f}^2 - k_{\mathrm i}^2) \, \varepsilon = (1 - {k_{\mathrm i}^2}/{k_{\mathrm f}^2}) \, \eta \,. \end{aligned}$$
(5.21)

Thus, \(\eta _{\mathrm u}\rightarrow \eta \) in the limit \(k_{\mathrm i}/ k_{\mathrm f}\rightarrow 0\). Similarly, noting that \(\eta _{\mathrm u}\ge k_{\mathrm u}^2 \, \varepsilon _{\mathrm u}\), we can show that \(\varepsilon _{\mathrm i}\rightarrow \varepsilon \) when \(k_{\mathrm f}/ k_{\mathrm u}\rightarrow 0\).
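For completeness, the second limit can be written out in the same way as (5.21). Using (5.19), the bound \(\varepsilon _{\mathrm u}\le \eta _{\mathrm u}/ k_{\mathrm u}^2\), the inequality \(\eta _{\mathrm u}\le \eta \), and \(\eta = k_{\mathrm f}^2 \, \varepsilon \) from (5.20), one obtains

$$\begin{aligned} \varepsilon \ge \varepsilon _{\mathrm i}= \varepsilon - \varepsilon _{\mathrm u}\ge \varepsilon - \frac{\eta _{\mathrm u}}{k_{\mathrm u}^2} \ge \varepsilon - \frac{\eta }{k_{\mathrm u}^2} = (1 - {k_{\mathrm f}^2}/{k_{\mathrm u}^2}) \, \varepsilon \,. \end{aligned}$$

As an illustration of how quickly the bounds tighten, a scale separation of only \(k_{\mathrm i}/k_{\mathrm f}= k_{\mathrm f}/k_{\mathrm u}= 1/4\) already guarantees that at least 15/16 of the injected enstrophy is dissipated in the ultraviolet and at least 15/16 of the injected energy in the infrared.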

Thus, in the asymptotic limit \(k_{\mathrm i}\ll k_{\mathrm f}\ll k_{\mathrm u}\), there is only upscale energy transfer for \(k<k_{\mathrm f}\) and only downscale enstrophy transfer for \(k>k_{\mathrm f}\). As a result, these regimes are called the (inverse) energy range and the (direct) enstrophy range, respectively. The mean energy spectral density \(\langle E(k) \rangle \) should be such that a constant spectral energy flux is carried across each range.Footnote 8 Assuming that the mean energy transfer is spectrally local, as well as spatially homogeneous and isotropic, one expects that, in the energy range, \(\langle E(k) \rangle \) depends only on the infrared energy dissipation rate \(\varepsilon _{\mathrm i}=\varepsilon \) and on k. This led Kraichnan (1967), Leith (1968), and Batchelor (1969), hereafter KLB (following earlier arguments by Kolmogorov for classical turbulence), to conclude that the only dimensionally consistent scaling law is

$$\begin{aligned} \langle E(k) \rangle = C_E \, \varepsilon ^{\tfrac{2}{3}} \, k^{-\tfrac{5}{3}} \,. \end{aligned}$$
(5.22)

Likewise, in the enstrophy range, one expects that \(\langle E(k) \rangle \) will depend only on the ultraviolet enstrophy dissipation rate \(\eta _{\mathrm u}= \eta \) and on k, leading to the scaling law

$$\begin{aligned} \langle E(k) \rangle = C_Z \, \eta ^{\tfrac{2}{3}} \, k^{-3} \,. \end{aligned}$$
(5.23)
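The exponents in (5.22) and (5.23) follow from standard dimensional reasoning, sketched here under the assumption that \(\langle E(k) \rangle \) is an energy density per unit mass and per unit wavenumber in dimensional variables. With length L and time T, the dimensions are \([E(k)] = L^3 T^{-2}\), \([\varepsilon ] = L^2 T^{-3}\), \([\eta ] = T^{-3}\), and \([k] = L^{-1}\). Matching powers for the two ansätze gives

$$\begin{aligned} \varepsilon ^a \, k^b :\quad L^{2a-b} \, T^{-3a} = L^3 \, T^{-2}&\quad \Longrightarrow \quad a = \tfrac{2}{3} \,, \quad b = 2a - 3 = -\tfrac{5}{3} \,, \\ \eta ^a \, k^b :\quad L^{-b} \, T^{-3a} = L^3 \, T^{-2}&\quad \Longrightarrow \quad a = \tfrac{2}{3} \,, \quad b = -3 \,. \end{aligned}$$

The dimensionless constants \(C_E\) and \(C_Z\) are not determined by this argument.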

The picture outlined above has two important limitations. The first one relates to the KLB assumption that only local triad interactions (triads where p, k, and q are of comparable magnitude) contribute to the mean transfer of energy. For an individual wavenumber \({\varvec{k}}\) in the energy or the enstrophy range where forcing and dissipation are absent, we expect that the mean transfer rate \(\langle T_{\varvec{k}}\rangle \) is zero, which only means that some triads carry energy to \({\varvec{k}}\) and some from it. How they do this, however, does not really agree with the KLB picture: in real forced–dissipative two-dimensional turbulence, the contribution from non-local triads is indispensable. For a mode \({\varvec{p}}\) in the energy range, one cannot neglect triads with two long legs \({\varvec{k}}\) and \({\varvec{q}}\) in the forcing or enstrophy range. The local triads, with legs of magnitude comparable to \({\varvec{p}}\), dominate locally, but contrary to expectations their average effect does not produce the inverse energy transfer. Similarly, for mode \({\varvec{k}}\) in the enstrophy range, one cannot neglect triads with one short leg \({\varvec{p}}\) in the forcing or energy range. The first such analysis is due to Maltrud and Vallis (1993) and was corroborated by Danilov and Gurarie (2001). This means that vortices near the forcing scale are strong enough to stir small-scale vortices in the enstrophy range, which is precisely the way these smaller vortices are formed. The presence of non-locality violates the KLB argument, for it is explicitly assumed that \(\langle E(k) \rangle \) in the energy range may only depend on the mean energy flux \(\varepsilon \) and on k (and not, e.g., on the forcing range), and similarly for the enstrophy range.

The second limitation of the classical KLB picture is that in real systems, the energy and enstrophy ranges are finite. If forcing pumps energy at intermediate scale \(k_{\mathrm f}\) with finite separation from \(k_{\mathrm i}\) and \(k_{\mathrm u}\), both energy and enstrophy are transferred to large and small scales through nonlinear interactions. If the wavenumber intervals separating forcing from dissipation are sufficiently broad, most of the energy is transferred upscale and most of the enstrophy downscale. However, these intervals are never broad enough in the ocean, and the question of the amplitude of the direct energy cascade relevant to the ocean is open.

However, even on finite ranges, the classical picture is not entirely lost. The following argument due to Gkioulekas and Tung (2007) provides integral bounds on energy and enstrophy fluxes which do not depend on infinite scale separation. To ease notation, we assume a wavenumber continuum (i.e., an unbounded domain in physical space) and, as before, consider energy densities and energy transfer rate densities as a function of the wavenumber modulus k. The argument, in essence, does not depend on this assumption. In a statistically stationary state, the average \(\langle \partial _t E_{\varvec{k}}\rangle = 0\), so that, taking the time or ensemble mean of (5.10) and averaging over the shell \(|{\varvec{k}}|= k\), we obtain

$$\begin{aligned} \langle T(k) \rangle = - D(k) \, \langle E(k) \rangle - \langle P(k) \rangle \end{aligned}$$
(5.24)

where we assume that the dissipation operators depend only on k, so that we can write \(D(k) = 2 D_{\mathrm i}(k) + 2 D_{\mathrm u}(k)\). Retaining the assumption that \(D_{\mathrm i}\) is dissipating at wavenumbers smaller than \(k_{\mathrm i}\) and \(D_{\mathrm u}\) is dissipating at wavenumbers larger than \(k_{\mathrm u}\), the mean spectral energy flux

$$\begin{aligned} \varPi (k) = \int _k^\infty \langle T(\kappa ) \rangle \, \mathrm{d}\kappa = - \int _0^k \langle T(\kappa ) \rangle \, \mathrm{d}\kappa \end{aligned}$$
(5.25)

is necessarily negative for \(k<k_{\mathrm i}\) and positive for \(k>k_{\mathrm u}\). The equality between the two integrals in (5.25) holds for each realization pointwise in time due to conservation of energy in the inviscid unforced system. Analogous statements hold true for the spectral enstrophy flux. Then,

$$\begin{aligned} \int _0^k 2 \xi \, \varPi (\xi ) \, \mathrm{d}\xi&= \int _0^k (\kappa ^2 - k^2) \, \langle T (\kappa ) \rangle \, \mathrm{d}\kappa \nonumber \\&= - \int _k^\infty (\kappa ^2 - k^2) \, \langle T (\kappa ) \rangle \, \mathrm{d}\kappa \nonumber \\&= \int _k^\infty (\kappa ^2 - k^2) \, (D(\kappa ) \, \langle E(\kappa ) \rangle + \langle P(\kappa ) \rangle ) \, \mathrm{d}\kappa \,, \end{aligned}$$
(5.26)

where the first identity is obtained by exchanging the order of integration, the second identity is once again based on the conservation of energy and enstrophy in the inviscid unforced case, and the last step uses the statistical stationarity relation (5.24). When \(k>k_{\mathrm u}\), the rate of energy pumping \(P(\kappa )\) appearing in the integrand of (5.26) is zero, while the contribution from dissipation is negative. Thus, for every \(k>k_{\mathrm u}\) and, trivially, for every \(k < k_{\mathrm i}\),

$$\begin{aligned} \int _0^k \xi \, \varPi (\xi ) \, \mathrm{d}\xi < 0 \,. \end{aligned}$$
(5.27)

Due to the weight in this integral, we see that the upscale flux of energy for \(k<k_{\mathrm i}\) must be typically larger than the downscale flux of energy for \(k>k_{\mathrm u}\). A similar inequality shows that the enstrophy flux is predominantly downscale (Gkioulekas and Tung 2007).
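The bound (5.27) can be illustrated numerically. The following is a minimal sketch, not a turbulence simulation: it assumes a synthetic transfer function built from three Gaussian bumps at hypothetical wavenumbers (an infrared sink, a forcing band, and an ultraviolet sink), with amplitudes chosen so that the transfer conserves energy and enstrophy. All names and numerical values are illustrative assumptions.

```python
import numpy as np

def cumulative_trapezoid(y, x):
    """Cumulative trapezoidal integral of y over x, starting from zero."""
    increments = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    return np.concatenate(([0.0], np.cumsum(increments)))

# Wavenumber grid and hypothetical spectral locations (illustrative values only).
kappa = np.linspace(0.0, 200.0, 20001)
k_i, k_f, k_u = 5.0, 30.0, 120.0

def bump(center, width):
    return np.exp(-0.5 * ((kappa - center) / width) ** 2)

g_i, g_f, g_u = bump(k_i, 1.0), bump(k_f, 2.0), bump(k_u, 5.0)

def integral(y):
    return cumulative_trapezoid(y, kappa)[-1]

# Synthetic transfer T = a_i*g_i - g_f + a_u*g_u: energy leaves the forcing
# band and arrives in the two dissipation bands. The amplitudes follow from
# the constraints int T = 0 (energy) and int kappa^2 T = 0 (enstrophy).
matrix = np.array([[integral(g_i), integral(g_u)],
                   [integral(kappa**2 * g_i), integral(kappa**2 * g_u)]])
rhs = np.array([integral(g_f), integral(kappa**2 * g_f)])
a_i, a_u = np.linalg.solve(matrix, rhs)
transfer = a_i * g_i - g_f + a_u * g_u

flux = -cumulative_trapezoid(transfer, kappa)                # Pi(k), cf. (5.25)
weighted = cumulative_trapezoid(2.0 * kappa * flux, kappa)   # left side of (5.26)

upscale = -flux[1500]    # magnitude of Pi at k = 15, between k_i and k_f
downscale = flux[8000]   # Pi at k = 80, between k_f and k_u
print(f"upscale flux {upscale:.3f}, downscale flux {downscale:.3f}")
```

In this synthetic setting, the two conservation constraints alone force most of the injected energy toward the infrared sink, in line with the weighted inequality.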

Whether or not inertial ranges can be observed depends on the spectral loci of dissipation and forcing. In particular, when an inverse cascade is observed, it only means that some energy dissipation is located at smaller wavenumbers than energy forcing. As a rule, dissipation and forcing are spread over wavenumbers and may even intersect. Moreover, when forcing extends up to the spectral cutoff \(k_{\max }\) of a simulation, the small direct cascade of energy may be partially hidden. Thus, in most cases, clean inertial ranges are absent. And even when inertial ranges void of dissipation and forcing exist, the observed spectra may deviate from the KLB predictions because non-local triad interactions are always present and may be significant for finite ranges; see the discussion and examples in Danilov (2005). We conclude that spectral slopes alone tell very little about the nature of the underlying dynamics, and one must turn to exploring the distribution of forcing and dissipation over scales.

Let us comment briefly on the case when \(\beta \ne 0\). In this situation, energy is channeled into zonal modes and large-scale features become highly anisotropic: jets appear near the Rhines scale \(L_{{{\text {Rh}}}}=E^{1/4}/\beta ^{1/2}\), which is several hundred km for ocean conditions (see Rhines 1975 and the discussion in Danilov and Gurarie 2004). Smaller scales are largely unaffected.

As far as subgrid closures are concerned, the framework of two-dimensional incompressible turbulence implies that the closures should be consistent with the \(k^{-3}\) power law in the enstrophy range. The extent to which this is possible with traditional closures is explored by Graham and Ringler (2013). The degree to which this is relevant to the dynamics of the real ocean remains an open question, for dynamics at scales smaller than the internal Rossby radius develop an ageostrophic component.

5.3.2 Two-Layer Geostrophic Flows

In the presence of stratification, the situation becomes more complex. The general picture as presented in textbooks (see Salmon 1998; Vallis 2006) is as follows. On scales larger than the first internal Rossby radius \(L_\mathrm{d}\), there is a direct cascade of baroclinic (available potential plus kinetic) energy and an inverse cascade of barotropic (kinetic) energy. The baroclinic cascade is maintained through instabilities that release the available potential energy from an existing pool. It feeds the barotropic cascade at scales around \(L_\mathrm{d}\) via the mechanism of baroclinic instability. This energy is transferred upscale where it is dissipated. On scales smaller than \(L_\mathrm{d}\), the layers interact only weakly and behave similarly to two-dimensional turbulence discussed above. In this regime, the dynamics are governed by the direct enstrophy cascade, implying the scaling exponent \(-3\) for the modal or layer kinetic energy spectra. We note that this implies the absence of a direct energy cascade at these scales.

In this section, we discuss these concepts in the simplest possible setting, the two-layer quasigeostrophic model. It is essential that the two-layer model allows for a coupling between the eddy potential energy dynamics and the eddy kinetic energy. In this sense, it represents a minimum model for the real dynamics in ocean and atmosphere.

The two-layer QG model introduces important corrections to the single-layer situation explained in Section 5.3.1 above. First, it shows that the concept of spectrally localized forcing does not work, for the energy is supplied to the system over a broad range of scales, with the maximum spectral density of pumping shifted toward the scale of the energy spectrum maximum. Second, the notion of cascade has to be adjusted, for predictions are made for the baroclinic and barotropic energies, not for the layers.

For simplicity, we assume the layer depths are equal. The two-layer system can then be written as

$$\begin{aligned} \partial _t q_i + [\psi _i, q_i]&= F_i + \delta _{2i} \, D_{\mathrm i}\psi _i + D_{\mathrm u}\psi _i + (-1)^{i+1} \, \kappa \, (\psi _1-\psi _2) \,, \end{aligned}$$
(5.28a)
$$\begin{aligned} q_i&= f_0 + \beta y + \varDelta \psi _i + (-1)^i \, k_\mathrm{d}^2 \, (\psi _1-\psi _2)/2 \,, \end{aligned}$$
(5.28b)

where \(i \in \{1, 2\}\),

$$\begin{aligned} k_\mathrm{d}= \frac{1}{L_\mathrm{d}} = \frac{\sqrt{8} f_0}{N_0H} \end{aligned}$$
(5.29)

is the inverse of the baroclinic Rossby radius, \(f_0\) is the Coriolis frequency, \(N_0\) is the typical Brunt–Väisälä frequency, and H is the total fluid depth; see, e.g., Franzke et al. (2019) for details.
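To give a sense of magnitudes, the following sketch evaluates (5.29) for a set of hypothetical mid-latitude values; the numbers and the function name are illustrative assumptions and are not taken from any particular data set.

```python
import math

def deformation_wavenumber(f0, n0, depth):
    """k_d = sqrt(8) * f0 / (N0 * H) for two layers of equal depth, cf. (5.29)."""
    return math.sqrt(8.0) * f0 / (n0 * depth)

# Hypothetical mid-latitude values, chosen only for illustration.
f0 = 1.0e-4      # Coriolis parameter in 1/s
n0 = 2.0e-3      # Brunt-Vaisala frequency in 1/s
depth = 4000.0   # total fluid depth in m

k_d = deformation_wavenumber(f0, n0, depth)
rossby_radius_km = 1.0 / k_d / 1000.0
print(f"k_d = {k_d:.2e} 1/m, L_d = {rossby_radius_km:.1f} km")
```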

We remark that when diagnosed using the leading-order per-layer geostrophic balance relation, the difference in layer stream functions, \(\psi _1-\psi _2\), is proportional to the displacement of the interface between the layers. Thus, the last term in (5.28b) can be interpreted as the contribution to potential vorticity perturbations from the layer interface and is referred to as the stretching term.

We think of infrared dissipation \(D_{\mathrm i}\) acting as bottom drag only on the lower layer. Then, \(D_{\mathrm i}= -\lambda \varDelta \) with \(\lambda \) the bottom drag coefficient. Ultraviolet dissipation is typically modeled by hyperviscosity of some order \(n \ge 2\), so that \(D_{\mathrm u}= \nu (-\varDelta )^n\) with hyperviscosity coefficient \(\nu \). The last term in (5.28a) models thermal relaxation of the layer interface, with \(2\kappa /k_\mathrm{d}^2\) the inverse timescale. It restores interface displacement and thus enters the layer equations with the opposite sign.

Although the ocean is mainly driven by wind stress applied to the upper layer, a theoretically simpler situation occurs when the interface between layers is relaxed toward a position with a uniform slope, i.e., taking \(F_i=-(-1)^{i} \, \kappa U \, y\), with y the meridional coordinate. Equation (5.28a) in this case has an equilibrium solution \(\psi _1-\psi _2=-U y\), which in the presence of bottom drag implies \(\psi _1=-U y\) and \(\psi _2=0\). The velocity U defines the vertical shear and interfacial slope in the two-layer QG model. This equilibrium solution corresponds to a pool of available potential energy (APE) and can be baroclinically unstable.

Splitting the stream functions into the equilibrium stream functions and perturbation or “eddy” stream functions \(\psi _1^{{\text {eddy}}}\) and \(\psi _2^{{\text {eddy}}}\), we write

$$\begin{aligned} \psi _1 = -yU + \psi _1^{{\text {eddy}}}\qquad \text {and} \qquad \psi _2 = \psi _2^{{\text {eddy}}}\,. \end{aligned}$$
(5.30)

Further, it is useful to rewrite the system in terms of the eddy barotropic stream function \(\psi \) and the eddy baroclinic stream function \(\tau \), respectively, defined by

$$\begin{aligned} \psi = \frac{\psi _1^{{\text {eddy}}}+ \psi _2^{{\text {eddy}}}}{2} \qquad \text {and} \qquad \tau = \frac{\psi _1^{{\text {eddy}}}- \psi _2^{{\text {eddy}}}}{2} \,, \end{aligned}$$
(5.31)

and the corresponding eddy barotropic potential vorticity q and eddy baroclinic potential vorticity \(\omega \) defined as

$$\begin{aligned} q = \varDelta \psi \qquad \text {and} \qquad \omega = \varDelta \tau - k_\mathrm{d}^2 \, \tau \,. \end{aligned}$$
(5.32)

We note that the stretching term from (5.28b) appears as the second term in the definition of \(\omega \).

Substituting (5.30) into (5.28), writing out the sum and the difference of the layer equations, and rewriting all expressions in terms of the modal stream functions (5.31) and their associated potential vorticities, we obtain

$$\begin{aligned}&\quad \partial _t q + [\psi ,q] + [\tau ,\omega ] + \frac{U}{2} \, \partial _x (q+\varDelta \tau ) + \beta \, \partial _x\psi = \frac{1}{2} \, D_{\mathrm i}(\psi -\tau ) + D_{\mathrm u}\psi \,, \end{aligned}$$
(5.33a)
$$\begin{aligned}&\partial _t \omega + [\psi ,\omega ] + [\tau ,q] + \frac{U}{2} \, \partial _x(\omega + q + k_\mathrm{d}^2 \, \psi ) + \beta \, \partial _x \tau = - \frac{1}{2} \, D_{\mathrm i}(\psi -\tau ) + D_{\mathrm u}\tau + 2 \kappa \tau \,. \end{aligned}$$
(5.33b)
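The intermediate step behind the terms proportional to U in (5.33) can be sketched as follows, writing only the left-hand sides. The symbols \(\varPsi _i\) and \(Q_i\) for the equilibrium stream function and potential vorticity are introduced here for illustration only. With \(\varPsi _1 = -Uy\) and \(\varPsi _2 = 0\), expression (5.28b) gives \(Q_i = f_0 + \beta y - (-1)^i \, k_\mathrm{d}^2 \, U y/2\), so that the eddy part of each layer equation has the left-hand side

$$\begin{aligned} \partial _t q_i^{\text {eddy}} + [\psi _i^{\text {eddy}}, q_i^{\text {eddy}}] + \delta _{1i} \, U \, \partial _x q_i^{\text {eddy}} + \Bigl ( \beta - (-1)^i \, \frac{k_\mathrm{d}^2 \, U}{2} \Bigr ) \, \partial _x \psi _i^{\text {eddy}} \,, \end{aligned}$$

where \(q_1^{\text {eddy}} = q + \omega \) and \(q_2^{\text {eddy}} = q - \omega \). Taking half the sum and half the difference of the two layers, and using \(\omega + k_\mathrm{d}^2 \, \tau = \varDelta \tau \), yields

$$\begin{aligned} \frac{U}{2} \, \partial _x (q + \omega ) + \frac{k_\mathrm{d}^2 \, U}{2} \, \partial _x \tau&= \frac{U}{2} \, \partial _x (q + \varDelta \tau ) \,, \\ \frac{U}{2} \, \partial _x (q + \omega ) + \frac{k_\mathrm{d}^2 \, U}{2} \, \partial _x \psi&= \frac{U}{2} \, \partial _x (\omega + q + k_\mathrm{d}^2 \, \psi ) \,. \end{aligned}$$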

In the following, we will endow the perturbation quantities with doubly periodic boundary conditions. This is possible because the forcing terms, which are non-periodic in the y-direction, drop out of the equations for the perturbation quantities. However, the information on forcing is retained in the terms proportional to U.

The barotropic equation (5.33a) contains self-advection (i.e., the advection of barotropic eddy PV by the barotropic velocity field), whereas the baroclinic equation (5.33b) is linear in the baroclinic variables. Thus, barotropic dynamics are similar to two-dimensional vorticity dynamics characterized by an inverse energy cascade, whereas baroclinic dynamics are similar to the advection of a passive tracer which possesses a direct energy cascade.

As in Section 5.3.1, we consider the modal energy balances for the barotropic (kinetic) energy

$$\begin{aligned} E^\psi = \sum _{{\varvec{k}}\in \mathbb {Z}^2} E_{\varvec{k}}^\psi = - \frac{1}{2} \sum _{{\varvec{k}}\in \mathbb {Z}^2} \psi _{\varvec{k}}^* \, q_{\varvec{k}}\end{aligned}$$
(5.34)

and baroclinic energy

$$\begin{aligned} E^\tau = \sum _{{\varvec{k}}\in \mathbb {Z}^2} E_{\varvec{k}}^\tau = - \frac{1}{2} \sum _{{\varvec{k}}\in \mathbb {Z}^2} \tau _{\varvec{k}}^* \, \omega _{\varvec{k}}= \frac{1}{2} \sum _{{\varvec{k}}\in \mathbb {Z}^2} (k^2 + k_\mathrm{d}^2) \, |\tau _{\varvec{k}}|^2 \,, \end{aligned}$$
(5.35)

where the contribution prefactored by \(k^2\) is baroclinic kinetic energy and the contribution prefactored by \(k_\mathrm{d}^2\) is available potential energy. Taking the Fourier transform of the barotropic and baroclinic equations, multiplying with \(\psi ^*_{{\varvec{k}}}\) and \(\tau ^*_{{\varvec{k}}}\), respectively, and taking the real part, we obtain

$$\begin{aligned}&\,\partial _t E^{\psi }_{{\varvec{k}}} = T^\psi _{\varvec{k}}+ C_{\varvec{k}}^\psi + D_{\varvec{k}}^\psi \,, \end{aligned}$$
(5.36a)
$$\begin{aligned}&\partial _t E^{\tau }_{{\varvec{k}}} = T_{\varvec{k}}^\tau + C_{\varvec{k}}^\tau + G_{\varvec{k}}+ D_{\varvec{k}}^\tau \,. \end{aligned}$$
(5.36b)

The terms

$$\begin{aligned}&\qquad T^\psi _{\varvec{k}}= \mathfrak {R}[\psi _{\varvec{k}}^* \, J_{\varvec{k}}(\psi , q)] \,, \end{aligned}$$
(5.37a)
$$\begin{aligned}&T^\tau _{\varvec{k}}= \mathfrak {R}[\tau _{\varvec{k}}^* \, J_{\varvec{k}}(\tau , q)] - k_\mathrm{d}^2 \, \mathfrak {R}[\tau _{\varvec{k}}^* \, J_{\varvec{k}}(\psi , \tau )] \end{aligned}$$
(5.37b)

with \({\varvec{k}}= (k_x, k_y)\) describe energy transfer within the barotropic and baroclinic modes,

$$\begin{aligned} C_{\varvec{k}}^\psi&= \mathfrak {R}[\psi _{\varvec{k}}^* \, J_{\varvec{k}}(\tau , \omega )] - \frac{U}{2} \, k^2 \, \mathfrak {R}[{\mathrm i}k_x \, \psi _{\varvec{k}}^* \, \tau _{\varvec{k}}] \,, \end{aligned}$$
(5.37c)
$$\begin{aligned} C_{\varvec{k}}^\tau&= \mathfrak {R}[\tau _{\varvec{k}}^* \, J_{\varvec{k}}(\psi , \varDelta \tau )] - \frac{U}{2} \, k^2 \, \mathfrak {R}[{\mathrm i}k_x \, \tau _{\varvec{k}}^* \, \psi _{\varvec{k}}] \end{aligned}$$
(5.37d)

describe transfer from baroclinic to barotropic modes and vice versa, respectively,

$$\begin{aligned} G_{\varvec{k}}= \frac{U}{2} \, k_\mathrm{d}^2 \, \mathfrak {R}[{\mathrm i}k_x \, \tau _{\varvec{k}}^* \, \psi _{\varvec{k}}] \end{aligned}$$
(5.37e)

represents the generation of energy, and all dissipative terms are subsumed into \(D_{\varvec{k}}^\psi \) and \(D_{\varvec{k}}^\tau \).

One can readily see that the generation term is proportional to the meridional buoyancy flux, which tends to level off the layer interface (for APE has to be released) if the system is baroclinically unstable. In this case, its mean value has to be positive in a statistically stationary state. Note that \(G_{\varvec{k}}\) is defined by the dynamics and is not an external parameter as in 2D barotropic turbulence theory.

Since

$$\begin{aligned} \sum _{{\varvec{k}}\in \mathbb {Z}^2} T^\psi _{\varvec{k}}= \sum _{{\varvec{k}}\in \mathbb {Z}^2} T^\tau _{\varvec{k}}= 0 \,, \end{aligned}$$
(5.38)

these two terms only redistribute energy between scales. Likewise,

$$\begin{aligned} \sum _{{\varvec{k}}\in \mathbb {Z}^2} C_{\varvec{k}}^\psi = - \sum _{{\varvec{k}}\in \mathbb {Z}^2} C_{\varvec{k}}^\tau \,, \end{aligned}$$
(5.39)

so that these terms only redistribute energy between baroclinic and barotropic modes.
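The sum rules (5.38) and (5.39) can be checked numerically on a doubly periodic grid. The following is a minimal pseudo-spectral sketch under stated assumptions: the fields are random and band-limited (so that triple products are free of aliasing), the grid size, \(k_\mathrm{d}\), and U are hypothetical values, and the unnormalized FFT convention of NumPy is used, which affects the magnitude of the terms but not whether their sums vanish.

```python
import numpy as np

n, k_max, k_d, shear = 64, 10, 4.0, 1.0   # hypothetical grid size and parameters
rng = np.random.default_rng(0)
k1d = np.fft.fftfreq(n, d=1.0 / n)        # integer wavenumbers on a 2*pi-periodic box
kx, ky = np.meshgrid(k1d, k1d, indexing="ij")
k2 = kx**2 + ky**2
band = (np.abs(kx) <= k_max) & (np.abs(ky) <= k_max)   # 3*k_max < n avoids aliasing

def random_field():
    """Real random field containing only wavenumbers inside the band."""
    return np.fft.ifft2(np.fft.fft2(rng.standard_normal((n, n))) * band).real

def derivative(a, k):
    return np.fft.ifft2(1j * k * np.fft.fft2(a)).real

def laplacian(a):
    return np.fft.ifft2(-k2 * np.fft.fft2(a)).real

def jacobian(a, b):
    return derivative(a, kx) * derivative(b, ky) - derivative(a, ky) * derivative(b, kx)

def modal(a, b):
    """Re[a_k^* b_k] for every wave vector k (unnormalized FFT convention)."""
    return (np.conj(np.fft.fft2(a)) * np.fft.fft2(b)).real

psi, tau = random_field(), random_field()
q = laplacian(psi)
omega = laplacian(tau) - k_d**2 * tau
psi_k, tau_k = np.fft.fft2(psi), np.fft.fft2(tau)

# Transfer and conversion terms following (5.37a)-(5.37d).
T_psi = modal(psi, jacobian(psi, q))
T_tau = modal(tau, jacobian(tau, q)) - k_d**2 * modal(tau, jacobian(psi, tau))
C_psi = (modal(psi, jacobian(tau, omega))
         - 0.5 * shear * k2 * (1j * kx * np.conj(psi_k) * tau_k).real)
C_tau = (modal(tau, jacobian(psi, laplacian(tau)))
         - 0.5 * shear * k2 * (1j * kx * np.conj(tau_k) * psi_k).real)

# Relative size of each sum compared with the sum of absolute values.
for name, values in (("T_psi", T_psi), ("T_tau", T_tau), ("C_psi + C_tau", C_psi + C_tau)):
    print(name, abs(values.sum()) / np.abs(values).sum())
```

The individual modal terms are of order one in relative terms, while the three sums vanish to rounding accuracy, which is the discrete counterpart of the statement that these terms only redistribute energy.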

In the traditional view of baroclinic turbulence (Rhines 1977; Salmon 1980), one introduces spectral energy fluxes analogous to (5.25),

$$\begin{aligned} \varPi ^\psi (k) = - \int _0^k T^\psi (\kappa ) \, \mathrm{d}\kappa \qquad \text {and} \qquad \varPi ^\tau (k) = - \int _0^k T^\tau (\kappa ) \, \mathrm{d}\kappa \,, \end{aligned}$$
(5.40)

describing the redistribution of energy between scales. There are numerous publications discussing the behavior of fluxes in this situation (e.g., Scott and Arbic 2007). The barotropic flux \(\varPi ^\psi \) can be shown to be negative at \(k<k_\mathrm{d}\) corresponding to an inverse cascade of barotropic energy, while the baroclinic flux \(\varPi ^\tau \) is always positive corresponding to a direct cascade of full (i.e., potential and kinetic) baroclinic energy. Although there is an upscale (i.e., toward large scales) transfer of barotropic kinetic energy, there is no inertial range at \(k<k_\mathrm{d}\) because the transfer of energy from the baroclinic into barotropic mode is spread over all wavenumbers, being stronger at smaller k.

Thus, no spectral law can be predicted for the inverse cascade in this case. In contrast, on scales smaller than \(L_\mathrm{d}\) the stretching term in the expression for the quasigeostrophic potential vorticity becomes small compared to the relative vorticity and, as already mentioned, each layer behaves as in two dimensions implying the scaling exponent \(-3\) for the kinetic energy.

This picture relies on the fact that the assumed forcing maintains a pool of available potential energy which is then transferred to eddies through baroclinic instability, which develops into a nonlinear regime of quasistationary balance between the release of potential energy, nonlinear transfer, and dissipation. In general, forcing will drive both barotropic and baroclinic components of the mean flow. But even if forcing is only baroclinic, as is the case here, a mean barotropic flow is created in the presence of friction and/or topography. For uniform shear, the release of APE through baroclinic instability is the main source of energy for the eddies, but the kinetic energy of the mean flow may also be important in general.

The picture described so far is tied to the choice of writing the fields in terms of barotropic and baroclinic modes. Arguments will differ when looking at the transfer of energy between layers or between kinetic and potential energy. In particular, the sum of transfers between modes is zero only when integrated over wavenumbers. This explains why the picture of transfers will be modified if considered for layers (there will be transfers between the layers) and for the total energy (when baroclinic and barotropic kinetic energies will be combined, and potential energy split off the baroclinic energy).

The total energy at wave vector \({\varvec{k}}\),

$$\begin{aligned} E_{{\varvec{k}}} = \frac{1}{4} \, k^2 \, (|\psi _1 |^2_{{\varvec{k}}} + |\psi _2 |^2_{{\varvec{k}}}) + \frac{1}{2} \, k_\mathrm{d}^2 \, |\tau |^2_{{\varvec{k}}} \end{aligned}$$
(5.41)

has a rate of change

$$\begin{aligned} \partial _t E_{{\varvec{k}}}=T^K_{{\varvec{k}}}+T^P_{{\varvec{k}}}+G_{{\varvec{k}}}+D^{\mathrm i}_{\varvec{k}}+ D^{\mathrm u}_{\varvec{k}} \end{aligned}$$
(5.42)

with transfer rates

$$\begin{aligned} T^K_{{\varvec{k}}}&= \frac{1}{2} \, \mathfrak {R}\bigl [ \psi _{1{\varvec{k}}}^* \, J_{{\varvec{k}}}(\psi _1,\varDelta \psi _1) + \psi _{2{\varvec{k}}}^* \, J_{{\varvec{k}}}(\psi _2,\varDelta \psi _2) \bigr ] \,, \end{aligned}$$
(5.43a)
$$\begin{aligned}&\quad T^P_{{\varvec{k}}} = -k_\mathrm{d}^2 \, \mathfrak {R}\bigl [ \tau ^*_{{\varvec{k}}} \, J_{{\varvec{k}}}(\psi ,\tau ) \bigr ] \,, \end{aligned}$$
(5.43b)

a generation term \(G_{\varvec{k}}\) as before, and rates of frictional (infrared) dissipation \(D^{\mathrm i}_{\varvec{k}}\) and viscous (ultraviolet) dissipation \(D^{\mathrm u}_{\varvec{k}}\).
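That the layer form (5.41) agrees mode by mode with the sum of the barotropic and baroclinic energies (5.34) and (5.35) can be verified with random Fourier coefficients. The sketch below is only a consistency check; the sample size, the value of \(k_\mathrm{d}\), and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
k_d = 3.0                                   # hypothetical deformation wavenumber
k2 = rng.uniform(0.5, 50.0, size=1000)      # squared wavenumbers of 1000 sample modes

def random_complex(size):
    return rng.standard_normal(size) + 1j * rng.standard_normal(size)

# Eddy stream function coefficients in the two layers and the modal ones, cf. (5.31).
psi1, psi2 = random_complex(1000), random_complex(1000)
psi, tau = 0.5 * (psi1 + psi2), 0.5 * (psi1 - psi2)

# Sum of barotropic (5.34) and baroclinic (5.35) energies per mode.
modal_energy = 0.5 * k2 * np.abs(psi)**2 + 0.5 * (k2 + k_d**2) * np.abs(tau)**2
# Total energy per mode written in layer variables (5.41).
layer_energy = (0.25 * k2 * (np.abs(psi1)**2 + np.abs(psi2)**2)
                + 0.5 * k_d**2 * np.abs(tau)**2)

print(np.allclose(modal_energy, layer_energy))
```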

Fig. 5.2 Spectral energy fluxes corresponding to the fluxes in (5.42), integrated over the wavenumber shell \(|{\varvec{k}}|= k\). Figure adapted from Jansen and Held (2014), their Figure 4. Note that the vertical axis shows \(k \, \partial _t E_k\), so that area under the curve in singly logarithmic scaling corresponds to total transfer rates. Note further that the scale of wavenumbers k shown is normalized by \(2\pi /L\), where L is the domain size. The deformation scale \(k_\mathrm{d}=1/L_\mathrm{d}\) is marked by the vertical dotted line

Intermodal or interlayer transfers are now included in the kinetic and potential energy transfers. The emerging picture is perhaps the most transparent; see Figure 5.2. It shows that generation is nearly compensated by large-scale dissipation; that the flux of eddy potential energy (EPE) is direct, for it takes the generated eddy energy \(G_{{\varvec{k}}}\) and carries it to larger wavenumbers, gradually releasing it to kinetic energy; and that the flux of eddy kinetic energy (EKE) is inverse, for it takes the released potential energy and carries it back to the interval of small wavenumbers where it is dissipated. It is important to note that transfers into EKE and from EPE are centered at \(k_\mathrm{d}\) and occupy at least one octave of wavenumbers more on the short-wave side. Simulations by Jansen and Held (2014) demonstrate a spectrum of barotropic EKE with a slope close to but steeper than \(-3\), starting from \(k_\mathrm{d}\) and extending toward larger wavenumbers. Yet, a substantial part of the interval where this spectrum is observed is where the transfers take place, i.e., where there cannot be an inertial range. In other words, the existence of a well-defined spectral slope is not an indicator of an inertial range, which is frequently forgotten.

Although the theoretical prediction of the inverse cascade is formally made for the barotropic kinetic energy, it is commonly observed for baroclinic kinetic energy and for layer kinetic energies. This behavior is clarified by Scott and Arbic (2007).

We see that if there is hope for an interval of self-similar behavior in layer QG dynamics, such behavior should be on the side of large wavenumbers and be consistent with the \(-3\) spectral law. However, the two-layer setup indicates very clearly that the transfer from EPE to EKE involves wavenumbers around \(k_\mathrm{d}\) or larger. For this reason, this spectral law and the self-similar behavior of an inertial range can only be expected to hold for wavenumbers substantially larger than \(k_\mathrm{d}\), which come too close to the scales where ageostrophy is important in the real ocean. The wavelength associated with \(k_\mathrm{d}=1/L_\mathrm{d}\) is \(2\pi L_\mathrm{d}\). On meshes with spacing \(a=L_\mathrm{d}\), this wavelength is well resolved, but this extra resolution is needed just to accommodate the spectral exchanges between EPE and EKE. In practice, in ocean circulation models, the resolution of \(k_\mathrm{d}^{-1}\) is not always (or not everywhere) achieved. In this case, eddy dynamics may suffer not only from excessive subgrid dissipation but also from the mere fact that the interval where EPE has to feed EKE is too short. The spectral interval where most of the generation (conversion from the available potential energy to the EPE) takes place tends to be at wavenumbers smaller than \(k_\mathrm{d}\). Yet, as shown by Jansen and Held (2014), the generation turns out to be sensitive to dissipation in the vicinity of \(k_\mathrm{d}\). We propose that the ability of subgrid closures to interfere as little as possible with energy generation presents a convenient guiding principle in these cases.
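The resolution argument can be made concrete with a few lines of arithmetic. The sketch below counts how many grid intervals span the wavelength \(2\pi L_\mathrm{d}\); the function name, the Rossby radii, and the mesh spacing of 10 km are hypothetical values chosen only for illustration.

```python
import math

def points_per_deformation_wavelength(rossby_radius, grid_spacing):
    """Number of grid intervals spanning the wavelength 2*pi*L_d."""
    return 2.0 * math.pi * rossby_radius / grid_spacing

# Hypothetical Rossby radii (km) on a hypothetical mesh with 10 km spacing.
grid_spacing_km = 10.0
for rossby_radius_km in (50.0, 25.0, 10.0, 5.0):
    n_points = points_per_deformation_wavelength(rossby_radius_km, grid_spacing_km)
    print(f"L_d = {rossby_radius_km:5.1f} km: {n_points:4.1f} intervals per wavelength")
```

For a mesh spacing equal to \(L_\mathrm{d}\), the count is \(2\pi \), i.e., about six intervals per wavelength; where the Rossby radius drops below the mesh spacing, the count falls accordingly.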

An important parameterization for relatively coarse, non-eddy-permitting ocean simulations was introduced by Gent and McWilliams (1990); it is now known as the Gent–McWilliams parameterization. Here, we explain the idea in the context of the two-layer model (5.28). On scales larger than \(L_\mathrm{d}\), the relative vorticity is expected to be small compared to the stretching term, the last term in (5.28b), which models perturbations of the layer interface. Correspondingly, the dominant nonlinear contribution to (5.28a) is the divergence of the thickness flux

$$\begin{aligned} (-1)^i \, k_\mathrm{d}^2 \, [\psi ^{{\text {eddy}}}_i, \tfrac{1}{2} \, (\psi ^{{\text {eddy}}}_1-\psi ^{{\text {eddy}}}_2)] = (-1)^i \, k_\mathrm{d}^2 \, [\psi , \tau ] = (-1)^i \, k_\mathrm{d}^2 \, \varvec{\nabla }\cdot (\tau \, \varvec{\nabla }^\bot \psi ) \,. \end{aligned}$$
(5.44)

This term will be very small if mesoscale eddies are not well resolved. The proposal of Gent and McWilliams (1990) amounts to adding a flux divergence of the form

$$\begin{aligned} \mathscr {F}_i = (-1)^{i} \, k_{\mathrm{d}}^2 \, \varvec{\nabla }\cdot (\varkappa \, \varvec{\nabla }\tau ) \end{aligned}$$
(5.45)

to the right-hand sides of the two-layer equations as a parameterization for the effect of unresolved eddies on the resolved flow. The coefficient \(\varkappa \) is sometimes taken to be constant. More frequently, however, it is chosen as a polynomial in the velocity difference \(U = |{\varvec{u}}_1 - {\varvec{u}}_2 |\), motivated by qualitative theory, with the degree and the coefficients of the polynomial chosen empirically. In this case, \(\varkappa \) is a measure of vertical instability in the system; see, e.g., Stone (1972), Cessi (2008), and Held and Larichev (1996).

By construction, the \(\mathscr {F}_i\) model only the subgrid layer thickness flux, not the full potential vorticity flux. They provide a sink for potential energy, thus emulating the effect of baroclinic instability on the potential energy balance in a model that is too coarse to resolve this process directly. This technique prevents the buildup of an unlimited pool of available potential energy, but does not model Reynolds stresses nor does it feed energy back into the pool of resolved eddy kinetic energy.

Note that while (5.45) looks like diffusion, it acts on the layer thickness. Whenever the interface between two layers is inclined, thickness diffusion levels the interface. This implies that fluid moves in opposite directions in the two layers, showing that the Gent–McWilliams parameterization creates a circulation that tends to flatten isopycnals. Thus, while thickness diffusion proceeds in two dimensions, the generated fluid motion is three-dimensional and advective. See Gent (2011) for a detailed discussion.
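The leveling effect can be illustrated in a heavily reduced setting. In the large-scale limit, where \(\omega \approx -k_\mathrm{d}^2 \, \tau \), half the difference of the two layer equations with only the terms \(\mathscr {F}_i\) retained reduces to \(\partial _t \tau = \varvec{\nabla }\cdot (\varkappa \, \varvec{\nabla }\tau )\). The following sketch integrates this equation in one periodic dimension with constant \(\varkappa \) and monitors the available potential energy \(\tfrac{1}{2} k_\mathrm{d}^2 \tau ^2\). The one-dimensional reduction, the forward-Euler finite-difference scheme, and all numerical values are assumptions made for illustration.

```python
import numpy as np

n, length = 128, 1.0             # hypothetical periodic domain
kappa_gm, k_d = 1.0e-3, 20.0     # hypothetical thickness diffusivity and deformation wavenumber
dx = length / n
x = np.arange(n) * dx
tau = np.sin(2 * np.pi * x / length) + 0.3 * np.cos(6 * np.pi * x / length)
dt = 0.2 * dx**2 / kappa_gm      # within the explicit stability limit 0.5 * dx^2 / kappa

def step(tau):
    """One forward-Euler step of d(tau)/dt = d/dx (kappa d(tau)/dx)."""
    flux = kappa_gm * (np.roll(tau, -1) - tau) / dx     # flux at cell faces i + 1/2
    return tau + dt * (flux - np.roll(flux, 1)) / dx

def potential_energy(tau):
    """Domain-mean available potential energy (1/2) k_d^2 tau^2."""
    return 0.5 * k_d**2 * np.mean(tau**2)

initial_mean = tau.mean()
ape = [potential_energy(tau)]
for _ in range(2000):
    tau = step(tau)
    ape.append(potential_energy(tau))
ape = np.array(ape)
print(f"APE ratio final/initial: {ape[-1] / ape[0]:.3f}")
```

The mean of \(\tau \) is conserved by the flux form, while the potential energy decreases at every step. No kinetic energy equation appears in this sketch, which mirrors the statement that the removed potential energy is not returned to the resolved eddies.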

5.3.3 Continuously Stratified and Surface QG Dynamics

Even within the quasigeostrophic family of models, the picture presented so far is not the end of the story. First, when allowing for continuous stratification, there are many baroclinic vertical modes. Second, there are surface-trapped motions that can be understood in the framework of surface quasigeostrophic (SQG) dynamics; see, e.g., a discussion and further references in von Storch et al. (2019).

For simplicity, we consider the three-dimensional QG equations on a layer of uniform depth H with rigid lid upper boundary condition at \(z=0\). The model then reads

$$\begin{aligned}&\qquad \partial _t q + [\psi , q] = 0 \,, \end{aligned}$$
(5.46a)
$$\begin{aligned}&q = f + \varDelta _{\mathrm h}\psi + f_0^2 \, \partial _z \frac{\partial _z \psi }{N^2(z)} \end{aligned}$$
(5.46b)

where \(\varDelta _{\mathrm h}\) denotes the horizontal Laplacian and brackets, as before, the horizontal Jacobian, with boundary conditions for the buoyancy b at \(z = 0, -H\):

$$\begin{aligned}&\partial _t b + [\psi , b] = 0 \,, \end{aligned}$$
(5.47a)
$$\begin{aligned}&\quad b = f_0 \, \partial _z \psi \,. \end{aligned}$$
(5.47b)

According to Wunsch (1997), the bulk of ocean kinetic energy is well captured by the barotropic and first baroclinic modes. For this reason, the major conclusion regarding the spectral slope \(-3\) of the direct enstrophy cascade remains valid for the bulk of the ocean. However, the standard basis for vertical modes, as given by the eigenvalue problem

$$\begin{aligned} f_0^2 \, \partial _z \frac{\partial _z \varPsi _n(z)}{N^2(z)} + \lambda _n^2 \, \varPsi _n(z) = 0 \end{aligned}$$
(5.48)

with zero boundary conditions for \(\partial _z \varPsi \) at \(z = 0, -H,\) does not take into account surface buoyancy perturbations. Baroclinic instabilities evolving as solutions of (5.46) deal with the modes of the full operator that satisfy the boundary conditions (5.47), and cannot be understood in the frame of the standard basis. Certain textbook instabilities, for example the Eady problem (see, e.g., Vallis 2006), rely entirely on surface-trapped dynamics.

Even though it is possible to reformulate the surface dynamics as \(\delta \)-sheets of potential vorticity, such solutions cannot be represented in terms of the vertical eigenmode basis. Layered models, however, include the surface dynamics in their upper and lower layer potential vorticity. Therefore, the two-layer model described in Section 5.3.2 above cannot separate surface-driven instabilities from interior instability mechanisms, and the simplest model where this can be explored is the three-layer model studied in Badin (2014).

To separate the surface dynamics from the interior in the continuously stratified QG model, one considers the case where \(q= \text {const}\) in an infinitely deep layer, so that only surface dynamics remains. Then, the horizontal Fourier coefficients of the stream function \(\psi \) representing the surface buoyancy perturbation satisfy

$$\begin{aligned} f_0^2 \, \partial _z \frac{\partial _z \psi _{\varvec{k}}(z)}{N^2(z)} - k^2 \, \psi _{\varvec{k}}(z) = 0 \end{aligned}$$
(5.49)

with non-homogeneous Neumann conditions at the top and decay toward infinite depth. When \(N = \text {const}\), the \(\psi _{\varvec{k}}\) decay with depth as \(\exp (kNz/f_0)\); i.e., they decay on a vertical scale

$$\begin{aligned} H \gtrsim \frac{f_0}{kN} \,. \end{aligned}$$
(5.50)

Correspondingly, in a uniformly stratified layer of depth H, only surface perturbations larger in size than the first baroclinic Rossby radius may reach through the fluid depth.
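For orientation, the vertical decay scale \(f_0/(kN)\) can be evaluated for a few horizontal wavelengths. The values of \(f_0\) and N and the function name in the sketch below are hypothetical and serve only as an illustration for uniform stratification.

```python
import math

def efolding_depth(wavelength, f0, buoyancy_frequency):
    """Vertical decay scale f0 / (k N) of a surface mode with k = 2*pi / wavelength."""
    k = 2.0 * math.pi / wavelength
    return f0 / (k * buoyancy_frequency)

f0, n_const = 1.0e-4, 2.0e-3     # hypothetical values in 1/s
for wavelength_km in (10.0, 50.0, 200.0):
    depth = efolding_depth(wavelength_km * 1000.0, f0, n_const)
    print(f"wavelength {wavelength_km:6.1f} km: decay scale {depth:7.1f} m")
```

The decay scale grows linearly with the horizontal wavelength, so that under these assumptions only the larger features are felt at depth.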

In the absence of forcing and dissipation, surface dynamics will preserve integrals of buoyancy variance and the product \(\psi b\). The latter leads to an inverse cascade at large scales with a \(-1\) spectrum of surface kinetic energy, and the former leads to a direct buoyancy variance cascade with a \(-5/3\) spectrum (see, e.g., Smith et al. 2002). Note that the prediction concerns surface kinetic energy and is valid for uniform stratification. Since in this case \(|\varvec{\nabla }\psi _{{\varvec{k}}} |=k \, |\psi _{{\varvec{k}}} |\sim |b_{{\varvec{k}}} |\), we expect the same spectral law for kinetic energy

$$\begin{aligned} E_{{\text {EKE}}} = \frac{1}{4} \sum _{{\varvec{k}}} |\varvec{\nabla }\psi _{{\varvec{k}}} |^2 \end{aligned}$$
(5.51)

and available potential energy

$$\begin{aligned} E_{{\text {APE}}} = \frac{1}{4 N^2} \sum _{{\varvec{k}}} |b_{\varvec{k}}|^2 \,. \end{aligned}$$
(5.52)

Small scales do not penetrate deep, so spectra become steeper with depth. Furthermore, they are modified by stratification; examples including exponential stratification and the case of a mixed upper layer are discussed in Callies and Ferrari (2013).
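The steepening with depth follows directly from the exponential decay of each mode. The sketch below assumes uniform stratification, an idealized \(-5/3\) surface spectrum, and the decay \(\exp (kNz/f_0)\) of the mode amplitudes stated above; the parameter values and function names are hypothetical.

```python
import numpy as np

f0, n_const = 1.0e-4, 2.0e-3                 # hypothetical values in 1/s
k = np.logspace(-5.0, -3.0, 201)             # horizontal wavenumbers in 1/m
surface_spectrum = k ** (-5.0 / 3.0)         # assumed surface kinetic energy spectrum

def local_slope(spectrum):
    """Logarithmic slope d ln E / d ln k."""
    return np.gradient(np.log(spectrum), np.log(k))

def spectrum_at_depth(z):
    """Mode amplitudes decay as exp(k N z / f0) for z <= 0, energy as the square."""
    return surface_spectrum * np.exp(2.0 * k * n_const * z / f0)

slope_surface = local_slope(spectrum_at_depth(0.0))
slope_200m = local_slope(spectrum_at_depth(-200.0))
print(f"slope at k = {k[100]:.1e} 1/m: surface {slope_surface[100]:.2f}, "
      f"200 m depth {slope_200m[100]:.2f}")
```

Under these assumptions, the local slope at depth z is \(-5/3 + 2kNz/f_0\), so the departure from the surface slope grows with both depth and wavenumber.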

Instabilities in the real ocean project on both deep-ocean modes and surface modes, and depend on the structure of PV of the basic ocean state. Surface dynamics are expected to be an important contributor at locations where the interior PV gradients are weak. Since shallow surface modes are excited as a result of evolving instability, the transfer of available potential energy into eddy kinetic energy is not spectrally local. This implies that the argument in favor of precisely the \(-1\) or \(-5/3\) spectral slope at the surface is rather weak. However, it is appropriate to expect that spectral laws for near-surface velocities at small scales are shallower than the \(-3\) prediction for the enstrophy range.

5.3.4 Ocean Models and Observational Evidence

To study ocean turbulence beyond the idealized models mentioned before, we must turn to numerical studies of the primitive equations, full ocean circulation models, and observational evidence. In this context, mesoscale or submesoscale structures which become ageostrophic in high-resolution models are of particular interest. In the following, we review a few studies which highlight these issues with the understanding that this selection is far from being complete or representative.

To begin, the recent interest in surface quasigeostrophic (SQG) dynamics was triggered by the observation that energy spectra of surface geostrophic velocities derived from altimetric data are noticeably shallower at many locations than spectra predicted by the theory of QG turbulence (Lapeyre 2009). High-resolution simulations also lend support to the relevance of the SQG concept for understanding the simulated behavior and observations. However, at scales about the first baroclinic Rossby radius and smaller, in real situations as well as in high-resolution simulations with full primitive equations, surface quasigeostrophic dynamics are accompanied by frontal and mixed-layer instabilities which deviate from geostrophy. Klein et al. (2008) and Capet et al. (2008) analyze the near-surface dynamics in ocean simulations performed at the resolution of \(2\,\text {km}\) and down to \(0.75\,\text {km}\), respectively, and demonstrate that there is a close resemblance to SQG dynamics. The spectra of surface kinetic energy have a slope close to \(-2\) from the spectral maximum to the spectral cutoff at large k. This is much shallower than the slope predicted by quasigeostrophic theory. The conceptual difference to SQG is, however, that the Rossby numbers of eddies at these scales are no longer small and a substantial ageostrophic flow component is generated, which modifies the turbulent energy fluxes. The presence of frontal and mixed-layer instabilities implies that the transfer of available potential energy into kinetic energy continues at rather small scales associated with these instabilities. Nevertheless, the near-surface velocities are nearly in geostrophic balance and the ageostrophic components explain only a small fraction of kinetic energy, only visible close to the high-wavenumber spectral end. 
Despite their smallness with respect to the dominant rotational component (computed via the Helmholtz decomposition), they are responsible for the downscale cascade of the total eddy kinetic energy. The cascade of the dominant rotational component of the velocity behaves differently: It is upscale and of smaller amplitude than the cascade of full velocity. The fact that it is upscale is perhaps not surprising: As there are transfers from the available potential energy to kinetic energy, as in QG or SQG turbulence, the flux of rotational kinetic energy proceeds to larger scales from the scale of forcing.

Callies and Ferrari (2013) discuss existing views and assess two data sets shedding light on the behavior of ocean submesoscales. They consider scales from about 200 to \(1\,\text {km}\). For a site in the Gulf Stream, they found steep (\(-3\)) spectra of kinetic energy for scales between 200 and \(20\,\text {km}\), and shallower spectra at smaller scales consistent with the \(-2\) slope of the internal gravity wave spectrum. For a site in the North Pacific, they report shallower spectra whose behavior with depth, however, does not agree with the prediction of SQG. It is proposed that the gravity wave continuum and unbalanced motions can contribute to this behavior.

To summarize, the range of submesoscale, where the subgrid scales of eddy-permitting (and eddy-resolving) ocean circulation models are located, combines features of QG and SQG turbulence but also includes ageostrophic (unbalanced) motions, depending on mesh resolution and ocean stratification. Wavenumbers larger than \(k_\mathrm{d}\) are those of the forward cascade of EKE, but the inverse cascade can be present for the rotational component of EKE at even smaller scales if small-scale instabilities continue to transfer the available potential energy to EKE. No true slope prediction can be made on scales around \(k_\mathrm{d}\) because of intermodal exchanges and spectrally spread dissipation.

Inertial ranges may emerge on the side of smaller scales on very high-resolution meshes, but even there one should expect a dependence on the depth and a contribution from submesoscale (frontal) instabilities. So even when they emerge, inertial ranges may deviate from the predictions of QG turbulence because of a forward energy cascade. The dominance of the rotational velocities in the energy spectra does not imply their dominance in energy transfers at large wavenumbers. One may try to draw a certain analogy with the \(-5/3\) spectrum observed in the atmosphere between 500 and \(10\,\text {km}\), which is that of stratified turbulence with forward energy cascade; see the bibliography, discussion, and analysis of high-resolution simulations in Augier and Lindborg (2013). On larger scales, it matches the dynamics predicted by QG theory.

At present, resolutions in ocean circulation models are such that the near-subgrid scales are in a range where self-similar behavior is unlikely. Subgrid closures can therefore not be universal in the range of resolutions about the Rossby radius. Hence, perhaps the guiding principle should be that of minimizing their damping effect on the rate at which energy is released from the pool of APE and the KE of the background state to the EKE at the resolved scales.

5.4 Energy Backscatter

Although most ocean circulation models used for climate research are coarse, the number of eddy-permitting models is increasing and will dominate in the future. Such models simulate eddy dynamics, but cannot resolve it fully for they suffer from overdissipation. Its origin can be explained as follows. In order to remove the variance of velocity and enstrophy at grid scales for numerical stability, they use harmonic or biharmonic dissipative subgrid viscosity operators (Fox-Kemper and Menemenlis 2008). Together with removing the grid-scale noise, these operators also dissipate energy at adjacent scales, which are in this case the scales close to the internal Rossby radius. As we have seen, these scales host exchanges between the potential and kinetic energy compartments and also determine the eddy energy release from the available potential energy. Their overdissipation is the reason why eddy-permitting flows seldom reach the observed levels of eddy kinetic energy.

The problem of overdissipation and, in a broader context, of subgrid closure that takes into account the existence of unresolved scales has been known in atmospheric sciences for a long time. The first papers on this issue appeared almost simultaneously with the KLB concept of two-dimensional turbulence; see Leith (1971) and the discussion in Frederiksen and Davies (1997). It may be explained within the spectral picture of triad interactions in two-dimensional turbulence as follows: Since we can only resolve wavenumbers up to some \(k_{\max }\) numerically, it is clear that we miss not only spectrally local interactions responsible for the enstrophy transfer through the boundary at \(k=k_{\max }\), which lead to a net energy drain and hence behave as a form of dissipation in the ensemble mean, but also non-local triads, having two legs at \(k>k_{\max }\) or on both sides of \(k_{\max }\) and one leg at large scales \(k_{\text {LS}} \ll k_{\max }\), which might force the resolved scales. These interactions are termed backscatter. They are not represented by the usual dissipative subgrid operators, and this omission is the main cause of overdissipation in conventional models. A fully deterministic representation of backscatter is impossible as the details of the state of the subgrid are in principle not available. Thus, the best we can hope for is some stochastic model of backscatter.

Theoretical developments in this direction assume as a rule QG dynamics, periodic boundary conditions or spherical geometry, ensemble averaging, and “spectral language” to come up with parameterizations. As an example, we mention the work by Kitsios et al. (2013) who derive both stochastic and deterministic closures by comparing truncated and high-resolution dynamics in a two-layer QG setup on the sphere. It is believed that both types can be equally skillful, for, in any case, useful formulas rely on ensemble averaging and thus do not describe realizations.

Although the turbulent dynamics dictates that drain and backscatter should be described as stochastic processes, additional issues such as numerical stability have to be taken into account. For deterministic parameterizations, the resulting expressions contain powers of the Laplacian, sometimes going beyond the biharmonic one. The study by Kitsios et al. (2013) shows that even in the context of two-layer QG turbulence which is statistically homogeneous in the zonal direction, the final parameterizations of drain and backscatter depend not only on \(k_\mathrm{d}\) and \(k_{\max }\), but also on the extent of the energy-containing range.

It will be much more difficult to propose parameterizations for domains with horizontal boundaries where spectral language and zonal homogeneity are missing. In addition, all backscatter parameterizations raise the question of numerical stability due to the effective negative viscosity of the terms providing backscatter. Finally, in addition to eddy–eddy interactions considered by Kitsios et al. (2013), contributions may come from interactions involving the unresolved mean field (in the sense of time averages) component. These examples show that progress is possible, but we can hardly expect universally valid solutions.

In the following, we review two specific backscatter parameterizations in detail. The first, due to Jansen and Held (2014), is based on a very straightforward scalar model for the subgrid energy. The second, due to Grooms and Majda (2014), uses a more sophisticated linear model for the subgrid dynamics.

5.4.1 Models with Scalar Subgrid Energy Budget

Since, as mentioned above, comprehensive first-principle models are necessarily complex, we think that simplified implementations of energy backscatter proposed by Jansen and Held (2014) and Jansen et al. (2015), who consider kinetic energy backscatter for QG and primitive equations, respectively, deserve attention. These parameterizations do not aim at mimicking missing interactions with subgrid scales, but seek instead to compensate for the overdissipated energy, which is much easier. Importantly, the amount of energy returned through the proposed backscatter parameterization can be controlled, which is a prerequisite for stability of the algorithm.

Jansen and Held (2014) study the two-layer quasigeostrophic equations with the Leith parameterization as nonlinear small-scale dissipation operator. In each layer i,

$$\begin{aligned} D_{\mathrm u}\psi _i = -\varDelta (\nu _i \, \varDelta ^2 \psi _i) \qquad \text {with} \qquad \nu _i = C_L \, a^6 \, |\varDelta ^2 \psi _i |\,, \end{aligned}$$
(5.53)

where a denotes the grid-spacing and \(C_L=0.005\). The associated overall rate of viscous dissipation at wavenumber \({\varvec{k}}\) is

$$\begin{aligned} V_{\varvec{k}}= \frac{1}{2} \sum _{i \in \{1,2\}} k^2 \, (\psi _i)_{\varvec{k}}^* \, (\nu _i \, \varDelta ^2 \psi _i)_{\varvec{k}}\,. \end{aligned}$$
(5.54)

(The layers are assumed to be of equal thickness, hence the additional factor of 1 / 2 in the expression for \(V_{{\varvec{k}}}\) and in similar expressions below.) The rate of frictional dissipation in the bottom layer at wavenumber \({\varvec{k}}\) is

$$\begin{aligned} F_{\varvec{k}}= \tfrac{1}{2} \, \lambda \, k^2 \, |\psi _2 |^2_{{\varvec{k}}} \,. \end{aligned}$$
(5.55)

Summing over wavenumbers, the total rates of ultraviolet and infrared dissipation are

$$\begin{aligned} V = \sum _{{\varvec{k}}} V_{\varvec{k}}\qquad \text {and} \qquad F = \sum _{{\varvec{k}}} F_{\varvec{k}}\end{aligned}$$
(5.56)
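
As an illustration, the diagnostics (5.53) to (5.56) can be evaluated pseudo-spectrally on a doubly periodic square grid. The following Python sketch is ours and is not taken from Jansen and Held (2014): the function names, the use of NumPy FFTs, and the normalization \(\psi _{\varvec{k}} = \mathrm{FFT}(\psi )/N^2\) (chosen so that sums over wavenumbers equal domain averages) are assumptions made for illustration only.

```python
import numpy as np

def wavenumbers(n, length):
    """Angular wavenumbers kx, ky and k^2 on an n x n periodic grid of side `length`."""
    k1 = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    kx, ky = np.meshgrid(k1, k1, indexing="ij")
    return kx, ky, kx**2 + ky**2

def laplacian(field, k2):
    """Spectral Laplacian of a real periodic field."""
    return np.real(np.fft.ifft2(-k2 * np.fft.fft2(field)))

def leith_dissipation_rate(psi_layers, a, k2, c_leith=0.005):
    """Total rate V of (5.54) and (5.56) with the Leith coefficient of (5.53).

    psi_layers: stream functions of the two equal-thickness layers,
    a: grid spacing, k2: squared wavenumber array.
    """
    total = 0.0
    for psi in psi_layers:
        bih = laplacian(laplacian(psi, k2), k2)      # Delta^2 psi
        nu = c_leith * a**6 * np.abs(bih)            # Leith coefficient nu_i
        psi_k = np.fft.fft2(psi) / psi.size
        g_k = np.fft.fft2(nu * bih) / psi.size       # (nu_i Delta^2 psi_i)_k
        total += 0.5 * np.sum(k2 * np.real(np.conj(psi_k) * g_k))
    return total

def bottom_friction_rate(psi_bottom, k2, lam):
    """Total rate F of (5.55) and (5.56) for bottom drag coefficient lam."""
    psi_k = np.fft.fft2(psi_bottom) / psi_bottom.size
    return 0.5 * lam * np.sum(k2 * np.abs(psi_k)**2)
```

Only the real part of each spectral product is summed; the imaginary parts cancel between \({\varvec{k}}\) and \(-{\varvec{k}}\) for real fields.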

Since the spectral transfers sum to zero, the overall balance of energy is

$$\begin{aligned} \partial _t E = G -F - V \,, \end{aligned}$$
(5.57)

where G is the generation term with Fourier representation (5.37e).

To compensate for the excessive dissipation at small scales, the simplest model is to add an energy source that returns energy at a rate

$$\begin{aligned} S = (1 - \varepsilon ) \, V \end{aligned}$$
(5.58)

so that all but a small fraction \(\varepsilon \approx 0.1\) representing the physical rate of ultraviolet dissipation is returned. Jansen and Held (2014) tested two different models for this source, one deterministic and the other stochastic. In the deterministic version, each layer potential vorticity equation is given a source term

$$\begin{aligned} s_i = -A(t) \, \varDelta ^2 \psi _i \end{aligned}$$
(5.59)

which corresponds to negative Laplacian viscosity in the momentum equations. The amplitude A(t) is set by the condition that the constraint (5.58) is satisfied at every instant in time. Since the Laplacian is less scale-selective than the biharmonic ultraviolet dissipation, energy will be returned at larger scales than those at which it is dissipated.
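
A minimal sketch of how the amplitude might be diagnosed in practice is given below. It is our own illustration, not the authors' code. It assumes that S is measured as the rate at which the source (5.59) adds energy to the resolved flow, which for two layers of equal thickness is A(t) times \(\tfrac{1}{2} \sum _i \langle (\varDelta \psi _i)^2 \rangle \), with angle brackets denoting the domain average. The function name and the guard against a vanishing denominator are hypothetical.

```python
import numpy as np

def backscatter_amplitude(zeta_layers, v_rate, eps=0.1):
    """Amplitude A(t) such that the source -A * Delta^2 psi_i returns (1 - eps) * V.

    zeta_layers: relative vorticity Delta psi_i of each layer (equal thickness assumed),
    v_rate: total viscous dissipation rate V,
    eps: fraction of V regarded as physical dissipation and not returned.
    """
    # Energy input per unit amplitude: 0.5 * sum_i <(Delta psi_i)^2>
    input_per_unit_amplitude = 0.5 * sum(np.mean(z**2) for z in zeta_layers)
    if input_per_unit_amplitude <= 0.0:
        return 0.0  # flow at rest: nothing to scatter back
    return (1.0 - eps) * v_rate / input_per_unit_amplitude
```

The amplitude is recomputed at every time step from the current values of V and of the vorticity fields, so that the constraint holds instantaneously.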

The second implementation is stochastic, with

$$\begin{aligned} s_i = A(t)^{1/2} \, \eta ({\varvec{x}},t) \,, \end{aligned}$$
(5.60)

where \(\eta \) is Gaussian noise, \(\delta \)-correlated in space and time. The forcing is kept barotropic; i.e., the same noise process is used for both layers, to replenish the inverse cascade of barotropic kinetic energy. In this case, the ensemble mean \(\langle S \rangle \) will be proportional to A(t) so that the amplitude can be found at each time step from the constraint (5.58). Of course, (5.58) is satisfied only in the ensemble mean. However, it is also approximately satisfied for each realization as the stochastic forcing is distributed over a large number of spatial locations of the computational grid. The rate of backscatter energy pumping at mode \({\varvec{k}}\) is given by

$$\begin{aligned} S_{\varvec{k}}= \frac{1}{2} \sum _{i \in \{1,2\}} (\psi _i)_{\varvec{k}}^* \, (s_i)_{\varvec{k}}\,. \end{aligned}$$
(5.61)

Thus, even when the \(s_i\) have a white noise spectrum, energy backscatter is biased toward the scales with already high energy content. In practice, this involves scales larger than those of \(V_{{\varvec{k}}}\).

Jansen and Held (2014) conclude that both parameterizations work rather similarly; however, the stochastic implementation returns energy over a broader interval of wavenumbers. The principal question here, namely how much energy has to be returned and where it should be returned, is left without answer and presents a topic for future research. The amplitude of backscatter is selected globally, which is only appropriate if flow energy is distributed uniformly. In the general case, one needs a local criterion.

A small variation of the kinematic backscatter assumption (5.58) is the introduction of a dynamic global subgrid energy budget \(E_{{\text {sg}}}\) via

$$\begin{aligned} \dot{E}_{{\text {sg}}} = V - S - 2 \lambda \gamma \, E_{{\text {sg}}} \,. \end{aligned}$$
(5.62)

The last term represents dissipation of subgrid energy by bottom friction, where the parameter \(\gamma \) is the fraction of subgrid energy residing in the lower layer and \(\lambda \) the bottom drag coefficient.Footnote 11 This form of a global subgrid energy reservoir is suggested by Jansen and Held (2014) as a motivation to justify assumption (5.58), but could also be used computationally by assuming that the amplitude of backscatter A(t) is proportional to the total subgrid energy in the reservoir.
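
To make the link between (5.62) and (5.58) concrete, the reservoir equation can be stepped forward in time. The sketch below is an illustration under our own assumptions: the function name is hypothetical and the implicit (backward Euler) treatment of the friction term is a choice made here for robustness, not something prescribed by Jansen and Held (2014). In a steady state the update reduces to \(V - S = 2 \lambda \gamma \, E_{{\text {sg}}}\), i.e., to (5.58) with \(\varepsilon = 2 \lambda \gamma \, E_{{\text {sg}}}/V\).

```python
def step_subgrid_energy(e_sg, v_rate, s_rate, lam, gamma, dt):
    """Advance the global subgrid energy E_sg of (5.62) by one time step dt.

    The source V - S is treated explicitly, the linear bottom-friction
    sink 2 * lam * gamma * E_sg implicitly (backward Euler).
    """
    return (e_sg + dt * (v_rate - s_rate)) / (1.0 + 2.0 * lam * gamma * dt)
```

With constant V and S the iteration relaxes to \(E_{{\text {sg}}} = (V - S)/(2 \lambda \gamma )\) for any positive time step.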

At the next level of complexity, one may use a local subgrid energy budget. Jansen et al. (2015) suggest a budget for the subgrid energy density e of the form

$$\begin{aligned} \partial _t e = v - s - \varvec{\nabla }\cdot {\varvec{F}}- d \,, \end{aligned}$$
(5.63)

where v is the rate of viscous dissipation per unit volume of the resolved scales, s is the rate of backscatter per unit volume, \({\varvec{F}}\) is the flux redistributing subgrid energy, and d is the rate of dissipation of subgrid energy per unit volume. Each of these terms must be modeled. Jansen et al. (2015) assume biharmonic ultraviolet dissipation

$$\begin{aligned} v = \frac{1}{H} \sum _i h_i \, \nu _i \, \vert \varDelta {\varvec{u}}_i |^2 \,, \end{aligned}$$
(5.64)

where H is the total depth, \(h_i\) are the layer depths, and \(\nu _i\) are the layer horizontal biharmonic viscosity coefficients, assumed positive. In this setting, all operators act in the horizontal only. For the backscatter source, one can take harmonic viscosity so that

$$\begin{aligned} s = - \frac{1}{H} \sum _i h_i \, \nu _{{\text {bs}}} \, |\varvec{\nabla }{\varvec{u}}_i |^2 \end{aligned}$$
(5.65)

with negative coefficient of viscosity

$$\begin{aligned} \nu _{{\text {bs}}} = - C_{{\text {bs}}} \, a \, \bigl ( \max \{ 2e, 0\} \bigr )^{\tfrac{1}{2}} \end{aligned}$$
(5.66)

with \(C_{{\text {bs}}}\) an order-one constant. If the energy to be scattered back becomes too large, e becomes negative and the backscatter viscosity goes to zero. This controls the amount of energy returned.
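
The limiter in (5.66) is simple to express in code. The following sketch assumes that e is stored as a NumPy array on the model grid; the function name and the default value of \(C_{{\text {bs}}}\) are placeholders chosen for illustration.

```python
import numpy as np

def backscatter_viscosity(e, a, c_bs=1.0):
    """Negative harmonic viscosity nu_bs of (5.66).

    e: subgrid energy density (array), a: grid spacing, c_bs: order-one constant.
    Wherever e <= 0 the result is zero, which switches backscatter off.
    """
    return -c_bs * a * np.sqrt(np.maximum(2.0 * e, 0.0))
```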

A major point for discussion is the choice of flux \({\varvec{F}}\) for the subgrid energy. Jansen et al. (2015) choose the purely diffusive flux

$$\begin{aligned} {\varvec{F}}= - k_{{\text {sg}}} \, \varvec{\nabla }e \,, \end{aligned}$$
(5.67)

where \(k_{{\text {sg}}}\) is an appropriately selected constant of diffusivity. This choice is guided by the observation that the transfer from and to the subgrid can be spatially very rough so that a mechanism is needed to regularize the distribution of e horizontally. However, the question arises whether subgrid energy should not perhaps be advected by the resolved flow or be subject to some other non-local mechanism of transfer.

Finally, the dissipation rate d in (5.63) is typically small and may be neglected.
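
Putting (5.63) and (5.67) together with \(d = 0\), one time step of the local budget might look as follows. This is a sketch under assumptions that are ours and not those of Jansen et al. (2015): a doubly periodic grid with uniform spacing a, a five-point Laplacian, forward Euler time stepping, and hypothetical function names. Here v and s are the (non-negative) rates of viscous dissipation and of backscatter on the same grid.

```python
import numpy as np

def laplacian_periodic(field, a):
    """Five-point Laplacian on a doubly periodic grid with spacing a."""
    return (np.roll(field, 1, axis=0) + np.roll(field, -1, axis=0)
            + np.roll(field, 1, axis=1) + np.roll(field, -1, axis=1)
            - 4.0 * field) / a**2

def step_local_subgrid_energy(e, v, s, k_sg, a, dt):
    """One forward Euler step of  de/dt = v - s + k_sg * Laplacian(e).

    The diffusion term is minus the divergence of the flux (5.67) for
    constant k_sg; the dissipation d of (5.63) is neglected.
    """
    if k_sg > 0.0 and dt > a**2 / (4.0 * k_sg):
        raise ValueError("explicit diffusion step is unstable: reduce dt")
    return e + dt * (v - s + k_sg * laplacian_periodic(e, a))
```

The diffusive term only redistributes subgrid energy: with \(v = s = 0\) the domain mean of e is unchanged by the step while its spatial variance decreases.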

It turns out that the backscatter parameterizations by Jansen and Held (2014) and Jansen et al. (2015) lead to noticeable improvements even in situations where non-trivial bottom topography is present, and allow the mesoscale eddy dynamics in eddy-permitting simulations to approach those of high-resolution runs. On a qualitative level, the success of these simple implementations of backscatter rests on the idea that energy needs to be scattered back only in places where it is strongly dissipated. Although the vertically averaged or basin-averaged subgrid kinetic energy balance used to assess the backscatter viscosity is an oversimplification, the energy scattered back is nevertheless modulated by the distribution of resolved energy. This also implies that the parameterization may bring improvements only in situations when an eddy-permitting model already correctly simulates the pattern of kinetic energy distribution but lacks amplitude. In realistic applications resolving the vertical structure with many more layers, the vertical distribution of backscatter viscosity may matter, since surface-trapped modes may exhibit more vertical structure, but this remains to be seen. A theory of where to return the energy scattered back presents an interesting question for further research too. Clearly, with only the harmonic operator at one’s disposal, the deterministic backscatter parameterization has limited capabilities so that stochastic closures may still be needed. Furthermore, a point not yet addressed is the cascade of EPE, which is also dissipated by subgrid diffusive closures or through upwind transport algorithms. Overly diffusive transport schemes may reduce the transfer between EPE and EKE, so that the role of subgrid closures in the tracer equations should be explored. Conversely, Ilıcak et al. (2012) show that spurious mixing of transport algorithms depends on velocity variance at grid scales, so that energizing these scales above a certain level is not recommended. This issue is further explored in Klingbeil et al. (2019). This set of questions shows that even in the context of energy backscatter, the problem is far from being resolved and new ideas are required.

To apply these ideas to the full primitive equations, we face further questions. To our knowledge, this has not been pursued exhaustively, and we can only sketch a direction; more theoretical analysis and numerical experiments are needed here. To start, we may localize even further, treating the subgrid energy density e as a full three-dimensional field, so that the evolution equation (5.63) now takes the form

$$\begin{aligned} \partial _t e = v - s - \varvec{\nabla }_{\mathrm h}\cdot {\varvec{F}}_{\mathrm h}- \partial _z F_z - d \,, \end{aligned}$$
(5.68)

where \({\varvec{F}}\) is the flux redistributing the subgrid energy, taken as \({\varvec{F}}_{\mathrm h}= -K_{\mathrm h}\, \varvec{\nabla }_{\mathrm h}e\) and \(F_z = - K_z \, \partial _z e\), where \(K_{\mathrm h}\) and \(K_z\) are appropriately selected horizontal and vertical diffusion coefficients. As before, d is the rate of dissipation of subgrid energy; it is small and may be neglected. This approach is more expensive, for now the evolution equation has to be integrated in three spatial dimensions.

The contribution from ultraviolet dissipation now takes the form

$$\begin{aligned} v = \nu _{\text {visc}} \, |\varDelta _{\mathrm h}{\varvec{u}}|^2 \,, \end{aligned}$$
(5.69)

where \(\nu _{\text {visc}}\) is the coefficient of horizontal biharmonic viscosity, taken positive as in (5.64) so that \(v \ge 0\) acts as a source of subgrid energy in (5.68). Vertical viscosity in the momentum equation would generally be provided by a vertical mixing parameterization which relies on some physics and empirical data, e.g., using a KPP closure (Large et al. 1994) or k-\(\varepsilon \) closure (Umlauf and Burchard 2003). The corresponding backscatter source term reads

$$\begin{aligned} s = - \nu _{{\text {bs}}} \, |\varvec{\nabla }{\varvec{u}}|^2 \end{aligned}$$
(5.70)

where \(\nu _{{\text {bs}}}\) is again given by an expression of the form (5.66).

One may consider stochastic implementation options for the backscatter source. A caveat here is that for the primitive equations, the source must respect the divergence condition. This could be done by a simple projection. Another possibility is to write the horizontal source in the form

$$\begin{aligned} {\varvec{s}}= \varvec{\nabla }\times (\varPsi {{\varvec{k}}}) \,, \end{aligned}$$
(5.71)

with \(\varPsi ({\varvec{x}},t) = P({\varvec{x}}) \, \eta ({\varvec{x}},t) \, A(t)\), where \({\varvec{k}}\) in (5.71) denotes the vertical unit vector rather than a wavenumber. Here, P is a spatial pattern of eddy kinetic energy (which may be modeled, inferred from high-resolution simulations, or taken from observations), A(t) is the amplitude (selected to ensure subgrid energy balance), and \(\eta \) is a random field generated, for example, by a Markov process. Despite the presence of a differential operator, one has to introduce correlations in time and space to ensure that the resulting forcing is smooth.
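
The construction (5.71) guarantees a horizontally non-divergent source by design, since \(\varvec{\nabla }\times (\varPsi {\varvec{k}}) = (\partial _y \varPsi , -\partial _x \varPsi )\). The sketch below checks this on a doubly periodic square with spectral derivatives. The periodic geometry, the even number of grid points, the removal of the Nyquist modes, and the function names are assumptions of this illustration.

```python
import numpy as np

def _wavenumbers(n, length):
    k1 = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    kx, ky = np.meshgrid(k1, k1, indexing="ij")   # axis 0 is x, axis 1 is y
    return kx, ky, np.abs(k1).max()

def nondivergent_forcing(psi_cap, length):
    """Return (s_x, s_y) = (d Psi/dy, -d Psi/dx) for a scalar field Psi.

    psi_cap: n x n array (n even) on a periodic square of side `length`.
    """
    kx, ky, kmax = _wavenumbers(psi_cap.shape[0], length)
    # Drop the Nyquist modes, whose derivative is ambiguous on an even grid.
    keep = (np.abs(kx) < kmax) & (np.abs(ky) < kmax)
    p_hat = np.fft.fft2(psi_cap) * keep
    s_x = np.real(np.fft.ifft2(1j * ky * p_hat))
    s_y = np.real(np.fft.ifft2(-1j * kx * p_hat))
    return s_x, s_y

def divergence(s_x, s_y, length):
    """Spectral horizontal divergence of the vector field (s_x, s_y)."""
    kx, ky, _ = _wavenumbers(s_x.shape[0], length)
    div_hat = 1j * kx * np.fft.fft2(s_x) + 1j * ky * np.fft.fft2(s_y)
    return np.real(np.fft.ifft2(div_hat))
```

In an actual application \(\varPsi \) would be built from a pattern P, an amplitude A(t), and a noise field \(\eta \) that is correlated in space and time; the non-divergence itself holds for any \(\varPsi \), including uncorrelated noise.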

Another issue is that, for the primitive equations, the concept of backscatter relates in equal measure to the momentum and to the tracer equations. Compensation for kinetic energy overdissipation is not necessarily sufficient if tracer variance is overdissipated. In principle, an approach resembling the one applied to the quasigeostrophic potential vorticity equations can be proposed. However, there are some technical difficulties. First, in many cases dissipation is already built into the implementation of the transport operators and cannot be easily accessed. Second, even if it is not, biharmonic operators are not always available for tracers.Footnote 12 Finally, the production of tracer variance and the production of kinetic energy are linked, so that additional theoretical analysis is required.

5.4.2 Stochastic Superparameterizations

While the backscatter approximation by Jansen and Held (2014) and Jansen et al. (2015) seeks to return excessively dissipated energy back to the main flow, it needs an eddy-permitting model that is able to simulate a correct pattern of eddy variability. Their subgrid representation only captures the subgrid energy e and does not attempt to represent the parameterized action of the Reynolds stress. In models that are not fully eddy-permitting, this approach will not work and one needs a more sophisticated model of the subgrid. We will discuss so-called stochastic superparameterizations as proposed by Grooms and Majda (2013, 2014) and Grooms et al. (2015b) in the context of quasigeostrophic two-layer models.

The main difference between the stochastic parameterization (SP) and the stochastic superparameterization (SSP) is that the latter involves a prognostic fine grid equation which is motivated by the underlying physical evolution equation and involves coarse mesh quantities as well as a stochastic source term. We first explain the idea in the context of the simple single-layer model (5.4), where the essential features of the method are already visible with less notational effort.

Let us decompose the stream function \(\psi \) into a coarse mesh stream function \(\psi _{{\mathrm c}}\) and a fine mesh stream function \(\psi '\), and likewise define the corresponding vorticities, so that

$$\begin{aligned} \psi =\psi _{{\mathrm c}}+\psi ' \quad \text {and} \quad \zeta =\zeta _{{\mathrm c}}+\zeta ' \,, \end{aligned}$$
(5.72)

where it is understood that \(\zeta _{{\mathrm c}} = \varDelta \psi _{{\mathrm c}}\) and \(\zeta ' = \varDelta \psi '\). We also split the forcing into a deterministic physical forcing \(F_{{\mathrm c}}\) on the coarse mesh and a stochastic forcing \(F'\) on the fine mesh. Inserting this ansatz into (5.4), we obtain

$$\begin{aligned} \partial _t \zeta _c + \partial _t \zeta ' + [\psi _c,\zeta _c]&+ [\psi _c,\zeta '] + [\psi ',\zeta _c] + [\psi ',\zeta '] \nonumber \\&\qquad + \beta \, \partial _x \psi _c + \beta \, \partial _x \psi ' = F_{{\mathrm c}} + F' + D\zeta _{{\mathrm c}} + D\zeta ' \,. \end{aligned}$$
(5.73)

We now split this equation into an evolution equation for the coarse variables and an evolution equation for the fine mesh variables. This procedure is non-rigorous, so there is some freedom of choice. However, the fine system should be linear with constant coefficients so that it can be solved explicitly, for otherwise the combined computational cost would be higher than the cost of simulating the entire system on the fine grid.

Following Grooms and Majda (2014), we decompose the domain \(\varOmega \) into disjoint subdomains \(\varOmega _i\). Each subdomain contains exactly one grid point of the coarse mesh, and the coarse mesh variables are assumed constant on each adjacent subdomain. The fine systems are then solved independently for one coarse time step with periodic boundary conditions on each subdomain.

The coarse system should contain all coarse terms and the divergence of the eddy potential vorticity fluxFootnote 13

$$\begin{aligned} {\varvec{F}}_{\text {epv}} = \overline{\zeta ' \, \varvec{\nabla }^\bot \psi '} \,. \end{aligned}$$
(5.74)

All the remaining terms go into the fine mesh system which is solved independently on each coarse mesh cell. The coarse system then takes the form

$$\begin{aligned} \partial _t \zeta _c + [\psi _c,\zeta _c] + \varvec{\nabla }\cdot {\varvec{F}}_{\text {epv}} + \beta \, \partial _x \psi _c = F_{{\mathrm c}} + D\zeta _c \,, \end{aligned}$$
(5.75)

and the fine system reads

$$\begin{aligned} \partial _t \zeta ' + [\psi _c,\zeta '] + [\psi ',\zeta _c] + \beta \, \partial _x \psi ' = S + F' + D\zeta ' \,, \end{aligned}$$
(5.76)

where the nonlinear eddy–eddy interactions are represented by

$$\begin{aligned} S = \varvec{\nabla }\cdot {\varvec{F}}_{\text {epv}} - [\psi ',\zeta '] \,. \end{aligned}$$
(5.77)

The coarse system is solved on the coarse grid. The fine system is linear except for the eddy–eddy term which must be modeled. Grooms and Majda (2014) suggest replacing each Fourier mode \(S_{\varvec{k}}\) by the right-hand side of an Ornstein–Uhlenbeck stochastic process of the form

$$\begin{aligned} \mathrm{d}\zeta = - \gamma \, \zeta \, \mathrm{d}t + \sigma \, \mathrm{d}W \,, \end{aligned}$$
(5.78)

where W is a standard Wiener process.Footnote 14 The Ornstein–Uhlenbeck process is controlled by two parameters, the inverse correlation time \(\gamma \) and the variance \(\sigma ^2/(2\gamma )\) which will later be chosen differently for different wavenumbers. They further assume that the coarse grid fields can be held constant in each fine cell and that there is no forcing on the fine scale. Then, the full fine-scale model in the Fourier representation reads

$$\begin{aligned} \mathrm{d}\zeta '_{\varvec{k}}= (\ell _{\varvec{k}}- \gamma _{\varvec{k}}) \, \zeta _{\varvec{k}}' \, \mathrm{d}t + \sigma _{\varvec{k}}\, \mathrm{d}W_{\varvec{k}}\,, \end{aligned}$$
(5.79)

where \(\ell _{\varvec{k}}\) is the Fourier symbol of all linear terms in (5.76) and the Wiener processes \(W_{\varvec{k}}\) are mutually independent.
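
For readers less familiar with the Ornstein–Uhlenbeck process (5.78), the following sketch samples it with the exact one-step update for a real-valued variable and shows that the sample variance relaxes to \(\sigma ^2/(2\gamma )\). It is a generic illustration, not the algorithm of Grooms and Majda (2014), who do not need to sample the process at all; the function names are hypothetical.

```python
import numpy as np

def ou_stationary_variance(gamma, sigma):
    """Equilibrium variance sigma^2 / (2 gamma) of the process (5.78)."""
    return sigma**2 / (2.0 * gamma)

def ou_exact_step(zeta, gamma, sigma, dt, rng):
    """Advance d(zeta) = -gamma * zeta dt + sigma dW over dt without time-stepping error.

    zeta: array of independent realizations, rng: numpy.random.Generator.
    """
    decay = np.exp(-gamma * dt)
    noise_std = np.sqrt(ou_stationary_variance(gamma, sigma) * (1.0 - decay**2))
    return decay * zeta + noise_std * rng.standard_normal(np.shape(zeta))
```

The exact update is preferable to an Euler–Maruyama step here because it remains valid for time steps that are long compared with the correlation time \(1/\gamma \).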

The crucial observation is that S is quadratic in fine-scale quantities, so that a space average corresponds, by the Plancherel theorem, to a sum over \(|\psi _{\varvec{k}}|^2\). Averaging further over the stochastic ensemble, it is clear that it suffices to compute the evolution of \(\mathbb E [|\psi _{\varvec{k}}|^2]\). By the Itô Lemma, it is easy to derive a deterministic linear ordinary differential equation for this quantity, which can be solved explicitly and independently for each wavenumber.
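
As a hedged illustration of this step, applying the Itô Lemma to (5.79) for a single mode gives, for \(M = \mathbb E [|\zeta '_{\varvec{k}}|^2]\), the linear equation \(\dot{M} = 2 \, \mathrm {Re}(\ell _{\varvec{k}}- \gamma _{\varvec{k}}) \, M + \sigma _{\varvec{k}}^2\), provided the Wiener increments are normalized such that \(\mathbb E [|\mathrm{d}W_{\varvec{k}}|^2] = \mathrm{d}t\). This normalization and the function name below are assumptions of the sketch; the second moment of \(\psi _{\varvec{k}}\) follows on division by \(k^4\).

```python
import numpy as np

def second_moment(t, m0, ell, gamma, sigma):
    """Closed-form solution of dM/dt = 2 Re(ell - gamma) M + sigma^2 with M(0) = m0.

    ell: (possibly complex) Fourier symbol of the linear terms,
    gamma, sigma: real Ornstein-Uhlenbeck parameters of the mode.
    """
    mu = np.real(ell) - gamma
    if mu == 0.0:
        return m0 + sigma**2 * t            # marginal case: linear growth
    m_eq = -sigma**2 / (2.0 * mu)            # fixed point (an attractor only if mu < 0)
    return m_eq + (m0 - m_eq) * np.exp(2.0 * mu * t)
```

For \(\ell _{\varvec{k}}= 0\) the fixed point is the equilibrium variance \(\sigma _{\varvec{k}}^2/(2\gamma _{\varvec{k}})\), and a mode started there stays there; for \(\mathrm {Re} \, \ell _{\varvec{k}}> \gamma _{\varvec{k}}\) the second moment grows exponentially.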

The coefficients \(\gamma _{\varvec{k}}\) are tuned so that the equilibrium distribution without the interaction terms \(\ell _{\varvec{k}}\) matches a given power spectrum. Later, when initializing the second moment equation, the initial value is taken to be the equilibrium value, again without interactions. Thus, the effect of the interaction with the coarse grid quantities, which are encoded in \(\ell _{\varvec{k}}\), is to color the fine grid statistics consistent with the coarse grid flow. In particular, when applying this method to stratified models, \(\ell _{\varvec{k}}>0\) at scales where the flow is baroclinically unstable, resulting in growth of the primed quantities.

At this point, the effective subgrid dynamics as seen from the coarse grid is purely deterministic. Grooms and Majda (2014) found that it is necessary to take a large number of modes in the subgrid to match the correct spectral decay. To keep the computational cost low, and to account for the observation that real ocean eddies have anisotropies that vary in space and time, they select at random a direction in each subgrid cell, independent for each point in coarse space–time, and choose a one-dimensional spectral decomposition in this cell.

Grooms et al. (2015b) perform a detailed computational study of their stochastic superparameterization in a two-layer zonally reentrant channel mimicking the Antarctic Circumpolar Current. They compare the model with a deterministic Gent–McWilliams (GM) parameterization in a regime where mesoscale eddies are not resolved on the coarse grid and with an eddy-resolving high-resolution simulation. Their setting is similar to the two-layer quasigeostrophic model discussed in Section 5.3.2, with vorticity equations

$$\begin{aligned}&\partial _tq_1+ [\psi _1,q_1] = - \frac{2}{\rho _0H} \, \partial _y F(y) + \nu _2 \, \varDelta ^2 \psi _1 \,, \end{aligned}$$
(5.80a)
$$\begin{aligned}&\quad \partial _tq_2+[\psi _2,q_2] = - r \, \varDelta \psi _2 + \nu _2 \, \varDelta ^2 \psi _2 \,, \end{aligned}$$
(5.80b)

where \(\nu _2\) is Newtonian viscosity, r is the Ekman drag coefficient, and the layer potential vorticities are given by

$$\begin{aligned}&\,q_1 = f_0 + \beta y + \varDelta \psi _1 + \tfrac{1}{2} \, k_\mathrm{d}^2 \, (\psi _2-\psi _1) - k_\mathrm{e}^2 \, \psi _1 \,, \end{aligned}$$
(5.81a)
$$\begin{aligned}&q_2 = f_0 + \beta y + \varDelta \psi _2 + \tfrac{1}{2} \, k_\mathrm{d}^2 \, (\psi _1-\psi _2) + \frac{2f_0}{H} \, h_{\mathrm b}\,. \end{aligned}$$
(5.81b)

The lateral boundary conditions are periodic in the zonal and stress-free in the meridional direction. This system differs from the two-layer equation (5.28) in the following ways: The flow here is driven by a steady sinusoidal wind forcing F(y) (see Footnote 15). It includes explicit bottom topography \(h_{\mathrm b}\) to avoid unrealistic spin-up of the mean current. A stretching term with coefficient \(k_\mathrm{e}=1/L_\mathrm{e}=f/\sqrt{gH}\) is included in the upper layer potential vorticity so that the model is formally valid to scales up to the external Rossby radius of deformation \(L_\mathrm{e}\). Finally, dissipation is second order as is common for relatively coarse resolution models; the eddy-resolving simulation and the fine grid dynamics, however, are set up with fourth-order dissipation.

In the coarse model, the divergence of the subgrid potential vorticity fluxes is introduced layerwise as explained in the single-layer setting; for details, see Grooms and Majda (2014). In order to compare the performance of GM and SPP, the authors analyze the temporal statistics of the surface elevation at each fixed point; in particular, they compute the bias of the mean and the bias of the variance relative to the highly resolved reference simulation. While the time mean biases of both parameterizations are similar in magnitude and spatial pattern (Figure 5.3), the time variance of the stochastic superparameterization run is significantly closer to the reference simulation, even though both models are variance deficient (Figure 5.4).
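The pointwise statistics used in this comparison are simple to state explicitly. The following is a minimal sketch, assuming that the model and reference elevation fields are already available on a common grid as arrays of shape (time, y, x); the function names and the synthetic data are illustrative and are not taken from Grooms et al. (2015b).

```python
import numpy as np

def mean_bias(model, reference):
    """Pointwise bias of the time mean; inputs have shape (time, ny, nx)."""
    return model.mean(axis=0) - reference.mean(axis=0)

def variance_ratio(model, reference):
    """Pointwise ratio of reference time variance over model time variance."""
    return reference.var(axis=0) / model.var(axis=0)

# Synthetic stand-in data (an assumption for illustration only).
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 2.0, size=(500, 4, 8))
ref_mean = reference.mean(axis=0)
# A "model" whose anomalies are half as large and whose mean is offset by 0.1.
model = ref_mean + 0.5 * (reference - ref_mean) + 0.1

bias = mean_bias(model, reference)
ratio = variance_ratio(model, reference)
print(bias.mean(), ratio.mean())
```

A ratio larger than one at a grid point indicates that the parameterized run is variance deficient there, which is the quantity displayed in Figure 5.4.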

Fig. 5.3

Time mean bias of interface elevation in meters of the Gent–McWilliams parameterization (top) vs. the stochastic superparameterization (bottom). The zonal direction is shown in the horizontal in units of \(10\,000\,\text {km}\), the meridional direction in the vertical in units of kilometers. Graphs are adapted from Grooms et al. (2015b)

Fig. 5.4

Ratio of interface elevation time variance of the reference simulation over time variance corresponding to the data shown in Figure 5.3. The variance deficiency of the Gent–McWilliams parameterization (top) is significantly larger than the variance deficiency of the stochastic superparameterization (bottom). Graphs are adapted from Grooms et al. (2015b)

While this model achieves a spectrally consistent proxy for the missing potential vorticity flux, it imposes locality and spatial uniformity of the subgrid model. In reality, however, as follows from the pattern of eddy kinetic energy (EKE) in Grooms et al. (2015b), the EKE (i) is essentially non-uniform and (ii) does not correlate with the places of maximum instability; indeed, EKE spots are always downstream of places with maximal baroclinic instability. So, the main conceptual question is how to introduce non-trivial advection of subgrid quantities by the coarse flow.

Another question is how to make this approach practical. In essence, we need the mean subgrid fluxes as a function of the resolved velocities and quasigeostrophic PV gradients. For a two-layer quasigeostrophic model, this can still be done. For continuous stratification and for the primitive equations, however, the additional vertical dependence will quickly require more computational effort than directly using an eddy-resolving mesh. A further question is how to avoid overexciting gravity waves in a gravity wave-permitting model.

There may be possible simplifications. For example, since the fine grid dynamics depends only on the magnitude of resolved velocities, the magnitude of PV gradients, and the angle between the velocities and gradients, one may consider only a finite set of velocity values and PV gradients, and interpolate the eddy flux divergence between the simulated patterns computed for this set. Such a lookup table may considerably reduce computational cost. Further, for primitive equation models, the subgrid may still be treated in quasigeostrophic approximation. Thus, it will be possible to represent PV gradients on the subgrid, which is seen as essential for providing a proper proxy for baroclinic instability. This cannot be easily done if the superparameterization is formulated in terms of primitive variables, where information on PV gradients would be lost when going to the fine grid.
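The lookup-table idea can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_flux_divergence` is a hypothetical placeholder for the result of an expensive fine-grid simulation, the axis ranges are arbitrary, and plain multilinear interpolation is only one of several possible interpolation choices.

```python
import itertools
import numpy as np

def build_table(flux_divergence, axes):
    """Evaluate the expensive subgrid computation once on a tensor grid."""
    mesh = np.meshgrid(*axes, indexing="ij")
    return flux_divergence(*mesh)

def lookup(table, axes, point):
    """Multilinear interpolation at one point, clamped to the table range."""
    idx, frac = [], []
    for ax, p in zip(axes, point):
        p = min(max(p, ax[0]), ax[-1])
        i = min(int(np.searchsorted(ax, p, side="right")) - 1, len(ax) - 2)
        idx.append(i)
        frac.append((p - ax[i]) / (ax[i + 1] - ax[i]))
    value = 0.0
    for corner in itertools.product((0, 1), repeat=len(axes)):
        weight = 1.0
        for c, f in zip(corner, frac):
            weight *= f if c else 1.0 - f
        value += weight * table[tuple(i + c for i, c in zip(idx, corner))]
    return value

def toy_flux_divergence(speed, pv_gradient, angle):
    """Hypothetical placeholder for a fine-grid simulation result."""
    return speed * pv_gradient * np.cos(angle)

axes = (
    np.linspace(0.0, 1.0, 11),          # resolved speed (arbitrary units)
    np.linspace(0.0, 2.0, 11),          # PV gradient magnitude (arbitrary units)
    np.linspace(0.0, 2.0 * np.pi, 65),  # angle between velocity and PV gradient
)
table = build_table(toy_flux_divergence, axes)
approx = lookup(table, axes, (0.33, 1.21, 0.9))
exact = toy_flux_divergence(0.33, 1.21, 0.9)
print(approx, exact)
```

The table is filled once offline; at run time each coarse cell needs only a handful of table reads, so the cost is independent of the cost of the fine-grid computation.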

5.5 Other Closures

5.5.1 The Mana–Zanna Parameterization of Ocean Mesoscale Eddies

Mana and Zanna (2014) study the correlation of different functional forms for the eddy source term with a highly resolved direct numerical simulation, select the best candidate function, and match the remaining coefficients with the empirical data. More detailed tests in a double gyre configuration are reported in Zanna et al. (2017). In these two papers, the authors work in a 3-layer quasigeostrophic setting; possible extensions to primitive equation models are discussed and tested in Anstey and Zanna (2017). In the following, we describe the Mana–Zanna parameterization following the concise derivation later given by Grooms and Zanna (2017). We will present a slightly more general view which raises interesting possibilities for further optimization of the closure.

For simplicity, we restrict the discussion to the barotropic single-layer QG equations without \(\beta \)-effect. Working exclusively in the continuum setting on the plane, we define a time-independent coarsening operation via convolution with a filter kernel, i.e.,

$$\begin{aligned} \overline{\zeta }({\varvec{x}}) = \int _{\mathbb {R}^2} \phi _\delta ({\varvec{x}}-{\varvec{y}}) \, \zeta ({\varvec{y}}) \, \mathrm{d}{\varvec{y}}\end{aligned}$$
(5.82)

where \(\phi _\delta \) is a radial kernel with \(\delta \) referring to the width of the filter. Applying this operation to the barotropic vorticity equation (5.4a), we can write

$$\begin{aligned} \overline{D}_t \overline{\zeta }\equiv \partial _t \overline{\zeta }+ [\overline{\psi }, \overline{\zeta }] = S + \overline{F} + D^* \overline{\zeta }\,, \end{aligned}$$
(5.83)

where \(D^*\) is some coarsened dissipation operator and S denotes the eddy source term

$$\begin{aligned} S = [\overline{\psi }, \overline{\zeta }] - \overline{[\psi , \zeta ]} + \overline{D \zeta } - D^* \overline{\zeta }\,. \end{aligned}$$
(5.84)
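To make the definition concrete, the advective part of the eddy source term can be diagnosed from a highly resolved field. The sketch below assumes a doubly periodic square domain, a Gaussian filter kernel with Fourier symbol \(\exp (-\delta ^2 k^2/2)\), and pseudo-spectral derivatives; it omits the dissipation difference \(\overline{D \zeta } - D^* \overline{\zeta }\). These choices and all function names are illustrative assumptions and are not the setup of Mana and Zanna (2014).

```python
import numpy as np

def wavenumbers(n, length):
    """Wavenumber arrays for an n x n periodic grid; axis 0 is x, axis 1 is y."""
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    return np.meshgrid(k, k, indexing="ij")

def deriv(f, k):
    """Spectral derivative along the direction whose wavenumber array is k."""
    return np.real(np.fft.ifft2(1j * k * np.fft.fft2(f)))

def jacobian(a, b, kx, ky):
    """[a, b] = a_x b_y - a_y b_x."""
    return deriv(a, kx) * deriv(b, ky) - deriv(a, ky) * deriv(b, kx)

def coarsen(f, delta, kx, ky):
    """Convolution with a Gaussian kernel of width delta, done in Fourier space."""
    symbol = np.exp(-0.5 * delta**2 * (kx**2 + ky**2))
    return np.real(np.fft.ifft2(symbol * np.fft.fft2(f)))

def eddy_source(psi, delta, kx, ky):
    """Advective part of S: [psi_bar, zeta_bar] - bar([psi, zeta])."""
    zeta = np.real(np.fft.ifft2(-(kx**2 + ky**2) * np.fft.fft2(psi)))
    psi_bar = coarsen(psi, delta, kx, ky)
    zeta_bar = coarsen(zeta, delta, kx, ky)
    resolved = jacobian(psi_bar, zeta_bar, kx, ky)
    filtered = coarsen(jacobian(psi, zeta, kx, ky), delta, kx, ky)
    return resolved - filtered

n, length = 64, 2.0 * np.pi
kx, ky = wavenumbers(n, length)
x = np.arange(n) * length / n
X, Y = np.meshgrid(x, x, indexing="ij")
psi = np.sin(X) * np.cos(2.0 * Y) + 0.5 * np.cos(3.0 * X + Y)  # smooth test field

S = eddy_source(psi, 0.5, kx, ky)
print(np.abs(S).max(), S.mean())
```

For vanishing filter width the two Jacobians coincide and the diagnosed source term is zero, which provides a quick consistency check of such a diagnostic.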

In Mana and Zanna (2014), the authors seek to build a model for S in terms of the divergence of Rivlin–Ericksen stresses which originated in the description of non-Newtonian fluids with infinitesimal memory (see, e.g., Truesdell and Rajagopal 1999). These tensors satisfy material frame invariance and observer objectivity, properties required of a physical material law. For the barotropic vorticity equation, an exact implementation of an inviscid second-grade Rivlin–Ericksen fluid would correspond to

$$\begin{aligned} S = \alpha \, \overline{D}_t \varDelta \overline{\zeta } \end{aligned}$$
(5.85)

which leads to the vorticity formulation of the Euler-\(\alpha \) model further discussed in Section 5.5.2 below. Their study, however, finds that a better correlation is obtained by using

$$\begin{aligned} S = \alpha \, \varDelta \overline{D}_t \overline{\zeta } \end{aligned}$$
(5.86)

which differs from (5.85) by nonlinear commutators, but preserves the property of frame invariance. They also find that the coefficient \(\alpha \) on the right-hand side is negative, which precludes a straightforward interpretation as advection by a smoothed velocity field.

Grooms and Zanna (2017) provide an a posteriori justification of (5.86) along the following lines. They argue that, after high-pass filtering, S is the dominant term on the right-hand side of (5.83) (see Footnote 16). In particular, the Laplacian of S dominates the Laplacians of the other two terms so that

$$\begin{aligned} \varDelta \overline{D}_t \overline{\zeta }\approx \varDelta S \,. \end{aligned}$$
(5.87)

They proceed to show that S is highly correlated with \(\varDelta S\), so that (5.87) implies (5.86).

Let us explore such correlations from a more general perspective. We define a family of abstract approximate Laplacians which includes the usual 5-point discrete Laplacian in two dimensions. Suppose that \(\{\mu _{\varepsilon }\}_{\varepsilon >0}\) is a family of finite positive Borel measures on \(\mathbb {R}^n\) with \({\text {supp}} \,\mu _{\varepsilon }\subset B(x,\varepsilon )\), the ball centered at x with radius \(\varepsilon \), satisfying

$$\begin{aligned}&\qquad \mu _{\varepsilon }(\{x\})=0 \,, \end{aligned}$$
(5.88a)
$$\begin{aligned}&\lim _{\varepsilon \rightarrow 0} \int _{B(x,\varepsilon )} y_i \, \mathrm{d}\mu _{\varepsilon }(y)=0 \,, \end{aligned}$$
(5.88b)

and

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \int _{B(x,\varepsilon )} y_i \, y_j \, \mathrm{d}\mu _{\varepsilon }(y) = 2 \, \delta _{ij} \end{aligned}$$
(5.88c)

for all \(i,j=1,\dots ,n\), where the coordinates \(y_i\) are measured relative to the center x. In particular, various normalized symmetric measures, including point measures and surface Lebesgue measures in lower dimensions, satisfy these conditions. Then,

$$\begin{aligned} \varDelta S(x) = \lim _{\varepsilon \rightarrow 0} \, |\mu _{\varepsilon } |\, ({\text {Av}}_\varepsilon (S) - S(x)) \,, \end{aligned}$$
(5.89)

where \(|\mu _{\varepsilon } |=\mu _{\varepsilon }(\mathbb {R}^n)\) and

$$\begin{aligned} {\text {Av}}_\varepsilon (S) = \frac{1}{|\mu _{\varepsilon } |} \int _{B(x,\varepsilon )} S(y) \, \mathrm{d}\mu _{\varepsilon }(y) \,. \end{aligned}$$
(5.90)
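As a plausibility check of (5.89), one can evaluate the right-hand expression for a smooth test function and small \(\varepsilon \). The sketch below assumes, for illustration, a measure consisting of m equal point masses placed on the circle of radius \(\varepsilon \) around x in two dimensions (so the ball is taken to be closed), with weights fixed by the second-moment condition (5.88c); the test function is arbitrary.

```python
import numpy as np

def approx_laplacian(S, x, eps, m=4):
    """|mu_eps| (Av_eps(S) - S(x)) for m equal point masses at distance eps."""
    angles = 2.0 * np.pi * np.arange(m) / m
    offsets = eps * np.column_stack([np.cos(angles), np.sin(angles)])
    weight = 4.0 / (m * eps**2)   # second moments then equal 2 * delta_ij (m >= 3)
    total = m * weight            # |mu_eps| = 4 / eps^2
    average = sum(weight * S(x + o) for o in offsets) / total
    return total * (average - S(x))

def S(x):
    """Smooth test function with Laplacian -5 S."""
    return np.sin(x[0]) * np.cos(2.0 * x[1])

x0 = np.array([0.3, 0.7])
exact = -5.0 * S(x0)
for m in (3, 4):
    for eps in (1e-1, 1e-2, 1e-3):
        print(m, eps, approx_laplacian(S, x0, eps, m) - exact)
```

The case m = 4 reproduces the usual 5-point discrete Laplacian; for m = 3 the odd third-order terms do not cancel, so the error decays more slowly with \(\varepsilon \).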

Assume now that S is a homogeneous isotropic \(\delta \)-correlated Gaussian random field with variance \(\sigma ^2\). Setting

$$\begin{aligned} {\varvec{w}}= \begin{pmatrix} S(x) \\ {\text {Av}}_\varepsilon (S) - S(x) \end{pmatrix} \end{aligned}$$
(5.91)

and fixing \(\varepsilon >0\) at a small finite value, we find that the covariance matrix of \({\varvec{w}}\) is given by

$$\begin{aligned} \varSigma = {\text {Cov}}[{\varvec{w}},{\varvec{w}}] = \sigma ^2 \, \begin{pmatrix} 1 &{} -1 \\ -1 &{} (b + 1) \end{pmatrix} \,, \end{aligned}$$
(5.92)

with \(b = |\mu _{\varepsilon }^2 |/ |\mu _{\varepsilon } |^2\), where \(|\mu _{\varepsilon }^2 |= \sum _{y\in B(x,\varepsilon )}\mu _{\varepsilon }^2(\{y\})\) (see Footnote 17).

The eigenvalue ratio corresponding to the subdominant principal component of \(\varSigma \) is given by

$$\begin{aligned} r \equiv \frac{\lambda _2}{\lambda _1+\lambda _2} = \frac{b - \sqrt{b^2+4}+2}{2 \, b + 4} \,. \end{aligned}$$
(5.93)

It quantifies the fraction of variance not explained by a linear relationship between the components of \({\varvec{w}}\). When \(\mu _\varepsilon \) does not have point measure components, \(b=0\) and consequently the eigenvalue ratio \(r=0\), indicating perfect correlation between the components of \({\varvec{w}}\). The largest value of b in the class of point measures with equal weights corresponds to the 4-point Laplacian where \(\mu _\varepsilon \) consists of three equal-weight Dirac masses at angles 0, \(2\pi /3\), and \(4\pi /3\). In this case, \(b=1/3\) and \(r=(7 - \sqrt{37})/14 \approx 0.066\). For the usual 5-point stencil as considered in Grooms and Zanna (2017), \(b=1/4\) and \(r=(9-\sqrt{65})/18 \approx 0.052\).
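The quoted values are easy to reproduce numerically. The sketch below, with illustrative function names, computes the eigenvalue ratio directly from the covariance matrix (5.92) with \(\sigma = 1\) and compares it with the closed form (5.93); it uses that b = 1/m for m equal point masses.

```python
import numpy as np

def eigenvalue_ratio(b):
    """Smaller eigenvalue of the matrix in (5.92) over the trace (sigma = 1)."""
    cov = np.array([[1.0, -1.0], [-1.0, b + 1.0]])
    lam = np.linalg.eigvalsh(cov)   # eigenvalues in ascending order
    return lam[0] / lam.sum()

def closed_form(b):
    """Right-hand side of (5.93)."""
    return (b - np.sqrt(b**2 + 4.0) + 2.0) / (2.0 * b + 4.0)

for m in (3, 4):
    b = 1.0 / m                      # m equal point masses
    print(m, eigenvalue_ratio(b), closed_form(b))
```

For m = 3 and m = 4 both expressions agree and round to 0.066 and 0.052, respectively.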

Thus, even for relatively concentrated measures, a major fraction of the variance is explained by a linear relationship between S and the approximate Laplacian of S as defined via the right-hand expression in (5.89) for finite \(\varepsilon \). The constant of proportionality is the ratio of the components of the principal eigenvector of \(\varSigma \), i.e.,

$$\begin{aligned} S(x) \approx \tfrac{1}{2} \, (b - \sqrt{4 + b^2}) \, ({\text {Av}}_\varepsilon (S) - S(x)) \,. \end{aligned}$$
(5.94)

In the concrete case of the 5-point Laplacian, we find

$$\begin{aligned} S \approx - \frac{1}{8} \, (\sqrt{65}-1) \, \frac{\varepsilon ^2}{4} \, \varDelta S = - (c \varepsilon )^2 \, \varDelta S \approx - (c \varepsilon )^2 \, \varDelta \overline{D}_t \overline{\zeta } \end{aligned}$$
(5.95)

where \(c \approx 0.469782\), which is close to the empirical value found by Mana and Zanna (2014) (see Footnote 18).
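The constant c can also be recovered by sampling. The following sketch is a synthetic check of the \(\delta \)-correlated assumption only, not a reproduction of the empirical fits: it draws discrete white noise on a periodic grid (grid size and seed are arbitrary choices), forms the 5-point neighbor average, and extracts the principal axis of the sample covariance of \({\varvec{w}}\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 512, 1.0
S = rng.normal(0.0, sigma, size=(n, n))   # discrete white noise, periodic grid

# Average over the four nearest neighbours (5-point stencil without the centre).
average = 0.25 * (np.roll(S, 1, 0) + np.roll(S, -1, 0)
                  + np.roll(S, 1, 1) + np.roll(S, -1, 1))
w = np.vstack([S.ravel(), (average - S).ravel()])

cov = np.cov(w)                    # approaches sigma^2 [[1, -1], [-1, 5/4]]
lam, vec = np.linalg.eigh(cov)     # eigenvalues in ascending order
principal = vec[:, -1]             # eigenvector of the largest eigenvalue
slope = principal[0] / principal[1]   # S is approximately slope * (Av - S)

# With Av - S = (eps^2 / 4) Laplacian(S), the relation reads
# S = -(c eps)^2 Laplacian(S) with c^2 = -slope / 4.
c_sampled = np.sqrt(-slope / 4.0)
c_theory = np.sqrt((np.sqrt(65.0) - 1.0) / 32.0)
print(c_sampled, c_theory)
```

The sampled value agrees with the theoretical one up to sampling error, which decreases with the number of grid points.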

It is clear from the argument above that the eigenvalue ratio improves when the measure becomes less localized. On the other hand, the assumption of S being a \(\delta \)-correlated random field must break down on small scales; we expect the decorrelation length to be at or slightly larger than the grid scale. Thus, it should be possible to replace the Laplacian in the argument above with a discrete operator \(\varLambda \) whose stencil nodes are at least a decorrelation length apart and which is effectively acting as a high-pass filter. The form of \(\varLambda \) can then be optimized for its eigenvalue ratio. In this context, we remark that the approximation made in (5.87) does not seem necessary to proceed, as \(\varLambda \overline{F}\) and \(\varLambda D^* \overline{\zeta }\) are readily computable. We believe that this question is worth further investigation.

A different line of reasoning might be based on a random field model for S with finite spatial correlations. Assuming a given spectrum for S, the characterization of the two-point correlation function via the Wiener–Khinchin theorem (see, e.g., Yaglom 1987) could still allow us to compute the covariance matrix \(\varSigma \) explicitly and subsequently optimize the filter \(\varLambda \). Finally, a detailed analysis of the structure of S is warranted. Grooms et al. (2015a) provide an argument that the spectrum of S grows like \(k^5\), which can likely only be true on a limited range of scales as a perfectly \(\delta \)-correlated random field should have a flat spectrum. Thus, in particular the details of spatial correlation near the grid scale require attention.

5.5.2 \(\alpha \)-Models

The so-called \(\alpha \)-models initially came up in the study of nonlinear waves, not in turbulence. What is now known as the Camassa–Holm equation was first discovered by Fuchssteiner and Fokas (1981) who sought completely integrable generalizations of the Korteweg–de Vries (KdV) equation with a bi-Hamiltonian structure. It was independently re-derived by Camassa and Holm (1993)—for a more detailed exposition, see Camassa et al. (1994)—as a next order correction to the KdV equation in small amplitude expansion of unidirectional surface waves in irrotational shallow water. Camassa and Holm’s work attracted a lot of attention as, in addition to integrability and bi-Hamiltonian structure, they found a family of peaked soliton solutions. Solutions of the Camassa–Holm equation can be seen as geodesics on the diffeomorphism group with respect to a right-invariant \(H^1\)-metric (Kouranbaeva 1999). The striking parallel to Arnold’s (1966) view of ideal three-dimensional hydrodynamics as geodesic flow on the volume-preserving diffeomorphism group endowed with an \(L^2\)-metric was pointed out by Holm et al. (1998) who, replacing the \(L^2\) with an \(H^1\)-metric, obtained a hydrodynamic analog to the Camassa–Holm equations which is now known as the Euler-\(\alpha \) equations or the Lagrangian-averaged Euler equations. In velocity–momentum variables, they read

$$\begin{aligned}&\partial _t {\varvec{v}}- {\varvec{u}}\times (\varvec{\nabla }\times {\varvec{v}}) + \varvec{\nabla }p = 0 \,, \end{aligned}$$
(5.96a)
$$\begin{aligned}&\qquad \quad {\varvec{v}}= (1 - \alpha ^2 \varDelta ) {\varvec{u}}\,, \end{aligned}$$
(5.96b)
$$\begin{aligned}&\qquad \qquad \quad \varvec{\nabla }\cdot {\varvec{u}}= 0 \,. \end{aligned}$$
(5.96c)

The Euler-\(\alpha \) equations arise from the “kinetic energy” Lagrangian

$$\begin{aligned} L_\alpha = \tfrac{1}{2} \int |{\varvec{u}}|^2 + \alpha ^2 \, |\varvec{\nabla }{\varvec{u}}|^2 \, \mathrm{d}{\varvec{x}}\,, \end{aligned}$$
(5.97)

which is a constant of the motion.

The connection to turbulence was made soon after their discovery, based on a number of observations. The momentum \({\varvec{v}}\) is transported by a velocity field \({\varvec{u}}\) which is smoother than the momentum; see (5.96b). This was seen as analogous to Reynolds averaging, even though the two operations are not equivalent; further, the non-viscous terms take the form of a Rivlin–Ericksen tensor, so that, in their inviscid form, they coincide with the equations of motion for a non-Newtonian fluid of second grade (Foias et al. 2001). Analytically, the Euler-\(\alpha \) equations possess properties which are notably lacking in ideal and Newtonian fluids: In two dimensions, the Euler-\(\alpha \) model has unique global point vortex solutions (Oliver and Shkoller 2001), and in three dimensions, the viscous \(\alpha \)-equations have global classical solutions (Marsden and Shkoller 2001; Foias et al. 2002). We note that it is not a priori clear how to add viscosity to (5.96): The references quoted so far argue that momentum should be diffused; see Chen et al. (1999a) for a discussion of this issue. The classical equations of a viscous second-grade fluid, in contrast, diffuse velocity. This is a mathematically weaker form of dissipation so that, correspondingly, the global existence of solutions is only known for small initial data (Cioranescu and Girault 1997), much like the situation for the Navier–Stokes equations in three dimensions.

Several authors have given derivations of the Euler-\(\alpha \) equations as the equations of motion for some notion of a Lagrangian mean flow. Holm (1999, 2002) recognized a close connection between Lagrangian averaging and the generalized Lagrangian mean (GLM) of Andrews and McIntyre (1978). To provide closure, Holm assumes that first-order fluctuations in a small amplitude expansion are parallel-transported by the mean flow—an assumption he refers to as a Taylor hypothesis in analogy with G.I. Taylor’s observation that turbulent fluctuations are correlated in the downstream direction of a flow (Taylor 1938). Marsden and Shkoller (2003), in contrast, assume that first-order fluctuations are transported as a vector field and that parallel transport of second-order fluctuations is, on average, orthogonal to the velocity field. Recently, Gilbert and Vanneste (2018) have pointed out that a geometric view of the Lagrangian mean fixes the higher-order closure conditions. In this framework, the Euler-\(\alpha \) equations emerge from Lagrangian averaging under the minimal set of assumptions that (i) the averaged map is the minimizer of geodesic distance, (ii) first-order fluctuations are statistically isotropic, and (iii) first-order fluctuations are transported by the mean flow as a vector field (Oliver 2017; Badin et al. 2018).

The numerical evidence supporting the use of \(\alpha \)-models is mixed. Early numerical studies for homogeneous turbulence were encouraging (Chen et al. 1999b; Mohseni et al. 2003). The underlying idea has also been ported to rotating geophysical fluid flow (Holm and Nadiga 2003) and used in various test cases (Hecht et al. 2008a; Aizinger et al. 2015). Careful comparative studies for two-dimensional quasigeostrophic turbulence, however, show that the \(\alpha \)-model perturbs the dynamics of two-dimensional turbulence. In particular, it suffers from accumulation of enstrophy at small scales (Lunasin et al. 2007; Graham and Ringler 2013) and has inferior correlation with an empirically observed subgrid stress tensor (Mana and Zanna 2014), where the computationally observed behavior is close to (5.95), a relationship that is similar, but not identical to the \(\alpha \)-model closure. In addition, as the inversion of the Helmholtz filter in (5.96b) is non-local, it is not appealing for use in a full ocean model. We note, however, that the idea of filtering in a geometrically intrinsic setting is more general than what is usually pursued and may have some merit even in the setting of the nearly geostrophic turbulence in mesoscale ocean dynamics.

In the final part of the section, we shall sketch a possible nonstandard interpretation of the \(\alpha \)-model dynamics as a model for two-dimensional turbulence. For simplicity, we return to the barotropic vorticity equation of Section 5.3.1 with \(\beta =0\) and initially ignore forcing and dissipation. In this setting, the model coincides with the well-studied two-dimensional Euler-\(\alpha \) equation whose vorticity dynamics reads

$$\begin{aligned}&\partial _t \xi + [\psi , \xi ] = 0 \,, \end{aligned}$$
(5.98a)
$$\begin{aligned}&\quad \xi = L_\alpha \varDelta \psi \,, \end{aligned}$$
(5.98b)

where (5.96b) corresponds to the choice \(L_\alpha = 1 - \alpha ^2 \varDelta \), but \(L_\alpha \) could also be a more general operator defined via a Fourier symbol \(\ell _\alpha (k)\). The \(\alpha \)-energy at wavenumber \({\varvec{k}}\) is \(\mathscr {E}_{\varvec{k}}= - \tfrac{1}{2} \, \psi _{{\varvec{k}}}^* \, \xi _{{\varvec{k}}}\), and the \(\alpha \)-enstrophy is given by \(\mathscr {Z}_{\varvec{k}}= k^2 \, \ell _\alpha (k) \, \mathscr {E}_{\varvec{k}}\); system (5.98) conserves total \(\alpha \)-energy and \(\alpha \)-enstrophy.
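The inversion of (5.98b) and the modal energy and enstrophy can be written down explicitly in Fourier space. The sketch below assumes a doubly periodic square domain and the Helmholtz symbol \(\ell _\alpha (k) = 1 + \alpha ^2 k^2\) corresponding to (5.96b); it also uses that the modal \(\alpha \)-enstrophy equals \(\tfrac{1}{2} |\xi _{\varvec{k}}|^2\), which is consistent with the relation between \(\mathscr {Z}_{\varvec{k}}\) and \(\mathscr {E}_{\varvec{k}}\) stated above. Function names and parameter values are illustrative.

```python
import numpy as np

def squared_wavenumbers(n, length=2.0 * np.pi):
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    return kx**2 + ky**2

def streamfunction_from_xi(xi, alpha):
    """Solve xi = L_alpha Laplacian(psi) for psi with L_alpha = 1 - alpha^2 Laplacian."""
    k2 = squared_wavenumbers(xi.shape[0])
    symbol = -k2 * (1.0 + alpha**2 * k2)   # Fourier symbol of L_alpha Laplacian
    symbol[0, 0] = 1.0                     # mean mode: avoid division by zero
    psi_hat = np.fft.fft2(xi) / symbol
    psi_hat[0, 0] = 0.0                    # fix the arbitrary constant in psi
    return np.real(np.fft.ifft2(psi_hat))

def modal_energy_enstrophy(psi, xi):
    """Modal alpha-energy -1/2 psi_k^* xi_k and alpha-enstrophy 1/2 |xi_k|^2."""
    norm = psi.size
    psi_hat, xi_hat = np.fft.fft2(psi) / norm, np.fft.fft2(xi) / norm
    energy = -0.5 * np.real(np.conj(psi_hat) * xi_hat)
    enstrophy = 0.5 * np.abs(xi_hat) ** 2
    return energy, enstrophy

n, alpha = 32, 0.3
x = 2.0 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
psi_true = np.sin(X) * np.cos(2.0 * Y)             # single scale with k^2 = 5
xi = -5.0 * (1.0 + 5.0 * alpha**2) * psi_true      # xi = L_alpha Laplacian(psi)

psi = streamfunction_from_xi(xi, alpha)
energy, enstrophy = modal_energy_enstrophy(psi, xi)
k2 = squared_wavenumbers(n)
print(np.abs(psi - psi_true).max(), energy.sum(), enstrophy.sum())
```

The non-local character of this inversion is visible in the code: recovering \(\psi \) requires a global transform, which is the practical drawback for full ocean models noted earlier in this section.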

Now suppose that \(\alpha \)-wavenumber k corresponds to a different physical wavenumber \(\kappa (k)\) and that there is a corresponding physical energy

$$\begin{aligned} E(\kappa ) = \mathscr {E}(k) / h(k) \,. \end{aligned}$$
(5.99)

A straightforward computation shows that total physical energy and enstrophy are conserved if and only if

$$\begin{aligned} \kappa ^2(k) = \ell _\alpha (k) \, k^2 \end{aligned}$$
(5.100a)

and

$$\begin{aligned} h(k) = \tfrac{1}{2} \, k \, \ell _\alpha '(k) + \ell _\alpha (k) \,. \end{aligned}$$
(5.100b)
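The conditions (5.100) can be checked numerically for a concrete symbol. The sketch below assumes that "total" energy and enstrophy are integrals of isotropic spectral densities over the two-dimensional wavenumber plane, so that the radial integrals carry the weights \(k \, \mathrm{d}k\) and \(\kappa \, \mathrm{d}\kappa \); this is our reading of the conventions, under which (5.100b) equals \(\kappa \kappa '/k\). The Helmholtz symbol, the value of \(\alpha \), and the test spectrum are arbitrary illustrative choices.

```python
import numpy as np

alpha = 0.2

def ell(k):
    return 1.0 + (alpha * k) ** 2          # Helmholtz symbol (assumption)

def ell_prime(k):
    return 2.0 * alpha**2 * k

def kappa(k):
    return k * np.sqrt(ell(k))             # physical wavenumber, (5.100a)

def h(k):
    return 0.5 * k * ell_prime(k) + ell(k) # energy rescaling, (5.100b)

def trapezoid(y, x):
    """Trapezoidal rule on a possibly non-uniform grid."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

k = np.linspace(0.0, 12.0, 20001)
alpha_energy = k**3 * np.exp(-k**2)        # arbitrary test spectrum
physical_energy = alpha_energy / h(k)      # (5.99), attached to kappa(k)

E_alpha = trapezoid(alpha_energy * k, k)
E_phys = trapezoid(physical_energy * kappa(k), kappa(k))
Z_alpha = trapezoid(k**2 * ell(k) * alpha_energy * k, k)
Z_phys = trapezoid(kappa(k) ** 2 * physical_energy * kappa(k), kappa(k))
print(E_alpha, E_phys, Z_alpha, Z_phys)
```

For the Helmholtz symbol the mapping is explicit, \(\kappa = k \sqrt{1 + \alpha ^2 k^2}\) and \(h = 1 + 2 \alpha ^2 k^2\), so that high \(\alpha \)-wavenumbers correspond to considerably higher physical wavenumbers.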

Looking at the detailed triad interactions of the \(\alpha \)-model, we find transfer rate relations similar to (5.15) where the rate of nonlinear energy transfer is with respect to \(\alpha \)-wavenumbers, whereas the prefactors on the right-hand side are evaluated with respect to physical wavenumbers. Thus, in general, it is not even approximately true that the \(\alpha \)-triad picture corresponds to the physical triad picture under the wavenumber mapping implied by energy and enstrophy conservation. However, there is one class of triads for which this is approximately the case: when one leg of the triad is in the low wavenumbers and two legs are in the high-wavenumber range; to be definite, we take \(p \ll k < q\) and set \(\delta = p/k \ll 1\). We might call such interactions catalytic triads as (5.15) shows that there is an O(1) energy exchange between modes \({\varvec{k}}\) and \({\varvec{q}}\) while mode \({\varvec{p}}\) exchanges energy only at a rate \(O(\delta )\). In other words, mode \({\varvec{p}}\) takes the role of a catalyst, mediating the transfer of energy in the high-wavenumber regime while not participating in it to leading order. Provided the turbulent regime is dominated by catalytic triads (which is not the classical KLB picture, but it is likely that these triads are key players in the inverse cascade), then under mild assumptions on \(\ell _\alpha \), an \(\alpha \)-model can be interpreted as representing the physical interactions under the mappings (5.100) up to relative errors in rates and mapped wavenumbers of \(O(\delta )\). The details of this computation involve only elementary estimates and shall be omitted here.

Thus, to interpret the \(\alpha \)-dynamics consistently via the remapping of wavenumbers, the same map must be applied when adding forcing and dissipation terms. Dissipation in the \(\alpha \)-model momentum equation, in particular, should take the form \(D(\kappa (k))\). This corresponds to momentum rather than velocity diffusion and thus coincides with the dissipation operator typically used in connection with \(\alpha \)-models as reviewed earlier in this section.

Finally, to consistently interpret the energy spectrum, it must be mapped back to physical wavenumbers. In the comparison of Graham and Ringler (2013), for example, no such map is applied. This constitutes an interesting open question as, to our knowledge, such analysis has never been done. A related open problem is to formulate the \(\alpha \)-model subgrid closure mapped to physical wavenumbers.

5.6 Concluding Remarks

In this chapter, we have reviewed the foundations of geostrophic turbulence and its implications for ocean models in the eddy-permitting regime. In the past decade, a number of authors have looked at the problem of effective parameterizations for subgrid eddy activity and for the resulting backscatter of energy into the resolved grid. Most of the detailed testing so far has been done in the context of quasigeostrophic layer models, with increased attention to full primitive equation setups in recent years.

Our selection of parameterizations for close discussion is necessarily incomplete. It highlights recent developments in favor of older ideas and puts an emphasis on mathematical structure toward systematic, or even rigorous, analysis. It is also guided by applicability to a new generation of global circulation models featuring irregular grids with spatially varying resolution, which rules out approaches that require explicit Fourier transforms or other constructs tied to a regular grid.

To a large extent, the ideas expressed here are exploratory. None of the parameterizations described here is widely used in operational models so that a major development cycle of introducing more energy-consistent parameterizations lies ahead. It is also not clear which of these approaches will be the most fruitful in the long run or whether some new or possibly old ideas will prevail.

Such old ideas could include the anticipated vorticity method of Sadourny and Basdevant (1985) which seeks to introduce a force \(-D{{\varvec{k}}}\times {{\varvec{u}}}\) such that, for example, when D is chosen as an upwind estimate of the layer potential vorticity, the scheme conserves energy exactly while dissipating enstrophy. In practice, this approach is insufficient as it does not remove the component of small-scale numerical noise in \({{\varvec{u}}}\) that does not project on curl as required for numerical stability. Graham and Ringler (2013) report that first-order anticipated vorticity results in either an excess of energy at all scales or dissipation of enstrophy across too large a portion of the spectrum; they suggest that applying a high-order spatial operator within the anticipated PV formalism may solve this issue, but at the expense of easy implementability in current GCMs. Yet, the underlying idea is interesting as the mathematically most direct way to reconcile energy conservation with enstrophy dissipation.
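That a force of this form cannot change the kinetic energy, whatever the choice of D, follows from a one-line computation. The remark below only spells out this standard identity at the continuum level and makes no claim about any particular discretization:

$$\begin{aligned} {\varvec{u}}\cdot (-D \, {\varvec{k}}\times {\varvec{u}}) = -D \, {\varvec{u}}\cdot ({\varvec{k}}\times {\varvec{u}}) = 0 \,, \end{aligned}$$

since \({\varvec{k}}\times {\varvec{u}}\) is perpendicular to \({\varvec{u}}\) for any vector \({\varvec{k}}\). The force therefore does no work, and the freedom in choosing D can be used to dissipate enstrophy.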

The classical development of Smagorinsky closures has been central in the modeling and simulation of turbulent flow regimes; see, e.g., the review by Meneveau and Katz (2000). However, it is not directly applicable to typical ocean regimes where, due to the scales at which forcing, instabilities, and dissipation act, one is often not in a self-similar scaling regime which is a prerequisite of LES and Smagorinsky-type closures. Dynamical Smagorinsky closures (when the subgrid viscosity is computed by applying an additional coarsening filter and fitting the difference between this and the original filter to the simulated stresses) could be of interest, although even with these techniques the lack of self-similarity may be an issue.

Stochastic modeling of subgrid interactions and backscatter has been developed, based on the direct interaction approximation of Kraichnan (1959), by Frederiksen and co-workers (Frederiksen and Davies 1997; O’Kane and Frederiksen 2008; Kitsios et al. 2013, 2014, 2016). While their work involves a detailed analysis of unresolved eddy–eddy interactions, it also heavily relies on spectral language, so that it is not clear how applicable this approach is in the context of complex geometries and possibly non-uniform grids and what the trade-offs in terms of skill vs. computational expense are. We also remark that there is similarity between the expression for the subgrid drain dissipation matrix in Kitsios et al. (2013) and the estimation of a dynamic Smagorinsky coefficient in the spirit of Germano et al. (1991).

For systems with an explicit fast–slow scale separation, it may be possible to model the fast timescale component with a stochastic process and use stochastic mode reduction to reduce the system to a stochastic equation on the slow timescale. Such methods are reviewed in Section 5 of Franzke et al. (2019). However, it is completely open whether this approach is applicable to subscale modeling in geostrophic turbulence where there is no clear scale separation and whether these techniques scale up to full ocean models.