1 Introduction

Numerical weather prediction (NWP), which began with the work of Richardson during World War I [250], remains one of the most challenging problems in the computational sciences. The two main challenges to producing an accurate forecast are (1) mathematically modeling atmospheric phenomena over a wide range of physical and temporal scales (e.g., turbulence, radiation, cloud formation), and (2) harnessing the available computational resources to evaluate these models in an accurate and efficient manner. While the goal of the first challenge is probably static (that is, a comprehensive mathematical description of the atmosphere at a given time), the second challenge represents a moving target. Computational resources not only expand; they change in character. Richardson’s original idea of a “forecasting factory” consisting of thousands of human computers assembled in an amphitheater was never realized; the first NWP codes were implemented on mainframe computers. Mainframes gave way to minicomputers and later vector supercomputers such as the Cray 1, 2, X-MP, and Y-MP. By the mid-1990s, vector supercomputers were replaced by massively parallel distributed systems. Now, in 2015, we are seeing the proliferation of many-core architectures (e.g. GPUs) and hybrid distributed/shared memory architectures (e.g. clusters of many-core processors, heterogeneous clusters). Moreover, as models increase their accuracy by resolving more phenomena (e.g. resolving non-hydrostatic effects, incorporating more complex moisture parameterizations), their appetite for high performance computing (HPC) resources grows.

The modeling challenge and computational challenge meet in the choice of the numerical method used to discretize the underlying continuum model(s), which are generally expressed as systems of both partial and ordinary differential equations. The numerical model, as this figurative middle-man, must both (1) accurately represent the continuum model, and (2) efficiently utilize the hardware on which it is implemented. Hence, the numerical method mediates these two grand challenges by adapting to the hardware; moreover, since NWP models may take on the order of 100 man-years to develop, test, and deploy, the designers of the numerical method should target their model to future HPC resources. Just as biological organisms must constantly adapt to their physical environment, numerical methods must adapt to their computational environment, competing for available resources. A natural question arises: which numerical methods will survive and flourish, and which will stagnate, decline, and perhaps go extinct?

This question was partially addressed in the review of numerical methods for non-hydrostatic atmospheric modeling by Steppeler et al. [282]. Based on some of the questions posed in [282], we concentrate on a class of numerical methods that may emerge victorious in next generation atmospheric (and climate) models: element-based Galerkin methods (EBGs). Among other questions, Steppeler and co-workers asked whether the numerical error caused by terrain-following coordinates could be avoided by means of \(z\)-coordinate based methods [281, 282]; element-based Galerkin methods are a natural choice to fulfill this recommendation. Furthermore, they questioned the ability of low order methods to resolve certain phenomena at high resolution without affecting accuracy: “Experience from current models suggests that approximations of overall third order will be adequate.” This review shows how the field has indeed evolved towards the high order approach that Steppeler et al. were discussing 10 years ago, and how the schemes that in 2003 had not yet been used operationally (because they were considered “advanced” [281]) are now the driving force behind next generation NWP models.

As discussed above, the fortunes of element-based Galerkin schemes are today tied to the evolution of computer hardware. We will see this in the sections that follow, after a short overview of the current trends in HPC and of how atmospheric models are developing around this paradigm.

1.1 Trends in High Performance Computing

Twenty-five years ago (1990), the state of the art in HPC was the Cray line of supercomputers (e.g. Cray Y-MP). These machines had a small number (2–8) of expensive custom vector processors, which performed a single instruction on multiple data (SIMD); all the processors fetched data from a bank of shared memory. This changed in the 1990s as commodity processors and memory became relatively inexpensive: large clusters of commodity processors built on distributed memory architectures became available and, by the late 1990s, had eclipsed the older vector machines, which are no longer used in HPC. Unlike the vector machines, distributed memory systems require communication between independent processes. At the present time (2015) another shift is occurring, as many-core architectures with a relatively small amount of shared memory are being coupled with massively parallel systems.

Today, HPC is in the Petascale era, with core counts exceeding \({\fancyscript{O}}(10^6)\) [226], while exascale technologies are rapidly approaching. For instance, the largest cluster as of November 2014 (Top500) is Tianhe-2, with 3.12 million cores and a maximum LINPACK [80] performance of 33.8 PetaFLOPS. The next largest machine is Titan, a Cray XK7 with 560,640 cores and a maximum LINPACK performance of 17.59 PetaFLOPS. To take full advantage of these architectures, scientists from different fields have gone back to the drawing board and rebuilt their numerical algorithms from scratch [118], driven by the very specific features that a numerical method must have to reach very high levels of scalability on the new machines. The next section surveys most operational and research atmospheric models developed to date, with special emphasis on how atmospheric modelers are moving towards numerical methods that have proved more scalable on current and future computers.

1.2 Existing Atmospheric Models and NWP Systems

Table 1 shows a non-exhaustive list of atmospheric models developed to date. Most of the listed codes are based on the finite difference method. Except for ENDGame (UK Met Office), the Nonhydrostatic Multiscale Model core of the NCEP NAM, and EULerian LAGrangian (EULAG), all FD-based codes are limited area models (LAM). Spectral transform and finite volume methods represent the second major trend. Codes based on the spectral transform are common for General Circulation Models (GCM) only. High-order element-based methods (spectral element method, SEM, and discontinuous Galerkin, DG) follow, while the finite element method (FEM), used by only a handful of models, is the least common of all. For reasons that will become clearer in later sections, the most widely used temporal integration schemes are the split-explicit and semi-implicit methods (Table 2).

Table 1 Non-exhaustive compilation of NWP systems
Table 2 Non-exhaustive compilation of atmospheric research models

1.3 Traditional Approaches: Finite Difference (FD) and Spectral Transform (ST) Methods

As is evident from the tables above, most operational NWP codes are based on either the finite difference (FD) method or, in the case of global models, the spectral transform (ST) method. It is difficult to find models using these methods that scale optimally on massively parallel computers (ST methods because of their all-to-all communication requirements, FD methods because of their non-compact stencils, especially at high order). The same is true of non-compact (high-order) finite volume methods. To understand the strengths and weaknesses of these traditional approaches and how EBGs address some of their shortcomings, we briefly review the FD and ST methods in this subsection.

Limited area models (LAMs) consider atmospheric flows over a subsection of the earth’s surface. Examples include mesoscale models, which typically span hundreds of kilometers in the horizontal, and cloud resolving models (CRMs), which span up to tens of kilometers in the horizontal. An example of a simulated single cloud is shown in Fig. 1.

Fig. 1

Large Eddy Simulation of the evolution of a single cloud with the Nonhydrostatic Unified Model of the Atmosphere (NUMA). From [214]. The Maya® computer graphics software was used for the photo-realistic rendering of the simulation (for more details see http://anmr.de/cloudwithmaya)

The finite difference method (FD) is the method of choice for LAMs for several reasons. First, it is simple to implement on a Cartesian grid, especially if the curvature of the earth is neglected. Unlike EBG methods or the finite volume method, grid generation is trivial and very few ancillary data structures are needed. Second, it is very efficient on a single processor, or on a small number of processors within a shared memory architecture (e.g. vector machines). Third, constructing both upwinded and higher order discretizations is relatively straightforward, although increasing the order of accuracy may hurt scalability due to the larger halo required.
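To make the halo argument concrete, consider the following minimal sketch (ours, not taken from any particular model): centered differences of increasing order applied to a periodic 1D field with NumPy. The stencil half-width, and hence the halo depth that a distributed-memory implementation must exchange with each neighbor, grows linearly with the order of accuracy.

```python
import numpy as np

# Centered first-derivative stencils of order 2, 4, and 6 on a uniform grid.
# The half-width of each stencil equals the halo depth a distributed-memory
# implementation must exchange with each neighbor at every evaluation.
STENCILS = {
    2: np.array([-1 / 2, 0, 1 / 2]),
    4: np.array([1 / 12, -2 / 3, 0, 2 / 3, -1 / 12]),
    6: np.array([-1 / 60, 3 / 20, -3 / 4, 0, 3 / 4, -3 / 20, 1 / 60]),
}

def ddx(f, dx, order):
    """First derivative of a periodic field f via centered differences."""
    w = STENCILS[order]
    halo = len(w) // 2  # points needed from each neighboring subdomain
    df = sum(c * np.roll(f, -k) for k, c in zip(range(-halo, halo + 1), w))
    return df / dx

x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
f = np.sin(x)
for order in (2, 4, 6):
    err = np.max(np.abs(ddx(f, x[1] - x[0], order) - np.cos(x)))
    print(f"order {order}: halo = {len(STENCILS[order]) // 2}, error = {err:.1e}")
```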

Global models (or general circulation models, GCMs) solve the governing equations on the whole planet, which is usually approximated as a sphere. The reader is referred to the 2007 paper by Williamson [321] for a review of GCMs. Many operational GCMs utilize ST, where spherical harmonics are used to represent both diagnostic and prognostic variables on the sphere. Spherical harmonics are the natural basis functions to solve PDEs on a sphere since they are the eigenfunctions of the negative Laplacian. Hence, great accuracy is achieved with a minimal number of grid points on the sphere. In order to advance the dynamical equations in time using ST, it is necessary to transform between physical and spectral space; this spectral transform is evaluated using a combination of Fourier and Legendre transforms. We perform an elementary complexity analysis of the ST method to illustrate a fundamental bottleneck as the resolution of NWP models increases.

Letting \(n\) be the number of grid points, Fourier transforms are evaluated along the longitudinal (zonal) direction with an FFT at a cost of \(O(n \log n)\); along the latitudinal (meridional) direction, a Legendre transform is required at a cost of \(O(n^3)\). Although fast Legendre methods exist, they are not widely used in NWP since they have high cross-over points. Therefore the cost of the ST method is \(O(n \log n + n^3)\), which scales adversely as \(n\) increases (i.e. as horizontal resolution is increased). For a grid spacing greater than 10 km, the hydrostatic, rather than non-hydrostatic, equations are used as the governing equations in GCMs (we will touch more on the equation sets in Sect. 2). These equations are solved via a vertical mode decomposition [119] which results in a constant-coefficient Helmholtz operator. Since spherical harmonics are exact solutions to this Helmholtz operator, no matrix inversion is required. Furthermore, ST methods have very small dispersion errors. ST models were developed during the era of smaller, shared-memory machines, which did not require communicating data across processors. As architectures transitioned from shared to distributed memory, the communication overhead became more important; the all-to-all communication required by both the FFT and the Legendre transform poses a barrier to scalability (not all distributed-memory hardware can perform this operation effectively). For instance, the ST-based model NOGAPS, used by the U.S. Navy, could not scale beyond 150 processes at typical resolutions [119]. Hence, ST methods, while both highly accurate and efficient at small processor counts, cannot compete in the era of hundreds of thousands (or millions) of processors.
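A back-of-the-envelope cost model (our own illustration, with all constants set to one) makes the dominance of the Legendre transform explicit:

```python
import math

# Illustrative spectral transform cost: an FFT in longitude, O(n log n),
# plus a Legendre transform in latitude, O(n^3). Only growth rates matter.
def st_cost(n):
    return n * math.log(n) + n ** 3

for n in (128, 256, 512, 1024):
    growth = st_cost(n) / st_cost(n // 2)
    print(f"n = {n:4d}: cost grows by a factor of {growth:.1f} when n doubles")
```

Each doubling of \(n\) multiplies the total cost by nearly eight, so the cubic Legendre term dominates long before operationally relevant resolutions are reached.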

To overcome the limitations of FD and ST in the current era of massively parallel computers, EBG methods are becoming the new trend in atmospheric modeling for the same reason they have always been popular in other fields of computational mechanics. This alternative is justified by the proven high parallel efficiency of local methods [73, 176, 226, 320]. The efficiency of EBGs on large to very large machines is facilitated by their small parallel communication footprint. To understand this small footprint, consider Fig. 2, where the grids needed by (a) a finite element and (b) a finite difference method are compared. In Fig. 2a, the grid consists of nine finite elements \(\varOmega _h^{el}\). With EBG the solution is sought on an element-wise basis, and each element communicates information to the others only through its shared boundaries (nodes in the case of CG; faces in the case of DG). When the finite element grid is partitioned into smaller portions of the global domain, the only information that needs to be exchanged among the subdomains of the partition is that on the boundary that each subdomain shares with its neighbors. In contrast, in Fig. 2b the grid is a classical structured, rectangular finite difference grid, plotted here as a direct analogue (in terms of node count) of the finite element mesh. Because differentiation at each node of a finite difference grid requires information from a set of adjacent nodes whose extent grows with the order of the approximation, some nodes will belong to two overlapping subdomains when the domain is partitioned, and additional communication is necessary. In element-based schemes, by contrast, communication is naturally low by construction. The details of EBG methods and the models based on them are reviewed in Sect. 3.

Fig. 2

Examples of the adjacency pattern for a finite element \(\varOmega _h^{el}\) (a), and for a node that belongs to a finite difference grid (b). In (a) and (b) information is exchanged, respectively, element- and node-wise. In (a), the only nodes that allow information to be shared between elements are the shared nodes on the boundary of neighboring elements (blue dots on the boundary of \(\varOmega _h^{el}\)). In (b), the cross made of blue circular nodes and a central red node is the stencil of a 4th-order differentiation performed on the central node. How these plots relate to parallelization is described in the text
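The element-wise exchange sketched in Fig. 2a reduces, for CG, to the so-called gather–scatter (or direct stiffness summation) over shared nodes. The toy example below (our own, for two 1D linear elements) shows the operation; in a parallel code, only the interface entries of the global vector would ever be communicated.

```python
import numpy as np

# Direct stiffness summation (gather-scatter) for continuous Galerkin:
# element-local contributions are accumulated onto shared global nodes.
# Two 1D linear elements sharing global node 1: nodes 0-1-2.
connectivity = np.array([[0, 1],   # element 0 -> global node indices
                         [1, 2]])  # element 1 -> global node indices

def dss(local_values, n_global):
    """Sum element-local contributions into the global vector."""
    global_values = np.zeros(n_global)
    for e, nodes in enumerate(connectivity):
        np.add.at(global_values, nodes, local_values[e])
    return global_values

local = np.array([[1.0, 2.0],    # contributions from element 0
                  [3.0, 4.0]])   # contributions from element 1
print(dss(local, 3))  # -> [1. 5. 4.]; only the shared node 1 is summed
```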

1.4 A Roadmap for Element-Based Galerkin Methods and This Review

Historically, finite element, spectral transform, and discontinuous Galerkin methods have been developed in relative isolation. In the past several decades, especially with the advent of spectral elements, common threads were identified. The two most important ideas are: (1) decomposing a continuous domain \(\varOmega \) into a finite number \(N_e\) of non-overlapping elements \(\varOmega _e\) and (2) expanding the state variables in \(N\) basis functions \(\psi _i\) within each sub-domain (or element) \(\varOmega _e\). In the first operation, we express the geometry in an element-wise fashion; in the second operation, we perform a Galerkin expansion of the state variables. Hence the moniker element-based Galerkin method. As discussed, EBG methods are classified as either continuous (CG) or discontinuous (DG). Each of these methods may be characterized by the number of elements \(N_e\) (or equivalently, the element diameter \(h)\) and the order of the basis functions \(p\). Resolution may be increased by increasing either \(N_e\) or \(p\) independently, allowing a wide range of combinations. In the limit of \(N_e=1\), the spectral transform (ST) can be seen as an EBG method; however, since ST is a degenerate EBG, it will not be counted among the EBG methods in the rest of the paper. This \(h-p\) parameter space is mapped in Fig. 3. In the left panel (CG), three numerical methods are displayed: finite elements, spectral elements, and the spectral transform method. Since continuity is required between elements, the lowest possible order \(p\) is one. The finite element method (FEM) is the special case when \(p=1,2,3\) basis functions are employed, while the spectral transform method is recovered if a single element is used with a very large order \(p \gg 1\). In the right panel, we see three numerical methods: finite volumes, DG, and the spectral transform method. Since continuity is not required between elements, we may use constant-valued basis functions, which is equivalent to cell-averaging; hence, we recover the classical finite volume (FV) method if \(p =0 \). For \(1 \le p < \infty \) and \(N_e > 1\) we have DG, while for large \(p\) and \(N_e = 1\), we again recover the ST method. Gabersek et al. [103] systematically mapped out the \(h-p\) space for SEM. They concluded that a polynomial order \(p\) between 5 and 10 with an effective resolution of \(\bar{\varDelta x} = h/p\) between 0.5 and 2.0 km is optimal for mesoscale simulations in terms of both accuracy and efficiency. To our knowledge, the \(h-p\) space for global non-hydrostatic simulations has not been explored yet.
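As a concrete illustration of this \(h-p\) bookkeeping, the short sketch below (our own, for a 1D grid) counts CG and DG degrees of freedom and the effective resolution \(\bar{\varDelta x} = h/p\) for a few combinations of \(N_e\) and \(p\) at fixed total resolution:

```python
# Degrees of freedom on a 1D domain of length L with Ne elements of order p.
# CG shares nodes between elements; DG duplicates them at the interfaces.
L = 100.0  # domain length [km], illustrative

def dofs(Ne, p):
    h = L / Ne  # element diameter
    return Ne * p + 1, Ne * (p + 1), h / p  # CG dof, DG dof, effective dx

for Ne, p in [(100, 1), (20, 5), (10, 10)]:
    cg, dg, dx = dofs(Ne, p)
    print(f"Ne={Ne:3d}, p={p:2d}: CG dof={cg}, DG dof={dg}, dx_eff={dx:.1f} km")
```

All three configurations have the same effective resolution (\(N_e\,p = 100\)); only the distribution of points between elements, and hence the locality of the operators, changes.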

Fig. 3

EBGs are divided into two classes: continuous Galerkin (CG) methods, whose solutions are continuous with bounded weak derivative \((H^1)\), and discontinuous Galerkin (DG) methods, whose solutions are square integrable \((L^2)\), but not necessarily continuous. The resolution of both CG and DG methods may be characterized by the polynomial order \(p\) of their basis functions and the number of elements \(N_e\) utilized, or, equivalently, by the diameter \(h \propto 1/N_e\) of each element. CG: If low order basis functions \((p=1,2,3)\) are utilized with a large number of elements, we recover the classical Finite Element Method. For \(p \ge 3\) and a smaller number of elements, we have the spectral element method (SEM). As \(p\) is increased, \(N_e\) may be decreased. In the extreme case of a single element (on the sphere) and \(p \gg 1\), the ST method is recovered. If we are considering problems in Cartesian geometry, this extreme case is generally termed “spectral” or “pseudo-spectral”. DG: Since DG admits discontinuous solutions, a constant basis function \(p = 0\) is admissible, yielding the classical finite volume (FV) method. As \(p\) is increased and \(N_e\) decreases, we enter the arena of DG methods. As with CG, if a single element is utilized, the ST method is recovered. In this extreme case, the solution becomes continuous

1.5 Scalability of EBG Methods

In the following we report on some recent scalability results of EBG on different systems and for different numerical configurations. For a more theoretical discussion on Galerkin scalability, we refer to [142, 176].

1.5.1 Scalability for (Horizontally) Explicit Time Integration

In global atmospheric simulations the vertical resolution is usually much finer than the horizontal. This makes the stable time-step for vertically propagating processes much smaller than that for horizontal motion. For this reason, it is often more efficient to solve the fast processes in the vertical direction implicitly while using explicit time integration in the horizontal direction: the horizontally-explicit, vertically-implicit (HEVI) approach. If a 2D domain decomposition strategy is adopted where all the elements in a vertical column are kept on a single processor, HEVI does not incur any additional communication. A recent result for this strategy with the Nonhydrostatic Unified Model of the Atmosphere (NUMA) [117, 176] is shown in Fig. 4. This figure shows that NUMA achieves weak scaling up to 777,600 cores and strong scaling to about 40,000 cores; moreover, the last blue data point in this figure indicates that NUMA scales in this fashion to the limit of one horizontal element per core.

Fig. 4

Scalability study with the atmospheric model NUMA for the baroclinic instability test case [160] for three different horizontal resolutions of 25.0, 12.5 and 2.78 km (given in the legend). This scalability study was performed on the Blue Gene Mira of the Argonne National Lab. The number next to each data point shows the average number of elements per core. These simulations use a cubed sphere mesh generated by the function library p4est [42]. All simulations use six elements in the vertical direction with HEVI time integration and a fifth-order CG method

One important factor contributing to the excellent speedup shown in Fig. 4 is that the computation performed on each core takes significantly more runtime than the communication of data among neighboring cores. This balance becomes harder to maintain when fully explicit time integration is used and when the spatial discretization order is reduced (Fig. 5).

Fig. 5

Scalability study with NUMA using a 1D semi-implicit (HEVI) simulation of a 3D rising thermal bubble in a \(1\,\hbox {km}^3\) cubic domain for polynomial degrees 4 and 8 (see legend), using \(32^3\) elements. The average number of elements per core is given by the numbers next to each data point. This scalability study was performed on the Blue Gene Vesta of the Argonne National Lab

1.5.2 Scalability for Fully Implicit Time Integration

Scalability studies with the model Alya [142, 308] and fully implicit time integration on different machines are shown in Fig. 6.

Alya is an unstructured finite element code. The mesh partitioning therefore relies on the element graph, whose complexity depends on the geometry considered. Libraries such as ZOLTAN [28], SCOTCH [49], or METIS [174], which are based on graph partitioning algorithms, may be used to decompose an EBG mesh. Just like NUMA, Alya does not require halos and the information exchange between neighbors is carried out on the interface nodes, that is, the nodes shared by different subdomains. From the parallelization point of view, the load balance and the communication scheduling for these two codes depend on the quality of the partition.

Several iterative solvers are available; the selection depends on the physical problem considered. The incompressible Navier–Stokes equations require the solution of the momentum equation and the pressure equation [140]. For the first algebraic system, the GMRES method with simple diagonal preconditioning is efficient in most cases, and few iterations are required to reach convergence. For the pressure equation, a deflated conjugate gradient method [203] is used together with linelet preconditioning [277], which is very efficient in the presence of boundary layers. The four scalability curves presented in Fig. 6 were obtained for the Navier–Stokes equations. The last one represents combustion in a kiln, which consists of solving the low Mach number Navier–Stokes equations together with a temperature equation and chemical reactions.
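As a generic illustration of the kind of solver configuration just described, the sketch below assembles a small sparse system and solves it with SciPy's GMRES under a diagonal (Jacobi) preconditioner. This is a stand-in for the momentum solve only, not Alya's actual solver stack; the deflated conjugate gradient and linelet preconditioner used for the pressure equation are not reproduced here.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Diagonally dominant tridiagonal system standing in for a momentum solve.
n = 200
A = sp.diags([-1.0, 4.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Diagonal (Jacobi) preconditioner: approximate A^{-1} by 1/diag(A).
M = spla.LinearOperator((n, n), matvec=lambda x: x / A.diagonal())

x, info = spla.gmres(A, b, M=M)
print("converged" if info == 0 else f"info={info}",
      "| residual:", np.linalg.norm(b - A @ x))
```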

Fig. 6

Scalability study for a fully implicit simulation using Alya [142, 308] on four different HPC architectures

1.6 Plan of the Paper

The rest of the review is organized as follows: in Sect. 2 we give an overview of the different equation sets used in the dynamical cores of atmospheric models. Element-based Galerkin methods within the context of NWP are introduced in Sect. 3. Since EBG methods may produce unphysical extrema (especially high-order EBGs), stabilization/filtering is often required; a review of some stabilization methods follows in Sect. 4. Section 5 explores accurate grid generation for high resolution simulations (e.g. well resolved topography), along with static and dynamic grid adaptivity. A summary is given in Sect. 6.

2 Equation Sets for Atmospheric Modeling

For typical atmospheric scales (1 m to 1000+ km), the earth’s atmosphere can be treated as a continuum governed by the compressible Navier–Stokes equations with body forces to model the effects of gravity and the Earth’s rotation (i.e. Coriolis force). Although the gravitational force varies with both altitude and latitude, these minor perturbations are generally neglected. In this section, we neglect the effects of moisture, solar radiation, and heat flux from the ground and consider the dry dynamics of the atmosphere. Let \(\varOmega \) be a three-dimensional domain in a rotating reference system \({\mathbf{x}}\) and let \(t \ge 0\) be time. The state of dry, stratified air can be described by density, \(\rho \), pressure, \(p\), absolute temperature, \(T\), and velocity field, \({\mathbf{u}}\),

$$\begin{aligned}&\frac{\partial \rho {\mathbf{u}}}{\partial t} + \nabla \cdot \left( \rho {\mathbf{u}}\otimes {\mathbf{u}}\right) + \nabla p - \nabla \cdot {\varvec{\sigma }}= -2\rho \left( {\varvec{\omega }}\times {\mathbf{u}} \right) - \rho \mathbf{g}, \end{aligned}$$
(1a)
$$\begin{aligned}&\frac{\partial \rho }{\partial t} + \nabla \cdot \left( \rho {\mathbf{u}} \right) = 0, \end{aligned}$$
(1b)
$$\begin{aligned}&\frac{\partial E}{\partial t} + \nabla \cdot \left( (E + p){\mathbf{u}} \right) - \nabla \cdot \left( \frac{\mu c_{p}}{{\Pr }}\nabla T + {\mathbf{u}}\cdot {\varvec{\sigma }}\right) = 0, \end{aligned}$$
(1c)

where \({\varvec{\omega }}\) is the rotational velocity of the Earth, \({\varvec{\sigma }}\) is the viscous stress tensor, \({\mathbf{g}}\) is the sum of true gravity and the centrifugal force, and the total energy, \(E\), is given by

$$\begin{aligned} E = \rho c_{v} T + \frac{1}{2} \rho {\mathbf{u}}\cdot {\mathbf{u}} + \rho g r, \end{aligned}$$
(2)

where \(r\) is the radial distance from a fixed reference point at the center of the earth. Equation (2) consists of three components: internal energy, kinetic energy, and gravitational potential energy. For a Newtonian fluid with dynamic viscosity \(\mu \), the viscous stress tensor is given by

$$\begin{aligned} {\varvec{\sigma }} = \mu \left[ \left( \nabla {\mathbf{u}} + (\nabla {\mathbf{u}})^{\textsc {t}} \right) - \frac{2}{3}(\nabla \cdot {\mathbf{u}} )\mathbf{I}\right], \end{aligned}$$
(3)

where \(2/3\) is a constant derived from the Stokes hypothesis and \({\textsc {t}}\) is the vector transpose [232]. The system (1) of five conservation laws in six unknowns is closed by the equation of state (ideal gas law) for pressure:

$$\begin{aligned} p = \frac{R}{c_v}\left( E - \frac{1}{2}\rho {\mathbf{u}}\cdot {\mathbf{u}} - \rho g r \right) . \end{aligned}$$
(4)

We note that Eq. (3) does not incorporate any effects of turbulent dissipation. Since the Kolmogorov length scale of a typical atmospheric problem is on the order of 0.1 mm, direct numerical simulation (DNS) of atmospheric motion is not possible with current computational resources. To properly account for unresolved turbulent motion (e.g. turbulent dissipation), a subgrid-scale model or turbulence closure scheme must be included. To simplify the treatment of the most commonly used sets of equations and of the numerical methods discussed below, we will neglect viscosity and restrict our analysis to the Euler equations \((\mu = 0)\) and the various approximations utilized in atmospheric modeling; however, we will revisit viscous effects in the context of stabilization in Sect. 4.

Atmospheric models can be broadly classified into three groups: (1) non-hydrostatic models based on the compressible Euler equations, (2) hydrostatic models, which assume a vertical momentum balance between gravity and the vertical pressure gradient but include the vertical stratification of the atmosphere, and (3) sound-proof models. We also mention the shallow water model, which neglects all vertical motion by assuming each column of air moves as a rigid body, as shallow water models are often developed to test the horizontal propagation of features by numerical methods before they are applied to the solution of the equations for a full atmosphere.

The set of governing equations constitutes the dynamical core of the model. In the following sections, we broadly survey the equation sets commonly used in existing operational and research atmospheric models, beginning with non-hydrostatic models and ending with shallow water models. For a discussion of the interplay between the choice of equation set and numerical challenges encountered, consult [299]. For an analysis of the differences between non-hydrostatic, hydrostatic, shallow atmosphere (note, not to be confused with the shallow water model) and deep atmosphere approximations, see [317].

2.1 Non-hydrostatic Models

The fully compressible Euler and Navier–Stokes equations model all the scales and motions of the atmosphere. In NWP the equations expressed in the form of (1) are very often algebraically manipulated via the introduction of derived physical variables to help the physical interpretation of the atmosphere. For example, let us introduce potential temperature, \(\theta \), which is the temperature that an air parcel would have if it were expanded or compressed adiabatically to a standard pressure \(p_0 = 1000\,\hbox {hPa}\) [138]. Potential temperature is related to \(p\) and \(T\) via the expression \(\theta =T/\pi \), where

$$\begin{aligned} \pi =\left( p/p_0\right) ^{R/c_{p}} \end{aligned}$$
(5)

is a normalized pressure (known as Exner pressure) with respect to a reference pressure \(p_0\). Given \(\theta \), the following conservation laws for \((\rho , {\mathbf{u}}, \theta )^{\textsc {T}}\) are obtained by transforming Eq. (1c):

$$\begin{aligned}&\frac{\partial \rho {\mathbf{u}}}{\partial t} + \nabla \cdot \left( \rho {\mathbf{u}}\otimes {\mathbf{u}}\right) + \nabla p = - \rho {\mathbf{g}} - 2\rho \left( {\varvec{\omega }}\times {\mathbf{u}} \right) , \end{aligned}$$
(6a)
$$\begin{aligned}&\frac{\partial \rho }{\partial t} + \nabla \cdot \left( \rho {\mathbf{u}}\right) = 0, \end{aligned}$$
(6b)
$$\begin{aligned}&\frac{\partial \rho \theta }{\partial t} + \nabla \cdot \left( \rho \theta {\mathbf{u}}\right) = 0. \end{aligned}$$
(6c)

The equation of state for pressure (4)

$$\begin{aligned} p = p_{0} \left( \frac{\rho R \theta }{p_{0}} \right) ^{c_{p}/c_{v}} \end{aligned}$$
(7)

completes the system. Numerical methods for the solution of this system can easily be constructed to conserve mass and momentum; it is, however, much more difficult to formulate numerical schemes that also conserve energy. For an adiabatic and reversible system, entropy is conserved. Entropy \(s\) may be related to potential temperature \(\theta \) via the relation

$$\begin{aligned} s = c_{p}\ln {\theta } + \hbox {constant}, \end{aligned}$$

thereby justifying the use of \(\theta \) rather than \(E\).
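For concreteness, the conversions among \(T\), \(p\), \(\theta \), and \(\pi \) implied by Eqs. (5) and (7) take only a few lines of code; the constants below are standard dry-air values.

```python
# Conversions between temperature, pressure, potential temperature, and
# Exner pressure for dry air, following Eqs. (5) and (7).
R = 287.0      # gas constant of dry air [J kg^-1 K^-1]
cp = 1004.0    # specific heat at constant pressure [J kg^-1 K^-1]
cv = cp - R    # specific heat at constant volume [J kg^-1 K^-1]
p0 = 1.0e5     # reference pressure, 1000 hPa [Pa]

def exner(p):
    return (p / p0) ** (R / cp)          # Eq. (5)

def theta_from_T(T, p):
    return T / exner(p)                  # theta = T / pi

def p_from_rho_theta(rho, theta):
    return p0 * (rho * R * theta / p0) ** (cp / cv)   # Eq. (7)

# An air parcel at 500 hPa and 250 K; Eq. (7) recovers its pressure exactly.
T, p = 250.0, 5.0e4
theta = theta_from_T(T, p)
rho = p / (R * T)  # ideal gas law
print(f"theta = {theta:.1f} K, recovered p = {p_from_rho_theta(rho, theta):.0f} Pa")
```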

The ARW-WRF model [271] is based on this set, and so are the finite volume model described in [2], the Met Office ENDGame [322], and the German LM model [62]. The Nonhydrostatic Unified Model of the Atmosphere (NUMA) [117, 176] developed at the Naval Postgraduate School is designed around two different sets, including (6). NUMA is the underlying dynamical core of the next generation NWP model of the U.S. Navy, NEPTUNE.

Constructing the divergence of flux in Eq. (6) requires some additional computational overhead; this overhead may be reduced by converting Eqs. (6) to their advection form:

$$\begin{aligned}&\frac{\partial {\mathbf{u}}}{\partial t} + {\mathbf{u}}\cdot \nabla {\mathbf{u}} + \frac{1}{\rho }\nabla p = -{\mathbf{g}} - 2 {\varvec{\omega }}\times {\mathbf{u}}, \end{aligned}$$
(8a)
$$\begin{aligned}&\frac{\partial \rho }{\partial t} + \nabla \cdot \rho {\mathbf{u}}=0, \end{aligned}$$
(8b)
$$\begin{aligned}&\frac{\partial \theta }{\partial t} + {\mathbf{u}}\cdot \nabla \theta = 0, \end{aligned}$$
(8c)

again, completed by an equation of state given by Eq. (7). Numerical approximations to this set of equations can be constructed to conserve mass, although conservation of momentum and energy is more difficult to obtain. NUMA is designed to be able to handle this set as well, although the flux form (6) is the required formulation when NUMA is executed in the discontinuous Galerkin mode.

By combining the definition of the Exner pressure [Eq. (5)] with the continuity equation in Eqs. (8), we obtain:

$$\begin{aligned}&\frac{\partial {\mathbf{u}}}{\partial t} + {{\mathbf{u}}\cdot {\nabla {\mathbf{u}}}} + c_{p}{\theta \nabla \pi } = -{\mathbf{g}} - 2 {\varvec{\omega }}\times {\mathbf{u}}, \end{aligned}$$
(9a)
$$\begin{aligned}&\frac{\partial \pi }{\partial t} +{{\mathbf{u}}\cdot {\nabla \pi }} - \frac{R}{c_{v}}{\pi \nabla \cdot {\mathbf{u}}} = 0, \end{aligned}$$
(9b)
$$\begin{aligned}&\frac{\partial \theta }{\partial t} + {{\mathbf{u}}\cdot {\nabla \theta }} = 0, \end{aligned}$$
(9c)

where \((\pi , {\mathbf {u}}, {\theta })^{{\fancyscript{T}}}\) is the vector of the solution variables [71, 85]. Practitioners who use this set justify it by saying that it is self-contained because no equation of state is needed. While it is true that no equation of state is directly necessary, we point out that the algebraic computation of \(p\) from an equation of state similar to (4) is here simply replaced by the diagnosis of \(p\) and \(\rho \) from \(\pi \) and \(\theta \), an operation that is still necessary when it comes to the analysis of the forecast and that contributes an operation count comparable to that of Eq. (4). Equations (9) do not conserve mass, momentum, or energy; yet, they are widely used in operational NWP models such as MM5 developed at Penn State and NCAR [83], NMM based on the work by Janjic [166] at NCEP, COAMPS [135] from the U.S. Naval Research Laboratory (NRL), and HIRLAM [256, 257] by a consortium of European numerical weather services.

2.1.1 Sound Waves: Anelastic Models and Implicit Time Integration

All of the non-hydrostatic equation sets described in the previous section are compressible; therefore, they all contain sound waves which propagate at a very high speed (approximately 300 m/s) relative to meteorologically relevant phenomena. If these equations are discretized explicitly, a small time-step must be utilized in order to satisfy the stability criterion based on the Courant–Friedrichs–Lewy (CFL) condition [67], thereby increasing the computational cost of the model. Since the vertical grid-spacing is typically much smaller than the horizontal grid spacing, the vertically propagating sound waves are the most problematic aspect in these equation sets. To bypass the small time-step requirement of the models that support sound-waves, yet preserve the remaining dynamics, the anelastic model was introduced in 1953 by Batchelor [15] and later analyzed in [10, 202, 233], where the continuity equation in Eqs. (6) and (8) is replaced by

$$\begin{aligned} \nabla \cdot ({\overline{\rho }}(z){\mathbf{u}})=0. \end{aligned}$$
(10)

In (10), the background density \({\overline{\rho }}\) is a function of height only. An improved soundproof approximation is the pseudo-incompressible model proposed by Durran [84, 86], where the time dependence of density is accounted for, although density is a function of a time-invariant reference state pressure and a time-dependent potential temperature. All these models filter sound waves from the original compressible Euler/Navier–Stokes equations, but still account for the most important waves (e.g. Rossby waves) in the solution of the atmospheric motion. The interested reader may consult the review [179] for more on the validity of these approximations. A step towards the blending of soundproof and compressible Euler equations was recently investigated in [20].

The soundproof approximation of the governing equations is one way to achieve the necessary filtering of sound waves. The fully compressible, non-hydrostatic equations can, alternatively, be approximated in time via a semi-implicit scheme as done in [71, 291, 292]. Because the fast waves are treated implicitly in a semi-implicit approximation, the time step is only limited by the non-linear advective part of the equation; hence, the time-step is limited by the advective CFL condition \(\varDelta t \le C \varDelta x / ||{\mathbf{u}}||\), which is far less stringent than the acoustic CFL condition \(\varDelta t \le C \varDelta x / (||{\mathbf{u}}|| + c_s)\), where \(C\) is a constant of order one and \(c_s\) is the speed of sound.
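With typical illustrative values (our own numbers, not from the references above), the gain is easy to quantify:

```python
# Advective vs. acoustic CFL time-step limits for illustrative values.
C, dx = 1.0, 1000.0   # Courant number and grid spacing [m]
u, cs = 30.0, 340.0   # wind speed and speed of sound [m/s]

dt_advective = C * dx / u          # limit for a semi-implicit scheme
dt_acoustic = C * dx / (u + cs)    # limit for a fully explicit scheme
print(f"dt_advective = {dt_advective:.1f} s, dt_acoustic = {dt_acoustic:.1f} s, "
      f"ratio = {dt_advective / dt_acoustic:.1f}")
```

For these values the semi-implicit time-step is roughly twelve times larger than the acoustic one.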

Semi-implicit methods are closely related to implicit–explicit (IMEX) methods [189]. In most cases, semi-implicit refers to the combination of an explicit leap-frog scheme with an implicit Crank–Nicolson scheme, whereas IMEX can be viewed as a generalization that allows for different time-differencing schemes, as done in 2009 by Restelli and Giraldo [249] to solve the fully compressible Navier–Stokes equations of nonhydrostatic stratified flows approximated in space by DG. Moreover, the IMEX + DG approach of Restelli and Giraldo is a general method applicable to different Mach regimes for viscous and inviscid flows. In 2004, Dolejší and Feistauer [76] coupled DG with an implicit–explicit time discretization scheme to solve the Euler equations of fully compressible flows. In that paper, the numerical flux term was first discretized in a fully implicit manner; then, the implicit numerical flux was linearized via a Taylor expansion, resulting in a linear system of equations solved via a sparse iterative solver, as opposed to the more expensive non-linear solver (e.g. Newton–Krylov) required by a fully implicit discretization. More recent work on IMEX methods includes [87], which utilizes Adams and backward difference methods, and [314], which takes a horizontally-explicit vertically-implicit (HEVI) approach.

In 2013, an IMEX version of the Nonhydrostatic Unified Model of the Atmosphere NUMA was introduced in [117]. Both a 3D IMEX scheme which discretized the horizontal and vertical (linear) operators implicitly, and a 1D IMEX which only discretized the vertical operators implicitly (HEVI), were derived and compared using both second-order backward difference formulas and higher-order (up to order 4) implicit Runge–Kutta methods.

As mentioned earlier, 3D-IMEX methods require the solution of a linear system of equations. This linear system may be poorly conditioned (especially at large Courant numbers) and hence computationally expensive to solve. An alternative method that does not require a linear solve is the split-explicit method [293]. The split-explicit approach relies on sub-time stepping to treat the terms that represent sound and gravity waves within one larger explicit time-step for the remaining terms. This method is common in atmospheric simulations, in spite of its low accuracy [318, 319] and potential instabilities.
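To fix ideas, here is a minimal first-order IMEX step (our own sketch, not the implementation of any of the models cited above) for the model problem \(du/dt = Lu + N(u)\): the stiff linear term \(L\), standing in for the fast acoustic modes, is treated with backward Euler, while the slow nonlinear term \(N\) is treated with forward Euler.

```python
import numpy as np

# First-order IMEX (forward/backward Euler) step for du/dt = L u + N(u):
# solve (I - dt*L) u_new = u + dt*N(u).
def imex_step(u, dt, L, N):
    I = np.eye(len(u))
    return np.linalg.solve(I - dt * L, u + dt * N(u))

# Model problem: fast linear oscillation plus a mild nonlinear damping.
L = np.array([[0.0, 100.0], [-100.0, 0.0]])   # stiff "acoustic" term
N = lambda u: -0.1 * u * np.abs(u)            # slow nonlinear term

u, dt = np.array([1.0, 0.0]), 0.05  # dt far exceeds the explicit limit
for _ in range(100):
    u = imex_step(u, dt, L, N)
print("solution remains bounded after 100 steps:", u)
```

A fully explicit scheme at this time-step would be unstable for the linear term alone; the implicit treatment removes that restriction at the cost of a linear solve per step.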

2.1.2 Nearly-Hydrostatic Flows

Dynamics in the atmosphere are characterized by small variations of thermodynamic quantities with respect to some background state [178, 207]:

$$\begin{aligned}&\rho ({\mathbf{x}},t) = \rho '({\mathbf{x}},t) + \bar{\rho }(z) \end{aligned}$$
(11a)
$$\begin{aligned}&p({\mathbf{x}},t) = p'({\mathbf{x}},t) + \bar{p}(z) \end{aligned}$$
(11b)
$$\begin{aligned}&\varTheta ({\mathbf{x}},t) = \varTheta '({\mathbf{x}},t) + \bar{\varTheta }(z) \end{aligned}$$
(11c)

where the primed and barred quantities represent, respectively, the perturbation and the background state of \(\rho \), \(p\), and \(\varTheta \). In Eq. (11c), \(\varTheta = \rho \theta \). In typical atmospheric simulations, \(\rho '\ll \bar{\rho }\), \(p'\ll \bar{p}\) and \(\varTheta '\ll \bar{\varTheta }\). If the vertical acceleration is zero, the vertical component of the momentum equation reduces to the hydrostatic balance given by the following equation:

$$\begin{aligned} \frac{\partial p}{\partial z} = -g\rho . \end{aligned}$$
(12)

Given these considerations and the analysis of nearly-hydrostatic flows for well-balanced methods [30], the system (6) is transformed in terms of perturbation variables where the Coriolis term is neglected. Substituting Eq. (11) into Eq. (6) and applying Eq. (12) to the \(z\)-component yields

$$\begin{aligned}&\frac{\partial \rho {\mathbf{u}}}{\partial t} + \nabla \cdot \left( \rho {\mathbf{u}}\otimes {\mathbf{u}}\right) + \nabla p' = - \rho ' {\mathbf{g}}, \end{aligned}$$
(13a)
$$\begin{aligned}&\frac{\partial \rho '}{\partial t} + \nabla \cdot \left( \rho {\mathbf{u}}\right) = 0, \end{aligned}$$
(13b)
$$\begin{aligned}&\frac{\partial (\rho \theta )'}{\partial t} + \nabla \cdot \left( \rho \theta {\mathbf{u}}\right) = 0. \end{aligned}$$
(13c)

Throughout this review, the primes will be mostly omitted to simplify notation.
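A hydrostatically balanced background state of the type appearing in Eq. (11) is easy to construct explicitly. Assuming, for instance, an isothermal atmosphere at temperature \(T_0\) (one common idealized choice among several), Eq. (12) combined with the ideal gas law integrates to an exponential profile:

```python
import numpy as np

# Isothermal hydrostatic background: dp/dz = -g*rho with p = rho*R*T0
# integrates to p(z) = p0 * exp(-g*z / (R*T0)).
R, g, T0, p0 = 287.0, 9.81, 300.0, 1.0e5

def background(z):
    pbar = p0 * np.exp(-g * z / (R * T0))
    return pbar, pbar / (R * T0)  # background pressure and density

for z in (0.0, 1000.0, 5000.0, 10000.0):
    p, rho = background(z)
    print(f"z = {z:7.0f} m: p = {p:8.0f} Pa, rho = {rho:.3f} kg m^-3")
```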

2.2 Hydrostatic Versus Non-hydrostatic Models

Atmospheric models can be distinguished as hydrostatic and non-hydrostatic. If we assume the vertical acceleration to be negligible, the vertical momentum equation of the hydrostatic system reduces to the diagnostic equilibrium equation (12). At every time-step, this time-independent equation is solved instead of the full equation for vertical momentum. Sound waves are eliminated in the vertical direction [85] but not in the horizontal direction. Because the size of the domain in the horizontal direction is typically much larger than the vertical depth of the atmosphere and the grid size along \(x\) and \(y\) may be orders of magnitude larger than the grid spacing along \(z\), a much larger time-step may be utilized.

The hydrostatic approximation has been a central approximation of NWP for the past four decades and is used in the hydrostatic primitive equations (HPE) discussed in the next section. This approximation is valid for horizontal grid spacings larger than 10 km [165, 299]. The hydrostatic approximation remains appropriate for synoptic scale phenomena, where the vertical acceleration can be neglected, but is no longer used in mesoscale simulation. With the availability of more powerful computers, the non-hydrostatic formulations described above are standard for mesoscale NWP. The reader should refer to [21, 29, 106, 118, 124, 135, 166, 167, 271, 323] for more on the evolution of non-hydrostatic models.

2.2.1 Hydrostatic Primitive Equations

The hydrostatic primitive equations (HPE) govern the dynamics in synoptic scale (e.g. global-scale) meteorology and are valid for horizontal resolution coarser than 10 km. The HPE are expressed in so-called \(\sigma \) coordinates which allow the boundary condition on the ground to be easily applied, even in the presence of complex orography. The HPE are derived by first expressing the compressible Euler equations in terms of pressure, velocity, and potential temperature. A hydrostatic balance is applied in the vertical direction, which removes vertical acceleration from the momentum equation. Since the HPE rarely appear outside of atmospheric and climate studies, we present a brief derivation from the compressible Euler equations. A more comprehensive treatment is found in [138].

We first apply a Coriolis term to the right hand side of the momentum equation [Eq. (8a)]. Decomposing the velocity \({\mathbf {u}}\) into horizontal \({\mathbf {u}}_H\) and vertical \(w\) components, the horizontal momentum balance is given by

$$\begin{aligned} \frac{D {\mathbf {u}}_H}{D t} = -\frac{1}{\rho } \nabla _H p - f {\mathbf {k}} \times {\mathbf {u}}_H \end{aligned}$$
(14)

where \(f = 2 \omega \sin \alpha \) is the Coriolis constant at latitude \(\alpha \) for angular rotation \(\omega \). Next, we transform Eq. (14) into isobaric coordinates \((x,y,p)\), where pressure is the vertical coordinate; this is a useful intermediate step on the path to \(\sigma \) coordinates. Introducing the geopotential \(\varPhi (x,y,p,t)\), it can be shown that \(\nabla _P \varPhi = \nabla p / \rho \), where the gradient is taken with respect to isobaric coordinates, yielding

$$\begin{aligned} \frac{D_P {\mathbf {u}}_H}{D_{P}t} = -\nabla _P \varPhi - f {\mathbf {k}} \times {\mathbf {u}}_H, \end{aligned}$$
(15)

where the total derivative in Eq. (15) is defined as

$$\begin{aligned} \frac{D_{P}}{D_{P} t} = \frac{\partial }{\partial t} + {\mathbf {u}}_H \cdot \nabla _H + \hat{\omega } \frac{\partial }{\partial p} \end{aligned}$$
(16)

and \(\hat{\omega } = Dp / Dt\). Next, we transform Eq. (15) into \(\sigma \) coordinates via \(\sigma = p / p_{s}\), where \(p_s = p_s(x,y,t)\) is the surface pressure. Note that in this coordinate system, the boundary condition on the ground is always \(\sigma = 1\). Application of the chain rule to the gradient of the velocity potential yields

$$\begin{aligned} \nabla _P \varPhi = \nabla _{\sigma } \varPhi - \sigma \nabla \ln p_s \frac{\partial \varPhi }{\partial \sigma } \end{aligned}$$
(17)

while the total derivative is given by

$$\begin{aligned} \frac{D_{\sigma }}{D_{\sigma } t} = \frac{\partial }{\partial t} + {\mathbf {u}}_H \cdot \nabla _H + \dot{\sigma } \frac{\partial }{\partial \sigma } \end{aligned}$$
(18)

Combining Eqs. (17) and (18) in Eq. (15) yields

$$\begin{aligned} \frac{\partial {\mathbf {u}}_{H}}{\partial t} + {\mathbf {u}}_H \cdot \nabla _{H} {\mathbf {u}}_{H} + \dot{\sigma } \frac{\partial {\mathbf {u}}_H}{\partial \sigma } = -\nabla _{\sigma } \varPhi + \sigma \nabla \ln p_s \frac{\partial \varPhi }{\partial \sigma } - f {\mathbf {k}} \times {\mathbf {u}}_H \end{aligned}$$
(19)

In a similar manner, an equation of continuity for the surface pressure \(p_s\) is derived from (6b)

$$\begin{aligned} \frac{\partial p_s}{\partial t} + \nabla _H \cdot \left( p_s {\mathbf {u}}_H \right) + p_s \frac{\partial \dot{\sigma }}{\partial \sigma } = 0 \end{aligned}$$
(20)

along with a transport equation for potential temperature \(\theta \) from (6c)

$$\begin{aligned} \frac{\partial \theta }{\partial t} + {\mathbf {u}}_H \cdot \nabla _H \theta + \dot{\sigma } \frac{\partial \theta }{\partial \sigma } = 0 \end{aligned}$$
(21)

In each \(\sigma \) level, we solve for the prognostic variables \({\mathbf {q}} = \left( p_s, {\mathbf {u}}_H, \theta \right) ^{\textsc {T}}\), while the diagnostic variables are the vertical velocity \(\dot{\sigma }\), the pressure \(p\), and the geopotential \(\varPhi \). Because these equations are in exact hydrostatic balance, there are no vertically propagating acoustic or gravity waves. By computing the eigenvalues of the HPE, one can show that the fastest moving waves are horizontally propagating gravity waves [114]. Hence, even with an explicit time integrator, a much larger time-step may be used with the HPE than with the compressible Euler equations. For this reason, the HPE form the basis of most global atmospheric and climate models.

2.3 Shallow Water Equations (SWE)

The hydrostatic primitive equations require a solution at \(N\) model levels (independent \(\sigma \) or pressure levels), which requires significant computational effort. The HPE may be simplified even further to remove all vertical dependence. One approach is to expand each prognostic variable in Eq. (20) in a 1D Fourier series with height \(\sigma \) as the argument and retain only the zeroth term in this series, commonly called the barotropic mode. Another approach is to start with the full compressible Euler equations and apply both the hydrostatic approximation given by Eq. (12) and the shallow water approximation, in which the deviation of the geopotential height \(\varPhi \) from a given reference geopotential \(\varPhi _0\) is small. From an ocean modeling point of view, this assumption is equivalent to assuming that the water depth is small compared to the wavelength of the waves of interest (gravity waves and Coriolis induced Rossby waves). In flux form, the SWE of a viscous atmosphere of depth \(h\) on a rotating sphere of radius \(r\) are:

$$\begin{aligned}&\frac{\partial \varPhi \mathbf u}{\partial t} + \nabla \cdot \left( \varPhi {\mathbf{u}}\otimes {\mathbf{u}}\right) = -\varPhi \nabla \varPhi - f\left( {\mathbf{x}}\times \varPhi {\mathbf{u}}\right) + \mu {\mathbf{x}} + \nu \nabla ^2(\varPhi {\mathbf{u}}), \end{aligned}$$
(22a)
$$\begin{aligned}&\frac{\partial \varPhi }{\partial t} + \nabla \cdot \left( \varPhi {\mathbf{u}} \right) = 0. \end{aligned}$$
(22b)

Equation (22) may be expressed in Cartesian coordinates instead of spherical coordinates by applying a fictitious force \(\mu {\mathbf{x}}\), where \(\mu \) is a Lagrange multiplier; this approach, which facilitates an arbitrary spherical grid, was first proposed by Côté [63] for the semi-Lagrangian solution of the problem and later used in [110, 112] for the solution of the full nonlinear equations. The numerical solution of the SWE on spherical geometries is reported by many authors, such as [242] (FEM), [112, 158, 204, 294] (SEM), [116, 227, 303] (DG), [211] (unified CG/DG on different unstructured grids with static and dynamic adaptivity), [191, 196, 309, 316] (FV), [278] (comparison between SEM and FV), and [245] (comparison using different numerical methods).

2.4 Transport in the Atmosphere

In a typical atmospheric model, there are multiple forms of water (e.g. vapor, rain, ice); in a climate model, there are also hundreds of chemical species. These quantities are transported and diffused by the atmospheric dynamics and are classified as tracers. In turn, these tracers actively feed back on the dynamics (e.g. latent heat release). To model these tracers, the governing equations of a dry flow must be coupled to a set of transport–diffusion equations for the tracers. For simplicity, we describe how tracers are treated in atmospheric models by looking at the transport of three water quantities only; however, the approach applies to an arbitrary number of tracers.

Let us define the mixing ratios of water vapor, cloud water, and rain as \(q_{v}=\rho _{v}/\rho , q_{c}=\rho _{c}/\rho \) and \(q_{r}=\rho _{r}/\rho \), where \(\rho _{v,c,r}\) are the densities of water vapor, cloud, and rain. Let us also choose one of the nonhydrostatic equation sets described previously and write the coupled system of equations that model a moist atmosphere; we consider system (8) and write the following:

$$\begin{aligned}&\frac{\partial {\mathbf{u}}}{\partial t} + {\mathbf{u}}\cdot \nabla {\mathbf{u}} + \frac{1}{\rho }\nabla p = -{\mathbf{g}}(1 + \varepsilon q_v -q_c - q_r) - 2 {\varvec{\omega }}\times {\mathbf{u}} + S_{turb}, \end{aligned}$$
(23a)
$$\begin{aligned}&\frac{\partial \rho }{\partial t} + \nabla \cdot \left( \rho {\mathbf{u}}\right) =0, \end{aligned}$$
(23b)
$$\begin{aligned}&\frac{\partial \theta }{\partial t} + {\mathbf{u}}\cdot \nabla \theta = \nabla \cdot \left( \kappa _{\theta } \nabla \theta \right) + S_{\theta }(\rho ,\theta ,q_v,q_c,q_r), \end{aligned}$$
(23c)
$$\begin{aligned}&\frac{\partial q_{i}}{\partial t} + {\mathbf{u}}\cdot \nabla q_i = \nabla \cdot \left( k_{q_i} \nabla q_i \right) + S_{q_i}(\rho ,\theta ,q_{v},q_{c},q_{r}),\quad \hbox {for}\;i=v,c,r, \end{aligned}$$
(23d)

where \(\varepsilon =R/R_v\) is the ratio of the gas constants of dry air, \(R\), and of water vapor, \(R_v\). Moist air contributes to the buoyancy of the flow, so that the right hand side of the momentum equation must be corrected by the total buoyancy \(\mathbf{B}=-{\mathbf{g}}(1 + \varepsilon q_{v} - q_{c} - q_{r})\). The diffusion coefficients \(k_{\theta }\) and \(k_{q_{i}}\) are typically modeled via the algebraic turbulence closure

$$\begin{aligned} k_{\theta } = \nu / \text{Pr}_0 + \nu _t / \text{Pr}_t \end{aligned}$$
(24a)
$$\begin{aligned} k_{q_i} = \nu / Sc_0 + \nu _t / Sc_t \end{aligned}$$
(24b)

while the closure term \(S_{turb}\) depends on the turbulence model employed. In Eq. (24), \(\nu \) is the molecular viscosity, \(\nu _t\) the eddy viscosity, \(\text{Pr}_0\) and \(\text{Pr}_t\) the molecular and turbulent Prandtl numbers, and \(Sc_0\) and \(Sc_t\) the molecular and turbulent Schmidt numbers. Typical values are \(Sc_0 = Sc_t = 0.7\). The microphysical processes that involve phase change in the water content are modeled by the source/sink terms, \(S_{\theta }\) and \(S_{q_i}\), in the equations. For example, in the case of water vapor, \(S_{q_v}\) is driven by evaporation and condensation. These terms can be modeled and computed by a properly designed microphysics scheme, such as the Kessler [177] scheme for warm clouds (no ice involved).

The appropriate numerical discretization of Eq. (23d) is still an active topic of research, especially since moisture possesses large gradients that can cause instabilities. In addition, since the mass fractions \(q_i\) are a priori non-negative, the numerical discretization should be monotonic or, at the very least, positivity-preserving. If, for example, our system produced negative moisture, the physical parameterization would have to resolve this issue in some way (e.g. by clipping the negative values); in addition, the resulting incorrect feedback may pollute the overall solution and cause the model to produce artificial rain. The words of John P. Boyd are an amusing conclusion to this paragraph: “[...] Clever adaptive algorithms that work for smooth, straight shocks disintegrate into computational anarchy when flayed by gravity waves, assaulted by moist convective instability, battered by highly temperature-sensitive photochemistry, and coupled to the vastly different time and space scales of the ocean[...]” (SIAM News, Multiscale Numerical Algorithms for Weather Forecasting and Climate Modeling: Challenges and Controversies. Nov 2008, Vol. 41 issue 9). Monotonic solutions are certainly more difficult to achieve with high order numerical methods. The problem is particularly challenging when the transport equation is solved by high order methods such as spectral elements or DG. High order methods produce Gibbs oscillations near sharp gradients; these oscillations are unphysical and are exacerbated as the order increases. Hence, limiters [187] or adaptive filtering are necessary. We will address this problem in Sect. 4, along with some issues involved with unstable Galerkin solutions.

2.4.1 Cloud Microphysics: Kessler Parameterization

Cloud microphysics includes all the thermo-physical processes at the scale of the particles that form a cloud. Examples are the phase changes of water or the agglomeration of particles into larger ones. Most physical processes typical of storm dynamics (e.g., precipitation, freezing, deposition, or sublimation) span such a large range of spatial and temporal scales that direct numerical simulation is unfeasible (see [88], Ch. 10). For this reason, parameterization is commonly used within numerical models. Microphysical parameterization relies on the physical knowledge of certain processes without the need to fully resolve all the microscale processes involved. The clear limitation is that certain phenomena cannot be represented with high accuracy if they lie outside the conditions assumed by the parameterization. A simple representation of cloud microphysics was designed by Kessler and reported in his monograph [177].

Kessler’s is a bulk model, meaning that water species are categorized only with respect to the particles’ type. In other words, if we speak about cloud water, we model it through one equation that represents the transport of cloud water concentration with water droplets of one single size. Bulk models are contrasted with explicit models, where, within each category (e.g., cloud, rain), the size distribution of the water particles is considered as well. Explicit models are certainly more physically accurate, but they are more costly due to the greater number of quantities that must be tracked. For more information on the topic, refer to Houze’s book [139] and to more recent literature (e.g. [39]).

Kessler’s is a simple scheme whose main assumption is that ice is not included (warm rain). The main limitation of the warm condition is that only moist convection in the tropics, or at mid-latitudes in the warm season, can be represented. The three forms of water that are considered are: (1) water vapor; (2) cloud water (liquid water whose particles are so small that their terminal fall speed is negligible); and (3) precipitating water that only includes rain (namely, drops whose diameter exceeds 0.5 mm). Drizzle (rain of drop diameter between 0.2 and 0.5 mm) is excluded.

The main processes resolved by a warm cloud microphysics scheme are briefly described below. These processes dictate how the source terms of the previous equations are defined and how they affect the dynamics of the simulation. The reader is referred to, e.g., [139] and references therein for a more thorough analysis.

Given the approximate Tetens formula [27] for the saturation vapor pressure (with \(T\) in degrees Celsius and \(e^*\) in Pa),

$$\begin{aligned} e^* = 611.2 \exp \left( \frac{17.67 T}{T + 243.5}\right) , \end{aligned}$$

the saturation mixing ratio is given by

$$\begin{aligned} q_{vs} = \frac{\varepsilon e^*}{p - e^*}. \end{aligned}$$
(25)
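In code, the two saturation quantities above amount to the following (a direct transcription of the formulas, with \(T\) in degrees Celsius and pressures in Pa):

```python
import math

EPS = 0.622  # ratio R/R_v of the dry-air and water-vapor gas constants

def saturation_vapor_pressure(T_c):
    """Tetens-type fit for saturation vapor pressure [Pa], T_c in deg C."""
    return 611.2 * math.exp(17.67 * T_c / (T_c + 243.5))

def saturation_mixing_ratio(p, T_c):
    """Eq. (25): q_vs from total pressure p [Pa] and temperature."""
    e_star = saturation_vapor_pressure(T_c)
    return EPS * e_star / (p - e_star)

# Warm, moist near-surface air: about 20 g/kg at 1000 hPa and 25 C.
print(f"{saturation_mixing_ratio(1.0e5, 25.0) * 1e3:.1f} g/kg")
```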

From [181], the source terms in (23) are

$$\begin{aligned} S_{\theta } = -\frac{L_{v}}{c_{p} T}\left( \dot{q}_{vs} + E_{r} \right) , \end{aligned}$$
(26a)
$$\begin{aligned} S_{q_{v}} = \dot{q}_{vs} + E_{r}, \end{aligned}$$
(26b)
$$\begin{aligned} S_{q_{c}} = -\dot{q}_{vs} - A_{r} - C_{r}, \end{aligned}$$
(26c)
$$\begin{aligned} S_{q_{r}} = \frac{1}{\rho }\frac{\partial }{\partial z}(\rho V_{r} q_{r}) - E_r + A_{r} + C_{r}, \end{aligned}$$
(26d)

where \(c_{pl}\) and \(c_{pv}\) are the heat coefficients at constant pressure of liquid water and water vapor, respectively, \(L_v = L_{v0} - (c_{pl} - c_{pv})(T - T_0)\) is the latent heat of vaporization with reference value \(L_{v0} = 2.5\times 10^{6}\,\hbox{J}\,\hbox{kg}^{-1}\), \(T_0\) is a reference temperature, \(V_r\) is the terminal fall speed of raindrops (taken positive in the downward direction), and \(\dot{q}_{vs}\) is the rate of condensation or evaporation (the dot indicates differentiation with respect to time). \(A_r\), \(C_r\), and \(E_r\) are the rates of autoconversion, collection, and evaporation of rain. They are computed using the formulas:

$$\begin{aligned} A_{r} & = {\texttt{MAX}}\left( 0, k_{1}(q_{c} - a_{T})\right) , \end{aligned}$$
(27a)
$$\begin{aligned} C_{r} & = k_{2}\rho ^{0.375}\,q_{c}\,q_{r}^{0.875}, \end{aligned}$$
(27b)
$$\begin{aligned} E_{r} & = -\frac{1}{\rho }\frac{(q_{v}/q_{vs} - 1)k(\rho \,q_{r})^{0.525}}{5.4\times 10^5 + 2.55\times 10^6(p\,q_{vs})}, \end{aligned}$$
(27c)

where \(k_1 = 0.001\,\hbox {s}^{-1}\), \(k_2 = 2.2\,\hbox {s}^{-1}\), and \(a_T = 0.001\,\hbox {kg}\,\hbox{kg}^{-1}\) are Kessler’s parameters, and \(k\) is the ventilation factor, a function of the terminal fall speed. Equation (27a) was derived by Kessler considering that cloud water is converted into rain water whenever \(q_c\) exceeds the threshold \(a_T\). Autoconversion is the rate at which the rain water content increases at the expense of cloud water due to the coalescence of smaller particles; this process, like collection, is not fully understood. As the name suggests, collection can be explained as cloud water particles being collected by the larger rainwater droplets that fall through the cloud layers. Evaporation occurs when the sensible heat flux from the environment into the water droplet is balanced by the latent heat of evaporation of the water particle. As in [276], the cloud droplets move at the same speed as the flow because their terminal velocity is considered negligible.
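
The following sketch transcribes the rates (27a)–(27c) directly. It is a minimal illustration, not an operational microphysics routine: the ventilation factor \(k\) is passed in as a plain number (its functional dependence on the terminal fall speed is not given here), and the unit conventions for \(p\) follow whatever the reference formulas assume.

```python
K1 = 0.001   # autoconversion rate k_1 [1/s]
K2 = 2.2     # collection rate k_2 [1/s]
A_T = 0.001  # autoconversion threshold a_T [kg/kg]

def kessler_rates(rho, p, q_v, q_c, q_r, q_vs, k_vent=1.0):
    """Autoconversion, collection, and evaporation rates, Eqs. (27a)-(27c)."""
    A_r = max(0.0, K1 * (q_c - A_T))              # Eq. (27a)
    C_r = K2 * rho**0.375 * q_c * q_r**0.875      # Eq. (27b)
    # Eq. (27c), transcribed as printed above; note that some references
    # write the last denominator term as 2.55e6 / (p * q_vs) instead.
    E_r = -(1.0 / rho) * (q_v / q_vs - 1.0) * k_vent \
          * (rho * q_r)**0.525 / (5.4e5 + 2.55e6 * (p * q_vs))
    return A_r, C_r, E_r
```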

The values of the constants in (27) are, to a certain extent, arbitrary [139]; however, observations broadly agree that \(k_1\), \(k_2\), and \(a_T\) depend non-linearly on \(q_c\) itself, as well as on temperature and on the distribution of the condensation nuclei. As pointed out in Emanuel’s book [88], the lack of understanding of the underlying physics is such that different, more sophisticated schemes yield different results. This topic is beyond the scope of the present article. Nevertheless, it is important to emphasize that microphysical parameterization plays a major role in forecasting clouds and precipitation, yet remains an active field of investigation (see the 2008 paper by Morrison and Grabowski [223]).

2.4.2 Method of Solution Via Saturation Adjustment

Regardless of the type of space approximation, phase changes are classically treated via the saturation adjustment technique explained in detail in the appendix of [276]. Saturation adjustment, a fractional-step method, is not the only option; however, due to its simplicity, it is convenient to describe it here to give a sense of how phase change is accounted for in these models.

The saturation adjustment technique consists of solving the problem in two steps. First, the prognostic equations are solved by neglecting all the terms that involve phase changes (all the \(S\)-terms are set to zero). This means that the dynamics and transport equations are advanced to an intermediate time-step \(n^*\), yielding the intermediate values of the prognostic variables, \((\rho , p, \theta _{\rho }, q_{vs}, q_v, q_c, q_r)^*\). These values are passed to the Kessler module to compute the \(S\)-quantities defined above. Once the computation of \(S\) is complete, the thermodynamic variables are updated and returned to the Euler/transport solver as the initial values for the next time step \(n+1\). A minimal sketch of this cycle follows.
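
In the sketch below, the callables `dynamics_step` and `kessler_adjust` are hypothetical placeholders standing in for the Euler/transport solver and the Kessler module, respectively.

```python
def advance_one_step(state, dt, dynamics_step, kessler_adjust):
    """One fractional-step (saturation adjustment) cycle."""
    state_star = dynamics_step(state, dt)   # step 1: dynamics only, S = 0
    return kessler_adjust(state_star)       # step 2: S-terms and update
```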

3 Element-Based Galerkin Methods: Finite Elements, Spectral Elements, and Nodal Discontinuous Galerkin

As discussed in the EBG roadmap, the finite element (FEM), spectral element (SEM), and discontinuous Galerkin (DG) methods are specific types of Galerkin approximation techniques. In this section, we introduce the ideas behind Galerkin schemes in general and then distinguish between FEM, SEM, and DG in particular. We then trace the history of EBG methods in NWP and climate modeling over the past 20 years.

3.1 Element-Based Continuous Galerkin Methods

The birth of Galerkin methods dates back to Boris Grigoryevich Galerkin and his work on the numerical solution of the equations of the elastic equilibrium of rods and plates [105], and to the original ideas of Walter Ritz [253] 6 years earlier. Popularized by Courant in the early 1940s for the study of vibration and equilibrium [66] and extensively developed only in the late 1950s and 1960s by structural dynamicists in the aircraft industry [4], finite element methods in particular are among the most common Galerkin-based numerical methods in use today across a wide range of applications. They are used in industry and for research purposes in, e.g., structural analysis [324], fluid dynamics [333], and electromagnetism [14]. Galerkin-based methods are a robust tool for the solution of a wide class of differential problems [79] and are accepted by scientists and engineers in theoretical studies and applications for several reasons, such as the ease of modeling complex geometries, the flexible and general-purpose programming structure that they imply, and the natural treatment of differential-type boundary conditions. In the following, we describe the idea behind the method of weighted residuals, of which the Galerkin finite element, spectral element, and discontinuous Galerkin methods represent special cases. For a simple but quasi-rigorous analysis of the method we use a problem of real engineering interest that is also fundamental in numerical weather prediction: the advection–diffusion equation. The reader is referred to the books by Fletcher [95] or by Karniadakis and Sherwin [173] for the more mathematical aspects of Galerkin methods, and to the lecture notes by Giraldo [115] for a unified treatment of high-order continuous and discontinuous Galerkin methods.

Let us take a general differential problem

$$\begin{aligned} {{\fancyscript{L}}}(q) = S, \end{aligned}$$
(28)

where \({{\fancyscript{L}}}\) is a combination of linear and non-linear differential operators in space \({\mathbf{x}}\) and time \(t\), and \(S\) is a source function. Let \(d\) indicate the space dimension and let \(\varOmega \subset {\mathbb {R}}^d\) be the domain with boundary \(\partial \varOmega \) on which (28) is defined within the time interval \((0,t_f)\), with \(t_f\in {\mathbb {R}}^+\). For the problem to be well-posed, suitable boundary and initial conditions must be added to (28). Unless otherwise stated, given a known function \(g\), Dirichlet boundary conditions \(q({\mathbf{x}})=g\) for \({\mathbf{x}}\,\in \,\partial \varOmega \) will be applied to the problems described throughout this section.

As previously stated, Galerkin methods are a particular case of the method of weighted residuals. The idea behind this method is the numerical representation of the solution variable \(q\) by a finite dimensional approximation \(q^h\) obtained by the expansion

$$\begin{aligned} q^{h}({\mathbf{x}}) = \sum _{k=0}^{N} \psi _{k}({\mathbf{x}})\,\hat{q}_{k}, \end{aligned}$$
(29)

where the \(N+1\) coefficients \(\hat{q}_k\) are associated with the nodes \(p_k\) of a partition of the domain \(\varOmega \). On its discrete counterpart, \(\varOmega ^h\), a set of known analytic test functions \(\psi _k\), \(k=0,\ldots ,N\), is defined (the two terms test and basis will be used interchangeably). The unknown coefficients \(\hat{q}_k\) correspond to the physical values of \(q\) at the nodes \(p_k\). The finite difference method is conceptually different in that what is approximated in the differential problem are the differential operators and not the solution variable. Substituting (29) into (28) leaves a non-zero residual, \({{\fancyscript{L}}}(q^h) - S \ne 0 \). The method is called the method of weighted residuals because a linear system of algebraic equations in the unknowns \(\hat{q}\) is built by imposing that

$$\begin{aligned} \int _{\varOmega } w\,R\,d\varOmega = 0, \end{aligned}$$
(30)

where \(R={{\fancyscript{L}}}(q^h) - S\) is the (non-zero) residual of (28) and \(w\) is a weight function with suitable properties. Different methods arise from the selection of different \(w\). The Bubnov–Galerkin method is found when \(w = \psi _k\). We can then write the following:

$$\begin{aligned} \int _{\varOmega } \psi \,\left[ {{\fancyscript{L}}}(q^h) - S\right] \,d\varOmega = 0. \end{aligned}$$
(31)

This is the weak form of the original equation to be solved.

Remark 3.1

So far, no distinction between the finite and spectral element methods has been made. The difference stems from the definition of the interpolation points used to construct \(\psi _k\).

3.1.1 Suitable Function Spaces

The choice of basis and test functions depends on the operator \({\fancyscript{L}}\) under consideration. In the specific cases of the advection–diffusion equation and the Navier–Stokes equations of compressible flows, the highest order of the derivatives is 2, and the choice of the basis functions and of the space to which they belong must reflect this regularity requirement.

We show that the weak solutions to a linear elliptic operator must reside in the Sobolev space \(H^1 ( \varOmega )\). Consider an operator \({\fancyscript{L}}\) defined on a global domain \(\varOmega \) with boundary \(\varGamma \) acting on a state vector \(q\); specifically, consider the elliptic operator

$$\begin{aligned} {\fancyscript{L}} \left( q \right) = \nabla \cdot \left( \nu \nabla q \right) \end{aligned}$$
(32)

where \(\nu > 0\) is the kinematic viscosity. We consider the boundary-value problem

$$\begin{aligned} {\fancyscript{L}} \left( q \right) = 0 \end{aligned}$$
(33)

with a Dirichlet boundary condition \(q({\mathbf {x}}) = q_0 ({\mathbf {x}})\) for all \({\mathbf {x}} \in \varGamma \) and \(q_0 ({\mathbf {x}}) \in C^1 \left( \varGamma \right) \). Consider a test function \(\psi \in L^2 \left( \varOmega \right) \) and also assume \(q \in L^2 \left( \varOmega \right) \). The following calculation demonstrates that \(q, \psi \in H^1 \subset L^2\).

Multiplying Eq. (33) by \(\psi \) and integrating by parts yields

$$\begin{aligned} \int _{\varOmega } \nu \nabla \psi \cdot \nabla q \, d\varOmega = \int _{\varGamma } \psi \, \nu \nabla q \cdot {\mathbf {n}} \, d\varGamma , \end{aligned}$$
(34)

where \({\mathbf {n}}\) is the outward unit normal on \(\varGamma \).

Since \(q_0 \in C^1\) and \(\psi \in L^2\), the right hand side of Eq. (34) is bounded, implying that the left hand side is bounded as well. We then write that:

$$\begin{aligned} \left| \int _{\varOmega } \nu \nabla \psi \cdot \nabla q \, d\varOmega \right|&\le \int _{\varOmega } \left| \nu \nabla \psi \cdot \nabla q \right| \, d\varOmega \\&\le \nu \Vert \nabla \psi \Vert _{L^2} \Vert \nabla q \Vert _{L^2} \nonumber \end{aligned}$$
(35)

where the second line follows from the Cauchy–Schwarz inequality. Hence, both the norms \(\Vert \nabla \psi \Vert _{L^2}\) and \(\Vert \nabla q \Vert _{L^2}\) are bounded, implying that \(\nabla \psi \) and \(\nabla q\) are square-integrable over the global domain \(\varOmega \). In other words, \(\psi , q \in H^1 \left( \varOmega \right) \).

With regards to CG methods, this elementary calculation illustrates two key points:

  1. The space of test functions must be a subset of \(H^1\).

  2. For piecewise polynomial approximations, membership in \(H^1\) requires \(C^0\) continuity across element interfaces; hence, the state vector \(q\) is necessarily continuous.

We hence define the space \(W\) of test functions \(\psi \) and trial solutions \(q\) as a subset of \(H^1\) such that

$$\begin{aligned} W \doteq \{ \psi ,q\,\in \,H^1(\varOmega ) \quad \hbox {s.t.}\quad \psi = 0\quad \hbox {and}\quad q = g\;\hbox {on}\;\partial \varOmega \}. \end{aligned}$$
(36)

3.2 Finite and Spectral Elements: Discretization and Basis Functions

To discretize the problem in a finite and spectral element sense, the domain \(\varOmega \) is first decomposed into a finite element partition \({\fancyscript{P}}^h =\{ K^{i} \}_{i=1,\ldots ,n_{el}}\) of \(n_{el}\) conforming elements \(K^i\) such that

$$\begin{aligned} \varOmega = \bigcup \limits _{i=1}^{n_{el}}\,K^{i},\quad \hbox {and} \quad K^{i} \cap K^{j} = \emptyset \quad \hbox {for}\; i\ne j, \end{aligned}$$
(37)

where every element \(K^{i}\) is the image of the reference element \({\varvec{I}}\) under a non-singular bijective mapping \({\mathbf{x}} = \fancyscript{H}^{i}(\varvec{\xi })\) from computational space \({\varvec{\xi }}\) to physical space \({\mathbf{x}}\). \({\mathbf{J}}=\hbox {d}{\mathbf{x}}/\hbox {d}{\varvec{\xi }}\) is the transformation Jacobian matrix. A two-dimensional example of the map is represented in Fig. 7.

Fig. 7
figure 7

Mapping from reference, \((\xi ,\eta )\), to physical space, \((x,z)\). \(\forall K^{i}\,\in \,{\fancyscript{P}}^h:\quad K^{i} = \fancyscript{H}^{i}(I)\)

The need for mapping is purely practical and forms the foundation of the finite element computation. For details see [157].

3.2.1 Basis Functions: Finite Elements

Lagrange basis functions are a common choice in finite elements since they interpolate a continuous function exactly at the nodes \(x_l\). These functions, denoted by \(h_k\) from now on, are piecewise continuous and are such that

$$\begin{aligned} h_k(x_l) = \delta _{kl}\quad k,l=0,\ldots ,N, \end{aligned}$$

where \(\delta _{kl}\) is the Kronecker delta.

For linear, quadratic, and cubic finite elements, the nodes of the basis functions along the reference element \({\varvec{I}}\) are the \(N + 1\) equi-spaced points within the element. Using the definition of the Lagrange polynomials

$$\begin{aligned} h_k(\xi ) = \prod \limits _{l=0,l\ne k}^N\frac{\xi - \xi _l}{\xi _k - \xi _l}, \end{aligned}$$
(38)

in Fig. 8 we plot \(h_k\) along a reference element up to 2nd-order. A 4th-order finite element and the corresponding basis functions are plotted in Fig. 9 (left). A small sketch evaluating Eq. (38) follows.
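
The code below is a direct transcription of the product formula (38), with names of our own choosing.

```python
import numpy as np

def lagrange_basis(nodes, k, xi):
    """k-th Lagrange polynomial of Eq. (38) evaluated at the points xi."""
    h = np.ones_like(xi)
    for l, x_l in enumerate(nodes):
        if l != k:
            h *= (xi - x_l) / (nodes[k] - x_l)
    return h

# Quadratic (p = 2) basis on equi-spaced nodes of the reference element;
# the delta-property gives h_k = 1 at node k and 0 at the other nodes.
nodes = np.linspace(-1.0, 1.0, 3)
for k in range(3):
    print(k, lagrange_basis(nodes, k, nodes))
```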

3.2.2 Basis Functions: Spectral Elements

Unlike the case of high-order finite elements, the polynomials used with spectral elements are built on nodes that are not equi-spaced. A convenient set is represented by the Legendre–Gauss–Lobatto (LGL) points. The LGL nodes \(\xi _i\) are the roots of

$$\begin{aligned} (1 - \xi ^2)P'_N(\xi ) = 0, \end{aligned}$$
(39)

where \(P_N(\xi )\) is the \(N\)th-order Legendre polynomial, whose construction by recursive formulas can be found in [173]. The polynomials that are used have the same \(\delta \)-property as the Lagrange polynomials defined above. Their analytic expression is given by

$$\begin{aligned} h_k(\xi ) = \frac{(\xi ^2-1)P'_N(\xi )}{N(N+1)(\xi -\xi _k)P_N(\xi )}, \quad k=0,\ldots ,N, \end{aligned}$$
(40)

where \(P'\) indicates differentiation with respect to \(\xi \). The 4th-order basis polynomials along \({\varvec{I}}= [-1,1]\) are plotted on the right panel of Fig. 9. The comparative plot (finite element on the left and spectral element on the right) shows that, if high order is required, equi-spaced nodes produce unsatisfactory basis functions in the proximity of the edge points of the element. In other words, we lose control of the maximum and minimum values of \(h_k\) near the extrema of the element, and interpolation of any function is likely to suffer as a result. To show how this feature translates into the interpolation of a known analytic function, we use the following example from [115]. We define the Witch of Agnesi of unit height as

$$\begin{aligned} z(x) = \frac{1}{1.0 + 50\,x^2}, \end{aligned}$$

where \(z(x)\) is smooth and continuous, and interpolate it using the basis functions \(\psi (x)=h_k(x)\) defined above. Equi-spaced and non-equi-spaced points are used along the unit domain, with 4th- and 10th-order interpolation. Figure 10 shows that the more the polynomial order is increased, the better the result when LGL nodes are employed. This behavior is tied to the definition of the Lagrange polynomials and to their interpolation quality as measured by the Lebesgue constant; the reader is referred to [115] for more details. Roughly speaking, this analysis is a practical way of showing one reason for preferring LGL points in high-order simulations over high-order elements with evenly distributed nodes. Figure 11 is a schematic representation of two 4th-order elements in two dimensions. A sketch reproducing this experiment follows.
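
The sketch below reproduces the spirit of this experiment under our own naming conventions: LGL nodes are obtained from Eq. (39) as the roots of \((1-\xi ^2)P'_N(\xi )\), and the maximum interpolation error of the Witch of Agnesi is compared for equi-spaced and LGL nodes.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def lgl_nodes(N):
    """LGL nodes: the endpoints -1, +1 plus the roots of P'_N, Eq. (39)."""
    return np.concatenate(([-1.0],
                           leg.Legendre.basis(N).deriv().roots(), [1.0]))

def interp(nodes, f, xi):
    """Lagrange interpolation of f on `nodes`, evaluated at the points xi."""
    q = np.zeros_like(xi)
    for k, x_k in enumerate(nodes):
        h = np.ones_like(xi)
        for l, x_l in enumerate(nodes):
            if l != k:
                h *= (xi - x_l) / (x_k - x_l)
        q += f(x_k) * h
    return q

agnesi = lambda x: 1.0 / (1.0 + 50.0 * x**2)  # Witch of Agnesi, unit height
xi = np.linspace(-1.0, 1.0, 201)
for N in (4, 10):
    equi = np.linspace(-1.0, 1.0, N + 1)
    for name, nds in (("equi-spaced", equi), ("LGL", lgl_nodes(N))):
        err = np.max(np.abs(interp(nds, agnesi, xi) - agnesi(xi)))
        # the LGL error should shrink as N grows, while the equi-spaced
        # error grows (Runge effect), consistent with Fig. 10
        print(N, name, err)
```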

Fig. 8
figure 8

Lagrange polynomials of order 1 (left) and 2 (right) along the 1D reference element \({\varvec{I}}=[-1,1]\). Clearly, they are equivalent for FE and SE

Fig. 9
figure 9

Basis functions of order 4 along the 1D reference element \({\varvec{I}}=[-1,1]\). Left the nodes within the element are equi-spaced as for classical high-order FE. Right Lagrange–Legendre polynomials of order 4 whose roots are the non-equi-spaced Legendre–Gauss–Lobatto (LGL) quadrature points. Nodal SE and DG may employ LGL or LG quadrature. However, to obtain a diagonal mass matrix, LGL is the only choice for SE, while LG can still be used for DG

Fig. 10
figure 10

Interpolation of a known function (Witch of Agnesi) using high-order interpolating functions with equi-spaced and LGL points. Left 4th-order interpolation. Right 10th-order interpolation

Fig. 11
figure 11

Nodes disposition for a two-dimensional 4th-order finite element (left), and spectral element (right)

3.3 Discontinuous Galerkin

The discontinuous Galerkin method allows the numerical solution and, therefore, the basis functions to be discontinuous at the interface between neighboring elements. For this reason, the basis function is no longer required to live in \(H^1\) but, rather, in \(L^2\). Assume \(n_i\) is the number of elements that share grid point \(i\). Then we will have \(n_i\) different values of the solution at that grid point; in 1D, for example, one value comes from the computation on the left element and one on the right element, where left and right are defined with respect to the shared edge (or face in 3D). The basis functions for element \(\varOmega _e\) vanish everywhere outside the element. Hence Eq. (31) becomes a set of \(n_{el}\) independent equations, one for each element \(\varOmega _e\):

$$\begin{aligned} \int _{\varOmega _e} \psi \,\left[ {{\fancyscript{L}}}(q) - S(q)\right] \,d\varOmega _e = 0. \end{aligned}$$
(41)

The equations for the different elements are coupled by means of the fluxes between neighboring elements. For this purpose we write our equations in flux form:

$$\begin{aligned} {{\fancyscript{L}}}(q) = \frac{\partial q}{\partial t}+\nabla \cdot {\mathbf{F}}(q) \end{aligned}$$
(42)

where \({\mathbf{F}}\) is the flux tensor. Equation (41) becomes

$$\begin{aligned} \int _{\varOmega _{e}}\left( \frac{\partial q}{\partial t}+\nabla \cdot {\mathbf{F}}-S\right) \psi \left( {\mathbf{x}}\right) \,\text {d}\varOmega _e=0. \end{aligned}$$
(43)

with \({\mathbf{F}}={\mathbf{F}}(q)\) and \(S=S(q)\). Integration by parts leads to

$$\begin{aligned} \int _{\varOmega _{e}}\left( \frac{\partial q}{\partial t}-{\mathbf{F}}\cdot \nabla -S\right) \psi \left( {\mathbf{x}}\right) \text {d}\varOmega _e=-\int _{\varGamma _{e}}\psi \left( {\mathbf{x}}\right) {\mathbf{n}}\cdot {\mathbf{F}}\,\text {d}\varGamma _e, \end{aligned}$$
(44)

where \(\varGamma _e\) is the boundary of \(\varOmega _e\) and \({\mathbf{n}}\) is the outward pointing unit normal vector on \(\varGamma _e\). To compute the numerical solution, \(q^h\), we replace the flux in the boundary integral by a so called numerical flux \({\mathbf{F}}^{h*}\):

$$\begin{aligned} \int _{\varOmega _{e}}\left( \frac{\partial q^h}{\partial t}-{\mathbf{F}}^h\cdot \nabla -S^h\right) \psi \left( {\mathbf{x}}\right) \text {d}\varOmega _e=-\int _{\varGamma _{e}}\psi \left( {\mathbf{x}}\right) {\mathbf{n}}\cdot {\mathbf{F}}^{h*}\text {d}{\varGamma _e}, \end{aligned}$$
(45)

with \({\mathbf{F}}^h={\mathbf{F}}\left( q^h\right) \) and \(S^h=S\left( q^h\right) \). The numerical flux \({\mathbf{F}}^{h*}\) describes the flux through the discontinuous interface between neighboring elements in the same way as in the finite volume (FV) method; therefore, we can choose any of the fluxes that are used with FV. For an introduction to different choices of fluxes we refer to [302]. Unlike FV, DG is relatively insensitive to the choice of the numerical flux due to the high order \((p \ge 3)\) basis functions that are used within the element. Therefore, a common choice for the numerical flux is the simple Rusanov flux:

$$\begin{aligned} {\mathbf{F}}^{h*}=\frac{1}{2}\left[ {\mathbf{F}}\left( q^h_{L}\right) +{\mathbf{F}}\left( q^h_{R}\right) -\lambda \,\mathbf{\mathbf n}\left( q^h_{R}-q^h_{L}\right) \right] , \end{aligned}$$
(46)

where \(\lambda =\left| \left| \varvec{u}\right| \right| _{2}+c\) is the maximum wave speed, \(\left| \left| \varvec{u}\right| \right| _{2}=\sqrt{u^{2}+v^2+w^{2}}\), and \(c\) is the speed of sound. The subscript \(L\) denotes the index of the element \(\varOmega _e\), whereas the subscript \(R\) denotes the index of the neighboring element. Recent approaches also incorporate fluxes that are not perpendicular to the interface between elements [327]. A scalar 1D sketch of Eq. (46) follows.
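
The sketch below implements Eq. (46) for a scalar unknown in one dimension; function and variable names are our own.

```python
def rusanov_flux(qL, qR, f, u_norm, c, n=1.0):
    """Rusanov flux of Eq. (46), written for a scalar unknown in 1D.

    qL, qR : solution traces on the local (L) and neighbor (R) element
    f      : callable returning the physical flux F(q)
    u_norm : local flow speed ||u||_2;  c : speed of sound
    n      : outward unit normal (+1 or -1 in 1D)
    """
    lam = u_norm + c  # maximum wave speed
    return 0.5 * (f(qL) + f(qR) - lam * n * (qR - qL))

# For linear advection F(q) = a q, with c = 0 so that lam = |a|, the
# Rusanov flux reduces to simple upwinding: the result is a * qL.
a = 2.0
print(rusanov_flux(1.0, 0.0, lambda q: a * q, abs(a), 0.0))
```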

A second integration by parts of Eq. (45) leads to

$$\begin{aligned} \int _{\varOmega _{e}}\left( \frac{\partial q^h}{\partial t}+\nabla \cdot {\mathbf{F}}^h - S^h\right) \psi \left( {\mathbf{x}}\right) \text {d}\varOmega _e=\int _{\varGamma _{e}}\psi \left( {\mathbf{x}}\right) {\mathbf{n}}\cdot \left( {\mathbf{F}}^h-{\mathbf{F}}^{h*}\right) \text {d}{\varGamma _e}. \end{aligned}$$
(47)

Using the expansion (29) of the numerical solution gives us

$$\begin{aligned} \frac{\partial {\hat{q}}_{k}}{\partial t}=-\int _{\varOmega _{e}}\hat{\psi }_{k}\left( {\mathbf{x}}\right) \left( \nabla \cdot {\mathbf{F}}^h- S^h\right) \text {d}\varOmega _e+\int _{\varGamma _{e}}\hat{\psi }_{k}\left( {\mathbf{x}}\right) {\mathbf{n}}\cdot \left( {\mathbf{F}}^h-{\mathbf{F}}^{h*}\right) \text {d}{\varGamma _e}, \end{aligned}$$
(48)

with the definition \(\hat{\psi }_{i}\left( {\mathbf{x}}\right) =\sum _{k=1}^{M^h}M_{i k}^{-1}\psi _{k}\left( {\mathbf{x}}\right) \), where \(M_{i k}=\int _{\varOmega _{e}}\psi _{i}\left( {\mathbf{x}}\right) \psi _{k}\left( {\mathbf{x}}\right) \text {d}\varOmega _e\) are the components of the mass matrix \({M}\).
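
The following sketch illustrates, under the assumptions of a nodal LGL basis and a unit element Jacobian, the remark made in the caption of Fig. 9: when the quadrature points coincide with the nodal points, \(h_i(\xi _j)=\delta _{ij}\) and the mass matrix collapses to a diagonal matrix of quadrature weights.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def lgl_nodes_weights(N):
    """LGL nodes and quadrature weights w_j = 2 / (N (N+1) P_N(x_j)^2)."""
    x = np.concatenate(([-1.0],
                        leg.Legendre.basis(N).deriv().roots(), [1.0]))
    w = 2.0 / (N * (N + 1) * leg.Legendre.basis(N)(x) ** 2)
    return x, w

N = 4
x, w = lgl_nodes_weights(N)
# Co-located nodes and quadrature points give h_i(x_j) = delta_ij, so
# M_ik = sum_j w_j h_i(x_j) h_k(x_j) is diagonal (unit Jacobian assumed):
H = np.eye(N + 1)
M = H.T @ np.diag(w) @ H
print(np.allclose(M, np.diag(w)))  # True
```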

Second order derivatives in the differential operator \({{\fancyscript{L}}}\) can be discretized with DG by transforming the problem into a coupled set of equations containing only first order derivatives as done in [54]. This approach is called the local DG (LDG) method. Other choices for discretizing second order operators are given in [266].

3.4 EBG Methods in Atmospheric and Climate Modeling

3.4.1 Continuous Galerkin

The use of continuous Galerkin methods in atmospheric simulations began five decades ago with the work on finite elements by Holmstrom [137] and Simons [269] in the 1960s. This continued in the 1970s (e.g. [69, 70, 101]) and was followed by an extensive production of articles in the 1980s and 1990s by, e.g., Staniforth [279], Beland et al. [19], and Burridge et al. [41], who set the foundations of the operational Global Environmental Multiscale (GEM) model [65, 326] of the Canadian Meteorological Center & Meteorological Research Branch (CMC-MRB). In the UK, Untch and Hortal [305] used finite elements for the vertical discretization of a semi-Lagrangian transport scheme and introduced it in the operational version of the European Centre for Medium-Range Weather Forecasts (ECMWF) global spectral model (IFS), with great improvement over the FD version of the code. In the domain of geophysical fluid dynamics, more Galerkin-type models have appeared since the beginning of the new millennium. In, e.g., [34, 193, 228] or [119], different variational formulations, mostly based on spectral elements, are employed to solve the shallow water equations or the Navier–Stokes and Euler equations for non-hydrostatic atmospheres. More examples of element-based models are the SE-Core [296], CAM-SEM [74], the SE/DG Nonhydrostatic Unified Model for the Atmosphere (NUMA) [117, 176], the SEM Community Earth System Model (CESM) [73], and the finite element multiphysics model ALYA in atmospheric mode [212, 213].

The spectral element method is possibly the most common EBG method used today to develop next-generation research NWP models. Spectral elements were first introduced in geophysical fluid dynamics by Ma [204]. Building on the pioneering work of Patera [235], who developed the spectral element method for incompressible CFD, Ma developed an ocean model based on the shallow water equations. In particular, Ma stressed the ability of SEM to (1) accurately simulate flows with high Rossby numbers and (2) simulate phenomena of long time duration thanks to SEM’s low dissipation and dispersion errors. Although Ma’s primary application was coastal ocean modeling, he was explicitly aware of the intimate connection between ocean models and atmospheric/climate models and predicted that his work would serve as a basis for atmosphere and climate studies. Iskandarani [158] built on Ma’s ocean model, showing that the accuracy of spectral elements successfully suppressed spurious pressure modes in ocean flows. Both Ma’s and Iskandarani’s work utilized Cartesian grids suitable for oceanographic problems. Two years later, Taylor et al. [294] developed the first SWE spectral element model using spherical geometry. In particular, this work used a cubed sphere with quadrilateral elements, which exploited the geometrical flexibility of spectral elements. The cubed-sphere grid circumvented the well-known pole problem that is present for traditional latitude–longitude grids (we will get back to this point shortly). At the same time, a spectral element shallow water code was developed by Haidvogel et al. [131]. This methodology was then extended to solving the hydrostatic primitive equations on the sphere in the Spectral Element Atmospheric Model (SEAM) [98]. By this time, massively parallel distributed-memory clusters had become available, motivating the development of highly scalable numerical methods; Fournier and coworkers noted the high parallel efficiency of spectral elements, making SEM a suitable numerical method for the dynamical core of computationally expensive climate models. Taylor’s SEM solver later became the basis for NCAR’s high-order method modeling environment (HOMME), which facilitated the rapid development of next generation atmospheric global circulation models (AGCMs).

Taylor’s SEM model utilized spherical coordinates to solve atmospheric problems on the sphere; however, since the sphere is a sub-manifold of three-dimensional space, Cartesian coordinates can also be used provided that the fluid is constrained to lie on the sphere via a Lagrange multiplier [63]. Although computationally more expensive because of the extra degree of freedom, this approach has two advantages over spherical coordinates: (1) analytical Jacobian transformations for the grid do not need to be derived, and (2) any spherical grid (including the cubed-sphere grid) may be utilized, thereby liberating the solver from the grid. Giraldo utilized this Cartesian SEM approach to solve the SWE using an icosahedral grid in [113]. Collaborating with the Naval Research Lab, he applied this framework to solve the hydrostatic primitive equations in [119] and to develop the U.S. Navy’s spectral element atmospheric model (NSEAM), including a semi-implicit solver [114]. As we shall see in the next section, Giraldo and coworkers developed DG solvers concurrently with SEM solvers, thereby exposing the common themes and machinery shared by the two methods. To demonstrate how arbitrary grids can be used to solve the SWE on the sphere, a unified continuous/discontinuous Galerkin model was recently presented in [211]. Using this model, the equations were solved in Cartesian coordinates with static and dynamic adaptivity, using both continuous and discontinuous Galerkin approximations on the grids illustrated in Fig. 12.

Fig. 12
figure 12

Examples of spherical grids for the solution of the SWE. From left to right, classical cubed-sphere, a reduced longitude–latitude, and icosahedral grid. These are high order grids with curved elements on the spherical shell. Figures adapted from [211] with permission of John Wiley & Sons

3.4.2 Discontinuous Galerkin

This subsection presents a short overview of the important steps in the historical development of DG towards atmospheric applications. A more general overview of the history of DG until the year 2000 can be found in [52]. Some information can also be found in the textbook by Hesthaven and Warburton [133].

The discontinuous Galerkin method was first introduced by Reed and Hill [248]. Reed and Hill were working on the solution of the stationary linear transport equation of neutrons with a constant velocity \(\mathbf{v }\) in two dimensions:

$$\begin{aligned} \mu \frac{\partial \psi }{\partial x}+\eta \frac{\partial \psi }{\partial y}+\sigma \,\psi \left( x,y,\mu ,\eta \right) =S\left( x,y,\mu ,\eta \right) , \end{aligned}$$
(49)

where \(\psi \) is the angular neutron flux in the direction \((\mu ,\eta )^{\text {T}}=\mathbf{v }/||\mathbf{v }||_{2}\), \(\sigma \) is the total macroscopic cross-section for neutron–nucleus interaction, and \(S\) is the source term, which describes scattering, fission, and inhomogeneous sources. This transport equation was solved by Reed and Hill on a triangular mesh. They compared a method allowing discontinuities at the interfaces between different triangles with a continuous method and found that DG was computationally more expensive but slightly more accurate and much more robust. One of the main advantages of the discontinuous method was that it showed fewer oscillations at the boundary between areas with two different values of the cross-section \(\sigma \). This allowed Reed and Hill to reduce the resolution of the method while still obtaining a less oscillatory result than the continuous method.

The discontinuous method introduced in [248] was analyzed theoretically by Lesaint and Raviart [198]. In this early work the discontinuous Galerkin method was applied to linear equations. The first application of DG to nonlinear conservation equations is attributed to Chavent and Salzano [48], who used first order polynomials for the spatial discretization and a simple explicit Euler method for the time discretization. A von Neumann analysis showed that this approach is unstable if the time step \(\varDelta t\) is proportional to the grid spacing \(\varDelta x\); it becomes stable only for \(\varDelta t\propto \varDelta x^{3/2}\). This severe restriction on the time step for explicit time integration was removed by the development of Runge–Kutta discontinuous Galerkin (RKDG) methods by Cockburn and Shu [53].

Discontinuous Galerkin methods were applied to parabolic equations by Jamet [163], to the displacement of oil by water in a porous slab by Chavent and Salzano [48], to viscoelastic flows by Fortin and Fortin [97], and to the solution of the Maxwell equations by Warburton and Karniadakis [312].

The first numerical experiments using DG for the Euler equations of gas dynamics are reported in the 1991 work by Bey and Oden [26] and in the work by Bassi and Rebay [12, 13]. The application of DG to geophysical problems started with the work on the shallow water equations by Schwanenberg et al. [264], followed by Giraldo et al. [116], who introduced inexact integration for DG and extended the method to the sphere. In 2008, DG was used to solve the Navier–Stokes equations of non-hydrostatic atmospheric flows by Giraldo and Restelli [118]. Discontinuous Galerkin methods have not been used in operational global circulation models (GCM) yet. However, a hydrostatic GCM was presented by Nair et al. [225], followed in 2011 by the German DG model DUNE [34] and, in 2012, by the scalable non-hydrostatic model by Kelly and Giraldo [176]. The linear scalability properties of DG were shown by Wilcox et al. [320] and Kelly and Giraldo [176].

As mentioned earlier, the geometrical flexibility of EBG methods, including the potential for adaptivity, is a major strength, in particular for DG. An adaptive DG model based on non-conforming quadrilaterals was introduced in [184]. This paper introduced a tree-based adaptive mesh refinement (AMR) strategy and demonstrated the potential for DG to achieve order-of-magnitude efficiency gains, which are difficult, if not impossible, to obtain with traditional finite difference or spectral transform methods. Dynamic adaptivity, using conforming triangular elements, was also explored in [224]. These two papers are not the first to report on grid adaptivity using EBG; however, they seem to be the first publications on this topic in the framework of non-hydrostatic atmospheric simulations using DG.

4 Stabilization of EBG for Advection-Dominated Problems

The straightforward numerical approximation of problems with dominant advection may result in unphysical oscillations in the solution. Finite and spectral element methods are no exception [170], as an error estimate of the standard Galerkin approximation of the problem proves (see, e.g., [247]). Here, we show it by deriving the 1D finite element solution of the advection–diffusion problem with Dirichlet boundary conditions. The problem consists of solving

$$\begin{aligned} \frac{\partial q}{\partial t}+ {{\fancyscript{L}}}(q)\, = \,S, \end{aligned}$$
(50)

where

$$\begin{aligned} {{\fancyscript{L}}}(q) = {\mathbf{u}}\cdot \nabla q - \nabla \cdot \left( \nu \nabla q\right) , \end{aligned}$$
(51)

by linear \((p = 1)\) finite elements. In (51), \(\nu \) is a positive, uniform, constant diffusivity coefficient, \({\mathbf{u}} = (u,0,0)\) is the velocity vector, and \(S\) is a source function. The domain of interest is the unit interval \(\varOmega = [0,1]\). A uniform partition \({\fancyscript{P}}^h\) of \(\varOmega \) with \(N+1\) nodes \(p_k\), \(k=0,\ldots ,N\), and \(n_{el}=N\) elements \(K\) of length \(h=\Vert \mathbf{p}_k - \mathbf{p}_{k-1}\Vert _{2}\) is assumed. For uniqueness of the solution, \(q(0) = 0\) and \(q(1) = 1\) are the assigned boundary conditions. Let \(W^h\subseteq \,H^1\) be the space of piece-wise linear Lagrange polynomials of class \(\fancyscript{C}^0\) (Fig. 8, left). The projection of Eq. (50) onto \(W^h\) by the \(L^2\) scalar product and integration by parts of the diffusion term yields the equation

$$\begin{aligned} \int _{\varOmega ^{h}}\psi ^{h}{\mathbf{u}}\cdot \nabla q^{h}\,d\varOmega ^{h} + \int _{\varOmega ^{h}}\nu \nabla \psi ^{h}\cdot \nabla q^{h}\,d\varOmega ^{h} = \int _{\varOmega ^{h}}\psi ^{h} S\,d\varOmega ^{h}\quad \forall \; \psi ^{h}\,\in \,W^{h}, \end{aligned}$$
(52)

where \(q^h\) is expanded as in (29). When \(S = 0\), the 1D finite element discretization of (52) yields the discrete equation

$$\begin{aligned} \left( \frac{u}{2} - \frac{\nu }{h}\right) \,\hat{q}_{k+1} + \frac{2\nu }{h}\,\hat{q}_k - \left( \frac{u}{2} + \frac{\nu }{h} \right) \,\hat{q}_{k-1} = 0,\quad k=1,\ldots ,N-1. \end{aligned}$$
(53)

Equation (53) is equivalent to the 1D discretization of the same problem by second-order finite differences. After algebraic manipulation and given the definition of the element Péclet number

$$\begin{aligned} Pe = \frac{||{\mathbf{u}}|| \,h}{2\nu }, \end{aligned}$$
(54)

Equation (53) is written as a function of (54):

$$\begin{aligned} \left( Pe - 1\right) \,\hat{q}_{k+1} + 2\,\hat{q}_k - \left( Pe + 1 \right) \,\hat{q}_{k-1} = 0,\quad k=1,\ldots ,N-1. \end{aligned}$$
(55)

Equation (55) represents a tridiagonal linear system in the unknowns \(\hat{q}_k\), \(k=1,\ldots ,N-1\), whose solution is (see [246])

$$\begin{aligned} \hat{q}_k = \frac{\left( \frac{1+Pe}{1-Pe}\right) ^k - 1}{\left( \frac{1+Pe}{1-Pe}\right) ^{N} - 1}, \quad k=1,\ldots ,N-1. \end{aligned}$$
(56)

The power of \((1+\hbox {Pe})/(1-\hbox {Pe})\) in the numerator produces an oscillatory behavior of the solution whenever \(Pe>1\), as shown in Fig. 13. \(Pe\) is a linear function of \(h\), so the grid could, in principle, always be constructed in such a way that, for given values of \(u\) and \(\nu \), \(Pe\le 1\). However, this is not viable for most real problems because of the extremely high number of grid points that may be necessary to achieve such a condition. The only practical way to remove the numerical instabilities in the Galerkin solution of transport problems with dominant advection remains stabilization by proper means. A category of stabilization methods applied to the multi-dimensional advection–diffusion equation will be described in the next section. The closed-form solution (56) can be evaluated directly, as in the sketch below.
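
The following sketch evaluates the closed-form nodal solution (56) on the two grids of Fig. 13; a negative minimum flags the oscillations.

```python
import numpy as np

def fe_solution(N, u=10.0, nu=0.1):
    """Closed-form nodal FE solution, Eq. (56), on N uniform elements."""
    h = 1.0 / N
    Pe = u * h / (2.0 * nu)        # element Peclet number, Eq. (54)
    k = np.arange(N + 1)
    r = (1.0 + Pe) / (1.0 - Pe)
    return Pe, (r**k - 1.0) / (r**N - 1.0)

for N in (20, 80):                 # grids of Fig. 13: Pe = 2.5 and 0.625
    Pe, q = fe_solution(N)
    print(N, Pe, q.min())          # q.min() < 0 only in the Pe > 1 case
```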

Fig. 13
figure 13

Finite element solution of the advection–diffusion problem (50) using uniform, linear elements. \(u=10\), \(\nu =0.1\), in a domain of unit total length. With these values, the global Péclet number is \(Pe_{g}=50\). The plot shows the approximate solutions obtained for different grid spacings \((Pe=2.5\) and \(Pe=0.625)\) with and without stabilization. The computed solution can approach the exact solution either by increasing the number of grid points \((Pe=0.625)\), or by maintaining a sufficiently coarse grid with the addition of a stabilizing term (how this term is built has not been shown yet, but the result gives a hint of what to expect from it)

4.1 Viscosity-Based Stabilization Techniques

Regardless of the numerical method that an atmospheric model is built on, dissipation of some sort is added for various reasons; stabilization is one of them. As pointed out in [161], the most common means of dissipation found in current research and operational weather forecast models is artificial diffusion (or hyper-diffusion, HV from now on) of the form

$$\begin{aligned} HV = \int _{\varOmega _h}(-1)^{\alpha +1}\psi ^h\nabla ^{\alpha }\cdot ({{\varvec{\nu }}_{2\alpha }}\nabla ^{\alpha }q^h)\,d\varOmega ^h \end{aligned}$$
(57)

where \(\alpha \) is a positive integer and \({{\varvec{\nu }}_{2\alpha }}\) is the matrix of the diffusivity coefficients, which may vary along different grid directions [125]. When \(\alpha =1\), HV reduces to second-order artificial viscosity [169]. HV is easy to implement and robust, features that make it attractive for models that are not allowed to break during a forecast simulation. HV is found in other fields of computational fluid dynamics as well; the work of [11] is an example where HV is used to stabilize the simulation of high speed flows. One justification that practitioners in NWP give for HV with \(\alpha >1\) is its scale selectiveness: it damps the highest frequencies, which are usually the result of numerical error and dispersion, but tends to leave the important physical modes untouched. However, even if HV is indeed scale-selective, it is not physical. Since the artificial term given by Eq. (57) is a perturbation of the original equation, if this perturbation does not go to zero as \(h\rightarrow 0\), the exact solutions of the original and of the perturbed problems are not equivalent. As is evident in Fig. 14, these methods add an uncontrolled and non-localized diffusion that deteriorates the solution. For a stabilizing scheme to preserve the shape of the tracer, dissipation should be avoided in the direction normal to the flow and only act in the direction parallel to the flow [55, 146, 152].

To preserve the correct physical dimensions of the hyper-viscous term, the value of \(\nu _{2\alpha }\) must scale with \(\alpha \) and the grid spacing. Its selection is not only non-trivial, but also has a great impact on the solution of the problem. Jablonowski and Williamson [161] clearly state that “[...] the choice of the \(\nabla ^2, \nabla ^4\) coefficient is most often motivated by empirical arguments and chosen in a somewhat arbitrary manner [...].” More advanced, and by now classical, stabilizing schemes for finite elements, spectral elements, and discontinuous Galerkin methods are described in the following subsections.

Fig. 14
figure 14

Pure transport of a square wave in a 2D doubly-periodic channel. The velocity is directed along the \(x\)-axis (the bottom-right edge of the squared domain in the three plots). From left to right stabilization achieved by \(\nabla ^2\), \(\nabla ^4\), and using a variational multiscale scheme [143]. Adapted from Fig. 21 of [210], with permission by Elsevier

4.2 Filtering of (High-Order) EBG Methods

Both CG and DG, like all higher-order methods, are limited by Godunov’s theorem: all linear numerical methods for solving PDEs that do not generate additional extrema (so-called monotone schemes) are at most first-order accurate. As an immediate consequence, high-order CG and DG are not monotonicity-preserving, especially near sharp gradients. In NWP, this is especially problematic for tracer transport, where mass fractions may become negative due to these artificial extrema. Finally, spectral elements and discontinuous Galerkin methods on quadrilateral and hexahedral elements typically use inexact integration to diagonalize the mass matrix; this approximate integration introduces errors which must be stabilized by a filter or a more sophisticated scheme such as the VMS method discussed later.

To circumvent these problems, filters were introduced in the development of both spectral methods [96] and, later, spectral elements [32, 94]. They were also applied to discontinuous Galerkin methods in [116, 118]. Filters reduce the aliasing that occurs in the higher-order modes of the solution, which are largely responsible for Gibbs oscillations; hence, filters act upon the modal representation of the solution. Once the offending high modes are eliminated, the modal solution is transformed back to physical, or nodal, space. The spectral filtering operation thus consists of a three-step process: (1) transform the nodal solution to a modal solution, (2) apply a low-pass filter to damp the highest spatial frequencies, and (3) inverse transform the filtered modal solution to nodal space. For SE and DG methods that utilize LGL basis functions, a modified Legendre transform may be used for steps 1 and 3; in addition, it is possible to perform these operations element by element, thereby eliminating the need for a global assembly operation.

Ideally, the spectral filter applied during step 2 should eliminate all non-physical oscillations while faithfully preserving the physics of the solution. In practice, satisfying both of these requirements is not possible. An effective filter was developed by Boyd [31, 32], based on previous theoretical work by Vandeven [307], resulting in an erfc-log filter for polynomials of order \(p\). Letting \(\bar{\theta } = k/p - 1/2\), the filter \(\sigma (k ; p)\) is equal to unity if \(\bar{\theta } \le 0\) and

$$\begin{aligned} \sigma (k ; p) = \frac{1}{2} \text{ erfc } \left[ 2 \sqrt{p} \bar{\theta } \sqrt{\frac{-\log \left( 1 - 4 \bar{\theta }^2\right) }{4 \bar{\theta }^2}} \right] . \end{aligned}$$
(58)

Equation (58) rapidly damps the higher-order modes and completely eliminates the highest-order mode. A sketch of this filter follows.
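
The sketch below evaluates Eq. (58), following the piecewise convention stated in the text (\(\sigma = 1\) for \(\bar{\theta } \le 0\)); names are our own.

```python
from math import erfc, log, sqrt

def boyd_filter(k, p):
    """erfc-log filter sigma(k; p) of Eq. (58)."""
    theta = k / p - 0.5
    if theta <= 0.0:
        return 1.0
    if theta >= 0.5:   # highest mode (k = p): removed completely
        return 0.0
    arg = 2.0 * sqrt(p) * theta * sqrt(-log(1.0 - 4.0 * theta**2)
                                       / (4.0 * theta**2))
    return 0.5 * erfc(arg)

# Low modes pass unchanged; the attenuation grows rapidly with k
p = 8
print([round(boyd_filter(k, p), 4) for k in range(p + 1)])
```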

Most DG methods utilize exact integration and therefore may not require filter-based stabilization. It should be kept in mind that filtering may not be sufficient to stabilize the solution; for this reason, the dissipation schemes described so far and later in this section are often considered a possible option for high order methods as well [44, 45]. Finally, it is difficult to derive idempotent filters; when the filter is not idempotent [172], the solution may vary depending on how many times the filter is applied during the simulation.

4.3 Towards Consistent Stabilization Methods

Streamline Upwinding (SU) [146], Streamline Upwind Petrov–Galerkin (SUPG) [38], Galerkin/Least-Squares (GLS) [150], Galerkin methods with bubble functions [9, 35, 37], and sub-grid projection methods [126] are some of the most widely used stabilization techniques for finite elements. To bypass some drawbacks of streamline-type schemes such as these, much work was done in the same years on shock capturing, as found in [132, 170, 171]. The Taylor–Galerkin method [78], the Characteristic-Galerkin formulation [241], and the Characteristic-Based Split (CBS) method [331, 332] are further approaches to FE stabilization; however, they rely on reasoning unrelated to the methods we are interested in reviewing in this article, so we mention them here without delving into their description.

4.3.1 Streamline-Upwind (SU)

The problem of isotropic smearing of the solution mentioned above was partially solved by [146] with the construction of the streamline-upwind method, although the idea of finite element upwinding can already be found 1 year earlier in the work of [287], continued in [288, 289]. With SU, the stabilizing dissipation acts in the direction of the flow only, as is visible from

$$\begin{aligned} b_{SU} = \int _{\varOmega _{h}}\tau {\mathbf{u}}\cdot \nabla \psi ^{h}{\mathbf{u}}\cdot \nabla q^{h}\,d\varOmega ^{h}. \end{aligned}$$
(59)

However, like HV, SU is not numerically consistent, in the sense that no residual information is used to construct the perturbation term. The streamline-upwind/Petrov–Galerkin (SUPG) method described below is the consistent evolution of SU and has been among the most common finite element stabilization methods since its introduction.

4.3.2 Streamline-Upwind/Petrov–Galerkin (SUPG)

The SUPG method was designed by [38] and was later generalized for multidimensional problems by [151]. It is a consistent alternative to the HV approach or to the overly diffusive SU. Its use has been ubiquitous in the solution of transport problems by the finite element method (e.g., [35, 100, 156, 295]). The application of this strategy to higher-order schemes was first tested for spectral methods by Canuto et al. [43], [44, 46], [45], and later by Hughes et al. [148] using non-uniform rational B-splines (NURBS). SUPG is a Petrov–Galerkin method in that it does not assume that the basis and test functions live in the same space. We introduce the additional space \(\varPsi ^h\) of test functions \(w^h\) defined by

$$\begin{aligned} \varPsi ^h \doteq \left\{ w^h \,:\, w^h = \psi ^h + \tau {\mathbf{u}}\cdot \nabla \psi ^h\,:\quad \psi ^h\,\in \,W^h \right\} . \end{aligned}$$

We have the problem of finding the function \(q^h\, \in \, W^h\)   such that

$$\begin{aligned} \int _{\varOmega ^{h}}w^{h}{\mathbf{u}}\cdot \nabla q^{h}\,d\varOmega ^{h} - \int _{\varOmega ^{h}}w^{h}\nabla \cdot (\nu \nabla q^{h})\,d\varOmega ^{h} = \int _{\varOmega ^{h}}w^{h}\,S^{h}\,d\varOmega ^{h}\quad \forall \; w^{h}\,\in \,\varPsi ^{h}. \end{aligned}$$
(60)

Some algebra and rearrangement of (60) yields the problem of finding \(q^h\, \in \, W^h\)   such that

$$\begin{aligned} \underbrace{\int _{\varOmega ^{h}}\psi ^{h}{\mathbf{u}}\cdot \nabla q^{h}\,d\varOmega ^{h} + \int _{\varOmega ^{h}}\nu \nabla \psi ^{h}\cdot \nabla q^{h}\,d\varOmega ^{h}}_{\mathrm{Galerkin}}\,+\,b_{SUPG} =\underbrace{\int _{\varOmega ^{h}}\psi ^{h}\,S^{h}\,d\varOmega ^{h}}_{\mathrm {Galerkin}} \quad \forall \; \psi ^{h}\,\in \,W^{h}, \end{aligned}$$
(61)

where

$$\begin{aligned} b_{SUPG} = \int _{\varOmega } \left( {\mathbf{u}}\cdot \nabla \psi ^{h} \right) \tau \,\underbrace{ \left[ {\mathbf{u}}\cdot \nabla q^{h} - \nabla \cdot \left( \nu \nabla q^{h}\right) - S \right] }_{{{\fancyscript{L}}}(\hbox {q}^{\mathrm{h}}) - \hbox {S}}\,d\varOmega ^{h} \end{aligned}$$
(62)

is the consistent SUPG stabilizing term. In (62), \({\mathbf{u}}\cdot \nabla q^h -\nabla \cdot \left( \nu \nabla q^h\right) - S\) is the residual of (50) and \(\tau \) is the stabilization parameter. The definition of \(\tau \) that yields a nodally exact SUPG solution with continuous piecewise linear finite elements is derived in [50] from a 1D analysis. Its generalization to multi-dimensional problems is given by the simple substitution of \(u\) with \(||{\mathbf{u}}||\), although in multiple dimensions this does not necessarily imply nodal exactness. With respect to higher order elements, a thorough analysis of \(\tau \) for quadratic elements is given by [59]. In the context of VMS, different definitions exist for the parameter \(\tau \); some of them are discussed in Sect. 4.4. For a brief review of SUPG, the reader is also referred to [145] and the report [102].

4.3.3 Galerkin/Least-Square (GLS)

A generalization of SUPG was obtained by [150] as

$$\begin{aligned} b_{GLS} = \int _{\varOmega } \underbrace{\left[ {\mathbf{u}}\cdot \nabla \psi ^h - \nabla \cdot \left( \nu \nabla \psi ^h\right) \right] }_{\mathrm {{{\fancyscript{L}}}(\psi ^h)}}\tau \,\underbrace{ \left[ {\mathbf{u}}\cdot \nabla q^h - \nabla \cdot \left( \nu \nabla q^h\right) - S \right] }_{{{{\fancyscript{L}}}(\hbox {q}^{\mathrm{h}}) - \hbox {S}}}\,d\varOmega ^h. \end{aligned}$$
(63)

In analogy with the findings of [82] for the stabilization of the Stokes equation, a sign change in the Laplacian part of the stabilizing term proved to yield better stabilization characteristics (more accurate results) than the original generalized SUPG (or GLS) method [100]. That is, instead of applying the differential operator \({{\fancyscript{L}}}\) to the test function in (63), the method should use the negative of the adjoint \({{\fancyscript{L}}}^*\) of the original operator \({{\fancyscript{L}}}\). The perturbation term of the original advection–diffusion equation then becomes

$$\begin{aligned} b = -\int _{\varOmega } {{\fancyscript{L}}}^*(\psi ^h)\tau \,\underbrace{ \left[ {\mathbf{u}}\cdot \nabla q^h - \nabla \cdot \left( \nu \nabla q^h\right) - S \right] }_{{{{\fancyscript{L}}}(\hbox {q}^{\mathrm{h}}) - \hbox {S}}}\,d\varOmega ^h, \end{aligned}$$
(64)

where

$$\begin{aligned} {{\fancyscript{L}}}^* = -{\mathbf{u}}\cdot \nabla - \nabla \cdot (\nu \nabla ) \end{aligned}$$
(65)

is the adjoint of \({{\fancyscript{L}}}\).

Based on what was learned from the stabilization of the scalar advection–diffusion equation, researchers in fluid dynamics applied these methods and their evolutions to the stabilization of fluid problems. Thanks to the work of [143, 149], the methods that we have just described have been recognized as belonging to one family, known as the family of Variational Multiscale Stabilization (VMS) methods. The VMS approach is summarized below for scalar advection problems, and will be derived and discussed for the Euler equations in Sect. 4.7.

4.4 Variational Multiscale Stabilization (VMS)

In 1995 and 1996, groups of researchers led by Hughes [143] and Brezzi [36] proposed a theory to explain the origin of the instabilities and a new way to attack the problem. They concluded that the unresolved scales (the scales that cannot be captured by the computational grid) are responsible for the numerical instabilities of the Galerkin solution of the differential problem. The analysis, continued in [155] and [149], forms the unifying theory of all stabilized finite element methods. According to this theory, stabilized methods are subgrid scale models where the unresolved scales are intimately related to the instabilities at the level of the resolved scales, and thus should be used in the construction of the stabilization term. More specifically, in the formulation of the discrete problem, the effects of the unresolved scales must be introduced by modeling them on the grid. These schemes are known as Variational Multiscale Stabilization (VMS) methods. Generally speaking, the stabilization term of VMS corresponds to \(b\) defined in (64).

VMS has been extensively applied to the solution of the advection–diffusion/advection–diffusion–reaction equations (e.g. [58, 61, 141, 143, 149]), and to the solution of the Navier–Stokes equations for incompressible flows (e.g. [6, 17, 56, 57, 123, 153]). Recently, it was applied to spectral elements in the context of atmospheric flows in [209, 210]. In Sect. 4.7 a review of VMS for the compressible Euler equations is found.

The multiscale description of the stabilization scheme relies on the splitting of the solution into a resolved component, \(q^h\), and a sub-grid, unresolved component, \(\tilde{q}\), to give \(q=q^h + \tilde{q}\). Let \(W = W^h \oplus \tilde{W}\) be the space decomposition such that \(\tilde{W}\) completes \(W^h\) in \(W\). This translates into the decomposition of the solution variable, \(q=q^h + \tilde{q}\), and of the basis functions, \(\psi = \psi ^h + \tilde{\psi }\). We substitute the decomposition into the general weak form of Eq. (50),

$$\begin{aligned} \left( \psi ,\frac{\partial q}{\partial t}\right) + a(\psi , q) = (\psi , S)\quad \forall \;\psi \,\in \,W, \end{aligned}$$
(66)

where \((\cdot ,\cdot )\) is the \(L^2\) inner product and \(a(\cdot ,\cdot )\) is a bilinear form that satisfies

$$\begin{aligned} a(\psi , q) = (\psi ,{\mathbf{u}}\cdot \nabla q) \, + \, \nu \, (\nabla \psi ,\nabla q), \end{aligned}$$

and anticipating that we will consider the quasi-static approximation \(\partial _t\tilde{q}=0\) [57], we obtain:

$$\begin{aligned} \left( \psi ^h+\tilde{\psi },\frac{\partial q^h}{\partial t}\right) + a(\psi ^h + \tilde{\psi }, q^h + \tilde{q}) = (\psi ^h + \tilde{\psi }, S)\quad \forall \;\psi ^h\,\in \,W^h,\,\tilde{\psi }\,\in \,\tilde{W}. \end{aligned}$$
(67)

By virtue of the linear independence of \(\psi ^h\) and \(\tilde{\psi }\) we can first take \(\tilde{\psi } = 0\) and then \(\psi ^h=0\) and find the split problem:

$$\begin{aligned}&\left( \psi ^h, \frac{\partial q^h}{\partial t} \right) + a(\psi ^h, q^h) + a(\psi ^h, \tilde{q}) = (\psi ^h, S)\quad \forall \;\psi ^h\,\in \,W^h \end{aligned}$$
(68a)
$$\begin{aligned}&\left( \tilde{\psi }, \frac{\partial q^h}{\partial t}\right) + a(\tilde{\psi }, q^h) + a(\tilde{\psi }, \tilde{q}) = (\tilde{\psi }, S)\quad \forall \;\tilde{\psi }\,\in \,\tilde{W}. \end{aligned}$$
(68b)

In the subgrid Eq. (68b) we come back to the original differential operator \({{\fancyscript{L}}}\) from Eq. (50). We assume that \(\tilde{\psi }(\partial K) = 0\) and \(\tilde{q}(\partial K) = 0\), for each element \(K\) of the grid and, following [143], in (68a) we integrate by parts the bilinear forms that depend on the subgrid scale and find:

$$\begin{aligned}&\left( \psi ^{h}, \frac{\partial q^{h}}{\partial t} \right) + a(\psi ^{h}, q^{h}) + ({{\fancyscript{L}}}^*\psi ^{h}, \tilde{q}) = (\psi ^{h}, S)\quad \forall \;\psi ^{h}\,\in \,W^{h} \end{aligned}$$
(69a)
$$\begin{aligned}&\left( \tilde{\psi }, \frac{\partial q^{h}}{\partial t}\right) + (\tilde{\psi }, {{\fancyscript{L}}}q^{h}) + (\tilde{\psi }, {{\fancyscript{L}}}\tilde{q}) = (\tilde{\psi }, S)\quad \forall \;\tilde{\psi }\,\in \,\tilde{W}, \end{aligned}$$
(69b)

where \({{\fancyscript{L}}}^*\) [in Eq. (65)] is the adjoint operator of \({{\fancyscript{L}}}\).

4.5 Approximation of the Sub-grid Scales

The unresolved quantity \(\tilde{q}\) has not been defined yet. Equation (69b) is used as the starting point to approximate \(\tilde{q}\). By re-arranging the terms in (69b), the equation for the subgrid scales is found,

$$\begin{aligned} (\tilde{\psi },{{\fancyscript{L}}}( \tilde{q}))\, =\, (\tilde{\psi },R(q^h))\quad \forall \;\tilde{\psi }\,\in \,\tilde{W}, \end{aligned}$$
(70)

where

$$\begin{aligned} R(q^h)=S - \frac{\partial q^h}{\partial t} - {{\fancyscript{L}}}(q^h) \end{aligned}$$
(71)

is the residual of the original equation. The strong form of (70) is considered on each element \(K\)

$$\begin{aligned} {{\fancyscript{L}}}( \tilde{q}) = R(q^h), \end{aligned}$$
(72)

and an algebraic approximation \(\tau \approx {{\fancyscript{L}}}^{-1}\) of the inverse of the differential operator \({{\fancyscript{L}}}\) is defined. Then the sub-scale has the form

$$\begin{aligned} \tilde{q} = \tau \,R(q^{h}). \end{aligned}$$
(73)

Expression (73) is plugged into Eq. (69a) to find the VMS stabilized Galerkin method: find \(q^h\,\in \,W^h\) such that

$$\begin{aligned} \left( \psi ^h, \frac{\partial q^h}{\partial t}\right) + a(\psi ^h, q^h) + ({{\fancyscript{L}}}^*(\psi ^h),\tau \,R(q^h)) = (\psi ^h,S)\quad \forall \;\psi ^h\,\in \,W^h. \end{aligned}$$
(74)

Equation (74) differs from Eq. (66) by the additional term that models the subgrid scales; this extra term is the viscous-like contribution that stabilizes the equation.

Different formulations for \(\tilde{q}\) are found in the literature; some of them are reviewed in Sects. 4.5.1–4.5.3. All of them depend on the definition of the stabilization parameter \(\tau \). The parameter \(\tau \) is still a topic of active research today, since a general definition is not known [168]; this statement is true for all the residual-based stabilization methods described so far. The quantity \(\tau \) is an intrinsic time built as a function of the local Péclet number of the flow which, for the plain Galerkin method to be stable, should respect the condition \(Pe<1\). Many problems in atmospheric CFD are advection dominated, implying \(Pe\gg 1\), so that stabilization is indeed necessary for virtually all problems of interest to atmospheric modelers.

4.5.1 Approximation Via Green’s Functions

This approach is used by Hughes and collaborators [143, 144, 149] to derive \(\tau \). In brief, they consider Eq. (72) with \(\tilde{q}=0\) on \(\partial K\), and the associated Green’s function problem for the adjoint operator

$$\begin{aligned} \left\{ \begin{array}{l} {\fancyscript{L}}^*g(x,y) = \delta (x,y)\quad \forall \; x\in K \\ g = 0\quad \text {on}\; \partial K. \end{array} \right. \end{aligned}$$
(75)

Then a uniform element-wise definition of \(\tau \) is obtained as the average value of the exact element Green’s function

$$\begin{aligned} \tau =\frac{1}{|K|^2}\int _K\int _K g(x,y)\; dK_x \,dK_y, \end{aligned}$$
(76)

where \(|K|\) is the measure (volume, area, or length) of the element \(K\). For the one-dimensional linear scalar advection–diffusion equation there is an analytical expression for the Green's function, and the stabilization parameter is computed from (76) to give

$$\begin{aligned} \tau = \frac{1}{2}\frac{h}{||{\mathbf{u}}||}\left( \coth (Pe) -\frac{1}{Pe}\right) , \end{aligned}$$
(77)

where \(Pe\) is the Péclet number of Eq. (54). In [38, 147] the same expression (77) is obtained in the context of SUPG stabilization by following an error minimization criterion. For the purely advective case \((\nu =0\) and \(Pe\rightarrow \infty )\), we find

$$\begin{aligned} \tau = \frac{1}{2}\frac{h}{||{\mathbf{u}}||}. \end{aligned}$$
(78)
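For concreteness, the following minimal sketch evaluates (77) and its advective limit (78); the function name is illustrative, and the element Péclet number is assumed to take the usual form \(Pe = \Vert {\mathbf{u}}\Vert h/(2\nu )\) of Eq. (54).

```python
import numpy as np

def tau_green(u_norm, h, nu):
    """Stabilization parameter of Eq. (77); advective limit (78) for nu = 0."""
    if nu == 0.0:
        return 0.5 * h / u_norm                   # Eq. (78): Pe -> infinity
    Pe = u_norm * h / (2.0 * nu)                  # assumed element Peclet number
    return 0.5 * (h / u_norm) * (1.0 / np.tanh(Pe) - 1.0 / Pe)  # coth(Pe) - 1/Pe
```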

Instead of (76), Corsini et al. [61] propose a non-uniform \(\tau \) on each element:

$$\begin{aligned} \tau (x)=\frac{1}{|K|}\int _K g(x,y)\; dK_y. \end{aligned}$$
(79)

4.5.2 Approximation Via Fourier Analysis

The strategy of Codina et al. [58] is explained here for the multidimensional advection–diffusion Eq. (50). The starting point is to transform Eq. (72) into Fourier space, where the transformed differential operator \(\widehat{{{\fancyscript{L}}}}\) is easy to invert. Denoting its inverse by \(\fancyscript{T}={(\widehat{{{\fancyscript{L}}}})}^{-1}\), the Fourier transform of the sub-scale is approximated on each element \(K\) as

$$\begin{aligned} \hat{\tilde{q}}({\varvec{\omega }}) = \fancyscript{T}({\varvec{\omega }})\,\hat{R}({\varvec{\omega }}), \end{aligned}$$
(80)

where \({\varvec{\omega }}\) is the wave number and

$$\begin{aligned} \fancyscript{T}({\varvec{\omega }})\approx \left( i\,\frac{{\mathbf{u}}\cdot {\varvec{\omega }}}{h}+\nu \,\frac{||{\varvec{\omega }}||^2}{h^2}\right) ^{-1}. \end{aligned}$$

Considering expressions (80) and (73), Plancherel's formula and the mean value theorem are applied to obtain an approximate value of the stabilization parameter on each element \(K\):

$$\begin{aligned} \tau =\frac{1}{2}\frac{h}{||{\mathbf{u}}||}\frac{Pe}{Pe+1}=\left( \frac{2||{\mathbf{u}}||}{h}+\,\frac{4\nu }{h^2}\right) ^{-1}. \end{aligned}$$
(81)

For pure advection problems \((\nu =0\) and \(Pe\rightarrow \infty )\), the stabilization parameter reduces to that of Eq. (78).
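In code, (81) is a direct evaluation; the one-line sketch below (illustrative name) also makes the advective limit apparent.

```python
def tau_fourier(u_norm, h, nu):
    """Stabilization parameter of Eq. (81): tau = (2|u|/h + 4 nu/h^2)^(-1)."""
    return 1.0 / (2.0 * u_norm / h + 4.0 * nu / h**2)

# For nu = 0 this returns h / (2 |u|), i.e., the purely advective value (78).
```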

4.5.3 Approximation Via Bubble Functions

More options to build the stabilization parameter \(\tau \) are found in [141] for linear, quadratic, and cubic elements. The space \(\tilde{W}\) is made of bubble functions (see [9, 37]), that is, functions that vanish on the boundary of each element. The unresolved scales, \(\tilde{q}\), are defined as a function of the bubbles \(b(x)\), which are derived as described in the referenced literature. Omitting the details, \(\tau \) is expressed in terms of the bubble as:

$$\begin{aligned} \tau = \frac{1}{h}\int _{0}^{h}\, b(x)\,dx, \end{aligned}$$
(82)

which, once evaluated, yields expression (77) for the parameter \(\tau \): the same expression encountered with the Green's function approach.

Following the steps of [141], \(\tau \) for spectral elements of arbitrary order and with unequally spaced element nodes was derived in [210]. The stabilization parameter \(\tau \) is built inside the element as a function of the bubbles on every segment delimited by two consecutive LGL points. The uneven spacing of the element nodes is the major difference with respect to the definitions derived in previous studies; in this case, the intrinsic time is non-uniform along the element.

4.6 Preserving Positivity

SUPG, GLS, and VMS are not monotonicity preserving. This issue is particularly important for the simulation of tracer dynamics in the atmosphere: if over- and undershoots affect the solution in the proximity of strong gradients, the net mass balance of the advected tracers will be negatively affected. To overcome this issue, a controlled crosswind discontinuity capturing can be added to the principal stabilization scheme. For example, the method introduced in [55] was successfully adapted to high-order spectral elements in [210] for standard 2D test cases and, more recently, by [209] to support positivity in the solution of fully 3D cloud simulations. For comparison, we reproduce Fig. 29 of [210] in Fig. 15. For high-order SEM, the First Order Subcell (FOS) method was introduced in [210]. FOS consists of lowering the high-order method to first order only in the spectral elements that contain the discontinuity. The FOS results are encouraging, although there is some overhead coming from looping over the linear sub-elements within the high-order elements that contain the localized over- and undershoots.

Fig. 15

Stable SEM solutions of the transport equations for a sharp cylinder that is transported by a rising thermal flow (i.e. the tracer is driven by the thermal flow that is modeled by the Euler equations of stratified flows). Top left: filtered solution using the filter of Sect. 4.2. Top right: 2nd-order artificial diffusion without discontinuity capturing. Bottom left: VMS. Bottom right: VMS with discontinuity capturing. Adapted from [210], with permission by Elsevier

Other positivity preserving schemes such as high-order limiters for both CG and DG are often used as well, as is shown in, e.g., the report by [328].

4.7 VMS Stabilization for the Euler Equations

VMS for compressible flows appears in [222, 251, 252]. A review of residual-based stabilization methods for compressible flows can be found in [145]. Recently, VMS was used to stabilize the FEM solution of the Euler equations of atmospheric non-hydrostatic flows in [212, 213]. VMS has also been derived for discontinuous Galerkin [154], although it has not been applied to atmospheric modeling. In the following we therefore limit the analysis to continuous Galerkin (without distinguishing between FEM and SEM in the derivation). For the treatment that follows, it is convenient to express the Euler equations [i.e., the inviscid counterpart of system (1)] in compact form as

$$\begin{aligned} \frac{\partial {\mathbf{q}}}{\partial t}+\frac{\partial {\mathbf{F}}^i({\mathbf{q}})}{\partial x_i}=0,\quad i = 1,2,3, \end{aligned}$$
(83)

where the Einstein summation on repeated indices is assumed, \({\mathbf{q}}\) is the vector of the unknowns, and \({\mathbf{F}}\) is the vector of the flux quantities. Without compromising the generality of the stabilization method, the gravity and Coriolis terms are omitted here. As usual, the problem consists of finding \({\mathbf{q}}({\mathbf{x}},t)\) that verifies Eq. (83) for all \(({\mathbf{x}},t)\in \varOmega \times {\mathbb {R}}^+\). To derive VMS for this set of equations, we start from the flux form (83) of the conservative variables \({\mathbf{q}}\) and define the equivalent advective system:

$$\begin{aligned} \frac{\partial {\mathbf{q}}}{\partial t}+{\mathbf {A}}^i({\mathbf{q}})\;\frac{\partial {\mathbf{q}}}{\partial x_i}=0, \end{aligned}$$
(84)

where

$$\begin{aligned} {\mathbf {A}}^i({\mathbf{q}})=\frac{\partial {\mathbf{F}}^i}{\partial {\mathbf{q}}} \end{aligned}$$
(85)

are the Jacobian matrices. As already done in Sect. 3 for scalar problems, the variational form of Eq. (84) can then be written as

$$\begin{aligned} \int _{\varOmega ^h} {\varvec{\psi }}^h \cdot \frac{\partial {\mathbf{q}}^h}{\partial t} \; d\varOmega ^h + \int _{\varOmega ^h} {\varvec{\psi }}^h \cdot {\mathbf {A}}^i({\mathbf{q}}^h)\;\frac{\partial {\mathbf{q}}^h}{\partial x_i} \; d\varOmega ^h = 0 \quad \forall \; {\varvec{\psi }}^h\,\in \,W^h. \end{aligned}$$
(86)

As was done for the scalar case, the decompositions \({\mathbf{q}} = {\mathbf{q}}^h + \tilde{\mathbf{q}}\) and \({\varvec{\psi }}={\varvec{\psi }}^h + \tilde{{\varvec{\psi }}}\) are substituted into the variational problem (86), which can hence be split into the two equations

$$\begin{aligned}&\int _{\varOmega ^h}{\varvec{\psi }}^h \cdot \frac{\partial {\mathbf{q}}^h}{\partial t} \; d\varOmega ^h + \int _{\varOmega ^h}{\varvec{\psi }}^h \cdot {\mathbf {A}}^i({\mathbf{q}})\;\frac{\partial {\mathbf{q}}^h}{\partial x_i} \; d\varOmega ^h \nonumber \\&\quad + \, \sum _{m=1}^{n_{\mathrm{el}}} \left( \int _{K^m}{\varvec{\psi }}^h \cdot \frac{\partial \tilde{{\mathbf{q}}}}{\partial t} \; dK^m + \int _{K^m}{\varvec{\psi }}^h \cdot {\mathbf {A}}^i({\mathbf{q}})\;\frac{\partial \tilde{{\mathbf{q}}}}{\partial x_i} \; dK^m \right) = 0 \quad \forall \; {\varvec{\psi }}^h\,\in \,W^h \end{aligned}$$
(87a)
$$\begin{aligned}&\sum _{m=1}^{n_{\mathrm{el}}} \int _{K^m}\tilde{\varvec{\psi }} \cdot \left( \frac{\partial \tilde{{\mathbf{q}}}}{\partial t} + {\mathbf {A}}^i({\mathbf{q}})\;\frac{\partial \tilde{{\mathbf{q}}}}{\partial x_i}\right) \; d K^m = \sum _{m=1}^{n_{\mathrm{el}}} \int _{K^m}\tilde{\varvec{\psi }} \cdot {\mathbf{R}}({\mathbf{q}}^h)\; d K^m \quad \forall \;\tilde{\varvec{\psi }}\,\in \,\widetilde{W} \end{aligned}$$
(87b)

where

$$\begin{aligned} {\mathbf{R}}=-\left( \frac{\partial }{\partial t} + {\mathbf {A}}^i({\mathbf{q}})\;\frac{\partial }{\partial x_i}\right) \end{aligned}$$
(88)

is the residual operator of the governing Eq. (84). Equation (87a) is solved numerically on the computational grid, whereas (87b) is the subgrid scale equation from which an expression for \(\tilde{{\mathbf{q}}}\) is obtained and then plugged back into (87a). Concerning Eq. (87a) for the large scales, some assumptions must be made; for details, see the referenced literature. In the case of inviscid problems (i.e. the Euler equations), SUPG [194], GLS [267], and VMS end up having the same structure; it is only through the approximation of the subgrid scales \(\tilde{\mathbf{q}}\) that VMS differentiates itself from the other two schemes.

4.7.1 VMS as Implicit Large Eddy Simulation (LES)

Without dwelling on this discussion, it is important to underline that VMS is also used as an Implicit Large Eddy Simulation (ILES) scheme that relies on the variational projection of the original equations rather than on traditional filtering. This was first applied to incompressible turbulent flow in [17, 153]. In [91, 183, 234], turbulent compressible flow is modeled using the VMS framework, although the fine scales are modeled by a Smagorinsky model; a similar approach is taken in [60, 199]. In [306], a VMS formulation obtained by extending the Favre averaging to general projection operators is proposed, where no explicit subgrid modeling is presented. Using SEM, VMS was employed in [120] to solve turbulent incompressible flows.

4.7.2 Approximation of the Sub-grid Scales

For the Navier–Stokes equations, analogously to the advection–diffusion equation, the subgrid scale \(\tilde{\mathbf{q}}\) is computed from the subscale Eq. (87b) and has the general form of

$$\begin{aligned} \tilde{\mathbf{q}} = {\varvec{\tau }}\; {\mathbf{R}}({\mathbf{q}}^{h}), \end{aligned}$$
(89)

where \({\varvec{\tau }}\) is a diagonal matrix. Shakib et al. [267] and Hughes and Mallet [151] compute the parameter \({\varvec{\tau }}\) for GLS to solve the compressible Euler and Navier–Stokes equations. For the same equations, Hughes and Tezduyar [156] and Le Beau and Tezduyar [194] compute \({\varvec{\tau }}\) for SUPG. Just like in the scalar case, the parameter \({\varvec{\tau }}\) has been derived in different ways by different authors, although the final expressions seldom differ greatly. In, e.g., [222], \({\varvec{\tau }}\) is derived from a Fourier analysis; another approach involves Green's functions, as done in [61]. Regardless of the definition of \({\varvec{\tau }}\), notice the local nature of the subscales, which exist only where the residuals of the large scales are important. This locality, together with the non-constant values, marks the major difference with respect to artificial diffusion. The structure of \(\tilde{\mathbf{q}}\) for the problem of a rising thermal is shown in the top two plots of Fig. 16. By comparison with the pattern of potential temperature (bottom left plot) and horizontal velocity (bottom right plot), the structure of the sub-grid scale is clearly tied to the residual.

Fig. 16

2D rising thermal bubble. Top row: sub-grid scales \(\tilde{\theta }/\theta _{max}\) (top-left) and \(\tilde{U}/U_{max}\) (top-right). Bottom row: potential temperature \(\theta '\) (K) (left) and horizontal velocity \(u\) (m/s) (right). This vertically displacing flow is triggered by the thermal perturbation \(\theta '\) of a neutrally stratified flow (i.e. uniform and constant \(\theta _0)\). The characteristic shape of the perturbation field \(\theta '\) is shown in the bottom-left panel. The plots are adapted from [212]

An example of a simulation where VMS was used to stabilize both the dynamics (Euler equations) and the advection–diffusion equations of water tracers is shown in Fig. 17, where it is compared against the solution obtained in [103] with a filtered high-order spectral element method at an equivalent resolution.

Fig. 17

2D squall line simulation. The cloud content is delimited by the thick black contour line in both plots. The color/grey shading in the left/right plots is the equivalent potential temperature. Both horizontal domains extend over 240 km. The left figure is adapted from [213], where a VMS-stabilized FEM solution was computed with linear elements (with permission by Elsevier). The right figure is adapted from [103] (with permission of the American Meteorological Society); that solution was computed with 8th-order SEM stabilized with a constant coefficient diffusion \((\nu =200\,\hbox {m}^2/\hbox {s})\) and a filter

4.8 Alternative Consistent Schemes: Spectral Vanishing Viscosity and Entropy Viscosity Method

Also based on a second-order operator, the spectral vanishing viscosity (SVV) of [290] is a stabilization method used by practitioners of high-order spectral element and spectral Fourier methods. The idea of SVV comes from an entropy analysis of the problem at hand: the added dissipation is constructed so as to satisfy the entropy condition. For more on this, see [130, 173, 290] and references therein.

Also tied to the entropy equation, the entropy viscosity method was first introduced in [128]. The fundamental difference between this method and SVV lies in the way the entropy equation is used: the entropy viscosity method builds the local and dynamic viscosity of the equations from the residual of the associated entropy equation [127, 129]. In [334], this regularization of the governing equations is applied to the discontinuous Galerkin method as well.

Finally, we comment on the use of adaptive viscosity methods for both CG [125] and DG [236]. These two methods are not consistent, but they are element-based and dynamic. An adaptive artificial viscosity for non-hydrostatic modeling using DG was recently proposed in [328].

4.9 Physics-Based Stabilization

A computationally inexpensive and numerically stable sub-grid scale model for compressible large-eddy simulation was introduced in [229] for adaptive finite elements. Due to its stabilizing properties, this method was easily adapted to the solution of low Mach number atmospheric flows via high-order spectral elements in [215] and [216]. Like VMS, this method is a residual-based alternative to the more classical artificial diffusion that is not only numerically consistent, but could also serve as a turbulence model. Unlike VMS, however, stabilization is approached starting from the governing equations rather than from their numerical approximation. More specifically, the Euler equations are first filtered to separate the resolved from the unresolved scales [108, 260]. The filtering operation leads to a new set of equations containing additional terms that are dissipative in nature and must then be modeled in some way. The steps described below (following the treatment of [216]) show how stabilization is achieved. In LES, given a quantity \(q\) (e.g., density, velocity, potential temperature), its large-scale (grid-resolved) component \(\overline{q}\) is obtained via the application of the filter

$$\begin{aligned} \overline{q}({\mathbf{x}}) = \int _{\varOmega }G_{\varDelta }({\mathbf{x}} - {\varvec{\chi }})q({\varvec{\chi }})d{\varvec{\chi }}. \end{aligned}$$
(90)

Equation (90) is a spatial convolution of the filtering function \(G_{\varDelta }\) with \(q\), where \(\varDelta \) is the filter width. The filter functions can vary; the most commonly used in LES are the Gaussian, the top hat in real space, and the sharp Fourier cutoff functions [197, 240].
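As a simple illustration of the filtering operation (90), the sketch below applies a one-dimensional top-hat filter of width \(\varDelta \) to a discrete field by direct quadrature of the convolution; the implementation and names are illustrative and are not taken from any of the cited models.

```python
import numpy as np

def top_hat_filter(q, x, delta):
    """Discrete Eq. (90) with G(r) = 1/delta for |r| <= delta/2, 0 otherwise."""
    dx = x[1] - x[0]                      # uniform grid spacing assumed
    qbar = np.empty_like(q)
    for i, xi in enumerate(x):
        window = np.abs(x - xi) <= 0.5 * delta
        qbar[i] = np.sum(q[window]) * dx / delta
    return qbar
```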

Remark 4.1

The barred quantities introduced in Sect. 2.1.2 bear no relation to \(\overline{q}\) defined in Eq. (90) for Large Eddy Simulation (LES). LES implies a separation between the resolved and unresolved scales, whereas the splitting given in Sect. 2.1.2 was introduced for numerical convenience in the simulation of atmospheric problems and does not affect the way LES is constructed or derived.

For compressible flows, the Favre filter \(\widetilde{q} = \overline{\rho q}/{\overline{\rho }}\) [92], although not strictly necessary, is classically introduced. The application of these filters to equations (6), excluding the Coriolis terms for simplicity, yields the filtered system

$$\begin{aligned}&\frac{\partial \overline{\rho }\widetilde{\mathbf{u}}}{\partial t} + \nabla \cdot \left( \overline{\rho }\widetilde{\mathbf{u}}\otimes \widetilde{\mathbf{u}}\right) + \nabla \overline{p} = \nabla \cdot {\varvec{\tau }} - \overline{\rho }{\mathbf{g}}, \end{aligned}$$
(91a)
$$\begin{aligned}&\frac{\partial \overline{\rho }}{\partial t} + \nabla \cdot \left( \overline{\rho }\widetilde{\mathbf{u}}\right) = 0, \end{aligned}$$
(91b)
$$\begin{aligned}&\frac{\partial \overline{\rho }\widetilde{\theta }}{\partial t} + \nabla \cdot \left( \overline{\rho }\widetilde{\theta }\widetilde{\mathbf{u}}\right) = \nabla \cdot {\mathbf{Q}}, \end{aligned}$$
(91c)

where the two divergence terms on the right-hand sides of (91a) and (91c) represent the contributions of the unresolved scales. If Favre filtering were not applied, an additional flux term would also appear on the right-hand side of Eq. (91b); with Favre filtering, the filtered density \(\overline{\rho }\) is conserved and no modeling is required for the continuity equation. In (91a), \({\varvec{\tau }}\) is the turbulent stress tensor,

$$\begin{aligned} {\varvec{\tau }} = \overline{\rho }\left( \widetilde{\mathbf{u\otimes u}} - \widetilde{\mathbf{u}}\otimes \widetilde{\mathbf{u}}\right) , \end{aligned}$$

approximated by

$$\begin{aligned} {\varvec{\tau }} = 2 \mathbb {D}(\widetilde{{\mathbf{u}}}), \end{aligned}$$
(92)

where

$$\begin{aligned} \mathbb {D}(\widetilde{{\mathbf{u}}})=\frac{\mu _n}{2}\left( \nabla \widetilde{\mathbf{u}} + \nabla \widetilde{\mathbf{u}}^{\textsc {t}} \right) \end{aligned}$$

is the velocity deformation tensor multiplied by a dynamic coefficient \(\mu _n\) that will be defined shortly. Similarly, in (91c) \({\mathbf{Q}}\) is the kinematic heat flux defined as

$$\begin{aligned} {\mathbf{Q}} = \overline{\rho }\left( \widetilde{\theta {\mathbf{u}}} - \widetilde{\theta }\widetilde{\mathbf{u}}\right) , \end{aligned}$$
(93)

and is modeled via

$$\begin{aligned} {\mathbf{Q}} = \kappa _n\nabla \widetilde{\theta } \end{aligned}$$
(94)

Like that of \(\mu _n\), the definition of \(\kappa _n\) characterizes the method proposed in [229]. The coefficients \(\mu _n\) and \(\kappa _n\) are calculated element-wise on every high-order element \(\varOmega _e\) for a Galerkin approximation of Equations (91). More specifically, for the sensible temperature \(T=\theta (p/p_0)^{R/c_p}\) and one finite/spectral element of characteristic length \(h_{\varOmega _e}\), we start by defining the dynamic viscosities

$$\begin{aligned} \mu _{\max }|_{\varOmega _e} = 0.5h_{\varOmega _e}\, \Vert |\widetilde{{\mathbf{u}}}| + \sqrt{\gamma c_{p} \widetilde{T}} \Vert _{\infty ,\varOmega _e}, \end{aligned}$$
(95)

and

$$\begin{aligned} \mu _{\mathrm{res}}|_{\varOmega _e} = h_{\varOmega _e}^2 \, \max \left( \frac{\Vert R(\overline{\rho })\Vert _{\infty ,\varOmega _e}}{\Vert \overline{\rho } - \widehat{\rho }\Vert _{\infty ,\varOmega }}, \frac{\Vert R(\overline{\rho }{\widetilde{\mathbf{u}}})\Vert _{\infty ,\varOmega _e}}{\Vert \overline{\rho }{\widetilde{\mathbf{u}}} - \overline{\rho }\widehat{\mathbf{u}} \Vert _{\infty ,\varOmega }} , \frac{\Vert R(\overline{\rho }\widetilde{\theta })\Vert _{\infty ,\varOmega _e}}{\Vert \overline{\rho }\widetilde{\theta } - \overline{\rho }\widehat{\theta } \Vert _{\infty ,\varOmega }} \right) . \end{aligned}$$
(96)

In (96), \(\widehat{\cdot }\) indicates the space average of the quantity at hand over \(\varOmega \), and the norms \(\Vert \cdot \Vert _{\infty ,\varOmega }\) in the denominators are used for normalization and to preserve the correct dimensions of the resulting equation. Once \(\mu _{\max }\) and \(\mu _{\mathrm{res}}\) are constructed, the dynamic coefficients of the viscosity terms can be computed as

$$\begin{aligned} {\mu _{n}}|_{\varOmega _{e}} = \Vert \overline{\rho } \Vert _{\infty ,\varOmega _{e}}\, \min \left( {\mu _{\max }}|_{\varOmega _{e}}, {\mu _{\mathrm{res}}}|_{\varOmega _{e}} \right) \end{aligned}$$
(97)

and

$$\begin{aligned} \kappa _{n}|_{\varOmega _{e}}=\frac{\Pr }{\gamma -1}{\mu _{n}}|_{\varOmega _e}, \end{aligned}$$
(98)

where Pr is an artificial Prandtl number. The residuals in (96) are simply:

$$\begin{aligned}&R(\overline{\rho }\widetilde{\mathbf{u}}) = \frac{\partial \overline{\rho }\widetilde{\mathbf{u}}}{\partial t} + \nabla \cdot \left( \overline{\rho }\widetilde{\mathbf{u}}\otimes \widetilde{\mathbf{u}}\right) + \nabla \overline{p} + \overline{\rho }{\mathbf{g}}, \end{aligned}$$
(99a)
$$\begin{aligned}&R(\overline{\rho }) = \frac{\partial \overline{\rho }}{\partial t} + \nabla \cdot \left( \overline{\rho }\widetilde{\mathbf{u}}\right) , \end{aligned}$$
(99b)
$$\begin{aligned}&R(\overline{\rho }\widetilde{\theta }) = \frac{\partial \overline{\rho }\widetilde{\theta }}{\partial t} + \nabla \cdot \left( \overline{\rho }\widetilde{\theta }\widetilde{\mathbf{u}}\right) . \end{aligned}$$
(99c)

The time derivatives must be included; otherwise the consistency of the method would be lost. An example of the stabilized spectral element solutions reported in [216] is plotted in Fig. 18, where the results obtained using a constant-coefficient Lilly–Smagorinsky model [201, 274] are also given for comparison. Combining the moist problem briefly described in Sect. 2.4 with the current LES-based stabilization, the simulation of a fully three-dimensional deep convection problem is reported in [215]; in Fig. 19, we reproduce Fig. 3 contained therein.
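To make the recipe concrete, here is a minimal element-wise sketch of Eqs. (95)–(98); all names and default values are illustrative, and the residuals (99) as well as the global normalization factors of Eq. (96) are assumed to be precomputed and passed in.

```python
import numpy as np

def dynamic_coefficients(h_e, u_e, T_e, rho_e,
                         res_rho, res_rhou, res_rhotheta,
                         norm_rho, norm_rhou, norm_rhotheta,
                         gamma=1.4, cp=1004.5, Pr=0.7):
    """mu_n and kappa_n on one element Omega_e; res_* are the residuals (99)
    sampled at the element nodes, norm_* the global factors of Eq. (96)."""
    # Eq. (95): first-order viscosity from the maximum local wave speed
    mu_max = 0.5 * h_e * np.max(np.abs(u_e) + np.sqrt(gamma * cp * T_e))
    # Eq. (96): residual-based viscosity
    mu_res = h_e**2 * max(np.max(np.abs(res_rho)) / norm_rho,
                          np.max(np.abs(res_rhou)) / norm_rhou,
                          np.max(np.abs(res_rhotheta)) / norm_rhotheta)
    # Eq. (97): density-scaled minimum of the two viscosities
    mu_n = np.max(np.abs(rho_e)) * min(mu_max, mu_res)
    # Eq. (98): artificial thermal diffusivity via an artificial Prandtl number
    kappa_n = Pr / (gamma - 1.0) * mu_n
    return mu_n, kappa_n
```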

Fig. 18

Stabilized solution of the density current problem [284]. Reproduced from [216]

Fig. 19

Deep convection: 3D view of \(q_c\) (grey surface), surface velocity (vectors), and the instantaneous distribution of \(q_r\) on the ground (contours). Reproduced from Fig. 3 of [215]

The multi-scale properties of this scheme have been verified via the simulation of a turbulent flow on a sphere of Earth's radius. As an example, a turbulent flow in a geostrophically balanced atmosphere is shown in Fig. 20, after [216].

Fig. 20

Turbulent flow on the sphere after 12, 20, and 25 days. Top-view looking down onto the northern hemisphere. The radial component of vorticity is plotted and colored by intensity. Plot adapted from [216]

5 Vertical Discretization, Computational Grids, and Adaptive Mesh Refinement in NWP

We briefly discuss the issue of vertical discretization in atmospheric models since it is characterized by constraints that do not apply to more traditional, general CFD models. Because of the classical use of finite differences with Cartesian rectangular grids, the accurate approximation of topography has always been a major concern in both atmospheric and ocean models. The vertical coordinate systems can be separated into two main branches: \(\sigma \) terrain-following coordinates [104, 237] and height coordinates. Terrain-following coordinates have the advantage of an accurate representation of topography and ease of application of boundary conditions, since the grid cells follow the shape of the varying bottom of the domain. However, their truncation errors grow with increasing topography slope [164, 286], which calls for vertical coordinates that are more suitable for steep topographies. The height-coordinate system was first proposed as the \(\eta \)-system by [219]; it consists of a rectangular grid that intersects the topography and defines the orographic height at the cell edges. Modifications of both approaches have since been defined; examples are the hybrid terrain-following coordinates [268] as an improvement of \(\sigma \), or the shaved-cell method in z-coordinates introduced by [1] for ocean models. Figure 21 shows a schematic of these grids.

Fig. 21

Representation of a smooth mountain using: a height coordinate system with step topography, b \(\sigma \)-terrain following coordinates, and c height coordinate system with shaved cells

The \(\sigma \) grid mentioned above is simple, but on steep topography the regularity of the grid in the inner domain is compromised. To overcome this drawback, [263] introduced the smooth level vertical (SLEVE) mapping, which helps maintain a sufficient degree of regularity of the node distribution away from the bottom boundary. Given a mountain ridge, a SLEVE grid is obtained from the decomposition of the topography into a large- and a small-scale variation (e.g. a Gaussian terrain perturbed by a wave-like function). Through this decomposition, the grid distortion is controlled from bottom to top by means of two free parameters. Somewhere between \(\sigma \) and SLEVE stands the hybrid grid of [268]: it uses the same vertical coordinate \(\sigma \) and combines the topography and the height of the domain through two functions \(a(\sigma )\) and \(b(\sigma )\) whose values are properly tabulated.
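For orientation, the SLEVE mapping is commonly written in the form below (stated from general knowledge of the approach; see [263] for the precise definitions), where \(h_1\) and \(h_2\) are the large- and small-scale contributions to the terrain height, \(s_1\) and \(s_2\) are the two free decay parameters, and \(H\) is the height of the domain top:

$$\begin{aligned} z(\zeta ) = \zeta + h_1\,\frac{\sinh \left[ (H-\zeta )/s_1\right] }{\sinh \left( H/s_1\right) } + h_2\,\frac{\sinh \left[ (H-\zeta )/s_2\right] }{\sinh \left( H/s_2\right) }, \end{aligned}$$

so that \(z = h_1 + h_2\) at \(\zeta = 0\) (the terrain), while the terrain signature decays towards \(z = H\) at a rate controlled by \(s_1\) and \(s_2\).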

Finite elements and Galerkin methods in general (finite volumes included) are free of the drawbacks of methods that are inflexible with regard to the grid. Finite elements rely on computational grids of quadrilateral and triangular elements (in 2D) or hexahedra, tetrahedra, and prisms (in 3D) that adjust to the physical geometry to be discretized without affecting the formulation of the governing equations; the grid shape is inherently accounted for in the numerical formulation of the method. Generally speaking, these are z-coordinate based methods with full control of the shape of the topography. The grid itself may look like a \(\sigma \)-grid, but the fundamental difference is that finite difference methods with \(\sigma \) grids require re-expressing the equations through a coordinate transformation.

Due to the geometrical flexibility of element-based Galerkin (EBG) methods, no coordinate transformation is needed to apply the ground boundary condition, and complex orography can be modeled with ease using finer or, perhaps, adaptive grids (see Sect. 5.2), as long as certain criteria on the regularity and smoothness of the element shapes are respected. This matters especially now that high resolution is becoming the rule: at high resolution, terrain-following coordinates produce grids that lose orthogonality at the boundaries, and the internal elements suffer from severe stretching, up to the point that the grid is no longer sufficiently smooth for the numerical method to perform correctly. For example, if the Jacobian of the transformation from physical to computational space is singular, large numerical errors and instability in the solution occur [300]. The application of CFD grid generation techniques to atmospheric problems is being considered more and more. Simple and fast structured grid generation with boundary-layer grids or elliptic smoothing has been described in, e.g., [208]. Unstructured grids are also becoming of interest, as shown a few years ago by [5] and, more recently, by [275]. However, the inertia of the atmospheric community towards grids that do not have a characteristic column-wise structure is still large. This is because all of the packages that compute atmospheric parameterizations (e.g., precipitation, radiation) are designed to work on such grids and would have to be adapted (i.e. re-written) to work on different grids. The reasoning behind this inertia is understandable, although steps in this direction must be taken now that high-resolution atmospheric modeling is fast approaching.

5.1 3D Grid Generation for Domains With Orography and Bathymetry

Volume grid generation in atmospheric models is commonly performed by a one-directional simplification of transfinite interpolation (TFI) [89, 122]. TFI is robust, simple, and arguably the fastest grid generation technique in use in many fields of computational mechanics, of which geophysical fluid dynamics represents a particular case. Nevertheless, the quality of TFI grids generally degenerates when the geometric features of the domain boundaries present sharp corners, quasi-vertical boundary walls, or similar characteristics. This has a direct effect on the quality of the numerical solution of the problem [217], regardless of the underlying numerical method. In NWP, sharp mountain ridges and canyons are an example. With the ever-increasing trend towards high spatial resolution that numerical weather prediction is experiencing today, sharp topographies are certainly an issue. In the following sections, we describe the current way of generating structured grids in topographical domains and present a few examples to underline the possible limitations. We then introduce the idea behind elliptic grid generation and how grids may be improved in terms of smoothness and orthogonality by this simple technique. Most of the ideas presented in this section are found in the books by [182] and [298], and in the recent paper by [175].

5.1.1 Algebraic Grid Generation

As mentioned above, transfinite interpolation has a major drawback that comes from the constraint on the regularity of the boundaries: if the boundaries of the simply-connected domain are not sufficiently smooth, TFI fails to generate good grids. Sharp internal corners, caused by a possible discontinuity in the spatial derivative of the boundary functions, translate into folded grids with unacceptable node overlapping. The problem of folding grids with difficult geometries is usually solved by subdividing the domain into smaller subdomains with more regular boundaries; this technique is robust but difficult to automate. In Fig. 23, although the edges do not cross, the vertical wall on the left-hand side of the hill is a challenge for the grid generator, as can be seen from the extremely stretched elements in the region of the hill's front.

Nonetheless, because topography is usually smooth in current operational models (at horizontal resolutions of 1 km or coarser, all mountain peaks are likely to be smoothed out), TFI is still a fast and adequate solution that can be properly modified for different types of vertical node distributions.

These improved methods are sufficiently good as long as the boundaries are never vertical, because the transformations are performed along \(z\) only. For full control of the node distribution in all directions, these schemes should be incorporated into a full TFI interpolation.
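As a minimal sketch of the one-directional approach, the snippet below blends the nodes linearly between the terrain and a flat domain top, i.e. the simplest \(\sigma \)-like mapping; the names and the terrain function are illustrative.

```python
import numpy as np

def vertical_tfi(x, h, H, nz):
    """Return a (nz, len(x)) array of z-coordinates over the terrain h(x)."""
    zeta = np.linspace(0.0, 1.0, nz)[:, None]   # normalized vertical coordinate
    return h(x)[None, :] * (1.0 - zeta) + H * zeta

x = np.linspace(0.0, 10.0, 101)
hill = lambda x: np.exp(-((x - 5.0) / 1.0)**2)  # smooth Gaussian hill
z = vertical_tfi(x, hill, H=5.0, nz=21)          # z[0] follows the terrain
```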

5.1.2 Elliptic Grid Generation

One simple yet efficient solution for the generation of smooth grids with sufficiently good properties is given by the solution of the Thompson–Thames–Mastin (TTM) problem [297], an elliptic system of partial differential equations. Two-dimensional elliptic grid generation was introduced for ocean circulation modeling in [259]. The penalty of elliptic methods is their higher cost with respect to algebraic methods. To control the point distribution with TTM, some parameters must be selected by the user; to overcome the need for parameter selection, an automatic elliptic grid generation method is proposed in [175]. A similar approach is described in [180] for grids around topography. In this recent paper, the author also uses an iterative method to smooth the grid on a layer-by-layer basis, with a check on the grid spacing to avoid the overlapping of grid cells. This check is necessary because the method of [180], unlike the elliptic scheme, is not designed to respect the maximum principle.
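For reference, in two dimensions the TTM system can be written as the pair of Poisson equations for the computational coordinates \((\xi ,\eta )\),

$$\begin{aligned} \nabla ^2\xi = P(\xi ,\eta ),\qquad \nabla ^2\eta = Q(\xi ,\eta ), \end{aligned}$$

where \(P\) and \(Q\) are the control functions used to attract or repel grid lines; in practice, the dependent and independent variables are interchanged and the resulting nonlinear system is solved for the physical coordinates of the grid nodes.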

5.1.3 Orthogonality

When it comes to high-resolution simulations with very fine LES grids, the boundary layer may be resolved explicitly. In this case, boundary-layer grids may be necessary for atmospheric models just as they are for, e.g., industrial flows at much smaller scales. Given how the atmospheric community is responding to the introduction of new grids, orthogonal boundary grids may still be seen as futuristic. However, their use is already common in the simulation of atmospheric flows at the micro-scale (i.e. 20–500 m domains; see, e.g., the Bolund experiment starting from [22]), so it seems appropriate to mention them here.

Orthogonality in three-dimensional structured grid generation is still an active field of work (see [175, 298]). The elliptic grid generation system described herein and implemented in [221] is able to reach reasonable orthogonality at the lower boundary either by using Neumann boundary conditions to move the nodes, or by a proper definition of the control functions as done in [175]. Currently, a quasi-orthogonal system is the best that can be achieved with the algorithms available in the literature. Figure 22 shows how a non-orthogonal grid is transformed into a quasi-orthogonal mesh in the proximity of the boundary. This grid was deliberately relaxed to the point where the boundary layer is completely lost, in order to clearly show orthogonality at the boundary. However, maintaining a proper stretching ratio in the proximity of the boundary without affecting orthogonality remains an open problem: a compromise is needed in building the grid, and experimentation on different topographies may be necessary.

We report a few two- and three-dimensional examples of grids generated using both algebraic and elliptic methods. Figure 22 shows the computational grid around a cosine function obtained by TFI, by TFI with an orthogonal multi-surface method, and by an elliptic grid generator. The geometry is straightforward to mesh and the three methods give similar results; however, using the elliptic method together with a multi-surface technique to achieve orthogonality clearly produces a better boundary-layer grid. The problem is taken a little further with the fully three-dimensional mesh of the Bolund hill in Denmark: the improvement in terms of the regularity of the grid in the internal volume and in terms of quasi-orthogonality is evident in panel (b) of Fig. 23, where the elliptic solver was applied with a few iterations to improve the algebraic grid of panel (a).

Fig. 22

a TFI, b orthogonal, c elliptic (not orthogonal)

Fig. 23

a TFI and b elliptic volume grids. In this plot there is no grid control in the proximity of the boundary surface. The elliptic grid is computed with 50 iterations

5.2 Adaptive Mesh Refinement

The term adaptive mesh refinement (AMR) describes mesh generation techniques in which the spatial resolution is adjusted depending on certain properties of the specific application. Within AMR one distinguishes between static and dynamic AMR: in static AMR the mesh is adjusted once at the beginning of the simulation, whereas dynamic AMR adapts the resolution throughout the simulation as a function of the structure of the solution, based on some pre-defined criterion.

The idea of increasing the resolution in part of the domain has a long history in scientific computing and also in meteorology. Usually this adjustment of the resolution is done by coupling two numerical models with different resolutions (so-called nesting). The easiest way to implement nesting is to first run a full coarse simulation and then use its result as boundary conditions for a higher-resolution simulation in a smaller domain. In this approach the result of the higher-resolution simulation cannot affect the coarse simulation; for this reason it is called one-way nesting. Examples of one-way nesting can be found in the 1976 work by Davies [72] and in the work of Miyakoda and Rosati [220]. More difficult but also more accurate is two-way nesting, in which both numerical models are allowed to interact with each other. This means that the result on the coarse mesh is not only used as an initial and boundary condition for the finer mesh, but the result on the finer mesh is also used to improve the accuracy of the simulation on the coarse mesh. An example of two-way nesting is the work by Zhang et al. [329]. Nesting does not need to be static: the domain of the higher-resolution simulation can move within the domain of the coarse simulation, as Ley and Elsberry did in 1976 [200]. Nor is nesting restricted to combining two simulations; more than two different resolutions can be combined, as in Ginis et al. [109].

An alternative to increasing the spatial resolution via nesting is to use variable mesh spacing in the different directions, as in Staniforth and Mitchell [280], or to adjust the mesh with the help of a coordinate transformation, as in Dietachmeyer and Droegemeier [75].

Dynamic AMR, in which the mesh is repeatedly adjusted according to the current intermediate result of the simulation, has been used in engineering applications for a long time [23, 24]. The first application of this kind of dynamic AMR in the atmospheric sciences was by Skamarock et al. [272] and Skamarock and Klemp [270]. A first approach to using dynamic adaptive mesh refinement operationally was given by the OMEGA model (Operational Multiscale Environment model with Grid Adaptivity), presented in the work of Bacon et al. [8]. Simulations of hurricane tracks by Gopalakrishnan et al. [121] demonstrate that the accuracy of hurricane simulations can be improved significantly by using dynamic AMR while at the same time reducing the runtime of the simulation. There are, however, still many open questions that need to be resolved before a broader application of dynamic AMR in the atmospheric sciences is possible [315]. More details about the historical evolution of AMR can be found in [18] and [159].

Within dynamic mesh refinement there are three possible approaches to adjust the accuracy of the simulation according to the current flow:

  • h-adaptive mesh refinement: the spatial resolution is adjusted by adding or removing grid points in the mesh. In an element-based method this is done by subdividing elements into smaller elements or merging elements into larger ones. The total number of elements and grid points is allowed to change in this approach, which sometimes makes it necessary to redistribute elements when multiple computing nodes are used for the computation. We discuss this approach in more detail below.

  • r-adaptive mesh refinement (or moving mesh): the grid points, and therefore the elements, are moved and deformed in such a way that the spatial resolution becomes finer in those parts of the domain where the accuracy of the simulation needs to be increased. This automatically reduces the density of grid points, and therefore the accuracy, in other parts of the domain. The total number of grid points and elements is constant in this approach. Examples of this approach can be found in the work of Kühnlein et al. [188], Budd and Williams [40], and Bauer et al. [16].

  • p-adaptive mesh refinement: the accuracy of the simulation is adjusted by changing the polynomial order of the spatial discretization. The size and location of the elements remain unchanged in this approach. One of the first descriptions of this approach can be found in the work of Babuska et al. [7]. Applications of this approach to geophysical modeling can be found in [303] and, earlier on, in [90].

These three approaches to dynamic AMR can be combined with each other, as in the work by Lang et al. [190], Pigott et al. [239], and [111] and references therein.

In the following we concentrate on h-adaptive mesh refinement in more detail. Within this approach we can distinguish between conforming and non-conforming AMR. A rising warm air bubble simulation using conforming and non-conforming AMR is shown in Fig. 24.

Fig. 24

Conforming (left) and non-conforming (right) AMR simulations of a rising thermal bubble. The left figure is adapted from [224]. The right one is adapted from [184]

5.3 Non-conforming Mesh Refinement

As explained in the section above, mesh refinement techniques create either conforming or non-conforming meshes. In conforming meshes each element has only one neighbor per element face (h-conforming), so a situation where more than two elements share the same face is not allowed. Also, the elements have to be p-conforming, that is, the approximating polynomials in neighboring elements are of the same order and the nodal points on both sides of the face coincide. An example of an hp-conforming element interface is shown in Fig. 25, where elements A and B exclusively share the same face (side) and have the same polynomial expansion order. In such a situation there are no additional requirements on the numerical method; the burden of creating a conforming mesh after the refinement falls entirely on the AMR algorithm.

Fig. 25

Different element interfaces, where AB is conforming, BC p-non-conforming, and CDE h-non-conforming. Dashed lines symbolize the higher order mesh within the elements

In non-conforming meshes, however, one needs to account for faces that are shared by more than two elements (h-non-conforming; see the interface between elements C, D, and E in Fig. 25) or with different polynomial approximations on the two sides (p-non-conforming; interface between elements B and C in Fig. 25). Note that Fig. 25 does not illustrate all possible non-conforming configurations: another possibility is an hp-non-conforming interface, where an edge (or face) is shared by more than two elements of different polynomial orders. To complete the discussion of meshes and element-based Galerkin methods, in this section we provide an overview of methods used to reconcile non-conforming elements. Since conforming AMR grids do not require any special handling by the numerical method, we discuss conforming AMR no further and concentrate on non-conforming AMR.

5.3.1 Mortar Element Method

The first to introduce a non-conforming formulation for spectral element methods were Maday et al. [205], who presented the mortar element method (MEM), where the domain is split into blocks of conforming elements, and a new trace space, namely mortars, is introduced to couple the non-conforming blocks. The MEM was an extension of classical non-conforming methods in the finite element community [51, 81, 285] with the difference that, besides being applied to the spectral element method, it did not rely on Lagrange multipliers or master-slave relations of non-conforming edges of the elements.

Fig. 26

Schematic of the mortar element method. The mortars bind non-conforming elements: the contributions from the element edges are summed on the mortar, and an \(L_2\) projection of the mortar data is applied back to the element edges. A single arrow represents direct assignment of the vertex value, while a double arrow represents the \(L_2\) projection

In MEM, the mortars are one-dimensional constructs in 2D (two-dimensional in 3D) with a polynomial space defined on them. The task of the mortar is to reconcile the \(C^0\) continuity condition between the non-conforming elements that the mortar connects; in other words, the mortar is an interface between the non-conforming element faces (see Fig. 26). The polynomial order on the mortar is typically chosen to match the highest-order expansion among the elements contributing to the mortar. The end-point values of the mortar solutions are constrained to match the values at the corresponding vertices of the original elements (represented by single arrows in Fig. 26). The integral projection operation (\(L_2\) projection) is defined to project the solution from the mortar to the interior points of the non-conforming element edges (double arrows in Fig. 26). If we write this operation in matrix form as \(Q_{n \times m}\), where \(n\) is the number of nodes on the element edge and \(m\) is the number of nodal points on the mortar, then the operation \(Q^T\) sums the contributions from the element edges onto the mortar. To reconcile the \(C^0\) continuity condition we first sum the contributions from the element edges on the mortar \((Q^T)\), perform a weighted averaging, and project the result back to the element edges \((Q)\). Note that this method only minimizes the discontinuity; it does not enforce strict \(C^0\) continuity. Even though we present here only a limited spectrum of possible non-conforming configurations, MEM is very general and can be applied in more complicated situations [205].

5.3.2 Pointwise-Matching Method

Another method, stemming from the finite element community, is the pointwise-matching method (PMM), also called the interpolation-based method [47, 93, 254]. In this approach both h- and p-non-conforming elements are allowed; however, it is assumed that for h-non-conforming elements there is one parent edge on one side of the interface and two children edges on the other side. In the p-non-conforming case, the parent edge is the one with the lower polynomial order.

Fig. 27

Schematic of the pointwise-matching method. Parent points are marked with filled circles. Values at children points depend on the values of the parent points. Here operation \(Q\) marks the interpolation from the parent element to children elements

In Fig. 27 the parent edge belongs to element C and the parent points are marked with filled circles. The values at the children points are interpolated from the parent points; here \(Q\) denotes this interpolation. To ensure strict \(C^0\) continuity, the solution from the children points is first added to the parent points via the operation \(Q^T\); the solution is then averaged at the parent edge and interpolated back onto the children edges via \(Q\). We mark the corner points of the children elements with filled circles, as the interpolation between those points and the corresponding points of the parent edge is trivial. Unlike MEM, continuity here is strictly enforced.

It is possible to express PMM using mortars and hence bring the two methods into one framework. This can be achieved by using the mortar infrastructure and replacing the \(L_2\) projection with interpolation; in such an approach, the choice of \(Q\) (projection or interpolation) defines the method. Traditionally, in MEM the size of the mortars is chosen to correspond to the short (children) edges, while in PMM the parent edge is used as the mortar analogue. Reference [265] investigates how different choices of the mortar size affect the performance of both PMM and MEM.

5.3.3 Mortar Elements for DG and Application to Atmospheric Simulations

The early work on MEM focused mainly on elliptic problems and spectral element methods [25, 205]. Kopriva [186] applied the MEM to compressible flows and the DG method by imposing an additional condition on the global conservation of the mortar approximation, as well as outflow conditions; the outflow condition required that the solution from the “upwind side” of the mortar remains unchanged after projection onto the mortar and back to the face. This work was later used for atmospheric simulations in [33, 184]. Based on the two-dimensional work reported in [184], Marras et al. [211] used MEM in a unified CG/DG shallow water model on the sphere (with static and dynamic adaptivity). An application of the pointwise-matching scheme to geophysical simulations can be found in [258]. In [185], Kopera and Giraldo presented a unified framework including both CG and DG methods, as well as the integral projection (for DG) and pointwise-matching (for CG) schemes for non-conforming interfaces, and found that similar mass conservation properties can be obtained for both configurations.

5.3.4 Unified CG/DG Non-conforming Method

The idea of a unified CG/DG method stems from the similarity of the two approaches: many of the mathematical operations, and therefore much of the code implementation, are the same, with the exception of the communication between elements. A time-step of a unified CG/DG method can be described by the following algorithm (a schematic sketch is given after the list).

  1. Evaluate volume integrals for each element and store them as the right-hand side \(RHS\).

  2. Perform inter-element communication:

     • For CG, perform Direct Stiffness Summation on \(RHS\) to ensure \(C^0\) continuity;

     • For DG, evaluate element-boundary integrals (fluxes) and update \(RHS\).

  3. Divide by the mass matrix. Notice that the mass matrix for CG corresponds to the assembled DG mass matrix.

  4. Perform the time-step \(\frac{\partial q}{\partial t} = RHS\).
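A schematic rendition of this time step could read as follows; all function names are placeholders rather than any model's actual API, and forward Euler stands in for the actual time integrator.

```python
def unified_step(q, dt, method, mesh):
    """One explicit step of a unified CG/DG solver (schematic sketch)."""
    rhs = volume_integrals(q, mesh)                  # 1. element-local volume integrals
    if method == "CG":
        rhs = direct_stiffness_summation(rhs, mesh)  # 2. enforce C^0 continuity
    else:  # DG
        rhs = rhs + flux_integrals(q, mesh)          # 2. element-boundary fluxes
    rhs = apply_inverse_mass_matrix(rhs, mesh, method)  # 3. divide by the mass matrix
    return q + dt * rhs                              # 4. advance dq/dt = RHS
```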

This recipe applies regardless of whether one constructs a conforming or a non-conforming method; the non-conforming element treatment affects only the inter-element communication step. Even though this step differs between the two methods, it is desirable to construct both non-conforming edge algorithms in a similar fashion. As discussed in the previous sections, using the mortar element method one can incorporate the integral projection method of [186] as well as the pointwise-matching method into the same framework. The implementation of a unified CG/DG method with non-conforming interfaces is described in detail in [184, 185]. Here we outline the general approach to both the CG (pointwise-matching method) and DG (integral projection method) treatment of non-conforming interfaces.

Fig. 28

Schematic of the mortar based non-conforming methods for both CG (a) and DG (b). Both methods follow the same algorithm of communicating data to the mortar, performing operations on the mortar and communicating the data back. The differences lie in the choice of mortars (mortar conforming with the long edge plus an additional point mortar to communicate between vertex neighbors for CG, mortars conforming with short edges for DG) and matrices used to communicate data to and from the non-conforming mortar. In the case of CG, the data is interpolated using matrices \(J_{1,2}^T\) and \(J_{1,2}\); for DG we use projection matrices \(P_{1,2}^s\) and \(P_{1,2}^g\)

Figure 28 shows a schematic of the unified approach to the inter-element communication of the CG and DG methods for conforming and non-conforming interfaces. Panel (a) shows the interpolation-based pointwise-matching method used for CG, while panel (b) presents the integral projection method for DG. In both cases the solution from the non-conforming edges is first communicated onto the mortars, then an appropriate action is performed on the conforming mortars, and the result is communicated back to the element edges. The first difference between the two approaches is the choice of the mortar: for the CG method we choose the mortar to be conforming with the longer parent edge (the so-called long rule [265]), while for the DG method we choose the shorter children edges to define the size of the mortar (short rule). Additionally, for the CG method we need to define an extra point mortar to ensure the communication between vertex-neighboring elements.

The second difference between the two methods is the choice of non-conforming communication matrices. For CG we perform an interpolation with the matrix \(J_k\):

$$\begin{aligned} J_{k,ij} = h_i(\xi ^k(z_j)), \quad k=1,2, \end{aligned}$$

where \(h_i\) are the basis functions defined on the mortar, \(\xi ^k\) is the map from the element edge coordinate \(z\) to the mortar coordinate \(\xi \), and \(z_j\) is the coordinate of the \(j\)th nodal point on the element edge. The interpolation matrices \(J_k\) scatter the solution from the mortar to the two children element edges; the opposite action, gather, is performed by the transpose matrices \(J_k^T\). The matrices \(J_k\) correspond to the operation \(Q\) shown in Fig. 27.
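As a small illustration, \(J_k\) can be assembled for Lagrange bases as follows; the node set and the edge-to-mortar map are illustrative assumptions.

```python
import numpy as np

def lagrange(nodes, x, i):
    """Evaluate the i-th Lagrange basis defined on `nodes` at the points x."""
    L = np.ones_like(x)
    for m, xm in enumerate(nodes):
        if m != i:
            L *= (x - xm) / (nodes[i] - xm)
    return L

def interpolation_matrix(mortar_nodes, edge_nodes, xi_k):
    """J_k[i, j] = h_i(xi_k(z_j)), as in the definition above."""
    zk = xi_k(np.asarray(edge_nodes))
    return np.array([lagrange(mortar_nodes, zk, i)
                     for i in range(len(mortar_nodes))])

# Example: p = 3 LGL points; the first child edge maps onto the half [-1, 0]
lgl = np.array([-1.0, -np.sqrt(0.2), np.sqrt(0.2), 1.0])
J1 = interpolation_matrix(lgl, lgl, lambda z: 0.5 * (z - 1.0))
```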

For the DG method the communication from parent element (long edge) to mortars requires projection matrices, defined in [186] by imposing the integral condition on a non-conforming interface

$$\begin{aligned} \int _{-1}^1 (q_M^k(\xi ) - q(\xi ^k(z))) \psi (\xi ) d\xi = 0,\quad k=1,2 \end{aligned}$$

where \(q_M^k(\xi )\) is the solution projected at the \(k\)th mortar, \(q(\xi ^k(z))\) is the solution at the parent edge, \(\xi \) is the coordinate defined on the mortar, \(z\) is the coordinate of the parent edge and \(\xi ^k(z)\) is the map between the parent edge and \(k\)th mortar coordinates. This integral condition can be expressed in matrix form as:

$$\begin{aligned} Mq_M^k - S^kq = 0, \end{aligned}$$

where \(M_{ij} = \int _{-1}^1 \psi _i(\xi ) \psi _j(\xi ) d\xi \) and \(S^k_{ij} = \int _{-1}^1 \psi _i(\xi )\psi _j(\xi ^k(z))\, d\xi \). The projection operation is defined as follows

$$\begin{aligned} q_M^k = M^{-1} S^k q = P_k^s q. \end{aligned}$$

\(P_k^s\) is the scatter projection matrix used to communicate the solution from parent edge to a mortar. The communication in the opposite direction is achieved by a gather projection matrix

$$\begin{aligned} q = \sum _{k=1}^2 P^g_k q_M^k, \end{aligned}$$

where \(P^g_k = \frac{1}{2}M^{-1}(S^k)^T\). The details are presented in [184, 186].
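Under similar illustrative assumptions, the matrices \(M\), \(S^k\), \(P_k^s\), and \(P_k^g\) can be assembled by Gauss quadrature; the orientation of the coordinate map (from the mortar to the parent edge) is an assumption of this sketch.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def lagrange(nodes, x, i):
    """Evaluate the i-th Lagrange basis defined on `nodes` at the points x."""
    L = np.ones_like(x)
    for m, xm in enumerate(nodes):
        if m != i:
            L *= (x - xm) / (nodes[i] - xm)
    return L

def projection_matrices(nodes, zmap, nq=10):
    """Scatter P_k^s and gather P_k^g for one mortar; zmap sends the mortar
    coordinate xi in [-1, 1] to the matching parent-edge coordinate."""
    xi, w = leggauss(nq)                       # Gauss quadrature on [-1, 1]
    n = len(nodes)
    psi_m = np.array([lagrange(nodes, xi, i) for i in range(n)])        # mortar side
    psi_p = np.array([lagrange(nodes, zmap(xi), i) for i in range(n)])  # parent side
    M = (psi_m * w) @ psi_m.T                  # M_ij   = int psi_i psi_j dxi
    S = (psi_m * w) @ psi_p.T                  # S^k_ij = int psi_i(xi) psi_j(z) dxi
    Ps = np.linalg.solve(M, S)                 # scatter: q_M^k = P_k^s q
    Pg = 0.5 * np.linalg.solve(M, S.T)         # gather:  q = sum_k P_k^g q_M^k
    return Ps, Pg

# Example: mortar conforming with the first child, covering z in [-1, 0]
lgl = np.array([-1.0, -np.sqrt(0.2), np.sqrt(0.2), 1.0])
Ps1, Pg1 = projection_matrices(lgl, lambda xi: 0.5 * (xi - 1.0))
```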

Finally, the third difference between the inter-element communication of the CG and DG methods lies in the action performed on the mortar. Once the solutions from the element edges are interpolated/projected onto the mortars, the CG method averages the solution before scattering it back to the respective edges, while in the DG method we compute a flux, which is then projected back to the elements. This action is identical for conforming and non-conforming edges, as mortars are by design conforming to each other. Note that whenever the mortar and element edge spaces are conforming, the communication matrix (be it interpolation or projection) is the identity matrix. It is also important to note that while Fig. 28 presents schematics of both methods, the actual implementation in the code may differ. Details of the implementation of the CG interpolation method are presented in [185], along with a unified implementation of the CG/DG method.

5.3.5 Performance of the Non-conforming Adaptive Mesh Refinement

Adaptive mesh refinement and element-based Galerkin methods are a natural match due to the element-local nature of the most computationally intensive calculations. The edge-based computations can also be implemented efficiently even for non-conforming edges. Figure 29 plots the speed-up achieved by the AMR algorithm against the ratio of the element count of a reference simulation to the average element count of the AMR simulation. If there were no AMR overhead, a simulation with an element ratio of 10 would yield a speed-up of 10, which is represented by the solid black line. The study was performed for three different time integration methods, namely the explicit 3rd-order 5-stage Runge–Kutta method and two second-order implicit–explicit methods, BDF2 and ARK2. The explicit method virtually overlaps with the ideal speed-up line, while the IMEX methods show more AMR overhead; the sources of this decreased speed-up are studied in [184]. This result shows, however, that the adaptive mesh algorithm can be implemented with minimal overhead.

Fig. 29

Performance of the adaptive mesh algorithm with the DG method. Speed-up is defined as the ratio of the runtime of a reference simulation with a fully refined mesh to the runtime of the adaptive simulation. The element ratio measures the ratio of the element count of the fully refined simulation to the average element count of the AMR simulation. The solid black line represents ideal speed-up. Results are reported for an explicit Runge–Kutta time integration (RK35) and two implicit–explicit schemes (BDF2 and ARK2). The simulation was set up as a standard density current test case. Source: [184]

Table 3 shows the breakdown of time spent in different parts of the code, run on a single CPU, with detailed timings for the different stages of the AMR algorithm. The biggest share of the time spent adapting the mesh falls on the criterion evaluation, even for the very simple threshold criterion used in this study. The time spent evaluating fluxes on non-conforming interfaces is in the range of 2–3 % for all methods; however, this cannot be considered pure overhead. A single non-conforming interface replaces two regular faces, so the time spent computing those fluxes should be compared with the time the code would spend computing fluxes on the faces that the non-conforming interface replaces. Additionally, the non-conforming faces reduce the number of elements in the domain and therefore the time spent on the volume integration. The important message is that the computation of fluxes on non-conforming interfaces has a minor cost compared to other parts of the code.
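As an illustration of the kind of criterion meant above, a threshold criterion can be written in a few lines; the exact indicator and thresholds used in the study are given in [184], so the following is only an assumed form.

```python
def threshold_criterion(q_elem, refine_tol, coarsen_tol):
    """Flag one element: +1 to refine, -1 to coarsen, 0 to leave as is,
    based on the maximum magnitude of a tracked quantity q over the
    element (an assumed indicator, not necessarily that of [184])."""
    indicator = abs(q_elem).max()
    if indicator > refine_tol:
        return +1
    if indicator < coarsen_tol:
        return -1
    return 0
```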

Table 3 Timings (in seconds of runtime) and percentage breakdown of different AMR components for the density current case

Finally, the element-based Galerkin methods preserve their original mass conservation properties even when used with non-conforming meshes. This is discussed in detail in [185] and presented in Fig. 30. Both CG and DG methods conserve mass to machine precision even for very long simulation times. For the density current case presented in Fig. 30, the typical simulation time is 900 s, while the test was run until 10,000 s.
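A mass-conservation check like the one in Fig. 30 amounts to evaluating the global mass integral by quadrature at each output time and comparing it with the initial value; a minimal sketch, assuming a flat per-element array layout, follows.

```python
import numpy as np

def total_mass(rho, jac, w):
    """Global mass by quadrature: sum over elements e and quadrature
    points q of rho[e, q] * jac[e, q] * w[q], where jac holds the
    Jacobian determinants and w the quadrature weights (assumed layout)."""
    return np.einsum('eq,eq,q->', rho, jac, w)

# Relative drift over the run; machine precision (~1e-15) is the
# expected magnitude for both CG and DG, per [185]:
# drift = abs(total_mass(rho_now, jac, w) - m0) / m0
```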

Fig. 30

Mass conservation as a function of simulation time for a dynamically adaptive mesh. Triangular markers denote the CG results and circular markers represent the DG results. Colors represent different polynomial orders

6 Summary

After briefly reviewing the different mathematical models of the atmosphere (i.e. equation sets), in this article we have concentrated on their numerical solution by means of element-based Galerkin (EBG) methods. Emphasis was given to the finite and spectral element methods on the one hand and to the nodal discontinuous Galerkin method on the other. Element-based Galerkin methods appear to be perfectly suited for building next-generation climate and numerical weather prediction (NWP) models because of their geometric flexibility (unstructured grids and AMR) and high parallel scalability (on both CPU- and GPU-based architectures; see, e.g., [231] and the citations therein). Some issues remain essentially unsolved (e.g., stabilization) but, as we have shown in this manuscript, there are numerous options that appear to satisfy the requirements for building robust, efficient, and positivity-preserving numerical models. Finally, because atmospheric modeling is aiming at higher and higher resolution and, hence, well-resolved topography, we discussed the issue of grid generation and gave some insight into its evolution in NWP.