1 Introduction

Compressible wall-bounded flows play an important role in many aerospace applications of industrial and academic interest. The direct numerical solution (DNS) of the compressible Navier–Stokes equations for wall-bounded turbulent flows has recently become affordable owing to the large increase in available computer power, and canonical incompressible flows have been simulated up to high Reynolds number [2]. However, it is known that the numerical solution of the compressible Navier–Stokes equations is significantly more time consuming than that of their incompressible counterpart, partly owing to the inherently higher number of floating point operations (flops) per grid point, but mainly because of the much smaller time step imposed by the acoustic stability restriction. In free-shear flows, conventional explicit algorithms can still be used efficiently as long as the typical Mach number is of the order of unity. However, wall-bounded flows inevitably include regions with near stagnant flow and tiny grid spacing adjacent to solid surfaces, which makes the acoustic time step limitation in the wall-normal direction dominant, even at high bulk Mach numbers. Besides being dictated by stability considerations, time step limitations in turbulent flows also have a physical interpretation, as in order to capture the relevant physics of transport phenomena with given speed (say U) on a mesh with given size (say \(\varDelta \)), time steps no larger than \(\varDelta /U\) should be used. Hence, CFL numbers (defined as the ratio of the time advancement step to the maximum allowed time step for explicit time integration) should always be of the order of unity for genuine DNS. In compressible flows, information propagates simultaneously at the hydrodynamic and at the acoustic speed. However, acoustic waves typically make a negligible contribution to the overall energetics of turbulent flows [3]. Hence, with the obvious exception of cases where acoustic instabilities play an important role, such as in certain combustion applications [4] or in direct simulation of aerodynamic noise [5], using a time step which resolves the hydrodynamic (vortical) mode while giving up accurate representation of acoustic phenomena may be a legitimate choice, which in fact underlies much of the research carried out on low-speed solvers.

It is the goal of this paper to develop a numerical algorithm for direct numerical simulation of compressible flow which is capable of seamless, efficient operation throughout the Mach number range, down to nearly incompressible conditions. The algorithm is at the same time meant to remove or at least alleviate the acoustic time step limitation in the presence of solid boundaries. To gain a clearer perception of the problem, we refer to a canonical compressible boundary layer flow over a flat surface, or flow in a planar channel. Let \(\varDelta x\), \(\varDelta z\) be the mesh spacings in the streamwise and spanwise directions, respectively, and let \(\varDelta y\) be the minimum mesh spacing in the wall-normal direction. Assuming unit CFL number, the time step limitations associated with the discretization of the convective terms in the coordinate directions are

$$\begin{aligned} \varDelta t_x^+= & {} \frac{\varDelta x^+}{\max (u_0^++c_0^+,c_w^+)}= \varDelta x^+ M_0 \sqrt{{C_f}/{2}} \min \left( 1, \frac{1}{1+M_0} \sqrt{{T_w}/{T_0}} \right) , \nonumber \\ \varDelta t_y^+= & {} \frac{\varDelta y^+}{c_w^+}= \varDelta y^+ M_0 \sqrt{{C_f}/{2}} \nonumber \\ \varDelta t_z^+= & {} \frac{\varDelta z^+}{\max (c_0^+,c_w^+)}= {\varDelta z^+} M_0 \sqrt{{C_f}/{2}} \min \left( 1, \sqrt{{T_w}/{T_0}} \right) , \end{aligned}$$
(1)

where the ‘+’ superscript is used to denote quantities made nondimensional with respect to local wall units, namely the friction velocity \(u_{\tau }=(\tau _w/\rho _w)^{1/2}\), and the viscous length scale \(\delta _v=\nu _w/u_{\tau }\), the subscript 0 is used to denote flow properties at the centerline (for channels) and at the free-stream (for boundary layers), and w to denote wall properties, with \(C_f=2 \tau _w/(\rho _0 u_0^2)\) the friction coefficient. \(\rho \), u, T and \(\nu \) denote the density, velocity, temperature and kinematic viscosity, whereas M, c and \(\tau _w\) are the Mach number, speed of sound and wall stress, respectively. It should be noted that if acoustic waves are suppressed, as is the case of strictly incompressible flow, the time step is controlled by the streamwise direction, and

$$\begin{aligned} \varDelta t_I^+={\varDelta x^+} \sqrt{C_f/2}. \end{aligned}$$
(2)

The viscous time step limitation is mainly effective in the wall-normal direction, and in wall units one has

$$\begin{aligned} \varDelta t_{yv}^+= {\varDelta y^+}^2. \end{aligned}$$
(3)
Fig. 1 Inviscid time step limitation in the coordinate directions as from Eq. (1) as a function of the reference Mach number \(M_0\). In panel a we show \(\varDelta t_x\) (solid), \(\varDelta t_y\) (dashed), \(\varDelta t_z\) (dot-dashed). In panel b we show the ratios \(\varDelta t_x/\varDelta t_y\) (solid), \(\varDelta t_z/\varDelta t_y\) (dot-dashed). For reference, in panel a we report with a grey line the ‘incompressible’ time limitation given in Eq. (2). The symbols denote the time step limits for the present semi-implicit algorithm as dictated by accuracy (circles) and stability (squares), as discussed in Sect. 3.2

For the sake of graphical representation of the above formulas, we assume: (i) the distance of the first point from the wall is \(\varDelta y_w^+ \approx 0.7\), which is the maximum value for which accurate turbulence statistics are obtained [6]; (ii) the minimum mesh spacing in the wall-normal direction is \(\varDelta y = 2 \varDelta y_w\), which can be achieved by staggering the mesh in the vertical direction, thus alleviating the stability restrictions [6]; (iii) the wall-parallel mesh spacings are \(\varDelta x^+ = 8\), \(\varDelta z^+ = 4\), which is typical for DNS; (iv) the wall is isothermal, with \(T_w=T_0\). Figure 1 shows the inviscid time step restrictions according to Eq. (1) as a function of the reference Mach number \(M_0\), scaled by \(\sqrt{C_f/2}\) (panel a), and as a fraction of the wall-normal allowed time step (panel b). The inefficiency of explicit compressible solvers is apparent in the low-Mach-number regime, where vanishingly small time steps are required. Time steps comparable to those achievable in incompressible flow are only possible starting at \(M_0 \approx 3\). With the exception of hypersonic flow, the most restrictive time limitation is that associated with the vertical direction, and an increase by at least a factor of two can be gained by removing it (see panel b). It is also interesting to note that the acoustic time limitation in the spanwise direction is more restrictive than the streamwise limitation up to \(M_0 \approx 1\), whereas at supersonic Mach numbers the convective limitation in x is controlling. Removing the wall-normal acoustic time limitation in supersonic flow is sufficient to achieve a similar time step as in incompressible flow, whereas in subsonic flow it is also necessary to remove the acoustic time restriction in the wall-parallel directions. We further note that the normalized viscous time limitation \(\varDelta t_{yv}^+/\sqrt{C_f/2}\), with \(\varDelta t_{yv}^+\) given in Eq. (3), is always much weaker than the convective ones, provided \(\varDelta y^+ \sim 1\), and considering that the range of friction coefficients typically accessed by DNS is \(2 \times 10^{-3} \le C_f \le 6 \times 10^{-3}\). The above estimates are deduced for typical DNS mesh spacings, but the case of wall-resolved RANS, LES and DES is even more severe, as the aspect ratio of near-wall cells is substantially higher, hence making suppression of the wall-normal time step restriction mandatory for any practical calculation. This is even more important in the case of curvilinear coordinate systems with singular metrics. For instance, in the case of cylindrical coordinates the mesh spacing (hence, the allowed time step for explicit stepping) becomes exceedingly small in the azimuthal direction, even in the absence of walls.
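
The estimates of Eqs. (1)–(3) are easy to tabulate. The following Python sketch evaluates the time step limits divided by \(\sqrt{C_f/2}\) under the assumptions listed above (\(\varDelta x^+ = 8\), \(\varDelta y^+ = 1.4\), \(\varDelta z^+ = 4\), \(T_w = T_0\)); the function names and default arguments are illustrative choices, not part of the original work.

```python
import math

def inviscid_limits(m0, dx=8.0, dy=1.4, dz=4.0, tw_over_t0=1.0):
    """Eq. (1) divided by sqrt(Cf/2): limits in x, y (wall-normal) and z."""
    r = math.sqrt(tw_over_t0)
    dt_x = dx * m0 * min(1.0, r / (1.0 + m0))
    dt_y = dy * m0
    dt_z = dz * m0 * min(1.0, r)
    return dt_x, dt_y, dt_z

def incompressible_limit(dx=8.0):
    """Eq. (2) divided by sqrt(Cf/2)."""
    return dx

def viscous_limit(cf, dy=1.4):
    """Eq. (3) divided by sqrt(Cf/2)."""
    return dy**2 / math.sqrt(cf / 2.0)

# Tabulate the limits for a few reference Mach numbers.
for m0 in (0.1, 0.5, 1.0, 2.0, 3.0):
    dt_x, dt_y, dt_z = inviscid_limits(m0)
    print(f"M0={m0:3.1f}  dt_x={dt_x:5.2f}  dt_y={dt_y:5.2f}  dt_z={dt_z:5.2f}"
          f"  incompressible={incompressible_limit():4.1f}")
```

With these inputs the streamwise and spanwise limits coincide at \(M_0 = 1\), and the wall-normal limit is the smallest of the three at all the Mach numbers tabulated.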

The above-mentioned difficulties are well known to the CFD community, and a variety of techniques have been developed to cope with the numerical stiffness of the compressible Navier–Stokes equations. The chief choice in this respect has traditionally been the use of (semi-)implicit time integration schemes. A landmark contribution in this sense was given by [7, 8], who proposed a time-implicit algorithm for the solution of the Navier–Stokes equations in conservative form based on linearization of the convective and viscous flux vectors, coupled with approximate factorization [9] to handle multiple space dimensions. However, the method is computationally expensive as it requires the inversion of \(5 \times 5\) block-banded systems of equations, which is more expensive than, e.g., standard banded systems. In this respect we note that, whereas the classical Thomas algorithm for tridiagonal matrices requires a number of floating point operations (flops) of O(6N) (where N is the number of grid points in a given coordinate direction), its block-tridiagonal version requires \(O(3N(M^3+M^2))\) flops, where M (\(=\)5 in the Beam–Warming algorithm) is the size of each block [10]. The computational cost is about twice as much in the case of periodic boundary conditions [11]. Pulliam and Chaussee [12] developed a variant of the Beam–Warming algorithm which involves the inversion of standard tridiagonal systems rather than block matrices, with large savings of computer time, but with loss of accuracy and stability in the case of unsteady simulations [13]. Algorithms of the Beam–Warming family are at the heart of highly successful aerospace CFD software [14, 15]. For instance, [16] have recently developed an implicit fourth-order Runge–Kutta scheme, using residual smoothing based on a bilaplacian operator. The algorithm requires the inversion of one scalar pentadiagonal system in each space direction per sub-step, allowing for a net speedup of 3–5 with respect to the explicit case. Algorithms which avoid inversion of banded systems of equations have also been designed [17], which may be useful for efficient parallel implementation. However, those algorithms require point-wise iterative procedures whereby the right-hand side of the equations must be evaluated several times per time step, with unclear outcome in terms of overall efficiency.
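
For orientation, the scalar tridiagonal solve referred to above can be sketched as follows. This is a textbook Thomas algorithm written in Python with NumPy; the function name `thomas_solve` and the array layout are assumptions made for this illustration, and no pivoting is performed, so a diagonally dominant matrix is assumed.

```python
import numpy as np

def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system without pivoting.

    a: sub-diagonal (a[0] is ignored), b: main diagonal,
    c: super-diagonal (c[-1] is ignored), d: right-hand side.
    All arrays have length n.
    """
    n = len(d)
    c_prime = np.empty(n)
    d_prime = np.empty(n)
    # Forward elimination
    c_prime[0] = c[0] / b[0]
    d_prime[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * c_prime[i - 1]
        c_prime[i] = c[i] / denom
        d_prime[i] = (d[i] - a[i] * d_prime[i - 1]) / denom
    # Back substitution
    x = np.empty(n)
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x
```

The work per grid point is a handful of multiplications and divisions, whereas a block version replaces each scalar division by the solution of an \(M \times M\) system, which is the origin of the cubic dependence on M quoted above.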

Alternative approaches to circumvent the stiffness of the compressible Navier–Stokes equations rely on the use of preconditioning techniques, based on the attempt to change the eigenvalues of the system of equations in order to remove the large disparity of wave speeds. This is accomplished by pre-multiplying the time derivatives by a matrix that slows the speed of the acoustic waves down toward the fluid speed [18, 19]. Preconditioning is the method of choice for steady-state applications; however, its extension to unsteady flow problems is not straightforward, requiring the use of dual time stepping techniques, namely inner iterations in terms of a pseudo-time [20,21,22]. Moreover, the number of iterations per physical time step can be very large, with consequent loss of computational efficiency.

An important class of algorithms for low-Mach-number flows relies on the concept of asymptotic consistency [23]. The main idea is that the discrete compressible Navier–Stokes equations should automatically reduce to their discrete incompressible counterpart as the Mach number approaches zero. Examples of this approach include the use of the compressible isentropic Navier–Stokes equations, which are then discretized with a semi-implicit scheme, yielding a formulation similar to the classical projection method for incompressible flow [24, 25]. Asymptotic-preserving methods are certainly interesting as they allow the acoustic time step limitation to be removed entirely, but they suffer from the same drawback as incompressible solvers, as in the incompressible limit they require the solution of a Poisson equation, which is computationally expensive in multiple space dimensions.

Specialized algorithms for the Navier–Stokes equations have also been developed for the low-Mach-number regime, which account for temperature-dependent density variations, as is typically the case in combustion. All these variable-density algorithms are based on the idea that only the terms which bring an acoustic contribution should be advanced implicitly in time, in such a way that the acoustic time limitation is removed. Numerical schemes of this kind were pioneered by [26], who proposed to treat implicitly only the pressure term in the momentum equation and the dilatation term in the internal energy equation, which results in having to solve an elliptic equation for pressure, with large incurred overhead. [27, 28] extended the classical pressure-correction method [29] to variable-density flows by solving a Helmholtz equation for the pressure correction, with the use of sub-iterations. LES were carried out in which a time step forty times larger than in the explicit case was achieved, with modest computational cost overhead. Moureau et al. [30] developed an implicit scheme for the removal of the acoustic limitation which also relies on the solution of a Helmholtz equation, however without resorting to sub-iterations, with an overhead CPU time of about \(25\%\) with respect to standard incompressible solvers. Hence it appears that, in one way or another, algorithms tailored for the near-incompressible regime involve iterative procedures and/or the inversion of elliptic systems of equations. The latter can only be carried out efficiently in the case that periodic directions are present, which allows for the use of direct solvers [31].

In this paper we develop a novel semi-implicit algorithm for the compressible Navier–Stokes equations based on a modification of the basic Beam–Warming linearization, thus avoiding any iterative procedure. The algorithm is presented in Sect. 2, which also includes a discussion of the treatment of viscous terms, accurate time integration by the use of a third-order semi-implicit Runge–Kutta scheme, and extension to multiple space dimensions. Numerical examples are given in Sect. 3, which include DNS of turbulent flows from the low subsonic to the supersonic regime. Final remarks and suggestions for future work are given in Sect. 4.

2 Formulation of the Algorithm

The Navier–Stokes equations for a compressible perfect gas are considered in which the total energy equation is replaced with the entropy equation

$$\begin{aligned} \frac{\partial \mathbf {w}}{\partial t}=-\sum _{i=1}^3\frac{\partial \mathbf {f}_i}{\partial x_i}+\sum _{i=1}^3\frac{\partial \mathbf {f}_i^v}{\partial x_i} + \mathbf {S} = \mathbf {R} , \end{aligned}$$
(4)

where \(\mathbf {R}\) is the right-hand side of Eq. (4), \(\mathbf {w}\) is the vector of the conserved variables, \(\mathbf {f}_i\) and \(\mathbf {f}_i^v\) are the convective and viscous fluxes in the ith direction, with x, y, z the streamwise, wall-normal and spanwise directions, and \(\mathbf {S}\) the source terms in the entropy equation,

$$\begin{aligned} \mathbf {w}= \begin{bmatrix} \rho \\ \rho u_j\\\rho s \end{bmatrix}, \quad \mathbf {f}_i= \begin{bmatrix} \rho u_i\\ \rho u_iu_j + p\delta _{ij}\\\rho u_i s \end{bmatrix}, \quad \mathbf {f}_i^v= \begin{bmatrix} 0 \\ \sigma _{ij}\\ -{q_i}/{T} \end{bmatrix}, \quad \mathbf {S}= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ \frac{\sigma _{\ell m}}{T}\frac{\partial u_{\ell }}{\partial x_m} -\frac{q_{\ell }}{T^2}\frac{\partial T}{\partial x_{\ell }} \end{bmatrix}, \end{aligned}$$
(5)

where \(\rho \) is the density, p is the pressure, T is the temperature and \(u_i,\,i=1,2,3\) the velocity components in the ith direction (also denoted as u, v, w in the following), \(s=c_v\ln {(p\rho ^{-\gamma })}\) is the entropy per unit mass, and \(\sigma _{ij}\) and \(q_i\) are the components of the viscous stress tensor and of the heat flux vector, respectively,

$$\begin{aligned} \sigma _{ij}=\mu \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} -\frac{2}{3} \frac{\partial u_k}{\partial x_k} \delta _{ij}\right) , \quad q_i=-k\frac{\partial T}{\partial x_i}, \end{aligned}$$
(6)

where \(\mu \) is the dynamic viscosity, \(k=\mu c_p/{\textit{Pr}}\) the thermal conductivity and \({\textit{Pr}}=0.72\) the molecular Prandtl number.

As shown in the following, the use of the entropy equation is instrumental to achieving efficient implicit treatment of the acoustic terms, and also yields benefits in terms of increased robustness as compared to algorithms solving for the energy equation [32, 33]. On the other hand, this setting prevents correct capturing of shock waves, as the entropy equation cannot be used as a conservation law [33, 34], hence in the following we restrict ourselves to discussing the case of smooth compressible flows.

2.1 Implicit Treatment of Acoustic Waves

In order to remove (or at least alleviate) the acoustic time step limitation in the generic coordinate direction (say, y), we proceed by splitting the convective flux vector into a purely advective part, and a part which supports acoustic fluctuations, namely

$$\begin{aligned} \mathbf {f}_y = \mathbf {f}_y^c + \mathbf {f}_y^a, \quad \mathbf {f}_y^c = \begin{bmatrix} 0 \\ \rho u v \\ \rho v^2 \\ \rho v w \\ \rho v s \end{bmatrix}, \quad \mathbf {f}_y^a = \begin{bmatrix} \rho v \\ 0 \\ p \\ 0 \\ 0 \end{bmatrix} . \end{aligned}$$
(7)

In a linearized setting, this splitting yields full decoupling of the acoustic, vortical and entropy modes [35]. Of course, such decoupling does not directly extend to the nonlinear case, and the ansatz (7) is mainly instrumental to suppressing the acoustic time limitation, its validity to be judged a posteriori by its success. The main advantage for numerical purposes is that the acoustic partial flux Jacobian has a simple structure,

$$\begin{aligned} \mathbf {A}_y^a = \frac{\partial {\mathbf {f}_y^a}}{\partial \mathbf {w}} = \begin{bmatrix} 0&0&1&0&0\\ 0&0&0&0&0\\ \frac{p}{\rho }\left( \gamma -\frac{s}{c_v}\right)&0&0&0&\frac{p}{\rho c_v}\\ 0&0&0&0&0\\ 0&0&0&0&0\\ \end{bmatrix} . \end{aligned}$$
(8)
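
The non-zero entries of the acoustic Jacobian follow from \(p = \rho ^{\gamma } e^{s/c_v}\) with \(s = w_5/w_1\). As a sanity check, the sketch below compares the analytical matrix with a central finite-difference Jacobian of the acoustic partial flux; the numerical values of \(\gamma \), \(c_v\) and of the test state are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

GAMMA = 1.4
C_V = 2.5  # arbitrary positive value in nondimensional units (assumption)

def pressure(w):
    """p = rho^gamma * exp(s / c_v), with w = [rho, rho u, rho v, rho w, rho s]."""
    rho, rho_s = w[0], w[4]
    return rho**GAMMA * np.exp(rho_s / (rho * C_V))

def acoustic_flux(w):
    """Acoustic partial flux in y: [rho v, 0, p, 0, 0]."""
    return np.array([w[2], 0.0, pressure(w), 0.0, 0.0])

def acoustic_jacobian(w):
    """Analytical Jacobian of the acoustic partial flux."""
    rho = w[0]
    s = w[4] / rho
    p = pressure(w)
    jac = np.zeros((5, 5))
    jac[0, 2] = 1.0
    jac[2, 0] = p / rho * (GAMMA - s / C_V)
    jac[2, 4] = p / (rho * C_V)
    return jac

def fd_jacobian(flux, w, eps=1e-6):
    """Central finite-difference Jacobian, one column per conserved variable."""
    n = len(w)
    jac = np.zeros((n, n))
    for j in range(n):
        dw = np.zeros(n)
        dw[j] = eps
        jac[:, j] = (flux(w + dw) - flux(w - dw)) / (2.0 * eps)
    return jac

w_test = np.array([1.2, 0.3, -0.1, 0.05, 1.2 * 0.4])  # arbitrary smooth state
max_err = np.max(np.abs(acoustic_jacobian(w_test) - fd_jacobian(acoustic_flux, w_test)))
print("max |analytical - finite difference| =", max_err)
```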

Splitting of the flux vectors into pressure and velocity contributions was previously considered by [36, 37], based on the attempt to reduce the block size in the implicit operator as compared to the Beam–Warming algorithm. In essence, these decompositions amounted [14] to isolating the pressure gradient in the momentum equation and the pressure flux in the total energy equation. However, besides being consistent with wave decomposition in a linear setting, we find the splitting (7) to be vastly more robust in practice.

We proceed to discretize Eq. (4) between two consecutive time levels n and \(n+1\), by evaluating explicitly the advective partial flux, and evaluating the acoustic partial flux implicitly, upon linearization about time level n, namely

$$\begin{aligned} {\mathbf {f}_y^a}^{n+1} = {\mathbf {f}_y^a}^{n} + {\mathbf {A}_y^a}^n \left( \mathbf {w}^{n+1} - \mathbf {w}^{n} \right) + O(\varDelta t^2), \end{aligned}$$
(9)

thus obtaining

$$\begin{aligned} \left( \mathbf {I} + \varDelta t \frac{\partial }{\partial y} {\mathbf {A}_y^a}^n \right) \varDelta \mathbf {w}^{n} = - \varDelta t \frac{\partial {\mathbf {f}^n_y}}{\partial y} + \varDelta t \mathbf {F}_{xz}^n = \varDelta t \, \mathbf {R}^n, \end{aligned}$$
(10)

where \(\varDelta \mathbf {w}^{n} = \mathbf {w}^{n+1}-\mathbf {w}^n\), and where terms containing transverse flux derivatives and viscous terms are lumped together into \(\mathbf {F}_{xz}\). It is important to note that, because of the special structure of the acoustic flux Jacobian, the inversion of Eq. (10) is much simpler than for the standard Beam–Warming algorithm, which relies on linearization of the full convective flux. Component-wise, Eq. (10) reads

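$$\begin{aligned} \varDelta w_1^n + \varDelta t \frac{\partial \varDelta w_3^n}{\partial y}&= \varDelta t R_1^n, \\ \varDelta w_2^n&= \varDelta t R_2^n, \\ \varDelta w_3^n + \varDelta t \frac{\partial }{\partial y} \left( {A_y^a}^n_{31} \varDelta w_1^n + {A_y^a}^n_{35} \varDelta w_5^n \right)&= \varDelta t R_3^n, \\ \varDelta w_4^n&= \varDelta t R_4^n, \\ \varDelta w_5^n&= \varDelta t R_5^n. \end{aligned}$$
(11)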

Hence, the time increments of entropy and of the transverse velocity components can be evaluated explicitly, thus effectively reducing the system of equations to be solved to

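$$\begin{aligned} \varDelta w_1^n + \varDelta t \frac{\partial \varDelta w_3^n}{\partial y}&= \varDelta t R_1^n, \qquad \text{(12a)} \\ \varDelta w_3^n + \varDelta t \frac{\partial }{\partial y} \left( {A_y^a}^n_{31} \varDelta w_1^n \right)&= \varDelta t \widehat{R}_3^n, \quad \widehat{R}_3^n = R_3^n - \varDelta t \frac{\partial }{\partial y} \left( {A_y^a}^n_{35} R_5^n \right) , \qquad \text{(12b)} \end{aligned}$$
(12)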

which, upon discretization of the space derivative operators, yields a \(2 \times 2\) block-banded system of equations, whose solution returns the time increments of \(\rho \) and \(\rho v\). Equation (12) can be further rearranged by formally solving for \(\varDelta w^n_1\) in (12a), to obtain

$$\begin{aligned} \left( 1 - \varDelta t^2 {A_y^a}^n_{31} \frac{\partial ^2}{\partial y^2} - \varDelta t^2 \frac{\partial {A_y^a}^n_{31}}{\partial y} \frac{\partial }{\partial y} \right) \varDelta w^n_3 = \varDelta t \widehat{R}_3^n - \varDelta t^2 \frac{\partial }{\partial y} \left( {A_y^a}^n_{31} R_1^n \right) , \end{aligned}$$
(13)

whose solution requires the inversion of a single ordinary banded system of equations, with bandwidth depending on the accuracy in the approximation of the first and second space derivative operators. Back substitution into (12a) then returns the time increment of density. Although apparently cumbersome, we find the latter formulation to be more computationally efficient than the solution of the \(2 \times 2\) block system given by Eq. (12), while the accuracy is nearly identical. Hence, Eq. (13) is used in all the forthcoming numerical applications.
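
The equivalence between the \(2 \times 2\) block system and the reduced scalar equation can be illustrated with a small one-dimensional experiment. The sketch below is not the implementation used in the paper: it assumes a periodic domain, dense matrices, second-order central differences, and it discretizes the second-derivative term in the conservative form \(D (A_{31} D \, \cdot )\), for which the two formulations coincide to round-off. All function names are hypothetical.

```python
import numpy as np

def central_diff_matrix(n, h):
    """Periodic second-order central first-derivative matrix."""
    D = np.zeros((n, n))
    for i in range(n):
        D[i, (i + 1) % n] = 0.5 / h
        D[i, (i - 1) % n] = -0.5 / h
    return D

def solve_block(D, a31, r1, r3hat, dt):
    """Solve the coupled system for (dw1, dw3):
    dw1 + dt D dw3 = dt r1,  dw3 + dt D (a31 dw1) = dt r3hat."""
    n = len(r1)
    eye = np.eye(n)
    lhs = np.block([[eye, dt * D], [dt * D @ np.diag(a31), eye]])
    sol = np.linalg.solve(lhs, dt * np.concatenate([r1, r3hat]))
    return sol[:n], sol[n:]

def solve_reduced(D, a31, r1, r3hat, dt):
    """Eliminate dw1, solve a single scalar system for dw3, then back-substitute."""
    n = len(r1)
    lhs = np.eye(n) - dt**2 * D @ np.diag(a31) @ D
    rhs = dt * r3hat - dt**2 * D @ (a31 * r1)
    dw3 = np.linalg.solve(lhs, rhs)
    dw1 = dt * r1 - dt * D @ dw3
    return dw1, dw3

# Small demonstration with arbitrary smooth coefficient and random right-hand sides.
n, dt = 32, 0.05
h = 1.0 / n
y = h * np.arange(n)
D = central_diff_matrix(n, h)
a31 = 1.0 + 0.2 * np.sin(2.0 * np.pi * y)
rng = np.random.default_rng(1)
r1, r3hat = rng.standard_normal(n), rng.standard_normal(n)
dw1_b, dw3_b = solve_block(D, a31, r1, r3hat, dt)
dw1_r, dw3_r = solve_reduced(D, a31, r1, r3hat, dt)
print("max difference:", max(np.max(np.abs(dw1_b - dw1_r)), np.max(np.abs(dw3_b - dw3_r))))
```

If the first and second derivatives are instead approximated with independent formulas, as in the expanded form of Eq. (13), the two solutions agree only to within the truncation error.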

2.2 Implicit Treatment of Viscous Terms

If needed, viscous terms can also be handled implicitly, using approximate factorization. For that purpose, we split the viscous flux derivatives in Eq. (4) into a Laplacian term and a remainder,

$$\begin{aligned} \frac{\partial \mathbf {f}_y^v}{\partial y} = \varvec{\mu }\frac{\partial ^2 {\mathbf {v}}}{\partial y^2} + {\varvec{\varphi }_y^v}, \end{aligned}$$
(14)

where \(\mathbf {v}\) is the vector of primitive variables, \(\mathbf {v}=\left[ \rho , u, v, w, T\right] \), and \(\varvec{\mu }\) is the viscosity matrix,

$$\begin{aligned} \varvec{\mu }= \begin{bmatrix} 0&0&0&0&0\\ 0&\mu&0&0&0\\ 0&0&\mu&0&0\\ 0&0&0&\mu&0\\ 0&0&0&0&\frac{\mu c_p}{{\textit{Pr}}\,T}\\ \end{bmatrix} . \end{aligned}$$
(15)

Freezing for simplicity the viscosity matrix at time step n, the following linearization is considered,

$$\begin{aligned} \left( \varvec{\mu }\frac{\partial ^2 {\mathbf {v}}}{\partial y^2} \right) ^{n+1} \approx \left( \varvec{\mu }\frac{\partial ^2 {\mathbf {v}}}{\partial y^2} \right) ^n + \varvec{\mu }^n \frac{\partial ^2 {\mathbf {P} \varDelta \mathbf {w}^n}}{\partial y^2} , \end{aligned}$$
(16)

where \(\mathbf {P}\) is the Jacobian of the conservative-to-primitive variables transformation

$$\begin{aligned} \mathbf {P}=\frac{\partial \mathbf {v}}{\partial \mathbf {w}}= \begin{bmatrix} 1&0&0&0&0\\ -\frac{u}{\rho }&\frac{1}{\rho }&0&0&0\\ -\frac{v}{\rho }&0&\frac{1}{\rho }&0&0\\ -\frac{w}{\rho }&0&0&\frac{1}{\rho }&0\\ \frac{-T s}{\rho c_v}&0&0&0&\frac{T}{\rho c_v}\\ \end{bmatrix} . \end{aligned}$$
(17)

Following similar steps as done to arrive at Eq. (10), the previous linearization yields

$$\begin{aligned} \left( \mathbf {I} + \varDelta t \frac{\partial }{\partial y} {\mathbf {A}_y^a}^n - \varDelta t \, \varvec{\mu }^n \frac{\partial ^2}{\partial y^2} \mathbf {P}^{n} \right) \varDelta \mathbf {w}^{n} = \varDelta t \, \mathbf {R}^n, \end{aligned}$$
(18)

which can be approximately factorized as follows

$$\begin{aligned} \left( \mathbf {I}+\varDelta t \frac{\partial }{\partial y} {\mathbf {A}_y^a}^n \right) \left( \mathbf {I} -\varDelta t \, \varvec{\mu }^n \frac{\partial ^2}{\partial y^2} \mathbf {P}^{n} \right) \varDelta \mathbf {w}^n = \varDelta t \, \mathbf {R}^n. \end{aligned}$$
(19)
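
As a consistency check (a worked expansion added here for illustration, assuming smooth solutions and bounded discrete operators), the product of the two factors in Eq. (19) can be written out as

$$\begin{aligned} \left( \mathbf {I}+\varDelta t \frac{\partial }{\partial y} {\mathbf {A}_y^a}^n \right) \left( \mathbf {I} -\varDelta t \, \varvec{\mu }^n \frac{\partial ^2}{\partial y^2} \mathbf {P}^{n} \right) = \mathbf {I} + \varDelta t \frac{\partial }{\partial y} {\mathbf {A}_y^a}^n - \varDelta t \, \varvec{\mu }^n \frac{\partial ^2}{\partial y^2} \mathbf {P}^{n} - \varDelta t^2 \frac{\partial }{\partial y} {\mathbf {A}_y^a}^n \varvec{\mu }^n \frac{\partial ^2}{\partial y^2} \mathbf {P}^{n} . \end{aligned}$$

The first three terms reproduce the operator of Eq. (18), so that the factorized form differs from it only by the last term. Since this term carries a factor \(\varDelta t^2\) and acts on \(\varDelta \mathbf {w}^n = O(\varDelta t)\), the factorization error in the time increment is formally \(O(\varDelta t^3)\), which is smaller than the \(O(\varDelta t^2)\) local error of a first-order time discretization.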

Inversion of Eq. (19) can then be carried out in two sequential sub-steps,

$$\begin{aligned} \left( \mathbf {I}+\varDelta t \frac{\partial }{\partial y} {\mathbf {A}_y^a}^n \right) \widetilde{\varDelta \mathbf {w}}^n= & {} \varDelta t \mathbf {R}^n, \end{aligned}$$
(20)
$$\begin{aligned} \left( \mathbf {I} -\varDelta t \, \varvec{\mu }^n \frac{\partial ^2}{\partial y^2} \mathbf {P}^{n} \right) {\varDelta \mathbf {w}^n}= & {} \widetilde{\varDelta \mathbf {w}}^n , \end{aligned}$$
(21)

whereby the provisional time increment \(\widetilde{\varDelta \mathbf {w}}^n\) is first evaluated through the inversion procedure for the convective fluxes described in Sect. 2.1. The actual time increment \(\varDelta \mathbf {w}^n\) is then evaluated by inverting the viscous implicit operator at the left-hand-side of Eq. (21) which, in light of the special structure of the Jacobian matrix given in Eq. (17), can be carried out sequentially, as follows

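$$\begin{aligned} \varDelta w_1^n&= \widetilde{\varDelta w}_1^n, \\ \left( 1 - \varDelta t \, \mu ^n_{kk} \frac{\partial ^2}{\partial y^2} P^n_{kk} \right) \varDelta w_k^n&= \widetilde{\varDelta w}_k^n + \varDelta t \, \mu ^n_{kk} \frac{\partial ^2}{\partial y^2} \left( P^n_{k1} \varDelta w_1^n \right) , \quad k=2,\ldots ,5, \end{aligned}$$
(22)

where \(\mu _{kk}\) and \(P_{kj}\) denote the entries of the matrices given in Eqs. (15) and (17), and no summation over k is implied.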

The inversion of four standard narrow-banded systems of equations is thus required for the purpose. We point out that the present procedure is again different from the original Beam–Warming procedure, which relies on linearization of the full viscous flux vectors, hence requiring the inversion of block-banded systems. However, we have found that numerical robustness is very weakly affected by the approximations herein made.

2.3 Multiple Space Dimensions

As done for the case of a single space dimension, the acoustic and viscous time limitations can be removed in more than one direction through direction-wise factorization of the implicit operators. For instance, assuming that all space directions are handled in semi-implicit fashion, Eq. (19) is replaced by

$$\begin{aligned} \mathbf {L}^n \varDelta \mathbf {w}^n = \varDelta t \, \mathbf {R}^n, \end{aligned}$$
(23)

where

$$\begin{aligned} \mathbf {L}^n =&\left( \mathbf {I}+\varDelta t \frac{\partial }{\partial x} {\mathbf {A}_x^a}^n \right) \left( \mathbf {I}+\varDelta t \frac{\partial }{\partial y} {\mathbf {A}_y^a}^n \right) \left( \mathbf {I}+\varDelta t \frac{\partial }{\partial z} {\mathbf {A}_z^a}^n \right) \cdot \nonumber \\&\left( \mathbf {I} -\varDelta t \varvec{\mu }^n \frac{\partial ^2}{\partial x^2} \mathbf {P}^{n} \right) \left( \mathbf {I} -\varDelta t \varvec{\mu }^n \frac{\partial ^2}{\partial y^2} \mathbf {P}^{n} \right) \left( \mathbf {I} -\varDelta t \varvec{\mu }^n \frac{\partial ^2}{\partial z^2} \mathbf {P}^{n} \right) . \end{aligned}$$
(24)

Hence, repeated application of the procedures developed in the previous two sections is sufficient. Practical application of Eq. (24) requires some caution, as the order in which the various inversions are carried out is not immaterial. We have found that, in order to remove possible spurious anisotropies, it is a good practice to shuffle the order of the implicit left-hand-side operators.

2.4 Time Integration

Time accuracy and stability enhancement are typically obtained by using Runge–Kutta schemes as wrappers around the one-step implicit procedures outlined in the previous paragraphs. Low-storage algorithms are a popular choice, and here we consider for example Wray’s three-stage, third-order scheme [38], adapted to semi-implicit integration of the convective terms,

$$\begin{aligned} \mathbf {L}^{(\ell )} \varDelta \mathbf {w}^{(\ell )} = \alpha _{\ell } \varDelta t \mathbf {R}^{(\ell -1)} + \beta _{\ell } \varDelta t \mathbf {R}^{(\ell )}, \quad \ell =0,1,2, \end{aligned}$$
(25)

where \(\varDelta \mathbf {w}^{(\ell )} = \mathbf {w}^{(\ell +1)} - \mathbf {w}^{(\ell )}\), \(\mathbf {w}^{(0)}=\mathbf {w}^{n}\), \(\mathbf {w}^{n+1}=\mathbf {w}^{(3)}\), and the left-hand-side implicit operator is a generalization of Eq. (24), namely

$$\begin{aligned} \mathbf {L}^{(\ell )}= & {} \left( \mathbf {I}+\gamma _{\ell } \varDelta t \frac{\partial }{\partial x} {\mathbf {A}_x^a}^{(\ell )} \right) \left( \mathbf {I}+\gamma _{\ell } \varDelta t \frac{\partial }{\partial y} {\mathbf {A}_y^a}^{(\ell )} \right) \left( \mathbf {I}+\gamma _{\ell } \varDelta t \frac{\partial }{\partial z} {\mathbf {A}_z^a}^{(\ell )} \right) \\&\cdot \left( \mathbf {I} -\gamma _{\ell } \varDelta t \varvec{\mu }^{(\ell )} \frac{\partial ^2}{\partial x^2} \mathbf {P}^{(\ell )} \right) \left( \mathbf {I} -\gamma _{\ell } \varDelta t \varvec{\mu }^{(\ell )} \frac{\partial ^2}{\partial y^2} \mathbf {P}^{(\ell )} \right) \left( \mathbf {I} -\gamma _{\ell } \varDelta t \varvec{\mu }^{(\ell )} \frac{\partial ^2}{\partial z^2} \mathbf {P}^{(\ell )} \right) , \end{aligned}$$

and the integration coefficients are \(\alpha _{\ell } = (0, -17/60,-5/12)\), \(\beta _{\ell } = (8/15, 5/12, 3/4)\), \(\gamma _{\ell } = \alpha _{\ell } + \beta _{\ell }\), which satisfy the consistency condition \(\sum _{\ell } \gamma _{\ell } = 1\). We have found this time stepping scheme to work well in practice; however, because of the partial flux linearization, the method is only formally first-order accurate in time.
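
The role of the coefficients can be checked on a scalar model problem. The sketch below applies the fully explicit counterpart of Eq. (25) (i.e. with the implicit operator replaced by the identity) to \(\mathrm {d}y/\mathrm {d}t = -y\), using the standard values of Wray’s scheme with \(\alpha _1 = -17/60\), for which the coefficients sum to one. The function names and the test problem are assumptions made for this illustration.

```python
import math
from fractions import Fraction as Fr

# Standard coefficients of Wray's three-stage scheme (alpha_1 is negative).
ALPHA = (Fr(0), Fr(-17, 60), Fr(-5, 12))
BETA = (Fr(8, 15), Fr(5, 12), Fr(3, 4))

def rk3_step(rhs, y, dt):
    """One explicit step: y <- y + dt * (alpha_l * R^(l-1) + beta_l * R^(l))."""
    r_old = 0.0
    for a, b in zip(ALPHA, BETA):
        r_new = rhs(y)
        y = y + dt * (float(a) * r_old + float(b) * r_new)
        r_old = r_new
    return y

def integrate(rhs, y0, t_end, n_steps):
    """Advance from t = 0 to t_end with a constant time step."""
    y, dt = y0, t_end / n_steps
    for _ in range(n_steps):
        y = rk3_step(rhs, y, dt)
    return y

def observed_order(n_steps=10):
    """Estimate the convergence order on dy/dt = -y, y(0) = 1, at t = 1."""
    exact = math.exp(-1.0)
    err_coarse = abs(integrate(lambda y: -y, 1.0, 1.0, n_steps) - exact)
    err_fine = abs(integrate(lambda y: -y, 1.0, 1.0, 2 * n_steps) - exact)
    return math.log2(err_coarse / err_fine)

print("sum of coefficients:", sum(ALPHA) + sum(BETA))
print("observed order:", observed_order())
```

For this linear problem the explicit scheme reproduces the Taylor expansion of the exponential up to the cubic term, so the observed order is close to three; the reduction to first order mentioned above arises only when the linearized implicit operator is included.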

Higher order of accuracy in time can be achieved using a third-order accurate semi-implicit Runge–Kutta scheme [1], which can be conveniently cast as follows

figure d

where \(\gamma _{\ell } = \gamma \) is the same for all sub-steps, and \(\alpha \), \(\gamma \) are free parameters (hereafter, we assume \(\alpha =1\), \(\gamma =0.6\)). With respect to Wray’s algorithm, Eq. (26) is not in low-storage form (although it can be implemented using three arrays only), and it involves an additional inversion, to achieve third-order accuracy regardless of the linear operator \(\mathbf {L}^{(\ell )}\) in Eq. (26), but no additional evaluation of the explicit operator. Extensive accuracy tests of the method are reported in the original reference.

2.5 Stability Analysis

The stability of the semi-implicit algorithm herein developed is analyzed within the simplified setting of the linearized inviscid acoustic equations in the presence of a mean flow \(u_0\), which can be cast as

$$\begin{aligned} \frac{\partial \mathbf{v}}{\partial t} + \mathbf{A} \frac{\partial \mathbf{v}}{\partial x} = 0, \quad \mathbf{v} = \begin{bmatrix} \rho ' \\ u' \end{bmatrix}, \quad \mathbf{A} = \begin{bmatrix} u_0&\rho _0 \\ c_0^2/\rho _0&u_0 \end{bmatrix} , \end{aligned}$$
(27)

where the subscript 0 refers to the unperturbed state, and primes to fluctuations thereof. A semi-implicit discretization of (27) can be obtained by considering the linearized counterpart of the partial flux Jacobian (8), namely

$$\begin{aligned} \mathbf{A}^a = \begin{bmatrix} u_0&\rho _0 \\ c_0^2/\rho _0&0 \end{bmatrix} . \end{aligned}$$
(28)

Backward Euler discretization of Eq. (27) then yields

$$\begin{aligned} \left( \mathbf{I} + \varDelta t \mathbf{A}^a \frac{\partial }{\partial x} \right) \varDelta \mathbf{v}^n = - \varDelta t \mathbf{A} \frac{\partial \mathbf{v}^n}{\partial x}. \end{aligned}$$
(29)

Transforming Eq. (29) to Fourier space with the ansatz \(\mathbf{v} (x,t) = \hat{\mathbf{v}}(t) e^{i k x}\) yields the amplification matrix of the scheme

$$\begin{aligned} \mathbf{G} = \mathbf{I} - \left( \mathbf{I} + i {\varDelta t} \tilde{k} \mathbf{A}^a \right) ^{-1} i {\varDelta t} \tilde{k} \mathbf{A}, \end{aligned}$$
(30)

where \(\mathbf{v}^{n+1} = \mathbf{G} \mathbf{v}^n\), and \(\tilde{k}\) is the modified wavenumber corresponding to the discretization of the space first derivative operator [39]. Von Neumann’s stability condition requires that both eigenvalues of \(\mathbf{G}\) are no larger than unity in modulus. Assuming for instance second-order central differencing (i.e. \(\tilde{k} h = \sin (k h)\)), it turns out that the scheme (29) is unconditionally stable for \(M_0 = u_0/c_0 \lesssim 1\). A similar analysis can be carried out (details are omitted) for the Runge–Kutta time stepping scheme of Eq. (26). In the case of explicit time integration (i.e. \(\gamma =0\)) the scheme is stable for \(\mathrm {CFL} \lesssim \sqrt{3}\), where \(\mathrm {CFL} = (u_0+c_0) \varDelta t / h\). In the case of semi-implicit time integration (with \(\gamma =0.6\), \(\alpha =1\)) unconditional stability is achieved for \(M_0 \lesssim 0.525\).
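
The first-order result can be reproduced numerically. The sketch below is an illustration under stated assumptions: units with \(\rho _0 = c_0 = 1\), second-order central differencing, and the acoustic matrix \(\mathbf{A}^a\) of Eq. (28) advanced implicitly while the remainder \(\mathbf{A} - \mathbf{A}^a\) is advanced explicitly, so that Eq. (27) gives \(\mathbf{G} = \mathbf{I} - (\mathbf{I} + i \sigma \mathbf{A}^a)^{-1} i \sigma \mathbf{A}\) with \(\sigma = \varDelta t \, \tilde{k}\). It covers only this one-step scheme, not the Runge–Kutta scheme of Eq. (26); function names are hypothetical.

```python
import numpy as np

def amplification_matrix(mach, sigma):
    """Amplification matrix for one Fourier mode, sigma = dt * k_tilde (rho0 = c0 = 1)."""
    A = np.array([[mach, 1.0], [1.0, mach]])
    A_acoustic = np.array([[mach, 1.0], [1.0, 0.0]])
    eye = np.eye(2)
    return eye - np.linalg.solve(eye + 1j * sigma * A_acoustic, 1j * sigma * A)

def max_spectral_radius(mach, cfl, n_k=181):
    """Largest eigenvalue modulus over 0 <= k h <= pi, with k_tilde h = sin(k h)."""
    kh = np.linspace(0.0, np.pi, n_k)
    sigmas = cfl / (1.0 + mach) * np.sin(kh)  # CFL = (u0 + c0) dt / h
    return max(
        np.max(np.abs(np.linalg.eigvals(amplification_matrix(mach, s))))
        for s in sigmas
    )

for mach in (0.3, 0.5, 1.5):
    for cfl in (1.0, 10.0, 100.0):
        print(f"M0={mach:3.1f}  CFL={cfl:6.1f}  max|eig(G)|={max_spectral_radius(mach, cfl):.4f}")
```

In this setting the spectral radius stays at or below one for the subsonic cases at all CFL numbers tried, and exceeds one for the supersonic case, in line with the bound \(M_0 \lesssim 1\) quoted above.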

Fig. 2 Smallest eigenvalue of amplification matrix at \(\mathrm {CFL}=1\) (a), \(\mathrm {CFL}=2\) (b), \(\mathrm {CFL}=5\) (c), for explicit Runge–Kutta time integration (dotted lines), semi-implicit time integration (with \(\alpha =1\), \(\gamma =0.6\), solid lines), and fully implicit Beam–Warming scheme (dashed lines), at Mach number \(M_0=0.3\). Curves are only shown for stable schemes

To provide an idea of the accuracy of the algorithm, in Fig. 2 we show the smallest eigenvalues of the amplification matrix at various Courant numbers for explicit and semi-implicit Runge–Kutta time integration. For reference, the amplification factor of the baseline Beam–Warming algorithm is also shown. At CFL numbers lower than the stability limit for explicit discretization [panel (a)], the semi-implicit and the fully explicit algorithms have similar performance, whereas the Beam–Warming algorithm has somewhat higher diffusion. At higher Courant numbers the explicit scheme becomes unstable, and the semi-implicit and fully implicit schemes have similar performance, with slightly less diffusive behavior of Beam–Warming at higher \(\mathrm {CFL}\). Notably, all schemes have unit amplification factor at the Nyquist limit (\(k h = \pi \)), hence they are not dissipative in the sense of Kreiss. This is the reason why schemes of the Beam–Warming family are typically used with explicit addition of artificial diffusion terms [7, 13].

2.6 Spatial Discretization

All the convective derivatives in the right-hand-side operator defined in Eq. (4) are discretized using conservative, energy-preserving formulas [40], based on application of standard central difference approximations to the fully expanded form of the convective derivatives [41]. In the explicit case this discretization exactly preserves the total kinetic energy under convection, and conserves the entropy variance in the inviscid limit, hence providing strong nonlinear stability to the algorithm without introducing any numerical diffusion [33, 42]. We have found that this feature is very important to prevent nonlinear divergence caused by accumulation of aliasing errors, especially in light of the fact that the semi-implicit algorithms considered herein have zero numerical diffusion at the highest resolved wavenumbers. Hence, no explicit addition of artificial diffusion is needed for the semi-implicit algorithm herein developed. Viscous terms are also expanded to Laplacian form and discretized by means of central formulas [43].

Previous studies [44,45,46] have shown that second-order spatial accuracy is sufficient for DNS of turbulence, provided energy-consistent discretizations are used. Hence, in the present work we focus on second-order space discretizations, which only require the inversion of standard tridiagonal matrices. Extension of the algorithm to higher-order spatial accuracy is straightforward, and it can be achieved by either reverting to compact-difference approximations [13], or by widening the discretization stencil. In the latter case, higher-order accuracy in space can be achieved at the price of inverting wider banded matrices.
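To make the cost of a tridiagonal inversion concrete, the following Python sketch implements the standard Thomas algorithm and applies it to a model implicit operator. The operator \(\mathbf{I} + \sigma \delta _x\), with \(\delta _x\) a second-order central difference, a constant scalar coefficient \(\sigma \) and non-periodic ends, is a hypothetical stand-in chosen for illustration; the matrices arising in the actual algorithm have variable coefficients and are not reproduced here.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm (no pivoting) for a tridiagonal system.

    lower[i], diag[i], upper[i] are the entries of row i;
    lower[0] and upper[-1] are ignored by the solution.
    """
    n = len(diag)
    c = np.zeros(n)
    d = np.zeros(n)
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination of the sub-diagonal.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution.
    x = np.empty(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

# Hypothetical model operator I + sigma * delta_x on n points.
n, sigma = 16, 3.0
lower = np.full(n, -0.5 * sigma)
diag = np.ones(n)
upper = np.full(n, 0.5 * sigma)
rhs = np.sin(np.linspace(0.0, 1.0, n))
x = solve_tridiagonal(lower, diag, upper, rhs)

# Dense matrix used only to check the result.
dense = np.diag(diag) + np.diag(lower[1:], -1) + np.diag(upper[:-1], 1)
residual = np.abs(dense @ x - rhs).max()
```

The operation count is linear in the number of grid points, which is the reason why scalar tridiagonal systems are much cheaper to invert than block-banded ones.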

2.7 Computational Efficiency

Achieving higher computational efficiency is obviously the main motivation for using implicit algorithms, which are inherently more computationally intensive than explicit ones. Computational cost figures for the present semi-implicit algorithm and for the Beam–Warming scheme are listed in Table 1, as a fraction of the cost for the baseline explicit algorithm. Cost estimates are given for implicit treatment of convective terms only, and for simultaneous treatment of convective and viscous terms, referring to a single space direction. For ease of later reference, we use the following notation to distinguish the various schemes. The semi-implicit scheme herein developed is referred to as either acoustic terms-implicit (ATI, as in Eq. (10)), or ATVI in the case that both convective and viscous terms are handled implicitly (Eq. 18). As a basis of comparison, cost figures are also reported for the Beam–Warming (BW) scheme, and for its variant with implicit treatment of the viscous terms (BWV). Cost figures are provided for both periodic (CYC) and non-periodic boundary conditions. It should be noted that the cost estimates refer to actual parallel computations, and also include the computational overhead for data transposition across processors in non-contiguous space directions. Of course, precise figures may change depending on the specific implementation of the algorithm and/or machine architecture, but we trust that the numbers listed in the table provide a reasonably robust estimate. It appears that the computational overhead of the ATI algorithm is rather limited, hence implicit treatment of a given space direction is computationally advantageous provided the attainable time step is at least \(20\%\) larger than for the fully explicit scheme. Substantial improvement of computational efficiency over standard Beam–Warming discretization is also apparent, for comparable expected accuracy (recalling Fig. 2).

Table 1 Computational cost for implicit schemes compared to fully explicit discretization

3 Numerical Results

The performance of the semi-implicit algorithm herein developed is tested through application to a series of canonical compressible turbulent flows, in order of increasing physical complexity.

3.1 Isotropic Turbulence

Numerical simulations of homogeneous isotropic turbulence have been frequently carried out to evaluate the properties of numerical schemes for turbulent flows [47]. DNS are here carried out in a triply periodic \((2 \pi )^3\) box, discretized with \(64^3\) collocation points. At the initial time pressure and density are taken to be uniform, and solenoidal velocity perturbations are added according to the procedure introduced by [48], with prescribed three-dimensional energy spectrum

$$\begin{aligned} E(k) = 16 \sqrt{\frac{2}{\pi }} \frac{u_0^2}{k_0} \left( \frac{k}{k_0} \right) ^4 e^{-2 (k/k_0)^2}, \end{aligned}$$
(31)

where \(k_0 = 4\) is the most energetic mode. The initial turbulent Mach number is given by \(M_{t0} = \sqrt{3} u_0/c_0 = 0.3\), and the Reynolds number based on the Taylor microscale is \({\textit{Re}}_{\lambda } = 2 \rho _0 u_0 / (\mu _0 k_0) = 30\). Time is made nondimensional with respect to the eddy turnover time \(\tau = 2 \sqrt{3} / (k_0 M_{t0} c_0)\).
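As a quick consistency check of the initial spectrum, the Python sketch below evaluates \(E(k)\) numerically, taking it in the form \(16 \sqrt{2/\pi }\, (u_0^2/k_0) (k/k_0)^4 e^{-2 (k/k_0)^2}\). It verifies that the peak lies at \(k = k_0\) and that the integrated kinetic energy is \(3 u_0^2/2\), which is the value implied by the definition \(M_{t0} = \sqrt{3} u_0/c_0\). The unit value of \(u_0\), the integration range and the grid resolution are arbitrary choices made for illustration.

```python
import numpy as np

def energy_spectrum(k, u0=1.0, k0=4.0):
    """Initial three-dimensional energy spectrum E(k)."""
    x = k / k0
    return 16.0 * np.sqrt(2.0 / np.pi) * u0**2 / k0 * x**4 * np.exp(-2.0 * x**2)

# Fine wavenumber grid; the tail beyond k = 40 is negligible for k0 = 4.
k = np.linspace(0.0, 40.0, 400001)
E = energy_spectrum(k)

# Location of the most energetic mode.
k_peak = k[np.argmax(E)]

# Trapezoidal quadrature of E(k), giving the total kinetic energy per unit mass.
kinetic_energy = np.sum(0.5 * (E[1:] + E[:-1]) * np.diff(k))

print(k_peak, kinetic_energy)
```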

Fig. 3

Numerical simulations of homogeneous isotropic turbulence at \(M_t=0.3\), \(k_0=4\), \(Re_{\lambda }=30\), with ATI-XYZ scheme. Time history of turbulence kinetic energy (a), and pressure variance (b), and spectra of velocity (c) and pressure fluctuations (d) at \(t/\tau =5\). Solid lines denote reference results obtained with explicit time discretization at \(\mathrm {CFL}=1\). Symbols denote results obtained with ATI scheme at \(\mathrm {CFL} = 1\) (squares), \(\mathrm {CFL} = 2\) (circles), \(\mathrm {CFL} = 3\) (triangles), \(\mathrm {CFL} = 4\) (down-triangles), \(\mathrm {CFL} = 5\) (diamonds)

Fig. 4

Numerical simulations of homogeneous isotropic turbulence at \(M_t=0.3\), \(k_0=4\), \(Re_{\lambda }=30\), with BW-XYZ scheme. Time history of turbulence kinetic energy (a), and pressure variance (b), and spectra of velocity (c) and pressure fluctuations (d) at \(t/\tau =5\). Solid lines denote reference results obtained with explicit time discretization at \(\mathrm {CFL}=1\). Symbols denote results obtained with BW scheme at \(\mathrm {CFL} = 1\) (squares), \(\mathrm {CFL} = 2\) (circles), \(\mathrm {CFL} = 3\) (triangles), \(\mathrm {CFL} = 4\) (down-triangles)

The results obtained with ATI and BW discretization in all space directions are shown in Figs. 3 and 4, respectively, at various Courant numbers. Stable results are obtained for \(\mathrm {CFL} \lesssim 5.1\) for ATI, and \(\mathrm {CFL} \lesssim 4.8\) for BW. Loss of stability at larger time steps is due to flux linearization and/or factorization errors, which prevent unconditional stability in practical computations [13]. The time behavior of turbulence kinetic energy [panel (a)] is well predicted at all Courant numbers up to the stability limit, whereas pressure fluctuations [panel (b)] are overdamped starting at \(\mathrm {CFL} \approx 3\), in both ATI and BW. The different behavior is caused by the fact that pressure receives contributions of both hydrodynamic and acoustic nature and, as seen in the previous Section, acoustic waves undergo significant damping at high Courant number. This is even clearer in the velocity and pressure spectra, shown in panels (c) and (d), respectively. Although velocity spectra are perfectly captured at all Courant numbers, pressure spectra undergo numerical damping, especially at intermediate wavenumbers, which is easily understood based on the amplification factors shown in Fig. 2. Given the similar performance of the two implicit methods for this test case, ATI is certainly preferable owing to its lower computational cost, which yields an effective speed-up over the explicit case (see Table 1) of about a factor of three, whereas BW is only about as efficient as the explicit scheme.

3.2 Turbulent Flow in Plane Channel

Channel flow is the simplest prototype of wall-bounded flows, and it has been studied by many authors in the incompressible [2, 45, 49], as well as in the compressible regime [6, 50, 51]. The controlling parameters are the bulk Mach number \(M_b=u_b/c_w=1.5\) (where \(u_b\) is the average velocity across the channel thickness, and \(c_w\) the sound speed at the wall temperature), and the bulk Reynolds number \(Re_b=2\rho _b u_bh/\mu _w=6000\) (where \(\rho _b\) is the bulk density, \(\mu _w\) the dynamic viscosity at the wall, and h the channel half height). All DNS are initialized with a parabolic velocity profile with superposed small perturbations, whereas density and pressure are uniform. Periodic boundary conditions are applied in the streamwise (x) and spanwise (z) coordinate directions, and no-slip, isothermal boundary conditions are applied at the walls. A spatially uniform forcing is applied to the streamwise momentum equation, and dynamically adjusted in time to maintain constant mass flow rate [6]. Favre density-weighted decomposition is applied to separate mean values from fluctuations, namely \(\phi =\widetilde{\phi } + \phi ''\), with \(\widetilde{\phi }=\overline{\rho \phi }/\overline{\rho }\).
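The Favre decomposition can be illustrated with a short Python sketch on synthetic data. The random fields, the array shapes and the function name are hypothetical and serve only to show the defining property of the density-weighted average, namely that \(\overline{\rho \phi ''} = 0\) whereas \(\overline{\phi ''}\) does not vanish in general.

```python
import numpy as np

def favre_decompose(rho, phi, axis=0):
    """Return the Favre mean and Favre fluctuation of phi.

    Averages are taken along `axis`, which stands for the homogeneous
    directions and/or time samples.
    """
    rho_mean = rho.mean(axis=axis, keepdims=True)
    phi_favre = (rho * phi).mean(axis=axis, keepdims=True) / rho_mean
    return phi_favre, phi - phi_favre

# Synthetic samples: 1000 realizations at 8 wall-normal stations (hypothetical).
rng = np.random.default_rng(0)
rho = 1.0 + 0.2 * rng.random((1000, 8))
u = rng.standard_normal((1000, 8)) + 0.5 * rho  # velocity correlated with density

u_favre, u_fluct = favre_decompose(rho, u)

# Density-weighted mean of the fluctuation, zero up to round-off.
weighted_mean = (rho * u_fluct).mean(axis=0)
# Plain (Reynolds) mean of the Favre fluctuation, generally non-zero.
plain_mean = u_fluct.mean(axis=0)
```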

Table 2 Flow parameters for DNS of plane channel flow (CH)

The main flow parameters are listed in Table 2. Three flow cases have been considered, one at \(M_b=0.1\) (denoted as CH01), and two at \(M_b=1.5\) (denoted as CH15a-b), the latter two only differing in the distance of the first grid point from the wall. Reference DNS have been carried out with fully explicit time discretization at \(\mathrm {CFL} \approx 1\), and are used as a basis of comparison for the ATI and BW algorithms. In order to understand the effectiveness of the (semi-)implicit algorithms, in Table 2 we report the time step restrictions associated with the three coordinate directions, as estimated from Eqs. (1) and (3), as well as the actual time step used in the DNS, all in wall units. As expected, in all flow cases the time step limitation in the wall-normal direction is the most restrictive. Although larger time steps are allowed on grounds of numerical stability alone, all DNS have been carried out at the maximum time step for which accurate results are obtained, which corresponds to \(\mathrm {CFL} \approx 1\) for the fully explicit simulations. For ease of reference, the maximum time steps associated with accuracy and stability restrictions are also reported in Fig. 1a with circle and square symbols, respectively.

As a first test, we consider flow at low subsonic Mach number (CH01), for which the explicit time advancement step is very small, hence we apply implicit treatment in all coordinate directions (XYZ). We find that, although the wall-normal time step restrictions can be removed, the allowed time step for accurate calculations cannot be substantially larger than the streamwise convective restriction (see Fig. 1a). This is probably due to inherent mesh anisotropy in DNS of wall-bounded flows. In fact, the flow is over-resolved in the wall-normal direction, hence the relevant values of the reduced wavenumber kh are small, which allows operation at high values of \(\mathrm {CFL}\) with little error, recalling (see Fig. 2) that the dissipation error grows with both kh and \(\mathrm {CFL}\). On the other hand, the typical wall-parallel mesh spacings used in DNS are barely sufficient to resolve the smallest scales of turbulence, hence the typical reduced wavenumbers are higher, and time accuracy is a factor in that case. We find that both ATI and BW are capable of boosting the time step by about a factor of ten, with efficiency gain of \(85\%\) for ATI, and results almost indistinguishable from the fully explicit case (see below). Still, the time step is far from that allowed by incompressible solvers (again, see Fig. 1a). This issue will be further recalled in the concluding discussion.

To show effectiveness in removing the wall-normal acoustic time limitation in supersonic flow calculations, in flow case CH15a the first grid point is placed sufficiently far from the wall that the viscous limitation is ineffective. Hence, the implicit algorithms are applied only in the wall-normal direction (Y), and viscous terms are handled explicitly. The ATI and BW algorithms are both found to effectively suppress the wall-normal acoustic time step limitation, and achieve the same maximum time step for accurate flow resolution, corresponding to about \(\mathrm {CFL} = 2.4\). Hence, accounting for the cost figures given in Table 1, we find a speed-up of about a factor of two for the ATI algorithm, and \(30\%\) gain with BW.
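The arithmetic relating time step, cost per step and speed-up can be sketched in Python as follows. The sketch assumes that the reference explicit run is at \(\mathrm {CFL} \approx 1\), so that the time step ratio equals the Courant number of the implicit run. The per-step costs are inferred from the figures quoted in the text, not read from Table 1, and the function names are hypothetical.

```python
def speedup(dt_ratio, relative_cost):
    """Speed-up over the explicit solver.

    dt_ratio: implicit time step divided by the explicit time step.
    relative_cost: cost of one implicit step in units of one explicit step.
    """
    return dt_ratio / relative_cost

def saving(s):
    """Fraction of computer time saved for a speed-up s."""
    return 1.0 - 1.0 / s

def implied_relative_cost(dt_ratio, s):
    """Per-step cost implied by a time step ratio and a reported speed-up."""
    return dt_ratio / s

# Flow case CH15a: CFL of about 2.4 for both implicit schemes.
ati_cost = implied_relative_cost(2.4, 2.0)  # speed-up of about two for ATI
bw_cost = implied_relative_cost(2.4, 1.3)   # gain of about 30% for BW

# Break-even: with a 20% overhead per step, the time step must grow by 20%.
break_even = speedup(1.2, 1.2)

print(ati_cost, bw_cost, saving(2.0), break_even)
```

With these inputs the implied cost of one ATI step is about 1.2 explicit steps, consistent with an overhead of about \(20\%\) per implicit direction, whereas the implied cost of one BW step is close to 1.85 explicit steps.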

To prove effectiveness of the implicit treatment of the viscous terms proposed in Sect. 2.2, in flow case CH15b the first grid point is placed closer to the wall, in such a way that the viscous time limitation also becomes relevant, after the acoustic one. Both wall-normal time step restrictions are suppressed through use of the AVTI and BWV algorithms, hence the achieved time step is similar to flow case CH15a. Both algorithms here achieve \(\mathrm {CFL} \approx 10\), at a cost which is a small fraction of the fully explicit algorithm.

For the sake of comparison, in Figs. 5, 6 and 7 we show the main statistics for the flow cases listed in Table 2. As anticipated, excellent agreement is observed between implicit algorithms and the reference explicit solution, including pressure and temperature fluctuations, which is especially satisfactory.

Fig. 5

Flow statistics for DNS of flow case CH01 (see Table 2): mean velocity (a), Reynolds stresses (b), r.m.s. pressure (c) and r.m.s. temperature (d), for CH01-EXPL (squares), CH01-ATI-XYZ (circles), CH01-BW-XYZ (triangles). \(T_{\tau } = q_w / (\rho _w c_p u_{\tau })\) is the friction temperature

Fig. 6

Flow statistics for DNS of flow case CH15a (see Table 2): mean velocity (a), Reynolds stresses (b), r.m.s. pressure (c) and r.m.s. temperature (d), for CH15a-EXPL (squares), CH15a-ATI-Y (circles), CH15a-BW-Y (triangles)

Fig. 7

Flow statistics for DNS of flow case CH15b (see Table 2): mean velocity (a), Reynolds stresses (b), r.m.s. pressure (c) and r.m.s. temperature (d), for CH15b-EXPL (squares), CH15b-AVTI-Y (circles), CH15b-BWV-Y (triangles)

3.3 Turbulent Flow in Square Duct

As a further step in complexity we consider the flow inside a straight duct with square cross-section. This flow has been the subject of several DNS studies in the incompressible regime [52,53,54], all limited to low Reynolds number. One of the main difficulties that arise when dealing with square duct flows is the long averaging time necessary to attain convergence of even the basic mean flow statistics, caused by the extremely long typical time scales of secondary corner eddies. In fact, [54] reported that an averaging time of about \(8000 h/u_b\) was needed to have symmetric statistics in the four quadrants of the cross section. Hence, it is clear that efficient numerical methods are needed to study turbulent compressible flow in ducts. Numerical simulations have been here carried out (see Table 3 for the main flow parameters) at the same Reynolds number as [54], and sufficiently low Mach number (\(M_b=0.2\)) that direct comparison with incompressible data is possible. The duct length is \(L_x=8h\) (where 2h is the length of each side of the duct), and the time window for collecting the flow statistics is the same as used by [54]. As in plane channel flow, a spatially uniform forcing is applied to the momentum equation to maintain a constant mass flow rate in time. Note that, unlike in channel flow, the mesh is also non-uniformly spaced in the z direction, hence a range of mesh spacings is reported in Table 3. A reference fully explicit numerical simulation has been carried out and used as a basis of comparison for the ATI algorithm, here applied to all coordinate directions. As seen in Table 3, the corresponding CFL number is about unity. As in the case of plane channel, DNS were carried out at increasing values of CFL, until deviations from the reference data were found, to determine the maximum allowed time step for accuracy. It appears that accurate results of the semi-implicit algorithm are recovered up to \(\mathrm {CFL} \approx 10\). 
Again, implicit treatment of the x direction is not capable of fully suppressing the corresponding time step limitation, owing to the emergence of accuracy issues. Similar to channel flow, use of the ATI algorithm allows for about \(85\%\) cost reduction. Figure 8 confirms that excellent matching of the flow statistics is found among DU02-ATI, DU02-EXPL and the data of [54], except for some differences in the wall-normal Reynolds stress and the pressure r.m.s., which may be due to the greater importance of acoustic waves in the presence of a fully confined flow geometry.

Table 3 DNS dataset for square duct (DU) flow
Fig. 8

DNS of flow in square duct (see Table 3): mean velocity (a), Reynolds stresses (b), r.m.s. pressure (c) and r.m.s. temperature (d), for DU02-EXPL (squares), DU02-ATI-XYZ (circles). Triangle symbols denote reference incompressible DNS data [54]

4 Conclusions

A novel semi-implicit algorithm for time-accurate solution of the compressible Navier–Stokes equations has been developed, which is capable of operating efficiently all the way from low subsonic to supersonic flow conditions. The main features of the algorithm are as follows: (i) use of the entropy transport equation instead of total energy conservation; (ii) Beam–Warming-like linearization of the partial convective flux associated with acoustic propagation; (iii) energy-consistent discretization of the convective derivatives in the explicit part of the time-advancement operator; (iv) semi-implicit treatment of viscous fluxes based on isolation of Laplacian terms; (v) approximate factorization for implicit treatment of multiple space directions; (vi) third-order accurate Runge–Kutta time integration, according to the algorithm proposed by [1]. The main advantage of the algorithm is that, unlike the classical Beam–Warming scheme, it avoids the computationally expensive inversion of \(5\times 5\) block-banded matrices, requiring instead the inversion of standard banded matrices (tridiagonal matrices in the case of second-order accurate space discretization). Specifically, a single banded matrix inversion is needed for implicit treatment of the convective terms, whereas five matrix inversions are needed if viscous terms are also handled implicitly. The cost overhead with respect to standard explicit algorithms (see Table 1) is quite modest, ranging from 20 to 30% for each space direction to be handled implicitly. Modification of existing compressible flow solvers to incorporate the present method is straightforward, as the explicit part of the algorithm is unchanged.

The method nominally allows unconditional stability for low-Mach-number flows. However, flux linearization and approximate factorization reduce the stability margins, and CFL numbers of the order of 5–10 are achieved in practical computations, which is probably less than achievable with iterative methods. On the other hand, compared with compressible flow algorithms based on pre-conditioning, the present method avoids the use of inner time iterations, whose computational cost is difficult to estimate a priori. The other shortcoming of the method is the use of the entropy equation, which is instrumental to achieve (approximate) separation of hydrodynamic from acoustic effects, but which precludes correct capture of shock waves, as the equation is not in conservation form. Efforts are in progress to achieve shock-capturing capabilities through local replacement of the entropy equation with the total energy conservation equation. Although the algorithm herein developed has in principle a much wider range of applications, the main focus of this paper was on DNS of compressible wall-bounded flows, which is notoriously plagued by severe time step restrictions inherited from the wall-normal acoustic and viscous stability conditions. We have found that the wall-normal acoustic time limitation can be effectively removed through semi-implicit treatment. The same conclusion also applies to the viscous time step restriction, although the most efficient way to remove it is placing the first grid point sufficiently away from the wall (at \(y^+ \approx 0.5-0.7\)), and using suitable staggering [6], with no effect on accuracy. The wall-parallel stability restrictions can also be suppressed through semi-implicit treatment. However, accuracy considerations lead to the practical rule (see Fig. 1) that the time step cannot be much larger than the one stemming from the streamwise time limitation. 
Hence, we suggest that in low-subsonic flow both the wall-normal and the spanwise convective terms be handled implicitly, whereas the streamwise terms can be evaluated explicitly. The resulting saving of computer time can then be of the order of \(85\%\) with respect to a fully explicit solver. In high subsonic or supersonic flow, implicit treatment of the wall-normal convective derivatives is sufficient, with typical savings of the order of \(50\%\), in line with theoretical estimates.

We foresee that the present technique can be fruitfully extended to numerical simulation of wall-bounded turbulent flows with time-accurate models, such as LES or DES [55]. In that case, given the higher aspect ratio of near-wall cells, higher gains are expected. Advantages with respect to classical algorithms based on Beam–Warming linearization are also expected for steady RANS applications. Indeed, although the present algorithm is in principle only capable of suppressing the acoustic time step limitation, it is found to be at least as stable as Beam–Warming in practical computations.