1 Introduction

Typical technical applications, in which multiphase processes can be found, are fuel injection systems such as rocket combustion chambers. The problems inherently contain multiple scales. First, the liquid fuel is injected as a jet with a liquid core. Over time, the jet breaks up and ligaments and droplets form. At the surface of the liquid interface, phase change occurs and the gaseous environment is mixed with evaporated fuel. This mixture is then ignited.

In this project, we aim to understand the mixing processes leading up to the burning of the fuel/oxidizer mixture. Due to the multiscale character, we split the investigation into large-scale jet simulations and more detailed simulations of single droplets. These processes face extreme ambient conditions that often exceed the critical state of the fuel. In these regimes, the liquid phase can no longer be described as incompressible, and we have to consider the full coupling of hydrodynamics and thermodynamics, which requires the fully compressible flow equations.

The macroscopic modelling for jet simulations is based on the Homogeneous Equilibrium Model (HEM) [22], which considers a mixture of saturated liquid and saturated vapor under full thermodynamic equilibrium. An extension of the intrinsic assumption of vapor-liquid equilibrium in the HEM approach towards binary mixtures is the nested procedure of tangent plane distance (TPD) function [15] analysis and classical TPn-flash calculation [16]. These methods are restricted to modifications of the underlying equations of state (EOS) only. Especially with more than one species, the evaluation of the EOS becomes very costly. Therefore, we use look-up tables, which shift the evaluation costs into a pre-processing step, while at runtime only the look-up in an octree data structure is required [9]. For binary mixtures, the look-up tables become very large, which causes problems if their size exceeds the memory available to the CPU. Therefore, in this paper we propose a shared memory parallelization of look-up tables based on the MPI 3.0 standard. We provide performance results on benchmark test cases and show its practical use with the simulation of a binary mixing layer.

This project also investigates modelling strategies for phase interfaces, e.g. for droplets, such as sharp and diffuse interface models. As an example of the latter, we use a parabolic relaxation model of the isothermal Navier–Stokes–Korteweg (NSK) equations to simulate the collision of two droplets at varying model parameters. Numerical experiments were conducted using an extension of the open source code FLEXI, which is based on a high order nodal discontinuous Galerkin spectral element method (DGSEM) [12].

The outline of the paper is as follows. In the next section, the governing equations are presented. This is followed by the description of the numerical methods, the thermodynamic modelling and the look-up table approach. We then present the results on the performance of the look-up tables. Numerical experiments are shown for a two-component mixing layer at super-critical conditions using a Peng-Robinson EOS as well as for two colliding droplets using the NSK diffuse interface model.

2 Governing Equations

2.1 The Compressible Navier–Stokes System for Multi-components

The compressible Navier–Stokes equations with multiple components are given by

$$\begin{aligned} \frac{\partial \rho }{\partial t} + \nabla \cdot \left( \rho \varvec{u}\right)&= 0\,, \end{aligned}$$
(1)
$$\begin{aligned} \frac{\partial \rho Y_{k}}{\partial t} + \nabla \cdot \left( \rho Y_{k} \varvec{u} \right)&= \nabla \cdot \left( -\varvec{J}_{k} \right) \,, \end{aligned}$$
(2)
$$\begin{aligned} \frac{\partial \rho \varvec{u}}{\partial t} + \nabla \cdot \left( \rho \varvec{u} \otimes \varvec{u}+p \underline{\varvec{I}}\right)&= \nabla \cdot \left( \underline{\varvec{\tau }}\right) \,, \end{aligned}$$
(3)
$$\begin{aligned} \frac{\partial E}{\partial t} + \nabla \cdot \left[ \left( E + p\right) \varvec{u} \right]&= \nabla \cdot \left( \underline{\varvec{\tau }} \cdot \varvec{u} - \varvec{q}\right) \,, \end{aligned}$$
(4)

with

$$\begin{aligned} \underline{\varvec{\tau }}=\underbrace{2\mu \underline{\varvec{S}}-\frac{2}{3}\mu \left( \nabla \cdot \varvec{u}\right) \underline{\varvec{I}}}_{\mathrm {Stokes~law}} ,\quad \underline{\varvec{S}}=\frac{1}{2} \left( \nabla \varvec{u}+\left( \nabla \varvec{u}\right) ^\text {T}\right) , \end{aligned}$$
(5)

where \(\rho \) is the density, \(\varvec{u}=(u,v,w)^{\mathrm {T}}\) is the velocity vector, p is the static pressure, E is the total energy per unit volume, \(\mu \) is the dynamic viscosity and \(\underline{\varvec{I}}\) is the unit tensor. By considering \(N_k\) species, the system is extended by \(N_k-1\) concentration equations, where \(\varvec{Y}=(Y_1,Y_2,~\dots ~,Y_{N_k-1})^\text {T}\) collects the mass fractions \(Y_k = \frac{\rho _k}{\rho }\) of the species. For multi-component simulations, the heat flux is usually composed of three parts, \(\varvec{q}=\varvec{q}^{\mathrm {f}}+\varvec{q}^{\mathrm {d}}+\varvec{q}^{\mathrm {c}}\), where

$$\begin{aligned} \varvec{q}^{\mathrm {f}}=-\lambda \nabla T \end{aligned}$$
(6)

is the specific heat flux according to Fourier's law with thermal conductivity \(\lambda \) and temperature T. The second term is the inter-species energy flux due to diffusion,

$$\begin{aligned} \varvec{q}^{\mathrm {d}}=\sum _k h_{k} \varvec{J}_{k}. \end{aligned}$$
(7)

Here, \(\varvec{q}^{\mathrm {c}}\) are cross-effects, like the Dufour effect, which are not considered in this paper. The viscous stress tensor \(\underline{\varvec{\tau }}\) with the strain rate tensor \(\underline{\varvec{S}}\) is defined for a Newtonian fluid. The concentration diffusion flux is usually comprised of \(\varvec{J}_{k}=\varvec{J}_{k}^{\mathrm {f}} + \varvec{J}_{k}^{\mathrm {c}} + \varvec{J}_{k}^{\mathrm {b}} \), where

$$\begin{aligned} \varvec{J}_k^{\mathrm {f}} = -\rho D_k \nabla Y_k\,, \quad k=1,\dots ,N_k-1, \end{aligned}$$
(8)

is the concentration diffusion flux according to Fick's law and \(D_{k}\) is the species diffusion coefficient. Here, \(\varvec{J}_{k}^{\mathrm {c}}\) are cross-effects, like the Soret effect, which are also neglected in this paper. The third term,

$$\begin{aligned} \varvec{J}_{k}^{\mathrm {b}} = \rho Y_k \sum _{j=1}^{N_k}\left( D_j \nabla Y_j \right) \,, \quad k=1,\dots ,N_k-1, \end{aligned}$$
(9)

is a correction for the mass balance and recovers \(\sum _{k=1}^{N_k}\varvec{J}_{k}=0\) to guarantee conservation in cases where the species diffusion fluxes are large [6]. Properties of the last species can be calculated via the following relations

$$\begin{aligned} \sum ^{N_k}_{k=1} Y_k = 1\,, ~ ~ ~ ~ ~ ~ ~ \sum ^{N_k}_{k=1} \rho _k = \rho \,. \end{aligned}$$
(10)
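The effect of the mass-balance correction can be checked numerically. The following minimal sketch evaluates the Fickian flux, Eq. (8), for all \(N_k\) species and adds the correction term with the sign that cancels the summed Fickian fluxes. The function name and the three-species state are hypothetical and serve only as an illustration; they are not taken from the solver.

```python
import numpy as np

def species_diffusion_fluxes(rho, Y, D, grad_Y):
    """Fickian flux plus mass-balance correction for all N_k species.

    rho    : mixture density (scalar)
    Y      : mass fractions, shape (N_k,)
    D      : species diffusion coefficients, shape (N_k,)
    grad_Y : mass fraction gradients, shape (N_k, dim)
    """
    J_fick = -rho * D[:, None] * grad_Y                  # Fickian part, Eq. (8)
    correction = np.sum(D[:, None] * grad_Y, axis=0)     # sum_j D_j grad(Y_j)
    J_corr = rho * Y[:, None] * correction               # sign chosen so that the total sum cancels
    return J_fick, J_fick + J_corr

# Hypothetical three-species state (illustrative values only)
rho = 1.2
Y = np.array([0.2, 0.3, 0.5])
D = np.array([1.0e-5, 2.0e-5, 3.0e-5])
grad_Y = np.array([[0.4, -0.1],
                   [0.1, 0.3],
                   [-0.5, -0.2]])                        # columns sum to zero because sum(Y) = 1

J_fick, J_total = species_diffusion_fluxes(rho, Y, D, grad_Y)
fick_sum = J_fick.sum(axis=0)     # non-zero when the D_k differ
total_sum = J_total.sum(axis=0)   # vanishes up to round-off
```

With unequal diffusion coefficients the Fickian fluxes alone do not sum to zero, whereas the corrected fluxes do.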

Since there are \(5+(N_k-1)\) unknown variables, a closure relation is required between the pressure, the density, the specific internal energy \(\epsilon \), and the species composition,

$$\begin{aligned} E=\rho \epsilon +\frac{1}{2}\rho \varvec{u} \cdot \varvec{u}\,, ~~~~\epsilon&=\epsilon (\rho ,p,\varvec{Y})\,,~~~~p=p(\rho ,\epsilon ,\varvec{Y}) \,. \end{aligned}$$
(11)

Such a functional relation is called an equation of state, more precisely a caloric EOS, and defines the thermodynamic relations between the state variables. For the temperature, a thermal EOS

$$\begin{aligned} T&=T(\rho ,p,\varvec{Y}). \end{aligned}$$
(12)

must also be considered.

2.2 The Navier–Stokes–Korteweg Equations

The Navier–Stokes–Korteweg (NSK) equations are an extension of the Navier–Stokes equations where an interfacial stress is added that approximates capillary effects in phase interfaces of finite thickness. The NSK equations are given in the isothermal case for \(T \equiv T_{\mathrm {ref}}\) by

$$\begin{aligned} \rho _t + \nabla \cdot \left( \rho \mathbf {u}\right)&= 0 , \end{aligned}$$
(13)
$$\begin{aligned} \left( \rho \mathbf {u}\right)_t + \nabla \cdot \left( \rho \mathbf {u}\otimes \mathbf {u}+ p \underline{\varvec{I}}\right)&= \nabla \cdot \underline{\varvec{\tau }}+ \nabla \cdot \underline{\varvec{\tau }}_{\mathrm {K}}. \end{aligned}$$
(14)

The NSK equations are non-dimensionalized such that the Stokes stress tensor, \(\underline{\varvec{\tau }}\in {\mathbb R}^{d\times d}\), and the Korteweg stress tensor, \(\underline{\varvec{\tau }}_{\mathrm {K}}\in {\mathbb R}^{d\times d}\), are given by

$$\begin{aligned} \underline{\varvec{\tau }}&= \frac{1}{\mathrm {Re}} \left( \nabla \mathbf {u}+ ( \nabla \mathbf {u})^\text {T} - \frac{2}{3} \nabla \cdot \mathbf {u}\underline{\varvec{I}}\right) , \end{aligned}$$
(15)
$$\begin{aligned} \underline{\varvec{\tau }}_{\mathrm {K}}&= \frac{1}{\mathrm {We}} \left( \rho \Delta \rho + \frac{1}{2}\left|\nabla \rho \right|^2 \right) \underline{\varvec{I}}- \frac{1}{\mathrm {We}} \nabla \rho \otimes \nabla \rho . \end{aligned}$$
(16)

The Reynolds number, \(\mathrm {Re}\), and Weber number, \(\mathrm {We}\), are expressed in terms of the numbers \(\epsilon _{\mathrm {K}}>0\) and \(\gamma _{\mathrm {K}}>0\),

$$\begin{aligned} \frac{1}{\mathrm {Re}} = \epsilon _{\mathrm {K}}, \quad \frac{1}{\mathrm {We}} = \epsilon _{\mathrm {K}}^2 \gamma _{\mathrm {K}}. \end{aligned}$$
(17)

Due to the capillary stress, Eq. (16), the momentum equation is a third order diffusion-dispersion equation. The system is closed by the pressure function of the van der Waals law [24],

$$\begin{aligned} p = \frac{\rho RT_{\mathrm {ref}}}{1-b\rho } - a\rho ^2 , \end{aligned}$$
(18)

where \(a,b,R\) are material parameters. In reduced, non-dimensional form, they are \(a=3\), \(b=1/3\), \(R=8/3\). For subcritical temperatures, the pressure function, Eq. (18), is non-monotone in \(\rho \) and the eigenvalues of the hyperbolic flux Jacobian of the NSK equations may become imaginary. The NSK system is therefore of hyperbolic-elliptic type, and numerical methods that rely on the strict hyperbolicity of the conservation system can no longer be applied in a straightforward manner. To overcome these challenges, Corli et al. [7] proposed a parabolic relaxation scheme for diffusion-dispersion equations, which is extended to the isothermal NSK equations as

$$\begin{aligned} \rho _t^{\alpha } + \nabla \cdot \left( \rho ^{\alpha } \mathbf {u}^{\alpha } \right)&= 0 ,\end{aligned}$$
(19)
$$\begin{aligned} \left( \rho ^{\alpha } \mathbf {u}^{\alpha } \right)_t + \nabla \cdot \left( \rho ^{\alpha } \mathbf {u}^{\alpha } \otimes \mathbf {u}^{\alpha } + p^{\alpha } \underline{\varvec{I}}\right)&= \nabla \cdot \underline{\varvec{\tau }}^{\alpha } + \alpha \rho ^{\alpha } \nabla \left( c_{\mathrm {K}}^{\alpha } - \rho ^{\alpha } \right) ,\end{aligned}$$
(20)
$$\begin{aligned} \beta \left(c_{\mathrm {K}}^{\alpha }\right)_t - \epsilon _{\mathrm {K}}^2 \gamma _{\mathrm {K}}\Delta c_{\mathrm {K}}^{\alpha }&= \alpha \left( \rho ^{\alpha } - c_{\mathrm {K}}^{\alpha } \right) . \end{aligned}$$
(21)

An additional unknown, the relaxation variable \(c_{\mathrm {K}}\), satisfies a linear parabolic evolution equation with constant relaxation parameters \(\alpha ,\beta >0\). The system is of second order and of mixed parabolic-hyperbolic type. For \(\alpha \rightarrow \infty \), the solution of the parabolic relaxation model approaches the solution of the classical NSK equations, i.e. \((\rho ^{\alpha },\mathbf {u}^{\alpha }) \rightarrow (\rho ,\mathbf {u})\). The total energy of the relaxation system is given by

$$\begin{aligned} \mathcal {E}^{\alpha }[\rho ] = \int _{\Omega } \left( \frac{1}{2}\rho \left| \mathbf {u}\right|^2 + W(\rho ) + \frac{\alpha }{2} \left( \rho -c_{\mathrm {K}}\right)^2 + \frac{1}{2}\epsilon _{\mathrm {K}}^2 \gamma _{\mathrm {K}}\left| \nabla c_{\mathrm {K}}\right|^2 \right) {\text {d}}\mathbf {x}. \end{aligned}$$
(22)

Admissible solutions to Eqs. (19)–(21) are minimizers of Eq. (22).
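The loss of hyperbolicity of the van der Waals closure, Eq. (18), can be made concrete with a few lines of code. The sketch below evaluates the reduced pressure and its derivative with respect to the density at constant temperature. In the isothermal setting this derivative plays the role of the squared sound speed, so negative values mark states in which the eigenvalues of the first order part become imaginary. The function names and the chosen sample states are illustrative assumptions.

```python
def vdw_pressure(rho, T, a=3.0, b=1.0 / 3.0, R=8.0 / 3.0):
    """Reduced van der Waals pressure, Eq. (18)."""
    return rho * R * T / (1.0 - b * rho) - a * rho**2

def vdw_dp_drho(rho, T, a=3.0, b=1.0 / 3.0, R=8.0 / 3.0):
    """Derivative dp/drho at constant temperature; negative values indicate the elliptic region."""
    return R * T / (1.0 - b * rho) ** 2 - 2.0 * a * rho

# Critical point in reduced units: rho = 1, T = 1
p_crit = vdw_pressure(1.0, 1.0)
slope_crit = vdw_dp_drho(1.0, 1.0)

# Illustrative subcritical temperature: the slope at rho = 1 is negative
slope_sub = vdw_dp_drho(1.0, 0.85)
```

At the critical point the pressure equals one and the slope vanishes, while for the subcritical sample temperature the slope at \(\rho =1\) is negative.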

3 Numerical Methods

The multiphase solver is comprised of several building blocks. The bulk solver is based on a high order discontinuous Galerkin spectral element method (DGSEM). We use an efficient look up table to incorporate real gas equations of state. For the modelling of the phase interface we apply diffuse interface methods. In the Homogeneous Equilibrium Model (HEM), we rely on the EOS to describe phase transition. In the NSK model, capillarity effects are resolved in a phase interface of finite thickness.

3.1 Discontinuous Galerkin Method

The compressible Navier–Stokes equations and the parabolic relaxation model for the NSK equations are discretized by a discontinuous Galerkin spectral element method as described in [11, 12, 14]. The approach is suitable for general systems of conservation equations. In this paper, we restrict ourselves to conservation equations of the form

$$\begin{aligned} \mathbf {U}_t + \nabla _{x} \cdot \underline{\varvec{F}}(\mathbf {U},\nabla _{x} \mathbf {U}) = \varvec{Q} \,, \end{aligned}$$
(23)

where \(\mathbf {U}\) is the vector of the solution unknowns, \(\underline{\varvec{F}}\) is the corresponding flux containing the convective and the diffusive fluxes, and \(\varvec{Q}\) is the source term of the NSK relaxation model. The divergence operator in the physical space is defined as \(\nabla _{x}=\left( \frac{\partial }{\partial x} ,\frac{\partial }{\partial y},\frac{\partial }{\partial z} \right) ^T\).

In a three-dimensional domain we subdivide the computational space into non-overlapping hexahedral elements. Each element is mapped onto the reference cube element \(E:=[-1,1]^3\) by a mapping \(\varvec{x}(\varvec{\xi })\), where \(\varvec{\xi }=(\xi ,\eta ,\zeta )^\text {T}\) is the coordinate vector of the reference element. The mapping onto the reference element E transforms Eq. (23) to the system

$$\begin{aligned} \varvec{J} \mathbf {U}_t + \nabla _{\xi } \cdot \underline{\varvec{\mathcal {F}}}\left( \mathbf {U},\nabla _{\xi } \mathbf {U}\right) = \varvec{J}\varvec{Q} \,, \end{aligned}$$
(24)

with the Jacobian \(\varvec{J}\) and the divergence operator in the reference space \(\nabla _{\xi }=\left( \frac{\partial }{\partial \xi } ,\frac{\partial }{\partial \eta },\frac{\partial }{\partial \zeta } \right) ^\text {T}\). In each element, the solution and the fluxes are then approximated as polynomials

$$\begin{aligned} \mathbf {U}_h = \sum _{i,j,k=0}^{N} \varvec{\hat{U}}_{ijk} \psi _{ijk}(\varvec{\xi }) \quad \text {and} \quad \varvec{\mathcal {F}}^m_h = \sum _{i,j,k=0}^{N} \varvec{\hat{\mathcal {F}}}^m_{ijk} \psi _{ijk}(\varvec{\xi }) \,, \end{aligned}$$
(25)

where the superscript \(m=\{1,2,3\}\) denotes the flux in the direction of the Cartesian coordinates. The basis function \(\psi _{ijk}(\varvec{\xi })=l_i(\xi )l_j(\eta )l_k(\zeta )\) is built by the tensor product of one-dimensional Lagrange polynomials l of degree N. As interpolation nodes we choose Gauss-Legendre points. Due to the nodal character of the Lagrange basis, the degrees of freedom \(\varvec{\hat{U}}_{ijk}\) and \(\varvec{\hat{\mathcal {F}}}^m_{ijk}\) are values of the approximations of the solution and the flux vectors at the interpolation nodes. To obtain the discontinuous Galerkin formulation, the approximations (25) are inserted into (24) which is then multiplied by a test function \(\phi \), identical to the basis function \(\psi \), and then integrated in space. Integration by parts of the volume integral of the flux yields the weak formulation

$$\begin{aligned} \underbrace{ \frac{\partial }{\partial t}\int _{\Omega } \left( \varvec{J} \mathbf {U}_h \phi \right) \text {d} \varvec{\xi }}_{a} - \underbrace{ \int _{\Omega } \left( \underline{\varvec{\mathcal {F}}}_h \cdot \nabla _{\xi } \phi \right) \text {d} \varvec{\xi }}_{b} + \underbrace{ \int _{\partial \Omega } \left( \left[ \underline{\varvec{\mathcal {F}}}_h \cdot \varvec{n} \right] \phi \right) \text {d} \varvec{S}}_{c} = \int _{\Omega } \left( \varvec{J} \varvec{Q}_h \phi \right) \text {d} \varvec{\xi } \,. \end{aligned}$$
(26)

We identify three contributing parts: the volume integral of the time derivative of the solution (a), a volume integral of the fluxes (b) and a surface integral of the fluxes (c). The integrals are evaluated by Gauss-Legendre quadrature. To obtain an approximation of the flux \( \underline{\varvec{\mathcal {F}}}_h \cdot \varvec{n}\) at the element surface, a numerical flux function \(\varvec{\mathcal {G}} = \varvec{\mathcal {G}}(\mathbf {U}_L,\mathbf {U}_R)\) is introduced. It depends on the states left and right of the interface, \(\mathbf {U}_L\) and \(\mathbf {U}_R\), respectively. In the case of the viscous and heat conduction fluxes, the gradients are needed in addition. For the numerical flux, we use standard approximate Riemann solvers of the HLL and Lax–Friedrichs families [23]. The semi-discrete formulation (26) is integrated in time using explicit third- or fourth-order Runge–Kutta (RK) schemes [13]. For the viscous fluxes, the approach of Bassi and Rebay [3, 4] is used.
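The nodal character of the basis and the role of the Gauss-Legendre quadrature can be illustrated in one dimension. The following sketch is not taken from FLEXI; it uses NumPy's `leggauss` routine for the nodes and weights and a hypothetical helper `lagrange_basis`. It shows that the Lagrange polynomials satisfy \(l_i(\xi _j)=\delta _{ij}\) and that collocating interpolation and quadrature nodes yields a diagonal mass matrix.

```python
import numpy as np

def lagrange_basis(nodes, x):
    """Evaluate all 1D Lagrange polynomials l_i defined on `nodes` at the point x."""
    n = len(nodes)
    values = np.ones(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                values[i] *= (x - nodes[j]) / (nodes[i] - nodes[j])
    return values

N = 3                                                     # polynomial degree (example value)
nodes, weights = np.polynomial.legendre.leggauss(N + 1)   # Gauss-Legendre points on [-1, 1]

# Row q holds l_0(xi_q), ..., l_N(xi_q); the nodal property makes this the identity
nodal_matrix = np.array([lagrange_basis(nodes, xq) for xq in nodes])

# Mass matrix M_ij = sum_q w_q l_i(xi_q) l_j(xi_q), which reduces to diag(w)
mass_matrix = nodal_matrix.T @ np.diag(weights) @ nodal_matrix

# The degrees of freedom are simply the nodal values of the approximated function
u_hat = np.sin(nodes)
u_at_half = lagrange_basis(nodes, 0.5) @ u_hat            # interpolant evaluated at xi = 0.5
integral_x2 = weights @ nodes**2                          # quadrature of xi^2, exact value 2/3
```

In three dimensions the same construction is applied direction by direction through the tensor product \(\psi _{ijk}\).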

The DG method with high order accuracy is favourable in smooth parts of the flow. At discontinuities or strong gradients we apply the shock capturing of Sonntag and Munz [20, 21]. We switch locally to a second order accurate finite volume (FV) scheme, where the interpolation nodes of the DG polynomials are reorganized as an equidistant sub-grid on which the solution is stored as integral mean values. A modal Persson indicator [19] is used to switch between DG and FV cells.

3.2 Equation of State and Thermodynamic Equilibrium

As the thermodynamic coupling relation for the Navier–Stokes equations, the cubic Peng-Robinson (PR) EOS [18] is used,

$$\begin{aligned} p&= \frac{R_m T}{\frac{M}{\rho }-b}-\frac{a}{\left( \frac{M}{\rho } + \delta _1 b \right) \left( \frac{M}{\rho } + \delta _2 b \right) }, \end{aligned}$$
(27)

with the universal gas constant \(R_m\) and the molar weight of the mixture M. The parameter a accounts for intermolecular attraction forces, b is the co-volume, and the PR-specific parameters are \((\delta _1,\delta _2) = (1+\sqrt{2},1-\sqrt{2})\). The transformation of the pressure-explicit thermal EOS into a caloric one is provided by a residual function ansatz [17]. For two-phase phenomena, the thermodynamic modelling is performed with the HEM approach. The underlying assumption of thermodynamic equilibrium is defined by

$$\begin{aligned} T_v&= T_l, \end{aligned}$$
(28)
$$\begin{aligned} p_v&= p_l, \end{aligned}$$
(29)
$$\begin{aligned} \mu _v^k&= \mu _l^k, \end{aligned}$$
(30)

where the subscripts v and l denote the vapor and liquid side, respectively, \(\mu \) is the chemical potential, and Eq. (30) has to hold for all \(N_k\) species. For single-species systems, the vapor-liquid equilibrium (VLE) calculation is performed with the algorithm presented in [1]; for mixtures, a combined approach of TPD analysis and multi-species VLE calculation is used. The TPD function is defined in mole fraction space \(\mathbf {z}\) and given by

$$\begin{aligned} TPD(\mathbf {z}^{trial}) = \sum _{k=1}^{N_k} z_k^{trial} \left[ \mu _k(\mathbf {z}^{trial},T,p) - \mu _k(\mathbf {z}^{test},T,p) \right] . \end{aligned}$$
(31)

The superscript \((\cdot )^{test}\) denotes the feed composition, which is provided by the flow solver, and \((\cdot )^{trial}\) denotes all other possible molar compositions that fulfill the mass balance condition \(\sum _{k=1}^{N_k} z_k^{trial} = 1\). The TPD analysis is based on the idea of direct evaluation of the Gibbs free energy surface [2] and checks for a global minimum in Gibbs free energy at the present feed composition. TPD values greater than zero correspond to a stable state, smaller ones to an unstable one. For the analysis of the TPD function, the local minimization method with multiple initial guesses presented in [15] is used. The thermodynamically consistent modeling of the states in the two-phase region is provided by the HEM approach with

$$\begin{aligned} \epsilon ^{EQ} = x_v \epsilon _v + (1-x_v) \epsilon _l, \end{aligned}$$
(32)

where the specific internal energy serves as a placeholder for any caloric state variable and \(x_v\) is the vapor mass fraction defined by

$$\begin{aligned} x_v = \frac{1 / \rho - 1 / \rho _l}{ 1 / \rho _v - 1 / \rho _l}. \end{aligned}$$
(33)

Due to the loss of hyperbolicity inside the spinodal region with real gas EOS, the sound speed in the two-phase region in the HEM approach is modeled with the relation presented in [25],

$$\begin{aligned} \frac{1}{\rho a^2} = \frac{\alpha _v}{\rho _v a_v^2}+\frac{1-\alpha _v}{\rho _l a_l^2}, \end{aligned}$$
(34)

where a is the sound speed and \(\alpha \) the volumetric vapor fraction given by

$$\begin{aligned} \alpha _v = \frac{x_v \rho }{\rho _v}. \end{aligned}$$
(35)
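The HEM relations of this section can be combined into a small helper. The sketch below is a minimal illustration and not the tabulated implementation: it assumes that the saturated states of both phases are already known, and it assumes that the sound speed relation is the classical Wood formula with the phase densities in the denominators. All numerical values are hypothetical.

```python
def hem_mixture_state(rho, rho_v, rho_l, eps_v, eps_l, a_v, a_l):
    """Equilibrium mixture quantities from saturated vapor (v) and liquid (l) states."""
    x_v = (1.0 / rho - 1.0 / rho_l) / (1.0 / rho_v - 1.0 / rho_l)   # vapor mass fraction, Eq. (33)
    alpha_v = x_v * rho / rho_v                                      # vapor volume fraction, Eq. (35)
    eps_eq = x_v * eps_v + (1.0 - x_v) * eps_l                        # mass-weighted caloric variable, Eq. (32)
    # Assumed form of the mixture sound speed (Wood's relation with phase densities)
    inv_rho_a2 = alpha_v / (rho_v * a_v**2) + (1.0 - alpha_v) / (rho_l * a_l**2)
    a_mix = (1.0 / (rho * inv_rho_a2)) ** 0.5
    return x_v, alpha_v, eps_eq, a_mix

# Hypothetical saturated states, SI units, for illustration only
rho_v, rho_l = 10.0, 500.0
x_v, alpha_v, eps_eq, a_mix = hem_mixture_state(
    rho=255.0, rho_v=rho_v, rho_l=rho_l,
    eps_v=4.0e5, eps_l=1.0e5, a_v=200.0, a_l=1000.0)

# Consistency check: the volume-weighted phase densities recover the mixture density
rho_check = alpha_v * rho_v + (1.0 - alpha_v) * rho_l
```

For these sample values the mixture sound speed lies well below the sound speed of either pure phase, which is the typical behaviour of this relation.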
Fig. 1 Quadtree table: left first stage, right second stage; bit numbers (bn) are used for fast localization of quadtree elements in the definition area given by \(\mathcal {P}_i\)

3.3 Look up Tables and Extension to Shared Memory Trees

The current Cray machine Hazel Hen has 185,088 cores in its current expansion stage, distributed over 7712 nodes with 24 cores each. Each node has 128 GB of memory. The next expansion stage, Hawk, which is planned for spring 2020, will have approximately 640,000 cores on 5000 nodes. The ratio of cores to nodes will accordingly more than quintuple, from \(N_{cores}/N_{nodes}=24\) at present to \(N_{cores}/N_{nodes}=128\). It is important to note that the memory available on a node is not increased and will therefore amount to 1 GB per core. Scalable and highly efficient CFD codes for high-performance computing (HPC) systems, which are perfectly adapted to old architectures, should keep pace with such new developments. To maintain efficiency, the algorithms have to be modified. Examples are memory-consuming algorithms, which can be found in multi-phase and multi-component simulations in combination with so-called look-up table approaches [9, 10]. Today, these tables are built on modern data structures such as quadtrees or octrees, see Figs. 1 and 2. Quadtrees and octrees make use of properties of so-called space-filling curves for fast data localization. Here, the Morton curve is popular because it inherently allows the data to be accessed via bit operations

$$\begin{aligned} \text {data position}=f(\text {bit number}), \end{aligned}$$
(36)

see Fig. 1. Despite the use of such modern data structures, today's CFD simulations may quickly reach the memory limits if large-scale, high-fidelity simulations are performed. In this context, we discuss in this paper the implementation and application of tree structures on high-performance computers in combination with MPI 3.0 shared memory (Fig. 3).
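The relation between data position and bit number, Eq. (36), can be sketched for a uniformly refined quadtree. The code below interleaves the bits of the two cell indices into a Morton key. The bit ordering, the function names and the assumption of uniform refinement are illustrative choices and do not necessarily match the layout used in the tabulation framework.

```python
def morton_encode_2d(ix, iy, levels):
    """Interleave the bits of the cell indices (ix, iy) into one Morton key."""
    key = 0
    for bit in range(levels):
        key |= ((ix >> bit) & 1) << (2 * bit)        # x bits occupy the even positions
        key |= ((iy >> bit) & 1) << (2 * bit + 1)    # y bits occupy the odd positions
    return key

def locate_leaf(x, y, x_min, x_max, y_min, y_max, levels):
    """Map a point of the tabulated range to the Morton key of its leaf cell."""
    n = 1 << levels                                   # number of cells per direction
    ix = min(int((x - x_min) / (x_max - x_min) * n), n - 1)
    iy = min(int((y - y_min) / (y_max - y_min) * n), n - 1)
    return morton_encode_2d(ix, iy, levels)

# First stage (one level): the four quadrants receive the keys 0, 1, 2, 3
first_stage_keys = [morton_encode_2d(ix, iy, 1) for iy in range(2) for ix in range(2)]

# Second stage (two levels): a point in the lower right corner of the unit square
key_lower_right = locate_leaf(0.9, 0.1, 0.0, 1.0, 0.0, 1.0, levels=2)
```

Because the key is obtained from shifts and bitwise operations only, locating a leaf requires no search through the tree.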

Fig. 2 Octree table: left one stage, right two stages

Fig. 3 Left: tree data structure, right: refined quad tree

In the last period, we have extended our tabulation framework in order to use the look-up tables as efficiently as possible on future high-performance computers. We first give some information about the parallelization strategy of the CFD solver FLEXI [8]. FLEXI is based on domain decomposition, which divides the computational grid among the MPI processes depending on the number of cores used, see Fig. 4. For the domain decomposition, again a space-filling curve, more precisely the Hilbert curve, is used. The curve has the special property that it optimally distributes the different MPI regions with respect to the volume/surface ratio, even on unstructured grids. Figure 4 shows such a division. To ensure that each MPI process can access the data in the table, each MPI process has to initialize and allocate its own table when using standard MPI features. With MPI 3.0 features such as shared memory windows, the number of tables can be reduced to one per node.

Fig. 4 FLEXI [8] parallelization strategy with domain decomposition via space filling curve: left computational quad mesh, middle decomposition with 4 MPI processes, right decomposition with 16 MPI processes

However, modern data structures generally consist of chained pointer lists, which cannot be used directly with the MPI 3.0 shared memory feature. The reason is that each MPI process has its own virtual address space, see Fig. 5. This has consequences for the way in which the tree structure has to be read in and accessed during the simulation on HPC systems. Figure 6 depicts the standard approach to store and access the tree data. Here, each branch of the tree stores a small portion of the whole data. Furthermore, each MPI process reads and allocates the data during IO. Figure 7 depicts the alternative approach, in which the tree data are stored in and accessed through an MPI 3.0 shared memory window. Here, unlike before, each branch of the tree only stores two integer IDs that mark a range in the global shared memory array. An important aspect is that during IO only one MPI process on the node is allowed to read and allocate the data. Nevertheless, each MPI process has to read and store the empty tree, because each MPI process still has to know the relative path to the unique IDs in the last branch. With this approach, it is possible to maintain the efficient tree data structure while being able to store and access data sets that are several orders of magnitude larger.
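The index-range layout described above can be sketched without any MPI calls. In the following illustration, the class `Leaf` and the helper functions are hypothetical: each leaf of the empty tree stores only two integers, and all payloads are packed into one flat array. In an actual run this flat array would live in an MPI 3.0 shared memory window that is allocated once per node (for instance via `MPI_Win_allocate_shared`), whereas here an ordinary NumPy array stands in for it.

```python
import numpy as np

class Leaf:
    """Leaf of the per-process empty tree: only an index range, no payload."""
    def __init__(self, first, last):
        self.first = first    # first index in the global array
        self.last = last      # one past the last index in the global array

def build_shared_layout(leaf_payloads):
    """Pack all leaf payloads into one flat array and return (array, leaves)."""
    sizes = [len(p) for p in leaf_payloads]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    global_data = np.concatenate(leaf_payloads)       # stand-in for the shared memory window
    leaves = [Leaf(int(offsets[i]), int(offsets[i + 1])) for i in range(len(sizes))]
    return global_data, leaves

def leaf_data(global_data, leaf):
    """Return a view (no copy) of the coefficients that belong to one leaf."""
    return global_data[leaf.first:leaf.last]

# Three hypothetical leaves with different numbers of coefficients
payloads = [np.arange(0.0, 5.0), np.arange(5.0, 8.0), np.arange(8.0, 12.0)]
global_data, leaves = build_shared_layout(payloads)
second_leaf = leaf_data(global_data, leaves[1])
```

Since integer offsets are independent of the virtual address space of a process, the same empty tree is valid on every MPI process of a node.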

Fig. 5 Mapping from virtual address space for each MPI process to physical address space, here 4 MPI processes (the colors are chosen consistent with the domain decomposition in Fig. 4)

Fig. 6 Tree data structure without MPI 3.0 shared memory: each branch contains the data

Fig. 7 Tree data structure with MPI 3.0 shared memory: each branch only contains an integer id for the global data array, the global data array is allocated in a shared memory window

4 Results

4.1 Performance Comparison of Tree Data Structures with and Without MPI 3.0 Shared Memory

In this section we investigate the different data structures in terms of performance and memory usage.

For the comparison we use the performance index

$$\begin{aligned} \text {PID} = \frac{\text {wall-clock-time} \cdot \#\text {processors}}{\#\text {DOF} \cdot \#\text {time steps} \cdot \#\text {RK-stages}} . \end{aligned}$$
(37)

The results are obtained with the open source code FLEXI in combination with octree tables. Note that the IO of FLEXI is based on the HDF5 standard. To ensure a fair comparison, we have chosen a simple test case, the standard lid-driven cavity in two dimensions, see [5]. We choose a binary mixture of two different ideal gases instead of performing a one-component simulation, as is typically done in the literature. First, we look at the performance of both data structures defined in Sect. 3.3. We perform each simulation six times and average the measurements to cancel out hardware influences. The comparison was done on 8 nodes with 192 processes. The tree data was refined up to 7 levels, resulting in a memory size of about 2.8 GB. Each octant represents the data in a three-dimensional polynomial basis of degree 4. The first two rows of Table 2 list the results of the performance test. We notice a slightly higher PID for the MPI 3.0 implementation, which is most likely due to the additional index mapping needed to obtain the position in the global shared memory array. In the third row, we compare the time needed to read and allocate the data before the simulation starts. We note that the IO of the MPI 3.0 implementation differs in that we do not read the whole tree from the HDF5 file at once. With the shared memory option, we read, allocate and deallocate each octant successively from the HDF5 file to save as much memory as possible. Here, we notice a non-negligibly longer IO time for the MPI 3.0 implementation. The factor between the standard and the shared memory approach is about 6 (Table 1).
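For reference, the performance index of Eq. (37) is a one-line computation. The sketch below uses a hypothetical function name and made-up run data; the numbers are not measurements from the tables of this section.

```python
def performance_index(wall_clock_time, n_procs, n_dof, n_time_steps, n_rk_stages):
    """Eq. (37): core time spent per degree of freedom, time step and RK stage."""
    return wall_clock_time * n_procs / (n_dof * n_time_steps * n_rk_stages)

# Made-up run for illustration only: 100 s on 192 processes
pid = performance_index(wall_clock_time=100.0, n_procs=192,
                        n_dof=1_000_000, n_time_steps=2000, n_rk_stages=5)
```

For these made-up numbers the index evaluates to \(1.92\cdot 10^{-6}\) s. Because the index is normalized by the number of processes and degrees of freedom, runs of different size can be compared directly.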

Table 1 Numerical setup of the Lid driven cavity test problem
Table 2 Performance comparison for the different data structures

Tables 3 and 4 contain memory comparisons.

Table 3 Memory usage depending on tree level, here we tabulated a binary mixture of Helium/Air
Table 4 Theoretical memory usage by using non-shared memory and shared memory data structures on different architectures

Here, we notice the huge improvement with the MPI 3.0 shared memory implementation. For the planned architecture Hawk we will (theoretically) be able to store and access about 128 times more memory than with the old algorithm.
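The theoretical gain follows from simple bookkeeping, which the sketch below reproduces under a simplifying assumption: the whole node memory is available to the table and the memory needed by the flow solver itself is ignored. The function names are hypothetical; the node sizes and the table size of 2.8 GB are the values quoted in the text.

```python
def table_memory_per_node(table_gb, procs_per_node, shared):
    """Memory occupied by the table on one node."""
    return table_gb if shared else table_gb * procs_per_node

def max_table_size_gb(node_memory_gb, procs_per_node, shared):
    """Largest table that fits on a node (upper bound, solver memory ignored)."""
    return node_memory_gb if shared else node_memory_gb / procs_per_node

# Hazel Hen: 24 processes and 128 GB per node, table of about 2.8 GB
hazel_non_shared = table_memory_per_node(2.8, 24, shared=False)
hazel_shared = table_memory_per_node(2.8, 24, shared=True)

# Hawk: 128 processes and 128 GB per node
hawk_gain = max_table_size_gb(128.0, 128, shared=True) / max_table_size_gb(128.0, 128, shared=False)
hazel_gain = max_table_size_gb(128.0, 24, shared=True) / max_table_size_gb(128.0, 24, shared=False)
```

Under these assumptions the gain equals the number of processes per node, which gives the factor of 128 for Hawk and 24 for Hazel Hen.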

4.2 Navier–Stokes Multi-component Simulations

The multi-component Navier–Stokes model was used to compare simulations conducted with direct evaluation of the EOS against simulations using tables with different refinement levels. As test case, a two-dimensional shear layer of nitrogen and n-dodecane in the domain \([0,0.2] \times [-0.15,0.15]\) \(\text {m}^2\) was investigated. The initial states of the pure species are summarized in Table 5. As initial condition, a base flow in x-direction superposed by a y-velocity disturbance was used, given by

$$\begin{aligned} u_{N_2}&= 2 M_{c,0} a_{N_2} \left[ 1 + \left( \frac{a_{N_2}}{a_{C_7H_{16}}} \right) \sqrt{\frac{\rho _{N_2} Z_{N_2}}{\rho _{C_7H_{16}} Z_{C_7H_{16}}}} \right] ^{-1}, \end{aligned}$$
(38)
$$\begin{aligned} u_{C_7H_{16}}&= - \sqrt{\frac{\rho _{N_2} Z_{N_2}}{\rho _{C_7H_{16}} Z_{C_7H_{16}}}} u_{N_2}, \end{aligned}$$
(39)
$$\begin{aligned} u (x,t=0)&= u_0 \bigg |erf \left( \frac{\sqrt{\pi } y}{\delta _{\omega ,0}} \right) \bigg |, \end{aligned}$$
(40)
$$\begin{aligned} Y_{C_7H_{16}} (x,t=0)&= 1 - Y_{N_2}, \end{aligned}$$
(41)
$$\begin{aligned} Y_{N_2} (x,t=0)&= 0.5+0.5 \; erf \left( \frac{\sqrt{\pi } y}{\delta _{\omega ,0}} \right) , \end{aligned}$$
(42)
$$\begin{aligned} v (x,y,t=0)&= 0.1 \; max \left( u_0 \right) \sin \left( \frac{8 \pi x}{\delta _{\omega ,0}} \right) \exp \left\{ - \left( \frac{y}{\delta _{\omega ,0}} \right) ^2 \right\} \end{aligned}$$
(43)

and

$$\begin{aligned} \rho = \rho (T,p,\mathbf {Y}). \end{aligned}$$
(44)
Table 5 Specified initial conditions of the base flow for the pure species of the mixing layer test case

Here, Z is the compressibility factor, \(M_{c,0}\) is the Mach number, which was set to 0.4, and \(\delta _{\omega ,0}\) is the initial blending thickness between the two species with \(\delta _{\omega ,0} = 6.859 \cdot 10^{-3}\) m.
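The species and disturbance profiles of the initial condition are simple enough to be written down directly. The following sketch evaluates the error function profile of the nitrogen mass fraction, Eq. (42), and the transverse velocity disturbance, Eq. (43). The function names and the value used for \(\max (u_0)\) are hypothetical; only \(\delta _{\omega ,0}\) is taken from the text.

```python
import math

DELTA_OMEGA_0 = 6.859e-3   # initial blending thickness in m (value from the text)

def nitrogen_mass_fraction(y, delta=DELTA_OMEGA_0):
    """Error function blending between the two streams, Eq. (42)."""
    return 0.5 + 0.5 * math.erf(math.sqrt(math.pi) * y / delta)

def fuel_mass_fraction(y, delta=DELTA_OMEGA_0):
    """Mass fraction of the second species, the complement of nitrogen."""
    return 1.0 - nitrogen_mass_fraction(y, delta)

def transverse_velocity(x, y, u0_max, delta=DELTA_OMEGA_0):
    """Sinusoidal y-velocity disturbance with Gaussian decay, Eq. (43)."""
    return 0.1 * u0_max * math.sin(8.0 * math.pi * x / delta) * math.exp(-(y / delta) ** 2)

U0_MAX = 100.0   # hypothetical maximum base flow velocity in m/s

y_center = nitrogen_mass_fraction(0.0)                        # mid-plane: equal mass fractions
y_top = nitrogen_mass_fraction(0.15)                          # upper boundary: pure nitrogen
v_peak = transverse_velocity(DELTA_OMEGA_0 / 16.0, 0.0, U0_MAX)
```

The disturbance is confined to a band of a few \(\delta _{\omega ,0}\) around the mid-plane and reaches at most ten percent of the maximum base flow velocity.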

Fig. 8 Temporal snapshot of nitrogen mass fraction at \(t=4~\text {ms}\) for simulations conducted by direct use of the EOS (left), coarse tables (middle) and refined tables (right)

The results are visualized in Figs. 8 and 9. In both snapshots, we observe some differences between the three computations. This is because the chosen Kelvin–Helmholtz test problem is a highly sensitive initial value problem: the different thermodynamic approximations quickly lead to different results. In summary, we can show that the tabulation approach is suitable for multi-component simulations in the super-critical regime; nevertheless, further investigations are necessary.

Fig. 9 Temporal snapshot of nitrogen mass fraction at \(t=6~\text {ms}\) for simulations conducted by direct use of the EOS (left), coarse tables (middle) and refined tables (right)

4.3 Navier–Stokes–Korteweg

The parabolic relaxation model for the NSK equations was used to investigate head-on collisions of two droplets.

4.3.1 Simulation Setup

The initial conditions were

$$\begin{aligned} \rho (\mathbf {x},t=0)&= \rho _{\mathrm {vap}} + \frac{\rho _{\mathrm {vap}}-\rho _{\mathrm {liq}}}{2} \sum _{i=1}^2 \left(\mathrm {tanh} \left( \frac{d_i-r_i}{2\sqrt{\gamma _{\mathrm {K}}\epsilon _{\mathrm {K}}^2}} \right) \right) \end{aligned}$$
(45)
$$\begin{aligned} u(\mathbf {x},t=0)&= {\left\{ \begin{array}{ll} \frac{v_{\mathrm {ini}}}{2} + \left( 1 - \mathrm {tanh} \left(\frac{d_1-r_{\mathrm {d}}}{2\sqrt{\gamma _{\mathrm {K}}\epsilon _{\mathrm {K}}^2}} \right) \right) \quad &{} \text {if} \quad x<0.5, \\ \frac{- v_{\mathrm {ini}}}{2} + \left( 1 - \mathrm {tanh} \left(\frac{d_2-r_{\mathrm {d}}}{2\sqrt{\gamma _{\mathrm {K}}\epsilon _{\mathrm {K}}^2}} \right) \right) \quad &{} \text {if} \quad x\ge 0.5, \\ \end{array}\right. }\end{aligned}$$
(46)
$$\begin{aligned} v(\mathbf {x},t=0)&= 0, \end{aligned}$$
(47)
$$\begin{aligned} w(\mathbf {x},t=0)&= 0, \end{aligned}$$
(48)

where \(\rho _{\mathrm {vap}}=0.3197\), \(\rho _{\mathrm {liq}}=1.8071\) are the Maxwellian densities at \(T_{\mathrm {ref}}=0.85\). The droplet radii were \(r_1=r_2=0.5\) and the distance was given by

$$\begin{aligned} d_i = \parallel \mathbf {x}- \mathbf {x}_{0,i} \parallel , \end{aligned}$$
(49)

where \(\mathbf {x}_{0,1}=(0.3,0.5,0.5)^{\top }\) and \(\mathbf {x}_{0,2}=(0.7,0.5,0.5)^{\top }\) are the initial positions of the droplets. Four cases were investigated in which the number, position, and size of the droplets remained the same, while the model parameters and initial velocities were varied. The parameters are summarized in Table 6. The computational domain was \(\Omega =[0,1]^3\), discretized by 64 elements in each direction. The polynomial degree was \(N=3\), which yielded \(256^3\) degrees of freedom (DOF). Time integration was performed implicitly with \(\mathrm {CFL}=100\) using a fourth-order ESDIRK scheme with six stages. The simulations were performed on the Hazel Hen supercomputer at HLRS using 200 nodes.
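The stated DOF count can be checked with a short calculation. The sketch below assumes a tensor-product basis with \(N+1\) nodes per direction in each element, which is what the quoted figure of \(256^3\) implies; the variable names are illustrative only and the count is per solution variable.

```python
# DOF count for the droplet collision setup (values taken from the text).
# Assumption: each element carries N + 1 nodes per spatial direction.
elements_per_direction = 64
polynomial_degree = 3  # N

nodes_per_direction = elements_per_direction * (polynomial_degree + 1)
total_dof = nodes_per_direction ** 3

print(nodes_per_direction)  # 256
print(total_dof)            # 16777216
```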

Table 6 Parameters for head-on droplet collision simulations

4.3.2 Simulation Results

The isocontour of the mean density, \(\rho _{\mathrm {mean}}=1.0634\), of the solution of Case A is shown in Fig. 10 at different time instances. The two droplets were pushed towards each other and coalesced. A flat disc formed for \(t>0.12\), which broke up into a ring and a small droplet in the center at \(t\approx 0.24\). Both the ring and the centered droplet evaporated, and for \(t \rightarrow \infty \) only vapour remained, since the average density was in the stable vapour phase.
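The isocontour value quoted above agrees with the arithmetic mean of the two Maxwellian densities given in the setup. The source does not state how \(\rho _{\mathrm {mean}}\) was defined, so the following check is an observation under that assumption rather than a documented definition.

```python
# Maxwellian densities at T_ref = 0.85 (values from the simulation setup).
rho_vap = 0.3197
rho_liq = 1.8071

# Assumption: the isocontour level is the arithmetic mean of both densities.
rho_mean = 0.5 * (rho_vap + rho_liq)

print(round(rho_mean, 4))  # 1.0634
```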

Fig. 10 Results for Case A with parameters \(\epsilon _{\mathrm {K}}=1 \cdot 10^{-3}\), \(\gamma _{\mathrm {K}}=100\), \(v_{\mathrm {ini}}=3.0\)

In Case B, \(\epsilon _{\mathrm {K}}\) was increased and \(\gamma _{\mathrm {K}}\) was decreased, and different phenomena were observed. The isocontour of the mean density is shown in Fig. 11. Again, the two droplets merged and a disc formed. The disc flattened and breakup occurred at its centre; however, no droplet was formed and only a ring remained. Eventually, the ring evaporated and the domain was filled by a stable vapour phase.

Fig. 11 Results for Case B with parameters \(\epsilon _{\mathrm {K}}=1 \cdot 10^{-2}\), \(\gamma _{\mathrm {K}}=1.00\), \(v_{\mathrm {ini}}=3.0\)

In Case C, \(\gamma _{\mathrm {K}}\) was reduced further, which led to a thinner phase interface. The isocontour is shown in Fig. 12. After coalescence, the disc formed again, but no breakup occurred and the disc retained its shape until it evaporated completely.
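The effect of the parameters on the interface can be illustrated with the length scale \(2\sqrt{\gamma _{\mathrm {K}}\epsilon _{\mathrm {K}}^2}\) that appears in the tanh profiles of Eqs. (45) and (46). The sketch below evaluates it for Cases A to C using the parameter values from the captions of Figs. 10 to 12. Treating this quantity as a proxy for the interface thickness is an assumption made here, and the function name is hypothetical.

```python
import math


def interface_length_scale(eps_k: float, gamma_k: float) -> float:
    """Return 2*sqrt(gamma_K * eps_K^2), the tanh denominator in Eqs. (45)-(46)."""
    return 2.0 * math.sqrt(gamma_k * eps_k ** 2)


# Parameter values as listed in the captions of Figs. 10-12.
cases = {
    "A": {"eps_k": 1e-3, "gamma_k": 100.0},
    "B": {"eps_k": 1e-2, "gamma_k": 1.0},
    "C": {"eps_k": 1e-2, "gamma_k": 0.05},
}

scales = {name: interface_length_scale(**p) for name, p in cases.items()}
for name, value in scales.items():
    print(f"Case {name}: {value:.3e}")
```

Under this assumption, Cases A and B share the same length scale of 0.02, while Case C yields roughly \(4.5 \cdot 10^{-3}\), in line with the thinner interface described above.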

Fig. 12 Results for Case C with parameters \(\epsilon _{\mathrm {K}}=1 \cdot 10^{-2}\), \(\gamma _{\mathrm {K}}=0.05\), \(v_{\mathrm {ini}}=3.0\)

Case D used the same parameters as Case C but with an increased initial droplet velocity. The isocontour is shown in Fig. 13. The higher droplet momentum led to a stronger impact, such that the disc quickly broke up and a ring and a centered droplet remained.

Fig. 13 Results for Case D with parameters \(\epsilon _{\mathrm {K}}=1 \cdot 10^{-2}\), \(\gamma _{\mathrm {K}}=0.05\), \(v_{\mathrm {ini}}=4.0\)

The total energy, Eq. (22), was calculated in each time step. As seen in Fig. 14, the total energy decreased monotonically until a minimum was reached. Hence, the solutions produced by the relaxation model were admissible.
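A check of this kind can be automated on the recorded energy history. The following is a minimal sketch, not the procedure used in the simulations: the function name, the tolerance parameter, and the sample values are hypothetical and serve only to illustrate the monotonicity test.

```python
from typing import Sequence


def is_monotonically_nonincreasing(values: Sequence[float], tol: float = 0.0) -> bool:
    """Return True if no entry exceeds its predecessor by more than tol."""
    return all(b <= a + tol for a, b in zip(values, values[1:]))


# Hypothetical energy samples for illustration only (not simulation data).
energy_history = [1.00, 0.97, 0.95, 0.95, 0.94]
print(is_monotonically_nonincreasing(energy_history))
```

A small positive `tol` could be used to tolerate round-off in the discrete energy evaluation.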

Fig. 14 Decay of total energy for the head-on droplet collisions

5 Summary and Conclusions

In this work, we investigated the use of modern data structures on high performance computers. In this context, a new implementation strategy for shared-memory look-up tables for binary mixtures was introduced. We were able to show that a change in hardware architecture on high performance computers, e.g. from Hazel Hen to Hawk, has a great impact on the old algorithms. With the new implementation, we are able to store and access about 128 times more memory than with the old algorithm. The simulation and comparison of a multi-component real gas shear layer with the exact EOS and the tabulation approach led to reasonable results; however, further investigations are necessary.

In addition, 3D simulations of colliding droplets were carried out using a parabolic relaxation model of the Navier–Stokes–Korteweg diffuse interface model. A variation of model parameters produced a variation in the coalescence behaviour. Future research aims at validation with experimental results.