1 Introduction

When dealing with nonlinear conservation laws, it is well known that discontinuous solutions can emerge, even for smooth initial data [14]. The numerical approximation of the discontinuous solution will develop non-physical oscillations around the discontinuity, which in turn will negatively impact the accuracy of the numerical scheme. There exist many different stabilisation methods to control these oscillations, for example, through the addition of a viscous term (as denoted by the right-hand side of (1)) or use of limiters:

$$\begin{aligned} \frac{\partial }{\partial t} u + \nabla \cdot f(u) = \nabla \cdot (\nu (u)\nabla u). \end{aligned}$$
(1)

Neural networks regained popularity in the past decade due to the computational tractability of the back-propagation algorithm, used for the learning of weights and biases in a deep neural network. Deep neural networks have been shown to generate robust models for classification in many areas of application [23, 39] and, theoretically, to generate universal classifiers and function approximators [34, 35, 38]. In the context of computational fluid dynamics, the idea of using artificial neural networks as troubled-cell indicators was first explored in [31], motivated by the objective to find a universal troubled-cell indicator that can be used for general conservation laws. The authors show that this type of approach is promising, performing better than traditional, parameter-dependent limiters. Further studies following a similar idea have been conducted for other numerical schemes [4, 28].

Using limiters is a common way to perform shock capturing in a numerical scheme. In the context of the discontinuous Galerkin (DG) method, we can point out, among many, the high-order (HIO) limiter [6, 22], which does the limiting in a hierarchical manner. In particular, [22] relies on the modal representation of the numerical solution and it is formulated specifically for the modal DG method using Legendre polynomials as basis functions. For high-enough resolution, it does not clip the solution extrema.

Transfer learning is concerned with using a model built using a particular source data distribution on a different (but related) target data distribution [42]. One simple example is the task of spam filtering, where a model is used to discriminate between spam and non-spam emails—a model can be trained on the data of a particular user and adapted to be used on the data of a new user, who might receive significantly different emails. Following the same reasoning, we are interested in training shock-indicators that work in different numerical schemes, while using only (or the majority of the) training data from a particular numerical scheme. This can be useful, for example, in the case where there is access to a solver for which labeled data are easy to obtain.

In this work, we are interested in studying two main questions.

  1. (i)

    Is it possible to learn a data-driven shock-indicator function which requires minimal user input once trained?

  2. (ii)

    Can this shock-indicator be used in different numerical schemes, leading to stabilisation methods which are agnostic to the underlying numerical scheme?

The paper is structured as follows: we start with some preliminaries in Sect. 2; the methodology of training a data-driven shock-indicator and its integration with existing computational fluid dynamics (CFD) codes are presented in Sect. 3; in Sect. 4, the construction of the dataset is described (as well as the extension to two-dimensional problems) and Sect. 5 describes the transfer learning strategy. In Sects. 6 and 7, numerical results for one-dimensional and two-dimensional problems, respectively, are shown. We conclude the paper with a discussion and outlook in Sect. 8 and finally, our conclusions are drawn in Sect. 9.

In the spirit of open and reproducible science, all the datasets, trained models and some solvers are made available in a public repository [15].

2 Preliminaries: the Runge-Kutta Discontinuous Galerkin (RKDG) Scheme

Let us consider the classical modal RKDG method [10] using a simple scalar problem in one space dimension:

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t u + \partial _x f( u ) = 0, \quad (x,t) \in \varOmega \times [0,\infty ],\\ u(t=0) = u_0, \\ u_{\partial \varOmega } = g. \end{array}\right. } \end{aligned}$$
(2)

Let \(\varOmega \subset {\mathbb {R}}\) be a regular domain which is discretised by N elements \(K_p = [x_{p-1/2},x_{p+1/2}]\) for \(p=1,\cdots ,N\). Consider the local space \({\mathcal {V}}\) given by the set \(\{\phi _i\}_{i=0}^{n}\) of one-dimensional Legendre polynomials with degree at most n in x. For any element K, the numerical solution is written as

$$\begin{aligned} u^K(x,t) = \sum _{i=0}^{n} {\hat{u}}^K_i(t) \phi _i(x), \end{aligned}$$

where the modal coefficient \({\hat{u}}^K_i(t)\) is obtained by the \(L^2\) projection of the solution u(x) in element K on the ith Legendre basis polynomial. The DG method is based on a weak form of (2), obtained by projecting it on the polynomial basis, followed by an integration by parts. We obtain the following semi-discrete formulation of the DG method:

$$\begin{aligned} \frac{\mathrm{d} {\hat{u}}^K_i}{\mathrm{d}t} + \left[ {\hat{f}}(u^K(x,t))\phi _i(x)\right] _{x_{p-1/2}}^{x_{p+1/2}} - \int _{K_p} f(u^K(x,t)) \partial _x \phi _i(x) \mathrm{d}x = 0,\quad i=0,\cdots ,n, \end{aligned}$$

where we exploited the fact that Legendre polynomials form an orthonormal basis. Note that the surface term in the previous equation needs a Riemann solver to compute a continuous numerical flux at element boundaries, noted here \({\hat{f}}\). Once the spatial component has been discretised, we are left with an ordinary differential equation of the form

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t} u = {\mathcal {L}}(u), \end{aligned}$$

where \({\mathcal {L}}\) denotes the DG discretisation operator. Integration in time is performed using a strong stability preserving (SSP) RK method [17]. The time step has to fulfill a Courant–Friedrichs–Lewy (CFL) condition to achieve numerical stability, which for the RKDG scheme reads [11]

$$\begin{aligned} \Delta t = \frac{C}{2n + 1} \frac{\Delta x}{\left| v_\mathrm{max}\right| }, \end{aligned}$$

where n is the polynomial degree and C is a constant usually set to \(C=0.8\).

The method described above can compute solutions to (2) which are smooth (or contain weak shocks or other discontinuities). However, if the discontinuities are strong, the numerical solution will become significantly oscillatory. To overcome these difficulties, a slope limiter is used after each RK inner stage.

Now we describe the limiting procedure for a modal RKDG scheme as in [36], considering the minmod (MIN) limiter [18].

The modal coefficient \({\hat{u}}_1^K\) for the linear term of the DG basis is modified as

$$\begin{aligned} \tilde{u}_1^K = \frac{1}{\sqrt{3}}\mathrm{minmod}\left( \sqrt{3}{\hat{u}}_1^K, \frac{1}{2}\left( {\hat{u}}_0^K-{\hat{u}}_0^{K_\mathrm{l}}\right) , \frac {1}{2} \left( {\hat{u}}_0^{K_\mathrm{r}}-{\hat{u}}_0^K\right) \right) , \end{aligned}$$

here \(\tilde{u}_1^K\) is the new weight, \(K_\mathrm{l}\) denotes the left neighbouring cell and \(K_\mathrm{r}\) the right neighbouring cell, and the MIN function is defined as

$$\begin{aligned} \text{minmod}(a,b,c) = \left\{ \begin{array}{ll} s \min (|a|,|b|,|c|), &{} \quad s = \text{sign}(a) = \text{sign}(b) =\text{sign}(c), \\ 0, &{} \quad \text{ otherwise.} \end{array} \right. \end{aligned}$$

In a system, each component of the conserved variables is limited separately. The \(\sqrt{3}\)-factors account for the scaling of the Legendre polynomial. If the limited weights are the same as the old weights (i.e., \(\tilde{u}_1^K = {\hat{u}}_1^K\)), the solution \(u^K\) is unlimited. Otherwise, the limited numerical solution becomes

$$\begin{aligned} \tilde{u}^K = {\hat{u}}_0^K + \tilde{u}_1^K \phi ^K_1, \end{aligned}$$

discarding further HIO information. Although this limiter effectively reduces overshoots and oscillations, it can also be triggered at smooth extrema and lead to an unnecessary loss of HIO information. To avoid clipping of the solution at smooth extrema, the limiter presented above can be improved by a bounded version [11],

$$\begin{aligned} \text{minmodB}(a,b,c) = \left\{ \begin{array}{ll} a, &{} \quad \text{ if } |a| \leqslant M(\Delta x ^K)^{2},\\ \text{minmod}(a,b,c), &{} \quad \text{ otherwise.} \end{array} \right. \end{aligned}$$

Here M is a user-defined parameter that gives an estimate of the smoothness of the solution u(x). Note that M can take a value in a large range of positive numbers, and that it is usually a global quantity fixed at the beginning of the numerical experiment. Thus, this can be a drawback if the solution has different smoothness properties across the domain (in space and time).
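As a concrete illustration, the two functions above translate into the following minimal sketch (plain Python; the function and variable names are ours, and the TVB branch falls back to the standard minmod, consistent with the definition above):

```python
import numpy as np

def minmod(a, b, c):
    """Return s * min(|a|, |b|, |c|) if a, b and c share the same sign s, else 0."""
    s = np.sign(a)
    if s != 0 and s == np.sign(b) == np.sign(c):
        return s * min(abs(a), abs(b), abs(c))
    return 0.0

def minmod_tvb(a, b, c, M, dx):
    """TVB-modified minmod: keep the original slope when |a| is below M * dx**2,
    which is meant to avoid clipping smooth extrema."""
    if abs(a) <= M * dx ** 2:
        return a
    return minmod(a, b, c)
```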

Conceptually, a limiter can be thought of as having two sub-steps:

  1. (i)

    a shock-indicator procedure \({\mathcal {C}}\) that identifies a “troubled cell” (a cell which needs limiting);

  2. (ii)

    a reconstruction procedure \(\varPi\) that modifies the solution polynomial in those troubled cells into a reconstructed polynomial which is less oscillatory.

Then we can write a limiter as a function \(s\) that takes some local properties of the solution (let us denote this with the map X(u(x))), and returns a modified solution \(\tilde{u}(x)\), which has some desired properties (e.g., non-oscillatory, maximum principle preserving, etc.):

$$\begin{aligned} s\left( u(x), X\left( u(x)\right) \right) = \left\{ \begin{array}{ll} u(x), &{}\quad \text{ if } {\mathcal {C}}(X(u)) \text{ does } \text{ not } \text{ identify } \text{ a } \text{ troubled } \text{ cell, } \\ \varPi (u(x)), &{}\quad \text{ if } {\mathcal {C}}(X(u)) \text{ identifies } \text{ a } \text{ troubled } \text{ cell.} \end{array} \right. \end{aligned}$$

For example, as detailed above, the MIN-based limiting procedure can be understood in these two sub-steps: \({\mathcal {C}}(X(u))\) checks whether \(\tilde{u}_1 = {\hat{u}}_1\); if there is equality, no troubled cell is detected, otherwise, the cell is troubled. If the cell is troubled, the reconstruction \(\varPi (X(u(x))) = {\hat{u}}_0^K + \tilde{u}_1^K \phi ^K_1\) is used. The map X(u(x)) encodes local properties of the solution around the interval K. In this case,

$$\begin{aligned} X(u(x)) = \left( {\hat{u}}_1^K, \frac {1}{2}\sqrt{3}({\hat{u}}_0^K-{\hat{u}}_0^{K_{\rm{l}}}), \frac {1}{2}\sqrt{3} ({\hat{u}}_0^{K_{\rm{r}}}-{\hat{u}}_0^K)\right) . \end{aligned}$$
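Schematically, the two-sub-step view of the limiter described above can be written as a small wrapper (a sketch only; `indicator` and `reconstruct` are hypothetical callables standing in for \({\mathcal {C}}\) and \(\varPi\)):

```python
def apply_limiter(u, features, indicator, reconstruct):
    """Generic limiter s(u, X(u)): return u unchanged unless the
    shock-indicator C flags the cell, in which case Pi(u) is returned."""
    if indicator(features):      # C(X(u)): is this a troubled cell?
        return reconstruct(u)    # Pi(u): less oscillatory reconstruction
    return u
```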

In the following section, we describe how to construct a data-driven shock-indicator function \({\mathcal {C}}\) which is parameter-free once it has been trained.

3 Data-Driven Method

In the following section, we focus on the three main aspects of the proposed method.

  • The setup of the learning algorithm to generate a shock-indicator function (Sect. 3.1).

  • The integration of a neural network that serves as a shock-indicator with an existing CFD code (Sect. 3.2).

  • A description of the performance measures used to validate the proposed data-driven method (Sect. 3.3).

3.1 Setup of the Learning Algorithm

In this section, we describe the details of the learning algorithm to generate a data-driven shock-indicator.

We wish to learn a map \({\mathcal {C}}: {\mathcal {X}} \rightarrow {\mathcal {Y}}\), where \({\mathcal {X}}\) denotes an arbitrary set containing examples that we wish to label with possible outcomes \({\mathcal {Y}}\). The task at hand is a binary classification (i.e., is there a troubled cell/is there a shock in this cell?), thus \({\mathcal {C}}\) will be a binary classifier and \({\mathcal {Y}} = \{0,1\}\).

We choose \({\mathcal {C}}\) to be defined by the composition of a sequence of functions \(g_1, g_2, \cdots, g_n\), yielding the functional form

$$\begin{aligned} {\mathcal {C}}(x) = g_n(\cdots g_2(g_1(x))). \end{aligned}$$

This is known as the multilayer perceptron (MLP) neural network (a type of deep neural network [16]). There are many different classifiers which can be used, but it has been shown that deep neural networks perform well on a variety of classification tasks, in particular when the classification plane is nonlinear.

Each function \(g_i(w_i,b_i,h_i(\cdot ))\) is parameterised by a matrix \(w_i\), called the weights matrix, a vector \(b_i\), called the bias vector, and an activation function \(h_i(\cdot )\), which introduces non-linearity into the neural network.

These parameters are tuned through the minimisation of a loss function \({\mathcal {L}}(x)\), which measures how well the mapping \({\mathcal {C}}\) performs on a given dataset \({\mathcal {D}}\) of inputs and outputs, using back-propagation and gradient descent algorithms.

Gradient descent [37] is a first-order iterative optimization algorithm for finding a local minimum of a function (in this case, the loss function \({\mathcal {L}}(x)\)), relying on the fact that for a small enough learning rate \(\eta\),

$$\begin{aligned} a_{n+1} = a_n - \eta \nabla {\mathcal {L}}(a_n), \end{aligned}$$

then

$$\begin{aligned} {\mathcal {L}}(a_{n+1})\leqslant {\mathcal {L}}(a_n) \end{aligned}$$

for a differentiable and convex loss function \({\mathcal {L}}\). Because we cannot guarantee the convexity of our loss function, a stochastic gradient descent-like procedure [7] is used. It is not the standard stochastic gradient descent because, instead of using a global learning rate \(\eta\), the Adam algorithm [21] is used, which chooses the learning rate adaptively for each parameter (in this case, for the weights and biases).

Furthermore, two different loss functions are considered:

  • the standard cross-entropy

    $$\begin{aligned} {\mathcal {L}}({\mathcal {D}}) = - \frac{1}{N}\sum _{i=1}^N \left[ y_i\log ({\hat{p}}_i) + (1 - y_i)\log (1 - {\hat{p}}_i)\right] ; \end{aligned}$$
    (3)
  • the weighted cross-entropy

    $$\begin{aligned} {\mathcal {L}}({\mathcal {D}}) = - \frac{1}{N}\sum _{i=1}^N \left[ \omega \, y_i\log ({\hat{p}}_i) + (1-\omega )(1 - y_i)\log (1 - {\hat{p}}_i)\right] . \end{aligned}$$
    (4)

We consider the weighted cross-entropy loss function because we expect a class imbalance in the dataset (both during the training and prediction phases): cells that need no stabilisation are far more common than cells that do. Furthermore, it is preferable to over-limit than to miss a cell that needs limiting, as the latter might lead to unphysical results and potentially crash the code. To account for this class imbalance, it is common practice to use a weighted cost function, which increases the penalty of mislabeling a positive label [20]. The asymmetry in the loss function is introduced through the coefficient \(\omega\).
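For reference, a minimal NumPy sketch of the two losses (3) and (4) (variable names are ours; \({\hat{p}}_i\) is the predicted probability for sample i):

```python
import numpy as np

def cross_entropy(y, p_hat, eps=1e-12):
    """Standard binary cross-entropy (3)."""
    p_hat = np.clip(p_hat, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y * np.log(p_hat) + (1.0 - y) * np.log(1.0 - p_hat))

def weighted_cross_entropy(y, p_hat, omega, eps=1e-12):
    """Weighted binary cross-entropy (4); omega > 0.5 increases the penalty
    for missing a troubled cell (a mislabelled positive example)."""
    p_hat = np.clip(p_hat, eps, 1.0 - eps)
    return -np.mean(omega * y * np.log(p_hat)
                    + (1.0 - omega) * (1.0 - y) * np.log(1.0 - p_hat))
```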

Finally, we specify the activation functions used. For the initial \(n-1\) hidden layers, rectified linear units (ReLU) are used:

$$\begin{aligned} h(x) = \max (0,x). \end{aligned}$$

Although there exist more sophisticated activation functions, typically modifications of ReLU (e.g., leaky ReLU, parametric ReLU or randomised leaky ReLU), these require further parameter estimation, adding at least one more dependence to the saved model, while the empirical improvement in performance is not significant [43].

For the last layer (output layer), a sigmoid function is used:

$$\begin{aligned} h(x) = \frac{1}{1 + {\text{e}}^{-x}}, \end{aligned}$$

to attain a value that can be interpreted as a probability.
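Putting these pieces together, evaluating the classifier \({\mathcal {C}}\) amounts to a handful of matrix-vector products (a NumPy sketch; the lists `weights` and `biases` hold the \(w_i\) and \(b_i\) of each layer):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, weights, biases):
    """Evaluate C(x) = g_n(... g_2(g_1(x))): ReLU on the hidden layers,
    sigmoid on the output layer, so the result can be read as a probability."""
    for w, b in zip(weights[:-1], biases[:-1]):
        x = relu(w @ x + b)
    return sigmoid(weights[-1] @ x + biases[-1])
```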

Lastly, the datasets (described in Sect. 4) are split into disjoint sets of training, validation and test sets.

The training phase is detailed in Algorithm 1. The hyper-parameters, such as the batch size (required by the stochastic gradient descent type of minimization procedure), the number of training epochs and the general architecture of the neural network, are fixed at run-time. We also add an early stopping criterion to minimize the risk of overfitting to the training data. It is triggered when the empirical generalisation error (measured through the loss on the validation set) increases for several training cycles [30].

[Algorithm 1]
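Since Algorithm 1 is not reproduced here, the following is only a sketch of the early stopping logic it relies on; `train_epoch` and `validation_loss` are hypothetical callables wrapping one Adam/SGD pass over the mini-batches and the loss on the validation set, respectively:

```python
def train_with_early_stopping(train_epoch, validation_loss, max_epochs, patience=10):
    """Stop training once the validation loss has not improved for `patience`
    consecutive epochs (the empirical generalisation error starts to increase)."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for _ in range(max_epochs):
        train_epoch()                    # one pass of mini-batch (Adam) updates
        loss = validation_loss()         # empirical generalisation error
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                    # early stopping triggered
    return best_loss
```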

3.1.1 Architecture

The architecture of the neural network is a hyperparameter as well. There is general empirical wisdom that deep networks (more hidden layers) generalise better than shallow networks, even with the same number of degrees of freedom [26].

There are, in addition, works which aim to establish lower bounds for the shape and size of deep neural networks, for example, [29], which establishes lower bounds for the number of non-zero weights and layers necessary for an MLP (using ReLU activation functions).

However, to use those estimates, assumptions about the classification function's regularity and the desired approximation accuracy (in the \(L^2\) norm) must be made, and those assumptions change the shape and size of the network dramatically, leading to very different lower bounds for the number of layers and neurons. In practice, we choose networks which are deeper and have a small width, mainly to reduce the computational complexity of the model evaluation. In Appendix A, the explicit architectures are detailed.

3.1.2 Directional Invariance

To introduce the feature invariance (see Fig. 1), the models tested aggregate several predictions to estimate the label. For a given feature vector \(\mathbf {x}\), several copies \(\mathbf {x}_1, \cdots , \mathbf {x}_n\) are generated by permuting the features, and the prediction is evaluated for each copy. The final label is given by the majority label obtained when evaluating the classifier on the samples \(\mathbf {x}, \mathbf {x}_1, \cdots , \mathbf {x}_n\). This ensures the response of the model does not depend on the orientation of the stencil.
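A sketch of this aggregated prediction (the set of admissible feature permutations depends on the stencil; the list `permutations` is assumed to contain the identity as well):

```python
import numpy as np

def predict_invariant(classifier, x, permutations, threshold=0.5):
    """Majority vote of the classifier over permuted copies of the feature vector x,
    so that the label does not depend on the orientation of the stencil."""
    votes = [classifier(np.asarray(x)[list(p)]) > threshold for p in permutations]
    return int(sum(votes) > len(votes) / 2)
```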

Fig. 1 Example of desired invariance, with respect to a reflection through the y-axis

3.2 Integration of the Method on a CFD Code

Assuming that an MLP has been trained, then the model can be fully specified by the composition of the functions \(g_i\). Furthermore, each \(g_i\) can be fully characterized by the following information:

$$\begin{aligned} w_i,b_i,h_i(\cdot ), \end{aligned}$$

where \(w_i\) denotes the weights matrix, \(b_i\) the bias vector, and \(h_i\) the activation function.

There are two necessary steps to integrate a trained neural network with an existing code.

  1. (i)

    Generation of features: given the local solution u, generate the feature quantities X(u).

  2. (ii)

    Prediction routine: given the features X(u), the classifier \({\mathcal {C}}\) is evaluated on the given feature vector.

Once the neural network has been trained offline, the weights and biases can be loaded into a CFD code. What remains to be implemented are the activation functions for the hidden layers and the activation function for the output layer, so that \({\mathcal {C}}\) can be evaluated at a given input; a sketch is given below.
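As an illustration of these two steps, a minimal sketch of how the trained model could be driven from the solver side (the file name, its layout and the routine `compute_features` are hypothetical; `mlp_forward` is a forward pass such as the one sketched in Sect. 3.1):

```python
import numpy as np

# Load the offline-trained parameters (assumed layout: one .npz archive with
# arrays w0, b0, w1, b1, ... for the successive layers).
params = np.load("shock_indicator.npz")
n_layers = len(params.files) // 2
weights = [params[f"w{i}"] for i in range(n_layers)]
biases = [params[f"b{i}"] for i in range(n_layers)]

def is_troubled_cell(local_solution, compute_features, threshold=0.5):
    """Step (i): build the feature vector X(u); step (ii): evaluate the classifier."""
    x = compute_features(local_solution)           # normalised local features
    probability = mlp_forward(x, weights, biases)  # trained MLP, ReLU + sigmoid
    return probability > threshold
```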

Now that the neural network-based shock-indicator has been described, the full limiting strategy is detailed in Algorithm 2. We will refer to this procedure as the neural network limiter. Furthermore, the limiter which we use in conjunction with the neural network shock-indicator is the HIO limiter.

[Algorithm 2]

For systems, each variable is limited independently as in Sect. 2.

3.3 Measuring Performance of Model

We use two sets of performance measures, namely

  1. (i)

    label prediction measures;

  2. (ii)

    the \(L^1\)-norm of the numerical solution.

For the first set of measures, we can use typical metrics used in computational statistics and machine learning communities:

$$\begin{aligned} &\frac{\mathrm{tp} + \mathrm{tn}}{\mathrm{tp} + \mathrm{tn} + \mathrm{fp} + \mathrm{fn}}\quad&\text{(accuracy), }\\ &\frac{\mathrm{tp}}{\mathrm{tp} + \mathrm{fn}}\quad&\text{(recall), }\\ &\frac{\mathrm{tp}}{\mathrm{tp} + \mathrm{fp}}\quad &\text{(precision),} \end{aligned}$$

where tp is the number of correctly predicted positive labels, tn the number of correctly predicted negative labels, fp the number of incorrectly predicted positive labels, and fn the number of incorrectly predicted negative labels. We note that it is important to consider recall and precision because the distribution of labels is expected to be imbalanced. Judging the performance of a classifier only through the accuracy could be misleading, as a classifier can have a high accuracy by always predicting the majority label.
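These measures follow directly from the four confusion counts; a short sketch:

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, recall and precision from binary labels in {0, 1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)        # fraction of troubled cells actually found
    precision = tp / (tp + fp)     # fraction of flagged cells that were troubled
    return accuracy, recall, precision
```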

For the second performance measure, we consider the \(L^1\)-norm because it is the relevant measure for a CFD code and we can study the effect of this method on the error of the numerical solution.

4 Dataset

The dataset is an integral part of data-driven studies: it contains the data for which we want to learn a mapping. The task is to learn a function which indicates whether a discontinuity is present in the solution or not. The dataset is the set containing N samples \(\{X_i, y_i\}_{i=1}^N\), where \(X_i\) denotes some local properties of the solution u(x) (features) and \(y_i\) (labels) indicates the existence (or not) of a discontinuity.

For the one-dimensional case, the dataset is generated by performing many runs on a one-dimensional DG code solving the advection equation for different initial conditions, orders and mesh sizes (see Table 1), and the labels are obtained by running the HIO limiter [22]. Because the HIO limiter is sometimes triggered (and modifies the solution) even when the solution is smooth, we introduced a further threshold to generate a positive label: if the limited solution deviates from the unlimited solution by more than \(\zeta = 1\%\), then we get a positive label.
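The labelling rule can be summarised as follows (a sketch; how exactly the deviation is measured is our assumption, here a relative cell-wise difference between the limited and unlimited solutions):

```python
import numpy as np

def troubled_cell_label(u_unlimited, u_limited, zeta=0.01, eps=1e-12):
    """Positive label (troubled cell) if applying the HIO limiter changed the
    cell values by more than the threshold zeta, in relative terms."""
    deviation = np.abs(u_limited - u_unlimited) / (np.abs(u_unlimited) + eps)
    return int(np.max(deviation) > zeta)
```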

A visual representation of an entry in the dataset can be seen in Fig. 2, where each discrete point in x, the associated solution u(x) and its local properties generate a data point in the feature space, and the corresponding label is denoted by the presence of a cross (if the cell is troubled) or its absence (if the cell is not troubled). Empirically, we noted that the cells flagged by the HIO limiter with the chosen threshold \(\zeta\) are similar to those flagged by a well-tuned TVD limiter.

Fig. 2 Example of a dataset entry. The dashed line denotes the limited solution, whereas the full line denotes the unlimited solution. Furthermore, the crosses "×" denote cells which are flagged as troubled cells

4.1 Features

The features X are the different quantities used to describe the local solution u. For the sake of generalisation, we choose features which are readily available in different numerical methods, such as the cell mean value, values at the interfaces and divided differences between neighbours (see Table 2 for the complete description of the features).

Furthermore, to introduce some magnitude invariance, we normalise physical values (such as averages or pointwise values):

$$\begin{aligned} u_{\mathrm{normal}}(u_*) = \frac{u_*-u_{\mathrm{min}}}{\mid u_{\mathrm{max}} \mid + \mid u_{\mathrm{min}} \mid } - \frac{u_{\mathrm{max}} - u_*}{\mid u_{\mathrm{max}} \mid + \mid u_{\mathrm{min}} \mid }, \end{aligned}$$
(5)

where \(u_{\mathrm{max}} = \max ( {\bar{u}}_i, {\bar{u}}_{i+1}, {\bar{u}}_{i-1})\) and \(u_{\mathrm{min}} = \min ( {\bar{u}}_i, {\bar{u}}_{i+1}, {\bar{u}}_{i-1})\) (taken only over cell averages). This is an important step, as a naive normalisation can lead to a non-informative representation of the feature vector. This particular normalisation was chosen because it measures the deviation of the degrees of freedom with respect to the maximum and minimum of the local patch.
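Equation (5) translates directly into code (a sketch; a small epsilon could be added to guard against a vanishing denominator):

```python
def normalise(u_star, u_bar_left, u_bar_center, u_bar_right):
    """Feature normalisation (5), measuring the deviation of a degree of freedom
    with respect to the extrema of the local patch of cell averages."""
    u_max = max(u_bar_left, u_bar_center, u_bar_right)
    u_min = min(u_bar_left, u_bar_center, u_bar_right)
    denom = abs(u_max) + abs(u_min)
    return (u_star - u_min) / denom - (u_max - u_star) / denom
```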

4.2 Extension to Two-Dimensional Problems

Fig. 3 The HIO limiter wrongly getting activated in a smooth solution

The extension to two-dimensional problems is done in a straightforward manner, using the initial conditions detailed in Table 3 to generate the dataset.

To obtain the labels, the HIO limiter is used again with a threshold \(\zeta = 0.25\%\) for maximal difference between the limited and unlimited solutions. This is necessary as although the HIO limiter might not degrade smooth solutions, it is still triggered (Fig. 3).

Furthermore, the features used are shown in Table 4 and they are normalised in the same manner as in the one-dimensional case.

Table 1 Runs used to generate the one-dimensional dataset
Table 2 Features table for the one-dimensional problem
Table 3 Runs used to generate the two-dimensional dataset
Table 4 Features table for the two-dimensional problem

5 Transfer Learning

With the objective to generate a shock-indicator function which does not depend on the underlying numerical scheme, we are interested in using a shock-indicator function trained with simulation data from a particular solver (numerical scheme and mesh type) and testing its performance in a different solver (in particular, for a different numerical scheme and mesh type). Furthermore, we want to explore different strategies which can be used to perform the transfer learning.

We describe how to adapt a neural network shock-indicator that has been trained on data from a modal DG scheme on a Cartesian mesh to a neural network shock-indicator that works on a residual distribution (RD) scheme ([1, 2, 32, 33] for a brief introduction) on a Cartesian mesh and an unstructured triangular mesh.

The motivation to study this type of problem is twofold. First, for some numerical schemes there are limiters which are designed to be parameter-free but which rely on a particular feature of the underlying numerical scheme; it would be desirable if limiters designed for a particular numerical scheme could be generalised to be used for different numerical schemes. Second, there might be a particular numerical solver for which it is easier to generate labeled data.

Traditional supervised machine learning operates under the assumption that training and testing data are taken from the same input space and the same data distribution. However, this assumption does not always hold. Transfer learning aims to produce an effective model for a target task with limited or no labeled training data, using knowledge from a different, but related problem.

Formally, the problem of transfer learning can be stated as follows. Let \({\mathcal {X}}_{\text{s}}\) be the source instance space. In this space, each instance \(x_{\text{s}} \in {\mathcal {X}}_{\text{s}}\) is represented by a feature vector \(\mathbf {x}_{\text{s}} \in \mathbf {{\mathcal {X}}}_{\text{s}}\), where \(\mathbf {{\mathcal {X}}}_{\text{s}}\) denotes the source feature space. Let \({\mathcal {X}}_{\text{t}}\) be the target instance space and, similarly, let \(x_{\text{t}} \in {\mathcal {X}}_{\text{t}}\) be represented by a feature vector \(\mathbf {x}_{\text{t}} \in \mathbf {{\mathcal {X}}}_{\text{t}}\), where \(\mathbf {{\mathcal {X}}}_{\text{t}}\) denotes the target feature space. In the case of heterogeneous transfer learning, we have \(\mathbf {{\mathcal {X}}}_{\text{t}}\ne \mathbf {{\mathcal {X}}}_{\text{s}}\).

Most heterogeneous transfer learning solutions involve a transformation of the feature spaces: a symmetric transformation takes both feature spaces \(\mathbf {{\mathcal {X}}}_{\text{t}}, \mathbf {{\mathcal {X}}}_{\text{s}}\) and learns a feature transformation to project each vector onto a common space for adaptation purposes \(\mathbf {{\mathcal {X}}}_{\text{c}}\) [42], whereas an asymmetric transformation transforms the source feature space to align with the target feature space (or vice-versa). This approach is appropriate when the source and target have the same class label space and one can transform between \(\mathbf {{\mathcal {X}}}_{\text{t}}\) and \(\mathbf {{\mathcal {X}}}_{\text{s}}\).

In this case, the source dataset is generated as detailed in Sect. 4 and the target datasets are described below (Sect. 5.2.1). The end goal is to understand to what extent a shock-indicator designed for one specific class of numerical schemes (in this work, modal DG on Cartesian meshes), can be effective on a different numerical scheme (RD, both for structured and unstructured meshes).

5.1 One-Dimensional Case

We use a trained neural network shock-indicator and we integrate it with a one-dimensional RD code. Note that we must find a projection of the solution computed with the RD scheme onto the feature space of the considered neural network. The one-dimensional case is not difficult, in particular because, when designing the feature space for the one-dimensional limiter, quantities which are readily available in most numerical schemes were chosen.

5.2 Two-Dimensional Case

The two-dimensional case is not as simple, as the shock-indicator is trained using simulation data from Cartesian meshes, and the target problems are defined not only on Cartesian meshes but also on triangular, unstructured meshes.

To this end, we must find a common feature space between these two methods. In particular, we test two simple strategies:

  • mapping to the feature space of the Cartesian model;

  • mapping followed by a retraining phase.

Mapping to unstructured mesh While the mapping to structured square meshes is quite straightforward, for unstructured meshes, we project the numerical solution of the RD scheme, defined on triangular elements, to the feature space of the classifier (defined on a Cartesian mesh, as defined in Table 4). The feature transformation can be found in Table 5.

Again, as mentioned in Sect. 3.1.2, to enforce directional invariance, we perform the aggregated prediction by permuting the stencil for the triangular meshes. This seems to help with the asymmetry introduced by replacing a side of the rectangle (in the feature space defined for the RKDG data) with the mean of the solution patch.

Retraining A model which was trained with the data generated by the DG scheme is loaded and retrained using the dataset as described in Sect. 5.2.1.

To avoid the phenomenon of catastrophic forgetting in neural networks [16], which describes the inability to learn different tasks in a sequence, a hybrid dataset containing elements from the target and source datasets is used in the retraining phase, with a parameter \(\lambda\) which determines the ratio taken from each dataset.
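A minimal sketch of the hybrid dataset construction (names are ours; whether \(\lambda\) weighs the source or the target portion is a choice, here it is taken as the fraction of target, RD-generated, samples):

```python
import numpy as np

def hybrid_dataset(source_X, source_y, target_X, target_y, lam, n_samples, seed=0):
    """Mix (1 - lam) source (RKDG) samples with lam target (RD) samples
    to mitigate catastrophic forgetting during retraining."""
    rng = np.random.default_rng(seed)
    n_target = int(lam * n_samples)
    n_source = n_samples - n_target
    idx_s = rng.choice(len(source_X), n_source, replace=False)
    idx_t = rng.choice(len(target_X), n_target, replace=False)
    X = np.concatenate([source_X[idx_s], target_X[idx_t]])
    y = np.concatenate([source_y[idx_s], target_y[idx_t]])
    perm = rng.permutation(len(X))   # shuffle source and target samples together
    return X[perm], y[perm]
```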

The retraining algorithm is detailed in Algorithm 3.

[Algorithm 3]
Table 5 Features transformation

5.2.1 RD Dataset

To generate the dataset, small meshes are constructed (see Fig. 4). We represent the solution in each control volume as a linear combination of the polynomial basis. Because there is no straightforward way to generate a labeled dataset through simulations, we impose continuous and discontinuous functions, randomly varying the orientation of the discontinuity.

Fig. 4 Example meshes

Using this method, we can generate a large dataset of examples which is close to the task at hand.

6 One-Dimensional Numerical Experiments

This section is split into three parts. First, we show the performance of several trained neural networks on an unseen validation set by measuring the accuracy, recall and precision. In the second part, we choose a model that performed well and integrate it with a CFD code. The model runs as a black-box (denoted as NN) limiter and we compare its performance to the MIN limiter and the hierarchical HIO limiter through the \(L^1\) error norm. We perform some tests for the linear advection equation and the Euler system of equations. The initial conditions are chosen to be different from the ones used for the training. Finally, the transferred limiter is tested in the context of a one-dimensional RD scheme.

6.1 Detection Rate

We measure the performance of several models (their architectures are given in Appendix A) on an unseen test set. Their performances are detailed in Table 6. Going forward, we select Model 4, as it performs well and the resulting size of the weights matrices per layer is significantly smaller than for Model 3. It is debatable whether the differences between Models 3, 4 and 5 are statistically significant. Furthermore, surprisingly, we note that the weighted loss function did not improve the chosen performance metrics.

Table 6 Performances

6.2 Numerical Validation

6.2.1 Linear Advection

Consider a linear advection equation with \(a \in {\mathbb {R}}\):

$$\begin{aligned} \frac{\partial }{\partial t}u + a\frac{\partial }{\partial x}u = 0 \end{aligned}$$
(6)

and periodic boundary conditions.

Case of a Gaussian pulse We consider the following initial condition:

$$\begin{aligned} u_0(x) = 1 + 3\exp (-100(x-0.5)^2), \quad (x,t)\in [0,1]\times {\mathbb {R}}^+ \end{aligned}$$
(7)

with the advection velocity \(a = 1\).

The convergence is shown in Table 7 after one full crossing for orders 2 and 3. In Fig. 5, we show how the maxima are clipped using different limiters for grid sizes of \(N=40\) and \(N=80\). We note that the MIN limiter clips the maximum value of the solution, as expected. The HIO limiter behaves as the MIN limiter for the second-order case, but for the third order, it does not limit the solution. The neural network-based (NN) limiter also limits the solution at the second order and \(N=40\), but it improves as N increases. The NN limiter seems to be slightly less diffusive than the HIO limiter for the second-order case; furthermore, its performance does not depend on the order of the method as much as that of the HIO limiter.

Table 7 \(L^1\) error for one crossing of the Gaussian pulse (7) using different limiters
Fig. 5 Maxima clipping of the Gaussian pulse (7) after one full crossing for approximation order of 2 (left) and 3 (right) for grid size \(N = 40\) and \(N = 80\)

Case of a smooth pulse and square hat The following initial conditions contain a smooth Gaussian pulse and a hat function:

$$\begin{aligned} u_0(x) = \left\{ \begin{array}{ll} 2, &{}\quad |x-0.7| \leqslant 0.1, \\ 1+\exp \left( -\frac{(x-0.25)^2}{2\times 0.05^2}\right) , &{}\quad \text{ otherwise, } \\ \end{array} \right. \quad (x,t)\in [0,1]\times {\mathbb {R}}^+, \end{aligned}$$
(8)

again with the advection velocity \(a = 1\).

The convergence is shown in Table 8 after one full crossing for orders 2 and 3. Furthermore, in Fig. 6, we show how the different limiters perform for a grid size of \(N=40\) and \(N=80\) at the second and the third order. Namely, we notice that at the second order and \(N=40\), the solution given by the NN limiter is similar to the unlimited one, with fewer undershoots; however, increasing the resolution to \(N=80\) seems to degrade the performance. Overall, in this case, the NN limiter seems able to get rid of most under- and overshoots associated with discontinuities, while having a performance similar to the HIO limiter.

Table 8 \(L^1\) error for one crossing of the Gaussian pulse and hat function (8) using different limiters
Fig. 6 Maxima clipping of the Gaussian and hat pulses (8) after one full crossing for approximation order of 2 (left) and 3 (right) for grid size \(N = 40\) and \(N = 80\)

6.2.2 One-Dimensional Euler Equation

Now we consider the one-dimensional Euler equations, which describe the behavior of an inviscid flow. This system of equations describes the evolution of a density \(\rho\), a velocity v, a pressure p and total energy E,

$$\begin{aligned} \frac{\partial }{\partial t}\rho + \frac{\partial }{\partial x}(\rho v)&= 0, \end{aligned}$$
(9)
$$\begin{aligned} \frac{\partial }{\partial t}(\rho v) + \frac{\partial }{\partial x}(\rho v^2 + p)&= 0, \end{aligned}$$
(10)
$$\begin{aligned} \frac{\partial }{\partial t}E + \frac{\partial }{\partial x}((E+p)v)&= 0. \end{aligned}$$
(11)

The system is closed with the equation of state for an ideal gas

$$\begin{aligned} \rho e = \frac{p}{\gamma - 1}, \end{aligned}$$

where \(e = E - \frac{1}{2}\rho v^2\) is the internal energy.

The NN limiter is applied sequentially for each variable.

Case of Sod shock tube We consider the standard Sod shock tube test, given by the initial conditions:

$$\begin{aligned} (\rho , v, p)(x,0) = \left\{ \begin{array}{ll} (1.0,0,1.0), &{}\quad 0< x \leqslant 0.5, \\ (0.125, 0, 0.1), &{}\quad 0.5< x < 1.0, \\ \end{array} \right. \quad (x,t)\in [0,1]\times [0,0.24] \end{aligned},$$
(12)

and \(\gamma = 1.4\) and gradient-free boundary conditions.

For both the second- and third-order results and a fixed resolution \(N=100\), the density, velocity and pressure fields at \(T=0.24\) are shown in Figs. 7 and 8. We note the solution produced by the NN limiter seems oscillation free and similar to the solution of the HIO limiter at both orders.

Fig. 7 Density, velocity, and pressure fields for the Sod shock tube (12) at \(T=0.24\) for grid size \(N = 100\) and approximation order 2

Fig. 8 Density, velocity, and pressure fields for the Sod shock tube (12) at \(T=0.24\) for grid size \(N = 100\) and approximation order 3

Case of Lax shock tube We consider the Lax shock tube test [25], given by the initial conditions:

$$\begin{aligned} (\rho , v, p)(x,0) = \left\{ \begin{array}{ll} (0.445,0.698,3.528), &{}\quad 0< x \leqslant 0.5, \\ (0.5, 0, 0.571), &{}\quad 0.5< x < 1.0, \\ \end{array} \right. \quad (x,t)\in [0,1]\times [0,0.08], \end{aligned}$$
(13)

and \(\gamma = 1.4\) and gradient-free boundary conditions.

In Figs. 9 and 10, we show the comparison between different limiters at \(T = 0.08\), for schemes of order 2 and 3, respectively. In Fig. 9, we show the density, velocity, and pressure fields for a second-order scheme and resolution \(N=100\). We note the NN limiter seems less diffusive than the HIO limiter, but also not all overshoots are well controlled. This is more obvious in the velocity and pressure fields. In Fig. 10, the same quantities are shown, but for a third-order scheme. Again, we note both the HIO and NN limiters are less diffusive than the MIN limiter.

Fig. 9 Density, velocity, and pressure fields for the Lax shock tube (13) at \(T=0.08\) for grid size \(N = 100\) and approximation order 2

Fig. 10 Density, velocity, and pressure fields for the Lax shock tube (13) at \(T=0.08\) for grid size \(N = 100\) and approximation order 3

Case of blast wave Next we consider the interacting blast waves test, given by the initial conditions:

$$\begin{aligned} (\rho , v, p)(x,0) = \left\{ \begin{array}{ll} (1.0,0,1\,000.0), &{}\quad 0< x \leqslant 0.1, \\ (1.0, 0, 0.01), &{}\quad 0.1< x \leqslant 0.9, \\ (1.0, 0, 100.0), &{}\quad 0.9< x < 1.0, \\ \end{array} \right. \quad (x,t)\in [0,1]\times [0,0.038] \end{aligned}$$
(14)

with \(\gamma = 1.4\) and reflective boundary conditions.

In Figs. 11 and 12, we show the comparison between different limiters at \(T = 0.038\) for different orders. The unlimited solution is not shown because, for orders higher than 1, the code crashes shortly after the start of the simulation due to the pressure becoming negative. The dashed line denotes a high-resolution solution, run with \(N = 1\,000\) at third order with the HIO limiter. We can note that the NN limiter is not as good at suppressing oscillations as the MIN and HIO limiters, but it stabilises the solution enough to finish the run. Furthermore, we note that the peak is better preserved, which suggests that the limiting is weaker than with the MIN and HIO limiters.

Fig. 11 Density, velocity, and pressure fields of the blast wave interaction (14) at \(T=0.038\) for grid size \(N = 100\) and approximation order 2

Fig. 12 Density, velocity, and pressure fields of the blast wave interaction (14) at \(T=0.038\) for grid size \(N = 100\) and approximation order 3

6.3 Transfer to RD

In this section, we show the performance of the NN limiter applied to the RD scheme, and we compare it with a state-of-the-art limiting technique, MOOD [3]. Other stabilisation strategies for the RD scheme require parameter tuning which is problem dependent.

Case of Sod shock The initial conditions are given as in (12). A qualitative result is shown in Fig. 13. It can be noted that the NN limiter is slightly more diffusive than MOOD, but that it seems to control some of the undershoots better.

Case of blast wave The initial conditions are given as in (14). A qualitative result is shown in Fig. 14. In this example, one can see that MOOD is significantly less diffusive than the NN limiter.

Fig. 13 Density, velocity, and pressure fields of the Sod shock

Fig. 14 Density, velocity, and pressure fields of the blast wave test case (14)

7 Two-Dimensional Numerical Experiments

Similar to Sect. 6, we first train a set of neural networks varying the number of neurons and layers, and select the one with the best recall/precision score.

We then compare the performance of the NN limiter with the MIN limiter and the HIO limiter through the \(L^1\) error norm. We perform some tests for the linear advection equation and Euler system of equations. The initial conditions are chosen to be different from the ones used for the training. In Sect. 7.3, we show the results for the transfer to a RD scheme, for structured and unstructured meshes.

7.1 Numerical Validation

7.1.1 Linear Advection

Consider a linear advection equation with \(\mathbf {a} \in {\mathbb {R}}^2\):

$$\begin{aligned} \frac{\partial }{\partial t}u + \mathbf {a}\cdot \nabla u = 0 \end{aligned}$$
(15)

and periodic boundary conditions.

Smooth initial condition We consider the following initial conditions, which contain a smooth function:

$$\begin{aligned} u_0(x) = 1 + \sin (2\pi r)^{10}, \quad (x,t)\in [0,1]\times {\mathbb {R}}^+ \end{aligned}$$
(16)

with the advection velocity \(\mathbf {a} = (1,1)\) and periodic boundary conditions.

The errors and convergence rates after one full crossing are shown in Table 9 for orders 2 and 3 and the different methods. The key point to note is that the errors of the unlimited case and the NN case are very similar, meaning that the shock-detector was not triggered as much as in the cases of the MIN and HIO limiters.

Table 9 \(L^1\) error for one crossing of the smooth ring (16) using different limiters

Case of smooth pulse and square hat We consider the following initial conditions, which contain a smooth Gaussian pulse and a hat function, defined in \((\mathbf {x},t)\in [0,1]^{2} \times {\mathbb {R}}^{+}\):

$$\begin{aligned} u_0(x) = {\left\{ \begin{array}{ll} 2, &{} (|x-0.25|,|y-0.5|) \leqslant (0.1,0.1) ,\\ 1 + \exp \left( -100(||\mathbf {x}-\mathbf {x}_1||^2)\right) ,&{} x \geqslant 0.5, \\ 1, &{} \text{ otherwise,} \end{array}\right. }\end{aligned}$$
(17)

again with the advection velocity \(\mathbf {a} = (1,1)\), \(\mathbf {x}_1=(0.75,0.5)\).

The errors and convergence rates are shown in Table 10 after one full crossing for orders 2 and 3 and different methods.

Table 10 \(L^1\) error for one crossing of the Gaussian pulse and hat function (17) using different limiters

7.2 Two-Dimensional Euler Equation

Now we consider the two-dimensional Euler equations, which describe the behaviour of an inviscid flow. This system of equations describes the evolution of a density \(\rho\), a velocity vector \(\mathbf {v}=(v_1,v_2)\), a pressure p and total energy E,

$$\begin{aligned} \partial _t \rho + \nabla \cdot (\rho \mathbf {v})&= 0,\end{aligned}$$
(18)
$$\begin{aligned} \partial _t (\rho \mathbf {v}) + \nabla \cdot \left( \rho \mathbf {v}\otimes \mathbf {v} + p{\mathcal {I}}_{2}\right)&= 0,\end{aligned}$$
(19)
$$\begin{aligned} \partial _t E + \nabla \cdot \left( E + p\right) \mathbf {v}&= 0. \end{aligned}$$
(20)

The system is closed with the equation of state for an ideal gas

$$\begin{aligned} \rho e = \frac{p}{\gamma - 1}, \end{aligned}$$

where \(e = E-\frac{1}{2}\rho |\mathbf {v}|^2\) is the internal energy.

Case of the two-dimensional Sod shock tube We consider the radial Sod shock tube test, given by the initial conditions:

$$\begin{aligned} (\rho , v_x, v_y, p)(x,0) = \left\{ \begin{array}{ll} (1.0,0,0,1.0), &{}\quad 0< r \leqslant 0.5, \\ (0.125, 0, 0, 0.1), &{}\quad 0.5< r < 1.0, \\ \end{array} \right. \end{aligned}$$
(21)

where \(r=\sqrt{x^2+y^2}\), \((x,y)\in [0,1]^2\), \(\gamma = 1.4\) and gradient-free boundary conditions.

The solution maps at increasing times \(t = 0.01, 0.1, 0.2\) and 0.24 are shown in Fig. 15. In the top row, the solution maps using the HIO limiter are shown, as well as the cells in which the limiter was triggered (dark regions). We note that there are regions without discontinuities which are still being limited. In the bottom row, we show the solution maps when using the NN shock detector, and the areas where the shock-detector detects a shock (and which are subsequently limited) are overlaid. We note that at the initial time the detection looks quite symmetric; however, over time, this is no longer the case. We also note that the shock fronts are being tracked, while the smooth regions are not being limited.

In Fig. 16, we show the projection of the solution along the radial axis, for \(N=64^2\) and \(N=128^2\). This allows us to see how the different limiters preserve the symmetry of the problem, as the perfect solution would have very little scatter. We see that both the NN shock detector and the HIO limiter appear to perform similarly, and the MIN limiter is more diffusive.

Fig. 15 Detection comparison between HIO limiter (top row) and NN limiter (bottom row) at different times (t = 0.01, 0.1, 0.2 and 0.24)

Fig. 16 Scatter plot of density of the two-dimensional Sod shock problem at \(T=0.24\) for \(N_{{\text{elem}}} = 64^2\) and \(N_{{\text{elem}}} = 128^2\) and approximation order 2

Case of the Riemann problem 12 We consider a two-dimensional Riemann problem (configuration 12) [24]. The initial data are

$$\begin{aligned} (\rho , v_x, v_y, p)(x,0) = \left\{ \begin{array}{ll} (1 ,0.727\,6, 0 , 1 ), &{}\quad x< 0, y> 0, \\ (0.8 ,0 ,0 , 1), &{}\quad x< 0, y< 0, \\ (0.531\,3, 0, 0, 0.4), &{}\quad x> 0, y> 0, \\ (1,0,0.727\,6,1), &{}\quad x > 0, y < 0, \end{array} \right. \end{aligned}$$
(22)

for \(\gamma = 1.4\) and gradient-free boundary conditions.

The solution maps at increasing times \(t = 0.01, 0.1\) and 0.2 are shown in Fig. 17. As before, the top row shows the simulation using the HIO limiter, as well as the cells in which the limiter was triggered. Similar to the previous case, there are regions without discontinuities which are still being limited. In the bottom row, we show the solution map when using the NN shock detector and again, we mark the areas where the shock detector detects a shock (and which are subsequently limited). We note that the symmetry is quite well preserved across time.

Fig. 17 Detection comparison between HIO limiter (top row) and NN limiter (bottom row) at different times (t = 0.01, 0.1 and 0.2)

7.3 Transfer to RD

In this section, we report the results of the NN limiter applied to an RD scheme. In particular, we report on the performance of the NN shock detector when no retraining phase is performed versus when it is retrained on a reduced dataset using data from numerical runs of an RD scheme.

7.3.1 Two-dimensional Sod Shock Tube

In Fig. 18, we show the performance of the two-dimensional shock-indicator function on a two-dimensional RD scheme on a Cartesian mesh, trained on RKDG data alone (top row) and retrained using the RD data (bottom row), at different time snapshots \(T = 0.01, 0.1, 0.2,\) and 0.24. We note that initially the shock is well captured, but over time the shock front is no longer well captured. In particular, we can note that the symmetry of the solution with retraining (bottom row) seems to be broken. This is more evident in Fig. 19, where again the radial projection of the solution is shown. The optimal solution would show a thin spread. The \(\lambda\) parameter shown in the figure gives the ratio between the (re)training data coming from the RKDG scheme and the RD scheme. We can see that the more samples are taken from the RD scheme, the less effective the shock-detector becomes. Comparing the performance of the NN limiter with other RD limiters, we can note that it is less diffusive than the RD Psi-Galerkin scheme with Burman jump stabilisation [8], but significantly worse than the MOOD limiter [3].

Fig. 18 Detection comparison between the NN limiter without retraining (top row) and with retraining (bottom row) on a structured mesh at different times (t = 0.01, 0.1, 0.2 and 0.24)

Fig. 19 Left: comparison between different limiters for the RD scheme on a structured mesh

Moving onto unstructured grids (see Fig. 20 for one of the grids used), we compare the performance of the transferred neural network shock-indicator without and with retraining on the RD data on an unstructured grid. In Fig. 21, we show again solution maps using the shock-indicator trained only on RKDG data (top row) and retrained using the RD data (bottom row), at different time snapshots \(T = 0.01, 0.1, 0.20,\) and 0.24. We can see that the adapted limiter detects troubled cells in the shock fronts. In the unstructured case, the rarefaction is no longer limited. In Fig. 22, we show again the radial projection of the two-dimensional Sod shock. In this case, it seems like retraining the shock-indicator yields a solution which is more diffusive, but less oscillatory. Comparing with other limiters available for the RD, we can note that the performance seems similar to the MOOD limiter, and the NN-based limiter is less diffusive than the Psi-Galerkin scheme with jump stabilisation.

Fig. 20 Zoom on unstructured grid used in numerical experiments shown in Fig. 21

Fig. 21 Detection comparison between the NN limiter without retraining (top row) and with retraining (bottom row) on an unstructured mesh at different times (t = 0.01, 0.1, 0.2 and 0.24)

Fig. 22 Comparison between different limiters for the RD scheme on an unstructured mesh

8 Discussion

We showed that it is possible to learn a parameter-free shock-detector (after the training phase) from data. While the performance on the training set is relatively good (as observed in Sect. 6.1), the on-the-fly performance could be improved. One way would be to have a more representative training dataset in conjunction with a careful design of the loss function during the training phase (for example, to include information on the maximum principle).

For the advection cases, we observed that this limiter was far less diffusive than the MIN limiter. In the systems cases, some oscillations were corrected but others were not stabilised enough, although none of the simulations crashed (whereas the unlimited solution was too unstable, notably for the blast wave case). Overall, the direct application of this model to unseen initial conditions for the advection equation and to the Euler system was somewhat successful.

We then explored strategies for transferring a trained shock-indicator function to a different numerical scheme, namely to an RD scheme. We tested two strategies: one of simply transforming the feature vector from an RD solution onto a feature vector from an RKDG solution, and another of retraining a trained classifier on a reduced RD dataset. We observed that it was possible to use the shock-indicator function in an RD scheme, but that retraining on a reduced dataset did not make a significant difference (perhaps it even deteriorated the performance of the shock-indicator). This is something to explore in future work.

The attractive property of this type of shock-indicator is that, once trained, it can operate as a black-box, parameter-free shock-detector. We have verified that, both in one- and two-dimensional problems, we were able to attain better results compared to limiters which have not been properly hand-tuned. However, in the case of the RD scheme, the transferred NN limiter did not perform as well as the MOOD limiter [3, 9, 41].

Some important difficulties we were faced with were as follows.

  1. (i)

    Lack of theoretical guarantees and quality of the numerical solution

    Throughout the time we worked on this problem, it became apparent that certain properties of the shock capturing function do not arise without a certain amount of careful consideration. For example, it was particularly concerning that the detection was not symmetric with respect to a given stencil (see Fig. 23). To overcome this problem, the literature typically suggests either a feature transformation which renders a certain property invariant or data augmentation, i.e., generating training examples that cover such invariance [27]. We empirically observed that merely performing data augmentation did not perform very well and led to longer training times (due to the enlargement of the dataset). We then used an ensemble classification where the stencil is permuted and the prediction is averaged (as detailed in Sect. 3.1.2), such that, for example, a prediction on a particular stencil and on the same stencil mirrored along the x-axis yield the same response. This improved the detection considerably in problems that have an obvious symmetry (like the two-dimensional Sod shock).

  2. (ii)

    Computational performance degradation

    We verified that, when integrating the shock-indicator with existing codes, the classification step was expensive (it entails, at best, a series of matrix multiplications, and at worst, an additional feature vector computation). This meant that even if the limiter was not triggered as often due to the shock-indicator step, the overall cost of using this particular shock-indicator was higher than with a traditional limiter, causing a slow down in all codes we considered.

  3. (iii)

    Lack of supporting theory on how to optimize, tune and generate deep neural networks

    To this day, the optimization of hyperparameters (such as architecture, shape, size, learning rates related to the NN) is approached mostly through trial and error. While several works [29] estimate lower bounds and necessary complexity of the network to capture a given complexity of the function to be learned, these results remain rather far from concrete applications.

Fig. 23 No symmetry invariance considerations versus explicit symmetry invariance

8.1 Future Developments

Taking the previous list of main difficulties that we were faced with during this work, we can guide the discussion about our planned future developments and directions.

  1. (i)

    Lack of theoretical guarantees and quality of the numerical solution

    What we proposed in this work was exact invariance of the network with respect to rotations of the stencil. This adds some cost to the method during the classification phase. Recent ideas to introduce not invariance of the NN, but equivariance, have been explored in [12, 13] in the context of computer vision, and could be an interesting direction going forward.

  2. (ii)

    Computational performance degradation

    We verified that integrating the shock-indicator with existing codes always leads to a computational performance degradation. This is not to say it will always be too computationally expensive: there have been some successful examples of learning a reduced network from a larger NN, and with this, a large computational gain has been observed [19].

  3. (iii)

    Lack of supporting theory on how to optimize, tune and generate deep NNs

    Using the Python package hyperopt [5], which is designed to optimize hyperparameters over awkward search spaces with real-valued, discrete and conditional dimensions (making it well suited for iterating over machine learning hyper-parameters), we have been able to obtain similarly performing neural network-based shock-indicators without any hand-tuning, which surprisingly had far fewer degrees of freedom than the one used in this work.

    In particular, defining the hyper-parameter space as

    1. number of hidden layers, \(n \in \{1,2,3,4\}\);

    2. number of neurons per layer \(p \in \{8, \cdots , 128\}\), allowed us to search for a good configuration of the deep NN (a minimal sketch of such a search-space definition is given after this list).

  4. (iv)

    Better understanding of how to perform transfer learning

    The transfer learning approach we took in this project was a very simple (and possibly naive) one. We are interested in exploring symmetric feature space transformations, which take both (or many) feature spaces \({\mathcal {X}}_1,\cdots ,{\mathcal {X}}_n\) (one can imagine them coming from different numerical schemes) and find a common feature space for adaptation purposes.
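As an example of the hyper-parameter search mentioned in item (iii), a minimal hyperopt sketch over the search space above (the objective is a hypothetical stand-in; in practice it would train the MLP with the proposed architecture and return its validation loss):

```python
from hyperopt import fmin, hp, tpe

def train_and_validate(n_layers, n_neurons):
    """Hypothetical stand-in: build and train the MLP with this architecture
    and return its validation loss (replaced here by a dummy value)."""
    return (n_layers - 2) ** 2 + (n_neurons - 32) ** 2 / 1.0e4

# search space: number of hidden layers and neurons per layer
space = {
    "n_layers": hp.choice("n_layers", [1, 2, 3, 4]),
    "n_neurons": hp.quniform("n_neurons", 8, 128, 1),
}

def objective(params):
    return train_and_validate(params["n_layers"], int(params["n_neurons"]))

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)
```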

9 Conclusion

The purpose of this work was primarily to demonstrate the potential of using learning algorithms in CFD codes to automate away some parameter tuning, which is a common practice when using stabilisation methods, as well as to explore the idea of transferring knowledge across numerical schemes. In particular, we detailed the different stages necessary to train a black-box shock-detector that can be integrated with a limiter in different codes. To this end, we described how to construct a dataset for the training phase, how to set up a deep NN (a multilayer perceptron) that detects shocks, and how to integrate the trained model with existing CFD codes. We performed numerical experiments to validate the performance of the shock-detector (paired with a limiter) in the context of scalar equations and systems of equations. Furthermore, we used the model trained on data generated with a DG code in an RD code, exploring different ways to perform the domain adaptation.

We then explored strategies of transferring a trained shock-indicator function to a different numerical scheme, namely, to an RD scheme.

We also found that for the exact task of shock-detection, a neural network-based limiter has some notable drawbacks in comparison to some limiters which are both quite agnostic to the underlying numerical method and require minimal parameter tuning (e.g., MOOD [3]). In particular, due to the lack of a systematic way to introduce domain knowledge onto the data-driven model (e.g., positivity of some quantities or the notion of maximum principle, rotation invariance, to name a few), we are of the opinion this neural network-based limiter is in the stage of being a prototype. However, it is our belief that these ideas can be applied to other problems which depend on certain local properties of the numerical solution, ultimately contributing towards CFD codes which are robust to different initial conditions and that require less parameter tuning to produce readily usable results.

With this in mind, we covered a few of the future improvements planned for this NN-based limiter.