1 Introduction

Efficiently capturing shock waves and other discontinuities is crucial for solving hyperbolic equations. Von Neumann and Richtmyer pioneered shock-capturing work in 1950 [1], introducing artificial viscosity into a staggered Lagrangian scheme for compressible flow simulations. Today, advanced techniques such as essentially non-oscillatory (ENO) [2], weighted ENO (WENO) [3], and discontinuous Galerkin methods [4] offer high-order accuracy in simulating problems involving shock waves. For a more in-depth account of the development of shock-capturing methods, refer to [5, 6].

Due to the rapid development of machine learning and neural networks (NNs), they have been extensively employed to solve partial differential equations (PDEs) [7,8,9,10,11]. Among these applications, physics-informed neural networks (PINNs) have received significant research attention. PINNs encode PDEs or other model equations as one of their penalty components, making them versatile tools across various fields [12]. In the realm of hyperbolic equations, Mao et al. [13] studied one- (1D) and two-dimensional (2D) Euler equations with shock waves and used clustered training samples around high-gradient areas to improve the solution accuracy in these regions while preventing error propagation to the entire domain. In another study, Patel et al. [14] introduced a PINN capable of discovering thermodynamically consistent equations that ensure hyperbolicity. This approach is particularly relevant for inverse problems in shock hydrodynamics. Furthermore, Jagtap et al. [15] proposed conservative PINNs that partition the computational domain into smaller subdomains, each employing distinct NNs to tackle Burgers’ equation and Euler equations. Additionally, Jagtap et al. [16] delved into inverse problems of supersonic flows.

The aforementioned studies showcase the effectiveness of PINNs in addressing inverse problems involving prior knowledge of flow structure development, such as density gradients [17]. However, when it comes to studying forward problems, the original PINNs have shown limited applicability, primarily restricted to simple problems like tracking moving shock waves. In contrast, Patel et al. [14], building on traditional shock-capturing methods, devised a mesh-based control-volume PINN and incorporated entropy and total-variation-diminishing conditions into the neural network. Additionally, Papados [18] extended the computational domain for simulating the shock-tube problem, yielding remarkable results without introducing non-physical viscosity terms into the equations. However, their open-source code suggests that this method is not very stable in the cases under consideration, and good results may only be achieved with fixed hyperparameters under a fixed number of training steps. Nonetheless, this work has demonstrated the potential of the PINNs method for solving shock wave problems.

We are interested in applying PINNs to the computation of discontinuities, particularly in problems involving the generation of nonlinear strong discontinuities, such as shock waves in hyperbolic equations. Mathematically, these discontinuities possess zero thickness and infinite gradients, rendering them beyond the description of strong form PDEs. Instead, their behavior can be controlled by physical laws or weak form equations. Furthermore, to the best of our knowledge, there is no existing theory that can guarantee neural networks (NNs) accurately approximate \(\textrm{C}^0\) discontinuous functions. Consequently, when residual points fall within a shock region, significant equation losses are incurred due to their steep gradients. These points, located in high-gradient regions where the shock is expected to occur, are referred to as ‘transition points’ [19]. An NN may prioritize handling transition points, as they contribute the most to the loss; however, it cannot increase the gradient to decrease the thickness of the shock for the reason stated above. As a result, transition points fall into a paradoxical state - whether the gradient increases or decreases, the training loss inevitably rises. What is more concerning is that transition points not only impact the total loss but also influence convergence in smooth regions. We elucidate this phenomenon in Sect. 2.2 through a test involving the Burgers’ equation.

To enhance the shock-capturing capabilities of PINNs, this paper presents three key contributions. Firstly, in order to break the paradoxical status at transition points and enable PINNs to represent strong nonlinear discontinuities effectively, we introduce a ‘retreat to advance’ strategy. This strategy weakens the neural network’s expression in regions of strong compression and large gradients by introducing a physical pointwise equation weight, enabling the network to focus on training in other regions. As a result, strong discontinuity solutions emerge automatically, driven by the physical compression mechanisms of the well-trained smooth regions.

Secondly, for the effective control of shock wave solutions and to prevent underdetermined problems, we incorporate the Rankine–Hugoniot (RH) relation, which is equivalent to the weak form of the conservation laws, as constraints in the vicinity of the shock waves. Also, we implement limiters to identify the appearance of shock waves in this part.

Thirdly, in the computation of nonlinear hyperbolic equations, such as the Euler equations, preserving physical conservation is of paramount importance. For instance, it directly impacts the accuracy of shock wave positions. To address this, we introduce a global physical conservation constraint within the new framework.

The remainder of this paper is organized as follows. In Sect. 2, we first analyze the failure of classical PINNs in solving shock waves with the non-dissipative Burgers’ equation. Then, in Sect. 3, we detail our proposed PINNs-WE method. Next, various 1D and 2D forward examples are studied to demonstrate the effectiveness of the proposed method in Sect. 4. Finally, conclusions are drawn in Sect. 5. In addition, we also discuss the omissible boundary conditions in PINNs in Appendix A and verify original PINNs in solving problems with linear or weak discontinuities in Appendix B.

2 Classical PINNs and Problem Analysis

2.1 PINNs for Conservative Hyperbolic PDE

We consider the following conservative hyperbolic PDE

$$\begin{aligned} \frac{\partial {{\textbf{U}}}({{\textbf{x}}})}{\partial t} + \nabla \cdot {{\textbf{F}}}({{\textbf{U}}}) = 0, \quad {\textbf{x}}=(t,x_1,x_2,\cdots )\in \Omega , \end{aligned}$$
(1)

with the initial and boundary conditions (IBCs):

$$\begin{aligned} \textrm{IBCs}({{\textbf{U}}},{{\textbf{x}}}) = 0 \quad \text {on} \quad \partial \Omega , \end{aligned}$$
(2)

and we treat the initial condition in the same way as a Dirichlet boundary condition.

The classical PINNs typically consist of two parts. The first part is a neural network \(\hat{{\textbf{U}}}({{\textbf{x}}};{\varvec{\theta }}) \) used to approximate \({{\textbf{U}}}({{\textbf{x}}})\) with trainable parameters \({\varvec{\theta }}\). The second part is informed by the governing equations as well as the initial and boundary conditions, which are used to train the network. The calculation of \({\partial }/{\partial t}\) and \(\nabla \cdot \) in the PDE is carried out through automatic differentiation. More detailed information about PINNs for convection PDEs can be found in [13, 18].

The loss function used to train \(\hat{{\textbf{U}}}({{\textbf{x}}};{\varvec{\theta }})\) comprises at least two components that define the problem. One component is controlled by the equations, and the other is given by the IBCs of the problem,

$$\begin{aligned} {\mathcal {L}} = {\mathcal {L}}_{\textrm{PDE}} + \omega _{\textrm{IBCs}} {\mathcal {L}}_{\textrm{IBCs}}. \end{aligned}$$
(3)

To define the loss, we choose a set of residual points inside the domain \(\Omega \) and another set of points on \(\partial \Omega \) as \({\mathcal {S}}_{\textrm{PDE}}\) and \({\mathcal {S}}_{\textrm{IBCs}}\), respectively. Then

$$\begin{aligned} {\mathcal {L}} = \frac{1}{|{\mathcal {S}}_{\textrm{PDE}}|}\sum _{{{\textbf{x}}}_i \in {\mathcal {S}}_{\textrm{PDE}}}{} \textbf{G}_i^2 + \omega _{\textrm{IBCs}} \frac{1}{ |{\mathcal {S}}_{\textrm{IBCs}}|} \sum _{{{\textbf{x}}}_i \in {\mathcal {S}}_\textrm{IBCs}} (\hat{\textbf{U}}_{e,i} - \textbf{U}_{e0,i})^2, \end{aligned}$$
(4)

where \( \textbf{G}_i: = \partial _t \hat{\textbf{U}}(\textbf{x}_i) + \nabla \cdot \textbf{F}(\hat{\textbf{U}}(\textbf{x}_i))\), so that \(\textbf{G}_i=0\) enforces the governing equations at residual point \(\textbf{x}_i \in {\mathcal {S}}_{\textrm{PDE}} \), and \(\textbf{U}_{e0,i}\) represents the given IBCs at residual point \(\textbf{x}_i \in {\mathcal {S}}_{\textrm{IBCs}} \). \(\omega _{\textrm{IBCs}}\) is the weight used to adjust the confinement strength of the IBCs [16, 18, 20]. Typically, more weight is assigned to points located on the initial and boundary sets.
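As a concrete illustration, the loss of Eq. (4) for the inviscid Burgers' equation might be assembled as follows. This is a minimal sketch under our own assumptions: the PyTorch framework, the network size, the sampling of residual points, and the weight `w_ibc` are illustrative choices, not the paper's exact setup.

```python
# Minimal sketch of the classical PINN loss (Eq. 4) for a 1D scalar
# conservation law u_t + f(u)_x = 0, with Burgers' flux f(u) = u^2/2.
import torch

torch.manual_seed(0)

net = torch.nn.Sequential(              # U_hat(t, x; theta)
    torch.nn.Linear(2, 30), torch.nn.Tanh(),
    torch.nn.Linear(30, 30), torch.nn.Tanh(),
    torch.nn.Linear(30, 1),
)

def pde_residual(xt):
    """G_i = u_t + d(u^2/2)/dx via automatic differentiation; column 0 is t."""
    xt = xt.clone().requires_grad_(True)
    u = net(xt)
    grads = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_t, u_x = grads[:, 0:1], grads[:, 1:2]
    return u_t + u * u_x                # chain rule: (u^2/2)_x = u * u_x

def pinn_loss(xt_pde, xt_ibc, u_ibc, w_ibc=10.0):
    loss_pde = pde_residual(xt_pde).pow(2).mean()     # equation term of Eq. (4)
    loss_ibc = (net(xt_ibc) - u_ibc).pow(2).mean()    # IBCs term of Eq. (4)
    return loss_pde + w_ibc * loss_ibc

# A few sample residual points in (t, x) over [0, 1] x [0, 2]:
xt_pde = torch.rand(64, 2) * torch.tensor([1.0, 2.0])
xt_ibc = torch.cat([torch.zeros(32, 1), torch.rand(32, 1) * 2.0], dim=1)
u_ibc = -torch.sin(torch.pi * (xt_ibc[:, 1:2] - 1.0))  # initial data of Eq. (5)
loss = pinn_loss(xt_pde, xt_ibc, u_ibc)
```

In practice this scalar `loss` would be minimized with an optimizer such as Adam, as described for the test in Sect. 2.2.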

In each term of the loss function, it is common to average the residual across all residual points to obtain the total training loss. Averaging is often suitable and convenient for problems with smooth solutions. However, when a nonlinear discontinuity forms under compression, the gradient becomes theoretically infinite and cannot be directly described by differential equations. As a result, transition points located within the discontinuity may introduce significant errors into the loss function. We illustrate this in the following example.

2.2 Analysis of Transition Points Based on Inviscid Burgers’ Equation

We demonstrate in Appendix B that classical PINNs can sharply capture linear or weak discontinuities without the need of introducing numerical dissipation to maintain stability. However, when it comes to solving nonlinear strong discontinuities, particularly compression discontinuities like shock waves, PINNs encounter significant challenges. To illustrate this phenomenon in the context of PINNs solving problems with compression discontinuities, we consider an inviscid Burgers’ equation problem with the following equation and IBCs as

$$\begin{aligned} \begin{aligned}&\frac{\partial u}{\partial t} + \frac{\partial (u^2/2)}{\partial x} = 0, \quad x\in [0,2], \quad t \in [0,1], \\&u(0,x) = -\sin (\pi (x-1)),\\&u(t,0) = u(t,2) = 0.\\ \end{aligned} \end{aligned}$$
(5)

In this problem, the initial condition is smooth. However, as the initial velocities on both sides are directed towards the center, a strong discontinuous solution forms in the center after a finite time under compression from both the left and right sides.

We solve this problem using the original PINNs method, which is referred to as ‘PINNs’ in this paper and was introduced in the previous section. Figure 1 presents the loss history, the variable u, and the residual at \(t=1\) for different training epochs (1000, 3000, 5000, 8000, and 11500). It is evident that the PINNs initially tend to achieve a globally smooth solution to satisfy the initial and boundary conditions (checkpoint 1). Subsequently, the training focuses on reducing the residual to approach the exact solution of the problem. However, when the transition points reach relatively high gradients (checkpoint 2), they carry nearly all of the function loss \({\mathcal {L}}_{\textrm{PDE}}\) (Fig. 1c). The training enters a paradoxical state - irrespective of an increase (checkpoint 3) or decrease (checkpoint 4) in the gradient, the training loss always increases. This is because these points cannot be directly controlled by the strong form of the PDE: a decrease in the gradient causes the PINNs solution to deviate from the exact solution, while an increase results in a larger residual.

As shown in the loss history (Fig. 1a), after 3000 epochs (checkpoint 2), the training encounters difficulties with these transition points and struggles to effectively reduce the total loss. More importantly, as a global method, the total loss also influences the error magnitude in other regions. This stands in contrast to classical finite-volume or finite-difference methods. In those classical methods, when a cell is located within a discontinuous region, the scheme’s order automatically reduces to no more than second-order, and significant dissipation is introduced into the cell to achieve a non-oscillatory result. However, the locally large error does not impact the accuracy order of other cells outside the discontinuity.

Fig. 1
figure 1

Results of the Burgers equation with PINNs. The NN has 4 hidden layers and 30 neurons per layer. PDE residual points are distributed uniformly on a \(100\times 100\) grid in the \(X \times T\) space, and the IBCs points are set on a uniform grid of 100. The optimizer used is Adam with a learning rate of 0.001. The reference result is provided by WENO-Z on a refined mesh with 10,000 spatial grids

3 PINNs-WE Framework

We assume that the flow field is continuous except for finite discontinuities. To effectively capture strong nonlinear discontinuities using PINNs, we introduce a new weighted equation framework. The basic idea is to separate the shock waves from other regions and solve them with different constraints. As depicted in Fig. 2, the new framework consists of three key components.

The first part is a local strong form PDE constraint \({\mathcal {L}}_{\textrm{PDE}}\) for simulating the smooth regions of the computational domain, which also includes linear or weak discontinuities, as explained in Appendix B.

The second part is the Rankine–Hugoniot relation constraint \({\mathcal {L}}_{\textrm{RH}}\), which is equivalent to the local weak form of the conservation laws across the shock waves.

The last part is a global conservation component \({\mathcal {L}}_\textrm{CONs}\) that enforces total conservation.

The algorithm details for solving 2D Euler equations are presented in Fig. 3, while the sample domains for each part are illustrated in Fig. 4.

Fig. 2
figure 2

The basic idea of PINNs-WE framework in solving problems with strong nonlinear discontinuities

Fig. 3
figure 3

Architecture of PINNs-WE for solving 2D Euler equations

Fig. 4
figure 4

The sample space of residual points in different loss parts

3.1 Local Strong form PDE Constraint

Based on the aforementioned analysis of the Burgers’ equation, the training of PINNs can become challenging due to the paradoxical status of transition points.

Therefore, the first and most important part of our method is to resolve this paradoxical status by introducing a weighted-equation (WE) method. The core idea behind this method is to reduce the impact of points in high-gradient regions by introducing a local positive weight, denoted as \(\lambda _1\), into the governing equations.

The effectiveness of this method is based on the fact that strong nonlinear discontinuities are formed by the convergence of characteristic lines. By reducing the influence of transition points through the adjustment of \(\lambda _1\), the neural network (NN) will prioritize training in the smooth regions and achieve high accuracy in those areas. Consequently, the transition points, which are inherently influenced by the compression properties of strong discontinuities, will effectively be compressed into a smooth region under the weak form equation constraint introduced in the next part. As a result, a sharp and precise discontinuity will emerge during training.

For a general conservative equation (Eq. 1), if we define

$$\begin{aligned} \textbf{G} = \frac{\partial \textbf{U}}{\partial t}+ \nabla \cdot \textbf{F}, \end{aligned}$$
(6)

the weighted equation is defined as follows:

$$\begin{aligned} \textbf{G}_{\textrm{new}} = \lambda _1\left( \frac{\partial \textbf{U}}{\partial t}+ \nabla \cdot \textbf{F}\right) . \end{aligned}$$
(7)

It is evident that \(\textbf{G}_{\textrm{new}} = 0\) shares the same solutions as \(\textbf{G} = 0\) if \(\lambda _1\) remains positive. Moreover, we can adjust the NN’s expression at different points through the design of the weight \(\lambda _1\). Correspondingly, the new function loss is defined as:

$$\begin{aligned} {\mathcal {L}}_{\textrm{PDE}} = \textrm{MSE}(\textbf{G}_{\textrm{new}}) = \frac{1}{|{\mathcal {S}}_{\textrm{PDE}}|}\sum _{{{\textbf{x}}}_i \in {\mathcal {S}}_{\textrm{PDE}}}{} \textbf{G}_{\textrm{new},i}^2. \end{aligned}$$
(8)

We would like to note that McClenny and Braga-Neto [21] have also introduced a trainable weight into the loss function, focusing the NN on improving its approximation in difficult-to-train regions by increasing the loss weights there. Based on our analysis, however, we need to decrease the weight near a physical discontinuity, since this region cannot be effectively controlled by the strong-form PDE residual.

Another related work is that of Wang et al. [22], who identify and analyze a fundamental failure mode related to numerical stiffness causing imbalanced gradients during model training. To address this, they introduce a learning rate annealing algorithm that uses gradient statistics to balance the interplay of the various terms in composite loss functions. The distinction, however, is that in shock computations the imbalance in spatial derivatives arises from physical factors and from the non-validity of the strong-form PDEs.

We note that reducing the number of sampling points near discontinuities, or extending the computational domain, can also be viewed as ways of attenuating the attention paid to regions close to the discontinuities. However, such methods may reduce the resolution of shock waves and, at the same time, fail to achieve sufficient weakening strength.

Inspired by the seminal work [1] on artificial viscosity, we define a physics-dependent weight as follows:

$$\begin{aligned} \lambda _1 = \frac{1}{k_1 (|\nabla \cdot \vec {u}| - \nabla \cdot \vec {u})+1}, \end{aligned}$$
(9)

Here, \(\vec {u}\) represents the velocity field, and \(k_1\) is a factor used to adjust \(\lambda _1\). This factor may vary from case to case; based on our tests, we suggest using a global value of \(k_1 = 0.2\) for this research.

The construction of this weight is based on the fact that the velocity divergence \(\nabla \cdot \vec {u}\) becomes negative when the field is compressed, and \(\nabla \cdot \vec {u} \rightarrow -\infty \) when a shock appears. It is important to highlight that we only utilize the velocity divergence to detect shocks and apply \(\lambda _1\) to the equations without introducing any numerical dissipation. Since \(\lambda _1\) is strictly positive, it does not affect the exact solution of the equations.
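The behavior of the weight in Eq. (9) can be sketched in a few lines. The value \(k_1 = 0.2\) follows the paper; the NumPy formulation and the sample divergence values are our illustration.

```python
# Sketch of the physics-dependent weight lambda_1 of Eq. (9) and the
# weighted residual G_new of Eq. (7).
import numpy as np

def lambda1(div_u, k1=0.2):
    """lambda_1 = 1 / (k1 * (|div u| - div u) + 1).

    In smooth or expanding regions (div u >= 0) the weight equals 1
    exactly; under compression (div u < 0) it decays toward 0, muting
    the strong-form PDE residual where a shock is forming."""
    return 1.0 / (k1 * (np.abs(div_u) - div_u) + 1.0)

div_u = np.array([2.0, 0.0, -1.0, -100.0])  # sample velocity divergences
w = lambda1(div_u)

G = np.ones_like(div_u)       # placeholder PDE residuals G of Eq. (6)
G_new = w * G                 # Eq. (7): G_new = lambda_1 * G
```

Note that expansion and smooth points keep full weight, while strongly compressed points are damped by orders of magnitude, which is exactly the ‘retreat to advance’ mechanism.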

3.2 Local Weak form of the Conservation Laws Constraint

Since strong form PDEs cannot describe strong discontinuous solutions, the PDE constraints near discontinuities can lead to significant equation losses. Therefore, in the first part, we reduce the weight of the equations near the shock wave regions to eliminate the incorrect constraints in PINNs. However, this leaves no appropriate constraint over the shock regions, making the problem underconstrained.

Although strong form solutions do not exist across the shock waves, the physical conservation laws still hold, meaning that the physical flux crossing the shock waves must be conserved. This leads us to the Rankine–Hugoniot (RH) relation of the conservation laws (1):

$$\begin{aligned} s\cdot [\![\textbf{U}]\!] = [\![\textbf{F}]\!], \end{aligned}$$
(10)

where s represents the normal velocity of the shock wave, and for any given variable f,

$$\begin{aligned}{}[\![f]\!] = {f}_1 -{f}_2, \end{aligned}$$
(11)

where \({f}_1\) and \({f}_2\) are the values of f on different sides of the discontinuity.

The RH relation can also be deduced from, and is equivalent to, the weak form of the conservation laws across shock waves. More information can be found in [23]. Here, we provide the RH relation and the corresponding constraints for the 1D Burgers’ equation, as well as the 1D and 2D Euler equations.

3.2.1 1D Burgers’ Equation

For Burgers’ equation (5), the RH relation is expressed as follows:

$$\begin{aligned} s [\![u]\!] = [\![ u^2/2]\!], \end{aligned}$$
(12)

where the velocity of the discontinuity is given by

$$\begin{aligned} s = \frac{u_1 + u_2}{2}. \end{aligned}$$
(13)

For the given antisymmetric initial condition in (5), the discontinuity forms at the center \(x=1\) and the value of s remains zero. Therefore, we have the RH constraint as

$$\begin{aligned} f_{\textrm{RH}}(x,t)= u(x=1,t) - u(x=1,0) = 0, \end{aligned}$$
(14)

which can be considered as an additional constraint for the Burgers equation. This constraint is formulated as:

$$\begin{aligned} {\mathcal {L}}_{\textrm{RH}} = \textrm{MSE}(f_{\textrm{RH}}) = \frac{1}{|{\mathcal {S}}_{\textrm{RH}}|}\sum _{{{\textbf{x}}}_i \in {\mathcal {S}}_{\textrm{RH}}}{f}_{\textrm{RH},i}^2. \end{aligned}$$
(15)

The settings of \( {\mathcal {S}}_{\textrm{RH}}\) and the other sampling sets mentioned in this article are illustrated in Fig. 4. Of course, \( {\mathcal {S}}_{\textrm{RH}}\) and \( {\mathcal {S}}_{\textrm{PDE}}\) can be completely identical here.
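The Burgers RH constraint of Eqs. (12)-(15) is simple enough to state directly in code. This is a sketch under our assumptions (plain NumPy, illustrative sample values): the shock speed follows Eq. (13), and for the antisymmetric data \(u(1,0)=0\), so the constraint reduces to penalizing \(u(1,t)\) itself.

```python
# Sketch of the Burgers RH constraint: shock speed (Eq. 13) and the
# mean-squared residual of f_RH (Eqs. 14-15).
import numpy as np

def shock_speed(u1, u2):
    """Eq. (13): s = (u1 + u2) / 2 for Burgers' equation."""
    return 0.5 * (u1 + u2)

def rh_loss(u_at_shock):
    """Eq. (15): MSE of f_RH = u(x=1, t) - u(x=1, 0), with u(1, 0) = 0."""
    f_rh = u_at_shock - 0.0
    return np.mean(f_rh ** 2)

# Antisymmetric states on either side of the discontinuity give s = 0:
s = shock_speed(1.0, -1.0)
# Network predictions u(1, t_i) at a few RH residual points (illustrative):
loss_rh = rh_loss(np.array([0.01, -0.02, 0.0]))
```

During training, `loss_rh` would be added to the total loss with its weight, pinning the discontinuity to \(x=1\).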

3.2.2 Euler Equations

We next consider the 1D and 2D Euler equations in their conservative forms,

$$\begin{aligned} \frac{\partial \textbf{U}}{\partial t} + \nabla \cdot \textbf{F} = 0. \end{aligned}$$
(16)

For the 1D case, the Euler equations are defined as

$$\begin{aligned} \mathbf{{U}} = \left( \begin{aligned}&\rho \\&\rho u\\&E \end{aligned}\right) , \textbf{F} = \left( \begin{aligned}&\rho u\\&\rho u^2 + p\\&u(E + p) \end{aligned}\right) . \end{aligned}$$
(17)

For 2D cases, we have two flux vectors, \(\textbf{F}_1\) and \(\textbf{F}_2\), with the following definition:

$$\begin{aligned} \textbf{U} = \left( \begin{aligned}&\rho \\&\rho u\\&\rho v\\&E \end{aligned}\right) , \textbf{F}_1= \left( \begin{aligned}&\rho u\\&\rho u^2 + p\\&\rho uv\\&u(E + p) \end{aligned}\right) , \textbf{F}_2= \left( \begin{aligned}&\rho v\\&\rho uv\\&\rho v^2 +p\\&v(E + p) \end{aligned}\right) . \end{aligned}$$
(18)

In these equations, \(\rho \) represents density, u is the velocity in the x direction, v is the velocity in the y direction, p is the pressure, and E stands for the total energy. To close the equations, we apply the ideal gas equation of state:

$$\begin{aligned} E = \frac{1}{2}\rho |\vec {u}|^2 + \frac{p}{\gamma -1}, \end{aligned}$$
(19)

Here, \(\gamma = 1.4\) represents the specific heat ratio.
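As a small sketch (our own NumPy layout, not from the paper), the 1D conservative variables of Eq. (17) and the ideal-gas closure of Eq. (19) can be written as:

```python
# Sketch of the 1D Euler conservative variables U and flux F (Eq. 17),
# closed by the ideal-gas equation of state (Eq. 19).
import numpy as np

GAMMA = 1.4  # specific heat ratio

def conservative(rho, u, p):
    """Primitive (rho, u, p) -> conservative U = (rho, rho*u, E)."""
    E = 0.5 * rho * u**2 + p / (GAMMA - 1.0)   # Eq. (19)
    return np.array([rho, rho * u, E])

def flux(U):
    """Flux F(U) = (rho*u, rho*u^2 + p, u*(E + p)) of Eq. (17)."""
    rho, m, E = U
    u = m / rho
    p = (GAMMA - 1.0) * (E - 0.5 * rho * u**2)  # invert Eq. (19)
    return np.array([m, m * u + p, u * (E + p)])

U = conservative(1.0, 0.0, 1.0)   # quiescent state: flux is (0, p, 0)
F = flux(U)
```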

Now, let us discuss the RH relations, first for the 2D Euler equations and then for the 1D case.

For 2D Euler equations, the RH relation is given as

$$\begin{aligned} \begin{aligned}&s [\![\rho ]\!] - [\![\rho \vec {u} \cdot \vec {n}]\!] =0, \\&s [\![\rho u]\!] - [\![\left( \rho \vec {u} \cdot \vec {n}\right) u+p n_{x}]\!] =0, \\&s[\![\rho v]\!] - [\![\left( \rho \vec {u} \cdot \vec {n}\right) v+p n_{y}]\!]=0, \\&s[\![E]\!] - [\![ (E+p) \vec {u} \cdot \vec {n}]\!]=0. \end{aligned} \end{aligned}$$
(20)

Here, \(\vec {n}= (n_x,n_y)\) represents the unit normal vector of the discontinuity. To simplify, we can eliminate the variables s and \(\vec {n}\) from (20), resulting in the following two equations,

$$\begin{aligned} \rho _1 \rho _2 \left[ ( u_{1} - u_{2})^2 + ( v_{1} - v_{2})^2 \right]= & {} (p_1-p_2)(\rho _1 - \rho _2), \end{aligned}$$
(21)
$$\begin{aligned} \rho _1 \rho _2 (e_1 -e_2)= & {} \frac{1}{2} (p_1+p_2)(\rho _1 -\rho _2). \end{aligned}$$
(22)

These two equations are commonly employed in gas dynamics, the latter being the renowned Hugoniot relation. They are also valid in the 1D case, where \(v_1 = v_2 = 0\). Then we let

$$\begin{aligned} \begin{aligned} f_1(\textbf{U}_1,\textbf{U}_2)&= \rho _1 \rho _2 \left[ ( u_{1} - u_{2})^2 + ( v_{1} - v_{2})^2 \right] - (p_1-p_2)(\rho _1 - \rho _2),\\ f_2(\textbf{U}_1,\textbf{U}_2)&=\rho _1 \rho _2 (e_1 -e_2) - \frac{1}{2} (p_1+p_2)(\rho _1 -\rho _2). \end{aligned} \end{aligned}$$
(23)

To establish the constraint in PINNs, for each RH residual point \(\textbf{x}(x,y,t) \in {\mathcal {S}}_{\textrm{RH}}\), we introduce two companion points as \(\textbf{x}_L\) and \(\textbf{x}_D\):

$$\begin{aligned} \textbf{x}_L = (x-\Delta x, y, t), \quad \textbf{x}_D = (x, y-\Delta y, t). \end{aligned}$$

Then we utilize them to construct the RH constraint as

$$\begin{aligned} \begin{aligned} {\mathcal {L}}_{\textrm{RH}}&= \textrm{MSE}(\lambda _2(\textbf{U},\textbf{U}_L) f_{1}(\textbf{U},\textbf{U}_L)) + \textrm{MSE}( \lambda _2(\textbf{U},\textbf{U}_L) f_{2} (\textbf{U},\textbf{U}_L)) + \\&\quad \textrm{MSE}(\lambda _2(\textbf{U},\textbf{U}_D) f_{1}(\textbf{U},\textbf{U}_D)) + \textrm{MSE}( \lambda _2(\textbf{U},\textbf{U}_D) f_{2} (\textbf{U},\textbf{U}_D)),\\ \end{aligned} \end{aligned}$$
(24)

where \( \textbf{U} = \textbf{U}(\textbf{x})\), \(\textbf{U}_L = \textbf{U}(\textbf{x}_L)\) and \(\textbf{U}_D = \textbf{U}(\textbf{x}_D)\), respectively.

The hybridization of strong form and weak form equations can be effective, the key factor being how reliably shock waves can be detected. In the first part, the added physical weight adaptively weakens the strong-form PDE constraint near shock waves. In the second part, however, the RH relation is derived under the assumption that a shock wave is present. Therefore, we design a new strong-form filter to detect shock waves using the following expression:

$$\begin{aligned} \lambda _2 (\textbf{U}_1, \textbf{U}_2)= \left\{ \begin{aligned}&|(p_1 -p_2)(\vec {u}_1-\vec {u}_2)|&\textrm{if} \quad |p_1 -p_2|> \varepsilon _1 \quad \textrm{and} \quad |u_1-u_2| > \varepsilon _2, \\&0&\textrm{elsewhere}. \end{aligned}\right. \end{aligned}$$
(25)

Here, \(\varepsilon _1\) and \(\varepsilon _2\) are two parameters used to detect the jumps of shock waves and can be adjusted according to the specific problem. After normalizing and nondimensionalizing the problem, we set \(\varepsilon _1 = \varepsilon _2 = 0.1 \) for all the cases in this paper.
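The RH residuals \(f_1, f_2\) of Eq. (23) and the filter \(\lambda_2\) of Eq. (25) can be sketched as below. The scalar-argument signatures and the plain-NumPy form are our assumptions; the test state is a standard Mach-2 normal shock (\(\gamma = 1.4\)), for which both residuals should vanish.

```python
# Sketch of the Euler RH residuals f1, f2 (Eq. 23) and the shock
# filter lambda_2 (Eq. 25), with e = p / ((gamma - 1) * rho).
import numpy as np

GAMMA = 1.4

def f1(rho1, u1, v1, p1, rho2, u2, v2, p2):
    return rho1*rho2*((u1-u2)**2 + (v1-v2)**2) - (p1-p2)*(rho1-rho2)

def f2(rho1, p1, rho2, p2):
    e1 = p1 / ((GAMMA - 1.0) * rho1)     # specific internal energy
    e2 = p2 / ((GAMMA - 1.0) * rho2)
    return rho1*rho2*(e1-e2) - 0.5*(p1+p2)*(rho1-rho2)

def lambda2(p1, u1, p2, u2, eps1=0.1, eps2=0.1):
    """Eq. (25): nonzero only when both pressure and velocity jump."""
    if abs(p1 - p2) > eps1 and abs(u1 - u2) > eps2:
        return abs((p1 - p2) * (u1 - u2))
    return 0.0

# Stationary normal shock at Mach 2: upstream state and the exact
# Rankine-Hugoniot downstream state.
rho1, u1, p1 = 1.0, 2.0 * np.sqrt(GAMMA), 1.0   # c1 = sqrt(gamma), M = 2
rho2 = 8.0 / 3.0                                # (gamma+1)M^2 / ((gamma-1)M^2 + 2)
u2 = rho1 * u1 / rho2                           # mass conservation across the shock
p2 = 4.5                                        # 1 + 2*gamma*(M^2 - 1)/(gamma + 1)

r1 = f1(rho1, u1, 0.0, p1, rho2, u2, 0.0, p2)   # should be ~0
r2 = f2(rho1, p1, rho2, p2)                     # should be ~0
w = lambda2(p1, u1, p2, u2)                     # nonzero: both jumps exceed 0.1
```

In the full loss (24), such \(\lambda_2\)-weighted residuals are evaluated between each RH residual point and its companion points \(\textbf{x}_L\), \(\textbf{x}_D\).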

3.3 A Total Physical Conservation Constraint

In solving Euler equations, physical conservation laws are of paramount importance, especially when it comes to accurately determining the position of shock waves. We can explain this by reconsidering the RH relation (20). In the 1D case, assume we know \((\rho _1,u_1,p_1)\) on one side; there are then three equations but four unknowns, as the velocity of the discontinuity s is also unknown. Consequently, the discontinuity cannot be completely determined by the RH relation (local conservation laws) alone, and additional conservation relations from the left and right smooth parts are necessary. However, in PINNs, even though we utilize a system of conservation equations, we still cannot achieve the theoretical total conservation properties of finite difference or finite volume methods. Therefore, in the third part of this work, in order to improve conservation, we add a soft total conservation constraint to the PINNs-WE framework.

Here we consider the conservation of the total mass, momentum and total energy, which are defined as

$$\begin{aligned} \begin{aligned} \textrm{Mas}_{\textrm{Exact}}&= \int _V \rho ({\textbf{x}}) \textrm{d} V, \\ \textrm{Mom}_{\textrm{Exact}}&= \int _V \rho ({\textbf{x}})\vec {u} ({\textbf{x}})\textrm{d}V, \\ \textrm{Ene}_{\textrm{Exact}}&= \int _V E ({\textbf{x}})\textrm{d}V. \end{aligned} \end{aligned}$$
(26)

The conservation of total mass, momentum, and total energy between time \(t=t_1\) and \(t=t_2\) can be expressed as follows:

$$\begin{aligned} \begin{aligned} \int _V \rho \textrm{d} V|_{t=t_2} -\int _V \rho \textrm{d} V|_{t=t_1}&=\int _{t_1}^{t_2} \oint _{\partial V} \rho (\vec {u}\cdot \vec {n}_{\partial V}) \textrm{d} t\textrm{d}A,\\ \int _V \rho \vec {u}\textrm{d} V|_{t=t_2} -\int _V \rho \vec {u}\textrm{d} V|_{t=t_1}&=\int _{t_1}^{t_2} \oint _{\partial V} \rho \vec {u} (\vec {u} \cdot \vec {n}_{\partial V}) + p \vec {n}_{\partial V} \textrm{d} t\textrm{d}A, \\ \int _V E\textrm{d} V|_{t=t_2} -\int _V E\textrm{d} V|_{t=t_1}&=\int _{t_1}^{t_2} \oint _{\partial V} (E+p) (\vec {u} \cdot \vec {n}_{\partial V}) \textrm{d} t\textrm{d}A, \\ \end{aligned} \end{aligned}$$
(27)

where \(E = \rho e + \frac{1}{2}\rho \vec {u}^2\), V is the volume of the computational domain, and \({{\textbf{x}}} = (t=t_k,x)\) for the 1D problem and \({{\textbf{x}}} = (t=t_k,x,y)\) for the 2D problem. \(\partial V\) is the boundary of the computational domain, and \(\vec {n}_{\partial V}\) is the unit normal vector of the boundary.

As the residual points are randomly or uniformly sampled, we approximate the total conservation constraints at time \(t=t_k\) as

$$\begin{aligned} \begin{aligned} \textrm{Mas}(t_k)&= \frac{V}{ |{\mathcal {S}}_{\textrm{Con}}(t_k)|}\sum _ {{{\textbf{x}}} \in {\mathcal {S}}_{\textrm{Con}}(t_k)} \rho ({{\textbf{x}}})\\ \textrm{Mom}(t_k)&= \frac{V}{ |{\mathcal {S}}_{\textrm{Con}}(t_k)|}\sum _ {{{\textbf{x}}} \in {\mathcal {S}}_{\textrm{Con}}(t_k)} \rho ({{\textbf{x}}}) \vec {u}({{\textbf{x}}}) \\ \textrm{Ene}(t_k)&= \frac{V}{ |{\mathcal {S}}_{\textrm{Con}}(t_k)|}\sum _ {{{\textbf{x}}} \in {\mathcal {S}}_{\textrm{Con}}(t_k)} E({{\textbf{x}}}) \\ \end{aligned} \end{aligned}$$
(28)

The boundary terms can also be approximated as

$$\begin{aligned} \begin{aligned} \textrm{BD}_{\textrm{Mas}}(t_1,t_2)&= \frac{(t_2-t_1)A}{|{\mathcal {S}}_{\textrm{BD}}|}\sum _{{{\textbf{x}}} \in {\mathcal {S}}_{\textrm{BD}}} \rho ({\textbf{x}})(\vec {u}({{\textbf{x}}})\cdot \vec {n}_{\partial V}({{\textbf{x}}})), \\ \textrm{BD}_{\textrm{Mom}}(t_1,t_2)&= \frac{(t_2-t_1)A}{|{\mathcal {S}}_{\textrm{BD}}|}\sum _{{{\textbf{x}}} \in {\mathcal {S}}_{\textrm{BD}}} \left[ \rho ({\textbf{x}}) \vec {u}({{\textbf{x}}})(\vec {u}({{\textbf{x}}})\cdot \vec {n}_{\partial V}({{\textbf{x}}})) + p ({{\textbf{x}}})\vec {n}_{\partial V}({{\textbf{x}}})\right] , \\ \textrm{BD}_{\textrm{Ene}} (t_1,t_2)&= \frac{(t_2-t_1)A}{|{\mathcal {S}}_{\textrm{BD}}|}\sum _{{{\textbf{x}}} \in {\mathcal {S}}_{\textrm{BD}}} \left[ E({\textbf{x}}) +p ({{\textbf{x}}})\right] \vec {u}({{\textbf{x}}}) \cdot \vec {n}_{\partial V}({{\textbf{x}}}), \\ \end{aligned} \end{aligned}$$
(29)

where \({\mathcal {S}}_{\textrm{BD}}\) is the set of residual points sampled on the boundary over the computational time \(t_1 \le t \le t_2\), and A is the area of the boundary surface. Then the conservation loss is constructed as

$$\begin{aligned} {\mathcal {L}}_{\textrm{CONs}} =&\, (\textrm{Mas}(t_2) - \textrm{Mas}(t_1) - \textrm{BD}_{\textrm{Mas}})^2+(\textrm{Mom}(t_2) - \textrm{Mom}(t_1) - \textrm{BD}_{\textrm{Mom}})^2 \nonumber \\&+ (\textrm{Ene}(t_2) - \textrm{Ene}(t_1) - \textrm{BD}_{\textrm{Ene}})^2. \end{aligned}$$
(30)
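The Monte-Carlo estimates of Eqs. (28)-(30) can be sketched for the mass term in 1D as follows. The sampling sizes, the density function, and the restriction to mass only are our illustrative assumptions; momentum and energy terms follow the same pattern.

```python
# Sketch of the total-conservation constraint (Eqs. 28-30), mass only,
# on a 1D domain [0, V] with endpoint 'boundary surfaces' of area A = 1.
import numpy as np

rng = np.random.default_rng(0)

def total_mass(rho_fn, t, V=2.0, n=4000):
    """Eq. (28): Mas(t) ~ (V / |S_Con|) * sum of rho at random samples."""
    x = rng.uniform(0.0, V, n)
    return V * np.mean(rho_fn(x, t))

def boundary_mass_flux(rho_u_left, rho_u_right, t1, t2):
    """Eq. (29) in 1D: net rho*u flux through the two endpoints,
    with outward normals -1 (left) and +1 (right)."""
    return (t2 - t1) * (rho_u_left * (-1.0) + rho_u_right * (+1.0))

# A uniform density with zero boundary flux conserves mass exactly,
# so the mass term of Eq. (30) vanishes:
rho_fn = lambda x, t: np.ones_like(x)
m1 = total_mass(rho_fn, 0.0)
m2 = total_mass(rho_fn, 1.0)
loss_mass = (m2 - m1 - boundary_mass_flux(0.0, 0.0, 0.0, 1.0)) ** 2
```

For a trained network, `rho_fn` would be the network's density prediction evaluated at the sampled residual points \({\mathcal {S}}_{\textrm{Con}}(t_k)\).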

4 Numerical Examples

We employ PINNs-WE to address forward problems involving the Burgers’ equation and Euler equations, without relying on prior information from the data, except for the initial boundary conditions (IBCs). We then compare the results with those obtained using traditional high-order finite difference methods, specifically the fifth-order WENO-Z method for spatial discretization [24] and the third-order Runge–Kutta method for time integration [25].

4.1 Inviscid Burgers’ Equation

We first solve the inviscid Burgers’ equation (5) using the new PINNs-WE method with the same setting as in Sect. 2.2.

In Burgers’ equation, the position of the discontinuity is determined by the antisymmetric condition discussed in Eq. 14, so introducing the total conservation of u, that is \(\int u\,dx = 0\), would be redundant. The total loss is therefore constructed as

$$\begin{aligned} {\mathcal {L}} =\omega _{\textrm{PDE}} {\mathcal {L}}_{\textrm{PDE}} + \omega _\textrm{IBCs} {\mathcal {L}}_{\textrm{IBCs}} + \omega _{\textrm{RH}} {\mathcal {L}}_{\textrm{RH}}. \end{aligned}$$
(31)

Here, each \(\omega \) is a weight that adjusts the confinement strength of its term [16, 18, 20]. Typically, more weight is assigned to points located on the initial and boundary sets and near the shocks. We set

$$\begin{aligned} \omega _{\textrm{PDE}} = 1, \quad \omega _{\textrm{IBCs}} = 10, \quad \omega _{\textrm{RH}} = 10, \end{aligned}$$
(32)

Since the velocity in Burgers’ equation is represented by u itself, the weight in \(\textbf{G}_{\textrm{new}}\) is given by:

$$\begin{aligned} \lambda _1= \frac{1}{ k_1 (|\frac{\partial u}{\partial x}| - \frac{\partial u}{\partial x})+1}. \end{aligned}$$
(33)

The RH constraint is defined as in (15). All the residual points are uniformly distributed. The RH points are sampled at \(x=1\) for \(0<t<1\).
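The weight of Eq. (33) and its use in the PDE loss can be sketched as follows. In practice \(\partial u/\partial x\) is obtained by automatic differentiation of the network; here it is passed in as an array, and the function names are ours:

```python
import numpy as np

def lambda1(du_dx, k1=0.2):
    """Equation weight of Eq. (33): close to 1 in smooth/expansion
    regions (du/dx >= 0) and small inside compressions (du/dx << 0)."""
    return 1.0 / (k1 * (np.abs(du_dx) - du_dx) + 1.0)

def weighted_pde_residual(residual, du_dx, k1=0.2):
    """Mean-square PDE loss with the pointwise weight applied, so that
    residual points inside forming shocks no longer dominate training."""
    return np.mean((lambda1(du_dx, k1) * residual) ** 2)
```

Note that \(|\partial u/\partial x| - \partial u/\partial x\) vanishes wherever \(\partial u/\partial x \ge 0\), so the weight deviates from 1 only in compression regions.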

Figure 5 displays the loss history over epochs, the evolution of u, the residual, and \(\lambda _1\) as functions of x at \(t=1\) for different training epochs (1000, 3000, 5000, 8000, 11500, and the end of training).

Compared to the results in Fig. 1, during the first 1000 epochs (Slice 1) PINNs-WE produces results similar to the original PINNs, because the gradient at the transition points is not yet large enough to cause problems. After 3000 epochs, however, PINNs-WE can effectively reduce the total loss as it resolves the paradoxical situation at the transition points. Subsequently, all the transition points are compressed into a smooth region, except for the one at \(u=0\), which always has zero velocity.

As shown in Fig. 5d, after 3000 epochs, all the weights change to nearly 1, except for the central one.

We then proceed to compare PINNs-WE against the original PINNs and the traditional high-order WENO-Z method, considering various factors, including the number of mesh-based residual points, the network structure, and the factor \(k_1\). All the presented results are averages over ten random runs to mitigate the influence of randomness in the hyperparameters. We employed the L-BFGS algorithm and trained to convergence.

Table 1 provides the \(L_2\) errors, the \(L_2\) errors within the smooth regions (\(x \in [0, 0.95] \cup [1.05, 2]\), \(t=1\)) and within the shock region (\(x\in (0.95, 1.05)\)), along with the \(L_\infty \) errors. Additionally, the training losses after convergence in the different scenarios are displayed in the same table. We also give a rough comparison of the computational cost of the different methods. The comparison illustrates that:

  1. For PINNs-WE, when the network parameters are sufficient, the network structure (depth and width) does not have a significant impact on the results. The variation of the parameter \(k_1\) within the range of 0.1 to 0.4 also does not significantly affect the results.

  2. The accuracy of the new method, PINNs-WE, consistently outperforms the original PINNs in this case.

  3. When there are insufficient cells/sampling points, the accuracy of PINNs-WE is higher than that of the WENO-Z scheme.

  4. Traditional methods exhibit mesh convergence, a property that PINNs do not possess. In fact, even with an increase in residual points, the accuracy of PINNs-WE may decrease. This is mainly due to the increased difficulty in optimization with more points.

  5. Using PINNs to solve forward problems is much more expensive than traditional methods (Fig. 6).
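The region-wise error norms reported in Table 1 can be computed with a short helper; the shock window follows the bounds quoted above, and the function name is ours:

```python
import numpy as np

def region_errors(x, u_num, u_exact, shock=(0.95, 1.05)):
    """L2 errors over the full line, the smooth region, and the shock
    region (x in (0.95, 1.05)), plus the global L_inf error."""
    err = u_num - u_exact
    in_shock = (x > shock[0]) & (x < shock[1])
    l2 = lambda e: np.sqrt(np.mean(e ** 2))
    return {
        "L2": l2(err),
        "L2_smooth": l2(err[~in_shock]),
        "L2_shock": l2(err[in_shock]),
        "Linf": np.max(np.abs(err)),
    }
```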

Fig. 5
figure 5

Results of the inviscid Burgers’ equation with PINNs-WE (Part 1). Computational setting similar to that of Fig. 1

Fig. 6
figure 6

Results of the Burgers’ equation with PINNs-WE (Part 2). Computational setting similar to that of Fig. 1

Table 1 Accuracy on inviscid Burgers’ equation, \(t=1\)

4.2 Euler Equations

In 1D cases, we examine two classical Riemann problems, namely the Sod and Lax problems, as well as another 1D problem characterized by strong shock waves. Subsequently, we transition to a 2D Riemann problem and a more intricate problem involving moving shock waves. The loss function is constructed as

$$\begin{aligned} {\mathcal {L}} =\omega _{\textrm{PDE}} {\mathcal {L}}_{\textrm{PDE}} + \omega _\textrm{IBCs} {\mathcal {L}}_{\textrm{IBCs}} + \omega _{\textrm{RH}} {\mathcal {L}}_{\textrm{RH}} + \omega _{\textrm{CONs}} {\mathcal {L}}_{\textrm{CONs}}. \end{aligned}$$
(34)

with the weights

$$\begin{aligned} \omega _{\textrm{PDE}} = 1, \quad \omega _{\textrm{IBCs}} = 10, \quad \omega _{\textrm{RH}} = 10,\quad \omega _{\textrm{CONs}} = 10. \end{aligned}$$
(35)

4.2.1 Sod Problem

The Sod problem is an extensively studied 1D Riemann problem with piecewise-constant initial states in a tube of unit length, formulated as follows:

$$\begin{aligned} (\rho ,u,p) = \left\{ \begin{aligned}&(1,0,1), \quad&\textrm{if} \quad 0\le x \le 0.5,\\&(0.125,0,0.1),&\quad \textrm{if} \quad 0.5< x \le 1.\\ \end{aligned}\right. \end{aligned}$$
(36)
Fig. 7
figure 7

Evaluation of the Sod problem at \(t=0.2\)

Fig. 8
figure 8

The pointwise function residual and equation weight distribution at an epoch during training for the Sod problem

We use a NN with 7 hidden layers, each containing 50 neurons. We then set 5000 function residual points using the Latin hypercube sampling (LHS) method in the \(X \times T\) space. The number of initial points is \(N_{\textrm{IC}} = 100\). The boundary condition is omitted for the reasons explained in Appendix A. There are \(N_{\textrm{RH}} = 100\) RH residual points at time \(t=0.2\). Additionally, we have \(N_{\textrm{CONs}} = 100\) conservation points at each of \(t_1=0\) and \(t_2=0.2\). After training, we construct a test set consisting of 100 uniformly spaced points in the range \(x\in [0,1]\) at the final time \(t=0.2\).
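The LHS residual points can be generated, for instance, with a minimal pure-NumPy Latin hypercube sampler; this is a stand-in sketch (the function names are ours), not necessarily the sampler used by the authors:

```python
import numpy as np

def latin_hypercube(n, d, seed=0):
    """Minimal Latin hypercube sampler: each dimension gets exactly one
    point per stratum [i/n, (i+1)/n)."""
    rng = np.random.default_rng(seed)
    u = (rng.random((n, d)) + np.arange(n)[:, None]) / n  # stratified in [0,1)
    for j in range(d):
        u[:, j] = u[rng.permutation(n), j]                # decouple dimensions
    return u

def sample_residual_points(n, x_bounds=(0.0, 1.0), t_bounds=(0.0, 0.2), seed=0):
    """Scale LHS samples to the X x T computational domain."""
    unit = latin_hypercube(n, 2, seed)
    lo = np.array([x_bounds[0], t_bounds[0]])
    hi = np.array([x_bounds[1], t_bounds[1]])
    return lo + unit * (hi - lo)                          # columns: (x, t)
```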

First, we compare the results obtained with the well-trained PINNs-WE to the traditional high-order WENO-Z method using 100 cells in space. Figure 7 illustrates that PINNs-WE achieves similar or even superior results when compared to the WENO-Z method. Notably, when capturing shock waves, there are no transition points within the shock region because PINNs-WE does not introduce any dissipation into the equations.

In Fig. 8, we present a result from the network that is still undergoing training. We compare the pointwise function residuals, both with and without the weight \(\lambda _1\), and the equation weight \(\lambda _1\) at time \(t=0.2\) between the well-trained network and the network during training. This comparison demonstrates that, during training, as the shock wave forms, the function residual within the discontinuous region dominates the training process. The design of the equation weight \(\lambda _1\) is effective in reducing the residuals near the shock, thereby balancing the training process and achieving high accuracy results with sharp shock waves.

4.2.2 Lax Problem

The Lax problem is another Riemann problem, featuring a strong shock wave and a strong contact discontinuity. The initial conditions are defined as follows:

$$\begin{aligned} (\rho ,u,p) = \left\{ \begin{aligned}&(0.445,0.698,3.528), \quad&\textrm{if} \quad 0\le x \le 0.5,\\&(0.5,0,0.571),&\quad \textrm{if} \quad 0.5< x \le 1.\\ \end{aligned}\right. \end{aligned}$$
(37)

The computational domain is defined as \(t \in [0,1.4]\) and \(x \in [0,1]\) within the \(T \times X\) space. We use the same NN architecture as for the Sod problem. In this case, we randomly select 50,000 interior points from a uniform \(1000 \times 5000\) mesh in the \(X \times T\) space. There are 1000 initial points, and the numbers of residual points, RH points, and conservation points are the same as in the Sod case.

We compare the results with those obtained from the high-order WENO-Z method. Figure 9 shows that PINNs-WE can accurately simulate this problem and yields a sharper shock than WENO-Z. We also show a result from the network at an epoch during training, comparing the pointwise function residuals without the weight \(\lambda _1\) and the equation weight \(\lambda _1\) at time \(t=1.4\) between the well-trained network and the network during training. This illustrates that the design of the equation weight \(\lambda _1\) balances the training process and achieves high-accuracy results with sharp shock waves.

Fig. 9
figure 9

Results for the Lax problem

Fig. 10
figure 10

The pointwise function residual and equation weight distribution at an epoch during training for the Lax problem

4.2.3 Two Shock Waves Problem

As the last of the 1D cases, we consider a problem with left- and right-running strong shock waves. The initial condition is given as

$$\begin{aligned} (\rho ,u,p) = \left\{ \begin{aligned}&(1,0,1), \quad&\textrm{if} \quad -0.1 \le x-0.5 \le 0.1,\\&(1,0,0.01),&\quad \textrm{otherwise}.\\ \end{aligned}\right. \end{aligned}$$
(38)

This problem can be viewed as a normalized and simplified version of the classical 1D blast problem. The shock waves are generated by a pressure in the center 100 times greater than on the left and right sides. The computational time is \( t= 0.036\). Other settings are the same as those in the Sod problem.

In Fig. 11, we give a comparison at time \(t=0.32\). It illustrates that PINNs-WE can accurately capture the left and right shock waves. Compared to WENO-Z with a similar number of grid points, despite a slightly lower accuracy in velocity, the shock resolution of the new method is still higher.

Fig. 11
figure 11

Evaluation of the two shock waves problem at \(t=0.32\)

4.2.4 2D Riemann Problem

Next, we consider a 2D problem with strong discontinuities. The basic settings are retrieved from case 8 in [26]. The initial condition is given by

$$\begin{aligned} (\rho ,u,v,p) = \left\{ \begin{aligned}&(1,-0.75,0.5,1)&\quad \text {if}&\quad 0\le x \le 0.5, 0\le y \le 0.5,\\&(2,0.75,0.5,1)&\quad \text {if}&\quad 0\le x \le 0.5, 0.5< y \le 1,\\&(3,-0.75,-0.5,1)&\quad \text {if}&\quad 0.5< x \le 1, 0\le y \le 0.5,\\&(1,0.75,-0.5,1)&\quad \text {if}&\quad 0.5< x \le 1, 0.5 < y \le 1.\\ \end{aligned} \right. \end{aligned}$$
(39)

The temporal domain is \(t\in [0,0.4]\). We use an NN with 6 hidden layers and 60 neurons per layer. The training points are obtained by Latin hypercube sampling, with 200,000 interior points in the \(T\times X \times Y\) space and 10,000 initial points in the \(X \times Y\) space. The final training loss is 0.009.
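The quadrant-constant initial state of Eq. (39), evaluated at the 10,000 initial points, can be sketched as a simple lookup; we take the fourth quadrant to be \(0.5 < x \le 1\), \(0.5 < y \le 1\), and the function name is ours:

```python
def riemann_ic(x, y):
    """Quadrant-constant initial state (rho, u, v, p) of Eq. (39)."""
    if x <= 0.5 and y <= 0.5:
        return (1.0, -0.75, 0.5, 1.0)   # lower-left quadrant
    if x <= 0.5:
        return (2.0, 0.75, 0.5, 1.0)    # upper-left quadrant
    if y <= 0.5:
        return (3.0, -0.75, -0.5, 1.0)  # lower-right quadrant
    return (1.0, 0.75, -0.5, 1.0)       # upper-right quadrant
```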

Fig. 12
figure 12

Results of 2D Riemann problem (part 1) with \(100\times 100\) test points for PINNs-WE and the same number of mesh grids for WENO-Z

Fig. 13
figure 13

Results of 2D Riemann problem (part 2) with \(100\times 100\) test points for PINNs-WE and the same number of mesh grids for WENO-Z

Fig. 14
figure 14

Results of 2D Riemann problem II (part 1), an unfair comparison, with \(400\times 400\) test points for PINNs-WE and the same number of mesh grids for WENO-Z

Fig. 15
figure 15

Results of 2D Riemann problem II (part 2), an unfair comparison, with \(400\times 400\) test points for PINNs-WE and the same number of mesh grids for WENO-Z

We provide test results for \(100 \times 100\) mesh points, which slightly outnumber the training points (approximately 60) along each dimension. Comparisons against the WENO-Z method on the same \(100 \times 100\) mesh are shown in Figs. 12 and 13. The proposed method captures contact discontinuities more sharply and nearly without transition points, which are unavoidable in the traditional high-order method.

Then, we increase the test points to a \(400 \times 400\) mesh, which is substantially larger than the number of training points along each dimension, and perform an unfair comparison with the WENO-Z method on a \(400 \times 400\) mesh in Figs. 14 and 15. The computation of the detailed structure, particularly in the middle region, is weaker in PINNs-WE because of the limited training data, but the discontinuities are still sharper than those computed by WENO-Z. These results illustrate the advantages of PINNs-WE in high dimensions given its mesh-free nature.

4.2.5 Transonic Flow Around Circular Cylinder

In our final analysis, we delve into a 2D problem involving a bow shock. Specifically, we examine a transonic flow with a Mach number of 0.728 passing around a stationary circular cylinder. A similar problem, along with detailed analysis, can be found in [27]. The initial condition is defined by a uniform flow with the parameters \((\rho , u, v, p) = (2.112, 1.028, 0, 3.011)\). The computational domain spans \(t \in [0, 0.4]\), \(x \in [0, 1.5]\), and \(y \in [0, 2]\) within the \(T \times X \times Y\) space. The cylinder, with a radius of 0.25, is centered at coordinates (1, 1).

For this analysis, we employ a neural network (NN) consisting of 7 hidden layers, each with 90 neurons. The residual points for our computations are obtained through Latin hypercube sampling within the 3D domain of \(T \times X \times Y\), totaling 300,000 points. Additionally, we randomly select 15,000 boundary points along the cylinder, and another 15,000 initial points are obtained using Latin hypercube sampling. After 2000 optimization steps using the L-BFGS algorithm, we achieve a total loss of 0.028.
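The boundary points on the cylinder surface can be drawn, for example, by sampling the angle and time uniformly; a sketch with hypothetical names, which also returns the unit outward normal that a slip-wall condition would need:

```python
import numpy as np

def sample_cylinder_boundary(n, center=(1.0, 1.0), radius=0.25,
                             t_bounds=(0.0, 0.4), seed=0):
    """Random space-time points on the cylinder surface for the wall BC.
    Returns (t, x, y) samples and the unit outward normal at each point."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, n)   # angle around the cylinder
    t = rng.uniform(*t_bounds, n)              # time of each sample
    nx, ny = np.cos(theta), np.sin(theta)      # unit outward normal
    x = center[0] + radius * nx
    y = center[1] + radius * ny
    return np.column_stack([t, x, y]), np.column_stack([nx, ny])
```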

Figures 16, 17, 18 and 19 present a comparative analysis of the results obtained using PINNs-WE and the WENO-Z method. The results from PINNs-WE exhibit sharpness similar to those of WENO-Z, but they display a smoother profile. Notably, our approach effectively captures the vortex located behind the cylinder.

It is important to acknowledge certain limitations stemming from the number of residual points and the use of single-precision arithmetic, imposed by the hardware (an Nvidia 1080 Ti) used in our study. These constraints prevented further reduction of the loss and hindered obtaining more precise results.

Nevertheless, our work underscores the promising capability of PINNs-WE in simulating complex transonic and supersonic flows. Despite these constraints, the method consistently produces competitive and visually appealing results, highlighting its potential in addressing challenging fluid dynamics problems.

Fig. 16
figure 16

Resulting pressure for transonic flow past a circular cylinder using the proposed PINNs-WE (left) and the WENO-Z method (right)

Fig. 17
figure 17

Resulting density for transonic flow past a circular cylinder using the proposed PINNs-WE (left) and the WENO-Z method (right)

Fig. 18
figure 18

Resulting velocity u for transonic flow past a circular cylinder using the proposed PINNs-WE (left) and the WENO-Z method (right)

Fig. 19
figure 19

Resulting velocity v for transonic flow past a circular cylinder using the proposed PINNs-WE (left) and the WENO-Z method (right)

Fig. 20
figure 20

Resulting streamlines for transonic flow past a circular cylinder using the proposed PINNs-WE (left) and the WENO-Z method (right)

5 Conclusions

In this paper, we introduce a Physics-Informed Neural Networks with Equation Weights (PINNs-WE) framework designed to capture strong nonlinear discontinuities, particularly shock waves, when solving hyperbolic equations. Despite the versatility of PINNs for solving inverse problems combining equations with data, our focus in this work is on forward problems. This choice allows us to analyze the fundamental characteristics of PINNs without the influence of prior data.

One of the key contributions of our approach is the recognition of a paradoxical issue within transition points inside shock waves. Regardless of whether gradients are increased or decreased, these points tend to increase the total loss, potentially leading to conflicts during neural network training. To address this, we adopt three novel strategies:

  1. In our framework, we first incorporate a positive physics-dependent weight into the governing equations to adapt the behavior of PINNs in regions with varying physical features. For solving the Euler equations, we construct a weight that is inversely proportional to the local physical compression, measured through the velocity divergence. By introducing these Weighted Equations (WEs) into PINNs, the neural network training primarily focuses on smoother regions, as shock regions receive small weights. Relying on the inherent physical compression learned from these smooth regions, discontinuities naturally emerge as transition points move out into smoother regions, analogous to passive particles.

  2. Recognizing that strong-form PDEs are not suitable for describing strongly discontinuous solutions, we address the underconstrained nature of the problem by incorporating the Rankine–Hugoniot (RH) relation, which is equivalent to the weak form of the conservation laws, as a new constraint near the shock waves.

  3. For nonlinear hyperbolic equations, such as the Euler equations, preserving physical conservation is of utmost importance; it directly impacts the accuracy of shock wave positions. Therefore, we integrate a conservation constraint into our new framework.

Furthermore, we provide a comparison between PINNs-WE and the traditional shock-capturing method, WENO-Z, in this paper. Some of the key findings include:

  1. Due to the nonlinear nature of neural networks, PINNs have the potential to capture discontinuities more sharply than mesh-based methods.

  2. PINNs-WE can capture shock waves without obvious transition points and accurately resolve rarefaction waves, as no additional dissipation is introduced, as demonstrated in the 123 problem presented in Appendix B.

  3. When residual points are sparse, PINNs may outperform traditional methods in terms of accuracy.

  4. However, it is important to note that PINNs involve online training, which can be much more computationally expensive than traditional methods for forward problems.

  5. A significant challenge with PINNs and similar methods is their lack of convergence under grid or sampling-point refinement. This limitation can result in significant errors when dealing with complex flows featuring fine or high-frequency structures.