1 Introduction

Nonlinear problems arise in many fields, such as fluid mechanics, plasma physics and fiber-optic communications [1,2,3,4]. Nonlinear phenomena in nature are abstracted into simple mathematical physics models and expressed as nonlinear partial differential equations (NPDEs). Because of the importance of nonlinear problems, solving NPDEs has been a focus of researchers in recent decades.

With the development of computers and the arrival of the era of big data, deep learning, thanks to its powerful “learning” ability, has had a significant impact in many fields, such as computer vision (autonomous driving, face recognition, VR/AR, medical image analysis, etc.), recommendation systems (helping users find and receive the information that interests them in an information-overloaded environment) and natural language processing (syntactic and semantic analysis, information extraction, machine translation, information retrieval, etc.) [5,6,7,8]. An important reason behind the success of deep learning is the use of neural networks, and the key step in obtaining a predictive solution of NPDEs through deep learning is to train the neural networks to minimize a loss function. Thus, the use of deep neural network methods to solve NPDEs has attracted extensive attention, and a number of models [9,10,11,12] optimized on this basis have emerged, greatly promoting the development of this field. Among these works, the physics-informed neural networks (PINNs) proposed by Raissi et al. [9] are particularly outstanding. This method uses standard back-propagation (BP) neural networks, constructs the equation residual through the automatic differentiation of the BP neural networks, and then obtains the final predicted solution by minimizing the sum of the mean squares of the initial error, the boundary error and the residual of the NPDEs. In addition, compared with traditional mesh-based methods such as the finite difference method and the finite element method, automatic differentiation makes PINNs meshless. PINNs have therefore attracted much attention because of their good generalization ability and prediction performance in solving NPDEs. Bihlo et al. [13] simulated the shallow-water equations on the sphere by PINNs. Through PINNs, Luo et al. [14] simulated data-driven solutions of the Sasa–Satsuma equation and solved the parameter discovery problem. Li et al. [15] solved the forward and inverse problems of the nonlinear Schrödinger equation with the generalized \(\mathcal{P}\mathcal{T}\)-symmetric Scarf-II potential via PINNs. Chen et al. [16] simulated data-driven vector localized waves of the Manakov system by PINNs and solved the parameter discovery problem of the Manakov system.

Despite certain positive outcomes, PINNs have encountered unforeseen challenges in approximating potential solutions. For certain equations the simulation precision is low, and at times the solution cannot be simulated at all. Consequently, optimizing neural network methodologies so that novel models can simulate equation solutions unattainable by conventional PINNs, or produce more precise predictive solutions, has emerged as a critical research focus. Recently, several neural network methods based on PINNs have been proposed, involving the optimization of the network architecture or of the loss function, the mean squared error (MSE). For instance, Jagtap et al. [17] introduced an advanced PINNs method by altering the training area of the PINNs loss function and adding an energy conservation law constraint, employing this method to simulate the inverse problem in supersonic flow. Mattey et al. [18] put forth a novel sequential PINNs model by incorporating additional loss function constraints, and simulated the Allen–Cahn and Cahn–Hilliard equations with it. Rezaei et al. [19] combined PINNs with the finite element method, adding a finite element boundary constraint to the loss function to propose a hybrid PINNs, and examined problems in heterogeneous domains with this composite model. Chen and his team [20,21,22,23] revised the loss function by appending a slope recovery term, proposing an improved PINNs (IPINNs) method. This approach was used to simulate data-driven rogue waves and soliton solutions of classical mathematical physics equations such as the sine-Gordon equation, the nonlinear Schrödinger equation (NLSE), and the derivative NLSE. The same team also employed two identical neural networks, adding the sum of the mean squared deviations between the prediction solutions of the two networks as a constraint to the loss function of the second network; with this two-stage PINNs [24] they simulated the rational waves of a class of nonlinear physical equations. Ling et al. [25] altered the training area, selecting points other than boundary errors, initial errors, and equation residuals according to a fixed algorithm; they added the sum of the mean square errors corresponding to these points to the loss function as a constraint, proposed a pre-fixed multi-stage PINNs, and thereby obtained the data-driven vector soliton solutions of the coupled NLSE. Yan et al. [26,27,28,29] proposed an enhanced PINNs by modifying the loss function and incorporating a first-order derivative boundary loss term, predicting the peakon and periodic solutions of Camassa–Holm-like equations with this improved model. Building upon the PINNs model, Li et al. [30,31,32] revised the loss function, setting the coefficients of the boundary and initial loss terms as variables, thereby proposing a gradient-optimized PINNs; they also altered the training area by merging the boundary area with the initial area to create a new training area, thereby reducing computational error, and proposed a mix-training PINNs. These two models further enhance the accuracy of the simulated solution. Dai et al. [33,34,35] proposed a mixed PINNs method by modifying the training region of the PINNs loss function and adding an energy conservation law constraint, simulating the three-component coupled NLSE, the KdV equation, and the mKdV equation.

Building on these research findings, we consider whether a better-performing neural network model can be obtained by modifying the loss function MSE alone, without altering the structure of the neural networks. Consequently, this study merges the initial sampling region I and the boundary sampling region B of the original PINNs model into a single mix-training sampling region IB, thereby proposing a mix-training physics-informed neural networks (MTPINNs) model. Furthermore, building on the MTPINNs model, we select sample points from the sharp region of the equation solution, i.e., the region around points where the solution values oscillate strongly. These sample points are added as prior information to the mixed training sampling region IB, forming a new sampling region IBP and leading to a prior-information mix-training PINNs (PMTPINNs).

Rogue waves constitute a compelling physical phenomenon, manifesting in various domains such as optics, ocean dynamics, and plasma, among others. Given the instability and unpredictability of rogue waves, studying them and applying the research findings to practical situations hold significant value. This could, for instance, include strategies to mitigate the damages inflicted on ships by rogue waves during maritime voyages, or measures to prevent harm to sea-faring personnel due to these waves. Recently, several scholars have employed neural networks models to simulate rogue waves of certain classical integrable equations, procuring commendable results [18, 20, 23, 27, 29]. However, the current neural networks models are unable to simulate the high-order rogue waves of some intricate NPDEs, such as the high-order rogue waves of the cmKdV equation [22]. Thus, our objective is to propose a neural networks methodology capable of simulating high-order rogue waves of the cmKdV equation. In this paper, we will simulate the second-order and third-order rogue waves of the cmKdV equation utilizing our proposed MTPINNs and PMTPINNs models.

The structure of this paper is as follows: in Sect. 2, we introduce the structure of the neural networks; in Sect. 3, we first review the PINNs model and then introduce MTPINNs and PMTPINNs; in Sect. 4, we use PINNs, MTPINNs and PMTPINNs to simulate the second-order and third-order rogue waves of the cmKdV equation, and carry out data-driven coefficient discovery (the inverse problem) for the cmKdV equation. The numerical results show that PINNs cannot simulate the high-order rogue waves of the cmKdV equation, whereas the proposed MTPINNs and PMTPINNs can, and perform well. Finally, Sect. 5 gives the conclusion and discussion.

Fig. 1
figure 1

Mix-training physics-informed neural networks (MTPINNs)

2 Neural networks

This section provides an overview of neural networks, with a specific focus on back-propagation (BP) neural networks. According to Li et al. [36], the BP learning algorithm is currently the most successful methodology used in neural network training. Typically, a BP neural network comprises an input layer, hidden layers, an output layer, neurons, and an activation function.

The process of solution derivation and error examination within the BP neural network model is as follows. Initially, the weights w and biases b are initialized. The input values, such as x and t, are then set, along with the number of neurons in each hidden layer, and the corresponding weights w and biases b are assigned to the inputs x and t. Executing a sequence of weighted summations on the inputs x and t yields u(x,t), which, after applying the activation function, produces \({\hat{u}}(x,t)\); this is then transmitted to the subsequent layer of the network. It is worth noting that the activation function modifies the linear relationship of the preceding data, providing nonlinearity to each layer of the network. This capability enables the network to learn complex objects and data, representing the nonlinear relationships between the inputs and outputs of any complex function mapping. This sequence continues up to the penultimate layer, after which the activation function is no longer applied and the output layer directly generates the final predictive solution Y(x,t). An optimizer is then selected to optimize the obtained Y(x,t). If the mean squared error (MSE) between the expected and predicted results exceeds \(\varepsilon \), or if the iteration count has not reached the maximum value, the predictive solution Y(x,t) is sent back to the input layer to update the weights w and biases b, and the process is repeated. The MSE continues to decrease until it is less than \(\varepsilon \) or the iteration count reaches the maximum value, thereby concluding the loop and producing the final predictive solution of the equation.
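To make this forward pass concrete, the following is a minimal PyTorch-style sketch (illustrative, not the code used in this paper) of such a BP network: the inputs x and t are concatenated, passed through several tanh hidden layers, and the final linear layer produces the prediction without an activation. The layer widths and the two-component output (real and imaginary parts of the solution) are assumptions matching the setup used later in Sect. 4.

```python
import torch
import torch.nn as nn

class BPNet(nn.Module):
    """Fully connected BP network: (x, t) -> predicted solution."""
    def __init__(self, layers=(2, 80, 80, 80, 80, 80, 80, 2)):
        super().__init__()
        modules = []
        for n_in, n_out in zip(layers[:-2], layers[1:-1]):
            modules += [nn.Linear(n_in, n_out), nn.Tanh()]   # weighted sum + activation
        modules.append(nn.Linear(layers[-2], layers[-1]))    # output layer: no activation
        self.net = nn.Sequential(*modules)

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=1))            # columns: real and imaginary parts
```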

In essence, the BP neural network functions as an “error correction” mechanism. In this study, we have employed two optimization algorithms, namely L-BFGS and Adam, to achieve optimal results. The Adam optimization algorithm, as described in [37], is a variant of the stochastic gradient descent algorithm, while the L-BFGS optimization algorithm, outlined in [38], is a full-batch gradient descent optimization algorithm based on the quasi-Newton method.
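Continuing the PyTorch sketch above, the two optimizers can be combined roughly as follows. This is a hedged sketch rather than our actual training script: the ordering (L-BFGS first, then Adam) and the step counts are the defaults stated in Sect. 4, and `loss_fn` is assumed to return the scalar loss of the current model.

```python
def train(model, loss_fn, lbfgs_steps=40_000, adam_steps=40_000):
    # full-batch quasi-Newton phase
    lbfgs = torch.optim.LBFGS(model.parameters(), max_iter=lbfgs_steps,
                              history_size=50, line_search_fn="strong_wolfe")

    def closure():
        lbfgs.zero_grad()
        loss = loss_fn()
        loss.backward()
        return loss

    lbfgs.step(closure)

    # gradient descent phase with the default Adam learning rate
    adam = torch.optim.Adam(model.parameters())
    for _ in range(adam_steps):
        adam.zero_grad()
        loss = loss_fn()
        loss.backward()
        adam.step()
```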

As shown in Fig. 1, Input and Output represent the input layer and output layer, and Hidden1, Hidden2, ..., HiddenN represent the hidden layers. \({\mathcal {I}}(x)\) is the initial condition and \({\mathcal {B}}(x,t)\) is the boundary condition. MSf is the equation residual; MSIB is the loss function on the mixed region IB, which consists of the initial region I and the boundary region B. \(\varepsilon \) is a controllable parameter, and maxiter is the maximum number of iterations; these two parameters serve as the loop-continuation conditions. Finally, the predicted solution \({\hat{q}} (x,t)\) is obtained at the end of the loop.

3 Model setup

3.1 PINNs

First, we will briefly introduce the PINNs model. The NPDEs solved by PINNs are as follows [9]:

$$\begin{aligned} {\left\{ \begin{array}{ll} {q_t} + {N}[q] = 0,(x,t) \in \Omega \times T,\\ q(x,t) = {\mathcal {I}}(x,t),x \in \Omega , t= t_{0},\\ q(x,t) = {\mathcal {B}}(x,t),(x,t) \in \partial \Omega \times T, \end{array}\right. } \end{aligned}$$
(1)

here T and \(\Omega \) represent the time domain and spatial domain of the equation, respectively, and \(\partial \Omega \) represents the boundary of the spatial domain. N[q] is a combination of nonlinear and linear operators acting on q(x,t), and \({\mathcal {I}}(x,t)\) and \({\mathcal {B}}(x,t)\) are the initial condition and boundary condition of the equation, respectively. In addition, in order to calculate the loss function subsequently, we let \(f = {q_t} + {N}[q]\) denote the equation residual.

We need to approximate the original equation through neural networks, so as to obtain its predicted solution \({\hat{q}} (x,t)\). Using PINNs to solve NPDEs with initial and boundary conditions requires a loss function MSE, which is minimized by gradient descent, so we define the loss function [9] as follows:

$$\begin{aligned} MSE_{PINNs}&= MSf + MSI + MSB\nonumber \\&= \frac{1}{N_{f}}\sum _{i = 1}^{N_{f}} [f(x^i,t^i)]^2\nonumber \\&\quad + \frac{1}{N_{{\mathcal {I}}}}\sum _{i = 1}^{N_{{\mathcal {I}}}} [{\hat{q}} (x_{{\mathcal {I}}}^i,t_{0}) - q(x_{{\mathcal {I}}}^i,t_{0})]^2\nonumber \\&\quad + \frac{1}{N_{{\mathcal {B}}}}\sum _{i = 1}^{N_{{\mathcal {B}}}} [{\hat{q}} (x_{{\mathcal {B}}}^i,t_{{\mathcal {B}}}^i) - q(x_{{\mathcal {B}}}^i,t_{{\mathcal {B}}}^i)]^2, \end{aligned}$$
(2)

where the loss function consists of three parts, namely the equation residual MSf, the initial condition error MSI and the boundary condition error MSB. \({N_{f}}\) is the number of sampling points in the whole domain, \({N_{{\mathcal {I}}}}\) is the number of sampling points in the initial region, and \({N_{{\mathcal {B}}}}\) is the number of sampling points on the boundary.

The above sampling points are sampled randomly in their respective regions. Finally, the two optimizers use the gradient descent method to minimize the loss function MSE until convergence is reached.
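As a hedged sketch of Eq. (2) (again in PyTorch, with names chosen for illustration): `residual` is assumed to return f(x, t) = q_t + N[q] at the collocation points via automatic differentiation (a concrete residual for the cmKdV equation is sketched in Sect. 4), and qI, qB are the known initial and boundary values.

```python
def mse_pinns(model, residual, xf, tf, xI, tI, qI, xB, tB, qB):
    msf = (residual(model, xf, tf) ** 2).mean()     # equation residual term MSf
    msi = ((model(xI, tI) - qI) ** 2).mean()        # initial condition error MSI
    msb = ((model(xB, tB) - qB) ** 2).mean()        # boundary condition error MSB
    return msf + msi + msb
```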

3.2 MTPINNs

Although PINNs have achieved success in many fields, there are still many difficulties in solving some complex NPDEs. The loss function of the model is calculated by sampling from the whole domain for MSf, from the initial region for MSI, and from the boundary region for MSB, and the mean squares of the three parts are added to form the final loss function MSE. Therefore, when simulating some complex NPDEs, the accuracy may be too low or the solution of the equation may not be simulated at all.

In this part, we combine the boundary region B and the initial region I into one region IB; that is, we sample training points directly from this new region. In theory, this allows the loss function MSE to be reduced further, and the error term corresponding to the boundary and initial conditions changes from \(MSB + MSI\) to MSIB. The new loss function [32] is:

$$\begin{aligned} MSE_{MTPINNs}&= MSf + MSIB\nonumber \\&= \frac{1}{N_{f}}\sum _{i = 1}^{N_{f}} [f(x^i,t^i)]^2\nonumber \\&\quad + \frac{1}{N_{{\mathcal {I}}B}}\sum _{i = 1}^{N_{{\mathcal {I}}B}} [{\hat{q}} (x_{{\mathcal {I}}B}^i,t_{{\mathcal {I}}B}^i) - q(x_{{\mathcal {I}}B}^i,t_{{\mathcal {I}}B}^i)]^2, \end{aligned}$$
(3)

where the loss function is composed of two parts, namely the equation residual MSf and the error MSIB on the mixed region IB formed by the initial and boundary conditions. \({\hat{q}} (x_{{\mathcal {I}}B}^i,t_{{\mathcal {I}}B}^i)\) is the predicted solution and \(q(x_{{\mathcal {I}}B}^i,t_{{\mathcal {I}}B}^i)\) is the true solution of the equation; \({N_f}\) is the number of sampling points in the whole domain, and \({N_{{\mathcal {I}}B}}\) is the number of sampling points in the mixed region. In order to compare the methods fairly, we require \({N_{{\mathcal {I}}B}}\) and the \({N_{{\mathcal {I}}}}\), \({N_{{\mathcal {B}}}}\) of PINNs to satisfy the following condition:

$$\begin{aligned} {N_{{\mathcal {I}}B}} = {N_{{\mathcal {I}}}} + {N_{{\mathcal {B}}}} \end{aligned}$$
(4)

The sampling methods are the same, but the sampling areas are different.
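A minimal sketch of how the mixed region IB can be sampled, under the assumption that the exact solution is available on the initial line and on the boundaries through a callable `q_exact`: each of the N_IB points is drawn at random either from the initial line t = t0 or from one of the two spatial boundaries, and a single mean-squared-error term MSIB is then computed over them exactly as MSI and MSB were in Eq. (2).

```python
def sample_ib(n_ib, x_min, x_max, t0, t_max, q_exact):
    # on average, half of the points fall on the initial line t = t0,
    # the other half on the spatial boundaries x = x_min or x = x_max
    on_initial = torch.rand(n_ib, 1) < 0.5
    x = torch.where(on_initial,
                    x_min + (x_max - x_min) * torch.rand(n_ib, 1),
                    torch.where(torch.rand(n_ib, 1) < 0.5,
                                torch.full((n_ib, 1), float(x_min)),
                                torch.full((n_ib, 1), float(x_max))))
    t = torch.where(on_initial,
                    torch.full((n_ib, 1), float(t0)),
                    t0 + (t_max - t0) * torch.rand(n_ib, 1))
    return x, t, q_exact(x, t)
```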

3.3 PMTPINNs

When reproducing the existing research results, we found that the random sampling of the training region sometimes fails to select many points with large errors as training points. Therefore, we consider adding constraints to the selection of training points, so that some points with large errors can be selected for training. To this end, we list the sampling points of the sharp region of the equation solution in advance and add them as prior information to the previous IB training region, generating a new region IBP. Subsequent sampling is then carried out in the new region, further optimizing the method. This idea is borrowed from the adaptive grid used in the finite difference method, which reduces the number of grid nodes in regions with small errors and densifies the grid in regions with large errors. The new loss function [32] is defined as:

$$\begin{aligned} MSE_{PMTPINNs}&= MSf + MSIBP\nonumber \\&= \frac{1}{N_{f}}\sum _{i = 1}^{N_{f}} [f(x^i,t^i)]^2\nonumber \\&\quad + \frac{1}{N_{{\mathcal {I}}BP}}\sum _{i = 1}^{N_{{\mathcal {I}}BP}} [{\hat{q}} (x_{{\mathcal {I}}BP}^i,t_{{\mathcal {I}}BP}^i) - q(x_{{\mathcal {I}}BP}^i,t_{{\mathcal {I}}BP}^i)]^2, \end{aligned}$$
(5)

where the loss function MSE is composed of two parts, namely the equation residual MSf and the loss term MSIBP. \({\hat{q}} (x_{{\mathcal {I}}BP}^i,t_{{\mathcal {I}}BP}^i)\) is the predicted solution and \(q(x_{{\mathcal {I}}BP}^i,t_{{\mathcal {I}}BP}^i)\) is the true solution of the equation; \({N_f}\) is the number of sampling points in the whole domain, and \({N_{{\mathcal {I}}BP}}\) is the number of sampling points in the mixed region. Similarly, we set \({N_{{\mathcal {I}}BP}} = {N_{{\mathcal {I}}B}}\). In this way, a prior-information mix-training model, PMTPINNs, is generated by adding the sampling points of the sharp region of the equation solution to the training domain in advance.
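A hedged sketch of forming the region IBP of Eq. (5). The selection rule below (taking the grid points with the largest |q| as the “sharp region”) and the number of prior points are illustrative assumptions; the prior points replace an equal number of random IB points so that N_IBP = N_IB.

```python
def add_prior_points(x_ib, t_ib, q_ib, x_grid, t_grid, q_exact, n_prior=20):
    q_grid = q_exact(x_grid, t_grid)
    # hypothetical sharp-region criterion: the n_prior points of largest amplitude
    idx = torch.topk(q_grid.abs().flatten(), n_prior).indices
    # swap n_prior random IB points for the prior points, keeping N_IBP = N_IB
    x_ibp = torch.cat([x_ib[:-n_prior], x_grid[idx]])
    t_ibp = torch.cat([t_ib[:-n_prior], t_grid[idx]])
    q_ibp = torch.cat([q_ib[:-n_prior], q_grid[idx]])
    return x_ibp, t_ibp, q_ibp
```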

4 Numerical experiment

In this section, we test the performance of PINNs, MTPINNs, and PMTPINNs on high-order rogue waves of the cmKdV equation [39]. In all cases, without loss of generality, we use a deep neural network with 6 hidden layers and 80 neurons in each hidden layer, and we take the hyperbolic tangent function (\(\tanh \)) as the nonlinear activation function of the model. By default, the L-BFGS algorithm [38] is used to train the network for 40,000 steps, and then the Adam optimizer [37] with the default learning rate is used to optimize the network for another 40,000 steps. The weights w and biases b in all models are initialized using the Xavier method [9], without any additional regularization techniques. It is worth noting that we use \({N_{f}} = 10,000\), \({N_{{\mathcal {I}}}} = 50\) and \({N_{{\mathcal {B}}}} = 50\) sample points for the equation residual, initial condition and boundary condition, respectively. For consistency and convenience in the subsequent comparison of method performance, we use \({N_{{{\mathcal {I}}}{{\mathcal {B}}}}} = 100\) sample points on the mixed training dataset built from the initial and Dirichlet boundary conditions and \({N_{{{\mathcal {I}}}{{\mathcal {B}}}{{\mathcal {P}}}}} = 100\) sample points on the mixed dataset augmented with points from the sharp regions.
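This configuration can be reproduced, under the stated assumptions, roughly as follows (reusing the BPNet sketch from Sect. 2; the Xavier variant and the uniform random sampling of the collocation points are our assumptions, and the domain is the one used in Sect. 4.1.1).

```python
model = BPNet(layers=(2, 80, 80, 80, 80, 80, 80, 2))   # 6 hidden layers, 80 neurons each

def xavier_init(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_normal_(m.weight)
        nn.init.zeros_(m.bias)

model.apply(xavier_init)

# N_f = 10,000 collocation points sampled uniformly at random on Omega x T
x_min, x_max, t_min, t_max = -8.0, 8.0, -0.6, 0.8
N_f = 10_000
xf = (x_min + (x_max - x_min) * torch.rand(N_f, 1)).requires_grad_(True)
tf = (t_min + (t_max - t_min) * torch.rand(N_f, 1)).requires_grad_(True)
```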

Consider cmKdV equation with Dirichlet boundary conditions [39]:

$$\begin{aligned} {\left\{ \begin{array}{ll} {q_t} + 6{\left| q \right| ^2}{q_x} + {q_{xxx}} = 0,(x,t) \in \Omega \times T,\\ q(x,t) = {\mathcal {I}}(x,t),x \in \Omega , t=t_{0},\\ q(x,t) = {\mathcal {B}}(x,t),(x,t) \in \partial \Omega \times T, \end{array}\right. } \end{aligned}$$
(6)

where q(x,t) is the unknown function of the space variable x and the time variable t.
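A hedged sketch of the residual f = q_t + 6|q|^2 q_x + q_xxx entering MSf for this equation, with q = u + iv represented by the two network outputs and all derivatives obtained by automatic differentiation; the helper `derivatives` and the column layout of the model output are our assumptions.

```python
from torch.autograd import grad

def derivatives(w, z, order=1):
    # repeated automatic differentiation of w with respect to z
    for _ in range(order):
        w = grad(w, z, grad_outputs=torch.ones_like(w), create_graph=True)[0]
    return w

def cmkdv_residual(model, x, t):
    out = model(x, t)
    u, v = out[:, 0:1], out[:, 1:2]          # real and imaginary parts of q
    mod2 = u ** 2 + v ** 2                   # |q|^2
    f_real = derivatives(u, t) + 6 * mod2 * derivatives(u, x) + derivatives(u, x, 3)
    f_imag = derivatives(v, t) + 6 * mod2 * derivatives(v, x) + derivatives(v, x, 3)
    return f_real, f_imag
```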

4.1 Second-order rogue waves

In this section, we take the second-order rogue waves of cmKdV equation as an example, and implement two sets of numerical experiments according to the different parameters of the rogue waves. The form of the second-order rogue waves of the cmKdV equation [39] is as follows:

$$\begin{aligned} {q^{[2]}} = c{e^{ia[x + t({a^2} - 6{c^2})]}}\frac{B(x,t)}{C(x,t)}, \end{aligned}$$
(7)

where B(x,t) and C(x,t) are polynomials containing the terms \(a,c,x,t,{s_0},{s_1}\); their specific forms are given in Appendix A.

4.1.1 \(a = 1.44,c = 1,{s_0} = 0,{s_1} = 100\)

First, when \(a = 1.44,c = 1,{s_0} = 0,{s_1} = 100\), \(T \in [ - 0.6,0.8],X \in [ - 8,8]\), the corresponding cmKdV equation with Dirichlet boundary conditions [39] is as follows:

$$\begin{aligned} {\left\{ \begin{array}{ll} {q_t} + 6{\left| q \right| ^2}{q_x} + {q_{xxx}} = 0,(x,t) \in \Omega \times T, \\ q(x, - 0.6) = c{e^{ia[x - 0.6({a^2} - 6{c^2})]}}\frac{B(x,-0.6)}{C(x,-0.6)},x \in [-8,8],\\ q( - 8,t) = c{e^{ia[ - 8 + t({a^2} - 6{c^2})]}}\frac{B(-8,t)}{C(-8,t)}, t \in [-0.6,0.8],\\ q(8,t) = c{e^{ia[8 + t({a^2} - 6{c^2})]}}\frac{B(8,t)}{C(8,t)}, t \in [-0.6,0.8]. \\ \end{array}\right. } \end{aligned}$$
(8)

First, we use the PINNs, MTPINNs and PMTPINNs models, combined with the training conditions and optimization methods described in Sect. 3, to conduct numerical experiments. We find that the PINNs model cannot simulate the second-order rogue waves, so PINNs are not suitable for this problem. We then conduct numerical experiments on MTPINNs and find that MTPINNs represent a qualitative leap in simulation ability compared with PINNs. Table 1 provides a more intuitive and detailed assessment of the simulated performance of MTPINNs.

Fig. 2
figure 2

a Exact solution of equation (8), b predicted solution of equation (8), c absolute error of the second-order rogue waves (8) simulated by MTPINNs (relative error is 7.532e−03)

Table 1 The relative errors of the two models with respect to the first kind of second-order rogue waves

Table 1 reports the prediction performance of MTPINNs for the second-order rogue waves under all the neural network configurations considered. From Table 1, we find that MTPINNs are superior to PINNs when the number of hidden layers and the number of neurons per layer are the same. MTPINNs can not only complete the task that PINNs fail to complete, but also do so with high accuracy. In order to observe the performance of MTPINNs more clearly, we give the exact solution, the predicted solution and the absolute error of MTPINNs in Fig. 2, where the black marks are the training points we selected. In addition, in order to observe the iterative optimization process of the chosen optimizers, we also plot the loss function in Fig. 3. As can be seen from Fig. 3, the value of the loss function gradually decreases as the number of iterations increases, and we expect that with more iterations the error of the model would become smaller and the accuracy higher.

Fig. 3
figure 3

Training error of the second-order rogue waves simulated by MTPINNs, where \({\mathcal {L}}_r\) and \({\mathcal {L}}_u\) represent, respectively, the mean value of the equation residual MSf and the sum of the mean initial error and the mean boundary error

Fig. 4
figure 4

New sampling area of the second-order rogue waves by PMTPINNs

Table 2 Compared with MTPINNs, the relative errors and optimization rate of PMTPINNs

In addition, in order to test PMTPINNs, we choose a neural network with 6 hidden layers and 80 neurons in each layer, and add some training points in the sharp region of the rogue waves (Fig. 4 shows the new sampling region). From Table 2, we find that PMTPINNs are an order of magnitude more accurate than MTPINNs, which is very meaningful. Based on these results, we introduce the model optimization rate, defined as \(1-\frac{PMTPINNs ~{\mathbb {L}}_{2~errors}}{MTPINNs~{\mathbb {L}}_{2~errors}}\). We find that the model optimization rate of PMTPINNs is also quite good.
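For reference, the error measures used in Tables 1–5 can be written as follows (a small sketch: the relative \({\mathbb {L}}_2\) error is the standard definition, and the optimization rate is the quantity just introduced).

```python
import numpy as np

def relative_l2(q_pred, q_exact):
    # relative L2 error between predicted and exact solutions on the test grid
    diff = (q_pred - q_exact).ravel()
    return np.linalg.norm(diff, 2) / np.linalg.norm(q_exact.ravel(), 2)

def optimization_rate(err_pmtpinns, err_mtpinns):
    # 1 - (PMTPINNs L2 error) / (MTPINNs L2 error)
    return 1.0 - err_pmtpinns / err_mtpinns
```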

4.1.2 \(a = 1.44,c = 1,{s_0} = 0,{s_1} = 0\)

Second, when \(a = 1.44,c = 1,{s_0} = 0,{s_1} = 0\), \(T \in [ - 0.5,0.5],X \in [ - 5,5]\), the corresponding cmKdV equation with Dirichlet boundary conditions [39] is as follows, and its three-dimensional plot is shown in Fig. 5:

$$\begin{aligned} {\left\{ \begin{array}{ll} {q_t} + 6{\left| q \right| ^2}{q_x} + {q_{xxx}} = 0,(x,t) \in \Omega \times T,\\ q(x, - 0.5) = c{e^{ia[x - 0.5({a^2} - 6{c^2})]}}\frac{B(x,-0.5)}{C(x,-0.5)},x \in [-5,5],\\ q( - 5,t) = c{e^{ia[ - 5 + t({a^2} - 6{c^2})]}}\frac{B(-5,t)}{C(-5,t)}, t \in [-0.5,0.5],\\ q(5,t) = c{e^{ia[5 + t({a^2} - 6{c^2})]}}\frac{B(5,t)}{C(5,t)}, t \in [-0.5,0.5]. \end{array}\right. } \end{aligned}$$
(9)
Fig. 5
figure 5

Three-dimensional diagram of Eq. (9)

We carried out the same experiments as above and again found that PINNs could not simulate the second-order rogue waves, while the accuracy of MTPINNs remained excellent. The experimental results for the prediction performance of MTPINNs on this second-order rogue wave under all neural network configurations are shown in Table 3.

4.2 Third-order rogue waves

In this section, we will simulate the third-order rogue waves of the cmKdV equation. The form of the third-order rogue waves of the cmKdV equation [39] is as follows:

$$\begin{aligned} {q^{[3]}} = \frac{{{L_1(x,t)}}}{{{L_2(x,t)}}}{e^{i[\frac{3}{2}x - \frac{{45}}{8}t]}}, \end{aligned}$$
(10)

where \({L_1(x,t)}\) and \({L_2(x,t)}\) are polynomials containing the terms \(a,c,x,t,{s_0},{s_1}\); their specific forms are given in Appendix B.

Due to the limited data, we only implement one set of numerical experiments on the third-order rogue waves. When \(a = 1.5,c = 1,{s_0} = 0,{s_1} = 0,{s_2} = 0\), \(T \in [ - 1.0,1.5],X \in [ - 10,10]\), the corresponding cmKdV equation with Dirichlet boundary conditions is as follows, and its three-dimensional plot is shown in Fig. 6:

$$\begin{aligned} {\left\{ \begin{array}{ll} {q_t} + 6{\left| q \right| ^2}{q_x} + {q_{xxx}} = 0,(x,t) \in \Omega \times T,\\ q(x, - 1.0) = \frac{{L_1(x,-1.0)}}{{L_2(x,-1.0)}}{e^{i[\frac{3}{2}x + \frac{45}{8}]}},x \in [-10,10],\\ q( - 10,t) = \frac{{L_1(-10,t)}}{{L_2(-10,t)}}{e^{i[ - 15 - \frac{45}{8}t]}}, t \in [-1,1.5],\\ q(10,t) = \frac{{L_1(10,t)}}{{L_2(10,t)}}{e^{i[15 - \frac{45}{8}t]}}, t \in [-1,1.5]. \end{array}\right. } \end{aligned}$$
(11)
Table 3 The relative errors of the two models with respect to the second kind of second-order rogue waves
Fig. 6
figure 6

Three-dimensional diagram of Eq. (11)

We still use the three models of Sect. 3, combined with the training conditions and optimization methods described above, to implement the numerical experiments. The relative error of PINNs is still very large, and the third-order rogue waves of the equation cannot be simulated. We then proceeded to numerical experiments on MTPINNs and found that, compared with PINNs, MTPINNs still achieve good accuracy. Table 4 provides a more intuitive and detailed evaluation of the prediction performance of MTPINNs.

From Table 4, when the number of hidden layers and the number of neurons per layer are the same, we can again confirm that MTPINNs are superior to PINNs. MTPINNs not only complete the task that PINNs fail to complete, but also do so with high accuracy. Table 4 also gives the experimental results for the prediction performance of MTPINNs on the third-order rogue waves under all neural network configurations. In order to observe the performance of the model more clearly, we present the exact solution, the predicted solution and the absolute error of the equation in Fig. 7.

In addition, in order to test the PMTPINNs model, we again select the neural network with 6 hidden layers and 80 neurons in each layer, and add some training points (Fig. 8) near the sharp region of the rogue waves to the training area. Through numerical experiments, we find that PMTPINNs achieve higher accuracy than MTPINNs. Table 5 shows that the optimization rate relative to MTPINNs is still good, which is very meaningful.

Table 4 Relative errors of the two models

In conclusion, MTPINNs are better than PINNs in the simulation of the second-order and third-order rogue waves of the cmKdV equation, and PMTPINNs further improve the accuracy on the basis of MTPINNs.

4.3 The inverse problem of cmKdV Equation

In order to discover the unknown parameters \(\left[ {{\lambda _1},{\lambda _2}} \right] \) of the cmKdV equation, we use the data-driven discovery method within the MTPINNs deep learning framework. The specific form of the equation [39] is as follows:

$$\begin{aligned} {q_t} + {\lambda _1}{\left| q \right| ^2}{q_x} + {\lambda _2}{q_{xxx}} = 0,\quad (x,t) \in \Omega \times T. \end{aligned}$$
(12)

The potential solution is \({\hat{q}} (x,t) = {\hat{u}} (x,t) + \mathrm{{i}}{\hat{v}} (x,t)\), where \({\hat{u}} (x,t)\) and \({\hat{v}} (x,t)\) are the real part and imaginary part, respectively. We then write \(F(x,t) = \mathrm{{i}}{F_u}(x,t) + {F_v}(x,t)\) and approximate the potential solution by driving F(x,t) toward zero. Here, F(x,t), \({F_u}(x,t)\) and \({F_v}(x,t)\) satisfy

$$\begin{aligned} F(x,t)&= {q_t} + {\lambda _1}{\left| q \right| ^2}{q_x} + {\lambda _2}{q_{xxx}},\nonumber \\ {F_u}(x,t)&= {v_t} + {\lambda _1}{u^2}{v_x} + {\lambda _1}{v^2}{v_x} + {\lambda _2}{v_{xxx}},\nonumber \\ {F_v}(x,t)&= {u_t} + {\lambda _1}{u^2}{u_x} + {\lambda _1}{v^2}{u_x} + {\lambda _2}{u_{xxx}}. \end{aligned}$$
(13)

The unknown parameters \(\left[ {{\lambda _1},{\lambda _2}} \right] \) are trained using the input data sets [u(x,t), v(x,t)] obtained from the initial values \({{\mathcal {I}}}(x,t) = {u_0}(x,t_{0}) + \mathrm{{i}}{v_0}(x,t_{0})\) and boundary conditions \({{\mathcal {B}}}(x,t) = {u_{{\mathcal {B}}}}(x,t) + \mathrm{{i}}{v_{{\mathcal {B}}}}(x,t)\), and the unknown solution \({\hat{q}} (x,t) = {\hat{u}} (x,t) + \mathrm{{i}}{\hat{v}} (x,t)\) is approximated by minimizing the loss function MSE:

$$\begin{aligned} MSE&= \frac{1}{N_R}\sum _{i = 1}^{N_R} \Big ({\left| {f_u}(x_R^i,t_R^i) \right| ^2} + {\left| {f_v}(x_R^i,t_R^i) \right| ^2}\Big )\nonumber \\&\quad + \frac{1}{N_S}\sum _{j = 1}^{N_S}\Big ({\left| {\hat{u}} (x_S^j,t_S^j) - u(x_S^j,t_S^j) \right| ^2}\nonumber \\&\quad + {\left| {\hat{v}} (x_S^j,t_S^j) - v(x_S^j,t_S^j) \right| ^2}\Big ), \end{aligned}$$
(14)

here \({N_{R}}\) is the number of collocation points for the equation residual (playing the role of \({N_{f}}\) above), and \({N_S} = {N_{{\mathcal {I}}}} + {N_{{\mathcal {B}}}} = {N_{{{\mathcal {I}}}{{\mathcal {B}}}}} = 100\). For simplicity, we directly use the neural network parameters of Sect. 2 for training. It is worth mentioning that, in addition to the clean data, we also add 2% and 5% noise to the data when simulating the second-order and third-order rogue waves of the cmKdV equation; the final numerical results are shown in Tables 6 and 7. As can be seen from these results, MTPINNs not only have a very good ability to solve the inverse problem with clean data, but also retain this ability when noisy data are added, which indicates that our model is robust and again confirms its great advantages.
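A hedged sketch of how this inverse problem can be set up: \({\lambda _1}\) and \({\lambda _2}\) are registered as trainable parameters and updated together with the network weights by minimizing the loss of Eq. (14). The `derivatives` helper and `model` are assumed to be those of the sketches above; the tensor names (xr, tr for the residual points and xs, ts, us, vs for the N_S measured points) are illustrative.

```python
lam1 = nn.Parameter(torch.tensor(0.0))   # trainable lambda_1
lam2 = nn.Parameter(torch.tensor(0.0))   # trainable lambda_2

def inverse_residual(model, x, t):
    out = model(x, t)
    u, v = out[:, 0:1], out[:, 1:2]
    mod2 = u ** 2 + v ** 2
    f_v = derivatives(u, t) + lam1 * mod2 * derivatives(u, x) + lam2 * derivatives(u, x, 3)
    f_u = derivatives(v, t) + lam1 * mod2 * derivatives(v, x) + lam2 * derivatives(v, x, 3)
    return f_u, f_v

def mse_inverse(model, xr, tr, xs, ts, us, vs):
    f_u, f_v = inverse_residual(model, xr, tr)
    out = model(xs, ts)
    return ((f_u ** 2).mean() + (f_v ** 2).mean()
            + ((out[:, 0:1] - us) ** 2).mean() + ((out[:, 1:2] - vs) ** 2).mean())

# both the network weights and the two coefficients are optimized
optimizer = torch.optim.Adam(list(model.parameters()) + [lam1, lam2])
```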

Fig. 7
figure 7

a Exact solution of equation (11), b predicted solution of equation (11), c absolute error of the third-order rogue waves (11) simulated by MTPINNs (relative error is 9.802e−04)

Fig. 8
figure 8

New sampling area of third-order rogue waves by PMTPINNs

Table 5 Compared with MTPINNs, the relative error and optimization rate of PMTPINNs
Table 6 Second-order rogue waves: the correct cmKdV equation and the identified one obtained by learning \({\lambda _1}\) and \({\lambda _2}\) and their relative errors
Table 7 Third-order rogue waves: the correct cmKdV equation and the identified one obtained by learning \({\lambda _1}\) and \({\lambda _2}\), and their relative errors

5 Conclusion

In this work, based on PINNs, we combine initial condition region \({\mathcal {I}}\) and boundary condition region \({\mathcal {B}}\) into a new training region \(\mathcal{I}\mathcal{B}\) and propose MTPINNs. Then, we use an adaptive search algorithm to find sample points in the sharp region of rogue waves in the cmKdV equation. These new sample points are added into the training area \(\mathcal{I}\mathcal{B}\) as priors to form a new training area \(\mathcal {IBP}\), and PMTPINNs is proposed.

Taking the second-order and third-order rogue waves of the cmKdV equation as examples, the numerical experiments show that the original PINNs cannot simulate the high-order rogue waves of the cmKdV equation at all. Our proposed models not only accomplish the task that PINNs cannot, but also significantly improve the simulation capability, with the prediction accuracy improving by three to four orders of magnitude. We also verify the inverse problem of the cmKdV equation with the proposed models; the numerical results in Tables 6 and 7 show that the proposed models perform well. In addition, we found that, after adding noisy data, our proposed models still have good simulation ability, which also shows that these models are robust.