1 Introduction

In recent years, nonlocal equations have attracted the attention of many scholars owing to their wide range of applications [1,2,3]. The most classical nonlocal equation is the nonlocal nonlinear Schrödinger (NNLS) equation proposed by Ablowitz and Musslimani in 2013 [2], which introduced parity-time (PT) symmetry, now central to nonlocal equations, into the AKNS system for the first time. Since then, many scholars have studied this equation at various levels [4,5,6,7,8], and many other nonlocal equations have been proposed and studied, such as the nonlocal Davey-Stewartson equation [9], the nonlocal derivative nonlinear Schrödinger equation [10], the nonlocal Hirota equation [11, 12], the nonlocal KdV equation [13] and so on. Recently, a nonlocal modified KdV (mKdV) equation was proposed,

$$\begin{aligned} u_{xxx}+u_{t}+6uu(-x,-t)u_{x}=0, \end{aligned}$$
(1)

which is also called the reverse-space-time mKdV equation. In physical applications, the nonlocal mKdV equation possesses shifted parity and delayed time reversal symmetry, which is related to Alice-Bob systems [14]. In fact, the nonlocal mKdV equation has been widely discussed. For example, the inverse scattering transform of the nonlocal mKdV equation was given in Refs. [15, 16], its soliton solutions were obtained through the Darboux transformation [17], the Dbar dressing method for the equation was presented in Ref. [18], and the long-time asymptotic behavior with decaying initial data was studied by the Deift-Zhou steepest descent method [19].

Conservation laws are ubiquitous in applied mathematics [20]. A conservation law expresses the fact that some physical quantity does not change with time. In soliton theory, conservation laws play an important role in discussing the integrability of soliton equations: the existence of infinitely many conservation laws is closely tied to integrability, and most nonlinear evolution equations possessing soliton solutions admit infinitely many conservation laws. Therefore, for a soliton system, finding its infinite conservation laws has important practical and theoretical significance for establishing the integrability of the system. Since Miura, Gardner and Kruskal discovered that the KdV equation has infinitely many conservation laws [21], a series of methods have been developed for (1+1)-dimensional integrable systems, some of which are no longer in use because of their limitations. For instance, an infinite number of conserved quantities can be obtained through the scattering problem and the asymptotic expansion of the scattering quantity \(a(\lambda )\) [22], but this approach does not produce the conservation laws themselves and is therefore of little use in applications. In the study of infinite conservation laws, Wadati et al. have made considerable contributions. Generally speaking, the infinite conservation laws of a continuous system can be obtained in several ways: from the Lax pair, the Bäcklund transformation, the formal solution of the eigenfunction, the trace identity, etc. [23,24,25,26]. Although there are many ways to obtain infinite conservation laws, the final result is the same; this variety of methods and the consistency of their results can be seen as an external manifestation of integrability.

In fact, there are many notions for investigating the integrability of a partial differential equation, such as Liouville integrability, the Painlevé test, inverse scattering integrability, Lax integrability, the Hirota bilinear method, solvable potentials and quantum inverse scattering integrability. Among them, the Painlevé test is a method proposed by Weiss, Tabor and Carnevale [27]. Many nonlinear partial differential equations can be verified by this method, and its scope of application is very wide. A large number of subsequent works have confirmed the effectiveness of the method [28, 29]. In particular, Wazwaz has extended it to higher-dimensional equations with good results [30, 31]. A further and more important goal is to solve nonlinear partial differential equations exactly. Such solutions have great practical significance in nonlinear physical systems such as nonlinear optics, biophysics, plasmas, cold atoms and Bose-Einstein condensates, and they are usually called nonlinear waves. According to their physical properties and dynamical characteristics, solitary waves, lumps, breathers and rogue waves are four common kinds of nonlinear waves. Solitary waves have stability and particle-like properties, and all integrable equations have soliton solutions that reflect general nonlinear phenomena in nature. The most classical soliton solutions, including bright solitons, dark solitons, kink and anti-kink solutions, have been widely studied in terms of their dynamic behavior and numerical simulation [32,33,34]. Compared with the solitary wave solution, the lump solution is a special type of rational function solution, localized in all spatial directions [35]. Breathers and rogue waves are two typical nonlinear structures with obvious instability, localized on a plane wave background [36, 37]. In real life, nonlinear waves frequently interact with each other, and it is of great significance to study the mechanism of their occurrence and the physical parameters governing the interaction, during which many locally excited states appear. Therefore, it is very important to discuss the inelastic interaction between solitary waves and other nonlinear waves of nonlinear partial differential equations in a physical setting, which can provide a theoretical tool for studying the relevant dynamic behavior.

Machine learning is currently the mainstream approach to many AI problems and, as an independent direction, is developing rapidly. As a form of machine learning, deep learning trains models with multiple hidden layers between input and output; in general, deep learning means using deep neural networks. Deep neural networks offer fast computation and high accuracy and have been widely used in natural language processing [42], face recognition [38], speech recognition [41] and other fields [39, 40]. Recently, the physics-informed neural network (PINN) was proposed for mathematical physics systems on the basis of deep multilayer networks, and it has proved suitable for dealing with forward problems and highly ill-posed inverse problems: the approximate solution of the governing equation and its parameters are found from the training data [43]. Numerical results demonstrate that high-dimensional network tasks can be completed successfully using the PINN approach with relatively small data sets. Training such a neural network is a supervised learning task for solving nonlinear partial differential equations that obey physical laws. The PINN method is thus used to generate data-driven solutions revealing the dynamic behavior of nonlinear partial differential equations under physical constraints, which has attracted wide attention from many scholars. Chen's team has constructed many data-driven solutions of nonlinear partial differential equations using the PINN approach during the last two years, including soliton solutions [44, 45], breather solutions [46], rational solutions [47], rogue waves [46, 48], higher-order breather waves [49] and rogue periodic waves [50]. In particular, Lin and Chen added properties of integrable systems, such as conserved quantities and the Miura transformation, to the training network, proposed a two-stage PINN method and found new local wave solutions [51, 52]. In addition, other scholars have also obtained important results on data-driven solutions of nonlinear partial differential equations using the PINN method [53,54,55,56,57]. In particular, Dai's team extended PINN to non-integrable equations and used it to predict multipolar soliton solutions of the saturated nonlinear fractional Schrödinger equation [58]. As far as we know, the application of PINN to nonlocal equations has rarely been studied.

In this paper, we derive the Riccati equation from the x-part of the Lax pair of the nonlocal mKdV equation and construct the conservation law from the compatibility condition. The infinite conservation laws and infinite conserved quantities are then obtained from the solution of the Riccati equation. In addition, we add the nonlocal term to the classical PINN to simulate data-driven solutions of the nonlocal mKdV equation under zero and nonzero boundary conditions and give the error analysis. At the same time, we use the PINN with the nonlocal term to discover the parameter of the nonlocal mKdV equation.

The structure of this paper is as follows. In Sect. 2, we study the integrability of the nonlocal mKdV equation and obtain its infinite conservation laws by using the Riccati equation. In Sect. 3, the composition of the PINN is introduced. In Sect. 4, we use the PINN to study the data-driven solutions under zero boundary conditions and give their dynamic behavior. In Sect. 5, we learn the data-driven solutions of the nonlocal mKdV equation with nonzero boundary conditions and their dynamic behavior. In Sect. 6, the inverse problem is learned based on the PINN, mainly the nonlinear coefficient of the nonlocal mKdV equation, while different noises are added to the network. The conclusion is given in Sect. 7.

2 Conservation laws for the nonlocal mKdV equation

The nonlocal mKdV Eq. (1) has the following Lax pair

$$\begin{aligned} \varPhi _x&=M \varPhi , \quad M=M(x, t; \lambda ):=i \lambda \sigma _3+U, \\ \varPhi _t&=N \varPhi , \quad N=N(x, t; \lambda ):=\left[ 4 \lambda ^2-2 u(x, t) u(-x,-t)\right] M-2 i \lambda \sigma _3 U_x+\left[ U_x, U\right] -U_{x x}, \end{aligned}$$
(2)

where \(\varPhi \) is the matrix eigenfunction, \(\lambda \) is the spectral parameter and

$$\begin{aligned} \sigma _3=\left[ \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array}\right] ,~~U(x, t)=\left[ \begin{array}{cc} 0 & u(x, t) \\ -u(-x,-t) & 0 \end{array}\right] . \end{aligned}$$

For the solution of the Lax pair (2), written as the vector \(\varPhi =(\phi _1,\phi _2)^T\), the following function is introduced

$$\begin{aligned} \varGamma =\frac{\phi _2}{\phi _1}, \end{aligned}$$

Substituting it into Eq. (2) and using the first row of each matrix equation (for instance, \(\phi _{1,x}=i\lambda \phi _1+u\phi _2\) from the x-part), we get

$$\begin{aligned} (\ln {\phi _1})_x&=i\lambda +u\varGamma ,\\ (\ln {\phi _1})_t&=4i\lambda ^3-2iuu(-x,-t)\lambda -u_xu(-x,-t)-u_x(-x,-t)u\\ &\quad +\left( 4\lambda ^2u-2u^2u(-x,-t)-2i\lambda u_x-u_{xx}\right) \varGamma . \end{aligned}$$

Since the mixed derivatives satisfy \((\ln {\phi _1})_{xt}=(\ln {\phi _1})_{tx}\), differentiating the first equation with respect to t and the second with respect to x and equating them gives

$$\begin{aligned} (u\varGamma )_t=\big (-2iuu(-x,-t)\lambda -u_xu(-x,-t)-u_x(-x,-t)u+(4\lambda ^2u-2u^2u(-x,-t)-2i\lambda u_x-u_{xx})\varGamma \big )_x. \end{aligned}$$
(3)

In addition, combining with the Lax pair (2), we obtain the Riccati equation and the conservation-law equation for \(\varGamma \)

$$\begin{aligned} \varGamma _x&=-u(-x,-t)-2i\lambda \varGamma -u\varGamma ^2,\\ \varGamma _t&=-4\lambda ^2u(-x,-t)+2uu^2(-x,-t)+2i\lambda u_x(-x,-t)+u_{xx}(-x,-t)+A\varGamma -B\varGamma ^2, \end{aligned}$$

where

$$\begin{aligned} A&=-8i\lambda ^3+4i\lambda uu(-x,-t)+2u_xu(-x,-t)+2uu_x(-x,-t),\\ B&=4\lambda ^2u-2u^2u(-x,-t)-2i\lambda u_x-u_{xx}. \end{aligned}$$
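For clarity, the x-part of this system can be checked directly from the two rows of \(\varPhi _x=M\varPhi \):

$$\begin{aligned} \phi _{1,x}&=i\lambda \phi _1+u\phi _2,\qquad \phi _{2,x}=-u(-x,-t)\phi _1-i\lambda \phi _2,\\ \varGamma _x&=\frac{\phi _{2,x}}{\phi _1}-\varGamma \frac{\phi _{1,x}}{\phi _1}=-u(-x,-t)-i\lambda \varGamma -\varGamma \left( i\lambda +u\varGamma \right) =-u(-x,-t)-2i\lambda \varGamma -u\varGamma ^2, \end{aligned}$$

and the t-part follows in the same way from \(\varPhi _t=N\varPhi \).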

The function \(\varGamma \) is expanded in the following series

$$\begin{aligned} \varGamma (x, t, \lambda )=\sum _{n=1}^{\infty } \frac{\varGamma ^{(n)}(x, t, \lambda )}{(2 i \lambda )^n} \end{aligned}$$

and, substituting this expansion into the Riccati equations and collecting the coefficients of the same powers of \(\lambda \), we obtain the following formulas

$$\begin{aligned} \begin{aligned}&\varGamma ^{(1)}=-u(-x,-t),~~~\varGamma ^{(2)}=-u_x(-x,-t),\\&\quad \varGamma ^{(3)}=-u_{xx}(-x,-t)-uu^2(-x,-t),\\&\quad \varGamma ^{(4)}=u_xu^2(-x,-t)-4uu(-x,-t)u_x(-x,-t)\\&\quad -u_{xxx}(-x,-t),\\&\quad \varGamma ^{(2n)}=-\varGamma ^{(2n-1)}_x-2u\sum _{l+k=2n-1}\varGamma ^{(l)}\varGamma ^{(k)},\\&\quad \varGamma ^{(2n+1)}=-\varGamma ^{(2n)}_x-u\left( {\varGamma ^{(n)}}^2+2\sum _{l+k=2n}\varGamma ^{(l)}\varGamma ^{(k)}\right) . \end{aligned} \end{aligned}$$
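These recursion relations are easy to iterate symbolically. Below is a minimal sketch (assuming SymPy is available) based on the single recursion \(\varGamma ^{(m+1)}=-\varGamma ^{(m)}_x-u\sum _{l+k=m}\varGamma ^{(l)}\varGamma ^{(k)}\) over ordered pairs \((l,k)\), of which the even/odd formulas above are the two cases; it reproduces \(\varGamma ^{(1)},\ldots ,\varGamma ^{(4)}\).

```python
import sympy as sp

x, t = sp.symbols('x t')
u = sp.Function('u')

U = u(x, t)        # u(x, t)
Ur = u(-x, -t)     # the nonlocal counterpart u(-x, -t)

# Gamma^(1) = -u(-x,-t);  Gamma^(m+1) = -d/dx Gamma^(m) - u * sum_{l+k=m} Gamma^(l) Gamma^(k)
Gamma = {1: -Ur}
for m in range(1, 4):
    conv = sum(Gamma[l] * Gamma[m - l] for l in range(1, m))
    Gamma[m + 1] = sp.expand(-sp.diff(Gamma[m], x) - U * conv)

for n in sorted(Gamma):
    print(n, Gamma[n])
# The conserved densities are then u(x, t) * Gamma^(n), e.g. -u u(-x,-t) for n = 1.
```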

Thus, we can write several conservation laws for the nonlocal mKdV equation as

$$\begin{aligned} (-uu(-x,-t))_t&=\big (3u^2u^2(-x,-t)+u_{xx}u(-x,-t)+uu_{xx}(-x,-t)+u_xu_x(-x,-t)\big )_x,\\ (uu_x(-x,-t))_t&=\big (-6u^2u_x(-x,-t)u(-x,-t)-u_xu_{xx}(-x,-t)-u_{xx}u_x(-x,-t)-uu_{xxx}(-x,-t)\big )_x,\\ (-uu_{xx}(-x,-t)-u^2u^2(-x,-t))_t&=\big (-4u^2u(-x,-t)u_{xx}(-x,-t)+4u^3u^3(-x,-t)+u_{xx}u_{xx}(-x,-t)\\ &\quad -u^2_xu^2(-x,-t)-2uu_xu(-x,-t)u_x(-x,-t)+uu_{xx}u^2(-x,-t)\\ &\quad -2uu_xu(-x,-t)u_x(-x,-t)+u_xu_{xxx}(-x,-t)+5u^2u_x^2(-x,-t)+uu_{4x}(-x,-t)\big )_x,\\ &\;\;\vdots \\ (u\varGamma ^{(n)})_t&=\big ((-2u^2u(-x,-t)-u_{xx})\varGamma ^{(n)}-u_{x}\varGamma ^{(n+1)}-u\varGamma ^{(n+2)}\big )_x. \end{aligned}$$

At the same time, the corresponding conserved quantities can also be obtained as

$$\begin{aligned} I_1&=\int _{-\infty }^{+\infty }-uu(-x,-t)\,dx,\\ I_2&=\int _{-\infty }^{+\infty }-uu_x(-x,-t)\,dx,\\ I_3&=\int _{-\infty }^{+\infty }\left( -uu_{xx}(-x,-t)-u^2u^2(-x,-t)\right) dx,\\ &\;\;\vdots \\ I_n&=\int _{-\infty }^{+\infty }u\varGamma ^{(n)}\,dx. \end{aligned}$$
(4)
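As a quick numerical sanity check (a sketch assuming NumPy and SciPy are available, not part of the original derivation), one can verify that \(I_1\) is independent of t for the exact kink solution \(u=2/(1+{\textrm{e}}^{2x-8t})\) of Eq. (1) used later in Sect. 4.1:

```python
import numpy as np
from scipy.integrate import quad

def u(x, t):
    # kink solution (18) of Eq. (1), used again in Sect. 4.1
    return 2.0 / (1.0 + np.exp(2.0 * x - 8.0 * t))

def I1(t):
    # I_1 = -\int u(x, t) u(-x, -t) dx
    val, _ = quad(lambda x: -u(x, t) * u(-x, -t), -50.0, 50.0, limit=200)
    return val

print([round(I1(tt), 6) for tt in (-2.0, 0.0, 2.0)])
# all three values agree; analytically I_1 = -2 for this kink
```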

The above results establish the integrability of the nonlocal mKdV equation. Next, we apply the PINN to this integrable nonlinear equation to obtain its data-driven solutions.

3 The PINN deep learning method

In this part, we introduce the PINN deep learning method for partial differential equations. Generally, a (1+1)-dimensional nonlinear partial differential equation has the form

$$\begin{aligned} u_{t}+{\mathcal {N}}[{u}]=0,~x\in [x_0,x_1],~t\in [t_0,t_1] \end{aligned}$$
(5)

where u is a function of x and t, and \({\mathcal {N}}[\cdot ]\) is a nonlinear differential operator in space, which generally contains high-order dispersion terms and nonlinear terms. Then, the following residual can be defined from the left-hand side of Eq. (5)

$$\begin{aligned} f:=u_{t}+{\mathcal {N}}[{u}], \end{aligned}$$
(6)

and a deep neural network is used to approximate u. Here, without loss of generality, we first consider a neural network of depth H, composed of an input layer, \(H-1\) hidden layers and an output layer. The \(h^{th}\) hidden layer contains \(N_h\) neurons; the output \({\textbf{x}}^{h-1}\) of the previous layer, after the action of the activation function \(\sigma \), is taken as the input of the next hidden layer. This process is realized through the following affine transformation

$$\begin{aligned} {\textbf{x}}^{h}=\sigma \left( {\varLambda }_h\left( {\textbf{x}}^{h-1}\right) \right) =\sigma \left( {\textbf{w}}^h {\textbf{x}}^{h-1}+{\textbf{b}}^h\right) , \end{aligned}$$

where \({\textbf{w}}^h\in {\mathbb {R}}^{N_h\times N_{h-1}}\) and \({\textbf{b}}^h\in {\mathbb {R}}^{N_h}\) are the weights and biases of the \(h^{th}\) layer. Usually, we initialize the bias terms to zero, the weights are initialized by Xavier initialization, and the activation function is chosen as the tanh function. After the layer-by-layer composition, the neural network can be written as

$$\begin{aligned} u\left( {\textbf{x}}^0, \varTheta \right) =\left( \varLambda _H \circ \sigma \circ \varLambda _{H-1} \circ \cdots \circ \sigma \circ \varLambda _1\right) \left( {\textbf{x}}^0\right) , \end{aligned}$$

where the operator \(``\circ ''\) denotes composition, and \(\varTheta =\left\{ {\textbf{w}}^h, {\textbf{b}}^h\right\} _{h=1}^H\) represents the learnable parameters of the network. The core of the method is to update the weights and biases continually so that the approximation u of the partial differential equation satisfies Eq. (6) and the residual f is minimized. The neural network f shares its parameters with the network representing u, and these shared parameters can be learned by minimizing the mean squared error loss

$$\begin{aligned} {\textrm{Loss}}_{1}={\textrm{Loss}}_u+{\textrm{Loss}}_{f}, \end{aligned}$$
(7)

where

$$\begin{aligned}&{\textrm{Loss}}_{u}=\frac{1}{N_{u}}\sum _{i=1}^{N_{u}}|u(x_{u}^{i},t_{u}^{i})-u^{i}|^{2}, \end{aligned}$$
(8)
$$\begin{aligned}&{\textrm{Loss}}_{f}=\frac{1}{N_{f}}\sum _{j=1}^{N_{f}}|f(x_{f}^{j},t_{f}^{j})|^{2}, \end{aligned}$$
(9)

\(\{x_{u}^{i}, t_{u}^{i}, u^{i}\}_{i=1}^{N_{u}}\) denotes the sampled initial and boundary value training data of u(x, t). Similarly, the collocation points for f(x, t) are denoted by \(\{x_{f}^{j}, t_{f}^{j}\}_{j=1}^{N_{f}}\). The loss function (7) contains the losses on the initial and boundary data and the losses of the residual network (6) at a finite set of collocation points.
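For concreteness, a minimal sketch of this setup in PyTorch (our choice of framework here; the layer sizes, nonlinear term and random sample data below are illustrative assumptions, not the settings used later in the paper):

```python
import torch
import torch.nn as nn

class PINN(nn.Module):
    """Fully connected network u(x, t; Theta) with tanh activations,
    Xavier-initialized weights and zero biases, as described above."""
    def __init__(self, layers=(2, 40, 40, 40, 40, 40, 1)):
        super().__init__()
        blocks = []
        for n_in, n_out in zip(layers[:-1], layers[1:]):
            lin = nn.Linear(n_in, n_out)
            nn.init.xavier_normal_(lin.weight)
            nn.init.zeros_(lin.bias)
            blocks += [lin, nn.Tanh()]
        self.net = nn.Sequential(*blocks[:-1])   # no activation on the output layer

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=1))

def residual_f(model, x, t):
    """f := u_t + N[u]; N[u] is illustrated here with the local mKdV term
    u_xxx + 6 u^2 u_x (the nonlocal residual is treated in Sect. 4)."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = model(x, t)
    grad = lambda y, z: torch.autograd.grad(y, z, torch.ones_like(y), create_graph=True)[0]
    u_t, u_x = grad(u, t), grad(u, x)
    u_xxx = grad(grad(u_x, x), x)
    return u_t + u_xxx + 6.0 * u ** 2 * u_x

# Loss (7) = Loss_u + Loss_f, evaluated on illustrative data
model = PINN()
x_u, t_u, u_data = torch.rand(100, 1), torch.rand(100, 1), torch.zeros(100, 1)
x_f, t_f = torch.rand(1000, 1), torch.rand(1000, 1)
loss = torch.mean((model(x_u, t_u) - u_data) ** 2) \
     + torch.mean(residual_f(model, x_f, t_f) ** 2)
loss.backward()   # gradients for the optimizer update of Theta = {w, b}
```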

Further, if the solution of the partial differential equation is complex-valued, we can write it as \(u=p+iq\). Then Eq. (5) can be split into its real and imaginary parts, written schematically as

$$\begin{aligned}&p_{t}+{\mathcal {N}}[p]=0, \end{aligned}$$
(10)
$$\begin{aligned}&q_{t}+{\mathcal {N}}[q]=0. \end{aligned}$$
(11)

Then, the physics-informed neural networks \(f_{p}(x, t)\) and \(f_{q}(x, t)\) can be defined as

$$\begin{aligned}&f_{p}:=p_{t}+{\mathcal {N}}[p], \end{aligned}$$
(12)
$$\begin{aligned}&f_{q}:=q_{t}+{\mathcal {N}}[q], \end{aligned}$$
(13)

where p(x, t; w, b) and q(x, t; w, b) are the latent functions of the deep neural network with weight parameters w and bias parameters b, which are used to approximate the exact complex-valued solution u(x, t) of the objective equation. This formulation is reasonable, and as in Ref. [59], the networks \(f_{p}(x, t)\), \(f_{q}(x, t)\) can be obtained by the automatic differentiation mechanism.

Similarly, there is also a loss function in this network, and its form is more complex. We can set it to the following form

$$\begin{aligned} {\textrm{Loss}}^{'}_{1}={\textrm{Loss}}_{p}+{\textrm{Loss}}_{q}+{\textrm{Loss}}_{f_{p}}+{\textrm{Loss}}_{f_{q}}, \end{aligned}$$
(14)

where

$$\begin{aligned}&{\textrm{Loss}}_{p}=\frac{1}{N_{p}}\sum _{i=1}^{N_{p}}|p(x_{p}^{i},t_{p}^{i})-p^{i}|^{2}, \nonumber \\&{\textrm{Loss}}_{q}=\frac{1}{N_{q}}\sum _{i=1}^{N_{q}}|q(x_{q}^{i},t_{q}^{i})-q^{i}|^{2}, \end{aligned}$$
(15)

and

$$\begin{aligned}&{\textrm{Loss}}_{f_{p}}=\frac{1}{N_{f_p}}\sum _{j=1}^{N_{f_p}}|f_{p}(x_{f}^{j},t_{f}^{j})|^{2},\nonumber \\&{\textrm{Loss}}_{f_{q}}=\frac{1}{N_{f_q}}\sum _{j=1}^{N_{f_q}}|f_{q}(x_{f}^{j},t_{f}^{j})|^{2}, \end{aligned}$$
(16)

where \(\{x_{p}^{i}, t_{p}^{i}, p^{i}\}_{i=1}^{N_{p}}\) and \(\{x_{q}^{i}, t_{q}^{i}, q^{i}\}_{i=1}^{N_{q}}\) are the sampled initial and boundary value training data of p(x, t) and q(x, t). Similarly, the collocation points for \(f_{p}(x, t)\) and \(f_{q}(x, t)\) are denoted by \(\{x_{f}^{j}, t_{f}^{j}\}_{j=1}^{N_{f_p}}\) and \(\{x_{f}^{j}, t_{f}^{j}\}_{j=1}^{N_{f_q}}\).

For the loss function (14), the first two terms force the learned solution to match the initial and boundary value data, and the last two terms force the latent functions p and q to satisfy Eqs. (12) and (13).
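A compact way to assemble the loss (14), assuming the network now has two outputs interpreted as \((p, q)\) and that the residual networks \(f_p, f_q\) of Eqs. (12)-(13) have already been evaluated (a sketch in PyTorch):

```python
import torch

def loss_complex(p_pred, q_pred, p_data, q_data, f_p, f_q):
    mse = lambda z: torch.mean(z ** 2)
    # Loss'_1 = Loss_p + Loss_q + Loss_{f_p} + Loss_{f_q}, cf. (14)-(16)
    return mse(p_pred - p_data) + mse(q_pred - q_data) + mse(f_p) + mse(f_q)
```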

4 Soliton solutions of the nonlocal mKdV equation under zero boundary condition

In this part, we mainly use the neural network method to obtain data-driven solutions of the nonlocal mKdV equation under the zero boundary condition, together with their dynamic behavior and error analysis. We consider the nonlocal mKdV equation with Dirichlet boundary conditions, given by the following formula:

$$\begin{aligned} \left\{ \begin{array}{l} u_{xxx}+u_{t}+6uu(-x,-t)u_{x}=0, \quad x\in [x_{0}, x_{1}], t\in [t_{0}, t_{1}],\\ u(x, t_{0})=u_{0}(x),\\ u(x_{0}, t)=u_{1}(t),\ u(x_{1},t)=u_{2}(t), \end{array} \right. \end{aligned}$$
(17)

where \(x_{0}, x_{1}\) represent the boundaries of x, and \(t_{0}, t_{1}\) represent the start and end times of t. The function \(u_{0}(x)\) defines the initial condition.

To illustrate the PINN method, we give the flow diagram for the nonlocal mKdV equation according to Sect. 3. As seen from Fig. 1, compared with the classical network diagram for a local equation, the nonlocal term is added to the NN part, so that more functions are output and the training is more complex. The relevant physical information is then supplied, and the loss function is evaluated from the NN output and the residual of the governing equation combined with this physical information. The weights w and biases b are then updated continually to reduce the loss function below a certain tolerance \(\varepsilon \) or until the specified maximum number of iterations is reached.

Fig. 1 (Color online) The PINN scheme solving the nonlocal mKdV equation, where \({\tilde{u}}=u(-x,-t), {\tilde{p}}=p(-x,-t)\) and \({\tilde{q}}=q(-x,-t)\)
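The key modification visible in Fig. 1 is that the same network is also evaluated at the reflected point \((-x,-t)\). A minimal sketch of the resulting residual (19), assuming PyTorch and a network `model(x, t)` of the kind described in Sect. 3, is:

```python
import torch

def residual_nonlocal_mkdv(model, x, t):
    """f := u_xxx + u_t + 6 u u(-x,-t) u_x, cf. Eq. (19)."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u = model(x, t)           # u(x, t)
    ur = model(-x, -t)        # nonlocal term u(-x, -t), same weights
    grad = lambda y, z: torch.autograd.grad(y, z, torch.ones_like(y), create_graph=True)[0]
    u_x, u_t = grad(u, x), grad(u, t)
    u_xxx = grad(grad(u_x, x), x)
    return u_xxx + u_t + 6.0 * u * ur * u_x
```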

Next, we use the PINN to simulate the 1-soliton and 2-soliton solutions of the nonlocal mKdV equation, including the kink solution, the complex soliton solution and their interactions, and compare the simulated solutions with the exact ones.

4.1 1-soliton solution

In Ref. [17], the Darboux transformation was used to obtain the kink-type solution of the nonlocal mKdV equation:

$$\begin{aligned} u(x, t)=\frac{-2 \nu }{1+{\textrm{e}}^{-2 \nu \left( x-4 \nu ^2 t\right) }}, \end{aligned}$$

when \(\nu =-1\), a kink solution is generated,

$$\begin{aligned} u(x,t)=\frac{2}{1+{\textrm{e}}^{2x-8t}}, \end{aligned}$$
(18)

which is a real solution. Therefore, according to Eq. (6), the PINN f(xt) can be constructed as

$$\begin{aligned} f:=u_{xxx}+u_{t}+6uu(-x,-t)u_{x}, \end{aligned}$$
(19)

where we choose [-4, 4] as the domain of both x and t, so the initial and boundary information is as follows:

$$\begin{aligned} \left\{ \begin{array}{l} u_0(x)=u(x,-4)=\frac{2}{1+{\textrm{e}}^{2x+32}},\\ ~\\ u_1(t)=u(-4,t)=\frac{2}{1+{\textrm{e}}^{-8-8t}},\\ ~\\ u_2(t)=u(4,t)=\frac{2}{1+{\textrm{e}}^{8-8t}}. \end{array} \right. \end{aligned}$$
(20)

To get a better simulation, we discretize x into 512 points and t into 200 points, that is, the domain is divided into \(512\times 200\) grid points. \(N_u=100\) points are randomly selected from the initial and boundary data set, and \(N_f=10000\) interior points are sampled by the Latin hypercube sampling (LHS) method [60]. Here, we construct a feedforward neural network with six layers and 40 neurons in each hidden layer. By adjusting all the learnable parameters of the neural network and the loss function, we successfully learn the 1-soliton solution u(x, t). The relative \({\mathbb {L}}_2\) error of the final PINN model is \(2.583821{\textrm{e}}-04\), obtained in about 84.7319 s with 346 iterations.
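A sketch of this data preparation (assuming NumPy and pyDOE's `lhs` for the Latin hypercube sampling; other samplers would work equally well):

```python
import numpy as np
from pyDOE import lhs

x = np.linspace(-4.0, 4.0, 512)
t = np.linspace(-4.0, 4.0, 200)
exact = lambda x, t: 2.0 / (1.0 + np.exp(2.0 * x - 8.0 * t))   # kink (18)

# initial line t = -4 and the boundaries x = -4, x = 4, cf. (20)
X_ib = np.vstack([np.column_stack([x, np.full_like(x, -4.0)]),
                  np.column_stack([np.full_like(t, -4.0), t]),
                  np.column_stack([np.full_like(t, 4.0), t])])
u_ib = exact(X_ib[:, 0], X_ib[:, 1])

idx = np.random.choice(len(X_ib), 100, replace=False)         # N_u = 100
X_u, u_u = X_ib[idx], u_ib[idx]

lb, ub = np.array([-4.0, -4.0]), np.array([4.0, 4.0])
X_f = lb + (ub - lb) * lhs(2, 10000)                           # N_f = 10000 interior points
```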

Fig. 2 (Color online) The 1-soliton solution u(x, t) for the nonlocal mKdV equation: a The density plot and the error density diagram; b The wave propagation plot at three different times; c The three-dimensional plot; d The loss curve figure

Figure 2a and b shows the density plots of the exact and learned solutions, the error density plot, and the wave propagation plots at three different times, respectively. We find that the error between the learned solution and the exact solution is very small. Figure 2c and d displays the three-dimensional plot and the loss curve, respectively. As shown in Fig. 2d, the loss curve is relatively smooth, which implies that the integrable deep learning method is effective and stable.

In addition, the complex soliton solution of the nonlocal mKdV equation was given by the Hirota direct method in Ref. [61], which can be expressed as

$$\begin{aligned} u(x,t)=\frac{(1+\frac{3}{2}i){\textrm{e}}^{ix+it}}{1+{\textrm{e}}^{(1+\frac{3}{2}i)x-(\frac{1}{4}+\frac{3}{8}i)t}}. \end{aligned}$$
(21)

In this case, we set \(u=p+iq\); then, following Eqs. (12) and (13), the PINN is constructed by splitting the equation into its real and imaginary parts,

$$\begin{aligned} f_{p}&:=p_{xxx}+p_{t}+6pp(-x,-t)p_{x}\nonumber \\&\quad -6q(-x,-t)q_xp\nonumber \\&\quad -6p(-x,-t)qq_x-6qq(-x,-t)p_x, \end{aligned}$$
(22)
$$\begin{aligned} f_{q}&:=q_{xxx}+q_{t}-6qq(-x,-t)q_{x}\nonumber \\&\quad +6pp(-x,-t)q_x\nonumber \\&\quad +6pp_xq(-x,-t)+6p(-x,-t)qp_x. \end{aligned}$$
(23)

Let \([x_{0}, x_{1}]\) and \([t_{0}, t_{1}]\) in Eq. (17) be \([-5, 5]\) and \([-\frac{1}{100}, \frac{1}{100}]\), respectively. We select the complex soliton solution at \(t=-\frac{1}{100}\) as the initial condition

$$\begin{aligned} u_0(x)=u(x,-\frac{1}{100})=\frac{(1+\frac{3}{2}i){\textrm{e}}^{ix-\frac{1}{100}i}}{1+{\textrm{e}}^{(1+\frac{3}{2}i)x+\frac{1}{400}+\frac{3}{800}i}}. \end{aligned}$$
(24)

With the help of LHS, \(N_u=100\) points on the initial and boundary data and \(N_f=5000\) collocation points in the interior are randomly selected to obtain the training data, which are fed into a network with six hidden layers and 40 neurons per layer. Finally, we successfully learn the complex soliton solution; the training solution and the exact solution attain a relative \({\mathbb {L}}_{2}\) error of 5.639394e\(-\)04. The whole training process takes 743.5638 s, with 10,414 iterations. In Fig. 3, we give not only the density and error plots of the exact solution and the training solution, but also the differences between the two solutions at different times; it can be seen from Fig. 3b that they agree very well. In addition, we give the three-dimensional plot of the training solution and the loss curve during the iteration in panels (c) and (d).

Fig. 3 (Color online) The complex soliton solution u(x, t) for the nonlocal mKdV equation: a The density plot and the error density diagram; b The wave propagation plot at three different times; c The three-dimensional plot; d The loss curve figure

4.2 2-soliton solution

In this subsection, we mainly simulate 2-soliton solutions, including the bright-bright soliton solution and the soliton-kink interaction solution. As shown in Ref. [61], the general formula of the 2-soliton solution is

$$\begin{aligned} u(x,t)=\frac{F}{G}, \end{aligned}$$
(25)

where

$$\begin{aligned} F&=\frac{a_1a_2\left( f_1 {\textrm{e}}^{(1+k+l) x-(k^3+l^3+1) t}+f_2 {\textrm{e}}^{\left( \frac{3}{2}+k\right) x-\left( k^3+\frac{9}{8}\right) t}\right) }{l-\frac{1}{2}}+\frac{f_3{\textrm{e}}^{x-t}+f_4 {\textrm{e}}^{k x-k^3 t}}{1-k},\\ G&=1+\frac{g_1 {\textrm{e}}^{(1+l) x-(l^3+1) t}+g_2 {\textrm{e}}^{\frac{3}{2} x-\frac{9}{8} t}+g_3 {\textrm{e}}^{(k+l) x-(k^3+l^3) t}+g_4 {\textrm{e}}^{(k+\frac{1}{2}) x-(k^3+\frac{1}{8}) t}}{(1-k)(l-\frac{1}{2})}\\ &\quad +a_1 a_2 b_1 b_2 {\textrm{e}}^{(\frac{3}{2}+k+l) x-(k^3+l^3+\frac{9}{8}) t},\\ f_1&=\frac{3}{2}\left( k+\frac{1}{2}\right) b_1,\qquad \quad f_2=(1+l)(k+l) b_2,\\ f_3&=\frac{3}{2} a_1(1+l),\qquad \qquad f_4=a_2\left( k+\frac{1}{2}\right) (k+l),\\ g_1&=\frac{3}{2} a_1 b_1(k+l),\qquad \quad g_2=a_1 b_2(1+l)\left( k+\frac{1}{2}\right) ,\\ g_3&=a_2 b_1\left( k+\frac{1}{2}\right) (1+l),\quad g_4=\frac{3}{2} a_2 b_2(k+l). \end{aligned}$$

When \(k=\frac{1}{2},l=1,a_1=a_2=b_1=b_2=1\), Eq. (25) can be reduced to

$$\begin{aligned} u(x,t)=\frac{6 {\textrm{e}}^{x-t}+3 {\textrm{e}}^{\frac{1}{2} x-\frac{1}{8} t}+3 {\textrm{e}}^{\frac{5}{2} x-\frac{17}{8} t}+6 {\textrm{e}}^{2 x-\frac{5}{4} t}}{1+9 {\textrm{e}}^{2 x-2 t}+16 {\textrm{e}}^{\frac{3}{2} x-\frac{9}{8} t}+9 {\textrm{e}}^{x-\frac{1}{4} t}+{\textrm{e}}^{3 x-\frac{9}{4} t}}, \end{aligned}$$
(26)

which is a non-singular 2-soliton solution. Let \([x_{0}, x_{1}]\) and \([t_{0}, t_{1}]\) in Eq. (17) both be \([-15, 15]\); the corresponding initial condition is given by

$$\begin{aligned}&u_{0}(x)=u(x,-15)\nonumber \\&\quad =\frac{6 {\textrm{e}}^{x+15}+3 {\textrm{e}}^{\frac{1}{2} x+\frac{15}{8}}+3 {\textrm{e}}^{\frac{5}{2} x+\frac{255}{8}}+6 {\textrm{e}}^{2 x+\frac{75}{4}}}{1+9 {\textrm{e}}^{2 x+30}+16 {\textrm{e}}^{\frac{3}{2} x+\frac{135}{8}}+9 {\textrm{e}}^{x+\frac{15}{4}}+{\textrm{e}}^{3 x+\frac{135}{4}}}. \end{aligned}$$
(27)

We select the same collocation points as for the 1-soliton solution and use the LHS method to obtain the training data set, which is fed into a deep network with nine hidden layers of 40 neurons each. The 2-soliton solution is learned successfully; compared with the exact solution, its \({\mathbb {L}}_2\)-norm error is 5.937731e\(-\)02, the training time is 162.7119 s, and the number of iterations is 1715. It should be noted that more collocation points or more network layers do not necessarily lead to better results: in our experiments, choosing a deeper network made the results worse, with the \({\mathbb {L}}_2\)-norm error growing to 2.027299e\(-\)01 at a training time of 155.5235 s and 1402 iterations. The training results are shown in Fig. 4, including the density plot, the error density plot, the propagation plots at different times, the three-dimensional plot and the loss curve of the learned and exact 2-soliton solutions. The results are quite satisfactory.

Fig. 4 (Color online) The 2-soliton solution u(x, t) for the nonlocal mKdV equation: a The density plot and the error density diagram; b The wave propagation plot at three different times; c The three-dimensional plot; d The loss curve figure

When \(k=0,l=2,a_1=a_2=b_1=b_2=-1\), Eq. (25) can be reduced to

$$\begin{aligned} u(x,t)=-\frac{1+\frac{9}{2} {\textrm{e}}^{x-t}+\frac{1}{2} {\textrm{e}}^{3 x-9 t}+4 {\textrm{e}}^{\frac{3}{2} x-\frac{9}{8} t}}{1+2 {\textrm{e}}^{3 x-9 t}+{\textrm{e}}^{\frac{3}{2} x-\frac{9}{8} t}+{\textrm{e}}^{2 x-8 t}+2 {\textrm{e}}^{\frac{1}{2} x-\frac{1}{8} t}+{\textrm{e}}^{\frac{7}{2} x-\frac{73}{8} t}}, \end{aligned}$$
(28)

which represents the interaction of a soliton and a kink-type wave. The Dirichlet boundaries of x and t for Eq. (28) are [-10, 10] and [-15, 15], respectively. The initial training condition is

$$\begin{aligned} u_0(x)=u(x,-15)=-\frac{1+\frac{9}{2} {\textrm{e}}^{x+15}+\frac{1}{2} {\textrm{e}}^{3 x+135}+4 {\textrm{e}}^{\frac{3}{2} x+\frac{135}{8}}}{1+2 {\textrm{e}}^{3 x+135}+{\textrm{e}}^{\frac{3}{2} x+\frac{135}{8}}+{\textrm{e}}^{2 x+120}+2 {\textrm{e}}^{\frac{1}{2} x+\frac{15}{8}}+{\textrm{e}}^{\frac{7}{2} x+\frac{1095}{8}}}. \end{aligned}$$
(29)

We select the same collocation points as above to obtain the training data and feed them into a six-hidden-layer neural network with 40 neurons per layer. The soliton-kink interaction solution is learned well: with a training time of 373.2689 s and 7028 iterations, the \({\mathbb {L}}_2\)-norm error between the exact solution and the training solution is 1.452126e\(-\)02. In Fig. 5a, b, the density plot, the error plot and the time evolution of the training and exact solutions at t=\(-\)10.05, t=0 and t=10.05 are given. In Fig. 5c, d, the training data are used to plot the three-dimensional surface and the loss curve.

Fig. 5 (Color online) Solution of the interaction between soliton and kink-type waves u(x, t) for the nonlocal mKdV equation: a The density plot and the error density diagram; b The wave propagation plot at three different times; c The three-dimensional plot; d The loss curve figure

5 Data-driven solutions of the nonlocal mKdV equation with nonzero boundary

This part mainly studies the solutions of the nonlocal mKdV equation under nonzero boundary conditions, including the kink solution, the dark soliton solution, the anti-dark soliton solution and the rational solution.

5.1 Data-driven solitary wave solution

The kink solution of the nonlocal mKdV equation with nonzero boundary conditions was obtained by the inverse scattering method in Ref. [16] and is expressed as

$$\begin{aligned} u(x, t)=-{\tilde{u}}_0\tanh \left[ {\tilde{u}}_0\left( x+2 {\tilde{u}}_0^2 t\right) +\frac{\theta _1+\theta _{2}}{2} i\right] {\textrm{e}}^{i \theta _{2}}. \end{aligned}$$

In order to distinguish the training initial and boundary data from the nonzero boundary condition satisfied by the equation, we use \({\tilde{u}}_0\) to denote the nonzero boundary value. The neural network is constructed via Eq. (6), and its training initial condition is

$$\begin{aligned} u_0(x)=u(x,-10)=-{\tilde{u}}_0\tanh \left[ {\tilde{u}}_0\left( x-20 {\tilde{u}}_0^2\right) +\frac{\theta _1+\theta _{2}}{2} i\right] {\textrm{e}}^{i \theta _{2}}. \end{aligned}$$

Here, we take the region of both x and t as \([-10,10]\) and divide the x and t intervals into 512 and 200 points, respectively. \(N_u=1000\) boundary points and \(N_f=10{,}000\) interior points are randomly selected as training data. We numerically predict the kink soliton solution with the parameters \({\tilde{u}}_0=1, \theta _1=\theta _2=0\) using a PINN with six hidden layers and 40 neurons per layer. The training results show that the \({\mathbb {L}}_2\) error compared with the exact solution is 5.162946e\(-\)04, the training time is merely 23.7374 s, and the number of iterations is only 350; the final result is shown in Fig. 6. Once again, this shows the power of the integrable deep learning method.

Fig. 6 (Color online) The kink soliton solution u(x, t) for the nonlocal mKdV equation under nonzero boundary: a The density plot and the error density diagram; b The wave propagation plot at three different times; c The three-dimensional plot; d The loss curve figure

Fig. 7 (Color online) The dark soliton solution u(x, t) for the nonlocal mKdV equation: a The density plot and the error density diagram; b The wave propagation plot at three different times; c The three-dimensional plot; d The loss curve figure

Fig. 8 (Color online) The anti-dark soliton solution u(x, t) for the nonlocal mKdV equation: a The density plot and the error density diagram; b The wave propagation plot at three different times; c The three-dimensional plot; d The loss curve figure

The dark and anti-dark soliton solutions of the nonlocal mKdV were also given in Ref. [16], and the overall expression is

$$\begin{aligned} u(x, t)=\frac{{\textrm{e}}^{i \theta _{-}}}{w}\, \frac{w {\tilde{u}}_0\left( {\tilde{u}}_0^2+w^2\right) \left[ {\textrm{e}}^{2 \varphi +i\left( \theta _1+\theta _2\right) }-{\textrm{e}}^{2 i \theta _{-}}\right] +2\left( w^4 {\textrm{e}}^{i \theta _2}-{\tilde{u}}_0^4 {\textrm{e}}^{i \theta _1}\right) {\textrm{e}}^{\varphi +i \theta _{-}}}{\left( {\tilde{u}}_0^2+w^2\right) \left[ {\textrm{e}}^{2 \varphi +i\left( \theta _1+\theta _2\right) }-{\textrm{e}}^{2 i \theta _{-}}\right] +2 {\tilde{u}}_0 w\left( {\textrm{e}}^{i \theta _2}-{\textrm{e}}^{i \theta _1}\right) {\textrm{e}}^{\varphi +i \theta _{-}}}, \end{aligned}$$
(30)

where

$$\begin{aligned} \varphi =\frac{\left( w^2-{\tilde{u}}_0^2\right) \left[ w^2 x-\left( w^4+4 w^2 {\tilde{u}}_0^2+{\tilde{u}}_0^4\right) t\right] }{w^3}, \end{aligned}$$

which is a dark soliton solution for \({\tilde{u}}_0=1,w=\frac{3}{2}, \theta _1=\pi , \theta _2=\theta =0\), and an anti-dark soliton solution for \({\tilde{u}}_0=1,w=\frac{3}{2}, \theta _1=\theta =0,\theta _2=\pi \). For the dark soliton solution, let the intervals of x and t both be [-1, 1]; the corresponding initial condition is

$$\begin{aligned} u_0(x)=u(x,-1)=\frac{39 {\textrm{e}}^{\frac{5}{3} x+\frac{1205}{108}}+39-97 {\textrm{e}}^{\frac{5}{6} x+\frac{1205}{216}}}{39 {\textrm{e}}^{\frac{5}{3} x+\frac{1205}{108}}+39-72 {\textrm{e}}^{\frac{5}{6} x+\frac{1205}{216}}}. \end{aligned}$$

For the anti-dark soliton solution, let the intervals of x and t be [-4,4] and [-2,2], respectively, and its initial condition becomes

$$\begin{aligned} u_0(x)=u(x,-2)=\frac{39 {\textrm{e}}^{\frac{5}{3} x+\frac{1205}{54}}+39+97 {\textrm{e}}^{\frac{5}{6} x+\frac{1205}{108}}}{39 {\textrm{e}}^{\frac{5}{3} x+\frac{1205}{54}}+39+72 {\textrm{e}}^{\frac{5}{6} x+\frac{1205}{108}}}. \end{aligned}$$

Performing the same data collection and training procedure as for the kink soliton solution, we find that the \({\mathbb {L}}_2\)-norm error between the learned and exact solutions is 2.459108e\(-\)02 for the dark soliton solution; the whole learning process takes about 261.6747 s and 4618 iterations. For the anti-dark soliton solution, the \({\mathbb {L}}_2\)-norm error between the learned and exact solutions is 3.244125e\(-\)04, and the whole learning process takes about 24.5986 s, with 345 iterations. This neural network learns the anti-dark soliton solution more easily: it takes less time and the training result is more accurate. The learning results are shown in Figs. 7 and 8, respectively.

Fig. 9 (Color online) The rational soliton solution u(x, t) for the nonlocal mKdV equation: a The density plot and the error density diagram; b The wave propagation plot at three different times; c The three-dimensional plot; d The loss curve figure

Table 1 Parameter discovery through different neurons with different noises
Table 2 Parameter discovery through different internal configuration points with different noises
Fig. 10 (Color online) Parameter discovery of the nonlocal mKdV equation: a Iteration diagram of different internal configuration points; b Iterative graph of different neurons in each layer with the same neural network depth; c Iterative graph under different noise conditions; d The variation of the loss function with different noise

5.2 Data-driven rational solution

In this section, we use the PINN method to construct the rational soliton solution of Eq. (1). The form of its solution was given in Ref. [16]

$$\begin{aligned} u(x,t)=-1+\frac{4}{1+4(x-6 t)^2}. \end{aligned}$$
(31)

If the boundary regions of x and t are [\(-\)0.5,0.5] and [\(-\)0.3,0.3], then the corresponding initial condition is

$$\begin{aligned} u_0(x)=-1+\frac{4}{1+4(x-\frac{5}{9})^2}. \end{aligned}$$
(32)

In order to learn the rational soliton solution more accurately, \(N_u=100\) points are selected on the initial and boundary data and \(N_f=5000\) collocation points are selected in the interior for training. The training results are compared with the exact solution, achieving a relative \({\mathbb {L}}_2\) error of 1.164655e\(-\)02 in about 477.8897 s with 2848 iterations. In Fig. 9, we give the specific training results, including the density plots, the error plot and the time evolution plots of the exact and training solutions, together with the three-dimensional surface generated from the training results. The training error is not very stable and shows some spikes in the middle, but this does not affect the final training results.

6 The PINN algorithm for the data-driven parameter discovery

In this section, we focus on data-driven discovery of the nonlocal mKdV equation through the PINN algorithm. It is well known that linear equations have good properties, whereas most equations arising in real life are nonlinear, and it is precisely the nonlinear terms that give rise to richer physical behavior with wide applications in physics, mechanics and the life sciences. It is therefore necessary to learn the specific form of the nonlinear term. For the nonlocal mKdV equation, we mainly study the coefficient a of the nonlinear term in

$$\begin{aligned} u_{xxx}+u_{t}+auu(-x,-t)u_{x}=0. \end{aligned}$$
(33)

In principle, the parameter can be discovered from any known solution of the nonlocal mKdV equation. Here, we use the complex soliton solution for parameter discovery and choose Eqs. (22) and (23) as the physics-informed neural network of the nonlocal mKdV equation. Latin hypercube sampling is still used. With the help of the exact soliton solution with a=6 and \((x, t)\in [-5, 5]\times [-\frac{1}{100}, \frac{1}{100}]\), the training data set is generated by randomly selecting \(N_u=200\) initial and boundary points and \(N_f=50000\) collocation points. From this training data set, the data-driven parameter a can be discovered using the PINN.
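A minimal sketch of the parameter-discovery setup (assuming PyTorch and the residual helper sketched in Sect. 4; shown here for the real-valued residual for brevity, while the training described below uses the complex split (22)-(23) with the same trainable coefficient):

```python
import torch

a = torch.nn.Parameter(torch.tensor(0.0))   # unknown coefficient in Eq. (33)

def residual_inverse(model, a, x, t):
    """f := u_xxx + u_t + a u u(-x,-t) u_x with a trainable."""
    x = x.clone().requires_grad_(True)
    t = t.clone().requires_grad_(True)
    u, ur = model(x, t), model(-x, -t)
    grad = lambda y, z: torch.autograd.grad(y, z, torch.ones_like(y), create_graph=True)[0]
    u_x, u_t = grad(u, x), grad(u, t)
    u_xxx = grad(grad(u_x, x), x)
    return u_xxx + u_t + a * u * ur * u_x

# a is optimized jointly with the network weights, e.g.
# optimizer = torch.optim.Adam(list(model.parameters()) + [a], lr=1e-3)
```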

With the hidden layers of the PINN fixed, we report the data-driven parameter a for different numbers of neurons per layer in Table 1. It can be seen that, when no noise is added to the system, the precision of the learned parameter increases with the number of neurons. When different noise levels are added, the results of the parameter discovery oscillate as the number of neurons grows. When the number of hidden layers and neurons is fixed, the number of interior collocation points also affects the parameter discovery. In Table 2, we list the parameter learning results for different numbers of interior collocation points and different noise levels. It can be seen from the table that the more interior collocation points are given, the better the learning results. However, once the collocation points are fixed, adding different noises to the neural network again produces oscillating results. In general, with more collocation points, the result of the parameter discovery remains closest to the ideal value even when greater noise is added to the network.

In Fig. 10a and b, we show how the unknown parameter evolves during the inverse problem for different numbers of interior collocation points and different numbers of neurons. With the other settings fixed, the more interior collocation points or the more neurons per layer, the faster the unknown parameter converges. In Fig. 10c and d, the evolution of the unknown parameter and of the loss function with the iterations is analyzed for different noise levels. The iteration of the unknown parameter under different noises is shown in Fig. 10c: the convergence without noise is faster than with noise, but the final learned value is best when the noise is \(0.5\%\). Figure 10d shows how the loss functions for different noise levels decrease as the number of iterations increases. The results show that the convergence is best when \(0.5\%\) noise is added to the PINN and worst when \(0.1\%\) noise is added. This indicates that adding a proper amount of noise to the network can improve the training results.

7 Conclusion

In this paper, we first derived the infinite conservation laws of the nonlocal mKdV equation using a Riccati-type equation, which indicates the integrability of the nonlocal mKdV equation. Then, the components of the PINN were introduced, and two forms of the loss function for nonlinear equations were given. After that, the nonlocal term was added to the PINN deep learning framework to reconstruct the solutions of the nonlocal mKdV equation under zero boundary conditions, including the kink solution, the complex soliton solution, the bright-bright soliton solution and the soliton-kink interaction solution. For nonzero boundary conditions, we used the PINN to numerically simulate the kink, dark soliton, anti-dark soliton and rational solutions. At the same time, the error of each learned solution under the \({\mathbb {L}}_2\) norm was given. The results show that the error between the exact solution and the solution predicted by the PINN deep learning method is very small, which verifies the effectiveness and stability of the integrable deep learning method. For a given region, the PINN can learn the corresponding solution quickly and accurately with relatively little data and few network layers, which shows the power of integrable deep learning. Finally, the problem of data-driven parameter discovery was addressed: the coefficient of the nonlinear term of the nonlocal mKdV equation was learned through the PINN. In the inverse problem, we added noise to the network. Using the control-variable method, when the training data and network depth are fixed, we compared the parameter discovery for different numbers of neurons per hidden layer under different noise levels and presented the results in tabular form. With the network depth and neurons fixed, we compared the learned parameters for different amounts of training data and different noise levels. The results show that the more interior collocation points are given, the more accurate the learning results. In addition, the parameter discovery is closely related to the noise; different noise levels have a large influence on the learning results, which indicates that the results are quite sensitive to noise.

Here, we have only verified the integrability of the nonlocal mKdV equation and numerically simulated various solutions of the nonlocal mKdV equation through the PINN, without incorporating the integrability into the PINN. Whether adding integrability properties to the PINN for nonlocal partial differential equations leads to better results is a question for further study. Just as conservation laws were added to the PINN of local equations in Refs. [51, 62], whether this yields a smaller error for nonlocal equations under the \({\mathbb {L}}_2\)-norm remains to be investigated. In addition, other neural networks can also predict solutions of nonlinear partial differential equations. For example, Zhang has predicted and analyzed the dynamic behavior of bright soliton, dark soliton and rogue wave solutions of the (3+1)-dimensional Jimbo-Miwa equation, the p-gBKP equation and other nonlinear partial differential equations through bilinear residual networks and bilinear neural networks [35, 37, 63,64,65]; whether these neural network models are suitable for nonlocal equations is also a direction we need to consider in the future.