1 Introduction

Recently, there has been a significant surge of interest in fractional differential equations, owing to their widespread use in modeling complex phenomena that exhibit memory and long-range dependence across various fields, such as turbulent flow [3, 27] and viscoelastic constitutive laws [18]. This growing interest has motivated extensive research on developing and analyzing numerical methods for these equations [13, 23, 30]. In this paper, we propose a neural network for the numerical solution of the steady-state fractional advection dispersion equation (FADE) in high-dimensional spaces. The FADE is used to model physical phenomena of anomalous diffusion [10, 19]; results on its theoretical well-posedness and numerical methods can be found in [4, 5, 25, 29, 32]. However, the non-local nature of fractional differential operators implies that the coefficient matrix arising from the discretization of the FADE tends to be dense or even full. Moreover, as the dimensionality increases, traditional methods inevitably require discretizing the solution domain, leading to a substantial increase in storage cost. To overcome the numerical difficulties caused by non-locality and dimensionality, we explore neural network solutions for space-fractional differential equations.

Over the last few years, various neural network structures have been developed to solve partial differential equations (PDEs) [6,7,8, 11, 12, 16, 17, 20, 24, 26, 28, 31]. Analogous to the distinction between supervised and unsupervised learning algorithms, neural networks for solving PDEs can be roughly divided into two categories. The first category approximates the solution map governed by the specified PDEs directly, by training on a large set of boundary/initial conditions and the corresponding solutions [7, 8, 11, 12, 16, 17, 26]. An example of this type is the BCR-Net proposed in [7], which is based on wavelet transforms. These neural networks design specific structures that exploit properties of the underlying solution map, but the computational cost of generating the training data is significant. The second category approximates the equation, or a reformulation of the equation, under specific boundary/initial conditions [6, 24, 31]. Physics-informed neural networks (PINNs) serve as a prime example of this category [24]. By representing the solution as a neural network and incorporating the equation and boundary conditions into the loss function for training, this method has proved efficient for various types of equations.

When it comes to fractional differential equations, so-called fractional PINNs (fPINNs) were introduced in [22] to extend PINNs to the fractional case. In addition to [22], the article [9] proposed the Monte Carlo fPINNs method to address the computational challenges posed by the high dimensionality in fPINNs. Owing to the \(L_2\) formulation of the loss function, these PINNs-based methods perform well for classical solutions. However, they may not perform well for less smooth solutions [31].

To fill this research gap for less regular solutions, we propose a novel neural network structure, called fractional weak adversarial networks (f-WANs), which solves the weak formulation of the fractional differential equations using a generative adversarial structure. Such a structure has been applied to elliptic partial differential equations and related inverse problems in [31]. More precisely, we parameterize the weak solution and the test function in the weak formulation of the fractional equations as two neural networks, use Monte Carlo sampling to discretize the weak form, and train the two networks alternately in an adversarial manner to solve the resulting minimax problem. Our approach has several advantages. Firstly, it can handle situations where a classical solution does not exist. Secondly, it can mitigate the “curse of dimensionality”, since Monte Carlo sampling within the domain avoids the mesh generation required by traditional numerical methods.

The structure of the paper is as follows: In Sect. 2, we present the formulation of the problems addressed in this paper and establish their uniqueness. Section 3 provides an introduction to the WAN framework and outlines the proposed algorithm for training neural networks. In Sect. 4, we present the neural network architectures for the solution function and the test function, along with numerical examples to illustrate their effectiveness. Finally, Sect. 5 concludes the paper.

2 Preliminaries and problem formulation

In this section, we introduce the definitions of fractional operators and fractional-order spaces. These concepts will serve as the foundation for deriving the weak form of the fractional-order equation using the variational approach. The conditional uniqueness of our problem is proved.

2.1 Fractional-order derivative

For completeness, we first introduce the definition of Riemann–Liouville fractional integral:

Definition 2.1

(Riemann–Liouville Fractional Integral [23]). Let u be an \(L^1\) function defined on (a, b), and let \(\alpha >0\). Then the left and right Riemann–Liouville fractional integrals of order \(\alpha \) are defined as

$$\begin{aligned} \begin{aligned} J^{\alpha } u(x)&:=\frac{1}{\Gamma (\alpha )} \int \limits _a^x(x-w)^{\alpha -1} u(w) \textrm{d} w,\\ J^{\alpha }_{-} u(x)&:=\frac{1}{\Gamma (\alpha )} \int \limits _x^b(w-x)^{\alpha -1} u(w) \textrm{d} w, \end{aligned} \end{aligned}$$

where \(\Gamma (\cdot )\) is the Gamma function.
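For readers who prefer a computational view, the following NumPy sketch (our own illustration; the helper name rl_left_integral and the midpoint rule are assumptions, not part of the method developed later) approximates the left Riemann–Liouville integral by a simple quadrature whose nodes avoid the weakly singular endpoint \(w=x\):

```python
import numpy as np
from scipy.special import gamma

def rl_left_integral(u, x, a=0.0, alpha=0.5, N=2000):
    """Midpoint-rule approximation of the left Riemann-Liouville integral
    J^alpha u(x) over (a, x); the midpoints keep the weakly singular
    kernel (x - w)^(alpha - 1) finite at every node."""
    h = (x - a) / N
    w = a + (np.arange(N) + 0.5) * h        # midpoints of the N subintervals
    kernel = (x - w) ** (alpha - 1.0)
    return h * np.sum(kernel * u(w)) / gamma(alpha)

# Example: for u(w) = w the closed form is x^(1+alpha) / Gamma(2+alpha),
# obtained from the same Beta-function identity used in the text below.
x, alpha = 0.8, 0.4
approx = rl_left_integral(lambda w: w, x, alpha=alpha)
exact = x ** (1 + alpha) / gamma(2 + alpha)
print(approx, exact)  # close agreement, up to the error from the weak singularity
```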

We observe that, provided the function u has suitable continuity properties, the Riemann–Liouville fractional integral recovers the function itself as \(\alpha \) tends to 0 [23, Section 2.3.2]. The Riemann–Liouville fractional derivatives are obtained by taking ordinary derivatives of the Riemann–Liouville fractional integrals:

Definition 2.2

(Riemann–Liouville Fractional Derivative [23]). For \(\alpha \in (0,1)\), the left and right Riemann–Liouville fractional derivatives of order \(\alpha \) are defined as follows:

$$\begin{aligned} \begin{aligned}&\partial ^{\alpha } u(x):=\frac{\partial }{\partial x}J^{1-\alpha } u(x)=\frac{1}{\Gamma (1-\alpha )} \frac{\partial }{\partial x} \int \limits _{a}^{x} u(s)(x-s)^{-\alpha } \textrm{d} s, \\&\partial _{-}^{\alpha } u(x):= - \frac{\partial }{\partial x} J^{1-\alpha }_{-} u(x)=\frac{-1}{\Gamma (1-\alpha )} \frac{\partial }{\partial x} \int \limits _{x}^{b} u(s)(s-x)^{-\alpha } \textrm{d} s. \end{aligned} \end{aligned}$$
(1)

To facilitate the reader’s understanding of fractional calculus, we present the calculation of the fractional derivatives of two simple functions. Let \(u(x) = x\) with \( x>0 \) and set \(a = 0\); the left Riemann–Liouville fractional derivative of this function is computed as follows:

$$\begin{aligned} \begin{aligned} \partial ^{\alpha } x&= \frac{\partial }{\partial x}\bigg [ \frac{1}{\Gamma (1-\alpha )} \int \limits _0^x s(x-s)^{-\alpha } d s\bigg ] \\ {}&=\frac{\partial }{\partial x}\bigg [ \frac{1}{\Gamma (1-\alpha )} \int \limits _0^x s^{2-1}(x-s)^{1-\alpha -1} d s\bigg ] \\ {}&{\mathop {=}\nolimits ^{s=\theta x}} \frac{\partial }{\partial x}\bigg [ \frac{1}{\Gamma (1-\alpha )} \int \limits _0^1 (\theta x)^{2-1}(x-\theta x)^{1-\alpha -1} x d \theta \bigg ] \\ {}&=\frac{\partial }{\partial x}\bigg [ \frac{x^{2-\alpha }}{\Gamma (1-\alpha )} \int \limits _0^1 \theta ^{2-1}(1-\theta )^{1-\alpha -1} d \theta \bigg ] \\ {}&=\frac{\partial }{\partial x}\bigg [ \frac{x^{2-\alpha }}{\Gamma (1-\alpha )} \frac{\Gamma (2)\Gamma (1-\alpha )}{\Gamma (3-\alpha )}\bigg ] = \frac{x^{1-\alpha } }{\Gamma (2-\alpha )}.\\ \end{aligned} \end{aligned}$$

Using a similar approach, one can calculate the left Riemann–Liouville fractional derivative of \(u(x) = 1\):

$$\begin{aligned} \begin{aligned} \partial ^{\alpha } 1&= \frac{\partial }{\partial x}\bigg [ \frac{1}{\Gamma (1-\alpha )} \int \limits _0^x (x-s)^{-\alpha } d s\bigg ] =\frac{\partial }{\partial x}\bigg [ \frac{1}{\Gamma (1-\alpha )} \frac{x^{1-\alpha }}{1-\alpha }\bigg ] = \frac{x^{-\alpha } }{\Gamma (1-\alpha )}.\\ \end{aligned} \end{aligned}$$

From this, we observe a difference from integer-order differentiation: the Riemann–Liouville fractional derivative of a constant is nonzero, whereas the integer-order derivative of a constant is zero.
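As a quick numerical sanity check of the closed forms derived above, one can differentiate the quadrature approximation of \(J^{1-\alpha }u\) by a central finite difference and compare with \(x^{1-\alpha }/\Gamma (2-\alpha )\); the sketch below is our own illustration and reuses the hypothetical rl_left_integral helper from the previous snippet.

```python
import numpy as np
from scipy.special import gamma

x, alpha, eps = 0.6, 0.3, 1e-4

# d/dx [ J^{1-alpha} u ](x) for u(x) = x, approximated by a central difference
# of the quadrature sketched earlier (rl_left_integral is that hypothetical helper).
J = lambda t: rl_left_integral(lambda w: w, t, alpha=1.0 - alpha)
numeric = (J(x + eps) - J(x - eps)) / (2.0 * eps)

exact = x ** (1.0 - alpha) / gamma(2.0 - alpha)   # closed form derived in the text
print(numeric, exact)  # should agree up to quadrature and finite-difference error
```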

2.2 Stationary fractional advection dispersion equations

In this paper, we assume the domain of definition is the rectangular domain \(\Omega = \prod _{i=1}^{n}\left( \underline{x}_{i}, \overline{x}_{i} \right) \subset \mathbb {R}^n\). We use the notation \(\underline{{\ \cdot \ }}\) and \(\overline{{\ \cdot \ }}\) to denote the lower and upper boundaries in each direction. We are now ready to state the Dirichlet problem for the stationary FADE:

$$\begin{aligned} \left\{ \begin{aligned} \mathcal {L} u&=f,{} & {} \text{ in } \Omega , \\ u&=g,{} & {} \text{ on } \partial \Omega , \end{aligned}\right. \end{aligned}$$
(2)

where

$$\begin{aligned} \mathcal {L} u:= -\sum _{i=1}^{n}\partial _{x_{i}}\left( p_{i} J^{\alpha }_{\underline{x}_i}+q_{i} J^{\alpha }_{\overline{x}_i-} \right) \partial _{x_{i}} u, \end{aligned}$$
(3)

with constants \(\alpha \in (0,1)\), and \(\sum _{i=1}^{n}(p_{i}+q_{i})=1\) with \( p_{i},q_{i}>0\) for \(1\le i \le n\). Here \(J^{\alpha }_{\underline{x}_i}\) denotes the left fractional integral over the interval \((\underline{x}_i,x_i)\), while \(J^{\alpha }_{\overline{x}_i-}\) represents the right fractional integral over the interval \((x_i, \overline{x}_i)\).

Now we introduce some function spaces that will be used in this paper. Recall the usual fractional-order Sobolev spaces \(H^{\alpha }(0,T)\) (see, e.g., Adams [2]). Furthermore, we set \(H^0(0,T):= L^2(0,T)\), and denote by \(H^1_0(\Omega )\) the Sobolev space consisting of functions whose weak partial derivatives are square integrable over \(\Omega \) and whose traces vanish on the boundary \(\partial \Omega \). Following the notation in [15], for any constant \(\alpha \in (0,1)\), we define the Banach spaces

$$ H_{\alpha }(0, T):=\left\{ \begin{aligned}&\left\{ v\in H^{\alpha }(0, T); v(0)=0\right\} ,{} & {} \frac{1}{2}<\alpha<1, \\&\left\{ v \in H^{\frac{1}{2}}(0, T);\, \int \limits _{0}^{T} \frac{|v(t)|^{2}}{t} d t<\infty \right\} ,{} & {} \alpha =\frac{1}{2}, \\&H^{\alpha }(0, T),{} & {} 0<\alpha <\frac{1}{2} \end{aligned}\right. $$

with the norm

$$ \Vert v\Vert _{H_{\alpha }(0, T)}=\left\{ \begin{aligned}&\Vert v\Vert _{H^{\alpha }(0, T)},{} & {} \alpha \ne \frac{1}{2}, \\&\left( \Vert v\Vert _{H^{\frac{1}{2}}(0, T)}^{2}+\int \limits _{0}^{T} \frac{|v(t)|^{2}}{t} d t\right) ^{\frac{1}{2}},{} & {} \alpha =\frac{1}{2}. \end{aligned}\right. $$

Next we define the Sobolev spaces given by Riemann–Liouville fractional derivative for \(s\in \mathbb {R}_{+}\):

$$\hat{H}_{s}(\Omega )=\left\{ v\in L^2(\Omega ), \partial ^{s}_{x_{i}} v\in L^2(\Omega ), \,\text {for all }\, i=1,\dots ,n\right\} ,$$

with the norm

$$ \Vert v\Vert ^2_{\hat{H}_{s}(\Omega )}:=\sum _{i=1}^n\Vert \partial ^{s}_{x_{i}} v\Vert ^2_{L^2(\Omega )}. $$

The next proposition states that the \(L^2\)-norm of a function can be bounded by its \(H_{1-\alpha }\)-norm; see [15, Theorem 2.1] and [21, Theorem 6.7]. In fact, it can be viewed as a special case of the results in those two references. For the reader’s convenience, we also present a proof below.

Proposition 2.3

Given any \(\alpha \in (0,1)\), for any \(u\in H_{1-{\alpha }}(\underline{x},\overline{x})\), the following inequality holds

$$\begin{aligned} \Vert u\Vert _{L^2(\underline{x},\overline{x})} \le C \Vert J^{\alpha } \partial _x u \Vert _{L^2(\underline{x},\overline{x})}, \end{aligned}$$
(4)

where \(C>0\) is a constant.

Proof

Denote \(v:=J^{\alpha } \partial _x u\). Applying the fractional integral \(J^{1-\alpha }\) to both sides, we obtain \(u=J^{1-\alpha } v\). Therefore, it suffices to prove \(\Vert J^{1-\alpha } v\Vert _{L^2(\underline{x},\overline{x})} \le C \Vert v \Vert _{L^2(\underline{x},\overline{x})} \). Using Young’s inequality for convolutions [2],

$$ \begin{aligned} \Vert J^{1-\alpha } v\Vert _{L^2(\underline{x},\overline{x})}&=\Big \Vert \frac{1}{\Gamma (1-\alpha )} \int \limits _{\underline{x}}^{x} (x-w)^{-\alpha }v(w)\textrm{d}w \Big \Vert _{L^2(\underline{x},\overline{x})} \\&= C \Vert x^{-\alpha }*v\Vert _{L^2(\underline{x},\overline{x})}\\&\le C \Vert x^{-\alpha }\Vert _{L^1(\underline{x},\overline{x})} \Vert v\Vert _{L^2(\underline{x},\overline{x})} \\&\le C \Vert v\Vert _{L^2(\underline{x},\overline{x})}, \end{aligned} $$

which concludes the proof.

Now we state a conditional uniqueness result of (2).

Theorem 2.4

If the equation (2) admits a non-trivial solution \(u\in \hat{H}_{1}(\Omega )\), then u is unique in the sense of \(\hat{H}_{1-\alpha }(\Omega )\) for any fixed \(\alpha \in (0, 1)\), i.e. for any \(u_1, u_2 \in \hat{H}_{1}(\Omega )\) satisfying (2),

$$\Vert u_1-u_2\Vert _{\hat{H}_{1-\alpha }(\Omega )}=0.$$

Proof

First, we consider the homogeneous boundary condition \(g=0\). To prove the uniqueness result, it suffices to prove

$$\begin{aligned} \Vert u\Vert _{\hat{H}_{1-\alpha }(\Omega )} \le C\Vert f\Vert _{L^2(\Omega )}. \end{aligned}$$
(5)

We write the coordinates as \(\varvec{x}:=(\hat{x}_i,x_i)\), where \(\hat{x}_i = (x_1, \cdots , x_{i-1},x_{i+1},\cdots , x_n)\), and denote by \(\Omega ^{\prime }\) the domain with respect to \(\hat{x}_i\). Multiplying both sides of (2) by u and integrating by parts gives

$$ \begin{aligned}&- \int \limits _{\Omega ^{\prime }} \int \limits _{\underline{x}_i}^{\overline{x}_i} \left( \sum _{i=1}^{n}\partial _{x_{i}}\left( p_{i} J^{\alpha }_{\underline{x}_i}+q_{i} J^{\alpha }_{\overline{x}_i-} \right) \partial _{x_{i}} u \right) u \textrm{d}x_i\textrm{d}\hat{x}_i \\&\quad = -\int \limits _{\Omega ^{\prime }} \left( \left. \sum _{i=1}^{n}\left( p_{i} J^{\alpha }_{\underline{x}_i}+q_{i} J^{\alpha }_{\overline{x}_i-} \right) \partial _{x_{i}} u \right| _{x_i=\underline{x}_i}^{x_i=\overline{x}_i} \right) u \textrm{d}\hat{x}_i + \int \limits _{\Omega ^{\prime }} \int \limits _{\underline{x}_i}^{\overline{x}_i} \left( \sum _{i=1}^{n}\left( p_{i} J^{\alpha }_{\underline{x}_i}+q_{i} J^{\alpha }_{\overline{x}_i-} \right) \partial _{x_{i}} u \right) \partial _{x_{i}} u \textrm{d}x_i\textrm{d}\hat{x}_i \\&\quad = \int \limits _{\Omega ^{\prime }} \int \limits _{\underline{x}_i}^{\overline{x}_i} \left( \sum _{i=1}^{n}\left( p_{i} J^{\alpha }_{\underline{x}_i}+q_{i} J^{\alpha }_{\overline{x}_i-} \right) \partial _{x_{i}} u \right) \partial _{x_{i}} u \textrm{d}x_i\textrm{d}\hat{x}_i \\&\quad = \int \limits _{\Omega } fu\textrm{d}\varvec{x}. \end{aligned} $$

If we denote \(w:=J^{\alpha }_{\underline{x}_i} \partial _{x_i} u \), then \(\partial _{x_i} u = \partial _{x_i}^{\alpha }w\). By the assumption \(u\in \hat{H}_1(\Omega )\), we have \(w(\cdot , \hat{x}_i)\in H_{\alpha }(\underline{x}_i,\overline{x}_i)\) for fixed \(\hat{x}_i\). Then by [15, Theorem 3.1],

$$ \int \limits _{\Omega ^{\prime }} \int \limits _{\underline{x}_i}^{\overline{x}_i} p_i J^{\alpha }_{\underline{x}_i} (\partial _{x_i} u) (\partial _{x_i} u) \textrm{d}x_i\textrm{d}\hat{x}_i=\int \limits _{\Omega ^{\prime }} \int \limits _{\underline{x}_i}^{\overline{x}_i} p_i w \cdot \partial _{x_i}^{\alpha } w \textrm{d}x_i\textrm{d}\hat{x}_i \ge C \int \limits _{\Omega ^{\prime }} p_i \Vert w(\cdot , \hat{x}_i)\Vert ^2_{L^2(\underline{x}_i,\overline{x}_i)} \textrm{d}\hat{x}_i. $$

By Definition 2.1, with a variable substitution, the operator \(J^{\alpha }_{\overline{x}_i-}\) satisfies an inequality analogous to the one above for \(J^{\alpha }_{\underline{x}_i}\); therefore we get

$$\begin{aligned} \begin{aligned}&\int \limits _{\Omega ^{\prime }} \int \limits _{\underline{x}_i}^{\overline{x}_i} \left( \sum _{i=1}^{n}\left( p_{i} J^{\alpha }_{\underline{x}_i}+q_{i} J^{\alpha }_{\overline{x}_i-} \right) \partial _{x_{i}} u \right) \partial _{x_{i}} u \textrm{d}x_i\textrm{d}\hat{x}_i \\&\quad \ge C \int \limits _{\Omega ^{\prime }} \sum ^{n}_{i=1} \left( p_i \Vert w(\cdot , \hat{x}_i)\Vert ^2_{L^2(\underline{x}_i,\overline{x}_i)} + q_i \Vert w(\cdot , \hat{x}_i)\Vert ^2_{L^2(\underline{x}_i,\overline{x}_i)} \right) \textrm{d}\hat{x}_i \\&\quad = C \sum _{i=1}^{n}\int \limits _{\Omega ^{\prime }} \Vert w(\cdot , \hat{x}_i)\Vert ^2_{L^2(\underline{x}_i,\overline{x}_i)}\textrm{d}\hat{x}_i. \end{aligned} \end{aligned}$$
(6)

Hereafter, we use the notation \(C>0\) to represent generic constants that are independent of the functions being considered but dependent on parameters such as \(\alpha \) and \(\Omega \).

On the other hand, we have

$$\begin{aligned} \int \limits _{\Omega ^{\prime }} \int \limits _{\underline{x}_i}^{\overline{x}_i} \left( \sum _{i=1}^{n}\left( p_{i} J^{\alpha }_{\underline{x}_i}+q_{i} J^{\alpha }_{\overline{x}_i-} \right) \partial _{x_{i}} u \right) \partial _{x_{i}} u \textrm{d}x_i\textrm{d}\hat{x}_i = \int \limits _{\Omega } f(\varvec{x})u(\varvec{x})\textrm{d}\varvec{x} \le \Vert u\Vert _{L^2(\Omega )} \Vert f\Vert _{L^2(\Omega )}. \end{aligned}$$
(7)

By Proposition 2.3, we obtain

$$\begin{aligned} \begin{aligned} \Vert J^{\alpha }_{\underline{x}_i} \partial _{x_i} u\Vert ^4_{L^2(\Omega )}&=\left( \int \limits _{\Omega ^{\prime }} \Vert J^{\alpha }_{\underline{x}_i} \partial _{x_i} u(\cdot , \hat{x}_i)\Vert ^2_{L^2(\underline{x}_i,\overline{x}_i)}\textrm{d}\hat{x}_i\right) ^2\\&= \Vert J^{\alpha }_{\underline{x}_i} \partial _{x_i} u\Vert ^2_{L^2(\Omega )} \left( \int \limits _{\Omega ^{\prime }} \Vert J^{\alpha }_{\underline{x}_i} \partial _{x_i} u(\cdot , \hat{x}_i)\Vert ^2_{L^2(\underline{x}_i,\overline{x}_i)}\textrm{d}\hat{x}_i \right) \\&\ge C \Vert J^{\alpha }_{\underline{x}_i} \partial _{x_i} u\Vert ^2_{L^2(\Omega )} \left( \int \limits _{\Omega ^{\prime }} \Vert u(\cdot , \hat{x}_i)\Vert ^2_{L^2(\underline{x}_i,\overline{x}_i)}\textrm{d}\hat{x}_i \right) \\&= C \Vert J^{\alpha }_{\underline{x}_i} \partial _{x_i} u\Vert ^2_{L^2(\Omega )} \Vert u\Vert ^2_{L^2(\Omega )}. \end{aligned} \end{aligned}$$
(8)

Combining (6) - (8), we find

$$ C\left( \sum _{i=1}^{n} \Vert J^{\alpha }_{\underline{x}_i} \partial _{x_i} u\Vert ^2_{L^2(\Omega )} \right) \Vert u\Vert ^2_{L^2(\Omega )} \le \Vert f\Vert ^2_{L^2(\Omega )}\Vert u\Vert ^2_{L^2(\Omega )}. $$

If \(\Vert u\Vert _{L^2(\Omega )}\ne 0\), then we obtain

$$ \Vert u\Vert _{\hat{H}_{1-\alpha }(\Omega )} \le C \Vert f\Vert _{L^2(\Omega )}. $$

Thus, we have proved (5). For the inhomogeneous boundary condition, suppose there exist two different solutions \(u_1\) and \(u_2\) and define \(u:=u_1-u_2\). Since the equation is linear, u is a solution of the homogeneous problem, and the same conclusion follows.

3 Fractional weak adversarial networks (f-WANs) framework

In this section, we derive the weak formulation of the fractional-order equation. Building upon this formulation, we propose a novel weak adversarial network for solving the equation.

3.1 The weak formulation of the model problem

When the source term f in (2)-(3) is not smooth, the resulting solution u may not belong to \(C^2(\Omega )\). This observation motivates the consideration of its weak formulation. Since \(p_i,q_i\) are all constants, we set \(p_i=q_i=\frac{1}{2n}\) for simplicity. The fractional equation (2)-(3) can then be reformulated as follows:

$$\begin{aligned} \begin{aligned}&-\sum ^{n}_{i=1} \frac{\partial }{\partial x_i} \left( \int \limits _{\underline{x}_i}^{x_i} \frac{1}{\Gamma (\alpha )} (x_i-w)^{\alpha - 1} \frac{\partial u}{\partial w}(w,\hat{x}_i) \textrm{d}w \right) \\&- \sum _{i=1}^{n} \frac{\partial }{\partial x_i} \left( \int \limits _{x_i}^{\overline{x}_i} \frac{1}{\Gamma (\alpha )} (w-x_i)^{\alpha - 1} \frac{\partial u}{\partial w}(w,\hat{x}_i) \textrm{d}w \right) = f, \quad (\hat{x}_i, x_i) \in \Omega , \end{aligned} \end{aligned}$$
(9)

with boundary condition

$$\begin{aligned} u = g(\varvec{x}), \quad \text {on} \ \partial \Omega . \end{aligned}$$
(10)

Multiplying both sides of (9) by a test function \(v\in H^1_0(\Omega )\) and integrating by parts, we obtain a weak formulation of (9), given by

$$\begin{aligned} \begin{aligned} \langle \mathcal {L}[u],v\rangle :=&\sum _{i=1}^{n} \int \limits _\Omega \left( \int \limits _{\underline{x}_i}^{x_i} \frac{1}{\Gamma (\alpha )} (x_i-w)^{\alpha - 1} \frac{\partial u}{\partial w}(w, \hat{x}_i) \textrm{d}w \right) \frac{\partial v}{\partial x_i} (\hat{x}_i,x_i) \textrm{d} x_i\textrm{d} \hat{x}_i\\&+\sum _{i=1}^{n} \int \limits _\Omega \left( \int \limits _{x_i}^{\overline{x}_i} \frac{1}{\Gamma (\alpha )} (w-x_i)^{\alpha - 1} \frac{\partial u}{\partial w}(w,\hat{x}_i) \textrm{d}w \right) \frac{\partial v}{\partial x_i}(\hat{x}_i,x_i) \textrm{d} x_i \textrm{d} \hat{x}_i\\&-\int \limits _{\Omega } f v \textrm{d}\varvec{x} = 0. \end{aligned} \end{aligned}$$
(11)

Meanwhile, we define the following form corresponding to the Dirichlet boundary condition (10):

$$\begin{aligned} \mathcal {B}[u]:= \left. (u - g)\right| _{\partial \Omega }. \end{aligned}$$
(12)

We also note that when the boundary condition in (2) is given in Neumann type, i.e. \(\partial _{n}u=g\) on \(\partial \Omega \), then \(\mathcal {B}[u]\) can be defined as

$$ \mathcal {B}[u]:= \left. (\frac{\partial u}{\partial \textbf{n}} - g)\right| _{\partial \Omega }. $$

Before moving on, we note that our equation (11) is in fact a degenerate form of the variational formulation in two-dimensional space presented in [5]. In [5], the authors consider directional derivatives along arbitrary directions in 2D, whereas we fix the directions of the derivatives here. However, the problem we consider can be extended to n dimensions.

3.2 Induced operator norm minimization

The above weak formula (11) can induce an operator norm:

Definition 3.1

We define the operator norm

$$\Vert \mathcal {L}[u]\Vert _{op}:= \max _{v\in H^1_0(\Omega ), v\ne 0} \frac{\left| \langle \mathcal {L}[u],v\rangle \right| }{\Vert v\Vert _2},$$

where \(\Vert v\Vert _2=(\int \limits _{\Omega } |v(x)|^2\textrm{d}x)^{1/2}. \)

This defines the operator norm of \(\mathcal {L}[u]\) induced by the \(L^2\) norm. Here \(\mathcal {L}[u]:H^1_0(\Omega ) \rightarrow \mathbb {R}\) is the linear functional given by \(\mathcal {L}[u](v) \triangleq \langle \mathcal {L}[u],v\rangle \) for the fractional equations.

Lemma 3.2

Assume that \(u\in \hat{H}_{1}(\Omega )\) satisfies the boundary condition (12) on \(\partial \Omega \). Then u is the unique weak solution in \(\hat{H}_{1}(\Omega )\) of equation (9) if and only if \(\Vert \mathcal {L}[u]\Vert _{op}=0\).

Proof

After applying Theorem 2.4, the subsequent steps of the proof follow a similar approach to the one presented in [31, Theorem 1].

By the above lemma, since \(\Vert \mathcal {L}[u]\Vert _{op}\ge 0\) for any \(u\in \hat{H}_{1}(\Omega )\), \(\Vert \mathcal {L}[u]\Vert _{op}\) achieves its minimum over \(\hat{H}_1(\Omega )\) when u is the weak solution of (9). Based on the above analysis, we can formulate the following minimax problem:

$$\begin{aligned} \min _{u\in \hat{H}_{1}} \Vert \mathcal {L}[u]\Vert ^2_{op}= \min _{u\in \hat{H}_{1}} \max _{v\in H^1_0} \frac{|\langle \mathcal {L}[u],v\rangle |^2}{\Vert v\Vert ^2_2}. \end{aligned}$$
(13)

In the next section, we will propose a neural network to find the optimal solution u for the minimax problem (13).

3.3 Weak adversarial network framework

We parameterize (13) using neural networks. Let \(u_\theta :\mathbb {R}^d \rightarrow \mathbb {R}\) and \(v_\eta :\mathbb {R}^d \rightarrow \mathbb {R}\) denote the parameterization of u and v, respectively, where \(\theta \) and \(\eta \) are the trainable model weights. Then we can express the minimax problem as follows:

$$\begin{aligned} {\min _{\theta } \max _{\eta }} \frac{|\langle \mathcal {L}[u_\theta ],v_\eta \rangle |^2}{\Vert v_\eta \Vert ^2_2}. \end{aligned}$$
(14)

During the training process, we first fix \(\eta \) and optimize \(\theta \) to minimize (14). Once the updated \(\theta \) is obtained, we fix it and optimize \(\eta \) to challenge \(\theta \) and maximize (14). The neural network approximation of the solution is obtained after a number of such alternating iterations. A schematic of the WAN method is shown in Fig. 1.

Fig. 1. Schematic of the WAN for solving fractional partial differential equations

In the interior of \(\Omega \), the objective function of \(u_\theta \) and \(v_\eta \) is

$$ L_{\textrm{int}}(\theta , \eta ) \triangleq |\langle \mathcal {L}[u_\theta ],v_\eta \rangle |^2 / \Vert v_\eta \Vert ^2_2. $$

In the meantime, the weak solution on the boundary \(\partial \Omega \) must satisfy the boundary condition (12). Therefore, the objective function is given by:

$$ L_{\textrm{bound}}(\theta ) \triangleq \left| u_{\theta }-g\right| ^2. $$

To sum up, we combine the two objective functions of the interior and the boundary to obtain the total objective function, given by:

$$\begin{aligned} \min _\theta \max _\eta L(\theta , \eta ),\quad \text {where}\quad L(\theta , \eta ) \triangleq L_{\textrm{int}}(\theta , \eta ) + \beta L_{\textrm{bound}}(\theta ), \end{aligned}$$
(15)

where \(\beta \) is a regularization parameter that balances the relative importance of the interior and boundary terms.

3.4 Stochastic approximation of operators and training algorithm

Unlike the outer integral, which is over the whole domain, the inner integral has an upper (or lower) limit that depends on the outer integration variable. To discretize the nested integrals, we first discretize the outer integral, and then, for each value of the outer integration variable, generate points within the corresponding sub-interval using Monte Carlo sampling. The inner integral is then approximated in the same way based on the generated points.
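A minimal sketch of this two-level sampling for the x-direction in the two-dimensional case is given below; the function name, the uniform sampling of the inner points, and the unit-square default domain are our own assumptions (Sect. 4.2.1 describes an equally spaced variant for the inner points).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_collocation(M_I, N, x_l=0.0, x_u=1.0, y_l=0.0, y_u=1.0):
    """Draw M_I outer points in the domain and, for each outer point x_i,
    N inner points in (x_l, x_i) and in (x_i, x_u) for the nested integrals."""
    x = rng.uniform(x_l, x_u, size=M_I)            # outer integration points
    y = rng.uniform(y_l, y_u, size=M_I)
    theta = rng.uniform(0.0, 1.0, size=(M_I, N))   # relative positions in (0, 1)
    w_left = x_l + theta * (x[:, None] - x_l)      # inner points in (x_l, x_i)
    w_right = x[:, None] + theta * (x_u - x[:, None])   # inner points in (x_i, x_u)
    return x, y, w_left, w_right
```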

To illustrate the steps of the algorithm, without loss of generality, we consider the model problem in the two dimensional case. In particular when \(n=2\), we write \(\mathcal {L} u =f\) as

$$\begin{aligned} -\partial _x \left( p_1 J^{\alpha }_{x_l}+q_1 J^{\alpha }_{x_u-} \right) \partial _x u -\partial _y \left( p_2 J^{\alpha }_{y_l}+q_2 J^{\alpha }_{y_u-} \right) \partial _y u=f, \end{aligned}$$
(16)

where, to simplify the notation, we use x, y to denote the variables in the two dimensions and the subscripts \({\cdot }_l\) and \({\cdot }_u\) for the constants related to the lower and upper boundaries. The parameterized weak formula is given by

$$\begin{aligned} \begin{aligned} \langle \mathcal {L}[u_\theta ],v_\eta \rangle =&\int \limits _{\Omega } \left( \int \limits _{x_l}^{x} \frac{1}{\Gamma (\alpha )} (x-w)^{\alpha - 1} \frac{\partial u_\theta }{\partial w}(w, y) \textrm{d}w \right) \frac{\partial v_\eta }{\partial x} (x,y) \textrm{d}x\textrm{d}y\\&+ \int \limits _{\Omega } \left( \int \limits _{x}^{x_u} \frac{1}{\Gamma (\alpha )} (w-x)^{\alpha - 1} \frac{\partial u_\theta }{\partial w}(w, y) \textrm{d}w \right) \frac{\partial v_\eta }{\partial x} (x,y) \textrm{d}x\textrm{d}y \\&+\int \limits _{\Omega } \left( \int \limits _{y_l}^{y} \frac{1}{\Gamma (\alpha )} (y-w)^{\alpha - 1} \frac{\partial u_\theta }{\partial w}(x,w) \textrm{d}w \right) \frac{\partial v_\eta }{\partial y} (x,y) \textrm{d}x \textrm{d}y \\&+\int \limits _{\Omega } \left( \int \limits _{y}^{y_u} \frac{1}{\Gamma (\alpha )}(w-y)^{\alpha - 1} \frac{\partial u_\theta }{\partial w}(x,w) \textrm{d}w \right) \frac{\partial v_\eta }{\partial y} (x,y) \textrm{d}x \textrm{d}y \\&-\int \limits _{\Omega } f v_\eta (x,y) \textrm{d}x \textrm{d}y. \end{aligned} \end{aligned}$$
(17)

Many traditional methods have been proven effective in dealing with singular integrals. In this paper, in order to avoid the “curse of dimensionality”, the Monte Carlo method is employed. Traditional meshing requires an exponentially increasing number of nodes as the dimensionality d increases, resulting in high computational costs. However, with the emergence of neural networks, we can conduct high-dimensional numerical experiments at a low cost, even on personal computers.

We denote by \(\{ (x_i, y_i) \}_{i=1}^{M_I}\) the collocation points in the interior of \(\Omega \) and by \(\{ (x_i, y_i) \}_{i=1}^{M_B}\) the collocation points on the boundary \(\partial \Omega \). Then the fifth term on the right-hand side of Eq. (17) can be approximated by

$$ RHS_5 = \int _\Omega f v_\eta (x,y) \textrm{d}x \textrm{d}y \approx \frac{1}{M_I} \sum ^{M_I}_{i=1} fv_\eta (x_i,y_i). $$

However, the remaining four terms on the right-hand side require more attention, because their inner integrals have limits that depend on the outer integration variables x and y. To be more specific, when the outer integral is discretized at \(x_i\) (or \(y_i\)), the limits of the inner integral change with \(x_i\) (or \(y_i\)). To distinguish the discrete \(x_i\) (or \(y_i\)) values in the two intervals, we write \(x_i^l\) (or \(y_i^l\)) for the lower interval and \(x_i^u\) (or \(y_i^u\)) for the upper interval. For brevity, we only consider the first and second terms of Eq. (17):

$$\begin{aligned} \begin{aligned} RHS_{1,2} =&\int _\Omega \left( \int _{x_l}^{x} \frac{1}{\Gamma (\alpha )} (x-w)^{\alpha - 1} \frac{\partial u_\theta }{\partial w}(w, y) \textrm{d}w \right) \frac{\partial v_\eta }{\partial x} (x,y) \textrm{d}x\textrm{d}y \\&+ \int _\Omega \left( \int _x^{x_u} \frac{1}{\Gamma (\alpha )} (w-x)^{\alpha - 1} \frac{\partial u_\theta }{\partial w}(w, y) \textrm{d}w \right) \frac{\partial v_\eta }{\partial x} (x,y) \textrm{d}x\textrm{d}y \\ \approx&\sum _{i=1}^{M_I} \frac{1}{M_I}\left( \int _{x_l}^{x_i^l} \frac{1}{\Gamma (\alpha )} (x_i^l-w)^{\alpha - 1} \frac{\partial u_\theta }{\partial w}(w, y_i^l)\textrm{d}w \right) \frac{\partial v_\eta }{\partial x} (x_i^l,y_i^l) \\&+ \sum _{i=1}^{M_I} \frac{1}{M_I}\left( \int _{x_i^u}^{x_u} \frac{1}{\Gamma (\alpha )} (w-x_i^u)^{\alpha - 1} \frac{\partial u_\theta }{\partial w}(w,y_i^u) \textrm{d}w \right) \frac{\partial v_\eta }{\partial x} (x_i^u,y_i^u) \\ \approx&\sum _{i=1}^{M_I} \sum _{j=1}^{N} \frac{1}{M_I \cdot N} \frac{1}{\Gamma (\alpha )} (x_i^l-w_j^{x_l})^{\alpha - 1} \frac{\partial u_\theta }{\partial w}(w_j^{x_l},y_i^l) \frac{\partial v_\eta }{\partial x} (x_i^l,y_i^l) \\&+ \sum _{i=1}^{M_I} \sum _{j=1}^{N} \frac{1}{M_I \cdot N} \frac{1}{\Gamma (\alpha )} (w_j^{x_u}-x_i^u)^{\alpha - 1} \frac{\partial u_\theta }{\partial w}(w_j^{x_u}, y_i^u) \frac{\partial v_\eta }{\partial x} (x_i^u,y_i^u), \end{aligned} \end{aligned}$$
(18)

where \(w_j^{x_l}\) (or \( w_j^{x_u}\)), \(j=1,\cdots , N\), denote the collocation points in the interval \([x_l,x_i^l]\) (or \([x_i^u,x_u]\)) for \(i=1,\cdots ,M_I\). Finally, we arrive at the following approximation for the weak form:

$$ \begin{aligned} \langle \mathcal {L}[u_\theta ],v_\eta \rangle \approx&\sum _{i=1}^{M_I} \sum _{j=1}^{N} \frac{1}{M_I \cdot N} \frac{1}{\Gamma (\alpha )} \left( (x_i^l-w_j^{x_l})^{\alpha - 1} \frac{\partial u_\theta }{\partial w}(w_j^{x_l},y_i^l) \frac{\partial v_\eta }{\partial x} (x_i^l,y_i^l) \right. \\ {}&\left. + (w_j^{x_u}-x_i^u)^{\alpha - 1} \frac{\partial u_\theta }{\partial w}(w_j^{x_u}, y_i^u) \frac{\partial v_\eta }{\partial x} (x_i^u,y_i^u) \right) \\&+ \sum _{i=1}^{M_I} \sum _{j=1}^{N} \frac{1}{M_I \cdot N} \frac{1}{\Gamma (\alpha )} \left( (y_i^l-w_j^{y_l})^{\alpha - 1} \frac{\partial u_\theta }{\partial w}(x_i^l,w_j^{y_l}) \frac{\partial v_\eta }{\partial y} (x_i^l,y_i^l) \right. \\ {}&\left. + (w_j^{y_u}-y_i^u)^{\alpha - 1} \frac{\partial u_\theta }{\partial w}(x_i^u,w_j^{y_u}) \frac{\partial v_\eta }{\partial y} (x_i^u,y_i^u) \right) - \frac{1}{M_I} \sum ^{M_I}_{i=1} fv_\eta (x_i,y_i). \end{aligned} $$
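To make the discretization concrete, the following TensorFlow sketch (our own illustration; all names are assumptions) assembles the two x-direction terms of this approximation for networks \(u_\theta \) and \(v_\eta \), using automatic differentiation for \(\partial u_\theta /\partial w\) and \(\partial v_\eta /\partial x\). The \(1/(M_I\cdot N)\) weighting follows (18) and implicitly assumes a unit-volume domain; the y-direction terms and the source term \(-\frac{1}{M_I}\sum _i f v_\eta (x_i,y_i)\) are assembled in the same way.

```python
import tensorflow as tf
from scipy.special import gamma

def weak_form_x_terms(u_theta, v_eta, x, y, w_left, w_right, alpha):
    """Monte Carlo estimate of the two x-direction terms of the weak form.
    x, y: float32 tensors of shape (M_I,); w_left, w_right: (M_I, N) tensors
    of inner points; u_theta, v_eta map a (batch, 2) tensor to (batch, 1)."""
    M_I, N = w_left.shape

    # dv_eta/dx at the outer collocation points
    xy = tf.stack([x, y], axis=1)
    with tf.GradientTape() as tape:
        tape.watch(xy)
        v = v_eta(xy)
    dv_dx = tape.gradient(v, xy)[:, 0]                      # shape (M_I,)

    def du_dw(w):
        # du_theta/dw at the inner points (w, y_i), returned as (M_I, N)
        pts = tf.stack([tf.reshape(w, [-1]), tf.repeat(y, N)], axis=1)
        with tf.GradientTape() as tape2:
            tape2.watch(pts)
            u = u_theta(pts)
        return tf.reshape(tape2.gradient(u, pts)[:, 0], [M_I, N])

    kern_l = (x[:, None] - w_left) ** (alpha - 1.0)         # singular kernels
    kern_r = (w_right - x[:, None]) ** (alpha - 1.0)
    inner = tf.reduce_mean(kern_l * du_dw(w_left)
                           + kern_r * du_dw(w_right), axis=1)   # average over N
    return tf.reduce_mean(inner * dv_dx) / gamma(alpha)         # average over M_I
```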

And the loss function on the boundary \(\partial \Omega \) is defined by

$$ L_{\textrm{bound}} = \sum _{i=1}^{M_B} \frac{1}{M_B} \left| u_{\theta }(x_i, y_i)-g(x_i,y_i)\right| ^2. $$

Based on the stochastic approximations of the interior and boundary objective functions discussed above, we can ultimately obtain the total objective function

$$\begin{aligned} L(\theta ,\eta ):= L_{\textrm{int}} + \beta L_{\textrm{bound}} = |\langle \mathcal {L}[u_\theta ],v_\eta \rangle |^2 / \Vert v_\eta \Vert _2^2 + \beta \sum _{i=1}^{M_B} \frac{1}{M_B} \left| u_{\theta }(x_i,y_i)-g(x_i,y_i)\right| ^2. \end{aligned}$$
(19)
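A hedged sketch of how the two terms of (19) can be combined in code is given below; weak_form_estimate is a hypothetical helper that assembles the full Monte Carlo approximation of \(\langle \mathcal {L}[u_\theta ],v_\eta \rangle \) as in the previous snippet, and the Monte Carlo estimate of \(\Vert v_\eta \Vert _2^2\) again assumes a unit-volume domain.

```python
import tensorflow as tf

def total_loss(u_theta, v_eta, interior_batch, boundary_batch, g, beta, alpha):
    """Discretized objective (19): interior weak-form term plus beta times the
    boundary penalty; weak_form_estimate is a hypothetical helper returning the
    full Monte Carlo approximation of <L[u_theta], v_eta>."""
    x, y, w_left, w_right = interior_batch
    xb, yb = boundary_batch

    Lu_v = weak_form_estimate(u_theta, v_eta, x, y, w_left, w_right, alpha)
    v_int = v_eta(tf.stack([x, y], axis=1))
    v_norm_sq = tf.reduce_mean(tf.square(v_int))       # Monte Carlo ||v_eta||_2^2
    L_int = tf.square(Lu_v) / v_norm_sq

    xyb = tf.stack([xb, yb], axis=1)
    L_bound = tf.reduce_mean(tf.square(u_theta(xyb)[:, 0] - g(xb, yb)))

    return L_int + beta * L_bound
```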

While searching for the saddle point of (19), we use TensorFlow [1] to automatically compute \(\nabla _\theta L(\theta ,\eta )\) and \(\nabla _\eta L(\theta ,\eta )\). The resulting algorithm is outlined in Algorithm 1.

Algorithm 1. Weak adversarial network (WAN) for solving fractional differential equations
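The alternating minimization/maximization of Algorithm 1 can be sketched as follows; the learning rates follow Sect. 4.1, while the function and variable names, the resampling strategy, and the use of a negated loss for the maximization step are our own illustrative choices rather than the authors' exact implementation.

```python
import tensorflow as tf

opt_u = tf.keras.optimizers.Adam(learning_rate=0.0015)   # tau_theta (Sect. 4.1)
opt_v = tf.keras.optimizers.Adam(learning_rate=0.04)     # tau_eta   (Sect. 4.1)

def train(u_theta, v_eta, sample_batches, g, beta, alpha,
          n_iter=2000, K_u=1, K_v=1):
    for it in range(n_iter):
        interior_batch, boundary_batch = sample_batches()   # fresh collocation points

        # K_u descent steps on theta: minimize the loss (weak-solution network)
        for _ in range(K_u):
            with tf.GradientTape() as tape:
                loss = total_loss(u_theta, v_eta, interior_batch,
                                  boundary_batch, g, beta, alpha)
            grads = tape.gradient(loss, u_theta.trainable_variables)
            opt_u.apply_gradients(zip(grads, u_theta.trainable_variables))

        # K_v ascent steps on eta: maximize the loss (adversarial test network)
        for _ in range(K_v):
            with tf.GradientTape() as tape:
                neg_loss = -total_loss(u_theta, v_eta, interior_batch,
                                       boundary_batch, g, beta, alpha)
            grads = tape.gradient(neg_loss, v_eta.trainable_variables)
            opt_v.apply_gradients(zip(grads, v_eta.trainable_variables))
    return u_theta
```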

4 Numerical experiments

4.1 Experiment setup

Recall that we introduced two networks in Sect. 3.3 to approximate the weak solution u and the test function v, namely \(u_\theta \) and \(v_\eta \). All neural networks and algorithms presented in this paper are implemented in TensorFlow. Unless stated otherwise, we use Adam [14] as the optimizer, with a step size of 0.0015 for \(u_\theta \) and 0.04 for \(v_\eta \). The parameters of the networks are initialized randomly according to TensorFlow’s default procedure.

For the network \(v_\eta \), we adopt a structure consisting of 6 hidden layers, each with 50 neurons. The activation function of the first two layers is \(\tanh \); for the remaining layers, the activation is softplus for even-numbered layers and \(\sinh \) for odd-numbered layers. The output layer has no activation function. The structure of the network \(u_\theta \) differs slightly from the fully connected feed-forward networks used in most cases: we add a convolutional layer to increase the expressiveness of the neural network and reduce the number of neurons in the other layers. Thus, \(u_\theta \) has 6 hidden layers, each with 20 neurons, with the same activation pattern as \(v_\eta \) (\(\tanh \) for the first two layers, softplus for even-numbered layers, and \(\sinh \) for odd-numbered layers), and the fifth hidden layer is a convolutional layer.
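One possible Keras realization of these two architectures is sketched below. The text does not fully specify the even/odd indexing of the activations or how the convolutional layer is wired into the fifth hidden layer of \(u_\theta \), so the ordering of activations and the Conv1D-over-features construction here are our own guesses.

```python
import tensorflow as tf
from tensorflow.keras import layers

def sinh(x):
    return tf.sinh(x)

# Our reading of the activation pattern: tanh, tanh, then sinh/softplus alternating.
ACTS = ["tanh", "tanh", sinh, "softplus", sinh, "softplus"]

def build_v(width=50, dim=2):
    """Test-function network v_eta: 6 dense hidden layers of 50 neurons."""
    inp = layers.Input(shape=(dim,))
    h = inp
    for act in ACTS:
        h = layers.Dense(width, activation=act)(h)
    out = layers.Dense(1, activation=None)(h)          # linear output layer
    return tf.keras.Model(inp, out)

def build_u(width=20, dim=2):
    """Solution network u_theta: 6 hidden layers of 20 neurons, with the fifth
    hidden layer replaced by a convolution (a Conv1D over the feature axis is
    one possible realization; the paper does not fix this detail)."""
    inp = layers.Input(shape=(dim,))
    h = inp
    for i, act in enumerate(ACTS):
        if i == 4:                                     # fifth hidden layer
            h = layers.Reshape((width, 1))(h)
            h = layers.Conv1D(1, kernel_size=3, padding="same", activation=act)(h)
            h = layers.Flatten()(h)
        else:
            h = layers.Dense(width, activation=act)(h)
    out = layers.Dense(1, activation=None)(h)
    return tf.keras.Model(inp, out)
```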

4.2 Experimental results

4.2.1 Smooth solution of the 2-dimensional fractional equation

We start with the fractional equation (9) with \(d=2\) and boundary condition (10). For convenience, we denote equation (9) by \(\mathcal {D}^\alpha _x u=f\), where \(\Omega =(0,1)^2\). The exact solution is given by \(u(x,y)=x^2y^2\), and the right-hand side of the equation is calculated accordingly. The hyper-parameters are \(M_I=2500,M_B=400,N=50,K_u=1,K_v=1,\tau _\theta =0.0015,\tau _\eta =0.04,\beta =1000000\). Table 1 lists the notation used in the algorithm.

Table 1 List of algorithm parameters

In our first numerical experiment, we evaluate the feasibility of the proposed f-WANs method for solving fractional differential equations by running Algorithm 1 for 2000 iterations with different fractional orders \(\alpha \). The true solution u is shown in Fig. 2. Specifically, we choose \(\alpha =0.3,0.6,0.9\) to demonstrate the effectiveness of the proposed method. It is worth noting that for the inner and outer integrals in (18), we sample the data points in different ways. For the outer integral with respect to the variables x and y, we generate data points from a uniform distribution. For the inner integral with respect to w, however, we divide the interval into N sub-intervals of equal width. To clarify, take the one-dimensional case as an example: for each \(i =1,\cdots , M_I\), we divide \([0,x_i]\) and \([x_i,1]\) into the nodes \([0,\frac{1}{N}x_i,\cdots ,\frac{N-1}{N}x_i,x_i]\) and \([x_i,\frac{1+(N-1)x_i}{N},\cdots ,\frac{N-1+x_i}{N},1]\), respectively. Then the left singular integral is approximated by \( \int _{0}^{x_i}(x_i-w)^{\alpha -1}f(w)dw \approx \frac{1}{N} \sum \limits _{j=1}^{N} \big (x_i - \frac{j-1}{N}x_i\big )^{\alpha -1}f\big (\frac{j-1}{N}x_i\big ), \) and the right singular integral by \( \int _{x_i}^{1}(w- x_i)^{\alpha -1}f(w)dw \approx \frac{1}{N} \sum \limits _{j=1}^{N} \big (w_j - x_i \big )^{\alpha -1}f\big (w_j\big ) \) with \(w_j = x_i + \frac{j}{N}(1-x_i)\). In other words, the choice of the integration points \(w_j\) for the variable w depends on the discrete points \(x_i\). This choice keeps the singular kernel bounded at the sampled nodes and thereby attenuates its effect on the value of the numerical integration, which alleviates the randomness and instability of the whole neural network system, even though the singular integrals may still cause larger errors. Nonetheless, the overall relative error is approximately \(3\%\), which demonstrates the feasibility of the proposed method. The trend in Fig. 3 shows that the error increases as the value of \(\alpha \) decreases. This is consistent with our expectations, since a smaller \(\alpha \) corresponds to a stronger singularity in (9), leading to larger discretization errors.
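The inner quadrature described above can be written compactly as follows (a sketch of our own for the one-dimensional illustration on (0, 1); f denotes the sampled integrand, and the \(1/N\) weighting follows the convention of (18)):

```python
import numpy as np

def left_singular_quad(f, x_i, alpha, N):
    """Approximate int_0^{x_i} (x_i - w)^(alpha-1) f(w) dw with the N equally
    spaced nodes w_j = (j-1) x_i / N, which stay away from the singular
    endpoint w = x_i."""
    j = np.arange(1, N + 1)
    w = (j - 1) / N * x_i
    return np.mean((x_i - w) ** (alpha - 1.0) * f(w))

def right_singular_quad(f, x_i, alpha, N):
    """Approximate int_{x_i}^1 (w - x_i)^(alpha-1) f(w) dw with the nodes
    w_j = x_i + j (1 - x_i) / N, which avoid the singularity at w = x_i."""
    j = np.arange(1, N + 1)
    w = x_i + j / N * (1.0 - x_i)
    return np.mean((w - x_i) ** (alpha - 1.0) * f(w))
```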

Fig. 2. True solution of \(u(x,y) = x^2y^2\)

Fig. 3. The first row displays the predicted values of the WAN for \(\alpha =0.3,0.6\) and 0.9, respectively, from left to right. The second row shows the corresponding errors \(|u_{\text {prediction}} - u_{\text {true}}|\)

4.2.2 Smooth solution of the 2-dimensional fractional equation with noisy boundary data

To investigate the robustness of f-WANs with respect to noisy boundary data for different choices of \(\alpha \), we use the same data as in Sect. 4.2.1, except that we add a small random perturbation to the Dirichlet value g on the boundary \(\partial \Omega \). More precisely, we define \(g^{\delta }=g+\delta \max \{g\} \xi \), where \(\xi \) is a Gaussian random variable with zero mean and unit variance. The noise level \(\delta \) is set to \(5\%\). To ensure fairness, we use the same random seed to generate \(\xi \) and add the noise to the Dirichlet values g when solving the equations for different \(\alpha \). The results are presented in Fig. 4. In the first row of Fig. 4, we highlight the contour line of 0 in the lower left corner of the image. Since we perturb the Dirichlet boundary value, \(g^{\delta }\) is not exactly equal to 0 on the left and lower sides of \(\partial \Omega \). As a result, the predictions obtained by the neural network show a small disturbance in the lower left corner. However, they are still very close to 0 in value, and there is no significant difference in magnitude on the colorbar. Furthermore, we have also considered different levels of noise, and the results show that the error increases as the noise level increases.
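For reference, the perturbed boundary data can be generated as in the snippet below (our own sketch; g_boundary denotes the assumed array of exact Dirichlet values at the boundary collocation points, and the seed value is arbitrary but kept fixed across the different \(\alpha \)).

```python
import numpy as np

# Perturbed Dirichlet data g^delta = g + delta * max(g) * xi with a fixed seed;
# g_boundary is the (assumed) array of exact Dirichlet values at the boundary points.
rng = np.random.default_rng(42)        # same seed for every alpha, for fairness
delta = 0.05                           # 5% noise level
xi = rng.standard_normal(size=g_boundary.shape)   # zero mean, unit variance
g_noisy = g_boundary + delta * np.max(g_boundary) * xi
```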

Fig. 4. Results for different \(\alpha \) with noise level \(\delta =5\%\). The first row displays the predicted values of the f-WANs for \(\alpha =0.3\), 0.6, and 0.9, from left to right. The second row shows the corresponding errors, i.e., the absolute difference between the predicted solution \(u^{\delta }_{\text {prediction}}\) and the true solution \(u_{\text {true}}\)

4.2.3 Less smooth solution of the 2-dimensional fractional equation

In this example, we consider a solution of the model problem with less smoothness, namely \( u(x,y)=x^{\frac{16}{15}}y^{\frac{16}{15}}.\) To better illustrate the results near the point (0, 0), where the true solution becomes singular after taking derivatives of order \(2-\alpha \), we employ a logarithmic scale for both the x and y coordinates. The true solution is shown in Fig. 5. In this example, we set \(M_I=2500,M_B=400,N=50,K_u=1,K_v=1,\tau _\theta =0.0001,\tau _\eta =0.001,\beta =1000000\). Figure 6 shows the numerical solutions for this less smooth case, demonstrating that our method remains effective.

Fig. 5. True solution of \(u(x,y) = x^{\frac{16}{15}}y^{\frac{16}{15}}\)

Fig. 6. Numerical solutions for the less smooth example of Sect. 4.2.3. The first row shows, from left to right, the predicted values of the WAN for \(\alpha =0.2\), 0.4 and 0.6; the second row shows the corresponding errors \(|u_{\text {prediction}} - u_{\text {true}}|\)

4.2.4 Smooth solution of the 3-dimensional fractional equation

In this subsection, we demonstrate the effectiveness of f-WANs in solving 3D problems; the higher dimensionality does not introduce any significant differences. We also investigate the relation between \(\alpha \) and the relative error. Here we consider the fractional equation (9) with \(d=3\) and boundary condition (10) on \(\Omega =(0,1)^3\). The exact solution is given by \(u(x,y,z)=x^2 y^2z^2\). First, in Fig. 7, we show volume slice planes of the model problem with \(\alpha =0.5\). We select the slices \(x=0.8\) and \(z=0.7\), since the points with large values are primarily concentrated near the point (1, 1, 1).

Fig. 7. Volume slice planes of the model problem in 3D with \(\alpha =0.5\). The left image is the true value and the right one is the prediction. Both images have the same slice locations

To better visualize the subtle differences between the true values and the predictions, we choose slices at \(z=1\) and \(z=x\) and plot the corresponding two-dimensional images. Figures 8 and 9 show the cases \(\alpha =0.1\) and \(\alpha =0.9\), respectively. In the first row of each figure, we show the slice \(z=1\), where the true value is \(u(x,y)=x^2y^2\), and its projection onto the \(x-y\) plane. In the second row, we show the projection of the \(z=x\) plane onto the \(x-y\) plane, where the true value is \(u(x,y)=x^4y^2\). Whether \(\alpha \) equals 0.1 or 0.9, f-WANs performs well. Although the error near the point (1, 1) reaches about \(8\%\), the overall relative error, measured by (20), is still around \(3.5\%\).

Fig. 8. Results of 4.2.4 with \(\alpha =0.1\). The first row displays the true value \(u_{\text {true}}\), the predictive value \(u_{\text {prediction}}\), and the difference \(|u_{\text {true}}-u_{\text {prediction}}|\) while the slice is \(z=1\). The second row presents the corresponding images with slice \(z=x\)

Fig. 9. Results of 4.2.4 with \(\alpha =0.9\). The first row displays the true value \(u_{\text {true}}\), the predictive value \(u_{\text {prediction}}\), and the difference \(|u_{\text {true}}-u_{\text {prediction}}|\) while the slice is \(z=1\). The second row presents the corresponding images with slice \(z=x\)

Finally, we present the relationship between \(\alpha \) and the relative error in Fig. 10. This figure is generated with the same parameter settings; we repeat the experiment ten times and take the average of the results. In equation (9), there is a kernel term \(|x-w|^{\alpha - 1}\), whose exponent satisfies \(\alpha -1 \in (-1,0)\) since \(\alpha \in (0,1)\). As \(\alpha \) approaches zero, the singularity caused by \(|x-w|^{\alpha -1}\) becomes more pronounced. The relative error, measured in the \(\ell ^2\)-norm, is given by:

$$\begin{aligned} \frac{\Vert u_{prediction} - u_{true} \Vert _{\ell ^2}}{\Vert u_{true} \Vert _{\ell ^2}}. \end{aligned}$$
(20)
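For completeness, the relative error (20) evaluated on a set of test points can be computed as follows (a minimal sketch; the array names are our own):

```python
import numpy as np

def relative_l2_error(u_pred, u_true):
    """Relative l2 error (20) evaluated on the test points."""
    return np.linalg.norm(u_pred - u_true) / np.linalg.norm(u_true)
```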
Fig. 10. Relative error for different values of \(\alpha \)

Fig. 11. Volume slice planes for the example of Sect. 4.2.5 with \(\alpha =0.4\). The left image is the true value, and the right one is the prediction. Both images have the same slice locations

Fig. 12. Results of 4.2.5 with \(\alpha =0.4\). The first row displays the true value \(u_{\text {true}}\), the predictive value \(u_{\text {prediction}}\), and the difference \(|u_{\text {true}}-u_{\text {prediction}}|\), while the slice is \(z=1\). The second row presents the corresponding images with slice \(z=x\)

4.2.5 Less smooth solution of the 3-dimensional fractional equation

In this example, we choose the true solution as \(u(x,y,z)=x^{\frac{16}{15}}+y^{\frac{16}{15}}+z^{\frac{16}{15}}\) in the three-dimensional case. To enhance the expressive power of the network, we increase the number of neurons in each hidden layer of \(u_\theta \) to 40, while keeping the rest of the structure unchanged. We choose slices at \(y=0.7\) and \(z=0.8\) and set \(\alpha =0.4\). The volume slice planes are shown in Fig. 11, while the projections onto the planes \(z=1\) and \(z=x\), with logarithmic scales for the coordinates, are shown in Fig. 12.

5 Concluding remarks

In this paper, we propose a novel neural network structure called f-WANs, based on the weak form of the FADE, which efficiently handles both smooth and less smooth solutions in high dimensions. Our approach combines Monte Carlo sampling with neural networks to approximate the solution of the fractional differential equation, and it can be extended to general fractional differential equations that admit a variational form. Our experiments focus on 2D and 3D problems defined on rectangular domains, but it is possible to extend the proposed architecture to general convex bounded Lipschitz domains by carefully reparametrizing the domain, although this may involve additional technical effort. Our proposed neural network has its limitations and cannot effectively solve all types of equations. GANs face challenges, particularly for objectives with multiple local minima and maxima, and we do not claim that our approach overcomes these difficulties. Developing approaches to handle objectives with multiple saddle points is a valuable direction for future work. The choice of hyperparameters may also affect the experimental results, and we will consider how to choose optimal hyperparameters in future work.