1 Introduction

Simulation of complex physical systems described by nonlinear partial differential equations (PDEs) is central to reservoir engineering. The governing equations for subsurface flow in porous media involve multiphase flow equations, and they may be coupled with other physical laws to describe geomechanical and thermal effects (Ajayi and Gupta 2019). Conventional numerical simulations employ finite element methods (FEM) and finite difference methods (FDM) to solve these PDEs through discretization (Wen and Benson 2021; Benson and Cole 2008; Benson et al. 2012). However, the combination of discretization schemes with time-stepping iteration strategies is time-consuming, particularly due to the multi-physics, nonlinearity, and multi-scale nature of these problems. Large-scale reservoir development projects require extensive computational effort for uncertainty quantification, history matching, and development optimization, which requires a large number of numerical simulations. Hence, leveraging the latest advancements in artificial intelligence to develop surrogate models for the oil industry  (Bagheri and Riahi 2015; Rahmanifard and Plaksina 2019; Bagheri and Rezaei 2019; Shi et al. 2020; Zare et al. 2020; Moosavi et al. 2022; Li et al. 2024) can enhance the traditional iterative process of solving partial differential equations, leading to a notable reduction in computational costs and a faster workflow..

The surrogate model, based on deep learning methods, can achieve real-time predictions once trained with a reasonable amount of data from full-physics simulations (Zhu and Zabaras 2018; Mo et al. 2019; Tang et al. 2020; Wen et al. 2021). Numerous deep learning techniques have been proposed to construct fast and accurate alternative models for numerical reservoir simulations.

Among these techniques, pure data-driven models, based on convolutional neural networks (CNN), focus on learning mappings in Euclidean space from traditional numerical simulation data (Mo et al. 2019; Liu et al. 2023; Fu et al. 2023). In this approach, surrogate models are constructed to rapidly predict pressure and saturation fields, as well as wellbore flow rates. These models are employed for history matching, well control optimizations, uncertainty quantification, and various other applications (Tang et al. 2021; Wang et al. 2021; Huang et al. 2023; Zhong et al. 2019; Liu et al. 2019; Kim and Durlofsky 2021). To date, this method has successfully predicted multiphase flow in three-dimensional reservoirs (Jiang et al. 2021; Tang et al. 2021; Wen et al. 2021; Wu and Qiao 2021). However, the mapping of data-driven finite-dimensional operators between finite-dimensional spaces is prone to overfitting and may be inconsistent with the governing physics laws  (Anandkumar et al. 2020; Li et al. 2023). It requires a substantial number of numerical simulation results as input data, which may be unmanageable as the dimension of the problem increases  (Wen et al. 2022; Zhang et al. 2023). Additionally, the data-driven approach only produces valid results for specific spatial and temporal meshes, limiting its practical applications for reservoir simulation on an industrial grid scale of tens of millions (Li et al. 2020b; Nasir and Durlofsky 2023; Jiang and Durlofsky 2023).

Physics-informed or physics-constrained deep learning methods embed the governing equations in the loss function of neural networks (Haghighat and Juanes 2021; Raissi et al. 2019; Zhu et al. 2019). One popular approach is the physics-informed neural network (PINN) method, which integrates domain knowledge and physical principles into the architecture, thereby transforming the traditional numerical simulation process into an optimization task. A key advantage of this method is its ability to make accurate predictions for PDE solutions even with limited labeled data (Raissi et al. 2019). Researchers have employed the PINN method to solve the one-dimensional Buckley–Leverett problem involving two-phase flow in porous media (Fuks and Tchelepi 2020), simulate the two-dimensional single-phase groundwater flow process (Wang et al. 2021), and forecast variables such as pressure, gas saturation, the water extraction rate in \(\hbox {CO}_2\) EOR (Shokouhi et al. 2020) process. However, merely penalizing deviations from the governing physics equations or domain knowledge by adding hard physical constraints to the loss function is not suitable for large-scale spatiotemporal domain simulation problems and struggles with complex PDEs (Zhang et al. 2022). For PINN and its extension methods, changes in the parameters of the equations or the initial conditions require the model to be retrained. Pang et al. (2019); Yu et al. (2022); Wu et al. (2023).

In recent years, the deep neural operator has emerged as a novel approach designed to learn the infinite-dimensional mapping from PDE solution  (Bhattacharya et al. 2021; Lu et al. 2019; Li et al. 2020c). Unlike other deep learning methods such as CNN and PINNs, the deep neural operator is aimed at learning a family of PDEs. The trained model can achieve better generalization under varying PDE parameters and initial boundary conditions. The surrogate model established by this method has demonstrated promising potential for applications in reservoir engineering. However, due to the high cost of evaluating global neural integral operators, previously proposed deep neural operators have not yet reached the desired computational efficiency. The Fourier neural operator (FNO) is a new framework for solving PDEs, proposed by NVIDIA’s machine learning group in the 2021 International Conference on Learning Representations (ICLR) (Li et al. 2020b). FNO enhances the computational efficiency of previous deep neural operators through two-dimensional fast Fourier transform (FFT). Zhang et al. (2022) first evaluated the potential of the FNO surrogate model for two-dimensional oil–water subsurface flow. Wen et al. (2022) evaluated the potential of the FNO surrogate model for two-dimensional \(\hbox {CO}_2\)-water flows. Yan et al. (2022a) developed a deep learning workflow based on FNO to predict pressure distributions and \(\hbox {CO}_2\) plumes during injection and post-injection periods of geological \(\hbox {CO}_2\) sequestration (GCS) operations. Kuang et al. (2023) recently compared the performance of FNO and U-net for well rate forecasting in a two-dimensional fractured reservoir, and observed that the FNO-based approach showed faster training speed, higher accuracy, and more robust ability to describe fracture features.

For three-dimensional geometric systems (xyz), FNO suffers from high GPU memory consumption and computational cost due to an large number of model parameters, along with the use of computationally expensive three-dimensional FFT operations (Jiang et al. 2023). Wen et al. (2023) proposed a nested FNO architecture, which integrates the FNO with a semi-adaptive local grid refinement (LGR) modeling method for numerical simulations. This approach significantly reduces the computational costs associated with data collection in simulating four-dimensional problems. Furthermore, Yan et al. (2022b) developed a deep learning workflow that employs FNOs alongside feature coarsening methods, which improved the prediction accuracy of large-scale geological \(\hbox {CO}_2\) storage models, reducing the GPU memory requirements and enhancing the training efficiency. Overall, however, limited work has been done to reduce the time consumption of FNO during the training process for subsurface flow problems when dealing with four-dimensional spatiotemporal state variables.

In this work, to improve the computational speed of the FNO in a four-dimensional spatiotemporal reservoir, we developed a new framework that can simulate four-dimensional subsurface flows based on the three-dimensional FNO network with a domain decomposition method. This new framework decomposes the four-dimensional spatiotemporal domain in the z-dimension, trains the FNO network independently in the three-dimensional spatiotemporal domain in each z-layer, and predicts the distribution of pressure and flow fields of the four-dimensional spatiotemporal domain by combining the three-dimensional domain (xyt) results in all z-layers. This method employs three-dimensional FNOs along the x-axis and y-axis and incorporates the time component, thus circumventing the substantial computational expenses associated with direct predictions of four-dimensional flow state variables using existing FNOs. To demonstrate the applicability of our proposed approach to large and complex reservoirs, we conducted several tests, which included injecting \(\hbox {CO}_2\) into a reservoir with complex fractures in case 2 to predict the time-series flow state distribution. Additionally, in another real reservoir in case 3, we successfully established a surrogate model for well rate production under time-varying well conditions.

The remainder of this paper is organized as follows. Section 2 describes a new framework that extends FNO-net from the three-dimensional spatiotemporal domain to the four-dimensional spatiotemporal domain. Section 3 describes three test cases and discusses the corresponding results. In Sect. 4, the applications and potential extensions of the proposed approach are discussed. Section 5 concludes this study. The detailed loss functions for boundary conditions and initial conditions are provided in the Appendix.

2 Methodology

In this section, firstly, we describe the subsurface flow governing equations in porous media and the FNO-net architecture. Then, we introduce a novel methodology that simulates four-dimensional subsurface flow based on the FNO and domain decomposition method. The workflow of this new method, model training details, loss function design, and error measurement method are also illustrated. Compositional simulation is used in the study, and the governing equation for each component is written as

$$\begin{aligned} \frac{\partial }{\partial t}\left( \phi \sum _{j} (x_{i,j} \rho _j S_j) \right) + \nabla \sum _{j}\left( x_{i,j}\rho _jq_j\right) +\sum _{j} x_{i,j} \rho _j Q_j = 0, \end{aligned}$$
(1)

where subscript i denotes the primary fluid components, j denotes the fluid phase, t represents time, \(\phi \) is the porosity, \(x_{ij}\) is the concentration of component i in phase j, \(\rho _j\) is the fluid phase density, \(S_j\) is the phase saturation, \(Q_j\) is the source term, \(\mu _j\) is the viscosity of phase j, and \(q_j\) is the Darcy flux of phase j. Darcy’s law for each phase flow can be written as

$$\begin{aligned} q_j= -{\varvec{k}}\frac{k_{rj}}{\mu _j}(\nabla P -\rho _jg\nabla Z), \end{aligned}$$
(2)

where \(k_{rj}\) is the relative permeability of phase j that is a function of \(S_j\), P is pressure (capillary pressure is ignored in this study), g is the gravity constant, and Z represents depth.

Furthermore, we utilize an embedded discrete fracture model (EDFM) to simulate the fluid flow process in heterogeneous reservoirs with complex fracture distributions. Initially introduced by Li and Lee (2008), the EDFM has been further developed in subsequent studies by Moinfar et al. (2014) and Ţene et al. (2017). The fluid cross-flow between the porous matrix and fractures is described in this study by the following equation (Zeng et al. 2019)

$$\begin{aligned} q_{mf}= \frac{2l_{mf}({\varvec{K}}_m\cdot {\varvec{n}})\cdot {\varvec{n}}}{\mu }\nabla P, \end{aligned}$$
(3)

where \({\varvec{n}}\) is the normal unit vector of the fracture element, \({\varvec{K}}_m\) represents the absolute permeability tensor in the global meshes, \(l_{m_f}\) is the length between the fracture and mesh matrix, and \(\mu \) is the fluid viscosity. In the EDFM, the fracture element and matrix mesh are defined in two systems, and there is no requirement to generate complex meshes, which alleviates the computational cost for complex fracture modeling. Within a physics-driven reservoir simulator, the state variables such as pressure and saturation are computed through an iterative process.

2.1 FNO Architecture

The FNO operator is a newly proposed neural operator whose main objective is to learn the mapping between two infinite-dimensional spaces from the observed finite input–output pairs by the Fourier transform (Li et al. 2020a). The mapping is

$$\begin{aligned} G : A \rightarrow U , \end{aligned}$$
(4)

where \( A \) is the input function space, and \( U \) is the output function space. As shown in Fig. 1, the FNO approximation of this mapping mainly includes the following three steps: (1) The input observation a(x) is lifted to a potential representation \(v_{0}\) in higher-dimensional space through a simple shallow fully connected network \(v_{0}(x)=P(a(x))\). (2) Iterative Fourier layers are implemented, namely \(v_{0}\rightarrow v_{1} \rightarrow v_{2} \rightarrow \cdot \cdot \cdot \rightarrow v_{N}\). The concrete iterative process between elements of this sequence \(v_{n} \rightarrow v_{n+1}\) can be written as

$$\begin{aligned} v_{n+1}(x) = \sigma (W v_{n}(x) + ( K (a;\phi )v_{n}(x))(x)), \end{aligned}$$
(5)

where \(\sigma \) is the activation function, W is the network weight, and \(( K (a;\phi )v_{n}(x))(x)\) is the bias. The kernel integral transformation parameterized in a neural network is denoted as \( K \). The FNO kernel operator can be written as

$$\begin{aligned} ( K (a;\phi )v_{t}(x))(x) = F ^{-1}(R_{\phi }\cdot ( F v_{k}))(x), \end{aligned}$$
(6)

where \( F \) denotes the Fourier transform and \( F ^{-1}\) denotes the inverse Fourier transform. The linear transform \(R_{\phi }\) represents a restriction operator which can filter the higher models in Fourier space. (3) After N Fourier layer iterations, the output z(x) can be obtained by the fully connected neural network transformation Q, z(x) = \(Q(v_{N}(x))\). The full architecture of the FNO is shown in Fig. 1 and is described in detail in the original publication Li et al. (2021).

Fig. 1
figure 1

Full architecture of FNO: input PDE properties a(x) go through fully connected network P to \(v_{0}(x)\) and then go through the iteration in the Fourier layers to obtain \(v_{n+1}(x)\), and finally go through Q to the spatial dimension of the solution z(x)

2.2 New Methodology to Extend FNO-Net from Three Dimensions (xyt) to Four Dimensions (xyzt)

In this section, to circumvent the computational burden incurred by directly predicting four-dimensional flow state variables, our framework proposes the application of three-dimensional FNO across the x-axis and y-axis, together with the time component. The detailed procedure for the new framework is as follows: (1) Decompose the four-dimensional spatiotemporal domain in the z-dimension. (2) Train multiple FNO-net in the three-dimensional spatiotemporal domain. (3) Predict the state variables in subsequent time steps for the entire three-dimensional domain. (4) Obtain the four-dimensional spatiotemporal solution by re-coupling the three-dimensional prediction results in the z-dimension.

Fig. 2
figure 2

Framework of new methodology to extend FNO-net from three-dimensional (xyt) to four-dimensional (xyzt)

The whole framework can be divided into the following three parts in Fig. 2. Part 1 (red box in Fig. 2) illustrates the solution of global governing equations. After running numerical simulations, we obtain the simulation results for n time steps (\(M_{l_{(x, y, z, t_1)}}\), \(M_{l_{(x, y, z, t_2)}}\), \(M_{l_{(x, y, z, t_3)}}\),..., \(M_{l_{(x, y, z, t_n)}}\)). Part 2 (purple box in Fig. 2) depicts the decomposition of the simulation results across n time steps in the z-dimension. The decomposition process can be expressed as

$$\begin{aligned} \begin{array}{l} M_{l(x, y, z_{1}, t)} \text {::} M_{l(x, y, z_{1}, t_{1})}, M_{l(x, y, z_{1}, t_{2})}, M_{l(x, y, z_{1}, t_{3})}, \ldots , M_{l(x, y, z_{1}, t_{n})} \\ M_{l(x, y, z_{2}, t)} \text {::} M_{l(x, y, z_{2}, t_{1})}, M_{l(x, y, z_{2}, t_{2})}, M_{l(x, y, z_{2}, t_{3})}, \ldots , M_{l(x, y, z_{2}, t_{n})} \\ M_{l(x, y, z_{3}, t)} \text {::} M_{l(x, y, z_{3}, t_{1})}, M_{l(x, y, z_{3}, t_{2})}, M_{l(x, y, z_{3}, t_{3})}, \ldots , M_{l(x, y, z_{3}, t_{n})} \\ \quad \vdots \\ M_{l(x, y, z_{n}, t)} \text {::} M_{l(x, y, z_{n}, t_{1})}, M_{l(x, y, z_{n}, t_{2})}, M_{l(x, y, z_{n}, t_{3})}, \ldots , M_{l(x, y, z_{n}, t_{n})}{,} \end{array} \end{aligned}$$
(7)

where \(M_l\) denotes the flow field distributions of pressure, phase saturation, and component concentration in each phase;  :  :  denotes an operator of the domain decomposition which can decompose the simulation results of n time steps in the z-dimension. After domain decomposition in the z-dimension at every time step, we can obtain a simulation results matrix R. The matrix can be written as

$$\begin{aligned} R=\left[ \begin{array}{c} M_{l\left( x, y, z_{1}, t_{1}\right) }, M_{l\left( x, y, z_{1}, t_{2}\right) }, M_{l\left( x, y, z_{1}, t_{3}\right) } \cdots M_{l\left( x, y, z_{1}, t_{n}\right) } \\ M_{l\left( x, y, z_{2}, t_{1}\right) }, M_{l\left( x, y, z_{2}, t_{2}\right) }, M_{l\left( x, y, z_{2}, t_{3}\right) } \cdots M_{l\left( x, y, z_{2}, t_{n}\right) } \\ M_{l\left( x, y, z_{3}, t_{1}\right) }, M_{l\left( x, y, z_{3}, t_{2}\right) }, M_{l\left( x, y, z_{3}, t_{3}\right) } \cdots M_{l\left( x, y, z_{3}, t_{n}\right) } \\ \vdots \\ M_{l\left( x, y, z_{n}, t_{1}\right) }, M_{l\left( x, y, z_{n}, t_{2}\right) }, M_{l\left( x, y, z_{n}, t_{3}\right) } \cdots M_{l\left( x, y, z_{n}, t_{n}\right) } \end{array}\right] {.} \end{aligned}$$
(8)

The first row R in Eq. 8 denotes the first-layer results for n time steps. The second row of R contains the second-layer results for n time steps, and so on. Each column of R contains the simulation results from time step \(t_1\) to time step \(t_n\) that can be used as input data to train a three-dimensional FNO-net and predict the subsequent time step state variables at \(t_n+N\) (N is the number of predicted time steps). The training and prediction process can be written as

$$\begin{aligned} FNO _{i} \left( R(i,{:})^{t_{1}-t_{n}}\right) =\delta _{i}^{t_{n}+N}, \end{aligned}$$
(9)

where R(i,  : ) are the simulation results for the ith layer. In the training process, R(i,  : ) is first divided into a training dataset and a test dataset. Then these datasets are used as input data to the ith FNO-net (\( FNO _{i}\)) for training. Part 3 (blue box in Fig. 2) shows the re-coupling module for reassembling the prediction results of \(t_n+N\) time steps of each layer.

After FNO predicts the resulting state distribution at \(t_n+N\) for each layer, we collect the solutions for all layers and reconstruct the three-dimensional state distribution. The re-coupling process is

$$\begin{aligned} {\mathscr {C}}:\left\{ \delta _{1}^{t_{n}+N},\delta _{2}^{t_{n}+N},\delta _{3}^{t_{n}+N}... \delta _{n}^{t_{n}+N}\right\} = \delta ^{t_{n}+N}, \end{aligned}$$
(10)

where \({\mathscr {C}}\) represents the process of re-coupling the predicted results in z-dimension, and \(\delta ^{t_{n}+N}\) denotes the final results of four-dimensional subsurface flow distributions. In solving physical equations to obtain the training data needed for our model (state distribution data from previous time steps), the simulator already takes into account both vertical and cross-flows. Therefore, hierarchical training of FNO can capture the change of z-direction during the prediction process.

2.3 Loss Function Design and Evaluation Metrics

The total loss function employed in this study comprises three components: (1) \(L_2\)-loss, (2) boundary condition (BC) loss, and (3) initial condition (IC) loss. The total loss function is formulated as

$$\begin{aligned} \text {Loss}_\textrm{total}=L_2(u,{\hat{u}})+\lambda _1\text {Loss}_{BC}+\lambda _2\text {Loss}_{IC}. \end{aligned}$$
(11)

In our study, the reservoir is modeled with no-flow boundary conditions. Both BC and IC can be easily computed via penalty terms due to their straightforward structure. Details of the loss functions for boundary conditions and initial conditions are provided in the Appendix. The \(L_2\)-loss used to train the deep learning models is defined as Pecha and Horák (2018)

$$\begin{aligned} L_2(u,{\hat{u}})=\frac{\Vert u-{\hat{u}}\Vert _2}{\Vert u \Vert _2}+\beta \frac{\Vert du/dr-d{{\hat{u}}}/dr \Vert _2}{\Vert du/dr \Vert _2}, \end{aligned}$$
(12)

where \({\hat{u}}\) denotes the predicted output values, which can be a matrix, and du/dr denotes the first derivative of the input data; \(d{{\hat{u}}}/dr\) is the first derivative of the predicted output values, \(\Vert \Vert _2\) is \(L_2\) norm, and \(\beta \) is a hyperparameter of the network.

Now, we introduce several error metrics which will be used to evaluate the performance of the model proposed in this work. The relative error for predicted results is defined as

$$\begin{aligned} A_{i}= & {} \frac{1}{T}\frac{1}{N}\sum _{t=1}^{T}\sum _{n=1}^{N}\left| u_{i}- {\hat{u}}_{i} \right| , \end{aligned}$$
(13)
$$\begin{aligned} a_{i}= & {} \frac{1}{T}\frac{1}{N}\sum _{t=1}^{T}\sum _{n=1}^{N}\left| u_{i} \right| , \end{aligned}$$
(14)
$$\begin{aligned} \text {RE}_{i}= & {} \frac{A_{i}}{a_{i}}, \end{aligned}$$
(15)

where T represents the number of simulated time steps, N represents the number of grids, \(u_{i}\) represents the true value of pressure, phase saturation, component concentration in one phase, and so on, and \({\hat{u}}_{i}\) represents the corresponding predicted value in the output. The variable \(A_{i}\) is the average absolute error, \(a_{i}\) is the average value of the actual value, and \(\text {RE}_{i}\) represents the relative error.

3 Application Cases

In this section, we demonstrate the performance of our new framework across three cases. The first and second cases involve complex \(\hbox {CO}_2\) injection for enhanced oil recovery. The first case studies a three-dimensional heterogeneous reservoir measuring 250 feet \(\times \) 250 feet \(\times \) 50 feet with an eight-component fluid. The second case is a three-dimensional heterogeneous fractured reservoir, measuring 1,280 feet \(\times \) 1,280 feet \(\times \) 50 feet, with the same fluid components. The number of fractures in this case is 42. The FNO-net structure for each layer of training is illustrated in Table 7 for our two test cases. The third case focuses on dynamic production prediction under varying well control conditions in a three-dimensional heterogeneous gas reservoir measuring 1,120 feet \(\times \) 1,350 feet \(\times \) 670 feet. Details regarding data processing (preprocessing and post-processing), model training, and testing are discussed later in the section. Predictions and error measurements are provided for gas saturation, pressure, and \(\hbox {CO}_2\) concentration in the gas phase.

The construction of the four-dimensional FNO neural network structure in this paper is based on PyTorch, a popular open-source machine learning software library. The three different numerical cases in this work were implemented on a DELL Precision T7920 workstation with two Inter(R) Xeon(R) Gold 6230R CPUs@ 2.10GHz, and two NVIDIA GeForce RTX 3090 Ti GPUs. All code and data required to reproduce the results presented in the paper will be available at https://github.com/HPMPS upon publication.

3.1 Case 1: Prediction of Time Series State Distribution in a Three-Dimensional Reservoir

In this case study, we utilized the method proposed in Sect. 2 to predict subsurface flow state distribution, including the gas saturation, pressure, and \(\hbox {CO}_2\) concentration in the gas phase, in the subsequent time steps under a three-dimensional heterogeneous reservoir. The Stanford University new-generation reservoir simulator ADGPRS (Younis 2011; Zhou 2012) was used to generate synthetic data for model training and testing.

In case 1, the reservoir domain is discretized using a Cartesian grid with dimensions of 25\(\times \)25\(\times \)5 grid cells in the \(x\), \(y\), and \(z\) directions, respectively. Each grid cell measures 10 feet \(\times \) 10 feet \(\times \) 10 feet. The total number of grid cells is 3,125, and the reservoir contains two wells. The injection well is located at coordinates (25,25), and the production well is at (1,1), with each well perforating all five layers vertically. The \(\hbox {CO}_2\) injection rate is set at 3,500 Mscf/day at the injection well, and the total liquid production rate is 2,000 STB/day at the production well. Permeability distributions (\(k_x\), \(k_y\)) were generated using the SGeMS (Stanford Geostatistical Modeling Software) sequential Gaussian simulation for a grid size of 25\(\times \)25. The values for \(k_x\) and \(k_y\) in each layer are displayed in Fig. 3. The fluid behavior is modeled using the Peng-Robinson equation of state (EOS) (Robinson and Peng 1978), suitable for an eight-component fluid, with fluid properties summarized in Table 8.

Fig. 3
figure 3

Permeability fields \(k_x\),\(k_y\) (layer1 to layer5)

We take the pressure and saturation distribution of the first 10 time steps as input and the state distribution of last 40 time steps as output. The simulator runs for 10,500 days with 1,050 time steps, 10 days for each time step. After the numerical simulator runs, the four-dimensional subsurface flow state distribution results obtained by the simulation are decomposed into the three-dimensional state distribution of five layers in the z-direction according to Eq. 7. In each layer, the FNO model training, testing, and validation are described as follows. Dataset organization is such that each batch has 50 consecutive time steps: batch 1 is 1 to 50 time steps, batch 2 is 2 to 51 time steps, and so on. The last batch is 1,000 to 1,050 time steps. The total batch number is 1,000.

Figure 4 illustrates the preparation of batches. For each batch, the values of the first 10 time steps are used as input data to train the FNO-net and predict the state values as \({\hat{u}}\) in Eq. 12. Then, the true values in the remaining 40 time steps are introduced into Eq. 12 as u to obtain \(L_2\) loss. By combining BC loss and IC loss, we optimize the total loss in Eq. 11 by the Adam optimization method. Among the 1,000 batches, 800 batches were randomly selected as the training data set and the remaining 200 batches as the testing data set. This method of assigning training sets and verification sets is widely used in machine learning research (Hu and Zhang 2022). To demonstrate the prediction capability in time series, the values in the three 50, 250, and 450 time steps are used as the validation data and are excluded in training and testing. The input shape for the network is (1, 000, 25, 25, 1, 10), where 1,000 denotes the number of samples, 25 denotes the number of grids in the x-dimension, the other 25 is the number of the grids in the y-dimension, 1 is the required placeholder for variables, and 10 is the first 10 time steps in the time series. The output shape is (1, 000, 25, 25, 40), where 40 is the remaining 40 time steps. The results show that our model accurately predicted the subsurface flow state distributions in the future 40 time steps based on the current state distributions of 10 time steps. Because the three-dimensional FNO model training in a layer is independent of all other layers, we can train all three-dimensional FNO models in parallel to speed up the training process.

Fig. 4
figure 4

Dataset organization

Training for 500 epochs takes about 7,500 s with the data from 1,050 time steps on an NVIDIA GeForce RTX 3090 Ti GPU computer. Each epoch takes 15 s in training. Then, it takes 3.2 s to predict the outputs for subsurface flow state distributions in the future 40 time steps. For case 1, we studied the effect of different numbers of Fourier layers on the model accuracy. The results are shown in Table 1. The results of the ablation study show that our network structure with five Fourier layers can achieve the best performance in training time and average relative error. Therefore, all test cases in the work adopt the same network structure, which includes five Fourier layers.

Table 1 Effect of the different number of Fourier layers

We now consider the relationships between the relative total loss and the number of training runs. Figure 5 shows the losses for the five layers based on training and validation data over 500 epochs. Generally, decreasing the learning rate by 0.1 every 100 epochs has a positive effect on the entire training process. Both training and validation losses were reduced similarly in all scenarios, indicating that the optimization did not overfit the model to the training data. The relative total loss from training can be reduced to about 0.03 with 100 epochs, and the losses decrease steadily and rapidly. As the number of epochs increases, the relative total loss from training can be further reduced to about 0.01 with 500 epochs.

Fig. 5
figure 5

Training and testing accuracy versus the number of the epochs for case 1. Each epoch takes about 15 s in the NVIDIA GeForce RTX 3090 Ti GPU

Our model accurately predicted the gas saturation and pressure fields in the three-dimensional heterogeneous reservoir for the compositional modeling of the eight-component fluid. To investigate the relative error on every layer of our method, we employed Eqs. 1315 to evaluate the relative errors. Figure 6 displays the relative errors in pressure across five layers at three different testing time steps: 50, 250, and 450. The results, as shown in Fig. 6, indicate that the values of average relative error range from 0.2% to 1.2% for all test time steps.

Fig. 6
figure 6

Relative errors of pressure field in different layers (layer 1 to layer 5) with 500 epochs training for case 1. The orange line in each box indicates the relative error of the median, and the two sides of the bottom and top indicate the 25th and 75th percentile error, respectively. The “whiskers” protruding from the box indicate minimum and maximum errors

Figure 7 depicts the gas saturation fields obtained by both the simulator and our network model using 1,000 samples at the 250th time step. The figure demonstrates that the model achieves higher approximation accuracy for the strongly nonlinear and discontinuous gas saturation field, as expected. It also highlights areas where the position of front discontinuity has higher approximation errors in all instances. Increasing the training sample size may allow the model to capture better characterization of the local nonlinear features and improve accuracy in the gas saturation at the front where \(\hbox {CO}_2\) is injected. Even with a small portion of training data used in this study, the maximum error of the gas saturation remains within the acceptable range.

Fig. 7
figure 7

Prediction of gas saturation distribution in layer 1, layer 3, and layer 5 at the 250th time step (case 1): network results (first column), numerical simulation results (second column), errors (third column)

In addition, for the gas saturation distribution, we compare the prediction results of different layers at the same time step. For example, in Fig. 7, layer 1 (the first row), layer 3 (the second row), and layer 5 (the third row) demonstrate that as \(\hbox {CO}_2\) injection progresses, the rate of change in the gas saturation field in the deeper layers is slower than in the shallower layers. This phenomenon may be attributed to the influence of the gravity field in the z-direction. After numerical simulation of the three-dimensional reservoir, the field distribution at each time step, considering vertical flow, is obtained. We decouple the simulation results of the entire four-dimensional reservoir along the z-dimension to obtain \(n_z\) layer fields at each time step. These data are then used to train the three-dimensional FNO network models in every z-layer. Thus, the network accounts for the influence of vertical flow during the training process. The predicted results further validate that our model effectively captures the impact of the gravity field on the four-dimensional \(\hbox {CO}_2\) EOR simulation in the spatial z-axis.

As computations are independent in each layer, training and prediction in the four-dimensional spatiotemporal domain can be performed in parallel to expedite the process. We also developed four-dimensional visualization Python code to display the four-dimensional prediction results. Figure 8 shows the visualization of the four-dimensional gas saturation distribution for case 1 at the 50th (first row), 250th (second row), and 450th (third row) time steps, respectively. With this four-dimensional visualization software, we can quickly and clearly view the prediction results and overall errors of four-dimensional subsurface flow.

Fig. 8
figure 8

Visualization of four-dimensional gas saturation distribution: network results (first column), numerical simulation results (second column), errors (third column) for case 1 at the 50th (first row), 250th (second row), and 450th (third row) time steps, respectively

Here, we compared the performance of our proposed method with the original FNO method for time-dependent three-dimensional problems (four-dimensional problems) in terms of GPU memory consumption, number of parameters, and single-epoch training speed. Performing the fast Fourier transform (FFT) on a tensor with input data size N remains time-consuming, despite the optimized \(O(N \log N)\) complexity introduced by Cooley and Tukey (Cooley and Tukey 1965). The FFT layers are the most time-consuming steps in the FNO architecture, dominating the overall time complexity.

Table 2 presents the time and spatial complexity analysis for both methods. For input data with mesh size \(X \times Y \times Z \times T\), where T represents the total time snapshots and \(m_x, m_y, m_z, m_t\) are the FFT modes in each dimension:

  1. (1)

    The time complexity for training an epoch using the original FNO method is \(O((m_xX \cdot m_yY \cdot m_zZ \cdot m_tT) \log (m_xX \cdot m_yY \cdot m_zZ \cdot m_tT))\).

  2. (2)

    The time complexity for our proposed method is \(Z \times O((m_xX \cdot m_yY \cdot m_tT) \log (m_xX \cdot m_yY \cdot m_tT))\).

  3. (3)

    The spatial complexity for the original FNO method is \(O(m_xX \cdot m_yY \cdot m_zZ \cdot m_tT)\).

  4. (4)

    The spatial complexity for our proposed method is \(Z \times O(m_xX \cdot m_yY \cdot m_tT)\).

The improvement in time complexity is quantified as

$$\begin{aligned} R_{T(n)} \approx m_z \left( 1 + \frac{\log (m_zZ)}{\log (N)}\right) , \end{aligned}$$
(16)

where N is proportional to \(m_xX \cdot m_yY \cdot m_tT\). This indicates a reduction factor greater than \(m_z\) as the number of layers Z increases.

The improvement in space complexity is given by

$$\begin{aligned} R_{S(n)} \approx m_z, \end{aligned}$$
(17)

indicating that the space complexity reduction is directly related to the number of FFT modes in the z-direction.

Table 2 The time and spatial complexity analysis of the original FNO method and our proposed method on four-dimensional problems, \(N \propto m_{x}X \cdot m_{y}Y \cdot m_{t}T\)

In this experiment, we chose the dataset for case 1 with mesh sizes of 25, 25, 5, and 10. Considering the high GPU memory consumption of the original FNO method, we chose an NVIDIA A800 with 80-GB GPU memory for this comparison. The detailed results are shown in Table 3. Compared with the original FNO, our proposed method only has 6.3% of the number of parameters and needs much less GPU memory (<20%).

Table 3 Comparison between original FNO method and our proposed method on four-dimensional problems. Batch size is set as 1 for all the experiments

3.2 Case 2: Prediction of Time Series State Distribution in a Three-Dimensional Fractured Reservoir

In the second test case, the reservoir was divided uniformly into a 128 \(\times \) 128 \(\times \) 50 Cartesian grid, with each grid measuring 10 feet \(\times \) 10 feet \(\times \) 10 feet. Additionally, 42 fractures were integrated into the reservoir. The total number of grids is 81,920 with two wells. The injection well is located in (128,128) and the production well in (1,1). Both wells penetrate the whole reservoir vertically. The injection well operates under gas rate control at 3,500 \(\mathrm{Mscf/day}\), and the production well under a total liquid rate control of 2,000 \(\mathrm{STB/day}\). The well configurations are depicted in Fig. 9. We employed the SGeMS sequential Gaussian simulation to generate permeability distributions (kx, ky) with a grid size of 128 \(\times \) 128. The fracture parameters including position, permeability in horizontal directions, permeability in vertical directions, porosity in fracture, and aperture of fracture are listed in Table 9. The distribution of these fractures is shown in Fig. 9. In this case, we adopted the eight-component fluid model shown in Table 8, which is the same as those used in case 1.

Fig. 9
figure 9

Reservoir and wells in case 2. The blue square at the lower left corner of the configuration is the production well. The red square at the upper right corner of the configuration is the \(\hbox {CO}_2\) injection well. The blue lines at the domain are the 42 natural fractures

We ran ADGPRS to obtain synthetic data for training and testing, including pressure, gas saturation, and \(\hbox {CO}_{2}\) concentration in the gas phase. The simulation runs a total of 10,00 days with 1,050 time steps, 10 days for each time step. The selection of training data and the training process are the same as in case 1.

Table 4 The cost of training and prediction

We evaluated the relative error in the pressure field by Eqs. 1315. Figure 10 displays the relative error of pressure in five layers at the 50th, 250th, and 450th testing time steps. Note that the average values for relative error across all test time steps range from 0.5% to 1.4%. These results are from training over 500 epochs. For this case, training 500 epochs takes about 8,150 s with the data from 1,050 time steps on an NVIDIA GeForce RTX 3090 Ti GPU computer. Each epoch takes 16.3 s. Then, it takes 3.5 s to predict the outputs for subsurface flow state distributions in the subsequent 40 time steps. In comparison, the traditional solver ADGPRS, which uses the GMRES linear solver with the two-stage CPR preconditioners, takes 18.01 s for every single time step, and about 720 s for all 40 time steps. Table 4 shows the cost of training and prediction for case 1 and case 2. In this study, we chose 600, 800, 1,000, and 1,200 samples to optimize the number of batches. Table 5 shows that the frequency term 1,000 can keep the average relative error within a reasonable range.

Fig. 10
figure 10

Relative errors of pressure field in different layers (layer 1 to layer 5) with 500 epochs training for case 2. The orange line in each box indicates the relative error of the median, and the two sides of the bottom and top indicate the 25th and 75th percentile error, respectively. The “whiskers” protruding from the box indicate minimum and maximum errors

Fig. 11
figure 11

Cross-plots of predicted pressure and ground truth values for five layers (layer 1 to layer 5, in order from left to right, top to bottom) for case 2. Red dots represent results at the 50th time step, blue dots at the 250th time step, and green dots at the 450th time step

Table 5 Effect of the different numbers of frequency terms. ARE is the average relative error of pressure

For the pressure distribution, to evaluate the predictive capability of our network, we present cross-plots of predicted and ground truth values for five layers (layer 1 through layer 5) in Fig. 11. Figure 11 shows that all testing points fall near the \(\hbox {45}^{\circ }\) line, demonstrating that our predicted solutions are in close agreement with the ground truth values.

Figure 12 illustrates the comparison of the gas saturation distribution at the 250th time step between the simulator and our network model. Figure 12 shows that the distribution of gas saturation is very complex due to the presence of fractures, while the overall gas saturation error is within 0.06. The error on the boundary is within 0.01. Our model, as expected, achieves higher approximation accuracy for the strongly nonlinear and discontinuous gas saturation field in the fractured reservoir. In addition, the error of the model mainly exists in the injected gas front.

Fig. 12
figure 12

Prediction of gas saturation distribution in layer 1, layer 3, and layer 5 at the 250th time step (case 2): network results (first column), numerical simulation results (second column), errors (third column)

Figure 13 depicts a comparison of the \(\hbox {CO}_2\) concentration in the gas phase at the 450th time step between the simulator and our network model. Similar to the prediction of gas saturation, the maximum predicted error is around the gas front (about 0.1 mole fraction). The results also show that the error of \(\hbox {CO}_2\) concentration in the gas field is larger than that in the pressure field and gas saturation field. This discrepancy may be attributed to the compositional simulation, which presents a significant challenge to our model’s predictive capability for \(\hbox {CO}_2\) concentration in the gas phase. To address this issue, our subsequent work will involve incorporating physical laws related to thermodynamic phase equilibrium as constraints in our network loss function to enhance the prediction accuracy of \(\hbox {CO}_2\) concentration in the gas phase.

Fig. 13
figure 13

Prediction of \(\hbox {CO}_2\) concentration in the gas phase in layer 1, layer 3, and layer 5 at the 450th time step (case 2): network results (first column), numerical simulation results (second column), errors (third column)

Figure 14 shows the visualization of the four-dimensional gas saturation distribution for case 2 at the 50th, 250th, and 450th time steps, respectively. The figure reveals good agreement between the predicted values and reference results.

Fig. 14
figure 14

Visualization of four-dimensional gas saturation distribution: network results (first column), numerical simulation results (second column), errors (third column) for case 2 at the 50th (first row), 250th (second row), and 450th (third row) time step, respectively

3.3 Case 3: Well Production Prediction in the Three-Dimensional Reservoir with Varying Well Controls

In reservoir engineering applications, it is usually necessary to allocate daily or weekly production or injection rates among hundreds of wells for a fixed geological model. Reservoir engineers employ optimization algorithms in conjunction with surrogate models to adjust well controls based on predicted outputs. Surrogate modeling must accurately predict production under varying well control conditions. To demonstrate the performance of our proposed algorithm in well rate production prediction under time-varying well conditions, we test a new application case. This surrogate model built by our proposed algorithm can make a dynamic prediction of the well rate production under time-varying well conditions.

Fig. 15
figure 15

The overall architecture of the surrogate modeling process for the test case 3

The overall architecture of the surrogate modeling process for case 3 is shown in Fig. 15. The input is the pressure, saturation field at \(t_{n}\), and the specified well control at \(t_{n+1}\). The output is the pressure and saturation fields at \(t_{n+1}\). Considering that case 3 is a gas reservoir that produced very little water in the early stages of development, gas saturation was only used for this work. Subsequently, the pressure and saturation values in the well grids at \(t_{n+1}\) were extracted, and the production was predicted by the multilayer perceptron (MLP) network.

Case 3 involves a three-dimensional gas reservoir with an oil-gas-water system. The permeability distributions and the well configurations are shown in Fig. 16. Nine production wells are located in the reservoir, and all producers are controlled by the gas rate target (GRAT). The reservoir is discretized using a Cartesian grid with dimensions of 112 \(\times \) 135 \(\times \) 67 gird cells in the x, y, and z directions, totaling 1,013,040 grid cells.

Fig. 16
figure 16

The permeability field and the well configurations for test case 3

For dataset preparation, we employed a Computer Modelling Group Ltd. (CMG) IMEX simulator to develop and simulate multiple cases as the training and testing datasets. We simulate production from January 1, 2011, to September 1, 2022, 1 month for each time step. All well controls are adjusted once every month. The control conditions (GRAT) are from 50,362.200 to 252,411.000 \(m^{3}/day\) with normal distribution. A total of 550 well controls were generated and simulated, among which 500 cases were used for training and 50 for testing.

The network inputs for each FNO are shown in Table 6. The inputs can be divided into two types: space-dependent inputs (field inputs) and scalar inputs. Field inputs include state distribution of pressure and saturation at \(t_{n}\). Considering that case 3 is an irregularly shaped reservoir, we use zero-padding to denote cells that are outside the real reservoir. Therefore, the data structures for field input variables are matrices with dimensions (112, 135). The scalar inputs include the well controls at \(t_{n+1}\). The well controls are randomly sampled, ranging from 50,362.200 to 252,411.000 \(m^{3}/d\). The scalar variables are also broadcast into a matrix with dimensions (112, 135), and the values of other cells are filled with zeros. The final input data sample is constructed by connecting field variables, scalar variables, spatial grids, and temporal grids together. The final input data sample structure is (112, 135, 3).

Table 6 Summary of each FNO network input. The shape of the field inputs is (112, 135). The N[a, b] indicates that the scalar input value ranges from a to b. The scalar variables are also broadcast into a matrix with dimensions (112, 135)

Once the surrogate model is trained, it can predict the distribution of the pressure and saturation field and well production under various well control conditions. In the training process, we use the Adam optimizer to train for 500 epochs with an initial learning rate of 0.001 which is halved every 100 epochs. The neurons in the connection networks P and Q in Fig. 1 are activated using the sigmoid function.

Fig. 17
figure 17

Prediction of pressure distribution in layer 5, layer 15, and layer 35 at the 80th time step (case 3): network results (first column), numerical simulation results (second column), and errors (third column)

After the surrogate model was trained, we predicted pressure and saturation distribution at the 80th month. Figures 17 and 18 show that the predicted pressure and saturation distribution present a good match with the reference results. The medium relative errors for well pressure and saturation are 0.85% and 0.35%, respectively. Additionally, we presented the gas production results for the nine wells in Fig. 19, in which the blue dotted lines represent the results from the simulator, and the red lines represent the results from our model. Figure 19 shows that our model achieves very high accuracy in predicting well production. Figure 20 compares the results of the three-dimensional (xyz) gas saturation distribution for case 3 at time step 80.

Fig. 18
figure 18

Prediction of saturation distribution in layer 5, layer 15, and layer 35 at the 80th time step (case 3): network results (first column), numerical simulation results (second column), and errors (third column)

Fig. 19
figure 19

Gas rate prediction in case 3

Fig. 20
figure 20

Visualization of three-dimensional gas saturation distribution for case 3 at the 80th time step

For computational efficiency, the 550 cases were simulated on an Intel Xeon(R) W-3275 M dual CPU with 28 cores, and each run took approximately 1,500 s. On an NVIDIA GeForce RTX 3090 Ti GPU node with 24 GB video memory, it took 25,510 s to train our surrogate model. After training, the surrogate model took 125 s to forecast the outputs for all 50 test cases, with an average time cost of 2.5 s for each one. A speedup factor of over 600x is observed for this application scenario after model training.

In this case, the constructed surrogate model can provide flow predictions under time-varying well controls. By inputting the pressure and saturation field of the current time step and future well controls, the well production rates, pressure, and saturation distributions at the next time step can be predicted. The trained model has been tested under different well control scenarios, and the results showed that our proposed model can accurately predict the pressure and saturation fields. In addition, the production forecast for each well also matched the corresponding simulation results.

4 Discussion

We have extended the FNO from three-dimensional (xyt) to four-dimensional (xyzt) based on the domain decomposition method. Although the general four-dimensional (xyzt) FNO network, which is based on the three-dimensional FFT, is computationally very expensive and hard to employ in an application, we trained and used the four-dimensional (xyzt) FNO network from the two-dimensional FFT. Therefore, our developed framework retains the speed advantage of FNO in three dimensions (xyt) and can be applied to large-scale four-dimensional (xyzt) subsurface flow simulation. To demonstrate the developed four-dimensional FNO applicability in reservoir engineering, we use our method to build surrogate models in the dynamic prediction of a large three-dimensional reservoir (over 1 million grids) under changing well conditions in case 3. Excellent prediction is observed in well production and pressure and saturation fields.

Although we have been able to simulate the four-dimensional (xyzt) subsurface flow problems using the FNO network and the domain decomposition method, this was only done on orthogonal grids at this time. For the unstructured grid, more reservoir characteristics are not considered, such as production splitting, reservoir inclination, and so on. Recent studies have shown that complex geometric shapes can be fed into a normal FNO network through a spatial transformation (Li et al. 2022), and this work can be extended to deal with large-scale irregularly shaped geometry. We can combine our current work with irregularly shaped geometry techniques to build surrogate models for large-scale three-dimensional complex reservoirs and use these models in history matching, dynamic prediction of the reservoir, optimization of well conditions, uncertainty quantification, and other scenarios. For example, we can combine the proxy model established by our proposed method with particle swarm optimization (PSO) or simultaneous perturbation stochastic approximation (SPSA) algorithms to optimize the total oil production of the oilfield in the future production process by adjusting the water injection rate of all injectors. This could be extremely helpful when large amounts of simulation are required for production optimization on realistic reservoirs.

5 Conclusions

The FNO is a fast and promising neural network method to speed up the simulation of engineering applications. Because of the complexity of this method, up to now, it has only worked for three-dimensional (xyt) simulation problems and has had little use in practice, because practical simulations are always in the four-dimensional (xyzt) domain. In this work, we employ three-dimensional FNO along the x-axis and y-axis and incorporate the time component, thus circumventing the substantial computational expense associated with directly predicting four-dimensional flow state variables using existing four-dimensional FNO. This new methodology successfully extends the current FNO network from three-dimensional spatiotemporal domains to four-dimensional spatiotemporal domains for the subsurface flow simulations. The performance comparison results in case 1 indicate that, compared to the original FNO, our proposed method has only 6.3% of the parameters and requires less GPU memory (<20%). In addition, the single-epoch training speed is improved eightfold. The time complexity improvement of our proposed method is not dominated by the number of layers in the z-direction, and the space complexity is directly related to the number of FFT modes in the z-direction.

The new framework was successfully tested for some complex cases of \(\hbox {CO}_2\) injection for enhanced oil recovery by compositional simulations. The first case is a three-dimensional (250 feet \(\times \) 250 feet \(\times \) 50 feet) heterogeneous reservoir with an eight-component fluid. The second case is a three-dimensional (1,280 feet \(\times \) 1,280 feet \(\times \) 50 feet) heterogeneous reservoir with 42 fractures and an eight-component fluid. The framework is also applied to the surrogate model to accurately predict the pressure and saturation distribution of the reservoir and the well productions in an over-1-million-grid reservoir with nine wells (case 3). Key achievements in this work include the following:

  1. (1)

    Our new framework successfully extends the current FNO-network from three dimensions (xyt) to four dimensions (xyzt) to predict the state distributions in subsurface flow problems.

  2. (2)

    The computational speed in four dimensions (xyzt) by our method can be as fast as in three dimensions (xyt), which overcomes the very low speed of the original four-dimensional (xyzt) FNO method.

  3. (3)

    The tested results show that our new framework can efficiently simulate the complex EOR processes by injecting \(\hbox {CO}_2\) into complex fracture reservoirs in compositional simulation, and predict the well productions and the field pressure and saturation distribution in a large three-dimensional reservoir (over 1 million grids) with time-varying well conditions.

Future research will mainly involve the following aspects. In terms of network framework innovation, we will combine the current work with it and introduce the irregularly shaped geometry techniques to further process the large-scale three-dimensional complex reservoirs. In terms of application, we will establish the surrogate model that can describe the layered injection production relationship based on current work, and combine optimization algorithms to achieve intelligent parameter optimization, solving the challenges brought by time-consuming numerical simulators in traditional optimization processes.