
1 Introduction

After more than ten years of exploration and breakthroughs, China's shale gas resources have achieved commercial development, making shale gas the main direction for increasing gas reserves and production in the future. Compared with conventional oil and gas resources, the development performance of shale gas is constrained by two kinds of sweet spot factors, namely geological factors and engineering factors. As a result, it is more difficult to predict the estimated ultimate recovery (EUR) of shale gas wells. For new reservoir types or new layers in the exploration, evaluation and early development stages, the uncertainty of single well EUR is even greater, and the traditional single factor analysis method cannot meet the required prediction accuracy. In recent years, with the development of big data technology and intelligent algorithms, machine learning algorithms have been widely applied in various industries, including oil and gas exploration and development.

Oil and gas exploration involves many kinds of data, such as geological, geophysical, logging, and production data. Machine learning methods can take all of these data into consideration, which helps improve the accuracy of analysis results. Yu [1] reviewed the application of machine learning algorithms and intelligent optimization theory in reservoir parameter prediction, reservoir fluid identification, lithofacies identification, fracturing fracture identification, automatic well location optimization, advantageous channel classification, and CO2 oil displacement and storage. Liu [2] introduced unsupervised machine learning algorithms into the delineation of favorable zone boundaries and developed an intelligent delineation method for the favorable zone boundaries of complex reservoirs. First, sensitive reservoir data and an unsupervised clustering algorithm were used to cluster the reservoir distribution map; then, the favorable zone boundary of the reservoir was determined based on the delineation criteria for favorable zones.

Compared with oil and gas exploration, machine learning methods are more widely used in oil and gas field development. On the one hand, machine learning can predict development factors such as production capacity, yield, and single well EUR. On the other hand, based on these predictions, it can optimize the design of development plans and fracturing parameters. Ma et al. [3] proposed a machine learning workflow for predicting shale gas productivity; Hirsch Miller et al. [4] used the recursive elimination method to screen the main control factors and established a gas production prediction model with random forest to optimize engineering parameters; Sheikhi et al. [5] used linear regression, random forest, gradient boosting, tree regression, neural networks and other methods to establish a shale gas prediction model, and evaluated engineering parameters from both local and global perspectives; Shelley et al. [6] established a shale gas prediction model based on the BP neural network algorithm and used it to evaluate the quality of engineering parameters; Mao [7] reviewed the current research status of artificial intelligence in oil and gas fracturing, analyzed the key theoretical issues facing its development, and discussed the main research directions and application scenarios of fracturing artificial intelligence.

This paper takes comprehensive factors (geological data and fracturing data) into consideration and uses machine learning methods to predict single well EUR for shale gas. First, the integrity and validity of the collected data are analyzed, and multiple interpolation methods are used to fill in the missing values. Then, single factor analysis is employed to qualitatively analyze the correlation between EUR and the various types of data. Next, the XGBoost method is used to quantitatively characterize the correlation between single well EUR and the various types of data. Finally, three machine learning methods (XGBoost, RF, and SVR) are selected to predict single well EUR.

2 Research Method and Realization Process

2.1 Introduction to Machine Learning

Machine learning essentially allows computers to find patterns in data and predict future trends based on those patterns. The basic idea of machine learning is to imitate human learning behavior. For example, when we encounter new problems in reality, we usually summarize rules from experience and use them to predict what will happen. The checkers program proposed by Turing and developed by Samuel in the 1950s marked the official entry of machine learning into its development period. In the 1980s, the multilayer perceptron (MLP) trained by the back propagation (BP) algorithm brought machine learning into a renaissance period. The decision tree (DT) proposed in the 1990s, along with the later support vector machine (SVM) algorithm, transformed machine learning from a knowledge-driven approach to a data-driven approach. At the beginning of the twenty-first century, Hinton proposed deep learning (DL), which led machine learning research from a downturn into a flourishing period. Since 2012, with the improvement of computing power and the support of massive training samples, DL has become a hot research topic in machine learning and has driven widespread industrial applications.

2.2 Research Process

In this paper, machine learning algorithms are used to predict the EUR of single wells in the JS block of the F Shale Gas Field in China. After clarifying the geological and production characteristics of the JS shale gas block, a series of geological parameters, engineering parameters and production data are considered together. First, single factor analysis is employed to qualitatively describe the impact of different geological and engineering parameters on single well EUR. Then, the XGBoost method is used to quantify the importance of each parameter by recording the total number of feature splits and the total/average information gain, to calculate the weights of the parameters, and to screen the main control factors. Based on the selected main control factors, the SVR, RF, and XGBoost algorithms are used to predict single well EUR, and the adaptability of the three methods is analyzed and compared.

3 Field Application in JS Shale Gas Block

3.1 Basic Condition and Data Collection

3.1.1 Introduction to JS Shale Gas Block

The JS shale gas block in China is a medium-deep, high-pressure shale gas reservoir with excellent reservoir physical properties and moderate burial depth. After fracturing, a high open flow capacity can be obtained. The physical parameters of the JS shale gas block are listed in Table 1.

Table 1 Geological parameters of JS shale gas block

Since the first shale gas well in the JS shale gas block was put into production in 2012, a total of 257 wells have been drilled during the three-year construction period, of which 3 wells are in long-term shut-in due to fracturing failure. At present, 254 wells are in production, with an average well test productivity of 37.9 × 10⁴ m³/d. Since all shale gas wells in the study area have been producing for more than 7 years, the decline analysis method is used to predict the EUR of the 254 wells, and the average EUR per well is 1.91 × 10⁸ m³.
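The paper does not state which decline model is used in the decline analysis. As a minimal sketch, assuming the classical Arps decline family (exponential, hyperbolic and harmonic), the cumulative production down to an abandonment rate could be computed as follows. The function name, the parameter names and all numbers are hypothetical and only illustrate the idea.

```python
import math


def arps_cumulative(qi: float, di: float, b: float, q_limit: float) -> float:
    """Cumulative volume produced while the rate declines from qi to q_limit.

    qi       initial rate (volume per day)
    di       initial nominal decline rate (1/day)
    b        Arps exponent: 0 = exponential, 1 = harmonic, in between = hyperbolic
    q_limit  abandonment rate (volume per day), assumed here as the EUR cutoff
    """
    if not (0.0 < q_limit <= qi) or di <= 0.0 or not (0.0 <= b <= 1.0):
        raise ValueError("expected 0 < q_limit <= qi, di > 0 and 0 <= b <= 1")
    if b == 0.0:
        return (qi - q_limit) / di
    if b == 1.0:
        return qi / di * math.log(qi / q_limit)
    return qi**b / ((1.0 - b) * di) * (qi ** (1.0 - b) - q_limit ** (1.0 - b))


# Toy inputs, not taken from the JS block.
for b in (0.0, 0.5, 1.0):
    print(b, round(arps_cumulative(qi=10.0, di=0.001, b=b, q_limit=1.0)))
```

For the same initial rate and decline, a larger exponent b gives a slower decline and therefore a larger cumulative volume, which is why the choice of decline model matters for the EUR labels used to train the models.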

3.1.2 Data Collection and Diagnosis

Geological, engineering and production parameters of the 254 wells in the study area were collected. The geological parameters include burial depth, TOC, porosity, gas content, free gas content, adsorbed gas content, gas saturation, pressure coefficient and brittle mineral content. The engineering parameters include horizontal section length, number of fracturing sections, length in the main layer, sand volume, and fracturing fluid volume. The production parameters include the open flow rate from gas testing, cumulative flowback rate, and single well EUR.

Based on the degree of missing data, quality evaluation standards are established to diagnose the validity and integrity of the collected data. According to these standards, an automatic data quality control screening program was written in Python to control data quality in batches.
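The screening program itself is not given in the paper. The following is a minimal sketch of what a missing-data screen of this kind might look like. The two thresholds, the grade labels and the column names are hypothetical, since the actual evaluation standard is not stated.

```python
from typing import Dict, List, Optional

# Hypothetical thresholds: the paper does not state its grading standard.
GOOD_MAX_MISSING = 0.05
USABLE_MAX_MISSING = 0.30


def missing_fraction(values: List[Optional[float]]) -> float:
    """Fraction of entries that are missing (None) in one parameter column."""
    if not values:
        return 1.0
    return sum(v is None for v in values) / len(values)


def grade_columns(table: Dict[str, List[Optional[float]]]) -> Dict[str, str]:
    """Assign each parameter a quality grade from its missing fraction."""
    grades = {}
    for name, values in table.items():
        frac = missing_fraction(values)
        if frac <= GOOD_MAX_MISSING:
            grades[name] = "good"
        elif frac <= USABLE_MAX_MISSING:
            grades[name] = "usable"
        else:
            grades[name] = "reject"
    return grades


# Toy data, invented for illustration only.
wells = {
    "burial_depth_m": [2400.0, 2650.0, 2900.0, 3100.0],
    "toc_pct": [3.2, None, 4.1, 3.8],
    "gas_saturation_pct": [None, None, 62.0, None],
}
print(grade_columns(wells))
```

Columns graded as usable would be candidates for filling in missing values, while rejected columns would be left out of the later analysis.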

3.2 Analyses of Key Factors

The potential factors affecting the EUR of shale gas reservoirs include geological factors and engineering factors. Some factors have a great impact on the shale gas production while others have little impact. Qualitative analysis and quantitative analysis are carried out to evaluate the importance of collected factors.

3.2.1 Qualitative Analysis

A cross plot of each factor against EUR is a good way to show their relationship qualitatively. Cross plots of single well EUR per kilometer against the above variables are drawn (Figs. 1 and 2), and the correlations are assessed visually.

(1) Geological parameters

From the cross-plots in Fig. 1, it can be seen that reservoir burial depth, pressure coefficient, porosity and gas saturation have a greater impact on single well EUR. The burial depth affects the degree of compaction, crustal stress, ground temperature, etc. Fig. 1a shows that the burial depth is negatively correlated with single well EUR: the deeper the burial, the more strongly the shale reservoir is compacted and the harder it is to maintain effective fractures. Fig. 1b shows that the formation pressure coefficient is positively correlated with single well EUR. The formation pressure coefficient not only reflects the energy of the gas reservoir, but also affects the hydrocarbon expulsion ability of the shale reservoir and the degree of diffusion of shale gas. As can be seen in Fig. 1c, porosity is also an important factor. For conventional oil and gas reservoirs, porosity and permeability are the two key parameters for characterizing reservoir quality. An important enrichment form of shale gas is free gas stored in the pores and fractures of the shale. As a result, gas saturation has a large effect on EUR, as shown in Fig. 1d. The porosity and permeability of shale reservoirs also determine the amount of free gas. The correlations of gas content and TOC with single well EUR are shown in Fig. 1e–f. Gas content is the most direct factor affecting shale gas enrichment. It is determined by many factors, including enrichment and accumulation conditions, burial depth, TOC content, Ro, shale thickness, formation pressure, preservation conditions, etc. Gas content determines the enrichment of shale gas, and the correlation between free gas content and reserves is more obvious. The formation of a shale gas reservoir requires source rock with high total organic carbon content, moderate organic maturity, large thickness and a wide distribution of effective source rock. TOC and the recoverable reserves indicators are moderately correlated.

Fig. 1 Cross plot between EUR and different geological parameters
Six scatterplots with approximate best-fit lines (values estimated from the plots): (a) burial depth, from (2250, 2.6) to (3250, 0); (b) pressure coefficient, from (1.3, 0) to (1.6, 3); (c) porosity, from (3.1, 0.2) to (6, 2.3); (d) gas saturation, from (50, 0) to (70, 3); (e) TOC, from (2.1, 0.2) to (5, 2.3); (f) gas content, from (4.5, 1) to (7.9, 2).

(2) Engineering parameters

The complexity of artificial fractures is a key factor affecting the production of shale gas reservoirs, and horizontal section length, fracturing fluid volume, and sand volume are the key parameters describing the degree of fracturing. As shown in Fig. 2a, within a certain range, there is a positive relationship between the length of the horizontal section and recoverable reserves. When the horizontal section is longer than about 1500–1600 m, further extension is not conducive to increasing EUR per unit length. The volumes of fracturing fluid and sand reflect the degree of effective fracture opening in the reservoir, and these two parameters show a moderate correlation with EUR per unit length.

Fig. 2 Cross plot between EUR and different engineering parameters
Three scatterplots with approximate best-fit curves (values estimated from the plots): (a) EUR versus horizontal length, through (1000, 0.2), (1500, 2.3) and (1800, 0.2); (b) EUR versus sand volume, through (650, 0.2), (1000, 1.2) and (1300, 2); (c) EUR versus fracturing liquid volume, through (24000, 1), (35000, 1.5) and (45000, 1.7).

3.2.2 Quantitative Analysis

The traditional Pearson correlation coefficient can only measure the linear correlation between two variables, and it assumes that the variables are normally distributed. The distance correlation coefficient overcomes these shortcomings because it also captures nonlinear relationships between two variables and does not rely on the normality assumption. The distance correlation coefficient tests the independence between two variables: a value of 0 indicates that the two variables are independent, and the larger the value, the stronger the correlation between the two variables.
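The contrast between the two coefficients can be illustrated with a short sketch. It assumes the usual sample definition of distance correlation (double-centered pairwise distance matrices) for one-dimensional variables. The function names and the toy data are illustrative and not taken from the paper.

```python
import math
from typing import List, Sequence


def _double_centered(values: Sequence[float]) -> List[List[float]]:
    """Pairwise absolute distances with row, column and grand means removed."""
    n = len(values)
    dist = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in dist]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation of two equally long 1-D samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    a, b = _double_centered(x), _double_centered(y)
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n**2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n**2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    return 0.0 if denom == 0.0 else math.sqrt(max(dcov2, 0.0) / denom)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary Pearson correlation coefficient, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)


# Toy data: y depends on x, but not linearly.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [xi**2 for xi in x]
print("Pearson:", pearson(x, y))
print("Distance correlation:", distance_correlation(x, y))
```

For this symmetric toy example the Pearson coefficient is zero although y is fully determined by x, while the distance correlation is clearly positive.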

Based on the distance correlation coefficient method, the overall analysis of the whole area shows that geological factors have a greater influence than engineering factors. The top three factors are burial depth, pressure coefficient and free gas content (Fig. 3).

Fig. 3 Ranking of main control factors for EUR per kilometer using the distance correlation coefficient method
Bar graph values: depth 0.65, pressure coefficient 0.6, gas saturation (%) 0.5, porosity (%) 0.48, TOC (%) 0.23, sand volume 0.21, horizontal length 0.2, liquid volume 0.15.

The XGBoost algorithm can directly output the importance of each factor, because each decision tree in the model selects a suitable split point on a certain feature as it is built. During training, feature importance is quantified by recording the total number of times each feature is used for splitting and the total/average information gain of those splits. The weight calculation method counts the number of times a feature is used as a split attribute across all trees. In other words, the more often a feature is used for splitting in the trees of the model, the higher its importance.
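The three importance measures described above can be sketched from a log of splits. The `Split` record and the split log are hypothetical stand-ins for what a trained tree ensemble would contain. The key names follow the `weight`, `gain` and `total_gain` importance types of the xgboost Python package, but this sketch does not call that library.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Split(NamedTuple):
    """One internal tree node: the feature it splits on and the gain achieved."""
    feature: str
    gain: float


def importance(splits: Iterable[Split]) -> Dict[str, Dict[str, float]]:
    """Split count, summed gain and average gain per feature."""
    count: Dict[str, int] = defaultdict(int)
    total: Dict[str, float] = defaultdict(float)
    for s in splits:
        count[s.feature] += 1
        total[s.feature] += s.gain
    return {
        f: {"weight": count[f], "total_gain": total[f], "gain": total[f] / count[f]}
        for f in count
    }


# Hypothetical split log from a small ensemble; the gains are invented.
log = [
    Split("burial_depth", 4.0), Split("pressure_coefficient", 3.0),
    Split("burial_depth", 2.0), Split("sand_volume", 0.5),
    Split("burial_depth", 1.5), Split("pressure_coefficient", 1.0),
]
scores = importance(log)
ranking = sorted(scores, key=lambda f: scores[f]["weight"], reverse=True)
print(ranking)
```

Ranking by split count and ranking by gain can disagree, because a feature may be split on often with small gains, so it is worth checking more than one measure before screening factors.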

Based on the XGBoost algorithm, the JS shale gas block is evaluated as a whole, and the influencing factors are ranked by importance. In descending order, they are burial depth, pressure coefficient, fracturing fluid volume, free gas content, horizontal well section length, sand addition volume, TOC, porosity, and number of fracturing sections (Fig. 4).

Fig. 4 Ranking of main control factors for EUR per kilometer using XGBoost
Bar graph values: depth 481, pressure coefficient 439, gas saturation (%) 329, porosity (%) 262, TOC (%) 275, sand volume 289, horizontal length 329, liquid volume 339.

3.3 EUR Prediction Using Machine Learning Methods

Three machine learning algorithms, namely SVR, Random Forest (RF), and XGBoost, are compared using the main control factors identified above. The basic component of XGBoost is the decision tree. These decision trees are referred to as weak learners, and combining them forms the XGBoost model. During training, trees are added one after another, each fitting the residuals of the previous prediction. After training, the scores of all trees are summed to give the predicted value of a sample. The method has the advantages of fast speed, good performance, and the ability to process large-scale data. However, the algorithm has many parameters, requires complex parameter tuning, and is not suitable for ultra-high dimensional feature data. The predictions of the three models are compared with the real data in Figs. 5, 6 and 7.
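The residual-fitting idea behind boosted trees can be sketched in a few lines. The example below is plain gradient boosting with squared error and depth-1 trees (stumps). It is not XGBoost itself, which adds regularization, second-order gradient information and many engineering optimizations. All names and the toy data are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Row = Sequence[float]
Stump = Tuple[int, float, float, float]  # feature index, threshold, left value, right value


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def fit_stump(X: Sequence[Row], target: Sequence[float]) -> Stump:
    """Depth-1 regression tree minimising squared error on `target`."""
    best: Stump = (0, math.inf, _mean(target), _mean(target))  # fallback: no split
    best_sse = sum((t - best[2]) ** 2 for t in target)
    for f in range(len(X[0])):
        levels = sorted({row[f] for row in X})
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2.0
            left = [t for row, t in zip(X, target) if row[f] <= thr]
            right = [t for row, t in zip(X, target) if row[f] > thr]
            lm, rm = _mean(left), _mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if sse < best_sse:
                best, best_sse = (f, thr, lm, rm), sse
    return best


def predict_stump(stump: Stump, row: Row) -> float:
    f, thr, left_value, right_value = stump
    return left_value if row[f] <= thr else right_value


def fit_boosting(X, y, n_trees: int = 50, learning_rate: float = 0.3):
    """Return (base prediction, learning rate, stumps), fitted stage by stage."""
    base = _mean(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)  # the new tree fits what is still unexplained
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, row) for pi, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, row: Row) -> float:
    """Sum the scaled scores of all trees on top of the base prediction."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def mse(model, X, y) -> float:
    return _mean([(predict(model, row) - yi) ** 2 for row, yi in zip(X, y)])


# Toy data (invented): two features per well, one target value.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 6.0], [6.0, 2.0]]
y = [1.0, 1.2, 2.9, 3.1, 5.0, 5.2]
for n in (1, 10, 50):
    print(n, round(mse(fit_boosting(X, y, n_trees=n), X, y), 4))
```

The training error falls as trees are added, which also shows why the number of trees and the learning rate have to be tuned against a held-out test set rather than against the training data.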

Fig. 5 Comparison between real production data and prediction data with SVR
Line graph of EUR versus sample number (values estimated from the plot). Real data: (0, 2.2), (15, 2.65), (28, 5.4), (40, 2.5), (45, 3.3). Prediction data: (2, 3), (10, 1), (20, 1.7), (30, 2), (40, 2.5), (49, 2).

Fig. 6 Comparison between real data and prediction data with RF
Line graph of EUR versus sample number (values estimated from the plot). Real data: (0, 2.2), (15, 2.65), (28, 5.2), (40, 2.5), (45, 3.3). Prediction data: (2, 3.6), (10, 0.75), (20, 2), (30, 2), (40, 2.5), (49, 1.8).

Fig. 7 Comparison between real data and prediction data with XGBoost
Line graph of EUR versus sample number (values estimated from the plot). Real data: (0, 2.2), (15, 2.65), (28, 5.2), (40, 2.5), (45, 3.3). Prediction data: (2, 3.6), (10, 0.7), (20, 2.4), (30, 2), (40, 2.2), (49, 1.9).

A comparison of the prediction results of the three models shows that the XGBoost algorithm performs relatively well, with 35% of wells in the test set having a relative error of less than 20%. The test set contains 50 wells, and the proportion of wells with relative error within a certain range is shown in Fig. 8.
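The comparison metric used here, the share of wells whose relative error stays below a tolerance, can be computed as in the sketch below. The function names and the toy values are illustrative and are not the paper's data.

```python
from typing import List, Sequence


def relative_errors(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Absolute relative error |predicted - actual| / |actual| for each well."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    return [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]


def share_within(actual: Sequence[float], predicted: Sequence[float],
                 tolerance: float = 0.20) -> float:
    """Fraction of wells whose relative error is below `tolerance`."""
    errors = relative_errors(actual, predicted)
    return sum(e < tolerance for e in errors) / len(errors)


# Toy EUR values (invented), in arbitrary consistent units.
actual_eur = [2.0, 2.5, 1.0, 4.0]
predicted_eur = [2.2, 1.5, 1.1, 3.9]
print(share_within(actual_eur, predicted_eur))
```

Evaluating the same function separately on the training set, the test set and the whole data set gives the three groups of columns compared in Fig. 8.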

Fig. 8 Relative error comparisons between three methods
Column chart values (%). Training data: SVR 49, RF 70, XGBoost 82. Testing data: SVR 29, RF 35, XGBoost 35. Whole data: SVR 45, RF 63, XGBoost 72.

4 Conclusions

(1) Compared with conventional oil and gas, EUR prediction for shale gas carries more uncertainty because of the combined influence of geological and engineering conditions. It is difficult to quantitatively characterize the correlations between EUR and various parameters by traditional methods alone. This paper combines traditional single factor analysis with intelligent methods. First, single factor analysis is used to qualitatively analyze the correlation between EUR and various parameters, and then the XGBoost method is used to quantitatively characterize the impact of each factor on single well EUR.

(2) Through the combination of qualitative and quantitative analyses, the ranking of the main controlling factors for EUR is obtained. The controlling factors may differ between shale gas reservoirs. In our case, burial depth, pressure coefficient, porosity and gas saturation are the main geological factors with a key impact on the EUR of shale gas wells, and horizontal section length, sand volume and fracturing fluid volume are the main engineering factors.

(3) Three machine learning algorithms (SVR, RF and XGBoost) are employed to predict the EUR of single shale gas wells, and the three methods are compared. XGBoost gives the best predictions for the data in this paper.