1 Introduction

Porosity is defined as the ratio of the pore (void) volume of a rock to its total bulk volume; it therefore represents the capacity of the reservoir rock to store petroleum fluids [1, 2]. Accurate porosity estimation is critical because porosity strongly affects reserve estimates, petroleum economics, and field development plans [3,4,5]. Several methods are used to determine rock porosity. Downhole logging tools are one option, but this technique is costly because of the logging operation and tool costs, and the measurements can be affected by hole conditions such as mud contamination [6, 7]. Laboratory measurement is the most direct and accurate approach [8, 9]; however, coring and laboratory testing are time-consuming and expensive, and they do not provide a continuous porosity log [10, 11].
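As a simple worked example of this definition (the volumes are illustrative and not taken from this study), a rock sample with bulk volume \(V_b = 10\ \text{cm}^3\) and pore volume \(V_p = 2\ \text{cm}^3\) has a porosity of

$$\phi = \frac{V_p}{V_b} = \frac{2\ \text{cm}^3}{10\ \text{cm}^3} = 0.20$$

so 20% of its bulk volume is available to store fluids.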

A more recent technique characterizes the rock and measures its porosity from drilled cuttings; however, it requires cuttings of a specific size and advanced sample preparation [12].

Log-based approaches have also been studied in the literature, in which porosity is obtained from the sonic log or the rock density [13, 14]. Nuclear magnetic resonance (NMR) measurements have been introduced for determining rock porosity as well [15]. However, all of these techniques require logging data or laboratory measurements, which add cost and time.
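For context, log-derived porosity is commonly computed with textbook transforms such as the density-porosity relation and the Wyllie time-average equation for the sonic log. The sketch below is an illustration under stated assumptions, not necessarily the method used in [13, 14]: the matrix and fluid parameters (2.65 g/cm³ and 55.5 µs/ft for a quartz sandstone matrix, 1.0 g/cm³ and 189 µs/ft for fresh water) are typical default values chosen here, and the function names are hypothetical.

```python
def density_porosity(rho_bulk: float, rho_matrix: float = 2.65, rho_fluid: float = 1.0) -> float:
    """Porosity from bulk density: phi = (rho_ma - rho_b) / (rho_ma - rho_f).

    Densities in g/cm^3. Defaults assume a quartz sandstone matrix and fresh water.
    """
    return (rho_matrix - rho_bulk) / (rho_matrix - rho_fluid)


def wyllie_sonic_porosity(dt_log: float, dt_matrix: float = 55.5, dt_fluid: float = 189.0) -> float:
    """Porosity from the Wyllie time-average equation: phi = (dt - dt_ma) / (dt_f - dt_ma).

    Transit times in microseconds per foot. Defaults assume sandstone and fresh water.
    """
    return (dt_log - dt_matrix) / (dt_fluid - dt_matrix)


if __name__ == "__main__":
    print(round(density_porosity(2.32), 3))       # 0.2
    print(round(wyllie_sonic_porosity(82.2), 3))  # 0.2
```

Both transforms need a log run in the well plus assumed matrix and fluid properties, which is the cost this study aims to avoid.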

1.1 New Machine Learning Applications for Porosity Prediction

Machine learning (ML) techniques have contributed substantially to the analysis of petroleum data across different disciplines. ML tools such as artificial neural networks (ANNs), fuzzy logic (FL), expert systems, support vector machines (SVMs), functional networks (FN), and case-based reasoning have delivered high performance and accurate predictions [16]. These tools have been applied to many technical problems, including the estimation and optimization of drilling parameters [17,18,19,20,21], prediction and monitoring of drilling fluid properties [22,23,24,25,26,27], reservoir fluid properties [28,29,30,31,32,33], rock density [34], rock permeability [35, 36], and rock strength and geomechanical properties [37,38,39,40,41].

Porosity prediction using artificial intelligence techniques has been studied in the literature, as summarized in Table 1. The table lists the input parameters used to predict porosity, the formation types studied, and the ML techniques used to build each prediction model.

Table 1 ML applications for porosity prediction

Several studies predicted rock porosity from well-logging data such as density, neutron porosity, sonic transit time, resistivity, gamma ray (GR), and stratigraphic information [42,43,44]; however, these logs are not available for every well, and acquiring them requires additional logging operations. Drilling data have also been used to predict formation porosity, with inputs such as the rate of penetration (ROP), pump rate (Q), drill string rotating speed (RPM), standpipe pressure (SPP), torque (T), weight on bit (WOB), and mechanical specific energy (MSE) [45, 46]; however, these studies were restricted to certain formation types. As shown in Table 1, one study used drilling data for a carbonate formation drilled with a horizontal well [45], and another covered sandstone and shale formations but added mechanical specific energy as an input to the drilling data; moreover, that model had low accuracy, with a correlation coefficient of 0.6 between the predicted and actual porosity values [46].
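Mechanical specific energy is usually computed with Teale's relation, which adds the axial (WOB) and rotary (torque) work per unit volume of rock removed. The sketch below uses its common field-unit form; the function name and the 8.5-in bit diameter in the usage lines are illustrative assumptions, not values from [46] or from this study.

```python
import math


def mechanical_specific_energy(wob_lbf: float, torque_ftlbf: float, rpm: float,
                               rop_ft_per_h: float, bit_diameter_in: float) -> float:
    """Teale's mechanical specific energy (psi) in field units.

    MSE = WOB / A_b + 120 * pi * RPM * T / (A_b * ROP)
    with WOB in lbf, T in ft.lbf, RPM in rev/min, ROP in ft/h, and A_b the bit area in in^2.
    """
    if rop_ft_per_h <= 0:
        raise ValueError("ROP must be positive")
    bit_area_in2 = math.pi * bit_diameter_in ** 2 / 4.0
    axial_term = wob_lbf / bit_area_in2
    rotary_term = 120.0 * math.pi * rpm * torque_ftlbf / (bit_area_in2 * rop_ft_per_h)
    return axial_term + rotary_term


if __name__ == "__main__":
    # Inputs in this study are recorded in klbf and kft.lbf, so convert to lbf and ft.lbf first.
    wob_klbf, torque_kftlbf = 20.0, 8.0
    mse = mechanical_specific_energy(wob_klbf * 1000, torque_kftlbf * 1000,
                                     rpm=120, rop_ft_per_h=60, bit_diameter_in=8.5)
    print(f"MSE = {mse:,.0f} psi")
```

The present study does not use MSE; it relies only on the six surface drilling parameters.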

The novel contribution of this research is generating the formation neutron porosity log with high accuracy, using an ANN model and only the available surface drilling parameters, for drilled rocks of complex lithology. The study predicted porosity from drilling data collected while drilling complex-lithology intervals containing carbonate, sandstone, and shale formations. In addition, it presents a new ANN-based equation for easy estimation of rock porosity from drilling data. The resulting model can save the operational cost and time of logging or laboratory porosity measurements.

2 Methodology for Predicting the Rock Porosity

This research proposes an ANN model that predicts rock porosity from drilling data. The developed model generated the porosity profile with high accuracy for the whole drilled section, which contains complex formations. Figure 1 outlines the methodology for building a robust porosity prediction model. It starts with gathering data from the drilling sensors, followed by preprocessing (data cleaning, outlier removal, and smoothing to remove noise) to provide good-quality input parameters. The next phase is building the ANN model and optimizing it through training with the selected training algorithm. The model accuracy is then evaluated; if it is low, the model is retrained to find the optimum model parameters for accurate porosity prediction. Once an acceptable accuracy is achieved, the model parameters are saved and the results are reported.
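The train, evaluate, and retrain loop of this workflow can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation: `tune_until_accepted` is a hypothetical helper, the candidate configurations and the training and scoring callables are supplied by the user, and acceptance is assumed to be a simple threshold on R.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Config = TypeVar("Config")
Model = TypeVar("Model")


def tune_until_accepted(
    configs: Iterable[Config],
    train: Callable[[Config], Model],
    score: Callable[[Model], float],
    target_r: float,
) -> Optional[Tuple[Config, Model, float]]:
    """Train candidate configurations in order until one reaches the target accuracy.

    Returns (config, model, score) for the first accepted configuration, or the
    best-scoring one if none meets the target (None if no configurations were given).
    """
    best: Optional[Tuple[Config, Model, float]] = None
    for cfg in configs:
        model = train(cfg)   # (re)train with this set of model parameters
        r = score(model)     # evaluate, e.g. correlation coefficient on test data
        if best is None or r > best[2]:
            best = (cfg, model, r)
        if r >= target_r:    # accepted: stop and keep these parameters
            return cfg, model, r
    return best


if __name__ == "__main__":
    # Toy stand-ins: "training" returns the neuron count and the score grows with it.
    result = tune_until_accepted([15, 20, 25, 30], train=lambda n: n,
                                 score=lambda m: m / 30, target_r=0.8)
    print(result)  # (25, 25, 0.8333333333333334)
```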

Fig. 1 Methodology layout for building the ANN model

2.1 Data Description and Statistics

The data used in this study were collected while drilling the intermediate section of vertical wells. The drilled formations include more than one rock type (sandstone, shale, and limestone) and can therefore be considered complex lithology. After cleaning and preprocessing, 3767 readings of all the drilling parameters, together with the corresponding neutron porosity log values, were used to build the machine learning model. A separate data set of 1670 data points, collected from the same drilling phase, was used to validate the developed model. The surface drilling parameters are weight on bit (WOB) in klbf, torque (T) in kft·lbf, standpipe pressure (SPP) in psi, drill string rotary speed (RPM) in min⁻¹, rate of penetration (ROP) in ft/h, and mud flow rate (Q) in gpm.

2.2 Data Statistics and Analysis

The data collected from the drilling sensors contained errors arising from operational conditions and measurement tools. The data were therefore preprocessed with a purpose-written MATLAB code to remove missing measurements, noise, and outliers and to ensure data quality for developing the AI model. A statistical analysis of the cleaned data gives the minimum, maximum, mean, and standard deviation of each parameter, as shown in Table 2.

Table 2 Data statistical analysis
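The authors' MATLAB cleaning code is not given. As an illustration only, the three preprocessing steps described above can be sketched in Python with NumPy; the interquartile-range (IQR) outlier rule, the factor k = 1.5, and the 5-sample moving-average window are assumptions chosen for the example, not the settings used in this study.

```python
import numpy as np


def drop_missing(data: np.ndarray) -> np.ndarray:
    """Remove rows (depth samples) that contain any missing (NaN) reading."""
    return data[~np.isnan(data).any(axis=1)]


def drop_outliers_iqr(data: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Remove rows where any column falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(data, [25, 75], axis=0)
    iqr = q3 - q1
    inside = (data >= q1 - k * iqr) & (data <= q3 + k * iqr)
    return data[inside.all(axis=1)]


def moving_average(data: np.ndarray, window: int = 5) -> np.ndarray:
    """Smooth each column with a centered moving average of odd length `window`.

    Edges are padded by repeating the first/last sample so the output keeps the input length.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    kernel = np.ones(window) / window
    padded = np.pad(data, ((half, half), (0, 0)), mode="edge")
    return np.column_stack(
        [np.convolve(padded[:, j], kernel, mode="valid") for j in range(data.shape[1])]
    )


def preprocess(raw: np.ndarray) -> np.ndarray:
    """Missing-value removal, then outlier removal, then smoothing."""
    return moving_average(drop_outliers_iqr(drop_missing(raw)))
```

Each row is one depth sample and each column one drilling parameter (or the porosity log), so a row is dropped as a whole whenever any of its readings is missing or out of range.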

The statistics show that the drilling parameters and porosity span wide ranges, which should enhance the prediction capability of the developed AI model. WOB ranges from 1.5 to 26.7 klbf, T from 4.3 to 11.0 kft·lbf, SPP from 2140 to 3076 psi, RPM from 77.9 to 162.5 min⁻¹, ROP from 26.1 to 119.6 ft/h, and Q from 627 to 854 gpm. The target parameter, porosity, ranges from 0.055 to 0.429, covering very tight to highly porous rock. In terms of frequency, about 44% of the porosity values are below 0.2, 49% lie between 0.2 and 0.3, and only 8% exceed 0.3.
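The summary statistics in Table 2 and the porosity frequency split quoted above can be computed with a few NumPy calls. This is a sketch: the function names are hypothetical, the sample standard deviation (ddof = 1) is an assumption because the paper does not state which definition was used, and the bin edges follow the text (below 0.2, 0.2 to 0.3 inclusive, above 0.3).

```python
import numpy as np


def describe(values: np.ndarray) -> dict:
    """Minimum, maximum, mean, and sample standard deviation of one parameter."""
    v = np.asarray(values, dtype=float)
    return {"min": v.min(), "max": v.max(), "mean": v.mean(), "std": v.std(ddof=1)}


def porosity_class_shares(phi: np.ndarray) -> dict:
    """Percentage of samples with phi < 0.2, 0.2 <= phi <= 0.3, and phi > 0.3."""
    phi = np.asarray(phi, dtype=float)
    n = phi.size
    return {
        "below_0.2": 100.0 * np.count_nonzero(phi < 0.2) / n,
        "0.2_to_0.3": 100.0 * np.count_nonzero((phi >= 0.2) & (phi <= 0.3)) / n,
        "above_0.3": 100.0 * np.count_nonzero(phi > 0.3) / n,
    }
```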

Figure 2 shows the relationships between the drilling parameters (model inputs) and rock porosity (model output). Porosity has weak positive linear correlations with Q, RPM, WOB, ROP, and T, with correlation coefficients (R) of 0.299, 0.233, 0.151, 0.144, and 0.086, respectively, and essentially no linear correlation with SPP (R = −0.003). However, R measures only linear dependence, so the relationship between porosity and the drilling parameters may still be nonlinear.
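The caveat about nonlinearity can be illustrated with synthetic data (not the study's data): a variable that depends perfectly on another through a symmetric nonlinear function can still have a Pearson R of essentially zero, so low R values alone do not rule out useful predictors.

```python
import numpy as np

# Synthetic example: y is fully determined by x, but the dependence is symmetric (y = x**2).
x = np.linspace(-1.0, 1.0, 101)
y_nonlinear = x ** 2
y_linear = 0.5 * x + 0.1

r_nonlinear = np.corrcoef(x, y_nonlinear)[0, 1]
r_linear = np.corrcoef(x, y_linear)[0, 1]

print(f"R for y = x^2:        {r_nonlinear:+.3f}")
print(f"R for y = 0.5x + 0.1: {r_linear:+.3f}")
```

This is one reason a nonlinear model such as an ANN can extract more from the drilling parameters than their individual R values suggest.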

Fig. 2 Correlation coefficient of drilling data and rock porosity

2.3 Building and Evaluating the ANN Model

Artificial neural networks (ANNs) have been used to solve engineering problems through processing algorithms based on interconnected artificial neurons that mimic biological neural networks [47, 48]. A common ANN architecture consists of three layers: input, hidden, and output [49]. Weights and biases link the layers and govern network performance [50]. Different algorithms are used to train the model and control neuron processing [51]. Many researchers have extracted empirical correlations from ANN architectures for easier application in the petroleum industry [52, 53]. Several parameters were tested for their impact on ANN model accuracy, including the number of hidden layers, the number of neurons, and the network, training, and transfer functions. Figure 3 shows the design of the ANN model developed in this study.

Fig. 3 The structure of the ANN porosity model

The developed model was evaluated with two statistical measures, the correlation coefficient (R) and the average absolute percentage error (AAPE), calculated as follows:

$$R = \frac{N\sum_{i=1}^{N} Y_i \hat{Y}_i - \left(\sum_{i=1}^{N} Y_i\right)\left(\sum_{i=1}^{N} \hat{Y}_i\right)}{\sqrt{\left[N\sum_{i=1}^{N} Y_i^2 - \left(\sum_{i=1}^{N} Y_i\right)^2\right]\left[N\sum_{i=1}^{N} \hat{Y}_i^2 - \left(\sum_{i=1}^{N} \hat{Y}_i\right)^2\right]}}$$
(1)
$$\text{AAPE} = \left(\frac{1}{N}\sum_{i=1}^{N}\left|\frac{Y_i - \hat{Y}_i}{Y_i}\right|\right) \times 100$$
(2)

where \(N\) is the number of data points in the data set, \(Y_i\) is the actual output, and \(\hat{Y}_i\) is the predicted output.
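Eq. 1 and Eq. 2 translate directly into code. The sketch below is illustrative (the function names are hypothetical); it also checks Eq. 1 against NumPy's built-in Pearson correlation, with which it agrees.

```python
import numpy as np


def correlation_r(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Correlation coefficient R between actual (y) and predicted (y_hat) values, Eq. 1."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    numerator = n * np.sum(y * y_hat) - np.sum(y) * np.sum(y_hat)
    denominator = np.sqrt(
        (n * np.sum(y ** 2) - np.sum(y) ** 2) * (n * np.sum(y_hat ** 2) - np.sum(y_hat) ** 2)
    )
    return float(numerator / denominator)


def aape(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Average absolute percentage error, Eq. 2; y must not contain zeros."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)


if __name__ == "__main__":
    actual = np.array([1.0, 2.0, 4.0])
    predicted = np.array([1.1, 1.8, 4.0])
    print(round(aape(actual, predicted), 4))  # 6.6667
    print(np.isclose(correlation_r(actual, predicted), np.corrcoef(actual, predicted)[0, 1]))  # True
```

Because AAPE divides by the actual value, it is well defined here: the measured porosity never falls below 0.055.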

3 Results and Discussion

This section discusses the results of building and optimizing the ANN model for predicting porosity from drilling data.

3.1 Model Training and Testing

The model data set of 3767 records was randomly split 70:30 into a training set (2637 data points) and a testing set (1130 data points). The ANN model parameters were optimized over several runs to find the optimum number of neurons, network function, training function, transfer function, and set of drilling-data inputs. Figure 4 illustrates the impact of the number of neurons in the hidden layer on the training and testing results. Increasing the number of neurons from 15 to 30 improved model performance, increasing R and reducing AAPE for both the training and testing sets. In addition, a sensitivity analysis was performed to check the impact of the network function, training function, transfer function, and drilling-parameter inputs, as shown in Figs. 5, 6, 7, and 8, respectively.
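A random 70:30 split of the 3767 records reproduces the subset sizes quoted above. The sketch assumes NumPy's random generator with an arbitrary seed, and `split_train_test` is a hypothetical name; the exact partition used in the study cannot be reproduced this way, only its proportions.

```python
import numpy as np


def split_train_test(n_samples: int, train_fraction: float = 0.7, seed: int = 0):
    """Randomly split sample indices into training and testing subsets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]


if __name__ == "__main__":
    train_idx, test_idx = split_train_test(3767)
    print(len(train_idx), len(test_idx))  # 2637 1130
```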

Fig. 4 The impact of the number of neurons on the ANN model results

Fig. 5 The impact of changing the network function on the ANN model results

Fig. 6 The impact of changing the training function on the ANN model results

Fig. 7 The impact of changing the transfer function on the ANN model results

Fig. 8 The impact of the drilling parameters on the ANN model results

The best performance was obtained with a single hidden layer of 30 neurons, the function fitting neural network (fitnet) as the network function, Bayesian regularization backpropagation (trainbr) as the training function, the hyperbolic tangent sigmoid (tansig) transfer function, and all six drilling parameters as inputs. The optimized ANN model achieved correlation coefficients of 0.97 and 0.92 with AAPEs of 6.2% and 9.3% for the training and testing sets, respectively, as shown in Fig. 9.

Fig. 9 ANN model results: (a) training, (b) testing

The predicted porosity profile closely matches the actual profile over the drilled section, which includes different lithology types, as presented in Fig. 10.

Fig. 10 ANN porosity model results for the drilled section: (a) training, (b) testing

3.2 Model Validation

To verify that the developed model is practically applicable, it was validated on a different data set from the same field, covering the same complex-lithology rocks. The cleaned validation set (1670 data points) showed strong prediction performance for the porosity log from the surface drilling parameters, with R of 0.95 and AAPE of 8.5%, as shown in Fig. 11.

Fig. 11 ANN model validation results

3.3 A Developed Empirical Correlation for Porosity Estimation

A new nonlinear equation was extracted from the weights and biases of the developed ANN model so that users without access to AI tools can apply it. To use this equation, each input must first be normalized to the range −1 to 1 as follows:

$$X_{i_{\text{nor}}} = 2\left(\frac{X_i - X_{i\text{min}}}{X_{i\text{max}} - X_{i\text{min}}}\right) - 1$$
(3)

where \({X}_{{i}_{\mathrm{n}\mathrm{o}\mathrm{r}}}\) represents the normalized value for variable \(X\), \({X}_{i}\) is the value of variable \(X\), \(X_{{i{\text{min}}}}\) is the minimum value of variable \(X\), \(X_{{i{\text{max}}}}\) is the maximum value of variable \(X\).

The minimum and maximum values for all variables that are used for data normalization are presented in Table 3.

Table 3 Minimum and maximum values for data normalization
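Eq. 3 can be applied as below. Table 3 is not reproduced here, so this sketch assumes it lists the same minimum and maximum values as the statistics in Section 2.2; replace `RANGES` (a hypothetical name) with the Table 3 values if they differ.

```python
# Assumed normalization limits (min, max), taken from the data ranges in Section 2.2.
RANGES = {
    "WOB": (1.5, 26.7),       # klbf
    "T": (4.3, 11.0),         # kft.lbf
    "SPP": (2140.0, 3076.0),  # psi
    "RPM": (77.9, 162.5),     # 1/min
    "ROP": (26.1, 119.6),     # ft/h
    "Q": (627.0, 854.0),      # gpm
    "PHI": (0.055, 0.429),    # porosity, fraction
}


def normalize(value: float, name: str) -> float:
    """Map a raw value to [-1, 1] using Eq. 3."""
    x_min, x_max = RANGES[name]
    return 2.0 * (value - x_min) / (x_max - x_min) - 1.0


if __name__ == "__main__":
    print(normalize(2608.0, "SPP"))  # 0.0 (midpoint of the SPP range)
```

Inputs outside these limits produce normalized values beyond ±1, which means extrapolating outside the range of the training data.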

Eq. 4 gives the proposed ANN equation for predicting porosity in normalized form, using the weights and biases listed in Table 4.

$$\phi_{n} = \left[\sum_{i=1}^{N} w_{2_i}\left(\frac{2}{1 + e^{-2\left(w_{1_{i,1}}\text{WOB}_n + w_{1_{i,2}}T_n + w_{1_{i,3}}\text{SPP}_n + w_{1_{i,4}}\text{RPM}_n + w_{1_{i,5}}\text{ROP}_n + w_{1_{i,6}}Q_n + b_{1_i}\right)}} - 1\right)\right] + b_2$$
(4)

where \(\phi_{n}\) is the normalized porosity value; \(N\) is the number of neurons in the hidden layer (30); \(w_{1_{i,j}}\) is the weight connecting input \(j\) (\(j = 1, \ldots, 6\) for WOB, T, SPP, RPM, ROP, and Q) to hidden neuron \(i\); \(w_{2_i}\) is the weight connecting hidden neuron \(i\) to the output layer; \(b_{1_i}\) is the bias of hidden neuron \(i\); \(b_{2}\) is the bias of the output layer; and the subscript \(n\) denotes inputs normalized with Eq. 3.

Table 4 Weights and biases for the optimized porosity ANN model
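Eq. 4 describes a single-hidden-layer network with a tansig hidden layer and a linear output; the bracketed term 2/(1 + e^(−2x)) − 1 is identical to tanh(x). A 6-30-1 network has 6 × 30 + 30 + 30 + 1 = 241 weights and biases. Because the Table 4 values are not reproduced here, the sketch below uses random placeholder weights purely to show the array shapes and the computation; it is not the trained model, and the function names are hypothetical.

```python
import numpy as np


def tansig(x: np.ndarray) -> np.ndarray:
    """Hyperbolic tangent sigmoid, written as in Eq. 4."""
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0


def porosity_normalized(x_n: np.ndarray, w1: np.ndarray, b1: np.ndarray,
                        w2: np.ndarray, b2: float) -> float:
    """Evaluate Eq. 4.

    x_n: six normalized inputs ordered WOB, T, SPP, RPM, ROP, Q
    w1:  (N, 6) input-to-hidden weights, b1: (N,) hidden biases
    w2:  (N,) hidden-to-output weights,  b2: output bias
    """
    hidden = tansig(w1 @ x_n + b1)  # one activation per hidden neuron
    return float(w2 @ hidden + b2)  # linear output neuron


if __name__ == "__main__":
    n_hidden = 30
    rng = np.random.default_rng(1)  # placeholder weights, NOT the Table 4 values
    w1, b1 = rng.normal(size=(n_hidden, 6)), rng.normal(size=n_hidden)
    w2, b2 = rng.normal(size=n_hidden), float(rng.normal())
    print(w1.size + b1.size + w2.size + 1)  # 241
    x_n = np.zeros(6)  # all inputs at the midpoint of their ranges
    print(porosity_normalized(x_n, w1, b1, w2, b2))
```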

To convert the obtained normalized \(\phi\) to an actual \(\phi\) value, Eq. 5 can be used:

$$\phi = \frac{{\phi _{{\text{n}}} + 1}}{{5.3476}} + 0.055$$
(5)
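Eq. 5 is the inverse of Eq. 3 applied to porosity. Assuming Table 3 lists the porosity limits reported in Section 2.2 (\(\phi_{\text{min}} = 0.055\), \(\phi_{\text{max}} = 0.429\)), solving Eq. 3 for \(\phi\) recovers the constants in Eq. 5:

$$\phi = \frac{(\phi_n + 1)(\phi_{\text{max}} - \phi_{\text{min}})}{2} + \phi_{\text{min}} = \frac{\phi_n + 1}{2/(0.429 - 0.055)} + 0.055 = \frac{\phi_n + 1}{5.3476} + 0.055$$

since \(2/0.374 \approx 5.3476\). For example, a normalized output of \(\phi_n = 0\) gives \(\phi = 1/5.3476 + 0.055 \approx 0.242\), the midpoint of the porosity range.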

4 Conclusions

This study presented a novel approach for predicting rock porosity from drilling data while drilling complex-lithology formations (sandstone, shale, and carbonate) using an ANN. Two data sets were used: one for building the ANN model (3767 data points) and the other for validating the developed model (1670 data points). The main findings are as follows:

  • The optimized ANN model has one hidden layer with 30 neurons and uses fitnet as the network function, trainbr as the training function, tansig as the transfer function, and all six recorded drilling parameters as inputs.

  • The model achieved R of 0.97 and 0.92 with AAPE of 6.2% and 9.3% for training and testing, respectively.

  • Validation on a separate data set confirmed the strong prediction performance of the model, with R of 0.95 and AAPE of 8.5%.

  • The study also presented a new empirical correlation, extracted from the ANN model, for estimating porosity from drilling parameters in real time.

Real-time porosity estimation with the developed ANN model can save the cost and time otherwise spent on determining porosity through laboratory measurements or well-logging operations.