Introduction

The rate of penetration (ROP) is the speed at which the bit drills the formation. Increasing the ROP may result in many problems such as hole instability and poor hole cleaning. Therefore, an effective drilling process can be obtained by optimizing the rate of penetration at the lowest cost. The ROP is influenced by numerous factors, which can be classified into controllable and uncontrollable factors (Hossain and Al-Majed 2015).

Pump pressure is one of the significant parameters that affect the drilling performance and the ROP. Su et al. (2018) illustrated that optimizing the pump pressure is key to drilling a well with high efficiency.

Galle and Woods (1963) achieved a suitable ROP for roller cone bits while simultaneously maximizing the bearing life and minimizing bit tooth wear, assuming optimum bit, hydraulics, and mud properties selection. Teale (1965) defined the specific energy term, which represents the amount of energy required to remove a definite rock volume. He concluded that the minimum specific energy is of the same order as the rock compressive strength. This formulation was modified by Pessier and Fear (1992). Armenta (2008) developed a new correlation based on field and experimental data to identify inefficient drilling conditions. He mainly modified Teale’s formula by adding a hydraulic term to the original equation.

Walker et al. (1986) demonstrated that the reduction in the ROP as the well becomes deeper is related to the downhole pressures, which cause an increase in rock strength and ductility and inefficient hole cleaning. They concluded that the ROP is a function of the rock properties as well as the drilling parameters.

Warren (1987) claimed that the reduction in the ROP at high pressure is a function of local cratering effects and global cleaning effects. Winters et al. (1987) continued to improve the model developed by Warren (1987). They carried out laboratory drilling tests using roller cone bits. They considered numerous factors including bit design, operating conditions, and rock mechanics. They also considered the ductility of the rock previously introduced by Warren (1987) and regarded it as a highly influential factor. They did not consider the chip hold down effect.

Bilgesu et al. (1997) generated data from a drilling simulator, including WOB, RPM, pumping rate, formation hardness, and bit type, to eliminate any contribution from errors. They concluded that the ANN technique can be used to predict the ROP with very good accuracy even in the absence of some parameters. Fear (1999) used homogeneous data sets acquired from mud logs, bit characteristics, geological properties, and drilling operating parameters. He designed procedures and recommended them for reaching an optimized ROP during field operation.

Rumzan and Schmitt (2001) studied the effect of wellbore pressure and rock strength on ROP. They concluded that as the wellbore pressure increases, the ROP decreases. Hareland et al. (2010) developed a new simple ROP model using laboratory data obtained with roller cone bits. They then used it to estimate the unconfined rock strength. Wu et al. (2010) studied extensively the effect of some parameters, including rock type, insert type, and bit wear, on the resulting ROP.

Jahanbakhshi et al. (2012) developed an ANN model to predict the ROP using field data. They considered a wider range of parameters, including formation type, its corresponding mechanical properties, hydraulics, bit type, WOB, and RPM, to predict the ROP more precisely. Kowakwi et al. (2012) modified Warren’s model for roller cone bits by integrating terms that represent bit hydraulics, the chip hold down effect, and bit wear. They eliminated Warren’s assumption of perfect cleaning. This allowed better estimation of the ROP under real operating conditions.

Elkatatny (2017) constructed an ANN to estimate the ROP. He mainly included the mud properties to construct the network in a more appropriate manner. The ROP predicted by this network was clearly improved, which demonstrated the dependency of the ROP on mud properties. Bezminabadi et al. (2017) developed a new model to estimate the ROP using multiple non-linear regression as well as an ANN. They concluded that the ANN technique gives more precise results than the multiple non-linear regression model.

The objective of this paper is to use the mud properties and the drilling variables (WOB, RPM, T, Q, P, and UCS) to build a new ROP model based on ANN using field data. The ANN was combined with the self-adaptive differential evolution (SaDE) optimization method to determine the best combination of ANN parameters. A new empirical correlation for ROP will be developed based on the weights and biases of the optimized ANN model.

Artificial intelligence

Arabjamaloei and Shadizadeh (2011) stated that ANN is a computational technique, which is obtained from the construction features of biological neural networks. Schalkoff (1997) illustrated that ANN consists of neurons, where each neuron has a given input or output and they are connected together to form the network. A normal ANN consists of an input layer, some hidden layers, and an output layer. Every neuron of one layer is linked to every neuron in the following layer. Every connection has a related weight (Haykin 1998).

Ali (1994) explained that the relationship between the neuron and the source is controlled by the weights and the biases which are utilized also to monitor the input parameters. To overcome the common issue of over-fitting and under-fitting, the optimization process is very important to select the number of neurons and the number of hidden layers (Haykin 1998, Aalst et al. 2010).

Lippmann (1987) and Jain et al. (1996) illustrated that the ANN should first be trained to build the model; the developed model, with its weights and biases, can then be assessed using a new data set that was not seen during the training process.

Recently, artificial intelligence techniques have been extensively applied in the petroleum industry, especially in predicting well or field performance. Akcayol and Sagiroglu (2007) developed a neuro-fuzzy controller for tuning the output voltage of transformer rectifier units. Aguilar et al. (2009) developed an automated system for managing abnormal situations in artificial gas lift wells. He (2009) developed an accurate neural-based predictive control algorithm for online control of a force acting on industrial hydraulic actuators. Isa and Rajkumar (2009) developed a support vector machine (SVM) system to predict the loss of wall thickness of a pipeline.

AlAjmi et al. (2015) predicted the choke performance using an artificial neural network (ANN). Alarifi et al. (2015) estimated the productivity index for oil horizontal wells using ANN, functional network, and fuzzy logic. Chen et al. (2015) applied neural network and fuzzy logic to evaluate the performance of inflow control device (ICD) in a horizontal well. Elkatatny et al. (2016, 2017) applied the artificial neural network (ANN) comprehensively to determine the permeability of heterogeneous reservoir and to estimate the rheological properties of drilling fluids based on real-time measurements. Van and Chon (2017a, 2017b) evaluated the performance of CO2 flooding using artificial neural network techniques. They developed ANN models for determining oil production rate, CO2 production, and gas oil ratio (GOR). Choubineh et al. (2019) developed a new ANN model for minimum miscibility pressure of varied gas compositions over a wide range of conditions. Zhang et al. (2018) concluded that experimental and simulation results confirmed the accuracy and the ability of the developed ANN approach in predicting the real-time tidal level.

Wang et al. (2018) proposed a model that can be used to predict the rock properties from the mechanical drilling parameters in real time. Karakul (2018) concluded that polymer-based drilling fluid yields better wellbore stability than bentonite- or KCl water-based drilling fluid. Khandelwal and Singh (2011) developed an ANN model to estimate the elastic properties of schistose rocks, such as static Young’s modulus and static Poisson’s ratio, using compressive and tensile strength. Elkatatny et al. (2018a, 2018b) stated that the static Young’s modulus can be predicted with high accuracy from log data using the ANN or regression technique.

The self-adaptive differential evolution (SaDE) was introduced by Qin et al. (2009) to overcome the common issues of differential evolution (DE; Storn and Price 1997). The advantage of SaDE is its ability to self-adapt its control parameters and mutation strategies, based on the learning experience from previous generations of the algorithm, to obtain better results. Moussa and Awotunde (2018) developed a modified SaDE that can be used for optimization in different engineering problems.
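As an illustration of the underlying idea, the following is a minimal sketch of classic differential evolution (DE/rand/1/bin), the baseline that SaDE extends. It is not the SaDE algorithm of Qin et al. (2009): the mutation factor `F` and crossover rate `CR` are fixed here rather than self-adapted, and the function names, the population size, and the sphere test function are illustrative assumptions only.

```python
import numpy as np


def differential_evolution(objective, bounds, pop_size=20, generations=200,
                           F=0.5, CR=0.9, seed=0):
    """Classic DE/rand/1/bin minimizer with fixed F and CR (no self-adaptation)."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    low, high = bounds[:, 0], bounds[:, 1]
    dim = len(bounds)
    # Random initial population inside the box constraints.
    pop = low + rng.random((pop_size, dim)) * (high - low)
    fitness = np.array([objective(ind) for ind in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            candidates = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(candidates, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), low, high)
            # Binomial crossover; force at least one gene from the mutant.
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            trial = np.where(mask, mutant, pop[i])
            trial_fitness = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_fitness <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fitness
    best = int(np.argmin(fitness))
    return pop[best], fitness[best]


def sphere(x):
    """Toy objective (sum of squares), used only to exercise the optimizer."""
    return float(np.sum(x ** 2))


best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

In an ANN setting, the objective would instead train a network for a candidate set of hyperparameters and return its error; SaDE additionally learns which mutation strategy and which control parameter values to favor from the success history of earlier generations.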

The objective of this study is to apply the SaDE optimization technique to determine the best combination of ANN variables to be able to predict the ROP with a high accuracy using drilling mechanical parameters and fluid properties.

Methodology

Optimizing the ANN variable parameters manually is very difficult and requires a long time. The main parameters of the ANN that need optimization are the number of hidden layers, the number of neurons in each layer, the training function and transfer function, and the training-to-testing ratio. To overcome this issue, an automated system should be developed to determine the best combination of these parameters in order to obtain high accuracy in terms of the correlation coefficient (R) and the AAPE. The main goal of this study is to determine the ROP as a function of flow rate (Q), standpipe pressure (P), RPM, WOB, torque (T), UCS, mud density (D), and plastic viscosity (PV).

The optimization process will continue until the performance of the proposed SaDE-ANN model is acceptable (the values of estimated data from the model are very close to the corresponding experimental data).

After training the SaDE-ANN model on the randomly selected training data, the model is validated on unseen testing data, then a new empirical correlation is extracted to estimate ROP based on Q, P, RPM, WOB, T, UCS, D, and PV.

Data description

Data was collected from a vertical section (12.025″ hole section). The selected section consists of four carbonate layers with different uniaxial compressive strength values. The drilling fluid properties were collected from the daily drilling report and the drilling mechanical properties were collected from the well summary report. Data from two wells was collected. Well 1 was used to build the SaDE-ANN model (training and testing) and the data from well 2 was used to evaluate the accuracy and the generalization of the developed model.

To build the artificial intelligence model, well 1 data was used. Figure 1 indicates that the rate of penetration is a strong function of Q, RPM, T, WOB, and P; the R values are 0.92, 0.91, 0.86, 0.85, and 0.85, respectively. ROP is a moderate function of uniaxial compressive strength (UCS), with R = −0.52. ROP is also a strong function of mud density (R = −0.85) and of plastic viscosity (R = −0.74), Fig. 1. The negative sign of R indicates that increasing the mud density or the plastic viscosity reduces the ROP.

Fig. 1 Relative importance of the mechanical and fluid parameters with ROP for well 1

In order to reduce the number of input parameters, the ratio of T/UCS was taken as one parameter, horsepower (HP = the product of pressure and flow rate) was used to express the input of pressure and flow rate, and the ratio of D/PV was used as one input. Figure 2 shows that ROP has an R of 0.65 and 0.56 for HP and T/UCS ratio, respectively. ROP has an R of 0.70 for D/PV.
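The combined inputs described above can be sketched as follows. The text defines HP only as the product of pressure and flow rate; the divisor 1714, the usual field-unit conversion for hydraulic horsepower with P in psi and Q in gpm, is an assumption here, as are the function name and the sample values (taken from the reported data ranges purely for illustration).

```python
def combined_inputs(Q, P, T, UCS, D, PV):
    """Build the three combined parameters from the raw drilling variables.

    Q in gpm, P in psi, T in klbf-ft, UCS in psi, D in pcf, PV in cP.
    """
    # Assumption: hydraulic horsepower in field units, HP = P * Q / 1714.
    # The source only states "the product of pressure and flow rate".
    hp = P * Q / 1714.0
    return {
        "HP": hp,
        "T/UCS": T / UCS,
        "D/PV": D / PV,
    }


# Illustrative values only, picked from the reported ranges.
features = combined_inputs(Q=617.0, P=956.0, T=11.0, UCS=40000.0, D=76.6, PV=27.0)
```

Combining the variables this way reduces six raw inputs to three, at the cost of assuming that each pair acts on the ROP mainly through its product or ratio.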

Fig. 2 Relative importance of combined parameters with ROP for well 1

Building the SaDE-ANN model

Data from well 1 was used to build the SaDE-ANN model (2223 data points). The main idea of combining SaDE with the artificial neural network is to optimize the variable parameters of the ANN model automatically. The optimization procedure yields the combination of ANN parameters that gives the highest correlation coefficient (R) and the lowest AAPE. The ANN parameters to be optimized are the number of hidden layers, the number of neurons in each layer, and the training function and transfer function. An advantage of the proposed model is its ability to automatically determine the training-to-testing ratio, which requires a huge amount of time to determine if the ANN is used without SaDE. The main input parameters are RPM, WOB/d (WOB divided by the bit diameter d), HP, T/UCS, and D/PV, while the output parameter is ROP.

To train the model, a normalization should be done for both the input and output parameters. The following equations were used to normalize the parameters and they were extracted from the ANN model:

$$ RP{M}_n=0.016\ \left( RPM-68\right)-1 $$
(1)
$$ {\left(\frac{WOB}{d}\right)}_n=0.638\ \left(\frac{WOB}{d}-0.263\right)-1 $$
(2)
$$ {\left(\frac{T}{UCS}\right)}_n=1343.98\ \left(\frac{T}{UCS}-0.0002\right)-1 $$
(3)
$$ H{P}_n=0.00095\ \left( HP-344.134\right)-1 $$
(4)
$$ {\left(\frac{D}{PV}\right)}_n=1.212\ \left(\frac{D}{PV}-2.421\right)-1 $$
(5)
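Equations 1 to 5 share the form x_n = k (x − x_0) − 1, so they can be applied with a single helper. The sketch below uses the constants exactly as given in Eqs. 1 to 5; the dictionary and function names are illustrative assumptions.

```python
# (scale k, offset x_0) for each input, copied from Eqs. 1 to 5.
NORMALIZATION = {
    "RPM": (0.016, 68.0),        # Eq. 1
    "WOB/d": (0.638, 0.263),     # Eq. 2
    "T/UCS": (1343.98, 0.0002),  # Eq. 3
    "HP": (0.00095, 344.134),    # Eq. 4
    "D/PV": (1.212, 2.421),      # Eq. 5
}


def normalize(name, value):
    """Apply x_n = k * (x - x_0) - 1 for the named input."""
    scale, offset = NORMALIZATION[name]
    return scale * (value - offset) - 1.0


# An input equal to its offset maps to -1.
rpm_n = normalize("RPM", 68.0)
```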

The optimization process confirmed that the best ANN structure is 5-30-1: five input parameters (RPM, WOB/d, T/UCS, D/PV, and HP) in the input layer, thirty neurons in the single hidden layer, and one output parameter (ROP) in the output layer. The data was divided into 60% training and 40% testing. Bayesian regularization backpropagation (trainbr) was the optimum training function and the best transfer function was the logarithmic sigmoid (logsig). Table 1 lists the statistical parameters of the training data (1334 data points). Q ranged from 617 to 1172 gpm. RPM ranged from 68 to 129 rpm. WOB ranged from 4.50 to 54.39 klbf. T ranged from 11 to 40 klbf-ft. P ranged from 956 to 3577 psi. D ranged from 76.6 to 78.4 pcf. PV ranged from 20 to 27 cP. UCS ranged from 21,000 to 40,000 psi, and ROP ranged from 3.85 to 122.20 ft/h.
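The 60/40 division of the 2223 well 1 points gives 1334 training and 889 testing points. A minimal sketch of such a random split is shown below; the function name, the seed, and the use of a simple random permutation are assumptions, since the exact sampling procedure is not described.

```python
import numpy as np


def split_indices(n_samples, train_fraction=0.6, seed=0):
    """Randomly split sample indices into training and testing subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = round(train_fraction * n_samples)  # 0.6 * 2223 = 1333.8 -> 1334
    return order[:n_train], order[n_train:]


train_idx, test_idx = split_indices(2223)
```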

Table 1 Statistical parameters of the training data (1334 data points)

Figure 3 shows that by using the optimized combination of the SaDE-ANN model parameters, the ROP was predicted with high precision. The R was 0.98 and the AAPE was 5% for the training data. Figure 4 shows that the coefficient of determination (R²) was 0.97 when the measured and predicted ROP values are plotted.
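The two accuracy measures can be computed as sketched below. The formulas are not spelled out in the text, so the usual definitions are assumed: R as the Pearson correlation coefficient and AAPE as the mean of |actual − predicted| / |actual|, expressed in percent. The function names and the three sample values are illustrative only.

```python
import numpy as np


def correlation_coefficient(actual, predicted):
    """Pearson correlation coefficient R between actual and predicted values."""
    return float(np.corrcoef(actual, predicted)[0, 1])


def aape(actual, predicted):
    """Average absolute percentage error, in percent (assumed definition)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)


# Illustrative ROP values in ft/h, not taken from the field data.
rop_actual = [10.0, 20.0, 40.0]
rop_predicted = [11.0, 18.0, 40.0]
r_value = correlation_coefficient(rop_actual, rop_predicted)
aape_value = aape(rop_actual, rop_predicted)
```

Because AAPE divides by the actual value, it weights errors at low ROP more heavily than errors at high ROP, which is worth keeping in mind when comparing it with R.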

Fig. 3 ROP prediction using SaDE-ANN model for training data (1334 data points)

Fig. 4 Coefficient of determination between actual and predicted ROP using the SaDE-ANN model

For testing the developed ROP model, 889 unseen data points were used. Table 2 shows the statistical analysis of the testing data. Q ranged from 683 to 1172 gpm. RPM ranged from 75 to 129 rpm. WOB ranged from 5.03 to 53.89 klbf. T ranged from 12.86 to 39.19 klbf-ft. P ranged from 1493 to 3577 psi. D ranged from 76.6 to 78.4 pcf. PV ranged from 20 to 27 cP, UCS ranged from 21,000 to 40,000 psi, and ROP ranged from 3.85 to 122.20 ft/h. Figure 5 shows that the developed SaDE-ANN model was able to predict the ROP with an R of 0.98 and an AAPE of 5.6%. This result confirmed the accuracy of the developed SaDE-ANN ROP model.

Table 2 Statistical parameters of the unseen testing data (889 data points)

Fig. 5 ROP prediction using SaDE-ANN model for testing data (889 unseen data points)

Development of a new ROP empirical correlation

Using the weights and biases of the optimized SaDE-ANN model, a new empirical correlation for ROP prediction was developed. Equation 6 can be used to find the ROP in normalized form and Eq. 8 can be used to determine the ROP in de-normalized form.

$$ RO{P}_n={\sum}_{i=1}^N{W}_{2_i}\ \frac{1}{1+{e}^{-{X}_i}}+{b}_2 $$
(6)
$$ {X}_i=\frac{1}{1+{e}^{-\left({w}_{1_{i,1}} RP{M}_n+{w}_{1_{i,2}}{\left(\frac{WOB}{d}\right)}_n+{w}_{1_{i,3}}{\left(\frac{T}{UCS}\right)}_n+{w}_{1_{i,4}}{\left(\frac{D}{PV}\right)}_n+{w}_{1_{i,5}}{(HP)}_n+{b}_{1_i}\right)}} $$
(7)
$$ ROP=\frac{RO{P}_n+1}{0.01689}+3.6 $$
(8)
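A minimal sketch of how Eqs. 6 to 8 could be evaluated is given below. It follows the equations as printed, including the logistic term that appears in both Eq. 6 and Eq. 7. The function names and the array layout (`W1` of shape N × 5, with columns ordered as in Eq. 7) are assumptions, and the zero-valued weights are placeholders, not the values of Table 3.

```python
import numpy as np


def logsig(z):
    """Logarithmic sigmoid, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def rop_from_correlation(x_n, W1, b1, W2, b2):
    """Evaluate Eqs. 6 to 8 as printed.

    x_n: normalized inputs ordered as in Eq. 7:
         [RPM_n, (WOB/d)_n, (T/UCS)_n, (D/PV)_n, HP_n].
    W1: (N, 5) hidden-layer weights, b1: (N,) hidden-layer biases.
    W2: (N,) output weights, b2: scalar output bias.
    """
    X = logsig(W1 @ x_n + b1)                    # Eq. 7
    rop_n = float(np.sum(W2 * logsig(X)) + b2)   # Eq. 6
    return (rop_n + 1.0) / 0.01689 + 3.6         # Eq. 8


# Placeholder parameters for N = 30 hidden neurons (NOT the Table 3 values).
N = 30
x_n = np.zeros(5)
rop = rop_from_correlation(x_n, np.zeros((N, 5)), np.zeros(N), np.zeros(N), -1.0)
```

With all weights set to zero and b2 = −1, the normalized output is −1 and Eq. 8 returns the lower bound of 3.6 ft/h, which gives a quick sanity check on the de-normalization step.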

Table 3 lists the weights (W1, W2) and biases (b1, b2) that are used in Eqs. 6 and 7.

Table 3 Weight and biases for the hidden layer of SaDE-ANN-based ROP model

To assess the accuracy of the proposed equation, the unseen data was used. Figure 6 shows that the R² was 0.96 when plotting the actual and predicted ROP for the unseen data. The AAPE was around 5.6%, confirming the high accuracy of the developed ROP equation (Eq. 8).

Fig. 6 Prediction of ROP using Eq. 8 for the unseen data

For further validation of the developed ROP correlation, well 2 data were used (2651 data points). Table 4 lists the statistical parameters for well 2 data. Q ranged from 793 to 1170 gpm. RPM ranged from 98 to 127 rpm. WOB ranged from 4.6 to 47 klbf. T ranged from 11.5 to 26 klbf-ft. P ranged from 1187 to 2568 psi. D ranged from 76.6 to 78.3 pcf. PV ranged from 20 to 27 cP, UCS ranged from 21,000 to 40,000 psi, and ROP ranged from 8.53 to 49.12 ft/h.

Table 4 Statistical parameters of the validation data (2651 data points of well 2)

Figure 7 shows the high accuracy of the ROP prediction using Eq. 8. The AAPE was 4.3% and the R was 0.92.

Fig. 7 ROP prediction using the developed ROP correlation (Eq. 8) and different models for well 2

Comparison with previous models

The developed ROP correlation was compared with three previous ROP models: Maurer (1962), Bingham (1965), and Bourgoyne and Young (1974).

Maurer (1962) derived an ROP formula (Eq. 9) for roller cone bits assuming perfect hole cleaning conditions. He assumed that the applied WOB was far larger than the threshold WOB required by the bit teeth to penetrate the rock, which is only valid in soft formations. Figure 7 shows that the Maurer model underestimated the ROP values from the top of the formation to 2000 ft of the formation section, while it overestimated the ROP in the lower part of the intermediate section, which has a higher compressive strength. The accuracy of the ROP prediction was very low; R was 0.04.

$$ ROP=k\ \frac{RPM\cdot WO{B}^2}{d^2\cdot UC{S}^2} $$
(9)

Bingham (1965) considered the ROP a function of WOB and RPM and integrated the bit diameter into his formula (Eq. 10). He neglected the threshold WOB required for penetration. He also agreed on the non-linear relation between ROP and WOB but maintained that the WOB exponent must be determined experimentally. UCS is not included in the Bingham model, which resulted in a very low accuracy for predicting ROP (Fig. 7). The R was 0.06 and the AAPE was 11%.

$$ ROP=k\ {\left(\frac{WOB}{d}\right)}^a\ RP{M}^b $$
(10)

Bourgoyne and Young’s (1974) model (Eq. 11) was used to estimate the rate of penetration. Figure 7 shows the inaccuracy of this model for predicting the ROP; the correlation coefficient was 0.41 and the AAPE was 12%. This model ignored the rock strength and the mud properties, which yielded a low accuracy.

$$ ROP={e}^{\left({a}_1+{\sum}_{i=2}^8{a}_i{x}_i\right)} $$
(11)
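For reference, the three comparison models can be written as short functions, as sketched below. The exponential in Eq. 11 is read here as giving the ROP itself. The constants k, a, b, and a_1 to a_8 are empirical and must be fitted to data; the function names and argument order are illustrative assumptions, and the terms x_2 to x_8 of Eq. 11 are left to the caller because their definitions are not given in the text.

```python
import numpy as np


def maurer_rop(k, rpm, wob, d, ucs):
    """Eq. 9: ROP = k * RPM * WOB^2 / (d^2 * UCS^2)."""
    return k * rpm * wob ** 2 / (d ** 2 * ucs ** 2)


def bingham_rop(k, wob, d, rpm, a, b):
    """Eq. 10: ROP = k * (WOB / d)^a * RPM^b."""
    return k * (wob / d) ** a * rpm ** b


def bourgoyne_young_rop(a, x):
    """Eq. 11: ROP = exp(a_1 + sum_{i=2}^{8} a_i * x_i).

    a: eight coefficients [a_1, ..., a_8]; x: seven terms [x_2, ..., x_8].
    """
    a = np.asarray(a, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.exp(a[0] + np.dot(a[1:], x)))


# Dimensionless toy numbers chosen only to exercise the formulas.
rop_maurer = maurer_rop(k=1.0, rpm=100.0, wob=10.0, d=10.0, ucs=10.0)
rop_bingham = bingham_rop(k=2.0, wob=8.0, d=2.0, rpm=3.0, a=2.0, b=1.0)
rop_by = bourgoyne_young_rop(np.zeros(8), np.ones(7))
```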

In terms of AAPE, Fig. 8 confirms that the developed SaDE-ANN correlation outperformed the three models for predicting the ROP. The Maurer model yielded 43%, the Bingham model yielded 11%, and Bourgoyne and Young’s model yielded 12%, while the SaDE-ANN correlation yielded 4.3%.

Fig. 8 Comparison between the developed SaDE-ANN equation with previous models

Conclusions

The SaDE technique was applied to optimize the ANN variable parameters and determine the best combination of these variables. The obtained results showed that ROP has a strong relationship with the drilling variables RPM, WOB, T, and HP, while ROP is a moderate function of UCS. The optimized ANN structure is 5-30-1: five input parameters in the input layer, 30 neurons in the hidden layer, and ROP as the output parameter in the output layer. The new ROP-ANN model can predict ROP with a correlation coefficient of 0.98 and an AAPE of 5%. The developed ROP empirical correlation outperformed the previous ROP models.