Introduction

One of the most critical factors affecting the total cost of drilling an oil well is the time required to complete the drilling operations (Lyons and Plisga, 2004). Rig time depends on several parameters, including the rate of penetration (ROP), which is the footage drilled per hour; therefore, the ROP is considered the most important factor controlling rig time and drilling cost (Barbosa et al., 2019).

Several parameters control the ROP, and they can be divided into two main categories: controllable and uncontrollable (Hossain and Al-Majed, 2015). The mud injection rate (Q), drillstring rotation speed (DSR), standpipe pressure (SPP), torque (T), and weight on bit (WOB) are considered controllable parameters (Eren and Ozbayoglu, 2010; Mitchell and Miska, 2011; Payette et al., 2017), while the drilling fluid type, rheological properties, and density, as well as the drill bit size, are uncontrollable parameters. Quantifying the effect of the uncontrollable parameters on the ROP is complicated because a change in any one of them affects the others (Osgouei, 2007). The ROP is also affected by hole-cleaning conditions, especially in inclined and horizontal wells (Mahmoud et al., 2020a).

Several previous studies developed models for ROP estimation. These models vary considerably in accuracy because each one uses a different set of parameters to calculate the ROP, which significantly limits their applicability (Soares et al., 2016; Soares and Gray, 2019).

ROP prediction models fall into two types: traditional models and data-driven models (Hegde et al., 2017). The traditional models are empirical correlations developed by regression analysis, whereas the data-driven models are developed with machine learning techniques.

Empirical equations for prediction of the ROP

Maurer (1962) developed the first empirical correlation for estimating the ROP when drilling with rolling-cutter bits. Maurer’s correlation, Eq. (1), estimates the ROP as a function of the WOB, DSR, drill bit size, and rock strength.

$$ ROP=\frac{k}{S^2}{\left(\frac{WOB}{d_b}-\frac{W_t}{d_b}\right)}^2 RPM $$
(1)

where k is the constant of proportionality; S is the compressive strength of the formation; WOB is the weight on bit, klbf; Wt is the threshold bit weight; db is the drill bit diameter, in.; and RPM is the drillstring rotation speed (DSR), rpm. Because Wt is much smaller than the WOB, the threshold term Wt/db in Eq. (1) can be set to zero for simplicity.

Bingham (1965) developed another empirical ROP correlation, Eq. (2). In his model, Bingham combined the effect of rock strength into the constant of proportionality and replaced the constant exponent of 2 in Eq. (1) with a variable exponent (a5).

$$ ROP=k{\left(\frac{WOB}{d_b}\right)}^{a_5} RPM $$
(2)

where k is the constant of proportionality, which also includes the effect of rock strength, and a5 is the WOB exponent.
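To make the relationship between Eqs. (1) and (2) concrete, the sketch below evaluates both correlations in Python. It is a minimal illustration that assumes k, S, a5, and the threshold weight are already known for the formation; this section gives no values, so the numbers in the example are arbitrary. With the threshold neglected, Bingham’s form with k replaced by k/S² and a5 = 2 reproduces Maurer’s.

```python
def maurer_rop(k, s, wob, wob_threshold, d_b, rpm):
    """Eq. (1): ROP = k / S^2 * (WOB/db - Wt/db)^2 * RPM (Maurer, 1962)."""
    return k / s**2 * (wob / d_b - wob_threshold / d_b) ** 2 * rpm


def bingham_rop(k, wob, d_b, rpm, a5):
    """Eq. (2): ROP = k * (WOB/db)^a5 * RPM (Bingham, 1965); k absorbs rock strength."""
    return k * (wob / d_b) ** a5 * rpm


# Arbitrary illustrative inputs (not field data).
k, s, wob, d_b, rpm = 2.0, 3.0, 20.0, 8.0, 100.0
rop_maurer = maurer_rop(k, s, wob, 0.0, d_b, rpm)        # threshold neglected
rop_bingham = bingham_rop(k / s**2, wob, d_b, rpm, 2.0)  # rock strength folded into k, a5 = 2
print(rop_maurer, rop_bingham)
```

Bingham’s extra freedom lies entirely in a5: any value other than 2 gives a response to WOB that Eq. (1) cannot represent.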

Neither Eq. (1) of Maurer (1962) nor Eq. (2) of Bingham (1965) accounts for the effects of differential pressure, bit hydraulics, bit wear, or formation compaction on the ROP, which considerably reduces the accuracy of the ROP estimated with these models.

Bourgoyne and Young (1974) proposed Eq. (3) for ROP estimation and drilling optimization. Their model accounts for the effects of formation compaction, depth, and strength; bit diameter, wear, and hydraulics; and the bottom-hole pressure differential, WOB, and DSR on the ROP.

$$ \frac{dD}{dt}={e}^{\left[{a}_1+{\sum}_{j=2}^8{a}_j{x}_j\right]} $$
(3)

where D is the true vertical depth of the well, ft; t is time; a1 to a8 are the drilling parameter coefficients; and x2 to x8 are dimensionless functions of the actual drilling parameters. The term a1 accounts for formation strength, a2x2 and a3x3 for formation compaction, a4x4 for the pressure differential, a5x5 for the bit diameter and WOB, a6x6 for the DSR, a7x7 for bit tooth wear, and a8x8 for the bit hydraulic jet. The variables xj are calculated with Eqs. (4) to (10).

$$ {x}_2=\mathrm{10,000}-D $$
(4)
$$ {x}_3={D}^{0.69}\left( pore\ pressure\ gradient-9.0\right) $$
(5)
$$ {x}_4=D\left( pore\ pressure\ gradient- equivalent\ circulation\ density\right) $$
(6)
$$ {x}_5=\ln\left[\frac{\frac{WOB}{d_b}-{\left(\frac{WOB}{d_b}\right)}_t}{4.0-{\left(\frac{WOB}{d_b}\right)}_t}\right] $$
(7)
$$ {x}_6=\mathit{\ln}\left[\frac{RPM}{100}\right] $$
(8)
$$ {x}_7=- fractional\ tooth\ height\ worn\ away $$
(9)
$$ {x}_8=\frac{\rho\ q}{350\ \mu\ {d}_n} $$
(10)

where (WOB/db)t is the threshold bit weight per inch of bit diameter; ρ is the drilling mud density, lb/gal; q is the drilling fluid flow rate, gal/min; μ is the drilling mud viscosity, cP; and dn is the bit nozzle diameter, in.
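The sketch below evaluates Eqs. (3) to (10) in Python. It assumes the usual field units of the model: depth in ft, pore pressure gradient and ECD in lb/gal (the 9.0 in Eq. (5) being a normal gradient in lb/gal), bit weight per inch of bit diameter in 1000 lbf/in, and nozzle diameter in in. The coefficients a1 to a8 must be fitted to field data for each formation; the values in the example are placeholders, not fitted coefficients, and the function names are hypothetical.

```python
import math


def by_x_terms(depth_ft, pore_grad_ppg, ecd_ppg, wob_per_in, wob_per_in_threshold,
               rpm, tooth_wear_frac, mud_density_ppg, flow_gpm, viscosity_cp, nozzle_in):
    """Dimensionless drilling variables x2..x8 of Eqs. (4)-(10) (assumed field units)."""
    x2 = 10_000.0 - depth_ft                                   # Eq. (4) normal compaction
    x3 = depth_ft ** 0.69 * (pore_grad_ppg - 9.0)              # Eq. (5) undercompaction
    x4 = depth_ft * (pore_grad_ppg - ecd_ppg)                  # Eq. (6) pressure differential
    x5 = math.log((wob_per_in - wob_per_in_threshold)
                  / (4.0 - wob_per_in_threshold))              # Eq. (7) bit weight
    x6 = math.log(rpm / 100.0)                                 # Eq. (8) rotary speed
    x7 = -tooth_wear_frac                                      # Eq. (9) tooth wear
    x8 = mud_density_ppg * flow_gpm / (350.0 * viscosity_cp * nozzle_in)  # Eq. (10) hydraulics
    return [x2, x3, x4, x5, x6, x7, x8]


def by_rop(a, x):
    """Eq. (3): ROP = exp(a1 + sum_{j=2..8} a_j x_j); a = [a1..a8], x = [x2..x8]."""
    if len(a) != 8 or len(x) != 7:
        raise ValueError("expected 8 coefficients and 7 x-terms")
    return math.exp(a[0] + sum(aj * xj for aj, xj in zip(a[1:], x)))


# Placeholder inputs at the model's reference conditions (not field data).
x = by_x_terms(10_000.0, 9.0, 9.0, 4.0, 0.5, 100.0, 0.0, 10.0, 250.0, 20.0, 0.5)
rop = by_rop([math.log(10.0)] + [0.0] * 7, x)  # placeholder coefficients
```

At 10,000 ft, normal pore pressure, an ECD equal to the pore pressure gradient, 4.0 (1000 lbf/in), 100 rpm, and an unworn bit, x2 to x7 all vanish, so the model reduces to exp(a1 + a8x8); this is consistent with reading a1 as the formation-strength term.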

Applications of machine learning techniques for evaluation of the rate of penetration

Machine learning techniques have been applied in many scientific research areas (Hag Elsafi, 2014; Babikir et al., 2019), including the petroleum industry, where they have been used to solve difficult problems such as prediction of formation tops (Elkatatny et al., 2019), identification of lithology (Ren et al., 2019), evaluation of drill bit wear from drilling parameters (Arehart, 1990), estimation of drilling fluid rheology (Elkatatny, 2017; Abdelgawad et al., 2018), estimation of total organic carbon (Mahmoud et al., 2017a, 2017b, 2019a, 2019b, 2020b), estimation of the oil recovery factor (Mahmoud et al., 2017c, 2019c), prediction of pore and fracture pressures (Ahmed et al., 2019a, 2019b), evaluation of the static Young’s modulus (Mahmoud et al., 2019d, 2020c, 2020d, 2020e), optimization of the rate of penetration (Al-AbdulJabbar et al., 2018; Mahmoud et al., 2020f), and detection of downhole anomalies (Alsaihati et al., 2021).

An artificial neural network (ANN) is a machine learning tool inspired by biological neural networks and designed to mimic animal brains. In its simplest form, an ANN consists of three layers: an input layer, a single training (hidden) layer, and an output layer. Each layer contains a set of connected neurons that model the nodes of a biological brain. Each neuron in the input layer represents one input parameter, the number of neurons in the training layer is optimized to predict the target parameter, and the neuron in the output layer represents the output parameter. Different training and transfer functions are usually evaluated during model optimization to find the weights and biases of the input, training, and output layers that give the best prediction of the target parameter.
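The layer arithmetic described above can be sketched as a single forward pass in Python with NumPy. This is a generic illustration, not the trained network of this study: the function name, the default tanh activation (the same shape as a tangential sigmoid transfer function), and the example weights are all hypothetical.

```python
import numpy as np


def ann_forward(x, w_hidden, b_hidden, w_out, b_out, activation=np.tanh):
    """One forward pass of an input -> training (hidden) -> output network.

    x: (n_inputs,), w_hidden: (n_neurons, n_inputs), b_hidden: (n_neurons,),
    w_out: (n_neurons,), b_out: scalar. The single output neuron is linear.
    """
    hidden = activation(w_hidden @ x + b_hidden)  # weighted sum plus bias, then transfer function
    return float(w_out @ hidden + b_out)          # weighted sum of hidden outputs plus output bias


# Hypothetical two-input, two-neuron network.
x = np.array([1.0, 2.0])
w_hidden = np.eye(2)
b_hidden = np.zeros(2)
w_out = np.array([1.0, 1.0])
y_tanh = ann_forward(x, w_hidden, b_hidden, w_out, 0.5)
y_linear = ann_forward(x, w_hidden, b_hidden, w_out, 0.5, activation=lambda z: z)
```

Training consists of adjusting w_hidden, b_hidden, w_out, and b_out so that the output matches the measured target; the forward pass itself stays the same.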

To overcome the low accuracy of ROP estimates from empirical equations developed by linear regression analysis, Bilgesu et al. (1997) introduced artificial intelligence for ROP estimation and proposed two artificial neural network (ANN) models to estimate the ROP in nine formations. Their first model used the formation type; the bit diameter, tooth wear, bearing wear, and type; the gross drilling hours; mud circulation; WOB; footage; and DSR as inputs to predict the actual ROP, while the second model excluded bearing wear and bit tooth wear from the inputs. Both models estimated the ROP accurately.

Amar and Ibrahim (2012) proposed two further ANN models to predict the ROP from the formation depth, DSR, WOB, equivalent circulating density (ECD), formation pore pressure gradient, a Reynolds number function, and bit tooth wear. Their ANN-based models estimated the ROP more accurately than the available empirical equations.

Elkatatny (2018) developed an equation for estimating the ROP in vertical wells based on an optimized ANN. The equation estimates the ROP from the surface measurable drilling parameters Q, DSR, T, WOB, and standpipe pressure (SPP), combined with the drilling fluid properties plastic viscosity (PV) and mud weight (MW). Evaluated on field data, the equation estimated the ROP with an average absolute percentage error (AAPE) of 4%, compared with AAPEs above 10% for the available empirical equations.

Al-AbdulJabbar et al. (2020) optimized an ANN model to estimate the ROP in carbonate formations during horizontal drilling. The model uses Q, DSR, and T combined with conventional well log data (gamma ray, formation bulk density, and deep resistivity), and it substantially improved ROP prediction in carbonate formations.

In this study, a new model was developed to estimate the ROP in sandstone formations during horizontal drilling. It uses the surface measurable parameters Q, DSR, SPP, WOB, and T together with a new parameter, the calculated ROP (ROPc), which is defined here for the first time in terms of the DSR, the WOB, the drill pipe diameter (D), and the drilled hole area (A).

Methodology

In this study, the ANN technique was applied to build a model that estimates the ROP in real time while drilling horizontally through natural gas-bearing sandstone formations, using only the surface measurable drilling parameters Q, DSR, SPP, T, and WOB together with the ROPc. The ROPc expression, derived in this study, determines the ROP from the DSR, the WOB, the drill pipe diameter (D), and the drilled hole area (A).

Data preparation and preprocessing

To train the ANN model, 3082 data points of the input drilling parameters and the corresponding actual ROP were collected from an oil well (Well-A) in the Middle East. All the input parameters are measurable at the surface in real time, which enables real-time ROP prediction while drilling. Another 4662 data points were collected from a second well (Well-B) in the same field. The data from both wells were recorded while drilling horizontally through natural gas-bearing sandstone formations. Before the inputs were fed to the ANN model, the data were screened to remove unrealistic values and outliers. Unrealistic values were identified with the mechanical specific energy (MSE), a parameter introduced by Teale (1965) that represents the energy required at the surface to drill a unit volume of rock. According to Teale (1965), the MSE should correlate with the crushing strength of the rock, that is, its unconfined compressive strength (UCS).

The UCS of the sandstone formations considered in this study ranges from 25,000 to 45,000 psi. As Fig. 1 shows, many MSE values from both Well-A and Well-B are considerably higher or lower than the UCS. These values are unrealistic and represent inefficient drilling, so all data points with MSE outside the range of 15,000 to 75,000 psi (the formation UCS range widened by a margin) were removed. In total, 1031 data points from Well-A and 1992 from Well-B represented inefficient drilling and were removed at this stage, leaving 2051 realistic data points from Well-A and 2670 from Well-B.
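The MSE screening step amounts to a simple window filter. The sketch below assumes the MSE has already been computed for each data point in psi and treats the 15,000 and 75,000 psi limits as inclusive; the section does not say whether the bounds are inclusive, and the function name is hypothetical.

```python
def filter_by_mse(rows, mse_values, lower_psi=15_000.0, upper_psi=75_000.0):
    """Keep only the data points whose MSE lies inside [lower_psi, upper_psi].

    rows and mse_values are parallel sequences; points outside the window
    are treated as inefficient (unrealistic) drilling and dropped.
    """
    if len(rows) != len(mse_values):
        raise ValueError("rows and mse_values must have the same length")
    return [row for row, mse in zip(rows, mse_values) if lower_psi <= mse <= upper_psi]


kept = filter_by_mse(["p1", "p2", "p3", "p4"], [10_000.0, 30_000.0, 74_999.0, 80_000.0])
```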

Fig. 1

The relationship between the MSE and the UCS at the points where the input drilling parameters were collected in the sandstone formations considered in this study: (a) Well-A and (b) Well-B. Many MSE values differ significantly from the corresponding UCS; these points are unrealistic (inefficient drilling) and were removed from the inputs

The second preprocessing step identifies and removes outliers, using the standard deviation as the criterion. Outliers in the input drilling parameters Q, DSR, SPP, T, and WOB and in the corresponding ROP are defined as points lying more than 3.0 standard deviations from the mean, and every data point containing an outlier was removed. After outlier removal, 1649 data points from Well-A and 2213 from Well-B were considered valid for developing the ANN-based ROP model.
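The 3.0-standard-deviation rule can be sketched as below. Two details are assumptions, since the section does not specify them: the population standard deviation is used, and each column’s mean and standard deviation are computed once on the MSE-screened data rather than iteratively. The function name is hypothetical.

```python
from statistics import mean, pstdev


def remove_3sigma_outliers(rows, k=3.0):
    """Drop every row in which any column lies more than k standard deviations from its column mean.

    rows: list of equal-length tuples, e.g. (Q, DSR, SPP, T, WOB, ROP).
    Columns with zero spread cannot contain outliers and are skipped.
    """
    if not rows:
        return []
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        row for row in rows
        if all(sd == 0 or abs(v - mu) <= k * sd for v, (mu, sd) in zip(row, stats))
    ]


# Nineteen typical points and one extreme value in the first column.
data = [(10.0, 5.0)] * 19 + [(100.0, 5.0)]
clean = remove_3sigma_outliers(data)
```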

Developing a new expression for the rate of penetration

The calculated ROP (ROPc), which is later used as an input for training the ANN model, is derived starting from the MSE expression in Eq. (11).

$$ MSE=\frac{WOB}{A}+\frac{8\times RPM}{d_b\times ROP} $$
(11)

The first term in Eq. (11) is much smaller than the second term, so the DSR and ROP are the dominant parameters. The objective of this step is therefore to relate the MSE, DSR, and ROP of the training data. Plotting the MSE against the ROP at DSRs of 60, 80, and 100 rpm for the preprocessed training data of Well-A (Fig. 2) shows a distinct MSE-ROP relationship at each DSR. Each relationship is best fitted by a power function with an exponent (n) of −1.0 and constants (a) of 82,209, 108,525, and 136,380 for DSRs of 60, 80, and 100 rpm, respectively.

Fig. 2

The relationship between the MSE and ROP at DSRs of 60, 80, and 100 rpm for the training data of Well-A. At each DSR, the MSE-ROP relationship is best fitted by a power function with an exponent (n) of −1.0

The three power functions in Fig. 2 can be expressed in the general form of Eq. (12) or, equivalently, Eq. (13).

$$ MSE=a\ {ROP}^{-1.0}=\frac{a}{ROP} $$
(12)
$$ MSE\times ROP=a $$
(13)

The exponent in Eq. (12) is set to exactly −1.0 to simplify the derivation of the ROPc expression.
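Fixing the exponent at −1.0 leaves only the constant “a” to be estimated at each DSR. One way to do this, assumed here because the section does not describe the fitting procedure, is least squares in log space: taking logarithms of Eq. (12) gives ln(MSE) = ln(a) − ln(ROP), so the best-fit ln(a) is the mean of ln(MSE × ROP). In other words, “a” is the geometric mean of the MSE × ROP products of Eq. (13). The function name is hypothetical.

```python
import math


def fit_a_fixed_exponent(mse, rop):
    """Least-squares estimate of 'a' in MSE = a / ROP, fitted in log space.

    ln(MSE) = ln(a) - ln(ROP)  =>  ln(a) = mean(ln(MSE * ROP)).
    """
    if len(mse) != len(rop) or not mse:
        raise ValueError("need equal-length, non-empty sequences")
    logs = [math.log(m * r) for m, r in zip(mse, rop)]
    return math.exp(sum(logs) / len(logs))


# Synthetic points lying exactly on MSE = 108,525 / ROP (the 80 rpm constant).
rop = [1.5, 3.0, 6.0, 9.0]
mse = [108_525.0 / r for r in rop]
a_80 = fit_a_fixed_exponent(mse, rop)
```

Fitting in log space weights relative rather than absolute errors, which suits MSE values that span several times their minimum across the ROP range.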

To investigate the constant “a” further, the MSE-ROP relationships of both Well-A and Well-B were plotted at DSRs of 60 and 80 rpm (Fig. 3). The relationships are similar at the same DSR regardless of whether the MSE and ROP values come from Well-A or Well-B.

Fig. 3

The plot of the MSE versus ROP at DSRs of 60 and 80 rpm for the data of both Well-A and Well-B. At each DSR, the MSE-ROP relationship is best fitted by a power function with an exponent (n) of −1.0

Fig. 4 plots the constant “a” against the DSR. The relationship is linear and can be expressed in the general form of Eq. (14).

$$ a=1354\ RPM+696 $$
(14)
Fig. 4

The relationship between the constant “a” and the DSR. The values of the constant “a” are extracted from the plots of Fig. 2
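Eq. (14) can be checked with an ordinary least-squares line through the three (DSR, a) pairs reported for Fig. 2. The sketch below uses only those three published values; the function name is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# DSR (rpm) and the power-law constants "a" reported for Fig. 2.
dsr = [60.0, 80.0, 100.0]
a_values = [82_209.0, 108_525.0, 136_380.0]
slope, intercept = fit_line(dsr, a_values)
```

The fit gives a slope of about 1354.3 and an intercept of about 696, which round to the coefficients of Eq. (14).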

Next, take Teale’s expression for the MSE in Eq. (15), drop the torque from the second term, and substitute the MSE from Eq. (12); this gives Eq. (16):

$$ MSE=\frac{WOB}{A}+\frac{2\pi \times RPM\times Torque\ }{A\times ROP} $$
(15)
$$ \frac{a}{ROP}=\frac{WOB}{A}+\frac{2\pi \times RPM\ }{A\times ROP} $$
(16)

Rearranging Eq. (16) gives:

$$ ROP=\frac{-\left(2\pi \frac{RPM}{A}-a\right)A}{WOB}={ROP}_c $$
(17)

The expression for the ROP in Eq. (17) is new, and it is referred to here as the calculated ROP (ROPc). After the constant “a” is expanded with Eq. (14), Eq. (17) still relates the ROPc to only two drilling parameters, the DSR and the WOB (the hole area A being fixed for a given hole size). The purpose of this derivation is to give the model an offset term that makes the ROP much easier to predict. The cross-plot in Fig. 5 compares the actual ROP with the ROPc calculated from Eq. (17). Although the fit is not perfect, the correlation is sufficient to guide the model toward better ROP prediction. As explained earlier, the ROPc is used as an input for training the ANN model along with the five surface measurable drilling parameters (Q, DSR, SPP, T, and WOB).
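Eqs. (14) and (17) combine into a short function for the ROPc. This sketch simply transcribes the two equations; it assumes RPM, WOB, and A are supplied in the same units used when the coefficients of Eq. (14) were fitted, which this section does not state explicitly, and the function names are hypothetical.

```python
import math


def constant_a(rpm):
    """Eq. (14): power-law constant 'a' as a linear function of the DSR (rpm)."""
    return 1354.0 * rpm + 696.0


def calculated_rop(rpm, wob, area):
    """Eq. (17): ROPc = -(2*pi*RPM/A - a) * A / WOB = (a*A - 2*pi*RPM) / WOB."""
    if wob <= 0 or area <= 0:
        raise ValueError("WOB and hole area must be positive")
    return (constant_a(rpm) * area - 2.0 * math.pi * rpm) / wob


ropc = calculated_rop(rpm=80.0, wob=12.0, area=2.5)
```

Writing the numerator as a·A − 2π·RPM avoids the nested negation of Eq. (17) while giving the same value.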

Fig. 5

The cross-plot of the calculated rate of penetration (ROPc) and ROP for the data collected from Well-A (1649 data points)

Optimizing the artificial neural network model

The ANN model was optimized to predict the ROP as a function of six parameters: the five surface measurable variables Q, DSR, T, SPP, and WOB, and the ROPc calculated with Eq. (17). The ROPc is included as an input to improve the accuracy of the ROP estimate.

The ANN model was trained to estimate the actual ROP on 1154 data points from Well-A, each consisting of the surface measurable parameters and the corresponding ROPc; these represent 70% of Well-A’s data after removal of unrealistic values and outliers. Figure 6 shows the input surface measurable parameters and the ROPc calculated with Eq. (17) for Well-A.
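A 70/30 partition of the 1649 preprocessed Well-A points reproduces the 1154/495 counts used for training and testing. The section does not say whether the split was random or sequential in depth; the sketch below assumes a seeded random shuffle, and the function name is hypothetical.

```python
import random


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle rows reproducibly and split them into training and testing subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


train, test = split_train_test(range(1649))
print(len(train), len(test))  # 1154 495
```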

Fig. 6

The input surface measurable drilling parameters of Well-A, from left to right: Q, DSR, SPP, T, WOB, and ROPc; the ROPc is calculated using Eq. (17). This is the data used to train the ANN model

Table 1 summarizes the statistics of the training surface measurable drilling parameters and the corresponding ROP values of Well-A. Q ranges from 239 to 259 gal/min (gpm), DSR from 59.0 to 106 revolutions per minute (rpm), SPP from 2401 to 3746 psi, T from 4.32 to 10.6 kft·lbf, WOB from 5.07 to 20.7 klbf, and ROP from 1.20 to 9.85 ft/h. These statistics are important because they define the range over which the optimized ANN model, and the empirical correlation derived from it, can be applied.
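Because Table 1 defines where the model is applicable, a practical safeguard is to check each new set of surface readings against the training ranges before predicting. The limits below are copied from the text; the function and dictionary names are hypothetical.

```python
# Training ranges of Well-A (Table 1): name -> (minimum, maximum).
TABLE1_RANGES = {
    "Q": (239.0, 259.0),      # gpm
    "DSR": (59.0, 106.0),     # rpm
    "SPP": (2401.0, 3746.0),  # psi
    "T": (4.32, 10.6),        # kft-lbf
    "WOB": (5.07, 20.7),      # klbf
}


def out_of_range(inputs, ranges=TABLE1_RANGES):
    """Return the names of the inputs that fall outside the training ranges."""
    return [name for name, (lo, hi) in ranges.items() if not lo <= inputs[name] <= hi]


reading = {"Q": 250.0, "DSR": 80.0, "SPP": 3000.0, "T": 7.0, "WOB": 12.0}
```

A non-empty result signals extrapolation, where the accuracy reported for Well-A and Well-B need not hold.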

Table 1 The ranges and statistical characteristics for the training drilling parameters and their corresponding ROP collected from Well-A

A sensitivity analysis was performed to optimize the ANN design parameters: the training and transfer functions, the number of training layers, and the number of neurons in each training layer. The training functions evaluated were gradient descent with adaptive learning rate backpropagation, Levenberg-Marquardt, and resilient backpropagation. The pure linear, logarithmic sigmoid, and tangential sigmoid transfer functions were also compared. Networks with one, two, and three training layers, each with 1 to 30 neurons, were evaluated.

Table 2 lists the optimum ANN design parameters, which, according to the sensitivity analysis, predicted the ROP with the lowest AAPE and root mean square error (RMSE) and the highest R. Figure 7 shows a schematic of the optimized ANN model, trained on the Well-A data; it consists of an input layer with six neurons for the six inputs, one training layer with five neurons, and an output layer.

Table 2 The ANN model’s optimum design parameters
Fig. 7

Schematic of the optimized ANN model for ROP estimation. The letter “b” denotes the bias. This model consists of six neurons in the input layer, five neurons in the training layer, with a single neuron in the output layer

The sensitivity analysis also examined the effect of excluding one input parameter at a time from the training inputs (Q, DSR, SPP, T, WOB, and ROPc), to investigate how omitting a single input, especially the newly developed ROPc, affects ROP prediction.

In the sensitivity analysis, each configuration was evaluated with the AAPE (Eq. (18)), RMSE (Eq. (19)), and correlation coefficient R (Eq. (20)), and the configuration with the lowest AAPE and RMSE and the highest R was selected as the optimized ANN model for ROP prediction.

$$ AAPE=\frac{1}{N}\sum \limits_{i=1}^N\left(\left|\frac{{\left({ROP}_r\right)}_i-{\left({ROP}_p\right)}_i}{{\left({ROP}_r\right)}_i}\right|\times 100\right) $$
(18)
$$ RMSE=\sqrt{\frac{1}{N}\sum \limits_{i=1}^N{\left[{\left({ROP}_r\right)}_i-{\left({ROP}_p\right)}_i\right]}^2} $$
(19)
$$ R=\frac{\sum_{i=1}^N\left[\left({\left({ROP}_r\right)}_i-\overline{ROP_r}\right)\left({\left({ROP}_p\right)}_i-\overline{ROP_p}\right)\right]}{\sqrt{\sum_{i=1}^N{\left[{\left({ROP}_r\right)}_i-\overline{ROP_r}\right]}^2{\sum}_{i=1}^N{\left[{\left({ROP}_p\right)}_i-\overline{ROP_p}\right]}^2}} $$
(20)

where N is the total number of data points, the subscripts r and p denote the real (actual) and predicted ROP, respectively, and the overbar denotes the mean value.
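Eqs. (18) to (20) translate directly into code. The sketch below is a plain-Python transcription, with R computed as the Pearson correlation coefficient, whose numerator is the sum of products of the deviations of the actual and predicted ROP from their means; the function names are hypothetical.

```python
import math


def aape(actual, predicted):
    """Eq. (18): average absolute percentage error, %."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Eq. (19): root mean square error, in the units of the ROP."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def pearson_r(actual, predicted):
    """Eq. (20): Pearson correlation coefficient between actual and predicted ROP."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


actual, predicted = [10.0, 20.0], [11.0, 18.0]
```

The AAPE divides by the actual ROP, so it is undefined for zero values and weights low-ROP points heavily; reporting it alongside the RMSE and R, as done here, offsets that bias.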

Developing the new equation for estimation of the rate of penetration

After the ANN model was optimized, its weights and biases were extracted to develop the new empirical ROP equation. As shown in Table 2 and Fig. 7, the optimum ANN model has a single training layer with five neurons and calculates the ROP from six inputs using the Levenberg-Marquardt training function and the pure linear transfer function. With a pure linear transfer function, the ANN model can be written in the general form of Eq. (21).

$$ y=\left(\sum \limits_{j=1}^m{w}_{j1}\left(\sum \limits_{i=1}^n{w}_{ij}{\mathrm{x}}_i+{b}_j\right)+{b}_1\right) $$
(21)

where y is the objective parameter; i and j indexes account for the inputs and neurons; w represents the weights; m and n denote the number of neurons and inputs, respectively; x represents the input variables; and b represents the biases.

For the optimized ANN model of Fig. 7, Eq. (21) could be written as in Eq. (22).

$$ ROP=\left(\sum \limits_{j=1}^5{w}_{j1}\left(\sum \limits_{i=1}^6{\mathrm{w}}_{ij}{\mathrm{x}}_i+{b}_j\right)+{b}_1\right) $$
(22)

The weights and biases required for Eq. (22) were extracted from the model and are listed in Table 3.

Table 3 The weights and biases extracted from the optimized ANN model for ROP estimation, with i and j indexes accounting for the inputs and neurons, respectively

Expanding Eq. (22) gives Eq. (23).

$$ ROP={a}_1 GPM+{a}_2 RPM+{a}_3 SPP+{a}_4 Torque+{a}_5 WOB+{a}_6{ROP}_c+c $$
(23)

As explained in Appendix A, the coefficients a1 to a6 of Eq. (23) are obtained by multiplying the output-layer weight matrix (1 × 5) by the training-layer weight matrix (5 × 6), which yields a 1 × 6 matrix. The constant c is obtained by multiplying the output-layer weight matrix (1 × 5) by the training-layer bias vector (5 × 1); this gives a scalar of −0.0398, which, added to the output-layer bias of −0.3896, gives c = −0.4294. Appendix A gives further details of the matrix multiplication and the determination of a1 to a6 and c.
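Because the transfer function is pure linear, the whole network is an affine map of its inputs, which is why it collapses to Eq. (23). The NumPy sketch below performs the two matrix products described above and checks the collapse against a direct layer-by-layer evaluation. The weights are random placeholders, not the values of Table 3, and the function name is hypothetical.

```python
import numpy as np


def collapse_linear_ann(w_hidden, b_hidden, w_out, b_out):
    """Collapse a network with pure linear layers into y = coeffs @ x + c.

    w_hidden: (5, 6) training-layer weights, b_hidden: (5,) training-layer biases,
    w_out: (5,) output-layer weights, b_out: scalar output-layer bias.
    """
    coeffs = w_out @ w_hidden            # (1 x 5) @ (5 x 6) -> six coefficients a1..a6
    c = float(w_out @ b_hidden + b_out)  # (1 x 5) @ (5 x 1) -> scalar, plus output bias
    return coeffs, c


rng = np.random.default_rng(0)
w_hidden, b_hidden = rng.normal(size=(5, 6)), rng.normal(size=5)
w_out, b_out = rng.normal(size=5), rng.normal()
coeffs, c = collapse_linear_ann(w_hidden, b_hidden, w_out, b_out)

x = rng.normal(size=6)                                     # one hypothetical input vector
direct = float(w_out @ (w_hidden @ x + b_hidden) + b_out)  # Eq. (22), layer by layer
collapsed = float(coeffs @ x + c)                          # Eq. (23)
```

With the Table 3 weights in place of the random ones, the same two products give the coefficients and constant of Eq. (24); for example, a bias product of −0.0398 plus an output bias of −0.3896 gives c = −0.4294.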

Substituting for the coefficients a1 to a6 and the constant c from Appendix A into Eq. (23) leads to the final ROP equation in Eq. (24).

$$ ROP=-0.0260\ GPM+0.2521\ RPM+0.0338\ SPP-0.0107\ T-0.5599\ WOB+1.6778\ {ROP}_c-0.4294 $$
(24)
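Eq. (24) can be evaluated with a few lines of Python. This sketch transcribes the printed coefficients only. One caveat is visible from the numbers themselves: with an SPP of 3000 psi, the SPP term alone would contribute about 0.0338 × 3000 ≈ 101 ft/h, far above the 1.20 to 9.85 ft/h range in Table 1, which suggests the coefficients apply to scaled (normalized) inputs and output. This section does not give that scaling, so it is left to the caller, and the names used here are hypothetical.

```python
# Coefficients and constant of Eq. (24), keyed by input name.
EQ24_COEFFS = {
    "GPM": -0.0260,
    "RPM": 0.2521,
    "SPP": 0.0338,
    "T": -0.0107,
    "WOB": -0.5599,
    "ROPc": 1.6778,
}
EQ24_CONSTANT = -0.4294


def rop_eq24(inputs):
    """Evaluate Eq. (24) for a dict of inputs already in the scale the coefficients expect."""
    missing = set(EQ24_COEFFS) - set(inputs)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return sum(coef * inputs[name] for name, coef in EQ24_COEFFS.items()) + EQ24_CONSTANT


zero_inputs = {name: 0.0 for name in EQ24_COEFFS}
```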

Testing and validating the new equation for estimation of the ROP

The new ROP equation (Eq. (24)) was tested on 495 unseen data points from Well-A (the remaining 30% of Well-A’s data) and then validated on 2213 data points from Well-B. Its predictions for Well-B were also compared with those of the available empirical equations to quantify the improvement in ROP prediction achieved by Eq. (24).

Results and discussion

Training the artificial neural network model

The ANN model was first trained on 1154 data points from Well-A to predict the ROP from six inputs: Q, DSR, SPP, T, WOB, and ROPc. As Fig. 8 shows, the optimized ANN model estimated the ROP accurately, as confirmed by the close match between the actual and predicted ROP, the low AAPE and RMSE of 5.11% and 0.28 ft/h, respectively, and the high R of 0.991.

Fig. 8

Comparison of the actual and predicted ROP for the training datasets collected from Well-A (1154 datasets). The ROP was predicted accurately with R, AAPE, and RMSE of 0.991, 5.11%, and 0.28 ft/h, respectively

Sensitivity analysis of the training input variables

As explained earlier, the Q, DSR, SPP, T, WOB, and ROPc are considered as inputs for the ANN model to estimate the ROP as output for Well-A. Figure 9 compares the AAPE, RMSE, and R in estimating the ROP for the use of all the six input variables and different cases of excluding one of the input variables. As indicated in Fig. 9, the use of all the six inputs enabled the prediction of the ROP accurately with AAPE and RMSE of 5.34% and 0.29 ft/h, respectively, and R of 0.990. Excluding any of the six input parameters from the training input data reduced the accuracy of the ANN model predictability as confirmed by the increase in the AAPE and RMSE and the decrease in R.

Fig. 9

Comparison of the effect of excluding an input parameter from the training data. These results indicate that all inputs are necessary for ROP estimation

Excluding the ROPc from the input training data significantly reduced the accuracy of the ANN model for ROP estimation. The ANN model estimated the ROP with a very high AAPE of 42.6%, RMSE of 1.89 ft/h, and R of only 0.424, as indicated in Fig. 9. These results confirmed the importance of including the ROPc parameter calculated using Eq. (17) as an input parameter to train the ANN model for ROP estimation.

Testing the developed equation for the rate of penetration

Eq. (24) was then tested on another 495 data points from Well-A (30% of Well-A’s data). As Fig. 10 shows, Eq. (24) estimated the ROP for these testing data with a low AAPE of 5.39% and RMSE of 0.30 ft/h, and the R between the actual and estimated ROP is 0.989. Visual comparison of the actual and predicted ROP in Fig. 10 also confirms the high accuracy of Eq. (24).

Fig. 10

Comparison of the actual and estimated ROP for the testing datasets collected from Well-A (495 datasets). The predicted ROP values were calculated using the new empirical correlation of Eq. (24). The ROP was predicted accurately with R, AAPE, and RMSE of 0.989, 5.39%, and 0.30 ft/h, respectively

Validating the developed equation for the rate of penetration

The predictive performance of the optimized ANN-based model developed in this study was evaluated on the 2213 data points from Well-B and compared with four previously published models: those of Bingham, Maurer, Bourgoyne and Young, and Al-AbdulJabbar et al. (2020). As Fig. 11 shows, the ANN-based model is the most accurate, as confirmed by the close match between the actual and estimated ROP. In terms of RMSE, the Bingham, Maurer, Bourgoyne and Young, and Al-AbdulJabbar et al. models predicted the ROP for Well-B with RMSEs of 1.67, 2.02, 1.29, and 1.39 ft/h, respectively, whereas the RMSE of the ANN-based model is only 0.44 ft/h. All the previous models predicted the ROP with R below 0.25, while the ANN-based model achieved an R of 0.954. As Figs. 11 and 12 confirm, all the previous models estimated the ROP with high AAPE and RMSE and low R: the AAPEs of the Bingham, Maurer, Bourgoyne and Young, and Al-AbdulJabbar et al. models are 51.0%, 47.57%, 36.64%, and 33.73%, respectively, compared with only 8.85% for the ANN-based model.

Fig. 11

Comparison of ROP estimation in Well-B using the Bingham, Maurer, Bourgoyne and Young, and Al-AbdulJabbar et al. models and the new ROP correlation of Eq. (24) developed in this study. Equation (24) predicted the ROP with the lowest AAPE and RMSE and the highest R of all the models

Fig. 12

Comparison of the AAPE, RMSE, and absolute R on estimating the ROP for the validation dataset. Equation (24) predicted the ROP accurately with the lowest AAPE and RMSE and the highest R compared to the previous models

These results demonstrate that the ANN-based model accurately predicts the ROP in sandstone formations during horizontal drilling using only surface measurable drilling parameters.

Conclusions

An ANN model was optimized to estimate the ROP in real time while drilling horizontally through natural gas-bearing sandstone formations, based on Q, DSR, SPP, T, and WOB combined with ROPc, a new parameter developed in this study. The ANN-based model was first trained and optimized on 1154 data points from Well-A, and a new empirical ROP equation was then derived from the optimized model. The equation was tested on 495 data points from Well-A, validated on 2213 data points from Well-B, and compared with the available correlations. The main conclusions of this study are as follows:

  a. The ANN-based model estimated the ROP for the training data of Well-A with an AAPE of 5.11% and an R of 0.991 when the ROPc was included as an input parameter.

  b. Excluding the ROPc from the training inputs increased the AAPE of the predicted ROP to 42.6% and reduced R to 0.424.

  c. The new empirical equation estimated the ROP for the testing data of Well-A with an AAPE of 5.39% and an R of 0.989.

  d. For the validation data (Well-B), the new empirical equation estimated the ROP with an AAPE of 8.85% and an R of 0.954.

  e. The optimized ANN-based model outperformed all the available empirical correlations for ROP estimation.