1 Introduction

Drilling fluid is a major expense in the oil and gas industry, and it performs many functions in rotary drilling. Drilling mud is circulated to remove cuttings and to enhance the performance of the drill bit. The cuttings are carried from the borehole to the surface, where they are separated from the mud. In addition, drilling fluid performs the following functions [1]:

  1. Minimizing the invasion of the filtrate

  2. Decreasing the friction between the sides of the borehole and the drill string

  3. Sealing permeable formations

  4. Maintaining stability of the borehole in uncased sections

  5. Limiting reservoir damage

  6. Ensuring sufficient formation evaluation

  7. Forming a thin, impermeable filter cake that seals pores and reduces the fluid lost into permeable formations

Drilling mud losses and the problems associated with lost circulation while drilling through thief zones account for a considerable cost in the oil and gas industry. Millions of dollars are spent annually to stop or minimize these losses [2]. Lost circulation can be defined as “the partial or total loss of circulating fluid from the wellbore to the formation. It is the loss of whole fluid, not simply filtrate, to the formation. Losses can result from either natural or induced causes and can range from a couple of barrels per hour to hundreds of barrels in minutes. Lost circulation is one of the drilling’s biggest expenses in terms of rig time and safety. Uncontrolled lost circulation can result in a dangerous pressure control situation and loss of the well” [3].

Although lost circulation may occur in any zone, the formations that contribute most to it are high-permeability, weakly consolidated formations, fractured calcium carbonate reservoirs, and depleted aquifer zones [4]. Technical journals, papers, manuals, training courses of international oil companies, and textbooks have historically categorized the types of formations that are most prone to lost circulation, and they agree on which formations cause it: cavernous formations, natural (intrinsic) fractures, induced (created) fractures, and unconsolidated or highly permeable formations, as shown in Fig. 1.

Fig. 1 Candidate formations for lost circulation

Lost circulation occurrences are classified by the rate at which mud is lost while penetrating thief zones. The amount of mud loss depends on several factors, including formation characteristics, mud properties, and fracture gradient. Mud losses are classified as follows, according to the loss rate [5, 6]:

  1. Seepage loss (0.5–1 m³/h or 3–6 bbl/h): This type of loss can occur in most penetrated formations and is normal during drilling because of overbalanced conditions. It is also called filtration.

  2. Partial loss (1–10 m³/h or 7–70 bbl/h): This type of loss can occur in permeable zones, gravel beds, and small natural and induced fractures.

  3. Severe loss (15 m³/h or 95 bbl/h and above): This type of loss is more serious than partial loss and can lead to complicated consequences.

  4. Complete loss: No mud returns from the annulus to the surface. It is considered the most serious type of loss because it has many direct and indirect unwanted consequences for the drilling process. It occurs in large natural and induced fractures, vugs, long open sections of gravel, and caverns.
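The classification above can be sketched as a small lookup function. This is only an illustration under stated assumptions: the function name and the `returns_at_surface` flag are hypothetical, a rate of exactly 1 m³/h is assigned to seepage, and rates that fall outside the listed ranges (below 0.5 m³/h, or between 10 and 15 m³/h) are reported as unclassified because the list does not cover them.

```python
def classify_mud_loss(rate_m3_per_h: float, returns_at_surface: bool = True) -> str:
    """Map a mud-loss rate in m3/h to the loss classes listed in the text.

    Assumptions (not from the source): the boundary value 1.0 m3/h is treated
    as seepage, and rates outside the listed ranges return "unclassified".
    """
    if rate_m3_per_h < 0:
        raise ValueError("loss rate must be non-negative")
    if not returns_at_surface:
        # Complete loss is defined by the absence of returns, not by a rate.
        return "complete"
    if rate_m3_per_h >= 15.0:
        return "severe"
    if 1.0 < rate_m3_per_h <= 10.0:
        return "partial"
    if 0.5 <= rate_m3_per_h <= 1.0:
        return "seepage"
    return "unclassified"


print(classify_mud_loss(0.8))    # seepage
print(classify_mud_loss(5.0))    # partial
print(classify_mud_loss(20.0))   # severe
```

Keeping the gaps explicit, rather than stretching a class to fill them, avoids implying thresholds that the cited classification does not state.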

Lost circulation has many direct and indirect negative impacts on the drilling process, including, but not limited to, loss of circulation, mechanical and differential pipe sticking, kicks, and bit damage [5].

The severity of mud loss should be assessed before selecting a lost circulation treatment. It is well recognized that no single solution stops lost circulation in every case. Hence, a wide range of lost circulation treatments is available, including, but not limited to, high-viscosity pills; fibrous, granular, and flaky materials; and cement slurries. These treatments are classified into generic groups to clarify their uses. A broad range of plugging materials is also available for mitigating lost circulation or restoring circulation during drilling or cementing. Each material or treatment is chosen according to the type of mud loss, timing, cost, drilling phase, fluid type, and thief zone. Mud loss treatments and materials are used to accomplish two goals [7, 8]:

  1. To bridge across existing vugs and fractures.

  2. To prevent the development of new fractures that may be induced during penetration.

The purpose of this paper is to build two neural network models to predict mud loss for natural and induced fractures using data of more than 3500 wells drilled worldwide. This work will eliminate the shortcomings in the literature regarding lost circulation predictions using real-field data collected from various locations around the world.

2 Neural networks

McCulloch and Pitts [9] presented the first neural network research. Rosenblatt [10] developed the perceptron and proved that a perceptron would create a vector that divides the classes. Rosenblatt [10] also believed that structures with more layers could overcome the limitations of a simple perceptron. Nevertheless, no learning algorithm was available at the time to determine the weights for a given calculation [11]. A few years later, Widrow [12] created a network called ADALINE. Minsky and Papert [13] proved that a single-layer perceptron cannot solve some elementary calculation problems. After that, neural network research stalled for about 20 years [14]. Then, new work such as that of Hopfield [15], together with algorithms such as backpropagation, revived neural network research. Since then, neural network applications have grown rapidly [11].

An artificial neuron mimics a biological neuron, which processes information. Neurons are the basic building blocks of the nervous system. A typical biological neuron consists of a cell body, an axon, and dendrites. Information enters the cell body through the dendrites. The cell body then produces an output that travels along the axon to a receiving neuron; the output of the first neuron becomes an input for the second neuron, and so on.

The human brain has 10–500 billion neurons [16]. The neurons are organized into sections, and every section has about 500 neural networks [17]. Every neural network contains about 100,000 neurons, each linked with thousands of other neurons. This structure underlies complex human behavior. A simple task such as moving the hands, walking, or picking up a cup of coffee requires very complex computations that advanced computers cannot execute but the human brain can. Although computers are faster than human brains (the cycle time of the human brain is 10 to 100 milliseconds, while that of computer chips is on the order of nanoseconds), the human brain can still perform much more complex activities than computers because of the sophisticated structure of its neurons [11].

Neural networks simulate the biological process described above. They are developed based on the following assumptions and mathematical models:

  1. Neurons are responsible for processing the information.

  2. To let the information pass through, the neurons are connected by connection links, and every connection link has a weight.

  3. To find its output after receiving input, a neuron applies an activation function.

The outputs of other neurons are multiplied by the weights of the connection links and enter the neuron. The weighted inputs are then summed, and the activation function of the neuron is applied to the sum to produce an output. Thus, a neuron can have multiple inputs but only a single output. An artificial neural network has an input layer that receives the inputs, one or more hidden layers that extract features from the data, and an output layer that produces the outputs. Artificial neural networks (ANNs) have been utilized in drilling for a long time. Table 1 shows some applications of ANNs in the drilling industry.

Table 1 Applications of ANNs in drilling

Lost circulation estimation has received limited attention in the literature; only a few papers have been published on the topic. The following shortcomings were identified in previous work [34, 36, 39–42]:

  1. Not enough data were used.

  2. The model is applicable only in a specific area.

3 Data and methods

3.1 Data

Figure 2 presents a map in which red dots mark the locations where data were collected. Data were collected from many sources, such as daily drilling reports (DDR), final well reports, case histories, and the literature. Two separate databases were created: one for natural fractures and one for induced fractures. The data were preprocessed to remove outliers, errors, and white spaces [43]. Input parameters were selected based on experts’ opinions and on previous statistical and sensitivity analysis studies that identified the parameters with the greatest influence on mud loss [39, 41]. Table 2 lists the parameters used to create the models and summary statistics for both induced and natural fractures.

Table 2 Summary of statistics

3.2 Data normalization

Normalizing the data is a vital step in the training process for any neural network. Normalization and scaling help simplify the problem being modeled and assist the network in achieving better results [44, 45]. One example of normalizing the data is to make the data range between −1 and 1; this can be obtained using Eq. 1:

$$X_{i}^{\prime } = 2\left[ {\frac{{X_{i} - X_{min} }}{{X_{max} - X_{min} }}} \right] - 1$$
(1)

where \(X_{i}^{\prime}\) is the normalized value of the original value \(X_{i}\), and \(X_{min}\) and \(X_{max}\) are the minimum and maximum values of \(X_{i}\), respectively [46].
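A minimal sketch of Eq. 1 is given below, together with its inverse for mapping scaled values back to physical units. The function names and the sample mud weights are hypothetical and are not taken from the dataset used in this study.

```python
def normalize(values, x_min=None, x_max=None):
    """Scale values to the range [-1, 1] following Eq. 1."""
    x_min = min(values) if x_min is None else x_min
    x_max = max(values) if x_max is None else x_max
    if x_max == x_min:
        raise ValueError("x_max and x_min must differ")
    return [2.0 * (x - x_min) / (x_max - x_min) - 1.0 for x in values]


def denormalize(scaled, x_min, x_max):
    """Invert Eq. 1 to recover values in their original units."""
    return [(s + 1.0) * (x_max - x_min) / 2.0 + x_min for s in scaled]


# Hypothetical mud weights, used only to illustrate the scaling.
mud_weights = [9.0, 10.5, 12.0]
scaled = normalize(mud_weights)
restored = denormalize(scaled, min(mud_weights), max(mud_weights))
```

In practice, the minimum and maximum would be taken from the training data and reused for any new sample, which is why both functions accept them as arguments.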

3.3 Activation function

A weight (w) is assigned to each input, and a bias (b) is assigned to each neuron in the hidden layer. For each neuron, the weighted inputs and the bias are summed, and the sum is the input to the activation function. Several activation functions are available in the literature, and they are divided into linear and nonlinear functions. A discussion of activation functions is beyond the scope of this paper; details are available in the literature [47]. The hyperbolic tangent sigmoid activation function was selected for the hidden layer, while a linear activation function was used for the output layer because of its suitability for regression problems [46].
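The computation of a single hidden neuron can be sketched as follows. The hyperbolic tangent sigmoid is written in the form that appears inside Eqs. 4 and 5, which is mathematically identical to tanh. The function names and the sample numbers are illustrative assumptions only.

```python
import math


def tansig(z):
    """Hyperbolic tangent sigmoid, 2 / (1 + exp(-2z)) - 1, which equals tanh(z)."""
    return 2.0 / (1.0 + math.exp(-2.0 * z)) - 1.0


def hidden_neuron(inputs, weights, bias):
    """Sum the weighted inputs and the bias, then apply the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return tansig(z)


def linear_output(hidden_outputs, weights, bias):
    """Linear output neuron: weighted sum plus bias, with no squashing."""
    return sum(w * h for w, h in zip(weights, hidden_outputs)) + bias


# Hypothetical inputs and coefficients, for illustration only.
h = hidden_neuron([0.2, -0.4], [0.5, 1.0], 0.1)
y = linear_output([h], [2.0], 0.3)
```

Because the hidden output is bounded between −1 and 1 while the output neuron is linear, the network output is not confined to a fixed range, which suits a regression target such as a loss rate.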

3.4 Feedforward backpropagation

Usually, to create a neural network, the data are divided into three sets: training, validation, and testing. This is done to ensure that the model generalizes robustly to new data. Training data are used to train the network, validation data are used to validate the model during training, and testing data are used to test the network and assess the outcomes. In feedforward backpropagation, the data are fed forward through the network to obtain an output, the output of the network is compared with the actual output, the error is propagated backward, and the weights are adjusted until the network is calibrated. To prevent overtraining and ensure generalization, the data were divided into 60% for training, 20% for validation, and 20% for testing.
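A 60/20/20 division can be sketched as below. How the samples were assigned to each set is not stated here, so the random shuffle, the fixed seed, and the function name are assumptions made for illustration.

```python
import random


def split_indices(n_samples, train=0.6, val=0.2, seed=0):
    """Return shuffled index lists for the training, validation, and test sets.

    The test set receives whatever remains after the first two sets.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumed: random assignment
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 600 200 200
```

Splitting by index rather than copying rows keeps the three sets disjoint by construction and makes the partition reproducible from the seed.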

Fig. 2 Data collection locations

3.5 Network structure

Choosing the number of hidden layers and the number of neurons in each hidden layer is a vital step in creating a neural network. The number of possible network structures is practically unlimited. Choosing too many hidden layers and/or too many neurons in each hidden layer may result in overtraining, meaning that the network loses its ability to generalize to new data. Thus, the optimal number of hidden layers and the optimal number of neurons in them must be selected. This was done by trial and error: starting with one hidden layer, the number of hidden layers was increased after each trial, and the same process was used to select the number of neurons in the hidden layers. Figure 3 summarizes the process of selecting the optimal number of hidden layers. For each trial, the mean squared error (MSE) was calculated, and the scenario with the lowest MSE was selected. MSE was calculated using Eq. 2, where M is the number of data points [46]:

$$MSE = \frac{1}{M}\sum\nolimits_{i = 1}^{M} {\left( {Actual - Predicted} \right)^{2} }$$
(2)
Fig. 3 Process of selecting the optimum number of hidden layers
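The selection rule of Sect. 3.5 can be sketched as follows: compute the MSE of Eq. 2 for each trial structure and keep the one with the lowest value. The helper names are hypothetical, and the MSE values in the example are placeholders standing in for the result of training each candidate network; they are not results from this study.

```python
def mse(actual, predicted):
    """Mean squared error over M data points, following Eq. 2."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def select_structure(candidates, evaluate):
    """Return the candidate with the lowest MSE and all trial scores.

    `evaluate(candidate)` is assumed to train a network with that structure
    and return its MSE.
    """
    scores = {c: evaluate(c) for c in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Placeholder trial results keyed by (hidden layers, neurons per layer).
trial_mse = {(1, 5): 0.042, (1, 10): 0.018, (2, 10): 0.025}
best, scores = select_structure(trial_mse.keys(), trial_mse.get)
print(best)  # (1, 10)
```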

3.6 Training algorithms

Table 3 summarizes the training algorithms tested in this work. Two criteria were used to select the training algorithm: the algorithm with the lowest MSE and the highest R² was selected to train the network. Equation 3 was used to calculate R²:

$$R^{2} = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {\hat{y}_{i} - \overline{y}} \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - \overline{y}} \right)^{2} }}$$
(3)
Table 3 Training algorithms

where \(y_{i}\) is the actual data point, \(\hat{y}_{i}\) is the estimated data point, and \(\overline{y}\) is the mean of the actual data. Figure 4 summarizes the methodology used in this study.
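As a minimal sketch, Eq. 3 can be evaluated as shown below; the function name is hypothetical. Note that this ratio of explained to total sum of squares coincides with the more common form, one minus the ratio of residual to total sum of squares, only under certain conditions (for example, a least-squares linear fit with an intercept), so the two forms may give slightly different values for a neural network.

```python
def r_squared(actual, predicted):
    """Coefficient of determination as written in Eq. 3.

    Ratio of the explained sum of squares (predictions about the mean of the
    actual data) to the total sum of squares (actual data about its mean).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    explained = sum((p - mean_actual) ** 2 for p in predicted)
    total = sum((a - mean_actual) ** 2 for a in actual)
    if total == 0:
        raise ValueError("actual data must not be constant")
    return explained / total


print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
```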

Fig. 4 Summary of the methodology used in this study

4 Results and discussion

Since two datasets were collected for the natural and induced fractures, two networks were created: one for the natural and one for induced fractures. The results are divided into natural fractures network results and induced fractures network results.

4.1 Natural fractures network

A neural network with one input layer, one hidden layer with ten neurons, and one output layer was created for the natural fractures dataset. Figures 5 and 6 show the MSE and R², respectively, for all training functions examined in this study. The Levenberg–Marquardt (LM) and Bayesian regularization (BR) algorithms have the lowest MSE and the highest R² of all the algorithms, with LM slightly better than BR (lower MSE and higher R²). The typical BR algorithm does not use validation to stop training once generalization is reached, so training can continue until an optimal combination of weights is found. LM, on the other hand, usually converges fastest, trains accurately, and normally performs very well in approximation (regression) problems. With LM, training stops when generalization stops improving. Thus, the LM algorithm was chosen to train the network [46].

Fig. 5 MSE of all training functions examined in this study (natural fractures)

Fig. 6 R² of all training functions examined in this study (natural fractures)

Figure 7 shows the MSE versus iterations for the training, validation, and testing sets with the LM algorithm. To avoid overfitting, the MSE of the validation set is monitored, and training stops once its lowest value is reached. In addition, the testing and validation MSE curves should behave similarly if the network is rigorous and not overfitted. Figure 7 shows that training stops after 33 iterations, when the MSE of the validation set is at its minimum. Moreover, Fig. 7 shows that the testing and validation sets have the same MSE characteristics.
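The validation-based stopping rule can be sketched as follows. The rule used by the authors' software is not described in detail here, so the `patience` parameter (the number of consecutive iterations without improvement that triggers the stop) is an assumption, and the validation curve in the example is made up for illustration.

```python
def early_stopping_epoch(val_mse, patience=6):
    """Return (best_epoch, stop_epoch) for a validation MSE history.

    Training is assumed to stop once the validation MSE has failed to improve
    for `patience` consecutive epochs; the weights from `best_epoch` are kept.
    Epochs are zero-indexed.
    """
    best_epoch, best_value = 0, float("inf")
    for epoch, value in enumerate(val_mse):
        if value < best_value:
            best_epoch, best_value = epoch, value
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # The history ended before the stopping criterion was met.
    return best_epoch, len(val_mse) - 1


# Hypothetical validation curve: improves, then starts to rise.
history = [5.0, 4.0, 3.0, 2.5, 2.6, 2.7, 2.8]
print(early_stopping_epoch(history, patience=3))  # (3, 6)
```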

Fig. 7 MSE vs epochs for the LM training function (natural fractures)

Figure 8 shows the actual and predicted mud loss for the training (Fig. 8a), validation (Fig. 8b), testing (Fig. 8c), and complete (Fig. 8d) datasets. The R² values for training, validation, and testing are 0.96, 0.95, and 0.948, respectively, and the overall R² of the network is 0.956. With this high R², the network can be used to predict mud loss prior to drilling for formations with natural fractures.

Fig. 8 Predicted and actual mud loss (natural fractures)

Equation 4 can be used to estimate mud loss for formations with natural fractures prior to drilling.

$$Mud Loss = \left[ {\mathop \sum \limits_{i = 1}^{n} w_{2i} \left( {\frac{2}{{1 + e^{{ - 2\left( {\mathop \sum \nolimits_{j = 1}^{J} w_{1i,j} x_{j} + b_{1i} } \right)}} }} - 1} \right) + b_{2} } \right]$$
(4)

where n is the number of neurons in the hidden layer, which was optimized to ten; J is the number of inputs; w1 is the hidden layer’s weight; w2 is the output layer’s weight; b1 is the hidden layer’s bias; b2 is the output layer’s bias; and x is the input. The index j identifies the input variable: j = 1 is mud weight (MW), j = 2 is equivalent circulating density (ECD), j = 3 is plastic viscosity (PV), j = 4 is yield point (Yp), j = 5 is flow rate (Q), j = 6 is rotary speed (RPM), j = 7 is weight on bit (WOB), and j = 8 is nozzle total flow area (Nozzles TFA). Table 4 summarizes the coefficients for Eq. 4.
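Equation 4 can be evaluated with a few lines of code, as sketched below. The function name is hypothetical, and the small coefficient arrays in the example are placeholders chosen only to exercise the code; the actual coefficients are those of Table 4. Whether the inputs must first be scaled with Eq. 1, and the output scaled back, is not stated here, so that step is left to the user.

```python
import math


def predict_mud_loss(x, w1, b1, w2, b2):
    """Evaluate Eq. 4 for one sample.

    x  : list of J inputs
    w1 : n rows of J hidden-layer weights (w1[i][j])
    b1 : n hidden-layer biases
    w2 : n output-layer weights
    b2 : output-layer bias (scalar)
    """
    if not (len(w1) == len(b1) == len(w2)):
        raise ValueError("w1, b1, and w2 must have one entry per hidden neuron")
    total = b2
    for w1_i, b1_i, w2_i in zip(w1, b1, w2):
        if len(w1_i) != len(x):
            raise ValueError("each row of w1 must have one weight per input")
        z = sum(w * xj for w, xj in zip(w1_i, x)) + b1_i
        # Hyperbolic tangent sigmoid term of Eq. 4.
        total += w2_i * (2.0 / (1.0 + math.exp(-2.0 * z)) - 1.0)
    return total


# Placeholder coefficients (n = 2 neurons, J = 3 inputs), not from Table 4.
x = [0.1, -0.3, 0.5]
w1 = [[0.2, 0.4, -0.1], [-0.5, 0.3, 0.8]]
b1 = [0.05, -0.02]
w2 = [1.5, -0.7]
b2 = 0.25
estimate = predict_mud_loss(x, w1, b1, w2, b2)
```

With the ten-neuron, eight-input networks described in the text, `w1` would have ten rows of eight weights each, and `b1` and `w2` would each have ten entries.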

Table 4 Coefficients for natural fracture formations mud losses (Eq. (4))

4.2 Induced fractures network

A neural network with one input layer, one hidden layer with ten neurons, and one output layer was created for the induced fractures dataset. Figures 9 and 10 show the MSE and R², respectively, for all training functions. Although the BR algorithm has a lower MSE, the LM algorithm was chosen because it has a higher R².

Fig. 9 MSE of all training functions examined in this study (induced fractures)

Fig. 10 R² of all training functions examined in this study (induced fractures)

Figure 11 shows the MSE of the LM algorithm for the training, validation, and testing sets. Training stops after 19 iterations, when the MSE of the validation set is at its minimum. Moreover, Fig. 11 shows that the testing and validation sets have the same MSE characteristics. Figure 12 shows the actual and predicted mud loss for the training (Fig. 12a), validation (Fig. 12b), testing (Fig. 12c), and complete (Fig. 12d) datasets. The R² values for training, validation, and testing are 0.928, 0.925, and 0.91, respectively, and the overall R² of the network is 0.925. With this high R², the network can be used to predict mud loss prior to drilling for formations with induced fractures.

Fig. 11 MSE vs epochs for the LM training function (induced fractures)

Fig. 12 Predicted and actual mud losses (induced fractures)

Equation 5 can be used to estimate mud loss for formations with induced fractures prior to drilling.

$$Mud Loss = \left[ {\mathop \sum \limits_{i = 1}^{n} w_{2i} \left( {\frac{2}{{1 + e^{{ - 2\left( {\mathop \sum \nolimits_{j = 1}^{J} w_{1i,j} x_{j} + b_{1i} } \right)}} }} - 1} \right) + b_{2} } \right]$$
(5)

where n is the number of neurons in the hidden layer, which was optimized to ten; J is the number of inputs; w1 is the hidden layer’s weight; w2 is the output layer’s weight; b1 is the hidden layer’s bias; b2 is the output layer’s bias; and x is the input. The index j identifies the input variable: j = 1 is MW, j = 2 is ECD, j = 3 is PV, j = 4 is Yp, j = 5 is Q, j = 6 is RPM, j = 7 is WOB, and j = 8 is Nozzles TFA. Table 5 summarizes the coefficients for Eq. 5.

Table 5 Coefficients for induced fracture formations mud losses (Eq. (5))

4.3 Verification of the models

To further validate the neural network models, they were tested on 24 new oil wells from different locations around the world (the same locations where the training data were gathered). Table 6 lists the 24 wells: 12 in naturally fractured formations and 12 in formations with induced fractures. As can be seen, both models closely track the actual mud loss. The highest error of the natural fractures network is about 6.34% (well 3), while the highest error of the induced fractures network is about 5.5%. These errors are not significant (a difference of one barrel per hour is not significant). Therefore, the networks can be used reliably in the locations where data were collected to estimate mud loss from key drilling parameters within an acceptable margin of error. Using the models, key drilling parameters can be set to limit mud loss and save time and money.
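The error measure is not defined explicitly here; a common choice, assumed in the sketch below, is the absolute relative error expressed as a percentage of the actual mud loss. The function names and the sample values are hypothetical and are not taken from Table 6.

```python
def percent_error(actual, predicted):
    """Absolute relative error in percent (assumed definition)."""
    if actual == 0:
        raise ValueError("actual mud loss must be non-zero")
    return abs(predicted - actual) / abs(actual) * 100.0


def max_percent_error(actuals, predictions):
    """Largest percent error over a set of verification wells."""
    return max(percent_error(a, p) for a, p in zip(actuals, predictions))


# Hypothetical actual and predicted losses, for illustration only.
actual_losses = [20.0, 50.0, 80.0]
predicted_losses = [21.0, 49.0, 80.0]
worst = max_percent_error(actual_losses, predicted_losses)
```

A relative measure like this one grows quickly for small loss rates, which is consistent with the remark that an error of about one barrel per hour is not significant in absolute terms.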

Table 6 Verifications of the created networks

5 Conclusion

The following conclusions can be deduced based on this study:

  • Two neural networks were created to predict mud loss for natural and induced fractures. The networks were able to predict lost circulation prior to drilling within an acceptable range of error.

  • After various training algorithms were tested, the LM algorithm was selected to train both networks because it had the highest R², which makes it better suited for prediction.

  • The created neural networks can be used in reverse to limit mud loss in induced and natural fractures by setting the key drilling parameters and obtaining a target mud loss.

  • To further investigate and verify the created models, 24 new wells were used to test the models. The models’ outputs closely tracked the actual mud loss with a maximum error of 6.34%.

  • This work addresses the shortcomings of previous studies on estimating mud loss prior to drilling. This is the first study that provides a generalized model for estimating lost circulation prior to drilling that can be used worldwide.