
1 Introduction

Artificial neural networks (ANNs) and their application in geoscience and the petroleum industry are considered in this chapter. ANNs have proven to be an effective tool for solving nonlinear, complex engineering problems, particularly when no straightforward analytical or even numerical solution exists.

ANNs are typically applied when a problem is complex or non-straightforward, the relationships among the involved parameters are not clearly known, and no analytical solution or approach is available. However, an ANN can also be applied to linearly behaving problems that have an analytical solution, in which case its accuracy and functionality can be assessed directly.

In an ANN model, information flows through basic processing units, artificial neurons, which are connected to one another; the data are thus processed by a network of neurons.

In petroleum industry applications to date, the most commonly used ANN structure has been the feed-forward artificial neural network (FF-ANN), owing to its simplicity compared with other architectures. In an FF-ANN, information propagates in one direction only. Such a network can learn the relationship between inputs and outputs provided that adequate training data are supplied. FF-ANNs typically consist of three layers: an input layer, a hidden layer, and an output layer. Other ANN architectures are not discussed in this chapter because of their limited use in the petroleum industry. In terms of the number of hidden layers, both perceptrons and multilayer networks will be discussed.

Although it is possible to have more than one hidden layer in the network, a single hidden layer is preferred in many applications. The numbers of neurons in the input and output layers are normally dictated by the problem. The optimal number of neurons in the hidden layer (and even the number of hidden layers), however, must be determined by trial and error in order to obtain a network of appropriate size with the highest possible performance.

Network learning or training is achieved by adjusting the weights of the connections (links) between neurons so that the network produces outputs with acceptable errors. After training, network performance is evaluated in two further stages (testing and validation). If the network performs well in all three stages, it is accepted as a capable tool for simulation, i.e., for producing new results from new inputs. Recognition and prevention of over-fitting will also be discussed.

In most geoscience and petroleum engineering applications, the FF-ANN employs backpropagation as its training or learning algorithm. The algorithm is so named because, during training, the output error is propagated backward through the links between neurons in the previous layers; backpropagation thereby modifies the link weights so as to approach the desired output. The algorithm adjusts each weight of the network individually by following the direction of steepest gradient descent of the error function (usually the mean squared error, MSE). Because every ANN is evaluated by its errors, knowledge of the relevant statistical error measures such as the MSE is of great importance.

In this chapter, an attempt is made to give a comprehensive, applied discussion of neural networks from basics to application. An applied example of ANN use in the petroleum industry is therefore given after the basics are presented, followed by a short overview of other applications of the neural network approach.

2 Artificial Neural Networks (ANNs) Basics

2.1 ANN Structure in General

The structure and function of artificial neural networks are an extremely simplified version, or simulation, of the biological human brain. ANNs were developed as simplified mathematical models of biological neural networks (Fig. 1) under several assumptions: (1) information is processed in processing elements called neurons; (2) information is transferred between neurons over connection links; (3) each link is assigned a weight, which multiplies the information or signal passing over it; (4) each neuron adds a bias or threshold value to the weighted sum to yield a net value; and (5) the net value is passed to an activation or transfer function (normally a nonlinear function), whose output is the output of the neuron. The function of the whole ANN is simply the calculation of the outputs of all the neurons in the network.

Fig. 1

Biological and artificial neuron similarity

2.2 Artificial Neurons

A typical neuron structure is shown in Fig. 2. As can be seen in the figure, the inputs are multiplied by the corresponding weights and summed; a bias is then added to the sum as an error correction. The result is called the net value. The net value is passed through an activation function (Fig. 3), and the output of this function is the output of the neuron, which in turn serves as an input to the neurons of the next layer. This is what actually happens inside each individual neuron.
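
As a minimal illustration of the computation inside a single neuron described above, the following Python sketch forms the net value (weighted sum plus bias) and passes it through a logistic sigmoid activation. The input, weight, and bias values are arbitrary placeholders, not taken from the chapter.

```python
import math

def neuron_output(inputs, weights, bias):
    """Compute one neuron's output: logistic sigmoid applied to the net value."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias  # weighted sum plus bias
    return 1.0 / (1.0 + math.exp(-net))                       # activation function output

# Example call with arbitrary placeholder values
print(neuron_output(inputs=[0.5, -1.2, 3.0], weights=[0.8, 0.1, -0.4], bias=0.2))
```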

Fig. 2

A typical neuron structure and the neuron output by applying the activation function on net value

Fig. 3

Two popular sigmoid activation/transfer functions: a the logistic function \( f\left( x \right) = \frac{1}{{1 + {\text{e}}^{{ - ax}} }} \) and b the hyperbolic tangent function f(x) = tanh(x)

Please note that the weights and inputs can be treated as vectors, in which case their multiplication is in fact their inner product.

The weight allocated to each connection link, which multiplies the corresponding input of the neuron, represents the importance of that input to the problem; this importance or strength is transferred to the next layer through the link. For example, in predicting downhole bottom hole pressure (BHP), several input parameters are important and must be taken into consideration. A well-trained ANN allocates higher weights to the more important or stronger input parameters.

2.3 Activation Function

It should be noted that the activation function (Fig. 3) takes the net value as its input, and its output is the output of the neuron. Typical activation functions are as follows:

  • Threshold Function { f = 0 when x < 0, 1 when x ≥ 0},

  • Piecewise Linear Function {f = 0 for x ≤ −0.5, f = x + 0.5 for −0.5 < x < 0.5, and f = 1 for x ≥ 0.5},

  • Logistic Sigmoid Function \( \left\{ { f\left( x \right) = \frac{1}{{1 + {\text{e}}^{ - x} }}} \right\} \) which has values between 0 and 1,

  • Sigmoid Hyperbolic Tangent Function {f (x) = tanh (x)} with values between −1 and 1.

Among the activation functions, the sigmoid functions are the most common. The two common sigmoid functions are illustrated in Fig. 3, wherein a is a constant. Sigmoid functions are frequently used because they are continuous and have smooth, positive derivatives; unlike some other functions, they have valid derivatives at all points. The derivative of the logistic sigmoid function is smooth, as shown below:

$$ f^{'} \left( x \right) = \frac{{\varvec{a}\,{\text{e}}^{{ - \varvec{a}x}} }}{{(1 + {\text{e}}^{{ - \varvec{a}x}} )^{2} }} = \varvec{a}\,f(x) \times \{ 1 - f\left( x \right)\} $$
(1)

The derivative of the sigmoid hyperbolic tangent function is also smooth as shown below:

$$ f^{'} \left( x \right) = 1 - { \tanh }^{2} (x) = \{ 1 - f^{2} \left( x \right)\} $$
(2)

Note that the derivative of the activation (transfer) function is of great importance in an ANN, because the neuron activation function must be continuous and differentiable at every point for training to work well. Consequently, depending on the purpose, the logistic sigmoid and hyperbolic tangent sigmoid functions normally perform better than the rest.
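
The activation functions listed above, and the derivative identities of Eqs. (1) and (2), can be checked numerically with a short sketch such as the following (a = 1 is assumed for the logistic function; the test points are arbitrary):

```python
import math

def logistic(x, a=1.0):
    return 1.0 / (1.0 + math.exp(-a * x))

def logistic_deriv(x, a=1.0):
    f = logistic(x, a)
    return a * f * (1.0 - f)          # Eq. (1)

def tanh_deriv(x):
    return 1.0 - math.tanh(x) ** 2    # Eq. (2)

# Compare the analytic derivatives with a central finite-difference approximation
h = 1e-6
for x in (-2.0, 0.0, 1.5):
    num_log = (logistic(x + h) - logistic(x - h)) / (2 * h)
    num_tanh = (math.tanh(x + h) - math.tanh(x - h)) / (2 * h)
    print(x, abs(num_log - logistic_deriv(x)), abs(num_tanh - tanh_deriv(x)))
```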

Note also that in an ANN with no hidden layers, no nonlinear activation function is applied and the output of each neuron is simply the net value (the weighted sum plus bias). Such ANNs are usually used for simple problems.

3 ANN Structure and Feed-forward Artificial Neural Networks (FF-ANNs)

As said above, an ANN consists of neurons that are linked to each other and cooperate to transform inputs into outputs in the best possible manner. On a larger scale, the ANN structure is made up of an input layer, one or several hidden layers, and an output layer, each containing one or more neurons.

There is always exactly one input layer and one output layer. The number of hidden layers, however, can vary: none, one, two, or even more, depending on the complexity of the problem.

The FF-ANN is the first and simplest type of ANN introduced. In this type of ANN, as seen in Fig. 4, information flows from the input neurons, through the hidden neurons (if any), to the output neurons, in the forward direction only and without any cycles or loops. The simplicity of this network is the reason for its common use in most petroleum engineering applications to date. An FF-ANN is normally capable of learning the implicit relationship governing the inputs and outputs.

Fig. 4

Typical feed-forward artificial neural networks (FF-ANNs)

The numbers of neurons in the input and output layers are normally determined by the problem. The number of neurons in the hidden layer, however, must be specified by the user. In practice, the only way to determine the optimal number of hidden neurons, and thereby obtain an efficient network, is trial and error guided by the user's experience.

4 Backpropagation and Learning

In petroleum engineering applications, FF-ANNs employ backpropagation of errors as their training algorithm (a supervised learning method). Backpropagation gets its name from the fact that, during training, the output error is propagated backward to the links or connections between neurons in the previous layers. During this backward propagation of errors, the weights of the links between neurons are adjusted, and the process is repeated iteratively. In this way, the link weights are modified to obtain an output closer to the desired one, i.e., better learning.

To understand the learning mechanism of backpropagation, the expected real output and the output predicted by the ANN are compared and an error function is evaluated. Before training begins, initial values are assigned to the link weights. Then, as training starts, a number of data points are fed to the network so that it can be trained on these real examples. For simplicity, consider a perceptron Footnote 1 (an ANN with no hidden layers) consisting of 2 input neurons and one output neuron. The training data points take the form (Input1, Input2, V ex). The error function commonly used to measure the deviation between the expected real output (V ex) and the ANN output is the mean squared error (MSE), found for a single data point by:

$$ {\text{MSE}} = (V_{\text{ex}} - O_{\text{ANN}} )^{2} $$

where

\( V_{\text{ex}} \) :

Real expected output value corresponding to a specified input data point (a fixed value, already known)

\( O_{{ {\text{ANN}}}} \) :

Evaluated value by whole ANN using specified input data points (the output value of the output neuron of the network)

For the perceptron mentioned above (Fig. 5), suppose the data point (2, 1, 1) is used for training the network. The values 2 and 1 are the independent input values, and 1 is the dependent value, i.e., the expected real output. If the MSE values are plotted on the y-axis against the possible ANN output values (\( O_{\text{ANN}} \)) on the x-axis, the result is a parabola, as shown in Fig. 6. The minimum of the parabola corresponds to the global minimum of the error or MSE (the most favorable point, with error equal to zero). The nearer the ANN output (\( O_{\text{ANN}} \)) is to the expected real value (\( V_{\text{ex}} \)), the smaller the MSE.

Fig. 5

Example perceptron with 2 input neurons and 1 output neuron for MSE consideration

Fig. 6

Ideal 2-D graph of (MSE) error versus predicted output by ANN (\( O_{\text{ANN}} \)) considering a known data point of (2, 1, 1) as an example. The value of 1 is the expected real output. Thus, \( {\text{MSE}} = (1 - \varvec{O}_{\text{ANN}} )^{2} \)

Considering \( O_{\text{ANN}} = {\text{Input}}_{1} w_{1} + {\text{Input}}_{2} w_{2} \) (ignoring the bias for simplicity), the 3-D map of the error surface (MSE) can be drawn for the known data point (2, 1, 1) as an example (Fig. 7).

Fig. 7

Ideal 3-D map of error surface (MSE) versus x (w 1) and y (w 2) considering a data point of (2, 1, 1) for training a perceptron with 2 input neurons

The backpropagation algorithm adjusts each weight of the network individually by following the path of steepest gradient descent to minimize the error function (usually the MSE). In more detail, backpropagation calculates the gradient (derivative) of the ANN prediction error with respect to every weight in the network. For ease of understanding, the mechanism by which backpropagation reduces the MSE (steepest descent) is analogous to a mountain climber descending a hill by selecting the steepest path down at each point: the steepness of the hill and the path chosen at each point correspond, respectively, to the slope and the gradient of the error surface at that point.

Ideally, the error surface is assumed to have only one global minimum, but this is not necessarily the case; there may be many local minima and maxima, as shown in Fig. 8. In reality, the backpropagation learning algorithm (with gradient descent) converges to an error minimum that is only a local minimum and is not necessarily the global one. Optimization algorithms such as genetic algorithms, ant colony optimization, and particle swarm optimization have been combined with backpropagation to give it a greater capability to escape local error minima and reach the global minimum.

Fig. 8

Backpropagation (with its gradient descent error mitigation mechanism) can only find a local minimum of the error, which is not necessarily the global minimum of the error

If the initial point of the gradient descent process in backpropagation lies between a local minimum and a local maximum, training will eventually lead to that local minimum, which may not be desirable to the user. This dependence of backpropagation-trained ANNs on the initial starting point is an important limitation of the algorithm (Fig. 8). Running several trainings with different random initial weights is required to avoid becoming trapped in local minima.

Note that the link weights are the only variables the network itself can modify to minimize the error. To the authors' knowledge, modification of the ANN structure (number of neurons and hidden layers) with the objective of error reduction cannot be performed by the network itself; it is done through the user's trial and error or by automatic procedures.

Because backpropagation is based on calculating the gradient of the MSE with respect to all weights in the network, one of its requirements is that differentiable activation functions be used in the neurons. This is why sigmoid functions, being differentiable, are so popular in petroleum engineering and geoscience applications.

As said before, the MSE is found by:

$$ {\text{MSE}} = \frac{1}{2}(V_{\text{ex}} - O_{\text{ANN}} )^{2} $$
(3)

Please note that the ½ factor has been added so that the derivative has no coefficient: \( {\text{MSE}}^{ '} = (V_{\text{ex}} - O_{\text{ANN}} ) \).

For perceptrons (ANNs with no hidden layers), the activation function is linear, so \( O_{\text{ANN}} \) is simply the weighted sum of the inputs plus the bias:

$$ O_{\text{ANN}} = \mathop \sum \limits_{i = 1}^{n} W_{i} \times {\text{Input}}_{i} + {\text{Bias}} $$
(4)

where

\( {\text{Input}}_{i} \) :

Input value to output neuron from neuron i

\( W_{i} \) :

Weight corresponding to link i (between input neuron i and the output neuron)

For multilayer ANNs, \( O_{\text{ANN}} \) is found after application of a nonlinear activation function as follows:

$$ O_{\text{ANN}} = f\left( {\text{Net}} \right) = f(\mathop \sum \limits_{i = 1}^{n} W_{i} \times {\text{Input}}_{i} + {\text{Bias}}) $$
(5)

where

f :

Activation function (usually a sigmoid function)

Since backpropagation applies the gradient descent method for error reduction (as said before), the gradient of the MSE with respect to each weight in the network is calculated using the chain rule of partial derivatives as follows:

$$ \frac{{\partial {\text{MSE}}}}{{\partial W_{i} }} = \frac{{\partial {\text{MSE}}}}{{\partial O_{\text{ANN}} }} \times \frac{{\partial O_{\text{ANN}} }}{{\partial {\text{Net}}}} \times \frac{{\partial {\text{Net}}}}{{\partial W_{i} }} $$
(6)

where

\( \frac{{\partial {\text{Net}}}}{{\partial W_{i} }} \) :

Rate of change of the net value with respect to weight i. Since \( {\text{Net}} = \sum\nolimits_{i = 1}^{n} {W_{i} \times {\text{Input}}_{i} } + {\text{Bias}} \), this derivative equals \( {\text{Input}}_{i} \).

\( \frac{{\partial O_{\text{ANN}} }}{{\partial {\text{Net}}}} \) :

Rate of change of the output of the output neuron with respect to the net value. The activation function f is normally a sigmoid (logistic or hyperbolic tangent), i.e., \( O_{\text{ANN}} = f({\text{Net}}) = \frac{1}{{1 + {\text{e}}^{{ - {\text{Net}}}} }} \) or \( O_{\text{ANN}} = f\left( {\text{Net}} \right) = \tanh ({\text{Net}}) \). Accordingly, \( \frac{\partial f}{{\partial {\text{Net}}}} = O_{\text{ANN}} \times (1 - O_{\text{ANN}} ) \) for the logistic function and \( \frac{\partial f}{{\partial {\text{Net}}}} = 1 - O_{\text{ANN}}^{2} \) for tanh.

\( \frac{{\partial {\text{MSE}}}}{{\partial O_{\text{ANN}} }} \) :

Rate of change of the MSE with respect to the output of the output neuron. Since \( {\text{MSE}} = \frac{1}{2}(V_{\text{ex}} - O_{\text{ANN}} )^{2} \), this derivative equals \( V_{\text{ex}} - O_{\text{ANN}} \) up to sign (strictly, it is \( - (V_{\text{ex}} - O_{\text{ANN}} ) \); the sign is accounted for by updating the weights in the direction that reduces the error).

\( \frac{{\partial {\text{MSE}}}}{{\partial W_{i} }} \) :

Rate of change of the MSE with respect to weight i (the weight of the link from neuron i). Combining the three factors above, \( \frac{{\partial {\text{MSE}}}}{{\partial W_{i} }} = (V_{\text{ex}} - O_{\text{ANN}} ) \times O_{\text{ANN}} (1 - O_{\text{ANN}} ) \times {\text{Input}}_{i} \) for the logistic function and \( \frac{{\partial {\text{MSE}}}}{{\partial W_{i} }} = (V_{\text{ex}} - O_{\text{ANN}} ) \times (1 - O_{\text{ANN}}^{2} ) \times {\text{Input}}_{i} \) for tanh.

To summarize the above calculations, the gradient of MSE with respect to W i is equal to:

$$ \frac{{\partial {\text{MSE}}}}{{\partial W_{i} }} = (V_{\text{ex}} - O_{\text{ANN}} ) \times f^{'} \times {\text{Input}}_{i} $$
(7)

If the logistic sigmoid function is used for the activation function or f, we have:

$$ \frac{{\partial {\text{MSE}}}}{{\partial W_{i} }} = (V_{\text{ex}} - O_{\text{ANN}} ) \times O_{\text{ANN}} (1 - O_{\text{ANN}} ) \times {\text{Input}}_{i} $$
(8)

If the sigmoid hyperbolic tangent function is used for the activation function (f), we have:

$$ \frac{{\partial {\text{MSE}}}}{{\partial W_{i} }} = (V_{\text{ex}} - O_{\text{ANN}} ) \times (1 - O_{\text{ANN}}^{2} ) \times {\text{Input}}_{i} $$
(9)

\( \Delta W_{i} \) is found by multiplying \( \frac{{\partial {\text{MSE}}}}{{\partial W_{i} }} \) by the learning rate (\( \alpha \)). Thus, for a multilayer ANN , the value of weight change is equal to:

$$ \Delta W_{i} = \alpha (V_{\text{ex}} - O_{\text{ANN}} ) \times f^{'} \times {\text{Input}}_{i} $$
(10)

For the logistic sigmoid activation function, we have:

$$ \Delta W_{i} = \alpha (V_{\text{ex}} - O_{\text{ANN}} ) \times O_{\text{ANN}} (1 - O_{\text{ANN}} ) \times {\text{Input}}_{i} $$
(11)

For the tangent hyperbolic sigmoid activation function, we have:

$$ \Delta W_{i} = \alpha (V_{\text{ex}} - O_{\text{ANN}} ) \times (1 - O_{\text{ANN}}^{2} ) \times {\text{Input}}_{i} $$
(12)

For a perceptron ANN , f is linear (or \( f^{'} \) is equal to 1). Thus, the value of weight change is equal to:

$$ \Delta W_{i} = \alpha (V_{\text{ex}} - O_{\text{ANN}} ) \times {\text{Input}}_{i} $$
(13)
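
The gradient and weight-update relations of Eqs. (7)-(13) can be expressed compactly in code. The sketch below follows the chapter's convention, in which the weight change is \( \Delta W_{i} = \alpha (V_{\text{ex}} - O_{\text{ANN}} ) \times f^{'} \times {\text{Input}}_{i} \); the numerical values used in the example call are arbitrary.

```python
def weight_updates(inputs, v_ex, o_ann, alpha, activation="logistic"):
    """Return the weight changes dW_i = alpha * (V_ex - O_ANN) * f' * Input_i."""
    if activation == "logistic":
        f_prime = o_ann * (1.0 - o_ann)       # Eq. (8)/(11)
    elif activation == "tanh":
        f_prime = 1.0 - o_ann ** 2            # Eq. (9)/(12)
    else:                                      # perceptron: linear activation, f' = 1
        f_prime = 1.0                          # Eq. (13)
    return [alpha * (v_ex - o_ann) * f_prime * x for x in inputs]

# Example with arbitrary values
print(weight_updates(inputs=[2.0, 1.0], v_ex=1.0, o_ann=0.7, alpha=0.5))
```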

Please note that in the above calculations, for simplicity, the weight change has been evaluated using only one data point, and the MSE function considered so far assumed a single neuron in the output layer. In general, the mean squared error (MSE) over all training samples and output neurons can be evaluated as:

$$ {\text{MSE}} = \frac{1}{2}\mathop \sum \limits_{k = 1}^{T} \mathop \sum \limits_{j = 1}^{m} \left( {V_{{{\text{ex}},j}} (k) - O_{{{\text{ANN}},j}} (k)} \right)^{2} $$
(14)

where

T :

Number of training samples (known data points) given for training the network, each of the form \( \left( {{\text{Input}}_{1} ,{\text{Input}}_{2} , \ldots ;\;V_{{{\text{ex}},1}} ,V_{{{\text{ex}},2}} , \ldots ,V_{{{\text{ex}},m}} } \right) \)
m :

Number of output neurons or nodes

\( V_{{{\text{ex}},j}} (k) \) :

Expected real value of output neuron j for training sample k

\( O_{{{\text{ANN}},j}} \left( K \right) \) :

Predicted or estimated value by ANN
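
A direct transcription of Eq. (14), summing the squared deviations over all training samples and output neurons (with the ½ factor used in this chapter), might look as follows; v_ex and o_ann are assumed to be lists of equal shape, one row per training sample.

```python
def total_mse(v_ex, o_ann):
    """Eq. (14): 0.5 * sum over samples k and output neurons j of (V_ex - O_ANN)^2."""
    return 0.5 * sum(
        (v - o) ** 2
        for row_v, row_o in zip(v_ex, o_ann)   # loop over training samples k = 1..T
        for v, o in zip(row_v, row_o)          # loop over output neurons j = 1..m
    )

# Example: T = 2 samples, m = 2 output neurons (arbitrary values)
print(total_mse(v_ex=[[1.0, 0.0], [0.5, 1.0]], o_ann=[[0.9, 0.1], [0.4, 0.8]]))
```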

4.1 Perceptrons and Backpropagation Algorithm

The ANN structure with no hidden layers is called a perceptron. A perceptron is indeed just like a single neuron. Perceptrons are applicable only to simple linear problems. In perceptrons, only the simple bias/threshold operation is applied, namely adding a bias/threshold value to the weighted sum. A typical perceptron neural network (with no hidden layer) is shown in Fig. 9.

Fig. 9

A typical perceptron as an ANN example

Training is usually performed iteratively. An epoch is the process of presenting the entire set of training data points to the network, calculating the network output and error function, and modifying the network's weights for the next epoch. An epoch is composed of several iterations (presenting one data point to the network can be considered one iteration). Depending on the complexity of the problem and of the ANN, a large number of epochs may be required to train the network.

When the input data are presented to the network, the flow of information is forward (FF-ANN). However, as said previously, backpropagation propagates the error backward to adjust the weights.

The calculations corresponding to the composing neurons of the perceptron in Fig. 9 are shown below.

In each epoch, the net value (the weighted sum plus the bias) and the final output value of the neuron (O) are calculated by the perceptron as follows:

$$ {\text{Output}} = 1 \times \left( { - 3.6} \right) + \left( { - 1} \right) \times 5 + \left( { - 2} \right) \times 3 + 16.46 = 1.86 $$

Now, suppose the real expected output value is 1. In each subsequent epoch, for each data point, the weight values are changed so that the output value gets nearer to the real expected value. This process continues until the output values are as close as possible to the real expected values. In this simple ANN, the modified weights are evaluated as follows:

$$ \Delta W_{i} = \alpha (V_{\text{expected}} - O_{{ {\text{ANN}}}} ) \times {\text{Input}}_{i} $$
$$ W_{{i,{\text{modified}}}} = W_{{i,{\text{previous}}}} +\varvec{\alpha} (V_{\text{expected}} - O_{{ {\text{ANN}}}} ) \times {\text{Input}}_{i} $$
(15)

where

\( \Delta W_{i} \) :

Magnitude of change of the weight corresponding to link i (between neuron i and output neuron)

\( \varvec{\alpha} \) :

Learning rate (a constant)

\( V_{\text{expected}} \) :

Real expected output value corresponding to a specified input data point (already known)

\( O_{{ {\text{ANN}}}} \) :

Evaluated value by whole ANN using specified input data points (the output value of the output neuron)

\( {\text{Input}}_{i} \) :

Input value to output neuron from neuron i

\( W_{{i,{\text{previous}}}} \) :

Value of the weight in the previous epoch

Thus, assuming the learning rate \( \varvec{\alpha} \) to be equal to 1, the modified weights are as follows:

$$ \begin{aligned} W_{1,\bmod .} & = - 3.6 + 1 \times (1 - 1.86) \times 1 = - 4.46 \\ W_{2,\bmod .} & = 5 + 1 \times (1 - 1.86) \times ( - 1) = 5.86 \\ W_{3, \bmod .} & = 3 + 1 \times (1 - 1.86) \times ( - 2) = 4.72 \\ \end{aligned} $$

Thus, after training the network during one epoch using the data point {(1, −1, −2) as input and 1 as output}, the above modified weight values would be used in the next epoch.
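
The one-epoch perceptron update worked out above can be reproduced with a few lines of code; the weights, bias, learning rate, and the data point (1, −1, −2) → 1 are exactly those of the example.

```python
inputs = [1.0, -1.0, -2.0]          # data point inputs
v_expected = 1.0                    # expected real output
weights = [-3.6, 5.0, 3.0]          # initial weights (Fig. 9)
bias = 16.46
alpha = 1.0                         # learning rate

# Forward pass: for a perceptron the output is simply the net value
o_ann = sum(w * x for w, x in zip(weights, inputs)) + bias
print(o_ann)                        # 1.86

# Weight update (Eq. 15): W_i,modified = W_i + alpha * (V_expected - O_ANN) * Input_i
weights = [w + alpha * (v_expected - o_ann) * x for w, x in zip(weights, inputs)]
print(weights)                      # [-4.46, 5.86, 4.72]
```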

4.2 Multilayer ANNs and Backpropagation Algorithm

For more complex problems, the ANN should normally have hidden layers. Such neural networks are called multilayer ANNs and can be considered as extended perceptrons. The structure of a multilayer ANN with one hidden layer is shown in Fig. 10. Multilayer ANNs are not restricted to very simple linear problems; they can also be utilized for complex or non-straightforward problems.

Fig. 10

A multilayer ANN with one hidden layer (a logistic sigmoid function has been considered as the activation/transfer function)

Unlike perceptrons, multilayer ANNs do not rely only on a bias/threshold; activation functions (usually sigmoid functions) are applied. As can be seen in Fig. 10, this example ANN is composed of one input layer (3 neurons), one hidden layer (2 neurons), and one output layer (one neuron). For most engineering purposes, one or at most two hidden layers are normally adequate. The number of neurons in the hidden layers is crucial to the performance and functionality of the network; however, depending on the complexity of the problem, the number of hidden neurons should be optimized and not made too large, as this causes over-fitting or over-training.

For the multilayer ANN shown in Fig. 10, in each epoch, the net value and the output of each neuron in the hidden and output layers are calculated as follows:

  • H 1 : the net value and output of this neuron are found as follows:

    $$ {\text{Net}} = 1 \times \left( { - 3.6} \right) + \left( { - 1} \right) \times 5 + \left( { - 2} \right) \times 3 + 16.46 = 1.86 $$
    $$ {\text{Output}} = \frac{1}{{1 + {\text{e}}^{ - 1.86} }} = 0.86 $$
  • H 2 :

    $$ {\text{Net}} = 1 \times \left( { - 2} \right) + \left( { - 1} \right) \times 2 + \left( { - 2} \right) \times \left( { - 4.1} \right) - 11.85 = - 7.65 $$
    $$ {\text{Output}} = \frac{1}{{1 + e^{ - ( - 7.65)} }} = 4.75 \times 10^{ - 4} $$
  • O: the net value and output of this neuron (which is the output of the ANN) are found as follows:

    $$ {\text{Net}} = 0.86 \times \left( { - 1.1} \right) + 4.75 \times 10^{ - 4} \times 8.7 + 3.132 = 2.187 $$
    $$ {\text{Output}} = \frac{1}{{1 + {\text{e}}^{ - 2.187} }} = 0.9 $$

Now suppose the real expected value is 1. In each subsequent epoch, for each data point, the weight values are modified so that the output gets nearer to the real expected value (here, 1). This process continues until the output values are as close as possible to the real expected values. In this multilayer ANN, the modified weights are evaluated as follows:

$$ \Delta W_{i} = \alpha f^{'}_{\text{output}} \times (V_{\text{expected}} - O_{{ {\text{ANN}}}} ) \times {\text{Input}}_{i} $$

Replacing the logistic sigmoid activation function derivative for the output neuron yields:

$$ \Delta W_{i} =\varvec{\alpha} O_{{ {\text{ANN}}}} (1 - O_{{ {\text{ANN}}}} ) \times (V_{\text{expected}} - O_{{ {\text{ANN}}}} ) \times {\text{Input}}_{i} $$

The modified weight for this sigmoid function is as follows:

$$ W_{\text{modified}} = W_{\text{previous}} +\varvec{\alpha} O_{{ {\text{ANN}}}} (1 - O_{{ {\text{ANN}}}} ) \times (V_{\text{expected}} - O_{{ {\text{ANN}}}} ) \times {\text{Input}}_{i} $$
(16)

Replacing the hyperbolic tangent sigmoid activation function derivative for the output neuron yields:

$$ \Delta W_{i} =\varvec{\alpha} (1 - O_{{ {\text{ANN}}}}^{2} ) \times (V_{\text{expected}} - O_{{ {\text{ANN}}}} ) \times {\text{Input}}_{i} $$

The modified weight for the hyperbolic tangent sigmoid function is as follows:

$$ W_{\text{modified}} = W_{\text{previous}} +\varvec{\alpha} (1 - O_{{ {\text{ANN}}}}^{2} ) \times (V_{\text{expected}} - O_{{ {\text{ANN}}}} ) \times {\text{Input}}_{i} $$
(17)

where

\( O_{{ {\text{ANN}}}} \) :

Evaluated value by the whole ANN using the specified input data points (the output value of the output neuron). This is a fixed value within each epoch

\( f^{'} \) :

The derivative of the activation function corresponding to the output neuron

\( {\text{Input}}_{i} \) :

Input value of the neuron i

\( W_{{i,{\text{previous}}}} \) :

Value of the weight corresponding to link i in the previous epoch

\( W_{{i,{\text{modified}}}} \) :

Value of the weight corresponding to link i to be used for the next epoch

The input value is the value entering each neuron over the corresponding link. If the neuron feeding the link is in the hidden layer, the input value used is that neuron's net value (summation plus bias). In perceptrons (ANNs without hidden layers), which are applicable to simple linear problems, the activation function is effectively linear, so \( f^{'} \) equals 1 and the above relations reduce to Eq. (15).

Thus, assuming the learning rate \( \alpha \) to be equal to 1, the modified weights are evaluated as follows:

$$ W_{\text{modified}} = W_{\text{previous}} +\varvec{\alpha} O_{{ {\text{ANN}}}} (1 - O_{{ {\text{ANN}}}} ) \times (V_{\text{expected}} - O_{{ {\text{ANN}}}} ) \times {\text{Input}}_{i} $$
$$ W_{I1 - H1} = - 3.6 + 1 \times 0.9\left( {1 - 0.9} \right) \times (1 - 0.9) \times 1 = - 3.591 $$
$$ W_{I2 - H1} = 5 + 1 \times 0.9\left( {1 - 0.9} \right) \times (1 - 0.9) \times ( - 1) = 4.991 $$
$$ W_{I3 - H1} = 3 + 1 \times 0.9\left( {1 - 0.9} \right) \times (1 - 0.9) \times ( - 2) = 2.982 $$
$$ W_{I1 - H2} = - 2 + 1 \times 0.9\left( {1 - 0.9} \right) \times \left( {1 - 0.9} \right) \times 1 = - 1.991 $$
$$ W_{I2 - H2} = 2 + 1 \times 0.9\left( {1 - 0.9} \right) \times (1 - 0.9) \times ( - 1) = 1.991 $$
$$ W_{I3 - H2} = - 4.1 + 1 \times 0.9\left( {1 - 0.9} \right) \times \left( {1 - 0.9} \right) \times \left( { - 2} \right) = - 4.118 $$
$$ W_{H1 - O} = - 1.1 + 1 \times 0.9\left( {1 - 0.9} \right) \times (1 - 0.9) \times 1.86 = - 1.083 $$
$$ W_{H2 - O} = 8.7 + 1 \times 0.9\left( {1 - 0.9} \right) \times (1 - 0.9) \times ( - 7.65) = 8.631 $$

Thus, after training the network during one epoch using the data point {(1, −1, −2) as input and 1 as output}, the above modified weight values would be used in the next epoch. This process takes place for all the data points given to the network in the training stage.
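
The forward pass and the weight updates of the Fig. 10 example can likewise be reproduced in code. Note that the update rule below is the chapter's simplified scheme, in which every weight uses the output neuron's derivative term O_ANN(1 − O_ANN) and, for the hidden-to-output links, the hidden neuron's net value as the "input"; the numbers are those of the worked example (the printed results match the text to its rounding).

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

inputs = [1.0, -1.0, -2.0]
v_expected, alpha = 1.0, 1.0

# Weights and biases of Fig. 10
w_h1, b_h1 = [-3.6, 5.0, 3.0], 16.46
w_h2, b_h2 = [-2.0, 2.0, -4.1], -11.85
w_out, b_out = [-1.1, 8.7], 3.132

# Forward pass
net_h1 = sum(w * x for w, x in zip(w_h1, inputs)) + b_h1       # 1.86
net_h2 = sum(w * x for w, x in zip(w_h2, inputs)) + b_h2       # -7.65
out_h1, out_h2 = logistic(net_h1), logistic(net_h2)            # ~0.86 and ~4.75e-4
net_o = w_out[0] * out_h1 + w_out[1] * out_h2 + b_out          # ~2.187
o_ann = logistic(net_o)                                        # ~0.9

# Simplified weight update used in the chapter (logistic output neuron, Eq. 16)
delta = alpha * o_ann * (1.0 - o_ann) * (v_expected - o_ann)   # ~0.009
w_h1 = [w + delta * x for w, x in zip(w_h1, inputs)]           # ~[-3.591, 4.991, 2.982]
w_h2 = [w + delta * x for w, x in zip(w_h2, inputs)]           # ~[-1.991, 1.991, -4.118]
w_out = [w_out[0] + delta * net_h1, w_out[1] + delta * net_h2] # ~[-1.083, 8.631]
print(o_ann, w_h1, w_h2, w_out)
```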

Note that if the learning rate is too low, learning will be too slow; if it is too high, the weights and the objective function diverge and no effective learning occurs. For linear problems, suitable learning rates can be computed from the Hessian matrix (Bertsekas and Tsitsiklis 1996). It is also possible to adjust the learning rate during training; a number of proposals for doing so exist in the neural network literature, although most of them are not very effective. Among these works, Darken and Moody (1992) can be named.

Please note that the ANN structures discussed above are feed-forward ANNs (FF-ANNs) with the backpropagation algorithm for weight modification. These have been very popular in petroleum engineering practice and are therefore discussed further below.

5 Data Processing by ANN

Typically, data processing by an ANN is performed in three phases: training, testing/calibration, and validation/verification. For this purpose, the known data points (comprising input and output values) are divided into independent parts. In most of the petroleum engineering work performed by the authors, 60 % of the data points are used in the training phase and 40 % in testing; some users split the testing data further into (a) 20 % for a first test of the trained network and (b) 20 % for a second test. Depending on the number of data points available and the experience of the users (which is very important), different users adopt different divisions of the data; some take 70 % of the available data points for training and the rest for testing. The number of data points plays an important role in successful training, and erroneous ("fake") data points can hinder training, particularly if they are numerous.
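
As a minimal sketch of the 60/20/20 division described above (the exact fractions and the shuffling are the user's choice, not fixed by the chapter):

```python
import random

def split_data(data, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle and split data points into training, test, and validation sets."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_train = int(train_frac * len(data))
    n_test = int(test_frac * len(data))
    train = data[:n_train]
    test = data[n_train:n_train + n_test]
    validation = data[n_train + n_test:]          # remaining ~20 %
    return train, test, validation

# Example with 153 dummy data points
train, test, validation = split_data(range(153))
print(len(train), len(test), len(validation))      # 91 30 32
```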

If testing gives promising results, the user can proceed to the validation or verification phase, in which new input data are applied to the trained ANN to obtain the network outputs. These outputs are then considered to be very near to the real ones and can be used, for instance, in petroleum engineering.

Indeed, during the training phase, the desired network is developed; it is then tested. When training and testing have finished successfully (after training and simultaneously testing the network with the test data points), the network is applied to the validation data points in order to predict outputs from new input data.

5.1 Training

One of the main similarities between artificial neural networks (ANNs) and biological neural networks is the ability to learn, i.e., to be trained. As said earlier, the output of a neuron is a function of the net value (the weighted sum of the inputs plus a bias). After successful training, an ANN is capable of producing reasonable outputs for new inputs (during the testing and validation phases). The more reasonably the network responds to new data, the higher its functionality in terms of generalized prediction. Good training is thus of great importance for enhancing the functionality of an ANN.

Initially, the ANN allocates a random weight to each link. Then, during the training process, as the inputs are introduced to the network, the weights (of the links between all the neurons of the network) are adjusted so that the ANN output(s) end up very near the expected real output value(s). Depending on how close the ANN output is to the expected real output, the weights between the neurons are modified such that if the same inputs are introduced to the network again, it provides an output pattern closer to the expected real output.

Naturally, if the difference between the ANN output and the real data is considerable, more modification of the weights is performed during training. In this way, at any time in training, a kind of memory is attached to the neurons, which store the weights from past computations. Finally, if convergence is attained after training, the ANN is expected to give outputs in close proximity to the expected real outputs. For the perceptron of Fig. 9 and the corresponding calculations, it was shown above how the weights were modified (W 1,mod, W 2,mod, W 3,mod) during training.

In more detail, the training or learning stage of the network is accomplished by summing the errors at the output, creating a cost/risk function in terms of the network weights, and minimizing this cost function with respect to the weights. The cost function is basically based on the mean squared error (MSE) of the outputs. The mitigation of the MSE during training is an iterative process that extends over many epochs; as defined earlier, an epoch is the process of presenting the training data to the network and modifying the network's weights. Normally, many epochs are required to train a typical ANN.

Training continues, as said above, until the output values or patterns comply with quality criteria based on statistical error values; in other words, the stopping criterion is based on the minimization of the MSE (Cacciola et al. 2009). This type of training, wherein both inputs and actual outputs are supplied to the network, is called supervised training and is the more common in practice. In unsupervised training, only input values are supplied, and the ANN adjusts its own weights such that similar inputs yield similar outputs. Most neural network applications in the oil and gas industry are based on supervised training algorithms (Mohaghegh 2000).

5.2 Over-fitting

In the iteratively trained backpropagation ANNs commonly used in the petroleum industry, a serious problem arises after too much training of the network, so it is important to know when to stop training and proceed to the testing and validation phases. Too much training causes over-fitting or over-training. Over-fitting is also called memorization, because the network effectively memorizes the data points used in training and matches them very accurately, but loses the generalization capability required in the testing and validation phases. Note that over-fitting does not really apply to ANNs trained by non-iterative processes; however, the ANNs used in petroleum applications are normally iterative.

During the initial phase of training, both the training and validation set errors decrease with time (number of epochs); the training and validation set errors are the errors on the training and validation data points, respectively. When over-fitting of the ANN starts, however, the validation error begins to increase while the training error continues its decreasing trend (Fig. 11). The blue dots in Fig. 12 show the training data points (X, Y); the blue and red lines show, respectively, the true functional relationship and the function learned by the ANN. As can be seen, because of excessive training, there is a large difference between the true functional curve and the ANN learned function at points other than the training data points; this is representative of over-fitting. For simplicity, the input values are shown on the x-axis and the output values on the y-axis. The ANN weights and biases/thresholds that are saved are those corresponding to the minimum of the validation set error.

Fig. 11

The sudden increase of validation set error (black arrow) shows over-fitting in the graph of MSE versus time/epoch number

Fig. 12

Over-fitting. Excessive complexity of the ANN due to too much training (red curve) causes the values estimated by the ANN learned function at inputs other than the training values to differ greatly from the true functional relationship (blue curve). The x-axis and y-axis represent, respectively, the inputs and the ANN outputs

Figure 12 thus illustrates the problem of over-fitting in machine learning: the blue dots represent the training data, the blue line the true functional relationship, and the red line the learned (trained) function, which is clearly affected by over-fitting.

Note that if the test set error shows a considerable increase before the validation set error, over-fitting can also be suspected. In Fig. 13, for instance, there is no concern about over-fitting up to the stopping point at epoch 6, but beyond epoch 6 over-fitting would occur. The minimum validation error is at epoch 6, yet training was allowed to continue for 6 more epochs, to epoch 12 (as instructed by the user). It is therefore reasonable to rely on the ANN weights and biases adjusted at the stopping point at epoch 6. Keep in mind that as training progresses (epochs increase), the ANN model is becoming more and more complex.

Fig. 13

At epoch no. 10, the error of the second test (validation) set and the first test set increase, while the training set error is still decreasing. However, as the increase in the validation set error is not very sharp, this is not a sign of over-fitting

To prevent over-fitting, first, the network should not be so complex that it models even the noise. Second, before the training process terminates, it is common to stop it from time to time and check the generalization capability of the network using the test data points. If the network performs well in the test phase (e.g., if the test MSE is on the order of 0.001 or a little higher), no further training is required. Since the outputs of the test data points are not used in the training phase, the prediction or generalization capability of the network can be analyzed by comparing the network outputs (for the test inputs) with the real test output values.
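
A minimal sketch of the early-stopping idea described above: training stops (and the best weights are kept) when the validation set error has not improved for a chosen number of epochs. The network object and the train_one_epoch and evaluate_mse functions are hypothetical placeholders for the user's own training and error routines.

```python
def train_with_early_stopping(network, train_one_epoch, evaluate_mse,
                              train_data, validation_data,
                              max_epochs=1000, patience=6):
    """Stop training when the validation MSE has not improved for `patience` epochs."""
    best_mse, best_weights, epochs_without_improvement = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(network, train_data)              # one pass over the training data
        val_mse = evaluate_mse(network, validation_data)  # validation set error
        if val_mse < best_mse:
            best_mse, best_weights = val_mse, network.copy_weights()  # hypothetical method
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:    # over-fitting suspected
                break
    network.set_weights(best_weights)                     # restore the best weights found
    return best_mse
```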

5.3 Testing

After the training stage comes the testing or calibration stage, wherein the weights between the neurons are tested. Testing uses input-output pairs that were not utilized in the training stage. The difference between the desired and actual outputs shows whether enough training has taken place. Note that a common mistake among inexperienced users is to test/calibrate the ANN using the same data points used for training.

5.3.1 Validation

Validation is the last stage of ANN application (neural data processing). To summarize: during the training phase a trained network is developed; it is then tested; if it shows good performance during testing, it is recognized as a proper network to be used with new data. During validation, the user applies the trained network to obtain results: new input data (with unknown outputs) are supplied to the trained ANN in order to obtain the network outputs. Because the trained network performed well in the testing stage, the outputs obtained in the validation stage are considered trustworthy and are taken as being nearest to the real values.

The validation stage is often treated as a second test. If so, the validation and test error curves versus the number of epochs are plotted along with the training error curve. The validation and test error curves should also decline, just like the training error curve, before convergence. After convergence, the validation and test error curves rise while the training error curve continues to decline, indicating too much training (over-fitting). Therefore, if the training curve is declining while the validation and test curves are rising and show no decline before convergence, there is a problem with the network. To help remove the problem, it is suggested to follow the steps below:

  1. Re-check the data: among the data points, there might be some wrong or faulty ones. Try to detect them and delete them from the available data points.

  2. Reconsider the effective parameters: try to consider one or more new input parameters whose effect on the output might have been ignored. This requires reviewing the problem, identifying the effective parameters, and finding their corresponding values.

According to the authors' experience, the error can be reduced dramatically using the above two methods.

Sometimes the output does not depend on some of the inputs, or in other words its sensitivity to them is very small. Such cases can be identified by sensitivity analysis.

Please note that normalizing the data can also help the network perform better. Normalizing means rescaling all the available data values to the range 0-1, which is possible once the maximum and minimum values of the data are known. The normalized value is found by the following relation:

$$ \delta = \frac{{d - d_{ \hbox{min} } }}{{d_{ \hbox{max} } - d_{ \hbox{min} } }} $$
(18)
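
Equation (18) translates directly into code; a minimal sketch with arbitrary example values:

```python
def normalize(values):
    """Min-max normalization (Eq. 18): delta = (d - d_min) / (d_max - d_min)."""
    d_min, d_max = min(values), max(values)
    return [(d - d_min) / (d_max - d_min) for d in values]

# Example with arbitrary values
print(normalize([10.0, 25.0, 40.0, 55.0]))   # [0.0, 0.333..., 0.666..., 1.0]
```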

Note that some users take the MSE in the validation phase as the measure of model quality; others consider the error in the test phase.

6 ANN and Validation Error Statistical Parameters

As mentioned above, the MSE is the main error function used to compare neural network models and their performance, i.e., the discrepancy between the expected real values and the ANN outputs. Several other statistical parameters are also used to report this discrepancy, including the mean squared error (MSE), average root mean squared error (ARMSE), average percent error (APE), average absolute percent error (AAPE), Pearson correlation coefficient (R), squared Pearson correlation coefficient (R2), standard deviation (SD), and variance (V). Note that the variance is simply the square of the standard deviation.

R2 can be used along with R to analyze the network performance, and MSE and R2 together give an indication of the performance of the network. Generally, an R2 value greater than 0.9 indicates a very satisfactory model performance, a value in the range 0.8-0.9 signifies a good performance, and a value less than 0.8 indicates a rather unsatisfactory model performance (Coulibaly and Baldwin 2005).

Definitions and mathematical relations of these parameters are given in the Appendix.
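
The precise definitions used in this chapter are those of the Appendix; the sketch below computes a few of these statistics using their usual textbook forms (the AAPE form shown assumes non-zero target values), for arbitrary example data.

```python
import math

def error_statistics(targets, predictions):
    """Common error statistics; the chapter's exact definitions are in the Appendix."""
    n = len(targets)
    errors = [t - p for t, p in zip(targets, predictions)]
    mse = sum(e ** 2 for e in errors) / n
    aape = 100.0 / n * sum(abs(e / t) for e, t in zip(errors, targets))
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    r = cov / math.sqrt(var_t * var_p)                     # Pearson correlation coefficient
    return {"MSE": mse, "AAPE": aape, "R": r, "R2": r ** 2}

# Example with arbitrary values
print(error_statistics(targets=[1.0, 2.0, 3.0, 4.0], predictions=[1.1, 1.9, 3.2, 3.8]))
```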

7 Sequential Forward Selection of Input Parameters (SFS)

As said before, one of the challenges in neural network modeling is determining the important input parameters that affect the output. Experience and knowledge of the problem certainly help in determining the effective parameters or the selection criterion; however, it is better to use a systematic method to determine the impact of each parameter and rank them. Using sequential forward selection (SFS), each input parameter is first considered individually; the input parameters with the highest impact on the output are then identified in successive stages, so that the parameters can be ranked from highest to lowest impact. For clarity, an example is given below:

Suppose 5 input parameters, denoted a to e, have been identified as important parameters of a problem, and bottom hole pressure (BHP) is the output parameter. To rank the parameters by impact using SFS, in the first stage 5 neural network models are constructed, each using one single input parameter. To compare their performance, the errors of all these single-input networks are evaluated. As illustrated in Fig. 14, the network built with single parameter c, for instance, gives the smallest error of all; thus, parameter c is the most effective input parameter, i.e., the one with the highest impact on the output parameter (BHP).

Fig. 14

1st stage of SFS. The MSE error corresponding to the 5 neural networks each just trained by a single input parameter (a to e). The input parameter with the most impact is c

In the second stage, 4 neural networks using 2 input parameters are constructed such that input parameter c is always included (fixed) and the second parameter is varied among the 4 remaining parameters. In this way, 4 neural networks are constructed and their corresponding errors are calculated. Out of these 4 networks, the second parameter of the network with the least error is selected as the second most effective parameter; for instance, this could be parameter e (Fig. 15).

Fig. 15

The MSE error corresponding to the 4 neural networks, each trained with parameter c plus one of the remaining input parameters. The second most effective parameter is e

In the third stage, 3 neural networks are constructed using 3 input parameters such that parameters c and e, the two most influential parameters, are included (fixed) and the third input parameter is varied among the 3 remaining parameters. The SFS process is continued until all the input parameters are ranked (Fig. 16); collectively, 15 neural networks will have been considered to obtain the final ranking. After ranking the input parameters from maximum to minimum effect, the errors of the five networks corresponding to the stages of SFS (the first constructed with the single most effective input parameter, the second with the two most effective input parameters, and so on) are compared with each other. The result is Fig. 16, which shows how adding input parameters reduces the error.
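
A minimal sketch of the SFS procedure described above; build_and_score is a hypothetical placeholder that should train an ANN on the given subset of input parameters and return its error (e.g., the test-set MSE). The dummy scoring function in the usage example is for illustration only.

```python
def sequential_forward_selection(parameters, build_and_score):
    """Rank input parameters by repeatedly adding the one giving the lowest error."""
    selected, remaining, history = [], list(parameters), []
    while remaining:
        # Try each remaining parameter together with those already selected
        scores = {p: build_and_score(selected + [p]) for p in remaining}
        best = min(scores, key=scores.get)          # parameter giving the lowest error
        selected.append(best)
        remaining.remove(best)
        history.append((best, scores[best]))        # ranking with the error at each stage
    return history

# Example usage with a dummy scoring function (for illustration only)
dummy_errors = {"c": 1.0, "e": 2.0, "a": 3.0, "b": 4.0, "d": 5.0}
ranking = sequential_forward_selection(
    ["a", "b", "c", "d", "e"],
    lambda subset: sum(dummy_errors[p] for p in subset),
)
print(ranking)   # c first, then e, a, b, d for this dummy scoring
```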

Fig. 16

Final ranking of input parameters

8 Applied Examples of ANN in Geoscience and Petroleum Engineering

Among the many applications of ANN in geoscience and petroleum engineering, some applied examples are given as follows.

8.1 Multiphase Flow

Multi-phase flow problems are among the truly complex ones. One important parameter in two-phase flow in wells is the bottom hole pressure (BHP). In underbalanced drilling (UBD), the BHP should be kept below the formation pore pressure and above the wellbore collapse pressure (to prevent wellbore instability). The capability to predict BHP in UBD operations is therefore of great importance and could eliminate the need for downhole measuring devices. The existing approaches have their own defects, and their errors are not small, for the following reasons:

Because of the complexity of the phenomenon, no analytical solutions exist for multi-phase problems. Empirical multi-phase flow correlations sometimes over-predict, make extrapolation risky, and are susceptible to uncertainties.

The use of mechanistic modeling approaches has been increasing for multi-phase flow in pipes; however, their use in annular flow problems has not been very promising.

Given the lack of analytical solutions for multi-phase problems, an artificial neural network was used to evaluate BHP in multi-phase annular flow during UBD operations (Ashena et al. 2010). BHP was considered the output parameter. An attempt was then made to find the parameters on which the output parameter (BHP) depends; this was achieved by a brief study of multi-phase flow relations and by experience. Seven parameters were identified as effective:

  • Rate of liquid or diesel injection (in gallon per minute or gpm)

  • Rate of gas or nitrogen injection (in standard cubic foot per minute or scfm)

  • Measured depth or MD (in meter)

  • True vertical depth or TVD (in meter)

  • Inclination angle from the vertical (in degrees)

  • Well surface pressure (in psi)

  • Well surface temperature (in degrees Celsius)

Qualitatively speaking, the higher the value of item 1 (rate of liquid injection), the higher the BHP; the higher the value of item 2 (rate of gas injection), the lower the BHP. Items 3-5 (measured depth, true vertical depth, and inclination angle) were all included as input parameters because, collectively, they account for hole depth in vertical and directional wells. Normally, the neural network itself is responsible for recognizing the strength or importance of each input parameter.

A total of 163 data points with the above 7 input parameters and the measured BHP (the only output parameter) were collected. About 10 data points were found to cause excessive error; after their deletion, the ANN performance was enhanced.

The remaining 153 data points were divided for use in the training and testing phases: 60 % were allocated to training, 20 % to testing-1, and 20 % to validation (testing-2). Some additional data with known measured values were also used as simulation data. As the number of available data points was limited, only a limited number of neurons and layers was tried. A feed-forward ANN with the backpropagation learning algorithm was set up as the neural network, the input data were loaded from the MATLAB workspace, and all data values were normalized (to the range 0-1).

The training function was set to the Levenberg-Marquardt algorithm (trainlm); this algorithm interpolates between the Gauss-Newton algorithm and the gradient descent method. MSE was selected as the performance (error) function, and the hyperbolic tangent sigmoid function (tansig) was selected as the transfer (activation) function; the logistic sigmoid function (logsig) could also have been chosen. In the authors' experience, taking 2-2.5 times the number of inputs as the number of neurons in the hidden layer is a reasonable starting option. In this study, the case with 3 layers (2 hidden layers) had the least error and was therefore used to simulate the input data, although normally 1 hidden layer can meet engineering needs. As the number of layers was increased, the error decreased. However, as can be seen from case 3 to case 4 (Table 1), since the number of data points used to train the network (92) is not large, increasing the number of neurons does not necessarily decrease the MSE. Note that the weights must be reinitialized each time the neural network is run.
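
For readers working in Python, a roughly comparable setup can be sketched with scikit-learn. This is not the original MATLAB workflow: trainlm (Levenberg-Marquardt) is not available in scikit-learn, so the lbfgs solver is used here as a stand-in, and the hidden-layer sizes, X, and y are assumptions standing for the 7 input parameters and the measured BHP.

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

def train_bhp_model(X, y, seed=0):
    """Train a small tanh-activated feed-forward network on normalized data."""
    X = MinMaxScaler().fit_transform(X)                      # normalize inputs to 0-1
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, train_size=0.6, random_state=seed)             # 60 % training, 40 % held out
    X_test1, X_val, y_test1, y_val = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=seed)    # split held-out data 20 %/20 %
    model = MLPRegressor(hidden_layer_sizes=(15, 15),         # two hidden layers (~2x inputs)
                         activation="tanh", solver="lbfgs",
                         max_iter=2000, random_state=seed)
    model.fit(X_train, y_train)
    return model, model.score(X_test1, y_test1), model.score(X_val, y_val)  # R^2 scores
```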

Table 1 Different ANN structures used in the example and the validation error

Figure 17 shows the MSE versus the number of epochs for one case of the above example. The error (MSE) of training, first testing (test), and second testing (labelled validation in Fig. 17) decreases with the number of epochs until convergence is reached. Upon convergence, the errors of one or both of the tests start to increase while the training error keeps decreasing. The MSE reported is the error value at convergence (MSE = 0.007 for the case shown).

Fig. 17

MSE versus number of epochs (MSE: 0.007)

Please note that if the initial increase in the validation (second test) and test (first test) MSE curves at the convergence point is not too sharp, the neural network performance is considered valid. In Fig. 17, at the convergence point (epoch 3), the initial increase in the validation (second testing) MSE is not too sharp.

In Fig. 18, the values of the Pearson coefficient (R) for the training, first testing (test), and second testing (validation) are shown. The y-axis and x-axis show, respectively, the outputs predicted by the ANN and the expected real values (targets). The closer these values are to 1, the better the indication of convergence. In our example, these values are close to 1 and thus show good convergence. This graph is an important indication of the validity of the trained neural network. Suppose only the R² corresponding to training were near 1 (e.g., 0.9) while the first and second test values were not (e.g., 0.4); in that case, the network performance would be weak and the validity of the work would be in question. A small sketch showing how these R values can be computed is given after the figure.

Fig. 18
figure 18

The Pearson coefficients for the training, testing 1 (test), and testing 2 (validation). The y-axis and x-axis, respectively, show the outputs predicted by the ANN and the expected real values (targets). The values are near 1 and thus the ANN shows good, valid performance
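The R values of Fig. 18 can be reproduced for each data subset from the index vectors stored in the training record; the sketch below assumes the net, tr, inputsN, and targetsN variables from the earlier listings.

% Pearson coefficient (R) for each phase, using the split stored in tr.
outN   = net(inputsN);                        % normalized predictions for all points
phases = {'training', tr.trainInd; 'validation', tr.valInd; 'test', tr.testInd};
for k = 1:size(phases, 1)
    idx = phases{k, 2};
    C   = corrcoef(targetsN(idx), outN(idx));
    fprintf('%-10s  R = %.3f\n', phases{k, 1}, C(1, 2));
end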

To analyze the performance of the different neural network models, several statistical parameters were utilized: the average absolute percent error (AAPE), average percent error (APE), average root mean squared error (ARMSE), Pearson correlation coefficient (R), squared Pearson coefficient (R²), standard deviation (SD), and variance (V), as shown in Table 1. The required statistical parameters are described in the Appendix.
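For reference, a sketch of how such parameters can be computed for a pair of measured and predicted vectors is given below. The exact definitions used in the chapter are those of the Appendix; the snippet assumes the common textbook forms, and meas and pred are assumed placeholder vectors in the same engineering units.

% Error statistics for measured values (meas) and ANN predictions (pred).
err  = meas - pred;
APE  = mean(err ./ meas) * 100;        % average percent error (common definition)
AAPE = mean(abs(err ./ meas)) * 100;   % average absolute percent error
RMSE = sqrt(mean(err.^2));             % root mean squared error
SD   = std(err);                       % standard deviation of the error
V    = var(err);                       % variance of the error
C    = corrcoef(meas, pred);
R    = C(1, 2);                        % Pearson correlation coefficient
R2   = R^2;                            % squared Pearson coefficient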

The trained ANN above, having been validated successfully after the two testing stages, can be used for simulation with new input data.

8.2 Well Hydraulics

Drilling hydraulics simulation is a non-straightforward problem, and it becomes even more complicated in complex wells (slim holes and extended-reach wells). The problem involves many unknowns, and because conventional hydraulics simulators are time consuming, they are not suitable for real-time drilling applications.

Because of the above reasons, Fruhwirth et al. (2006) utilized 11 generations of a special ANN, which they called the completely connected perceptron (CCP), to predict pump pressure (hydraulic pressure losses).

It is noted that the CCP is a more general type of perceptron which can be regarded as a multilayer network, as seen in Fig. 19. In each generation, one hidden layer was added to the CCP (the first generation had no hidden layer and the 11th generation had 10 hidden layers). Of the available data, 50 % were devoted to training and 25 % each to testing and validation. The real data of two wells were utilized for training the ANN. The input drilling parameters considered include bit measured depth (MD in m), bit true vertical depth (TVD in m), block position (in m), rate of penetration (ROP in m/hr), average mud flow rate (in m3/s), average hook load (in kg), average drill string revolutions (RPM), average weight on bit (WOB in kg), and average mud weight out of hole (in kg/m3). Average pump pressure (in bar) was considered as the output parameter.

Fig. 19
figure 19

Growing completely connected perceptron (cVision Manual)

Utilization of a completely connected perceptron (CCP) has the advantage of eliminating the need to find an optimal number of hidden layers (Fruhwirth et al. 2006). The schematics of the different generations of the ANN are shown in Fig. 20; a minimal sketch approximating the CCP structure follows the figure.

Fig. 20
figure 20

Schematics of completely connected perceptron (cVision Manual)
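One possible way to approximate a CCP in MATLAB is sketched below. This is not the cVision implementation used by Fruhwirth et al.; it simply assumes that connecting the input to every layer and every layer to all later layers captures the idea of Fig. 19, with illustrative layer sizes and the placeholder names x (9-by-N matrix of drilling inputs) and t (1-by-N vector of pump pressures).

% Sketch: approximate a completely connected perceptron with 3 hidden layers.
net = feedforwardnet([8 8 8], 'trainlm');               % 3 hidden layers + output layer

net.inputConnect(:) = true;                              % input feeds every layer directly
net.layerConnect    = tril(true(net.numLayers), -1);     % each layer sees all earlier layers

net = configure(net, x, t);                              % size the new weight matrices
net = init(net);
[net, tr] = train(net, x, t);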

The results of modeling by the ANN were promising. As seen in Fig. 21, the RMS error (in bar) is low enough, and beyond a certain generation the error changes little (i.e., above a certain number of hidden layers, adding more layers has little effect). Recall that pump pressure is the output parameter. In Fig. 22a, the measured and ANN-calculated pump pressures (the output), together with the corresponding error, are shown versus time for one of the wells. In Fig. 22b, a good match of the data can be observed in the cross-plot of the measured (expected real) and calculated (predicted) output of the ANN.

Fig. 21
figure 21

The RMS error corresponding to learning or training, test and validation phases (Fruhwirth et al. 2006)

Fig. 22
figure 22

a Measured and ANN-calculated pump pressure and the corresponding error versus time; b cross-plot of calculated versus measured pump pressure (Fruhwirth et al. 2006)

Although the results obtained by the authors already had sufficient accuracy, the authors added further features to the input data, including the MD/TVD ratio, dog-leg severity (DLS), the sine and cosine of well inclination, and three features related to the Reynolds number (N Re). The Reynolds number was considered because it is related to the mud rheological properties, well geometry, and mud flow rate, all of which affect hydraulics.

In a further study, torque and bit measured depth were added as additional input parameters, and it was concluded that 95 % of the theoretically predicted standpipe pressure values lie within 10 bar of the real measured data (Fig. 23). Also, in the cross-plot of predicted (y-axis) versus measured pressure (x-axis) shown in Fig. 24, the squared Pearson coefficient was found to be 0.9677 (Todorov et al. 2014). The simulated pressure losses also clearly follow the trend of the measured standpipe pressure (Fig. 23). All of the above indicates the capability of ANNs in handling complex hydraulics problems.
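As a small illustration (not the authors' code), the two quoted figures can be computed from paired vectors of measured and predicted standpipe pressure, here assumed to be named pMeas and pPred (in bar).

% Fraction of predictions within 10 bar of the measurement, and R^2 of the cross-plot.
within10 = 100 * mean(abs(pPred - pMeas) <= 10);   % percentage within +/- 10 bar
C        = corrcoef(pMeas, pPred);
R2       = C(1, 2)^2;                              % squared Pearson coefficient
fprintf('%.1f %% within 10 bar, R^2 = %.4f\n', within10, R2);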

Fig. 23
figure 23

Measured standpipe pressure and calculated pressure drop by ANN versus time (Todorov and Thonhauser 2014)

Fig. 24
figure 24

Cross-plot of predicted (y-axis) versus measured (x-axis) standpipe pressure

8.3 Drilling Optimization

Many studies in the literature have addressed the application of ANNs to optimization of the drilling rate of penetration (ROP).

In a recent work, four ROP models with 6, 9, 15, and 18 input parameters (data channels), respectively, were constructed using ANNs with the objective of investigating the effect of vibration parameters on ROP (Esmaeili et al. 2012). In the first two models, the vibration parameters were not considered. In the third model, the formation mechanical properties were not considered, although the vibration parameters were taken into consideration.

In the first ROP model, only the drilling parameters (average and standard deviation of WOB, average and standard deviation of RPM of drill string, and average and standard deviation of torque) were considered as input parameters or channels.

In the second ROP model, the drilling parameters and the formation mechanical properties (uniaxial compressive strength (UCS), Young's modulus of elasticity, and Poisson's ratio) were considered as input parameters.

In the third ROP model, the drilling and vibration parameters were considered as input parameters. The vibration parameters include the standard deviation and the first- and second-order frequency moments of the X, Y, and Z components of vibration.

In the fourth ROP model, all the drilling parameters, mechanical properties, and vibration parameters were considered as input parameters or channels.

In Fig. 25, the input parameters (drilling, mechanical, and vibration) have been ranked for the fourth model using sequential forward selection (SFS), which shows that UCS, the standard deviation of the Z component of vibration, average WOB, etc. have the most impact on ROP. A minimal sketch of the SFS idea is given after the figure.

Fig. 25
figure 25

Ranking of input channels (or parameters) for the fourth ROP Model using sequential forward selection (Esmaeili et al. 2012)
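Sequential forward selection greedily adds, at each step, the channel that most improves the model. The sketch below is a generic MATLAB reimplementation of that idea, not the Esmaeili et al. code; X (an nChannels-by-N matrix of candidate inputs) and T (the 1-by-N measured ROP) are assumed placeholders, and the small network size is illustrative.

% Greedy sequential forward selection of input channels for an ROP model.
nCh       = size(X, 1);
selected  = [];                                 % channels chosen so far
remaining = 1:nCh;

for step = 1:nCh
    bestErr = inf;  bestCh = NaN;
    for c = remaining                           % try adding each remaining channel
        idx = [selected c];
        net = feedforwardnet(5, 'trainlm');     % small illustrative network
        net.trainParam.showWindow = false;
        [net, tr] = train(net, X(idx, :), T);
        e = tr.best_vperf;                      % validation MSE with this subset
        if e < bestErr, bestErr = e; bestCh = c; end
    end
    selected  = [selected bestCh];              % keep the channel that helped most
    remaining = setdiff(remaining, bestCh);
    fprintf('Step %d: added channel %d (validation MSE %.4g)\n', step, bestCh, bestErr);
end
% "selected" now lists the channels in decreasing order of marginal importance.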

Eventually, Esmaeili et al. (2012) reached the results in Table 2.

Table 2 Summary of ROP models and corresponding RMS errors (Esmaeili et al. 2012)

Most of the common ROP models (Bourgoyne, Warren, Maurer, etc.), which are based only on the drilling and mechanical properties, do not consider the vibration parameters. The RMS error corresponding to the second ROP model, which is analogous to those models, was calculated as 0.208 mm/min. In the fourth ROP model, wherein the vibration parameters were also considered, the RMS error was 0.194 mm/min, which shows that the vibration parameters have an important effect on ROP modeling.

In Fig. 26, the cross-plots of actual ROP values (predicted by the ANN) versus desired ROP values (measured or expected real values) are shown; the match is best for the fourth model.

Fig. 26
figure 26

Actual ROP (predicted by ANN) versus desired ROP (expected real or measured values) for all four ROP models, a–d (Esmaeili et al. 2012)

Similar drilling optimization studies using neural networks include Gidh et al. (2012) on bit wear prediction, Lind and Kabirova (2014) on prediction of drilling problems, and Bataee et al. (2010).

8.4 Permeability Prediction

Permeability is one of the important parameters that has been estimated using ANNs in many applications.

In an application of ANN to predict permeability (Naeeni et al. 2010), the input parameters considered in the FF-ANN with backpropagation algorithm include depth, CT (true conductivity), DT (sonic travel time), NPHI (neutron porosity), RHOB (bulk density), SGR (spectral gamma ray), NDSEP (neutron–density log separation), northing of the well, easting of the well, SWT (water saturation), and FZI (flow zone indicator). Three hidden layers (with 13, 10, and 1 neurons, respectively) were selected for the network. In order to estimate permeability, determination of the different hydraulic flow units (HFUs) is the first stage, because it provides the FZI values required as network inputs. At the same time, FZI, which is related to pore size and geometry, has the following relation with RQI:

$$ {\text{FZI}} = \frac{\text{RQI}}{{\varnothing_{\text{z}} }} $$
(19)

It is also noted that RQI and \( \varnothing_{\text{z}} \) are evaluated by:

$$ {\text{RQI}}(\upmu{\text{m}}) = 0.031\sqrt {\frac{K}{{\varnothing_{\text{e}} }}} $$
(20)
$$ \varnothing_{\text{z}} = \frac{{\varnothing_{\text{e}} }}{{1 - \varnothing_{\text{e}} }} $$
(21)

where \( \varnothing_{\text{e}} \) is the effective porosity and K is the permeability.

Thus, rearranging Eq. 19 and taking logarithms of both sides leads to:

$$ \text{Log}\left( \text{RQI} \right) = \text{Log}\left( \varnothing_{\text{z}} \right) + \text{Log}\left( \text{FZI} \right) $$
(22)

Therefore, \( {\text{Log}}\left( {\text{FZI}} \right) \) is the intercept of the plot of RQI versus \( \varnothing_{\text{z}} \) on a log–log scale. Thus, first, RQI (rock quality index) and normalized porosity (\( \varnothing_{\text{z}} \)) values were computed from core data. Second, RQI was plotted against \( \varnothing_{\text{z}} \) on a log–log scale. Third, straight unit-slope lines intersecting the line \( \varnothing_{\text{z}} = 1 \) were selected so that reasonable initial guesses of the intercepts, i.e., the mean FZI values, were obtained.

Fourth, the data points are assigned to the nearest straight lines, and the points belonging to each line are considered as a different HFU (a clustering step); this stage requires considerable time. Fifth, the intercept (FZI) of each HFU is recalculated using regression and is compared with the previously guessed value. If the difference between the recalculated FZI value and the guessed value is considerable, it is necessary to return to the fourth stage so that the guessed FZI can be updated.
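A simplified MATLAB sketch of these stages is given below; it is not the Naeeni et al. implementation. The names phiE (fractional effective porosity) and K (core permeability) are assumed placeholder row vectors of core data, and the initial mean-FZI guesses are illustrative values only.

% Stages 1-2: compute RQI and normalized porosity from core data (Eqs. 20-21).
RQI  = 0.031 * sqrt(K ./ phiE);        % rock quality index
phiZ = phiE ./ (1 - phiE);             % normalized porosity index
FZI  = RQI ./ phiZ;                    % flow zone indicator (Eq. 19)

% Stage 3: initial guesses of mean FZI, i.e. intercepts of unit-slope lines
% with phiZ = 1 on the log-log plot (illustrative values).
fziGuess = [0.5 1 2 4 8];

% Stage 4: assign each core point to the nearest guessed line (simple clustering).
[~, hfu] = min(abs(log10(FZI(:).') - log10(fziGuess(:))), [], 1);   % HFU index per point

% Stage 5: recalculate the mean FZI of each HFU and compare with the guess;
% in practice this step is iterated until the change becomes small.
fziNew = arrayfun(@(h) 10^mean(log10(FZI(hfu == h))), 1:numel(fziGuess));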

Following the above procedure, the different rock types could finally be determined with good accuracy (the HFU determination is complete), and permeability values could be estimated, as shown in Figs. 27 and 28.

Fig. 27
figure 27

A total of 15 rock types were identified in the data collected from the reservoir (Naeeni et al. 2010)

Fig. 28
figure 28

For each rock type of the reservoir, one permeability curve versus porosity has been determined (Naeeni et al. 2010)

In the second step, an ANN was trained with well log and RQI data as input and permeability data as output. Eventually, the trained ANN can be simulated with well log data from uncored wells to estimate their permeability values.

As shown in the cross-plot of predicted versus measured permeability values in Fig. 29, the performance of the ANN in permeability prediction was promising, with a Pearson coefficient of 0.85 for the validation phase.

Fig. 29
figure 29

Predicted permeability values (ANN output, y-axis) versus core permeability values (x-axis) (Naeeni et al. 2010)

In another study with a rather similar approach (Kharrat et al. 2009), the predicted permeability values show a good match with the core-measured values. The FZI profile of the well versus depth is shown on the left of Fig. 30 (with dots marking the core-measured values). The predicted and core-measured permeability values versus depth are illustrated on the right of Fig. 30.

Fig. 30
figure 30

FZI profile of one well based on log and core data (left) and its permeability predictions (Kharrat et al. 2009)

Similar reservoir engineering studies using neural networks include Thomas and Pointe (1995) on conductive fracture identification, Lechner and Zangl (2005) on reservoir performance prediction, Adeyemi and Sulaimon (2012) on predicting wax formation, and Tang (2008) on reservoir facies classification.

9 Conclusion

Artificial neural networks (ANNs) have been shown to be an effective tool for solving complex problems that have no analytical solution. In this chapter, the following topics have been covered: artificial neural network basics (neurons, activation functions, ANN structure), feed-forward ANNs, backpropagation and learning, perceptrons and backpropagation, multilayer ANNs and the backpropagation algorithm, data processing by ANNs (training, over-fitting, testing, validation), ANNs and statistical parameters, and some applied examples of ANNs in geoscience and petroleum engineering.