1 Introduction

Supplying uninterrupted electric power to end users is a challenging task for power system engineers. The cause of a fault may be beyond human control, but it is essential to detect the type of fault and locate it accurately. A fault is generated when conductors come into contact with each other or with ground. The different kinds of faults are the single line-to-ground (SLG), line-to-line (LL), double line-to-ground (LLG) and triple-line (LLL) faults. SLG, LL and LLG faults are unbalanced faults, whereas the LLL fault is a balanced one. A short circuit drives high fault current through the power system network, causing overheating and mechanical stress on the equipment of the power system [1,2,3,4,5].

An open circuit occurs when one or more phases are disconnected, circuit breakers or isolators open, or a cable joint or jumper breaks at a tower tension point. An open circuit on one or two phases produces unbalanced currents in the system, which cause heating of rotating machines. Such abnormal conditions should be cleared by a protective scheme.

Information was collected from journals, books, conference papers, articles, online libraries and databases such as IEEE, IET, ELSEVIER, Taylor & Francis, Google Scholar, Scopus, EBSCO and many other relevant websites.

The remaining part of the paper is organized as follows. Section 2 presents conventional methods used for transmission line protection, Sect. 3 covers signal processing techniques, Sects. 4 and 5 explain various artificial intelligence (AI)-based techniques and some special techniques, Sect. 6 discusses the strengths and weaknesses of all the techniques, Sect. 7 gives a comparative study of fault classification, location and detection on transmission lines, Sect. 8 presents practical case studies and a comparison of fault detection, classification and location methods, and Sect. 9 gives the conclusions drawn from the survey, followed by references.

2 Conventional methods used for transmission line protection

Impedance measurement-based method and travelling wave method are the conventional methods broadly used for detection, classification and localization of the fault in a transmission line [6].

In impedance-based methods, the distance relay operation is accurate and reliable for low values of fault impedance, but it cannot be relied upon for high fault impedance [7]. Depending on the number of line terminals from which current and voltage signals are collected, single-end or two-end impedance methods have been proposed. The concept of the single-ended impedance-based method is to identify the fault location by calculating the apparent impedance seen from one terminal of the line. The fault location error of this method is high owing to high fault path impedance, line loading, source parameters and shunt capacitance [8,9,10,11].

The two-ended impedance-based method is implemented to eliminate the above problems. Its disadvantage is a high computational burden, since current and voltage signals must be measured at both ends of the line; however, it improves the fault location accuracy [12,13,14].

Travelling wave-based methods determine the distance to the fault by correlating the forward and backward waves travelling along the transmission line. These methods locate high-resistance faults with small error, but their computational burden, cost and high sampling frequency requirement make them difficult to apply in practice [15,16,17].

3 Signal processing technique

3.1 Discrete wavelet transform

The time-scale decomposition of the DWT is performed by a digital filtering process, up to level 8, as displayed in Fig. 1. The fault signal is fed to a low pass filter (LPF) and a high pass filter (HPF), and the outputs are down-sampled by a factor of 2. The detail coefficient (d1) is the output of the HPF at level one, and the approximation coefficient (a1) is the output of the LPF at level one. The process continues to decompose the approximation until only two samples are left. Owing to its low computational burden, DWT is widely used for fault analysis in transmission lines [18,19,20,21].

Fig. 1

Decomposition tree of discrete wavelet transform (DWT)

The DWT of a signal x(t) is defined as

$${\text{DWT}}(x,m,n) = \frac{1}{{\sqrt {a_{0}^{m} } }}\sum\limits_{k} {x(k)\psi^{*} } \left( {\frac{{k - nb_{0} a_{0}^{m} }}{{a_{0}^{m} }}} \right)$$
(1)

where k, m and n are integers. \(a_{0}^{m}\) and \(nb_{0} a_{0}^{m}\) represent the dilation (scale) and translation (time shift) parameters, respectively. \(b_{0}\) and \(a_{0}\) are constants taken as 1 and 2, respectively [89].
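The filter-bank decomposition of Fig. 1 can be sketched as follows. This is a minimal illustration using the Haar filter pair (an assumption made for brevity; the cited studies typically use longer Daubechies wavelets), not a production implementation:

```python
import numpy as np

def dwt_level(x, h=(0.70710678, 0.70710678), g=(0.70710678, -0.70710678)):
    """One DWT level: filter with LPF h / HPF g, then down-sample by 2."""
    h, g = np.asarray(h), np.asarray(g)
    a = np.convolve(x, h[::-1])[1::2]   # approximation coefficients (LPF branch)
    d = np.convolve(x, g[::-1])[1::2]   # detail coefficients (HPF branch)
    return a, d

# Decompose a 256-sample signal until only two samples remain, as described above
x = np.sin(2 * np.pi * 50 * np.arange(256) / 1000.0)
coeffs = []
a = x
while len(a) > 2:
    a, d = dwt_level(a)
    coeffs.append(d)                     # keep d1, d2, ... at each level
```

The detail coefficients `coeffs[0], coeffs[1], ...` correspond to d1, d2, … in Fig. 1, each half the length of the previous level.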

3.2 Wavelet transform

The wavelet transform allows flexible resizing of the window to extract time–frequency information and is applicable to non-stationary signals, but detailed information can be obtained only at a higher sampling frequency [22,23,24].

Mathematical expression of a signal x(t) in WT is given below

$$W_{\tau ,s} (t) = \frac{1}{\sqrt s }\int\limits_{ - \infty }^{\infty } {x(t)\psi \left( {\frac{t - \tau }{s}} \right)} {\text{d}}t$$
(2)

The translation and scale factors are denoted as \(\tau\) and s, respectively. \(\psi (t)\) is the mother wavelet [25].

3.3 Wavelet packet transform (WPT)

WPT is implemented to capture the important information contained in the high-frequency bands: both the approximation coefficient (a1) and the detail coefficient (d1) are decomposed, so the full frequency band is covered. To limit the computational burden, the signal is decomposed up to 4 levels, as shown in Fig. 2. WPT therefore gives better frequency resolution and a larger number of features than DWT [26,27,28].

Fig. 2

Wavelet packet transform (WPT)

WPT of a signal x(t) is

$$W_{b}^{n,a} = 2^{a/2} \int {f(t)\psi_{n} (2^{ - a} t - b){\text{d}}t}$$
(3)

The wavelet position and scale are denoted by b and a, respectively, and \(\psi_{n}\) is the mother wavelet. The nth and (n + 1)th level decompositions are related as

$$W_{k}^{2n,a + 1} = \sum {h(b - 2k)W_{b}^{n,a} }$$
(4)
$$W_{k}^{2n + 1,a + 1} = \sum {g(b - 2k)W_{b}^{n,a} }$$
(5)

where h(i) and g(i) are the wavelet quadrature mirror filter coefficients [94].
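The full-tree decomposition of Eqs. (4)–(5) can be sketched by splitting every node, not only the approximation branch as in DWT. The Haar quadrature mirror pair below is an illustrative assumption:

```python
import numpy as np

H = np.array([0.70710678, 0.70710678])   # quadrature mirror LPF h
G = np.array([0.70710678, -0.70710678])  # quadrature mirror HPF g

def split(node):
    """Split a node into its low- and high-frequency children (Eqs. 4-5 form)."""
    lo = np.convolve(node, H[::-1])[1::2]
    hi = np.convolve(node, G[::-1])[1::2]
    return lo, hi

# Full WPT tree to 4 levels: every node (not only the LPF branch) is split
x = np.random.default_rng(0).standard_normal(64)
nodes = [x]
for level in range(4):
    nodes = [child for node in nodes for child in split(node)]
# 4 levels give 2**4 = 16 sub-bands covering the full frequency band
```

The 16 terminal nodes are the sub-band coefficient sets from which features are extracted.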

3.4 S Transform

The S transform combines properties of the wavelet transform and the short-time Fourier transform (STFT). It is applied to non-stationary signals, using a window whose width varies inversely with frequency. Its main advantage over other signal processing tools is that it provides information on the time, frequency and phase angle of the signal. The S transform is also robust to noise, so it is widely used in power system fault studies [29,30,31].

The mathematical expression for S transform [32] for signal x(t) is specified as:

$$S\left( {\tau ,f} \right) = \int\limits_{ - \infty }^{\infty } {x\left( t \right)} \frac{\left| f \right|}{{\sqrt {2\pi } }}e^{{\frac{{ - \left( {\tau - t} \right)^{2} f^{2} }}{2}}} e^{ - i2\pi ft} {\text{d}}t$$
(6)

Here t and f signify time and frequency, respectively, and τ is the control parameter that positions the Gaussian window. The phase (ϕ) and frequency (F) of the signal [32] are defined in (7) and (8), respectively.

$$\phi \left( {\tau ,f} \right) = \arctan \left\{ {\frac{{{\text{imag}}(S(\tau ,f))}}{{{\text{real}}(S(\tau ,f))}}} \right\}$$
(7)
$$F\left( {\tau ,f} \right) = \frac{1}{2\pi }\frac{\partial }{\partial \tau }\left\{ {2\pi f\tau + \phi \left( {\tau ,f} \right)} \right\}$$
(8)

4 Artificial intelligence (AI)-based techniques

Artificial intelligence (AI)-based methods are used for the detection, classification and location of faults in a transmission network. Support vector machine (SVM), decision tree (DT) classifier, extreme learning machine (ELM)-based methods, artificial immune system (AIS), self-organizing map (SOM), auto-regressive neural network (ARNN), artificial neural network (ANN)-based techniques, adaptive neuro-fuzzy inference system (ANFIS), adaptive resonance theory (ART), fuzzy logic control (FLC), expert system techniques and many more AI-based techniques are used in power systems. These methods solve complex multiobjective nonlinear problems faster and with less error. This paper focuses on signal processing techniques combined with artificial intelligence methods to accurately detect, locate and classify faults in a transmission network.

4.1 Artificial neural network (ANN)

Owing to its simplicity, good generalization and adaptive nature, ANN is widely used for fault location, classification and detection in power system transmission lines, in both real-time and offline applications. The ANN is trained with the faulty signal as input to diagnose the fault condition at its output [33].

4.2 Back-propagation neural network (BPNN)

BPNN is effectively used for pattern recognition: the network weights are adjusted by propagating the error backwards until it is minimized. The main problem is selecting the number of hidden layers and the number of neurons in each layer. Too many neurons and hidden layers make the training process slow, whereas too few make it diverge [34]. BPNN is used to identify faults in the transmission network. Detection and classification of faults using BPNN and a probabilistic neural network (PNN) classifier with the S transform is proposed in [35]: six statistical features are extracted from the current or voltage signals by the S transform and then classified by the PNN. Under noisy conditions, however, the fault classification accuracy is reduced.

4.3 Probabilistic neural network (PNN)

In PNN algorithms, the training examples are classified according to their probability density function (PDF) values. Mathematically, the PDF is given below [36]

$$f_{k} (X) = \frac{1}{{N_{k} }}\sum\limits_{j = 1}^{{N_{k} }} {\exp \left( { - \frac{{\left\| {X - X_{kj} } \right\|}}{{2\sigma^{2} }}} \right)}$$
(9)

The output vector of the hidden layer H is modified as

$$H_{h} = \exp \left( { - \frac{{\sum\nolimits_{i} {\left( {X_{i} - W_{ih}^{xh} } \right)^{2} } }}{{2\sigma^{2} }}} \right)$$
(10)
$${\text{net}}_{j} = \frac{1}{{N_{j} }}\sum\limits_{h} {W_{hj}^{hy} H_{h} } \,{\text{and}}\,N_{j} = \sum\limits_{h} {W_{hj}^{hy} ,}$$
(11)

If \(net_{j} = \max_{k} (net_{k} )\), then \(y_{j} = 1\); else \(y_{j} = 0\). The numbers of inputs, hidden units, outputs, training examples and clusters are denoted by i, h, j, k and N, respectively. The smoothing parameter (standard deviation) and the input vector are denoted by σ and X, respectively. The distance between the vectors X and \(X_{kj}\) is \(\left\| {X - X_{kj} } \right\| = \sum\nolimits_{i} {(X - X_{kj} )^{2} }\). The connection weight between the input layer X and the hidden layer H is \(W_{ih}^{xh}\), and that between the hidden layer and the output layer Y is \(W_{hj}^{hy}\), as shown in Fig. 3 [86].

Fig. 3

Structure of PNN

The input vector is classified into two classes in a Bayesian-optimal manner: the Bayes decision rule is applied to the estimated PDFs. Every PDF is positive and integrates to one over all values [86].
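The PNN decision rule of Eq. (9) — estimate the Parzen PDF of each class and pick the largest — can be sketched on illustrative two-dimensional feature clusters (the clusters and smoothing parameter are assumptions for the demonstration):

```python
import numpy as np

def pnn_classify(X_train, y_train, x, sigma=0.5):
    """Return the class whose Parzen PDF estimate (Eq. 9) at x is largest."""
    scores = {}
    for k in np.unique(y_train):
        Xk = X_train[y_train == k]
        d2 = np.sum((Xk - x) ** 2, axis=1)        # squared distances, as in Eq. (9)
        scores[k] = np.mean(np.exp(-d2 / (2 * sigma ** 2)))
    return max(scores, key=scores.get)

# Two toy clusters standing in for "healthy" / "faulty" feature vectors
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
pred0 = pnn_classify(X, y, np.array([0.1, -0.1]))   # near cluster 0
pred1 = pnn_classify(X, y, np.array([2.1, 1.9]))    # near cluster 1
```

No iterative training is involved: the training examples themselves form the pattern layer, which is why PNN trains in a single pass.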

4.4 Feedforward neural network (FFNN)

An FFNN consists of an input layer, hidden layers and an output layer, forming a multilayer perceptron trained with the back-propagation learning algorithm. The error produced by the network is minimized by adjusting its weights and biases. The FFNN structure is shown in Fig. 4 [37].

Fig. 4

Feedforward neural network algorithm structure

If \(x_{1} ,x_{2} , \ldots ,x_{i} , \ldots ,x_{n}\) are the input variables of neuron j, its output \(u_{j}\) is given below.

$$u_{j} = \varphi \left( {\sum\limits_{i = 1}^{N} {w_{ij} x_{i} + b_{j} } } \right)$$
(12)

where \(\varphi\) is the activation function, \(b_{j}\) is the bias of neuron j and \(w_{ij}\) is the weight connecting the ith input to the jth neuron [168].
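Equation (12) amounts to a single matrix–vector product per layer. A minimal forward-pass sketch with illustrative (untrained) weights:

```python
import numpy as np

def neuron_output(x, w, b, phi=np.tanh):
    """Eq. (12): u_j = phi(sum_i w_ij * x_i + b_j), vectorized over a layer."""
    return phi(np.dot(w, x) + b)

# One hidden layer of 4 tanh neurons feeding a single linear output neuron
rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 0.25])              # example feature vector
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal(4), 0.1
hidden = neuron_output(x, W1, b1)            # 4 hidden activations, each in [-1, 1]
y = float(np.dot(W2, hidden) + b2)           # network output
```

Back-propagation would adjust `W1, b1, W2, b2` to reduce the output error; only the forward pass of Eq. (12) is shown here.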

4.5 Radial basis function neural network (RBFNN)

RBFNN contains three layers: input, hidden and output. The input layer signals are fed to the hidden layer, where the nonlinear radial basis function neurons act; the output layer contains linear neurons. Figure 5 shows the RBFNN architecture. The output Y is expressed as

$$Y = f(x) = w_{0} + \sum\limits_{i = 1}^{m} {W_{i} \phi (D_{i} )}$$
(13)

where x is the input vector, \(w_{0}\) the bias, \(W_{i}\) the weight parameters and m the number of nodes in the hidden layer. The radial basis function of \(D_{i}\) is a Gaussian function:

$$\phi (D_{i} ) = \exp \left( {\frac{{ - D_{i}^{2} }}{{\sigma^{2} }}} \right)$$
(14)

where \(\sigma\) is the cluster radius. RBFNN locates faults in a transmission line better than BPNN [38, 39].

Fig. 5

Architecture of radial basis function network
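Equations (13)–(14) can be sketched directly; the centres, weights and cluster radius below are illustrative values, not trained ones:

```python
import numpy as np

def rbf_output(x, centers, sigma, w0, W):
    """Eqs. (13)-(14): Y = w0 + sum_i W_i * exp(-D_i^2 / sigma^2), D_i = ||x - c_i||."""
    D2 = np.sum((centers - x) ** 2, axis=1)   # squared distances to the centres
    phi = np.exp(-D2 / sigma ** 2)            # Gaussian basis activations, Eq. (14)
    return w0 + np.dot(W, phi), phi           # linear output layer, Eq. (13)

centers = np.array([[0.0, 0.0], [1.0, 1.0]])  # hidden-node centres (illustrative)
y, phi = rbf_output(np.array([0.0, 0.0]), centers,
                    sigma=1.0, w0=0.5, W=np.array([2.0, -1.0]))
```

A query coinciding with a centre activates that basis function fully (φ = 1), while distant centres contribute exponentially less.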

4.6 Fuzzy logic-based methods

Fuzzy logic works on the principle of 'if–then' relationships. It is used for fault classification, location and detection in a transmission network. Its computational burden is low, but its accuracy is affected by the fault resistance and the fault inception angle [40, 41].

A simple fuzzy scheme consists of fuzzification, a fuzzy inference system, a fuzzy rule base and defuzzification, as displayed in Fig. 6 for fault classification. In the fuzzification stage, crisp numbers are mapped into fuzzy sets. The fuzzified inputs are then fed to the fuzzy inference system, which, following the given fuzzy rule base, outputs the fault type. Finally, in the defuzzification stage, the fuzzy output set is mapped back into a crisp fault type [42].

Fig. 6

Fuzzy logic system for fault classification
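A minimal sketch of the fuzzification → rule base → decision chain; the triangular membership functions and per-unit current thresholds are illustrative assumptions, not values from the cited schemes:

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def classify(i_pu):
    """Fuzzify a per-unit phase-current magnitude, then apply a two-rule base."""
    mu_low = tri(i_pu, 0.0, 0.5, 1.5)     # membership in LOW current
    mu_high = tri(i_pu, 1.0, 3.0, 5.0)    # membership in HIGH current
    # rule base: "if current is HIGH then fault", "if current is LOW then no fault"
    return "fault" if mu_high > mu_low else "no fault"

out_healthy = classify(0.9)   # near-nominal current
out_fault = classify(2.8)     # heavy over-current
```

A real scheme would use several inputs (per-phase currents, zero-sequence current) and defuzzify to one of the SLG/LL/LLG/LLL classes; the structure is the same.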

4.7 Adaptive neuro-fuzzy inference system (ANFIS)

An adaptive network is a multilayer network in which every node applies a particular function to the incoming data, and the function varies from node to node. ANFIS has the structure of a neural network and the function of a fuzzy inference system. It is used for fault location and classification in a transmission line with good accuracy, although the fuzzy logic makes training the data set slower. One method uses wavelet multiresolution analysis (MRA) to extract the important features and then applies ANFIS to locate the fault in the transmission line [43]. In [44], ANFIS is compared with a fuzzy inference system (FIS) and an artificial neural network (ANN) for locating faults in the system. Error analysis by Monte Carlo simulation shows that the ANFIS algorithm is more reliable and precise than the FIS and ANN methods over simulations of various faults, but its computational efficiency suffers during data processing and more memory is required for the calculation.

For the fuzzy inference system, x and y are the two inputs and fi is the output. Mathematically, the two Takagi–Sugeno fuzzy if–then rules are given below.

  • Rule 1: if x is A 1 and y is B 1, then \(f_{1} = p_{1} x + q_{1} y + r_{1}\)

  • Rule 2: if x is A 2 and y is B 2, then \(f_{2} = p_{2} x + q_{2} y + r_{2}\)

where the fuzzy sets are denoted by Ai and Bi, and pi, qi and ri are design parameters.

The architecture of ANFIS consists of 5 layers as shown in Fig. 7 [21].

Fig. 7

Structure of ANFIS

Layer 1: Every node in this layer is an adaptive node with a node function.

$$o_{i}^{1} = \mu_{Ai} (x)$$
(15)
$$o_{i}^{1} = \mu_{Bi} (y)$$
(16)

where x and y are the inputs to the node and \(o_{i}^{1}\) is the membership grade of Ai or Bi, the linguistic labels associated with the node function. \(\mu_{Ai} (x)\) can adopt any bell-shaped function, such as

$$\mu_{Ai} (x) = \frac{1}{{1 + \left\{ {\left( {\frac{{x - C_{i} }}{{a_{i} }}} \right)^{2} } \right\}^{bi} }}$$
(17)

Layer 2: Every node is a fixed node that multiplies the incoming signals. The firing strength is the weight of the if–then rule. The output is

$$W_{i} = \mu_{Ai} (x)\mu_{Bi} (y)$$
(18)

Layer 3: This layer normalizes the firing strengths.

$$\overline{{W_{i} }} = \frac{{W_{i} }}{{W_{1} + W_{2} }}$$
(19)

Layer 4: Every node in this layer is an adaptive (square) node with a node function.

$$\overline{{W_{i} }} f_{i} = \overline{{W_{i} }} (p_{i} x + q_{i} y + r_{i} )$$
(20)

Layer 5: The single node in this layer computes the overall output as the summation of all incoming signals:

$$o_{i}^{5} = \sum\limits_{i} {\overline{{W_{i} }} } f_{i} = \frac{{\sum\nolimits_{i} {W_{i} f_{i} } }}{{\sum\nolimits_{i} {W_{i} } }}$$
(21)
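The five layers above reduce to a single forward pass for a two-rule system. In the sketch below, the bell-function parameters and Takagi–Sugeno consequents are illustrative untrained values:

```python
def bell(x, a, b, c):
    """Generalized bell membership function, Eq. (17)."""
    return 1.0 / (1.0 + (((x - c) / a) ** 2) ** b)

def anfis_forward(x, y):
    # Layer 1-2: fuzzify both inputs and multiply to get firing strengths, Eq. (18)
    w1 = bell(x, 1.0, 2.0, 0.0) * bell(y, 1.0, 2.0, 0.0)   # rule 1 (sets near 0)
    w2 = bell(x, 1.0, 2.0, 2.0) * bell(y, 1.0, 2.0, 2.0)   # rule 2 (sets near 2)
    # Layer 3: normalize, Eq. (19)
    wb1, wb2 = w1 / (w1 + w2), w2 / (w1 + w2)
    # Layer 4: Takagi-Sugeno consequents f_i = p_i x + q_i y + r_i, Eq. (20)
    f1 = 1.0 * x + 0.5 * y + 0.0
    f2 = -1.0 * x + 0.5 * y + 3.0
    # Layer 5: weighted sum, Eq. (21)
    return wb1 * f1 + wb2 * f2

out = anfis_forward(0.0, 0.0)    # dominated by rule 1, so close to f1 = 0
```

Training ANFIS adjusts the bell parameters (a, b, c) and the consequents (p, q, r) by a hybrid of gradient descent and least squares; only the inference pass is shown here.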

4.8 Decision tree (DT)

DT is a data mining classification technique. For high-dimensional pattern classification, DT selects at each node the attribute that best divides the data. Attributes are split into several branches recursively until the termination criterion is reached and the classification is achieved. Mathematically, the DT technique is

$$\begin{aligned} \overline{X} = \left\{ {X_{1} ,X_{2} , \ldots \ldots \ldots \ldots X_{m} } \right\}^{T} \hfill \\ X_{i} = \left\{ {x_{1} ,x_{2} , \ldots x_{ij} , \ldots \ldots x_{in} } \right\} \hfill \\ S = \left\{ {S_{1} ,S_{2} , \ldots S_{i} , \ldots \ldots S_{m} } \right\}^{T} \hfill \\ \end{aligned}$$
(22)

The number of available observations is m and the number of independent variables is n; the m-dimensional vector S holds the variable predicted from \(\overline{X}\) and \(X_{i}\), and T denotes the vector transpose. The n-dimensional independent variables \(x_{i1} ,x_{i2} , \ldots x_{ij} , \ldots \ldots x_{in}\) form the pattern vector \(X_{i}\).

The target of DT is to predict S based on the observation of \(\overline{X}\). DTs show different levels of accuracy when developed from different \(\overline{X}\), and obtaining the optimal tree is difficult because of the large size of the search space. The algorithm therefore grows a DT through a sequence of locally optimal decisions about which features to use to partition the data set \(\overline{X}\). The optimally sized DT \(T_{k0}\) is generated according to the optimization problem below

$$R(T_{k0} ) = \min_{k} \{ R(T_{k} )\} ,\quad k = 1,2,3 \ldots \ldots K$$
(23)
$$R(T) = \sum\limits_{t \in T} {\{ r(t)p(t)\} }$$
(24)

The misclassification error of the tree \(T_{k}\) is denoted \(R(T_{k} )\), and the optimal DT model \(T_{k0}\) minimizes it. The binary tree T satisfies \(T \in \{ T_{1} ,T_{2} ,T_{3} , \ldots \ldots ,T_{k} ,t_{1} \}\), where K is the index number of the tree, t is a tree node and t 1 is the root node. The re-substitution estimate of the misclassification error at node t is r(t), and the probability of falling into node t is p(t). \(T^{L}\) and \(T^{R}\) denote the subtrees forming the left and right partition sets. Figure 8 shows the binary partition of the lattice L into disjoint left/right sets, and a two-dimensional binary classification is shown in Fig. 9. The left set contains the lattice elements whose feature q is below the threshold; in the right set, feature q exceeds the threshold [110].

Fig. 8

Binary classification of DT

Fig. 9

Boundary-based classification
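The split-selection idea behind Eqs. (23)–(24) can be sketched for a single node (a decision stump): enumerate candidate (feature, threshold) pairs and keep the one minimizing the re-substitution misclassification error. The toy data are illustrative:

```python
import numpy as np

def best_stump(X, y):
    """Pick the (feature q, threshold) split minimizing misclassification, Eq. (24) form."""
    best = (None, None, 1.0)
    n = len(y)
    for q in range(X.shape[1]):
        for thr in np.unique(X[:, q]):
            left, right = y[X[:, q] <= thr], y[X[:, q] > thr]
            # each leaf predicts its majority class; sum r(t)*p(t) over the two leaves
            err = sum(min(np.sum(part == 0), np.sum(part == 1))
                      for part in (left, right) if len(part)) / n
            if err < best[2]:
                best = (q, thr, err)
    return best

X = np.array([[0.1, 5.0], [0.2, 4.0], [0.9, 5.5], [1.1, 4.2]])  # two features
y = np.array([0, 0, 1, 1])                                      # class labels
q, thr, err = best_stump(X, y)   # feature 0 separates the classes perfectly
```

A full tree applies this selection recursively to each resulting partition until a termination criterion is met.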

4.9 Support vector machine

SVM is a statistical method used for computational learning [45]. Sequential minimal optimization (SMO) for the kernel support vector machine is implemented in LIBSVM, which also supports regression (nu-SVR). It always yields a global solution rather than a local minimum. The error bound is controlled by the cost parameter, and the kernel width is controlled by the gamma parameter. SVM gives good accuracy for fault location and classification in a transmission line. The SVM structure is shown in Fig. 10, where K(·) is the kernel function, M the number of support vectors, F(x) the decision function, W the weights and b the bias [46,47,48].

Fig. 10

SVM structure algorithms

Consider n-dimensional inputs \(s_{i}\) (i = 1, 2,…, M), where M is the number of samples. The output is Oi = 1 for class 1 and Oi = − 1 for class 2.

Mathematically, the hyperplane is

$$f(s) = w^{T} s + b = \sum\limits_{j = 1}^{n} {w_{j} s_{j} + b = 0}$$
(25)

where w is an n-dimensional vector and b is a scalar parameter. The position of the hyperplane is decided by the magnitudes of w and b, as shown in Fig. 11.

Fig. 11

SVM hyperplane for classification

If Oi = 1, the constraint is \(f(s_{i} ) \ge 1\), and if Oi = − 1, then \(f(s_{i} ) \le - 1\); combining the two,

$$O_{i} f(s_{i} ) = O_{i} (w^{T} s + b) \ge + 1$$
(26)

The margin between the two classes is \(2\left\| w \right\|^{ - 1}\). The optimization problem for the optimal hyperplane is then [45]

$${\text{Minimize}}\,\frac{1}{2}\left\| w \right\|^{2} + C\sum\limits_{i = 1}^{M} {\xi_{i} }$$
(27)
$${\text{Subject}}\,{\text{ to}}\,O_{i} (w^{T} s + b) \ge 1 - \xi_{i} \,{\text{and}}\,\xi_{i} \ge 0\,{\text{for}}\, \, i = 1,2, \ldots M$$
(28)

The optimal bias \(b^{*}\) is

$$b^{*} = - \frac{1}{2}\sum\limits_{{SV_{s} }} {O_{i} \alpha_{i}^{*} } (v_{1}^{T} s_{i} + v_{2}^{T} s_{i} )$$
(29)

where \(v_{1}\) and \(v_{2}\) are arbitrary support vectors of class 1 and class 2, respectively.

The decision function is

$$f(s) = \sum\limits_{{SV_{s} }} {\alpha_{i} O_{i} } s_{i}^{T} s + b^{*}$$
(30)

Data samples are classified as

$$s \in \left\{ \begin{array}{ll} {\text{Class-}}1, & f(s) \ge 0 \hfill \\ {\text{Class-}}2, &{\text{otherwise }} \hfill \\ \end{array} \right\}$$
(31)
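Equations (30)–(31) can be sketched with a linear kernel. The support vectors, multipliers and bias below are hand-set to realize a simple separating line; they are not the result of SMO training:

```python
import numpy as np

def svm_decision(s, sv, alpha, O, b):
    """Eq. (30): f(s) = sum_i alpha_i O_i s_i^T s + b, then the class rule of Eq. (31)."""
    f = np.sum(alpha * O * (sv @ s)) + b
    return (1 if f >= 0 else 2), f

# Hand-set support vectors realizing (roughly) the boundary s1 + s2 = 1
sv = np.array([[1.0, 1.0], [0.0, 0.0]])
alpha = np.array([1.0, 1.0])        # Lagrange multipliers (illustrative)
O = np.array([1.0, -1.0])           # class labels +1 / -1
b = -1.0

cls_a, f_a = svm_decision(np.array([2.0, 2.0]), sv, alpha, O, b)  # class 1 side
cls_b, f_b = svm_decision(np.array([0.0, 0.0]), sv, alpha, O, b)  # class 2 side
```

Replacing the inner product `sv @ s` with a kernel evaluation K(s_i, s) gives the nonlinear (e.g. RBF-kernel) decision function.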

4.10 Random forest

A random forest is a large ensemble of de-correlated tree predictors, where every tree depends independently on a sampled random vector. Instability and noise are the major disadvantages of a single tree, but when grown suitably deep, trees have comparatively small bias, so they are perfect candidates for ensemble combination: they can capture complex interactions and benefit fully from the combination-based variance reduction [49]. Random selection of the features used to split each node, and resampling of the training set to grow each tree, yield error rates that are de-correlated and noise tolerant. The error of the forest converges to a limit as the number of trees becomes large [50].

The main concept of the ensemble tree-growing process is that, for the nth tree (n ≤ \(T_{\text{tree}}\), the total number of trees in the ensemble), a random vector \(\theta_{n}\) is generated, independent of the previous random vectors \(\theta_{1} , \ldots ,\theta_{n - 1}\) but drawn from the same distribution. A single tree is grown from the training set M and the attribute set \(\theta_{n}\), giving the classifier Sn(Y, \(\theta_{n}\)), where Y is the input vector. In random split selection, \(\theta\) specifies the number of trees \(T_{\text{tree}}\) and the number of attributes \(T_{\text{try}}\) considered at each split, out of the \(T_{a}\) attributes in the training set M.

A random forest is the collection of tree-structured classifiers {Sn(Y, \(\theta_{n}\)), n = 1,…, \(T_{\text{tree}}\)}, where the \(\theta_{n}\) are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input Y.

All the individual trees are combined into an ensemble prediction: the class for which most trees vote is returned as the classification of the ensemble.

$$c_{\text{RF}}^{{T_{\text{tree}}}} (Y) = {\text{majority}}\,{\text{vote}}\,{\hat{C}_{n}} (Y),n = 1 \ldots \ldots \ldots n_{\text{tree}}$$
(32)

\(\hat{C}_{n} (Y)\) is the class prediction of the nth RF tree. For classification, the class that most trees vote for is returned as the prediction of the ensemble. The prediction probability is the average of the relative class frequencies over the single trees:

$$p_{\text{RF}}^{{T_{\text{tree}} }} \left( {c_{\text{RF}}^{{T_{\text{tree}} }} \in \{ M,I\} |Y} \right) = \frac{1}{{T_{\text{tree}} }}\sum\limits_{1}^{{T_{\text{tree}} }} {P_{{{\text{Sn(}}\theta n,T )}} } (c_{n} \in \{ M,I\} |Y)$$
(33)

where \(P_{{{\text{Sn(}}\theta n,T )}}\) is the probability assigned to Y by the RF tree Sn(Y, \(\theta_{n}\)). A conventional decision tree essentially defines an explicit decision boundary, and a case E is classified into class c if E falls into the decision region corresponding to c. The class probability p(c|E) is normally estimated by the proportion of instances of class c in the leaf into which E falls [51, 52].
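The majority vote of Eq. (32) and the averaged class probability of Eq. (33) reduce to a vote count over the individual tree predictions. The tree outputs below are illustrative stand-ins for trained trees:

```python
import numpy as np

def forest_predict(tree_preds):
    """Eq. (32): majority vote over the T_tree trees; Eq. (33): vote fractions."""
    votes = np.bincount(tree_preds)            # votes per class label 0, 1, ...
    return int(np.argmax(votes)), votes / len(tree_preds)

# Hypothetical predictions of 7 trees for one input Y (classes 0 = healthy, 1 = fault)
preds = np.array([1, 0, 1, 1, 0, 1, 1])
cls, proba = forest_predict(preds)             # class 1 wins with probability 5/7
```

Growing the trees themselves (bootstrap resampling plus random feature selection at each split) is omitted; only the aggregation step of Eqs. (32)–(33) is shown.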

4.11 Extreme learning machine (ELM)

The extreme learning machine has only one hidden layer to optimize, and its main advantage is that no tuning of the hidden layer is required. Figure 12 shows the structure of ELM. A kernel function and a nonlinear activation function are applied to scale the data to a definite range. Weight and bias adjustment is not required in ELM, so it is faster and performs better than conventionally trained networks. ELM is used for fault location and classification in the power system network [53, 54].

Fig. 12

Structure of ELM

The ELM technique is explained using a training data set \(\left\{ {x_{i} ,y_{i} } \right\}\), where \(x_{i} \in \Re^{p}\) and \(y_{i} \in \Re^{q}\), i = 1,…, n, and n is the number of samples. Mathematically, a single-hidden-layer feedforward neural network is expressed as

$$\sum\limits_{i = 1}^{l} {\beta_{i} } f\left( {w_{i} \cdot x_{j} + b_{i} } \right) = O_{j} ,\quad j = 1, \ldots ,n$$
(34)

where f(x) is the activation function, \(w_{i}\) is the weight vector connecting the input neurons to the ith hidden neuron and \(\beta_{i}\) is the weight connecting the ith hidden neuron to the output neurons. \(b_{i}\) is the bias (threshold) of the ith hidden neuron, and the output for the jth input is denoted by \(O_{j}\).

With n samples and L hidden neurons, the ELM can approximate the samples with zero error [55]:

$$\sum\limits_{j = 1}^{n} {\left\| {O_{j} - t_{j} } \right\|} = 0$$
(35)

So, (34) turns out to be:

$$\sum\limits_{i = 1}^{l} {\beta_{i} } f\left( {w_{i} \cdot x_{j} + b_{i} } \right) = t_{j} ,\quad j = 1, \ldots ,n$$
(36)

In matrix form, (36) is stated as:

$$H\beta = T$$
(37)

where

$$H = f\left( {w_{i} \cdot x_{j} + b_{i} } \right)$$
(38)

H is the hidden layer output matrix, and the input weights w and biases b are chosen randomly, leading to the least-squares problem

$$\min\nolimits_{\beta } \left\| {H\beta - T} \right\|$$
(39)

The solution of (39) is stated as:

$$\hat{\beta } = H^{\dag } T$$
(40)

where \(H^{\dag }\) signifies the Moore–Penrose generalized inverse of the hidden layer output matrix H, the output weight matrix is symbolized by β and the target matrix is symbolized by T.
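Training per Eqs. (34)–(40) is a single least-squares solve: draw random hidden weights, build H, then β = H†T. A minimal sketch on an illustrative 1-D regression target standing in for a fault-location mapping:

```python
import numpy as np

def elm_train(X, T, L=20, seed=0):
    """Random hidden layer (Eq. 38) + Moore-Penrose output weights (Eq. 40)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((L, X.shape[1]))     # input weights, never tuned
    b = rng.standard_normal(L)                   # hidden biases, never tuned
    H = np.tanh(X @ W.T + b)                     # hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T                 # beta = H^+ T, Eq. (40)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W.T + b) @ beta

# Fit a smooth 1-D function with 20 random hidden units (illustrative target)
X = np.linspace(-1, 1, 50).reshape(-1, 1)
T = np.sin(np.pi * X).ravel()
W, b, beta = elm_train(X, T)
err = np.max(np.abs(elm_predict(X, W, b, beta) - T))   # training residual
```

Because only β is solved for, there is no iterative back-propagation, which is the source of ELM's speed advantage noted above.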

5 Emerging computational intelligence techniques

5.1 Stationary wavelet transform (SWT)

SWT is similar to WPT and is also called the non-decimated wavelet transform. The main difference in SWT is that the decomposed coefficients are not down-sampled (the filters are up-sampled instead), so the coefficients at every level hold the same number of samples as the original signal. DWT is not shift-invariant, whereas SWT has this property [128]. Filtering and feature extraction by SWT are applied in [130], where the decaying DC offset current due to the current transformer, high-order harmonics and noise are removed by SWT.

5.2 Principal component analysis (PCA)

The main advantage of PCA is that it maps the data from the original high-dimensional space to a low-dimensional subspace, reducing the dimension of the data while retaining most of its variance [145]. In [140], wavelet transform (WT) and principal component analysis (PCA) techniques are used for fault location and classification in the Taipower 345 kV power transmission network. PCA is used for feature extraction in [146].
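The dimension reduction can be sketched via the eigendecomposition of the covariance matrix, on illustrative data that varies essentially along one direction:

```python
import numpy as np

def pca_reduce(X, k):
    """Project centred X onto the k eigenvectors of its covariance with largest variance."""
    Xc = X - X.mean(axis=0)
    C = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(C)               # eigenvalues in ascending order
    comps = vecs[:, ::-1][:, :k]                 # top-k principal directions
    return Xc @ comps, vals[::-1]                # scores and sorted variances

# 3-D data that actually varies along one line plus small noise:
# a single principal component captures nearly all the variance
rng = np.random.default_rng(2)
t = rng.standard_normal(100)
X = np.column_stack([t, 2 * t, -t]) + 0.01 * rng.standard_normal((100, 3))
Z, variances = pca_reduce(X, 1)
```

In the fault-analysis context, X would hold wavelet-derived features and Z the compact feature set passed to the classifier.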

5.3 Wide-area fault location methods

The PMU plays a major role in locating faults in a power network, but fault location fails if the end-terminal PMU does not record the faulty signal. Placing a PMU at every bus of the network is not economical because of communication requirements and high cost, but optimal PMU placement overcomes this problem [147]. In [148, 149], faults in a transmission grid were located using wide-area synchronized voltage measurements with the help of global positioning system (GPS) receivers. The main advantage of the proposed algorithm is that it requires fewer synchronized measuring devices, and it yields a closed-form solution. A non-iterative wide-area technique for locating faults in a transmission line was proposed in [150]: an impedance matrix is developed from the pre-fault positive-sequence and negative-sequence network topology, and the fault location is determined using the linear least-squares method; its accuracy is not affected by high fault resistance. In [151], PMUs were used to synchronize the voltage and current signals for fault localization in the transmission grid, diagnosing the fault successfully in a hierarchical manner.

5.4 Modal transformation

The phase signals of a three-phase system are decomposed into their modal components by means of modal transformation matrices. For un-transposed multiphase lines, an eigenvector-based transformation matrix is applied to the phase impedance and admittance matrices to determine the current and voltage transformation matrices. The Wedepohl, Karrenbauer and Clarke transformations are distinct real-valued matrices suited to balanced (fully transposed) multiphase lines [135]. In [152, 153], the Clarke transformation was implemented to decouple the three-phase quantities into α, β (two stationary phase components) and 0 (zero-sequence component) on the basis of the fault characteristics.

5.5 Independent component analysis (ICA)

ICA is defined as given below.

Let X and S be random vectors, where \(X = \{ x_{1} ,x_{2} , \ldots \ldots \ldots x_{n} \}\) and \(S = \{ s_{1} ,s_{2} , \ldots \ldots \ldots s_{n} \}\), and let the matrix A have elements a ij. \(X^{T}\), a row vector, is the transpose of X, as all vectors are taken as column vectors. The mixing model is

$$X = As$$
(41)

If \(a_{j}\) denotes the jth column of A, the mixing model can be written as

$$X = \sum\limits_{i = 1}^{n} {a_{i} s_{i} }$$
(42)

ICA is the statistical model of Eq. (41), so it is also called a generative model [154, 155]. In [156], a combination of ICA, travelling waves and SVM is proposed for the location and classification of faults in high-voltage (HV) transmission lines. The technique gives 100% classification accuracy and 99% location accuracy on a real transmission line in a noisy faulty-signal environment; the main advantage of the ICA technique is that it overcomes the noise problem in the signal. ICA works on the principle of the blind source separation problem [157] and is applicable for separating non-Gaussian signals from Gaussian ones [158].
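The generative model of Eq. (41) can be demonstrated numerically. The sketch below only shows mixing and exact unmixing with a known A; a real ICA algorithm (e.g. FastICA) estimates the unmixing matrix blindly from X alone, exploiting the non-Gaussianity of the sources:

```python
import numpy as np

# Mixing model X = A s (Eq. 41): two independent non-Gaussian sources mixed by A
rng = np.random.default_rng(3)
s = np.vstack([np.sign(rng.standard_normal(500)),      # binary +/-1 source
               rng.uniform(-1, 1, 500)])               # uniform source
A = np.array([[1.0, 0.5],                              # illustrative mixing matrix
              [0.3, 1.0]])
X = A @ s                                              # observed mixtures

# With A known, the sources are recovered exactly by inverting the mixing
s_hat = np.linalg.inv(A) @ X
err = np.max(np.abs(s_hat - s))
```

In the fault-analysis application, the rows of X would be noisy measured signals and ICA would separate the fault transient from the noise components.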

5.6 Pencil matrix method

The pencil matrix method (PMM) is applied in [159] to extract parameters from exponentially damped/undamped signals. PMM is less affected by noise and has better computational efficiency [160]. It is also used to extract the fundamental frequency component of transmission line signals while eliminating the DC offset and higher-order harmonic components of the faulty signal [161]. The matrix pencil algorithm is explained in [162, 163].

6 Strengths and weaknesses of the techniques

The generalized strengths and weaknesses of all the techniques are summarized in Table 1.

Table 1 Generalized strengths and weaknesses of all the techniques

7 Comparative studies of fault classification, location and detection of transmission line

To sustain the stability of power networks, faults in a transmission line must be detected and located. Many methods and techniques are used to detect faults, but circumstances such as the fault inception angle, loading condition, fault resistance, harmonics and DC offset in the fault signal can lead to unsatisfactory results. Researchers have implemented various online and offline methods and algorithms to identify, locate and classify faults on a transmission network so that the system operates effectively and efficiently. A comparative analysis of the different methods used for fault classification, location and detection in transmission lines is given in the tables below. The purpose of each system, the inputs used by the algorithm, its features and its numerical results are highlighted in Tables 2, 3, 4, 5 and 6. Table 2 presents a comparative study of fault location on a transmission line, and Table 3 a comparative study of fault classification. Fault classification and detection are compared in Table 4, fault classification and location in Table 5, and fault classification, location and detection together in Table 6.

Table 2 Comparative study of fault location of a transmission line
Table 3 Comparative study of fault classification of a transmission line
Table 4 Comparative study for classification and detection of fault on a transmission line
Table 5 Comparative study of fault classification and location on transmission line
Table 6 Comparative study of fault classification, location and detection of transmission line

8 Practical case study and comparison of fault detection, classification and location methods

A travelling wave-based technique is implemented in [164] to locate faults on a 230 kV, 200-km transmission line using a real-time digital simulator (RTDS). The main advantage of this technique is that synchronization of the data from the two terminals is not required, so the method is suitable for real-time application with either synchronized or unsynchronized two-terminal data, and its results are acceptable. In [165], a PMU-based state estimation technique is implemented on a real 18-bus distribution network for the detection and location of faults and of the faulty line. Its results are unaffected by noise and by the nature of the loads/generators, but the technique is costly because a PMU is placed at every bus of the system. Current and voltage signals from both ends are used to locate faults on CEMIG (Energetic Company of Minas Gerais, Brazil) transmission lines in [166], with digital event recorders installed to collect the signals. The proposed algorithms depend mainly on the fault-point voltage magnitude and do not require phase angles or a synchronized data set, so the technique is robust, accurate and easy to apply to real short-circuit cases; the fault location error is only 0.03%. The maximal overlap discrete wavelet transform (MODWT) [167] is applied to real-time fault detection, where the faults are produced by a real-time digital simulator. MODWT has the same characteristics as DWT but omits the down-sampling step (the filters are up-sampled at each level instead), so its coefficients remain time-aligned with the input. The current and voltage signals are decomposed by MODWT and then processed to detect faults in real time, but the accuracy of this technique is affected by transducer saturation. In [169], MODWT and the discrete wavelet transform (DWT) are implemented in real time for fault detection and location on 500 kV, 400-km-long transmission lines, where MODWT gives acceptable accuracy (mean error of 0.63%) comparable to DWT.
The technique is executed with the help of a real-time digital simulator (RTDS). Real-time and offline fault classification is performed in [170] using the MODWT technique on a 230 kV transmission line; the offline and real-time classifications are evaluated using actual oscillographic records and an RTDS, respectively. For line-to-ground and line-to-line faults, the real-time classification accuracy is 100%, but in the wavelet-coefficient energy analysis a misclassification problem occurs for double line-to-ground faults. In [136], a hardware setup is built for fault analysis: high-speed communication over fibre-optic links/Ethernet is used to locate the fault quickly, with a PMU/digital fault recorder (DFR) serving as the sampling unit.
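For reference, the textbook synchronized two-terminal travelling-wave location rule that underlies techniques such as [164] is d_A = (ℓ + v·(t_A − t_B))/2, where t_A and t_B are the first-arrival times of the fault wavefront at the two line ends, ℓ is the line length and v the wave propagation velocity; [164] itself extends the idea to unsynchronized data. A minimal sketch, with illustrative names and numbers:

```python
def tw_fault_distance(t_a, t_b, line_len_km, v_km_s):
    """Classic synchronized two-terminal travelling-wave location:
    the distance of the fault from terminal A follows from the
    difference in wavefront arrival times at the two line ends."""
    return (line_len_km + v_km_s * (t_a - t_b)) / 2.0

# A fault 60 km from terminal A on a 200-km line, with a wave speed
# near that of light on overhead lines (about 2.9e5 km/s):
v = 2.9e5
t_a, t_b = 60.0 / v, 140.0 / v   # wavefront travel times to each end
```

With these arrival times, the formula returns the assumed 60 km fault distance; in practice the accuracy hinges on precise wavefront time-stamping and on the assumed propagation velocity.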
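The MODWT-based detection idea in [167, 169, 170] can be illustrated with a level-1 Haar MODWT, whose detail filter is simply [1/2, −1/2]: because there is no down-sampling, the squared detail coefficients stay time-aligned with the input, so thresholding their energy flags the fault instant directly. This is a simplified sketch rather than the cited implementations; the threshold and test signal are illustrative.

```python
import numpy as np

def haar_modwt_detail(x):
    """Level-1 MODWT detail coefficients with the Haar wavelet.
    The MODWT filter is the DWT filter divided by sqrt(2), giving
    d[n] = (x[n] - x[n-1]) / 2, with no down-sampling."""
    d = np.zeros(len(x))
    d[1:] = (x[1:] - x[:-1]) / 2.0
    return d

def first_fault_sample(signal, threshold):
    """Index of the first sample whose detail energy exceeds the
    threshold, or -1 if the signal stays smooth."""
    energy = haar_modwt_detail(signal) ** 2
    hits = np.flatnonzero(energy > threshold)
    return int(hits[0]) if hits.size else -1

# A 50 Hz waveform sampled at 5 kHz with an abrupt step at sample 250,
# mimicking a fault transient superimposed on the steady-state signal:
fs = 5000.0
t = np.arange(500) / fs
x = np.sin(2 * np.pi * 50.0 * t)
x[250:] += 2.0
```

Here `first_fault_sample(x, 0.25)` flags sample 250, since the smooth 50 Hz portion produces detail energy well below the threshold at this sampling rate; real schemes apply the same idea to multi-level MODWT coefficients of the measured currents and voltages.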

9 Conclusions

Conventional methods are used for detection, classification and location of faults in the transmission network, but to overcome their limitations, signal processing techniques and artificial intelligence (AI)-based methods are now widely applied in power system protection. Selected important papers are analysed to compare the systems used, techniques, methods, input signals, features and numerical results, showing that AI-based methods are efficient, fast, accurate and robust for detection, classification and location of faults on a transmission line. This survey is intended to help researchers with development and further study in this field.