1 Introduction

The electrical power system is a backbone of today’s economy. Telecommunication, banking, computing, manufacturing and railway networks are a few examples of systems that cannot function without a reliable supply of electrical power. Power systems consist of electrical components assembled for the purpose of generating electrical power and delivering it to end users through transmission and distribution lines. Examples of these components include generators, transformers, buses, transmission lines, distribution lines and loads.

Transmission lines are a primary component of power systems and are essential to reliable power system operation. Unfortunately, they are exposed directly to environmental conditions such as lightning, wind, physical contact by animals, falling trees and dirty insulators [1, 2]. As a result, they have the highest failure rate among power system components: about 80–90% of power system faults occur on transmission lines [2, 3]. Protection schemes have therefore become a major concern for protecting power systems against faults and other abnormal conditions. An effective protection scheme offers fast system restoration, improves system availability and performance, saves the time crews spend searching for the fault and reduces operating cost. In addition, early fault clearance and restoration not only improve reliability but can also stop a disturbance from spreading through the power system and causing a blackout [4]. Therefore, it is important to detect, classify and locate faults in power systems as fast as possible. Recently, many researchers have proposed fault classification schemes for power systems using artificial intelligence (AI) techniques and have reported promising results in determining the correct fault type. In general, these methods can be divided into three stages. The first stage is signal processing of the three-phase voltage and/or current fault signals. The second stage is feature extraction, in which significant features are extracted from the processed signals. The third stage is fault classification, in which the extracted features are input to the selected classifiers to perform a variety of protection functions such as fault detection, zone identification, classification and location.

Many researchers have used steady-state components to extract features from fault current and/or voltage signals [5,6,7]. However, this technique performs poorly because it is easily affected by changes in surrounding factors such as fault location, noise, fault resistance and fault inception angle. Recently, some researchers have applied fault-generated transient signals, and their results showed that this technique is unaffected by the surrounding conditions [8, 9]. An efficient signal processing tool is also needed so that the significant features of the fault transient signals can be extracted and input to the classifiers. The wavelet transform (WT), a well-known signal processing tool with a superior ability to localize signals in time and frequency, was selected in this study. In many studies, the use of WT in power system protection has proven successful and provided excellent results for fault detection, zone identification, classification and location estimation [9,10,11].

Nevertheless, the main difficulty in transmission line fault classification is distinguishing the presence of a ground fault. Some researchers have highlighted that the ground fault is the most difficult to classify among the fault types [8, 9, 12, 13]. Unlike the three-phase faults, the ground fault does not have a specific phase current to represent it. As a result, the classification accuracy for faults that involve ground has been found to be lower than for those that do not. Moreover, the presence of a ground fault depends strongly on the three-phase currents. Therefore, it is important to extract a feature that is able to correlate the phase currents with the ground fault, and more focus should be given to ground fault detection in order to improve the performance of fault classification.

Furthermore, much research work has been directed at the accuracy of fault classification schemes. Researchers have introduced various classifiers, such as the Adaptive Neuro-Fuzzy Inference System (ANFIS), Neural Networks, Rough Membership Neural Networks (RMNN), Fuzzy K-Nearest Neighbours (Fuzzy-KNN), K-Nearest Neighbours (KNN) and Support Vector Machine (SVM) [9, 12, 14, 15]. Among these, the artificial neural network (ANN) is one of the most frequently used AI classifiers and has been reported to classify transmission line faults with the highest accuracy [9, 16, 17]. The most popular and commonly used ANN, the multilayer perceptron (MLP) network, has been employed as the classifier for fault classification in transmission lines. Therefore, a transmission line fault classification scheme based on a combined WT–MLP network technique, which offers better accuracy and can deal with a wide variation of surrounding conditions, should be proposed.

The primary aim of this study is to establish a robust scheme for classifying faults on power transmission lines based on WT and ANN. To achieve this aim, the study has two objectives. The first objective is to formulate a new feature, based on the fault transient signal, that is able to correlate the three-phase faults with the ground fault (g); this feature is called the class-dependent feature (CDF). The major problem in classifying fault types on transmission lines is the presence of a ground fault, which many researchers have highlighted as the most difficult fault type to classify [8, 9, 12, 13]. The CDF is formulated to provide sufficient information about the ground fault and is expected to improve the classification performance. The second objective is to construct a new ANN structure called the 2-Tier MLP network, in which two MLP networks are used jointly to classify the fault types. In the 2-Tier MLP network, data samples that are easy to learn and classify correctly (the three-phase faults A, B and C) are handled by MLP 1 (Tier 1). Meanwhile, data samples that are more difficult to learn (the ground fault) are handled by MLP 2 (Tier 2). In this way, each MLP network is tuned independently, so more emphasis can be put on samples that are difficult to learn and less on samples that are relatively easy to learn. This structure is expected to enhance the classification performance as well as optimize the network size and parameters.

Fig. 1 Fault classification using AI technique

Fig. 2 Flowchart of fault classification scheme

This study has several limitations. First, the scheme focuses mainly on fault classification and requires a fault detector to detect the fault occurrence prior to classification. Second, the scheme does not provide information on the actual fault position. Finally, the scheme is developed for fault classification in three-phase single transmission lines.

In general, the method comprises four main steps: (1) modelling of the power transmission lines, (2) signal processing, (3) feature extraction and (4) fault classification. In the first step, a power transmission line is modelled and various fault conditions are simulated using MATLAB/Simulink. In the signal processing step, WT is applied to the fault transient current. In the feature extraction step, wavelet features and a new feature that describes the presence of a ground fault, called CDF, are extracted. Finally, these features are fed into the classification step, where a new ANN structure called the 2-Tier MLP network, which separates the dataset by difficulty based on the true class label, is proposed to improve fault classification performance.

This paper is organized into four sections. Section 1 is the introduction. Section 2 presents the proposed fault classification using AI technique. Section 3 presents the results and discussion of the proposed work. Finally, Sect. 4 presents the conclusion.

2 Fault classification using artificial intelligence (AI) technique

In general, most fault classification schemes using AI techniques consist of three main stages: signal processing, feature extraction and fault classification. Figure 1 shows a block diagram of transmission line fault classification using the AI technique. First, the three-phase fault voltage and/or current signals are processed into a suitable form to represent the occurrence of faults. Then, important information is extracted from the processed signals in the feature extraction stage. Finally, the extracted features are used by the AI classifier to determine the type of fault occurring in the transmission lines of the power system.

The proposed fault classification scheme is illustrated in Fig. 2. It involves four main stages named modelling of power transmission lines, signal processing, feature extraction and classification. The following subsections describe in detail the four main stages of the proposed fault classification scheme.

Fig. 3 Single power transmission lines model

2.1 Modelling of power transmission lines

The modelling of the power transmission lines is required to generate fault signals. Figure 3 shows the single power transmission lines model for this study. It consists of two source generators (S1 and S2), two buses (bus A and bus B) and a circuit breaker (CB). A digital relay requires two input signals, currents and voltages, measured via instrument transformers. The instrument transformers are the current transformer (CT) and the capacitive voltage transformer (CVT), which replicate and scale down the primary current and voltage signals, respectively. The transmission line parameters considered in this work are tabulated in Table 1.

Table 1 Transmission lines parameters and their values

The model is designed using MATLAB/Simulink. Fault signals are obtained by varying three fault parameters: fault inception angle (\(\theta \)), fault resistance (\({R}_{\mathrm{f}}\)) and fault location (L). These parameters are varied to generate ten types of faults, namely Ag, Bg, Cg, AB, BC, CA, ABg, BCg, CAg and ABC faults.

2.2 Signal processing using wavelet transform (WT)

Fault-generated signals from the transmission lines model are input to the signal processing stage. At this stage, the transient components of fault current are utilized as input to WT to extract the important features. The fault will cause the amplitude of the current signals on the faulted phase(s) to change suddenly as compared to those healthy phases. Based on this fact, the faulted phase selection can be developed, which can give high performance under different systems and fault conditions.

2.2.1 Wavelet transform (WT)

The WT is known to be an effective processing tool because it decomposes the original signal into different frequency components. The WT is defined for signals in the continuous domain: the continuous wavelet transform (CWT) is the sum over all time of the signal multiplied by scaled and shifted versions of the mother wavelet \(\psi (t)\). The CWT of a given signal f(t) is defined in (1).

$$\begin{aligned} \hbox {CWT}(f,a,b)=\frac{1}{\sqrt{|a|}}\int _{-\infty }^{+\infty } {f(t)\,\psi ^{*}\left( {\frac{t-b}{a}} \right) } \hbox {d}t \end{aligned}$$
(1)

where \(\psi (t)\) is the mother wavelet and ‘*’ denotes the complex conjugate. a and b are the scale factor and translation factor, respectively. In practical applications, the discrete wavelet transform (DWT), the discretized version of the WT, is generally used. The DWT of f[k] is shown in (2).

$$\begin{aligned} \hbox {DWT}[f,m,n]=\frac{1}{\sqrt{a_0^m }}\sum _k {f[k]\,\psi ^{*}\left( {\frac{n-ka_0^m }{a_0^m }} \right) } \end{aligned}$$
(2)

f[k] is the sampled input signal, while \(a=a_0^m\) and \(b=ka_0^m\) represent the discretized scale factor and translation factor, respectively. The DWT is applied through multi-resolution analysis (MRA), also known as sub-band coding. The aim of MRA is to develop representations of the input signal at various levels of resolution. MRA is an effective signal processing tool for analysing and monitoring faults in protection systems and power system disturbances [18, 19]. It is also known as a simple and fast algorithm that provides a substantial reduction of the dataset. The MRA of the discrete signal f[k] is illustrated in Fig. 4. The discrete signal f[k] is passed through a high-pass filter (HPF) and a low-pass filter (LPF) in pair. The outputs of both filters are decimated by two to obtain the detail coefficients (cD1) and the approximation coefficients (cA1), which constitute the level-one decomposition of the original signal. The approximation coefficients (cA1) are then sent to the second stage, where the procedure is repeated. In this way, the signal is decomposed up to the desired nth level.
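The filter-and-decimate recursion described above can be sketched in a few lines of Python. This is only an illustration: the two-tap Haar filters are used as a short stand-in for the wavelet filters of the study, no boundary extension is applied, and the function names are assumptions introduced here.

```python
import math

# Haar analysis filters, used here only as a short stand-in (assumption).
LPF = [1 / math.sqrt(2), 1 / math.sqrt(2)]
HPF = [1 / math.sqrt(2), -1 / math.sqrt(2)]


def filter_and_decimate(signal, taps):
    """Apply a 2-tap filter and keep one output per pair of input samples.

    A trailing odd sample is dropped because no boundary extension is used.
    """
    out = []
    for k in range(0, len(signal) - 1, 2):
        out.append(taps[0] * signal[k] + taps[1] * signal[k + 1])
    return out


def mra(signal, levels):
    """Return ([cD1, ..., cDn], cAn) for an n-level decomposition."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break  # nothing left to split
        # The HPF branch gives the detail, the LPF branch the approximation.
        details.append(filter_and_decimate(approx, HPF))
        approx = filter_and_decimate(approx, LPF)
    return details, approx
```

For example, `mra(samples, 7)` returns seven lists of detail coefficients and the final approximation. Only the approximation is passed on to the next level, which is why each level roughly halves the amount of data.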

Fig. 4 Wavelet decomposition using MRA

Fig. 5 Time window of Ag fault

In order to evaluate the robustness of the system, several authors have included noise into their system [9, 17, 20]. In this study, a white-noise model with signal-to-noise ratio (SNR) set to 20 and 30 dB has been included in the system.
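As a rough illustration of what a given SNR means for the signal, the sketch below adds white Gaussian noise to a sampled waveform so that the resulting SNR matches a target value in dB. It is not the noise model used in the simulation; the function names and the choice to rescale the noise so that the target is met exactly are assumptions made for this example.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return the signal plus Gaussian noise scaled to the requested SNR (dB)."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR(dB) = 10 log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    raw_power = sum(n * n for n in noise) / len(noise)
    # Rescale so the realised noise power equals the target (design choice).
    scale = math.sqrt(target_noise_power / raw_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Recompute the SNR in dB from a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)
```

At 20 dB the noise power is 1/100 of the signal power, and at 30 dB it is 1/1000, so the 20 dB dataset is the harder of the two.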

To obtain the transient components, this study selects one quarter (1/4) cycle of the post-fault currents (equivalent to a 5 ms time duration). This window was found sufficient to represent the transient components and was suggested in the literature [9]. Figure 5 shows the time window of one quarter cycle of the post-fault currents. The transient components of the fault current are marked by the red dotted box. The fault causes the amplitude of the current signals on the faulted phase(s) to change suddenly compared with the healthy phases.

The one quarter cycle post-fault current signal was input to the WT and decomposed into seven levels of detail and approximation coefficients. This number of decomposition levels was found to be enough to provide the information of the fault signals, as suggested in [1, 13].

Figure 6 shows the MRA decomposition levels for the Ag fault current signal of phase A. From the graph, phase A shows the highest amplitude for all reconstructed detail components (cD) and approximation components (cA). Meanwhile, the amplitude of all reconstructed detail components (cD) and approximation components (cA) for phases B and C is very small because these phases are unfaulted. The sample rate is set to 50 kHz, and therefore, 250 sampling points are obtained for the 5 ms time duration.
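The number of sampling points follows from the sampling rate and the window duration: 50 kHz \(\times \) 5 ms = 250. A minimal sketch of extracting this window is given below; the function name and the idea of a known fault sample index are assumptions for illustration, since the scheme relies on a separate fault detector.

```python
def post_fault_window(samples, fault_index, sample_rate_hz=50_000, duration_s=0.005):
    """Return the samples covering duration_s seconds after the fault instant."""
    n = round(sample_rate_hz * duration_s)  # 50 kHz * 5 ms = 250 samples
    return samples[fault_index:fault_index + n]
```

Calling `post_fault_window(current_a, fault_index)` on a phase current sampled at 50 kHz therefore yields the 250-sample segment that is passed to the WT.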

Fig. 6 Decomposition of WT for phase A of Ag fault

In WT, the decomposition task is performed by the mother wavelet (\(\psi (t)\)). The chosen mother wavelet is also very important, as it can provide the best similarity information regarding fault phases. There are various types of the mother wavelet families applied by the researchers including Daubechies (db), Morlet, Coiflet and Symlet (sym) [9, 10, 15]. In this study, Daubechies-4 (db4) mother wavelet has been selected as mother wavelet as it provides the best similarity to the fault transient and is capable of providing sufficient information for the classifier [9, 10].

2.3 Features extraction

In any signal processing application, the input signal contains a large amount of information. It is important to extract only interesting information for further processing. Features extraction, therefore, is one of the most important processes after performing the signal processing.

Recently, WT has been used by many researchers, especially in power systems due to its strong capability of frequency domain and time analysis [21,22,23]. The WT converts the signals into coefficient components while maintaining the similarity of the input fault signal at particular location and the reconstruction time scale [24, 25]. From all of the seven levels decomposed, only detail components (cD) will be utilized to calculate the significant features as suggested by many researchers [9, 15, 26, 27].

In this study, three common features are extracted from the three-phase fault current signals in terms of energy and statistical features. The three features are wavelet energy, mean and standard deviation [9, 25, 28, 29]. All the detail component samples are squared before calculating the features. The aim is to obtain positive sample values and enhance the classifier accuracy. All the detail component samples are also reconstructed to obtain the original signal without loss of information. Finally, all extracted features are normalized between 0 and 1 to prevent features with large numerical ranges from dominating features with small numerical ranges. The details of the three features are described as follows:

1. Wavelet energy (E)

For each fault state, three wavelet energies were calculated: \(E_{\mathrm{A}}\), \(E_{\mathrm{B}}\) and \(E_{\mathrm{C}}\), which represent the energy of each phase of the three-phase fault current signals. The formula for the wavelet energy, \(E_{i}\), is given as follows:

$$\begin{aligned} E_i =\sum _{h=1}^6 {\sum _{j=1}^N {\left| cD_{ih} (j)\right| ^{2}} } \end{aligned}$$
(3)

where i denotes the phase of the three-phase current, \(i\,=\,\)A, B, C, and h denotes the MRA decomposition level of cD, from level 1 to 6. j is the sample index of the waveform sampled at 50 kHz, \(j=1, 2,\ldots , N\) with \(N=250\).

2. Wavelet mean (\(\mu \))

For each fault state, three wavelet means were calculated: \(\mu _\mathrm{A}\), \(\mu _\mathrm{B}\) and \(\mu _\mathrm{C}\), which represent the statistical mean for each phase of the three-phase fault current signals. The formula for the wavelet mean, \(\mu _{i}\), is given as follows:

$$\begin{aligned} \mu _i=\frac{1}{N}\sum _{h = 4}^7 {\sum _{j=1}^N {cD_{ih} (j)} } \end{aligned}$$
(4)

where h for the wavelet mean denotes the MRA decomposition level of cD, from level four to seven.

3. Wavelet standard deviation (\(\sigma \))

For each fault state, three wavelet standard deviations were calculated: \(\sigma _\mathrm{A}\), \(\sigma _\mathrm{B}\) and \(\sigma _\mathrm{C}\), which represent the statistical standard deviation for each phase of the three-phase fault current signals. The formula for the wavelet standard deviation, \(\sigma _{i}\), is given as follows:

$$\begin{aligned} \sigma _i =\left[ {\frac{1}{N-1}\sum _{h=4}^7 {\sum _{j=1}^N {\left( cD_{ih} (j)-\mu _i \right) ^{2}} } } \right] ^{1/2} \end{aligned}$$
(5)
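The three features and the final normalization can be sketched as follows. The sketch takes the detail coefficients of one phase as a list of levels (levels 1 to 6 for the energy, levels 4 to 7 for the mean and standard deviation) and reads Eq. (5) as the deviation of each sample from the mean of Eq. (4). The function names, the min-max form of the normalization and this reading of Eq. (5) are assumptions made for illustration.

```python
import math


def wavelet_energy(details):
    """Eq. (3): sum of squared detail coefficients over the given levels."""
    return sum(c * c for level in details for c in level)


def wavelet_mean(details, n):
    """Eq. (4): sum of detail coefficients over the given levels, divided by n."""
    return sum(c for level in details for c in level) / n


def wavelet_std(details, n):
    """Eq. (5): square root of the summed squared deviations divided by n - 1."""
    mu = wavelet_mean(details, n)
    total = sum((c - mu) ** 2 for level in details for c in level)
    return math.sqrt(total / (n - 1))


def min_max_normalise(values):
    """Scale a list of feature values linearly into [0, 1] (assumed min-max form)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]
```

Each function is evaluated once per phase, giving the nine wavelet features \(E_i\), \(\mu _i\) and \(\sigma _i\) for i = A, B, C, and the normalization is then applied to each feature across the dataset.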

2.4 Fault classification using class-dependent features and 2-Tier MLP network

As previously mentioned, the main difficulty in transmission lines fault classification is distinguishing the presence of ground fault. It was also known that the presence of ground fault highly depends on the three-phase currents. Therefore, it is important to extract a feature that is able to correlate the phase currents with the ground fault. This study proposed a novel feature that is able to correlate the three-phase currents with the ground fault. The proposed features are called CDF. This study also proposed a novel structure called the 2-Tier MLP network to improve the accuracy of fault classification. The following will explain the CDF and 2-Tier MLP network in detail.

2.4.1 Class-dependent features (CDF)

The CDF is obtained by considering the type (class) of fault occurring on the three-phase currents (\({y}_{\mathrm{A}}\), \({y}_{\mathrm{B}}\) and \({y}_{\mathrm{C}}\)) and the selected wavelet features. The type of fault is determined by first presenting the wavelet features [wavelet mean (\(\mu _{\mathrm{i}}\)) and standard deviation (\(\sigma _{\mathrm{i}}\))] to an MLP network. Then, the network is trained to classify the type of fault that occurred. Finally, the outputs of this MLP network are rounded to the nearest integer, which gives the type of fault: the three-phase currents (\({y}_{\mathrm{A}}\), \({y}_{\mathrm{B}}\) and \({y}_{\mathrm{C}}\)) take the value 0 for a healthy phase and 1 for a faulted phase. Meanwhile, the wavelet features selected for the CDF are wavelet energy (\(E_{\mathrm{i}}\)) and wavelet mean (\(\mu _{\mathrm{i}}\)); this selection was determined experimentally. Figure 7 shows a block diagram of the inputs and outputs of the CDF. The CDF consists of three features: energy, \(E_\mathrm{CDF}\), mean, \(\mu _\mathrm{CDF}\), and fault category, \(F_\mathrm{CDF}\). The CDF was derived by observing the values of the wavelet energy, the wavelet mean and the faults occurring on the three-phase currents (\({y}_{\mathrm{A}}\), \({y}_{\mathrm{B}}\) and \({y}_{\mathrm{C}}\)). It was found that a single line fault (A, B or C fault) is usually associated with the occurrence of a ground fault. It was also observed that, for double line faults (AB, AC and BC), the energy and mean of one faulted phase become lower in the presence of a ground fault. Meanwhile, a three-line fault (ABC) usually indicates the absence of a ground fault. Based on these observations, the mathematical description of the three CDFs is given as follows:

$$\begin{aligned} E_\mathrm{CDF}= & {} \left\{ {\begin{array}{lll} {\min (E_\mathrm{A} ,E_\mathrm{B} ,E_\mathrm{C} )}&{} \quad \hbox {if}&{} {y_\mathrm{A} \cup y_\mathrm{B} \cup y_\mathrm{C} } \\ {\min (E_\mathrm{A} ,E_\mathrm{B} )}&{} \quad \hbox {if}&{} {y_\mathrm{A} \cap y_\mathrm{B} } \\ {\min (E_\mathrm{A} ,E_\mathrm{C} )}&{} \quad \hbox {if}&{} {y_\mathrm{A} \cap y_\mathrm{C} } \\ {\min (E_\mathrm{B} ,E_\mathrm{C} )}&{} \quad \hbox {if}&{} {y_\mathrm{B} \cap y_\mathrm{C} } \\ {\max (E_\mathrm{A} ,E_\mathrm{B} ,E_\mathrm{C} )}&{} \quad \hbox {if}&{} {y_\mathrm{A} \cap y_\mathrm{B} \cap y_\mathrm{C} } \\ \end{array}} \right. \end{aligned}$$
(6)
$$\begin{aligned} \mu _\mathrm{CDF}= & {} \left\{ {\begin{array}{lll} {\min (\mu _\mathrm{A} ,\mu _\mathrm{B} ,\mu _\mathrm{C} )}&{} \quad \hbox {if}&{} {y_\mathrm{A} \cup y_\mathrm{B} \cup y_\mathrm{C} } \\ {\min (\mu _\mathrm{A} ,\mu _\mathrm{B} )}&{}\quad \hbox {if}&{} {y_\mathrm{A} \cap y_\mathrm{B} } \\ {\min (\mu _\mathrm{A} ,\mu _\mathrm{C} )}&{} \quad \hbox {if}&{} {y_\mathrm{A} \cap y_\mathrm{C} } \\ {\min (\mu _\mathrm{B} ,\mu _\mathrm{C} )}&{} \quad \hbox {if}&{} {y_\mathrm{B} \cap y_\mathrm{C} } \\ {\max (\mu _\mathrm{A} ,\mu _\mathrm{B} ,\mu _\mathrm{C} )}&{} \quad \hbox {if}&{} {y_\mathrm{A} \cap y_\mathrm{B} \cap y_\mathrm{C} } \\ \end{array}} \right. \end{aligned}$$
(7)
$$\begin{aligned} F_\mathrm{CDF}= & {} \left\{ {\begin{array}{lll} 1&{} \quad \hbox {if}&{} {y_\mathrm{A} \cup y_\mathrm{B} \cup y_\mathrm{C} } \\ 2&{}\quad \hbox {if}&{} {y_\mathrm{A} \cap y_\mathrm{B} } \\ 3&{} \quad \hbox {if}&{} {y_\mathrm{A} \cap y_\mathrm{C} } \\ 4&{}\quad \hbox {if}&{} {y_\mathrm{B} \cap y_\mathrm{C} } \\ 5&{} \quad \hbox {if}&{} {y_\mathrm{A} \cap y_\mathrm{B} \cap y_\mathrm{C} } \\ \end{array}} \right. \end{aligned}$$
(8)

where min(\(\cdot \)) and max(\(\cdot \)) return the minimum and maximum value of the argument. The \(E_{i}\) and \(\mu _{i}\) are the wavelet energy and mean.
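One possible reading of Eqs. (6)–(8) is sketched below. It assumes that the first case applies when exactly one phase is faulted, that the three pairwise cases apply when exactly two phases are faulted, and that the last case applies when all three phases are faulted. The function name and the dictionary-based interface are illustrative assumptions.

```python
def class_dependent_features(energy, mean, faulted):
    """Return (E_CDF, mu_CDF, F_CDF) under the stated reading of Eqs. (6)-(8).

    energy, mean: dicts keyed by "A", "B", "C".
    faulted: dict keyed by "A", "B", "C" with 0 (healthy) or 1 (faulted).
    """
    phases = tuple(p for p in "ABC" if faulted[p] == 1)
    pair_category = {("A", "B"): 2, ("A", "C"): 3, ("B", "C"): 4}
    if len(phases) == 1:
        # Single line fault: minimum over all three phases, category 1.
        return min(energy.values()), min(mean.values()), 1
    if len(phases) == 2:
        # Double line fault: minimum over the two faulted phases only.
        return (min(energy[p] for p in phases),
                min(mean[p] for p in phases),
                pair_category[phases])
    if len(phases) == 3:
        # Three-line fault: maximum over all three phases, category 5.
        return max(energy.values()), max(mean.values()), 5
    raise ValueError("no faulted phase: Eqs. (6)-(8) do not define the CDF")
```

For an AB fault, for instance, the function returns the smaller of \(E_\mathrm{A}\) and \(E_\mathrm{B}\), the smaller of \(\mu _\mathrm{A}\) and \(\mu _\mathrm{B}\), and the category 2.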

Fig. 7 Class-dependent features (CDF)

2.4.2 2-Tier MLP network structure

Accurate detection and classification of faults are necessary to facilitate faster maintenance and restoration of supply. Most ANN-based fault classification systems utilized single ANN to classify all the faults [10, 30, 31], even though not all fault classes are equally difficult to distinguish from the true class label. One major disadvantage of this approach is the large size of the ANN involved. ANNs with larger number of outputs are difficult to optimize and their performance is usually lower than that of smaller networks.

To overcome this predicament, a new method using the 2-Tier MLP network is proposed. This method enables the problem knowledge to be incorporated into the network structures and the complexity of the overall problem can be divided and conquered more effectively. Specifically, in the fault classification problem, the classification of ground fault is more difficult compared to the three-phase fault. By separating the phase fault and ground fault during classification, it will reduce the complexity of the overall problems. This provides an efficient method of the MLP network training and improves the overall network performance.

The most popular and commonly used ANN, called the MLP network, had been employed as the classifier to perform fault classification in transmission lines. MLP network is a feedforward ANN that consists of three types of layers. Figure 8 illustrates an MLP network with one hidden layer. The first layer and the last layer are known as input layer and output layer, respectively. Meanwhile, the layer between the first and the last layers is known as hidden layer. The output of MLP network with one hidden layer, \(y_{k}\), is given by:

$$\begin{aligned} y_k =\sum _{j=1}^{n_h} {w_{jk}^2 } F\left[ {\sum _{i=1}^{n_i } {w_{ij}^1 v_i +b_j } } \right] \end{aligned}$$
(9)

where \({w}^{1}_{{ij}}\) and \({w}^{2}_{{jk}}\) denote the weights between input and hidden layer and weights between hidden and output layer, respectively. \({b}_{{j}}\) and \({v}_{{i}}\) represent the thresholds in hidden nodes and inputs that are presented to the input layer, respectively; \(n_{i}\), k and \(n_{h}\) are the number of input nodes, output nodes and hidden nodes, respectively. \(F(\cdot )\) is called an activation function. In this paper, hyperbolic tangent function was used for the activation function of the MLP network. The weights \({w}^{1}_{{ij}}\) and \({w}^{2}_{{jk}}\) and thresholds \({b}_{{j}}\) are unknown and should be selected to minimize the prediction error defined as:

$$\begin{aligned} e_k =y_k^d -y_k \end{aligned}$$
(10)

where \({y}^{{d}}_{{k}}\) and \({y}_{{k}}\) are the desired outputs and MLP network outputs, respectively. The Levenberg–Marquardt backpropagation (trainlm) is applied as the training algorithm for the network.
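For illustration, Eq. (9) with a hyperbolic tangent activation can be evaluated as in the sketch below. The function name and the nested-list layout of the weights are assumptions, and training with the Levenberg–Marquardt algorithm is not shown.

```python
import math


def mlp_forward(v, w1, b, w2):
    """Evaluate Eq. (9) with F = tanh.

    v:  inputs, length n_i
    w1: w1[i][j], weight from input i to hidden node j
    b:  b[j], threshold of hidden node j
    w2: w2[j][k], weight from hidden node j to output k
    """
    n_h = len(b)
    # Hidden layer: weighted sum of the inputs plus threshold, through tanh.
    hidden = [
        math.tanh(sum(w1[i][j] * v[i] for i in range(len(v))) + b[j])
        for j in range(n_h)
    ]
    # Output layer: linear combination of the hidden outputs, as in Eq. (9).
    n_out = len(w2[0])
    return [sum(w2[j][k] * hidden[j] for j in range(n_h)) for k in range(n_out)]
```

The prediction error of Eq. (10) is then the element-wise difference between the desired outputs and the list returned by `mlp_forward`.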

Fig. 8 An MLP network

The 2-Tier MLP network consists of two MLP networks as illustrated in Fig. 9. The function of each network is explained as follows:

MLP 1 (Tier 1) This MLP network caters for the easy part of fault classification. It is used to classify the three-phase fault (A, B and C) from a set of wavelet features. The output of this network will also be used for the mathematical description of CDF.

MLP 2 (Tier 2) This MLP network caters for the difficult part of fault classification. It is used to detect the presence of the ground fault from the CDF. The output of the network takes the value 1 when a ground fault is present (the fault involves ground) and 0 when it is absent (the fault does not involve ground).

The outputs of MLP 1 and MLP 2 are combined to obtain the 10-type fault classification. The details of the outputs of both MLPs are discussed in the next section.

Fig. 9 2-Tier MLP network

2.5 Data sample and evaluation of the classification system

The MLP networks require training with a dataset to classify the fault types. To obtain a good classification model, the use of a suitable dataset with a proper representation is very important. The wavelet features and CDF, as discussed in the previous sections, are used to build the datasets for training, validating and testing the network. The training dataset is used by the network to estimate its weights in order to model the classification task. The validation dataset is used to improve generalization, optimize the network size and prevent network overfitting by determining the smallest error during the training process. The testing dataset is used to evaluate the performance of the network for the fault classification task. These three datasets are obtained by varying the fault parameters. Tables 2, 3 and 4 show the details of the fault parameters and their values for the training, validation and testing datasets.

Table 2 Training fault parameters and their values
Table 3 Validation fault parameters and their values
Table 4 Testing fault parameters and their values

For each dataset, the 10 types of faults are considered. Thus, a total of 10 (fault types) \(\times \) 6 (fault locations) \(\times \) 5 (fault inception angles) \(\times \) 9 (fault resistances) = 2700 cases are generated for the training dataset. Similarly, a total of 2700 cases are generated for validation. Meanwhile, for the testing dataset, three sets are generated: ideal (without noise) (2700 cases), 20 dB (2700 cases) and 30 dB (2700 cases).
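The case count is the size of the Cartesian product of the four parameter grids, which can be checked with a short sketch. Only the grid sizes (10, 6, 5 and 9) are taken from the text; the placeholder labels below are hypothetical and stand in for the actual parameter values.

```python
from itertools import product

FAULT_TYPES = ["Ag", "Bg", "Cg", "AB", "BC", "CA", "ABg", "BCg", "CAg", "ABC"]

# Placeholder grids: only the counts (6, 5, 9) come from the text.
locations = [f"L{n}" for n in range(1, 7)]
inception_angles = [f"theta{n}" for n in range(1, 6)]
resistances = [f"Rf{n}" for n in range(1, 10)]

# One simulation case per combination of fault type and fault parameters.
cases = list(product(FAULT_TYPES, locations, inception_angles, resistances))
```

Here `len(cases)` equals 10 \(\times \) 6 \(\times \) 5 \(\times \) 9 = 2700, the size of each dataset.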

The output of the 2-Tier MLP network indicates the type of fault. MLP 1 has three outputs, which correspond to the fault condition of the three phases (A, B and C). Meanwhile, MLP 2 has a single output, which corresponds to the ground (g) fault. The outputs of both MLP networks are rounded to the nearest integer, either 0 or 1, representing the absence or presence of a fault in the corresponding phase (A, B and C) and ground. The 10 types of faults and the corresponding outputs of the 2-Tier MLP network are listed in Table 5.
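A minimal sketch of how the four rounded outputs could be combined into a fault label is given below. The label table is inferred from the fault names used in this paper, and the function names and the rule that 0.5 rounds up are assumptions.

```python
PHASE_LABELS = {
    (1, 0, 0): "A", (0, 1, 0): "B", (0, 0, 1): "C",
    (1, 1, 0): "AB", (0, 1, 1): "BC", (1, 0, 1): "CA",
    (1, 1, 1): "ABC",
}


def to_bit(output):
    """Round a network output to 0 or 1 (0.5 rounds up by assumption)."""
    return 1 if output >= 0.5 else 0


def fault_type(mlp1_outputs, mlp2_output):
    """Combine the three phase outputs and the ground output into a label."""
    phases = tuple(to_bit(o) for o in mlp1_outputs)
    label = PHASE_LABELS.get(phases)
    if label is None:
        return None  # no faulted phase indicated
    return label + ("g" if to_bit(mlp2_output) else "")
```

A label outside the ten fault types, such as "A" or "ABCg", can only arise when at least one of the two networks has made an error, so it would be counted as a misclassification.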

Furthermore, the MLP Network classifier is evaluated in terms of the classification accuracy percentages. By definition, accuracy is the number of objects correctly classified as 10-fault types over the total number of objects. The classifier accuracy is determined by the following equation:

$$\begin{aligned} \hbox {Accuracy}\;(\%)=\frac{\hbox {Number\;of\;Fault\;Type\;Correctly\;Classified}}{\hbox {Total\;Number\;of\;Fault\;Cases}}\times 100 \end{aligned}$$
(11)
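Equation (11) amounts to counting the cases whose predicted fault type matches the actual one. A short sketch follows, with a function name chosen here for illustration:

```python
def accuracy_percent(predicted, actual):
    """Eq. (11): percentage of cases whose predicted fault type is correct."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)
```

For example, three correct labels out of four cases give an accuracy of 75%.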

The accuracy is a measure of the closeness of agreement between the predicted output of the 2-Tier MLP network and the actual result. The performance of the proposed 2-Tier MLP network is compared with several commonly applied ANN structures for fault classification: single, double and quadruple ANN. In a single ANN structure, all wavelet features (wavelet mean and standard deviation) are input into one ANN and used to classify all types of faults. The selection of wavelet mean (\(\mu _{\mathrm{i}}\)) and standard deviation (\(\sigma \)) as input features was determined experimentally. Figures 10, 11 and 12 show the single, double and quadruple MLP network structures.

Table 5 Fault classifier ANN outputs for 10 fault types
Fig. 10 Single MLP network structure

Fig. 11 Double MLP network structure

Fig. 12 Quadruple MLP network structure

Fig. 13 Performance of 1 MLP Network against number of hidden nodes during training, validation and testing

3 Results and discussion

This section describes the performance of the proposed CDF and 2-Tier MLP network for fault classification. In this analysis, the wavelet features [wavelet mean (\(\mu _{\mathrm{i}}\)) and standard deviation (\(\sigma \))] were input into MLP 1, while the CDF was input into MLP 2. The Levenberg–Marquardt (LM) training algorithm was selected for training both MLP 1 and MLP 2. This training algorithm has been used extensively in many studies [32,33,34] and has been found to be the fastest method for training moderate-sized MLP networks [35, 36]. All LM algorithm parameters were set to the default values of the MATLAB neural network toolbox. Two common activation functions, hyperbolic tangent (tansig) and linear (purelin), were selected for the hidden and output layers, respectively, of both MLP networks [37]. The number of hidden nodes was varied from 1 to 50. For each number of hidden nodes, the MLP networks were run five times with different sets of weight and bias initializations and the highest testing accuracy was recorded. The best classification performance was selected based on the highest testing accuracy and the lowest number of hidden nodes.
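The search over the number of hidden nodes described above can be sketched as a simple loop. The callable `evaluate` is hypothetical: it stands for training one network with a given number of hidden nodes and a given initialization and returning its accuracy in percent. The function name and the tie-breaking rule are likewise assumptions.

```python
def select_hidden_nodes(evaluate, max_nodes=50, runs=5):
    """Return (best_nodes, best_accuracy) over 1..max_nodes hidden nodes.

    evaluate(nodes, run) is a hypothetical callable that trains one network
    with `nodes` hidden nodes, uses `run` to pick the initialization, and
    returns its accuracy in percent.
    """
    best_nodes, best_accuracy = None, float("-inf")
    for nodes in range(1, max_nodes + 1):
        # Keep the best of the repeated runs for this network size.
        accuracy = max(evaluate(nodes, run) for run in range(runs))
        # Strict '>' keeps the smallest network among equal accuracies.
        if accuracy > best_accuracy:
            best_nodes, best_accuracy = nodes, accuracy
    return best_nodes, best_accuracy
```

Because larger networks only replace the current best when they are strictly more accurate, the loop returns the highest accuracy with the lowest number of hidden nodes.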

To investigate its performance, the 2-Tier MLP network is benchmarked against three MLP network structures: the single, double and quadruple MLP network structures. All of these structures (1 MLP, 2 MLP and 4 MLP) are trained using the same training algorithm as the 2-Tier MLP network, the LM training algorithm. The hyperbolic tangent (tansig) and linear (purelin) functions were also selected as the activation functions for the hidden and output layers, respectively, for all MLP network structures. The number of hidden nodes was varied from 1 to 50. Figure 13 illustrates the process of finding the optimum number of hidden nodes for the 1 MLP network. The simulation result consists of three outputs: training, validation and testing accuracy. The optimum was selected based on the highest validation accuracy with the lowest number of hidden nodes. Table 6 tabulates the testing accuracies of the different network structures using the ideal dataset.

The results in Table 6 show that the testing accuracies vary for different types of faults. This implies that different types of faults have different levels of classification difficulty. The classification task for single line to ground faults (Ag, Bg and Cg) is the easiest among the fault types. Most of the network structures achieved 100% classification accuracy, with the lowest accuracy of 98.52% obtained by the 2 MLP network structure.

Meanwhile, faults involving double lines (AB, AC, BC, ABg, ACg and BCg) are the most difficult to classify. This is due to the possibility of the presence or absence of the ground fault; for faults on phases A and B, for example, the accuracy with ground involvement is lower than for the AB fault. Most of the classification accuracies decrease in the presence of the ground fault. For instance, by using 1 MLP, the classification accuracy decreased from 100% (AB fault) to 88.15% (ABg fault). Similarly, the classification accuracy decreased from 100% (AC fault) to 99.26% (ACg fault). This implies that the presence of the ground fault results in the lowest accuracy for fault classification. The difficulty was overcome by the 2-Tier MLP network, in which MLP 2 is used specifically to determine the presence of the ground fault.

This helps in improving the overall fault classification accuracy. By using the 2-Tier MLP network, the classification accuracy remained the same for AB and ABg faults (100%), and also for BC and BCg faults (100%). Meanwhile, the classification accuracy decreased slightly from 100% (AC fault) to 99.63% (ACg fault). However, this was still the highest accuracy among the network structures.
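The idea of separating the phase decision from the ground decision can be illustrated with a small decoding sketch. It assumes a hypothetical output coding (one output per phase plus one ground output, thresholded at 0.5); the actual coding in Table 5 and the wiring between MLP 1 and MLP 2 are not reproduced here, so names such as `decode_fault` are illustrative only.

```python
# The ten fault types considered in the study.
FAULT_TYPES = {"Ag", "Bg", "Cg", "AB", "AC", "BC", "ABg", "ACg", "BCg", "ABC"}


def decode_fault(phase_outputs, ground_output, threshold=0.5):
    """Combine phase outputs (A, B, C) and a ground output into a fault label.

    Assumed coding: an output at or above the threshold means the phase
    (or ground) is involved. Returns None when the combination is not one
    of the ten fault types.
    """
    phases = "".join(
        name for name, y in zip("ABC", phase_outputs) if y >= threshold
    )
    label = phases + ("g" if ground_output >= threshold else "")
    return label if label in FAULT_TYPES else None


ab = decode_fault((0.9, 0.8, 0.1), 0.2)      # phases A and B, no ground
abg = decode_fault((0.9, 0.8, 0.1), 0.7)     # phases A and B, with ground
ag = decode_fault((0.9, 0.1, 0.1), 0.9)      # phase A to ground
invalid = decode_fault((0.9, 0.1, 0.1), 0.1)  # single phase without ground
```

Under this coding the AB and ABg cases differ only in the ground output, which is why a dedicated ground decision can matter for the double-line faults.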

Table 6 Testing accuracies of different network structures using ideal dataset

The structure of the 2-Tier network is similar to the 2 MLP network structure, since both consist of two MLP networks. Therefore, the classification accuracies of the two structures are also similar. However, the classification accuracies for ABg and BCg faults using the 2-Tier MLP network are higher than those of the 2 MLP network. For the ABg fault, the classification accuracy increased from 87.04% using 2 MLP to 100% using the 2-Tier MLP network. Similarly, for the BCg fault, the classification accuracy increased from 99.26% using 2 MLP to 100% using the 2-Tier MLP network. The CDF in the 2-Tier MLP network helps in detecting the presence of the ground fault and further improves the classification accuracy.

The results also show that the 2-Tier MLP network structure achieved the highest accuracy for most fault types. Out of 10 fault types, the 2-Tier MLP network achieved eight highest accuracies, for Bg, Cg, AB, AC, BC, ABg, ACg and BCg (highlighted in bold in Table 6). In comparison, the 1 MLP network achieved six highest accuracies, for Ag, Bg, Cg, AB, AC and ABC. The 2 MLP network also achieved six highest accuracies, for Bg, Cg, AB, AC, BC and ACg. The 4 MLP network achieved four highest accuracies, for Bg, AB, AC and BCg. This implies that the 2-Tier MLP network produced better classification accuracy for most fault types compared to the other MLP network structures.

Table 7 gives the testing accuracies of the different network structures using a more difficult dataset, the 30 dB noise dataset. It can be seen that the testing accuracies vary for the 10 types of faults. Similar to the performance on the ideal dataset, the results show that the classification task for single line to ground faults (Ag, Bg and Cg) is the easiest among the fault types. Almost half of the network structures achieved 100% classification accuracy, with the lowest accuracy being 98.52% using the 2 MLP network structure. Meanwhile, faults involving double lines (AB, AC, BC, ABg, ACg and BCg) are the most difficult to classify. This is due to the possibility of the presence or absence of the ground fault and to the signals being distorted by noise. Classification accuracies fluctuate with the presence of the ground fault and noisy signals. For instance, by using 1 MLP, the classification accuracy decreased from 95.93% (ABg fault) to 78.52% (AB fault). Similarly, the classification accuracy decreased from 99.63% (AC fault) to 98.89% (ACg fault). This implies that the presence of the ground fault and noisy signals results in the lowest accuracy for fault classification. The difficulty was overcome by the 2-Tier MLP network, in which MLP 2 is used specifically to determine the presence of the ground fault under noisy signals. This helps in improving the overall fault classification accuracy. By using the 2-Tier MLP network, the classification accuracy remained the same for AB and ABg faults (100%), for AC and ACg faults (100%), and for BC and BCg faults (100%).

The structure of the 2-Tier network is similar to the 2 MLP network structure, since both consist of two MLP networks. Therefore, the classification accuracies of the two structures are also similar. However, the classification accuracies for AB, AC, BC, ABg, BCg and ABC faults using the 2-Tier MLP network are higher than those of the 2 MLP network. For the AB fault, the classification accuracy increased from 81.48% using 2 MLP to 100% using the 2-Tier MLP network. Similarly, for the AC fault, the classification accuracy increased from 98.89% to 100%. For the BC fault, it increased from 93.70% to 100%. For the ABg fault, it increased from 95.93% to 100%. For the BCg fault, it increased from 99.26% to 100%. For the ABC fault, it increased from 92.96% to 95.56%. The CDF in the 2-Tier MLP network helps in detecting the presence of the ground fault with noisy signals and further improves the classification accuracy.

The results also show that the 2-Tier MLP network structure achieved the highest accuracy for most fault types. Out of 10 fault types, the 2-Tier MLP network achieved seven highest accuracies, for Bg, AB, AC, BC, ABg, ACg and BCg (highlighted in bold in Table 7). In comparison, the 1 MLP network achieved four highest accuracies, for Ag, Bg, Cg and ABC. The 2 MLP network achieved two highest accuracies, for Bg and ACg. The 4 MLP network achieved four highest accuracies for Ag, Bg and ACg. This implies that the 2-Tier MLP network produced better classification accuracy for most fault types compared to the other MLP network structures, and that it was robust against noise.

Table 8 gives the testing accuracies of the different network structures using the most difficult dataset, the 20 dB noise dataset. It can be seen that most of the testing accuracies are lower than those achieved using the ideal and 30 dB datasets. The classification task for single line to ground faults (Ag, Bg and Cg) remained the easiest among the fault types. Most of the network structures achieved classification accuracies between 99.26 and 100%.

Table 7 Testing accuracies of different network structures using 30 dB dataset

Meanwhile, faults involving double lines (AB, AC, BC, ABg, ACg and BCg) are the most difficult to classify. Classification accuracies fluctuate with the presence of the ground fault and noisy signals. For instance, by using 1 MLP, the classification accuracy decreased from 94.82% (ABg fault) to 43.70% (AB fault). Similarly, the classification accuracy decreased from 94.44% (ACg fault) to 90.37% (AC fault). The difficulty was overcome by the 2-Tier MLP network, in which MLP 2 is used specifically to determine the presence of the ground fault for noisy signals. This helps in improving the overall fault classification accuracy. By using the 2-Tier MLP network, the classification accuracy increased slightly from 99.26% (AB fault) to 100% (ABg fault). The classification accuracy also increased slightly between AC (99.26%) and ACg (99.63%) faults, and between BC (98.89%) and BCg (100%) faults.

As the 2-Tier structure is similar to the 2 MLP network structure, both consisting of two MLP networks, the classification accuracies of the two structures are also similar. However, the classification accuracies for AB, AC, BC, ABg, ACg, BCg and ABC faults using the 2-Tier MLP network are higher than those of the 2 MLP network. For the AB fault, the classification accuracy increased from 45.56% using 2 MLP to 99.26% using the 2-Tier MLP network. Similarly, for the AC fault, the classification accuracy increased from 80.74% to 99.26%. For the BC fault, it increased from 70.00% to 98.89%. For the ABg fault, it increased from 94.44% to 100%. For the ACg fault, it increased from 97.41% to 99.63%. For the BCg fault, it increased from 96.67% to 100%. For the ABC fault, it increased from 91.48% to 95.56%. The CDF in the 2-Tier MLP network helps in detecting the presence of the ground fault with noisy signals and further improves the classification accuracy.

Table 8 Testing accuracies of different network structures using 20 dB dataset

The results also show that the 2-Tier MLP network structure achieved the highest accuracy for most fault types. Out of 10 fault types, the 2-Tier MLP network achieved nine highest accuracies, for Ag, Bg, Cg, AB, AC, BC, ABg, ACg and BCg (highlighted in bold in Table 8). In comparison, the 1 MLP network achieved three highest accuracies, for Ag, Bg and ABC. Meanwhile, the 2 MLP and 4 MLP networks each achieved three highest accuracies, for Ag, Bg and Cg. This implies that the 2-Tier MLP network produced better classification accuracy for most fault types compared to the other MLP network structures and was robust against severe noise. Table 9 tabulates the overall testing accuracies of the different network structures using the ideal, 30 dB and 20 dB datasets.

Table 9 presents the number of optimum hidden nodes, the number of tuning parameters and the average testing accuracies of the different MLP network structures using the ideal, 30 dB and 20 dB datasets. The number of tuning parameters in an MLP network, \(P_{T}\), is obtained from the following equation.

$$\begin{aligned} P_{T}=(N_\mathrm{i} \times N_\mathrm{h})+ (N_\mathrm{h} \times N_\mathrm{o})+ N_\mathrm{h} \end{aligned}$$
(12)

\(N_\mathrm{i}\), \(N_\mathrm{h}\) and \(N_\mathrm{o}\) represent the total number of nodes in the input, hidden and output layers of the MLP network, respectively. For ease of comparison, the classification accuracies are also plotted in Fig. 14. Different network structures require different numbers of hidden nodes, ranging from 2 to 45. The results show that the 2-Tier MLP network requires the smallest number of hidden nodes (15 and 6 nodes) and tuning parameters (120). Meanwhile, the 1 MLP network requires the largest number of hidden nodes (45 nodes) and tuning parameters (315). Therefore, the 2-Tier structure produced the most compact network among all network structures and reduced the computation time of the network during testing.
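Equation (12) can be evaluated directly, as in the sketch below. The layer sizes passed in are illustrative assumptions, since the input and output node counts of each network are not restated in this section. Because \(P_{T}=N_\mathrm{h}(N_\mathrm{i}+N_\mathrm{o}+1)\), any split with \(N_\mathrm{i}+N_\mathrm{o}=6\) reproduces the 315 parameters reported for 45 hidden nodes.

```python
def tuning_parameters(n_input, n_hidden, n_output):
    """Number of tuning parameters P_T as defined in Eq. (12)."""
    return n_input * n_hidden + n_hidden * n_output + n_hidden


# Assumed layer splits (hypothetical): both satisfy N_i + N_o = 6.
p_a = tuning_parameters(2, 45, 4)
p_b = tuning_parameters(3, 45, 3)
```

Both assumed splits give the same count, which shows that the reported total alone does not determine the individual input and output sizes.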

Table 9 Overall testing accuracies of different network structures using ideal, 30 and 20 dB datasets

Fig. 14 Classification accuracies of MLP network using different network structures

Table 10 Performance comparison with previously reported methods

The results show that all MLP network structures provided good average classification accuracies and are able to represent the 10 fault types. The best average accuracy for fault classification was achieved by the 2-Tier MLP network (99.36%), as highlighted in bold in Table 9, followed by 1 MLP (94.42%), 2 MLP (93.92%) and 4 MLP (93.57%).

The results show a significant decrease in the classification accuracy when noise is added to the system, as can be seen clearly in Fig. 14. Adding noise to the system distorts the fault current signals, which makes it more difficult for each MLP network structure to distinguish the fault types. The distortion of the current signals at 20 dB is much worse than at 30 dB. Therefore, the classification accuracies for the 20 dB dataset are lower than for the 30 dB dataset. However, the proposed scheme was able to overcome this problem by integrating the CDF with the 2-Tier MLP structure to classify the ground fault. By using the CDF and the 2-Tier MLP network, the classification error is reduced to less than 1%. The testing accuracy of the 2-Tier structure for the ideal, 30 dB and 20 dB datasets is 99.52, 99.37 and 99.19%, respectively.
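To illustrate why a 20 dB dataset is more distorted than a 30 dB one, the sketch below adds white Gaussian noise to a signal at a target signal-to-noise ratio. It assumes that the dB figures denote SNR and that the noise is white Gaussian; the helper names, the 50 Hz test sine and the sampling rate are all illustrative choices, not the simulation settings of the study.

```python
import math
import random


def add_noise(signal, snr_db, seed=0):
    """Add white Gaussian noise so that the result has roughly snr_db of SNR."""
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB from a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Assumed test signal: 50 Hz sine sampled at 10 kHz for 0.2 s.
clean = [math.sin(2 * math.pi * 50 * k / 10000) for k in range(2000)]
noisy_30 = add_noise(clean, 30)
noisy_20 = add_noise(clean, 20)
```

A 10 dB drop in SNR corresponds to ten times the noise power, so the 20 dB signal carries noise with about 3.16 times the standard deviation of the 30 dB signal.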

The 2-Tier MLP network structure achieved the best fault classification accuracy for the ideal (99.52%), 30 dB (99.37%) and 20 dB (99.19%) noise datasets. In contrast, the 4 MLP network structure shows the lowest fault classification accuracy for the 30 dB (95.78%) and 20 dB (86.67%) datasets. Meanwhile, the 1 MLP network structure shows the lowest fault classification accuracy for the ideal dataset (98.15%).
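The reported figures for the 2-Tier MLP network can be cross-checked with a few lines of arithmetic: averaging the three dataset accuracies gives the 99.36% average, and each error rate stays below 1%. The variable names are illustrative; the accuracy values are those quoted in the text.

```python
# Testing accuracies of the 2-Tier MLP network quoted in the text (percent).
accuracies = {"ideal": 99.52, "30 dB": 99.37, "20 dB": 99.19}

# Average accuracy over the three datasets.
average = sum(accuracies.values()) / len(accuracies)

# Classification error per dataset, in percent.
errors = {name: round(100.0 - acc, 2) for name, acc in accuracies.items()}
```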

The average accuracy of the 1 MLP network was higher than that of the 2 MLP and 4 MLP networks. As mentioned previously, the main difficulty in fault classification is detecting the presence or absence of the ground fault. In the 2 MLP and 4 MLP network structures, the ground fault is determined individually by one MLP network and is highly dependent on the information from the wavelet features. Conversely, in the 1 MLP network, the three-phase faults (A, B and C faults) and the ground fault are determined by a single MLP network. During the network training process, the weights are updated iteratively to minimize the error at the three-phase output nodes as well as at the ground fault output node. The shared weights in this MLP network indirectly help to correlate the three-phase faults with the ground fault. Therefore, the performance of the 1 MLP network was better than that of the 2 MLP and 4 MLP network structures. The 2-Tier MLP network uses the CDF, which is specially designed to correlate the three-phase faults and the ground fault. The structure provides an effective method to detect the presence of the ground fault and further improves the overall fault classification performance.

Finally, the proposed scheme is benchmarked against three fault classification methods: He et al. [9], Roy and Bhattacharya [17] and Samantaray [20]. Table 10 tabulates the accuracies of the three fault classification methods and the proposed method. The results show that the proposed scheme achieved the highest testing accuracy for the ideal and 20 dB datasets (highlighted in bold in Table 10). For the 30 dB dataset, the proposed scheme and He et al. [9] achieved the same accuracy of 99.37%. He et al. [9] show the highest average accuracy; however, that result is based only on the 30 dB noise dataset, without considering the ideal and 20 dB datasets. From these results, it can be concluded that the proposed scheme outperformed the other methods and is more robust to noise.

4 Conclusion

A new technique for fault classification in transmission lines using wavelet features and a neural network has been presented in this research. A new feature called the class-dependent feature (CDF) has been proposed to properly describe the presence of the ground fault. The CDF consists of wavelet energy, wavelet mean and fault category. The research also proposed a new structure called the 2-Tier MLP network to improve the accuracy of fault classification by separating the phase fault and ground fault during classification. Simulation results indicated that the proposed CDF combined with the 2-Tier MLP network successfully enhanced the capability of the software algorithm in the protection device to classify fault types. The method outperformed the 1 MLP, 2 MLP and 4 MLP structures with the highest average accuracy (99.36%) and was robust against noise and numerous fault conditions. At present, this study provides an analysis of the CDF and the 2-Tier MLP network for fault classification in transmission lines. For future work, the proposed method can be extended to fault detection and location in transmission lines as well as distribution lines.