1 Introduction

Epilepsy is a chronic neurological disorder caused by abnormal and excessive neuronal activity in the brain, and the electroencephalogram (EEG) is the most commonly used and efficient clinical technique for assessing epilepsy owing to its low cost and availability (Zhang et al. 2017). Traditionally, the detection of epileptic seizures relies on visual inspection by neurologists, which is tedious, laborious and subjective (Martis et al. 2015). In addition, it requires expertise in the analysis of long-duration EEG signals (Scheuer and Wilson 2004). In application scenarios where experts are absent, for example in emergencies, computer-aided automatic detection of epileptic seizures becomes significant. To overcome the above-mentioned disadvantages, numerous seizure detection techniques involving signal processing and machine learning tools have been developed, such as support vector machines (SVM), extreme learning machines (ELM), random forests (RF) and deep learning (Zhang and Chen 2016; Song et al. 2016; Mursalin et al. 2017; Acharya et al. 2018; Ullah et al. 2018; Li et al. 2019; Subasi et al. 2019; Sharma et al. 2018; Sharma and Pachori 2017a, b; Bhati et al. 2017a, b; Bhattacharyya and Pachori 2017; Tiwari et al. 2016; Sharma et al. 2017; Bhattacharyya et al. 2017; Sharma and Pachori 2015; Kumar et al. 2015; Pachori and Patidar 2014; Bajaj and Pachori 2012, 2013; Pachori and Bajaj 2011; Pachori 2008; Pachori et al. 2015). However, automatic detection with high efficiency and accuracy in distinguishing normal, interictal and ictal EEG signals remains an open problem (Djemili et al. 2016). In an attempt to solve this problem, various algorithms have been developed. Since EEG signals are redundant discrete-time sequences, numerous methods combining time-domain, frequency-domain, time-frequency-domain and nonlinear analysis have been proposed (Acharya et al. 2013).
For the time-domain analysis, representative techniques such as linear prediction (Sheintuch et al. 2014), fractional linear prediction (Joshi et al. 2014), principal component analysis (PCA) based radial basis function neural network (Kafashan et al. 2017), etc, have been proposed for seizure detection and EEG classification. For the frequency-domain analysis, with an assumption that EEG signals are stationary, Fourier transform is usually employed to extract features for epileptic seizure detection. Samiee et al. (2015) applied the rational Discrete Short Time Fourier Transform (DSTFT) to extract features for the separation of seizure epochs from seizure-free epochs using a Multilayer Perceptron (MLP) classifier. Considering the non-stationary nature of EEG signals (Subasi and Gursoy 2010), for the time-frequency-domain analysis, a wavelet transform tool together with certain classifier has usually been used for the epileptic seizure detection. Hassan et al. (2016) decomposed the EEG signal segments into sub-bands using Tunable-Q factor wavelet transform (TQWT) and several spectral features were extracted. Then bootstrap aggregating was employed for epileptic seizure classification. For the nonlinear analysis, various nonlinear parameters extracted through different types of entropies (Acharya et al. 2015), Lyapunov exponent (Shayegh et al. 2014), fractal dimension (Zhang et al. 2015), correlation dimension (Sato et al. 2015), recurrence quantification analysis (RQA) (Timothy et al. 2017) and Hurst exponent (Lahmiri 2018) methods have been used for automatic detection of epileptic EEG signals. Aarabi and He (2017) developed a method on the fusion of features extracted from correlation dimension, correlation entropy, noise level, Lempel–Ziv complexity, largest Lyapunov exponent, and nonlinear interdependence for the detection of focal EEG signals.

Despite the fact that these previous approaches have demonstrated respectable classification accuracy, the potential of nonlinear methods has not been thoroughly investigated. The EEG signal is highly random, nonlinear, nonstationary and non-Gaussian in nature (Acharya et al. 2013), for which nonlinear features characterize the EEG more accurately than linear models (Wang et al. 2017). Considering these characteristics, several self-adaptive signal processing methods, such as empirical mode decomposition (EMD) (Huang et al. 1998; Huang and Kunoth 2013), local mean decomposition (LMD) (Park et al. 2011) and intrinsic time-scale decomposition (ITD) (Frei and Osorio 2007), can be employed to extract effective and predominant features from EEG signals (Li et al. 2013; Zahra et al. 2017). EMD decomposes a multi-component signal into a series of single components and a residual signal, while LMD decomposes any complicated signal into a series of product functions. However, these methods have some drawbacks: EMD suffers from over-enveloping, mode mixing, end effects and the unexplainable negative frequencies caused by the Hilbert transform (Chen et al. 2011), while LMD produces distorted components, suffers from mode mixing and requires time-consuming decomposition (Li et al. 2015). To address these problems, a new technique named ITD was recently introduced by Frei and Osorio (2007) for analyzing data from nonstationary and nonlinear processes. Compared with EMD, the ITD method exploits more local characteristic information of the signal. In addition, the negative frequencies caused by the Hilbert transform are completely eliminated (Feng et al. 2016). Furthermore, the computational efficiency is significantly improved.
With high decomposition efficiency and frequency resolution, ITD decomposes a complex signal into several proper rotation components (PRCs) and a baseline signal, which enables accurate extraction of the dynamic features of nonlinear signals. Meanwhile, the ITD method involves no spline interpolation or screening process and exhibits low edge effects (An et al. 2012; Xing et al. 2017). ITD can thus better preserve and extract the EEG system dynamics, which is effective for the classification of normal, interictal and ictal EEG signals. Phase space reconstruction (PSR) is another popular nonlinear tool for analyzing composite, nonlinear and nonstationary signals (Takens 1980; Xu et al. 2013; Lee et al. 2014; Chen et al. 2014; Jia et al. 2017). The principle of PSR is to transform the properties of a time series into the topological properties of a geometrical object embedded in a space wherein all possible states of the system are represented. Each state corresponds to a unique point, and the reconstructed space shares the same topological properties as the original space. The dynamics in the reconstructed state space are equivalent to the original dynamics; hence the reconstructed phase space is a very useful tool for extracting the nonlinear dynamics of a signal (Takens 1980; Xu et al. 2013; Lee et al. 2014; Chen et al. 2014; Jia et al. 2017). It is hypothesized that the EEG system dynamics of normal, interictal and ictal EEG signals are significantly different, which implies that PSR offers the potential to compute the difference and classify these EEG signals.

The novelty of this work lies in five aspects: (1) the ITD method is employed to measure the variability of EEG signals, and the first and second proper rotation components (PRCs) are extracted as the predominant PRCs, which contain most of the EEG signals’ energy; (2) the discrete wavelet transform (DWT) decomposes the predominant PRCs into different frequency bands, which are used to construct the reference variables; (3) the 3D phase space of the two PRC components is reconstructed, in which the properties associated with the EEG system dynamics are preserved; (4) the EEG system dynamics can be modeled and identified using neural networks, which employ the ED of the 3D PSR of the reference variables as features; (5) the difference in EEG system dynamics between normal, interictal and ictal EEG signals is computed and used for the discrimination between the three groups based on a bank of estimators. In the present study we propose a combined computational method from the areas of nonlinear analysis and machine learning for the classification of normal, interictal and ictal EEG signals. To explore the underlying dynamics of the three groups, neural networks together with ITD, DWT and PSR are implemented for this purpose. The complete algorithm encompasses four principal stages: (1) EEG signals are decomposed into a series of proper rotation components (PRCs) and a baseline signal using the ITD method. The first two PRCs of the EEG signals are extracted; they contain most of the EEG signals’ energy and are considered the predominant PRCs. (2) A four-level DWT is employed to decompose the predominant PRCs into different frequency bands, in which the third-order Daubechies (db3) wavelet function is selected for analysis. (3) The phase space of the db3 sub-bands of the PRCs is reconstructed, in which the properties associated with the nonlinear EEG system dynamics are preserved.
Three-dimensional (3D) PSR together with the Euclidean distance (ED) is utilized to derive features, which reveal significant differences in EEG system dynamics between normal, interictal and ictal EEG signals. (4) Neural networks are then used to model, identify and classify the EEG system dynamics of normal (healthy), interictal and ictal EEG signals.

Fig. 1

Flowchart of the proposed method for the classification of normal, interictal and ictal EEG signals using ITD, DWT, PSR, ED and neural networks

The rest of the paper is organized as follows. Section 2 introduces the details of the proposed method, including the Bonn dataset, data description, ITD, DWT, PSR, ED, feature extraction and selection, learning and classification mechanisms. Section 3 presents experimental results. Sections 4 and 5 give some discussions and conclusions, respectively.

2 Method

In this section, we propose a method to discriminate between normal, interictal and ictal EEG signals using the information obtained from nonlinear EEG dynamics. It is divided into a training stage and a classification stage and proceeds in the following steps. In the first step, ITD is applied to decompose the EEG signals into several PRCs, from which the predominant components are extracted. In the second step, DWT is employed to decompose the predominant PRCs into different frequency bands. In the third step, PSR is applied to extract the nonlinear dynamics of the EEG signals, and Euclidean distances are computed. Finally, the feature vectors are fed into neural networks for the modeling and identification of the EEG system dynamics. The difference in dynamics between normal (healthy), interictal and ictal EEG signals is then used for the classification task. The flowchart of the proposed algorithm is illustrated in Fig. 1.

2.1 EEG database

In the present study we use the open and publicly available Bonn University database (Andrzejak et al. 2001), consisting of five sets (Z, O, N, F and S), each of which contains 100 single-channel EEG segments of 23.6-s duration. All EEG signals were recorded at a sampling rate of 173.61 Hz using a 128-channel amplifier system with an average common reference, and band-pass filtered at 0.53–40 Hz. Hence each signal consists of 4097 samples. Sets Z and O contain surface EEG recordings carried out on five healthy subjects in a relaxed state: set Z was recorded with the subjects’ eyes open and set O with their eyes closed. Sets N, F and S contain intracranial recordings from depth and strip electrodes collected from five epileptic patients. Set N contains seizure-free intervals recorded from the hippocampal formation of the opposite hemisphere, set F contains seizure-free intervals recorded from the epileptogenic zone, and set S contains epileptic seizure segments originating from all channels. EEG recordings from the Z–O, N–F and S datasets are referred to as normal (healthy), interictal and ictal signals, respectively.

2.2 Intrinsic time-scale decomposition (ITD)

Intrinsic time-scale decomposition (ITD) is suitable for analyzing nonstationary and nonlinear signals such as EEG signals. Without resorting to spline interpolation of signal extrema or sifting for mono-component separation, it decomposes a signal into proper rotation components (PRCs), suitable for calculating instantaneous frequency and amplitude, based on a baseline defined via a linear transform. The resulting decomposition precisely preserves the temporal information of each component regarding signal critical points and riding waves, with a time resolution equal to the time scale of the occurrence of extrema in the raw signal (Feng et al. 2016). Based on single-wave analysis, it accurately extracts the inherent instantaneous amplitude and frequency/phase information and other relevant morphological features (Frei and Osorio 2007).

For a time series signal I(t), define an operator L that extracts the baseline signal from I(t); the residual signal is called the proper rotation component (PRC). The signal I(t) can then be decomposed as

$$\begin{aligned} I(t)=L I(t)+(1-L)I(t)=B(t)+H(t) \end{aligned}$$
(1)

where B(t) is the baseline signal and H(t) is the proper rotation.

The decomposition procedure of a nonlinear signal can be summarized by the following steps:

  • Step 1 Find the local extrema of the signal I(t), denoted by \(I_k\), and the corresponding occurrence time instant \(\tau _k, k=0,1,2,\ldots \). For convenience \(\tau _0=0\).

  • Step 2 Suppose the operators B(t) and H(t) are given over the interval \([0, \tau _k]\), and I(t) is available on the interval \(t\in [0, \tau _{k+2}]\). Then on the interval \([\tau _k, \tau _{k+1}]\) between two adjacent extrema \(I_k\) and \(I_{k+1}\), the piecewise baseline extraction operator is defined as

    $$\begin{aligned} LI(t)=B(t)=B_k+(\frac{B_{k+1}-B_k}{I_{k+1}-I_k})\times (I(t)-I_k), \quad t\in [\tau _k, \tau _{k+1}], \end{aligned}$$
    (2)

    where

    $$\begin{aligned} B_{k+1}=\beta [I_k+(\frac{\tau _{k+1}-\tau _{k}}{\tau _{k+2}-\tau _{k}})(I_{k+2}-I_k)]+(1-\beta )I_{k+1}, \end{aligned}$$
    (3)

    and \(0<\beta <1\), typically \(\beta =0.5\).

  • Step 3 After extracting the baseline signal, the operator \(\Theta \) for extracting the residual signal as PRCs is defined as

    $$\begin{aligned} \Theta I(t)\equiv (1-L)I(t)=I(t)-B(t) \end{aligned}$$
    (4)

According to the definition, a PRC is a riding wave with the highest frequency on the baseline. Therefore, ITD separates the PRCs in frequency order from high to low. In addition, each PRC is obtained directly by subtracting the baseline from the input signal, without resorting to any sifting within each decomposition iteration. Thus, ITD has low computational complexity and, more importantly, avoids the smoothing of transients and time-scale smearing caused by repetitive sifting (Feng et al. 2016).

Take the baseline B(t) as the input signal I(t), and repeat steps (1)–(3), until the baseline becomes a monotonic function or a constant. Eventually, the raw signal will be decomposed into PRCs and a trend (Feng et al. 2016)

$$\begin{aligned} I(t)=\sum \limits _{i=1}^\rho H^i(t)+B^\rho (t), \end{aligned}$$
(5)

where \(\rho \) is the decomposition level.
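The decomposition step of Eqs. (1)–(4) can be sketched as follows. This is a minimal NumPy sketch, not the authors’ implementation: the simple sign-change extrema test and the endpoint convention \(B_0=I_0\), \(B_{\mathrm{last}}=I_{\mathrm{last}}\) are assumptions made here for brevity.

```python
import numpy as np

def itd_step(x, beta=0.5):
    """One ITD step: returns (baseline B, proper rotation H = x - B).

    Sketch of Eqs. (1)-(4). Endpoints are treated as extrema and the
    baseline is pinned to the signal there (a convention of this sketch).
    """
    # locate local extrema (sign change of the first difference)
    ext = [0]
    for i in range(1, len(x) - 1):
        if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) <= 0:
            ext.append(i)
    ext.append(len(x) - 1)
    tau = np.array(ext)
    I = x[tau]

    # baseline knot values B_{k+1}, Eq. (3)
    B = np.empty_like(I)
    B[0], B[-1] = I[0], I[-1]          # boundary convention of this sketch
    for k in range(len(tau) - 2):
        B[k + 1] = (beta * (I[k] + (tau[k + 1] - tau[k]) / (tau[k + 2] - tau[k])
                            * (I[k + 2] - I[k]))
                    + (1 - beta) * I[k + 1])

    # piecewise baseline between adjacent extrema, Eq. (2)
    base = np.empty_like(x)
    for k in range(len(tau) - 1):
        seg = slice(tau[k], tau[k + 1] + 1)
        denom = I[k + 1] - I[k]
        if denom == 0:
            base[seg] = B[k]
        else:
            base[seg] = B[k] + (B[k + 1] - B[k]) / denom * (x[seg] - I[k])
    return base, x - base
```

Repeating `itd_step` on the returned baseline until it becomes monotonic yields the full decomposition of Eq. (5).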

Samples of the ITD of EEG signals from the five sets are demonstrated in Fig. 2.

Fig. 2

Samples of ITD of EEG signals from five sets

2.3 Discrete wavelet transform (DWT)

The wavelet transform is an effective time-frequency tool for the analysis of non-stationary signals. The discrete wavelet transform (DWT) decomposes an input signal H(t) (here, a PRC of the EEG signal) into a set of functions, called wavelets, obtained by scaling and shifting a mother wavelet function. The decomposition, i.e., the set of wavelet coefficients, is thus formed.

To accomplish this, the signal H(t) can be reconstructed as a linear combination of wavelets weighted by the wavelet coefficients. Choosing an appropriate wavelet function and number of decomposition levels is of great importance for correctly reconstructing H(t). In order to extract the five physiological EEG bands, a four-level DWT with the third-order Daubechies (db3) wavelet function has been used (Table 1 presents the frequency distribution of the DWT-based coefficients of the PRCs of the EEG signals at 173.61 Hz); this choice of mother wavelet is supported by many works in the literature (Vavadi et al. 2010; Tawfik 2016; Li et al. 2017). Figure 3 shows sample EEG channels of the five sets and the decomposed frequency bands of their predominant PRCs. Since frequency components above 40 Hz are of little use in epilepsy analysis, the sub-bands D4 and A4 are selected for feature acquisition in order to reduce the feature dimension.
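The four-level decomposition can be sketched with hard-coded db3 filter taps and periodic boundary handling (the periodic extension is an assumption of this sketch; production code would typically rely on a wavelet library):

```python
import numpy as np

# Standard db3 scaling (low-pass) filter; the high-pass filter follows
# from the quadrature-mirror relation g[n] = (-1)^n h[L-1-n].
H_DB3 = np.array([0.33267055, 0.80689151, 0.45987750,
                  -0.13501102, -0.08544127, 0.03522629])
G_DB3 = H_DB3[::-1] * np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])

def dwt_step(x, filt):
    """One analysis step: periodic convolution with `filt`, then downsample by 2."""
    N = len(x)
    out = np.zeros(N // 2)
    for k in range(N // 2):
        for n, c in enumerate(filt):
            out[k] += c * x[(2 * k + n) % N]
    return out

def wavedec4(x):
    """Four-level decomposition into [A4, D4, D3, D2, D1], as applied to
    the predominant PRCs in this work."""
    details = []
    a = x
    for _ in range(4):
        details.append(dwt_step(a, G_DB3))   # detail band at this level
        a = dwt_step(a, H_DB3)               # approximation for next level
    return [a] + details[::-1]
```

Because the db3 filter bank is orthogonal, the periodized one-level split preserves the signal energy, which is a convenient sanity check on any implementation.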

Fig. 3

Samples of four levels DWT of PRC1 and PRC2 of the EEG signals from five sets

Table 1 Frequency bands of the PRCs of the EEG signal using four-level decomposition

2.4 Phase space reconstruction (PSR)

It is sometimes necessary to search for patterns not only in a time series but also in a higher-dimensional transformation of the time series (Sun et al. 2015). Phase space reconstruction is a method used to reconstruct the so-called phase space. The concept of phase space is a useful tool for characterizing any low- or high-dimensional dynamic system. A dynamic system can be described using a phase space diagram, which essentially provides a coordinate system whose coordinates are all the variables comprising the mathematical formulation of the system. A point in the phase space represents the state of the system at a given time (Lee et al. 2014; Sivakumar 2002). Each db3 sub-band of a PRC of the EEG signals can be written as a time series vector \(V=\{v_1,v_2,v_3, \ldots ,v_K\}\), where K is the total number of data points. The phase space can be reconstructed according to Lee et al. (2014):

$$\begin{aligned} Y_j=(V_j,V_{j+\tau },V_{j+2\tau },\ldots ,V_{j+(d-1)\tau }) \end{aligned}$$
(6)

where \(j=1,2, \ldots ,K-(d-1)\tau \), d is the embedding dimension of the phase space and \(\tau \) is a time lag. It is worthwhile to mention that the properties associated with the EEG dynamics are preserved in the reconstructed phase space.

The behaviour of the signal over time can be visualized using PSR (especially when \(d=2\) or 3). In this work, we confine our discussion to the embedding dimension \(d=3\) because of its visualization simplicity. In addition, different studies have found this value to best represent the attractor for human movement (Venkataraman and Turaga 2016; Som et al. 2016). For \(\tau \), one can either use the first zero crossing of the autocorrelation function of each time series or the average \(\tau \) value obtained from all the time series in the training dataset using the method proposed in Michael (2005). In this study, we set the time lag to \(\tau =1\) to test the classification performance. PSR with \(d=3\) is referred to as 3D PSR.

Reconstructed phase spaces have been proven to be topologically equivalent to the original system and therefore are capable of recovering the nonlinear dynamics of the generating system (Takens 1980; Xu et al. 2013). This implies that the full dynamics of the EEG system are accessible in this space, and for this reason, features extracted from it can potentially contain more and/or different information than the common features extraction method (Chen et al. 2014).

The 3D PSR is the plot of the three delayed vectors \(V_j\), \(V_{j+1}\) and \(V_{j+2}\), used to visualize the dynamics of the human EEG system. The Euclidean distance (ED) of a point \((V_j,V_{j+1},V_{j+2})\), i.e., its distance from the origin in the 3D PSR, is defined as (Lee et al. 2014)

$$\begin{aligned} ED_j=\sqrt{V_j^2+V_{j+1}^2+V_{j+2}^2} \end{aligned}$$
(7)

ED measures can be used in features extraction and have been studied and applied in many fields, such as clustering algorithms and induced aggregation operators (Merigó and Casanovas 2011).
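Eqs. (6) and (7) translate directly into code; the following is a small sketch (the function names are ours, not from the original work):

```python
import numpy as np

def psr(v, d=3, tau=1):
    """Delay embedding of Eq. (6): row j is Y_j = (v_j, v_{j+tau}, ..., v_{j+(d-1)tau})."""
    K = len(v)
    n = K - (d - 1) * tau                # number of reconstructed points
    return np.column_stack([v[i * tau: i * tau + n] for i in range(d)])

def euclidean_distance(Y):
    """Eq. (7): distance of each reconstructed point from the origin."""
    return np.sqrt((Y ** 2).sum(axis=1))
```

For example, a length-K series with d = 3 and τ = 1 yields K − 2 points, and `euclidean_distance` returns one ED value per point.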

2.5 Feature extraction and selection

In order to obtain more efficient features, this paper proposes the following extraction scheme.

(1) ITD of the EEG signals and derivation of the predominant PRCs. The decomposed signals obtained by the ITD method cannot be used directly for classification because of their high feature dimension. To solve this problem, Pearson’s correlation coefficient is calculated to measure the correlation between each of the first four PRCs and the original EEG signal. A PRC with a higher correlation coefficient is more strongly correlated with the original signal, which also means that the signal energy is mostly concentrated in this PRC. In the present study most of the energy is concentrated in the PRC1 and PRC2 components, which carry the most important information of the EEG signals and are considered the predominant PRCs (see Table 2).
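The selection step can be sketched as follows (an illustrative helper of ours, using NumPy’s sample correlation):

```python
import numpy as np

def predominant_prcs(signal, prcs, n_keep=2):
    """Rank PRCs by |Pearson correlation| with the original signal and
    return the indices of the n_keep most correlated ones (ascending order)."""
    r = [abs(np.corrcoef(signal, p)[0, 1]) for p in prcs]
    order = np.argsort(r)[::-1]          # most correlated first
    return sorted(order[:n_keep].tolist())
```

In this work the ranking consistently selects PRC1 and PRC2 across the five Bonn sets (Table 2).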

Table 2 The average correlation coefficients between each PRC and original EEG signals from five sets of the Bonn database

(2) A four-level DWT is employed to decompose the predominant PRCs into different frequency bands, in which the third-order Daubechies (db3) wavelet function is selected for analysis. The D4 and A4 sub-bands of the PRC1 and PRC2 signals are regarded as the reference variables \([PRC1^{D4},PRC1^{A4},PRC2^{D4},PRC2^{A4}]^T\) and are used for feature derivation.

(3) Reconstruct the phase space of the reference variables with selected values of d and \(\tau \);

(4) Compute ED of 3D PSR of the reference variables. Concatenate them to form a feature vector \([ED_{j}^{PRC1^{D4}},ED_j^{PRC1^{A4}},ED_j^{PRC2^{D4}},ED_j^{PRC2^{A4}}]^T\).

For the Bonn epileptic database, the EEG signals are analyzed and the signal dynamics are extracted using ITD, DWT and 3D PSR. First, the ITD of the EEG signals is exhibited in Fig. 2. The four-level DWT of PRC1 and PRC2 of the EEG signals from the five sets is demonstrated in Fig. 3. The D4 and A4 sub-bands of the first two PRCs are utilized to form the reference variables \([PRC1^{D4},PRC1^{A4},PRC2^{D4},PRC2^{A4}]^T\). Samples of the 3D PSR of the reference variables are exhibited in Fig. 4. After 3D PSR, the features \([ED_j^{PRC1^{D4}},ED_j^{PRC1^{A4}},ED_j^{PRC2^{D4}},ED_j^{PRC2^{A4}}]^T\) of the EEG signals of the five sets are derived through the ED computation, as demonstrated in Fig. 5. As analyzed before, significant differences in EEG system dynamics exist between the EEG signals of the five sets, which can also be seen clearly in Fig. 4.

Fig. 4

Samples of 3D PSR of the reference variables \([PRC1^{D4},PRC1^{A4},PRC2^{D4},PRC2^{A4}]^T\) of EEG signals from five sets

Fig. 5

Samples of the Euclidean distance of the 3D PSR of the reference variables \([PRC1^{D4},PRC1^{A4},PRC2^{D4},PRC2^{A4}]^T\) of EEG signals

2.6 Training and modeling mechanism based on selected features

In this section, we present a scheme for modeling and deriving the EEG system dynamics of normal, interictal and ictal EEG signals based on the above-mentioned features.

Consider a general nonlinear EEG system dynamics in the following form:

$$\begin{aligned} \dot{x}=F(x;p)+v(x;p) \end{aligned}$$
(8)

where \(x=[x_1,\ldots ,x_n]^T\in R^n\) is the system state, which represents the features \([ED_j^{PRC1^{D4}},ED_j^{PRC1^{A4}},ED_j^{PRC2^{D4}},ED_j^{PRC2^{A4}}]^T\), and p is a constant vector of system parameters. \(F(x;p)=[f_1(x;p),\ldots ,f_n(x;p)]^T\) is a smooth but unknown nonlinear vector representing the EEG system dynamics, and \(v(x;p)\) is the modeling uncertainty. Since the modeling uncertainty \(v(x;p)\) and the EEG system dynamics \(F(x;p)\) cannot be decoupled from each other, we consider the two terms together as one undivided term, and define \(\phi (x;p):=F(x;p)+v(x;p)\) as the general EEG system dynamics. Then, the following steps are taken to model and derive the EEG system dynamics via deterministic learning theory (Wang and Hill 2006, 2007, 2009).

In the first step, standard RBF neural networks are constructed in the following form

$$\begin{aligned} f_{nn}(Z)=\sum \limits _{i=1}^N w_is_i(Z)=W^TS(Z), \end{aligned}$$
(9)

where Z is the input vector, \(W=[w_1,\ldots ,w_N]^T\in R^N\) is the weight vector, N is the node number of the neural networks, and \(S(Z)=[s_1(\parallel Z-\mu _1\parallel ),..., s_N(\parallel Z-\mu _N\parallel )]^T\), with \(s_i(\parallel Z-\mu _i\parallel )=\exp [\frac{-(Z-\mu _i)^T(Z-\mu _i)}{\eta _i^2}]\) being a Gaussian function, \(\mu _i(i=1,...,N)\) being distinct points in state space, and \(\eta _i\) being the width of the receptive field.
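Evaluating Eq. (9) amounts to computing the Gaussian activations and taking an inner product with the weights (a short sketch; argument names are ours):

```python
import numpy as np

def rbf_output(Z, W, centers, eta):
    """Evaluate f_nn(Z) = W^T S(Z) of Eq. (9), where S(Z) collects Gaussian
    radial basis functions centred at `centers` with receptive-field width `eta`."""
    S = np.exp(-np.sum((centers - Z) ** 2, axis=1) / eta ** 2)
    return W @ S
```

At an input coinciding with a centre, the corresponding basis function equals one, so the network output is dominated by that neuron’s weight, which is the locality property exploited below.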

In the second step, the following dynamical RBF neural networks are employed to model and derive the EEG system dynamics \(\phi (x;p)\):

$$\begin{aligned} \dot{\hat{x}}=-A(\hat{x}-x)+\hat{W}_j^TS_j(x) \end{aligned}$$
(10)

where \(\hat{x}=[\hat{x}_1,\ldots ,\hat{x}_n]\) is the state vector of the dynamical RBF neural networks, \(A=diag[a_1,\ldots ,a_n]\) is a diagonal matrix, with \(a_i>0\) being design constants, localized RBF neural networks \(\hat{W}_j^TS_j(x)=\sum \nolimits _{i=1}^N w_{ij}s_{ij}(x)\) are used to approximate the unknown \(\phi (x;p)\), where \(\hat{W}_j=[w_{1j},\ldots ,w_{Nj}]^T\), \(S_j=[s_{1j},\ldots ,s_{Nj}]^T\), for \(j=1,\ldots ,n\).

The following law is used to update the neural weights

$$\begin{aligned} \dot{\hat{W}}_i=\dot{\tilde{W}}_i=-\Gamma _iS_i(x)\tilde{x}_i-\sigma _i\Gamma _i\hat{W}_i \end{aligned}$$
(11)

where \(\tilde{x}_i=\hat{x}_i-x_i\), \(\tilde{W}_i=\hat{W}_i-W_i^*\), \(W_i^*\) is the ideal constant weight vector such that \(\phi _i(x;p)={W_i^*}^TS_i(x)+\epsilon _i(x)\), with \(|\epsilon _i(x)|<\epsilon ^*\) being the neural network modeling error, \(\Gamma _i=\Gamma _i^T>0\), and \(\sigma _i>0\) is a small value.

With Eqs. (8)–(10), the derivative of the state estimation error \(\tilde{x}_i\) satisfies

$$\begin{aligned} \dot{\tilde{x}}_i=-a_i\tilde{x}_i+\hat{W}_i^TS_i(x)-\phi _i(x;p)=-a_i\tilde{x}_i+\tilde{W}_i^TS_i(x)-\epsilon _i \end{aligned}$$
(12)
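To make the mechanism of Eqs. (10)–(12) concrete, the following sketch identifies the second-state dynamics \(\phi _2(x)=-x_1\) of a toy harmonic oscillator along its circular trajectory. The toy system, all gains and the grid of centres are illustrative assumptions of this sketch, not the values used in the paper.

```python
import numpy as np

# Toy system: x1' = x2, x2' = -x1, with trajectory x(t) = (cos t, sin t).
# We identify phi_2(x) = -x1 using the estimator (10) and update law (11).
grid = np.arange(-1.5, 1.51, 0.3)
centers = np.array([[c1, c2] for c1 in grid for c2 in grid])   # RBF centres
eta, a, Gamma, sigma = 0.3, 5.0, 20.0, 1e-4                    # design constants
dt, steps = 0.005, 40000                                       # Euler integration

W_hat = np.zeros(len(centers))
x_hat = 0.0
err_hist = []
for i in range(steps):
    t = i * dt
    x = np.array([np.cos(t), np.sin(t)])                  # measured state
    S = np.exp(-np.sum((centers - x) ** 2, axis=1) / eta ** 2)
    x_tilde = x_hat - x[1]                                # state estimation error
    x_hat += dt * (-a * x_tilde + W_hat @ S)              # estimator, Eq. (10)
    W_hat += dt * (-Gamma * S * x_tilde - sigma * Gamma * W_hat)   # Eq. (11)
    err_hist.append(abs(x_tilde))

err_early = np.mean(err_hist[: steps // 4])
err_late = np.mean(err_hist[-steps // 4:])
```

As the weights of the neurons along the trajectory converge, \(\hat{W}^TS(x)\) approximates \(\phi _2\) there and the estimation error \(\tilde{x}_2\) shrinks, which is the behaviour exploited for classification below.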

In the third step, by using the local approximation property of RBF neural networks, the overall system consisting of dynamical model (12) and the neural weight updating law (11) can be summarized into the following form in the region \(\Omega _\zeta \)

$$\begin{aligned} \left[ \begin{array}{c} \dot{\tilde{x}}_i\\ \dot{\tilde{W}}_{\zeta i} \end{array} \right] = \left[ \begin{array}{cc} -a_i&{}S_{\zeta i}(x)^T\\ -\Gamma _{\zeta i}S_{\zeta i}(x)&{}0 \end{array} \right] \left[ \begin{array}{c} \tilde{x}_i\\ \tilde{W}_{\zeta i} \end{array} \right] + \left[ \begin{array}{c} -\epsilon _{\zeta i}\\ -\sigma _i\Gamma _{\zeta i}\hat{W}_{\zeta i} \end{array} \right] \end{aligned}$$
(13)

and

$$\begin{aligned} \dot{\hat{W}}_{\bar{\zeta }i}=\dot{\tilde{W}}_{\bar{\zeta }i}=-\Gamma _{\bar{\zeta }i}S_{\bar{\zeta }i}(x)\tilde{x}_i-\sigma _i\Gamma _{\bar{\zeta }i}\hat{W}_{\bar{\zeta }i} \end{aligned}$$
(14)

where \(\epsilon _{\zeta i}=\epsilon _i-\tilde{W}_{\bar{\zeta }i}^TS_{\bar{\zeta }}(x)\). The subscripts \((\cdot )_\zeta \) and \((\cdot )_{\bar{\zeta }}\) are used to stand for terms related to the regions close to and far away from the trajectory \(\varphi _\zeta (x_0)\). The region close to the trajectory is defined as \(\Omega _\zeta :=\{Z|\mathrm {dist}(Z,\varphi _\zeta )\le d_{\iota }\}\), where \(Z=x, d_\iota >0\) is a constant satisfying \(s(d_\iota )>\iota \), \(s(\cdot )\) is the RBF used in the network, \(\iota \) is a small positive constant. The related subvectors are given as: \(S_{\zeta i}(x)=[s_{j1}(x),\ldots ,s_{j\zeta }(x)]^T\in R^{N_\zeta }\), with the neurons centered in the local region \(\Omega _\zeta \), and \(W_\zeta ^*=[w_{j1}^*,\ldots ,w_{j\zeta }^*]^T\in R^{N_\zeta }\) is the corresponding weight subvector, with \(N_\zeta <N\). For localized RBF neural networks, \(|\tilde{W}_{\bar{\zeta }i}^TS_{\bar{\zeta i}}(x)|\) is small, so \(\epsilon _{\zeta i}=O(\epsilon _i)\).

By the convergence result, we can obtain a constant vector of neural weights according to

$$\begin{aligned} \bar{W}_i=mean_{t\in [t_a,t_b]}\hat{W}_i(t) \end{aligned}$$
(15)

where \([t_a,t_b]\) with \(t_b>t_a>0\) represents a time segment after the transient process. Therefore, we conclude that accurate identification of the function \(\phi _i(x;p)\) is obtained along the trajectory \(\varphi _\zeta (x_0)\) by using \(\bar{W}_i^TS_i(x)\), i.e.,

$$\begin{aligned} \phi _i(x;p)=\bar{W}_i^TS_i(x)+\epsilon _{i2} \end{aligned}$$
(16)

where \(\epsilon _{i2}=O(\epsilon _{\zeta i})\) and subsequently \(\epsilon _{i2}=O(\epsilon ^*)\).

2.7 Classification mechanism

In this section, we present a scheme to classify normal, interictal and ictal EEG signals.

Consider a training dataset consisting of EEG signal patterns \(\varphi _\zeta ^k\), \(k=1,\ldots ,M\), with the kth training pattern \(\varphi _\zeta ^k\) generated from

$$\begin{aligned} \dot{x}=F^k(x;p^k)+v^k(x;p^k), \quad x(t_0)=x_{\zeta 0} \end{aligned}$$
(17)

where \(F^k(x;p^k)\) denotes the EEG system dynamics, \(v^k(x;p^k)\) denotes the modeling uncertainty, \(p^k\) is the system parameter vector.

As demonstrated in Sect. 2.6, the general EEG system dynamics \(\phi ^k(x;p^k):=F^k(x;p^k)+v^k(x;p^k)\) can be accurately derived and preserved in constant RBF neural networks \(\bar{W}^{k^T}S(x)\). By utilizing the learned knowledge obtained in the training stage, a bank of M estimators is constructed for the training EEG signal patterns as follows:

$$\begin{aligned} \dot{\bar{\chi }}^k=-B(\bar{\chi }^k-x)+\bar{W}^{k^T}S(x) \end{aligned}$$
(18)

where \(k=1,\ldots ,M\) is used to stand for the kth estimator, \(\bar{\chi }^k=[\bar{\chi }_1^k,\ldots ,\bar{\chi }_n^k]^T\) is the state of the estimator, \(B=diag[b_1, \ldots , b_n]\) is a diagonal matrix which is kept the same for all estimators, x is the state of an input test EEG signal pattern generated from Eq. (8).

In the classification phase, by comparing the test EEG signal pattern (standing for a normal, interictal or ictal EEG signal pattern) generated from EEG system (8) with the set of M estimators (18), we obtain the following test error systems:

$$\begin{aligned} \dot{\tilde{\chi }}_i^k=-b_i\tilde{\chi }_i^k+\bar{W}_i^{k^T}S_i(x)-\phi _i(x;p),\quad i=1,\ldots ,n,~~k=1,\ldots ,M \end{aligned}$$
(19)

where \(\tilde{\chi }_i^k=\bar{\chi }_i^k-x_i\) is the state estimation (or synchronization) error. We compute the average \(L_1\) norm of the error \(\tilde{\chi }_i^k(t)\)

$$\begin{aligned} \Vert \tilde{\chi }_i^k(t)\Vert _1=\frac{1}{\mathrm {T}_c}\int _{t-\mathrm {T}_c}^t|\tilde{\chi }_i^k(\tau )|d\tau ,~~~t\ge \mathrm {T}_c \end{aligned}$$
(20)

where \(\mathrm {T}_c\) is the cycle of EEG signals.

The fundamental idea of the classification between normal, interictal and ictal EEG signals is as follows: if a test EEG signal pattern is similar to a trained EEG signal pattern \(s~(s\in \{1,\ldots ,M\})\), the constant RBF network \(\bar{W}_i^{s^T}S_i(x)\) embedded in the matched estimator s will quickly recall the learned knowledge by providing an accurate approximation to the EEG system dynamics. Thus, the corresponding error \(\Vert \tilde{\chi }_i^s(t)\Vert _1\) will become the smallest among all the errors \(\Vert \tilde{\chi }_i^k(t)\Vert _1\). Based on this smallest-error principle, the appearing test EEG signal pattern can be classified. We have the following classification scheme.

Classification scheme If there exist a finite time \(t^s\), \(s\in \{1,\ldots ,M\}\), and some \(i\in \{1,\ldots ,n\}\) such that \(\Vert \tilde{\chi }_i^s(t)\Vert _1<\Vert \tilde{\chi }_i^k(t)\Vert _1\) for all \(k\ne s\) and all \(t>t^s\), then the appearing EEG signal pattern can be classified as pattern s.
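The estimator-bank comparison of Eqs. (18)–(20) can be sketched as follows. For brevity, the learned RBF networks \(\bar{W}^{k^T}S(x)\) are replaced by plain callables and the sliding-window norm of Eq. (20) by a total time-integrated error; both are simplifying assumptions of this sketch.

```python
import numpy as np

def classify(x_traj, dt, dynamics, b=5.0):
    """Bank-of-estimators classifier in the spirit of Eqs. (18)-(20):
    each candidate model drives one estimator, and the pattern whose
    estimator attains the smallest time-integrated L1 synchronization
    error is selected."""
    M = len(dynamics)
    chi = np.full(M, x_traj[0])          # all estimators start at the test state
    err = np.zeros(M)
    for t in range(1, len(x_traj)):
        for k, f in enumerate(dynamics):
            # Euler step of estimator k, Eq. (18)
            chi[k] += dt * (-b * (chi[k] - x_traj[t - 1]) + f(x_traj[t - 1]))
            err[k] += abs(chi[k] - x_traj[t]) * dt
    return int(np.argmin(err))
```

The matched estimator synchronizes with the test trajectory, while mismatched dynamics leave a persistent error of roughly the dynamics mismatch divided by b, so the argmin recovers the generating pattern.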

3 Experimental results

Experiments are implemented in MATLAB and run on an Intel Core i7-6700K 3.5 GHz computer with 32 GB RAM. We assign feature vector sequences to all the EEG signals in the Bonn database. Based on the method described in Sect. 2.5, features are extracted from the EEG time series, so that the input of the RBF neural networks is \(x=[ED_j^{PRC1^{D4}},ED_j^{PRC1^{A4}},ED_j^{PRC2^{D4}},ED_j^{PRC2^{A4}}]^T\). In order to eliminate scale differences between features, all feature data are normalized to \([-1, 1]\).

Several experiments are carried out to verify the effectiveness of the proposed method. The classification results are evaluated with both the 10-fold and leave-one-out cross-validation styles, with the data divided into training and test subsets. For 10-fold cross-validation, the data set is divided into ten subsets; each time, one of the ten subsets is used as the test set and the other nine subsets are combined to form the training set. For leave-one-out cross-validation, each time one EEG signal pattern is selected for classification and the remaining patterns are used for training. This process is repeated K times (where K is the number of EEG signal patterns), and the leave-one-out classification accuracy is calculated as the average classification accuracy over all of the individually left-out patterns.
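Both validation styles can be expressed as splits of the pattern indices; leave-one-out is simply k-fold with k equal to the number of patterns. In the sketch below the round-robin fold assignment is illustrative only, since the paper does not specify how patterns are assigned to folds:

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k disjoint test folds; each fold is
    paired with the remaining indices, which form the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for test in folds:
        test_set = set(test)
        train = [j for j in range(n) if j not in test_set]
        yield train, test

# 10-fold: every pattern appears in exactly one test fold
splits = list(kfold_indices(100, 10))
assert sum(len(test) for _, test in splits) == 100

# leave-one-out is the special case k == n
loo = list(kfold_indices(5, 5))
print(len(loo), len(loo[0][1]))  # -> 5 1
```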

For the evaluation, six performance parameters are used: the Sensitivity (SEN), the Specificity (SPF), the Accuracy (ACC), the Positive Predictive Value (PPV), the Negative Predictive Value (NPV) and the Matthews Correlation Coefficient (MCC) (Azar and El-Said 2014). To perform well, a classifier must have a high classification accuracy, a high sensitivity and a high specificity (Chu 1999). A larger MCC value indicates better classifier performance (Azar and El-Said 2014; Yuan et al. 2007).
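These six parameters follow the standard confusion-matrix definitions; a minimal sketch with a hypothetical binary confusion matrix:

```python
import math

def metrics(tp, tn, fp, fn):
    """Compute SEN, SPF, ACC, PPV, NPV and MCC from the
    entries of a binary confusion matrix."""
    sen = tp / (tp + fn)                     # sensitivity (true positive rate)
    spf = tn / (tn + fp)                     # specificity (true negative rate)
    acc = (tp + tn) / (tp + tn + fp + fn)    # overall accuracy
    ppv = tp / (tp + fp)                     # positive predictive value
    npv = tn / (tn + fn)                     # negative predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sen, spf, acc, ppv, npv, mcc

# toy confusion matrix: 95 TP, 90 TN, 10 FP, 5 FN
sen, spf, acc, ppv, npv, mcc = metrics(95, 90, 10, 5)
print(round(acc, 3))  # -> 0.925
```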

In the past literature, various approaches have focused on the classification of EEG signals between Sets A and E. However, the effectiveness of classification between other combinations of datasets has not been investigated thoroughly. It is therefore desirable to examine the ability of the proposed method to classify EEG signals for different combinations of the datasets (Z, O, N, F and S). To address this issue, 11 classification problems are constructed from the aforementioned datasets. All experiments described in Table 3 focus on distinguishing normal, interictal and ictal EEG signals: cases 1 to 8 deal with binary classification, while cases 9 to 11 address multi-class classification.

Table 3 Different experimental cases in the present study
Table 4 Performance of the proposed classification approach evaluated by 10-fold cross-validation method with Case 1: Z–S
Table 5 Performance of the proposed classification approach evaluated by 10-fold cross-validation method with Case 2: O–S
Table 6 Performance of the proposed classification approach evaluated by 10-fold cross-validation method with Case 3: N–S
Table 7 Performance of the proposed classification approach evaluated by 10-fold cross-validation method with Case 4: F–S
Table 8 Performance of the proposed classification approach evaluated by 10-fold cross-validation method with Case 5: NF–S
Table 9 Performance of the proposed classification approach evaluated by 10-fold cross-validation method with Case 6: Z–F
Table 10 Performance of the proposed classification approach evaluated by 10-fold cross-validation method with Case 7: ZONF–S
Table 11 Performance of the proposed classification approach evaluated by 10-fold cross-validation method with Case 8: ZO–NFS
Table 12 Performance of the proposed classification approach evaluated by 10-fold cross-validation method with Case 9: Z–N–S
Table 13 Performance of the proposed classification approach evaluated by 10-fold cross-validation method with Case 10: ZO–NF–S
Table 14 Performance of the proposed classification approach evaluated by 10-fold cross-validation method with Case 11: Z–O–N–F–S

The classification results for the different cases are presented in Tables 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14 for both the 10-fold and leave-one-out cross-validation styles. Our study demonstrates improved accuracy in differentiating between normal, interictal and ictal EEG signals. Overall, the classification approach achieves good performance, indicating that the proposed pattern recognition system can effectively differentiate between different classes of EEG signals using nonlinear features and neural-network-based classification tools.

4 Discussion

The experimental results of this study demonstrate that normal, interictal and ictal EEG signals could be detected automatically by means of hybrid feature extraction methods and neural networks. The proposed scheme focuses not only on providing evidence to support the claim that interictal and ictal EEG signals demonstrate altered dynamics compared to normal EEG signals, but also on providing an automatic and objective method to distinguish between the three groups of EEG signals.

Recently, various methods have been reported in the literature to automatically detect normal, interictal and ictal EEG signals. It should be noted that all of the recent methods summarized in Table 15 were evaluated using 10-fold cross-validation.

For case 1 (Z–S), Isik and Sezer (2012) used tools including the Wavelet Transform (WT), Multilayer Perceptron (MLP) and Elman artificial neural networks (ANN), and the achieved classification accuracy was 96%. Du et al. (2012) extracted principal component features by applying principal component analysis (PCA) to 15 higher-order spectra (HOS) features. Several classifiers, including ANN, MLP, RBF network, random forest, rotation forest, logistic regression, model trees, simple logistic regression and bagging, were then employed to evaluate the classification performance of the proposed features; simple logistic regression achieved the highest accuracy of 94.5%. Zhang et al. (2018) combined fuzzy distribution entropy with wavelet packet decomposition, the Kruskal–Wallis nonparametric one-way analysis of variance and a k-nearest neighbor (KNN) classifier to classify the EEG signals, and the best achieved accuracy was 100%. With our proposed method, the achieved accuracy is 99%.

Table 15 Summary of classification performance (10-fold cross-validation style) obtained for some cases using the same dataset in the literature

For cases 2–4, Tawfik (2016) achieved classification accuracies of 85%, 93.5%, and 96.5%, respectively, with a combination of weighted permutation entropy and SVM. More recently, with the development of deep learning, Ahmedt-Aristizabal et al. (2018) used Recurrent Neural Networks (RNNs) based on Long Short-Term Memory (LSTM) units and achieved classification accuracies of 94.75%, 97.25%, and 96.5% for these cases, respectively. However, the success of deep learning depends on efficient formulations (Goceri 2018), and parameters such as the batch size must be chosen carefully (Goceri and Gooya 2018). In comparison, the accuracy achieved by our proposed method for cases 2–4 is 99.5%, 98.5% and 99.5%, respectively.

For cases 5–7, our proposed method achieves classification accuracies of 98%, 99.5% and 98%, respectively. For case 5 (NF–S), Joshi et al. (2014) utilized the fractional linear prediction technique together with an SVM classifier, and the reported classification accuracy was 95.33%. For the same case, Diykh et al. (2017) used a complex networks approach and reported a classification accuracy of 97.8%. For case 6 (Z–F), Jaiswal and Banka (2017) employed the local neighbor descriptive pattern (LNDP) and the one-dimensional local gradient pattern (1D-LGP) together with an ANN for classification and reported an accuracy of 99.90%. Kaya et al. (2014) used the one-dimensional local binary pattern (1D-LBP) to extract histogram features and fed them into a BayesNet classifier; the reported accuracy was 99.50%. For case 7 (ZONF–S), Kumar et al. (2014) used DWT-based fuzzy approximate entropy to extract features and fed them into an SVM classifier, achieving a classification accuracy of 97.38%. Mursalin et al. (2017) used an improved correlation-based feature selection method (ICFS) together with a random forest classifier and reported a classification accuracy of 97.4%.

For case 8 (ZO–NFS), Kaya et al. (2014) reported an accuracy of 93%. Acharya et al. (2018) used a 13-layer deep convolutional neural network (CNN) and reported an accuracy of 88.7%. In comparison, the achieved accuracy of our proposed method is 95.2%.

Case 9 (Z–N–S) is a multi-class classification problem with three classes. Jaiswal and Banka (2017) reported accuracies of 98.22% and 97.06% using LNDP and 1D-LGP, respectively. Kaya et al. (2014) reported a classification accuracy of 95.67% with 1D-LBP. Li et al. (2017) used the dual-tree complex wavelet transform (DT-CWT) to decompose EEG signals into five constituent sub-bands, from which the nonlinear features of the Hurst exponent (H), Fractal Dimension (FD) and Permutation Entropy (PE) were extracted. Four classifiers, including SVM, KNN, random forest and rotation forest, were then employed, and the reported classification accuracy was 98.87%. In comparison, the achieved accuracy of our proposed method is 99%.

Case 10 (ZO–NF–S) is another three-class classification problem. Wang et al. (2011) reported an accuracy of 97.13% using wavelet packet entropy features together with an artificial neural network classifier. Acharya et al. (2012) reported a classification accuracy of 99% with wavelet packet decomposition and a Gaussian mixture model. In comparison, the achieved accuracy of our proposed method is 99.4%.

Case 11 (Z–O–N–F–S) is a five-class classification problem. Zahra et al. (2017) decomposed the EEG signals into their multiple intrinsic scales using the multivariate empirical mode decomposition algorithm. After removing the intrinsic mode functions (IMFs) associated with noise and other unwanted artifacts, classification was performed on the remaining IMFs by feeding a feature vector into an artificial neural network framework. The reported accuracy was 87.2%. In comparison, the achieved accuracy of our proposed method is 94%.

Different from the methods discussed above, this study proposes a hybrid method to extract effective features based on ITD, DWT, PSR and ED. These features are fed into dynamical estimators consisting of RBF neural networks to classify the different classes of EEG signals. A comparison of the classification performance with other state-of-the-art methods on the same database is given in Table 15. The proposed method provides an average classification accuracy of 98.15% over the eleven cases under 10-fold cross-validation, which makes the reported performance robust. The method studied in this paper has the potential to serve as a supportive technical means alongside other approaches, such as fMRI, for the diagnosis of epilepsy.

Because the feature dimension is 4 and the number of neurons used in this study is 83,521, the computational load is relatively high. The ITD and DWT computations are also time-consuming, which increases the overall complexity. However, with the development of computer technology, more powerful workstations and high-performance computers can be used to increase computational capacity and reduce computing time, making the ITD and DWT computations feasible for real-time applications. It was therefore acceptable to run the experiments on an Intel Core i7-6700K 3.5 GHz computer with 32 GB RAM in the present study. In future work, the authors will optimize the algorithm structure and adopt new computing technology and equipment to improve computational performance and further reduce the complexity.

In general, the experimental results show that the proposed method achieves high accuracy in epilepsy detection on two-class, three-class and five-class classification problems, which demonstrates that our scheme is appropriate for problems with multiple classes. Automated analysis of epileptic seizure activity has strong clinical potential, and developing mobile health technologies for the disease can be even more important (Goceri and Songul 2018). Another important property is the method's computational simplicity once high-performance computers are employed, which reduces the complexity and makes deployment in clinical applications possible. Consequently, this new approach can better meet clinical demands in terms of efficiency, functionality, universality and simplicity with satisfactory accuracy. These characteristics make the method an attractive alternative for actual clinical diagnosis.

Several factors in the proposed method work together to improve the classification performance. ITD extracts the most important information in the EEG signals through the predominant PRCs. DWT decomposes the predominant PRCs into different frequency bands, which are used to construct the reference variables. PSR plots the EEG system dynamics along the selected db3 sub-bands (D4 and A4) of the PRC trajectory in a 3D phase space diagram and visualizes the EEG system dynamics. ED measures and derives the features that are fed into the RBF neural networks for the modeling, identification and classification of EEG system dynamics among normal, interictal and ictal EEG signals. However, some limitations remain, such as the principle for selecting the embedding dimension and time lag, and the relationship between classification performance and the PSR parameters. It would be of interest to develop a strategy for adaptive selection of the PSR parameters that yields the best classification performance.

5 Conclusions

In this study, effective feature extraction techniques including ITD, DWT, PSR and ED have been introduced for epileptic EEG signal classification. All of these techniques extract informative features for classification and are computationally simple and easy to implement. The results of this study indicate that pattern classification of EEG signals can offer an objective method to assess the disparity of EEG system dynamics between normal, interictal and ictal EEG signals. However, some limitations, such as the relatively small size of the database and the principle for selecting the embedding dimension and time lag, still need to be addressed. In future research, features from other methods, such as complete ensemble empirical mode decomposition (CEEMD), various entropies, the Hurst exponent, mean-frequency (MF) and root-mean-square (RMS) bandwidth, Lempel–Ziv complexity, the largest Lyapunov exponent, fractal dimension and other nonlinear features, can also be explored within the proposed framework to evaluate its classification performance. The results of the present study can be improved further by using a wider database with more patients and more varied features. In addition, the future scope of this research will include identification of seizure stages in addition to seizure detection.