1 Introduction

Among the various neurological disorders, epilepsy is perhaps the most important, considering its prevalence among a large part of the world's population. Epilepsy is characterized by unprovoked recurring seizures that arise from excessive and hyper-synchronous activity of neurons in the brain. A seizure may be accompanied by loss of consciousness or convulsions and a decline in cognitive ability, and may lead to injury and sometimes to death. Electroencephalogram (EEG) signals represent the electrical activity of the neurons of the brain and are obtained from electrodes placed intra-cranially or on the scalp. EEG signals are widely used as a convenient and relatively inexpensive means for epilepsy diagnosis and management [1]. Highly trained neurologists monitor long-term EEG signals for seizure detection and, more generally, for epilepsy diagnosis. However, this is a tedious, time-consuming, and expensive task. An automated method of seizure detection can assist neurologists, who then need to examine only the sections of the EEG record around the detected events. In addition, automated seizure detection may be used in implantable, closed-loop neuro-stimulation devices for seizure suppression, such as responsive neurostimulators (RNS) [2], as well as to localize the epileptogenic regions of the brain in order to avoid undue morbidity and ensure effective epilepsy surgery.

Various algorithms have been proposed in the literature for the automatic detection of seizures [3–20]. The indispensable part of a detection algorithm is the extraction of appropriate features to discriminate the EEG signals. The detection process is carried out by extracting the features from an EEG signal and classifying them into the appropriate categories, such as seizure (ictal) and non-seizure (non-ictal). A widely used approach to processing nonlinear signals such as the EEG is to decompose them into time–frequency sub-bands and subsequently perform the classification using features extracted from the sub-bands [3, 21]. Among various methods, those using features obtained in the time–frequency domain have been shown to be highly promising in the detection of seizures. One of the reasons for this is that the diverse processes of brain dynamics and the associated neuronal activities are more properly represented in time–frequency sub-bands than in the original EEG [6]. Another reason may be that seizure events often evolve as increased spike or poly-spike-like activities, which are expected to be better visualized in the time–frequency domain. Recently, the dual-tree complex wavelet transform (DT-CWT) has been introduced by Selesnick, Kingsbury, et al. [22] as a better time–frequency representation of signals than the traditionally used discrete wavelet transform (DWT), which is widely used in the EEG literature for the analysis of epileptiform activity. The DT-CWT has also been extensively used for the processing of images and video signals [18–30]. However, research reports on the use of the DT-CWT for the processing of biological signals, especially EEG signals, are rather limited [18, 19, 29]. In [18], ANN- and SVM-based classifiers are proposed using the variance of the DT-CWT sub-bands as features. While the variance represents a statistical average of a signal, the underlying statistics of a signal are more properly described by using an appropriate prior. 
It is relevant to mention that, in general, utilizing the statistics of the DT-CWT coefficients of image and video signals by employing suitable priors has been shown to improve the performance in the processing of these signals [23–27]. Thus, it would be interesting to develop classifiers for discriminating EEG signals using the parameters of a prior that can suitably describe the statistics of these signals in the DT-CWT domain. The objective of this paper is to develop SVM-based classifiers for the diagnosis of epilepsy and the detection of seizures, using as features the parameters of a symmetric normal inverse Gaussian (NIG) prior extracted from the DT-CWT sub-bands. Initial results on the ability of the NIG parameters to discriminate EEG signals are presented in [19], where it is shown that, on average, the values of the NIG parameters for healthy, inter-ictal, and ictal EEG segments are quite distinguishable. It should be mentioned that no classification of EEG signals is carried out in [19]. In the present paper, the appropriateness of the NIG prior in modeling the DT-CWT coefficients of various types of EEG signals is demonstrated, and the distinguishable nature of the NIG parameters is illustrated. The effectiveness of the proposed method is comprehensively studied using a publicly available EEG database for a number of clinically relevant classification cases and compared to that of the state-of-the-art techniques.

In summary, in this study, the NIG parameters calculated from the DT-CWT sub-bands of EEG signals are used to develop SVM-based classifiers. The ability of these classifiers to discriminate EEG signals in several clinically relevant cases is investigated. The performance is measured in terms of accuracy, sensitivity, and specificity, and compared with that of several recent methods.

2 Methodology

In this section, the modeling of the EEG signals in the DT-CWT domain using an NIG probability density function (pdf) and the ability of the NIG parameters calculated from the DT-CWT coefficients of these signals to discriminate them are briefly discussed.

2.1 The EEG database

The EEG signals are obtained from a widely used database that is publicly available on the website of Bonn University [31, 32]. The database consists of 500 single-channel EEG segments of 23.6-s duration each. There are five sets of grouped data, namely A, B, C, D, and E, each containing 100 EEG segments. Sets A and B consist of surface EEG segments collected from five healthy volunteers, using the international standard 10–20 electrode placement scheme, in an awake and relaxed state, with their eyes open and closed, respectively. Recordings in Sets C and D are obtained from electrodes placed in the hippocampal formation of the opposite hemisphere and in the epileptogenic zone, respectively. Data in Set E are collected intra-cranially from these electrodes as well as from those implanted in the temporal and basal regions of the neocortex. The EEG data in Sets C and D correspond to seizure-free epochs, whereas the recordings in Set E correspond to seizure attacks. All the EEG signals are recorded using the same 128-channel amplifier system and digitized at 173.61 Hz with a 12-bit resolution. Thus, the sample length of each segment is \(173.61 \times 23.6 \approx 4{,}097\), and the corresponding bandwidth is 86.8 Hz, that is, half the sampling frequency. However, the frequency range of an EEG signal usually spans 0–60 Hz, and frequencies greater than 60 Hz may be considered as noise [6]. The frequencies beyond 60 Hz are thus removed by using a sixth-order Butterworth filter.
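The filtering step can be sketched as follows. This is a minimal illustration, assuming a low-pass design applied in a single causal pass with SciPy; the function name `lowpass_eeg` is hypothetical, and the text does not state how the filter was actually applied.

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 173.61  # sampling frequency of the database in Hz (from the text)


def lowpass_eeg(x, cutoff_hz=60.0, order=6, fs=FS):
    """Remove components above cutoff_hz with a Butterworth low-pass filter.

    Assumption: a single causal pass (sosfilt), so the overall order stays at 6.
    """
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfilt(sos, np.asarray(x, dtype=float))


# Illustrative input: a 10 Hz tone (kept) plus an 80 Hz tone (removed).
t = np.arange(4097) / FS
segment = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 80 * t)
filtered = lowpass_eeg(segment)
```

Second-order sections are used here because they are numerically better behaved than a single transfer function for orders of this size.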

2.2 Dual-tree complex wavelet transform (DT-CWT)

It is reported in the literature that the DWT is useful for feature extraction in the time–frequency domain and for the analysis of EEG signals to detect epileptiform activity [4, 6, 10]. This is mainly due to its ability to provide an efficient sparse representation of non-stationary signals through time–frequency localization. However, the DWT has a number of drawbacks, which include the oscillatory nature of wavelets (limiting its performance around singularities), lack of shift invariance, aliasing, and limited directional information [22]. The DT-CWT offers a better time–frequency representation of non-stationary signals by ameliorating these problems of the DWT through the implementation of a discrete complex wavelet transform using separable filter banks, as in the DWT. Basically, it employs two real DWT trees, where the tree on the top (see Fig. 1) represents the real part of the complex wavelet coefficient and the bottom one the imaginary part. The DT-CWT coefficients are non-oscillating, with a nearly shift-invariant magnitude, significantly reduced aliasing, and more directionality than the DWT and the undecimated DWT (UDWT), while being only \(2^{d}\) times redundant for signals of dimension \(d\). The non-oscillating magnitude and low computational complexity (due to the modest redundancy) make the DT-CWT an attractive choice for the analysis of nonlinear signals such as the EEG. Figure 2 shows, in the first row, the plots of sample EEG segments of 10 s from Sets D (top left) and E (top right). The plots of the corresponding first-level DT-CWT real and imaginary coefficients are shown in the second and third rows, respectively.

Fig. 1 1D dual-tree complex wavelet transformation

Fig. 2 Sample EEG signals from Sets D and E and the corresponding DT-CWT coefficients; the plots in the left column correspond to the sample EEG signal from Set D and its DT-CWT sub-bands, whereas those in the right column correspond to the signal from Set E

In this paper, the parameters of an NIG pdf are estimated from the various sub-bands of a four-level DT-CWT decomposition of the filtered EEG signals. After the first level of decomposition, the EEG signal, X (0–60 Hz), is decomposed into its higher resolution components, y1 (30–60 Hz), and lower resolution components, z1 (0–30 Hz). In the second level, the z1 component is then decomposed into higher resolution components, y2 (15–30 Hz), and lower resolution components, z2 (0–15 Hz). Thus, the components obtained after four levels of decomposition include the sub-bands z4 (0–4 Hz), y4 (4–8 Hz), y3 (8–15 Hz), y2 (15–30 Hz), and y1 (30–60 Hz). Reconstructions of these five components using the inverse DT-CWT approximately correspond to the five physiological EEG sub-bands delta, theta, alpha, beta, and gamma, respectively [6]. Although the sub-bands might overlap, the overlap is insignificant considering their physiologically approximate nature. Since each DT-CWT coefficient has two parts, real and imaginary, the four-level decomposition yields ten sub-bands in total (five real and five imaginary). In the present paper, the real and imaginary parts of the DT-CWT sub-bands are represented by (y1,1), (y1,2), (y2,1), (y2,2), (y3,1), (y3,2), (y4,1), (y4,2), (z4,1), and (z4,2). For example, (y1,1) and (y1,2) represent the real and imaginary parts of the y1 sub-band.
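The nominal band edges quoted above follow from repeatedly halving the 0–60 Hz range. The short sketch below reproduces that arithmetic only; the function name is hypothetical, and it does not model the actual frequency responses of the DT-CWT filters. It gives 3.75 and 7.5 Hz for the lowest edges, which the text rounds to 4 and 8 Hz.

```python
def nominal_band_edges(f_max=60.0, levels=4):
    """Return {sub-band name: (low_hz, high_hz)} for a dyadic decomposition.

    Assumption: the band 0..f_max is split exactly in half at every level,
    as in the description of y1..y4 and z4 in the text.
    """
    bands = {}
    upper = f_max
    for level in range(1, levels + 1):
        lower = upper / 2.0
        bands[f"y{level}"] = (lower, upper)   # detail band at this level
        upper = lower                         # the lower half is split again
    bands[f"z{levels}"] = (0.0, upper)        # final approximation band
    return bands


bands = nominal_band_edges()
for name, (low, high) in bands.items():
    print(f"{name}: {low:g}-{high:g} Hz")
```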

2.3 Modeling of the EEG signals using an NIG pdf

It is assumed that the symmetric NIG pdf can appropriately model the statistics of EEG signals in the DT-CWT domain. The motivation for using the NIG pdf arises from its success in modeling the statistics of nonlinear signals with heavy-tailed statistics, for example, financial data, hydrophone data, economics data, images, and video signals, among others [24–26, 33–36]. The symmetric NIG pdf is a variance-mean mixture density in which the inverse Gaussian density is the mixing distribution, and it is expressed as

$$\begin{aligned} P_{\alpha , \delta }(x)=\frac{A (\delta ,\alpha )\,K_{1}\left( \alpha \sqrt{\delta ^{2}+x^{2}}\right) }{\sqrt{\delta ^{2}+x^{2}}} \end{aligned}$$
(1)

where \(K_{1}\) is the first-order modified Bessel function of the second kind, \(A(\delta , \alpha )=\frac{\delta \alpha }{\pi }\exp (\delta \alpha )\), and \(x\) represents the DT-CWT coefficients of an EEG signal. The parameter \(\alpha \) controls the steepness of the pdf: the larger \(\alpha \) is, the steeper the pdf becomes. The other parameter, \(\delta \), is a scale factor that controls the dispersion. Figure 3 shows the plots of the pdf for various values of \(\alpha \) and \(\delta \). Figure 4 shows the empirical pdfs of the DT-CWT sub-band (y1,1) of EEG recordings from Sets A, C, and E. It is seen that the pdfs have different shapes in terms of peakedness and dispersion and demonstrate heavy-tailedness, especially for Set E. Figure 5 provides the variance-stabilized p–p plots of the empirical pdfs (shown in Fig. 4) and of the NIG and zero-mean Gaussian pdfs used to model the corresponding EEG sub-bands. The p–p plot is obtained by plotting \(F_a(x)^{t}\) against \(F_e(x)^{t}\), where

$$\begin{aligned} F_a(x)^{t}&= \frac{2}{\pi } \arcsin \left\{ \sqrt{F_a(x)}\right\} \end{aligned}$$
(2)
$$\begin{aligned} F_e(x)^{t}&= \frac{2}{\pi } \arcsin \left\{ \sqrt{F_e(x)} \right\} \end{aligned}$$
(3)

Here, \(F_a(x)\) and \(F_e(x)\) denote the cumulative distribution function (cdf) of a prior pdf and the empirical cdf, respectively [33]. In order to obtain the plots, the NIG parameters are estimated from the corresponding DT-CWT sub-bands as [25]

$$\begin{aligned} \alpha&= \sqrt{\frac{3K_{x}^{2}}{K_{x}^{4}}}\end{aligned}$$
(4)
$$\begin{aligned} \delta&= \alpha K_{x}^{2} \end{aligned}$$
(5)

where \(K_{x}^{2}\) and \(K_{x}^{4}\) denote the second- and fourth-order cumulants, respectively; the superscripts indicate the order of the cumulant, not a power.
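As an illustration of (4) and (5), the estimator can be sketched in a few lines. This is a minimal sketch under stated assumptions: the cumulants are replaced by sample cumulants about the sample mean, the function name is hypothetical, and the toy input is made up. The estimator is defined only when the fourth-order cumulant is positive, that is, when the data are heavier-tailed than a Gaussian.

```python
import math


def estimate_nig_parameters(coeffs):
    """Estimate (alpha, delta) of a symmetric NIG pdf from sample cumulants.

    Assumption: K2 is the sample variance and K4 = m4 - 3 * m2**2, both taken
    about the sample mean.
    """
    n = len(coeffs)
    mean = sum(coeffs) / n
    m2 = sum((c - mean) ** 2 for c in coeffs) / n
    m4 = sum((c - mean) ** 4 for c in coeffs) / n
    k2 = m2
    k4 = m4 - 3.0 * m2 ** 2
    if k4 <= 0.0:
        raise ValueError("fourth-order cumulant must be positive for Eq. (4)")
    alpha = math.sqrt(3.0 * k2 / k4)   # Eq. (4)
    delta = alpha * k2                 # Eq. (5)
    return alpha, delta


# Toy, made-up coefficients with a few large values (heavy tails).
toy = [-3.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 3.0]
alpha_hat, delta_hat = estimate_nig_parameters(toy)
```

These expressions are consistent with the symmetric NIG cumulants \(K_{x}^{2}=\delta /\alpha \) and \(K_{x}^{4}=3\delta /\alpha ^{3}\), from which \(3K_{x}^{2}/K_{x}^{4}=\alpha ^{2}\).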

Fig. 3 Plots of NIG pdf for various values of \(\alpha \) and \(\delta \)

Fig. 4 Plots of the empirical pdfs for real part of the sub-band y1 (y1,1) of three samples of EEG signals from the Sets A, C, and E

Fig. 5 p–p plots of the empirical, NIG, and Gaussian pdfs for the real parts of sub-band y1 (y1,1) for three sample EEG signals
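The variance-stabilizing transform of (2) and (3) can be sketched as follows for a Gaussian candidate model. This is only an illustration: the plotting position \((i-0.5)/n\) for the empirical cdf is an assumption, and the helper names are hypothetical.

```python
import math


def stabilize(prob):
    """Transform of Eqs. (2)-(3): (2/pi) * arcsin(sqrt(F)), for F in [0, 1]."""
    return (2.0 / math.pi) * math.asin(math.sqrt(prob))


def gaussian_cdf(x, variance):
    """Cdf of a zero-mean Gaussian with the given variance."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * variance)))


def pp_points(samples, model_cdf):
    """Return (F_e^t, F_a^t) pairs, one per sorted sample value."""
    ordered = sorted(samples)
    n = len(ordered)
    points = []
    for i, x in enumerate(ordered, start=1):
        f_emp = (i - 0.5) / n            # assumed empirical-cdf plotting position
        points.append((stabilize(f_emp), stabilize(model_cdf(x))))
    return points


# Made-up sample; a perfect fit would place every point on the diagonal.
sample = [-1.0, 0.0, 1.0]
points = pp_points(sample, lambda x: gaussian_cdf(x, 1.0))
```

The transform maps [0, 1] onto itself and leaves 0, 0.5, and 1 unchanged, so departures from the diagonal remain directly comparable across the range.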

From Fig. 5, it is seen that for Set A and Set C, both the NIG and the Gaussian pdfs provide a close fit to the empirical ones. However, for Set E, the NIG pdf provides a better fit than the Gaussian pdf. Note that an NIG pdf tends to a Gaussian one with variance \(\frac{\delta }{\alpha }\) as \(\alpha \rightarrow \infty \) [33]. Thus, the Gaussian pdf is a special case of the NIG pdf. Considering this fact and our observations from the p–p plots, the NIG pdf is more appropriate than a Gaussian pdf for modeling various types of EEG data in the DT-CWT domain.
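The properties of (1) used in this argument can be checked numerically. The sketch below, which assumes NumPy and SciPy and uses illustrative parameter values only, evaluates the symmetric NIG pdf in a numerically stable form and compares it with a Gaussian pdf of variance \(\delta /\alpha \).

```python
import numpy as np
from scipy.special import k1e  # scaled Bessel function: k1e(z) = exp(z) * K_1(z)


def nig_pdf(x, alpha, delta):
    """Symmetric NIG pdf of Eq. (1), written to avoid overflow of exp(delta*alpha)."""
    x = np.asarray(x, dtype=float)
    q = np.sqrt(delta ** 2 + x ** 2)
    # exp(delta*alpha) * K_1(alpha*q) == exp(alpha*(delta - q)) * k1e(alpha*q)
    return (delta * alpha / np.pi) * np.exp(alpha * (delta - q)) * k1e(alpha * q) / q


def gaussian_pdf(x, variance):
    x = np.asarray(x, dtype=float)
    return np.exp(-x ** 2 / (2.0 * variance)) / np.sqrt(2.0 * np.pi * variance)


alpha, delta = 2.0, 1.0                      # illustrative values, variance 0.5
x = np.linspace(-40.0, 40.0, 160001)
dx = x[1] - x[0]
p = nig_pdf(x, alpha, delta)

area = float(np.sum(p) * dx)                 # close to 1
variance = float(np.sum(x ** 2 * p) * dx)    # close to delta / alpha
# Ratio of NIG to Gaussian density in the tail (same variance): above 1.
tail_ratio = float(nig_pdf(3.0, alpha, delta) / gaussian_pdf(3.0, delta / alpha))

# Large alpha with delta/alpha held at 0.5: the pdf approaches the Gaussian.
alpha_big = 200.0
delta_big = 0.5 * alpha_big
gap = float(np.max(np.abs(nig_pdf(x, alpha_big, delta_big) - gaussian_pdf(x, 0.5))))
```

For small \(\alpha \) the NIG tail lies well above the Gaussian tail of equal variance, which is the behavior that makes it suitable for the ictal data, while for large \(\alpha \) the maximum gap between the two pdfs becomes negligible.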

Finally, Fig. 6 shows the box plots of the values of \(\alpha \) and \(\delta \) for the five sets, estimated from the real parts of the y1 sub-bands, that is, (y1,1). From the box plots, it is clear that the NIG parameters can discriminate the EEG data quite well. Overall, the discussion in this section indicates that (i) an NIG pdf is a highly suitable prior for modeling the statistics of EEG signals in the DT-CWT domain and (ii) the NIG parameters can distinguish EEG signals effectively. Based on these observations and the results of [19] about the discriminating ability of the NIG parameters, support vector machine (SVM)-based classifiers that use the NIG parameters as features are developed in the next section.

Fig. 6 Box plots of (a) \(\alpha \) and (b) \(\delta \) for (y1,1)

3 Proposed SVM-based classification of EEG signals

3.1 Support vector machine (SVM)

A support vector machine (SVM) is a binary classifier that projects data that are not linearly separable onto a higher dimensional space by using an appropriate kernel function and then determines the best hyperplane to separate the data in the projected space. One of the advantages of using an SVM is its automatic complexity control to avoid over-fitting [37]. The reason for choosing the SVM is its wide use in pattern classification, regression, and density estimation. The proper kernel function for a certain problem depends on the specific data. In this paper, the radial-basis function (RBF) kernel is used, as it yields better performance than the other kernel functions. Although the SVM is a binary classifier, it may be used to solve multi-class problems by combining several SVMs. In this paper, the error-correcting output coding (ECOC) approach, borrowed from digital communication [38], is employed for that purpose. A maximum of \(2^{n-1}-1\) SVMs are trained for separating \(n\) classes. For example, to separate three classes (X, Y, and Z), three classifiers are used: the first SVM separates X from Y and Z, the second Y from X and Z, and the third Z from X and Y. The classifier-output code for a pattern is a combination of the targets of all the separate SVMs. In the previous example, vectors from classes X, Y, and Z have codes (1,\(-\)1,\(-\)1), (\(-\)1,1,\(-\)1), and (\(-\)1,\(-\)1,1), respectively. If each of the separate SVMs classifies a pattern correctly, the classifier-target code is met and the ECOC approach reports no error for that pattern. Notice that for binary (two-way) classification, a single SVM is sufficient.
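The three-class coding example can be written out as a short sketch. The decoding rule used here, nearest codeword in Hamming distance with ties resolved in favor of the lowest class index, is an assumption made for illustration; the text does not specify how output codes that match no codeword are resolved.

```python
def one_vs_rest_codes(n_classes):
    """Codeword of class k: +1 in position k and -1 elsewhere."""
    return [tuple(1 if j == k else -1 for j in range(n_classes))
            for k in range(n_classes)]


def max_binary_classifiers(n_classes):
    """Upper bound 2^(n-1) - 1 on the number of binary SVMs for n classes."""
    return 2 ** (n_classes - 1) - 1


def decode(outputs, codes):
    """Return the index of the codeword closest to the SVM outputs."""
    distances = [sum(o != c for o, c in zip(outputs, code)) for code in codes]
    return distances.index(min(distances))   # ties go to the lowest index


labels = ["X", "Y", "Z"]
codes = one_vs_rest_codes(len(labels))
# Outputs of the three binary SVMs for one pattern (made-up values).
predicted = labels[decode((-1, 1, -1), codes)]
```

For three classes the bound \(2^{n-1}-1\) equals 3, which matches the three one-against-the-rest classifiers of the example.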

3.2 The proposed classification method

For the SVM-based classification, the features are extracted first. Due to the non-stationary nature of the EEG data, prior to feature extraction, each EEG record is divided into 16 non-overlapping segments, each of which is assumed to be stationary. As there are 100 EEG records in each of the Sets A, B, C, D, and E, in total, \(500\times 16=8{,}000\) segments are generated. Next, each of the segments is subjected to a four-level DT-CWT decomposition, giving 10 sub-bands per segment. Subsequently, the NIG parameters \(\alpha \) and \(\delta \) are estimated from each sub-band using (4) and (5). For example, Set A has \(100 \times 16=1{,}600\) segments and \(1{,}600\times 10=16{,}000\) sub-bands; thus, 16,000 values each of \(\alpha \) and \(\delta \) are obtained for Set A. Next, training and testing are carried out using the extracted features. For a particular set, half of the segments, chosen randomly, are used for training and the remaining half for testing in an SVM classifier. For example, if the target is to discriminate the segments of Set A from those of Set E, then among the 1,600 segments of Set A, 800 segments are used for training and the remaining 800 for testing. The same split is applied to Set E.
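The segmentation and the random half split can be sketched as below. Two details are assumptions, since the text does not state them: the single sample left over after dividing 4,097 samples into 16 equal parts is dropped, and the random split uses a fixed seed for repeatability. The function names are hypothetical.

```python
import random


def split_record(record, n_segments=16):
    """Cut a record into n_segments equal, non-overlapping segments.

    Assumption: samples left over after the last full segment are dropped.
    """
    seg_len = len(record) // n_segments
    return [record[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


def random_half_split(items, seed=0):
    """Randomly assign half of the items to training and the rest to testing."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


record = list(range(4097))                  # stand-in for one 23.6-s EEG record
segments = split_record(record)             # 16 segments of 256 samples each

set_a_segment_ids = list(range(100 * 16))   # 1,600 segments of one set
train_ids, test_ids = random_half_split(set_a_segment_ids)
```

With 256 samples per segment at 173.61 Hz, each segment lasts about 1.475 s, in line with the segment duration quoted in the timing discussion.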

4 Results of the experiments

In this section, the performance of the proposed SVM-based classifiers is described and compared to that of the state-of-the-art methods for various classification cases, using the well-known figures of merit sensitivity (sen), specificity (spec), and accuracy (acc) [5]. For the five sets of EEG records described earlier, six different cases of classification are considered. The cases are chosen based on their clinical relevance and their use in various papers in the literature, in order to facilitate comparison, and are shown in Table 1.
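For a two-class case, the three figures of merit follow from the counts of true and false positives and negatives, using the standard definitions. The sketch below uses made-up counts purely for illustration; they are not results of this paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)              # fraction of positives detected
    specificity = tn / (tn + fp)              # fraction of negatives rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for 800 positive and 800 negative test segments.
sen, spec, acc = binary_metrics(tp=790, fn=10, tn=780, fp=20)
print(f"sen={sen:.4f} spec={spec:.4f} acc={acc:.4f}")
```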

Table 1 Classes considered for classification

Here, the healthy class includes the signals acquired from healthy people, the inter-ictal class includes the seizure-free epochs of the epilepsy patients, and the ictal class includes the seizure epochs. As for clinical relevance, Cases I and IV are related to the discrimination of healthy persons from epilepsy patients as well as to the occurrence of seizures. These cases are also relevant because, in some instances, inter-ictal epileptiform discharges are observed in healthy persons, whereas about 10 % of epilepsy patients never show such discharges. Cases II and III correspond to the detection of seizures and, in addition, may be related to the discrimination of surface EEGs from intra-cranial ones, since Sets A and B are acquired from surface electrodes and Sets C, D, and E from intra-cranial electrodes. Case V corresponds to the detection of the onset of a seizure, since the signals in Set D are obtained from the epileptogenic zone and are thus highly related to early-ictal activity. Similarly, Case VI is related to discriminating the ictal recordings from the inter-ictal ones. Overall, the cases are relevant to epilepsy diagnosis and seizure detection.

The performance of the proposed method is first studied using the features from a single DT-CWT sub-band. Table 2 shows the corresponding values of sensitivity, specificity, and accuracy obtained by using the features from various sub-bands. It is seen that the features obtained from the high-frequency sub-bands, such as y1, y2, or y3, provide better performance than those from the low-frequency ones, for example, y4 and z4. Next, the performance is studied for various cases using different combinations of features obtained from the y1, y2, and y3 sub-bands. The corresponding values of the sensitivity (sen), specificity (spec), and accuracy (acc) are provided in Table 3. It is seen that the performance of the proposed method improves significantly when features from two or more sub-bands are used. The best performance is achieved when features from the three sub-bands y1, y2, and y3 are used in conjunction, giving 100 % accuracy with 100 % sensitivity and 100 % specificity in all cases except Case I. Even in Case I, the accuracy is quite high, about 96 %. Here, 100 % accuracy is not achieved because some segments of classes (A, B) and (C, D) are mis-classified into class E. It is also seen that 100 % sensitivity is achieved for Set E, which indicates that the signals in Set E are accurately discriminated from those in Sets A, B, C, and D. Thus, the clinical effect of this classification error is much smaller than that of a mis-classification of the signals in Set E. It is also observed that the sensitivities for classes (A, B) and (C, D) are 95.84 and 94.84 %, respectively, which indicates that only a small number of segments are mis-classified for these classes. Note that the signals in Sets A and B are obtained from healthy persons, whereas those in Sets C and D are collected during inter-ictal periods; thus, the neurologists can discard the related false alarms.

Table 2 Classification performance for features from single-level sub-bands
Table 3 Classification performance for features from multi-level sub-bands

The performance of the proposed method is compared with that of several state-of-the-art algorithms in Table 4. For Case I, the proposed method gives an accuracy that is significantly higher than that of [5], higher than that of [17], and almost the same as that of [7]. However, the method of [7] uses 40 features, whereas the proposed method uses only 12. For the other cases, the proposed method provides 100 % accuracy with 100 % sensitivity and 100 % specificity. Note that the method of [18] uses the variance, calculated as the sample variance, as the feature in an SVM for classifying EEG signals. The sample variance is the maximum likelihood (ML) estimate of the variance of a zero-mean Gaussian pdf. In this respect, the results of [18] can be regarded as those obtained from an SVM classifier using Gaussian parameters. It is noted from Table 4 that, for classification schemes other than binary (two-way) ones, the use of NIG parameters yields better accuracy than the use of Gaussian parameters.

Table 4 Comparison of classification performance obtained for various algorithms

The proposed method is also computationally fast. It is implemented in MATLAB [39] on a desktop computer with an Intel Core 2 Duo 2.66 GHz processor and 2 GB of RAM. The time required to extract the necessary 12 features from a segment of 23.6/16 \(\approx \) 1.475 s is 0.003–0.005 s. In [7, 9], time–frequency features are extracted from a recording of 23.6 s, and the feature extraction time varies from 3.8 to 4.09 s. In [5], features are extracted from a recording of 1.475 s, but the feature extraction time is 0.2–0.6 s. For testing, that is, to identify the appropriate class of an EEG segment, the SVM-based classifiers typically require 0.05 s for Cases I and IV and 0.02 s for Cases II, III, V, and VI. Thus, for example, for a 24-h continuous EEG recording of an epilepsy patient (consisting of Set D and Set E type segments), the processing time of the proposed method for seizure detection can be expected to be around 24 min. The computational performance can be further improved by developing a standalone C/C++ program for the proposed method and implementing it on multiple cores in a parallel fashion.
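The 24-min estimate can be reproduced from the per-segment times quoted above. The sketch assumes that the total time is simply the number of 1.475-s segments in 24 h multiplied by the sum of the feature extraction time and the two-class testing time of 0.02 s; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SEGMENT_SECONDS = 23.6 / 16          # 1.475 s per segment


def processing_minutes(feature_time_s, classify_time_s,
                       recording_s=SECONDS_PER_DAY):
    """Estimated processing time in minutes for a continuous recording."""
    n_segments = recording_s / SEGMENT_SECONDS
    return n_segments * (feature_time_s + classify_time_s) / 60.0


fast = processing_minutes(0.003, 0.02)   # lower end of the extraction time
slow = processing_minutes(0.005, 0.02)   # upper end of the extraction time
print(f"{fast:.1f} to {slow:.1f} min")
```

Under these assumptions the estimate lies between roughly 22 and 24 min, consistent with the figure given in the text.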

5 Conclusion

In this paper, an SVM-based method has been proposed for automatic seizure detection and epilepsy diagnosis, using statistical NIG parameters computed in the DT-CWT domain as features. The suitability of an NIG pdf in modeling EEG signals in the DT-CWT domain has been demonstrated. The discrimination of EEG signals using the NIG parameters in the DT-CWT sub-bands has been discussed. SVM classifiers have been developed for binary as well as multi-way classification, the latter employing the ECOC approach. The performance of the SVM-based classification has been studied for a number of clinically relevant cases. It has been shown that the parameters obtained from the three high-frequency DT-CWT sub-bands yield 100 % sensitivity, specificity, and accuracy in all the cases except one, for which the accuracy is still quite high, about 96 %. Even in that case, the sensitivity for the ictal class has been found to be 100 %, indicating accurate detection of actual seizure events. In comparison with several state-of-the-art algorithms, the proposed method has been shown to provide better or almost the same accuracy in detecting seizures and epilepsy. The proposed method has also been shown to be computationally fast in terms of feature extraction and seizure detection. The overall performance and computational speed suggest that it can be useful for the automated analysis and monitoring of clinically used continuous EEG records. Since the proposed method uses the statistics of EEG signals in the time–frequency domain, similar performance is expected from the proposed classifiers for long-term EEG records. The authors are currently conducting a study using long-term EEG records.