1 Introduction

Epilepsy is a neurological disease; it is not contagious, not a mental illness, and not a developmental disability. A seizure is a brief disruption of the electrical activity in the human brain [1]. Epileptic seizures are attributed to abnormalities in the brain that make the patient prone to seizures, which are usually frequent and recurrent [2]. According to the most recent definition of the International League Against Epilepsy (ILAE), “Epilepsy is a disorder of the brain characterized by an enduring predisposition to generate epileptic seizures”. This definition requires the occurrence of at least one epileptic seizure. Seizures occur because of sudden, abnormal electrical activity in the brain, or excessive electrical discharges in a group of brain cells, and different parts of the brain can be the source of such discharges. When the nerve cells send distorted signals, patients experience distressing sensations and may act strangely, usually with a spasm or violent shaking of the muscles [3,4,5,6].

Research studies provided by the World Health Organization (WHO) show that approximately 50 million people suffer from epilepsy worldwide. The estimated proportion of the general population with active epilepsy, i.e., continuing seizures or the need for treatment at a given time, is between 4 and 10 per 1000 persons [7]. However, some studies in low- and middle-income countries report a much higher proportion, between 7 and 14 per 1000 persons. Most seizures last from 30 seconds to 2 minutes and do not cause lasting harm [8]. However, a medical emergency arises if a seizure lasts longer than 5 minutes or if a person has many seizures without waking up between them. Seizures may start at any age and may occur either sporadically at infrequent intervals or frequently.

EEG is the recording of the electrical activity along the scalp produced by the firing of neurons within the brain; it captures the brain's spontaneous electrical activity over a period of time. When brain cells (neurons) are activated, local current flows are produced. These electrical activities are detected using small, flat metal discs (electrodes) attached to the scalp. The brain cells send electrical impulses continuously, even during sleep [9].

Automatic seizure detection and prediction from EEG signals have received considerable research attention for a better understanding of epilepsy and more efficient management of the disease. Feature extraction is a key step in performing EEG signal classification for detection or prediction [10]. We envision a method in which classification is carried out without complex feature extraction, and the recent development of CNNs has provided a new way of addressing this issue. Original EEG signals converted to images using spectrogram estimation are directly used to train a CNN. We consider not only binary epilepsy scenarios, e.g., Non-Seizure (NS) versus Seizure (S) and NS versus Pre-Seizure (PS), but also verify the ability to classify NS versus S versus PS.

In this paper, two approaches for detecting and predicting epileptic seizures in EEG signals are introduced. The first one is based on feature extraction in the Hilbert Marginal Spectrum (HMS) domain. EEG signals are first decomposed into Intrinsic Mode Functions (IMFs). The instantaneous frequency spectrum of each of the obtained IMFs is then computed using the Hilbert transform. From the IMFs, main features such as spectral entropies, skewness, kurtosis, and sub-band energies are extracted and then used. SVM, LR, and KNN are the three classifiers utilized. The second approach is an efficient approach for activity detection from EEG signals that incorporates the generation of spectrograms of the EEG signals. A CNN is used for the classification of the spectrogram images. The proposed model depends on the use of residual learning and depth concatenation techniques. The main contributions of this work can be listed as follows:

  • Proposal of an approach that depends on the HMS domain and SVM, KNN, and LR classifiers for seizure detection and seizure prediction.

  • Proposal of an approach that generates spectrograms of EEG signals and performs various classification scenarios for seizure detection and prediction.

  • Proposal of a CNN model to be used for classification. The proposed model incorporates depth concatenation and residual learning strategies.

  • Study of the impact of different CNN hyper-parameters on the classification performance.

  • Measurement of the performance of the proposed models and comparison with different state-of-the-art models.

2 Related work

In recent years, there has been a growing interest in the utilization of ML and Deep Learning (DL) models for the classification of biomedical data. These models have been applied to a wide range of data types, including EEG signals, Magnetic Resonance Imaging (MRI) scans [11, 12], and electrocardiography (ECG) signals. Various ML and DL techniques, such as SVMs, KNNs, Random Forests (RFs), CNNs, and Recurrent Neural Networks (RNNs), have been used to classify biomedical data with high accuracy and precision. Additionally, several studies have used ensemble methods, such as boosting and bagging, to improve the classification performance. Overall, the use of ML and DL models for the classification of biomedical data is a promising area of research with the potential to revolutionize the field of healthcare [13, 14].

For epileptic seizure detection using EEG signals, many strategies have been developed. In [15], the researchers suggested an automated seizure detection approach in the Empirical Mode Decomposition (EMD) domain. Higher-order statistics, including variance, kurtosis, and skewness, are extracted and utilized as features, and an artificial neural network is used for the classification task. Bizopoulos et al. [16] presented HMS analysis in combination with k-means clustering to detect epileptic seizures. The authors of [17] employed a method for detecting seizures from EEG signals based on the HMS. Spectral entropies and sub-band energies are utilized as discriminative features, and an SVM is used for classification. Ibrahim et al. [18] introduced three models for the classification of EEG signals. Two of the models are patient-specific and designed for the classification of NS versus PS activities for seizure prediction, and of NS versus S activities for seizure detection. The third model is patient non-specific, making it better suited for general classification tasks. The first model utilizes a CNN with residual blocks, containing thirteen layers and four residual learning blocks, and works on spectrograms of EEG signal segments. The second model also depends on a CNN, with three layers, and works on spectrograms. The third model, in contrast, depends on Phase Space Reconstruction (PSR) to eliminate the limitations of the spectrograms used in the first two models. A five-layer CNN is used with this strategy.

Riaz et al. [19] presented a technique for extracting features from EEG signals using the EMD. It depends on temporal moments of the third order, as well as spectral features such as the spectral centroid, coefficient of variation, and spectral skewness of the IMFs. These features are physiologically meaningful, as they can differentiate normal EEG signals from pathological EEG signals in terms of temporal and spectral centroids, dispersions, and symmetries. The extracted features are then fed into an SVM classifier. For epileptic seizure detection, Hassan et al. [20] employed Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN). Once the EEG signal segments have been decomposed into CEEMDAN IMFs, these IMFs are modeled using the symmetric Normal Inverse Gaussian (NIG) Probability Density Function (PDF). The symmetric NIG PDF is a variance-mean mixture in which the inverse Gaussian density constitutes the mixing distribution. The scale and shape factors of the NIG PDF computed from each of the CEEMDAN IMFs are utilized as features in the epileptic seizure detection algorithm. Bouaziz et al. [21] segmented the EEG signals of the CHB-MIT dataset into 2-second frames and transformed them into a spatial representation by producing a set of intensity images. These images were fed to a CNN with a total of eight layers: one input layer, five hidden layers, one fully-connected layer, and an output layer. Their approach achieved an accuracy of 99.48%, reduced the dimensionality of the data, and then allowed Genetic Algorithm (GA) classification.

Rajaguru et al. [22] adopted Multilayer Auto-Encoders (MAEs) and Expectation-Maximization merged with Principal Component Analysis (EM-PCA). The performance index, represented by the classification accuracy, was 93.78%. Normal and abnormal brain activities were studied by Roy et al. [23], who proposed four different DL schemes. The ChronoNet model was developed based on these schemes, and it gave 90.60% and 86.57% training and testing accuracies, respectively. A multi-scale 3D-CNN with a bidirectional Gated Recurrent Unit (GRU) model was introduced by Choi et al. [24] for cross-patient seizure detection. Short-Time Fourier Transform (STFT) was used to extract spectral and temporal features from the EEG signals.

Any proposed seizure detection method should be able to distinguish between NS and S states. Shoeb [25] employed an SVM classifier to identify seizures. His approach achieved a False-Positive Rate (FPR) of 0.08/h and an average accuracy of 96%. Thodoroff et al. [26] presented a seizure detection approach based on a recurrent convolutional neural network and an image-based EEG signal representation. The results showed a sensitivity of 96% and an FPR of 0.08/h. In [27], the authors used a hybrid method to select IMFs extracted by EMD and Ensemble Empirical Mode Decomposition (EEMD). For classification, an SVM is utilized. By using the EMD and the EEMD, average accuracies of 94.56% and 96.06%, respectively, have been achieved. Truong et al. [28] used the Freiburg Hospital intracranial EEG (iEEG) dataset, the CHB-MIT dataset, and the American Epilepsy Society (AES) Seizure Prediction Challenge dataset. To construct spectrograms, the STFT algorithm has been applied to the raw data. This patient-oriented model gave sensitivity values of 81.4%, 81.2%, and 75% and FPR values of 0.06/h, 0.16/h, and 0.21/h for the pre-mentioned datasets, respectively.

For seizure prediction, the Hilbert-Huang Transform (HHT) and a Bayesian classifier were utilized in [29]. First, the signals are pre-processed for noise reduction. The HHT is then used to extract features. For feature selection, a Correlation-based Feature Selection (CFS) method was used. For classification, Bayesian networks were utilized. Finally, a post-processing technique was applied after classification to merge the individual probabilities obtained. This approach has a sensitivity of 96.55% and an FPR of 0.21/h. Consul et al. [30] introduced a hardware-oriented prediction algorithm in the Hilbert domain. After obtaining the instantaneous phase using the Hilbert transform, the Phase-Difference (PD) approach has been used. This approach achieved a prediction time ranging from 51 seconds to 188 minutes and a sensitivity of 88.2%.

Chu et al. [31] used an attractor state-analysis-based seizure predictive model. The accuracy of this model was 86.6%, with a false prediction rate of 0.367/h and an average prediction time of 45.3 minutes. Sedik et al. [32] used a statistical framework based on the use of various digital filters to predict seizures. A prediction time of 66.6 minutes, an accuracy of 96.2485%, and a false-alarm rate of 0.10526/h have been achieved. Emara et al. [33] presented an automatic seizure detection approach based on Scale-Invariant Feature Transform (SIFT) in the frequency domain as a feature extraction tool. This approach has been tested on the CHB-MIT dataset. An accuracy of 99.97% has been achieved. Emara et al. [34] proposed an approach for EEG seizure prediction and channel selection in the Hilbert domain. Signal attributes in the Hilbert domain, including amplitude, derivative, local mean, local variance, and median, are analyzed statistically to perform the channel selection and seizure prediction tasks. An average prediction rate of 96.46%, an average false-alarm rate of 0.028/h, and an average prediction time of 60.16 min for a 90-min prediction horizon have been reported.

Yoo et al. [35] proposed another time-domain detection technique, where the signal energy is computed during S and NS intervals on patient-specific data. They used SVM as a classifier with an accuracy of 84.4%. In addition, there are several seizure detection methods based on frequency domain processing. Rana et al. [36] proposed a technique based on multi-channel Electro-Cortico-Gram (ECoG) and the phase-slope index. Another technique was introduced depending on frequency-moment signatures to detect patient-specific seizures with a sensitivity of 91% [37]. Furthermore, several methods have been proposed based on dividing the EEG signals into time intervals, and then applying a suitable transform. Gabor transform, Fourier transform, and many other transforms have been used to obtain suitable seizure features. Zhou et al. [38] proposed another technique using lacunarity and Bayesian linear discriminant analysis for seizure detection with a sensitivity of 96.25%. Liu et al. [39] proposed a wavelet-transform-based technique, using SVM in long-term iEEG with a sensitivity of 94.46%, and a specificity of 95.26%.

The CNN-based classification methods achieve remarkable results on various datasets. To study these methods, several aspects should be taken into consideration, including the number and the architecture of CNNs, the dataset used for training, the type of loss function and the incorporated learning strategies. Vidyaratne et al. [40] proposed cellular neural networks and bi-directional recurrent neural networks to extract temporal features for seizure analysis. Shoeb et al. [41] presented a patient-specific ML technique based on the CHB-MIT dataset. They extracted spectral and spatial features and then combined non-EEG features to form feature vectors. Their approach detected 96% of 173 test seizures in an event-based assessment. Pramod et al. [42] and Turner et al. [43] used deep belief networks applied to multi-channel EEG data for seizure detection. Moreover, Kashif et al. [44] designed a hybrid Local Binary Pattern (LBP) wavelet-based approach to classify EEG signals for epilepsy patients. LBP is used to transform the EEG signal into a new signal, and then the Discrete Wavelet Transform (DWT) is employed to decompose the obtained signal. Linear Discriminant Analysis (LDA) classifier has been used for the classification process. Experiments were carried out on 105 seizures for 14 randomly-selected subjects of the CHB-MIT dataset.

Lorena et al. [45] developed a patient non-specific strategy for seizure detection based on the Stationary Wavelet Transform of EEG signals. Their approach was tested on scalp EEG records of 24-48 h for 18 epilepsy patients. Safi et al. [46] proposed a framework based on Convolutional Denoising Auto-encoder (CDA) for multivariate time series imputation. In addition, they performed a pre-processing step to encode time series data into 2D images using Gramian Angular Summation Field (GASF). Wang et al. [47] designed an approach to convert time series data into novel representations using Gramian Angular Field (GAF) and Markov Transition Field (MTF) images. Barra et al. [48] presented a method for forecasting certain patterns by applying DL technologies and encoding time series on GAF images. In [49], epileptic seizures have been classified based on the reinforcement learning technique. In this technique, Hilbert-Huang transform is used to extract 19 time-frequency domain features. Its classification accuracy reached 96.79%. In [50], seizure classification has been performed depending on brain activities. In addition, wavelet transform is used for EEG signal decomposition. Classification accuracy reached 89.60% using cubic SVM classifier and 87.00% using weighted KNN classifier.

Table 1 Summary of the utilized EEG dataset

3 Materials and methods

3.1 CHB-MIT dataset

The experiments have been carried out on large datasets to ensure generality. The CHB-MIT dataset [51] is a publicly-available dataset from physionet.org that contains 686 scalp EEG (sEEG) records acquired from 24 patients treated at Boston Children's Hospital. Only 198 of the 686 records contain seizures. The international 10-20 standard EEG electrode placement and labeling system was employed for dataset acquisition. However, 17 of the seizure files exhibited distinct channel montages. As a result, these 17 records have been eliminated from this study, leaving 181 seizure files. Table 1 provides detailed information about the dataset used to evaluate the proposed approach.
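
As an illustration, a single record of this dataset can be loaded as sketched below. This is a minimal sketch assuming the MNE-Python package, which is not necessarily the tool used in this work; the record name is only an example following the dataset's naming convention.

```python
# Minimal sketch of loading one CHB-MIT record downloaded from physionet.org.
# MNE-Python is assumed; "chb01_03.edf" is only an illustrative file name.
import mne

raw = mne.io.read_raw_edf("chb01_03.edf", preload=True, verbose=False)
fs = int(raw.info["sfreq"])       # CHB-MIT records are sampled at 256 Hz
signals = raw.get_data()          # array of shape (n_channels, n_samples)
print(raw.ch_names, fs, signals.shape)
```
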

Fig. 1 Block diagram of the proposed epileptic seizure detection approach using HMS with different classifiers

3.2 ML-based approach

Figure 1 depicts the main architecture of the proposed approach for seizure detection and prediction. The EEG signal analysis involves segmentation, EMD, HHT, and an ML classifier to detect epileptic seizures. The first step is the segmentation process, which divides the EEG recording into small segments of a specific length. A window refers to the specific time frame or duration used to divide the EEG recording into smaller segments. The window size can have a significant impact on the analysis and interpretation of the EEG signals. A smaller window size provides a higher temporal resolution and allows for the detection of short-lived events, such as seizures. However, smaller window sizes can also increase the influence of noise, making it more difficult to detect seizures. On the other hand, larger window sizes provide a lower temporal resolution but reduce the influence of noise, making it easier to detect seizures.

The segmentation process is followed by the application of the HHT to the segments. The HHT is a signal processing technique that decomposes a non-linear and non-stationary signal into its IMFs. It is based on the EMD and the Hilbert transform. The EMD decomposes the signal into a set of IMFs, each representing a different intrinsic oscillatory mode present in the signal [52,53,54]. The Hilbert transform is then applied to each IMF to obtain the corresponding instantaneous frequency. By applying the HHT to the segments, the proposed approach provides a detailed analysis of the EEG signal and extracts relevant features that can be used to detect seizures. The HHT allows for a time-frequency analysis of the EEG segments, providing further information about the signal dynamics, and the IMFs obtained from the EMD allow isolation of the different intrinsic oscillatory modes present in the signal.

After that, spectral entropies, sub-band energies, and higher-order statistics are used as features to classify the EEG segments as S or NS. These features are extracted from the segments after applying the HHT and are used to train ML models. Spectral entropies, such as the Shannon, Tsallis, and Renyi entropies, are calculated from the power spectra of the signals. They can be used to distinguish S segments from NS segments by comparing the entropy of the signal during S and NS states and identifying any significant differences in complexity. Sub-band energies represent the energy present in different frequency bands of the signal and provide information on the distribution of energy across these bands. This strategy can be used to detect any changes in energy distribution that may be indicative of seizures. Higher-order statistics are statistical features that capture the characteristics of a signal beyond the traditional second-order features such as power and energy. Examples of higher-order statistics include kurtosis and skewness. These features provide additional information about the signal distribution that can be used to detect seizures. Overall, these features are chosen to reflect the characteristics of the EEG signal that help to distinguish S from NS or PS segments. The features are then combined to provide a comprehensive analysis of the EEG signal and fed into ML models, such as the SVM, to classify the segments as S, NS, or PS.
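
The segmentation step can be sketched as follows, assuming non-overlapping windows. The sampling rate of 256 Hz corresponds to the CHB-MIT recordings, while the 4-second window length is only an illustrative choice.

```python
import numpy as np

def segment_eeg(signal, fs, window_sec):
    """Split a single-channel EEG signal into non-overlapping windows.

    signal     : 1-D array of samples
    fs         : sampling rate in Hz (256 Hz for CHB-MIT)
    window_sec : window length in seconds (illustrative value)
    """
    win = int(fs * window_sec)
    n_windows = len(signal) // win          # drop the incomplete tail
    return signal[: n_windows * win].reshape(n_windows, win)

# Example: 4-second windows from a 10-minute channel sampled at 256 Hz
x = np.random.randn(10 * 60 * 256)
segments = segment_eeg(x, fs=256, window_sec=4)
print(segments.shape)                       # (150, 1024)
```
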
ML models are used to classify the EEG segments as S or NS based on the features extracted from the segments. The proposed approach depends on the use of three different types of ML models: SVM, KNN, and LR. SVM is a supervised learning algorithm that creates a hyperplane, or a set of hyperplanes, in a high-dimensional space to separate the different classes. SVM is known for its ability to handle high-dimensional and non-linearly separable data, which makes it a good candidate for EEG signal analysis. KNN is a non-parametric algorithm that assigns a class label to a new data point based on the majority class among its k nearest neighbors. KNN is simple and easy to implement and is known for its good performance on small datasets. LR is a supervised learning algorithm that models the relationship between a dependent variable and one or more independent variables by fitting a logistic function that estimates class probabilities. LR is simple and easy to implement and can be used to model the probability of an event occurring. These three algorithms have been selected because they are known to be good classifiers and have been widely used in EEG signal analysis. The tuning parameters of these models are presented in Table 2. The performance of the models has been evaluated using metrics such as accuracy, sensitivity, and specificity, and the models have been fine-tuned accordingly. The combination of these ML models with the proposed feature extraction method, which yields spectral entropies, sub-band energies, and higher-order statistics, provides a comprehensive analysis of the EEG signal and increases the chances of detecting seizures with high accuracy.
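
A minimal sketch of this classification stage with scikit-learn is given below. The hyper-parameter values shown are placeholders; the actual tuning parameters used in this work are those listed in Table 2.

```python
# Sketch of the classification stage with scikit-learn; hyper-parameter
# values are placeholders (see Table 2 for the values used in this work).
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

classifiers = {
    "SVM": SVC(kernel="rbf", C=1.0, gamma="scale"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "LR":  LogisticRegression(max_iter=1000),
}

# X: (n_segments, n_features) feature matrix, y: labels (0 = NS, 1 = S)
def evaluate(X, y, folds=5):
    for name, clf in classifiers.items():
        acc = cross_val_score(clf, X, y, cv=folds, scoring="accuracy")
        print(f"{name}: mean accuracy = {acc.mean():.3f}")
```
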

Table 2 Tuning parameters definitions and their values for SVM, KNN, and LR classifiers in the proposed approach for seizure detection and prediction

3.2.1 The Hilbert-Huang Transform (HHT) and its spectrum

The Hilbert transform is applied to each IMF component once the IMFs have been computed using the EMD [51]:

$$\begin{aligned} H[q_{i}(t)]=\frac{1}{\pi }{\int }_{-\infty }^{ +\infty }\frac{q_{i}(\tau )}{t-\tau }d\tau \end{aligned}$$
(1)

where \(q_{i}(t)\) and \(H[q_{i}(t)]\) form a complex conjugate pair that specifies the analytic signal \(Z_{i}(t)\):

$$\begin{aligned} Z_{i}(t)=q_{i}(t )+jH[q_{i}(t)] \end{aligned}$$
(2)

It can be represented as:

$$\begin{aligned} Z_{i}(t)=a_{i}(t )exp(j\theta _{i}(t)) \end{aligned}$$
(3)

where \(a_{i}(t)\) is the amplitude and \(\theta _{i}(t)\) represents the phase.

$$\begin{aligned} a_{i}(t)= & {} \sqrt{q_{i}^{2}(t )+H^{2}[q _{i}(t)]}\end{aligned}$$
(4)
$$\begin{aligned} \theta _{i}(t)= & {} arctan\left( \frac{H[q _{i}(t)]}{q_{i}(t )} \right) \end{aligned}$$
(5)

Thus, the instantaneous frequency \(\omega _{i}(t)\) can be defined as:

$$\begin{aligned} \omega _{i}(t)=\frac{d\theta _{i}(t)}{dt} \end{aligned}$$
(6)

Therefore, the original data can be defined as follows:

$$\begin{aligned} E(t)=Re\sum _{i=1}^{L}a_{i}(t)exp(j{\int }\omega _{i}(t)dt) \end{aligned}$$
(7)

where the residue \(u_{l}(t)\) has been discarded. Hilbert-Huang spectrum represents the instantaneous amplitude and the instantaneous frequency in a three-dimensional plot, where the amplitude represents the height in the time-frequency plane.

$$\begin{aligned} H(\omega ,t)=Re\sum _{i=1}^{L}a_{i}(t)exp(j{\int }\omega _{i}(t)dt) \end{aligned}$$
(8)

Finally, the marginal spectrum \(h(\omega )\) can be expressed as follows:

$$\begin{aligned} h(\omega )={\int }_{0}^{T}H(\omega ,t)dt \end{aligned}$$
(9)

The marginal spectrum gives a measure of the total energy contribution from each frequency value. Thus, the local marginal spectrum of each IMF component is defined as:

$$\begin{aligned} h_{i}(\omega )={\int }_{0}^{T}H_{i}(\omega ,t)dt \end{aligned}$$
(10)

The local marginal spectrum \(h_{i}(\omega )\) represents the total amplitude contribution of the \(i^{th}\) IMF as a function of the frequency \(\omega \) of interest.
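
A minimal computational sketch of Eqs. (1)-(10) is given below, assuming the PyEMD (EMD-signal) package for the decomposition and SciPy for the Hilbert transform; the frequency binning of the marginal spectrum is an illustrative choice.

```python
# Sketch of the HHT steps in Eqs. (1)-(10), assuming PyEMD and SciPy.
import numpy as np
from PyEMD import EMD
from scipy.signal import hilbert

def hilbert_marginal_spectrum(x, fs, n_bins=128):
    imfs = EMD().emd(x)                              # decomposition into q_i(t)
    f_axis = np.linspace(0.0, fs / 2.0, n_bins)
    h = np.zeros((len(imfs), n_bins))                # local marginal spectra h_i, Eq. (10)
    for i, imf in enumerate(imfs):
        z = hilbert(imf)                             # analytic signal, Eq. (2)
        a = np.abs(z)                                # instantaneous amplitude, Eq. (4)
        theta = np.unwrap(np.angle(z))               # instantaneous phase, Eq. (5)
        w = np.diff(theta) * fs / (2.0 * np.pi)      # instantaneous frequency (Hz), Eq. (6)
        idx = np.clip(np.digitize(w, f_axis) - 1, 0, n_bins - 1)
        np.add.at(h[i], idx, a[:-1])                 # accumulate amplitude over time, Eq. (10)
    return f_axis, h, h.sum(axis=0)                  # per-IMF and total marginal spectrum, Eq. (9)
```
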

3.2.2 Feature extraction

The distinctive attributes of a signal are represented by features. The next step is to identify discriminating features of the EEG signals for the different classes. Several attributes, namely Renyi entropy, Tsallis entropy, Shannon entropy, sub-band energies, skewness, and kurtosis, are extracted and used as the main features; a minimal computational sketch of these features is given after the list below.

  1. Spectral entropies: Entropy is an indication of disorder in physical systems. It is related to the amount of information obtained by observations of disordered systems. Spectral entropy depends on the PDF of spectral probabilities. A flat probability distribution corresponds to high entropy, whereas a peaked probability distribution corresponds to low entropy [56]. Fourier-spectrum-based entropies have contributed greatly to the success of EEG seizure detection [56, 57]. Entropy is a statistical measure of the variability within the EEG signal. HMS-based entropy is exploited here because of its superior performance in non-stationary signal analysis. Three different statistical entropies are employed and discussed. To estimate the entropy, the spectrum should be normalized to obtain the probability mass function.

    $$\begin{aligned} p_{i}=\frac{P_{i}}{\sum _{i=1}^{n}P_{i}} \end{aligned}$$
    (11)

    where \(P_{i}\) represents the energy content corresponding to the frequency component i and \(p_{i}\) represents the probability mass function of the spectrum. Then, the Shannon entropy is expressed as follows [58]:

    $$\begin{aligned} SEN = -\sum _{i=1}^{n}p_{i}log p_{i} \end{aligned}$$
    (12)

    where the probabilities \( p_{i} \) satisfy:

    $$\begin{aligned} \sum _{i=1}^{n}p_{i} = 1 \end{aligned}$$
    (13)

    The Renyi entropy is expressed as follows [59]:

    $$\begin{aligned} REN_{\alpha }= \frac{1}{1-\alpha }log\sum _{i=1}^{n}p_{i}^{\alpha } \end{aligned}$$
    (14)

    The Tsallis entropy is expressed as follows [60]:

    $$\begin{aligned} TEN_{\alpha }= \frac{1}{\alpha -1}(1-\sum _{i=1}^{n}p_{i}^{\alpha }) \end{aligned}$$
    (15)

    where \(\alpha \) is a tuning factor to generate a profile that is less sensitive to the shape of probability distributions. In this paper, \(\alpha \) is set to 2 for both Renyi and Tsallis entropies.

  2. Sub-band energies: Sub-bands are obtained with digital filters to extract features from each one. The used sub-bands are delta: 0-4 Hz, theta: 4-8 Hz, alpha: 8-12 Hz, beta: 12-30 Hz, and gamma: 30-50 Hz [61]. The energy distribution between S and NS segments is quite different [62]. For a normal EEG segment, most of the energy is contained in the delta band, while during a seizure this band contains only a small proportion of the total energy. Therefore, sub-band energy is effective for EEG seizure detection. The sub-band energies in the HMS domain are expressed as follows [63]:

    $$\begin{aligned} e_{i}=\sum _{f=0}^{k-1}h^{2}_{i}(f) \end{aligned}$$
    (16)

    where k represents the total number of frequency bins and \(h_{i}(f)\) denotes the marginal spectrum restricted to the \(i^{th}\) sub-band.

  3. Higher-order statistics: The distribution of the samples of an EEG signal is characterized by its level of dispersion and asymmetry. Hence, skewness and kurtosis are utilized as main features for the EEG seizure detection problem. For an N-point sequence \(X=x_1,x_2,..., x_N\), the corresponding skewness \(\beta _1\) and kurtosis \(\beta _2\) are calculated as follows [64]:

    $$\begin{aligned} \beta _{1}= & {} \frac{1}{N}\sum _{i=1}^{N}(\frac{x_{i}-\mu }{\sigma })^{3}\end{aligned}$$
    (17)
    $$\begin{aligned} \beta _{2}= & {} \frac{1}{N}\sum _{i=1}^{N}(\frac{x_{i}-\mu }{\sigma })^{4} \end{aligned}$$
    (18)

    where \(\mu \) represents the sample mean of the sequence and \(\sigma \) denotes its Standard Deviation (SD). The skewness and kurtosis are computed from the second-order, third-order, and fourth-order moments.
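
A minimal sketch of the feature computations in Eqs. (11)-(18) is given below; it operates on a marginal spectrum h defined over a frequency axis f (e.g., the output of the HHT sketch in Section 3.2.1) and on the time-domain segment itself.

```python
# Sketch of Eqs. (11)-(18): spectral entropies, sub-band energies, and
# higher-order statistics, computed from a marginal spectrum (f, h) and
# from the raw time-domain segment x.
import numpy as np
from scipy.stats import skew, kurtosis

BANDS = {"delta": (0, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 50)}                 # Hz

def spectral_features(f, h, alpha=2.0):
    p = h / np.sum(h)                                         # Eq. (11)
    p = p[p > 0]                                              # avoid log(0)
    shannon = -np.sum(p * np.log(p))                          # Eq. (12)
    renyi = np.log(np.sum(p ** alpha)) / (1.0 - alpha)        # Eq. (14)
    tsallis = (1.0 - np.sum(p ** alpha)) / (alpha - 1.0)      # Eq. (15)
    energies = {name: np.sum(h[(f >= lo) & (f < hi)] ** 2)    # Eq. (16)
                for name, (lo, hi) in BANDS.items()}
    return shannon, renyi, tsallis, energies

def time_features(x):
    return skew(x), kurtosis(x, fisher=False)                 # Eqs. (17)-(18)
```
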

Fig. 2 Comparison of HHT spectral analysis of Seizure (S) and Non-Seizure (NS) EEG signals for the frequency band of 0-90 Hz

3.2.3 Performance metrics

The performance of the proposed approach is evaluated using standard metrics such as sensitivity, specificity, and accuracy [65, 66]. \(T_p\) is the total number of correctly detected seizure events, whereas \(T_n\) denotes the total number of correctly detected normal events. The variables \(F_p\) (false positives) and \(F_n\) (false negatives) denote the numbers of wrongly classified seizure and normal events, respectively.

$$\begin{aligned} Sensitivity= & {} \frac{T_p}{T_p+F_n} \times 100\%\end{aligned}$$
(19)
$$\begin{aligned} Specificity= & {} \frac{T_n}{T_n+F_p} \times 100\%\end{aligned}$$
(20)
$$\begin{aligned} Accuracy= & {} \frac{T_p+T_n}{T_p+T_n+F_p+F_n} \times 100\% \end{aligned}$$
(21)
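
These metrics follow directly from the confusion-matrix counts, as in the short sketch below.

```python
# Direct implementation of Eqs. (19)-(21) from confusion-matrix counts.
def seizure_metrics(tp, tn, fp, fn):
    sensitivity = 100.0 * tp / (tp + fn)                  # Eq. (19)
    specificity = 100.0 * tn / (tn + fp)                  # Eq. (20)
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)    # Eq. (21)
    return sensitivity, specificity, accuracy
```
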
Fig. 3 Entropy distribution of Shannon (a), Renyi (b), and Tsallis (c) of Seizure (S) and Non-Seizure (NS) EEG signals for the CHB-MIT dataset

Table 3 Mean values of entropies of S, NS and PS activities (SD shown in parentheses) for the CHB-MIT dataset
Table 4 Mean values of skewness and kurtosis of S, NS and PS activities (SD shown in parentheses) for the CHB-MIT dataset

3.2.4 Results and discussion

The performance of the proposed approach is evaluated on the CHB-MIT dataset [51]. EMD processing is performed, and then the HHT is applied to the EEG signals. After that, the HMS is calculated. Figure 2 shows the HMS for S and NS EEG signals. It is clear that the magnitude of the HMS is different for S and NS activities.

Figure 3 illustrates the distribution of the spectral entropies for S and NS activities of multi-channel EEG signals. The means and SDs of the features collected for the different activities of the EEG signals are shown in Tables 3 and 4. The mean values for the S, NS, and PS activities are all distinct. Except for the kurtosis, all features have a small SD. Furthermore, in comparison with the other features, the mean values of kurtosis for the S, NS, and PS activities are relatively large. The kurtosis decreases progressively as the IMF level increases (Table 5).

Table 5 Mean values of sub-band energies of S, NS and PS activities (SD shown in parentheses) for the CHB-MIT dataset
Table 6 Classification Performance using SVM, KNN, and LR classifiers for CHB-MIT dataset

Table 6 presents the results obtained with the SVM, KNN, and LR classifiers. An accuracy of 100%, a sensitivity of 100%, and a specificity of 100% are obtained with the SVM, KNN, and LR classifiers. This also holds for the last case, the discrimination of pre-seizure from seizure activities, which is especially important for the on-line detection of seizure occurrence in an epilepsy patient. Thus, these results may be valuable for implantable devices such as the cranially implanted Responsive Neurostimulator (RNS) [67]. The proposed approach holds promise for such devices, since it can detect seizures accurately, as evidenced by its ability to discriminate the pre-seizure from the seizure class with 100% accuracy (Fig. 4).

3.3 CNN-based approach

The proposed approach for seizure detection and prediction is described in Fig. 5. It adopts a spectrogram estimation process to transform the EEG signals into an image-like format. The spectrogram of an EEG signal is an estimate of the time evolution of the EEG frequency content. After image acquisition, a CNN is used to extract deep features from the spectrogram images. The CNN takes an input image and assigns learnable weights and biases to the different objects in that image. Convolutional, pooling, and depth concatenation layers are used for the feature extraction process. Finally, the extracted deep features are used for classification to obtain the detection and prediction results.

3.3.1 Spectrogram estimation

A spectrogram shows how the frequency content of a signal changes with time. The spectrogram graph shows the energy content of a signal expressed as a function of time and frequency; the produced graph uses amplitude-dependent colors with time and frequency on the horizontal and vertical axes. The first step in calculating the spectrogram is the segmentation of the EEG signal into equal-length windows. The window size should account for the non-stationary nature of the EEG signal. The idea here is that the spectral properties of a non-stationary EEG signal can be displayed through a series of spectral snapshots. As a non-stationary signal, the EEG signal has frequencies that change with time. Choosing the segment length is the most important step in the spectrogram estimation, because it determines and fixes the frequency resolution. The segment length (time resolution) must be short enough for the signal to be considered approximately stationary within each segment. In this work, we use a window of 1 second with no overlap, giving a temporal resolution of 1 second. The next step is the computation of the spectrum of each segment to obtain the short-time Fourier transform. Finally, the power of each spectrum is displayed segment by segment; these spectra are laid side by side to form the image, and a magnitude-dependent color map is produced. Figure 6 shows a number of spectrogram images for the NS, PS, and S cases. These images belong to patients 1, 2, and 3.
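
A minimal sketch of this spectrogram estimation step is given below, assuming SciPy and Matplotlib; the rendering details (color map, figure size) are illustrative only.

```python
# Sketch of the spectrogram estimation: 1-second non-overlapping windows,
# rendered as a color image for the CNN. SciPy and Matplotlib are assumed.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

def eeg_to_spectrogram_image(segment, fs, out_path):
    f, t, Sxx = spectrogram(segment, fs=fs, nperseg=fs, noverlap=0)
    fig, ax = plt.subplots(figsize=(2.27, 2.27), dpi=100)   # roughly 227 x 227 pixels
    ax.pcolormesh(t, f, 10 * np.log10(Sxx + 1e-12), shading="auto")
    ax.axis("off")
    fig.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)
```
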

Fig. 4 Box plots of the sub-band energy features for S and NS activities

Fig. 5 Block diagram of the proposed CNN-based approach

Fig. 6 Various spectrogram images for patients 1, 2, and 3: (a) NS case, (b) PS case, and (c) S case

3.3.2 Convolutional neural network (CNN)

The CNN architecture is inspired by the organization of the visual cortex, and it resembles the connectivity pattern of neurons in the human brain. Individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field, and such fields overlap to cover the entire visual area. The CNN depends on relevant filters to extract the temporal and spatial dependencies in an image. In this work, we propose a CNN model in which the residual learning and depth concatenation strategies are adopted, as illustrated in Fig. 7. The size of the input layer is \( 227 \times 227 \times 3 \). The proposed model contains thirteen convolutional layers, each of which produces an output feature map \( f_{x,y,k}^{c,l} \) for a particular layer l and an input \(f_{x,y}^{O_p,l-1} \) [18]:

$$\begin{aligned} f_{x,y,k}^{c,l}={W_{k}^{l}}^Tf_{x,y}^{O_p,l-1}+b_{k}^{l} \end{aligned}$$
(22)
Fig. 7 The architecture of the proposed CNN model

where \(W_{k}^{l}\) are the shared weights, \(b_{k}^{l}\) is the bias, and c denotes convolution. \(O_p\) represents the input image for \( l = 1 \), while it represents a convolution, pooling, or activation output for \( l > 1 \). Furthermore, we use different kernels for each convolutional layer. For example, layers (1, 6, 7, 12, and 13) have 192 kernels, layers (3, 5, 9, and 11) have 128 kernels, and layers (2, 4, 8, and 10) have 64 kernels. All layers have kernels of size \( 3\times 3 \), except layers 2 and 8, which have kernels of size \( 1\times 1 \). Each convolutional layer is followed by a Rectified Linear Unit (ReLU) activation function, which sets negative activations to zero and introduces non-linearity into the network.

In addition, a residual learning strategy is applied. This strategy eases the optimization of deep CNNs by letting each block learn a residual mapping with respect to its input. The output of a residual block R can be expressed as:

$$\begin{aligned} f_{x,y,k}^{R,l}=f_{x,y}^{O_p,l-q}+F(f_{x,y}^{O_p,l-q},W_k ) \end{aligned}$$
(23)

where \( f_{x,y}^{O_p,l-q} \) is the input feature map, F(.) is the residual mapping to be learned, and q is the number of stacked layers in the block. The proposed model includes four residual learning blocks. Each block includes a depth concatenation layer that increases the depth of the feature map by concatenating the feature maps generated by various filter sizes. The \( 1^{st} \) depth layer concatenates the outputs of the \( 3^{rd} \) and \( 5^{th} \) convolutional layers. The \( 2^{nd} \) depth layer concatenates the outputs of the \( 1^{st} \) and \( 6^{th} \) convolutional layers. The \( 3^{rd} \) depth layer concatenates the outputs of the \( 9^{th} \) and \( 11^{th} \) convolutional layers. Finally, the \( 4^{th} \) depth layer concatenates the outputs of the \( 7^{th} \) and \( 12^{th} \) convolutional layers. By adding the concatenation layers, the number of units at each stage can be increased without an uncontrolled blow-up in the computational complexity at later stages. This efficient use of computational resources allows increasing the number of stages and the width of each stage of the CNN. Moreover, maximum pooling, fully-connected, and softmax layers are included in the proposed model. The maximum pooling layer computes the maximum value in a local spatial neighborhood, thereby reducing the spatial resolution. The fully-connected and softmax layers are used for classification and computing the loss, respectively. Table 7 provides the number and size of the kernels for each convolutional layer.
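
A simplified sketch of how a residual shortcut (Eq. (23)) can be combined with a depth concatenation layer is given below, using the Keras functional API. It is not the full thirteen-layer architecture of Fig. 7 and Table 7, and the filter counts are illustrative only.

```python
# Simplified Keras sketch of one residual/depth-concatenation block in the
# spirit of Eq. (23) and Fig. 7; not the full model of Table 7.
from tensorflow.keras import layers, Model

def residual_concat_block(x, filters):
    # Two parallel branches whose feature maps are depth-concatenated
    b1 = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    cat = layers.Concatenate()([b1, b2])            # depth concatenation
    # Project back to the input depth so the shortcut addition is valid
    proj = layers.Conv2D(x.shape[-1], 1, padding="same")(cat)
    return layers.ReLU()(layers.Add()([x, proj]))   # residual shortcut, Eq. (23)

inputs = layers.Input(shape=(227, 227, 3))
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = residual_concat_block(x, 64)
x = layers.MaxPooling2D()(x)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(3, activation="softmax")(x)  # NS / PS / S
model = Model(inputs, outputs)
```
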

3.3.3 Experimental results

Experiments are carried out on the signals of a group of patients from the CHB-MIT dataset. The used dataset is divided into three classes, namely NS, PS, and S. The performance of the proposed approach is measured in terms of accuracy, specificity, precision, sensitivity, and F-score. Moreover, the performance of the proposed approach is compared with those of pre-trained CNN models such as VGG19, ResNet101, and Inceptionv3.

Our target is to reach the optimal performance of the proposed CNNs. To achieve that, we have to select the Optimization Algorithm (OA) to be used and adjust the values of various hyperparameters such as weight decay, momentum value, mini-batch size, maximum epochs, and learning rate.

  • The OAs are mainly used to reduce the losses by changing weights and learning rate. Here, three optimization algorithms are selected, namely Adaptive Moment estimation (Adam), Root Mean Square propagation (RMSprop), and Stochastic Gradient Descent with Momentum (SGDM).

    Table 7 The number and size of kernels for each convolutional layer of the proposed CNN model
    Table 8 Performance of the proposed CNN for different optimization algorithms and learning rates
  • Weight decay is a DL technique that adds a penalty term to the cost function to shrink the weights during back-propagation. Typical values lie between 0 and 0.1. Here, the weight decay is set to \( 5 \times 10^{-4}\).

  • Momentum is a gradient-descent extension used to damp the oscillations of the cost across flat spots and noisy gradients of the search space. The momentum value is set to 0.9.

  • Mini-batch size is the number of training samples used for each weight update. An epoch consists of one forward pass and one back-propagation pass over all the training samples. The mini-batch size is set to 32. Also, we set the maximum number of epochs to 5.

  • Learning rate is a parameter that controls how much the weights of the CNN model are adjusted in response to the estimated error at each update. We select three values of the learning rate, namely 0.1, 0.01, and 0.001.
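
The sketch below shows how these hyper-parameter choices could be wired together in Keras; the weight decay of \( 5 \times 10^{-4}\) is approximated here by an L2 kernel regularizer, which is a common substitute in Keras.

```python
# Sketch of the training configuration with the hyper-parameter values
# listed above (momentum 0.9, mini-batch size 32, 5 epochs, learning rates
# 0.1/0.01/0.001). Weight decay is approximated with an L2 regularizer.
from tensorflow.keras import optimizers, regularizers

weight_decay = regularizers.l2(5e-4)          # attach to Conv2D/Dense layers
optimizer_choices = {
    "SGDM":    lambda lr: optimizers.SGD(learning_rate=lr, momentum=0.9),
    "Adam":    lambda lr: optimizers.Adam(learning_rate=lr),
    "RMSprop": lambda lr: optimizers.RMSprop(learning_rate=lr),
}

def train(model, x_train, y_train, x_val, y_val, opt_name="SGDM", lr=0.001):
    model.compile(optimizer=optimizer_choices[opt_name](lr),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model.fit(x_train, y_train, validation_data=(x_val, y_val),
                     batch_size=32, epochs=5)
```
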

Table 8 shows the performance of the proposed CNN. We consider the data of patient 1 and the classification scenario of NS versus PS. As mentioned before, we have three classes for each patient. All NS images of all patients are grouped together under the same NS class, and the same applies to the PS and S images. We have 10,000 images for each class. These images are divided into 70% for training and 30% for testing. Three classification scenarios are performed. In the \( 1^{st} \) scenario, classification is performed between the NS and PS classes. The \( 2^{nd} \) scenario provides the classification results between the NS and S classes. Finally, the \( 3^{rd} \) scenario differentiates between the NS, PS, and S classes, as shown in Table 9. Figures 8, 9 and 10 introduce the ROC and precision-recall curves for the PS versus S, S versus NS, and PS versus S versus NS cases, respectively.

The results demonstrate the effectiveness of using CNN models for EEG signal classification and seizure detection. The proposed CNN model achieves an accuracy of 97.66% for NS versus PS, 95.59% for NS versus S, and 94.51% for the NS versus S versus PS case. This accuracy is significantly higher than those of the other models used in this study, such as VGG19, ResNet101, and Inceptionv3. The high accuracy of the proposed CNN model is attributed to its ability to extract features from the EEG signals that are more relevant to the task of seizure detection. The model uses a combination of residual learning, convolutional layers, and pooling layers to extract features from the EEG spectrograms, which allows it to learn complex patterns in the data. Additionally, the large number of parameters of the CNN model allows it to capture more information from the EEG signals, which in turn leads to better performance. In terms of sensitivity, precision, and F-score, the proposed CNN model also outperforms the other models. The sensitivity, precision, and F-score of the proposed CNN model are 95.79%, 94.86%, and 95.32%, respectively, for NS versus PS; 94.73%, 93.68%, and 94.2%, respectively, for NS versus S; and 93.04%, 92.47%, and 92.75%, respectively, for the NS versus S versus PS case. Hence, the proposed CNN model is not only accurate, but also highly specific and sensitive in identifying seizures.

However, it is important to note that the results presented here are based on a specific dataset, the CHB-MIT dataset, which contains 6 male and 18 female subjects. Therefore, it is essential to test the proposed CNN model on other datasets to confirm its generalizability. Additionally, the used dataset is not large enough for generalization.

Table 9 Classification performance using CNN-based model with CHB-MIT dataset
Fig. 8 ROC and precision-recall curves for PS versus S cases

Fig. 9 ROC and precision-recall curves for S versus NS cases

Fig. 10 ROC and precision-recall curves for PS, S and NS cases

Table 10 Comparison between the proposed approaches and the other existing methods in terms of accuracy using the CHB-MIT dataset

3.4 Comparison with the state-of-the-art methods

A comparison between the proposed approaches and other published ones on the CHB-MIT dataset is presented in Table 10. It is clear that the proposed approaches achieve better results than those of the state-of-the-art methods. They provide an improvement in terms of accuracy for the classification of the NS and S classes, reaching 100% and 95.59% with the ML-based and CNN-based classification, respectively. In addition, they outperform the other methods for the classification of the PS and NS classes, with accuracies reaching 100% and 97.66% with the ML-based and CNN-based classification, respectively.

4 Conclusions

Two different approaches have been adopted for EEG signal classification in this paper. The first one depends on the HMS, spectral entropies, higher-order statistics, and sub-band energies, while the second one depends on a CNN model applied to spectrogram images. The models were evaluated on the CHB-MIT dataset. The ML-based model achieved an accuracy of 100%, while the CNN-based model achieved accuracies of 97.66%, 95.59%, and 94.51% for the non-seizure versus pre-seizure, non-seizure versus seizure, and non-seizure versus seizure versus pre-seizure cases, respectively. These results reveal that the proposed approaches have high accuracy in detecting seizures and can be used for seizure detection in clinical settings. However, there are also some limitations to this research. One limitation is the small sample size of the CHB-MIT dataset, which may not be representative of the general population. Additionally, the models were only evaluated using EEG data, and it would be beneficial to evaluate them using other types of neurological data as well. Future work could include expanding the dataset to include a larger and more diverse sample of individuals, evaluating the models on other types of neurological data, and exploring other ML techniques to improve the performance of the models. Additionally, it would be valuable to conduct a clinical trial to test the practical applicability of the proposed approaches in a real-world setting.