1 Introduction

Epilepsy is a neurological disorder characterized by sudden abnormal electrical discharges of brain neurons accompanied by seizures, and it can affect anyone regardless of age. About 1% of the world's population suffers from this severe neurological disease. Its effects include loss of consciousness, auditory or visual distortions, unusual behavior, and jerky movements. As a result, patients' quality of life is significantly impacted [1,2,3,4]. Patients with epilepsy suffer both physical and mental health issues, and in severe cases their lives may be threatened. To improve the quality of life of patients with epilepsy, it is important to provide suitable and effective protective measures [5].

An electroencephalogram (EEG) is a non-invasive biophysical tool that assesses the brain's electrical activity and can help detect many neurological disorders [6]. Doctors visually analyze EEG signals to diagnose and treat epilepsy, although this process is time-consuming and difficult. Numerous attempts have therefore been made to detect epileptic seizure patterns automatically [7,8,9].

An automated diagnosis system is developed through a series of sequential steps: preprocessing, feature extraction, and classification [10,11,12,13,14,15]. In the preprocessing stage, various normalizations and transformations are applied to the raw signals in order to standardize the input for the following stages. During the feature extraction stage, the signals are analyzed to identify their distinctive signatures. Wavelet transforms [16,17,18], empirical mode decomposition [19, 20], nonlinear features [21], empirical wavelet transforms (EWTs) [22], and variational mode decomposition (VMD) [23] are commonly used to extract features from these signals. Such handcrafted feature extraction methods are widely used to extract attributes that are then classified by neural networks [24] and support vector machines (SVMs) [25]. In [26], DWT decomposition is used to perform three-class seizure categorization with a nonlinear sparse extreme learning machine classifier and an eight-dimensional feature vector.

In the above methods, each EEG signal is decomposed into several levels, and different characteristics from each level are analyzed to increase classification accuracy. However, no standard group of features can accurately describe the dynamics of the signal. Moreover, these methods produce a wide range of features, and incoherent, unnecessary features expand that range further, which reduces accuracy and makes the classification model more likely to overfit. The result is computational complexity and a feature-dependent nature in these seizure detection approaches.

Deep learning systems now outperform classical machine learning techniques in several disciplines [27,28,29]. In [30], intracranial and scalp EEGs were analyzed using time-domain and frequency-domain features as inputs to a CNN. Acharya et al. [31] implemented a 13-layer convolutional neural network (CNN) for three-class classification and achieved 88.67% accuracy, 90% specificity, and 95% sensitivity on the seizure detection problem. Ullah et al. [32] divided each signal of the dataset into four sub-segments of 1024 samples; sub-bands were then formed using windows of length 512 with 50% overlap. They used a pyramidal one-dimensional convolutional neural network (P-1D-CNN) to classify the EEG data and detected 99.1% of seizures accurately. Hussein et al. (2018) first divided the data into non-overlapping segments to detect patterns between successive samples of EEG data. Using high-level features learned by a long short-term memory (LSTM) network with a Softmax classifier to separate normal and seizure EEG signals, accuracy of 90.0–100% has been reported in [33].

The long short-term memory (LSTM) network, developed by Hochreiter and Schmidhuber and later improved by Alex Graves, is a special recurrent neural network (RNN) that incorporates context information into the mapping between input sequences and outputs, and it is considered an improved version of the conventional RNN. Tsiouris et al. proposed a two-layer LSTM network to predict seizure occurrences [34]. Long-term recurrent convolutional networks (LRCNs) were developed in [35] to detect and locate epileptogenic zones using image-based scalp EEG waveforms. The fully convolutional nested long short-term memory (FCNLSTM) network detects epileptic seizures end-to-end automatically [36]. Additionally, CNN and LSTM architectures have been combined to detect seizures [37]: a 1D-CNN extracts features, whereas an LSTM network recognizes the EEG sequences. This model reports accuracy of 99.39% for the binary class and 82.00% for five classes.

This paper has the following main contributions:

  1. The performance of deep learning models depends on large amounts of data, and the datasets do not contain enough samples to train such models. This work therefore uses an overlapped sliding rectangular window technique to create a large number of samples from the raw EEG data.

  2. As an alternative to current methods, we develop a 1D-CNN-LSTM hybrid model to improve seizure detection accuracy.

  3. Furthermore, this paper assesses performance with both equal and unequal numbers of EEG samples from each class in order to detect different epilepsy cases.

  4. Emphasis is placed on fine-tuning the hyperparameters of the classification model to produce accurate results.

  5. All state-of-the-art performance measures are utilized in order to ensure reliable classification results. To further increase reliability, K-fold cross-validation is applied.

  6. The proposed work outperforms many existing systems without incorporating any feature engineering on the given datasets and is evaluated and compared with recent existing approaches.

The remainder of the paper is organized as follows: Section 2 describes the EEG dataset details. Section 3 describes the framework proposed for the classification of the given datasets. Section 4 examines and analyzes the classification results on both datasets. Section 5 concludes the proposed work's findings.

2 Materials and methods

2.1 Datasets

Two different single-channel EEG datasets were used to evaluate the efficacy and robustness of the proposed model for seizure EEG signal classification. A total of five clinically relevant multiclass classification tasks (CTs) have been examined using database sets from Bonn University (BoU) and Neurology and Sleep Centre (NSC).

2.1.1 Dataset-1: Bonn University dataset (BoU)

In this study, we used a freely available EEG dataset [38]. It includes five subsets: A, B, C, D, and E, each containing 100 single-channel EEG signals of 23.6 s duration. Sets A and B consist of extracranial EEG recordings from five healthy individuals in an awake, relaxed state with eyes open and eyes closed, respectively. Sets C, D, and E are produced from an archive of presurgical EEG signals. Five patients are selected whose seizures were completely controlled after hippocampal resection; the resection sites are therefore diagnosed as epileptogenic zones. Sets C and D contain EEG signals captured during seizure-free (interictal) intervals, recorded from the hippocampal formation of the opposite hemisphere and from the epileptogenic zone, respectively. Set E contains ictal EEG signals of seizure activity recorded with intracranial electrodes. The signals are recorded in digital format at a sampling rate of 173.61 Hz, and each signal consists of 4097 samples. These five sets of EEG signals were used to develop four different classification tasks (CTs). Table 1 summarizes the CTs and their descriptions.

Table 1 The details of CTs from two datasets

2.1.2 Dataset-2: Neurology and Sleep Centre dataset (NSC)

The dataset used in this study was collected from the Neurology and Sleep Centre (NSC), Hauz Khas, New Delhi [39]. Gold-plated electrodes were arranged on the scalps of ten epileptic patients in accordance with the 10–20 electrode placement system, and EEG was acquired at a 200-Hz sampling rate with a Grass Telefactor Comet AS40 amplification system. After the EEG recordings were filtered between 0.5 and 70 Hz, an expert team classified them into preictal, interictal, and ictal categories, giving three corresponding folders. Each folder contains fifty EEG segments of 1024 data points. Here, the preictal, interictal, and ictal classes are referred to as G, H, and I, respectively. We designed a G–H–I classification task using the sets G, H, and I described in Table 1.

2.2 Preprocessing of EEG dataset

2.2.1 Data preparation

Dataset-1: The Bonn dataset is small and not directly suitable for deep learning, so each record was divided into equal parts to increase the number of samples. Each EEG record contains 4097 data points. To segment every record into 24 equal pieces, the last piece overlaps the preceding samples. Each piece contains an average of 174 samples. Therefore, each set consists of 2400 observations, and 12,000 observations were collected across the five sets.

Dataset-2: This is also a small dataset, so the same procedure was followed to increase the total number of samples. Each signal consists of 1024 data points, which are divided into sub-samples of 200 points using overlapped sliding rectangular windows, giving 60 segments per signal. Thus, each set comprises 3000 samples, for a total of 9000 samples. Table 2 shows the number of samples generated by this data preparation.

Table 2 The number of EEG samples in each set
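The windowing step above can be sketched as follows. The exact stride is not stated in the text, so this is a minimal sketch that spaces the windows evenly and shifts the final window back so it ends exactly at the last sample, mirroring the "overlap the last data points" trick described for Dataset-1:

```python
import numpy as np

def sliding_windows(signal, win_len, n_windows):
    """Split a 1-D signal into n_windows overlapping segments of win_len
    samples; the final window is clamped so it ends at the signal's last
    sample (the last segment overlaps the preceding ones)."""
    n = len(signal)
    stride = (n - win_len) / (n_windows - 1)              # fractional stride
    starts = [min(round(i * stride), n - win_len) for i in range(n_windows)]
    return np.stack([signal[s:s + win_len] for s in starts])

# NSC-style segmentation: one 1024-point record -> 60 windows of 200 samples
x = np.arange(1024, dtype=float)
segs = sliding_windows(x, win_len=200, n_windows=60)
```

The same helper reproduces the Bonn segmentation (24 pieces of 174 samples from a 4097-point record).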

Figure 1 illustrates the method for dividing data into training, validation, and testing sets for the A–C–E classification task. The whole dataset is divided into 80% for training and 20% for testing; these training/testing sets are referred to as Group-1. To obtain metrics and fine-tune the hyperparameters used for seizure detection, we perform tenfold cross-validation (CV) on the training set, holding out 10% of it as validation data at each fold. These training and validation sets belong to Group-2. The proposed model is finally evaluated on the 20% Group-1 testing set.

Fig. 1
figure 1

Dataset split for A–C–E classification tasks
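A minimal sketch of the Group-1 split, assuming a stratified random split with a fixed seed (the paper uses seeded random splits, but the seed value here is an illustration). The array sizes follow the A–C–E task of Dataset-1 (2400 segments per set):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder segmented data: 2400 windows per class for sets A, C, E
X = np.random.randn(7200, 174)            # 3 classes x 2400 segments
y = np.repeat([0, 1, 2], 2400)            # 0 = A, 1 = C, 2 = E (seizure)

# Group-1: 80% train / 20% held-out test, stratified and seeded
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```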

2.2.2 Z-score

EEG signals recorded from the scalp are highly non-stationary and dynamic. To reduce variations in EEG signals among different datasets, we use the common Z-score normalization method. For a given EEG signal \(X \in R^{L \times 1}\), L represents the length (number of samples) of the signal.

The mean (\(X_{{{\text{mean}}}}\)) and standard deviation (\(X_{{{\text{std}}}}\)) of the input signal are calculated using Eqs. (1) and (2), respectively.

$$X_{{{\text{mean}}}} = \frac{1}{L} \mathop \sum \limits_{i = 1}^{L} X_{i}$$
(1)
$$X_{{{\text{std}}}} = \sqrt {\frac{1}{L}\mathop \sum \limits_{i = 1}^{L} \left( {X_{i} - X_{{{\text{mean}}}} } \right)^{2} }$$
(2)

Z-score is calculated using Eq. (3)

$$X_{{z{\text{score}}}} = \frac{{X_{i} - X_{{{\text{mean}}}} }}{{X_{{{\text{std}}}} }}$$
(3)
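Eqs. (1)–(3) can be applied per EEG segment with a straightforward NumPy sketch:

```python
import numpy as np

def zscore(x):
    """Normalize an EEG segment to zero mean and unit variance (Eqs. 1-3)."""
    mean = x.mean()                              # Eq. (1)
    std = np.sqrt(((x - mean) ** 2).mean())      # Eq. (2), population std
    return (x - mean) / std                      # Eq. (3)

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # toy samples
z = zscore(x)   # mean 5, std 2 -> first value (2 - 5) / 2 = -1.5
```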

2.3 Proposed model

2.3.1 Convolutional neural network (CNN)

Among deep learning-based networks, CNNs are the most popular for learning features in a variety of tasks. Unlike a traditional machine learning algorithm, a CNN does not require features to be defined manually. Convolution kernels create local receptive fields through which features are identified automatically from the raw data without discarding valuable information. A CNN can thus classify signals without the more traditional framework in which features are learned and classified separately. Normally, CNNs have five layers: an input, a convolutional, an activation, a pooling, and a fully connected layer [40]. One-dimensional convolution is used here because the epileptic EEG signals to be analyzed are one-dimensional data. Using a one-dimensional convolutional layer [40], neuron i produces the following output:

$$y_{i} = f\left( {\mathop \sum \limits_{j = 1}^{n} x_{j} *W_{i - j + n} + b} \right)$$
(4)

where \(W\) is the convolutional kernel, \(x_{j}\) is the input EEG signal, n is the number of samples in x, b is the bias, and f is the nonlinear activation function.
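A toy instance of Eq. (4) for a single output channel: a discrete convolution of the input with a 3-tap kernel (matching the kernel size used later in the model), a bias, and a ReLU nonlinearity. The kernel values and bias here are arbitrary illustrations:

```python
import numpy as np

def conv1d(x, w, b):
    """One output channel of Eq. (4): valid-mode discrete convolution of
    the input with kernel w, plus bias, through a ReLU nonlinearity."""
    return np.maximum(np.convolve(x, w, mode="valid") + b, 0.0)

x = np.array([1.0, 2.0, -1.0, 3.0, 0.5])   # toy EEG samples
w = np.array([0.5, -0.25, 0.1])            # 3-tap kernel
y = conv1d(x, w, b=0.0)                    # length 5 - 3 + 1 = 3
```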

2.3.2 Long short-term memory network (LSTM)

LSTM networks are frequently used to analyze time series with nonlinear characteristics. An LSTM cell includes three gates: the forget gate, the input gate, and the output gate. Together with the memory cell, these gates increase its capability to process temporal information [41].

With the help of the sigmoid function, the forget gate determines what information from the previous cell state should be discarded and is given as follows:

$$f_{t} = \sigma \left( {W_{f,x} *X_{t} + W_{f,h} *h_{t - 1} + b_{f} } \right)$$
(5)

The input gate determines whether the state is updated based on the current input. Adding information involves three steps. First, a sigmoid function regulates which values from \(X_{t}\) and \(h_{t - 1}\) should be added to the cell state. Second, a tanh function creates a vector of candidate cell values. Finally, the regulatory filter is multiplied by the newly created vector, and the resulting information is added to the cell state passed to the next LSTM step via the recurrent connections. These operations are calculated by Eqs. (6)–(8):

$$i_{t} = \sigma \left( {W_{i,x} *X_{t} + W_{i,h} *h_{t - 1} + b_{i} } \right)$$
(6)
$$\widehat{{C_{t} }} = \tanh \left( {W_{c,x} *X_{t} + W_{c,h} *h_{t - 1} + b_{c} } \right)$$
(7)
$$C_{t} = C_{t - 1} *f_{t} + i_{t} *\widehat{{C_{t} }}$$
(8)

The output gate then selects useful information from the current cell state; the final output is obtained using Eqs. (9) and (10):

$$O_{t } = \sigma \left( {W_{o,x} *X_{t} + W_{o,h} *h_{t - 1} + b_{o} } \right)$$
(9)
$$h_{t} = O_{t} *\tanh \left( {C_{t} } \right)$$
(10)

In summary, the input gate updates the LSTM unit's cell state, the forget gate retains or discards information from the previous state, and the output gate decides how much of the current cell state is exposed as the hidden state carried into the next time step.
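The gate equations (5)–(10) can be traced with a plain NumPy single-step sketch. The weight shapes and random initialization here are illustrative, not the trained model's parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """Single LSTM time step (Eqs. 5-10). W holds the four weight matrices
    acting on the concatenated [x_t; h_prev]; b the four bias vectors."""
    z = np.concatenate([x_t, h_prev])
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate,   Eq. (5)
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate,    Eq. (6)
    c_hat = np.tanh(W["c"] @ z + b["c"])      # candidate,     Eq. (7)
    c_t = f_t * c_prev + i_t * c_hat          # cell update,   Eq. (8)
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate,   Eq. (9)
    h_t = o_t * np.tanh(c_t)                  # hidden state,  Eq. (10)
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W = {k: rng.normal(size=(n_hid, n_in + n_hid)) * 0.1 for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```

Since \(h_t = O_t * \tanh(C_t)\) with \(O_t \in (0,1)\), every hidden-state component is bounded in magnitude by 1.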

2.3.3 1D-CNN-LSTM hybrid model

Figure 2 shows the automatic detection process for epilepsy utilizing the CNN-LSTM model.

Fig. 2
figure 2

Proposed seizure classification architecture

Specifically, the proposed model consists of an input layer, three convolutional layers, a max-pooling layer, two LSTM layers, a time-distributed fully connected (TDFC) layer, two fully connected (FC) layers, and a Softmax output layer. Table 3 describes the structure of the layers comprising the proposed model.

Table 3 Complete information about the 1D-CNN model including the layers and parameters involved (Bonn University dataset)

CNN models learn the relevant features of input signals through weight sharing and local receptive fields. In the LSTM model, the output characteristics of each time step are extracted through the gate mechanism. Instead of taking only the output of the last time step, the time-distributed fully connected layer takes the outputs of all time steps. On this basis, this paper proposes an automatic classification system for epileptic EEG signals that uses CNN and LSTM networks.

The 1D EEG signal data are fed directly to the proposed model with shape N \(\times\) 1, where N is 174 for the Bonn University dataset and 200 for the NSC dataset. In the first convolutional layer, the input data are convolved with 1D kernels of shape 3 \(\times\) 1 and stride 1, which extracts abstract features from the signal data. This convolutional layer is followed by batch normalization (BN) and a rectified linear unit (ReLU) activation. BN re-centers and re-scales the activations, which stabilizes training and lets the ReLU nonlinearity operate effectively [27].

Conv Layer 2 contains 64 kernels of size 3 \(\times\) 1, and Conv Layer 3 contains 128 kernels of the same size. As in Conv Layer 1, ReLU is used as the activation function. To reduce the size of the feature matrix, a max-pooling layer with a pool window of size 2 and a stride of 2 is applied after the third convolutional layer. The pooling operation outputs 128 feature maps of length 87 \(\times\) 1.
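Given the 174-sample Bonn segments described in Sect. 2.2.1, the quoted 87 \(\times\) 1 feature length follows from the layer arithmetic, provided the convolutions use 'same' padding (an assumption on our part; with 'valid' padding the length after pooling would instead be 84). A small bookkeeping sketch:

```python
def conv_out_len(n, kernel=3, stride=1, padding="same"):
    """Output length of a 1-D convolution layer."""
    if padding == "same":
        return -(-n // stride)                 # ceil(n / stride)
    return (n - kernel) // stride + 1          # 'valid' padding

def pool_out_len(n, pool=2, stride=2):
    """Output length of a 1-D max-pooling layer."""
    return (n - pool) // stride + 1

n = 174                     # Bonn segment length after windowing
for _ in range(3):          # Conv Layers 1-3 ('same' padding assumed)
    n = conv_out_len(n)
n = pool_out_len(n)         # max pooling, window 2, stride 2
```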

Two LSTM layers are used in the sequence-learning block. LSTM Layer 1 and Layer 2 contain 128 and 64 neurons, respectively, with a dropout ratio of 0.25. Both LSTM layers are set to return sequences, so the full hidden state sequence is output. The LSTM outputs are translated into meaningful seizure-associated features by a time-distributed fully connected (dense) layer with 50 hidden nodes, which takes the outputs of all time steps as input instead of just the output of the last time step. We then use a global average pooling (GAP) layer because all EEG segments are equally important in predicting labels.

After the GAP layer, the output features are fed into the fully connected layers: FC Layer 1 has 512 neurons and FC Layer 2 has 256 neurons. Finally, a Softmax layer computes the probability that each input EEG segment belongs to each class and predicts the label of the input EEG signal. The same procedure was followed for Dataset-2.
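Under the layer sizes quoted above, the architecture can be sketched in Keras as below. This is a sketch, not the authors' exact Table 3 configuration: the 32 kernels in Conv Layer 1, the 'same' padding, and the ReLU activations on the FC layers are assumptions; the optimizer and loss settings follow Sect. 3.1.

```python
from tensorflow.keras import layers, models, optimizers

def build_cnn_lstm(input_len=174, n_classes=3):
    """Sketch of the 1D-CNN-LSTM. Conv Layer 1 width (32) and 'same'
    padding are assumptions; other sizes follow the text."""
    m = models.Sequential([
        layers.Input(shape=(input_len, 1)),
        layers.Conv1D(32, 3, strides=1, padding="same"),   # Conv Layer 1
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Conv1D(64, 3, padding="same", activation="relu"),   # Conv 2
        layers.Conv1D(128, 3, padding="same", activation="relu"),  # Conv 3
        layers.MaxPooling1D(pool_size=2, strides=2),       # 174 -> 87
        layers.LSTM(128, return_sequences=True, dropout=0.25),
        layers.LSTM(64, return_sequences=True, dropout=0.25),
        layers.TimeDistributed(layers.Dense(50)),          # TDFC layer
        layers.GlobalAveragePooling1D(),
        layers.Dense(512, activation="relu"),              # FC Layer 1
        layers.Dense(256, activation="relu"),              # FC Layer 2
        layers.Dense(n_classes, activation="softmax"),
    ])
    m.compile(optimizer=optimizers.Adam(learning_rate=1e-4,
                                        beta_1=0.9, beta_2=0.999),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
    return m

model = build_cnn_lstm()
```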

3 Results and discussion

3.1 Training parameters

To achieve the best performance, it is essential to tune the proposed seizure detection model. The performance depends on the number of hidden neurons, the learning rate, the batch size, and the regularization effect. The hyperparameters that control the model's learning capability are adjusted through cross-validation on the training set. The loss is computed with sparse categorical cross-entropy, and the Adam optimizer is used with decay rate beta 1 = 0.9, squared-gradient decay rate beta 2 = 0.999, and learning rate 0.0001. The model was trained with a batch size of 32 for 100 epochs.

3.2 Data split strategy

The database has two parts: training and testing. Initially, 80% of the total dataset was used for training and 20% for testing. During the learning process, tenfold cross-validation on the 80% training set is used to tune the hyperparameters and improve the proposed model's performance on the test data. Data distributions were chosen randomly, and fixed random seed values were used to ensure a reliable and consistent model.

Cross-validation was used to assess the reliability of this method. The normal–interictal–ictal EEG data samples are spread evenly among the N folds of the training EEG signals. One of the N folds is used as validation data to evaluate the trained models; hence, the models are validated on one fold and trained on the remaining (N–1) folds. This fold-validation split is set up to avoid overfitting. At the end of every epoch, the corresponding validation loss is calculated so that the model does not overfit the training data, and training is terminated early if the loss keeps rising over ten consecutive epochs. The model is then evaluated on the held-out fold, and this process is repeated until all folds have been processed. The final step is to evaluate each classification problem's model efficiency using the remaining 20% of unseen test data.
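The fold protocol can be sketched with scikit-learn's stratified splitter. The data below are hypothetical placeholders, and the early-stopping rule (patience of ten epochs on the validation loss) would sit inside the per-fold training loop:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.randn(300, 174)          # placeholder training segments
y = np.repeat([0, 1, 2], 100)          # balanced normal/interictal/ictal

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_sizes = []
for train_idx, val_idx in skf.split(X, y):
    # Train on 9 folds, validate on the held-out fold; early stopping
    # (patience = 10 epochs) would monitor the validation loss here.
    fold_sizes.append((len(train_idx), len(val_idx)))
```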

3.3 Performance evaluation criteria

A Python 3.7.13 environment is used to design and implement the deep learning model that detects seizure events automatically, using the Keras deep learning library on top of TensorFlow 2.8. All experiments were performed on a desktop computer with an Intel(R) Core(TM) i5-6500 processor running at 3.20 GHz, 16 GB of RAM, and a 2-GB NVIDIA GeForce GT 710 GPU.

The proposed model's performance is measured by how well it performs on test data that it never encountered during training. The test data were evaluated using a variety of evaluation criteria [42], including sensitivity (Sen), specificity (Spe), and recognition accuracy (Acc). They are as follows:

$${\text{sensitivity }}\left( {{\text{Sen}}} \right) = \frac{{\text{number of true positive signals}}}{{\text{number of actually positive signals}}}$$
(11)
$${\text{specificity }}\left( {{\text{Spe}}} \right) = \frac{{\text{number of true negative signals}}}{{\text{number of actually negative signals}}}$$
(12)
$${\text{Accuracy }}\left( {{\text{Acc}}} \right) = \frac{{\text{number of correct signals}}}{{\text{total number of signals}}}$$
(13)

Sensitivity, the proportion of correctly detected positive signals to the total number of actual positives, measures how well the system detects positive outcomes. Specificity, the ratio of correctly detected negative signals to the number of actual negative signals, measures the ability to correctly detect negative signals. Accuracy measures the ratio of correctly classified observations to the total number of observations.

The confusion matrix for the three classes is presented in Table 4. \(C_{{{\text{si}}}}\) denotes the number of signals from the seizure class that the proposed system classified as the interictal class; the remaining entries are interpreted analogously.

Table 4 Confusion matrix based on three classifications
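Eqs. (11)–(13) extend to the three-class case by treating each class one-vs-rest, as in Table 5. A NumPy sketch with a hypothetical confusion matrix:

```python
import numpy as np

def per_class_metrics(cm):
    """One-vs-rest sensitivity and specificity, plus overall accuracy
    (Eqs. 11-13), from a KxK confusion matrix with true classes on rows."""
    total = cm.sum()
    tp = np.diag(cm)                       # correctly classified per class
    sen = tp / cm.sum(axis=1)              # Eq. (11): TP / actual positives
    fp = cm.sum(axis=0) - tp               # predicted as class k but not k
    tn = total - cm.sum(axis=1) - fp       # actual negatives correctly kept
    spe = tn / (tn + fp)                   # Eq. (12): TN / actual negatives
    acc = tp.sum() / total                 # Eq. (13)
    return sen, spe, acc

# Hypothetical normal / interictal / ictal confusion matrix (rows = truth)
cm = np.array([[95, 3, 2],
               [4, 94, 2],
               [1, 1, 98]])
sen, spe, acc = per_class_metrics(cm)
```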

3.4 Results of the 1D-CNN-LSTM model

In this section, Table 5 gives the performance metric formulae for the three-class classification problems.

Table 5 The detailed definition of the three-class performance metrics

Table 6 shows the performance values obtained from cross-validation. The results show that the proposed hybrid 1D-CNN-LSTM model achieves high performance rates regardless of the dataset partitioning. The five balanced three-class classification tasks achieved an average of 96% on all three performance metrics, while the unbalanced task (AB–CD–E) achieved on average 97% Sen, 98% Spe, and 97% Acc. The five-class CT shows an average classification performance of 95%. For the NSC dataset, the classification results almost attain 100%.

Table 6 Cross-validation accuracy results with various CTs

Table 7 presents the performance results, along with the associated confusion matrix, for each of the six classification tasks.

Table 7 The classification results for different cases of BoU and NSC

The overall accuracies are 97.43% for A–C–E, 97.36% for A–D–E, 99.09% for B–C–E, 99.37% for B–D–E, 97.08% for (AB)–(CD)–E, and 100% for G–H–I. We found that the closed-eye signals (Set B) performed better than the open-eye signals (Set A) in the normal–interictal–ictal classification. A second important finding is that both datasets achieved high classification accuracy over a wide sensitivity range.

Based on all five sets of the Bonn University dataset, Table 8 shows the results of the five-class classification.

Table 8 Results of five class classifications from Bonn University dataset

3.5 A visual representation of the discrimination power of the proposed model

t-SNE (t-distributed stochastic neighbor embedding) is a dimensionality reduction technique used to display high-dimensional data in a lower-dimensional space [43, 44]. It is used here to visualize the discriminating power of the proposed model on the different classes of both datasets. In Fig. 3, the 2D output of the proposed model after the Conv1D, LSTM, and FCN layers is visualized with t-SNE. Feature vectors associated with the same EEG class cluster together, whereas feature vectors associated with different EEG classes are separated. Figure 3 makes clear that the normal, interictal/preictal, and ictal EEG signals are separated efficiently with little overlap between classes, which exhibits the strong generalization ability of the 1D-CNN-LSTM architecture. Here, 'Label 0' is set (AB) and set G, 'Label 1' is set (CD) and set H, and 'Label 2' is set E and set I.

Fig. 3
figure 3

Two-dimensional feature visualization using t-SNE
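A t-SNE projection like Fig. 3 can be produced with scikit-learn. The feature vectors below are synthetic stand-ins for the model's learned features; in practice one would feed the activations of the layer of interest:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic 64-dimensional "learned features" for three EEG classes,
# 50 segments each, with class-dependent means so clusters are separable
feats = np.vstack([rng.normal(loc=c * 5.0, size=(50, 64)) for c in range(3)])

# Embed into 2-D for plotting (e.g. with matplotlib, colored by label)
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(feats)
```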

4 Discussion

Models evaluated on different datasets cannot be compared directly, since a model's performance can differ depending on the dataset used to train and test it. To demonstrate the effectiveness of the proposed model, we therefore compare it against methods that used the same data: the Bonn University EEG dataset and the Delhi sleep centre dataset. Table 9 compares existing systems with ours in terms of accuracy, sensitivity, and specificity.

Table 9 Performance comparison using two datasets

In [45], Fourier, wavelet, and EMD transforms are used as inputs to a deep learning model with two convolutional layers and three FCN layers. In [46], 2D time–frequency scalograms are classified with a 2D-CNN, achieving an accuracy rate of 94–99% but a significantly lower sensitivity rate. Using three convolutional blocks and three FC layers, the 1D-CNN in [47] classified the three classes with an accuracy of 96.73–98.06%. To detect seizures, Zhao proposed SeizureNet [48], which used two parallel 1D-CNNs and achieved classification accuracy between 97 and 99%.

The proposed method achieved better classification results on the complicated five-class task than the hybrid model used in [37]. Compared with [43, 45], our results are lower, but our model uses a larger amount of data.

For Dataset-2, Hong Peng et al. [49] proposed a Stein kernel-based sparse representation (SR) algorithm that reached 97.21% for seizure EEG classification. In [50], epileptic seizures are detected with SVM classifiers, together with a denoising step based on multiscale principal component analysis (MSPCA) and empirical mode decomposition (EMD) using three refined composite multiscale features. In [51], multiscale spectral features (MSSFs) are derived from the multiscale power spectral density (MPSD), and a random forest (RF) is used to classify seizure EEG signals.

As shown in Table 9, the proposed method provides greater discrimination accuracy and sensitivity than the previous studies. The proposed system can differentiate ictal (seizure) cases from interictal and normal cases with high accuracy. Several studies in the literature are compared with our results; these studies used different techniques to identify features, and some used signal-to-image conversion methods to increase the success rate. However, the proposed combination of 1D-CNN and LSTM is significantly effective at automatically identifying features and detecting seizures in EEG signals across two different datasets.

5 Conclusion

The proposed architecture is a deep learning-based 1D-CNN-LSTM composed of convolution blocks, an LSTM, and a time-distributed fully connected layer (TDFCL), and it deals with three-class classification problems using publicly available datasets. The overlapped sliding rectangular window technique creates a large number of samples from raw EEG data. The proposed model significantly improved the accuracy of a variety of EEG classification problems when compared to current state-of-the-art methodologies. It therefore provides neurologists with a more reliable and accurate way to diagnose seizures in epileptic patients, improving their confidence in their diagnoses. Moreover, because labeled EEG data are scarce and labeling EEG samples is complex, the data augmentation technique of the proposed model could also be used to help diagnose and predict other brain diseases such as Alzheimer's disease, mental disorders, and Parkinson's disease. In the future, we will conduct further research on this topic, including validation of the proposed model on additional datasets, such as multi-channel datasets.