1 Introduction

Epilepsy is one of the most common brain disorders; according to World Health Organization (WHO) reports, approximately 50 million people worldwide suffer from it. People in low- and middle-income countries are mainly affected. Seizures result from excessive discharge of neurons in the brain. A seizure can cause a sudden fall if the person is standing, or an accident if the person is driving. Individuals diagnosed with epilepsy have a higher death rate than healthy individuals.

An EEG machine is generally used to capture the electrical activity of the brain for diagnosis. The significant advantages of EEG are its high temporal resolution, noninvasive nature, and low cost. EEG is obtained by placing electrodes on the human scalp; the widely recognized 10–20 system is used for this purpose. Highly skilled neurologists are required to analyze EEG readings, and even then a wrong interpretation is possible. An automated system to assist clinicians is therefore in high demand. Given the seriousness of this disease, researchers have done a great deal of work on it in the last decade.

1.1 Related work

Researchers have proposed many hand-engineered feature extraction techniques for epileptic seizure detection systems in the last twenty years. Time analysis [13], frequency analysis [41], and time-frequency (t-f) analysis [32] have been used to extract features, followed by a machine learning classifier. The discrete wavelet transform (DWT) is one of the most widely used techniques for this purpose [25]. Different variants of entropy [26], the Hurst exponent [24], Hjorth parameters, statistical averages, average frequency, and relative spike amplitude are the most important features calculated in previous works. Support vector machine (SVM) [28], K-Nearest Neighbour (KNN), Naïve Bayes, Random Forest [40], and artificial neural network (ANN) [38] are some of the most important classifiers used for detection. The major disadvantage of these automated systems is that they require a specialist to select the optimum set of features.

With the advent of deep learning, various groups have started applying it to seizure detection. Deep learning has the advantage of automatic feature selection. Ulah et al. [39] presented a pyramidal CNN structure and used a large number of epochs to reach acceptable accuracy. Data augmentation was also used in their work to increase the number of instances, which adds a stage to the automated system. In [23], the authors used a combination of 1D and 2D CNNs followed by an autoencoder for detection in a mobile multimedia framework. They achieved 99.02% accuracy on the CHB-MIT dataset, with a reported total training time of around 7 h. Zhou, Tian, and Cao fed time-domain and frequency-domain signals into a CNN for seizure classification [44]. In [11], the authors used plot images of EEG as input to a CNN to classify seizures and non-seizures. The model took approximately 11 h to train on a GPU system for 50 epochs. In another study, the authors used three convolutional layers to learn seizure properties from the data, whose output was then fed into a Nested Long Short-Term Memory (NLSTM) network to predict the output label [20]. San-Segundo et al. [35] introduced an algorithm in which the outputs of the Fourier transform, wavelet transform, and empirical mode decomposition are fed to a CNN comprising two convolutional layers and three fully connected layers. A hybrid model of CNN and LSTM has been used for seizure and tumor detection as well as identification of eye status [21]. In [5], the authors converted EEG data into saliency-encoded spectrograms and used an ensemble of CNNs for classification.

Although all the above-mentioned deep learning methodologies give good accuracy, their complexity is increased in terms of the number of layers, number of stages, number of epochs, and use of hybrid approaches. High model complexity increases the computational requirements. In addition, deep learning networks are generally very time-consuming to train, and a GPU is usually required [15]. Also, some methodologies convert 1-D EEG signals to 2-D images to feed into a CNN, which may result in loss of important information. Real-time implementation of such models would therefore be cumbersome. Hence, there is a need for a model that overcomes these issues.

1.2 Contribution of the work

The main objective of this work is to build an automated seizure detection system using a CNN framework with minimal computational requirements and training time. The authors have also focused on the robustness of the extracted features. The main challenge is achieving a high detection rate with low complexity and reduced training time. The proposed 1D CNN model takes raw data directly; it does not require any data splitting or augmentation steps, and fewer epochs are needed to extract features from the model. These properties reduce the complexity of the model and make it suitable for real-time clinical monitoring. The features extracted by the proposed CNN model are fed to Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbour (KNN), Gaussian Naïve Bayes (GNB), Decision Tree (DT), and Adaboost classifiers. The performance comparison demonstrates the robustness of the proposed CNN model.

The rest of the paper is structured as follows. Section 2 presents the dataset used, and Section 3 describes the proposed model. Results are shown in Section 4 and discussed in Section 5. Finally, the conclusion is presented in Section 6. Figure 1 shows the schematic of the proposed methodology. Each block is described in detail in the methodology section.

Fig. 1 Proposed methodology using 1D CNN framework and ML classifier

2 Dataset description

The experiments were performed on the publicly available EEG dataset of the University of Bonn, Germany [3]. The dataset has five groups, namely Z, O, N, F and S, and was collected using 128 channels. Each group contains 100 segments. Each segment is 23.6 s long and was sampled at 173.61 Hz.
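
As a quick sanity check on these figures, the short sketch below computes the approximate number of samples per segment and the total number of segments from the values quoted above. The constant names are illustrative only and are not part of the dataset's own tooling.

```python
# Values quoted in the dataset description (names are illustrative).
SEGMENT_DURATION_S = 23.6
SAMPLING_RATE_HZ = 173.61
GROUPS = ["Z", "O", "N", "F", "S"]
SEGMENTS_PER_GROUP = 100

# Duration times sampling rate gives the approximate samples per segment.
samples_per_segment = round(SEGMENT_DURATION_S * SAMPLING_RATE_HZ)
total_segments = len(GROUPS) * SEGMENTS_PER_GROUP

print(samples_per_segment)  # 4097
print(total_segments)       # 500
```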

The Z and O group segments were acquired from healthy subjects with eyes open and closed, respectively, using the 10–20 system. The data in groups N and F were collected from the hippocampal formation and the epileptogenic zone, respectively, during seizure-free periods of epileptic patients. Segments of group S were taken from epileptic patients during seizures. Figure 2 presents an EEG segment from each of the Z, O, N, F and S groups.

Fig. 2 Signal traces of the five groups (a) Z, (b) O, (c) N, (d) F, (e) S

3 Methodology

3.1 Feature extraction

In this paper, features for the epileptic seizure detection system are extracted using a CNN. The advantage of using a CNN is that there is no need to select features manually. A CNN is a type of ANN whose concept is inspired by the animal visual cortex. It is also known as a shift-invariant artificial neural network (SIANN). In recent years, handcrafted feature extraction techniques have increasingly been replaced by deep learning methods. CNNs are a subset of deep learning that has attracted the research community, especially for image classification problems. They have been used for fundus images [8], x-ray [10], CT scans [42], and other medical images in recent years. Many variants of CNN have also been used in object detection [27, 43]. A CNN uses the convolution operation to learn higher-order features from the data. It consists of convolutional layers, pooling layers, and activation functions. The proposed technique applies a CNN to one-dimensional EEG signals with the aim of achieving good results in terms of computational complexity, training time, and feature extraction. Table 1 shows the architecture of the proposed CNN model, and the layers are described as follows.

One dimensional convolutional layer

It comprises kernels (filters) that are convolved with the input EEG signal. A kernel is an array of weights that slides across the EEG signal to perform the convolution operation; the result of this operation is a feature map.

$${f}_{m}=\sum\nolimits_{n=0}^{N-1}{x}_{n}{l}_{m-n}$$
(1)

where \(x\) is the EEG data, \(l\) is the filter, and \(N\) is the number of elements in \(x\). The subscript denotes the index of an element of a vector (for example, \({x}_{n}\) is the nth element of \(x\)), and \(f\) is the output vector.
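
The following is a minimal NumPy sketch of Eq. (1), evaluated directly from the sum. The function name `conv1d_full`, the toy signal, and the 3-tap filter are illustrative assumptions, not the kernels of the proposed model; filter taps outside the filter's support are taken to be zero.

```python
import numpy as np

def conv1d_full(x, l):
    """Evaluate f_m = sum_{n=0}^{N-1} x_n * l_{m-n} for every valid m."""
    x = np.asarray(x, dtype=float)
    l = np.asarray(l, dtype=float)
    N, K = len(x), len(l)
    f = np.zeros(N + K - 1)
    for m in range(N + K - 1):
        for n in range(N):
            if 0 <= m - n < K:  # taps outside the filter are treated as zero
                f[m] += x[n] * l[m - n]
    return f

x = np.array([1.0, 2.0, 3.0, 4.0])  # toy stand-in for an EEG segment
l = np.array([1.0, 0.0, -1.0])      # toy 3-tap filter
f = conv1d_full(x, l)
print(f)  # [ 1.  2.  2.  2. -3. -4.]
```

The result matches `np.convolve(x, l)`. Note that convolutional layers in common deep learning libraries typically compute a cross-correlation (no kernel flip) and often keep only the positions where the kernel fully overlaps the signal, so their output can differ from this textbook form.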

Pooling layer

There are different types of pooling layers: maximum pooling, average pooling, global maximum pooling and global average pooling. In this work, one maximum pooling layer and one global average pooling layer are used. Pooling is a downsampling operation. Maximum pooling reduces the dimension of the convolutional layer's output by keeping only the maximum value within each pooling window of a feature map, which helps prevent overfitting and reduces the computational load. The global average pooling layer calculates the average of each feature map of the previous layer, and it is used in place of fully connected layers. Pooling layers do not have any trainable parameters.
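
To make the two pooling operations concrete, here is a small NumPy sketch. The pool size of 2 and the toy feature map are assumptions chosen for illustration; the actual layer settings are those listed in Table 1.

```python
import numpy as np

def max_pool1d(feature_map, pool_size=2):
    """Non-overlapping max pooling along the time axis of a (time, channels) array."""
    t, c = feature_map.shape
    t_trim = (t // pool_size) * pool_size  # drop a trailing partial window
    windows = feature_map[:t_trim].reshape(t_trim // pool_size, pool_size, c)
    return windows.max(axis=1)

def global_average_pool1d(feature_map):
    """Average each channel over the whole time axis: one value per channel."""
    return feature_map.mean(axis=0)

fm = np.array([[1.0, 8.0],
               [3.0, 6.0],
               [5.0, 4.0],
               [7.0, 2.0]])  # 4 time steps, 2 channels (toy values)

pooled = max_pool1d(fm, pool_size=2)  # [[3. 8.] [7. 4.]]
gap = global_average_pool1d(fm)       # [4. 5.]
print(pooled)
print(gap)
```

Neither function has any weights, which is why pooling layers add no trainable parameters.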

Rectified linear activation unit (ReLU)

An activation function is applied to each node's summed input after every convolutional layer. In this work, the ReLU activation function is used, which outputs zero for all negative values and acts as the identity for all positive values. The advantage of using ReLU is that the model converges quickly and takes less time to train.
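
The behaviour described above can be written in one line; the sketch below uses NumPy and toy input values chosen for illustration.

```python
import numpy as np

def relu(z):
    """Zero for negative inputs, identity for positive inputs."""
    return np.maximum(z, 0.0)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])  # toy pre-activation values
a = relu(z)
print(a)  # [0.  0.  0.  1.5 3. ]
```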

Batch Normalization (BN)

During training, the distribution of each layer's inputs changes as the weights and biases of the preceding layers are updated, so each layer must continually adapt to it. Because of this, the parameters must be initialized carefully and a small learning rate must be chosen. This phenomenon is known as internal covariate shift. To overcome this problem, batch normalization has been proposed [17]. Batch normalization normalizes the activations over each mini-batch at the layer's input, which increases the stability of the neural network. It removes the need for special parameter initialization and provides faster convergence. In the proposed model, three BN layers have been used.
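
A minimal sketch of the training-time forward pass of batch normalization is given below, assuming a mini-batch of shape (batch, features). The scale `gamma`, shift `beta`, the stabilizing constant `eps`, and the random toy batch are illustrative assumptions; running statistics used at inference time are omitted.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                      # per-feature mini-batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(10, 4))  # toy batch: 10 samples, 4 features
out = batch_norm_forward(batch, gamma=np.ones(4), beta=np.zeros(4))

print(out.mean(axis=0))  # close to 0 for every feature
print(out.std(axis=0))   # close to 1 for every feature
```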

Table 1 CNN structure for feature extraction

Fully connected layer

Fully connected layers connect every neuron in one layer to every neuron in the next layer. They accept the flattened output of the convolutional layers.

3.2 Classification

The end layer of the CNN is replaced by a machine learning classifier, and seven classification techniques have been considered: (a) Random Forest, (b) Support Vector Machine, (c) K-Nearest Neighbour, (d) Gaussian Naïve Bayes, (e) Decision Tree, (f) Logistic Regression, and (g) Adaboost. A brief description of these classifiers is as follows:

  1. (a)

    Random Forest - It is an ensemble learning method made up of many decision trees. In this classifier, the outputs of all decision trees are aggregated for classification [6]. RF combines the simplicity and flexibility of DTs, which results in improved accuracy. The RF algorithm operates as follows. It creates a bootstrapped dataset of the same size as the original by randomly selecting samples from the original dataset; the same sample can be selected more than once. A DT is then built using the bootstrapped dataset, but considering only a random subset of variables at each step. It then creates a new bootstrapped dataset and builds another DT in the same manner. This process is repeated a number of times, resulting in a wide variety of DTs, which makes RF more effective than an individual DT. A new sample is run down all the trees and assigned to the class that receives the most votes. The parameters used in the experimental work are: number of estimators = 300, maximum depth = 100, and minimum sample split = 3.

  2. (b)

    Support Vector Machine - This algorithm classifies the data points by finding a separating hyperplane in an N-dimensional space, where N is the number of features. The main aim of this classifier is to maximize the margin between data points of different classes so that future data points can be classified with more confidence.

  3. (c)

    K-Nearest Neighbour - It is a non-linear, non-parametric classifier and one of the simplest. It classifies data based on the similarity between samples [19]. Similarity can be expressed as closeness, proximity, or distance. The most popular distance measure is the Euclidean distance (ED).

    $$ED= \sqrt{\sum\nolimits_{i=1}^{n}{\left({y}_{1i}-{y}_{2i}\right)}^{2}}$$
    (2)

    where \({y}_{1}=\left({y}_{11}, {y}_{12}, \dots, {y}_{1n}\right)\) and \({y}_{2}=\left({y}_{21}, {y}_{22}, \dots, {y}_{2n}\right)\) are the data points, and n is the number of dimensions. The value of K determines the performance of this classifier. It is one of the widely used classifiers in industry because of its low computation time. K is set to 5 in this work.

  4. (d)

    Gaussian Naïve Bayes - This classifier has a simple probabilistic model that uses Bayes' theorem for classification. It relies on the assumption of class-conditional independence, that is, given the class, each attribute is assumed to be independent of the others [22]. The main advantage of the GNB classifier is that it needs less training data than other classifiers. In this work, it is assumed that the data have a normal (Gaussian) distribution.

  5. (e)

    Decision Tree - It is a tree-structured classifier with two types of nodes: decision nodes and leaf nodes. A decision node performs a test based on specific rules applied to the features, and each leaf node shows the outcome of this test [30]. It is simple to understand, interpret, and visualize. In this study, the Gini index is used to select the feature at each split. The main disadvantage of DT is its high probability of overfitting.

    $$Gini=1-\sum\nolimits_{i}{P}_{i}^{2}$$
    (3)

    where \({P}_{i}\) is the probability of the ith class.

  6. (f)

    Logistic Regression - It is a generalized linear model related to ordinary linear regression (OLR), and it imposes less strict requirements than OLR. LR models a non-linear relationship between the explanatory variables and the response. It estimates changes in the logarithm of the odds of the response variable and uses the sigmoid function for classification.

  7. (g)

    Adaboost - It is a sequential ensemble classifier that combines the results of weak classifiers to obtain a strong output [12]. It assigns appropriate weights to the weak classifiers during training to achieve a high classification rate. Misclassified samples are assigned higher weights so that the next classifier is more likely to classify them correctly. The number of estimators in this experiment is set to 3.
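
Two of the formulas in the list above, the Euclidean distance of Eq. (2) and the Gini index of Eq. (3), can be sketched in a few lines of NumPy. The helper names, the toy two-dimensional "features", and the class labels are hypothetical and stand in for the CNN features used in this work; only K = 5 is taken from the text.

```python
from collections import Counter

import numpy as np

def euclidean_distance(y1, y2):
    """Eq. (2): square root of the summed squared coordinate differences."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    return float(np.sqrt(np.sum((y1 - y2) ** 2)))

def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training points closest to the query."""
    dists = [euclidean_distance(row, query) for row in train_x]
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def gini_index(labels):
    """Eq. (3): one minus the sum of squared class probabilities."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

# Toy, well-separated clusters (hypothetical values).
train_x = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
           [5, 5], [5, 6], [6, 5], [6, 6], [5.5, 5.5]]
train_y = ["non-seizure"] * 5 + ["seizure"] * 5

prediction = knn_predict(train_x, train_y, query=[5.2, 5.1], k=5)
print(prediction)                          # seizure
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
print(gini_index(["a", "a", "b", "b"]))    # 0.5
```

A Gini index of 0 corresponds to a pure node (a single class), while 0.5 is the maximum for two equally likely classes.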

4 Results

Raw EEG data are fed directly into the proposed model, and the dataset is split randomly ten times in a 70:30 ratio for training and testing, respectively. The average results are reported in Table 2; nine experiments have been performed in the experimental setup. Classification accuracy, sensitivity, and specificity are calculated to assess the performance of the proposed model. The metrics are given as

$$Accuracy= \frac{TP+TN}{TP+FN+TN+FP} \times 100\%$$
(4)
$$Sensitivity=\frac{TP}{TP+FN} \times 100\%$$
(5)
$$Specificity= \frac{TN}{TN+FP} \times 100\%$$
(6)

where TP = True Positive, TN = True Negative, FP = False Positive, FN = False Negative
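
Eqs. (4) to (6) translate directly into code. In the sketch below the function name and the confusion counts are hypothetical; the counts are chosen only to illustrate the arithmetic and are not results from this work.

```python
def seizure_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity and specificity in percent (Eqs. 4-6)."""
    accuracy = (tp + tn) / (tp + fn + tn + fp) * 100.0
    sensitivity = tp / (tp + fn) * 100.0
    specificity = tn / (tn + fp) * 100.0
    return accuracy, sensitivity, specificity

# Hypothetical counts for a 60-segment test set.
acc, sens, spec = seizure_metrics(tp=29, tn=30, fp=0, fn=1)
print(round(acc, 2), round(sens, 2), round(spec, 2))  # 98.33 96.67 100.0
```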

Table 2 Results obtained for different machine learning classifiers

LR, RF, SVM, and KNN show an accuracy of 99.83% between Z (healthy person, eyes open) and S (seizures). All classifiers except GNB have achieved 100% sensitivity. To balance the imbalanced data clusters ZO-S, NF-S, ZONF-S, ZO-NFS, and ZO-NF-S, the Adaptive Synthetic (ADASYN) sampling method is used [16]. For three-class classification, the maximum accuracy of 97.93% is achieved by the LR classifier. Tables 3 and 4 show the comparison of metrics for different classifiers. It can be seen from both tables that the difference among classifiers for detection between seizures and non-seizures is at most 1.18%. This shows the robustness of the features extracted by the proposed CNN model.

Table 3 Comparison of metrics for data cluster Z-S
Table 4 Comparison of metrics for data cluster N-S

For comparison with our CNN-based framework, we have removed the last fully connected layer with 100 neurons and passed it through a softmax function for classification. Softmax is a type of activation function and is usually the last layer in a deep learning classification model. This function gives the probability distribution over the output classes. For training and testing, a K-fold cross-validation scheme has been used. In this method, the data are split into K folds, and each fold is used once as the testing set. K is set to 10 in this experimental setup; hence the data cluster is divided into 10 folds. Ten iterations take place, and in each one a different fold is used as test data. The final output is the average of the results obtained from all iterations. The model is trained using Tensorflow, a deep learning library, on the Anaconda software. The cross-entropy loss function and the Adam optimizer have been used in this paper. The learning rate of the Adam optimizer is set to 0.00000001, and the other five parameters are set to their default values. A small learning rate is chosen to avoid local minima and control the oscillation of the network. A batch size of 10 has been selected, and different data clusters have been trained using different numbers of epochs. The minimum number of epochs is 200, for O-S, F-S, and ZO-NF-S. It can be seen from Table 5 that, in classification between normal and ictal, data cluster O-S shows the highest classification accuracy of 99.5% and 100% sensitivity and specificity, while Z-S gives 99% and ZO-S gives 98% accuracy. The classification between interictal and ictal groups shows a maximum of 98.5% accuracy and a minimum of 98%.
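
The softmax function, the cross-entropy loss, and the 10-fold splitting described above can be sketched in NumPy as follows. This is not the Tensorflow training code used in this work; the function names, the random seed, the toy logits, and the choice of 200 samples (the size of a two-group cluster such as Z-S) are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    """Turn each row of logits into a probability distribution."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true integer labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

logits = np.array([[2.0, 0.0],
                   [0.0, 0.0]])  # toy network outputs for two samples
probs = softmax(logits)
loss = cross_entropy(probs, np.array([0, 0]))

splits = list(k_fold_indices(200, k=10))
print(probs.sum(axis=1))                      # each row sums to 1
print(len(splits), len(splits[0][1]))         # 10 folds, 20 test samples each
```

In a full experiment, the model would be retrained on `train_idx` and evaluated on `test_idx` in each of the ten iterations, and the reported metrics would be the average over the folds.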

Table 5 Results obtained for softmax classification

5 Discussion

Many feature extraction techniques followed by different classifiers have been proposed for epileptic seizure detection. A comparison of different methodologies is presented in Table 6. In previous work, the authors used only the alpha band to detect seizures and achieved a maximum accuracy of 98%. The short-time Fourier transform (STFT) was used to extract features, and four time-frequency statistical features were fed to different classifiers to analyze the performance [31]. In another work, Haralick features were extracted from the gamma band, followed by a decision tree classifier, achieving a 96% success rate in distinguishing between seizures and healthy persons [33]. With classic feature detectors, there is always a possibility that some important information is partially or fully missed when selecting the features. In the proposed method, however, the features are extracted directly from the EEG data with no preprocessing steps, transformations, or feature selection techniques involved, so the maximum information is potentially extracted.

Acharya et al. [1] were the first to use a CNN for epileptic seizure detection. They presented a network with 13 layers, which gives 88.7% accuracy for three-class classification on the Bonn EEG dataset. They trained the model for 150 epochs. In [4], the authors presented a CNN structure with 18 layers for feature extraction followed by an RF classifier to detect neonatal seizures and achieved 77% accuracy. Raghu et al. [29] used pretrained networks such as Alexnet, Vgg16, Vgg19, and other landmark models on the Temple University dataset. They presented two approaches: (a) transfer learning and (b) extracting features from pretrained networks followed by an SVM classifier. The transfer learning approach achieved 82.85% accuracy, and the feature extraction approach achieved 88.30% accuracy.

Almost all deep learning networks are very time-consuming to train. The advantage of the proposed methodology over other deep learning techniques is that it does not require any preprocessing step or conversion of the 1-D EEG signal to 2-D. On average, only 7 epochs are required to train the model, and each epoch takes approximately 10 s. The simulations were carried out on an Intel(R) Core(TM) i7-8700 CPU @ 3.2 GHz with 8 GB RAM. The total training time is less than 2 min. The reduced number of stages and epochs makes this model suitable for portable/wearable devices.

Table 6 Comparison of the proposed algorithm with other studies on Bonn EEG dataset

The limitations of the proposed methodology need to be acknowledged. The parameters of the model, such as the number of filters and the number of convolutional layers, have been chosen by trial and error. The experiments were performed on a small dataset. To generalize the results, we will apply the method to a large dataset in the future.

6 Conclusions

A CNN-based framework for feature extraction followed by a machine learning classifier was introduced in this paper. The proposed methodology does not require a hand-engineered feature extraction process; it extracts the features automatically from the training data. The small number of epochs (seven on average in the proposed technique) makes it faster than other algorithms, which is a prime condition for real-time application. All classifiers show almost the same accuracy across the different data clusters, which indicates that the extracted features are robust. The model was also compared with a variant in which the classifier part is replaced by the softmax function. It is concluded that machine learning classifiers give higher accuracy in fewer epochs than the pure deep learning model. The proposed detection system will help neurologists make correct decisions. Both cloud-based and standalone systems can be developed using the proposed model.