1 Introduction

Epilepsy is a brain disorder characterized by abnormal electrical discharges. Recurrent episodes of this abnormality lead to seizures. Neurons communicate with one another through electrical signals; when this communication is disrupted, a seizure occurs (Harender & Sharma, 2017).

Timely seizure detection is crucial. The electroencephalogram (EEG) measures the electrical activity of the brain, and EEG recordings can be analyzed automatically to identify abnormal brain activity.

When seizures are infrequent, long-term EEG recording is necessary to detect them. Frequent seizure episodes call for efficient detection methods that help keep people safe. Seizure detection poses many challenges. Automated systems are being developed to detect seizures and to distinguish clearly between instances with and without seizures.

Epileptic seizures occur at random in time, and the EEG signals pass through the successive steps of signal pre-processing, feature extraction, and classification. These steps reflect the states of the brain (Thakare & Gore, 2019). Many researchers have proposed techniques to improve the performance of seizure detection (Kalbhor & Harpale, 2016).

EEG classification with sparse Bayesian learning improves the performance of machine learning for the application of brain-computer interface (Jin et al., 2020).

Feature selection is an important step in data preprocessing and machine learning (ML) (Anter et al., 2020a, 2021a; Basha et al., 2021; ElSoud & Anter, 2016). Features may be linear or nonlinear, high-dimensional, highly correlated, and very numerous, all of which can substantially decrease detection accuracy (Li et al., 2020). It is therefore desirable to develop a feature selection method that identifies the optimal number of features so that the model continues to work with the highest efficiency. Feature selection methods based on bio-inspired optimization algorithms have been successful in different applications (Anter & Ali, 2020; Anter et al., 2020b, 2020c, 2020d; Azar et al., 2020). Therefore, this study aims to develop a new model based on Probabilistic Particle Swarm Optimization (PPSO) and Sequential Differential Evolution (SDE) for selecting the optimal subset of neuro-features. In addition, machine learning algorithms can be trained well with the support of evolutionary algorithms such as Genetic Algorithms (GA) (Ríos et al., 2020). A distributed control approach for real-time data uses an adaptive ensemble classification framework (Sufang, 2020). An enhanced version of Differential Evolution uses roulette wheel selection (Ho-Huu et al., 2018). Structural optimization with frequency constraints is a well-known, highly nonlinear problem. Complex optimization problems may have many optimum solutions; for example, particle swarm optimization (PSO) has been used to search the parameter space of a cascade controller efficiently, with an integrated mathematical model of the hydraulic actuator of a parallel robot platform (Nedic et al., 2014). Moreover, the cascade controller's parameters for a parallel robot platform can be searched efficiently using a swarm intelligence technique such as the firefly algorithm (Nedic et al., 2015).

Long-term EEG recording is needed to detect infrequent seizures, and efficient detection methods are required to handle frequent seizure episodes and to keep people safe.

To address the challenges of seizure detection and classification, automated seizure detection systems are being developed.

The classification of seizure and non-seizure data is a challenging task when dealing with complex transformed features with ultra-high dimensions and extracting the best possible features from EEG. Therefore, there is a need to design and develop a method to select optimal features. This article presents a new hybrid approach to select the optimal features.

The rest of this paper is structured as follows: Sect. 2 introduces the related work on epileptic EEG data. Section 3 discusses the proposed new hybrid method, which is designed to generate optimal subsets of features. In Sect. 4, experimental results and analysis are presented. Finally, conclusions and remarks are presented in the last section.

2 Related research

The most widely accepted and authorized data repository for this task is that of the University Hospital of Bonn.

The research literature discusses several schemes for the analysis of EEG data. The reported values of the performance measures vary with the method selected. Accuracy, sensitivity, and specificity are the most commonly used measures.
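As an illustration of how these three measures are obtained from binary predictions, a minimal Python sketch follows. The function name and the label convention (1 for seizure, 0 for non-seizure) are assumptions made here for illustration and are not taken from the reviewed studies.

```python
def seizure_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity for binary labels.

    `positive` marks the seizure class (an assumed convention).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1          # seizure correctly detected
        elif pred == positive:
            fp += 1          # false alarm
        elif truth == positive:
            fn += 1          # missed seizure
        else:
            tn += 1          # non-seizure correctly rejected
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


example = seizure_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                          [1, 1, 0, 0, 0, 0, 1, 0])
```

Sensitivity is the fraction of seizure segments that are detected, and specificity is the fraction of non-seizure segments that are correctly rejected, so the two should be read together with accuracy when the classes are imbalanced.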

Visual analysis of EEG signals is time-consuming, which motivates the automated detection of seizures with a deep convolutional neural network (Acharya et al., 2018). A deep convolutional neural network with thirteen layers is presented for the detection of three classes, namely normal, preictal, and seizure.

One limitation in clinical applications, where annotated data are limited, is the large amount of training data required. In seizure detection research, supervised classification methods are recommended for the accurate detection of epilepsy-related activity in real-time EEG data. EEG records non-stationary, multi-frequency brain signals from various localized channels.

EEG signals have complex features. In the classification task, the non-stationary nature of the EEG must be taken into account, which is an important aspect of the analysis. Both linear and nonlinear tools are used to analyze the data. Linear tools can be applied to nonlinear signals under stationarity assumptions, but this leads to loss of information in many cases. Nonlinear tools are available that are better suited to signals that are neither stationary nor linear (Quintero-Rincón et al., 2018).

Several transforms are available for the frequency analysis of non-stationary EEG signals and for decomposing EEG data into several bands. The Fourier Transform (FT) is mostly unsuitable for non-stationary signals, so the Short-Time Fourier Transform is recommended instead. Its one constraint is the finite window length, which gives poor resolution. The Wavelet Transform (WT) addresses this problem and is used to decompose the EEG data (Quintero-Rincón et al., 2018).

Decomposing the EEG signals leads to better feature extraction. The Discrete Wavelet Transform (DWT) is a promising technique for decomposing EEG data. Features such as the average power, standard deviation, and mean absolute value are extracted. These features are useful for further analysis with k-nearest neighbor (Quintero-Rincón et al., 2018).

Discrete Wavelet Transform (DWT) and Continuous Wavelet Transform (CWT) can both be applied to EEG data. The core difficulty with the CWT is redundancy, so the DWT, which is a non-redundant and highly efficient transform, is used for the decomposition. It decomposes the signal at different levels using wavelets (Quintero-Rincón et al., 2018).

In the classification of EEG signals, the choice of algorithm and the feature selection step play a major role. They help extract the correct and meaningful features for the objective of the problem.

The genetic algorithm is a well-known optimization technique for selecting optimal features. Genetic algorithms combined with classifiers have performed the classification task with better accuracy (Sathish et al., 2017).

Research on the wavelet transform for feature extraction and seizure detection has been presented. It observes that the wavelet transform is the best-performing method, producing results efficiently with good accuracy and sensitivity (Thakare & Gore, 2019).

A hybrid model for the detection of epileptic seizures was established (Subasi et al., 2019). This work uses optimization methods such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) to search for optimal parameters for the task of EEG classification. The DWT is used to analyze the EEG signals. The statistical feature values calculated include the average power, the mean, and the standard deviation.

Frequency analysis typically uses the Fourier transform. Because the EEG signal is by nature neither stationary nor linear, the Fourier Transform is mostly not the preferred choice. The Short-Time Fourier Transform (STFT) is one way to overcome this (Deshprabhu & Shenvi, 2015).

A study of intelligent classifiers (Garde et al., 2018) discusses the performance of several classifiers on EEG data. It observes that analyzing the EEG data before the classification task leads to better performance.

Hybrid swarm intelligence for clustering with multiple objectives was proposed (Thakare & Kharche, 2017), which includes the development of a hybrid algorithm for clustering tasks. The hybridization of PSO with traditional clustering algorithms is discussed.

The support vector machine (SVM) is a well-known technique for pattern classification (Ganji et al., 2019). Fuzzy Neural Network learning algorithms mostly focus on minimizing the risk. SVM gives excellent generalization performance by simultaneously minimizing both empirical and expected risks.

The differential evolution algorithm has been used to optimize the weights of base learners in an ensemble approach (Zhang et al., 2017), in which several weak learners are combined to improve classification performance. In feature selection with differential evolution, one of the important steps is the elimination of noisy features (Hancer, 2020), which helps discover meaningful knowledge. Differential evolution (DE) is an efficient evolutionary method for numerical optimization (Shen et al., 2016). In classification, multi-objective feature selection (Xue et al., 2014) deals with the conflicting objectives of minimizing the number of features and maximizing the accuracy. A cooperative micro-DE has been developed (Parsopoulos, 2009) for the optimization of large amounts of data. A hybrid self-adaptive Differential Evolution (Zhao et al., 2011) optimizes large-scale data. Approaches to improving DE for the optimization of high-dimensional functions are also discussed (Brest & Maučec, 2011; Wang et al., 2011).

Differential evolution (DE) is a scheme for global optimization over continuous spaces, but it can converge slowly. A family of DE variants uses a neighborhood-based mutation operator (Das et al., 2009), which relies on the concept of the neighborhood of each population member. That work aims to balance the exploration and exploitation abilities of DE without imposing a serious additional burden in terms of function evaluations.

An intelligent epileptic seizure model was proposed based on a global-search whale optimization algorithm and a local-search genetic algorithm for selecting the optimal subset of neuro-features. In addition, metaheuristic algorithms were applied to an extreme learning machine, using differential evolution (DE) to adapt the weights and biases (Anter et al., 2021b).

Feature selection for misuse detection has been examined in the context of intrusion detection systems (Chebrolu et al., 2005). The performance of feature selection algorithms is compared across various classifiers. The selection of significant input features is found to be important for designing a lightweight system. A hybrid architecture is designed for real-world intrusion detection.

Feature selection with bi-objective Differential Evolution (Das et al., 2017) addresses one of the most demanding tasks in data mining and pattern recognition. An improved binary differential evolution simultaneously optimizes the feature selection criteria. The main contributions of this paper are:

  • A new hybrid method, Probabilistic Particle Swarm Optimization (PPSO), is designed to generate an optimal subset of features.

  • Features are extracted from EEG signals using the coefficients of the wavelet sub-bands and the energy of those coefficients.

  • A state-of-the-art review of the significance of bio-inspired algorithms for feature optimization of EEG signals is presented.

  • The future direction of the proposed framework for large-scale data is discussed.

A comparative study of different methods for detecting epileptic seizure disorder is presented in Table 1.

Table 1 Comparative Analysis of some Existing Methods

3 Proposed model

The proposed new hybrid method is designed to generate optimal subsets of features. Optimal features are essential for the classification algorithm to work efficiently, because computing complex transformed features is expensive and produces an ever-growing mass of features.

The proposed PPSO is a stochastic search scheme that offers simple computation and rapid convergence. Algorithms for stochastic optimization are highly problem dependent. A computational complexity can be stated, but it does not indicate how much better one algorithm is than another; that depends on the objective to be accomplished. We implemented the PPSO algorithm and observed the time it takes to reach a satisfactory answer.

The computational burden is determined mainly by the optimization problem rather than by the algorithm. In this research, the optimal feature set is selected using a probabilistic function. The algorithm is executed for a number of cycles and its performance is observed. The parameters of the objective function change the complexity of the optimization problem and, as a result, of the algorithm employed. The time complexity of PPSO is O(N²).

PPSO tracks the topological neighbors of a particle in order of best position. The velocity of each particle is updated at each time step using the individual and global best positions. When PPSO is combined with the traditional probabilistic function, the computation remains simple and the computational load barely increases. Premature convergence is also mitigated by the ability of PPSO to adapt the ratio of local to global exploration of the problem space. The number of iterations needed to obtain a good outcome varies with the objective function. Too few iterations end the search prematurely, whereas too many iterations increase the computational cost and time unnecessarily.
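The velocity update described above can be sketched with the standard global-best PSO rule. This is a minimal illustration of plain PSO, not the PPSO algorithm itself: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the search bounds, and the function name are all assumed values chosen for illustration, and the neighborhood is simplified to the whole swarm.

```python
import random


def pso_minimize(objective, dim, n_particles=20, iterations=50,
                 lower=-5.0, upper=5.0, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO sketch; all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # individual best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best position
    history = [gbest_val]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # New velocity: inertia + pull to own best + pull to global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def sphere(x):
    return sum(v * v for v in x)


best_position, best_value, best_history = pso_minimize(sphere, dim=2)
```

The `iterations` argument makes the trade-off in the text concrete: the recorded best value can only stay the same or improve from one iteration to the next, so extra iterations cost time without any guarantee of further gain.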

The proposed method is divided into three phases. In phase I, features are extracted from the EEG signals using wavelet sub-bands to form the feature vector. In phase II, the optimal feature subsets of the non-stationary EEG signals are extracted with the Sequential Differential Evolution (SDE) methodology and compared with the optimal feature subsets extracted with the new Probabilistic Particle Swarm Optimization (PPSO) algorithm.

Generally, the EEG signal comprises different frequency bands, namely alpha, beta, delta, gamma, and theta. The signal is decomposed into sub-bands, and the parameters calculated from them are called features.

In phase III, the performance of SDE and PPSO is evaluated with different classification algorithms which classify seizure and non-seizure EEG signals. Figure 1 represents the workflow of the proposed model.

Fig. 1

Workflow of the proposed model. Here, DWT is Discrete Wavelet Transform, SDE is Sequential Differential Evolution, PSO is Particle Swarm Optimization, PPSO is Probabilistic Particle Swarm Optimization, NB is Naïve Bayes Algorithm, SVM is Support Vector Machine, DT is Decision Tree algorithm, RF is Random Forest Algorithm, K-NN is K Nearest Neighbour Algorithm

3.1 Data acquisition and signal decomposition

This subsection discusses the data acquisition process. The EEG data are taken from the data repository of the Epileptology Section of the University of Bonn (Wang et al., 2011). The data cover a total of 500 individuals: each of the five subsets, A to E, contains 100 single-channel EEG segments. Each file records brain activity for 23 s and is sampled into a time series of 4097 data points.

Each recording is divided into chunks of 1 s, and each chunk holds 178 data points. Row-wise, the data therefore comprise 23 multiplied by 500, that is, 11,500 pieces of information, and column-wise each row holds the 178 data points of one second. Sets A and B consist of segments extracted from surface EEG recordings (Garde et al., 2018; Tzimourta et al., 2019). These data come from five healthy volunteers who were awake with their eyes open (A) and closed (B).
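The row count above can be checked with a short sketch. It assumes that each 4097-point recording is cut into consecutive 178-point chunks and that the three samples left over after 23 full chunks are dropped; that handling of the remainder, the function name, and the placeholder signals are assumptions, since the text does not describe them.

```python
def split_into_chunks(signal, chunk_len=178):
    """Cut one recording into consecutive full chunks of `chunk_len` points.

    Trailing samples that do not fill a chunk are dropped (an assumption).
    """
    n_chunks = len(signal) // chunk_len
    return [signal[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]


# 500 placeholder recordings of 4097 points each (synthetic zeros, not real EEG).
recordings = [[0.0] * 4097 for _ in range(500)]
rows = [chunk for rec in recordings for chunk in split_into_chunks(rec)]
n_rows, n_cols = len(rows), len(rows[0])
```

With these sizes, 23 full chunks fit in each recording (23 × 178 = 4094), which yields the 23 × 500 = 11,500 rows of 178 columns stated in the text.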

The EEG repository also contains recordings from presurgical diagnosis, taken from five different patients. These form the other three sets (C, D, and E). Sets C and D contain seizure-free activity, whereas set E contains seizure activity. Figure 2 shows EEG signals from the five sets, covering seizure and seizure-free activity.

Fig. 2

Different EEG signals for seizure and seizure-free activity. Here, set A is recorded with eyes open and set B with eyes closed, sets C and D contain seizure-free activity, and set E contains seizure activity

The DWT produces approximation and detail coefficients. The signal is passed through high-pass and low-pass filters, and this is repeated until the desired frequency band is obtained. In this way, the DWT decomposes the signal into high-frequency and low-frequency sub-bands.

The division into high-frequency and low-frequency sub-bands continues until the entire signal has been decomposed. The approximation coefficients are the low-frequency coefficients, and the detail coefficients are the high-frequency coefficients. Decomposing the EEG data into various bands yields frequency ranges such as delta, theta, alpha, beta, and gamma.
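The repeated low-pass and high-pass filtering can be sketched as follows. For brevity, this illustration uses the Haar wavelet rather than the Daubechies wavelet used in this work, and the function names and the padding of odd-length signals are assumptions. At each level the low-pass output (approximation) is split again, while the high-pass output (detail) is kept.

```python
import math


def haar_step(signal):
    """One level of Haar analysis: returns (approximation, detail)."""
    data = list(signal)
    if len(data) % 2:            # pad odd lengths by repeating the last sample
        data.append(data[-1])    # (an assumed boundary treatment)
    s = math.sqrt(2.0)
    approx = [(data[i] + data[i + 1]) / s for i in range(0, len(data), 2)]
    detail = [(data[i] - data[i + 1]) / s for i in range(0, len(data), 2)]
    return approx, detail


def haar_wavedec(signal, levels=4):
    """Multi-level decomposition: returns (A_levels, [D1, D2, ..., D_levels])."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)   # only the low band is split again
        details.append(detail)
    return approx, details


a4, (d1, d2, d3, d4) = haar_wavedec([float(i) for i in range(1, 17)], levels=4)
```

Each level halves the number of coefficients, so D1 covers the highest frequencies at the finest time resolution and A4 holds the lowest-frequency content.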

3.2 Feature extraction and feature selection

In the literature, features are extracted in both statistical and non-statistical domains. In this research, the features are computed for the alpha, beta, and gamma bands. The six features are evaluated on the wavelet coefficients of each sub-band; they include the mean absolute value, the variance, the standard deviation, the average power, and the energy of the coefficients.

The mean gives information about the frequency of the signal, and the average power gives information about its frequency content. The standard deviation measures the magnitude of the variation in the signal frequency. These features are derived from the approximation A4 and the details D1-D4, as shown in Fig. 3, and the resulting feature vectors are used to classify the EEG data. Four decomposition levels are selected here, and Daubechies wavelets are selected and experimented with to analyze the EEG recordings. The extracted wavelet coefficients are not fed to the classifier directly, because doing so would increase the dimensionality of the classifier input.
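A minimal sketch of how such statistics can be computed from the coefficients of one sub-band is given below. The function names are hypothetical, and the use of the population variance (division by the number of coefficients) is an assumption, since the text does not state which estimator is used.

```python
import math


def subband_features(coeffs):
    """Statistical features of one wavelet sub-band (sketch)."""
    n = len(coeffs)
    mean = sum(coeffs) / n
    energy = sum(c * c for c in coeffs)
    variance = sum((c - mean) ** 2 for c in coeffs) / n   # population variance
    return {
        "mean_abs": sum(abs(c) for c in coeffs) / n,
        "variance": variance,
        "std": math.sqrt(variance),
        "avg_power": energy / n,
        "energy": energy,
    }


def feature_vector(subbands):
    """Concatenate the features of several sub-bands into one flat list."""
    order = ("mean_abs", "variance", "std", "avg_power", "energy")
    vector = []
    for coeffs in subbands:
        feats = subband_features(coeffs)
        vector.extend(feats[name] for name in order)
    return vector


example_vector = feature_vector([[1.0, -1.0, 3.0, -3.0], [0.5, 0.5]])
```

Summarizing each sub-band with a handful of statistics keeps the classifier input short: the vector length is the number of sub-bands times the number of statistics, regardless of how many coefficients each sub-band contains.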

Fig. 3

Four-level wavelet decomposition of the EEG

The DWT has been used successfully in research on seizure datasets. In this research, the data are preprocessed and noise is removed with a Gaussian filter before the DWT is applied. Therefore, no further filtering is required to remove noise.

The features considered for selection may differ in their contribution to classification accuracy. In the proposed work, SDE, PSO, and PPSO select the prominent feature vectors.

3.3 Feature analysis using analysis of variance (ANOVA)

A one-way ANOVA test was performed with one independent variable, attention, and one dependent variable, predefinedlabel. The degrees of freedom (df) value is 1.0; it represents the maximum number of logically independent values in the data sample, that is, values that are free to vary. The F statistic, which compares the variability among samples with the variability within samples, is 0.03113.
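For reference, the F statistic of a one-way ANOVA is the between-group mean square divided by the within-group mean square. The sketch below computes it for groups of feature values, one group per label; the function name and the toy numbers are assumptions for illustration and do not reproduce the values reported here.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variability among the group means.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variability of the values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy feature values for two labels (made-up numbers).
f_value, df_b, df_w = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```

With two label groups the between-group degrees of freedom equal 1, and an F value well below 1 indicates that the group means differ less than the spread within the groups.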

Similarly, the one-way ANOVA test was performed for all features, categorical and numerical, and the df and F values were calculated. Table 2 shows the one-way ANOVA statistics for all features.

Table 2 The statistics for 1 way ANOVA for all EEG features

3.4 Probabilistic particle swarm optimization (PPSO) for optimal feature selection

Optimization methods can be used to discover the parameters of the classification algorithm. PPSO is a newly developed hybrid algorithm. It uses a probabilistic function to calculate the probability that an unknown pattern belongs to a particular class, and this probability value serves as the fitness function of PSO (Thakare & Kharche, 2017). The basic logic of PSO is thus modified for the classification task. Both PSO and PPSO fall into the category of nature-inspired computing. PSO is popular for optimization, whereas PPSO has been developed and tested on various datasets for optimization results.

The Naive Bayes technique is used in PPSO to predict the class of the test dataset. When the independence assumption holds, a naive Bayes classifier can outperform other classifiers such as logistic regression and requires less training data. At its most basic level, the Naive Bayes method assesses the probability of a feature's connection to a target variable and then selects the feature with the highest probability. This concept is applied in constructing the Probabilistic Particle Swarm Optimization method, which is used to choose the best features. The probability is determined and fed into the fitness function of the PSO algorithm as an input. Figure 4 shows the workflow of the probabilistic particle swarm optimization algorithm, and Table 3 gives its pseudo-code.
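One way a naive Bayes probability could serve as a fitness value for a candidate feature subset is sketched below. This is an illustrative assumption, not the pseudo-code of Table 3: the Gaussian likelihood, the scoring on the training samples themselves, and the names `fit_gaussian_nb`, `posterior`, and `probabilistic_fitness` are all hypothetical choices.

```python
import math


def fit_gaussian_nb(X, y):
    """Per class: prior, per-feature mean and variance (Gaussian assumption)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def posterior(model, x, mask):
    """Class probabilities for sample x, using only features where mask is 1."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for j, keep in enumerate(mask):
            if keep:
                score += (-0.5 * math.log(2.0 * math.pi * variances[j])
                          - (x[j] - means[j]) ** 2 / (2.0 * variances[j]))
        log_scores[label] = score
    top = max(log_scores.values())            # subtract max for stability
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def probabilistic_fitness(X, y, mask):
    """Mean probability assigned to the true class; 0.0 for an empty subset."""
    if not any(mask):
        return 0.0
    model = fit_gaussian_nb(X, y)
    return sum(posterior(model, x, mask)[t] for x, t in zip(X, y)) / len(y)


# Toy data: feature 0 separates the classes, feature 1 does not.
X_toy = [[0.0, 5.0], [0.2, 1.0], [0.1, 3.0], [5.0, 5.0], [5.2, 1.0], [5.1, 3.0]]
y_toy = [0, 0, 0, 1, 1, 1]
fitness_informative = probabilistic_fitness(X_toy, y_toy, [1, 0])
fitness_uninformative = probabilistic_fitness(X_toy, y_toy, [0, 1])
```

In a PSO search, each particle would encode one such 0/1 mask, and the swarm would move toward masks with a higher fitness value.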

Fig. 4

The workflow of probabilistic particle swarm optimization algorithm

Table 3 Pseudo-code of PPSO algorithm

3.5 Sequential differential evolution (SDE) for optimal feature selection

Differential Evolution (DE) is an evolutionary computation technique that optimizes the parameters of a problem by iteratively trying to improve candidate solutions in a search space, with respect to a given measure of quality, over a number of generations. The method has proven effective for optimizing large search spaces. Like other evolutionary methods, SDE yields the best possible solution or a near-optimal solution.

DE does not use the gradient of the problem being optimized, which is represented as a multidimensional real-valued function. The optimization problem therefore need not be differentiable, as is required by traditional optimization methods such as gradient descent and quasi-Newton methods. DE can be applied to optimization problems that are noisy, that are not even continuous, or that change over time (Price et al., 2006; Rocca et al., 2011).

DE optimizes a problem by maintaining a population of candidate solutions and creating new candidate solutions by combining existing ones according to simple formulae. It then keeps whichever candidate solution has the best score or fitness for the objective of the optimization problem. The optimization problem is thus treated as a black box that merely provides a measure of quality for a given candidate solution, which is why the gradient is not required.

The Sequential Differential Evolution Algorithm extracts the optimal features from feature space. These optimal features are further evaluated for their performance with different classifiers.

4 Population formation

The most versatile implementation of DE maintains a pair of vector populations, both of which contain Npop T-dimensional vectors of real-valued parameters. The current population, denoted by Qx, is composed of the vectors Ym,g that have already been found acceptable, either as initial points or by comparison with other vectors in the population.

$$ Q_{x,g} = (Y_{m,g}),\;\; m = 0,1,\ldots,N_{pop} - 1,\;\; g = 0,1,\ldots,gt_{\max} $$
(1)
$$ Y_{m,g} = (Y_{n,m,g}),\;\; n = 0,1,\ldots,T - 1 $$
(2)

Indices start at 0 to simplify working with arrays and modular arithmetic. The index g indicates the generation to which a vector belongs; it runs from 0 to gtmax in increments of 1. In addition, each vector is assigned a population index, m, which runs from 0 to Npop - 1. The parameters within each vector are indexed by n, from 0 to T - 1.

The mutation operator in DE acts on randomly chosen vectors. Once the population is initialized, DE mutates randomly chosen vectors to produce an intermediary population, Qv,g, of Npop mutant vectors, Rm,g.

$$ Q_{v,g} = (R_{m,g}),\;\; m = 0,1,\ldots,N_{pop} - 1,\;\; g = 0,1,\ldots,gt_{\max} $$
(3)
$$ R_{m,g} = (R_{n,m,g}),\;\; n = 0,1,\ldots,T - 1 $$
(4)

Each vector in the current population is then crossed with a mutant vector. In this way, a trial population, Qu,g, of Npop trial vectors, um,g, is generated.

$$ Q_{u,g} = (u_{m,g}),\;\; m = 0,1,\ldots,N_{pop} - 1,\;\; g = 0,1,\ldots,gt_{\max} $$
(5)
$$ u_{m,g} = (u_{n,m,g}),\;\; n = 0,1,\ldots,T - 1 $$
(6)

During crossover, or recombination, the trial vectors overwrite the mutant population. Therefore, a single array can hold both populations.

5 Initialization

When the population is initialized, the initial parameter settings are important. Before the population can be initialized, the upper and lower bounds of each parameter must be specified. These bounds are collected into two T-dimensional initialization vectors. Once the bounds have been specified, a random number generator assigns each parameter of every vector a value within the prescribed range.
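A minimal sketch of this initialization step is shown below. The function name and the example bounds are assumptions for illustration; each parameter is drawn uniformly between its lower and upper bound.

```python
import random


def init_population(n_pop, lower, upper, seed=None):
    """Create n_pop vectors whose n-th parameter lies in [lower[n], upper[n]]."""
    rng = random.Random(seed)
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
            for _ in range(n_pop)]


# Example: 10 vectors with T = 3 parameters and made-up bounds.
lower_bounds = [0.0, -1.0, 2.0]
upper_bounds = [1.0, 1.0, 4.0]
population = init_population(10, lower_bounds, upper_bounds, seed=1)
```

Both the population index and the parameter index of the resulting nested list start at 0, in line with the indexing convention used in Eqs. (1) and (2).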

6 Mutation

Mutation is the operator that evolves the target vectors in the population. Once the population is initialized, DE mutates and recombines it to produce a new population of Npop trial vectors. In particular, differential mutation adds a scaled, randomly sampled vector difference to a third vector. This creates the mutant vector, which is a blend of three randomly chosen vectors.

The mutation formula involves the scale factor, the base vector, the target vector, and the difference vector. The scale factor is a positive real number that controls the rate at which the population evolves. Although the scale factor has no upper limit, effective values are seldom greater than 1.0. The base vector index can be determined in various ways. Here, it is assumed to be a randomly chosen vector index that differs from the target vector index.

The difference vector indices are distinct from each other and from both the base and target vector indices. These vectors are selected at random for each mutant.
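The differential mutation described in this section can be sketched as follows. The function name, the default scale factor of 0.8, and the symbols `r0`, `r1`, `r2` for the base and difference vector indices are assumptions introduced for illustration.

```python
import random


def differential_mutation(population, target_index, scale=0.8, rng=random):
    """Return (mutant, (r0, r1, r2)) for one target vector.

    r0 is the base vector index; r1 and r2 form the difference vector.
    All three are distinct and differ from target_index, so the
    population must hold at least four vectors.
    """
    candidates = [i for i in range(len(population)) if i != target_index]
    r0, r1, r2 = rng.sample(candidates, 3)
    base, first, second = population[r0], population[r1], population[r2]
    mutant = [base[n] + scale * (first[n] - second[n])
              for n in range(len(base))]
    return mutant, (r0, r1, r2)


toy_population = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0], [2.0, 2.0], [4.0, 0.5]]
mutant_vector, chosen = differential_mutation(toy_population, target_index=0,
                                              rng=random.Random(3))
```

A larger scale factor produces larger steps away from the base vector, which is how the scale factor controls the rate at which the population evolves.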

7 Crossover

The crossover strategy is also called the recombination technique. To complement differential mutation, the DE model employs a uniform crossover strategy. Crossover builds a trial vector out of parameter values copied from two different vectors; DE crosses each vector in the population with a mutant vector. The crossover probability is a user-defined value that controls the fraction of parameter values copied from the mutant. To determine which source contributes a given parameter, uniform crossover compares the crossover probability with the output of a uniform random number generator.
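A minimal sketch of uniform crossover follows. The function name and the default crossover probability are assumptions, and the forced copy of one randomly chosen parameter from the mutant is the usual DE convention for ensuring that the trial differs from the target; the text above does not state it explicitly.

```python
import random


def uniform_crossover(target, mutant, cr=0.9, rng=random):
    """Build a trial vector from a target vector and a mutant vector."""
    n_rand = rng.randrange(len(target))   # this parameter always comes from the mutant
    trial = []
    for n in range(len(target)):
        if rng.random() <= cr or n == n_rand:
            trial.append(mutant[n])       # inherit from the mutant
        else:
            trial.append(target[n])       # inherit from the target
    return trial


trial_vector = uniform_crossover([0.0] * 5, [1.0] * 5, cr=0.5,
                                 rng=random.Random(0))
```

With a crossover probability of 1.0 the trial is a copy of the mutant, whereas a probability close to 0 leaves the target almost unchanged apart from the one forced parameter.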

8 Selection

In each run, the trial vector is compared with the target vector on the objective function value. If the value of the trial vector is less than or equal to that of the target vector, the trial vector replaces the target vector and becomes the target vector in the next generation; otherwise, the target vector is retained. DE compares every trial vector with the target vector from which it inherits parameters. In this way, DE integrates crossover and selection more tightly than other evolutionary methods.

Once the new population is installed, the processes of mutation, crossover, and selection are repeated. This cycle is executed until the optimum is located or a termination criterion is satisfied. For example, the number of generations can be compared with the preset maximum, gtmax.
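The full cycle of mutation, crossover, and selection can be sketched as a classic DE loop. This is an illustration of standard DE on a toy objective, not the SDE feature selection procedure itself; the function name, the default parameter values, and the sphere test function are assumptions.

```python
import random


def differential_evolution(objective, lower, upper, n_pop=20,
                           scale=0.8, cr=0.9, gt_max=50, seed=0):
    """Minimize `objective`; returns (best_vector, best_value, history)."""
    rng = random.Random(seed)
    dim = len(lower)
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
           for _ in range(n_pop)]
    scores = [objective(v) for v in pop]
    history = [min(scores)]
    for _ in range(gt_max):                      # stop at the preset maximum
        next_pop, next_scores = [], []
        for m in range(n_pop):
            others = [i for i in range(n_pop) if i != m]
            r0, r1, r2 = rng.sample(others, 3)
            n_rand = rng.randrange(dim)
            # Mutation and uniform crossover combined into one trial vector.
            trial = [pop[r0][n] + scale * (pop[r1][n] - pop[r2][n])
                     if (rng.random() <= cr or n == n_rand) else pop[m][n]
                     for n in range(dim)]
            trial_score = objective(trial)
            # Selection: the trial replaces the target if it is not worse.
            if trial_score <= scores[m]:
                next_pop.append(trial)
                next_scores.append(trial_score)
            else:
                next_pop.append(pop[m])
                next_scores.append(scores[m])
        pop, scores = next_pop, next_scores
        history.append(min(scores))
    best = min(range(n_pop), key=lambda i: scores[i])
    return pop[best], scores[best], history


def sphere(x):
    return sum(v * v for v in x)


best_vector, best_score, score_history = differential_evolution(
    sphere, lower=[-5.0] * 3, upper=[5.0] * 3)
```

Because a target vector is only ever replaced by a trial that is at least as good, the best score in the population never gets worse from one generation to the next.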

8.1 Classification of optimal feature subsets

The features are calculated from the parameters of the decomposed sub-bands, using the wavelet coefficients. The feature selection process optimizes the feature subset, which is then used as the classifier input. The Naïve Bayes, Support Vector Machine, Random Forest, k-Nearest Neighbor, and Decision Tree algorithms are used in this research.

The features are extracted at various decomposition levels. The seizure abnormality is detected with classifiers and accuracy is measured.

8.1.1 Naïve Bayes algorithm (NB)

This algorithm predicts the class of a test dataset easily and quickly. With limited training data, it can work better than logistic regression and other classifiers. The algorithm calculates the relation of each feature to the target variable in terms of probability, and the selected feature set has the highest probability (Anter et al., 2021b).

NB is a probability-based classification method that uses Bayes' theorem. It assumes that, within a given class, each feature is independent of the others. It requires little training data for classification.

8.1.2 Support vector machine (SVM)

SVM is a supervised learning algorithm. It is first trained on the labeled EEG feature set, in which each sample is tagged as seizure or non-seizure. From the training samples, it builds a model that assigns new input samples to one of the categories.

The training samples are represented as points and separated by category. SVM constructs a hyperplane, or sometimes a set of hyperplanes, to separate them. The support vectors are the points nearest the hyperplane. SVM searches for the hyperplane that maximizes the margin between the support vectors of the two categories. SVM handles high-dimensional data well and produces good results (Ho-Huu et al., 2018).

8.1.3 Random forest (RF)

Random Forest is a widely used method. According to research studies, RF is a good choice for classifying complex data. If trained systematically, RF can predict various types of patient conditions. RF is capable of classifying various degrees of predictions in epilepsy and works with multiple conditions. The model is not very sensitive to the removal of some features, which makes the classification rule more robust (Rajesh et al., 2015; Wang et al., 2019).

8.1.4 K-nearest neighbor (k-NN)

K-Nearest Neighbor is a simple, non-linear classification method. The algorithm performs well even with large training data. It is simple because it does not require separate training and testing phases (Quintero-Rincón et al., 2018; Sathish et al., 2017).
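The absence of a training phase is visible in a minimal sketch: the classifier simply stores the labeled feature vectors and, for each query, lets the k closest ones vote. The function name, the Euclidean distance, the value of k, and the toy data are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest neighbors."""
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors: label 0 = non-seizure, label 1 = seizure (made-up data).
toy_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
toy_y = [0, 0, 0, 1, 1, 1]
predicted = knn_predict(toy_X, toy_y, [0.2, 0.1])
```

Every prediction scans all stored samples, so the cost of classification grows with the size of the training data even though no model is fitted beforehand.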

8.1.5 Decision tree classifier (DT)

The decision tree is one of the most commonly chosen machine learning algorithms. In a decision tree, nodes represent attributes or features, branches or links represent decision rules, and each leaf represents an outcome. It is good at classifying observed data. The Decision Tree technique produces an inverted tree representation (Castro, 2006; Rajesh et al., 2015).

9 Result and analysis

This section presents and analyses the results obtained with several machine learning algorithms and performance measures. A wavelet-based method, the Discrete Wavelet Transform (DWT) with four-level decomposition, detects epileptic activity automatically by extracting the features that contribute most to detection. The Daubechies wavelet (db4) is selected for this work, and the Mean, Variance, Standard deviation, Average power and Energy are calculated in each sub-band. These sub-bands are the alpha, beta and gamma bands. They reflect abnormalities of brain activity and the presence of seizure disorder, and the statistics computed on them form the extracted feature vectors.

The feature vectors extracted by DWT with four-level decomposition are further optimized with three methods, namely Particle Swarm Optimization (PSO), the new Probabilistic Particle Swarm Optimization (PPSO) and Sequential Differential Evolution (SDE). The resulting features are selected and the final feature vectors are generated.

The machine learning algorithms classify the features extracted with DWT and selected with PSO, PPSO and SDE into two groups: features that account for seizures and features that do not contribute to describing seizures.
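The extraction step can be sketched as follows. To keep the example dependency-free, the Haar wavelet is used as a stand-in for db4, so the coefficients differ from those obtained in this work. The function names, the synthetic test signal, the use of population variance and the definitions energy = sum of squared coefficients and average power = energy divided by the number of coefficients are all assumptions made for illustration.

```python
import math
import statistics


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def haar_wavedec(signal, levels=4):
    """Four-level decomposition by default: detail bands D1..D4 plus A4."""
    bands, approx = {}, list(signal)
    for level in range(1, levels + 1):
        approx, detail = haar_step(approx)
        bands[f"D{level}"] = detail
    bands[f"A{levels}"] = approx
    return bands


def band_features(coeffs):
    """The five statistics named in the text, computed for one sub-band."""
    energy = sum(c * c for c in coeffs)
    return {
        "mean": statistics.fmean(coeffs),
        "variance": statistics.pvariance(coeffs),
        "std": statistics.pstdev(coeffs),
        "avg_power": energy / len(coeffs),
        "energy": energy,
    }


# Synthetic 64-sample signal (hypothetical, not EEG data).
signal = [
    math.sin(2 * math.pi * 3 * t / 64) + 0.5 * math.sin(2 * math.pi * 11 * t / 64)
    for t in range(64)
]
bands = haar_wavedec(signal, levels=4)
features = {name: band_features(c) for name, c in bands.items()}
for name, feats in features.items():
    print(name, {k: round(v, 4) for k, v in feats.items()})
```

Concatenating the five values from every sub-band gives one feature vector per EEG segment, which is the input to the selection and classification stages.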
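The selection step can be illustrated with a generic binary PSO that uses a sigmoid transfer function. This is a textbook variant shown only to convey the idea; it is not the PPSO or SDE algorithm proposed in this work. The parameter values, the function names and the toy fitness function are hypothetical. In practice the fitness of a feature mask would be obtained from a classifier trained on the selected features.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=10, n_cycles=100,
               w=0.7, c1=1.5, c2=1.5, v_max=6.0, seed=0):
    """Generic binary PSO; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    # Each particle is a 0/1 mask over the features with a real-valued velocity.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_cycles):
        for i in range(n_particles):
            for d in range(n_features):
                # Pull the velocity toward the personal and global best masks.
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # Sigmoid transfer: the velocity sets the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Toy fitness (hypothetical): reward three "informative" features and
# penalise every selected feature, so small informative subsets score best.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.25 * sum(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_features=8, n_cycles=100)
print(best_mask, best_fit)
```

Recording `gbest_fit` at selected cycles (for example 20, 40, 60, 80 and 100) would give the kind of per-cycle comparison reported later in this section.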

In this research, the performance of various machine learning algorithms is compared for classifying the feature set with and without optimization. The algorithms tested are Naïve Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF) and K-Nearest Neighbour (KNN). Initially, the feature vectors are extracted with the discrete wavelet transform; at this stage the classification performance of the Support Vector Machine surpasses that of the other classifiers. The optimal feature sets are then selected using three algorithms: Particle Swarm Optimization (PSO), the newly proposed Probabilistic Particle Swarm Optimization (PPSO) and Sequential Differential Evolution (SDE). The classification of these optimal feature sets is compared using the same set of classification algorithms. The results of all the feature optimization and selection methods are recorded over a number of cycles and the statistical results are presented.

The classification performance is evaluated with three measures: accuracy, sensitivity and specificity.

These measures are derived from the confusion matrix, also known as an error matrix. The matrix allows the performance of the proposed algorithm to be visualized and compared with that of other algorithms. Each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class (or vice versa).

Accuracy is calculated for training and validation. It gives the correct classification rate, as in Eq. (7). Here, True Positives (TP) are positive tuples that were correctly labeled, True Negatives (TN) are negative tuples that were correctly labeled, False Negatives (FN) are positive tuples that were mislabeled as negative, and False Positives (FP) are negative tuples that were incorrectly labeled as positive.

$$ {\text{Accuracy}} = \frac{TP + TN}{TP + FP + FN + TN} \times 100 $$
(7)

Sensitivity is the true positive rate, or recall, which measures the proportion of actual positives that are correctly identified. Specificity is the true negative rate, which measures the proportion of actual negatives that are correctly identified. Sensitivity and specificity are calculated using Eqs. (8) and (9), respectively, where P = TP + FN is the number of actual positives and N = TN + FP is the number of actual negatives.

$$ {\text{Sensitivity}} = \frac{TP}{P} $$
(8)
$$ {\text{Specificity}} = \frac{TN}{N} $$
(9)
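The three measures can be computed directly from the confusion-matrix counts, as in the following sketch. It takes P = TP + FN and N = TN + FP, treats "seizure" as the positive class, and uses hypothetical labels; the function names are introduced here for illustration.

```python
def confusion_counts(actual, predicted, positive="seizure"):
    """Count TP, FP, FN and TN for a two-class problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p == positive:
            fp += 1
        elif a == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Accuracy in percent (Eq. 7); sensitivity and specificity as fractions."""
    return {
        "accuracy": 100 * (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),   # TP / P
        "specificity": tn / (tn + fp),   # TN / N
    }


# Hypothetical labels: 4 seizure and 6 non-seizure instances.
actual = ["seizure"] * 4 + ["non-seizure"] * 6
predicted = (["seizure"] * 3 + ["non-seizure"]        # one seizure missed
             + ["non-seizure"] * 5 + ["seizure"])     # one false alarm

counts = confusion_counts(actual, predicted)
result = metrics(*counts)
print(counts, result)
```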

Table 4 depicts the results of the classification of feature vectors with DWT. These results are tabulated to compare the classification accuracy with and without optimization of the feature set.

Table 4 Classification of Feature Extracted with DWT

The classifiers are first evaluated without any feature selection, using the features that the discrete wavelet transform extracts from the EEG data. With accuracy as the measure, the Support Vector Machine surpasses the other classifiers. These results are documented so that they can be compared with the classification results for the optimized feature vectors. The remainder of this section compares the feature selection methods PSO, PPSO and SDE across the classification algorithms. The results of all the feature optimization and selection methods are recorded at cycles 20, 40, 60, 80 and 100. The SVM classifier performed best, giving an accuracy of 97.53% with PSO, 97.74% with PPSO and 98.34% with SDE at the 100th cycle, the converged state.

The statistical results are presented in Tables 5, 6, 7, 8 and 9.

Table 5 Comparison of performance of feature selection algorithm at 20th cycle
Table 6 Comparison of performance of feature selection algorithm at 40th cycle
Table 7 Comparison of performance of feature selection algorithm at 60th cycle
Table 8 Comparison of performance of feature selection algorithm at 80th cycle
Table 9 Comparison of performance of feature selection algorithm at 100th cycle

10 Observations on results

The experiments are carried out on the dataset described in the subsection on data acquisition and decomposition. The results of the experimentation are presented in Tables 2, 3 and 4. The algorithms converge when the number of generations exceeds 100. The results of all the feature optimization and selection methods are recorded at cycles 20, 40, 60, 80 and 100. The performance is closely monitored and the statistical results are tabulated in Tables 5, 6, 7, 8 and 9, respectively. A detailed discussion follows.

  i. Among all the classifiers, Random Forest and KNN perform best when features are extracted from the dataset with DWT, as depicted in Table 4. The intention here is to understand how the classifiers perform with feature extraction.

  ii. The extracted features are then optimized with the evolutionary optimization techniques PSO, PPSO and SDE. The intention here is to optimize the feature set and select the feature vectors. The performance is evaluated with several classifiers. The accuracy of classifying every instance as either seizure or non-seizure is tabulated in Tables 5, 6, 7, 8 and 9. The classification accuracy with an optimized feature set is observed to be higher than the accuracy with the features extracted with DWT.

  iii. Among all the classifiers, SVM performed best, giving an accuracy of 97.53% with PSO, 97.74% with PPSO and 98.34% with SDE at the 100th cycle. When the results of PSO and its improved version PPSO are compared, PPSO surpasses PSO in terms of accuracy for almost all the classifiers, which demonstrates the improved performance of PPSO.

  iv. The performance of PPSO and SDE is compared. With SDE, the accuracy of classifying instances as seizure or non-seizure increases for all classifiers except Naïve Bayes and KNN. The performance of Naïve Bayes increases with PPSO because of its multiple probability calculations: PPSO uses a hybrid fitness function formed by combining a probabilistic function with the objective function of PSO. There is therefore an improvement from PSO to PPSO, but the performance degrades with SDE. KNN relies on nearest-neighbor calculations, which show significant improvement only if there is diversity in the feature set.

10.1 Comparisons among different deep neural network methods

This subsection compares the proposed method with various deep neural network methods to find the most effective solution for the non-linear EEG seizure detection problem. The methods are the extreme learning machine (ELM) with a hard-limit activation function, the cascade-forward neural network (CFNN), the recurrent neural network (RNN), the feed-forward neural network (FFNN) with 10 hidden layers and a maximum of 10 epochs, and the probabilistic neural network (PNN). Table 10 reports the comparison using various measures. As Table 10 shows, the proposed method provided the most trustworthy findings, followed by the ELM, which had an accuracy of about 94%, a sensitivity of about 39% and a specificity of about 97%. These findings indicate that integrating sequential differential evolution (SDE) with the support vector machine (SVM) enhanced the accuracy of the proposed method. Additionally, as shown in Fig. 5, the proposed method outperforms deep neural network methods such as FFNN, CFNN and RNN, producing high accuracy with minimal error and quickly computed results. Overall, the proposed method performs well on the challenging, non-linear epileptic seizure dataset. See Table 10 and Fig. 5 for further details of the results of the various deep neural network methods.

Table 10 Performance results of the proposed method and different deep neural network methods
Fig. 5 Visual results for the performance of the proposed method and different deep neural network methods
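The ELM figures quoted above show that high accuracy can coexist with low sensitivity. Accuracy is a weighted average of sensitivity and specificity, accuracy = p × sensitivity + (1 − p) × specificity, where p is the fraction of positive (seizure) instances. The sketch below solves this identity for p. Because the quoted figures are approximate, the result is only indicative, and the function name is hypothetical.

```python
def implied_positive_fraction(accuracy, sensitivity, specificity):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p."""
    return (specificity - accuracy) / (specificity - sensitivity)


# Approximate ELM figures quoted in the text, expressed as fractions.
p = implied_positive_fraction(accuracy=0.94, sensitivity=0.39, specificity=0.97)
print(f"implied fraction of seizure instances: {p:.3f}")
```

When p is small, accuracy is dominated by specificity, which is why sensitivity and specificity are reported alongside accuracy.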

11 Conclusion and future work

This article presents a broad study of the significance of bio-inspired algorithms for feature optimization of EEG signals. EEG data from seizure patients have been used to evaluate the method.

The proposed PPSO was developed by the authors. Various swarm intelligence techniques exist, and many researchers use them for optimization; PSO is one of the best among them. The proposed PPSO is a hybrid version of PSO, and its pseudocode is presented in the paper for the research community.

A new hybrid method is developed to select the optimal features. DWT extracts the features from EEG data with four levels of decomposition. The newly developed Probabilistic Particle Swarm Optimization (PPSO) algorithm and the Sequential Differential Evolution (SDE) algorithm select the optimal features from the EEG feature space. The optimal features selected using PSO, PPSO and SDE are then evaluated with different classifiers, and the classification results are compared. The experimental results demonstrate that, among all the classifiers, SVM performed best, giving an accuracy of 97.53% with PSO, 97.74% with PPSO and 98.34% with SDE at the 100th cycle, the converged state. It is observed that feature extraction followed by selection of optimal features improves the accuracy of classifying EEG data as seizure or non-seizure.

In future work, we will address the problem of parameter optimization to improve the convergence and performance of the proposed algorithm using an iterative learning model (Tao et al., 2021). The proposed framework may be used with ML accelerators for large-scale data, and it can be applied to other datasets with higher diversity to obtain optimal classification results. Larger datasets may require specialized hardware to improve performance. Empirical Wavelet Transform (EWT) and Variational Mode Decomposition (VMD) build adaptive wavelets for signal decomposition and apply filtering to remove noise. DWT is used in this research because it is a dataset-independent method, provides better signal compression, and transfers the data well into the frequency domain. In future work, EWT and VMD will be used for feature extraction.