Introduction

Electroencephalogram (EEG) signals are widely used in the medical field to detect brain-related problems, and the results they provide guide the actions taken to resolve those problems. Our brain cells communicate with one another via electrical signals and remain active at all times. EEG records the electrical changes in brain activity, which is very useful in the diagnosis of brain-related problems. In EEG, small metal devices called electrodes are placed over the scalp of the subject; they record the electrical activity of the brain and pass it to a computer, which stores the recordings. The placement of the electrodes over the scalp plays an important role in an accurate diagnosis. Specialists such as brain surgeons, psychiatrists, and neurologists regard EEG as a beneficial diagnostic tool that can help predict certain clinical issues. EEG was fundamentally developed for the diagnosis of epilepsy, one of the most harmful neurological disorders, in which brain activity becomes abnormal, causing seizures, abnormal sensations, and even loss of consciousness. Caton first reported the electrical activity of the brain in monkeys and rabbits in 1875 [1]. Then, Beck [2] studied the electrical signals of the brain in dogs and rabbits in 1890. Finally, Berger, the German psychiatrist, recorded the EEG of a human for the very first time in 1924 [3, 4]. One of the most significant advantages of EEG is the ability to observe brain activity in real time, at the millisecond level, which is not possible with other high-resolution imaging techniques [5]. EEG measures both amplitude and frequency [6]. Due to these advantages, EEG is very popular in the research community for dealing with brain-related problems and improving the diagnosis of different mental issues. However, EEG data contains an enormous amount of noise whose effect should be mitigated during modeling.
After the removal of noisy data, various machine learning approaches can be applied to the processed data to detect brain-related problems. To select the most important features of EEG signals, several feature extraction techniques, namely, principal component analysis (PCA) [7], independent component analysis (ICA) [8], and the wavelet transform, can be performed.
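As a concrete illustration of one such extractor, the sketch below implements plain PCA with NumPy via an eigendecomposition of the covariance matrix; the epoch count, feature count, and number of components are arbitrary placeholder values, not settings from this paper.

```python
import numpy as np

def pca_features(X, n_components):
    # Center the data (samples in rows, features in columns).
    Xc = X - X.mean(axis=0)
    # Eigendecomposition of the covariance matrix.
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order]

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 64))  # placeholder: 100 EEG epochs x 64 features
Z = pca_features(X, 10)
print(Z.shape)  # (100, 10)
```

The returned columns are ordered by decreasing variance, so truncating them keeps the directions that explain the most variance in the signal.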

EEG signals are extremely complex for a non-professional observer to interpret and draw conclusions from about various brain-related issues. For this reason, there is a need for automatic EEG signal interpretation that can help in the early prediction of brain-related problems. Nowadays, the selection of a proper ML model is one of the most difficult tasks, as many models are available. The support vector machine (SVM) [9] is one such classical algorithm, based on the principle of structural risk minimization (SRM). SVM can be used for various tasks related to both classification and regression. SVM constructs a maximal-margin hyperplane that separates the data points. One of the most important advantages of SVM is that, unlike artificial neural networks (ANN), it does not suffer from the problem of local minima. SVM can also work efficiently with very high-dimensional data and does not suffer from the curse of dimensionality. Hence, SVM has been used by researchers to solve an extensive range of real-world problems. Yeo et al. [10] applied SVM to detect car drivers' drowsiness while driving using EEG data. Subasi and Gursoy [11] used ICA, PCA, and linear discriminant analysis (LDA) to extract the important features from EEG data and then applied SVM to detect epileptic seizures. Recently, Afifi et al. [12] performed melanoma detection using the SVM model. Although SVM has several advantages, its key demerit is the high computational time required for large-scale datasets, as it solves a large quadratic programming problem (QPP) [13].

To solve this problem of SVM and to improve the generalization performance, several variations of SVM have been suggested in the literature. Twin SVM (TWSVM) [14] is one such work, influenced by the generalized eigenvalue proximal SVM (GEPSVM) [15]. TWSVM searches for two non-parallel hyperplanes, each of which is close to one class and as far as possible from the other class. Whereas SVM solves a single large QPP, TWSVM solves a pair of smaller QPPs. As a result, it lowers the computational cost and makes TWSVM nearly four times faster than conventional SVM. Several extensions of TWSVM have been proposed, such as least squares TWSVM (LSTWSVM) [16], improved TWSVM [17], robust TWSVM [18], robust twin bounded SVM [19], and density-weighted TWSVM [20]. Peng developed a twin parametric-margin SVM (TPMSVM) [21] in which two non-parallel parametric-margin (PM) hyperplanes are generated by solving two smaller SVM-type problems. TPMSVM constructs the two PM hyperplanes so that each one determines the positive or negative parametric margin, which is not the case in TWSVM as discussed above, and the QPPs of the two methods are completely different. Peng et al. [22] proposed another variant of TPMSVM called structural TPMSVM (STPMSVM), where the structural information of the data is taken into consideration. Furthermore, Peng et al. [23] found that the decision function of traditional TPMSVM loses sparsity, and hence developed another method, the centroid-based TPMSVM (CTPMSVM), where the decision hyperplane becomes sparse as it optimizes the projection values of the centroid points of the target classes. Shao et al. [24] suggested another variant of TPMSVM termed least squares TPMSVM (LSTPMSVM). Recently, Gupta et al. [25] suggested a novel classifier based on TPMSVM and FSVM, called fuzzy-based Lagrangian TPMSVM, to analyze biomedical data.

Several studies in the literature show that incorporating prior information about the data distribution into a classifier can drastically improve its performance. Universum data, along with the SVM classifier, serves as prior information about the data distribution in USVM [26]. It is believed that the Universum data should not belong to either of the concerned classes and must fall in between the target classes. The concept of Universum data has been used to solve many real-world problems due to its higher generalization performance [27, 28], although it cannot be concluded that Universum data will always lead to high generalization performance. Motivated by TWSVM and USVM, Qi et al. [29] developed a new methodology utilizing the benefits of both, called Universum TWSVM (UTWSVM). Richhariya and Gupta [30] used iterative UTWSVM to classify facial expressions automatically. Recently, Zhao et al. [31] proposed an efficient non-parallel hyperplane-based USVM (UNHSVM) for classification. Furthermore, a fuzzy USVM has been proposed to enhance prior information by assigning weights to Universum points based on information entropy [32]. A reduced Universum TWSVM has been implemented to address class imbalance problems [33]. Recently, Kumar and Gupta [34] proposed a novel Universum-based Lagrangian twin bounded SVM for EEG signal classification. Moosaei et al. [35] suggested a Universum parametric-margin v-SVM for classification. Richhariya et al. [36] diagnosed disease using a new USVM based on recursive feature elimination (USVM-RFE). For solving the same problem, Richhariya and Tanveer [37] suggested a fuzzy Universum least squares TWSVM (FULSTSVM). Moosaei and Hladik [38] suggested a Lagrangian-based method for the Universum twin bounded SVM. Richhariya and Tanveer [39] suggested a novel angle-based Universum LSTSVM (AULSTSVM) for classification. Ganaie et al. [40] proposed a k-nearest neighbor weighted reduced UTWSVM for imbalanced data classification problems (KWRUTSVM-CIL), in which, to take advantage of the interclass information, weight vectors are used in the corresponding constraints of the objective functions.

Inspired by the previous works of Qi et al. [29], Moosaei et al. [35], Richhariya et al. [36], and Peng [21], and to utilize the benefit of Universum data, a new variant of TPMSVM, called Universum-based TPMSVM (UTPMSVM), is proposed in this paper. In UTPMSVM, the slack variables are taken in 2-norm rather than 1-norm, which formulates a strongly convex problem and hence always leads to a unique solution. The proposed method contains regularization terms that prevent the problem of overfitting. The proposed UTPMSVM generates two non-parallel hyperplanes, each of which decides the positive or negative parametric margin of the separating hyperplane. Like UTWSVM, the UTPMSVM solves two smaller-sized QPPs for this purpose rather than a single large one, unlike traditional SVM or USVM. In this paper, seizure EEG signals and healthy EEG signals are considered for classification. Interictal data falls between the seizure and healthy signals, so it has been utilized as Universum data. As stated above, EEG data contains a lot of noise and outliers; to get rid of these problems, several feature extraction methods have been applied, namely, PCA, ICA, and the wavelet transform. To prove the acceptability of the proposed classifier, it is also applied to several well-known real-world datasets. The results of the proposed UTPMSVM are compared with USVM, UNHSVM, TPMSVM, and AULSTSVM. The Universum points used in our UTPMSVM method come directly from the EEG dataset: we use the interictal, or seizure-free, signals as the Universum. This effectively provides the necessary prior information to the TPMSVM classifier, since the variation of the seizure-free signal lies between that of the healthy and epileptic EEG signals. Since our Universum data is not derived from the training data, no outliers or noise from the training data are present in it [26].
The main contributions of the work are as follows:

  • A novel UTPMSVM is proposed to classify seizures and healthy EEG signals.

  • UTPMSVM incorporates prior knowledge regarding the data distribution from interictal EEG signals.

  • Three different feature extractors have been applied to extract the most important features.

  • Statistical analysis is performed to reveal the superiority of the proposed method over other related models.

Related Work

In this section, we discuss two related models, USVM and TPMSVM. The proposed UTPMSVM is elaborated subsequently.

Universum Support Vector Machine (USVM)

Weston et al. [41] proposed a variant of the classical SVM named Universum SVM (USVM) for binary classification problems by incorporating Universum data. Universum data are treated as non-examples that do not belong to either of the concerned classes. The idea of the Universum is somewhat similar to the Bayesian idea of a prior; however, there is an important practical distinction between the two concepts: in Bayesian inference, the prior knowledge is knowledge about the decision rules, whereas the Universum is knowledge about a collection of examples. The kernel function used here is \(k({z}_{p},{z}_{a})=\phi {({z}_{p})}^{t}\phi ({z}_{a})\), where \(\phi\) is the mapping function. The QPP of USVM can be expressed as,

$$min \ \frac{1}{2}||w|{|}^{2}+C\sum_{p=1}^{m}{\kappa }_{p}+{C}_{k}\sum_{a=1}^{2|k|}{\chi }_{a}$$

subject to,

$$\begin{array}{l}{y}_{p}\left(\phi {\left({z}_{p}\right)}^{t}w+g\right)\ge 1-{\kappa }_{p}, \ {\kappa }_{p}\ge 0, \ \forall p=1,\dots ,m\\ {y}_{a}\left(\phi {\left({z}_{a}\right)}^{t}w+g\right)\ge -\varepsilon -{\chi }_{a}, \ {\chi }_{a}\ge 0, \ \forall a=1,\dots ,2|k|\end{array}$$
(1)

where \(m\) is the total number of data points; \(\kappa ,\chi\) are the slack variables; \(C,{C}_{k}\) are the penalty parameters; \(|k|\) is the number of Universum data points; and \(\varepsilon\) is the tolerance value of the Universum.

Now, the dual of the primal problem is obtained using Lagrange multipliers (LMs) and the Karush–Kuhn–Tucker (KKT) conditions, and is shown as,

$$max \ \sum_{p=1}^{m+2|k|}{\kappa }_{p}{\eta }_{p}^{*}-\frac{1}{2}\sum_{p=1}^{m+2|k|}\sum_{a=1}^{m+2|k|}{\eta }_{p}^{*}{\eta }_{a}^{*}k({z}_{p},{z}_{a}){y}_{p}{y}_{a}$$

subject to,

$$\begin{array}{l}0\leq\eta_p^\ast\leq C, \ \kappa_p=1,\;\mathrm{where} \ \forall p=1,\dots,m\\0\leq\eta_p^\ast\leq C_k, \ \kappa_p=-\varepsilon,\;\mathrm{where} \ \forall p=m+1,\dots,m+2\vert k\vert\\\mathrm{and} \ \sum\limits_{p=1}^{m+2\vert k\vert}\eta_p^\ast y_p=0\end{array}$$
(2)

Now, let us suppose \(z\in {R}^{m}\) is a new test instance. The function that determines the class label of the instance is,

$$f\left(z\right)=sign\left(\sum_{p=1}^{m+2|k|}{\eta }_{p}^{*}{y}_{p}k({z}_{p},z)+g\right)$$
(3)
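The decision rule (3) can be sketched directly once the multipliers \(\eta^{*}\) and bias \(g\) are known. In the toy example below, the support points, labels, multipliers, and the linear kernel are hand-picked, purely hypothetical values for illustration; in practice they come from solving the dual (2).

```python
import numpy as np

def usvm_predict(z, support, labels, eta, g, kernel):
    # Eq. (3): f(z) = sign( sum_p eta_p * y_p * k(z_p, z) + g )
    s = sum(e * y * kernel(zp, z) for e, y, zp in zip(eta, labels, support))
    return np.sign(s + g)

# Toy 1-D example with a linear kernel and hand-picked multipliers
# (hypothetical values, not the result of training).
linear = lambda a, b: float(np.dot(a, b))
support = [np.array([1.0]), np.array([-1.0])]
labels = [1, -1]
eta = [1.0, 1.0]
print(usvm_predict(np.array([2.0]), support, labels, eta, 0.0, linear))   # 1.0
print(usvm_predict(np.array([-2.0]), support, labels, eta, 0.0, linear))  # -1.0
```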

Twin Parametric-Margin Support Vector Machine (TPMSVM)

TPMSVM determines its non-parallel margin hyperplanes by solving a pair of QPPs. Let us suppose we have a binary classification problem with two classes of data, labeled +1 and -1. Let the number of datapoints belonging to the +1 class be \({k}_{1}\) and the number belonging to the -1 class be \({k}_{2}\). Let \({D}_{1}\) and \({D}_{2}\) be the two matrices representing the datapoints of the +1 and -1 classes, respectively.

For the linear case, TPMSVM constructs two hyperplanes:

$$f_1\left(z\right)=w_1z+g_1=0\;\mathrm{and}\;f_2(z)=w_2z+g_2=0$$
(4)

By introducing the positive and negative parametric-margin hyperplanes, data will be separated by TPMSVM if:

$$\begin{array}{c}{w}_{1}{z}_{i}+{g}_{1}\ge 0, \ i=1,2,\dots,{k}_{1}\\ {w}_{2}{z}_{i}+{g}_{2}\le 0, \ i=1,2,\dots,{k}_{2}\end{array}$$
(5)

To find out the margins, we need to solve the following optimization problem:

$$min \frac{1}{2}||{w}_{1}|{|}^{2}+\frac{{\lambda }_{1}}{{k}_{2}}{e}_{2}^{t}({D}_{2}{w}_{1}+{e}_{2}{g}_{1})+\frac{{d}_{1}}{{k}_{1}}{e}_{1}^{t}\rho$$

subject to,

$$\begin{array}{c}{D}_{1}{w}_{1}+{e}_{1}{g}_{1}\ge 0-\rho \\ \rho \ge 0{e}_{1}\end{array}$$
(6)

and

$$min\frac{1}{2}||{w}_{2}|{|}^{2}-\frac{{\lambda }_{2}}{{k}_{1}}{e}_{1}^{t}({D}_{1}{w}_{2}+{e}_{1}{g}_{2})+\frac{{d}_{2}}{{k}_{2}}{e}_{2}^{t}\kappa$$

subject to,

$$-\left({D}_{2}{w}_{2}+{e}_{2}{g}_{2}\right)\ge 0-\kappa , \kappa \ge 0{e}_{2}$$
(7)

where \(\rho\) and \(\kappa\) are slack variables; \({d}_{1},{d}_{2}\ge 0\) and \({\lambda }_{1},{\lambda }_{2}\ge 0\) are the penalty parameters; and \({e}_{1}\) and \({e}_{2}\) are vectors of ones of appropriate dimension.

The dual QPPs of (6) and (7) are as follows:

$$max-\frac{1}{2}{\theta }^{t}{D}_{1}{D}_{1}{}^{t}\theta +\frac{{\lambda }_{1}}{{k}_{2}}{\theta }^{t}{D}_{1}{D}_{2}{}^{t}{e}_{2}$$

subject to,

$$e_1^t\theta=\lambda_1\;\mathrm{and}\;0\leq\theta\leq\frac{d_1}{k_1}e_1$$
(8)

and

$$max-\frac{1}{2}{\omega }^{t}{D}_{2}{D}_{2}{}^{t}\omega +\frac{{\lambda }_{2}}{{k}_{1}}{\omega }^{t}{D}_{2}{D}_{1}{}^{t}{e}_{1}$$

subject to,

$$e_2^t\omega=\lambda_2\;\mathrm{and}\;0\leq\omega\leq\frac{d_2}{k_2}e_2$$
(9)

where \(\theta\) and \(\omega\) are the LMs. After solving Eqs. (8) and (9), we obtain the vectors of LMs, from which we can compute \({w}_{1},{w}_{2},{g}_{1}\), and \({g}_{2}\). Finally, we use the following function to determine the class label of a new test instance \(z\in {R}^{m}\):

$$f(z)=sign\left(\frac{{w}_{1}z+{g}_{1}}{||{w}_{1}||}+\frac{{w}_{2}z+{g}_{2}}{||{w}_{2}||}\right)$$
(10)
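The decision rule (10) is easy to state in code. The hyperplane parameters below are hypothetical toy values chosen so that the two parametric-margin hyperplanes straddle the origin; real values come from solving the duals (8) and (9).

```python
import numpy as np

def tpmsvm_predict(z, w1, g1, w2, g2):
    # Eq. (10): sign of the sum of the normalized signed distances
    # to the two parametric-margin hyperplanes.
    s1 = (w1 @ z + g1) / np.linalg.norm(w1)
    s2 = (w2 @ z + g2) / np.linalg.norm(w2)
    return np.sign(s1 + s2)

# Toy 2-D hyperplanes (hypothetical values, for illustration only).
w1, g1 = np.array([1.0, 0.0]), -0.5   # positive parametric-margin hyperplane
w2, g2 = np.array([1.0, 0.0]), 0.5    # negative parametric-margin hyperplane
print(tpmsvm_predict(np.array([2.0, 0.0]), w1, g1, w2, g2))   # 1.0
print(tpmsvm_predict(np.array([-2.0, 0.0]), w1, g1, w2, g2))  # -1.0
```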

Proposed Universum-Based Twin Parametric Margin Support Vector Machine

In this paper, we have proposed an efficient classifier called the Universum-based twin parametric-margin SVM (UTPMSVM) for classifying EEG signals. In the formulation of the proposed classifier, we use the L2-norm instead of the L1-norm. Here, \(K({x}^{t},{D}^{t}){w}_{1}+{g}_{1}=0\) and \(K({x}^{t},{D}^{t}){w}_{2}+{g}_{2}=0\) are the two non-parallel hyperplanes, where \(D\) is the matrix of all training data points; they are obtained from the primal problems of UTPMSVM,

$$min \frac{1}{2}(||{w}_{1}|{|}^{2}+{g}_{1}^{2})+{c}_{1}{e}_{2}^{t}(K(B,{D}^{t}){w}_{1}+{e}_{2}{g}_{1})+\frac{{c}_{2}}{2}{\xi }^{t}\xi +\frac{{c}_{3}}{2}{\psi }_{1}^{t}{\psi }_{1}$$

subject to,

$$\begin{array}{l}K(A,{D}^{t}){w}_{1}+{e}_{1}{g}_{1}\ge 0-\xi \\ K(U,{D}^{t}){w}_{1}+{e}_{u}{g}_{1}+(1-\varepsilon ){e}_{u}\ge -{\psi }_{1}\end{array}$$
(11)
$$\mathrm{and \ }min \frac{1}{2}(||{w}_{2}|{|}^{2}+{g}_{2}^{2})-{c}_{4}{e}_{1}^{t}(K(A,{D}^{t}){w}_{2}+{e}_{1}{g}_{2})+\frac{{c}_{5}}{2}{\eta }^{t}\eta +\frac{{c}_{6}}{2}{\psi }_{2}^{t}{\psi }_{2}$$

subject to,

$$\begin{array}{l}K(B,{D}^{t}){w}_{2}+{e}_{2}{g}_{2}\le \eta \\ -(K(U,{D}^{t}){w}_{2}+{e}_{u}{g}_{2})+{\psi }_{2}\ge (-1+\varepsilon ){e}_{u}\end{array}$$
(12)

Assume that,

$$\begin{aligned}& u_1=\begin{bmatrix}w_1\\g_1\end{bmatrix},\ u_2=\begin{bmatrix}w_2\\g_2\end{bmatrix}, \\ & G=\lbrack K(B,D^t)\;\;e_2\rbrack,\ H=\lbrack K(A,D^t)\;\;e_1\rbrack,\\ & V=\lbrack K(U,D^t)\;\;e_u\rbrack\;\mathrm{and}\;E=(1-\varepsilon)e_u\end{aligned}$$

After substituting the above assumptions into Eqs. (11) and (12), they become,

$$min \ \frac{1}{2}{u}_{1}^{t}{u}_{1}+{c}_{1}{e}_{2}^{t}G{u}_{1}+\frac{{c}_{2}}{2}{\xi }^{t}\xi +\frac{{c}_{3}}{2}{\psi }_{1}^{t}{\psi }_{1}$$

subject to,

$$\begin{array}{l}H{u}_{1}\ge -\xi \\ V{u}_{1}+E\ge -{\psi }_{1}\end{array}$$
(13)

and

$$min\frac{1}{2}{u}_{2}^{t}{u}_{2}-{c}_{4}{e}_{1}^{t}H \ {u}_{2}+\frac{{c}_{5}}{2}{\eta }^{t}\eta +\frac{{c}_{6}}{2}{\psi }_{2}^{t}{\psi }_{2}$$

subject to,

$$\begin{array}{l}-G{u}_{2}+\eta \ge 0\\ -V{u}_{2}+{\psi }_{2}+E\ge 0\end{array}$$
(14)

The Lagrangian of Eqs. (13) and (14) is as follows:

$${L}_{1}= \ \frac{1}{2}{u}_{1}^{t}{u}_{1}+{c}_{1}{e}_{2}^{t}G{u}_{1}+\frac{{c}_{2}}{2}{\xi }^{t}\xi +\frac{{c}_{3}}{2}{\psi }_{1}^{t}{\psi }_{1}-{\alpha }_{1}^{t}(H{u}_{1}+\xi )-{\alpha }_{2}^{t}(V{u}_{1}+E+{\psi }_{1})$$
(15)

and

$$\begin{aligned}{L}_{2}=& \ \frac{1}{2}{u}_{2}^{t}{u}_{2}-{c}_{4}{e}_{1}^{t}H{u}_{2}+\frac{{c}_{5}}{2}{\eta }^{t}\eta +\frac{{c}_{6}}{2}{\psi }_{2}^{t}{\psi }_{2}\\ & -{\alpha }_{1}^{t}(-G{u}_{2}+\eta )-{\alpha }_{2}^{t}(-V{u}_{2}+E+{\psi }_{2})\end{aligned}$$
(16)

where \({\alpha }_{1}\) and \({\alpha }_{2}\) are the LMs.

From Eq. (15) we get,

$$\frac{\partial {L}_{1}}{\partial {u }_{1}}=0\Rightarrow {u}_{1}=H^{t}\alpha_{1}+V^{t}\alpha_{2}-c_{1}G^{t}e_{2}$$
(17)
$$\frac{\partial {L}_{1}}{\partial \xi }=0\Rightarrow \xi =\frac{{\alpha }_{1}}{{c}_{2}}$$
(18)
$$\frac{\partial {L}_{1}}{\partial {\psi }_{1}}=0\Rightarrow {\psi }_{1}=\frac{{\alpha }_{2}}{{c}_{3}}$$
(19)

After substituting the values of (17), (18), and (19) into Eq. (15), we obtain,

$$max \ {L}_{1}=-\frac{1}{2}{\alpha }^{t}\begin{bmatrix}H{H}^{t}+{I}\!\left/{c}_{2}\right. & H{V}^{t}\\ V{H}^{t} & V{V}^{t}+{I}\!\left/{c}_{3}\right.\end{bmatrix}\alpha +\left({c}_{1}{e}_{2}^{t}G{Z}_{1}^{t}-[0\;\;{E}^{t}]\right)\alpha$$
(20)

where \({Z}_{1}=\left[\begin{array}{c}H\\ V\end{array}\right]\).

Further, from Eq. (16), we get,

$$\frac{\partial {L}_{2}}{\partial {u}_{2}}=0\Rightarrow {u}_{2}={c}_{4}{H}^{t}{e}_{1}-{G}^{t}{\alpha }_{1}-{V}^{t}{\alpha }_{2}$$
(21)
$$\frac{\partial {L}_{2}}{\partial \eta }=0\Rightarrow \eta =\frac{{\alpha }_{1}}{{c}_{5}}$$
(22)
$$\frac{\partial {L}_{2}}{\partial {\psi }_{2}}=0\Rightarrow {\psi }_{2}=\frac{{\alpha }_{2}}{{c}_{6}}$$
(23)

After putting the values of (21), (22), and (23) in Eq. (16), we get

$$max \ {L}_{2}=-\frac{1}{2}{\alpha }^{t}\begin{bmatrix}G{G}^{t}+{I}\!\left/{c}_{5}\right. & G{V}^{t}\\ V{G}^{t} & V{V}^{t}+{I}\!\left/{c}_{6}\right.\end{bmatrix}\alpha +\left({c}_{4}{e}_{1}^{t}H{Z}_{2}^{t}-[0\;\;{E}^{t}]\right)\alpha$$
(24)

where \({Z}_{2}=\left[\begin{array}{c}G\\ V\end{array}\right]\).
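A minimal NumPy sketch of how the augmented matrices H, G, V and the block matrix of the dual (20) could be assembled (toy random data and an assumed Gaussian kernel; a QP solver such as MOSEK would then maximize the dual over \(\alpha\)):

```python
import numpy as np

def gaussian_kernel(P, Q, mu):
    # Pairwise Gaussian kernel matrix between the rows of P and Q.
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
    return np.exp(-mu * d2)

# Toy data: A (positive class), B (negative class), U (Universum);
# D stacks the training points of both classes.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
B = rng.standard_normal((6, 3))
U = rng.standard_normal((4, 3))
D = np.vstack([A, B])
mu, c2, c3 = 0.5, 1.0, 1.0

# Augmented matrices H = [K(A,D^t) e1], G = [K(B,D^t) e2], V = [K(U,D^t) e_u].
H = np.hstack([gaussian_kernel(A, D, mu), np.ones((A.shape[0], 1))])
G = np.hstack([gaussian_kernel(B, D, mu), np.ones((B.shape[0], 1))])
V = np.hstack([gaussian_kernel(U, D, mu), np.ones((U.shape[0], 1))])

# Block matrix appearing in the dual (20).
Q1 = np.block([[H @ H.T + np.eye(H.shape[0]) / c2, H @ V.T],
               [V @ H.T, V @ V.T + np.eye(V.shape[0]) / c3]])
print(Q1.shape)  # (9, 9)
```

Because of the added `I/c` blocks, `Q1` is symmetric positive definite, which is what makes the 2-norm formulation strongly convex.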

After finding the values of the Lagrangian parameters, we can find the values of the parameters \({w}_{1},{w}_{2},{g}_{1},{g}_{2}\) using the following expressions:

$$\left[\begin{array}{c}{w}_{1}\\ {g}_{1}\end{array}\right]={H}^{t}{\alpha }_{1}+{V}^{t}{\alpha }_{2}-{c}_{1}{G}^{t}{e}_{2}$$

and

$$\left[\begin{array}{c}{w}_{2}\\ {g}_{2}\end{array}\right]={c}_{4}{H}^{t}{e}_{1}-{G}^{t}{\alpha }_{1}-{V}^{t}{\alpha }_{2}$$

Experimental Setup, Results, and Analysis

Experimental simulations have been performed on a 64-bit Windows OS computer with 4 GB RAM and an i5 processor. We have used a non-linear (NL) kernel for the experiments. The Gaussian kernel, which may be expressed as \(k({x}_{h},{x}_{j})=\mathrm{exp}(-\mu ||{x}_{h}-{x}_{j}|{|}^{2})\), where \({x}_{h},{x}_{j}\) represent the samples, is used as the non-linear kernel. The \(C\) and \(\mu\) parameters of USVM, UNHSVM, TPMSVM, AULSTSVM, and UTPMSVM are selected from \(\left\{{10}^{-3},...,{10}^{3}\right\}\) and \(\left\{{2}^{-5},{2}^{-3},{2}^{-2},..,{2}^{3},{2}^{5}\right\}\), respectively. Also, the \({c}_{7}\) parameter of AULSTSVM is selected from \(\left\{0.1,0.2,....,1\right\}\). Moreover, the \(\varepsilon\) parameter of the proposed UTPMSVM classifier is chosen from \(\left\{0.1,0.3,0.5,0.7,0.9\right\}\). The classification performance, as well as the optimal parameters, is computed using fivefold cross-validation. The MOSEK optimization toolbox is utilized to solve the QPPs of USVM, UNHSVM, and UTPMSVM [42]. The performance of the classifiers is evaluated using accuracy and F1-score, which can be determined as
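A small sketch of the Gaussian kernel and the hyperparameter grids described above. Since the middle of the µ grid is elided in the text, the exponent list below is only one plausible reading and should be treated as an assumption.

```python
import numpy as np
from itertools import product

def gaussian_kernel(xh, xj, mu):
    # k(x_h, x_j) = exp(-mu * ||x_h - x_j||^2)
    return np.exp(-mu * np.sum((xh - xj) ** 2))

# Hyperparameter grids used in the experiments.
C_grid = [10.0 ** i for i in range(-3, 4)]             # {10^-3, ..., 10^3}
mu_grid = [2.0 ** i for i in (-5, -3, -2, -1, 0, 1, 2, 3, 5)]  # assumed reading

print(gaussian_kernel(np.zeros(4), np.zeros(4), 0.5))  # 1.0 on identical inputs
print(len(list(product(C_grid, mu_grid))))             # size of the search grid
```

Each (C, µ) pair in the grid would be scored by fivefold cross-validation, and the best pair retained.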

$$Accuracy=\frac{True \ Positive+True \ Negative}{True \ Positive+True \ Negative+False \ Positive+False \ Negative}$$
$$Recall=\frac{True \ Positive}{True \ Positive+False \ Negative}$$
$$Precision=\frac{True\;Positive}{True\;Positive+False\;Positive}$$
$${F}_{1}\text{-}score=\frac{2(precision\times recall)}{precision+recall}$$
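These four measures follow directly from the confusion-matrix counts; the counts in the example call are illustrative only.

```python
def classification_metrics(tp, tn, fp, fn):
    # Accuracy, recall, precision, and F1-score from confusion-matrix counts.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f1

acc, rec, prec, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(acc, rec, prec, f1)  # accuracy 0.85, recall 0.8, precision ~0.889, F1 ~0.842
```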

EEG Signal Classification

We have considered the classification between healthy and seizure signals in this work. The EEG signal dataset is collected from [43] and consists of the Z, N, O, F, and S sets. Each set consists of 100 single-channel EEG signals, each collected for 23.6 s at a sampling rate of 173.61 Hz. Sets Z and O contain surface EEG recordings of five healthy participants with their eyes open and closed, respectively. Sets N and F were recorded from five patients in the interictal stage, from the hippocampal formation of the opposite hemisphere of the brain and from the epileptogenic zone, respectively. Set S, for the ictal state, comprises seizure recordings from all recording sites exhibiting ictal activity. For N, F, and S, intra-cranial EEG recording is the mode used. N is used as the Universum data, as it lies between the seizure and healthy signals. Each training and testing session contains 100 samples. One of the main reasons for a model's poor performance is the high dimensionality of the data. To address this issue, we used common feature extraction techniques such as PCA, ICA, and the wavelet transform (WT). An NL kernel is used with PCA to reduce the dimensions. A few wavelet families are used to implement the discrete wavelet transform (DWT) [44] at various levels of decomposition. The approximation and detail coefficients are combined to form the feature vector. The level of decomposition for the Daubechies wavelets db1, db2, db4, and db6 is set to level 3. Figures 1, 2, 3, and 4 show the decompositions of the N and S signals using db1 and db2 at level 3: Figs. 1 and 2 show the decomposition using db1 for signals N and S, respectively, and Figs. 3 and 4 show the decomposition using db2 for signals N and S, respectively.
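For db1 (the Haar wavelet), the level-3 decomposition can be sketched with NumPy alone (the longer Daubechies filters db2–db6 would normally be computed with a wavelet library); the input below is a synthetic stand-in for an EEG epoch, not a signal from the dataset.

```python
import numpy as np

def haar_dwt(x):
    # One level of the db1 (Haar) transform: pairwise sums and differences,
    # scaled by 1/sqrt(2) so the transform stays orthonormal.
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_decompose(x, levels=3):
    # Returns [cA_levels, cD_levels, ..., cD_1], as in a level-3 DWT.
    coeffs = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        coeffs.insert(0, detail)
    coeffs.insert(0, approx)
    return coeffs

signal = np.sin(np.linspace(0, 8 * np.pi, 4096))  # synthetic stand-in epoch
coeffs = haar_decompose(signal, levels=3)
print([len(c) for c in coeffs])  # [512, 512, 1024, 2048]
```

Because the Haar transform is orthonormal, the total energy of the coefficients equals that of the original signal, so no information is lost before feature selection.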

Fig. 1
figure 1

Decomposition of N signal using db1 at level 3

Fig. 2
figure 2

Decomposition of S signal using db1 at level 3

Fig. 3
figure 3

Decomposition of N signal using db2 at level 3

Fig. 4
figure 4

Decomposition of S signal using db2 at level 3

PCA is used to reduce dimensionality in the case of ICA and WT. ICA is used in the same manner as in [34, 45]. The class discriminatory ratio (CDR) is used to filter the PCA components and select the most appropriate ones. The proposed framework is presented in Fig. 5. The EEG signal is passed as input, and its features are extracted using PCA, ICA, and the wavelet transform. The extracted features are provided as input to the proposed UTPMSVM model. The classification performance is measured using accuracy and F1-score. Table 1 shows the classification accuracies of USVM, UNHSVM, AULSTSVM, and the proposed UTPMSVM for different EEG signals with non-linear kernels. It is observed that our proposed UTPMSVM shows the best results in 15 cases out of 28, which indicates its superiority over USVM, UNHSVM, and AULSTSVM for classifying seizure and non-seizure signals. One can also notice that the mean accuracy of the proposed UTPMSVM (74.2142) is higher by 9.599%, 0.629%, and 0.1929% compared to the USVM (67.7142), UNHSVM (73.75), and AULSTSVM (74.07) models. Moreover, the mean F1-score of the proposed UTPMSVM (0.7042) is higher by 2.578%, 2.176%, and 2.0156% compared to the USVM (0.6865), UNHSVM (0.6892), and AULSTSVM (0.6903) models. Additionally, we show the ranks of the reported classifiers on the different EEG signals in Table 2, and the mean ranks are exhibited in Fig. 6. It can be observed that the proposed UTPMSVM attains the lowest mean rank based on both F1-score and accuracy, which reveals the effectiveness of the proposed model.

Fig. 5
figure 5

Block diagram of the proposed framework for classifying EEG signal

Table 1 Accuracy and F1-score of USVM, UNHSVM, AULSTSVM, and proposed UTPMSVM on EEG datasets
Table 2 Ranks based on Accuracy and F1-score of USVM, UNHSVM, AULSTSVM, and proposed UTPMSVM on EEG datasets
Fig. 6
figure 6

Mean rank comparison among the reported models based on the experiments on EEG datasets

Experiment on Real-World Datasets

Experiments have been performed on 30 real-world benchmark datasets to assess the efficiency of the proposed UTPMSVM model. These datasets are taken from the UCI ML data repository [46] and the KEEL imbalanced data repository [47]. The classification performance of UTPMSVM is compared with that of USVM, UNHSVM, TPMSVM, and AULSTSVM. The classification accuracies of USVM, UNHSVM, TPMSVM, AULSTSVM, and UTPMSVM, with the optimal parameters and training times, are shown in Table 3. It can be noticed that the proposed UTPMSVM shows comparable or better classification performance than the other reported models. The mean accuracies of the models are shown in the last row. It is observed that the proposed UTPMSVM attains the highest mean accuracy (87.5887%) compared to USVM (83.8189%), UNHSVM (85.2132%), TPMSVM (86.0933%), and AULSTSVM (85.384%). Further, the ranks based on classification accuracy are shown in Table 4. One can notice that the proposed UTPMSVM shows the best performance in 17 cases out of 30, which demonstrates its efficiency. Moreover, the mean rank is presented in the last row of Table 4; UTPMSVM attains the lowest mean rank compared to USVM, UNHSVM, TPMSVM, and AULSTSVM.

Table 3 USVM, UNHSVM, TPMSVM, AULSTSVM, and proposed UTPMSVM results for real-world datasets
Table 4 Specific ranks and average ranks of USVM, UNHSVM, TPMSVM, AULSTSVM, and UTPMSVM for real-world datasets

Figure 7 shows the parameter insensitivity performance of UTPMSVM on the Autism, Led7digit-0-2-4-5-6-7-8-9_vs_1, Vowel, and WDBC datasets. It confirms that the performance of the proposed UTPMSVM is not highly sensitive to the values of its parameters ε and µ. Extensive simulations reveal that the user-specified parameter µ does not significantly affect the performance of UTPMSVM.

Fig. 7
figure 7

ε and µ parameter insensitivity of UTPMSVM on a autism, b Led7digit-0-2-4-5-6-7-8-9_vs_1, c vowel, and d WDBC

Friedman Test for Statistical Comparison

The ranks of each classifier for the real-world datasets are shown in Table 4. It can be noted that the proposed method achieves the best average rank. To prove this argument statistically, the Friedman test [48] has been performed. Initially, we assume that all the methods are identical under the null hypothesis. Then, the Friedman statistic is computed using the given formula:

$${\chi }_{F}^{2}=\frac{12\times X}{p\times (p+1)}\left[\sum_{i=1}^{p}{R}_{i}^{2}-\frac{p{(p+1)}^{2}}{4}\right]$$

where X represents the total number of datasets and p is the number of classifiers. Here, X = 30 and p = 5.

$${\chi }_{F}^{2}=\frac{12\times 30}{5\times (5+1)}\left[\left({3.516}^{2}+{3.2146}^{2}+{3.2666}^{2}+{3}^{2}+{2}^{2}\right)-\frac{5{(5+1)}^{2}}{4}\right]\cong 16.62$$

After that, \({F}_{F}\) value is computed as given below:

$${F}_{F}=\frac{(X-1){\chi }_{F}^{2}}{X(p-1)-{\chi }_{F}^{2}}=\frac{(30-1)\times 16.62}{30\times (5-1)-16.62}\cong {4}.662$$

The critical value (CV) of \(F(4,116)\) is 2.45 at the significance level \(\alpha =0.05\). The value \({F}_{F}\) is greater than the CV, so we reject the null hypothesis. As the null hypothesis is rejected, we can proceed with a post-hoc test, for which the Nemenyi post-hoc test is used. The critical difference (CD) is calculated as follows:

$$CD={q}_{\alpha }\sqrt{\frac{p(p+1)}{6X}}=2.728\sqrt{\frac{5(5+1)}{6\times 30}}\cong 1.1137$$
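The Friedman statistic, the \(F_F\) correction, and the Nemenyi CD above can be packaged as small helper functions; the mean ranks in the example call are made-up values for a 3-classifier, 10-dataset case, not this paper's ranks.

```python
import math

def friedman_stats(mean_ranks, X):
    # chi^2_F and F_F from the mean ranks of p classifiers over X datasets.
    p = len(mean_ranks)
    chi2 = (12.0 * X) / (p * (p + 1)) * (sum(r * r for r in mean_ranks)
                                         - p * (p + 1) ** 2 / 4.0)
    ff = (X - 1) * chi2 / (X * (p - 1) - chi2)
    return chi2, ff

def nemenyi_cd(q_alpha, p, X):
    # Critical difference for the Nemenyi post-hoc test.
    return q_alpha * math.sqrt(p * (p + 1) / (6.0 * X))

chi2, ff = friedman_stats([1.5, 2.0, 2.5], X=10)  # hypothetical mean ranks
print(chi2, ff)                                   # 5.0 3.0
print(round(nemenyi_cd(2.728, p=5, X=30), 4))     # 1.1137, matching the CD above
```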

The pair-wise differences of USVM, UNHSVM, and TPMSVM with UTPMSVM are \(3.516-2=1.516\), \(3.2146-2=1.2146\), and \(3.2666-2=1.2666\), which are all higher than the CD. We can therefore conclude that UTPMSVM shows statistically better performance than USVM, UNHSVM, and TPMSVM. However, the pair-wise difference of AULSTSVM with UTPMSVM is \(3-2=1\), so it cannot be concluded that UTPMSVM is statistically better than AULSTSVM. Nevertheless, it can be observed from Table 4 that UTPMSVM shows a lower average rank than AULSTSVM.

Figure 8 shows the comparative results of the Nemenyi statistics among all reported classifiers based on mean ranks. The classifiers with higher and lower ranks are plotted on the right and the left, respectively. The classifiers within a horizontal line with a length less than or equal to the CD (1.1137) perform statistically identically.

Fig. 8
figure 8

Graphical visualization of the Nemenyi test. The CD is 1.1137

It is noticeable that UTPMSVM, AULSTSVM, and UNHSVM appear on the right side of the graph, which indicates the efficiency of these models. Moreover, the horizontal line does not connect UTPMSVM with USVM, TPMSVM, or UNHSVM; hence, the proposed UTPMSVM shows significantly better performance than these three models. The solid line does connect UTPMSVM and AULSTSVM; hence, these two are not significantly different.

Wilcoxon Test for Statistical Comparison

Additionally, to show the substantial difference between UTPMSVM and the other implemented classifiers, we further performed the two-tailed Wilcoxon signed-rank test (WST) [49]. The test outcomes are shown in Table 5. The second column of Table 5 shows the difference between UTPMSVM and USVM based on the WST, where “x” denotes that the accuracy (ACC) of UTPMSVM is higher than that of USVM, “y” indicates that the ACC of UTPMSVM is lower than that of USVM, and “z” indicates that the ACC of UTPMSVM is similar to that of USVM. The level of significance is taken as 0.05. The mean differences between UTPMSVM and USVM are also shown. It can be noticed that the “p” value is less than the level of significance, which indicates the dominance of UTPMSVM over USVM. Similar conclusions can be derived from the third and fourth columns. As a result, it can be concluded that the accuracy distribution of UTPMSVM differs drastically from those of USVM, UNHSVM, and TPMSVM, demonstrating that UTPMSVM is significantly different from these models. However, it cannot be concluded that the accuracy distribution of UTPMSVM differs from that of AULSTSVM, as the corresponding “p” value is greater than the level of significance.
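A minimal sketch of the two-tailed WST using the normal approximation (statistical packages provide exact and continuity-corrected variants); the sketch drops zero differences and uses average ranks for ties in \(|d|\). The two accuracy vectors are hypothetical.

```python
import math

def wilcoxon_signed_rank(a, b):
    # Two-tailed Wilcoxon signed-rank test with a normal approximation.
    d = [x - y for x, y in zip(a, b) if x != y]   # drop zero differences
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2.0 + 1.0                 # average rank for tied |d|
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p_value = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return w_plus, p_value

# Hypothetical per-dataset accuracies of two classifiers.
w, p = wilcoxon_signed_rank([86, 90, 78, 95, 81], [85, 88, 75, 99, 76])
print(w, p)
```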

Table 5 Wilcoxon sign-rank test comparison

Conclusion

We suggest a new Universum-based twin parametric-margin SVM (UTPMSVM) for EEG signal classification problems. The suggested model shows an improvement in generalization performance over the existing TPMSVM model for real-world as well as EEG signal classification problems. It is well known that Universum samples serve as prior information about the distribution of the data; hence, Universum-based models are suggested for classifying EEG signals. In addition, to diminish the influence of noise in the EEG signals, we have used several feature reduction algorithms as a pre-processing step. To validate the efficiency of the proposed UTPMSVM, its classification performance is compared with that of USVM, UNHSVM, TPMSVM, and AULSTSVM. The results portray the efficacy of the proposed model for both EEG and other real-world datasets. Further, statistical analyses confirm the dominance of the proposed UTPMSVM over the other models. The main drawback of UTPMSVM is that, due to the incorporation of the 2-norm, its sparsity is lost; improving the sparsity of UTPMSVM could be an interesting aspect of future work. Moreover, one can remodel UTPMSVM for solving multiclass classification problems in the future. In addition, taking inspiration from the recent works of Tanveer et al. [50] and Ganaie et al. [51], the model can be improved to deal with large-scale as well as noisy datasets. Also, deep learning-based strategies can be embedded with our proposed model for the multiclass classification of EEG signals.