1 Introduction

The brain–computer interface (BCI) is a technique of establishing a channel for communication between the brain of the user and an external device without using the brain’s normal nerve pathways to other body parts [1]. It provides an advanced technology which can translate the intention of a user from the analysis of brain signals directly into corresponding commands to establish a communication channel directly between the human brain and external devices [2]. BCI is a multi-disciplinary research field involving neurology, rehabilitation engineering, human–computer interaction (HCI), signal processing and machine learning [3]. In recent years, the brain signals have been extensively analyzed and explored for BCI applications. Electroencephalography (EEG) is an important tool for recording functional brain activity. It is the most used signal acquisition technique for MI-based BCIs due to its simplicity and ease of use [4]. It offers better temporal resolution at a lower cost, which makes it popular among researchers [5]. Professor Hans Berger from Germany discovered in 1924 that electrical signals produced by the human brain could be recorded from the scalp using electrodes [6]. He developed the technique of electroencephalography (EEG) for fetching electrical signals of the brain. Although EEG is a popularly used mechanism for fetching brain signals, other techniques like magnetoencephalography (MEG) and electrocorticography (ECoG) can also be used to monitor the activities of the human brain. The availability of powerful computer equipment at lower costs and new insights into the functionality of the human brain has encouraged researchers to focus on developing new supplementary communication and control technology for patients suffering from neuromuscular disorders due to sclerosis, brain stroke, spinal cord injury, etc. [7]. The reliability of a BCI system for rehabilitation of such patients is of paramount importance. 
The safety of such systems can be enhanced by improving the hardware as well as by using advanced techniques such as machine learning to make these systems intelligent and reliable.

BCIs can be used in a variety of areas including prosthetic limbs, mobility devices, robotics and device communication. These developments have led to improvements in the techniques for processing signals recorded from the scalp during the performance of a specific type of mental task. The major objective of BCI research is to develop supplementary systems that allow disabled users to control artificial limbs and communicate with the outer environment. The electrodes are placed on different parts of the scalp of the human subject, as per the standardized international 10–20 system of EEG, to record the electrical activity of the brain [8]. The signals acquired from these electrodes reflect the motor imagery (MI) activity of the subject, such as imagined hand, foot or tongue movements [9]. For the operation of every BCI, a neurological control signal is required. Different BCI systems have been developed on the basis of this control signal [10]. Most of the current EEG-based BCI systems take their input from neurological phenomena such as the P300 potential, event-related potential (ERP), mu and/or beta rhythms with event-related synchronization (ERS) and event-related desynchronization (ERD), cognitive task-related EEGs, visual evoked potential (VEP) and slow cortical potentials (SCP). Subjects using the BCI induce a brain activity pattern by following the experimental protocol for that particular BCI approach. The protocol followed by the subjects can be, for example, imagining movements or focusing on a visual cue of flashing characters on a screen. Motor imagery is a common paradigm used in BCI. It is a task in which a subject is cued to just imagine the movement of a specific limb, without actually executing it. EEG signals are recorded while the subject performs multiple MI tasks involving the hands, feet, tongue, etc. [11]. Movement or even preparation for movement leads to a decrease in mu and beta rhythms, which is called ERD. 
In contrast, mu and beta rhythms increase after the movement is completed, which is called ERS. ERD and ERS do not require actual movement; they also occur with mere imagination of movement, and hence can effectively be used for BCI. The efficiency of a BCI system depends on the choice of suitable algorithms for the implementation of its various stages [12]. It is important to choose a suitable classifier from the range of well-known classifiers such as linear discriminant analysis (LDA), support vector machine (SVM), fuzzy logic (FL), k-nearest neighbor algorithm (KNN) and artificial neural network (ANN) for EEG classification [13]. For multi-class classification, selection of a particular classifier is a critical issue in the BCI system [14]. Linear classifiers are generally preferred for EEG classification due to their low computational complexity and better stability [15]. They are also less prone to overfitting than nonlinear classifiers, especially when only a limited number of samples is available [16]. The main objective of BCI-based applications is to accurately translate the brain wave patterns extracted from the EEG signals into the desired machine commands. The objective of many studies is to enhance the accuracy with which the recorded EEG signals are interpreted [17]. SVM is a popular classifier for MI-based BCI systems. It establishes an optimum hyperplane that separates different classes with the largest possible margin [18]. It can implement multi-class classification and is robust to the curse of dimensionality. Selection of suitable kernel parameters in SVM is of paramount importance for obtaining accurate classification results [19]. The parameter values can be varied to adjust the decision boundary of the classifier [20]. 
This work presents selection of suitable kernel and setting optimal values of the kernel parameters to obtain the decision function, which enhances classification accuracy and overall performance of the MI-based BCI system. In this work, SVM with polynomial kernel (SVM-PK) approach is proposed for EEG signal classification in MI-based BCI system. The performance is improved by selecting the optimal values of the polynomial kernel, by using the grid-search method. These values are then varied to obtain better classification accuracy, which is evaluated by using K-fold cross-validation procedure. This work has improved the performance of MI-based BCI system by enhancing the classification accuracy of MI data, which is then compared with other methods executed on the same dataset, as reported in literature [21].

2 Related work

The goal of the classification stage in BCI is to automatically assign a class to the feature vector, which was extracted in the previous stage. It represents the mental task performed by the BCI user. Classification is obtained by executing algorithms called classifiers. The researchers have explored different methods to implement classifiers for identification of the class to which the feature vector belongs.

Garrett et al. [22] reported the results of implementing one linear classifier (LDA) and two nonlinear classifiers (NN and SVM) for the classification of spontaneous EEG signals recorded while subjects performed five mental tasks. They concluded that SVMs provide a powerful method for data classification, as they use machine learning and artificial intelligence (AI) for systematic exploration of the EEG feature classification.

Kamousi et al. [23] proposed a novel approach of using source analysis for classifying MI tasks. They proposed two-equivalent-dipoles analysis for classification of signals recorded from 15 channels from sensory motor area of four subjects. They used noise normalization, spatial filtering, time–frequency analysis and independent component analysis for preprocessing of these signals and reported 80% classification accuracy.

Pfurtscheller et al. [24] reported that phase information and adaptive classification can improve the performance of a BCI and also reduce its training time. They reported that by use of high harmonics features for classification, the performance of a four-class BCI system can be improved. They also demonstrated that feedback can modify sensory motor rhythms and recommended powerful algorithms to search for electrode placement locations.

Bhuvaneswari et al. [18] reviewed different kernel functions in SVM, which is a machine learning method for classification of EEG signals in MI-based BCIs. They used ICA for preprocessing and removing the artifacts, to improve signal-to-noise ratio. They discussed the important role of kernel function in nonlinear separable methods, while using SVM for classification of EEG signals.

Ilyas et al. [25] reviewed the selection of appropriate algorithms for preprocessing, feature extraction and feature classification in a BCI system. They have discussed their advantages, disadvantages and current trends of BCI research.

Mahmood et al. [16] considered mu and beta frequency ranges of recorded EEG signals for MI-based BCI system. They employed CSP for feature extraction and SVM for classification of these signals. They evaluated their approach on dataset IIIa of BCI competition III, and observed improvement in classification accuracy for online BCI systems.

Arbabi et al. [26] compared the effect of different types of selected features and classification algorithms for classifying brain signals in MI-based BCI systems. The results showed that statistical features and signal energy in different frequency bands are among the most appropriate features, which can be processed for implementing a BCI system.

Zhang et al. [27] introduced both cascade and parallel convolutional recurrent neural network models for estimating the intended movements by analysis of raw EEG data. They evaluated their performance on a large-scale movement intention EEG dataset fetched from 108 subjects, and investigated the influence of the spatio-temporal information on the performance of the proposed BCI system.

Lotte et al. [9] surveyed existing literature and reported that there is a need for validation of techniques on off-line as well as online BCI systems. They emphasized that calibration of such systems should improve their convenience and robustness against real-life noise in EEG signals. They suggested that the techniques used in BCI systems should be invariant over time, users and contexts. They recommended the use of a new generation of BCI classification methods that process human feedback, so that they can adapt to user states, traits and skills.

The authors in [21] have proposed a fuzzy logic system (FLS)-based approach for multi-class MI data classification. They fused the fuzzy system with particle swarm optimization (PSO) method for improving the classification performance. They used CSP algorithm in the feature extraction phase to extract relevant discriminant features from multi-class EEG data. The learning process of an FLS is computationally intensive. Hence, they reduced the computational expense of the multi-class FLS-based BCI system by application of PSO to reduce processing time. They cross-validated the performance of the proposed FLS method on benchmark data sets, and suggested studying more efficient feature extraction and selection methods in future research to improve the classification performance of a BCI system.

The author in [28] has presented a deep learning approach for classification of MI-based BCI using an adaptive method to determine the threshold. The widely used common spatial pattern (CSP) method is used to extract the variance-based CSP features, which is then fed to the deep neural network for classification. They presented a framework for use of deep neural network (DNN) for MI-BCI classification and evaluated the effectiveness of the proposed framework on dataset IVa of the BCI Competition III.

Fig. 1 Timing of the dataset 2a from BCI competition IV [31]

3 Dataset and methodology

The MI-based BCI paradigm is widely used in a variety of applications. It has shown potential for the rehabilitation of patients suffering from motor impairments. It can provide them with an alternative mechanism to communicate with the external world just by thinking of a motor task, without actually performing the movement. In this work, a publicly available benchmark EEG dataset from BCI competition IV is used to evaluate the classification accuracy of the proposed approach [29]. This dataset is extensively used by the BCI research community and contains four-class MI data, which is described in the next section.

3.1 Experimental paradigm

In this work, the data set 2a from the BCI competition IV [29] is used. This data set is publicly available for the research community and consists of EEG data recorded from nine subjects. Two sessions were recorded for a given subject on two different days. Each session consisted of six runs and each run consisted of 48 trials (12 trials for each motor imagery class). In each trial, a cue was shown on the screen instructing the subject to perform one of the four MI tasks using the left hand, right hand, both feet, or tongue movement [30].

A single session during the experiment consisted of 288 trials, 72 trials for each of the four MI tasks. Each trial started with a short sound (warning tone) and a fixation cross was shown on the computer screen. After 2 s, in the place of the fixation cross, a cue was shown (in a form of a small arrow) telling the subject to start the corresponding motor imagery task. After another 1.25 s, the arrow changed back to the fixation cross. The MI task is continued until the sixth second of the trial when the fixation cross disappeared. After that there was a short break where the screen was black again. The timing diagram of data acquisition is shown in Fig. 1.
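The trial structure described above can be checked with a short sketch. The sampling rate `FS_HZ = 250` is not stated in this section and is an assumption (it is the rate commonly quoted for this dataset), and the helper `mi_window_samples` is a hypothetical name used only for illustration.

```python
# Trial layout of one session, as described in the text.
RUNS_PER_SESSION = 6
TRIALS_PER_RUN = 48
N_CLASSES = 4

CUE_ONSET_S = 2.0   # the cue replaces the fixation cross after 2 s
MI_END_S = 6.0      # motor imagery continues until the sixth second

FS_HZ = 250  # ASSUMPTION: sampling rate, not given in this section


def trials_per_session():
    """Number of trials recorded in one session."""
    return RUNS_PER_SESSION * TRIALS_PER_RUN


def mi_window_samples(trial_start_sample, fs_hz=FS_HZ):
    """Return (first, last_exclusive) sample indices of the imagery period."""
    first = trial_start_sample + int(round(CUE_ONSET_S * fs_hz))
    last = trial_start_sample + int(round(MI_END_S * fs_hz))
    return first, last
```

Under these assumptions, each trial contributes a 4 s window (cue onset to the end of imagery) to the subsequent processing stages.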

4 Preprocessing to remove artifacts from EEG signals

The acquired data contain considerable noise from external and physiological sources. Hence, it is necessary to remove these artifacts from the EEG signals in the preprocessing stage, which influences the performance of the overall BCI system [3]. The recorded data contain undesired signals such as electrooculography (EOG), electromyography (EMG), electrocardiography (ECG) and power line noise signals. The methods used for preprocessing depend on the noise levels present in the raw signals as well as on the techniques used in further processing of the data.

Simple frequency-specific filtering techniques are not sufficient to remove these noise signals due to their overlapping spectral characteristics and the poor spatial resolution of EEG signals. Hence, sophisticated spatial filtering methods such as common spatial pattern (CSP), principal component analysis (PCA) and independent component analysis (ICA) are popularly used in the preprocessing stage to reduce these noise signals and improve the spatial resolution. This stage aims at cleaning and denoising the recorded digital data to enhance the relevant information embedded in the signals.

In this study, ICA is used to remove artifacts from EEG signals and isolate the required information from these signals. It is a computational method which separates a multichannel signal into components originating from different sources, based on their statistical independence [32]. It performs the separation by maximizing the statistical independence of the estimated components. ICA is applied to remove EOG, EMG and ECG artifacts from the acquired signals. The dataset used in this study contains 22 EEG channels, recorded from electrodes placed on the scalp of the subject, and 3 EOG channels. ICA is used to remove the artifacts related to eye movement that are captured by the three EOG channels [33].

4.1 Independent component analysis

ICA is often used for detection and removal of the eye, muscle, and line noise artifacts.

Fig. 2 Flowchart for ICA data decomposition and back projection

The EEG activity observed at different electrodes placed on the scalp overlaps and generates some redundant information. ICA is used to separate the artifacts acquired from multiple electrodes.

Applying ICA to a matrix of EEG scalp data yields an ‘unmixing’ matrix of weights (W). This matrix is then multiplied by the scalp data matrix to generate a matrix of independent component (IC) activities, as shown in Fig. 2. In this study, the EEGLAB toolbox is used, which provides an automated version of the infomax ICA algorithm.
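The decomposition and back projection described above can be illustrated with a small NumPy sketch. It does not estimate W (this study relies on the infomax implementation in EEGLAB for that); instead it assumes that an unmixing matrix is already available and shows only how artifact components are zeroed and projected back to the channels. The function name `remove_components` and the toy mixing matrix are hypothetical.

```python
import numpy as np


def remove_components(X, W, artifact_idx):
    """Zero selected independent components and back-project to channels.

    X: (n_channels, n_samples) scalp data
    W: (n_channels, n_channels) unmixing matrix, so that S = W @ X
    artifact_idx: indices of the components judged to be artifacts
    """
    S = W @ X                        # independent component activities
    S_clean = S.copy()
    S_clean[artifact_idx, :] = 0.0   # discard the artifact components
    A = np.linalg.inv(W)             # mixing matrix used for back projection
    return A @ S_clean


# Toy example with a known mixing matrix (hypothetical values).
rng = np.random.default_rng(0)
sources = rng.laplace(size=(3, 1000))      # three independent sources
A_true = np.array([[1.0, 0.5, 0.2],
                   [0.3, 1.0, 0.4],
                   [0.2, 0.6, 1.0]])
X = A_true @ sources                       # simulated channel data
W = np.linalg.inv(A_true)                  # ideal unmixing matrix

X_clean = remove_components(X, W, artifact_idx=[2])
```

In this toy setting the cleaned data contain only the contributions of the first two sources, which is the effect that removing an ocular component is intended to have on real recordings.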

4.2 Feature extraction

The successful classification of MI tasks depends on the successful extraction of the required features from the EEG signals. CSP is a popular feature extraction method for MI-based BCI systems. It finds linear subspaces such that the variance of the projected signals of one class is maximized while, simultaneously, the variance of the other class is minimized. The optimal spatial filters are obtained by simultaneous diagonalization of the two covariance matrices, which are calculated from the two classes of EEG signals. In the first step, the normalized covariance matrix of the EEG signal E in each trial is calculated as:

$$\begin{aligned} R = \frac{EE'}{\mathrm{trace}(EE')}, \end{aligned}$$
(1)

where E denotes an \(n \times t\) matrix, n is the number of channels and t is the number of samples. The covariance matrices are averaged over the trials within each class to give \({M}_a\) and \({M}_b\), which are added to produce the composite covariance matrix \({M}_c={M}_a+{M}_b\). The eigenvectors \({E}_c\) and eigenvalues \(\lambda \) of this covariance matrix yield the whitening transform

$$\begin{aligned} W = \lambda ^{-1/2} E'_{c}, \end{aligned}$$
(2)

where \({M}_{c} = {E}_{c} \lambda E'_{c}.\) Then, \({M}_a\) and \({M}_b\) are transformed by

$$\begin{aligned} S_a = WM_aW' , S_b = WM_bW'. \end{aligned}$$
(3)

The matrices \({S}_a\) and \({S}_b\) share the same eigenvectors, so that \({S}_a = U \psi _aU'\) and \({S}_b = U \psi _bU'\). Here U is the matrix of common orthonormal eigenvectors of \({S}_a\) and \({S}_b\), and \(\psi _a\) and \(\psi _b\) are the diagonal matrices of eigenvalues, which satisfy \(\psi _a+\psi _b=I\), so that corresponding eigenvalues add up to 1.

Fig. 3 Filter bank common spatial pattern [34]

Subsequently, both classes a and b are projected onto the first eigenvector \({U}_1\), for which class a yields the maximal variance and class b the minimal variance. In contrast, when both classes are projected onto the last eigenvector \({U}_N\), class a attains the minimal variance, whereas class b attains the maximal variance. In implementation, only a few of the eigenvectors are selected, \(U^{*}=[{U}_1,\ldots ,{U}_m,{U}_{N-m+1},\ldots ,{U}_N],\) where the value of m is low (\(m\ll N\)). The final projection matrix is

$$\begin{aligned} P={U^{*}}' W. \end{aligned}$$
(4)

Each trial is projected as \(Z=PE\), which reduces the dimension of the original signals to 2m. The features are computed as:

$$\begin{aligned} f_p=\log \left( \frac{\mathrm{var}(Z_p)}{\sum _{q=1}^{2m} \mathrm{var}(Z_q)}\right) , \quad p=1,\ldots ,2m, \end{aligned}$$
(5)

where \(Z_p\) is the pth row of Z. The logarithmic transformation is used so that the elements of f are approximately normally distributed.
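The CSP steps in Eqs. 1 to 5 can be sketched with NumPy as follows. This is a minimal two-class illustration under stated assumptions, not the implementation used in this study: the function names are hypothetical, the eigenvalues are sorted so that the first eigenvector corresponds to the largest variance of class a, and the toy trials are synthetic.

```python
import numpy as np


def normalized_cov(E):
    """Eq. 1: R = E E' / trace(E E') for one trial E (channels x samples)."""
    C = E @ E.T
    return C / np.trace(C)


def csp_projection(trials_a, trials_b, m=1):
    """Return the (2m x n_channels) projection matrix of Eq. 4."""
    M_a = np.mean([normalized_cov(E) for E in trials_a], axis=0)
    M_b = np.mean([normalized_cov(E) for E in trials_b], axis=0)
    lam, E_c = np.linalg.eigh(M_a + M_b)        # M_c = E_c diag(lam) E_c'
    W = np.diag(lam ** -0.5) @ E_c.T            # whitening transform, Eq. 2
    S_a = W @ M_a @ W.T                         # Eq. 3
    psi_a, U = np.linalg.eigh(S_a)              # S_a = U diag(psi_a) U'
    U = U[:, np.argsort(psi_a)[::-1]]           # largest class-a variance first
    U_sel = np.hstack([U[:, :m], U[:, -m:]])    # first m and last m eigenvectors
    return U_sel.T @ W


def csp_features(E, P):
    """Eq. 5: log of the normalized variances of the projected trial."""
    Z = P @ E
    v = np.var(Z, axis=1)
    return np.log(v / v.sum())


# Synthetic trials: class a is strong on channel 0, class b on channel 3.
rng = np.random.default_rng(0)
scale_a = np.array([3.0, 1.0, 1.0, 1.0])[:, None]
scale_b = np.array([1.0, 1.0, 1.0, 3.0])[:, None]
trials_a = [scale_a * rng.standard_normal((4, 500)) for _ in range(20)]
trials_b = [scale_b * rng.standard_normal((4, 500)) for _ in range(20)]

P = csp_projection(trials_a, trials_b, m=1)
f_a = csp_features(trials_a[0], P)
f_b = csp_features(trials_b[0], P)
```

For a class a trial the first feature dominates, and for a class b trial the second one does, which is the separation that the subsequent classifier exploits.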

This work focuses on the motor and sensorimotor rhythms, which fall in the frequency band of 8–30 Hz, by using a band-pass filter to select the relevant band. The patterns of MI signals are distinguished by temporal/spectral and spatial filters. Hence, their optimization has a direct impact on the performance of a BCI system. Feature extraction is implemented by using FBCSP [34].

The FBCSP algorithm is illustrated in Fig. 3. It consists of signal processing stages followed by a machine learning procedure applied to the processed EEG data. It first filters the EEG signals into multiple frequency bands using a filter bank of band-pass filters covering the range of 0.5–40 Hz. Spatial filters are then optimized for each frequency band using the classic CSP algorithm. Finally, among the multiple spatial filters obtained, the most discriminative features are selected using the mutual information-based best individual feature (MIBIF) selection algorithm. The MIBIF method selects both the best spectral and spatial filters, as each feature corresponds to a single frequency band and CSP spatial filter. It calculates the mutual information between each feature and the class labels and arranges the features in decreasing order of mutual information. The top k features are selected from this list for the next stage of classification.
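The ranking step of MIBIF can be sketched as follows. The sketch assumes a simple histogram estimate of the mutual information between each feature and the class label, which is a simplification chosen for illustration and not necessarily the estimator used in [34]; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def mutual_information(feature, labels, n_bins=10):
    """Histogram estimate of I(feature; label) in nats."""
    edges = np.histogram_bin_edges(feature, bins=n_bins)
    f_bin = np.digitize(feature, edges[1:-1])       # bin index 0..n_bins-1
    classes = np.unique(labels)
    joint = np.zeros((n_bins, classes.size))
    for c_idx, c in enumerate(classes):
        joint[:, c_idx] = np.bincount(f_bin[labels == c], minlength=n_bins)
    joint /= joint.sum()                            # joint probability table
    p_f = joint.sum(axis=1, keepdims=True)          # marginal over feature bins
    p_y = joint.sum(axis=0, keepdims=True)          # marginal over classes
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (p_f @ p_y)[nz])))


def select_top_k(features, labels, k):
    """Rank feature columns by mutual information and keep the best k."""
    scores = np.array([mutual_information(features[:, j], labels)
                       for j in range(features.shape[1])])
    order = np.argsort(scores)[::-1]                # decreasing order
    return order[:k], scores


# Synthetic example: only column 1 carries class information.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
features = rng.standard_normal((200, 3))
features[:, 1] += 3.0 * labels
top, scores = select_top_k(features, labels, k=1)
```

In FBCSP each column would correspond to one pair of frequency band and CSP spatial filter, so keeping the top k columns selects spectral and spatial filters jointly.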

4.3 Classification

Various classification algorithms can be used for a BCI system. The choice of such a classification algorithm depends on many factors including the BCI paradigm used and type of recorded input data. The efficiency of the classifier has a critical effect on the performance and accuracy attained by the BCI system. In this work, linear classifiers are analyzed. These types of classifiers use linear functions to demarcate different MI classes. LDA and SVM are two main linear classifiers used in MI-based BCIs. The LDA technique has a very low computational requirement which makes it suitable for an online BCI system. SVM is efficient for synchronous BCI due to its regularization property and immunity to the curse-of-dimensionality problem. In this paper, the performance of classifiers is analyzed and compared on dataset 2a of BCI Competition IV [15].

Fig. 4 Separation of nonlinear data points [35]

4.4 Support vector machine

Support vector machine is a popular classifier for MI-based BCI systems to classify EEG signals. It establishes a hyperplane separating all data points belonging to one class from the ones belonging to other classes. It creates decision boundaries by using support vectors. It separates different classes by mapping the data to a higher-dimensional space. It tries to maximize the margins by using a kernel function [36]. It allows parameter adjustment to improve the classification rate. The values of the degree of the kernel and regularization parameter, represented as d and C, respectively, are chosen to adjust the balance between algorithmic complexity and number of non-separable points.

SVM maximizes the margin of separation between the classes while minimizing the classification error of the data points, which is measured by the respective slack variables, as represented in Fig. 4. The SVM for a k-class problem with n training points can be represented as a minimization of

$$\begin{aligned} Q(w,b,\xi )=\frac{1}{2}\Sigma ^{k}_{j=1}w^T_jw_j+C\Sigma ^{n}_{i=1}\Sigma ^k_{j\ne {y_i};{j=1}}\xi _{ij}, \end{aligned}$$
(6)

subject to

$$\begin{aligned} \begin{aligned}&w^T_{y_i}\phi (x_i)+b_{y_i} \ge w^T_j\phi (x_i)+b_j+1-\xi _{ij}, \\&\xi _{ij}\ge 0,\quad i=1,2,\ldots ,n \ \mathrm{and} \ j\in \{1,2,\ldots ,k\},\ j\ne y_i, \end{aligned} \end{aligned}$$
(7)

where \({x}_i\) is the feature vector of the ith data point, \({y}_i\) is the class of the ith data point, \(\phi \) is the mapping to the higher-dimensional feature space, \(\xi _{ij}\) is the slack variable as a measure of error, and C is the regularization parameter for balancing error minimization and margin maximization. Figure 5 shows the slack variables for individual classes and the formulation of the classification problem.

Fig. 5 SVM error representation [35]

The SVM classifier tries to reduce the values of the \(k \times n\) slack variables while maximizing the k margins [35]. The multi-class classification function is \(\mathrm{argmax}_{j=1, \ldots , k}\,({w}^{T}_{j}\phi (x)+b_j)\), so that a data point x is assigned to the class j with the highest classification score. The constrained problem in Eqs. 6 and 7 is converted into an equivalent unconstrained formulation by introducing the Lagrange multipliers \(\alpha _{ij}\) and \(\beta _{ij}\):

$$\begin{aligned} \begin{aligned} Q(w,b,\xi ,\alpha ,\beta )&=\frac{1}{2}\Sigma _{j=1}^kw_j^Tw_j +C\Sigma _{i=1}^n\Sigma ^k_{j\ne y_i;j=1}\xi _{ij}\\&\quad -\Sigma _{i=1}^n\Sigma _{j \ne y_i;j=1}^k\alpha _{ij}((w_{y_i}-w_j)^T\phi (x_i)+b_{y_i}-b_j-1+\xi _{ij})\\&\quad -\Sigma _{i=1}^n\Sigma _{j \ne y_i;j=1}^k\beta _{ij}\xi _{ij}\\&=\frac{1}{2}\Sigma _{j=1}^kw_j^Tw_j -\Sigma _{i=1}^n \Sigma _{j=1}^kz_{ij}(w^T_j\phi (x_i)+b_j) +\Sigma _{i=1}^n\Sigma _{j \ne y_i;j=1}^k\alpha _{ij}\\&\quad -\Sigma _{i=1}^n\Sigma _{j \ne y_i;j=1}^k(\alpha _{ij}+\beta _{ij}-C)\xi _{ij}, \end{aligned} \end{aligned}$$
(8)

where

$$\begin{aligned} z_{ij}=\Sigma _{l\ne {y_i};l=1}^k\alpha _{il} \quad \mathrm{for}\ j=y_i; \end{aligned}$$
(9)

otherwise,

$$\begin{aligned} z_{ij}=-\alpha _{ij}, \end{aligned}$$
(10)

and the conditions for optimality are:

$$\begin{aligned} \alpha _{ij}((w_{y_i}-w_j)^T\phi (x_i)+b_{y_i}-b_j-1+\xi _{ij})=0 \end{aligned}$$
(11)

for

$$\begin{aligned}&j\ne y_i,\ j=1,\ldots ,k,\ i=1,\ldots ,n, \end{aligned}$$
(12)
$$\begin{aligned}&\beta _{ij}\xi _{ij}=0 \,\mathrm{for}\, j\ne y_i,\ j=1,\ldots ,k,\ i=1,\ldots ,n, \end{aligned}$$
(13)

in addition to \({Q}{(w,b,\xi ,\alpha ,\beta )}\) being minimized with respect to \({w,b,\xi }\) (derivatives equal to zero). The dual formulation is obtained by substituting these conditions into Eq. 8 and using the kernel function \(K(x,y)=\phi (x)^T\phi (y)\). The dual formulation is to maximize

$$\begin{aligned} Q(\alpha )= \Sigma _{i=1}^n\Sigma _{j\ne y_i;j=1}^k\alpha _{ij} -\frac{1}{2}\Sigma _{i,l=1}^n\Sigma _{j=1}^kz_{ij}z_{lj}K(x_i,x_l), \end{aligned}$$
(14)

subject to

$$\begin{aligned}&\Sigma _{i=1}^nz_{ij}=0 \,\mathrm{for}\, j=1,\ldots ,k, \end{aligned}$$
(15)
$$\begin{aligned}&0\le \alpha _{ij}\le C \,\mathrm{for}\, j\ne y_i,\ j=1,\ldots ,k,\ i=1,\ldots ,n. \end{aligned}$$
(16)

Finally, the decision function for class j is given by:

$$\begin{aligned} f_j(x)=\Sigma _{i=1}^nz_{ij}K(x_i,x)+b_j, \end{aligned}$$
(17)

and the data point x is assigned to the class \(\mathrm{argmax}_{j=1,\ldots ,k}\,{f}_j(x)\).
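The decision rule of Eq. 17 can be sketched with NumPy as follows. The coefficients `z`, the biases `b` and the training points are hypothetical values chosen by hand (with the columns of `z` summing to zero, as required by Eq. 15) rather than the result of training, and a linear kernel is assumed for simplicity.

```python
import numpy as np


def decision_scores(x, X_train, z, b, kernel):
    """Eq. 17: f_j(x) = sum_i z_ij K(x_i, x) + b_j for every class j."""
    k_vec = np.array([kernel(x_i, x) for x_i in X_train])   # shape (n,)
    return z.T @ k_vec + b                                    # shape (k,)


def predict(x, X_train, z, b, kernel):
    """Assign x to the class with the highest decision score."""
    return int(np.argmax(decision_scores(x, X_train, z, b, kernel)))


def linear_kernel(u, v):
    return float(np.dot(u, v))


# Hypothetical three-class example with one training point per class.
X_train = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
z = np.array([[1.0, -0.5, -0.5],
              [-0.5, 1.0, -0.5],
              [-0.5, -0.5, 1.0]])
b = np.zeros(3)

x_new = np.array([2.0, 0.1])
scores = decision_scores(x_new, X_train, z, b, linear_kernel)
label = predict(x_new, X_train, z, b, linear_kernel)
```

Replacing `linear_kernel` with another kernel function changes only the entries of `k_vec`; the argmax rule itself stays the same.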

Fig. 6 Flowchart of the proposed approach

The memory requirement and processing time affect the performance of an optimization technique. \(\hbox {SVM}^\mathrm{light}\) is an implementation of the SVM classifier. Its efficiency is enhanced by reducing its training time and by suitable selection of kernel parameters.

Abe et al. [37] improved the formulation by including the bias term. The optimization problem for the formulation with n slack variables is given by Eq. 18.

$$\begin{aligned} Q(w,b,\xi )=\frac{1}{2}\Sigma _{j=1}^kw_j^Tw_j+C\Sigma _{i=1}^n\xi _i \end{aligned}$$
(18)

subject to

$$\begin{aligned} \begin{aligned}&(w_{y_i}^T - w^T_j) \phi (x_i)+b_{y_i}-b_j \ge 1 - \xi _{i}, \quad \xi _{i} \ge 0, \quad i=1,\ldots ,n, \\&\quad \mathrm{and}\ j\in \{1,\ldots ,k\},\ j \ne y_i. \end{aligned} \end{aligned}$$
(19)

The dual formulation is to maximize

$$\begin{aligned} Q(\alpha )=\Sigma ^n_{i=1}\xi _i-\frac{1}{2}\Sigma ^n_{i,l=1}\Sigma ^k_{j=1}Z_{ij}Z_{lj}\alpha _i\alpha _lK(x_i,x_l) \end{aligned}$$
(20)

subject to

$$\begin{aligned}&\Sigma ^n_{i=1}Z_{ij}\alpha _i=0 \,\mathrm{for}\, j=1,\ldots ,k, \end{aligned}$$
(21)
$$\begin{aligned}&0 \le (n-1) \alpha _i \le C \,\mathrm{for}\, i=1,\ldots ,n. \end{aligned}$$
(22)

The class j decision function is given by

$$\begin{aligned} f_{j}(x)=\Sigma _{i=1}^nZ_{ij}\alpha _iK(x_i,x)+b_j. \end{aligned}$$
(23)

The regularization parameter (C) limits the values of the learned weights, as shown in Eq. 22. It performs a balancing act between margin maximization and slack minimization [35].

In this paper, \(\hbox {SVM}^\mathrm{light}\) is used as the classifier for multi-class MI EEG signals [38]. Its polynomial kernel parameters are varied over a range of values to attain better performance [39]. Parameter selection plays an important role in obtaining accurate classification results [19], since the parameter values have a direct effect on the decision boundary of the classifier [20]. This work focuses on selecting the kernel and then optimizing the values of its parameters to improve the discriminative capability of the decision function, which improves the classification accuracy and overall performance of an MI-based BCI system. \(\hbox {SVM}^\mathrm{light}\) [40] is an implementation of SVM; it is used in this work as it is designed for optimization problems.

Table 1 Classification accuracy
Fig. 7 Classification accuracy

Table 2 Misclassification rate on dataset 2a of BCI competition IV
Fig. 8 Misclassification rate

Table 3 Comparison of related work in literature
Table 4 Classification accuracy of the proposed approach and existing approaches [21] for BCI IV dataset 2a
Fig. 9 Classification accuracy

4.4.1 Parameters selection

Parameter optimization of the selected kernel has a significant effect on the efficiency of the SVM classifier for multi-class EEG signals. In this paper, the polynomial kernel is selected due to its generalization capability. The degree of polynomial kernel characterizes the decision boundary. Eq. 24 represents the achieved decision function.

$$\begin{aligned} f_j(x)=\Sigma _{i=1}^nZ_{ij}\alpha _iK(x_i,x)+b_j, \end{aligned}$$
(24)

where \(b_j\) represents the bias term of class j, \({x}_i\) is the ith feature vector and n is the number of feature vectors. The coefficients \(\alpha _i\) are the parameters that determine the decision boundaries and \(K({x}_i,x)\) is a kernel function. The polynomial kernel function is represented as \(K(a,b) = {(s\,a\cdot b+c)}^{d}\), where a and b are the two feature vectors being compared (here \({x}_i\) and x), s is a scaling factor, c is an offset and d is the degree of the polynomial. Owing to its flexibility, the kernel can separate multiple classes with a significant margin [41]. C is a regularization parameter, representing the trade-off between maximization of the margin and the error on the training dataset. The grid search method with cross-validation is executed to evaluate different values of C from a wide range, in order to maximize the average classification accuracy.
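The polynomial kernel defined above can be written directly in NumPy. The default values `s=1.0` and `c=1.0` are assumptions made for this illustration, since this section does not report them; `d=3` follows the degree used in the experiments, and the function names are hypothetical.

```python
import numpy as np


def polynomial_kernel(a, b, s=1.0, c=1.0, d=3):
    """K(a, b) = (s * <a, b> + c) ** d for two feature vectors."""
    return (s * np.dot(a, b) + c) ** d


def gram_matrix(X, s=1.0, c=1.0, d=3):
    """Kernel values for all pairs of rows of X (n_samples x n_features)."""
    return (s * (X @ X.T) + c) ** d


a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
k_ab = polynomial_kernel(a, b)            # (1*3 + 2*4 + 1) ** 3 = 1728.0
G = gram_matrix(np.vstack([a, b]))
```

The Gram matrix `G` is symmetric and its off-diagonal entry equals `k_ab`, so a kernel classifier needs only these pairwise values and never the explicit higher-dimensional mapping.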

5 Proposed approach

This work proposes an SVM-based approach for multi-class EEG signal classification, as shown in Fig. 6. Appropriate methods are used to implement the various stages of the BCI system. ICA is used for signal preprocessing to remove noise and artifacts from the acquired signals. The FBCSP method, which is a variant of CSP, is used for feature extraction and selection. The selected features are then processed by the classifier. Signal classification is performed by SVM with an appropriately chosen kernel, and the performance is enhanced by optimizing the parameters of its polynomial kernel. The optimal parameter values are searched using the two-step grid search method. The proposed approach (SVM-PK) is executed on dataset 2a of BCI competition IV, and its performance is evaluated by using the fivefold cross-validation procedure.

6 Results and discussion

The proposed approach is implemented on publicly available dataset 2a of BCI competition IV, and its performance is evaluated. The ICA was used for preprocessing to remove the artifacts. In this paper, FBCSP and SVM-PK are used for feature selection and classification, respectively, for MI-based BCI systems. The performance of the proposed method in terms of different statistical measures, such as classification accuracy and misclassification rate, is evaluated.

In the classifier stage, different values are assigned to the parameters of the polynomial kernel to improve the classification accuracy. The degree of the polynomial kernel (d) was set to 3, while the regularization parameter (C) was varied by coarse grid search in steps of 10 from 0.1 to 100, i.e., [0.1, 10, 20, …, 90, 100]. A classification accuracy of 0.664 was attained at all these values of C. To improve this further, a fine grid search was used to find the optimal value of C by searching smaller values in the neighborhood of 0.1, which attained a better average classification accuracy of 0.67 for \(C=0.001\) and 0.669 for \(C=0.01\), as shown in Table 1. Thus, the classification accuracy was enhanced by finding the optimal values of the C and d parameters of the polynomial kernel.
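The two-step search over C can be sketched in plain Python. The callable `cv_accuracy` is a hypothetical placeholder for a routine that trains the SVM with a given C and returns its average cross-validated accuracy; in the usage below it merely replays the average accuracies reported in the text instead of training a classifier, and the fine grid values are limited to those mentioned there.

```python
def coarse_grid():
    """C values 0.1, 10, 20, ..., 100 used in the coarse step."""
    return [0.1] + [float(c) for c in range(10, 101, 10)]


def fine_grid():
    """Smaller C values around 0.1 (only the values named in the text)."""
    return [0.001, 0.01, 0.1]


def grid_search(c_values, cv_accuracy):
    """Return (best_C, best_accuracy) over the given candidate values."""
    scored = [(cv_accuracy(c), c) for c in c_values]
    best_acc, best_c = max(scored, key=lambda pair: pair[0])
    return best_c, best_acc


def two_step_search(cv_accuracy):
    """Coarse search first, then a fine search; keep the better result."""
    c1, acc1 = grid_search(coarse_grid(), cv_accuracy)
    c2, acc2 = grid_search(fine_grid(), cv_accuracy)
    return (c2, acc2) if acc2 > acc1 else (c1, acc1)


# Stand-in scorer that replays the averages reported above (not a trained model).
reported = {0.001: 0.67, 0.01: 0.669}
best_c, best_acc = two_step_search(lambda c: reported.get(c, 0.664))
```

With a real scorer, `cv_accuracy` would run the fivefold cross-validation for each candidate value, so the cost of the search grows with the number of grid points times the number of folds.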

Figure 7 shows the classification accuracy for the nine subjects with different C values of the kernel. The misclassification rate was reduced to 0.329, as shown in Table 2 and Fig. 8.

Table 3 shows that the classification accuracy of the proposed approach is higher than that attained by other approaches reported in the literature.

Classification algorithms of LDA, KNN, NB, Ensemble, FLS and SVM are evaluated, and their performance is reported in literature [21]. Their reported performance on dataset 2a of BCI competition IV is compared with our proposed approach as shown in Table 4 and Fig. 9. It is also compared with classification methods reported in [42], in which the authors have used SVM, NBPW, NBPW with FBCSP and PPTSVM as classifier methods. It is shown that the proposed approach of SVM-PK offers improved classification accuracy for subjects 1, 3, 7, 8 and 9, while the overall average accuracy improved significantly.

7 Conclusion and future scope

In this paper, an SVM with polynomial kernel approach is proposed for the classification of multi-class MI EEG signals. The performance of the proposed approach was evaluated on dataset 2a from BCI competition IV. In the preprocessing stage, ICA is employed for removing artifacts from the acquired EEG signals. The next stage of feature extraction and feature selection is implemented by the FBCSP method. The selected features are then provided to the classifier. Signal classification is performed by SVM with a polynomial kernel, and its parameters are varied to search for their optimal values using the grid search method. The performance is evaluated using a fivefold cross-validation process, which supports the reliability of the obtained results. The proposed approach attains an average classification accuracy of 0.67, which is higher than that of other approaches executed on the same dataset, as reported in literature. By improving the classification accuracy, it enhances the reliability and safety of a BCI system for rehabilitation. Future research can investigate different techniques for optimizing classifier parameters to further improve the efficiency of a BCI system, and can explore other methods for feature extraction and feature selection.