Introduction

For people with neurological disorders, brain–computer interfaces (BCIs) provide a potential way to establish communication and restore lost motor functions by translating brain signals into device commands. BCI has garnered much interest among researchers due to its practical applications in computers, virtual gaming, assistive appliances, speech synthesizers, and neural prostheses [1,2,3,4,5]. Several modalities have been used to measure brain signals, such as magnetoencephalography (MEG), functional magnetic resonance imaging (fMRI), positron emission tomography (PET), and electroencephalography (EEG). Among these, EEG-based BCI is the most widely used for analysis of brain signals due to its non-invasive nature, low measurement cost and high temporal resolution. In EEG-based BCI, electrodes are placed on the scalp to capture the electrical signals generated by the neuronal activity of the brain for the purpose of communication. Major EEG-based BCI paradigms include P300, visually evoked potentials, and sensorimotor rhythms (motor imagery) [6]. Among these, motor imagery based BCIs, which involve imagining the movement of a specific body part, have received particular attention [7, 8]. Motor imagery BCIs use variations in sensorimotor rhythms (μ and β rhythms) to translate brain signals into control commands [9]. These variations are detected over the sensorimotor cortex and are induced by execution or imagination of hand or leg movement [6]. The amplitude of sensorimotor rhythms decreases during motor imagination or execution, a phenomenon known as Event-Related Desynchronization (ERD). The subsequent increase in the amplitude of sensorimotor rhythms immediately after the execution or imagination of movement is called Event-Related Synchronization (ERS) [6, 10,11,12].

These brain signals undergo volume conduction, which results in weak spatial resolution [6, 11]. To analyze single-trial EEG, a BCI system is tuned to subject-specific characteristics by computing data-dependent spatial filters. The common spatial patterns (CSP) method [2, 13, 14], a data-driven spatial filtering technique, is widely used in motor imagery BCI. CSP aims to find spatial filters from the multichannel EEG signal that maximize the variance of one class while minimizing the variance of the other class [2, 6, 14]. The method is computationally simple and reflects the specific activation of cortical areas by assigning weights to the electrodes according to their importance. It also reduces the dimension of the data.

The major drawback of the CSP method is its sensitivity to the presence of artifacts in raw EEG data and to the non-stationary nature of EEG signals. Further, CSP suffers from the small sample size problem, as the number of task-related trials is small compared to the number of electrodes [15]. The covariance matrices used in CSP are of size N × N, where N is the number of electrodes, so their estimation involves O(N²) parameters. If these covariance matrices are estimated from a relatively small number of trials, the presence of a single trial contaminated with artifacts may prevent an appropriate CSP filter from being obtained. In such situations, CSP suffers from overfitting [16, 17] and thus leads to poor performance. A variant of CSP, stationary CSP (SCSP), has been suggested in the literature [17] that can handle the artifacts and non-stationarity of EEG data.

However, neither CSP nor SCSP, in their original formulations, considers the spectral information of the signal when deriving the spatial filters. It has been pointed out in the literature [13, 14] that a specific set of frequency bands helps in discriminating two different motor imagery tasks. EEG frequency ranges are conventionally defined according to their distribution over the scalp or their biological significance [6, 18]. These frequency bands are referred to as delta (1–4 Hz), theta (4–7 Hz), alpha (7–12 Hz), beta (12–30 Hz), and gamma (30–40 Hz). Most of the relevant information in motor imagery signals lies in the mu (7–12 Hz) and beta (12–30 Hz) bands, i.e., roughly in the 7–30 Hz range [2]. CSP spatial filters are thus applied to EEG signals filtered in these relevant frequency bands (mu and beta) to optimize the performance of a motor imagery BCI. However, frequency subbands other than mu (7–12 Hz) and beta (12–30 Hz) may be more relevant for distinguishing motor imagery tasks. Manual tweaking, as well as exhaustive search, can help in determining the best frequency bands, but this is computationally intensive. It is thus desirable to automatically find optimal subject-specific frequency bands that relate to the brain activities associated with motor imagery tasks in order to achieve higher accuracy.

In the literature, several research works [19,20,21] have suggested methods to determine spatial filters from a predefined filter bank of fixed-size non-overlapping frequency subband filters. Working in this direction, Novi et al. (2007) proposed subband CSP (SBCSP) [19], which extracts features by applying CSP to data filtered through different fixed-size subbands, followed by linear discriminant analysis to distinguish motor imagery tasks. The subband features were ranked using the SVM-based recursive feature elimination (RFE) method. In the filter bank CSP (FBCSP) method [20], a maximal mutual information criterion was used to select optimal spatio-temporal filters from data filtered through different fixed-size frequency bands. Both SBCSP and FBCSP employ a manual setting of fixed-size (4 Hz bandwidth) frequency subbands in the range of 4–40 Hz. Thus, other relevant subbands possibly present in the given frequency range are not explored in these two methods, which may lead to poor performance.

The efficacy of CSP/SCSP also depends on the choice of the time segment of the EEG taken relative to the visual cue presented to the subject [22, 23]. Typically, a time segment starting 1 s after the cue is taken for the computation of CSP/SCSP spatial filters. However, the generation of motor imagery related EEG rhythms varies with the subject involved, and the relevant time segment cannot be identified manually [22]. Thus, there is a need to identify subject-specific and task-related frequency filters and the relevant time segment of EEG data for better classification of motor imagery tasks.

To determine subject-specific optimal frequency bands, the research work [21] proposed the combined variable sized common spatial patterns (CVSCSP) method, which generates a filter bank of variable-sized frequency subbands. However, CVSCSP is not able to detect the irregularities in the performance of a given subject that arise from the use of an irrelevant time segment of a trial. Further, it cannot handle the artifacts and non-stationarity of the signal. In this paper, we propose a modified version of CVSCSP that is more robust to artifacts and utilizes relevant temporal features. In the proposed method, in order to capture the relevant temporal features for a given subject, we segment the data from each trial into three different overlapping time segments. The data from each time segment is then bandpass filtered using the variable-sized frequency subbands. Spatial features are extracted separately from the bandpass-filtered data of each time segment using SCSP, which handles artifacts and non-stationarity. Finally, the extracted features are combined to form a high-dimensional feature vector. Thus, the proposed model is able to take advantage of the temporal, spatial and spectral information of the data simultaneously. However, the high-dimensional feature vector obtained may contain irrelevant features. In order to obtain a relevant subset of features, univariate feature selection is used to rank the obtained features. We have investigated four well-known univariate feature selection methods for this purpose.

The proposed method involves four phases. In the first phase, we segment the raw data into overlapping time segments and generate a filter bank of variable-sized frequency subbands to filter the data. In the second phase, a combination of SCSP and linear discriminant analysis is used to compute features from the filtered data. The obtained features are ranked using a univariate feature selection method in the third phase. Finally, in the fourth phase, a classification model is learnt using the ranked features. We have also performed the Friedman statistical test [24] to determine the statistical difference between the proposed method and the existing methods, i.e., CSP, SBCSP, FBCSP and CVSCSP.

The major contributions of this paper include:

  1. The proposed method utilizes relevant temporal, spectral and spatial information to distinguish motor imagery tasks.

  2. Four univariate filter feature selection methods are investigated to find a reduced subset of relevant features for motor imagery task classification.

  3. The performance of the proposed method is compared with existing methods on two publicly available datasets. The Friedman statistical test is employed to show that the proposed method statistically significantly outperforms the existing methods.

The rest of the article is organized as follows: the “Related works” section reviews related research on motor imagery BCI. In the “Combined variable sized subband and temporal filter based stationary common spatial patterns (CVSTSCSP)” section, we discuss the proposed method. The “Experimental data and results” section describes the experimental setup and results. Finally, in the “Conclusion and future directions” section, the conclusion of the article along with some future directions is presented.

Related works

CSP is one of the primary spatial filtering techniques used in the area of motor imagery BCI. However, CSP performs poorly due to problems such as the non-stationarity of EEG signals; artifacts generated from eye movements, electromyographic activity or other muscular movements; irrelevant frequency filtering; etc. [2, 17]. CSP variants such as common spatio-spectral pattern (CSSP) and common sparse spectral spatial pattern (CSSSP) include time-delay embedding to optimize spectral filters simultaneously with the optimization of CSP filters. These methods are able to overcome some of the limitations faced by CSP. However, due to multiple time-delay embeddings and regularization of classifier parameters, the space and time complexity of these techniques is quite high [25]. Another variant of CSP, the stationary common spatial pattern (SCSP), has been proposed to reduce the effect of the non-stationary characteristics of the EEG signal by introducing a penalty term in the CSP objective function [17]. Further in this direction, to account for the non-stationary and variable nature of EEG signals, non-homogeneous spatial filters (distinct frequency- and time-dependent spatial filters) have been used [26]. In similar research works [23, 27], spatial and spatio-spectral filters are estimated in a generalized CSP framework using an optimization constraint and a specific target function to improve classification performance and reduce the instability caused by the non-stationarity of EEG signals. Subject-transfer based composite local temporal correlation CSP has been proposed in [9] to deal with noise and inter-subject variability using local temporal covariance matrices and a composite subject-based approach. All these methods perform simultaneous optimization of spatial or spectral filters within the CSP optimization criterion.

On the other hand, instead of simultaneously optimizing a spectral filter within CSP, some other variants of CSP select significant features from multiple frequency bands to improve classification performance. Subband CSP [19] extracts spatial-filter features from a non-overlapping fixed-size frequency subband filter bank and uses LDA score fusion for classification. SBCSP uses SVM-based RFE feature selection to remove irrelevant subband features. In the research work [20], mutual information was used as a feature selection criterion that considers the nonlinear correlation between the features from different frequency bands and the class variable. In another research work [28], sparse filter band CSP was proposed, which uses overlapping fixed-size subbands of a frequency band to optimize CSP feature selection with the lasso estimate. In a similar research direction, spatio-spectral filters are optimized in [29] with the aim of minimizing the Bayesian classification error while maximizing the mutual information among the frequency bands. However, all these variants of CSP use fixed-size subband filters for feature extraction. The research work in [23] suggested the use of a backtracking search optimization algorithm for relevant frequency band and time segment selection in motor imagery BCI. However, evolutionary algorithms are computationally intensive. Moreover, these methods require tuning of a larger number of parameters, such as the selection scheme, crossover operator, population size and fitness function, to achieve an optimal solution [30]. Hence, evolutionary algorithms are not suitable for real-time BCI applications. Further, most of the research works discussed in the literature consider features based on temporal, spatial and spectral content separately or in combinations of two, but not all three simultaneously.

Combined variable sized subband and temporal filter based stationary common spatial patterns (CVSTSCSP)

Figure 1 shows the flow diagram of the four phases of the proposed CVSTSCSP model. A brief description of each of these phases is given below:

Fig. 1 Flow diagram of the proposed CVSTSCSP model

Preprocessing

  • Data segmentation

In the proposed method, we segment the raw data of each trial into three overlapping time windows [TS1: 0.5–2.5 s, TS2: 1.0–3.0 s, TS3: 1.5–3.5 s]. The data from the time windows [0.0–0.5 s and 3.5–4.0 s] are not used in order to avoid overlap with the resting data [22].
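A minimal sketch of this segmentation step is given below, assuming cue-aligned trials stored as a NumPy array of shape (n_trials, n_channels, n_samples) sampled at 100 Hz (the sampling rate of the datasets used later); the function name and array layout are illustrative.

```python
import numpy as np

def segment_trials(trials, fs=100, windows=((0.5, 2.5), (1.0, 3.0), (1.5, 3.5))):
    """Cut each trial into overlapping time segments.

    trials : ndarray, shape (n_trials, n_channels, n_samples), cue-aligned at t = 0 s.
    Returns a list with one ndarray per time segment (TS1, TS2, TS3).
    """
    segments = []
    for start, stop in windows:
        a, b = int(start * fs), int(stop * fs)
        segments.append(trials[:, :, a:b])   # shape (n_trials, n_channels, (stop - start) * fs)
    return segments
```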

  • Generation of Frequency Band and Bandpass Filtering

In this step, we generate frequency subbands of variable size from a defined frequency range, a minimum bandwidth and a defined frequency granularity, i.e., the distance between the central frequencies of two contiguous bands. The generated set of frequency subbands acts as a filter bank of variable-sized subbands. The process of generating the variable-sized frequency filter bank does not explicitly require knowledge of which subbands are relevant for distinguishing the two given motor imagery tasks. As an example, the subbands generated for the frequency range 7–32 Hz with a minimum bandwidth and granularity of 5 Hz are shown in Fig. 2. This set of variable-sized subbands acts as an overlapping subband filter bank. The data obtained from each time segment is bandpass filtered through this filter bank of variable-sized subband filters.

Fig. 2 Various subbands obtained with a minimum bandwidth of 5 Hz and granularity of 5 Hz in the frequency range 7–32 Hz
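A sketch of how such a filter bank could be generated and applied is shown below, assuming a band-generation rule in which all bands whose edges lie on the granularity grid and whose width is at least the minimum bandwidth are enumerated, and assuming a zero-phase Butterworth bandpass filter; both choices are illustrative rather than the authors' exact implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def generate_filter_bank(f_lo=7.0, f_hi=32.0, min_bw=5.0, gran=5.0):
    """Enumerate variable-sized subbands (start, stop) inside [f_lo, f_hi].

    Assumed rule: band edges lie on a grid of step `gran` and each band is at
    least `min_bw` wide, yielding overlapping subbands of increasing width.
    """
    edges = np.arange(f_lo, f_hi + 1e-9, gran)
    return [(lo, hi) for lo in edges for hi in edges if hi - lo >= min_bw]

def bandpass(segment, band, fs=100, order=4):
    """Zero-phase Butterworth bandpass of one time-segment array (trials, channels, samples)."""
    b, a = butter(order, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    return filtfilt(b, a, segment, axis=-1)

# For the Fig. 2 setting this yields bands such as (7, 12), (7, 17), ..., (27, 32).
bank = generate_filter_bank()
```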

Feature extraction

SCSP

To extract relevant features from the filtered data, the stationary common spatial patterns (SCSP) [17] technique is utilized, which is a variant of the CSP spatial filtering technique. In CSP, spatial filters are derived from the simultaneous diagonalization of the covariance matrices of the EEG data of the two classes, which can be achieved by maximizing the following Rayleigh criterion:

$$ \mathrm{R}\left(\mathbf{w}\right)=\frac{{\mathbf{W}}^{\mathbf{T}}{\boldsymbol{\Sigma}}_1\mathbf{W}}{{\mathbf{W}}^{\mathbf{T}}\left\{{\boldsymbol{\Sigma}}_1+{\boldsymbol{\Sigma}}_2\right\}\mathbf{W}} $$
(1)

where Σ1 and Σ2 are the normalized average covariance matrices of class 1 and class 2, respectively, and W is a spatial filter matrix. However, CSP suffers from problems such as the presence of artifacts and non-stationarities within the signal. In SCSP, to minimize the effect of non-stationarity, a measure of stationarity is used, given by the sum of absolute differences between the projected average variance over all trials and the projected variance of each trial. The difference between the normalized average covariance Σ1 or Σ2 and the covariance matrix of each trial k of class c is given by:

$$ {\varDelta}_{\mathbf{c}}^{\left(\mathbf{k}\right)}=\mathrm{s}\left({\boldsymbol{\Sigma}}_{\mathbf{c}}^{\mathbf{k}}-{\boldsymbol{\Sigma}}_{\mathbf{c}}\right) $$
(2)

where s(·) is an operator that makes a symmetric matrix positive definite. The average difference matrix for class c (c = 1, 2) is given as:

$$ {\overline{\varDelta}}_{\mathrm{c}}=\frac{1}{\mathrm{K}}{\sum}_{\mathrm{k}=1}^{\mathrm{K}}{\varDelta}_{\mathrm{c}}^{\mathrm{k}} $$
(3)

The modified Rayleigh Criterion maximization function is given as:

$$ \mathrm{R}\left(\mathbf{w}\right)=\frac{{\mathbf{W}}^{\mathbf{T}}{\boldsymbol{\Sigma}}_1\mathbf{W}}{{\mathbf{W}}^{\mathbf{T}}\left\{{\boldsymbol{\Sigma}}_1+{\boldsymbol{\Sigma}}_2\right\}\mathbf{W}+\upalpha \mathrm{P}\left(\mathbf{W}\right)} $$
(4)

where \( \mathrm{P}\left(\mathbf{W}\right)={\mathbf{W}}^{\mathrm{T}}\left({\overline{\varDelta}}_1+{\overline{\varDelta}}_2\right)\mathbf{W} \) is the penalty term and α is a constant determined using the method proposed in [31]. The transformed matrix Z for a given trial X is given as:

$$ \mathbf{Z}=\mathbf{WX} $$
(5)

Feature fp is computed as:

$$ {\mathbf{f}}_{\mathrm{p}}=\log \left(\frac{\operatorname{var}\left({\mathbf{Z}}_{\mathbf{p}}\right)}{\sum_{\mathrm{p}=1}^{2\mathrm{r}}\operatorname{var}\left({\mathbf{Z}}_{\mathbf{p}}\right)}\right) $$
(6)

where Zp denotes the p-th row of Z, and only the first and last r rows of Z are used (p = 1, …, 2r).
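A compact sketch of Eqs. (1)–(6) is given below, assuming per-trial covariance matrices are already computed and normalized, and assuming the s(·) operator is realized by flipping the sign of negative eigenvalues (one common choice; the paper does not spell out the operator). The generalized eigenvalue problem corresponding to the penalized Rayleigh quotient is solved with SciPy; this is an illustrative reading, not the authors' reference implementation.

```python
import numpy as np
from scipy.linalg import eigh

def make_psd(M):
    """s(.) operator: symmetrize and flip negative eigenvalues (one common choice)."""
    M = (M + M.T) / 2
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.abs(w)) @ V.T

def scsp_filters(cov_trials_1, cov_trials_2, alpha=0.1, r=1):
    """Compute 2r SCSP spatial filters from per-trial covariance matrices of two classes."""
    S1, S2 = np.mean(cov_trials_1, axis=0), np.mean(cov_trials_2, axis=0)
    # Penalty term: average positive-definite deviation of each trial from its class mean, Eqs. (2)-(3)
    D1 = np.mean([make_psd(C - S1) for C in cov_trials_1], axis=0)
    D2 = np.mean([make_psd(C - S2) for C in cov_trials_2], axis=0)
    # Generalized eigenvalue problem for the penalized Rayleigh quotient, Eq. (4)
    vals, vecs = eigh(S1, S1 + S2 + alpha * (D1 + D2))
    idx = np.argsort(vals)
    W = np.hstack([vecs[:, idx[:r]], vecs[:, idx[-r:]]])  # most discriminative filters for both classes
    return W.T                                            # shape (2r, n_channels)

def log_variance_features(W, X):
    """Eqs. (5)-(6): project a trial X (channels x samples) and take normalized log-variance."""
    Z = W @ X
    var = np.var(Z, axis=1)
    return np.log(var / var.sum())
```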

LDA

SCSP features extracted from each combination of the TSth time segment and kth subband are transformed using linear discriminant analysis, which provides a projection matrix \( {\mathbf{W}}_{\mathbf{lda}}^{\mathbf{TS},\mathbf{k}} \) that minimizes the intra-class variance \( {\mathbf{S}}_{\mathrm{W}}^{\mathrm{TS},\mathrm{k}} \) and maximizes the inter-class variance \( {\mathbf{S}}_{\mathrm{B}}^{\mathrm{TS},\mathrm{k}} \) given by:

$$ {\mathbf{S}}_{\mathrm{B}}^{\mathrm{T}\mathrm{S},\mathrm{k}}=\left({\mathbf{m}}_2^{\mathrm{T}\mathrm{S},\mathrm{k}}-{\mathbf{m}}_1^{\mathrm{T}\mathrm{S},\mathrm{k}}\right){\left({\mathbf{m}}_2^{\mathrm{T}\mathrm{S},\mathrm{k}}-{\mathbf{m}}_1^{\mathrm{T}\mathrm{S},\mathrm{k}}\right)}^{\mathrm{T}} $$
(7)

and

$$ {\mathbf{S}}_{\mathrm{W}}^{\mathrm{T}\mathrm{S},\mathrm{k}}={\sum}_{{\mathrm{f}}_{\mathrm{p}}\in \mathrm{c}1}\left({\mathbf{f}}_{\mathrm{p}}^{\mathrm{T}\mathrm{S},\mathrm{k}}-{\mathbf{m}}_1^{\mathrm{T}\mathrm{S},\mathrm{k}}\right){\left({\mathbf{f}}_{\mathrm{p}}^{\mathrm{T}\mathrm{S},\mathrm{k}}-{\mathbf{m}}_1^{\mathrm{T}\mathrm{S},\mathrm{k}}\right)}^{\mathrm{T}}+{\sum}_{{\mathrm{f}}_{\mathrm{p}}\in \mathrm{c}2}\left({\mathbf{f}}_{\mathrm{p}}^{\mathrm{T}\mathrm{S},\mathrm{k}}-{\mathbf{m}}_2^{\mathrm{T}\mathrm{S},\mathrm{k}}\right){\left({\mathbf{f}}_{\mathrm{p}}^{\mathrm{T}\mathrm{S},\mathrm{k}}-{\mathbf{m}}_2^{\mathrm{T}\mathrm{S},\mathrm{k}}\right)}^{\mathrm{T}} $$
(8)

The cost function for TSth time segment and kth subband, which needs to be maximized, is given by:

$$ {\mathbf{J}}^{\mathbf{T}\mathbf{S},\mathbf{k}}=\frac{{{\mathbf{W}}_{\mathbf{lda}}^{\mathbf{T}\mathbf{S},\mathbf{k}}}^{\mathbf{T}}{\mathbf{S}}_{\mathrm{B}}^{\mathrm{TS},\mathrm{k}}{\mathbf{W}}_{\mathbf{lda}}^{\mathbf{T}\mathbf{S},\mathbf{k}}}{{{\mathbf{W}}_{\mathbf{lda}}^{\mathbf{T}\mathbf{S},\mathbf{k}}}^{\mathbf{T}}{\mathbf{S}}_{\mathrm{W}}^{\mathrm{TS},\mathrm{k}}{\mathbf{W}}_{\mathbf{lda}}^{\mathbf{T}\mathbf{S},\mathbf{k}}} $$
(9)

where \( {\mathbf{m}}_1^{\mathrm{TS},\mathrm{k}} \) and \( {\mathbf{m}}_2^{\mathrm{TS},\mathrm{k}} \) are means of class 1 and class 2 features for TSth time segment and kth subband respectively. The score for TSth time segment and kth subband is defined as

$$ {\mathbf{s}}_{\mathrm{k}}^{\mathrm{TS}}={\mathbf{W}}_{\mathbf{lda}}^{\mathbf{TS},\mathbf{k}}{\mathbf{f}}_{\mathrm{p}}^{\mathrm{TS},\mathrm{k}} $$
(10)

The scores obtained from each combination of the TSth time segment and kth subband are fused to form a 3k-dimensional feature vector \( {\left[{\mathbf{s}}_1^{\mathbf{T}\mathbf{S}1},{\mathbf{s}}_2^{\mathbf{T}\mathbf{S}1}\dots {\mathbf{s}}_{\mathbf{k}}^{\mathbf{T}\mathbf{S}1},{\mathbf{s}}_1^{\mathbf{T}\mathbf{S}2},{\mathbf{s}}_2^{\mathbf{T}\mathbf{S}2}\dots {\mathbf{s}}_{\mathbf{k}}^{\mathbf{T}\mathbf{S}2},{\mathbf{s}}_1^{\mathbf{T}\mathbf{S}3},{\mathbf{s}}_2^{\mathbf{T}\mathbf{S}3}\dots {\mathbf{s}}_{\mathbf{k}}^{\mathbf{T}\mathbf{S}3}\right]}^{\mathbf{T}} \) corresponding to each trial.
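A sketch of this score-fusion step using scikit-learn's LinearDiscriminantAnalysis is shown below; the dictionary layout of the per-(segment, subband) SCSP features is an assumption, and in practice the LDA projections would be fitted on training trials only.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fuse_lda_scores(scsp_features, labels):
    """scsp_features: dict keyed by (time_segment, subband) -> array (n_trials, 2r) of SCSP features.

    Fits one LDA per (segment, subband) combination (Eqs. (7)-(10)) and concatenates
    the one-dimensional projected scores into a fused 3k-dimensional vector per trial.
    """
    scores = []
    for key in sorted(scsp_features):                                 # fixed ordering: TS1 bands, TS2 bands, TS3 bands
        lda = LinearDiscriminantAnalysis(n_components=1)
        scores.append(lda.fit_transform(scsp_features[key], labels))  # (n_trials, 1) score s_k^TS
    return np.hstack(scores)                                          # (n_trials, 3k) fused feature matrix
```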

Feature selection

The derived feature vector may include features from time segments and subbands that are irrelevant for a given mental task. These irrelevant features may deteriorate the performance of the decision model. In order to gain proper insight into the features and their relevance to the class variable, univariate feature selection techniques have been used in the literature [32]. A subset of features is selected on the basis of a selection criterion. The reduced, relevant set of features requires less space and less computation time for learning a model and can also provide improved classification performance. Thus, the ranking of features is carried out in the third phase using the following univariate feature selection approaches (a combined code sketch is given after this list):

  • Euclidean distance: Euclidean distance [32] is a simple distance measure based on the Pythagorean theorem. It measures the distance between the data points belonging to the two classes. The Euclidean distance between the two classes c1 and c2 for a feature f is given by:

$$ {D}_{1,2}\left(\boldsymbol{f}\right)=\sqrt{\left({\mu}_1^{\mathrm{f}}-{\mu}_2^{\mathrm{f}}\right)\left({\mu}_1^{\mathrm{f}}-{\mu}_2^{\mathrm{f}}\right)} $$
(11)

where \( {\mu}_i^{\mathrm{f}} \) is the mean of feature f for class ci. A high value of D1,2 indicates that the two classes are well separated along feature f. This measure is simple and fast; however, it assumes that the samples are distributed spherically about their mean.

  • Correlation: Correlation [33] is adopted as a measure of the goodness of a feature f with respect to the class variable c, under the assumption that a good feature is highly correlated with the class label. The linear correlation coefficient between class ci (i = 1, 2) and feature f is given by:

$$ \mathrm{R}\left(\boldsymbol{f}\right)=\frac{\sum_{\mathbf{i}}\left(\mathbf{f}-{\mu}_i^{\mathrm{f}}\right)\left({\mathbf{c}}_{\mathbf{i}}-{\mu}_i^{\mathrm{c}}\right)}{\sqrt{\sum_{\mathbf{i}}{\left(\mathbf{f}-{\mu}_i^{\mathrm{f}}\right)}^2}\sqrt{\sum_{\mathbf{i}}{\left({\mathbf{c}}_{\mathbf{i}}-{\mu}_i^{\mathrm{c}}\right)}^2}} $$
(12)

where \( {\mu}_i^{\mathrm{f}} \) is the mean of feature f and \( {\mu}_i^{\mathrm{c}} \) is the mean of class ci. The value of R lies between −1 and 1. The higher the magnitude of R, the more relevant the feature f.

  • Mutual information: The mutual information [34] measures the nonlinear correlation between two random variables. The mutual information I(c; f) between class ci (i = 1, 2) and feature f is given by:

$$ \mathrm{I}\left({\mathrm{c}}_{\mathrm{i}};\boldsymbol{f}\right)=\mathrm{H}\left({\mathrm{c}}_{\mathrm{i}}\ \right)-\mathrm{H}\left({\mathrm{c}}_{\mathrm{i}}\ |\boldsymbol{f}\right) $$
(13)

where H(ci) is the entropy function for class variable c given by:

$$ \mathrm{H}\left({\mathrm{c}}_{\mathrm{i}}\right)=-{\sum}_{\mathrm{i}=1}^2\mathrm{P}\left({\mathrm{c}}_{\mathrm{i}}\right)\mathrm{logP}\left({\mathrm{c}}_{\mathrm{i}}\right) $$
(14)

and H(ci | f) is the conditional entropy of the class variable given the feature f:

$$ \mathrm{H}\left({\mathrm{c}}_{\mathrm{i}}\ |\boldsymbol{f}\right)=-{\sum}_{\mathrm{f}=1}^{{\mathrm{N}}_{\mathrm{f}}}\mathrm{P}\left(\boldsymbol{f}\right)\left({\sum}_{\mathrm{i}=1}^2\mathrm{P}\left({\mathrm{c}}_{\mathrm{i}}\ |\boldsymbol{f}\right)\mathrm{logP}\Big({\mathrm{c}}_{\mathrm{i}}\ |\boldsymbol{f}\Big)\right) $$
(15)

where P(ci) is the probability density function of class ci and P(ci | f) is the conditional probability density function. The higher the value of I(ci; f), the more relevant the feature f is to the class variable. Given an initial set of features, a subset of features that provides maximal mutual information is selected for classification.

  • Fisher discriminant ratio (FDR): FDR [35] is a ranking approach that ranks the features on the basis of following measure:

$$ \mathrm{FDR}\left(\boldsymbol{f}\right)=\frac{{\left({\mu}_1^{\mathrm{f}}-{\mu}_2^{\mathrm{f}}\right)}^2}{\upsigma_1^2+{\upsigma}_2^2} $$
(16)

where \( {\mu}_i^{\mathrm{f}} \) and \( {\upsigma}_i^2 \) denote the mean and variance of feature f for the ith class, respectively. A higher FDR value indicates that the data of the two classes are more separable and less scattered around their means.
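The sketch below computes all four ranking criteria over the fused score matrix; scikit-learn's mutual_info_classif is used as the mutual information estimator, which is an assumption since the paper does not specify how the densities in Eqs. (13)–(15) are estimated.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_features(F, y):
    """Rank columns of F (n_trials, n_features) for a binary label vector y by the four criteria."""
    F1, F2 = F[y == 0], F[y == 1]
    m1, m2 = F1.mean(axis=0), F2.mean(axis=0)
    v1, v2 = F1.var(axis=0), F2.var(axis=0)

    euclid = np.abs(m1 - m2)                                                    # Eq. (11)
    corr = np.abs([np.corrcoef(F[:, j], y)[0, 1] for j in range(F.shape[1])])   # Eq. (12)
    mi = mutual_info_classif(F, y)                                              # Eq. (13), estimator assumed
    fdr = (m1 - m2) ** 2 / (v1 + v2)                                            # Eq. (16)

    # Higher score = more relevant; return feature indices sorted best-first for each criterion.
    return {name: np.argsort(score)[::-1]
            for name, score in dict(euclid=euclid, corr=corr, mi=mi, fdr=fdr).items()}
```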

Classification

After obtaining the relevant features, a decision model is built. Two well-known classifiers, linear discriminant analysis (LDA) [36] and support vector machine (SVM) [33], are investigated in this paper.

Experimental data and results

Dataset description and parameter setting

Dataset 1: “BCI competition III dataset Iva”

Fraunhofer FIRST and the Campus Benjamin Franklin of the Charité – University Medicine Berlin provided this dataset [37]. The dataset is composed of motor imagery EEG signals recorded during right hand and right foot motor imagination. Five healthy subjects (aa, al, av, aw and ay) participated in the data acquisition. Each subject's dataset consists of EEG signals of 280 trials. The signals were measured using 118 EEG channel locations from the extended international 10/20 electrode montage system. During each trial, the subject was shown a visual cue for 3.5 s indicating which of the three motor imagery tasks to perform: left hand, right hand, or right foot motor imagery. The captured EEG data were bandpass filtered between 0.05 and 200 Hz, digitized at 1000 Hz and downsampled to 100 Hz. The resting window between two adjacent trials was randomly chosen from a period of 1.75–2.25 s. EEG trials only for right-hand and right-foot motor imagery were provided for competition purposes. These parameter settings are provided by the BCI competition.

Dataset 2: “BCI competition IV dataset Ia”

The Berlin BCI group, Fraunhofer FIRST and the Campus Benjamin Franklin of the Charité – University Medicine Berlin provided this dataset [38]. Seven healthy subjects (ds1a, ds1b, ds1c, ds1d, ds1e, ds1f and ds1g) participated in the data acquisition. Each subject's dataset consists of EEG signals of 200 trials. The signals were measured using 59 EEG channel locations from the extended international 10/20 electrode montage system. During each trial, the subject was shown a visual cue for 4 s. For each subject, two classes of motor imagery were selected from the three classes: left hand, right hand, and foot motor imagery. The whole dataset is divided into two categories: calibration data and evaluation data. The captured EEG signals were bandpass filtered between 0.05 and 200 Hz, digitized at 1000 Hz and downsampled to 100 Hz. The resting window between two adjacent trials was randomly chosen from a period of 2–4 s. These parameter settings are provided by the BCI competition.

For experimental analysis, we have used the data from each trial that belongs to the overlapping time windows [0.5–2.5, 1.0–3.0, and 1.5–3.5 s] [22] after the onset of the stimulus, which yields a total of 200 time units per electrode in a trial, i.e., an EEG signal matrix of 118 × 200 per trial for Dataset 1 and 59 × 200 [21, 31, 39] per trial for Dataset 2. Variable-sized subbands are generated from the frequency band [7–30 Hz] with bandwidth (bw) = [3, 4, …, 7 Hz] and granularity (gr) = [3, 4, …, 7 Hz]. Thus, the smallest considered bandwidth of the variable subbands is not very small, so information loss is expected to be minimal. The time-segmented data is then bandpass filtered using the variable-sized subband filter bank. Stationary CSP in combination with LDA is then used for extracting features. The SCSP penalty parameter α = 0.1 (decided using cross-validation) is used for all the experiments on both datasets. In the literature [18], it is shown that r = 1 or r = 2 is a good choice and that adding more spatial patterns does not enhance the classification performance. Therefore, in this research work, the number of spatial patterns has been fixed to r = 1 [19,20,21]. Hence, for each combination of time segment and variable-sized band, we obtain two features per trial, which LDA converts into one feature. Univariate feature selection methods are then used to rank these features. To achieve the best performance of the learning machine, grid search is employed to obtain the SVM regularization parameter C and the Gaussian kernel parameter σ, varied from 1 to 500 and 1 to 100 respectively. The optimal values so obtained were C = 100 and σ = 10.

The performance of CVSTSCSP is compared with existing methods (CSP, SBCSP, FBCSP and CVSCSP) in terms of average classification error. The classification error is reported as the average of 10 runs of 10-fold cross-validation for each subject. For comparison with the existing methods, we have used the parameter values suggested in the corresponding works [19,20,21]. Therefore, a fixed bandwidth of bw = 4 Hz is used for SBCSP and FBCSP, and bw = 4 Hz, gr = 4 Hz and Euclidean distance based feature selection are used for CVSCSP. Different values of bw and gr ranging from 3 to 7 Hz have been used for evaluating the variation in the performance of each subject with the proposed CVSTSCSP method.
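A sketch of this evaluation protocol is given below, assuming the top-ranked fused features are already selected; the number of retained features, the grid over C and the kernel width, and the use of nested cross-validation are illustrative assumptions (scikit-learn parametrizes the RBF kernel by gamma, so σ = 10 corresponds to gamma = 1/(2σ²) = 0.005).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold, cross_val_score

def evaluate(F_ranked, y, n_selected=30):
    """10 runs of 10-fold CV error on the top-ranked features, with an SVM tuned by grid search.

    n_selected and the coarse parameter grid are illustrative choices; the paper reports
    C = 100 and sigma = 10 (gamma = 0.005 in scikit-learn's parametrization).
    """
    X = F_ranked[:, :n_selected]
    grid = GridSearchCV(SVC(kernel="rbf"),
                        {"C": [1, 10, 100, 500], "gamma": [0.5, 0.05, 0.005]},
                        cv=10)
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
    acc = cross_val_score(grid, X, y, cv=cv)
    return 1.0 - acc.mean()   # average classification error
```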

Results and discussion

Figures 3, 4, 5 and 6 show the variation in classification error of the proposed method CVSTSCSP with the choice of a single time segment (TS1: 0.5–2.5, TS2: 1.0–3.0, TS3: 1.5–3.5) and all three segments (ALL_TS) of Dataset 1 and Dataset 2 for the SVM and LDA classifiers respectively, using different univariate feature selection methods. The classification error is reported as the average classification error of 10 runs of 10-fold cross-validation over all subjects. The following can be noted from Figs. 3 and 4 for Dataset 1:

  • Using correlation based feature selection, there is an overall decrease of 23.04%, 30.9% and 24.68% in average classification error using LDA classifier and a decrease of 10.65%, 24.29% and 24.77% in average classification error using SVM classifier with the use of all relevant features from the three time segments ALL_TS in CVSTSCSP as compared to relevant features from single time segment TS1, TS2 and TS3 in CVSTSCSP respectively.

  • Using FDR based feature selection, there is an overall decrease of 25.12%, 32.25% and 24.98% in average classification error using LDA classifier and a decrease of 9.33%, 23.65% and 24.45% in average classification error using SVM classifier with the use of all relevant features from the three time segments ALL_TS in CVSTSCSP as compared to relevant features from single time segment TS1, TS2 and TS3 in CVSTSCSP respectively.

  • Using Euclidean based feature selection, there is an overall decrease of 6%, 3.2% and 12.68% in average classification error using LDA classifier and a decrease of 16.89%, 16.53% and 35.38% in average classification error using SVM classifier with the use of all relevant features from the three time segments ALL_TS in CVSTSCSP as compared to relevant features from single time segment TS1, TS2 and TS3 in CVSTSCSP respectively.

  • Using mutual information based feature selection, there is an overall decrease of 35.7%, 32.5% and 41.3% in average classification error using LDA classifier and a decrease of 17.09%, 13.98% and 41.04% in average classification error using SVM classifier with the use of all relevant features from the three time segments ALL_TS in CVSTSCSP as compared to relevant features from single time segment TS1, TS2 and TS3 in CVSTSCSP respectively.

  • The combination of CVSTSCSP with mutual information based feature selection performs best and the combination of CVSTSCSP with Euclidean distance based feature selection performs worst among all combinations of feature selection with CVSTSCSP with SVM classifier.

  • The combination of CVSTSCSP with mutual information based feature selection performs best and the combination of CVSTSCSP with correlation based feature selection performs worst among all combinations of feature selection with CVSTSCSP with LDA classifier.

  • On average, a minimum average classification error of 5.1% and 4.48% is obtained using mutual information based feature selection for Dataset 1 with the SVM and LDA classifier respectively. Finally, we can also observe from Figs. 3 and 4 that the classification error is reduced significantly with the use of all relevant features from the three time segments ALL_TS as compared to relevant features from the single time segments TS1, TS2 and TS3 in the proposed CVSTSCSP method with both SVM and LDA classifiers.

Fig. 3 Comparison of average classification error for Dataset 1 for the proposed method CVSTSCSP using SVM classifier

Fig. 4 Comparison of average classification error for Dataset 1 for the proposed method CVSTSCSP using LDA classifier

Fig. 5 Comparison of average classification error for Dataset 2 for the proposed method CVSTSCSP using SVM classifier

Fig. 6 Comparison of average classification error for Dataset 2 for the proposed method CVSTSCSP using LDA classifier

Following deductions can be made from Figs. 5 and 6 for Dataset 2:

  • Using correlation based feature selection, there is an overall decrease of 39.93%, 17.28% and 33.18% in average classification error using LDA classifier and a decrease of 43.75%, 32.91% and 41.12% in average classification error using SVM classifier with the use of all relevant features from the three time segments ALL_TS in CVSTSCSP as compared to relevant features from single time segment TS1, TS2 and TS3 in CVSTSCSP respectively.

  • Using FDR based feature selection, there is an overall decrease of 39.98%, 17.42% and 33.20% in average classification error using LDA classifier and a decrease of 43.69%, 32.84% and 41.05% in average classification error using SVM classifier with the use of all relevant features from the three time segments ALL_TS in CVSTSCSP as compared to relevant features from single time segment TS1, TS2 and TS3 in CVSTSCSP respectively.

  • Using Euclidean based feature selection, there is an overall decrease of 29.38%, 15.62% and 23.22% in average classification error using LDA classifier and a decrease of 38.07%, 28.25% and 31.31% in average classification error using SVM classifier with the use of all relevant features from the three time segments ALL_TS in CVSTSCSP as compared to relevant features from single time segment TS1, TS2 and TS3 in CVSTSCSP respectively.

  • Using mutual information based feature selection, there is an overall decrease of 38.98%, 14.94% and 31.93% in average classification error using LDA classifier and a decrease of 43.42%, 33.48% and 41.21% in average classification error using SVM classifier with the use of all relevant features from the three time segments ALL_TS in CVSTSCSP as compared to relevant features from single time segment TS1, TS2 and TS3 in CVSTSCSP respectively.

  • The combination of CVSTSCSP with mutual information based feature selection performs best and the combination of CVSTSCSP with Euclidean distance based feature selection performs worst among all combinations of feature selection with CVSTSCSP with SVM classifier.

  • The combination of CVSTSCSP with mutual information based feature selection performs best and the combination of CVSTSCSP with correlation based feature selection performs worst among all combinations of feature selection with CVSTSCSP with LDA classifier.

  • On average, a minimum average classification error of 6.46% is obtained using correlation based feature selection with the SVM classifier and a minimum average classification error of 7.68% is obtained using FDR based feature selection with the LDA classifier for Dataset 2.

  • Finally, we can also observe from Figs. 5 and 6 that the classification error is reduced significantly with the use of all relevant features from the three time segments ALL_TS as compared to relevant features from the single time segments TS1, TS2 and TS3 in the proposed CVSTSCSP method with both SVM and LDA classifiers.

Tables 1 and 2 show the comparison of average classification error of all the existing methods with the proposed approach CVSTSCSP for Dataset 1 with the LDA and SVM classifiers respectively. From Tables 1 and 2, the following observations can be made for Dataset 1:

  • The proposed method CVSTSCSP performs best among all methods and achieves a minimum classification error of 0.04 and 0.05 with LDA and SVM classifier respectively.

  • There is an overall decrease of 80.72% and 82.2% in classification error with the use of CVSTSCSP in comparison to CSP with LDA and SVM classifier respectively.

  • An overall decrease of 82.0% and 79.82% in classification error can be observed using the proposed method CVSTSCSP in comparison to SBCSP with LDA and SVM respectively.

  • A decrease of 82.3% and 84.2% in classification error has been achieved with CVSTSCSP in comparison to FBCSP with the LDA and SVM classifier respectively.

  • A reduction in classification error of 86.2% and 89.1% has also been obtained with the proposed method CVSTSCSP in comparison to the CVSCSP method.

  • An average decrease of 81.1%, 80.9%, 83.3% and 87.7% can be observed using the proposed method CVSTSCSP in comparison to CSP, SBCSP, FBCSP and CVSCSP respectively over all classifiers and all subjects of Dataset 1.

Table 1 Comparison of classification error of different methods with LDA at a bandwidth bw = 4 Hz for Dataset 1
Table 2 Comparison of classification error of different methods with SVM at a bandwidth bw = 4 Hz for Dataset 1

Tables 3 and 4 show the comparison of average classification error of all the existing methods with the proposed approach for Dataset 2 with the LDA and SVM classifiers respectively. From Tables 3 and 4, the following observations can be made for Dataset 2:

  • The proposed method CVSTSCSP performs best among all methods and achieves a minimum classification error of 0.08 and 0.07 with LDA and SVM classifier respectively.

  • There is an overall decrease of 79.8% and 76.7% in classification error with the use of CVSTSCSP as compared to CSP with LDA and SVM classifier respectively.

  • An overall decrease of 78.7% and 72.1% in classification error can be observed using the proposed method as compared to SBCSP with LDA and SVM respectively.

  • A decrease of 82.2% and 79.4% in classification error has been achieved with CVSTSCSP in comparison to FBCSP with the LDA and SVM classifier respectively.

  • A reduction in classification error of 85.2% and 84.38% has also been obtained with the proposed method in comparison to the CVSCSP method.

  • An average decrease of 78.5%, 75.4%, 80.8% and 84.80% in classification error can be observed using the proposed method CVSTSCSP in comparison to CSP, SBCSP, FBCSP and CVSCSP respectively over all classifiers and all subjects of Dataset 2.

Table 3 Comparison of classification error of different methods with LDA at a bandwidth bw = 4 Hz for Dataset 2
Table 4 Comparison of classification error of different methods with SVM at a bandwidth bw = 4 Hz for Dataset 2

An average decrease of 79.7%, 78.20%, 82.1% and 86.26% in classification error over both datasets and both classifiers can be observed using the proposed method CVSTSCSP in comparison to CSP, SBCSP, FBCSP and CVSCSP respectively.

Figures 7 and 8 show the variation in classification error with different combinations of bandwidth (bw) and granularity (gr) values for the subjects of Dataset 1 and Dataset 2 respectively. We can observe from Figs. 7 and 8 that the classification error varies with the choice of bandwidth (bw) and granularity (gr). Also, the minimum classification error is achieved with different combinations of bw and gr for different subjects. It can also be noted that the classification error is more sensitive to the choice of bw than to that of gr. Further, it can be observed that larger values of gr and bw lead to degraded performance, as fewer bands are generated and these may not contain the subset of subbands relevant to a particular subject.

Fig. 7 Comparison of classification error for all subjects of Dataset 1 at different bandwidth and granularity values

Fig. 8 Comparison of classification error for all subjects of Dataset 2 at different bandwidth and granularity values

To determine the statistical difference among all the experimental methods, the Friedman statistical test [20] has been conducted in this study. The null hypothesis assumes that the performance of all the methods is equivalent in terms of classification error. Table 5 shows the statistical ranking of all the methods obtained using Friedman's test. It can be observed from Table 5 that the proposed method CVSTSCSP in combination with mutual information based feature selection (mi-CVSTSCSP–All_TS) performs the best, achieving the lowest rank value of 3.167. The p value calculated using the Iman and Davenport statistic [20] is 3.06E-37, which confirms a significant difference among the methods used in our experimental study. Thus, we can reject the null hypothesis and conclude that the differences are statistically significant.
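A sketch of this test using SciPy is given below, assuming a per-subject error matrix with one column per method; the Iman–Davenport correction and the post hoc procedures reported later are not reproduced here.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def friedman_ranking(errors):
    """errors: ndarray (n_subjects, n_methods) of classification errors.

    Returns the average rank of each method (lower = better) and the
    p-value of the Friedman chi-square test across methods.
    """
    ranks = np.apply_along_axis(rankdata, 1, errors)        # rank methods within each subject
    stat, p = friedmanchisquare(*[errors[:, j] for j in range(errors.shape[1])])
    return ranks.mean(axis=0), p
```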

Table 5 Average Ranking of Algorithms

To compare all other methods with the best-ranked (control) method, mi-CVSTSCSP–All_TS, p values are computed using post hoc methods (the Hommel, Holm and Hochberg procedures) [32]. Table 6 shows the p values obtained for these post hoc methods. The bold values highlight a significant difference between the control method (mi-CVSTSCSP–All_TS) and the other methods at a significance level of 0.05.

Table 6 p values obtained using Friedman statistics while comparing with the control method (mi-CVSTSCSP – All Segments (R0)) at the significance level of 0.05

Conclusion and future directions

Many feature extraction techniques have been used in the area of BCI for recognition of motor imagery tasks. CSP is one of the most popular spatial feature extraction methods used in motor imagery EEG classification. The performance of CSP is highly dependent on subject-specific characteristics such as the frequency band, the relevant time segment within a trial, the spatial filters, and the presence of artifacts in the EEG signal.

In this paper, we proposed a four-phase method, CVSTSCSP, to determine relevant features from a set of spectral, temporal and spatial features in order to reduce the classification error in distinguishing motor imagery tasks. To determine relevant temporal information, the EEG signal is segmented into three overlapping time segments. Further, to choose the relevant spectral features, we have used a variable-sized subband filter bank. To reduce the effect of artifacts and non-stationarity, SCSP is used for feature extraction. In order to select a reduced subset of relevant features from the high-dimensional feature vector, univariate feature selection is used. We have investigated four univariate feature selection methods: Euclidean distance, correlation, mutual information and Fisher discriminant ratio. Two well-known classifiers, LDA and SVM, are used to build the decision model. It is observed that, with the use of relevant temporal information, the proposed CVSTSCSP method improves the classification error in comparison to the CVSCSP method, which considers the whole signal. The combination of CVSTSCSP with mutual information based feature selection achieves the minimum classification error for Dataset 1 and a comparable classification error for Dataset 2 among all combinations of CVSTSCSP with the different feature selection methods. It is also noted that the classification error is more sensitive to the choice of bandwidth (bw) than to that of granularity (gr). Experimental results demonstrate that the proposed method CVSTSCSP outperforms the existing methods CSP, SBCSP, FBCSP and CVSCSP in terms of classification error. The Friedman statistical test has been performed to confirm the significant difference among the methods used in our experimental study.

For evaluation of the proposed method, we have conducted all the experiments on two-class motor imagery EEG data only. In future, we will extend the proposed method to multiclass classification. Univariate feature ranking methods have been used in this study. Although these methods are simple and efficient to implement and select relevant features, they ignore the correlation among the selected features, which may degrade the performance. Multivariate feature selection methods, which provide a relevant and non-redundant subset of features, have been suggested in the research work [33]. We will utilize multivariate feature selection methods in our future work. The proposed method uses a static selection of the subject-specific time segment. In future work, we will extend this research to automatic selection of the subject-specific time segment.