1 Introduction

A brain computer interface (BCI) is an assistive technology that helps disabled people by setting up a direct communication link between the user's brain and an electronic device or software, such as a wheelchair, a computer running a word processing program or a quad-copter [10, 18]. In a BCI system, the activity of the brain is measured and then converted to control commands for the controlled device. There are many techniques for measuring brain activity, such as functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG), positron emission tomography (PET), electrocorticography (ECoG) and electroencephalography (EEG). Among them, EEG is the preferred way of acquiring brain signals thanks to its practicality, low cost, non-invasiveness and portability [20].

Motor imagery (MI) is an independent BCI method which uses the motor cortex as a signal source. An MI-based BCI infers the motor intentions of the user without any actual muscular movement. In this context, the user imagines moving a limb while his/her EEG is analyzed continuously by a BCI system which identifies the imagined movement. Then, a command to be sent to the controlled device is generated according to the type of the imagined movement. Motor imagery studies showed that imagining the movement of a limb creates special oscillations called event related synchronization (ERS) and event related de-synchronization (ERD) at specific frequency bands [2, 15].

The pioneering study of Penfield and Boldrey in 1937 revealed important information about the spatial organization of the motor cortex. They reported that every muscle group in the body is represented at a specific area on the motor cortex. The sizes of these areas are proportional to the dexterity with which the corresponding limb is used rather than to the limb's real size. The popular figure named homunculus resizes the limbs in proportion to the areas they occupy on the motor cortex. This spatial organization makes it possible to identify various motor imagery tasks according to the location of the ERS and ERD rhythms. However, the scalp EEG signal is seriously affected by volume conduction, in which the EEG signals all over the scalp are mixed up, and this results in poor spatial resolution [5]. In order to remove the volume conduction effect, several spatial filtering methods have been proposed, such as common average reference (CAR) [12], Laplacian (LAP) [12], common spatial patterns (CSP) [16] and spatial filter network (SFN) [19]. Among these methods, CSP is a well known method for the motor imagery classification problem and it has proven effective in recent BCI competitions [3, 4].

To focus on the ERD and ERS signals and to achieve a high classification performance, it is necessary to filter the EEG signal with a band pass filter prior to the CSP calculation. However, the frequency bands of these signals vary from subject to subject. Generally, the cut-off frequencies of the band pass filter are either selected manually or set unspecifically to a broad band [7], which results in poor classification performance. Manually searching for the best frequency band on the training set is laborious and time consuming [13]. Thus, optimizing a spectral filter along with the spatial filter is highly desirable [7].

The common spatio-spectral pattern (CSSP) algorithm [11] was the first method proposed to address this problem. CSSP embeds time delayed channels into the original EEG signal in order to create a first order FIR filter for each channel. The reported results showed an improvement of the CSSP algorithm over CSP. However, a first order FIR filter has a very limited ability to select a certain frequency band from the EEG spectrum. Later, an improvement to CSSP, common sparse spectral spatial pattern (CSSSP), was proposed [7]. CSSSP designs a FIR filter of arbitrary order that is common to all channels. This method searches for a set of spectral-spatial filter coefficients by a gradient search, which is computationally expensive, incurs an additional cost for sparsification and requires some parameter tuning.

The sub-band common spatial patterns (SBCSP) [13] and filter bank common spatial patterns (FBCSP) [1] methods are based on optimizing spatial filters for multiple spectral filters that have different pass-bands. As reported in BCI Competition III and IV, the FBCSP method achieved a high classification accuracy [4]. In these methods, a filter bank is used to decompose the EEG signal into multiple frequency bands, and a separate spatial filter is calculated for each band by the CSP method. Then, the features belonging to different frequency bands are chosen by feature selection methods based on mutual information maximization.

Higashi et al. recently proposed a method for the simultaneous design of spectral and spatial filters [9] called discriminative filter bank CSP (DFBCSP). The DFBCSP algorithm optimizes the coefficients of one or more FIR filters and the corresponding spatial weights. It does so iteratively, by converting the spatial and spectral optimization problems into separate generalized eigenvalue problems. Since it is an iterative method, reaching the optimum point may take many steps, and the optimization speed of DFBCSP depends on the degree of the FIR filter to be optimized.

In this paper, we present a spatio-spectral filtering method which binds the spatial and the spectral filters in a mixed architecture that we call filter bank common spatio-spectral patterns (FBCSSP). FBCSSP finds the required filter parameters with simple CSP calculations in one pass, without any iteration. A detailed description of FBCSSP is given in the Materials and Methods section. We then compare the proposed method with other spatio-spectral filtering methods from the literature; the obtained results show a higher classification accuracy for FBCSSP.

This paper is organized as follows. In Sect. 2, CSP and the proposed FBCSSP method are described in detail and the properties of the FBCSSP algorithm are discussed. In Sect. 3, we evaluate the proposed method in comparison with other methods. Finally, Sect. 4 summarizes the study and concludes the paper.

2 Materials and Methods

2.1 Common Spatial Patterns

Let \(X_k\) be the \(k^{th}\) epoch with class c in a motor imagery experiment, which includes N EEG channels and T time samples and has been filtered with a band pass filter that is manually set to a fixed frequency band. Let \(\varvec{w} \in \mathbb {R}^{N \times 1}\) be an N dimensional spatial filter. Spatial filtering is simply the linear combination of the channels with the coefficients of \(\varvec{w}\):

$$\begin{aligned} \varvec{z}_{k} = \varvec{w}^{\top }X_{k} \end{aligned}$$
(1)

where \( \varvec{z}_{k}\in \mathbb {R}^{1 \times T}\) denotes the projection of epoch \(X_{k}\) and \(^\top \) is the transpose operation. The CSP method searches for the filter which maximizes the average power of one class while minimizing the average power of the other class. Since the epoch \(X_k\) is a zero-mean signal (\(\mu _k=0\)) as a result of band pass filtering, the power of \(\varvec{z}_k\) is obtained by the variance calculation:

$$\begin{aligned} P_{k} = \sigma ^2 \left( \varvec{z}_k \right) = \frac{1}{T}\sum \limits _{t=1}^T \left| \varvec{w}^\top \left( X_k(t)-\mu _k \right) \right| ^2 = \varvec{w}^\top R_k \varvec{w} \end{aligned}$$
(2)

where \(R_k \in \mathbb {R}^{N \times N}\) is the covariance matrix of epoch k. Let \(R^{(c)} \in \mathbb {R}^{N \times N}\) be the average covariance matrix of the \(n_c\) epochs that belong to class c:

$$\begin{aligned} R^{(c)}=\frac{1}{n_c} \sum \limits _{k\in c} ^{n_c} R_k \end{aligned}$$
(3)

CSP uses the Rayleigh ratio as its objective function, which is the ratio of the average powers of the two classes after spatial filtering:

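$$\begin{aligned} J(\varvec{w}) = \frac{\varvec{w}^\top R^{(1)} \varvec{w}}{\varvec{w}^\top R^{(2)} \varvec{w}} \end{aligned}$$
(4)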

This optimization problem is solved by converting it to a generalized eigenvalue problem:

$$\begin{aligned} ({{R}^{(2)}}^{-1}{R}^{(1)})\varvec{w} = \lambda \varvec{w} \end{aligned}$$
(5)

Since the covariance matrices are of dimension \(N \times N\), the solution of the generalized eigenvalue problem above yields N eigenvector and eigenvalue pairs (\(\varvec{w}_n\), \(\lambda _n\)), \(n=1,2,...N\). Note that, for any solution \(\varvec{w}_n\), the Rayleigh ratio in (4) equals \(\lambda _n\). Thus, the eigenvector corresponding to the largest eigenvalue gives the maximum power ratio of class 1 over class 2, and the eigenvector corresponding to the smallest eigenvalue gives the maximum power ratio of class 2 over class 1. So, CSP first sorts the eigenvalues in descending order:

$$\begin{aligned} \lambda _1> \lambda _2> \cdots> \lambda _m> \lambda _{m+1}> \cdots>\lambda _{N-m}> \lambda _{N-m+1}> \cdots > \lambda _{N} \end{aligned}$$
(6)

and then takes the m eigenvectors with the largest and the m eigenvectors with the smallest eigenvalues in order to create a spatial filter matrix \(W\in \mathbb {R}^{M \times N}\), where \(M=2m\). To classify an input epoch with an unknown class label, CSP first creates feature vectors using epochs with known class labels. The feature vector of an epoch is usually the log-variance of the spatially filtered signal:

$$\begin{aligned} \varvec{f}_{j}=\log \left( \frac{\mathrm {var}(\varvec{z}^j)}{\sum _{l=1}^{M}\mathrm {var}(\varvec{z}^l)} \right) \quad j=1,2,...M \end{aligned}$$
(7)

where j indexes the entries of the feature vector \(\varvec{f}\in \mathbb {R}^{M}\) and \(\varvec{z}^j\) is the \(j^{th}\) row of the spatially filtered epoch \(WX\). In the above equation, the logarithm is used to bring the feature distribution closer to a normal distribution [8].
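The CSP steps in (3) to (7) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the study: the function names are hypothetical, and the generalized eigenvalue problem (5) is solved by whitening with a Cholesky factor of \(R^{(2)}\), which is one of several equivalent numerical routes and assumes \(R^{(2)}\) is positive definite.

```python
import numpy as np

def average_covariance(epochs):
    """Average covariance of one class, Eq. (3).
    epochs: array of shape (K, N, T), assumed band-passed (zero mean)."""
    return np.mean([X @ X.T / X.shape[1] for X in epochs], axis=0)

def csp_filters(R1, R2, m):
    """Solve R1 w = lambda R2 w, which is equivalent to Eq. (5).
    Returns W with shape (2m, N) and the matching eigenvalues."""
    # Whitening by the Cholesky factor of R2 turns the generalized
    # problem into an ordinary symmetric eigenvalue problem.
    L_inv = np.linalg.inv(np.linalg.cholesky(R2))
    eigvals, V = np.linalg.eigh(L_inv @ R1 @ L_inv.T)
    order = np.argsort(eigvals)[::-1]              # descending, Eq. (6)
    eigvals, W_all = eigvals[order], (L_inv.T @ V)[:, order]
    n = len(eigvals)
    keep = np.r_[0:m, n - m:n]                     # m largest and m smallest
    return W_all[:, keep].T, eigvals[keep]

def log_variance_features(W, X):
    """Normalized log-variance features of Eq. (7) for one epoch X (N, T)."""
    var = np.var(W @ X, axis=1)
    return np.log(var / var.sum())
```

For every returned filter, the Rayleigh ratio (4) evaluated on `R1` and `R2` reproduces the corresponding eigenvalue, which is a convenient sanity check when adapting the sketch.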

2.2 Filter Bank Common Spatio-Spectral Patterns

The method proposed in this study, called filter bank common spatio-spectral patterns (FBCSSP), consists of two CSP layers. First, the EEG signal is filtered with several FIR band pass filters. Then, each band passed EEG signal with N channels is spatially filtered in the CSP-1 layer, so that the best spatial patterns for each frequency band are determined in this layer. At this point, the proposed method differs from the FBCSP and SBCSP methods, which finalize preprocessing and extract features at the end of the first layer. FBCSSP instead continues the signal preprocessing: the CSP-1 outputs are given directly to a second CSP filter, CSP-2. The purpose of CSP-2 is to linearly combine the outputs of the first spatial filter layer so that the maximum divergence is obtained.

Fig. 1. A flowchart of the filter bank common spatio-spectral patterns method.

Let \(X \in \mathbb {R}^{N \times T}\) be an input EEG signal matrix with T samples and N channels, called an epoch. First, all epochs in the training set are filtered with F FIR band pass filters of degree P at the desired frequencies,

$$\begin{aligned} \hat{X}_{f,n}(t) = \sum \limits _{p=0}^P \varvec{h}_{f,p} X_n(t-p) \qquad \hat{X}_f \in \mathbb {R}^{NxT} \quad f =(1,2, \cdots F) \end{aligned}$$
(8)

where \(\varvec{h}_{f,p}\) is the \(p^{th}\) weight of the \(f^{th}\) FIR filter. For each FIR filter output, a CSP filter is created. Let the average covariance matrix at the output of the \(f^{th}\) filter be \(R_f^c\), where c is the class label,

$$\begin{aligned} R_f^c = \frac{1}{K_c} \sum _{k \in c} \hat{X}_f^k \left( \hat{X}_f^k\right) ^\top \end{aligned}$$
(9)

where \(K_c\) is the number of epochs which belong to class c. For two classes, the classical CSP approach may be applied. To find the spatial filters in a multiclass BCI experiment, the one versus rest (OVR) CSP method may be applied instead [6]. Let \(m_1\) denote the number of spatial filters for one class at the first layer and \(M_1\) denote the total number of spatial filters, where \(M_1=m_1C\) and C is the total number of classes. Let the obtained spatial filter be denoted by \(U_f \in \mathbb {R}^{N \times M_1}\). Then, the output of this spatial filter is

$$\begin{aligned} Y_f = U_f^\top \hat{X}_f \qquad Y_f \in \mathbb {R}^{M_1xT} \end{aligned}$$
(10)

For the next layer, all outputs of first spatial layer are concatenated row by row and a new epoch is created. Let Y be the new epoch matrix which is defined as,

$$\begin{aligned} Y = \begin{bmatrix} Y_1 \\ \\ Y_2 \\ \\ \vdots \\ \\ Y_F \end{bmatrix} \qquad Y \in \mathbb {R}^{FM_1xT} \end{aligned}$$
(11)

The second CSP layer acts as a frequency selection stage. Let the average covariance matrix for this layer be \(R^c \in \mathbb {R}^{FM_1 \times FM_1}\). \(R^c\) is calculated as

$$\begin{aligned} R^c = \frac{1}{K_c} \, \, \sum \limits _{k \in c} Y^k \left( Y^k\right) ^\top \end{aligned}$$
(12)

Again, the classical CSP or OVR CSP method may be used to calculate the spatial filter matrix. Let \(m_2\) be the number of spatial filters for each class at the second layer. Then, the total number of spatial filters in this layer is \(M_2=m_2C\). If we denote the spatial filter matrix of the second layer by \(W \in \mathbb {R}^{FM_1 \times M_2}\), the output of this layer is

$$\begin{aligned} Z=W^\top Y \qquad \qquad Z\in \mathbb {R}^{M_2xT} \end{aligned}$$
(13)

For feature extraction, the same equation as in the CSP method, given in (7), is used. Note that in this case there are \(M_2\) features. The obtained features are given to a linear classifier such as LDA.
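The two-layer structure of (8) to (13) can be sketched as follows for the two-class case, where \(M_1 = 2m_1\) and \(M_2 = 2m_2\). The sketch is illustrative only: all function names are hypothetical, `bank` is assumed to be a list of FIR coefficient vectors \(\varvec{h}_f\), and the CSP step is solved by Cholesky whitening, which is an implementation choice not prescribed by the text.

```python
import numpy as np

def fir_filter(X, h):
    """Causal FIR filtering of every channel of X (N, T), Eq. (8)."""
    T = X.shape[1]
    return np.stack([np.convolve(x, h)[:T] for x in X])

def mean_cov(epochs):
    """Class-average covariance, Eqs. (9) and (12); epochs is a list."""
    return np.mean([E @ E.T / E.shape[1] for E in epochs], axis=0)

def csp(R1, R2, m):
    """Two-class CSP; returns 2m spatial filters as columns."""
    L_inv = np.linalg.inv(np.linalg.cholesky(R2))
    _, V = np.linalg.eigh(L_inv @ R1 @ L_inv.T)   # ascending eigenvalues
    W_all = L_inv.T @ V[:, ::-1]                  # descending order
    return np.hstack([W_all[:, :m], W_all[:, -m:]])

def first_layer(X, bank, U):
    """Stack U_f^T X_hat_f over all bands, Eqs. (10) and (11)."""
    return np.vstack([U_f.T @ fir_filter(X, h) for U_f, h in zip(U, bank)])

def fit_fbcssp(epochs1, epochs2, bank, m1, m2):
    """Train CSP-1 per band and CSP-2 on the stacked outputs."""
    U = []
    for h in bank:
        F1 = [fir_filter(X, h) for X in epochs1]
        F2 = [fir_filter(X, h) for X in epochs2]
        U.append(csp(mean_cov(F1), mean_cov(F2), m1))
    Y1 = [first_layer(X, bank, U) for X in epochs1]
    Y2 = [first_layer(X, bank, U) for X in epochs2]
    W = csp(mean_cov(Y1), mean_cov(Y2), m2)
    return U, W

def fbcssp_features(X, bank, U, W):
    """Second-layer output, Eq. (13), followed by the features of Eq. (7)."""
    Z = W.T @ first_layer(X, bank, U)
    var = np.var(Z, axis=1)
    return np.log(var / var.sum())
```

Training requires only F + 1 CSP solutions and no iteration; the feature vectors returned by `fbcssp_features` would then be passed to a linear classifier such as LDA.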

2.3 Filter Bank Selection

The filter bank used in FBCSSP may be configured according to the requirements and the prior information about the processed signal. For motor imagery signals, a filter bank covering 8 Hz to 36 Hz is reasonable. However, FBCSSP is able to combine the filter outputs and thereby generate an optimized spectral filter. Therefore, it is better to choose a filter bank which covers a wide frequency band and whose pass bands overlap.

When creating the filter bank, a linear phase response is the most important requirement, because the second CSP layer performs a spectral combination of the filter outputs. FIR filters are an appropriate choice since they can be designed with a linear phase response.

Here, the effect of linearly combining the filter bank outputs is analyzed. Substituting (8) into (10) gives the output of the first layer in terms of the input and the spectral filter parameters,

$$\begin{aligned} Y_f (t) = U_f^\top \sum \limits _{p=0}^P \varvec{h}_{f,p} X(t-p) \end{aligned}$$
(14)

By using (13), it is possible to write the overall filter within one equation,

$$\begin{aligned} Z(t) = \sum \limits _{f=1}^F W_f^\top U_f^\top \sum \limits _{p=0}^P \varvec{h}_{f,p} X(t-p) \end{aligned}$$
(15)

where \(W_f \in \mathbb {R}^{M_1 \times M_2}\) is the block of the second-layer spatial filter matrix W associated with the \(f^{th}\) output of the first layer. By exchanging the order of the two sums, we reach the equation

$$\begin{aligned} Z(t) = \sum \limits _{p=0}^P \delta _p^\top X(t-p) \end{aligned}$$
(16)

which is the characteristic equation of the proposed spatio-spectral filter. \(\delta _p \in \mathbb {R}^{N \times M_2}\) holds the spatial and spectral characteristics of the FBCSSP filter at delay p and is defined as

$$\begin{aligned} \delta _p=\sum \limits _{f=1}^F \varvec{h}_{f,p} U_f W_f \end{aligned}$$
(17)

The above result shows that a linear combination of the outputs of the FIR filters is equivalent to a linear combination of the filter coefficients. Thus, the two CSP layers together yield a FIR filter which is a combination of the filters in the filter bank. Therefore, using numerous overlapping filters should give more flexible FIR filters. An example filter bank frequency response is given in Fig. 4. Here, there are 7 FIR filters that cover the entire frequency band, together with their linear combination.
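The equivalence between the layered form (15) and the single spatio-spectral filter (16) can be checked numerically. The sketch below uses random stand-ins for \(U_f\), \(W_f\) and \(\varvec{h}_{f,p}\) (all sizes are arbitrary assumptions) and takes \(\delta _p = \sum _f \varvec{h}_{f,p} U_f W_f\), which is what exchanging the sums over f and p in (15) gives.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, F, P, M1, M2 = 4, 50, 3, 5, 2, 2       # arbitrary illustrative sizes
X = rng.standard_normal((N, T))              # one epoch
h = rng.standard_normal((F, P + 1))          # h[f, p]: FIR weights
U = rng.standard_normal((F, N, M1))          # U[f]: first-layer filters
W = rng.standard_normal((F, M1, M2))         # W[f]: second-layer blocks

def delayed(X, p):
    """X(t - p), with zeros where t - p falls before the epoch start."""
    out = np.zeros_like(X)
    out[:, p:] = X[:, :X.shape[1] - p]
    return out

# Layered form, Eq. (15): FIR filter, then U_f, then W_f, summed over bands.
Z_layers = sum(
    W[f].T @ U[f].T @ sum(h[f, p] * delayed(X, p) for p in range(P + 1))
    for f in range(F)
)

# Single spatio-spectral filter, Eq. (16), with one N x M2 matrix per delay.
delta = [sum(h[f, p] * U[f] @ W[f] for f in range(F)) for p in range(P + 1)]
Z_direct = sum(delta[p].T @ delayed(X, p) for p in range(P + 1))

print("max abs difference:", np.max(np.abs(Z_layers - Z_direct)))
```

Both forms agree up to floating point rounding, which illustrates that the second CSP layer effectively reweights the FIR coefficients of the bank.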

Normally, a CSP filter extracts uncorrelated components by simultaneously diagonalizing two covariance matrices. So, applying the CSP method to the output of another CSP filter will not improve the divergence of the signal, since the outputs of the first CSP filter are already uncorrelated. In the proposed method, however, the first-layer outputs of all frequency bands are joined together and form a multi channel signal whose channels are no longer uncorrelated; thus, the CSP of the second layer should increase the overall divergence of the incoming signal. In effect, the second CSP performs a spectral weighting while linearly combining the outputs of the first layer.

Since they are linear filters of the same type, a single CSP could be used instead of cascading CSP filters one after another. Indeed, this should give a higher Rayleigh ratio than FBCSSP. However, this CSP matrix filter would have \(N \times F\) inputs and \(M_2\) outputs, and its classification performance is not expected to be higher.

Unlike the various spectral filter optimization methods reported in the literature, FBCSSP searches for linear combinations of predefined FIR filters. This gives FBCSSP some advantages over those methods. First, the computational complexity of the optimization does not increase with the degree of the FIR filters: because only the outputs of the filters are used, the algorithm does not manipulate the filter weights, whereas the complexity of those methods grows with the filter degree. Second, FBCSSP gives the flexibility of defining various FIR filters, which allows existing prior knowledge to be embedded into the spatio-spectral filters. For example, one can design FIR filters specifically for the spectral region of the \(\mu \) and \(\beta \) waves and ignore the other frequencies. The third advantage of FBCSSP is its non-iterative structure. The methods in [7, 9, 17] use an alternating optimization strategy which iteratively increases the fitness function by updating the spatial and the spectral filter parameters in turn. Finally, different spatial locations may be activated at different frequency bands during the execution of motor imagery, which leads to spatial patterns specific to a frequency band. FBCSSP takes this into account and produces spatial filters for each defined frequency band. This enables us to investigate the obtained spatio-spectral filters at a specific frequency band.

2.4 Data Description and Preprocessing

We used data set IVa from BCI competition III. Detailed information about this BCI competition may be found in [4]. Data set IVa is a 2 class motor imagery data set which includes EEG records from 5 subjects labeled AA, AL, AV, AW and AY. Although there were three classes (‘right’, ‘left’ and ‘foot’) in the original experiment, only cues for the classes ‘right’ and ‘foot’ are provided in the public data set. The recordings include 118 EEG channels measured at positions of the extended international 10/20-system. The signals were band-pass filtered between 0.05 and 200 Hz and digitized at 1000 Hz with 16 bit (0.1 µV) accuracy. The classes were labeled as right hand and foot. For each subject, there are 280 trials, each defined by starting and ending markers as well as its class label.

In this study, we applied the same pre-processing steps to all subjects. (i) We selected the electrodes on the motor cortex area. The selected electrodes are shown in Fig. 2. (ii) For the CSP method, the EEG was band pass filtered with an 8–30 Hz 5th order Butterworth filter, since this band roughly covers the frequency range of motor imagery signals. For the spatio-spectral filtering methods, we used a 5th order Butterworth filter with a pass band of 1–49 Hz. (iii) For each trial, we used the EEG signal in the time segment between 0.5 s and 2.5 s after the instruction cue. Trials marked as rejected were excluded. The preprocessing phase is shown in Fig. 3.
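Step (iii) amounts to cutting a fixed window out of the continuous recording after each cue. A minimal sketch is given below; the function name, the `(N, total_samples)` array layout and the representation of cues as sample indices are assumptions made for illustration, and the Butterworth filtering of step (ii) is not shown.

```python
import numpy as np

def extract_epochs(eeg, cue_samples, fs, t_start=0.5, t_end=2.5):
    """Cut one epoch per cue from a continuous recording.

    eeg:         array of shape (N, total_samples), already band-passed
    cue_samples: sample indices at which the instruction cues occur
    fs:          sampling rate in Hz
    Returns an array of shape (K, N, T), T = round((t_end - t_start) * fs).
    """
    first = int(round(t_start * fs))   # offset of the window start
    last = int(round(t_end * fs))      # offset of the window end (exclusive)
    return np.stack([eeg[:, c + first:c + last] for c in cue_samples])
```

Rejected trials can be dropped simply by leaving their cue indices out of `cue_samples`.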

Fig. 2. EEG channels used in BCI competition III data set IVa. EEG was captured with 118 electrodes according to the extended international 10/20-system.

Fig. 3. Preprocessing steps used in the evaluation.

2.5 Selected Filters

In this study, we used a total of seven overlapping band pass FIR filters which cover the frequency band 1–50 Hz. The degree of the filters (P) was set to 20. Note that other filter bank configurations can be explored, provided that only FIR filters are used, because of their linear phase response. Since the selected filters overlap, their linear combination should produce a new spectral filter specific to the subject under test. The frequency responses of the filters used in the study are given in Fig. 4.
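A filter bank of this kind could be built as sketched below. The design method is not stated in the text, so the windowed-sinc design with a Hamming window is an assumption, as is the sampling rate `FS` of 100 Hz (the processing rate is not given); the band edges are those listed for the evaluated configuration. The symmetric coefficients give the linear phase response required for the second CSP layer.

```python
import numpy as np

def bandpass_fir(low_hz, high_hz, fs, degree=20):
    """Linear-phase band pass FIR filter with degree + 1 taps
    (windowed-sinc design with a Hamming window, an assumed choice)."""
    n = np.arange(degree + 1) - degree / 2          # symmetric tap index
    f1, f2 = low_hz / fs, high_hz / fs              # cycles per sample
    ideal = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return ideal * np.hamming(degree + 1)

def gain(h, f_hz, fs):
    """Magnitude of the frequency response of h at f_hz."""
    n = np.arange(len(h))
    return np.abs(np.sum(h * np.exp(-2j * np.pi * f_hz * n / fs)))

BANDS_HZ = [(2, 10), (8, 16), (14, 22), (20, 28), (26, 34), (32, 40), (38, 46)]
FS = 100.0  # assumed sampling rate in Hz, not taken from the text

bank = np.array([bandpass_fir(lo, hi, FS) for lo, hi in BANDS_HZ])
```

With only 21 taps the transition bands are wide, so neighbouring filters overlap considerably, which fits the stated goal of letting the second layer blend adjacent bands.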

Fig. 4. Frequency responses of the selected FIR filters used in the study.

3 Results

3.1 Evaluated Methods and Selected Configurations

In the following paragraphs, the evaluated methods are listed together with their method-specific configurations. Each of the presented classification algorithms has at least one setting, called a hyper-parameter. For each subject and each method, the hyper-parameters are selected from a list, and the combination of parameter values that gives the highest average performance within a K-fold cross validation is reported together with the obtained classification accuracy.

CSP

The CSP method uses m as a hyper-parameter, which stands for the number of spatial filters per class used for constructing the spatial filter. The possible values of m were \(\{1,2,3,4,5\}\). For each subject, these values were tried one by one and the m that gives the highest average performance was selected.

FBCSP

For the FBCSP method, the filter bank consisted of 7 FIR filters with pass bands of 2–10, 8–16, 14–22, 20–28, 26–34, 32–40 and 38–46 Hz. The degree of the filters was set to 20. The frequency responses of the filters in the filter bank configured for evaluation are given in Fig. 4. The parameter m for the FBCSP method was chosen from \(\{1,2,3,4,5\}\) and the number of selected features (d) was chosen from \(\{1,2,3,\cdots ,10\}\).

FBCSSP

In the FBCSSP method, the hyper-parameters searched for the best combination are \(\{m_1,m_2\}\), the numbers of spatial filters per class for the first and the second CSP layers, respectively. The values of \(m_1\) and \(m_2\) are selected from the list \(\{1,2,3,4,5\}\), so that there are 25 combinations for each subject.

The FIR filter bank used for the FBCSSP method is the same as for FBCSP: 7 FIR filters with pass bands of 2–10, 8–16, 14–22, 20–28, 26–34, 32–40 and 38–46 Hz. The degree of the filters was set to 20. The frequency responses of the filters in the filter bank configured for evaluation are given in Fig. 4.
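The search over the 25 \((m_1, m_2)\) combinations can be organized as a plain grid search wrapped around K-fold cross validation. In the sketch below, `score_fn` is a hypothetical callback that trains on the given training trials and returns the accuracy on the test trials; the random fold assignment and its seed are assumptions, since the text does not say how folds were formed.

```python
import itertools
import numpy as np

def k_fold_indices(n_trials, k, seed=0):
    """Split trial indices into k disjoint folds (random assignment)."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_trials), k)

def grid_search(score_fn, n_trials, grid=(1, 2, 3, 4, 5), k=10):
    """Try every (m1, m2) pair from the grid.

    score_fn(train_idx, test_idx, m1, m2) must return an accuracy value.
    Returns (mean accuracy, std over folds, m1, m2) of the best pair.
    """
    folds = k_fold_indices(n_trials, k)
    best = None
    for m1, m2 in itertools.product(grid, repeat=2):
        accs = []
        for i, test_idx in enumerate(folds):
            train_idx = np.concatenate(
                [fold for j, fold in enumerate(folds) if j != i]
            )
            accs.append(score_fn(train_idx, test_idx, m1, m2))
        mean, std = np.mean(accs), np.std(accs)
        if best is None or mean > best[0]:
            best = (mean, std, m1, m2)
    return best
```

The same loop covers the single-parameter CSP case by ignoring one of the two arguments inside `score_fn`.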

3.2 Classification Results

The classification accuracies of the described methods on the BCI competition data set are listed in Table 1. The outputs of all methods were classified using a standard LDA classifier, and the ten-fold cross validation accuracy was calculated for every method. FBCSSP achieves a high classification accuracy. A short description and the specific configuration of each method were given in the previous subsection.

Table 1 reports the classification accuracy of each method for each subject. The percentage accuracy (\(ACC\%\)) is given by

$$\begin{aligned} ACC \%=100\sum \limits _{c=1}^C \frac{TP_c}{TP_c+TN_c+FP_c+FN_c} \end{aligned}$$
(18)

where C is the total number of classes in the data set and \(TP_c\), \(TN_c\), \(FP_c\) and \(FN_c\) are the numbers of true positives, true negatives, false positives and false negatives for class c. The table also lists the standard deviation (std) for each method and subject. Since 10-fold cross validation was used for evaluating the classification accuracy, std represents the standard deviation over the folds for the given subject and method.
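As a small worked example of (18), the following sketch computes the accuracy from true and predicted labels. The function name is hypothetical; the four counts are computed per class exactly as in the formula, although their sum is simply the total number of trials.

```python
import numpy as np

def accuracy_percent(y_true, y_pred, classes):
    """Percentage accuracy of Eq. (18), summed over the classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    total = 0.0
    for c in classes:
        tp = np.sum((y_true == c) & (y_pred == c))
        tn = np.sum((y_true != c) & (y_pred != c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        total += tp / (tp + tn + fp + fn)
    return 100.0 * total
```

For four trials with true labels `[0, 0, 1, 1]` and predictions `[0, 1, 1, 1]`, the two class terms are 1/4 and 2/4, giving 75 %.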

Table 1. Classification performances of the listed methods for the subjects in BCI competition III Data Set IVa.

The last column lists the overall accuracy and standard deviation of each method over all subjects, which summarizes the classification performance of the method.

The numbers in parentheses in each cell indicate the selected parameter values for the corresponding subject and method. The names of the parameters are given in the second column. Note that the accuracy value in each cell is the outcome of the given parameter configuration.

Since the data are subject-dependent, the performance of the methods varies considerably across subjects. Therefore, along with the quantitative performance summary in the table, box plots which benchmark the listed methods subject by subject are given in Fig. 5. The figure shows the classification accuracies for the subjects of data set IVa. The horizontal axis lists the evaluated methods and the vertical axis gives the measured performance. For each subject and method, the box boundaries represent the upper and lower quartiles of the input data, which are the outputs of the 10-fold cross validation for the selected configuration. The red horizontal lines inside or on the boundary of the boxes represent the median values. The whiskers (dashed lines above and below the boxes) extend to the most extreme data points that the box plot algorithm does not consider outliers, and the outliers are plotted individually with red cross marks.

Fig. 5. Box plots displaying the distribution of classification performances for the subjects in data set IVa. (Color figure online)

Fig. 6. Obtained spatial filters of the CSP (left) and FBCSSP (right) methods for the subjects of BCI competition III, data set IVa.

3.3 Spatial and Spectral Filters

The spatial filters calculated by training CSP and FBCSSP are given in Fig. 6 for all subjects in the BCI data set, and the spectral filters of FBCSSP are given in Fig. 7. The frequency response of the trained FBCSSP network is calculated by scanning all of the inputs with a signal at a given frequency and measuring the average power at the output. In the figures, the spectral filter response is normalized. The spatial filter illustrations are prepared similarly: all network inputs are scanned by applying an impulse and measuring, for each input, the power of the signal at the output. Then, the calculated power corresponding to each input is converted to a gray scale value and displayed on a head figure at the location of the electrode. The pass bands of the obtained spectral filters are located approximately within the band associated with the sensorimotor cortex [14]. In addition, most of the spatial filters successfully focus on the area of the sensorimotor cortex related to the corresponding motor action. Thus, FBCSSP is a convenient method not only for achieving higher classification rates, but also for extracting physiological information.
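The frequency scan described above can be sketched as follows. This is one reading of the description rather than the authors' procedure: `network` is a hypothetical callable mapping an epoch of shape (N, T) to the filter output, each input channel is driven in turn by a sinusoid, and the output power is averaged over channels before normalization.

```python
import numpy as np

def scan_frequency_response(network, n_channels, freqs_hz, fs, duration_s=4.0):
    """Normalized output power of `network` for sinusoidal test inputs.

    network:  callable mapping an (n_channels, T) array to an output array
    freqs_hz: test frequencies in Hz
    Returns one power value per frequency, scaled so the maximum is 1.
    """
    t = np.arange(int(duration_s * fs)) / fs
    power = np.zeros(len(freqs_hz))
    for i, f in enumerate(freqs_hz):
        tone = np.sin(2 * np.pi * f * t)
        for ch in range(n_channels):
            X = np.zeros((n_channels, t.size))
            X[ch] = tone                      # drive one input at a time
            power[i] += np.mean(network(X) ** 2)
        power[i] /= n_channels                # average over the inputs
    return power / power.max()
```

The spatial maps can be obtained in the same spirit by replacing the sinusoid with an impulse and recording the output power separately for each input instead of averaging.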

Fig. 7. Obtained spectral filters of the FBCSSP method for the subjects of BCI competition III, data set IVa.

4 Conclusion

In a motor imagery classification problem, applying a band pass filter matched to the frequency band of the subject's sensorimotor rhythms helps to find spatial filters that focus better on the related area of the motor cortex. However, the frequency band of the filter is subject-dependent. Thus, a method that automatically sets the required band pass filter is highly desirable. The proposed FBCSSP method is a spatio-spectral filtering algorithm which optimizes spatial and spectral filters specific to each subject.

The FBCSSP method consists of a filter bank and two consecutive CSP layers, in which the first CSP layer localizes spatial filters specific to a given frequency band while the second one weights the frequency bands and designs the spectral filter by linearly combining the outputs of the first layer. The proposed algorithm uses CSP, which is a state of the art method in motor imagery classification. The proposed method was examined in terms of classification performance and the physiological plausibility of the obtained spectral and spatial filters. For evaluation, we used a publicly available data set which is very popular in motor imagery classification studies. The classification performance table shows that FBCSSP achieves high classification accuracy on this data set. Furthermore, we confirmed the physiological plausibility of the filters by inspecting their spectral and spatial responses. The reported results show that FBCSSP is a successful spatio-spectral method for motor imagery signal classification.

As future work, the developed method will be adapted to multiple classes. In addition, the performance of the proposed method will be tested on more data sets and compared with more spatio-spectral methods from the literature.