1 Introduction

Machine learning has gained widespread attention owing to the technological advancements of the last few decades, which have provided access to high computational power and speed. As a result, machine learning has been applied in a wide range of areas, one of which is pattern recognition. Manually recognizing patterns in different types of complex signals is difficult and time-consuming, and machine learning techniques help to tackle these problems.

Brain-computer interface (BCI) technology has recently gained increased attention, with applications in gaming [1], stroke rehabilitation [2,3,4,5,6], emotion recognition [7,8,9,10,11,12], sleep stage classification [13,14,15,16], seizure detection/diagnosing epilepsy [17,18,19,20,21,22,23] and other applications [2, 24,25,26,27]. Non-invasive sensors are preferred over invasive sensors because they do not require surgery and are low cost, portable and simple to use. In a BCI system, non-invasive sensors are therefore usually used to capture the brain activities. The patterns of the acquired brain activities are then recognized using machine learning and pattern recognition techniques. A BCI system involves three basic steps: signal acquisition, feature extraction and signal classification/recognition, as shown in Fig. 1. Once the signal is recognized, it is translated to control signals for communication with external devices. It is highly desirable that a BCI system recognizes the signals as accurately as possible. Therefore, the need for a high recognition rate (classification accuracy) in BCI applications is driving more and more research, with various methods being proposed.

Fig. 1. Conceptual overview of a BCI system

The electroencephalography (EEG) signals for the same motor imagery (MI) task vary from one subject to another because of differences in skull size, skin thickness and age, and because subjects think about the same task in different ways. Although subject-independent BCI systems would be highly desirable, these factors mean that subject-specific BCI systems have usually been proposed. These factors also affect the frequency range in which the signals are most discriminative for each subject. Manually finding or tuning the filter band parameters is difficult and time-consuming. Therefore, many researchers have proposed methods for autonomously finding the filter bands, so that the signals are as discriminative as possible between different tasks, thereby boosting the ability to correctly recognize different categories of MI tasks. Filter bank common spatial pattern (FBCSP) [28], discriminative FBCSP (DFBCSP) [29], and binary particle swarm optimization (BPSO) for frequency band selection [30] are some of these methods. In the FBCSP approach [28], the raw EEG signal is filtered using a bank of zero-phase Chebyshev Type II infinite impulse response (IIR) filters in the range of 4–40 Hz, each having a bandwidth of 4 Hz, with no overlap between the bands. CSP spatial filters were computed using the filtered signal for each of the bands. The CSP features obtained from each band were then concatenated, and several methods of feature selection were used to select the significant features. A number of classifiers were also evaluated and promising results were obtained. To further improve the FBCSP approach, DFBCSP was proposed. In DFBCSP [29], the raw EEG signal is filtered using multiple filters in the range of 6–40 Hz. The filters have a bandwidth of 4 Hz with an overlap of 2 Hz. Instead of extracting features from the filtered signals of all the bands, the authors proposed using Fisher's ratio of the single-channel band power, calculated using channel C3 or C4, to select the four bands that contain the most discriminative information about the MI tasks. CSP features are then obtained from each of these four bands to train a support vector machine (SVM) classifier. The DFBCSP method outperformed the FBCSP method. Wei and Wei [30] proposed using BPSO for selecting the best frequency sub-bands from ten sub-bands in the range of 8–30 Hz, each having a bandwidth of 4 Hz with an overlap of 2 Hz. Due to computational complexity, the authors performed the evaluation using 24 and 14 selected channels. As such, the results were not compared with the FBCSP or DFBCSP approaches. However, they showed promising improvements in comparison to the conventional CSP approach. Other approaches have also been proposed that address other aspects, such as ways of extracting more significant features [31,32,33,34,35,36,37,38,39,40,41], feature selection [42,43,44] and classification [45,46,47,48].
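The band layouts described above can be enumerated with a few lines of code. The following is a minimal sketch; the helper name `make_bands` and its parameters are hypothetical and are not taken from the cited works, and the cited methods may define their band edges differently.

```python
def make_bands(f_low, f_high, width, step):
    """Return (low, high) band edges in Hz that fit inside [f_low, f_high].

    width is the bandwidth of each band; step is the shift between
    consecutive bands, so overlap = width - step.
    """
    bands = []
    start = f_low
    while start + width <= f_high:
        bands.append((start, start + width))
        start += step
    return bands


# FBCSP-style layout: 4-40 Hz, 4 Hz bands, no overlap
fbcsp_bands = make_bands(4, 40, width=4, step=4)

# BPSO-style layout: 8-30 Hz, 4 Hz bands, 2 Hz overlap
bpso_bands = make_bands(8, 30, width=4, step=2)

print(len(fbcsp_bands), fbcsp_bands[0], fbcsp_bands[-1])
print(len(bpso_bands), bpso_bands[0], bpso_bands[-1])
```

Under this enumeration the 8–30 Hz range with 2 Hz overlap yields ten sub-bands, which agrees with the number of sub-bands stated for the BPSO approach.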

The FBCSP and DFBCSP approaches use multiple frequency bands, which increases the computational complexity of the system. The BPSO approach for selecting the frequency bands or sub-bands mostly selected only a single sub-band. However, it requires high computational power in the training phase as the number of channels is increased. To tackle this problem, we previously proposed a scheme to find a single frequency band that contains the most discriminative information between the MI tasks [49]. A genetic algorithm (GA) was employed for this purpose. In this work, we extend our previous work [49] by proposing the use of common spatial spectral pattern (CSSP) instead of CSP to further improve the scheme. This is a simple yet effective approach (mostly ignored by researchers in this field) that improves the spatial resolution of the signal, resulting in improved performance. We achieved promising results using the proposed scheme.

The remainder of this paper is organized as follows: Sect. 2 presents our proposed scheme in detail. Section 3 describes the datasets used, presents and discusses the results, and outlines future work. Conclusions are presented in Sect. 4.

2 Methodology

BCI has become an active topic of research. One of the major challenges faced by researchers is the low signal-to-noise ratio (SNR) of the EEG signals acquired using non-invasive sensors. The SNR can be improved to some extent by using invasive sensors. However, since invasive sensors require surgery, they are not preferred for the majority of BCI applications. As such, more and more approaches are being proposed with the aim of improving the classification accuracy of BCI systems. A major remedy for the low SNR is to filter out the unwanted signal. Finding the frequency range that contains the most important information about the MI tasks is quite challenging. While the use of multiple sub-bands has been key to improved performance, it has also increased the computational complexity. Keeping this in mind, we propose a scheme based on CSSP that autonomously finds a single frequency band which provides maximum information to distinguish between different MI tasks. This increases the recognition ability of our proposed scheme and is the major contribution of this work. The proposed scheme is presented in detail in the following sub-sections.

2.1 The Proposed Scheme

The overall framework of our proposed scheme is shown in Fig. 2. In the proposed scheme, the parameters of a bandpass filter are optimized using an optimization algorithm. Once the parameters are determined, the training data is filtered with the resulting filter. CSSP spatial filters are then learned from the filtered data, and the filtered data is transformed using the learned spatial filters. Variance-based features are extracted from the transformed data and used to train a support vector machine (SVM) classifier. The test data undergoes the same procedure, except that the parameters learned during training are reused for bandpass filtering and spatial filtering, and classification is done using the trained classifier.
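The bandpass filtering step of this framework can be sketched as follows. This is only an illustration under stated assumptions: the function name `bandpass`, the array layout (trials × channels × samples), the 100 Hz default sampling rate and the use of zero-phase (forward-backward) filtering are choices made here, not details confirmed by the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(trials, order, f_low, f_high, fs=100.0):
    """Apply a Butterworth bandpass filter along the time axis.

    trials : array of shape (n_trials, n_channels, n_samples)
    order, f_low, f_high : the three filter parameters to be optimized
    fs : sampling rate in Hz (assumed value)
    Forward-backward filtering is an assumption made in this sketch.
    """
    sos = butter(order, [f_low, f_high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, trials, axis=-1)


# Toy usage: a 10 Hz component survives an 8-12 Hz passband,
# while a 40 Hz component is suppressed.
t = np.arange(1000) / 100.0
toy = (np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 40 * t))[None, None, :]
filtered = bandpass(toy, order=4, f_low=8.0, f_high=12.0)
```

The same function would be applied to the training and test data with identical parameters, so that no information from the test data enters the filter design.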

Fig. 2. Overall framework of the proposed scheme

2.2 Common Spatial Spectral Pattern (CSSP)

CSSP is a simple method that was proposed to increase the spatial resolution of the signal, so that the signal carries more information and the MI EEG signals can be recognized with higher accuracy. The only difference between CSP and CSSP lies in the training and test samples: in CSSP, a temporally delayed copy of the signal is appended to the raw signal, which doubles the dimension of the signal. All other processes are the same for the CSP and CSSP approaches. The spatial filters \( W_{CSP} \) are learned from the training data, and the training and test data are transformed to a new time series using (1), where \( X \) is the delay-augmented signal and \( Z \) is the spatially filtered signal. The variance-based features are then extracted from the spatially filtered data.

$$ Z = W_{CSP} X \tag{1} $$
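The delay augmentation, the learning of the spatial filters and the transformation in (1) can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the trace-normalised covariance, the number of filter pairs (`n_pairs`) and the log-normalised variance feature are assumptions based on common CSP practice, since the text only specifies "variance based features".

```python
import numpy as np
from scipy.linalg import eigh


def delay_embed(trials, tau=1):
    """Stack each trial with a copy of itself delayed by tau >= 1 samples.

    (n_trials, n_channels, n_samples) -> (n_trials, 2*n_channels, n_samples - tau)
    """
    return np.concatenate([trials[:, :, tau:], trials[:, :, :-tau]], axis=1)


def mean_cov(trials):
    """Average trace-normalised spatial covariance over trials."""
    covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
    return np.mean(covs, axis=0)


def fit_csp(trials_a, trials_b, n_pairs=3):
    """Solve C_a w = lambda (C_a + C_b) w and keep filters from both ends."""
    c_a, c_b = mean_cov(trials_a), mean_cov(trials_b)
    _, eigvecs = eigh(c_a, c_a + c_b)  # eigenvalues in ascending order
    picks = np.r_[np.arange(n_pairs), np.arange(-n_pairs, 0)]
    return eigvecs[:, picks].T  # rows are spatial filters


def log_var_features(trials, w):
    """Apply Z = W X per trial and return log of normalised variances."""
    z = np.einsum("fc,ncs->nfs", w, trials)
    var = z.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))


# Toy usage with random data standing in for bandpass-filtered EEG
rng = np.random.default_rng(0)
class_a = rng.standard_normal((20, 4, 200))
class_a[:, 0] *= 3  # class A has extra variance on channel 0
class_b = rng.standard_normal((20, 4, 200))
class_b[:, 1] *= 3  # class B has extra variance on channel 1
emb_a, emb_b = delay_embed(class_a), delay_embed(class_b)
w_csp = fit_csp(emb_a, emb_b, n_pairs=2)
feat_a = log_var_features(emb_a, w_csp)
feat_b = log_var_features(emb_b, w_csp)
```

Because the delayed copy is stacked as extra channels, the CSP machinery is reused unchanged; only the input dimension doubles.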

2.3 Optimization of Filter Parameters

Filtering the signal with an appropriate temporal filter, so as to retain as much important information as possible, is a vital step in a BCI system. Here, we employ the method proposed in our previous work [49]. The three main parameters of a Butterworth bandpass filter (filter order, lower cutoff frequency and upper cutoff frequency) are optimized. Any optimization algorithm can be used for this purpose; we used a genetic algorithm (GA), as in [49]. During the optimization phase, the performance of each candidate set of filter parameters is evaluated using 10-fold cross validation.
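A generic GA loop over the three filter parameters might look like the sketch below. It is not the configuration used in [49]: the search ranges, population size, elitist selection, uniform crossover and mutation settings are all assumptions chosen for illustration. The `fitness` argument is a hypothetical callable that would, in practice, return the 10-fold cross-validation error obtained with a given (order, lower cutoff, upper cutoff) triple; lower is better.

```python
import random

# Assumed search ranges (not taken from the source)
ORDER_RANGE = (1, 8)
LOW_RANGE = (1.0, 20.0)    # lower cutoff, Hz
HIGH_RANGE = (8.0, 45.0)   # upper cutoff, Hz
MIN_WIDTH = 2.0            # minimum passband width, Hz


def repair(ind):
    """Clamp an individual to the search ranges and keep f_high > f_low."""
    order, f_low, f_high = ind
    order = int(min(max(order, ORDER_RANGE[0]), ORDER_RANGE[1]))
    f_low = min(max(f_low, LOW_RANGE[0]), LOW_RANGE[1])
    f_high = min(max(f_high, f_low + MIN_WIDTH, HIGH_RANGE[0]), HIGH_RANGE[1])
    return (order, f_low, f_high)


def random_individual(rng):
    return repair((rng.randint(*ORDER_RANGE),
                   rng.uniform(*LOW_RANGE),
                   rng.uniform(*HIGH_RANGE)))


def crossover(p1, p2, rng):
    """Uniform crossover: each gene comes from either parent."""
    return tuple(a if rng.random() < 0.5 else b for a, b in zip(p1, p2))


def mutate(ind, rng, rate=0.3):
    order, f_low, f_high = ind
    if rng.random() < rate:
        order += rng.choice((-1, 1))
    if rng.random() < rate:
        f_low += rng.gauss(0.0, 1.0)
    if rng.random() < rate:
        f_high += rng.gauss(0.0, 2.0)
    return repair((order, f_low, f_high))


def optimise(fitness, pop_size=20, generations=30, seed=0):
    """Minimise fitness(individual) with a simple elitist GA."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness)[: pop_size // 2]  # best half survives
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = elite + children
    return min(pop, key=fitness)


# Toy usage: a stand-in fitness whose optimum is order 4, 8-30 Hz
def toy_fitness(ind):
    order, f_low, f_high = ind
    return abs(order - 4) + abs(f_low - 8.0) + abs(f_high - 30.0)


best = optimise(toy_fitness, seed=1)
```

In a real run the fitness evaluation dominates the cost, since each call involves filtering, spatial filter learning and classifier training on every fold, so caching fitness values for already-evaluated individuals would be worthwhile.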

3 Results and Discussion

To validate our work, we evaluated the proposed scheme on two publicly available datasets, BCI Competition III Dataset IVa and BCI Competition IV Dataset 1, referred to from here onwards as dataset 1 and dataset 2, respectively. These datasets have been widely used in this field. Dataset 1 contains 2 classes of MI EEG signals recorded from 5 subjects; we used the signal downsampled to 100 Hz, as in other works. Dataset 2 contains 2 classes of MI EEG signals recorded from 7 subjects and sampled at 1000 Hz; here too, the signal downsampled to 100 Hz is used. A detailed description of the datasets can be obtained at http://www.bbci.de/competition/.

We used a 10 × 10-fold cross-validation approach to evaluate our proposed scheme. The 10 × 10-fold cross-validation results are also reported for all other competing methods in order to make a fair comparison between the methods. The average misclassification rates and their kappa coefficient values for the different methods are shown in Tables 1 and 2 for dataset 1 and dataset 2, respectively. For the conventional CSP approach, we used a 7–30 Hz frequency band. Parameters such as the number of spatial filters and the number of bands are adopted from the respective works, as originally proposed by their authors. It can be seen from Tables 1 and 2 that our proposed scheme outperformed all other competing methods, achieving the lowest average misclassification rates of 9.95% and 18.72% and the highest average kappa coefficient values of 0.801 and 0.624 for dataset 1 and dataset 2, respectively. Subjects al and aw of dataset 1 and subjects b, e and f of dataset 2 obtained their lowest misclassification rates using the proposed scheme. Compared to the conventional CSP approach and the GA-based filter optimization approach using CSP (GA-CSP) [49], the proposed scheme reduced the misclassification rate by 3.52% and 0.80% for dataset 1, and by 5.52% and 1.52% for dataset 2, respectively. Our method also outperformed the scheme proposed in [42], which utilized sparse Bayesian learning to obtain the sparse feature vectors (SBLFB).
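The relationship between the reported misclassification rates and kappa values can be checked with a short calculation. The sketch below assumes balanced classes, so that the chance agreement is 1/n_classes; this assumption and the function name `kappa_from_error` are introduced here and are not stated in the text.

```python
def kappa_from_error(error_rate, n_classes=2):
    """Cohen's kappa from a misclassification rate, assuming balanced classes."""
    p_observed = 1.0 - error_rate
    p_chance = 1.0 / n_classes
    return (p_observed - p_chance) / (1.0 - p_chance)


print(round(kappa_from_error(0.0995), 3))  # dataset 1: 0.801
print(round(kappa_from_error(0.1872), 3))  # dataset 2: 0.626
```

For dataset 1 this reproduces the reported average kappa of 0.801. For dataset 2 it gives approximately 0.626 against the reported 0.624; the small gap may come from rounding or from the balanced-class assumption not holding exactly.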

Table 1. The average misclassification rates and their kappa coefficient values (given in brackets) for different methods evaluated using dataset 1
Table 2. The average misclassification rates and their kappa coefficient values (given in brackets) for different methods evaluated using dataset 2

As mentioned earlier, using CSSP instead of CSP improves the spatial resolution of the signal, so the signal contains more important information. This results in a reduction in the misclassification rate. Figure 3 shows the distribution of the best 2 features for one of the trial runs of subject d (of dataset 2) for CSP, GA-CSP and the proposed scheme. It is evident from Fig. 3 that the features learned by the proposed scheme contain more information about the different MI tasks, which is due to the increased spatial resolution. Thus, the proposed method takes advantage of both the improved spatial resolution and the filter optimization to achieve improved performance.

Fig. 3. Distribution of the best 2 features for one of the trial runs of subject d (of dataset 2). On the left is the distribution of the training data and on the right is the distribution of the test data.

It should also be noted that the proposed scheme did not achieve the lowest misclassification rate for all subjects. However, in such cases the misclassification rate obtained with the proposed scheme was within 1.85% of the lowest misclassification rate for that particular subject, except for subject aa of dataset 1. Subject aa of dataset 1 achieved the lowest misclassification rate of 9.21% using the DFBCSP approach, while the proposed scheme achieved the second lowest misclassification rate of 14.77%, a difference of 5.56%. This is because the important information about the different MI tasks for this subject lay around two different frequency bands, and DFBCSP was able to select those frequency bands. The proposed scheme, on the other hand, only finds a single wide band. This is evident from [33], where it is shown that the wide band was not selected. This also paves the way for future work to test the proposed scheme for tuning multiple filters. With multiple filters we can also employ dimensionality reduction techniques [50,51,52,53,54], feature selection [55, 56] and clustering methods [57, 58], and classifiers [41, 59, 60]. Furthermore, in this work we have utilized a single sample point delay in the CSSP approach. In future, we will consider multiple sample point delays and develop methods to select the best number of sample point delays for each subject, in order to further improve the performance of the proposed scheme.

4 Conclusions

In this paper we have proposed a scheme that utilizes CSSP and filter optimization using GA. The proposed scheme achieved the lowest average misclassification rate and the highest average kappa coefficient values, outperforming all other competing methods. Another advantage of the scheme is that any optimization algorithm can be used for optimizing the filter parameters. It is recommended that future work test and evaluate the effects of parameter optimization of multiple filter bands. Future work may also consider optimizing the number of sample point delays for each subject to obtain optimal results. The proposed scheme should prove useful for developing improved BCI systems.