Introduction

Brain-computer interfacing (BCI) technology is becoming increasingly popular with the availability of commercial electroencephalogram (EEG) acquisition devices. Compared to medical grade EEG devices, commercial devices are low cost and more portable; they do not require special expertise to use and are more comfortable to wear. Therefore, BCI systems implemented using these devices have potential uses in a variety of entertainment [1,2,3], aesthetic [4], rehabilitative [5, 6], therapeutic [7], prosthetic [8] and personal health monitoring [9] applications, in addition to conventional applications in clinical healthcare and research.

Generally, a BCI system works by extracting information about a user’s intention from his or her EEG activity and translating it into a control signal. Motor imagery (MI) is one of the popular paradigms for extracting information from EEG: imagined or planned movements of one’s body parts can be identified from the extracted information. Classifying different MI tasks is, therefore, a crucial component of such a BCI.

Compared to other EEG paradigms such as steady state visually evoked potentials (SSVEP), MI is less strenuous for the user. Popular alternatives such as SSVEP or oddball paradigms using the P300 potential require the user to remain attentive throughout complicated and extensive series of external visual or auditory stimuli, which many studies have reported to cause discomfort when used for prolonged periods [10, 11]. MI, in contrast, does not depend on visual stimulation to elicit EEG potentials; it may require the subject to respond to cues only during the training phase. In practical BCI applications such as prosthetics, rehabilitation, and gaming, MI is a more intuitive and natural source of control signals than other paradigms.

However, the identification and classification of MI are more challenging compared to SSVEP or P300 [12]. Because of this, even though MI is an ideal paradigm from the user’s end, its practical application as a BCI paradigm is limited [13]. In addition, the lower signal quality and limited spatial density of electrodes in commercial EEG devices generally make MI detection more difficult when such devices are used. It is also important for MI classification algorithms to be robust and accurate across a wide range of subjects with varying degrees of BCI literacy [8]. Therefore, the performance of MI identification and classification must be improved if this user-friendly paradigm is to be successfully implemented in practical consumer-level BCI.

Many recent studies have been undertaken in the field of MI classification. Several feature extraction methods have been used including common spatial patterns (CSP) [14], cross-correlation [15] and wavelet decomposition [16]. Proposed MI classification algorithms include machine learning algorithms such as naive Bayes classifiers [14], K nearest neighbors (KNN), support vector machines (SVM), linear discriminant analysis (LDA), decision trees [15, 16], ensemble classifiers [16], voting methods [15], neural networks [17] and convolutional neural networks [18, 19].

Despite the abundance of work on general MI classification, only a few studies focus specifically on MI classification with commercial EEG devices. Vamvakousis and Ramirez [20] studied whether the EPOC headset is suitable for detecting the presence or absence of imaginary and actual toe movement; however, the ability to differentiate between two imaginary movements was not studied. Fakhruzzaman et al. [21] also studied the suitability of the commercial headset for distinguishing MI from its absence. Their analysis, using the CSP algorithm available through the OpenViBE software [22], concludes that the low-cost device is not recommendable for BCI applications. The algorithmic components of these works are similar to those used with clinical grade EEG devices and therefore lack the robustness required in applications with commercial BCI. Takehara et al. [23] compared the headset and a medical grade device for the effectiveness of using signal power as a discriminant feature to classify MI and found the low-cost device to be satisfactory for BCI applications. Martinez-Leon et al. [24] compared the performance of an MI classification based on dFasArt models, proposed by Cano-Izquierdo et al. [25], on EEG data acquired from the headset against the BCI competition IV dataset and arrived at similar conclusions. A few studies have focused on optimizing MI classification algorithms to suit BCI implemented with commercial EEG devices. Hurtado-Rincon et al. [26] obtained a set of features used in MI classification from EEG recorded using the same device and proposed a feature relevance study to select the best discriminating features. Yang et al. [27] proposed a novel subject-specific channel selection method, based on a criterion derived from Fisher’s discriminant analysis, that can be applied with the small number of electrodes typically available in a commercial EEG device. Schiatti et al. [28] used a feature optimization algorithm to select band power features to classify MI.

The primary objective of this study is twofold. First, we focused on developing a robust classification algorithm, WaveCSP, that can differentiate MI using EEG acquired by low-cost commercial devices. WaveCSP incorporates wavelet transform and CSP algorithms to extract features from the mu-beta rhythm of EEG; the aim was to increase the number of features so as to capture intra-band time and frequency domain discrimination between the classes. We evaluated the performance of left hand vs right hand clenching MI classification using the proposed algorithm on EEG acquired with an EPOC EEG headset [29]. Second, we focused on the specific challenges associated with MI classification using EEG from a commercial device, including a limited number of electrodes, limited spatial distribution of electrodes, lower signal quality and subject variability. To evaluate the effect of these limitations in isolation, we also evaluated the algorithm on a publicly available MI database of EEG acquired using a medical grade device from 109 subjects.

Methods

Dataset

EEG data from Physionet

The publicly available EEG data were sourced from the Physionet MI dataset [30]. We used the left-hand versus right-hand MI paradigm, in which the subject imagines clenching his or her right or left fist according to a cue displayed on a screen. When the cue appears pointing towards either the left or the right side of the screen, the subject imagines opening and closing the corresponding fist until the cue disappears. The subject then relaxes for a random time TR (< 2 s) until another cue is displayed. Figure 1 (top) shows the timeline of a single trial of the experiment. The EEG recorded simultaneously consists of signals from 64 EEG channels, each sampled at 160 Hz and saved along with annotation markers for the display of right and left cues. Data were available for 109 volunteering subjects, each of whom performed a total of 45 left and right MI tasks in three experiments of 2 min each.

Fig. 1 Top: the timeline of a single trial of the MI experiment. Bottom: the cue displayed to instruct the subject to imagine right-hand clenching MI

Data from the commercial EEG headset

The EEG data were collected using the aforementioned EEG headset. We adopted the same experimental protocol under which the publicly available data were collected. We used the OpenViBE EEG platform for acquiring EEG, generating stimulations for cues, displaying cues, and recording signals and annotation markers as GDF files. The target cues for left and right MI were arrows pointing in the corresponding direction (Fig. 1, bottom). The EEG signals consisted of 14 EEG channels, each sampled at 128 Hz. Each of the 25 volunteering subjects performed a total of 46 left and right MI tasks. Five of the 25 subjects had previously participated in at least one BCI experiment; the rest had no previous experience with any type of BCI.

Analysis

The data collected from the two sources were used in five studies, as described in Table 1. The experimental procedure and the parameters used for training and testing the classifiers were kept constant across all studies. Based on these studies, five different analyses were undertaken to comparatively evaluate the performance of the proposed algorithm on medical grade and consumer grade EEG with their inherent limitations, as shown in Table 2. The stages of the WaveCSP algorithm are shown in Fig. 2.

Table 1 Description of the studies performed
Table 2 Description of the analysis of the studies performed
Fig. 2 The schematic representation of the WaveCSP algorithm

Algorithm

Pre-processing

The EEG acquired from the Physionet database [31] was clean and largely noise-free owing to the superior quality of the acquisition system; therefore, no filtering was applied to these signals. The signals recorded using the headset, however, were noisy, and a moving average filter of width \(w=10\) (Eq. 1) was used to smooth them, with the width chosen to optimize classification accuracy.

$$y[n] = \frac{1}{w} \sum_{i=0}^{w-1} x[n-i]$$
(1)
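A minimal NumPy sketch of this smoothing step is given below; the function name and the causal, zero-padded form at the start of the record are our assumptions, not necessarily the original implementation.

```python
import numpy as np

def moving_average(x, w=10):
    """Smooth a 1-D EEG channel with the width-w moving-average filter of Eq. 1."""
    kernel = np.ones(w) / w
    # Causal convolution: y[n] is the mean of the w most recent samples,
    # with implicit zero padding before the first sample.
    return np.convolve(x, kernel, mode="full")[: len(x)]
```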

The signals from both datasets were epoched into blocks of time series data using the event markers for each left-hand and right-hand MI trial, using the EEGLAB toolbox [32] available in MATLAB.
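The original epoching was done with EEGLAB in MATLAB; the NumPy sketch below only illustrates the equivalent slicing, assuming the event markers have already been converted to sample indices.

```python
import numpy as np

def epoch_signals(eeg, cue_samples, epoch_len):
    """Cut a continuous (channels x samples) recording into fixed-length MI epochs.

    cue_samples : sample indices at which left/right cues appeared
    epoch_len   : samples per epoch (cue duration multiplied by the sampling rate)
    Returns an array of shape (n_trials, n_channels, epoch_len).
    """
    return np.stack([eeg[:, s : s + epoch_len] for s in cue_samples])
```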

Frequency filter

Mu and beta rhythms, defined as the EEG signal components confined to the ranges 7.5–12.5 Hz and 16–31 Hz respectively, are well documented as corresponding to the motor activity of the brain. Therefore, we extracted the signal components \(y[n]\) belonging to the frequency range 7.5–31 Hz from the raw EEG signals \(x[n]\) using a Hamming-windowed sinc filter \(f[n]\), as shown in Eq. (2), where \(*\) denotes the convolution operation. The filter order was chosen automatically by the EEGLAB implementation of the filter.

$$y[n] = f[n] * x[n]$$
(2)
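As an illustration of Eq. (2), the SciPy sketch below builds a Hamming-windowed sinc FIR band-pass filter for the 7.5–31 Hz range and applies it by convolution; the filter order shown is arbitrary, since in the actual pipeline EEGLAB selects it automatically.

```python
import numpy as np
from scipy.signal import firwin, lfilter

def mu_beta_filter(x, fs, numtaps=129):
    """Extract the combined mu-beta band (7.5-31 Hz) from (channels x samples) EEG.

    fs      : sampling rate in Hz (160 for Physionet, 128 for the headset)
    numtaps : FIR filter length; illustrative only, not the EEGLAB-chosen order
    """
    f = firwin(numtaps, [7.5, 31.0], window="hamming", pass_zero=False, fs=fs)
    return lfilter(f, 1.0, x, axis=-1)  # y[n] = (f * x)[n] as in Eq. 2
```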

Wavelet decomposition

The combined mu-beta component of each signal was decomposed into three wavelet components using the discrete wavelet transform (DWT) with the Haar mother wavelet. The objective of applying the wavelet transform was to capture intra-band discriminators between the two classes that span both the time and frequency domains. Each of the 64 (or 14) channels was decomposed into three components, so that 192 (or 42) signals were available for the next stage.
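One possible reading of this step, sketched below with the PyWavelets package, is a two-level Haar DWT, which yields exactly three components per channel (one approximation and two details); the paper does not state the decomposition level, so the level used here is an assumption.

```python
import numpy as np
import pywt

def wavelet_components(epoch, wavelet="haar", level=2):
    """Decompose each channel of a (channels x samples) epoch into three DWT components.

    A two-level decomposition returns [cA2, cD2, cD1] per channel; collecting the
    k-th component across all channels gives the three multi-channel signals that
    later feed the three CSP filters.
    """
    per_channel = [pywt.wavedec(ch, wavelet, level=level) for ch in epoch]
    return [np.stack([coeffs[k] for coeffs in per_channel]) for k in range(level + 1)]
```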

CSP filter training

Each of the three sets of components (with 64 or 14 channels each) resulting from wavelet decomposition was used to train a CSP filter. CSP is a spatial filtering technique that maximizes the ratio of variance between two sets of time series signals.

Consider XL and XR as the two sets of signals belonging to the two classes (left hand vs right hand) of MI. Each of XL and XR consists of n = 64 or 14 signals of length T samples, arranged as a matrix of dimension n × T. Here T was determined by the sampling rate of the recorded EEG and the time interval for which the cue arrow was displayed. The goal of CSP filter training is to obtain a spatial filter W such that the ratio of variance between the signals resulting from multiplying the original signals by the filter is maximized:

$$W = \arg\max_{W} \frac{\lVert W X_L \rVert^2}{\lVert W X_R \rVert^2}$$
(3)

The solution to this optimization problem can be obtained by simultaneous diagonalization of the averaged normalized spatial covariances of the two matrices \(X_L\) and \(X_R\). The normalized spatial covariances \(R_L\) and \(R_R\) of the two sets of signals are calculated as follows:

$$R_L = \frac{X_L X_L^T}{\operatorname{trace}(X_L X_L^T)}, \qquad R_R = \frac{X_R X_R^T}{\operatorname{trace}(X_R X_R^T)}$$
(4)

The averages of \(R_L\) and \(R_R\) are computed over the 23 (or 22) trials of left and right imagery for each subject to produce the average normalized spatial covariance matrices \(\overline{R_L}\) and \(\overline{R_R}\). Simultaneous diagonalization of \(\overline{R_L}\) and \(\overline{R_R}\) is equivalent to the eigendecomposition of \(\overline{R_L}^{-1}\overline{R_R}\):

$$\overline{R_L}^{-1}\overline{R_R} = P D P^{-1}$$
(5)

Here \(D\) is the diagonal matrix of eigenvalues and the columns of \(P\) are the corresponding eigenvectors. When the columns of \(P\) are sorted in descending order of the corresponding eigenvalues and the result is transposed, we obtain \(n\) spatial filters in the \(n\) rows of the resulting matrix \(V\):

$$V = \operatorname{sort}(P)^T$$
(6)

We define the spatial filter matrix \(W\) of dimension \(2m \times n\) by selecting the first \(m\) and last \(m\) \((m < n)\) rows of \(V\). Using \(W\) we obtain \(2m\) spatially filtered signals that maximize the variance between the two classes, represented by the rows of \(F_L\) and \(F_R\), where:

$$F_L = W X_L, \qquad F_R = W X_R$$
(7)
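The NumPy sketch below condenses Eqs. (3)–(7) into one training routine; it is our reconstruction of the described procedure under the definitions above, not the authors' code.

```python
import numpy as np

def train_csp(trials_L, trials_R, m=2):
    """Train a CSP filter W (2m x n_channels) from two lists of (channels x samples) trials."""
    def avg_norm_cov(trials):
        covs = []
        for X in trials:
            C = X @ X.T
            covs.append(C / np.trace(C))                 # Eq. 4: normalized spatial covariance
        return np.mean(covs, axis=0)                     # average over a subject's trials

    R_L, R_R = avg_norm_cov(trials_L), avg_norm_cov(trials_R)
    evals, P = np.linalg.eig(np.linalg.inv(R_L) @ R_R)   # Eq. 5: eigendecomposition
    order = np.argsort(evals.real)[::-1]                 # descending eigenvalues
    V = P[:, order].real.T                               # Eq. 6: filters in rows
    return np.vstack([V[:m], V[-m:]])                    # first m and last m rows -> W
```

Applying the returned filter to a trial (Eq. 7) is then a single matrix product, `F = W @ X`.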

Feature extraction

Three spatial filters \(W_1\), \(W_2\) and \(W_3\) were obtained, one for each of the three wavelet decomposition components from the previous step. We selected \(m=2\) after evaluating the performance of the algorithm for \(m = 1, 2\) and \(3\). The resulting filtered signals \(F_1\), \(F_2\) and \(F_3\) contained four filtered signals each. Two sets of features, \(f_A\) and \(f_B\), were extracted from each of the signals to produce a total of 24 features for each EEG epoch (Table 3). Here \(F_{i,j}\) denotes the \(j^{th}\) row of \(F_i\):

$$\begin{gathered} f_A(i,j) = \log\left(\frac{\operatorname{var}(F_{i,j})}{\sum_{j=1}^{4}\operatorname{var}(F_{i,j})}\right) \\ f_B(i,j) = \sum_{k=1}^{l} F_{i,j}[k], \quad \text{where } l = \text{length of } F_{i,j} \\ \text{for } i = 1, 2, 3 \text{ and } j = 1, 2, 3, 4 \end{gathered}$$
(8)
Table 3 Feature vector used for machine learning
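The sketch below shows how the 24-element feature vector of Eq. (8) can be assembled, assuming the three wavelet components and the three trained CSP filters from the previous steps are available.

```python
import numpy as np

def wavecsp_features(components, csp_filters):
    """Build the 24-element feature vector of one epoch (Eq. 8, Table 3).

    components  : the three (channels x samples) wavelet components of the epoch
    csp_filters : the corresponding CSP matrices W_1, W_2, W_3 (each 2m x channels, m = 2)
    """
    features = []
    for X, W in zip(components, csp_filters):
        F = W @ X                                # four spatially filtered signals (Eq. 7)
        v = np.var(F, axis=1)
        features.extend(np.log(v / v.sum()))     # f_A: normalized log-variance
        features.extend(F.sum(axis=1))           # f_B: sum of samples
    return np.asarray(features)                  # 3 components x 4 signals x 2 = 24
```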

Classifier

Three commonly used machine learning algorithms, a linear SVM classifier, an LDA classifier and a KNN classifier, were tested as the machine learning framework. All three have been widely used in binary classification of MI. A linear SVM classifier separates two classes of training data points by maximizing the margin between the decision boundaries; new predictions are then made by assigning a test data point to a category based on the side of the boundary on which it falls. LDA assigns a class to a test data point based on whether it satisfies a criterion derived as a function of a linear combination of the training data points. In KNN, a test data point is assigned to the class most common among its K nearest neighbors, with Euclidean distance measured in the multidimensional feature space.

The model parameters for the classifiers were chosen heuristically to optimize performance in preliminary experiments. Based on those results, we selected a MATLAB implementation of linear SVM that automatically selects a suitable scale factor by subsampling the input data; the soft margin parameter was set to 1. LDA was implemented in MATLAB using uniform prior probabilities and the same diagonal covariance matrix for both classes. KNN was implemented with K = 1 neighbor and uniformly weighted Euclidean distance. All three algorithms standardize the predictor variables by the corresponding mean and standard deviation.
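The classifiers were implemented in MATLAB; the scikit-learn configuration below is only an approximate equivalent (scikit-learn has no direct counterpart of MATLAB's automatic kernel scale selection or diagonal-covariance LDA), with standardization of the 24 features folded into each pipeline.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

# Approximate counterparts of the three classifiers described above.
classifiers = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0)),
    "LDA": make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(priors=[0.5, 0.5])),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=1)),
}
```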

Performance evaluation

For each subject, 45 trials of either left or right MI were available. Fivefold cross-validation was performed by dividing the 45 trials into five groups of 9 trials each and using a combination of four of them as the training set and the remaining group as the testing set in each of the five folds. The classifiers were trained using the feature vector of 24 features obtained from each trial.
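A compact way to reproduce this per-subject evaluation with scikit-learn is shown below; the use of stratified folds is our choice, since the paper only specifies five groups of nine trials.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score

def subject_accuracy(clf, X, y):
    """Fivefold cross-validated accuracy (%) for one subject.

    X : (45, 24) matrix of WaveCSP features, y : 45 left/right labels.
    """
    cv = StratifiedKFold(n_splits=5, shuffle=False)
    return 100 * np.mean(cross_val_score(clf, X, y, cv=cv, scoring="accuracy"))
```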

The classifier outputs were compared with the class labels of the testing trials, and the accuracy of prediction was evaluated as the percentage of correctly classified trials out of the total number of trials conducted for the subject. In contrast to other physiological signals such as ECG, EEG signals depend on the subject’s thought process. Termed BCI literacy, the degree to which one can successfully communicate through BCI differs from person to person [33]. In particular, BCI illiteracy in MI tasks is known as “motor imagery inability”, the inability of some people to elicit imaginary movements [34]. Such subjects are unable to perform imaginary motor activity even if they perfectly understand the instructions and fully intend to engage in the experiment. Therefore, the mean accuracy over all subjects may not accurately describe the true performance of the classification algorithm. To address this issue, we adopted a measure proposed by Muller-Putz et al. [35] to identify subjects for whom the algorithm performed significantly better than “random guessing”. The random process of classifying a signal into either left or right MI can be modeled as a binomial process with expected value 50%. The upper bound of the 95% confidence interval of this expected value indicates a percentage above which a classifier can be considered to have performed significantly better than random guessing; according to the method of Muller-Putz et al. [35], for 45 trials this upper bound is 64%. We defined the metrics described in Table 4 to evaluate the performance of the algorithm.

Table 4 Definition of performance metrics
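The 64% significance level can be reproduced with the normal approximation to the binomial confidence interval used by Muller-Putz et al. [35]; the short sketch below is a restatement of that calculation, not a new method.

```python
from math import sqrt
from scipy.stats import norm

def significance_threshold(n_trials, alpha=0.05, p_chance=0.5):
    """Upper bound of the (1 - alpha) confidence interval around chance-level accuracy.

    For n_trials = 45 this evaluates to roughly 0.646, i.e. the 64% level used to
    decide whether a subject's accuracy is significantly above random guessing.
    """
    z = norm.ppf(1 - alpha / 2)                       # 1.96 for a 95% interval
    return p_chance + z * sqrt(p_chance * (1 - p_chance) / n_trials)
```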

We compared the performance of the proposed algorithm against previously reported algorithms for data obtained using a medical grade EEG device (study 1) and for data obtained using a commercial EEG device (study 3). In study 1, the comparison was against two recent works based on the SUTCCSP algorithm [36, 37] that were evaluated on the Physionet MI dataset, which was acquired using a medical grade EEG device. In study 3a, the comparison was against two previously reported works in which the same commercial headset was used to classify left vs right MI using CSP features [21] and band power features [28], respectively. We limited the comparison to studies that used the same device (commercial or medical grade) and the same MI classification (left vs right hand) because of performance variability across devices and EEG paradigms.

Results

Study 1 was performed for the three classifiers using data from all 109 subjects of the Physionet dataset. All three classifiers performed nearly equally, with KNN slightly leading the others in terms of SPP (Table 5). To evaluate the performance of the algorithm independent of the limitations imposed by the headset, we compared our results with two recently published works that used the same medical grade EEG dataset. Both algorithms use EEG features extracted with the strong-uncorrelating transform complex CSP (SUTCCSP), an extension of the CSP method for the analysis of pair-wise MI data. Kim et al. [37] show a superior SA but a low SPP, resulting in an overall loss in mean accuracy. Park et al. [36] reported slightly lower SA and slightly higher SPP than the proposed method; however, they excluded data from four subjects citing poor signal quality, and they reported neither the overall mean accuracy nor the mean accuracy of subjects who performed below the significant accuracy level.

Table 5 Performance evaluation in the studies conducted

In study 2, when only the 14 channels of the medical grade EEG data corresponding to the electrode positions available in the commercial headset were used, a clear reduction in performance was observed. Interestingly, unlike in study 1, SVM outperformed the other two algorithms.

In study 3a, data collected from 25 subjects using the headset were used to evaluate the performance of the algorithm, and all three classifiers performed poorly. SVM showed the best SA, reaching the significant accuracy level for 4 subjects out of 25. In comparison, Fakhruzzaman et al. [21] used online processing with an OpenViBE implementation of the CSP algorithm. Following CSP filtering, the signal is filtered in the alpha/beta [8 30] Hz range, split into blocks of 1 s every 16th second, and the logarithmic band power is computed as features for an LDA classifier; the performance was evaluated on a single subject. Schiatti et al. [28] used band power features extracted from all 14 electrodes available in the commercial headset. These features covered five frequency bands [Mu = (8–13) Hz, B1 = (13–18) Hz, B2 = (18–23) Hz, B3 = (23–28) Hz, and B = (13–30) Hz] extracted from three time windows [TW1 = (0–2) s, TW2 = (1–3) s, and TW3 = (2–4) s] after the stimulus presentation. Mutual information-based feature selection was used to select optimal features and subsequently an SVM classifier was trained. While their results are slightly better than those of the proposed algorithm, they evaluated the performance on only three subjects.

In study 3b, headset data with all 14 electrodes in the regular position from only the five subjects with prior BCI experience were used. A significant improvement in all three performance metrics was observed. The SVM classifier showed significant performance for four of the five subjects, while LDA and KNN each showed significant performance for three of the five.

Study 4 was performed to investigate the effect of the electrode distribution of the commercial headset. As shown in Fig. 3, in its usual position the device does not have electrodes over the motor cortex. Therefore, this study was performed with the headset positioned backwards from its usual position such that the frontal sensors align with the motor and premotor cortices. We used the data from the subjects who had prior BCI experience in this study because of the very poor performance of the other subjects. Better signal acquisition from the motor cortex, and hence better performance, was expected. However, the mean accuracy and SPP decreased considerably while SA increased considerably.

Fig. 3 The recommended positioning of electrodes in the EPOC headset and the region of the motor cortex mapped over the scalp

Discussion

The performance of an EEG classifier depends heavily on the degree to which the subject was able to accurately perform the mental activity studied, in addition to the inherent differences between subjects in the electrical activity of the brain. Therefore, a high SPP is important for a classifier intended for consumer-level BCI. It is widely accepted that the MI BCI paradigm requires more training for successful use. In analysis 1, the results from studies 3a and 3b clearly show that subjects who had prior experience with BCI (four had performed similar MI experiments, one had experienced an SSVEP experiment) perform better. With four out of five such subjects achieving significant performance, WaveCSP-SVM shows the best performance for subjects who have some prior BCI experience.

We performed analysis 2 by comparing the results of study 1 and study 2. A clear reduction was observed in all the performance metrics. Therefore, it can be concluded that, independent of signal quality, the limited number of electrodes available in the headset reduces the performance of MI classification. Since increasing the number of electrodes is not a feasible option for an easy-to-wear headset, our study instead demonstrates increasing the number of classifier features in a non-redundant fashion.

When the EEG headset is positioned such that the frontal sensors align with the mid-region of the scalp, motor cortex activity should be more pronounced in the recorded EEG, resulting in better classification accuracy. In analysis 3 we compared the performance of the classifier on data obtained with the headset in the usual position versus the modified position. However, due to the physical design of the headset, the reference electrodes lose contact with the scalp when moved from the ordinary position. This effect varies with the shape and size of the subject’s head, as the device has been designed as a one-size-fits-all headset. In study 4, the WaveCSP-SVM performance of 89% SA and 20% SPP implies that there was one subject out of the five for whom the classifier worked with very high accuracy, possibly because the modified position of the headset fit well on that subject’s scalp.

EEG acquired from the commercial headset is more susceptible to noise and artifacts since the device is typically worn over hair and is not tightly fixed. In analysis 4, comparing study 2 and study 3a, we see a general decrease in all performance metrics when the commercial EEG headset is used instead of the medical grade EEG device. The significant drop in SPP with the commercial headset may be attributed to its lower signal quality, providing evidence that the signal quality of the commercial device affects classification accuracy. It can also be observed that, when the medical grade device with better signal quality is used, subjects separate more clearly: those above the 64% threshold achieve considerably high classification accuracy, while those below it perform considerably poorly.

An overall drop in performance is observed when the commercial headset is compared with the medical grade device in analysis 5. In comparison with previous work on the commercial device, WaveCSP-SVM has better performance with respect to mean accuracy and significant accuracy. Meaningful interpretation of SA and SPP requires a large set of test subjects.

The MI classification problem is solved in the twofold process of feature extraction and classification. Many classification algorithms have been shown to work well with EEG signals; the major bottleneck lies in feature extraction. While the use of deep learning instead of handcrafted features could be a solution to this problem, its suitability for MI classification should be further investigated. Existing work using neural networks and convolutional neural networks does not produce significantly better results than the classical methods; the sparse nature of MI data could be a reason for this observation.

Conclusion

This study proposes a robust MI classifier that attempts to address the limitations posed by a commercial EEG acquisition device. A comparative analysis of the performance of the proposed algorithm was undertaken for different configurations of data acquired from a commercial device and data acquired from a medical grade device. Subjects who had prior experience in BCI performed significantly better than those who did not have any BCI experience. The use of a smaller number of electrodes reduced the performance irrespective of signal quality and other factors. Conclusive evidence was not found for the hypothesis that positioning the EEG electrodes closer to the motor cortex results in better classification performance, and practical limitations of such modified electrode placement were identified. The quality of the acquired EEG signal was found to affect the performance of MI classification. Further, EEG from the medical grade device better discriminates between subjects with significant and poor performance. The variability of subjects and their prior experience and training in using BCI was identified as a significant aspect of MI classification in this work. We collected data from 25 subjects for this study due to the lack of a large EEG dataset acquired with the headset. Future work should focus on collecting a dataset with more subjects using a commercial EEG device; the relationship between levels of training and experience in BCI and MI performance could then be explored in detail.