1 Introduction

An electroencephalogram (EEG) based brain-computer interface (BCI) noninvasively measures the electrical activity of the brain and converts it into control commands for external devices. It gives individuals with severe motor disabilities an alternative mode of control and communication with the external world, for example through a computer cursor [1], virtual keyboards [2], movement in virtual reality [3] or limb prosthetics [4, 5], without using the normal neuromuscular pathways of the body.

An EEG based BCI is a signal processing and pattern recognition system that decodes mental activity encoded in the electrical activity of the brain. It typically consists of a preprocessing module, a feature extraction module, a dimensionality reduction module and a classifier [6]. Once the EEG signals are captured, the preprocessing module reduces noise and artifacts. The feature extraction module then extracts representative feature vectors that carry discriminative information about each class. The dimensionality reduction module reduces the dimension of the feature vector so that the most discriminative information is retained in a lower dimension, which helps to improve detection accuracy. The reduced feature vector is fed into the classifier, which translates it into control commands for the external devices. To operate such a BCI system, the subject has to engage in active or passive mental activity. The changes in EEG corresponding to this activity are decoded by the BCI system and converted into control commands for external devices [7]. One widely used mental strategy is motor imagery (MI) of certain tasks or activities.

MI based BCI is gaining popularity because paralyzed subjects can use it as an alternative method to generate changes in neuronal circuits through their motor intent rather than through their restricted ability to perform physical movements [8]. While the majority of these studies focus on classifying motor imagery of hand or upper limb movements, MI based BCI for lower limb rehabilitation has only recently gained attention. In [9], kinesthetic walking motor imagery of the lower limb is used to control an avatar in a virtual reality environment. In [10], right and left foot motor imagery is detected from EEG signals using beta rebound. In [11], a joint channel and frequency selection method is proposed to distinguish walking motor imagery of the lower limb from idling activity. Given the smaller foot representation area in the brain [12] compared to that of the upper limb, and the difficulty subjects have in using an MI based BCI, the detection rates of lower limb motor imagery are low compared to those of the upper limb. Hence, efforts are needed to improve the classification performance for lower limb motor imagery.

To improve the detection of motor intent from EEG, the features must discriminate well between the classes of the detection problem. Band power (BP) features, Adaptive Autoregressive (AAR) parameters and Hjorth (HJ) parameters have been used successfully in the design of BCIs for classifying various motor tasks [13, 14]. A major problem in feature based BCI systems is the high dimensionality of the feature vector fed into the classifier, which increases computational complexity and degrades the performance of the system. Dimensionality reduction techniques are therefore used to reduce the dimension of the feature vector and to derive the most discriminative features for a particular task.

Dimensionality reduction techniques [15] are of two basic types: feature selection (FS) and feature transformation (FT). In FS, the best feature subset is selected from the existing set of features without changing the original representation of the features, whereas in FT the number of features is reduced by projecting the data points from a higher dimensional space to a lower dimensional space. Unlike FS techniques, FT techniques do not discard any features. Since BCI systems have to be adapted to each subject individually, selecting only a subset of features for designing the system does not prove beneficial for the detection problem [16].

Therefore, in this work we explore the discriminative power of features widely used in BCI research, namely BP, AAR and HJ, in representing MI of brisk walking. We also evaluate the efficacy of several FT techniques, namely Principal Component Analysis (PCA) [17], Locality Preserving Projections (LPP) [18] and Local Fisher Discriminant Analysis (LFDA) [19], in improving the detection of brisk walking motor imagery. The features are first extracted from the brain signals and then classified with and without FT techniques to test the role of FT in improving detection.

2 Materials and Methods

2.1 Subjects

Five healthy male subjects (22–28 years of age) participated in the study. The subjects had no history of neuromuscular disorders and had never participated in a BCI study before. The study was approved by the Institute Ethical Committee (IEC), Indian Institute of Technology (IIT) Kharagpur, India. Informed consent was obtained from the subjects prior to the study.

2.2 Experimental Setup and Data Collection

During the experiment, subjects are seated in a comfortable armchair at a distance of 1.5 m from a 19 in. monitor on which visual cues are shown; the subjects perform brisk walking motor imagery according to these cues. The timing protocol of the experiment is shown in Fig. 1. Each trial begins with a cross at the center of the screen. At 2 s, an auditory cue in the form of a beep announces the visual cue for the motor imagery or idle task, which appears on the screen at 3 s. The cue lasts for 4 s and is followed by a blank screen for 4 s plus an additional random period of 1–2 s between trials to avoid adaptation. A total of 5 runs of 40 trials each (20 per class, imagery and idle) are collected from each subject.

Fig. 1. The timing diagram of the experimental protocol

EEG is recorded from 7 Ag/AgCl scalp electrodes placed over the sensorimotor region, namely at FCz, C3, C1, Cz, C2, C4 and CPz. The signals are acquired with a Flexcomp Infinity encoder (Thought Technology, Canada) at a sampling frequency of 2048 Hz and notch filtered at 50 Hz to remove power line interference.

2.3 Preprocessing and Feature Extraction

The EEG signals are then band-pass filtered in the range 0.5–30 Hz to reduce artifacts and downsampled to 256 Hz for further processing. Epochs corresponding to motor imagery and rest are then extracted from the data. Band power features [20] in the alpha (8–12 Hz) and beta (16–24 Hz) bands, Hjorth parameters [21] and Adaptive Autoregressive (AAR) coefficients [22] are extracted in one-second windows from each trial. The extracted features are averaged over all windows to obtain the final feature vector per trial, which contains the features from all channels. The flowchart of the methodology followed in this study is shown in Fig. 2.
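The windowed feature extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the band power is estimated with a plain periodogram, the one-second windows are taken as non-overlapping, AAR coefficients are omitted, and the function names `band_power`, `hjorth` and `trial_features` are hypothetical.

```python
import numpy as np

FS = 256  # sampling rate after downsampling (Hz), as stated in the text


def band_power(window, fs, low, high):
    """Mean periodogram power of a 1-D window within [low, high] Hz.

    Assumption: a plain periodogram; the text does not name the estimator.
    """
    freqs = np.fft.rfftfreq(window.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(window)) ** 2 / (fs * window.size)
    mask = (freqs >= low) & (freqs <= high)
    return psd[mask].mean()


def hjorth(window):
    """Hjorth activity, mobility and complexity of a 1-D window."""
    d1 = np.diff(window)  # first difference approximates the derivative
    d2 = np.diff(d1)      # second difference
    activity = np.var(window)
    mobility = np.sqrt(np.var(d1) / activity)
    complexity = np.sqrt(np.var(d2) / np.var(d1)) / mobility
    return activity, mobility, complexity


def trial_features(trial, fs=FS):
    """trial: array (n_channels, n_samples). Returns one feature vector.

    Features are computed per channel in one-second windows and then
    averaged over the windows (assumed non-overlapping here).
    """
    n_win = trial.shape[1] // fs
    per_window = []
    for w in range(n_win):
        seg = trial[:, w * fs:(w + 1) * fs]
        feats = []
        for ch in seg:
            feats.append(band_power(ch, fs, 8, 12))   # alpha band
            feats.append(band_power(ch, fs, 16, 24))  # beta band
            feats.extend(hjorth(ch))                  # three Hjorth parameters
        per_window.append(feats)
    return np.mean(per_window, axis=0)


# Synthetic 4 s trial with 7 channels, used only to show the output shape.
demo = np.random.default_rng(0).normal(size=(7, 4 * FS))
print(trial_features(demo).shape)
```

With 7 channels and 5 features per channel, each trial yields a 35-dimensional vector in this sketch; adding AAR coefficients would increase the dimension further.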

Fig. 2. Flowchart of the methodology followed to process the EEG data

Fig. 3. 10 fold cross-validation accuracy versus reduced dimension for subjects 1 to 5 as shown in subplots (a)–(e) respectively

Feature Transformation.

Feature transformation is a type of dimensionality reduction technique in which data points in the original feature space are projected into a lower dimensional feature space by combining or fusing features. FS techniques, in contrast, retain only a subset of features from the original feature space. Since BCI systems are adapted to each subject individually, selecting only a subset of features for designing the system does not prove beneficial for the detection problem [16]. The utility of different FT techniques in improving the detection of brisk walking motor imagery is therefore explored.

If \( X = [\varvec{x}_{1} ,\varvec{x}_{2} , \ldots ,\varvec{x}_{N} ] \) represents the \( D \times N \) feature matrix, where each column \( \varvec{x}_{i} \) is a \( D \) dimensional feature vector representing motor imagery or rest, then an FT technique maps \( X \) into \( Z \) with reduced dimensionality \( \ell \), assuming that the dataset \( X \) has an intrinsic dimensionality \( \ell \) embedded in the original \( D \) dimensional space [3, 13]. The samples in the subspace are obtained by transforming the original feature vectors with a projection (feature transformation) matrix:

$$ Z = P^{T} X $$
(1)

where \( P \) is the projection matrix of size \( D \times \ell \). The linear FT methods studied here are PCA, LPP and LFDA.

Principal Component Analysis (PCA).

PCA is a popular nonparametric, unsupervised feature reduction technique. It finds a linear lower dimensional representation of the original dataset such that its maximal variability is preserved [17, 23]. The subspace is spanned by orthonormal vectors that form the new coordinate axes for the projected dataset. The feature transformation matrix for PCA is obtained by solving the maximization problem

$$ P_{PCA} = \mathop {\arg \max }\limits_{{P \in {\mathbb{R}}^{D \times \ell } }} [\operatorname{trace}(P^{T} CP)]\quad {\text{subject}}\;{\text{to}}\quad P^{T} P = I $$
(2)

where \( C \) is the sample covariance matrix of the feature matrix \( X \). The projection matrix \( P_{PCA} \) retains the maximal variance of the original dataset. It is obtained from the eigenvalue decomposition of \( C \): its columns are the \( \ell \) eigenvectors corresponding to the \( \ell \) largest eigenvalues.
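A minimal NumPy sketch of this eigendecomposition route is given below. It assumes the \( D \times N \) column layout of the feature matrix used in the text; the function name `pca_projection` and the synthetic data are illustrative only.

```python
import numpy as np


def pca_projection(X, ell):
    """Return the D x ell PCA projection matrix for a D x N feature matrix."""
    Xc = X - X.mean(axis=1, keepdims=True)   # centre each feature
    C = Xc @ Xc.T / (X.shape[1] - 1)         # sample covariance, D x D
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:ell]  # indices of the ell largest
    return eigvecs[:, order]


# Synthetic data: the first feature has a much larger variance than the rest.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 200)) * np.array([[5.0], [1.0], [1.0], [1.0], [1.0]])
P = pca_projection(X, 2)
Z = P.T @ X  # projected samples, as in Eq. (1)
print(Z.shape)
```

Because `eigh` returns orthonormal eigenvectors for a symmetric matrix, the columns of `P` automatically form an orthonormal basis of the subspace.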

Locality Preserving Projection (LPP).

Locality preserving projection, proposed by He and Niyogi [18], finds a linear low dimensional representation of the original dataset that preserves its local neighbourhood structure. The transformation finds a subspace in which data points that are close in the original dataset remain close in the reduced space, and vice versa for distant points. The adjacency of the data points is defined by the \( k \)-nearest neighbours; in this study \( k = 7 \) is used [24]. The transformation matrix \( {\mathbf{P}}_{LPP} \) is then obtained by solving the minimization problem

$$ \begin{aligned} {\mathbf{P}}_{LPP} & = \mathop {\arg \min }\limits_{{P \in {\mathbb{R}}^{D \times \ell } }} \left[ {\frac{1}{2}\sum\limits_{i,j = 1}^{N} {\left\| {P^{T} {\mathbf{x}}_{i} - P^{T} {\mathbf{x}}_{j} } \right\|^{2} W_{i,j} } } \right] \\ & = \mathop {\arg \min }\limits_{{P \in {\mathbb{R}}^{D \times \ell } }} \operatorname{trace}(P^{T} XLX^{T} P) \\ & \quad {\text{subject}}\;{\text{to}}\quad P^{T} XDX^{T} P = I \\ \end{aligned} $$
(3)

where \( x_{i} \) and \( x_{j} \) are the \( i^{th} \) and \( j^{th} \) feature vectors. The element \( W_{i,j} \) of the affinity matrix \( W \) gives the affinity between \( x_{i} \) and \( x_{j} \), and lies in the range \( [0,1] \). Affinity values are higher for feature vectors \( x_{i} ,x_{j} \) that are close in the feature space and lower for those that are far apart. In Eq. (3), \( D \) denotes the diagonal matrix with entries \( D_{ii} = \sum\nolimits_{j} {W_{i,j} } \) (not the feature dimension), and \( L = D - W \) is the graph Laplacian matrix.
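The optimization in Eq. (3) reduces to a generalized eigenvalue problem, which can be sketched in NumPy as follows. The heat-kernel affinity, its bandwidth `t`, the symmetrised neighbour graph and the small ridge term `reg` are assumptions made for this illustration; the text specifies only \( k = 7 \) and affinities in \( [0,1] \). The function name `lpp_projection` is hypothetical.

```python
import numpy as np


def lpp_projection(X, ell, k=7, reg=1e-6):
    """Return a D x ell LPP projection matrix for a D x N feature matrix."""
    n_feat, n = X.shape
    sq = np.sum(X ** 2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X.T @ X, 0.0)

    # k-nearest-neighbour graph (column 0 of the sort is the point itself),
    # symmetrised so that an edge exists if either point selects the other.
    idx = np.argsort(dist2, axis=1)[:, 1:k + 1]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), idx.ravel()] = True
    adj |= adj.T

    # Heat-kernel affinities: close neighbours get values near 1.
    t = np.mean(dist2[adj])  # assumed bandwidth: mean squared edge length
    W = np.where(adj, np.exp(-dist2 / t), 0.0)
    deg = np.diag(W.sum(axis=1))  # diagonal matrix D of Eq. (3)
    lap = deg - W                 # graph Laplacian L

    A = X @ lap @ X.T
    B = X @ deg @ X.T + reg * np.eye(n_feat)  # ridge keeps B positive definite

    # Solve A p = lambda B p by whitening with the Cholesky factor of B.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    vals, vecs = np.linalg.eigh((M + M.T) / 2)  # ascending eigenvalues
    return C_inv.T @ vecs[:, :ell]              # ell smallest eigenvalues


X = np.random.default_rng(2).normal(size=(4, 60))
P = lpp_projection(X, 2)
print(P.shape)
```

The whitening step makes the returned columns satisfy the constraint of Eq. (3) with respect to the regularised matrix `B`, so no separate normalisation is needed.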

Local Fisher Discriminant Analysis (LFDA).

LFDA combines the advantages of LPP and Fisher Discriminant Analysis (FDA) by defining the between-class and within-class scatter matrices in a local manner [19]. As in LPP, the scatter matrices are weighted by an affinity based on the distance between a data point and its \( k \) nearest neighbours. In this way LFDA preserves the within-class local structure of the data in the reduced space while ensuring maximum separation between samples from different classes. These modified scatter matrices are used to define the Fisher ratio, and the transformation matrix of LFDA is obtained by maximizing the Fisher ratio as in FDA:

$$ P_{LFDA} = \mathop {\arg \max }\limits_{{P \in {\mathbb{R}}^{D \times \ell } }} [\operatorname{trace}((P^{T} \tilde{S}_{w} P)^{ - 1} (P^{T} \tilde{S}_{B} P))] $$
(4)

where \( \tilde{S}_{B} \) and \( \tilde{S}_{w} \) are the local between-class and within-class scatter matrices respectively. The solution is obtained by solving the generalized eigenvalue problem \( \tilde{S}_{B} \varvec{p} = \lambda \tilde{S}_{w} \varvec{p} \). The transformation matrix is formed from the \( \ell \) eigenvectors corresponding to the \( \ell \) largest eigenvalues.
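The following NumPy sketch illustrates one way to build the local scatter matrices and solve the generalized eigenvalue problem. It follows the commonly used local-scaling formulation of LFDA as an assumption: affinities are computed within each class and scaled by the distance to the \( k \)-th nearest same-class neighbour. The ridge term `reg` and the function name `lfda_projection` are illustrative choices, and the text does not confirm these details.

```python
import numpy as np


def lfda_projection(X, y, ell, k=7, reg=1e-6):
    """Return a D x ell LFDA projection for X (D x N) and labels y (N,)."""
    n_feat, n = X.shape
    sq = np.sum(X ** 2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X.T @ X, 0.0)

    Ww = np.zeros((n, n))        # within-class weights
    Wb = np.full((n, n), 1.0 / n)  # between-class weights, 1/N across classes
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        n_c = idx.size
        d2 = dist2[np.ix_(idx, idx)]
        # Local scale: distance to the k-th nearest neighbour of the same class
        # (index 0 of the sorted row is the point itself).
        kk = min(k, n_c - 1)
        sigma = np.sqrt(np.sort(d2, axis=1)[:, kk])
        A = np.exp(-d2 / (np.outer(sigma, sigma) + 1e-12))
        Ww[np.ix_(idx, idx)] = A / n_c
        Wb[np.ix_(idx, idx)] = A * (1.0 / n - 1.0 / n_c)

    def scatter(W):
        # Equals 0.5 * sum_ij W_ij (x_i - x_j)(x_i - x_j)^T for symmetric W.
        return X @ (np.diag(W.sum(axis=1)) - W) @ X.T

    Sw = scatter(Ww) + reg * np.eye(n_feat)  # ridge keeps Sw positive definite
    Sb = scatter(Wb)

    # Solve Sb p = lambda Sw p by whitening with the Cholesky factor of Sw.
    C = np.linalg.cholesky(Sw)
    C_inv = np.linalg.inv(C)
    M = C_inv @ Sb @ C_inv.T
    vals, vecs = np.linalg.eigh((M + M.T) / 2)
    top = np.argsort(vals)[::-1][:ell]       # ell largest eigenvalues
    return C_inv.T @ vecs[:, top]


# Synthetic two-class data separated along the first feature.
rng = np.random.default_rng(3)
y = np.repeat([0, 1], 40)
X = rng.normal(size=(4, 80))
X[0] += np.where(y == 0, -4.0, 4.0)
P = lfda_projection(X, y, 1)
print(P.shape)
```

Unlike PCA and LPP, this projection uses the class labels, which is why it should be fitted only on data whose labels are legitimately available at training time.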

2.4 Classification

To test the discriminating power of each feature type, the feature matrix formed from each feature type is classified separately, and classification is then performed with all feature types taken together. To select the reduced dimension \( \ell \), the feature vectors are projected with the feature transformation technique into reduced spaces of increasing dimension, from one up to \( D \). Ten-fold cross-validation (CV) with a linear discriminant analysis (LDA) classifier [17, 23, 25] is performed in each of these subspaces, and the mean accuracy, specificity and sensitivity over the 10 folds are computed. The dimension that gives the maximum mean classification accuracy, i.e. the minimum error rate, is chosen as the lower dimensional subspace for the final projection.
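The dimension sweep described above can be sketched as follows. This is an illustration under assumptions rather than the procedure used in the study: the two-class LDA is hand-rolled with a pooled covariance and equal priors, labels are assumed to be 0 and 1, the folds are stratified, ties are broken in favour of the lowest dimension, and the projection matrix `P` is assumed to be computed beforehand with its columns ordered by importance. All function names are hypothetical.

```python
import numpy as np


def lda_predict(Z_train, y_train, Z_test, reg=1e-6):
    """Two-class LDA (labels 0/1, equal priors). Rows of Z are trials."""
    m0 = Z_train[y_train == 0].mean(axis=0)
    m1 = Z_train[y_train == 1].mean(axis=0)
    centred = np.vstack([Z_train[y_train == 0] - m0, Z_train[y_train == 1] - m1])
    Sw = centred.T @ centred / (len(Z_train) - 2)  # pooled covariance
    Sw += reg * np.eye(Z_train.shape[1])           # small ridge for stability
    w = np.linalg.solve(Sw, m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return (Z_test @ w + b > 0).astype(int)


def stratified_folds(y, n_folds, rng):
    """Assign a fold index to every trial, spreading each class evenly."""
    folds = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        folds[idx] = np.arange(idx.size) % n_folds
    return folds


def cv_accuracy(Z, y, n_folds=10, seed=0):
    """Mean accuracy over n_folds folds for trials in the rows of Z."""
    folds = stratified_folds(y, n_folds, np.random.default_rng(seed))
    accs = []
    for f in range(n_folds):
        test = folds == f
        pred = lda_predict(Z[~test], y[~test], Z[test])
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))


def select_dimension(X, y, P):
    """X: D x N features, P: projection matrix with up to D columns.

    Returns the dimension with the highest mean CV accuracy and the list of
    accuracies for dimensions 1..P.shape[1].
    """
    accs = [cv_accuracy((P[:, :ell].T @ X).T, y) for ell in range(1, P.shape[1] + 1)]
    return int(np.argmax(accs)) + 1, accs


# Synthetic example with an identity projection, only to show the call.
rng = np.random.default_rng(4)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(3, 100))
X[0] += np.where(y == 0, -3.0, 3.0)
best_ell, accs = select_dimension(X, y, np.eye(3))
print(best_ell, len(accs))
```

One design point worth noting: if the projection is fitted on all trials before cross-validation, the reported accuracy can be optimistic, particularly for a supervised method such as LFDA. Fitting the projection inside each training fold avoids this, at the cost of more computation.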

3 Results

Mean classification accuracy, sensitivity and specificity (in %) for 10 fold CV using the LDA classifier on the different features are given for all subjects in Table 1. LDA is applied to each feature type separately and then to all features taken together. These values show that no single feature performs consistently well across subjects in detecting brisk walking motor imagery from the idle state; the features perform below, at or just above chance level. HJ features perform better for subjects 2 and 4, AAR features for subject 1 and BP features for subject 3. The classification accuracy of the best performing feature is shown in bold for each subject in Table 1. For subject 5, none of the features performs above chance level. Moreover, considering all the features together further decreases the classification accuracy in all subjects. Similar observations can be drawn from the sensitivity and specificity values. Using these features directly to set up a BCI system therefore gives poor performance. The remainder of this study explores the role of the FT techniques PCA, LPP and LFDA in improving the performance of a BCI system based on these features for detecting lower limb brisk walking motor intent from the idle condition.

Table 1. Mean classification accuracy (%), sensitivity (%) and specificity (%) of 10 fold cross-validation using LDA classifier for 5 subjects (S1–S5)

The performance of the LDA classifier after applying the FT techniques PCA, LPP and LFDA to the entire set of features taken together is given in Table 2. The FT techniques are used to project the feature matrix into reduced subspaces of increasing dimension, from one up to \( D \). The mean classification accuracy, specificity and sensitivity of 10 fold cross-validation with the LDA classifier are calculated in each dimension, and the dimension with the maximum mean classification accuracy (minimum misclassification rate) is chosen as the subspace. The reduced dimension attained by each technique on the basis of this criterion is given in the third column of Table 2. Table 2 shows that the classification accuracy improves when FT techniques are used to reduce the dimensionality of the input feature space. Among the FT methods, LFDA performs best in terms of accuracy, sensitivity and specificity. There is no significant difference between PCA and LPP in accuracy, sensitivity and specificity, except that the reduced dimension attained by PCA is lower than that of LPP in subjects 1, 3 and 4, and vice versa in subjects 2 and 5. LFDA, in contrast, attains significantly higher accuracy, sensitivity and specificity than LPP and PCA. It also tends to attain a low dimension; the exceptions are S5, where LPP finds a lower dimensional representation but fails to attain better accuracy than LFDA, and subjects 1 and 3, where PCA gives a lower subspace dimension than LFDA but with lower values of the performance measures. To further illustrate the performance of the FT techniques, scatter plots of the data points in the two-dimensional subspace obtained by each method for subject S2 are given in Fig. 4. LFDA finds a two-dimensional subspace that clearly discriminates between the two classes, whereas the points strongly overlap in the 2D subspaces attained by PCA and LPP.

Table 2. Mean and standard deviation of classification accuracy (%), sensitivity (%), specificity (%) of 10 fold cross-validation and the reduced dimension attained by the FT techniques (PCA, LPP and LFDA) for subjects (S1–S5)

Fig. 4. Scatter plots in 2 dimensional reduced space obtained by FT techniques, (a) PCA (b) LPP (c) LFDA for subject 2

Fig. 3 plots the average 10 fold CV accuracy against the reduced dimension, from one to \( D \), for each of the FT techniques and all subjects. The evolution of the classification accuracy varies considerably across subjects, which can be attributed to the subject-to-subject variability of EEG signals. For most subjects, the accuracy of PCA and LPP either increases with dimension or fluctuates about its mean value, whereas the accuracy of LFDA is higher at lower dimensions and gradually decreases with increasing dimension. The mean classification accuracy of LFDA is also higher than that of LPP and PCA, as can be seen in the plots. The performance of PCA, LPP and LFDA converges at the first and at the \( D^{th} \) dimension in all subjects except S5, where the methods differ in accuracy at the first dimension.

The results of this study agree with those of [16], where LFDA performed best in identifying motor imagery of the hand. LFDA is therefore a promising technique for detecting motor imagery tasks. By testing the FT techniques on the brisk walking MI task, this study provides supporting evidence of the usefulness of LFDA for lower limb MI tasks in addition to the upper limb imagery tasks studied in [16].

4 Discussions

In this study, the combination of features and FT techniques for detecting brisk walking motor intent is explored. Standard EEG features for motor imagery are extracted, and the role of FT methods in improving the discriminative information of these features for lower limb brisk walking motor imagery is examined. On their own, the BP, HJ and AAR features are not able to discriminate the imagery trials from the rest trials. This is possibly due to the low signal-to-noise ratio of EEG signals. An advanced artifact rejection algorithm could be applied prior to feature extraction to improve the signal-to-noise ratio. Since only electrodes placed over the sensorimotor region are considered in the experiment, the effects of artifacts are less pronounced, and hence only band-pass filtering is used to remove low and high frequency noise.

Applying FT techniques to the complete set of poorly performing features significantly improves the classification accuracy. Among the FT techniques, the best improvement is obtained with LFDA. The detection rate of lower limb motor imagery could be improved further by considering other representative features for the lower limb motor imagery task and using LFDA to project them into a lower dimension. Channel selection methods can also be explored in addition to FT techniques to select the most informative channels, which could further improve the classification accuracy.

5 Conclusions

From this study it can be inferred that, among the FT techniques considered, LFDA performs best in improving the detection of brisk walking motor imagery, both compared to the other FT techniques and compared to using no FT technique. Hence LFDA is a potential dimensionality reduction method for the design of feature based BCI systems that detect brisk walking motor intent from the idle condition.