Motor Imagery Task Classification Using a Signal-Dependent Orthogonal Transform Based Feature Extraction

Mesbah, Mostefa; Khorshidtalab, Aida; Baali, Hamza; Al-Ani, Ahmed

doi:10.1007/978-3-319-26535-3_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9490)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

In this paper, we present the results of classifying electroencephalographic (EEG) signals into four motor imagery tasks using a new method for feature extraction. This method is based on a signal-dependent orthogonal transform, referred to as LP-SVD, defined as the left singular vectors of the LPC filter impulse response matrix. Using a logistic model tree classifier, the extracted features are mapped into one of four motor imagery movements, namely left hand, right hand, foot, and tongue. The classification performance of the proposed technique was benchmarked against that of two widely used feature extraction methods, namely the discrete cosine transform (DCT) and the adaptive autoregressive (AAR) model. By achieving an accuracy of 67.35 %, the LP-SVD based method outperformed the other two by large margins (+25 % compared to DCT and +6 % compared to AAR-based methods).

1 Introduction

The aim of a brain-computer interface (BCI) is to establish a direct communication link between the brain and external electronic devices, whereby brain signals are translated into useful commands. Such a communication link would provide people suffering from severe muscular (motor) disabilities with an alternative means of communication and control that bypasses the normal output pathways [1–3]. In this paper, we focus on an important sub-component of BCI systems, namely feature extraction. The aim of this sub-component is to identify a set of features that are effective in discriminating between the different classes of interest.

Transform-based approaches form an important class of feature extraction techniques. Their aim is to find a more compact, lower-dimensional representation in which most of the signal’s information is packed into a small number of uncorrelated coefficients. By eliminating irrelevant features (transform coefficients), these methods extract effective features that preserve the generalization capability while reducing the computational complexity of the classification stage [4]. Transform-based approaches can be subdivided into linear and nonlinear, supervised and unsupervised, and signal-dependent and signal-independent methods. The most widely used linear techniques are principal component analysis (PCA) and linear discriminant analysis (LDA). PCA is unsupervised: it projects the data onto a low-dimensional subspace, called the principal subspace, spanned by the leading eigenvectors of the sample covariance matrix, so as to maximize the variance of the projected data. In contrast, LDA is supervised and attempts to find a linear mapping that maximizes the linear class separability of the data in a low-dimensional space [5].

Recently, the authors introduced a signal-dependent linear orthogonal transform, referred to as the LP-SVD transform [6]. The transform has the advantage of forming the transformation matrix using only the autoregressive (AR) model parameters, instead of the data samples as in the case of PCA. This transform is used in this paper to map EEG data into a new domain where only a few coefficients contain most of the signal’s energy. A subset of these transform coefficients, in conjunction with the linear prediction (LP) coefficients and the error variance, was used as features in the classification of EEG into four-class motor imagery tasks. The feature extraction method was validated using dataset IIIa of BCI competition III, and its classification capability was assessed against two state-of-the-art methods based on the DCT and the AAR model.

The rest of the paper is organized as follows. Section 2 describes the EEG data, its acquisition, and its pre-processing, and then introduces the LP-SVD transform and how it is used in feature extraction. Section 3 compares the classification performance of the proposed LP-SVD based technique against two widely used feature extraction methods. Section 4 concludes the paper.

2 Methodology

2.1 Data Acquisition and Pre-processing

The dataset IIIa from the BCI competition III (2005) [7] was used to evaluate the effectiveness of the proposed feature extraction method. It is a widely used benchmark dataset of multiclass motor imagery tasks recorded from three subjects, referred to as K3b, K6b and L1b. The multichannel EEG signals were recorded using a 64-channel Neuroscan EEG amplifier (Compumedics, Charlotte, North Carolina, USA). Only 60 EEG channels were actually recorded from the scalp of each subject using the 10–20 system and a referential montage. The left and right mastoids served as reference and ground, respectively. The recorded signal was sampled at 250 Hz and filtered using a bandpass filter with 1 and 50 Hz cut-off frequencies. A notch filter was then applied to suppress power-line interference. During the experiments, each subject was instructed to perform imagery movements associated with visual cues. Each trial started with an empty black screen at t = 0 s. At t = 2 s, a short beep tone was presented and a cross ‘+’ appeared on the screen to raise the subject’s attention. At t = 3 s, an arrow pointing in one of the four main directions (left, right, upwards or downwards) was presented. Each of the four directions instructed the subject to imagine one of the following four movements: left hand, right hand, tongue or foot, respectively. The imagination process was performed until the cross disappeared at t = 7 s. Each of the four cues was randomly displayed ten times in each run. No feedback was provided to the subject. The recorded dataset from subject K3b consists of 9 runs, while those from K6b and L1b consist of 6 runs each, which resulted in 360 trials for subject K3b and 240 trials for each of the other two subjects.
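As a quick check of these trial counts, assuming every run contains exactly 4 cues × 10 repetitions = 40 trials as described above:

$$ 9 \times 4 \times 10 = 360 \;(\text{K3b}), \qquad 6 \times 4 \times 10 = 240 \;(\text{K6b and L1b}). $$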

2.2 The LP-SVD Transform

The LP-SVD transform is constructed using a two-step process, namely the estimation of the LPC filter coefficients and the computation of the left singular vectors of the LPC filter impulse response matrix using singular value decomposition (SVD).

Linear prediction (LP) consists of estimating the current signal observation, $ y\left( n \right) $, using a linear combination of its P past samples, namely $ y\left( {n - i} \right) $ for $ i = 1, \ldots , P $. This can be expressed mathematically by [8]

$$ y\left( n \right) = - \mathop \sum \limits_{i = 1}^{P} a_{i} y\left( {n - i} \right) + e\left( n \right), $$

(1)

where $ a_{i} $ are the linear prediction coefficients (LPCs), P is the prediction order, and $ e\left( n \right) $ is the prediction error. Equation (1) can be written in a more compact form using the following matrix notation:

$$ \varvec{y} = \varvec{He}, $$

(2)

where $ \varvec{y} = [y\left( 1 \right), \ldots ,y\left( N \right)]^{T} $ and $ \varvec{e} = [e\left( 1 \right), \ldots ,e\left( N \right)]^{T} $ are, respectively, the N × 1 column vectors of the data samples and the prediction residual, while H is the N × N impulse response matrix of the synthesis filter (also called the LPC filter), whose entries are completely determined by the linear prediction coefficients $ a_{i} $. The matrix H is lower triangular and Toeplitz. Applying the SVD to H gives:

$$ \varvec{y} = \varvec{UDV}^{\varvec{T}} \varvec{e} $$

(3)

where U and V are the N × N orthogonal matrices containing the left and right singular vectors of H, and D is the N × N diagonal matrix of singular values [9].
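The construction of H and its SVD can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, zero initial conditions are assumed for the synthesis filter, and the AR(1) coefficient in the example is arbitrary.

```python
import numpy as np


def lpc_impulse_response_matrix(a, N):
    """Return the N x N lower-triangular Toeplitz matrix H with y = H e.

    `a` holds a_1..a_P of Eq. (1); zero initial conditions are assumed.
    """
    a = np.asarray(a, dtype=float)
    h = np.zeros(N)
    h[0] = 1.0
    for n in range(1, N):
        p = min(len(a), n)
        # h(n) = -sum_{i=1}^{p} a_i h(n - i)
        h[n] = -np.dot(a[:p], h[n - 1::-1][:p])
    H = np.zeros((N, N))
    for j in range(N):
        H[j:, j] = h[:N - j]  # column j is h delayed by j samples
    return H


def lp_svd_transform(a, N):
    """Return U, the singular values, and V^T of H (H = U diag(s) V^T)."""
    H = lpc_impulse_response_matrix(a, N)
    return np.linalg.svd(H)


# Arbitrary AR(1) example: y(n) = 0.9 y(n-1) + e(n), i.e. a_1 = -0.9
U, s, Vt = lp_svd_transform([-0.9], 8)
e = np.random.default_rng(0).standard_normal(8)
y = lpc_impulse_response_matrix([-0.9], 8) @ e  # Eq. (2)
theta = U.T @ y                                 # Eq. (4)
```

In this example the first column of H is the impulse response 1, 0.9, 0.81, and so on. Note that U depends only on the coefficients $ a_{i} $ and the segment length N, not on the data samples themselves.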

We define the transformation that maps the measurement vector $ \left( \varvec{y} \right) $ to a feature vector ($ \varvec{\theta} $) by [6]:

$$ \varvec{\theta}= \varvec{U}^{\varvec{T}} \varvec{y} $$

(4)

It is important to note that the transform operation $ (\varvec{U}^{\varvec{T}} \varvec{y}) $ by itself does not achieve any dimensionality reduction. It only decorrelates the signal and packs a large fraction of its energy into relatively few transform coefficients, as shown in Fig. 1.
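A short derivation suggests why this happens, under the additional assumption (not stated above) that the prediction residual is white with variance $ \sigma_{e}^{2} $; the symbols $ \sigma_{e}^{2} $ and $ d_{i} $ (the i-th singular value) are introduced here for illustration only. Substituting (3) into (4) and using $ \varvec{U}^{\varvec{T}} \varvec{U} = \varvec{I} $:

$$ \varvec{\theta} = \varvec{U}^{\varvec{T}} \varvec{UDV}^{\varvec{T}} \varvec{e} = \varvec{DV}^{\varvec{T}} \varvec{e}, \qquad E\left[ \varvec{\theta \theta }^{\varvec{T}} \right] = \varvec{DV}^{\varvec{T}} E\left[ \varvec{ee}^{\varvec{T}} \right] \varvec{VD} = \sigma_{e}^{2} \varvec{D}^{2} . $$

Under this assumption the coefficients are uncorrelated and the expected energy of $ \theta_{i} $ is $ \sigma_{e}^{2} d_{i}^{2} $, so it is largest for the coefficients associated with the largest singular values. Because U is orthogonal, the total energy is unchanged:

$$ \left\| \varvec{\theta} \right\|^{2} = \varvec{y}^{\varvec{T}} \varvec{UU}^{\varvec{T}} \varvec{y} = \left\| \varvec{y} \right\|^{2} . $$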

2.3 LP-SVD-Based Feature Extraction

Our approach involves extracting features from each EEG segment. These features include the estimated LP coefficients ($ a_{i} $), the prediction error variance ($ Vr $), and a subset of the most significant transform coefficients $ \varvec{\theta} $. These features are described below.

According to the above LP analysis, the EEG vector is described in terms of all-pole filter coefficients and the prediction error. There are two classical approaches used to estimate the LP parameters, namely the autocorrelation and the covariance methods. In this study, we used the autocorrelation method as it guarantees the stability of the filter and allows the efficient Levinson-Durbin recursion to be used to estimate the model parameters [8]. Once the coefficients are estimated, the prediction error sequence can be computed using (1). The variance of the prediction error e(n) is estimated by:

$$ Vr = \frac{1}{N - 1}\mathop \sum \limits_{n = 1}^{N} \left( {e\left( n \right) - \bar{e}} \right)^{2} , $$

(5)

where $ \bar{e} $ is the arithmetic mean of the prediction error vector e and N is its length.
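The autocorrelation method with the Levinson-Durbin recursion, followed by the variance estimate (5), can be sketched as below. The function names, the biased autocorrelation estimate, and the zero pre-sample values in `prediction_error` are choices made for this illustration, not details given in the paper.

```python
import numpy as np


def autocorrelation(y, max_lag):
    """Biased autocorrelation estimates r(0)..r(max_lag)."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    return np.array([np.dot(y[k:], y[:N - k]) / N for k in range(max_lag + 1)])


def levinson_durbin(r, order):
    """Solve r(m) + sum_i a_i r(m - i) = 0, m = 1..order.

    Returns (a_1..a_P, final prediction error power), with the sign
    convention of Eq. (1): y(n) = -sum_i a_i y(n - i) + e(n).
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for i in range(1, m):
            a[i] = a_prev[i] + k * a_prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a[1:], err


def prediction_error(y, a):
    """e(n) = y(n) + sum_i a_i y(n - i), with samples before n = 0 taken as zero."""
    y = np.asarray(y, dtype=float)
    e = y.copy()
    for i, ai in enumerate(a, start=1):
        e[i:] += ai * y[:-i]
    return e


def error_variance(e):
    """Unbiased sample variance of the residual, as in Eq. (5)."""
    return np.var(e, ddof=1)
```

A typical call chain for one EEG segment `y` and order `P` would be `a, _ = levinson_durbin(autocorrelation(y, P), P)` followed by `error_variance(prediction_error(y, a))`.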

The data vector y is represented in the new coordinates $ \{ \varvec{u}_{\varvec{i}} \} $ by the transform coefficients, or scores, $ \theta_{i} $. The transform coefficients corresponding to the K largest singular values are selected as features:

$$ \hat{\varvec{\theta }} = \hat{\varvec{U}}^{\varvec{T}} \varvec{y}, $$

(6)

where the columns of $ \hat{\varvec{U}} $ are $ \left\{ {\varvec{u}_{1} ,\varvec{u}_{2} , \ldots ,\varvec{u}_{K} } \right\} $.
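Putting the pieces together, the per-segment feature vector (K transform coefficients, the LP coefficients, and the error variance) might be assembled as in the sketch below. The function name and the feature ordering are hypothetical. For brevity, H is obtained here by inverting the lower-triangular analysis matrix A (for which A y = e under zero initial conditions), which is an implementation shortcut assumed for this illustration.

```python
import numpy as np


def lp_svd_features(y, a, K):
    """Return [theta_1..theta_K, a_1..a_P, Vr] for one segment y.

    `a` holds already-estimated LP coefficients a_1..a_P (Eq. 1).
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    N = len(y)
    # Analysis matrix: ones on the diagonal, a_i on the i-th subdiagonal
    A = np.eye(N)
    for i, ai in enumerate(a, start=1):
        A += ai * np.eye(N, k=-i)
    e = A @ y                        # prediction error, Eq. (1) rearranged
    H = np.linalg.inv(A)             # synthesis matrix, y = H e
    U, s, _ = np.linalg.svd(H)       # singular values come sorted, largest first
    theta_hat = U[:, :K].T @ y       # project y onto the first K left singular vectors
    vr = np.var(e, ddof=1)           # Eq. (5)
    return np.concatenate([theta_hat, a, [vr]])


# Example with arbitrary values: first-order model, four transform coefficients
segment = np.random.default_rng(0).standard_normal(32)
features = lp_svd_features(segment, [-0.9], K=4)  # length 4 + 1 + 1 = 6
```

With a first-order model and four retained coefficients, this gives six features per channel.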

a. DCT-based feature extraction procedure

The DCT is a signal-independent, real-valued, orthogonal transform that is asymptotically equivalent to the optimal principal component analysis (PCA) for highly correlated, stationary first-order autoregressive signals [10]. The orthonormal basis vectors $ \varvec{w}_{k} $ of an N-point discrete cosine transform (DCT-II) are given by:

$$ \varvec{w}_{k} = \left\{ {\begin{array}{*{20}l} {\frac{1}{\sqrt N } \left( {1,1, \ldots ,1} \right)^{T} } \hfill & {{\text{for }}k = 1} \hfill \\ {\sqrt {\frac{2}{N}} \left( {\cos \frac{{\left( {k - 1} \right)\pi }}{2N} ,\cos \frac{{3\left( {k - 1} \right)\pi }}{2N} , \ldots ,\cos \frac{{\left( {2N - 1} \right)\left( {k - 1} \right)\pi }}{2N} } \right)^{\varvec{T}} } \hfill & {{\text{for }}k = 2, \cdots ,N} \hfill \\ \end{array} } \right. $$

(7)

The N × N orthogonal DCT matrix is then defined as $ \varvec{W} = (\varvec{w}_{1} , \ldots ,\varvec{w}_{N} ) $. It follows immediately that the relation between a data vector y and its DCT transform Y is given by:

$$ \varvec{Y} = \varvec{W}^{\varvec{T}} \varvec{y} $$

(8)

The resulting DCT coefficients, represented by the vector Y, are concentrated in the low-frequency subspace, as shown in Fig. 2. Dimensionality reduction using the DCT is realized by using only these low-frequency coefficients as features and discarding the remaining high-frequency coefficients. This is illustrated by the following linear mapping:

$$ \hat{\varvec{Y}} = \hat{\varvec{W}}^{\varvec{T}} \varvec{y} , $$

(9)

where the columns of $ \hat{\varvec{W}} $ are $ \left\{ {\varvec{w}_{1} ,\varvec{w}_{2} , \ldots ,\varvec{w}_{K} } \right\} $.

Figure 2 shows an example DCT coefficient vector for the EEG data of Fig. 1. The energy of the transformed data is packed into the first few low-frequency coefficients, while all high-frequency coefficients are relatively small.
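The DCT-based procedure can be sketched in NumPy as follows. The sketch builds the basis explicitly instead of calling a library DCT routine, and uses the standard orthonormal DCT-II normalization: a factor of $ \sqrt{2/N} $ for the non-constant vectors, with frequency index k − 1 for the k-th basis vector. The function names are hypothetical.

```python
import numpy as np


def dct_basis(N):
    """Return the N x N matrix W whose columns are the orthonormal DCT-II vectors."""
    n = np.arange(N)
    W = np.empty((N, N))
    W[:, 0] = 1.0 / np.sqrt(N)            # constant vector w_1
    for k in range(1, N):                 # zero-based column k holds w_{k+1}
        W[:, k] = np.sqrt(2.0 / N) * np.cos((2 * n + 1) * k * np.pi / (2 * N))
    return W


def dct_features(y, K):
    """Keep the K lowest-frequency DCT coefficients of y."""
    y = np.asarray(y, dtype=float)
    W = dct_basis(len(y))
    return W[:, :K].T @ y


# Orthonormality check and a truncated transform of an arbitrary vector
W = dct_basis(8)
is_orthonormal = np.allclose(W.T @ W, np.eye(8))
low_freq = dct_features(np.arange(8.0), K=3)
```

Unlike the LP-SVD basis, this matrix depends only on N, so it can be computed once and reused for every segment and channel.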

3 Experimental Results and Discussion

This section is divided into two parts. The first part is devoted to the AR model order selection. The second part evaluates the performance of the LP-SVD-based feature extraction method against two well-known related feature extraction methods. The classifier used to measure the performance is a logistic model tree, implemented as part of the Weka software package and used with its default parameters [11]. This classifier builds its logistic regression models with SimpleLogistic, which relies on LogitBoost, and we consider this an advantage over other classifiers. To evaluate the classification results, we used 10-fold cross-validation, where the data is randomly split into 10 folds of equal size.
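The splitting step of this evaluation protocol can be illustrated as below. The experiments themselves were run in Weka; this sketch only shows how trial indices might be partitioned into 10 random folds of equal size, and the function name and fixed seed are assumptions made for the illustration.

```python
import numpy as np


def ten_fold_indices(n_trials, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) for each of the n_folds random folds."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_trials)
    folds = np.array_split(perm, n_folds)
    for i in range(n_folds):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train, test


# 360 trials (subject K3b) give 36 test trials and 324 training trials per fold
fold_sizes = [(len(tr), len(te)) for tr, te in ten_fold_indices(360)]
```

The reported accuracy is then the average over the 10 held-out folds.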

3.1 AR Model Selection

To investigate the appropriate AR model order and the number of transform coefficients to be retained as features, we performed a series of simulations. In this part, only the parameters characterizing the LP-SVD transform are used as features, namely a subset of the transform coefficients ($ \hat{\varvec{\theta }} $), the LP coefficients ($ a_{i} $) and the prediction error variance (Vr). The features were extracted from the electrode sites over the primary motor area, C3, Cz, and C4. These are widely considered to be the most informative channels associated with motor imagery tasks [12].

We varied the AR model order from one to seven using the EEG segments from t = 3.5 s to t = 5.5 s (501 samples) from each trial. The best model order was selected based on the resulting classification accuracy. This criterion is more suitable, in the present context, than the one commonly used in signal representation (modeling), namely the tradeoff between the model order and the prediction error variance. Table 1 shows the classification results as a function of the order of the AR model.
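The segment length follows from the sampling rate. The sketch below assumes that sample index 0 of a trial corresponds to t = 0 s and that both end points are included, which is what yields 501 rather than 500 samples; the function name is hypothetical.

```python
FS = 250  # sampling rate in Hz, as stated in Sect. 2.1


def segment_slice(t_start, t_end, fs=FS):
    """Return the slice of sample indices covering [t_start, t_end] inclusive."""
    start = int(round(t_start * fs))
    stop = int(round(t_end * fs)) + 1  # +1 so that the sample at t_end is included
    return slice(start, stop)


s = segment_slice(3.5, 5.5)
n_samples = s.stop - s.start  # 501
```

For a trial stored as a channels × samples array, `trial[:, s]` would then select the analysis window for every channel.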

Table 1. AR model order selection

For all subjects, the highest classification accuracy, on average, was obtained with a first-order AR model and a subset of four transform coefficients, with results ranging from 42.08 % for subject L1b to 66.11 % for subject K3b. Therefore, this model order and number of transform coefficients were used in the subsequent analysis.

3.2 Feature Extraction Evaluation

This part compares the performance of the proposed feature extraction method with that of similar approaches based on signal modeling and orthogonal transforms, namely the adaptive autoregressive (AAR) model [12] and the discrete cosine transform (DCT). In particular, Schlögl et al. [12] applied a third-order AAR model for EEG signal analysis. The extracted AAR coefficients, which provide dynamic information about the signal spectrum, served as features. The authors used three different classifiers, namely a neural network based on k-nearest neighbour (kNN), support vector machines (SVM), and linear discriminant analysis (LDA), to classify the EEG signal into one of the four classes described earlier. The results showed that the SVM-based classifier achieved the best accuracies, followed by LDA and then kNN. The authors also reported that the best results were obtained when using the features extracted from all 60 monopolar channels. In this evaluation, we used these same channels to provide a fair comparison between the methods.

To find the number of DCT coefficients that achieves the highest classification performance for each subject, we varied the number of retained DCT coefficients from 5 to 50 with a step size of 5. Table 2 summarizes the obtained classification results as a function of the number of retained DCT coefficients. The numbers of coefficients required to achieve the highest classification accuracies for subjects K6b, L1b and K3b were 15, 40, and 20, respectively.

Table 2. Performance (classification accuracy) of DCT-based feature extraction using 60 Monopolar Channels

The performances of the three feature extraction approaches mentioned above are summarized in Table 3. It can be seen that when only the transform coefficients were used as features, the proposed approach outperformed the DCT-based one by up to 23 % in terms of accuracy (for subject L1b) with ten times fewer features. Meanwhile, when the LP coefficient and the residual error variance were added to the LP-SVD transform coefficients, our technique performed better than the other two methods for subjects L1b and K6b and achieved results comparable to those of the AAR-based method for subject K3b. On average, the improvement in terms of accuracy was about +25 % compared to DCT and +6 % compared to AAR-based methods. It is pertinent to point out that, unlike the DCT, which yields only the transform coefficients as features, our method yields additional features, namely the LP coefficients and the residual signal variance, that led to a better characterization of the signal. In addition, the DCT is signal independent while our proposed transform is signal dependent. These two differences explain the difference in performance between the two methods.

Table 3. Comparative analysis of different features extraction approaches

4 Conclusion

In the present study, we presented a feature extraction approach based on the combination of autoregressive modeling and orthogonal transformation. Results of classification experiments using a benchmark dataset from the BCI competition III, and comparison against closely related approaches, namely DCT and AAR, demonstrate that the proposed feature set is compact and offers a significant improvement in performance as judged by the classification accuracy. The number of transform coefficients was kept constant during all the experiments. It would be interesting to address the issue of parameter tuning in future studies. Future work will also include adding more features to improve the performance beyond that obtained in this study.

References

1. Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F., Arnaldi, B.: A review of classification algorithms for EEG-based brain–computer interfaces. J. Neural Eng. 4, R1–R13 (2007)
2. Zander, T.O., Kothe, C., Jatzev, S., Gaertner, M.: Enhancing human-computer interaction with input from active and passive brain-computer interfaces. In: Tan, D.S., Nijholt, A. (eds.) Brain-Computer Interfaces. Human-Computer Interaction Series, pp. 181–199. Springer, London (2010)
3. Saa, J.F.D., Cetin, M.: Discriminative methods for classification of asynchronous imaginary motor tasks from EEG data. IEEE Trans. Neural Syst. Rehabil. Eng. 21(5), 716–724 (2013)
4. Ozertem, U., Erdogmus, D., Jenssen, R.: Spectral feature projections that maximize Shannon mutual information with class labels. Pattern Recogn. 39(7), 1241–1252 (2006)
5. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)
6. Baali, H., Akmeliawati, R., Salami, M.J.E., Khorshidtalab, A., Lim, E.-G.: ECG parametric modeling based on signal dependent orthogonal transform. IEEE Signal Process. Lett. 21(10), 1293–1297 (2014)
7. Blankertz, B., Müller, K.-R., Krusienski, D.J., Schalk, G., Wolpaw, J.R., Schlögl, A., Pfurtscheller, G., Millán, J.d.R., Schröder, M., Birbaumer, N.: The BCI competition III: validating alternative approaches to actual BCI problems. IEEE Trans. Neural Syst. Rehabil. Eng. 14(2), 153–159 (2006)
8. Vaidyanathan, P.P.: The theory of linear prediction. Synth. Lect. Sign. Proces. 2(1), 1–184 (2007)
9. Strang, G.: Computational Science and Engineering, vol. 1. Wellesley-Cambridge Press, Wellesley (2007)
10. Ahmed, N., Milne, P.J., Harris, S.G.: Electrocardiographic data compression via orthogonal transforms. IEEE Trans. Biomed. Eng. BME-22(6), 484–487 (1975)
11. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. ACM SIGKDD Explor. Newsl. 11(1), 10–18 (2009)
12. Schlögl, A., Lee, F., Bischof, H., Pfurtscheller, G.: Characterization of four-class motor imagery EEG data for the BCI-competition 2005. J. Neural Eng. 2(4), L14 (2005)

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering College of Engineering, Sultan Qaboos University, P O Box: 33, Muscat, 123, Sultanate of Oman
Mostefa Mesbah
School of Computer Science and Software Engineering, The University of Western Australia, 35 Stirling Highway, Perth, WA, 6009, Australia
Mostefa Mesbah
Intelligent Mechatronics System Research Unit, Department of Mechatronics Engineering, International Islamic University Malaysia (IIUM), Kuala Lumpur, Malaysia
Aida Khorshidtalab
Malaysia Industry Transformation, Technology Park Malaysia, 57000, Kuala Lumpur, Malaysia
Hamza Baali
Faculty of Eng and IT, University of Technology Sydney, Ultimo, NSW, 2007, Australia
Ahmed Al-Ani

Corresponding author

Correspondence to Ahmed Al-Ani.

Editor information

Editors and Affiliations

University of Istanbul, Istanbul, Turkey
Sabri Arik
University at Qatar, Doha, Qatar
Tingwen Huang
Tunku Abdul Rahman University College, Kuala Lumpur, Malaysia
Weng Kin Lai
University of Science Technology, Wuhan, China
Qingshan Liu

Cite this paper

Mesbah, M., Khorshidtalab, A., Baali, H., Al-Ani, A. (2015). Motor Imagery Task Classification Using a Signal-Dependent Orthogonal Transform Based Feature Extraction. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science(), vol 9490. Springer, Cham. https://doi.org/10.1007/978-3-319-26535-3_1

DOI: https://doi.org/10.1007/978-3-319-26535-3_1
Published: 10 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26534-6
Online ISBN: 978-3-319-26535-3
eBook Packages: Computer Science, Computer Science (R0)
