Introduction

Recently, the brain–computer interface (BCI), which aims to establish a direct communication pathway between the brain and a computer, has been widely researched around the world [1]. EEG-based BCIs that rely on the motor imagery of users are of particular interest to the BCI community. Because of factors such as low spatial resolution and high noise levels, it has been a big challenge to extract effective features from EEG signals and perform classification. Many methods have been proposed for feature extraction and classification of EEG signals, as well as for parameter selection [2]. Among them, the common spatial patterns (CSP) method [3, 4, 5] has proven very powerful in determining spatial filters that extract discriminative brain rhythms. However, it suffers from the limitation of assuming a strictly linear relation between the source signals of the brain and the recorded EEG signals.

In order to make CSP applicable to nonlinearly structured data, kernel-based methods have been applied. The main idea of kernel-based methods is to map the input data into a feature space by a nonlinear mapping, where inner products in the feature space can be computed by a kernel function without knowing the nonlinear mapping explicitly [6, 7, 8, 9, 10]. In [11], a nonlinear CSP was proposed, which is restricted to an equal number of trials from each class. A more flexible formulation of the kernel CSP was given in [12]. However, the complexity of the algorithm and the selection of kernel functions became new problems. More recently, a hybrid linear and kernel CSP [13] was proposed that uses hybrid kernel functions which are mostly linear but can also account for small degrees of nonlinearity.

The conventional CSP algorithm can be solved by a two-step PCA procedure, and most existing kernel CSP methods are based on kernel PCA [7]. A common limitation of these methods is that the class covariance matrix in the feature space must be nonsingular, which restricts their application to large EEG training sets. To reduce the calibration time of a BCI application, an efficient training algorithm based on a small number of EEG trials is necessary. Therefore, we propose a nonlinear CSP based on kernel methods using the generalized singular value decomposition (GSVD).

The remainder of the paper is organized as follows. In Sect. 2, we briefly introduce the classical CSP algorithm and reformulate it so that it can be solved by the GSVD; the kernel-based GCSP algorithm is then described in Sect. 3. In Sect. 4, experimental results are illustrated and discussed. Finally, conclusions are provided in Sect. 5.

Linear CSP

Generally, an M × T matrix \({\bf X}_{j}\) represents the j-th trial of EEG signals, where M is the number of channels and T is the number of samples per channel. The class-specific spatial covariance matrix can be obtained by

$$ {\bf R}_{c} = \frac{1}{n_{c}} \sum_{{\bf X}_{j}\in \hbox{class}\,c} \frac{{\bf X}_{j}{\bf X}^{T}_{j}}{\hbox{trace}({\bf X}_{j}{\bf X}^{T}_{j})}, \quad c=1,2. $$
(1)

Here, \(n_{c}\) is the number of trials recorded under the c-th mental task. A normalization is applied to each \({\bf X}_{j}\) so that the sum of the energies of all signal channels equals 1. For the two-class case, the total spatial covariance matrix can be formulated as

$$ {\bf R}_{t} = {\bf R}_{1} + {\bf R}_{2}. $$
(2)
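For concreteness, the computation in (1) and (2) can be sketched in a few lines of NumPy; the variable names (e.g. `trials_c`) are ours, not part of the original method description.

```python
import numpy as np

def class_covariance(trials_c):
    """Normalized spatial covariance of one class, Eq. (1).

    trials_c: list of M x T arrays, the EEG trials of class c.
    """
    R_c = np.zeros((trials_c[0].shape[0],) * 2)
    for X in trials_c:
        S = X @ X.T
        R_c += S / np.trace(S)   # per-trial unit-energy normalization
    return R_c / len(trials_c)

# Total spatial covariance for the two-class case, Eq. (2):
# R_t = class_covariance(trials_1) + class_covariance(trials_2)
```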

The objective of CSP is to find the optimal spatial filter w that maximizes the average energy of one class while minimizing that of the other (and vice versa). Such filters are often obtained by maximizing the Rayleigh quotient

$$ {\bf w} = \arg\max_{\bf w}\frac{{\bf w}^{T}{\bf R}_{c}{\bf w}}{{\bf w}^{T}{\bf R}_{t}{\bf w}}. $$
(3)

\({\bf R}_{c}\) and \({\bf R}_{t}\) in (3) can be further factored as

$$ {\bf R}_{c} = {\bf H}_{c}{\bf H}_{c}^{T} \quad \hbox{and} \quad {\bf R}_{t} = {\bf H}_{t}{\bf H}_{t}^{T}, $$
(4)

with

$$ {\bf H}_{c} = \left[ \frac{{\bf X}_{1}}{\sqrt{n_{c}\times \hbox{trace}({\bf X}_{1}{\bf X}^{T}_{1})}}, \ldots, \frac{{\bf X} _{j}}{\sqrt{n_{c}\times \hbox{trace}({\bf X}_{j}{\bf X}^{T}_{j})}} \right]_{\mathop{{\bf X}_{j}\in \hbox{class}\,c,}\limits_{1\leq j\leq n_{c}}} $$
(5)

and

$$ {\bf H}_{t} = [{\bf H}_{1}, \quad {\bf H}_{2}], $$
(6)

where \({\bf H}_{c} \in {\bf R}^{M\times p_{c}}\) and \(p_{c} = T\times n_{c}\) is the total number of sample points over all EEG trials belonging to class c.
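As a sanity check, the factorization (4)–(6) is straightforward to assemble; the sketch below (our notation, same assumptions as above) builds \({\bf H}_{c}\) so that \({\bf H}_{c}{\bf H}_{c}^{T}\) reproduces \({\bf R}_{c}\) from (1).

```python
import numpy as np

def build_H(trials_c):
    """Stack the normalized trials into H_c of Eq. (5), M x (T * n_c),
    so that build_H(trials_c) @ build_H(trials_c).T equals R_c."""
    n_c = len(trials_c)
    cols = [X / np.sqrt(n_c * np.trace(X @ X.T)) for X in trials_c]
    return np.hstack(cols)

# H_t = np.hstack([build_H(trials_1), build_H(trials_2)])   # Eq. (6)
# When R_t is nonsingular, the CSP filters of (3) can be found with
# scipy.linalg.eigh(R_c, R_t); the GSVD route below avoids that restriction.
```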

Kernel-based GCSP

In this section, we present a nonlinear extension of CSP based on kernel functions and the GSVD. By using the kernel method, we can work in the feature space through kernel functions, as long as the problem formulation depends only on inner products between data points. Suppose that the input space is mapped into a Hilbert space through a nonlinear mapping function:

$$ \phi : {\bf R}^{N} \to {\mathcal{F}}, \quad {\bf x} \mapsto \phi({\bf x}). $$
(7)

In the nonlinear mapping space, the inner product is denoted by a kernel function,

$$ \langle\phi({\bf a}), \phi({\bf b})\rangle = k({\bf a},{\bf b}). $$
(8)

We denote the nonlinear mapping of a matrix X which represents a single trial of EEG signals as

$$ {\bf X} \mapsto \phi({\bf X}), \quad {i.e.},\quad [{\bf x}_{1}\ldots{\bf x}_{n}] \mapsto [\phi({\bf x}_{1})\ldots\phi({\bf x}_{n})]. $$
(9)

To apply the kernel method to CSP in the feature space instead of the original input space, the class-c covariance matrix in (1) can be expressed as

$$ {\mathcal{R}}_{c} = \frac{1}{n_{c}}\sum_{{\bf X}_{j}\in \hbox{class}\,c} \frac{\phi({\bf X}_{j})\phi({\bf X}_{j})^{T}}{\hbox{trace}(\phi({\bf X}_{j})\phi({\bf X}_{j})^{T})}, \quad c=1,2. $$
(10)

The normalization term for \({\bf X}_{j}\) in the feature space can be expressed through the kernel function:

$$ h_{{\bf X}_{j}} = \hbox{trace}(\phi({\bf X}_{j})\phi({\bf X}_{j})^{T}) = \sum_{{\bf x}_{k}\in{\bf X}_{j}}{k({\bf x}_{k},{\bf x}_{k})}. $$
(11)
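For instance, with a Gaussian kernel, (11) is trivial to evaluate; the helper below is a minimal sketch, and the kernel choice is only an example.

```python
import numpy as np

def h_trial(X, kernel):
    """Feature-space normalization term of Eq. (11): the sum of
    k(x_k, x_k) over the sample points (columns) x_k of the trial X."""
    return sum(kernel(X[:, k], X[:, k]) for k in range(X.shape[1]))

# Example: Gaussian kernel; here k(x, x) = 1, so h_trial simply returns T.
gaussian = lambda a, b, sigma=1.0: np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma**2))
```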

Similarly to (4)–(6), we also have

$$ {\mathcal{R}}_{c} = {\mathcal{H}}_{c}{\mathcal{H}}_{c}^{T} \quad \hbox{and} \quad {\mathcal{R}}_{t} = {\mathcal{H}}_{t}{\mathcal{H}}_{t}^{T}, $$
(12)

with

$$ {\mathcal{H}}_{c} = \left[\frac{\phi({\bf X}_{1})}{\sqrt{n_{c} \sum_{{\bf x}_{k}\in{\bf X}_{1}}{k({\bf x}_{k},{\bf x}_{k})}}}, \ldots, \frac{\phi({\bf X} _{j})}{\sqrt{n_{c} \sum_{{\bf x}_{k}\in{\bf X}_{j}}{k({\bf x}_{k},{\bf x}_{k})}}}\right]_{\mathop{{\bf X}_{j}\in \hbox{class}\,c,}\limits_{1\leq j\leq n_{c}}}, $$
(13)
$$ {\mathcal{H}}_{t} = [{\mathcal{H}}_{1}, \quad {\mathcal{H}}_{2}], $$
(14)

where \({\mathcal{H}}_{c}\in {\mathcal{R}}^{M\times p_{c}}, p_{c}= T\times n_{c}\).

Then the CSP in the \({\mathcal{F}}\) space can also be expressed as a Rayleigh quotient,

$$ \varphi = \arg\max_\varphi\frac{\varphi^{T}{\mathcal{R}}_{c}\varphi} {\varphi^{T}{\mathcal{R}}_{t}\varphi}, $$
(15)

which can be solved via the generalized eigenvalue problem

$$ {\mathcal{R}}_{c}\varphi = \lambda {\mathcal{R}}_{t}\varphi. $$
(16)

Hence we can restrict the solution space of (16) to \(\hbox{span}\{\phi({\bf x}_{i})\}\). Let φ be represented as a linear combination of the mapped sample points \(\phi({\bf x}_{i})\):

$$ \varphi = \sum_{i=1}^{P}\alpha_{i}\phi({\bf x}_{i}) = {\bf Y}{\alpha}, $$
(17)

where \({\bf Y} = [\phi({\bf x}_{1}), \ldots, \phi({\bf x}_{P})]\), \(\alpha = [\alpha_{1}, \ldots, \alpha_{P}]^{T}\), and P is the total number of sample points over all EEG trials. According to (12) and (17), equation (16) can be rewritten, after left-multiplying by \({\bf Y}^{T}\), as

$$ ({\bf Y}^{T}{\mathcal{H}}_{c})({\bf Y}^{T}{\mathcal{H}}_{c})^{T}{\alpha} =\lambda ({\bf Y}^{T}{\mathcal{H}}_{t})({\bf Y}^{T}{\mathcal{H}}_{t})^{T}{\alpha}. $$
(18)

Now \(\alpha\) is the new eigenvector in the \({\mathcal{F}}\) space. The matrix \({\mathcal{H}}_{c}^{T}{\bf Y}\) can be computed and denoted by

$$ {\mathcal{K}}_{c}^{T} ={\mathcal{H}}_{c}^{T}{\bf Y} = \left[ \begin{array}{ccc} \frac{k({\bf x}_{1},{\bf x}_{1})}{\sqrt{n_{c} h_{{\bf x}_{1}}}} & \ldots& \frac{k({\bf x}_{1},{\bf x}_{P})}{\sqrt{n_{c} h_{{\bf x}_{1}}}}\\ \vdots &\ddots&\vdots\\ \frac{k({\bf x}_{p_{c}},{\bf x}_{1})}{\sqrt{n_{c} h_{{\bf x}_{p_{c}}}}}& \ldots &\frac{k({\bf x}_{p_{c}},{\bf x}_{P})}{\sqrt{n_{c} h_{{\bf x}_{p_{c}}}}}\\ \end{array}\right], $$
(19)

where \(h_{{\bf x}_{j}} = h_{{\bf X}_{j}}\) for \({\bf x}_{j} \in {\bf X}_{j}\). We define \(N = n_{1} + n_{2}\) as the total number of EEG trials in the two-class case and \(P = p_{1} + p_{2}\) as the total number of sample points over both classes. Hence,

$$ {\mathcal{K}}_{t}^{T} ={\mathcal{H}}_{t}^{T}{\bf Y} = \left[ \begin{array}{ccc} \frac{k({\bf x}_{1},{\bf x}_{1})}{\sqrt{n_{c} h_{{\bf x}_{1}}}} &\ldots& \frac{k({\bf x}_{1},{\bf x}_{P})}{\sqrt{n_{c} h_{{\bf x}_{1}}}}\\ \vdots &\ddots&\vdots\\ \frac{k({\bf x}_{P},{\bf x}_{1})}{\sqrt{n_{c} h_{{\bf x}_{P}}}}&\ldots&\frac{k({\bf x}_{P},{\bf x}_{P})}{\sqrt{n_{c} h_{{\bf x}_{P}}}} \end{array} \right]. $$
(20)
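The entries of (19) and (20) involve only kernel evaluations, so the matrices can be built directly. The sketch below (helper names are ours, reusing `h_trial` and `gaussian` from above) normalizes each row by the \(n_{c}\) and \(h_{{\bf X}_{j}}\) of the trial that the row's sample point belongs to, and obtains \({\mathcal{K}}_{t}^{T}\) by stacking the two class blocks, since \({\mathcal{H}}_{t} = [{\mathcal{H}}_{1}, {\mathcal{H}}_{2}]\) implies that \({\mathcal{K}}_{t}^{T}\) stacks \({\mathcal{K}}_{1}^{T}\) on \({\mathcal{K}}_{2}^{T}\).

```python
import numpy as np

def build_KT(row_trials, all_points, n_c, kernel):
    """Normalized kernel matrix K_c^T of Eq. (19), shape p_c x P.

    row_trials: list of M x T trials whose sample points index the rows;
    all_points: M x P array of all training sample points (columns of Y).
    """
    rows = []
    for X in row_trials:
        h_X = h_trial(X, kernel)                  # Eq. (11)
        scale = 1.0 / np.sqrt(n_c * h_X)
        for k in range(X.shape[1]):
            rows.append(scale * np.array(
                [kernel(X[:, k], all_points[:, i])
                 for i in range(all_points.shape[1])]))
    return np.array(rows)

# K1T = build_KT(trials_1, Y, len(trials_1), gaussian)   # p_1 x P
# K2T = build_KT(trials_2, Y, len(trials_2), gaussian)   # p_2 x P
# KtT = np.vstack([K1T, K2T])                            # P x P, Eq. (20)
```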

Then (18) is equivalent to

$$ {\mathcal{K}}_{c}{\mathcal{K}}_{c}^{T}{\alpha} = \lambda {\mathcal{K}}_{t}{\mathcal{K}}_{t}^{T}{\alpha}. $$
(21)

In order to relax the nonsingularity restriction on \({\mathcal{K}}_{c}{\mathcal{K}}_{c}^{T}\) and \({\mathcal{K}}_{t}{\mathcal{K}}_{t}^{T}\), we apply the GSVD to the pair \(({\mathcal{K}}_{c}^{T}, {\mathcal{K}}_{t}^{T})\). Then

$$ {\bf U}^{T}{\mathcal{K}}_{c}^{T}{\bf X} = [\Upgamma_{c} \quad 0] \quad \hbox{and} \quad {\bf V}^{T}{\mathcal{K}}_{t}^{T}{\bf X} = [\Upgamma_{t} \quad 0], $$
(22)

where U and V are orthogonal, X is nonsingular, \(\Upgamma_{c}^{T}\Upgamma_{c} + \Upgamma_{t}^{T}\Upgamma_{t} = {\bf I}\), and \(\Upgamma_{c}^{T}\Upgamma_{c}, \Upgamma_{t}^{T}\Upgamma_{t}\) are diagonal matrices with nonincreasing and nondecreasing diagonal components, respectively. The simultaneous diagonalizations of \({\mathcal{K}}_{c}{\mathcal{K}}_{c}^{T}\) and \({\mathcal{K}}_{t}{\mathcal{K}}_{t}^{T}\) can then be obtained as

$$ {\bf X}^{T}{\mathcal{K}}_{c}{\mathcal{K}}_{c}^{T}{\bf X} =\left[ \begin{array}{cc} \Upgamma_{c}^{T}\Upgamma_{c} &0 \\ 0 &0\\ \end{array}\right] \quad \hbox{and} \quad {\bf X}^{T}{\mathcal{K}}_{t}{\mathcal{K}}_{t}^{T}{\bf X} =\left[ \begin{array}{cc} \Upgamma_{t}^{T}\Upgamma_{t} &0 \\ 0 & 0\\ \end{array}\right]. $$
(23)
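NumPy and SciPy do not ship a ready-made GSVD routine, so the following is a minimal sketch of one possible implementation (our own, not taken from the paper): an SVD of the stacked pair followed by a CS-decomposition-style step. It returns the factors of (22) and (23) up to the numerical rank of the stacked matrix.

```python
import numpy as np

def gsvd(A, B, tol=1e-10):
    """GSVD of the pair (A, B) as in Eq. (22):
    U.T @ A @ X ~ [diag(gc), 0], V.T @ B @ X ~ [diag(gt), 0],
    with gc**2 + gt**2 = 1 columnwise (gc nonincreasing)."""
    m = A.shape[0]
    P, s, Wt = np.linalg.svd(np.vstack([A, B]))
    r = int(np.sum(s > tol * s[0]))          # numerical rank of [A; B]
    P1, P2 = P[:m, :r], P[m:, :r]            # P1.T @ P1 + P2.T @ P2 = I_r
    U, c, Zt = np.linalg.svd(P1)
    gc = np.zeros(r)
    gc[:c.size] = np.clip(c, 0.0, 1.0)
    gt = np.sqrt(1.0 - gc**2)
    PZ = P2 @ Zt.T                           # columns have norms gt
    V = PZ / np.where(gt > tol, gt, 1.0)     # zero columns left as zeros
    Xr = (Wt[:r].T * (1.0 / s[:r])) @ Zt.T   # first r columns of X
    return U, V, np.hstack([Xr, Wt[r:].T]), gc, gt

# U, V, X, gc, gt = gsvd(KcT, KtT)   # apply to the pair of Eq. (22)
```

The columns of X returned by this sketch realize the simultaneous diagonalization (23) on the range of the stacked matrix, with the remaining columns completing a basis of the null space.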

A regularized GCSP criterion can be introduced by adding a regularization parameter η to (14):

$$ {\mathcal{H}}_{t} = [\eta{\mathcal{H}}_{1}, \quad (1-\eta){\mathcal{H}}_{2}], \quad 0 \leq \eta \leq 1. $$
(24)

Large values of η put emphasis on the first class, and vice versa. An appropriate value for η can be obtained by cross-validation.
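Since \({\mathcal{K}}_{t}^{T} = {\mathcal{H}}_{t}^{T}{\bf Y}\), the weighting in (24) carries over directly to the kernel matrix as a row-block scaling; a one-line sketch (the function name is ours):

```python
import numpy as np

def regularized_KtT(K1T, K2T, eta):
    """Row-block weighting of K_t^T induced by Eq. (24); eta in [0, 1],
    with larger eta putting more emphasis on the first class."""
    return np.vstack([eta * K1T, (1.0 - eta) * K2T])
```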

The columns of X in (22) solve (21). Let \(\alpha\) be the matrix formed by the first r (by default 2) columns of X. For each class c, we obtain the corresponding \(\alpha\); in the end, we combine the \(\alpha\) of both classes into the final set of spatial filters. Hence, for any new input EEG trial Z, the nonlinear spatial filtering by the j-th filter is given by

$$ \varphi_{j}^{T}\phi({\bf Z}) = {\alpha}_{j}^{T}{\bf Y}^{T}\phi({\bf Z}) = \left[\sum_{i=1}^{P}\alpha_{ji} k({\bf x}_{i},{\bf z}_{1}),\ldots, \sum_{i=1}^{P}\alpha_{ji} k({\bf x}_{i},{\bf z}_{T})\right]. $$
(25)

The energies of the filtered EEG trial in (25) can therefore be used to create the feature vectors that serve as classifier input.
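Putting the pieces together, filtering a new trial per (25) again requires only kernel evaluations against the stored training points; the sketch below also computes the energy features mentioned above (the log-variance form is a common choice we adopt here, not prescribed by the paper).

```python
import numpy as np

def filter_trial(alpha, train_points, Z, kernel):
    """Nonlinear spatial filtering of a new M x T trial Z, Eq. (25).

    alpha: P x r coefficient matrix (one column per spatial filter);
    train_points: M x P matrix of training sample points (columns of Y).
    Returns the r x T filtered source signals."""
    KZ = np.array([[kernel(train_points[:, i], Z[:, t])
                    for t in range(Z.shape[1])]
                   for i in range(train_points.shape[1])])   # P x T
    return alpha.T @ KZ

# Energy of each filtered signal as the classifier feature vector:
# feats = np.log(np.var(filter_trial(alpha, Y, Z, gaussian), axis=1))
```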

Experimental results

We conduct classification experiments on real EEG signals to discriminate between imagination of left hand movements (first class) and right hand movements (second class). Data are taken from two subjects, S1 and S2. At second 2 of each trial, a symbol indicating a specific mental task is displayed. At second 5, the screen goes blank so that the subject can relax until the start of the next trial. EEG signals are recorded from 6 electrodes at a sampling rate of 256 Hz. As preprocessing, the EEG signals are bandpass filtered between 8 and 30 Hz. Each trial is split into non-overlapping time segments of 1.5 s length prior to the calculation of the spatial filters.
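This preprocessing can be reproduced with standard SciPy tools; the sketch below assumes a 4th-order Butterworth filter (our assumption, as the paper does not state the filter type or order).

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(raw, fs=256, band=(8.0, 30.0), seg_sec=1.5):
    """Band-pass filter a raw M x T trial between 8 and 30 Hz and split it
    into non-overlapping 1.5 s segments, as described above."""
    b, a = butter(4, band, btype="bandpass", fs=fs)  # 4th order: our choice
    filtered = filtfilt(b, a, raw, axis=1)
    seg = int(seg_sec * fs)
    return [filtered[:, i * seg:(i + 1) * seg]
            for i in range(filtered.shape[1] // seg)]
```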

Fig. 1 shows the data distribution of source signals filtered by the GCSP spatial filters. It is evident that, along each direction, one class has maximal variance while the other has minimal variance. Therefore, the energies along the corresponding axes are enhanced by the kernel feature extractor. In this study, the polynomial kernel function \(k({\bf x},{\bf y}) = ({\bf x}^{T}{\bf y})^{d}\) with d = 2 and the Gaussian kernel function \(k({\bf x},{\bf y}) = \exp(-{\|{\bf x}-{\bf y}\|^{2}}/{2\sigma^{2}})\) are applied in GCSP, and linear support vector machine (SVM) classifiers are used to assess the classification performance. The optimal value of the parameter σ is determined through 5-fold cross-validation on the training data. Table 1 gives the classification results of the GCSP and linear CSP methods with a small number of EEG training trials. For subject S2, linear CSP and GCSP yield roughly similar performance, while for subject S1 the results obtained by GCSP are generally better than those of linear CSP. These results empirically demonstrate the superiority of the nonlinear feature extractor.
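The two kernels and the model-selection step can be sketched as follows; the scikit-learn-based cross-validation is our illustration of the described procedure, and the per-σ GCSP feature extraction (`feats_by_sigma`) is assumed to happen upstream.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Kernels used with GCSP: polynomial with d = 2, and Gaussian.
poly = lambda a, b, d=2: (a @ b) ** d
gauss = lambda a, b, sigma=1.0: np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma**2))

def pick_sigma(feats_by_sigma, labels, sigmas):
    """5-fold cross-validation on the training data to select sigma.
    feats_by_sigma[s]: feature matrix from the GCSP pipeline with sigma=s."""
    score = lambda s: cross_val_score(SVC(kernel="linear"),
                                      feats_by_sigma[s], labels, cv=5).mean()
    return max(sigmas, key=score)
```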

Fig. 1

The feature distribution of source signals filtered by nonlinear spatial filters obtained from GCSP algorithm

Table 1 Classification accuracies (%) of the linear CSP and nonlinear GCSP algorithms by cross-validation

However, the disadvantage of GCSP lies in the time and memory complexity of the algorithm. Consider the GSVD of the kernel matrices \({\mathcal{K}}_{c}, {\mathcal{K}}_{t}\) in Eq. (21). These kernel matrices have a relatively large dimension in typical EEG classification problems. To put this into perspective, assume 20 EEG trials per class are given, each with a length of 1000 samples; then \({\mathcal{K}}_{c}\) and \({\mathcal{K}}_{t}\) will have about \(20{,}000^{2}\) and \(40{,}000^{2}\) elements, respectively. The GSVD of two such matrices imposes a high computational burden.

Conclusion

In this paper, a kernel CSP approach based on the GSVD has been described as a nonlinear spatial filtering method. For real BCI applications, one tends to use as few training trials as possible, and the kernel feature extractor proposed in this paper meets this need fairly well. One advantage of GCSP is that it can be applied regardless of the singularity of the spatial covariance matrices, both in the original space and in the feature space. In the future, the design of kernel functions tailored to individual subjects, as well as applications beyond EEG signal classification, will be investigated.