1 Introduction

A brain–computer interface (BCI) is a technology designed to create a direct pathway between the human brain and external devices that bypasses peripheral nerves and muscles (Xu et al., 2018). BCI systems provide a new means of communication for people with severe neuromuscular disorders by decoding task-related electroencephalogram (EEG) recordings and translating them into computer commands for controlling and communicating with external devices (Wolpaw et al., 2002; Jin et al., 2011).

To date, steady-state visual evoked potentials (da Cruz et al., 2015), P300 evoked potentials (Jin et al., 2015), slow cortical potentials (Mensh et al., 2004), and event-related desynchronization (ERD) (Pfurtscheller, 1977)/event-related synchronization (ERS) (Pfurtscheller, 1992) are the neural response patterns most commonly used in BCI systems. Motor imagery (MI)-based BCI systems, which rely on the ERD and ERS phenomena, are widely used because they are easier to operate than systems based on external stimuli (Qiu et al., 2016).

Common spatial pattern (CSP) has proven to be a very effective feature-extraction method (Ramoser et al., 1998). Its principle is to find spatial filters that extract discriminative MI information by maximizing the variance of the spatially projected signal of one class while minimizing that of the other (Zhang et al., 2015; Blankertz et al., 2008). Since EEG signals are very sensitive to noise, outliers caused by noise degrade the estimation of spatial filters based on the spatial covariance matrix, which in turn leads to poor classification accuracy (Lotte & Guan, 2011; Thiyam et al., 2017). To address this problem, many improved CSP-based algorithms have been proposed. The Sub-ABLD algorithm is one such modification of CSP; it mitigates the problems caused by the non-stationary nature of EEG data by appropriately scaling the conditional covariance matrices and using different filter-selection strategies (Thiyam et al., 2017), and it shows a certain degree of robustness to outlier trials in EEG data (Feng et al., 2018). Three real-valued hyperparameters \(\alpha \), \(\beta \), and \(\eta \) affect the performance of Sub-ABLD (Thiyam et al., 2017), so the choice of hyperparameters has a large impact on its performance. For hyperparameter selection, evolutionary algorithms (EAs) such as the genetic algorithm (GA) (Garrett et al., 2003) and particle swarm optimization (PSO) are often effective. Compared with GA, PSO is widely used because of its simple implementation, few parameters, and global search ability. This study therefore proposes optimizing the hyperparameters of Sub-ABLD with PSO (PSO-Sub-ABLD) and compares the resulting method with CSP and with Sub-ABLD using default hyperparameters. Two BCI competition datasets are selected to evaluate the performance.

The remainder of this article is organized as follows: Sect. 2 describes the competition datasets used in this paper, Sect. 3 introduces the proposed method, Sect. 4 presents the results, and Sect. 5 concludes the study.

2 Description of the Data

In this paper, two competition datasets are used to evaluate the effectiveness of optimizing the Sub-ABLD hyperparameters with PSO for MI classification (Fig. 1).

(1) Dataset 1 (BCI Competition IV dataset I): The dataset was recorded from 4 healthy subjects (labeled a, b, f, and g) with 59 electrodes at a sampling rate of 100 Hz during left-hand, right-hand, and foot MI tasks; for each subject, two of the three classes were selected, giving a total of 200 trials (Zhang et al., 2012). Each trial started with a visual cue (left, right, or down) displayed on the screen for 4 s, during which the subject performed the corresponding motor imagery task. More details about this dataset can be found at http://www.bbci.de/competition/IV/.

(2) Dataset 2 (BCI Competition III dataset IVa): The dataset was recorded from 5 healthy subjects (aa, al, av, aw, and ay) with 118 electrodes, down-sampled to 100 Hz, during right-hand and right-foot MI tasks, with a total of 280 trials per subject. The visual cue for each trial lasted 3.5 s; only the right-hand and right-foot cues were used in the competition (Novi et al., 2007). More information about this dataset can be found at http://www.bbci.de/competition/iii/.

3 Method

3.1 Data Processing

For both datasets, the EEG segment from 0.5 to 2.5 s after the cue onset was extracted from each trial (Song & Epps, 2007). The EEG data were band-pass filtered from 8 to 30 Hz with a third-order Butterworth filter (Sun et al., 2010).
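As a concrete illustration, the following minimal Python sketch performs this preprocessing on a single trial. The function name, the array layout (channels by samples), and the use of zero-phase filtering are our own assumptions rather than details given in the study.

```python
# A minimal preprocessing sketch, assuming raw_trial is a NumPy array of
# shape (n_channels, n_samples) sampled at 100 Hz (both competition datasets).
import numpy as np
from scipy.signal import butter, filtfilt

FS = 100  # sampling rate (Hz)

def preprocess_trial(raw_trial, fs=FS, band=(8.0, 30.0), t_window=(0.5, 2.5)):
    """Band-pass filter (third-order Butterworth, 8-30 Hz) and crop 0.5-2.5 s."""
    b, a = butter(N=3, Wn=[band[0] / (fs / 2), band[1] / (fs / 2)], btype="bandpass")
    filtered = filtfilt(b, a, raw_trial, axis=1)  # zero-phase filtering (assumption)
    start, stop = int(t_window[0] * fs), int(t_window[1] * fs)
    return filtered[:, start:stop]  # shape: (n_channels, 200)
```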

Fig. 1 Flow diagram of optimizing hyperparameters of Sub-ABLD with PSO

3.2 Sub-Alpha-Beta Log-Det Divergences

Sub-ABLD is an improved version of the CSP algorithm. Its main purpose is to extract the d desired spatial filters while reducing the impact of outliers in the EEG data on feature extraction. The algorithm consists of two main steps. First, a discriminative subspace for the spatial filters is obtained by a robust method; second, the EEG signal is filtered by the spatial filters obtained from this subspace and the features are extracted with the CSP algorithm. The inputs of the Sub-ABLD algorithm are the trial covariance matrices \(M_j\), \(N_j\) of the two classes, the hyperparameters \(\alpha \), \(\beta \), \(\eta \), and the number of filters. The final spatial filter matrix is obtained as follows (an illustrative code sketch of the key computations is given after the step list):

(1) Compute the prior probabilities \(p(c_1)\), \(p(c_2)\) and the average covariance matrices M, N of each class.

(2) Compute the average covariance matrix \(\mathrm {Cov}(x)\) of the whole population (both classes) and perform an eigenvalue decomposition on it:

    $$\begin{aligned} \text {Cov}(x)=p(c_1)M+p(c_2)N \end{aligned}$$
    (1)
    $$\begin{aligned} \text {Cov}(x)=U_1\Delta U_1^T \end{aligned}$$
    (2)
(3) Compute the whitening matrix T:

    $$\begin{aligned} T=\Delta ^{-\frac{1}{2}} U_1^T \end{aligned}$$
    (3)
(4) Apply the whitening transform to the trial covariance matrices of the two classes and to the average covariance matrix of each class to obtain \(\hat{M_j}\), \(\hat{N_j}\), \(\hat{M}\), \(\hat{N}\).

(5) Compute the scaling parameter k:

    $$\begin{aligned} k={\left\{ \begin{array}{ll} {k_{\text {inf}}+\varepsilon } &{} {\text {for~~} k_{\text {inf}} \ge 1} \\ {1} &{} {\text {for~~} 1 \in (k_{\text {inf}}, k_{\text {sup}})} \\ {k_{\text {sup}}-\varepsilon } &{} {\text {for~~} k_{\text {sup}} \le 1} \end{array}\right. } \end{aligned}$$
    (4)
(6) Initialize the iteration counter \(i=0\);

(7) Initialize the semi-orthogonal matrix \(\Omega ^{(0)}=I_{n\times d}\), where n is the size of the average covariance matrix of each class and d is the number of filters;

(8) Compute the robust criterion:

    $$\begin{aligned} f(\Omega ^{(i)})&=\eta {\Bigl (p(c_2)\frac{1}{N_2}\sum _{j=1}^{N_2}D_{AB}^{(\alpha , \beta )}({(\Omega ^{(i)})}^T\hat{N_j}{(\Omega ^{(i)})} || k{(\Omega ^{(i)})}^T\hat{N}{(\Omega ^{(i)})})\Bigr )}\nonumber \\&\quad -\eta {\Bigl (p(c_1)\frac{1}{N_1}\sum _{j=1}^{N_1}D_{AB}^{(\alpha , \beta )}({(\Omega ^{(i)})}^T\hat{M_j}{(\Omega ^{(i)})}||k{(\Omega ^{(i)})}^T\hat{M}{(\Omega ^{(i)})})\Bigr )}\nonumber \\&\quad +D_{AB}^{(\alpha , \beta )}({(\Omega ^{(i)})}^T\hat{M}{(\Omega ^{(i)})}||k{(\Omega ^{(i)})}^T\hat{N}{(\Omega ^{(i)})}) \end{aligned}$$
    (5)
(9) Compute the gradient \(\nabla f(\Omega ^{(i)})\), the tangent-space update \(\Omega _{tg}^{(i+1)}\) (with step size \(\mu ^{(i)}\)), and its projection \(\Omega ^{(i+1)}\) back onto the set of semi-orthogonal matrices:

    $$\begin{aligned} \nabla f(\Omega ^{(i)})=\frac{\partial f(\Omega ^{(i)})}{\partial \Omega ^{(i)}}-\Omega ^{(i)}\left( \frac{\partial f(\Omega ^{(i)})}{\partial \Omega ^{(i)}}\right) ^T\Omega ^{(i)} \end{aligned}$$
    (6)
    $$\begin{aligned} \Omega _{tg}^{(i+1)}=\Omega ^{(i)}+\mu ^{(i)} \nabla f(\Omega ^{(i)}) \end{aligned}$$
    (7)
    $$\begin{aligned} \Omega ^{(i+1)}=Q_L Q_R^T \end{aligned}$$
    (8)
    $$\begin{aligned}{}[Q_L, D, Q_R]=\mathrm {svd}(\Omega _{tg}^{(i+1)}, 0) \end{aligned}$$
    (9)
(10) Increase the iteration counter and check for convergence; if the criterion has not converged, return to step (8).

(11) Select the eigenvectors corresponding to the largest/smallest eigenvalues of the whitened pair \(((\Omega ^{(i_{\max })})^T\hat{M}\Omega ^{(i_{\max })}, (\Omega ^{(i_{\max })})^T\hat{N}\Omega ^{(i_{\max })})\), where \(i_{\max }\) is the final iteration, and collect them as the columns of V.

(12) Obtain the final spatial filter matrix \(W^T\):

    $$\begin{aligned} W^T=V^T(\Omega ^{(i_{\max })})^T T \end{aligned}$$
    (10)

After the final spatial filter matrix \(W^T\) is obtained, the raw EEG data extracted by the time window are projected through the spatial filter matrix, and the features of the two classes are constructed from the projected signals.

The number d of spatial filters is set to 6 in this study.
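To make these steps concrete, the following Python sketch implements the whitening of steps (2)-(3), the alpha-beta log-det divergence \(D_{AB}^{(\alpha , \beta )}\) appearing in Eq. (5) (computed from the generalized eigenvalues of a matrix pair, following the standard definition for symmetric positive-definite matrices with \(\alpha \ne 0\), \(\beta \ne 0\), \(\alpha +\beta \ne 0\)), the projection of Eqs. (8)-(9), and the usual CSP-style log-variance feature construction implied by the paragraph above. It is an illustrative reconstruction from the formulas in this section, with our own function names; the gradient of Eq. (5), the step-size schedule, and the convergence test are omitted, and this is not the authors' implementation.

```python
# Illustrative building blocks of Sub-ABLD, reconstructed from Eqs. (1)-(10).
import numpy as np
from scipy.linalg import eigh

def whitening_matrix(M, N, p1, p2):
    """Steps (2)-(3): whiten the composite covariance Cov(x) = p1*M + p2*N."""
    delta, U1 = np.linalg.eigh(p1 * M + p2 * N)   # Eq. (2): Cov(x) = U1 Delta U1^T
    return np.diag(delta ** -0.5) @ U1.T          # Eq. (3): T = Delta^(-1/2) U1^T

def abld_divergence(P, Q, alpha, beta):
    """Alpha-beta log-det divergence D_AB^(alpha,beta)(P || Q) for SPD P, Q,
    computed from the generalized eigenvalues of the pair (P, Q)."""
    lam = eigh(P, Q, eigvals_only=True)           # eigenvalues of P Q^{-1}
    terms = (alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta)
    return np.sum(np.log(terms)) / (alpha * beta)

def stiefel_projection(omega_tg):
    """Eqs. (8)-(9): map the tangent-space update back to a semi-orthogonal
    matrix via the thin SVD."""
    QL, _, QRt = np.linalg.svd(omega_tg, full_matrices=False)
    return QL @ QRt                               # Q_L Q_R^T

def log_variance_features(trial, W):
    """Standard CSP-style features (our assumption): project a trial through
    the spatial filters W (rows are filters) and take normalized log-variances."""
    z = W @ trial                                 # shape: (d, n_samples)
    var = np.var(z, axis=1)
    return np.log(var / var.sum())
```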

3.3 PSO

PSO is a swarm-intelligence optimization algorithm proposed by Kennedy and Eberhart that imitates the foraging behavior of bird flocks; it has been successfully applied to a wide range of optimization problems (Poli et al., 2007). The basic principle is to randomly initialize a swarm of N particles in a three-dimensional search space (one dimension for each of the hyperparameters \(\alpha \), \(\beta \), and \(\eta \)), each particle representing a feasible solution. The ith particle, denoted \(x_i=(x_{i1}, x_{i2}, x_{i3})\), is evaluated by the fitness function, which determines its individual best position \(p_i^t=(x_{i1}^t, x_{i2}^t, x_{i3}^t)\) and the global best position \(p_g^t=(x_{g1}^t, x_{g2}^t, x_{g3}^t)\). The velocity of the ith particle, \(v_i=(v_{i1}, v_{i2}, v_{i3})\), is also tracked; the position and velocity are updated as follows:

$$\begin{aligned} v_{id}^{t+1}=wv_{id}^t+c_1r_1(p_{id}^t-x_{id}^t)+c_2r_2(p_{gd}^t-x_{id}^t) \end{aligned}$$
(11)
$$\begin{aligned} x_{id}^{t+1}=x_{id}^t+v_{id}^{t+1} \end{aligned}$$
(12)

where w \((w=0.1)\) is the inertia weight proposed by Shi and Eberhart, whose value balances the particle's global and local search abilities; \(r_1\) and \(r_2\) are random numbers drawn from (0, 1); and \(c_1\) \((c_1 = 1.2)\) and \(c_2\) \((c_2 = 1.2)\) are learning factors. Particle velocities are restricted to the range \((-1, 1)\) and positions to the range \((-2, 2)\).
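As an illustration of the update rules in Eqs. (11) and (12), the following minimal Python sketch runs PSO with the constants quoted above (w = 0.1, c1 = c2 = 1.2, velocities clipped to (-1, 1), positions to (-2, 2)). The fitness function here is a placeholder; in PSO-Sub-ABLD it would be the classification accuracy obtained with Sub-ABLD for a candidate \((\alpha , \beta , \eta )\). The swarm size and iteration count are illustrative assumptions.

```python
# Minimal PSO sketch implementing Eqs. (11)-(12); maximizes the fitness.
import numpy as np

rng = np.random.default_rng(0)
W, C1, C2 = 0.1, 1.2, 1.2                    # inertia weight and learning factors
N_PARTICLES, N_DIMS, N_ITERS = 20, 3, 50     # illustrative swarm settings
X_MIN, X_MAX, V_MIN, V_MAX = -2.0, 2.0, -1.0, 1.0

def fitness(x):
    # Placeholder objective; replace with Sub-ABLD cross-validation accuracy
    # for the candidate hyperparameters x = (alpha, beta, eta).
    return -np.sum((x - 1.0) ** 2)

x = rng.uniform(X_MIN, X_MAX, (N_PARTICLES, N_DIMS))   # positions
v = rng.uniform(V_MIN, V_MAX, (N_PARTICLES, N_DIMS))   # velocities
p_best = x.copy()                                      # individual best positions
p_best_val = np.array([fitness(xi) for xi in x])
g_best = p_best[np.argmax(p_best_val)].copy()          # global best position

for _ in range(N_ITERS):
    r1, r2 = rng.random((N_PARTICLES, 1)), rng.random((N_PARTICLES, 1))
    v = W * v + C1 * r1 * (p_best - x) + C2 * r2 * (g_best - x)   # Eq. (11)
    v = np.clip(v, V_MIN, V_MAX)
    x = np.clip(x + v, X_MIN, X_MAX)                              # Eq. (12)
    vals = np.array([fitness(xi) for xi in x])
    improved = vals > p_best_val
    p_best[improved], p_best_val[improved] = x[improved], vals[improved]
    g_best = p_best[np.argmax(p_best_val)].copy()
```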

4 Results

Subject f of dataset 1 and subject av of dataset 2 were selected, and the feature distributions of CSP, Sub-ABLD, and PSO-Sub-ABLD are compared in Fig. 2. The figure clearly shows that, for both subjects, the features extracted by PSO-Sub-ABLD are easier to classify than those extracted by the other two methods.

Fig. 2 Feature distributions (subject f of dataset 1 and subject av of dataset 2) of each class extracted by CSP, Sub-ABLD, and PSO-Sub-ABLD (cyan diamonds represent the features of the left hand, and magenta circles represent the features of the right hand)

Tables 1 and 2 present the classification accuracies obtained by CSP, Sub-ABLD, and PSO-Sub-ABLD for all participants on the test sets of the two datasets. For the nine subjects in the two datasets, PSO-Sub-ABLD improves the classification accuracy compared with the other two algorithms and shows good generalization performance.

Table 1 shows that PSO-Sub-ABLD achieves better classification accuracy than the other two methods for all four subjects. The average classification accuracy obtained by PSO-Sub-ABLD (\(\alpha = 0.5\), \(\beta = 0.9\), \(\eta = 1.7\)) is 75.9%, which is 13.8% higher than that of CSP and 8.8% higher than that of Sub-ABLD.

Table 2 shows the classification accuracy for the five subjects of dataset 2. The average classification accuracy obtained by PSO-Sub-ABLD (\(\alpha = 1.4\), \(\beta = 0.6\), \(\eta = 1.9\)) is 87%, which is 7.7% higher than that of CSP and 5.4% higher than that of Sub-ABLD.

Table 1 Classification accuracy comparison of CSP, Sub-ABLD (\(\alpha = \beta =\) 1.25, \(\eta =\) 0.25), and PSO-Sub-ABLD (\(\alpha = 0.9\), \(\beta = 1.7\), \(\eta = 0.5\)) using BCI competition IV dataset I
Table 2 Classification accuracy comparison of CSP, Sub-ABLD (\(\alpha = \beta = 2\), \(\eta = 0.5\)), and PSO-Sub-ABLD (\(\alpha = 0.6\), \(\beta =1.9\), \(\eta = 1.4\)) using BCI competition III dataset IVa

5 Conclusion

In this paper, PSO-Sub-ABLD was shown to be more robust to outliers than CSP and Sub-ABLD, yielding better classification accuracy. Since the default hyperparameters of Sub-ABLD generalize poorly across datasets, this study proposed the PSO-Sub-ABLD algorithm, which shows better generalization performance. In summary, compared with CSP and Sub-ABLD, PSO-Sub-ABLD achieves better classification performance with the same classifier.

In future work, one direction to explore is combining channel selection and feature selection with PSO-Sub-ABLD to achieve better performance, and applying the method to online systems.