1 Introduction

Brain–computer interface (BCI) research has grown steadily since the 1970s. One goal of BCI research is to develop systems capable of classifying neural representations of natural movement planning and execution. Researchers aim to build systems that help disabled persons communicate with the external world and provide a non-muscular pathway between their brains and prosthetic devices such as robotic limbs, wheelchairs, or spellers. Such applications are not limited to prosthetic devices but also extend to virtual gaming, tele-operation, communication, and robotics (McFarland and Wolpaw 2008; Daly and Wolpaw 2008).

However, BCI faces several challenges: (i) Small training sets; the training sets are sometimes relatively small because the training process is constrained by usability concerns. Long training sessions are time-consuming and demanding for the subjects, yet the subject's signals recorded in the training phase are needed to train the classifier. Therefore, a significant challenge in designing a BCI is to balance the trade-off between the technological complexity of classifying the user's brain signals and the amount of training needed for successful operation of the interface. (ii) Nonlinearity; the brain is a highly complex nonlinear system in which disordered behavior of neural ensembles can be detected. Thus, EEG signals are better characterized by nonlinear dynamic methods than by linear methods. (iii) Non-stationarity and noise; the recorded signals change continuously over time, both between and within recording sessions. The mental and emotional state of the subject across sessions, fatigue, and concentration levels all contribute to EEG signal variability. Noise is also a major contributor to the non-stationarity problem, as it includes unwanted signals caused by changes in electrode placement and by environmental interference. (iv) High dimensionality; signals are recorded from multiple channels to preserve high spatial accuracy. Because the amount of data needed to properly describe different signals increases exponentially with the dimensionality of the feature vectors, various feature extraction methods have been proposed. They play an important role in identifying distinguishing characteristics, so that classifier performance depends on a small number of distinctive features rather than on the whole recorded signal, which may contain redundancy (Abdulkader et al. 2015).

The basic steps of a BCI include acquisition of brain signals, preprocessing, feature extraction, and classification. The decisions generated by the employed classifier can then be used to control an external device. The electroencephalogram (EEG) is a popular noninvasive signal acquisition technique that allows BCI systems to measure the electrical potentials of the brain at a temporal resolution on the order of milliseconds through electrodes placed on the surface of the scalp. EEG caps with 6 to 64 electrodes are most commonly used (some setups use many more electrodes, e.g., 256), so the dimension of the feature space is often very large and contains redundant features. This not only creates additional overhead in managing the space complexity but may also include outliers, thereby reducing classification accuracy (Rakotomamonjy et al. 2005).

Feature selection is a subarea of dimensionality reduction that aims to identify the best subset of features out of the original feature space. In BCI applications, principal component analysis (PCA) (Yu et al. 2014), independent component analysis (ICA) (Guo et al. 2013), sequential forward search (SFS) (Pal et al. 2014), and particle swarm optimization (PSO) (Hsu 2013) have been used for feature selection to reduce the dimensionality of the data. After feature extraction and reduction, classification algorithms serve two functions, in training and in the practical application of a BCI. During training, the task is to infer a mapping between signals and classes using the labeled feature vectors produced by the feature extraction module. During the application of the BCI, the task is to discriminate between different types of neurophysiologic signals and translate them into commands, thereby allowing control of the BCI.

A number of widely used classifiers, such as linear discriminant analysis (LDA), K nearest neighbor (KNN) algorithms, support vector machines (SVM), decision trees, the Naive Bayes (NB) classifier, and neural networks (NN) (Lotte et al. 2007), have been used as BCI classifiers. Linearity is the main limitation of LDA, which can lead to poor outcomes (Lotte et al. 2007). SVM has a slower execution speed but good generalization properties (Lotte et al. 2007). On the other hand, KNN assigns an unseen data sample to the dominant class among its K nearest neighbors in the training set. KNN may fail in some BCI experiments due to its sensitivity to the curse of dimensionality; however, it performs efficiently with low-dimensional feature sets (Lotte et al. 2007). The NB classifier is based on Bayes' theorem, with a strong assumption of independence among the features, and it is more suitable for BCI applications with a small number of trials.

This paper introduces a new fuzzy-based classification strategy (FBCS) for brain–computer interfaces. FBCS includes novel techniques for feature reduction and electrode selection to reduce the dimensionality of the data. Accordingly, both training time and response time are minimized, which makes FBCS suitable for real-time applications that need a quick response. In addition, FBCS relies on a fuzzy inference system for the classification task. To accomplish this task, a new instance of KNN, called fuzzified KNN (FKNN), is introduced. Besides its high classification accuracy, FKNN has a salient property that traditional KNN lacks: resistance to overfitting. The reason is that FKNN adds several classification heuristics beyond the K nearest neighbors, namely the distance among items in the feature space as well as the degree to which an item belongs to a class. These heuristics are merged via a fuzzy inference system, and accordingly FKNN provides accurate classification decisions. FKNN has been compared against recent classification techniques applied to BCI. Experimental results show that FKNN outperforms those techniques, giving not only the maximum classification accuracy and sensitivity but also the minimum response time. This paper is organized as follows: Section 2 gives an overview of EEG-based BCI systems and their main parts. Section 3 reviews previous work on dimensionality reduction and BCI classification techniques. Section 4 introduces the proposed fuzzy-based classification strategy (FBCS). Section 5 discusses the experimental results. Finally, the conclusion of our work is presented in Sect. 6.

2 General scheme of EEG-based BCI

Figure 1 illustrates the basic principle of an EEG-based BCI. Initially, signals from the brain are acquired. Generally, there are three methods to acquire (capture) signals that represent the electrical activity of the human brain: (i) invasive, (ii) partially invasive, and (iii) noninvasive. Invasive capture provides high-quality signal readings but causes great inconvenience and risks to human health. Partially invasive capture provides lower-quality signals with lower health risk. Noninvasive capture, on the other hand, is fully external to the body, more convenient, easy to use, provides good-quality signal capture, and presents no risk to users.

Fig. 1 General scheme of EEG-based BCI

Although there are many methods to detect brain signals, EEG acquisition systems have relatively short time constants, can function in most environments, and require relatively simple and inexpensive equipment, offering the possibility of a new non-muscular communication and control channel. The EEG signal is acquired with the help of a multi-channel headset at a certain sampling rate.

EEG (electroencephalography) is the most popular noninvasive brain signal acquisition tool, being the cheapest and simplest recording technique. However, it has a low signal-to-noise ratio (SNR) due to environmental noise and artifacts caused by muscle and eye movements. The EEG system contains electrodes, amplifiers, an A/D converter, and a recording device, which may be a personal computer or similar. The electrodes acquire the signal from the scalp; the amplifiers enlarge the amplitude of the analog EEG signals (scalp signals are in the microvolt range) so that the A/D converter can digitize the signal accurately. The recording device then stores and displays the data. The digitized signal can be analyzed to extract commands that control a computer or another device.

Applications include spelling, computer mouse control, and prosthesis or robot control. Generally, BCI can be applied in several domains: it allows paralyzed people to control prosthetic limbs with their mind; visual images can be transmitted to the mind of a blind person, allowing them to see; and auditory data can be transmitted to the mind of a deaf person, allowing them to hear. From another point of view, BCI allows gamers to control video games with their minds, and it can allow a mute person to have their thoughts displayed and spoken by a computer. Finally, feedback is provided to the user for further interaction. An improvement in just one of these steps can improve the performance of a BCI system.

3 Related work

The main target of this paper is to introduce a new classification strategy that enhances the classification performance of BCI systems using the concept of dimensionality reduction. Hence, some recent efforts in applying dimensionality reduction and classification techniques to BCI applications are presented in this section.

Principal component analysis (PCA) is a widely used linear transformation technique for dimensionality reduction. However, the projections it finds maximize variance, which is not necessarily related to classification performance (the class labels), so it is not particularly useful in classification and pattern recognition applications. Linear discriminant analysis (LDA) attempts to overcome this limitation of PCA by finding linear projections that maximize class separability under a Gaussian distribution assumption (Fukunaga 1990). The LDA projections are optimized based on the means and the covariance matrices of the classes, which are not descriptive of an arbitrary probability density function (pdf). Independent component analysis (ICA) has also been used as a tool to find linear transformations that maximize the statistical independence of random variables; however, it has similar drawbacks to PCA. Common spatial patterns (CSP) can be used instead of PCA and ICA (Naeem et al. 2009).

Atyabi et al. (2012) introduced electrode reduction (ER) and feature reduction (FR) methods based on genetic algorithms (GA) and particle swarm optimization (PSO). The evolution-based methods generate a set of indexes representing either electrode seats or feature points that maximize the output of a weak classifier, and a comparison is made between GA, PSO, and a random search algorithm as electrode and feature reduction methods. The results indicate that, on average across all subjects and across GA-based ER, GA-based FR, random-based ER, random-based FR, and PSO-based FR, electrode reduction (ER) had a greater impact on classification performance than feature reduction (FR), and the combination of polynomial SVM with GA-based ER performed better than all other methods except the combination of the full electrode set with polynomial SVM.

A sparse common spatial pattern (SCSP) algorithm was proposed in Arvaneh (2011) to select the smallest number of channels within a constraint on classification accuracy. To select channels using the SCSP method, two sparse common spatial filters corresponding to two motor imagery tasks are first obtained. After obtaining the sparse filters, channels corresponding to the zero elements in both spatial filters are discarded, and the rest are defined as the selected channels. To compare and weigh the importance of each selected channel, a ranking method was proposed: the top-ranked channels for each motor imagery task are determined from the maximum of the absolute values of the corresponding sparse spatial filter. The SCSP algorithm yielded an average improvement of 10% in classification accuracy compared to the use of three channels.

The multi-objective particle swarm optimization (MOPSO) method proposed in Hasan et al. (2009) addresses the problem of effective channel selection for brain–computer interface (BCI) systems. The proposed method was tested and compared to another search-based method, sequential floating forward search (SFFS). The results demonstrate the effectiveness of MOPSO in selecting a smaller number of channels with an insignificant sacrifice in accuracy, which is very important for building robust online BCI systems.

Muhammad et al. (2015) presented a comparison of commonly used classification algorithms with a new unsupervised learning technique for classification, namely neural-network-based self-organizing maps (SOM). SOM and the other algorithms were used to categorize the feature vectors acquired from the EEG dataset into their corresponding classes. Both the original and reduced feature sets were used for classification of motor imagery-based EEG signals, with the reduction performed by principal component analysis (PCA). The measured data showed that SOM achieves a maximum classification accuracy of 84.17% on the PCA-reduced feature set.

Nanayakkara and Sakkaff (2012) presented a new classification method closely related to the K nearest neighbor (KNN) method, named the fixed distance neighbor (FDN) classifier. For comparison purposes, the performance of the KNN and FDN methods was tested on the same feature vectors derived from EEG datasets recorded for imagery motor movement mental tasks. FDN performed slightly better than KNN for most of the datasets used in that study, indicating that FDN is a viable classification method that can be used in place of KNN in BCI systems.

The authors in [17] used a combination of bacterial foraging optimization and learning automata to determine the best subset of features from a given motor imagery electroencephalography (EEG)-based BCI dataset. They employed the discrete wavelet transform to obtain a high-dimensional feature set and classified it with a distance likelihood ratio test. This feature selector produced an accuracy of 80.291% in 216 s. On the other hand, Zanchettin et al. (2012) presented a hybrid KNN-SVM method for cursive character recognition. The main idea was to increase the K nearest neighbor recognition rate, which is sensitive to different classes with similar attributes, by using the SVM as a decision classifier: the two most frequent classes among the K nearest neighbors are determined, and the SVM decides between these two classes. The main disadvantage is the processing time.

The advantages of self-organizing map (SOM) artificial networks and KNN were explored in Silva and Del-Moral-Hernandez (2011), where KNN performs the classification and the SOM works as a preprocessor for the KNN classifier, applied to digit recognition in car plates. The main advantage of this method is that the time consumed by SOM-KNN is shorter than that consumed by KNN alone. Finally, a review of several BCI techniques for signal acquisition, preprocessing or signal enhancement, feature extraction, classification, and the control interface was given in Nicolas-Alonso and Gomez-Gil (2012), presenting their advantages, drawbacks, and latest advances.

Fig. 2 Block diagram of the feature reduction and EEG electrode selection BCI model

4 The proposed fuzzy-based classification strategy (FBCS)

This section illustrates the proposed fuzzy-based classification strategy (FBCS) in detail. The different steps of FBCS are depicted in Fig. 2. As illustrated in Fig. 2, the proposed FBCS consists of seven sequential steps, namely (i) data acquisition, (ii) preprocessing, (iii) feature extraction, (iv) feature selection, (v) dimensionality reduction, (vi) classification, and (vii) decision making for choosing a certain action. However, FBCS mainly focuses on: (iv) feature selection, to acquire a set of compact and informative features, (v) dimensionality reduction, to minimize processing time, and (vi) classification, to take the corresponding precise decisions. We claim that giving more attention to these steps will not only improve the performance of the BCI system but also greatly reduce its computational load. The next subsections explain these steps in more detail.

4.1 Data acquisition

Data acquisition can be accomplished through an EEG cap. Figure 3 shows a general view of an EEG cap fitted with electrodes. These electrodes are placed according to the standard 10/20 electrode placement system. An EEG cap with 22 electrodes at a 250 Hz sampling rate from Dataset 2a of BCI competition IV, provided by the BCI research group at Graz University (Brunner et al. 2008), and an EEG cap with 118 electrodes at a 1000 Hz sampling rate from Dataset IVa of BCI competition III (http://www.bbci.de/competition/iii/desc_IVa.html) are used as data acquisition components to evaluate the proposed FBCS.

Fig. 3 An EEG cap fitted with electrodes, used in BCI data acquisition

4.2 Preprocessing

Generally, the purpose of signal preprocessing is to enhance the signal produced by the EEG. Unfortunately, EEG recordings are highly challenging to evaluate due to the noise recorded together with the EEG signal, non-stationarity, and diverse artifacts. Artifacts are irrelevant, unwanted signals present in the BCI system. They have various origins, including power line (utility frequency) noise, body movements, and eye blinks. As the noise amplitude is usually larger than the signal of interest, the goal of preprocessing is to increase the signal-to-noise ratio (SNR) of the signal acquired from the EEG headset (Mallick and Kapgate 2015). Figure 4 shows the original signal before and after filtering to illustrate the effect of noise.

The digital EEG signal is stored electronically and can be filtered. Filtering can be applied either in the frequency domain, by selecting different pass bands, or in the spatial domain. Frequency filtering removes noise, for example by filtering out the direct current component and high-frequency noise (keeping 1–45 Hz). Frequency filtering can also select relevant frequency components, such as the sensorimotor (mu) rhythm at 8–12 Hz. The goal of spatial filtering is to create a subset of EEG channels related to a certain brain activity and to enhance the separability of the data; the choice of spatial filter can affect the SNR greatly. Bipolar derivation, Laplacian derivation, principal component analysis (PCA), independent component analysis (ICA), and common spatial patterns (CSP) analysis are alternative methods for deriving weights for a linear combination of EEG channels (Jung et al. 2000). In FBCS, EEG signals were band-pass filtered from 8 to 30 Hz, covering the mu (8–13 Hz) and beta (13–30 Hz) rhythms, which are used for classifying motor imagery data.
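
As an illustration of this preprocessing step, the following minimal Python sketch applies an 8–30 Hz band-pass filter to multi-channel EEG. The fourth-order Butterworth design and zero-phase filtering are assumptions, since the paper does not specify the filter implementation.

import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_eeg(eeg, fs, low=8.0, high=30.0, order=4):
    """Band-pass filter EEG of shape (channels, samples) sampled at fs Hz."""
    nyq = 0.5 * fs
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    # Zero-phase filtering avoids shifting the EEG waveforms in time.
    return filtfilt(b, a, eeg, axis=-1)

# Example: 22 channels, 6 s of data at 250 Hz (as in Dataset 2a).
filtered = bandpass_eeg(np.random.randn(22, 6 * 250), fs=250)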

Fig. 4 EEG produced signals

4.3 Feature extraction

The goal of feature extraction is to represent the characteristics of the original signal without unwanted redundancy. Features can be extracted from the EEG signal in two different domains: time domain features (TDF) and frequency domain features (FDF).

Unlike the Fourier transform, which provides frequency domain analysis at a constant resolution on the frequency scale, the discrete wavelet transform (DWT) provides frequency domain as well as time domain analysis at multiple resolutions. Frequency domain analysis is mainly based on the power and coherence of each frequency band in the EEG signals, with spectral power estimation as its primary means. Time domain analysis mainly examines geometric properties of the EEG waveforms, such as amplitude, mean, and variance; it is widely used by EEG researchers for its intuitiveness and clear physical meaning (Zhao et al. 2015).

In this paper, the DWT is used. Signals are passed through filters with different cutoff frequencies at different scales; the number of filter stages (decomposition levels) depends on the required resolution. The feature vector is built from the detail coefficients of the third and fourth levels (D3 and D4) for each electrode, because these levels contain information in the frequency ranges of 8–12 and 16–24 Hz. Considering a headset of 14 electrodes as an example with \(S=84\) samples per electrode, the resulting feature matrix has dimension \(14\,\times \,84\), i.e., \(M=1176\) elements.
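
The following minimal Python sketch illustrates this feature extraction step: each electrode's signal is decomposed with the DWT and only the D3 and D4 detail coefficients are kept. The 'db4' mother wavelet and the signal length are illustrative assumptions; the paper does not name the wavelet.

import numpy as np
import pywt

def dwt_features(eeg, wavelet="db4", levels=4):
    """eeg: array (electrodes, samples); returns one feature row per electrode."""
    rows = []
    for channel in eeg:
        coeffs = pywt.wavedec(channel, wavelet, level=levels)
        # coeffs = [A4, D4, D3, D2, D1]; keep only D3 and D4.
        d4, d3 = coeffs[1], coeffs[2]
        rows.append(np.concatenate([d3, d4]))
    return np.vstack(rows)

features = dwt_features(np.random.randn(14, 640))  # 14-electrode illustration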

Fig. 5 Proposed feature reduction methodology

4.4 Feature reduction

Feature reduction methods aim to identify a subset of 'meaningful' features out of the original set of features. Feature reduction has several advantages, such as (i) avoiding overfitting, because the classification model is trained with the most precise and informative features, (ii) improving performance, and (iii) minimizing processing time, which makes the model more suitable for real-time applications (Saleh et al. 2016; Saleh and Abulwafa 2017). Generally, feature reduction methods can be subdivided into filter, wrapper, and embedded methods (Saleh and Abulwafa 2017). Filter methods compute a score for each feature based on its information content and then select only the features with the best scores. Wrapper methods train a predictive model on subsets of features and select the subset that gives the best accuracy. Finally, embedded methods determine the optimal subset of features directly from the trained weights of the classification method.

In this paper, we propose a new filter approach for feature reduction; the goal is to determine a single feature value for the repeated values of the same element across trials. For n trials, there are n feature matrices, i.e., n values for each element of the feature matrix of dimension \(M(E\,\times \,S)\); for the above-mentioned example, the matrix M of dimension (\(14\,\times \,84\)) is repeated n times for the same action. We therefore propose a feature reduction phase that yields only one nearly equivalent value for each element.

As shown in Fig. 5, the value of the feature \(x_{i,j}\) theoretically remains constant over the n trials of the same action. However, due to the effect of many factors, such as personal feelings, fatigue, happiness, or sadness, the value of any element \(x_{i,j}\) may differ from trial to trial. By following the next steps, a single value is obtained for each element, constructing one feature vector for the n trials of one action.

Fig. 6 Projecting the considered values on the numbering axis

For each element in the matrix \((x_{i,j})\), the following steps should be followed:

  • Step 1 represent the n values of the element \((x_{i,j})\) on the linear axis shown in Fig. 6 as \(x_{1},x_{2} ,{\ldots },x_{n}\), which are normally assumed to be nearly identical.

  • Step 2 calculate the average value \(\mu \) of the n values of the element \(x_{i,j}\), as:

    $$\begin{aligned} {\mu }=\frac{\mathop \sum \nolimits _{i=1}^n x_{i}}{n} \end{aligned}$$
    (1)
  • Step 3 find the set of values of x in the neighborhood of \(\mu \) after determining the neighborhood width as:

    $$\begin{aligned} \hbox {NW}= \frac{{X_\mathrm{max}}-{X_\mathrm{min}}}{2} \end{aligned}$$
    (2)

    Then select the set

    $$\begin{aligned} S_{1}=\{x_{i} \mid \mu - \hbox {NW}< x_{i}<\mu + \hbox {NW}\} \end{aligned}$$
    (3)
  • Step 4 repeat Step 2 and Step 3 up to a pre-defined number of times \((\xi )\), taking the approximate value of the element \(x_{i,j}\) as the average of the items \(\in {S}_{\xi }\), or stopping earlier if a single value of \(x_{i,j}\) is reached before the \((\xi )\) iterations are completed.

These steps are followed for all elements, so that after n trials the data are reduced to a single matrix of M elements. For illustration, suppose the system has 20 trials using 14 EEG electrodes and each electrode's signal has been sampled to 84 samples; there are then 20 feature matrices, each of dimension 14\(\,\times \,\)84 (1176 elements), and within each matrix each element \( x_{i,j} \) has 20 theoretically identical values. Table 1 shows an example of the \(x_{i,j}\) values from the 20 trials. The goal is to obtain a single feature matrix with a definite value for each element \(x_{i,j}\). First, the average of the given 20 values is determined, which equals 0.537. Then, the neighborhood width defined in Eq. (2) is calculated, which equals 0.395. Using Eq. (3), a list \(S_{1}\) of the given feature values lying inside the selected range is picked. Repeating the same procedure on \(S_{1}\), a new set of feature values \(S_{2}\), with fewer items than \(S_{1}\), is obtained. Assuming \(\xi =10\), the value of \(x_{i,j}\) is represented by 0.44 from \(S_{8}\) after 8 iterations. Repeating this procedure for the remaining elements of the feature matrix, the result is a single matrix representing the 20 trials after removing the outlier values.
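
A minimal Python sketch of Steps 1–4 for a single element is given below; the function and variable names are illustrative and do not appear in the paper.

import numpy as np

def reduce_element(values, xi=10):
    """values: the n trial values of one feature element; returns one value."""
    s = np.asarray(values, dtype=float)
    for _ in range(xi):
        if s.size <= 1:
            break
        mu = s.mean()                               # Step 2: average of current set
        nw = (s.max() - s.min()) / 2.0              # Eq. (2): neighborhood width
        kept = s[(s > mu - nw) & (s < mu + nw)]     # Eq. (3): neighborhood of the mean
        if kept.size == 0 or kept.size == s.size:
            break                                   # no further shrinkage possible
        s = kept
    return s.mean()                                 # Step 4: representative value

print(reduce_element(np.random.normal(0.5, 0.1, size=20)))  # 20-trial illustration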

Table 1 An example of the proposed feature reduction method

4.5 Electrode selection

Electrode selection focuses on using the electrodes and scalp locations that best represent the subject's intention and contribute most to classification accuracy. Different subjects may react differently to the tasks, and the optimal electrode set for specific tasks may also vary among subjects. In this paper, a new approach for electrode selection is introduced. Semantic analysis (Ogiela and Ogiela 2012) is used to determine the most informative set of electrodes for better classification accuracy and to eliminate the electrodes that may harm the average accuracy of a multi-action BCI system.

To reduce the dimensions of the feature matrix \(F_{E \times S}\), in which E is the number of electrodes (14 in the above example) and S is the number of samples (84 in the above example), the proposed strategy chooses the most informative set of electrodes IS with respect to their effect on the classification accuracy of the underlying action.

Initially, a simple classifier such as the extreme learning machine (ELM) (Geetha and Geethalakshmi 2011) is used to determine the accuracy for a certain action \(A_{1}\), i.e., the classification accuracy for the input matrix \(\hbox {F}_{\mathrm{ExS}}\) (e.g., \(\hbox {F}_{14 \times 84})\), considering the features from all electrodes. This accuracy is denoted Acc and is calculated by Eq. (4).

$$\begin{aligned} {\text {ACC}}=\frac{\sum _{i=1}^M {n_{{\textit{ii}}}}}{N} \end{aligned}$$
(4)

where the numerator represents the number of correctly classified samples and the denominator N represents the total number of samples. Then, the classification accuracy when using only one electrode \(E_{i}\) is calculated; if the accuracy decreases, \(E_{i}\) is added to the bad-effect set of electrodes, denoted B, which reduces the accuracy, whereas if using the features of \(E_{i}\) increases the accuracy, it is added to the informative set of electrodes IS. After that, the classification accuracy when using the two electrodes \(E_{i}\) and \(E_{i+1}\) is determined; if the accuracy decreases, the electrode \(E_{i+1}\) is added to the bad-effect set B and the classification accuracy using another electrode \(E_{i+2}\) together with \(E_{i}\) is determined; otherwise, \(E_{i+1}\) is added to the informative set IS. This is repeated for all remaining electrodes. Finally, there are two sets of electrodes representing the informative electrodes IS and the bad-effect electrodes B for a certain action. Hence, each action has a set of electrodes E (e.g., \(E=\{E_{1},E_{2},E_{3} ,{\ldots }, E_{n}\}\)) and a most informative selected set of electrodes \({\text {IS}}\subset E\); the remaining electrodes are considered the bad-effect set for that action, \(B=E-{\text {IS}}\). The bad effect (BE) on the accuracy can be determined as the difference between the best accuracy and the accuracy obtained when each electrode of the bad-effect set is used. Similarly, for every informative electrode, its goodness effect (G) can be determined as the difference between the accuracy of the system without using this electrode and the accuracy after using it.
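
A minimal Python sketch of this greedy electrode evaluation for a single action is given below. A scikit-learn KNN classifier with cross-validation stands in for the ELM classifier used in the paper, which is an assumption; only the accept/reject logic follows the description above.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def select_electrodes(X, y, clf=None):
    """X: array (trials, electrodes, samples); y: action label per trial."""
    clf = clf or KNeighborsClassifier()

    def accuracy(electrodes):
        feats = X[:, electrodes, :].reshape(len(X), -1)
        return cross_val_score(clf, feats, y, cv=5).mean()

    all_e = list(range(X.shape[1]))
    best = accuracy(all_e)             # baseline: accuracy with all electrodes
    informative, bad = [], []
    for e in all_e:
        acc = accuracy(informative + [e])
        if acc > best:                 # electrode improves accuracy -> informative set IS
            informative.append(e)
            best = acc
        else:                          # electrode hurts accuracy -> bad-effect set B
            bad.append(e)
    return informative, bad, best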


The above-mentioned procedure determines the mutual effect of the system's electrodes on the accuracy of recognizing each action and should be repeated for the remaining actions (e.g., if the system is designed to classify four actions, the same procedure is repeated for \(A_{2}, A_{3}, A_{4})\). Considering four actions, the result is the sets of electrodes that improve the accuracy of each action, \({\text {IS}}_{1}, {\text {IS}}_{2}, {\text {IS}}_{3},{\text {IS}}_{4}\), and the sets of bad-effect electrodes \(B_{1}, B_{2}, B_{3}, B_{4}\). Furthermore, since the system is designed to classify several actions, the IS and B sets may differ according to the underlying action.

To solve this issue, we select the common set of electrodes that guarantees the maximum classification accuracy of the whole classification system (i.e., over all considered actions). Electing the most suitable set of electrodes is a real challenge; it is accomplished here through a rule inference methodology and a semantic analysis algorithm. The importance of each electrode to the whole system can be determined by analyzing the contents of all the resulting sets semantically and answering the following questions:

  • If this electrode exists in all informative sets for all actions:

    • \(\checkmark \) So it can be considered as an informative electrode for all the system.

  • If this electrode exists in a pre-defined number of informative sets for some actions and has only a small bad effect on the accuracy of the other actions:

    • \(\checkmark \) So it can be considered as an informative electrode for all the system.

  • If this electrode exists in all bad effect sets for all actions:

    • \(\checkmark \) So it can be considered as a bad effect electrode for all the system.

  • If this electrode exists in a pre-defined number of informative sets for some actions but has a large, non-negligible bad effect on the accuracy of the other actions:

    • \(\checkmark \) So it can be considered as a bad effect electrode for all the system.

  • Otherwise, a comparison between the goodness effect of this electrode on some actions and its bad-effect degree on the other actions is made to determine its overall effect on all the system's actions and its importance for the system.

Hence, a set of election rules can be derived and then applied whenever needed to elect the most suitable set of electrodes to represent the data for the whole classification system. This reduces the dimension of the employed dataset as well as the response time of the classification system. The election rules are listed in Table 2, and the electrode election methodology is given in Algorithm 2.
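
The following minimal Python sketch illustrates how such election rules can be applied to the per-action sets; the data layout, threshold logic, and function name are illustrative assumptions rather than the exact rules of Table 2.

def elect_common_electrodes(IS, B, G, BE, electrodes):
    """IS, B: dicts action -> set of electrodes; G, BE: dicts (action, electrode) -> score."""
    selected = []
    for e in electrodes:
        good_for = [a for a in IS if e in IS[a]]
        bad_for = [a for a in B if e in B[a]]
        if not bad_for:                      # informative for every action -> keep
            selected.append(e)
        elif not good_for:                   # bad effect for every action -> discard
            continue
        else:
            # Mixed case: weigh total goodness against total bad effect.
            gain = sum(G.get((a, e), 0.0) for a in good_for)
            loss = sum(BE.get((a, e), 0.0) for a in bad_for)
            if gain > loss:
                selected.append(e)
    return selected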

Algorithm 2 Electrode election methodology
Table 2 Rules of electrode election methodology
Table 3 Selection for the most informative set of electrodes for the first action
Table 4 Effect (G—goodness/BE—bad effect) of each electrodes for the first action

Illustrative example

For illustration, consider a system that uses 6 EEG electrodes to classify four actions, with a trial duration of 6 s and each electrode's signal sampled at 5 samples/s. There are then four feature matrices, each of dimension 6\(\,\times \,\)30, i.e., 180 elements per matrix. The goal is to distinguish the most informative set of electrodes for the 4 actions. The extreme learning machine (ELM) classifier (Geetha and Geethalakshmi 2011) is used as a simple learning algorithm on the training set to select the set of electrodes that achieves the best accuracy.

As illustrated in Table 3, the classification accuracy using all electrodes for the first action is 89.1%. It is slightly improved when using only features from \(E_{1}\), and adding the features from \(E_{2}\) to those from \(E_{1}\) gives another slight improvement. Repeating the calculation of the classification accuracy after adding new electrodes to the previous set shows that the system achieves its maximum classification accuracy of 91.5% using the set of electrodes \({\text {IS}}_{1}= \{E_{1},E_{2},E_{4},E_{5}\}\), while adding electrodes \(E_{3}\) and \(E_{6}\) reduces the classification accuracy. Accordingly, they constitute the bad-effect set, i.e., \(B_{1}=\{E_{3},E_{6}\}\).

From Table 3, it is easy to determine the goodness (G) or bad effect (BE) of each electrode according to the promotion or demotion of the system's performance (i.e., classification accuracy), as depicted in Table 4. The same procedure is repeated to determine the most informative and the bad-effect sets of electrodes for the other three actions; the results are presented in Table 4.

Table 5 Selected electrodes on the four actions

It is clear from Table 5 that \(E_{1}\) and \(E_{5}\) are good electrodes for all actions; hence, they should be used in the classification of the four actions. On the other hand, \(E_{3}\) is a common bad-effect electrode for all actions, so it is discarded with no effect on the classification accuracy. \(E_{2}\) is informative for three actions and has only a small bad effect on \(A_{2}\), so it can be retained as an informative electrode for the whole system. Each of the electrodes \(E_{4}\) and \({E}_{6}\) has a bad effect on two of the four actions, so their goodness and bad effects are compared; from Table 4, it can be concluded that their goodness effect is slight while their bad effect cannot be neglected, hence the decision is to discard them. Finally, the common informative set of electrodes for the given classification system consists of only \(\{E_{1},E_{2},E_{5}\}\) instead of the six electrodes. Accordingly, the input feature matrix for this classification system has dimension 3\(\,\times \,\)30 instead of 6\(\,\times \,\)30, which reduces the number of input elements to half of the original data.

4.6 Classification and decision making

The performance of a BCI is measured by its classification accuracy (Aydemir and Kayikcioglu 2013). To guarantee online operation, the classifier must be fast enough to perform real-time classification of the EEG signals. Accordingly, several issues must be considered carefully: (i) classification should be robust with respect to outliers, since neurophysiologic signals may contain several outliers as well as artifacts; (ii) the employed classification technique should have as low a computational complexity as possible, since data in a BCI system must be processed in real time; and (iii) classification should provide confidence (prediction) levels as a natural basis for combining information obtained from different sources.

The K nearest neighbors (KNN) classifier is a classical supervised method in the field of machine learning. It is based on statistical data and is widely used in many areas such as text classification, pattern recognition, and image processing. The decision rule of the KNN algorithm is to find the K nearest (most similar) training samples in the feature space and then assign the test sample to the majority class among its K nearest neighbors (Zhao and Chen 2016). KNN performance depends on two factors: (i) the assigned value of K, which represents the number of considered neighbors, and (ii) the employed distance metric. The most commonly used distance between a test sample and a specified training sample is the Euclidean distance (Zhao and Chen 2016), which can be computed by Eq. (5).

$$\begin{aligned} {\text {Dist}(X, Y)} = \sqrt{\mathop \sum \nolimits _{{i=1}}^{n} \left( {{x}_{{i}} -{y}_{i}} \right) ^{2}} \end{aligned}$$
(5)

where Dist(X, Y) is the Euclidean distance between a test sample X and a specified training sample Y with features \((1,2,\ldots ,n)\), \(x_{i}\) represents the features of the test sample X, \(y_{i}\) represents the features of the specified training sample Y, and n is the total number of features.

The main advantage of KNN is that it can easily deal with problems in which the number of classes is more than two. In addition, KNN allows adding examples to the training dataset without retraining the classifier. The work in this paper extends the main concept of KNN to choose the appropriate action. Accordingly, a new instance of KNN with enhanced characteristics is produced via a set of additional parameters, which are introduced through the following definitions.

Definition 1

Distance To Center (DTC) \({\text {DTC}}_{i}\) is defined as the distance from the testing item to the center of the class corresponding to the ith action.

Definition 2

Inverse Belonging Degree (IBD) \(\text {IBD}_{i}\) is defined as the average distance from the K nearest neighbors of the testing item to the center of the class corresponding to the ith action.

Definition 3

Number of the nearest neighbors (NNN) \(\text {NNN}_i\) is defined as the number of nearest neighbors of the class corresponding to the ith action for the testing item.

As depicted in the above definitions, one of these additional parameters is the distance between the input test signal and the center point of each action's training data, \({\text {DTC}}_{i}\) (where i is the corresponding action's identifier). Moreover, the average distance between each of the selected nearest neighbor points in the training dataset related to a certain action and the center point of that action is considered as a new parameter called the inverse belonging degree \(({\text {IBD}}_{i})\). The inverse belonging degree affects the choice of the appropriate action because it helps to remove the outliers with the highest \({\text {IBD}}_{i}\). A fuzzy inference system is used to combine these parameters to formulate the suitable decision; hence, the new classification strategy is called the fuzzy-based classification strategy (FBCS). Initially, during the training phase, the centers of the underlying classes are calculated, denoted \(c_{1}, c_{2}, c_{3}, c_{4}\) for the four considered actions \((A_{1}, A_{2}, A_{3},{\textit{ and }}A_{4})\), as depicted in Fig. 7.

During the testing phase, the distances between the unknown (unclassified) input item, expressed as a feature matrix, and the pre-classified training items are determined first, and the K nearest training items are selected. Then the pre-mentioned parameters are calculated to classify the input item, namely the number of nearest neighbors (NNN), the distance to center (DTC), and the inverse belonging degree (IBD). Generally, the larger the number of nearest items related to the ith action (\({\text {NNN}}_{i}\)), the higher the probability that the unknown item is associated with that action. Likewise, the smaller the distance from the tested item to the center of the ith class (\({\text {DTC}}_{i}\)), the higher the probability that the unclassified item is related to that action. Moreover, the inverse belonging degree of the input item to the ith class, which is the average distance between the selected nearest neighboring points of the training dataset related to that action and the center point of that action's training set, is calculated by Eq. (6). The smaller the inverse belonging degree to the ith class (\({\text {IBD}}_{i}\)), the higher the probability that the input item is related to that class (action).

$$\begin{aligned} {\text {IBD}}_{i}=\frac{\sum _{j=1}^{{\text {NNN}}_{i}} {\text {ds}_{j}}}{{\text {NNN}}_{i}} \end{aligned}$$
(6)

where \({\text {NNN}}_{i}\) is the number of selected nearest points associated with the action \(A_{i}\), and \({\text {ds}}_{j}\) is the distance between each selected nearest training point associated with the action \(A_{i}\) and the center point of the training set for that action. The three parameters \({\text {NNN}}_{i}\), \({\text {DTC}}_{i}\), and \({\text {IBD}}_{i}\) for each action i are treated as three different fuzzy sets. A proposed fuzzy inference system is then employed to predict the weight of each action \((W_{i})\), i.e., how appropriate it is for the unknown input features. The fuzzy inference system is realized through three steps, namely (i) fuzzification of the inputs, (ii) fuzzy rule induction, and finally (iii) defuzzification.
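
The following minimal Python sketch computes the three fuzzy inputs for a test item, following Definitions 1–3 and Eqs. (5)–(6); the variable names and the use of NumPy are illustrative assumptions.

import numpy as np

def fuzzy_inputs(x, X_train, y_train, k=10):
    """Return {action: (NNN, DTC, IBD)} for a test item x."""
    centers = {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}
    dists = np.linalg.norm(X_train - x, axis=1)       # Euclidean distance, Eq. (5)
    neighbours = np.argsort(dists)[:k]                # K nearest training items
    params = {}
    for c, center in centers.items():
        idx = neighbours[y_train[neighbours] == c]    # neighbours belonging to class c
        nnn = len(idx)                                # Definition 3
        dtc = np.linalg.norm(x - center)              # Definition 1
        # Definition 2 / Eq. (6): mean distance of those neighbours to the class centre.
        ibd = np.linalg.norm(X_train[idx] - center, axis=1).mean() if nnn else np.inf
        params[c] = (nnn, dtc, ibd)
    return params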

Fig. 7 An example of a new unknown feature vector to be classified in a system with training features for four actions, together with the proposed fuzzy parameters

Fig. 8 Membership functions for the considered fuzzy sets

(i) Fuzzification

Generally, \({\text {NNN}}_{i}\), \({\text {DTC}}_{i}\), and \({\text {IBD}}_{i}\) are considered as three different fuzzy sets. During the fuzzification step, the inputs are transformed into degrees of membership in the linguistic terms 'low' and 'high' of the corresponding fuzzy set. A membership function then provides the similarity degree of the considered input to the corresponding fuzzy set; the result is a value between 0.0 (non-membership) and 1.0 (full membership). The membership functions used for the considered fuzzy sets are depicted in Fig. 8, and the values of \(\alpha \) and \(\beta \), assuming \(K=10\), are given in Table 6.
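
A minimal sketch of the fuzzification step is given below; the exact membership shapes of Fig. 8 are not reproduced, and a simple linear transition between the thresholds \(\alpha \) and \(\beta \) is assumed.

def fuzzify(value, alpha, beta):
    """Return membership degrees (mu_low, mu_high) of `value`, each in [0, 1]."""
    if value <= alpha:
        mu_high = 0.0
    elif value >= beta:
        mu_high = 1.0
    else:
        mu_high = (value - alpha) / (beta - alpha)   # assumed linear transition
    return 1.0 - mu_high, mu_high

# Example: fuzzify NNN_i = 4 with hypothetical thresholds alpha = 2, beta = 8.
print(fuzzify(4, alpha=2, beta=8))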

(ii) Fuzzy rule induction

After fuzzification, the result is introduced as the input to the fuzzy rule base. The considered rules are of the form: if (A is x) AND (B is y) AND (C is z) ... THEN (O is m), where A, B, and C represent the input variables (i.e., \({\text {NNN}}_{i}\), \({\text {DTC}}_{i}\), and \({\text {IBD}}_{i})\), x, y, and z represent the corresponding linguistic terms ('Low' or 'High'), O represents the rule output, and m represents the corresponding linguistic term ('Low', 'Medium', or 'High'). There are 8 rules, listed in Table 7 (where L refers to 'Low,' H to 'High,' and M to 'Medium'). For clarification, the first rule in Table 7 indicates that if NNN(action i) is Low AND DTC(action i) is Low AND IBD(action i) is Low THEN the Output is Medium.

As illustrated in Giarratano and Riley (2004), four methods for fuzzy rule inference are available: max–min, max-product, sum-dot, and drastic product. The max–min method is used in this paper; it applies the min operator for the conjunction in the rule premise and for the implication function, while the max operator is used for aggregation. Consider the case of two items of evidence per rule.


Thus, the max–min composition inference rule would be:

$$\begin{aligned} \mu _Y= & {} \overbrace{\text {max}}^{\text {aggregation}} \left[ \underbrace{\text {min}}_{\text {implication}} \left( {\mu _{X_{j1} } ,\mu _{X_{j2} } } \right) \,\forall \,j\in \left\{ {1,2,3,\ldots ,N} \right\} \right] \nonumber \\ \end{aligned}$$
(7)

This produces

$$\begin{aligned} \mu _Y= & {} {\text {max}}\left[ {\text {min}}\left( {\mu _{X_{11} } ,\mu _{X_{12} }} \right) ,\,{\text {min}}\left( {\mu _{X_{21} } ,\mu _{X_{22} } } \right) ,\,\ldots \ldots ,\right. \nonumber \\&\,\left. \qquad {\text {min}}\left( {\mu _{X_{N1} } ,\mu _{X_{N2} } } \right) \right] \end{aligned}$$
(8)

where \(\mu _{x}\) is the value of membership function associated with each fuzzy parameter, N is the number of Fuzzy rules, and \(\mu _{Y}\) represents the output membership value of fuzzy rule induction step.
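
The max–min composition of Eq. (8) can be sketched as follows; the rule encoding is an illustrative assumption, loosely following Table 7.

def max_min_inference(rules, memberships):
    """rules: list of (antecedent_terms, output_term); memberships: dict
    (variable, term) -> degree. Returns output_term -> aggregated degree."""
    out = {}
    for antecedents, output_term in rules:
        strength = min(memberships[a] for a in antecedents)          # implication (min)
        out[output_term] = max(out.get(output_term, 0.0), strength)  # aggregation (max)
    return out

# First rule of Table 7: NNN low AND DTC low AND IBD low -> output Medium.
rules = [((("NNN", "L"), ("DTC", "L"), ("IBD", "L")), "M")]
memberships = {("NNN", "L"): 0.7, ("DTC", "L"): 0.4, ("IBD", "L"): 0.9}
print(max_min_inference(rules, memberships))   # {'M': 0.4}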

(iii) Defuzzification

The output of the fuzzy rules, after applying them to the fuzzified inputs, is then defuzzified. Defuzzification is a transformation from a space of fuzzy actions into a space of non-fuzzy (crisp) ones. As depicted in Saleh et al. (2015), the most commonly used defuzzification techniques are the max-criterion, the mean of maxima, and the center of gravity (COG). In COG, the weighted average of the area bounded by the membership function curve is computed as the crisp value of the fuzzy quantity, as illustrated in Eq. (9). Defuzzification is accomplished using the output membership function illustrated in Fig. 9.

$$\begin{aligned} {\text {COG}}=\frac{\sum _i \mu \left( W_{i} \right) *W_{i}}{\sum _i \mu \left( W_{i} \right) } \end{aligned}$$
(9)

So, the fuzzified KNN classifier algorithm can be written as the following steps:

  • Determine the values of the three fuzzy input parameters \({\text {NNN}}_{i}\), \({\text {DTC}}_{i}\), and \({\text {IBD}}_{i}\) for each action i.

  • Apply the fuzzy rules to them, then determine the output membership values \((\mu _{\text {Low}}, \mu _{{\text {High}}}, \mu _{\text {Medium}})\) according to Eq. (8).

  • Plot the output membership values \((\mu _{{\text {Low}}}, \mu _{{\text {High}}}, \mu _{{\text {Medium}}})\) on the output membership function graph (Fig. 9).

  • Finally, determine the area under the resulting curve to obtain the weight of each action as a candidate for the unknown input feature according to Eq. (9), and select the action with the highest weight (a sketch of these steps is given after this list).
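
A minimal Python sketch of the defuzzification and action-selection steps is given below. The triangular output sets and their centers are assumptions, since the exact output membership function of Fig. 9 is not reproduced, and the sampled-area computation approximates the COG of Eq. (9).

import numpy as np

def centre_of_gravity(mu_low, mu_medium, mu_high, n=201):
    w = np.linspace(0.0, 1.0, n)                     # candidate output weights W_i
    # Assumed triangular output sets centred at 0.0, 0.5 and 1.0.
    low = np.clip(1.0 - 2.0 * w, 0.0, 1.0)
    med = np.clip(1.0 - 2.0 * np.abs(w - 0.5), 0.0, 1.0)
    high = np.clip(2.0 * w - 1.0, 0.0, 1.0)
    # Clip each output set by its inferred degree and aggregate with max.
    agg = np.maximum.reduce([np.minimum(low, mu_low),
                             np.minimum(med, mu_medium),
                             np.minimum(high, mu_high)])
    return float((agg * w).sum() / max(agg.sum(), 1e-12))   # Eq. (9)

# The action whose defuzzified weight is largest is the classifier's decision.
weights = {"A1": centre_of_gravity(0.1, 0.6, 0.3), "A2": centre_of_gravity(0.5, 0.4, 0.0)}
print(max(weights, key=weights.get))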

Table 6 Assigned values of \(\upalpha \) and \(\upbeta \)
Table 7 Used fuzzy rules for the fuzzified KNN classifier

5 Results and discussion

As the proposed fuzzy-based classification strategy (FBCS) mainly contributes feature reduction, to acquire a set of compact and informative features, electrode selection, to minimize processing time, and classification, to take the corresponding precise decisions, this section evaluates the proposed FBCS against previously published approaches for feature reduction, electrode selection, and classification in BCI. Informedness (Powers 2003), which was also used to evaluate feature selection and electrode reduction techniques in Atyabi et al. (2012), is used to assess performance; it is more informative than accuracy because it takes both sensitivity and specificity into account, as shown in Eq. (10).

$$\begin{aligned} \hbox {Informedness}=\frac{{\text {TP}}}{{\text {TP}}+{\text {FN}}} +\frac{{\text {TN}}}{{\text {TN}}+{\text {FP}}}-1 \end{aligned}$$
(10)

where TP is the number of true positives (positive samples correctly predicted), TN is the number of true negatives (negative samples correctly predicted), FP is the number of false positives (negative samples incorrectly predicted as positive), and FN is the number of false negatives (positive samples incorrectly predicted as negative).

Fig. 9 Output membership function for defuzzification

Fig. 10 Time line of EEG signal acquisition

Fig. 11 Average informedness result achieved on the testing set with SVM using feature reduction techniques

Moreover, the performance metrics (i) classification accuracy (CA), (ii) sensitivity (SE), (iii) specificity (SP), and (iv) computational time (CT) are also used in the comparison of the classifiers, as in Geetha and Geethalakshmi (2011); their calculating equations are given in Eqs. (11)–(13).

$$\begin{aligned}&\hbox {CA}=\frac{{\text {TP}}+{\text {TN}}}{{\text {TP}}+{\text {TN}}+{\text {FP}}+{\text {FN}}}\times 100 \end{aligned}$$
(11)
$$\begin{aligned}&\hbox {SE}=\frac{{\text {TP}}}{{\text {TP}}+{\text {FN}}}\times 100 \end{aligned}$$
(12)
$$\begin{aligned}&\hbox {SP}=\frac{{\text {TN}}}{{\text {TN}}+{\text {FP}}}\times 100 \end{aligned}$$
(13)
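
For reference, the following minimal Python sketch computes the metrics of Eqs. (10)–(13) from the binary confusion counts; the example counts are made up.

def bci_metrics(tp, tn, fp, fn):
    informedness = tp / (tp + fn) + tn / (tn + fp) - 1   # Eq. (10)
    ca = (tp + tn) / (tp + tn + fp + fn) * 100           # Eq. (11)
    se = tp / (tp + fn) * 100                            # Eq. (12)
    sp = tn / (tn + fp) * 100                            # Eq. (13)
    return informedness, ca, se, sp

print(bci_metrics(tp=40, tn=35, fp=5, fn=10))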

Higher accuracies reflect better decoding (prediction) of class information from the EEG data features. To obtain statistically significant conclusions, a Wilcoxon rank-sum test (Sheskin 2003), a nonparametric statistical test of whether two independent random samples come from the same distribution, was performed to compare the accuracies achieved by the proposed algorithm with those achieved by the other competitors, owing to the simplicity of this statistic. In statistical hypothesis testing, the p value is the probability, given a statistical model and assuming the null hypothesis is true, of obtaining a result at least as extreme as the one actually observed. The computation is based on the ranks of the samples, and a significance threshold of 0.05 was applied: if the p value is very low (\(p<\) 0.05), the null hypothesis is rejected and the result is considered significant; otherwise, the null hypothesis is retained. The null hypothesis of this statistical test assumes equivalent performance of all the competitor algorithms.
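
A minimal sketch of this significance test, using SciPy's rank-sum implementation on two illustrative (made-up) sets of accuracies, is given below.

from scipy.stats import ranksums

acc_proposed = [0.86, 0.88, 0.85, 0.87, 0.89]   # made-up per-fold accuracies
acc_baseline = [0.80, 0.79, 0.82, 0.81, 0.78]
stat, p_value = ranksums(acc_proposed, acc_baseline)
print("p =", round(p_value, 4), "significant" if p_value < 0.05 else "not significant")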

5.1 Employed datasets

In order to evaluate the proposed FBCS strategy, we carried out experimental testing using two well-known BCI datasets. The first, labeled Dataset D1, is Dataset 2a of BCI competition IV provided by the BCI research group at Graz University (Brunner et al. 2008). Recorded from 9 healthy subjects, the dataset consists of four different motor imagery (MI) tasks, namely left hand, right hand, both feet, and tongue, recorded on different days from each subject. The dataset comprises two sessions, each with six runs separated by short breaks. Every run includes 48 trials (12 trials for each task); therefore, there are 288 trials in total per session. For a single trial, the subjects sat in a comfortable armchair in front of a computer screen. As shown in Fig. 10, at the beginning of a trial \((t=0\) s), a fixation cross appeared on the black screen and a short warning tone was presented. After 2 s \((t=2\) s), a cue in the form of an arrow pointing either to the left, right, down, or up (corresponding to one of the four classes: left hand, right hand, foot, or tongue) appeared and stayed on the screen for 1.25 s. This prompted the subjects to perform the desired MI task. No feedback was provided. The subjects were asked to carry out the motor imagery task until the fixation cross disappeared from the screen at \(t=6\) s, followed by a short break during which the screen was black again. Table 8 gives a short description of the essential parameters of dataset D1.

On the other hand, the second employed dataset is labeled Dataset D2, which is Dataset IVa of BCI competition III (http://www.bbci.de/competition/iii/desc_IVa.html). It contains 3.5 s cues for the following three motor imageries that the subject should perform: (L) left hand, (R) right hand, and (F) right foot. The motor imagery tasks were performed by five healthy subjects over 280 trials. Even though the visual cue was presented to the subjects for 3.5 s, the first and last 0.5 s can be ignored, since they represent the transition time during which the subject changes state from non-task to task and vice versa. Therefore, it is reasonable to omit the first and last 0.5 s and use only the middle 2.5 s for the classification task. As shown in Table 9, this dataset contains EEG recordings from 118 channels over 280 trials of 2.5 s each, with a 1000 Hz sampling rate. The experiments were carried out in MATLAB R2015b on a computer with 4 GB memory and an Intel Core i3 2.53 GHz processor.

Table 8 Table of parameters of dataset D1
Table 9 Table of parameters of dataset D2

The two datasets (i.e., D1 and D2) were used to test the proposed feature reduction (FR), the proposed electrode selection (ES), and the proposed fuzzified KNN classifier against previously published approaches, as discussed in the next subsections.

5.2 Experimental results

This section evaluates the contributions introduced in this paper in detail. Three different experiments are presented. The first experiment is carried out on dataset D2, for simplicity, to assess the impact of the proposed feature reduction (FR); a comparison is made with feature reduction methods based on genetic algorithms (GA), particle swarm optimization (PSO), and random search introduced in Atyabi et al. (2012), and with principal component analysis (PCA) as applied in Muhammad et al. (2015). The comparison shows the impact on SVM performance of using the full feature set versus the reduced sets produced by the feature reduction methods.

The second experiment is also carried out on dataset D2 to assess the impact of the proposed electrode selection (ES); a comparison is made with electrode selection methods based on genetic algorithms (GA) and random search introduced in Atyabi et al. (2012). The comparison shows the impact on SVM performance of using the full electrode set versus the reduced sets produced by the electrode selection methods.

Finally, the third experiment assesses the impact of the proposed fuzzified KNN classifier. A comparison is made on the two datasets D1 and D2 with six classifiers, namely K nearest neighbor (KNN) (Lotte et al. 2007), support vector machines (SVM) (Lotte et al. 2007), linear discriminant analysis (LDA) (Lotte et al. 2007), Naive Bayes (NB) (Lotte et al. 2007), decision tree (DT) (Lotte et al. 2007), and self-organized map (SOM) (Muhammad et al. 2015), using their built-in MATLAB implementations with the competitors' default parameters. As in Geetha and Geethalakshmi (2011), we used the leave-one-out cross-validation (LOOCV) technique to estimate the most appropriate KNN and SVM parameters, avoiding the problems of random selection, as it selects the parameters that provide the highest average performance metrics. Table 10 lists the values of the control parameters specific to each of the employed competitors.

Table 10 Values of the control parameters specific to each of the employed competitor

5.2.1 Evaluating the proposed feature reduction methodology

This experiment uses a 10*20 cross-validation (CV) to assess the impact of the proposed feature reduction (FR). A comparison is made with feature reduction methods based on genetic algorithms (GA), particle swarm optimization (PSO), and random search, as presented in Atyabi et al. (2012), and with principal component analysis (PCA) as applied in Muhammad et al. (2015), for each of the 5 subjects of the D2 dataset. The comparison shows the impact on SVM performance of using the full feature set versus the reduced sets produced by the feature reduction methods.

Table 11 Averaged informedness result achieved on testing set with SVM using feature reduction techniques

Table 11 and Fig. 11 show the impact of the above-mentioned feature reduction techniques through the averaged informedness achieved on the testing set with SVM; the proposed FR yields superior results in most cases. The p values obtained through the Wilcoxon rank-sum statistical test between the best algorithm and each of the competitors are listed in Table 12. Values with \({p\;{\text {value}}}<0.05\) indicate that the differences are statistically significant. The results listed in Tables 11 and 12 clearly indicate the superiority of the proposed FR, in a statistically significant fashion, over most of the competitors. However, the statistical tests for subjects AV and AW indicate no significant difference of the proposed FR over PCA, as their corresponding p values are 0.052 and 0.057, respectively.

Table 12 p values obtained through the Wilcoxon rank-sum statistical test for the best algorithm—proposed FR—versus each of the competitors for each subject within the dataset
Fig. 12 Average informedness result achieved on the testing set with SVM using electrode selection techniques

Generally, the results indicate that, on average across all subjects (AA, AL, AV, AW, AY), informedness improves slightly from 0.482 to 0.49 using the proposed FR technique. Since feature reduction alone yields only a modest gain, the next experiment examines the impact of electrode selection.

5.2.2 Evaluating the proposed electrode selection methodology

As in experiment 1, experiment 2 evaluates the impact of the proposed electrode selection (ES) on the D2 dataset. A comparison is made with electrode selection methods based on genetic algorithms (GA) and the random search algorithm introduced in Atyabi et al. (2012). The comparison contrasts SVM performance on the full electrode set with performance on the reduced set obtained after applying each electrode selection method.

Table 13 Averaged informedness result achieved on testing set with SVM using electrode selection techniques
Table 14 p values obtained through the Wilcoxon rank-sum statistical test for the best algorithm—proposed ES—versus each of the competitors for each subject within the dataset

Table 13 and Fig. 12 report the averaged informedness achieved on the testing set with SVM for the above-mentioned electrode selection techniques. The results indicate that, averaged across all subjects, the informedness improves from 0.482 to 0.522 with the proposed ES technique. According to Table 14, where the p values obtained through the Wilcoxon rank-sum statistical test between the best algorithm and each of the competitors are displayed, the proposed ES is statistically significantly better than the remaining competitors. However, the statistical test indicates no significant difference between the proposed ES and GA for subject AL, and an insignificant advantage of ES over random search for subject AY, as their corresponding p values are 0.0506 and 0.0602, respectively.

Comparing these results with those of experiment 1, it is clear that electrode selection has a greater impact on classification performance than feature reduction. The next experiment evaluates the impact of combining the proposed feature reduction (FR) and electrode selection (ES) with the proposed FKNN and the other classifiers.

5.2.3 Evaluating the proposed fuzzified KNN classifier

Finally, to assess the impact of the proposed fuzzified KNN classifier \((K=10)\) on the two datasets (i.e., D1 and D2), a comparison is made against the K-nearest neighbor (KNN) (Lotte et al. 2007), support vector machines (SVM) (Lotte et al. 2007), linear discriminant analysis (LDA) (Lotte et al. 2007), naive Bayes (NB) (Lotte et al. 2007), decision tree (DT) (Lotte et al. 2007), and self-organizing map (SOM) (Muhammad et al. 2015) classifiers, using their built-in MATLAB implementations with the default parameter values listed in Table 10. The performance metrics defined in Eqs. (11)–(13) are evaluated, giving the following results.
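For readers unfamiliar with fuzzified KNN, the sketch below shows a generic Keller-style formulation with K = 10, in which class memberships are obtained by inverse-distance weighting of the nearest training trials. It is an illustration only; the exact membership function used by the proposed FKNN may differ.

```python
# Hypothetical sketch of a fuzzified KNN (K = 10): class memberships are
# obtained by inverse-distance weighting of the K nearest training trials.
# This follows the generic Keller-style formulation, not necessarily the
# paper's exact membership function.
import numpy as np

def fknn_predict(X_train, y_train, X_test, k=10, m=2.0, eps=1e-9):
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)           # Euclidean distances
        nn = np.argsort(d)[:k]                            # K nearest neighbors
        w = 1.0 / (d[nn] ** (2.0 / (m - 1.0)) + eps)      # fuzzy weights
        memberships = np.array([w[y_train[nn] == c].sum() for c in classes])
        memberships /= memberships.sum()                  # normalize to [0, 1]
        preds.append(classes[np.argmax(memberships)])     # defuzzify by max
    return np.array(preds)

# Toy usage with random stand-in data:
rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(100, 8)), rng.integers(0, 4, size=100)
X_te = rng.normal(size=(5, 8))
print(fknn_predict(X_tr, y_tr, X_te))
```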

Figure 13 illustrates the average classification accuracy (CA), i.e., the percentage of trials in the test set that are classified correctly, for the competing classifiers as well as FKNN. Both datasets (i.e., D1 and D2) are used in their original form without dimensionality reduction; note that D2 (118 electrodes) is of higher dimensionality than D1 (22 electrodes). As illustrated in the figure, FKNN has the highest average classification accuracy (78% for D1 and 74% for D2), while KNN shows the worst average classification accuracy (70.1% for D1 and 60.8% for D2), as it is very sensitive to the curse of dimensionality; this explains why KNN algorithms are not very popular in the BCI community. On the other hand, Fig. 14 presents the average classification accuracy for the same set of classifiers when the proposed feature reduction and electrode selection methodologies are applied. Clearly, using feature reduction and electrode selection improves the average accuracy of all classifiers, owing to the removal of outlier features and electrodes with a detrimental effect. Moreover, FBCS still has the highest accuracy (85.7% for D1 and 87.8% for D2).

Fig. 13 Average classification accuracy without using dimensionality reduction

Fig. 14 Average classification accuracy with the proposed feature reduction and electrode selection

Again, D1 and D2 are used to measure the average classification sensitivity (SE, also called the true positive rate) for the competing classifiers as well as FKNN in two scenarios. In the first scenario, SE is measured for all classifiers with no dimensionality reduction, while in the second scenario, both the proposed feature reduction and electrode selection methodologies are applied. Figure 15 shows SE for all classifiers under the first scenario. FKNN achieves the highest average classification sensitivity on both datasets, precisely 76.5% for D1 and 74% for D2, and, as with average classification accuracy, KNN shows the worst sensitivity, with averages of 70.1% for D1 and 60.8% for D2, the higher-dimensional dataset. On the other hand, under the second scenario, illustrated in Fig. 16, SE improves for all classifiers. The highest SE is given by the proposed FKNN, namely 85.1% for D1 and 84.2% for D2. Note that the KNN classifier gives acceptable sensitivity results once feature reduction and electrode selection are applied (82% for D1 and 80.5% for D2), since the dimensionality of the datasets has been reduced.

Fig. 15 Average classification sensitivity without using dimensionality reduction

Fig. 16 Average classification sensitivity with the proposed feature reduction and electrode selection

Fig. 17 Average classification specificity without using dimensionality reduction

Fig. 18 Average classification specificity with the proposed feature reduction and electrode selection

In this experiment, the target is to measure the average classification specificity (SP, also called the true negative rate) for FKNN and the competing classifiers. As illustrated in Fig. 17, with no dimensionality reduction, the proposed FKNN outperforms all other classifiers in terms of SP, achieving 79% for D1 and 88% for D2. When both the proposed feature reduction and electrode selection methodologies are applied, SP improves for all classifiers, as illustrated in Fig. 18. For FKNN, which achieves the highest SP, the calculated values were 89.9% for D1 and 90.2% for D2.
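For reference, the three quantities plotted in Figs. 13–18 can all be derived from a confusion matrix; for more than two classes, sensitivity and specificity are commonly macro-averaged over the classes in a one-vs-rest fashion. The sketch below follows that convention, which is an assumption about how the averages were computed.

```python
# Hypothetical sketch: accuracy, macro-averaged sensitivity (TPR) and
# specificity (TNR) from a confusion matrix, one-vs-rest per class.
import numpy as np
from sklearn.metrics import confusion_matrix

def ca_se_sp(y_true, y_pred):
    cm = confusion_matrix(y_true, y_pred)
    total = cm.sum()
    ca = np.trace(cm) / total                        # classification accuracy
    se, sp = [], []
    for c in range(cm.shape[0]):
        tp = cm[c, c]
        fn = cm[c, :].sum() - tp
        fp = cm[:, c].sum() - tp
        tn = total - tp - fn - fp
        se.append(tp / (tp + fn))                    # per-class sensitivity
        sp.append(tn / (tn + fp))                    # per-class specificity
    return ca, float(np.mean(se)), float(np.mean(sp))

# Toy usage with placeholder labels for a 4-class problem:
y_true = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
y_pred = [0, 1, 2, 3, 0, 2, 2, 3, 1, 1]
print(ca_se_sp(y_true, y_pred))
```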

To ensure that the proposed FKNN is suitable for real-time operation, it is essential to measure the testing time. The average computational time for training and testing the different classifiers on the two BCI competition datasets without dimensionality reduction is depicted in Fig. 19. It can be seen that KNN and FKNN had the shortest training times, with averages over the two datasets of 0.04 s and 0.05 s, respectively. This is expected, as KNN is a lazy learner: most of the computation is deferred to the testing phase. SVM had the longest training time, as it performs classification by constructing the separating hyperplane in a multidimensional space that maximizes the margin. For the testing phase, SVM and FKNN had nearly the same average computational time (CT), 0.44 s and 0.5 s, respectively. The testing-phase CT of FKNN is greater than that of KNN because it carries out additional membership computations on top of simple KNN.
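Timing the two phases separately is straightforward to instrument, as sketched below for a lazy learner (KNN) against SVM. The classifiers, data sizes, and timing utility are illustrative, and absolute numbers depend on hardware and implementation.

```python
# Hypothetical sketch: timing the training and testing phases separately.
# Classifiers and data sizes are stand-ins; absolute times are hardware-dependent.
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def time_phases(clf, X_train, y_train, X_test):
    t0 = time.perf_counter(); clf.fit(X_train, y_train)
    t1 = time.perf_counter(); clf.predict(X_test)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1          # (training CT, testing CT) in seconds

rng = np.random.default_rng(0)
X_tr, y_tr = rng.normal(size=(280, 118)), rng.integers(0, 2, size=280)
X_te = rng.normal(size=(280, 118))
for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=10)),
                  ("SVM", SVC(kernel="rbf"))]:
    train_ct, test_ct = time_phases(clf, X_tr, y_tr, X_te)
    print(f"{name}: train {train_ct:.3f} s, test {test_ct:.3f} s")
```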

Fig. 19 Average classification computational time without using dimensionality reduction

Fig. 20 Average classification computational time with the proposed feature reduction and electrode selection

Table 15 p values obtained through the Wilcoxon rank-sum statistical test for the best algorithm—proposed FBCS—versus each of the competitors for each dataset

On the other hand, when the dimensionality reduction techniques are used, the average computational time (CT) of the training phase increases for all classification algorithms, since the feature reduction (FR) and electrode selection (ES) computations are carried out during training; even the training CT of the simple KNN and FBCS classifiers rises to about 5 s, and the SVM classifier still has the highest average training CT of roughly 30 s, as illustrated in Fig. 20. In the testing phase, the average classification CT decreases because of the reduced dataset dimensions. The main objective of the proposed strategy is to reduce the classification time, especially in the testing phase, to meet the needs of real-time BCI applications. As discussed earlier, FR and ES need additional time to select the best subsets of features and electrodes, but there is no pressing need to reduce the computational time of the training step. For newly captured data to be tested, the selected features from the selected electrodes are classified directly against the reduced training set, with no further feature reduction or electrode selection calculations. KNN has the shortest testing time, with a CT of 0.01 s, but taking accuracy into account, FBCS remains the fastest adequate strategy in the testing phase on both datasets, with a CT of about 0.018 s to decide what is expressed by the data captured from the EEG headset.
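The key point, that no FR/ES computation is repeated at test time, amounts to storing the selected electrode and feature indices during training and simply indexing into each incoming trial before classification, as sketched below. The variable names and the trials × electrodes × features layout are assumptions.

```python
# Hypothetical sketch: at test time only the stored indices are applied,
# so no feature reduction or electrode selection is recomputed.
# The (trials x electrodes x features-per-electrode) layout is an assumption.
import numpy as np

def apply_selection(trials, electrode_idx, feature_idx):
    """Keep the selected electrodes, then the selected features, and flatten."""
    reduced = trials[:, electrode_idx, :][:, :, feature_idx]
    return reduced.reshape(len(trials), -1)

# Indices assumed to have been chosen once during training:
electrode_idx = np.array([3, 7, 12, 20])       # selected electrodes (example)
feature_idx   = np.array([0, 2, 5])            # selected features (example)

new_trials = np.random.randn(10, 118, 8)       # incoming test trials
X_test = apply_selection(new_trials, electrode_idx, feature_idx)
# X_test can now be passed directly to the trained (F)KNN classifier.
```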

Table 15 reports the p values obtained through the Wilcoxon rank-sum statistical test for the proposed FBCS algorithm versus each of the other competitors on the datasets D1 and D2. The results of the statistical test confirm that FBCS with dimensionality reduction is among the best-performing models for the BCI recognition system. Unlike most publications in the BCI field, which recommend SVM and LDA as the highest-performing classifiers, our three experiments show that, for each feature reduction and electrode selection method, several classifiers should be tested, including the proposed one. In general, there is no single classifier, feature reduction method, or electrode selection method that outperforms all others, because performance varies with the subject's mental actions and with the combination of all model components (signal processing, feature extraction, dimensionality reduction, and classifier).

6 Conclusion

Dimensionality reduction of features is an open problem in brain–computer interface (BCI) research, since features extracted from brain signals are high-dimensional, which degrades classifier accuracy. Selecting the most relevant features and electrodes improves classifier performance and reduces the computational cost of the system. In this study, a new strategy called the fuzzy-based classification strategy (FBCS) is proposed to determine the best subset of features from a selected number of electrodes of an electroencephalography (EEG)-based BCI dataset for different actions. The proposed feature reduction and electrode selection, which shrink the dimensions of the feature vector, are tested and achieve the best classification performance metrics (especially the computational time CT, a vital parameter) on Dataset 2a of BCI competition IV (Brunner et al. 2008) and Dataset IVa of BCI competition III (http://www.bbci.de/competition/iii/desc_IVa.html). Thus, our algorithm can be employed for further real-time processing of multi-class problems. Our future aim is to design a real system able to classify brain tasks online in a real environment with less computational time. Further study in this direction will aim to optimize the feature reduction, electrode selection, and classification techniques for implementation in real-time BCI applications.