
1 Introduction

As one of the most natural forms of human-computer interaction (HCI), Brain-Computer Interfaces (BCIs) have shown great promise for physically disabled people and people with severe neuromuscular disorders [1, 2]. According to several studies, signals recorded from the brain can substitute for any task that requires muscle control or movement [3]. Such brain signals can be captured using a number of methods, such as electroencephalography (EEG), functional MRI (fMRI), electrocorticography (ECoG), calcium imaging, magnetoencephalography (MEG), and functional near-infrared spectroscopy (fNIRS).

Electroencephalography (scalp EEG), which records the small electrical signals generated by the neurons in the brain [4], is one of the most popular signal acquisition techniques in existing BCI systems due to its non-invasiveness, ease of use, reasonable temporal resolution and cost effectiveness compared to other brain signal recording methods [2]. EEG signals are the outcome of a highly complex, non-linear and non-stationary stochastic biological process and contain a wide variety of noises from both internal and external sources. Thus, computational intelligence is required at every step of an EEG-based BCI system: for removing noise (using advanced signal processing techniques such as SWTSD, ICA and EMD, beyond traditional filtering, by identifying and exploiting different artifact/noise characteristics and patterns), for feature extraction and selection (using unsupervised methods such as PCA and SVD), and finally for classification (using supervised classifiers such as SVM and k-NN, probabilistic classifiers such as NB, or neural network classifiers such as RBF, MLP and DBN).

EEG signals can be broken down into five main rhythms based on their frequency range: delta (δ), theta (θ), alpha (α), beta (β) and gamma (γ) [4, 5]. A brief description of the EEG rhythms and their traits is given in Table 1.

Table 1 Rhythms and their traits of EEG signal

EEG relies on averaging the responses of many neurons [6]. It is non-invasive: the signal-acquiring electrodes are positioned on the scalp according to the standard 10–20 international system [7] (see Fig. 1) to ensure reproducibility among studies.

Fig. 1
figure 1

The 10–20 international system

Every electrode in the 10–20 system has a unique label that identifies the lobe and hemisphere of the brain to which it corresponds. The letters F, T, C, P, and O stand for the frontal, temporal, central, parietal, and occipital regions respectively. Right-hemisphere electrode positions are referred to with even numbers (between 2 and 8), whereas odd numbers (between 1 and 7) correspond to the left hemisphere. Electrodes positioned on the midline are labeled with a “z” (zero) [8]. This means that each electrode provides information from a particular area of the brain, although this highly depends on the accuracy of electrode placement.
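As a toy illustration of this naming convention (not part of any toolbox used later in this chapter), the following Python snippet decodes a simple 10–20 label into its region and hemisphere; composite sites such as AF, FC or TP would need a fuller prefix map.

```python
def decode_electrode(label):
    """Toy decoder for simple 10-20 labels, e.g. 'F3' -> ('frontal', 'left'),
    'Cz' -> ('central', 'midline'). Composite prefixes (AF, FC, TP, ...)
    are deliberately not handled by this sketch."""
    regions = {"F": "frontal", "T": "temporal", "C": "central",
               "P": "parietal", "O": "occipital"}
    region = regions[label[0].upper()]
    suffix = label[-1]
    if suffix.lower() == "z":                # midline electrodes end in 'z'
        side = "midline"
    else:                                    # even = right, odd = left
        side = "right" if int(suffix) % 2 == 0 else "left"
    return region, side

print(decode_electrode("O2"))  # ('occipital', 'right')
```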

One of the biggest disadvantages of EEG signals is that they are highly susceptible to noise, mainly because of the non-invasive nature of the recording. These noises, often termed artifacts, come from extraneous signals, for example electromyography (EMG), electrical signals originating from muscles in the face and scalp rather than from the brain [9], and electrooculography (EOG), noise generated by eye movements and blinking [10]. Motion artifacts due to physical movement of the subject are also a major source of noise in EEG [11]. Fortunately, a significant body of research has applied advanced signal processing techniques to overcome these noises [11,12,13,14,15].

There are several EEG signal acquisition devices typically used by researchers in the literature [16]: g.USBamp [17, 18], g.BSamp [19], and g.BCIsys [20] made by g.tec in Austria; Cerebus [21,22,23] made by Blackrock Microsystems in the USA; SynAmps 2 [24,25,26] made by Compumedics Neuroscan in Australia; the wireless Emotiv EPOC [27,28,29,30] made by Emotiv Systems in the USA; BrainNet-36 [31]; ANT-Neuro [32]; the FlexComp Infiniti encoder [33]; etc. In the recent past, a whole new domain for BCI researchers has opened up thanks to the advent of low-cost, easy-to-use, portable dry/wet-electrode wireless EEG recording devices such as NeuroSky's MindWave [34], InteraXon's Muse [35] and Emotiv EPOC [27], which have been used by researchers in several studies [4, 36,37,38] as well.

The objective of BCI systems is to extract specific signatures of brain activity and translate them into command signals to control external devices (see Fig. 2) [39]. These features can be P300 evoked potentials, event-related potentials (ERPs) recorded on the cortex, slow cortical potentials (SCPs), sensorimotor rhythms acquired from the scalp, neuronal action potentials recorded within the cortex, etc.

Fig. 2
figure 2

A general description of a BCI system. The signal processing module can be divided into four submodules: pre-processing, feature extraction, feature selection and classification

Computational Intelligence is mainly involved in the Signal Processing module of Fig. 2, which can be broken down into four submodules [2]:

  • Pre-processing—removal of noises/artifacts from the EEG signals,

  • Feature Extraction—extracting features from the EEG signals,

  • Feature Selection—selecting only the features that contain most of the information and

  • Classification—deciding to which group a given set of EEG signals corresponds.

Researchers often skip the Feature Selection submodule [40,41,42,43,44,45,46] because this step is only useful when the size or dimensionality of the features extracted in step (ii) is quite large. Large feature sets lead to slower execution times, which can render BCI systems, especially online ones, impractical. Thus, the Feature Selection step is used as a dimensionality reduction step to speed up computation.

In this chapter, we first present a thorough review of several articles covering different BCI paradigms. Our focus is on the algorithms used by researchers in each of the submodules of the Signal Processing module of a BCI system to solve a particular problem. Then, we analyze different contemporary algorithms for each submodule of the Signal Processing module on two datasets we acquired from:

  • 19 college-aged young adults using Emotiv EPOC [27] at a sampling frequency of 128 Hz and

  • 19 college-aged young adults using the Muse headband [35] at a sampling frequency of 220 Hz

where each of the participants was shown three different types of videos [47].

2 Use of Computational Intelligence in Different BCI Applications

Based on brain activity patterns, there are mainly four types of EEG-based BCI systems [16]: event-related desynchronization/synchronization (ERD/ERS) [48], steady-state visual evoked potential (SSVEP) [2], event-related potential (ERP) [49], and slow cortical potential (SCP) [50]. Except for SCP, the other three are the most popular among researchers [51,52,53].

These EEG-based BCI paradigms have led to many BCI applications: emotion classification [36, 54,55,56], cognitive task classification [38, 57], P300 spellers [58,59,60,61,62] and others [63, 64] serving as alternative and augmentative communication (AAC) platforms [65], brain-controlled wheelchairs [66,67,68,69], robot control [70,71,72,73], rehabilitation of locked-in patients [74,75,76,77,78], neuro-prostheses [79,80,81,82], gaming [83, 84], etc. are just a few examples. In this section, we discuss the different pattern recognition techniques used by researchers for the detection of the three most prominent brain activity patterns, i.e. ERD/ERS, SSVEP and ERP.

2.1 Motor Imagery

One of the most researched domains in ERD/ERS-based BCIs is Motor Imagery (MI) [85,86,87]. MI corresponds to the imagination of moving a body part (for example the right/left hand, tongue, or both feet) without actually moving it. Oscillatory activity can be observed at different locations in the brain's sensorimotor cortex for different MI tasks. The objective is to classify such activity in order to recognize the underlying MI task performed [88]. To achieve this, researchers have experimented with various algorithms to make the system as efficient as possible. A summary of different techniques used by researchers is presented in Table 2, and a minimal sketch of one widely used spatial filtering technique follows the table.

Table 2 Summary of algorithms used by researchers for MI-based BCI
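To make the MI pipeline concrete, below is a minimal Python sketch of common spatial patterns (CSP), the spatial filtering technique used in several of the studies discussed in this subsection (e.g. [93]). The number of filter pairs and the covariance normalization are common defaults, not parameters taken from any specific paper.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_pairs=3):
    """Minimal two-class CSP: trials_* have shape
    (n_trials, n_channels, n_samples) and are assumed band-pass filtered
    to the mu/beta range (e.g. 8-30 Hz). Returns 2*n_pairs spatial filters
    maximizing the variance ratio between the two classes."""
    def mean_cov(trials):
        # trace-normalized average covariance per class
        return np.mean([t @ t.T / np.trace(t @ t.T) for t in trials], axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # generalized eigenproblem: Ca w = lambda (Ca + Cb) w
    vals, vecs = eigh(Ca, Ca + Cb)
    order = np.argsort(vals)
    picks = np.r_[order[:n_pairs], order[-n_pairs:]]  # both extremes
    return vecs[:, picks].T

def csp_features(trial, W):
    """Standard log-variance CSP features of one (n_channels, n_samples) trial."""
    z = W @ trial
    var = z.var(axis=1)
    return np.log(var / var.sum())
```

The filters at both ends of the eigenvalue spectrum are kept because they maximize variance for one class while minimizing it for the other, which is exactly the ERD/ERS contrast an MI classifier needs.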

Band-pass filtering the EEG data from 0.5 to 30 Hz, Hamedi et al. [40] implemented Integrated EEG (IEEG) and Root Mean Square (RMS) as feature extraction algorithms and Radial Basis Function (RBF) neural networks and the Multilayer Perceptron (MLP) as classifiers for three-class (right/left hand and tongue movement) MI classification. Comparing these with a Support Vector Machine (SVM) classifier and the Willison Amplitude (WAMP) feature extraction algorithm, they showed that SVM performed better in terms of accuracy and training time, and that WAMP was more suitable than RMS and IEEG.

Chatterjee et al. [89] classified the BCI competition II [94] MI dataset of left- and right-hand movements with accuracies of 85% and 85.71% for SVM and MLP respectively. They achieved this result by applying a wavelet-based energy-entropy method as the feature extraction technique; the average power-based feature provided a better ROC area than the statistical features. Their data were filtered using an elliptic band-pass filter in the range 0.5 to 30 Hz.

An et al. [90] also used an elliptic band-pass filter, retaining signals in the range of 8 to 30 Hz, and used the Neuroscan software to remove EOG artifacts. They found that a deep belief network (DBN) gives 4–6% better performance than SVM when the DBN was constructed from Restricted Boltzmann Machines (RBMs) combined with the Adaboost algorithm and Contrastive Divergence (CD) for 8 hidden layers. The number of nodes had no effect, but the subject's concentration and state played an important part in the performance of the classifier.

In a study [88] on BCI competition IV dataset 2b and competition II dataset III [94], the authors applied a combination of a convolutional neural network (CNN) and a stacked autoencoder (SAE) model and achieved an accuracy of 90.0%, whereas the winning algorithm achieved 89.3%. In terms of kappa value, a 9% improvement was achieved with this deep learning approach over the BCI competition winner.

Kevric et al. [41] presented a comparison of three feature extraction methods: Discrete Wavelet Transform (DWT), Wavelet Packet Decomposition (WPD), and Empirical Mode Decomposition (EMD). The maximum average accuracy of 92.8% was achieved with the combination of Multiscale Principal Component Analysis (MSPCA) for noise removal, higher-order statistical features extracted from WPD sub-bands, and k-nearest neighbour (k-NN) as the classifier. The EEG data were band-pass filtered from 0.05 to 200 Hz.

Hsu et al. [91] classified 10 subjects' motor imagery data with SVM, using a genetic algorithm (GA) as the feature selection method and Student's two-sample t-statistics together with the continuous wavelet transform (CWT) for feature extraction. They achieved an average classification accuracy of 86.7%. A Gaussian filter was used to smooth the power spectrum data.

In [92], a modified cross-correlation based logistic regression (CC-LR) algorithm was applied to three statistical feature sets consisting of the mean, standard deviation, skewness, maximum, minimum and kurtosis as six features for BCI competition III datasets IVa and IVb [94]. Their algorithm provided better accuracy for three out of five subjects when compared with eight other known algorithms, and the accuracy of the proposed method differed from that of the BCI competition III winner by only 0.3. The data, digitized at 1000 Hz with 16-bit accuracy, were band-pass filtered between 0.05 and 200 Hz.

Zhang et al. [93] achieved 81.7% accuracy (±15.1 standard deviation) with a computation time of less than 10 seconds by implementing sparse Bayesian learning of frequency bands (SBLFB). They extracted features via common spatial patterns (CSP) and achieved better results than other proposed methods on the BCI competition IV dataset 2b [94]. A band-pass filter (0.5–100 Hz) was applied together with a 50 Hz notch filter.

2.2 Steady State Visual Evoked Potential

When the flickering frequency of a visual stimulus matches the firing frequency of the visual cortex's neurons, the resulting brain signals are called Steady State Visual Evoked Potentials (SSVEPs) [95, 96]. SSVEP is identifiable in the range 5–60 Hz and is a very useful BCI tool due to its comparatively high signal-to-noise ratio (SNR). SSVEP can easily be identified in EEG signals, making it possible to classify various kinds of visual stimuli, and researchers have experimented with various techniques to do so with competitive results. A common baseline, sketched below, is canonical correlation analysis (CCA) against sinusoidal reference signals.
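The following Python sketch shows the standard CCA approach used as a reference method in several of the studies below; the harmonic count and candidate frequencies are illustrative assumptions, not values from any specific paper.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def ssvep_cca_score(eeg, fs, stim_freq, n_harmonics=2):
    """Canonical correlation between multi-channel EEG (n_samples x n_channels)
    and sine/cosine references at the stimulus frequency and its harmonics."""
    t = np.arange(eeg.shape[0]) / fs
    refs = []
    for h in range(1, n_harmonics + 1):
        refs.append(np.sin(2 * np.pi * h * stim_freq * t))
        refs.append(np.cos(2 * np.pi * h * stim_freq * t))
    Y = np.column_stack(refs)
    u, v = CCA(n_components=1).fit_transform(eeg, Y)
    return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

# The flicker frequency whose references correlate best with the EEG wins:
# detected = max([8.0, 10.0, 12.0, 15.0],
#                key=lambda f: ssvep_cca_score(epoch, fs, f))
```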

Chen et al. [45] proposed an SSVEP-based single-channel BCI system using a fuzzy tracking and control algorithm for amyotrophic lateral sclerosis (ALS) patients. The fuzzy control algorithm achieved an average recognition rate of 96.97%, compared to 94.9% for canonical correlation analysis (CCA). In their proposed BCI system, they used the fast Fourier transform (FFT) as the feature extraction algorithm and, in the pre-processing module, a 2nd-order Butterworth band-pass filter to extract data in the range 4–60 Hz.

Maronidis et al. [97] proposed the use of Subclass Marginal Fisher Analysis (SMFA) to detect SSVEP and compared its results with CCA and Multiple Linear Regression (MLR) for different numbers of trials and channels. In both settings, SMFA achieved better results than the other two algorithms. The authors used a 3rd-order band-pass Butterworth Infinite Impulse Response (IIR) filter (6–80 Hz) in the pre-processing module.

Kalaganis et al. [46] experimented with error-related potentials in an SSVEP-based BCI system. The authors implemented Minimum Covariance Determinant (MCD) as an outlier detection algorithm to remove noisy data, Common Spatial Patterns (CSP) as the feature extraction technique, and SVM, Random Forest (RF) and Adaboost as classifiers. Among the three classifiers, RF provided the best average accuracy (0.8187) and recall (0.5633).

In the study conducted by Friman et al. [98], the authors achieved an average classification accuracy of 84% with the minimum energy method as the classifier, which takes about 4 msec of computation time. An autoregressive model was implemented to estimate the noise level in the SSVEP signal. In [99], the authors compared three feature extraction, feature selection and classification techniques for an SSVEP-based BCI system. They implemented a bank of filters, Welch's method and the short-term Fourier transform as feature extraction methods; an incremental wrapper, Pearson's method and the Davies-Bouldin index as feature selection methods; and support vector machine (SVM), linear discriminant analysis (LDA), and extreme learning machine (ELM) as classifiers on band-pass Butterworth (5–60 Hz) and notch (58–62 Hz) filtered EEG signals. LDA provided the best classification accuracy with Welch's method and the incremental wrapper as the feature extraction and feature selection methods respectively. Table 3 summarizes the algorithms used by different studies to classify SSVEP from EEG signals.

Table 3 Summary of algorithms used by researchers for SSVEP-based BCI

2.3 Event Related Potentials

The very small voltages generated in the brain by the occurrence of certain events or stimuli are known as event-related potentials (ERPs) [100]. These fluctuations in the brain signal are evoked by, and time-locked to, a motor, sensory or cognitive event. Among the several types of ERPs, namely N100 or N1, N200 or N2, P100 or P1, P200 or P2, etc., the P300 or P3 is the largest ERP component and is triggered by the oddball paradigm, in which a participant is presented with a series of events that fall into two groups: a frequently presented class and a rarely occurring class. The infrequent event generates a positive deflection (the P300 peak) in the scalp voltage about 300 msec after stimulus presentation [101]. Because this deflection is small relative to the background EEG, it is typically revealed by stimulus-locked averaging, as in the sketch below.
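A minimal Python sketch of such stimulus-locked averaging; the epoch limits and baseline interval are illustrative choices.

```python
import numpy as np

def erp_average(eeg, events, fs, tmin=-0.1, tmax=0.6):
    """Stimulus-locked averaging used to reveal ERPs such as the P300.
    eeg: 1-D channel signal; events: sample indices of stimulus onsets.
    Epochs are cut around each onset and averaged so that the time-locked
    response survives while background EEG averages out."""
    pre, post = int(-tmin * fs), int(tmax * fs)
    epochs = [eeg[s - pre:s + post] for s in events
              if s - pre >= 0 and s + post <= len(eeg)]
    epochs = np.asarray(epochs, dtype=float)
    # baseline-correct each epoch using its pre-stimulus interval
    epochs -= epochs[:, :pre].mean(axis=1, keepdims=True)
    return epochs.mean(axis=0)  # P300 appears as a positive peak near 300 ms
```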

The P300 ERP has contributed substantially to the development of several EEG-based BCI applications: P300 spellers [58,59,60,61,62], Brain Painting [102, 103], controlling a virtual environment [104], gaming [105], etc. For such applications, proper detection of the P300 peak, like any other pattern recognition problem, involves pre-processing, feature extraction, feature selection, and classification.

Typically, band-pass filters are used on the raw EEG signals to extract data in the range 0.1–30 Hz [101], although Speier et al. [42] and Chaurasiya et al. [106] used substantially different high cut-off frequencies of 60 Hz and 10 Hz respectively and achieved very good results. Using filtered raw EEG data directly as features is not uncommon for P300 spellers [106, 107]; however, sophisticated methods like ordinary least-squares regression [108] and conventional methods involving wavelet transforms [43, 44] can also be found in the literature. Feature selection, as discussed before, is used only when the size of the dataset is quite large; therefore, of the articles summarized in Table 4, only one used feature selection methods [106].

Table 4 Summary of algorithms used by researchers for ERP-based BCI

Currently, stepwise linear discriminant analysis (SWLDA) and SVM ensembles are the two classifiers dominating the detection of the P300 wave in the literature [42, 44, 106, 107, 109]. In [43], the authors experimented with a different classifier, Bayesian Linear Discriminant Analysis (BLDA), and noted that although increasing the training set size decreases the difference between BLDA and SVM, SVM outperformed BLDA in the P300 speller with the familiar-face model when a small training set was used.

3 An Experiment with State-of-the-Art Algorithms—Video Category Classification

In this section, we experimented with several state-of-the-art algorithms (discussed in the previous section) for each of the submodules of the Signal Processing module of a BCI system (see Sect. 1) on two datasets we acquired using two EEG signal acquisition devices (the Muse headband [35] and Emotiv EPOC [27]), where each participant was shown videos of three different genres. Our objective is to passively classify which type of video a person is watching from their scalp EEG signals, as this is the fundamental step of our long-term goal of building a BCI-based passive video recommender system [47]. The data, along with preliminary code, can be downloaded from [110].

3.1 Experimental Setup and Data Acquisition Techniques

EEG Recordings

As previously mentioned, the Muse headband by InteraXon [35] and the Emotiv EPOC [27] by Emotiv Systems were used to record electroencephalogram (EEG) data to create the two datasets. These off-the-shelf wireless devices have been used in several previous studies as well [4, 36,37,38]. The Muse is a dry-electrode EEG recording device with 5 channels (TP9, AF7, AF8 and TP10, with the reference channel at FPz) and the Emotiv EPOC is a wet-electrode device with 16 channels (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8 and AF4, with two reference channels at P3 and P4), arranged according to the international 10–20 system. The sampling frequencies of the Muse and the Emotiv EPOC were 220 Hz and 128 Hz respectively, and the data were transmitted wirelessly to a computer via Bluetooth.

Demographics of Subjects

23 (15 males and 8 females) and 44 (32 males and 12 females) college-aged young adults contributed to dataset 1 (created using the Muse headband) and dataset 2 (created using the Emotiv EPOC) respectively. The subjects had no personal history of mental or neurological disorders and had either normal or corrected-to-normal vision. The whole experiment was also recorded for each subject using a webcam. We discarded the data of 3 male and 1 female subjects from dataset 1 because, after analysing these webcam videos, we identified that one or more artifacts (excessive blinking, hand or body movements, etc., even after the subjects had been instructed to move as little as possible) were excessively present in the signal. We therefore also selected 19 subjects with the same male-to-female ratio (12 males and 7 females) from dataset 2 to keep the comparison between the two datasets legitimate.

All participants signed informed consent forms prior to the study. The 19 selected participants of datasets 1 and 2 had maximum, minimum, mean and standard deviation of age of 26, 20, 22.5 and 1.35, and 23, 19, 21.2 and 1.32, respectively, and all 38 participants were right-handed.

Experimental Setup

Three different types of videos were shown to the participants (see Table 5): 1. calming and informative, 2. fictional and 3. emotional. The criteria for choosing these three videos can be found in [47]. A five-second blank black screen was shown between the videos as well as at the beginning and end of the whole experiment. To signal the start, a message stating “The video will start in 5 seconds” was shown for two seconds at the very beginning (see Fig. 3). The compiled experimental video (accessible online at [111]) was 6 minutes 43 seconds long, and the total experimental procedure including device setup took about 10 minutes per subject. The stimuli were presented on a 21.5-inch LED monitor with a 60 Hz refresh rate.

Table 5 Details of the video clips
Fig. 3
figure 3

EEG data collection protocol for video category classification from EEG data

3.2 Experimental Study and Findings

Algorithms and Methods

In this section, we list all the algorithms we experimented with for the Pre-processing, Feature Extraction, Feature Selection, and Classification submodules of the Signal Processing module of a BCI system, in order to find the combination of algorithms that achieves the highest accuracy in predicting the category of video a person is watching.

Pre-processing: As the three videos presented as stimuli were of different lengths, to avoid bias we selected one minute of raw EEG data from each video: the part carrying the main story line. The last minute of the first video states most of the information, the minute in the exact middle of the second video comprises the main climax and/or story, and the last minute of the third video reveals the emotional climax; we therefore selected the raw EEG data from these parts.

After extracting these one-minute segments, we carried out experiments following three different approaches. In the first approach we did not use any artifact removal technique, i.e. we used the raw data. In the second approach, we removed artifacts using Stationary Wavelet Transform (SWT) based denoising. As the third artifact removal technique, we used an extended SWT method (SWTSD) in which we first applied SWT and then eliminated all data points whose absolute difference from the mean exceeded 2 standard deviations.

SWT-based denoising was chosen in order to correct stereotyped artifacts such as muscle artifacts (EMG), motion artifacts, and blinking and lateral eye movement artifacts (EOG). We chose SWT over DWT (Discrete Wavelet Transform) because of its translation invariance: a slight shift in the signal does not change the wavelet coefficients much and thus does not introduce large variations in the energy distribution across wavelet levels [112]. A 5-level and a 4-level SWT with Haar as the mother (basis) wavelet were applied to the EEG signals recorded from the Muse (Fs = 220 Hz) and Emotiv EPOC (Fs = 128 Hz) headsets respectively. The output of the SWT consists of final approximation coefficients (\( a_{5} /a_{4} \)), which represent distinct low frequency bands, and a series of detail coefficients (\( d_{1} - d_{5} / d_{1} - d_{4} \)), which are the values of high frequency bands (see Tables 6 and 7). To remove artifacts from the EEG signal, the updated universal threshold [113, 114] was applied to the different scales of wavelet coefficients. Finally, by applying the inverse SWT with the Garrote threshold function as used in [113, 114], the artifact-reduced EEG data were reassembled from the updated set of wavelet coefficients.
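A minimal Python sketch of this SWT/SWTSD pipeline is given below, assuming PyWavelets (our experiments used MATLAB). The per-level noise estimate is the standard universal-threshold formulation, which may differ from the updated threshold of [113, 114].

```python
import numpy as np
import pywt

def swtsd_denoise(eeg, wavelet="haar", level=5, sd_limit=2.0):
    """Sketch of SWT denoising followed by the 2-SD rejection step (SWTSD).
    Level 5 corresponds to the Muse recordings (use 4 for the Emotiv EPOC)."""
    # pywt.swt requires the signal length to be a multiple of 2**level
    n = (len(eeg) // 2**level) * 2**level
    x = np.asarray(eeg[:n], dtype=float)

    coeffs = pywt.swt(x, wavelet, level=level)  # [(a_L, d_L), ..., (a_1, d_1)]
    denoised = []
    for a, d in coeffs:
        # per-level universal threshold estimated from the detail coefficients
        sigma = np.median(np.abs(d)) / 0.6745
        thr = sigma * np.sqrt(2.0 * np.log(len(d)))
        d = pywt.threshold(d, thr, mode="garrote")  # Garrote function as in [113, 114]
        denoised.append((a, d))
    clean = pywt.iswt(denoised, wavelet)

    # SWTSD step: drop samples deviating more than sd_limit SDs from the mean
    mask = np.abs(clean - clean.mean()) <= sd_limit * clean.std()
    return clean[mask]
```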

Table 6 Illustration of SWT coefficients in relation to EEG rhythms in different frequency bands for dataset-1 using MUSE
Table 7 Illustration of SWT coefficients in relation to EEG rhythms in different frequency bands for dataset-2 using Emotiv EPOC

After applying an artifact removal technique, we experimented with the two basic families of filters, Finite and Infinite Impulse Response (FIR and IIR), to band-pass filter the EEG signals in the range 5–48 Hz, which also removes EOG artifacts since they are low-frequency signals (less than 4 Hz) [115]. In addition, the selected bandwidth inherently removes power line interference (50 Hz at our recording location) and its harmonics, so no notch filter was used in the pre-processing stage. We designed two FIR filters and three IIR filters; Table 8 presents their detailed configurations, and a sketch of two representative designs follows the table.

Table 8 Different filters with their design specifications
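As a rough illustration of the two filter families (our actual designs follow Table 8 and were implemented in MATLAB), the Python sketch below builds one hypothetical FIR least-squares band-pass and one Chebyshev type-I IIR band-pass for the 5–48 Hz range; the orders, ripple value and transition widths are assumptions, not the Table 8 specifications.

```python
import numpy as np
from scipy import signal

FS = 220  # Muse sampling rate; use 128 for the Emotiv EPOC

def design_fir_ls(fs=FS, numtaps=101, band=(5.0, 48.0)):
    # FIR band-pass via least squares: pass 5-48 Hz, stop elsewhere
    nyq = fs / 2.0
    bands = [0, band[0] - 1, band[0], band[1], band[1] + 2, nyq]
    desired = [0, 0, 1, 1, 0, 0]
    return signal.firls(numtaps, bands, desired, fs=fs)

def design_cheby1(fs=FS, order=4, band=(5.0, 48.0), ripple_db=0.5):
    # IIR band-pass (Chebyshev type I), returned as second-order sections
    return signal.cheby1(order, ripple_db, band, btype="bandpass",
                         fs=fs, output="sos")

eeg = np.random.randn(10 * FS)                    # placeholder 1-channel signal
filtered_fir = signal.filtfilt(design_fir_ls(), [1.0], eeg)  # zero-phase FIR
filtered_iir = signal.sosfiltfilt(design_cheby1(), eeg)      # zero-phase IIR
```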

Feature Extraction: The objective of this submodule is to extract useful features from the filtered EEG data, which are then used by the Classification step. Among the many existing feature extraction algorithms, we selected: Discrete Wavelet Transform (DWT), Fast Fourier Transform (FFT), Welch spectrum (PWelch), Yule-Walker AR spectrum (PYAR) and Short-Time Fourier Transform (STFT). Table 9 presents the parameters chosen for each of these algorithms, and a sketch of the two spectral methods that performed best in our experiments follows the table.

Table 9 List of feature extraction methods with their default parameters
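The Python sketch below illustrates PWelch- and PYAR-style features for a single-channel epoch; the window length, pass-band restriction and AR model order are illustrative assumptions rather than the exact Table 9 parameters.

```python
import numpy as np
from scipy import signal
from scipy.linalg import toeplitz

def welch_features(epoch, fs, band=(5.0, 48.0)):
    """PWelch-style features: Welch PSD restricted to the 5-48 Hz pass-band.
    One-second windows with the default 50% overlap are an assumption."""
    freqs, psd = signal.welch(epoch, fs=fs, nperseg=int(fs))
    keep = (freqs >= band[0]) & (freqs <= band[1])
    return psd[keep]

def pyar_features(epoch, order=10):
    """PYAR-style features: AR coefficients from the Yule-Walker equations
    (from which the AR power spectrum can be computed). Order is assumed."""
    x = np.asarray(epoch, dtype=float) - np.mean(epoch)
    # biased autocorrelation at lags 0..order
    acf = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order] / len(x)
    R = toeplitz(acf[:order])                 # autocorrelation matrix
    return np.linalg.solve(R, acf[1:order + 1])
```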

Feature Selection: Reducing the dimensionality of the features extracted in the previous step can substantially reduce execution time with little or negligible change in classification accuracy. For our problem, we chose two of the most popular feature selection algorithms: Principal Component Analysis (PCA) and Singular Value Decomposition (SVD).
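A minimal sketch of the two options, in Python with scikit-learn for illustration (the study itself used MATLAB, which chose the number of retained components automatically; the values below are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

X = np.random.randn(100, 512)        # placeholder: 100 epochs x 512 features

pca = PCA(n_components=0.95)         # keep components explaining 95% variance
X_pca = pca.fit_transform(X)

svd = TruncatedSVD(n_components=20)  # fixed-size SVD projection
X_svd = svd.fit_transform(X)
```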

Classification: Six classifiers of very different design and architecture were chosen for the classification submodule: Adaboost (AB), Support Vector Machines (SVMs), Multi-Class Linear Discriminant Analysis (MLDA), Multiple Linear Regression (MLR), Naïve Bayes (NB) and Decision Trees (MLTREE). For SVM, a linear kernel with C = 1 was chosen. An ensemble of 100 weak classifiers was used in Adaboost. For all the other classifiers, the default parameters implemented by MATLAB's Statistics and Machine Learning Toolbox were used.

10-fold cross-validation, which in our case is also leave-one-out cross-validation (LOOCV), was used as the evaluation criterion for classification accuracy. We followed the subject-specific approach, in which the classifier is trained and tested on the data of the same subject: we divided each subject's data into 10 epochs of 6 seconds, trained the classifier on 9 of them and tested on the remaining one, repeating the whole procedure 10 times. The sketch below illustrates this evaluation scheme.
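The following Python sketch (scikit-learn for illustration; the experiments themselves ran on the MATLAB toolbox of [116]) shows one plausible arrangement of this subject-specific 10-fold evaluation with the SVM, NB and Adaboost parameters stated above; the feature matrix is a placeholder.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# One subject: 10 six-second epochs per video class, so 10-fold CV holds out
# one epoch per class at a time. y holds the video category of each epoch
# (0 = calming/informative, 1 = fictional, 2 = emotional).
X = np.random.randn(30, 64)          # placeholder feature matrix
y = np.repeat([0, 1, 2], 10)

classifiers = {
    "SVM": SVC(kernel="linear", C=1),            # parameters from the text
    "NB": GaussianNB(),
    "AB": AdaBoostClassifier(n_estimators=100),  # 100 weak learners
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```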

A computer with a 3.4 GHz processor (Intel Core i7) and 16 GB of memory was used to run all the experiments, which were implemented using the EEG processing toolbox developed by Oikonomou et al. [116].

Experimental Results and Discussion

Since it is impossible to report the results of all the combinations of algorithms chosen in the previous section (300 combinations for each artifact removal technique, i.e. 900 combinations in total), based on our preliminary results we selected, for every submodule except artifact removal, the two top-performing algorithms. Thus, we selected FLS and ICS1 as filters, PYAR and PWelch as feature extraction techniques, both PCA and SVD as feature selection methods, and NB and SVM as classifiers. Tables 10 and 11 present, for dataset 1 (Muse headband) and dataset 2 (Emotiv EPOC) respectively, the results achieved by each combination of algorithms under the different artifact removal techniques for all the channels of the Muse (TP9, AF7, AF8 and TP10) and the correspondingly located channels of the Emotiv EPOC (T7, AF3, AF4 and T8).

Table 10 Average accuracies for each of the combination of algorithms when different artifact removal techniques were applied for all the channels of Muse
Table 11 Average accuracies for each of the combination of algorithms when different artifact removal techniques were applied for 4 channels of Emotiv EPOC which are close correspondence with the channels of Muse

Artifact Removal Techniques: For dataset 1 (Table 10), an increase of 3.1% in average accuracy can be observed when SWT (57.8%) was applied compared to raw data (54.7%). The combination of SWT followed by the SD-based rejection (SWTSD) achieved an even better average accuracy of 61.9%, an improvement of 4.1% over SWT and 7.2% over raw data.

Similar results can be observed for dataset 2 (Table 11). Although SWT (46.8%) only slightly improved the average classification accuracy compared to raw data (46.2%), SWTSD (52.2%) substantially improved the results, by 6.0%.

The results from both devices confirm that EEG signals are highly prone to artifacts and that appropriate use of artifact removal techniques can therefore significantly improve classification accuracy. For the video category classification (VCC) problem, based on our results we conclude that our new method, SWTSD, performs better than the conventional artifact removal technique SWT. Note, however, that this does not imply that SWTSD will outperform SWT in other types of studies (e.g. MI, SSVEP, ERP, etc.) as well.

Impact of Feature Selection Algorithms: Our analysis now concentrates on the results of SWTSD only, as it is the better-performing artifact removal technique. Table 12 presents the results for dataset 1 (a) and dataset 2 (b) when feature selection techniques were not used.

Table 12 (a) Average accuracies for each of the combination of algorithms for all the channels of Muse when feature selection techniques were not applied. (b) Average accuracies for each of the combination of algorithms for 4 channels of Emotiv EPOC when feature selection techniques were not applied

As expected, when feature selection techniques were not used, an increase in average execution time per subject was observed for both datasets. For dataset 1, the average execution time increased from 4.03 msec to 6.73 msec (by 67.0%), and for dataset 2 it almost doubled, from 3.03 msec to 5.96 msec (by 96.7%).

When all the features are used for classification, the classification accuracy is expected to be higher than when feature selection methods are applied beforehand. As per Tables 11 and 12b, this was the case for the Emotiv EPOC data (average accuracy of 52.2% and 58.9% with and without feature selection methods respectively); however, slightly different results can be observed for the Muse data, where the average accuracy decreased from 61.9% to 60.3% when feature selection methods were not used (see Tables 10 and 12a).

This decrease in accuracy for the Muse can be explained by the difference in sampling rates of the two devices (128 Hz for the Emotiv EPOC and 220 Hz for the Muse). The number of components chosen by the dimensionality reduction algorithms remained the same for both datasets, so the feature selection algorithms had more candidate features to choose from for dataset 1 than for dataset 2, and consequently fewer redundant features were selected for dataset 1. The selected features probably also contained less noise than the original data, and hence the classification accuracy improved.

Channel Selection: To identify which channel is most suitable for the VCC problem, for the sake of simplicity our analysis now concentrates on the results obtained when feature selection algorithms were applied (Tables 10 and 11). The average accuracies of channels TP9 (58.5%) and TP10 (58.1%) of the Muse headband are very similar, differing by just 0.4%. The results improve further, to 60.6%, when the data of channel AF7 are used. A significant increase in average accuracy can be observed for channel AF8 (70.3%), located over the right dorsolateral prefrontal cortex. As videos have the potential to engage working memory, one possible reason for this increase of about 10% for this particular channel is suggested by the findings of [117], where the authors conclude that the right dorsolateral prefrontal cortex is heavily involved in spatial working memory tasks. There can be several other explanations for this marked increase in average accuracy, including the emotions triggered in subjects by the different videos, attentiveness, etc.

Comparing these with electrodes T7 (54.6%) and T8 (56.5%) of the Emotiv EPOC, the average accuracies of the temporal electrodes were similar to those of the Muse electrodes TP9 (58.5%) and TP10 (58.1%), unlike those of the frontal electrodes. First, in contrast to the Muse headband, the average accuracies deteriorated substantially for electrodes AF3 (49.7%) and AF4 (47.9%) compared with the temporal electrodes (T7 and T8). Second, the difference between the average accuracies of AF7 (60.6%) and AF3 (49.7%) was 10.9%, and between AF8 (70.3%) and AF4 (47.9%) a huge difference of 22.4% can be observed.

As reported in several studies [118,119,120,121], the performance of the Emotiv EPOC is not up to the mark compared to other EEG signal acquisition devices. This might be because, as per Fakhruzzaman et al. [122], the Emotiv EPOC is not a medical-grade but a consumer-grade device, and its one-size-fits-all design is not as good as it sounds. Beyond these reasons, the deterioration in average accuracy, specifically for the frontal lobe electrodes, can also be explained by the difference in the spatial positions of the two devices' electrodes (Figs. 4 and 5). AF7 and AF8 of the Muse headband are located on the forehead, whereas AF3 and AF4 of the Emotiv EPOC sit on or above the hairline depending on the size of an individual's forehead. The obstruction by hair makes channels AF3 and AF4 much more susceptible to artifacts than channels AF7 and AF8, which rest directly on the skin; as a result, the frontal lobe channels of the Emotiv EPOC performed worse than those of the Muse.

Fig. 4
figure 4

AF7 and AF8 channel locations of the Muse headband

Fig. 5
figure 5

AF3 and AF4 channel locations of Emotiv EPOC

Table 13 provides the average accuracies achieved by each of the 16 combinations of algorithms for all the channels of the Emotiv EPOC when the SWTSD artifact removal technique was used. Across all the results of the Muse headband (Table 10) and the Emotiv EPOC (Table 13), the only channel that exceeded the minimal BCI performance criterion of 70% [123] was channel AF8 of the Muse headband, leading us to conclude that this is the most suitable channel and that the Muse headband is the better device for the VCC problem. The highest average classification accuracy achieved by this channel was 77.7% (4.83 msec average total execution time per subject), obtained with SWTSD and FLS in the pre-processing submodule, PYAR for feature extraction, and SVD and NB for the feature selection and classification submodules respectively. Although none of the channels of the Emotiv EPOC reached the minimal BCI performance criterion of 70% [123], the channel that came closest was T8, with a highest average accuracy of 66.7% (2.96 msec average total execution time per subject) when SWTSD, ICS1, PWelch, PCA and SVM were used.

Table 13 Average accuracies for each of the combination of algorithms when SWTSD were used as an artifact removal technique for all the channels of Emotiv EPOC

The results of the channels located over the occipital lobe of the Emotiv EPOC were surprisingly low. Apart from the limitations of the Emotiv EPOC mentioned above, this may be because, although watching videos triggers visual evoked potentials (VEPs) in the brain, other parts of the brain, including the dorsolateral prefrontal cortex, are more involved or activated when such stimuli are presented.

Future Work: There are several areas we can address in the future to improve our results. For example, the orders of the IIR filters and the dimensionalities used by the feature selection algorithms are currently selected automatically by MATLAB; optimizing these parameters should have an impact on the results. We also used data epochs of 6 seconds, which is a large epoch size for EEG studies, as the stationarity of EEG signals is expected to disappear with increasing epoch duration [124]. This is a crucial issue that we hope to address in the future.

The relevant frequency bands for MI (7–30 Hz, mu and beta bands) [125], ERP (< 4 Hz, delta band) [126] and SSVEP (12–18 Hz) [127] based BCIs are well known. Identifying a relevant frequency band for the VCC problem was beyond the scope of this study. As discussed in our previous work [47], we hope to target high-frequency gamma oscillations, as they are heavily involved in working memory load related activity [128,129,130] and in activities requiring cross-modal sensory processing, i.e. perception combining two separate senses, for example sound and sight [131, 132].

One category of machine learning algorithms, neural networks, and especially deep learning, currently the focus of intense interest among machine learning researchers, was not used in this study. As deep learning algorithms are outperforming conventional machine learning algorithms in almost all types of studies, including EEG-based BCIs [88, 90], we believe that using such algorithms will improve our results substantially. Also, apart from SWT, several other artifact removal tools exist in the literature, e.g. Empirical Mode Decomposition (EMD), adaptive filtering and Independent Component Analysis (ICA), which we hope to apply to the VCC study as well [133].

In addition, in this study we used a single feature (either PWelch or PYAR) in the feature extraction step. However, combining different statistical and non-statistical features from the time, frequency and wavelet domains [133] (e.g. standard deviation, variance, entropy, kurtosis, skewness, periodicity, maximum or minimum power in the three domains, AR features, line length, NEO, FFT-based features, etc.) with different weights, concatenated into a single feature vector, might enhance the classification accuracy significantly; this is also one of our future works.

4 Conclusion

This chapter reviewed existing computational intelligence techniques for pattern recognition in EEG-based BCIs and examined their accuracies, challenges and suitability for one such application, Video Category Classification (VCC).

Based on the results of our VCC experiments and the reports of other studies, the choice of computational intelligence techniques in BCI systems is problem- or application-specific and depends on several factors. For example, as reported in the previous section, data acquired from two different EEG signal acquisition devices (the Muse headband and the Emotiv EPOC) for the same experiment produced considerably different results. The correct choice of the relevant frequency band (e.g. 12–18 Hz for SSVEP [127]) also plays a crucial role in the end results.

In accordance with other studies [36, 134], we also found that proper use and optimization of artifact removal techniques significantly improves classification accuracy. Similarly, depending on the BCI paradigm and type, appropriate selection of feature extraction, feature selection and classification algorithms also has a positive impact on the results.

We hope and believe that the rapid progress in technology, in both hardware (e.g. better signal acquisition devices, better computer and smartphone hardware) and software (better machine learning techniques such as deep learning, continuous improvement of artifact removal techniques, etc.), will in the near future raise the accuracy and feasibility of BCI systems to a level at which they can be deployed in real-world scenarios (e.g. online BCIs), improving the lifestyle of physically disabled people and the quality of life of people across the world.