
1 Introduction

The development of brain-computer interface (BCI) systems has attracted significant attention because of their potential commercial applications, such as controls for smart games and neuroergonomics [1]. Building such robust systems calls for efficient, online processing methods. The successful identification of a single-trial P300 component could become the core of a BCI system, and the selection of an appropriate preprocessing method could improve the classification of the P300 component, ultimately improving the performance of the BCI system [2, 3].

It is evident from the literature that the identification of P300 components in single-trial ERPs has significant importance for BCIs. However, the detection of the P300 component in single-trial ERP analysis is often confounded by low signal-to-noise ratios (SNR). Hence, P300 classification on single trials is challenging and typically yields low classification accuracies [4]. Among other approaches, the low-SNR problem may be addressed at the preprocessing level, because reducing noise improves the SNR. Many different preprocessing methods exist; a scientific way forward is to compare several of them and investigate their impact on the classification of the P300 component.

In the literature, various linear and non-linear artifact reduction methods have been proposed [5, 6]. In general, these methods can be categorized as manual, semi-automatic, and fully automatic. Manual methods rely on visual inspection of long traces of EEG recordings, followed by the deletion or rejection of segments confounded by artifacts. However, manual methods are inefficient because rejecting noisy data segments can also discard useful neuronal information. Semi-automatic methods combine machine-based correction with manual input; examples include independent component analysis (ICA) [7] and canonical correlation analysis (CCA) [8], where the independent components (ICs) are computed by machine and the artifact-related components are selected manually. The clean EEG data are then reconstructed without the artifact-related components. ICA assumes that each IC represents an independent source inside the brain; this is a weak assumption, and the identification of a noisy component can be an iterative process. Fully automatic methods, by contrast, perform artifact reduction or correction without any human intervention. An example is ICA with an objective selection criterion for artifact-related components, such as wavelet-enhanced ICA (wICA) [9]: such a method may compute a threshold for each IC and compare it with a preset value to classify the component as either artifact or clean data.

The objective of this paper is to investigate the impact of different preprocessing methods on the classification of P300 components. The paper presents a machine learning (ML) framework for evaluating single-trial P300 detection under three scenarios: no preprocessing, amplitude-based artifact rejection, and a combination of amplitude-based artifact rejection and wICA [9]. The results section tabulates the resulting classification efficiency. The amplitude-based artifact rejection method is the most common approach, whereas wICA has been claimed to be an efficient online method for EEG analysis [9]; in this manuscript, the wICA method is tested on ERP data.

2 Methodology

Figure 1 presents a comprehensive block-level representation of the proposed ERP experimental setup and analysis. A detailed description of the experimental setup is provided in Sects. 2.1 and 2.2. The recorded ERP data were further processed by the proposed ML methodology, comprising ERP segmentation, online noise removal, feature extraction, and classification. In particular, the paper aims to develop an efficient, online method for binary classification of single-trial P300 and non-P300 sweeps. The efficiency of the system is quantified by the classification recognition, precision, and recall.

Fig. 1.

The proposed automated method for ERP single-trial classification. The display randomly shows numbers between one and nine and the related EEG/ERP signal is acquired using a 10/20 EEG cap. After preprocessing and feature extraction, single-trial binary (target vs non-target) classification and its evaluation follow.

2.1 ERP Experimental Setup

In this study, event-related potential (ERP) data were acquired from 250 children aged 4 to 7 years, referred to as the study participants. The participants performed the guess-the-number experiment (a detailed description is provided in Sects. 2.2 and 2.3) [10]. The participants and their parents signed informed consent forms and agreed to voluntary participation. The experimental procedure was explained to each participant before commencing the EEG recording.

2.2 Guess the Number Experiment

Figure 2 shows the setup of the guess-the-number experiment [10]. The experiment was designed to record the P300 responses generated by the brain. At the start of the recording session, the participants were asked to think of a digit between 1 and 9. The participant was then shown a computer screen displaying a random sequence of the digits 1 to 9. During this activity, the ERP responses were recorded and averaged so that the P300 could be observed. It was hypothesized that the elicitation of the P300 corresponded to the digit chosen by the participant at the start of the ERP recording session.

Fig. 2.

The guess-the-number experiment [10]. The measured participant in the background is exposed to a random sequence of visual stimuli from one to nine, while the experimenters control the stimulation (on the right) and observe the event-related potentials as they are averaged in real time. Once they reach a conclusion about the target stimulus, the experiment is stopped.

2.3 Data Acquisition

The EEG data were acquired with a mobile EEG laboratory; a more comprehensive description is provided in [10]. In brief, the following hardware was used: a standard BrainVision V-Amp amplifier; a standard small or medium 10/20 EEG cap; standard reference, ground, and EOG electrodes; a monitor for presenting the numbers; and two notebooks running the stimulation and recording software. Only three electrodes, Fz, Cz, and Pz, were used. The stimulation protocol was developed and run in the Presentation software tool (Neurobehavioral Systems, Inc.). The Brain Vision Recorder was used for recording and storing the raw EEG data, metadata describing the raw data, and stimuli data.

2.4 Segmentation of ERP Data

In this study, preprocessing of the EEG data was followed by segmentation according to the timing information of each stimulus. The segmentation resulted in two types of ERP epochs: (1) single trials representing the P300 responses (targets) and (2) single trials representing the non-P300 epochs (non-targets). The prestimulus interval was 200 ms and the post-stimulus interval was 1000 ms; the prestimulus interval was used for baseline correction in further ERP analysis. For illustration and validation purposes, Fig. 3 shows the grand averages of the ERP data for the target and non-target stimuli. A clear P300 component follows the target stimuli.
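The segmentation and baseline-correction step can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the sampling rate of 1000 Hz (which makes the 1200 ms epoch 1200 samples long, consistent with the component length quoted in Sect. 2.5) and the function name are assumptions of this sketch.

```python
def extract_epoch(signal, stim_index, fs=1000, pre_ms=200, post_ms=1000):
    """Cut one ERP epoch around a stimulus onset and baseline-correct it.

    The prestimulus interval (200 ms) supplies the baseline: its mean is
    subtracted from the whole epoch. fs = 1000 Hz is an assumption here.
    """
    pre = pre_ms * fs // 1000     # samples before stimulus onset
    post = post_ms * fs // 1000   # samples after stimulus onset
    epoch = signal[stim_index - pre:stim_index + post]
    baseline = sum(epoch[:pre]) / pre
    return [v - baseline for v in epoch]
```

With the assumed 1 kHz rate, each returned epoch has 1200 samples and a zero-mean prestimulus interval.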

Fig. 3.

The grand averages over the study participants; the two ERPs correspond to the target and non-target stimuli

2.5 Preprocessing Scenarios

Figure 4 shows the three preprocessing scenarios: ‘no preprocessing’, ‘amplitude-based artifact rejection’, and ‘combination of artifact rejection and wICA’. The results section shows the impact of the three scenarios on classification performance. In this study, the goal of the recording was to obtain high-quality data; therefore, the participants were asked to minimize eye blinks and head movements during the recording. However, eye movements are unavoidable and often appear in the recorded data. Hence, preprocessing is required to obtain clean EEG data.

Fig. 4.

Preprocessing scenarios and computation of data matrices

Scenario 1: Data with No Preprocessing (No-Prep)

These data were used without any artifact (epoch) rejection; they served as the baseline and were used for comparison with the two other methods employed in this study.

Scenario 2: Amplitude-Based Artifact Rejection Method

In this scenario, ERP epochs containing artifacts were rejected based on their amplitude. This is done by finding the maximum amplitude of each epoch and comparing it with a threshold value. In general, an ERP amplitude can reach 20 to 30 microvolts; any ERP epoch with an amplitude of 50 microvolts or more could be confounded by an ocular artifact [11]. Hence, to get rid of the artifactual epochs, such ERPs were eliminated from the analysis.
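The rejection rule can be sketched as follows, assuming each epoch is a sequence of sample amplitudes in microvolts; the function name and list representation are illustrative, not part of the original pipeline.

```python
def reject_artifact_epochs(epochs, threshold_uv=50.0):
    """Keep only epochs whose peak absolute amplitude stays below the
    threshold (50 microvolts, following [11]); epochs at or above it
    are treated as likely ocular artifacts and dropped."""
    return [ep for ep in epochs if max(abs(v) for v in ep) < threshold_uv]
```

An epoch peaking at 80 µV would be dropped, while one peaking at 30 µV survives.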

Scenario 3: Combining Amplitude-Based Artifact Rejection Method and wICA

In the original paper [9], the wICA method was applied to resting-state EEG data and proved effective; that paper also provides a complete description of the method. In the present study, a Daubechies wavelet function was used with decomposition level j = 10. The decomposition level was set to the base-2 logarithm of the length of the ICA component; in our case the component length was 1200 samples, so j = floor(log2(1200)) = 10. The selection of the Daubechies wavelet was motivated by its near-optimal time-frequency localization properties for EEG signals [12]. The wavelet scale was calculated as d = 2^j; with j = 10, this gives d = 2^10 = 1024.
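The level arithmetic can be checked in a couple of lines; this sketch only reproduces the calculation of j and d and is not an implementation of wICA itself.

```python
import math

# Decomposition level for an ICA component of N = 1200 samples:
# j = floor(log2(N)), and the wavelet scale is d = 2**j.
n_samples = 1200
j = math.floor(math.log2(n_samples))
d = 2 ** j
print(j, d)  # → 10 1024
```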

2.6 Feature Extraction

Complexity Features

In this manuscript, the composite permutation entropy index (CPEI) [13] and the fractal dimension (FD) [14] were used to extract the complexity of the single-sweep ERP segments. Both measures served as complexity features for the underlying ERP segments.
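For illustration, the following sketch implements plain normalized permutation entropy, the building block underlying the CPEI; it is not the exact CPEI computation of [13], and the parameter choices (order 3, delay 1) are assumptions of this sketch.

```python
import math
from collections import Counter

def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy of a 1-D signal.

    Counts ordinal patterns of length `order` and returns their Shannon
    entropy normalized to [0, 1]: 0 for a fully regular signal, values
    toward 1 for increasingly irregular ones.
    """
    n = len(x) - (order - 1) * delay
    patterns = Counter()
    for i in range(n):
        window = [x[i + j * delay] for j in range(order)]
        # The ordinal pattern is the ranking of samples in the window
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        patterns[pattern] += 1
    h = -sum((c / n) * math.log2(c / n) for c in patterns.values())
    return h / math.log2(math.factorial(order))
```

A monotonic ramp contains a single ordinal pattern and scores 0, while an alternating signal produces a nonzero value.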

Morphological Features

Morphological features provide time-domain information, including the shape of the individual ERP epochs; the corresponding mathematical formulas are given in [15]. In this manuscript, the following morphological features were computed: latency, maximum signal value, latency-to-amplitude ratio, absolute amplitude, absolute-amplitude-to-latency ratio, positive area, negative area, total area, absolute total area, and peak-to-peak difference.

Integration of Features

The classification performance was computed on the combined (integrated) features. The integration concatenated the CPEI, FD, and morphological features into a single vector per epoch. The final data matrix thus had dimensions of number of epochs × number of features; the dimensions for the different scenarios are presented in the results section.
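A minimal sketch of the feature concatenation, assuming one CPEI value, one FD value, and a list of morphological features per epoch (the function name is hypothetical):

```python
def build_feature_matrix(cpei, fd, morph):
    """Concatenate the CPEI, FD, and morphological features of each
    epoch into one row, yielding an (epochs x features) matrix."""
    assert len(cpei) == len(fd) == len(morph)
    return [[c, f] + list(m) for c, f, m in zip(cpei, fd, morph)]
```

For two epochs with three morphological features each, this yields a 2 × 5 matrix.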

2.7 Classification, Validation, and Performance Evaluation

In this paper, three classification models were trained and tested: k-nearest neighbor (kNN), support vector machine (SVM), and logistic regression (LR). A detailed description of these models is out of the scope of this paper and can be found elsewhere: kNN [16], SVM [17], and LR [18]. Many studies in the literature have evidenced the significance of choosing a linear SVM because it is considered a stable, standard classification method [19]. In this study, the Matlab-based ‘Classification Learner App’ was used to train and test the SVM; the regularization type was set to ‘lasso’. In addition, overfitting was reduced by employing 10-fold cross-validation (10-CV).

Figure 5 shows the 10-CV scheme. In 10-CV, the observations in the EEG data matrix are randomly divided into test and training subsets. The random division gives every observation the opportunity to be used both for testing and for training, which in part prevents overfitting of the classifiers.
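The random division into folds can be sketched in a few lines; the interleaved slicing used here is one simple way to form near-equal folds and is an assumption of this sketch, not necessarily the splitting performed by the Matlab app.

```python
import random

def ten_fold_indices(n_obs, n_folds=10, seed=0):
    """Randomly partition observation indices into `n_folds` folds.

    Each fold serves once as the test set while the remaining folds
    form the training set, so every observation is used in both roles
    across the cross-validation rounds."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    return [idx[k::n_folds] for k in range(n_folds)]
```

The folds are disjoint and jointly cover all observations; training indices for round k are simply the union of the other nine folds.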

Fig. 5.

The classification scheme for validating the proposed method

The classifiers’ performance was evaluated by computing the classification precision, recall, and recognition. The confusion matrix was constructed from the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Based on these values, the following performance metrics were computed, as given in Eqs. 1, 2, and 3 below:

$$ Precision = \frac{TP}{TP + FP} $$
(1)
$$ Recall = \frac{TP}{TP + FN} $$
(2)
$$ Recognition = \frac{TP + TN}{TP + TN + FP + FN} $$
(3)
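Eqs. 1–3 translate directly into code; the function names below are ours, chosen to mirror the equations.

```python
def precision(tp, fp):
    # Eq. 1: fraction of predicted positives that are true positives
    return tp / (tp + fp)

def recall(tp, fn):
    # Eq. 2: fraction of actual positives that were detected
    return tp / (tp + fn)

def recognition(tp, tn, fp, fn):
    # Eq. 3: overall accuracy over all confusion-matrix entries
    return (tp + tn) / (tp + tn + fp + fn)
```

For example, with TP = 8, TN = 7, FP = 2, FN = 3, the recognition is 15/20 = 0.75.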

3 Results

Table 1 provides the model types and parameter values of the classifiers so that the study can be replicated on common datasets. In brief, the parameters were chosen according to the classifier structure and the underlying classification problem. Only two classes were available, so this was a binary classification problem. For logistic regression, the link function was set to logit (the LR classifier uses a logistic function), and a binomial distribution suited the binary classification. The kNN was initialized with k = 1 to keep it simple, and the distance measure was a commonly used metric, Euclidean distance with equal weights; the classification method was one-vs-one, which suits binary classification, with data standardization. For the SVM, a simple structure was investigated, leading to a linear kernel with automatic scale. As with the kNN classifier, the SVM ‘multiclass method’ parameter was set to one-vs-one and ‘standardize data’ was set to true to remove biases and outlier effects in the data. No method was used to optimize the hyperparameters.

Table 1. Classification model types and parameters

Table 2 provides an overall picture of the classification performance for the three preprocessing scenarios. The classification recognition reports the classification accuracy. The maximum accuracy was observed in scenario 3, where the SVM and kNN both reached 76.3%. There was little difference (a slight improvement) between scenarios 1 and 2, showing that the amplitude-based artifact rejection method removed artifacts and slightly improved the SNR, which helped the classification of P300 vs non-P300.

Table 2. Classification efficiency for the three scenarios

The performance of the classifiers can be explained by the given data. In the first scenario, the noise in the data was not removed, which lowered the overall signal-to-noise ratio (SNR); consequently, the classifier could not separate the single-trial P300 epochs from the non-P300 epochs well. The classifier performance improved with the rejection of artifactual epochs. This behavior was expected and confirmed that the SNR of the data improved with the removal of the artifactual epochs. Finally, these cleaner epochs were subjected to further noise removal with the wICA method; in this scenario, the classifiers performed much better than in scenarios 1 and 2. These results underline the importance of preprocessing ERP data before any further analysis.

In this study, it was observed that there was a lower limit for the artifact-rejection threshold, i.e., 20 µV. Decreasing the threshold below this value resulted in decreased classification performance. One possible reason is that many rejected epochs still carried useful information; with this information removed, the classifiers could no longer be trained well.

Lastly, the wICA method is based on computing ICA components and therefore inherits the limitations associated with ICA. In this study, applying wICA directly to the raw ERP segments decreased the classification accuracy, because ICA could not perform well in the presence of so many artifactual segments. Therefore, the application of wICA should follow the rejection of artifactual epochs.

4 Discussion

The data used for this study were obtained from 250 school-age children and are publicly available [10]. They were collected during a simple BCI experiment in which participants guessed a number from 1 to 9 following visual stimulation with the corresponding numbers. Such a large number of participants is unique among publicly available P300 datasets and allows signal processing and machine learning methods to be compared with sufficient statistical significance. In [20] and [21], deep learning techniques were compared with state-of-the-art classifiers as parts of an off-line BCI system; this manuscript, in contrast, focuses on single-trial classification.

As shown in Table 2, the wICA method provided better classification results than the amplitude-based artifact rejection method alone on the raw ERP data. There are a few possible reasons for this. Many studies have evidenced that ICA is a successful method for separating artifact sources in EEG data, and the wICA method employed in this study enhances ICA by further improving the detection of artifactual components, as it can successfully reduce the “leak” of cerebral activity of interest into the ICA components [9]. In addition, the thresholding criterion of wICA emphasizes the correction of EEG data without any deletion. This strategy preserves useful neuronal information, in contrast with methods that delete whole ERP segments based on maximum amplitude values.

Besides the methods considered in this manuscript, there are other ways to improve classification accuracy, though they have their own limitations. The conventional and straightforward method for P300 extraction is averaging, which reduces random noise and hence significantly improves the signal-to-noise ratio of the P300 component. The improvement grows with the number of epochs averaged, and there is a minimum (at least 20 trials) for the number of epochs to be averaged [22]. However, the required number of ERP epochs reduces the speed of a BCI system. Conversely, methods that extract the parameters of the P300 component, such as its amplitude and latency, from single-trial ERP epochs face the challenge of low character-detection accuracy, although they significantly improve the speed of the BCI system.

By directly addressing the identification of single-trial P300 components, the speed and accuracy of a BCI speller can be improved. An online EEG ML framework normally includes online preprocessing, feature extraction, feature selection, classification, and validation. This allows researchers to improve a P300-based system at multiple levels, for example at the preprocessing stage by incorporating the best preprocessing method. Moreover, EEG data recorded against one reference can be converted to a different reference, such as REST, to achieve a better signal-to-noise ratio [23]. Improvements can also be made at the feature extraction and classification stages.

However, this study also has some limitations. First, rejecting epochs contaminated by artifacts decreases the number of epochs available for classification; in online BCIs, this would translate into lower transfer bit-rates. Therefore, more research is needed on how to set the amplitude threshold optimally. Second, although ICA and related methods can be applied even to three-channel EEG data [24], the separation of artifact-related independent components becomes much easier with more channels. Therefore, further analysis of why wICA can increase class separability with a low number of EEG channels would shed light on the achieved results. Finally, although the large number of datasets and the 10-fold cross-validation limit overfitting, it is still not clear whether the results can be reproduced for adults or for people with disabilities.

5 Conclusion

The paper has investigated single-trial ERP extraction by comparing the effects of preprocessing on the identification of the P300 component, and it has presented an efficient classification scheme for this purpose. The comparison shows that the wICA method yields the best results for ERP single-trial extraction compared with the amplitude-based artifact rejection method alone. Based on these results, it can be concluded that the wICA method can be utilized in the development of BCI systems.