Introduction

A brain–computer interface (BCI) is designed to establish a communication link between the human brain and a computer [1]. A BCI does not depend on muscle or peripheral nervous system activity. In particular, a BCI could help patients suffering from amyotrophic lateral sclerosis (ALS) to communicate or to complete various daily tasks, including controlling a computer or typing messages on a virtual keyboard. In recent years, auditory BCI (aBCI) has become a topic of great interest in computational auditory neuroscience. The aBCI utilizes human auditory pathway responses and allows users to operate external devices more quickly and simply, based on auditory evoked responses to sound stimuli. We utilize a spatial auditory paradigm to create a new aBCI system with which users consciously direct their attention to different locations in a horizontal surround sound environment with various noise stimuli, as depicted in Fig. 1. The spatial aBCI concept is founded on a basic feature of the human auditory pathway, which is very sensitive to the location of changing spatial auditory sources [2]. The auditory pathway also has a very good temporal resolution, which is an additional feature we would like to utilize in the spatial aBCI design. This will make it possible to reduce the inter-stimulus intervals (ISI) of the presented sounds in comparison with vision-based applications [3]. Contemporary applications have thus far failed to use rear-to-the-head loudspeakers, a setting postulated as optimal yet still not fully realized in Schreuder et al. [4]. Our proposal includes sound presentation from rear loudspeakers, thus realizing an eight-command BCI (octagonal horizontal plane full surround sound set-up). In previous publications, we discussed EEG electrode selection, event-related potential (ERP) feature optimization and linear discriminant analysis classification [5]. These earlier results were not sufficient for a successful online aBCI system application.

Fig. 1

Spatial auditory BCI paradigm concept with eight loudspeakers in the upper part of the figure. The lower graph visualizes the stimulus presentation concept in the time domain. Each stimulus is presented for 30 ms with 170-ms silent breaks, so the ISI is set to 200 ms

In this paper, we discuss a novel aBCI paradigm based on the full surround sound horizontal stimuli as an informative cue with the support of the P300 component (known as the “aha response”), which appears at latencies around and after 300 ms and is evoked by the expected/instructed targets [6]. Our hypothesis is that a significant ERP response will be found when subjects attend to the target direction and ignore the non-targets. To find the significant differences, we propose analyzing the response statistically to identify only those ERP latencies that contribute to the classification enhancement, in contrast to state-of-the-art approaches [1], in which the whole response is taken as a feature for subsequent classification.

The hypothesis of our research is that the horizontal full surround sound aBCI paradigm could be improved with the careful selection of ERP discriminative features that allows the use of the rear-to-the-head sound directions. For this purpose, we introduce a statistical response analysis, which ultimately leads to the final improvement in the information transfer rate (ITR).

The paper is organized as follows. In the next section, the experimental paradigm is explained together with the EEG preprocessing steps. Then, we discuss EEG feature selection using the method of statistical analysis of ERP responses. Finally, we present classification results obtained with a Gaussian Naive Bayesian Classifier (GNBC), which leads to an improvement in the ITR scores. The final section summarizes the paper.

Methods

Within the framework of the proposed novel aBCI paradigm, the subjects were asked to attend to and count targets while ignoring non-targets, as in the classical oddball paradigm [1, 6]. An instruction indicating which direction should be attended to in each trial was displayed on a computer display located in front of the subject. First, we conducted psychophysical experiments to check possible preferred directions of the subjects by comparing response time delays. Next, we conducted EEG recording experiments in an offline BCI setting. The EEG signals were recorded with a g.MOBILab+ EEG amplifier by g.tec. We used novel dry EEG electrodes, g.SAHARA, to further improve the subjects’ comfort, since these do not require conductive gel. The reference and ground electrodes were attached behind the left and right ears, respectively. To reduce unnecessary noise and to prevent degradation of the EEG signal quality as a result of electromyography (EMG) noise related to muscular movement in the ERP responses, the subjects were asked to minimize eye blinking and facial and body movements during the experiments.

EEG experiments designed to validate the proposed spatial aBCI paradigm utilizing the P300 latency were conducted in the Multimedia Lab at the Life Science Center of TARA, University of Tsukuba, Japan. All the experimental procedures and study targets were explained to the subjects, who agreed to participate voluntarily. The experiments were conducted in agreement with the WMA Declaration of Helsinki—Ethical Principles for Medical Research Involving Human Subjects. All the experiments were conducted in a silent and low reverberation room in order to limit any interference from environmental acoustic noise.

The auditory stimuli were presented through eight loudspeakers in an octagonal setting, as depicted in the upper part of Fig. 1. The eight sound stimuli directions proved to be optimal from the points of view of aBCI and human subject spatial auditory performance [4].

Two short noise burst stimuli, white and pink, were used as depicted in the lower part of Fig. 1 and described in the following section.

Psychophysical Experiment

In the psychophysical experiment, only the behavioral responses (button presses after the instructed and perceived target stimuli) were recorded. Different response time delays would suggest that cognitive load and task difficulty change as a function of the spatial direction. The subjects were requested to press a button immediately after an instructed target direction was presented. The response delays with respect to the auditory stimulus onsets were recorded and further analyzed in order to compare them across the spatial directions.

The results of the psychophysical experiment are presented in Tables 1 and 2. From these tests, we conclude that the mean response delays did not differ among the eight octagonal stimulus directions (differences among means were statistically non-significant in pairwise t tests) for either white or pink noise, which also confirms the psychophysical experiments reported in Schreuder et al. [4].

Table 1 The spatial sound psychophysical experiment results
Table 2 The confusion matrix results from the psychophysical experiment averaged for all the subjects for pink and white noise stimulus, respectively

The Offline aBCI Experiment Protocol

The experimental hypothesis was that we would be able to distinguish from the ERP shape (mainly based on the P300 response latencies) which direction the subject attended to in the spatial auditory paradigm experiment.

To test the hypothesis, we conducted a series of EEG recording experiments in the offline BCI mode (with no instant feedback or classification results given to the subjects [1]). The EEG recording experiments were conducted with ten healthy subjects (eight males; two females; age range from 23 to 42 years, mean 25.8, SD 6.34). The subjects were requested to sit in a comfortable chair in the center of eight octagonally positioned loudspeakers, and the dry EEG electrodes were positioned on the scalp. The elevation of the loudspeakers was fixed at the subject’s ear level in order to create a horizontal spatial plane defined by the eight loudspeakers (see Fig. 1). The volume of the sound was set to 72 dB.

The sound stimuli were presented in random order and one at a time from a single loudspeaker (a single trial consisted of a delivery of a single target and seven non-targets). We employed two broadband noise stimuli types that allowed us to utilize the two spatial localization mechanisms of the human auditory pathway, the interaural time delay (ITD) and the interaural level difference (ILD) [2]. The white and pink noise stimuli both had 30 ms lengths with 5-ms linear attack and sustain intervals. For each subject and each stimulus, we performed eight sessions (altogether 64 targets and 448 non-targets were presented). Each subject was requested to focus on the instructed target direction which was presented on a computer display. The subject ignored the other non-target directions. Each subject was also requested to control her/his eye movements to decrease the unnecessary EMG noise during the experiments. Before each experiment, the subject was allowed a short practice session to get familiar with the spatial auditory stimulus conditions.
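The presentation order described above can be illustrated with a minimal sketch. It assumes that every trial presents each of the eight directions exactly once in shuffled order, with onsets spaced by the 200-ms ISI given in the caption of Fig. 1. The function name `make_trial` and the fixed random seed are hypothetical and serve only to make the example reproducible; this is not the authors' stimulation software.

```python
import random

N_DIRECTIONS = 8   # loudspeakers in the octagonal set-up
STIMULUS_MS = 30   # length of each noise burst
ISI_MS = 200       # onset-to-onset interval (30 ms sound + 170 ms silence)


def make_trial(target, rng):
    """Return (onset_ms, direction, is_target) tuples for one trial.

    Each direction is presented exactly once in random order, so a trial
    holds one target and seven non-targets.
    """
    order = list(range(N_DIRECTIONS))
    rng.shuffle(order)
    return [(i * ISI_MS, d, d == target) for i, d in enumerate(order)]


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
trial = make_trial(target=3, rng=rng)
for onset, direction, is_target in trial:
    label = "target" if is_target else "non-target"
    print(f"{onset:4d} ms  speaker {direction}  {label}")
```

Under these assumptions, one trial occupies 8 × 200 ms = 1.6 s of stimulation.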

EEG Acquisition

The EEG signals were recorded by the g.MOBILab+ bio-amplifier with eight dry g.SAHARA electrodes. The EEG recording system captured the neurophysiological signals in a frequency range of 0.1–40.0 Hz. The following eight EEG electrode positions were chosen: P3, P4, P5, P6, Cz, CPz, Pz, and POz, as in the 10/10 system [7]. The eight EEG channels were sampled at 256 Hz and stored using a custom application programmed in the MATLAB and Simulink environments.

EEG Response Analysis

The analysis of EEG ERP responses, leading to the final eight-direction spatial auditory classification for target and non-target locations, was composed of the following three steps:

  1. EEG signal preprocessing: band-pass filtering, epoch segmentation, and artifact rejection;

  2. Discriminative feature extraction using the analysis of variance (ANOVA) method;

  3. The final classification of evoked responses using the GNBC.

We describe the above steps in detail in the following sections.

EEG Preprocessing

First, we digitally filtered the signals with two fifth-order Butterworth filters, a high-pass and a low-pass, with cut-off frequencies of 0.5 and 25 Hz, respectively. The low-pass filtering removed possible muscle-activity-related artifacts. The high-pass filtering removed the direct current-related drifts of the EEG signals, as well as slow eye movement artifacts.
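The filtering step can be sketched in Python with SciPy as a stand-in for the MATLAB processing used in the study. The filter orders, cut-off frequencies, and 256-Hz sampling rate follow the text; the zero-phase (forward-backward) application via `sosfiltfilt`, the function name `bandlimit`, and the synthetic test signal are assumptions of this sketch.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 256.0  # sampling rate in Hz, as stated in the EEG acquisition section


def bandlimit(eeg, fs=FS, hp_hz=0.5, lp_hz=25.0, order=5):
    """Apply a high-pass and then a low-pass Butterworth filter.

    eeg: array of shape (n_channels, n_samples); filtering runs along
    the last axis. Zero-phase filtering is an assumption of this sketch.
    """
    sos_hp = butter(order, hp_hz, btype="highpass", fs=fs, output="sos")
    sos_lp = butter(order, lp_hz, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos_lp, sosfiltfilt(sos_hp, eeg, axis=-1), axis=-1)


# Synthetic check signal: DC offset + 10 Hz (in band) + 60 Hz (out of band).
t = np.arange(0, 10.0, 1.0 / FS)
raw = 50.0 + np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 60 * t)
filtered = bandlimit(raw[np.newaxis, :])[0]
```

In this synthetic case, the offset and the 60-Hz component are suppressed while the 10-Hz component passes essentially unchanged.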

Next, the EEG signals were segmented into ERP-related epochs. Each epoch started 100 ms before the stimulus onset and ended 700 ms after it. We used the 100-ms prestimulus interval as the baseline (see Fig. 2).
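A minimal sketch of the segmentation is given below. It assumes that the baseline correction subtracts the mean of the 100-ms prestimulus interval from each channel, and that onsets are given as sample indices; the function name `extract_epochs` and the random test data are hypothetical.

```python
import numpy as np

FS = 256  # sampling rate in Hz


def extract_epochs(eeg, onsets, fs=FS, pre_ms=100, post_ms=700):
    """Cut baseline-corrected epochs around stimulus onsets.

    eeg:    array (n_channels, n_samples)
    onsets: stimulus onset positions in samples
    Returns an array (n_epochs, n_channels, n_pre + n_post).
    """
    n_pre = int(round(pre_ms * fs / 1000))    # 26 samples at 256 Hz
    n_post = int(round(post_ms * fs / 1000))  # 179 samples at 256 Hz
    epochs = []
    for onset in onsets:
        start, stop = onset - n_pre, onset + n_post
        if start < 0 or stop > eeg.shape[1]:
            continue  # skip epochs that would run past the recording
        segment = eeg[:, start:stop].astype(float)
        baseline = segment[:, :n_pre].mean(axis=1, keepdims=True)
        epochs.append(segment - baseline)
    return np.stack(epochs)


rng = np.random.default_rng(0)
eeg = rng.normal(size=(8, 2000)) + 10.0  # eight channels with an offset
onsets = np.arange(100, 1500, 51)        # roughly one onset every 200 ms
epochs = extract_epochs(eeg, onsets)
```

Because the ISI (200 ms) is shorter than the epoch (800 ms), consecutive epochs overlap in time, which the sketch permits.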

Fig. 2

The grand mean averaged ERP responses of the ten subjects. Results of ERP P300 response for pink noise (upper panel) and white noise (lower panel). The solid red lines depict the attended targets and the blue dashes the ignored non-targets. The differences between targets and non-targets are very easy to distinguish after 300 ms (the so-called aha or P300 response) (Color figure online)

In the next step, epochs containing eye movement artifacts were rejected. Auditory spatial stimuli are known to cause uncontrolled eye movements in subjects [8]; in the current approach, the affected epochs were removed using a threshold set at 80 μV (a signal amplitude level above the usual EEG activity). The rejected epochs were not processed further, since in the current approach the emphasis was on the spatial paradigm validation. In the following sections, feature extraction and ERP classification results are introduced.
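The threshold-based rejection can be sketched as follows. The text does not state whether the 80-μV criterion applies to the absolute amplitude or to a peak-to-peak range; this sketch assumes the absolute amplitude on any channel, and the function name `reject_artifacts` is hypothetical.

```python
import numpy as np


def reject_artifacts(epochs, threshold_uv=80.0):
    """Drop epochs whose absolute amplitude exceeds the threshold.

    epochs: array (n_epochs, n_channels, n_samples) in microvolts.
    An epoch is rejected if any sample on any channel exceeds the limit
    (an assumption of this sketch). Returns (clean_epochs, keep_mask).
    """
    peak = np.abs(epochs).max(axis=(1, 2))
    keep = peak <= threshold_uv
    return epochs[keep], keep


epochs = np.zeros((3, 8, 205))
epochs[0] += 10.0          # ordinary EEG-sized activity
epochs[1, 2, 50] = 120.0   # a large deflection on one channel
epochs[2] -= 30.0
clean, keep = reject_artifacts(epochs)
```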

ERP Feature Extraction Using ANOVA of the ERP Latencies

The aim here was to optimize the EEG response domain (mainly the P300 response) so as to provide better separability for further classification. To do this, we conducted ANOVA of the two-class single-trial ERP distributions (target vs. non-target responses) in the spatial auditory experimental setting. The ERP response distributions passed “normality tests”. The ANOVA results were comparable to those of more flexible methods such as the area under the curve analysis, yet the proposed ANOVA yielded the best results in our case. The majority of spatial aBCI applications aim at the P300 response latency [4, 5, 9, 10]. The example in Fig. 2 shows the averaged ERP responses to targets and non-targets (note the latency range of 300–600 ms). Next, the ANOVA method was applied to compare the differences of response distributions in single trials for each sample point of the collected ERPs. As a result, we were able to extract discriminative information leading to later classification optimization. The results of this analysis are depicted in Figs. 3 and 4. The bottom panels in these figures visualize the ANOVA’s p values for the eight electrodes separately in each row using a color scheme, as explained in the color bars next to the panels. Each p value is the probability, under the null hypothesis that the target and non-target distributions have equal means, of observing a difference at least as large as the measured one (in life sciences, p < 0.05 is usually considered significant). The results in Figs. 3 and 4 clearly show that the postulated P300 latency area in the range 300–600 ms is the best to discriminate attended targets from ignored non-targets. This finding supports our hypothesis that the P300 latencies are also related to spatial cognition in the human brain.

Fig. 3

Grand mean average auditory evoked responses to spatial white noise stimuli of the ten subjects from the eight electrodes plotted separately in each row of the panels. The top panel shows the grand mean averaged response to the targets. The middle panel presents the grand mean averaged responses to non-targets. The bottom panel depicts the p values from the ANOVA for the eight electrodes separately

Fig. 4

Grand mean average auditory evoked responses to spatial pink noise stimuli of the ten subjects from the eight electrodes plotted separately in each row of the panels. The top panel shows the grand mean averaged response to the targets. The middle panel presents the grand mean averaged responses to non-targets. The bottom panel depicts the p values from the ANOVA for the eight electrodes separately

Next, the binary classification problem is discussed. We evaluate our hypothesis that the “hand-picked” P300 latency ERP periods are significant features that improve the binary target vs. non-target classification accuracy. In order to find the most discriminable features in the ERP responses, we used the results of the ANOVA method described above, applied to all the ERP latencies. We “hand-picked” only those samples within each subject’s ERPs for which the p values were smaller than 0.05 (as depicted in blue in Figs. 3 and 4) in the range 300–600 ms.
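The latency selection can be sketched for a single electrode as follows. SciPy's `f_oneway` is used here as a stand-in for the ANOVA implementation of the study; it runs one test per sample point when given two-dimensional arrays of shape (trials, samples). The function name `select_latencies`, the 26-sample prestimulus offset (100 ms at 256 Hz), and the synthetic data with an artificial deflection are assumptions of this sketch.

```python
import numpy as np
from scipy.stats import f_oneway

FS = 256    # sampling rate in Hz
N_PRE = 26  # prestimulus samples in each epoch (100 ms at 256 Hz)


def select_latencies(targets, nontargets, fs=FS, n_pre=N_PRE,
                     window_ms=(300, 600), alpha=0.05):
    """Return indices of epoch samples with p < alpha inside the window.

    targets, nontargets: arrays (n_trials, n_samples) for one electrode.
    """
    _, p = f_oneway(targets, nontargets)  # one test per sample point
    latency_ms = (np.arange(targets.shape[1]) - n_pre) * 1000.0 / fs
    in_window = (latency_ms >= window_ms[0]) & (latency_ms <= window_ms[1])
    return np.flatnonzero(in_window & (p < alpha)), p


# Synthetic data: targets carry a strong positive deflection at 300-600 ms.
rng = np.random.default_rng(1)
n_samples = 205
latency_ms = (np.arange(n_samples) - N_PRE) * 1000.0 / FS
p300 = 5.0 * ((latency_ms >= 300) & (latency_ms <= 600))
targets = rng.normal(size=(64, n_samples)) + p300
nontargets = rng.normal(size=(448, n_samples))
selected, p_values = select_latencies(targets, nontargets)
```

With real recordings, only a subset of the samples in the 300–600 ms window would normally pass the p < 0.05 criterion, and that subset would differ across subjects and electrodes.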

The Offline ERP Classification in the aBCI Paradigm

We performed the classification steps for each subject separately in the aBCI offline mode, which means that all the procedures were conducted after the collection of data from each experiment, without any online feedback to the subjects. The classification procedure in our case is a so-called binary task paradigm (target vs. non-target). In the classifier training and testing step, we selected 64 targets and a random subset of 64 non-targets (from the 448 available) to balance the number of members in each class. The resulting theoretical chance level was thus 50 %. Based on our previous classification trials reported in Cai et al. [5, 11], we proposed to use a Bayesian classifier, which yielded similar or even better results on our experimental data than linear discriminant analysis methods. The GNBC is particularly suited to high-dimensional features. The GNBC method produced results comparable to more sophisticated classification methods [12] for particular cases as reported in this paper. In our approach, we utilized the NaN-Toolbox, which is part of the BioSig environment [13]. The classifier input features were the ERP amplitudes (in microvolts) at the latencies “hand-picked” as discussed in the previous section.
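The balanced binary classification with leave-one-out cross-validation can be illustrated with a small NumPy implementation of a Gaussian naive Bayes classifier. This is a generic sketch and not the NaN-Toolbox routine used in the study; the function names, the variance floor of 1e-9, the number of features, and the synthetic data are all assumptions.

```python
import numpy as np


def gnb_fit(X, y):
    """Per-class mean, variance and prior for a Gaussian naive Bayes model."""
    model = {}
    for label in np.unique(y):
        Xc = X[y == label]
        # A small floor on the variance avoids division by zero.
        model[label] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9,
                        len(Xc) / len(X))
    return model


def gnb_predict(model, x):
    """Pick the class with the highest log-posterior for one feature vector."""
    def log_post(params):
        mean, var, prior = params
        return np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var)
                                            + (x - mean) ** 2 / var)
    return max(model, key=lambda label: log_post(model[label]))


def loo_accuracy(X, y):
    """Leave-one-out cross-validated accuracy."""
    hits = 0
    for i in range(len(X)):
        train = np.arange(len(X)) != i
        hits += int(gnb_predict(gnb_fit(X[train], y[train]), X[i]) == y[i])
    return hits / len(X)


rng = np.random.default_rng(2)
n_features = 20  # hypothetical number of selected latencies
targets = rng.normal(loc=1.0, size=(64, n_features))
nontargets = rng.normal(loc=0.0, size=(448, n_features))
# Balance the classes: 64 targets vs. a random subset of 64 non-targets.
subset = rng.choice(len(nontargets), size=64, replace=False)
X = np.vstack([targets, nontargets[subset]])
y = np.array([1] * 64 + [0] * 64)
accuracy = loo_accuracy(X, y)
```

Because the non-target subset is drawn at random, repeating the draw and averaging the accuracies would give a more stable estimate; whether this was done in the study is not stated.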

The results of the successful application of the GNBC technique are presented in the next section.

Results

The proposed approach improved the aBCI paradigm results in both the combined and the averaged electrode settings, evaluated for each subject separately. Detailed results are presented in the following sections. First, we introduce the ITR, which is a commonly used measure to compare various paradigms in the BCI research community [4]. We also present classification accuracy results.

Analysis of aBCI Results with ITR and Classification Accuracies

The amount of information carried by each selection in the BCI application is usually quantified by the ITR, which is calculated based on bit-per-selection R, defined as [4]:

$$R = \log_2 N + C\cdot\log_2 C + (1-C)\cdot\log_2\left(\frac{1-C}{N-1}\right),$$
(1)

where C is the classification accuracy and N is the number of classes (N = 8 in this paper). The final ITR is obtained after a multiplication by a classification speed V, resulting in a bit-per-minute rate (bit/min) as:

$$ITR = V \cdot R$$
(2)
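Equations (1) and (2) can be evaluated with a few lines of Python. The classification speed V is not given explicitly in the text; the sketch assumes that, in the averaged setting, one selection takes 8 directions × 8 averaged trials × 0.2 s ISI = 12.8 s, that is, V = 4.6875 selections per minute. Under this assumption, C = 1 and N = 8 give 14.06 bit/min, which agrees with the value reported for 100 % accuracy in the averaged-trial results. The function names are hypothetical.

```python
import math


def bits_per_selection(c, n):
    """Eq. (1): bits per selection for accuracy c and n classes."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    r = math.log2(n)
    if c > 0.0:   # the term c*log2(c) tends to 0 as c -> 0
        r += c * math.log2(c)
    if c < 1.0:   # the last term tends to 0 as c -> 1
        r += (1.0 - c) * math.log2((1.0 - c) / (n - 1))
    return r


def itr(c, n, selections_per_min):
    """Eq. (2): ITR in bit/min."""
    return selections_per_min * bits_per_selection(c, n)


# Assumed speed: 8 directions x 8 averaged trials x 0.2 s = 12.8 s per selection.
V_AVERAGED = 60.0 / (8 * 8 * 0.2)  # 4.6875 selections per minute
print(round(itr(1.0, 8, V_AVERAGED), 2))  # 14.06
```

At the chance level C = 1/N, Eq. (1) gives R = 0, so no information is transferred.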

The ITR results are summarized in Tables 3 and 5 and discussed in detail in the following sections. The GNBC classification accuracies are also summarized in Tables 4 and 6.

Table 3 The offline aBCI interfacing results based on features drawn from non-averaged trials in the form of ITR scores obtained as in Eqs. (1) and (2)
Table 4 The classification results for ERP latencies in P300 responses for target versus non-target paradigm
Table 5 The offline aBCI interfacing results based on features drawn from the averaged eight trials in the form of ITR scores obtained as in Eqs. (1) and (2)
Table 6 The classification results for ERP latencies in P300 responses for the mean of 8 targets versus average of 8 non-targets paradigm

The ITR and Classification Accuracy Results from the P300 ERP Range Latencies in the Single Channel Setting of Target Versus Non-target

A summary of the ITR results is presented in Table 3 (the corresponding classification accuracy is summarized in Table 4). We compare the ITR values obtained for the features drawn from the conventional whole ERP and the proposed “hand-picked” features resulting from the ANOVA. The proposed method allows us to boost the results by up to 10.43 bit/min (+44 % classification accuracy). Only a single case shows a decrease when using the leave-one-out cross-validation for the GNBC technique.

The ITR Results from the P300 ERP Range Latencies from the Averaged Eight Trials in the Setting of Target Versus Non-target

The single-trial classification results discussed in the previous sections were mostly lower than 14 bit/min (below 70 % accuracy) in the aBCI offline mode. In order to improve the results, for each subject and each stimulus, we averaged groups of eight target trials (converting 64 targets to 8 targets) and groups of eight non-target trials (converting 448 non-targets to 56 non-targets). In the classifier training and testing steps, we selected 8 targets and a random subset of 8 non-targets (from the 56 available), following the same balancing procedure as in the single-trial classification. For each of the subjects in the case of the pink noise stimulus, the classification resulted in scores higher than 8 bit/min (80 % accuracy). In particular, for subjects 1 and 2, the resulting ITR reached 14.06 bit/min (100 % accuracy). The summary of the results is presented in Tables 5 and 6 for ITR and classification accuracies, respectively. The comparison of the ITR and accuracy values obtained for the features drawn from the conventional whole ERP and for the proposed “hand-picked” features resulting from the ANOVA supports the improvement obtained with the proposed method. The proposed method allows us to increase the classification results (only a single case of a decrease was reported) using the leave-one-out cross-validation for the GNBC technique. An online aBCI application is planned as a next stage by the authors.
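The trial averaging can be sketched as a reshape followed by a mean. How the trials were grouped is not stated in the text; the sketch assumes that consecutive trials of the same class are averaged in blocks of eight. The function name `average_blocks` and the random data are hypothetical.

```python
import numpy as np


def average_blocks(epochs, block=8):
    """Average consecutive groups of `block` epochs.

    epochs: array (n_trials, n_samples); n_trials must be a multiple
    of the block size. Grouping consecutive trials is an assumption.
    """
    n_trials, n_samples = epochs.shape
    if n_trials % block:
        raise ValueError("number of trials must be a multiple of the block size")
    return epochs.reshape(n_trials // block, block, n_samples).mean(axis=1)


rng = np.random.default_rng(3)
targets = rng.normal(size=(64, 205))
nontargets = rng.normal(size=(448, 205))
avg_targets = average_blocks(targets)        # shape (8, 205)
avg_nontargets = average_blocks(nontargets)  # shape (56, 205)
```

Averaging eight trials reduces uncorrelated noise by a factor of about √8, at the cost of an eightfold longer time per selection.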

Discussion and Conclusions

In this paper, we have presented an approach leading to the improvement of classification accuracies and ITRs in a novel offline aBCI paradigm. This has been achieved by introducing ERP feature extraction in P300 range latencies to replace the classical whole evoked response range approaches.

The proposed method allows the extraction of the most separable ERP features, enabling an increase in the classification accuracy and an ITR improvement of up to +35.30 bit/min (22 % accuracy) in the case of features drawn from single-electrode ERP distributions. In the case of the features obtained from the ERP responses averaged over eight trials, the majority of subjects also improved their results, with a maximum increase of 10.43 bit/min (44 % in accuracy).

These are very encouraging results, which open the possibility of further improving the auditory-paradigm-based BCI.

The main achievement reported in the paper allows us to improve the spatial aBCI paradigm in the offline mode, which is a step forward in non-vision-based interfacing strategies. We have also shown that in comparison with contemporary applications of spatial auditory BCI paradigms that fail to utilize rear-to-the-head loudspeakers, it is possible to utilize all spatial horizontal sound directions thanks to the proposed classification improvement approach based on the “hand-picked” ERP latencies.