Introduction

A brain–computer interface (BCI) uses brain signals to provide a direct communication pathway between the brain and external devices (Vaughan et al. 2006; Wolpaw et al. 2002). Among these brain signals, the P300 event-related potential (ERP) has been widely used in electroencephalography (EEG)-based BCI systems (Farwell 2012). It relies on the fact that infrequent stimuli, when interspersed with routine stimuli, typically evoke a positive peak at about 300 ms after stimulus onset (Bernat et al. 2001). This is the so-called oddball paradigm (Fabiani et al. 1987; Güntekin and Başar 2010).

In Farwell and Donchin (1988), the oddball paradigm was used in a BCI system, and a P300 speller was established. In this paradigm, users were presented with a 6 × 6 matrix containing 36 characters. The rows and columns of the matrix were flashed randomly. The user was required to concentrate exclusively on the character to be selected and to ignore the other characters. In this case, the probability that the target flashed was 0.167 (2/12). The P300 elicited by the oddball was detected and translated into a character. In this stimuli display paradigm, the desired character was identified as the intersection of its row and column. This row–column (RC) paradigm has been widely used in P300 spellers as a benchmark.

The P300 speller has been shown to detect characters with high accuracy. However, there is still a trade-off between accuracy and speed. Over the last two decades, considerable effort has been devoted to developing the P300 speller paradigm. Individual parameters of the paradigm have been studied and optimized, including matrix size (Allison and Pineda 2003), stimulus interval (Sellers et al. 2006), stimulus intensity and so on (Takano et al. 2009; Allison and Pineda 2006; Salvaris and Sepulveda 2009; McFarland et al. 2011). Donchin et al. (2000) showed that the P300 amplitude is related to the probability of oddball occurrence: a smaller probability of the oddball event generally elicits a more pronounced P300.

Various P300 stimuli display paradigms have been proposed (Guan et al. 2004; Fazel-Rezai and Abhari 2009; Townsend et al. 2010). Besides the RC paradigm, two typical paradigms are the single character (SC) paradigm proposed in Guan et al. (2004) and the region-based (RB) paradigm proposed in Fazel-Rezai and Abhari (2009). Guan et al. (2004) compared the RC speller with the SC speller and concluded that a higher accuracy can be obtained using the SC speller. This may be explained by its lower probability of oddball occurrence, which is 0.025 (1/40) if there are 40 flashing buttons. In Fazel-Rezai and Abhari (2009), the RB paradigm was designed to overcome the crowding effect of flashing buttons. Strasburger (2005) showed that the crowding effect may lead to errors when spelling characters. Fazel-Rezai et al. discussed the adjacency problem (Fazel-Rezai 2007) and showed that flashes of non-target characters near the target may attract the user’s attention and produce a P300. Both the RC speller and the RB speller were tested in Fazel-Rezai and Abhari (2008), and the results showed that the RB speller achieved better accuracy. In Fazel-Rezai and Ahmad (2011), six subjects were asked to spell two words using the SC and RB spellers, respectively. The results showed that higher accuracy can be obtained using the RB speller than using the SC speller. However, the sample size was small and the experiment was short in that study. It is therefore necessary to confirm which of the SC and RB spellers performs better and which design change matters more. Clarifying this issue may allow further improvements of P300-based BCI systems.

In this paper, we present a comparison between the SC speller and the RB speller. A character input experiment involving 12 subjects was conducted. Using the collected data, we analyzed P300 detection performance, the P300 waveforms and Fisher ratios. Our experimental and data analysis results showed that the RB paradigm may enhance the P300 potential, and thus improve the performance of the P300 speller.

The organization of this paper is as follows. The second section describes the two spellers, the experimental procedure, and the data collection and processing methods. “Experiment results and discussions” are given in the third section, and the “Conclusion” is given in the last section.

Methods

In this section, the SC speller and the RB speller are first described. Based on the two spellers, the experimental procedure, data collection and processing are then presented.

Single character (SC) speller

The SC speller was first proposed in Guan et al. (2004). In this speller, a subject is presented with a six by six matrix of characters as illustrated in Fig. 1. When the speller starts, each single character is intensified for 60 ms in a random order (Guan et al. 2004). Unlike in the RC speller, characters in the SC speller are intensified one by one. A single repeat (round) includes 36 flashes corresponding to 36 individual characters. The subject needs to focus on one target character in the matrix at a time. Therefore, the probability of oddball occurrence is 0.028 (1/36). By detecting the P300 potential, the single target character could be found after several rounds of intensifications.

Fig. 1 The stimuli display paradigm for SC speller (Guan et al. 2004)

Region-based (RB) speller

The RB speller used here originates from Fazel-Rezai and Abhari (2009), with several changes. In the RB speller, six groups of characters are first arranged into different regions (Fig. 2a). The regions are arranged at the corners of a hexagon in order to reduce the interference between adjacent regions. At the first level (region selection level), the six characters “AHOV03”, located at the most separated corners of the six regions, flash in a random order so that the spatial crowding effect can be alleviated. For example, the character “A” representing its region is intensified in Fig. 2a. After the region selection, the speller enters the second level (Fig. 2b), where the six characters in the selected region are separated. The individual characters are then intensified in a random order. Thus, one repeat/round of flashes for each level includes 6 flashes, and the probability of oddball occurrence is 0.167 (1/6) for each level. In this paradigm, each flash lasts for 75 ms, followed by a 75 ms inter-stimulus interval (Fazel-Rezai and Abhari 2009).

Fig. 2 The stimuli display paradigm for RB speller (Fazel-Rezai and Abhari 2009). a Level 1: each region/group contains 6 characters/symbols, in which only one flashes. b Level 2: all characters/symbols are separated and flash in a random order

For the purpose of comparison, both the SC speller and the RB speller work in a synchronous mode and with the same configuration. For the SC speller, one character is input through 10 repeats/rounds of flashes. The time for each trial (corresponding to a character input) is 21.6 s. For the RB speller, P300 detection in each level is also based on 10 rounds of button flashes and the corresponding time is 9 s. The time interval between two levels is 3.6 s. Thus the time for each trial corresponding to a character input is also 21.6 s. For the two spellers, the time interval between two sequential trials is 4 s, and the size of the stimuli display area is 700 × 700 px. Furthermore, the number of characters/symbols that can be spelled is 36 for both spellers. Under this configuration, we compare the online accuracy rates, offline accuracy rates, P300 waveforms, and Fisher ratios for the two spellers.
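The trial durations quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text. It assumes that each SC flash occupies a 60 ms slot with no additional gap, which is an inference from the 21.6 s figure rather than something stated explicitly. The function names are illustrative.

```python
# Timing check for the two spellers (all durations in milliseconds).

def sc_trial_ms(n_buttons=36, rounds=10, flash_ms=60):
    """SC speller: every button flashes once per round.
    Assumes no extra gap between flashes (inferred, not stated)."""
    return n_buttons * rounds * flash_ms

def rb_level_ms(n_buttons=6, rounds=10, flash_ms=75, isi_ms=75):
    """RB speller, one level: each flash is followed by an inter-stimulus interval."""
    return n_buttons * rounds * (flash_ms + isi_ms)

def rb_trial_ms(level_gap_ms=3600):
    """RB speller: two levels separated by the stated 3.6 s pause."""
    return 2 * rb_level_ms() + level_gap_ms

print(sc_trial_ms() / 1000)   # seconds per SC trial
print(rb_level_ms() / 1000)   # seconds per RB level
print(rb_trial_ms() / 1000)   # seconds per RB trial
```

Under these assumptions both spellers need the same time per character, so a difference in accuracy is not traded against a difference in speed.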

Experiment procedure

Each subject completed two experimental sessions on separate days, one for each speller. Half of the subjects began with the SC speller session and the other half began with the RB speller session. Each session consisted of a calibration phase and an online test phase. The calibration phase lasted 5 min, during which 10 randomly selected characters were shown at the top of the graphical interface for copy-spelling. The subject was required to silently count the number of target flashes. An SVM model was trained for each subject with the data collected during the calibration phase and was subsequently applied in the online test. In the online test phase, the task was to input 32 predefined characters: CAT, DOG, FISH, BOWL, GLOVE, HAT, SHOES, and WATER, which were used in BCI competition 2003 by Blankertz et al. (2004) (dataset IIb). The online result was displayed directly on the GUI as feedback to the subject. In each session, the EEG signals were also recorded for further offline analysis.

Data collection and processing

Twelve healthy right-handed subjects, 11 males and 1 female aged from 21 to 32, took part in the experiment. Two of them had limited prior experience of using a P300 speller, gained during the system development. The other ten subjects were naive users. None of the subjects had a history of psychological or neurological disorders.

Each subject sat in a comfortable chair approximately 0.6 m from a computer monitor. Stimuli were presented on a 19″ TFT screen with a refresh rate of 60 Hz and a resolution of 1440 × 900 px. The EEG signals were recorded with a 32-channel cap and a Neuroscan NuAmps device. The choice of EEG reference is a critical issue for the study of brain activity. Usually, the ear lobe or the mastoid (i.e. the bony outgrowth behind the ear) is used as the reference channel (Allison and Pineda 2003; Allison and Pineda 2006; McFarland et al. 2011; Fazel-Rezai and Abhari 2009; Townsend et al. 2010; Guger et al. 2009; Talebi et al. 2012). Many studies focusing on reference-free methods such as the scalp Laplacian (SL) (Yao 2002) and the reference electrode standardization technique (REST) (Qin et al. 2010; Yao 2001) have also been published. In this paper, all channels were simply referenced to the right mastoid. Only 8 electrodes, “Fz”, “Cz”, “P3”, “Pz”, “P4”, “O1”, “Oz” and “O2”, were used in this experiment. The EEG signals were digitized at a sampling rate of 250 Hz and bandpass filtered at 0.1–30 Hz. Subsequently, the filtered signals were downsampled by a factor of 5 in order to reduce the computational complexity. The downsampled signals were then segmented from 0 to 600 ms after the onset of each flash. One epoch of EEG data was thus obtained for each flash, and the data from all channels in that epoch were concatenated to form a feature vector.
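The epoching and concatenation steps can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that downsampling is plain decimation (keeping every fifth sample of the already filtered signal) and that a 0–600 ms epoch at 50 Hz contains 30 samples per channel. All names are hypothetical.

```python
FS_RAW = 250                        # Hz, stated sampling rate
DECIM = 5                           # stated downsampling factor
FS = FS_RAW // DECIM                # 50 Hz after downsampling
EPOCH_MS = 600                      # stated epoch length
N_SAMPLES = FS * EPOCH_MS // 1000   # 30 samples per channel (assumed convention)

def downsample(channel, factor=DECIM):
    """Keep every factor-th sample (plain decimation; an assumption)."""
    return channel[::factor]

def feature_vector(channels, onset_raw_index):
    """Concatenate the post-onset segment of every channel.

    channels: list of per-channel sample lists at 250 Hz (already filtered)
    onset_raw_index: flash onset expressed in raw 250 Hz samples
    """
    start = onset_raw_index // DECIM    # onset index in the downsampled signal
    vec = []
    for ch in channels:
        vec.extend(downsample(ch)[start:start + N_SAMPLES])
    return vec

# Synthetic data: 8 channels, 4 s each; channel c holds the values c*10000 + i.
channels = [[c * 10000 + i for i in range(1000)] for c in range(8)]
vec = feature_vector(channels, onset_raw_index=250)
print(len(vec))   # 8 channels x 30 samples
```

With 8 channels and 30 samples per channel, each flash yields a 240-dimensional feature vector under these assumptions.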

We normalized all the feature vectors in both the training sets and the test sets by mapping them to the range [0, 1] for all subjects. The normalized feature vectors of the test data sets served as the input of an SVM classifier trained on the corresponding training sets (Long et al. 2011). The SVM classifier was based on the popular LibSVM toolbox with a Gaussian kernel (Chang and Lin 2011).
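The mapping to [0, 1] can be illustrated with a simple min-max scaling. The text does not say how the scaling limits were chosen, so the sketch below assumes a per-feature minimum and maximum estimated on the training set and reused for the test set. The helper names are hypothetical, and the classifier itself is not shown.

```python
def fit_min_max(train):
    """Per-feature minimum and maximum over the training vectors."""
    cols = list(zip(*train))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_min_max(vectors, lo, hi):
    """Scale each feature with the given limits; constant features map to 0."""
    return [
        [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(v, lo, hi)]
        for v in vectors
    ]

train = [[1.0, 10.0], [3.0, 20.0], [2.0, 30.0]]
lo, hi = fit_min_max(train)
scaled_train = apply_min_max(train, lo, hi)
print(scaled_train)
```

With limits taken from the training data, a test value outside the training range would fall slightly outside [0, 1]. Whether such values were clipped is not stated in the text.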

Experiment results and discussions

Online accuracy rate

Table 1 shows the online performance for the 12 subjects. Nine of the subjects performed more accurately with the RB speller, one performed more accurately with the SC speller, and two performed similarly with the two spellers. The average accuracy rates are 93.47 % and 89.32 % for the RB speller and the SC speller, respectively. A t test was performed on the accuracy rates shown in Table 1, and a significant difference in accuracy rates was observed between the two spellers (p = 0.0128 < 0.05).

Table 1 Accuracy rate of 32 characters spelling for SC speller and RB speller

Offline accuracy rate

The average spelling accuracy rates of the 12 subjects under various repeat numbers are shown in Fig. 3 for comparison. Each accuracy rate for a given repeat number, e.g., 5, is obtained using the data collected from rounds 1 to 5. The number of repeats ranges from 1 to 10. Accuracy rates increase with the number of repeats for both spellers. For each number of repeats, a higher accuracy rate is obtained with the RB speller than with the SC speller. A paired t test applied at each number of repeats shows a significant difference (p < 0.05).
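The way accuracy is recomputed for each number of repeats can be sketched as below. The text only states that the data from rounds 1 to n are used. The rule of summing per-button classifier scores over those rounds and picking the largest total is an assumption made for illustration, and the scores are invented.

```python
def select_button(scores, n_rounds):
    """Pick the button with the largest summed score over the first n_rounds.

    scores[r][b] is a hypothetical classifier score for button b in round r.
    """
    n_buttons = len(scores[0])
    totals = [sum(scores[r][b] for r in range(n_rounds)) for b in range(n_buttons)]
    return max(range(n_buttons), key=totals.__getitem__)

def accuracy_by_repeats(trials, max_rounds):
    """trials: list of (scores, target_button). Returns accuracy for n = 1..max_rounds."""
    acc = []
    for n in range(1, max_rounds + 1):
        hits = sum(select_button(s, n) == t for s, t in trials)
        acc.append(hits / len(trials))
    return acc

# Two toy trials with 3 buttons and 3 rounds each (invented scores).
trials = [
    ([[0.2, 0.9, 0.1], [0.8, 0.3, 0.1], [0.9, 0.2, 0.1]], 0),
    ([[0.1, 0.2, 0.7], [0.1, 0.1, 0.9], [0.2, 0.1, 0.8]], 2),
]
print(accuracy_by_repeats(trials, 3))
```

In the first toy trial a noisy first round favors the wrong button, and the correct one only wins once three rounds are accumulated, which mirrors the rise of accuracy with the number of repeats.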

Fig. 3 Average spelling accuracy rates across all subjects versus numbers of repeats. The dotted line with circles and the solid line with squares are from the SC speller and RB speller, respectively, and error bars indicate standard variance

ERP waveform

For each channel and each subject, we average the EEG data in the time interval of 0–600 ms across 320 repeats (32 selections of target buttons with 10 repeats in each selection) for SC speller and across the 640 repeats (2 levels and 32 target button selections in each level with 10 repeats in each selection) for the RB speller. Two target ERP waveforms are then obtained for the two spellers, respectively. Next, we calculate two non-target ERP waveforms similarly for the two spellers, respectively, which are used as baseline. For instance, for the SC speller, the non-target ERP waveform is obtained by averaging the EEG data in the 0–600 ms interval of all the non-target buttons’ flashes during the 32 selections of target button. Furthermore, we compute the difference between target and non-target ERPs (target ERP minus non-target ERP) for each speller.
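The averaging and subtraction described above can be sketched for a single channel as follows. The epochs are toy values with three samples each, and the function names are illustrative.

```python
def mean_waveform(epochs):
    """Sample-by-sample average over a list of equally long epochs."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]

def difference_wave(target_epochs, nontarget_epochs):
    """Target ERP minus non-target ERP (the non-target average acts as baseline)."""
    target = mean_waveform(target_epochs)
    nontarget = mean_waveform(nontarget_epochs)
    return [t - nt for t, nt in zip(target, nontarget)]

# Toy single-channel epochs (3 samples each; real epochs cover 0-600 ms).
target_epochs = [[1.0, 4.0, 2.0], [3.0, 6.0, 2.0]]
nontarget_epochs = [[1.0, 1.0, 1.0], [1.0, 3.0, 1.0]]

diff = difference_wave(target_epochs, nontarget_epochs)
peak_index = max(range(len(diff)), key=diff.__getitem__)
print(diff, peak_index)
```

In the actual analysis the target average runs over 320 epochs for the SC speller and 640 for the RB speller, and the peak latency is read off the difference wave.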

Figure 4 shows the two average ERP waveforms over all 12 subjects at the electrode locations Fz, Cz, P3, Pz, P4, O1, Oz and O2. The blue dashed waveforms are for the SC speller and the red solid waveforms are for the RB speller. For each ERP waveform, there is a peak at approximately 350 ms. We find that the peak is higher for the RB speller than for the SC speller at electrodes P3 (5.17 vs. 4.89), Pz (5.47 vs. 4.43), P4 (6.31 vs. 5.75), O1 (5.34 vs. 4.55), Oz (5.87 vs. 5.29), and O2 (5.63 vs. 4.49). Thus we have observed an enhancement of the P300 potential for the RB speller compared with the SC speller. This may lead to a higher classification accuracy for the RB speller. Although a smaller oddball probability may theoretically elicit a more pronounced P300 potential, the amplitude of the P300 potential is also affected by other factors such as stimulus characteristics (e.g., stimulus intensity) (Sellers et al. 2006; Gonsalvez and Polich 2002; Covington and Polich 1996). In the RB speller, the stimuli display paradigm is optimized to reduce the crowding effect of flashing buttons, which also affects the P300 potential (this is analyzed further in the next subsection).

Fig. 4 Two average ERP waveforms (target ERP minus non-target ERP) of all 12 subjects for the SC and RB spellers at each electrode location: Fz, Cz, P3, Pz, P4, O1, Oz or O2. Blue dashed lines: for the SC speller, red solid lines: for the RB speller. (Color figure online)

In order to illustrate the efficiency of P300 detection, we analyze the Fisher ratio between two classes of data (target vs. non-target) for the two spellers. The Fisher ratio is an efficient criterion for assessing the separability of two classes of data (Duda and Hart 1973). It is the ratio of the between-class scatter to the within-class scatter of the data. A larger Fisher ratio score means better discriminability of the two classes. For each subject, each channel, each number of repeats and each speller, we first obtain two sets of EEG data corresponding to target button flashes and non-target button flashes (two classes). Each data set contains all sample points of all 0–600 ms segments that correspond to all target button flashes (or all non-target button flashes). Based on the two data sets, the Fisher ratio score is calculated as

$$ FR = \frac{(m_1 - m_2)^2}{S_1+S_2}, \tag{1} $$

where \(m_i\) and \(S_i\) (i = 1, 2) represent the mean and the variance of the two data sets, respectively.
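Equation (1) translates directly into code. The sketch below is a minimal illustration on toy data. It assumes the population variance (`statistics.pvariance`), since the text does not say whether the population or the sample variance was used.

```python
from statistics import mean, pvariance

def fisher_ratio(target, nontarget):
    """FR = (m1 - m2)^2 / (S1 + S2) for two one-dimensional data sets.

    Uses the population variance; this choice is an assumption.
    """
    m1, m2 = mean(target), mean(nontarget)
    s1, s2 = pvariance(target), pvariance(nontarget)
    return (m1 - m2) ** 2 / (s1 + s2)

# Toy sample points for one channel (not real EEG data).
target = [4.0, 6.0, 5.0, 5.0]       # mean 5, variance 0.5
nontarget = [1.0, 3.0, 2.0, 2.0]    # mean 2, variance 0.5
print(fisher_ratio(target, nontarget))
```

A larger separation of the class means or a smaller spread within each class increases the score, which is why an enhanced P300 shows up as a higher Fisher ratio.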

For all channels, three numbers of repeats, and the two spellers, the average Fisher ratio scores across all subjects are shown in Table 2. It can be seen that all 8 channels have higher Fisher ratio scores for the RB speller than for the SC speller. A t test paired by subjects is applied for each channel and each repeat number, and a significant difference is observed (p < 0.05). This result also implies that the P300 potential is enhanced by the stimuli display paradigm of the RB speller. Thus this stimuli display paradigm is more promising for the design of P300-based BCIs.

Table 2 Average Fisher ratio scores across all subjects for 8 channels, different number of repeats and two spellers

P300 detection performance and interference between adjacent flashing buttons

In each experimental session, a total of 384 characters were input by the 12 subjects (32 characters for each subject). For the SC speller, offline data analysis showed that 41 errors occurred in total (error rate: \(\frac{41}{384}=10.68\,\%\)). Thus the accuracy rate for P300 detection was 89.32 %, which was equal to the accuracy rate of character input. For the RB speller, there were 768 selections for all the 12 subjects (for each subject, 32 selections corresponding to the 32 input characters were made in each level). Among the 768 selections, 36 errors were observed (error rate: \(\frac{36}{768}=4.69\,\%\)). Thus the accuracy of P300 detection was 95.31 %, which differs from the spelling accuracy for this speller because a character is spelled correctly only if the selections at both levels are correct. Comparing the error rates, we may conclude that the stimuli display paradigm in each level of the RB speller is more suitable for P300 detection than that of the SC speller. Furthermore, among the 41 errors for the SC speller, 28 occurred at flashing buttons adjacent to the target buttons (the rate was \(\frac{28}{41}=68.29\,\%\)). For the RB speller, 13 of the 36 errors occurred at buttons adjacent to the target buttons (the rate was \(\frac{13}{36}=36.11\,\%\)). Thus the interference between adjacent flashing buttons was significantly reduced for the RB speller compared with the SC speller. This might be due to the increased distance between adjacent buttons.
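The percentages in this paragraph follow from the stated counts. The short check below recomputes them, using 12 × 32 = 384 selections for the SC speller and 12 × 32 × 2 = 768 selections for the two-level RB speller. The helper name is illustrative.

```python
def percent(count, total):
    """Share of count in total, as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)

subjects, chars_per_subject = 12, 32
sc_selections = subjects * chars_per_subject        # one selection per character
rb_selections = subjects * chars_per_subject * 2    # two levels per character

sc_error = percent(41, sc_selections)    # SC P300 detection error rate
rb_error = percent(36, rb_selections)    # RB P300 detection error rate
sc_adjacent = percent(28, 41)            # SC errors at adjacent buttons
rb_adjacent = percent(13, 36)            # RB errors at adjacent buttons

print(sc_selections, rb_selections)
print(sc_error, rb_error, sc_adjacent, rb_adjacent)
```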

Conclusion

In this paper, a comparison of two existing P300 spellers was presented. In our experiment, higher online accuracy was obtained for the RB speller than for the SC speller, which is consistent with a previous report (Fazel-Rezai and Ahmad 2011). Further data analysis results, including P300 detection, P300 waveforms and Fisher ratios, demonstrated that the P300 potential was enhanced in the RB speller. This enhancement, which led to better performance of the BCI speller, might be due to the increased distance and the decreased interference between neighboring buttons. Our study also suggests that when designing a stimuli display paradigm for P300 detection, several factors such as the probability of oddball occurrence and the distance between adjacent buttons should be considered simultaneously. Another advantage of the RB P300 speller is that the GUI contains a smaller number of flashing buttons. According to our users' experience, they may not feel dizzy even if they use the RB P300 speller for a long time (e.g., an hour or more). Future work includes other applications (e.g., a brain switch) based on the stimuli display paradigm of the RB speller.