Introduction

A brain–computer interface (BCI) aims to restore a means of communication for people suffering from severe motor disabilities or persisting in a vegetative state, by bypassing the peripheral nervous system to provide control over external devices such as robotic arms or other prostheses (Wolpaw et al. 2002). Methods used to acquire brain signals for BCI purposes can be either invasive or noninvasive. It has been shown that invasive BCIs are capable of reconstructing continuous limb movements to provide multidimensional control of robotic and prosthetic arms to monkeys and to people with long-standing tetraplegia (Hochberg et al. 2012; Philip et al. 2013). However, the same utility has not yet been achieved for noninvasive BCIs. The present research takes a step toward developing a noninvasive method for BCI by decoding subjects’ binary decisions as “yes” or “no,” using functional near-infrared spectroscopy (fNIRS). fNIRS is the use of near-infrared spectroscopy (NIRS) for functional imaging of the brain. It is a relatively new noninvasive optical imaging modality that uses light in the near-infrared range (typically of 650–1,000 nm wavelength) to measure the hemodynamic response of the cerebral cortex (Hoshi 2007; Wylie et al. 2009). The main advantages of this technique are its relatively low cost, safety, portability, wearability and overall ease of use. The principle of NIRS measurement, first reported by Jobsis (1977), has typically been applied to investigations into cerebral hemodynamics, but only in the last few years has it been used in brain imaging, brain-state decoding and BCI contexts (Coyle et al. 2007; Naito et al. 2007; Sitaram et al. 2007; Luu and Chau 2009; Hu et al. 2010, 2011, 2013; Naseer and Hong 2013).

Decoding binary decisions is of particular importance in the development of BCI, as the first signal that we want to give to an assistive device is “on” or “off.” Binary decision decoding, as a means of binary communication, might also be very useful for anarthric people or those persisting in a vegetative state. Previous studies have yielded promising results for motor-imagery-based fNIRS-BCI for healthy subjects (Coyle et al. 2007; Sitaram et al. 2007; Naseer and Hong 2013); however, in the case of patients with congenital or long-standing motor impairments, it is very difficult to extract functional activity via motor imagery in a manner suitable for BCI operation (Power et al. 2009). This issue has been addressed in recent studies that have used fNIRS to detect, from the prefrontal cortex, functional activities related to mental singing (Naito et al. 2007), mental arithmetic (Utsugi et al. 2007; Bauernfeind et al. 2008) and various other mental tasks (Ogata et al. 2007; Utsugi et al. 2007). In addition, motor disabilities are strongly tied to neuronal activities in the motor cortex or the parietal lobe, which leaves the prefrontal cortex less likely to be implicated in cases involving motor problems. Signal attenuation and motion artifacts due to scalp hair are also less severe when acquiring signals from the prefrontal cortex than when acquiring them from the motor cortex. Various studies have shown that decision-making causes cognitive loads in the prefrontal cortex (Tranel et al. 2002; Volz et al. 2006; Yang and Raine 2009).

This paper proposes an fNIRS-based online binary decision decoding framework. Linear discriminant analysis (LDA) and support vector machine (SVM) are two widely used classification methods for fNIRS- and EEG-based BCI (Sitaram et al. 2007; Luu and Chau 2009; Salvaris and Sepulveda 2009; Hu et al. 2012). Both are used here for classification so that their performance in binary decision decoding can be compared. The contributions of this study are as follows: (1) It is the first work on fNIRS-based online binary decision decoding using the prefrontal cortex. (2) The classifier trained by SVM offers significantly better classification accuracy than does that trained by LDA.

Materials and methods

Signal acquisition

fNIRS measures cortical brain activity through hemodynamic changes, that is, the changes in the cerebral blood flow, or the concentration changes of oxygenated hemoglobin (HbO) and deoxygenated hemoglobin (HbR). Incident light penetrating the outer tissues of the human head diffuses through the tissue due to multiple scattering of photons. A portion of these photons is absorbed, while the rest continue to scatter as they make their way through the medium. Some of the photons exit the head after passing through the cortical areas, wherein the chromophores HbO and HbR are capable of absorbing near-infrared light. Back-reflected photons can be detected using a suitably placed photon detector. The intensity of the light exiting the head is then used to calculate the HbO and HbR concentration changes (\( \Delta c_{\text{HbO}} (t) \) and \( \Delta c_{\text{HbR}} (t) \)) along the photon path. The relative change of the concentration of HbX (i.e., HbO or HbR), \( \Delta c_{\text{HbX}} \), is calculated from the dual-wavelength light intensity signals using the modified Beer–Lambert law as

$$ \left[ {\begin{array}{*{20}c} {\Delta c_{\text{HbO}} (t)} \\ {\Delta c_{\text{HbR}} (t)} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\alpha_{\text{HbO}} (\lambda_{1} )} & {\alpha_{\text{HbR}} (\lambda_{1} )} \\ {\alpha_{\text{HbO}} (\lambda_{2} )} & {\alpha_{\text{HbR}} (\lambda_{2} )} \\ \end{array} } \right]^{ - 1} \left[ {\begin{array}{*{20}c} {\Delta A(t,\lambda_{1} )} \\ {\Delta A(t,\lambda_{2} )} \\ \end{array} } \right]\frac{1}{l \times d} , $$
(1)

where \( \Delta A(t,\lambda_{j} ) \) (j = 1, 2) is the unit-less absorbance (optical density) variation of the light emitter of wavelength \( \lambda_{j} \), \( \alpha_{\text{HbX}} (\lambda_{j} ) \) is the extinction coefficient of HbX in \( \mu {\text{M}}^{ - 1} \,{\text{mm}}^{ - 1} \), d is the unit-less differential pathlength factor (DPF), and l is the distance (in millimeters) between emitter and detector. A multichannel continuous-wave imaging system (DYNOT: DYnamic Near-infrared Optical Tomography; two wavelengths 760 and 830 nm) from NIRx Medical Technologies, NY, was used to acquire brain signals at a sampling rate of 1.81 Hz.
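As an illustration of Eq. (1), the following minimal Python sketch solves the 2 × 2 system for a single time sample. The function name, the extinction-coefficient matrix ALPHA and the 30-mm emitter–detector distance are placeholders chosen for this example; they are assumptions and are not the values used in the study.

```python
def mbll_concentration_changes(dA1, dA2, alpha, l_mm, dpf):
    """Solve Eq. (1) for one time sample.

    dA1, dA2 : absorbance changes at wavelengths lambda_1 and lambda_2
    alpha    : ((a_HbO(l1), a_HbR(l1)), (a_HbO(l2), a_HbR(l2))) in uM^-1 mm^-1
    l_mm     : emitter-detector distance l in millimeters
    dpf      : unit-less differential pathlength factor d
    Returns (delta_c_HbO, delta_c_HbR).
    """
    (a11, a12), (a21, a22) = alpha
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("extinction matrix is singular")
    scale = 1.0 / (l_mm * dpf)
    # Closed-form inverse of the 2x2 extinction matrix applied to [dA1, dA2].
    d_hbo = (a22 * dA1 - a12 * dA2) / det * scale
    d_hbr = (-a21 * dA1 + a11 * dA2) / det * scale
    return d_hbo, d_hbr


# Placeholder coefficients for illustration only (NOT tabulated values).
ALPHA = ((1.0, 3.0), (2.0, 1.5))
example = mbll_concentration_changes(0.01, 0.02, ALPHA, l_mm=30.0, dpf=5.9)
```

In practice, the coefficients would be taken from published extinction tables for the two wavelengths, and the function would be applied sample by sample to each channel.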

Subjects

Fourteen healthy male subjects (mean age 25.64 ± 4.06 years) participated in the experiment. None of them had a history of any psychiatric, neurological or visual disorder, and they all provided verbal informed consent. The experiment was conducted in accordance with the Declaration of Helsinki.

Optode configuration and placement

Three emitters and eight detectors were used to measure the signals from the prefrontal cortex. The optode configuration and channel distribution are shown in Fig. 1. This emitter–detector sequence was positioned on the forehead such that the bottom row of detectors was just above the eyebrows to detect activation in the prefrontal cortex of the brain. A large literature on the relationship between cerebral and extracerebral contribution and the emitter–detector distance is available (McCormick et al. 1992; Okada et al. 1997; Gratton et al. 2006; Zhang et al. 2007; Frederick et al. 2012; Gagnon et al. 2012). The emitter–detector distance plays an important role in fNIRS measurements, because as it increases, so does the imaging depth (McCormick et al. 1992). To measure the hemodynamic response signals from superficial tissues, usually an emitter–detector separation of around 3 cm is applied (Zhang et al. 2007; Frederick et al. 2012); with a separation of more than 5 cm, the signal arriving at the detector might become too weak to be usable (Gratton et al. 2006). Although the total number of channels in our configuration was 24, only those numbered in Fig. 1 were considered in the analysis as, owing to their appropriate emitter–detector separations, they contained useful information.

Fig. 1

Optode placement and channel location in the experiment: Each red-filled square represents an emitter containing two wavelengths (760 and 830 nm), each circle represents a detector, and Fp1 and Fp2 represent two reference points from the international 10–20 system (color figure online)

Experimental procedure

In preparation for the experiment, the subjects were advised not to drink coffee or smoke cigarettes within 3 h before the experiment. The subjects were seated in a comfortable chair facing a computer screen, positioned 70 cm from the subject’s eyes, and were asked to relax and restrict their head movement. They were shown some simple questions on the computer screen and were asked to answer them, by mentally making a binary decision, with a “yes” or a “no.” Thereafter, they openly declared their “yes” or “no” answers on a custom-built graphical user interface (GUI). For making a “yes” decision, the subjects were instructed to perform mental arithmetic, and for making a “no” decision, they were asked to relax. Naito et al. (2007) and Sorger et al. (2009) used similar experimental procedures to decode two-choice and multiple answers, respectively.

Figure 2 illustrates the experimental sequence: (1) The first 20 s is a resting period to set up a baseline condition. (2) For the next 10 s, a single question is shown on the computer screen. (3) In the next 20 s, the subjects have to mentally make a binary decision, in answering the question with a “yes” or a “no.” (4) The last 20 s is another rest period, which allows the signals to settle to the baseline values. The above sequence was repeated for all ten questions for a total experimental duration of 700 s for each subject. After the sequence was completed for all ten questions, a GUI appeared on the screen, on which the subjects openly declared their ten answers within 50 s, 5 s having been allotted for each answer. During this 50-s period, fNIRS signals were not recorded.
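The timing of the paradigm can be sketched as follows. This minimal Python example encodes the four periods of one trial and locates the decision-making period in sample indices at the 1.81-Hz sampling rate; the names and the truncation of fractional sample indices are assumptions made for illustration.

```python
FS_HZ = 1.81  # sampling rate reported for the acquisition system
PHASES = (("rest", 20), ("question", 10), ("decision", 20), ("rest", 20))
N_QUESTIONS = 10


def trial_duration(phases=PHASES):
    """Length of one trial in seconds."""
    return sum(seconds for _, seconds in phases)


def decision_window(trial_index, fs=FS_HZ, phases=PHASES):
    """Return (first_sample, end_sample) of the decision period of a trial.

    Fractional sample positions are truncated (an assumed convention).
    """
    start = trial_index * trial_duration(phases)
    for name, seconds in phases:
        if name == "decision":
            return int(start * fs), int((start + seconds) * fs)
        start += seconds
    raise ValueError("no decision phase defined")


total_seconds = N_QUESTIONS * trial_duration()  # 10 trials of 70 s each
```

With these settings, each trial lasts 70 s and the ten trials together last 700 s, in agreement with the total duration stated above.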

Fig. 2

Schematic illustration of the experimental paradigm used: The light blue blocks represent the 20-s rest periods at the beginning and at the end, the second green block represents the 10-s question presentation period, while the third red block represents the 20-s decision-making period (color figure online)

The ten questions, presented in sequence, are listed in Table 1. They queried the subjects about simple matters of daily life that could be answered easily with a “yes” or a “no.” A practice session was conducted for each subject before the actual experiments to allow them to become familiar with the experiment and the interface. During a mental arithmetic task, the participants performed a series of mental arithmetic calculations that appeared in a pseudorandom order. These calculations consisted of subtracting a two-digit number (between 10 and 20) from a three-digit number and then, throughout the task period, successively subtracting a two-digit number from the result of the previous subtraction (e.g., 244–14, 240–11 and 229–16).

Table 1 Questions asked during the question presentation period

Signal processing

From the brain–computer interface point of view, one can either classify the optical density variations directly or first convert the optical density signals to \( \Delta c_{\text{HbX}} \), using Eq. (1), prior to classification. Both methods have appeared in the literature: Naito et al. (2007) and Power et al. (2009) classified the optical density variations directly, whereas Coyle et al. (2007) and Sitaram et al. (2007) classified the \( \Delta c_{\text{HbX}} \) signals. Neither method has been shown to outperform the other in terms of classification accuracy. In this research, the change in optical density \( \Delta A(t) \) was calculated from the raw intensity measurements at the two wavelengths (760 and 830 nm). \( \Delta c_{\text{HbX}} \) was then found using Eq. (1), since \( \Delta A(t) \), \( \alpha_{\text{HbX}} \) and l were known for both the 760 and 830 nm wavelengths. For d, 5.9 was used in accordance with the literature (Delpy et al. 1988; van der Zee et al. 1992). The \( \Delta c_{\text{HbX}} \) signals contain high- and low-frequency physiological noises, especially due to heartbeat, respiration and Mayer waves (Santosa et al. 2013). To remove these, a fourth-order Butterworth filter was first applied to low-pass- and high-pass-filter the raw intensity signals with cutoff frequencies of 0.6 and 0.01 Hz, respectively. For normalization, the signal was then divided by the mean amplitude of the baseline signal. It should be noted that Eq. (1) cannot yield the absolute values of the HbX concentrations (owing to its incremental form), and that even the relative values cannot be quantified exactly using the DPF, since the optical pathlength may vary across channels with the same emitter–detector distance. Nevertheless, the modified Beer–Lambert law has been used in this paper to approximate the true concentration changes and to allow a possible extension to an adaptive DPF algorithm.

Classification

After the data preprocessing described in the previous section, classification was performed on the \( \Delta c_{\text{HbX}} (t) \) signals. The aim of classification is to decode binary decisions based on the features extracted from fNIRS data. The selected features in this paper are the means of the \( \Delta c_{\text{HbO}} (t) \) and \( \Delta c_{\text{HbR}} (t) \) signals during the decision-making period (i.e., the task period between 30 and 50 s, see Fig. 2), which results in a two-dimensional feature space. Let \( x_{n} = \left[ {\begin{array}{*{20}c} {\overline{{\Delta c_{\text{HbO}} (t)}} } & {\overline{{\Delta c_{\text{HbR}} (t)}} } \\ \end{array} } \right]_{n}^{\text{T}} \) be the data point from the n-th sample (response to question) in the two-dimensional feature space, where the bar notation and superscript T denote mean and transpose, respectively. In our case, we classified the data into two classes: “yes” and “no.” Among the existing classification algorithms, the linear classifiers LDA and SVM are among the most widely accepted and commonly used for BCI applications. Both LDA and SVM use discriminant hyperplanes to separate the data representing two or more classes. Because of their simplicity and low computational requirements, these two classifiers are highly suitable for online BCI systems (Lotte et al. 2007).
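The feature extraction described above can be sketched in a few lines of Python. The function name, the list-of-lists data layout and the truncation of the window boundaries to whole samples are assumptions made for this illustration.

```python
def task_mean_features(hbo, hbr, fs=1.81, t_start=30.0, t_end=50.0):
    """Build x_n = [mean dHbO, mean dHbR] for one trial.

    hbo, hbr : lists of per-channel time series (one list per channel),
               sampled at fs Hz with t = 0 at the start of the trial.
    """
    first, last = int(t_start * fs), int(t_end * fs)

    def window_mean(channels):
        # Pool the task-window samples of all channels, then average.
        values = [v for series in channels for v in series[first:last]]
        if not values:
            raise ValueError("task window contains no samples")
        return sum(values) / len(values)

    return [window_mean(hbo), window_mean(hbr)]
```

Because every channel contributes the same number of samples, pooling the samples is equivalent to averaging the per-channel means.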

LDA

In LDA, the separating hyperplane is obtained by seeking the projection that maximizes the distance between the two classes’ means and minimizes the within-class variances. LDA assumes normal distribution of the data, with equal covariance matrix for both classes (Lotte et al. 2007). The goal of LDA is to seek a vector v in the feature space such that two projected clusters of yes-decision (Y) and no-decision (N) on the v-direction can be well separated from each other while maintaining a small variance for each cluster. This can be done by maximizing Fisher’s criterion given by

$$ J(v) = \frac{{v^{\text{T}} S_{\text{b}} v}}{{v^{\text{T}} S_{\text{w}} v}} $$
(2)

where \( S_{\text{b}} \) and \( S_{\text{w}} \) are the between-class and within-class scatter matrices defined as follows:

$$ S_{\text{b}} = \left( {m_{Y} - m_{N} } \right)\left( {m_{Y} - m_{N} } \right)^{\text{T}} , $$
(3)
$$ S_{\text{w}} = \mathop \sum \limits_{{x_{n} \in Y}} \left( {x_{n} - m_{Y} } \right)\left( {x_{n} - m_{Y} } \right)^{\text{T}} + \mathop \sum \limits_{{x_{n} \in N}} \left( {x_{n} - m_{N} } \right)\left( {x_{n} - m_{N} } \right)^{\text{T}} $$
(4)

where \( m_{Y} \) and \( m_{N} \) represent the group means of classes Y and N, respectively. Finding the vector v that maximizes (2) can be reformulated as a generalized eigenvalue problem:

$$ S_{\text{w}}^{ - 1} S_{\text{b}} v = \lambda v. $$
(5)

The optimal v is then the eigenvector corresponding to the largest eigenvalue of \( S_{\text{w}}^{ - 1} S_{\text{b}} \) or is directly obtained as

$$ v = S_{\text{w}}^{ - 1} (m_{Y} - m_{N} ) $$
(6)

provided that \( S_{\text{w}} \) is nonsingular.
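For the two-dimensional feature space used here, Eq. (6) can be evaluated in closed form. The sketch below, written in plain Python, computes the class means, the within-class scatter matrix and the direction v; the decision threshold placed midway between the projected class means is an assumption of this example (it corresponds to equal class priors) and is not stated in the text. The sample points are synthetic.

```python
def mean2(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter2(points, m):
    """Sum of outer products (x - m)(x - m)^T for 2D points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(yes_points, no_points):
    """Return (v, threshold) with v = S_w^{-1} (m_Y - m_N)."""
    m_y, m_n = mean2(yes_points), mean2(no_points)
    s_y, s_n = scatter2(yes_points, m_y), scatter2(no_points, m_n)
    sw = [[s_y[i][j] + s_n[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("S_w is singular")
    diff = [m_y[0] - m_n[0], m_y[1] - m_n[1]]
    v = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
         (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]
    # Assumed rule: threshold halfway between the projected class means.
    proj_y = v[0] * m_y[0] + v[1] * m_y[1]
    proj_n = v[0] * m_n[0] + v[1] * m_n[1]
    return v, 0.5 * (proj_y + proj_n)


def classify(x, v, threshold):
    return "yes" if v[0] * x[0] + v[1] * x[1] > threshold else "no"


# Synthetic [mean dHbO, mean dHbR] points, for illustration only.
YES = [(2.0, -1.0), (3.0, -0.5), (2.5, -1.5), (3.5, -1.0)]
NO = [(0.0, 0.5), (0.5, 0.0), (-0.5, 0.2), (0.2, -0.3)]
v, threshold = fisher_direction(YES, NO)
```

Since \( S_{\text{w}} \) is positive definite when it is nonsingular, the "yes" class mean always projects above the "no" class mean along v, which is why the comparison in classify uses the greater-than sign.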

SVM

In contrast, the SVM classifier is designed to maximize the distance between the separating hyperplane and the nearest training point(s) (i.e., support vectors) (see Fig. 4). Recall that the objective of the separating hyperplane is to infer a yes- or no-decision from the means of \( \Delta c_{\text{HbO}} (t) \) and \( \Delta c_{\text{HbR}} (t) \) during the task period. The separating hyperplane in the 2D feature space is given by the following equation.

$$ f\left( x \right) = r.x + b, $$

where \( r, x \in {\mathbb{R}}^{2} \) and \( b \in {\mathbb{R}}^{1} \) (see Fig. 4). The optimal solution r* that maximizes the distance between the hyperplane and the nearest training point(s) can be obtained by minimizing the following cost function.

$$ J\left( {r, \xi } \right) = \frac{1}{2}r^{2} + C.\mathop \sum \limits_{n = 1}^{z} \xi_{n} $$
(7)

while satisfying the following constraints

$$ \begin{gathered} \left( {x_{n} .r + b} \right) \ge 1 - \xi_{n} \,{\text{for}}\quad y_{n} = + 1 \hfill \\ \left( {x_{n} .r + b} \right) \le - 1 + \xi_{n} \,{\text{for}}\quad y_{n} = - 1 \hfill \\ \xi_{n} \ge 0\,\forall n \hfill \\ \end{gathered} $$
(8)

where \( r^{2} = r^{\text{T}} r \), C is the positive regularization parameter chosen by the user (a large value of C corresponds to higher penalty for classification errors), \( \xi_{n} \) is the measure of training error, z is the number of misclassified samples, and \( y_{n} \) is the class label (+1 or −1 in the case of binary classification) for the n-th sample.
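To make the cost function (7) concrete, the following Python sketch minimizes it by full-batch subgradient descent on the equivalent hinge-loss form, in which each \( \xi_{n} \) takes its smallest feasible value. This is a deliberately simple stand-in chosen for illustration; it is not the solver used by LibSVM, and the learning rate, number of epochs and sample points are assumptions.

```python
def decision_value(r, b, x):
    return r[0] * x[0] + r[1] * x[1] + b


def svm_cost(r, b, xs, ys, c=1.0):
    """Eq. (7) with each slack set to max(0, 1 - y_n f(x_n))."""
    slack = sum(max(0.0, 1.0 - y * decision_value(r, b, x))
                for x, y in zip(xs, ys))
    return 0.5 * (r[0] ** 2 + r[1] ** 2) + c * slack


def train_linear_svm(xs, ys, c=1.0, lr=0.01, epochs=2000):
    """Subgradient descent on the soft-margin primal (illustrative only)."""
    r, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_r, grad_b = [r[0], r[1]], 0.0
        for x, y in zip(xs, ys):
            if y * decision_value(r, b, x) < 1.0:  # margin violated
                grad_r[0] -= c * y * x[0]
                grad_r[1] -= c * y * x[1]
                grad_b -= c * y
        r = [r[0] - lr * grad_r[0], r[1] - lr * grad_r[1]]
        b -= lr * grad_b
    return r, b


def predict(r, b, x):
    return 1 if decision_value(r, b, x) >= 0 else -1


# Synthetic features; +1 stands for "yes" and -1 for "no".
XS = [(2.0, -1.0), (3.0, -0.5), (2.5, -1.5), (3.5, -1.0),
      (0.0, 0.5), (0.5, 0.0), (-0.5, 0.2), (0.2, -0.3)]
YS = [1, 1, 1, 1, -1, -1, -1, -1]
r_fit, b_fit = train_linear_svm(XS, YS)
```

A larger C makes margin violations more expensive relative to the term \( \frac{1}{2}r^{2} \), which narrows the margin in exchange for fewer training errors.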

The linear decision boundaries for both LDA and SVM were obtained during an off-line training session prior to the test session. After that, the online classification into “yes” or “no” classes was performed by projecting the test samples, acquired after each experimental trial, on the decision boundaries. It should be noted that the signal processing and classification started after sequential incorporation of data acquired over one trial. The mean values of \( \Delta c_{\text{HbO}} (t) \) and \( \Delta c_{\text{HbR}} (t) \) for each response, averaged over the 20-s task period and over the 12 selected channels (numbered in Fig. 1), were used as the features for both LDA and SVM classification. The feature set, hence, consisted of ten two-dimensional data points for each subject over one experimental trial.

Matlab was used to perform the LDA classification, while the SVM classifier was implemented in Matlab using LibSVM (Chang and Lin 2011). For LDA implementation, the features were loaded into the “classification” module of the Matlab statistics toolbox and LDA was selected to classify the data. SVM classification was carried out in the following steps: (1) The fNIRS data were transformed into the LibSVM format. (2) Each feature was scaled to a value within the [−1, 1] range. Scaling was performed to prevent attributes with greater numeric ranges from dominating those with smaller numeric ranges. (3) The linear kernel of the SVM algorithm was selected. The default value of 1 was used as the regularization parameter. (4) The SVM model was trained for a specific subject. (5) The SVM model was then used to predict the class label, based on the features in the testing set.
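Step (2) above can be illustrated with a short Python sketch of per-feature min–max scaling to [−1, 1]. The function names are hypothetical; the choice to compute the ranges on the training set and reuse them for the test set is an assumption of this example, since the text does not specify how the ranges were obtained.

```python
def fit_ranges(rows):
    """Per-feature (min, max) computed from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_rows(rows, ranges):
    """Map each feature linearly so that its training range becomes [-1, 1]."""
    scaled = []
    for row in rows:
        out = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                out.append(0.0)  # constant feature: no spread to scale
            else:
                out.append(-1.0 + 2.0 * (value - lo) / (hi - lo))
        scaled.append(out)
    return scaled


train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
ranges = fit_ranges(train)
train_scaled = scale_rows(train, ranges)
```

Test samples scaled with training ranges may fall slightly outside [−1, 1], which is harmless for a linear kernel.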

Cross-validation, a standard procedure in pattern recognition and task discrimination in BCI, was used to calculate the classification accuracies. In tenfold cross-validation, the data are randomly divided into ten segments, of which nine are used for training and the tenth is used for testing. The error is then averaged over all training/testing combinations to determine the average classification accuracy.
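The cross-validation procedure can be sketched in Python as follows. The helper names, the fixed random seed and the fit/predict callables are assumptions introduced for this illustration; any classifier with a training and a prediction function could be plugged in.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # random assignment to segments
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_val_accuracy(xs, ys, fit, predict, k=10, seed=0):
    """Average test accuracy over all k training/testing combinations."""
    scores = []
    for train, test in k_fold_indices(len(xs), k, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        hits = sum(predict(model, xs[i]) == ys[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)
```

Each sample appears in exactly one test segment, so every sample is used once for testing and k − 1 times for training.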

Online classification, in our case, means that the data were sequentially incorporated after each trial and the classification results were obtained as soon as one experimental trial was over. The system would have been real-time if the training had been performed online and the classification results had been obtained simultaneously.

Results

The average classification accuracy for each subject is presented in Table 2, together with the percentage of “yes” responses made by each subject. The classification accuracy averaged over all subjects was 74.28 % for LDA and 82.14 % for SVM. A t test was conducted to determine whether the difference between the mean classification accuracies of LDA and SVM was statistically significant. Under the null hypothesis that the mean classification accuracies of SVM and LDA do not differ, the p value was 0.00238, so the null hypothesis was rejected at the 5 % significance level. The difference between the LDA and SVM classification accuracies is therefore statistically significant, with SVM performing better than LDA.
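The text does not state which form of the t test was used. As one plausible reading, the Python sketch below computes a paired t statistic, which would suit per-subject accuracies obtained from the same subjects with both classifiers; the paired design and the function name are assumptions of this example. The p value would then follow from the t distribution with the returned degrees of freedom, for instance via a statistics library.

```python
import math


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var_d == 0:
        raise ValueError("differences have zero variance")
    return mean_d / math.sqrt(var_d / n), n - 1
```

Under this assumed design, the 14 subjects of the study would give 13 degrees of freedom.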

Table 2 Comparison of LDA and SVM: classification results and the “yes” response percentage for each subject

Discussion

In this study, we demonstrated the feasibility of fNIRS-based online binary “yes” or “no” decision decoding. With the SVM, an average classification accuracy as high as 82.14 % was achieved. Due to individual differences (Yarkoni and Braver 2010), the peak values of the average \( \Delta c_{\text{HbX}} (t) \) also differed across subjects, though the average classification accuracy, using either LDA or SVM, did not fall below 60 % for any subject. The hemodynamic responses, however, remained similar across all subjects (see Fig. 3a). In Fig. 3b, the grand averages of \( \Delta c_{\text{HbO}} (t) \) across all subjects are shown with one standard deviation. To verify that the difference was caused by “yes” and “no” only (i.e., not by random groups of trials), a t test was conducted for the first five questions versus the last five questions and also for odd versus even questions. The p values were found to be 0.254 and 0.432, respectively, which shows that there is no significant difference for random groups.

Fig. 3

Average signals: a Average \( \Delta c_{\text{HbX}} (t) \) signals of Subjects 1, 2, 3, 4 and 7 for “yes” and “no” responses. b The grand average with standard deviation of \( \Delta c_{\text{HbO}} (t) \) signals, across all 14 subjects, for “yes” and “no” responses

fNIRS is an indirect optical measurement technique; that is, it does not detect neural activity directly, but rather detects the hemodynamic changes due to neural activation. Accordingly, there is always a time delay between an activity and the detected response. For BCI and other real-time applications, it might be possible to trade classification accuracy against temporal delay. For decoding applications, however, a higher classification accuracy would be desirable. With advanced filtering techniques (Khoa and Nakagawa 2008; Biallas et al. 2012; Kamran and Hong 2013), different features from fNIRS signals and different classification techniques [e.g., the hidden Markov model or neural networks (Khoa and Nakagawa 2008)], the classification accuracy can be increased. It has also been shown recently that the classification accuracies can be further increased using a hybrid NIRS–EEG brain–computer interface (Fazli et al. 2012).

In this study, the effect of habituation (Szabo and Gauvin 1992) and differences between high- and low-skilled arithmetic problem solvers (Nunez-Pena and Suarez-Pellicioni 2012) were not considered. Continued exposure to mental arithmetic might result in a lower hemodynamic response due to habituation. Further studies are needed to investigate the effects of habituation and of differences between high- and low-skilled problem solvers on the mental arithmetic-induced hemodynamic response and thereby on classification accuracies.

This paper presents the first work on fNIRS-based online binary decision decoding from the prefrontal cortex using linear discriminant analysis (LDA) and support vector machine (SVM), two widely used classification methods for fNIRS- and EEG-based BCI (Sitaram et al. 2007; Luu and Chau 2009; Salvaris and Sepulveda 2010; Hu et al. 2012; Li and Zhang 2012). The classifier trained by SVM offered significantly better classification accuracy than that trained by LDA. This result is consistent with a recent study by Hu et al. (2012), which successfully increased intra-subject classification accuracy using SVM. The results from this study also demonstrate the potential use of SVM to discriminate binary signals of neuronal activity within the prefrontal cortex, demonstrating the feasibility of fNIRS as applied to BCI (Figs. 4, 5).

Fig. 4

The two-dimensional feature space for SVM classification

Fig. 5

Classification accuracies, using LDA and SVM, for one complete trial averaged over all subjects

Conclusions

This paper presented an fNIRS-based online binary decision decoding framework based on the signals acquired from the prefrontal cortex. The LDA and SVM classifiers were used to decode the binary decisions as “yes” or “no.” Using the mean values of \( \Delta c_{\text{HbO}} (t) \) and \( \Delta c_{\text{HbR}} (t) \) as features to the classifiers, the average SVM classification accuracy was 82.14 %, whereas the average LDA accuracy was 74.28 %. The results of this research demonstrate the feasibility of fNIRS as applied to binary decision decoding and the potential use of SVM to discriminate signals from the prefrontal cortex.