Introduction

Functional near-infrared spectroscopy (fNIRS) is a non-invasive neuroimaging method that uses the absorption of near-infrared light by tissue to measure relative oxygenated (OxyHb) and deoxygenated hemoglobin (DeoxyHb) concentrations (Ferrari and Quaresima 2012; Jobsis 1977; Obrig and Villringer 2003). The direct measurement of OxyHb and DeoxyHb concentrations allows cerebral hemodynamic responses to be assessed. Cortical functions have been extensively studied with fNIRS (Ferrari and Quaresima 2012; Obrig and Villringer 2003). Studies using visual stimulation have shown, for both infants and adults, that changes in OxyHb and DeoxyHb are significantly stronger over visual than over non-visual areas, and that both measures are modulated by visual contrast (Meek et al. 1998; Plichta et al. 2007; Takahashi et al. 2000). Several studies have also shown auditory cortical activation in response to auditory stimuli (Abla and Okanoya 2008; Kochel et al. 2013; Minagawa-Kawai et al. 2002; Rinne et al. 1999). However, in addition to auditory regions, these studies have reported activation in non-auditory areas, including frontal, parietal, or motor regions, due to non-sensory-specific effects such as attention or motor preparation (Abla and Okanoya 2008; Remijn and Kojima 2010). In other words, the specificity of cortical responses to auditory stimuli has not been fully investigated.

Most of the auditory fNIRS studies published to date focus on higher-level auditory processing, with consequently less attention to bottom-up acoustic processing. To the best of our knowledge, there is as yet no evidence of stimulus-specific modulation of the OxyHb and DeoxyHb signals by auditory stimulation under conditions that would demonstrate the ability of fNIRS to detect low-level auditory perceptual differences. Moreover, fMRI evidence of auditory cortical modulation by perceived loudness has not been explored with fNIRS. Therefore, in the current study we cross-validated fNIRS recordings of auditory-related responses in adults while also using a visual task as a control. Accordingly, our aim was to investigate whether fNIRS measurements over the temporal areas could capture subtle changes in acoustic processing in auditory cortex.

To achieve this goal, we employed a three-level validation approach, recording fNIRS and EEG simultaneously during a visual and an auditory task. Specifically, by including visual and auditory stimulation blocks in a within-subjects design, it was possible to test for an interaction between brain area (visual and auditory) and sensory stimulus (visual and auditory). For the first-level validation, we hypothesized that the fNIRS signal would reflect cortical area specificity and sensory stimulus selectivity. Area specificity was defined as greater activation in auditory than in visual regions with auditory stimulation and, conversely, greater activation in visual than in auditory regions with visual stimulation. Stimulus selectivity was defined as greater activation of auditory regions by auditory than by visual stimuli, and vice versa for visual regions (Fig. 1). It should be noted, however, that human sensory areas are usually not 100 % stimulus selective or area specific, since auditory stimuli can elicit responses in the visual system and vice versa (Driver and Noesselt 2008; Naue et al. 2011; Raij et al. 2010; Thorne et al. 2011).

Fig. 1

Hypothesis. Definition of area specificity and stimulus selectivity

For the second-level validation, we tested whether OxyHb and DeoxyHb were modulated by physical differences in sound stimuli. Most studies performed with functional magnetic resonance imaging (fMRI) have indicated an almost linear increase of the blood oxygenation level dependent (BOLD) signal with sound intensity (Uppenkamp and Rohl 2014). However, some studies suggest that the auditory cortical response is more closely related to perceived loudness than to physical sound intensity (Langers et al. 2007; Rohl and Uppenkamp 2012). Therefore, the current experiment included two levels of sound intensity for two types of stimulus, as well as a subjective loudness evaluation. It was predicted that both the manipulation of sound intensity and the perceived loudness would be reflected in OxyHb and DeoxyHb signal changes.

For the third-level validation, we correlated hemoglobin concentrations with sensory-related evoked potential amplitudes from EEG. A substantial number of EEG studies have localized the sources of auditory evoked potentials to auditory areas (Hine and Debener 2007; Hine et al. 2008). It is also well established that sensory-related evoked potentials are modulated by both physical and perceptual stimulus differences. Therefore, evidence of a correlation between fNIRS and EEG signals would further strengthen the idea that fNIRS is sensitive enough to detect low-level acoustic processing in auditory cortex. Although a few studies have investigated the relationship between neural activity measured by EEG and the hemodynamic response measured by fNIRS (Ehlis et al. 2009; Koch et al. 2008; Nasi et al. 2010; Takeuchi et al. 2009), the relationship between OxyHb/DeoxyHb on the one hand and visual/auditory evoked potentials on the other has not yet been fully addressed. Therefore, in the current experiment we explored this relationship further. Specifically, we hypothesized a correlation between hemoglobin concentrations and evoked potential amplitudes, and a correlation between EEG and fNIRS in the magnitude of experimental modulations.

Materials and Methods

Participants

Twenty-four adult volunteers (12 males, 12 females; mean age 23.5 years, standard deviation 2.45 years) participated in the study. All participants were right-handed, as assessed via a classification of hand preference by association analysis (Oldfield 1971), had normal or corrected-to-normal vision, and had normal hearing. None of the participants had a history of neurological or psychiatric illness. All participants gave written consent prior to the experiment. All procedures were approved by the local ethics committee and conformed to the Declaration of Helsinki. The participants were paid for their participation.

Stimuli and Setup

The current experiment included a visual and an auditory task. For the visual task we adopted the stimuli from a previous study (Sandmann et al. 2012). The visual stimuli consisted of reversing displays of circular checkerboard patterns (see Fig. 2a). The image pair of each stimulus is referred to as Images A and B. Image B was generated by rotating Image A 180°. All stimuli (1280 × 1024 pixels) were radial in nature and consisted of 20 rings, each of which was divided into 18 sectors with neighboring sectors being of opposite color. The radial nature of the stimuli compensated for the increase in receptive-field size with eccentricity (Rover and Bach 1985; Zemon and Ratliff 1982). There were four different pairs of checkerboard patterns that systematically varied in terms of luminance ratio. The proportion of white pixels in the stimulus pattern was set at 1/8 (Level 1: corresponds to 12.5 % white pixels), 2/8 (Level 2: corresponds to 25 % white pixels), 3/8 (Level 3: corresponds to 37.5 % white pixels) and 4/8 (Level 4: corresponds to 50 % white pixels). The contrast between white and black pixels was identical in all images used. Images A and B were presented at a reversal rate of 2 Hz for 10 s to form a visual stimulus. All visual stimuli were presented on a 24-inch monitor at a distance of 150 cm. The visual angle of the checkerboard diameter was 10.5°.

Fig. 2

Stimuli. a The flickering checkerboard pairs. The proportion of white pixels in the stimulus was 12.5, 25, 37.5, and 50 % of the circular panel (from left to right). Image B is generated by rotating Image A by 180°. b The modulator used for Sound 2 in the auditory task. The modulator is a 10 Hz wave amplitude modulated at 4 Hz (envelope marked)

For the auditory task, we used two types of sound. The first type, Sound 1 (S1), consisted of two pure tones of 440 and 554 Hz, sampled at 44.1 kHz, each lasting 500 ms (with 50 ms linear rise/fall). The two tones were presented in alternation at a rate of 2 Hz for 10 s, always starting with the 440 Hz tone. The second type, Sound 2 (S2), was adopted from a previous study (Duffy et al. 2013). A 10 Hz sine wave was first amplitude modulated by a 4 Hz sine wave to form a modulator (Fig. 2b). This modulator (always with the same phase and envelope) was then used to linearly frequency modulate a 1 kHz carrier sine wave to form S2. S2 therefore had a frequency range between 960 and 1060 Hz. The duration of the sound was 10 s, and a Hanning ramp was applied to the first and the last 50 ms of the stimulus. S2 was introduced with the aim of obtaining a stronger fNIRS signal, since it contained more frequency variation. The sound intensity of both types (S1 and S2) was set to either 40 or 70 dB SPL. Therefore, the auditory task used 4 sounds (2 sound types × 2 sound intensities). All auditory stimuli were delivered to the participants through E-A-RTONE 3A insert earphones with E-A-RLINK foam plugs.
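For illustration, the sketch below generates S1 and S2 along the lines described above. It is written in Python/NumPy rather than the authors' original stimulus code; the modulation depth and the offset used to map the modulator onto the stated 960–1060 Hz range are assumptions, and the calibration to 40/70 dB SPL is omitted.

```python
import numpy as np

FS = 44100  # sampling rate (Hz), as used in the experiment


def make_s1(duration=10.0, fs=FS):
    """Sketch of S1: 440 and 554 Hz tones (500 ms, 50 ms linear rise/fall)
    presented in alternation at 2 Hz, starting with the 440 Hz tone."""
    tone_dur = 0.5
    n_tone = int(tone_dur * fs)
    t = np.arange(n_tone) / fs
    n_ramp = int(0.05 * fs)
    env = np.ones(n_tone)
    env[:n_ramp] = np.linspace(0.0, 1.0, n_ramp)   # 50 ms linear rise
    env[-n_ramp:] = np.linspace(1.0, 0.0, n_ramp)  # 50 ms linear fall
    tones = [np.sin(2 * np.pi * f * t) * env for f in (440.0, 554.0)]
    n_tones = int(duration / tone_dur)             # 20 tones in 10 s
    return np.concatenate([tones[i % 2] for i in range(n_tones)])


def make_s2(duration=10.0, carrier=1000.0, fs=FS):
    """Sketch of S2: a 10 Hz sine, amplitude modulated at 4 Hz, frequency
    modulates a 1 kHz carrier so that the instantaneous frequency spans
    roughly 960-1060 Hz (exact scaling/offset are assumptions)."""
    t = np.arange(int(duration * fs)) / fs
    modulator = np.sin(2 * np.pi * 10 * t) * np.sin(2 * np.pi * 4 * t)
    f_inst = carrier + 10.0 + 50.0 * modulator     # assumed mapping onto 960-1060 Hz
    phase = 2 * np.pi * np.cumsum(f_inst) / fs     # integrate frequency to obtain phase
    s2 = np.sin(phase)
    n_ramp = int(0.05 * fs)
    ramp = np.hanning(2 * n_ramp)                  # Hanning ramps over first/last 50 ms
    s2[:n_ramp] *= ramp[:n_ramp]
    s2[-n_ramp:] *= ramp[n_ramp:]
    return s2
```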

Experimental Design

Participants performed one visual task and one auditory task each. Within the visual task, 4 (luminance ratios: levels 1 to 4) × 10 (repetitions) = 40 trials were presented in total. Each trial consisted of one luminance ratio (i.e. one image pair of the reversing checkerboard pattern) presented for 10 s, followed by a 20 s baseline with a fixation cross in the middle of the screen. Participants were instructed to fixate on the middle of the screen during the stimuli and the baseline period. The visual session lasted for 20 min, and a break of 1 min was given after 10 min. Within the auditory session, 2 (intensities: low, high) × 2 (stimulus types: S1 and S2) × 20 (repetitions) = 80 trials were presented. Each trial consisted of a sound presented for 10 s followed by a 20 s silence. A fixation cross was presented in the middle of the screen throughout the whole session. Participants were instructed to fixate on the cross the entire time during the session. The auditory session lasted for 40 min, and a break of 1 min was given every 10 min of the experiment. In general, the visual and the auditory stimuli were presented in randomized order within each task, and the sequence of the visual and the auditory task was counterbalanced across participants.
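As a rough sketch of how such randomized trial lists could be assembled (the presentation software and exact randomization scheme are not specified in the text; the function and variable names below are hypothetical):

```python
import random


def make_trial_list(conditions, n_reps, seed=None):
    """Repeat each condition n_reps times and shuffle the resulting list."""
    rng = random.Random(seed)
    trials = [c for c in conditions for _ in range(n_reps)]
    rng.shuffle(trials)
    return trials


# Visual task: 4 luminance ratios x 10 repetitions = 40 trials
visual_trials = make_trial_list(["level1", "level2", "level3", "level4"], 10, seed=1)

# Auditory task: 2 sound types x 2 intensities x 20 repetitions = 80 trials
auditory_trials = make_trial_list(
    [(snd, db) for snd in ("S1", "S2") for db in (40, 70)], 20, seed=2)

# Task order (visual first vs. auditory first) would be counterbalanced across
# participants, e.g. by alternating with the participant number (rule assumed).
```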

Procedure

Before the start of the experiment, participants passed a Landolt C vision test with a visual acuity of more than 0.8 and a hearing threshold test with less than 20 dB hearing loss in each ear (125–4,000 Hz). Participants also answered a set of questionnaires covering handedness (Oldfield 1971) and health status. After the questionnaires, participants received an instruction sheet for both the visual and the auditory task. In the visual task, they were required always to fixate on the middle of the screen, and to press a button at the end of each stimulus to indicate whether the stimulus belonged to a higher (levels 3 or 4) or lower (levels 1 or 2) luminance ratio. In the auditory task, they were instructed always to fixate on the fixation cross in the middle of the screen (closing the eyes was not allowed), and to press a button at the end of each stimulus to indicate whether the sound intensity was high or low. The participants received training prior to the actual data recording, and the training was repeated until their button responses were 100 % correct. After the experiment, participants completed a questionnaire to indicate their subjective perception of sound intensity for all four sound stimuli (a seven-level loudness scale from too quiet to too loud).

Data Recording

Functional Near-Infrared Spectroscopy (fNIRS)

Functional near-infrared spectroscopy was recorded with a NIRScout 816 device (NIRx Medizintechnik GmbH, Berlin, Germany) with 8 LED sources (intensity 5 mW/wavelength) and 12 detectors placed over the temporal and occipital areas of the scalp (Fig. 3). Regions of interest were defined as the left and right visual areas and the left and right auditory areas, as shown in Fig. 3. Within each area, two sources and three detectors were placed. The distance between a source and its neighboring detector was 3 cm. Each source-detector pair at 3 cm distance formed a channel, resulting in five channels per area and 20 channels in total. The sources emitted light at wavelengths of 760 and 850 nm, and the sampling rate was 6.25 Hz.

Fig. 3

Optodes layout. On the top row are photographs from the left, back and right view of our customized simultaneous EEG-fNIRS caps. The rows below show an overlay of optodes on the cortex using a 3D digitizer system on one participant for demonstration purposes. Red circles are sources and green circles are detectors. Each neighboring source-detector pair forms one channel. The left-most panel is defined as the left auditory area and the right-most panel as the right auditory area. The middle panel shows both left and right visual areas (separated by the gap between the two hemispheres). The size of the grey circle on each channel represents how many times the channel was selected within each area (number of selections). Channels were selected separately for OxyHb and DeoxyHb (Color figure online)

Electroencephalography (EEG)

Electroencephalography (EEG) was recorded simultaneously with functional near-infrared spectroscopy (fNIRS) during the experiment. EEG was recorded with 94 equidistant Ag–AgCl electrodes in customized caps (Easycap, Herrsching, Germany). Two additional channels were placed below the left and right eye to record the electrooculogram. All channels were referenced to a nose-tip electrode. EEG was recorded using BrainVision Recorder (Version 1.10) together with BrainAmp DC amplifiers (Brain Products, Gilching, Germany). The sampling rate was 1 kHz with online filtering between 0.016 and 250 Hz. Electrode impedances were kept below 20 kOhm prior to data acquisition.

Data Processing

Loudness Scale

The subjective loudness ratings from all participants were converted into numbers from 1 to 7, where 1 represented too quiet and 7 represented too loud. The results for the loudness scale are provided in Table 1, indicating an effective loudness manipulation for S2 but a surprisingly small loudness difference between the two sound intensities for S1. Further analysis of S1 showed that, apart from one participant who reported the same loudness for both sound intensities, 11 participants indicated a loudness difference of two and the remaining 12 participants indicated a loudness difference of one between the two sound intensities. Based on this difference, participants were separated into a lower difference group (scale difference of one) and a higher difference group (scale difference of two). The auditory modulation with perceived loudness was calculated separately for the two groups. The participant who reported the same perceived loudness for both sound intensity levels was excluded from this analysis.
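A minimal sketch of this grouping step, assuming hypothetical per-participant rating arrays (the values below are placeholders, not the study's data):

```python
import numpy as np

# Hypothetical loudness ratings (1-7) for S1 at 40 dB and 70 dB SPL, one value per participant
s1_40dB = np.array([3, 2, 3, 4, 2, 3])
s1_70dB = np.array([5, 3, 4, 6, 4, 3])

diff = s1_70dB - s1_40dB
higher_group = np.where(diff == 2)[0]   # scale difference of two
lower_group = np.where(diff == 1)[0]    # scale difference of one
excluded = np.where(diff == 0)[0]       # no perceived difference: excluded from this analysis
```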

Table 1 Loudness scale

Functional Near-Infrared Spectroscopy (fNIRS)

The fNIRS data were imported into Matlab and transformed to concentration levels (unit: mM) of OxyHb and DeoxyHb using the NILAB toolbox (NIRx Medizintechnik GmbH, Berlin, Germany). OxyHb and DeoxyHb concentrations were then high-pass filtered at 0.015 Hz and low-pass filtered at 0.08 Hz. OxyHb and DeoxyHb concentrations were modeled separately with a general linear model (GLM) using a Boynton canonical HRF with a 6 s delay (Malinen et al. 2007; Smith 2004). The GLM was identical for the visual and the auditory task and was applied to all participants. The beta values for the contrast of all stimuli (i.e. all conditions) versus baseline (corresponding to the 20 s fixation cross between trials) were then extracted from the model. Within each predefined area (left/right visual area and left/right auditory area, 5 channels per area), the channel with the highest beta value was selected independently for OxyHb and DeoxyHb (4 areas × 2 measures (OxyHb and DeoxyHb), i.e. 8 channels selected per subject). Data from the selected channels were epoched from −5 to 25 s around stimulus onset, separately for each condition, and averaged across trials. Baseline correction was applied from −5 to 0 s. Grand averages across subjects were calculated for the left and the right hemisphere, for both auditory and visual areas. Since both visual and auditory stimuli were presented bilaterally in the current experiment, signals were averaged across the left and right hemispheres. OxyHb and DeoxyHb concentrations were quantified as the mean amplitude within a 10-s time window, defined separately for each condition as the peak latency (determined from the grand average) plus or minus 5 s. These concentration values were subjected to analyses of variance (ANOVA) and correlation analyses.
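This processing chain can be illustrated with the following Python sketch. It is not the authors' Matlab/NILAB code: it assumes the concentration time series (samples × channels) are already available, and the filter design, the gamma-shaped HRF peaking about 6 s after onset, and all function names are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt
from scipy import stats

FS_NIRS = 6.25  # fNIRS sampling rate (Hz)


def bandpass(conc, fs=FS_NIRS, lo=0.015, hi=0.08, order=2):
    """Zero-phase band-pass filter (0.015-0.08 Hz) of concentration time courses."""
    sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, conc, axis=0)


def hrf(fs=FS_NIRS, peak=6.0, shape=4.0, length=30.0):
    """Gamma-shaped canonical HRF peaking ~6 s after onset (parameters assumed)."""
    t = np.arange(0.0, length, 1.0 / fs)
    scale = peak / (shape - 1.0)                 # mode of a gamma pdf = (shape - 1) * scale
    h = stats.gamma.pdf(t, a=shape, scale=scale)
    return h / h.max()


def glm_betas(conc, onsets_s, dur=10.0, fs=FS_NIRS):
    """One-regressor GLM (all stimuli vs. baseline); returns one beta per channel."""
    n = conc.shape[0]
    boxcar = np.zeros(n)
    for on in onsets_s:                          # stimulus onsets in seconds
        i0 = int(on * fs)
        boxcar[i0:i0 + int(dur * fs)] = 1.0
    reg = np.convolve(boxcar, hrf(fs))[:n]       # predicted hemodynamic response
    X = np.column_stack([reg, np.ones(n)])       # regressor + intercept
    beta, *_ = np.linalg.lstsq(X, conc, rcond=None)
    return beta[0]


def epoch_average(x, onsets_s, fs=FS_NIRS, pre=5.0, post=25.0):
    """Cut -5..25 s epochs around onsets, baseline-correct (-5..0 s), and average."""
    n_pre, n_post = int(pre * fs), int(post * fs)
    epochs = []
    for on in onsets_s:
        i0 = int(on * fs)
        ep = x[i0 - n_pre:i0 + n_post].astype(float).copy()
        ep -= ep[:n_pre].mean()
        epochs.append(ep)
    return np.mean(epochs, axis=0)


# Per area (samples x 5 channels): filter, fit the GLM, keep the channel with the largest beta
# filtered = bandpass(conc_area)
# best = int(np.argmax(glm_betas(filtered, onsets_s)))
# condition_average = epoch_average(filtered[:, best], condition_onsets_s)
```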

Electroencephalography (EEG)

EEG data were analyzed with EEGLAB v12.0.2.4b (Delorme and Makeig 2004) in Matlab R2012a (The MathWorks Inc., Natick, MA, USA). Imported EEG data were filtered between 0.1 and 40 Hz and down-sampled to 500 Hz. Independent component analysis (ICA) was applied, and CORRMAP (Viola et al. 2009) was used to remove eye-blink, eye-movement, and heartbeat artifacts. After artifact correction, epochs were segmented from −62 to 448 ms around visual and auditory stimulus onsets. Trials time-locked to the visual stimuli were averaged separately for each luminance ratio to calculate visual evoked potentials (VEP). For the auditory conditions, trials time-locked to S1 were averaged separately for the two sound intensities to calculate auditory evoked potentials (AEP). In addition, since previous studies have found strong attenuation of AEP N100 amplitudes over repetitive stimulation due to adaptation (May and Tiitinen 2010), averages were also computed separately for each tone (440 or 554 Hz tones) within a trial to form AEP1 (i.e. the AEP evoked by the first tone of each trial), AEP2 (i.e. the AEP evoked by the second tone), and so on up to AEP20 (i.e. the AEP evoked by the last tone of each trial). The analysis of S2 was performed in the frequency domain and will be reported elsewhere.

The channels used for the ERP amplitude analysis were selected individually within a pre-defined region of interest (ROI). This individual approach was used because of the large variance across individuals in the specific channel showing the maximal P100 VEP peak amplitude (Fig. 4c). A visual ROI consisting of 17 channels was first set to cover most of the occipital electrodes (Fig. 4a). The P100 peak amplitude was then calculated as the maximum peak amplitude within the visual ROI in the time window of 90–130 ms. For comparison between the ROI approach and the standard fixed-electrode approach, the P100 peak amplitude was also calculated as the maximum peak amplitude at channel Oz within the time window of 90–130 ms. Similarly, for the auditory task an auditory ROI consisting of 13 channels was set covering the fronto-central electrodes (Fig. 4b). The N100 peak amplitude was calculated as the minimum peak amplitude within the auditory ROI in the time window of 90–180 ms, and the P200 peak amplitude as the maximum peak amplitude within the ROI in the time window of 180–260 ms. The auditory peak analysis was performed for the overall AEP average as well as for each individually averaged tone (AEP1 to AEP20). In general, the time windows for peak detection were selected on the basis of the grand averages, separately for the visual (P100) and the auditory (N100, P200) task.
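A minimal sketch of this ROI-based peak picking, assuming an ERP array of shape channels × samples and a time vector in milliseconds (names and shapes are assumptions; the time windows are those given above):

```python
import numpy as np


def roi_peak(erp, times_ms, roi_idx, window_ms, polarity):
    """Peak amplitude within an ROI and time window.

    erp:       channels x samples average ERP for one subject and condition
    times_ms:  sample times in milliseconds
    roi_idx:   indices of the ROI channels (17 occipital or 13 fronto-central)
    window_ms: (t_min, t_max), e.g. (90, 130) for the P100
    polarity:  +1 for positive peaks (P100, P200), -1 for negative peaks (N100)
    """
    tmask = (times_ms >= window_ms[0]) & (times_ms <= window_ms[1])
    roi = erp[np.ix_(np.asarray(roi_idx), np.where(tmask)[0])]
    return polarity * np.max(polarity * roi)    # most extreme value across ROI and window


# Example usage with the windows given in the text:
# p100 = roi_peak(vep, times_ms, visual_roi, (90, 130), +1)
# n100 = roi_peak(aep, times_ms, auditory_roi, (90, 180), -1)
# p200 = roi_peak(aep, times_ms, auditory_roi, (180, 260), +1)
# p100_oz = roi_peak(vep, times_ms, [oz_index], (90, 130), +1)   # fixed-electrode comparison
```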

Fig. 4

EEG ROI approach. a Visual ROI. b Auditory ROI. Black dots are EEG channels on the customized 96-channel EEG cap. Red dots indicate the selected ROI for the visual and auditory tasks. Note that the ROI selected for the visual task lies within occipital areas and the ROI selected for the auditory task is centered on fronto-central areas, consistent with previous literature. c VEP P100 topographical variation between participants: the topographical layouts of the VEP P100 for three selected participants are plotted here. Notice that subject 1 (Subj. 1) has a central occipital P100 component, subject 2 (Subj. 2) has a wider spread of P100 activation, and subject 3 (Subj. 3) has a bilateral layout for the P100 (Color figure online)

fNIRS-EEG Integration

Two types of correlation were performed separately for the visual and the auditory task. The first was an amplitude correlation, in which the ERP peaks (VEP: P100; AEP and AEP1: N100, P200) were correlated with the OxyHb/DeoxyHb concentrations over the visual areas (visual task) or the auditory areas (auditory task). For these analyses, we used the average across the different conditions (visual: 4 levels of luminance ratio; auditory: 2 intensity levels) to test for a relationship between ERPs and OxyHb/DeoxyHb concentrations irrespective of modulation level. To assess the relationship of condition-specific modulation between EEG and fNIRS measures, a second type of correlation was computed (modulation correlation). For this, in the visual task we performed a regression analysis across the conditions (4 levels of luminance ratio) and then correlated the regression coefficients from the ERP peaks (VEP: P100) with those from the OxyHb/DeoxyHb concentrations. For the auditory task, since there were only two conditions, the difference between the two sound intensities was calculated instead. A third type of correlation was performed specifically for the auditory task to further investigate neural adaptation (adaptation correlation). Specifically, the N100 peaks from AEP3 to AEP20 were averaged (AEP3–20) and subtracted from the AEP1 N100 peak to form an adaptation index (AI) for each participant. The AI values indicate the amplitude reduction from the first tone to the end of the sound block, with a higher AI value reflecting stronger adaptation. The AIs were then correlated with both OxyHb and DeoxyHb concentrations.
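The three correlation types can be sketched as follows; the per-participant arrays are placeholders (random numbers), and only the structure of the computation follows the text:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
n_subj = 24

# Placeholder per-participant values (not the study's data)
p100_amp = rng.normal(5.0, 1.5, n_subj)            # VEP P100 peak, averaged across conditions
deoxy_vis = rng.normal(-0.02, 0.01, n_subj)        # DeoxyHb concentration, visual area
p100_slope = rng.normal(-0.5, 0.7, n_subj)         # regression coefficient across luminance ratios
deoxy_slope = rng.normal(0.002, 0.002, n_subj)     # DeoxyHb regression coefficient
n100 = rng.normal(-2.0, 1.0, (n_subj, 20))         # N100 peaks of AEP1..AEP20
oxy_aud = rng.normal(0.03, 0.01, n_subj)           # OxyHb concentration, auditory area

# Amplitude correlation (Pearson): ERP peak vs. hemoglobin concentration
r_amp, p_amp = pearsonr(p100_amp, deoxy_vis)

# Modulation correlation (Pearson): condition regression coefficients from EEG vs. fNIRS
r_mod, p_mod = pearsonr(p100_slope, deoxy_slope)

# Adaptation index: AEP1 N100 minus the mean N100 of AEP3-AEP20
ai = n100[:, 0] - n100[:, 2:20].mean(axis=1)

# Adaptation correlation (Spearman, since the AI was not normally distributed)
rho, p_rho = spearmanr(ai, oxy_aud)
```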

Statistical Analysis

Functional Near-Infrared Spectroscopy (fNIRS)

Repeated-measures two-way ANOVAs with the factors stimulus (auditory, visual) and area (auditory, visual) were performed separately for OxyHb and DeoxyHb averaged across conditions. In a second step, we assessed the effect of visual stimulus properties on the modulation of OxyHb and DeoxyHb concentrations by computing a regression analysis across the 4 levels of luminance ratio. A one-sample t test was then applied to the regression coefficients, derived for each subject separately, from both the visual and the auditory areas to test for a significant modulation of the fNIRS response. Further, a paired t test was applied to the regression coefficients from the visual and the auditory area to test whether the stimulus-specific modulation of the cortical response differed between the two areas. Auditory modulation with sound intensity was investigated with a paired t test on OxyHb/DeoxyHb concentrations in the auditory area between the two sound intensities, for both S1 and S2. As in the visual task, a paired t test was applied between the visual and the auditory area to test for area differences. Since perceived loudness diverged from sound intensity levels only for S1, the analysis of auditory modulation with perceived loudness was performed only on S1. A paired t test between the two S1 intensities was performed separately for the higher and the lower perceived loudness difference groups.
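A sketch of these tests using SciPy on placeholder data (the repeated-measures ANOVA itself would need an additional package such as statsmodels' AnovaRM and is omitted here; all variable names are assumptions):

```python
import numpy as np
from scipy.stats import linregress, ttest_1samp, ttest_rel

rng = np.random.default_rng(1)
n_subj = 24
levels = np.arange(1, 5)                            # the 4 luminance-ratio levels

# Placeholder concentrations: subjects x 4 levels, visual and auditory areas
conc_vis = rng.normal(0.02, 0.01, (n_subj, 4)) + 0.004 * levels
conc_aud = rng.normal(0.0, 0.01, (n_subj, 4))

# Per-subject regression slope across the four luminance ratios
slope_vis = np.array([linregress(levels, c).slope for c in conc_vis])
slope_aud = np.array([linregress(levels, c).slope for c in conc_aud])

# One-sample t test: does the modulation (slope) differ from zero?
t_mod, p_mod = ttest_1samp(slope_vis, 0.0)

# Paired t test: is the modulation stronger in the visual than in the auditory area?
t_area, p_area = ttest_rel(slope_vis, slope_aud)

# Auditory modulation: paired t test between the two sound intensities (per sound type)
conc_40 = rng.normal(0.020, 0.01, n_subj)           # placeholder, 40 dB condition
conc_70 = rng.normal(0.022, 0.01, n_subj)           # placeholder, 70 dB condition
t_int, p_int = ttest_rel(conc_70, conc_40)
```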

Electroencephalography (EEG)

In a manner similar to the fNIRS analysis, visual modulation was investigated by performing a one-sample t test on the regression coefficients (across the luminance ratios) of the VEP P100 peaks. Auditory modulation was investigated by performing a paired t test on the AEP N100 and P200 peaks between the two sound intensities.

fNIRS-EEG Integration

Since the current study used a block design, individual checkerboard reversals and tones could not be dissociated in the fNIRS signal. Therefore, a single-trial correlation between ERPs and fNIRS, as is commonly performed in EEG/fMRI studies, was not feasible in the current study. We instead performed correlations across participants, an approach that has also been shown to demonstrate neurovascular coupling in previous studies (Mayhew et al. 2010). To minimize the possibility of finding a correlation related more to individual structural differences than to brain activity, we performed a modulation correlation and an adaptation correlation in addition to the amplitude correlation. Both amplitude correlations (first type of correlation) and modulation correlations (second type of correlation) were performed using Pearson's r. Since the AI was not normally distributed, Spearman's rho was used for the adaptation correlations (third type of correlation).

Results

Functional Near-Infrared Spectroscopy

Grand averages across participants of the OxyHb and DeoxyHb concentration changes are plotted in Fig. 5. Consistent with our hypothesis, these plots showed higher activation in the visual area than in the auditory area for the visual stimulation conditions. Conversely, for the auditory stimulation conditions, the concentration changes were higher in the auditory area than in the visual area for DeoxyHb; OxyHb, however, did not show this distinction clearly. Moreover, the visual area showed higher activation in response to visual than to auditory stimulation, and the auditory area showed higher activation in response to auditory than to visual stimulation. These observations applied to both the OxyHb and the DeoxyHb concentrations.

Fig. 5

Grand average plot of OxyHb and DeoxyHb concentrations. Conditions of different luminance ratios and sound intensities are plotted separately. The axis on the left side of each subplot is for OxyHb concentration and the axis on the right side of each subplot is for DeoxyHb concentration. All OxyHb concentrations are plotted in different levels of red and all DeoxyHb concentrations are plotted in different levels of blue. Darker colors correspond to higher luminance ratio and higher sound intensity. Notice there are two sound types in the current experiment. a Visual stimulus in the auditory area. b Visual stimulus in the visual area. c Auditory stimulus in the auditory area. d Auditory stimulus in the visual area (Color figure online)

These observations were confirmed by the inferential statistics. The repeated-measures two-way ANOVA on OxyHb revealed significant main effects of stimulus (F(1,23) = 12.10, p < 0.005, η² = 0.35) and area (F(1,23) = 47.73, p < 0.001, η² = 0.68), and a significant interaction between stimulus and area (F(1,23) = 91.35, p < 0.001, η² = 0.80). Follow-up paired t tests (Fig. 6) showed that the interaction was driven by stimulus selectivity: the auditory area was activated significantly more by the auditory than by the visual stimulus (t(23) = 2.08, p < 0.05), and vice versa for the visual area (t(23) = −6.80, p < 0.001). Area specificity was observed only with the visual stimulus, that is, activation was significantly greater over visual than over auditory areas for the visual stimulus (t(23) = −9.19, p < 0.001), while for the auditory stimulus the activation was numerically but not significantly different between the two regions (t(23) = 1.56, p = 0.13). For the stimulus-by-area ANOVA on DeoxyHb, the results revealed a significant main effect of stimulus (F(1,23) = 22.30, p < 0.001, η² = 0.49) and a significant interaction between stimulus and area (F(1,23) = 57.97, p < 0.001, η² = 0.72). Follow-up paired t tests (Fig. 6) confirmed our predictions of both area specificity and stimulus selectivity. Specifically, the auditory area responded significantly more to the auditory than to the visual stimuli (t(23) = −3.77, p < 0.005), and the visual area responded significantly more to the visual than to the auditory stimuli (t(23) = 7.85, p < 0.001). Moreover, the auditory stimuli activated the auditory area significantly more than the visual area (t(23) = −4.42, p < 0.001), and the visual stimuli activated the visual area significantly more than the auditory area (t(23) = 3.38, p < 0.005).

Fig. 6

Paired t test results for OxyHb (left graph) and DeoxyHb (right graph). The x axis shows the task (auditory/visual stimulus) and the y axis is the hemoglobin concentration. Left: Changes in the OxyHb concentration to auditory and visual stimulation. Auditory stimulation evoked similar changes in the auditory and visual area. The interaction between brain areas and sensory stimulation was therefore driven by the auditory area showing significantly enhanced changes to auditory compared with visual stimuli, and by the visual area showing significantly stronger changes to visual than to auditory stimuli. Right: Changes in DeoxyHb to auditory and visual stimulation. All four possible pairs showed a significant difference

Results from the regression analysis of visual modulation are provided in Table 2. For both OxyHb and DeoxyHb, the regression coefficients from the visual area (OxyHb: t(23) = −3.51, p < 0.005; DeoxyHb: t(23) = 5.89, p < 0.005) but not from the auditory area (OxyHb: t(23) = −1.19, p > 0.2; DeoxyHb: t(23) = 1.99, p > 0.05) differed significantly from zero, suggesting a stimulus-related modulation of OxyHb and DeoxyHb specific to the visual area (Fig. 7a). This was consistent with the finding that the slopes for OxyHb were significantly steeper in the visual than in the auditory area (t(23) = 2.54, p < 0.05). For the stimulus-related modulation of responses in the auditory task, paired t tests compared responses in the auditory area between the two sound intensities for each sound (S1, S2) separately. Results were not significant for either OxyHb (S1: t(23) = 0.83, p > 0.4; S2: t(23) = 0.55, p > 0.5) or DeoxyHb (S1: t(23) = −0.74, p > 0.4; S2: t(23) = −0.47, p > 0.6). In addition, to test whether auditory-induced modulation differed between the visual and the auditory areas, paired t tests compared the sound intensity difference in the visual area with that in the auditory area for each sound (S1, S2) separately. Results were not significant for either OxyHb (S1: t(23) = 0.48, p > 0.6; S2: t(23) = 0.27, p > 0.7) or DeoxyHb (S1: t(23) = −0.17, p > 0.8; S2: t(23) = −0.32, p > 0.7). For the auditory modulation with perceived loudness, the paired t test in the higher loudness difference group was significant for DeoxyHb after Bonferroni correction for multiple comparisons (t(10) = −3.40, p = 0.007, Fig. 7b), but not for OxyHb (t(10) = 0.37, p > 0.7). Paired t tests in the lower difference group were not significant for either OxyHb (t(11) = 0.58, p > 0.5) or DeoxyHb (t(11) = 0.32, p > 0.7). This suggests that DeoxyHb was modulated by perceived loudness, as indicated by the subjective scale, rather than by sound intensity per se.

Table 2 Regression across luminance ratio
Fig. 7

Modulation analysis for both visual and auditory tasks. a Regression results for the OxyHb and the DeoxyHb to visual stimulation. The y-axis shows the average regression coefficients ± one standard error of the mean across the participants. The x-axis represents the different areas (auditory/visual). VstimAarea means visual stimulus in auditory area. VstimVarea means visual stimulus in visual area. Note that the regression analysis was performed from high luminance ratio to low luminance ratio; therefore regression coefficients for OxyHb are negative and for DeoxyHb are positive. b Perceived loudness modulation for S1. The y-axis shows the DeoxyHb concentration ± one standard error of the mean averaged across both 40 and 70 dB S1. The x-axis represents the larger and smaller loudness difference groups

Electroencephalography

Figure 8 shows VEPs at channel Oz and AEPs at channel Cz. Consistent with the findings of Sandmann et al. (2012), the VEPs showed four distinct peaks at around 83 ms, 109 ms (referred to as P100), 156 ms, and 247 ms. In addition, the VEP amplitudes appeared to be modulated by the luminance ratio at the P100 peak. The AEPs showed three distinct peaks at around 88 ms, 121 ms (referred to as N100), and 203 ms (P200). The amplitudes of the N100 and the P200 appeared to be modulated by sound intensity. Here we report data only for the VEP P100, AEP N100, and AEP P200 peaks, which have been identified in previous work as relevant to the present context (Mayhew et al. 2010; Sandmann et al. 2012).

Fig. 8

Grand average ERP results. Plots in the upper row are topographical activation maps averaged across conditions (four luminance ratios for VEP and two intensity levels for AEP). Plots in the lower row show ERPs. The left column shows the visual event-related potentials (VEP) at channel Oz and the right column shows the auditory event-related potentials (AEP) at channel Cz. Note that only AEPs to Sound 1 are plotted here

The statistics from the ROI peak analysis confirmed the VEP and AEP modulations. For the VEP modulation, the mean of the regression coefficients for the P100 was −0.57, with a standard deviation of 0.76. One-sample t tests showed that the P100 regression coefficients significantly differed from zero (t(23) = −3.66, p < 0.005). For the AEP modulation, paired t tests showed a significant difference between low and high sound intensities at the AEP P200 peak (high intensity: 2.69 ± 1.24, low intensity: 1.78 ± 0.91, t(23) = 6.99, p < 0.001), but not at the AEP N100 peak (high intensity: −1.75 ± 1.23, low intensity: −1.64 ± 1.15, t(23) = −0.64, p > 0.5).

fNIRS-EEG Integration

To study the relationship between hemoglobin concentrations and neural activity, we performed two types of correlation analyses across participants, separately for visual and auditory stimulation. Specifically, amplitude correlations (first type of correlation) were performed to investigate whether the amount of activation measured by ERP peak amplitudes and the hemoglobin concentration changes correlated across participants. We also performed modulation correlations (second type of correlation), which allowed us to investigate whether the two recording modalities responded similarly to physical differences in the stimuli. A third type of correlation (adaptation correlation) was performed specifically for auditory stimulation to test for neural adaptation.

For the visual amplitude correlation (first type of correlation), the P100 VEP peak amplitude from the ROI approach showed a significant correlation with DeoxyHb concentration across participants (R = −0.47, p = 0.021, Fig. 9a). In contrast, the correlation between the P100 VEP peak amplitude at Oz and DeoxyHb concentration was not significant (R = −0.30, p = 0.160, Fig. 9c). For the visual modulation correlation (second type of correlation), the correlation between the condition regression of the VEP ROI P100 and DeoxyHb concentration was significant (R = −0.50, p = 0.012, Fig. 9b), as was the correlation between the condition regression of the VEP Oz P100 and DeoxyHb concentration (R = −0.49, p = 0.016, Fig. 9d). However, none of the correlations between VEP P100 peaks and OxyHb concentration were significant (p > 0.1).

Fig. 9

Integration of EEG and fNIRS. a Visual ROI amplitude correlation. b Visual ROI modulation correlation. c Visual Oz amplitude correlation. d Visual Oz modulation correlation. e Auditory amplitude correlation. f Auditory adaptation correlation. For the amplitude correlations, the y-axis is DeoxyHb concentration and the x-axis is the VEP P100 peak value. For the modulation correlations, the y-axis is the regression coefficient from DeoxyHb concentration and the x-axis is the regression coefficient from the VEP P100 peak values. For the adaptation correlation, the y-axis is OxyHb concentration and the x-axis is the adaptation index (defined as |AEP3–20 − AEP1|). Each dot represents one participant and the black line is the linear best-fit line

For the auditory amplitude correlation, only the AEP1 N100 peak amplitude and OxyHb concentration correlated significantly (R = 0.41, p = 0.049, Fig. 9e). Note that this is effectively an inverse relationship: because the N100 is a negative-going component, the positive R indicates that the larger the AEP1 N100 amplitude, the lower the overall OxyHb concentration change. To explore whether this inverse relationship was a result of neural adaptation, we performed an adaptation correlation between the adaptation index and the OxyHb/DeoxyHb concentrations (third type of correlation). Consistent with the amplitude correlation, the adaptation correlation was significant for OxyHb (R = 0.42, p = 0.040, Fig. 9f). None of the auditory modulation correlations were significant (p > 0.1). See Table 3 for a summary of the results.

Table 3 Overview of the results

Discussion

The current EEG-fNIRS study investigated both visual and auditory processing in adults. As expected, we found significant interactions between brain area (visual, auditory) and sensory stimulus (visual, auditory) in the fNIRS signals. More specifically, we showed cross-sensory area specificity with DeoxyHb, in which the cortical responses to auditory stimuli were specifically enhanced over auditory regions and the responses to visual stimuli were larger over visual regions. Our results also showed stimulus selectivity, that is, the auditory area showed stronger signal changes to auditory stimuli and the visual area showed stronger signal changes to visual stimuli. In addition, both hemoglobin measures were modulated by the visual luminance ratio, and this stimulus-related modulation was significantly stronger in the visual than in the auditory area (OxyHb). The modulation by sound intensity was not significant; however, a complementary modulation analysis using perceived loudness was. This pattern of results provides further evidence that the auditory cortex response is more closely related to perceived loudness than to physical sound intensity (Langers et al. 2007; Rohl and Uppenkamp 2012). Correlation analyses revealed a relationship between DeoxyHb concentration and VEPs, and between OxyHb concentration and AEPs. These results suggest that fNIRS is sensitive to modality-specific cortical activation and provide further evidence for a close relationship between stimulus-evoked changes in neural activity and hemoglobin concentrations.

There has been an ongoing debate regarding whether OxyHb or DeoxyHb measured by fNIRS is the more reliable measure of cortical activation. Several studies have suggested that OxyHb is about 10 times more affected by physiological noise, including heartbeat and low-frequency oscillations (Nolte et al. 1998; Wobst et al. 2001). A potential explanation is that the pulsatile character of blood flow is more present in the arterial system, where OxyHb concentration is much higher than DeoxyHb concentration. In line with this observation, another study found synchronization between motor-task stimulation and systemic physiological oscillations, including heart rate, respiration, and arterial pulse (Franceschini et al. 2000). As a result of this systemic physiological contribution during stimulation, OxyHb shows less localized activation than DeoxyHb, as was also observed in a visual experiment (Plichta et al. 2007). Therefore, DeoxyHb is thought to have better local specificity and to be more closely related to brain activation than OxyHb (Obrig and Villringer 2003; Plichta et al. 2007; Schroeter et al. 2004). Consistent with this, our statistical analysis showed promising results regarding area specificity for auditory stimulation only with DeoxyHb. Furthermore, a double-peak pattern was more prominent in OxyHb than in DeoxyHb (Fig. 5), which could reflect residual physiological artifacts synchronized with the visual and the auditory stimulation. On the other hand, several fMRI studies have reported a double-peak pattern in response to the onset and offset of repeated auditory stimuli (Harms and Melcher 2002; Sigalovsky and Melcher 2006). Even though this does not fully explain the absence of the same pattern in DeoxyHb, it is possible that actual neural dynamics contribute to this double-peak pattern. Our results also indicate a correlation between EEG and OxyHb, which suggests that OxyHb captured a valid signal. Other studies have also argued that OxyHb shows a larger response and higher retest reliability (Plichta et al. 2006; Ye et al. 2009). Since the current experiment did not record non-brain physiological responses, a conclusive answer regarding the cause of this double-peak pattern is not possible. Future studies are required to dissociate whether this pattern is related to physiological or to neural responses.

A GLM channel selection technique was implemented in the present study to extract the best recording channel within each area for the fNIRS analysis. By doing so, the averaging of noise and signal across participants was avoided. Furthermore, we also applied an independent channel selection for OxyHb and DeoxyHb. As discussed earlier, previous studies have shown that the signal-to-noise ratio (SNR) differs between OxyHb and DeoxyHb (Nolte et al. 1998; Wobst et al. 2001). This difference in SNR could very likely cause the channel with the highest activation to differ between OxyHb and DeoxyHb. Consistent with this, our results also demonstrated different patterns of selected channels within each ROI for OxyHb and DeoxyHb (Fig. 3). Since spatial resolution was not the main concern, these approaches proved efficient at extracting the signals. In addition to the individual channel selection in the fNIRS data, individualized channel selection was also implemented for the EEG data. Although this is not a standardized procedure in EEG analysis, it is known that individual differences in brain structure and function can lead to different ERP topographies even when the activation originates from the same cortical source (Luck 2005). In addition, some papers have proposed analytical tools to account for such individual differences (Pernet et al. 2011). In line with these observations, our EEG analysis showed a better correlation between the VEP P100 and the DeoxyHb concentrations with an individualized channel selection approach. Regarding the modulation correlation, in which responses to physical stimulus differences were investigated, there was little difference between the two channel selection approaches (Fig. 9b, d). However, for the amplitude correlation, the results reached statistical significance only with the individualized channel selection (Fig. 9a, c). This pattern of results could be expected: while peak amplitude values are likely to be sensitive to precise electrode placement, it is reasonable to assume that a sub-optimally positioned electrode might still be able to measure the modulation of a response, even when the response itself is not maximal at that location. The across-participant correlation between the ROI maximum activation from DeoxyHb and the ERP P100 peaks further supports the notion that individual differences should be taken into consideration when interpreting EEG data.

Few simultaneous EEG and fNIRS studies have investigated the direct relationship between the two recording modalities; two of them used visual stimulation. Näsi et al. (2010) reported a correlation between the area under the VEP and both OxyHb and DeoxyHb, with a slightly better correlation for DeoxyHb. In addition, Koch et al. (2008) reported a correlation between individual alpha frequency at rest and DeoxyHb only. Consistent with their findings, we found a significant correlation between the VEP P100 and DeoxyHb but not OxyHb. On the other hand, with auditory stimulation, Ehlis et al. (2009) reported a correlation between OxyHb and gating quality, defined as the ratio between the P50 amplitude to the repeated second sound and the P50 amplitude to the first sound. They found this correlation in both the prefrontal and the temporal area. Consistent with this result, we also found a correlation between OxyHb and sensory-related peaks in the AEPs. Taken together, our findings suggest a relationship between changes in each hemoglobin form (OxyHb/DeoxyHb) and neural activity, with OxyHb more associated with the auditory modality and DeoxyHb with the visual modality. However, one should note that previous simultaneous fMRI and EEG studies have shown a poor correlation between the BOLD signal and the VEP P100 for visual stimuli shorter than 5 ms (Yesilyurt et al. 2010). Additionally, one study has also shown a correlation between the AEP P200 and the BOLD signal (Mayhew et al. 2010), a pattern that was not found in the present work. Although results from fNIRS and fMRI are not directly comparable, these findings imply that associations between brain electrical activity and hemodynamic responses may vary across experimental designs. More work is therefore needed to dissociate the signals acquired by OxyHb and DeoxyHb and their relationship to neurovascular coupling.

Cross-recording-modality associations in the auditory system were observed only between OxyHb concentration and the AEP averaged from the first tone of each stimulus block (AEP1), but not with the averaged AEP. This probably reflects the strong adaptation of AEP N100 amplitudes for sounds presented at 2 Hz; the tones following the first in a block contributed more noise than signal to the data. To follow up on this hypothesis, an adaptation correlation was performed between the AI and the hemoglobin concentrations. The results confirmed that participants with stronger adaptation showed less overall OxyHb concentration change (Fig. 9f). However, since the current experiment was not designed specifically to investigate AEP N100 adaptation in detail, we are reluctant to draw firm conclusions. Future studies should include a comparison condition in which only the first sound is presented, to further ascertain whether this inverse correlation is indeed an indication of neural adaptation. Another alternative would be an event-related experimental design, in which strong adaptation in the auditory task is avoided and a trial-by-trial investigation could be performed. Furthermore, evidence from simultaneous EEG-fMRI recording has indicated a better correlation with time–frequency decompositions of the EEG data than with ERPs (Mayhew et al. 2010). Therefore, further analysis in the frequency domain could be implemented to follow up on this observation.

Previous fNIRS studies have used visual contrast to modulate fNIRS signals (Plichta et al. 2007; Rovati et al. 2007), whereas EEG studies have shown modulation by luminance ratio (Sandmann et al. 2012). Consistent with the EEG findings, we showed that both OxyHb and DeoxyHb were modulated by the luminance ratio. Furthermore, we also found a modulation correlation between EEG and DeoxyHb concentrations, indicating that DeoxyHb concentration responded to luminance ratio in a manner similar to the EEG signal in the visual system. In contrast, the modulation of both OxyHb and DeoxyHb by sound intensity was not significant. This is surprising, since previous fMRI studies have shown an almost linear increase of the BOLD response for every 10 dB increase in sound intensity, and a 30 dB difference should be large enough to induce both an activation difference and a perceived loudness difference (Rohl and Uppenkamp 2012; Uppenkamp and Rohl 2014). However, our subjective loudness scale showed that the loudness manipulation was not as successful as we had expected. For S1, half of the participants indicated only a small difference between the two sound intensities. Participants who reported a larger loudness difference showed a significant difference in activation between the two sound intensities. This suggests that our fNIRS measurement was related more closely to perceived loudness than to sound intensity. Since previous fMRI studies indicated that primary auditory cortex often responds more faithfully to sound intensity than secondary auditory cortex (Langers et al. 2007; Rohl and Uppenkamp 2012; Uppenkamp and Rohl 2014), this might indicate that our fNIRS recordings originated mainly from secondary auditory cortex, which would be consistent with the depth limitation of fNIRS. However, this does not fully explain the lack of modulation for S2, since for S2 the perceived loudness was consistent with the intensity difference. Therefore, further studies are needed that systematically compare different types of auditory stimuli to determine whether the fNIRS signal arises mostly from primary or secondary auditory cortex.

The current study demonstrates that fNIRS is sensitive to differences between sensory modalities and provides a solid basis for future clinical applications. For example, fNIRS is a promising technique for investigating cross-modal reorganization in the auditory cortex of deaf individuals (Hauthal et al. 2013, 2014) and of cochlear implant (CI) users (Sandmann et al. 2012). With fNIRS, auditory cortex plasticity could be monitored over the long term after cochlear implantation, a notoriously difficult endeavor with other imaging modalities (Debener et al. 2008; Giraud et al. 2001; Pantev et al. 2006). A better understanding of these cortical changes might provide insights into possible training to increase CI performance. Furthermore, for certain groups of patients, for example children with CIs, fNIRS might be the only option for measuring hemodynamic responses: positron emission tomography is invasive and could potentially harm children if applied repeatedly; fMRI raises safety issues with the metal implant (Majdani et al. 2008), and most CI devices currently in use are not MRI-compatible; and MEG, with very few exceptions, is not possible with CI users (Pantev et al. 2006). Therefore, subsequent studies are encouraged to exploit the benefits of fNIRS for auditory and clinical applications, to strengthen our understanding of the hemodynamic response in the auditory system. In addition, the lack of interference between fNIRS and EEG recordings provides a relatively simple and straightforward approach for the investigation of neurovascular coupling. By showing a correlation between DeoxyHb concentration and ERPs, the current study indeed demonstrates that simultaneous fNIRS-EEG recording is a feasible approach for investigating neurovascular coupling.

Conclusions

The present study demonstrates the sensitivity of fNIRS in separating responses from different sensory modalities. Specifically, the results indicated area specificity, whereby visual stimulation activated primarily visual areas and auditory stimulation evoked signal changes primarily over auditory areas. Furthermore, evidence for stimulus selectivity was found, whereby auditory and visual areas responded mainly to auditory and visual stimulation, respectively. Sensitivity to physical stimulus differences was confirmed in particular for visual stimuli, by showing a stimulus-dependent modulation of the fNIRS signals recorded from visual cortex. The auditory response was modulated more by perceived loudness than by physical sound intensity. Across subjects, EEG and fNIRS signals were significantly correlated, which suggests that concurrent fNIRS-EEG is a promising technique for the non-invasive study of neurovascular coupling. The study validates the feasibility of using fNIRS to image cortical activation in both the visual and the auditory modality.