Introduction

Movement is a dynamic process that involves planning, execution, and constant monitoring of feedforward motor commands based on information largely provided by sensory (e.g., visual, auditory or somatosensory) feedback mechanisms (Wolpert et al. 2011; Wolpert and Ghahramani 2000). This process is hypothesized to be regulated by an internal forward model whereby predictions of the sensory consequences of anticipated movements (e.g., efference copies) are compared to the actual sensory feedback (Wolpert et al. 2007). Errors resulting from mismatch between the internally predicted feedforward representation and its incoming sensory feedback are used to monitor and correct motor behavior during goal-oriented movement (Wolpert and Flanagan 2001; Wolpert et al. 2007). In addition, prediction error serves as a critical element of adaptive motor behavior in which a new mapping between feedforward and feedback mechanisms is learned in response to persistent changes in the sensory environment (Wolpert et al. 2011). Evidence from previous studies has supported the notion that motor behavior derived from reorganization of the sensorimotor maps can be shaped temporarily with corrective intent (adaptive behavior), and new behavior can be developed with lasting effects (learned behavior) following alterations in sensory feedback (Nasir et al. 2013; Ostry and Gribble 2016).

In humans, speech production is mediated by a series of motor commands activating respiration, phonation, and articulation mechanisms for communicative intent. Vocal production is an important subcomponent of speech that enables speakers to control the intensity (perceived as loudness) and fundamental frequency (F0, perceived as pitch) of their voice through controlling the contraction of respiratory and laryngeal (e.g., vocal folds) muscles. The control of supra-laryngeal structures of the vocal tract (e.g., pharynx, velum, tongue, jaw, lips etc.) enables speakers to dynamically adjust articulatory movement, and subsequently control formant frequencies to produce various speech sounds. Despite the inherent differences between the mechanisms underlying vocalization and speech, recent models have emphasized the role of auditory and somatosensory (i.e., kinesthetic or proprioceptive) feedback in vocal and articulatory motor control (Guenther et al. 2006; Hickok et al. 2011; Hickok and Poeppel 2007; Houde and Chang 2015; Houde and Nagarajan 2011). The principles of these models were developed based on empirical data from a large body of studies showing that the human vocal and articulatory motor systems respond to online perturbations in auditory and somatosensory feedback during vocalization or speech (Cai et al. 2011; Lametti et al. 2012; Larson 1998).

A widely studied aspect of these mechanisms is the concurrent changes in vocal motor behavior in response to unexpected alterations in the fundamental frequency (F0), perceptually referred to as pitch, of voice auditory feedback. Results of these studies have shown that human subjects control their vocalization F0 by generating compensatory responses that change their vocal F0 in the opposite direction to pitch-shift stimuli in the auditory feedback (Behroozmand et al. 2012; Burnett et al. 1998; Chen et al. 2007a; Larson 1998; Liu et al. 2011).

In addition, it has been shown that exposure to persistent alterations in sensory feedback results in sensorimotor adaptation in the limb, vocal, and articulatory motor systems. Neural and behavioral changes associated with sensorimotor adaptation have been well-documented in limb motor systems using force-field learning and visuomotor adaptation paradigms (Cressman and Henriques 2009; Krakauer and Mazzoni 2011; Nasir et al. 2013; Nasir and Ostry 2008; Ostry et al. 2010; Ostry and Gribble 2016; Vahdat et al. 2011, 2014). In the vocal and articulatory motor systems, studies have investigated sensorimotor adaptation in response to changes in F0 and formant frequencies (e.g., F1 and F2) of auditory feedback during vowel production (Bourguignon et al. 2014; Feng et al. 2011; Hawco 2010; Houde and Jordan 2002; Keough et al. 2013; Max and Maffett 2015; Shiller et al. 2010).

Furthermore, a number of studies on vocal (Jones and Keough 2008; Jones et al. 2005; Keough et al. 2013), articulation (Sengupta and Nasir 2015), and limb motor control mechanisms (Shadmehr and Mussa-Ivaldi 1994) have demonstrated an after effect associated with sensorimotor adaptation, which has been characterized by continuation of adaptive responses following removal of sensory feedback alterations. For example, a study by Jones et al. (2005) examined vocal responses to pitch-shift stimuli while Mandarin speakers produced words with specific pitch patterns (tone categories) and showed that the subjects vocally adapted by opposing pitch changes in auditory feedback with observed after effects of varying durations. This finding indicated that vocal adaptation leads to a global remapping of the auditory-motor relationship that is partially target dependent (Jones et al. 2005). In a later study (Keough et al. 2013), vocal motor responses were investigated when two English-speaking groups of singers and non-singers were asked to either ignore or compensate for pitch shifts in their auditory feedback during steady vowel vocalizations. This later study found that both trained singers and non-singers were unable to voluntarily suppress opposing responses to altered auditory feedback, and sensorimotor adaptation still occurred in both groups regardless of task instruction (Keough et al. 2013).

The neural bases of sensorimotor adaptation in the speech motor system have been investigated in more recent neurophysiological studies that recorded electroencephalography (EEG) signals during altered auditory feedback (AAF) paradigms. For example, Sengupta and Nasir (2015) implemented an adaptation paradigm using formant perturbation and demonstrated that phase coupling between neural oscillations at theta-gamma frequency bands was modulated during speech motor planning. Specifically, phase coherency was found to be increased over the centro-parietal areas and was decreased over the fronto-temporal regions of the scalp, suggesting a remapping of neuronal representations in a distributed sensorimotor network involving auditory and speech motor areas.

The aforementioned studies help interpret the theoretical framework underlying sensorimotor adaptation in the vocal and articulatory motor systems; however, supporting neurophysiological evidence is needed to elucidate this framework in more detail. The purpose of the present study was to examine concurrent changes in vocal motor behavior and brain activities using EEG recordings during sensorimotor adaptation to pitch shifts in voice auditory feedback. We employed an experimental task in which subjects repeatedly produced steady vocalizations of a vowel sound in four conditions: (1) baseline, when auditory feedback was unaltered, (2) short-term adaptation to pitch-shifted auditory feedback, (3) long-term adaptation to pitch-shifted auditory feedback, and (4) washout, when alteration was removed from the auditory feedback. EEG activities were continuously recorded throughout the experimental session while subjects performed the vocal production tasks, and event-related potential (ERP) responses time-locked to vocalization onset were extracted offline for each subject, separately for the baseline, short-term adaptation, long-term adaptation, and washout conditions. The short-term and long-term adaptation phases were implemented and analyzed to examine in more detail the time course of the behavioral and neural correlates of adaptive vocal motor responses to altered auditory feedback.

Based on previous findings, we hypothesized that subjects would behaviorally adapt to AAF by changing their voice F0 in the opposite direction to pitch-shift stimuli, and that such adaptive changes in the vocal motor system would persist after the auditory feedback alteration was removed during the washout condition (after effect). We also hypothesized that adaptation to AAF would be indexed by modulation of ERPs in the vocal motor areas of the frontal cortex as well as the sensorimotor integration areas of the parietal cortex. This latter hypothesis was motivated by recent models of audio-vocal integration (Guenther et al. 2006; Hickok et al. 2011; Hickok and Poeppel 2007; Houde and Chang 2015; Houde and Nagarajan 2011), which suggest a role for frontal and parietal cortical mechanisms in sensorimotor processing of auditory feedback during vocal production and motor control. We predicted that modulation of neural activities in these areas would highlight their specific role in adaptive vocal motor behavior, namely in incorporating sensory feedback originating from self-produced vocalizations and in reprogramming feedforward motor commands.

Materials and methods

Subjects

This study included a sample of 12 subjects (2 males, age range 22–27 years, mean age 24.8 years) from the University of South Carolina. All subjects were native speakers of English and had normal speech and hearing. Subjects reported no history of neurological disorders and no voice or musical training. All study procedures, including recruitment, data acquisition, and informed consent, were approved by the University of South Carolina institutional review board.

Experimental design

The experimental design of this study is depicted in Fig. 1. All experimental procedures were conducted in a sound attenuated booth where subjects’ voice and EEG signals were recorded. The experiment involved a vocal motor task during which subjects produced a steady vocalization of the vowel sound /a/ at their conversational pitch and loudness while they received the auditory feedback of their own voice through insert earphones (Fig. 1a). Each subject repeated the vocalization task for a total number of 420 trials during the experiment. The duration of each vocalization was approximately 2 s, and subjects took 2–3 s break between successive vocalizations. The total duration of the experiment, including the EEG preparation, was approximately 1.5 h. As illustrated in Fig. 1b, the experiment consisted of 4 conditions: (1) baseline, subjects received normal auditory feedback of their own vowel sound vocalizations for the first 100 trials; (2) short-term adaptation, subjects’ auditory feedback was gradually pitch shifted throughout vocalization in the downward direction at a rate of − 10 cents per trial for ten consecutive trials reaching a final level of − 100 cents and remained at this level for 100 trials during this condition; (3) long-term adaptation, subjects’ auditory feedback remained pitch shifted at − 100 cents throughout vocalization for an additional 100 trials during this condition and was returned to baseline (no pitch shift) following a gradual pitch shift in the upward direction at a rate of + 10 cents per trial for ten consecutive trials; (4) washout, subjects’ auditory feedback remained at baseline level (no pitch shift) for 100 trials. The adaptation condition was divided into short-term and long-term conditions to examine the time course of adaptive vocal motor responses in more detail. 
Of note, the 10 trials of gradual decrease in pitch shift at the beginning of the short-term adaptation condition and the 10 trials of gradual increase in pitch shift at the end of the long-term adaptation condition were excluded from our analyses of vocal and ERP responses. The gradual pitch shifts during transition phases from baseline to adaptation and adaptation to washout were implemented to make feedback changes less noticeable for the subjects and to help ensure that vocal responses were primarily driven by automatic adaptation mechanisms rather than by subjects’ conscious and voluntary vocal changes in response to alterations in the auditory feedback.
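The trial structure described above can be written out as a per-trial schedule. The following is a minimal sketch, not the Max/Msp implementation used in the experiment; the function name and condition labels are hypothetical, and it assumes that the tenth ramp trial is the one that reaches − 100 cents (downward ramp) or 0 cents (upward ramp).

```python
def pitch_shift_schedule(n_block=100, n_ramp=10, step_cents=-10):
    """Return a list of (condition label, feedback shift in cents), one per trial.

    Assumption: each ramp moves by `step_cents` per trial and its last trial
    lands exactly on the target level. Labels are illustrative only.
    """
    target = n_ramp * step_cents  # -100 cents with the default values
    schedule = []
    schedule += [("baseline", 0)] * n_block
    # Downward ramp: -10, -20, ..., -100 cents (excluded from the analyses).
    schedule += [("ramp_down", step_cents * (i + 1)) for i in range(n_ramp)]
    schedule += [("short_term", target)] * n_block
    schedule += [("long_term", target)] * n_block
    # Upward ramp: -90, -80, ..., 0 cents (excluded from the analyses).
    schedule += [("ramp_up", target - step_cents * (i + 1)) for i in range(n_ramp)]
    schedule += [("washout", 0)] * n_block
    return schedule


schedule = pitch_shift_schedule()
# Trials that enter the vocal and ERP analyses (ramp trials dropped).
analysed = [trial for trial in schedule if not trial[0].startswith("ramp")]
print(len(schedule), len(analysed))
```

With the default values the schedule contains 420 trials, matching the total reported above, of which 400 remain once the two 10-trial ramps are excluded.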

Fig. 1

Experimental design for the vocal motor adaptation paradigm: a subjects repeatedly maintained a steady vocalization of the vowel sound /a/ for approximately 2 s at their conversational voice pitch and loudness. b The experiment involved four conditions: (1) baseline, during which subjects received normal auditory feedback of their own vowel sound vocalizations for the first 100 trials, (2) short-term adaptation, during which subjects’ auditory feedback was gradually pitch shifted throughout vocalization in the downward direction in steps of − 10 cents/trial for 10 trials until it reached − 100 cents and then remained at that level for 100 trials, (3) long-term adaptation, during which subjects’ auditory feedback remained pitch shifted at − 100 cents throughout vocalization for another 100 trials and then gradually returned to baseline in steps of + 10 cents/trial over 10 trials, and (4) washout, during which the subjects’ auditory feedback remained at baseline level (no pitch shift) for 100 trials

Voice and EEG data acquisition

Subjects’ voice signal was picked up using a head-mounted AKG condenser microphone (model C520), amplified by a Motu Ultralite-MK3, and recorded at 44.1 kHz on a laboratory computer. Pitch-shift stimuli were delivered online to voice auditory feedback using an Eventide Eclipse Harmonizer module controlled by a Max/Msp program (Cycling 74, v.5.0). Etymotic insert earphones (model ER1-14A) were used to deliver the auditory feedback signal to subjects’ ears. The Max/Msp program also controlled all aspects of the pitch-shift stimuli (e.g., direction, magnitude, onset time, etc.), and generated TTL pulses to accurately mark the onset of each trial during the vocal motor tasks in all four conditions. A 10 dB gain was applied to the voice feedback signal delivered through the insert earphones, which was the auditory input that carried the pitch-shifted version of subjects’ voice during adaptation trials, and the unaltered voice feedback during baseline and washout trials. This feedback gain was maintained to minimize effects associated with airborne or bone-conducted voice feedback throughout vocalizations. Because any residual effect of airborne or bone-conducted feedback remained relatively consistent across trials, modulation of subjects’ responses could be attributed only to changes in the feedback signal delivered through the insert earphones.

The EEG signals were recorded from 64 sites on the subjects’ scalp using actiCAP Ag–AgCl active electrodes (Brain Products GmbH, Germany) according to the standard 10–20 montage (Fig. 2). Scalp-recorded EEG activities were low-pass filtered at 200 Hz (anti-aliasing filter), digitized at 1 kHz, and were recorded with a common reference using the BrainVision actiCHamp amplifier (Brain Products GmbH, Germany) on a computer utilizing Pycorder software. Electrode impedances were kept below 5 kΩ for all channels.

Fig. 2

The EEG electrodes montage for 64 channels according to the standard 10–20 system. Color-coded electrodes show regions of interest (ROIs) used for statistical analysis

Analysis of vocal responses

The pitch frequency of the recorded voice signals was extracted in Praat (Boersma and Weenink 2001) using an autocorrelation method and then exported to custom MATLAB code for further processing. For each individual subject, pitch frequencies were extracted for vocalization trials during baseline, adaptation (short-term and long-term), and washout conditions, separately. For every trial, vocal responses were calculated for each subject by converting pitch frequencies from Hertz to the cents scale according to the following formula:

$$\text{Vocal Response [cents]} = 1200 \times \log_{2}\left(\frac{F}{F_{\text{Baseline}}}\right).$$

In this formula, F is the single-trial mean of pitch frequency [in Hertz] averaged in a time window from 0 to 200 ms relative to vocalization onset in each condition, and F_Baseline is the mean of all vocalization pitch frequencies in the same time window (0–200 ms) averaged across the first 100 (baseline) trials for each individual subject. Findings from previous studies have suggested that compensatory responses to pitch-shift stimuli are elicited at approximately 200 ms after the onset of vocalization (Behroozmand et al. 2012; Burnett et al. 1998; Chen et al. 2007b; Hawco and Jones 2009). Therefore, we used an earlier time window at 0–200 ms to capture the adaptive, rather than compensatory, nature of vocal responses, and to avoid conflating measures of adaptation and compensation in response to pitch shifts. The extracted vocal responses [in cents] for each subject were then averaged across all subjects to obtain the grand-average profile of responses during baseline, short-term adaptation, long-term adaptation, and washout conditions.
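As an illustration of the conversion above, the following minimal Python sketch computes vocal responses in cents from per-trial mean F0 values. It is not the MATLAB code used in the study; the function names and the numbers in the usage lines are hypothetical, and the per-trial inputs are assumed to be the 0–200 ms window means already extracted from the pitch track.

```python
import math


def hz_to_cents(f_hz, f_baseline_hz):
    """Convert a frequency to cents relative to a baseline frequency."""
    return 1200.0 * math.log2(f_hz / f_baseline_hz)


def vocal_responses(trial_f0_hz, baseline_f0_hz):
    """Per-trial vocal responses in cents for one subject.

    trial_f0_hz    : per-trial mean F0 (Hz) in the 0-200 ms window
    baseline_f0_hz : per-trial mean F0 (Hz) of the baseline trials, same window
    """
    f_baseline = sum(baseline_f0_hz) / len(baseline_f0_hz)
    return [hz_to_cents(f, f_baseline) for f in trial_f0_hz]


# Hypothetical values: a speaker with a 200 Hz baseline who raises F0 slightly.
responses = vocal_responses([200.0, 205.0, 210.0], [199.0, 200.0, 201.0])
print([round(r, 1) for r in responses])
```

On this scale, 100 cents corresponds to one semitone and 1200 cents to one octave, so a response of 0 cents means the trial F0 equals the subject's baseline mean.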

Analysis of ERP responses

The EEGLAB toolbox (Delorme and Makeig 2004) was used to analyze recorded EEG signals to calculate ERPs time-locked to the onset of vocalization during baseline, short-term adaptation, long-term adaptation, and washout conditions. The recorded EEGs were first filtered offline using a band-pass filter with cut-off frequencies from 1 to 30 Hz (− 24 dB/oct) and then segmented into epochs ranging from 600 ms before to 500 ms after the onset of vocalization. Following segmentation, artifact rejection was carried out by excluding epochs with EEG amplitudes exceeding ± 50 µV. Individual epochs for each trial were then subjected to baseline correction by removing the mean amplitude of the pre-vocalization time window from − 600 to − 500 ms relative to vocalization onset for each electrode. The extracted epochs were then averaged across all trials in each condition separately to obtain the ERP responses during baseline, short-term adaptation, long-term adaptation, and washout conditions. At least 75 trials contributed to the ERP responses in each condition for each subject. The extracted ERP profiles were then averaged across all subjects to calculate the grand-average ERP responses.
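The epoching steps described above (segmentation, amplitude-based rejection, baseline correction, and averaging) can be sketched for a single channel as follows. This is an illustrative pure-Python outline, not the EEGLAB pipeline used in the study: the function name and argument names are hypothetical, the signal is assumed to be already band-pass filtered, and onsets are assumed to be given as sample indices.

```python
def extract_erp(signal_uv, onsets, fs=1000, pre_ms=600, post_ms=500,
                base_ms=(-600, -500), reject_uv=50.0):
    """Average onset-locked epochs of one channel.

    signal_uv : filtered EEG samples in microvolts (band-pass step not shown)
    onsets    : vocalization onsets as sample indices
    Returns (erp, n_kept); erp is None when no epoch survives rejection.
    """
    pre = pre_ms * fs // 1000
    post = post_ms * fs // 1000
    # Baseline window expressed as sample offsets inside the epoch.
    b0 = (base_ms[0] + pre_ms) * fs // 1000
    b1 = (base_ms[1] + pre_ms) * fs // 1000
    kept = []
    for onset in onsets:
        if onset - pre < 0 or onset + post > len(signal_uv):
            continue  # epoch would run past the recording
        epoch = signal_uv[onset - pre:onset + post]
        if max(abs(v) for v in epoch) > reject_uv:
            continue  # artifact rejection at +/- reject_uv
        baseline = sum(epoch[b0:b1]) / (b1 - b0)
        kept.append([v - baseline for v in epoch])
    if not kept:
        return None, 0
    n = len(kept)
    erp = [sum(samples) / n for samples in zip(*kept)]
    return erp, n
```

The second return value is the number of epochs that survived rejection, which a caller could compare against the 75-trial minimum stated above before including a condition in the grand average.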

ERP source estimation

Standardized low-resolution brain electromagnetic tomography (sLORETA) was used to calculate the cortical distribution of current density for adaptation-induced modulation of the ERP responses. The sLORETA algorithm generates a single linear solution to the inverse problem of cerebral source localization based on a linear weighted sum of the scalp electric potentials (Pascual-Marqui 2002a, b). In this method, the standardized current density of a grid with 6239 voxels at 5 mm spatial resolution is calculated in the gray matter of a reference brain map normalized to the Montreal Neurological Institute (MNI) space under the assumption that neighboring voxels should have maximally similar electrical activity (Fallgatter et al. 2003). Accuracy of the sLORETA source localization algorithm has been validated in previous studies using fMRI (Mulert et al. 2004), PET (Pizzagalli et al. 2004), and intra-cranial ECoG recordings (Zumsteg et al. 2006). In our study, adaptation-induced modulation of ERP sources was calculated for each subject using sLORETA based on the mean amplitudes of the pre-motor (− 200 to 0 ms) and post-motor (0–200 ms) ERP components relative to the onset of vocalization. Statistical comparison of ERP modulation was performed by contrasting short-term adaptation, long-term adaptation, and washout with the baseline condition. The voxel-based sLORETA images were calculated for maximal global field power peaks within the pre-motor and post-motor ERP time windows using a realistic standardized head model (Fuchs et al. 2002) and the MNI152 template (Mazziotta et al. 2001) with the three-dimensional solution space restricted to cortical gray matter. Voxel-by-voxel comparisons of the current density distributions were performed using sLORETA voxel-wise randomization tests with 5000 permutations based on statistical non-parametric mapping, and corrected for multiple comparisons. 
The voxels with significant differences (p < 0.05, corrected) were specified in MNI coordinates and labeled as Brodmann areas (BA) in sLORETA software.

Results

Behavioral vocal responses

Results of the analysis for vocal responses during baseline, short-term adaptation, long-term adaptation, and washout conditions are shown in Fig. 3. As shown in this figure, subjects produced a relatively stable vocal pitch output during the baseline condition when they received their normal auditory feedback during vowel vocalizations (Fig. 3a, red trace). During short-term adaptation, subjects opposed the direction of downward (− 100 cents) pitch shifts in the auditory feedback by progressively changing the pitch of their vocal output in the upward direction (Fig. 3a, black trace). This increasing trend of vocal pitch output was maintained during the long-term adaptation condition, until the pitch shift in the auditory feedback gradually diminished and returned to baseline (unaltered feedback) at the end of the trials (Fig. 3a, gray trace). During the washout condition, although subjects received their normal auditory feedback (i.e., no pitch shift), an after effect of adaptation was observed and the subjects maintained their vocal pitch output above the baseline level (Fig. 3a, blue trace). On average, subjects’ mean vocal responses to pitch shifts in the auditory feedback were 62.3 and 88.6 cents during short-term and long-term adaptation, respectively, and their vocal pitch output remained at a mean of 73.1 cents during the washout condition (Fig. 3b). A one-way repeated-measures analysis of variance (Rm-ANOVA) revealed a significant main effect of condition on behavioral responses during vocal production [F(3,36) = 3.04, p < 0.05]. Post hoc t tests using Bonferroni’s correction for multiple comparisons indicated a significant increase in vocal response magnitudes during short-term adaptation [t(11) = 2.97, p < 0.01], long-term adaptation [t(11) = 3.11, p < 0.01], and washout [t(11) = 3.04, p < 0.01] conditions compared with baseline. 
In addition, we found that vocal responses during long-term adaptation were significantly larger than those during short-term adaptation [t(11) = 2.59, p < 0.05] and washout [t(11) = 2.33, p < 0.05] conditions.
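For readers unfamiliar with the post hoc procedure, the following minimal sketch shows how a paired t statistic and a Bonferroni-adjusted significance threshold are computed. It is illustrative only: the function names are hypothetical, the input values are made up and are not data from this study, the number of comparisons used for the correction is an assumption, and the conversion from t to a p value (which needs the t distribution) is omitted.

```python
import math


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x versus y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n), n - 1


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_comparisons


# Made-up per-subject responses in cents (NOT data from the study).
adaptation = [60.0, 75.0, 40.0, 90.0, 55.0]
baseline = [2.0, -3.0, 1.0, 4.0, -1.0]
t_stat, df = paired_t(adaptation, baseline)
print(round(t_stat, 2), df, bonferroni_alpha(0.05, 3))
```

With 12 subjects, each paired comparison has 11 degrees of freedom, which matches the t(11) values reported above.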

Fig. 3

Results of the behavioral vocal response analysis: a trial-by-trial profile of the grand-average (n = 12) vocal responses during baseline, short-term adaptation, long-term adaptation, and washout conditions. b Bar plot representation of the statistical analysis for the differences between the mean of the grand-average vocal responses during baseline, short-term adaptation, long-term adaptation, and washout conditions (*p < 0.05, **p < 0.01)

ERP responses

We identified two major types of ERP components that were elicited during the vocal production tasks. These components included ERP activities that were elicited prior to and after the onset of vocalization, hereafter referred to as “pre-motor” and “post-motor” components, respectively (Fig. 4). Statistical analysis of the ERP responses for the pre-motor and post-motor components was performed at time windows from − 200 to 0 ms and 0–200 ms relative to the onset of vocalization, respectively. For each component, responses were analyzed using a 4 × 2 Rm-ANOVA model with factors including condition (baseline, short-term adaptation, long-term adaptation, and washout) and laterality (left vs. right hemisphere) for electrode positions in five different regions of interest (ROI) separately: (1) frontal (left: F1, F3, F5; right: F2, F4, F6), (2) fronto-central (left: FC1, FC3, FC5; right: FC2, FC4, FC6), (3) central (left: C1, C3, C5; right: C2, C4, C6), (4) centro-parietal (left: CP1, CP3, CP5; right: CP2, CP4, CP6), and (5) parietal (left: P1, P3, P5; right: P2, P4, P6).

Fig. 4

a Topographical distribution maps of the pre-motor (− 200 to 0 ms) and post-motor (0–200 ms) ERP responses during baseline, short-term adaptation, long-term adaptation, and washout conditions. b The profiles of the grand-average ERP responses (n = 12) in the left and right parietal and fronto-central electrodes overlaid across baseline, short-term adaptation, long-term adaptation, and washout conditions

Results of the analysis on the pre-motor ERP components did not reveal any significant effects. However, for post-motor ERP activities, results indicated a significant main effect of condition on electrodes over the fronto-central [F(3,33) = 4.09, p < 0.05] and parietal [F(3,33) = 4.61, p < 0.01] areas. Post hoc tests using Bonferroni’s correction showed that the ERP responses over the fronto-central area were significantly stronger (p < 0.05) during short-term and long-term adaptation compared with baseline and washout conditions (Fig. 4). For electrodes over the parietal area, post hoc tests revealed that the ERP responses were significantly stronger (p < 0.01) only during the long-term adaptation compared with baseline, short-term adaptation, and washout conditions. The temporal profiles and topographical distribution maps of the ERP difference waveforms for short-term adaptation, long-term adaptation, and washout vs. baseline are shown in Fig. 5. In addition, for electrodes over the frontal area, only a significant main effect of laterality was found [F(1,11) = 6.04, p < 0.05], indicating stronger responses over the right vs. left hemisphere.

Fig. 5

a The overlaid profiles of the ERP difference waveforms for short-term adaptation, long-term adaptation, and washout vs. baseline in a representative electrode over the parietal area (Pz). b Topographical distribution maps of the post-motor (0–200 ms) ERP response difference for short-term adaptation, long-term adaptation, and washout vs. baseline

Effects of vocal adaptation on ERP sources

Results of the adaptation-induced modulation of ERP sources are summarized in Table 1. In this table, the MNI coordinates and their corresponding Brodmann’s areas (BA) are reported for brain regions associated with current density maxima of significantly modulated sources of ERP activities. Results indicated a significant decrease in post-motor (0–200 ms) ERP sources localized over the left and right superior parietal lobules (SPL) only for long-term adaptation vs. baseline condition (BA 7, p < 0.05; corrected). Figure 6 shows the estimated current source density maps of statistically significant differences between long-term adaptation and baseline projected onto the coronal slices of the MNI152 template (top), and the realistic standardized head model (bottom). No such modulation was found for short-term adaptation or washout compared with baseline; these contrasts did not reach significance at the ERP source level.

Table 1 sLORETA t statistics for significant modulation of the ERP sources within the post-motor time window (0–200 ms) during the long-term adaptation vs. baseline condition (MNI coordinates)
Fig. 6

The estimated current source density maps of statistically significant differences between post-motor (0–200 ms) ERP responses during the long-term adaptation vs. baseline condition. The maps are projected onto the coronal slices of the MNI152 template (top), and the realistic standardized head model (bottom)

Correlation analysis

A Pearson’s correlation analysis was performed to examine the relationship between ERP activities and vocal responses during short-term adaptation, long-term adaptation, and washout relative to the baseline condition. Figure 7 shows the color-coded data for different experimental conditions. We found significant positive correlations between the magnitude of vocal responses (0–200 ms after vocalization onset) and the amplitude of post-motor ERPs (0–200 ms) over the bilateral fronto-central areas, with the strongest correlation at the F3 electrode (r = + 0.56, p < 0.01). In addition, we found significant negative correlations between vocal response magnitudes and post-motor ERPs over the bilateral parietal areas, with the strongest correlation at the P5 electrode (r = − 0.53, p < 0.01). For pre-motor ERP activities (− 200 to 0 ms), the strongest correlations with vocal responses were found at the FCz (r = + 0.36) and P5 (r = − 0.31) electrodes; however, these correlations were not significant (p > 0.05). The topographical distribution maps of ERP vs. behavioral vocal responses are shown in Fig. 7b.
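The correlation coefficient reported above can be computed as in the following minimal sketch. The function name is hypothetical and the values in the usage lines are made up for illustration; how the data points were pooled across subjects and conditions is not specified here, so the sketch simply takes two equal-length lists of baseline-referenced values (vocal response in cents, ERP amplitude in µV).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up values (NOT data from the study): responses in cents, ERP in microvolts.
vocal_cents = [20.0, 45.0, 60.0, 85.0, 100.0]
erp_uv = [0.4, 0.9, 0.7, 1.6, 1.5]
print(round(pearson_r(vocal_cents, erp_uv), 2))
```

A positive r indicates that larger vocal responses accompany larger ERP amplitudes, as reported for the fronto-central electrodes, whereas a negative r corresponds to the pattern reported over the parietal electrodes.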

Fig. 7

a Correlations between vocal response magnitudes and post-motor (0–200 ms) ERP activities in representative left fronto-central (F3) and parietal (P5) electrodes during short-term adaptation, long-term adaptation, and washout relative to the baseline condition, respectively. Color-coded circles in these plots show responses during different experimental conditions. b Topographical distribution maps of correlation between vocal response magnitude and the pre-motor (− 200 to 0 ms) and post-motor (0–200 ms) ERP responses during vocal production

Discussion

The present study used ERP recordings combined with the AAF paradigm to investigate the underlying neural mechanisms of sensorimotor adaptation in a timescale comparable to behavioral responses in the vocal production system. The adaptive nature of vocal sensorimotor responses to AAF was studied by analyzing vocal responses to pitch-shift stimuli in an early time window (0–200 ms) following the onset of vocalizations during the baseline, adaptation, and washout conditions. The choice of this time window allowed us to primarily focus on examining sensorimotor adaptation mechanisms before the emergence of compensatory responses to AAF in the vocal motor system (Hawco and Jones 2009). Results of our analysis indicated that the vocal motor system adapted to AAF by generating upward responses that opposed the direction of downward (− 100 cents) pitch-shift stimuli during the adaptation compared with baseline trials with unaltered (normal) auditory feedback. The adaptive changes in vocal motor behavior were persistent throughout adaptation trials and were partially maintained during washout trials when pitch-shift alterations were removed from the auditory feedback. Our findings provide supporting evidence for an adaptive mechanism in the vocal production system that incorporates auditory feedback and updates feedforward motor commands in response to persistent alterations in voice F0 feedback. We suggest that the observed vocal motor adaptation to AAF and its subsequent after effect during washout are indicative of sensorimotor reprogramming of feedforward motor commands in response to auditory feedback alterations.

Despite the inherent differences among mechanisms underlying vocalization, articulation, and limb motor control, adaptive responses to altered sensory feedback have previously been characterized in these modalities through manipulating feedback F0 during vocal production (Hawco 2010; Hawco and Jones 2009; Jones et al. 2005; Keough et al. 2013), perturbing formant frequencies during speech articulation (Cai et al. 2011, 2010; Houde and Jordan 2002; Lametti et al. 2012, 2014; Max and Maffett 2015; Sengupta and Nasir 2015), and applying force fields during limb movement (Ostry and Gribble 2016; Shadmehr and Krakauer 2008; Shadmehr and Moussavi 2000; Shadmehr et al. 2010; Tseng et al. 2007). A number of studies have also demonstrated an after effect associated with continuation of adaptive responses following removal of sensory feedback alterations in the vocal (Jones and Keough 2008; Jones et al. 2005; Keough et al. 2013), articulation (Sengupta and Nasir 2015), and limb motor systems (Shadmehr and Mussa-Ivaldi 1994).

The neural bases of sensorimotor adaptation have recently been investigated in the speech motor system in a study that examined phase coupling between neural oscillations during a formant perturbation task (Sengupta and Nasir 2015). In that study, theta-gamma phase coherency was found to be increased over the centro-parietal areas and decreased over the fronto-temporal areas during adaptation to formant perturbations, suggesting a remapping of neuronal representations in a distributed sensorimotor network involving auditory and speech motor areas. In the present study, we adopted the pitch-shift experimental paradigm combined with EEG recordings during vocalization to investigate the neural correlates of sensorimotor adaptation in the vocal motor system. Analysis of the ERP responses during vocal production highlighted the important role of specific neural response components that reflect sensorimotor adaptation in the vocal motor system. We found that when subjects produced a steady vowel vocalization, pre-motor ERP responses were elicited approximately 200 ms prior to the onset of the vocal production. However, when the pre-motor ERP activities were compared across baseline, adaptation, and washout conditions, results of our analysis did not reveal any significant effects. When vocalizations were initiated, the pre-motor neural responses were followed by post-motor ERP activities, which were predominantly distributed as negative potentials over the parietal electrodes and positive potentials over the frontal electrodes. In contrast with pre-motor neural responses, results of our analysis on post-motor ERP activities revealed significant effects across baseline, adaptation, and washout conditions, suggesting that the post-motor ERP components are implicated in vocal sensorimotor adaptation when auditory feedback is altered by pitch-shift stimuli. 
We found that, consistent with behavioral responses, the amplitude of the frontal component of post-motor ERPs was significantly increased during short-term and long-term adaptation compared with baseline and washout conditions. One possible explanation for this effect is that the frontal ERP component may reflect neural processes that encode feedback error in response to pitch-shift stimuli during adaptation conditions. However, when responses were compared across short-term and long-term adaptation, we found significantly larger vocal responses, indicating smaller error signals, during long-term vs. short-term adaptation, but no such effect was observed for frontal ERPs across these conditions. This latter evidence rules out the possibility of frontal ERPs being implicated in encoding feedback error and suggests that this component may reflect a general neural process underlying sensorimotor adaptation, which accounts for reprogramming of feedforward motor commands in response to alterations in voice auditory feedback.

In addition to the frontal ERP component, we identified an ERP response that was predominantly distributed as a negative potential over the parietal electrodes. We found that the amplitude of this parietal ERP component was significantly increased during long-term adaptation compared with baseline, short-term adaptation, and washout conditions. This finding suggests that the parietal ERP component is not reflective of a general sensorimotor adaptation or error processing mechanism, but rather may be indicative of a neural process that is specific to vocal sensorimotor adaptation (or perhaps learning) following long-term exposure to alterations in the auditory feedback. Several observations in our data support this notion. First, we found that the parietal ERP did not significantly increase when subjects transitioned from baseline (normal feedback) to short-term adaptation trials in which voice feedback was altered by AAF stimuli. This finding rules out the possibility of the parietal ERP being a neural indicator of vocal pitch error processing in the auditory feedback. Second, when responses were compared between short-term and long-term adaptation trials (both including AAF stimuli), the parietal ERPs were increased only when subjects were exposed to a longer period of feedback alteration during long-term adaptation and generated vocal responses that were significantly larger than those generated during short-term adaptation. Lastly, when feedback alteration was removed during washout, despite the presence of an adaptation after effect in vocal motor output, the gradual decline in the magnitude of vocal responses was accompanied by diminished parietal ERP responses during washout compared with long-term adaptation conditions.
It is also important to note that, although the effect was not statistically significant, we observed a trend of increased parietal ERP activation relative to baseline during short-term adaptation and washout trials, in which subjects exhibited a weaker degree of vocal adaptation than that observed during long-term adaptation.

The parietal cortex, through its extensive short-range connectivity with primary motor and somatosensory cortices as well as long-range connectivity with cerebellum, pre-motor and prefrontal cortices, has been associated with multisensory integration and sensorimotor adaptation for motor behaviors other than speech (Jones et al. 1978; Nasir et al. 2013; Ostry and Gribble 2016; Vahdat et al. 2011, 2014). In recent studies, the notion of parietal cortex involvement in sensorimotor integration and its role in adaptation has also been established in the vocal (Zarate and Zatorre 2005, 2008) and speech motor systems (Ostry and Gribble 2016; Sengupta and Nasir 2015; Shum et al. 2011). One study (Shum et al. 2011) delivered repetitive transcranial magnetic stimulation (rTMS) to the inferior parietal lobule (IPL) before subjects performed a speech motor adaptation task under formant perturbation. They found that subjects in the control (placebo) group who received sham stimulation (only at 1% of actual stimulation output during rTMS) exhibited robust speech adaptation to formant perturbation, whereas subjects who received actual rTMS exhibited a diminished adaptive speech motor response (Shum et al. 2011). This finding provided evidence that areas within the parietal cortex, specifically in and around the supramarginal gyrus (SMG), are involved in sensorimotor adaptation during speech. In another study, Sengupta and Nasir (2015) reported an increase in the gamma power of neural activities during speech motor adaptation predominantly over the centro-parietal and fronto-temporal scalp electrodes.
They also demonstrated that the increase in theta-gamma phase coherency at the centro-parietal electrodes was accompanied by a decrease in coherency at the bilateral fronto-temporal areas, suggesting that the engagement of sensorimotor areas in the parietal cortex was progressively increased in exchange for fronto-temporal disengagement, as the subjects’ speech motor system adapted to AAF (Sengupta and Nasir 2015). Sengupta and Nasir (2015) proposed that this neural reorganization of sensorimotor maps was driven by feedback errors and resulted in establishment of a new feedforward motor program as subjects learned the altered speech feedback. This notion was corroborated by findings showing that increased phase coherency over the parietal cortex was strongly correlated with the magnitude of speech motor adaptation (Sengupta and Nasir 2015).

In the present study, we used ERP source estimation to examine the neural substrates of sensorimotor adaptation in the vocal motor system. Results of our analysis showed that long-term adaptation to pitch-shift alteration in the auditory feedback was accompanied by enhanced neural activities within the parietal cortex in areas over the superior parietal lobule. We also found that the degree of parietal and frontal cortical activation was significantly correlated with adaptive vocal motor behavior in response to AAF stimuli. Although this correlation reflects a general relationship between the behavioral and neural correlates of vocal sensorimotor adaptation to pitch-shift alterations in the auditory feedback, limitations in our experimental design precluded us from determining whether it is merely driven by differences among the experimental conditions (e.g., different feedback alterations) or reflects a general relationship between the behavioral and neural measures of vocal production that is independent of alterations in voice auditory feedback. Despite such limitations, our findings provide evidence supporting the notion that vocal sensorimotor adaptation following long-term exposure to pitch-shift alteration in the auditory feedback is accompanied by modulation of ERP activities and increased contribution of parietal and frontal cortical mechanisms. Although data from previous studies (e.g., Shum et al. 2011) have supported the role of the inferior parietal lobule in sensorimotor adaptation, findings of the present study are the first to suggest that areas within the superior parietal lobule are also implicated in vocal sensorimotor adaptation, specifically following long-term exposure to alteration in the auditory feedback.
The neural mechanisms of sensorimotor processing during long-term vocal motor adaptation are not fully understood and future investigations are warranted to examine the role of parietal and frontal cortical areas in this process.

An effect associated with sensorimotor adaptation to altered auditory feedback (e.g., pitch shift) is that, during the washout condition, vocal responses exhibit after effects that diminish over time and return to the original baseline. This gradual transition in after effects occurs because the removal of feedback alteration is treated by the system as a new sensory error, which drives the vocal motor output to return to the original baseline during washout. However, the pattern and time course of this transition from after effects to baseline are not fully determined and have varied considerably among different studies on vocal sensorimotor adaptation (Jones and Keough 2008; Jones and Munhall 2000; Jones et al. 2005). Previous studies have reported an after effect during the washout condition that was relatively more transient (i.e., showed faster transition toward baseline) compared with the after effect observed in the present study. In our study, subjects were tested for a total of 100 trials during each condition, and our data showed a pattern of sustained voice F0 below the long-term adaptation level, followed by a trend of transition toward the baseline during washout. However, it is likely that an extended time period beyond 100 trials would have revealed a clearer pattern of transition toward the original baseline during washout in our study. In addition, one possible explanation for the difference in the time course of vocal F0 transition during washout in the present vs. previous studies may come from an effect observed by Jones and Munhall (2000), who reported a gradual (and sustained) drift in voice F0 when no pitch alteration was delivered to the auditory feedback. Based on that earlier finding (Jones and Munhall 2000), we suggest that a similar pattern of drift in voice F0 may explain the longer after effect transitions toward the baseline in the present study.
Future research is needed to provide a more comprehensive understanding of the differences in the pattern and time course of after effects during vocal sensorimotor adaptation.

The question of whether the observed adaptation effect generalizes to other vocal production parameters (e.g., different vowel sound categories) warrants future investigation. A study in Mandarin speakers during word production with specific target pitch patterns (tone categories) showed that adaptation in response to pitch-shifted auditory feedback for one tone category generalized to the production of another category (Jones et al. 2005). However, this learning effect was shown to be primarily target-dependent and did not represent a global transformation of the audio-vocal mapping during adaptation. Studies on reaching mechanisms suggested a similar effect by showing that when subjects were trained to produce arm movements in a novel visual feedback environment, motor learning generalized to movements scaled independently in the temporal or spatial dimension (Goodbody and Wolpert 1999). However, when examined within the spatial dimension, generalization decayed as the distance between the original training position and the translated spatial field increased (Ghahramani et al. 1996; Shadmehr and Moussavi 2000). These findings corroborate the notion that adaptation is a target-dependent process in which disparities (errors) between motor actions and their sensory goals drive subsequent sensorimotor remapping.

Evidence from studies in neurological patients has suggested that non-cortical structures, such as the basal ganglia and cerebellum, also contribute to sensorimotor adaptation in the vocal motor system. Studies in patients with Parkinson’s disease (PD) have suggested that sensorimotor deficits may be related to basal ganglia damage, which is associated with impaired processing of sensory input and impaired learning of the new motor skills needed to adapt to changes in the environment. Studies using the AAF paradigm in PD have characterized sensorimotor deficits in the vocal motor control mechanisms by showing that the patients exhibited increased compensatory vocal changes in response to pitch and loudness perturbations in their auditory feedback compared with age-matched healthy (control) individuals (Chen et al. 2013; Liu et al. 2012). It has been suggested that overcompensation in PD during vocal pitch and loudness motor control may be accounted for by amplified reafferent feedback as a consequence of basal ganglia dysfunction. In contrast, a study by Mollaei et al. (2013) showed that when PD patients received formant perturbations in the auditory feedback, they did not exhibit speech motor adaptation compared with the control group. This latter finding suggests that deficits in the basal ganglia can impair the ability to establish new maps for motor adaptation in the vocal production system. A recent study in patients with cerebellar damage (Parrell et al. 2017) has revealed attenuation in speech adaptation, but increased compensatory responses to formant perturbations, suggesting that the cerebellum is crucial for maintaining accurate feedforward control during motor adaptation, but is uninvolved in feedback control during speech production. In another study (Lametti et al. 2017), applying neurostimulation to cerebellar regions in healthy adults was shown to enhance adaptive responses to formant perturbations, corroborating the notion that the cerebellum is implicated in sensorimotor adaptation in the speech production system. The differential effects of basal ganglia and cerebellum damage on behavioral responses to AAF suggest that these neural structures may subserve different functional mechanisms during sensorimotor adaptation, highlighting the role of the basal ganglia in feedforward reprogramming (i.e., establishing a new sensorimotor map) vs. the cerebellum being primarily implicated in maintaining accurate feedforward motor control. Therefore, a thorough understanding of the distinctions between the underlying cortical and subcortical mechanisms of sensorimotor adaptation and motor control warrants further investigation. Advancing our knowledge of these mechanisms will have important clinical applications and will pave the way toward developing new treatment strategies for improving vocal and speech communication in patients with neurological disorders.