Introduction

Auditory neurophysiological deficits in schizophrenia have been described at multiple levels of processing. Using electroencephalography (EEG), researchers have identified reductions in the mid-latency auditory evoked potential (AEP) P50, which is insensitive to the allocation of attention (Clementz et al. 1998; Bramon et al. 2004; Patterson et al. 2008), and various late, attention-sensitive AEPs, such as N100 and P200 (McCarley et al. 1991; Boutros et al. 2004). Importantly, schizophrenia-related impairments in auditory predictive modeling have also been evidenced by AEPs using differential measures such as mismatch negativity (MMN) (Baldeweg et al. 2004) and P50, N100, and P200 repetition suppression (RS) (Boutros et al. 1999, 2004; Baldeweg 2006, 2007). Auditory predictive modeling deficits have been linked to altered salience (Neuhaus et al. 2013) and reward processing (Murray et al. 2007), and have been suggested as a core component of psychotic symptoms, such as delusions (Corlett et al. 2007) and auditory hallucinations (Ford et al. 2014).

In the theoretical model put forth by Friston (2005), sensory systems form a predictive coding hierarchy that is designed to minimize prediction error. Prediction error results from a mismatch between bottom-up sensory input and top-down prediction, and drives a change in the predictive model. As the predictive model is refined with stimulus repetition, prediction error is reduced through plastic alterations in connection strengths at multiple levels of the hierarchy (Baldeweg 2006; Friston 2008). According to the sharpening model of repetition suppression (Wiggs and Martin 1998), these alterations in connection strengths result in a sparser representation of stimuli, where neurons that code irrelevant or misclassified stimulus features will be suppressed in response to subsequent stimulus presentations. This sparse representation utilizes fewer neural resources and generates a smaller local neurophysiological response, measured at the scalp as a reduction in AEP amplitude. Model sharpening is perhaps best demonstrated by studies of stimulus-specific adaptation (SSA), a special case of RS where responses are suppressed specifically for stimuli matched for physical parameters such as pitch and duration (Ulanovsky et al. 2003). When the physical parameters are changed, neuronal responses are released from adaptation (Malmierca et al. 2009). Relatedly, MMN is a negative event-related potential (ERP) generated when an expected acoustic pattern is violated by a rare stimulus deviant. This signal is thought to represent pre-attentive deviance detection (prediction error) and active control of predictive coding (Wacongne et al. 2012). Thus, MMN reflects the active prediction error signal, while RS/SSA reflects the reduction of prediction error through model sharpening.

It is difficult to distinguish deficits in prediction error from deficits in model sharpening in schizophrenia patients because the timing of the MMN overlaps with AEPs known to be sensitive to RS (N100 and P200) and both MMN and RS are impaired in schizophrenia. Additionally, while MMN is traditionally measured using pure tones that can easily be varied on parameters like pitch and duration, RS studies in schizophrenia commonly use broadband paired-click stimuli to elicit stimulus-general sensory “gating” of AEPs. Relationships have been identified between stimulus-general RS measured with sensory gating paradigms and MMN to deviations in tone pitch (Kisley et al. 2004; Gjini et al. 2010; Rentzsch et al. 2015); however, it is unclear whether similar relationships would be identified using pure tone stimuli. Furthermore, although MMN is certainly related to RS both conceptually (i.e. predictive modeling) and arithmetically (i.e. MMN is calculated by subtracting suppressed N100 and/or P200 from unsuppressed responses to novel tones), studies of schizophrenia-related differences in MMN have not controlled for RS effects, and shared predictive variance between RS and schizophrenia diagnosis in the prediction of MMN amplitude has not been addressed.

We recently reported that a deviance-related negative ERP can be elicited by adding an extra tone to a predictable 5-stimulus group, further suggesting that MMN is distinct from RS (Haigh et al. 2016). Incidentally, this type of paradigm affords the opportunity to investigate the questions raised above, as identical tones were repeated with the same physical parameters commonly used in MMN research. In this study, we analyzed data from the study presented by Haigh et al. to compare and contrast pitch MMN, duration MMN, P50 RS, N100 RS, and P200 RS between participants diagnosed with schizophrenia and matched healthy controls. Stimulus parameters for repeated tones were identical between the RS and MMN tasks. We hypothesized that schizophrenia patients would show reduced MMN and RS, that both groups would show a moderate relationship between pitch MMN and N100 RS and between duration MMN and P200 RS (AEPs that overlap in time), and that group differences in MMN amplitude would be independent of RS.

Materials and Methods

Participants

Twenty-six individuals with schizophrenia (SZ) and 26 healthy control subjects (HC) participated in this study. Participants were matched for age, gender, parental socioeconomic status, and estimated IQ (Table 1). Schizophrenia diagnosis was based on the Structured Clinical Interview for DSM-IV (SCID-P). Symptom scales, neuropsychological tests, and surveys were identical to those used by Haigh et al. (2016) (Table 1). All subjects had normal hearing as assessed by audiometry and were paid for participation. Informed consent was obtained from all individual participants included in the study, and all procedures were in accordance with the ethical standards of the University of Pittsburgh IRB and with the 1964 Helsinki declaration and its later amendments.

Table 1 Participant characteristics.

Procedures

EEG was recorded while participants watched a silent video. Tones were created with Tone Generator (NCH software) and presented using Presentation (Neurobehavioral Systems, Inc.). Binaural auditory stimuli were presented using Etymotic 3A insert earphones. Sets of five tones were presented, with 330 ms SOA separating tones within groups and 750 ms inter-trial interval between tone groups. Each tone was identical (1 kHz, 75 dB, 50 ms pips, 5 ms rise/fall times). Six-tone deviant trials (10%) were also presented. Results from deviant stimulus groups are not discussed in this manuscript, as RS was the primary focus of the current report, and analysis of the frequent standard groups provided the greatest signal-to-noise ratio. Mismatch negativity (MMN) was measured using a separate task. In the MMN task, standard tones were presented repeatedly (1 kHz, 75 dB, 50 ms pips, 5 ms rise/fall times, 330 ms SOA), with an occasional pitch deviant (1.2 kHz, 10% of trials) or duration deviant (100 ms, 10% of trials) interspersed.
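As a rough illustration of the standard-group timing described above, the following Python sketch lists tone onset times for successive 5-tone groups. The function name `tone_group_onsets` is hypothetical, and the sketch assumes that the 750 ms inter-trial interval runs from the onset of the last tone in one group to the onset of the first tone in the next; the text does not state which reference points were used, so the interval is exposed as a parameter. Six-tone deviant groups are not modeled.

```python
def tone_group_onsets(n_groups, tones_per_group=5, soa_ms=330, gap_ms=750):
    """Return a list of per-group lists of tone onset times (ms).

    ASSUMPTION: gap_ms is measured from the onset of the last tone of one
    group to the onset of the first tone of the next group. The source text
    does not specify the reference points of the inter-trial interval.
    """
    onsets = []
    group_start = 0
    for _ in range(n_groups):
        # Tones within a group are separated by a fixed SOA.
        group = [group_start + i * soa_ms for i in range(tones_per_group)]
        onsets.append(group)
        # The next group starts gap_ms after the last tone onset.
        group_start = group[-1] + gap_ms
    return onsets


schedule = tone_group_onsets(n_groups=2)
print(schedule[0])  # onsets of the first group
print(schedule[1])  # onsets of the second group
```

Changing `gap_ms` is enough to explore the alternative reading in which the interval is measured from tone offset.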

EEG

EEG was recorded from a custom 72-channel Active2 high-impedance system (BioSemi), comprising 70 scalp sites including both mastoids, one below the right eye, and one at the nose tip (bandpass = DC–104 Hz, digitized at 512 Hz). Processing was done off-line with EEGLAB (Delorme and Makeig 2004) and BrainVision Analyzer 2 (Brain Products GmbH). Using EEGLAB, data were high-pass filtered (0.5 Hz, 24 dB/octave) and visually inspected, and channels with excessive noise were removed and interpolated. Independent components analysis (ICA) was then used to isolate and remove eye-blinks, horizontal eye movements, and cardiac signal.

Repetition Suppression

In BrainVision Analyzer 2, EEG data were filtered and re-referenced to averaged mastoids. For P50 measurement, data were filtered from 10 to 70 Hz (24 dB/octave). For N100 and P200 measurement, data were low-pass filtered at 20 Hz (24 dB/octave). Epochs (400 ms) were extracted separately for each of the five tones in the sequence, including a 50 ms pre-stimulus baseline to which epochs were baseline-corrected. Artifact rejection was then performed using a two-step process to remove trials with (1) excessively high amplitude signals (e.g. movement) and (2) alpha contamination. Specifically, trials containing (1) any signal exceeding ±50 μV or (2) a min-max amplitude difference greater than 50 μV between 280 and 350 ms after stimulus onset were rejected prior to stimulus averaging. A minimum of 216 trials were included in any given analysis (mean ± SD = 831 ± 186 trials). For measurement of the N100 and P200, epochs were then truncated from −5 to 240 ms, and the linear trend from the initial 5 ms (−5–0 ms) to the final 5 ms (235–240 ms) was removed to eliminate, from individual ERPs, slow components caused by overlapping responses to stimuli (slow wave and/or N2 contamination). Mean amplitudes of the P50, N100, and P200 were calculated over 55–65 ms, 90–110 ms, and 145–175 ms following the onset of each tone, respectively. P50 ratio was additionally calculated for each subject according to the P50 gating literature convention to ensure that effects (or lack thereof) were not related to the way in which P50 amplitude was measured in this study (Patterson et al. 2008). Briefly, P50 peak-to-peak amplitude at electrode FCz in response to the first stimulus, measured as the amplitude difference between the P50 peak and the Na peak (the negative wave preceding the Pa/P30 response), was divided by P50 peak-to-peak amplitude at electrode FCz in response to the second, third, fourth, and fifth stimuli.
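The rejection, detrending, and mean-amplitude steps described above can be sketched in Python with NumPy as follows. This is not the BrainVision Analyzer procedure itself: the function names are hypothetical, the sketch operates on a single-channel epoch, window boundaries are treated as inclusive, and the trend line is assumed to be anchored at the mean voltage and mean time of the first and last 5 ms segments.

```python
import numpy as np

# Measurement windows (ms after tone onset) as given in the text.
WINDOWS_MS = {"P50": (55.0, 65.0), "N100": (90.0, 110.0), "P200": (145.0, 175.0)}


def _mask(times_ms, window_ms):
    """Boolean mask for samples inside an inclusive time window."""
    return (times_ms >= window_ms[0]) & (times_ms <= window_ms[1])


def reject_trial(epoch_uv, times_ms, abs_limit_uv=50.0,
                 range_limit_uv=50.0, alpha_window_ms=(280.0, 350.0)):
    """Return True if the trial fails either rejection criterion."""
    epoch_uv = np.asarray(epoch_uv, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    # Criterion 1: any sample beyond +/- abs_limit_uv.
    if np.any(np.abs(epoch_uv) > abs_limit_uv):
        return True
    # Criterion 2: min-max range above range_limit_uv in the late window.
    segment = epoch_uv[_mask(times_ms, alpha_window_ms)]
    return bool(segment.size and (segment.max() - segment.min()) > range_limit_uv)


def remove_endpoint_trend(epoch_uv, times_ms,
                          start_ms=(-5.0, 0.0), end_ms=(235.0, 240.0)):
    """Subtract the line joining the first and last 5 ms segments.

    ASSUMPTION: the line passes through (mean time, mean voltage) of each
    segment; the source does not specify how the trend was anchored.
    """
    epoch_uv = np.asarray(epoch_uv, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    first, last = _mask(times_ms, start_ms), _mask(times_ms, end_ms)
    t0, v0 = times_ms[first].mean(), epoch_uv[first].mean()
    t1, v1 = times_ms[last].mean(), epoch_uv[last].mean()
    slope = (v1 - v0) / (t1 - t0)
    return epoch_uv - (v0 + slope * (times_ms - t0))


def mean_amplitude(epoch_uv, times_ms, window_ms):
    """Mean voltage inside an inclusive measurement window."""
    epoch_uv = np.asarray(epoch_uv, dtype=float)
    times_ms = np.asarray(times_ms, dtype=float)
    return float(epoch_uv[_mask(times_ms, window_ms)].mean())
```

In this sketch, rejection would be applied to single trials before averaging, and detrending and amplitude measurement to the resulting averages, mirroring the order given in the text.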

Mismatch Negativity

In BrainVision Analyzer 2, data were low-pass filtered (20 Hz, 24 dB/octave) and re-referenced to averaged mastoids. Epochs (400 ms) were extracted separately for standard (frequent) and deviant (infrequent) tones, including a 50 ms pre-stimulus baseline to which epochs were baseline-corrected. Artifact rejection parameters were identical to RS analysis. A minimum of 56 trials were included in any given analysis (standard trial: mean ± SD = 1155 ± 215 trials; pitch deviant: 142 ± 27 trials; duration deviant: 142 ± 27 trials). Difference waves were calculated separately for each subject to isolate MMN, subtracting the average standard tone response from the average deviant tone response (pitch or duration deviant). Mean amplitude of the MMN was calculated from the difference wave between 80 and 130 ms following stimulus onset for pitch MMN, or between 140 and 190 ms for duration MMN.
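The difference-wave calculation described above can be illustrated with a short NumPy sketch. The function name `mmn_mean_amplitude` and the array layout (trials × samples, single channel) are assumptions made for illustration, and window boundaries are treated as inclusive.

```python
import numpy as np

# MMN measurement windows (ms after stimulus onset) as given in the text.
MMN_WINDOWS_MS = {"pitch": (80.0, 130.0), "duration": (140.0, 190.0)}


def mmn_mean_amplitude(standard_trials, deviant_trials, times_ms, deviant_type):
    """Mean amplitude of the deviant-minus-standard difference wave.

    standard_trials, deviant_trials: arrays of shape (n_trials, n_samples)
    holding artifact-free, baseline-corrected epochs (hypothetical layout).
    """
    times_ms = np.asarray(times_ms, dtype=float)
    standard_avg = np.mean(np.asarray(standard_trials, dtype=float), axis=0)
    deviant_avg = np.mean(np.asarray(deviant_trials, dtype=float), axis=0)
    # Subtract the average standard response from the average deviant response.
    difference_wave = deviant_avg - standard_avg
    low, high = MMN_WINDOWS_MS[deviant_type]
    in_window = (times_ms >= low) & (times_ms <= high)
    return float(difference_wave[in_window].mean())
```

With this sign convention, a deviant response that is more negative than the standard response yields a negative MMN amplitude.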

Data Analysis

Group demographics were compared using t-tests and chi-squared tests where appropriate. We calculated effect sizes for group differences in P50 RS, N100 RS, and P200 RS using Cohen’s d. For the purpose of these effect sizes, RS was calculated as the difference between S1 responses and the average of all other responses (S2–S5) at electrode FCz. S1/S2 P50 ratios were compared using t-tests, and P50 ratios across all repetitions were compared using 2 × 4 split-plot analysis of variance (SP-ANOVA), with schizophrenia diagnosis (HC or SZ) as the between-subjects factor and serial tone position (2nd–5th) as the within-subjects factor. Mean P50, N100, and P200 amplitudes were compared over six frontal/frontocentral sites (F1, Fz, F2, FC1, FCz, and FC2) using 4-way SP-ANOVAs. Initial mean amplitudes and MMN amplitudes were compared over the same six electrodes using 3-way SP-ANOVAs. Schizophrenia diagnosis was the between-subjects factor, and electrode chain [frontal (F) or frontocentral (FC)], electrode laterality (left to right: 1, z, or 2), and serial tone position (1st–5th, 4-way analysis only) were within-subjects factors. Significant effects of serial tone position were followed by four planned pairwise comparisons (first tone vs. others). For all within-subject statistics, Huynh-Feldt epsilon was used to correct for violations of the sphericity assumption. All simple effects were analyzed using Fisher’s LSD.
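For readers who wish to reproduce the effect-size calculation, a minimal Python sketch follows. The RS index matches the definition above (S1 minus the mean of S2–S5). The pooled-standard-deviation form of Cohen's d is an assumption, since the text does not state which variant was used, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def rs_index(amplitudes):
    """RS for one subject: S1 amplitude minus the mean of S2-S5.

    amplitudes: sequence of five mean amplitudes ordered S1..S5.
    """
    first, rest = amplitudes[0], amplitudes[1:]
    return first - mean(rest)


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation.

    ASSUMPTION: pooled-SD variant; the source does not specify which
    formula was used.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

In use, `rs_index` would be applied to each subject's FCz amplitudes, and `cohens_d` to the resulting HC and SZ lists.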

Sequential regression was employed to determine if addition of schizophrenia diagnosis (coded as 0 = HC, 1 = SZ) improved prediction of pitch or duration MMN amplitude beyond that afforded by RS measures alone. For these statistics, RS was calculated as the difference between mean AEP amplitudes to the first and fifth tone (S1–S5) at electrode FCz. Bivariate associations between schizophrenia diagnosis, RS, and MMN were examined using Pearson correlation. Partial correlation was used to assess the degree to which effects of diagnosis on MMN were independent of RS, and R² change was used to determine whether schizophrenia diagnosis accounted for a significant proportion of the variance in MMN over and above RS. Tolerance was examined to evaluate collinearity between RS and schizophrenia diagnosis in the prediction of MMN amplitude.
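The R² change and tolerance statistics named above can be sketched with ordinary least squares in NumPy. This is an illustrative implementation under standard textbook definitions, not the statistical package used in the study; the function names are hypothetical. R² change is tested with F = (ΔR² / k) / ((1 − R²_full) / (n − p − 1)), where k is the number of added predictors and p the number of predictors in the full model, and tolerance is taken as one minus the R² obtained when a predictor is regressed on the remaining predictors.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def r2_change(X_step1, X_added, y):
    """Return (delta R^2, F change, (df1, df2)) for predictors added at step 2."""
    y = np.asarray(y, dtype=float)
    X_step1 = np.asarray(X_step1, dtype=float).reshape(len(y), -1)
    X_added = np.asarray(X_added, dtype=float).reshape(len(y), -1)
    X_full = np.column_stack([X_step1, X_added])
    r2_full = r_squared(X_full, y)
    delta = r2_full - r_squared(X_step1, y)
    k_added = X_added.shape[1]
    df_resid = len(y) - X_full.shape[1] - 1
    f_change = (delta / k_added) / ((1.0 - r2_full) / df_resid)
    return delta, f_change, (k_added, df_resid)


def tolerance(X, column):
    """Tolerance of one predictor: 1 - R^2 from regressing it on the others."""
    X = np.asarray(X, dtype=float)
    others = np.delete(X, column, axis=1)
    return 1.0 - r_squared(others, X[:, column])
```

Under this definition, one minus the tolerance gives the proportion of a predictor's variance that is shared with the other predictors, so values near 1 indicate little collinearity.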

Results

P50 Repetition Suppression

Initial P50 amplitude was marginally reduced for SZ compared to HC [F(1,50) = 4.00, p = 0.051; Fig. 1]. In the ANOVA, SZ also showed reduced P50 amplitudes compared to HC [main effect of group; F(1,50) = 4.06, p < 0.05]. RS was present in both groups [main effect of serial tone position; F(4,200) = 7.82, p < 0.001], and there was no difference in RS between groups (serial tone position × group interaction; p’s > 0.1; d = 0.23). RS was characterized by reduced P50 response for S2–S5 compared to S1 (p’s < 0.05), with little change in P50 amplitude after the first repetition. A 4-way interaction was detected, indicating subtle group differences in between-stimulus effects on P50 topography [F(8,400) = 2.31, p < 0.05]. There was no difference in S1/S2 P50 ratio between groups (p > 0.1), and there were no differences between groups or between stimuli when comparing P50 ratios across all four stimulus repetitions (p’s > 0.1; Fig. 1).

Fig. 1

Grand-average event-related potentials for healthy controls and schizophrenia patients, filtered from 10 to 70 Hz for measurement of P50 response suppression. Average response to the entire 5-tone group is shown in (a), where individual tone onset times are depicted by dotted vertical lines. Healthy controls are shown in black and schizophrenia patients are shown in gray. Responses to individual tones (S1–S5) for healthy controls and schizophrenia patients are shown in (b) and (c), respectively. The time window used for measurement of mean P50 amplitude is depicted by a gray box in each panel

N100 Repetition Suppression

There was no difference in initial N100 amplitude between groups (p > 0.1). In the ANOVA, there was no difference in overall N100 amplitude between groups (main effect of group; p > 0.1), and there was no difference in N100 RS between groups, as indicated by a non-significant interaction term (serial tone position × group interaction; p > 0.1; d = 0.23, Fig. 2). RS was present in both groups [main effect of serial tone position; F(4,200) = 6.25, p < 0.001]; however, unlike P50, N100 responses were not fully suppressed until S3 (S3–S5: p’s < 0.05), while S2 was reduced at trend-level (p = 0.09, Fig. 2). Additionally, an electrode chain × stimulus interaction [F(4,200) = 3.87, p < 0.01] indicated a greater difference between frontal and frontocentral electrodes (more positive N100 in frontal electrodes) for S3 and S4 than for S1, S2, and S5.

Fig. 2

Grand-average event-related potentials for healthy controls and schizophrenia patients, filtered from 0.5 to 20 Hz for measurement of N100 and P200 response suppression. Average response to the entire 5-tone group is shown in (a), where individual tone onset times are depicted by dotted vertical lines. Healthy controls are shown in black and schizophrenia patients are shown in gray. Responses to individual tones (S1-S5) for healthy controls and schizophrenia patients are shown in (b) and (c), respectively. The time windows used for measurement of mean N100 and P200 amplitudes are depicted by gray boxes in each panel

P200 Repetition Suppression

Initial P200 amplitude was not significantly different between groups (p > 0.1). A chain × group interaction [F(1,50) = 4.84, p = 0.032] indicated frontocentral distribution of initial P200 amplitude in HC [F(1,25) = 27.90, p < 0.001], but not SZ (p > 0.1). In the ANOVA, overall P200 amplitude was slightly reduced in SZ compared to HC, but these differences were not statistically significant (main effect of group; p > 0.1). There were no between-group differences in P200 RS (serial tone position × group interaction; p > 0.1; d = 0.29). RS was present in both groups [main effect of serial tone position; F(4,200) = 67.50, p < 0.001, Fig. 2]. As with P50 RS, responses to S2–S5 were significantly suppressed compared to S1 (p’s < 0.001), with little change in P200 amplitude after S2. Differences in P200 topography were indicated by significant interactions between electrode laterality and stimulus position [F(8,200) = 3.91, p < 0.001] and between electrode laterality and electrode chain [F(2,100) = 12.12, p < 0.001].

Mismatch Negativity

MMN responses were reduced for SZ compared to HC (Fig. 3), and deficits were similar for pitch and duration MMN. SZ MMN responses were approximately 2/3 the amplitude of HC MMNs (Pitch MMN: 38% reduction, F(1,48) = 13.78, p < 0.01; Duration MMN: 35% reduction, F(1,48) = 5.03, p < 0.05).

Fig. 3

Mismatch negativity (MMN) responses for healthy controls and schizophrenia patients. Pitch MMN is shown in left panels (a, c, e), and duration MMN is shown in right panels (b, d, f). Healthy controls are shown in black and schizophrenia patients are shown in gray. Panels (a) and (b) show grand-average event-related potentials in response to frequent standard (solid lines) and deviant (broken lines) stimuli for healthy controls, while panels (c) and (d) show the same responses for schizophrenia patients. Panels (e) and (f) show the difference wave (deviant minus standard) from which mean MMN amplitudes were calculated. Time windows used for measurement of mean pitch and duration MMN amplitudes are depicted by gray boxes in each panel

Sequential Regression Analysis

Table 2 and Fig. 4 display the correlations among variables and Table 3 displays the unstandardized regression coefficients (B), standardized regression coefficients (β), and partial correlations for the sequential regression model predicting pitch MMN from all three independent variables (N100 RS, P200 RS, and schizophrenia diagnosis). The regression model was statistically significant [R² = 0.40, F(3,46) = 12.04, p < 0.001]. P200 RS accounted for a significant proportion of the variance over and above N100 RS [ΔR² = 0.13, F(1,47) = 8.98, p < 0.01]. More importantly, schizophrenia diagnosis accounted for a significant proportion of the variance over and above N100 RS and P200 RS [ΔR² = 0.14, F(1,46) = 11.75, p < 0.001]. Pitch MMN amplitude increased by 0.77 ± 0.25 µV for every microvolt of N100 RS [t(48) = 3.08, p < 0.01] and by 0.53 ± 0.23 µV for every microvolt of P200 RS [t(47) = −2.34, p < 0.05], and SZ pitch MMN was 1.56 ± 0.45 µV smaller than HC [t(46) = 3.43, p < 0.001]. Schizophrenia diagnosis shared less than 8% variance with N100 RS and P200 RS in prediction of pitch MMN (tolerance = 0.923), indicating that these variables independently predicted pitch MMN amplitude. N100 RS and P200 RS shared <1% variance in prediction of pitch MMN (tolerance = 0.996).

Table 2 Pearson correlations among diagnosis, mismatch negativity (MMN), and repetition suppression (RS) measures

Fig. 4

Correlations between mismatch negativity (MMN) amplitude and repetition suppression (RS). Correlations with N100 RS are shown in the upper panels, while correlations with P200 RS are shown in the lower panels. Correlations with pitch MMN are shown in the left panels, and correlations with duration MMN are shown in the right panels. Healthy controls are depicted by filled circles and schizophrenia patients are depicted by open circles. The solid lines indicate the linear trend across both groups

Table 3 Sequential regression of repetition suppression (RS) and participant diagnosis (Dx) on pitch mismatch negativity (MMN) amplitude

Table 2 and Fig. 4 display the correlations among variables and Table 4 displays regression coefficients and partial correlations for the sequential regression model predicting duration MMN from P200 RS and schizophrenia diagnosis. The full regression model was statistically significant [R² = 0.32, F(1,47) = 11.21, p < 0.001], and schizophrenia diagnosis accounted for a significant proportion of the variance in duration MMN over and above P200 RS [ΔR² = 0.06, F(1,47) = 3.90, p = 0.05]. Duration MMN amplitude increased by 0.88 ± 0.24 µV for every microvolt of P200 RS [t(48) = 3.65, p < 0.001], and SZ duration MMN was 0.94 ± 0.48 µV smaller than HC [t(47) = 1.97, p = 0.05]. As in the model predicting pitch MMN, RS and schizophrenia diagnosis shared very little variance (<7%) in prediction of duration MMN (tolerance = 0.933).

Table 4 Sequential regression of repetition suppression (RS) and participant diagnosis (Dx) on duration mismatch negativity (MMN) amplitude

Discussion

The results of this study indicate schizophrenia-related deficits in the encoding of prediction error, but not model sharpening, and strongly suggest independence of schizophrenia-related deficits in RS and MMN. RS was not significantly different between groups and RS effect sizes were small (d < 0.3) at stimulation rates typically used in MMN experiments, and correlations between RS and MMN were only found for AEPs that overlap the MMN in time. These correlations can therefore be explained purely mathematically, as MMN is calculated by subtracting the average standard stimulus response from the average deviant stimulus response. Furthermore, although RS significantly predicted MMN, schizophrenia diagnosis predicted MMN amplitude over and above the effect of RS, as indicated by significant R² change, and these variables shared very little variance (<8%) in the prediction of MMN.

We provide evidence that the observed MMN deficits are independent of RS. P50 RS, N100 RS, and P200 RS were clearly evident for both groups and RS progression was similar between groups, with fully suppressed P50 and P200 at S2 and continued suppression of N100 from S2 to S3. Thus, we are confident that schizophrenia-related deficits in the encoding of prediction error in our sample are not driven by deficits in model sharpening. Our findings suggest a specific deficit in detecting incongruity between the modeled and perceived auditory environment in schizophrenia. Individuals with schizophrenia in this study were able to form a predictive model and generated small MMN responses to both pitch and duration deviants, but MMN response amplitudes were only about two-thirds the size of those measured for matched healthy controls (reductions of 38% for pitch and 35% for duration). This finding confirms that prediction error (MMN) is abnormal in individuals with long-term schizophrenia, even under conditions where adaptation of responses to repetitive standard stimuli (RS) is unimpaired.

Contrary to our original predictions, RS measures were not significantly different between groups, and between-group effect sizes were small (d < 0.3). RS deficits have been identified in schizophrenia in many previous studies (Bramon et al. 2004); however, some reports suggest that this deficit is not ubiquitous (Guterman and Josiassen 1994; Clementz and Blumenfeld 2001). The lack of RS deficits observed here may be related to the specific stimulus parameters used. Although the vast majority of human RS studies use a paired-click paradigm to elicit RS effects, we chose a paradigm that more closely matches the MMN task because we were interested in disentangling RS and deviance detection on an MMN task. It is possible that auditory stimulation with 5-tone groups elicits different neurophysiological effects than stimulation with paired clicks. Furthermore, we may have simply been underpowered to detect RS differences in this study. For example, Light et al. (2012) found between-group effect sizes for MMN that were 2–3 times those for P50 and N100. In this study, the MMN differences were quite strong and RS differences were modest at best, so we do not think that limited power is the primary source of the observed effects. Finally, it should be noted that the current design was implemented to maximize comparability of RS and MMN findings, but we are unable to make inferences about RS of fully saturated AEPs (i.e. completely unsuppressed AEPs) in this study. We chose to separate tone groups by only 750 ms here, as opposed to the 5–10 s ITI normally used in studies of RS for paired clicks/tones, to closely match the separation of standard tones by an intervening deviant in the MMN paradigm. Moderate RS is known to occur at this interval (Dolu et al. 2001), so it is likely that the initial tone (S1) was still suppressed to some degree by prior tones. Indeed, Baldeweg (2007) showed that stimulus repetition effects continue to build after the initial few repetitions, while RS was largely resolved after the second or third stimulus in this study. Future studies should compare MMN and RS using ITIs that place the first tone of the group well outside of reported effects of RS.

In conclusion, these results implicate deficits in prediction error processing, but not model sharpening, in SZ at stimulation rates typical of MMN studies. This finding has strong implications for the understanding of perceptual learning deficits in schizophrenia. MMN is severely diminished in SZ and this deficit is correlated with measures of cognitive (Baldeweg et al. 2004) and functional impairment (Light and Braff 2005). We therefore suggest that auditory predictive modeling deficits in schizophrenia are specific to learning from prediction errors.