Introduction

We exchange information with other human beings every day, be it at work or times of leisure, face-to-face or on the telephone. That is, dialogs represent a very common form of communication. However, the syntactic, semantic and prosodic organizing principles as proposed for single sentences are not sufficient to fully capture the structure of utterances spanning beyond sentence borders. For this reason, Halliday (1967) introduced the term ‘information structure’ to account for the linking of sentences beyond ‘punctuation marks’. In a simplistic way, the information structure of a dialog can be divided into parts comprising new or contextually non-derivable information (e.g., contrastive statements) and parts containing information that has already been encountered earlier during an on-going dialog or can be inferred from the context of the conversation. The dialog parts encompassing new or contrastive information are often referred to as ‘information focus’. On the other hand, previously mentioned or contextually deducible information is referred to as non-focus, given or shared information (Chafe 1974). Within a dialog, the proportion and content of focused and non-focused information are subject to constant fluctuation. When new or contrastive information is introduced for the first time, it is most relevant for updating the common ground or shared knowledge between conversation partners and is thus focused. Subsequently, these information units themselves turn into the common ground, non-focused part of the message when they are resumed (Grosz and Sidner 1986). That is, each bit of information is immediately influenced by the context preceding it.

In spoken dialogs, speakers highlight focus positions to indicate their relevance and render them more salient and accessible for listeners. In intonation languages like English, Dutch and German, speakers make use of prosodic means like accentuation to do so (Chafe 1974; Birch and Clifton 1995; Cutler et al. 1997 for review; Féry and Kügler 2008; Ladd 2008).

Studies questioning whether and how rapidly the brain is able to discern focused from unfocused information have shown that adults yield a characteristic event-related potential (ERP) when perceiving focus positions (Bornkessel et al. 2003; Stolterfoht et al. 2007; Toepel et al. 2007; Toepel et al. 2009). This ERP deflection with positive-going amplitude starts to evolve ~300–500 ms after the onset of the focus position and is most pronounced at centro-parietal electrodes. The ERP has been termed Focus Positivity (Bornkessel et al. 2003; Stolterfoht et al. 2007) or Focus Positive Shift (FPS; Toepel et al. 2007; Toepel et al. 2009). When employing auditory stimulation in particular, it was found that the focus positive shift is elicited independent of whether a focus is adequately marked by prosodic means or not (Toepel et al. 2007; Toepel et al. 2009). That is, adults readily exploit the context of an utterance to derive focused information in order to update knowledge that is shared between conversation partners. However, the above-mentioned and other studies (Bögels et al. 2010; Bögels et al. 2011; Li et al. 2008; Li et al. 2010) also revealed an additional ERP, i.e., an N400, when prosodic means were in conflict with the focus structure of utterances, but also when the focus-prosody interplay was in conflict with sentence semantics (Wang et al. 2011). The N400 is an ERP component often reported in relation to inconsistencies or conflicts within (e.g., semantics) and between (e.g., semantics/syntax and prosody) linguistic interpretation levels (Brandeis et al. 1995; Steinhauer et al. 1999; for reviews Friederici and Alter 2004; Hagoort 2008).

Developmental ERP studies of sentence perception have thus far been limited to aspects of semantic, syntactic and prosodic processing in single sentences (Atchley et al. 2006; Hahne et al. 2004; Holcomb et al. 1992; Männel and Friederici 2009; Oberecker and Friederici 2006; Pannekamp et al. 2006). Among these studies, Holcomb and colleagues (1992) and Hahne and colleagues (2004) compared responses of children of varying age under identical semantic and syntactic processing requirements. Holcomb and colleagues (1992) presented six age groups (ranging from 5 to 16 years) with semantically correct and incorrect sentences, and found a quasi-linear decrease in latency and amplitude of the N400 component to the semantic processing conflict. While younger children displayed broadly distributed N400s ranging from anterior to posterior sensors, the ERP effect was more restricted to posterior sites with increasing age. Hahne and colleagues (2004) presented five age groups of children (ranging from 6 to 13 years) with semantically and syntactically correct or incorrect sentences in German. Like Holcomb et al. (1992), they reported latency reductions in the N400 with age for the semantic processing conflict. Moreover, the syntactic processing conflict induced a biphasic pattern of an early left anterior negativity (ELAN) and a late positive-going ERP effect (P600), as usually observed in adults (cf. Friederici 2004), only after the age of seven. As for the N400, latency decreases with age were further found for the biphasic ERP pattern. The results of these studies were taken as evidence that the brain mechanisms underlying language processing are still subject to changes during school years, with adult-like patterns only revealed during late childhood and puberty.

Evidence as to developmental changes during the processing of connected utterances is still sparse. In a functional MRI study, Dapretto et al. (2005) compared behavioral and brain responses in 8-year-olds and adults. Participants listened to short question–answer dialogs, containing two information types (logical reasoning and topic maintenance), and judged them for coherence between question and answer. Under the condition employing logical reasoning, participants were presented with context questions (Why are you wearing a raincoat?) followed by either a logical answer (So I won’t get wet.) or an illogical one (So I won’t get tired.). In the topic maintenance condition, on the other hand, subjects listened to questions (Do you believe in angels?) followed by a topic elaboration (I have my own special angel.) or a sudden topic change (I have my own special sandwich.). Behavioral data showed that children were slower than adults in judging both conditions for coherence. Moreover, children were reliably less accurate than adults when assessing topic maintenance, but only marginally worse than adults during logical reasoning. However, despite the behavioral results, children and adults showed very similar patterns of brain activity. In both age groups, logical reasoning induced strongly left-lateralized fronto-temporal activations while the topic maintenance condition addressed a bilateral fronto-temporal network, yet yielding stronger responses in the right hemisphere. That is, despite behavioral differences, children and adults engaged relatively similar brain networks for the dialog processing tasks. On the other hand, perceiving the varying information types did induce differing behavioral consequences and was also supported by (partly) differing brain regions.

Other developmental research further attested an asymmetry in the acquisition of the prosodic marking of varying information types. In particular, children show an earlier mastery of the production of contrastive information prosody from around 5 years of age, while evidence on the accentuation of new information remains elusive (see Chen 2010 for an extensive discussion). In terms of comprehension, on the other hand, developmental studies most often employed off-line behavioral tasks and instances of contrastive prosody, and remain rather inconclusive as to whether preschoolers are already able to interpret focal accents (Chen 2010). However, a study by Wells et al. (2004) that investigated the interpretation of contrastive accents in four age groups from 5 to 13 years showed a gradual improvement continuing into the teenage years.

The Current Study

To our knowledge, no study has hitherto consistently compared children’s focus perception abilities in the presence and absence of adequate prosodic focus accentuation across different focus types and age groups. In particular, we investigated ERP markers in 12-, 8- and 5-year-old children when processing two types of naturally and frequently occurring information types, i.e., new information focus and contrastive focus in the form of corrections. That is, the study was centered on the question whether children are able to derive focus information by exploiting contextual cues or whether they rely on overt prosodic markings to detect information foci in dialogs. In order to detail the roles of dialog context and prosody in focus perception, both focus types (news and corrections) were presented with adequate and inadequate prosodic realizations. By doing so, we aimed to track the developmental course of brain markers to the perception of information foci, and compare them to our previous findings in adults (Toepel et al. 2007; Toepel et al. 2009).

We hypothesized that the oldest age group (i.e., 12-year-olds) would reveal a Focus Positive Shift (FPS) as a correlate of contextually triggered new information and correction focus positions, as previously shown in adults. As in adults, the FPS should be elicited irrespective of whether the focus position is adequately marked by prosody or not. In the younger age groups, the occurrence of the FPS was expected to vary as a function of focus type (news or corrections) and the presence of adequate prosodic focus markings. However, according to extant behavioral findings (summarized in Chen 2010), an ERP marker to the perception of corrections should be observed earlier during development than to new information processing. As previous ERP studies reported N400 responses even in young children when sources of speech interpretation are in conflict (e.g., Holcomb et al. 1992; Hahne et al. 2004), we further assumed an N400 to occur in each age group whenever focus positions bearing inadequate prosodies are encountered. Yet, the latency of the component was expected to decrease with age (e.g., Hahne et al. 2004).

Materials and Methods

Participants

Three groups of native German-speaking children were investigated: 12-year-olds (n = 31; 15 male, 16 female), 8-year-olds (n = 27; 13 male, 14 female) and 5-year-old preschoolers (n = 36; 21 male, 15 female). The oldest age group was right-handed according to the Edinburgh Handedness Inventory (Oldfield 1971); the younger children were preferentially right-handed according to parental report. None of the children had known neurological or hearing disorders or had been diagnosed at risk for specific language impairments. Written consent for participation was given by the parents of each participant. The children were paid for their cooperation.

Dialog Materials

The dialogs presented were identical to those reported in Toepel et al. (2009). That is, four dialog conditions were formed by combining a context question and an answer as target sentence (see Table 1). Two types of context questions were presented to listeners, i.e., either inducing a new information focus or a correction focus in the target sentence. The question requesting new information contained a wh-pronoun (“Whom did Thomas ask?”) prompting a focus in the consecutive answer (“Thomas did ask Lisa.”). The correction context question, on the other hand, introduced a dialog referent (“Did Thomas ask Anne?”) which was corrected in the successive answer (“Thomas did ask Lisa.”). Each answer was realized with either a context-adequate prosody (i.e., accentuation pattern of new information focus or correction focus, respectively) or context-inadequate ‘common knowledge’ prosody. As detailed in Table 1, the children were presented with four listening conditions, each comprising 40 dialogs: (1) New information focus with adequate prosody, (2) Correction focus with adequate prosody, (3) New information focus with inadequate prosody, and (4) Correction focus with inadequate prosody.

Table 1 Examples of the German dialog materials and quasi-literal translations into English. Target focus positions bearing adequate prosodies are underlined and highlighted by CAPITALS. Focus positions that do not convey an adequate accentuation are only underlined

To produce the dialog prosodies, two trained female speakers of Standard German were asked to mimic a dialog situation in a sound-attenuated booth. Speech recordings were made at a sampling rate of 44.1 kHz (16 bit, mono). Each sentence was saved in an individual file; loudness was subsequently adjusted. Analyses of duration and fundamental frequency (F0) were carried out with the PRAAT software (www.praat.org) for the different target sentence (answer) prosodies. In Table 1 and the following paragraphs we make use of underlining to point out (contextually derivable) focus positions and capital letters to indicate focus positions that are adequately marked by prosodic means.

Duration Analyses

The overall duration of answers bearing new information prosody was shorter than that of sentences with context-inadequate prosody (mean [SD] = 1551 ms [96.7] vs. 1665 ms [114.0]; t (78) = −4.16; P ≤ 0.01). Likewise, the duration of answers carrying correction focus prosody was shorter than the duration of sentences with context-inadequate prosody (mean [SD] = 1572 ms [96.6] vs. 1665 ms [114.0]; t (78) = −3.07; P ≤ 0.01). Focus elements bearing the prosody of new information (“LISA”) were produced with significantly longer durations than their context-inadequate counterparts (mean [SD] = 408 ms [50.0] vs. 362 ms [49.2]; t (78) = 4.21; P ≤ 0.01). Similarly, focus elements carrying the correction focus accentuation (“LISA”) were longer than those produced with a context-inadequate prosody (mean [SD] = 426 ms [58.8] vs. 362 ms [49.2]; t (78) = 5.33; P ≤ 0.01). That is, focus elements of both information focus types were longer in duration when bearing focus prosody than when the same sentence element was realized with a context-inadequate prosody signaling ‘common knowledge’.

F0 Analyses

Figure 1 illustrates the mean F0 course across the 40 target sentences per condition (left: new information focus realized with context-adequate vs. context-inadequate prosody, right: correction focus with context-adequate vs. context-inadequate prosody). For this purpose, the onset, minimal, maximal and offset F0 values for three sentence parts (“Thomas did”, “Lisa” and “ask”) were extracted and averaged. In the position of the focused noun (“LISA”) the new information accentuation (left panel: blue line) is realized with a rising F0 contour. The correction prosody (right panel: green line) is produced with a pronounced falling-rising F0 pattern. The context-inadequate prosody (red line in both panels) on the focused noun (“Lisa”) is realized with a slight fall-rise F0 contour. T-test statistics on the F0 movement over the focused noun revealed that the tonal movement is more pronounced for the context-adequate new information prosody than for the context-inadequate prosody (mean [SD] = 66.63 Hz [29.37] vs. 41.3 Hz [29.97]; t (39) = 3.75, P ≤ 0.05). Also, the tonal movement on the focus element bearing the adequate correction prosody is more prominent than on the prosodically inadequate focus (mean [SD] = 96.88 Hz [47.99] vs. 41.3 Hz [29.97]; t (39) = 6.01, P ≤ 0.05).

Fig. 1

Left Mean F0 course over the dialog target sentences conveying new information foci with adequate prosodies (blue line) or inadequate prosodies (red line). Right Mean F0 course over the target sentences with correction foci bearing adequate prosodies (green line) or inadequate prosodies (red line)

Experimental Procedure

The dialogs were presented to the children in random order via loudspeakers while they were seated in a comfortable chair in front of a computer monitor. The participants were asked to look at the monitor and to listen attentively. Each dialog trial started with the presentation of a context question followed by an answer target sentence after an inter-stimulus-interval (ISI) of 2000 ms. The pause between dialog trials was 3000 ms. For the 12-year-olds, a crosshair was present on the screen during all dialogs to avoid ocular movements. Following each dialog, a blink phase of 3000 ms was indicated by the presentation of a smiley on the screen. After a random number of trials (eight times per experimental session), a question mark signaled that the experimenter would ask a simple comprehension question regarding the dialog heard just before (e.g., “What did Thomas do?”). Pilot recordings in the younger children indicated that the presence of a crosshair was not sufficient to ensure a constant gaze direction to the screen. Thus, a silent movie showing an aquarium was presented on the screen throughout the experiment for the 5- and 8-year-old participants. After a random number of dialog trials (again eight times per session), the movie was interrupted and a question mark signaled that the experimenter would ask the comprehension question. For each age group, an experimental session lasted approximately 30 min (no longer than 90 min including electrode preparation).

EEG Recordings and Analyses

The EEG was recorded from Ag/AgCl cap-mounted electrodes according to the 10–20 system (12y and 8y: 26 channels; 5y: 23 channels) with the system’s ground above the sternum. The vertical electrooculogram (VEOG) was recorded from electrodes placed above and below the right eye. The horizontal electrooculogram (HEOG) was recorded from positions at the outer canthus of each eye. Electrode impedances were kept below 5 kΩ. The EEG was acquired with XREFA amplifiers at a sampling frequency of 500 Hz. Recordings were online referenced to the left mastoid and offline re-referenced to average reference (Murray et al. 2008). Offline, EEG epochs containing eye and muscle artifacts and other noise transients were semi-automatically scanned and rejected. A band-pass filter from 0.1 to 40 Hz was applied to each single subject data set.
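The offline re-referencing step can be illustrated with a minimal sketch. It is not the Cartool implementation used in the study; the function name and the channel values below are hypothetical, and the sketch assumes that the average reference is formed over exactly the channels passed in.

```python
def rereference_to_average(sample):
    """Re-reference one time point to the average of all supplied channels.

    `sample` maps channel labels to voltages (in microvolts) that were
    recorded against a common online reference. Assumption: the average
    is taken over the channels in `sample` only; whether the original
    reference site is added back as an extra zero-valued channel is a
    design choice that the text does not specify.
    """
    mean_voltage = sum(sample.values()) / len(sample)
    return {channel: voltage - mean_voltage for channel, voltage in sample.items()}


# Hypothetical voltages at one time point, for illustration only.
example = {"Fz": 2.0, "Cz": 4.0, "Pz": 6.0}
rereferenced = rereference_to_average(example)
```

After re-referencing, the voltages across the included channels sum to zero at every time point, which is the defining property of an average reference.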

The EEG data were averaged per participant and condition between −100 and 1000 ms relative to the onset of the focused noun. Baseline correction was applied to the time period from −100 to 0 ms relative to the focus position (“Lisa”). In a second step, group averages were computed for each condition across subjects. All EEG analyses were carried out with the Cartool Software (http://sites.google.com/site/fbmlab/cartool).
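The epoching and baseline-correction step described above can be sketched as follows. This is an illustration under stated assumptions, not the Cartool procedure: the function names are hypothetical, the signal is a plain list of voltages for one channel, and the 500 Hz sampling rate is taken from the recording description.

```python
SAMPLING_RATE_HZ = 500  # sampling frequency reported for the recordings


def ms_to_samples(ms, rate_hz=SAMPLING_RATE_HZ):
    """Convert a duration in milliseconds to a number of samples."""
    return int(round(ms * rate_hz / 1000))


def extract_epoch(signal, onset_index, pre_ms=100, post_ms=1000):
    """Cut an epoch from -pre_ms to +post_ms around the focus onset and
    subtract the mean of the pre-onset interval (baseline correction)."""
    pre = ms_to_samples(pre_ms)
    post = ms_to_samples(post_ms)
    start, stop = onset_index - pre, onset_index + post
    if start < 0 or stop > len(signal):
        raise ValueError("epoch exceeds the recorded signal")
    epoch = signal[start:stop]
    baseline = sum(epoch[:pre]) / pre
    return [value - baseline for value in epoch]


def average_epochs(epochs):
    """Average equally long epochs sample by sample (per condition)."""
    return [sum(values) / len(epochs) for values in zip(*epochs)]


# Hypothetical flat signal: after baseline correction every sample is zero.
flat_signal = [5.0] * 1000
epoch = extract_epoch(flat_signal, onset_index=100)
erp = average_epochs([epoch, epoch])
```

At 500 Hz the epoch spans 50 pre-onset and 500 post-onset samples, so each epoch holds 550 values.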

ERP Analysis Strategy

For each age group separately, we first conducted millisecond- and electrode-wise paired t-tests comparing the perception of new information focus realized with adequate versus inadequate prosody and the processing of correction focus with adequate versus inadequate prosody, respectively. Only time periods showing effects (P ≤ 0.05) lasting longer than 30 ms (Guthrie and Buchwald 1991; see also Khateb et al. 2010; Laganaro and Perret 2010) were considered reliable. In line with descriptively observed ERP waveform variations, these periods served as time windows (TW) of interest for the consecutive regions of interest (ROI) statistics.
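The temporal criterion can be made concrete with a short sketch that scans a series of point-wise P values for one electrode and keeps only runs of consecutive significant samples lasting longer than 30 ms. The function name and the example P values are hypothetical. The sketch also assumes that a run of k samples has a duration of k sampling intervals (2 ms each at 500 Hz); the original analysis may have counted duration differently.

```python
def reliable_windows(p_values, alpha=0.05, min_duration_ms=30,
                     rate_hz=500, epoch_start_ms=-100):
    """Return (start_ms, end_ms) for runs of consecutive samples with
    P <= alpha whose duration exceeds min_duration_ms."""
    step_ms = 1000 / rate_hz
    windows = []
    run_start = None
    # A trailing non-significant sentinel closes a run that reaches the end.
    for index, p in enumerate(list(p_values) + [float("inf")]):
        significant = p <= alpha
        if significant and run_start is None:
            run_start = index
        elif not significant and run_start is not None:
            duration_ms = (index - run_start) * step_ms
            if duration_ms > min_duration_ms:
                windows.append((epoch_start_ms + run_start * step_ms,
                                epoch_start_ms + index * step_ms))
            run_start = None
    return windows


# Hypothetical P values: one 40 ms run (kept) and one 20 ms run (discarded).
p_series = [0.5] * 10 + [0.01] * 20 + [0.5] * 5 + [0.01] * 10 + [0.5] * 5
windows = reliable_windows(p_series)
```

Requiring a minimum run length is a simple guard against isolated false positives that arise when many time points are tested.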

For the ROI-wise analyses, ERP mean values were computed for each condition and six lateral ROIs (in accord with the analysis array previously chosen in adults; Toepel et al. 2009). The ROIs computed on the data of the 12- and 8-year-olds were anterior left (FP1, F7, F3), anterior right (FP2, F8, F4), central left (FC3, FT7, C3, T7), central right (FC4, FT8, C4, T8), posterior left (CP5, P7, P3, O1) and posterior right (CP6, P8, P4, O2). In addition, the midline electrodes (FPz, Fz, Cz, Pz) entered the analysis as single electrodes. Due to the lower number of recorded electrodes, ROI contents in the 5-year-olds were slightly different, i.e., anterior left (FP1, F7, F3), anterior right (FP2, F8, F4), central left (FC3, C3, T7), central right (FC4, C4, T8), posterior left (CP5, P7, P3, O1) and posterior right (CP6, P8, P4, O2). In analogy to the older age groups, the midline electrodes (Fz, Cz, Pz) entered the analysis as single electrodes. For the statistics on the lateral electrodes, separate repeated measures ANOVAs for each focus type were conducted with the factors prosody (adequate vs. inadequate), region (anterior, central and posterior) and hemisphere (left and right). For the analysis on the midline electrodes, the ANOVA comprised the factors prosody and electrode.
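The ROI layout lends itself to a small lookup table. The sketch below encodes the lateral ROIs named in the text and averages per-electrode mean amplitudes within each ROI. The dictionary and function names are hypothetical, as are the amplitude values in the example.

```python
# Lateral ROIs for the 12- and 8-year-olds, as listed in the text.
ROIS_OLDER = {
    "anterior_left": ["FP1", "F7", "F3"],
    "anterior_right": ["FP2", "F8", "F4"],
    "central_left": ["FC3", "FT7", "C3", "T7"],
    "central_right": ["FC4", "FT8", "C4", "T8"],
    "posterior_left": ["CP5", "P7", "P3", "O1"],
    "posterior_right": ["CP6", "P8", "P4", "O2"],
}

# For the 5-year-olds only the central ROIs differ (no FT7/FT8).
ROIS_YOUNGER = dict(ROIS_OLDER,
                    central_left=["FC3", "C3", "T7"],
                    central_right=["FC4", "C4", "T8"])


def roi_means(electrode_means, rois):
    """Average per-electrode mean amplitudes (for one condition and one
    time window) within each ROI."""
    return {name: sum(electrode_means[ch] for ch in channels) / len(channels)
            for name, channels in rois.items()}


# Hypothetical amplitudes: 1.0 microvolt everywhere except at C3.
amplitudes = {ch: 1.0 for chans in ROIS_OLDER.values() for ch in chans}
amplitudes["C3"] = 5.0
older_means = roi_means(amplitudes, ROIS_OLDER)
younger_means = roi_means(amplitudes, ROIS_YOUNGER)
```

Because the central ROIs hold four electrodes in the older groups and three in the 5-year-olds, the same electrode value contributes with a different weight to the ROI mean in the two layouts.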

When results of an ANOVA on the lateral electrodes indicated interactions between the factors prosody and region, three hemisphere-independent ROIs were computed for a post-hoc ANOVA, i.e., anterior (comprising the anterior left, midline anterior and anterior right electrodes), central (central left, midline central and central right electrodes), and posterior (posterior left, midline posterior and posterior right electrodes). For dissecting interactions between the factors prosody and hemisphere, separate hemisphere ROIs (left and right) entered a post-hoc ANOVA consisting of mean values across all left-lateral and all right-lateral electrodes.

In addition, the latency of the maximal negative peak at the Pz electrode over the post-stimulus period was identified in each subject and condition, so as to replicate developmental variation in latency of the N400 component (Hahne et al. 2004). Separate one-way ANOVAs were computed for each information type (new vs. correction focus) with age group as between-subject factor. When observing significant effects, independent samples t-tests between two age groups at a time served to reveal the directionality of the effect.
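The latency analysis involves two computations: locating the most negative post-stimulus sample in each epoch, and testing the resulting latencies across age groups with a one-way between-subject ANOVA. The sketch below illustrates both in plain Python. The function names and all data are hypothetical, and the epoch is assumed to start 100 ms before focus onset at a 500 Hz sampling rate.

```python
def negative_peak_latency_ms(epoch, rate_hz=500, epoch_start_ms=-100):
    """Latency (ms after focus onset) of the most negative sample in the
    post-stimulus part of an epoch."""
    step_ms = 1000 / rate_hz
    first_post = int(round(-epoch_start_ms / step_ms))
    post = epoch[first_post:]
    peak_index = min(range(len(post)), key=post.__getitem__)
    return peak_index * step_ms


def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way between-subject
    ANOVA over a list of groups (each a list of values)."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical epoch with a single negative deflection 400 ms after onset.
toy_epoch = [0.0] * 250 + [-3.0] + [0.0] * 299
latency = negative_peak_latency_ms(toy_epoch)

# Hypothetical latencies for three small groups, for illustration only.
f_value, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

Single-sample peak picking is sensitive to noise, so in practice the epochs are often low-pass filtered or the peak is searched within a restricted window; the text does not state whether either was done here.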

Results

In the following, we present the ERP results for the group of the 12-year-olds, the 8-year-olds and the 5-year-olds in succession. Within age groups, the analyses of responses to new information focus (realized with adequate vs. inadequate prosody) will precede the results obtained for the perception of correction focus (marked by adequate vs. inadequate prosody). Difference voltage maps are provided for the time windows (TW) in which effects of condition or interactions of the factor condition with hemisphere or region emerged. Finally, developmental effects on the latency of the N400 are exemplified at the Pz electrode.

Twelve-Year-Olds

Figure 2 shows ERP waveforms, the results of the electrode-wise t-tests and the ERP difference maps (adequate–inadequate prosody) in the group of 12-year-olds when perceiving new information foci (left panel) and correction foci (right panel).

Fig. 2

Responses in 12-year-olds when perceiving dialog foci. a ERP waveforms (low-pass filtered with 7 Hz for display). Left ERPs to new information foci bearing adequate (blue line) versus inadequate prosody (red line). Right ERPs to correction foci adequately (green line) versus inadequately (red line) marked by prosodic means. b Results of the millisecond- and electrode-wise paired t-tests between ERPs to the contextually adequate versus inadequate prosody. c Difference maps over periods revealing prosody-induced statistical differences

New Information Focus:

The perception of new information foci bearing a context-adequate prosody (Fig. 2a, left panel: blue line) elicited a positive shift peaking at ~600 ms after the focus onset (“Lisa”) at central-posterior electrodes. When processing new information foci realized with inadequate prosody (Fig. 2a, left panel: red line), the 12-year-olds revealed a widely distributed negativity peaking ~400 ms. This negativity was immediately followed by a positive ERP modulation most pronounced at central-posterior electrodes.

Initial millisecond-wise paired t-tests across all electrodes (left panel of Fig. 2b) revealed ERP differences between the prosodically adequate and inadequate focus version over the 280–520 ms and the 580–720 ms interval after focus onset. These TW of interest thus entered the ROI-wise ANOVA. In the TW from 280 to 520 ms a main effect of prosody was present at lateral (F (1, 30) = 7.17; P < 0.012) and midline electrodes (F (1,30) = 4.64; P < 0.039). In the TW from 580 to 720 ms a marginal main effect of prosody was evident at lateral electrodes (F (1,30) = 3.92; P < 0.057).

Correction Focus:

The correction foci carrying a context-adequate prosody (Fig. 2a, right panel: green line) elicited a posterior positive ERP starting ~500 ms after the onset of the focus position. In contrast, the focus position realized with an inadequate prosody (right panel: red line) induced a centro-posterior negativity peaking ~400 ms followed by a posterior positive-going ERP shift.

Initial millisecond-wise paired t-tests (right panel of Fig. 2b) indicated ERP modulations over the 280–420 ms, the 460–520 ms, and the 720–900 ms intervals after focus onset. The ROI-wise ANOVA revealed a main effect of prosody in all three TW of interest at lateral electrodes (280–420 ms: F (1,30) = 7.09; P < 0.012; 460–520 ms: F (1,30) = 13.97; P < 0.001; 720–900 ms: F (1,30) = 4.70; P < 0.038). Additionally, a prosody × region interaction was present at lateral electrodes in the TW from 280 to 420 ms (F (2,60) = 4.58; P < 0.029) and 720–900 ms (F (2,60) = 6.58; P < 0.010). Resolving the interaction in the TW from 280 to 420 ms resulted in a main effect of prosody in the anterior (F (1,30) = 4.28; P < 0.047) and posterior ROI (F (1,30) = 5.48; P < 0.026). In the TW from 720 to 900 ms the post-hoc ANOVA revealed a main effect of prosody in the central (F (1,30) = 8.58, P < 0.006) and posterior ROI (F (1,30) = 8.91, P < 0.006).

Eight-Year-Olds

Figure 3 displays the ERP waveforms, the results of electrode-wise t-tests and the ERP difference maps (adequate–inadequate prosody) for the group of 8-year-olds when encountering new information foci (left panel) and correction foci (right panel).

Fig. 3

Responses in 8-year-olds when encountering dialog foci. a ERP waveforms (low-pass filtered with 7 Hz for display). Left ERPs to new information foci adequately (blue line) versus inadequately (red line) marked by prosodic means. Right ERPs to correction foci bearing adequate (green line) versus inadequate (red line) prosody. b Results of the millisecond- and electrode-wise paired t-tests between ERPs to contextually adequate versus inadequate prosody. c Difference maps over periods revealing prosody-induced statistical differences

New Information Focus:

When encountering the focus (“Lisa”) realized with a context-adequate prosody (Fig. 3a, left panel: blue line), the 8-year-olds did not reveal pronounced negative- or positive-going ERP deflections. The perception of new information foci bearing an inadequate prosody induced a centro-posterior negativity peaking ~500 ms (Fig. 3a, left panel: red line). The negative ERP is followed by a late positive shift most pronounced at posterior electrodes and starting ~800 ms.

The initial paired t-tests (left panel of Fig. 3b) indicated ERP differences over the intervals between 470 and 690 ms and 820–930 ms after focus onset. The successive ROI-ANOVA over the TW from 470 to 690 ms yielded an interaction between prosody and hemisphere at lateral electrodes (F (1,26) = 5.10; P < 0.033); the post-hoc test situated a main effect of prosody over the right hemisphere (F (1,26) = 5.57, P < 0.025). In the TW from 820 to 930 ms, interactions between prosody and hemisphere (F (1,26) = 5.97; P < 0.022) as well as prosody and region (F (2,52) = 4.55; P < 0.036) became evident. Resolving the prosody × hemisphere interaction resulted in effects of prosody over left-sided (F (1,26) = 6.80, P < 0.015) and right-sided electrodes (F (1,26) = 5.63, P < 0.025). The post-hoc ANOVA on the interaction between prosody and region located effects of prosody in the anterior (F (1,26) = 4.83, P < 0.037) and posterior ROI (F (1,26) = 4.62, P < 0.042).

Correction Focus:

The perception of correction foci carrying a context-adequate prosody (Fig. 3a, right panel: green line) induced a slow posterior positive-going shift starting ~500 ms after the onset of the focus position (“Lisa”). In contrast, encountering the focus realized with an inadequate prosody (Fig. 3a, right panel: red line) resulted in a centrally distributed negativity peaking ~450 ms. The negative ERP deflection was not followed by a positive-going waveform.

Initial millisecond-wise paired t-tests (right panel of Fig. 3b) revealed ERP differences during the time intervals from 420 to 600 ms, 630–720 and 870–940 ms. The successive ROI-wise analyses yielded a main effect of prosody in all three TWs at lateral electrodes (420–600 ms: F (1,26) = 9.83; P < 0.004; 630–720 ms: F (1,26) = 7.88; P < 0.009; 870–940 ms: F (1,26) = 4.87; P < 0.036).

Five-Year-Olds

Figure 4 illustrates the ERP waveforms, t-tests across all electrodes and the ERP difference maps (adequate–inadequate prosody) when 5-year-old preschoolers perceived new information foci (left panel) and correction foci (right panel).

Fig. 4

Responses in 5-year-olds when perceiving dialog foci. a ERP waveforms (low-pass filtered with 7 Hz for display). Left ERPs to new information foci bearing adequate (blue line) versus inadequate (red line) prosody. Right ERPs to correction foci adequately (green line) versus inadequately marked by prosodic means (red line). b Results of the millisecond- and electrode-wise paired t-tests between ERPs to contextually adequate versus inadequate prosody. c Difference maps over periods revealing prosody-induced statistical differences

New Information Focus:

In 5-year-olds, neither the focus position bearing a context-adequate prosody (Fig. 4a, left panel: blue line) nor the condition realized with inadequate prosody (left panel: red line) evoked a distinctive positive-going ERP deflection. When perceiving a context-inadequate prosody on new information foci, however, the children showed a temporally and spatially widely distributed negativity (left panel: red line).

The millisecond-wise paired t-tests (left panel in Fig. 4b) revealed ERP modulations over the 250–450 ms and the 520–700 ms interval after focus onset. The successive ROI-based ANOVAs in both TWs of interest evinced a main effect of prosody at lateral electrodes (250–450 ms: F (1,35) = 6.83; P < 0.013; 520–700 ms: F (1,35) = 4.30; P < 0.045).

Correction Focus:

The children did not show a distinctive positive-going ERP in relation to the focus position (“Lisa”), irrespective of whether the focus was adequately (Fig. 4a, right panel: green line) or inadequately marked by prosodic means (right panel: red line). On the other hand, the correction foci bearing a context-inadequate prosody (right panel: red line) induced a centro-posterior negativity peaking ~400 ms.

Based on the ERP modulations revealed by the millisecond-wise t-test (right panel of Fig. 4b), ROI-wise ANOVAs were computed over the TWs from 380 to 550 ms and 650 to 850 ms after focus onset. In both TWs a main effect of prosody was present at lateral electrodes (380–550 ms: F (1,35) = 15.72; P < 0.001; 650–850 ms: F (1,35) = 6.25; P < 0.017). Further, an interaction between prosody and hemisphere was apparent in the TW from 380 to 550 ms at lateral electrodes (F (1,35) = 7.46; P < 0.010). Resolving the interaction located a main effect of prosody over the left-sided electrodes (F (1,35) = 12.70; P < 0.001).

Age Effects on the Latency of the Negative-Going ERP to Prosodic Inadequacies

Maximum peak latency measures at the Pz electrode served to investigate latency differences in the negative-going ERP deflections induced by context-inadequate prosodic markings within each focus type (new information vs. correction focus) across age groups. The results of these measures are detailed in Table 2, and ANOVAs with the between-subject factor of age questioned the reliability of latency shifts.

Table 2 Mean latency values [±SD] of the maximal negative peak elicited by processing new information or correction foci conveyed with inadequate prosodies. Values for each age group were extracted from the Pz electrode (i.e., an electrode that consistently yielded modulations across all age groups).

For the negative ERP peak induced by perceiving an inadequate prosody on new information foci, the ANOVA revealed an effect of age on peak latency at Pz (F (2,93) = 7.40; P < 0.001). Post-hoc t-tests confirmed a reliably earlier ERP peak in the 12-year-olds than in the 5-year-olds (t (65) = 3.46; P < 0.001) as well as an earlier negative peak in the 8-year-olds than in the 5-year-olds (t (61) = 3.05; P < 0.003). Maximum peak latency did not, however, differ between the 12- and the 8-year-old children.

The ANOVA on the latency of the negative ERP peak at Pz when encountering an inadequate correction focus prosody also showed an age effect (F (2,93) = 3.29; P < 0.042). Post-hoc t-tests showed that the maximum peak was present earlier in the 12-year-olds than in the 8-year-olds (t (56) = 2.60; P < 0.012) as well as in the 5-year-olds (t (65) = 2.12; P < 0.038). On the other hand, ERP peak latency did not differ between the 8- and the 5-year-old children.
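The two steps of the latency analysis, i.e., extracting the latency of the maximal negative peak from a single-electrode waveform and comparing latencies across groups with a one-way between-subject ANOVA, can be sketched as follows. The function names, the search window and all numerical values are hypothetical toy choices for illustration; they are neither the study's data nor its actual implementation.

```python
from statistics import mean


def negative_peak_latency(times_ms, amplitudes, window):
    """Latency (ms) of the most negative amplitude within window=(lo, hi)."""
    lo, hi = window
    candidates = [(amp, t) for t, amp in zip(times_ms, amplitudes) if lo <= t <= hi]
    if not candidates:
        raise ValueError("no samples fall inside the search window")
    _, latency = min(candidates)  # ties on amplitude resolve to the earlier sample
    return latency


def one_way_anova(groups):
    """F statistic and degrees of freedom for a one-way between-subject ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical single-subject waveform at one electrode (toy values).
times = [300, 350, 400, 450, 500]
waveform = [-1.0, -2.5, -4.0, -3.0, 1.0]
print(negative_peak_latency(times, waveform, window=(300, 500)))

# Hypothetical peak latencies (ms) for three age groups, three children each.
latencies = [[400, 420, 440], [450, 470, 490], [500, 520, 540]]
print(one_way_anova(latencies))
```

A reliable F value only indicates that at least one group differs; pairwise post-hoc tests, as reported above, are needed to locate the difference.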

Summary of the ERP Markers to Focus Perception Across Age Groups

In Table 3, we present an overview of the obtained ERP markers in all age groups. Our results showed that in 12-year-olds the perception of both focus types (news and corrections) resulted in a centro-parietal positive-going ERP starting ~500 ms after the onset of the focus in the target sentence. In keeping with our previous findings in adults using a similar study design (Toepel et al. 2009), we termed the deflection Focus Positive Shift (FPS). As in adults, the FPS in 12-year-olds was elicited irrespective of whether they encountered focus positions that were adequately marked by prosody or not. In contrast, 8-year-old children only revealed an FPS when encountering correction foci, and only when the foci were adequately marked by prosodic means. The youngest age group investigated, i.e., 5-year-olds, did not show FPS responses to new information or correction foci even when the focus positions were marked by prosodic means.

Table 3 Summary of the obtained ERP responses to focus perception in the presence (+) and absence (−) of adequate prosodic focus markings across the three investigated age groups

All three age groups did, on the other hand, reveal negative-going ERPs whenever perceiving new information or correction foci that were not adequately marked by prosodic means. In keeping with our previous results (Toepel et al. 2009) and many other studies introduced above, we propose that this ERP reflects N400 responses. Notably, when 8-year-olds encountered new information foci not adequately marked by prosody, the N400 was followed by a late positive-going ERP. We interpreted this biphasic pattern as an N400-P600 response (see Discussion for reasoning).

Discussion

Our study aimed to investigate ERP markers during the development of language perception beyond sentence borders, i.e., in dialogs. In particular, the study was designed to explore the influence of prosodic highlighting on the recognition of information foci (news and corrections), the latter being a prerequisite for updating information states or shared knowledge in communication. Children of three age groups (12-, 8-, and 5-year-olds) were presented with short question–answer dialogs comprising information foci that were either adequately highlighted by prosodic means or inadequately realized, i.e., without a focus prosody.

We found modulations in the focus-elicited ERPs indicating developmental changes extending into late childhood, i.e., towards a decreased dependence on the prosodic surface realization of information foci, and an increased exploitation of contextual-pragmatic cues for focus recognition. However, the developmental alterations towards adult-like responses did not emerge in the same way for both information types investigated (news and corrections).

Twelve-Year-Olds

In the oldest age group investigated, the perception of new information and correction foci both resulted in a Focus Positive Shift (FPS). In line with findings in adults, the FPS was elicited irrespective of whether the focus position was adequately marked by prosodic means or not. These results indicate that 12-year-olds process focused information independently of its overt prosodic highlighting, and are able to update their state of information by exploiting the dialogic context preceding the target sentence. However, the FPS was preceded by an N400 response whenever the new information or correction foci were not overtly marked by prosodic means. Notably, the dialog target sentences were not prosodically inadequate as such but only with respect to the preceding dialog context.

Similar N400 effects were found in adults when encountering dialog parts that are not adequately marked by prosodic means, i.e. for mismatches between an expected vs. realized prosody (Magne et al. 2005; Toepel et al. 2007). Likewise, prosodic violations within single sentences result in N400 responses in adults (Steinhauer et al. 1999; Eckstein and Friederici 2005; Mietz et al. 2008). Developmental ERP studies on single sentence processing moreover reported evolving N400 patterns for semantic violations (Holcomb et al. 1992; Hahne et al. 2004; Atchley et al. 2006). In line with the ERP waveforms, difference maps (computed for responses to adequate minus inadequate prosody) indicate a broadly more positive-going ERP course over the intervals showing the FPS and N400 responses for both focus conditions when bearing an adequate prosody (Fig. 2c).
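The polarity of the difference maps follows directly from the order of subtraction, which the following sketch illustrates. The electrode labels, amplitudes and time window are hypothetical toy values, and the function names are assumptions introduced for illustration only; the sketch does not reproduce the study's mapping procedure.

```python
def window_mean(times_ms, amplitudes, window):
    """Mean amplitude over all samples falling inside window=(lo, hi)."""
    lo, hi = window
    values = [amp for t, amp in zip(times_ms, amplitudes) if lo <= t <= hi]
    if not values:
        raise ValueError("no samples fall inside the time window")
    return sum(values) / len(values)


def difference_map(adequate, inadequate, times_ms, window):
    """Per-electrode mean amplitude difference: adequate minus inadequate."""
    return {
        electrode: window_mean(times_ms, adequate[electrode], window)
        - window_mean(times_ms, inadequate[electrode], window)
        for electrode in adequate
    }


# Hypothetical grand-average amplitudes (microvolts) at two electrodes.
times = [300, 400, 500]
adequate = {"Cz": [0.0, 0.5, 1.0], "Pz": [0.0, 1.0, 2.0]}
inadequate = {"Cz": [0.0, -1.5, -1.0], "Pz": [0.0, -2.0, -1.0]}

# A negativity in the inadequate condition yields positive difference values.
print(difference_map(adequate, inadequate, times, window=(400, 500)))
```

Because the inadequate condition is subtracted, a more negative response to inadequate prosody (such as an N400) shows up with positive polarity in the map.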

Eight-Year-Olds

Overall, the intermediate age group of 8-year-old schoolchildren revealed substantially varying ERP patterns in response to new information as opposed to correction foci. When encountering correction foci, the children did show a pronounced FPS starting ~500 ms after focus onset, but only when the focus was adequately marked by prosodic means. When perceiving correction foci without adequate prosodic highlighting, the 8-year-olds displayed an N400 similar to the current findings in 12-year-olds and previous ones in adults (Toepel et al. 2009). That is, the children readily recognize that the presented prosodic contour of the dialog answer does not match the contextually to-be-expected focus intonation. However, unlike in older listeners, the N400 in 8-year-olds was not followed by a distinct FPS response that would indicate focus recognition in the absence of prosodic highlighting, i.e., a quasi-mature pattern.

For the perception of new information foci, we did not find distinctive FPS deflections indicating focus detection. That is, even prosodically highlighted news did not elicit the focus-related brain response found in 12-year-olds. However, when 8-year-olds encountered new information foci that were not adequately marked by prosodic means, they showed a biphasic ERP pattern consisting of an N400 and a positive-going ERP starting ~800 ms. While the N400 is assumed to be elicited by the detection of a mismatch between the expected and encountered focus prosody, the late positive-going ERP most likely does not reflect a focus-related FPS, for several reasons. First, the 8-year-olds did not show an FPS response even when perceiving prosodically highlighted new information foci, although this exact combination of contextual and prosodic means provides a much clearer cue towards the detection of a dialog focus. Second, the late positivity begins to evolve ~300 ms later than the FPS that was apparent when 8-year-olds encountered (prosodically highlighted) correction foci. Third, the difference map computed over the time window of the late ERP effect (Fig. 3c, left panel: 820–930 ms) exhibits a reversed polarity compared to the difference map computed over the time window of the FPS effect when correction foci are perceived (Fig. 3c, right panel: 870–940 ms). Jointly, these indices suggest that the observed biphasic ERP pattern to the perception of inadequately marked new information foci most likely represents an N400-P600 sequence, possibly indicating the emerging awareness of 8-year-olds regarding the appropriate prosodic marking of new information foci in information exchange.

In single sentence processing, similar biphasic N400-P600 sequences have been reported in adults when perceiving conflicts between the prosodic and syntactic interpretation level, and were interpreted as brain indices for conflict detection (N400) and a concurrent syntactic reanalysis (P600; Steinhauer et al. 1999; Eckstein and Friederici 2005; Mietz et al. 2008). That is, our findings indicate that the new information foci lacking an adequate prosodic highlighting entail processing conflicts in the 8-year-olds. However, whether the so-termed P600 already represents a precursor of an emerging FPS to focus perception remains speculative here.

Five-Year-Olds

The youngest age group investigated, 5-year-old preschoolers, did not show distinctive FPS responses to either focus type (news or corrections), independent of whether the foci were adequately highlighted by prosodic means or not. In contrast, the perception of both focus types evoked N400 responses whenever the children encountered focus positions that were not adequately marked by prosodic means. In line with the ERP waveforms, difference maps (Fig. 4c) show a broadly more positive-going ERP course (due to the comparison of responses to adequate minus inadequate prosody) when both perceived focus types were realized with an adequate prosody. As the prosodic inadequacy of the dialog target is the result of its information structural relation with the context question, the N400 response indicates that 5-year-old children are able to apprehend the presented dialogs as utterances spanning beyond sentence borders. That is, although the 5-year-olds still do not reveal FPS responses indicative of effective focus recognition, they nonetheless reveal emerging brain indices in favor of information structural processing taking place.

Developmental Variation Across Age Groups

The non-uniform ERP responses to focus perception across the three age groups indicate a developmental course towards adult-like patterns throughout childhood and early adolescence. All age groups revealed characteristic N400 responses when encountering target sentence prosodies that were in conflict with dialog contexts. The ERP difference maps over the respective N400 time windows in all groups accordingly reveal distributed amplitude differences (that appear with positive polarity due to the difference computation context-adequate minus context-inadequate prosody). This finding indicates that all investigated age groups readily apprehend dialogs as sequences of utterances connected beyond sentence borders. On the other hand, the topographic distribution and peak latency of the N400 effect varied depending on whether (prosodically unmarked) news or corrections had been perceived.

In response to such unmarked new information foci, the N400 topography was more widely and frontally distributed in the 5- and 8-year-olds than in the 12-year-olds. Peak latency measures showed a reliably earlier N400 maximum in both older age groups compared to the 5-year-olds. When perceiving correction foci lacking adequate prosodic markings, on the other hand, the N400 topography showed a maximum at central electrode locations in the 5-year-olds, but appeared to be slightly shifted towards posterior sensors with increasing age. Peak latency measures revealed a reliably earlier N400 peak in the 12-year-olds as compared to both younger age groups. Decreases in N400 latency with age have also been found in studies on single sentence processing (Holcomb et al. 1992; Hahne et al. 2004; Atchley et al. 2006). Further, the topographic modulations are in partial accordance with earlier findings (Holcomb et al. 1992; Atchley et al. 2006) showing posterior shifts of the N400 with increasing age and partly more confined responses in older children. However, since our ROI analyses did not consistently reveal effects of region, and since EEG recordings across age groups involved differing numbers of electrode sensors, the observed topographic variation remains descriptive.

In contrast to the N400, the focus-related positive shift (FPS) showed more pronounced and qualitative changes across age. Only the 12-year-olds yielded distinctive FPS deflections irrespective of the encountered focus type and independent of whether the focus was adequately marked by prosodic means or not. That is, only the quasi-adolescents resemble the adult-like FPS pattern observed under identical experimental conditions (Toepel et al. 2009). Eight-year-olds, on the other hand, only showed an FPS in response to prosodically highlighted correction foci. This finding indicates, first, that 8-year-old children still strongly rely on prosodic means to recognize focus positions and are not able to infer an information focus by solely taking contextual cues into account. Moreover, our findings point to a developmental advantage of correction focus over new information focus recognition.

While one reason for the lead of corrections might relate to their more salient prosodic prominence, i.e., an elevated fundamental frequency excursion, an alternative interpretation relates to general differences in the ease of interpreting contrasted as opposed to new information. When encountering an information correction, the focus clearly contrasts with a previously stated alternative, and one alternative from a finite set of possibilities is singled out, likely easing focus accessibility. On the other hand, news foci can in principle comprise an infinite number of entities, restricted in our dialog materials only by the respective question pronoun, possibly rendering focus interpretation more challenging (cf. Chen 2010 for a similar suggestion regarding focus prosody production). Recent data on pupillary dilation as a measure of cognitive resource consumption seem to favor such an account (Zellin et al. in press). Using the same dialog materials as in our current study, that study reported reliably less pupillary dilation in adults when encountering dialogs with prosodically marked correction foci as compared to all other dialog conditions. The finding indicates that prosodically marked corrections require the fewest cognitive resources to be processed, possibly accounting for the ‘developmental advantage’ of this dialog condition observed in our current study.

The youngest age group, i.e., 5-year-olds, did not show any FPS responses even when prosodic highlighting supported focus interpretation. The obtained response pattern likely indicates that 5-year-olds are still insensitive to the importance of focus positions in information update between communication partners. However, the presence of an N400 in young children when perceiving news and corrections lacking prosodic highlighting nonetheless signifies an emerging awareness as to the information structure of utterances spanning beyond sentence borders.

Taken together, 8- and 5-year-old children seem capable of exploiting contextual as well as prosodic cues when processing spoken dialogs. Yet, at these ages children still appear to be limited when it comes to linking these cues in order to recognize information foci, a mechanism that is obligatory for knowledge state updates with communication partners.

In obtaining fine-grained modulations in the brain markers to new and corrective information during development, our findings are somewhat at odds with the fMRI results of Dapretto et al. (2005), which showed very similar cerebral responses in adults and children in discourse processing (i.e., topic maintenance and logical reasoning). Yet, a direct comparison of both studies is limited by paradigmatic differences as well as the fact that our electrode montage does not permit strong speculation regarding likely neural substrates of the observed effects. Several magnetic resonance imaging studies have lately been concerned with brain-structural development. Although these studies still do not convey a comprehensive view on specialization, plasticity and connectivity patterns in brain ontogenesis, they commonly report maturational changes in frontal and temporal-parietal cortices as well as in white matter structures that extend late into adolescence (Paus 2005; Ernst and Mueller 2008 for reviews; see also Brandeis et al. 2011; Dosenbach et al. 2011 for a recent discussion on brain maturation markers in EEG and fMRI). Since processing complex language sequences like dialogs and larger discourse is a challenging task involving an extended brain network (Hagoort and Van Berkum 2007), there is good reason to assume that its proficient accomplishment indeed requires long-lasting brain maturation.

Thus far, the interplay of information focus and prosody during language perception has not been implemented into a common processing model. The Neurocognitive Model of Auditory Sentence Processing (Friederici 2002) and the Dynamic Dual Pathway Model (Friederici and Alter 2004) incorporate, besides a route comprising phonological, syntactic and semantic stages, a prosodic route in single sentence processing. This prosodic route is supposed to be activated in parallel to the aforementioned stages. There is still no consensus as to the temporal convergence of information from the processing routes, yet evidence in favor of early (~200 ms; Eckstein and Friederici 2006) and later interplay (~400 ms; Steinhauer et al. 1999; Eckstein and Friederici 2005) has been presented. On the other hand, the extended Unification Model (Hagoort and Van Berkum 2007) details the influence of sentence- and discourse-level context mostly based on evidence from visual language perception. That is, a comprehensive model integrating information structural processing and influences of prosody therein is still to be developed. Our data indicate that such a model also needs to consider parameter ‘weighting’, as not all cues to language interpretation are equally influential and efficiently used by children throughout the development of communication abilities.

Conclusion

Information processing abilities gradually develop throughout middle and late childhood as revealed by age-varying patterns of the focus-related ERP (FPS) to new information and correction foci that are marked or unmarked by prosodic means. With increasing age, children shift from prosody-dependent focus recognition to a more prosody-independent adult-like processing strategy when encountering spoken utterances extending beyond single sentences. However, even younger children show an N400 response when encountering focus positions that lack overt prosodic highlighting, indicating that they readily apprehend dialogs as utterances spanning beyond sentence borders.