Introduction

While chronometric behavioural experiments allow the end point of the time-course of cognitive processes to be investigated through the analysis of processing costs in terms of reaction time, the continuous measure of brain activity in event-related brain potential (ERP) studies gives direct and temporally precise insight into cognitive processes during their execution. Investigating conceptually driven speech production raises the question of which paradigm to choose in ERP studies. Most ERP studies of speech production during picture naming have avoided overt production because of possible artefacts during motor preparation or execution, and have therefore relied on tasks or ERP analyses that only partially address speech encoding processes.

For instance, metalinguistic tasks associated with go/no go paradigms have been widely used since the first ERP studies on speech production with healthy subjects (Jescheniak et al. 2002; Rodriguez-Fornells et al. 2002; Schmitt et al. 2000; Van Turennout et al. 1998, 1999; Zhang and Damian 2009) and with brain-damaged (aphasic) speakers (Dobel et al. 2001; Hensel et al. 2004). Besides the drawback of using a metalinguistic task to study speech production, the interpretation of such results in terms of the time-course of the encoding processes addressed is problematic. When the latency of the Lateralised Readiness Potential (LRP) associated with a go/no go paradigm is compared across conditions, it reveals the moment at which the information, once available, is transmitted to pre-motor cortex, but not the moment at which the information is processed. This delay means that the latency of the LRP has to be subtracted from the reaction time (key press) in order to infer the time course of the encoding process. However, this computation reveals the moment at which the monitored linguistic information is available, not the moment at which it is processed. The results of monitoring studies are thus informative about the termination of these processes and do not provide direct timing information about the processes themselves (Camen et al. 2010).

Alternatively, ERP studies on word production have used silent or delayed production paradigms during picture naming (Cornelissen et al. 2003; Jescheniak et al. 2003; Laganaro et al. 2009; Vihla et al. 2006). In these studies participants either prepared the word (producing it overtly after a delay falling beyond the analysed period), or they said the word in their mind, therefore also avoiding possible artefacts during motor preparation and execution. Although this kind of paradigm addresses speech production directly (without having recourse to a metalinguistic task), one may wonder whether in delayed or silent production the same processes are executed as in overt production.

Only a few studies used overt production (immediate picture naming), with MEG (Levelt et al. 1998; Maess et al. 2002) or with EEG (Eulitz et al. 2000; Koester and Schiller 2008; Strijkers et al. 2010).

Taken together, these ERP investigations on speech production have led to converging results allowing a good estimation of the main processes underlying single word production during picture naming (Indefrey and Levelt 2004). In the picture naming task, visual processes are estimated to take place from 0 to about 150–175 ms relative to the picture presentation, followed by lexical-semantic (lemma selection) processes until about 275 ms. The encoding of the phonological form is thought to occur between 275 and 400–450 ms after picture onset, followed by phonetic encoding and motor execution.

We have already outlined the difficulty of tapping into the time-course of each process with monitoring go/no go paradigms; with delayed or covert picture naming paradigms, the time course of each process might be modified when speakers do not have to produce a word immediately. Moreover, later processes (phonological and phonetic encoding) might not be completely implemented. It should be emphasised, however, that even ERP studies using an overt production paradigm and analysing epochs relative to picture onset might be missing these encoding processes, especially when the response onset exceeds the time-window considered in the analyses. In fact, in these overt production studies stimulus-aligned epochs have been analysed with a time-period varying from 500 ms (Maess et al. 2002) to 700 ms after picture onset (Koester and Schiller 2008) in order to avoid artefacts due to motor execution. For instance, in the study by Strijkers et al. (2010) the time-window of the analysed ERP was chosen so as to stop 100 ms before the fastest response latency. This implies that for most responses the analysed period stopped well before the subjects started to articulate. Therefore, the capture of some encoding processes might be truncated when analysing a fixed time-window relative to stimulus onset because of inter-subject variability in the speed of speech production. For instance, when analysing data based on stimulus-aligned epochs of 700 ms after picture onset, the end of the data does not capture the same processes in “rapid” subjects (e.g. those with mean production latencies around 800 ms) and in “slower” subjects (e.g. those with production latencies around 1000 ms).
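The truncation problem can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that encoding spans the whole interval from picture onset to articulation onset. The latencies are the example values given above.

```python
def uncovered_ms(window_ms, production_latency_ms):
    """Time (ms) between the end of a stimulus-aligned window and articulation onset."""
    return max(0.0, production_latency_ms - window_ms)


def covered_fraction(window_ms, production_latency_ms):
    """Share of the onset-to-articulation interval that falls inside the window."""
    return min(1.0, window_ms / production_latency_ms)


if __name__ == "__main__":
    # Example values from the text: a 700 ms window, "rapid" and "slower" speakers.
    for label, latency in (("rapid", 800), ("slower", 1000)):
        print(label, uncovered_ms(700, latency), covered_fraction(700, latency))
```

Under these assumptions, a 700 ms window leaves the last 100 ms before articulation unanalysed for a speaker naming at 800 ms, but the last 300 ms for a speaker naming at 1000 ms, so the end of the window does not sample the same processing stage in both.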

The only way of obtaining ERP data covering the whole encoding process during picture naming would be to analyse from picture onset to the beginning of articulation. This implies that the production latency of each item to be analysed is taken into account.

In the present study we seek to analyse overt production during a picture naming task relative to the stimulus and to the response and to compare it to delayed production. This comparison is aimed at analysing which encoding processes are captured with delayed production paradigms.

As a comparison point we manipulated a variable which has repeatedly been reported to affect spoken word encoding processes: word age of acquisition (AoA hereafter). Although lexical frequency has been manipulated in previous neuroimaging studies using picture naming paradigms (fMRI studies by Graves et al. 2007; Kronbichler et al. 2004; ERP studies by Cuetos et al. 2009; Dambacher et al. 2006; Hauk and Pulvermüller 2004; Strijkers et al. 2010), we decided to manipulate word AoA for the following reasons. Lexical frequency was a strong predictor of word production latencies in many studies using word reading (e.g., Gerhand and Barry 1998, 1999; see Andrews and Heathcote 2001 for a review) or picture naming tasks (e.g., Bonin et al. 1998; Jescheniak and Levelt 1994; Oldfield and Wingfield 1965). However, the reliability of lexical frequency effects has been challenged since effects of word AoA have also been reported (e.g., Bonin et al. 2002; Chalard et al. 2003; Morrison et al. 1992; Morrison and Ellis 1995). In particular, when those variables are operationalised following the recommendations made by Zevin and Seidenberg (2002), reliable AoA effects were reported on conceptually driven spoken production (as in picture naming paradigms, Bonin et al. 2004). As we are concerned with conceptually driven speech production (picture naming), AoA is the most appropriate factor to be manipulated.

Two different cognitive loci of AoA effects have been proposed in the literature. Most empirical evidence points to an effect at the level of lexical-phonological encoding. This means that early-acquired word forms are encoded faster than late-acquired ones. This hypothesis is based on the fact that AoA effects were observed neither in lexical-semantic tasks (Chalard and Bonin 2006; Morrison et al. 1992) nor in delayed production tasks (Morrison and Ellis 1995), therefore excluding lexical-semantic encoding levels or motor processes. However, AoA effects have been recently reported with semantic tasks (Belke et al. 2005; Johnston and Barry 2005), supporting the alternative hypothesis of a lexical-semantic locus of AoA effects.

In the present study we take advantage of ERP paradigms to analyse whether lexical-semantic processes, lexical-phonological processes, or both are affected by word AoA in conceptually driven word production (picture naming). We applied waveform analyses and spatio-temporal segmentation analyses to the stimulus-aligned and response-aligned ERPs during overt picture naming, therefore covering the whole of the speech production process. As a comparison point to address the methodological issue of the choice of ERP paradigm, we compared the effects observed in overt (immediate) and delayed picture naming. Standard waveform analysis will provide information on the time-windows in which different conditions (immediate versus delayed production, early- versus late-acquired words) display different amplitudes. The spatio-temporal segmentation will provide more detailed information on the time-windows corresponding to stable configurations of the spatial properties of the electric field (of scalp topography, corresponding to stable periods of functional microstates, see Michel et al. 2009) and their temporal dynamics across conditions. This analysis will indicate which encoding time-window is affected by AoA in immediate and in delayed picture naming and whether the time-windows underlying stable scalp topographies of word encoding processes are postponed in delayed picture naming relative to immediate naming.

Method

Subjects

The participants were 20 students (5 men), native French speakers, aged 20–36 (mean: 25.2). All were right-handed as determined by the Edinburgh Handedness Scales (Oldfield 1971; mean lateralization quotient index: 0.91; range: 0.4–1). They reported having normal or corrected-to-normal vision and did not suffer from any neurological or motor problem. All participants gave their informed consent to participate in the study and were paid for their participation.

Material

A total of 92 words and their corresponding pictures were selected from two French databases (Alario and Ferrand 1999; Bonin et al. 2003). Pre-linguistic and linguistic characteristics were available for all stimuli (pictures and words) in the same databases. We used a double criterion to select our material: adult-rated AoA measures on a 5-point scale and frequency trajectory. Half of the items were early-acquired words (EAW, mean = 1.73, SD = 0.25) and the other half late-acquired words (LAW, mean = 2.72, SD = 0.59, paired t-test across AoA categories: t(90) = −10.35, P < 0.0001). EAW also had a smaller mean frequency trajectory than the LAW (see Table 1, t(90) = −4.121, P < 0.0001). To obtain the frequency trajectory and the cumulative frequency, we used the “adult” measure of frequency taken from LEXIQUE (New et al. 2004) and the “child” measure of frequency (U values) taken from MANULEX (Lété et al. 2004). The frequency trajectory values were computed as the difference between the z-scores associated with the two measures of frequency (LEXIQUE minus MANULEX) and the cumulative frequency measures as the sum of the z-scores (LEXIQUE plus MANULEX). Early- and late-acquired words were matched on cumulative frequency, name agreement (h and percentage values), visual complexity, number of phonological neighbours, and on four sub-lexical variables (length in phonemes, sonority values of the first phoneme, and syllable and phoneme frequencies) (see Table 1).
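The two derived frequency measures can be sketched as follows. This is a minimal illustration rather than the script used for the study: the function names are hypothetical, the z-scores are assumed to be computed over the item set with the sample standard deviation, and any transformation applied to the raw frequencies beforehand is not specified in the text and is therefore left out.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise a list of frequencies (assumption: sample SD, n - 1)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def trajectory_and_cumulative(adult_freq, child_freq):
    """Return (frequency trajectory, cumulative frequency) per item.

    adult_freq: adult frequency per item (LEXIQUE in the study).
    child_freq: child frequency per item (MANULEX U values in the study).
    """
    z_adult = z_scores(adult_freq)
    z_child = z_scores(child_freq)
    trajectory = [a - c for a, c in zip(z_adult, z_child)]  # adult minus child
    cumulative = [a + c for a, c in zip(z_adult, z_child)]  # adult plus child
    return trajectory, cumulative
```

With this sign convention, a word that is relatively more frequent for adults than for children gets a positive trajectory, and a word that is relatively more frequent for children gets a negative one.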

Table 1 Visual, lexical and sub-lexical characteristics of the 92 stimuli

Procedure

The participants were tested individually in a soundproof dark room. They sat 60 cm in front of the screen. The presentation of trials was controlled by the software E-Prime (E-Studio). Pictures were presented in reverse video mode (white lines on grey screen) at a constant size of 9.5 × 9.5 cm (approximately 4.52° of visual angle). A grey screen was used to avoid extreme light exposure. The spoken responses were digitized and recorded so that response latencies (reaction times: RTs hereafter) and accuracy could be checked later.
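For reference, the visual angle follows from the picture size and the viewing distance given above. The sketch below is a worked calculation, not part of the original procedure, and the function names are hypothetical. With a 9.5 cm picture viewed at 60 cm, the angle from the centre of the picture to its edge comes out at about 4.5°, which is consistent with the reported value, while the full extent of the picture is about 9°.

```python
import math


def eccentricity_deg(size_cm, distance_cm):
    """Angle from the centre of a stimulus to its edge (half of its extent)."""
    return math.degrees(math.atan((size_cm / 2) / distance_cm))


def visual_angle_deg(size_cm, distance_cm):
    """Full visual angle subtended by a stimulus centred on fixation."""
    return 2 * eccentricity_deg(size_cm, distance_cm)


if __name__ == "__main__":
    print(eccentricity_deg(9.5, 60))  # centre-to-edge angle in degrees
    print(visual_angle_deg(9.5, 60))  # full extent in degrees
```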

Before the experiment participants were familiarized with the experimental pictures and their corresponding names. There were two experimental conditions: immediate picture naming and delayed picture naming. In both conditions an experimental trial had the following structure: first, a “+” sign was presented for 500 ms. Then, the picture appeared on the screen. The participant had to produce the word corresponding to the picture either immediately when the stimulus appeared on the screen (immediate naming) or after a cue (delayed naming). In both conditions the picture remained on screen for 1000 ms, but in the delayed condition it was followed by a grey screen lasting (randomly) 500 or 1000 ms. Finally, a response cue (question mark) appeared on the screen. Participants were asked to say aloud the name corresponding to the picture as soon as the response cue was presented on the screen and not earlier. Participants performed the two picture naming conditions in a counterbalanced order. In each condition, the 92 pictures were presented randomly, preceded by four fillers.

EEG Acquisition and Pre-analyses

EEG was recorded continuously using the Active-Two Biosemi EEG system (Biosemi V.O.F. Amsterdam, Netherlands) with 128 channels covering the entire scalp. Signals were sampled at 512 Hz with band-pass filters set between 0.16 and 100 Hz.

Epochs of 600 ms were averaged for each subject and AoA condition relative to picture onset for the delayed task. For the immediate task, stimulus-aligned (forward) epochs of 450 ms and response-aligned (backward) epochs of 400 ms were averaged across conditions. Response-aligned epochs started 100 ms before the production latency of each individual trial; stimulus-aligned epochs started at the moment the picture appeared on screen. For the spatio-temporal segmentation analysis (see below) the immediate stimulus-aligned and response-aligned data from each subject were merged according to each individual subject’s RT in each condition. This means that the individual averaged data (and the group grand-average) covered the actual time from onset (picture on screen) to 100 ms before articulation.
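The merging of stimulus-aligned and response-aligned data can be sketched as below. This is only one possible reading of the procedure: the function names are hypothetical, and the way overlapping samples are handled (keeping the stimulus-aligned epoch in full and appending only as much of the end of the response-aligned epoch as is needed to reach RT minus 100 ms) is an assumption, since the text does not specify it.

```python
SAMPLING_RATE_HZ = 512  # acquisition rate reported above


def ms_to_samples(duration_ms, rate_hz=SAMPLING_RATE_HZ):
    """Convert a duration in ms to a whole number of samples."""
    return int(round(duration_ms * rate_hz / 1000.0))


def merge_epochs(stim_aligned, resp_aligned, mean_rt_ms, pre_response_ms=100):
    """Merge averaged stimulus- and response-aligned ERPs for one subject.

    stim_aligned, resp_aligned: lists of channels, each a list of samples.
    The response-aligned data are assumed to end pre_response_ms before
    articulation, so the merged data run from picture onset to that point.
    ASSUMPTION: overlap is removed by trimming the start of resp_aligned.
    """
    target = ms_to_samples(mean_rt_ms - pre_response_ms)
    n_stim = len(stim_aligned[0])
    n_resp = len(resp_aligned[0])
    n_from_resp = target - n_stim
    if n_from_resp < 0 or n_from_resp > n_resp:
        # Either the RT is shorter than the stimulus-aligned epoch, or the two
        # epochs would leave a gap; neither case is described in the text.
        raise ValueError("RT cannot be covered by the two epochs")
    merged = []
    for stim_ch, resp_ch in zip(stim_aligned, resp_aligned):
        merged.append(stim_ch + resp_ch[n_resp - n_from_resp:])
    return merged
```

For a subject with a mean RT of 800 ms, for example, the merged data would span 700 ms: the full 450 ms stimulus-aligned epoch followed by the last 250 ms of the response-aligned epoch.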

In addition to an automated selection criterion rejecting epochs with amplitudes reaching ±100 μV, each trial was visually inspected, and epochs contaminated by eye blinks, movements or other noise were rejected and excluded from averaging. ERPs were then bandpass-filtered to 0.2–30 Hz and recalculated against the average reference. After rejection of errors and of contaminated epochs a minimum of 32 epochs were averaged per subject for each AoA condition. Since epochs were analysed until 100 ms before articulation, we also performed an analysis with a 0.2–20 Hz bandpass filter in order to eliminate most EMG contamination (Goncharova et al. 2003; Whitham et al. 2007). As highly similar results were observed across all analyses, we only present the results with the 30 Hz low-pass filter, as in most cognitive ERP studies.
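The automated part of the rejection procedure amounts to a simple amplitude check, sketched below. The function names and the data layout (an epoch as a list of channels, each a list of samples in μV) are assumptions made for illustration; the visual inspection step has no code equivalent here.

```python
def exceeds_threshold(epoch, threshold_uv=100.0):
    """True if any sample on any channel reaches +/- threshold_uv microvolts."""
    return any(abs(sample) >= threshold_uv
               for channel in epoch
               for sample in channel)


def reject_by_amplitude(epochs, threshold_uv=100.0):
    """Keep only the epochs that stay within the amplitude criterion."""
    return [epoch for epoch in epochs
            if not exceeds_threshold(epoch, threshold_uv)]


def enough_epochs(epochs, minimum=32):
    """Check the minimum number of epochs per subject and AoA condition."""
    return len(epochs) >= minimum
```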

Immediate RT Analyses

After elimination of errors, latencies of vocal responses (ms separating the onset of the picture and articulation onset) were systematically checked with speech analysis software (Boersma and Weenink 2007), thanks to an inaudible acoustic click at the onset of the picture recorded on the second track of the recording system. Spoken latency data were fitted with linear regression mixed models (Baayen et al. 2008) with the R-software (R-development core team 2007; Bates and Sarkar 2007). AoA (EAW versus LAW) was included in the mixed model as a fixed effect variable and participants and items as random effect variables. We controlled by-participants and by-items random adjustments to intercepts and by-participants random adjustments to slopes for the fixed effect of AoA. Likelihood ratio tests were used in order to select the most appropriate model (Pinheiro and Bates 2000; Baayen et al. 2008).

ERP Analyses

The ERPs were first subjected to waveform analysis to determine the time periods where amplitude differences were found between conditions. This analysis was performed on all electrodes and data-points. However, amplitude variations of ERP traces can follow from a modulation in the strength of the electric field, from a topographic change of the electric field (revealing distinguishable brain generators), or from latency shifts of similar brain processes. To differentiate these effects, we applied topographic analyses (spatio-temporal segmentation). This approach makes it possible to summarize ERP data into a limited number of topographical map configurations and to identify time periods during which different conditions (immediate and delayed production, early- and late-acquired words) evoke different configurations of the electric field at the scalp. Any change of the spatial configuration of the electric field on the scalp is interpreted as revealing a difference in the distribution of the underlying intracranial sources.

Waveform and Global Field Power Analyses

Waveform analysis was carried out in the following way: paired t-tests were computed on amplitudes of the evoked potentials between conditions (delayed versus immediate naming and early- versus late-acquired words) at each electrode and time point (every 2 ms) over the whole period. Only differences over at least five electrodes from the same region out of six regions (left and right anterior, central, posterior) extending over at least 20 ms were retained, with an alpha criterion of 0.01. For differences in global field power (GFP, or standard deviation of all electrodes at a given time, see Lehmann and Skrandies 1984), paired t-tests were computed on the GFP between conditions at each time-frame, with an alpha criterion of 0.01 and a time-window of 20 ms of consecutive significant difference.
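Two ingredients of this analysis lend themselves to a short sketch: the GFP at one time frame and the temporal criterion that keeps only sustained differences. The code below is illustrative and makes several assumptions: the function names are hypothetical, the per-frame significance flags (P < 0.01) are taken as already computed by the paired t-tests, and 20 ms is treated as 10 consecutive time frames at the 512 Hz sampling rate.

```python
from statistics import pstdev


def global_field_power(potentials):
    """GFP at one time frame: standard deviation across all electrodes."""
    return pstdev(potentials)


def sustained_runs(significant, min_frames=10):
    """Return (start, end) index pairs, end exclusive, for every run of
    consecutive significant frames lasting at least min_frames."""
    runs = []
    start = None
    for i, flag in enumerate(significant):
        if flag and start is None:
            start = i  # a run of significant frames begins
        elif not flag and start is not None:
            if i - start >= min_frames:
                runs.append((start, i))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(significant) - start >= min_frames:
        runs.append((start, len(significant)))
    return runs
```

The same run-length filter could be applied per electrode for the amplitude analysis; the additional requirement of at least five electrodes from the same region is not implemented in this sketch.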

Topographic Pattern Analysis

The second analysis was a topographic (map) pattern analysis. This method is independent of the reference electrode (Michel et al. 2001, 2004) and insensitive to pure amplitude modulations across conditions (topographies of normalized maps are compared). A modified hierarchical clustering analysis (Michel et al. 2001; Pascual-Marqui et al. 1995), the agglomerative hierarchical clustering (Murray et al. 2008), was used to determine the most dominant configurations of the electric field at the scalp (topographic maps). A modified cross-validation criterion was used to determine the optimal number of maps that best explained the group-averaged data sets across conditions. Statistical smoothing was used to eliminate temporally isolated topographic maps with low strength. This procedure is described in detail in Pascual-Marqui et al. (1995). Additionally, a given topography had to be present for at least 10 time frames (20 ms). We first applied a spatio-temporal segmentation to the four grand-average data sets (delayed early- and late-acquired and immediate early- and late-acquired words). Then, the pattern of map templates observed in the averaged data was statistically tested by comparing each of these map templates with the moment-by-moment scalp topography of individual subjects’ ERPs from each condition. Each time point was labelled according to the map with which it best correlated spatially, yielding a measure of map presence. This procedure, referred to as ‘fitting’, made it possible to establish how well a cluster map explained individual patterns of activity (GEV: Global Explained Variance) and its duration. These analyses were performed using the Cartool software (http://brainmapping.unige.ch/Cartool.php).
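The fitting step can be sketched in a few functions. This is a simplified illustration and not the Cartool implementation: the function names are hypothetical, polarity is assumed to be respected (signed spatial correlation), no temporal smoothing or minimum-duration constraint is applied, and GEV is computed with the commonly used formula in which each frame's squared spatial correlation with its assigned map is weighted by the squared GFP of that frame.

```python
import math


def _centre(values):
    """Re-reference one map to the average of its electrodes."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def spatial_correlation(map_a, map_b):
    """Pearson correlation across electrodes between two scalp maps."""
    a, b = _centre(map_a), _centre(map_b)
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


def gfp(values):
    """Global field power of one map."""
    c = _centre(values)
    return math.sqrt(sum(v * v for v in c) / len(c))


def fit_templates(erp_frames, templates):
    """Label each time frame with the index of the best-correlating template."""
    labels = []
    for frame in erp_frames:
        corrs = [spatial_correlation(frame, t) for t in templates]
        labels.append(max(range(len(templates)), key=corrs.__getitem__))
    return labels


def map_statistics(erp_frames, templates, labels, map_index,
                   frame_ms=1000.0 / 512):
    """Duration (ms) and GEV of one template in one subject's ERP."""
    total = sum(gfp(f) ** 2 for f in erp_frames)
    explained = 0.0
    n_frames = 0
    for frame, label in zip(erp_frames, labels):
        if label == map_index:
            n_frames += 1
            c = spatial_correlation(frame, templates[map_index])
            explained += (gfp(frame) * c) ** 2
    gev = explained / total if total else 0.0
    return n_frames * frame_ms, gev
```

The per-subject durations and GEV values returned by a procedure of this kind are the measures that enter the statistical comparison between conditions.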

In order to analyse whether one map is more representative of one condition or whether it lasts longer in one condition, GEV and durational measures observed in each subject’s data were used for statistical analysis. Analyses of variance were applied to these measures with subjects as random variable and conditions as fixed factors. This approach has been used in other cognitive domains (Britz et al. 2009; Murray et al. 2006; Schnider et al. 2007) as well as with language data (Camen et al. 2010; Laganaro et al. 2009).

Results

Behavioural Results (RTs)

In the immediate production, early-acquired words (EAW) were produced 26 ms faster than late-acquired words (LAW) (see Table 2). This effect of AoA is significant (fitted with mixed models: t(1618) = 4.360, P < 0.0001).

Table 2 Production latencies (RT) in ms (mean and standard deviation) for early acquired (EAW) and late acquired words (LAW)

ERP

Immediate Versus Delayed Production

Figure 1 shows the time points of significant amplitude differences between immediate and delayed naming over the 450 ms after picture onset. Different amplitudes are observed in the first 120 ms, and around 200 ms on electrodes from different regions of the scalp. More consistent differences in amplitude appeared from about 310 ms on posterior left and right electrodes and after 420 ms on electrodes from all over the scalp. Only in this latter time-window were consistent significant differences also observed on GFP.

Fig. 1 Significant differences (P values, at P < 0.01 and P < 0.001) on ERP waveform amplitude on each electrode (Y axes) and time point (X axes) between the immediate and delayed task (only differences over at least five electrodes from the same region extending over at least 20 ms are displayed) and results of statistical analysis (1 − P values) on global field power (GFP). Bottom: Group averaged ERP waveforms in the immediate and delayed production. Negative amplitudes are plotted in the upward direction. In the lower right corner of the figure, the arrangement of the 128 electrodes with the electrode position of the displayed waveforms is presented

The spatio-temporal segmentation applied on the averaged data of early- and late-acquired words in the delayed and immediate (stimulus- and response-aligned) conditions revealed 8 different topographies accounting for 94.8% of the variance (see Fig. 2).

Fig. 2 Temporal distribution of the cortical maps revealed by the spatio-temporal segmentation analysis on the grand-averages for early and late acquired words in the immediate and delayed condition. Topographical maps displaying significant differences between tasks or conditions and their specific time-windows are highlighted in black, grey, red and green (Color figure online)

The same sequence of topographic maps is observed until about 350 ms in immediate and delayed naming. The map template labelled as “E” (Fig. 2) starts around 230 ms in both conditions, but lasts 100 ms longer in the delayed condition. This difference is validated by the results of the fitting procedure applied to the individual delayed and immediate data in two time-windows: from 0 to 200 ms and from 200 to 600 ms. The fitting procedure indicated no difference in the 0–200 ms time-window on map presence or GEV, while the fitting in the following period confirmed the longer duration and higher GEV (respectively 31 and 15% of GEV) of Map E in the delayed naming condition relative to the immediate naming (duration: t(19) = 2.25, P < 0.05; GEV: t(19) = 2.35, P < 0.05). The opposite pattern was observed for map template F (Fig. 2): longer duration and higher GEV (respectively 44 and 18%) in the immediate than in the delayed production data (duration and GEV: t(19) = −3.2, P < 0.01). The presence of different maps (map G and map H) across the two tasks was confirmed in the fitting in the last 100 ms of the analysed periods by the interaction between map and condition (F(1,19) = 13.4, P < 0.01 on duration and F(1,19) = 16.96, P = 0.001 on GEV) and by the number of subjects whose data displayed this map in the fitting (χ2 = 9, P < 0.001).

Age of Acquisition Effects

The time points of significant amplitude differences between AoA conditions are shown in Fig. 3 for the immediate and the delayed conditions.

Fig. 3 Significant differences (over at least five electrodes from the same region extending over at least 20 ms are displayed at P values P < 0.01 and P < 0.001) on ERP waveforms amplitude on each electrode (Y axes) and time point (X axes) between early and late acquired words and results of statistical analysis (1 − P values) on global field power (GFP) for each condition and group averaged ERP waveforms in each condition. a Immediate production for the stimulus-aligned and response-aligned analysis. b Delayed production condition

In the immediate naming condition (Fig. 3a), different amplitudes between early and late acquired words appeared at 120–140 ms on a few electrodes from central left regions, between 220 and 240 ms on anterior right and central left regions at scalp and between 320 and 350 ms at anterior right and posterior left electrodes. In the response-aligned ERPs analysis different amplitudes are observed around 150 ms and around 280 ms before articulation. Different GFP between early- and late-acquired words also appeared in the same two early time-windows as different amplitudes as well as around 370–390 ms and around 200 ms before articulation.

In the delayed naming task (Fig. 3b), different amplitudes between early- and late acquired words were observed in the period between 120 and 140 ms on posterior left and right electrodes. Different GFP between early- and late-acquired words also appeared in the same time-window, with an additional difference around 400 ms. A later difference was observed at the end of the epochs (around 550 ms) on amplitudes and GFP.

In the spatio-temporal segmentation analysis, the same topographies and the same durations of stable topographic maps appeared across AoA conditions in the immediate naming task in the first fitting period (see Fig. 2). The second fitting period (from 200 ms to RT) yielded a 32 ms difference in the duration of map template F between early- and late-acquired words (EAW: 152 ms, LAW: 184 ms, t(19) = −3.47, P < 0.001). The shift in the first onset of Map G was also significant (t(19) = −2.15, P < 0.05), but its duration was similar across conditions (t < 1).

No differences were observed on topographic map distributions in the comparison between early- and late-acquired words in the delayed condition.

Discussion

In the present study we investigated the time-course of word Age of Acquisition effects during picture naming by integrating stimulus-aligned and response-aligned ERPs in overt picture naming and by comparing immediate and delayed production. We will discuss the effect of the manipulated variable first, and then draw the consequences for the choice of ERP paradigms in speech production studies.

AoA Effect

Word Age of Acquisition did not affect the ERPs in the delayed naming task except for a very early (~130 ms) difference on amplitudes without modulation of the distribution of topographic maps.

In the immediate naming task, AoA modulated waveform amplitudes and the sequence of topographic patterns.

As explained in the “Introduction”, two possible loci of AoA effects in picture naming have been suggested: lexical-semantic (Belke et al. 2005; Johnston and Barry 2005) and lexical phonological encoding processes (Chalard and Bonin 2006; Morrison and Ellis 1995; Morrison et al. 1992). Different amplitudes between early- and late-acquired words were observed in four distinct time periods. Very early (around 120–140 ms after picture onset) and very late (150 ms before RT) differences were unexpected as they do not fall within any of the time-windows associated with the hypothesised processes affected by AoA. As most factors indexing visual processes (Image Agreement and Visual Complexity, see Alario et al. 2004) were controlled in our material, we have no clear explanation for these early differences. On the other hand, different amplitudes observed 150 ms before articulation may indicate that some characteristics of motor encoding modulated the strength of the electric field across LAW and EAW. We have no clear explanation for this result either, but it should be emphasized that these very early and very late differences were observed only on amplitudes without modulation of the topographical configuration or its duration (see below).

The other differences in amplitudes fall in two time-windows associated with the two hypothesized loci of AoA effects. The modulation of amplitudes observed between 220 and 250 ms falls within the time-window estimated for lexical-semantic encoding processes (Indefrey and Levelt 2004); it therefore seems to support the hypothesis of a lexical-semantic locus of AoA effects (e.g., Belke et al. 2005; Johnston and Barry 2005). However, further differences in amplitudes between early- and late-acquired words also appeared in a time-window corresponding to lexical-phonological encoding (330–350 ms after picture onset and around 280 ms before RT). It therefore seems impossible to distinguish among possible loci of AoA effects on the basis of waveform analyses alone.

By contrast, a unique time-window differentiates early- from late-acquired words if consistent differences observed across all analyses are considered. In the spatio-temporal analysis, the same scalp topographies appeared in the two AoA conditions, but a longer period of stable topography characterised the late-acquired words data relative to early-acquired words. This difference in duration was observed on a stable topographical map originating around 350 ms and lasting 32 ms longer for late-acquired words. As a consequence, the beginning of the following stable electric field configuration (map G in Fig. 2) was shifted (appeared later for late-acquired words).

Therefore, the joint results from waveform analysis and from spatio-temporal segmentation seem to be in line with the lexical-phonological hypothesis of AoA effects, as both indicate differences between early- and late-acquired words falling within the time-window that has previously been associated with lexical-phonological processes during picture naming (i.e., the time-window between 275 and 450 ms, Indefrey and Levelt 2004). The duration of the stable topography during this time-period (map F in Fig. 2) was on average 32 ms shorter for early-acquired words than for late-acquired words, suggesting that lexical-phonological encoding processes take longer for late-acquired words than for early-acquired words. Crucially, this result can be directly linked to the significant 26 ms difference observed in the behavioural results, as the difference in map duration between AoA conditions is very close to the difference observed in RTs.

In sum, AoA modulated response latencies (early-acquired words were produced faster than late-acquired ones), indicating an increased processing time-cost for late-acquired words, which can be linked to the longer duration observed on a stable topography falling within the time-window associated with phonological encoding. Amplitudes also differed in time-windows corresponding to the beginning and to the end point of this period of stable topographical map.

To our knowledge only two studies have explored AoA effects with ERPs, one with a lexical decision task (Tainturier et al. 2005), the other with a word reading task (Cuetos et al. 2009). Cuetos et al. also suggested a lexical-phonological locus for their AoA effect. However, these results are hardly comparable with our data as their (silent) word reading task does not entail conceptually driven word production as in a picture naming task (see the “Introduction”). By contrast, we can attempt an integration of our results with ERP studies analysing the effect of lexical frequency in picture naming tasks (Strijkers et al. 2010). Indeed, according to the points raised in the “Introduction”, both variables (lexical frequency and AoA) affected picture naming latencies in previous behavioural investigations, and they might be linked or confounded, especially when they are not adequately manipulated (Zevin and Seidenberg 2002, 2004).

Strijkers et al. (2010) analysed word frequency in bilingual speakers during an overt (immediate) picture naming task. They reported an early lexical frequency effect on amplitudes around 180–200 ms. According to the estimation made by Indefrey and Levelt (2004; see the “Introduction”), this time-period corresponds to lexical-semantic processes. Thus, it seems that lexical frequency affects lexical selection while AoA has a major impact on phonological encoding. This interpretation is in line with results suggesting a lexical-phonological effect of AoA while lexical frequency was found to affect several encoding levels (Kittredge et al. 2008). Further ERP investigation with factorial manipulation of AoA and lexical frequency may help to confirm this hypothesis.

Delayed Versus Immediate Overt Production

Converging results from the waveform analyses and the spatio-temporal segmentation analysis indicate that immediate and delayed naming consistently diverge after 300 ms. Both waveform and topographical differences appeared from around 300 ms until the end of the analysed time-period. Crucially, results indicated identical scalp topographies across immediate and delayed naming until 350 ms. The stable electric field configuration starting around 230 ms in both tasks lasted longer in delayed production than in immediate production, where it terminated around 350 ms. Only the last 100–150 ms of the analysed periods (corresponding to the 200 ms preceding articulation in immediate production) were characterised by different topographical maps in the two tasks.

These results partially converge with those reported by Eulitz et al. (2000). In that study, subjects had to produce a noun phrase (an adjective plus a noun) either in an overt production or in a covert (silent) production condition. These two conditions diverged after 400 ms post-stimulus onset. Taken together with ours, these results indicate that overt production, covert production and delayed production might activate the same neuronal network during the first 350–400 ms following picture onset. The 50 ms difference between the two studies may be attributed to the different tasks: covert production may share more processes with overt production than delayed production does. Alternatively, the difference might be due to the different length of the produced utterance (single word versus two words). The earlier amplitude modulations observed in our study, which did not appear in the study by Eulitz et al., may also be attributed to these differences across studies.

Based on the estimations from previous studies on single word production (see the Introduction), the time-window common to delayed and immediate production corresponds at least to the processes preceding phonological encoding (visual-conceptual processes and lemma encoding), and it extends into the beginning of lexical-phonological encoding (estimated to start around 270 ms). It therefore seems that subjects do start to encode the word form in a delayed production task, but that this process takes longer or is not completed when they are instructed to produce the word after a short delay. These results also indicate that studies using delayed production have captured encoding processes only up to the beginning of phonological encoding. Therefore, only immediate production with stimulus- and response-aligned ERPs makes it possible to track the whole of phonological and phonetic encoding in real time.

Conclusion

Considering stimulus-aligned and response-aligned ERPs in the immediate naming condition allowed us to analyse which periods of stable scalp topography are modulated by AoA and lead to longer RTs in one condition relative to the other. The joint waveform and topographical analysis revealed that AoA modulated both the amplitudes and the duration of a stable scalp topography in a time-window corresponding to phonological encoding. The stable topographical map starting around 350 ms after stimulus presentation lasted around 30 ms longer for late-acquired than for early-acquired words, which corresponded to the differences observed in production latencies. These results seem to support a phonological encoding locus for the main AoA effects. In addition, the comparison between immediate and delayed conditions indicates that phonological encoding processes are not entirely captured with delayed production paradigms, nor with immediate production analyses limited to stimulus-aligned ERPs. As a consequence, immediate production paradigms with response- and stimulus-aligned ERPs should be preferred for analysing phonological and phonetic encoding processes in real time.