One of the most replicated findings in the literature on visual-word recognition is that word identification times are faster (and more accurate) for high-frequency words than for low-frequency words (see Forster & Chambers, 1973; Preston, 1935; Rubenstein, Garfield, & Millikan, 1970; Solomon & Postman, 1952, for early evidence). Similarly, during normal reading, fixation durations are shorter for high-frequency words than for low-frequency words (e.g., Inhoff & Rayner, 1986; Rayner & Duffy, 1986). For decades, word-frequency (WF; i.e., the number of times a word appears in a lexical database) has been considered the most important lexical factor in visual-word recognition and reading, and it plays a pivotal role in all computational models of visual-word recognition (e.g., the resting level of activation of word units in interactive activation models is a function of word-frequency; see Davis, 2010; Grainger & Jacobs, 1996; McClelland & Rumelhart, 1981) as well as in all leading computational models of eye movement control during reading (e.g., EZ-Reader model: Reichle, Pollatsek, Fisher, & Rayner, 1998; SWIFT model: Engbert, Nuthmann, Richter, & Kliegl, 2005).

In an influential study, Adelman, Brown, and Quesada (2006) reported that “contextual diversity” (CD), which was defined as the proportion of contexts (documents) in which a word appears in a lexical database, was a better predictor of word identification times than WF in two widely used behavioral tasks (lexical decision and naming). In recent years, the effect of CD has received increasing attention in the field of word recognition. The basic finding is that the higher the number of contexts in which a word appears, the faster the word identification times (see also Cai & Brysbaert, 2010; Dimitropoulou, Duñabeitia, Avilés, Corral, & Carreiras, 2010; Perea, Soares, & Comesaña, 2013; Soares et al., 2015, for converging evidence). This effect is not restricted to single word identification tasks. During sentence reading, fixation durations are shorter for higher CD words than for lower CD words matched in WF (Plummer, Perea, & Rayner, 2014).

A fundamental and unanswered issue is to clarify the nature of the CD effect. In this experiment we aim to address this question by using a highly sensitive experimental tool: the recording and analysis of the event-related potentials (ERPs). As CD and WF tend to be correlated (i.e., high-frequency words tend to be words that appear in many contexts and vice versa), one might argue that they essentially reflect the same underlying structural processes: Each exposure to a word will influence its accessibility, allowing it to be processed more quickly. (Note that WF reflects the raw number of occurrences, whereas CD filters out repeated encounters of the word in the same document.) This interpretation would have few implications for models of visual-word recognition and reading. As indicated by Plummer et al. (2014), “models could easily substitute word-frequency with contextual diversity without any serious theoretical implications” (p. 280). Alternatively, one might argue that CD effects have a semantic origin. Adelman et al. (2006) indicated that “whereas WF is subject to effects of structural variables, CD seems more likely to be influenced by semantic variables” (p. 816) and “temporal, as well as semantic aspects of context, contribute to the CD effect” (p. 822). In latent semantic analysis (LSA; Landauer, 2001), a psychological model intended to explain the learning and representation of words and other sources of knowledge, the meaning of a word is conceptualized as “an irreversible mathematical melding of the meanings of all the contexts in which it has been encountered” (Landauer, 2001, p. 1). Within this framework, two uses of the same word are never identical in meaning, as their precise connotation in each case depends on the immediate linguistic and environmental context. Therefore, CD may well have a crucial impact on the way meaning is built for that particular word. In this line, Hoffman, Lambon Ralph, and Rogers (2013) claimed that words that appear in a wide range of diverse contexts might be more variable in meaning than the words that appear in a restricted set of contexts. In other words, higher CD words could be semantically richer than lower CD words.

How can we tease apart the “lexical/structural” versus “semantic” accounts of the CD effect? Word-recognition experiments that only collect behavioral data cannot be used to disentangle the two explanations proposed for the CD effect since both lexical/structural and semantic manipulations produce facilitative effects. That is, high-frequency words yield shorter response times than low-frequency words in word recognition experiments. Likewise, semantically richer words produce shorter response times than semantically poorer words (number of semantic features, number of semantic associates: Buchanan, Westbury, & Burgess, 2001; Duñabeitia, Avilés, & Carreiras, 2008; Pexman, Hargreaves, Siakaluk, Bodner, & Pope, 2008; Pexman, Lupker, & Hino, 2002; Rabovsky, Sommer, & Abdel Rahman, 2012; Yap, Pexman, Wellsby, Hargreaves, & Huff, 2012; number of senses/meanings: Borowsky & Masson, 1996; Rodd, Gaskell, & Marslen-Wilson, 2002; Rodd, 2004; Woollams, 2005; Yap, Tan, Pexman, & Hargreaves, 2011; concreteness: Kanske & Kotz, 2007; Kounios & Holcomb, 1994; Schwanenflugel, 1991, but see Barber, Otten, Kousta, & Vigliocco, 2013). Thus, a facilitative effect of CD in the word identification times can be readily accommodated by the two accounts.

The ERPs have the potential to provide a critical measure of neural processing (time course, amplitude, and scalp distribution) related to the underlying cognitive processes of the CD effect. Many studies have investigated the temporal dynamics of lexical and semantic influences during word recognition, mainly focusing on the N400 component. The N400 is a negative deflection starting around 200 ms and reaching its peak amplitude around 400 ms after stimulus onset, which is maximal over centro-parietal electrode sites. For words presented in isolation, the N400 has been associated with lexical-semantic processing and the modulation of its amplitude reflects processing costs during the retrieval of properties associated with a word form stored in memory (Holcomb, Grainger, & O’Rourke, 2002; Kutas & Federmeier, 2000). In this line, the amplitude of the N400 component is modulated by WF: low-frequency words elicit larger N400 amplitudes than high-frequency words (Barber, Vergara, & Carreiras, 2004; Smith & Halgren, 1987; Van Petten & Kutas, 1990; Vergara-Martínez & Swaab, 2012; Vergara-Martínez, Perea, Gómez, & Swaab, 2013). Although the N400 effects are often characterized in the 300–500-ms time window, it is not rare to observe WF effects in earlier time windows (e.g., see Hauk & Pulvermüller, 2004; Hauk, Davis, Ford, Pulvermüller & Marslen-Wilson, 2006).

Crucially, the amplitude of the N400 is also modulated by semantic factors (e.g., concreteness, number of associates, number of semantic features), but in the opposite direction to that of the WF effect in word recognition experiments. Larger N400 amplitudes have been found for concrete than for abstract words (Barber et al., 2013; Holcomb, Kounios, Anderson & West, 1999; Kanske & Kotz, 2007; West & Holcomb, 2000). Larger N400 amplitudes have also been reported for words with many semantic features or associates than for those with few semantic features or associates (Amsel, 2011; Laszlo & Federmeier, 2011; Müller, Duñabeitia, & Carreiras, 2010; Rabovsky et al., 2012; but see Amsel & Cree, 2013; Kounios et al., 2009, for an opposite pattern due to explicit semantic task demands). ERP effects related to semantic richness have been found to be distributed over anterior scalp electrodes (concreteness effects: Adorni & Proverbio, 2012; Barber et al., 2013; Holcomb et al., 1999; Kanske & Kotz, 2007; West & Holcomb, 2000; semantic richness: Amsel, 2011; Müller et al., 2010; but see Rabovsky et al., 2012, for centro-parietal localization of semantic richness effects). As occurs with the WF effect, ERP effects of “semantic richness” are not necessarily confined to the classic N400 interval (300–500 ms), but they have been found to peak at earlier latencies (Amsel, 2011; Rabovsky et al., 2012).

In sum, prior ERP experiments have revealed a dissociation between lexical/structural versus semantic factors in the N400 component: while low-frequency words produce more negativity than high-frequency words, words that are richer in semantic factors (e.g., concreteness, number of associates) produce more negativity than words with less semantic richness. Note that both WF and the measures related to “semantic richness” are facilitative in behavioral and eye-tracking experiments. This experiment makes use of the dissociation regarding the N400 component to examine whether the facilitative CD effect is driven by lexical/structural or by semantic processes. We measured the ERPs during a lexical decision experiment (i.e., the most common laboratory word recognition task) with words that varied in the number of contexts they appeared in (high-CD vs. low-CD words) while WF and other psycholinguistic characteristics were controlled for. The predictions are clear-cut. Larger negativities for low-CD than for high-CD words, along with a centro-parietal distribution (in line with the canonical N400 WF effect), would favor a “lexical/structural” interpretation of the CD effect (i.e., CD would just be another signature of word frequency). Alternatively, larger negativities for high-CD than for low-CD words would favor a “semantic” interpretation of the CD effect. Furthermore, we scrutinized the ERP segments to better characterize the CD effect. Note that, although the time course of orthographic and lexical-semantic effects in visual word recognition, as measured by the ERP technique, seems to converge on the 300–500 ms time window, the limits of the N400 are far from certain (see Laszlo & Federmeier, 2014, for a metareview of the time course of orthographic, lexical, and semantic factors during visual word recognition). Finally, for comparison purposes, we also measured the ERPs for a set of words that differed from the experimental low-CD words only in WF. This enabled us to compare the effect of CD with the more canonical WF effect in the same participants.

Method

Participants

Twenty-three undergraduate and graduate students of the University of Valencia (14 women) participated in the experiment in exchange for a small gift. All of them were native Spanish speakers with no history of neurological or psychiatric impairment, and with normal (or corrected-to-normal) vision. Ages ranged from 18 to 40 years (M age = 26 years, SD = 5.9). All participants were right-handed, as assessed with a Spanish abridged version of the Edinburgh Handedness Inventory (Oldfield, 1971).

Materials

We selected 70 Spanish words from the EsPal subtitle database (Duchon, Perea, Sebastián-Gallés, Martí, & Carreiras, 2013). This database provides not only the token count of each word (i.e., word frequency) but also the proportion of films in which a word appears. As in previous research, the CD variable was operationalized as the proportion of films (documents) in which a word appears (see also Soares et al., 2015). There were 35 high-CD words (i.e., words that occur in a high percentage of films) and 35 low-CD words (words that occur in a low percentage of films). To establish two differentiated word groups regarding CD, the words were selected from a range of high-frequency values. The two conditions only differed significantly in CD (p < .0001) and were carefully matched for a number of sublexical, lexical, and semantic variables (see Table 1). To perform a follow-up analysis on word frequency, a second group of 35 low-CD words was selected from a range of low-frequency values. Both groups of low-CD words were matched for a number of sublexical, lexical, and semantic variables (see Appendix A), so that the two conditions only differed significantly in WF (p < .0001). For the purposes of the lexical decision task, we also created 105 orthographically legal pseudowords (by replacing 2–5 letters from the original words, depending on their length) using Wuggy (Keuleers & Brysbaert, 2010). The list of words/pseudowords is presented in Appendix B.
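The distinction between the two measures can be illustrated with a minimal sketch. It assumes a corpus represented as a list of tokenized documents; the function name and the toy "films" are hypothetical and are not taken from EsPal. WF counts every token, whereas CD counts each document at most once.

```python
from collections import Counter


def word_frequency_and_cd(documents):
    """Return (wf, cd) for a list of tokenized documents.

    wf[word] = total number of tokens of `word` across all documents.
    cd[word] = proportion of documents containing `word` at least once.
    """
    wf = Counter()
    doc_count = Counter()
    for tokens in documents:
        wf.update(tokens)              # every occurrence counts toward WF
        doc_count.update(set(tokens))  # each document counts once toward CD
    n_docs = len(documents)
    cd = {word: doc_count[word] / n_docs for word in doc_count}
    return dict(wf), cd


# Hypothetical toy corpus: each inner list stands for one film.
toy_films = [
    ["casa", "casa", "casa", "perro"],
    ["perro", "mesa"],
    ["perro", "mesa"],
    ["perro"],
]
wf, cd = word_frequency_and_cd(toy_films)
```

In this toy corpus "casa" has a higher WF than "mesa" (3 vs. 2 tokens) but a lower CD (one film out of four vs. two out of four), which is the kind of dissociation that allows CD to be manipulated while WF is held constant.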

Table 1 Mean values for sublexical, lexical, and semantic characteristics of the stimuli (obtained from the EsPal database; Duchon et al., 2013)

Procedure

Participants were seated comfortably in a dimly lit and sound-attenuated chamber. All stimuli were presented on a high-resolution monitor positioned at eye level 80 cm in front of the participant. The stimuli were displayed in white lowercase Courier New 24-pt. font against a dark-gray background. Participants performed a lexical decision task: they had to decide as accurately and rapidly as possible whether or not the stimulus was a Spanish word. They pressed one of two response buttons (YES/NO). The hand used for each type of response was counterbalanced across subjects. Reaction times (RTs) were measured from target onset until the participant’s response.

The sequence of events in each trial was as follows: A fixation cross (“+”) appeared in the center of the screen for 1,000 ms. This was followed by a 200-ms blank screen which, in turn, was replaced by a stimulus (word or pseudoword) in lowercase letters that remained on the screen for 400 ms. The trial finished when the participant responded or 1,500 ms had elapsed. A blank screen of random duration (range: 700–1,000 ms) was presented after the response. To minimize subject-generated artifacts in the EEG signal during the presentation of the experimental stimuli, participants were asked to refrain from blinking and moving their eyes from the onset of the fixation cross to the end of the trial. Each participant received the stimuli in a different random order. Sixteen warm-up trials, which were not further analyzed, were presented at the beginning of the session and were repeated if necessary. The whole experimental session lasted approximately 20 minutes.

EEG recording and ERP analyses

The electroencephalogram (EEG) was recorded from 29 Ag/AgCl electrodes mounted in an elastic cap (EASYCAP GmbH, Herrsching, Germany) according to the 10/20 system. These electrodes were referenced to the right mastoid and re-referenced off-line to the averaged signal from two electrodes placed on the left and right mastoids. Eye movements and blinks were monitored with electrodes placed on the right lower and upper orbital ridge and on the left and right external canthi. The EEG recording was amplified and bandpass filtered between 0.01–100 Hz with a sample rate of 250 Hz by a BrainAmp (Brain Products, GmbH, Gilching, Germany) amplifier. An off-line bandpass filter between 0.01 and 20 Hz was applied to the EEG signal. Impedances were kept below 5 kΩ during the recording session. All single-trial waveforms were segmented and screened offline for amplifier blocking, drift, muscle artifacts, eye movements, and blinks. This was done for a 500-ms epoch with a 100-ms prestimulus baseline. Trials containing artifacts and/or trials with incorrect lexical decision responses were not included in the average ERPs or in the statistical analyses. These processes led to an average rejection rate of 9.2% of all trials (7.9% due to artifact rejection; 1.3% due to incorrect responses). A t test on the number of included trials per condition showed no difference between conditions, t(22) = 1.4, p = .175. ERPs were averaged separately for each of the experimental conditions, each of the subjects, and each of the electrode sites.
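Two of the preprocessing steps described above, re-referencing to the averaged mastoids and baseline correction, can be sketched in plain Python. This is only an illustration under stated assumptions: signals are lists of voltages sampled at 250 Hz, each epoch begins with the 100-ms prestimulus interval, and the function names are hypothetical (the actual analysis software is not specified here).

```python
SAMPLE_RATE_HZ = 250   # sampling rate reported for the recording
BASELINE_MS = 100      # prestimulus baseline reported for the epochs


def rereference_to_averaged_mastoids(channel, left_mastoid):
    """Re-reference a channel that was recorded against the right mastoid.

    With the right mastoid as online reference, its own signal is zero by
    construction, so the average of both mastoids equals left_mastoid / 2.
    """
    return [v - lm / 2.0 for v, lm in zip(channel, left_mastoid)]


def baseline_correct(epoch, sample_rate_hz=SAMPLE_RATE_HZ,
                     baseline_ms=BASELINE_MS):
    """Subtract the mean prestimulus voltage from every sample in the epoch.

    Assumes the epoch starts with the prestimulus baseline samples.
    """
    n_base = int(round(baseline_ms * sample_rate_hz / 1000))
    baseline = sum(epoch[:n_base]) / n_base
    return [v - baseline for v in epoch]


# Hypothetical epoch: 25 baseline samples (100 ms) followed by 125
# poststimulus samples (500 ms).
epoch = [2.0] * 25 + [5.0] * 125
corrected = baseline_correct(epoch)
rereferenced = rereference_to_averaged_mastoids([4.0, 6.0], [2.0, -2.0])
```

At 250 Hz the 100-ms baseline corresponds to 25 samples and the 500-ms epoch to 125 samples, which is why the toy epoch has 150 values.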

To characterize the CD effect in terms of the time course, polarity, and scalp distribution of its electrophysiological signature, the statistical analyses were performed on the mean voltage values between 225 and 325 ms, and on the full montage of 27 scalp electrodes. The selection of this time epoch was based on the results of running repeated-measures t tests at every 4-ms interval between 1 and 500 ms at all 27 scalp sites for CD (high/low). To correct for multiple comparisons, we applied the following criterion: if a sequence of 15 consecutive t-test samples exceeded the .05 significance level, then an onset latency for a given experimental contrast was considered significant and reliable (see Guthrie & Buchwald, 1991; see Fig. 1). As a result, one time window of interest was identified: 225–325 ms. The full set of 27 electrodes was included in the analyses by dividing the electrode montage into seven separate parasagittal columns along the anterior–posterior axis of the head (see Fig. 2; Massol, Midgley, Holcomb, & Grainger, 2011). We performed four separate repeated-measures analyses of variance (ANOVAs), one on each of the three pairs of lateral columns and one on the midline column. The lateral column analyses (referred to as Col. 1, Col. 2, Col. 3, extending outward) included the factor anterior–posterior (AP) over dorsal electrode sites (three, four, or five levels) and the factor hemisphere (HEM) over rostral electrode sites. The midline column analysis only included the AP factor with three levels. In sum, the ANOVAs included the factors CD, AP, and HEM on the three pairs of lateral columns, and CD and AP on the midline column. Effects for the AP and HEM factors are reported only when they interact with the experimental manipulation. Interactions between factors were followed up with simple effect tests.
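The consecutive-samples criterion described above can be sketched as a run-length search over the p values obtained at one electrode. This is a minimal illustration, not the analysis code used in the study: the function names, the toy p values, and the assumption that the first sample corresponds to 0 ms are hypothetical.

```python
def significant_runs(p_values, alpha=0.05, min_run=15):
    """Return (start, end) index pairs (end exclusive) of every run in which
    at least `min_run` consecutive p values fall below `alpha`."""
    runs = []
    start = None
    for i, p in enumerate(p_values):
        if p < alpha:
            if start is None:
                start = i          # a new candidate run begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None           # the run (if any) is broken
    # Handle a run that extends to the last sample.
    if start is not None and len(p_values) - start >= min_run:
        runs.append((start, len(p_values)))
    return runs


def index_to_ms(index, step_ms=4, first_ms=0):
    """Convert a sample index to latency, assuming one test every 4 ms."""
    return first_ms + index * step_ms


# Hypothetical p values: a 20-sample significant run and a 10-sample one.
toy_p = [0.5] * 10 + [0.01] * 20 + [0.5] * 5 + [0.01] * 10
runs = significant_runs(toy_p)
onset_ms = index_to_ms(runs[0][0])
```

With one test every 4 ms, a run of 15 samples spans 60 ms, so the shorter 10-sample run in the toy data is discarded and only the 20-sample run is retained.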

Fig. 1 Results of the univariate statistical analyses of the time course of contextual diversity. The plots convey the results of repeated-measures t tests at every 4 ms interval between zero and 500 ms at all 27 scalp sites (listed in an anterior-posterior progression). P values are coded from lighter (light gray: .05–.06) to darker (black: <.01) and corrected for multiple comparisons (e.g., Guthrie & Buchwald, 1991)

Fig. 2 Schematic representation of the electrode montage. Electrodes are grouped into four columns (midline and extending outward 1, 2 and 3 columns) for statistical analysis

Results

Behavioral results

Incorrect responses (1.3% of the data) and lexical decision times less than 250 ms or larger than 1,500 ms (less than 0.4% of the data) were excluded from the latency analyses. The mean lexical decision times and percent errors were submitted to separate t tests (contextual diversity: high CD vs. low CD) over participants (t1) and items (t2).
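The latency trimming and the by-participants comparison can be sketched as follows. This is an illustration under stated assumptions: the cutoffs are those reported above, the function names and the toy values are hypothetical, and only the paired (by-participants) case is shown.

```python
import math


def trim_rts(rts, lower_ms=250, upper_ms=1500):
    """Keep correct-trial RTs within the cutoffs (values below 250 ms or
    above 1,500 ms are discarded)."""
    return [rt for rt in rts if lower_ms <= rt <= upper_ms]


def paired_t(cond_a, cond_b):
    """Paired-samples t statistic and degrees of freedom.

    cond_a and cond_b hold one mean per participant, in the same order.
    """
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


# Hypothetical values, for illustration only.
kept = trim_rts([200, 250, 600, 1500, 1501])
t_value, df = paired_t([10.0, 12.0, 14.0], [9.0, 10.0, 11.0])
```

With 23 participants, the same computation yields 22 degrees of freedom, as in the t1 tests.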

The statistical analyses on the latency data revealed that, on average, high-CD words (601 ms; SD = 123) were responded to faster than the low-CD words (614 ms; SD = 121), t1(22) = 4.01, p = .001; t2(68) = -2.05, p = .044. The statistical analyses on the error data did not reveal any effects of contextual diversity (both ts < 1).

Therefore, the behavioral data replicated the same pattern of data as in the previous experiments where CD has been manipulated: response times were shorter for high- than for low-CD words.

ERP results

Figure 3 shows the ERP waves of contextual diversity (CD) in 11 representative electrodes. The ERPs show a negative potential peaking around 100 ms, which was followed by a slower positivity (P2) ranging between 100 and 250 ms. Following these early potentials, a large and slow negativity peaking around 350 ms can be seen at both anterior and posterior areas (N400). After the N400 component, the waves remain positive until the end of the epoch (500 ms).

Fig. 3 Grand average ERPs to words in the two CD conditions (low and high) in eleven representative electrodes. The 225–325 ms time epoch is highlighted by the colored bar (Color figure online)

Starting around 200 ms poststimulus, high-CD words show larger negative amplitudes than low-CD words over anterior scalp areas. This effect lasts until approximately 400 ms poststimulus. The results of the ANOVAs on the averaged voltage values in the 225–325 ms time window, and across the different electrode columns, are reported below.

225–325 ms epoch

The analyses on the midline and Column 1 showed a main effect of CD (midline: F(1, 22) = 5.77, p = .02, ηp² = .208; Col. 1: F(1, 22) = 8.20, p = .009, ηp² = .272); in Col. 2 the main effect was close to significance, F(1, 22) = 4.01, p = .058, ηp² = .154. These effects were modulated by a significant interaction between CD and AP distribution (midline: F(1, 22) = 5.54, p = .009, ηp² = .201; Col. 1: F(1, 22) = 4.28, p = .026, ηp² = .163; Col. 2: F(1, 22) = 5.74, p = .006, ηp² = .207). This interaction showed that the CD effect was located over anterior scalp areas: words with high-CD values elicited larger negativities than words with low-CD values (midline: Fz: F(1, 22) = 12.28, p = .002; Cz: F(1, 22) = 6.66, p = .017; Pz: F < 1; Col. 1: FC1/FC2: F(1, 22) = 13.10, p = .002; C3/C4: F(1, 22) = 6.97, p = .015; CP1/CP2: F(1, 22) = 4.62, p = .043; Col. 2: F3/F4: F(1, 22) = 6.96, p = .015; FC5/FC6: F(1, 22) = 5.98, p = .023; CP5/CP6: F(1, 22) = 2.08, p = .16; P3/P4: F < 1).

For the interested reader, the results of the WF manipulations are presented in Appendix A—as in prior research, we found higher N400 amplitudes for lower- than for higher-frequency words.

Discussion

This experiment aimed to shed some light on the nature of the contextual diversity (CD) effect (i.e., lexical/structural vs. semantic) by examining its electrophysiological signature. As expected, the behavioral data were consistent with previous findings: high-CD words were responded to faster than low-CD words. But the central finding was on the ERP data: high-CD words elicited larger negative amplitudes than low-CD words. This constitutes a reversal in the direction of the CD effect when contrasted to the WF effect (see Figs. 4 & 5). Note that the findings of numerous studies that have manipulated WF mainly consist of high-frequency words eliciting smaller negative amplitudes than low-frequency words (see Vergara-Martínez & Swaab, 2012, for recent evidence), a pattern that has also been replicated in the present study for the same participants (see the follow-up analysis of WF included in Appendix A).

Fig. 4 a Topographic distribution of the CD effect (calculated as the difference in voltage amplitude between the ERP responses to low- minus high-CD words) and of the WF effect (calculated as the difference in voltage amplitude between the ERP responses to low- minus high-WF words) in the 225–325-ms time epoch. b Summary of contextual diversity (CD) and word frequency (WF) effects in each electrode column. Significant (p < .05) main effects are reported. When there is a significant interaction between CD or WF and AP distribution and/or hemisphere, effects at specific locations are reported (Color figure online)

Fig. 5 Difference waveforms of contextual diversity and word frequency for 11 representative electrodes. The CD effect is calculated as the difference in voltage amplitude between the ERP responses to low- versus high-CD words. The word frequency effect is calculated as the difference in voltage amplitude between the ERP responses to low- versus high-frequency words

Our finding of a reversal of the ERP effects of CD compared to WF has important implications regarding the assimilation of both factors into a common facilitative mechanism in visual-word recognition. If the effects of CD and WF were similar instances of the same underlying lexical/structural processes (facilitating lexical access in the same way), high-CD words would have elicited smaller negative amplitudes than low-CD words. Instead, the direction of the CD effect in the ERP results resembles that obtained in ERP experiments that manipulated factors related to “semantic richness” (i.e., larger negativities for the semantically richer words; e.g., see Rabovsky et al., 2012; West & Holcomb, 2000). Namely, ERPs for high-CD words were more negative-going than ERPs for low-CD words between 225 and 325 ms after word onset. Importantly, the CD effect obtained in the current experiment cannot be explained in terms of other semantic variables such as concreteness or imageability, as the experimental words were matched in these and other psycholinguistic factors (see Table 1). Although the latency and duration of the CD effect (225–325 ms) are consistent with the time course of different variables affecting lexical-semantic processing, it is outside the common interval of the N400 (300–500 ms). Nevertheless, the limits of the N400 are far from certain (see Laszlo & Federmeier, 2014, for a metareview) because few studies make the effort to really determine the onset of the effect. One might argue that, despite the many reliable effects obtained across large time intervals when the data are analyzed in aggregate, this may also result from large effects peaking very early or very late within the interval. In fact, when we conducted the statistical analyses on a broader time window (225–450 ms), the results also showed a significant effect of CD over frontal electrodes (see Footnote 1), confirming that this method may overestimate the impact of significant effects throughout the course of processing. Hence, the latency and polarity of the CD ERP effects could be interpreted in terms of an (early) N400 modulation. Compared to the CD effect, post hoc analyses of the WF effect revealed a longer duration (150–500 ms; see Fig. 5 and Appendix A). The transient effect of CD could be explained as the result of larger semantic networks that become temporarily active for words that appear in many contexts. That is, words that appear in a diverse set of contexts (i.e., high-CD words) could develop a “larger and more varied set of semantic associations, many of which will be irrelevant in any specific situation” (Hoffman, Rogers, & Lambon Ralph, 2011, p. 2442). A similar reasoning has been previously used in the interpretations of the interplay between orthographic neighborhood size (ON) and WF effects in the ERP waves (Vergara-Martínez & Swaab, 2012). The finding of a shorter timing of the ON than the WF effect was explained in terms of the interaction between the transient activation of orthographic neighbors at a lexical-semantic level and the specific characteristics of the stimulus item during visual word recognition. Note that larger N400 amplitudes for words with many orthographic neighbors relative to words with few orthographic neighbors (Holcomb, Grainger, & O’Rourke, 2002; Laszlo & Federmeier, 2009, 2011; Vergara-Martínez & Swaab, 2012) have also been interpreted in terms of a wider activation at the semantic level of representation from orthographically similar words.

The anterior-scalp distribution of the CD effect further suggests a different underlying neural substrate of CD, when compared to that of WF (central-scalp distribution; see Figs. 4 and 5). This distribution is consistent with previously found N400 effects related to “semantically richer words” observed mainly in frontal electrodes (concreteness: Adorni & Proverbio, 2012; Barber et al., 2013; Holcomb et al., 1999; Kanske & Kotz, 2007; West & Holcomb, 2000; semantic richness: Amsel, 2011; Müller et al., 2010; but see Rabovsky et al., 2012, for centro-parietal distribution of semantic richness effects). This frontal distribution has been linked with top-down control of semantic memory in prefrontal brain areas (Adorni & Proverbio, 2012). One explanation of this pattern is that activity from long-term memory is specifically enhanced for words related to richer concepts (in terms of more semantic features or number of different contexts in which the concept is typically found).

Notably, despite the fact that CD and WF produced electrophysiological effects, their behavioral counterpart was only obtained for the CD effect. In the lexical decision process, a “wordness” index may take advantage of the larger activation of the semantic networks for high-CD than for low-CD words (as shown by larger negativities for high-CD vs. low-CD words), thus producing faster response times for high-CD than for low-CD words. However, the effect of WF was not significant in the response time data. One potential reason why the behavioral WF effect was not apparent may have to do with the range of frequencies employed in the present study—note that our main goal was to maximize the differences in CD while controlling for WF. The Zipf values of WF for the high- and low-WF words were above 4.5 points, which is an upper limit for producing floor effects in lexical decision times (see Keuleers, Diependaele, & Brysbaert, 2010; Perea et al., 2013). Although ERP measures may be sensitive enough to capture the impact of subtle differences of WF on different levels of word processing (as shown by the sustained effect of WF), it is possible that this effect was not strong enough to differentially/functionally feed onto the lexical decision counterpart. All in all, the most relevant finding was the opposite pattern of the CD ERP effect when contrasted to the classic WF effect, a result that could be accommodated in a semantic enrichment interpretation of the CD facilitative effects in lexical processing.
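For readers unfamiliar with the Zipf scale mentioned above, the following sketch shows the conversion as it is commonly defined: the base-10 logarithm of the frequency per million words plus 3. The function name and the toy counts are hypothetical, and any smoothing applied to unseen words in a particular database is omitted here.

```python
import math


def zipf_value(count, corpus_size_tokens):
    """Zipf value under the common definition log10(frequency per million) + 3.

    `count` is the number of tokens of the word and `corpus_size_tokens`
    the total number of tokens in the corpus; count must be positive.
    """
    per_million = count * 1_000_000 / corpus_size_tokens
    return math.log10(per_million) + 3


# Hypothetical example: 100 occurrences in a one-million-token corpus.
example_zipf = zipf_value(100, 1_000_000)
```

Under this definition, a word occurring 100 times per million words has a Zipf value of 5, and a word occurring about 32 times per million has a value of roughly 4.5.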

Our findings contribute to the interpretation of the N400 as the result of different mechanisms or neural generators (divergent in time course and scalp signature) that may be differently involved in lexical-semantic retrieval, integration processes, or during the activation of semantic features in word reading (see Kutas & Federmeier, 2011, for review). One of these mechanisms would be sensitive to the strength of the memory traces regarding the specific characteristics of a word (lexical/structural: WF). A different mechanism would be more related to the semantic properties of a word’s subset network composed of interconnected/similar features at different levels of processing (CD; see Laszlo & Federmeier, 2011). The first mechanism may operate as an interface between the brain’s internal model of the environment (built upon the extraction of statistical regularities) and the encountered information. From a connectionist perspective of semantic memory, the mismatch between predicted and real observations would be described as the “implicit prediction error” (Elman, 1990; McClelland, 1994), and has been proposed by Rabovsky and McRae (2014) to be reflected by the N400 amplitude. Within this framework, words that are encountered frequently are more prone to be expected (and would elicit lower implicit prediction error, reflected in smaller N400 amplitudes) than words that are rarely encountered (which would elicit larger implicit prediction error, reflected in larger N400 amplitudes). As the strength of activation of (lexical) representations adopts a relative value due to the continuous updating of the brain’s internal model, the N400 effects related to the WF manipulation on out-of-context words can be overridden when the same words are presented in highly constraining contexts (Van Petten & Kutas, 1990). Indeed, effects from measures that represent the properties of single items (larger N400s related to orthographic neighbor frequency and frequency of the top associate) have been reported to vanish in the second presentation of the words (Laszlo & Federmeier, 2011).

The larger negativities obtained for words with high CD could result from a second mechanism that is not determined by the actualization of an internal model according to experience or context, but rather by properties of the comprehension network at a semantic level of processing. Support for this idea comes from the finding that the N400 amplitude effects of ON size and number of lexical associates survive despite the repetition of the items, or when the stimuli are embedded in highly constraining sentences (Laszlo & Federmeier, 2009). The larger N400 amplitudes for words with many orthographic neighbors (or with many lexical associates) point to an enhanced activation of the semantic properties of the subset network for a particular item. This semantic level of processing seems to take precedence over structural/lexical processing in specific scenarios. In a reading experiment with unbalanced bilinguals on L1 and L2 word processing, Midgley, Holcomb, and Grainger (2009) presented words blocked by language and found a larger N400 for the L1 words compared to the L2 words. This is apparently an unexpected finding, as the L1 was the preferred language for the participants (i.e., they were more frequently exposed to words in L1 than in L2). Midgley et al. (2009) concluded that the enhanced N400 amplitudes in L1 words reflected the larger degree of coactivation of similar representations at different levels of processing (orthographic and semantic) taking place in the native language compared to the nonnative language.

What would be the functional difference between the effects of word frequency and contextual diversity? On the one hand, the word-frequency manipulation seems to capture the consequences of mere repetition over word learning and word processing: a word’s memory trace is strengthened on each occurrence, boosting the efficiency of access on subsequent presentations. Indeed, repeated words elicit smaller N400 amplitudes compared to the first word presentation, as occurs with the word-frequency effect (see Besson, Kutas, & Van Petten, 1992; Nagy & Rugg, 1989). Conversely, the manipulation of contextual diversity (i.e., the number of contexts in which a word appears) seems to capture the way in which the meaning of words is represented, specifically, the variability in meaning that is enhanced across the multiple contexts in which a word is presented. Using both a corpus-based study and a learning experiment with an artificial language, Jones, Johns, and Recchia (2012) reported that words are encoded better across multiple contexts when “the current episodic context provides novel information about the words not already contained in memory” (p. 120), thus demonstrating the importance of CD in lexical organization (see also Recchia, Johns, & Jones, 2008, for further evidence). It may be important to note here that Räling, Holzgrefe-Lang, Schröder, and Wartenburger (2015) recently reported a study that exploited the functional meaning of the N400 as a way to disentangle the impact of two different variables (semantic typicality vs. age of acquisition [AoA]) on semantic processing during an auditory category-member-verification task. Of relevance to our study was that the pattern of the AoA effect resembled that of CD: AoA elicited behavioral facilitative responses (faster reaction times for early acquired targets), whereas its ERP counterpart consisted of early acquired targets eliciting larger early N400 amplitudes than the late acquired targets. Although Räling et al. (2015) did not discuss their results in terms of richer semantic representations eliciting larger negativities, we believe that there are reasons not only to assume the existence of richer semantic representations for early acquired words but also to characterize the underlying relations between AoA, contextual diversity, and semantic enrichment (something that lies beyond the scope of this research study). For example, AoA may reflect not only the strength of network connections but also the quality of those words’ representations (see Ellis & Lambon Ralph, 2000, for simulations in a connectionist model). The idea is that words that are learned relatively late would not be completely comparable with those acquired earlier due to the continuing loss of the network’s plasticity over the lifespan. Likewise, in the area of language learning, Hills, Maouene, Riordan, and Smith (2010) analyzed the impact of contextual diversity in both acquisition and lexical processing. Hills et al. found that a word’s contextual diversity, which was defined as the number of unique word types a word co-occurs with in caregiver speech, not only predicted the order of early word learning, but was also highly correlated with the number of unique associative cues for a given target word in adult free association norms.
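To make the contrast between these operational definitions concrete, the following minimal sketch computes, for a toy corpus, raw WF, document-based CD (the proportion of documents containing a word), and a type-based CD (the number of distinct word types a word co-occurs with). The function name, the toy corpus, and the use of whole documents as the co-occurrence window are assumptions made for illustration only; they are not the procedures used by Adelman et al. (2006) or Hills et al. (2010).

```python
from collections import Counter, defaultdict

def frequency_and_diversity(documents):
    """Return (wf, cd_docs, cd_types) for a list of tokenized documents.

    wf       : total number of occurrences of each word
    cd_docs  : proportion of documents in which each word appears
    cd_types : number of distinct other word types sharing a document
               with the word (assumed co-occurrence window: the document)
    """
    wf = Counter()
    doc_count = Counter()
    neighbors = defaultdict(set)
    for tokens in documents:
        wf.update(tokens)            # every token counts toward WF
        types = set(tokens)
        doc_count.update(types)      # each document counts once per word
        for word in types:
            neighbors[word] |= types - {word}
    n_docs = len(documents)
    cd_docs = {w: doc_count[w] / n_docs for w in wf}
    cd_types = {w: len(neighbors[w]) for w in wf}
    return wf, cd_docs, cd_types

# Hypothetical toy corpus: "ball" is frequent but concentrated in one document.
toy_corpus = [
    "the dog chased the ball".split(),
    "the dog ate".split(),
    "ball ball ball ball".split(),
]
wf, cd_docs, cd_types = frequency_and_diversity(toy_corpus)
print(wf["ball"], cd_docs["ball"], cd_types["ball"])
print(wf["dog"], cd_docs["dog"], cd_types["dog"])
```

In this toy corpus, "ball" has a higher WF than "dog" but the same document-based CD and a lower type-based CD, which illustrates how the two kinds of measure can dissociate even though they are usually correlated.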

In a nutshell, when compared to raw WF, CD may capture the semantic enrichment produced by encountering the words across multiple and different contexts. Converging evidence for this account can be found in the field of human memory (see Hicks, Marsh, & Cook, 2005; Parmentier, Comesaña, & Soares, 2017). For instance, Hicks et al. (2005) found that CD and WF effects contributed independently to recall and posited the locus of the CD effect at the level of associative connections between a to-be-remembered word and its episodic context. That is, the higher the number of contexts in which a given word appears, the higher the competition between the contexts as retrieval cues for this word (see Reder et al., 2000).

To sum up, this ERP experiment demonstrated that contextual diversity is not an epiphenomenon (or simply another indicator) of word frequency. Instead, the effects of contextual diversity are better explained as a function of semantically related factors: words that appear in many contexts may be richer in shades of meaning than words that occur in few contexts. Therefore, word frequency should not simply be replaced with contextual diversity in models of visual-word recognition and reading. Although apparently associated, word frequency and contextual diversity originate from different sources during the access of lexical-semantic representations, as evidenced by their dissociable roles in eliciting opposite ERP signatures. Additional research should examine in greater depth the interplay between word frequency and contextual diversity during word learning.