Introduction

A key process of word reading is mapping orthographic into phonological representations. Such an operation is performed fast and efficiently by skilled readers: Although word recognition and silent reading might be performed exclusively by means of the orthography-to-semantic mapping, phonological representations may be activated during those tasks. However, two key questions arise: How fast may phonological activation be during word recognition? In particular, how fast is suprasegmental information accessed, given that it is not explicitly marked in the orthography? In the present study, we investigate the processing of lexical stress and its precise time-course by recording event-related potentials (ERPs) during a simple lexical decision task.

Although frequently investigated, the role of phonological processes in word reading continues to be debated. While some studies suggested that phonological codes are not activated until relatively late (e.g., Harm & Seidenberg, 2004; Seidenberg, Waters, Barnes, & Tanenhaus, 1984; Waters, Komoda, & Arbuckle, 1985), other studies came to the opposite conclusion, indicating an early and prominent role of phonology in lexical access (Frost, 1998, 2012; for a more recent review, see Leinenger, 2014). Behavioral studies using masked-priming suggest that skilled readers use phonological information very early during lexical access: For example, Lukatela and Turvey (1994), in a lexical decision experiment, showed that participants were faster in categorizing target words (e.g., church) when they were preceded by pseudohomophone primes (cherch) than by orthographic-control primes (chorch) (for similar evidence see, e.g., Braun et al., 2015; Perfetti, Bell, & Delaney, 1988). Moreover, masked-priming experiments manipulating the prime-target relation in terms of orthographic and phonological overlap suggest that the two types of information are generated quite early but have a different and consecutive time-course, with the latter immediately following the former (orthographic priming effects start to emerge with 17-ms primes whereas phonological ones emerge with 50-ms primes; Ferrand & Grainger, 1993).

The previous studies, however, are inconclusive with regard to the time-course of the constitutive events of visual word recognition. Outcome measures like reaction times (RTs) and response accuracy only tap the final representation of the output of the system and may not detect differences during processing. Electrophysiological techniques bypass this limit by providing highly precise time resolution that is informative about all the stages of processing. ERP studies addressing the issue of the role of phonological representations in word recognition have reported mixed and inconclusive results. In their masked-priming study in Spanish, Carreiras and colleagues (2009) reported that prime-target orthographic similarity affects target processing in the 150- to 250-ms window, whereas phonological similarity does so in the 350- to 550-ms window. Along the same lines, Grainger, Kiyonaga, and Holcomb (2006) ran a masked-priming experiment in English and investigated the time-course of orthographic and phonological activation by looking at transposed letter primes (e.g., barin-BRAIN vs. bosin-BRAIN) and pseudohomophone primes (e.g., brane-BRAIN vs. brant-BRAIN), respectively: Orthographic effects were found in the 150- to 250-ms window, whereas phonological effects started later, in the 250- to 350-ms window.

Other studies, however, show a different picture, reporting phonological effects during the very early stages of visual word recognition. In a masked priming study with English skilled readers, Ashby, Sanders, and Kingston (2009) presented targets with voiced and unvoiced final consonants (e.g., fad, fat) preceded by nonword primes that were incongruent or congruent in vowel duration and voicing (e.g., faz, fap). The authors found that phonological feature congruency modulated ERPs by 80 ms after target onset. A similar time-course for phonological processing was also reported by Ashby (2010) in another masked-priming experiment in which prime-target pairs shared their first syllable or were one letter different (e.g., po##-PONY vs. pon#-PONY): Phonological congruency (sharing the syllable) modulated ERPs as early as 100 ms, indicating fast phonological processing in word recognition (for converging evidence, see also Wheat, Cornelissen, Frost, & Hansen, 2010).

It is worth pointing out that all the reviewed (ERP and behavioral) studies shared the use of a masked-priming paradigm: Since such a paradigm “reflects the overlap of processing operations applied to the prime and the target” (p. 9, Kinoshita & Norris, 2012; see also Bodner, Masson, & Richard, 2006), what the system does while processing a target preceded by a prime may not be identical to what it does when processing a stimulus in isolation. Moreover, some types of experimental stimuli, like pseudohomophones, may have encouraged phonological processing (Braun et al., 2015; Grainger et al., 2006; Lukatela & Turvey, 1994; Perfetti et al., 1988).

While different aspects of phonological processing have been widely investigated, suprasegmental prosodic information has received much less attention; moreover, whatever data are available come from behavioral tasks that either tap the final stage of the output or reflect a coarse measure of the temporal course of processing involved in word recognition.

In the present study we investigated the process of word recognition using a lexical decision task, and manipulating the stress pattern of words and nonwords, thus directly focusing on the prosodic contrast. In a recent study, Colombo and Sulpizio (2015) reported three behavioral lexical decision experiments in Italian in which they manipulated lexical stress and stress neighborhood consistency. The stress neighborhood of a target stimulus (e.g., TAvola [Footnote 1] 'table') is formed by the number of words sharing with it their final sequence (defined as the nucleus of the penultimate syllable plus the last syllable; Colombo, 1992) and having either the same stress pattern and consistent stress-neighbors (e.g., PENtola, BAMbola, 'pot, doll'), or a different stress pattern and inconsistent stress-neighbors (e.g., piSTOla, paROla 'gun, word'). Stress neighborhood has been shown to be used by readers as a probabilistic cue to assign stress in reading-aloud tasks (Burani & Arduino, 2004; Colombo, 1992; Colombo & Zevin, 2009; Sulpizio, Arduino, Paizi & Burani, 2013). In Colombo and Sulpizio's study, words had either penultimate- or antepenultimate-stress and either a consistent or an inconsistent stress neighborhood. Note that in Italian, which is a polysyllabic free-stress language, the two main stress patterns have an asymmetric distribution, with about 75% of three-syllable words bearing stress on the penultimate syllable, and 18% bearing stress on the antepenultimate syllable [Footnote 2] (Spinelli, Sulpizio, & Burani, 2016). Moreover, and more importantly, stress is not governed by rule [Footnote 3], and although it is to some extent predictable on the basis of the final sequence, the process of stress assignment cannot be based on rules or regularities, with stress being correctly assigned only through lexical mediation.
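The notion of stress neighborhood can be made concrete with a small sketch. The Python snippet below is illustrative only: the toy lexicon contains just the example words mentioned above, the syllabification is supplied by hand, and the names `final_sequence` and `stress_neighbors` are hypothetical rather than taken from any of the cited studies. Real neighborhood counts would be computed over a full lexical database.

```python
VOWELS = set("aeiou")

# Toy lexicon: word -> (hand-made syllabification, stressed syllable counted
# from the end). 2 = penultimate stress, 3 = antepenultimate stress.
# Illustrative entries only, taken from the examples in the text.
LEXICON = {
    "tavola":  (["ta", "vo", "la"], 3),
    "pentola": (["pen", "to", "la"], 3),
    "bambola": (["bam", "bo", "la"], 3),
    "pistola": (["pi", "sto", "la"], 2),
    "parola":  (["pa", "ro", "la"], 2),
}

def final_sequence(syllables):
    """Nucleus of the penultimate syllable plus the last syllable."""
    penult = syllables[-2]
    nucleus_start = next(i for i, ch in enumerate(penult) if ch in VOWELS)
    return penult[nucleus_start:] + syllables[-1]

def stress_neighbors(target, lexicon):
    """Split words sharing the target's final sequence into friends and enemies."""
    target_syllables, target_stress = lexicon[target]
    ending = final_sequence(target_syllables)
    friends, enemies = set(), set()
    for word, (syllables, stress) in lexicon.items():
        if word == target or final_sequence(syllables) != ending:
            continue
        (friends if stress == target_stress else enemies).add(word)
    return friends, enemies

friends, enemies = stress_neighbors("tavola", LEXICON)
proportion_friends = len(friends) / (len(friends) + len(enemies))
```

In this toy lexicon, TAvola has two stress friends (PENtola, BAMbola) and two stress enemies (piSTOla, paROla), all sharing the final sequence -ola.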

Colombo and Sulpizio (2015) found that word recognition was mainly affected by stress dominance, with participants being faster and more accurate in recognizing words with the penultimate-dominant than the antepenultimate-non-dominant stress. In contrast, the effect of the stress neighborhood was weak in lexical decision, although it was reliable in a naming task. These results would speak in favor of a prominent role of phonological processing during visual word recognition, which is plausible in a language like Italian with a highly transparent orthography. Moreover, the results are particularly relevant since they were obtained in a task that can in principle be performed purely on the basis of orthographic information and does not enforce phonological activation (as would be the case, e.g., in a priming study with phonologically related primes, or with the use of particular materials, like homophones).

However, the behavioral results of Colombo and Sulpizio (2015) are totally silent about exactly when prosodic information comes into play during word recognition, and how fast phonological processing can be in skilled readers. In order to look at the exact time-course of phonological processing in reading isolated words, lexical stress and stress neighborhood were manipulated and ERPs were recorded in the present study. Participants performed a simple lexical decision task, in which they had to categorize written stimuli either as words or as nonwords. Words had either a dominant or a non-dominant stress, and either consistent or inconsistent stress neighborhood (e.g., dominant consistent: graNIta 'slush'; dominant inconsistent: seNIle 'senile'; non-dominant consistent: MISsile 'missile'; non-dominant inconsistent: BIbita 'drink'). The predictions are clear-cut: If phonological information is involved in visual word recognition since its early stages, the suprasegmental effect reported by Colombo and Sulpizio (2015), that is, the effect of stress dominance, should elicit early waveform modulation (i.e., before 200 ms) with a difference between words with dominant versus non-dominant stress. As noted, we found weak stress neighborhood consistency effects in Colombo and Sulpizio (2015), suggesting that the more relevant information for readers is the overall frequency distribution of the main stress pattern. Thus, predictions for the effect of stress neighborhood consistency are less clear. 
Stress neighborhood consistency effects could be the result of different processes, namely: (a) an orthographic process of parsing the word’s final sequence, the phonological correspondence for which would be immediately activated by means of orthography-to-phonology tight connections and would influence the sublexical mechanism of stress assignment (Colombo & Zevin, 2009; Perry, Ziegler, & Zorzi, 2014); (b) in accordance with Monsell, Doyle, and Haggard's (1989) account of stress effects in lexical decision, a further process would combine sublexical and lexical outputs in a common phonological output “until the combined product exceeds some criterion of sufficiency” (Monsell et al., 1989; p. 62); in the case of inconsistent words, the time-course would be different from that for consistent words (for a similar reasoning, see also Burani & Arduino, 2004). That is, if stress neighborhood consistency, which is a phonological effect, has an early influence in word recognition, consistent words like graNIta (slush, dominant stress) and MISsile (missile, non-dominant stress) should show a similar electrophysiological pattern, but their waveforms are expected to differ from the inconsistent pair, seNIle and BIbita (dominant and non-dominant stress). However, if orthographic effects reflecting the extraction of the final sequence (e.g., -ita, in graNIta) occur before the corresponding phonological activation, we might expect words like graNIta and BIbita, on the one hand, and seNIle and MISsile, on the other hand, to elicit a similar ERP modulation, as they share the ending, although they have different stress. Examination of the temporal course with ERPs will also allow determination of whether these effects occur early or reflect later processes. We measured both behavioral (RTs and accuracy) and EEG measures. Optimally, the data for both types of measures may present the same pattern.
However, as is well known from the literature, often they do not (e.g., Carreiras et al., 2009; Rugg, 1990; Wu & Thierry, 2012). The most obvious reason for this difference is that, in lexical decision, behavioral measures reflect the final stage of the process, and often also more markedly reflect decisional processes (e.g., Balota & Chumbley, 1984), while EEG measures can track the whole processing on-line, being able to tap different points in the process.

Method

Participants

Twenty-four students (21 females, mean age: 23.16 years, SD: 4.2) from the University of Trento took part in the experiment. All participants were native Italian speakers; they were right-handed, had normal or corrected-to-normal vision, and reported being neurologically healthy. Participants gave written informed consent for their participation after being fully informed about the nature of the study. The study was approved by the ethics committee of the University of Trento.

Materials and design

We adopted as stimuli the same 120 three-syllable low-frequency words selected from the CoLFIS database (Bertinetto et al., 2005) and 120 three-syllable filler nonwords used in Colombo and Sulpizio (2015). Words belonged to four sets, which were obtained by combining two experimental factors: Stress type (dominant vs. non-dominant) and stress neighborhood consistency (stress consistent neighborhood vs. stress inconsistent neighborhood). Due to the constraints of the language (the number of types for each ending), in most cases the same orthographic endings were shared by words among conditions. Specifically, dominant-consistent words (e.g., graNIta 'slush') shared 28 endings with non-dominant inconsistent words (BIbita 'drink'), while the latter shared 24 endings with dominant consistent words. Dominant inconsistent words (seNIle, ‘senile’) shared ten endings with non-dominant consistent words (MISsile 'missile'), while the latter shared 22 endings with dominant-inconsistent words. A consequence of this repetition of endings might be that participants would not be able to “guess” each word's stress from the ending, and would be obliged to access the lexicon to assign stress correctly (as discussed in Colombo & Sulpizio, 2015). However, another possible consequence of this repetition might be that it more easily prompts attention to the endings themselves (see “Discussion” below).

Stimuli were matched on the main psycholinguistic variables (see Table 1), except for bigram frequency, because this choice would have strongly limited the number of items in each condition [Footnote 4]. The size of each word’s neighborhood and the proportion of stress friends (i.e., words sharing the ending and stress pattern) were not matched, but they favored words with non-dominant stress (against our hypothesis of an advantage for dominant stress) because, in the overall language distribution, non-dominant neighborhoods have fewer word types, but tend to be large-sized and with consistent neighbors (see Table 1).

Table 1 Summary statistics: means (and standard deviations) for the words used in the experiments: Words with dominant stress and consistent (graNIta, slush) and inconsistent (seNIle, senile) stress neighborhood; words with non-dominant stress and consistent (MISsile, missile) and inconsistent (BIbita, drink) stress neighborhood. Examples of target words are in parentheses

Three sets of nonword fillers were included: Forty-five nonwords ending with a final sequence mainly associated with dominant stress (e.g., valona), 45 nonwords with a final sequence mainly associated with non-dominant stress (e.g., necile), and 30 nonwords with a stress-ambiguous final sequence (i.e., neither biased toward dominant nor non-dominant stress; e.g., gorafo). This division into three sets of nonwords was based on a previous study in which the likelihood with which each stimulus received each stress pattern was recorded (Colombo, Deguchi, & Boureux, 2014). Half of the nonwords (50%) shared their ending with word stimuli.

The experiment's design was a 2 (stress type: dominant vs. non-dominant stress) × 2 (stress neighborhood: consistent vs. inconsistent neighborhood) within-participant design. Stimuli were presented in two blocks, in a random order; block presentation was counterbalanced across participants.

Procedure

Each trial started with a fixation cross, presented for 300 ms in the center of the screen; the fixation was followed by a short blank of 200 ms, then a stimulus (word/nonword presented in lowercase letters) appeared in the same position and was presented until the participant's response or for a maximum of 1,500 ms. Finally, a blink cue (--|--) was presented for 2,500 ms and was followed by a short inter-stimulus interval of 300 ms; participants were asked to blink only when such a blink cue was presented. A brief practice preceded the experiment. The experiment was run using E-Prime software (version 2.0, Psychology Software Tools, Pittsburgh, PA, USA; www.pstnet.com).
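The trial structure described above can be summarized as a simple timeline. The following Python sketch is only an illustration of the stated durations; the names `TRIAL_EVENTS`, `max_trial_duration`, and `stimulus_onset` are hypothetical and do not correspond to the E-Prime script used in the experiment.

```python
# Event durations in ms, as described in the Procedure.
TRIAL_EVENTS = [
    ("fixation", 300),
    ("blank", 200),
    ("stimulus", 1500),  # upper bound: the stimulus disappears at response
    ("blink_cue", 2500),
    ("inter_stimulus_interval", 300),
]

def max_trial_duration(events):
    """Longest possible trial, reached when no response is given."""
    return sum(duration for _, duration in events)

def stimulus_onset(events):
    """Time from trial start to stimulus onset, in ms."""
    onset = 0
    for name, duration in events:
        if name == "stimulus":
            return onset
        onset += duration
    raise ValueError("no stimulus event in the trial description")
```

Under these durations, the stimulus appears 500 ms after trial start and a trial lasts at most 4,800 ms.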

EEG recording and data processing

EEG was recorded from 64 scalp electrodes (Fp1, Fpz, Fp2, AF7, AF3, AF4, AF8, F9, F7, F5, F3, F1, Fz, F2, F4, F6, F8, F10, FT7, FC5, FC3, FC1, FCz, FC2, FC4, FC6, FT8, T7, C5, C3, C1, Cz, C2, C4, C6, T8, TP7, CP5, CP3, CP1, CPz, CP2, CP4, CP6, TP8, TP10, P7, P5, P3, P1, Pz, P2, P4, P6, P8, PO7, PO3, POz, PO4, PO4, PO8, O1, Oz, O2) positioned on an elastic cap according to the international 10–20 system. An additional external electrode was placed below the left eye. All sites were referenced to the left mastoid and the ground was placed in the AFz channel. Impedance was kept below 10 kΩ. Data were acquired at a sampling rate of 250 Hz with a low-pass filter with 100-Hz cutoff frequency and 10-s time constant.

In order to better detect blinks and ocular movements, two virtual EOG channels were computed off-line as the difference between Ve1 and Fp1 (VEOG), and the difference between F9 and F10 (HEOG). All the other channels were re-referenced to the average mastoid activity, and filtered with a low-pass filter (20-Hz cutoff, 12 dB/oct) and a high-pass filter (0.1-s time constant, 12 dB/oct).
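The computation of the virtual EOG channels amounts to a sample-wise subtraction. The Python sketch below assumes that each channel is a list of voltages in microvolts and that the differences are taken as Ve1 minus Fp1 and F9 minus F10 (the sign convention is an assumption); the function names and the three-sample demo recording are hypothetical.

```python
def channel_difference(a, b):
    """Sample-wise difference between two equally long channels (microvolts)."""
    if len(a) != len(b):
        raise ValueError("channels must have the same number of samples")
    return [x - y for x, y in zip(a, b)]

def add_virtual_eog(recording):
    """Return a copy of the recording with VEOG and HEOG channels added."""
    out = dict(recording)
    out["VEOG"] = channel_difference(recording["Ve1"], recording["Fp1"])
    out["HEOG"] = channel_difference(recording["F9"], recording["F10"])
    return out

# Hypothetical three-sample recording, values in microvolts.
demo = {
    "Ve1": [10.0, 60.0, 12.0],
    "Fp1": [8.0, -20.0, 9.0],
    "F9": [1.0, 2.0, 3.0],
    "F10": [1.5, 1.0, 3.0],
}
with_eog = add_virtual_eog(demo)
```

A blink produces deflections of opposite polarity above and below the eye, so the difference channel amplifies it (80 μV at the second sample of the demo) relative to either electrode alone.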

EEG was segmented up to 600 ms after target onset. Artifact rejection was performed by means of an automatic threshold rejection algorithm: Epochs in which the voltage exceeded [−40 μV, 40 μV] for EOG channels, or [−100 μV, 100 μV] for any other site were not included in the average. Trials in which participants gave a wrong or no response were also rejected from the dataset before averaging. Overall, 15.8% of the trials were excluded before ERP averaging. Single-subject waveforms for each condition were averaged in reference to the 100-ms pre-target baseline.
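The rejection and baseline steps can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pipeline actually used: epochs are dictionaries mapping channel names to lists of voltages in microvolts, each epoch starts 100 ms before target onset, the thresholds are applied to the segmented data as given, and all function names are hypothetical.

```python
SAMPLING_RATE_HZ = 250
BASELINE_MS = 100
BASELINE_SAMPLES = BASELINE_MS * SAMPLING_RATE_HZ // 1000  # 25 samples

EOG_CHANNELS = {"VEOG", "HEOG"}
EOG_LIMIT_UV = 40.0
SCALP_LIMIT_UV = 100.0

def is_clean(epoch):
    """True if no sample leaves the allowed voltage range on any channel."""
    for channel, samples in epoch.items():
        limit = EOG_LIMIT_UV if channel in EOG_CHANNELS else SCALP_LIMIT_UV
        if any(abs(v) > limit for v in samples):
            return False
    return True

def baseline_correct(samples, n_baseline=BASELINE_SAMPLES):
    """Subtract the mean of the pre-target samples from every sample."""
    baseline = sum(samples[:n_baseline]) / n_baseline
    return [v - baseline for v in samples]

def average_clean_epochs(epochs, channel):
    """Baseline-correct and average one channel over epochs that pass rejection."""
    kept = [baseline_correct(e[channel]) for e in epochs if is_clean(e)]
    if not kept:
        raise ValueError("no clean epochs")
    return [sum(column) / len(kept) for column in zip(*kept)]
```

At 250 Hz one sample spans 4 ms, so the 100-ms baseline corresponds to 25 samples.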

Results

Behavioral results

Response accuracy (overall 88.3%) and RTs of correct responses were analyzed using mixed-effects models (Baayen, Davidson, & Bates, 2008; Jaeger, 2008). RTs were log transformed in order to reduce the skewness of the distribution (e.g., Baayen, 2008). The models were fitted using the lmer function (lmerTest package; Kuznetsova, Brockhoff, & Christensen, 2013) in R software. Participants and items were entered as random factors, whereas stress type (dominant vs. non-dominant) and stress neighborhood consistency (consistent vs. inconsistent neighborhood) were included as fixed factors. Bigram frequency, a variable that could not be matched across stimuli but affected naming performance in Colombo and Sulpizio (2015), was also entered as a predictor in order to control for its effect on participants' responses. Results are reported in Table 2. The analysis of RTs showed a significant effect of stress type (β = 0.05, SE = 0.02, t = 2.80, p = .03), with dominant stress words being recognized faster than non-dominant stress words; no further effect reached significance (stress neighborhood consistency: t = 1.2, p >.2; stress type × stress neighborhood consistency: t < 1, p >.6; bigram frequency: t = 1.8, p >.06).
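One way to write down the RT model described above is given below. This is a sketch under assumptions: the text does not specify the random-effects structure or the coding of the predictors, so random intercepts only are assumed for participants (index i) and items (index j), and the symbols are introduced here purely for illustration.

```latex
\log(\mathrm{RT}_{ij}) = \beta_0
  + \beta_1\,\mathrm{StressType}_j
  + \beta_2\,\mathrm{Consistency}_j
  + \beta_3\,(\mathrm{StressType}_j \times \mathrm{Consistency}_j)
  + \beta_4\,\mathrm{BigramFreq}_j
  + u_i + w_j + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
w_j \sim \mathcal{N}(0, \sigma_w^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

For accuracy, the same linear predictor would presumably enter through a logit link, which would be in line with the z statistics reported for that analysis.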

Table 2 Mean latencies for correct responses and percentage of errors (with standard deviations)

The analysis of accuracy showed that stress type approached significance (β = −0.78, SE = 0.43, z = −1.81, p = .06), with participants tending to be more accurate in recognizing dominant than non-dominant stimuli as words. No further effect reached significance (stress neighborhood consistency: z < 1, p >.7; stress type × stress neighborhood consistency: t < 1, p >.8; bigram frequency: t = −1.2, p >.2) [Footnote 5].

ERP results

Grand average waveforms at representative sites are plotted in Fig. 1. Visual inspection of the waveforms revealed a P1-N1 complex on the posterior sites that is typically associated with the presentation of visual stimuli; the close observation of grand average waveforms revealed that our experimental manipulations seemed to modulate electrophysiological activity at different time intervals: First, in the early components of P1, maximal on the posterior sites at ~100 ms after target onset, and fronto-central N1, i.e., the first negative peak on fronto-central sites. Then, in the fronto-central and right sites, in a later time window, i.e., 250–350 ms after target onset.

Fig. 1

Event-related potential (ERP) waveforms corresponding to words in the four conditions at the 11 representative electrodes. ERPs are time-locked to the target onset. The first vertical line indicates the target onset; the next four thinner vertical lines indicate the boundaries of the time windows for the analyses (70, 150, 250, and 350 ms, respectively). Short ticks on the x-axis indicate 100 ms intervals. Negative voltages are plotted up

Fig. 2

Schematic flat representation of the electrode position and the nine groups of electrodes used in the analyses (the front of the head is at the top)

Differences in the mean amplitudes of nine groups of electrodes (see Fig. 2) were tested across two latency ranges [Footnote 6], i.e., 70–150 ms (for the N1 and P1), and 250–350 ms (for the late effect) post-stimulus. This choice was based on both the literature review (e.g., Bigman & Pratt, 2004; Carreiras et al., 2009; Grainger et al., 2006; Vogel & Luck, 2000) and the visual inspection of ERP grand averages. Repeated-measures ANOVAs including stress type (dominant vs. non-dominant), stress neighborhood (consistent vs. inconsistent neighborhood), and topographic factors were performed. Where appropriate, critical values were adjusted using the Geisser and Greenhouse (1959) correction for violation of the assumption of sphericity. Figure 3A shows the ERP grand averages of two representative electrodes (F4, PO7) selected according to the topographical distributions of the effects. The topographical distribution of the significant effects is shown in Fig. 3B.
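The dependent measure of these analyses, the mean amplitude of an electrode group within a latency range, can be sketched as follows. The Python snippet assumes epochs sampled at 250 Hz that start 100 ms before target onset, treats the window boundaries as inclusive, and uses hypothetical function and electrode-group names; the actual composition of the nine groups is the one shown in Fig. 2.

```python
SAMPLING_RATE_HZ = 250
BASELINE_MS = 100
SAMPLE_STEP_MS = 1000 / SAMPLING_RATE_HZ  # 4 ms between samples

WINDOWS_MS = {"early": (70, 150), "late": (250, 350)}

def sample_times(n_samples):
    """Time of each sample relative to target onset, in ms."""
    return [-BASELINE_MS + i * SAMPLE_STEP_MS for i in range(n_samples)]

def mean_amplitude(waveform, window):
    """Mean voltage of the samples falling inside the window (inclusive)."""
    start, end = window
    times = sample_times(len(waveform))
    values = [v for t, v in zip(times, waveform) if start <= t <= end]
    return sum(values) / len(values)

def group_mean_amplitude(waveforms_by_electrode, electrodes, window):
    """Average the window mean over the electrodes of one scalp group."""
    means = [mean_amplitude(waveforms_by_electrode[e], window) for e in electrodes]
    return sum(means) / len(means)
```

Because samples fall every 4 ms, the first and last samples inside the 70–150 ms window under these assumptions are those at 72 and 148 ms.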

Fig. 3

(A) Grand average waveforms highlighting the significant effects at F4 (left) and PO7 (right); the first vertical line indicates the target onset; the following four thinner vertical lines indicate the boundaries of the time windows for the analyses (70, 150, 250, and 350 ms, respectively). Short ticks on the x-axis indicate 100 ms intervals. Negative voltages are plotted up. (B) Topographical distribution of the effects: (a) dominant stress words with inconsistent neighborhood minus dominant words with consistent neighborhood; (b) dominant stress words with inconsistent neighborhood minus non-dominant stress words with inconsistent neighborhood; (c) non-dominant stress words with consistent neighborhood minus dominant stress words with consistent neighborhood

P1

Posterior regions of the scalp were used since the P1 is mainly visible on these regions (e.g., Di Russo, Martinez, & Hillyard, 2003). The ANOVA with stress type (dominant vs. non-dominant stress), stress neighborhood (consistent vs. inconsistent neighborhood), and laterality (left, central, right) as within-participants factors showed a significant stress type by stress neighborhood consistency interaction (F (1, 23) = 6.83, p =.01). Further inspection of the interaction by means of multiple comparisons (with FDR correction) showed a larger positivity for dominant-stress words with inconsistent neighborhood (seNIle, MEAN: 2.23μV) than for dominant-stress words with consistent neighborhood (graNIta, MEAN: 1.62μV; t (23) = 2.63, p =.04), while there was no difference between non-dominant-stress consistent words (MISsile, MEAN: 2.06μV) and non-dominant-stress inconsistent words (BIbita, MEAN: 1.63μV, t = 1.46, p >.3). Moreover, the difference between dominant inconsistent words (seNIle) and non-dominant inconsistent words (BIbita, MEAN: 1.63μV) was significant (t (23) = 2.63, p =.04). No further difference was significant (all other ts < |1.4|, ps >.3). Considering topographical variables, the main effect of Laterality (F (2, 46) = 4.47, p =.01) was significant. No further effect reached significance (all other Fs < 1.8, p >.1).
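The correction applied to the multiple comparisons can be illustrated with a short sketch. The text only states that an FDR correction was used; the Python code below assumes the Benjamini-Hochberg procedure, which is the most common choice, and the function name and input p-values are hypothetical rather than taken from the analyses.

```python
def fdr_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical uncorrected p-values for four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = fdr_adjust(raw)
```

Each p-value is multiplied by the number of comparisons divided by its rank, so the smallest p-values are penalized most, and a comparison is declared significant when its adjusted value falls below the chosen alpha level.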

Frontal N1

The analysis was performed on the frontal and central regions, in which frontal N1 is evident (e.g., Vogel & Luck, 2000). The ANOVA with stress type (dominant vs. non-dominant stress), stress neighborhood (consistent vs. inconsistent neighborhood), Longitude (frontal, central), and Laterality (left, central, right) as within-participants factors showed a significant stress type by stress neighborhood consistency interaction (F (1, 23) = 4.32, p =.04). Further inspection of the interaction by means of multiple comparisons showed that dominant-stress words with consistent neighborhood (graNIta, MEAN: -1.17μV) were more negative than dominant-stress words with inconsistent neighborhood (seNIle, MEAN: 2.23μV, t (23) = 2.6, p =.01), whereas no difference emerged between non-dominant stress words with consistent and inconsistent neighborhood (MISsile MEAN: -.82μV vs. BIbita MEAN: -.45μV, t <1, p >.4). Moreover, dominant-stress words with inconsistent neighborhood (seNIle) were also more negative than non-dominant-stress words with inconsistent stress neighborhood (BIbita, t (23) = 2.46, p =.02). No further difference was significant (all other ts < |1.2|, ps >.4). Among the topographical factors, the main effect of longitude (F(1, 23) = 5.81, p =.02) as well as the Longitude by Laterality interaction (F(2, 46) = 10.88, p <.001) were significant. No further effect reached significance (all Fs <1.8, p >.1).

Late effect (250–350 ms)

The ANOVA was performed including all nine regions of the scalp, with stress type (dominant vs. non-dominant stress), stress neighborhood (consistent vs. inconsistent neighborhood), Longitude (frontal, central, posterior), and Laterality (left, central, right) as within-participants factors. The interaction between stress type and stress neighborhood was significant (F (1, 23) = 5.81, p =.02), as well as the stress type by stress neighborhood by laterality interaction (F (2, 46) = 3.87, p =.03). The three-way interaction was due to the fact that the interaction between the two linguistic factors was significant at the central (F (1,23)= 5.98, p =.02) and right sites (F (1,23)= 8.44, p =.007), but not at the left ones (F= 2.64, p >.1). We further inspected the two-way interaction by means of multiple comparisons. As in the earlier temporal ranges, dominant-stress words with inconsistent stress neighborhood (seNIle, MEAN: 1.19μV) were more positive than dominant-stress words with consistent stress neighborhood (graNIta, MEAN: 0.61μV, t (23) = 2.48, p =.02), whereas no difference emerged between non-dominant-stress words with consistent and inconsistent stress neighborhood (MISsile MEAN: 1.31μV vs. BIbita MEAN: 0.76μV, t = 1.61, p > .1). Dominant-stress words with inconsistent neighborhood (seNIle) also differed from non-dominant-stress words with inconsistent neighborhood (BIbita, t (23) = 2.43, p =.02). Finally, unlike in the earlier stages, at the right sites, non-dominant-stress words with consistent stress neighborhood (MISsile, MEAN: 1.31μV) were more positive than dominant-stress words with consistent stress neighborhood (graNIta, MEAN: 0.61μV, t (23) = 2.19, p =.03) but not significantly different from non-dominant-stress words with inconsistent stress neighborhood (BIbita, MEAN: 0.76μV, t < 1, p >.5). No further comparison reached significance (all ts < |1.15|, ps >.1).
At the central sites, dominant-stress words tended to be more positive when they had an inconsistent than a consistent stress neighborhood (MEANS: 0.91μV vs. 0.30μV, t (23) = 2.59, p =.09); no further effect reached significance (all ts < |1.75|, ps >.1). Also the topographic factors Longitude (F (2,46)= 8.33, p =.005) and Laterality (F (2,46)= 3.30, p =.04) were significant. No further effect reached significance (all other Fs < 1.9, ps >.1).
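The shape of the stress type by stress neighborhood interaction in the late window can be read off the reported means with simple arithmetic. The Python sketch below uses the four mean amplitudes given in the text for the 250–350 ms window; the function names are hypothetical, and the contrast is purely descriptive and does not replace the ANOVA or the pairwise tests.

```python
# Mean amplitudes (microvolts) in the 250-350 ms window, as reported in the text.
MEANS_UV = {
    ("dominant", "consistent"): 0.61,        # graNIta
    ("dominant", "inconsistent"): 1.19,      # seNIle
    ("non-dominant", "consistent"): 1.31,    # MISsile
    ("non-dominant", "inconsistent"): 0.76,  # BIbita
}

def consistency_effect(means, stress):
    """Inconsistent minus consistent mean amplitude for one stress type."""
    return means[(stress, "inconsistent")] - means[(stress, "consistent")]

def interaction_contrast(means):
    """Difference between the consistency effects of the two stress types."""
    return consistency_effect(means, "dominant") - consistency_effect(means, "non-dominant")

dominant_effect = round(consistency_effect(MEANS_UV, "dominant"), 2)
non_dominant_effect = round(consistency_effect(MEANS_UV, "non-dominant"), 2)
contrast = round(interaction_contrast(MEANS_UV), 2)
```

With these means, the consistency effect is 0.58 μV for dominant-stress words and −0.55 μV for non-dominant-stress words, that is, the two effects go in opposite directions.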

Discussion

In the present experiment we manipulated stress pattern and stress neighborhood consistency in lexical decision and tested the time-course of these variables in word recognition in Italian. The behavioral data showed a significant effect of stress dominance and no effect of stress neighborhood consistency, overall replicating Colombo and Sulpizio (2015). The ERP data, however, showed a different pattern. Stress dominance and stress neighborhood consistency interacted and affected word recognition both very early (i.e., in the 70- to 150-ms time window) and later (i.e., the 250- to 350-ms time window) in processing.

In the early stage, we found an interaction between stress dominance and stress consistency, which might be described as a consistency effect for dominant stress words, but not for non-dominant stress words. Stated in these terms these results would already reflect phonological/prosodic processing. The stress consistency by stress dominance interaction has already been investigated with behavioral measures in reading (Burani & Arduino, 2004; Colombo, 1992), lexical decision (Burani & Arduino, 2004; Colombo & Sulpizio, 2015), and through simulations (Pagliuca & Monaghan, 2010; Perry, Ziegler & Zorzi, 2014), with inconsistent results that in part may be explained by the properties of items.

We note that words with dominant stress and consistent neighbors (graNIta) and words with non-dominant stress and inconsistent neighbors (BIbita) share the ending (-ita; i.e., the nucleus of the penultimate syllable and the last syllable), which is important to define stress neighborhood and to assign stress in reading (Colombo, 1992; Colombo & Zevin, 2009). Both types of words differed from words with dominant stress and inconsistent neighbors (seNIle). The overlap in the words sharing the ending is orthographic, because phonologically the former is a tonic syllable whereas the latter is not, and therefore the two endings are phonetically realized in a different way, that is, with longer vowel duration in the case of dominant-stress words (graNIta) and with shorter duration in the case of non-dominant-stress words (BIbita, as the vowel of the penultimate syllable is unstressed). Thus, the similar activation pattern for dominant consistent and non-dominant inconsistent words indicates differences between, on the one hand, words with an ending very frequently and consistently occurring in dominant stress words (-ita) and, on the other hand, words with an ending most typically characterizing a non-dominant stress neighborhood (-ile).

The differences we report were visible as a modulation of both the P1 on the posterior regions and the N1 on the anterior regions of the scalp. While P1 reflects processing of visual objects and its amplitude is sensitive to attention allocation (e.g., Luck, Woodman, & Vogel, 2000) and to other cognitive processes pertaining to the visual dimension (Meeren, van Heijnsbergen, & de Gelder, 2005), anterior N1 is assumed to index selective attention to basic stimulus characteristics and initial-early selection for later pattern recognition (e.g., Key, Dove, & Maguire, 2005; Vogel & Luck, 2000). Our results thus suggest that early visuo-perceptual and attentional processes were affected by sublexical orthographic endings. Specifically, there was a similar early activation dynamic for words sharing a dominant-stress ending, but not the stress pattern; moreover, both differed from words with dominant stress pattern and a non-dominant stress ending (e.g., seNIle). Thus, we suggest that the pattern of activation we report presumably reflects an early process of orthographic parsing that isolates frequently co-occurring letter units like, in the present case, the final orthographic letter sequence. Note that it is not the absolute number of token occurrences that matters (number of words with those endings) because, as noted in the Materials section, non-dominant neighborhoods have fewer word types, but tend to be large-sized and with consistent neighbors. What seems to matter is the association between orthographic endings and the type of stress. Being associated with the dominant stress boosts the ending's processing. This further suggests a fast contact between sublexical orthographic units and information about the relative distribution of the stress pattern types, as a consequence of implicit learning of the association between word-endings and their most-likely stress pattern.
This association is the result of the acquired knowledge provided by experience with the lexicon, and, in particular, with the word neighborhoods that share word endings. Note that a modulation of the P1 has been reported not only in studies involving low-level visual-orthographic manipulations (e.g., Blackburne, Eddy, Kalra, Yee, Sinha, & Gabrieli, 2014; Proverbio & Adorni, 2009; Su, Mark, Cheung, & Law, 2012; for visual-orthographic effects with a similar time dynamic, see also Hauk, Davis, Ford, Pulvermüller, & Marslen-Wilson, 2006), but also in lexical decision studies investigating the activation of semantic information (e.g., Bayer, Sommer, & Schacht, 2012; Scott, O'Donnel, Leuthold, & Sereno, 2009), suggesting that the early stages of visual perception can be shaped by stored knowledge.

One may object that the effects we found are purely orthography-based, whereas the P1 in the 75- to 150-ms range reflects visual processes, and thus an earlier stage of processing. Indeed, in many stimuli, the same endings were used for dominant consistent and non-dominant inconsistent words (e.g., graNIta, BIbita), as well as for dominant inconsistent and non-dominant consistent words (e.g., seNIle, MISsile); therefore, an alternative interpretation of the P1 modulation (and of the N1 modulation discussed below) might be in terms of visual similarity among endings, boosted by the repeated letters. Although this alternative explanation is possible and might contribute to the pattern we reported, we would emphasize that if this were the case, words sharing a dominant-stress ending (e.g., -ita in graNIta, BIbita) should differ from all words sharing a non-dominant-stress ending, including non-dominant-consistent ones (e.g., MISsile). These words, however, differed from dominant-inconsistent words (e.g., seNIle), but were not significantly different from non-dominant-consistent words (e.g., MISsile). Therefore, although visual similarity may contribute to the P1 modulation, it does not seem to be the only factor affecting the early stages of processing.

Besides the P1 modulation, a second early effect emerged, that is, a smaller frontal N1 for dominant-inconsistent words than for both dominant-consistent and non-dominant-inconsistent words. Frontal N1 has usually been associated with selective attention to the characteristics of the stimuli; this component typically reflects discrimination processes within the focus of attention (Hillyard, Hink, Schwent, & Picton, 1973; Vogel & Luck, 2000). The findings may reflect the allocation of initial attentional resources. From this perspective, the pattern we report suggests that attentional mechanisms may be automatically oriented toward the endings in the early processing of the orthographic parser. Endings reflecting a dominant consistent neighborhood would be most likely to drive the selective attention mechanism. Note that our early ERP effects must be interpreted with caution, since early effects are more safely interpreted in reference to visually identical stimuli. However, in lexical decision with isolated words this limit is hard to overcome, since words belonging to different experimental categories are, by definition, visually different. With regard to the direction of the N1 modulation, one might wonder why the dominant-consistent words, which are associated with the fastest responses in the behavioral task, do not show the smallest N1. Indeed, the literature on attention shows that, with a visual cuing paradigm, perceptual features in attended (vs. unattended) locations elicit the largest N1 amplitude, which is thus associated with the easier-to-process condition (Luck & Kappenman, 2012). However, not all findings are consistent with this pattern. Lee and colleagues (2012) orthogonally manipulated the cloze probability of the final words in sentences and their orthographic similarity. They reported an N1 modulation only with highly predictable sentences, with a larger negativity for high orthographic similarity, that is, when highly expected information was presented. Similarly, in a lexical decision experiment with emotional words, Hofmann, Kuchinke, Tamm, Vo, and Jacobs (2009) reported a larger N1 and faster lexical decision times for positive than neutral stimuli (see also Briesemeister, Kuchinke, & Jacobs, 2014). Finally, in visual-orthographic tasks in which participants were presented with pairs of written stimuli that could be identical or include letter transposition or letter substitution (e.g., RFCV–RFCV vs. RFCV–RCFV vs. RFCV–RSTV) and were required to judge whether they were the same or different, Duñabeitia, Dimitropoulou, Grainger, Hernández, and Carreiras (2012) found both a larger N1 and faster response times for pairs including letter substitution than for pairs including transposition.

The interaction between stress type and stress neighborhood consistency also affected word processing in a later time window, that is, between 250 and 350 ms after target onset. At this time, the orthographic effects persisted, as shown by the larger positivity for dominant inconsistent words (seNIle, 1.19 μV) than for dominant consistent (graNIta, 0.61 μV) and non-dominant inconsistent words (BIbita, 0.76 μV). In addition, a suprasegmental phonological effect also emerged, with non-dominant-consistent words (MISsile, 1.31 μV) being more positive than dominant-consistent words (graNIta, 0.61 μV). We point out that both types of words are stress neighborhood consistent, and differ only in stress. Thus, this effect of lexical stress, in which words with the most frequent stress pattern differ from words with the less frequent stress pattern, is in our view the earliest marker signaling the assignment of stress in our study. To our knowledge, this is also the earliest marker of stress assignment reported to date in visual word recognition of isolated words. Indeed, other EEG-based studies on stress assignment of isolated words show rather late effects (Kriukova & Mani, 2016). Moreover, non-dominant stress consistent and non-dominant stress inconsistent words still did not differ, suggesting that consistency effects were not present.

Lexical stress, in Italian, can be correctly assigned by accessing the lexical phonological representation of the stimulus; similarly, stress neighborhood consistency emerges at the orthography-to-phonology interface by mapping orthographic information into a prosodic structure (see, e.g., Arciuli & Cupples, 2006; Colombo, 1992; Burani, Paizi, & Sulpizio, 2014). The interaction in the 250- to 350-ms time window shows that, during word recognition, the system activates and accesses phonological information from the orthographic correlates of stress, i.e., the endings. Admittedly, such an effect may have been emphasized by the overlap in the words' endings and their repetition in the different conditions of consistency, which would make stress- and consistency-related effects more likely to emerge. One possible mechanism is heightened attention to repeated letter strings, which would raise the probability of parsing the words at the beginning and ending. Alternatively, repeated endings might favor this parsing through a purely orthographic process that boosts the activation of the ending letters, as might occur, for example, in an interactive activation model (Diependaele et al., 2010; Jacobs, Rey, Ziegler, & Grainger, 1998).

Using evidence of suprasegmental phonological activation, we have shown that the time-course of lexical stress assignment is fairly compatible with that reported by other ERP studies investigating phonological processing in visual word recognition (Ashby & Martin, 2008; Carreiras et al., 2009; Grainger et al., 2006). This time-course may be related to a stage of phonological processing at which both whole-word phonological representations and sub-lexical phonology are active to constrain word recognition, with the reader using both types of information in order to reach her/his final decision about the nature of the stimulus. Analyzing the time-course of the effects of stress and consistency in lexical decision has allowed us to show that the stress neighborhood plays an early role that is no longer visible when the response is given, that is, in the behavioral results, which reflect only the final stage of processing.

The present study differs from previous ones in two important respects: (a) it investigates the activation of phonological representations by means of a paradigm in which the target to be identified is not preceded by other similar words, as in masked priming; (b) it investigates the time-course of the activation of supra-segmental information. With respect to (a), all the above-mentioned studies reporting significant effects have used a priming paradigm; in such a paradigm the emergence of early phonological effects could have been facilitated by the information made available by the processing of the prime, which pre-activates the relevant information for the to-be-processed target. Therefore, the operations of the system during a prime-target trial might be different from those occurring during a single-target trial. With respect to (b), our results suggest that, during the early stages of word recognition, the system accesses phonological supra-segmental information by orthography-to-phonology mapping, through the association of the segmental material with stress position information.

In summary, in the present study we first found early effects, which we interpreted as due to an early isolation of the sublexical orthographic units most frequently associated with a dominant stress, independently of the stress pattern that the words containing these endings received. Second, in a later time window (250 to 350 ms) the ortho-phonological effect remained and a marker of the lexical stress difference clearly appeared. Early markers of orthographic information correlated with stress position can thus contribute significantly to word identification.