1 Introduction

The structure of Semitic words is limited by configurations consisting of a vocalic pattern and a prosodic structure (and often also an affix), where the consonantal positions in the prosodic structure are filled by stem consonants. For example, the Hebrew words gadal ‘he grew’, higdil ‘he enlarged’, godel ‘size’, and gadol ‘big’ share the ordered set of consonants /g/, /d/, and /l/ but differ in their configuration; and the verbs higdil ‘he enlarged’, hizkiʁ ‘he reminded’, and hidʁiχ ‘he guided’ share a configuration but differ in stem consonants.

There is a longstanding debate regarding the mental representation of the stem consonants. Traditional Semiticists (Moscati, 1980; Blau, 2010; Goldenberg, 2013), as well as many generative linguists (McCarthy, 1981; Arad, 2005; Faust & Hever, 2010; Faust, 2019), hold that the stem consonants comprise a morphological unit, the so-called ‘consonantal root’ (hereafter C-root). In contrast, there are studies that view these consonants as phonological elements within a morphological unit – the stem, which serves as the base of derivation (Heath, 1987; Bat-El, 1994b; Benmamoun, 1999; Ussishkin, 1999, 2000, 2005; Boudlal, 2018; see also Simpson, 2009 for historical perspective). We refer to these two views as the C-root Approach and the Universalist Approach, respectively.

Most results obtained from experimental studies are interpreted as supporting the C-root Approach, indicating that the C-root morpheme in Semitic languages is the key to lexical retrieval (Frost et al. 1997, 2000, 2005 and Berent & Shimron, 1997 for Hebrew; Boudelaa and Marslen-Wilson 2001, 2005, 2011 for Modern Standard Arabic; Ussishkin et al., 2015 for Maltese). However, only a few studies have compared the predictions of the two approaches directly, and these few do not support the C-root Approach (Berent et al., 2007; Perea et al., 2014; cf. Geary & Ussishkin, 2018, discussed in Sect. 2.2).

In this critical review, we challenge the C-root Approach by re-examining previous behavioral results from a different angle. While various theoretical arguments against a C-root Approach have been presented elsewhere (see Sect. 1.1), the focus of the current study is on psycholinguistic findings from Semitic languages. We show that the results of priming studies of Semitic and non-Semitic languages alike can be explained by assuming that consonants are inherently more important than vowels for lexical retrieval regardless of other properties of the language. We thus argue that adopting a model of word recognition with an inherent consonant bias makes the idea of the C-root superfluous and puts speakers and readers of Semitic languages on a par with their Indo-European peers.

In the remainder of this introductory section, we review the debate on the organization of the lexicon of Semitic languages (Sect. 1.1), and highlight the consonant bias observed in lexical access experimental findings in a variety of language families and paradigms (Sect. 1.2). We then review results from psycholinguistic studies on Arabic, Maltese, and Hebrew in light of the universal properties of visual word recognition (Sect. 2) and auditory word recognition (Sect. 3), showing that Semitic languages are not exceptional. We conclude with a proposal to consider the inherent consonant bias in future psycholinguistic studies as a prospective alternative to the C-root (Sect. 4).

1.1 What is listed in the mental lexicon?

Words in Semitic languages like Hebrew and Arabic appear in configurations consisting of prosodic templates, vocalic patterns, and in some cases also affixes. In such languages, a verb, regardless of its source, must take the form of one of a relatively limited number of configurations (called binyanim in Hebrew). A richer and more permissive system of configurations exists in the nominal system (mishkalim), which includes quite a few words that do not fit into a configuration (mostly loan nouns). For ease of exposition, the following discussion focuses on the verbal system of Modern Hebrew, which includes seven configurations – five basic and two parasitic. All the configurations are disyllabic (though there are a few irregular monosyllabic verbs), with the exception of the last one in Table 1, where the prefix consists of a CVC syllable.

Table 1 Hebrew verb configurations

The two configurations on the right host passive forms only; they share a vocalic pattern (u–a) and differ from their respective active counterparts (on the left) only in the vocalic pattern. While this active–passive relation is fully consistent, the semantic relation among the other five configurations is not always transparent. For example, the configuration hiCCiC often hosts the causative alternate of CaCaC (e.g. ʃatak ‘he was silent’ – hiʃtik ‘he made X silent’). However, there are hiCCiC verbs that do not have a CaCaC counterpart and are not causative (e.g. himli ‘he recommended’; *mala); there are hiCCiC verbs that are related to another configuration (e.g. hiʁdim ‘he put to sleep’ – niʁdam ‘he fell asleep’; *ʁadam); and there are CaCaC – hiCCiC pairs that share stem consonants but do not share meaning (e.g. gazam ‘he trimmed’ – higzim ‘he exaggerated’), or exhibit a semantic relation which is weak at best (e.g. zaʁak ‘he threw’ – hizʁik ‘he injected’).Footnote 1

There is no doubt that the configurations are fundamental in Semitic morphology (Doron, 2003), at least structurally, i.e. that the shape of words is prosodically and vocalically restricted to include these sets of vowels.Footnote 2 However, the status of the residual phonological material in the verb, i.e. the stem consonants, is a matter of a longstanding debate that touches on the basic tenets of the lexical structure in Semitic languages.

The C-root Approach is considered the traditional approach to Semitic morphology (see the collection of papers in Shimron, 2003 and references therein). According to the conservative version of this approach, the Semitic lexicon consists of a list of C-root morphemes – ordered triplets of consonants (more rarely, pairs or quadruples) – and a list of configurations. Words are derived through the combination of a C-root with a configuration, as in the case of Hebrew gadal ‘he grew’ and gidel ‘he raised’, where the shared C-root g-d-l is mapped into the configurations CaCaC and CiCeC, respectively. Within the conservative C-root Approach (Moscati, 1980; Blau, 2010; Goldenberg, 2013), the C-root is the morphological basis of words, “associated with a basic meaning range common to all members of the root” (Moscati, 1980: 71).
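The mapping of a C-root into a configuration can be made concrete with a minimal Python sketch. The function name `interdigitate` and the convention of marking consonantal slots with an uppercase `C` are illustrative assumptions of ours, not part of any cited formalism; the sketch ignores all phonological adjustments and only makes the slot-filling logic explicit.

```python
def interdigitate(root, configuration):
    """Fill each 'C' slot of a configuration with the next root consonant.

    `root` is an ordered sequence of consonants; every other symbol in
    `configuration` (vowels, affixal material) is copied unchanged.
    """
    if configuration.count("C") != len(root):
        raise ValueError("root size does not match the number of C slots")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch
                   for ch in configuration)


root = ("g", "d", "l")
print(interdigitate(root, "CaCaC"))   # gadal
print(interdigitate(root, "CiCeC"))   # gidel
print(interdigitate(root, "hiCCiC"))  # higdil
```

Under this view the same stored triplet yields all three verbs, and the only stored items are the root and the list of configurations.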

However, as noted above, words sharing a C-root can display distant meanings that are not synchronically related. For example, daʁaχ ‘he stepped’ and hidʁiχ ‘he guided’, which seem to share the C-root d-r-χ, are not semantically related in any way that can be theoretically formalized. In general, it is not clear how far a word can drift from the basic meaning range of its C-root, what the limits of the range are, and whether the role of the configuration in the derivation of meaning can be pinned down. This poses a challenge to the claim adopted by the C-root Approach with respect to the basic meaning. Another challenge, this time on the morpho-phonological side, is that verbs derived from loanwords preserve the consonant clusters of the base word (e.g. filter ‘he filtered’, flirtet ‘he flirted’), a fact that cannot be predicted if the process of deriving a word always involves the extraction of the C-root and its assignment to pre-configured slots (Bat-El, 1994b).

The latter issue has been addressed by a less conservative version of the C-root Approach, which admits that some words are derived from other words, rather than from roots. It thus allows an additional list of words within the lexicon, which can serve as the base for the formation of other words (Prunet et al., 2000; Arad, 2005; Ravid, 2006; Berman, 2012; Brice, 2017; Ziani, 2020). The word-based system within the C-root Approach is rather limited, mostly to denominative verbs (Arad, 2005) or more so to the nominal system (Ravid, 2006; Berman, 2012). Importantly, under this approach as well, “the stored lexical units contain roots on a distinct morphemic tier” (Prunet et al., 2000: 642), as proposed in McCarthy (1979, 1981). That is, words/stems are represented in the lexicon in a morphologically-motivated multi-tiered structure, which gives the C-root the independent status of a morphological unit. Thus, despite differences between these two versions of the C-root Approach, both consider the stem consonants a morpheme.

Psycholinguistic adaptations of the C-root Approach (Frost et al., 1997 and onward) maintain that the root is the organizing principle of the Semitic lexicon, and as such, activations of root morphemes govern lexical retrieval. Under the C-root Approach, the root is an atomic abstract unit, which does not have sub-parts. Therefore, in studies using the priming paradigm (the major paradigm for empirical studies of the Semitic lexicon), C-root overlap between the prime and target is predicted to be facilitative, whereas a partial form overlap which is not associated with the C-root is predicted to have no effect, since priming relies solely on the morphological organization of the language (see an explicit formulation in Deutsch et al., 1998 and discussion in Frost, 2012). Crucially, this contrasts with lexical retrieval in other languages, which is guided by activation of form representations (phonological/orthographic).

Contrary to the idea of the abstract unit of the C-root, which consists of consonants, the Universalist Approach posits the word/stem as the base of word derivation (Horvath, 1981; Lederman, 1982; Heath, 1987; Bat-El, 1994b, 2003a,b; Ratcliffe, 1997; Ussishkin, 1999, 2000, 2005; Simpson, 2009; Laks, 2011, 2015; Lev, 2016). In the spirit of the word-based approach (Aronoff, 1976; Anderson, 1992), words/stems are listed in the lexicon, and their phonological representation and morphological properties can be affected by rules (e.g. affixation, apophony, gemination etc.) that change the form and meaning of the base to create another word in a systematic manner; this is also the approach advocated much earlier in Gesenius (1910[1813]) and Wright (1962[1859]). Words restricted by configurations are derived by the imposition of the configuration on the base, such that the vowels in the configuration override the vowels of the base stem. For example, gidel ‘he raised’ is derived from gadal ‘he grew’ by assigning the configuration |\(\sigma^{\mathrm{i}}\sigma^{\mathrm{e}}\)| to gadal (i.e. gadal + |\(\sigma^{\mathrm{i}}\sigma^{\mathrm{e}}\)| ⇒ gidel). Similarly, higdil ‘he enlarged’ is derived from gadal by assigning |\(\mathrm{h}\sigma^{\mathrm{i}}\sigma^{\mathrm{i}}\)| to gadal (i.e. gadal + |\(\mathrm{h}\sigma^{\mathrm{i}}\sigma^{\mathrm{i}}\)| ⇒ higdil). As the configuration includes a prosodic structure, a vocalic pattern and sometimes also an affix, the processes involved can be affixation, apophony, and prosodic readjustment, which are lumped together under the term ‘stem modification’ (Steriade, 1988; McCarthy & Prince, 1990; Bat-El, 1994b).
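The vowel-overwriting part of stem modification can be sketched in a few lines of Python. This is a deliberately simplified illustration under our own assumptions: the function name `apply_configuration` and the five-vowel inventory are hypothetical, and the sketch covers apophony only, leaving out the prefixation and prosodic readjustment needed for forms such as higdil.

```python
VOWELS = set("aeiou")  # simplified vowel inventory, assumed for illustration


def apply_configuration(base, vowels):
    """Overwrite the vowels of a base stem, left to right, with `vowels`.

    Consonants are never extracted or moved: they stay where the base
    has them, so any consonant cluster in the base is left untouched.
    """
    segments = list(base)
    positions = [i for i, seg in enumerate(segments) if seg in VOWELS]
    if len(positions) != len(vowels):
        raise ValueError("base and configuration differ in number of vowels")
    for position, vowel in zip(positions, vowels):
        segments[position] = vowel
    return "".join(segments)


print(apply_configuration("gadal", "ie"))   # gidel
print(apply_configuration("filter", "ie"))  # filter (the cluster lt is kept)
```

The point of the design is that the base is a whole stem rather than a consonantal triplet, so the operation never refers to a root.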

Under the Universalist Approach, the consonants of the stem do not form a morphological unit (Bat-El, 1994b, 2003a,b, 2017). This is where the Universalist Approach deviates from the word-based approach advocated by McCarthy (1979, 1981). For McCarthy, words are stored in the lexicon along with the C-root; words are assumed to be stored as complex morphological objects composed of root and configuration, since for the most part, the meaning of a word cannot be compositionally derived from the root and the configuration. By contrast, the Universalist Approach holds that stem consonants are merely phonological elements and do not form a morphological unit of any sort. Instead, within the theory of Feature Geometry (Goldsmith, 1976; Clements, 1985; see Sect. 3.3), consonants and vowels are segregated based on phonological grounds, thus accounting for patterns in various languages (Bat-El, 2003a). As consonants within the stem, their status also differs from that of affix consonants, which give cues to the word’s syntactic/thematic properties, which are partly systematic (contrary to the material of the stem, which is lexical).

In sum, under the C-root Approach, the lexicon is organized around roots (making it an item-and-arrangement approach; Anderson, 1992), while under the Universalist Approach the lexicon is organized around stems, with morphological relations connecting forms that are derived from each other (item-and-process; see Table 2). Importantly, the relations and representations that make up the Semitic lexicon under the Universalist Approach have also been proposed in analyses of non-Semitic languages. For example, the alternation within word pairs such as sing-sang and ring-rang is treated as apophony/ablaut in Anderson (1992), and pairs such as write-written share the ablaut + affixation mechanism that is so characteristic of Semitic. While apophony is a phonological process that often bears syntactic consequences in Semitic languages and elsewhere (Nespor et al., 2003), the insertion of consonants into pre-configured slots, as assumed by the C-root Approach, is uniquely proposed for Semitic languages. Thus, the Universalist Approach seems more economical and typologically restrictive.

Table 2 The derivation of gidel ‘raise’ under the competing approaches

A third possibility, highlighted by a reviewer, can be termed the Hybrid Approach: regardless of how words are derived, both roots and stems/words are stored in the lexicon. This differs from the above-mentioned approach of McCarthy (1981) and Arad (2005), under which most words are root-derived but some are word-derived; under the Hybrid Approach, all words include in their representation both a stem and a root. Since the Hybrid Approach contains the same stored items as the Universalist Approach, any result compatible with the latter is also compatible with the former.

Of course, if the mechanisms of root insertion/extraction can be shown to play a role in speakers’ behaviors, the economy argument for the Universalist Approach carries little weight. As mentioned above, a large body of psycholinguistic studies claims to reveal the cognitive reality of the C-root as a morphological unit, using various paradigms: acceptability ratings (Berent & Shimron, 1997 – Hebrew), rapid serial visual presentation (Velan & Frost, 2007, 2011 – Hebrew), masked visual priming (Boudelaa & Marslen-Wilson, 2001, 2005 – Arabic; Frost et al., 1997; Velan et al., 2005 – Hebrew; Geary & Ussishkin, 2018 – Maltese), cross-modal priming (Boudelaa & Marslen-Wilson, 2015 – Arabic; Frost et al., 2000 – Hebrew), and masked and overt auditory priming (Schluter, 2013 – Moroccan Arabic; Ussishkin et al., 2015 – Maltese; Geary & Ussishkin, 2019 – Hebrew). Most of these studies aimed to validate the C-root Approach, without taking into consideration the alternative Universalist Approach (studies that diverge from this unilateral approach are discussed in Sect. 2 and Sect. 4).

Indeed, these studies found that in the above-mentioned Semitic languages, the consonants of the stem play a crucial role in lexical retrieval. For example, in priming paradigms, words that share stem consonants with the target are effective primes, regardless of the semantic overlap between prime and target (e.g. <mdryx> ‘guide’ primes <dryxwt> ‘alertness’; Frost et al., 1997). In addition, primes sharing all three stem consonants with the target induce facilitation relative to primes that share two stem consonants plus an affix. For example, <rʔh> ‘he saw’ primes <mrʔh> ‘mirror’ better than <mʔh> ‘hundred’.

In the current paper, we do not commit to specific models of word recognition or specific theories regarding the structure of the mental lexicon (i.e. the types of representations it stores and the computations that apply to them). Instead, we focus on showing that the observed experimental effects are not unique to Semitic languages. This demonstration serves two goals: first, it gives relevant and overlooked context to word recognition and reading models; second, it casts doubt on the interpretation of previous psycholinguistic work as supporting the C-root Approach.

Before diving into the psycholinguistic literature on the C-root, we dedicate the following section to reviewing findings that show the processing salience of consonants in non-Semitic languages.

1.2 Consonant facilitation effects: a universal perspective

Cross-linguistically, languages tend to have more consonants than vowels, a fact that is in part due to articulatory and acoustic properties (Stevens, 1998). This makes consonants trivially more informative for lexical retrieval in most scenarios. The distribution of consonants and vowels across stems in different languages is another indicator of their distinct status: while the most common stems (i.e. affixless words) include a stable average of 2-4 consonants across languages, the canonical number of vowels differs considerably between languages, depending on the language’s permissible syllable structure. In addition, vowels tend to lose distinctiveness within a word through processes of reduction or vowel harmony, while consonants tend to be distinctive within a stem/word. This is evident in OCP effects, common in Semitic languages, as well as in Japanese, which does not allow more than one voiced obstruent within a stem, and Classical Greek, where three aspirated stops within one word are avoided (see discussion in Sect. 3.4). Nespor et al. (2003), who contributed these observations, developed the claim that the categorical perception of consonants vs. vowels is a general property of human auditory perception, which explains the tendency of languages to develop a particular (partial) division of labor between these two segment types. Specifically, whereas consonants contribute more to the lexical core of words, vowels contribute mainly to their grammatical function.Footnote 3

The perceptual differences between consonants and vowels are supported by evidence from various fields. For example, in the neuropsychological literature, there is evidence for a double dissociation between consonant and vowel recognition, suggesting that their processing is supported by different brain areas (Caramazza et al., 2000; Poeppel, 2001). It was further shown that in patients with a selective consonant deficiency, no difference in category recognition was observed based on the sonority of consonants, lending support to the hypothesis that the distinction between consonants and vowels is categorical rather than based on a gradual acoustic feature (Caramazza et al., 2000).Footnote 4

Further, as shown in Bertoncini et al. (1988), the distinction between consonants and vowels arises at a very young age – at the age of 2 months, infants can distinguish between vowels but not consonants. Even neonates are sensitive to the proportion of vowels vs. consonants in the speech stream, showing differential responses to languages that differ in this respect (Mehler et al. 1988, 2006 among others). Infants also seem to notice at a fairly young age that consonants are generally less variable in their production than vowels. Poltrock and Nazzi (2015) confirmed that at the age of 11 months, French-acquiring infants prefer to listen to familiar words compared to non-words. However, when presented with vowel and consonant mispronunciations, they prefer to listen to vowel mispronunciations, indicating that a vowel change leaves the item more similar to a familiar word than a consonant change does (see review in Nazzi et al., 2016).

Similarly, vowel information seems to be more subject to change, and accordingly to constrain lexical access less tightly. This is evident in the typology of phonological rules, where there are more cases of apophony than of consonant mutation (Nespor et al., 2003). Experimental results also show this difference between consonants and vowels. In online reconstruction tasks in Dutch, Spanish and English, subjects are faster to reconstruct a vowel-changed word like *coge into cage than a consonant-changed word such as *lage into cage (Van Ooijen, 1996; Cutler et al., 2000). In artificial language learning, speakers of various genetically unrelated languages prefer to generalize over consonants when segmenting the speech stream (Bonatti et al., 2005; Mehler et al., 2006). That is, when unsegmented words include the same consonants, speakers learn them more successfully than when they share vowels. A similar consonant advantage was found with visual presentation in French: in a series of visual priming experiments, New and Nazzi (2014) found that at very short exposures of the prime (33 ms), both vowel and consonant overlap yielded facilitation (e.g. ibex is equally primed by *iten and *obux), but with a longer exposure period (66 ms) only consonant priming was found. A consonant letter advantage was found in several other unrelated languages, such as Spanish (Perea & Lupker, 2004; Carreiras et al., 2009a,b) and Thai (Winskel & Perea, 2013).

The distinction between consonants and vowels seems to be imported into the orthography of alphabetic systems (Van Ooijen, 1996), in at least two ways. First, as outlined above, consonant letters are more effective facilitators in priming tasks, at least at later stages of visual processing (New et al. 2008; New & Nazzi, 2014). Second, low performance with orthographic representations can selectively affect one class and not the other. One example of this is that deficits to the graphemic buffer can result in C- or V-letter errors, but not both in the same word, i.e. <chain> is likely to be misspelled as <*chaon> or <stain>, but not as <*chaln>. Another is that vowels are disproportionately affected for some people with dyslexia (Khentov-Kraus & Friedmann, 2011).

Bearing in mind the above evidence, which supports the salience of consonants in the lexical domain, we now move to a presentation of some of the factors governing word recognition cross-linguistically. After addressing form similarity and frequency-governed visual processing, we illustrate how the results of visual word processing obtained from Semitic languages are accounted for within a universal perspective. We also show that some of these results cannot easily be accommodated under a C-root Approach, including overlooked results from the very studies that claimed to support it.

2 Visual word processing: Frequency, neighborhood size and affix stripping

In this section, we argue that visual word processing of alphabetic systems is governed by the same factors for all languages, including Semitic languages. We demonstrate that most effects observed in masked visual priming experiments reflect a very early form processing that is not unique to linguistic stimuli. Understanding C-root effects in the visual modality first requires considering the lexicon of visual objects in Semitic languages, rather than the lexicon of linguistic representations.

A key fact about visual processing in general is that experience with a stimulus affects the degree and nature of neural activation. During our lifetime, we accumulate experiences of reading, resulting from exposure to letters of various fonts, sequences of letters of varying frequencies, and, most importantly, words, which are the most common sequences of letters as well as the units that interact with other linguistic representations. Letters and words are the visual objects we parse during reading. As noted, one of the insights about visual processing in general is that we have more stable and clear representations of familiar objects than unfamiliar ones. Moreover, we parse objects faster when they are presented in the company of familiar objects. For example, letters embedded in words (e.g. <s> in the English word <flash>) or in word-like letter strings (<s> in <frish>) are more efficiently recognized than letters embedded in unusual letter strings (<s> in <rfhsl>). Similarly, but from the other direction, uncommon sequences of letters (e.g. <xwys>) are easier to dismiss as non-words than common sequences (e.g. <waugh>; Binder et al., 2006). That is, it is easier to parse common sequences and harder to dismiss them as non-words because increasingly large sequences of letters are activated during their processing. In models of visual word processing, this is captured by a distributed activation network, in which word forms are not represented by a single node, but as a pattern of activation over their sub-parts (Seidenberg & McClelland, 1989).

There is also neurological evidence for the role of experience in visual processing. Studies show an effect of experience with objects (van Turennout et al., 2000) in the area identified as the Visual Word Form Area in the occipitotemporal cortex of the left hemisphere (Cohen et al., 2002), i.e. the area known for its activation during reading. Specifically, activations in this area are increasingly more selective for higher-level stimuli (i.e. longer, frequently co-occurring sequences of letters) toward the anterior fusiform region in the left hemisphere (Vinckier et al., 2007). This sensitivity to common word-like strings of letters is not dependent on the lexical status of the word; orthographically legal non-words show the same effect (Binder et al., 2006). Therefore, one important factor in visual word recognition is the frequency of the sequence of letters (i.e. the word): high frequency facilitates recognition.

A related crucial factor in visual word recognition is the neighborhood density of a word, i.e. how similar the given word is to other words in the language. An example of a neighborhood-size metric is Levenshtein’s measure (Levenshtein, 1966; refined in Yarkoni et al., 2008), which defines the orthographic neighborhood size of a word w as the number of words created by a single-letter edit from w, i.e. insertions, substitutions or deletions in the refined definition. For example, farm, firm, forms and for are all one-edit neighbors of the word form; from is a one-edit neighbor only if a transposition of adjacent letters is counted as a single edit, and otherwise requires two substitutions. Words can reside in dense neighborhoods, i.e. have many word forms that can be made by 1-2 edits, or sparse neighborhoods, in which as many as 4 edits are needed in order to meet the nearest lexical neighbor (e.g. one of the nearest neighbors of pistachio is pitch). Research has shown that in lexical decision tasks without priming, dense neighborhoods facilitate recognition of real words and inhibit the recognition of non-words (Binder et al., 2006). The notion of “similarity” in the visual modality, however, is much more complicated, with many factors such as the number of positions in which substitution of a letter can create another word, and the positions themselves, also playing important roles (Luthra et al., 2020).
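The edit-based notion of neighborhood can be illustrated with a short Python sketch. It implements the standard unit-cost edit distance (insertion, deletion, substitution) and counts neighbors within a toy lexicon; the lexicon and the function names are assumptions made for illustration, and a realistic estimate would of course require a full word list for the language in question.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


def neighbors(word, lexicon, max_edits=1):
    """Words in `lexicon` (other than `word`) within `max_edits` edits."""
    return sorted(w for w in lexicon
                  if w != word and levenshtein(word, w) <= max_edits)


# Toy lexicon, assumed for illustration only.
toy_lexicon = {"form", "farm", "firm", "forms", "for", "from",
               "pitch", "pistachio"}

print(neighbors("form", toy_lexicon))     # ['farm', 'firm', 'for', 'forms']
print(levenshtein("form", "from"))        # 2: a transposition costs two edits here
print(levenshtein("pistachio", "pitch"))  # 4
```

A word in a dense neighborhood returns a long list for `max_edits=1`, whereas for a word like pistachio the list stays empty until the threshold is raised considerably.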

So far, we have discussed words and non-words as simple sequences of letters. This is inaccurate in at least two ways. First, as mentioned in Sect. 1.2, there is reason to believe that the status of a letter as a consonant/vowel is crucial even at very early stages of visual word processing (New & Nazzi, 2014; New et al., 2008; Schubert et al., 2018). Second, there is evidence that morphologically complex words are processed differently than monomorphemic words in the visual modality. Research on Indo-European languages revealed that sequences of letters that tend to occur at word edges are decomposed from the rest of the word at early stages of processing (Rastle & Davis, 2008). For example, although <brothel – broth> and <brother – broth> share the same degree of form similarity, the latter induces significantly more priming, since -er is a common sequence of letters at the right edge of English words, and is therefore rapidly stripped from the base, which is identical to the target.
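The logic of form-based affix stripping can be sketched as follows. The suffix list, the minimum stem length and the function name are hypothetical choices made for illustration; the sketch is not a model of the parser proposed in the cited work, only a demonstration that a purely orthographic procedure separates brother from brothel.

```python
# Hypothetical list of letter sequences that are common at the right
# edge of English words; a real parser would derive it from corpus counts.
SUFFIXES = ("er", "ing", "ed", "ly")


def strip_affix(word, suffixes=SUFFIXES, min_stem=3):
    """Split `word` into (stem, suffix) on purely orthographic grounds.

    The longest matching suffix is stripped, provided the remaining
    stem has at least `min_stem` letters; meaning plays no role.
    """
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)], suffix
    return word, ""


print(strip_affix("brother"))  # ('broth', 'er')
print(strip_affix("brothel"))  # ('brothel', '')
```

Because the procedure consults only letter sequences at the word edge, it returns broth for brother even though the two words are unrelated in meaning.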

This visual morphological decomposition process was shown to be automatic, not sensitive to semantics (at least during early processing; though see Feldman et al., 2009), and to begin early in the course of processing, at a stage that might not even be sensitive to the phonological status of the letter as a consonant/vowel (30-50 ms; consonant-specific facilitation in French was reported with an SOA of 66 ms, New & Nazzi, 2014). This quasi-decomposition mechanism is reflected, among other things, in shorter RTs for decomposable words compared with morphologically simple words of the same length, a tendency that is enhanced in low-frequency words and more pronounced for children with dyslexia (Burani, 2010; Deacon et al., 2011).

There is some debate as to whether this effect reflects true morphological processing, since it is also congruent with the above-mentioned familiarity effect in the visual system. An alternative to a true morphological conceptualization of this process is a morpho-orthographic parser in which an affix is parsed as a separate unit, a possibility outlined in Rastle and Davis (2008). In such a model, decomposition would depend on how consistently a string that occurs at the word edge tends to represent a separate morpheme vs. part of the stem. However this effect is conceptualized, it is important to note the special status of letters at word edges in visual experiments; if primes or targets are likely to be decomposed, this could have an effect on the pattern of facilitation.

In sum, the picture of visual word recognition that arises from previous studies suggests an early visual word processing mechanism that relies on sequence frequency and neighborhood size, and an obligatory stage of affix stripping, i.e. parsing of common letter sequences at word edges. Also recall the consonant advantage in visual word recognition discussed above.

In the rest of this section, we show how data from visual experiments in Semitic languages fit this picture. The experimental data described below are summarized in Appendix Table A.

2.1 Masked priming lexical decision: Consonant advantage and affix stripping

In this section, we revisit results from Hebrew and Arabic, and give them a broader context by also reviewing results from similar experiments in French, English and Spanish. We argue that the results attributed to the C-root can be explained within a universal context, with reference to consonant advantage (Sect. 2.1.1), affix stripping (Sect. 2.1.2) and the degree of morphological decomposability (Sect. 2.1.3).

A key observation to keep in mind is that the Semitic languages that have been at the center of psycholinguistic investigations, Arabic and Hebrew, have dense visual lexicons, i.e. most words are from dense neighborhoods (in contrast with Maltese, another Semitic language; see Sect. 2.2). The high density in Hebrew and Arabic is due to two factors: (a) the prosodic limits on word size – words exceeding three syllables are relatively rare; and (b) the under-representation and inconsistency of vowels in writing – the vowels a and e are rarely represented in the orthography; o, u, and i are optional in some words (e.g. kotel ‘wall’ is spelled with or without the letter <w>, i.e. <kwtl> or <ktl>); vowels might also obligatorily surface in a word with a particular vocalic template and not in its close relative (e.g. <pryxh> ‘blossom’ vs. <prx> ‘blossomed’). As maintained by authors in the visual word recognition literature, in a view that we call the Orthographic Universalist Approach (e.g. Grainger & Ziegler, 2011; Davis, 2012; Norris & Kinoshita, 2012a; Whitney, 2012), the divergence between form-based similarity in Indo-European languages vs. apparent morphologically-based similarity in Semitic languages is the result of statistical facts about the distribution of consonantal letters within lexical items. As highlighted above, the number of shared letters is not a good metric for form similarity between words.

2.1.1 Consonant advantage

Boudelaa and Marslen-Wilson (2005) explored the timeline of priming by stem consonants and configuration in Arabic, manipulating the time interval between prime and target presentation (with SOAs of 32, 48, 64, & 80 ms). The stem consonants priming condition included words sharing all three stem consonants (e.g. C-root ħ-r-q in prime <ʔħtrq> ‘he burned’ and target <ʔħrq> ‘he set ablaze’), with half of the C-root primes semantically related to the targets. The configuration priming condition involved words sharing a configuration (e.g. configuration CaCC in prime <laħðʕ> ‘notice’ and target <safr> ‘travel’), with primes and target not semantically related. They found a consistent facilitation effect in the C-root condition across all SOAs, and a small facilitation effect in the configuration condition only in the 48 ms SOA. Semantic relatedness showed an effect above and beyond phonological similarity in the C-root condition, but only with an SOA of 80 ms.

Although the authors took these findings as evidence for the existence of a C-root, they can be equally explained by the general consonant advantage that was observed in different languages. The facilitation effect in the root condition is compatible with the timeline of consonant advantage in French (see Sect. 2), which was also observable across SOAs of 33 and 66 ms (New & Nazzi, 2014). The second finding, that configuration priming is only facilitatory at relatively early stages, requires an in-depth investigation. It will be noted, however, that at least two types of prime-target relations in Indo-European languages also fade at a comparable rate: first, the aforementioned vowel letter priming (e.g. <vobi> primes <joli>); and second, morphological priming with opaque semantic relationships in English pairs like <APARTMENT>–<apart>, which was observed with an SOA of 43 ms but not with 72 ms (Rastle et al., 2000).

2.1.2 Affix stripping

Frost et al. (1997) were the first to claim that Hebrew visual word recognition is qualitatively different from visual word recognition in Indo-European languages. The main finding of this study is that primes of the same length, which share the same number of letters with the target, induce different effects: primes sharing all three stem consonant letters with the target induce facilitation relative to primes that share two stem consonant letters plus an affix/vowel letter. For example, <rʔh> ‘he saw’ primes <mrʔh> ‘mirror’ better than <mʔh> ‘hundred’. Note that the semantic relationship between the prime and target is not relevant here, because the authors used an SOA of 43 ms, which has previously been shown not to be sensitive to semantic relations (e.g. Boudelaa & Marslen-Wilson, 2005 described above; see also discussion in Kinoshita & Norris, 2012). In their series of experiments, Frost et al. (1997) demonstrate the same for Hebrew: semantically related pairs that did not share the C-root did not prime each other, and semantically related C-root pairs did not significantly differ in the magnitude of priming from semantically unrelated pairs.

According to Frost et al., these results support the C-root: all letters of the prime <rʔh> are C-root letters in both the prime and the target, while out of the letters of the ineffective prime <mʔh>, two are part of the C-root and the third is a prefix in the target.

We argue that the familiar process of affix stripping (described in other languages and explained in Sect. 2 above) can explain these results just as well. Particularly, after affix stripping (the initial m in <mrʔh> is a common prefix), the target is identical to the effective prime <rʔh>. This reflects a general property of Frost et al.’s design, not limited to the current example: 27 of the 48 pairs used in the experiment shared exactly the same relations, such that after affix stripping the prime and target were identical. In the remaining 21, the third (non-stem) shared letter between the ineffective prime and the target was a vowel letter, e.g. <kpl> ‘multiplied’ primed <kpyl> ‘a double’, contrary to <pyl> ‘elephant’. In these cases, the target does not include an affix, but a medial vowel letter. Recall that consonant letters were found to be more facilitatory than vowel letters with an exposure of 66 ms in French; although in the current case, primes were only presented for 43 ms, the differential status of consonants was likely to have played a role. In addition, letters at word edges seem to affect perceived similarity to a greater extent than letters in medial positions; see discussion on the Transposed Letter effect in Sect. 2.2.

In sum, effective primes shared with the target exactly three consonant letters that are part of the stem, while ineffective primes shared two consonant letters that are part of the stem in addition to a third letter with a diminished status – a vowel letter or an affix. This systematic asymmetry in the status of the letters of the prime can account for the differential priming effects induced by these primes without appealing to the C-root.

A similar case can be made for Velan et al.’s (2005) findings from Hebrew, whereby priming with two letters of the stem was more successful than priming with a prefix letter plus a stem letter (e.g. <pl> primes <mpwlt> ‘avalanche’ better than <pt>). Here too, affix-stripped targets were systematically more similar to the effective than to the ineffective primes; in fact, as in the stimuli in Frost et al. (1997), two factors contributed to the higher similarity between the effective prime and the target: affix stripping and consonant superiority. For example, the affix-stripped stem of the target <pwl> is more similar to <pl> than to <pt> because they share all consonant letters; <pt> shares one consonant letter with <pwl>. All the stimuli pattern similarly.

2.1.3 Degree of morphological decomposability

Some of the findings obtained from studies on Semitic languages are difficult to reconcile within the C-root Approach. One such finding, reported in Frost et al. (1997), is that legal nonword primes, consisting of existing C-roots and a permissible configuration, did not facilitate lexical decision for targets sharing the same putative C-root (e.g. *<mqrh> did not prime <qyr> ‘harvest’ despite the shared C-root q--r). This is unexpected under the C-root Approach, because the C-root is considered a morphological unit represented in the lexicon and thus should automatically be extracted and guide lexical retrieval (Footnote 5). These findings also contrast with those from French pseudo-derived words (Longtin & Meunier, 2005), where nonwords like *<sportation> prime <sport>, although this combination of stem and affix is invalid even at the syntactic level (in real words, -ation only attaches to verbs).

We attribute this difference between Hebrew and French to the degree of morphological decomposability, affecting the speed of the process of affix stripping. Rastle and Davis (2008) propose several factors that affect affix stripping, one of which is the transitional probabilities between letters. Such a mechanism predicts that decomposition would depend on how consistently a string that occurs at the word edge tends to represent a separate morpheme vs. part of the stem. For example, unhappy is the common case for un, while uncle is the rare one; that is, most words containing the prefix un- are truly decomposable. For de, however, it is the other way around, with decay being the common case and devalue the rare one; that is, most words containing the prefix de- are non-decomposable, i.e. their stems are not words in their own right. This difference in compositionality means that the transitional probabilities between un and the following letters drop significantly more than in the case of de. Chateau et al. (2002) found that words with high-consistency prefixes facilitate each other to a greater extent than words with low-consistency prefixes (e.g. immobile facilitates impatient to a greater extent than adventure facilitates administer, although all words are decomposable). Moreover, in pseudo-derivations, high-consistency prefixes facilitate real derived words, but low-consistency prefixes do not (e.g. uncle facilitates unable, but affiliate does not facilitate affirm).
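The notion of prefix consistency can be made concrete with a small sketch. The code below is only a rough proxy, assuming a toy lexicon assembled for illustration: it estimates consistency as the share of words beginning with a given string whose remainder is itself a word in the lexicon. The function name `prefix_consistency` and the word list `TOY_LEXICON` are hypothetical, and the measure is not the transitional-probability mechanism of Rastle and Davis (2008) nor the consistency measure of Chateau et al. (2002).

```python
def prefix_consistency(prefix, lexicon):
    """Share of words starting with `prefix` whose remainder is itself a word.

    A crude stand-in for decomposability: 'unhappy' counts as decomposable
    because 'happy' is in the lexicon, 'uncle' does not because 'cle' is not.
    """
    candidates = [w for w in lexicon
                  if w.startswith(prefix) and len(w) > len(prefix)]
    if not candidates:
        return 0.0
    decomposable = [w for w in candidates if w[len(prefix):] in lexicon]
    return len(decomposable) / len(candidates)


# Toy lexicon, assumed for illustration only; real estimates would
# require a full frequency-weighted lexicon.
TOY_LEXICON = {
    "unhappy", "unable", "unfair", "uncle", "happy", "able", "fair",
    "decay", "decide", "deliver", "devalue", "value",
}

print(prefix_consistency("un", TOY_LEXICON))  # 0.75
print(prefix_consistency("de", TOY_LEXICON))  # 0.25
```

In this toy setting, un behaves as a high-consistency string and de as a low-consistency one, mirroring the contrast described above. With a realistic lexicon the remainder test would also misfire on cases like deliver (whose remainder liver is a word), so any serious estimate would need morphological annotation.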

In the case of French, there would typically be a drop in probability between a stem and an affix (e.g. between <sport> and <-ation>). That is, upon encountering a word ending in <ation>, it is statistically very plausible that this part of the word is a suffix. Thus, French pseudo-derived words prime their stems.

In contrast with many Indo-European affixes, identification of affixes in writing is much more difficult in Hebrew, due to their size and ambiguity. Affixes in Hebrew typically include one to two letters (due to the near absence of vowel letters) and are often not distinguished from stem consonants. For example, the morphological status of a word initial <n> in Hebrew is ambiguous, because <n> serves as a prefix (e.g. <nlħm> nilχam ‘he fought’) and is also common as a stem consonant (e.g. <nyħʃ> niχeʃ ‘he guessed’). Similarly, <m> can be a prefix or a stem consonant (e.g. <m> manaχ ‘position’ vs. <mnh> mana ‘portion’). The nominal system of Hebrew is much more irregular than the verbal system and it has a greater variety of affixes, and this additional load might make the decomposition of nouns slower (Footnote 6).

For real word items, decomposition might still occur in Hebrew; the overall frequency of the word and of the stem, and the experience with the entire sequence of letters may make the process fast enough to result in priming. For pseudo-derived items, however, decomposition is slower due to the ambiguity of letters between stem and affix. Therefore, pseudo-nouns with an affix fail to prime nouns with the same stem consonants in Hebrew, as observed in Frost et al. (1997).

Importantly, when the priming conditions, inter-stimulus interval and material composition are comparable, Spanish readers behave like readers of Semitic languages. Duñabeitia and Carreiras (2011) found that stem consonant overlap (e.g. <frl> – <farol> ‘lantern’) induces significant priming compared with vowel overlap (e.g. <aeo> – <acero> ‘steel’), as with Frost et al.’s (1997) experiment, in which stem consonant letters that do not by themselves comprise a word primed words that shared them. The magnitude of the priming effect compared with unrelated primes was also similar between the two languages. In an ERP study using the same materials, Carreiras et al. (2009a) found significant differences in two components when comparing the C-related with the V-related condition: the N250 and the N400. Both showed similar large negativities in the identity and C-related condition. Taken together, these results suggest that consonants are lexically more restrictive than vowels not only in Semitic.
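The consonant-only and vowel-only primes described above can be derived mechanically from the target. The following is a minimal sketch, assuming lowercase Latin-script input and treating only the five plain letters a, e, i, o, u as vowel letters (accented vowels are not handled); the function names are hypothetical and are not taken from Duñabeitia and Carreiras (2011).

```python
VOWEL_LETTERS = set("aeiou")  # assumption: plain, unaccented vowel letters only


def consonant_skeleton(word):
    """Keep only the consonant letters of `word`, in their original order."""
    return "".join(ch for ch in word if ch not in VOWEL_LETTERS)


def vowel_skeleton(word):
    """Keep only the vowel letters of `word`, in their original order."""
    return "".join(ch for ch in word if ch in VOWEL_LETTERS)


print(consonant_skeleton("farol"))  # frl
print(vowel_skeleton("acero"))      # aeo
```

Applied to the two Spanish examples, the functions reproduce the primes <frl> and <aeo> cited in the text.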

2.2 The orthographic representation of the “Root” is not atomic

We showed that the evidence for stem consonant facilitation found in Hebrew and Arabic priming experiments is compatible with a theory of lexical activation that relies on consonant bias and morphological decomposability. The current section shows that yet another visual effect that was attributed to the C-root – the transposed letter effect (hereafter TL) – is not likely to be due to the atomic status of the C-root. While we do not propose a full account of the lack of TL priming in Hebrew, we highlight two related findings that undermine the C-root account: TL priming was reported in at least one Semitic language (Maltese); and C-root letter switch, unlike C-root transposition, was effective in another (Arabic).

In Indo-European alphabetic systems, switching the positions of two adjacent consonant letters within a word results in robust priming (e.g. Perea & Lupker, 2003; Kinoshita et al., 2009). For example, *<jugde> can easily be read as <judge> in English. Velan and Frost (2011) demonstrated that in Hebrew, medial transpositions, as in *<srpyh> for <spryh> sifrija ‘library’, are detrimental for reading. They also found that affixless Hebrew loan-nouns with five or more letters, described as having no C-root (e.g. <ʔgrtʕl> agartal ‘vase’), show a similar behavior to that of English words; letters in such words can be transposed with a minimal effect on reading. These findings were taken to indicate that Hebrew consonant letters code position more rigorously than English ones, but only when there is a C-root morpheme in the representation (Deutsch et al., 2000; Velan & Frost, 2009; Frost, 2012). That is, the orthographic representation of Hebrew readers is sensitive to morphological structure, distinguishing between stem consonants that form a C-root and stem consonants that do not.
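As an illustration of how transposed-letter primes of this kind are formed, the sketch below swaps each pair of adjacent medial letters while keeping the first and last letters in place. It is a simplification under stated assumptions: the function name `medial_transpositions` is hypothetical, the input is a plain string of letters (here, the transliterations used in the text), and the code does not check whether the swapped letters are consonants, as the experimental materials do.

```python
def medial_transpositions(word):
    """Return all strings obtained by swapping two adjacent medial letters.

    The first and last letters stay fixed; swaps of identical letters are
    skipped because they would reproduce the original word.
    """
    primes = []
    for i in range(1, len(word) - 2):
        if word[i] != word[i + 1]:
            letters = list(word)
            letters[i], letters[i + 1] = letters[i + 1], letters[i]
            primes.append("".join(letters))
    return primes


print(medial_transpositions("judge"))  # ['jduge', 'jugde']
print(medial_transpositions("spryh"))  # ['srpyh', 'spyrh']
```

The outputs include the two examples from the text, *<jugde> for <judge> and *<srpyh> for <spryh>.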

We object to this interpretation on two grounds. First, the TL effect has many exceptions in Indo-European systems as well, not all of which are currently accounted for by leading word recognition models. For example, when the transposed letters are the first two in the word (e.g. <*csout> for <scout>), or are at the edge of a morpheme boundary (<*dresesr> for <dresser>), the effect disappears (Duñabeitia et al., 2007; Kinoshita et al., 2009). In addition, the effect is sensitive to the status of the letter as a consonant or vowel: there is priming when consonants are transposed across a vowel (<*condiser> for <consider>), but not when vowels are transposed across consonants (<*cinsoder>; Perea & Lupker, 2004). Recently, Schubert et al. (2018) showed that the asymmetry between consonants and vowels was eliminated when transpositions involved singleton consonants or vowels, rather than those in clusters (e.g. <*cholocate> vs. <*chocalote> for <chocolate>), suggesting that the difference is not in the degree of precision of C/V encoding, but rather in the difficulty of determining letter position within a sequence of letters with the same consonant/vowel status. That is, previous differential C/V effects merely mirror the fact that C-letter sequences are more common than V-letter sequences. This result is another demonstration of how the status of a letter as a consonant or a vowel plays a role in structuring orthographic representation. While models of visual word recognition have only recently started to integrate C/V status (e.g. Chetail et al., 2015), the lack of V-letter representation in Hebrew is likely to contribute to the lack of a TL effect.

Support for this direction of investigation comes from Maltese, a Semitic language which contains many unassimilated non-Semitic words. Perea et al. (2012) showed that letter transposition in Maltese does not hinder reading, even though the Semitic words in Maltese are analyzed as containing C-roots and templates. The authors demonstrated that in Maltese, words with a Semitic structure can be transposed with a minimal effect on reading, as in Indo-European languages. As they point out, and in line with the intuition that the lack of V-letter representation in Arabic and Hebrew plays a role, Maltese is written in Latin script where vowels are fully specified. Therefore, the statistical distribution of letters in Maltese is less “compressed” than that of Arabic or Hebrew, similarly to Indo-European languages. This finding therefore supports the claim that the distribution of letters within visual words, rather than the existence of a C-root, is the key to understanding the detrimental effect of letter transposition in Hebrew.

A related point is demonstrated in Geary and Ussishkin (2018), who set out to test C-root priming in Maltese, over and above consonant letter priming, using a masked priming task. Using the multiple lexica in Maltese, the experiment tested whether Semitic and non-Semitic words differ in priming by consonant letter representation. The results show facilitation only for Semitic words and not for non-Semitic words (e.g. Semitic <ħmk> primed <ħimek> while non-Semitic <rbr> did not prime <ribra>). The authors took this as evidence that the representation of Semitic words is qualitatively different from that of non-Semitic words within the same lexicon.

However, as Schubert et al. (2018) demonstrated, the distribution of consonant and vowel graphemes within a word plays a role in structuring orthographic representations, and we claim that this is a possible explanation for the results obtained in Geary and Ussishkin (2018). Almost all the Semitic words in the experiment had CVC at the end of the stem (47/48), where the vowel grapheme was mostly <e> or <a> (45/48). The syllable structure in the non-Semitic words was more diverse, with few ending in CVC (5/48) or CVCC (3/48) and the majority ending with a vowel (40/48). If the visual lexicon of Maltese is skewed towards words that end with a C-letter, such words may benefit more from priming. We therefore find these results inconclusive. If the results arise from the distribution of C and V letters in the Maltese lexicon, a prediction arises that non-Semitic words in Maltese with a similar syllable structure would show the same facilitatory effect as Semitic words. While it is impossible to test this prediction post-hoc on the results of Geary and Ussishkin (2018) due to the scarcity of appropriate stimuli, this case raises a clear divergent prediction between the phonological (Universal) and morphological (C-root) approaches that future studies can test.

Our second objection to the interpretation of TL effects is based on the fact that despite the alleged atomicity of the C-root representation under a C-root approach, sub-units of the root can induce form priming comparable with that found for Indo-European languages. Thus, Perea et al. (2014) observed that while the distributional properties of Hebrew and Arabic scripts, contrary to those of Indo-European languages, are such that transposing two letters within a word is very likely to create a new real word, replacing a letter has about the same chance of creating an existing word in both language families; that is, cases like ramp–damp are as common in English as they are in Arabic. Under a Universalist Approach, therefore, performance in Semitic and non-Semitic languages should be comparable in priming experiments with switched-letter words (e.g. <gdl> gadal ‘he grew’ – <gml> gamal ‘he rewarded’), since the source of priming is form similarity. On the other hand, under the C-root Approach, partial form overlap is predicted to have no effect, since priming in Semitic languages should rely solely on morphological organization. In a series of priming experiments, Perea et al. (2014) found that Arabic switched-letter word pairs significantly prime each other (<*kxb> primes <ktb> ‘he wrote’), as expected under the Orthographic Universalist Approach but not under the hypothesis that C-root governs priming in Semitic languages.
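The distinction between transposition neighbors and substitution neighbors can be sketched computationally. The code below is an illustration only, assuming a tiny English word set chosen by hand (`TOY_LEXICON`) and lowercase ASCII letters; the function names are hypothetical, and the counts say nothing about the actual distributions in English, Hebrew or Arabic, which would require full lexicons.

```python
import string


def transposition_neighbors(word, lexicon):
    """Words in `lexicon` formed by swapping two adjacent letters of `word`."""
    found = set()
    for i in range(len(word) - 1):
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped != word and swapped in lexicon:
            found.add(swapped)
    return found


def substitution_neighbors(word, lexicon, alphabet=string.ascii_lowercase):
    """Words in `lexicon` formed by replacing exactly one letter of `word`."""
    found = set()
    for i in range(len(word)):
        for ch in alphabet:
            candidate = word[:i] + ch + word[i + 1:]
            if candidate != word and candidate in lexicon:
                found.add(candidate)
    return found


# Toy word set, assumed for illustration only.
TOY_LEXICON = {"ramp", "damp", "lamp", "rasp", "calm", "clam"}

print(sorted(substitution_neighbors("ramp", TOY_LEXICON)))   # ['damp', 'lamp', 'rasp']
print(sorted(transposition_neighbors("ramp", TOY_LEXICON)))  # []
print(sorted(transposition_neighbors("calm", TOY_LEXICON)))  # ['clam']
```

Running both functions over every word of a full lexicon and averaging the set sizes would give the kind of cross-linguistic comparison that the observation attributed to Perea et al. (2014) rests on.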

In fact, a similar result was obtained in a previous study on Hebrew (Velan & Frost, 2011), where comparable materials induced a facilitation effect (e.g. *<tʃmyl> primes <trmyl> taʁmil ‘backpack’). This was a condition that the authors dubbed the “unproductive root” condition, since targets included three consonants that do not occur in other Hebrew stems (e.g. r-m-l, the C-root of taʁmil, is not found in any other Hebrew word). This criterion of inclusion ensures that words in the “unproductive root” condition were from sparse neighborhoods, because most words in Semitic share consonants with other words. In the “productive root” condition, i.e. with words from a dense neighborhood, no priming was found. In general, targets from dense neighborhoods are less effectively primed by form-related items in the visual modality; in fact, they were even shown to get a significantly smaller advantage from identity priming (Perea & Rosa, 2000; Kinoshita et al., 2009), which is likely to be the reason that the effect was not robust for items from a dense neighborhood.

3 Auditory word processing: rhyme and consonant facilitation

We have shown that in the visual modality, alleged C-root effects can be attributed to form similarity facilitation. Furthermore, we noted two bits of behavioral evidence that weaken the C-root Approach: (i) consonant letter facilitation is significant not only in Semitic but also in French and Spanish (Boudelaa & Marslen-Wilson, 2005; New & Nazzi, 2014; Duñabeitia & Carreiras, 2011); and (ii) sub-parts of the C-root are effective facilitators, provided that the distributional properties of the stimuli are controlled (Velan & Frost, 2011; Perea et al., 2012, 2014). The first data point undermines claims regarding the uniqueness of stem consonant facilitation in Semitic languages, and the second one goes against the atomicity attributed to the C-root.

The present section is dedicated to results from priming in the auditory modality, with reference to five experiments conducted in the past decade – one on Dutch (Cutler et al., 1999), three on French (Delle Luche et al., 2014; Turnbull & Peperkamp, 2017), and one on English (Delle Luche et al., 2014). These studies allow a relatively direct comparison between Semitic and non-Semitic languages, since they take into account the asymmetries between consonants and vowels (see Sect. 1.2). We use these studies to set a new universal frame for results from Moroccan Arabic, Modern Standard Arabic, Maltese and Hebrew.

3.1 Auditory priming in French, English and Dutch

The key point is that similarity in phonology (just as in orthography) cannot be measured just as the number of shared segments. First, vowels and consonants are not treated alike; results from Indo-European languages suggest that words sharing consonants are processed as more related than words sharing vowels. Second, the position of overlap between consonants is crucial for the obtained effect, with rhymes playing a more central role. We thus propose that auditory word recognition relies on one set of principles, regardless of the language, and these principles include at least the advantage of consonants, the facilitation of rhymes, and the distributional properties of the lexicon.

Priming studies in the auditory modality in non-Semitic languages consistently reveal two form similarity facilitation effects: rhyme facilitation (e.g. time primes rhyme; Seidenberg & Tanenhaus, 1979; Radeau et al., 1995; Norris et al., 2002) and consonant facilitation (e.g. *benu primes bni; Delle Luche et al., 2014). Additionally, word-initial overlap was found to be inhibitory (Radeau et al., 1995; Dufour & Peereman, 2003). This effect is usually attributed to lexical inhibition between related items.

Rhyme facilitation was observed under varying conditions: across a variety of inter-stimulus intervals (ISIs), with the proportion of related items within the task ranging from 15-70%, and with no regard to prime/target lexicality or frequency, i.e. both words and non-words that share rhymes facilitate each other. Importantly, this effect dissipates under cross-modal presentation (see Dufour, 2008 for a thorough review). Taken together, the lack of sensitivity to frequency, lexicality and composition on the one hand and the sensitivity to the modality of presentation on the other suggests a pre-lexical auditory similarity effect.

Consonant facilitation has been studied to a lesser extent. Compared with the rich literature on priming in Semitic languages, which provided evidence for the consonant-vowel asymmetry, there are only a few studies on non-Semitic languages that focused on the role of consonants vs. vowels in auditory priming. One such study is Delle Luche et al. (2014), which examined the consonant vs. vowel asymmetry using an auditory priming paradigm in English and French. Using non-word primes in both conditions, they found that overall, consonant overlap facilitated responses to the target to a greater extent than vowel overlap, both in French and in English (e.g. for French speakers, *synoma is a better facilitator to sinema than *timema; for English speakers, *benu is a better facilitator to bni than *nzi). Importantly, in their design, either all vowels or all consonants were shared between prime and target. The facilitatory result was obtained with a short ISI of 10 ms, replicating a previous result with French written words (New et al., 2008).

Cutler et al. (1999) tested Dutch speakers using a cross-modal lexical decision task, in which auditory non-word or real word primes were mismatched with visual real word targets by one phoneme. They tested separately for consonant and vowel mismatches (e.g. kaper ‘pirate’ was preceded by koper ‘buyer’ or kamer ‘room’). The results revealed significantly shorter RTs with related primes compared with control primes, regardless of the lexical status of the word: words and non-words primed phonologically related words to the same extent, suggesting that the effect cannot be attributed to inter-word priming. Interestingly, unlike Delle Luche et al. (2014), Cutler et al. (1999) did not find a consonant/vowel asymmetry: words sharing all segments except one consonant and words sharing all segments except one vowel facilitated targets to a comparable degree.

It is important to note, however, that the design of Delle Luche et al. (2014) is more similar to the auditory studies done on Semitic languages, in at least four respects: (i) all non-identical vowels/consonants of the word were mismatched in Delle Luche et al., while only one segment was mismatched in Cutler et al.; (ii) Delle Luche et al.’s experiments were purely auditory, while Cutler et al.’s presentation of the stimuli was cross-modal; (iii) Delle Luche et al. included several syllable structures and stress patterns as part of the experimental design, while Cutler et al.’s design did not test specifically for syllable structure and stress; (iv) Delle Luche et al. tested three-syllable-long words in addition to words containing two syllables, while Cutler et al.’s design included only disyllabic words. The last two factors were shown to modulate the effect: Delle Luche et al. found significant differences between consonant and vowel priming in VCVC words (both trochees and iambs), but not in iambic CVCV words, in both English and French. Like Cutler et al. (1999), they found no significant difference in the magnitude of priming for consonant and vowel substitutions in iambic CVCV pairs, but they found significant consonant superiority in other pairs (CVCV trochees, VCVC words and, in Experiment 3, which controlled for the rhyme facilitation effect and tested only French, CVCVCV iambs). To summarize, it seems that consonant advantage occurs when primes and targets share all consonants, and the rhyme effect is controlled.

Turnbull and Peperkamp (2017) provide further support for this conclusion. They focused on mismatched segments in all possible positions within monosyllabic words. They tested French words with five types of primes: three types with one segment mismatched (_VC, C_C, CV_), and two baseline types with either all segments identical (the identity condition, assumed to induce maximal facilitation) or none (the control condition). All primes in their experiments were real words, and half of the targets were non-words that were matched for relatedness with real words, to reduce the chances of a response bias. They found consonant facilitation priming (C_C) comparable in its magnitude of facilitation to rhyme priming (_VC). No facilitation was observed with primes that mismatch on the coda.
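The five prime types can be stated as a simple classification over three-segment items. The sketch below assumes that each monosyllable is given as a tuple of exactly three segments (onset, vowel, coda); the function name `prime_type` and the example segments are hypothetical placeholders, not items from Turnbull and Peperkamp (2017).

```python
def prime_type(prime, target):
    """Classify a CVC prime by the position in which it mismatches the target.

    Labels follow the text: '_VC' (onset mismatch, rhyme shared),
    'C_C' (vowel mismatch, consonants shared), 'CV_' (coda mismatch),
    'identity' (no mismatch) and 'control' (all three segments mismatch).
    """
    if len(prime) != 3 or len(target) != 3:
        raise ValueError("both items must have exactly three segments")
    mismatches = [i for i in range(3) if prime[i] != target[i]]
    if not mismatches:
        return "identity"
    if len(mismatches) == 3:
        return "control"
    if len(mismatches) == 1:
        return ("_VC", "C_C", "CV_")[mismatches[0]]
    return "other"  # two mismatches: not one of the five tested types


# Placeholder segments, assumed for illustration only.
target = ("b", "a", "l")
print(prime_type(("p", "a", "l"), target))  # _VC
print(prime_type(("b", "u", "l"), target))  # C_C
print(prime_type(("b", "a", "k"), target))  # CV_
```

Under this labeling, the reported pattern is facilitation for the `_VC` and `C_C` types and none for `CV_`.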

Taken together, the results from Turnbull and Peperkamp (2017) and Delle Luche et al. (2014) demonstrate consonant priming, highlighting again the enduring role of consonants in lexical retrieval, similarly to what we have seen in the visual modality.

Finally, the independence of morphological priming from semantic and phonological factors was also explored in auditory paradigms. As in the visual modality, morphologically complex words are processed faster than simple words, all other things being equal (Balling & Baayen, 2008). In auditory priming experiments, French morphologically related pairs (gamin ‘child fm.’ – gamε̃ ‘child ms.’) prime each other to a higher degree than phonological (e.g. mãndarε̃ ‘Mandarin Chinese’ – mãndarin ‘Mandarine’) and semantic (e.g. ‘boy’ – fij ‘girl’) controls (Kouider & Dupoux, 2009). Bacovcin et al. (2017) similarly demonstrated that during the processing of past tense inflected verbs in English, the phonological representation of the stem becomes available on its own. Using the rhyme facilitation effect, they showed that primes that rhyme with the stem of the target (e.g. dough for snowed) induce a facilitatory effect compared with primes that do not rhyme with the stem (void). Together, this seems like evidence that bare stems are made available during the auditory processing of inflected words. Derivational morpheme priming was also found to be facilitatory in a bi-modal paradigm in English, with words sharing the same phonological material (e.g. darkness primes toughness but not harness; Marslen-Wilson et al., 1996). Facilitation was also modulated by the productivity of the affix (-ness can attach to almost any English adjective while -ment is restricted), such that productive affixes induced more priming.

In sum, priming in the auditory modality exhibits the following effects:

  1. Final overlap is in general facilitatory (rhyme effect);

  2. Word-initial overlap priming is in general inhibitory (cohort competition effect);

  3. Consonants are more facilitatory than vowels, e.g. a mismatched coda consonant eliminates facilitation while a mismatched vowel induces facilitation in CVC items;

  4. Morphologically complex words make available during processing both the bare stem and the affix. This is true at least for cases tested so far, in which the affix was productive.

3.2 Semitic findings from the Cross-modal lexical decision task

The idea of presenting primes and targets in different modalities was conceived to avoid purely low-level similarity effects (like the modality-dependent rhyme facilitation effect) and tap into more abstract representations. Indeed, some similarity effects survive cross-modal presentation, and some do not; consonant priming seems to be one of the effects that remain under cross-modal presentation (Turnbull & Peperkamp, 2017).

From its inception, cross-modal priming was shown to be sensitive to semantic relations (Swinney, 1979). It is therefore not surprising that in cross-modal lexical decision, Hebrew words show an effect of semantic priming, in addition to the form priming effect found in masked visual lexical decision (Frost et al., 2000). While the facilitative effect of semantic relatedness is expected, another central effect, more relevant for our discussion, was robust stem consonant facilitation, e.g. madʁiχ ‘guide’ (auditory prime) primed <hdrxh> ‘guidance’ (visual target) to a greater extent than mehudaʁ ‘fancy’ (see Table 3). A similar result was obtained in Arabic, but with semantic relatedness playing a more limited role (Boudelaa & Marslen-Wilson, 2015).

Table 3 Examples of experimental conditions and the results of Frost et al. (2000), Exp. 2. The examples are taken from an illustration table in the paper; RTs and error rates reflect mean across conditions. Affixes are marked in bold

Results of stem consonant priming are congruent with the Universalist Approach, under the assumption that the stem form becomes available during lexical activation (Bacovcin et al., 2017; see discussion above). For example, compare items from the “Phonological control” with the “C-root -semantics” condition from Frost et al. (2000, Exp. 2), given in Table 3 above. These examples reflect the relationship between stimuli within each condition. If some representation of the stem is available to listeners during processing, as in English auditory processing, then the C-root related prime shares significantly more phones/graphemes with the target.

3.3 Semitic findings from overt and masked unimodal auditory presentation

Unimodal auditory presentation allows exploring low-level similarity effects, as well as lexical competition effects. A commonly cited disadvantage of this paradigm is that participants are aware of the primes and are therefore more likely to develop response biases. For instance, if most real word targets share an onset with their primes, participants might be biased to accept targets with the same onset without hearing them through, yielding a facilitation effect, while with a different composition of the materials a shared onset could yield significant inhibition (see discussion in Dufour, 2008). As discussed in Sect. 3 above, this paradigm is nonetheless informative with regard to auditory word processing, e.g. the rhyme facilitation effect is unique to this modality and cannot entirely be reduced to a response bias (Norris et al., 2002).

In this section, we review studies on Semitic languages that used this paradigm, including a version of the task in which primes are made less perceptible by a form of auditory masking. As mentioned above, there is relatively little data on consonant facilitation effects in the auditory modality. Nevertheless, we show that results from Moroccan Arabic (Schluter, 2013) and Maltese (Ussishkin et al., 2015) mimic some patterns from English and French discussed in Sect. 3. The findings discussed below are summarized in Appendix Table B.

Schluter (2013) explored priming effects in the Moroccan Arabic verbal paradigm. Related pairs shared all stem consonants, with targets belonging to one configuration (CCVC) and primes to another (CVCCVC). Some primes were real words and others were not, but all C-root primes shared the same degree of consonant overlap with the target (see Table 4).

Table 4 Examples of experimental conditions in Schluter (2013), adapted from Table 3.7, Exp. 4. Note that only the C-root condition involves a non-word prime

Schluter found that non-word primes that share the stem consonants with the target induced a facilitation effect under overt auditory presentation. In fact, non-word primes were better facilitators than real-word primes under these conditions; that is, *həttər primed htər ‘talk nonsense’ to a greater extent than səʕʕəd primed sʕəd, compared with the control. A possible explanation for this difference in magnitude was proposed by Delle Luche et al. (2014): non-word primes might be more facilitatory at early stages since they do not activate a lexical item, an activation that should result in competition and induce lateral inhibition of related forms under most models of lexical retrieval. A related prediction is that words that have many competitors – i.e. words with many lexical neighbors – should not benefit as much from priming. As discussed in Sect. 2.2, this prediction is borne out in the visual modality: words from dense neighborhoods were found to benefit less from priming, including identity priming (Perea & Rosa, 2000; Kinoshita et al., 2009).

In order to closely investigate the timeline of consonant priming, Schluter (2013) conducted another experiment, this time with a masked presentation of the prime. In the masked auditory priming technique (Kouider & Dupoux, 2005), subjects are not fully aware of the prime, as it is embedded within samples of reversed speech and compressed to 35-40% of its original length. In French, identity priming for real words was facilitatory between 35%-70% compression; for non-words, facilitation was only observed between 50-70%. Similarly, in Moroccan Arabic, compressed C-root related non-words were not facilitatory (contrary to the results with overt presentation), but the C-root priming effect for real words remained. That is, səʕʕəd still primed sʕəd, but *həttər did not prime htər compared with the control condition.

If overt and masked priming had yielded the same result, lexicality could explain facilitation by real words in both, and the C-root could explain facilitation by non-words in both. However, the mismatch between overt and masked auditory priming with non-words is unexpected under the C-root Approach. This contrast is easier to explain within the broader context of word recognition. As mentioned above, object recognition in all domains of cognition relies on experience for accurate and fast results. We propose that the lack of priming by non-words in the masked paradigm is due to difficulty in parsing the prime – a previously unheard sequence – when it is compressed and embedded in noise; this results in longer processing times and renders the prime irrelevant for the processing of the target. This is consistent with results from French, reported in Kouider and Dupoux (2005): at a compression rate of 40% and below, non-words did not benefit from repetition priming while real words did.

In Maltese as well, target words were facilitated by primes that share their stem consonants (i.e. C-root) in both overt and masked priming (Ussishkin et al., 2015). In this series of experiments, priming of real words by real words was compared with priming of non-words by non-words, with the same degree of form overlap in the two cases; e.g. a real word pair like sikket ‘to silence’ – siket ‘to be quiet’ was compared with a non-word pair like *gemmeħ – *gemeħ. Non-words were composed of an existing configuration and three stem consonants that never appear in that order within a Maltese word.

The experiment was run in two versions: an unmasked and a masked priming lexical decision task. The result in both cases was priming with related real words, but not with related non-words, despite what seems like a perfect control for degree of similarity. Non-word targets benefited from identity priming but not from consonant sharing in the unmasked experiment; in the masked experiment, there was no priming for non-words at all, including identity priming. The authors took this as evidence that morphological, and not phonological, relations drive the results: if the effect were purely phonological, non-word targets would be expected to benefit from it as well.

However, it can also be claimed that non-words did not benefit from stem consonant overlap for different reasons. As mentioned above, non-words seem to need a longer processing time in order to be effective primes (Kouider & Dupoux, 2005). In addition, non-word targets might not benefit from exposure to a related prime to the same extent as real words. This is because acceptance responses (to real words) and rejection responses (to non-words) in lexical decision tasks are different – rejection typically takes longer, and error rates depend on the composition of the experimental list. Indeed, Ussishkin et al. note that in the overt experiment, error rates for non-word targets differed significantly depending on the prime, such that consonant-sharing primes induced more “false alarms” (Identity – 6.99% errors; Unrelated – 7.84% errors; Consonant-sharing – 15.2% errors). An opposite trend was observed with real words: consonant-sharing words were responded to more accurately (Identity – 10.42% errors; Unrelated – 11.64% errors; Consonant-sharing – 8.09% errors). This pattern of results might be interpreted as a “yes” response bias: participants tended to accept real words and non-words alike when they shared the consonants of the prime, making the recognition of consonant-sharing real words more accurate, and the recognition of consonant-sharing non-words less accurate.
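The direction of the two shifts can be checked with simple arithmetic on the error rates quoted above. The following Python sketch is only an illustration: the dictionary layout and the function names are our own assumptions, and a sign check of this kind describes the pattern without testing it statistically.

```python
# Error rates (%) for the overt experiment, as quoted in the text above.
# The nesting by target lexicality and prime condition is an assumed layout.
error_rates = {
    "non-word": {"identity": 6.99, "unrelated": 7.84, "consonant-sharing": 15.2},
    "real word": {"identity": 10.42, "unrelated": 11.64, "consonant-sharing": 8.09},
}


def shift_vs_unrelated(rates, condition="consonant-sharing"):
    """Change in error rate (percentage points) relative to the unrelated prime."""
    return round(rates[condition] - rates["unrelated"], 2)


def matches_yes_bias(rates_by_lexicality):
    """A "yes" bias predicts more errors on related non-words (false alarms)
    and fewer errors on related real words (easier acceptance)."""
    nonword_shift = shift_vs_unrelated(rates_by_lexicality["non-word"])
    word_shift = shift_vs_unrelated(rates_by_lexicality["real word"])
    return nonword_shift > 0 and word_shift < 0


nonword_shift = shift_vs_unrelated(error_rates["non-word"])
word_shift = shift_vs_unrelated(error_rates["real word"])
print(nonword_shift, word_shift, matches_yes_bias(error_rates))
```

With these figures, errors on consonant-sharing non-words rise by about 7.4 percentage points relative to the unrelated condition, while errors on consonant-sharing real words fall by about 3.6 points, which is the asymmetry that a "yes" bias would produce.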

In order to test whether our alternative explanation is plausible, i.e. whether a positive response bias may have been responsible for some of Ussishkin et al.’s (2015) results, we examined previously unanalyzed accuracy rates for non-words from Delle Luche et al.’s (2014) series of auditory priming experiments, described above (Footnote 7). As in many lexical decision tasks, non-words from this study were not analyzed, since they were viewed as a crucial part of the task (the decision part in lexical decision), but uninteresting on their own. As in Ussishkin et al.’s (2015) design, non-word pairs had an unrelated and a consonant-related priming condition; they additionally had a vowel-related condition, and in Experiment 3 a rhyme-related condition was added. Similarly to the Maltese materials, consonant-sharing non-words had identical consonants, e.g. *abyl-*ibol. Vowel-sharing non-words had no consonant in common in Experiments 1-2 (*itom-*ibol), and shared one consonant in Experiment 3 (*kinema-*timema). Rhyme-sharing non-words from Experiment 3 also shared one consonant (*kynima-*timema) (Footnote 8). The task was comparable to Ussishkin et al.’s in the proportion of real words to non-words and in the relations between prime and target, except that Delle Luche et al. did not include an identity condition.

We hypothesized that the recognition of consonant-sharing non-words should be less accurate, in line with a strategic “yes” response bias. Table 5 confirms that this is the trend: C-sharing non-words were rejected less accurately than Unrelated non-words, regardless of syllable structure. They were also rejected less accurately than V-sharing non-words, the one exception being the CVCV condition in Experiment 3, in which V-sharing trials were less accurate (Footnote 9). In a logistic regression model predicting Accuracy from Condition, which included disyllabic Unrelated, C-sharing and V-sharing non-words from all three experiments, this trend was statistically significant (Footnote 10).

Table 5 Accuracy rates for non-words in Delle Luche et al., 2014 (SE). Rows indicate syllable structure (of both prime and target), and columns indicate the Prime: U = unrelated, C = consonant-sharing, V = vowel-sharing and R = rhyme-sharing

Table A Results from paradigms using visual stimuli: The task is masked visual priming, unless otherwise mentioned. Primes marked with * are non-words. Stem consonant letters are in bold; in priming conditions, only stem consonant letters shared with the target are in bold. A more detailed review of the results is in the body of the text

Table B Comparable experimental findings from Semitic and Indo-European languages with cross-modal/auditory presentation (C – consonant; V – vowel)

Apart from illustrating that this type of material composition can encourage strategic responses, this result also provides additional support for the consonant bias hypothesis in the following way. The experiments presented in Delle Luche et al. (2014) included an equal number of C- and V-sharing words. Nevertheless, participants seem to have relied on C-sharing to a greater extent in order to predict target lexicality; they anticipated that C-related targets would be words, and therefore committed more errors with C-related non-words.

In sum, results from Moroccan Arabic and Maltese show consonant facilitation, with neighborhood size and the lexicality of the target modulating the effect. We have shown that these results are congruent with current research in auditory lexical retrieval in non-Semitic languages.

3.4 The most famous C-root restriction is sensitive to stem vowels

In the previous sections we have seen that consonant facilitation – and consonant mismatch – can explain results of auditory priming in Modern Standard Arabic, Hebrew, Moroccan Arabic, and Maltese. We further pointed out similarities between results from Semitic and Indo-European languages – results that were often claimed to be unique to Semitic languages: (i) lack (or reduction) of non-word target facilitation, compared with real word target facilitation with the same degree of form similarity; (ii) lack of consonant priming when there is a mismatch in one consonant. These similarities were overlooked in earlier studies. Our last piece of evidence challenging the C-root Approach comes from a study of co-occurrence restrictions within Semitic stems.

The co-occurrence restrictions on stem consonants in Arabic and other Semitic languages have been a familiar generalization about the Semitic lexicon since Greenberg (1950). Essentially, stem consonant sequences in which the final two consonants are identical (e.g. Hebrew minen ‘dosed’) are very common relative to sequences in which the first two consonants are identical (e.g. Hebrew mimen ‘funded’) (Footnote 11). These restrictions are not merely a statistical fact about the lexicon, but also psychologically real; experimental studies show that speakers are sensitive to the distribution of stem consonants within the word (Berent & Shimron, 1997; Shimron & Berent, 2003; Yeverechyahu, 2014). For years, co-occurrence restrictions and the experimental studies that showed their psychological reality served as evidence supporting the C-root (McCarthy 1979, 1981). If one adopts the cross-linguistic generalization that co-occurrence restrictions apply to adjacent sounds only, the existence of such restrictions on stem consonants seems to necessitate an abstract C-root morpheme.

However, there are at least two phonological theories that can account for the co-occurrence restrictions without assuming a C-root. The theory of Feature Geometry (Clements, 1985) provides a hierarchical phonological representation that allows a phonologically-based segregation between consonants and vowels. Crucially, the segregation is phonological and therefore there is no need for the morphological segregation imposed by the C-root (Bat-El, 2003a) – the co-occurrence restrictions refer to a phonological tier. More recently, within the framework of Optimality Theory, the co-occurrence restrictions in Semitic were accounted for in terms of consonant correspondence (Gafos, 1998; Bat-El, 2006). Importantly, consonant correspondence is a universal mechanism available for consonant harmony across languages (Hansson, 2001; Rose & Walker, 2004).

In addition to the theoretical advantage of relying on independently supported mechanisms for deriving these generalizations, the phonological explanation predicts that languages from other families may also exhibit similar consonant-specific co-occurrence restrictions. This prediction is borne out: Japanese, for example, presents a similar co-occurrence restriction in Yamato stems (see also Coetzee & Pater, 2008 for Muna, and references therein). As in Hebrew and Arabic, the observed number of consonant pairs with the same place of articulation within a stem (labial-labial, coronal-coronal, dorsal-dorsal) is far below the value that would be expected if there were no restriction (Kawahara et al., 2006).
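One common way to quantify such under-representation is an observed/expected (O/E) ratio, where the expected count of a place combination is what independent selection of the two consonants would yield. The Python sketch below is a minimal illustration under stated assumptions: the consonant pairs are invented for the example (they are not Yamato, Hebrew, or Arabic data), the place classification is simplified, and we do not claim that this is the exact computation used in the studies cited above.

```python
from collections import Counter

# Simplified place-of-articulation classes (an assumption for illustration).
PLACE = {
    "p": "labial", "b": "labial", "m": "labial",
    "t": "coronal", "d": "coronal", "s": "coronal", "n": "coronal",
    "k": "dorsal", "g": "dorsal",
}

# Hypothetical (C1, C2) pairs standing in for the consonants of 12 stems.
toy_pairs = [
    ("p", "t"), ("b", "k"), ("m", "n"), ("p", "g"),
    ("t", "p"), ("d", "k"), ("s", "m"), ("n", "g"),
    ("k", "t"), ("g", "b"), ("k", "s"), ("g", "g"),
]


def observed_expected(pairs):
    """Return O/E for every (place of C1, place of C2) combination.

    Expected count = count(place in C1) * count(place in C2) / N,
    i.e. the count predicted if C1 and C2 were chosen independently.
    """
    n = len(pairs)
    first = Counter(PLACE[c1] for c1, _ in pairs)
    second = Counter(PLACE[c2] for _, c2 in pairs)
    observed = Counter((PLACE[c1], PLACE[c2]) for c1, c2 in pairs)
    ratios = {}
    for p1, n1 in first.items():
        for p2, n2 in second.items():
            expected = n1 * n2 / n
            ratios[(p1, p2)] = observed[(p1, p2)] / expected
    return ratios


ratios = observed_expected(toy_pairs)
for place in ("labial", "coronal", "dorsal"):
    # Values below 1 indicate under-representation of same-place pairs.
    print(place, round(ratios[(place, place)], 2))
```

In this toy set, all same-place combinations have an O/E ratio below 1, while a mixed combination such as labial-coronal exceeds 1. A real analysis would apply the same computation to a full lexicon.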

Finally, Berent et al. (2007) provide evidence for the storage of a stem representation. The authors conducted a series of grammaticality judgment and online lexical decision experiments on Hebrew, showing that co-occurrence restrictions are modulated not only by stem consonants but also by stem vowels. This is of particular importance, since sensitivity to vowels, which is also found in denominative verbs (Bat-El, 1994b), cannot be attributed to a C-root morpheme. The authors argue that if the consonant co-occurrence restrictions are instantiated at the level of the C-root, it should not matter which vowels intervene between the root consonants; XeYeY and XiYuY should be identical as far as the C-root is concerned as in both it is XYY. On the other hand, if the lexicon stores stems – representations that include intervening vowels (instead of or in addition to roots) – then intervening vowels should strengthen or weaken the effect of the restriction. XeYeY stems are much less common in the lexicon than XiYuY stems, allowing a direct examination of the hypotheses: If the co-occurrence restriction applies to C-roots, no difference in acceptability is expected between XeYeY- and XiYuY-type words; the distribution of co-occurring consonants within the different templates is coincidental, since, according to the C-root Approach, the grammar stores information about consonants and vowels separately. If the co-occurrence restriction applies to stems, XeYeY-type words are expected to be less acceptable than XiYuY-type words, in accordance with the relative amount of stored lexical items of these forms, respectively.

The latter prediction turned out to be correct. In offline judgment tasks, where subjects rated items of the types XiXuY, XiYuY, XiYuZ, and XeXeY, XeYeY, XeYeZ, they consistently judged XiYuY as better than XeYeY compared with their respective XYZ counterparts. Additionally, in an online lexical decision task, it took longer for participants to decide that a XiYuY-type item was a non-word than it did to decide about a XeYeY-type item. As Berent et al. (2007) point out, these results do not mean that the restriction against XeYeY is not active at both the C-root and the stem levels; it merely means that while the consonants of the stem are at the heart of the Semitic consonantal co-occurrence constraint, vowels have a role in it as well, in line with the hypothesis that stems are stored as a whole.

In sum, co-occurrence restrictions in Hebrew can be explained under the Semitic-specific C-root Approach, by assuming that the constraint operates at the C-root level. However, they are accounted for equally well, if not better, under the Universalist Approach: the constraint applies only to the consonants of the stem, which are phonologically projected on an independent tier. While both approaches account for the Semitic facts, the Universalist Approach also accounts for co-occurrence restrictions in Yamato stems. Furthermore, Berent et al.’s (2007) experiments revealed that the co-occurrence generalization must be implemented (at least, if not only) as a constraint at the level of stems. The psychological evidence supports the existence of a stem-level representation in which the restrictions apply, and undermines the claim that only a C-root-level explanation of the effect is viable.

4 Conclusions

In this paper, we provided a re-analysis of psycholinguistic data that were claimed to support the C-root; this is the first effort to debunk the psycholinguistic argument for the C-root, and it is probably long overdue. The topic tends to fall between the cracks of psycholinguistics and morphology. Researchers of word recognition, even when assuming a general consonant bias or when critical of the language-specific interpretation of behavioral results from Semitic languages, still refer to the C-root as a unit that might be relevant to morphology (e.g. Nespor et al., 2003; Norris & Kinoshita, 2012b); and Semitic morphologists, who support the concept of the C-root, cite root-positive psycholinguistic findings, overlooking similar characteristics in non-Semitic languages. This results in a peculiar gap in the manner in which Semitic and non-Semitic languages are discussed in both fields.

When speakers of Semitic languages re-enter the field of visual word recognition as equal players, they can inform us about the influence of dense neighborhoods on a variety of effects, as Perea and colleagues demonstrate (see Sect. 3.2). The fact that the vowel letters in Arabic and Hebrew are under-represented can be useful for the study of the origins of the seemingly differential encoding of consonants and vowels in alphabetic writing systems (see recent discussion in Schubert et al., 2018).

In the field of auditory word recognition, the ongoing debate regarding the origin of consonant advantage can greatly benefit from a comparison between languages with a similar number of vowels and consonants that differ on other dimensions, such as neighborhood size and spread. For example, Levantine Arabic and Catalan have comparable inventory sizes (26 consonants and 9 vowels in Arabic vs. 26 consonants and 8 vowels in Catalan). On the other hand, the “spread” of these languages differs dramatically in some parts of the lexicon. In the Semitic verbal system, for instance, vowels are bound to a configuration, making the transitional probability from a vowel in one syllable to the vowel in the following syllable particularly high. This makes some vowels critically uninformative in the Arabic verbal system: if a vowel can be predicted based on the previous one, then it can be ignored. In Catalan, verbs are much more likely to differ in one vowel, such that the lexical meaning (as opposed to the syntactic configuration) depends on the vowel (e.g. alejar ‘to distance’ – alojar ‘to host’, acusar ‘to accuse’ – acosar ‘to harass’). Comparing the influence of vowel/consonant spread in the specific context of the verbal systems of these languages, which have quite similar perceptual distinctions between their vowel and consonant inventories, can advance our understanding of consonant superiority in early auditory lexical retrieval.
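The notion of vowel-to-vowel transitional probability can be made concrete with a small computation. The Python sketch below is purely illustrative and rests on several assumptions: it uses only the handful of verbs quoted in this paper (the Hebrew verbs from the Introduction stand in for the Semitic pattern, since no Arabic forms are cited here), it treats each vowel letter as one vowel, and the function names are our own. A sample this small cannot establish the claim; it only shows what the proposed comparison would measure.

```python
from collections import Counter, defaultdict

# Simplified vowel inventory for these transcriptions (an assumption).
VOWELS = set("aeiouə")

# Verbs quoted in this paper; far too few for a real estimate.
hebrew_verbs = ["gadal", "higdil", "hizkiʁ", "hidʁiχ"]
catalan_verbs = ["alejar", "alojar", "acusar", "acosar"]


def vowel_sequence(word):
    """Vowels of a word in order, ignoring all consonants."""
    return [ch for ch in word if ch in VOWELS]


def transition_probs(words):
    """Estimate P(next vowel | current vowel) over consecutive vowels."""
    counts = defaultdict(Counter)
    for word in words:
        vowels = vowel_sequence(word)
        for v1, v2 in zip(vowels, vowels[1:]):
            counts[v1][v2] += 1
    return {
        v1: {v2: n / sum(following.values()) for v2, n in following.items()}
        for v1, following in counts.items()
    }


hebrew_tp = transition_probs(hebrew_verbs)
catalan_tp = transition_probs(catalan_verbs)
print(hebrew_tp)        # each first vowel has a single continuation here
print(catalan_tp["a"])  # several vowels can follow /a/ in this sample
```

In this sample, the second vowel of each Hebrew verb is fully predictable from the first, whereas the vowel following /a/ in the Catalan verbs varies across three options. A corpus-based version of the same measure would be needed to test the contrast properly.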

In sum, the goal of this paper was to put psycholinguistic data from Semitic languages in a universal context. We hope that it will be of use to scholars of lexical storage and word recognition, psycholinguists and theoretical linguists alike.