Introduction

Over the past two decades, growing interest in studying literacy processes in Arabic has been driven by awareness of the challenging diglossic nature of the language and of the particularities of its orthographic and morpho-syntactic systems (Saiegh-Haddad, 2003; Saiegh-Haddad & Henkin-Roitfarb, 2014; Saiegh-Haddad & Joshi, 2014). This interest was also motivated by the difficulties that native Arabic-speaking children face during reading acquisition (Saiegh-Haddad, 2003, 2004, 2005, 2007). In addition, research on reading has very often been conducted in, and based on, English (Coltheart, 2005; Seidenberg & McClelland, 1989), and authors have emphasized the need to investigate literacy processes in other languages (Share, 2008) while taking into account the unique features of the linguistic systems studied. Nonetheless, there is broad agreement about the linguistic and cognitive variables that are involved in reading (Vellutino, Fletcher, & Snowling, 2004), which is seen as a complex skill involving different modules (Fodor, 1985). At the same time, current models of reading suggest that the linguistic and cognitive processes (or sub-processes) that contribute to word decoding may differ from those involved in reading fluency and reading comprehension. The central processing approaches postulate the existence of general cognitive processes that affect reading in alphabetic orthographies (Coltheart, 2005; Plaut, McClelland, Seidenberg, & Patterson, 1996; Seidenberg & McClelland, 1989). The script-dependent approaches assume that different cognitive factors might contribute differently to reading, depending on the specific features of the orthographic systems (Bick, Goelman, & Frost, 2011) and their transparency (e.g., Frost, 1998; Share, 2008; Van Orden, Pennington, & Stone, 1990).
The latter view seems particularly relevant for a language such as Arabic, which has several unique features that differentiate it from other languages (Abu-Rabia, 2002; Saiegh-Haddad, 2005; see also below).

Overview of the cognitive and linguistic processes involved in reading

Visual processing

Various models suggest that reading capacity starts when the child first perceives the visual graphic symbol (Adams, 1990; Ehri & Snowling, 2005; Frith, 1985) and that reading difficulty rests on visual processing and the ability to deal with visual written material (Stein & Walsh, 1997). This view is supported by findings indicating that visual processing skills are highly significant for young children but become less relevant with age (Plaza & Cohen, 2007; Tahan, Cline, & Messaoud-Galusi, 2011).

Working memory

The difficulty that poor and dyslexic readers have in encoding and organizing (phonological) information is thought to affect their ability to retrieve words from memory (Smith-Spark & Fisk, 2007). Difficulties in decoding, which compromise exposure to print, prevent the correct establishment of orthographic representations and knowledge (Ehri, 2000; Ellis, 1994; Frith, 1985; Steffler, 2001; Treiman & Bourassa, 2000). Consequently, deficient processing of visual words will increase working memory load (Vellutino et al., 2004) and negatively impact reading fluency.

Speed of information processing

Some authors (Wolf & Bowers, 1999) suggest that reading fluency is affected by the quality and the speed with which information is processed. Rapid Automatized Naming (RAN) tasks (Denckla & Rudel, 1976), which are thought to assess this capacity (Breznitz, 2006), were shown to predict reading in a variety of orthographies, including Arabic (Johnston & Kirby, 2006; Saiegh-Haddad, 2005; Taibah & Haynes, 2010). Other authors consider performance on RAN tasks as an index of the phonological abilities (Torgesen, Wagner, & Rashotte, 1994) that determine the speed of access to orthographic patterns (Norton & Wolf, 2012).

Phonological processing

Most researchers place phonological abilities, which significantly determine differences between readers, at the basis of reading capacity (Snowling, 2000). The quality of phonological processing is thought to be decisive for the development of phonemic awareness (PA), which predicts reading acquisition (Adams, 1990; Elbeheri, Everatt, Mahfoudhi, Abu Al–Diyar, & Taibah, 2011; Elbeheri & Everatt, 2007; Gillon, 2004; Saiegh-Haddad & Geva, 2008; Snowling, 2000; Stanovich, 1988; Taha, 2013). Evidence for the contribution of PA to reading has been found in different orthographic systems (Asadi & Khateb, 2017; Ziegler et al., 2010). Yet, some researchers suggest that this contribution is less important in transparent orthographies and tends to decrease with the development of the alphabetic code (Shatil & Share, 2003; Taha, 2013). The importance of PA during the early stages of reading is explained by the fact that, when first learning to read, the child relies on grapho-phonemic conversion (GPC) rules, which enable the reader to internalize the relations between the phonemes (i.e., the sounds) and the graphemes (i.e., orthographic symbols).

Orthographic knowledge

Another factor that contributes to reading is orthographic knowledge (Abu-Rabia, Share, & Mansour, 2003; Share & Stanovich, 2002), an issue that is gaining increasing attention in literacy research. In alphabetic orthographies, orthographic patterns (together with phonological ones) may carry another source of information, namely the morpho-orthographic representation, which was shown to contribute to word recognition (Nunes, Bryant, & Bindman, 2006; Velan & Frost, 2011).

Morphological knowledge

Awareness of the morphological structures of the language was found to play an essential role in reading processes in general (Nunes et al., 2006) and in Semitic languages in particular (Abu-Rabia, 2007; Ravid & Malenky, 2001; Saiegh-Haddad & Geva, 2008; Schiff & Ravid, 2007). In Arabic, morphological ability was found to be developmentally based, and morphological units were proposed to be part of orthographic knowledge (Saiegh-Haddad, 2013; Taha & Saiegh-Haddad, 2015).

Semantic knowledge

Several models stress the role of semantic knowledge as one of the essential stages in word recognition (Adams, 1990; Plaut et al., 1996). Lexical-semantic knowledge is suggested to interact with orthographic knowledge and PA, particularly when reading irregular words (Nation & Snowling, 2004; Ricketts, Nation, & Bishop, 2007), where GPC processes are inefficient for accurate reading. However, it has also been suggested that lexical and supra-lexical components, such as syntax and semantics, do not contribute to predicting basic reading abilities (Shatil & Share, 2003). Hence, one might assume that such a contribution depends on the orthographic transparency and the characteristics of the language at hand.

This short overview suggests that different variables might affect word recognition and reading in alphabetic languages. Some previous studies have addressed the specific contribution of some of these variables to reading in Arabic (e.g., Abu-Rabia, 1995, 2001, 2007; Abu-Rabia et al., 2003; Saiegh-Haddad, 2003, 2004, 2005; Saiegh-Haddad & Geva, 2008). However, large-scale and comprehensive studies investigating reading development in Arabic across different age groups are still lacking.

Some specific features of Arabic

The Arabic language is characterized by various features that are thought to affect literacy processes (Mahfoudhi, Everatt, & Elbeheri, 2011; Saiegh-Haddad, 2004; Saiegh-Haddad & Joshi, 2014). Among these features is diglossia, which refers to the existence of two forms of Arabic that are used in different situations. The spoken Arabic dialect (SA), acquired first, is used for everyday oral communication. Literary Arabic (LA, also referred to as Modern Standard Arabic or MSA) is acquired at school and used for reading and writing and for formal communication (Saiegh-Haddad & Henkin-Roitfarb, 2014). The two forms of Arabic differ at several linguistic levels, including the phonological, morphological, semantic and syntactic ones (see Asadi & Ibrahim, 2014; Saiegh-Haddad, 2003, 2004, 2007; for a review see Saiegh-Haddad & Henkin-Roitfarb, 2014). Researchers suggest that the SA dialect negatively impacts reading development because children use SA until the pre-school period and thus experience difficulties in constructing the phonological representations for LA words during the early grades (Saiegh-Haddad, 2003, 2004, 2007; Saiegh-Haddad, Levin, Hende, & Ziv, 2011).

The specific features of the Arabic orthographic system are also thought to make literacy acquisition more challenging than in other languages. This alphabetic system is written from right to left and includes 28 consonant letters, of which three also represent long vowels (Taha, 2013). Short vowels are added as diacritical marks above and under the letters, leading to a certain visual complexity. Also, there is a physical similarity between the letters, since some letters have the same basic form but are distinguished by the presence of dots, their number (one to three dots) and their position (under or above the letter; see Saiegh-Haddad & Henkin-Roitfarb, 2014). In addition, the presence or absence of short vowels determines the depth of the orthography (Frost, 1998). When written with short vowels, the orthography is considered transparent and reading relies mainly on sub-lexical processes. When written without short vowels (non-vowelized), the orthography is considered deep since words lack part of the phonological information, and reading is said to rely on lexical processes (Frost, 1998; Katz & Feldman, 1983) and context cues because many words become homographic (Abu-Rabia, 2001). Children first learn to read with transparent text and start to use the deep orthography around the fourth grade (Abu-Ahmad, Ibrahim & Share, 2014). Finally, there are several other features that influence the degree of orthographic transparency and contribute to its complexity (Asadi, Ibrahim & Khateb, 2017).
These features include: (1) the presence of emphatic sounds (tˤ, sˤ, ðˤ, dˤ), which refers to pairs of different letters that share the same phonology and rely on the same part of the articulation system (t, s, ð, d; see Taha & Khateb, 2013); (2) the fact that 22/28 letters can be written in four different ways depending on their connectedness and place in the word (see Khateb, Taha, Elias, & Ibrahim, 2013; Khateb, Khateb-Abdelgani, Taha & Ibrahim, 2014; Taha, Ibrahim & Khateb, 2013); and (3) the existence of sounds that in certain instances are written but not pronounced and the existence of letters that in some instances are pronounced but not written (as in the case of the letter “?alef” in the word “هذا”, which is pronounced /ha:ða:/ but written as /haða:/). These and other characteristics contribute to inconsistency in the grapho-phonemic conversion (GPC) rules, on which young children have to rely for reading before acquiring bigger morphological units for automatized reading (see Saiegh-Haddad & Henkin-Roitfarb, 2014).

As for the morphological system, Arabic words are produced from the combination of roots, representing the meaning, and patterns, representing the lexical and syntactic categories. The majority of Arabic words are produced through derivational and inflectional processes applied to roots and patterns. However, although most roots are composed of three consonants (Abu-Rabia, 2001) and thus might contribute to a certain morphological transparency, the transformations are generally not linear and tend to break the phonological and orthographic identity of the words and to reduce their morphological transparency (for details see Saiegh-Haddad & Geva, 2008). In languages like English, by contrast, new words are generally produced through linear transformations using affixes that preserve their phonological and orthographic identity and maintain a high morphological transparency. The role of morphology in reading and spelling in morphologically less transparent languages like Hebrew (Ravid & Malenky, 2001; Schiff & Ravid, 2007) and Arabic (Abu-Rabia, 2007; Saiegh-Haddad & Geva, 2008; Saiegh-Haddad, 2013) has been repeatedly emphasized.

In view of the above-mentioned specific features of the Arabic language and the lack of comprehensive studies that investigate reading processes in different age groups, this study used a cross-sectional design to examine the contribution of linguistic and cognitive variables to reading in a large sample of Arabic-speaking children. The study aimed at assessing the weight of the contribution of each of the linguistic and cognitive variables to reading measures in each of the first six grade levels. We predicted that, as in other languages, reading would involve various linguistic and cognitive variables. In particular, we hypothesized that phonology would contribute particularly to decoding abilities. This contribution was expected to decrease with age due to the decreasing reliance on GPC strategies. Also, given the complexity of the orthographic system, we predicted a major role for orthography in both decoding and fluency. Finally, given the proposed role for morpho-orthographic representations in reading less transparent morphologies, we expected a significant role for morphology.

Materials and methods

Participants

A nationally representative sample of 1305 pupils participated in the study: 107 first graders, 241 second graders, 238 third graders, 239 fourth graders, 242 fifth graders, and 238 sixth graders (see details in Table 1). The sampling was conducted in three stages, of which the first two were carried out by the service of the chief scientist of the Ministry of Education. Firstly, 23 schools from all six districts of Israel were sampled while taking into consideration socioeconomic criteria (low, medium and high on the basis of parents’ income, occupation and the ranking of the residential area). Secondly, a specific class was sampled from each of the required grades in each of the selected schools. Thirdly, from the second- to sixth-grade classes, 10–11 participants per class (×23 schools) were sampled by taking every third child (whether a boy or a girl) from each class’s alphabetical list of names. This selection provided about 238 children in each of these grades. These children were all tested in two different sessions starting in March (the school year extending from September to June). As for the first grade, testing started in May, a little closer to the end of the school year, in order to ensure that all schools were about to finish the official reading instruction curriculum and thus to avoid major acquisition differences between the schools during this early and critical stage of reading acquisition. Due to this time constraint and to the fact that first graders were tested in three sessions (instead of two for the other grades, to avoid fatigue and concentration effects), only 4–5 participants were selected from each first-grade class by taking every sixth child (whether a boy or a girl) from the class list of names in the same 23 schools. This selection made it possible to preserve the same sampling principle (i.e., representativeness) and yielded a total of 107 representative first-grade participants.
As already mentioned, all participants were Arabic-speaking children who were exposed from birth to SA and acquired LA through formal education starting in first grade. All children started to learn Hebrew, their second language, formally from the third grade, but were already sensitized to very basic Hebrew in the second grade (i.e., colors, numbers, animal names and letter names). Their exposure to basic English vocabulary also started in the third grade.
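
The third sampling stage described above (every third child from the alphabetical class list, or every sixth child in first grade) can be illustrated with a minimal sketch. The function name, the hypothetical class of 32 pupils and the choice to start counting from the top of the list are assumptions made for illustration; the study does not report these details.

```python
def systematic_sample(names, step):
    """Pick every `step`-th pupil from an alphabetically ordered class list."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    ordered = sorted(names)
    # Assumption: counting starts at the top of the list, so the first pick
    # is the child in position `step` (index step - 1).
    return ordered[step - 1::step]


# Hypothetical class of 32 pupils with placeholder names.
class_list = [f"pupil_{i:02d}" for i in range(1, 33)]

grades_2_to_6 = systematic_sample(class_list, 3)  # every third child
grade_1 = systematic_sample(class_list, 6)        # every sixth child
print(len(grades_2_to_6), len(grade_1))
```

Under these assumptions, a class of 32 yields 10 pupils with a step of three and 5 pupils with a step of six, which is in line with the 10–11 and 4–5 participants per class reported above.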

Table 1 Details of the participants’ groups

Materials

The tools used in this study aimed at assessing reading capacity (by measuring decoding and fluency) and the linguistic and cognitive skills hypothesized to be involved in reading, together with general non-verbal ability. All measures (except the digit span, symbol search, visual perception and Raven general ability tests, see below) were developed for the purpose of this investigation on the basis of a pilot study that enabled the selection of items according to the level of difficulty for each grade. The pilot was conducted with about 60 children from each grade, sampled from 10 schools selected randomly from the six districts of Israel. For the initial selection of the words for the different tests, we first relied on the school books of the different grades and collected words to be used for the first–second, third–fourth and fifth–sixth grades. The word lists for the different tests were then examined by language teachers to assess their suitability for the different grades. For this purpose, different teachers assessed the different word lists: one evaluated words for the first–second grades, one for the third–fourth grades and one for the fifth–sixth grades. Concretely, each teacher rated the familiarity of each word in the list using a five-point scale (1 for not familiar, 5 for familiar). Based on this rating, words with different levels of familiarity (those rated between 2 and 4) were retained for the different lists. The retained word lists were then used for collecting data during the pilot study. Item analysis was then conducted on the pilot data in order to select items that were appropriate in terms of difficulty (showing neither floor nor ceiling effects) and reliability (see Cronbach’s alpha for each test below) for the first–second, third–fourth and fifth–sixth grades (see details below, and Appendix Table 5).
Item analysis showed that the lists for the word reading and phonemic deletion tests (see below) were not appropriate for the first and second grades together. Hence, items that appeared appropriate (>50% and <90% success) for second grade were kept in the lists to be used for second graders, and new separate lists were created for first graders. The new lists included those items (determined by item analysis) that were appropriate for first graders, to which a few other items were added after verifying their suitability for first graders. More generally, item analysis performed on the pilot data made it possible not only to select the items that were appropriate for each age group but also to exclude problematic ones.
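
The item analysis described above can be sketched as follows. This is an illustration only: the data layout (one list of 0/1 item scores per child), the function names and the small pilot matrix are hypothetical, and the reliability function uses the standard Cronbach’s alpha formula rather than any procedure documented for this study.

```python
from statistics import pvariance


def success_rates(responses):
    """Proportion of children answering each item correctly.

    `responses` is a list with one entry per child; each entry is a list
    of 0/1 item scores (hypothetical layout).
    """
    n_children = len(responses)
    n_items = len(responses[0])
    return [sum(child[i] for child in responses) / n_children
            for i in range(n_items)]


def select_items(responses, low=0.50, high=0.90):
    """Indices of items whose success rate lies strictly between low and high."""
    return [i for i, p in enumerate(success_rates(responses)) if low < p < high]


def cronbach_alpha(responses):
    """Standard Cronbach's alpha: k/(k-1) * (1 - sum(item variances)/total variance)."""
    k = len(responses[0])
    item_vars = [pvariance([child[i] for child in responses]) for i in range(k)]
    total_var = pvariance([sum(child) for child in responses])
    if k < 2 or total_var == 0:
        raise ValueError("alpha needs at least two items and non-zero total variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up pilot data: 10 children x 4 items.
pilot = [
    [1, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 1, 1], [1, 1, 0, 0],
    [1, 0, 0, 1], [1, 1, 0, 1], [1, 1, 0, 0], [1, 1, 1, 1], [1, 0, 0, 0],
]
print(select_items(pilot))  # item 0 is at ceiling, item 2 is too hard
```

In this made-up example only the second and fourth items (indices 1 and 3) fall inside the >50% and <90% window.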

Word reading test

This test examined decoding abilities and the fluency of reading vowelized Arabic words. Reading fluency is described as the outcome of two variables: the accuracy and the rate of reading. Hence, fluency in reading was computed as the number of correctly read words per minute. Four word lists were created with a level of difficulty adapted to each grade. The first list, for the first grade, included 24 words (8 verbs and 16 nouns), of which 10 were disyllabic, 12 tri-syllabic and 2 quadri-syllabic. The second list, for the second grade, included 25 words (9 verbs and 16 nouns), of which 8 were disyllabic, 13 tri-syllabic and 4 quadri-syllabic. The third list, for the third and fourth grades, included 25 words (8 verbs and 17 nouns), of which 3 were disyllabic, 12 tri-syllabic and 10 quadri-syllabic. The fourth list, for the fifth and sixth grades, included 26 words (8 verbs and 18 nouns), of which 2 were disyllabic, 7 tri-syllabic and 16 quadri-syllabic. In each of the lists, the verbs included both derivational and inflectional (tense and person) patterns, while the nouns included mostly inflectional forms (i.e., possessive, gender and number, see examples in Appendix Table 5). The examinee was asked to read the words as accurately as possible at his/her own comfortable rate. Reading accuracy and fluency were measured. The reliability of the test (α) ranged from .88 to .91 in the different grades.
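
As a worked illustration of the fluency measure defined above (correctly read words per minute), the following sketch computes accuracy and fluency for one hypothetical child. The function names and the example values (20 of 24 words read correctly in 40 s) are assumptions for illustration, not data from the study.

```python
def reading_accuracy(n_correct, n_words):
    """Proportion of words in the list that were read correctly."""
    if n_words <= 0:
        raise ValueError("the list must contain at least one word")
    return n_correct / n_words


def words_correct_per_minute(n_correct, seconds):
    """Fluency: number of correctly read words scaled to one minute."""
    if seconds <= 0:
        raise ValueError("reading time must be positive")
    return n_correct * 60.0 / seconds


# Hypothetical first grader: 20 of the 24 words read correctly in 40 seconds.
print(reading_accuracy(20, 24))
print(words_correct_per_minute(20, 40))  # 30.0 correct words per minute
```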

Linguistic and cognitive measures

On the basis of findings from previous studies (see Introduction section), we chose to investigate cognitive and linguistic skills that are related to the development of reading. For the general cognitive domain, we assessed visual perception, memory and RAN, and for the spoken and written linguistic domain we assessed phonology, orthography, morphology, semantic (vocabulary) and syntactic knowledge. Each of these cognitive and linguistic domains was examined with two or more tests in order to avoid the effect of one specific task on the assessment of a given type of knowledge.

Visual perception tests

Symbol search

This task examined the speed and accuracy of visual search of nonlinguistic symbols, while ignoring distracters (Wechsler, 1998). The task consisted of 45 line-items, where in each, two symbols appeared at the right (target) of the line and other symbols appeared elsewhere in the line. The examinee was required to find at least one of the two symbols among the distracters within 2 min.

Beery’s test of visual perception

This 3-min test examined the ability to distinguish between nonlinguistic symbols. It consisted of 27 items of increasing difficulty, presented in columns. The examinee was required to identify the target (the first at the top of the column) among the distracters (Beery & Beery, 2010).

Memory tests

Digit span

This test assessed short-term memory (STM) using verbal stimuli of digits that the child was asked to repeat in the order presented, and then in the reverse order (Wechsler, 1998). The individual score was based on the sum of the forward and backward spans.

Phonological working memory

This test examined the ability to keep and manipulate phonological information. Three lists were created. The stimulus list for the first and second grades contained 18 items (11 mono-syllabic and 7 bi-syllabic). These 18 items included 6 pseudowords, 4 SA words, 3 LA and 5 words shared by SA and LA. For the third and fourth grades, the stimulus list contained 31 items (10 mono-syllabic and 21 bi-syllabic). These 31 items included 13 pseudowords, 6 SA words, 6 LA and 6 words shared by SA and LA. For the fifth and sixth grades, the stimulus list contained 30 items (6 mono-syllabic and 24 bi-syllabic). These 30 items included 13 pseudowords, 5 SA words, 5 LA and 7 words shared by SA and LA. During the test, the child was asked to repeat the stimulus heard after reversing the order of its sounds. The reliability of the test (α) ranged from .92 to .95.

Speed of information processing tests

The speed of processing was examined using Rapid Automatized Naming (RAN) tasks. These included (1) digit naming, which included the digits 1, 5, 9, 3, and 7; (2) letter naming, which included the Arabic letters (/س/ا/ع/ت/ي/); and (3) object naming, which included pictures of a watch, a chair, a banana, a frog and a candle. In each test, the stimuli were repeated 10 times, thus yielding 50 stimuli that the child had to name, and the response time to complete all the items was measured. The total time to complete each subtest was then used to compute the number of items per minute.
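
The structure of a RAN subtest (five stimuli repeated 10 times, scored as items per minute) can be sketched as follows. The shuffled presentation order and the fixed seed are assumptions made here for illustration, since the arrangement of the 50 stimuli is not described; the function names are likewise hypothetical.

```python
import random


def build_ran_sheet(stimuli, repetitions=10, seed=0):
    """Return a sheet in which each stimulus appears `repetitions` times.

    Assumption: the order is shuffled; the study does not describe how
    the 50 stimuli were arranged.
    """
    rng = random.Random(seed)
    sheet = list(stimuli) * repetitions
    rng.shuffle(sheet)
    return sheet


def items_per_minute(n_items, total_seconds):
    """Naming rate: items named, scaled to one minute."""
    if total_seconds <= 0:
        raise ValueError("total time must be positive")
    return n_items * 60.0 / total_seconds


digit_sheet = build_ran_sheet([1, 5, 9, 3, 7])
print(len(digit_sheet))                         # 50 stimuli
print(items_per_minute(len(digit_sheet), 40))   # 75.0 for a hypothetical 40 s
```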

Phonemic awareness tests

Phonemic deletion

This test examined the ability to perform phonemic deletion at the beginning, middle and end of one- to three-syllable words (Abu-Rabia, 1995). Four word lists were created: one list for first grade, one for second grade, one for third–fourth grades and one for fifth–sixth grades. The list for first grade included 12 items (6 mono-syllabic and 6 bi-syllabic, of which 2 were SA words and 10 shared SA/LA words), of which 6 items comprised 3 phonemes and 6 comprised 5 phonemes. The list for second grade included 16 items (7 mono-syllabic and 9 bi-syllabic, of which 1 was an SA word, 2 were LA words and 13 shared SA/LA words), of which 7 items comprised 3 phonemes, 6 comprised 5 phonemes and 3 comprised 6 phonemes. The list for third–fourth grades included 18 items (5 mono-syllabic and 13 bi-syllabic, of which 2 were SA words, 3 LA words and 13 shared SA/LA words), of which 5 items comprised 3 phonemes, 7 comprised 5 phonemes and 6 comprised 6 phonemes. Finally, the list for fifth–sixth grades included 20 items (4 mono-syllabic, 12 bi-syllabic and 4 tri-syllabic, of which 2 were SA words, 4 LA words and 14 shared SA/LA words), of which 4 items comprised 3 phonemes, 7 comprised 5 phonemes and 9 comprised 6 phonemes. Each word was read by the examiner to the participant, who had to repeat it and then say it after deleting a specific sound (see examples in Appendix Table 5). The reliability of the test (α) ranged between .80 and .92.

Phonemic segmentation

This test examined the ability to repeat words and segment them into their basic sounds. The items represented SA, LA and shared SA/LA words, but also pseudowords, to force the participants to rely on phonological (and less on lexical) analysis. One list was used for first–second grades and included 14 mono-syllabic items (5 SA words, 2 LA words, 4 shared SA/LA words and 3 pseudowords). Of these, 12 items comprised 3 phonemes and 2 items comprised 4 phonemes. A second list was used for third-to-sixth grades and included 18 items (13 mono-syllabic and 5 bi-syllabic: 6 SA words, 4 LA words, 2 shared SA/LA words and 6 pseudowords). Of these, 8 items comprised 3 phonemes, 8 items comprised 4 phonemes and 2 comprised 5 phonemes (see examples in Appendix Table 5). The participant had to repeat each word after the examiner and to segment it into its sounds. The reliability of the test (α) ranged between .82 and .90.

Orthographic knowledge tests

In order to assess orthographic knowledge, we relied on Apel and Lawrence’s (2011) model, which suggests that orthographic knowledge contains two basic components: word-specific knowledge and general orthographic knowledge. The first component relates to the capacity to identify the words and patterns that children learn. This component was examined here in the parsing task (see below). The second component relates to the knowledge and understanding of the general characteristics of a specific writing system (including legality in combining letters and their sequence). This component was examined here in the orthographic choice task (see below).

Parsing

This test examined the ability to identify orthographically significant patterns (i.e., to detect word boundaries) inside a sequence of words that were presented in a line without spaces between them. Technically, the final letter of each word, even if it would naturally connect with a subsequent letter (in this case the first letter of the next word in the line), was kept unconnected (see examples in Appendix Table 5, as in the case of /ﺃكلﺑﺎب/ instead of /أكلباب/). The words were all from LA, and included inflected nouns and verbs but also pronouns, prepositions, conjunctions, etc. The test included one list with 46 line-items in which each line included four separate words (of one to five syllables) that did not constitute a meaningful sentence. The examinee was asked to separate the words by drawing a line between two successive words (i.e., to mark word boundaries, see Appendix Table 5). A response was considered correct when all boundaries in one item/line were detected. The reliability of the test (α) ranged between .80 and .91 in the various grades.
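
The all-or-nothing scoring rule of the parsing task can be sketched as follows. Boundaries are represented here as character positions within a line, which is a hypothetical encoding. How superfluous marks were treated is not stated, so the strict default (marks must match the true boundaries exactly) is an assumption, and the `allow_extra` flag shows the more lenient reading.

```python
def parsing_item_correct(true_boundaries, marked, allow_extra=False):
    """True when every real word boundary in the line was marked.

    Assumption: by default, extra (false) marks make the item incorrect;
    with allow_extra=True they are ignored.
    """
    true_set, marked_set = set(true_boundaries), set(marked)
    if allow_extra:
        return true_set <= marked_set
    return true_set == marked_set


def parsing_score(items, allow_extra=False):
    """Number of line-items in which all boundaries were detected."""
    return sum(parsing_item_correct(t, m, allow_extra) for t, m in items)


# Three hypothetical lines of four words each, i.e. three boundaries per line.
responses = [
    ({3, 6, 9}, {3, 6, 9}),        # all boundaries found
    ({2, 5, 8}, {2, 5}),           # one boundary missed
    ({4, 7, 10}, {4, 7, 10, 12}),  # all found, plus one superfluous mark
]
print(parsing_score(responses))
print(parsing_score(responses, allow_extra=True))
```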

Orthographic choice

This test examined the ability to identify orthographically wrong and illegal patterns and is thought to assess word-specific and general orthographic knowledge. Three lists were created: one for first–second grades (40 items including 18 wrong ones), one for third–fourth grades (60 items including 27 wrong ones) and one for fifth–sixth grades (102 items including 54 wrong ones). The items (between 1 and 4 syllables) represented a variety of nouns and a few verbs in the correct and incorrect forms (see Appendix Table 5). The wrong items included pseudo-homophones (e.g., the word /إصطبل/ instead of /إسطبل/), visually similar words (e.g., the word /تمين/ instead of /ثمين/), phonetic errors (e.g., the word /ساريع/ instead of /سريع/) and orthographic rule errors (e.g., the word /شربو/ instead of /شربوا/). The examinee was asked to identify the correct and incorrect forms and to mark the incorrect forms. The participant’s score was based on the number of correctly marked incorrect forms (correct positives) and non-marked correct forms (correct negatives), meaning that the maximal score was 40 for first–second grades, 60 for third–fourth grades and 102 for fifth–sixth grades. The reliability (α) ranged between .83 and .91.
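
The scoring rule described above (correct positives plus correct negatives) amounts to counting the items on which the child’s mark agrees with the item’s true status. A minimal sketch, with hypothetical names and a made-up five-item answer sheet:

```python
def orthographic_choice_score(is_wrong, marked):
    """Count marked wrong forms plus unmarked correct forms.

    `is_wrong[i]` is True when item i is a misspelled form;
    `marked[i]` is True when the child marked item i as incorrect.
    """
    if len(is_wrong) != len(marked):
        raise ValueError("answer key and responses must have the same length")
    return sum(1 for wrong, mark in zip(is_wrong, marked) if wrong == mark)


# Made-up sheet: items 0 and 2 are misspelled.
key = [True, False, True, False, False]
child = [True, False, False, False, True]  # misses item 2, wrongly marks item 4
print(orthographic_choice_score(key, child))  # 3 of a maximum of 5
```

With this rule the maximal score equals the number of items in the list, as stated above for each pair of grades.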

Morphological knowledge tests

Inflecting verbs and nouns

This test examined the ability to inflect verbs and nouns in the literary (Modern Standard Arabic) language. Three test lists were created: a first 19-item list for the first and second grades (11 verbs and 8 nouns), a second 24-item list for the third and fourth grades (16 verbs and 8 nouns), and a third 23-item list for the fifth and sixth grades (13 verbs and 10 nouns). For the verbs, the participant was shown a three-consonant root and explicitly required to inflect it according to person and number, gender, and tense (past, present and imperative). For example, a child asked to inflect the root /k.t.b/ with the person /he/ in the present tense would say /jiaktub/ for /he writes/. For nouns, the examinee was presented with a picture and required to inflect its noun according to gender and number using only possessive pronouns. For example, a participant seeing a picture of a house together with another picture depicting 3 boys (gender and number) was required to inflect the noun with the person seen (i.e., they) and would say /da:rahum/ for /their house/ (see Appendix Table 5). The reliability of this test (Cronbach’s α) ranged from .88 to .91 in the different grades.

Derivation of words in context

The test examined the ability to derive from a given three-consonant root the word that completes a sentence. Three lists were created: a first list for first–second grades (14 items), a second list for third–fourth grades (16 items), and a third list for fifth–sixth grades (16 items). The derivations were made using the most frequent Arabic morphological patterns (e.g., /fa:ʕel, mafʕuw:l and mafʕal/). During the test, the examinee heard an incomplete sentence and was required to complete it verbally according to a given root presented visually (see examples in Appendix Table 5). Concretely, the child was expected to derive from the root the correct word that adequately completes the sentence. The participant received one point for each correct item. The reliability of this test (Cronbach’s α) ranged from .60 to .62 in the different grades.

Root awareness

The test examined the children’s awareness of the words’ root. Three lists were created: a first list of 26 items for the first and second grades, a second list of 28 items for the third and fourth grades, and a third list of 30 items for the fifth and sixth grades. Each item comprised either three words (first–second grades) or four words (the other grades). Each item was presented visually and read to the participant, who was required to say the word that did not relate to the root “family” (i.e., did not derive from the same root, see examples in Appendix Table 5). The unrelated words (distracters) were related either morphologically (same word pattern as one of the words), semantically (related categorically or associatively), or orthographically. In addition, from third grade onwards, the distracters included also Simple and frequent patterns (e.g., /faʕala, fa:ʕel and mafʕuw:l/) were used for first and second grades, while for higher grades the patterns included also less frequent ones (e.g., /faʕʕala, fa:ʕij:l, ?afʕala and ?istafʕala/). The participant’s score was based on the number of correct answers. The reliability of this test (Cronbach’s α) ranged from .82 to .91 in the different grades.

Pattern awareness

The test examined the children’s awareness of the morphological patterns existing in Arabic. Three lists were created: a list of 19 items for the first and second grades, a list of 18 items for the third and fourth grades, and a list of 21 items for the fifth and sixth grades. The items represented a variety of morphological patterns. Here again, simple patterns were used for first–second grades (e.g., /faala, fa:∂el, maf∂uw:l and ?af∂ala/), while for higher grades the patterns also included less frequent ones (e.g., /fa∂∂ala, fa:∂ij:l, ?af∂ala, ?istaf∂ala and ?iftaala/). In each item, two words were presented visually and read to the examinee, who was required to decide by saying “yes” or “no” whether the words were related to the same pattern or not (see examples in Appendix Table 5). The lists contained 10/19 related items for first and second grades, 10/18 related items for third and fourth grades, and 13/21 related items for fifth and sixth grades. The participant scored one point for each correct answer. The test’s reliability (Cronbach’s α) ranged from .72 to .81 in the different grades.

Semantic knowledge tests

Receptive vocabulary

This test evaluated semantic knowledge at the receptive level. A list of 90 LA words was created, with items 1–50 for testing first–second grades, items 21–70 for third–fourth grades, and items 30–90 for fifth–sixth grades. The words in each list were all nouns or verbs describing actions, ranged from high to low frequency, and were selected on the basis of a rating made by language teachers of the relevant grades (see above). For each item, the examinee heard a word while being presented with four pictures, from which he/she was required to designate the one corresponding to the word heard (see examples for the words in Appendix Table 5). In each item, the pictures always included one unrelated (neutral) distracter and two related distracters (phonological, semantic or morphological). For example, for the target word “مهد” (cradle), the distracters were “فرد” (gun), which is related phonologically, “سرير” (bed), which is related semantically, and “هدية” (present), which is a neutral distracter. The reliability of the test (α) ranged between .82 and .88.

Expressive vocabulary

The test examined the participants’ semantic knowledge at the production level. Three word lists, which included adjectives (e.g., big/small), abstract nouns (love/hatred), verbs (to destroy/to build) and prepositions (e.g., under/above), were created: one for first–second grades (26 items), one for third–fourth grades (30 items) and one for fifth–sixth grades (30 items). Here again, the selected words were adapted in terms of difficulty to the grade levels and included low to high frequency items as assessed by language teachers. The examinee heard a word and was required to respond by saying its opposite (i.e., the antonym; the given response was compared to 1–3 possible correct answers provided to the examiner, see examples in Appendix Table 5). The items, from LA, included verbs, time expressions, quantity expressions, adjectives and others. The reliability of the test (α) ranged between .80 and .89.

Syntactic knowledge

Sentence judgment

The test examined the sensitivity to syntactic rules and the ability to judge the correctness of sentences presented visually and read by the examiner. Three sentence lists were created: one for first–second grades (26 items, including 15 incorrect), one for third–fourth grades (33 items, including 18 incorrect), and one for fifth–sixth grades (31 items, including 15 incorrect). The sentences contained between three and seven words (see examples in Appendix Table 5). For first–second grades, there were 9 sentences of 3 words, 9 of 4 words, 5 of 5 words and 3 of 6 words. For the third–fourth grades, there were 9 sentences of 3 words, 4 of 4 words, 10 of 5 words, 7 of 6 words and 3 of 7 words. For the fifth–sixth grades, there were 5 sentences of 3 words, 5 of 4 words, 9 of 5 words, 10 of 6 words and 2 of 7 words. The anomalies in the sentences concerned the syntactic rules such as the order of the words in the sentence, but also subject–verb agreement, verb tense and time complement agreement, and use of prepositions and conjunctions. The reliability of the test (α) was relatively low and ranged between .63 and .78.

Personal pronoun affinity

Because this knowledge is not required in the teaching curricula of the early grades, this task was used only from the third grade onwards. Indeed, due to Arabic diglossia, children use one language variety for speaking (spoken Arabic, with its own syntactic rules) and another variety (literary Arabic, with its own syntactic rules) for learning to read. They therefore cannot rely on their knowledge of the spoken language to understand the syntax of the written literary language in which the present test was performed. Such syntactic rules, in particular those related to the affixed personal pronouns (which appear as part of the word), are not taught during the first years and are very hard for the children to understand. This test examined the participants’ awareness of the link between words in the sentence, and in particular the significance of the personal pronouns, which in some cases appear separately and in other cases are affixed at the end of a word. For this purpose, two lists were created: one for third–fourth grades (28 items) and one for fifth–sixth grades (31 items). In each item, the sentence was presented visually to the participant (in order to minimize memory load) and read by the examiner (in order to avoid possible effects of reading difficulties). The examinee was required to determine the affinity of a given personal pronoun within the presented sentence (i.e., to whom or to what the personal pronoun refers in the sentence, see examples in Appendix Table 5). In Arabic, separated and affixed personal pronouns for the first, second, and third singular and plural persons do not differentiate between humans and non-humans, such that pronouns like he and she can be used for people, animals and objects depending on the gender of the words (i.e., there is no “it” in Arabic).
For each item, the examinee had to choose between three possible choices (referring all to nouns appearing in the sentence) provided visually and read by the examiner. In all groups, the sentences included between 6 and 15 words. The reliability of the test (α) ranged between .87 and .88.

Finally, we used the Raven progressive matrices test (Raven, Raven & Court, 1995) to control for the children’s nonverbal general ability. For this purpose, we used a 36-item color version for examining first–fourth grade children and a standard 60-item black and white version for examining fifth–sixth grade children. The rationale for using the color version for the earlier grades is based on the assumption that the standard Raven test has limited discrimination at the upper and lower levels (Raven, 2000). In each item and in both tests, the participant was asked to choose the part that completed a matrix from several simultaneously presented alternatives.

Procedure

The participants were tested individually in a quiet schoolroom during school hours. The different tasks were administered to the children in different orders. All examiners were professionals in communication disorders and learning disabilities. For the purpose of this study, the examiners received specific training in the procedures for the administration of the tasks. In order to further validate the results of the study, all language teachers of the selected classes and grades in each school filled out a questionnaire about each child’s reading abilities (accuracy and fluency). The teacher’s evaluation was based on his/her own personal knowledge of the child’s everyday performance and of the child’s performance in the different exams (see Results below).

Statistical analyses

Given that all the domains of knowledge were examined with two or more tests (see Appendix Table 6 for mean raw scores for all tests), we computed for each participant the average performance across the different tests of the same skill/domain in order to generate one single general measure per domain and thus to reduce the number of variables (see Appendix Table 7 for the combined scores). Combining the scores of more than one test also has the advantage of avoiding the effect of one specific task in which the variance of scores for a given age group might differ from that of another task assessing the same knowledge. However, before computing the average score per domain, we conducted two analyses. First, we examined the developmental validity of the tests by conducting unpaired t-tests on the subjects’ scores for the same test used in two successive grades (see above). This analysis showed that the change in performance between the two successive grades using the same test was significant (at p < .05) in all cases, except for the phonemic segmentation test in first–second grades and the morphological pattern awareness test in fifth–sixth grades (see details on these scores in Appendix Table 6). Second, before computing the average of the tests’ scores per domain for each participant, we computed the correlation between the two tests in the domain. This correlation was significant for visual perception (symbol search and Beery’s test, r = .50, p < .01), memory (r = .48, p < .01), phonological awareness (r = .57, p < .001), orthographical knowledge (r = .50, p < .001), semantic knowledge (r = .68, p < .001), and syntactic knowledge (r = .60, p < .001). For domains that were assessed with more than two tests, a coefficient of reliability (Cronbach’s alpha, α) was computed and showed α = .82 for the RAN tests and α = .77 for the morphology tests. Appendix Table 7 presents the mean of the combined raw scores of all measures.
However, in order to allow for better comparability between age sections where tests had different numbers of items, the combined scores were transformed into percentages of correct responses and are presented in Table 2. These scores were then used for further analysis, from which 17 children (from the various grades), whose standardized scores were below minus 2 in decoding and fluency measures, were excluded.
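The scoring pipeline described above (averaging the tests of a domain, converting to percent correct, and excluding children with very low standardized reading scores) can be sketched as follows. This is only an illustration under stated assumptions: the percentage is taken as the mean raw score divided by the mean maximum score, z-scores are computed within one grade, and a child is excluded only when both decoding and fluency fall below the cutoff. None of these details is specified in the text, and all names and numbers are hypothetical.

```python
from statistics import mean, stdev


def percent_correct(raw_scores, max_scores):
    """Combined domain score as a percentage (assumed: mean raw / mean maximum)."""
    return 100.0 * mean(raw_scores) / mean(max_scores)


def z_scores(values):
    """Standardize scores within a group (e.g., one grade)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def keep_mask(decoding, fluency, cutoff=-2.0):
    """True for children retained; assumed rule: exclude if BOTH z-scores < cutoff."""
    zd, zf = z_scores(decoding), z_scores(fluency)
    return [not (d < cutoff and f < cutoff) for d, f in zip(zd, zf)]


# Hypothetical child: 10/14 on one morphology test and 21/28 on another.
print(round(percent_correct([10, 21], [14, 28]), 1))

# Hypothetical grade of ten children; the last one reads far below the others.
decoding = [50] * 9 + [10]
fluency = [50] * 9 + [10]
print(keep_mask(decoding, fluency))
```

Expressing every domain on a 0 to 100 scale makes grades comparable even though their test lists differ in length.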

Table 2 Descriptive statistics (mean and SD) of the participants’ combined scores for the different variables by grade

To determine the contribution of the different measures to decoding and fluency, path model analyses were conducted (with AMOS 18.0 software, Arbuckle, 2009). The model was run separately for each grade in order to determine the relative weight of the contribution of each variable, after controlling for general nonverbal ability. A basic saturated model (path analysis) was formed with paths among the predictor measures themselves and between the predictor measures and the dependent measures (decoding and fluency). This saturated model is equivalent to a linear regression analysis, which yields the same results in terms of the betas and explained variance.
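Because the saturated path model is equivalent to a multiple linear regression, the standardized coefficients and the explained variance can be illustrated without specialized software. The sketch below is not the AMOS procedure used in the study; it solves the standardized normal equations (predictor correlation matrix times betas equals predictor-outcome correlations) with plain Python. Variable names and data are hypothetical.

```python
from statistics import mean, stdev


def standardize(xs):
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def corr(xs, ys):
    """Pearson correlation computed from standardized scores."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def standardized_regression(predictors, outcome):
    """Return ({name: standardized beta}, R squared) for one dependent measure."""
    names = list(predictors)
    r_xx = [[corr(predictors[a], predictors[b]) for b in names] for a in names]
    r_xy = [corr(predictors[a], outcome) for a in names]
    betas = solve(r_xx, r_xy)
    r_squared = sum(b * r for b, r in zip(betas, r_xy))
    return dict(zip(names, betas)), r_squared


# Hypothetical percent-correct scores for six children in one grade.
predictors = {
    "raven": [40, 55, 60, 45, 70, 65],  # nonverbal ability entered as a control
    "pa": [30, 50, 65, 40, 80, 60],
    "ran": [55, 45, 70, 50, 60, 75],
}
decoding = [35, 52, 68, 41, 79, 66]
betas, r2 = standardized_regression(predictors, decoding)
print({k: round(v, 2) for k, v in betas.items()}, round(r2, 2))
```

Entering the nonverbal ability score alongside the other predictors is one way of controlling for it: each beta then reflects a predictor's contribution with the remaining predictors held constant.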

Results

Before conducting the various analyses to answer our research questions, we first examined the validity of our results on reading by conducting correlation analyses between the children’s reading accuracy and reading fluency as reported in the teacher’s evaluation (based on the questionnaire) and the participants’ performance in decoding accuracy (r = .50, p < .01) and fluency (r = .48, p < .01). Table 2 shows the descriptive statistics for the reading tests (decoding and fluency) and for each of the general measures in the different grades. When comparing two successive grades (i.e., same task for first–second, third–fourth, or fifth–sixth, see also Appendix Table 6), the results indicated a clear developmental change. Of note is the fact that the decoding measure in first and fourth grades tended towards a ceiling effect with a large variance.
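The comparison of two successive grades on the same test can be illustrated with an unpaired t statistic. The sketch below assumes the classical pooled-variance (Student) form, which the text does not specify, and returns only the statistic and its degrees of freedom; the p value would be read from a t distribution. The scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(lower_grade, upper_grade):
    """Pooled-variance t statistic (upper minus lower grade) and degrees of freedom."""
    n1, n2 = len(lower_grade), len(upper_grade)
    pooled = ((n1 - 1) * variance(lower_grade)
              + (n2 - 1) * variance(upper_grade)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    t = (mean(upper_grade) - mean(lower_grade)) / standard_error
    return t, n1 + n2 - 2


# Hypothetical raw scores on the same test list for two successive grades.
first_grade = [12, 15, 14, 10, 13, 16]
second_grade = [17, 19, 16, 20, 18, 21]
t_value, df = unpaired_t(first_grade, second_grade)
print(round(t_value, 2), df)
```

A positive t value indicates higher scores in the upper grade, which is the direction expected for a developmentally valid test.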

The correlation analyses between all the predicting measures were computed through the basic saturated model (path analysis). Significant correlations were found between all of them, with no evidence of multicollinearity (not reported here). In addition, a correlation analysis was conducted separately between decoding and fluency and all the predicting (cognitive and linguistic) measures. This analysis, presented in Table 3, shows that there were significant correlations at p < .01 between the dependent variables and all measures throughout all grades, except between decoding and syntax, where the correlation was weak but still significant at p < .05. Additionally, a weak correlation was also found between the dependent variables and the general ability Raven test (see Table 3 for more details).
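One simple way to screen predictors for multicollinearity is to inspect their pairwise correlations and flag very high values. The sketch below does this for a small set of measures. The text does not state which criterion was used, so the threshold of .80 is purely an assumption, as are the variable names and data.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def flag_collinear_pairs(measures, threshold=0.80):
    """List predictor pairs whose absolute correlation reaches the (assumed) threshold."""
    flagged = []
    for a, b in combinations(measures, 2):
        r = pearson_r(measures[a], measures[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 2)))
    return flagged


# Hypothetical scores: "pa_copy" is a rescaled duplicate of "pa" and should be flagged.
measures = {
    "pa": [1, 2, 3, 4, 5],
    "pa_copy": [2, 4, 6, 8, 10],
    "ran": [3, 1, 4, 1, 5],
}
print(flag_collinear_pairs(measures))
```

An empty list would correspond to the situation reported here, where predictors correlate significantly but not so strongly that their separate contributions become indistinguishable.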

Table 3 Inter-correlation analyses between dependent variables and all measures by grade

The results of the path model analyses are presented in Table 4. These show that the various measures contributed in a fairly similar manner to predicting decoding and fluency (between 48 and 66% of explained variance, R²). The variance explained in the two dependent variables was highest in the first grade and tended to decrease in the following grades. However, despite the close similarity between the percentages of variance for decoding and fluency, the standardized coefficients (Beta, β) showed a clear difference in the contribution of some of the cognitive and linguistic measures to the two dependent variables. Thus, visual perception made no significant contribution either to decoding or to fluency, except for a marginal one to fluency in the first grade. As for memory, its contribution to decoding and to fluency showed some similarity, although it was absent for fluency in the first and sixth grades. Of particular interest is the fact that while RAN did not contribute at all to decoding, its contribution to fluency was highly significant in almost all grades and reached its maximum in the fourth grade (β3 = .34). The contribution of PA showed the reverse pattern: it contributed consistently to decoding and increased slightly in the higher grades (fourth–sixth grades), whereas its contribution to fluency was quite inconsistent and remained generally weak. The contribution of orthography to decoding and fluency was highly significant in all grades. For decoding, from being absent in the first grade, the orthographic contribution changed very significantly and reached its maximum in the second grade (β5 = .50). More generally for fluency, the contribution of orthographic knowledge changed as a function of grade (i.e., it increased in the higher grades). The contribution of morphology to the two dependent variables was inconsistent, but in both cases appeared in first and fourth grades.
The vocabulary and syntax did not contribute to decoding or to fluency, yet for both variables the contribution of vocabulary (although not significant) was the strongest in the first grade. Finally, these results emphasized the fact that the differences in the contribution of the different measures to decoding and fluency appeared in particular during the first grade. In fact, whereas the morphological knowledge contributed to both decoding and fluency, the other variables contributed either to decoding or to fluency. Thus, memory and PA contributed only to decoding, while visual perception, RAN and orthographic knowledge contributed solely to fluency.

Table 4 Summary of path analyses of the variables decoding and fluency by grade

Discussion

This study examined how various linguistic and cognitive variables contribute to reading measures (i.e., decoding and fluency) in Arabic in each of the first six grades. Reading tests assessing decoding and fluency were administered, together with other linguistic and cognitive tests. The change in performance between two successive grades using the same tests was significant in almost all tests and attested to the presence of a developmental improvement in skills. Path model analyses were conducted to assess the relative weight of the contribution of each predictor separately for each grade. The results showed that the different predictors explained between 48 and 66% of the variance in decoding and between 46 and 65% in fluency in the different grades. Within each grade, the results showed that the weight of the contribution of the same predictors differed between decoding and fluency. Below, we discuss the contribution of the different variables in the order in which they appeared in the introduction and in the results.

It appears from the literature that different predictors vary in their importance depending on the characteristics of specific languages. In our case, despite the complexity of the Arabic orthography (see Introduction), the results showed that visual perception failed to predict decoding and fluency, except for a marginal contribution to fluency in first grade. This finding is inconsistent with the visual hypothesis (Stein & Walsh, 1997) and with recent findings in Arabic where only one age group, including normal and poor readers, was examined (Taha, 2013). The prediction of reading in first grade, based on the children’s visual abilities in kindergarten (Plaza & Cohen, 2007), might be interpreted in terms of abilities that prepare children for dealing with complex graphic symbols in the future. Our results indicate that orthographic processing and visual perception should be considered separately rather than as one visual-orthographic ability.

The contribution of memory to decoding and fluency in most grades is consistent with results from different languages (Shatil & Share, 2003; Smith-Spark & Fisk, 2007; Swanson & Siegel, 2001) including Arabic (Abu-Rabia et al., 2003; Al-Mannai & Everatt, 2005). The involvement of memory was significant in all grades in decoding but not in fluency. This finding might suggest that Arabic-speaking children rely on phonological decoding strategies. Through decoding, the child deals with the smallest units of the words and must keep more information in working memory for the synthesis, processes that decelerate the reading rate and challenge memory (Perfetti, 1992; Wolf & Katzir-Cohen, 2001). Also, the involvement of memory might be due to the overlap between the phonology and memory measures which both rely on the phonological loop (Smith-Spark & Fisk, 2007). Actually, the correlation (independent of age) between decoding and each of the memory tests was higher with phonological working memory (r = .53) than with digit span (r = .31).

The absence of involvement of RAN in decoding through all grades is consistent with a recent study of Arabic (Taibah & Haynes, 2010). This result contradicts the argument that, in transparent orthographies, the contribution of RAN to decoding is much stronger than that of phonology (Goswami, 2000; Ziegler et al., 2010). Consistent with results from both deep and transparent orthographies, our results showed that RAN contributed significantly to fluency in all grades (Bowers & Swanson, 1991; Breznitz, 2006; Cardoso-Martins & Pennington, 2004; Saiegh-Haddad, 2003; Wolf & Bowers, 1999; Taha & Saiegh-Haddad, 2015). The results on decoding reinforce the argument that RAN does not really relate to phonological processing (Bowers & Wolf, 1993). This interpretation is consistent with the fact that PA, which predicted decoding significantly and consistently, showed a weak correlation (across grades) with RAN (r = .24).

The phonological contribution was more significant to decoding than to fluency. These results fit with others from alphabetic languages (Gillon, 2004; Shatil & Share, 2003; Snowling, 2000; Stanovich, 1988) including Arabic (Al-Mannai & Everatt, 2005; Elbeheri & Everatt, 2007; Elbeheri et al., 2011; Saiegh-Haddad & Geva, 2008; Taibah & Haynes, 2010). Some researchers claim that the contribution of phonology to decoding in transparent orthographies is more crucial in lower grades and disappears in higher grades (Shatil & Share, 2003). The fact that here PA contributed to decoding also in the fifth and sixth grades might be explained by the characteristics of the Arabic vowelized orthography that forces the reader to rely on the phonological information. The weak contribution of PA to fluency is consistent with Wolf and Bowers’s (1999) suggestion that PA has a much stronger correlation with the accuracy in decoding processes. The absence of the PA contribution to fluency in the first grade was already observed in Arabic (Saiegh-Haddad, 2005). The absence of contribution of PA to fluency and its contribution to decoding agrees with the proposition that development of accuracy precedes that of fluency which requires more automaticity (thanks to the development of larger orthographic units).

As expected, the orthographic knowledge contributed significantly to both decoding and fluency. Previous evidence emphasized the importance of orthographic knowledge to decoding (Abu-Rabia et al., 2003; Ibrahim, Eviatar & Aharon-Peretz, 2002; Shatil & Share, 2003) and fluency (Badian, 2005; Bekebrede, van der Leij, & Share, 2009; Elbeheri et al., 2011; Katzir, Wolf, O’Brien, Kennedy, Lovett, & Morris, 2006; Saiegh-Haddad, 2005; Share & Stanovich, 2002; Ziegler & Goswami, 2005). The reader’s ability to stock orthographic representations in long term memory is thought to assist in identifying the written words more automatically and accurately (Share & Stanovich, 2002). Our finding is in line with the ‘grain size theory’ (Ziegler & Goswami, 2005) which proposes that orthographic knowledge helps the reader to move from slow reading, based on grapho-phonemic decoding, to efficient and fluent reading based on larger orthographic units. The weight of the contribution of orthographic knowledge (relative to the other predictors) in the fifth and sixth grades indicates a reciprocal correlation between reading and orthographic knowledge (Elbeheri et al., 2011).

Although morphology was highly correlated with both decoding and fluency, its contribution to the two variables was significant only in first and fourth grades (and in fifth grade to decoding only). The contribution in the first grade might be explained by the exposure, already at the beginning of reading, to inflected morphological patterns with which the readers have to deal immediately. In line with this explanation, an ad hoc analysis performed between decoding and each of the morphology subtests showed that decoding correlated much more highly with inflecting verbs and nouns (r = .62) than with derivational words (r = .42) and root and pattern awareness (r = .30). The morphological contribution in fourth and fifth grades is in line with other previous results on Arabic and Hebrew (Abu-Rabia, 2007; Ravid & Malenky, 2001). Saiegh-Haddad and Geva (2008) have, however, reported that morphology does not contribute to decoding ability in Arabic beyond the PA contribution, contrary to English. This finding was attributed to the difference in the morphological transparency of the two languages, supporting the proposition that the morphological contribution to reading is more significant in transparent morphologies (Farran, Bingham, & Mathews, 2011; Saiegh-Haddad & Geva, 2008).

The results indicated that vocabulary and syntactic knowledge did not contribute to decoding or fluency in any grade, despite the moderate correlations between these and the reading measures. These results fit with others from Hebrew, where it was proposed that lexical and supra-lexical abilities may contribute to higher processes such as reading comprehension (Shatil & Share, 2003). In English, it was suggested that lexical knowledge is much more related to reading irregular words (Nation & Snowling, 2004; Ricketts et al., 2007). Our findings are consistent with the theory of modularity in interaction with the transparency view proposed by Share (2008). Indeed, it has been proposed that Arabic readers do not rely greatly on lexical information in the pronunciation of words. Instead, they go from the orthographic channel to pronunciation via the phonological channel. This might be due to diglossia, since beginning readers encounter many new LA words for the first time, and these are regarded as generally unavailable in the readers’ mental lexicon (Saiegh-Haddad, 2004).

The analysis of the various components contributing to reading reinforces the idea that the degree of modularity is influenced by orthographic transparency. This was particularly evidenced by the large and very significant phonological contribution to decoding in the fifth and sixth grades. This argument is also supported by the lack of contribution of RAN to decoding in Arabic, a language regarded as transparent. Finally, it is further strengthened by the orthographic contribution to decoding, in spite of the claims that such a contribution is expected to be more significant in deep orthographies, where grapho-phonemic relations are not sufficiently clear, than in transparent ones. The behavior here of some of these linguistic and cognitive measures allows for questioning the matter of orthographic transparency in Arabic. Indeed, while orthographic transparency is commonly determined by the clear and consistent link between the written symbols and their corresponding sounds, the orthographic features of Arabic do not constantly enable full correspondence between spoken and written language. Accordingly, one should not deal with the orthographic transparency issue in terms of a dichotomy (transparent vs. non-transparent), but instead in terms of a continuum, with different levels of transparency along its axis, where Arabic might be defined as semi-transparent.

The orthographic complexity of Arabic is thought to be one major source of difficulties in reading and writing acquisition. The duality of orthographic presentation in Arabic also exists in the pronunciation of the words. The use of short vowels, while providing the full phonological information and thus allowing transparency, is also one source of visual density and complexity. The visual similarity between the diacritical marks makes their automatic perception, and the processing of orthographic patterns that differ from each other only by small features, a difficult process. Therefore, the reader is required to use GPC processes to take into account the diacritical marks even in higher grades. Such integration may oblige the reader to rely on phonological information and even on higher cognitive functions such as phonological working memory. Finally, the transition (from vowelized to non-vowelized texts) that occurs around the fourth grade might also be a source of difficulty in reading and reading acquisition. For this transition to occur, children are forced to develop strategies that help them read non-vowelized words. These strategies (which have to rely on other skills) become more sophisticated with exposure to literacy, thus enabling efficient word recognition. One of these skills is the morphological knowledge that enables the reader to extract phonological information from print even in the vowelized words (Saiegh-Haddad, 2013; Saiegh-Haddad & Geva, 2008). Here, we showed that morphology contributed to both decoding and fluency in the fourth grade, when readers start using the non-vowelized orthography.

Conclusion

The different linguistic and cognitive predictors explained decoding and fluency with similar amounts of variance across the grade levels, but with major differences between those contributing to decoding and those contributing to fluency. This differential contribution might have important pedagogical and theoretical implications. Indeed, if these two basic components of reading rely on different cognitive and linguistic skills, their improvement in struggling readers would necessitate the use of different intervention strategies (Aaron, Joshi, Gooden, & Bentum, 2008). In the early grades, interventions oriented towards improving decoding, which generally focus on phonologically-based training, might profit from a combination that also strengthens morphological awareness. As for fluency, the focus should be on visual processing of non-linguistic symbols (especially before entering school) and on the enhancement of orthographic processing and speed of processing in general. Theoretically, the study reported here is the first of its kind to provide an empirical starting point for a reading model in Arabic, based on a large sample of participants. By providing information about the diversity of the predictors of decoding and fluency already in the first grade, these findings might contribute to an earlier and more efficient identification of potentially at-risk children in the preschool stages. Such information about predictors of decoding and fluency in the early stages could orient the establishment of effective instruction programs that better prepare children for formal literacy. One major limitation of this study is that we used a cross-sectional design. The contribution of these predictors to early literacy stages, particularly during the first–second grades, needs to be further examined using longitudinal designs.
In such designs, analyses might also be performed to assess statistically the difference in the contribution of the same predictors between the different grades. Another limitation of this study is that we used isolated word reading. In this regard, future studies should consider using both word and text reading. Finally, some tests, particularly decoding, showed some ceiling effects, probably because items showing floor effects were removed after the pilot item analysis. This point should be considered more cautiously in future studies.