Introduction

The early grade reading assessment (EGRA) is commonly used to inform education systems and reading programs in low- and middle-income countries (LMICs) worldwide (Dubeck & Gove, 2015; Gove & Wetterberg, 2011). Informed by research conducted primarily with alphabetic-language learners, the EGRA is often based on the five main instructional components identified by the National Institute of Child Health and Human Development (NICHD, 2000)—phonemic awareness, phonics, vocabulary, fluency, and reading comprehension (Dubeck & Gove, 2015). This tool has been used in over 40 countries to measure student literacy levels, inform and improve reading pedagogy, and shape education policy (RTI International, 2020); as such, it is a highly influential part of educational improvement efforts, especially in LMICs.

The EGRA relies heavily on the subconstruct of reading fluency, measuring correct words per minute, leading practitioners and policymakers globally to utilize fluency as a “proxy for comprehension” (Dowd & Bartlett, 2019). This has led to scholarly debate about the relative importance of fluency versus other reading subconstructs and reading comprehension tasks in international reading efforts (Abadzi & Centanni, 2020; Bartlett, Dowd, & Jonason, 2015; Dowd, Bartlett, Khamis-Dakwar, & Froud, 2020; Hoffman, 2012; Zuilkowski, Piper, Kwayumba, & Dubeck, 2019).

Although the EGRA is used in countries across the world with varying orthographies and scripts, there is little research on the psychometric properties of EGRA skills across a wide range of languages (for an exception see Jiménez, Gove, Crouch, & Rodríguez, 2014), let alone across diverse writing systems or among second (or later) language learners. To address this gap and shed empirical light on the relative importance of various EGRA sub-constructs in explaining reading ability, the objective of this study was to examine the underlying psychological and linguistic constructs measured by the EGRA in Kyrgyz, Russian, and Tajik through a cross-linguistic comparative lens. These languages were selected for four main reasons: (1) they come from very different language families but use the same script, allowing us to conclude theoretically that any observed cross-linguistic patterns and generalizations hold across Cyrillic scripts, regardless of language family; (2) the three languages are used by 12 million speakers in Kyrgyzstan and Tajikistan, with Russian used as a first or second language by 4 million speakers, yet they are highly understudied, forcing practitioners to rely on reading research from English and, to some extent, Russian contexts to develop reading programs; (3) the data offer the opportunity to carefully test and analyze several foundational reading subskills for an unusually large group of students in Central Asia; and (4) the design allows us to test whether the various EGRA sub-tasks measure unique constructs or whether there is redundancy among them, which in turn can inform more targeted and cost-efficient measurement of early reading sub-skills in Cyrillic writing systems.

Theoretical framework

Two theoretical frameworks jointly inform the main hypotheses of this paper: the cognitive foundations of reading acquisition (CFRA) (Hoover & Tunmer, 2020; Tunmer & Hoover, 2019) and the universal grammar of reading (Perfetti, 2003). The CFRA—the recently expanded version of the simple view of reading—posits that reading comprehension requires both automatic word recognition abilities and language comprehension abilities; neither is sufficient by itself (Gough & Tunmer, 1986; Hoover & Gough, 1990; Hoover & Tunmer, 2020; Tunmer & Hoover, 2019). This underlying theoretical premise has been validated in several languages and writing systems (Florit & Cain, 2011; Joshi, Tao, Aaron, & Quiroz, 2012) and in second language (L2) learners as well (Lervåg & Aukrust, 2010; Proctor, Carlo, August, & Snow, 2005; Verhoeven, van Leeuwe, & Vermeer, 2011), especially if oral vocabulary is considered part of the language comprehension pillar (Verhoeven & Perfetti, 2017). Within this model, word recognition and language comprehension are made up of additional sub-constructs. For example, word recognition comprises concepts about print, letter knowledge, lexical knowledge, and cipher (or orthographic) knowledge, while language comprehension comprises background knowledge, phonology, syntax, and semantics (Catts, Adolf, & Weismer, 2006; Gough, Hoover, & Peterson, 1996; Ouellette & Beers, 2010). The CFRA also makes explicit the inter-relationships and progressions of the sub-skills (Tunmer & Hoover, 2019), which makes it useful for translating the components of reading development into assessments and instructional approaches.

The EGRA draws on research from the National Reading Panel (NICHD, 2000), which focuses on the “Big Five” skills (Gove & Wetterberg, 2011; Dubeck & Gove, 2015). This research was conducted mostly with monolingual speakers of English, a language with a relatively atypical orthography (Share, 2008, 2014). There is growing evidence that various writing systems and scripts impose specific constraints that do not necessarily reflect the same cognitive and linguistic underpinnings as English. For example, the role of oral reading fluency in explaining reading comprehension is not unequivocal: in English (Fuchs, Fuchs, Hosp, & Jenkins, 2001) and other alphabetic languages (Cossu, 1999), it has been shown to be a strong indicator of individual differences in reading competence, but in other languages the evidence is mixed, ranging from a lack of a significant relationship between oral reading fluency and comprehension in Hebrew (Saiegh-Haddad, 2003) to oral reading fluency playing a mediating role in Korean (Kim, Park, & Wagner, 2014).

Given the theoretical assumption that automatic and effortless processing of words is required to free up the mental space necessary for comprehension (Fuchs et al., 2001; Perfetti, 1985), fluency is necessary for comprehension; but that does not mean fluency equates to reading comprehension. In fact, the amount of variance that reading speed explains in reading comprehension varies greatly across languages (Dowd & Bartlett, 2019), and a focus on fast reading fails to account for slow but accurate readers (Bartlett, Dowd, & Jonason, 2015). Further, the use of fluency as a proxy for reading comprehension may not be as appropriate for bilingual or multilingual learners as it is for monolingual English speakers (Gathercole, 2013; Piper, Schroeder, & Trudell, 2016).

As it is often operationalized, phonemic awareness is also an English- and alphabet-specific construct. In the writing systems of South, Southeast, and East Asia, there is an emerging consensus that phonological awareness is represented as a dual and asymmetrical awareness at both the syllabic and phonemic levels (Nag, 2007; Nag & Perfetti, 2014; Reddy & Koda, 2013; Nakamura, Joshi, & Ji, 2017). Finally, vocabulary has been empirically shown to be a component of oral language skills, and not an independent reading subconstruct, in a transparent orthography, Greek (Protopapas, Simos, Sideridis, & Mouzaki, 2012). For these reasons, we propose that the CFRA may be a better fit for explaining the mechanisms and constructs that underpin a range of writing systems than the five-skills model put forth by the NICHD (2000) study.

We also ground our hypotheses in the universal grammar of reading and the operating principles of learning to read across orthographies (Perfetti, 2003; Verhoeven & Perfetti, 2017), which allow us to make cross-linguistic comparisons. The universal grammar of reading essentially suggests that all writing systems encode spoken language, and thus learning to read is uncovering the code used by that particular system—whether phonemes, syllables, morphemes, or some combination. Also supporting the notion that cross-linguistic differences in mapping principles affect the relative importance of reading sub-skills and the relative pace of acquisition, Caravolas, Lervåg, Mikulajová, Defior, Seidlová-Málková, and Hulme (2019) provide longitudinal evidence that decoding remains a significant predictor of reading development in opaque orthographies like English, whereas oral language skills emerge as significant predictors of reading in transparent orthographies like Spanish, Slovak, and Czech as early as kindergarten.

Similarly, Landerl et al. (2018) conducted a cross-linguistic comparison of five alphabetic orthographies to demonstrate that mapping principles depend on the degree of orthographic transparency. When comparing the relative contributions of rapid automatic naming (RAN) and phonological awareness (PA) to fluent word reading in grade 1 and 2 students, the former consistently predicted word reading fluency, while the latter showed variation across the orthographies. At the same time, cross-linguistic studies across alphabetic orthographies differing in degree of transparency provide evidence for the universal mapping principle in that both RAN and PA are stable predictors of early reading development in the first three years of school (Furnes & Samuelsson, 2009, 2011). Together, these studies suggest that despite a stable pattern of latent structures predicting early reading, there are key cross-linguistic mapping differences that manifest themselves in relative predictive power across the early reading acquisition trajectory.

This paper provides a window into three relatively different languages that have adopted one script—Cyrillic—under the umbrella of the alphabetic writing system. This allows us to conclude that any patterns seen across the three languages in terms of the componential constituents they represent can be traced to a function of the Cyrillic script, regardless of variances in the spoken language family.

Of these three languages, early reading acquisition in Russian is better studied than in Kyrgyz or Tajik (in the English-medium literature), and thus Russian will occupy most of this section. The empirical base from Russian describes key differences between the psycholinguistic mechanisms underpinning reading development in Russian and in English. For example, Russian has a high degree of transparency from letters to sounds (Grigorenko, 2005); but due to unstressed vowel reductions, consonant assimilation, and homonymy, the relationship from sounds to letters is quite opaque (Rakhlin, Kornilov, & Grigorenko, 2017). This so-called “asymmetrical transparency” makes encoding (spelling) harder than decoding (reading) (Rakhlin, Cardoso-Martins, & Grigorenko, 2014). In a study examining Russian reading development in students with specific reading disorders, students made accuracy errors in addition to reading with diminished speed, possibly due to the complex morphological and stress patterns of Russian (Grigorenko, 2012; Grigorenko, Kornev, Rakhlin, & Krivulskaya, 2011). There is also evidence that orthographically more complex words in Russian result in lower accuracy rates and slower reaction times, even more so than pseudowords, which can be accessed with phonological recoding strategies (Kerek & Niemi, 2009). Another example of how the Russian orthography shapes the reading acquisition process is its extremely complex syllable structures, including several consonants in sequence (e.g., CCCCVC), which can make processing a single syllable much slower than processing the simpler syllable structures of English. Taken together, it is clear that the universal mapping principle is apparent in learning to read Russian, but the mapping details differ from English.

In this study, we utilized a full battery of EGRA scores for Grade 2 and Grade 4 students in Russian, Kyrgyz, and Tajik—all of which use Cyrillic scripts—to test whether the underlying constructs being measured can be explained by the two main pillars of the CFRA—word recognition and language comprehension—and whether any differences can be explained through cross-linguistic mapping differences. Specifically, our main aim was to identify the structure of reading comprehension in these under-studied languages by analyzing the latent factors, or constructs, that contribute to reading comprehension. In other words, the null hypothesis is that each of the subtasks loads on a separate construct, which would imply that the five skills identified by the NICHD (2000) study are best measured (and taught) separately, rather than through a more streamlined assessment focused on the underlying constructs that emerge from our factor analysis.

After a brief contextual overview on learning to read in primary schools in Central Asia, we present basic background about the languages and orthographies under study, followed by a description of the early grade reading assessment, the research questions and analytical methods employed, and a presentation of findings. We conclude with a discussion of our results in the context of the theoretical bases of the CFRA and cross-linguistic differences that might be expected through the universal grammar of reading’s mapping principle; as well as of the practical uses of the EGRA.

The instructional contexts

In Central Asia, as in other parts of the former Soviet Union, a widely acknowledged success of the state was the provision of mass education at primary and secondary levels and the attainment of high literacy rates overall (Dienes, 1987). Twenty-five years after the collapse of the Soviet Union, the Central Asian nation states that inherited the institutions of a once respected education system face significant challenges in sustaining those achievements (Silova, 2009; Shamatov & Niyozov, 2010).

Currently, daily instructional practices emphasize mastering (usually memorizing) and reproducing core knowledge and literature. The oral reproduction of texts (memorization of key portions of literature, poems, speeches, essays, etc.) occupies a prominent place in classrooms, as does orally answering questions posed by the teacher about a text (Shamatov & Niyozov, 2010). Pupils also spend time at the blackboard and at their desks writing dictations and reproducing the works of others, such as famous literary figures. In in-class assessments and take-home work, neatness, accuracy, style, and form are all assessed in addition to content knowledge and other reading sub-skills.

With the Soviet collapse and the mass closure of state-funded kindergartens, the improvement of primary-level reading skills is now more of a priority than ever (De Young, Reeves, & Valyaeva, 2006; CEATM, 2009). In both Kyrgyzstan and Tajikistan, nationally representative EGRAs were conducted in 2014 as part of larger reading programs (American Institutes for Research, 2014). We utilize these nationally representative data to determine the underlying constructs measured by the EGRAs in the Kyrgyz, Russian, and Tajik orthographies.

The languages and orthographies

Kyrgyz language and script

Kyrgyz is an Altaic (Turkic) language spoken by approximately 6 million people in Eurasia and written with a modified Cyrillic script. The alphabet contains 36 letters: 33 from the Russian alphabet plus three additional letters for sounds unique to Kyrgyz (Ң, Ү, Ө) (Hu & Imart, 1989). Kyrgyz is also a transparent language characterized by word agglutination, vowel harmony (only certain vowels can follow preceding vowels), voiced and unvoiced letters, and a structured word order that places verbs at the end of sentences (Hu & Imart, 1989). An implication for learning to read is that acquiring Kyrgyz words requires attention to morphemes and to how they combine in myriad ways to create different meanings within single words. Agglutination results in shorter sentences (on average) and longer words (on average) than many Indo-European languages use to express complex ideas (Hu & Imart, 1989; Drummond, 2011). Finally, relevant to educational assessment, there are ongoing socio-linguistic debates on standardization and language adaptation (translation) practices in the republic.Footnote 1 Figure 1 shows the Kyrgyz alphabet with its International Phonetic Alphabet notation.

Fig. 1 Kyrgyz alphabet (modified Cyrillic)

Russian language and script

Pupils in both countries retain the option to study in Russian Language of Instruction (LOI) tracks. In the Kyrgyz Republic, for example, approximately 25% of the entering Grade 1 cohort enrolls in Russian LOI tracks, though only about 7% of the population is ethnic Russian. While Russian LOI tracks can be found in schools throughout the country, they are most prevalent in the capital, Bishkek, and the northern Chui Valley, where approximately 50% of the cohort selects Russian LOI tracks.

Russian is a Slavic language in the Indo-European language family that uses the Cyrillic alphabet of 33 letters (21 consonants, 10 vowels, and two letters that do not designate any sounds). Its orthography is transparent in the letter-to-sound direction, with each written grapheme typically corresponding to a single phonemic unit. This regularity makes word recognition a relatively straightforward process, with words then being mapped to familiar concepts in the mental lexicon (Rakhlin, Kornilov, & Grigorenko, 2013), assuming oral language proficiency in the language. Figure 2 shows the Russian alphabet.

Fig. 2 Russian alphabet

Tajik language and script

Tajik, also referred to as “Tajiki Persian” (форсии тоҷикӣ), is considered a dialect of Persian by most scholars.Footnote 2 Like Kyrgyz, Tajik is written with a modified Cyrillic script and has a total of 35 letters, six of which are not found in the Russian Cyrillic alphabet. An additional three letters (ц, щ, and ы) can still be found in loan words, though they were dropped from use in the 1998 language reform. In addition, like Russian, Kyrgyz, and other Cyrillic-script languages, Tajik has a transparent orthography. Figure 3 presents the Tajik alphabet with its International Phonetic Alphabet notation.

Fig. 3 The Tajik alphabet (modified Cyrillic)

Mother tongue versus foreign language of instruction

A final important contextual nuance related to language of instruction in both Tajikistan and Kyrgyzstan needs to be noted. Today, the overwhelming majority of ethnic Russians study in Russian LOI tracks. However, the Russian LOI cohorts are very diverse in ethnic composition: Azerbaijanis, Chechens, Dungans, Germans, Kazakhs, Koreans, Kurds, Kyrgyz, Tatars, Turks, Ukrainians, Uzbeks, and children of other ethnic groups also study in Russian language tracks in large proportions in both countries (Korth, 2005; Drummond, 2011). Many of these minorities speak Russian at home as well as at school. Regardless of the ethnic constitution of the community, Kyrgyz language schools are typically composed almost entirely of ethnic Kyrgyz pupils, and in Tajikistan, despite some diversity, the pupils in Tajik language schools are more homogeneous than those in Russian language schools (Drummond, 2011).Footnote 3 This is important when interpreting the factor analysis results, as the Russian language cohorts include a large percentage of pupils studying in a language other than their mother tongue.

Given this orthographic and linguistic background, and the above theoretical frameworks of the two pillars of the CFRA and the cross-linguistic mapping principles of the universal grammar of reading, we ask the following research questions:

1. Do initial letter sound identification (phonological awareness), oral vocabulary, and listening comprehension all load on the language comprehension construct in Kyrgyz, Russian, and Tajik?

2. Do familiar word recognition, pseudo word recognition, and oral reading fluency all load on the decoding construct in Kyrgyz, Russian, and Tajik?

3. Does reading comprehension load on language comprehension, decoding, or a separate construct?

The early grade reading assessment (EGRA)

In both countries, the EGRA administered in this study was a comprehensive assessment battery consisting of nine reading subtasks for Grade 2 and seven reading subtasks for Grade 4.Footnote 4 The subtasks were constructed based on guidance from the most up-to-date EGRA toolkits at the time of data collection (RTI, 2016). They included:

1. Initial letter sound recognition: Students were required to identify the first phoneme (which included both consonants and vowels) from 10 randomly arranged, commonly used real words. Students identified and sounded out just the first sound (phoneme) from a whole word read aloud by the administrator. This task was scored as percent correct.

2. Oral vocabulary: Based on the PPVT-R format (Dunn & Dunn, 1981), students saw ten sets of four pictures and were asked to identify which picture matched a word (noun or verb) they heard the test administrator say out loud. There were ten total items for all languages and grades and the subtask was scored on a percent correct basis.

3. Listening comprehension: An enumerator read a grade-appropriate passage to the student. The subtasks included a paragraph of approximately 40 words for Grade 2 and approximately 80 words for Grade 4. The test administrator read the passage aloud only once at a pace of about one word per second. The questions were then asked by the enumerator. For Tajik Grades 2 and 4, there was a total of five questions per text. For Kyrgyz and Russian (both grades) there was a total of four questions per text. The subtask was scored on a total percent correct basis.

4. Letter name recognition: Students were presented with a list of randomly arranged upper and lower case letters. Each letter of the alphabet was included on the list. In a two-minute period, students were asked to correctly identify each of the letters’ names. Results were scored on a letter correct per minute basis.

5. Familiar word recognition: Students read aloud 40 familiar, grade-appropriate words. The words were presented on a list with five rows and eight columns. Scores were calculated on a correct per minute basis.

6. Pseudoword recognition: Students read aloud 40 grade-appropriate pseudo words. The words were presented on a list with five rows and eight columns. Scores were calculated on a correct per minute basis.

7. Oral reading fluency (ORF): Students were asked to read aloud a grade-appropriate passage with fluency in terms of prosody, accuracy, and speed. They were provided two minutes to read, and scores were calculated on a rate per minute basis. The ORF passages contained 40 words (Tajik Grade 2), 78 words (Tajik Grade 4), 41 words (Kyrgyz Grade 2), 78 words (Kyrgyz Grade 4), 48 words (Russian Grade 2), and 91 words (Russian Grade 4).

8. Reading comprehension: Students were asked to answer 3–5 orally presented comprehension questions on the same passage that was used in the ORF test. Most questions were explicit comprehension questions but at least one question was an implicit comprehension question. The subtask was scored on a total percent correct basis.

9. Dictation: Pupils were asked to listen to a sentence as it was read aloud and correctly reproduce that sentence in written form. Pupils were graded on spelling, symbols, capitalization, punctuation, spacing, and accuracy in vowel and consonant sounds. In Tajik, the maximum possible raw scores for Grades 2 and 4 were 16 and 22 points, respectively. In Kyrgyz, the maximum possible raw scores for Grades 2 and 4 were 16 and 20, respectively. In Russian, the maximum possible raw scores were 18 and 22 for Grades 2 and 4, respectively. Overall score was determined on a total percent correct basis.Footnote 5

Sample

The 2016 EGRA was administered in approximately 131 randomly selected schools in the Kyrgyz Republic (4751 students in total) and 132 schools in Tajikistan (4328 students in total). A two-stage cluster sampling approach was employed in which schools were selected first, then students within schools. Schools in both countries were chosen using a stratified sample: the sampling frame was divided into strata by region, the number of schools to be sampled in each region was determined in proportion to that region’s share of the total number of schools, and the necessary number of schools was then selected within each stratum, with each school having an equal probability of selection.
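
To make the two-stage design concrete, the sketch below illustrates the sampling logic described above (proportional allocation of schools across regional strata, equal selection probability within strata, then pupils within schools). It is a minimal illustration in Python with hypothetical field names (region, school_id, grade, gender), not the actual sampling code used in the study.

import pandas as pd

def sample_schools(school_frame: pd.DataFrame, n_total: int) -> pd.DataFrame:
    # Allocate the school sample across regions in proportion to each region's
    # share of schools, then draw schools with equal probability within each stratum.
    shares = school_frame["region"].value_counts(normalize=True)
    allocation = (shares * n_total).round().astype(int)
    samples = []
    for region, n_region in allocation.items():
        stratum = school_frame[school_frame["region"] == region]
        samples.append(stratum.sample(n=min(n_region, len(stratum)), random_state=1))
    return pd.concat(samples)

def sample_pupils(roster: pd.DataFrame, per_grade: int = 20) -> pd.DataFrame:
    # Second stage: within each selected school and grade, draw pupils with
    # equal numbers of girls and boys, mirroring the field protocol.
    half = per_grade // 2
    return (
        roster.groupby(["school_id", "grade", "gender"], group_keys=False)
        .apply(lambda g: g.sample(n=min(half, len(g)), random_state=1))
    )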

Administration procedure

All EGRA administrators were trained in a week-long, hands-on training event. Data in the Kyrgyz Republic were collected in pencil-and-paper format, while in Tajikistan data were collected via electronic tablets. Practice sessions with inter-rater reliability analyses were conducted to ensure consistency in EGRA scoring. Once at a school, 20 pupils per grade, with equal gender representation, were randomly selected by lining up students in a single-file line and selecting every Xth student according to a protocol. The subtasks took 25 minutes to administer in a face-to-face format. Tasks were scored in two ways, percent correct and number of letters, words, or pseudo words read per minute, depending on the subtask (see Table 3).
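
As a point of reference, the two scoring conventions used across subtasks can be expressed as simple functions. This is an illustrative sketch only; the function and variable names are ours, not part of the EGRA toolkit.

def percent_correct(n_correct: int, n_items: int) -> float:
    # Untimed subtasks (e.g., oral vocabulary, listening comprehension, dictation).
    return 100.0 * n_correct / n_items

def correct_per_minute(n_correct: int, seconds_used: float) -> float:
    # Timed subtasks (e.g., letter names, familiar words, pseudowords, ORF):
    # the raw count is scaled to a per-minute rate using the time actually used.
    return n_correct * 60.0 / seconds_used

# Example: 8 of 10 vocabulary items -> 80.0; 33 words read correctly in 45 s -> 44.0 per minute.
print(percent_correct(8, 10))
print(correct_per_minute(33, 45))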

Assessment results

Before the results are presented, a note on reliability estimation in EGRA studies is in order. The EGRA poses two challenges to standard reliability estimation methods such as Cronbach’s alpha. First, several of the subtasks (e.g., Listening Comprehension) have a small number of test items, making reliability estimates tenuous. Second, standard estimation approaches are not applicable when subtasks are timed. One approach to the reliability of timed tasks is to estimate the coefficients by entering total subtask scores (rather than item-level data) into the estimation formula for the subtasks that measure the same construct (RTI, 2015). Coefficients for these timed subtasks were reasonably high on all four EGRAs: all above 0.80. Reliability for Listening Comprehension and Oral Vocabulary (non-timed tasks that allow traditional estimation approaches) was estimated using Cronbach’s alpha, with these scores composited. Cronbach’s alpha was also used to estimate reliability for the Dictation and Initial Letter Sound subtasks.
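
For readers who wish to reproduce this approach, the sketch below computes Cronbach’s alpha from a score matrix whose columns are either item scores (for untimed subtasks) or total subtask scores treated as “items” (for timed subtasks assumed to measure the same construct). It is a minimal illustration; the column names are hypothetical.

import pandas as pd

def cronbach_alpha(scores: pd.DataFrame) -> float:
    # alpha = k/(k-1) * (1 - sum of column variances / variance of the row totals)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# e.g., reliability of the timed decoding subtasks from their total scores:
# decoding = df[["familiar_words_cpm", "pseudo_words_cpm", "orf_cwpm"]]
# print(cronbach_alpha(decoding))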

The results of the reliability analyses for all three languages are presented in Tables 5 and 6 in the appendices. Compared to many countries where the EGRA is administered, the overall percentage of “zero scores” (students answering no items correctly) was quite low: less than 1% of respondents on most subtasks. Of note, the score distributions in all three languages were normally distributed, though standard deviations were high on many sub-tasks, indicating a wide dispersion of scores. There was evidence of strong foundational Grade 2 skills, for example, a mean score of 66.7 letters read per minute (Kyrgyz) and a 94% mean score on the Initial Letter Sound sub-task (Kyrgyz). Oral Vocabulary mean scores were also above 95% in all three languages. In both countries there were gender gaps favoring females over males, as well as gaps between rural and urban students favoring urban students. Table 1 presents the sample sizes, mean scores, and standard deviations for both grades in all three languages.

Table 1 Summary statistics for both grades for all three languages

Factor analysis methods

To determine the underlying structure of the assessment, and specifically to answer our research questions of whether the EGRA sub-tasks were measuring the two main constructs of the CFRA—word recognition (or decoding) and language comprehension—principal axis factor analyses were conducted in SPSS. The factor analyses were conducted with the EGRA subtasks to assess the dimensionality of the entire assessment battery. For each language, we entered all Grade 2 sub-tasks into one model simultaneously and all Grade 4 sub-tasks into a separate model simultaneously.

The data were analyzed using the “data reduction” function in SPSS, and interpretation of the results was guided by examining factor loadings in a rotated factor matrix. Based on previous research on reading ability, it was plausible that our underlying factors of interest would be correlated, so an oblique (Oblimin) rotation was selected. Interpretation of the factor analyses was facilitated by using the “pattern matrix” output (see Table 2). The pattern matrices allowed interpretation of the overall simple structure of the data by examining how subtasks cluster on the matrix: a cluster of subtasks loading highly on the same factor (high loading meaning above 0.400) indicated that those subtasks measure a single factor. In addition to examining the loadings, eigenvalues and scree plots were also employed to interpret the results.
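
For transparency, the analysis can be approximated outside SPSS. The sketch below uses the open-source factor_analyzer package to fit a two-factor principal-axis solution with an oblimin rotation and to flag pattern loadings above the 0.400 threshold. It is a hypothetical re-implementation with assumed subtask column names, not the procedure actually run for this paper.

import pandas as pd
from factor_analyzer import FactorAnalyzer

SUBTASKS = [
    "initial_letter_sound", "oral_vocabulary", "listening_comp",
    "letter_names_cpm", "familiar_words_cpm", "pseudo_words_cpm",
    "orf_cwpm", "reading_comp", "dictation",
]

def two_factor_solution(df: pd.DataFrame) -> pd.DataFrame:
    # Principal (axis) factoring with an oblique rotation, as in the SPSS analysis.
    fa = FactorAnalyzer(n_factors=2, method="principal", rotation="oblimin")
    fa.fit(df[SUBTASKS])
    # loadings_ holds the rotated (pattern) loadings; eigenvalues for a scree
    # plot are available from fa.get_eigenvalues().
    loadings = pd.DataFrame(fa.loadings_, index=SUBTASKS,
                            columns=["factor_1", "factor_2"])
    loadings["salient_on"] = loadings.abs().idxmax(axis=1).where(
        loadings.abs().max(axis=1) > 0.400, other="neither")
    return loadings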

Table 2 Kyrgyz language factor analysis results

Factor analysis results: interpreting data structure

The results indicated two primary underlying constructs, language comprehension and decoding, for both grades and all three languages.Footnote 6 Below we present the results for each language, with Grades 2 and 4 presented together. The two-factor structure can be seen in the clustering of subtasks with coefficient values above 0.400. We also analyzed the data by gender and found the same two underlying constructs for boys and girls. However, for the decoding subskills, loadings were slightly higher for girls than for boys; for example, on oral reading fluency and familiar word reading, loadings were around 0.9 for girls and around 0.7 for boys.

Kyrgyz language results

The results for the Kyrgyz group are presented in Table 2. The results indicate a two-factor solution with evidence of decoding (factor 1) and language comprehension (factor 2) as the two primary constructs. The clustering and high loadings for both grades on decoding for the Pseudo Word Recognition, Familiar Word Recognition, and Oral Reading Fluency subtasks are noteworthy. There is a cluster of loadings for Listening Comprehension, Reading Comprehension, and Oral Vocabulary on language comprehension (factor 2). Unlike the Russian group, the Letter Name Recognition subtask loads on factor 1 in the Kyrgyz language (0.618), while Initial Letter Sounds loads on neither factor.

Interestingly, for the Kyrgyz language group, Dictation loads on different factors depending on grade level. At Grade 2 it loads on decoding (though at only 0.466), while at Grade 4 it loads on language comprehension (0.508). The low loadings, and the inconsistency as to which factor it loads on, plausibly indicate that the subtask is somewhat “multi-dimensional”: it requires language comprehension (a sentence is read aloud to be written down) as well as accurate transcription onto the test form (encoding), with spelling, grammar, neatness, and punctuation skills all necessary. In other words, the subtask may demand a broader range of skills than the other subtasks.

Russian language results

As can be seen in Table 3, for both Grades 2 and 4 the Oral Reading Fluency, Familiar Word Recognition, and Pseudo Word Recognition subtasks all cluster together with coefficients above 0.400, with relatively high loadings. While the order is slightly different for the two grades, these three subtasks are clearly tapping into the same basic construct, as distinct from the remaining subtasks with coefficients under 0.400 on this factor (construct).Footnote 7 Per the CFRA, which theorizes that the underpinning cognitive demands of all these tasks are automatic sound-symbol mapping processes, we denote factor 1 as the construct of decoding. For the Russian language group (Grade 2 only), Letter Name Recognition did not load above 0.400 on either factor.

Table 3 Russian language factor analysis results

The second clustering evident in both grades includes Oral Vocabulary, Reading Comprehension, Listening Comprehension, Initial Letter Sound (Grade 2 only), and Dictation. The loadings are high for the first three subtasks. Based on the CFRA, which posits that the underpinning cognitive demand of all these tasks is the processing of oral language and phonological information (from simple phonological unit manipulation to more complex integration of various kinds of semantic information), we denote factor 2 as the construct of language comprehension. For the Russian language group in both grades, the results from the factor analyses are relatively straightforward to interpret, as the loadings are high and the distinctions between the clusters of subtasks are clear. Because only approximately 25% of all students studying in the Russian language tracks indicated that Russian was their home language, we carried out additional analyses to see whether the results differed between those studying in their native language and those studying in a non-native language. The data structure in the two groups was the same.Footnote 8

Tajik language results

The results for the Tajik language group are consistent with the Kyrgyz and Russian results and the CFRA: A two-factor solution with evidence of decoding (factor 1) and language comprehension (factor 2) as the two primary constructs. There is evidence of clustering and high loadings for both grades on decoding for the Pseudo Word Recognition, Familiar Word Recognition, and Oral Reading Fluency subtasks. Also, consistently, the loadings for Listening Comprehension, Reading Comprehension, and Oral Vocabulary on language comprehension (factor 2) are clustered, though the loading at Grade 4 for Oral Vocabulary is not high (0.454). As with the Kyrgyz group, Letter Name Recognition loaded on factor 1 (decoding) and Initial Letter Sound did not load on either factor. Dictation, however, loaded on decoding and only at Grade 2 (Table 4).

Table 4 Tajik language factor analysis results

Discussion

This study aimed to determine what underlying cognitive and linguistic factors were inherent in the sub-tasks measured by the EGRA in Kyrgyz, Russian, and Tajik, and whether there were cross-linguistic variations in these factors stemming from differences in the oral languages or their corresponding mapping principles. The results indicated strong support for the CFRA in the three analyzed languages, which come from three different language families (Slavic, Altaic, Persian). Despite significant differences in phonological and syntactic patterns across the three languages, such as the degree of agglutination and vowel harmony, noun declensions, and case endings, all three share the Cyrillic alphabet (with a few letters that differ across languages). The study showed that when the same underlying grapheme-to-phonology mapping principles are employed (Perfetti, 2003; Verhoeven & Perfetti, 2017), the same underlying factors emerge in the sub-tasks measured by these assessments. In other words, in Russian, Kyrgyz, and Tajik, the 7–9 EGRA sub-tasks were consistently measuring only two latent variables: decoding and oral language comprehension.

Specifically, we asked whether oral vocabulary, listening comprehension, and phonological skills would load on a single oral language construct; whether pseudo-word recognition, real word recognition, and oral reading fluency would load on a single decoding construct; and whether reading comprehension would load on the former or the latter. As expected, based on the CFRA, which posits that two pillars undergird the development of reading, our findings revealed a clear two-factor model in all three languages at both grade levels. Word recognition and oral reading fluency skills are likely measuring the same underlying decoding aspects of literacy development, whereas listening comprehension, oral vocabulary, and reading comprehension all loaded on the underlying construct of language comprehension. This suggests that reading fluency is more likely to be measuring script processing skills, while reading comprehension measures are likely tapping into language comprehension skills as well; as such, the two are not proxies for each other in these three languages.

These results have practical utility for teachers and practitioners for several reasons. First, understanding the need to develop skills in both areas enables teachers to focus adequately on preparation of relevant tasks and activities. An understanding of the differences between the two constructs and knowledge about what kind of reading tasks and activities develop the requisite skills can enable teachers to avoid duplication of efforts and wasting resources. For example, if a teacher has limited time for activities that promote literacy, it would make sense to use that “time on task” to address both constructs in the CFRA. These results could also inform teacher training models by clarifying redundancies in certain tasks, and streamlining limited teaching time on tasks that make unique contributions to literacy acquisition. Clarity about what the subtasks are assessing also enables better diagnostic analyses of obstacles to learning to read.

These results are also of interest to assessment developers and implementers. For example, confidence that the subtasks are essentially tapping into the same latent constructs can lead to efficiency gains if fewer subtasks can be used in future assessments. Knowing that subtasks cluster together also means it is possible to create composite scores for constructs such as decoding, and possibly to employ this knowledge in test scoring or equating methods. For example, certainty that the Familiar Word Recognition, Oral Reading Fluency, and Pseudo Word Recognition subtasks are all assessing decoding skills might enable composite test scoring in some instances, where scores from each of the subsections are standardized and combined to create robust indicators. Such indicators could be used as dependent variables in regression modeling in impact evaluations and are more statistically powerful than individual subtasks with fewer items. This could also lead to a better understanding of which subtasks are redundant and which can be combined into composites in evaluations of basic education programs in LMICs, especially in the Central Asian context. Finally, the results empirically demonstrate that fluency is not a “proxy for comprehension” in at least these cases, and fluency alone should not be considered a primary indicator of “reading” in global metrics of literacy in LMICs or beyond. This helps build the evidence base for the debate introduced earlier on the use of various sub-tasks in international reading efforts (e.g., Abadzi & Centanni, 2020; Bartlett, Dowd, & Jonason, 2015; Dowd, Bartlett, Khamis-Dakwar, & Froud, 2020).
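
A minimal sketch of the composite-scoring idea, assuming hypothetical column names: standardize each decoding subtask and average the z-scores into a single decoding indicator.

import pandas as pd

DECODING_SUBTASKS = ["familiar_words_cpm", "pseudo_words_cpm", "orf_cwpm"]

def decoding_composite(df: pd.DataFrame) -> pd.Series:
    # z-score each subtask, then average across subtasks for each pupil.
    z = (df[DECODING_SUBTASKS] - df[DECODING_SUBTASKS].mean()) / df[DECODING_SUBTASKS].std()
    return z.mean(axis=1).rename("decoding_composite")

# df["decoding"] = decoding_composite(df)  # e.g., as an outcome measure in an impact evaluation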

Sample size was not a limitation of this study; nevertheless, a couple of limitations warrant discussion. First, there was no measure of non-verbal intelligence that could act as a covariate. Second, there was a limited number of items on the reading and listening comprehension sub-tasks. This is a common concern with EGRAs worldwide, as they are usually designed for students with very low reading ability and high rates of so-called “zero scores” (the inability to decode even a single word). We have reported our reliability measurement procedures to mitigate this possible limitation.

In conclusion, this paper addresses the need for a deeper understanding of the internal structure of the early grade reading assessments being used in LMICs globally. By focusing our investigation on Kyrgyzstan and Tajikistan, we go beyond the “anglocentricities” (Share, 2008) and monolingualism inherent in the theoretical base that informs the development of many EGRAs. This is important not only for programmatic decision making and the identification of impact evaluation outcome measures, but also for the development and definition of literacy metrics, such as those used in the measurement of the Sustainable Development Goals and other international development benchmarks.