The Wechsler Individual Achievement Test, 3rd Edition (WIAT-III; Wechsler 2009) is a popular, individually administered, standardized measure of academic achievement designed to assess the reading, writing, math, and oral language skills of individuals between the ages of 4 and 50 (Burns 2010). Despite its publication almost a decade ago, little structural psychometric research has investigated the WIAT-III beyond the inclusion of its subtests as outcome variables explained by cognitive skills (e.g., Beaujean et al. 2014; Caemmerer et al. 2018). Indeed, detailed investigations of achievement batteries have lagged considerably behind analyses of instruments that measure general cognitive ability (Dombrowski 2015). This research gap could be due to clinicians’ tendencies to focus on latent factors when considering general cognitive functioning but on observed variables when evaluating academic achievement. To this point, initial WIAT-III construct validation efforts focused on correlations within WIAT-III achievement areas and with other published batteries, while less attention was devoted to the identification of latent factors within the battery (Breaux 2010). In addition, standardized test development and interpretation are often criticized as disconnected from psychological theory (Beaujean and Benson 2018). There are well-researched theories of academic skill development that can inform both the development and interpretation of academic batteries. An example of such a theory is “the simple view of writing” (Berninger et al. 2002). Our purpose is to describe the simple view of writing and its implications for the construct validity of psychoeducational assessment, and to apply those implications to evaluate aspects of WIAT-III test validity.

The Simple View of Writing

Initially, the simple view of writing conceptualized composition as the outcome of two skill areas: spelling and idea generation (Juel 1988). Relying on the Hayes and Flower (1980) model of adult writing, researchers expanded and clarified the simple view to describe the developmental trajectory of writing skill (Berninger 2009; Berninger and Amtmann 2003; Berninger and Swanson 1994; Hayes and Berninger 2009). The model includes three broad skill areas (transcription, text generation, and self-regulation), collectively coordinated within working memory (Berninger and Winn 2006; Kim et al. 2015a). The effects of transcription, text generation, and self-regulation skills on students’ composition are central to the model.

Transcription represents the skills necessary to convert language (either heard from an external auditory source or generated internally in one’s mind) into print: specifically, spelling and handwriting (Hayes and Berninger 2009). The effects of both handwriting and spelling intervention on writing have been investigated in comprehensive meta-analyses (Graham and Santangelo 2014; Santangelo and Graham 2016). Graham and Santangelo’s handwriting analyses aggregated intervention effect sizes (ES) for students in kindergarten through ninth grade, documenting improvement in composition quality (ES = .84), length (ES = 1.33), and fluency (ES = .48). Handwriting was effectively supported via individualized instruction (ES = .69) and technology (such as using a tablet to copy letters; ES = .85). Their spelling analyses demonstrated that explicit instruction transferred to correct spelling within composition (ES = .94), though spelling instruction did not appear to increase general writing performance (ES = .19). An increase in general writing may also require explicit instruction pertaining to composition (Berninger et al. 2002).

Text generation represents a writer’s skill in constructing ideas and translating those ideas into words, sentences, or paragraphs, and includes topic and genre knowledge (Berninger et al. 2002). Text generation is often operationalized as oral language skill and represents students’ collective knowledge of vocabulary, grammar, morphology, syntax, and language fluency (Kim et al. 2015b; Kim and Schatschneider 2017; McCutchen 2011). Researchers have demonstrated that language competence relates to effective writing across school grades (Berninger and Abbott 2010; Dockrell and Connelly 2009; Kim et al. 2011, 2015a, 2015b). Research investigating language intervention effects on writing appears considerably less developed than research on spelling and handwriting intervention (Shanahan 2006). Comparisons of students with specific language impairments to students with decoding impairments and typically developing peers suggest that oral language challenges may lead to shorter compositions and higher rates of spelling and grammatical errors (Connelly et al. 2012; Dockrell et al. 2009; Puranik et al. 2007). Some investigators suggest that writing deficits can persist even after the remediation of language difficulties (Dockrell et al. 2009; Nauclér and Magnusson 2002).

Self-regulation skills such as goal-setting, self-assessment, and writing strategy use can support text planning, generation, and revision efforts (Santangelo et al. 2008). Indeed, explicit intervention to support self-regulation can increase the quality of writing, particularly when added to writing strategy instruction (ES = .59; Graham et al. 2012). Ritchey et al. (2016) stressed that the types of supports necessary to scaffold students’ self-regulation when writing might vary based on the type of writing task. For instance, when spelling, the writer might require help sounding out a word or reviewing their spelling choices. Alternatively, when composing, writers might require support with graphic organizers or other prompts for topics/subtopics.

It is important to stress that these skills are not necessarily independent of each other (Hayes and Berninger 2009). Transcription skills may constrain text generation (Graham et al. 1997; Hayes and Berninger 2009). For instance, when young writers dictate their thoughts, removing the need for transcription skills, they produce stronger text (De La Paz and Graham 1995). When fluent writers transcribe in a novel, unpracticed way, their sentences become shorter and less sophisticated (Grabowski 2010). Berninger (1999) suggested that handwriting and spelling skills place more cognitive load on working memory in developing writers, limiting students’ ability to translate ideas into language.

The concept of levels of language is also central to the simple view of writing. Because language is not a single construct, listening comprehension, oral expression, reading, and writing reflect both interconnected and independent developmental systems (Berninger 2000). Berninger and Abbott (2010) demonstrated that these four language systems contain both shared and unique variance longitudinally and stressed that individual strengths and weaknesses in these systems can be common (even in “typically developing” youth) and relatively stable over time. These language systems can be compared and contrasted not only expressively/receptively but also at the subword, word, sentence, and text/discourse levels of language. These levels are partially hierarchical: students can engage in higher levels of language without proficiency in lower levels, even though the units of higher levels are composed of lower levels of language (Abbott et al. 2010; Berninger et al. 1988, 1994). As a case in point, Berninger et al. (1994) reported non-significant relationships between measures of spelling, sentence construction, and paragraph construction completed by intermediate grade students. The implication is that students may demonstrate intraindividual differences in performance across various levels of language, and performance at one level should only partially explain performance at another. The WIAT-III technical manual authors stressed this aspect of writing development and provided subtest correlations but did not investigate latent factors or theory-based structural relationships (Breaux 2010).

Implications for Validity of a Writing Battery

The WIAT-III includes subtests assessing writing performance at multiple language levels (Breaux 2010). Alphabet writing fluency provides a measure of fluent letter recall and legibility, reflecting writing at the subword level. Breaux and Lichtenberger (2016) stressed that it is not a handwriting task, though others have categorized it as such (Drefs et al. 2013); it is certainly a measure of transcription skills. The measure can be administered to students up to grade 3. Spelling requires examinees to spell single words from dictation (though the earliest items include single letters) and reflects transcriptive writing at the word level for most examinees. It requires knowledge of letter/sound correspondence, prefixes, suffixes, and homophones; thus, effective performance also taps subword language skills, semantics, and morphology. The sentence composition task comprises two components. Sentence combining requires examinees to convert short sentences into one sentence of greater complexity. Sentence building requires examinees to generate a sentence based on a target word. Collectively, performance on these sentence composition tasks is most likely influenced by an awareness of semantics, syntax, grammar, capitalization and punctuation, and spelling. At the text level, essay composition requires examinees to write an essay within 10 min. It provides scores representing theme and organization, word count, an aggregation of theme and word count, and a score reflecting grammar and mechanics.

As some researchers suggest that oral language can represent text generation skills (Kim and Schatschneider 2017), we included the WIAT-III oral language measures in these analyses. Within the battery, receptive language includes two components: one requiring the recognition of the picture that depicts a given word (receptive vocabulary) and the other requiring answers to questions about brief audio passages (oral discourse comprehension). Expressive language tasks include an expressive vocabulary measure, which requires examinees to say the word corresponding to a picture and its spoken definition; a word fluency task, requiring examinees to quickly name category exemplars; and a sentence repetition task, in which examinees restate sentences read aloud by the examiner.

The simple view of writing provides a strong theoretical basis for test developers to evaluate aspects of construct validity and for clinicians both to describe the writing performance of struggling students and to develop intervention strategies. As most achievement batteries include tests operationalizing the model’s core constructs, the simple view of writing yields predictions that can be empirically evaluated via the relationships between subtests. First, the battery’s oral language tasks should demonstrate effects on each written level of language (e.g., spelling, sentence construction, and composition). Second, as transcription skills may constrain students’ text generation, measures of transcription should mediate the effects of language on higher levels of written language. Third, as language levels are only partially hierarchical, the effects of one language level on another should be only low to moderate, and lower levels of language should demonstrate both direct and indirect effects on higher language levels.

Our purpose is to evaluate the extent to which the WIAT-III tasks operationalize these hypotheses. We used the battery’s normative sample and an independent sample of students referred for special education, formally testing invariance between the two samples. Invariance testing can indicate whether the language/writing constructs established in the normative sample represent the same skills or abilities in students referred for testing due to academic difficulties (Wicherts 2016). Results should add to the construct validity evidence for the WIAT-III.

Method

Participants

WIAT-III Standardization Sample

The WIAT-III standardization sample was stratified to approximate the 2005 US population, as reflected by the U.S. Bureau of the Census, on the basis of grade, age, sex, race/ethnicity, parent education level, and geographic region (Breaux 2010). Because students complete different writing measures depending on their grade level, we analyzed the performance of students in grades 1–3 (n = 668) in one model and students in grades 3–12 (n = 2226) in a second model. These grade ranges conform to differences in the WIAT-III subtest administration procedures.

Referral Sample

We gathered a sample of students from a suburban school district in the Pacific Northwest who completed all WIAT-III oral language and writing subtests as part of special education evaluations. Other portions of this sample were reported by Parkin (2018). All participants completed the battery during the same school year. Specifically, 143 students completed the subtests for grades 1–3, and 345 students completed the subtests for grades 3–12. Most third grade students were included in both grade groups, though some were excluded from one group or the other because they did not complete either the alphabet writing fluency task or the essay composition task. The grades 1–3 group included two third grade students not in the grades 3–12 group; the grades 3–12 group included four third grade students not in the grades 1–3 group. Socioeconomic status data were not available for the referral sample; according to district information, approximately 12% of students who attend the school district qualified for free or reduced-price lunch. We compare available demographic data for these samples in Table 1.

Table 1 Demographic characteristics of WIAT-III normative and referral samples

Measures

We used oral language and writing measures from the WIAT-III to operationalize text generation, transcription, and composition skills. We provide a description of these tasks and their reliability coefficients in Table 2.

Table 2 WIAT-III subtest-component descriptions

Procedure

We obtained permission from the publisher to analyze the WIAT-III standardization sample. We gathered archived WIAT-III test data on referred students attending a large school district in the Pacific Northwest. Forty-seven certified special education teachers trained in WIAT-III administration gathered data from students as part of the special education eligibility evaluation process during the 2015–2016 school year. Teachers administered between 1 and 35 assessments each, with a mean of 10.1. The order of subtest administration is not known, nor is the specific number of sessions for each student. The battery was administered during the standard school day. We gathered demographic data from a district database and test scores from the publisher’s online scoring system.

Analyses

We conducted all analyses with Mplus Version 8 (Muthén and Muthén 1998-2017). Within each age group, our analyses proceeded in three stages.

Measurement Invariance

We evaluated configural, metric, and scalar invariance across the normative and referral samples. Because the WIAT-III provides single measures of subword, word/spelling, and discourse/text skills, we modeled each of these factors with a single indicator. To ensure model identification, we constrained the disturbance variances of these factors to 0.
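
To make this specification concrete, the following Mplus sketch illustrates the grades 1–3 measurement model and the three invariance models. The data file and variable names are hypothetical placeholders; subtest-to-factor assignments follow the legend of Fig. 1.

TITLE:  WIAT-III grades 1-3 measurement invariance (illustrative sketch);
DATA:   FILE = wiat3_g13.dat;                ! hypothetical file name
VARIABLE:
  NAMES = sample rv odc ev owf sr awf sp sb sc;
  USEVARIABLES = rv odc ev owf sr awf sp sb sc;
  GROUPING = sample (1 = norm  2 = referral);
ANALYSIS:
  MODEL = CONFIGURAL METRIC SCALAR;          ! fits all three invariance models in one run
MODEL:
  lang  BY rv odc ev owf sr;                 ! oral language (text generation)
  sent  BY sb sc;                            ! sentence-level writing
  alpha BY awf;                              ! single-indicator subword factor
  awf@0;                                     ! disturbance variance fixed to 0
  spell BY sp;                               ! single-indicator word-level factor
  sp@0;
OUTPUT: STANDARDIZED;

The grades 3–12 model follows the same pattern, replacing the subword factor with a single-indicator essay factor (write BY ec; ec@0;).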

Structural Equation Modeling

Next, for each age group, we modeled the effects of the oral language latent factor on each level-of-language factor, as well as the effects of lower language levels on higher levels. Figure 1 depicts the model for grades 1–3, and Fig. 2 depicts the model for grades 3–12.
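
As an illustrative sketch of the structural portion for grades 1–3, layered on the measurement syntax above, the paths below mirror our reading of Fig. 1. Treating the language–alphabet relationship as a covariance rather than a directed path is our assumption, consistent with the five structural coefficients compared across samples later.

MODEL:
  ! measurement statements as in the invariance sketch, then:
  alpha WITH lang;             ! language-subword association
  spell ON lang alpha;         ! word level regressed on language and subword skill
  sent  ON lang alpha spell;   ! sentence level regressed on language and lower levels

In the grades 3–12 analog, the subword paths are replaced by paths to the essay factor (spell ON lang; sent ON lang spell; write ON lang spell sent;), yielding six structural coefficients.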

Fig. 1
figure 1

Standardized effects comparing samples on grades 1–3 WIAT-III language and writing measures. Coefficients represent the normative/referral samples. Asterisks indicate statistically non-significant coefficients. Disturbance terms are omitted for readability. RV receptive vocabulary, ODC oral discourse comprehension, EV expressive vocabulary, OWF oral word fluency, SR sentence repetition, AWF alphabet writing fluency, SP spelling, SB sentence building, SC sentence combining, Lang language, Sent sentence writing, Alpha alphabet writing, Spell spelling

Fig. 2
figure 2

Standardized effects comparing samples on grades 3–12 WIAT-III language and writing measures. Coefficients represent the normative/referral samples. Asterisks indicate statistically non-significant coefficients. Disturbance terms are omitted for readability. RV receptive vocabulary, ODC oral discourse comprehension, EV expressive vocabulary, OWF oral word fluency, SR sentence repetition, EC essay composition, SP spelling, SB sentence building, SC sentence combining, Lang language, Sent sentence writing, Spell spelling, Write essay writing

Consistency of Effects Across Samples

We used Wald tests to evaluate the equivalence of structural effects across samples: five coefficients in the grades 1–3 model and six coefficients in the grades 3–12 model.
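
A sketch of this comparison in Mplus syntax follows, using the grades 1–3 paths and hypothetical parameter labels; MODEL TEST produces the joint Wald chi-square for all listed constraints.

MODEL norm:                    ! structural paths labeled in the normative group
  spell ON lang (n1);
  spell ON alpha (n2);
  sent  ON lang (n3);
  sent  ON alpha (n4);
  sent  ON spell (n5);
MODEL referral:                ! the same paths labeled in the referral group
  spell ON lang (r1);
  spell ON alpha (r2);
  sent  ON lang (r3);
  sent  ON alpha (r4);
  sent  ON spell (r5);
MODEL TEST:                    ! joint Wald test of cross-group equality (df = 5)
  0 = n1 - r1;
  0 = n2 - r2;
  0 = n3 - r3;
  0 = n4 - r4;
  0 = n5 - r5;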

Model Fit

For model fit, we relied on multiple statistics (Keith 2015). The comparative fit index (CFI) and the Tucker–Lewis index (TLI) indicate strong fit when values reach .95 or higher; root mean square error of approximation (RMSEA) values below .06 and standardized root mean square residual (SRMR) values below .08 reflect appropriate fit (Hu and Bentler 1999). Changes in model fit statistics are recommended for testing measurement invariance (Chen 2007; Cheung and Rensvold 2002). Under these recommendations, the less constrained model should be selected when the CFI decreases by at least .01, supplemented by an increase in the RMSEA of at least .015, whether testing factor loading invariance or intercept/residual invariance. When evaluating model effects, Keith (2015) suggests that effects between .05 and .10 are small, effects between .10 and .25 are moderate, and effects larger than .25 are large.
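
Stated compactly, with each difference computed as the more constrained model minus the less constrained model (matching the deltas reported in the Results), the decision rule we applied is:

\[
\Delta \mathrm{CFI} = \mathrm{CFI}_{\text{constrained}} - \mathrm{CFI}_{\text{less constrained}}, \qquad
\Delta \mathrm{RMSEA} = \mathrm{RMSEA}_{\text{constrained}} - \mathrm{RMSEA}_{\text{less constrained}}
\]
\[
\text{select the less constrained model if } \Delta \mathrm{CFI} \le -.01 \text{ and } \Delta \mathrm{RMSEA} \ge .015.
\]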

Results

Descriptive Statistics

We report descriptive statistics of manifest variables in Table 3. As would be expected, participants in the referral sample demonstrated lower performance on almost all variables. The referral sample also displayed greater variability in scores.

Table 3 Descriptive statistics across WIAT-III normative and referral samples

Measurement Invariance

Measurement invariance results for the grades 1–3 group appear in Table 4. Using sample type (normative vs. referral) as the grouping variable, the configural, metric, and scalar invariance models fit the data well. The ΔCFI from the configural to the metric invariance model was − .001, and the ΔCFI from the metric to the scalar invariance model was − .023. The ΔRMSEA from the configural to the metric invariance model was 0, and the ΔRMSEA from the metric to the scalar invariance model was .019. These results suggest that, among the configural, metric, and scalar invariance models, the metric invariance model should be selected. Further examination of the scalar invariance model indicated that two intercept constraints, for the sentence combining and oral word fluency subtest components, should be released. After releasing these two constraints, the partial scalar invariance model fit the data well (CFI = .976, TLI = .968, SRMR = .058, RMSEA = .051 with 90% CI [.037, .064]) and no worse than the metric invariance model. We retained the partial invariance model for subsequent analyses.
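
As a sketch of how this partial model can be specified (variable names as in the earlier illustrative syntax): because Mplus constrains factor loadings and intercepts to equality across groups by default in a two-group run, the partial scalar model only requires freeing the two intercepts in one group.

MODEL referral:
  [sc owf];    ! brackets denote intercepts; listing them here frees the
               ! sentence combining and oral word fluency intercepts in the
               ! referral group, yielding the partial scalar invariance model

The grades 3–12 partial model described below additionally frees the receptive vocabulary intercept ([rv]).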

Table 4 Model fit for WIAT-III measurement invariance models for grades 1–3 group
Table 5 Model fit for WIAT-III measurement invariance models for grades 3–12 group

Measurement invariance results for the grades 3–12 group appear in Table 5. The configural and metric invariance models fit the data well, and the scalar invariance model fit the data acceptably. The ΔCFI from the configural to the metric invariance model was .001, and the ΔCFI from the metric to the scalar invariance model was − .032. The ΔRMSEA from the configural to the metric invariance model was − .003, and the ΔRMSEA from the metric to the scalar invariance model was .028. These values suggest that, among the configural, metric, and scalar invariance models, the metric invariance model should be selected. Further examination of the scalar invariance model indicated that three intercept constraints, for the sentence combining, oral word fluency, and receptive vocabulary subtest components, should be released. After releasing these three constraints, the partial scalar invariance model fit the data well (CFI = .980, TLI = .974, SRMR = .043, RMSEA = .049 with 90% CI [.042, .056]) and no worse than the metric invariance model. We therefore retained the partial invariance model for subsequent analyses.

From the final partial scalar invariance models, we obtained latent mean differences between the normative and referral samples. For the grades 1–3 group, the referral group displayed statistically significantly lower language, spelling, and sentence writing factor scores than the normative sample; the two samples were comparable on the alphabet writing factor. For the grades 3–12 group, the referral group displayed statistically significantly lower language, spelling, sentence writing, and essay composition factor scores than the normative sample.

Structural Equation Models

Based on the final measurement model for each age group, we evaluated path effects from the language factor to the writing factors and between levels of language. We also investigated differences in effects across samples via Wald tests: five structural paths in the grades 1–3 sample, χ2(5) = 28.52, p < .001, and six structural paths in the grades 3–12 sample, χ2(6) = 53.77, p < .001.

For each age group, the model fit for the two-group structural regression model was the same as that for the partial scalar invariance model. Figure 1 depicts the path model for the grades 1–3 sample, illustrating the effects of oral language on levels of language and the interrelationships between writing tasks. The language factor demonstrated a moderate direct effect on sentence writing. Its effect on spelling varied by sample: moderate in the normative sample and smaller in the referral sample. The language factor also displayed a small to moderate relationship with the alphabet factor. Regarding levels of language, the subword alphabet writing factor demonstrated a small to moderate effect on the word-level spelling factor and a negligible to small effect on the sentence-level factor in the two samples. The spelling factor demonstrated a large effect on the sentence factor. Collectively, the model explained 42% of the variance in the spelling factor for the normative group but only 19% in the referral sample. For sentence writing, the model explained 78% of the variance in the normative sample and 74% in the referral sample.

Figure 2 displays path effects for the grades 3–12 sample. Sample type moderated the effect of language on spelling: in the normative sample the effect was large, while in the referral sample it was significantly smaller. Language displayed a moderate effect on sentence writing and a negligible to small effect on essay composition, again moderated by sample; the language-to-composition effect was not significant in the normative sample and small in the referral sample. In terms of levels of language, the word-level spelling factor displayed a moderate effect on the sentence writing factor and no direct effect on the essay composition factor; its effect on composition was completely mediated by the sentence factor. The sentence factor demonstrated a moderate effect on the essay composition factor. The model explained 49% of the spelling variance in the normative sample and 23% in the referral sample. Regarding sentence writing, the model explained 72% of the variance in the normative sample and 48% in the referral sample, and it explained 25% of the essay composition variance in the normative sample and 38% in the referral sample.

Discussion

The simple view of writing suggests that text generation skills, operationalized as oral language, should demonstrate effects on word, sentence, and composition skills. The simple view of writing also suggests that transcription skills (e.g., spelling/handwriting) should mediate those effects. The model conceptualizes writing tasks via hierarchical levels of language: subword, word, and sentence skills may only partially influence performance on higher levels in the hierarchy. We investigated these implications of the simple view of writing within the WIAT-III, using latent factors to establish both the effects of a language factor on written expression tasks and the effects between writing tasks in the battery. We also compared these effects across the battery’s normative sample and an independent sample of students referred for special education.

Our measurement models established that WIAT-III writing tasks can indeed be interpreted in a manner consistent with a levels of language model (Berninger et al. 1994). In the younger sample, a model specifying subword, word, and sentence writing skills provided a strong fit to the data. Similarly, in grades 3–12, a model with word, sentence, and composition level tasks fit the data well. These results are generally consistent with longitudinal analyses conducted with the second edition of the WIAT from a levels of language perspective (Abbott et al. 2010; Wechsler 2001). Importantly, the measurement model demonstrated metric and partial scalar invariance across the normative and referral samples. These findings indicate that the latent factors may be interpreted with the same meaning across samples. However, mean score differences across groups in the sentence combining and oral word fluency measures (and, in the grades 3–12 group, the receptive vocabulary measure) may be more related to those specific tasks than to the latent factors they represent.

Though the measurement model was consistent across samples, structural equation modeling indicated that the effects of latent language and writing variables varied across samples. We provide a numerical comparison of the effects of language skills and lower level writing skills on writing performance in Table 6. In both samples and across both models, language skills demonstrated direct effects on word and sentence level skills, and an association with subword performance in the younger grade model. Language effects on spelling varied by sample: in the normative sample, language and spelling appear to develop closely together, but the two are less associated in the referral sample. These results likely underscore a core phonological deficit in youth struggling to develop word-level literacy skills (Fletcher et al. 2007). Language and spelling affected sentence writing in the same way across samples and within models. Collectively, these results indicate that transcription skills represent a significant bottleneck on essay writing, perhaps increasing the difficulty of language production (Berninger 1999; Bourdin and Fayol 1994; Connelly et al. 2005). The bottleneck appeared more extreme in the normative sample because of the smaller association between spelling and language in the referral sample. These results are consistent with our hypotheses stemming from the simple view of writing’s description of the text generation/transcription relationship: lower level writing skills largely mediated language effects on higher order skills. At the same time, the larger direct effect of language on composition in the referral sample could suggest a compensatory process. Perhaps writers with stronger language skills can more easily select words and phrases they can spell, reducing the transcription bottleneck.

Table 6 Comparison of standardized effects across WIAT-III normative/referral samples

While WIAT-III writing tasks demonstrated partial independence, these measures appear more closely associated than those described by others (Berninger et al. 1994). Berninger et al. (1994) used writing tasks similar to those contained in the WIAT-III but reported non-significant relationships between them. One reason may be differences in scoring. For instance, the spelling/sentence writing relationship here may be higher than in Berninger’s investigation because some types of spelling errors are included in the scoring criteria for sentence composition (Breaux 2010). In contrast, essay composition scoring includes a word count and a rubric to evaluate organization and theme development; neither component contains criteria explicitly associated with lower language levels. This could explain why the model explained less variance in the essay composition variable than in the spelling or sentence writing variables.

Consistency with Previous Research

These results are consistent with a number of prior investigations. Kim et al. (2015b) noted significant effects of spelling on a latent writing quality variable in a small sample of Korean students. Their oral language factor, a discourse-level measure narrower than the factor used here, only approached significance, likely due to the small sample size. They did not model a mediation effect for spelling, though they noted an association between spelling and discourse-level language. Using the WIAT-II, Berninger and Abbott (2010) reported effects of listening comprehension and oral expression skills on written expression at multiple grade levels, though this study did not include a measure of spelling skills. Other researchers have investigated effects of cognitive performance on writing skills (Caemmerer et al. 2018; Cormier et al. 2016; Hajovsky et al. 2018). If crystallized intelligence (Gc) is construed as oral language skill, Hajovsky et al. (2018) demonstrated language effects on writing moderated by grade level in the Kaufman Test of Educational Achievement, Second Edition (KTEA-2; Kaufman and Kaufman 2004). In comparison, using Wechsler batteries, Caemmerer et al. (2018) noted effects of Gc on spelling, but not on essay writing, when included in analyses with other cognitive variables, and Cormier et al. (2016) described similar differences in Gc effects across basic writing and written expression. These analyses regressed single writing tasks on cognitive variables and may not account for the effects of academic tasks on each other, a key implication of the simple view of writing.

Implications for Practice and Test Interpretation

These analyses suggest partial independence between levels of written language, as assessed by the WIAT-III. Clinicians should expect variability in these skills when evaluating examinee writing performance. To describe that performance, clinicians may need to focus interpretation on specific tasks. At the same time, we stress that subtests may not provide the level of reliability necessary to make high-stakes decisions; in this battery, only spelling demonstrates a reliability level of .90 or higher (Breaux 2010).

Levels of language interpretation of writing tasks may have implications for the interpretation of a writing composite. Schneider (2013) described two ways to consider the relationships between subtests and composites. Generality reflects the idea that some abilities influence a wider range of functioning. For example, pertaining to cognitive measures, “g” would influence nearly all cognitive tasks, a broad ability would influence tasks reflective of a general class of abilities (e.g., only visual/spatial skills), and a narrow ability would influence very specific tasks (e.g., Spatial Relations). In comparison, abstractness represents the degree of complexity in a task: a more abstract task may require the coordination of multiple skills/abilities. We contend that the levels of language fit described here suggests that writing skills might effectively be conceptualized in this way. Spelling may represent the coordination of letter retrieval, phonological, orthographic, and morphological skills, as well as semantic/vocabulary knowledge (Berninger et al. 2006), while sentence writing adds aspects of grammar and syntax. Composition may require the coordination of even more skills.

Limitations and Future Directions

There are a number of limitations associated with these analyses. First, we recognize that the tested models reflect an incomplete operationalization of the simple view of writing: they omit indicators of handwriting, working memory, and self-regulation skills. Like spelling, handwriting represents a transcription skill and likely mediates the relationship between oral language and levels of written expression. However, working memory and self-regulation, skills that coordinate other skills, may be more challenging to conceptualize. Do these skills explain independent variance in writing, or do they moderate or mediate the influence of other skills? Kim and Schatschneider (2017) reported that language skills mediate working memory effects on composition. Poch and Lembke (2017) provided a latent variable analysis including self-regulation and working memory skills alongside writing, though they noted poor model fit. In an exploratory analysis, they reported that working memory and lower-order writing skills loaded together, while self-regulation, planning, handwriting, and essay composition loaded on a second factor. However, the authors noted that their results may have been impacted by a sample size that was small for the analyses they employed. It may also be necessary to expand these models beyond the variables included in the simple view of writing. A levels of language conceptualization of writing suggests that there may be unique predictors of performance at each language level. For instance, phonological skills may explain variance in spelling but not in essay writing after controlling for spelling. These models could be expanded to investigate such predictors, particularly at the essay level; language and lower-level writing skills explained only 25% of the essay writing variance in the normative sample.

Second, the referral samples used in these analyses come from only one district and may not generalize to others. Although the samples demonstrated a measurement model consistent with the normative sample, that consistency may be due to the inclusion of referred students without writing difficulties. The referral samples were heterogeneous in that they included students with numerous types of disabilities and educational needs; only approximately 50% of the students in the referral samples required specially designed instruction in writing. These results might differ if only students with writing challenges were analyzed.

Third, we evaluated the consistency of a specific model across samples; there may be other models that fit the data appropriately. Fourth, there may be developmental effects that require further investigation: Berninger (1999) highlighted that the effects of transcription skills on composition decrease across development, and Hajovsky et al. (2018) highlighted grade level moderation of Gc-to-writing effects.

Finally, it is important to stress that though we highlighted mediation effects, the data here are ultimately correlational, which limits our ability to make causal inferences. However, we have cited both intervention research and longitudinal analyses that support causal interpretations of these relationships (Abbott et al. 2010; Graham et al. 2012; Graham and Santangelo 2014; Santangelo and Graham 2016).

Conclusion

The WIAT-III appears to operationalize important aspects of the simple view of writing effectively. Its writing measures appear to reflect multiple levels of language, though it is possible that the spelling/sentence writing relationship is inflated because spelling is an explicit scoring criterion within sentence writing. Spelling and sentence writing tasks mediated the effects of oral language on composition, as predicted by the simple view of writing, though mediation occurred to a lesser degree in the referral sample. Collectively, clinicians should consider the simple view of writing in their interpretation of the battery.