Spelling is a crucial skill that students learn during the first several years of formal schooling. Although teachers endorse the importance of spelling instruction and report devoting an average of 90 classroom minutes per week to it, they also report that more than 25 % of students struggle with learning spelling (Graham et al. 2008). Spelling skill is linked to both writing and literacy outcomes (e.g., Graham and Santangelo 2014) and may have long-term effects on skilled adult reading (e.g., Perry and Ziegler 2000). In particular, students who are poor spellers are poor readers (e.g., Ehri 1987) and struggle with writing (e.g., Juel 1988). Poor spellers use simpler terms in their writing, forget ideas they want to express, and write less than students who are strong spellers (e.g., Graham et al. 2002; MacArthur et al. 1996; Okyere et al. 1997). Furthermore, Graham and Hebert (2011) found that teachers judge the quality of ideas in papers containing spelling errors more harshly than the same papers with no spelling errors. Importantly, spelling instruction improves both writing and reading skills; recent meta-analyses demonstrate that spelling instruction improves spelling accuracy during writing (Graham and Santangelo 2014) and the quality of writing (Graham et al. 2012), and that it has a positive impact on phonological awareness, word reading, and reading comprehension (Graham and Santangelo 2014).

The findings that improvements in spelling skill transfer to both writing and reading skills are consistent with the lexical quality hypothesis (see Perfetti 2007, for a review). According to this hypothesis, readers who have well-specified word representations are able to devote cognitive resources to higher level text comprehension tasks as opposed to word decoding; one way of indexing lexical quality (or well-specified word representations) is through assessing spelling skill (e.g., Andrews and Bond 2009). Similarly, others suggest that strong spelling skills allow writers to focus more on writing processes that include, for example, planning and revising (Berninger 1999; Graham 1999).

Further demonstrating the importance of good spelling skills, the impact of spelling difficulty during childhood may persist. For example, Perry and Ziegler (2000) found that skilled adult readers were slower to identify words that are difficult for first graders to learn how to spell (even when controlling for other factors commonly known to influence word identification abilities in adult readers). Furthermore, skilled adult spellers outperform those who are less skilled in different reading measures, even when reading comprehension skills are equivalent (Andrews and Bond 2009; Veldre and Andrews 2014a, b).

For all these reasons, investigating which instructional methods lead to the strongest spelling skills is important. In the current study, we examined both the efficacy of and student engagement in two common methods of direct spelling instruction. Direct instruction involves explicit training in word spelling; students engage in activities (either teacher led or independently) designed to improve spelling for assigned lists of words.

The utility of direct instruction has been downplayed by some educational researchers (e.g., Bean and Bouffler 1987; Brown 1990; Krashen 1989; Wilde 1990) who endorsed a spelling-is-caught approach. In this approach, students learn to spell incidentally during reading and writing activities. Indeed, students do learn new word spellings following reading and writing, without direct spelling instruction (for reviews, see Graham 2000; Krashen 1989). Even so, other educational researchers endorse the use of direct spelling instruction, and the majority of teachers report dedicating classroom time to direct spelling instruction activities (e.g., Graham et al. 2008). Perhaps most important, although spelling-is-caught approaches do improve spelling, Graham and Santangelo (2014) found in a meta-analysis of 23 studies that direct spelling instruction leads to more learning than do spelling-is-caught approaches.

Because direct spelling instruction outperforms spelling-is-caught approaches, we chose to investigate two common methods of direct instruction used in the classroom—rainbow writing and retrieval practice. Rainbow writing is a relatively new method that involves repeatedly copying spelling words in different colors, creating a rainbow effect. It is related to another commonly used and older technique, copying, in which spelling words are written without changing colors (Cronnell and Humes 1980; McNeill and Kirk 2014). Retrieval practice involves taking practice quizzes and then checking produced spellings against correct spellings.

The selection of these two methods was not arbitrary. We selected retrieval practice because experimental studies conducted in the laboratory, involving undergraduate participants for the most part, provide strong evidence that it is an effective way to promote learning in other domains (for reviews of the retrieval practice literature, also known as the testing effect, see Dunlosky et al. 2013; Roediger and Butler 2011). Several hypotheses have been proposed to explain the benefits of retrieval practice, including that retrieval enhances semantic elaboration (Carpenter 2011), that it increases the likelihood learners will use better strategies to encode the correct responses (Pyc and Rawson 2010, 2012), and that it enhances memory for context that improves subsequent retrieval (Karpicke et al. 2014b). The mechanisms described in these hypotheses are not mutually exclusive, and the positive evidence for each hypothesis suggests that all proposed mechanisms may contribute to retrieval practice benefits in some circumstances. Despite evidence in support of these hypotheses and numerous demonstrations of the robust benefits of retrieval practice, only four studies have examined the influence of retrieval practice on memory in younger elementary school children (Bouwmeester and Verkoeijen 2011; Fritz et al. 2007; Gates 1917; Lipowski et al. 2014), and results have been mixed. These studies have investigated name learning, list learning, and nonsense syllable learning; none have investigated the influence of retrieval practice in an authentic classroom setting with spelling materials.

In the spelling instruction literature, several studies report a benefit following a variant of the traditional retrieval practice paradigm (Alber and Walshe 2004; Grskovic and Belfiore 1996; McGuffin et al. 1997; McNeish et al. 1992; Wirtz et al. 1996). Whereas the traditional retrieval practice paradigm involves retrieval practice followed by restudy, these studies used retrieval practice plus self-correction. That is, students engaged in retrieval practice were shown correct spellings and then rewrote any misspelled words. Because these studies always used self-correction, it is unclear whether retrieval practice or rewriting the misspelled words correctly improved spelling. Furthermore, all of these studies used very small samples (n = 5 to 6) of students with learning disabilities or identified by teachers as at-risk spellers in grades three or higher—and only descriptive statistics were reported. As such, it is unclear whether retrieval practice is an effective instructional method for typically developing, younger elementary school children when formal spelling instruction begins.

In contrast to the literature indicating that retrieval practice may be beneficial in teaching children to spell, we are unaware of any empirical investigations evaluating the effectiveness of rainbow writing. Literature searches on ERIC, PsycINFO, and Web of Science produced no results for empirical investigations of the efficacy of rainbow writing. Even without an evidence base, however, rainbow writing appears to be a popular instructional method. It is recommended for use by the Scholastic Corporation (e.g., Taylor 2011; Wagstaff 2009) and as part of a spelling instruction curriculum aimed at appealing to multiple intelligences (Shah and Thomas 2002), and it is implemented in a popular Daily Five Word Work program (Boushey and Moser 2006, 2014). Indeed, of the three authors of this paper who have children in elementary school, all have seen their children complete rainbow writing exercises in school. Remarkably, then, it appears that rainbow writing, and likely other instructional tasks, are routinely used in schools with no evidence of efficacy.

Importantly, rainbow writing may be as effective as (or even more effective than) retrieval practice. First, exposure to correctly spelled words increases the likelihood of producing a correct spelling, whereas exposure to an incorrectly spelled word increases the likelihood of producing a misspelling (e.g., Jacoby and Hollingshead 1990). Because rainbow writing virtually precludes spelling errors, it may be more effective than retrieval practice, which is likely to include spelling errors (Footnote 1). Second, rainbow writing allows students a level of freedom and choice (i.e., color selection, self-paced) not provided with retrieval practice; student choice is associated with improved motivation and learning outcomes (Grolnik and Ryan 1987; Patall et al. 2010; Ryan and Deci 2000). Third, rainbow writing modifies the commonly used copying technique (e.g., Cronnell and Humes 1980; McNeill and Kirk 2014) by introducing a multisensory component (Shah and Thomas 2002). Although copying may not be as effective as retrieval practice (Grskovic and Belfiore 1996; McGuffin et al. 1997), the multisensory component of rainbow writing may make it more engaging, and thus potentially more effective in the long run, for students.

Because empirical investigations of the efficacy of retrieval practice and rainbow writing for learning spelling are absent in the literature, we conducted three experiments, with typically developing first- and second-grade children, that directly compared the two instructional methods. In addition to assessing their effectiveness as instructional techniques, we assessed the extent to which they engaged children during practice. Based on the rationale above, one prediction is that both techniques will be equally effective but that rainbow writing will be more interesting to students and hence, they will find it more engaging and enjoyable to do. Of course, given the power of retrieval practice over repetition in the larger literature (Dunlosky et al. 2013; Roediger and Butler 2011), another plausible outcome is that retrieval practice will be more effective. Because competing predictions can be made, we evaluated these predictions by exploring the relative efficacy of these two techniques across multiple experiments.

Experiment 1

Method

Participants and Design

Fourteen second-grade students (six girls, eight boys) enrolled at an elementary school in Southern California participated in experiment 1. The sample size used in this experiment (and in the following experiments) was based upon the number of available students in participating classrooms; all available students were included in the sample. The design was a 2 (spelling practice: rainbow writing, retrieval practice) × 2 (test: 1-day retention, 5-week retention) within-participant design.

Materials and Procedure

Materials included 20 words which were selected from a list provided by the teacher (see Appendix A). These words came from materials that students were to be taught and tested on after the experiment was complete, later in the academic year. The 20 words were split into two 10-word lists; one list was practiced with rainbow writing and one with retrieval practice. Students were randomly split into two groups so that list assignment to type of spelling practice was counterbalanced.

For rainbow writing, students were provided with worksheets with the practice words printed and several crayons of varying colors. They were instructed to complete the worksheets by writing each word several times, in the colors of their choosing. The research assistant demonstrated the procedure on a white board, writing a single word several times in various colors before students began. They were given 10 min to complete the rainbow writing activity and they worked continuously at their own pace throughout each trial.

For retrieval practice, the research assistant explained the procedure and then read a list of 10 words, one at a time (each repeated once). Students wrote each word on worksheets provided by the research assistant. After all 10 words were presented, the research assistant wrote the correct spellings of the words on a white board for students to view. Students marked any incorrectly spelled words with an X and marked correctly spelled words with a check mark. Students then turned their worksheets over, words were erased from the white board, and the process began again, repeating until time elapsed. Retrieval practice, like rainbow writing, lasted for 10 min.

Students participated in both types of spelling practice over two consecutive days, with one 10-min block for each practice activity on each day. In addition to counterbalancing word list across practice type, the order of practice was counterbalanced across practice days and each student group.
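The counterbalancing described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name `counterbalance`, the group labels "A" and "B", the list labels, the fixed seed, and the specific day-by-group ordering are all assumptions made for the example. The source states only that list assignment and practice order were counterbalanced.

```python
import random

def counterbalance(student_ids, seed=0):
    """Randomly split students into two groups and build each group's schedule."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    groups = {"A": shuffled[:half], "B": shuffled[half:]}

    # Assignment of word list to practice type is swapped between groups.
    lists = {
        "A": {"rainbow writing": "list 1", "retrieval practice": "list 2"},
        "B": {"rainbow writing": "list 2", "retrieval practice": "list 1"},
    }

    # Practice order flips across days and across groups (assumed scheme).
    order = {
        "A": {1: ["rainbow writing", "retrieval practice"],
              2: ["retrieval practice", "rainbow writing"]},
        "B": {1: ["retrieval practice", "rainbow writing"],
              2: ["rainbow writing", "retrieval practice"]},
    }
    return groups, lists, order

# Fourteen hypothetical student identifiers, matching the sample size of experiment 1.
groups, lists, order = counterbalance(range(1, 15))
```

With a scheme like this, each word list is practiced with each method by half of the students, and each method is practiced first on one day and second on the other, so neither list difficulty nor practice order is confounded with practice type.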

Students completed two retention tests. One was given 1 day following the practice sessions; one was given 5 weeks later. For the tests, students had blank paper and pencils. All 20 practiced words were presented in random order. Similar to the retrieval practice phase, each word was read twice for students to spell.

In addition to the retention tests, students also completed a questionnaire regarding each of the practice methods immediately following the first retention test (see Appendix B). The questionnaire contained three questions in which students chose between rainbow writing and retrieval practice; they concerned which task the students would choose to do in the future, which task was more fun, and which task helped them learn more. The questionnaire also contained four questions in which students used a 5-point Likert scale for responding; two questions were about rainbow writing and two were about retrieval practice. Students rated how much they liked and learned from each method. In all experiments, if students had a question about the questionnaire, a research assistant answered them to ensure they understood the task.

Results

To focus on the statistical analyses needed to answer our specific research questions, we report only the planned comparisons below. However, outcomes of the repeated measures analyses of variance (ANOVAs) for all experiments are reported in Table 1.

Table 1 Outcomes of omnibus analyses of variance for experiments 1–3

Spelling Accuracy

As illustrated by Fig. 1, retrieval practice produced more learning than rainbow writing, both on the test 1 day following the practice sessions, t(13) = 2.31, p = .038, d = .60, and on the retention test 5 weeks later, t(13) = 2.75, p = .017, d = .48.
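For readers less familiar with the planned comparisons reported here, the following sketch shows how a paired-samples t statistic of this kind is computed. The scores are hypothetical values invented for illustration (five students, number of words spelled correctly out of 10); they are not data from the study. The effect size d is not computed, because the source does not state which formula for d was used.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]       # within-student differences
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return mean(diffs) / standard_error, n - 1

# Hypothetical scores for five students (NOT data from the study).
retrieval = [8, 9, 7, 10, 6]
rainbow = [7, 7, 6, 9, 6]

t_stat, df = paired_t(retrieval, rainbow)
print(f"t({df}) = {t_stat:.2f}")
```

Because each student contributes a score in both conditions, the test is run on the within-student differences, which is why the degrees of freedom equal the number of students minus one (13 for the 14 students in experiment 1).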

Fig. 1 Spelling accuracy for students in experiment 1. Error bars are standard error

Questionnaires

Descriptive statistics for the self-report questionnaires are listed in Table 2. When asked to choose between retrieval practice and rainbow writing, a larger percentage of students chose retrieval practice over rainbow writing; they preferred it as a future instructional method, endorsed learning more from it, and liked it more. However, note that the sample sizes in this experiment as well as in experiments 2 and 3 were relatively small, so none alone provided enough power to reveal a significant effect using this relatively insensitive measure based on binary (yes/no) responses; thus, we combined data for the first three questionnaire questions across experiments and report the inferential statistics in the “General Discussion” section. When asked to report the degree to which they liked and learned from each instructional method on its own (i.e., not as a comparison between methods), students rated retrieval practice higher than rainbow writing for both liking, t(13) = 2.11, p = .055, d = .98, and learning, t(13) = 3.31, p = .006, d = 1.14.

Table 2 Descriptive statistics for the self-report questionnaires for experiments 1–3

Experiment 2

The purpose of experiment 2 was to replicate and extend results from experiment 1 with a new sample of second graders from a different geographic region. Again, we were interested in comparing the efficacy of retrieval practice and rainbow writing. In this experiment, we included a pre-test to assess the amount of learning that occurred during training.

Method

Participants and Design

Sixteen second graders (eight girls, eight boys) enrolled in an elementary school in Northeast Ohio participated in experiment 2. The design was a 2 (test: pre-training, post-training) × 2 (spelling practice: rainbow writing, retrieval practice) within-participant design.

Materials and Procedure

As in experiment 1, spelling materials were selected from a list provided by the teacher (see Appendix A); items were selected from a list of words used by the school district that were expected to be learned during the next academic year. The questionnaire was identical to that used in experiment 1. The procedure was identical to that of experiment 1, with the following three exceptions. First, to assess baseline performance before spelling practice began, students were administered a pre-test of all 20 spelling words in randomized order. Second, there was no 5-week retention test; the only post-training test was administered 1 day following the practice sessions. Third, the order in which items were presented during each day’s practice phase was randomized to reduce the possibility that students had more practice with some words than others. Because rainbow writing was self-paced, it is possible that the difference in accuracy observed between rainbow writing and retrieval practice in experiment 1 was due to students only focusing on the first several words of the list and not practicing the end items (Footnote 2).

Results

Spelling Accuracy

Comparisons of pre-training and post-training test performance indicate that both rainbow writing and retrieval practice produced learning, t(15) = 3.74, p = .002, d = .60 and t(15) = 6.32, p < .001, d = 1.13, respectively. Furthermore, as illustrated in Fig. 2, the benefits of retrieval practice were larger than the benefits of rainbow writing, t(15) = 2.78, p = .014, d = .47.

Fig. 2 Spelling accuracy for students in experiment 2. Error bars are standard error

Retrieval Practice

We also retained students’ retrieval practice data to examine spelling accuracy during training (Footnote 3). On each training day, students completed two retrieval practice attempts. On both days, accuracy improved from the first to second retrieval attempt (see Table 3), t(13) = 3.51, p = .004, d = .88 (day 1) and t(13) = 2.35, p = .035, d = .49 (day 2).

Table 3 Spelling accuracy during retrieval practice

Self-Report Questionnaires

Similar to experiment 1, students rated retrieval practice as high as or higher than rainbow writing (see Table 2). A larger percentage of students again preferred it as a future instructional method, endorsed learning more from it, and liked it more. When students were asked to report how much they liked and learned from each instructional method on its own, the differences between retrieval practice and rainbow writing were nonsignificant, although in the same direction as in experiment 1, t(15) = 1.38, p = .189, d = .59 for liking and t(15) = 1.41, p = .178, d = .58 for learning.

Experiment 3

In experiment 3, we sought to replicate and extend our findings with a younger sample of students for two reasons. First, baseline performance on the pre-training test was relatively high in experiment 2 (M = 62.8 %, SD = 31; see Footnote 4). As such, the 28.8 % improvement in performance following retrieval practice (compared with 15.6 % following rainbow writing) may reflect an underestimate of the advantage for retrieval practice given possible ceiling effects. In fact, not only was overall performance at 90 % after retrieval practice, but 56.3 % of the students in experiment 2 had perfect accuracy on items following retrieval practice. Thus, the nearly twofold benefit of retrieval practice over rainbow writing we observed may not accurately reflect the advantage of retrieval practice over rainbow writing. Because first grade is when formal spelling instruction typically begins, we expected baseline performance to be lower. With lower baseline performance, we may observe an even greater benefit of retrieval practice over rainbow writing. Second, only two studies have examined the effects of retrieval practice (both in non-spelling task domains) with first graders (Gates 1917; Lipowski et al. 2014) and results were mixed. Thus, it is unclear whether retrieval practice would yield the same benefits for first graders as was established for second graders in the first two experiments.

Method

Participants and Design

Twelve first graders (nine girls, three boys) from the same school in Northeast Ohio (as reported in experiment 2) participated in experiment 3. The design was identical to that of experiment 2.

Materials and Procedure

Again, spelling materials were selected from a list provided by the teacher (see Appendix A) and the questionnaire was identical to those used in experiments 1 and 2. The procedure was identical to that of experiment 2.

Results

Spelling Accuracy

Results replicated experiment 2 (see Fig. 3). Both rainbow writing and retrieval practice produced learning, t(11) = 2.30, p = .042, d = .29 and t(11) = 6.50, p < .001, d = 1.04, respectively. Furthermore, the benefits of retrieval practice (34 % gain) were again larger than the benefits of rainbow writing (9 % gain), t(11) = 5.61, p < .001, d = .69 (Footnote 5).

Fig. 3 Spelling accuracy for students in experiment 3. Error bars are standard error

Retrieval Practice

As in experiment 2, spelling accuracy increased from the first to second retrieval practice attempts on both days (see Table 3), t(11) = 3.56, p = .004, d = .41 (day 1) and t(11) = 4.10, p = .002, d = .44 (day 2).

Self-Report Questionnaires

Although an equal proportion of students chose each method when asked to choose between rainbow writing and retrieval practice as a future instructional method, a larger percentage of students endorsed both liking and learning more from retrieval practice (see Table 2). When not choosing between methods, but rating how much they liked and learned from each method, students endorsed more learning from retrieval practice than rainbow writing, t(11) = 2.87, p = .015, d = 1.23. As in experiment 2, although retrieval practice was rated numerically higher than rainbow writing, this difference in student ratings of liking was not significant, t(11) = 1.30, p = .220, d = .63.

General Discussion

In three experiments, we consistently found that retrieval practice promotes student learning more than rainbow writing does. In experiment 1, spelling accuracy was 10 % higher following retrieval practice than following rainbow writing and the benefit remained stable over a 5-week delay. Experiment 2 replicated experiment 1; spelling accuracy following retrieval practice was 9 % higher than following rainbow writing. Finally, in experiment 3, with reduced baseline performance, spelling accuracy following retrieval practice was 22 % higher than following rainbow writing. These results constitute the first empirical investigation regarding the relative efficacy of rainbow writing and retrieval practice as instructional methods for spelling. Furthermore, the benefit in learning afforded by retrieval practice did not come at the expense of children’s enjoyment; students rated retrieval practice as being as preferable as, or more preferable than, rainbow writing.

Our results also uniquely contribute to the testing effect literature. Publications about the testing effect with young elementary school children represent a small, but growing, literature (Bouwmeester and Verkoeijen 2011; Fritz et al. 2007; Gates 1917; Lipowski et al. 2014). Adding to the several demonstrations of the testing effect with older elementary and middle school children using applied, course-relevant concepts (e.g., Karpicke, Blunt, Smith and Karpicke 2014; Lipko-Speed et al. 2015; Metcalfe et al. 2007; Roediger et al. 2011), our results constitute the first demonstration of the benefits of retrieval practice for young elementary students in an authentic classroom setting. These data also provide a somewhat unusual and surprising concordance between performance and metacognitive awareness, which is not common in the adult testing effect literature (e.g., Roediger and Karpicke 2006). In experiments 1 and 3, students rated learning as higher from retrieval practice than from rainbow writing; the trend, although nonsignificant, was in the same direction in experiment 2. Furthermore, when questionnaire data from all three experiments were combined, students choosing between retrieval practice and rainbow writing more often chose retrieval practice as the superior learning method, χ² = 4.67, p = .031. In the only other study with young elementary students that assessed beliefs about learning (Lipowski et al. 2014), only third graders believed retrieval practice was superior to restudying; first graders endorsed learning more from restudying than retrieval practice, similar to findings in the adult testing effect literature. Here, though, both the first and second graders endorsed retrieval practice as the superior learning method. (Across the three experiments, students also chose retrieval practice over rainbow writing as a preferred future practice method, χ² = 3.43, p = .064, and endorsed liking it more, χ² = 13.71, p < .001.)
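The chi-square statistics for the combined forced-choice questions can be illustrated with a short sketch. It assumes a one-degree-of-freedom goodness-of-fit test against an even split between the two methods, and it assumes that all 42 students (14 + 16 + 12) answered each question. Neither assumption is stated in the text. The counts of 28 versus 14 are illustrative values chosen because they reproduce the reported statistic for the learning question under those assumptions; the actual counts are those summarized in Table 2.

```python
import math

def chi_square_even_split(count_a, count_b):
    """Goodness-of-fit chi-square (1 df) against a 50/50 split, with its p value."""
    n = count_a + count_b
    expected = n / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Illustrative counts (an assumption, not taken from the source):
# 28 students choosing retrieval practice, 14 choosing rainbow writing.
chi2, p = chi_square_even_split(28, 14)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")  # chi-square = 4.67, p = 0.031
```

Pooling the three samples in this way is what gives the binary choice measure enough power to detect a preference that no single classroom-sized sample could reveal on its own.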

Our findings have straightforward implications for improving spelling instruction; namely, retrieval practice promotes better learning than rainbow writing. However, teachers report using a variety of spelling activities (Graham et al. 2008; McNeill and Kirk 2014) and many intervention studies report using multi-component spelling instruction (e.g., Berninger et al. 2002; Graham et al. 2002; Kirk and Gillon 2009). Aside from the instructional methods we investigated, other common methods include alphabetizing, writing-saying, finding the missing letter, unscrambling the letters, word searches, and dictionary work. As such, programmatic and parametric research is needed to examine which instructional activities among the variety used promote the best learning as well as which combination(s) of activities promote the best learning. For example, retrieval practice may produce more learning if preceded by another activity like rainbow writing or unscrambling the letters, so as to ensure some successful retrieval of word spellings during practice. That is, retrieval practice followed by feedback may not be as effective when retrieval performance during practice is low (Karpicke et al. 2014a, b; Smith and Karpicke 2014). In the present case (Table 3), the students did show retrieval success during practice trials, but an activity which promotes initial learning gains prior to engaging in retrieval practice may provide the best outcomes, particularly for students struggling to learn spelling.

More generally, in terms of educational practice, these results demonstrate the importance of empirical evidence for instructional techniques. When teachers seek to maximize the efficiency and effectiveness of classroom practices, they are often faced with a wide array of choices, sometimes with little more than their intuitions to guide them. Presumably, the use of rainbow writing has become popular precisely because it is believed to be more enjoyable for students. Our results indicate that is not the case. These findings serve as an important reminder that even when teaching methods have been developed to be fun and innovative, claims that they are educationally beneficial and that children find them appealing require (causal) empirical support (see Reinhart et al. 2013).