Introduction

“Evidence-based practice” has become a familiar term in recent reform initiatives in education and psychology. Despite its popularity, there is little consensus about what the term means or what distinguishes professional activities that are aligned with evidence-based practice from those that are not. Many efforts to establish evidence-based practices in education have focused on compiling lists of interventions with sufficient research support, but a broader interpretation suggests that evidence-based practice should be the standard for all educational decisions (Detrich 2008). Thus, decisions about when a problem warrants more intense intervention, what is causing the problem, which treatment or treatments should be attempted, and how best to evaluate intervention effects should all be held to the highest standard of scientific evidence.

The purpose of this study was to demonstrate how recent methodological innovations may contribute to data-based problem solving in schools. The framework for problem solving was a sequence of educational decisions described by Tilly (2008):

1. Is there a problem and what is it?

2. Why is the problem happening?

3. What can be done about the problem?

4. Did the intervention work?

Based on a review of recent studies examining the brief experimental analysis (BEA) of reading concerns, a data-based problem solving model was assembled to address these educational decisions using procedures that promote “the application of science and the scientific method” (Tilly 2008, p. 18).

Problem Solving Decisions

Problem Identification

Problem solving begins with perhaps the most challenging decision: Is there a problem and what is it? Given the potential impact of academic and social skills on a variety of school-related tasks, it is important that child study teams target concerns that, if corrected, would lead to the most favorable widespread consequences for the child (Barnett et al. 1996). Curriculum-based measures, for example, are critical indicators of a child’s academic health (Shinn and Bamonto 1998), as these measures have been shown to model academic growth across the elementary grades and thus may be used to establish the discrepancy between the child’s performance level and the “expected” level, as indicated by peers receiving the same instruction and/or literature-based criteria.

A discrepancy in performance level alone, however, is insufficient for problem identification purposes because it assumes that every child should (or could) be expected to read or write at the same level. A more reasonable expectation is that every child should (and could) make the same amount of progress in reading or writing. Thus, problem identification must also consider the trend or slope of improvement observed under existing “baseline” conditions. A problem that warrants more intervention or a change in intervention services is defined as the presence of a discrepancy in both performance level and growth (McMaster et al. 2005).
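To make the dual-discrepancy rule concrete, the following is a minimal sketch of such a check. The cutoff ratios are illustrative placeholders only; the specific criteria vary across implementations (McMaster et al. 2005).

```python
def dual_discrepancy(level: float, peer_level: float,
                     slope: float, expected_slope: float,
                     level_ratio: float = 0.5,
                     slope_ratio: float = 1.0) -> bool:
    """Flag a problem warranting intervention only when the child is
    discrepant in BOTH performance level and growth.
    The ratio cutoffs are hypothetical, for illustration only."""
    low_level = level < level_ratio * peer_level
    low_growth = slope < slope_ratio * expected_slope
    return low_level and low_growth

# Example: reading at 49 CWPM vs. a peer mean of 101, growing at
# +0.3 CWPM per week vs. an expected +1.0 -> problem identified.
print(dual_discrepancy(49, 101, 0.3, 1.0))  # True
```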

Problem Analysis

According to Tilly (2008), problem analysis addresses why the problem is happening. In this phase, hypotheses are generated regarding variables that contribute to the discrepancy between current and expected levels of performance. These hypotheses can be generated through correlational or experimental methods. A correlational approach involves inferring a cause based on its presence or absence under existing baseline conditions. Based on a classroom observation, for example, low oral reading fluency might be attributed to insufficient motivation or insufficient time allocated to active reading. The BEA (Daly et al. 1997) is an experimental approach to problem analysis that involves directly testing causal hypotheses under controlled conditions. Using brief instructional trials, one or more conditions representing common causes of academic problems are applied to separate passages, and the relative effects are used to generate a causal hypothesis.

BEA models vary across studies, but all are characterized by a brief multi-element design that includes a series of planned intervention strategies, tested in a hierarchical order according to the duration or intrusiveness of the treatment strategy. All models have also used the relative, “within-trial” effects of conditions to identify the most effective yet least intrusive intervention. Within-trial effects represent performance immediately following intervention on the same passage, or a passage that shares many of the same words and content. Thus, problem analysis decisions are based on a summative evaluation—the extent to which specific instructional content has been mastered. Gains observed during one condition are not expected to have significant carry-over to the next condition, which allows for two or more conditions to be administered in a single session, and the entire assessment to be completed in one or two sessions. The goal of a BEA is to isolate the core instructional strategies that may provide the foundation for intervention (Jones et al. 2008).

Treatment Design

Following problem analysis, the next question is “What can be done about the problem?” In some cases, treatments may be based solely on the most effective yet least intrusive strategy identified during the BEA. As Tilly (2008) notes, however, a multi-component intervention should be considered in order to ensure the highest likelihood of success. For example, Noell et al. (2001) as well as others (e.g., VanAuken et al. 2002) have extended the BEA so that one or more experimental conditions are repeated across several sessions. The primary purpose of an extended assessment has been to confirm the results of the BEA (Malloy et al. 2007; Noell et al. 2001; VanAuken et al. 2002) or further examine the effects of particular instructional components (Daly et al. 2002; Dufrene and Warzak 2007). In this regard, the extended assessment may contribute to treatment design decisions within a problem solving framework because initial effects observed during the BEA are clarified. By repeatedly administering BEA conditions, the stability of within-trial effects on targeted or matched passages can be observed, and the least intrusive treatment package that results in criterion performance levels may be identified.

Treatment Evaluation

Once an intervention package is identified, a sustained intervention phase begins. At this point, the focus of measurement shifts from summative to formative evaluation, and the effects of intervention are measured in untreated and unmatched passages. During this phase, general outcome measurement (Hixson et al. 2008) is used to evaluate student trend or academic growth in relation to baseline and projected goals. Fuchs and Fuchs (1998) presented a treatment validity model that highlighted an experimental analysis of learning as a vital component of service delivery. In this model, formative evaluation of academic growth, defined in terms of the slope of improvement across weekly measures of oral reading fluency, is examined across baseline and treatment phases. Within each phase, progress monitoring is conducted for at least 7 weeks to allow enough time to establish both level and trend. Several recent studies have proposed similar single-participant designs and the patterns of responding necessary to evaluate the effects of sustained intervention (Barnett et al. 2004, 2007; Fuchs et al. 2002).

Evaluation of treatments derived from a BEA using general outcome measures has been rare, but preliminary findings are encouraging. Daly et al. (2005) conducted an experimental analysis to identify reading fluency interventions for two elementary-age children. Next, an intervention package was administered 3–4 times per week. Progress monitoring was conducted for several weeks prior to and following the experimental analysis by alternating administration of randomly selected passages with and without a reward component. Results indicated that both children’s reading fluency on untreated passages (i.e., no reward) increased in level and showed a positive slope across approximately 12 weeks of intervention. Gortmaker et al. (2007) conducted a BEA for three elementary-age children during a summer academic program. During a 4-week parent-implemented fluency intervention derived from this assessment, all children demonstrated an increase in performance on low-overlap passages that contained very few words included in instruction. For two of the three children, a positive slope of improvement across sessions was also observed.

Purpose of Study

Prior research has established a potential role of experimental analysis in a problem solving approach to addressing reading fluency concerns. The purpose of this study was to assemble various research exemplars into a unified model, and to demonstrate the potential contribution of each to milestone problem solving decisions for six children experiencing reading difficulties. Problem identification decisions were based on the presence of a dual discrepancy (McMaster et al. 2005) during an extended baseline phase. Problem analysis and treatment design were based on a BEA that tested the major causes of academic failure (Daly et al. 1997) and an extended assessment of intervention packages (Noell et al. 2001), respectively. Problem evaluation featured formative evaluation of weekly academic growth (Fuchs and Fuchs 1998) across a prolonged period of baseline and treatment.

Method

Participants and Setting

Six elementary-age students with reading difficulties served as participants in this study. All children attended the same school. April, Tammy, and Tim were 3rd grade students receiving Title I services. The remaining three participants included two 3rd graders (Greg, Amy) and one 4th grader (Lisa), each of whom received inclusion-based special education services in a general education setting.

Participant selection was based on the school’s existing universal screening procedures, which featured the Dynamic Indicators of Basic Early Literacy Skills (DIBELS; Good and Kaminski 2001), a web-based application of CBM and other early literacy measures. The participating school utilized the DIBELS data system during each fall, winter, and spring quarter. Beginning in the spring of first grade, all students at the school were administered three grade-specific oral reading passages at each time interval. Student data were entered online to generate automated reports that displayed, among other information, an oral reading performance score for each child (as well as classroom means) and the risk status associated with the child’s performance. The scores of the six participants in this study were among the lowest on the fall DIBELS administration in their respective grades. All individualized assessment and intervention procedures were provided by five graduate students enrolled in one of two NASP-approved school psychology programs located within the region. Foundational training was provided during preliminary meetings between university and school personnel.

Dependent Variable

Curriculum-based measurement in reading (CBM-R) was used to establish response to intervention. CBM-R was administered by asking a child to read aloud from randomly selected, generic grade-level passages and determining the number of words read correctly per minute (CWPM). CBM-R correlates well with traditional (norm-referenced, teacher judgment) measures of decoding and comprehension, and reliably differentiates special education status (Marston 1989). Further, CBM-R appears to be one of the most valid measures available for monitoring reading competence (Fuchs and Fuchs 1999). Administration and scoring of CBM-R followed closely the standardized procedures outlined in Shinn (1989), with one exception. If a child hesitated for more than 3 s on a word, the examiner told him or her to “go on” (rather than supplying the word), and the word was counted as an error. Administration was modified in this manner so that measurement strategies were completely independent of instruction.

Generic passages at the child’s current grade level were used to monitor progress. A set of 100 passages at each grade level was created from various reading series using guidelines suggested by Shinn (1989). Each passage was retyped onto a single page with fonts and spacing that closely resembled the original. Passages contained at least 150 words and did not consist of plays, poetry, or songs. In some cases, text was modified (e.g., proper nouns changed, sentences shortened) so that passages matched the designated grade-level readability, using the extended version of Fry’s (1977) Readability Graph. Fry estimates are based on the number of sentences per 100 words and the total number of syllables per 100 words. This method for controlling readability was used primarily because it is relatively simple and thus easily communicated to parents and teachers, but also because Fry estimates correspond well with more sophisticated readability approaches (Fry 1989).
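As an illustration of the readability inputs, the sketch below computes sentences and syllables per 100 words for a text sample. The vowel-group syllable counter is a rough heuristic of our own, not part of Fry’s procedure, which samples 100-word passages and reads grade level from a graph.

```python
import re

def fry_inputs(text: str) -> tuple[float, float]:
    """Compute the two inputs to Fry's Readability Graph:
    sentences per 100 words and syllables per 100 words."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = re.findall(r"[.!?]+", text)

    def syllables(word: str) -> int:
        # Rough heuristic: count runs of consecutive vowels.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    per_100 = 100.0 / len(words)
    return (len(sentences) * per_100,
            sum(syllables(w) for w in words) * per_100)

sample = "The dog ran. It ran fast. Then it slept."
print(fry_inputs(sample))  # approximately (33.3, 100.0) for 9 words
```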

Reliability

CBM-R administrations were audio-taped so that accuracy of scoring could be closely monitored. Reliability was assessed through inter-observer agreement (IOA) between the examiner and a second, independent scorer on 37% of the CBM-R administrations. Reliability sessions were distributed equally among children and the various experimental phases. Agreement was calculated by dividing the lower estimate by the higher estimate, and multiplying by 100 (House et al. 1981). Although a point-by-point (i.e., word-by-word) method is preferred for oral reading, the base rate for errors in this study was low (M = 5.53 per min), and thus the total score method may provide a reasonable estimate of reliability for this sample. Mean IOA across cases was 98% (range, 79–100%).
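For clarity, the total-score agreement calculation can be expressed in a few lines; this is a direct sketch of the formula cited above (House et al. 1981).

```python
def total_score_agreement(score_a: int, score_b: int) -> float:
    """Total-score IOA: lower estimate divided by higher estimate,
    multiplied by 100 (House et al. 1981)."""
    low, high = min(score_a, score_b), max(score_a, score_b)
    return 100.0 * low / high

# Example: examiner scores 82 CWPM, second scorer 80 CWPM -> 97.6%.
print(round(total_score_agreement(82, 80), 1))
```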

Independent Variables

The effects of four instructional interventions on CBM-R were assessed during various phases of the study: incentive, repeated reading, phrase drill (PD), and easier material. These four strategies were selected because they represent the most common components of evidence-based interventions for reading fluency and are conceptually distinct along a continuum of intervention intrusiveness or intensity (Daly et al. 1997).

Incentive

The incentive (IN) strategy involved the use of goal setting and contingent rewards to impact reading performance. Using this strategy, a fluency goal was set and the child selected a preferred prize coupon from a set of three choices: (a) an award coupon, which could be redeemed for certificates, badges, or ribbons; (b) a prize coupon, which could be redeemed for trinkets, cards, erasers, or pens; and (c) a phone coupon, which was redeemed for a positive phone call home to the child’s parent. If the child achieved the goal, the preferred coupon was provided and exchanged for backup rewards immediately following the session. If the participant did not achieve the goal, encouragement and a consolation prize (e.g., a pencil) were provided. Regardless of whether the goal was met, performance feedback was given. This strategy was conceived as the least intrusive because it did not require much instructional time, and incentives were commonly used in the school.

The incentive strategy was intended to maximize the impact of increased motivation on oral reading. Goal setting and contingent rewards are strong elements of effective academic interventions (Lentz et al. 1996; Martens and Witt 2004). Recent work has demonstrated the powerful yet idiosyncratic effects of performance feedback (Conte and Hintze 2000) and contingent rewards (Eckert et al. 2002; Noell et al. 2001) on reading performance.

Repeated Reading

The repeated reading (RR) condition involved the child reading the same story passage four consecutive times, without error correction. The child was not provided the correct word when errors or hesitations occurred; rather, the examiner encouraged the child to “go on.” The first minute of the initial and final readings was scored in order to evaluate oral reading fluency before and after instruction. Repeated reading was considered more intrusive than incentives due to the additional instructional time required.

This strategy was intended to maximize the impact of “independent practice of text” (Kuhn and Stahl 2003, p. 8). The effects of RR are well documented, although far from conclusive (Kuhn and Stahl 2003; Therrien 2004). The primary basis for omitting corrective feedback was to maintain the hierarchical separation between unassisted and assisted reading strategies; monitoring the accuracy of systematic corrective feedback would require training and a more skilled partner, thus elevating the intrusiveness of this intervention.

Phrase Drill

The PD condition included both listening passage preview (Rose 1984) and PD (O’Shea et al. 1984). As in RR, the condition began with the child reading a novel passage once while the examiner noted errors; the first minute of reading was scored for CWPM and errors. Following the initial reading, the examiner read the same passage aloud while the child followed along silently using his or her own copy of the passage. The examiner then pointed to each word the child had read incorrectly and read the word aloud to the child. Next, the child read short text phrases containing each error three times, with the examiner providing immediate corrective feedback. Finally, the child read the passage again while the examiner noted errors. The first minute of this final reading was scored in order to evaluate oral reading fluency following instruction.

This strategy was deemed more intrusive than the RR strategy because implementation required 1:1 interactions and a skilled change agent. The PD package was intended to maximize the impact of modeling, rehearsal, and corrective feedback, all strong components of academic interventions (Kuhn and Stahl 2003; Lentz et al. 1996).

Easier Material

The previous strategies were implemented in grade level materials, regardless of the child’s skill level. The easier material condition involved lowering the difficulty level of instructional materials. Specifically, the child was administered passages that had a readability level that was one grade lower than the child’s actual grade placement. The easier material strategy was intended to assess the impact of curriculum revision. A higher ratio of knowns to unknowns is a characteristic of many curriculum-based and incremental rehearsal interventions (Gickling and Thompson 1985; Shapiro 2004), although the isolated impact of curriculum revision on reading acquisition and fluency is not clear (Kuhn and Stahl 2003). This condition was considered the most intrusive strategy because it requires individualizing the curriculum and represents a change in the child’s least restrictive environment.

Design and Procedures

Assessment and intervention procedures for each child were administered according to the sequence of decisions in a problem-solving framework (Tilly 2008). A unique design and set of procedures were associated with problem identification, problem analysis, treatment design, and problem evaluation.

Problem Identification

During this phase, baseline data were collected and compared to both peer and literature-based standards to establish that a discrepancy existed in both level and trend, thus warranting further intervention (McMaster et al. 2005). For all children, baseline conditions included routine general education instruction and supplemental daily instruction in the school’s reading center. Supplemental instruction was described by the school’s reading specialist as 30 min per day of assisted reading and phonics drills provided in small groups.

During the first week of baseline, all children in the school—including the six participants—were administered three DIBELS grade level passages by school personnel as part of the building’s tri-annual (fall, winter, spring) school-wide universal screening. During subsequent weeks, an examiner met with the child twice per week and administered at least three grade level reading passages during each session, with no incentives, repetition, or instruction. A weekly CWPM score was calculated as the median score among all passages administered that week, and was used for progress monitoring during the remainder of baseline.
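The weekly score reduces to a simple median across passages; a minimal sketch, assuming six passages were collected across the two sessions in a given week:

```python
from statistics import median

def weekly_cwpm(passage_scores: list[int]) -> float:
    """Weekly CWPM: the median across all passages administered
    that week (at least three per session, two sessions per week)."""
    return median(passage_scores)

print(weekly_cwpm([48, 52, 55, 46, 50, 53]))  # 51.0
```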

Problem Analysis

Problem analysis consisted of the administration of a BEA of reading fluency (Daly et al. 1997). The four experimental conditions were applied, each time with a novel passage, and relative strength was evaluated in terms of within-trial effects on the same passage in which instruction occurred. Within-trial instructional effects were indicated by the child’s performance on a passage following application of the strategy (i.e., after repeatedly reading the passage during the RR condition). A multi-element design was utilized, during which each condition was administered once, in ascending order of intrusiveness (Daly et al. 1997). An “effective” condition was defined as one that produced a 30% increase over the most recent baseline condition, with six or fewer errors (Jones and Wickstrom 2002). After each condition was tested once, a mini-reversal was conducted to rule out the effects of measurement, practice, and history. A mini-reversal required that the effects of the least intrusive yet effective strategy be: (a) reversed upon presentation of a subsequent condition and (b) replicated when the selected strategy was reinstated. If replication was unsuccessful, an attempt was made to replicate the effects of a more intrusive yet effective strategy. The BEA was completed in two 30 min sessions.
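The effectiveness criterion amounts to a simple decision rule; a minimal sketch, assuming a score exactly at the 30% boundary counts as effective:

```python
def is_effective(cwpm: float, errors: int, baseline_cwpm: float) -> bool:
    """A BEA condition is "effective" if it yields a 30% increase over
    the most recent baseline with six or fewer errors
    (Jones and Wickstrom 2002)."""
    return cwpm >= 1.30 * baseline_cwpm and errors <= 6

# Example: 65 CWPM with 4 errors against a baseline of 48 CWPM.
print(is_effective(65, 4, 48))  # True (65 >= 62.4 and 4 <= 6)
```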

Treatment Design

Following the identification of an effective strategy, an extended assessment was employed in order to identify the minimum combination of strategies needed to reach a mastery level of 100 CWPM, with six or fewer errors (Fuchs and Deno 1982). During this phase, the selected strategy derived from the BEA was administered several times (each time representing one trial) during a session. Each trial began with a new passage, and within-trial effects on the same passage in which instruction occurred were plotted. If the strategy continued to produce effects at least 30% above the baseline and reversal conditions, the BEA results were confirmed. If the isolated effects of the strategy were confirmed, yet produced levels consistently below 100 CWPM with six or fewer errors, a less intrusive strategy was added in an attempt to elevate within-trial effects to the mastery criterion. For example, incentives were added if the RR strategy did not produce scores consistently at or above the criterion. Treatment scripts for “stacking” components (e.g., IN + RR, IN + PD, IN + RR + PD) were developed and used for each trial, and every script included one initial or “cold” reading of the new passage before treatment was applied. Thus, for each trial a measure of CWPM was obtained both before and following treatment.

In essence, the extended assessment shifted the focus from an idiographic to a criterion-referenced goal. For this reason, a less intrusive strategy was added, regardless of its idiographic effects during the BEA, so that the assessment identified the least intrusive “package” that consistently resulted in mastery levels. An increasing intensity design (Barnett et al. 2004) was used to evaluate the incremental effects of each strategy. By systematically introducing varying intensities of treatment, this design embeds one or more replications by using the stability of prior treatment phases as a “baseline” for subsequent treatment phases. The design preserves the logic of single-case designs while also allowing for the sequencing of intervention intensities to clarify service delivery questions (Barnett et al. 2007).

During the extended assessment, a weekly CWPM score was calculated as the median score among all initial or cold readings (i.e., before treatment) during a particular week. Once an intervention package that consistently produced within-trial scores above the criterion was identified, the problem evaluation phase was instituted.
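The stopping rule for the extended assessment might be sketched as follows. Operationalizing “consistently” as the median of recent trials is our assumption; the study itself relied on visual inspection of plotted within-trial effects.

```python
from statistics import median

def meets_mastery(trials: list[tuple[int, int]]) -> bool:
    """trials: (CWPM, errors) pairs from within-trial readings.
    Mastery: scores consistently at or above 100 CWPM with six or
    fewer errors (Fuchs and Deno 1982)."""
    cwpms, errs = zip(*trials)
    return median(cwpms) >= 100 and max(errs) <= 6

# If mastery is not met, a less intrusive component is stacked onto
# the package (e.g., RR -> IN + RR) and further trials are conducted.
print(meets_mastery([(104, 3), (99, 5), (107, 2)]))  # True
```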

Problem Evaluation

During the final phase of data-based problem solving, the intervention package identified during the extended assessment was administered twice per week during the remainder of the school year. The effects of the treatment package on the six children were evaluated using a multiple baseline design (Kennedy 2005) by examining the weekly CBM scores across baseline and treatment conditions.

Baseline data were collected during the problem identification, problem analysis, and initial sessions of the extended assessment phases. During problem identification, the child’s score on the fall universal screening represented his or her initial baseline data point. After obtaining child assent and parent consent for services, baseline resumed as the examiner met with each child twice per week and administered at least three passages with no incentives, repetition, or instruction. Once a relatively stable pattern of weekly CBM scores was observed, the examiner administered the BEA and the extended assessment. Baseline data collection continued during these phases, using the initial passage readings before strategies were applied. For five of the six cases, the length of baseline was staggered between 7 and 15 weeks (the length of baseline for Tammy and Lisa was equal).

Treatment data collection began during the final phase of the extended assessment, at which point the child was receiving the exact intervention package that would continue throughout the remainder of the school year. The only change in procedures following the extended assessment was an added incentive for children. Specifically, a pizza coupon was delivered if the child’s weekly CWPM score fell at or above an aim-line drawn to represent a +1 CWPM increase per week. Once the child acquired six pizza coupons, the coupons were exchanged for a certificate for a free pizza at a local restaurant. With this additional component in place, treatment sessions were administered twice per week. The goal of problem evaluation was to establish whether weekly CBM scores, which represented child performance on novel passages, were sensitive to the effects of an assessment-derived treatment program. Unlike previous studies that have established similar outcomes (Daly et al. 2005; Gortmaker et al. 2007), the current evaluation addressed relatively long-term effects, with a minimum of 7 weeks of baseline (M = 10) and 12 weeks of treatment (M = 16), and examined both level and slope of improvement.

Treatment Integrity

The duration of all treatment sessions was 30 min, regardless of the number or configuration of components in the intervention package. Sessions were scheduled twice per week during reading center so that procedures represented a change in, rather than an addition to, supplemental instruction. On average, four complete intervention trials (i.e., four passages) were administered each week. To ensure adequate measurement sampling, weekly CBM scores were plotted only if at least two complete intervention trials were administered during that week.

Treatment integrity was assessed through permanent products. During implementation, examiners completed a treatment script checklist that required an active response to each step in the intervention package (e.g., checking off that instructions occurred, entering CWPM and errors). A review of these permanent products indicated 100% adherence to the procedures across all six cases.

Results

Problem Identification

For each child, problem identification indicated a discrepancy in both level and trend. Figures 1 and 2 display each child’s performance during baseline. These data indicated that the CWPM score during the fall DIBELS screening ranged from Lisa’s 39 CWPM to Tammy’s 59 CWPM (M = 49), with all performance levels associated with “some risk” or “at risk” status according to DIBELS benchmarks (Good and Kaminski 2001). Compared to grade-level means, which were 101 CWPM for 3rd grade and 120 CWPM for 4th grade, participants performed at a level that was approximately half the rate of their same-grade peers (M = 49%; range, 32–58%).

Fig. 1 Treatment evaluation for Greg, Tim, and April. Intervention components include incentive (IN), repeated reading (RR), and listening passage preview/phrase drill (PD)

Fig. 2 Treatment evaluation for Amy, Tammy, and Lisa. Intervention components include incentive (IN), repeated reading (RR), and listening passage preview/phrase drill (PD)

An aim-line was used to evaluate baseline trend. The aim-line began at a value representing the median of the first three weekly CBM scores, and the slope of the aim-line represented an increase of +1 CWPM per week, which would reflect realistic growth for either a third or fourth grade child (Fuchs and Fuchs 1998). Although measurement intervals were intermittent in some cases, the available data suggested relative stability for four children, and an increasing trend for two children (Tim, Amy) that appeared to be slightly below expected growth rates. Thus, analysis of both level and baseline trend warranted a change in intervention services for all participants.
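A sketch of the aim-line computation described above, assuming weeks are indexed from the point at which the median is anchored:

```python
from statistics import median

def aim_line(first_three_weeks: list[float], week: int) -> float:
    """Aim-line value at a given week: begins at the median of the
    first three weekly CBM scores and rises +1 CWPM per week
    (Fuchs and Fuchs 1998)."""
    return median(first_three_weeks) + 1.0 * week

print(aim_line([46, 50, 48], 10))  # 58.0
```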

Problem Analysis and Treatment Design

Results of the BEA and extended assessments are displayed in Figs. 3 and 4. During the BEA, three patterns of response were observed. For Greg and Tim, PD was the only effective condition (i.e., 30% increase over baseline with six or fewer errors) and, following an embedded reversal during the easier material (EM) condition, these effects were replicated. For April and Lisa, both RR and PD were effective and, following EM, the effects of the less intrusive RR strategy were replicated. For Amy and Tammy, replication of the least intrusive yet effective strategy (IN and RR, respectively) failed, while replication was achieved in both cases when the next most intrusive yet effective strategy was reinstated.

Fig. 3 Brief experimental analysis and extended assessment for Greg, Tim, and April. Intervention components include baseline (BL), incentive (IN), repeated reading (RR), listening passage preview/phrase drill (PD), and easier material (EM)

Fig. 4 Brief experimental analysis and extended assessment for Amy, Tammy, and Lisa. Intervention components include baseline (BL), incentive (IN), repeated reading (RR), listening passage preview/phrase drill (PD), and easier material (EM)

The BEA identified either repeated practice or teaching interactions (modeling/error correction) as the primary instructional need for each of the six children. The extended assessment was conducted to confirm these findings and to evaluate whether additional components were needed to achieve mastery levels (i.e., 100 CWPM). With the exception of Lisa, repeated administration of the primary instructional component alone produced levels that were consistently higher than baseline. To achieve mastery, however, the addition of incentives was necessary for Greg (PD + IN), Amy (RR + IN), and Lisa (RR + IN). A combination of incentives, repeated reading, and listening passage preview/phrase drill was needed for Tim, April, and Tammy. None of the children required the addition of easier materials.

Treatment Evaluation

The effects of the assessment-derived intervention package on generalized outcomes were evaluated through visual inspection and summary statistics (see Figs. 1, 2). Baseline levels were low for Greg (M = 66), Tim (M = 50), and Amy (M = 46), and weekly CBM scores were below or slightly overlapping the aim-line. During treatment, performance levels increased for all three children. Greg’s mean weekly CBM score increased to 92 CWPM, while Tim’s and Amy’s performance increased to means of 66 and 81 CWPM, respectively. For all three children, performance during treatment was consistently above the aim-line.

More modest treatment effects were achieved for April and Tammy. April’s performance level increased from a mean of 50 CWPM during baseline to a mean of 66 CWPM during treatment. Although this represented a moderate increase during a relatively brief treatment period (12 weeks), her performance against the aim-line was less clear: an upward trend was apparent but may have been exaggerated by three initial data points that were actually lower than most of her baseline data points. Tammy’s response was similar, although her performance level increased very little, from a mean of 62 CWPM during baseline to a mean of 64 CWPM. Her data during treatment indicated an upward trend that was also exaggerated by unusually low scores during the first 3 weeks of treatment. Although trends are promising for both April and Tammy, further progress monitoring is needed to clarify their response. In any case, there was at least modest success for these two children if one considers that, even after 12–16 weeks of intervention, an instructional change is not yet warranted.

Lisa’s mean level during baseline (M = 51 CWPM) increased to 64 CWPM during treatment, yet a slightly decreasing pattern after 11 weeks of IN + RR warranted an instructional change. After six additional weeks of IN + RR + PD, there remained little change and the gap between her level and the aim-line continued to widen. Thus, Lisa’s response to both the assessment-derived intervention package and an increased instructional package was poor.

Summary statistics were used to compare these treatment effects to literature-based criteria (Table 1). The overall impact of treatment was summarized using the “no assumptions” effect size method described by Busk and Serlin (1992). The mean effect size across the six participants was 2.31, with all cases except Tammy exceeding an effect size of at least 1.0 standard deviation above baseline. Although the use and interpretation of effect size estimates for individual cases is controversial, these estimates correspond to differences in level based on visual inspection. These coefficients are not, however, sensitive to trends within experimental phases.

Table 1 Summary statistics for the six participants
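The “no assumptions” estimate divides the difference in phase means by the baseline standard deviation. A minimal sketch follows; whether the sample or population standard deviation is used is an assumption here (the sample form is shown).

```python
from statistics import mean, stdev

def no_assumptions_es(baseline: list[float], treatment: list[float]) -> float:
    """"No assumptions" effect size (Busk and Serlin 1992):
    (treatment mean - baseline mean) / baseline SD (sample form)."""
    return (mean(treatment) - mean(baseline)) / stdev(baseline)

# Example with hypothetical weekly CWPM scores:
print(round(no_assumptions_es([50, 52, 48], [66, 70, 62]), 2))  # 8.0
```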

To summarize trends, growth rates were calculated as the average increase in weekly CBM scores across time. For both baseline and treatment, growth rate was derived by subtracting the median weekly CWPM score for the initial three data points from the median weekly CWPM score for the final three data points, and dividing the difference by the total number of weeks. The average growth rate per child was +.30 CWPM per week during baseline and +1.13 CWPM per week during treatment, with all five third-grade children exceeding realistic growth expectations of +1.0 CWPM per week for general education children (Fuchs and Fuchs 1998). Further, for all five children, growth rates were lower during baseline and higher during treatment than the average growth rate obtained by peers (+.93 CWPM per week). Lisa, the only fourth-grade child in the group and one of three children on an IEP, achieved a negative growth rate during both baseline and treatment.
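A sketch of the growth rate computation, per the description above:

```python
from statistics import median

def growth_rate(weekly_scores: list[float], n_weeks: int) -> float:
    """Growth rate (CWPM per week): median of the final three weekly
    scores minus median of the initial three, divided by the total
    number of weeks in the phase."""
    return (median(weekly_scores[-3:]) - median(weekly_scores[:3])) / n_weeks

# Example: ten weekly scores; medians 48 (first three) and 58 (last three).
print(growth_rate([48, 50, 46, 51, 53, 52, 55, 58, 60, 56], 10))  # 1.0
```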

The final status of the children was assessed by school personnel during administration of the spring DIBELS universal screening. Compared to grade-level means, which were 123 CWPM for 3rd grade and 147 CWPM for 4th grade, the discrepancy in level was substantially reduced for the group as a whole. In the fall, the targeted group’s performance was at 49% that of their peers; by spring, this index had increased to 67%. Only two of the children, however, improved their risk status according to DIBELS benchmarks (Good and Kaminski 2001): Tim improved from “at risk” to “some risk,” while Amy improved from “some risk” to “low risk.” Thus, for four of the six children, a discrepancy in performance level persisted after at least 3 months of sustained intervention.

Discussion

This study demonstrated the contribution of a BEA and extended assessment of reading fluency for six children with severe academic challenges. A BEA was used to isolate potential causes of reading problems, which in turn provided the foundation for an intervention package that was identified during an extended assessment. The intervention package was evaluated in terms of its impact on reading fluency level and growth. Results indicated moderate to strong effects for five of the six children. Summary statistics supported the general effectiveness of the intervention package, while also confirming the different patterns of response among the children.

The current study contributes to the emergence of the BEA as a problem solving “accelerator.” Within a relatively brief period, usually one to two sessions, the sources of difficulty for children with academic concerns were identified by isolating the separate effects of hierarchically ordered instructional strategies. All of the strategies used in this study are empirically supported and, more importantly, reflect complementary rather than “competing” hypotheses. In other words, a treatment that includes all strategies would be expected to produce larger effects than one consisting of fewer strategies. The purpose of assessment is not, however, to identify the most effective treatment, but rather the least amount of treatment that produces meaningful effects (Barnett et al. 2004). The hierarchical arrangement of strategies is a critical feature of the BEA, and most clearly distinguishes this approach from a functional assessment and other empirically based approaches to assessment (Beavers et al. 2004).

The primary contribution of this study was the conceptualization of the BEA and extended assessment as components of a sequenced approach to problem solving that also addresses problem identification and problem evaluation. In isolation, each of these sets of procedures is limited in scope and susceptible to scrutiny, as there are a host of other, proven practices for addressing one or more elements of problem solving. By linking individual instructional needs to resource decisions (i.e., level of intrusiveness), the current study may reflect more accurately the manner in which professional practice is conducted. This is especially true today, with the emergence of tiered service delivery models that feature continuous iterations of problem solving at the universal, targeted, and intensive levels. Although the literature includes several individual case illustrations of what data-based decision making might look like (Barnett et al. 2004, 2007; Fuchs and Fuchs 1998; Fuchs et al. 2002), the current study demonstrated what it did look like for six children with serious reading difficulties.

The study also contributes to past research demonstrating the treatment utility of the BEA. For five of six children, assessment identified an intervention package that produced moderate to strong academic growth. This study is among a growing number of investigations to link a BEA to positive effects observed during an extended assessment (Dufrene and Warzak 2007; Malloy et al. 2007; Noell et al. 2001; VanAuken et al. 2002). This is, however, one of only a handful of studies that have evaluated assessment-derived interventions beyond within-trial effects on matched or closely matched passages. It is important to note that the impact of treatment in this study was based on level and growth in passages that were administered under conditions identical to baseline. In this regard, the current study is aligned more closely to BEA studies demonstrating effects on general outcome measures (Daly et al. 2005; Gortmaker et al. 2007; Wagner et al. 2006). Although replication is needed, there is emerging evidence that a BEA may lead to accelerated outcomes for children with reading fluency problems.

The problem solving approach illustrated in this study is promising, but there are a number of limitations that should be addressed in future work. The primary issue pertains to experimental control. The magnitude of treatment effects during the extended assessment was difficult to discern because this phase began with an effective strategy, rather than a baseline or control condition. Likewise, because weekly CBM progress monitoring relied on novel passage readings collected prior to instruction, effects during treatment evaluation were also difficult to discern. Requiring a convincing demonstration of cause and effect using performance in untreated, unmatched passages may be an unrealistic standard for academic skills problems, as the most effective interventions produce growth increases of only 1 or 2 units per week (Deno et al. 2001). Decision confidence may be aided, however, by supplementing visual inspection with criterion-based decision rules, using an aim-line to clarify trends, and employing summary statistics.

A second important limitation is that within-trial effects during the BEA and extended assessment were based on instructional rather than generalization passages. While both may be measuring the same thing (Jones et al. 2008), measuring within-trial generalization during problem analysis and treatment design may better isolate those strategies that will produce generalization across trials, sessions, and passages. It should be noted, however, that these findings were similar to the outcomes of studies that have used high-overlap passages during the BEA (Daly et al. 2005; Gortmaker et al. 2007).

A third limitation is that measurement of weekly CBM scores was embedded within the instruction, rather than conducted during an independent session. This was done primarily for the sake of efficiency but was also aligned closely with the principles of direct instruction (Carnine et al. 2004). For three of the six children (Tim, April, Tammy), however, this may have produced a temporary suppression of weekly CBM scores that might inadvertently be misinterpreted as insufficient response to the intervention package. Future studies may avoid this limitation by making certain that a reasonable period of progress monitoring is allowed before making instructional changes (Fuchs and Fuchs 1998).

Another issue to revisit is the CBM-R administration and repeated reading intervention, neither of which included word supply or error correction. Deviation from traditional CBM administration is always risky, but omitting any teaching from measurement and practice allowed for much greater clarity when interpreting response to intervention. Amy, for example, responded positively to incentives and RR alone, as her performance increased from 63 CWPM at the end of baseline to 97 CWPM during the final week of treatment. This favorable response to intervention is remarkable when one considers that, during 20 weeks of reading intervention, her errors were never corrected and no words were supplied. Some may argue that such a practice does not constitute “reading intervention,” and this argument should be embraced and extended also to the use of incentives. Although lack of motivation and insufficient practice are not considered core deficits in reading disorders, they are often essential components of any educational intervention, particularly for those children who have rich schedules of punishment for reading and impoverished opportunities to practice the skills they have.

A final issue pertains to whether improved academic growth alone is sufficient for determining intervention success. Although five children demonstrated positive growth in this study, the final status of only two children (Greg, Amy) was close to peers. Within a 3-tiered, response-to-intervention (RTI) service delivery system in which resource allocation decisions are based on a dual discrepancy (Fuchs and Fuchs 1998), none of these five children would be considered eligible for special education because they are meeting realistic growth standards. It will be a challenge, however, for both researchers and practitioners to accept a definition of “success” that allows achievement gaps to persist. Another challenge, with equally harmful consequences, is to define success in terms of unrealistic growth standards. To achieve the mean level for their grade, the oral reading performance of these six children would need to have increased by about 75 CWPM, or more than three times the average growth of their typical peers.

In conclusion, this study illustrated one approach to individualized, data-based problem solving that included procedures derived from recent BEA research. Future work is needed to strengthen this model and explore its contribution to a comprehensive, tiered model of service delivery based on student response to intervention. Given the empirical, ethical, and legal foundations for RTI, it is important for future studies to conduct a component analysis of problem solving while also clarifying criteria for “high stakes” progress monitoring decisions. Improving the science of practice may increase decision confidence, but dichotomous classification of response to intervention will continue to be uncomfortable and controversial, as perhaps it should be, lest we grow too fond of it. This is an entirely acceptable consequence if the goal of service delivery is to build fluency in problem solving rather than problem classification. To paraphrase Winston Churchill, RTI may be the worst form of eligibility determination, except for all the others that have been tried.