Introduction

Proficiency in mathematics is not only necessary for academic success, but is essential for daily living (Geary et al. 2012). The application of basic computational skills is required for independent adaptive functioning and creates a foundation for time, money management, and problem-solving (Patton et al. 1997). Rivera-Batiz (1992) found that mathematical competence was a major factor in accounting for higher levels of employment, work productivity, and income even after the impact of IQ and reading achievement as explanatory variables had been considered. Mathematical ability can be likened to basic literacy in that it is a vital component for independent functioning within society (Geary 2013).

Despite the acknowledged importance of mathematical skills, deficits in this area continue to be observed within school systems, with 18% of fourth grade students in the USA performing below the basic achievement level (Kena et al. 2016). In Ireland, similar difficulties with mathematics have been reported. The Counting on Success Report, prepared by the Irish Department of Education, found that 15% of Irish fourth grade students were at or below the minimum proficiency level for mathematics (Surgenor et al. 2006). The data compiled in this report were based upon tests developed to be compatible with the framework of the national school curriculum for mathematics. More recently, the National Assessment of Reading and Mathematics (Shiel et al. 2014) found that 6.2% of students in second grade and 5% in sixth grade were performing below the lowest proficiency level. These rates are higher in urban disadvantaged schools. Within urban schools classified as having the highest disadvantage, 14.3% of second grade students and 11.9% of sixth grade students performed below the lowest proficiency level measured (Shiel et al. 2014).

Socioeconomic status (SES) has been identified as an influential variable impacting mathematics proficiency. The rate of students performing below proficiency levels almost doubles in populations living in urban poverty (Rohrbeck et al. 2003), and students from high-poverty backgrounds in particular demonstrate large deficits in mathematics thinking and skills (Griffin et al. 1994). Findings from the Programme for International Student Assessment (PISA) demonstrated a positive correlation between SES and mathematics performance (OECD 2016). For students from the USA, 13.1% of the variance in mathematics scores was found to be attributable to SES. Since family poverty, parental education, low grades, and low test scores are all major predictors of school failure and dropout (Thompson and Kelly-Vance 2001), research focusing on mathematical instruction among disadvantaged populations is critical.

A fundamental skill in the progression of mathematical abilities is the computation of math facts. In order to demonstrate proficiency in mathematics and to progress to more complex mathematics skills, students should achieve mastery of key mathematical concepts (Nelson et al. 2013). Further, those who are both accurate and fluent when performing basic arithmetic have been found to perform better on assessments of overall mathematics ability (Carr et al. 2008). Research by Jordan et al. (2003) found that children with poor arithmetic fact mastery (defined as the ability to compute an addition or subtraction number fact in 3 s or less) in second and third grade performed significantly lower on tests of broad mathematics achievement and showed very little growth in timed number facts when compared to those with good fact mastery.

Fluent responding can be defined as accurate and quick responding to a selected stimulus, with minimal effort, that enables an individual to function efficiently and effectively in their natural environment (Axtell et al. 2009; Binder 1996; Cates and Rhymer 2003). When students achieve a fluent level of performance (i.e., a higher rate of correct responding) with such skills, they typically retain and maintain what they have learned, remain on task in the face of distraction and for longer periods, and more readily apply what they have learned to new situations (Binder 1996; Brady and Kubina 2010; Johnson and Street 2013; Kubina and Wolfe 2005). It has been suggested that students who are slow but accurate in responding to math facts may not be able to complete assigned tasks within set time limits and consequently may receive less or no reinforcement (Bliss et al. 2010; Skinner 2002). A lack of fluency in basic computations can lead to mathematics difficulties that can persist across the lifespan (Nelson et al. 2013). Addressing these deficits early provides a greater chance for success, before the school mathematical curriculum increases in scope and difficulty (Fuchs et al. 2002).

Fluency-based instruction (i.e., instruction designed to increase rate of correct responding with targeted skills) leads to responses with high accuracy and quick response times that require seemingly low effort for the responder (Weiss et al. 2010). Research into interventions that promote fluency has found practice to be a key active ingredient (Burns 2005; Codding et al. 2011; Daly et al. 2007). In addition to practice, successful interventions utilize explicit instruction, drill, feedback, and reinforcement (Burns 2005; Codding et al. 2011; Fuchs et al. 2008). A number of fluency-based instructional approaches have been developed, and demonstrated as effective, to increase fluency with mathematics skills. Three of these which incorporate the components mentioned above include explicit timings (ET; Van Houten and Thompson 1976), SAFMEDS (Casey et al. 2003), and peer tutoring (Hawkins et al. 2009).

Explicit timing (ET) is a method of fluency-based instruction that uses a timing procedure to increase the speed with which a person responds to chosen stimuli. It involves the presentation of a task that can be accurately completed and a finite amount of time in which it is to be completed (Schutte et al. 2015). It provides repeated practice for mathematics fluency, with students completing as many repetitions of the target behavior as possible within the timing, usually 1 min. Research to date has shown ET to be an effective procedure for increasing rate of academic responding, particularly when consequence procedures such as feedback, self-correction, and goal setting are incorporated (Gross et al. 2014; Miller et al. 1995). Frequency-building procedures combine timed repetition of performance with performance feedback to increase fluency with component skills (Kubina et al. 2016). A number of studies to date have demonstrated the ability of frequency building to increase rate of accurate responding within the mathematics domain (Gallagher 2006; Hartnedy et al. 2005; Mc Tiernan et al. 2016; Poncy et al. 2010).

Say All Fast Minute Each Day Shuffled (SAFMEDS) is a frequency-building intervention (Kubina et al. 2016) developed to enable students to learn and increase rate of correct responding with key facts (Graf and Lindsley 2002). SAFMEDS flashcards are used within timed practice sessions to engage in free operant responding and repetitions of the target behavior. Flashcards are developed dependent on the behavior targeted for increase and on the skill level of the student. Performance feedback can also be provided using the flashcards subsequent to timed practice. To date, positive outcomes have been demonstrated in the literature to support its use to increase fluency in the mathematics domain (Beverly et al. 2009; Casey et al. 2003; Chapman et al. 2005; Hartnedy et al. 2005; Hunter et al. 2016). Although a number of studies have demonstrated positive outcomes as a result of fluency-based instruction, a meta-analysis of mathematics interventions conducted by Codding et al. (2009) found that while several fluency-based instructional approaches produced positive results, many of those included failed to meet criteria to be considered evidence-based.

Peer tutoring is an instructional method that has the benefits of individualizing content based on students’ needs, while allowing for extensive feedback and maintaining high levels of task engagement (Dufrene et al. 2005). It is a student-mediated instructional procedure, where one student is responsible for providing instruction intended to teach another student in a specific skill area (Dufrene et al. 2005; Robinson et al. 2005). Research indicates that students benefit from engaging in the role of both tutor and tutee (Robinson et al. 2005). The process of explaining concepts and facts is thought to support tutors to develop and reinforce their own skills (Mitchell et al. 2015).

A diverse literature base exists for peer tutoring across a variety of content areas, intervention intensities, participant ages and ability levels (Robinson et al. 2005). A number of reviews examining this instructional approach have found that it has beneficial outcomes, namely improved academic outcomes across a variety of content areas, as well as having collateral effects on a variety of behavioral and social outcomes (Bowman-Perrott et al. 2013; Bowman-Perrott et al. 2014; Robinson et al. 2005; Stenhoff and Lignusaris-Kraft 2007). Bowman-Perrott et al. (2014) conducted a meta-analysis of single-case research, examining both the academic benefits and the social and behavioral outcomes of peer tutoring. They found that in addition to the primary academic benefits, peer tutoring led to increases in positive social interactions and academic engagement, and to decreases in disruptive and off-task behaviors. A meta-analysis by Robinson et al. (2005) also found increases in on-task behavior, attendance, positive attitudes about school, self-rated scholastic competence and behavioral conduct, self-concept, and sense of belonging. These observed benefits are not limited to the tutees, but have also been found in tutors across studies. Specific to mathematics instruction, peer tutoring has been demonstrated as an effective instructional approach to teach mathematics skills (Fueyo and Bushell 1998; Sprinthall and Scott 1989).

A number of variations of peer tutoring exist, for example heterogeneous grouping (in which tutees are taught by peers from the same grade level with a higher level of knowledge or skill), homogeneous grouping (in which tutees are taught by peers with similar levels of knowledge or skill), and cross-age peer tutoring. Cross-age peer tutoring involves an older tutor who teaches a younger tutee (Stenhoff and Lignusaris-Kraft 2007). Although cross-age peer tutoring has been found to be an effective intervention for both academic and collateral gains, there are far fewer studies of cross-age tutoring than of same-age peer tutoring (Robinson et al. 2005). The previous research on peer tutoring at an elementary school level has primarily focused on using tutors from the same grade level (Ginsburg-Block et al. 2006; Rohrbeck et al. 2003), and much of the research on cross-age peer tutoring has concentrated on students with learning disabilities and emotional and behavioral disorders (Heron et al. 2003). Additional research is necessary to evaluate cross-age peer tutoring and to investigate the effects of this instructional approach in general education classes (Stenhoff and Lignusaris-Kraft 2007).

Purpose of the Present Study

The purpose of the current study was to extend the literature on both cross-age peer tutoring and fluency-based instructional approaches to increase fluent performances with mathematics skills, specifically with students attending classrooms situated in disadvantaged areas. Findings that SES influences mathematics proficiency (Rohrbeck et al. 2003) indicate that students attending such classrooms may benefit from additional instruction in this domain. Peer tutoring was chosen as an instructional approach since it can be used to increase students' engaged time, in particular to build fluency with academic skills (Stenhoff and Lignusaris-Kraft 2007), and can be implemented with relatively low demands on resources (Dufrene et al. 2010). Cross-age peer tutoring was evaluated in combination with explicit timings using SAFMEDS. Kubina et al. (2016) posit that building frequency to a performance criterion results in systematically increasing the frequency of the target skill and that cumulative implementations of such frequency-building sessions foster growth across time. Therefore, fact families from the Morningside Math Facts: Addition and Subtraction curriculum (Johnson 2008) and associated performance criteria were targeted for intervention within the current study.

A stratified randomized controlled trial (RCT) was employed to evaluate the effectiveness of the intervention in comparison with a treatment as usual (TAU) control condition. The effects of the intervention were evaluated using standardized norm-referenced assessments of mathematics ability, direct measures of fluency with targeted math facts, and a measure of social skills and competing problem behaviors. Each measure was recorded pre- and post-intervention to evaluate whether the intervention group would demonstrate significantly greater performances on all outcome measures at post-testing. Since the previous research investigating peer tutoring has indicated benefits from engaging in both the role of the tutor and tutee (Robinson et al. 2005), each measure was also conducted with the tutoring group to investigate possible increases in outcome measures associated with the tutor role alone in non-reciprocal peer tutoring.

Method

Participants and Setting

Forty-four participants were recruited from third, fourth, and fifth grade in an all-female urban elementary school in Galway, Ireland. An a priori power analysis was carried out to determine a sufficient sample size. Using an alpha of .05, a power of .95, and a medium effect size (f2 = .25; Faul et al. 2007), the desired sample size was determined to be a total of 35 participants. Three participants did not complete the intervention owing to attrition, resulting in a final sample of 41 participants. Causes of attrition included participants relocating to another school or extended absences from school while the study was taking place. Participants were between the ages of eight and 12 years (M = 9.94, SD = 0.96). The participating school was recruited through convenience sampling. After teachers from each grade agreed for their students to participate, appropriate consent and assent forms for the parents and students were distributed by the experimenter and were returned by the participants and their parents prior to pre-testing.

The study was conducted in the participants’ school, a Band One school within the School Support Programme (Delivering Equality of Opportunity in Schools, DEIS) by the Irish Department of Education (DES 2005). DEIS schools are identified as those with the highest level of disadvantage, which receive additional supports and resources. Band One schools are those where the level of disadvantage is the greatest. DEIS urban primary schools report higher levels of pupils with literacy and numeracy difficulties as well as emotional and behavioral difficulties. Absenteeism is also reported as a more significant problem in DEIS urban Band One schools (Smyth and McCoy 2009). The school had mixed-sex classes up to second grade and all-female classes from there upwards. The study took place in a room separate from the participants’ regular classroom, either prior to break time in the morning or after lunch in the afternoon. Participants came to the room in groups of 5 dyads, and the members of each dyad sat across the table from each other. The experimenter was in the room throughout each session and was aided by a second instructor during several sessions.

Intervention Agents and Training

Both the experimenter and an additional instructor were master's-level students completing university postgraduate training in Applied Behavior Analysis. The experimenter provided training sessions in the frequency-building and peer tutoring intervention to the second instructor prior to and during the course of the intervention.

Materials

Flashcards

Sets of flashcards consisting of targeted math facts were developed for frequency building using SAFMEDS. Facts were printed on the front of a flashcard, and the answer was printed on the reverse. The tutee was required to see the math fact on the front of the flashcard and then to say aloud the answer that was printed on the reverse of the card. Math facts were printed on laminated rectangular cards, measuring 10 × 5 cm. Facts were printed in black font on a white background. The answers on the back of the cards were printed in gray to prevent the answer being visible to the tutee. Each set of tutoring cards was contained in an envelope and labeled to indicate the math facts set inside. Each set contained 70 cards, as this was 10 more than the upper bound of the fluency aim. Further, based upon pre-test rates of responding, the experimenters anticipated that participants would not exceed this number during timings. Cumulative sets of previously mastered targets contained 80 cards, as the fluency aim for these was 60–70 correct responses per minute. A green “Go” was placed on the front of each envelope to indicate the current set of problems that the tutee was working on.

Curriculum

The Morningside Math Facts: Addition and Subtraction curriculum (Johnson 2008) was used to identify component skills in mathematics which were targeted during intervention. Within this curriculum, math facts are presented in fact families, which allow the learner to create four facts from a three-number family (for example, by memorizing the 2, 3, 5 family, students combine these numbers to get 2 + 3 = 5, 3 + 2 = 5, 5 − 3 = 2, 5 − 2 = 3). The addition and subtraction curriculum is broken down into 16 levels or slices, each containing between three and four fact families. For the current intervention, each set of flashcards was based on a slice of the curriculum containing the fact families within that slice. There were four cumulative sets interspersed throughout the intervention consisting of a mixture of math facts from previously mastered sets.
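The fact-family structure described above can be illustrated with a short sketch. The function name `fact_family` is hypothetical and is not part of the Morningside curriculum; it assumes only that a family consists of two addends and their sum.

```python
def fact_family(a, b, total):
    """Return the four facts generated by a three-number family.

    Hypothetical helper for illustration only. Assumes a + b == total;
    when a == b the four strings contain duplicates.
    """
    if a + b != total:
        raise ValueError("the two addends must sum to the total")
    return [
        f"{a} + {b} = {total}",
        f"{b} + {a} = {total}",
        f"{total} - {b} = {a}",
        f"{total} - {a} = {b}",
    ]


# The 2, 3, 5 family used as the example in the text.
facts = fact_family(2, 3, 5)
```

Here `facts` holds the same four facts listed in the text for the 2, 3, 5 family, in the same order.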

Fluency aims used in the current study were based on those within the Morningside curriculum. Participants were required to reach a fluency aim of between 50 and 60 correct responses per minute for each set of flashcards containing newly introduced math facts, and a fluency aim of 60–70 correct responses per minute for cumulative sets. At the beginning of the intervention, all participants were assessed on the first set of flashcards, which was based on the first slice of the curriculum. No participant was found to perform fluently; therefore, all participants began with this introductory set. They then progressed, in sequence, through flashcard sets based on each subsequent slice of the curriculum. Progression to the next set was contingent on achieving the fluency aim on the previous set.

Tutoring Folders

Tutoring folders were made from manila file folders. On the inside left, there was a progress chart consisting of 10 steps depicted by either ten flowers or ten stars for the tutees to color in. Tutees colored in a step each time they beat their previous timing by five or more correct responses. Participants had a chart in their folders on which they received checks as positive reinforcement during the tutoring sessions. A large emoticon and a large “X” were placed on the back of the folder to record correct and incorrect responses to math facts during timings (i.e., cards answered correctly were placed on the emoticon and cards answered incorrectly were placed on the “X”). Each dyad was also provided with a data collection sheet where the tutee's progress was recorded. Data sheets were divided into sections pertaining to each slice of the curriculum with rows representing an intervention session and columns used to record correct and incorrect responses after each timing. Either the tutor or tutee wrote in the number of correct and incorrect math facts after each timing.

Timer

A small digital timer with a countdown setting was used to keep time during the testing sessions. Each dyad had their own individual timer.

Dependent Measures

To evaluate outcomes of the intervention on mathematics skills, two subtests of the Woodcock Johnson Tests of Achievement, Third Edition (WJIII; Woodcock et al. 2001), namely Mathematics Fluency and Calculation, were conducted with all participants at pre- and post-testing. A direct measure of fluency with math facts targeted during intervention was also administered pre- and post-intervention. To investigate any impact on social skills, the Social Skills Improvement System (Gresham and Elliott 2008) was also completed at pre- and post-testing with all participants.

Woodcock Johnson Test of Achievement, Third Edition

The WJIII (Woodcock et al. 2001) is an individually administered assessment for measuring skills in reading, mathematics, writing, oral language abilities, and academic knowledge. The internal consistency of the WJIII (Woodcock et al. 2001) has been reported at .98 (McGrew and Woodcock 2001), and correlation coefficients between .65 and .79 have been reported between the WJIII (Woodcock et al. 2001) and other leading measures of achievement (McGrew and Woodcock 2001). Two mathematics subtests of the WJIII (Woodcock et al. 2001) were used in the study. These subtests measure computational skills and automaticity with basic math facts, providing a measure of basic math skills. Test–retest reliabilities for the calculation and mathematics fluency subtests of the WJIII have been reported at .86 and .90, respectively (Schrank et al. 2001).

Calculation

This subtest measures mathematics achievement in relation to ability to access and apply knowledge of numbers and calculation procedures. It is a paper and pencil test involving arithmetic and computation. Participants are presented with the problems on a worksheet and are asked to provide written answers on the sheet next to each problem.

Mathematics Fluency

This subtest measures mathematics achievement: specifically, the ability to access and apply automatically, and with speed, digit-symbol arithmetic procedures. Participants are presented with a sheet of simple calculations and asked to answer as many as they can correctly in 3 min. The subtest includes two sheets in total with 80 calculations on each.

Fluency with Target Math Facts

Flashcards consisting of math facts from the Morningside Math Facts: Addition and Subtraction curriculum (Johnson 2008) were used to directly assess fluency with math facts targeted across intervention sets. Each participant was assessed using multiple exemplars of addition and subtraction problems targeted during intervention (Mc Tiernan et al. 2016). Seventy problems relevant to the first 7 fact families in the Morningside Math Facts: Addition and Subtraction curriculum (Johnson 2008) were presented in random order. Participants were assessed on the same problem set at pre- and post-testing. Facts were printed on the front and answers printed on the reverse of the flashcards. As many cards as possible were presented in a 1-min timing, after which the numbers of correct and incorrect responses were counted and recorded. The assessment procedure required the experimenter to show the participant the front of the tutoring card and the participant to answer the math fact. The experimenter waited 3 s for a response. If an incorrect response was given or there was no response, the experimenter moved to the next card and an incorrect response was recorded. If the participant gave a correct response within 3 s, the experimenter moved to the next card and a correct response was recorded.

Social Skills Improvement System—Rating Scales

The Social Skills Improvement System—Rating Scales (SSIS-RS; Gresham and Elliott 2008) is a multi-rater series of rating scales that includes rating scales for teachers, parents, and participants. It is designed for children aged 8–18 years and is a revised form of the Social Skills Rating System (SSRS; Gresham and Elliott 1990). It documents the frequency of social skills and competing problem behaviors. Forms include common social skills in the following subdomains: communication, assertion, responsibility, empathy, engagement, and self-control. The Problem Behavior subscales include the subdomains of: externalizing, bullying, hyperactivity/inattention, internalizing, and autism spectrum. Participants indicate how true a statement is about each social skill and problem behavior for them using a 4-point scale of not true, a little true, a lot true, and very true. The scales are reported to have a high reliability from a national standardization sample (N = 4700) reported in the SSIS-RS manual (Gresham and Elliott 2008). The reliability of subscales is between .70 and .80 for the student forms, and test–retest reliability is between .50 and .80. In the current study, the student response form for ages 8–12 years was completed by each participant.

Independent Variables

Control Condition

At the beginning of the intervention there were 15 participants in the control condition; however, due to attrition, only data from 14 participants were included in the analysis. Participants in the control group were in third or fourth grade and participated in treatment as usual (TAU) in the form of typical classroom mathematics instruction. They did not receive any component of the intervention (i.e., peer tutoring or frequency building using SAFMEDS and targets from the Morningside Math Facts: Addition and Subtraction curriculum; Johnson 2008). The intervention sessions took place outside of scheduled classroom mathematics instruction, so during intervention times the control group remained in their regular classroom and participated in other subject instruction.

Experimental Condition

There were 15 participants from third and fourth grade in the experimental condition. The experimental group also received TAU in the form of typical classroom mathematics instruction, the same as the control group. In addition to this, they participated as tutees in the peer tutoring intervention for 30 min, approximately 3 days per week, for 8 weeks. Participants in this group were tutored by peers in the fifth grade who were trained to implement frequency building using SAFMEDS and targets from the Morningside Math Facts: Addition and Subtraction curriculum (Johnson 2008) with their peers. These intervention sessions took place outside of scheduled classroom mathematics instruction and so were in addition to TAU.

Tutor Condition

Fourteen participants began in the tutor condition, but due to attrition, this number decreased to 12 during the intervention. Participants in the tutor group were fifth grade students and received TAU in the form of typical classroom mathematics instruction. In addition to this, they participated in tutor training, followed by tutoring their third and fourth grade peers in the experimental group for 30 min, approximately 3 days per week, for 8 weeks. These intervention sessions took place outside of scheduled classroom mathematics instruction and so were in addition to TAU.

Procedure

A stratified randomized controlled trial (RCT) was used to evaluate the efficacy of the intervention. Tutees (those in third and fourth grade) were stratified into pairs according to their pre-test scores on the WJIII (Woodcock et al. 2001) Mathematics Fluency subtest. Each participant was matched with the participant who had the next closest standard score. The participants in each pair were then randomly assigned to either the control group (n = 14) or the experimental group (n = 15) using a stratified randomization design. Pre-test dependent measures were recorded for all participants in the experimental, control, and tutor groups. Subsequent to this, the experimental and tutor groups participated in the peer tutoring intervention, while all groups continued to receive TAU. The intervention was carried out over 8 weeks, with an additional 4 weeks for pre- and post-testing. Once the intervention was complete, dependent measures were again conducted with each participant.
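The matched-pair assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the function name `matched_pair_assignment`, the participant labels, and the scores are all hypothetical, and the handling of an unpaired participant (a coin flip) is an assumption because the text does not describe it.

```python
import random


def matched_pair_assignment(scores, seed=None):
    """Pair participants by adjacent pre-test score, then randomize within pairs.

    scores: dict mapping participant id -> pre-test standard score.
    Returns a dict mapping participant id -> "experimental" or "control".
    Assumption: with an odd number of participants, the unpaired one is
    assigned by a coin flip; the source does not say how this was handled.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=lambda pid: scores[pid])
    assignment = {}
    # Walk the ranked list two at a time so each pair has adjacent scores.
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)
        assignment[pair[0]] = "experimental"
        assignment[pair[1]] = "control"
    if len(ranked) % 2 == 1:
        assignment[ranked[-1]] = rng.choice(["experimental", "control"])
    return assignment


# Made-up scores for illustration only; these are not study data.
example_scores = {"P1": 88, "P2": 91, "P3": 75, "P4": 77, "P5": 102, "P6": 99}
groups = matched_pair_assignment(example_scores, seed=1)
```

Pairing on adjacent scores before randomizing keeps the two groups closely balanced on the stratifying measure, which is the purpose of stratification in a small sample.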

Intervention

Tutor Training

Tutors participating in the intervention were trained by the experimenter prior to the beginning of the intervention. Training was conducted in a separate session with only the participants who were acting as tutors. Tutors were trained in two groups, with eight tutors in the first group and six in the second group. Training sessions lasted 30 min each, during which tutors were instructed in the steps to be carried out during the tutoring procedure. Tutors were given a set of materials (folders, flashcards, and recording sheets) the same as those used during the intervention. The experimenter followed a training script (Appendix 1) for these sessions, adapted from Nobel (2005), which categorized the training procedure into segments: (1) introduction to tutoring, (2) tutoring procedure, and (3) recording.

The training followed a model-lead-test instructional format that provided opportunities for active student responding during training. This instructional strategy involved the experimenter modeling each part of the tutoring procedure for the tutors, leading them through the procedure as a group, and finally assessing the tutors’ independent use of the procedure in dyads. The experimenter demonstrated the tutoring activities to the tutors and then the tutors engaged in role-playing of the activities in pairs, with one acting as tutor and the other as the tutee. During the training session, emphasis was placed on the presentation of the math facts, the correction procedure, and recording of responses. At the end of the training session, there was a “test” portion to determine whether tutors could accurately carry out the intervention procedure. During the “test” portion of the training, procedural integrity was recorded for each of the tutors. Procedural integrity checklists were used allowing the experimenter to calculate the number of steps implemented correctly by each tutor. Training was only considered complete when the students demonstrated the tutoring procedures with 90% accuracy. All the tutors demonstrated the procedures with at least 90% accuracy during the training session, with an average of 95.48% accuracy.

Tutoring Sessions

Tutoring sessions were conducted on average 3 days per week for 30 min. Approximately 15 min was spent on frequency building with SAFMEDS, with a mean of 5.21 timings per session for each tutee. The intervention lasted a total of 8 weeks. Participants completed an average of 15.3 intervention sessions, equating to 229.5 min of total intervention time. This included time for participants to come to the tutoring room and collect their materials and time at the end to tidy up their materials and return to their classrooms. The participants were divided into peer tutoring dyads, with a fifth grade participant acting as the tutor and a third or fourth grade participant acting as the tutee. Outside of this requirement, the dyad assignments were random. Tutoring dyads rotated each week. Due to the uneven number of participants in the tutor and tutee groups, some tutors were required to participate in two sessions each day. The selection of these tutors was random and changed each day.
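The reported total of 229.5 min appears to correspond to the mean number of sessions multiplied by the approximately 15 min of frequency building per session. The short calculation below checks that reading; the estimate of total timings is an approximation derived here from the reported means and is not a figure given in the study.

```python
sessions = 15.3             # mean intervention sessions completed (from the text)
practice_min = 15           # approximate minutes of frequency building per session
timings_per_session = 5.21  # mean 1-min timings per session for each tutee

# Total frequency-building time implied by the reported means.
total_practice_min = round(sessions * practice_min, 1)  # 229.5

# Rough estimate of timings per tutee across the intervention (assumption:
# a product of means, used only as an approximation).
approx_total_timings = round(sessions * timings_per_session, 1)
```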

In the first session, the experimenter explained the tutoring procedure to both the tutors and tutees and described how each session thereafter would be carried out. At the beginning of the session, tutees collected their tutoring folder from the front of the classroom. Each assigned tutor and tutee pair then sat at their table facing their partner. Once all participants were seated in their pairs, the experimenter directed the participants to take their recording sheets and reinforcement sheets from their folders, and to get their targeted flashcards out of their “Go” envelope.

Tutees were told that they needed to achieve a pre-determined fluency aim (e.g., between 50 and 60 correct responses per minute) to move on to the subsequent set of flashcards and that there were a total of 16 fluency aims corresponding to each set. Tutees could perform anywhere within the provided fluency range. At four intervals in the curriculum (after slices 7, 10, 13, and 16) tutees worked on a cumulative set of flashcards from the previously mastered sets. The fluency aim set for this was 60–70 correct responses per minute. Tutoring dyads conducted frequency building using SAFMEDS throughout each session.

Frequency Building Using SAFMEDS

Tutors set their timers to 1 min before starting each timing. Tutors first shuffled the set of cards, began the timer, and then presented one card at a time to the tutee. The tutee answered the math fact presented on the tutoring card. If the tutee gave the correct response for a card, that card was placed on the emoticon printed on the back of the tutoring folder. If the tutee responded incorrectly, or did not give a response within 3 s, the card was placed on the X on the back of the folder. The tutor presented as many cards as possible in the 1-min timing. No immediate feedback was provided for responses during the timings. Feedback on responses was provided at the end of each timing.
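The sorting rule used during a timing can be sketched as a short scoring function. This is a minimal illustration, assuming each presented card is logged as a tuple of the answer given (or `None` for no response), the correct answer, and the response latency in seconds. The data layout and the name `score_timing` are hypothetical, and a response at exactly 3 s is assumed to count as within the limit.

```python
from typing import List, Optional, Tuple

# (answer given or None, correct answer, latency in seconds) -- hypothetical layout
Response = Tuple[Optional[int], int, float]


def score_timing(responses: List[Response], latency_limit: float = 3.0) -> Tuple[int, int]:
    """Return (correct, incorrect) counts for one 1-min timing.

    A card goes on the correct pile only if the answer matches and arrives
    within the latency limit; wrong, late, or missing answers go on the X pile.
    """
    correct = 0
    incorrect = 0
    for given, expected, latency in responses:
        if given is not None and given == expected and latency <= latency_limit:
            correct += 1
        else:
            incorrect += 1
    return correct, incorrect


# Hypothetical timing: one correct card, one wrong answer,
# one missing response, and one correct answer given too late.
example = [(7, 7, 1.2), (9, 8, 1.0), (None, 5, 3.0), (6, 6, 3.5)]
print(score_timing(example))
```

Because each timing lasts 1 min, the correct count returned here is already a rate of correct responses per minute.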

Corrective Feedback

When the 1-min timing elapsed, the tutor re-presented the cards that had been placed on the incorrect pile by holding up the cards for the tutee in the same way they were presented during the timing. If the tutee answered correctly the tutor said “Well done!” and moved to the next card in the pile. If the tutee gave an incorrect answer, or did not respond within 3 s, the tutor provided an error correction procedure using least-to-most prompts. This consisted of first telling the tutee to “Try again.” If a correct response was then given, the tutor would say “Well done!” and move on to the next card. If the tutee again provided an incorrect response or no response, the tutor provided the answer and asked the tutee to repeat the answer. This was completed for each card answered incorrectly during the timing. Once tutors had carried out the error correction procedure, they reset the timer and repeated the procedure.

Reward Contingencies

It was explained to participants in the first session that they would receive “checks” based on four target behaviors. The first target behavior was “working well,” and checks were delivered contingent on correct responding and working independently. The second target behavior was “working fast,” and checks were delivered if participants beat their score on their previous timing. The third target behavior was “getting your goal,” and participants received checks in this category for achieving a fluency aim range (e.g., between 50 and 60 correct responses) or performing above this criterion. The final target behavior was “listening,” and participants received checks when they followed instructions (e.g., “take out your recording sheets”) throughout the session. Each participant had a chart on the desk beside their folder with pictures and words representing each target behavior.

The experimenter provided rewards for target behaviors throughout the sessions by placing a check on the relevant category and specifying why they had received the check (e.g., “you got your goal!”). Checks were delivered on a fixed ratio schedule for “working fast” and “getting your goal.” Each time one of these behaviors occurred a check was delivered. Checks for “working well” and “listening” were delivered on a variable ratio schedule. Once four checks were received, the participant was awarded a raffle ticket. Alternatively, one check on the “getting your goal” category was immediately rewarded with a raffle ticket, ensuring a higher magnitude of reward for achieving a fluency aim. At the end of each week, a raffle was held and one participant won a prize (e.g., coloring pencils, pens, hair clips, and small toys). Tutees also had progress charts, which consisted of 10 steps, depicted by flowers or stars. Each time a tutee beat their previous timing by five or more correct answers they were instructed to color in a step. This provided additional opportunities for reinforcement for tutees for increases in rate of correct responding.
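The conversion of checks into raffle tickets can be written out as a small rule. The sketch below is illustrative only. It assumes that checks in the “getting your goal” category each earn a ticket immediately and are not also counted toward the four-check tally, which the description above does not state explicitly. The function name and the dictionary layout are hypothetical.

```python
from typing import Mapping

GOAL_CATEGORY = "getting your goal"
CHECKS_PER_TICKET = 4


def raffle_tickets(checks: Mapping[str, int]) -> int:
    """Number of raffle tickets earned from a participant's checks.

    Assumption: each goal check earns one ticket directly, and every
    four checks in the remaining categories earn one further ticket.
    """
    goal_checks = checks.get(GOAL_CATEGORY, 0)
    other_checks = sum(n for name, n in checks.items() if name != GOAL_CATEGORY)
    return goal_checks + other_checks // CHECKS_PER_TICKET


# Hypothetical session tally for one participant.
session_checks = {
    "working well": 3,
    "working fast": 2,
    "listening": 3,
    "getting your goal": 1,
}
print(raffle_tickets(session_checks))
```

Under this reading, a single goal check is worth as much as four checks in any other category, which reflects the higher magnitude of reward described for achieving a fluency aim.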

Data Collection

After each timing was carried out, either the tutors or tutees recorded the number of correct and incorrect responses on their recording chart. Progression to the next set of flashcards was contingent on the attainment of each fluency aim. When a fluency aim was not attained, additional timings were subsequently completed by the tutor and tutee on the same set. The data from each session were transferred by the experimenter to standard celeration charts, which provided a graphic presentation of correct and incorrect responses for each timing. This provided a standard display of frequency and allowed the experimenter to assess each tutee’s growth of learning across time.
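The progression rule can be expressed as a simple decision on each recorded timing. This is a sketch under stated assumptions: the aim is treated as met when the number of correct responses in a 1-min timing reaches the lower bound of the aim range (performance above the range also counts), and the number of incorrect responses is assumed not to enter the criterion. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FluencyAim:
    low: int   # lower bound of the aim range, correct responses per minute
    high: int  # upper bound of the aim range, correct responses per minute


def aim_met(correct_per_min: int, aim: FluencyAim) -> bool:
    """True when performance falls within or above the aim range."""
    return correct_per_min >= aim.low


def next_set(current_set: int, correct_per_min: int, aim: FluencyAim) -> int:
    """Advance to the next flashcard set only when the aim is met."""
    return current_set + 1 if aim_met(correct_per_min, aim) else current_set


# Example aim range taken from the text; the set number is hypothetical.
aim = FluencyAim(low=50, high=60)
print(next_set(3, 52, aim), next_set(3, 49, aim))
```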

Interobserver Agreement

Interobserver agreement (IOA) was assessed for 14.63% of the pre-test fluency timed probes and 17.07% of the post-test fluency timed probes. Agreement was defined as both observers scoring the same response for each observation. Percentage agreement was calculated by dividing the number of agreements (on responses scored correct and on responses scored incorrect) by the number of agreements plus disagreements (again across correct and incorrect responses) and multiplying by 100. Agreement between the experimenter and the second instructor was 100% for pre-test probes and 98.65% for post-test probes.
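The percentage agreement calculation described above reduces to agreements divided by agreements plus disagreements, multiplied by 100. The following sketch illustrates it with hypothetical scores from two observers. The function names and example data are assumptions for illustration and are not the study's records.

```python
from typing import Sequence


def percent_agreement(agreements: int, disagreements: int) -> float:
    """Agreements / (agreements + disagreements) * 100."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one scored response is required")
    return 100.0 * agreements / total


def ioa_from_scores(observer_a: Sequence[bool], observer_b: Sequence[bool]) -> float:
    """Response-by-response agreement, where True means 'scored correct'."""
    if len(observer_a) != len(observer_b):
        raise ValueError("both observers must score the same responses")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return percent_agreement(agreements, len(observer_a) - agreements)


# Hypothetical probe: 20 responses, observers differ on one of them.
a_scores = [True] * 18 + [False, False]
b_scores = [True] * 18 + [False, True]
print(ioa_from_scores(a_scores, b_scores))
```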

Procedural Integrity

The experimenter was present for every session and monitored implementation using the same procedural integrity checklist as was used in tutor training and was aided by a second instructor during a number of sessions. The second instructor also provided reinforcement checks and monitored procedural integrity, as well as conducting interobserver agreement. Procedural integrity was defined as the number of steps of instruction completed correctly by the tutor. Procedural integrity checklists were completed by the experimenter and second instructor during each session. The checklists provided a task analysis of the tutoring procedure and were used to demonstrate that the students were correctly completing each step of the tutoring procedures. Each step that was observed as being completed correctly by the students was marked with a checkmark. The number of steps completed correctly during each timing was then calculated.

Interobserver agreement on timings was included as part of the procedural integrity procedure. Procedural integrity was recorded for a total of 87 timings during the intervention. The average procedural integrity across tutors was 91.19%, with a range of 81.25–100%. Interobserver agreement was also assessed for 5.7% of the procedural integrity checklists completed by the experimenter. The second instructor observed and recorded the number of steps completed correctly by the tutors, and the percentage of agreement on procedural integrity for each timing was calculated. Interobserver agreement for procedural integrity was 92.73%.

Social Validity

A brief questionnaire was developed by the experimenter to assess social validity. This was administered to all the tutors and tutees who had participated following the completion of the intervention. The questionnaire combined questions based on Likert scales of agreement and open-ended questions to gain an insight into participants’ experience of the intervention. The questionnaire consisted of eight questions, the first three of which were answered using a 5-point Likert scale from “strongly disagree” to “strongly agree.” These questions addressed whether participants enjoyed the intervention, whether they would participate again given the opportunity, and whether they thought the intervention had been helpful for them. The last five questions were open-ended to allow for elaboration on participants’ experience. These questions addressed what participants liked most and least about the project, whether they thought switching partners was beneficial, suggestions for how they might change the intervention, and any other comments they would like to make.

Data Analysis

A multivariate analysis of covariance (MANCOVA) was conducted to investigate the differences between the three groups on mathematics-dependent measures at post-testing. Participants’ pre-test scores were used as the covariates in the analyses. A separate MANCOVA was conducted to investigate differences in post-test scores between groups on the SSIS-RS (Gresham and Elliott 2008). Preliminary checks were conducted for each MANCOVA to ensure there was no violation of assumptions. The assumptions tested for were univariate and multivariate outliers, normality, multicollinearity, equality of covariance matrices, homogeneity of variance, and homogeneity of regression slopes.

Results

The Experimental Group completed an average of 15.3 intervention sessions, achieving an average of 6.6 fluency aims. The number of fluency aims reached ranged from two (achieved by four students) to 18 (achieved by one student). Table 1 outlines the number of sessions completed and number of fluency aims achieved by the experimental group.

Table 1 Number of sessions conducted and aims achieved by the experimental group

Mathematics-Dependent Measures

Preliminary checks on the data indicated that the pre-test scores for fluency with target math facts violated the assumption of normality, p = .03. The data were transformed using a log transformation. Outliers were identified in the post-test of fluency with target math facts and the WJIII (Woodcock et al. 2001) Mathematics Fluency subtest; therefore, transformations were also carried out on these variables. The transformed variables were subsequently included in the MANCOVA. The preliminary checks found that there were no other violations of assumptions.

The results of the MANCOVA indicated a statistically significant difference between groups on the combined dependent variable, λ = .6, F (6, 66) = 3.198, p = .008, partial η² = .225. Multivariate tests were followed up with univariate analyses of covariance (ANCOVAs) to investigate the effect of intervention, controlling for pre-tests, on each of the dependent variables separately. ANCOVA results demonstrated that there were no significant differences between groups for scores on the Calculation subtest of the WJIII (Woodcock et al. 2001), F (2, 35) = 1.721, p = .194, partial η² = .090. Significant differences between groups were found for the Mathematics Fluency subtest of the WJIII (Woodcock et al. 2001), F (2, 35) = 5.398, p = .009, partial η² = .236, and for fluency with target math facts, F (2, 35) = 8.504, p = .001, partial η² = .327.
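The reported effect sizes can be checked against the reported test statistics using standard formulas. The sketch below is a consistency check, not the authors' analysis code. It assumes that λ is Wilks' lambda and that the multivariate effect size follows the common convention of 1 − λ^(1/s), where s is the smaller of the number of dependent variables (three) and the effect degrees of freedom (two). The function names are illustrative.

```python
def partial_eta_sq_from_f(f_value: float, df_effect: int, df_error: int) -> float:
    """Univariate partial eta squared recovered from an F ratio and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def partial_eta_sq_from_wilks(wilks_lambda: float, s: int) -> float:
    """Multivariate partial eta squared, assuming Wilks' lambda.

    s = min(number of dependent variables, effect degrees of freedom).
    """
    return 1.0 - wilks_lambda ** (1.0 / s)


# Multivariate test: lambda = .6, three dependent variables, three groups (df = 2).
multivariate = round(partial_eta_sq_from_wilks(0.6, s=2), 3)

# Univariate follow-ups, each reported as F(2, 35).
calculation = round(partial_eta_sq_from_f(1.721, 2, 35), 3)
math_fluency = round(partial_eta_sq_from_f(5.398, 2, 35), 3)
target_facts = round(partial_eta_sq_from_f(8.504, 2, 35), 3)

print(multivariate, calculation, math_fluency, target_facts)
```

Under these assumptions the recomputed values agree with the reported effect sizes to three decimal places.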

Bonferroni post hoc pairwise comparisons were made to assess pairwise differences in adjusted group means (controlling for pre-tests as covariates). The experimental group’s adjusted means were significantly higher than those of the control group for both the WJIII (Woodcock et al. 2001) Mathematics Fluency subtest (p = .014) and fluency with target math facts (p = .001). Results demonstrated no significant differences for means on any of the mathematics-dependent measures between the tutoring group and either the control or experimental group. Table 2 shows the original mean scores for each group on pre- and post-test mathematics-dependent measures.

Table 2 Mean performance on dependent measures at pre- and post-tests for each group

SSIS-RS

A MANCOVA was conducted to investigate the differences between the three groups on social skills and problem behaviors (SSIS-RS; Gresham and Elliott 2008) at post-testing. Four participants were excluded from the MANCOVA due to incomplete responses. The remaining 37 participants were included in the analysis. Preliminary checks on the data indicated that the post-test scores for the Problem Behavior subscale violated the assumption of normality, p = .005. The data were transformed using an inverse transformation, and the transformed variables were included in the MANCOVA. The preliminary checks found that there were no other violations of assumptions. The results of the MANCOVA found that there was no statistically significant difference between groups on the combined dependent variables, measuring social skills and problem behaviors, λ = .95, F (4, 62) = .45, p = .776, partial η² = .03. Table 2 shows the original mean scores for each group on the SSIS-RS (Gresham and Elliott 2008).

Social Validity

The social validity measure provided an account of participants’ attitudes toward their participation in the intervention. The vast majority of participants enjoyed being part of the intervention, with 96% of the participants indicating that they liked taking part. Ninety-one percent of tutors and 80% of tutees said they would participate in the intervention again if they could. Ninety-one percent of tutors and 100% of tutees felt that the intervention had helped them with their mathematics. Sixty-seven percent of tutors and 80% of tutees preferred rotating the dyad pairs to remaining in the same pairing for the full intervention. Tutors indicated that their favorite aspects of intervention included “helping others get their goals” and “teaching the tutees,” while most of the tutees stated that their favorite aspect was either that it helped them with their mathematics or the reinforcement aspect of the intervention. Responses from the tutees mainly indicated that there was “nothing” they liked least about the intervention, while some of those participating as tutors highlighted the behaviors of their tutoring partners as the thing they liked least about the intervention (e.g., “doing math with people that don’t listen” and “that some people cheated to go onto the next set”).

Discussion

The current study evaluated the outcomes of a school-based fluency intervention incorporating cross-age peer tutoring with frequency building using SAFMEDS and targets from the Morningside Math Facts: Addition and Subtraction curriculum (Johnson 2008). The intervention targeted fluency with basic mathematics component skills and employed an RCT design to evaluate its effectiveness. The experimental group demonstrated the ability to progress through the intervention with many achieving a considerable number of fluency aims during the 8-week intervention. Significant differences on mean post-test scores between the experimental and control groups were found for the Mathematics Fluency subtest of the WJIII (Woodcock et al. 2001) and on fluency with target math facts. There were no significant differences between groups on the Calculation subtest of the WJIII (Woodcock et al. 2001), or on measures of social skills and competing problem behaviors (SSIS-RS; Gresham and Elliott 2008).

The current findings are in agreement with the previous research showing the effectiveness of cross-age peer tutoring (Hawkins et al. 2009; Robinson et al. 2005; Fueyo and Bushell 1998) and fluency-based interventions (Gallagher 2006; Hartnedy et al. 2005; Mc Tiernan et al. 2016; Poncy et al. 2010) to increase fluency with mathematics skills. Cross-age peer tutoring has not been as widely researched in the literature as homogeneous peer tutoring and has often included participants with learning disabilities or emotional and behavioral difficulties (Heron et al. 2003; Rohrbeck et al. 2003). The current study supports the use of cross-age peer tutoring with typically developing students from a disadvantaged population. It provides an important addition to the literature on cross-age peer tutoring for mathematics, demonstrating that it provides an effective intervention for younger tutees. This has further important implications for practice, indicating that cross-age peer tutoring is an instructional approach that would afford beneficial outcomes if incorporated into educational settings. The relatively low demands on teachers further support cross-age peer tutoring as a favorable strategy (Dufrene et al. 2010).

The current study provides evidence for an instructional approach, utilizing the combination of peer tutoring and frequency building using SAFMEDS, to increase rates of correct responding with math fact families. Further, most participants reported that they enjoyed taking part in this instructional approach. The implications for educational settings are significant, given the importance of math fact mastery in the progression to more complex skills and overall mathematical ability (Carr et al. 2008; Jordan et al. 2003; Nelson et al. 2013) and the significant correlation which exists between SES and mathematics performance (OECD 2012). The current study provides evidence for the efficacy of this approach as well as demonstrating the ease with which such an approach can be utilized for an elementary school population.

The findings also contribute to the literature base supporting fluency-based interventions to improve performance with component mathematics skills. The successful use of fluency-based interventions has been demonstrated within the literature for a number of years, yet there is still a lack of empirically validated approaches (Codding et al. 2009). Few studies to date have used RCTs to evaluate such approaches, meaning that the current study contributes significantly to establishing evidence-based instructional practices. Employing an RCT within an educational setting promotes confidence that the treatment outcomes reflect an empirically robust investigation of the fluency intervention. Though often considered impractical to establish, RCTs within fluency research will further support educational applications for children from disadvantaged backgrounds.

There were no significant differences between the tutor group and control group across mathematics-dependent measures. The use of SAFMEDS as a method of instruction for math facts may have contributed to the lack of academic gains demonstrated by the tutor group. The benefits observed for tutors in the previous research are often attributed to the process of constructing explanations for tutees on the material being taught, requiring the tutor to elaborate and generate connections between the current material and previously learned information (Fuchs et al. 2002). For example, in an intervention involving teaching of algebra, tutors combine their knowledge of basic math facts with the algebraic equations they are helping their tutees to learn. As the answers to the math facts were provided on the reverse of the cards, there was little opportunity for learning for the tutors, who could simply read the answers to math facts from the cards. Future research should evaluate possible benefits for peer tutors using different methods of instruction and differing target skills.

The tutoring intervention did not significantly impact measures of social skills or problem behaviors of either tutors or tutees. This is inconsistent with the previous research demonstrating that academic tutoring has beneficial collateral effects on social outcomes (Robinson et al. 2005). The variability in scores on the SSIS-RS at pre- and post-testing indicates that this measure may not have been representative of true levels of social skills. Individual scores varied widely from pre- to post-tests, with both increases and decreases in each scale across the groups. There was no discernible trend for either the Social Skills or Problem Behaviors in any of the three groups. An alternative measure that was not reliant on subjective reports may have been more suitable to determine a more accurate picture of participants’ social skills and problem behaviors.

Limitations and Future Directions

As mentioned above, the variability in scores on the SSIS-RS (Gresham and Elliott 2008) indicates that it may not be a sensitive enough measure for assessing social skills and problem behaviors. Subjective measures have been found in the previous research to show no correlation with independent, objective measures related to the variable of interest, and can be difficult to interpret because they are often expressed in ordinal scales (Jahedi and Mendez 2013). Future research should also include objective measures, such as direct observation of behavior, or include additional subjective measures, such as teacher ratings in addition to the self-ratings used in the current study. These methods would provide a more sensitive measure of participant behaviors and social skills, rather than relying on participants’ own reports.

The amount of interobserver agreement data collected for pre- and post-test fluency probes is a limitation of the current study, as the proportion of probes assessed was lower in both cases than is traditionally recommended. In addition, interobserver agreement was not collected for the WJIII subtests (Woodcock et al. 2001) and the SSIS-RS (Gresham and Elliott 2008) administered pre- and post-test. Future studies should ensure that sufficient levels of IOA are collected for all dependent measures to improve reliability of measurement.

Participants achieved a wide range of fluency aims across the 8 weeks of intervention, ranging from 2 to 18, despite a similar number of sessions. Intervention data were charted on a standard celeration chart to monitor each participant and to ensure that they were making progress. However, some participants achieved only two fluency aims. Some participants most likely required more time to achieve each fluency aim, and because the intervention was of a relatively low intensity and of short duration, they did not achieve more during the study. However, it may have been beneficial to add criteria for modifications to instruction (e.g., reversing to a previous fluency aim) if participants did not reach a particular aim within a set number of sessions as demonstrated on the standard celeration charts.

Follow-up assessments of retention should also be included in future studies to determine whether the short-term gains observed during intervention have been maintained. Due to time constraints, it was not possible to conduct such a follow-up assessment. Retention is one of the critical outcomes associated with the fluent performance of skills (Weiss 2001). It would be beneficial to determine whether the short-term academic gains translated to long-term gains and whether continued instruction with component skills would lead to improvements in other mathematics applications over time.

Conclusion

Proficiency in basic computational skills is essential for independent adaptive functioning in numerous areas of daily life. Despite this, there is a widely acknowledged deficit in students’ performance of such skills and a scarcity of research on effective, empirically based interventions to increase fluency in these skills (Codding et al. 2010; Codding et al. 2009; Poncy et al. 2013). The current study provides support for the incorporation of cross-age peer tutoring and frequency building using SAFMEDS to increase fluency with math fact families. Relatively little of this research has focused on cross-age peer tutoring for a mainstream disadvantaged population, while a lack of sufficient research on empirically validated instructional approaches to increase fluency also exists in the literature. This study aimed to address such gaps in the literature and to assess the benefits of such an intervention for both tutees and tutors. The findings suggest that while there may be limited benefits for tutors, the use of cross-age peer tutoring and fluency-based instruction has significant benefits for younger tutees.