Applied behavior analysis has a long history of using behavioral techniques to enhance the training and performance of athletes across a wide range of sports, including football (Allison & Ayllon, 1980; Komaki & Barnett, 1977; Smith & Ward, 2006; Stokes, Luiselli, Reed, & Fleming, 2010; Tai & Miltenberger, 2017; Ward & Carnes, 2002), tennis (Allison & Ayllon, 1980; Ziegler, 1987), swimming (Hume & Crossman, 1992; Koop & Martin, 1983; Rogers, Hemmeter, & Wolery, 2010; Schonwetter, Miltenberger, & Oliver, 2014), soccer (Brobst & Ward, 2002; Ziegler, 1994), gymnastics (Allison & Ayllon, 1980; Boyer, Miltenberger, Batsche, & Fogel, 2009), track and field (Scott & Scott, 1997), basketball (Kladopoulos & McComas, 2001), in-line speed skating (Anderson & Kirkpatrick, 2002), martial arts (Harding, Wacker, Berg, Rick, & Lee, 2004), rugby (Mellalieu, Hanton, & O’Brien, 2006), and many more (Schenk & Miltenberger, 2019).

These efforts have drawn on many procedures and intervention packages (Luiselli, Woods, & Reed, 2011; Schenk & Miltenberger, 2019). In their review, Schenk and Miltenberger (2019) found that sports performance has been targeted using procedures such as consequence and antecedent manipulations, feedback procedures, skills training, and unique procedures such as habit reversal and acceptance and commitment therapy. Examples of intervention packages include positive and negative reinforcement, video modeling, teaching with acoustical guidance, public posting, goal setting, verbal and video feedback, self-monitoring, simulated practice, behavioral rehearsal, relaxation, discrimination training, and many more (Luiselli et al., 2011; Schenk & Miltenberger, 2019).

Within this volume of literature, only a small number of published studies have focused on improving dance behavior. Auditory feedback (Quinn, Miltenberger, Abreu, & Narozanick, 2017; Quinn, Miltenberger, & Fogel, 2015; Quinn, Sherman, Sheldon, Quinn, & Harchik, 1992), error-correction procedures (Fitterling & Ayllon, 1983; Vintere & Poulson, 2010), and self-instruction procedures (Vintere & Poulson, 2010) have been shown to improve the accuracy of dance performance. However, only one study on dance instruction has adopted a rate-based measure. Lokke, Lokke, and Arntzen (2008) recognized the importance of shaping both the accuracy and the speed of responding to improve dance performance. They worked with a ballet dancer to correct a faulty component (i.e., three jumps on the left foot that ended in a 90-degree turn) that interfered with the dancer’s accurate performance of a dance routine. Specifically, they used timed-practice intervals (i.e., timings) of 15 s along with performance feedback and modeling to help the dancer master the faulty component. They found that once the dancer was performing the jumps accurately and at the same rate as an experienced dancer, she was able to complete the routine flawlessly. Moreover, practice totaled only 21 timings across 9 days, demonstrating the efficiency of the intervention.

The importance of the Lokke et al. (2008) study is that it introduced the framework of precision teaching (PT) to dance instruction. Though many interventions for enhancing sports performance have been effective in building the accuracy of specified responses, there have been limitations in terms of maintenance of trained responses and generalization to untargeted responses (Luiselli et al., 2011). PT has a long history of producing important learning outcomes in classrooms and other educational settings with a variety of learners, using a framework that emphasizes rate of response (frequency) displayed on the standard celeration chart (SCC), continuous real-time monitoring for effective data-based decision making, and an approach to instructional design based on a component-composite analysis (Binder, 1996; Johnson & Street, 2013; Kubina & Yurich, 2012). Therefore, a PT approach to dance instruction, and to sports performance more generally, may yield the learning outcomes that, according to Luiselli et al. (2011), have been missing from this literature base.

Precision Teaching and Behavioral Fluency

Simply put, PT involves making instructional decisions based on continuous performance frequencies displayed on the SCC (Lindsley, 1992a, 1992b). This emphasis on sensitive measurement gave rise to a general process of data-based decision making that can be summed up in four steps: pinpoint, record, change, and try again (Johnson & Street, 2013; Kubina & Yurich, 2012). The process is a continuous cycle of defining objectives (pinpoint), providing practice opportunities and recording frequencies (record), changing strategies or interventions based on performance (change), and, when the first change does not work, systematically changing variables and observing their effects on learner performance (try again). It is a perpetual exploration and refinement of instructional strategies tailored to maximize efficient learning. Through this process, precision teachers have discovered that building responses to optimal rates of responding leads to behavioral fluency (Haughton, 1972, 1980; Johnson & Layng, 1992).

Behavioral fluency is defined by the important learning outcomes that emerge at optimal performance frequencies (Binder, 1996; Johnson & Layng, 1992, 1996; Johnson & Street, 2013; Kubina & Yurich, 2012). In other words, the assessment of mastery is post hoc, and certain observable features participate in the evaluation. Specifically, a mastered skill is retained over time without practice, endures when performed for sustained periods, persists (i.e., remains stable) in conditions with competing stimuli, and can be applied to learning related skills (RESA; Berens, Boyce, Berens, Doney, & Kenzer, 2003; Berens & Hayes, 2007; Brady & Kubina, 2010; Dembek & Kubina, 2018; Haughton, 1972; Carl Hughes, Beverley, & Whitehead, 2007; Kubina, 2005; Newsome, Berens, Ghezzi, Aninao, & Newsome, 2014; Stocker, Schwartz, Kubina, Kostewicz, & Kozloff, 2019; Twarek, Cihon, & Eshleman, 2010; Weiss, Foley, Pearson, & Pahl, 2010).

Anyone who has played a sport, learned a musical instrument, or danced recognizes the importance of practice to master the skills of those domains (Binder, 1996). One way of practicing that precision teachers commonly use to shape fluent behavior is frequency building. Frequency building involves timed repetition of behavior and performance feedback after the timed interval to build the frequency of correct responses to specified frequency aims, while simultaneously establishing low frequencies of incorrect responses (Kubina & Yurich, 2012). Practicing in this manner typically leads to behavior that seems effortless, automatic, or masterful—words typically used to describe fluent behavior (Binder, 1996; Haughton, 1972).

In PT, the explicit purpose of practice is to build the frequencies of component skills rather than composite skills (Johnson & Street, 2013). Research has shown that building the frequency of component skills in a training sequence generally facilitates the acquisition of more complex skills and can have facilitative effects across nontargeted repertoires (Berens & Hayes, 2007; Brosnan, Moeyaert, Newsome, & Healy, 2018; Haughton, 1972, 1980; Carl Hughes et al., 2007; Johnson & Layng, 1992; Johnson & Street, 2013; Lokke et al., 2008; McDowell, Mcintyre, Bones, & Keenan, 2002; McTiernan, Holloway, Healy, & Hogan, 2016; Newsome et al., 2014; Twarek et al., 2010). For example, the performance of basic arithmetic computations (i.e., addition, subtraction, single-digit multiplication, and division) that is both accurate and sufficiently fast aids in the acquisition of related, but more complex skills, such as long division. Similarly, in sports, a basketball player who learns how to dribble a ball at an optimal frequency will more easily learn to pass and shoot the ball from various positions on the court in order to play effectively in a game. Thus, component-composite approaches to instructional design are advantageous in that the arrangement of skills in an instructional sequence is an important variable that impacts the efficiency of learning. Addressing dysfluency at the root cause prevents cumulative dysfluency (the compound effects of layering fluency-deficient component skills on top of one another), which can limit or even halt the acquisition of composite skills (Binder, 1996; McDowell & Keenan, 2001).

Precision Teaching Research

The majority of PT and fluency literature focuses on education and academic skills (Berens et al., 2003; Brady & Kubina, 2010; Brosnan et al., 2018; Cavallini & Perini, 2009; Chiesa & Robertson, 2000; Griffin & Murtagh, 2015; Carl Hughes et al., 2007; Lambe, Murphy, & Kelly, 2015; McDowell & Keenan, 2001; McTiernan et al., 2016; Mercer, Campbell, Miller, Mercer, & Lane, 2000; Milyko, Berens, & Ghezzi, 2012; Newsome et al., 2014; Sleeman, Friesen, Tyler-Merrick, & Walker, 2019). Research on nonacademic skills is small by comparison and limited to a handful of studies on fine-motor behavior and sports (Fabrizio, Schirmer, King, Diakite, & Stovel, 2007; Lokke et al., 2008; McDowell et al., 2002; Twarek et al., 2010; Weiss et al., 2010). The most widely taught nonacademic skills are the fine-motor responses necessary to improve handwriting and daily living skills. Eric Haughton (1972) and his colleagues (Binder, 1996) pioneered the work on the “Big 6 + 6.” They discovered that strengthening a series of 12 fine-motor responses (i.e., reach, touch, point, grasp, place, release, push, pull, shake, squeeze, tap, twist) leads to better acquisition of complex skill sets of daily living. Innovative applications of the Big 6 + 6 approach include one study by Twarek et al. (2010), on teaching young children with autism to dress themselves, and a second study by Fabrizio et al. (2007), on teaching a young boy with autism how to squeeze an object, a deficit that prevented the child from playing with toys that required this response. The Big 6 + 6 approach has also proven successful in strengthening hand dexterity for sign language purposes for people with traumatic brain injury (Chapman, Ewing, & Mozzoni, 2005).

PT has also contributed to the area of sports, though in a limited capacity. Aside from the Lokke et al. (2008) article on dance, there is one other demonstration of PT and fluency-based instruction applied to sports. McDowell et al. (2002) improved the swing of golfers by strengthening components of a golf swing in isolation (e.g., backswing, follow-through). The participants’ golf swings improved even though the full swing was never directly targeted. Both studies demonstrated a PT-oriented solution to a sports performance problem.

These studies are a promising start toward answering the call from the sports performance literature to optimize learning outcomes, a hallmark of PT. Moreover, given the outcomes that the PT framework has yielded in other areas, it is time to expand this approach to sports and other motor-learning repertoires. Tap dancing is an ideal domain for investigating fluency through the PT framework, as it requires expert performers to tap at high rates. Applying a PT framework to tap dance instruction may reveal relevant component-composite relations that can help improve the efficiency of tap dance training, as well as provide an additional demonstration of how the major concepts of PT apply to sports performance.

The current study, therefore, sought to evaluate fluency outcomes with motor learning and to explore whether a PT approach to the teaching of tap dancing would yield efficient learning across a training sequence. Specifically, the purpose of this study was (a) to use a PT framework in the design, teaching, and evaluation of an instructional sequence to teach tap dancing and (b) to examine the facilitative effects of frequency building with tap dance components on untrained steps and on probes of retention, endurance, stability, and application (RESA).

Method

Participants

Five adults participated in the study. The experimental participants included two females and two males. Daisy and Tina, the female participants, were both 23 years old at the time of the study. Andy was 25 years old, and Berry was 20 years old; both identified as male. Jack, the constant-series probe participant, was an 18-year-old male. No participant reported dance or music experience, except for Andy, who reported he had played the trumpet in grade school. All participants reported that they had played organized sports as children and teenagers. Only one participant (Andy) continued to play an organized sport (basketball) as an adult.

Materials and Setting

All sessions took place at the University of Nevada, Reno, in a dance room equipped with one wall lined with mirrors and a wooden dance floor. Materials included data collection sheets, writing utensils, a video camera, scripts for the instructor, tap dance shoes for the participants and the instructor, a ThinkPad laptop computer with tap dance analysis software, an external microphone, wireless headphones, and an iPhone.

Response Targets

The tap dance steps used in the study are from the beginning curriculum at the Fascinating Rhythm School of Performing Arts, located in Reno, Nevada. Table 1 contains detailed descriptions of basic tap steps, the selected target responses, and the application combinations in this study. Note that not every training step is part of both application combinations, but each training step appears in one combination or the other. Training steps were organized in pairs, placed in a presumed hierarchy of increasing complexity, and taught sequentially. Toe taps and heel taps constituted Pair 1. Tip steps and dig steps formed Pair 2. Pair 3 included shuffles right and shuffles left. Shuffle steps and windshield wipers made up Pair 4. Two of the participants, Daisy and Andy, received some training on the fourth pair before researchers revised the training sequence as a function of their data; from that point forward, all participants experienced the training sequence in the order described previously.

Table 1 Tap Steps Terminology

Dependent Variable

The dependent variables were the frequencies of correct and incorrect responding on weekly probes of the untrained components that followed the current training component in the training sequence, as well as the frequencies of correct and incorrect responding on RESA probes. Weekly probes served to evaluate generalization to untrained components in the training sequence, and RESA probes served to evaluate the emergence of fluency as the participant reached the performance criterion for each component step in the training sequence.

Measurement

Frequency of correct sounds per minute and frequency of incorrect sounds per minute were the measures of performance for both training steps and probes. A correct sound was any tap sound made with the appropriate part of the foot, in the rhythm and order appropriate for the response target. There were two types of errors: missing sounds and extra sounds. A missing sound occurred when a participant attempted a step but did not contact the floor and therefore failed to make the sound associated with a given response target. Observers also counted a missing sound when a step that was part of the sequence did not occur at all. If a participant made an error and restarted a combination, the steps that would have followed were not considered errors. Extra sounds were sounds produced by the dancer that were not appropriate for a given response target, including sounds that accompanied topographical errors made by the participants.

Researchers used a software program designed to capture and analyze the sound of the taps. The software was also used to provide performance feedback to the participants. The apparatus audio-recorded each tap and counted the total number of taps that occurred during a timing. Because the software could not capture the absence of a tap or count a tap that occurred when it should not have occurred, the experimenter counted the total number of incorrect taps.

The software program provided a prompt after each timing so that the experimenter could enter the number of errors made. The software then calculated the total number of correct responses and the total number of incorrect responses based on the total number of taps recorded and the total number of incorrect taps inputted by the experimenter.
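Although the software’s internals were not published, the tally arithmetic described above can be sketched as follows (a minimal Python sketch; the function name and sample values are hypothetical, not the study’s software):

```python
def tally_timing(total_taps_recorded: int, extra_sounds: int,
                 missing_sounds: int, timing_seconds: float = 15.0):
    """Combine the software's automatic tap count with the
    experimenter-entered error counts for one timing.

    total_taps_recorded -- every tap the microphone picked up
    extra_sounds -- recorded taps inappropriate for the target
    missing_sounds -- expected sounds that never occurred (unrecorded)
    """
    corrects = total_taps_recorded - extra_sounds    # audible, appropriate taps
    incorrects = extra_sounds + missing_sounds       # both error types count
    minutes = timing_seconds / 60.0
    return corrects / minutes, incorrects / minutes  # per-minute frequencies

# e.g., 40 taps recorded, 2 extra and 1 missing sound in a 15-s timing
print(tally_timing(40, 2, 1))  # (152.0, 12.0) sounds per minute
```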

The software gave a visual display of the number of total taps, the number of correct taps, and the number of extra or missing taps. Figure 1 shows an example of the screen the experimenter saw at the end of a timing. The participants would see the screen in the frequency-building phase only. The sound wave produced by the taps would appear in blue. Vertical green lines were overlaid on the sound wave to indicate the position of the taps. The closer the green lines appeared, the faster the participant tapped. The farther apart the lines appeared, the slower the participant tapped. Evenly spaced lines indicated a steady pace, whereas unevenly spaced lines indicated a variable pace.

Fig. 1

Screenshot of the software output screen. This is an example of the software’s output screen after a timing. The screen shows what the participant and the experimenter saw at the end of the timing. The vertical green lines overlaid on the blue sound wave represent tap sounds. The output screen also indicates the number of correct responses, the number of missing sounds, and the number of extra sounds

The distance between the green lines revealed whether the participant tapped at the appropriate pace. For example, a shuffle required the participant to perform a syncopation. If the participant executed the timing with accuracy, the pattern of syncopation would appear like the one in Fig. 1, where two green lines (taps) are close together (a syncopated shuffle), followed by a comparatively larger gap, which represents the time between the end of one shuffle and the beginning of the next shuffle.

This display was useful for delivering intertiming performance feedback to the participant. It allowed the instructor to speak specifically to instances in the timing when a participant paused, missed a sound, or made an extra sound. The display could also show where the participant slowed down, sped up, maintained a steady pace, or fell out of rhythm, complementing the performance feedback on the number of correctly executed sounds and the number of missing and/or extra sounds.
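The spacing logic the display relies on reduces to inter-tap intervals. The sketch below is hypothetical (the actual software’s analysis was not published); it flags pauses and summarizes evenness of pace from tap onset timestamps. Note that syncopated steps such as shuffles would show a characteristic alternating short-long gap pattern rather than uniform spacing.

```python
import statistics

def pacing_summary(onset_times_s):
    """Summarize pacing from tap onset timestamps (in seconds):
    even gaps suggest a steady pace, and unusually large gaps
    suggest pauses, mirroring what the visual display conveys."""
    gaps = [b - a for a, b in zip(onset_times_s, onset_times_s[1:])]
    mean_gap = statistics.mean(gaps)
    pauses = [i for i, g in enumerate(gaps) if g > 2 * mean_gap]  # heuristic
    spread = statistics.stdev(gaps) / mean_gap if len(gaps) > 1 else 0.0
    return {"mean_gap_s": round(mean_gap, 3),
            "relative_spread": round(spread, 2),  # low = even spacing
            "pause_positions": pauses}

# four evenly spaced taps, then a long pause before the final tap:
# flags the gap at index 4 as a pause
print(pacing_summary([0.0, 0.25, 0.5, 0.75, 1.0, 1.9]))
```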

Research Design and Analysis

The study used a multiple-probe across-responses single-subject design. Researchers selected this design because the training responses were presumably part of the same operant class; thus, the baselines of the untrained responses (as well as the frequencies of the fluency outcome probes) were expected to change as a function of training on other responses in the training sequence (Horner & Baer, 1978). The design included one constant-series control participant who served as an uninstructed comparison for the effects of instruction on the application combinations only (Hayes, Barlow, & Nelson-Gray, 1999). All steps entered baseline at the same time. Once the steps met the inclusion criteria, the first pair of training steps entered frequency building, and the next step pair simultaneously underwent weekly probes. Once the training step pair met the performance criterion, it entered RESA probes for retention, endurance, and stability. Application probes on the application combinations were also conducted during the RESA probes phase. Every training pair followed this same progression. The constant-series participant underwent probes for application combinations only.

First-timing data yielded celerations, frequency multipliers, accuracy ratios, and improvement index metrics for training and probe data. Researchers selected first-timing data because these data are the most conservative measure of change across sessions. The authors used CR PrecisionX (CentralReach, LLC, 2019) online software, formerly known as Chartlytics, to generate celeration, first-last frequency multipliers, and improvement indices (also known as the accuracy improvement measure; see Pennypacker, Gutierrez, & Lindsley, 2003, for definitions of these measures). Accuracy ratios represent the directed distance from the incorrect frequency to the correct frequency. Accuracy ratios were calculated by dividing the frequency of corrects by the frequency of errors (Pennypacker et al., 2003). All accuracy ratios in this study were assigned a multiplicative sign because the frequency of errors was never higher than the frequency of corrects. Any additional frequency multipliers that appear in the Results section were calculated by dividing the larger frequency (F1) by the smaller frequency (F2) in the comparison and assigning a sign depending on the direction of change on the chart to get from F1 to F2 (Pennypacker et al., 2003).
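For readers unfamiliar with these chart-based metrics, the sketch below illustrates the arithmetic (the authors computed theirs with CR PrecisionX; the least-squares celeration fit shown here is one common convention and an assumption, not necessarily that software’s exact method, and the sample values are hypothetical):

```python
import math

def frequency_multiplier(f1: float, f2: float) -> str:
    """Directed multiplicative distance from frequency f1 to f2, in PT
    notation: 'x' marks growth on the chart and '/' marks decay."""
    ratio = max(f1, f2) / min(f1, f2)
    return f"x{ratio:.2f}" if f2 >= f1 else f"/{ratio:.2f}"

def accuracy_ratio(corrects: float, errors: float) -> str:
    """Directed distance from the error frequency to the correct
    frequency; multiplicative because corrects never fell below errors."""
    return f"x{corrects / errors:.2f}"

def weekly_celeration(days, freqs):
    """Least-squares slope through log10 frequencies, expressed as a
    multiplicative change per 7 calendar days -- one common way to
    quantify celeration on the SCC."""
    logs = [math.log10(f) for f in freqs]
    n = len(days)
    mean_x, mean_y = sum(days) / n, sum(logs) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
             / sum((x - mean_x) ** 2 for x in days))
    per_week = 10 ** (slope * 7)
    return f"x{per_week:.2f}" if per_week >= 1 else f"/{1 / per_week:.2f}"

print(frequency_multiplier(100, 166))   # x1.66 (a jump-up)
print(accuracy_ratio(152, 12))          # x12.67
print(weekly_celeration([0, 2, 4, 7, 9], [80, 92, 100, 118, 130]))  # ~x1.45
```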

The study evaluated retention by calculating the percentage of change from the first data point of the last training session to the first data point in the retention probes for each training step (Bucklin, Dickinson, & Brethower, 2000). The percentage of change was calculated as the difference between the retention probe frequency and the training frequency, divided by the training frequency and multiplied by 100. On the graphs, a percentage of change that represents growth in the frequency of corrects from the training phase to the retention phase appears on the positive scale, and a percentage of change that represents decay appears on the negative scale. This measure provides a rapid visual analysis of retention across training responses for each participant.
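As a minimal sketch of this computation, assuming the conventional percent-change formula implied by the positive/negative convention just described (sample values are hypothetical):

```python
def retention_percent_change(training_freq: float, retention_freq: float) -> float:
    """Percentage of change from the last training frequency to the
    first retention-probe frequency: positive values indicate growth,
    negative values indicate decay."""
    return (retention_freq - training_freq) / training_freq * 100.0

print(retention_percent_change(260.0, 240.0))  # -7.7 -> mild decay
print(retention_percent_change(240.0, 260.0))  # +8.3 -> growth
```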

Procedures

Normative sampling

Ten experienced tap dancers, each enrolled in advanced tap classes and either competing on a tap dance team or working as a tap dance instructor, established performance criteria for all the dance step pairs used in the current study. The dancers completed two 15-s timings for all the tap steps and sequences. The researchers then calculated the average frequency for each step across the 10 dancers and set the performance criterion for each step as the range from 10% below to 10% above that average. The inclusion criterion for steps entering training was a baseline frequency at or below 60% of the average frequency. Table 2 shows the aim ranges and the baseline criterion for each step.
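The aim-range and inclusion arithmetic can be sketched as follows (the sample frequencies are hypothetical, chosen near the windshield-wiper aim of 234–286 sounds per minute reported later):

```python
def aim_range(normative_freqs):
    """Performance criterion: 10% below to 10% above the normative mean."""
    mean = sum(normative_freqs) / len(normative_freqs)
    return 0.9 * mean, 1.1 * mean

def eligible_for_training(baseline_freq, normative_freqs):
    """Inclusion criterion: baseline at or below 60% of the normative mean."""
    mean = sum(normative_freqs) / len(normative_freqs)
    return baseline_freq <= 0.6 * mean

sample = [250, 270, 240, 265, 255, 260, 245, 275, 258, 262]  # hypothetical
print(aim_range(sample))                   # ~(232.2, 283.8) sounds per minute
print(eligible_for_training(120, sample))  # True: 120 <= 154.8
```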

Table 2 Normative Sampling Data

Sessions

The first author served as the instructor and conducted individual sessions with each participant at the University of Nevada, Reno; sessions lasted approximately 10–15 min and occurred four times per week. All sessions started with scripted instructions that described the step and its important features. Then, the instructor modeled the step for the participant and asked the participant to engage in the step. The instructor provided praise or corrective feedback after the participant performed the step. Once the participant demonstrated the step correctly for one eight-count, timings would begin. When a participant was ready to engage in a timing, the software would begin a countdown with four tones plus a visual display of numerals counting down. The participant began tapping at the fourth tone and continued for the duration appropriate to the condition. During timings, the software would simply display a countdown of the timing duration. The software would indicate the end of a timing with another tone. Whether the participant then received feedback depended on whether it was a baseline session, a frequency-building session, or a probe session. The number of timings per response varied depending on the session condition. The standard baseline and frequency-building timing length was 15 s. Researchers chose this timing length for several reasons. First, in a natural setting, tap dancers are not expected to perform the same tap step for more than a few seconds. During normative sampling, we discovered that experienced dancers had difficulty sustaining their rates for more than approximately 15 s. Furthermore, the PT literature suggests that practicing with shorter timings (sprints) confers the same benefits as practicing with longer timings and is more efficient because sprints take less time (Binder, 1996; Haughton, 1980; Kostewicz & Kubina, 2010). Finally, Lokke et al. (2008) also used 15-s timings.

Baseline

Baseline sessions included three 15-s timings for each step. Participants did not receive feedback of any kind during baseline. This phase continued until one or more of the following four conditions were met: (a) the celeration of correct steps was equal to or less than x1.2, (b) variability was equal to or greater than x2.0, (c) there were relatively high rates of inaccurate steps, or (d) frequencies were no greater than 60% of the average frequencies established by the normative sampling group. The authors selected these criteria because baselines with one or more of these features would likely indicate dysfluent performance. Celerations at or below x1.2 are considered practice effects (Kubina & Yurich, 2012). Fluency outcomes typically emerge once frequencies of correct responding stabilize within the fluency aim, so responses with a high degree of variability are unlikely to yield fluency outcomes. Moreover, high rates of errors can also impact the emergence of fluency outcomes, and low error rates are a critical feature of dance performance. Finally, responses occurring at frequencies above 60% of the presumed fluency aim would have little room for growth; celerations naturally flatten as response frequencies approach the optimal frequency.
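Restated as a decision rule (a sketch assuming celeration and bounce are expressed as plain multiplicative ratios, with a hypothetical normative mean):

```python
def baseline_indicates_dysfluency(correct_celeration: float, bounce: float,
                                  high_error_rate: bool,
                                  baseline_freq: float,
                                  normative_mean: float) -> bool:
    """A step stays eligible for training if any criterion holds:
    (a) celeration of corrects at or below x1.2 (a mere practice effect),
    (b) variability (bounce) at or above x2.0,
    (c) relatively high rates of inaccurate steps, or
    (d) frequency no greater than 60% of the normative average."""
    return (correct_celeration <= 1.2 or bounce >= 2.0
            or high_error_rate or baseline_freq <= 0.6 * normative_mean)

# flat celeration and a low baseline frequency -> eligible for training
print(baseline_indicates_dysfluency(1.13, 1.4, False, 130, 258))  # True
```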

Frequency building

Following the baseline phase, the steps chosen for training entered the frequency-building phase in pairs. Table 1 shows the pairs and the order in which they were trained for all participants under the heading “Training Pairs.” During frequency-building sessions, participants completed three 15-s timings for each step in the pair they were currently training on. The instructor provided goals before each timing. If the participant met the goal, the instructor provided performance feedback and praise after the timing. If the participant did not meet the goal, the instructor provided performance feedback and corrective feedback. Performance feedback included the total number of correct responses, the total number of incorrect responses, and a brief description of intertiming performance while looking at the visual display provided by the software. Praise involved short and concise statements (not exceeding five words) delivered in an enthusiastic voice that indicated the learner did well. Some examples of praise statements include “You did it,” “That was awesome,” or “Great dancing.” Corrective feedback involved describing the nature of the learner’s errors, modeling a strategy to correct the error, and providing the learner with an opportunity to rehearse the step again before moving to the next timing. The duration of feedback did not exceed 2 min.

  • Goal setting. The instructor established participants’ goals according to their overall performance in the study. The study used “personal best” goals, which require participants to beat their best performance up to that point in the study (Binder & Sweeney, 2002; Carl Hughes et al., 2007; Lokke et al., 2008; Milyko et al., 2012). Personal best goals could be set for increasing the frequency of correct steps, decreasing the frequency of incorrect steps, or a combination of the two. For example, if a participant was training on windshield wipers and the participant’s best performance up to that point had been 100 sounds per minute with no errors (frequency aim: 234–286 sounds per minute), then the instructor would likely set the goal at 101 sounds per minute for the next timing. If the participant beat that goal in the subsequent timing, then the instructor would use this new frequency to set a new goal for the next timing. The instructor reviewed previous performance on the SCC with each participant at the beginning of every session so that participants were aware of their overall performance and the basis for their goal. Visual inspection of learning pictures guided goal determinations: the instructor would set accuracy-only goals if the participant’s data trends showed accelerations in errors or high variability in errors, and frequency and accuracy goals if data trends showed a deceleration in errors or low frequencies and low variability in errors (see the goal-selection sketch following this list).

  • Goal failures. If a participant failed to meet his or her goal for three consecutive timings (i.e., one session), the instructor would implement an intervention. Interventions included giving a prompt, isolating a component of the failed step(s), shortening the timing length, or revising a goal. The instructor would select the intervention based on the nature of the performance that led to a failed goal. For example, a commonly used prompt for participants who had steady but low rates during timings included presenting an auditory model of the speed and pace required to meet a goal via a metronome. Another typical prompt used in the study involved putting a marking such as a colored sticker on the floor that the participant could use as a target. This prompt helped participants make their steps smaller and hence increase the number of steps they could perform in a timing. If high errors occurred due to a specific component of the step, the component was isolated and practiced in the timings. Once the isolated component reached the performance criterion (typically the same as the aim of the step it was a component of), timings of the target step resumed. If a step occurred at a high rate at the beginning of a timing but slowed down at the end of the timing, or vice versa, or if high and low rates occurred sporadically throughout a timing, the timing length was shortened.

    Removing the intervention was contingent on reaching the performance criterion with the modification in place. Once the participant met the criteria for removing the modification, session contingencies reverted to the original procedure. Therefore, participants had to reach the performance criterion without the modification to move on in the experiment.

  • Performance criterion. The frequency-building phase ended when steps reached and sustained frequencies within the performance criterion (i.e., the frequency range demonstrated by the normative sampling participants) across two or more timings for three consecutive sessions. Once they demonstrated this criterion, trained steps entered the RESA probes to ensure the steps had achieved functional mastery.

    On some occasions, however, steps were moved to the next phase before meeting the performance criterion based on improvements in the weekly probes coupled with flattened celerations within 80% of the performance criterion. Performance aims are estimates of mastery that are based on performers who demonstrate proficiency in the skill and are generally thought of as guidelines (Binder, Haughton, & Bateman, 2002; White, 1984). Learners can demonstrate fluency below or above the specified fluency aim. Ultimately, what defines fluency is the demonstration of fluency outcomes (Binder, 1996; Binder et al., 2002). For this reason, and given our small normative sample size, we made data-based decisions guided by weekly probe data for participants whose performance stabilized below the performance criterion to move forward in the training sequence.

  • Weekly probes. While one pair entered frequency building, the next pair in the training sequence entered weekly probes. For example, when Pair 1 (i.e., toe taps and heel taps) entered frequency building, Pair 2 (i.e., tip steps and dig steps) entered weekly probes. Weekly probes occurred in the first session of the week and consisted of two 15-s timings with no programmed feedback. Weekly probe conditions were exactly like baseline conditions because the purpose was to evaluate growth on untrained responses. All pairs in the training sequence underwent weekly probes except for the first pair.

  • RESA probes. Once a training pair reached the performance criterion, RESA probes followed. The recently mastered training pair would enter retention probes. All steps in the training sequence, including the recently mastered steps, underwent endurance and stability probes, and the application combinations underwent application probes. Retention, stability, and application probes involved two 15-s timings per probe session; the timing length remained the same as in training to avoid confounding timing duration with the variables specific to each probe condition. Retention probes occurred after 2 weeks and 4 weeks with no practice following mastery. Endurance probes involved two 30-s timings per probe session; this timing length was double the training length so that growth in endurance could be evaluated against a clear difference in duration. During stability probes, participants listened, through wireless headphones, to tap dance choreography whose beat differed from that of the step they were performing, and they faced away from the mirror to simulate a performance environment (distracting conditions). During application probes, participants performed the application combinations. There was no programmed feedback of any kind for any of the RESA probes.
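The goal-selection logic described under “Goal setting” above can be summarized in a short sketch (the thresholds and names are illustrative assumptions, not the instructor’s scripts):

```python
def next_goal(best_correct_freq: float,
              errors_accelerating: bool,
              error_variability_high: bool) -> dict:
    """'Personal best' goal selection: accuracy-only goals when errors
    are accelerating or highly variable; otherwise ask for one more
    correct sound per minute than the best performance so far."""
    if errors_accelerating or error_variability_high:
        return {"type": "accuracy-only", "target_errors": 0}
    return {"type": "frequency+accuracy",
            "target_corrects": best_correct_freq + 1,
            "target_errors": 0}

# best performance so far: 100 sounds/min with no errors -> aim for 101
print(next_goal(100, errors_accelerating=False, error_variability_high=False))
```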

Interobserver Agreement and Treatment Fidelity

The experimenter conducted sessions and collected the error data that the software did not collect. The instructor video-recorded all sessions for the purpose of assessing interobserver agreement (IOA). Trained secondary observers viewed videos of the sessions and collected the data independently of the instructor. The study collected IOA for 25% of all sessions per step, per participant. Researchers calculated IOA by dividing the number of exact agreements for each timing by the total number of agreements plus disagreements and multiplying by 100%. The average IOA for each step is as follows: toe taps, 88.8% (range 83.3%–91.7%); heel taps, 84.7% (range 72%–93.3%); tip steps, 97.5% (range 90%–100%); dig steps, 84.2% (range 63.3%–93.3%); shuffles right, 67.1% (range 36.7%–61.3%); shuffles left, 59.1% (range 36.7%–80%); shuffle steps, 71.7% (range 61.7%–76.7%); windshield wipers, 80.4% (range 68.3%–86.7%); Exercise 6, 70% (range 33.3%–100%); and Exercise 8, 52.7% (range 30%–63.3%).

The high degree of variability in IOA for errors is a limitation of this study. It is virtually impossible, however, to manually count responses that occur at extremely high rates and do not leave a permanent product (e.g., responses in the sixth cycle of the SCC, which ranges from 100 to 1,000 per minute). Future research could target methods for improving the measurement of high-rate behavior. Advances in technology may remove this limitation in future studies.

Checklists that included the steps required of the instructor for each type of session (baseline, probe, frequency building) were developed to assess treatment fidelity. Trained observers completed the checklists for 25% of all sessions. On the checklists, “Y” indicated a correct procedural step, “N” indicated an incorrect step, and “N/A” indicated the step was not applicable during the session. Dividing the total number of Y selections by the total number of Y plus N selections and multiplying by 100% yielded a fidelity score for each session. On average, the instructor demonstrated 99.5% fidelity (range 82.2%–100%) regarding the implementation of study procedures.
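Both reliability indices reduce to the same agreement-percentage arithmetic; a minimal sketch with hypothetical counts:

```python
def percent_agreement(agreements: int, disagreements: int) -> float:
    """Exact-agreement IOA: agreements / (agreements + disagreements) x 100."""
    return agreements / (agreements + disagreements) * 100.0

def fidelity_score(y_count: int, n_count: int) -> float:
    """Treatment fidelity: Y / (Y + N) x 100; 'N/A' steps are excluded."""
    return y_count / (y_count + n_count) * 100.0

print(percent_agreement(agreements=25, disagreements=5))  # 83.3...
print(fidelity_score(y_count=44, n_count=1))              # 97.7...
```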

Results

All experimental participants, except for Berry, met inclusion criteria and underwent training for all tap pairs. Berry did not meet the inclusion criteria for the first pair. His frequencies at baseline for these steps were nearly at aim and highly accurate. Therefore, Berry’s training began with the second pair in the training sequence. Each participant generated over 20 charts including training and probe data, an amount too expansive to include in its entirety. We have provided a sample of Tina’s data to demonstrate the design. However, we have included all data generated by weekly probes on untrained steps and RESA probes.

Sample Training Data

Figure 2 shows a portion of Tina’s training data. The panel shows data for four of the eight training steps. Note that each step shown in this panel went through the training sequence with its pair. For example, training for toe taps occurred simultaneously with heel taps, tip steps with dig steps, shuffles right with shuffles left, and shuffle steps with windshield wipers.

Fig. 2

Sample of Tina’s training data. The daily per-minute charts represent Tina’s training data for one half of the training pairs. Dots represent frequencies of correct responses, and x’s represent frequencies of errors. The yellow bands across the charts represent the performance criterion. Horizontal black lines represent celerations. Vertical blue lines represent phase changes, and vertical red lines represent intervention changes

Tina met the inclusion criteria for training for all steps in the training sequence. She had decelerations in the frequency of corrects for heel taps (/1.51), dig steps (/1.31), and shuffles left (/1.27). For windshield wipers, her acceleration of corrects was less than x1.2 (x1.13). Her frequencies were also below 60% of the performance criterion. Heel taps entered frequency building and, after an initial acceleration, met criteria for intervention. The instructor reduced the timing length to 10 s. Once Tina met the performance criterion, the intervention was removed, and heel taps returned to 15-s timings. Retention probes showed frequency of corrects above the performance criterion with no errors.

Dig steps entered weekly probes at the same time as heel taps entered frequency building. Weekly probes for dig steps showed an acceleration of corrects (x1.13) and a deceleration of errors (/1.39). Once dig steps entered frequency building, Tina met the performance criterion within eight sessions without interventions. Retention probes showed maintenance of frequencies above the performance criterion for this step as well.

Shuffles left also showed improvements in celeration of corrects during weekly probes (x1.18) but simultaneously showed increases in errors (x1.13). To avoid shaping errors, the instructor set accuracy-only goals at the beginning of the frequency-building phase, which produced a deceleration of /1.49 in errors. However, Tina experienced several more interventions before she met the performance criterion for shuffles left. Shuffles left took notably longer to reach the performance criterion than previous steps. This pattern was common across all experimental participants for both shuffles left and right. All participants also met the criteria for interventions for both steps.

The data for weekly probes for windshield wipers show an initial level change from baseline to weekly probes. The frequency multiplier from the last data point of baseline to the first data point of weekly probes equals x1.66. After this initial jump up, the celeration of corrects hovered close to the performance criterion, and errors remained low. To demonstrate the design, we have displayed the weekly probes on the daily per-minute chart here. However, during the experiment, the instructor monitored these on weekly per-minute charts. The next section includes all data for weekly probes across participants.

Weekly Probes

Figure 3 shows participants’ celeration collections for weekly probes, and Table 3 shows celeration values. Daisy, Andy, and Tina had the greatest accelerations of frequency of corrects for the second pair in the training sequence (tip steps and dig steps). Additionally, Tina had a substantial deceleration in errors for dig steps (/3). Because Berry started training with tip steps and dig steps, there were no weekly probes for these steps. Daisy and Berry showed very flat celerations for the third pair (shuffles right and shuffles left), and Daisy had an acceleration of x1.52 for errors on shuffles left. Andy only had two probe data points for the third pair, which was not enough to generate celeration values.

Fig. 3

Weekly probes celeration collections. Solid lines indicate the celeration of correct steps, and dashed lines indicate the celeration of errors

Table 3 Celeration Values for Weekly Probes

Tina, on the other hand, demonstrated substantial improvement on the third pair. Her shuffles right and shuffles left both had accelerations of x1.61 for frequency of corrects and /5.21 and /4.26, respectively, for frequency of errors. Andy showed some gains in accuracy for the fourth pair, with decelerations in errors for shuffle steps (/1.49) and windshield wipers (/1.22). Otherwise, there were no other major improvements across celerations for the rest of the participants. However, it is important to note that participants took much longer to master shuffles right and shuffles left than the previous steps, which extended the amount of time shuffle steps and windshield wipers spent in the weekly probe phase, naturally flattening their celerations. Some participants did have jump-ups in frequency from baseline sessions to weekly probes for these steps. Tina’s windshield wipers (as seen previously) were one example of this: she had a x1.66 frequency multiplier from the last baseline data point to the first probe data point. Andy and Daisy both had frequency multipliers of /4 for errors on shuffle steps.

Retention

Figure 4 presents the percentage of change in frequency from the last training data point to the first retention data point for the eight training steps for each participant. Data points falling below the 0 line indicate a decrease in frequency, whereas data points above the 0 line indicate an increase in frequency during the first retention probe. The smaller the difference, the more stability at retention for that step.

Fig. 4

The percentage of change from each participant’s last frequency in training to the first retention probe 2 weeks after termination of training. Data above 0 indicate a percentage increase in frequency, whereas data below 0 indicate a percentage decrease

All four participants demonstrated high levels of retention across trained steps. Daisy, Tina, and Berry exhibited the most stable patterns of retention with the most steps falling above the 0 line, indicating that the change in frequency was an increase from training rather than a decrease. Though Andy and Daisy had the most steps with a loss in frequency at retention, these were all below a 15% loss. In fact, no step across all four participants suffered more than a 14% loss in frequency from the last training frequency to first retention probe frequency.

Stability

Figure 5 shows celeration collections for stability probes for all eight training steps across participants, and celeration values can be found in Table 4. All participants showed an acceleration in the frequency of corrects across all stability probes. With few exceptions, most also experienced decelerations in errors or maintained stability in low rates of errors across steps.

Fig. 5

Stability probes celeration collections. Solid lines indicate the celeration of correct steps, and dashed lines indicate the celeration of errors

Table 4 Celeration Values for Stability Probes

Daisy had decelerations in errors across all steps except for dig steps (x1.49) and windshield wipers (x1.98), and she maintained low errors on toe taps (x1). Andy also maintained low errors across toe taps, heel taps, and tip steps and showed decelerations for dig steps, shuffles left, and windshield wipers. Tina had the steepest accelerations for frequency of corrects overall. Six out of eight steps had accelerations higher than x1.5. However, it is important to point out that her baseline frequencies were substantially lower than those of the other participants, giving her more room to grow toward the aims. Berry most notably had accelerations in errors on various steps (shuffles right, shuffles left, shuffle steps, windshield wipers). These steps had consistent errors throughout Berry’s training as well. Nonetheless, Berry demonstrated decelerations in errors in three steps (heel taps, tip steps, dig steps) and maintained a celeration of x1 for toe taps during stability probes.

Endurance

Figure 6 displays the celeration collections for endurance probes, and Table 5 displays the associated values. All four participants showed improvements in endurance probes. Daisy and Andy showed a very consistent acceleration in the frequency of corrects. Daisy also showed decelerations in the frequency of errors across all steps except two (tip steps and shuffles left). Andy showed acceleration in the frequency of errors across three steps: heel taps (x3.97), shuffles right (x1.26), and shuffles left (x1.27). Tina and Berry started with higher rates of errors during endurance probes. For various steps, both participants showed a deceleration in errors and an acceleration in the frequency of corrects. Specifically, Tina showed this pattern for toe taps, tip steps, dig steps, and shuffles left. Berry showed this pattern for dig steps, shuffles right, shuffles left, shuffle steps, and windshield wipers. Berry’s celeration of errors was x1 for toe taps and heel taps because he never made any errors during these probes.

Fig. 6

Endurance probes celeration collections. Solid lines indicate the celeration of correct steps, and dashed lines indicate the celeration of errors

Table 5 Celeration Values for Endurance Probes

Application

Figure 7 contains celeration collections for application probes, and Table 6 contains the values. Except for one exercise for Tina, all application probes for the four participants showed improvements. Daisy demonstrated the steepest accelerations in the frequency of correct steps and the steepest decelerations in the frequency of incorrect steps. Andy also showed acceleration in the frequency of corrects and deceleration in the frequency of errors, though his were not as steep as those of the other participants. Tina had the shallowest accelerations in corrects (x1.25 for Exercise 6 and x1.08 for Exercise 8) and an acceleration of errors for Exercise 6 (x1.48), yet she showed a sizable deceleration in errors for Exercise 8 (/2.25). Berry also did not demonstrate robust accelerations in the frequency of corrects (x1.33 for Exercise 6 and x1.12 for Exercise 8); however, his starting frequencies were higher than those of the other participants. Nonetheless, he showed the greatest decelerations in errors (/31.7 for Exercise 6 and /4.38 for Exercise 8).

Table 6 Celeration Values for Application Probes

Constant-Series Control

Jack served as the constant-series control for application probes. Researchers yoked his schedule of measurement to Berry’s schedule of probes because the two were of a similar age, had similar histories of sports and music exposure, and had no experience with dance instruction. Figure 7 shows celeration collections for Jack’s constant-series probes beside those of the experimental participants, and Table 6 contains additional measures.

Fig. 7

Application probes celeration collections for each experimental participant and the probes of the constant-series participant (Jack) for the same combinations as application probes. Solid lines indicate the celeration of correct steps, and dashed lines indicate the celeration of errors

Jack showed improvements in the frequency of corrects on both probes. Exercise 6 had an acceleration of x1.95 for corrects and an acceleration of x1.52 for errors, whereas Exercise 8 had an acceleration of x2.01 for corrects and a deceleration of /5.48 for errors. For both combinations, Jack showed greater accelerations in the frequency of corrects than Berry. He also showed greater deceleration of errors for Exercise 8 than Berry. Note that Berry may have experienced ceiling effects because his baseline frequencies for these exercises were in a higher cycle, closer to the performance criterion, making the comparison between Jack’s and Berry’s celeration values difficult.

Because Tina and Jack had similar baseline frequencies, researchers compared their celerations to provide additional insight. Jack had a higher acceleration of frequency of corrects than Tina on both combinations. Both Tina and Jack had accelerations in errors for Exercise 6, but Tina had a lower celeration value than Jack (x1.48 vs. x1.52) and maintained fewer errors compared to Jack. Tina’s errors did not exceed 8 per minute, whereas Jack’s were as high as 20 per minute. Both had decelerations for Exercise 8. Jack’s deceleration of errors (/5.48) was steeper than Tina’s (/2.25). There is also a limitation to this comparison because Tina and Jack did not have the same schedule of probes, which also impacts celeration values. Tables 7 and 8 show several measures for Exercises 6 and 8 for experimental participants and Jack that can provide a basis for further comparison.

Table 7 Comparison Metrics for Experimental Participants and Control Participant: Exercise 6
Table 8 Comparison Metrics for Experimental Participants and Control Participant: Exercise 8

Despite his improvements, Jack never reached 100% accuracy on either combination. All experimental participants had greater accuracy ratios than Jack for both Exercises 6 and 8. With the exception of Tina on Exercise 6 and Andy on Exercise 8, all other first-last frequency multipliers for experimental participants showed greater deceleration of frequency of errors than Jack’s. These measures show that in the majority of cases, experimental participants gained greater degrees of accuracy than Jack. Berry, Daisy, and Andy had greater improvement indices for Exercise 6 than Jack. However, he outperformed all experimental participants on this measure for Exercise 8.

Discussion

The main purpose of this study was to evaluate fluency outcomes as a function of building the frequencies of component responses within a tap dance training sequence. Weekly probes served to monitor potential improvements of untrained components. Results showed some facilitative effects during weekly probes, primarily for the second pair in the sequence (i.e., tip steps and dig steps). Tina also demonstrated notable improvements in celeration for the third pair (i.e., shuffles right and shuffles left) during weekly probes, suggesting some facilitative effects of training the previous pair. The results of the retention probes also indicated that retention emerged as a function of training steps to optimal rates; retention probes showed little decay in performance after 2 weeks with no practice. Stability and endurance also seemed to have emerged for most steps across participants. General trends for stability and endurance probes showed accelerations in the frequency of corrects and decelerations of errors. Application is more difficult to interpret given the results of the control participant (i.e., Jack). Though the experimental participants all demonstrated gains in application probes, so did Jack. In most instances, he outperformed experimental participants in terms of celeration. Experimental participants, however, did outperform Jack in terms of accuracy. Anecdotally, the experimental participants also showed unmeasured improvements in the aesthetic features of the application combinations, including clarity of sound, volume of sound, range of motion of the feet, and posture, which are important in the performing arts. Nevertheless, there were several differences between the control participant and the experimental participants that may have presented confounds, making comparisons difficult.

There were several uncontrolled variables in the constant-series condition that impacted celeration values and made the comparison inequitable. First, the control participant’s frequencies resided in a lower cycle of the SCC than those of his experimental counterpart. Due to the SCC’s logarithmic nature, celeration values decrease in higher cycles even when the absolute change is equal (e.g., a gain of 10 responses per minute is a x2.0 change from 10 to 20 per minute but only a x1.1 change from 100 to 110 per minute). This makes it difficult to compare growth across participants if their frequencies are not in the same cycle. Second, the control participant experienced probes within a shorter period than did all the experimental participants except his experimental counterpart, who shared his probe schedule. This also impacts celeration values on the SCC and, again, does not allow for a fair comparison with the experimental participants who did not share the control participant’s probe schedule. Future studies can improve on these limitations by setting fixed probe schedules (e.g., monthly or biweekly) and ensuring that control participants and comparison experimental participants have similar baseline frequencies.

A secondary purpose of this study was to provide an example of how one can apply the PT framework to constructing, monitoring, and perfecting instructional sequences for a variety of repertoires, including those of interest in sports. This study was a first attempt at building a training sequence for tap dancers based on a component-composite analysis. Though the study did not produce an ideal training sequence, it did yield important discoveries regarding fluency outcomes. It also provides an example of how a PT approach to training can reveal inadequacies in the instructional sequence. For example, participants took substantially longer to reach the performance criterion for one of the pairs, which highlights where a change should be made. Perhaps breaking down these steps into smaller components would have helped participants master them faster. This may have also facilitated improvements in untrained components before they reached the frequency-building phase. These are the types of speculations that are prompted when precision teachers are continuously engaged with the data of their learners. The process of pinpoint, record, change, and try again leads precision teachers to discoveries regarding the effectiveness and efficiency of their instructional arrangements. The main contribution of PT to sports and any other area is a process by which one can systematically and continuously evaluate and improve training using a sensitive dimensional unit of measurement (rate of response) and a standard real-time display (SCC) that allows for effective data-based decision making in service of learners.

Limitations aside, the present study contributes to the literature on PT and fluency instruction in several ways. First, it adds to the small yet growing body of evidence regarding sports performance in general and dance instruction in particular. Second, it adds to the PT literature on psychomotor learning, which is limited not only in size but also in the types of behaviors examined to date. It is also among a handful of studies that have examined a repertoire of behavior that resides in the sixth cycle of the SCC. Most of the published research in PT has involved behaviors that reside within the fourth and fifth cycles, which equate to approximately 1 to 100 responses per minute. There is comparatively less research on responses, such as tap dancing, that can occur above 100 responses per minute or below 1 response per minute. Finally, the use of a software program to count dance steps is the first of its kind. With refinement, this tool may be useful as an electronic application for providing objective, rate-based feedback to tap dancers and instructors alike.

We encourage future research to focus on exploring the basic behaviors that make up complex gross and fine motor movements and the order of training that yields the most efficient learning and maintenance, both in sports and in other motor movement domains. PT provides a model for instructional design that focuses on building behavioral repertoires fluently from the ground up with an emphasis on continuous progress monitoring and data-based adjustments (Johnson & Street, 2013). Dance and sports training, in general, can benefit from approaching behavioral intervention from the instructional design perspective of PT. This is a prime opportunity to investigate whether PT and fluency-based instruction can impact sports performance as it has education. Though this study and the few that came before it are a start, there is much work to be done before we understand what fluency means when it comes to motor behavior and how we can capitalize on this knowledge.