Introduction

The number of individuals diagnosed with autism has increased greatly in recent years. A review by the Centers for Disease Control and Prevention’s Autism and Developmental Disabilities Monitoring (ADDM) Network found that in the 8-year span between 2002 and 2010, the number of 8-year-old children diagnosed with autism increased by 120 %, from approximately 1 in 150 children in 2002 to 1 in 68 children in 2010 (Wingate et al. 2014). Accompanying the increase in the number of children diagnosed is an increased demand for services to assist these individuals, who frequently display difficulties with language and other learning skills. A review of current autism treatments conducted by the National Autism Center (2009) found that the vast majority of established autism treatments (i.e., those that are effective and empirically supported) were developed from behavioral interventions, which also had the most empirical support.

Applied behavior analytic (ABA; Baer et al. 1968) treatments have been proven to be effective at minimizing or even correcting deficits experienced by individuals with autism (Lovaas 1987; Foxx 2008). Although these behavior analytic methods are frequently incorporated in both instructional and behavior plans, a structured curriculum incorporating behavior analytic technology may be a more effective and efficient way to implement this type of therapy in some applied settings. Although many protocols rooted in ABA have been developed and are currently being implemented in applied settings, few of these protocols have published reliability, validity, or effectiveness data to date (Gould et al. 2011).

One example of psychometric testing designed to assess the validity of an assessment or curriculum is establishing normative references for the assessment. In the case of a test designed to be used with individuals with developmental disabilities, a normative sample would include typically developing individuals of the same age range. The use of normative samples allows for comparisons that indicate how one individual’s performance on a test compares with the performance of typically developing peers. These outcomes can be used both to assess typical development for its own sake and to establish a reference for comparison to identify deficits in performance among various subpopulations (Gregory 2011). For instance, the use of a normative comparison can allow an assessor to see how much progress an individual has made following intervention, and how close they are to attaining the requisite skills to demonstrate a repertoire that is equivalent to that of a typically developing peer (Walker and Hops 1976). Despite the large number of available behavior analytic assessment tools, to date only one has published any normative data in a peer-reviewed journal (Dixon et al. 2014b), highlighting the need for further analysis of current behavior analytic tools.

One important feature of successful behavior analytic technologies that aim to bring the skills of individuals with disabilities closer to those of their typically developing peers is the use of environmental variables to promote the acquisition of language and learning skills, which can in turn improve the effectiveness of teaching situations (Dixon et al. 2014b). The Promoting the Emergence of Advanced Knowledge Relational Training System (PEAK; Dixon 2014a, b, 2015) was designed to encompass these necessary features and technologies to provide individuals working with developmentally disabled populations an assessment and curriculum guide aimed at promoting cognitive and language development. PEAK consists of a total of four modules, each containing 184 programs ordered from least to greatest complexity. These modules (direct training, generalization, equivalence, and transformation) focus on training verbal relations ranging from simple vocalizations to complex language (e.g., metaphorical responding, problem-solving, and labeling emotion) in accordance with Skinner’s (1957) account of verbal behavior, as well as other academic and cognitive skills. In addition to focusing on a wide array of abilities, the modules incorporate several different methodologies from the behavior analytic literature such as discrete-trial training, stimulus equivalence (Sidman 1971), and relational frame theory (Hayes et al. 2001). The first of the four modules, the PEAK Direct Training Module (Dixon 2014a), emphasizes training language and learning skills via discrete-trial teaching. This module focuses on directly training skills by providing reinforcement for correct answers and correction for incorrect answers.

Previous research on the PEAK Direct Training Module has yielded promising results. Correlational analyses between the PEAK Direct Training Assessment tool and other standardized language assessments and intelligence tests have consistently yielded strong and significant correlations indicating good convergent validity (Dixon et al. 2014c, d; McKeel et al. 2015b). Further research on the PEAK Direct Training Module has investigated its underlying structure and internal consistency using a principal component analysis. The analysis indicated that the PEAK Direct Training Module can be broken down into four factors: foundational learning skills, perceptual learning skills, verbal comprehension skills, and verbal reasoning, memory, and mathematical skills (Rowsey et al. 2014). Along with measures of reliability and validity, the PEAK Direct Training Module has been demonstrated to be an effective training tool. McKeel et al. (2015c) used a single-subject design to demonstrate the acquisition of several complex verbal skills in individuals with autism using the PEAK Direct Training Module. In a larger group study, McKeel et al. (2015a) utilized a randomized controlled trial to assess the relative efficacy of the PEAK Direct Training Curriculum as compared to students’ typical special education classroom curriculum in increasing PEAK Direct Training Assessment scores and found that instruction using the PEAK Direct Training Curriculum was more effective. When compared with other behavior analytic assessment tools such as the VB-MAPP (Sundberg 2008), it appears that although both tools are effective, the PEAK system may provide a stronger measure of more complex and advanced language and learning skills (Dixon et al. 2014a). 
Taken together, these results indicate that the PEAK Direct Training Module is both valid and reliable as an assessment tool, and that the curriculum provides an effective procedure to train skills to individuals with autism who may have deficits in language and learning abilities.

To improve the psychometric strength of the direct training module further, Dixon et al. (2014b) developed a normative sample. They acquired PEAK Direct Training Assessment scores from 206 typically developing individuals and compared those scores with a sample of 94 individuals with autism. For their first study, Dixon et al. (2014b) administered the PEAK Direct Training Assessment to each of the typically developing participants. The total scores of these assessments were then compared with the age and gender of the participants using both visual and statistical analyses. Subsequent testing of the normative sample included assessing the distribution of total PEAK scores across two-year age groups and assessing the age at which 80 % of participants within a two-year age group were able to demonstrate mastery of each of the 184 items in the PEAK Direct Training Assessment. The results indicated a nonlinear relationship between PEAK Direct Training Assessment total score and age whereby a rapid acquisition of language and cognitive skills occurs before the age of eight in typically developing children. After age 8, a ceiling effect was discovered for the normative sample as most of the sample above the age of 8 had a maximum score on the PEAK Direct Training Assessment. In their second study, Dixon et al. (2014b) administered the PEAK Direct Training Assessment to each of the participants with autism. As with the normative sample, the results of the assessment for the sample with autism were then compared to the participants’ age and gender. In addition, the total PEAK Direct Training Assessment scores were grouped into eight groups of 23 programs (0–23, 24–46, …, 162–184) and the percent of participants in both the normative sample and the sample with autism were analyzed to determine the distribution of PEAK Direct Training Assessment total scores in both groups. 
Contrary to the findings in the normative sample, the sample with autism did not display an orderly relationship between PEAK Direct Training Assessment scores and age, nor was the same ceiling effect demonstrated. The results of the analysis of the distribution of total PEAK Direct Training Assessment scores for both samples indicated that the majority of the sample with autism demonstrated mastery of between 0 and 23 items; however, the majority of the normative sample demonstrated mastery of between 162 and 184 items. Overall, the results of the two studies indicated that the normative analysis was successful in both identifying the abilities of a typically developing population as they relate to participants’ ages and providing a method for comparing the performance of individuals with autism to their typically developing peers.

Generalization is an important feature of human learning and is one of the core features of ABA. The term generalization can be used both as a description of a behavioral process and as a term for a behavior change procedure (Cooper et al. 2007). Cooper and colleagues described three types of generalized behavior change: response maintenance, setting/situation generalization, and response generalization. An example of response maintenance is when an individual learns to use a tool to perform a task and then continues to do so when the training has been terminated. An example of setting/situation generalization is when the learner acquires a skill in one setting (e.g., the skill to operate an airplane in a simulator) and is able to apply that skill in a novel setting (e.g., the learner then uses that training to fly an actual airplane). Finally, response generalization is when the learner engages in a behavior that has not been specifically trained before and that provides the same outcome as the trained behavior. An example of this is when a person learns to take notes on ideas that she has, and then switches to dictation. Researchers have successfully demonstrated a variety of different methods to promote generalization (for a review, see Chandler et al. 1992), though few curricula provide a structured method of developing a generalized repertoire in learners, and the lack of normative samples for generalization makes it difficult to investigate the developmental trajectory of the acquisition of generalization skills (Bailey and Burch 2002). In addition to determining the manner in which generalization skills emerge in a typically developing population, normative comparisons can allow clinicians to determine when it is appropriate to move from directly training skills to instruction geared toward promoting a repertoire that includes generalized operant responding.

Though the importance of generalization is well known, current research on the PEAK Relational Training System has only investigated normative samples of the Direct Training Module. In order to assist clinicians in providing the most effective and efficient treatment, the development of a normative sample for the generalization module is warranted. The purpose of the current research was to replicate and extend the study by Dixon et al. (2014b) by conducting two studies in which the first study developed a normative sample for the PEAK Generalization Module and the second study compared a sample of individuals with autism to the normative sample based on their scores on the PEAK Generalization Assessment.

Experiment 1: Normative Sample of the PEAK Generalization Module

Materials and Methods

Participants

The participants in the current study included 183 children (98 males, 85 females) with no previous diagnoses of autism or any other developmental or intellectual disabilities. The participants were recruited by graduate students at Southern Illinois University based on a priori relationships (i.e., the students recruited children from friends, family, or other associates with whom they had previously interacted). All children that the recruiters knew were given the opportunity to participate. The participants ranged from 1 to 21 years of age (M = 8.38, SD = 4.39) and were located across several US states. Table 1 lists the number of participants by age group.

Table 1 Descriptive statistics for PEAK total scores across age groups

Materials

The PEAK Generalization Assessment was used as a measure of language and learning skills. The assessment consists of 184 items covering a wide continuum of skill difficulties including prerequisite learning skills (e.g., sharing, turn-taking), vocal skills (e.g., imitation of sounds and words), writing skills (e.g., writing spoken words, writing letters), math skills (e.g., basic counting, addition), problem-solving (e.g., balancing weights, packing a container), and advanced verbal skills (e.g., metaphors, rhyming). Each item requires that the participant be able to identify novel items without direct training. For example, if a participant could identify a picture of a truck, several pictures of trucks would be tested to ensure that the participant is able to generalize the tact of “truck.” The assessment itself contains labels and descriptions of each of the 184 items with checkboxes next to each item to indicate that “yes,” the child can perform the required response, or “no,” they cannot. The time to complete the assessment ranged from 15 to 70 min due to the varying abilities of the individuals being observed. That is, individuals who had more advanced skills took longer to assess than those who had fewer skills for the assessor to rate.

Procedure

The PEAK Generalization Assessments were conducted by trained graduate students who were familiar with the participant being assessed, as recommended in the PEAK Generalization Instructions (Dixon 2014b). Assessors evaluated each of the participants by first indirectly scoring the skills which they knew the participant could or could not do based on previous experience and observation. Next, the assessor would directly test any skill that they were not sure the participant could demonstrate. Testing consisted of following the instructions from the PEAK Generalization Curriculum for the specific skill in question. For example, if the assessor was uncertain whether the participant could demonstrate measuring for recipes, they would provide the participant with a recipe and an array of measuring objects. If the participant was able to independently complete the skill, the assessor would score a “yes” on the assessment; otherwise, the assessor would score a “no.” If the assessor did not respond for a certain item (i.e., they did not score a “yes” or “no”), then the skill was scored as a “no” for the purpose of data analysis. Finally, the total number of items scored as “yes” was tallied to yield a total PEAK Generalization Assessment score ranging from 0 to 184. Prior to the onset of the assessment, participants were offered preferred items or activities which would serve as potential reinforcers following the completion of the assessment session. In addition, children were provided intermittent reinforcement in the form of verbal praise or small preferred items for participation; however, this reinforcement was not contingent in any way on performance on the assessment itself. The schedules of reinforcement for each participant varied based on their individual ability to sit and attend for extended durations.
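
The scoring rule described above can be illustrated with a short sketch. This is not the authors' scoring software; the function name `total_peak_score` and the use of `None` to encode an unscored item are assumptions made only for this example.

```python
from typing import Optional, Sequence

N_ITEMS = 184  # number of items on the PEAK Generalization Assessment


def total_peak_score(responses: Sequence[Optional[bool]]) -> int:
    """Tally a total score from item-level responses.

    True  = assessor scored "yes"
    False = assessor scored "no"
    None  = item left unscored (hypothetical encoding); counted as "no"
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    # Only explicit "yes" responses contribute to the total.
    return sum(1 for r in responses if r is True)


# Hypothetical protocol: 100 "yes", 80 "no", and 4 unscored items.
example = [True] * 100 + [False] * 80 + [None] * 4
print(total_peak_score(example))  # 100
```

Treating unscored items as "no" makes the total a conservative estimate of the participant's repertoire.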

Data Analyses

The Relationship Between PEAK and Age

Several statistical and visual analyses were conducted to assess the relationship between the participants’ PEAK Generalization Assessment scores and age. First, Pearson’s correlation coefficients were calculated to assess the relationship between PEAK Generalization Assessment score, age, and gender. Although the primary relationship of interest was the relationship between the PEAK score and age, gender was included in the analysis to assess whether or not it acted as a moderating variable. Second, a scatterplot was constructed to visually assess the relationship between age and PEAK Generalization Assessment score. Regression analyses were then conducted to fit several functions to the scatterplot and to determine the trajectory of skill acquisition as age increased (see Table 2). The regression analyses were then compared based on their respective goodness of fit and statistical significance. Finally, a one-way ANOVA was conducted to determine whether a statistically significant difference in PEAK Generalization Assessment scores for various ages existed based on the regression which demonstrated the best fit to the data.
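
As a rough illustration of the first step, the sketch below computes a Pearson correlation coefficient from raw data. The data values are hypothetical, and the 0/1 coding of gender is an assumption made for the example; with a dichotomous variable coded this way, the Pearson coefficient is the point-biserial correlation.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: ages in years, total scores, and gender coded 0/1.
ages = [2, 4, 6, 8, 10, 12, 14, 16]
scores = [30, 70, 95, 130, 160, 175, 183, 184]
gender = [0, 1, 0, 1, 1, 0, 1, 0]

print("score vs. age:   ", round(pearson_r(ages, scores), 3))
print("score vs. gender:", round(pearson_r(scores, gender), 3))
```

A single linear coefficient can understate a curved relationship, which is one reason to follow it with a scatterplot and nonlinear fits.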

Table 2 Summary of regression lines and their goodness of fit when applied to the relationship between score on the PEAK Generalization Assessment and age

PEAK Distribution Within Age Groups

To compare the distribution of PEAK Generalization Assessment scores within age groups, the participants were sorted into 2-year age groups (e.g., 1–2, 3–4, 5–6) to ensure that the number of participants in each group was sufficient to allow statistical comparison. The final group, which comprised participants aged 19 years and older, spanned more than 2 years to ensure that the number of participants in this group was comparable to the other groups. For each group, the total PEAK Generalization Assessment scores were assessed by looking at the group’s mean, standard deviation, kurtosis, and skewness to determine the ages which benefit the most from the assessment and curriculum tools of the PEAK Generalization Module.
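
A minimal sketch of this grouping and of the descriptive statistics follows. The helper names and the data are hypothetical. Skewness and kurtosis are computed here with the simple moment formulas (kurtosis reported as excess kurtosis, so a normal distribution scores 0); this is an assumption, and statistical packages that apply small-sample corrections will return somewhat different values.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def age_group(age: int) -> str:
    """Map an age in whole years to a 2-year band; ages 19 and up share one band."""
    if age < 1:
        raise ValueError("age must be at least 1")
    if age >= 19:
        return "19+"
    lower = age if age % 2 == 1 else age - 1
    return f"{lower}-{lower + 1}"


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Mean, sample SD, and moment-based skewness and excess kurtosis."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    # Central moments with an n denominator.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sd = math.sqrt(m2 * n / (n - 1))  # sample SD (n - 1 denominator)
    if m2 == 0:
        skewness = kurtosis = float("nan")  # undefined when all scores are equal
    else:
        skewness = m3 / m2 ** 1.5
        kurtosis = m4 / m2 ** 2 - 3.0
    return {"mean": mean, "sd": sd, "skewness": skewness, "kurtosis": kurtosis}


# Hypothetical (age, total score) records.
records = [(1, 20), (2, 45), (2, 38), (7, 120), (8, 150), (8, 131), (19, 184), (21, 182)]
by_group = defaultdict(list)
for age, score in records:
    by_group[age_group(age)].append(score)

for group, scores in by_group.items():
    print(group, {k: round(v, 2) for k, v in describe(scores).items()})
```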

PEAK Item Analysis Across Age Groups

To evaluate how specific skills develop in relationship to age in a typically developing population, an item analysis across age groups was conducted. The participants were split into the same age groups as in the examination of the PEAK distribution within age groups. For each age group, the percentage of group members who demonstrated mastery of each skill on the PEAK Generalization Assessment was calculated. Next, the earliest age at which greater than 80 % of participants demonstrated the skill was recorded, and each item was categorized accordingly. The criterion of 80 % was selected to ensure a stringent criterion which also accounted for potential outliers. By providing information on the age at which each skill develops, this assessment was intended to allow practitioners to evaluate their clients’ abilities as compared to the normalized sample for specific skills.
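
The item-level criterion can be sketched as follows. The function name, the group labels, and the data layout are assumptions made for illustration. The comparison is strict (greater than 80 %), following the description above.

```python
from typing import Dict, Optional, Sequence

# Age bands in developmental order (labels are illustrative).
AGE_GROUPS = ["1-2", "3-4", "5-6", "7-8", "9-10",
              "11-12", "13-14", "15-16", "17-18", "19+"]


def earliest_mastery_group(
    mastery_by_group: Dict[str, Sequence[bool]],
    threshold_pct: int = 80,
) -> Optional[str]:
    """Return the first age band in which more than threshold_pct percent
    of participants demonstrated the item, or None if no band qualifies."""
    for group in AGE_GROUPS:
        results = mastery_by_group.get(group, [])
        if not results:
            continue  # no participants in this band
        mastered = sum(1 for r in results if r)
        # Integer arithmetic avoids floating-point edge cases at exactly 80 %.
        if mastered * 100 > threshold_pct * len(results):
            return group
    return None


# Hypothetical results for one item (True = participant demonstrated the skill).
item_results = {
    "1-2": [True, False, False, False, False],  # 20 %
    "3-4": [True, True, True, True, False],     # exactly 80 %: not above criterion
    "5-6": [True, True, True, True, True],      # 100 %
}
print(earliest_mastery_group(item_results))  # 5-6
```

Repeating this for each of the 184 items assigns every item to the age band at which it first meets the criterion.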

Results

The Relationship Between PEAK and Age

A Pearson’s correlation analysis was conducted to determine the relationships between PEAK Generalization Assessment scores, age, and gender. The mean PEAK Generalization Assessment score was 133.39 (SD = 50.77), the mean age was 8.38 (SD = 4.39), and there were 98 males (53.55 %) across all participants. The results of the Pearson’s correlation analysis indicated a strong positive correlation between PEAK Generalization Assessment scores and age (r = .753, p < .01), but no significant correlations between age and gender (r = −.065, p = .381) or between PEAK Generalization Assessment scores and gender (r = −.054, p = .468), suggesting that gender had no effect on the outcome of the PEAK assessment. Figure 1 displays a scatterplot of the relationship between PEAK Generalization Assessment scores and age. A visual analysis of the scatterplot confirms a positive relationship between the two variables such that as age increased, so did PEAK Generalization Assessment scores until approximately the age of 13, at which point there is an apparent ceiling effect: nearly all of the participants in that age range appear to have gained all of the skills tested in the PEAK Generalization Assessment.

Fig. 1 Relationship between total PEAK Generalization Assessment score and age for the normalization sample

Several regression lines, both linear and nonlinear, were fit to the scatterplot to determine the trajectory of the participants’ scores on the PEAK Generalization Assessment. Table 2 presents a summary of the various regression lines including their associated R-square values and the significance of their fit. Although all of the attempted regression lines produced a statistically significant fit (p < .01), a visual inspection of the data indicates that the relationship between PEAK Generalization Assessment scores and age is not linear. This is reflected in the R-square values, where the cubic regression produced the highest value and the linear regression produced the lowest. Using this information, an ANOVA was conducted based on the cubic regression, with the results indicating that PEAK Generalization Assessment scores differed significantly as a function of age [F(3, 179) = 143.223, p < .001].
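
The comparison of fits can be sketched with ordinary least-squares polynomial regression. The code below is an illustration on hypothetical data, not a reproduction of the reported analysis; the function names are assumptions. The helper `f_statistic` shows how an overall F value follows from R-square, with 3 model terms and 183 − 3 − 1 = 179 residual degrees of freedom for a cubic fit to 183 participants.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; too few distinct x values?")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def poly_r_squared(x: Sequence[float], y: Sequence[float], degree: int) -> float:
    """R-square of a least-squares polynomial fit of the given degree."""
    mean_x = sum(x) / len(x)
    spread = (max(x) - min(x)) or 1.0
    # Centre and scale x for numerical stability; R-square is unaffected.
    z = [(v - mean_x) / spread for v in x]
    k = degree + 1
    # Normal equations (V^T V) beta = V^T y for the Vandermonde matrix V.
    vtv = [[sum(zi ** (i + j) for zi in z) for j in range(k)] for i in range(k)]
    vty = [sum((zi ** i) * yi for zi, yi in zip(z, y)) for i in range(k)]
    beta = solve_linear_system(vtv, vty)
    fitted = [sum(bj * zi ** j for j, bj in enumerate(beta)) for zi in z]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def f_statistic(r_squared: float, n: int, n_terms: int) -> float:
    """Overall F for a regression with n_terms predictors and n observations."""
    return (r_squared / n_terms) / ((1.0 - r_squared) / (n - n_terms - 1))


# Hypothetical data with a ceiling at the maximum score of 184.
ages = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
scores = [15, 30, 48, 70, 85, 100, 118, 135, 155, 170, 176, 180, 183, 184, 184, 184]
for degree, label in [(1, "linear"), (2, "quadratic"), (3, "cubic")]:
    print(label, round(poly_r_squared(ages, scores, degree), 4))
```

Because the models are nested, R-square cannot decrease as the degree rises, so a higher value alone does not establish that the cubic form is the better description; visual inspection and significance testing carry part of the argument.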

PEAK Distribution Within Age Groups

Figure 2 displays the results of the PEAK Generalization Assessment scores by age group. The boxplot displays the means, standard deviations, standard errors, and outliers for each respective age group. A visual inspection of the means indicates that as age increases, so do PEAK Generalization Assessment scores. In addition, as age increases, the variability (standard deviations and standard errors) also seems to decrease, indicating that younger children may display more varied skill repertoires which converge as they get older. In addition to the boxplot, Table 1 summarizes the means, standard deviations, kurtoses, and skewness for each of the age groups. The lowest mean was associated with the 1–2-year-old age group (M = 34.26, SD = 23.47) and the highest standard deviation was associated with the 7–8-year-old age group (M = 133.62, SD = 39.95). The group demonstrating the highest mean and lowest standard deviation was the group of ages 19 and up (M = 183, SD = 1.73). The kurtoses of the age groups ranged from −1.59 to 6.49; however, the large kurtosis scores only occurred in higher age groups. Generally, kurtosis scores between −2 and 2 are representative of a normal distribution, whereas scores outside of this range indicate that data may be clustered along one of the tails of the distribution. Both the 9–10- and 13–14-year age groups had kurtoses of around 2 (2.03 and 1.96, respectively); however, it was not until ages 15 and above that kurtoses became more extreme, with the 15–16-year age group having a kurtosis of 4 and the 17–18-year age group having a kurtosis of 6.49. This may reflect the previously observed ceiling effect on PEAK Generalization Assessment scores as age increases. The skewness of the age groups was primarily negative, with only the 1–2-year age group having a positive skewness. As with kurtosis, skewness scores between −2 and 2 represent a generally normal distribution. Again as with kurtosis, only the older groups had high skewness, and only the 17–18-year age group had a skewness outside of the −2 to 2 range (−2.53).

Fig. 2 Boxplot of total PEAK Generalization score by age group. Horizontal lines represent the mean PEAK Generalization score, boxes represent 1 standard deviation, and whiskers represent 1 standard error. Circles represent scores that fall 2 standard deviations from the mean, and asterisks represent scores that fall greater than 2 standard deviations from the mean

PEAK Item Analysis Across Age Groups

Figure 3 displays the cumulative number of skills acquired at each age group, and a list of the individual PEAK Generalization Assessment items which were acquired at each age group is provided in Table 3. For the 1–2-year age group, mastery of only four items was demonstrated by 80 % or more of the participants. These included generalized motor imitation, peer-modeled echoic play, creativity: dance, and responding to facial expressions. The 3–4-year age group demonstrated mastery of an additional 35 skills, for a cumulative total of 39 skills, representing an 875 % increase in the skills acquired during this age range. These skills included tacting nonidentical animals, asking what, receptively identifying nonidentical clothing, editing actions: correcting others, intraverbals: small talk, and others. The 5–6-year age group demonstrated mastery of an additional 26 skills for a cumulative total of 65 skills that participants were able to demonstrate mastery of by the age of 6. These skills ranged from math: sorting and counting by group, matching numbers and letters, and sounding out letters within a word to asking how, intraverbally identifying words by function, and transcribing past and future events. The 7–8-year age group demonstrated mastery of 17 additional skills. These skills included identifying who can see an item based on perspective, tacting pictures following a delay, and requesting additional items when presented with a task and given insufficient materials. The largest increase in the number of skills for which an age group demonstrated mastery occurred with the 9–10-year age group, who demonstrated mastery of an additional 70 skills (an increase of 85.37 % of the cumulative total for all previous age groups) for a cumulative total of 152 skills mastered by the age of 10. These skills included items such as identifying sarcasm, receptively identifying sensory feelings, and simple action metaphors. The 11–12-year age group demonstrated mastery of an additional 14 skills including weight measurement, math and time, and intraverbal: rhyming poems. The 13–14-year age group demonstrated mastery of a further 15 skills for a cumulative total of 181 skills (98.37 % of the total PEAK Generalization Assessment skills) by the age of 14. These skills included measuring with objects, pronouncing digraphs, and labeling music genres. Finally, the 15–16-year age group demonstrated over 80 % mastery of the final 3 skills, which included labeling directions with a compass rose, varying degrees of measurement, and problem-solving. For a complete list of the items referenced in Table 3, refer to the PEAK Generalization Assessment (Dixon 2014b).
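
The running totals in this paragraph can be checked with a few lines of arithmetic. The per-group counts below are the numbers of newly mastered items reported in the text; the variable names and the helper `pct_increase` are introduced only for this illustration.

```python
from itertools import accumulate

TOTAL_ITEMS = 184
age_groups = ["1-2", "3-4", "5-6", "7-8", "9-10", "11-12", "13-14", "15-16"]
# Items newly meeting the mastery criterion in each age group, as reported.
newly_mastered = [4, 35, 26, 17, 70, 14, 15, 3]

cumulative = list(accumulate(newly_mastered))


def pct_increase(new_items: int, previous_total: int) -> float:
    """New items as a percentage of the cumulative total before this group."""
    return 100.0 * new_items / previous_total


for group, new, total in zip(age_groups, newly_mastered, cumulative):
    share = 100.0 * total / TOTAL_ITEMS
    print(f"{group:>5}: +{new:<2} cumulative {total:>3} ({share:.2f} % of all items)")

# The 9-10-year group adds 70 items to the 82 mastered by the age of 8.
print(round(pct_increase(70, cumulative[3]), 2))  # 85.37
```

The cumulative series reaches all 184 items with the 15–16-year group, consistent with the ceiling described for the older participants.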

Fig. 3 Cumulative number of PEAK Generalization items where >80 % of participants demonstrated mastery of the item across age groups

Table 3 PEAK Generalization items that were completed by >80 % of each age group

Discussion

The present study provides an initial assessment of normative scores on the PEAK Generalization Assessment. This information is an initial attempt to allow practitioners to compare the score of any individual they assess to the scores of typically developing peers. This can help assessors identify deficits in the repertoire of the individual who is being assessed, which may aid in the creation of individualized plans that target the appropriate skills based on the existing deficits. In order for practitioners working with individuals with disabilities to accurately assess both progress and current functioning level, it is vital that we, as a field, understand the manner in which these cognitive and language skills develop in a typical child. The findings of the initial investigation into the relationship between scores on the PEAK Generalization Assessment and age indicated a strong positive correlation such that as age increases so do scores on the assessment. This implies that higher scores on the PEAK Generalization Assessment indicate a more advanced skill repertoire in the individuals being assessed. The cubic nature of this relationship indicates that children do not typically learn at a steady rate as they age; rather, as age increases, so does the rate of learning. As can be seen in Figs. 2 and 3, between the ages of 7 and 10 there is a large increase in the number of mastered skills. Furthermore, this relationship indicates that after the age of approximately 13, mastery of the skills assessed within the PEAK Generalization Module is readily demonstrated by the normative sample. This potentially indicates that the acquisition of generalized skills may be such that the rate of emergence of a generalized repertoire of responding increases as more generalized skills are learned. 
That is, as an individual gains more generalized skills, the ability to obtain novel skills which require generalization may itself increase in regard to rate of acquisition.

The psychometric properties of the PEAK Generalization Assessment such as skewness and kurtosis indicate that scores are generally normally distributed prior to age 13; thus, the PEAK Generalization Module may serve as a useful assessment tool for individuals who function at an age level less than 13 years.

The data from the analysis of PEAK Generalization Assessment items as mastered by age group indicate the specific skills which typically developing children in each group may be expected to display. This information could potentially help inform individualized instructional programs such that performance by individuals with developmental disabilities such as autism could be compared to the performance of a sample of typically developing peers and skills which are demonstrated in the typical sample but not by the individual assessed could be included as targeted skills for instruction.

As a whole, these findings describe the performance of a normative sample of children on a behavior analytic assessment tool, allowing for a tentative comparison between typically developing children and children with autism or other developmental disabilities. These findings are also very similar to the findings of Dixon et al. (2014b), who investigated the scores of a normative sample on the PEAK Direct Training Assessment. In their analysis, the authors also found that the relationship between scores on the assessment and age was cubic and that a ceiling effect appeared to occur following a rapid period of acquisition. However, in the Direct Training Module, this ceiling effect occurred around the age of 8 years, whereas the ceiling effect for the Generalization Module appears to occur around the age of 13 years. It is possible that a generalized learning and language repertoire only emerges following mastery of directly trained skills; however, this remains an empirical question. Both studies indicate that the PEAK Relational Training System appears to assess skills which span a large range of developmental functioning in a typically developing population, providing practitioners a tool with which to examine how these skills develop and at what age it may be appropriate to focus on them. Insofar as the information provided by the normalization sample gives us a greater understanding of the skills contained within the PEAK, a comparison to a sample of individuals with autism is warranted to investigate how these skills develop in this specific population.

Experiment 2: Comparing Students with Autism to the Normative Sample

Materials and Methods

Participants

The participants in the second study consisted of 84 individuals between the ages of 5 and 21 (M = 12.51, SD = 4.83), including 75 males and 9 females. All of the students were enrolled in a Midwestern school for children with autism and other disabilities and had previously been diagnosed with an autism spectrum disorder. All students within the school were given the opportunity to participate. The functioning level of participants ranged from severe intellectual disability to near typical intellectual functioning. No students were omitted from the analysis except where consent could not be obtained. The graduate students who performed the recruitment and assessment for the participants were familiar with all of the students, as they interacted with them on a weekly basis.

Materials and Procedure

As in the first study, the PEAK Generalization Assessment was used to evaluate the participants’ skills. Trained graduate students pursuing degrees in behavior analysis, who worked at the school and were familiar with the students, administered the assessments. For all participants, the assessments were conducted in one or more sessions with a total cumulative duration of 10 to 120 min, depending on the abilities of the participant. In the event that a session could not be completed (e.g., the participant engaged in severe problem behavior, the participant was required to return to class, etc.), that session was discontinued and resumed at a later time or date. Students were removed from their classroom settings only during non-instructional time to minimize any interference with their education and that of their peers. All students who were recruited completed the assessment process. To begin the assessment, the assessor first indirectly scored the skills which they knew the participant could or could not readily demonstrate. Then, any skills which were not scored indirectly were directly assessed. Also as in the first study, prior to the onset of the assessment, participants were offered preferred items or activities which would serve as potential reinforcers following the completion of the assessment session. In addition, children were provided intermittent reinforcement in the form of verbal praise or small preferred items for participation; however, this reinforcement was not contingent in any way on performance on the assessment itself. The schedules of reinforcement for each participant varied based on their individual ability to sit and attend for extended durations.

Data Analyses

The Relationship Between PEAK and Age

Pearson’s correlation coefficients were calculated to determine the relationships among PEAK Generalization Assessment score, age, and gender. As in the first study, the relationship between PEAK Generalization Assessment scores and age was of primary interest; however, gender was included to ensure that it was not a moderating variable. A scatterplot was created to visually analyze the relationship between PEAK Generalization Assessment score and age and was superimposed on the data from the normative sample in study 1 to allow for comparison of this relationship in the two populations. Finally, an independent samples t test was conducted to assess the difference between the normative sample and the sample with autism spectrum disorders on overall PEAK Generalization Assessment scores (p < .05).
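The two statistics named above can be sketched in a few lines. The following is a minimal illustration only, not the authors' analysis script: the function names and the toy data are hypothetical, and the pooled-variance form of the t test is an assumption (it is consistent with degrees of freedom equal to the combined sample size minus 2, but the text does not state which form was used).

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    ss_a = sum((v - mean_a) ** 2 for v in group_a)
    ss_b = sum((v - mean_b) ** 2 for v in group_b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df

# Hypothetical toy data for illustration, NOT the study's data
ages = [5, 8, 11, 14, 17, 20]
autism_scores = [10, 150, 20, 5, 170, 30]
normative_scores = [60, 120, 160, 184, 184, 184]

r = pearson_r(ages, autism_scores)
t, df = independent_t(normative_scores, autism_scores)
print(f"r = {r:.3f}; t({df}) = {t:.3f}")
```

In practice a statistics package would also supply the p values; this sketch only shows how the test statistics themselves are formed.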

Percentage Distribution by PEAK Score

A frequency distribution graph was used to display the number of students who achieved various PEAK Generalization Assessment scores. This investigation sought to determine the appropriateness of the PEAK Generalization Module for individuals with autism across the ages of 2–21. Even though the findings of the first study indicated that the PEAK Generalization Assessment was effective up to the age of 13 for typically developing individuals, the same may not be true for individuals with autism. In fact, previous research has indicated that there is no significant relationship between PEAK Direct Training Assessment scores and age, or between IQ and age (Dixon et al. 2014d). It may be that individuals with autism who are older than 13 will still benefit from continued instruction with the PEAK Generalization Module if they have yet to develop the skill repertoire of their typically developing peers. For the analysis, PEAK Generalization Assessment scores were divided into ranges of 23 items (i.e., 0–23, 24–46, 47–69, …, 162–184), and the percentage of participants from both the normative sample and the sample with autism who fell within each score range was assessed. Finally, a Chi-square analysis was conducted to determine whether the difference between the groups was significant (p < .05).
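The grouping step can be illustrated with a short sketch. It is offered under stated assumptions rather than as the authors' procedure: the function names and example scores are hypothetical, and the intermediate range boundaries (70–92, 93–115, 116–138) are assumed by continuing the 23-item step, since only the ranges 0–23, 24–46, 47–69, 139–161, and 162–184 can be read off the text.

```python
from collections import Counter

# Range labels. The middle three ranges are an ASSUMPTION that continues the
# 23-item step; the others are stated or implied in the text.
BIN_LABELS = ["0-23", "24-46", "47-69", "70-92",
              "93-115", "116-138", "139-161", "162-184"]

def score_bin(score):
    """Return the index (0-7) of the score range containing a score from 0 to 184."""
    if not 0 <= score <= 184:
        raise ValueError("score must be between 0 and 184")
    # Scores 1-23 map to 0, 24-46 to 1, and so on; a score of 0 joins the first range.
    return max(0, (score - 1) // 23)

def percentage_distribution(scores):
    """Percentage of participants falling within each score range."""
    counts = Counter(score_bin(s) for s in scores)
    total = len(scores)
    return [100 * counts.get(i, 0) / total for i in range(len(BIN_LABELS))]

# Hypothetical scores for illustration, NOT the study's data
example_scores = [0, 12, 23, 24, 46, 47, 161, 162, 184, 184]
for label, pct in zip(BIN_LABELS, percentage_distribution(example_scores)):
    print(f"{label}: {pct:.1f} %")
```

Computing the distribution separately for each sample yields the two sets of percentages that a graph of this kind would display side by side.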

Results

The Relationship Between PEAK and Age

A Pearson’s correlation analysis was conducted to determine the relationships among PEAK Generalization Assessment score, age, and gender for the sample with autism. The mean PEAK Generalization Assessment score was 62.76 (SD = 72.34), the mean age was 12.51 (SD = 4.83), and 75 of the 84 participants (89.29 %) were male. The results of the Pearson’s correlation analysis indicated that there was no significant relationship between PEAK Generalization Assessment score and age (r = .200, p = .068), PEAK Generalization Assessment score and gender (r = .068, p = .539), or age and gender (r = −.003, p = .977). Because the primary variables of interest were PEAK Generalization Assessment score and age, Fig. 4 displays a scatterplot of these two variables for the sample with autism alongside the corresponding data for the normative sample from the first study. A visual analysis confirms that the sample with autism did not display a relationship between PEAK Generalization Assessment scores and age. Furthermore, the ceiling effect that occurred with the normative sample did not appear in the sample with autism. The results of the independent samples t test further indicate that overall scores on the PEAK Generalization Assessment differed significantly between the normative sample and the sample with autism [t(265) = 9.179, p < .001].
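As a rough consistency check on the reported values, a correlation coefficient can be converted into the t statistic used to test it against zero, and the degrees of freedom of the between-group test can be recomputed from the two sample sizes. The sketch below uses only figures given in the text (r = .200, n = 84 for the sample with autism, n = 183 for the normative sample); the function name is hypothetical, and the pooled-variance degrees of freedom formula is an assumption about how the t test was run.

```python
import math

def r_to_t(r, n):
    """t statistic and degrees of freedom (n - 2) for testing a Pearson r against zero."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2), df

# Score versus age in the sample with autism (values reported in the text)
t_age, df_age = r_to_t(0.200, 84)

# Between-group t test: normative n + autism n - 2 (assumes the pooled-variance form)
df_groups = 183 + 84 - 2

print(f"t({df_age}) = {t_age:.2f}")
print(f"df for the between-group t test = {df_groups}")
```

The first value comes out near 1.85 on 82 degrees of freedom, which falls short of the two-tailed .05 critical value of about 1.99 and so agrees with the non-significant result reported for age; the second reproduces the 265 degrees of freedom given for the t test.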

Fig. 4. Relationship between PEAK Generalization score and age for the normative and autism samples

Percentage Distribution by PEAK Score

Figure 5 displays the distribution of participants from both the normative sample and the sample with autism based on total PEAK Generalization Assessment scores. Scores were grouped into ranges, each spanning 23 items from the PEAK Generalization Assessment. For example, as the graph indicates, 8 of the 183 participants in the normative sample (4.37 %) scored between 0 and 23 on the PEAK Generalization Assessment. The results indicate that for the sample with autism, the largest percentage of participants (47.62 %) demonstrated mastery of the fewest skills (between 0 and 23) on the PEAK Generalization Assessment. The second largest percentage of participants (16.67 %), however, demonstrated mastery of the largest number of skills (between 162 and 184) on the assessment. By comparison, for the normative sample, the largest percentage of participants (40.44 %) demonstrated mastery of the largest number of skills (between 162 and 184) on the PEAK Generalization Assessment, and the two groups that tied for the lowest percentage (4.37 % each) demonstrated mastery of the lowest number (between 0 and 23) and the third lowest number (between 47 and 69) of skills, respectively. The Chi-square test further indicated that the difference between the normative sample and the sample with autism across total PEAK Generalization Assessment scores was statistically significant [χ²(7) = 88.45, p < .001].
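The Chi-square comparison of the two distributions can be sketched as a test of independence on a 2 × 8 table of counts (two samples by eight score ranges), which gives (2 − 1)(8 − 1) = 7 degrees of freedom, matching the value reported above. The function name and the counts below are hypothetical and serve only to show the computation; they are not the study's data.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    Assumes every row and every column contains at least one observation,
    so that no expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts per score range, NOT the study's data
normative_counts = [5, 5, 5, 10, 10, 15, 20, 30]
autism_counts = [30, 10, 5, 5, 5, 5, 5, 15]

chi2, df = chi_square_independence([normative_counts, autism_counts])
print(f"chi-square({df}) = {chi2:.2f}")
```

The test operates on raw counts rather than percentages, so percentages such as those in Fig. 5 would first need to be converted back to participant counts.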

Fig. 5. Percentage distribution of both the autism and normative samples across ranges of total PEAK Generalization scores

Discussion

The results of the second study permit a comparison of participants with diagnoses of autism and the normative sample. The results extend the previous evaluations of the PEAK Direct Relational Training System (Dixon et al. 2014a, b, c, d; McKeel et al. 2015c; Rowsey et al. 2014) and add to the understanding of the development of generalization skills in both typically developing individuals and individuals with autism. The results indicate that the generalization repertoire of individuals with autism does not develop in the same fashion as it does in typically developing individuals. Though PEAK Generalization Assessment scores generally increased as age increased in the typically developing sample, there was no significant relationship between age and PEAK Generalization Assessment scores in the sample with autism. It may be that factors other than age (e.g., intellectual abilities and severity of autism) are more reliable predictors of performance for individuals with autism. It may also be the case that the sample of children with autism had not mastered as many directly trained skills, which inhibited their ability to engage in generalized responding; however, as stated above, this remains an empirical question.

The results further indicate that participants with autism generally scored lower overall on the PEAK Generalization Assessment than did their typically developing peers. This may be because approximately 53 % of individuals with autism exhibit below-average IQ scores (Wingate et al. 2014); however, it is also possible that even individuals with autism who exhibit average to above-average IQ scores develop generalization skills differently from their typically developing peers. In fact, only 16.67 % of individuals in the sample with autism demonstrated mastery of more than 161 items on the PEAK Generalization Assessment. The remaining 83.33 % failed to demonstrate mastery of at least 23 of the 184 items contained within the PEAK Generalization Assessment. The majority of the sample with autism (59.52 %) demonstrated mastery of no more than 46 items in the assessment. Conversely, the largest portion of individuals from the normative sample demonstrated mastery of the largest number (162–184) of items on the PEAK Generalization Assessment, with the majority of normative participants (56.83 %) demonstrating mastery of 139 or more items. The results of all analyses indicate that the generalization skill repertoires of individuals with autism develop differently from those of their typically developing peers. By examining how and why these developmental differences occur, steps can be taken to improve instruction by targeting skills which may be deficient in an individual with autism but are readily demonstrated by typically developing peers. In addition, examining these findings may further our understanding of when it is appropriate to begin systematic instruction designed to promote the emergence of a generalization repertoire.

General Discussion

The current study sought to investigate the performance of a normative sample on the PEAK Generalization Assessment as well as to compare those results with the performance of a sample of children with autism. The normative sample suggests that in typically developing individuals generalization skills begin to emerge around the age of 3 years and rapidly increase until around the age of 13 years. After the age of 13, a ceiling effect appears to exist where participants can demonstrate mastery of all of the items on the PEAK Generalization Assessment, indicating that for typically developing individuals over the age of 13 years, a more advanced assessment may be warranted. The performance of the sample with autism on the same assessment, however, followed a different pattern. The ceiling effect was not apparent, and no significant relationship between performance on the assessment and age was found. These results indicate that for individuals with autism, the PEAK Generalization Assessment may be effective across a wide range of ages. The better indicator of whether this assessment is appropriate for an individual with autism may be intellectual ability or severity of autism as opposed to chronological age, though that remains an empirical question which is beyond the scope of the current studies. Given previous research demonstrating the PEAK Direct Training Module’s convergent validity with other language and intelligence measures (Dixon et al. 2014a, d; McKeel et al. 2015b) and the effectiveness of the PEAK Direct Training Module curriculum (McKeel et al. 2015a, c), it seems likely that in addition to functioning as a useful assessment tool in identifying potential skill deficits, the PEAK Generalization Module may provide a curriculum to fill the gaps in the repertoires of individuals with autism as identified by the assessment. However, future research is required to ascertain the effectiveness of the PEAK Generalization Module curriculum in establishing these deficit skills.

The current studies had several limitations. First, the sample sizes for both the normative sample and the sample with autism were relatively small compared to other normative assessments. In addition, neither sample was randomly selected due to constraints on participant availability. However, unlike standardized IQ and language tests, the PEAK Generalization Module is not designed to assess and compare performance with a normative population; rather, it is an assessment and curriculum tool designed to aid in instruction for individuals with autism and other developmental disabilities. Additionally, the similarities in the findings of the current studies to those of Dixon et al. (2014b) indicate that the sample size was sufficient to produce reliable results, and in spite of smaller sample sizes, both these studies and those of Dixon et al. (2014b) demonstrated statistically significant results. Nonetheless, the current studies included only a small portion of both the typically developing population and the population of individuals with ASD or other developmental disabilities, limiting the conclusions that can be drawn about how well the samples represent those populations. As such, future research should seek to include a larger number of participants as well as implement procedures to randomize the sampling of both groups to increase the overall external validity of the findings. Furthermore, any attempts to utilize these findings as a comparison tool between clients being assessed using this tool and the overall normative population should be considered in light of this limitation. A second limitation is that the sample of individuals with autism was relatively homogeneous, while the sample of typically developing children was drawn from several states. 
All participants in the sample with autism attended the same school for children with autism in the Midwest. Although the functioning level of the sample with autism and the normative sample varied across participants, there were not sufficient data to indicate that the findings of the current assessments would generalize to the performance of the general populations of individuals with autism and typically developing children. Future research should investigate the reliability of the findings for the sample of children with autism across a greater diversity of residential locations and educational providers as well as a greater variety of typically developing individuals. A third limitation of the current studies is the lack of demographic data for the participants. The normative sample included children from several states within the USA; however, information such as their race, socioeconomic status, and current grade levels was not available. Additional demographic information on the sample of children with autism could also be useful. Further investigations should look at how these other variables might affect performance on the PEAK Generalization Module Assessment by incorporating sampling from more varied demographic pools and reporting the relative performance of subgroups within the population. A final limitation of the current studies is that no reliability data were taken on the assessments for either sample. Though all assessors were required to demonstrate the ability to provide the assessments reliably prior to the study, their accuracy was not measured using interrater reliability procedures during data collection for the current studies. 
As such, it is possible that false negatives (indicating an individual was not able to demonstrate mastery of a target skill when, in fact, they could) or false positives (indicating an individual was able to demonstrate mastery of a target skill when, in fact, they could not) may have occurred during the assessment procedures. To minimize these types of errors, any item that an assessor was not completely sure the child could reliably demonstrate was directly tested; however, this does not preclude the possibility of errors in either the direct or indirect assessment of skills. It is also possible that factors other than intellectual ability, such as motivation or generalized compliance with a novel instructor, affected scoring for individuals in either group. Although this could lead to scoring individuals as unable to perform a skill when in fact they could demonstrate mastery of it (a false negative), the ability to demonstrate skills in the presence of novel instructors and in novel environments is at the heart of generalization; therefore, the inability to demonstrate a skill in the assessment environment would be considered an inability to demonstrate a generalized response. Future research should attempt to address these limitations by incorporating reliability data and by investigating how reliable the assessment process is over time using a test–retest methodology.

In addition to extensions of the current research that address its limitations, further research can expand upon these findings in several ways. First, future research should look at the relationship between age and scores on both the PEAK Direct Training Assessment and the PEAK Generalization Assessment within a single normative sample. This would provide further information regarding the concurrent development of directly trained and generalization skill repertoires in a typically developing population. Second, future research should continue to examine the reliability and validity of the PEAK Generalization Module. Examples might include research on the convergent and concurrent validity of the PEAK Generalization Assessment as compared to other standardized measures of language and intelligence, psychometric investigation of the subsections of the PEAK Generalization Assessment such as a factor analysis, investigations of the reliability of the PEAK Generalization Assessment including interrater reliability and test–retest reliability, and research into the effectiveness of the PEAK Generalization Curriculum in promoting skill acquisition.

Though the current studies provide a descriptive analysis of the emergence of a generalization repertoire, they also present useful data for practitioners providing instruction for children with autism. The findings of the item analysis across age groups with the normative sample provide a preliminary guideline for the ages (or relative age functioning) at which it may be appropriate to begin instruction on specific items within the PEAK Generalization Module. This could be used to guide instruction such that prerequisite skills are mastered before instruction on more difficult skills and to ensure that the skills being targeted can reasonably be expected to be present in that individual’s typically developing peers. As stated previously, the limitations of the current samples indicate that these potential benchmarks cannot be assumed to be fully representative of the overall population of children with and without autism; however, the data may serve as a useful additional tool to aid in appropriate curriculum planning.

Combined with the findings from Dixon et al. (2014b), the current results also inform practitioners on when it is appropriate to begin training on each of the PEAK Modules. Rather than two separate repertoires that develop in a strictly sequential fashion, the data indicate that typically developing individuals begin to exhibit directly trained skills prior to generalization skills; however, by the age of 3, both repertoires should be emerging. That is, a repertoire including generalized responding does not require a completely mastered repertoire of direct instruction; rather, around the age of 3 both repertoires should be developing concurrently. As skills are mastered in the PEAK Direct Training Module, continued direct training on subsequent skills may be warranted along with an introduction to the more basic PEAK Generalization skills. Although these two modules may be concurrently implemented, the data also provide practitioners with potential benchmarks allowing them to assess deviations from the typically developing population in individuals’ skill repertoires as well as deviations from peers with autism. In addition to being a useful clinical tool for curriculum design, these results also bolster the ability of clinicians and other staff working with children with autism to provide research-based data to support the need for services. This is a vital tool when communicating with insurance providers and other funding providers regarding appropriate services.

As a whole, the current studies add both to the support for the validity of the PEAK Generalization Module and to our understanding of how generalization repertoires are formed. As behavior analysts continue to investigate human learning, it is vital that we continue to seek the most efficient and effective teaching methodologies. Instruction which combines directly trained skills with tested but untrained skills (as in generalization) allows instructors to assess the emergence of untrained skills, which may enhance the efficiency with which instruction takes place. Further research is certainly needed on both the PEAK Generalization Module specifically and the development of generalization more generally; yet, the current research provides an important first step in that direction.