The growing prevalence of autism spectrum disorder (ASD), currently estimated at 1 in 68 (Centers for Disease Control and Prevention [CDC], 2014), presents a rising challenge to researchers, service providers, families, and those diagnosed. ASD is a pervasive neurodevelopmental disorder characterized by impairments in communication and social interaction, often resulting in restricted and repetitive behavioral repertoires (Autism and Developmental Disabilities Monitoring [ADDM] Network Surveillance Year 2010 Principal Investigators 2014). Language and social deficits often persist throughout a lifetime and may profoundly affect the quality of life of both the individual and the caregiver (ADDM 2014; Mahmood et al. 2015). Currently, there seems to be no shortage of efforts by researchers and practitioners to address the needs of children and families affected by ASD. Unfortunately, sifting sound practice from the myriad of choices presents a daunting challenge. In order to establish a reference of evidence-based autism treatment, Odom et al. (2010) identified 30 comprehensive and empirically validated treatment models. In a comprehensive literature review of intervention practices for children and youth with ASD, Wong et al. (2015) found that procedures based on applied behavior analysis (ABA) almost solely account for procedures that qualify as evidence-based. Astute decision making is required by parents, teachers, and practitioners in order to navigate the plethora of assessment and treatment options available, both from within and outside of behavior analysis. Given that treatment options are on the rise, standardization of service delivery could benefit from manualization. Odom et al. (2010) state that manualization is one of the critical components of comprehensive autism treatments.
Manualization is part of the process of operationalization, in which procedures are described in enough detail for individuals from outside of a project to implement them accurately (Odom et al. 2010). Manualization can serve important functions, such as allowing systematic standardization of treatment protocols, thus setting the occasion for increased treatment integrity in practice and for comparisons of the treatment efficacy of varied ABA programs in research. Ultimately, it stands to reason that current commercially available manualized ABA protocols are in need of further evaluation to meet the above ends.

Assessing the skill repertoires of individuals with autism and related disabilities is essential for expedient and effective treatment. An accurate assessment of abilities can allow researchers or practitioners to focus on remediating skills in a logical sequence. If an assessment is well ordered and comprehensive, one can be confident in selecting treatment protocols, because prerequisite skills are likely to have been accounted for within the assessment.

A commonly used assessment to identify and address skill deficits in individuals with autism is the Assessment of Basic Language and Learning Skills – Revised (ABLLS-R; Partington 2008). The ABLLS-R evaluates the repertoire of learners according to skill areas including language, social interaction, self-help, academic, and motor skills. Items in the ABLLS-R are conceptually derived from Skinner’s (1957) analysis of verbal behavior, which has facilitated tremendous gains in the field of autism treatment over the past 30 years (Reed and Luiselli 2016). A consensus exists amongst several reviews of the contributions of Skinner’s (1957) analysis (e.g. Dymond et al. 2006; Dixon et al. 2007; and Sautter and Le Blanc 2006), indicating that Skinner’s analysis of verbal behavior has been immensely influential and has resulted in a significant number of publications of sound treatments and interventions to address language problems in children with autism and related disorders. A critical analysis of the state of knowledge about verbal behavior, however, indicates that the majority of empirical research has been limited to the study of relatively few basic verbal operants (e.g. mands, tacts, intraverbals, and echoics) (Dymond et al. 2006; Dixon et al. 2007). Consequently, much of Skinner’s analysis has remained underdeveloped as a research program, particularly in the area of complex language. Assessments and curriculum guides that make use of updated and comprehensive technologies may provide a more complete account of human language and have great practical utility, by meeting the demand that exists for comprehensive treatments of individuals with autism.

The PEAK Relational Training System (Dixon 2014a) is an assessment and curriculum guide developed for the purposes of assessing language and cognitive skill deficits of special populations, and facilitating improvement in those areas. Currently, the three published iterations of PEAK include the PEAK Direct Training Module (PEAK-DT), the PEAK Generalization Module (PEAK-G), and the PEAK Relational Training: Equivalence Module (Dixon 2014a; b; Dixon 2015). Some psychometric support for the PEAK-DT module has been gathered by establishing the internal validity and external validity of the assessment. Dixon et al. (2014a) demonstrated a relationship between PEAK-DT and the Peabody Picture Vocabulary Test (PPVT), as well as correspondence with standardized measures of intelligence (Dixon et al. 2014b). Additional research by Dixon et al. (2015) evaluated the relationship between PEAK and another commonly used assessment, the Verbal Behavior Milestones and Placement Program (VB-MAPP; Sundberg 2008). Forty students with autism were recruited and administered the assessments. Results indicated that total scores on the PEAK-DT and PEAK-G were strongly correlated with those on the VB-MAPP (Dixon et al. 2015). Additionally, a ceiling effect was found for the VB-MAPP: mastery of all items on the VB-MAPP did not correspond to mastery of all items on the PEAK-DT and PEAK-G curricula. In summary, PEAK appears to target skills not incorporated in other curricula and may provide “a more robust measure of advanced language skills in individuals with autism” (Dixon et al. 2015, p. 223). Prior to the investigation mentioned above, no peer-reviewed publications had established the reliability or validity of the VB-MAPP.
Similarly, no published studies have been conducted with an aim of evaluating the reliability of the ABLLS-R, and a single published study included the ABLLS-R in an analysis of convergent validity of a variety of instruments commonly used to estimate the functional abilities of individuals with developmental disabilities (Wagner et al. 2007).

Service providers must be provided empirically validated and conceptually systematic assessment tools so that treatments developed from these tools will be effective. The task of identifying possible teaching targets for intervention may be challenging, due to a potentially vast array of options; a solution can be the use of standardized descriptive assessments of adaptive and problematic behavior, such as the Vineland Adaptive Behavior Scales (VABS-II; Furniss 2009). The VABS-II assesses overall adaptive functioning, which can be compared to developmental norms for typically developing children. Scores on the VABS-II may be used as a diagnostic tool, as well as to guide treatment goals in the areas of communication, daily living, socialization, and motor skills (Pearson Education 2014; Perry et al. 2009). The psychometric properties of the VABS-II have yielded good ratings for internal consistency and test–retest reliability, while the inter-rater reliability of this assessment has been rated as adequate (Perry et al. 2009). Research involving the use of the VABS-II with individuals with autism supports its internal consistency and convergent validity (Perry et al. 2009).

Long-term ABA treatments have been shown to have significant effects on the IQ scores, language, daily living skills, and social skills of individuals with autism. Thus, comparing scores on frequently used assessments such as the PEAK and ABLLS-R to a standardized assessment such as the VABS-II has the potential to advance the validity of these assessments by establishing a correlation and convergent validity of scores across assessments. Dixon et al. (2014b) found that PEAK has strong convergent validity with IQ scores, and recommended further testing of the correlation of PEAK with other psychometrically sound assessments. Comparing PEAK and the ABLLS-R to other language assessments would build evidence supporting behavior analytic treatments of language deficits, as well as the tools used to guide treatment, helping to ensure that the most effective intervention is chosen. It is important to make certain that appropriate assessments are used across settings and that these assessments are compared and contrasted for the purposes of determining their “utility and redundancies in guiding behavior analytic treatment” (Dixon et al. 2015, p. 226). The current study aims to extend the findings of Dixon et al. (2015; 2014a) by evaluating the relationship between two behavioral language assessments, the PEAK-DT and the ABLLS-R, and a psychometrically sound assessment of adaptive ability, the VABS-II. We hypothesized that higher scores on the PEAK-DT assessment would be significantly and positively correlated with ABLLS-R and VABS-II scores. As scores increase on one assessment, intuitively, scores are likely to increase on another assessment if both assessments encompass a similar scope of skills; it is important to evaluate whether commonly used assessments do in fact have a similar scope, and whether those skills are related to adaptive abilities.
This evaluation may provide clinicians with data to inform decisions regarding assessments and derived treatments for those diagnosed with ASD and related disorders.

Method

Participants

A total of 21 participants were included in the current study. Participants were 18 males and 3 females ranging in age from 4 to 8 years (M = 6.35, SD = 0.93). All participants were diagnosed with ASD within the moderate to severe range. Each of the participants demonstrated social and cognitive skills that were below average, including speech and language deficits. All participants were recruited from an Intensive Behavioral Intervention (IBI) program located in Toronto, ON. The program was funded by the Ontario Ministry of Children and Youth Services (MCYS) and served children and youth who have a diagnosis of autism. Specifically, the IBI program provides instruction based on the application of ABA, with the aim of teaching novel skills, primarily using a one-to-one ratio, for approximately 20 or more hours per week (Ontario Ministry of Child and Youth Services 2016, n.d.). In order to qualify to participate in the IBI program, all participants must have met the following criteria: a) residing within the geographic boundaries of the program, b) having a current diagnosis of autism from a doctor, psychologist, or psychological associate, indicating severity toward the severe end of the autism spectrum, and c) having been assessed for eligibility for IBI by clinical staff employed by their service provider (Ontario Ministry of Child and Youth Services 2016, n.d.). All participants in the current study received between 19 and 30 h of instruction per week.

Materials

Assessments used in the current study include the PEAK-DT, ABLLS-R, and VABS-II. The PEAK-DT assessment contains 184 items ranging from basic learning skills (e.g., maintaining eye contact, keeping hands still, and turn taking) to more complex academic skills (e.g., advanced addition and subtraction, guessing, and audience control). The ABLLS-R is an assessment of basic learning skills, containing 544 skills across 25 domains. These include language, social interaction, self-help, academic, and motor skills. The VABS-II provides a measure of individuals’ adaptive level of functioning, which may aid in the classification and diagnosis of a developmental delay or a related disorder. The VABS-II is organized into four adaptive domains that include communication, daily living, motor, and socialization skills in addition to a maladaptive behavior assessment. For the purpose of comparing adaptive functioning only, the VABS-II maladaptive behavior scale was not used.

Procedure

The current study evaluated the relationship between the PEAK-DT, ABLLS-R, and VABS-II (Communication, Daily Living, Socialization, and Motor Skills Domains). A Board Certified Behavior Analyst trained staff to assess participants’ verbal and academic skills by completing the PEAK-DT. The training was conducted in a single session of approximately 30 min, which included an overview of the PEAK-DT assessment, instructions on conducting the assessment, and provision of the necessary materials. Following the training, staff completed the PEAK-DT assessment indirectly; only staff familiar with participants’ skills completed the assessment. Skills that staff directly observed participants perform prior to the completion of the assessment were recorded. ABLLS-R and VABS-II scores were gathered from existing records completed within six months of the PEAK-DT assessment. PEAK-DT and ABLLS-R scores were calculated by indicating the total number of skills acquired relative to the overall number of skills per domain, per assessment (e.g. a score of 120 on the PEAK-DT or ABLLS-R assessment indicates that 120 out of a possible 184 skills or 544 skills were observed, respectively). Twenty participants’ data were compared for all three assessments; one participant was not assessed using the VABS-II. Interobserver agreement data were not collected for any assessment because staff working in a publicly funded program are mandated to allocate their time toward instructional delivery.
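
The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scoring procedure published with either instrument: the function name `total_score` and the list-of-booleans representation are hypothetical, and the learner data are made up. Only the item total of 184 comes from the text.

```python
def total_score(items_observed):
    """Return (raw score, proportion of items observed).

    items_observed holds one boolean per assessment item, True when the
    skill was recorded as observed. This representation is hypothetical.
    """
    if not items_observed:
        raise ValueError("assessment has no items")
    raw = sum(1 for observed in items_observed if observed)
    return raw, raw / len(items_observed)


# Made-up learner with 120 of the 184 PEAK-DT items recorded as observed.
peak_dt_items = [True] * 120 + [False] * 64
raw, proportion = total_score(peak_dt_items)
print(raw, round(proportion, 3))  # 120 0.652
```

Because the two instruments have different item totals, the same raw score of 120 corresponds to very different proportions (120/184 versus 120/544). The analyses in this study use the raw totals.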

Data Analysis

Correlational analyses were performed between each pair of assessments, using IBM SPSS Statistics, Version 21, for all participants. Pearson correlations were calculated for scores obtained on the PEAK-DT, the ABLLS-R, and Adaptive Behavior Composite (ABC) scores derived from the VABS-II assessment.
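
For readers without access to SPSS, the Pearson product-moment correlation can be sketched in plain Python. The analyses reported here were run in SPSS; the code below is not that procedure, only a minimal illustration of the same statistic. The function name `pearson_r` is hypothetical and the scores are made up, not the study data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Made-up total scores for five learners (NOT the study data).
peak_dt = [20, 45, 75, 110, 150]
ablls_r = [60, 150, 230, 330, 480]
print(round(pearson_r(peak_dt, ablls_r), 3))
```

The coefficient is unaffected by the different item totals of the two instruments, because each score enters only as a deviation from its own mean, scaled by its own spread.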

Results

Relationship of PEAK-DT and ABLLS-R Scores

A scatter plot of each participant’s scores on the PEAK-DT and ABLLS-R is displayed in Fig. 1. The mean PEAK-DT score for all participants was 75.05. The mean ABLLS-R score for all participants was 235.76. A Pearson correlation was calculated to examine the relationship between the PEAK-DT scores and the ABLLS-R scores. The results suggest that there was a strong positive relationship between the PEAK-DT and ABLLS-R (r = 0.951, p < 0.001), indicating that total scores on the ABLLS-R were a strong predictor of scores on the PEAK-DT.

Fig. 1 A scatter plot displaying the scores of each child on the PEAK-DT and ABLLS-R assessments

Relationship of PEAK-DT and VABS-II ABC Scores

Figure 2 displays each participant’s score on the PEAK-DT and VABS-II. The mean VABS-II ABC score for all participants was 68.05. A Pearson correlation was calculated to examine the relationship between scores on the PEAK-DT and the VABS-II ABC. The relationship between PEAK-DT and VABS-II scores was significant (r = 0.453, p < 0.05), indicating a moderate correlation.
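
The significance of a Pearson correlation depends on the sample size as well as on r. As a hand check on the value above, the sketch below converts r to a t statistic with n - 2 degrees of freedom. It is not the SPSS output: the function name `t_from_r` is hypothetical, and n = 20 is an assumption based on the number of participants who had VABS-II scores.

```python
import math


def t_from_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# r is the reported PEAK-DT / VABS-II ABC correlation; n = 20 is an
# assumption (one of the 21 participants had no VABS-II score).
t = t_from_r(0.453, 20)
print(round(t, 2))  # 2.16
```

Under that assumption, t is about 2.16 on 18 degrees of freedom, just above the two-tailed critical value of 2.101 at the 0.05 level. This agrees with the reported p < 0.05.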

Fig. 2 A scatter plot displaying the scores of each child on the PEAK-DT and the VABS-II ABC

Relationship of VABS-II ABC and ABLLS-R Scores

A scatter plot of each participant’s scores on the ABLLS-R and VABS-II ABC is displayed in Fig. 3. Both VABS-II ABC and ABLLS-R scores were analyzed using existing records completed within 6 months of each other. A Pearson correlation was calculated to examine the relationship between the VABS-II ABC scores and the ABLLS-R scores. The relationship between ABLLS-R and VABS-II ABC scores was significant (r = 0.563, p < 0.05), indicating a moderate correlation (Tables 1 and 2).

Fig. 3 A scatter plot displaying the scores of each child on the ABLLS-R and the VABS-II ABC

Table 1 Descriptive statistics
Table 2 Correlations

Discussion

The present data set provides an extension of the empirical evidence supporting the validity of the PEAK-DT module (Dixon 2014a). The results are consistent with previous studies on the convergent validity of the PEAK Relational Training System with other psychometrically sound assessments. Dixon et al. (2014a) found that PEAK-DT was significantly correlated with the Peabody Picture Vocabulary Test (Dunn and Dunn 2007). Additionally, Dixon et al. (2015) and Dixon et al. (2014b) found that PEAK-DT was significantly correlated with the Illinois Early Learning Standards Test (Illinois State Board of Education 2013), as well as standardized intelligence tests. The current results are also consistent with the Dixon et al. (2015) finding indicating a strong correlation with another commonly used language assessment, the VB-MAPP (Sundberg 2008).

The finding of a moderately strong relationship with the VABS-II is encouraging. The current study provides preliminary data suggesting that the commonly used behavioral assessments evaluated here may be used across settings, not only to assess language and academic abilities, but also to provide an approximate measure of overall adaptive abilities. However, replication of the current study is needed with larger and more varied samples, as such an endeavor may lead to important findings that lend greater validity to behavioral assessments in a treatment landscape that must insist on the use of evidence-based practice. The PEAK-DT has now been correlated with measures of academic skill, intelligence, language ability, and adaptive ability. In order to determine the broad utility of the PEAK Relational Training System, future researchers may continue this line of research by comparing this package to other assessments and by establishing the relationships among the PEAK modules.

A ceiling effect was not observed in the current comparison of the PEAK-DT and ABLLS-R assessments; however, this may be due to the limited number of participants with high scores on either assessment. Additionally, the current investigation compared only one of the PEAK modules to the ABLLS-R. Subsequent PEAK modules comprise skills that have been claimed to be more advanced than those in the direct training module, and relationships between these modules and other curricula have yet to be established. Participants in the current study were drawn from a limited sample due to the eligibility criteria for admission into the IBI program, which is available only to individuals diagnosed with autism toward the severe range of the autism spectrum. Participants were also drawn from a limited age group, ranging in age from 4 to 8 years. Future research may extend the results of the current study by broadening the inclusion criteria, in order to more fully encompass any variability in language abilities associated with age.

The current data set yielded significant results despite the relatively limited sample size. Dixon et al. (2014a) reported statistically significant results from a similar sample size. Large samples are more likely to yield significant results; however, individual scores are less critical to the analysis in larger samples. Generally, as more data are added to the sample, individual scores have less influence (Dixon et al. 2014a). Consequently, obtaining statistical significance with a smaller sample in the current study implies that the data on the convergent validity between the PEAK-DT, ABLLS-R, and VABS-II assessments are well ordered, because each obtained score has a greater influence on the measure in a small sample.

Although the current data set is promising, it is not without limitations. All of the assessments were carried out non-concurrently. The majority of VABS-II and ABLLS-R scores were gathered prior to the administration of the PEAK-DT assessment. It may be the case that some skill acquisition took place in the time between assessments, possibly resulting in higher PEAK-DT scores, per participant. Future research may extend the current findings by conducting each of the assessments concurrently. However, the non-concurrent method of assessment may yield some interesting results; ABLLS-R and VABS-II scores may be predictive of PEAK-DT scores; this relationship may require future systematic investigation.

An additional limitation is related to the implementation of each of the assessments. Interobserver agreement data were not collected due to pragmatic challenges at the site of the research. The ABLLS-R does not have any published studies to support the test-retest or inter-rater reliability of the assessment; therefore, ABLLS-R scores in the current study may only be accepted as approximations of the participants’ abilities. Some previous literature suggests that the PEAK-DT has reliability as an assessment of directly trained language (e.g. Dixon et al. 2014a; b), but such outcomes are only as good as the two assessors used in this prior work. Future research should replicate the results obtained in the current study with the inclusion of interobserver agreement (IOA) procedures.

In summary, the current study, along with several other recent papers cited above, suggests convergent validity of the PEAK Relational Training System with other more established assessments. These investigations have made strides to ensure that an evidence base exists for the practices of behavior analysts treating individuals with ASD and related disorders. However, a gap in the available data on behaviorally based autism language assessments still exists. It is important that the assessment process in behavior analysis stands up to the psychometric rigors required by the broader scientific community in order to communicate our science beyond our home discipline. Arguably of greater importance, establishing confidence in the validity of assessment measures can allow practitioners to confidently choose treatments that are likely to be effective for the individuals they serve.