Vocal development in children with autism spectrum disorder (ASD) is an understudied area with potential clinical utility for enhancing language trajectories. Improving language trajectories and language outcomes for children with ASD is critical because language skills are affected by ASD and language competence predicts social, adaptive, and vocational outcomes in this population (Billstedt et al. 2005; Howlin 2000). For children with ASD, assessing and targeting vocal development, the process through which children produce increasingly speech-like sounds (Oller 2000), may be useful for three reasons. First, compared with only targeting lexical or grammatical development, targeting vocal communication during the preverbal stage of communication development might be more effective in facilitating language development in children who are not ready for linguistic targets. Second, because vocal development is a logical precursor of language, vocal measures might show early response to intervention that targets lexical development and help explain why intervention initially targeting vocalizations is effective in facilitating language in preverbal children with ASD (e.g., Woynaroski et al. 2014; Yoder et al. 2015b). Finally, evaluating vocal development prior to initiating communication intervention may provide a means for determining which children are more likely to benefit from a given intervention approach (Yoder and Warren 2002). To experimentally evaluate these reasons for focusing on vocal development, researchers must employ valid vocal development measures.

Because there is no gold-standard vocal development measure, one cannot simply correlate a new vocal measure with a gold-standard vocal development measure to evaluate the new measure’s validity (i.e., criterion-related validity). Instead, one must draw on multiple sources of evidence to assess the degree to which a variable measures what it purports to measure (i.e., construct validation; Cronbach and Meehl 1955). Construct validity includes convergent validity (i.e., degree to which a variable correlates with other variables with which it is predicted to correlate based on theory) and divergent validity (i.e., degree to which a variable does not correlate with other variables with which it is not predicted to correlate based on theory; Campbell and Fiske 1959). Sensitivity to change, the degree to which a measure changes over time, is another key feature of high-quality measures, particularly those used to examine change occurring within intervention studies. This study assesses the construct validity and sensitivity to change of four variables purported to capture vocal communication or vocal complexity in children with ASD.

Two Potentially Important Aspects of Vocalizations for Children with ASD

Vocal communication is defined as how frequently or consistently a child produces vocalizations directed towards another person in an apparent attempt to transmit a message (Wetherby et al. 1989). These vocalizations appear to be used to initiate or maintain a social interaction. Because children do not always direct their vocalizations to another person, one would expect fewer communicative vocalizations than total vocalizations.

Vocal complexity is defined as the frequency, consistency, or diversity with which a child produces vocalizations with speech-like features, such as canonical syllables or consonants (e.g., Wetherby et al. 2007; Woynaroski et al. 2017; Yoder et al. 2015a). Typically developing children progress from producing quasivowels (0–2 months), gooing (1–4 months), grunts, squeals, fully resonant vowels, and marginal babbling (3–8 months) to canonical babbling (5–10 months; Oller 2000). Canonical babbling sounds substantially more like adult speech than precanonical vocalizations because it includes vowel-like and consonant-like sounds with rapid, adult-like transitions (Oller et al. 1999). Example vocal complexity measures include the rate of consonant–vowel productions (Talbott 2014), proportion of vocalizations with a canonical syllable (Woynaroski et al. 2017), and diversity of key consonants used in communication acts (DKCC; Wetherby et al. 2007; Woynaroski et al. 2017). DKCC is calculated as the number of 13 key consonants (i.e., /m/, /n/, /b/ or /p/, /d/ or /t/, /g/ or /k/, /w/, /l/, “j,” /s/, and “sh”) that a child produces during the sample. Children can receive at most ten points because members of voiced-voiceless pairs (e.g., /b/ vs. /p/, /d/ vs. /t/, /g/ vs. /k/) are difficult to distinguish reliably on recordings. Thus, a child receives only one point per pair regardless of whether the child produces one or both of the pair’s consonants.
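The DKCC scoring rule described above can be illustrated with a minimal sketch. The ASCII consonant labels, the `CATEGORY` mapping, and the `dkcc` function name are hypothetical conveniences for illustration and are not taken from any published coding software; the sketch assumes the coder has already listed the key consonants heard in a sample.

```python
# Hypothetical ASCII labels for the 13 key consonants. Members of a
# voiced-voiceless pair map to one shared scoring category, so the
# maximum score is 10.
CATEGORY = {
    "m": "m", "n": "n",
    "b": "b/p", "p": "b/p",
    "d": "d/t", "t": "d/t",
    "g": "g/k", "k": "g/k",
    "w": "w", "l": "l", "j": "j", "s": "s", "sh": "sh",
}


def dkcc(consonants_heard):
    """Count the distinct key-consonant categories in a sample (0 to 10).

    Sounds that are not key consonants are ignored, and repeated
    productions of the same category earn no additional points.
    """
    return len({CATEGORY[c] for c in consonants_heard if c in CATEGORY})


# Illustrative sample: /b/ and /p/ share one point, /m/ counts once,
# and "r" is not a key consonant.
sample = ["b", "p", "m", "m", "d", "r"]
print(dkcc(sample))  # 3
```

Collapsing each pair into one category before counting is what caps the score at ten even though 13 consonants are listed.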

Theoretical Support for Measuring Vocal Communication and Vocal Complexity

Before assessing the construct validity and sensitivity to change of vocal variables, one must consider the theoretical basis for the variables with which vocal variables should correlate (i.e., convergent validity), the variables with which vocal variables should not correlate (i.e., divergent validity), and changes in the vocal variables over time (i.e., sensitivity to change). Child-driven theories of language development focus on child characteristics, such as those related to speech production, and assert that vocal communication and vocal complexity facilitate language development in children with ASD. These child-driven explanations include shared articulators (e.g., tongue and lips) and shared motor pathways for prelinguistic vocalizations and spoken words (Fry 1966; Iverson 2010; Stoel-Gammon 2011; Vihman 1992, 1996). For instance, regardless of whether “wawa” is produced without lexical meaning or as an approximation of “water,” the articulators move in the same way. One can also make theoretical arguments involving bidirectional influence between child characteristics and adult input, rather than focusing only on child characteristics.

Social Feedback Theory

Goldstein et al. (2003) and Goldstein and Schwade (2008) presented the social feedback theory as a potential explanation for vocal development in infants with typical development. This theory emphasizes the role of contingent caregiver responses to child vocalizations in social interactions for facilitating more complex child vocalizations over time. Goldstein et al. (2003) asserted that infants produce more complex and more adult-like vocalizations following contingent adult responses within social interactions compared with noncontingent adult responses. Based on results of an experimental study with 6- to 10-month-old infants, Goldstein and Schwade (2008) concluded that infants produced either more fully resonant vowels or more consonant–vowel syllables depending on how caregivers contingently responded to the children’s vocalizations (i.e., with a fully resonant vowel or a consonant–vowel syllable, respectively). However, results of the key comparison between the contingent response group and the corresponding control group were not reported, even though this comparison was possible with the study design. Therefore, direct evidence of the impact of contingent versus noncontingent responses was not provided.

Social Feedback Loop

Warlaumont et al. (2014) proposed a “social feedback loop” in which (a) adults are more likely to respond to child vocalizations that are speech-related and (b) a child is more likely to produce speech-related vocalizations if an adult responded immediately to the child’s preceding utterance. Speech-related vocalizations include words as well as prespeech vocalizations (i.e., babbling; Oller et al. 2010). Warlaumont et al. (2014) posited that this social feedback loop may be disrupted in children with ASD because (a) children with ASD produce fewer speech-related vocalizations than children with typical development, (b) caregivers of children with ASD respond differently than caregivers of children with typical development, and (c) children with ASD have a reduced ability to respond to adults’ contingent responses. This disruption might explain in part the language deficits in children with ASD.

Transactional Theory of Spoken Language Development

The transactional theory of spoken language development considers child factors (e.g., cognitive, social, and motor abilities), parent factors (e.g., linguistic input), and dyadic factors (i.e., parent–child) while emphasizing the bidirectional nature of the interactions between child and parent factors across development (Camarata and Yoder 2002; McLean and Snyder-McLean 1978; Sameroff and Chandler 1975; Woynaroski et al. 2014). It posits that as a child’s speech and language skills increase, parents provide more complex input that scaffolds continued child growth (Camarata and Yoder 2002; Woynaroski et al. 2014). Despite some mixed findings, bidirectional influences have been documented for vocal development in children with typical development (Fagan and Doveikis 2017). For example, the content of infant vocalizations and accompanying actions (e.g., play actions and directing eye gaze) has been reported to influence how mothers respond to infant vocalizations (West and Rheingold 1978; Yoder and Feagans 1988). In initially preverbal children with ASD, both parent factors (i.e., linguistic input) and child factors (i.e., intentional communication and receptive vocabulary) predicted growth in DKCC, which in turn predicted expressive language (Woynaroski et al. 2017; Yoder et al. 2015a). These findings are consistent with the transactional theory of spoken language development in children with ASD. Positive associations between parent verbal responsiveness (e.g., follow-in comments) to child behaviors and child spoken language skills in children with ASD provide additional support (e.g., McDuffie and Yoder 2010). In contrast to these findings supportive of bidirectional influences, Fagan and Doveikis (2017) reported that mothers responded to a relatively small number of infant vocal behaviors (i.e., 30%) during ordinary interactions. However, mothers responded much more to infant vocalizations than to other infant vocal behaviors (e.g., crying, coughing, and raspberries). Thus, the difference in types of infant vocal behaviors coded may explain the differences in findings.

Application to Vocal Development in Children with ASD

Theoretically, improving vocal communication and/or vocal complexity could facilitate language development in children with ASD. The social feedback theory, social feedback loop, and transactional theory of spoken language development all emphasize interactions between adults and children in children’s vocal development. Increasing vocal communication or vocal complexity in children might elicit more frequent or complex adult responses to scaffold the child’s ability to produce more adult-like productions including spoken words. Also, increases in vocal communication or vocal complexity could signal that children are attempting to say words they understand but do not yet produce accurately enough to be understood (Woynaroski et al. 2016).

Current Empirical Support for Measuring Selected Aspects of Vocalizations

When establishing the convergent validity of particular vocal variables for specific purposes, correlations of the vocal variables of interest with expressive language outcomes or with measures of precursors to expressive language are some of the most relevant pieces of evidence. Evidence from studies that use a longitudinal correlational design, rather than a concurrent correlational design, and that include children with ASD in the early stages of language learning is most relevant here. Longitudinal associations provide stronger evidence of convergent validity than concurrent associations because they both document an association between variables and provide evidence of temporal precedence of the putative cause relative to the putative effect.

Broadly, a recent meta-analysis revealed that vocalizations correlate strongly with current or future expressive language skills for children with ASD (McDaniel et al. 2018). The meta-analysis included a variety of vocal variables, including those purported to measure vocal communication and vocal complexity. However, the overall number of studies was too low to achieve sufficient power to test whether specific types of variables yielded stronger correlations with expressive language than others.

Vocal Communication

There is equivocal evidence regarding the correlations between communicative vocalizations specifically and expressive language in children with ASD. For example, Plumb and Wetherby (2013) reported that the proportion of vocalizations that were communicative in the second year of life predicted expressive language skills at age 3 above and beyond the proportion of vocalizations that were noncommunicative. In contrast, for children suspected of having ASD (mean chronological age = 19.51 months), Swineford (2011) reported nonsignificant correlations between the rate of communication acts with vocalizations and verbal communication concurrently and predictively.

Vocal Complexity

Vocal complexity has been defined in multiple ways within two broad categories: (a) vocalizations with consonants and/or canonical syllables without differentiating diversity of consonants produced and (b) diversity of consonants produced. Within each of these two broad categories, there are two subordinate categories: those that are derived from (a) all vocalizations versus (b) only communicative vocalizations.

When only communicative vocalizations are used, the vocal complexity variable combines communication and complexity concepts. For example, several studies have examined consonant inventories in communicative vocalizations of children with ASD, rather than in all vocalizations. For analysis and discussion purposes, we classify these variables within the complexity set because we judged the complexity component to be more prominent in the variable’s interpretation than the communication component. For example, DKCC is conceptually more related to consonant inventory in all vocalizations, which is clearly a complexity variable, than to the number of vocal communication acts. Relatedly, proportion of communication acts with a canonical syllable is judged to be a vocal complexity variable because it focuses on how consistently the child uses canonical syllables (a marker of complexity) rather than how consistently a child uses vocalizations for communicative purposes.

There is empirical evidence for correlations between the types of vocal complexity variables described above and expressive language in children with ASD. For use of vocalizations with consonants, the rate of consonant–vowel vocalizations produced at 9 months of age correlated with expressive language at 12 months of age for children with ASD (Talbott 2014). Degree of verbal delay concurrently and negatively correlated with lack of communicative vocalizations with consonants (Book 2009; McCoy 2013). Relatedly, the rate of canonical babbling correlated with concurrent expressive language in children with ASD (mean chronological age = 44.67 months; Sheinkopf et al. 2000).

For consonant inventory measures, Yoder et al. (2015a) found that DKCC predicted expressive language growth in initially preverbal children with ASD over and above ten other putative predictors. Similarly, Wetherby et al. (2007) identified that DKCC at 18–24 months was one of the “best predictors of verbal skills at 3 years” (p. 971), compared with numerous other possible predictors for children with ASD. Relatedly, a composite variable derived from the proportion of communication acts with a canonical syllable and DKCC strongly correlated with later expressive vocabulary in a sample of initially preverbal children with ASD (Woynaroski et al. 2017).

The Need for Additional Evidence of Validity for Vocal Variables

A single analysis or test is insufficient for reporting the degree to which a measure exhibits construct validity (Cronbach and Meehl 1955). Instead, multiple sources of evidence must be integrated and evaluated for the specific purpose of the variable of interest. When comparing evidence of validity among multiple vocal variables, one can compare variables with attention to the number of different purposes the variable might serve. For example, a measure may exhibit strong evidence of convergent construct validity with expressive language, but weak evidence that it is sensitive to change. Currently, nearly all validity evidence for measuring vocal development in young children with ASD is convergent validity evidence, even though divergent validity is a key type of validity evidence. The current study presents an opportunity to compare directly convergent validity, divergent validity, and sensitivity to change of selected vocal variables from the same, relatively large sample of young children with ASD. It begins to fill this gap in the literature and move the field forward in selecting vocal measures that are most likely to yield meaningful, interpretable results.

For evidence of convergent validity, we test whether vocal variables predict expressive language, which is predicted by theory and evidence (e.g., Goldstein et al. 2003; Goldstein and Schwade 2008; McDaniel et al. 2018; McLean and Snyder-McLean 1978; Sameroff and Chandler 1975; Warlaumont et al. 2014). For evidence of divergent validity, we test whether vocal variables predict nonverbal cognitive skills, which they would not be expected to do based on theory. Cognition and language are separate constructs. Particularly in children with disabilities, such as ASD, individuals can present with relatively low expressive language skills in the context of average or even above average nonverbal cognitive skills (e.g., Stark and Tallal 1981). Although no known studies provide evidence of divergent validity for the included vocal variables, evidence that vocal variables do not predict nonverbal cognitive skills would support an inference that the vocal variable is specific to vocal development rather than a more general developmental measure.

For evidence of sensitivity to change, we examine whether vocal variables exhibit a change in value from study initiation to 12 months later. The ability to capture change is a particularly important feature for an intervention study’s outcome measure. The social feedback theory, social feedback loop, and transactional theory of spoken language development assert that vocalizations are expected to increase in complexity and to be used for communicative purposes as children develop (Goldstein et al. 2003; Goldstein and Schwade 2008; McLean and Snyder-McLean 1978; Sameroff and Chandler 1975; Warlaumont et al. 2014). Therefore, vocal variables assessing complexity and communicative use are expected to increase over time.

Purpose and Research Questions

This study evaluates and compares quantitative evidence of validity for vocal variables purported to assess vocal development in young children with ASD. We address the following research questions for two vocal variables purported to assess vocal communication (i.e., number of communication acts that include a vocalization and proportion of vocalizations that are communicative) and two purported to assess vocal complexity (i.e., DKCC and proportion of vocalizations with a canonical syllable). (1) To assess convergent validity, does the vocal variable predict later expressive language skills? (2) To assess divergent validity, does the vocal variable not predict later nonverbal cognitive skills? (3) Does the vocal variable exhibit sensitivity to change?

Method

Participants

The study includes 87 children (21 female, 66 male) who participated in the Toddlers with Autism: Developing Opportunities for Learning (TADPOLE) multi-site randomized controlled trial (Rogers et al. 2013). The TADPOLE study compared language and developmental outcomes of a sample of young children with ASD who were randomly assigned to one of two treatment styles and one of two treatment intensity levels. However, the results used for evidence of construct validity and sensitivity to change were not influenced by groups to which participants were assigned as indicated by nonsignificant predictor by group interactions. When an interaction term for style by predictor or intensity by predictor was added to each growth curve model used in the convergent validity analyses, the interaction term was nonsignificant. Similarly, for the sensitivity to change analyses, when style by time or intensity by time interaction terms were added, none were significant.

Participants met the following inclusion criteria: (a) chronological age of 13 to 30 months at study entry, (b) ambulatory without primary motor impairments affecting hand use, (c) meets ASD diagnostic criteria, (d) overall developmental quotient of at least 35 on the Mullen Scales of Early Learning (MSEL; Mullen 1995), (e) English as a primary language (i.e., English reportedly spoken at least 60% of the time at home), and (f) hearing and visual acuity within normal limits per screening. ASD diagnosis was based on: (a) Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition diagnostic criteria for ASD (American Psychiatric Association 2013), (b) clinical consensus of diagnosis based on record review and observation by two independent staff, one of whom is a licensed psychologist, (c) meeting full autism criteria on the Autism Diagnostic Interview-Revised (Lord et al. 1994), (d) meeting autism cutoff on the Autism Diagnostic Observation Schedule for Toddlers (Luyster et al. 2009), and (e) diagnosis confidence rating of relatively confident or very confident assigned by the assessor who evaluated the child. Participants were not excluded based on the presence of genetic disorders or other health conditions.

Per caregiver report, 48 participants were reported to be white, 19 to be more than one race, 9 to be Asian, 7 to be black or African American, 1 to be American Indian/Alaskan native, 1 to be Native Hawaiian or other Pacific Islander, and 2 as unknown. Seventeen participants were reported to be Hispanic/Latino, 64 to be non-Hispanic, and 6 as unknown. Maternal education level was reported as follows: 1 had some high school, 6 had a high school diploma, 25 had some college, 24 had a college degree, 6 had some graduate school, 22 had a graduate degree, and 1 reported “other.” See Table 1 for additional participant characteristics.

Table 1 Participant characteristics at study entry

Procedures

The study’s constructs, procedures, and variables are listed in Table 2. Data are used from procedures administered across three time periods that spanned 12 months (Time 1 = study initiation; Time 2 = 6 months post study initiation; Time 3 = 12 months post study initiation).

Table 2 Study constructs, procedures, and variables

Communication Sample Procedure

The Communication Sample Procedure (CSP) is a 15-min semi-structured free-play communication sample with a standard toy set in a lab setting with interspersed opportunities for the child to request clarification and to respond to an examiner’s topic change. The examiner’s interaction style is guided by specific principles designed to support productive engagement (e.g., follow the child’s lead and join in and play at the child’s demonstrated level of play) and communication (e.g., talking about topics related to child’s focus of attention, monitoring utterance length and complexity, and avoiding directives) as described in the procedure manual. This manual is available upon request from the first author. The CSP was administered at Times 1 and 3.

Early Communication Index (Greenwood et al. 2006; Luze et al. 2001)

The Early Communication Index (ECI), one of the Individual Growth and Development Inventories, is a 6-min play-based measure that uses a standard toy set in a lab setting. The ECI has been validated on multiple samples of young children that include more than 7000 total children (Greenwood et al. 2006, 2010; Luze et al. 2001). The samples include racially diverse children, most of whom attended Early Head Start and some of whom had a disability. The general principles of examiner behavior and talk followed in the CSP are followed in the ECI as well. The ECI may be used frequently to monitor progress during intervention. It was administered monthly throughout the 12-month period for a total of 13 administrations. To align with the data from the 15-min CSP, we averaged the ECI sessions from the first 3 months for Time 1 for a total of 18 min of coded time and the sessions from the last 3 months for Time 3 for a total of 18 min of coded time. Using three ECI sessions for each time point increased the stability of the coded variables.

MacArthur-Bates Communicative Development Inventory (Fenson et al. 2007)

Caregivers completed a compilation form (i.e., 720 total items from the Words and Gestures and Words and Sentences vocabulary items) of the MacArthur-Bates Communicative Development Inventory (MB-CDI) for expressive vocabulary at all three time points. Caregivers marked words on the checklist that they observed their child saying at least once in the prior 2 weeks.

Mullen Scales of Early Learning (Mullen 1995)

The Mullen Scales of Early Learning (MSEL) was administered at three time points. The MSEL includes subscale scores for receptive language, expressive language, visual reception, and fine motor skills.

Vineland Adaptive Behavior Scales, Second Edition (Sparrow et al. 2005)

The examiner interviewed the participants’ caregiver(s) to complete the Vineland Adaptive Behavior Scales, Second Edition (VABS) at three time points. The VABS includes subscale scores for expressive language, daily living, and fine motor skills.

Coding Vocal Variables

Trained research assistants and the first author completed observational coding for the CSP and ECI using ProCoder DV (Tapp 2003) and Systematic Analysis of Language Transcripts (SALT) software (Miller and Chapman 2016). Coders completed four passes using timed-event behavior sampling to code behaviors in the CSP and ECI necessary for deriving the vocal development variables. On the first pass, the coder identified codable and uncodable portions of each video file. On the second pass, the coder identified all communication acts within the codable time and orthographically transcribed words that the child said. The coding manual, which is available from the first author, includes detailed communication act coding rules. See Table 3 for operational definitions of key concepts for coding. Within the third pass, the coder identified and classified vocalizations that occurred within a communication act to indicate whether they contained one or more codable consonants (i.e., /m/, /n/, /b/ or /p/, /d/ or /t/, /g/ or /k/, /w/, /l/, “y,” /s/, and “sh”) and/or a canonical syllable. Because the coding for the larger project focused only on communication, a fourth pass was used to code vocalizations that occurred outside of communication acts. Thus, the coder listened to the entire recording stopping it each time she heard a vocalization. For any vocalizations not already coded as part of a communication act, the coder marked the vocalization as a non-communicative vocalization and indicated whether it included one or more codable consonants and/or a canonical syllable. SALT software was used to calculate the CSP and ECI vocal communication and vocal complexity variables. See Table 2 for the study variables.

Table 3 Operational definitions of key concepts for coding

For communicative vocalizations, two variables were calculated: the number of communication acts that include a vocalization and the proportion of vocalizations that are communicative (i.e., number of vocalizations within a communication act divided by the total number of vocalizations). For complexity, two additional variables were calculated: DKCC (Wetherby et al. 2007; Woynaroski et al. 2017) and the proportion of vocalizations with a canonical syllable (regardless of communicativeness).
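The two proportion variables defined above can be illustrated with a short sketch. The `Vocalization` record, its field names, and the choice to return `None` when a session contains no vocalizations are assumptions made for illustration; they do not describe how SALT computes these values.

```python
from dataclasses import dataclass


@dataclass
class Vocalization:
    """One coded vocalization (hypothetical record layout)."""
    communicative: bool  # occurred within a communication act
    canonical: bool      # contains a canonical syllable


def proportion(vocalizations, attribute):
    """Share of vocalizations for which the named boolean field is True.

    Returns None for a session with no vocalizations, because the
    proportion is undefined in that case (an assumption of this sketch).
    """
    if not vocalizations:
        return None
    hits = sum(getattr(v, attribute) for v in vocalizations)
    return hits / len(vocalizations)


# Illustrative session: 3 of 4 vocalizations are communicative and
# 2 of 4 contain a canonical syllable.
session = [
    Vocalization(communicative=True, canonical=True),
    Vocalization(communicative=True, canonical=False),
    Vocalization(communicative=True, canonical=False),
    Vocalization(communicative=False, canonical=True),
]
print(proportion(session, "communicative"))  # 0.75
print(proportion(session, "canonical"))      # 0.5
```

Both proportions share the same denominator (all vocalizations), which is why the canonical-syllable proportion is computed regardless of communicativeness.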

Interobserver Reliability

A trained secondary coder independently coded a random sample of ≥ 20% of coded sessions for each time point for CSP and ECI variables. The primary coder was blind to which sessions would be coded for reliability. Training included reading the coding manual and an initial training session with an expert coder including a didactic presentation, a question and answer session, and group coding of non-participants with discrepancy discussions. After the initial training session, coders independently coded novel videos and participated in discrepancy discussions until the secondary coder reached criterion of at least .80 small/large agreement for all variables on three consecutive videos (Yoder et al. 2018). After initial training was complete, coders completed discrepancy discussions for each reliability set (i.e., group of five videos from which one reliability video was randomly chosen and completed for reliability before proceeding to the next set) to prevent coder drift. The primary coder’s coding was used in the analyses. Interobserver reliability was estimated using intraclass correlation coefficients (ICCs) with absolute agreement and participant and observer as random factors. ICCs account for differences in unitizing and classifying behaviors between coders and for the variance among participants on the component variables addressing the research questions.
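For readers unfamiliar with this type of ICC, the following is a minimal sketch of a two-way random-effects, absolute-agreement coefficient computed from a complete sessions-by-coders table. It assumes the single-measures form (often labeled ICC(2,1)); the text above does not state whether single or average measures were used, and the function name and example scores are hypothetical.

```python
def icc_absolute_agreement(ratings):
    """Two-way random-effects, absolute-agreement, single-measures ICC.

    ratings: list of rows, one per coded session, each row holding one
    score per coder. Assumes a complete table with no missing scores.
    """
    n = len(ratings)     # number of sessions
    k = len(ratings[0])  # number of coders
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for sessions, coders, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-session mean square
    msc = ss_cols / (k - 1)               # between-coder mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores from two coders on three sessions.
identical = [[1, 1], [2, 2], [3, 3]]
offset = [[1, 2], [2, 3], [3, 4]]  # second coder is always one point higher
print(icc_absolute_agreement(identical))
print(icc_absolute_agreement(offset))
```

In the second example the coders rank the sessions identically, yet the coefficient falls below 1 because absolute agreement penalizes a systematic difference between coders. This is the property that distinguishes it from a consistency-type ICC.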

Results

Preliminary Analyses

Reliability

For all conventionally-coded variables combined, the mean ICC was .93 (SD = .11). Table 4 displays ICCs for the conventionally-coded variables by time period and procedure. Means and standard deviations are reported for ECI ICCs because these values were calculated from months 1 through 3 for Time 1 and months 11 through 13 for Time 3. We used a benchmark of .70 when interpreting the ICCs, which Mitchell (1979) interpreted as “very good”.

Table 4 Intraclass correlation coefficients for conventionally-coded vocal variables by time and procedure

Creating Composite Variables

Six composite variables were computed: one for expressive language, one for nonverbal cognitive skills, and four for the vocal variables if the intercorrelation among component variables posited to measure the same construct warranted it. To allow composite scores to show growth, we calculated and averaged the z-scores for each component variable using the sample’s Time 3 mean and standard deviation.
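The standardization step described above can be sketched as follows. The data layout, the function name, the use of the sample standard deviation, and the example scores are assumptions for illustration only; handling of missing scores is not shown.

```python
from statistics import mean, stdev


def composite_scores(components, reference_time):
    """Average z-scores across component variables at each time point.

    components: {component name: {time point: [score per child]}}.
    Every component is standardized with its own mean and SD at
    `reference_time`, so composites at other time points stay on a
    common scale and can show growth.
    """
    standardized = {}
    for name, by_time in components.items():
        ref_mean = mean(by_time[reference_time])
        ref_sd = stdev(by_time[reference_time])  # sample SD (assumption)
        standardized[name] = {
            t: [(x - ref_mean) / ref_sd for x in scores]
            for t, scores in by_time.items()
        }
    times = next(iter(components.values())).keys()
    return {
        t: [mean(child) for child in zip(*(standardized[n][t] for n in components))]
        for t in times
    }


# Hypothetical scores for three children on two component measures.
components = {
    "measure_a": {1: [2, 4, 6], 3: [10, 20, 30]},
    "measure_b": {1: [1, 2, 3], 3: [4, 6, 8]},
}
result = composite_scores(components, reference_time=3)
print(result[1])  # Time 1 composites, all below zero
print(result[3])  # Time 3 composites, centered on zero
```

Had each time point been standardized against its own mean and SD, every composite would average zero at every time point and group-level growth would be invisible. Anchoring on Time 3 avoids this.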

The expressive language component variables (see Table 2) correlated with each other at r ≥ .40, our a priori criterion for aggregating, at both time periods. The nonverbal cognitive skills component variables correlated with each other at r ≥ .40 at each time point, except for Time 2 VABS fine motor skills and the Time 2 MSEL visual reception subscale (r = .37). Because these components correlated sufficiently at Times 1 and 3 and with all other component variables at all time points, both were retained for the Time 2 composite. For the four vocal variables, we created composite variables across the CSP and ECI sampling procedures for Time 1 and another composite for Time 3. For each vocal variable, we summed the results from the ECI Month 1, ECI Month 2, and ECI Month 3 for the Time 1 ECI value. We then averaged the Time 1 ECI value with the Time 1 CSP. Analogously, we summed the results from the ECI Months 11, 12, and 13 for the Time 3 ECI value. We then averaged the Time 3 ECI value with the Time 3 CSP. Time 1 CSP and Time 1 ECI values, as well as Time 3 CSP and Time 3 ECI values, correlated at r ≥ .60 for all four vocal variables, which is above the r ≥ .40 criterion.

Evaluating Evidence of Convergent Validity

For evidence of convergent validity, we tested whether each vocal variable predicted later expressive language skills using growth curve modeling with full maximum likelihood estimation (Enders 2010). Because time in study was centered at Time 3, the intercepts of the growth model are interpretable as the participants’ expressive language skills at the final study period. Significant fixed coefficients for the predictor variable of a model with the expressive language composite as the dependent variable provide evidence of convergent validity.

The initial step of multilevel modeling is to identify the unconditional growth model. We used a build-up approach for model selection. Although the data fit the random intercept, random slope model better than the random intercept, fixed slope model (p < .001), the correlation between slope and intercept was very high (r = .92). Due to this high covariance of the intercept and slope and the desire to use the most parsimonious growth model, we chose to use the random intercept, fixed slope model. The growth parameter of interest was intercept, which was interpreted as the best estimate of end point (Time 3) expressive language.
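A comparison of nested growth models of this kind is commonly carried out as a likelihood ratio test on the model deviances. The sketch below assumes that approach, which the text does not confirm, and assumes that the random slope model adds two parameters (the slope variance and the intercept-slope covariance). The deviance values are hypothetical. For a chi-square distribution with 2 degrees of freedom, the upper-tail probability has the closed form exp(-x / 2).

```python
from math import exp


def lrt_p_value_df2(deviance_reduced, deviance_full):
    """Likelihood ratio statistic and p-value for 2 degrees of freedom.

    The statistic is the drop in deviance (-2 log-likelihood) from the
    reduced model to the full model.
    """
    statistic = deviance_reduced - deviance_full
    if statistic < 0:
        raise ValueError("the full model cannot fit worse than the reduced model")
    # Chi-square survival function for df = 2 is exp(-x / 2)
    return statistic, exp(-statistic / 2)


# Hypothetical deviances for the fixed slope and random slope models
statistic, p_value = lrt_p_value_df2(deviance_reduced=412.6, deviance_full=380.2)
better_fit = p_value < 0.001
```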

To evaluate evidence of convergent validity, we added each vocal variable, one at a time, to the random intercept, fixed slope model predicting the end point-centered intercept of growth of expressive language (i.e., best estimate of Time 3 expressive language). All variables were significant predictors with positive associations (see Table 5). See Supplemental Material Fig. 1 for an example depiction of the association between a vocal variable and end point expressive language. Specifically, the association between DKCC at Time 3 and the predicted Time 3 expressive language value is displayed. No evidence of heteroscedasticity was observed. All residuals fell within the acceptable parameters for skewness (< |.8|) and kurtosis (< |3.0|; Tabachnick and Fidell 2001). Effect size was quantified with pseudo R². Mathematically, pseudo R² is the residual variance of the intercept in the reduced model (i.e., excludes vocal predictor variable of interest) minus the residual variance of the intercept in the full model (i.e., includes vocal predictor variable of interest), divided by the reduced model’s residual variance of the intercept. Conceptually, pseudo R² is the proportion of variance in the growth parameter that is explained by the more elaborate or full model relative to the less elaborate or unconditional model. Using pseudo R² ≥ .25 as an indication of a large effect size, all four vocal variables at Time 1 have a large association with the best estimate of later expressive language.
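The pseudo R² computation defined above reduces to a one-line calculation, sketched here with hypothetical variance estimates. The function name and the values are illustrative only.

```python
def pseudo_r2(intercept_var_reduced, intercept_var_full):
    """Proportional reduction in residual intercept variance.

    intercept_var_reduced: residual intercept variance without the predictor
    intercept_var_full: residual intercept variance with the predictor
    """
    if intercept_var_reduced <= 0:
        raise ValueError("reduced-model variance must be positive")
    return (intercept_var_reduced - intercept_var_full) / intercept_var_reduced


# Hypothetical estimates: adding the predictor halves the intercept variance
effect = pseudo_r2(intercept_var_reduced=0.80, intercept_var_full=0.40)
is_large = effect >= 0.25  # benchmark for a large effect used in the text
```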

Table 5 Fixed effects estimates for vocal variables predicting end point expressive language

Evaluating Evidence of Divergent Validity

For evidence of divergent validity, we evaluated whether each vocal development variable predicted the end point-centered intercept of the growth of nonverbal cognitive skills (i.e., best estimate of Time 3 nonverbal cognitive skills). Theoretically, vocalization measures should not be significant predictors of nonverbal cognitive skills. As before, time in study is centered at Time 3 to yield intercepts interpretable for the participants’ skills at the final study period. Associations with the intercept of growth on nonverbal cognitive skills are expected to be low and nonsignificant. Given the large sample size, we rely on significance to define low.

We used a build-up approach to select the model predicting nonverbal cognitive skills. As with expressive language, the data fit the random intercept, random slope model better than the random intercept, fixed slope model (p < .001). Unlike in the expressive language model, the correlation between the intercept and slope in the random intercept, random slope model was low enough (r = .79) to retain the random slope. To parallel the logic used in the expressive language models, the intercept was the parameter of interest. None of the vocal variables were significant predictors of end point nonverbal cognitive skills (see Table 6). No evidence of heteroscedasticity was observed. All residuals fell within the acceptable parameters for skewness and kurtosis (Tabachnick and Fidell 2001).

Table 6 Fixed effects estimates for main effects of vocal variables predicting end point nonverbal cognitive skills

Evaluating Evidence of Sensitivity to Change

A significant difference between Time 1 and Time 3 via a paired t-test is evidence of sensitivity to change (see Table 7). All of the variables exhibited evidence of sensitivity to change and exceeded the benchmark for large effect size (i.e., Cohen’s d ≥ .80).
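The paired comparison can be sketched as follows. Several definitions of Cohen's d exist for paired data, and the text does not specify which one was used. This sketch assumes the mean difference divided by the standard deviation of the differences. The scores for five children are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_and_d(time1, time3):
    """Return (t statistic, Cohen's d for paired data, degrees of freedom)."""
    diffs = [later - earlier for earlier, later in zip(time1, time3)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample SD of the difference scores
    t_stat = mean_diff / (sd_diff / sqrt(n))
    d = mean_diff / sd_diff  # assumed definition; others use a pooled SD
    return t_stat, d, n - 1


# Hypothetical vocal variable scores for five children
time1_scores = [2, 3, 5, 4, 6]
time3_scores = [5, 5, 9, 7, 8]

t_stat, d, df = paired_t_and_d(time1_scores, time3_scores)
large_effect = d >= 0.80  # benchmark for a large effect used in the text
```

The t statistic would then be compared against a t distribution with the returned degrees of freedom to obtain the p-value.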

Table 7 Results of paired t-tests from Time 1 to Time 3

Discussion

Summary of Relative Validity of Vocal Variables

The two vocal variables purported to assess communicative vocalizations (i.e., number of communication acts that include a vocalization and proportion of vocalizations that are communicative) and the two purported to assess vocal complexity (i.e., DKCC and proportion of vocalizations with a canonical syllable) exhibited consistent evidence for convergent validity, divergent validity, and sensitivity to change. Whether the presence or absence of significant results or the effect size is used as the criterion for judging validity, all four variables showed consistent positive evidence for all three purposes. As further indication of strong validity evidence, the effect sizes ranged from pseudo R² = .44 to .60 for convergent validity and d = 0.88 to 1.32 for sensitivity to change. DKCC had the largest effect size for convergent validity and sensitivity to change. Large effect sizes increase our confidence that the findings will replicate. The convergence across validity comparison methods bolsters confidence in the conclusions.

The sensitivity to change evidence provides key information for planning intervention studies. For example, to use a vocal variable to show response to intervention, or as a mediator that explains why a given intervention is effective in facilitating language in preverbal children with ASD, one must select a vocal variable that captures change over time.

The Current Study Findings Relative to the Extant Literature

The social feedback theory, social feedback loop, transactional theory of spoken language development, and child-driven theories all support using vocal communication and vocal complexity variables as putative predictors of expressive language. Having multiple theories suggest that a variable or construct should predict language strengthens the rationale for using that variable or construct. The current study was not designed to test one theoretical framework against another. All of these theories predict that vocal communication and vocal complexity should change over time. Additionally, the bidirectional theories suggest that increases in child vocal communication and child vocal complexity will elicit more frequent or complex adult responses to scaffold the child’s ability to produce more adult-like productions, including spoken words. Future work will be needed to test this prediction. The construct validity evidence, in which the vocal communication and vocal complexity variables predict later expressive language but not later nonverbal cognition, supports the assertion that the examined measures of vocal communication and vocal complexity measure the constructs they are presumed to measure.

Convergent validity evidence based on significant correlations with current or later expressive language has been reported for vocal communication (Plumb 2008) and DKCC (Wetherby et al. 2007; Woynaroski 2014; Woynaroski et al. 2017; Yoder et al. 2015a). Thus, these findings from the current study are replications, which decreases the likelihood that they are sample-specific.

Some of the many tests of significance for the association between vocal communication and language are significant, while others are nonsignificant. For example, Plumb and Wetherby (2013) found that the proportion of vocalizations that are communicative was associated with the Speech subscale of the Communication and Symbolic Behavior Scales, a measure of speech level, but unrelated to the MSEL verbal developmental quotient, a measure of language delay. Swineford (2011) found that rate of vocal communication acts was unrelated to expressive language. The current study may have identified these previously unidentified relations in part due to different metrics for the vocal variable, different metrics for the language variable, the relatively large sample size in the current study, use of growth curve modeling to generate a better estimate of end point expressive language rather than a single observed measure (Singer and Willett 2003), and the focus on expressive language level rather than expressive language delay. In comparison to the current study, Plumb and Wetherby (2013) and Swineford (2011) included fewer participants and quantified expressive language with a single measure rather than multiple measures.

No known prior studies report convergent validity for the proportion of vocalizations that include a canonical syllable in children with ASD. However, Williams (2013) reported a nonsignificant relation between a similar variable (i.e., percent of syllables that are canonical) and language composites for fifteen 6-month-old siblings of children with ASD. The fact that most infant siblings will not have ASD, the relatively younger and smaller sample, and the use of different language measures relative to the current study may explain the incongruent findings.

No prior studies reporting evidence for divergent validity for any of the study variables with children with ASD were located. For sensitivity to change, only one study with children with ASD was located. Woynaroski et al. (2016) reported significant positive simple linear growth in DKCC for 87 initially preverbal children with ASD (mean age = 34.7 months, SD = 7.2 months) across 16 months in a longitudinal correlational study. Thus, the evidence for sensitivity to change for DKCC in the current sample replicates this finding in a more diverse sample of participants with ASD.

Limitations

Four limitations should be acknowledged. First, validation refers to a specific variable, use, and population (Yoder et al. 2018). Therefore, findings from this study may not directly transfer to other variables derived from the same data collection methods, other uses, or other populations. Second, multiple t-tests were conducted without alpha adjustment when assessing significance of predicted associations and change, which increases the risk of Type I errors. Although using composites partially addresses concerns about family-wise error due to multiple significance tests without alpha adjustment, there are still many significance tests per research question. Replications of associations with expressive language that are new to the field are needed to ensure those findings are not sample-specific. Despite some novel findings, many of the predictors of expressive language have been detected in other samples of children with ASD, as described above. It is unlikely that replicated associations are significant due to unadjusted multiple significance testing. Third, clear divisions among variables that measure vocal communication and vocal complexity were not possible in every case. For example, DKCC reflects both vocal communication and vocal complexity. Last, the correlational and single-group pre-post design used to test the validity of the selected vocal variables prevents confident inferences that predictors cause criterion variables or that treatments caused the change in the vocal variables.

Strengths

Five strengths should be acknowledged. First, we addressed not only convergent validity, but also divergent validity when assessing construct validity. Divergent validity evidence is notably sparse in the literature. Thus, the current findings provide a unique contribution to the literature, particularly for vocal measures for children with ASD. Second, we used multilevel modeling to provide the best estimate of end point expressive language and end point nonverbal cognitive skills, rather than relying on the observed value (Singer and Willett 2003). Third, the study duration of 12 months provides a relatively long, and meaningful, period of time for predicting growth. Intervention goals are often written for yearlong intervals. The current study design permitted addressing predictive validity, one important purpose for which vocal variables are often needed. Fourth, this study includes a relatively large sample size for this population, which increased the power to detect effects and permitted the use of multilevel models with the necessary number of predictors to address the research questions. Fifth, composite variables were used to increase the reliability of construct estimates relative to single-measure constructs.

Implications

The results provide guidance for selecting variables for a variety of studies related to vocal development and language development of children with ASD. Overall, the results support the measurement of vocal communication and vocal complexity when derived by human coding of communication samples to assess vocal level and vocal development in young children with ASD. Potential purposes for assessing and targeting vocal development include increasing the effectiveness of communication intervention, identifying early response to such interventions, and explaining why or for whom communication interventions are effective in preschool-aged children with ASD. For example, when selecting variables that might mediate treatment effects on expressive language, the findings suggest that using vocal communication and vocal complexity variables may maximize the probability of detecting the putative mediated effect of early language interaction on expressive language through midpoint vocal development.

Future Directions

Additional investigations are needed to compare the validity of these vocal variables to less costly variables, including automated measures. Using variables that are more elaborate and/or expensive due to research staff training and coding costs is justifiable only when the more elaborate or more costly vocal variables yield more useful results than the less elaborate or less costly measures. Assessing volubility (i.e., how frequently a child vocalizes) is less elaborate than assessing vocal communication and vocal complexity, and therefore less expensive to code (i.e., lower training and coding costs). Automated measures are also less expensive to code than variables coded conventionally from communication samples. Examples of automated variables from day-long audio samples from the Language ENvironment Analysis (LENA) system for volubility or complexity include the number of child speech-related vocalizations (LENA 2015), the average count per utterance of consonants and vowels (Woynaroski et al. 2017; Xu et al. 2014), and the infraphonological vocal development (IVD) score (Oller et al. 2010). Whether conventionally coded variables account for unique variance above and beyond less elaborate or less expensive measures should be evaluated empirically.

In addition, further investigation is required to validate the tested variables with other populations of children, such as children with language impairment without ASD or children with ASD at different communication levels. Relatedly, because construct validity is judged based on a network of constructs called nodes (Cronbach and Meehl 1955), additional nodes for convergent and divergent validity may be explored to increase the confidence in the results. For instance, correlations with receptive language may be considered for convergent validity nodes. It may not be intuitive to hypothesize that vocal variables may predict receptive language, but receptive language has predicted DKCC (Woynaroski et al. 2016). One possible explanation for this finding is that children may be trying to use words they understand prior to being able to make themselves understood. Other divergent validity nodes could also be considered.

Whether variables that may most readily transfer to clinical practice (e.g., DKCC and proportion of vocalizations with canonical syllables) can be coded live reliably, and with what amount of training, warrants further investigation. If these variables can be coded reliably, their use may be encouraged within clinical and research settings with appropriate training.

Conclusion

The current study offers crucial new knowledge that can help the broader scientific community measure vocal development within and across young children with ASD. Key findings include strong evidence of convergent validity (i.e., prediction of later expressive language), divergent validity, and sensitivity to change for measures of vocal communication and vocal complexity derived from conventional coding of communication samples in young children with ASD. These results support the use of conventional measures of vocal communication and vocal complexity in future studies of communication intervention in children with ASD.