Introduction

In the past decade, there has been substantial progress in the discovery of loci and genes contributing risk for autism spectrum disorder (ASD). Many of these advances have been made possible by large multi-site studies (Fischbach and Lord 2010) and international consortia (Buxbaum et al. 2012), which have assembled samples of several thousand ASD cases, parents, and unaffected sibling controls. While these samples have been sufficient to advance gene discovery through the identification of rare and de novo variants with large effects, it is estimated that identification of common risk alleles, which will generally yield increases in susceptibility of less than 1.5-fold, will require tens of thousands of cases (Geschwind and State 2015). The cost of behavioral phenotyping presents a formidable challenge for such large-scale investigations. The focus of this study is to demonstrate the validity of using the Peabody Picture Vocabulary Test—4th Edition (PPVT-4; Dunn and Dunn 2007) as a lower-cost, less time-intensive alternative to traditional cognitive batteries to obtain a proxy for verbal IQ (VIQ).

Aside from the millions of dollars of funding required to ascertain and sequence large samples, considerable resources have also been dedicated to case identification and behavioral phenotyping. The nature of these assessments varies widely, from case confirmation based on cut-offs from caregiver questionnaires to multimodal behavioral batteries administered and interpreted by one or more highly trained clinicians. Studies suggest that behavioral subtyping has a modest effect on genetic homogeneity, thereby doing little to improve power in genome-wide association studies (Chaste et al. 2015). Nonetheless, the collection of a broad array of behavioral measures has allowed for phenotype–genotype analyses that have shown associations between clinical profiles and particular genetic syndromes (e.g., social anxiety in individuals with duplications at 7q11.23; Sanders et al. 2011; Mervis et al. 2015). In particular, a reduction in IQ has been one of the most consistent findings associated with de novo mutations in ASD. Importantly, however, an excess of mutations is also found in individuals with ASD who have average and above average IQs (Sanders et al. 2015). Further analysis of the relationship between IQ, ASD, and genotype is limited by the lack of IQ data for children with ASD in some cohorts and, in most databases, unaffected parents and siblings. The availability of IQ data in parents and siblings would enable us to begin to better understand the relative contributions of de novo and inherited genetic variation to IQ in affected individuals. The necessity of taking into account parental estimates when assessing abnormalities in heritable traits has been demonstrated for head circumference (Chaste et al. 2013). 
While that study demonstrated that far fewer probands had abnormally large or small head sizes (given parental head size), the ability to compare a proband’s measured cognitive ability to what is “expected” given heritability of IQ would help to understand the extent to which risk variants influence aberrations in IQ.

Collection of IQ data presents a challenge to large-scale ASD investigations. Children with ASD span the full IQ range, from superior cognitive ability to profound intellectual disability (ID). Therefore, a single study may implement multiple different cognitive instruments, including use of tests outside the standardized age range to derive estimates for children with ID. For individuals who can complete standard testing, cognitive batteries often require over an hour to be administered by clinical staff with training in psychometrics. Even abbreviated cognitive batteries requiring 30 min may be prohibitive in studies aiming to recruit thousands of participants, particularly those seeking to establish estimates for non-affected family members.

In contrast to cognitive tests, the PPVT-4 is a standardized measure of one-word receptive vocabulary that can be administered in 10–15 min by research staff with minimal training (Dunn and Dunn 2007). This test is available in both paper and digital form, potentially providing further flexibility in administration. Moreover, the PPVT-4 does not necessitate spoken language and has norms for 2–90 year olds, allowing use of the same test across individuals of different ages and varying cognitive abilities.

Since the publication of the first version of the PPVT in 1959, researchers and clinicians have been interested in using this brief assessment as a cognitive screener or proxy for verbal IQ. Early studies of the original PPVT and the Wechsler Adult Intelligence Scale yielded inconsistent results, with some studies suggesting that the PPVT underestimates IQ and others reporting that the PPVT overestimates abilities (Altepeter and Johnson 1989; Carvajal et al. 1989; Craig and Olson 1991; Mangiaracina and Simon 1986; Maxwell and Wise 1984; Price et al. 1990; Prout and Schwartz 1984; Stevenson 1986). Subsequent versions of the PPVT have also been reported to demonstrate moderate to strong correlations with measures of verbal IQ (Dunn and Dunn 1981, 1997). For example, in a study of typically developing young children of low socio-economic status, Campbell et al. (2001) reported a correlation of 0.58 between PPVT-III and the Kaufman Mental Processing Composite. Whilst PPVT-III scores tended to be lower than the Kaufman (mean difference = 8.3 points), broad IQ classifications (i.e., <70 on both measures) were accurate in 83% of cases. In a small study of adult college students, PPVT-III was moderately correlated (0.46) with VIQ from the Wechsler Adult Intelligence Scale (Bell et al. 2001). Notably, 33% of participants exhibited PPVT-WAIS VIQ differences greater than 10 points; the majority of these cases were PPVT underestimating VIQ for individuals in the superior range (i.e., 120+). Fewer studies have investigated validity of the current PPVT, PPVT-4, though there is some evidence suggesting similar associations with the Leiter-R, a measure of nonverbal cognitive ability, across small samples of typically developing school-aged children and youth with Down Syndrome or idiopathic ID (Phillips et al. 2014).

Although using the PPVT-4 has many advantages over traditional cognitive assessments, its use as a proxy for VIQ (or intelligence more broadly) also carries considerable limitations. Intelligence is a complex construct comprising multiple different cognitive processes. Although there are many competing theories of intelligence (Reschly et al. 2002), most cognitive batteries provide separate estimates for VIQ and nonverbal intelligence (NVIQ). Verbal IQ is generally thought to reflect crystallized intelligence (i.e., based on knowledge and experience). Nonverbal IQ is associated with fluid intelligence, or the capacity to reason about problems, independent of past knowledge. A Full-Scale IQ or General Conceptual Ability score often combines both VIQ and NVIQ, and sometimes other aspects of cognition, such as memory or processing speed. Use of a brief receptive vocabulary measure will not capture potentially discrepant VIQ–NVIQ profiles (e.g., Joseph et al. 2002; Bal et al. 2016). Impairments in cognitive processes such as memory or processing speed are also not assessed using the PPVT-4.

Even within VIQ assessments, there is a combination of abilities assessed, including both expressive and receptive skills. Studies suggest that rates of receptive and expressive vocabulary development may differ for some children with ASD (e.g., Kover et al. 2013). Consequently, as a measure of one-word receptive vocabulary, the PPVT-4 may over- or underestimate VIQ in cases where there are discrepant expressive and receptive abilities. Although specific language profiles are not consistent on a larger group level across studies, there remains the possibility that small subgroups of children show discrepant abilities (Kwok et al. 2015), which would suggest caution is warranted when using the PPVT-4 as a proxy for VIQ.

While there are limitations to any brief assessment, it is important to note that the measurement of IQ in genetic studies (and many ASD studies more broadly) is also affected by methodological limitations. Studies combine IQ estimates from multiple instruments, which may be derived from divergent theories of intelligence assessing different cognitive processes. Nonetheless, there have been consistent associations between broad IQ estimates and genetic mutations. The extraordinarily large sample sizes required of genetic investigations will preclude comprehensive, detailed assessments, particularly when such investigations seek to include family members. The potential for a relatively quicker, although less precise, estimate will enable researchers to answer initial questions (e.g., relative contributions of de novo and inherited variants to IQ as described above). Once a broad understanding of the effect that risk mutations seem to have for IQ is established, more comprehensive evaluations to parse out the specific cognitive or language processes underlying the observed effect can be carried out in smaller genetics-first studies. An example of this is a study by Hippolyte et al. (2016), in which a detailed neuropsychological assessment was used to characterize different cognitive domains affected by CNVs at the 16p11.2 locus. Thus, while there is potential for genotype–phenotype associations to be missed when using proxies for complex phenotypes such as IQ, the costs of these kinds of comprehensive assessments on tens of thousands of probands and family members must be weighed against the proportion expected to have an identified de novo CNV or LoF risk mutation (e.g., 10.5% of simplex cases across all different loci, with any given mutation representing fewer than 1%; Sanders et al. 2015).

The PPVT is commonly used in ASD research, particularly for genetic studies and other large consortia (e.g., Staal et al. 2012; McCray et al. 2014; Sykes et al. 2009). Although an ideal assessment would encompass both verbal and nonverbal abilities, in studies where resources limit the depth of characterization, a proxy for VIQ may be of particular interest given findings suggesting VIQ, but not NVIQ, is a significant predictor of ASD symptom trajectories (Gotham et al. 2012). However, studies using the current edition of the PPVT, PPVT-4, primarily cite studies comparing PPVT-R or PPVT-III with older versions of cognitive assessments to justify its use as a proxy for VIQ. In one recent study, the PPVT-4 was moderately correlated with scores from the Raven Matrices (0.56), a nonverbal IQ assessment, but it was not compared to a measure of VIQ and the focus of the study was minimally verbal children with ASD (Plesa-Skwerer et al. 2015). No study has systematically investigated the convergent validity of the PPVT-4 and other cognitive measures in children with ASD across a wide range of ages and ability levels. Moreover, we are not aware of a study examining how the PPVT-4:VIQ association may differ when VIQ is estimated using different cognitive assessment instruments in any population. The range of ability levels observed in children with ASD necessitates use of multiple different cognitive tests in large samples, thereby making such examination an important part of validating the PPVT-4 as a proxy for VIQ.

This study specifically adds to the existing literature by (1) systematically investigating the association between PPVT-4 and VIQ assessed using several different traditional cognitive batteries in a large sample of children and adolescents with ASD who span the full range of cognitive abilities; and (2) demonstrating the utility of using PPVT-4 as a proxy for VIQ in genetic studies of ASD.

Methods

Participants

Participants were drawn from the Simons Simplex Collection (SSC), which includes 2,658 children, aged from 4 to 17 years and diagnosed with ASD. In addition to the focus on simplex families (defined as families comprised of one child with ASD who has no known first-, second-, or third-degree relatives with ASD), all children were required to have a nonverbal mental age of at least 18 months, meet ASD cut-offs on the ADOS and ADI-R, and receive a best estimate diagnosis of ASD by a clinician based upon all available information (see Hus et al. 2013). Participants with sensory impairments (e.g., blindness or deafness) that would interfere with standardized testing were excluded. Evaluations were conducted at one of 12 university-based sites; parents completed an ADI-R, Vineland, and questionnaires about the child with ASD. All children completed the ADOS and a cognitive assessment. Study procedures were approved by Institutional Review Boards at each university site. Parents provided informed consent and, when possible, children provided verbal assent. Only those children who completed the PPVT-4 were included in this current analysis (N = 2420, 86.4% male, 84.5% White, 88.7% non-Hispanic).

Measures

The Peabody Picture Vocabulary Test—4th Edition (PPVT-4; Dunn and Dunn 2007)

The PPVT-4 offers two alternative forms (A and B), each comprising 228 items grouped into 19 sets of 12 items arranged in increasing difficulty. Recommended start points by age or ability level, combined with basal and ceiling rules, limit administration time to 10–15 min. Raw scores are converted to age-normed standard scores with a mean of 100 and standard deviation of 15. The original validation study indicated strong psychometric properties, with split-half reliability and Cronbach’s alpha of 0.94 or higher for both forms and test–retest reliability of r = 0.93.

Cognitive Level

Verbal (VIQ) and nonverbal IQ (NVIQ) scores were derived from a standard developmental hierarchy of cognitive assessments. Most children received the Differential Ability Scales, 2nd edition (DAS-2; Elliott 2007), Early Years or School-Age battery (87.3% of sample for VIQ and 89.7% for NVIQ). The Mullen Scales of Early Learning (Mullen 1995; VIQ = 8.2% and NVIQ = 6.2%) or a Wechsler battery (Wechsler 1999, 2003; VIQ = 4.1%, NVIQ = 4.1%) was used for the remaining sample. In cases where children were unable to obtain a standard IQ estimate on the age-appropriate test, verbal ratio IQs were derived by averaging expressive and receptive age equivalents, then dividing by chronological age and multiplying by 100 (see Bishop et al. 2011). If a child was unable to achieve a basal score on either of the subtests required to derive VIQ, they were administered the next test in the hierarchy and a ratio IQ was derived. The difference between the PPVT-4 and VIQ estimates was computed as PPVT-4 minus VIQ. Discrepancies were defined as differences greater than 15 points (i.e., one standard deviation; ≤−16 = PPVT < VIQ, −15 to 15 = PPVT ≈ VIQ, ≥16 = PPVT > VIQ).
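The two computations described above can be illustrated with a minimal Python sketch. This is not the study's scoring code: the function names and the example child are hypothetical, age equivalents and chronological age are assumed to be in months, and the cut-off is written as "greater than 15 points" so that non-integer ratio IQs are also handled.

```python
def ratio_viq(expressive_ae, receptive_ae, chronological_age):
    """Ratio verbal IQ: mean age equivalent / chronological age * 100.

    All three arguments are assumed to share one unit (months here).
    """
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    mean_ae = (expressive_ae + receptive_ae) / 2
    return mean_ae / chronological_age * 100


def discrepancy_group(ppvt, viq):
    """Classify the PPVT-4 minus VIQ difference with a 15-point (one SD) cut-off."""
    diff = ppvt - viq
    if diff > 15:
        return "PPVT-4 > VIQ"
    if diff < -15:
        return "PPVT-4 < VIQ"
    return "PPVT-4 ~ VIQ"


# Hypothetical child: 96 months old, age equivalents of 30 and 42 months.
example_viq = ratio_viq(expressive_ae=30, receptive_ae=42, chronological_age=96)
# 36 / 96 * 100 = 37.5
example_group = discrepancy_group(ppvt=80, viq=example_viq)
```

For integer standard scores, the "greater than 15" rule gives the same three groups as the ≤−16, −15 to 15, and ≥16 bands stated in the text.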

Language Level

The Autism Diagnostic Observation Schedule (ADOS; Lord et al. 1999, 2012) is a semi-structured observational assessment used to inform ASD diagnoses. ADOS Module was used as a broad estimate of language level (Module 1: nonverbal or single words, Module 2: phrase speech, and Module 3 or 4: verbally fluent).

Statistical Analyses

Aim 1

Associations between PPVT-4, VIQ, and NVIQ were first examined using Pearson correlations; r < 0.3, 0.3–0.5, and >0.5 reflect weak, moderate, and strong associations, respectively (Cohen 1988). Fisher r-to-z transformation was used to compare correlations across subgroups. There were no differences by sex or race; therefore subsequent analyses were not divided by demographic characteristics. Next, children were grouped in four ways: by age group reflecting broad developmental level (preschool: 4–5, school-age: 6–12, adolescent: 13+ years), VIQ test (DAS-2, Mullen, Wechsler), Cognitive Level (VIQ < 70, ≥70), and Language Level (ADOS Module 1, 2, 3/4). These groupings were chosen to explore whether PPVT-4:VIQ associations differed across subgroups commonly combined in large ASD samples; group descriptives are provided in Table 1. Separate linear regression models predicting VIQ were used to assess whether the PPVT-4:VIQ association differed across subgroups; predictors were entered in the following order to allow examination of the relative contribution of each set of variables: PPVT-4 score centered at the sample mean, subgroup, and PPVT-4*subgroup interaction. Cohen’s f² was computed to assess the effects of added predictors when controlling for all other variables in the model; f² values of 0.02, 0.15, and 0.35 reflect small, medium, and large effect sizes, respectively (Cohen 1988). PPVT-4 and VIQ distributions were compared; paired t-tests were used to assess individual differences in scores, and distributions of PPVT-4–VIQ discrepancies were examined. Finally, PPVT-4 and VIQ categories (i.e., <70, ≥70) were compared to evaluate the impact that large discrepancies had on classification. SPSS 23.0 was used for comparative analyses; given the large sample size and multiple comparisons, the significance level was set at p < 0.001.
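As an illustration of the subgroup comparison of correlations, the following Python sketch applies the Fisher r-to-z transformation to two correlations from independent groups. It is a sketch under stated assumptions rather than the study's analysis (which was run in SPSS): the function name is hypothetical, the example correlations and group sizes are invented, and the test assumes the two groups share no participants. Correlations that share a variable within the same children would call for a test for dependent correlations instead.

```python
import math


def fisher_z_compare(r1, n1, r2, n2):
    """Compare two independent Pearson correlations via Fisher's r-to-z.

    Returns the z statistic and a two-sided p-value (standard normal).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)          # r-to-z transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # SE of the difference
    z = (z1 - z2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))   # 2 * (1 - Phi(|z|))
    return z, p_two_sided


# Invented values for two non-overlapping subgroups.
z_stat, p_value = fisher_z_compare(r1=0.90, n1=500, r2=0.85, n2=300)
significant = p_value < 0.001  # threshold used in this study
```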

Table 1 Sample descriptives by age, cognitive, and language subgroups

Aim 2

To demonstrate the utility of PPVT-4 as a proxy for VIQ in genetic studies, PPVT-4 and VIQ distributions were plotted for 447 children with a de novo mutation associated with ASD and 1715 children with no identified copy number variant (CNV) or loss-of-function (LoF) mutation. Data regarding de novo mutations (CNV or LoF) for each child were obtained from Sanders et al. (2015). Microarray and whole exome sequencing procedures for identifying ASD-associated mutations have been reported in detail previously (see Iossifov et al. 2014; Sanders et al. 2015). Consistent with the IQ analyses by Sanders and colleagues (2015), one-sided Wilcoxon Rank Sum Tests (WRST) were used to compare groups; analyses were conducted using R version 3.1.1.
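To make the group comparison concrete, here is a minimal, standard-library Python sketch of a one-sided Wilcoxon rank-sum test. It is a stand-in for illustration only, not the authors' R code: the function name and the example scores are hypothetical, and it uses the normal approximation with a tie correction and no continuity correction, so p-values may differ slightly from those produced by statistical packages that apply one.

```python
import math


def rank_sum_one_sided(x, y):
    """One-sided Wilcoxon rank-sum test; H1: values in x tend to be lower than y.

    Returns (U, p) using the normal approximation with a tie correction.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                        # size of this group of tied values
        avg_rank = (i + 1 + j) / 2       # mean of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2   # U statistic for x
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all values are tied; the test is undefined")
    z = (u - mean_u) / math.sqrt(var_u)
    p = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z); small when x is lower
    return u, p


# Invented scores: children with vs. without an identified de novo mutation.
with_mutation = [62, 70, 75, 81, 88, 90]
without_mutation = [72, 80, 85, 92, 97, 101, 104]
u_stat, p_one_sided = rank_sum_one_sided(with_mutation, without_mutation)
```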

Results

Associations Between PPVT-4, VIQ, and NVIQ

Correlations between PPVT-4:VIQ (r = 0.925), PPVT-4:NVIQ (r = 0.803), and VIQ:NVIQ (r = 0.824) were strong. As expected, the PPVT-4:VIQ association was significantly stronger than PPVT-4:NVIQ (z = 25.36, p < 0.0001). Consistent with this, PPVT-4:NVIQ associations within subgroups (i.e., age, cognitive, language) and correspondence between PPVT-4 and NVIQ distributions were reduced (compared to PPVT-4:VIQ comparisons), albeit generally in the same direction. As such, results reported focus on PPVT-4:VIQ; parallel analyses based on NVIQ are available upon request.

Comparisons of PPVT-4:VIQ Associations Across Subgroups

Consistent with the correlations above, PPVT-4 significantly predicted VIQ (R² = 0.855). As shown in Table 2, VIQ Test, Cognitive Level, and Language Level also had small, but significant main effects on VIQ (p < 0.001), reflecting expected subgroup differences. Only PPVT-4*VIQ Test and PPVT-4*ADOS Module 1 interactions were significant predictors of VIQ (p < 0.001), indicating a difference in PPVT-4:VIQ association for different cognitive tests and minimally verbal children. Notably, the change in model R² when these interactions were entered into their respective models was less than 0.01, indicating a small effect (i.e., f²; Cohen 1988).
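The effect size for an added predictor follows from the change in R²: Cohen's f² is (R²_full − R²_reduced) / (1 − R²_full). The Python sketch below shows the arithmetic. The function name is hypothetical and the R² value used for the fuller model is an invented example, since only the PPVT-4-only R² (0.855) and the bound on the change (less than 0.01) are reported here.

```python
def cohens_f2(r2_reduced, r2_full):
    """Cohen's f-squared for predictors added to a nested regression model."""
    if not 0 <= r2_reduced <= r2_full < 1:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    return (r2_full - r2_reduced) / (1 - r2_full)


# Invented example: R-squared rises from 0.855 to 0.860 when a term is added.
f2_example = cohens_f2(r2_reduced=0.855, r2_full=0.860)
is_below_medium = f2_example < 0.15  # Cohen's (1988) medium benchmark
```

Because the denominator shrinks as the full model explains more variance, a given change in R² corresponds to a larger f² in a high-R² model than in a low-R² one.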

Table 2 Regression models investigating PPVT-4:VIQ associations by subgroup

Comparison of PPVT-4 versus VIQ Distributions and Classifications

As shown in Fig. 1a, b, distributions of PPVT-4 and VIQ scores were similar. The PPVT-4 showed a floor effect for 4% of children (n = 100 at standard score = 20), compared to 17% of children (n = 417) for whom a ratio VIQ was computed due to inability to achieve a standard score on an age-appropriate VIQ test (see Bishop et al. 2011). Ratio PPVT-4 scores could be computed from PPVT-4 age equivalents for 53 cases; when substituted for floor values, PPVT-4 distribution was nearly identical to VIQ (Fig. 1c). Additionally, the PPVT-4 and VIQ both had similarly strong correlations with measures of autism symptomatology, adaptive behavior, and emotional problems (see Supplemental Table 1).

Fig. 1 Comparison of PPVT-4 versus VIQ distributions; VIQ Verbal IQ, PPVT-4 Peabody Picture Vocabulary Test—4th Edition

Paired comparisons indicated that PPVT-4 scores were, on average, 5.46 points higher than VIQ, t(2419) = 23.23, p < 0.0001. A closer examination of PPVT-4–VIQ differences revealed that 79% of children had PPVT-4 scores within one standard deviation (±15 points) of their VIQ estimate, 18% had PPVT-4 scores exceeding VIQ estimates by 16 or more points, and the remaining 3% had PPVT-4 scores 16 or more points below VIQ. Group descriptives are provided in Table 3. VIQ Test, Cognitive Level, and Language Level subgroups showed different proportions of PPVT-4 < VIQ, PPVT-4 ≈ VIQ, and PPVT-4 > VIQ. Significance was driven by a higher proportion of PPVT-4 > VIQ in the more cognitively or language impaired groups (Mullen VIQ test, VIQ < 70, ADOS Module 1).

Table 3 PPVT-4 and VIQ discrepancies across subgroups

PPVT-4 is plotted against VIQ by classification group in Fig. 2 and distribution of PPVT-4–VIQ differences by classification group is provided in Fig. 3. PPVT-4 and VIQ classifications (i.e., <70, ≥70) were highly consistent; 90% of children in the overall sample had the same PPVT-4 and VIQ classification. Of the 10% misclassified, 9% (n = 208) were classified as PPVT-4 ≥ 70 and VIQ < 70; 49% (n = 102) of these were within 15 points (one standard deviation). The remaining 1% (n = 35) misclassified were PPVT-4 < 70 but VIQ ≥ 70; the majority of these (57%, n = 20) were within 15 points.
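The classification comparison amounts to a two-by-two cross-tabulation of the <70 versus ≥70 categories on each measure. A minimal Python sketch is given below, with hypothetical function names and invented score pairs; it is not the study's code. In the reported data, the two discordant cells (n = 208 and n = 35) sum to 243 of 2420 children, which is the 10% described above.

```python
from collections import Counter


def classify(score, cutoff=70):
    """Return 'below' for scores under the cutoff, else 'at_or_above'."""
    return "below" if score < cutoff else "at_or_above"


def classification_table(pairs, cutoff=70):
    """Cross-tabulate (PPVT-4, VIQ) classifications and compute agreement."""
    table = Counter(
        (classify(ppvt, cutoff), classify(viq, cutoff)) for ppvt, viq in pairs
    )
    agree = table[("below", "below")] + table[("at_or_above", "at_or_above")]
    return table, agree / len(pairs)


# Invented (PPVT-4, VIQ) pairs for five children.
pairs = [(85, 80), (72, 65), (60, 55), (68, 74), (110, 105)]
table, agreement = classification_table(pairs)
# Keys are (PPVT-4 class, VIQ class); e.g. ("at_or_above", "below") counts
# children classified as PPVT-4 >= 70 but VIQ < 70.
```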

Fig. 2 PPVT-4 versus VIQ by classification group; PPVT-4 Peabody Picture Vocabulary Test—4th Edition, VIQ Verbal IQ. Thin line VIQ < 70; thick line VIQ = PPVT-4; dashed line VIQ ≥ 70

Fig. 3 Distribution of PPVT-4–VIQ differences by classification group; VIQ Verbal IQ, PPVT-4 Peabody Picture Vocabulary Test—4th Edition

Comparison of PPVT-4 and VIQ by de novo mutations

To demonstrate the utility of using the PPVT-4 as a proxy for VIQ, PPVT-4 and VIQ estimates were plotted by de novo mutation status (Fig. 4). Children with a de novo mutation showed marginally lower VIQ compared to those without an identified mutation (5-point median difference; p = 0.01; one-sided WRST); differences in PPVT-4 were also marginal (3-point reduction; p = 0.04). Notably, 19 children of the 447 with de novo mutations showed a floor effect on the PPVT-4, compared to 88 children whose verbal abilities were estimated using a ratio VIQ.

Fig. 4 Comparison of PPVT-4 and VIQ by de novo mutations; VIQ Verbal IQ, PPVT-4 Peabody Picture Vocabulary Test—4th Edition

Discussion

For children with ASD, across varying ages, language, and cognitive ability levels, the PPVT-4 showed strong associations with VIQ estimates derived from different instruments. On average, across groups, the PPVT-4 tended to overestimate verbal abilities (scoring approximately 5.5 points higher than VIQ), with approximately 18% of children exhibiting PPVT-4 scores more than one standard deviation above their estimated VIQ. However, broad cognitive classifications were highly consistent, with only 10% of children misclassified (i.e., PPVT-4 ≥ 70 when VIQ < 70 or vice versa). Importantly, of those 243 children, 26% were within 10 points (the 95% confidence interval range for these cognitive tests; Fig. 3). The distributions of VIQ and PPVT-4 scores by de novo mutation status were nearly identical. Statistical comparisons across groups yielded highly similar results using VIQ or PPVT-4.

While these results support the proposal that the PPVT-4 may be a good proxy for VIQ in large-scale genetic investigations, investigators should be aware of the limitations of using the PPVT-4 in lieu of a full cognitive battery. As described in the introduction, IQ is a multi-faceted construct, considered to comprise both knowledge gained through experience and a more inherent capacity to reason about problems. The DAS-2, used to derive 87% of the VIQ estimates for this study, is modeled after the Cattell–Horn–Carroll theory (CHC; Carroll 1993). Verbal subtests map onto crystallized intelligence (Gc), and nonverbal subtests are further divided into fluid reasoning (Gf) and visual perception (Gv; Elliott 2007). The PPVT-4 is also used as an estimate of crystallized intelligence (Akshoomoff et al. 2013); accordingly, we suggest the PPVT-4 as a potential proxy for VIQ. Analyses indicated that PPVT-4:NVIQ associations were weaker than PPVT-4:VIQ (available upon request), which may reflect separation of crystallized intelligence versus fluid or spatial reasoning (consistent with CHC theory). This suggests that analyses using the PPVT-4 may miss genotype–phenotype associations if mutations are more specifically affecting fluid or spatial reasoning aspects of intelligence (e.g., Edelmann et al. 2007). Given research demonstrating that some individuals with ASD exhibit uneven verbal–nonverbal profiles (e.g., Joseph et al. 2002), it is preferable to obtain separate verbal and nonverbal cognitive estimates whenever feasible. Many studies implementing the PPVT-4 as a proxy for VIQ have used a brief NVIQ test, such as Raven’s Progressive Matrices (Raven 2000). The Matrix Adaptive Test (Hansen 2016) may provide an efficient online alternative to the Raven that awaits further validation.

A second consideration is how different language profiles may affect PPVT-4:VIQ correspondence. For minimally verbal children and/or those with VIQ below 70, the DAS-Early Years and Mullen were the most common assessments used to derive VIQ. Both tests assess expressive (emphasizing verbal labeling) and receptive language (responding to verbal instructions). The PPVT-4 has lower test demands, in that it does not require spoken language and only assesses single word understanding. Thus, our finding that PPVT-4 tended to be higher than VIQ for children who were minimally verbal, had a VIQ < 70 and/or were assessed using the Mullen may be expected, given that VIQ would reflect an average of expressive and receptive abilities. A closer look at the language profiles of individuals with discrepant PPVT-4:VIQ scores may be of interest for more in-depth characterization studies. Although the subgroup of children with discrepant receptive and expressive language abilities may be small (Kwok et al. 2015), considering discrepancies within individual phenotypic profiles may highlight phenotype–genotype associations that would be missed by focusing on broader dimensions, such as VIQ.

Although there is a logical explanation for why PPVT-4 may be overestimating verbal abilities, it is also possible that cognitive tests underestimate VIQ in more impaired children. Considering the challenges of assessing children with limited language, it is plausible that the PPVT-4 actually provides a better assessment of their abilities. That the PPVT-4 does not require verbal responses offers a distinct advantage compared to standard cognitive tests, as well as other brief assessments such as the Expressive Vocabulary Test, Second Edition (EVT-2; Williams 1997). Indeed, the PPVT-4 is one of the few measures recommended as being well-suited for assessing minimally verbal children (Kasari et al. 2013).

Reduced precision is an expected trade-off of using a 10–15 min assessment administered by research staff with minimal training in place of a 30–60+ minute assessment administered by an experienced clinician. However, the cost:benefit ratio of using the PPVT-4 in different types of research contexts needs to be considered for each individual study. Although time and cost are always pertinent factors, whether the PPVT-4 provides an adequate cognitive estimate will vary depending on the purpose of the study, and its use would not be advised in clinical studies that are particularly interested in understanding complex behavioral profiles of subgroups of children. For example, in samples of individuals with intellectual disability, case-control matching on PPVT-4 versus a nonverbal measure is likely to produce different results (Phillips et al. 2014).

Such a trade-off may be less problematic in large-scale genetic studies, where there is a need to have a cognitive estimate for both children with ASD and their parents and/or siblings. In addition to requiring less time and advanced training to administer, an advantage of the PPVT-4 over traditional cognitive assessments is the availability of standard scores on the same test across a wide range of ages and ability levels. Compared to traditional cognitive assessments, for which standard scores only go as low as 30–50, the PPVT-4 extends down to 20, thereby yielding fewer floor effects. Indeed, only 4% (100/2420) of children were at or below the floor on the PPVT-4, compared to 17% of children for whom ratio VIQs had to be computed due to scores falling below the standard floor or skill level necessitating use of a test outside its standardized age range. While it is common practice to derive ratio IQs for children with ASD who cannot achieve standard scores on age-appropriate cognitive batteries, differences in subtests across tests used for different ages and age effects on calculations make it difficult to compare ratio scores across individuals of varying ages (Bishop et al. 2015). In addition to increasing comparability of children with ASD across varying ability levels, PPVT-4 standard estimates from the same test administered to unaffected siblings and parents will be more easily compared. Such comparability is especially pertinent in genetic studies when the goal is to provide familial context in order to facilitate interpretation of the relative contribution of de novo mutations to a proband’s IQ.

Regardless of how comprehensive the assessment, behavioral phenotypes, including IQ, are not static within individuals across the lifespan (e.g., Bishop et al. 2015). Thus, it is difficult to compare measurements at a single time point across samples for wide age and developmental ranges encompassed in large genetic studies. Exploration of developmental trajectories within subgroups of individuals with specific ASD risk mutations is more likely to inform associations between biological risk factors and phenotypic outcomes (Lord et al. 2015). Moreover, the complex biological consequences downstream of any given genetic mutation that go on to influence brain development and function are matched by an equally complex interaction of processes, many of which are difficult to capture with behavioral measurement. As such, there is also a need to consider biological markers for more complex behaviors that may be more objectively measured, such as physiological reactivity or neural functioning.

Taken together, brief assessments as proxy estimates of more complex behavioral constructs may be a useful step forward to address the specific question posed by large-scale genetic investigations: what are the relative contributions of de novo versus inherited variation to IQ? There will be a need for in-depth, genetic-first longitudinal follow-up studies that take a more comprehensive, multimodal assessment approach to understand specific genotype–phenotype associations. An important limitation to this study relates to the ascertainment of the sample. SSC inclusion criteria required that all children have a nonverbal mental age of at least 18 months and that a direct assessment of VIQ and NVIQ be obtained for all participants. Thus, it is possible that the proportion of children able to achieve a standard PPVT-4 score (96%) is somewhat higher than might be observed in other samples. Interestingly, researchers have begun to explore ways to use receptive vocabulary measures similar to the PPVT-4 to assess children with ASD; for example, some researchers reduce motor demands or other potential confounds to tests requiring pointing responses by using eye tracking to assess performance (Brady et al. 2014; Plesa-Skwerer et al. 2015). Such methods may actually increase PPVT-4 completion rates and await more systematic validation in larger samples.

In summary, these data suggest that the PPVT-4 is a good proxy for VIQ in children with ASD, providing standard scores across a wide range of age and ability levels. The PPVT-4 has three main advantages over standard cognitive batteries: (1) it is faster (10–15 vs. 30–60+ minutes); (2) it can be administered by research staff with minimal training; and (3) it is comparable across different ages and levels of ability. In addition, it is now available in both digital and paper formats, potentially creating an avenue for siblings or parents to complete it remotely, which would further facilitate large-scale data collection. However, future studies are needed to investigate measurement equivalence across formats and modes of administration. Although there are clear limitations to using only the PPVT-4 as a proxy for cognitive abilities (e.g., it may somewhat overestimate VIQ and is less strongly correlated with NVIQ), this may be an acceptable trade-off in order to expand collection of cognitive estimates to parents or siblings in large-scale studies, such as genetic consortia aiming to ascertain tens of thousands of families.