Introduction

Substantive language heterogeneity presents within autism spectrum disorders (ASD). While much remains to be understood regarding language processes in these conditions (see Boucher 2012, for a recent review), a profile of relative impairment in receptive skills alongside better-spared expressive abilities appears commonly as a group-level phenomenon (Charman et al. 2003; Ellis Weismer et al. 2010; Luyster et al. 2007), holding true for many, but not all, individuals (Hudry et al. 2010; Volden et al. 2011). This contrasts with the normative profile whereby raw receptive language typically exceeds concurrent expressive skills.

At all stages of language acquisition, individuals usually acquire skills first in the receptive domain, with production lagging somewhat behind (see Bates et al. 1995). Fenson et al. (1994) present comprehensive vocabulary growth data which demonstrate that single-word comprehension grows rapidly from around 9 months of age. Production follows more slowly, beginning only around 12 months and then increasing rapidly later in the second year. Similarly, young infants show early understanding of short phrases and sentences but do not begin to produce these until later, around 18–20 months (Fenson et al. 1994). Grammatical development follows a similarly staggered pattern (Bates et al. 1995), and while important individual differences in the rate of language acquisition are observable among typically-developing children (e.g., Huttenlocher et al. 1991), the profile of receptive advantage over concurrent expressive skills is consistent.

While language is affected in many developmental conditions, profiles usually retain the normative balance of raw receptive advantage over expressive skills (Ellis Weismer et al. 2010; Laws and Bishop 2003). The contrasting profile often seen in ASD, whereby raw receptive skills fail to show clear advantage over expressive skills, is not a newly-identified phenomenon (see Bartak et al. 1975). However, attention has recently returned to this topic, as researchers strive to better understand the extent and underpinnings of this profile (e.g., Charman et al. 2003; Ellis Weismer et al. 2010; Hudry et al. 2010; Luyster et al. 2007; Volden et al. 2011).

A commonly-used parent-report measure, the MacArthur-Bates Communicative Development Inventories (MCDI; Fenson et al. 1992), yields meaningful raw counts of receptive and expressive vocabulary (i.e., the numbers of words a child understands and produces). Raw receptive counts necessarily equal or exceed raw expressive counts, as children cannot meaningfully use words they do not yet understand. As noted above, Fenson et al. (1994) have shown, within a large normative sample, that infants typically present early and persistent receptive advantage over expressive vocabulary on the MCDI, with acquisition and growth in the former domain preceding the latter by several months. Individuals’ expressive vocabulary scores can therefore be considered as a function of their receptive level. Through comparison with Fenson et al.’s normative data, three independent samples of toddlers/young children with ASD have now been shown to present unusually close alignment across these domains, signifying lack of receptive advantage over expressive skills (Charman et al. 2003; Hudry et al. 2010; Luyster et al. 2007).

Various other standardised language/communication measures exist for young children, including direct-assessment tools [e.g., Mullen Scales of Early Learning (MSEL), Mullen 1995; the Preschool Language Scales (PLS), Zimmerman et al. 2002], and parent-report measures [e.g., Vineland Adaptive Behavior Scales (VABS) domain measuring functional communication; Sparrow et al. 2005]. While assessing different aspects of language, common across such measures is the process of metric standardisation on the basis of data from large, normative samples. Derived Standard Scores (SS) are often of limited use for samples of children with ASD, given frequent floor-level performance. Age-Equivalence (AE) scores are more interpretable but less robust, psychometrically. Given instrument standardisation procedures, close alignment should be expected across receptive and expressive AEs (as a normative raw advantage in receptive skills should become standardised so as to present balanced receptive and expressive AEs). However, various factors make this an imperfect assumption, including sampling procedures for obtaining the norming group, data interpolation/smoothing procedures undertaken to produce normative score tables, and the substantial differences in performance between individuals resulting in large standard deviations/errors at various age-groupings (Murphy and Davidshofer 1998). Correspondence is also expected across obtained AE scores and a given individual’s chronological age (CA), given that the former are derived on the basis of the median CA at which children in the norming group obtained the given score (Murphy and Davidshofer 1998).
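The derivation of an AE score as the median CA of norming-group children obtaining a given raw score can be illustrated with a minimal sketch. The function name `age_equivalent` and the toy norming data are hypothetical, and the sketch deliberately omits the interpolation/smoothing steps that published instruments apply; it is not the procedure of any particular test manual.

```python
from statistics import median

def age_equivalent(raw_score, norming_sample):
    """Return the median chronological age (months) of norming-group
    children who obtained exactly `raw_score`, or None if nobody did.

    `norming_sample` is a list of (age_in_months, raw_score) pairs.
    Hypothetical helper: real instruments interpolate and smooth.
    """
    ages = [age for age, score in norming_sample if score == raw_score]
    if not ages:
        return None
    return median(ages)

# Toy norming data (illustrative values only, not from any instrument).
toy_norms = [(12, 5), (13, 5), (15, 5), (18, 9)]
ae_for_5 = age_equivalent(5, toy_norms)
ae_for_9 = age_equivalent(9, toy_norms)
ae_for_7 = age_equivalent(7, toy_norms)
```

Under this definition, a typically-developing child's AE should sit close to CA, and receptive and expressive AEs should sit close to one another, which is the assumption the surrounding text qualifies.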

In various developmental conditions, including ASD, language AEs frequently present well below CA-expectations (e.g., Ellis Weismer et al. 2010; Laws and Bishop 2003). In conditions such as Down syndrome and specific language impairment, relative profiles of receptive and expressive AEs are usually observed to remain closely-aligned or, if different, to demonstrate heightened receptive compared to expressive AE. However, as detailed below, in the case of children with ASD, data regarding language AE profiles has been highly variable, with scores based on different measures yielding different results, even within one sample.

Broadly speaking, measures of a child’s ability to actively demonstrate language comprehension and to produce communication/speech in response to direct examiner bids (e.g., MSEL, PLS, etc.) have consistently shown profiles favouring expressive over concurrent receptive AE scores (Ellis Weismer et al. 2010; Hudry et al. 2010; Luyster et al. 2008; Volden et al. 2011). By contrast, results based on VABS AE scores, measuring functional comprehension and use of adaptive language/communication (via parent-report and based on every-day observation of the child), have presented highly inconsistently. Ellis Weismer et al. (2010) and Luyster et al. (2008) reported balanced VABS AE scores, or profiles favouring receptive over expressive AEs, in their samples of toddlers with ‘broad-spectrum’ diagnoses. By contrast, Hudry et al. (2010) reported profiles favouring expressive over receptive AE scores in their sample of preschoolers with core autism.

To date, dedicated investigation of language profiles in ASD has remained cross-sectional. However, a recent developmental surveillance study presented relevant longitudinal data. Barbaro and Dissanayake (2012) examined the MSEL AE scores of infants drawn from a large, community-based sample. While all infants presented early signs of ASD, by 24 months, two subgroups were differentiated: those with ASD and those with broader developmental/language delays (DD/LD). Receptive language was the domain of most pronounced delay for the infants with ASD outcome, remaining a persistent area of weakness across assessments undertaken at 12, 18, and 24 months. While infants with DD/LD outcome also presented early receptive impairments relative to concurrent expressive AEs at 12 and 18 months, by 24 months the receptive and expressive AEs of this group had become well-aligned. This suggests that while early receptive impairment may signal broadly delayed development, sustained lack of receptive advantage may point to emerging ASD. Dedicated research has yet to explore this possibility, and the high-risk sibling design provides just such an opportunity.

Infants at familial high-risk for ASD (hereafter, high-risk; HR) can be identified on the basis of having a diagnosed older sibling, and can be monitored from early in life through to the point of own possible diagnosis in toddlerhood (e.g., see Zwaigenbaum et al. 2007). This method affords three key benefits to the investigation of early language profiles in ASD. First, it permits repeated assessments to be conducted with children developing ASD, beginning earlier in infancy than otherwise possible/practicable. Second, the inclusion of low-risk controls (with no familial ASD; hereafter low-risk; LR) permits direct between-group comparisons rather than reliance on indirect comparison with tabulated norms (see Ellis Weismer et al. 2010; Luyster et al. 2007 for a discussion). Third, whilst yielding relatively high saturation of ASD cases, the design also permits evaluation of the broader autism phenotype (BAP; Bolton et al. 1994), characterising features in HR individuals who do not go on to ASD. A number of studies have investigated the early language skills of infants developing ASD using the HR design, noting early and continuing receptive and expressive delays in this group (e.g., Landa and Garrett-Mayer 2006; Mitchell et al. 2006; Yirmiya et al. 2007). Early language delays have presented equally for many HR infants who do not proceed toward ASD outcome, indicating potentially important intermediate phenotypes (see Elsabbagh and Johnson 2010). No study has yet, however, specifically examined profiles of language ability in HR infants, tracking the normative developmental advantage of receptive over expressive skills in early life.

The current study employed a prospective HR sibling design, presenting the first longitudinal evaluation of language profiles in the context of emerging ASD. We sought to explore patterns in the relative development of receptive and expressive skills across subgroups of infants at HR for ASD, separating three outcome subgroups: individuals with ASD (HR-ASD), other atypicality (HR-Atypical), and typical outcome (HR-Typical). A LR comparison group was expected to display the normative pattern of early-emerging and persistent raw advantage of receptive over expressive skills. We sought to observe whether relative lack of such receptive advantage might emerge early in infants later diagnosed with ASD. We were also interested to examine the specificity of such an early atypical profile to this outcome subgroup, versus more general applicability within intermediate phenotypes of heightened ASD risk.

Given variability in the facets assessed and metrics produced by different language measures, different patterns of result were expected across these. From previous research using the MCDI—a parent-report measure of vocabulary—LR infants were expected to present early and persistent profiles of raw receptive advantage over expressive vocabulary (i.e., large receptive-expressive difference scores). By contrast, HR-ASD infants were expected to show more closely-aligned receptive and expressive vocabularies (i.e., smaller difference scores). The MSEL—a direct measure of child ability to show understanding of language and to produce appropriate communication/verbalization in response to bids—yields AE scores derived from a large, normative sample. As such, LR infants were expected to show well-matched scores across receptive and expressive domains (i.e., near-zero AE differences), while a profile favouring expressive AE was expected for HR-ASD infants. Also yielding meaningful AEs derived from a large, normative sample, the VABS presents a parent-report measure of everyday functional comprehension and adaptive use of language/communication skills. Again, we expected LR infants to show well-matched receptive and expressive subscale scores, while confident prediction regarding HR-ASD infants was not possible given the inconsistent past results from this measure. Given recent data showing early but transient receptive impairments in infants developing broader developmental/language delays (rather than ASD), we considered it plausible that those HR infants without ASD outcome might present some early profile atypicality, before regaining a more normative receptive advantage.

Methods

Participants

Data from 54 HR and 50 LR infants were made available through the British Autism Study of Infant Siblings (BASIS, www.basisnetwork.org; NHS NRES London REC 08/H0718/76). Each HR infant had an older sibling with a community clinical ASD diagnosis (hereafter, proband), confirmed on the basis of information in the Development and Wellbeing Assessment (DAWBA; Goodman et al. 2000) and the Social Communication Questionnaire (SCQ; Rutter et al. 2003) by expert clinicians on our team (TC, PB). Most probands met ASD criteria on both measures (n = 44). While a small number scored below threshold on the SCQ (n = 4), none of these was excluded, given that each met the DAWBA threshold and expert opinion supported the diagnosis. For two probands, data were only available on one measure, and for four probands, neither measure was available (aside from parent-confirmed local clinical diagnosis). Parent-reported family medical histories were examined for significant conditions in the proband or extended family members (e.g., Fragile X syndrome, tuberous sclerosis, condition requiring institutional care) with no such exclusions deemed necessary.

LR controls were full-term infants (gestational ages 38–42 weeks) recruited from a volunteer database at the Birkbeck Centre for Brain and Cognitive Development. Medical history review confirmed lack of ASD within first-degree relatives. All LR infants had at least one older sibling (in three cases, only half-siblings). The SCQ was used to confirm absence of ASD in these older siblings, with no child scoring above instrument cut-off (n = 1 case missing data).

Procedure and Measures

Infants attended up to four visits, each conducted across one or two testing days, around the following mean ages: 7 months (range 6–10 months), 14 months (11–18 months), 24 months (21–27 months), and 38 months (32–53 months). These are referred to as Visits 1 through 4, respectively. Table 1 presents basic characterisation of the sample, at each visit. Over the course of the study, minimising attrition was considered more important than maintaining strictly-defined age-ranges at each visit (particularly at Visit 4, when attrition would result in missing diagnostic outcome). Groups remained well-matched on mean CA at each visit (all ps > .38) and on individuals’ inter-visit intervals (all ps > .32; Visit 1 to 2 interval, M = 6.4 months, SD = .93; Visit 2 to 3, M = 10.2 months, SD = 1.57; Visit 3 to 4, M = 14.0 months, SD = 3.2). Non-verbal (NV) ability, the average of MSEL Visual Reception and Fine Motor t-scores (i.e., population M = 50, SD = 10), was consistently higher in LR than HR infants.

Table 1 Characterisation of infants at high-risk for ASD and low-risk controls, at four visits

Language Measures

Parents completed the MCDI (Fenson et al. 1992) at each visit, with minor adaptations for British English. The Words and Gestures (WG) form, used at Visits 1 and 2, includes ‘understands’ and ‘understands and says’ columns permitting the computation of raw receptive and expressive vocabularies. The Words and Sentences (WS) form, used at Visits 3 and 4, includes only an ‘understands and says’ column. However, parents were asked also to indicate any words only understood by the child, permitting assessment of both receptive and expressive vocabulary counts at each visit. This therefore provided repeat parent-report of child knowledge and production of vocabulary, among words commonly-acquired in the early years.

Parents also completed the VABS—2nd edition (Sparrow et al. 2005), yielding measures of functional communication in receptive and expressive domains. The Parent Rating Form was used at Visits 1 and 2 and at Visit 3 for LR infants. The Survey Interview Form was used at Visit 3 for HR infants and for all participants at Visit 4. Administered by clinical researchers, use of the interview form for HR infants at Visit 3 was considered important to facilitate discussion with parents around any developmental concerns for the child. While the same normative tables are used to derive standardised metrics across both VABS forms, un-facilitated completion by parents might plausibly yield different ratings to those from interview. Though not ideal, lack of consistent use of forms across groups at Visit 3 presents as confounding only for HR-subgroup versus LR group comparisons, posing no problem for comparisons among HR outcome subgroups.

The MSEL (Mullen 1995) yielded a third set of receptive and expressive scores, via direct assessment of child skills, and was administered by research-reliable assessors. At the toddler visits specifically, administration was by or under close supervision of the same clinical researchers who undertook the diagnostic assessments. Within a direct-assessment context, MSEL scores indexed children’s developing abilities to show understanding of and compliance with language spoken by the examiner, and to produce appropriate communication/language spontaneously and in response to the examiner’s bids/questions (varying across items, as appropriate to the developmental level of the assessed individual).

ASD Symptoms and Diagnosis

Of the 54 HR infants recruited, 53 were retained to Visit 4 when comprehensive diagnostic assessment was undertaken. Parents completed the Autism Diagnostic Interview—Revised (ADI-R; Lord et al. 1994) and the SCQ (Rutter et al. 2003), and toddlers were assessed with the ADOS-G (Lord et al. 2000; module 2 for 50 toddlers, module 1 for 3 toddlers), with standard algorithms computed. Assessments were conducted without blindness to risk-group status (given the different protocols run with HR and LR toddlers), by or under the close supervision of clinical researchers (i.e., psychologists, speech therapists) with demonstrated research-level reliability.

In determining diagnostic outcome status, four clinical researchers (KH, SC, GP, TC; the first three of whom also conducted the assessments) reviewed information across Visits 3 (including an ADOS-G assessment) and 4 (including both ADOS-G and ADI-R administration). Seventeen toddlers met ICD-10 (World Health Organisation 1993) criteria for ASD (i.e., HR-ASD subgroup). A further 12 toddlers did not meet ASD criteria but were also not considered typically-developing, due to (a) scoring above ADI-R cut-off for autism (n = 1), (b) scoring above ADOS cut-off for ASD (n = 9), (c) scoring greater than 1.5 SD below population mean on the MSEL direct-assessment Early Learning Composite (n = 1), or meeting both of points (b) and (c) (n = 1). These therefore comprised a HR subgroup presenting other atypicalities (i.e., HR-Atypical), while the remaining 24 toddlers were considered typically-developing (i.e., HR-Typical).
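The subgroup allocation rules described above can be summarised in a short sketch. This is an illustration only: the function `classify_hr_outcome` is hypothetical, the ICD-10 decision is taken as an input because it rested on clinical consensus rather than an algorithm, and the Early Learning Composite is assumed to be scaled with population M = 100 and SD = 15.

```python
def classify_hr_outcome(meets_icd10_asd, adi_r_above_cutoff,
                        ados_above_cutoff, elc_score,
                        elc_mean=100.0, elc_sd=15.0):
    """Allocate a high-risk toddler to an outcome subgroup.

    meets_icd10_asd    : bool, consensus clinical judgement (input, not derived)
    adi_r_above_cutoff : bool, ADI-R score above the autism cut-off
    ados_above_cutoff  : bool, ADOS score above the ASD cut-off
    elc_score          : MSEL Early Learning Composite
                         (ASSUMED scaling: M = 100, SD = 15)
    """
    if meets_icd10_asd:
        return "HR-ASD"
    # More than 1.5 SD below the population mean on the composite.
    low_elc = elc_score < elc_mean - 1.5 * elc_sd
    if adi_r_above_cutoff or ados_above_cutoff or low_elc:
        return "HR-Atypical"
    return "HR-Typical"

# Illustrative cases (invented values, not study data).
case_asd = classify_hr_outcome(True, True, True, 80)
case_ados_only = classify_hr_outcome(False, False, True, 95)
case_low_elc = classify_hr_outcome(False, False, False, 77)
case_typical = classify_hr_outcome(False, False, False, 100)
```

Any single criterion is sufficient for HR-Atypical allocation in this sketch, which matches the one toddler reported as meeting two criteria at once.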

Table 2 provides child outcome characterisation at Visit 4. Comparison of ADOS calibrated severity scores (Gotham et al. 2009) across HR subgroups indicated a significant omnibus effect, F(2,52) = 23.88, p < .001, η² = .49, with the HR-Typical subgroup scoring significantly below both HR-Atypical, t(14.78) = 5.23, p < .001, d = 2.72, and HR-ASD subgroups, t(21.18) = 5.76, p < .001, d = 2.50, which did not differ, t(27) = .310, p = .759, d = .11. Of the 50 LR infants recruited, 48 were retained to Visit 4. One LR infant had attended Visit 1 only, so was excluded from further analysis. Data were retained for the other LR child lost to Visit 4, who had attended every other visit. While well-matched on Visit 4 CA, F(3,100) = .85, p = .47, LR and HR outcome subgroups differed on NV ability, F(3,100) = 6.71, p < .001, η² = .17. Specifically, HR-ASD toddlers were significantly lower-functioning than LR and HR-Typical toddlers (both p ≤ .004), with HR-Atypical toddlers falling intermediate.

Table 2 Detailed characterisation of HR subgroups and LR controls at outcome Visit 4

Results

A two-staged analytic approach was adopted, with comparisons first made across HR and LR groups to ascertain broader risk-group effects, and then repeated separating out HR-Typical, HR-Atypical and HR-ASD subgroups to evaluate specificity of language profiles to diagnostic outcome. Various data points were missing at each visit, due either to failure to attend a given session or to lack of parental questionnaire completion. MCDI data at Visits 1, 2 and 3, and MSEL data at Visit 3 included 5 % or more missing cells (see Table 3). These data appeared to be missing largely at random, with the exception of Visit 1 MCDI scores, where cases with missing data presented lower scores on other language measures at the same visit (e.g., VABS receptive scale). Missing data here may therefore have reflected lack of MCDI completion due to very limited overt child language ability. Missing data were imputed via the expectation–maximisation (EM) method (as per Tabachnick and Fidell 2007), with the analyses below presented on the basis of these data. Analyses were also undertaken on the data without imputation, and where slight differences arose, this is noted. Following the presentation of preliminary data handling, the main analyses for each language measure are presented in turn.

Table 3 Number of missing data-points across three language measures and three/four visits, for LR and HR groups, and HR subgroups (with typical, atypical, and ASD outcome)

Omnibus analysis of variance (ANOVA) tests (with Greenhouse–Geisser adjustment applied as necessary) were followed up with post hoc multiple comparisons, controlling family-wise error rate via Bonferroni correction. Figures present MCDI difference scores, but MSEL and VABS AE scores are presented for each of receptive and expressive ability separately (rather than presenting AE difference scores), to allow the reader to observe the overall skill levels of each sub/group, whilst also considering the relative balance across domains (as presented in the text/analysis). Comprehensive tabulation of data is included in “Appendix 2”.
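For readers less familiar with Bonferroni correction, the adjustment can be sketched as follows. The helper `bonferroni_adjust` and the example p values are hypothetical, and it is an assumption that the pairwise p values reported in this paper are adjusted values of this kind rather than raw values compared against a reduced alpha.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p values for one family of comparisons.

    Each raw p is multiplied by the number of comparisons in the family
    and capped at 1.0, so adjusted values can be compared against the
    usual alpha (e.g., .05) while controlling family-wise error rate.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Three hypothetical pairwise comparisons (invented values).
raw_p = [0.01, 0.04, 0.5]
adjusted_p = bonferroni_adjust(raw_p)
```

Equivalently, each raw p value could be compared against alpha divided by the number of comparisons; the two formulations lead to the same decisions.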

Preliminary Data Handling

Raw MCDI receptive and expressive vocabulary counts were totalled as usual. While clear ceiling effects resulted in the necessary exclusion of Visit 4 data from analysis, for Visits 1 to 3, difference scores were computed by subtracting expressive from receptive counts. Square-root transformation was applied to these data which were highly positively-skewed. This effectively normalised Visit 2 and 3 data, but Visit 1 data remained skewed (i.e., with a modal score of zero, as many infants reportedly neither produced nor yet understood any words). The results of these parametric analyses and of non-parametric analyses conducted on untransformed data were substantively identical, so the former are presented here (including transformation on Visit 1 data). A small number of observed outliers (i.e., data-points falling beyond 2SD of the mean; seven points at Visit 1, four at Visit 2, and three at Visit 3), were replaced with the value one point greater than the highest non-outlier score (as per Tabachnick and Fidell 2007).
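The MCDI data-handling steps described above (difference scores, outlier replacement, square-root transformation) can be sketched as follows. Function names and the example counts are hypothetical, only high outliers are handled, and the ordering of outlier replacement before transformation is an assumption, as the text does not specify the order in which these steps were applied.

```python
import math
from statistics import mean, stdev

def difference_scores(receptive, expressive):
    """Receptive minus expressive raw vocabulary count, one value per child."""
    return [r - e for r, e in zip(receptive, expressive)]

def replace_high_outliers(scores, n_sd=2.0):
    """Replace scores more than `n_sd` SDs above the mean with the value
    one point greater than the highest non-outlier score.
    Sketch only: low outliers are left untouched here."""
    upper = mean(scores) + n_sd * stdev(scores)
    highest_kept = max(s for s in scores if s <= upper)
    return [s if s <= upper else highest_kept + 1 for s in scores]

def sqrt_transform(scores):
    """Square-root transformation to reduce positive skew.
    Assumes non-negative scores (receptive >= expressive counts)."""
    return [math.sqrt(s) for s in scores]

# Invented counts for nine children (not study data).
receptive = [10, 12, 15, 11, 14, 13, 12, 16, 90]
expressive = [2, 3, 5, 1, 4, 3, 2, 6, 10]

diffs = difference_scores(receptive, expressive)
trimmed = replace_high_outliers(diffs)
transformed = sqrt_transform(trimmed)
```

In this toy example the final child's difference score of 80 lies beyond 2 SD of the mean and is replaced by 11, one point above the highest remaining score of 10.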

MSEL and VABS receptive and expressive AE scores were computed as usual, with receptive-expressive AE difference scores computed as the key metric for analysis. MSEL difference scores were normally-distributed, although some outlier data-points were noted (one low and six high outliers at Visit 2; one low and one high outlier at Visit 4), and handled as above. VABS scores at Visits 3 and 4 were skewed, with square-root transformation effectively normalizing these (and applied also to the non-skewed Visit 1 and 2 VABS data to permit repeated-measures analysis across visits). No outlier data points were apparent.

MCDI Raw Vocabulary Count Difference Scores

A 2 Group (HR, LR) by 3 Visit (mean age: 7, 14, 24 months) ANOVA on MCDI vocabulary difference scores revealed significant main effects of Visit, F(1.49,150.25) = 160.30, p < .001, η² = .61, and Group, F(1,101) = 9.08, p = .003, η² = .08, and a significant interaction term, F(1.49,150.25) = 3.88, p = .034, η² = .04 (with the latter non-significant, p = .059, η² = .04, when data did not include EM imputation). The intercept term was significantly different from zero, F(1,101) = 872.62, p < .001, η² = .90, indicating overall receptive advantage over expressive vocabulary (i.e., non-zero difference scores). LR infants demonstrated consistently greater receptive advantage compared to HR infants (p = .003). Though both groups showed increasing magnitude of receptive advantage from Visits 1 to 2 (all pair-wise p values <.001), HR infants continued this pattern of increasing magnitude to Visit 3 (p < .001), while for LR infants, change from Visits 2 to 3 did not reach significance (p = .066).
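The effect sizes reported alongside each F ratio can be checked against the test statistic and its degrees of freedom. The sketch below assumes that the reported values are partial eta-squared (the text does not state which variant was used); the function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared recovered from an F ratio and its degrees of freedom:
    (F * df_effect) / (F * df_effect + df_error).
    ASSUMPTION: the paper's effect sizes are partial eta-squared."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)

# Main effects from the Group by Visit analysis of MCDI difference scores.
visit_effect = round(partial_eta_squared(160.30, 1.49, 150.25), 2)
group_effect = round(partial_eta_squared(9.08, 1, 101), 2)
```

Under this assumption, the Visit and Group main effects reproduce the reported values of .61 and .08, respectively.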

A similar analysis retaining LR infants but separating HR infants by outcome subgroup revealed significant main effects of Visit, F(1.49,145.74) = 111.16, p < .001, η² = .53, and Subgroup, F(3,98) = 4.09, p = .009, η² = .11, and an interaction term which just failed to reach significance, F(4.46,145.74) = 2.34, p = .051, η² = .07 (significant at p = .027, η² = .10, when analysis did not include EM imputation). Again, the intercept term was significantly different from zero, F(1,98) = 584.69, p < .001, η² = .86. HR-ASD infants presented significantly reduced receptive vocabulary advantage compared to LR infants (p = .012), with other HR outcome subgroups not significantly different to either of these. While all subgroups displayed increasing magnitude of receptive advantage from Visits 1 to 2 (all pair-wise p values <.001), change from Visit 2 to 3 was significant only for HR-Typical infants (p < .001), and non-significant for LR infants (p = .066, as already reported) and for HR-Atypical and HR-ASD subgroups (both p ≥ .157). Figure 1 depicts these effects, presenting group and subgroup mean difference scores (untransformed data).

Fig. 1

Mean raw (untransformed) receptive-expressive difference scores (MacArthur-Bates Communicative Development Inventories; Fenson et al. 1992) for two risk-groups, and for four outcome subgroups, across three visits. Error bars represent ×2 SE

MSEL Direct-Assessment Language AE Difference Scores

A 2 Group (HR, LR) by 4 Visit (mean age: 7, 14, 24, 38 months) ANOVA on MSEL AE difference scores revealed a significant main effect of Visit, F(2.27,229.70) = 5.56, p = .003, η² = .05, but none for Group, F(1,101) = .002, p = .964, η² = .00, nor any interaction term, F(2.27,229.70) = .23, p = .820, η² = .00. While the intercept term was not significantly different from zero, F(1,101) = 1.58, p = .212, η² = .02, indicating overall lack of difference between receptive and expressive AEs, at Visit 3, scores evidenced generally greater receptive than expressive AE (i.e., positive mean difference), whereas Visit 1 and 4 scores evidenced generally greater expressive than receptive AE (i.e., negative mean difference; pair-wise p values ≤.014; though these effects became non-significant when NV ability was statistically controlled).

A similar analysis, retaining the LR group but separating three HR subgroups, also yielded a significant main effect of Visit, F(2.28,223.87) = 3.69, p = .021, η² = .04, but none for Subgroup, F(3,98) = .86, p = .464, η² = .03. The interaction term approached significance, F(6.85,223.87) = 2.03, p = .054, η² = .06. The intercept was not different from zero, F(1,98) = .72, p = .398, η² = .01, again indicating overall balance across receptive and expressive AEs. However, at Visit 1, scores evidenced generally greater expressive AEs (i.e., negative mean difference score), consistent across all subgroups except the HR-Typical group, which presented balanced profiles. This pattern at Visit 1 was also significantly different to the more balanced AE profiles observed across all subgroups at Visit 2 (p = .035), and marginally different to the balanced AE profiles at Visit 3, although HR-Typical infants here showed heightened receptive AEs and HR-ASD infants showed the reverse. Figure 2 presents group and subgroup MSEL AE scores.

Fig. 2

Mean direct-assessment receptive and expressive language age-equivalence scores (Mullen Scales of Early Learning; Mullen 1995) for (a) two risk-groups and (b) four outcome subgroups, across each of four visits. Error bars represent ×2 SE

VABS Parent-Reported Functional Communication AE Difference Scores

A 2 Group (HR, LR) by 4 Visit (mean age: 7, 14, 24, 38 months) ANOVA on VABS AE difference scores revealed a significant main effect of Visit, F(2.02,204.04) = 9.50, p < .001, η² = .09, and a trend toward such for Group, F(1,101) = 3.54, p = .063, η² = .03 (significant when analysis was without EM imputation, p = .035, η² = .05). There was no significant interaction term, F(2.02,204.04) = .62, p = .538, η² = .01. The intercept term differed significantly from zero, F(1,101) = 35.00, p < .001, η² = .26, indicating greater overall receptive compared to expressive VABS AE scores. However, this appeared to be driven by Visit 4 data where large receptive-expressive AE discrepancies were apparent (favouring the former), compared to Visits 1 to 3 (ps ≤ .007) where scores were more balanced. LR infants showed somewhat greater receptive-expressive AE difference scores than did HR infants.

When comparing LR and three HR outcome subgroups, main effects were significant for both Visit, F(1.97,193.09) = 9.49, p < .001, η² = .09, and Subgroup, F(3,98) = 4.16, p = .008, η² = .11, while the interaction term remained non-significant, F(5.91,193.09) = 1.28, p = .271, η² = .04. The intercept term was significantly different from zero, F(1,98) = 13.48, p < .001, η² = .12, indicating greater overall receptive compared to expressive AE scores. Again, however, this appeared due to the substantively larger Visit 4 receptive-expressive discrepancies, compared to the more balanced AE scores across Visits 1 through 3 (ps ≤ .014). Considering subgroups, LR and HR-Typical infants presented somewhat greater AE difference scores than HR-Atypical infants (pairwise, p = .044 and p = .067, respectively) with HR-ASD infants intermediate. Figure 3 presents group and subgroup VABS receptive and expressive AE scores.

Fig. 3

Mean (untransformed) parent-reported functional receptive and expressive communication age-equivalence scores (Vineland Adaptive Behavior Scales, 2nd edition; Sparrow et al. 2005) for (a) two risk-groups and (b) four outcome subgroups, across each of four visits. Error bars represent ×2 SE

Discussion

The current study evaluated early language profiles in infants at familial HR for ASD. We sought to prospectively track the emergence of a profile seen in many young children with ASD, characterised by relative lack of receptive advantage over concurrent expressive skill. Indices were available across three language measures, taken at multiple visits in infancy and toddlerhood. Our analyses first compared high- and low-risk groups, then separated HR infants by diagnostic outcome, as determined after the third birthday.

Language Profiles in Early Development Toward ASD

Data for the current sample of HR infants with ASD outcome serve to extend, into infancy, the findings from past research sampling young children with established diagnoses. Regarding MCDI vocabulary profiles, the magnitude of receptive advantage over expressive skills shown by infants developing ASD was significantly reduced compared to that of controls. Furthermore, across the second year of life, while LR controls tended to experience continued growth in receptive vocabulary advantage, infants developing ASD showed limited enhancement of receptive skills beyond concurrent gains in expressive vocabulary. Ceiling scores unfortunately precluded ongoing evaluation across the third year of life in this cohort. However, the pattern shown here around the second birthday aligns with findings presented elsewhere for toddlers/young children with established ASD diagnoses (i.e., Charman et al. 2003; Hudry et al. 2010; Luyster et al. 2007). Interestingly, our data also show some consistency with patterns observable in the data presented by Mitchell et al. (2006), from a similarly-sized sample of HR infants with ASD outcome. Mitchell et al. examined early receptive and expressive language but did not specifically address relative balance across these domains. However, comprehensive data tabulation by these authors permits comparison with our own. Around 13 months, mean MCDI receptive-expressive difference scores from Mitchell et al. present quite similarly across ASD-outcome and control groups (and intermediate to our own Visit 2 data). However, by 18 months, data from Mitchell et al.’s ASD sample align closely with our own 24 months data, and present quite differently to data for their sample of controls, showing broad convergence with our own pattern of findings (methodological differences notwithstanding).

Considering profiles obtained from the current MSEL direct assessments of language, a lack of significant intercept term indicates overall alignment across receptive and expressive domains. This is precisely as expected for controls, given procedures of AE metric derivation. Indeed, across all visits, for both controls and HR infants with ASD outcome, receptive-expressive difference scores included confidence intervals spanning zero (with the single exception of Visit 1 scores presenting slight expressive AE advantage in both groups). These results therefore present limited convergence with previous findings using the same and other direct-assessment tools. Previously, MSEL and PLS direct-assessment data have indicated consistently greater expressive compared to receptive AEs in samples of toddlers/young children with established diagnoses (Ellis Weismer et al. 2010; Hudry et al. 2010; Luyster et al. 2008; Volden et al. 2011). Barbaro and Dissanayake (2012) also demonstrated this on the basis of MSEL direct-assessment scores for their sample of infants developing ASD, showing this profile to persist across the second year of life. Again, tabulated data provided by Mitchell et al. (2006) present a profile slightly favouring expressive over receptive scores on the PLS in 13-month-olds with ASD outcome (although data are SS, rather than AEs). Clear such advantage (on both PLS and MSEL) is then shown at 24 months, while controls’ scores present as well-balanced across domains, or favour receptive ability.

In the current study, VABS parent-reported functional receptive and expressive communication AEs also presented fairly similarly across controls and infants developing ASD. A significant intercept term, however, indicated a profile generally favouring receptive AE, albeit alongside a main effect of visit. This amounted to near-zero difference scores across Visits 1 to 3 (consistent with assumptions based on AE metric derivation procedures), but a marked receptive AE advantage at Visit 4. At a subgroup level, this Visit 4 effect held for LR controls, while for infants with ASD, the mean was in the same direction but with confidence intervals spanning zero. A pattern of well-balanced VABS subscale AEs converges with findings from Ellis Weismer et al. (2010) within a broad-spectrum ASD subgroup (but not within a narrower Autism group). However, this diverges from a previous report by Hudry et al. (2010; showing relative expressive VABS AE advantage) and from Luyster et al. (2008, reporting receptive VABS AE advantage).

Such discrepant VABS parent-reported functional communication findings may stem from sample characterisation differences. Profiles featuring balanced AEs, or profiles favouring receptive score, have arisen in samples of young toddlers presenting broader ASD diagnoses and relatively high-level NV ability (Ellis Weismer et al. 2010, CA = 30 months, NVAE = 20 months; Luyster et al. 2008, CA = 28 months, NVAE = 21 months), whereas lack of VABS receptive advantage was previously reported in a sample of older, lower-functioning children with core autistic disorder (Hudry et al. 2010, CA = 44 months, NVAE = 26 months). The current data do converge, therefore, with the former set of findings given this is a relatively high-functioning sample of young toddlers with broad-spectrum ASD diagnoses (i.e., not just core autism), extending past results to suggest that such relative balance in functional receptive and expressive communication skills is stable across early infancy and through toddlerhood, on the pathway toward ASD outcome.

Language Profiles in the Context of Intermediate Phenotypes

Given the interest in using HR designs to study intermediate phenotypes (Elsabbagh and Johnson 2010), we were equally interested to explore early profiles in those HR infants without ASD outcome. The BASIS design permitted the delineation of two additional outcome subgroups: 12 infants were considered to present other atypicalities (including subthreshold ASD symptoms and/or developmental delay), while 24 were typically-developing. Into the second year of life, all HR subgroups presented similar MCDI profiles, quite distinct from those of LR controls, suggesting that early atypical receptive vocabulary features within the broader phenotype rather than applying specifically to ASD outcome. Across the second year, however, clear subgroup differentiation became apparent. Infants with ASD outcome demonstrated minimal growth in receptive advantage across this period. This was true also for HR infants developing other atypicalities. However, infants with typical outcome showed clear acceleration in receptive vocabulary growth over the second year, such that this initially-delayed subgroup presented comparably to LR controls by 24 months.

As already noted, language profiles based on MSEL direct-assessment presented as quite well-balanced for controls and HR infants with ASD outcome, albeit with slight expressive AE advantage at Visit 1. HR infants with other atypical outcome presented similarly, with balanced AE profiles sustained across each visit. HR infants with typical outcome were initially well-balanced, with fluctuation then observed between Visits 3 (favouring receptive AE) and 4 (presenting clear expressive AE advantage). Again, considering parent-report VABS data for functional communication, profiles for HR infants with ASD outcome were consistently well-balanced across visits. However, the patterns observed for the other HR subgroups were somewhat different. Infants with typical outcome maintained close correspondence to LR controls, with initially well-balanced scores, slight favouring of receptive AE at Visit 3 (despite differential use of VABS forms across high- and low-risk groups at this visit) and significant favouring of receptive AE at Visit 4. Here, however, HR infants with other atypical outcome presented differently, more like the ASD-outcome subgroup, with profiles slightly favouring expressive AE at Visit 3, and regaining greater balance at Visit 4.

Elsabbagh and Johnson (2010) proposed that while early perturbations might apply broadly to infants at HR, a canalisation process might restore development toward a more typical trajectory and outcome within a majority subgroup. This is consistent with the current MCDI data, whereby an early lag in relative receptive advantage was observable across HR subgroups, but maintained only in some individuals (i.e., those with ASD diagnostic outcome or presenting subthreshold symptoms and/or other atypicality) while the majority ‘caught up’ to a more normative profile. The picture presented by our MSEL direct-assessment and VABS parent-reported functional communication AE data is less clear, however, and no existing research has yet provided VABS data for similar prospective language profiling, in the context of ASD-risk. Mitchell et al.’s (2006) tabulated MSEL and PLS direct-assessment data (SS reported for 12- and 24-month visits) suggest well-balanced skills, or profiles favouring receptive SS, within their large subgroup of HR infants without ASD outcome (with the exception of 12-month MSEL scores favouring the expressive domain). Strong comparison with this existing dataset is limited by methodological differences (subgrouping and metric choices).

Data from Barbaro and Dissanayake (2012) do also support the notion that processes of canalisation of early risk apply in early development beyond the context of heritable ASD risk. In their community-based sample of infants presenting early signs of ASD, an outcome-subgroup with developmental/language delay showed early but transient receptive language impairment, relative to concurrent expressive skills. The early receptive impairments of infants with later-confirmed ASD, however, persisted through toddlerhood. Current MSEL direct-assessment data provide little in the way of corroborating evidence for such canalisation. However, similarity in the pattern of our MCDI raw vocabulary data and these MSEL direct-assessment data is interesting, supporting the need for further prospective, longitudinal investigation of early language profiles in subgroups of infants with and without ASD outcome.

Measuring Receptive and Expressive Language

Consideration of the various available tools for language assessment in young children is required, in attempting to reconcile these results. While available measures span parent-report and direct-assessment types, and routinely differentiate receptive and expressive domains, the specific language/communication facets under evaluation differ across measures and also at different points along a given measure’s scale. Furthermore, particulars of the derived metrics are also relevant, and interpreting standardised test scores poses a clear challenge for children with ASD (e.g., due to floor effects on standardised scores, or lack of age-appropriate normative comparison groups). Consideration is limited here to the measures employed in the current study, although these comments apply equally to other measures.

The MCDI (Fenson et al. 1992) provides a parent-report of child knowledge and meaningful production of vocabulary, based on an inventory of words commonly acquired in early language development. Clear benefit lies in the fact that raw data (receptive and expressive vocabulary counts) are meaningful in their own right, removing reliance on norm-referenced scores. While still appropriate for use with many young children with ASD (i.e., who present limited vocabularies), a clear limitation is in the restriction of this measure to early language acquisition. MCDI data collected here at Visit 4 were necessarily excluded from analysis due to many children approaching instrument ceiling. Nevertheless, the MCDI presents perhaps the most useful tool for evaluating relative receptive-expressive skills in young children. Receptive vocabularies necessarily exceed expressive counts (i.e., children cannot meaningfully use words they do not yet understand). The magnitude of receptive-expressive difference scores is then easily compared across groups, as in the current study.
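The receptive-expressive difference score described above can be illustrated with a minimal sketch. The group labels, the vocabulary counts and the function names are hypothetical, invented purely for illustration; they are not data or procedures from the current study.

```python
from statistics import mean

# Hypothetical MCDI counts per child: (words understood, words produced).
# These numbers are invented for illustration and are NOT study data.
counts = {
    "group_a": [(120, 30), (200, 65), (90, 15)],
    "group_b": [(60, 35), (110, 80), (45, 10)],
}


def difference_scores(pairs):
    """Receptive minus expressive vocabulary count for each child."""
    return [receptive - expressive for receptive, expressive in pairs]


def mean_difference(pairs):
    """Mean receptive advantage (in words) for one group."""
    return mean(difference_scores(pairs))


for group, pairs in counts.items():
    # A larger mean indicates a larger receptive advantage in that group.
    print(group, difference_scores(pairs), mean_difference(pairs))
```

Because the scores are raw word counts, the group means are directly interpretable as a number of words, without reference to any norming sample.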

Vocabulary knowledge is only one feature of language development, however, and many other tools take a more holistic approach. The MSEL (Mullen 1995) is a direct-assessment of child ability to demonstrate understanding of and compliance with the examiner’s spoken language and to produce appropriate communicative or verbal responses to the examiner’s bids. Early items test attention to sound and voice, production of vocalisation and gesture, and recognition and production of familiar words. Later items tap the understanding of action words, concepts (e.g., size) and simple questions, alongside the repetition and spontaneous production of longer verbal constructions. The VABS (Sparrow et al. 2005) takes an approach which is broader still, assessing every-day and functional child communication skills use (including language), via parent-report. Early items tap the ability to direct and sustain attention, and to show functional comprehension and use of various communicative signals including words. Later items address comprehension of and compliance with longer instructions and child use of speech to gain and provide information to others. The emphasis here is on spontaneous and regular use of skills, as opposed to the optimal performance arguably elicited in the context of a direct-assessment.

As is the case for many other ability and achievement tests, raw MSEL and VABS scores are not readily interpretable and so interpretation relies on comparison of an individual’s raw score/s with those obtained by members of a large normative sample. Use of AE metrics is often preferred over other, more psychometrically-robust indices (e.g., SS, percentiles, etc.) with ASD samples, given the greater likelihood that these will produce interpretable data distributed in such a way as to permit parametric analysis (i.e., distributions of SS are often skewed due to floor-level performance). AE scores, as reported here, are derived on the basis of mean raw scores obtained by age-level subgroups within the norming sample. As such, comparability across an individual’s various domain AEs might be assumed, but derivation and smoothing procedures mean that some cross-domain AE variability is likely (see Murphy and Davidshofer 1998). Neither the MSEL nor the VABS manual provides guidance on interpreting domain difference scores (e.g., regarding statistically-significant or clinically-meaningful discrepancies). Furthermore, discrepancies are likely to have different implications at different child ages; a 3-month domain difference in infancy is unlikely to equate to one of the same magnitude in toddlerhood.
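The general logic of AE derivation can be sketched as follows, assuming a simple linear interpolation between age-level mean raw scores. The norm tables and function names below are hypothetical and invented for illustration; published instruments supply their own lookup tables and smoothing procedures, which this sketch does not reproduce.

```python
from bisect import bisect_left

# Hypothetical norm tables: (age in months, mean raw score at that age).
# Values are invented for illustration; they are not MSEL or VABS norms.
RECEPTIVE_NORMS = [(6, 8.0), (12, 14.0), (18, 20.0), (24, 26.0)]
EXPRESSIVE_NORMS = [(6, 6.0), (12, 10.0), (18, 16.0), (24, 24.0)]


def age_equivalent(raw, norms):
    """Age (months) at which the norm-sample mean equals `raw`.

    Uses linear interpolation between tabled ages and clamps to the
    youngest/oldest tabled age when `raw` falls outside the table.
    """
    ages = [age for age, _ in norms]
    means = [score for _, score in norms]
    if raw <= means[0]:
        return float(ages[0])
    if raw >= means[-1]:
        return float(ages[-1])
    i = bisect_left(means, raw)  # first tabled mean that is >= raw
    lo_age, hi_age = ages[i - 1], ages[i]
    lo_mean, hi_mean = means[i - 1], means[i]
    return lo_age + (raw - lo_mean) * (hi_age - lo_age) / (hi_mean - lo_mean)


def ae_difference(receptive_raw, expressive_raw):
    """Receptive AE minus expressive AE, in months."""
    return (age_equivalent(receptive_raw, RECEPTIVE_NORMS)
            - age_equivalent(expressive_raw, EXPRESSIVE_NORMS))


# A child scoring at the 12-month norm mean in both domains shows no
# difference; a higher expressive raw score yields a negative difference.
print(ae_difference(14.0, 10.0))
print(ae_difference(14.0, 16.0))
```

Under this scheme a child performing at the normative mean in both domains obtains a difference score of zero by construction, which is why near-zero AE differences are the expected profile for controls.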

Comparison across target and control groups, as undertaken here, affords some capacity for interpreting AE difference scores. However, this may also account for our substantive lack of significant VABS parent-report and MSEL direct-assessment findings, given large within-group variability/confidence intervals and substantial between-group overlap. Dedicated research is required to determine the extent to which divergent receptive-expressive domain scores remain within ‘normal limits’ on a given test, and the point at which such becomes statistically and/or clinically significant. While remaining the most broadly applicable and readily-interpretable norm-referenced scores, results based on the AE metrics of tests like the MSEL and VABS present far greater ambiguity than the meaningful raw data obtainable in the form of MCDI vocabulary counts.
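The interpretive criterion of a confidence interval spanning zero can be sketched as below. This is a simplified normal-approximation interval for a single group mean, not the analysis used in the current study; with small subgroups a t-based interval would be wider. The difference scores and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(differences, level=0.95):
    """Normal-approximation confidence interval for a mean difference score."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    centre = mean(differences)
    half_width = z * stdev(differences) / sqrt(len(differences))
    return centre - half_width, centre + half_width


def spans_zero(interval):
    """True when zero lies inside the interval (no clear domain advantage)."""
    lower, upper = interval
    return lower <= 0 <= upper


# Hypothetical receptive-minus-expressive AE differences (months);
# invented for illustration and NOT study data.
balanced = [-2, 1, 0, 3, -1, 2]
shifted = [4, 5, 6, 5, 4, 6]

print(mean_ci(balanced), spans_zero(mean_ci(balanced)))
print(mean_ci(shifted), spans_zero(mean_ci(shifted)))
```

Large within-group variability widens the interval, so a group mean in a given direction can still yield an interval that includes zero.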

The current, prospectively-collected MCDI data confirm a lack of normative receptive vocabulary acquisition, relative to concurrent expressive acquisition, in HR infants with ASD outcome. They also indicate such atypicality to present relatively early, observable around the same time that the first overt behavioural symptoms of ASD usually appear (Ozonoff et al. 2010), with maintenance through toddlerhood. Lack of receptive vocabulary advantage may therefore present another early sign of emerging ASD, but a relatively subtle one which initially lacks specificity to ASD, reflecting the broader phenotype. Maintenance of this early atypicality across the second year of life, however, shows greater specificity to later diagnostic outcome and may present an example of the gradual emergence of behavioural symptoms and/or subtle developmental regression in the early course of ASD, as proposed by Ozonoff et al. (2010). The demonstration of differential pathways of relative vocabulary acquisition in the context of heightened ASD-risk supports notions of compounding atypicality along a pathway toward ASD diagnosis and canalisation of early atypicality toward a more normative trajectory in most HR infants, as proposed by Elsabbagh and Johnson (2010). Notably, vocabulary counts in this latter subgroup fell well below age-expected levels, receptively and expressively (also fairly consistently true across MSEL direct-assessment of language and VABS parent-reported functional communication scores; see Tables 4, 5) despite close matching to the LR group on outcome NV ability level (see Table 2).

Table 4 Mean (and SD) receptive–expressive difference scores (untransformed data) for LR and HR groups, and for HR diagnostic subgroups (with typical, atypical, and ASD outcome), across three language measures and three/four visits
Table 5 Mean (and SD) receptive and expressive scores (untransformed data) for LR and HR groups across three measures and three/four visits

The striking acceleration in relative receptive vocabulary advantage across the second year of life highlights the possible advances to be made in our understanding of ASD and its broader phenotype through the investigation of developmental pathways in the context of early risk. Groups employing HR sibling designs regularly separate ‘affected’ and ‘unaffected’ outcomes (e.g., see Mitchell et al. 2006). However, the latter plausibly comprises substantial heterogeneity in developmental trajectories and outcomes, so dedicated attention to those HR infants who do not present ASD at outcome is warranted in illuminating those processes which serve to steer infants away from disorder, toward more normative outcome.

Strengths, Limitations and Future Directions

Strengths of the current study lie in the recruitment and retention of well-characterised high- and low-risk groups, longitudinal assessment on multiple measures, and rigorous outcome characterisation. Limitations include the relatively modest sample size, particularly when separating infants by outcome subgroup. While diagnostic stability of the ASD- and typical-outcome groups is likely, this is less certain for individuals within our other atypicality subgroup, who presented sub-threshold ASD traits and/or global developmental delay (see Chawarska et al. 2010). This group tended to present language profiles intermediate to those of the other outcome subgroups, and future follow-up will permit observation of symptom consolidation or processes of canalisation within this subgroup. Further recruitment (underway) may permit the more specific delineation of outcome subgroups (e.g., high- vs. low-functioning ASD, sub-threshold ASD only, global delay only, etc.) potentially affording further insight into specific developmental language profiles. Given that some recent reports have suggested atypical receptive-expressive language profiles to present more commonly in individuals with ASD and relatively high-level NV functioning (Hudry et al. 2010; Volden et al. 2011), more detailed analysis of subgroups or covariation among skill sets will be important in future studies with larger outcome groups.

Further limitations include the relatively large between-visit intervals, precluding more detailed observation of early language trajectories across the infant and toddler years. Future replication with more frequent sampling (particularly during the second year of life) will afford better understanding of those points at which pathways begin to diverge toward differential outcomes. Follow-up beyond the third birthday would also afford more ready comparison of these prospective data with existing cross-sectional data from samples of young children (e.g., Hudry et al. 2010). Additionally, while basic proband screening was undertaken to confirm parent-reported ASD diagnosis, comprehensive characterisation was not conducted so it is plausible that some probands might have failed to meet full ASD criteria, reducing the relative ‘risk’ of some HR infants. This is unlikely to be particularly problematic for the current study as interpretation of our results has been primarily by diagnostic outcome of the infants themselves. This classification was made by a team of experienced clinical researchers, using gold-standard instruments and procedures (though not blind to risk-group status) and case review of data collected across the 24- and 38-month visits, including the ADOS-G and ADI-R.

Finally, a broad limitation regards the lack of any clearly-articulated mechanism underpinning a lack of receptive advantage over concurrent expressive skill, in children with ASD. The current results have provided some novel insights, particularly regarding the early generality and later specificity of observed effects to heightened risk- versus diagnosed outcome, respectively. The lack of corroborated findings across the three measures employed here, however, precludes us from drawing any stronger conclusions about the underpinnings of relative language profiles at this time. Given that clearest evidence comes from the parent-report MCDI, under-reporting of child comprehension and over-reporting of echolalic speech plausibly play a role in these findings. While Houston-Price et al. (2007) have indeed shown discrepancies across parent-reported and preferential-looking indices of receptive vocabulary in typical infants, this seems unlikely to account fully for the phenomenon under current investigation, given the differential subgroup patterns observed here and also the similar pattern reported by other groups (but not replicated here) on the basis of direct-assessments (e.g., Barbaro and Dissanayake 2012; Ellis Weismer et al. 2010; Hudry et al. 2010; Luyster et al. 2008; Volden et al. 2011).

Social and cognitive processes involved in language acquisition (e.g., social referencing, recognition vs. recall types of memory, etc.) likely have important roles to play regarding relative receptive-expressive language acquisition, but have yet to be specifically investigated. The current results provide the very general indication that the mechanism at play is disrupted early in development within intermediate phenotypes, maintaining its effects into toddlerhood as ASD symptoms consolidate, but canalising its early effects within the majority HR subgroup headed toward typical outcome. Attempts to demonstrate how social-cognitive processes typically produce clear receptive advantage over expressive vocabulary skills, but manifest in the distinct profile and trajectory demonstrated here on the course toward ASD, should now begin.