Given our broad knowledge base of early temperament, measuring temperament in populations at high risk of developmental disorders provides an informative framework for increasing our understanding of these disorders (Nigg 2006). Temperament and personality traits are closely linked to basic psychobiological systems, which in turn are governed by specific genes (Whittle et al. 2006). Thus, temperament may serve as an early endophenotype, that is, a physiological or psychological/cognitive marker that can be measured in individuals with a given disorder and is also present more often in their family members than in unrelated individuals (Gottesman and Gould 2003). Because endophenotypes are thought to lie closer to genetic influences than the clinical diagnosis itself, they are particularly useful in the study of disorders with complex inheritance (Doyle et al. 2005), such as autism spectrum disorder (ASD).

Temperament research has a long history (see Rothbart 2011 for a review), with the best-known framework originating from the work of Chess and Thomas (1990). The definition of temperament in the current study was based on that of Rothbart and her colleagues, which built upon the pioneering work of Thomas and Chess. Rothbart and Bates (2006) defined temperament as individual differences in reactivity and regulation. Reactivity refers to emotional valence and intensity, which depend on interactions between the autonomic nervous system and the brain (Porges 2005). Factor analyses of childhood temperament traits measured with questionnaires developed by Rothbart and colleagues have consistently yielded three main factors: Extraversion, Negative Affectivity, and Effortful Control (Rothbart 2011). The first two factors are closely tied to emotional reactivity, whereas the third is regulatory. Extraversion reflects an approach system, of which positive affect is an important component. Negative Affectivity refers to a child's propensity to experience distress, including sadness, anxiety, and frustration. Finally, Effortful Control involves the ability to regulate reactivity through the use of attention (Rothbart and Posner 2006).

Temperament and ASD

ASD is a disorder involving problems with social communication and restricted/repetitive interests (American Psychiatric Association [APA], 2013). Accumulating evidence indicates that individuals with ASD show a specific temperament profile consisting of lower positive affect, higher negative affect, and lower regulation (e.g., Garon et al. 2009). Compared with typically developing children, children with ASD are perceived as exhibiting lower positive affect and more neutral facial expressions (i.e., little smiling and low approach; Adamek et al. 2011; Capps et al. 1993; Loveland et al. 1994; Konstantareas and Stewart 2006). In adulthood, individuals with ASD are reported to show lower novelty seeking and reward dependence (Anckarsäter et al. 2006; Soderstrom et al. 2002), higher shyness and introversion (Anckarsäter et al. 2006; Ozonoff, Garcia, Clark, and Lainhart 2005), and more social withdrawal than typical controls. In contrast to positive affect, negative affect is commonly elevated in children and adults with ASD compared with both typical and developmentally delayed controls (Adamek et al. 2011; Anckarsäter et al. 2006; Capps et al. 1993). Finally, children with ASD have been reported to show lower levels of attentional focusing, attention shifting, and inhibition of prepotent responses than either typical or developmentally delayed controls (Adamek et al. 2011; Konstantareas and Stewart 2006). Similarly, children with ASD have been rated as less flexible, more difficult to distract, and less goal-oriented (Brock et al. 2012). In adults with ASD, problems with inattention (Bradley and Isaacs 2006) and lower levels of self-directedness and responsibility (Anckarsäter et al. 2006; Soderstrom et al. 2002) also implicate low Effortful Control and Conscientiousness.

More recently, this pattern of low positive affect, high negative affect, and low regulation has been confirmed in prospective studies of high-risk infants subsequently diagnosed with ASD (Bryson et al. 2007; Clifford et al. 2013; del Rosario et al. 2014; Filliter et al. 2015; Garon et al. 2009). Notably, two studies have indicated that infants who are subsequently diagnosed with ASD show distinctive temperament trajectories in the first year of life (Clifford et al. 2013; del Rosario et al. 2014). For example, Clifford et al. (2013) found lower positive affect, higher negative affect, and lower regulation in children from the high-risk (HR) group who were later diagnosed with ASD, compared with children in the low-risk (LR) group. Moreover, HR children who were not diagnosed with ASD also differed significantly in temperament from the LR group. Taken together, these two studies indicate that temperament distinguished children later diagnosed with ASD not only from LR controls but also from their HR peers who did not develop ASD.

Development of Higher Order Temperament Traits

In a prospective study of infant siblings of children with ASD, Garon et al. (2009) found that the HR group was distinguished from an LR control group at 24 months by a combination of low scores on Effortful Control and high scores on negative affect scales from a parent-report inventory. HR infants who were diagnosed with ASD at 36 months were distinguished from their non-ASD counterparts by a combination of low scores on positive anticipation and attention shifting. These distinctions parallel the higher-order factors described by Evans and Rothbart (2009): one factor was composed of scales related to positive affect and orienting sensitivity, and the other of scales related to low negative affect and high effortful control. Two parallel meta-traits have been identified in the personality literature (DeYoung 2006). DeYoung (2006) labeled the higher-order trait associated with positive affect Plasticity (reflecting the tendency to explore) and the trait associated with negative affect Stability (reflecting stable emotional, social, and motivational regulation).

Patterns of correlations among temperament dimensions reported during childhood suggest a similar organization (Rothbart and Bates 2006). Furthermore, Shiner and DeYoung (2013) argued that antecedents of these meta-traits are present during infancy, with positive affect being a precursor to Plasticity and negative affect a precursor to Stability. Evans and Rothbart (2009) suggested that these higher-order factors may reflect how the reactive and regulatory dimensions of temperament become organized over time. Of particular interest for ASD is that, in typical development, positive affect during infancy predicted later Effortful Control (Komsi et al. 2006), whereas a concurrent association between the two has generally not been found.

The Present Study

The current study had five main goals. First, we examined specific temperament differences between children who had an older sibling with ASD (HR) and those with no first-degree relatives with ASD (LR). We predicted that parents would rate the HR group higher on scales assessing negative affect (e.g., fear) and lower on scales assessing regulation (e.g., inhibitory control). Second, we examined whether the reactive and regulatory aspects of temperament were organized similarly in the HR and LR groups. To address this question, we used structural equation modeling (SEM) to determine whether early reactive aspects of temperament (i.e., positive and negative affect) at 12 months predicted later reactive and regulatory aspects of temperament at 24 months. Based on previous findings in typically developing children (e.g., Komsi et al. 2006), we expected positive affect to play a more important role than negative affect in predicting effortful control in both the HR and LR groups.

The remaining three goals concerned early temperamental heterogeneity within the HR group. Third, we examined the associations between temperament factors (at 12 and 24 months) and ASD symptoms (at 36 months) in our SEM. Based on our own and others' previous findings (e.g., Clifford et al. 2013), we expected that differences in positive affect and effortful control would be associated with ASD symptom severity. Fourth, we examined temperament differences among HR subgroups defined by the timing of diagnosis (Garon et al. 2009), again expecting the early- and later-diagnosed groups to differ on positive affect and regulation. Finally, given findings of a male bias in ASD prevalence and possible sex differences in the phenotypic expression of symptoms (Zwaigenbaum et al. 2012), we included sex as a moderator in our analyses.

Method

Participants

The sample included 545 infants from our prospective infant sibling study who were followed from the first year until 36 months of age. Of these, 383 were HR infants with an older sibling with ASD, 98 of whom were diagnosed with ASD at 36 months (25.6 %). The ASD group comprised 50 children who were first diagnosed at 24 months (‘Early Diagnosis’) and 48 first diagnosed at 36 months (‘Late Diagnosis’). The remaining 162 were LR control infants, with no known first- or second-degree relatives with ASD. Most infants in both the HR and LR groups were enrolled in the study at 6 months of age (84 % and 80 %, respectively), and the remainder by 12 months. All of the infants were born full-term (at least 37 weeks gestation). Participants were recruited through four major ASD diagnostic and treatment centers in Canada. Diagnoses of the older siblings were confirmed by expert clinical judgment based on DSM-IV-TR criteria (APA, 2000), and in most cases, included the Autism Diagnostic Observation Schedule (ADOS; Lord et al. 2000) and the Autism Diagnostic Interview-Revised (ADI-R; Lord et al. 1994). None of the older or younger siblings had a known genetic or chromosomal syndrome (e.g., fragile X syndrome) or neurological disorder (e.g., tuberous sclerosis) that could account for the ASD. The large majority of participants were Caucasian (85.3 %) and came from families classified as medium to high socioeconomic status (77.7 %). Participant characteristics are shown in Table 1.

Table 1 Participant characteristics (Mean Scores) as a function of group

Procedures

As part of the protocol for our larger prospective infant sibling study, parents completed the Infant Behavior Questionnaire (IBQ; Rothbart 1981) at 12 months and the Toddler Behavior Assessment Questionnaire-Revised (TBAQ-R; Goldsmith 1996; Rothbart et al. 2003) at 24 months. At 36 months, all children were assessed for ASD by examiners blind to risk status and previous assessment results, using a combination of the ADI-R (Lord et al. 1994), the ADOS (Lord et al. 2000), and expert clinical judgment with reference to DSM-IV-TR criteria. The children were also assessed with the Mullen Scales of Early Learning (MSEL; Mullen 1995) at 12, 24, and 36 months. Ethics approval was granted at each participating site, and written informed consent was obtained from a parent or guardian of each infant participant.

Measures

Infant Behavior Questionnaire (IBQ)

The IBQ (Rothbart 1981) is a reliable and well-validated measure of temperament designed for infants aged 3–12 months (for a review, see Rothbart and Bates 2006). The questionnaire comprises 94 items across six scales. Items are rated on a Likert scale from 1 (never) to 7 (always) and averaged within each scale. For the present sample, Cronbach's alphas for the scales ranged from 0.73 to 0.86 (see Table 2).

Table 2 Means and standard deviations of temperament at 12 and 24 months as a function of group

Toddler Behavior Assessment Questionnaire-Revised

The TBAQ-R is a reliable and well-validated measure of temperament designed for toddlers aged 18–35 months (Goldsmith 1996; Rothbart et al. 2003). It comprises 13 scales, including the four original TBAQ scales (Activity Level, Anger/Frustration, Positive Anticipation, and Social Fear; Goldsmith 1996). Items are rated on the same Likert scale as the IBQ. For the present sample, Cronbach's alphas for the scales ranged from 0.68 to 0.89 (see Table 2).

Autism Diagnostic Observation Schedule

The ADOS (Lord et al. 2000) is a semi-structured, standardized observational assessment designed to elicit behavior that is characteristic of ASD. It has been found to reliably distinguish children with ASD from typical and developmentally disabled non-autistic controls (Lord et al. 2000). All ADOS administrations were conducted by examiners who had attained reliability according to the developers’ criteria.

Autism Diagnostic Interview-Revised

The ADI-R (Lord et al. 1994) is a structured parent interview that elicits information on developmental history and assesses a range of behaviors associated with ASD. It has been found to discriminate children with ASD from children with other developmental disabilities and has excellent inter-rater reliability (Lord et al. 1994). ADI-R interviews were conducted by research-reliable examiners.

Mullen Scales of Early Learning

The MSEL (Mullen 1995) contains items assessing motor, cognitive, and language development in children from birth to 68 months. The test has been found to have good validity and test-retest reliability (Mullen 1995).

Statistical Approach

To assess overall differences in temperament scales between the LR and HR groups, a multivariate analysis of variance (MANOVA) was conducted with group (HR, LR) and sex as the independent variables. Separate analyses were run with the temperament scales at 12 months and at 24 months as the dependent variables. Significant multivariate effects were followed up with univariate tests corrected using the Benjamini-Hochberg procedure (Benjamini and Hochberg 1995). In this method, the m p-values are ranked from smallest (k = 1) to largest (k = m), where m is the number of comparisons (e.g., six for the 12-month scales), and each p-value is compared with the critical value \( \frac{k\alpha}{m} \). The largest rank k whose p-value falls at or below its critical value is identified, and that test and all tests with smaller p-values are declared significant.
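The Benjamini-Hochberg procedure is a step-up rule: it searches for the largest rank that meets its critical value rather than stopping at the first failure. The Python sketch below illustrates this; it is not the authors' analysis code, and the function name `benjamini_hochberg` and the p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the null hypothesis is rejected
    at false discovery rate `alpha`.
    """
    m = len(p_values)
    # Indices of the tests ordered from smallest to largest p-value (ranks 1..m).
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k whose p-value satisfies p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Reject the test at rank k_max and every test with a smaller p-value.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for six scales: the smallest (0.009) misses its own
# critical value (0.0083), but the third-ranked p-value (0.022 <= 0.025)
# passes, so all three of the smallest p-values are rejected.
print(benjamini_hochberg([0.009, 0.5, 0.012, 0.9, 0.022, 0.6]))
# [True, False, True, False, True, False]
```

A variant that stopped at the first non-significant test in ascending order would reject nothing in this example, which is why the direction of the search matters.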

To construct the temperament factors, two confirmatory factor analyses (CFAs) were conducted on the temperament scales at 12 and 24 months, using maximum likelihood estimation in AMOS (Arbuckle 2006). With 94 IBQ items and 144 TBAQ-R items, item-level models would have fallen well short of the ratio of 10 cases per estimated parameter recommended for CFA (Kline 2011). Temperament scales were therefore used as item parcels in each CFA. Although the use of item parceling has been debated in the SEM literature (see Little et al. 2013), this strategy was advantageous for the current study. First, compared with individual items, parcels typically show better psychometric characteristics, such as higher reliability and a higher ratio of common-to-unique factor variance (Little 2013). Second, when a questionnaire has many items, parcels can bring the ratio of sample size to estimated parameters to acceptable levels (Little 2013).
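The parameter-count argument for parceling can be made concrete with a short Python sketch. It assumes, for illustration only, a simple-structure CFA in which each indicator loads on one factor, factor variances are fixed to 1, residuals are uncorrelated, and means are not modeled, so the free parameters are the loadings, the residual variances, and the factor covariances. The factor counts (two at 12 months, three at 24 months) follow the models in this paper; the function names are hypothetical.

```python
def cfa_free_parameters(n_indicators, n_factors, n_cross_loadings=0):
    """Free parameters in a single-group CFA under simple-structure assumptions."""
    loadings = n_indicators + n_cross_loadings   # factor variances fixed to 1
    residual_variances = n_indicators            # one unique variance per indicator
    factor_covariances = n_factors * (n_factors - 1) // 2
    return loadings + residual_variances + factor_covariances


def cases_needed(n_params, cases_per_param=10):
    """Sample size implied by a cases-to-parameters rule of thumb (10:1 here)."""
    return n_params * cases_per_param


# Item-level models (hypothetical factor structures, for illustration):
ibq_items = cfa_free_parameters(94, 2)      # 94 + 94 + 1
tbaq_items = cfa_free_parameters(144, 3)    # 144 + 144 + 3
print(ibq_items, cases_needed(ibq_items))    # 189 1890
print(tbaq_items, cases_needed(tbaq_items))  # 291 2910

# The same IBQ content summarized as six scale-level parcels:
ibq_parcels = cfa_free_parameters(6, 2)     # 6 + 6 + 1
print(ibq_parcels, cases_needed(ibq_parcels))  # 13 130
```

Under these assumptions the item-level IBQ model alone would call for roughly 1,900 cases, far more than the 545 infants in the study, whereas the parcel-level model needs about 130.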

Because SEM is sensitive to departures from normality, most variables were Box-Cox transformed to correct negative or positive skew (Osborne 2010). Each variable was also rescaled to range from 0 to 1, as suggested by Little (2013). To provide a robust measure of symptoms (Wiggins et al. 2014), the CFA for ASD symptoms at 36 months included scores from both the ADOS and the ADI-R. Five variables were included in this CFA: (1) ADOS Social Affect (SA) score, (2) ADOS Restricted and Repetitive Behavior (RRB) score, (3) ADI-R social score, (4) ADI-R communication score, and (5) ADI-R repetitive behavior score. To derive ADOS SA and RRB scores, scores from the original ADOS were mapped onto the ADOS-2 algorithms (Lord et al. 2012).
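A minimal Python sketch of this preprocessing step is shown below, using `scipy.stats.boxcox` with its default maximum-likelihood choice of λ followed by min-max rescaling. The text does not report how negatively skewed variables or non-positive values were handled, so the reflection and shift steps are assumptions, and `boxcox_then_rescale` is a hypothetical helper rather than the authors' code.

```python
import numpy as np
from scipy import stats


def boxcox_then_rescale(x):
    """Box-Cox transform a skewed variable, then rescale it to [0, 1].

    Assumptions (illustrative only):
    - negatively skewed variables are reflected so Box-Cox acts on a
      positively skewed variable, then flipped back to keep the original order;
    - values are shifted so the minimum is 1, because Box-Cox needs x > 0.
    """
    x = np.asarray(x, dtype=float)
    reflected = stats.skew(x) < 0
    if reflected:
        x = x.max() - x                  # negative skew -> positive skew
    shifted = x - x.min() + 1.0          # every value strictly positive
    transformed, lam = stats.boxcox(shifted)
    if reflected:
        transformed = -transformed       # restore the original ordering
    lo, hi = transformed.min(), transformed.max()
    return (transformed - lo) / (hi - lo), lam


rng = np.random.default_rng(1)
skewed = rng.lognormal(size=200)         # simulated positively skewed variable
scaled, lam = boxcox_then_rescale(skewed)
print(scaled.min(), scaled.max())        # 0.0 1.0
```

Rescaling after the transform keeps every indicator on the same 0 to 1 metric, as Little (2013) recommends, while the monotone Box-Cox step preserves the rank order of scores.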

Multivariate outliers were identified using Mahalanobis distances, and multivariate normality was evaluated with Mardia's test (Mardia 1970). Mahalanobis distances were computed separately for the LR and HR groups using all variables. These analyses identified 9 multivariate outliers, all p's < 0.001 (1 LR; 8 HR, including 3 with ASD). After these cases were removed, Mardia's test indicated that the assumption of multivariate normality was met for both the LR group, p = 0.623, and the HR group, p = 0.187. Table 1 shows the size of our final sample.
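The outlier screen can be sketched in Python as follows: each case's squared Mahalanobis distance from its group centroid is compared with the chi-square critical value on as many degrees of freedom as there are variables, at the p < 0.001 cut-off used here. This is an illustrative sketch on simulated data (the helper `mahalanobis_outliers` is hypothetical), and Mardia's normality test itself is not reproduced.

```python
import numpy as np
from scipy import stats


def mahalanobis_outliers(data, alpha=0.001):
    """Flag multivariate outliers by squared Mahalanobis distance.

    Returns the squared distances and a boolean mask of cases whose distance
    exceeds the chi-square critical value with p = n_variables df.
    """
    X = np.asarray(data, dtype=float)
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # Row-wise quadratic form: d_i^2 = (x_i - mean)' S^-1 (x_i - mean)
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    critical = stats.chi2.ppf(1 - alpha, df=X.shape[1])
    return d2, d2 > critical


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))            # simulated scores on five variables
X = np.vstack([X, np.full(5, 8.0)])      # append one extreme case
d2, flagged = mahalanobis_outliers(X)
print(flagged[-1])                       # True
```

In the study the screen was run separately within the LR and HR groups, which corresponds to calling such a function once per group.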

The three CFAs were tested for measurement invariance to establish the equivalence of the latent constructs across the LR and HR groups. Four levels of measurement invariance are generally investigated when comparing groups (Widaman and Reise 1997). The first level, configural invariance, tests whether the two groups have the same general factor structure. Metric invariance additionally requires the groups to have the same factor loadings, and scalar invariance further requires the same intercepts. Finally, residual invariance adds the constraint of equal error variances (Widaman and Reise 1997). Because each model was nested within the preceding one, successive models were compared using chi-square difference tests. In addition, changes of |ΔCFI| ≤ 0.01 (Cheung and Rensvold 2002) and |ΔRMSEA| ≤ 0.01 (Chen 2007) were used as criteria for equivalence, since the chi-square difference test may lead to rejecting equivalence on the basis of minor differences (Little 2013).
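A nested-model comparison of this kind can be sketched in Python with `scipy.stats.chi2.sf`, which gives the upper-tail p-value of the chi-square difference. The helper `compare_nested` and its CFI-drop check are illustrative assumptions, not the AMOS procedure. As a worked check, it is applied below to the full (χ²(404) = 667.11) and trimmed (χ²(410) = 673.12) structural models reported in the Results, both with CFI = 0.924.

```python
from scipy import stats


def compare_nested(chi2_restricted, df_restricted, chi2_free, df_free,
                   cfi_restricted=None, cfi_free=None, max_cfi_drop=0.01):
    """Chi-square difference test for two nested models, plus a CFI check.

    The restricted model carries more equality (or zero) constraints and
    therefore more degrees of freedom than the free model.
    """
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    p = stats.chi2.sf(d_chi2, d_df)      # upper-tail p-value of the difference
    cfi_ok = None
    if cfi_restricted is not None and cfi_free is not None:
        # Equivalence is retained when CFI drops by no more than the cut-off.
        cfi_ok = (cfi_free - cfi_restricted) <= max_cfi_drop
    return d_chi2, d_df, p, cfi_ok


d_chi2, d_df, p, cfi_ok = compare_nested(673.12, 410, 667.11, 404, 0.924, 0.924)
print(round(d_chi2, 2), d_df, round(p, 3), cfi_ok)  # 6.01 6 0.422 True
```

Because the chi-square difference grows with sample size, pairing it with a descriptive criterion such as the CFI drop guards against rejecting equivalence over trivially small misfit.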

The three fitted CFAs served as the foundation for the SEMs. The first model consisted of the 12-month temperament factors predicting ASD symptoms at 36 months, with MSEL Early Learning Composite scores at 12 months (hereafter, IQ) as a covariate. This model included all possible paths from the temperament factors and IQ to ASD symptoms at 36 months. Next, to explore whether Effortful Control at 24 months mediated the predicted effect of early Positive Affect on ASD symptoms, the 24-month temperament factors were added to the model (see Fig. 1 for the final model). Preliminary analyses showed that all three CFAs reached metric invariance across sex (p > 0.10 for all), indicating that the factor loadings for the temperament factors and ASD symptoms were the same for boys and girls. As a result, boys and girls could be included in the same SEM, and sex was entered only as a covariate in the SEM models.

Fig. 1 The structural equation models for the high-risk (HR) and low-risk (LR) groups. Values represent standardized coefficients that are statistically significant at p < 0.05. Non-significant parameters remain in the model but are not displayed in the figure

Data were missing for 8.1 % of data points in the HR group and 4.7 % in the LR group. Participants with and without missing data did not differ significantly on sex, SES, or IQ. Missing data were handled using full information maximum likelihood (FIML) estimation in AMOS (Kline 2011). Maximum likelihood is relatively robust to bias even under conditions of moderate non-normality (Curran, West, and Finch 1996). Model fit was assessed with two criteria. The root mean square error of approximation (RMSEA) indicates how well the model, with optimally chosen parameter estimates, would fit the population covariance matrix; values of 0.05 to 0.08 are considered acceptable fit and values of 0.01 to 0.05 close fit (Little 2013). The comparative fit index (CFI) compares the fit of the target model with that of a null (independence) model and is relatively insensitive to sample size; values of 0.90 to 0.95 indicate acceptable fit, and values > 0.95 indicate good fit (Hu and Bentler 1999).
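Both fit indices can be computed from model and baseline chi-square statistics, as in the Python sketch below. These are the common single-group formulas; multigroup software may adjust RMSEA for the number of groups, so the sketch is not expected to reproduce the AMOS output exactly, and the inputs in the example are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Single-group RMSEA from a model chi-square, its df, and sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative fit index relative to the independence (null) model."""
    misfit_model = max(chi2_model - df_model, 0.0)
    misfit_null = max(chi2_null - df_null, misfit_model)
    return 1.0 if misfit_null == 0 else 1.0 - misfit_model / misfit_null


# Hypothetical values: a model with chi2(10) = 20 in a sample of 101 cases,
# compared against a null model with chi2(20) = 220.
print(round(rmsea(20.0, 10, 101), 3))      # 0.1
print(round(cfi(20.0, 10, 220.0, 20), 3))  # 0.95
```

Both indices treat chi-square in excess of its degrees of freedom as misfit: RMSEA scales that excess by df and sample size, whereas CFI expresses it relative to the misfit of the null model.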

To investigate differences in temperament factors within the HR group, a multivariate profile analysis was conducted with group (early-diagnosed, late-diagnosed, and non-diagnosed siblings) as the independent variable and the five temperament factors derived from the CFAs (Positive Affect and Negative Affect at 12 months; Positive Affect, Negative Affect, and Effortful Control at 24 months) as the dependent variables.

Results

Temperament Differences between the HR and LR Groups

The means and standard deviations for all temperament scales by group and age are shown in Table 2. For the 12-month temperament scales, the group effect was significant, F (6, 462) = 7.14, p < 0.001, η² = 0.085, 95 % CI [0.03; 0.13]. The main effect of sex and the Sex × Group interaction were both non-significant, both F's < 1. The group effect was followed up with Benjamini-Hochberg corrected t-tests for each scale, which showed significant group differences on Distress to Limitations and Fear. As can be seen in Table 2, at 12 months parents rated the HR group higher than LR controls on Distress to Limitations and Fear.

For the 24-month temperament scales, the group effect was also significant, F (11, 437) = 7.42, p < 0.001, η² = 0.157, 95 % CI [0.08; 0.20]. The main effect of sex was significant, F (11, 437) = 4.58, p < 0.001, η² = 0.103, 95 % CI [0.04; 0.14], but the Sex × Group interaction was non-significant, F (11, 437) < 1. There were significant sex differences for Inhibitory Control, F (1, 447) = 15.62, p < 0.001, η² = 0.034, 95 % CI [0.01; 0.07], Activity Level, F (1, 447) = 11.51, p < 0.001, η² = 0.025, 95 % CI [0.01; 0.06], Anger, F (1, 447) = 10.75, p = 0.001, η² = 0.024, 95 % CI [0.01; 0.06], and Attention Shifting, F (1, 447) = 9.50, p = 0.001, η² = 0.021, 95 % CI [0.00; 0.05]. Parents rated boys higher than girls on Activity Level (boys: M = 4.42, SD = 0.92; girls: M = 4.09, SD = 0.99) and Anger (boys: M = 3.95, SD = 0.92; girls: M = 3.69, SD = 0.95). In contrast, boys were rated lower than girls on Inhibitory Control (boys: M = 3.95, SD = 0.88; girls: M = 4.27, SD = 0.91) and Attention Shifting (boys: M = 4.77, SD = 0.90; girls: M = 4.99, SD = 0.89).
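Assuming the η² values reported here are partial η², each can be recovered from its F statistic and degrees of freedom via η²p = F·df_effect / (F·df_effect + df_error). The Python sketch below applies this identity to the two multivariate group effects as a consistency check; it is not part of the original analysis.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Multivariate group effects reported for the 12- and 24-month scales.
print(round(partial_eta_squared(7.14, 6, 462), 3))   # 0.085
print(round(partial_eta_squared(7.42, 11, 437), 3))  # 0.157
```

The same identity applies to the univariate follow-up tests, where df_effect = 1.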

There were also significant group (HR vs. LR) differences at 24 months on Anger, Sadness, Inhibitory Control, Soothability, Fear, Attention Focus, High Pleasure, and Low Pleasure. Temperament means for both groups, shown in Table 2, reveal that parents rated the HR group higher than controls on Fear, Sadness, and Anger, but lower on Inhibitory Control, Soothability, Attention Focus, High Pleasure, and Low Pleasure.

Confirmatory Factor Analysis for IBQ at 12 months

A preliminary CFA indicated that Orienting did not load significantly onto any factor, so it was removed from further analysis. Table 3 summarizes the group measurement invariance tests and standardized factor loadings. Smiling and Soothability at 12 months loaded significantly onto the Positive Affect factor, whereas Activity Level and Distress to Limitations loaded significantly onto the Negative Affect factor. Fear cross-loaded negatively onto the Positive Affect factor. The unconstrained model estimated for the two groups separately showed adequate fit, supporting configural invariance. Constraining the factor loadings to be equal across groups to test metric invariance yielded an adequately fitting model; the difference between the models was not significant (Table 3), suggesting that factor loadings were equivalent for the HR and LR groups. Constraining the intercepts to be equal across groups to test scalar invariance produced a poorer fitting model (see Table 3), suggesting group differences in subscale intercepts. Examination of critical ratios indicated group differences on Fear (z = 4.48, p < 0.001) and Anger (z = 6.11, p < 0.001), with the HR group having higher intercepts for Fear (HR = 0.47, LR = 0.40) and Anger (HR = 0.51, LR = 0.41).

Table 3 Summary of results from confirmatory factor analysis and structural equation modelling

Confirmatory Factor Analysis for TBAQ-R at 24 months

At 24 months, the CFA of the unconstrained model with the two groups showed adequate fit, again supporting configural invariance. Table 3 shows the loadings from the CFA and the tests of group factorial invariance. High Pleasure, Activity Level, and Positive Anticipation loaded positively onto a Positive Affect factor, whereas Fear and Inhibitory Control loaded negatively onto this factor. Fear, Sadness, and Anger loaded positively onto a Negative Affect factor. Finally, Soothability, Low Pleasure, Inhibitory Control, Attention Shifting, and Attention Focus loaded positively onto an Effortful Control factor, whereas Activity Level and Anger loaded negatively onto this factor. Constraining the factor loadings to be equal across groups to test metric invariance yielded an adequately fitting model. Although Δχ²(12) = 21.04, p = 0.050, indicated a marginal difference between the configural and metric models, the ΔCFI of 0.005 fell within the |0.01| criterion (Cheung and Rensvold 2002) and RMSEA improved slightly (Chen 2007), providing evidence that the groups met metric invariance. Constraining the intercepts for scalar invariance, however, resulted in a significantly poorer fitting model (see Table 3), suggesting that the groups differed on subscale intercepts. Examination of critical ratios indicated significant group differences on the majority of subscale intercepts: the HR group had higher intercepts for Fear, Anger, and Sadness, and lower intercepts for Soothability, Low Pleasure, Inhibitory Control, High Pleasure, and Attention Focus.

Examining the associations among the temperament factors revealed some notable differences between the groups. The association between Positive Affect and Negative Affect did not differ significantly between the HR (β = 0.60) and LR (β = 0.39) groups (z = 1.73, p = 0.08), but the association between Negative Affect and Effortful Control did (z = 3.17, p = 0.002): it was not significant for the HR group (β = −0.10, p = 0.190) but was significant for the LR group (β = −0.58, p < 0.001). Finally, the association between Positive Affect and Effortful Control also differed significantly between the groups (z = 2.45, p = 0.014) and was stronger for the HR group (β = 0.55, p < 0.001) than for the LR group (β = 0.26, p = 0.018).
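The text does not state how these between-group z values were obtained. One common approach for independent groups, assumed in the Python sketch below, divides the difference between a parameter's two estimates by the square root of the summed squared standard errors. The function name and the estimates in the example are hypothetical, not values from this study.

```python
import math


def difference_z(b1, se1, b2, se2):
    """z test for a parameter estimated in two independent groups.

    Assumes the two estimates are uncorrelated, as they are when the groups
    are independent samples.
    """
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided


# Hypothetical estimates and standard errors for one path in two groups.
z, p = difference_z(0.30, 0.08, 0.10, 0.06)
print(round(z, 2), round(p, 4))  # 2.0 0.0455
```

Such tests are usually applied to unstandardized estimates, since standardized coefficients also depend on each group's variances.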

Confirmatory Factor Analysis for ASD symptoms

For ASD symptoms at 36 months, a one-factor CFA of the unconstrained model with the two groups showed good fit, χ²(4) = 4.25, p = 0.373, CFI = 1.0, RMSEA = 0.011, again supporting configural invariance. For the HR group, the loadings on the ASD symptom factor were ADOS SA (0.69), ADOS RRB (0.57), ADI-R Social (0.87), ADI-R Communication (0.89), and ADI-R Repetitive Behavior (0.68), all highly significant, p < 0.001. For the LR group, the loadings were ADOS SA (0.43), ADOS RRB (0.33), ADI-R Social (0.41), ADI-R Communication (0.64), and ADI-R Repetitive Behavior (0.41), all highly significant, p < 0.001. Constraining the factor loadings to be equal resulted in a good fitting model, Δχ²(2) = 0.788, p = 0.753, ΔCFI < 0.001. Constraining the intercepts to be equal for both groups resulted in a significantly poorer fit, Δχ²(5) = 124.752, p < 0.001, ΔCFI = 0.12, with all intercepts significantly higher for the HR group than for the LR group (all p's < 0.001).

Predicting Effortful Control and ASD symptoms

The first model consisted of only the 12-month temperament factors, sex, IQ at 12 months, and ASD symptoms at 36 months. This model showed adequate fit, χ²(99) = 154.39, p < 0.001, CFI = 0.961, RMSEA = 0.032. For the HR group, most paths were significant: Positive Affect at 12 months (β = −0.30, p < 0.001), sex (β = −0.21, p < 0.001), and 12-month IQ (β = −0.33, p < 0.001) significantly predicted ASD symptoms at 36 months, whereas Negative Affect at 12 months was not significantly associated with later ASD symptoms (β = −0.11, p = 0.113). For the LR group, only sex was significantly associated with later ASD symptoms (β = −0.38, p < 0.001).

Next, the full model was fitted by incorporating the temperament factors at 24 months. The model showed adequate fit, χ²(404) = 667.11, p < 0.001, CFI = 0.924, RMSEA = 0.035. Paths that were non-significant in both groups were then trimmed from the model. The trimmed model did not differ significantly from the full model, Δχ²(6) = 6.01, p = 0.422. The final model had adequate fit, χ²(410) = 673.12, p < 0.001, CFI = 0.924, RMSEA = 0.035. Figure 1 shows the final trimmed model for both groups.

For the HR group, there were strong positive associations between the 12- and 24-month measures of each reactive factor, Negative Affect (β = 0.56, p = 0.001) and Positive Affect (β = 0.48, p = 0.001), indicating continuity in these constructs over that interval. As predicted, Positive Affect at 12 months was strongly associated with Effortful Control at 24 months (β = 0.50, p = 0.001), whereas the association between Negative Affect at 12 months and Effortful Control at 24 months was significant but small (β = −0.18, p = 0.01).

Four variables significantly predicted ASD symptoms at 36 months. Being female predicted lower levels of ASD symptoms (β = −0.16, p < 0.001). Lower IQ at 12 months (β = −0.20, p < 0.001) and lower Effortful Control at 24 months (β = −0.48, p < 0.001) each predicted more ASD symptoms. Interestingly, once the 24-month temperament factors were incorporated into the model, Negative Affect at 12 months significantly predicted ASD symptoms (β = −0.21, p < 0.001; see Discussion). In contrast, the association between Positive Affect at 12 months and ASD symptoms was no longer significant (β = −0.05, p = 0.569), and this path was removed in the trimmed model (Fig. 1). Bootstrapping was used to estimate the indirect effect of Positive Affect at 12 months on ASD symptoms at 36 months. The indirect effect was significant, 95 % CI [−0.038; −0.019], p < 0.001, suggesting that the association between Positive Affect at 12 months and ASD symptoms at 36 months was fully mediated by 24-month Effortful Control.
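The logic of a bootstrapped indirect effect can be illustrated with observed variables. The Python sketch below is a simplified stand-in for the latent-variable bootstrap run in AMOS: it estimates path a (mediator on predictor) and path b (outcome on mediator, controlling for the predictor) by ordinary least squares, resamples cases with replacement, and takes percentile limits of the a × b products. The function name `bootstrap_indirect` and the simulated data are hypothetical.

```python
import numpy as np


def bootstrap_indirect(x, m, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the indirect effect a*b in x -> m -> y."""
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    n = len(x)
    rng = np.random.default_rng(seed)

    def indirect(idx):
        xs, ms, ys = x[idx], m[idx], y[idx]
        a = np.polyfit(xs, ms, 1)[0]                  # slope of m on x
        design = np.column_stack([np.ones(len(idx)), xs, ms])
        coefs = np.linalg.lstsq(design, ys, rcond=None)[0]
        return a * coefs[2]                           # a times b (m -> y, given x)

    estimate = indirect(np.arange(n))
    draws = np.array([indirect(rng.integers(0, n, n)) for _ in range(n_boot)])
    tail = 100 * (1 - level) / 2
    lower, upper = np.percentile(draws, [tail, 100 - tail])
    return estimate, lower, upper


# Simulated full mediation: x affects y only through m.
rng = np.random.default_rng(42)
x = rng.normal(size=300)
m = 0.5 * x + rng.normal(size=300)
y = -0.5 * m + rng.normal(size=300)
est, lo, hi = bootstrap_indirect(x, m, y)
print(f"indirect = {est:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

A confidence interval that excludes zero, as in this simulation, is the usual criterion for a significant indirect effect; a full SEM bootstrap would instead re-estimate the entire latent model in each resample.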

The LR group showed similar continuity between the 12- and 24-month reactive temperament factors, Negative Affect (β = 0.60, p = 0.001) and Positive Affect (β = 0.53, p < 0.001). As in the HR group, higher Positive Affect (β = 0.30, p = 0.003) and lower Negative Affect (β = −0.29, p = 0.004) at 12 months predicted higher Effortful Control at 24 months. In contrast to the HR group, only sex significantly predicted ASD symptoms at 36 months (β = −0.39, p < 0.001), whereas IQ was marginally significant (β = −0.17, p = 0.099).

Temperament within the HR Group, Stratified by Diagnostic Outcome

Figure 2 shows the temperament factor scores as a function of diagnostic status in the HR group. The MANOVA indicated a significant group effect, F (2, 367) = 22.54, η² = 0.109, p < 0.001, 95 % CI [0.05; 0.17], a significant main effect of temperament scale, F (4, 364) = 2.75, η² = 0.029, p = 0.028, 95 % CI [0.00; 0.06], and a significant Group × Temperament Scale interaction, F (4, 364) = 3.21, η² = 0.034, p = 0.001, 95 % CI [0.01; 0.05], suggesting different temperament profiles for the three groups. The main effect of sex and all interactions with sex were non-significant, all p's > 0.05. The significant Group × Temperament Scale interaction was followed up with two profile analyses: the first compared the non-diagnosed HR group with the diagnosed HR group as a whole, and the second compared early- with late-diagnosed HR children.

Fig. 2 Scores on the temperament factors as a function of diagnostic outcome in the high-risk sample

Diagnosed versus Non-Diagnosed HR Groups

The group effect was significant, F (1, 369) = 34.78, η² = 0.086, p < 0.001, 95 % CI [0.04; 0.14], whereas the main effect of temperament scale was not, F (4, 366) = 1.54, η² = 0.017, p = 0.189, 95 % CI [0.00; 0.04]. The Group × Temperament Scale interaction was significant, F (4, 366) = 6.03, p < 0.001, η² = 0.062, 95 % CI [0.02; 0.11]. This interaction was followed up with group comparisons on each temperament factor using Benjamini-Hochberg corrected t-tests, which showed that the diagnosed group differed from the non-diagnosed HR group on the Effortful Control factor, t (371) = 5.99, p < 0.001, d = 0.66, 95 % CI [0.47; 0.94], Positive Affect at 24 months, t (371) = 5.44, p < 0.001, d = 0.56, 95 % CI [0.40; 0.87], and Positive Affect at 12 months, t (371) = 2.40, p = 0.017, d = 0.27, 95 % CI [0.05; 0.51].

Early- versus Late-Diagnosed HR Group

The group effect was significant, F (1, 95) = 7.30, p = 0.008, η² = 0.071, 95 % CI [0.01; 0.18]: the early-diagnosed group had a lower overall temperament score (M = −0.43, SE = 0.09) than the late-diagnosed group (M = −0.09, SE = 0.09). Although the main effect of temperament scale was significant, F (4, 92) = 3.81, p = 0.007, η² = 0.142, 95 % CI [0.01; 0.25], the Group × Temperament Scale interaction was not, F < 1, suggesting that the key difference between the early- and late-diagnosed groups lay in the overall intensity of parents’ temperament ratings rather than in the shape of the profile. As Figure 2 shows, the two diagnosed groups had similar temperament profiles, with the early-diagnosed group scoring lower on all temperament factors than the late-diagnosed group.

Discussion

The main goals of the current study were to examine early temperament differences between infants at high risk for developing ASD and an LR sample, and to examine temperament differences within the HR sample. Relatedly, we were interested in potential differences in how early reactive components of temperament might predict later regulatory traits in the HR versus the LR samples. We were also interested in the heterogeneity of temperament within this infant sibling sample and how these differences in temperament might be associated with later ASD symptoms. Our results extend previous findings by indicating that early reactive temperament components jointly predict later regulatory aspects of temperament at 24 months in both LR and HR infants, and that this regulatory aspect of temperament in turn predicts later ASD symptoms at 36 months in the HR sample.

Temperament Differences for High- versus Low-Risk Groups

The HR group as a whole was characterized by high levels of negative affect. Multivariate analyses indicated group differences on all temperament scales that loaded on the Negative Affect factor at both 12 and 24 months. For instance, the HR group was rated higher on scales assessing fear and anger at 12 and 24 months. Furthermore, parents rated their HR children lower on 5 of the 7 scales that loaded on the Effortful Control factor, indicating not only higher negative affect but also lower levels of regulation in this group. This is consistent with evidence that individuals with the broader autism phenotype express higher levels of negative emotion than the general population (Wainer et al. 2011).

Findings from SEM provided evidence that HR and LR infants differ in how the temperament factors are associated at 24 months. Whereas the HR group showed a strong positive relation between the Positive and Negative Affect factors, the control group showed no such association. One possible explanation is a disturbance in the autonomic nervous system, such as higher sympathetic arousal (Goodwin et al. 2006; Kushki, Brian, Dupuis, and Anagnostou 2014) or lower parasympathetic arousal (Porges 2005), either of which may influence both positive and negative affect. Emotional states such as happiness, fear, and anger are associated with increased sympathetic arousal (Rainville, Bechara, Naqvi, and Damasio 2006), and this activation is subject to modulation by the parasympathetic nervous system and frontal brain systems (Thayer et al. 2012). Weaker regulatory influences of the parasympathetic nervous system and the frontal brain system, as reported in individuals with ASD (Eilam-Stock et al. 2014; Porges 2005), may therefore lead to a stronger association between positively and negatively valenced emotion in this population.

The SEM finding relates to the two meta-traits discussed in the introduction, Plasticity and Stability, both of which include reactive and regulatory components and have been hypothesized to reflect the integration of temperament traits through development (Evans and Rothbart 2009). Recall that Plasticity includes high positive loadings from scales measuring positive affect and regulation, whereas Stability includes negative loadings from scales that assess negative affect and positive loadings from scales that assess regulation (DeYoung 2014). In the LR sample, the strong negative association between the Negative Affect and Effortful Control factors is consistent with past findings on the formation of a Stability meta-trait in typically developing children. Previous reports of this negative association have been taken to indicate that Effortful Control is important for regulating negative emotions (Rothbart and Bates 2006). In the present study, the absence of a significant negative correlation between the Effortful Control and Negative Affect factors in the HR group could indicate a problem with the integration of the brain networks underlying these temperament traits.

Our LR control group findings replicated work suggesting an important role for positive affect in predicting later Effortful Control in typical infant development (e.g., Komsi et al. 2006). Again, these findings are relevant for the early development of the two meta-traits. The longitudinal association of positive affect at 12 months with later 24-month regulation may reflect integration of the networks underlying temperament traits to form the meta-trait of Plasticity. In fact, the presence of high Positive Affect at 12 months and its association with Effortful Control at 24 months seemed to distinguish those HR children who were not diagnosed with ASD from those who were. Perhaps a certain level of positive affect is necessary for the development of the regulatory aspect of temperament.

Heterogeneity Within the High-Risk Group and Association with ASD Symptoms

A critical finding in the SEM was the negative association between 24-month Effortful Control and subsequent ASD symptoms and diagnosis. A temperamental profile characterized by lower regulation has been one of the most consistent findings in studies of children with ASD (Adamek et al. 2011; Clifford et al. 2013). The current profile analysis further supported the importance of this construct in the prediction of ASD in HR infants. Also notable was the association between Positive Affect at 12 months and ASD symptoms at 36 months, and the finding that the inclusion of Effortful Control at 24 months eliminated this association. The implication is that 12-month Positive Affect influences later ASD symptoms through its effect on Effortful Control. This finding was further supported by the profile analysis, indicating that the critical difference between HR children with and without an ASD diagnosis was the former group’s profile of lower Positive Affect at 12 and 24 months and lower Effortful Control at 24 months. Thus, the combination of higher positive affect and regulation, which we have linked with the meta-trait of Plasticity, was associated with a better outcome, and could be considered a resilience factor in the HR group.
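The argument that 12-month Positive Affect acts on later ASD symptoms through 24-month Effortful Control can be written as a single-mediator decomposition. This is a sketch under the assumption of a linear model with one mediator; the path labels a, b, c, and c′ are standard mediation notation, not estimates from this study, which this section does not report for the HR group. In ordinary least squares regression with a single mediator, the total effect c of Positive Affect on symptoms splits exactly into a direct part c′ and an indirect part ab:

```latex
\begin{aligned}
\mathrm{EC}_{24}  &= a\,\mathrm{PA}_{12} + e_1, \\
\mathrm{ASD}_{36} &= c'\,\mathrm{PA}_{12} + b\,\mathrm{EC}_{24} + e_2, \\
c &= c' + ab .
\end{aligned}
```

Because higher 12-month Positive Affect predicted higher Effortful Control (a > 0) and higher Effortful Control was associated with fewer ASD symptoms (b < 0), the indirect effect ab is negative, matching the direction of the original association between Positive Affect and symptoms. The disappearance of that association once Effortful Control was included corresponds to c′ ≈ 0, so that c ≈ ab.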

Finally, contrast analyses of temperament profiles in the earlier versus later diagnosed groups revealed that the two differed in the overall intensity of the temperament profile rather than on any one temperament factor. Figure 2 shows similar profiles of lower Positive Affect and Effortful Control relative to Negative Affect, particularly evident at 24 months. Although the lower levels of Positive Affect and Effortful Control at 24 months for the early-diagnosed group were expected, the lower level of Negative Affect was not. This finding indicated that children who were diagnosed earlier with ASD, presumably because their impairments were evident earlier, actually showed less negative affect. It may be that lower levels of affect, regardless of valence, are associated with more severe ASD symptoms, a profile reminiscent of the “aloof” ASD subtype suggested by Wing and Gould (1979).

Temperament as a Possible Endophenotype

Garon et al. (2009) proposed that exploring differences between children with and without ASD with respect to temperament profiles, rather than individual temperament traits, could help further our understanding of this heterogeneous disorder. The current findings, combined with those of recent studies exploring early temperament development (e.g., Clifford et al. 2013; Del Rosario et al. 2014), indicate that the search for a viable temperament endophenotype may also involve investigating associations among temperament traits in HR populations, providing clues to the integration of brain systems underlying these temperament traits.

Comparing the development of temperament traits in children from HR samples with that of children from the general population can provide clues to possible mechanisms leading to differences in timing and symptom presentation. For instance, early appearance of ASD symptoms may implicate the reactivity and orienting attention systems, since reactivity (positive and negative affect) and its regulation by the orienting attention network show a developmental spurt during the first 2 years of life (Posner et al. 2012). Appearance of ASD symptoms in the third year, conversely, may implicate problems in the organization of two attention networks, the orienting system and the executive attention system, which become segregated during this period (Posner et al. 2012).

Limitations and Conclusion

The main limitation of this study is that parent-rated temperament data are subject to report biases. Parents who already have a child with ASD may perceive their younger children’s development differently than other parents do. Given this possibility, it will be important to assess temperament using direct observational measures (such as Lab-TAB; Gagne, van Hulle, Aksan, Essex, and Goldsmith 2011). It should be noted, however, that parent reports and direct observations are not strongly correlated, in part because they reflect different views of the child (Rothbart 2011). Another limitation is the use of item parceling for the CFAs, a practice that has been criticized in the field of SEM (Little 2013). Finally, caution is warranted in interpreting the group differences (Table 2), as our SEM analysis indicated that the HR and LR groups differ in the structure of temperament factors.

In conclusion, the current results extend findings on temperament and ASD by underscoring the importance of early temperament patterns for understanding the complexity and heterogeneity of ASD. Critically, the findings highlight the importance of examining differences between HR and LR groups, as well as differences within the HR group. A combination of high negative affect and low regulation appeared to characterize the HR group overall. In contrast, two temperament patterns appeared to be associated with an ASD diagnosis. One pattern was that of lower early positive affect combined with lower regulation. The second pattern, overall lower reactivity and lower regulation, was associated with earlier diagnosis and more severe ASD symptoms. Our findings indicate that the search for a temperament endophenotype for ASD must involve consideration of the hierarchical structure of temperament and personality (DeYoung 2014). Development of temperament involves not just individual temperament traits, but also the integration of these traits. Finally, an important future step will be to explore how the early dispositions of HR infants interact with their social environments to construct core abilities such as emotion regulation and self-control.