Introduction

Temperamental traits are thought to be relatively stable behavioral characteristics that are observable in early childhood and at least partially affected by genetic influences (Kagan and Saudino 2001). Behavioral inhibition is a temperamental trait that reflects inhibition or withdrawal in response to novelty. Individuals develop inhibited (or disinhibited) response styles in early toddlerhood, and individual differences in behavioral inhibition seem somewhat stable across the life course (e.g., Kagan and Saudino 2001). Behavioral inhibition is often studied in the context of how one relates to one’s environment, and is a general term encompassing slow approach to novel items, shyness towards new people, and fearfulness in new situations (e.g., Kagan et al. 1984; Robinson et al. 1992). Different terms are used in the literature to describe constructs similar to behavioral inhibition, such as inhibition (Caspi et al. 1996), fearfulness (e.g., Goldsmith and Lemery 2000) and shyness (specific to situations with new people; Emde et al. 1992). The term behavioral inhibition is used in the present study as a general term to encompass the range of constructs associated with inhibited response styles; however, when referring to specific studies, the term used in that study is used.

Several longitudinal studies have found associations between early behavioral inhibition and later negative adjustment and psychopathology, suggesting the importance of studying its development and etiology. For example, Sanson et al. (1996) found moderate stability in shyness from infancy to early childhood, and that shyness was associated with “difficultness” in infancy and internalizing problems at age 6. Another study found associations between inhibition and shyness in toddlerhood and internalizing symptoms during adolescence; those with stable high or increasing levels of shyness over the course of the study had the most internalizing symptoms, impaired academic performance, and fewer positive interactions with peers and parents (Letcher et al. 2009). Caspi et al. (1996) found longitudinal associations between inhibition at 3 years and psychological disorders at age 21; those who were inhibited at age 3 were at an increased risk of having one or more psychological disorders, including depression and anxiety; unexpectedly, alcohol problems and violent crimes were also associated with early inhibition, but only in males. Childhood behavioral inhibition was also found to be associated with negative emotionality, decreased pleasure from social encounters, and delayed life events in early adulthood (Gest 1997). Conversely, some researchers suggest that behavioral inhibition may be a protective factor against externalizing behaviors (e.g., harm avoidance, or avoiding novel/dangerous activities, is hypothesized to predict fewer antisocial behaviors; Lahey and Waldman 2003) or associated with more positive outcomes, such as early conscience development (Kochanska et al. 1994) and decreased impulsivity and better self-restraint (Aksan and Kochanska 2004).

Through twin studies, researchers are able to estimate the magnitude of genetic influences (i.e., heritability) and environmental influences by assessing differences in the similarity between monozygotic (i.e., genetically identical) and dizygotic twins. These studies indicate a moderate to high heritability for behavioral inhibition. For example, Eley et al. (2003) found high heritability for shyness and inhibition (h2 = .66 for females and h2 = .76 for males) in a twin study of preschoolers. Emde et al. (1992) found moderate heritability estimates for observed shyness (h2 = .49) and a behavioral inhibition paradigm (h2 = .56). Another study reported significant genetic, but not shared environmental, influences on parent-rated shyness in adolescent same-sex sibling-pairs (i.e., MZ twins, DZ twins, and full siblings raised in intact families; full siblings, half siblings and unrelated siblings raised in stepfamilies; Saudino et al. 1995).
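As a rough illustration of the logic of comparing monozygotic and dizygotic similarity, the sketch below applies Falconer's classical approximations to a pair of twin correlations. It is not the model-fitting approach used in the present study, and the function name and the two correlations (.60 and .35) are hypothetical values chosen only for illustration.

```python
def falconer_estimates(r_mz: float, r_dz: float) -> dict[str, float]:
    """Falconer's approximations from MZ and DZ twin correlations."""
    return {
        "a2": 2 * (r_mz - r_dz),  # additive genetic variance (heritability)
        "c2": 2 * r_dz - r_mz,    # shared environmental variance
        "e2": 1 - r_mz,           # nonshared environment plus measurement error
    }


# Hypothetical correlations, not values from the study.
estimates = falconer_estimates(r_mz=0.60, r_dz=0.35)
```

With these illustrative inputs the three components sum to one by construction. Model fitting, as used in the studies cited above, additionally provides standard errors and tests of fit that these simple formulas do not.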

Differences in heritability estimates of behavioral inhibition may be partially due to differences in the method of assessment across studies. Parental questionnaires are commonly used in studies of early temperament. Observational measures are used much less often, but provide valuable information in multi-method studies of temperament. Together, the measures may provide both corroborative and unique data for estimating heritability (Hewitt et al. 1992; Emde et al. 1992; Kagan 2001; Kagan and Saudino 2001). However, both parental questionnaires and observational data have methodological limitations that make interpretation of studies using these assessment methods difficult.

Parental questionnaires are cost- and time-efficient, and gather information from generalized perspectives of behaviors across situations. However, parental questionnaires may be affected by rater biases, parental phenotypes, and contrast effects. Rater bias is the tendency of a rater to over- or underestimate scores consistently (e.g., Neale and Cardon 1992). This tendency may reflect phenotypes of the parents, with parents having unique interpretations of questions regarding their children’s behaviors depending on their own experiences, personality traits, and other variables (e.g., van der Valk et al. 2001; Rothbart and Goldsmith 1985). Another potential methodological limitation of parent report is the contrast effect, which refers to the tendency for parents of dizygotic twins to overestimate differences between their twins relative to parents of monozygotic twins, and which leads to inflated genetic estimates in some cases (Saudino et al. 2000; Emde et al. 1992).

Observational measures allow unbiased individuals to assess temperament. Usually, two observers rate the individual twin behaviors independently, so rater biases do not contribute to the correlation between the ratings of the twins. However, observational measures only provide data for certain contexts. This can be problematic for a construct like behavioral inhibition, as individuals may exhibit high levels of inhibited responses in some situations (e.g., in a laboratory setting) but not others. Observational measures only capture a snapshot of behavior (e.g., what is observed during the specific protocol); reliability/repeatability of behaviors is an important issue to consider. Because observational measures are generally limited to small slices of an individual’s behavioral repertoire, they may reflect state, rather than trait, behaviors (Wachs 1992, cited in Sanson et al. 1996). Thus, observational measures are potentially susceptible to random error based on the generally short segments of observational data collected (e.g., Kendler and Baker 2007). Additionally, because raters are typically following a protocol and aiming for high inter-rater reliability, consistency between ratings may appear as a bias in ratings contributing to instrument variance (e.g., Cronbach et al. 1963). The protocol bias will appear as systematic variance in behaviors, especially when assessed across time, which would not be shared in common with parent ratings.

The limitations of assessment methods may have an impact on the estimate of heritability and environmental influences on phenotypes for many aspects of behavior. For example, a meta-analysis examining the putatively environmental measure life events found that parent reports yield higher heritability estimates than observational measures (Kendler and Baker 2007). Also, assessment method was a significant moderator of heritability in a meta-analysis examining antisocial behaviors (Rhee and Waldman 2002).

There have been mixed findings of low to moderate correlations between the two methods of assessing behavioral inhibition (e.g., Emde et al. 1992; Kagan 2001; Kagan and Saudino 2001). These correlations suggest that it may be possible to derive a more valid phenotype by examining both methods simultaneously. To evaluate this possibility, it is necessary to understand whether behavioral inhibition assessed via two different methods is merely correlated, or represents a common view of the phenotype. One way to assess the degree to which parent report and observational measures are assessing the same latent phenotypic construct is to fit theory-driven models to the data. Hewitt et al. (1992) proposed several steps in analyzing rater agreement in twin data. First, the biometric model (Fig. 1a) takes an agnostic approach to the data, and makes no assumptions regarding the source of the correlation between measured phenotypes. The psychometric model (Fig. 1b) assumes that a common phenotype is assessed by both raters, and reflects a trait of the individual. This model estimates the genetic and environmental influences affecting the common phenotype, as well as the genetic and environmental influences associated with each variable’s unique variance. The rater bias model (Fig. 1c) also assumes that a common phenotype is assessed by both raters while accounting for the tendency of individual raters to consistently over- or underestimate scores. In the rater bias model, genetic and environmental influences are estimated for the common phenotype, and the rater bias and residual variance are estimated for each variable’s unique variance. In the case of observations, rater biases do not contribute to the correlation between the ratings of the two twins, as two different individuals rate the two twins. Therefore, the “rater bias” is simply the source of covariance between the two twins over and above the covariance due to the latent common phenotype (e.g., protocol/instrument bias).

Fig. 1

(a) Biometric model. A1 genetic influences shared in common by observation and parent report, and A2 genetic influences specific to parent report. Correlation between Twin 1 genetic influences and Twin 2 genetic influences is 1.0 for MZ twins and .5 for DZ twins. a11 genetic influences on observations, a21 common genetic influences on parent reports, a22 unique genetic influences on parent reports. C1 shared environmental influences shared in common by observation and parent report, and C2 shared environmental influences specific to parent report. Correlation between Twin 1 shared environmental influences and Twin 2 shared environmental influences is 1.0 for MZ twins and DZ twins. c11 shared environmental influences on observations, c21 common shared environmental influences on parent reports, c22 unique shared environmental influences on parent reports. E1 nonshared environmental influences shared in common by observation and parent report, and E2 nonshared environmental influences specific to parent report. e11 nonshared environmental influences on observations, e21 common nonshared environmental influences on parent reports, e22 unique nonshared environmental influences on parent reports, and b contrast effects.
(b) Psychometric model. Correlation between Twin 1 genetic influences and Twin 2 genetic influences is 1.0 for MZ twins and .5 for DZ twins. Correlation between Twin 1 shared environmental influences and Twin 2 shared environmental influences is 1.0 for MZ twins and DZ twins. b contrast effects.
(c) Rater bias model. Correlation between Twin 1 genetic influences and Twin 2 genetic influences is 1.0 for MZ twins and .5 for DZ twins. Correlation between Twin 1 shared environmental influences and Twin 2 shared environmental influences is 1.0 for MZ twins and DZ twins. b contrast effects, f freed latent factor loading on parent report, B rater bias parameter, and R residual variance.

The present study investigated the magnitude of genetic and environmental influences on behavioral inhibition in toddlerhood utilizing a longitudinal twin study approach with observational and parent-reported questionnaire data. The first study aim was to assess whether these measures reflect a common view of the toddler’s behaviorally inhibited phenotype. The second aim was to assess the extent to which the common variance shared by parental and observational measures is influenced by genetic and environmental influences. We were also able to assess the extent to which the unique variance of each measure is influenced by genetic and environmental factors (in the psychometric model), or by rater bias (in the rater bias model).

Methods

Sample

Data were collected as part of the MacArthur Longitudinal Twin Study (MALTS). Same-sex twin pairs were recruited from the Colorado Twin Registry of twins born from 1984 to 1990 (Rhea et al. 2006). Twins were selected based on high birth weight (greater than 3.86 lb) and general good health. Zygosity was determined via a ten-item assessment of twin similarity completed by examiners working with the twins. For cases in which the examiners were unsure, a panel of nine or more highly polymorphic simple tandem repeat markers was used to confirm zygosity. For the present study, behavioral inhibition data were collected as part of a larger research protocol during home and lab visits (occurring within 2 weeks of each other) at 14, 20, 24, and 36 months. All analyses were conducted at each time point. Table 1 provides the sample sizes of dizygotic and monozygotic individuals within twin pairs with behavioral inhibition data at each time point.

Table 1 Sample size and means by sex across time points

Assessment methods

Observational measures

Observational measures of behavioral inhibition were collected during home and lab visits, which were videotaped and later rated by trained research assistants. The home and lab video recordings were rated by separate examiner assistants to ensure both twins were not assessed by the same individual.

During the home visit, the first 5 min of the examiners’ visit to the family home was recorded by one examiner while the other examiner interacted with the mother and twins. Examiners introduced themselves, presented two toys (one for each twin), and placed identifying vests on the twins. This segment was later rated to assess each toddler’s level of shyness to the examiners and inhibition to the research equipment. The outcome factor scores of the segment refer to the tendency for each toddler to approach novelty (e.g., the toy and/or the examiner; “Approach”) and the tendency for the toddler to cling to their mother (“Cling”). Each twin in the dyad was assigned to work with a separate examiner during the home visit. The toddlers were in separate rooms during most of the home procedures. Examiners assessed toddlers’ behaviors, including level of fearfulness, using the Infant Behavior Record (IBR; Matheny 1980) during the Bayley Scales of Infant Development (Bayley 1969; collected at 14, 20, 24 months) or during the Stanford–Binet (Terman and Merrill 1973; collected at 36 months) and over the course of all other home procedures. The “IBR Fear” item used in the present study was the average of the toddler’s “reaction to the new or strange; e.g., strangers, strange surroundings, test materials” on a nine-point scale across the two ratings (i.e., during the Bayley/Stanford–Binet and the aggregate across all other procedures).

During the lab visit, toddlers participated in a behavioral inhibition paradigm (based on research by Kagan and colleagues; e.g., Kagan et al. 1984), which was conducted in a playroom to assess toddler reactions to novel items and strangers. The measures included latency to leave parent upon entering the playroom, latency to approach toys, latency to approach stranger, latency to approach a novel object (e.g., monster mask), and time spent close to mother during the phases (i.e., toys, stranger, and novel object). The paradigm changed to a risk room setting at the 36-month visit, in which the latency for the toddler to approach different items (i.e., balance beam, box, mattress, and scary mask) was recorded. A composite score (“BI Task”) was created from the standardized scores of the variables. At the end of the lab visits at 14, 20, and 24 months, examiners rated the overall shyness of each twin during the lab assessments on a four-point ordinal scale (i.e., not shy to very shy; “Lab Visit Shyness”). This variable was not assessed during the 36-month lab visit.
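One common way to build a composite of this kind, sketched here under stated assumptions, is to convert each variable to z-scores and average them within child. The text states only that standardized scores were combined; the unweighted mean, the use of the population standard deviation, the function names, and the data values below are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def zscores(values):
    """Standardize one variable to mean 0 and (population) SD 1."""
    mean = fmean(values)
    sd = pstdev(values)
    return [(v - mean) / sd for v in values]


def composite(columns):
    """Average the z-scores across variables for each child (assumed unweighted)."""
    standardized = [zscores(col) for col in columns]
    return [fmean(child_scores) for child_scores in zip(*standardized)]


# Hypothetical data for four toddlers: two latencies and time near mother (seconds).
latency_toys = [5.0, 12.0, 30.0, 8.0]
latency_stranger = [20.0, 45.0, 90.0, 15.0]
time_near_mother = [10.0, 60.0, 120.0, 25.0]

bi_task = composite([latency_toys, latency_stranger, time_near_mother])
```

Because every input is centered before averaging, the composite itself has a mean of zero, and higher values indicate more inhibited behavior as long as all inputs are scored in the same direction.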

Parent report measures

Parental ratings of toddler behavioral inhibition were collected using questionnaires. Shyness was ascertained using an average of the mother and father ratings on the Colorado Childhood Temperament Inventory (“CCTI Shy,” CCTI; Rowe and Plomin 1977). Fearfulness was measured via mother ratings of fearfulness on the Differential Emotions Scale (“DES Fear,” DES; Izard et al. 1980); father ratings were not collected for this scale. Approach was assessed using the average of mother and father ratings on the Toddler Temperament Scale (“TTS Approach,” TTS; Carey and McDevitt 1978). When available, averages of parental ratings were used to reduce measurement error.

Statistical analysis

Data preparation

All variables included in the analyses were assessed for normality of their distributions. In cases where the skewness/kurtosis values were greater than 1.0, the variables were log transformed or square root transformed to achieve normal distributions. Variables violating the assumptions of normality even after transformation were ordinalized while ensuring adequate sample sizes in each category to avoid the issue of zero or small cells.
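The screening rule described above can be sketched as follows. The text does not say whether the 1.0 cutoff refers to absolute values or to excess kurtosis, nor how zero or negative scores were handled before transformation, so the choices below (absolute values, excess kurtosis, shifting by the minimum) and the example latencies are assumptions.

```python
import math
from statistics import fmean


def central_moment(xs, k):
    """k-th central moment (population form)."""
    mean = fmean(xs)
    return fmean([(x - mean) ** k for x in xs])


def skewness(xs):
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


def needs_transform(xs, cutoff=1.0):
    # Assumption: the cutoff applies to absolute skewness and excess kurtosis.
    return abs(skewness(xs)) > cutoff or abs(excess_kurtosis(xs)) > cutoff


def log_transform(xs):
    # Assumption: shift so the minimum maps to log(1) = 0.
    shift = 1.0 - min(xs)
    return [math.log(x + shift) for x in xs]


def sqrt_transform(xs):
    # Assumption: shift so the minimum maps to 0.
    low = min(xs)
    return [math.sqrt(x - low) for x in xs]


latencies = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]  # hypothetical, right-skewed
logged = log_transform(latencies)
```

For these illustrative latencies the raw scores exceed the cutoff and the log transform reduces the skew. A variable that still fails the check after transformation would, following the text, be recoded into ordered categories instead.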

Factor analysis

Factor analyses were conducted in Mplus (v6.1; Muthén and Muthén 1998–2010), which allows the simultaneous analysis of categorical and continuous variables. Exploratory factor analyses (EFA) and confirmatory factor analyses (CFA) were conducted at each time point to examine the underlying factor structure of the data. The cluster option was used when calculating standard errors and χ2 tests in Mplus to account for non-independence of the data within twin pairs (Muthén and Muthén 1998–2010). Given that the χ2 is sensitive to sample size, the comparative fit index (CFI; Bentler 1990), the Tucker–Lewis index (TLI; Bentler 1990), and the root mean square error of approximation (RMSEA; Browne and Cudeck 1993) were also examined. A CFI and TLI greater than .95 and RMSEA less than .06 indicate good model fit (Hu and Bentler 1998). EFA and CFA of all measures yielded two factors, a parent-report factor and an observational factor, across all ages. Based on these findings, factor scores computed by Mplus were used for the following analyses.
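For readers unfamiliar with these indices, the sketch below computes them from the χ2 of a fitted model and of a baseline (null) model using the textbook formulas. Software packages differ in details (for example, N versus N − 1 in the RMSEA denominator, and corrections for categorical or clustered data), so this is not a reproduction of the Mplus output; all input values are hypothetical.

```python
import math


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index from model and baseline chi-square values."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)
    denom = max(d_model, d_base)
    return 1.0 if denom == 0 else 1.0 - d_model / denom


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis index (not bounded to the 0-1 range)."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def rmsea(chi2_model, df_model, n):
    """RMSEA using N - 1 in the denominator (one common convention)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


# Hypothetical values for a two-factor CFA with eight indicators.
fit = {
    "CFI": cfi(25.0, 19, 600.0, 28),
    "TLI": tli(25.0, 19, 600.0, 28),
    "RMSEA": rmsea(25.0, 19, n=400),
}
good_fit = fit["CFI"] > 0.95 and fit["TLI"] > 0.95 and fit["RMSEA"] < 0.06
```

The final line applies the cutoffs quoted above to the three hypothetical indices.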

Model fitting

Model fitting analyses were conducted in Mx (Neale et al. 2003) using the factor scores from the CFA described above. Saturated models, which estimate the means, variances, and covariances without constraints, were fit to the data at each time point to estimate the fit of the other models tested. Null models, which estimate the means and variances only, were fit to the data at each time point to allow the estimation of TLI and RMSEA for each estimated model (Neale et al. 2003; null model results available upon request). To determine whether observation and parent reports are measuring the same underlying behavioral inhibition phenotype, we fit the biometric, psychometric, and rater bias models (Hewitt et al. 1992; Neale and Maes 2004) to the data. We also assessed the degree to which the latent phenotype underlying the parent report and observation measures is influenced by genetic and environmental influences, and whether the unique variance of each method is better explained by genetic/environmental influences (psychometric model) or rater bias and residual variance (rater bias model). Figure 1 illustrates each model.

All models in the present study follow the basic twin analysis assumption for estimating genetic, shared environmental, and unique environmental influences (with the exception of the saturated models). The correlations between genetic influences on the twins are set to be equal to 1.0 for monozygotic twins and 0.5 for dizygotic twins because monozygotic twins are genetically identical, whereas dizygotic twins share an average of 50 % of their alleles identical by descent. The correlations between shared environmental influences are set to 1.0 for both monozygotic and dizygotic twins, following the equal environments assumption (e.g., Kendler et al. 1993; Eaves et al. 2003; Derks et al. 2006), which postulates that twins are raised in similar environments, regardless of the degree to which they share genes. Unique environmental influences do not correlate between the twins in a pair because these influences are unique to each individual (e.g., measurement error, unique environmental stressors). Additionally, an estimate of contrast effects for parent reports was included in each model because it has been demonstrated that parents of dizygotic twins tend to rate their twins as less similar (thus leading to larger variance) than parents of monozygotic twins on measures of temperament (e.g., Saudino et al. 2000).
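For a single measure, and before any contrast effect is added, these assumptions imply the following expected twin-pair covariance matrices. This is the standard univariate form, shown only to make the constraints concrete; here a, c, and e denote unsquared path coefficients, so that a^2, c^2, and e^2 are the variance components.

```latex
\Sigma_{MZ} =
\begin{pmatrix}
a^2 + c^2 + e^2 & a^2 + c^2 \\
a^2 + c^2 & a^2 + c^2 + e^2
\end{pmatrix},
\qquad
\Sigma_{DZ} =
\begin{pmatrix}
a^2 + c^2 + e^2 & \tfrac{1}{2}a^2 + c^2 \\
\tfrac{1}{2}a^2 + c^2 & a^2 + c^2 + e^2
\end{pmatrix}
```

The only difference between the two matrices is the coefficient on a^2 in the cross-twin covariance, which is what allows genetic and shared environmental influences to be separated.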

The biometric model (Fig. 1a) estimates the common genetic (a2), shared environmental (c2), and nonshared environmental (e2) influences on the observations and decomposes the variance of parent reports between those that are shared in common with observations and those that are unique to parent reports. The a11 path refers to the genetic influences on observations, the a21 path refers to the common genetic influences on parent reports, and the a22 path refers to the unique genetic influences on parent reports. By squaring the standardized parameter estimates for each path, an estimate of heritability (a2) is obtained. The remaining estimates (c2 and e2) follow the same pattern.
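The arithmetic of squaring and standardizing the paths can be sketched as below. The path values are hypothetical, the contrast effect is ignored for simplicity, and a11 is assumed to be positive when computing the genetic correlation; none of these numbers come from Table 3.

```python
import math


def biometric_proportions(p):
    """Proportion of variance explained by each squared path in a bivariate Cholesky."""
    obs_paths = ("a11", "c11", "e11")
    parent_paths = ("a21", "a22", "c21", "c22", "e21", "e22")
    var_obs = sum(p[name] ** 2 for name in obs_paths)
    var_parent = sum(p[name] ** 2 for name in parent_paths)
    obs = {name: p[name] ** 2 / var_obs for name in obs_paths}
    parent = {name: p[name] ** 2 / var_parent for name in parent_paths}
    return obs, parent


def genetic_correlation(p):
    """Correlation between genetic influences on the two measures (assumes a11 > 0)."""
    return p["a21"] / math.sqrt(p["a21"] ** 2 + p["a22"] ** 2)


# Hypothetical unstandardized path estimates.
paths = dict(a11=0.6, a21=0.4, a22=0.5,
             c11=0.3, c21=0.1, c22=0.3,
             e11=0.7, e21=0.1, e22=0.5)

obs_props, parent_props = biometric_proportions(paths)
heritability_obs = obs_props["a11"]
heritability_parent = parent_props["a21"] + parent_props["a22"]
r_g = genetic_correlation(paths)
```

In this parameterization the heritability of parent report is the sum of the common (a21) and unique (a22) genetic proportions, while the heritability of observations comes from a11 alone.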

The psychometric model (Fig. 1b) takes a latent variable approach to fitting the data. It is assumed that there is a latent variable of behavioral inhibition explaining the covariance between observation and parent report. The a2, c2, and e2 are estimated for the latent variable. The latent variable factor loadings are fixed to 1 in order for the model to be identified (Hewitt et al. 1992). Separate a2, c2, and e2 are estimated for the unique variance of observations and parent reports.

The rater bias model (Fig. 1c; Hewitt et al. 1992) is similar to the psychometric model, in that there is an underlying latent common phenotype. However, the latent variable loading on observational measures was fixed to 1 and the loading on parent report was allowed to be free. One loading had to be fixed to 1 for the model to be identified (Hewitt et al. 1992). Additionally, the unique variance of observation and parent report measures is assessed for rater bias (i.e., the tendency for individuals to consistently rate behavior higher/lower compared to other raters) and nonshared environmental influences.

As noted above, the parameterization of the latent factor differs between the psychometric and rater bias models. In the rater bias model, the loading on parent report was allowed to be free whereas in the psychometric model, both loadings were fixed to one. As Hewitt et al. (1992) noted, fixing the factor loadings to be one in the psychometric model allows the model to be identified and unique parameterization of the variances of each measure; otherwise, shared environmental influences in the psychometric model would be equal to the bias parameters in the rater bias model and nonshared environmental influences in the psychometric model would be equal to the residual variance parameter in the rater bias model.

Model fit comparisons were conducted with a χ2 difference test of the −2 log likelihood (−2ll) scores when the models being compared were nested (i.e., the comparisons between the saturated model and biometric, psychometric, and rater bias models, and the comparison between the biometric and rater bias models; Hewitt et al. (1992) provide substantive information regarding model comparisons). In comparing the fit of the biometric and psychometric models, which are not nested and have the same degrees of freedom due to the latent factor loading constraint in the psychometric model, the −2ll and Akaike’s information criterion (AIC) values were compared, with lower −2ll and AIC values indicating better fit. To determine the overall fit of each model, saturated models (which estimate the means, variances, and covariances without constraints) were fit at each time point, and a χ2 difference test was calculated between the saturated model and estimated models. Fit indices (i.e., RMSEA and TLI) were estimated using information from the saturated, null, and estimated models (Neale et al. 2003). To assess the statistical significance of parameter estimates, individual parameter paths were dropped and a one degree of freedom χ2 difference test was computed to determine if dropping the parameter resulted in a significant decrement in fit. When a parameter could not be dropped from a model (i.e., specific nonshared environmental influences, the variance unique to each twin, include measurement error, and are thus necessary in the biometric models), 95 % confidence intervals were calculated to determine statistical significance.
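The two kinds of comparison described above can be sketched in a few lines. The χ2 tail probability is computed from closed-form series for integer degrees of freedom so that no external library is needed; AIC is taken as −2ll + 2k, which is an assumption about the form used (other forms differ only by a constant across models fit to the same data). All −2ll values and parameter counts are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total


def chi2_difference(m2ll_restricted, k_restricted, m2ll_full, k_full):
    """Likelihood-ratio test for nested models; k is the number of free parameters."""
    delta = m2ll_restricted - m2ll_full
    ddf = k_full - k_restricted
    return delta, ddf, chi2_sf(delta, ddf)


def aic(m2ll, k):
    # Assumption: AIC = -2ll + 2k.
    return m2ll + 2 * k


# Hypothetical example: dropping one path from a model with 20 free parameters.
delta, ddf, p_value = chi2_difference(4005.2, 19, 4000.0, 20)
prefer_full_by_aic = aic(4000.0, 20) < aic(4005.2, 19)
```

In this illustration the one degree of freedom test gives p of roughly .02, so dropping the path would be judged a significant decrement in fit, and AIC points the same way.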

Results

Factor analysis

Eight variables were included in the EFA of the behavioral inhibition data. Initial EFA results suggested that there were two factors underlying the behavioral inhibition measures, given that there were two eigenvalues greater than one, and the two-factor model fit the data well at each age (results available upon request). The five observational measures significantly loaded on the first factor and the three parental report measures significantly loaded on the second factor.

The factor structures suggested by the EFA were then tested using CFA. Figure 2 presents the CFA results. At each age, the two-factor model suggested by the EFA fit the data well according to model fit statistics (as described above). At each age, all factor loadings were significant at p < .01, and the correlations between the factors were significant at p < .01. Based on these findings, factor scores computed by Mplus were used for the following analyses.

Fig. 2

CFA results. *Significant p value of <.05; **significant p value of <.01. IBR Fear Infant Behavior Record fearfulness ratings, Approach toddler’s tendency to approach novel items/strangers during home visit, Cling toddler’s tendency to cling to mother during home visit, Shyness rating shyness during lab visit, BI task behavioral inhibition task, TTS Approach Toddler Temperament Scale approach rating, CCTI Shyness Colorado Childhood Temperament Inventory shyness rating, and DES Fear Differential Emotions Scale fearfulness rating

Descriptive statistics and correlations

Table 1 provides descriptive statistics (means and standard deviations) for boys and girls by zygosity groups (MZ and DZ) at each time point. Regressions were conducted to determine whether the behavioral inhibition scores were significantly different between gender groups, zygosity groups, and whether there was an interaction between gender and zygosity (Table 1). Generally, girls had higher behavioral inhibition scores than boys, and there appeared to be an interaction between zygosity and gender at some ages, with monozygotic females generally having higher behavioral inhibition ratings than dizygotic females.

Table 2 provides the phenotypic, cross-twin within-trait, and cross-twin cross-trait correlations. The phenotypic correlations were fixed to be equal across zygosity because of the assumption that phenotypic correlations will not vary as a function of zygosity, but were free to vary across sex. The phenotypic correlations indicated that observational and parent report measures were significantly correlated within individuals. The cross-twin within-trait correlations provide evidence for genetic influences on behavioral inhibition because the monozygotic twin correlations were consistently larger than the dizygotic twin correlations. The cross-twin within-trait correlations also provided evidence of contrast effects in the parental ratings (Neale and Maes 2004), as the ratings from parents of dizygotic twins were weakly and/or negatively correlated, but the observer ratings of dizygotic twins were statistically significant and positive. Further, MZ and DZ variances could be equated for observation measures, but not parent report, for which DZ variance was larger than MZ variance, a predicted consequence of contrast effects or sibling competition (Neale and Cardon 1992). Because of these findings, a contrast effect for parent report was modeled in the following analyses.

Table 2 Phenotypic, cross-twin within-trait, and cross-twin cross-trait correlations

Model fitting

Sex differences

A sex difference model was tested in Mplus at each time point in models estimating the phenotypic, cross-twin within-trait, and cross-twin cross-trait correlations to determine whether there were significant sex differences. A model allowing the correlations to vary across sex provided a better fit to the data only at 36 months (Δχ2 (7) = 14.36, p = .05; data available upon request). Cross-twin, cross-trait correlations were larger in the female group at 36 months. Also, we compared the fit of the model where the parameter estimates were constrained to be equal across sex and where the parameter estimates were free to vary across sex when testing the biometric, psychometric, and rater bias models in Mx. Similarly, there was only evidence of sex differences in the parameter estimates at 36 months (all Δχ2 p values <.01; data available upon request). Based on these results, sex differences were described at 36 months only.

Contrast effect estimate

As noted above, the negative DZ correlations and the greater DZ variances than MZ variances suggested that the parent report measures were affected by contrast effects (Neale and Maes 2004). Therefore, a contrast effect was included for parent reports in each model tested in Mx (Fig. 1). For all models (i.e., biometric, psychometric, and rater bias) at each time point, the contrast parameter was negative, indicating that a higher inhibition rating for one twin by the parent was contributing to a lower rating for the second twin and vice versa (14 months: b = −.29 to −.40; 20 months: b = −.42 to −.45; 24 months: b = −.22 to −.25; 36 months: b = −.17 to −.48).
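To see why a negative contrast parameter produces both signatures at once, the sketch below applies the standard sibling-interaction formulation, in which each twin's rating is shifted by b times the co-twin's rating, to a single measure. The variance components and b = −.30 are hypothetical, and the exact specification fitted in Mx (where the contrast applies to parent report within bivariate models) may differ from this univariate simplification.

```python
def observed_twin_moments(variance, covariance, b):
    """Observed variance and cross-twin covariance after reciprocal contrast b.

    Derived from (I - B)^-1 * Sigma * (I - B)^-T with B = [[0, b], [b, 0]].
    """
    scale = (1 - b * b) ** 2
    var_obs = (variance * (1 + b * b) + 2 * b * covariance) / scale
    cov_obs = (covariance * (1 + b * b) + 2 * b * variance) / scale
    return var_obs, cov_obs


# Hypothetical standardized components and contrast parameter.
a2, c2, e2, b = 0.6, 0.1, 0.3, -0.30
total = a2 + c2 + e2

var_mz, cov_mz = observed_twin_moments(total, a2 + c2, b)
var_dz, cov_dz = observed_twin_moments(total, 0.5 * a2 + c2, b)

r_mz = cov_mz / var_mz
r_dz = cov_dz / var_dz
```

With these illustrative inputs the DZ variance exceeds the MZ variance and the DZ correlation becomes negative while the MZ correlation stays positive, which is the qualitative pattern reported for parent ratings in Table 2.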

Biometric model

The standardized variance components (i.e., proportion of variance explained) from the biometric model are presented in Table 3. The biometric model provided evidence of common genetic influences between parent report and observational measures at each time point. The parameter estimates for observational measures suggest small to moderate genetic and environmental influences on the behavioral inhibition phenotype. The parent report variance was decomposed into that shared in common with observational measures (A1, C1, and E1) and that unique to parent report (A2, C2, and E2). Genetic influences shared in common by parent report and observation measures (A1) had moderate influences on parent report, whereas the C1 and E1 estimates indicate little to no common environmental influences between parent report and observational measures. A2, which influences only parent report, indicated moderate to substantial genetic effects; C2 and E2 indicated small to moderate environmental influences on parent report.

Table 3 Proportion of variance explained by the parameter estimates from the biometric model

Psychometric model

The standardized variance components from the psychometric model are presented in Table 4. The results from the psychometric model suggest that there are moderate to substantial genetic influences and small to moderate environmental influences on the latent behavioral inhibition factor underlying observations and parent reports. At 36 months, there were substantial genetic influences on the latent behavioral inhibition factor and no shared environmental influences in males, in contrast to moderate genetic and shared environmental influences in females.

Table 4 Standardized parameter estimates from the psychometric model

The latent phenotype accounted for a moderate to substantial proportion of the observational measure variance. Unique nonshared environmental influences accounted for a moderate proportion of the remaining variance of observational measures, whereas little to no variance was explained by unique genetic or shared environmental influences. The latent phenotype accounted for a moderate proportion of the parent report variance. There were modest to moderate unique genetic and environmental influences on parent report. At 36 months, there were moderate unique genetic and no unique shared environmental influences on parent report in males, whereas there were modest unique genetic and moderate unique shared environmental influences on parent report in females.

Rater bias model

Standardized variance components (i.e., proportion of variance explained) from the rater bias model are provided in Table 5. The rater bias model indicated that genetic influences on the latent common phenotype were moderate to substantial. Shared environmental influences accounted for a moderate proportion of the variance of the latent variable at 20 months, and a small proportion of the variance in females at 36 months.

Table 5 Standardized parameter estimates from the rater bias model

Observational measures were moderately influenced by the latent variable, rater bias, and residual variance. Parental report measures had a substantial proportion of variance explained by the latent variable, a small proportion of variance explained by the residual variance, and little to no variance explained by rater bias.

Model fit comparisons

Table 6 provides the model fit statistics. No model fit significantly worse than the saturated model, and all fit indices indicate that the estimated models fit the data well (i.e., RMSEA smaller than .05 and TLI greater than .95). The psychometric and biometric models have the same degrees of freedom, so a χ2 difference test could not be calculated. At 14, 20, and 36 months, the rater bias model did not fit significantly worse than the biometric model. At 20, 24, and 36 months, the fit of the psychometric model was almost identical to that of the biometric model. Generally, these findings indicate that a behavioral inhibition latent construct fits the data well and support a common phenotype approach.

Table 6 Model fit statistics/comparisons
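The comparison logic used in this section can be sketched as follows. This is a minimal illustration, not the Mx procedure: the class and function names are hypothetical, every fit value is invented for the example, and AIC is computed in its textbook form (−2LL + 2k), which may differ from the form reported by the software. The sketch shows why two models with equal degrees of freedom cannot be compared with a χ2 difference test and must be compared on −2LL or AIC instead.

```python
from dataclasses import dataclass


@dataclass
class ModelFit:
    name: str
    minus2ll: float  # -2 log-likelihood (invented values below)
    df: int          # degrees of freedom
    n_params: int    # number of free parameters


def aic(fit: ModelFit) -> float:
    """Textbook AIC: -2LL + 2k (an assumption about the form used)."""
    return fit.minus2ll + 2 * fit.n_params


def compare(restricted: ModelFit, general: ModelFit) -> dict:
    """Fit differences between a more restricted and a more general model."""
    delta_df = restricted.df - general.df
    return {
        "delta_chi2": restricted.minus2ll - general.minus2ll,
        "delta_df": delta_df,
        # With zero df difference there is no reference distribution,
        # so a chi-square difference test cannot be carried out.
        "lrt_possible": delta_df > 0,
        "delta_aic": aic(restricted) - aic(general),
    }


# Invented fit statistics, for illustration only.
biometric = ModelFit("biometric", minus2ll=4000.0, df=1500, n_params=10)
psychometric = ModelFit("psychometric", minus2ll=4000.6, df=1500, n_params=10)
rater_bias = ModelFit("rater bias", minus2ll=4003.1, df=1503, n_params=7)

for model in (psychometric, rater_bias):
    print(model.name, compare(model, biometric))
```

When `lrt_possible` is true, `delta_chi2` would be referred to a χ2 distribution with `delta_df` degrees of freedom; when it is false, only the descriptive differences in −2LL and AIC are available, with smaller values indicating better fit.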

Discussion

The present study was conducted to assess the extent to which observational and parent report measures assess a common behavioral inhibition phenotype, and the degree to which the phenotype is influenced by genetic and environmental factors. The aims were addressed by fitting theory-driven models to the data (Hewitt et al. 1992). We found that the covariance between observational and parent report measures was significant, and either rater bias or psychometric models fit the data well at each age. Overall, these results provide support for a latent common phenotype underlying observations and parent report at each age. Additionally, all analyses indicated moderate to substantial genetic influences on the common latent phenotype of behavioral inhibition. These findings are consistent with earlier reports on the heritability of shyness and other inhibited behaviors (e.g., Emde et al. 1992; Eley et al. 2003). The results also suggest that the common phenotype underlying both parent report and observations is more heritable than the unique variance of each measure, which is also influenced by measurement error.

The proportion of variance of each method explained by the behavioral inhibition latent variable was reversed in size between the psychometric and rater bias models. When interpreting this result, it is important to note that the model constraints affected the individual parameters within each model. In the psychometric model, the latent variable factor loadings are both fixed to one, whereas in the rater bias model, the latent variable factor loading is freed for parent report and fixed to one for observations. Thus, the parameter estimates cannot be compared directly across models.

Additionally, it is important to note that the heritability of the latent variable is also not comparable across the psychometric and rater bias models because there is only one source of genetic influences in the rater bias model, whereas there are genetic influences on both the latent variable and method-specific variances in the psychometric model. It is possible that the limited genetic/environmental structure of the rater bias model influenced the results, with the latent variable having a greater influence on the parent report than on the observations.
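The effect of the loading constraint on the proportion of variance attributed to the latent variable can be illustrated with a small sketch. It assumes a generic one-factor decomposition in which a measure's variance is the squared loading times the latent variance plus everything else (method-specific, rater bias, and residual variance pooled). The function name and all numeric values are hypothetical and are not estimates from either model.

```python
def latent_share(loading: float, latent_var: float, other_var: float) -> float:
    """Proportion of a measure's variance explained by the latent factor.

    other_var pools all non-latent variance (method-specific, rater bias,
    residual); this pooling is a simplification made for illustration.
    """
    explained = loading ** 2 * latent_var
    return explained / (explained + other_var)


# Same hypothetical variances, different treatment of the loading.
fixed = latent_share(loading=1.0, latent_var=0.5, other_var=0.5)
freed = latent_share(loading=1.6, latent_var=0.5, other_var=0.5)

print(f"loading fixed to 1: {fixed:.2f}")
print(f"loading freed:      {freed:.2f}")
```

Because the share depends on the loading as well as on the variances, a measure can appear more or less saturated by the latent variable simply because its loading is estimated in one model and fixed in another, which is why the estimates are not directly comparable across models.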

In the present study, we found that girls have a higher level of behavioral inhibition than boys, with this gender difference being significant at most ages. The differences observed in the present study (see beta coefficients for gender, Table 1) are consistent with the conclusion of a meta-analysis indicating that girls have slightly higher rates of shyness than boys (weighted mean effect size d = −0.10; Else-Quest et al. 2006). We also found that the magnitude of genetic and environmental influences could be constrained across gender from 14 to 24 months, but significant sex differences in the parameter estimates emerged at 36 months, with results suggesting greater shared environmental influences on behavioral inhibition in girls than in boys. One possible explanation for the first appearance of sex differences in the parameters at 36 months is the emergence of differential social rules, in which males face higher social consequences (e.g., peer exclusion, negative parent response) for inhibited behaviors compared to females (e.g., Rubin and Coplan 2004; Karevold et al. 2011). For example, one review found that rates and types of problematic behaviors diverge between the sexes around age 4, and the authors posit that this may be due to socialization factors and the development of adaptive skills (Keenan and Shaw 1997). However, it is also possible that parents and/or examiners have different thresholds for rating inhibited behaviors in boys versus girls, although it is unclear why that difference would emerge at 36 months. Another possibility is that we did not have the statistical power to detect small sex differences from 14 to 24 months.

Twin studies examining behavioral inhibition have reported somewhat mixed findings regarding gender effects on the heritability of behavioral inhibition. For example, a study that examined parent reported shyness in toddlers (in the same sample examined here) reported no significant decrement in the fit of a model constraining parameters to be equal across gender compared to that allowing parameters to be free across gender (Rhee et al. 2007). Eley et al. (2003) found significant sex differences in a twin study of shyness/inhibition in 4-year-olds, with results indicating that the magnitudes of additive genetic effects and contrast effects were greater in males, whereas the magnitude of nonshared environmental influences was larger in females. It is possible that differences in the measurement of behavioral inhibition and in the age at which behavioral inhibition was assessed may have led to inconsistent results in the literature.

One of the strengths of the present study was the availability of several observational and parental report measures. Creating factor scores from a variety of measures likely reduced the measurement error inherent in any single measure. Additionally, the availability of longitudinal data allowed us to examine the variables across different time points in toddlerhood. Overall, the results were consistent across time points, which provides greater confidence in our findings. However, a limitation of the study was that the assessment of some of the variables changed over time, which limited our ability to examine these data longitudinally. Additionally, the use of the same sample over time may contribute to consistency within the study, and these results need to be replicated with other study samples. Also, the fit of some models could not be compared statistically because they were not nested (i.e., psychometric and biometric models); however, the examination of −2LL and AIC values allowed us to examine whether the psychometric model fit as well as the biometric model (i.e., smaller values indicate better fit). Another potential limitation is that the sample size may be too small to detect small effects, especially in complex multivariate models.

In sum, the present study found that observational and parental report measures of toddler behavioral inhibition share significant variance and reflect a common view of toddler behavioral inhibition (Aim 1). Additionally, there is a moderately to substantially heritable latent common phenotype underlying observational and parent report measures of behavioral inhibition (Aim 2). By combining across the two assessment methods, we can derive a more valid phenotype. By examining the more valid latent common phenotype, studies examining behavioral inhibition as a predictor of later outcomes or those examining the genetics of behavioral inhibition may increase their power while being less affected by the methodological limitations inherent in observations or parent reports.