Introduction

As the first longitudinal twin study of children in the modern era, the LTS was often a leader in raising questions about the nature of development and in creating methods for addressing those questions. In his pivotal paper on “developmental synchronies,” Wilson (1983) focused on cognitive development and tested the hypothesis that the “distinctive developmental gradients should unfold in synchrony” for monozygotic twins (p. 299). He found that even though there were extensive individual differences in patterns of cognitive development, starting at age 12 months MZ twins demonstrated greater concordance than DZ twins in “both elevation and patterning of scores” over age (p. 305). Since then, behavior genetic analyses of cognitive development in childhood have primarily focused on the magnitude and continuity of genetic and environmental effects on cognitive development (Logan et al. 2013; Petrill et al. 2010; Wadsworth et al. 2014). Few analyses of cognitive development in childhood have focused on twin similarity for the developmental trajectories that were the target of Wilson’s approach. In the current analyses, we attempt to return to Wilson’s original purpose by applying biometric latent growth curve models to twin data on cognitive development available from the LTS.

In his interpretation of the LTS data, Wilson (1983) referenced Waddington’s theories of genetic canalization (Tucker-Drob and Briley 2014; Waddington 1942, 1971) to conclude that a pre-programmed template that emerges over childhood guides cognitive development and as a result development is generally buffered against environmental influences. To more directly address the issue of genetic and environmental continuity versus innovations during cognitive development, Eaves et al. (1986) developed the simplex model and applied it to published LTS data covering ages 3 months through 15 years. They concluded that the demonstrated increase in heritability of cognitive function throughout childhood results not from any genetic innovations at later ages, but from a single set of common genes that is active from birth. Persistence and accumulation of these genetic effects result in rapid increases in heritability of cognition during childhood. Evidence for both continuous and age-specific shared family environmental effects on cognitive development was reported. In the same issue of the journal, McArdle (1986) presented the latent growth curve method for examining twin similarity in growth, also based on LTS data. From ages 6–24 months, McArdle found evidence for a linear change trajectory for general cognition arising from common familial sources of variance. Subsequent analyses have focused on applications of simplex and Cholesky models to cognitive development and have produced mixed results, with reports of both a single common genetic factor contributing to cognitive development across childhood (Bartels et al. 2002; Bishop et al. 2003) and evidence for genetic innovations at 2 and 7 years (Cardon et al. 1992; Casto et al. 1995; Cherny et al. 1994; Fulker et al. 1993). Similarly mixed results were reported for shared environmental influences. Recent reviews have reported evidence for both genetically mediated stability (Wadsworth et al. 2014) and instability (Tucker-Drob and Briley 2014) in cognition during early development.

As in Wilson’s original analysis, most twin studies of cognitive development focus on standardized scores: full scale IQ, performance IQ, and verbal IQ. Standardized scores provide a direct measure of whether the child is ahead of or behind her age peers, but primarily standardized scores allow researchers to avoid measurement issues when combining data across tests. In adulthood, the same measures of cognitive functioning can be used from age 16 years to the end of the lifespan (barring dysfunction). In childhood, however, qualitative changes in development coincide, necessarily but unfortunately, with qualitative changes in instrumentation (McArdle et al. 2009). Combining standardized data across different testing instruments may introduce sources of nonrandom error and skew results toward the appearance of discontinuity (Finkel and Davis 2009). Conversely, analysis of non-standardized individual subtests of the cognitive scales allows for the examination of both (a) domain differences in genetic and environmental influences and (b) true growth trajectories. Genetic and environmental factors that impact specific cognitive abilities may be more specific themselves and may differ qualitatively from factors that influence broad cognitive constructs (Reeves and Bonaccio 2011; Tucker-Drob and Briley 2014). In fact, investigations of general versus trait-specific genetic influences on measures of specific cognitive abilities conclude that heritability of cognition in childhood reflects both general genetic influences and distinct genetic influences specific to separate abilities (Luo et al. 1994; Petrill et al. 1996).

In the current analysis, then, we focus on individual tests of the Wechsler measures of cognitive ability in childhood to support investigation of genetic and environmental influences on latent growth curves and possible differences in those influences across domains. Cross-sectional studies of the Wechsler Intelligence Scale for Children-Revised (WISC-R; Wechsler 1974) indicate quite different heritability estimates across tests, with generally higher heritability for verbal subtests than for performance subtests (LaBuda et al. 1987; Luo et al. 1994; Petrill et al. 1996; Segal 1985). To date, the only reported growth curve analyses of cognition including twin children have focused on particular skills (such as components of reading) that can be assessed with identical instrumentation (Logan et al. 2013; Petrill et al. 2010) or on latent constructs (McArdle 1986; Tucker-Drob and Briley 2014). Estimates of heritability of rates of growth varied across measures of reading ability from .00 to .58 (Logan et al. 2013; Petrill et al. 2010). In addition, sources of genetic influences on growth in reading ability were mixed across tests, with evidence for genetic influences on growth rates both distinct from (i.e., genetic innovation) and indistinguishable from (i.e., genetic continuity) genetic influences present at the initial level of performance. A recent meta-analysis of 15 longitudinal twin and adoption studies across the lifespan found evidence for discontinuity in genetic factors influencing general cognitive ability during childhood, although genetic stability increases quickly and plateaus at a high level (.80) around age 10 years (Tucker-Drob and Briley 2014). Given these findings, we expect to find differences across cognitive subtests in rates of growth, genetic influences on growth, and evidence for genetic continuity versus innovation.

Method

Participants

Twins enrolled in the LTS were recruited from families residing in the metropolitan Louisville, KY area at the time of the twins’ birth. The LTS sample was a collection of families who represented the full range of socioeconomic status, race, and ethnic diversity within the Louisville metropolitan area. Approximately 80 % of the participants were European–American, 18 % were African–American, and the remaining 2 % were of mixed or Asian ancestry. Occupations of heads of households, converted to Duncan’s scores for socioeconomic status (Duncan 1961), represented the entire distribution of social class, with the average score on the 100-point scale falling between 40 and 50 (score range typical for middle-level clerical workers). Mean gestational age of the LTS sample was 37 weeks, which is just below the population mean of 40 weeks for single-born neonates. Special efforts were made to retain recruited families in the study, and less than 10 % of the sample withdrew from the longitudinal study during the first 3 years.

Data from cognitive testing at ages 4, 5, 6, 7, 8, 9, 12, and 15 years were used in the current analyses; sample characteristics are presented in Table 1. Total N indicates the total number of individuals at each age and the number of twin pairs indicates the number of complete monozygotic and dizygotic pairs at each age. Twins were typically tested within a week of their birthday; note that actual age at measurement was used in the subsequent analyses. Sample size is smaller at age 4 because three different cognitive tests were used at that wave over the course of the LTS; the current analyses included only twins who received the Wechsler tests at age 4. Only a subset of the sample was invited to participate in the age 12 visit; therefore, the sample is reduced at that wave as well. Sixty-two percent of the sample contributed data from 4 or more waves. Zygosity was determined by blood sera analysis made when the twins were 3 years old or older as part of the LTS protocol.

Table 1 Sample characteristics

Measures

The twins were administered the age-appropriate Wechsler scales of cognitive ability individually by separate examiners at each visit to the study center. The testing schedule was arranged so that examiners did not test the same twin on successive visits. The Wechsler Preschool and Primary Scale of Intelligence (WPPSI; Wechsler 1967) was used to assess cognitive functioning at ages 4, 5, and 6 years; 78 % of the sample had WPPSI data from at least 2 waves. Between ages 7 and 15 years, cognitive ability was measured using the WISC-R; 51 % of the sample had WISC-R data from at least 3 waves. Analyses focused on 7 subtests common to both the WPPSI and the WISC-R: items differed across the WPPSI and the WISC-R to make the tests age appropriate, but the same types of items were included in these 7 subtests on both the WPPSI and the WISC-R. Information, Vocabulary, Similarities, and Comprehension are typically considered measures of verbal comprehension. Block Design and Picture Completion tap perceptual organization. The only test of freedom from distractibility included here was Arithmetic. Scores on all tests were converted to percent correct.

Statistical Method

Age-based biometric latent growth curve models (LGCM) were used to examine genetic and environmental contributions to development of cognitive ability (Neale and McArdle 2000; Reynolds et al. 2005). The LGCM provides estimation of fixed effects, i.e., fixed population parameters as estimated by the average growth model of the entire sample, and random effects, i.e., individual variation in growth model parameters. With only 3 waves of data for the WPPSI scales, a linear growth model was used with centering at age 5. Because of the expected deceleration in growth in cognitive functioning in later childhood (Tucker-Drob and Briley 2014), a two-slope LGCM was used for the WISC-R scales: one slope for early childhood (age <10) and a separate slope for later childhood (age >10). Although a quadratic model estimates decelerating growth across the entire age range, the two-slope model allowed for distinct modeling of age trends in early and later childhood. Model-fitting indicated that centering at age 10 years provided the best fit to the data, a conclusion supported by McArdle et al. (2009). Thus, the intercept of the two-slope model is evaluated at the inflection point: age 10. The two-slope version of the biometric latent growth curve model is presented in Fig. 1. Observed data are indicated by y0 through y4. The group mean intercept (Mi) and slopes (Ms1 and Ms2) are estimated, and residual variances (u0 through u4) are set equal across waves. The paths from the latent slope factors to the observed scores are the age basis coefficients, B1(t) and B2(t). The age basis serves as a marker for the age of the subject at each time of measurement, adjusted for the centering age.
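The role of the age basis can be illustrated with a minimal sketch. The coding below is an assumption, since the text does not give the exact values of B1(t) and B2(t): it uses a common piecewise-linear coding in which B1 counts years before the centering age and B2 counts years after it, so that both are zero at age 10. The function names and the parameter values are hypothetical and are not estimates from the LTS data.

```python
CENTER_AGE = 10.0  # centering age reported in the text


def age_basis(age, center=CENTER_AGE):
    """Return (B1, B2) for one measurement occasion.

    Assumed coding: B1 counts years before the centering age (negative or
    zero) and B2 counts years after it (zero or positive).
    """
    offset = age - center
    return min(offset, 0.0), max(offset, 0.0)


def expected_score(age, intercept, slope1, slope2, center=CENTER_AGE):
    """Model-implied mean score: Mi + Ms1 * B1(t) + Ms2 * B2(t)."""
    b1, b2 = age_basis(age, center)
    return intercept + slope1 * b1 + slope2 * b2


# Hypothetical parameter values, chosen only for illustration.
MI, MS1, MS2 = 60.0, 5.0, 2.0
for age in (7.0, 10.0, 15.0):
    print(age, age_basis(age), expected_score(age, MI, MS1, MS2))
```

Because actual age at measurement was used, the basis would be computed from each child's exact age rather than from the nominal wave age; under this coding the intercept is the expected score at age 10.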

Fig. 1

Two-slope biometric latent growth curve model. Observed data are denoted by y0 through y4. Mi mean intercept, Ms1 mean slope 1, Ms2 mean slope 2; U0 through U4 indicate random error. I, S1, and S2 refer to intercept, slope 1, and slope 2. The paths from the latent slopes to the observed scores are the age basis coefficients, B1(t) and B2(t), which define the intervals of change over age. The model includes additive genetic effects for the intercept (Ai) and slopes (As1 and As2)

The random effects, or variance, in latent growth curve parameters can be divided into three components: additive genetic effects (A), shared rearing environmental effects (C), and nonshared environmental effects unique to each individual, together with error associated with age-specific residuals (E). A Cholesky decomposition of the variance is used to estimate A, C, and E influences on each LGCM parameter and on the relationships among them. For simplicity, the model in Fig. 1 presents only the additive genetic effects for the intercept (Ai) and slopes (As1 and As2): paths a1 through a6. Shared and nonshared environment were also included in the model. Biometric latent growth curve models were fit with the structural equation modeling program Mx version 1.67 (Neale et al. 2003). The raw maximum likelihood estimation procedure was used throughout. Nested models were tested using a chi-square difference test (i.e., subtracting the −2 log likelihoods of the models being compared).
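The logic of the Cholesky decomposition can be sketched in a few lines of Python. This is an illustration under the standard ACE assumptions (MZ twins share all additive genetic effects, DZ twins share half on average, and shared environment is perfectly correlated within pairs); it is not the Mx script used in these analyses. The function names and the path values are hypothetical.

```python
def outer_lower(paths):
    """Return L times its transpose for a square matrix stored as nested lists."""
    n = len(paths)
    return [[sum(paths[i][k] * paths[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def combine(weighted):
    """Sum (weight, matrix) pairs element by element."""
    n = len(weighted[0][1])
    return [[sum(w * m[i][j] for w, m in weighted) for j in range(n)]
            for i in range(n)]


def twin_covariances(paths_a, paths_c, paths_e, zygosity):
    """Expected covariances among intercept, slope 1, and slope 2.

    Returns (within, cross): the covariance matrix within one twin and the
    covariance matrix between twin 1 and twin 2.
    """
    if zygosity not in ("MZ", "DZ"):
        raise ValueError("zygosity must be 'MZ' or 'DZ'")
    r_genetic = 1.0 if zygosity == "MZ" else 0.5
    cov_a = outer_lower(paths_a)
    cov_c = outer_lower(paths_c)
    cov_e = outer_lower(paths_e)
    within = combine([(1.0, cov_a), (1.0, cov_c), (1.0, cov_e)])
    cross = combine([(r_genetic, cov_a), (1.0, cov_c)])
    return within, cross


# Hypothetical lower-triangular path matrices (rows: I, S1, S2).
PATHS_A = [[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
PATHS_C = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
PATHS_E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

within_mz, cross_mz = twin_covariances(PATHS_A, PATHS_C, PATHS_E, "MZ")
within_dz, cross_dz = twin_covariances(PATHS_A, PATHS_C, PATHS_E, "DZ")
print(cross_mz[0][0], cross_dz[0][0])
```

Writing each component as a lower-triangular matrix times its transpose keeps every implied covariance matrix non-negative definite, which is the usual reason for choosing a Cholesky parameterization.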

Results

Phenotypic Growth Curve

Intercept and slope parameters estimated by the phenotypic latent growth curve models for WPPSI and WISC-R subscales are presented in Table 2. Likelihood ratio tests indicated that the two-slope model fit data from all 7 WISC-R subtests better than the one-slope model (p < .001). The expected deceleration in growth of cognitive function is apparent in the relative magnitude of the three slope estimates: ages 4–6, 7–10, and 11–15. At each subsequent age range the slope estimate is smaller, reflecting a slowed rate of growth. However, the particular slope estimates and the extent of slowing varied dramatically across subscales, and the pattern was not always consistent within the domains of verbal comprehension and perceptual organization. Block Design exhibited both the highest mean at age 5 and the largest rate of increase from ages 4–6. With the change in instrumentation at age 7, the mean dropped but rates of increase remained the highest at ages 7–10 and 11–15. In contrast, Picture Completion, also a measure of perceptual organization, appeared to level off by age 10 and slope 2 for the WISC-R was quite small. Results for the 4 measures of verbal comprehension tended to be more consistent, although mean intercept ranged from 30.47 to 49.06 at age 5 and slope estimates at ages 4–6 ranged from 0.96 to 1.45. Greater consistency in growth curve parameters is seen for the WISC-R measures of verbal comprehension.

Table 2 Parameter estimates (standard errors) from phenotypic latent growth curve models

Heritability Across Age

Twin correlations for each cognitive measure at each wave are reported in Table 3. Several patterns of change in twin correlations with age are evident, and they are generally inconsistent within cognitive domain. Vocabulary, Comprehension, and Picture Completion demonstrated early divergence in twin correlations that then remained fairly stable in later childhood. In contrast, MZ correlations for Information, Similarities, Block Design, and Arithmetic generally increased across waves while DZ correlations were generally stable or decreasing. Taken together, these patterns of twin correlations suggest higher heritability for cognitive function later in childhood, confirming results for general IQ reported by Wilson (1983).

Table 3 Twin correlations at each wave of testing

From previous research we expect genetic proportion of variance to increase as reliable variance increases and error variance decreases, at least up to age 8 years (Tucker-Drob and Briley 2014; Wilson 1983). Estimates of age trends in heritability and shared environmental variance calculated from the biometric latent growth curve models are presented in Fig. 2. Several trends are evident. First, between ages 4 and 6, when the WPPSI was administered, heritability is generally stable or increasing and shared environment is generally stable or decreasing. Given changes in total variance and reliable variance during this age range, focusing on proportions of variance (e.g., heritability) can mask changes in raw variance components. Focusing instead on age changes in underlying genetic variance, as estimated by the biometric LGC model, can provide a clearer picture of age trends. As shown in Fig. 3, all cognitive measures show an increase in raw genetic variance between ages 4 and 6 years, with large differences across domains. Block Design demonstrates a two-fold increase in genetic variance, whereas for Comprehension the increase in genetic variance in this age range is minimal. Thus, all measures are showing the increases in stable genetic variance for individual subtests predicted by Tucker-Drob and Briley’s (2014) analysis of general intelligence.
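The distinction between raw variance components and proportions of variance can be made concrete with a short sketch. It assumes that the variance implied at a given age is the quadratic form of the loadings [1, B1(t), B2(t)] with the covariance matrix of the growth factors, and that the occasion-specific residual variance is counted with E. The covariance matrices, the residual variance, and the function names below are hypothetical and are not taken from the fitted models.

```python
def loadings(age, center=10.0):
    """Loadings of the observed score on intercept, slope 1, and slope 2."""
    offset = age - center
    return [1.0, min(offset, 0.0), max(offset, 0.0)]


def implied_variance(cov, lam):
    """Quadratic form lam' * cov * lam for a 3 x 3 covariance matrix."""
    return sum(lam[i] * cov[i][j] * lam[j] for i in range(3) for j in range(3))


def variance_components(age, cov_a, cov_c, cov_e, residual):
    """Raw A, C, E variances and their proportions at a given age."""
    lam = loadings(age)
    var_a = implied_variance(cov_a, lam)
    var_c = implied_variance(cov_c, lam)
    var_e = implied_variance(cov_e, lam) + residual  # residual counted with E
    total = var_a + var_c + var_e
    return {"A": var_a, "C": var_c, "E": var_e,
            "h2": var_a / total, "c2": var_c / total}


# Hypothetical factor covariance matrices (rows and columns: I, S1, S2).
COV_A = [[4.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
COV_C = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
COV_E = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
RESIDUAL = 2.0

for age in (8.0, 10.0, 12.0):
    print(age, variance_components(age, COV_A, COV_C, COV_E, RESIDUAL))
```

In this toy example raw genetic variance rises by half between ages 10 and 12 while heritability moves only from .50 to .60, which shows how a proportion can understate change in the underlying component when total variance is also changing.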

Fig. 2

Changes in heritability (h2; solid line) and shared environment (c2; dashed line) with age, as estimated by the biometric latent growth curve model

Fig. 3

Age changes in raw genetic variance from ages 4 to 6 years

Second, starting at age 7 when the WISC-R was administered, heritability and shared environment are fairly stable (see Fig. 2). Heritability estimates for Similarities, Vocabulary, Comprehension, and Block Design are of the same magnitude as those reported by cross-sectional studies in the same age range (LaBuda et al. 1987; Petrill et al. 1996). Heritability estimates for Information, Arithmetic, and Picture Completion are fairly high, although it is unlikely that they are significantly different from earlier reports. LGC models capture systematic age-related variance, and thus any unreliability that may reduce cross-sectional heritability estimates is contained in the residual variance from the LGC model. As a result, estimates of genetic influences based on longitudinal data can be greater than estimates from cross-sectional data.

Third, although for many of the subtests we find a very good correspondence (Information, Picture Completion, and Arithmetic) or at least moderately good correspondence (Similarities and Comprehension) between results for the WPPSI and the WISC-R, for two subtests the results do not accord well across the tests. Both Vocabulary and Block Design demonstrate higher heritability in younger childhood (WPPSI) than in middle and older childhood (WISC-R). As shown in Fig. 3, Vocabulary and Block Design are the two subtests that demonstrate the greatest rate of increase in genetic variance, and the amount of genetic variance at age 6 years is actually consistent with results for the WISC-R at ages 7 and older. The LGCM of the WPPSI data is trying to model these large increases in genetic variance, resulting in a possible overestimation of heritability, whereas the LGCM of the WISC-R is modeling more stable variances and thus yields more stable estimates of proportions of variance. Additionally, higher relative shared environment for WISC-R Vocabulary and Block Design may reflect changes in environment associated with entering school.

Sources of Heritability

In addition to estimating changes in heritability over age, LGC models allow for investigation of continuity and discontinuity in sources of variance. Using the Cholesky decomposition of genetic and environmental influences on the LGCM parameters illustrated in Fig. 1, we can estimate the extent to which heritability for the slope parameters is associated with genetic influences on the intercept (paths a2 and a3), with genetic influences unique to slope 1 (a4 or a5), or with genetic influences unique to slope 2 (a6). This decomposition of the heritability of the slope from the linear WPPSI LGCM and the two slopes from the WISC-R model is presented in Table 4. Significance of total heritability for the slope parameters was tested by comparing the full model to models that dropped all genetic influences on the slope parameters, considered separately. Significance of the portions of total heritability associated with intercept or slopes was indicated by 95 % confidence intervals for those parameter estimates calculated in Mx. Regardless of the high level of heritability estimates reported in Fig. 2, genetic influences on rates of change were quite modest and achieved significance in only 8 of 21 instances, similar to results reported by biometric LGCM analyses of reading ability in children (Logan et al. 2013; Petrill et al. 2010) and cognition in adulthood (Finkel and Reynolds 2009). In most cases in which the heritability of the slope parameter was significant, the source of the significant proportion of that heritability was genetic influences associated with the intercept, indicating continuity in sources of genetic variance. In two instances, however, significant portions of heritability independent of the intercept indicated discontinuity in sources of genetic variance: WISC-R Information Slope 1 and WISC-R Picture Completion Slope 1.
Fifty-six percent of the heritability of Slope 1 for WISC-R Information acted through the intercept parameters, but the remaining 44 % was independent of the intercept, reflecting genetic innovation. An even higher portion of the heritability of Slope 1 for WISC-R Picture Completion (94 %) was independent of the intercept. This pattern of discontinuity was observed for 5 additional slope parameters, but failed to achieve significance. Neither the significant nor the nonsignificant indications of discontinuity of genetic etiology clustered within cognitive domains; they occurred across domain boundaries. Results for shared and nonshared environmental effects indicated that all influences were indistinguishable from influences shared with the intercept.
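The split between genetic variance shared with the intercept and genetic variance independent of it follows directly from the squared Cholesky paths. The sketch below assumes the path labeling implied by the text (a2 and a3 run from the intercept genetic factor to slopes 1 and 2, a4 and a5 run from the slope 1 genetic factor to slopes 1 and 2, and a6 runs from the slope 2 genetic factor to slope 2). The path values are hypothetical and were chosen only so that the slope 1 split reproduces a 56 %/44 % pattern; they are not the estimates reported in Table 4.

```python
def slope_genetic_sources(a2, a3, a4, a5, a6):
    """Partition the genetic variance of each slope by its source.

    Path a1 (intercept genetic factor to intercept) does not enter the
    partition, so it is not an argument here.
    """
    def share(part, total):
        return part / total if total > 0 else 0.0

    s1_total = a2 ** 2 + a4 ** 2
    s2_total = a3 ** 2 + a5 ** 2 + a6 ** 2
    return {
        "slope1": {
            "genetic_variance": s1_total,
            "shared_with_intercept": share(a2 ** 2, s1_total),
            "independent_of_intercept": share(a4 ** 2, s1_total),
        },
        "slope2": {
            "genetic_variance": s2_total,
            "shared_with_intercept": share(a3 ** 2, s2_total),
            "shared_with_slope1": share(a5 ** 2, s2_total),
            "unique_to_slope2": share(a6 ** 2, s2_total),
        },
    }


# Hypothetical path values, for illustration only.
sources = slope_genetic_sources(a2=0.56 ** 0.5, a3=0.3, a4=0.44 ** 0.5,
                                a5=0.0, a6=0.4)
print(sources["slope1"])
print(sources["slope2"])
```

A nonzero share that is independent of the intercept corresponds to genetic innovation in the sense used here, whereas a slope whose genetic variance runs entirely through a2 or a3 indicates genetic continuity.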

Table 4 Proportion of genetic variance for slope parameters shared with intercept and unique to slope

Discussion

The goals of the current analyses were to apply biometric latent growth curve models to LTS data in order to replicate and extend Wilson’s (1983) conclusions concerning greater MZ twin than DZ twin similarity for trajectories of cognitive development during childhood. Cognitive development was examined at the level of individual subscales of the cognitive measures to examine domain differences in rates of growth, genetic and environmental influences on growth, and evidence for genetic continuity versus discontinuity in cognitive development.

Most applications of latent growth curve models to cognitive development in childhood focus on latent factors, including language and numeracy skills (Murayama et al. 2013; Skibbe et al. 2008; Willoughby et al. 2012). Focusing on latent factors generally provides greater reliability in measurement; however, nuances of cognitive development can be overlooked. For example, because of the interest in isolating the exact locus of possible difficulties in development of reading abilities, applications of LGC models to reading often focus on individual components of the skill and subsequently identify trait differences in growth trajectories (Logan et al. 2013; Lonigan et al. 2013; Petrill et al. 1996). Similarly, in the current analyses marked differences in developmental trajectories were evident across cognitive domains, particularly in early childhood when cognition was developing most rapidly. Between ages 4 and 6 years, Block Design scores increased twice as fast as Vocabulary scores. This difference did not simply reflect distinctions between verbal comprehension and perceptual organization; other subscales of verbal comprehension demonstrated rates of growth that were intermediate between these two subscales. Trait differences in rates of growth likely reflect the distinct developmental trajectories of various brain structures and neuronal circuits throughout childhood (Chavarría-Siles et al. 2014). To focus solely on general intelligence factors may result in overlooking the nuanced interplay between components of brain development and aspects of cognitive development.

That same reasoning can be extended to genetic and environmental influences on cognitive development. Longitudinal data from children consistently report increasing heritability for general cognitive function (Bishop et al. 2003; Tucker-Drob and Briley 2014; Wilson 1983); however, differential heritabilities identified for various brain structures (Chavarría-Siles et al. 2014) may lead to differential heritabilities for various components of cognitive function. It is important to note that Chavarría-Siles et al. (2014) clearly support the consensus that there is no direct correspondence between brain structure and performance on particular tasks. In the current analyses, we have extended Wilson’s (1983) original conclusion that MZ twins become increasingly concordant for developmental trends in general cognitive function throughout childhood while DZ twin concordance remains relatively stable. Results indicated that although heritability (and raw genetic variance) generally increases throughout childhood for individual cognitive traits, the rate of increase varies across traits and instrumentation. Similar to results from a recent meta-analysis (Tucker-Drob and Briley 2014), we found that after increasing dramatically in early childhood, heritability generally leveled off in middle childhood.

Finally, the focus on individual cognitive tests allowed us to address the issue of genetic continuity and discontinuity in cognitive development at the trait level. Significant genetic influences on rates of change independent of genetic factors associated with initial performance levels were found for two of the seven cognitive subscales: Information (a measure of verbal comprehension) and Picture Completion (a measure of perceptual organization). Other measures within these cognitive domains did not demonstrate any genetic influences that were statistically distinguishable from initial genetic influences on intercepts. These results suggest that the Information subscale differs from other components of the verbal comprehension domain, at least with regard to genetic influences on rates of growth. All four subscales tapping the verbal comprehension domain (Information, Vocabulary, Similarities, and Comprehension) reflect crystallized abilities in that they clearly result from acculturation processes and represent the accumulation of knowledge. The Information subscale also reflects semantic memory (McRae and Jones 2013), and as a result may share common patterns of etiology with other memory measures. In fact, the pattern of genetic influences found for WISC-R Digit Span (not reported here) was similar to results for Information, although the genetic variance independent of the intercept did not achieve statistical significance. Similarly, although both Block Design and Picture Completion measure perceptual organization, they measure different aspects of this latent trait and subsequently demonstrate different genetic etiologies.

It is possible, then, that the patterns of stability and change in genetic and environmental influences on cognitive development may be unresolved because focus on general cognitive performance masks significant differences in etiologies at the trait level. Some theories of cognitive development propose that cognitive functioning begins as a fairly unstructured general quality that differentiates into distinct aptitudes over the course of childhood (Garrett 1946), although see Tucker-Drob (2009) for a recent challenge to Garrett’s theory. Differentiation of cognitive abilities in childhood may explain the domain differences reported here in growth trajectories, heritability across age, and sources of heritability.

The original goal of the current analysis was to replicate Wilson’s (1983) results with current statistical methods. Latent growth curve models calculate the mean trajectory in the sample and biometric LGC models estimate the genetic and environmental influences on deviations from that mean trajectory (variance in growth curve parameters). However, LGC models of nonstandardized scores cannot tap the spurts and lags of development in the same way as Wilson’s (1983) developmental synchronies index (DSI) based on standardized scores. We did examine twin similarity for residual variance from the LGC model (i.e., variance not related to age-based trajectories), but found no significant differences in MZ and DZ similarity for residual variances. It is possible to examine more complex trajectories by including multiple turning points in the growth model (Finkel and Davis 2009), incorporating exponential growth trajectories (Tucker-Drob and Briley 2014), or modeling individual change points (Dominicus et al. 2008) and then examining twin similarity. In addition, latent difference score models incorporate both latent growth trajectories and the autoregressive parameters of the simplex model (McArdle and Hamagami 2003). Many of these approaches can be estimated in multiple programming languages, including R (Ghisletta and McArdle 2012). These approaches would benefit both from more data and from data that are harmonized across instrumentation. Using item-level data from the cognitive measures, it is possible to combine item response theory and latent growth curve approaches to examine trajectories across instrumentation and age (McArdle et al. 2009). This method has yet to be applied to twin data, but remains an aspiration as we continue to revive the LTS.