Introduction

Philosophers and psychologists have long debated whether human personality is undergirded by fundamental dimensions, and if so what these dimensions are. As early as circa 300 BC (see Theophrastus 1870), Tyrtamus of Eresos in his Characters presented the intriguing proposition that humans take on several qualitatively different personality types, and this line of investigation has continued to the present day in both popular and academic writings. Currently, the prevailing model of personality structure is the Five Factor Model (FFM; John and Srivastava 1999). This model posits five broad personality domains that characterize and account for the majority of variation in enduring patterns of how individuals typically behave (Digman 1990). These domains are Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness to Experience (Goldberg 1993). Several more specific facets are subsumed within each domain. For example, it has been proposed that Conscientiousness is composed of the facets Competence, Order, Dutifulness, Achievement Striving, Self-Discipline, and Deliberation (Costa and McCrae 1992a). While there is wide acknowledgement that meaningful unique variation exists for each of the facets within the Big Five personality domains, it has become commonplace for contemporary research approaches to focus exclusively on the Big Five domains, which, by definition, include only variation that is common among facets of a domain. The current project uses a multivariate behavioral genetic approach to clarify the extent to which these broad domains are capable of capturing genetic variation in the more specific facet measures.

Debates about competing hierarchical taxonomies have been common in the personality literature. For instance, a measurement crisis occurred when several competing models (Block 1995; Costa and McCrae 1992b; Eysenck 1992b; Zuckerman 1992) were put forward as representing the only basic or fundamental factors. The burden of proof was placed on the newest personality theory, the FFM, to demonstrate that Agreeableness and Conscientiousness were not both subcomponents of a higher order factor, such as Psychoticism, and that Openness to Experience was a personality trait and not a measure of culture or some other non-personality construct (Eysenck 1992a; McCrae 1994; McCrae and Costa 1985; Zuckerman et al. 1993). However, evidence in support of the FFM has grown to such a level that the traits of Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness to Experience have become paradigmatic reference points for personality research (John and Srivastava 1999, but see also Block 2010 for continued disagreement).

Broad bandwidth

More recently, research about personality structure has centered within or above the Big Five in light of what Cronbach and Gleser (1957) termed the bandwidth-fidelity dilemma. This practical dilemma results from the trade-off between using measures that will cover the majority of variation in personality (domain level measurement) or measures that will assess a few specific behavioral tendencies (facet level measurement) more precisely. Ones and Viswesvaran (1996) have argued that for complex behavioral outcomes, equally broad or complex personality traits are likely to provide the most accurate prediction as they can cover the entire range of behavior patterns important to the outcome. For example, Ones et al. (1993) claimed that for the prediction of job performance, a highly complex outcome typically requiring the ability to work in a social environment, maintain high productivity, and avoid loafing or theft, a compound personality measure (labeled integrity) reflecting high Conscientiousness and Agreeableness and low Neuroticism is necessary to account for the diverse requirements of most jobs. Further evidence of the usefulness of compound traits comes from research on customer service orientation, violence and aggression, stress tolerance, drug and alcohol use, and self-regulation (Fein and Klein 2011; Ones and Viswesvaran 2001a, b).

In light of these results and the common finding that the Big Five are not entirely orthogonal, researchers have proposed two higher order factors, sometimes referred to as Stability and Plasticity, as residing hierarchically above the FFM (DeYoung 2006; Digman 1997). The “metatraits” of Stability (represented by high Agreeableness, Conscientiousness, and low Neuroticism) and Plasticity (represented by high Extraversion and Openness) have proven to have evidence of criterion validity (Hirsh et al. 2009). These high-bandwidth factors have been found to be associated with engagement or restraint of general behavioral acts, individual differences in circadian rhythm, externalizing behaviors, and basic values (DeYoung et al. 2007a, 2008; Hirsh et al. 2009; Vecchione et al. 2011). Stability and Plasticity are thought to represent individual differences in the function of serotonin and dopamine, respectively (Hirsh et al. 2009). The use of highly broad traits is supported by the far-reaching biological effects differences in serotonin and dopamine functioning have on behavior. Additionally, Stability and Plasticity have been linked with components of reinforcement sensitivity theory, namely the behavioral activation system and the behavioral inhibition system, as further evidence of the biological function of these individual differences (Mitchell et al. 2007).

Other researchers take this a step farther and postulate a general factor of personality (Musek 2007; Rushton and Irwing 2008, 2009, 2011). These researchers view the endeavor as having intrinsic worth for understanding genetics, human nature, and evolution (Rushton et al. 2008), but little research has been conducted on the criterion validity of the extreme high-order factor (van der Linden et al. 2010). Some exceptions to this are the finding that the general factor is positively related to self-esteem and supervisor-rated job performance (Erdle et al. 2010; van der Linden et al. 2010). Similar to Stability and Plasticity, the general factor has been linked with reinforcement sensitivity theory in an effort to provide a biological understanding of why individual differences in the construct exist (Erdle and Rushton 2010). Much research remains as to whether these findings are merely measurement artifacts (Bäckström et al. 2009; Erdle et al. 2009; McCrae et al. 2008), are extensively hindering theoretical development (Ferguson et al. 2011), or are better represented by parsimonious blended traits instead of postulating higher-order factors (Ashton et al. 2009).

High fidelity

Meanwhile, a minority of researchers have moved in the direction of endorsing more fine-grained measurement (DeYoung et al. 2007b). Researchers from this group have argued that lower-level traits may lose broad bandwidth, but they gain fidelity, allowing them to predict more specific behavioral patterns (Paunonen 1998). For example, Paunonen and Ashton (2001a) demonstrated the possible utility of a narrow rather than broad approach to personality measurement. In this study trained raters carefully selected five facet-level personality dimensions to predict forty behavioral outcomes based on rational grounds in addition to the broad measures of the FFM. The narrow measures of personality were able to account for more variance when included alone in a regression equation than the broad traits, and importantly, the facets were able to add incremental prediction when entered with broad trait level factors. This indicates that some useful information is lost when facets are aggregated to the broad trait level.

There is some recognition that the use of narrow traits can be more effective for making behavioral predictions. Examples of narrow traits outperforming broad personality factors include the prediction of academic achievement (Luciano et al. 2006; Paunonen and Ashton 2001b), broad job performance (Ashton 1998; Christiansen and Robie 2011) and specific performance criteria (Dudley et al. 2006), antisocial behavior (Corff and Toupin 2010), prejudice (Ekehammar and Akrami 2007), drug use (Terracciano et al. 2008a), longevity (Terracciano et al. 2008b), and extremes in weight (Terracciano et al. 2009). Additionally, there is evidence that cross-cultural results are more generalizable or reliable when using narrow facet-level measurement (Paunonen et al. 2003), and that facet-level measurement allows different personality constructs to be mapped on to one another with greater clarity (Armstrong and Anthoney 2009).

Big five as the status quo measurement level

Despite these movements to conceptualize personality either higher or lower on the trait hierarchy scheme, the majority of studies measure at the Big Five level. For example, Christiansen and Robie (2011) found that of 200 studies that utilized FFM measurement in the Journal of Applied Psychology and Personnel Psychology over roughly eight years, only 10 studies measured narrow traits. For many approaches, this may simply be a function of practicality. As Goldberg (1993) noted, “Because one always loses specific variance as one amalgamates measures, the optimal level of prediction is completely a function of statistical power and sample size” (p. 181). Obtaining a sample large enough to precisely measure five broad traits is far easier than the sample required for thirty facets. However, Costa and McCrae (1995) have argued for the practical importance of facet-level measurement for two reasons. First, while each of the facets of the FFM was found to load highly on its intended factor in a principal factors analysis, there was also strong evidence of specific variance associated with each facet that is not held in common with the other facets of a factor. In fact, seven facets actually loaded more strongly on their specific factor rather than the common factor. Second, from an applied perspective, analysis of the relationships of facets within a factor can shed important light on the types of treatment or approaches that will be most effective for a given patient. Utilizing narrow personality measures can move analysis closer to actual mechanisms of behavior.

Clearly, research surrounding the proper level of analysis of personality traits is of direct relevance to pragmatic research goals such as discovering replicable findings that are not simply sample specific. Cronbach (1960) suggested that researchers could initially measure at a broadband level, and as specific replicable findings are presented in the literature, high fidelity measures could then shed light on the presumed mechanisms or processes that the construct represents. In general agreement with this sentiment, Caspi and Shiner (2006) have argued that it is “a short-sighted strategy to rely exclusively on measures of broad superfactors” to make reliable judgments about behavior (p. 332). The argument in favor of this claim is simple. If there are differential effects of some of the more nuanced components of a personality trait, then these possibly very important effects may not be present when measured at the trait level. For example, if half of the facets of Openness are positively related to some important outcome and the other half are negatively related to this outcome, it is conceivable that there will be no measurable relationship between Openness and the outcome, but this does not mean Openness is not an essential predictor of the outcome. Cronbach (1960, p. 604) points out that although beginning a research program using broad bandwidth measures (such as Openness) may be sensible, lack of relation between a broadly measured trait and an outcome of interest may not necessarily indicate a lack of relation between a more specific aspect of that trait (such as the Openness facet termed Intellectualism) and the outcome of interest. Rather, because broad bandwidth measures are imprecise conglomerates of multiple, potentially distinct subtraits, results of broad bandwidth measures should be taken as a first step in further exploration that includes greater fidelity. 
A less extreme example of this potential error might be a case in which only one facet of Openness relates to an outcome, and the remaining facets are completely unrelated to the outcome, resulting in a very weak, if even detectable, relation between Openness and the outcome. In both hypothetical scenarios, finer grained analyses would be warranted. Interestingly, rather than moving in this direction, contemporary personality research continues to rely heavily on broad levels of measurement and analysis.
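The two hypothetical scenarios above can be made concrete with a small calculation. The sketch below is an illustration only and rests on simplifying assumptions that are not part of the present study: the facets are standardized and mutually uncorrelated, the domain score is their unweighted sum, and the facet-outcome correlations are hypothetical values. The function name is likewise illustrative.

```python
import math


def domain_outcome_correlation(facet_correlations):
    """Correlation between an unweighted sum of facets and an outcome.

    Assumes standardized, mutually uncorrelated facets (an illustrative
    simplification), so the covariance of the sum with the outcome is the
    sum of the facet-outcome correlations, and the variance of the sum
    equals the number of facets.
    """
    k = len(facet_correlations)
    return sum(facet_correlations) / math.sqrt(k)


# Hypothetical scenario 1: half the facets relate positively, half negatively.
opposing = [0.30, 0.30, -0.30, -0.30]

# Hypothetical scenario 2: only one facet relates to the outcome.
single = [0.30, 0.0, 0.0, 0.0]

r_opposing = domain_outcome_correlation(opposing)
r_single = domain_outcome_correlation(single)
print(f"opposing facets: r = {r_opposing:.2f}")
print(f"single facet:    r = {r_single:.2f}")
```

Under these assumptions the opposing effects cancel at the domain level, and the single-facet effect is halved when four facets are summed, even though a facet-level correlation of 0.30 is present in both cases.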

Behavior genetics as a method to shed light on structure

An important criterion for resolving this controversy comes from Faraone et al. (1999), who have used the term “genetically crisp” to define psychological measures that represent the homogeneous effect of genes. A measure would be genetically crisp if the construct that it tapped into is affected primarily in a uniform way by a single set of genes. Faraone et al. (1999) argue that measures that have this characteristic are particularly important in applied situations because genetic counseling for disorders and molecular searches for candidate genes that lead to abnormal development depend on the uniform genetic composition of measures to be effective and reliable. Assessments based on measures that are influenced by a single set of allelic variants will make more consistent diagnoses than judgments using a measure that is influenced by several sets of allelic variants which may or may not all be present in a given individual.

Although not discussed by Faraone et al. (1999), it may be similarly important for personality researchers to use genetically crisp measures, as this allows for more precise inference of a given psychological construct from a specific task. If an outcome is found to be linked to a broad personality factor that is an aggregate of several lower-order facets that have unique genetic variance, then it is difficult to determine what portion of the factor the outcome is associated with in actuality. Related to this point, increased statistical power is likely to accompany searches for allelic correlates of traits that are measured by instruments assessing uniform genetic effects. In other words, both candidate-gene and genome-wide association studies of personality are likely to be most successful when the personality outcome measured reflects a set of constructs that are influenced by a common set of genes, rather than a constellation of constructs, each of which is influenced by a unique set of genes. Research conducted at the phenotypic level is unable to provide information about the genetic crispness of different personality measurements. For example, even though Costa and McCrae (1995) found substantial variation that was specific to each facet, it is conceivable that all of the specific variance was due to environmental factors (and measurement error), and the higher order factor did capture all of the genetic variance of the lower-order traits. This would be evidence that aggregation to the FFM level does not lose significant genetic variance. However, if there is evidence for facet specific genetic variance that is not accounted for by the common factor, then this would lend support for narrow measurement.

There is a strong history of quantitative behavior genetic work within personality research. Behavior genetic analyses decompose the observable variance of a trait into variance accounted for by additive genetic effects (A), shared environmental effects (C) that serve to make children from the same family similar to one another, and nonshared environmental effects (E) that are uncorrelated with family effects and make children different from one another. Analyzing personality factors and facets in this way can begin to uncover the etiology of traits which is a crucial task for constructing personality theory. The usefulness of the univariate behavior genetic approach has been demonstrated by removing at least one controversy from personality research. Some early conceptions of Agreeableness, Conscientiousness, and Openness to Experience defined these traits as learned tendencies to be contrasted with the biological traits of Extraversion and Neuroticism (Carver et al. 2000). Univariate behavioral genetic work has found this to be a false distinction, indicating fairly even contributions of both genes and environments to each of the five broad personality traits (Jang et al. 1998; McCrae and Costa 1997; McCrae et al. 2005; Yamagata et al. 2006; Bergeman et al. 1993; Bouchard 1997; Loehlin and Nichols 1976; Loehlin 1992; Riemann et al. 1997) as well as to the FFM facets (Jang et al. 1996). This body of literature represents several age groups, several cultures, and a wide range of measures and recruitment strategies enhancing the generalizability and reliability of the findings. Furthermore, genetic influence has been found for the higher-order Stability and Plasticity factors (Jang et al. 2006) and for the general factor of personality (Veselka et al. 2009).
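As a rough illustration of the ACE decomposition, the sketch below applies Falconer's classical formulas to monozygotic (MZ) and dizygotic (DZ) twin correlations. This is a simplification and not the structural equation approach used in the present study; the function name and the correlations passed to it are hypothetical and chosen only for illustration.

```python
def falconer_ace(r_mz, r_dz):
    """Approximate standardized ACE components from twin correlations.

    Falconer's formulas assume additive genetic effects only, equal
    environments for MZ and DZ pairs, and no assortative mating:
      a2 = 2 * (r_mz - r_dz)   additive genetic variance (A)
      c2 = 2 * r_dz - r_mz     shared environmental variance (C)
      e2 = 1 - r_mz            nonshared environment plus error (E)
    """
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return a2, c2, e2


# Hypothetical twin correlations for a single facet score.
a2, c2, e2 = falconer_ace(r_mz=0.50, r_dz=0.30)
print(f"A = {a2:.2f}, C = {c2:.2f}, E = {e2:.2f}")
```

The three components sum to one by construction. Model-based estimates obtained by maximum likelihood additionally provide confidence intervals and keep the variance components within admissible bounds, which these simple formulas do not.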

Such a univariate approach does little to advance the debate about the proper level of analysis, however, because univariate methods cannot make claims about the structure of genetic or environmental effects. Multivariate methods are necessary to distinguish between the commonality and specificity of genetic and environmental effects, and to test whether common effects can be plausibly specified to operate indirectly on specific facets by way of more general traits. An early example of this approach comes from Heath and Martin (1990), who tested the assumption that phenotypic correlation and univariate estimates of heritability are evidence for coherent genetic and environmental effects within the Psychoticism trait (Eysenck and Eysenck 1976). Based on data from 2,903 twin pairs, the researchers found that the phenotypic unity of the scale broke down into two separate genetic factors instead of one. In hindsight, this result is consistent with the FFM, as research has since indicated that Psychoticism represents a mixture of low Agreeableness and Conscientiousness (McCrae and Costa 1985).

Univariate approaches to estimating genetic and environmental impacts on individual personality traits are now quite common, but there continues to be surprisingly little research that utilizes a multivariate approach. The few studies that have used a multivariate approach to examine the structure of the FFM have produced some conflicting results. McCrae et al. (2001) factor analyzed the genetic covariance structure of facets of the FFM in a sample of twins and uncovered five genetic factors. However, the genetic factors only roughly resembled the FFM. Yamagata et al. (2006) also point out that a methodological limitation (use of simple subtraction instead of the more precise structural equation modeling) of the study renders the findings equivocal: Did this limitation introduce noise into the data that created the lack of correspondence or did the genetic factors truly not represent the FFM? Support for the FFM comes from two separate groups of researchers (Yamagata et al. 2006; Pilia et al. 2006) that have conducted more advanced exploratory factor analyses on the genetic and environmental correlation matrices of the FFM facets. They have found that the genetic and environmental factor loadings highly resemble the phenotypic loadings in that facets load highly on the related trait. While these approaches offer strong evidence that the facets of the FFM are related at the phenotypic, genotypic, and environmental levels, the use of exploratory factor analysis is less able to provide accurate information about specific structural hypotheses such as trait hierarchical relationships which are best tested using confirmatory methods.

To our knowledge, only two research studies have utilized confirmatory, multivariate, behavior genetic methods to analyze the genetic and environmental structure of the FFM facets. Jang et al. (2002) were the first to use confirmatory behavior genetic structural equation modeling methodologies to investigate the structure of the FFM. These researchers found that two, rather than one, genetic and nonshared environment latent factors were required to successfully account for the variance in the facets of each trait. Johnson and Krueger (2004) also found more complex genetic and environmental effects than expected by the FFM. Models that were fit with trait hierarchy as hypothesized by the FFM fit more poorly than more complex models for all traits except Extraversion and Neuroticism. These findings call into question whether the Big Five or any higher-order metatraits can be considered genetically crisp.

In addition to the importance of understanding the genetic and environmental structure of personality for all types of behavioral prediction, genome-wide association studies that investigate molecular behavior genetics are particularly likely to be advanced by multivariate quantitative behavior genetics. A few large scale and quite costly projects have been undertaken to search for genetic variants that might account for some variance in personality traits (Kim and Kim 2011). It is common for these studies to find significant associations between specific genetic variants and broad personality traits, but for these associations to not replicate in separate samples (Terracciano et al. 2010; de Moor et al. 2012). These inconsistent results may stem from analyzing traits at too broad a level. Terracciano et al. (2010) point out that imprecision of measurement renders it impossible to determine if an individual scored high on Neuroticism, for example, because they were high on Depression, but not Anxiety, or high on Anxiety, but not Depression. This imprecision of measurement is then passed down to imprecision in the search for specific genes associated with the trait. Further, they argue that understanding the genetic structure of personality traits is an extremely valuable research goal as the DSM-V is moving towards a dimensional approach that conceptualizes psychiatric disorders as extremes along an otherwise normal-range distribution. Under such dimensional conceptualizations, determining what level of analysis should be emphasized in studies of normal-range personality may have implications for clinical diagnosis. If the most consistent and largest effects of genetic variants on personality outcomes occur at the levels of specific facets, then aggregation across facets may serve to dilute power in both molecular genetic research and applied practice. 
Alternatively, if genetic variants act at the trait level and environments contribute primarily to facet-level differentiation, then aggregation of scores across facets into larger traits may serve to maximize the genetic variation of interest by averaging out facet-specific environmental variation. In such a case, a high bandwidth approach may be most advantageous. Of course, it may be the case that regardless of the level of measurement, individual genes do not account for substantial portions of variance in any complex personality measure. It may be that genetic variation in personality results from a complex series of dynamic interactions between different biological, psychological, and environmental systems over the course of development (Turkheimer 2000). Nevertheless, future research on such complex systems is likely to benefit from determination of the extent to which the processes occur at broad and specific levels of personality.

Goals of the present study

The present study seeks to extend and further explore the structure of the genetic and environmental effects on personality by fitting a series of structural equation models that specify varying degrees of structure for trait hierarchy. To fully understand the development of any characteristic, it is important to explicate the genetic and environmental effects that produce it. However, in light of the desire to create genetically crisp measures for both theoretical and applied goals, we place particular emphasis on the structure of the genetic effects on the facets. We will seek to provide a descriptive viewpoint of the relative importance of global and specific genetic effects for each domain of the FFM, as well as test differences between theoretically meaningful models. Evidence of the amount of genetic variance that is either shared among the facets within a domain or specific to individual facets can inform the discussion of whether important information about personality is contained or lost when aggregating to the domain level. Differences in the fit of psychometrically meaningful independent and common pathway models can shed light on whether the structure of the variance that is shared among facets within a domain can be represented in a highly coherent manner, such as a hierarchical factor, or is better represented by a looser interpretation (Neale and Cardon 1992; McArdle and Goldsmith 1990). Exploring the structure at the level of facets and domains can provide the necessary foundation for interpretations of the unity or coherence of even higher order metatraits or general factors of personality. Further replication of complex genetic and environmental effects on the facets with a novel, large sample of twins would shed light on the pragmatic use of facet level measurement.

Method

Participants

This study made use of data from a sample of adolescent twins who took part in the National Merit Twin Study (Loehlin and Nichols 1976). The National Merit Twin Study was based on a sample of 596,241 students who took the National Merit Scholarship Qualifying Test (NMSQT) and answered an item asking whether they had a twin. Based on this response, 1,507 potential same-sex pairs were identified and sent a questionnaire, of whom 850 twin-pairs ultimately participated (514 MZ and 336 DZ). Zygosity was determined based on a questionnaire developed by Nichols and Bilbro (1966). The final sample was composed of 354 (217 MZ and 137 DZ) male same-sex twin-pairs and 496 (297 MZ and 199 DZ) female same-sex twin-pairs. The current study made use of the 807 twin-pairs for whom California Psychological Inventory (CPI) responses were available. This subsample was composed of 326 (202 MZ and 124 DZ) male same-sex twin-pairs and 481 (288 MZ and 193 DZ) female same-sex twin-pairs. Nearly all of the individuals who took the NMSQT were high school juniors. The majority of the individuals in the sample were therefore approximately 17 years of age. Note that while the students who take the NMSQT are higher achievers than a truly representative sample, the sample has been shown to be representative of the general population in terms of personality traits (Loehlin and Nichols 1976). Further discussion of the sample and data collection procedures can be found in Loehlin and Nichols (1976).

Measures

Participants completed the CPI as part of a battery of tests (Gough 1957). The CPI was constructed many years before the FFM was established. While the original CPI scoring system produces scales that appear related to Extraversion (e.g. Sociability and Social Presence) and Conscientiousness (e.g. Responsibility or Self-Control), previous studies have found that the original CPI scales do not map fully onto the FFM (McCrae et al. 1993). Fortunately, the CPI contains 480 items that the creators of the measure encourage researchers to recombine in different ways to construct new scales (Gough and Bradley 1996). In this spirit, Soto and John (2009) sought to extract the FFM from an item level analysis of the CPI. Using three independent validation samples, the researchers developed an algorithm by which to score the CPI on the FFM traits and several sub-facets. Each CPI item was organized into a cluster based on its primary correlation with two FFM traits. Then, the items were plotted into circumplex space based on the primary loadings. Groups of clustered items representing the different facets were identified within the circumplex space. The CPI-Big Five measure was found to be similar in each sample to several current measures of the FFM such as the Revised NEO Personality Inventory (Costa and McCrae 1992a), the Big Five Inventory (John et al. 1991), the Mini-Markers (Saucier 1992), and the Adjective Check List (Gough and Heilbrun 1983) as well as peer ratings on these measures. Across three samples, reliability estimates of the 16 facets averaged 0.72 and ranged from 0.56 to 0.85. Table 1 provides an overview of the 16 facets, and includes sample items. The current study utilized Soto and John’s (2009) algorithm for scoring the CPI-Big Five.

Table 1 Facets and sample items of the CPI-Big Five

Analytical approach

Our analytic approach rests on a comparison of two structurally meaningful models that differ in their allocation of common genetic and environmental effects on the facets of each Big Five domain. Figure 1 presents a graphical representation of these models (Neale and Cardon 1992; McArdle and Goldsmith 1990). The first, depicted in Fig. 1a, is the independent pathways model. This model allows common genetic influences to directly affect each of the facets, rather than requiring them to influence the facets indirectly through a higher-order factor. This model allows genetic and environmental influences to have effects on multiple facets, but it does not presume these common effects to be manifestations of singular effects on a broader trait. The second model, depicted in Fig. 1b, is a common pathways model which places increased structure on the genetic and environmental effects. This model posits genetic and environmental effects on a latent variable, which are filtered down to the facets. Put differently, the common pathways model requires that the common genetic and environmental effects on a facet are proportional in terms of the facet’s loading on the latent construct. Importantly, the primary distinction between these models is the amount of hierarchical structure that is posited for the common or global genetic and environmental effects. Both the independent and common pathways models specify unique or specific genetic and environmental effects on the facets that are not related to the effects of other facets. By global, we mean the variance that is shared amongst the facets within a domain. This is represented in Fig. 1 as the variance components that are above the facets and are shared in common by each facet. By specific genetic and environmental variance, we mean the remaining variance within a facet that is not accounted for by the global variance and is represented in Fig. 1 as the variance components below the facets that are unique to each facet.
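The distinction between the two models can be illustrated by writing out the facet covariances that each implies. The sketch below is a minimal illustration under standard twin-model assumptions (MZ twins share all additive genetic and shared environmental effects; DZ twins share half of the additive genetic effects on average). It is not the Mplus specification used in the present study, and all function names and path values are hypothetical.

```python
def outer(u, v):
    """Outer product of two path vectors as a nested list."""
    return [[ui * vj for vj in v] for ui in u]


def implied_covariances(a_c, c_c, e_c, a_s, c_s, e_s):
    """Facet covariances implied by an independent pathways model.

    a_c, c_c, e_c: paths from the global A, C, E factors to each facet.
    a_s, c_s, e_s: paths from the facet-specific A, C, E factors.
    Returns (within-person, MZ cross-twin, DZ cross-twin) matrices.
    """
    p = len(a_c)
    A, C, E = outer(a_c, a_c), outer(c_c, c_c), outer(e_c, e_c)
    within = [[0.0] * p for _ in range(p)]
    mz = [[0.0] * p for _ in range(p)]
    dz = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            same = 1.0 if i == j else 0.0  # specific factors affect only the diagonal
            a = A[i][j] + same * a_s[i] ** 2
            c = C[i][j] + same * c_s[i] ** 2
            e = E[i][j] + same * e_s[i] ** 2
            within[i][j] = a + c + e
            mz[i][j] = a + c        # MZ pairs share all of A and C
            dz[i][j] = 0.5 * a + c  # DZ pairs share half of A on average
    return within, mz, dz


def common_pathway_paths(loadings, a_l, c_l, e_l):
    """Global paths under the common pathways constraint.

    Each global path equals the facet's loading on the latent trait
    multiplied by the latent trait's A, C, or E path.
    """
    return ([lam * a_l for lam in loadings],
            [lam * c_l for lam in loadings],
            [lam * e_l for lam in loadings])


# Hypothetical values for a three-facet domain.
loadings = [0.8, 0.6, 0.5]
a_c, c_c, e_c = common_pathway_paths(loadings, a_l=0.7, c_l=0.2, e_l=0.6)
a_s, c_s, e_s = [0.3, 0.4, 0.5], [0.1, 0.1, 0.1], [0.5, 0.6, 0.6]
within, mz, dz = implied_covariances(a_c, c_c, e_c, a_s, c_s, e_s)

# Under the common pathways constraint the ratio of global A to global C
# paths is the same for every facet; the independent pathways model frees it.
ratios = [a / c for a, c in zip(a_c, c_c)]
print(ratios)
```

In this parameterization the common pathways model is the independent pathways model with the global paths constrained to be proportional across facets, which is why the two can be compared by how much fit is lost when the constraint is imposed.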

Fig. 1 a Diagram of independent pathways model. Global and specific ACE factors are allowed to directly affect the facets. b Diagram of common pathways model. Global ACE factors are filtered through a hypothesized latent trait variable, and specific ACE factors are allowed to directly affect the facets

We had two goals in mind for our approach to analyzing the data: a descriptive goal and a theoretical goal. The first goal was to simply describe the global genetic and environmental effects and the specific genetic and environmental effects on each trait construct. We report parameter estimates from the independent and common pathways models with all A, C, and E influences modeled. This allowed us to examine the magnitude of the variance components for each facet that are accounted for by broad factors that are shared with the other facets and the extent to which there are facet-specific, unique effects. Our intention is to better describe the different parameters rather than to test their statistical significance, and we therefore report 95% confidence intervals for the estimates. Because the common pathways model is increasingly restrictive of the parameter estimates, the independent pathways model provides estimated values that are closer to those implied by the data and may indicate the distinction between global and specific variance more accurately. However, comparing parameter estimates between the independent and common pathways models allows one to examine the influence that positing a hierarchical latent trait has on the genetic and environmental effects.

Second, our theoretical goal was to test the hypothesis that a latent phenotype can account for the genetic and environmental effects on facet scores without producing significant model misfit. The common pathways model is the more parsimonious model because it estimates fewer parameters. According to McArdle and Goldsmith (1990), the common pathways model is a nested form of the independent pathways model. We verified that this was the case using the procedure developed by Bentler and Satorra (2010). Our primary method of model comparison was therefore the χ2 difference test. If two models did not fit significantly differently from one another according to the χ2 difference test, the more parsimonious common pathways model was preferred. Also provided are Akaike’s Information Criterion (AIC) and root mean square error of approximation (RMSEA). AIC reflects a balance between parsimony and fit and can be used to compare both nested and non-nested models. Lower AIC values indicate better fit. RMSEA is an index of absolute model fit and is not used for direct comparison. RMSEA values below 0.05 indicate good fit between the expected and observed covariance matrices (Steiger 1989).
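
For readers less familiar with these indices, the following is a minimal sketch of how AIC and a single-group RMSEA point estimate are commonly computed. The function names and the example numbers are hypothetical. Software packages differ in detail (for example, using N rather than N - 1, or applying a multiple-group correction, which would matter for two-group twin models), so these functions should not be expected to reproduce the values in the tables.

```python
import math


def aic(log_likelihood: float, n_free_params: int) -> float:
    """Akaike's Information Criterion: -2 * logL + 2 * k (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_free_params


def rmsea(chi2: float, df: int, n: int) -> float:
    """Single-group RMSEA point estimate.

    Uses the common textbook form sqrt(max(chi2 - df, 0) / (df * (n - 1))).
    This is an assumption about the formula; packages vary.
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(round(rmsea(chi2=30.0, df=20, n=501), 3))
    print(aic(log_likelihood=-100.0, n_free_params=5))
```

A model whose χ2 does not exceed its degrees of freedom receives an RMSEA of zero under this formula, which is why the index is described as one of approximate rather than exact fit.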

All models were fit using full-information maximum-likelihood estimation in Mplus statistical software (Muthén and Muthén 2010).

Results

Scale description

Our first step was to determine whether the CPI-Big Five scoring algorithm captured the FFM in the new sample using exploratory factor analysis of the produced facet scores. Separate analyses were conducted for the first and second member of a twin pair (twin1 and twin2). We used orthogonal target rotation to maximize the correspondence of this structure to the factor loadings reported by Soto and John (2009). The first five eigenvalues were 4.08/4.23, 2.55/0.257, 1.65/1.71, 1.51/1.40, and 1.06/1.02 followed by 0.78/0.70, 0.63/0.65, 0.55/0.53, 0.50/0.49, and 0.48/0.47 (for twin1 and twin2 respectively). From the eigenvalues, we determined that the expected five factor structure would be retained. Table 2 presents the factor loadings for both twins, as well as factor and variable congruence coefficients based on the congruence with the loadings reported by Soto and John (2009). Factor congruence coefficients ranged from 0.93 to 0.97, and variable congruence coefficients ranged from 0.86 to 1.00. Of 32 possible primary loadings, 29 facet primary loadings were for their intended trait. The one exception for both twin1 and twin2 was the facet Adventurousness, which loaded most strongly (and negatively) on Agreeableness rather than on Openness, where Soto and John (2009) found its primary loading. This may indicate that, with respect to the facets that indicate Openness, the current sample of NMSQT takers is not representative of the general population or, more specifically, of the student and community samples utilized by Soto and John (2009). Another difference was found for Compassion versus Insensitivity, which loaded slightly more strongly on Openness than on Agreeableness for twin2. Together, this misfit at the phenotypic level may indicate that the results for these traits should be interpreted with some caution. However, the overall structure is largely similar, as indicated by the factor and variable congruence coefficients, and the factor structure indicated by Soto and John (2009) was retained.
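
The congruence coefficients mentioned above can be illustrated with a short sketch. It assumes that the coefficient in question is Tucker's congruence coefficient, which is the usual choice but is not stated explicitly in the text. The loadings shown are hypothetical and are not taken from Table 2.

```python
import math
from typing import Sequence


def congruence(x: Sequence[float], y: Sequence[float]) -> float:
    """Tucker's congruence coefficient between two loading vectors.

    phi = sum(x*y) / sqrt(sum(x^2) * sum(y^2)).
    Applied to a column of the loading matrix it gives a factor congruence;
    applied to a row it gives a variable congruence.
    """
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if den == 0:
        raise ValueError("at least one vector has all-zero loadings")
    return num / den


if __name__ == "__main__":
    # Hypothetical loadings of three facets on one factor in two samples.
    current_sample = [0.70, 0.62, 0.25]
    reference_sample = [0.74, 0.60, 0.45]
    print(round(congruence(current_sample, reference_sample), 3))
```

Because the coefficient depends only on the pattern of loadings and not on their overall scale, two solutions whose loadings are proportional receive a coefficient of 1.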

Table 2 Target factor loadings of the CPI-Big Five in current and previous study

Table 3 presents the univariate estimates and confidence intervals of the effects of genes, shared environment, and nonshared environment on the facets and on composites representing the five broader domains. In line with previous research (e.g., Loehlin 1992), estimates of heritability were typically moderate to large and ranged from 0.13 to 0.63 with a mean of 0.40 and median of 0.37 for the facets. Also in line with previous research (Bouchard 1997), the facets showed null-to-small effects of the shared environment (ranging from 0 to 0.23, mean = 0.08, median = 0.05). The remaining facet variance was explained by nonshared environment with estimates that were moderate to large in magnitude (ranging from 0.38 to 0.65, mean = 0.53, median = 0.56).
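
As intuition for where univariate A, C, and E estimates come from, the sketch below applies the classical Falconer approximations to a pair of twin correlations. This is not the estimation method used in the present study, which relied on maximum-likelihood model fitting, and the correlations in the example are hypothetical.

```python
from typing import Tuple


def falconer_ace(r_mz: float, r_dz: float) -> Tuple[float, float, float]:
    """Back-of-envelope ACE decomposition from MZ and DZ twin correlations.

    a2 = 2 * (r_mz - r_dz)   additive genetic variance
    c2 = 2 * r_dz - r_mz     shared environmental variance
    e2 = 1 - r_mz            nonshared environmental variance (plus error)
    The three components sum to 1 by construction.
    """
    a2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return a2, c2, e2


if __name__ == "__main__":
    # Hypothetical correlations, for illustration only.
    a2, c2, e2 = falconer_ace(r_mz=0.50, r_dz=0.30)
    print(f"a2={a2:.2f}, c2={c2:.2f}, e2={e2:.2f}")
```

When the MZ correlation exceeds twice the DZ correlation, the approximation for c2 becomes negative, a pattern that is usually read as a hint of non-additive (dominance) genetic effects rather than of a negative shared environmental variance.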

Table 3 Univariate estimates of the effect of genes and the environment on CPI-Big Five traits and facets

Multivariate parameter estimation

Table 4 presents the parameter estimates and 95 % confidence intervals from the ACE independent pathways model for each facet. A number of features are of note. First, there is strong evidence for common genetic effects across facets within a domain, as evidenced by the high global genetic effect parameter estimates ranging from 0.18 to 0.71 (mean = 0.48, median = 0.48). Second, there is also strong evidence for specific genetic effects that operate on the individual facets and are not shared across facets. Estimates of specific genetic effects ranged from 0 to 0.57 (mean = 0.29, median = 0.31). This observation indicates that aggregating across facets within a trait would necessarily result in loss of specific genetic variance for many facets. Third, environmental effects are also fairly evenly divided between global and specific influences, with a few exceptions. Global shared environmental estimates ranged from 0.02 to 0.52 (mean = 0.21, median = 0.19), and specific estimates ranged from 0 to 0.39 (mean = 0.12, median = 0). Finally, global nonshared environmental parameter estimates ranged from 0.03 to 0.75 (mean = 0.41, median = 0.41), and specific nonshared environmental parameter estimates ranged from 0 to 0.79 (mean = 0.54, median = 0.57).

Table 4 Standardized parameter estimates and variance components from the independent pathways model of each facet

The proportion of total genetic, shared environmental, and nonshared environmental effects that are accounted for by global or common effects is of particular interest for the current study. For each of A, C, and E, the total effect is the sum of the common and unique variance attributable to that component. Table 5 presents the proportion of each total effect that is due to common effects, as implied by the parameter estimates from the independent pathways model. This provides an estimate of the relative importance of global or specific effects for each variance component. For example, relative to the total effect of genetics on Depression, 65 % of the effect is shared with other facets of Neuroticism. Alternatively, 100 % of the effect of genetics is accounted for by global effects shared between the facets for Trustfulness versus Cynicism, Compassion versus Insensitivity, and Rumination. This indicates that aggregating these facets to the trait level captures all of the genetic information found at the facet level. However, these facets are the exception. For Extraversion facets, the average percent of the genetic effect that is accounted for globally is 69 %. Very little specific information is lost for Social Confidence versus Anxiety (96 % global), but the majority of the information is lost for Gregariousness (40 % global). Two facets of Agreeableness show complete genetic overlap at the common level, but Humility versus Arrogance showed almost no genetic relation to the other facets (9 % global). There is strong evidence of common genetic influences for Conscientiousness and Neuroticism facets, with all facets sharing more than half of the genetic variance globally and averaging 69 and 81 % global genetic variance, respectively. The Intellectualism facet strongly defined the genetic component of Openness with 83 % of its genetic variance accounted for globally, but only 46 and 55 % of the genetic variance in Idealism versus Conformity and Adventurousness is shared among the facets. Similar interpretations can be made for the shared and nonshared environment. While there is some evidence that genetic information is represented completely in an aggregate trait, it is typically the case that substantial information remains unique to each facet.
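
The proportions described in this paragraph follow from simple arithmetic, sketched below. The sketch assumes that its inputs are standardized path coefficients, so that squaring them yields variance components; if the tabled values are already variance components, the squaring step should be dropped. The function name and example loadings are hypothetical.

```python
from typing import Optional


def proportion_global(common_loading: float, specific_loading: float) -> Optional[float]:
    """Share of a component's total variance that is due to the global factor.

    Both arguments are assumed to be standardized path coefficients for the
    same component (A, C, or E) on one facet. Returns None when the component
    contributes no variance at all, since the ratio is then undefined.
    """
    global_var = common_loading ** 2
    specific_var = specific_loading ** 2
    total = global_var + specific_var
    if total == 0:
        return None
    return global_var / total


if __name__ == "__main__":
    # Hypothetical genetic loadings for one facet, for illustration only.
    share = proportion_global(common_loading=0.30, specific_loading=0.40)
    print(f"{share:.0%} of the genetic variance is global")
```

A facet with no specific loading for a component returns a proportion of 1, corresponding to the cases in which aggregation to the trait level loses no information for that component.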

Table 5 Proportion of genetic, shared environmental, and nonshared environmental variance accounted for by global effects

Table 6 reports the parameter estimates and 95 % confidence intervals for the more constrained common pathways model. These estimates indicate strong common genetic effects for each of the broad traits, with point estimates for the common influence of A ranging from 0.53 to 0.89 (mean = 0.74, median = 0.76), and strong common nonshared environmental effects for each of the traits, with point estimates ranging from 0.39 to 0.68 (mean = 0.60, median = 0.65). There is less evidence for the effects of the shared environment, but there is much variation between traits, with estimates ranging from 0.00 to 0.50 (mean = 0.20, median = 0.23). Also of importance for these analyses, much specific genetic (mean = 0.32, median = 0.39) and nonshared environmental (mean = 0.60, median = 0.60) variance remains at the facet level, with some evidence of shared environmental (mean = 0.19, median = 0.24) variance. These results are generally consistent with those of the independent pathways models in indicating that a great deal of genetic variation is facet-specific. Of course, because the common pathways model imposes a highly constrained structure on the patterns of common genetic and environmental variation in the factors, its parameters may be less trustworthy than those from the independent pathways model. To evaluate whether the more constrained common pathways model represents the data as well as the less constrained independent pathways model, we turned to model comparison tests.

Table 6 Standardized parameter estimates from the common pathways model of each facet

Model comparison

The independent pathways model represents a relatively unrestricted account of the patterns of global and specific genetic and environmental influences on the facets within each of the trait clusters. It is possible that a more parsimonious common pathways model (in which higher order factors are explicitly presumed) or a model that does not estimate all of the parameters can fit the data equally well. Following the precedent of Johnson and Krueger (2004) and Jang et al. (2002), we compared models that did or did not posit the effect of the shared environment on the facets. The purpose of this is twofold. If models without the shared environmental effect specified do not fit significantly worse, this more parsimonious model can be taken as the best representation of the data. Further, in some instances the MZ correlation was slightly more than double the DZ correlation which may imply the effect of dominant genes. Demonstrating that model fit is not significantly reduced by removing the effect of the shared environment allows for the effects of dominance to be practically modeled in place of the shared environment. Table 7 presents the multivariate model fit comparison statistics for all models under investigation. RMSEA values were below 0.05 for all models examined indicating that they all fit the data relatively well. Removing the effect of shared environment produced a significant increase in χ2 for Neuroticism and Openness and did not for Extraversion, Agreeableness, and Conscientiousness. No models including the effect of dominant genes, however, fit significantly better than the AE models. Thus, for the purposes of comparing the common and independent pathways models, ACE Neuroticism and Openness models were evaluated, and AE Extraversion, Agreeableness, and Conscientiousness models were evaluated.

Table 7 Comparisons of ACE and AE models

Chi-squared difference tests were used to determine whether the nested common pathways and independent pathways models fit significantly differently (McArdle and Goldsmith 1990). Table 8 lists the fit statistics for these comparisons. The models did not fit differently for Extraversion (Δχ2 = 2.70, Δdf = 2, p = 0.26) or Openness (Δχ2 = 5.02, Δdf = 4, p = 0.29). Introducing the latent variable for Agreeableness produced a significant worsening of fit (Δχ2 = 8.87, Δdf = 2, p = 0.01). A similar worsening of fit was found for Conscientiousness (Δχ2 = 6.12, Δdf = 2, p = 0.05) and Neuroticism (Δχ2 = 35.52, Δdf = 4, p < 0.001). To summarize, an AE common pathways model was retained for Extraversion, an AE independent pathways model was retained for Agreeableness and Conscientiousness, an ACE independent pathways model was retained for Neuroticism, and an ACE common pathways model was retained for Openness.
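
The p values for these difference tests can be checked from the reported χ2 differences and degrees of freedom alone. The sketch below is an illustration rather than the authors' procedure: it computes the upper-tail χ2 probability using only the Python standard library, relying on the closed-form expressions that exist for integer degrees of freedom. The function name is hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square with integer df."""
    if df < 1 or x < 0:
        raise ValueError("df must be a positive integer and x non-negative")
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


# Difference statistics as reported in the text: (domain, chi-square difference, df difference).
REPORTED = [
    ("Extraversion", 2.70, 2),
    ("Openness", 5.02, 4),
    ("Agreeableness", 8.87, 2),
    ("Conscientiousness", 6.12, 2),
    ("Neuroticism", 35.52, 4),
]

if __name__ == "__main__":
    for domain, delta_chi2, delta_df in REPORTED:
        print(f"{domain}: p = {chi2_sf(delta_chi2, delta_df):.4f}")
```

Run on the values above, the function reproduces the reported p values to two decimals. The Conscientiousness comparison falls just below 0.05 before rounding, which is consistent with its being described as a significant worsening of fit.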

Table 8 Comparisons of independent pathways and common pathways models

Model misspecification analysis

While the χ2 difference tests reported above were capable of determining whether significant model misfit occurs when constraining the independent pathways model to the common pathways model, the tests do not offer information about where the misfit is occurring. There are a number of different modeling strategies available to evaluate and determine more desirable structural relations. One possible strategy would begin with the model that produced a significant increase in model misfit and relax some of the constraints or add parameters until acceptable fit to the data is obtained. Such a strategy would be particularly useful when these additions or modifications would have a priori theoretical or structural meaning. An alternative approach would be to comprehensively explore potential sources of misfit in a model without endorsing specific modifications to the model. Such a strategy is more concerned with describing sources of model misspecification and the plausible tenability of the restrictive model rather than an explicit attempt to elaborate on the model.

The current analysis took the latter of the above two approaches. We are not aware of any specific theoretical prediction about the facets utilized in this study in terms of special genetic or environmental effects between common variance components and facets, nor did we have any a priori expectations about this issue. Further, it is unclear what theoretical significance would attach to finding that, for example, the Industriousness facet of Conscientiousness requires an additional pathway from a global variance component to account for the component’s influence above and beyond that mediated by the latent trait. A second, more pragmatic, concern for conducting a theoretically driven modeling strategy is the limited number of facets (3–4) and variance components (sometimes only two) that were used for model comparison, which limits the number of possible modifications that can be made. Together, these concerns are substantial enough that any analysis based solely on improving the misfitting models used for comparison may be misleading. Therefore, we sought to descriptively, rather than theoretically, explore potential sources of model misfit that are found in the common pathways model. This was accomplished by freeing the common pathways-imposed proportionality constraint (see Appendix) from each A, C, and E component individually for each facet and noting the change in the χ2 value. This is a one degree of freedom difference test and indicates the overall amount of improvement in fit the model obtains by allowing a single variance component to no longer be constrained to be proportional with the other variance components of a specific facet.
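
The screening step described above amounts to comparing each one degree of freedom χ2 change against the conventional critical value of approximately 3.84 (alpha = 0.05). A minimal sketch follows. The χ2 changes in the example are hypothetical placeholders, not values from this study, and the function names are assumptions introduced for illustration.

```python
import math
from typing import Dict, Tuple

# Approximate 0.05 critical value for a chi-square with 1 df.
CRITICAL_1DF = 3.841


def p_value_1df(delta_chi2: float) -> float:
    """Upper-tail p value for a chi-square change with one degree of freedom."""
    if delta_chi2 < 0:
        raise ValueError("a chi-square change cannot be negative")
    return math.erfc(math.sqrt(delta_chi2 / 2.0))


def flag_improvements(
    deltas: Dict[Tuple[str, str], float], critical: float = CRITICAL_1DF
) -> Dict[Tuple[str, str], float]:
    """Return the (facet, component) entries whose chi-square change exceeds the critical value."""
    return {key: value for key, value in deltas.items() if value > critical}


if __name__ == "__main__":
    # Hypothetical chi-square changes from freeing one constraint at a time.
    example = {
        ("Facet X", "A"): 5.2,
        ("Facet X", "C"): 0.4,
        ("Facet X", "E"): 1.1,
    }
    for (facet, component), value in flag_improvements(example).items():
        print(facet, component, value, round(p_value_1df(value), 3))
```

Because each constraint is freed in a separate model, the flagged changes are not independent of one another and should not be summed.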

Figure 2 displays a bar graph of the improvement in model fit, measured as the change in χ2, obtained by freeing one parameter at a time. Of the 48 possible improvements in model fit, only five reach the critical value for a one degree of freedom χ2 test. No individual significant improvements were found for Extraversion, Openness, and Neuroticism. This was to be expected for Extraversion and Openness, as the common pathways model did not fit significantly worse than the independent pathways model for these traits. The finding is less clear for the Neuroticism domain, for which imposing the common pathways model produced a substantial increase in model misfit. The general trend is for small improvements in fit for each variance component for each facet. This indicates that rather than one variance component or a single facet being responsible for model misspecification, the decrements of model fit are due to minor, distributed effects. Models for Agreeableness and Conscientiousness that freed a single parameter displayed significantly improved fit, particularly for genetic and nonshared environmental effects. For Agreeableness, the majority of improvement was found for the facets of Trustfulness and Compassion, and for Conscientiousness, the majority of improvement was found for the facet of Industriousness. These specific sources of misfit should be considered tentative possibilities for modifying the common pathways model until they are replicated in future work. Additionally, it is important to keep in mind that the improvements in fit indicated by individual comparisons are not additive and may overlap. Due to these limitations and the descriptive nature of this analysis, we chose not to interpret these results further.

Fig. 2 Bar graph of the influence of each facet’s variance component parameters on improving model fit by sequentially freeing the proportionality constraint of the common pathways model

Discussion

How to interpret the structural organization of personality has long been a contentious issue in psychology. Behavior genetic research on personality structure takes up the challenge laid out by Eysenck (1992b) “to anchor our dimensions of personality in something more concrete than the morass of factor analysis” with an understanding that “biology supplies us with the necessary tools” to build consistent theories (p. 672). By applying multivariate behavioral genetic models to data from MZ and DZ twins reared together, we found that there is substantial complexity in the structure of genetic and environmental effects on personality traits. Of particular importance, we found evidence of statistically significant specific genetic influences on facets not accounted for by a common hierarchical trait as modeled in the common pathways model for ten of 16 facets. This implies that aggregating facet scores to produce domain scores overshadows nuances of the genetic effects and renders the domain scores not genetically crisp. The empirical story that the parameter estimates present is one of important genetic and environmental variation at the level of the facets that underlie broad domains.

Both the common and independent pathways models indicated common genetic influences on the facets within each domain, but comparison of differences in model fit indicated these common influences were not well accounted for by the operation of genetic and environmental influences on higher-order traits representing Neuroticism, Conscientiousness, and Agreeableness. The common pathways model did not fit significantly worse for Extraversion and Openness, which is consistent with a conceptualization of genetic and environmental effects being filtered through latent traits. Our results of a complex genetic and environmental structure of personality are generally consistent with the previous studies that have examined the factor structure of personality traits using biometric methods. Jang et al. (2002) found multiple genetic factors were required to explain the variance in the facets. This is similar to our finding that a simple latent trait perspective does not capture all of the common variance in personality constructs. Johnson and Krueger (2004) used similar common and independent pathways modeling techniques to examine their data. They found that the common pathways model fit for Extraversion and Neuroticism. We found that the common pathways model fit for Extraversion and Openness. While this provides replication of the unitary nature of Extraversion, it casts doubt on the replicability or structural cohesiveness of Neuroticism and Openness. Future research can help determine which results may be simply sample specific and which will be generalizable. For the remaining traits, the independent pathways model fit best. Even allowing for these inconsistencies, it is clear from the current and past results that the content and structure of personality facets go beyond what is found at the higher-order factor level.

The utility of the FFM is in its capacity to comprehensively (or near to it) describe patterns of variation in the behavioral tendencies of individuals in five simple factors. The common practice of analyzing data on composite scores of each of the five domains may have caused interesting and important associations between lower-order facets and outcomes to be masked. Our results indicate that, to advance personality theory, nuanced viewpoints of differential effects of facets should be further explored. For instance, finding that at the domain level certain outcomes are not related to personality does not imply that there will be no associations at the facet level. Additionally, relationships that are found at the domain level may move closer to uncovering mechanisms for the relationship when examined at the facet level. One successful example of this can be found from Luciano et al. (2006) who explored the relationship between Conscientiousness at the facet level and measures of IQ and academic achievement. Only certain facets, notably Competence, Dutifulness, and Deliberation, were found to be significantly associated with intelligence or achievement. This finding at the phenotypic level led the researchers to explore the genetic relationship of these variables and find strong evidence of a common genetic component for the personality and ability measures. This emphasizes the pragmatic utility of facet level measurement for uncovering unique genetic relationships between personality and other outcomes.

Based on the complex and somewhat dispersed structure of the FFM, should researchers shift their focus away from higher order traits and toward specific facets instead? We would argue that all levels of analysis have important uses within personality psychology (see Costa and McCrae 1995 and Paunonen and Ashton 2001b for discussion of when more or less detail is desirable). Costa and McCrae (1992b) claimed that “there are neurobiological structures that underlie such heritable personality traits” (p. 659). While we do not believe that multivariate analyses, be they phenotypic or behavioral genetic, can be used to directly evaluate this speculation, we do note that the biological substrate of personality need not be highly centralized in order for the shared functions of biological structures to be adequately described by a broad trait. For example, genome-wide association studies have indicated that complex and distributed interactions between many biological structures are likely crucial to larger constructs (Terracciano et al. 2010).

Costa and McCrae (1992b) have argued for measurement at the level of the Big Five Factors from a pragmatic perspective. They cite the extensive literature on the real-world correlations between important outcomes and personality traits as evidence for their utility (e.g. Costa et al. 1984; Barrick and Mount 1991; McCrae and Costa 1991; Costa et al. 1989; Digman and Takemoto-Chock 1981; Miller 1991). The issue of pragmatic utility is distinguishable from that of biological or psychological structure. As Cronbach and Meehl (1955) wrote:

Factors may or may not be weighted with surplus meaning. Certainly when they are regarded as ‘real dimensions’ a great deal of surplus meaning is implied, and the interpreter must shoulder a substantial burden of proof. The alternative view is to regard factors as defining a working reference frame, located in a convenient manner in the “space” defined by all behaviors of a given type. Which set of factors from a given matrix is “most useful” will depend partly on predilections, but in essence the best construct is the one around which we can build the greatest number of inferences, in the most direct fashion. (pp. 287–288)

Indeed, our results suggest that while the Big Five personality factors may function as highly useful working reference frames at a broad level, additional important variation occurs at the facet level. One important confirmation of this argument is the empirical finding of differential within-factor facet effects on outcomes (Paunonen 1998). The widespread existence of these effects and the inability of even the Big Five to act as latent traits that filter global genetic and environmental effects onto lower level facets raise serious concerns about the long-term benefits of studying higher-order factors, such as Stability and Plasticity or the general factor of personality. While these factors may be able to produce some replicable and consistent findings, their inability to account for the genetic variation among the facet-level traits renders fine-grained analysis more productive for future research. It is important to note that this study did not empirically evaluate the plausibility of higher order factors accounting for the genetic variation in the facets, but we would argue that it can be inferred from the current results that the Big Five factors could not account for all of the genetic variation in the facets. Thus, metatraits or the general factor (which typically use the Big Five as a measurement portion of a model) would similarly be unable to account for the genetic variance in the facets. Further, that the common pathways model fit significantly worse for three traits is evidence that the higher order factors may not be as unitary as phenotypic results may indicate. Future research should investigate the extent to which the etiological structure found in the current study is altered by incorporating higher-order structure. Rather than a broad general factor influencing a behavioral outcome, we would argue that, in terms of behavioral mechanisms and predictions as well as molecular searches, more frequent consideration of facet-level measurement would enhance the pragmatic utility of research programs.

Strengths and limitations

As a relatively new area of investigation, multivariate behavior genetic analysis of personality factor structure is in need of further exploration and replication. One limitation of the current study that is important for future research is the number of facets that act as indicators for each trait. By increasing the number of indicators, more complex models can be fit to the data. One such model is a Cholesky model (Loehlin 1996). This model imposes less structure than the independent pathways model and provides even more strength for a looser interpretation of the coherence of the facets if it fits the data best. We did not report results for this model because it requires at least four indicators in order to be distinguishable from the independent pathways model. A greater number of indicators would also allow one to utilize the approach taken by Jang et al. (2002) to determine the number of coherent genetic and environmental factors that are needed to account for the variance in the facets. They found two factors were needed indicating a lack of genetic unity among the indicators. Again, this method requires several indicators for each trait. Another approach could model genetic and environmental effects as having different structures. Future research that utilizes finer grained analysis of facet level indicators will possess greater flexibility to test these interesting and important possibilities.

A further limitation is the use of a non-standard measure of the FFM. We used a novel scoring algorithm of the CPI which has been shown to produce reliable scores for the Big Five across several samples (Soto and John 2009). However, the scale has not been used widely in research due to its recent construction. Additionally, our results do not perfectly match the findings of past research. Johnson and Krueger (2004) utilized a similar methodology in that they specified theoretically meaningful models, but they classified Neuroticism and Openness differently. While they found that the common pathways model fit Neuroticism and the independent pathways model fit Openness, we found the opposite result. This may have resulted from the different scales utilized in the studies. However, as Costa and McCrae (1992b) point out, “the five-factor model should not be identified with any of its operationalizations; it is an evolving scientific construct, not an instrument” (p. 653), a logic that originally derives from Spearman’s (1927) theorem known as “indifference of the indicator.” To the extent that the scales used between different studies measure the same latent trait, the differences between them are unimportant. Replication of findings across both different samples and different methods of measurement is much stronger than simple replication across samples (Lykken 1968).

Another explanation for the differences between this and other studies could be the age of the sample. Johnson and Krueger’s (2004) sample was composed primarily of middle-aged individuals, and the sample recruited by Jang et al. (2002) was from the general population with a large age range and an average age over 30 years. The current sample was composed primarily of 17-year-old adolescents. In both of the previous studies, the effect of the shared environment could be dropped to yield a more parsimonious AE model without significantly worsening fit for all traits, which was not true for the current sample. Many examples and theories exist as to why estimates of quantitative genetic effects typically increase with age while shared environmental effects typically decrease. For example, Scarr and McCartney (1983) have hypothesized that as children grow and develop, they are more able and encouraged to create their own environments rather than live in the same family environment. The data analyzed for this study came from a sample of teenagers who had not yet (or had very recently) moved out of their parents’ home, whereas in the previous studies the majority of twins no longer lived together, as they were mostly mature adults. The effect that this would have on the distribution of global and specific genetic and environmental effects, in addition to the effect on model comparisons, is unclear. Future longitudinal work should investigate the multivariate biometric structure of personality at different developmental stages as well as the influence that genes or the environment play in change and stability over time.

Several cautionary notes should also be made of the current findings. This study made use of a fairly large sample of twins. However, multivariate behavioral genetic models must estimate numerous parameters. As such, the finer grained distinctions of parameter estimates should be considered preliminary findings until further replication and confirmation with different and larger samples are obtained. The minor discrepancies between the obtained and expected factor models for the CPI-Big Five also require some caution when drawing conclusions. Adventurousness was the only facet that did not have its primary loading on the intended trait for both twins which may indicate that the sample of relatively high achieving students who took the NMSQT responded differently to items assessing this construct than participants in the original samples (Soto and John 2009). However, factor and variable congruence coefficients were all very high indicating that the facets utilized in this study are largely the same as found in Soto and John (2009). This demonstrates the usefulness of the new scoring algorithm which opens a large body of previously collected data to be transformed into facets and factors that are relevant in current personality psychology.

In light of the minor discrepancies between the current factor model and that reported by Soto and John (2009), model fit statistics for the biometric model comparisons may be affected by the lack of coherence in the measurement model, specifically for Adventurousness. In our analyses, the majority of the ACE variation components influencing Adventurousness were found at the specific rather than the common level in both the independent and common pathways models. In the common pathways model that was retained for Openness, the factor loading for Adventurousness was only 0.23 in comparison to 0.58 and 0.66 for Idealism versus Conformity and Intellectualism, respectively. Keeping in mind that the comparison between the common and independent pathways models only takes into account differences at the common or global levels (all specific effects are modeled exactly the same in both model types), it makes sense that Openness would be less likely to display significant misfit when fitting the constrained model. The global effects are primarily dependent on the relationship between two facets with Adventurousness not being strongly related at the global level. Thus, constraining the three facets to have a common structure produces little model misfit because the majority of the variation for Adventurousness is found at the specific level for both the common and the independent pathways models. As such, the results regarding Openness in this study may be sample specific and should be interpreted with caution.

The example of Openness illustrates both a further limitation and a strength of this study. As mentioned, the model comparisons performed in the study only took into account differences in the misfit of global effects. Models were specified in this way to mimic the hypothesized hierarchical structure of personality traits. However, concluding that Openness is best represented by the common pathways model when one of its facets was largely independent of the other two is problematic, and calls for careful thinking and interpretation rather than strict reliance on statistical tests. In this instance, the preliminary descriptive goal of the present study, to explore the general structure of the genetic and environmental effects with a relatively unconstrained model, becomes a crucial aid to interpretation. Further, the analysis of where misfit occurs for each of the Big Five provides additional information about the ability to structure specific effects as common or independent. The central finding of the current study is that there are significant facet-specific genetic effects that are obscured when facets are aggregated to the factor level.

Finally, it is worth noting that all models fitted in the current project had good RMSEA values, indicating that both the common and independent pathways models were adequate approximations of the observed data. The χ² difference tests that we used to compare models are sensitive to departures from exact fit, whereas RMSEA is an index of approximate fit. The results of the model comparisons can therefore be interpreted as differences in exact fit to the data, even though both models fit reasonably well. However, the low RMSEA values for the common pathways model might not have persisted had more indicators been available for each factor. As the number of indicators increases, the differences between the common and independent pathways models also increase, both in degrees of freedom and in the amount of constraint placed on the facet structure. Additionally, one might argue that our χ² difference tests were somewhat overpowered, in light of the good overall fit of both model types. Although the χ² difference test is indeed sensitive to sample size, in the current study the test indicated that the more parsimonious common pathways model did not fit significantly worse than the independent pathways model for two traits. Had the sample size been so large as to overpower the χ² difference test, one would expect the common pathways model to be rejected in all instances. Because the model fit statistics were relatively good for both models, it is more appropriate to view the difference test as shedding light on whether common genetic and environmental effects can be constrained to be proportional across facets than as proving one model “right” or “wrong.” Both models are largely “right,” in that they adequately approximate the observed patterns of relations in the raw data.
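
The distinction between the two kinds of fit information can be made concrete with a small sketch. All fit values and the sample size below are hypothetical and chosen only for illustration; they are not results from this study. The RMSEA formula is the conventional single-group point estimate, which may differ from what a given structural equation modeling package reports for multiple-group twin models, and the helper names are introduced here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability, using the power series for the
    regularized lower incomplete gamma function (adequate for moderate x)."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def rmsea(chi2, df, n):
    """Conventional single-group RMSEA point estimate."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def chi2_difference(chi2_constrained, df_constrained, chi2_full, df_full):
    """Chi-square difference test for nested models."""
    delta = chi2_constrained - chi2_full
    delta_df = df_constrained - df_full
    return delta, delta_df, chi2_sf(delta, delta_df)


# Hypothetical fit results (illustrative only).
n_obs = 500
common_fit = {"chi2": 48.0, "df": 40}       # constrained: common pathways
independent_fit = {"chi2": 30.0, "df": 32}  # less constrained: independent pathways

delta, delta_df, p_value = chi2_difference(
    common_fit["chi2"], common_fit["df"],
    independent_fit["chi2"], independent_fit["df"],
)
rmsea_common = rmsea(common_fit["chi2"], common_fit["df"], n_obs)
rmsea_independent = rmsea(independent_fit["chi2"], independent_fit["df"], n_obs)

print(f"delta chi2 = {delta:.1f} on {delta_df} df, p = {p_value:.4f}")
print(f"RMSEA: common = {rmsea_common:.3f}, independent = {rmsea_independent:.3f}")
```

With these made-up inputs, both models have RMSEA values well under common cutoffs, yet the difference test still flags the constraint as costly at the 0.05 level. This is the situation described above: the difference test speaks to whether the proportionality constraint is tenable, not to whether either model approximates the data adequately.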

The results of the current study imply that the common genetic and environmental effects on the facets of Extraversion and Openness act in a more uniform manner than the common effects on the facets of Agreeableness, Conscientiousness, and Neuroticism. Most importantly, all facets displayed specific variation distinct from that of the other facets within a trait, and some displayed substantial genetic variation not shared with the higher-order trait. Overall, these results indicate that a complete understanding of human personality will require research spanning multiple levels of both description and explanation.