1 Introduction

A well-known feature of the US economy is its high level of income inequality. Recent research has also documented that intergenerational income mobility in the US is quite low compared to other industrialized countries (Corak 2006), suggesting that family resources during childhood may play an important role in determining future economic success. Few studies, however, have analyzed the contribution of family background as either a direct or indirect influence in explaining income inequality. For example, there might be important characteristics such as cognitive skills, looks, or social contacts that are transmitted by parents through either “nature” or “nurture” and that are rewarded by the labor market.

A useful way to measure the importance of family background is to examine the sibling correlation in economic outcomes. The sibling correlation provides a broad measure of the overall importance of a wide variety of factors common to the family, ranging from parental involvement to school and neighborhood quality.

Only a few previous studies have used large national samples to examine this measure for the US with estimates in the 0.3 to 0.45 range. I contribute to this literature in several ways. I employ variance component models using restricted maximum likelihood (REML) that have desirable statistical properties—most importantly, consistency—lacking in previous analyses. I also use much larger US samples from the National Longitudinal Survey of Youth (NLSY) containing many more siblings than previous work for a more recent set of cohorts. I present results on a variety of outcomes (earnings, family income, wages, and hours worked) for both men and women. I also contrast the correlation in economic outcomes to the correlation in a broad range of noneconomic outcomes.

A key finding is that the sibling correlation in economic outcomes in the US is around 0.5—suggesting that half of economic inequality may be attributed to family background. It also implies that intergenerational economic mobility is quite low. Solon et al. (1991) conduct a simulation to show that such a high sibling correlation implies that less than 7% of individuals who are near the bottom of the distribution of the family component of permanent income are likely to have their own earnings or wages surpass the median.

In addition, the sibling correlation in human capital (years of schooling and Armed Forces Qualification Test [AFQT] scores) is found to be around 0.6, which is higher than most previous estimates. In contrast, the sibling correlation for a variety of nonhuman capital measures is found to be much lower. It may be especially surprising to note that even measures of physical attributes such as height and weight, which presumably have a strong genetic component, are not as highly associated between brothers as is the permanent component of wages. This strongly suggests that there are factors related to individual or family decision making that lead to a high degree of similarity in the economic fortunes of siblings rather than some simple mechanical relationship. The fact that there is reasonably strong evidence that the sibling correlation is lower in Nordic countries (Björklund et al. 2002) also suggests that economic mobility may be influenced by institutional differences or policy interventions.

This study also attempts to identify some of the underlying channels by which family and community affect future economic outcomes. Solon (1999) writes, “The mystery of what underlies the considerable resemblance between brothers in their long-run earnings remains a fascinating puzzle and should be a priority for continuing research.” A decomposition analysis finds that human capital acquisition can explain at most about 50% of the sibling correlation in earnings and wages. Other characteristics including “noncognitive” factors emphasized in some recent economic studies (e.g., Osbourne-Groves (2005), Heckman and Rubinstein (2001), and Dunifon et al. (2001)) are also found to be important. These results provide some initial clues for future research.

2 Background and previous US studies

A long literature in sociology and economics has tried to estimate the importance of family background on children’s future economic success. Early studies often encountered important data limitations such as small intergenerational samples and missing variables on key family background characteristics. Beginning in the 1970s, researchers began to examine the sibling correlation as an alternative approach to measuring the importance of family background (e.g., Corcoran et al. 1976).

Conceptually, the sibling correlation in economic outcomes provides a summary statistic that captures all of the effects of sharing a common family as well as any other shared factors (e.g., common neighborhoods, school quality). Conversely, many aspects of family background, including genetic traits that siblings do not share and sibling-specific parental behaviors, will not be captured. If the similarity in, say, wages between siblings is not much different from that between randomly chosen individuals, then we would expect a small correlation. If, however, a large fraction of the variance in wages is due to factors common to growing up in the same family environment, then the correlation might be sizable. In that sense, the sibling correlation tells us how much of inequality is due to differences between families.

The earliest studies in the US were conducted before large nationally representative data became available, and they typically used only a single year of earnings for each sibling. The results varied widely but were centered at around 0.25 (Solon 1999). Only a few studies have produced estimates of the sibling correlation in permanent economic status in the US. Solon et al. (1991) estimate the brother correlation in the permanent component of log annual earnings to be 0.34 when using the nationally representative portion of the Panel Study of Income Dynamics (PSID). The estimate rises to 0.45 when they include the oversample of poor families in the PSID. Björklund et al. (2002) focus on men and use the nationally representative portion of the PSID over a longer time period and produce estimates between 0.42 and 0.45. Both studies use analysis of variance (ANOVA) to produce estimates of variance components which, in turn, are used to assemble estimates of the sibling correlation in the permanent component of log earnings.

There are also two studies that use the NLS original cohort of young men who are tracked from 1966 to 1981. Altonji and Dunn (1991) estimate the correlation in the permanent component of a variety of economic outcomes using both a time-averaging approach and a method-of-moments estimator. Their estimates of the brother correlation in log annual earnings are 0.32 and 0.37. For log hourly wages, their estimates are 0.33 and 0.42. Ashenfelter and Zimmerman (1997) report a brother correlation coefficient of 0.31 in log annual wages averaged over 1978 and 1981. Both studies only include individuals from multiple sibling families and use the oversample of black households without sampling weights. These results appear to support Solon’s (1999) conjecture that the brother correlation in permanent status in the US is about 0.4.

Only a few studies have investigated which factors drive the sibling correlation in economic outcomes. Altonji and Dunn (2000) find evidence of linkages between family members (including siblings) in unobserved preferences for work hours using a factor model on US data. Solon et al. (2000) find that little of the sibling correlation in years of schooling in the US can be explained by neighborhood effects. Raaum et al. (2006) reach a similar conclusion for Norway. Using a Swedish dataset containing specific information on various sibling types, Björklund et al. (2005) decompose the sibling correlation in earnings into genetic and environmental components. The results are sensitive to the specification but suggest that there is a large genetic component.

3 Statistical models and estimation

The following statistical framework based on Solon et al. (1991) is used to measure the sibling correlation in economic outcomes. Each economic outcome (e.g., wages, income) is denoted by \( y_{ijt} \), where i indexes families, j indexes siblings, and t indexes years. Outcomes are then modeled as follows:

$$ y_{ijt} = \beta X_{ijt} + \varepsilon_{ijt} \tag{1} $$

Here, the vector \( X_{ijt} \) contains age and year dummies to account for lifecycle effects and year effects such as business cycle conditions. These are treated as fixed effects. The residual, \( \varepsilon_{ijt} \), which is purged of these effects, is then decomposed as follows:

$$ \varepsilon_{ijt} = a_{i} + u_{ij} + v_{ijt} \tag{2} $$

The three terms on the right-hand side of (2) are treated as random effects that are assumed to be independent of each other. The first term, \( a_{i} \), is the permanent component that is common to all siblings in family i. The second term, \( u_{ij} \), is the permanent component that is individual-specific. The third term, \( v_{ijt} \), is the transitory component that reflects noise due to either temporary shocks to earnings or measurement error in the survey. The variance of age-adjusted earnings, \( \varepsilon_{ijt} \), is then simply:

$$ \sigma^{2}_{\varepsilon} = \sigma^{2}_{a} + \sigma^{2}_{u} + \sigma^{2}_{v} \tag{3} $$

The first term, \( \sigma^{2}_{a} \), captures the variance in permanent economic outcomes that is due to differences between families, whereas the second term, \( \sigma^{2}_{u} \), captures the variance in permanent economic outcomes within families. These two components are then used to calculate the correlation in permanent outcomes between siblings, ρ, which is the focus of this analysis:

$$ \rho = \frac{\sigma^{2}_{a}}{\sigma^{2}_{a} + \sigma^{2}_{u}} \tag{4} $$

This is also equivalent to the fraction of the overall variance of the permanent components that is due to shared family and community background.
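To make equation (4) concrete, the following is a minimal simulation sketch rather than part of the estimation itself. It draws a shared family component and two individual components under the independence assumption of (2), then checks that the ordinary correlation between the two siblings' permanent outcomes is close to ρ. The function names, the normal draws, and the variance values are illustrative assumptions, not quantities taken from the data.

```python
import random


def sibling_correlation(var_family, var_individual):
    """Equation (4): share of permanent variance that siblings have in common."""
    return var_family / (var_family + var_individual)


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulated_pair_correlation(var_family, var_individual,
                               n_families=20000, seed=0):
    """Correlation of permanent outcomes across simulated two-sibling families."""
    rng = random.Random(seed)
    sd_a, sd_u = var_family ** 0.5, var_individual ** 0.5
    first, second = [], []
    for _ in range(n_families):
        a = rng.gauss(0.0, sd_a)                  # shared component a_i
        first.append(a + rng.gauss(0.0, sd_u))    # sibling 1: a_i + u_i1
        second.append(a + rng.gauss(0.0, sd_u))   # sibling 2: a_i + u_i2
    return pearson(first, second)


if __name__ == "__main__":
    # Hypothetical variances chosen only so that rho = 0.5.
    print(sibling_correlation(0.2, 0.2))
    print(simulated_pair_correlation(0.2, 0.2))
```

The simulated correlation matches the formula only up to sampling noise, which shrinks as the number of simulated families grows.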

Solon et al. (1991) and Björklund et al. (2002) use a two-step approach to estimate the variance components in this “mixed model” (mixed because it contains both fixed and random effects). First, they use a regression to estimate (1) and to produce the residuals. Then they apply classical ANOVA formulas to the residuals, adjusted for the fact that the data are “unbalanced” (the number of siblings varies by family, and the number of available years varies by sibling). The adjusted ANOVA formulas may be found in the Appendix of Solon et al. (1991).
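The second step can be illustrated with a sketch of the textbook ANOVA estimator for the simplest case: balanced data, where every family has the same number of siblings and every sibling the same number of years. This is not the unbalanced-data adjustment used in the studies cited above; it only shows how mean squares map into the three variance components. The simulated residuals, function names, and parameter values are assumptions for illustration.

```python
import random


def simulate_balanced(n_families, n_sibs, n_years, var_a, var_u, var_v, seed=0):
    """Simulate residuals e_ijt = a_i + u_ij + v_ijt as data[i][j][t]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_families):
        a = rng.gauss(0.0, var_a ** 0.5)
        family = []
        for _ in range(n_sibs):
            u = rng.gauss(0.0, var_u ** 0.5)
            family.append([a + u + rng.gauss(0.0, var_v ** 0.5)
                           for _ in range(n_years)])
        data.append(family)
    return data


def mean(xs):
    return sum(xs) / len(xs)


def anova_components(data):
    """Balanced nested ANOVA estimates of (var_a, var_u, var_v)."""
    F, S, T = len(data), len(data[0]), len(data[0][0])
    sib_means = [[mean(sib) for sib in fam] for fam in data]
    fam_means = [mean(m) for m in sib_means]
    grand = mean(fam_means)
    # Mean squares between families, between siblings within families,
    # and across years within siblings.
    ms_family = S * T * sum((m - grand) ** 2 for m in fam_means) / (F - 1)
    ms_sibling = T * sum((sib_means[i][j] - fam_means[i]) ** 2
                         for i in range(F) for j in range(S)) / (F * (S - 1))
    ms_error = sum((y - sib_means[i][j]) ** 2
                   for i in range(F) for j in range(S)
                   for y in data[i][j]) / (F * S * (T - 1))
    # Solve the expected-mean-square equations for the components.
    var_v = ms_error
    var_u = (ms_sibling - ms_error) / T
    var_a = (ms_family - ms_sibling) / (S * T)
    return var_a, var_u, var_v


if __name__ == "__main__":
    data = simulate_balanced(3000, 2, 4, var_a=0.2, var_u=0.2, var_v=0.3)
    var_a, var_u, var_v = anova_components(data)
    print(var_a, var_u, var_v, var_a / (var_a + var_u))
```

Because the estimates come from differences of mean squares, nothing in these formulas prevents a negative variance estimate in small samples.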

Although ANOVA estimators of variance components have some desirable statistical properties for balanced data, virtually none of these properties transfer over to the case of unbalanced data. For this reason, the preferred approach is to use “REML”, which has a number of advantages such as consistency, asymptotic normality, and a known asymptotic sampling dispersion matrix. REML requires an assumption that the data are normally distributed. For many of the outcomes considered here (e.g., log wages, height), this is not likely to be a major factor. For other outcomes such as education, the assumption of normality may be more suspect. Until recent years, computational limitations also made practical implementation of maximum likelihood more difficult.

REML partials out the fixed effects and maximizes the likelihood of the residuals containing the random effects variance–covariance structure. Searle et al. (1992) conclude after an extensive review of approaches for estimating variance components that “It is our considered opinion that for unbalanced data each of ML and REML are to be preferred over any ANOVA method.” REML also appears to be the preferred estimator among quantitative geneticists (Meyer and Hill 1991; Visscher 1998).

A comforting feature of REML is that it produces identical results to ANOVA when the data are balanced. I will present some results using both approaches to show that although the results differ they are not dramatically different. Another nice feature of REML is that it directly produces standard errors of the variance components (standard errors for ρ are calculated by the bivariate delta method).
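As a sketch of the delta-method step, the function below takes estimates of the two permanent variance components together with their sampling covariance matrix and returns ρ and an approximate standard error. The gradient follows directly from equation (4). The function name and the numbers in the usage example are hypothetical, and whether this matches the exact implementation used for the reported standard errors is an assumption.

```python
import math


def rho_with_delta_se(var_a, var_u, cov):
    """Return (rho, se) using a first-order (delta method) approximation.

    cov is the 2x2 sampling covariance matrix of the estimates
    (var_a, var_u), given as nested lists.
    """
    total = var_a + var_u
    rho = var_a / total
    # Partial derivatives of rho = var_a / (var_a + var_u).
    g_a = var_u / total ** 2
    g_u = -var_a / total ** 2
    variance = (g_a * g_a * cov[0][0]
                + 2.0 * g_a * g_u * cov[0][1]
                + g_u * g_u * cov[1][1])
    return rho, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical estimates and covariance matrix, for illustration only.
    rho, se = rho_with_delta_se(0.2, 0.2, [[0.0004, 0.0], [0.0, 0.0004]])
    print(rho, se)
```

The off-diagonal covariance term matters in practice: the two component estimates are typically negatively correlated, which changes the standard error of ρ relative to treating them as independent.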

To understand how different observable characteristics (e.g., parent income, schooling) influence the sibling correlation in economic outcomes, I calculate an estimate of the contribution of various factors. I add the relevant variables to the vector X in (1) and treat them as additional fixed effects in the REML framework. The inclusion of additional fixed effects should absorb some of the residual variation in the outcome variable and produce lower estimates of the family component \( (\sigma^{2*}_{a}) \) than what was found without their inclusion \( (\sigma^{2}_{a}) \). I then take the reduction in the variance of the family component \( (\sigma^{2}_{a} - \sigma^{2*}_{a}) \) as an estimate of the amount of the overall variance of the family component that can be attributed to the specific factor(s) in question. This provides an upper-bound estimate of the causal effect because it includes all omitted factors that are also correlated with the included fixed effects. For example, the reduction in \( \sigma^{2}_{a} \) due to the inclusion of years of schooling would comprise both the direct effects of schooling and any omitted factors (e.g., perseverance) that also contribute to the outcome variable and are correlated with years of schooling. The change in the variance of the family component divided by the overall variance of the permanent component tells us what fraction of the overall sibling correlation is due to the factor(s) in question. Implementing this approach for a wide variety of possible explanatory variables, either by including them one at a time or all at once, should tell us something about which measures are critical to explaining the correlation in economic outcomes.
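The arithmetic of this decomposition can be sketched as follows, under one reading of the procedure: the reduction in the family variance is divided by the permanent variance from the baseline model, which gives the portion of ρ (in correlation units) associated with the added controls, and dividing that by ρ gives the percentage share. The function name and all numbers are hypothetical.

```python
def contribution_to_rho(var_a, var_u, var_a_controlled):
    """Return (points, share) for a set of added controls.

    var_a, var_u      -- baseline family and individual variance components
    var_a_controlled  -- family component after adding the controls to X
    points            -- part of rho attributable to the controls
    share             -- that part as a fraction of rho
    """
    permanent = var_a + var_u
    rho = var_a / permanent
    reduction = var_a - var_a_controlled
    points = reduction / permanent
    share = points / rho  # algebraically equal to reduction / var_a
    return points, share


if __name__ == "__main__":
    # Hypothetical components: rho = 0.5 at baseline, and the family
    # variance falls from 0.25 to 0.15 once the controls are included.
    points, share = contribution_to_rho(0.25, 0.25, 0.15)
    print(points, share)
```

In this made-up case the controls account for 0.2 of the 0.5 correlation, or 40% of it, which should be read as an upper bound for the reasons given above.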

I also investigate the sibling correlation in several noneconomic outcomes where I do not need to examine multiple measurements at different points in time. This requires a far simpler model in which I drop \( v_{ijt} \) from (2) and use REML to estimate the two variance components \( \sigma^{2}_{a} \) and \( \sigma^{2}_{u} \), from which I calculate ρ. In an earlier version of this paper (Mazumder 2004), I compared the REML estimates to a different approach employed by Solon et al. (2000) and found that the results were not very different.

4 Data

The analysis uses the NLSY79, which followed individuals between the ages of 14 and 21 on December 31, 1978 every year from 1979 through 1994 and then every other year. For economic outcomes, I use the NLSY data through the 1998 survey. The survey includes an oversample of black, Hispanic, and (nonblack, non-Hispanic) disadvantaged families. However, the NLSY also identifies a nationally representative cross-section of families. I use both the full sample, weighted with survey-year weights, and the nationally representative sample without weights.

I use men and women between the ages of 14 and 22 in the initial survey in 1979. I examine four economic outcomes: log annual earnings, log annual family income, log hourly wages, and log annual hours. The outcome variable must be observed and positive at least once when the respondent is at least 26 years old and not enrolled in school. Earnings include wage and salary income as well as business income. I also imposed the following sample restrictions for each outcome: earnings and family income had to be at least $500 in 1979 dollars; wages had to be at least $0.50 and no greater than $100; and annual hours had to be at least 100. The NLSY identifies up to five siblings for each individual. Siblings are identified based on variables that attempt to identify precise family relationships between household members. I do not include nonbiological siblings (e.g., stepbrothers, brothers-in-law, foster brothers, adopted brothers). Only siblings born within the 8-year cohort window are tracked. This restriction on the difference in ages between siblings is very similar to previous studies.
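The sample restrictions above can be written as a simple filter. This is a sketch that assumes the thresholds are applied to each person-year observation and that values have already been converted to the relevant dollar basis; the outcome labels, argument names, and function name are hypothetical and do not correspond to actual NLSY variable names.

```python
# (lower bound, upper bound) for each outcome; None means no upper bound.
LIMITS = {
    "earnings": (500.0, None),        # 1979 dollars
    "family_income": (500.0, None),   # 1979 dollars
    "hourly_wage": (0.50, 100.0),
    "annual_hours": (100.0, None),
}

MIN_AGE = 26


def keep_observation(outcome, value, age, enrolled):
    """Return True if a person-year observation passes the restrictions."""
    if value is None:                 # outcome not observed
        return False
    if age < MIN_AGE or enrolled:     # too young or still in school
        return False
    low, high = LIMITS[outcome]
    if value < low:
        return False
    return high is None or value <= high


if __name__ == "__main__":
    print(keep_observation("hourly_wage", 12.0, age=30, enrolled=False))
    print(keep_observation("earnings", 400.0, age=30, enrolled=False))
```

A respondent would then enter the analysis sample for an outcome if at least one of his or her observations passes the filter.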

I also examine a variety of noneconomic outcomes. For education, I measure years of completed schooling by age 26. For test scores, I use the AFQT. The AFQT is part of the Armed Services Vocational Aptitude Battery, a battery of ten tests given to applicants to the US military. The AFQT score is based on four of the tests that focus on reading skills and numeracy and is a primary criterion for enlistment eligibility. I use the percentile ranking for the renormed (1989 version) score. The test was administered to nearly all respondents in the NLSY in 1980 to provide new norms for the test based on a nationally representative sample. The AFQT is not meant to be viewed as a measure of general intelligence or IQ. For illegal drug use, I use a 1988 survey question asking how many times the respondent has used marijuana or hashish in their lifetime. The responses are presented in five categorical groups. I use the type of residence variable from all survey years starting in 1983 to determine if individuals were ever in jail at the time of the interview. For women, I examine the age of first pregnancy (for the subset of women who were ever pregnant). I look at three measures of physical attributes, height (in 1985), weight at age 28 or 29, and body mass index (BMI). Unfortunately, height is only asked in 1985 so I am unable to hold the age at which height is measured fixed. However, the youngest respondents who are born in 1964 will be 21 by 1985. Note that height is also used to calculate BMI.

Finally, I look at two attitudinal measures from the psychology literature. The first measure I use is the “Rotter scale,” which measures the degree to which individuals feel they have control over their lives. See Osbourne (2000) for a discussion of this measure. The second measure is a self-esteem scale, which combines responses to ten questions designed to determine respondents’ views of self-worth. In all the samples that examine noneconomic outcomes, I require that observations do not have missing data on the relevant outcome.

In most of the analysis, the samples include siblings as well as non-siblings or “singletons” in estimating population variances. I do this to increase efficiency and to maintain comparability with Solon et al. (1991) who had too small a sample of siblings in the PSID to confine the analysis only to multiple sibling families. Solon et al. speculate that including singletons in the analysis may lead to an overestimate of ρ if outliers tend to be more common among singletons than siblings. This is because although singletons’ earnings are used to calculate \( \sigma ^{2}_{a} \), the variance of the family component used in both the numerator and denominator of ρ, they are not included in \( \sigma ^{2}_{u} \), the variance of the individual component, which is only in the denominator of ρ. In the results that follow in the next section, I conduct several robustness checks that include using a sample of only siblings. A set of summary statistics is provided in Table 1.

Table 1 Selected summary statistics for men

5 Sibling correlations in economic and noneconomic outcomes

5.1 Economic outcomes

I start by contrasting a few selected results with those obtained by using ANOVA formulas. In Table 2, I present estimates of the variance components and the sibling correlation in annual earnings and hourly wages separately for brothers and sisters. The ‘Brothers’ part shows the results for men, and the ‘Sisters’ part shows the results for women. The estimated brother correlations are 0.492 and 0.536 for earnings and wages, respectively. Although these results are similar to the higher set of estimates in Solon et al. (1991), they are more robust due to the significantly larger number of observations used. For example, for men, Solon et al. use around 2,500 observations, 750 individuals, and 600 families, whereas this analysis uses 30,000 observations, 5,000 individuals, and 4,000 families.

Table 2 Sibling correlations in earnings and wages, REML vs ANOVA

The results are higher for both outcomes when using REML, although the differences are not statistically significant. As REML is known to produce consistent estimates, and as the NLSY samples are quite a bit larger, these results suggest that the consensus view of a sibling correlation in log earnings of around 0.4 expressed in Solon (1999) perhaps ought to be revised up to near 0.5. The results also suggest that the correlation in log wages appears to be higher than 0.5, which is also consistent with Solon et al.’s (1991) finding. In any case, these results reinforce the main point in Solon et al. (1991) that accounting for the transitory variance is critical. If I were to use data from just a single year, the implied correlation in brothers’ wages would be just 0.306.
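The role of the transitory variance can be sketched with the variance components of equation (3). Under the model's independence assumptions, the sibling correlation in a T-year average of the outcome is the family variance divided by the permanent variance plus the transitory variance scaled by 1/T, so a single year (T = 1) understates ρ. The functions and the component values in the first example are illustrative; the second example uses only the two wage correlations reported in the text, on the assumption that the single-year figure is computed in this way.

```python
def correlation_of_average(var_a, var_u, var_v, n_years=1):
    """Sibling correlation in an n-year average of the outcome."""
    return var_a / (var_a + var_u + var_v / n_years)


def implied_transitory_share(single_year_corr, permanent_corr):
    """Share of total single-year variance that is transitory.

    Uses single_year_corr / permanent_corr
        = (var_a + var_u) / (var_a + var_u + var_v).
    """
    return 1.0 - single_year_corr / permanent_corr


if __name__ == "__main__":
    # Hypothetical components: rho = 0.5, but one year of data shows less.
    for years in (1, 3, 5, 10):
        print(years, correlation_of_average(0.2, 0.2, 0.3, years))
    # Brothers' wages: 0.306 from a single year versus 0.536 permanent.
    print(implied_transitory_share(0.306, 0.536))
```

On this reading, roughly 43% of the single-year variance in brothers' log wages would be transitory.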

To ensure that the difference in estimates by statistical method was not due to a dataset effect, I also conducted an analysis using the PSID. I first attempted to replicate the baseline results in Solon et al. (1991) using ANOVA. Compared to their estimate of a sibling correlation in log annual earnings of 0.342, I obtain an estimate of 0.310. When I use REML instead, my estimate increases to 0.385. The increase of 0.075 is roughly comparable to the magnitude of the increases I find for men in Table 2 when moving from ANOVA to REML (0.062 for earnings and 0.066 for wages). Therefore, I conclude that the higher estimates from REML are not a result of using any particular dataset.

The results for the sister correlation in earnings and wages are somewhat lower at 0.340 and 0.360, respectively. This is not so surprising because women’s labor force participation patterns during their 20s and 30s are very different from men and may produce much noisier estimates of long-term economic status for these outcomes. Indeed, both the permanent individual component and the transitory components are dramatically higher than for men. I will look at the correlation in family income, a broader measure of economic status, in the next set of results. The ANOVA results are a bit lower for women’s earnings but are actually higher for women’s wages.

In Table 3, I confine the results to men and only present REML estimates. I present the same estimates for earnings and wages as in Table 2, but I now add family income and annual hours as additional outcomes. In column 1, I employ the base specification, which uses the full sample of men including singletons, and weight the sample with survey year weights. The brother correlation in family income is estimated to be 0.466, slightly lower than the correlation in earnings, 0.492. As family income includes spouse income, it may be that assortative mating acts to lower the correlation between brothers in this outcome. The correlation in annual hours is estimated to be just under 0.4, which is similar to the reported results in Solon et al. (1991) and the method-of-moments results in Altonji and Dunn (1991).

Table 3 Brother correlation in economic outcomes

In Table 3, I also test whether the results are sensitive to the use of weights on the oversample of poorer families. In column 2, I use only the sample of families that are identified as nationally representative in 1979. This results in keeping a little more than half the observations. The estimates drop only slightly for three of the four outcomes and are virtually identical for the fourth.

I also experiment with confining the sample to only individuals from families with multiple siblings. In column 3, I show that compared to column 1, the results are virtually identical when using a sibling-only sample. This is true although only about a third of the total observations are used. In column 4, I combine both conditions (a nationally representative sample of only siblings). This strongly suggests that including singletons has little effect on the results. I have also experimented with changing the age cutoff and found that the results are not sensitive to these perturbations. For example, if I restrict the age of observation to be at least 28, the estimate of the brother correlation in wages falls slightly from 0.54 to 0.53. For those aged at least 30, the estimate falls to 0.51. One issue is that raising the age cutoff also necessitates dropping some of the sample years. For example, an age restriction of 34 would confine my sample to just 5 years, 1991–1993, 1995, and 1997.

In Table 4, I do the same analysis for women. The estimates for the correlation in family income among sisters are indeed much higher than for the other outcomes. In fact, they are virtually identical to the correlation between brothers. For the nationally representative samples (columns 2 and 4), the correlation in family income is actually higher among sisters than among brothers (Table 3). Chadwick and Solon (2002) similarly find that the intergenerational elasticity in family income for daughters is close to that found for sons. The correlation in annual hours is very low, which is not surprising given the wide variance in labor force participation. Having established that, with respect to the most comprehensive measure of economic status, family income, the sibling correlation is essentially the same for men and women, I proceed with most of the remaining analysis focusing just on men.

Table 4 Sister correlation in economic outcomes

5.2 Sibling correlations in noneconomic outcomes

Table 5 presents the results for a variety of noneconomic outcomes for brothers, sisters, and all siblings as appropriate. This serves as a useful point of contrast with the economic outcomes and is also interesting in its own right. The analysis includes singletons using only the nationally representative portion of the NLSY. Including the oversample of poorer and minority households without weights results in similar estimates. I found, however, that when I used weights on the noneconomic outcomes, it sometimes led to implausibly large estimates. Therefore, I chose to only use the representative sample without weights when studying noneconomic outcomes.

Table 5 Sibling correlation in selected noneconomic outcomes

The correlation in years of schooling appears to be roughly similar for both brothers and sisters, and the correlation across all sibling pairs is about 0.6. This is slightly higher than the estimates in Solon et al. (2000). For AFQT scores, the estimates are also similar across genders and are even higher than the education estimates. These estimates are similar to those obtained by Oettinger (1999) who also uses the NLSY. I next focus on a few socioeconomic outcomes that have been commonly analyzed in studies of neighborhood/peer effects (e.g., Case and Katz 1991). I find that the correlation in drug use among all siblings is a bit below 0.3 with a slightly higher point estimate for sisters (0.37) than brothers (0.3), although the difference is not statistically significant. The fact that the overall correlation is lower than the correlation within gender type suggests that the correlation across siblings of different genders is lower. When I examine whether respondents were ever in jail, the estimate for sisters is zero. It is worth noting that as variance component models, by definition, are bounded at zero, REML cannot produce negative estimates. For brothers, the correlation estimates are around 0.25. This sharp distinction by sex is similar to that found by Duncan et al. (2001) in their analysis of measures of delinquency among teenagers. Finally, the correlation in age at pregnancy is about 0.3.

I now turn to physical characteristics/health outcomes. The correlation in height between siblings is slightly below 0.5, whereas the estimates for weight are around 0.3 for brothers and sisters. The height correlation is similar to what has been found in previous studies (e.g., Duncan et al. 2001). For BMI, I find that the correlation among sisters (0.30) is slightly higher than among brothers (0.26). These estimates are similar to the estimates of sibling correlations in cholesterol levels and blood pressure (Lee et al. 2003). Finally, with the attitudinal variables, the correlation in the Rotter scale is only about 0.1 in all cases, whereas the correlation in self-esteem is in the 0.2 to 0.3 range. These are similar to estimates of “extraversion” and “emotional stability” reported in Loehlin and Rowe (1992).

Overall, it appears that the correlations in the human capital measures are actually the highest at around 0.6. Otherwise, the only variable with a sibling correlation comparable to the economic outcomes is height. Other outcomes that presumably have a large genetic component such as weight, BMI, and personality characteristics are considerably lower and correspond to findings in the existing literature. This strongly suggests that there are factors related to individual or family decision making, particularly with respect to schooling, that lead to a high degree of similarity in the economic fortunes of siblings.

6 Contributions to the brother correlation in economic outcomes

I now examine the potential impact of various explanatory variables on the sibling correlation in economic outcomes among men. Table 6 presents estimates of the contribution of various variables using the methodology described in “Statistical models and estimation”. A growing literature on intergenerational economic inequality (e.g., Solon 1999; Mazumder 2005) has emphasized the importance of parent or family income on children’s economic outcomes. Therefore, one obvious candidate for explaining the sibling correlation is parent income. With the NLSY79 sample, there is only information for a subset of individuals for just a few years on parent income. As Solon (1992) has shown, using income from just a single year is a poor proxy for permanent income and leads to downward-biased coefficients. Similarly, using just a two-year average of income from 1978–1979 will also likely lead to biased estimates of the residuals (purged of parent income) and, therefore, biased estimates of the variance components. In any case, using this proxy for parent permanent income, I find that the variance in the family component in earnings residuals is reduced by 0.17, which explains about 36% of the sibling correlation in earnings. Interestingly, Solon (1999) shows that using the consensus estimate of the intergenerational elasticity in earnings of 0.4, one would expect the contribution to be \( (0.4)^{2} \), or 0.16. If, however, the intergenerational elasticity is actually closer to 0.6 (Mazumder 2005), then this is a vast underestimate of the true contribution of parent income. The contribution of parent income to the sibling correlation in family income is slightly higher at 41%. The contributions to the sibling correlation in wages and hours are 27 and 21%, respectively.
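The benchmark attributed to Solon (1999) above is a one-line calculation: the expected contribution of parent income to the sibling correlation is the square of the intergenerational elasticity. The sketch below simply evaluates that square for the two elasticities mentioned in the text; the function name is hypothetical.

```python
def expected_parent_income_contribution(elasticity):
    """Benchmark contribution of parent income: the elasticity squared."""
    return elasticity ** 2


if __name__ == "__main__":
    for elasticity in (0.4, 0.6):
        print(elasticity, round(expected_parent_income_contribution(elasticity), 2))
```

With an elasticity of 0.4 the benchmark is 0.16, close to the estimated reduction of 0.17, whereas an elasticity of 0.6 would imply 0.36, more than twice the estimate obtained with the two-year income proxy.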

Table 6 Contributions to the brother correlation in economic outcomes

Economic models of wage determination strongly emphasize the importance of human capital. Consequently, I next examine how years of schooling and AFQT scores influence the sibling correlation in economic outcomes. Both measures fare almost equally well across the outcomes. For earnings, family income, and wages, each measure explains anywhere between 40 and 50% of the sibling correlation. Including both human capital measures in conjunction explains more than half of the sibling correlation in earnings and family income. These human capital measures, however, only explain about 20% of the sibling correlation in annual hours.

Many studies have also emphasized the importance of physical characteristics such as height and appearance on wages, so it is interesting to understand the importance of these variables in explaining the sibling correlation in economic outcomes. Interestingly, physical characteristics only account for about 5% of the sibling correlation, most of which is due to the inclusion of height. Accounting for any time spent in jail explains more than 20% of the sibling correlation in earnings, family income, and hours but less than 10% of the sibling correlation in hourly wages. Illegal drug use, however, appears to explain virtually none of the sibling correlation. Both psychological measures make an important contribution to explaining the sibling correlation in earnings, family income, and wages. Combined, they account for about 20% of the sibling correlation in these measures. This result adds to the growing research that has found that noncognitive factors such as personality traits play an important role in the intergenerational persistence in economic status.

Finally, I also examine the importance of occupation by including three-digit occupation dummies. This is more controversial, as occupation is often viewed as an outcome rather than a causal factor determining economic success. Still, occupation may act as a proxy for other forms of human capital or capture social connections that perpetuate intergenerational inequality. In any case, occupation appears to account for around 60% of the sibling correlation in earnings, family income, and wages.

Including all of the variables at the same time accounts for 80% or more of the brother correlation in earnings and family income and upwards of 70% of the brother correlation in wages and hours worked. Excluding the occupation dummies, 65% of the correlation in earnings between brothers and 73% of the correlation in family income can be accounted for by these variables. On the other hand, just under half of the sibling correlation in wages is still unexplained even after including all of these variables.

The fact that the variables are likely to be correlated with one another is a limitation of this decomposition approach. It would be interesting to know, for example, the importance of parent income controlling for my measures of human capital. To get a sense of the conditional contributions to the brother correlation in wages, Table 7 shows how the contribution of each broad category of factors is affected if one controls for each of the other categories, one at a time. For example, row 1 of Table 7 shows that the unconditional contribution of parent income (27%) is reduced to 6% if one controls for years of schooling and AFQT scores. These results continue to show that the human capital measures contribute the most, followed by parent income and then psychological characteristics.
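The difference between unconditional and conditional contributions can be illustrated on simulated data. The following is a minimal sketch under stated assumptions, not the estimation used in this paper. It replaces REML with a simple method-of-moments estimator (the covariance between two brothers), it defines a conditional contribution as the additional reduction in the family component once the other covariate is already controlled for, and the data-generating coefficients and all function names are hypothetical.

```python
import random


def demean(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def partial_out(v, x):
    """Remove from v its least-squares projection on x (both mean zero)."""
    b = sum(a * c for a, c in zip(v, x)) / sum(c * c for c in x)
    return [a - b * c for a, c in zip(v, x)]


def residualize(y, covariates):
    """OLS residuals of y on the covariates (with intercept), obtained by
    partialling out one covariate at a time (Frisch-Waugh-Lovell)."""
    r = demean(y)
    xs = [demean(x) for x in covariates]
    while xs:
        x, rest = xs[0], xs[1:]
        r = partial_out(r, x)
        xs = [partial_out(z, x) for z in rest]
    return r


def family_component(v):
    """Method-of-moments family variance: covariance between brothers,
    who are stored at positions 2f and 2f + 1."""
    v = demean(v)
    pairs = len(v) // 2
    return sum(v[2 * f] * v[2 * f + 1] for f in range(pairs)) / pairs


def sibling_correlation(v):
    v = demean(v)
    total = sum(a * a for a in v) / len(v)
    return family_component(v) / total


# Simulated brother pairs; every coefficient below is an arbitrary assumption.
rng = random.Random(0)
parent, school, wage = [], [], []
for _ in range(5000):
    p, h, a = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    for _ in range(2):
        s = 0.7 * p + 0.5 * h + 0.6 * rng.gauss(0, 1)  # schooling
        w = 0.2 * p + 0.5 * s + 0.4 * a + 0.6 * rng.gauss(0, 1)  # log wage
        parent.append(p)
        school.append(s)
        wage.append(w)

rho = sibling_correlation(wage)
base = family_component(wage)
after_parent = family_component(residualize(wage, [parent]))
after_school = family_component(residualize(wage, [school]))
after_both = family_component(residualize(wage, [school, parent]))

# Share of the baseline family component removed by parent income alone,
# and the extra share removed once schooling is already controlled for.
unconditional = (base - after_parent) / base
conditional = (after_school - after_both) / base

print(f"brother correlation in wages: {rho:.2f}")
print(f"parent income, unconditional: {unconditional:.0%}")
print(f"parent income, given schooling: {conditional:.0%}")
```

Because parent income and schooling are correlated by construction in this simulation, much of what parent income explains on its own is absorbed by schooling, so the conditional contribution is far smaller than the unconditional one.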

Table 7 Conditional contributions to the brother correlation in wages

7 Conclusion

This study uses an improved estimation approach on a much larger sample than previous studies to bolster the findings of previous economic research that has demonstrated that family and community influences account for a large portion of the variation in economic outcomes in the US. I find, for example, that the sibling correlation in men’s permanent wages is greater than 0.5, suggesting a high degree of economic rigidity. Notably, this correlation is even higher than the correlation in height and weight, suggesting that the “inheritance” of economic inequality is particularly strong.

Using a decomposition analysis, I find that observable measures of family and individual characteristics can explain a large portion of the sibling correlation in earnings and family income. Economic models of wage determination typically emphasize human capital acquisition as a key variable, and it is clear that this is one important way by which families confer advantages to their offspring. Parent income also plays an important role. I also find that psychological characteristics play a smaller, although quantitatively important, role. Although this analysis provides some initial clues as to which mechanisms are important, future research using more convincing research designs is needed to better guide policymakers.