Introduction

Sibling and cousin correlations are a standard tool in the study of stratification and mobility. Suppose we construct a scatterplot with each pair of siblings represented by a point, with one sibling’s income on the x-axis and the other’s income on the y-axis. The correlation between the two is often taken as an omnibus measure of the degree to which family of origin constrains life chances (Solon et al. 1991). Noting that cousins share extended kin (e.g., grandparents) but come from different nuclear families, recent research has used sibling and cousin correlations in combination to assess the degree to which life chances vary across nuclear and extended families (Hällsten 2014; Jæger 2012; Knigge 2016).

The problem with sibling and cousin correlations is that they are difficult to interpret. Siblings can be similar because parents strongly influence life chances, because they are raised in the same neighborhood, because they influence each other’s outcomes, or for numerous other possible reasons. Many theoretical stories are consistent with a given set of empirical estimates. The same is true of cousin correlations. Although it is tempting to interpret a large cousin correlation as evidence that extended families shape life chances, cousins’ outcomes would also be similar under a parent-to-child transmission process in which transmission is strong from grandparents to parents and from parents to offspring. Echoing Stuhler’s (2012) argument that parent-offspring correlations can arise from a variety of causal processes, the present article shows that sibling and cousin correlations can also arise from a variety of causal processes. Similar to the way biologists interpret similarities of traits in light of path models that summarize the generative process (Otto et al. 1995), a sibling correlation in social science also takes its meaning only under a theory about how that correlation came to be.

As a reference point, I begin with the Becker and Tomes (1979) model of income mobility. This model predicts that the cousin correlation will equal the squared sibling correlation. When empirical evidence contradicts this model, I show that the evidence is consistent with at least three distinct theoretical interpretations. Estimates would deviate from the Becker-Tomes prediction if the process is dynamic (changing across generations), latent (outcomes measured with error), or second-order (involving a direct role for grandparents). Some prior authors have favored the third interpretation, but there is no particular reason to do so.

Second, I apply a Bayesian estimation procedure to produce new estimates of sibling and cousin correlations in permanent log family income in the United States. The advantage of this procedure is that it facilitates reporting of uncertainty about the key test statistic and implied parameters of theoretical models. When estimates seem to contradict a theoretical model, it becomes essential to assess our degree of confidence by incorporating uncertainty.

The article concludes with a discussion of the role that sibling and cousin correlations may continue to play in applied research. I highlight how these quantities remain useful in a descriptive sense and may serve as a springboard for new theories, even if they cannot provide definitive evidence for or against a particular theory under assumptions we can credibly defend.

Sibling and Cousin Correlations: A Variance Decomposition

Sibling correlations capture the expected degree to which an outcome tends to be similar for a pair of siblings. Although these quantities are standard in stratification and mobility (Jencks et al. 1972; Mazumder 2008), this section briefly introduces them for the general reader. Suppose an outcome μi for person i is decomposed into three components:

(1)

In the notation in Eq. (1), S(i) indexes the sibling set to which i belongs: all siblings of i have the same value of S(i). Likewise, C(i) indexes the cousin set to which i belongs: all siblings and cousins of i have the same value of C(i). Denote the variance of each component by \( \left\{{\upsigma}_{Person}^2,{\upsigma}_{SiblingSet}^2,{\upsigma}_{CousinSet}^2\right\} \).

The relationship of Eq. (1) to sibling and cousin correlations relies critically on a set of independence assumptions that components are independent across units and across levels. To make this concrete, sibling correlations assume that the person-specific component is independent across individuals (δi ⊥ δj  ∀ i, j). This assumption would be violated if siblings differentiate from each other, thereby making δi and δj negatively correlated for a pair of siblings i and j. In the extreme, this would occur in a dystopian world in which every amount that sibling 1 sacrifices helps sibling 2 to succeed (Conley 2008). This dystopian world may seem implausible, but in the actual world siblings do in fact differentiate from each other to some degree (Conley 2004). The use of variance decompositions to study sibling correlations requires us to set aside this possibility. Likewise, estimators for sibling and cousin correlations typically assume independence across levels—for instance, ruling out that the average attainment of one’s sibling or cousin set is related to the variance of attainment for individuals within that set.

Armed with the independence assumption, the correlation between μ for a pair of siblings i and j is a function of the variance at each level (Mazumder 2008):

$$ {\uprho}_{Sibling}=\mathrm{Cor}\left({\upmu}_i^{Person},{\upmu}_j^{Person}\right) $$
(2)
$$ =\frac{\mathrm{Cov}\left({\upmu}_i^{Person},{\upmu}_j^{Person}\right)}{{\left(\mathrm{V}\left({\upmu}_i^{Person}\right)\mathrm{V}\left({\upmu}_j^{Person}\right)\right)}^{\frac{1}{2}}} $$
(3)

by definition of correlation,

$$ =\frac{\mathrm{Cov}\left({\updelta}_i+{\eta}_{S(i)}+{\uplambda}_{C(i)},{\updelta}_j+{\eta}_{S(j)}+{\uplambda}_{C(j)}\right)}{{\left(\mathrm{V}\left({\upmu}_i^{Person}\right)\mathrm{V}\left({\upmu}_j^{Person}\right)\right)}^{\frac{1}{2}}} $$
(4)

by Eq. (1).

Expand, dropping cross-level terms by independence:

(5)
(6)

Under the independence assumption, the proportion of the variance that is between sibling sets equals the sibling correlation. Analogously, the proportion of the variance between cousin sets equals the cousin correlation. For this reason, even descriptive use of sibling and cousin correlations as measures of similarity within families relies on the independence assumption. This article’s primary focus is the interpretation of sibling and cousin correlations in light of theoretical models of social mobility. The next section formalizes theoretical models of multigenerational mobility using parameters (e.g., β) linking outcomes across multiple generations. Sibling and cousin correlations provide a tool to infer these parameters given data on the outcome generation alone. As I will argue, inferring multigenerational parameters from sibling and cousin correlations requires conceptual arguments for a particular theoretical model. It is often difficult to make a compelling case for one theory over another, which makes inferences about multigenerational processes from sibling and cousin correlations difficult.

Deviation From Becker-Tomes: Three Distinct Interpretations

A deep literature has built on the formal model of Becker and Tomes (1979), in which the expected income (Yg) of someone in generation g given the incomes of all ancestors ({Yg}g′ < g) is a linear function β of the parent income (Yg – 1):

$$ \mathrm{E}\left({Y}_g\left|{\left\{{Y}_{g\prime}\right\}}_{g^{\prime }<g}\right.\right)=\upbeta {Y}_{g-1}\kern2.5em \forall g. $$
(7)

This model forms the basis of the famous claim that families can go “from shirtsleeves to shirtsleeves in three generations” (Becker and Tomes 1986:S28), rising from poverty to luxury and back again. A one-unit increase in generation g income produces a β increase in the expected income of generation g + 1, but produces only a βk increase k generations in the future. If we conceptualize each income measure (Yg) as standardized to a mean of 0 and a variance of 1, the coefficient β can be interpreted as the correlation between the incomes of parents and offspring within a family. Figure 1 plots the rapid decay across generations that results for the case when β = .50. Despite a moderately high .50 correlation between the incomes of parents and offspring, a great-great-grandparent’s income is correlated only .06 with offspring income. In short, families have abundant opportunity to move away from their origins over the course of a few generations.

Fig. 1
figure 1

Multigenerational permanent income correlations decay rapidly under the Becker-Tomes model. The attainment of a distant ancestor is only weakly associated with one’s own attainment even if β is large (β = .5 depicted).

The prediction of substantial mobility over multiple generations holds only insofar as the assumptions of the Becker and Tomes (1979) model are correct. Three assumptions are noteworthy (Fig. 2). First, the assumed process is observed rather than latent: observed parent income (Yg – 1) is sufficient to block all statistical dependence between earlier ancestor incomes ({Yg}g′ < g) and offspring income (Yg). Even if this holds for true income, income reported with error \( {\overset{\sim }{Y}}_{g-1} \) would not be sufficient to block this dependence (Ferrie et al. 2016). This is the sense in which the process is observed. Second, the assumed process is static rather than dynamic: the process that generates incomes in generation g is the same as the process that generates incomes in generation g − 1. This would be violated if the process of mobility changes over time. If the ability to buy advantages for one’s offspring differs across generations, a single β cannot represent this relationship in all generations. Third, the assumed process is first-order rather than second-order: once the incomes of parents are known, the incomes of grandparents provide no additional information about the life chances of offspring. This would not be true, for instance, if grandparents directly affect life chances net of parents (Mare 2011; Pfeffer 2014).

Fig. 2
figure 2

Theoretical models of multigenerational mobility. Each model represents a variable in the transmission process. The figure depicts the outcomes of a pair of siblings and one of their cousins. Panel a is the Becker-Tomes model. Panel b allows the parent-offspring transmission coefficient to be dynamic across generations. Panel c maintains a static transmission parameter but adds a measurement step with reliability η. Panel d assumes a static process with perfect measurement but allows a direct grandparent-offspring link. Evidence against the Becker-Tomes model can be interpreted as evidence for any of the three alternatives or others not depicted.

Statistical evidence that contradicts the Becker-Tomes model may mean that any one of these assumptions does not hold. Sibling and cousin correlations offer one source of evidence against the model. In particular, the model predicts a sibling correlation of β2 and a cousin correlation of β4 (online appendix, part 1). Without directly estimating β, one can assess evidence against this model by a test statistic τ: the difference between the cousin correlation and the squared sibling correlation,

$$ \uptau ={\uprho}_{Cousin}-{\uprho}_{Sibling}^2, $$
(8)

where τ = 0 under the model of Becker and Tomes (1979). The following subsections show how this prediction does not hold under violations of each of the assumptions that the process is static, observed, and first-order.

Interpretation 1: From a Static to a Dynamic Process

Suppose the process is dynamic, so that the intergenerational income elasticity βg is different in each generation g:

(9)

In this case, a one-unit increase in parent income increases offspring income in generation g by βg. A one-unit increase in grandparent income increases parent income in generation g − 1 by a different value of βg − 1. Combining these, a one-unit increase in grandparent income increases offspring income by βg − 1βg. This could be greater or less than \( {\upbeta}_g^2 \) (the prediction of the static model) depending on whether the grandparent-parent transmission, βg − 1, is greater or less than the parent-offspring transmission, βg.

In the analogous intragenerational case, each sibling’s income is correlated βg with parent income, producing a correlation of \( {\upbeta}_g^2 \) between siblings (online appendix, part 1). Cousin’s incomes are correlated βg − 1βg with the income of a shared grandparent, producing a cousin correlation of \( {\upbeta}_{g-1}^2{\upbeta}_g^2 \). Notably, this cousin correlation can be larger or smaller than the squared sibling correlation. Given an estimated sibling correlation, the Becker-Tomes model is consistent with only one cousin correlation, whereas a dynamic model is consistent with any cousin correlation.

$$ {\uprho}_{Sibling}={\upbeta}_g^2. $$
(10)
$$ {\uprho}_{Cousin}={\upbeta}_g^2{\upbeta}_{g-1}^2. $$
(11)
$$ {\uprho}_{Sibling}^2<{\uprho}_{Cousin}\ \mathrm{if}\ {\upbeta}_g<{\upbeta}_{g-1}. $$
(12)
$$ {\uprho}_{Sibling}^2={\uprho}_{Cousin}\ \mathrm{if}\ {\upbeta}_g={\upbeta}_{g-1}. $$
(13)
$$ {\uprho}_{Sibling}^2>{\uprho}_{Cousin}\ \mathrm{if}\ {\upbeta}_g>{\upbeta}_{g-1}. $$
(14)

Interpretation 2: From an Observed to a Latent Process

Measurement error has formed the core of numerous debates in stratification research. As early as the 1960s, it was known that interpretation of correlations in socioeconomic status depends on the “assumptions one is willing to defend” about the relationship between measured variables and true values (Siegel and Hodge 1968:144). For instance, whereas Blau and Duncan (1967) found substantial returns to education net of family background variables, Bowles (1972:S222) called this conclusion “seriously misleading” because of reporting error in family background variables that downwardly bias their association with the outcome and produce an overstatement of the returns to education. In studies of mobility over three generations, it is well known that credible estimates require measurement models for socioeconomic outcomes (Warren and Hauser 1997). Central conclusions in the field of mobility often hinge on assumptions about measurement.

Sibling and cousin correlations are no exception to this rule. As first noted in the context of sibling correlations (Solon et al. 1991), intragenerational correlations in measured attainment \( \Big(\overset{\sim }{Y} \)) will be less than correlations in true attainment (Y) to the extent that attainment is reported with classical measurement error. This occurs because measurement error makes siblings (and cousins) look more different than they actually are.Footnote 1 Formally, suppose that reported income, \( \overset{\sim }{Y} \) (standardized to a mean of 0 and variance of 1), is a noisy indicator of true income, Y (also standardized to mean 0 and variance 1), with reliability factor η linking the two. Suppose that reported income is uncorrelated across individuals given actual income.

(15)
$$ \mathrm{Cov}\left({\overset{\sim }{Y}}_{gi},{\overset{\sim }{Y}}_{gj}\left|{Y}_{gi},{Y}_{gj}\right.\right)=0. $$
(16)

Pairing this measurement model with the Becker-Tomes transmission model (Eq. (7)) yields the following sibling and cousin correlations.

$$ {\uprho}_{Sibling}={\upeta}^2{\upbeta}^2. $$
(17)
$$ {\uprho}_{Cousin}={\upeta}^2{\upbeta}^4. $$
(18)
$$ {\uprho}_{Sibling}^2\le {\uprho}_{Cousin}. $$
(19)

In short, measurement error depresses the sibling and cousin correlations by a factor of η2 1. Counterintuitively, the typical practice of squaring the sibling correlation also squares the contribution of measurement error, thereby driving the squared sibling correlation down to a greater degree than the cousin correlation, which is not squared. Because the central test statistic for the Becker-Tomes model involves a comparison of the squared sibling correlation and the cousin correlation, conclusions can be especially sensitive to the presence of measurement error. Even a shift from annual to permanent income could be sufficient to alter conclusions about multigenerational persistence.

Interpretation 3: From a First-Order to a Second-Order Process

Finally, evidence against the Becker-Tomes model can be interpreted in light of a second-order process under which the attainment of grandparents and offspring are directly linked through a path that does not involve the attainment of parents:Footnote 2

(20)

Numerous theories suggest that βGrandparent ≠ 0. Many children today see their grandparents frequently (Dunifon et al. 2018), sometimes living in doubled-up multigenerational households (Mykyta and Macartney 2011; Pilkauskas and Martinson 2014; Pilkauskas et al. 2014; Taylor et al. 2010). Prior research has documented substantial multigenerational associations in education (Sheppard and Monden 2018; Zeng and Xie 2014), occupation (Chan and Boliver 2013), and wealth (Adermon et al. 2018; Pfeffer and Killewald 2017). Extended kin may compensate for the shortcomings of parents, providing resources that parents cannot (Erola et al. 2018; Prix and Pfeffer 2017), or may serve as a cultural reference point that propels offspring toward academic achievement and attainment (Hällsten and Pfeffer 2017; Hertel and Groh-Samberg 2014). Although some evidence casts doubt on a direct role for grandparents in the attainment process (Warren and Hauser 1997), much of the literature suggests that one exists in at least some contexts. For a variety of reasons, we might expect grandparent income to be informative about offspring income net of parent income.

Correlations in a second-order process are mathematically complicated. To build intuition for this complexity, consider the case in which both parents and grandparents directly cause the incomes of offspring. The parent-offspring correlation then includes a causal component as well as components that arise through confounding (e.g., both are influenced by the grandparents and previous ancestors). The following equations show the sibling and cousin correlations generated by a second-order process (derivations in the online appendix, part 1):

(21)
(22)

For the present purpose, it is sufficient to note that the cousin correlation need not equal the squared sibling correlation in Eqs. (21) and (22).

Three Interpretations: What We Learn and Do Not Learn From Evidence

The sections above introduced the Becker-Tomes model as well as three theoretical extensions to processes that are dynamic, latent, or second-order. The Becker-Tomes model has only one unknown (β), but sibling and cousin correlations provide two sources of empirical information. This makes the Becker-Tomes model overidentified: the two sources of empirical evidence can point toward different estimates of β and thus cast doubt on the validity of the model.Footnote 3 Because the extensions have more unknown parameters, they are consistent with a wider range of possible sibling and cousin correlations. Thus, in general, the data cannot cast doubt on these alternatives.Footnote 4 Conclusions that favor one interpretation over another require some other reason, such as additional data or a conceptual argument, to justify the preferred interpretation over the other candidates.

Reinterpreting Published Evidence

Table 1 presents a few published sibling and cousin correlations. In the text, I highlight one estimate from each study. Every author provided compelling evidence that casts doubt on the Becker-Tomes model, but the language they used to discuss this evidence often favors a second-order process in which grandparents directly influence offspring attainment. Yet, the evidence is often also consistent with a dynamic or latent process.

Table 1 Published estimates of cousin correlations providing evidence against the Becker-Tomes model if the cousin correlation deviates from the squared sibling correlation

Analyzing educational attainment in the Wisconsin Longitudinal Study, Jæger (2012: table 2) estimated that 14% of the variance in years of education is between extended families, and 26% of the variance is between nuclear families within extended families. These estimates indicate a sibling correlation of .14 + .26 = .41 (with the sum reflecting rounding), which squares to predict a cousin correlation under the Becker-Tomes model of .17. The estimated cousin correlation is .14. This evidence is consistent with several theories. The process could be dynamic, with grandparent-parent income transmission of βg − 1 = .58 and parent-offspring transmission of βg = .64. It could also arise from a second-order model in which parent income raises offspring income by βParent = .68 and grandparent income directly reduces offspring income by a small amount, βGrandparent = −.06. Jæger (2012:913) concluded, “a two-generation Markov process [parent-to-child transmission] does not represent the total effect of family background on educational success.” The evidence does indeed cast doubt on a two-generation Markov process defined as the Becker-Tomes model, but it does not imply that the extended family plays any direct role in the attainment of offspring.

Hällsten (2014) referenced an estimate by Björklund and Jäntti (2012) that sibling’s years of education are correlated .39 in Sweden. Squaring this yields a Becker-Tomes prediction that the cousin correlation would be .15. This is almost exactly equal within rounding to the .15 cousin correlation that Hällsten (2014: table 2) estimated with Swedish register data. This estimate is consistent with a static, first-order Markov process with intergenerational transmission by β = .62. Nevertheless, on the basis of the inability to explain away the cousin correlation with measured covariates, Hällsten (2014:31) concluded, “unless unobserved characteristics of parents account for all of the 1st and 2nd cousin correlations, the estimated and adjusted correlations are clearly incompatible with a Markov process [parent-to-child transmission]” (emphasis in original). The robustness of the cousin correlation to controls for parental characteristics is an intriguing result; this represents one example where new external information may inform multigenerational theories beyond what we can learn from a sibling and cousin correlation alone, in a way not formalized in this article. The purpose of including this estimate here is simply to highlight that the sibling and cousin correlations alone are insufficient to provide evidence against a process of parent-to-child transmission.

Knigge (2016) studied occupational status attainment in the Netherlands in the late nineteenth and early twentieth century, estimating a sibling correlation of .50. Squaring the sibling correlation yields a predicted cousin correlation of .25, which is slightly smaller than the estimated cousin correlation of .32, as the author noted (Knigge 2016:1233). The conclusion that “these results are not congruent with the Markovian model” (Knigge 2016:1233) is thus true only if “the Markovian model” strictly means the static, observed, first-order Becker-Tomes model. The evidence is in fact consistent with a dynamic process (coefficients of .80 from grandparents to parents and .71 from parents to offspring) or a latent process (β = .80 and measurement reliability η = .88). Finally, it is also consistent with a view that the Markovian model is wrong and that the parent effect (βParent = .60) combines with a direct grandparent effect (βGrandparent = .14) to produce offspring income. It is not clear a priori which of these theories is most conceptually plausible.

Finally, using PSID data from the United States, Pfeffer et al. (2016) estimated a sibling correlation in log net worth of .34, which implies a cousin correlation of .342 = .12. The authors estimated a cousin correlation of .19, which is slightly higher than expected and warrants their conclusion that “19% of individuals’ wealth attainment can be traced to the common origins of cousins (i.e. grandparent environments), reflecting concentration of family wealth within lineages beyond just two generations” (p. 12). The authors made no claims about whether this multigenerational persistence violates parent-to-child transmission; the study is included here only because readers may suspect that wealth transmission in the United States is a context in which multigenerational transmission is especially strong. The estimate is consistent with any of the three models presented here: dynamic transmission (βg − 1 = .75, βg = .58), latent transmission (β = .75, η = .78), or a second-order process (βParent = .48, βGrandparent = .16).

Each of the aforementioned studies found additional evidence based on intergenerational correlations to support its claims, and the purpose of this review is not to discredit the contribution of these authors. Their estimates have brought a novel framework to bear on an old question, yielding an important result: cousins’ outcomes are in many settings more similar than would be expected under the Becker-Tomes model. However, the estimates do not necessarily imply the particular alternative model in which cousin similarities arise because of direct multigenerational effects. These estimates are instead consistent with a variety of alternative models, including processes that operate from parent to child but are dynamic or latent.

Empirical Tools to Estimate Uncertainty and Incorporate Repeated Measures

The preceding review comments solely on whether published point estimates diverge from the prediction of the Becker-Tomes model. This omits an entirely separate reason why estimates may be different than expected: estimation uncertainty. In this section, I develop a Bayesian estimation procedure for sibling and cousin correlations that quantifies uncertainty about the test statistic (the cousin correlation minus the squared sibling correlation). By incorporating repeated measures, the estimator also addresses some concerns that arise from classical measurement error and life cycle bias, each of which could produce a situation of latent transmission if left unaddressed.

Proposed Model

Suppose we obtain data on the log family income, Yit, experienced by each person i at various ages t from 25 to 45. I standardize income to a mean of 0 and standard deviation of .5 for analysis,Footnote 5 calculated by pooling all persons and years. Denote the population-average income at age t by βt, and assume that each person’s income is a normal draw around a person-specific mean \( {\upmu}_i^{Person}+{\upbeta}_t \) with variance \( {\upsigma}_{Observation}^2 \):

(23)

The person-specific intercept \( {\upmu}_i^{Person} \) allows that each person’s income may be shifted up or down by a fixed amount from the population average. The population mean βt at each age t adjusts for the life cycle bias problem that some people are observed at ages when incomes tend to be lower or higher. The observation-level variance, \( {\upsigma}_{Observation}^2 \), subsumes actual income volatility and classical measurement error. To examine sibling and cousin correlations in permanent income, \( {\upmu}_i^{Person} \), I decompose the variance of this quantity into three components as formalized in Eq. (1): a person-specific component (δi), a sibling set component (ηS(i)), and a cousin set component (λC(i)). I assume a normal prior on the values of each component:

$$ {\updelta}_1,\cdots, {\updelta}_{n_{Person}}\ {\displaystyle \begin{array}{c}\mathrm{iid}\\ {}\overset{\sim }{}\end{array}}\mathcal{N}\left(0,{\upsigma}_{Person}^2\right) $$
(24)
$$ {\upeta}_1,\cdots, {\upeta}_{n_{SiblingSet}}\ {\displaystyle \begin{array}{c}\mathrm{iid}\\ {}\overset{\sim }{}\end{array}}\mathcal{N}\left(0,{\upsigma}_{SiblingSet}^2\right) $$
(25)
$$ {\uplambda}_1,\cdots, {\uplambda}_{n_{CousinSet}}\ {\displaystyle \begin{array}{c}\mathrm{iid}\\ {}\overset{\sim }{}\end{array}}\mathcal{N}\left(0,{\upsigma}_{CousinSet}^2\right). $$
(26)

The central task for estimation of sibling and cousin correlations is to estimate the values of the variance components \( \left\{{\upsigma}_{Person}^2,{\upsigma}_{SiblingSet}^2,{\upsigma}_{CousinSet}^2\right\} \) because these imply particular values of sibling and cousin correlations (see Eqs. (2)–(6)). I place priors on the variance components and the other unknown parameters to target the search for a solution to the range of reasonable values and provide the basis for fully Bayesian inference:

$$ \overrightarrow{\upbeta}\sim \mathcal{N}\left(\overrightarrow{0},\mathbf{I}\right) $$
(27)
$$ \left\{{\upsigma}_{Observation},{\upsigma}_{Person},{\upsigma}_{SiblingSet},{\upsigma}_{CousinSet}\right\}\ {\displaystyle \begin{array}{c}\mathrm{iid}\\ {}\overset{\sim }{}\end{array}}\ \mathrm{half}-\mathrm{Cauchy}\ \left(0,1\right). $$
(28)

The normal prior on the age trajectory shrinks all coefficients toward 0, similar to ridge regression; in this setting, though, the prior is vastly overwhelmed by the data because the population-mean income at each age can be estimated with considerable precision (see Fig. A4 in the online appendix). The half-Cauchy prior on the variance components is the absolute value of a Cauchy distribution, with scale parameter corresponding to the median of the distribution. It is very diffuse (see Fig. A3 in the online appendix) and is a standard choice of a weakly informative prior for variance parameters (Gelman et al. 2008).

Data: Panel Study of Income Dynamics

I fit the model to U.S. data from the Panel Study of Income Dynamics (PSID). The PSID selected a probability sample of 1968 households and interviewed descendants from those households annually through 1999 and biannually thereafter through 2015, producing a sample of individuals nested in family lineages. I restrict to the Survey Research Center (SRC) sample, for which all households in the contiguous United States in 1968 had an equal probability of selection (for a discussion of other samples, see the online appendix, part 2). The SRC sample initially selected 3,000 households, of whom 755 produced descendants two generations later who were interviewed at ages 25–45. This reduction in the number of families reflects the combined influence of survey nonresponse, childlessness, delayed fertility, and survey attrition (see the online appendix, part 2). The analytic sample includes 11,730 observations on 2,363 respondents nested in 1,260 sibling sets in 755 cousin sets. A total of 1,854 respondents have at least one cousin in the sample, and 1,431 respondents have at least one sibling in the sample. The outcome variable of interest is log total family income, adjusted to 2014 dollars using the Consumer Price Index. To reduce the influence of very low family incomes when logged, I bottom-code inflation-adjusted family incomes at $5,000.

Posterior Sampling

I estimate the model using the rstan package in R (Stan Development Team 2017), which simulates from the posterior distribution using Hamiltonian Monte Carlo in Stan. I simulate 10,000 burn-in draws and 10,000 posterior samples for each of four chains executed sequentially from independent starting points, which produce a pooled set of 40,000 posterior draws.Footnote 6 Point estimates reported are posterior means. Credible intervals report the 5th and 95th percentiles of posterior draws. Trace plots for key parameters of interest are provided in Fig. A5 in the online appendix.

Results

Fig. 3 presents results. The estimated sibling correlation is .38 (95% confidence interval (CI): .32–.44). Squaring all posterior draws to predict the cousin correlation under the Becker-Tomes model yields a posterior mean estimate of .14 (95% CI: .10–.20). The observed cousin correlation is slightly larger, at .24 (95% CI: .17–.31). The key test statistic subtracts the squared sibling correlation from the cousin correlation. For this test statistic, the estimate is .10 (95% CI: .02–.17). The cousin correlation therefore provides some evidence against the Becker-Tomes model for transmission of permanent income. Estimation uncertainty clarifies the strength of this evidence.

Fig. 3.
figure 3

The cousin correlation in permanent log family income is slightly greater than expected under the Becker-Tomes model. This empirical result does not point to any particular preferred alternative interpretation without additional assumptions. Error bars denote 95% posterior credible intervals.

Rejection of the Becker-Tomes model (Fig. 3) leaves open the question of which alternative is most theoretically plausible. The results are consistent with at least three alternatives. The transmission process may be dynamic with grandparent-parent transmission βg − 1 = .80 (95% CI: .67–.91) and parent-offspring transmission βg = .62 (95% CI: .56–.66).Footnote 7 The use of permanent income removes the possibility that this result could be explained by classical measurement error at the observation level, but two alternative latent processes are possible. One is that individuals consistently over- or underreport their incomes in all waves of the survey, thereby producing classical measurement error at the person level with reliability η. A second is that the variable transmitted across generations is actually a latent variable distinct from income, and this variable is correlated with permanent income at only η. The observed data are consistent with either scenario, with β = .80 (95% CI: .67–.91) and measurement reliability η = .78 (95% CI: .64–.94). Finally, results are consistent with the interpretation favored by some prior researchers: a second-order process. In this case, parents affect offspring incomes by βParent = .47 (95% CI: .30–.61), and grandparents directly affect offspring incomes by βGrandparent = .21 (95% CI: .04–.38). Assessing the relative merits of these three theories would require conceptual argument; the data do not point toward any one theory over the others.

Comparison With Prior Approaches

This section briefly reviews what has been gained by the Bayesian statistical model of this article relative to other approaches. Under some assumptions, the model solves three key problems: uncertainty, life cycle bias, and measurement error.

Quantification of uncertainty is the primary advantage of the Bayesian approach. The key test statistic is a complex function of the variance components of a model, involving division, squaring, and subtraction. Posterior samples make uncertainty straightforward even in this setting: pass each posterior sample through the transformation, and then summarize the distribution of posterior samples that result. It is possible to arrive at similar uncertainty estimates using frequentist methods, but doing so is mathematically complicated. Standard software packages (e.g., nlme; Pinheiro et al. 2019) estimate the frequentist variance of a transformation of the variance of random intercepts at each level of the model, but mathematical derivations are required to convert these to estimates for the variance of sibling and cousin correlations (online appendix, part 3). It is therefore understandable that prior research has generally omitted uncertainty intervals on key quantities. A Bayesian model provides one solution to this problem.

The model also conducts an adjustment for age that addresses some concerns about life cycle bias: the tendency of substantive conclusions in mobility research to be sensitive to the age at which income is measured (Haider and Solon 2006). The problem arises because incomes are not stable over the life course and because not all individuals report incomes at exactly the same ages. Sibling correlations might appear large simply because the age at which incomes are reported is more similar within than between sibling sets. This source of similarity is a nuisance; sibling similarities that arise because of similarities in the age of measurement are not typically of interest for substantive theories about mobility. The proposed model corrects for this problem to the degree that every individual’s income trajectory is an intercept shift \( {\upmu}_i^{Person} \) away from the population-average income trajectory \( \overrightarrow{\upbeta} \). If we observe incomes at ages 25–30 for one pair of siblings and ages 35–40 for another pair of siblings, the population-average trajectory upwardly adjusts the estimate of \( {\upmu}_{Person}^i \) in the former case and downwardly adjusts \( {\upmu}_{Person}^i \) in the latter case.

The model does not correct for life cycle bias if income trajectories follow different patterns across individuals. If, for instance, individuals with higher long-term earnings also experience steeper earnings profiles, as in the example of Haider and Solon (2006), then a homogeneous population-average trajectory will not correct for this problem. Nonetheless, the problem is not likely to be severe in this example because the reporting ages do not vary dramatically within the analytic sample: across individuals, the interquartile range of the mean age of reporting is 27.0 to 31.4. Similarity in reporting ages across individuals means that, even if the age adjustment is incorrect, results are likely to approximate the sibling and cousin correlation around the ages at which incomes are reported in the sample. The model is especially promising if we benchmark against the most standard approach of averaging income over whatever ages are observed for each individual within a specified age window (e.g., Björklund and Jäntti 2012). Incorporating the age trajectory directly into the model is an improvement over this benchmark because it explicitly adjusts for age differences across individuals in the sample.

Net of life cycle bias, the model addresses one type of measurement error by estimating the observation-level residual variance, \( {\upsigma}_{Observation}^2 \), and omitting this component from calculation of sibling and cousin correlations. The information to estimate \( {\upsigma}_{Observation}^2 \) comes from the fact that each person is observed an average of five times, with an interquartile range from two to seven. If only one observation per person were available, then sibling and cousin correlations would be unable to distinguish the person-specific permanent variance component, \( {\upsigma}_{Person}^2 \), from the observation-level variance component, \( {\upsigma}_{Observation}^2 \). Pulling these apart avoids a problem in which random observation-level reporting error would drive down sibling and cousin correlations (Solon et al. 1991). The model does not resolve problems of nonclassical measurement error. If individuals tend to over- or underreport their incomes in the same direction over time in a way independent of all other individuals, this would inflate \( {\upsigma}_{Person}^2 \) and drive down sibling and cousin correlations. This might happen, for instance, if respondents seek to make themselves look good on the survey by overreporting income whenever they are asked. The tendency to overreport may be correlated among siblings, for instance, if their parents instilled an especially high value on presenting oneself as materially successful. This would inflate \( {\upsigma}_{SiblingSet}^2 \), thereby increasing sibling correlations and reducing cousin correlations. Finally, cousin correlations would be higher if a tendency to overreport is correlated among cousins, thereby inflating \( {\upsigma}_{CousinSet}^2 \). Addressing these types of nonclassical measurement error would likely require new forms of data about how individuals respond to surveys and whether response errors are correlated within families. I leave these tasks to future research. For the present article, a focus on permanent income resolves measurement error only if this error is classical at the level of observations.

Discussion

Sibling and cousin correlations in socioeconomic outcomes serve a useful descriptive purpose, summarizing the degree to which outcomes are similar for those born into the same family. Knowledge of these quantities and their variation across outcomes, over time, and under various institutional arrangements can spur new theories about how these correlations come to be. Sibling and cousin correlations serve as descriptive facts that force us to look at the situation in a new light and develop hypotheses about the relationship between family origins and socioeconomic opportunity.

Distinguishing between various theories, however, is difficult because the range of conceptual processes that could produce sibling and cousin correlations involves far more parameters than the amount of data available. This is in some sense unsurprising: data in one generation can tell us about a process occurring in past generations only under incredibly strong assumptions. To interpret a variance decomposition in terms of sibling and cousin correlations requires assumptions about independence across units. To further interpret those correlations as evidence of a dynamic, latent, or second-order process requires a conceptual argument against the other interpretations equally consistent with the observed data. To conclude that extended kin shape life chances requires assumptions that outcomes have been measured precisely and that the transmission process has unfolded with the same parameter values over the past two generations. A credible defense of these assumptions is difficult to build. For this reason, caution is warranted when using sibling and cousin correlations to draw multigenerational inferences.

Beyond sibling and cousin correlations, the claims of this article relate to a growing body of evidence that various indicators of socioeconomic attainment persist over three or more generations to a greater degree than the Becker-Tomes model would predict (Anderson et al. 2018; Chan and Boliver 2013; Colagrossi et al. 2019; Hällsten and Pfeffer 2017; Hertel and Groh-Samberg 2014; Huang et al. 2015; Jæger 2012; Knigge 2016; Lindahl et al. 2015; Liu 2018; Olivetti et al. 2016; Pfeffer 2014; Pfeffer and Killewald 2015, 2017; Wightman and Danziger 2014; Ziefle 2016). This body of work clearly demonstrates that decades of past research have overstated societal openness by leaving complexities out of the model. Already, others have pointed out that evidence of grandparent-offspring associations conditional on parent characteristics can arise from various processes (Braun and Stuhler 2016; Breen 2018; Ferrie et al. 2016; Solon 2018; Stuhler 2012). I join these authors in arguing that evidence from multigenerational and intragenerational correlations remains unable to pinpoint the precise complexity that should be added to the model. The interpretation favored by some authors (a direct role for grandparents) is only one candidate among many possibilities that could make society more rigid than Becker and Tomes (1979) predicted. One model that is wrong opens the door to numerous new theories that might be more consistent with the observed data.

Proper interpretation of evidence is especially important in the domain of multigenerational mobility because of the theoretical positions at stake. As an extreme example, Clark (2014) noted that in many societies, certain surnames tend to have high average attainment across many generations and concluded that genetic factors must be responsible for this high degree of persistence. Torche and Corvalan (2018) thoroughly critiqued this argument by clarifying that the persistence of group-level attainment (averaged over all those with a given surname) is distinct from the persistence of individual-level attainment, so that the former does not imply the latter. This level of clarity about evidence is obviously essential when a researcher claims that genetics underly mobility processes. I argue that researchers must take a similarly critical view when assessing the evidence on any theory of multigenerational transmission. Sociologists and demographers may be drawn to theories that predict strong socioeconomic persistence because individuals are embedded in complex family networks that constrain opportunity socially from many directions. Supporting such a strong and complex theory, however, requires an accurate statement of the evidence in support of this theory. Researchers must devote at least as much thought to more mundane alternatives, such as measurement error, which could produce the empirical finding.

To apply this lens critically to the work presented here, I cannot rule out direct effects of extended kin on life chances. The actual process by which income is transmitted over generations is likely to be far more complicated than the three simple models considered here. All I can credibly conclude is that the estimated sibling and cousin correlations could have been generated by a dynamic, latent, or second-order process, or something else that I have not considered. With confidence, I can conclude that the evidence does not point cleanly toward any one of these as the correct interpretation. Sibling and cousin correlations provide a useful point of reference, but in the case of multigenerational mobility, the answers they provide are far from definitive.