Introduction

The main empirical trend revisited in this article is the dramatic closing of the observed gender wage gap (GG) that took place in the 1980s. Understanding the forces behind the closing of the observed GG requires measurement of the joint evolution of prices of the observed and unobserved characteristics of men and women and the composition of the labor force in terms of those characteristics. To this end, Mulligan and Rubinstein (2008) (henceforth, MR) estimated the standard (Heckman 1979) selection model for the late 1970s and early 1990s and argued that the change in the composition of working women in terms of their unobservable characteristics (e.g., cognitive ability) accounted for nearly all the closing of the observed gender wage gap (see table 1 in MR, p. 1076).

This finding lends strong support to their main hypothesis, namely, that an exogenous rise in the market value of unobservable characteristics rationalizes the closing of the observed GG via the rise in selection bias.Footnote 1 This hypothesis is appealing for two reasons: (1) it reconciles the long-standing puzzle of gender equality emerging alongside increasing inequality within gender (Blau and Kahn 1997; Katz and Autor 1999); and (2) it is consistent with the rise in the price of unobservable characteristics, which has been identified as an important factor behind increasing wage inequality among men (Juhn et al. 1993). However, the rise in the selection bias estimated by MR is so large that it leaves no role for the joint influence of the other channels. This makes it difficult to reconcile their findings with the related literature, much of which assigns an important role to changes in discrimination broadly defined.Footnote 2

In this article, we reexamine the importance of the selection bias channel relative to other GG-reducing channels by employing the same data set and model as MR used. Our benchmark specification posits that spousal income is a determinant of female participation (see Blau and Kahn 1997; Mroz 1987).Footnote 3 We then show that if one views our benchmark specification as the data-generating process, then the MR specification, which omits spousal income from the participation equation, leads to inconsistent estimates of the wage equation. Formally, because spousal income determines female participation and also correlates with other determinants in the participation equation, omitting it from the participation equation violates the assumption of independent error terms. By emphasizing the strong correlation between spousal income and other female observable characteristics, our work is closely related to the strand of literature on positive assortative matching in marriage markets (Devereux 2004; Greenwood et al. 2014; Karoly 1993).

Our benchmark specification implies drastically different estimates of the wage equations and therefore drastically different implications for the decomposition of the GG closing. In contrast to the MR specification, our estimates imply that the selection bias is positive not only for the 1990s but also for the 1970s. This finding, which is highly significant, actually provides stronger support for the MR hypothesis that growing within-gender residual inequality contributed to the GG closing, for two reasons. First, the positive sign of the 1970s selection bias makes theoretical implications of growing residual inequality consistent with the rise in female labor force participation. Second, the positive sign substantially strengthens the quantitative impact of growing residual inequality on the selection bias term.

With regard to the decomposition of the GG closing, we show that relative to our benchmark, the estimates based on the MR specification significantly overstate the role of the rise in the selection bias (78 % versus 39 % in the benchmark) and understate the role of the decline in discrimination in its broad sense (7 % versus 42 % in the benchmark). By decline in discrimination, we refer to the closing in the gender price difference obtained on observable characteristics, such as educational attainment and years of potential experience. Women’s restricted access to high-paying occupations/tasks in the 1970s should be viewed as a form of discrimination, and it will be expressed as a difference in the market price that observationally equivalent men and women were able to obtain on their observable characteristics. The contribution of the closing of the GG in terms of observable characteristics (10 %) is independent of model specification because it is evaluated using the estimated coefficients from the male regression.

A related study on the decomposition of the GG closing is Blau and Kahn’s (1997). Applying the technique proposed in Juhn et al. (1993) to the PSID data, which does not require estimates of the female wage equations, they attribute an important part to the reduction of the gender difference in the years of actual experience and occupations, as well as the unexplained component. Our technique provides a more detailed look at the GG closing because it also allows us to assess the contribution of movements in the gender price difference. We cannot control for the occupational composition because occupations are not observed for nonworking females. The extent to which women became more similar to men in terms of years of actual experience or occupational choice will be reflected in the reduction of the market price gap that men and women receive on their observable characteristics. Blau and Kahn (2007) also argued that as women increased their attachment to the labor force in the 1980s, the rationale for statistical discrimination against them declined. Many of the structural models applied to the analysis of the GG closing also attribute an important role to reductions in discrimination against women (e.g., Jones et al. 2015).

The rest of the article is organized as follows. In the first two sections, we review the benchmark specification and estimation method, and explain that in view of the benchmark specification as the data-generating process, the MR specification will produce inconsistent estimates of the wage equation. We then report the empirical results and the implied decomposition of the GG closing for both the benchmark and MR specification. We also provide a data-based explanation for why omitting spousal income leads to an understatement of the selection bias. The last section concludes.

Benchmark Model

To revisit the decomposition of the GG closing, we start with the Gronau-Heckman-Roy (GHR) model.Footnote 4 The model postulates a wage offer equation for women,

$$ w^{\ast} =X\upbeta +u, $$
(1)

and a reservation wage equation,

$$r^{\ast} =X^{r}{\upbeta}^{r}+\upalpha I_{h}+\upvarepsilon , $$

where \(w^{\ast}\) and \(r^{\ast}\) denote the latent log wage offer and reservation wage; \(X\) and \([X^{r},I_{h}]\) are their observed determinants; and \(u\) and \(\upvarepsilon\) are the disturbance terms jointly distributed according to

$$\left[ \begin{array}{c} u \\ \upvarepsilon \end{array} \right] \sim N\left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] ,\left[ \begin{array}{cc} {{\upsigma}_{u}^{2}} & {\uprho}_{u\upvarepsilon} {\upsigma}_{u}{\upsigma}_{\upvarepsilon} \\ {\uprho}_{u\upvarepsilon} {\upsigma}_{u}{\upsigma}_{\upvarepsilon} & {\upsigma}_{\upvarepsilon}^{2} \end{array} \right] \right) $$

and satisfying \(E(u|X,X^{r},I_{h})=E(\upvarepsilon |X,X^{r},I_{h})=0\). A woman works if her wage offer \(w^{\ast}\) exceeds her reservation wage \(r^{\ast}\), in which case the observed wage \(w\) equals the wage offer \(w^{\ast}\). The observed wage is otherwise missing.

This GHR model provides the foundation for the standard selection model, which postulates that the wage offer in Eq. (1) is observed only for individuals with a positive participation index:

$$ L^{\ast} =Z\upgamma -\upalpha I_{h}+v. $$
(2)

The GHR model makes explicit the dependence of the participation index, \(L^{\ast}=w^{\ast}-r^{\ast}\), on the observed and unobserved components of the offered and reservation wages: \(Z\upgamma =X\upbeta -X^{r}{\upbeta}^{r}\) and \(v=u-\upvarepsilon\). It follows that the disturbance terms \(u\) and \(v\) are jointly distributed according to

$$ \left[ \begin{array}{c} u \\ v \end{array} \right] \sim N\left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] ,\text{} \left[ \begin{array}{cc} {{\upsigma}_{u}^{2}} & {\uprho}_{uv}{\upsigma}_{u}{\upsigma}_{v} \\ {\uprho}_{uv}{\upsigma}_{u}{\upsigma}_{v} & {{\upsigma}_{v}^{2}} \end{array} \right] \right), $$
(3)

where \({\upsigma }_{v}=\left [ {{\upsigma }_{u}^{2}}+{\upsigma }_{\upvarepsilon }^{2}-2{\uprho }_{u\upvarepsilon } {\upsigma }_{u}{\upsigma }_{\upvarepsilon } \right ]^{1/2}\) and \({\uprho }_{uv}=\frac {{\upsigma }_{u}-{\uprho }_{u\upvarepsilon } {\upsigma }_{\upvarepsilon }} {{\upsigma }_{v}}\), and that the disturbance terms satisfy the exogeneity assumption \(E(u|Z,I_{h})=E(v|Z,I_{h})=0\).
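The mapping from the primitives of the GHR model to the moments of the participation error can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function name and the parameter values are illustrative assumptions, and only the two formulas for \({\upsigma}_{v}\) and \({\uprho}_{uv}\) are taken from the text.

```python
import math


def implied_participation_moments(sigma_u, sigma_eps, rho_ue):
    """Return (sigma_v, rho_uv) for the participation error v = u - eps.

    sigma_u, sigma_eps: standard deviations of u and eps.
    rho_ue: correlation between u and eps.
    """
    var_v = sigma_u**2 + sigma_eps**2 - 2.0 * rho_ue * sigma_u * sigma_eps
    sigma_v = math.sqrt(var_v)
    rho_uv = (sigma_u - rho_ue * sigma_eps) / sigma_v
    return sigma_v, rho_uv


# Hypothetical values: uncorrelated errors with unit variances.
sigma_v, rho_uv = implied_participation_moments(1.0, 1.0, 0.0)
print(sigma_v, rho_uv)

# Hypothetical values for which rho_uv turns negative:
# this requires rho_ue * sigma_eps > sigma_u.
_, rho_uv_neg = implied_participation_moments(0.5, 1.0, 0.8)
print(rho_uv_neg)
```

The second call illustrates that the sign of \({\uprho}_{uv}\) is not pinned down by the model: it is negative whenever \({\uprho}_{u\upvarepsilon}{\upsigma}_{\upvarepsilon}\) exceeds \({\upsigma}_{u}\).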

We will refer to the selection model to be estimated in Eqs. (1)–(3) as our benchmark model. Following Mroz (1987) and Blau and Kahn (1997), we include husband’s income \(I_{h}\) as a regressor in the participation equation in order to emphasize its role explicitly. In light of the negative relationship between female participation and spousal income, we expect to obtain a positive estimate of \(\upalpha\).

The conditional wage equation for the benchmark model in Eqs. (1)–(3) is given by

$$\begin{array}{@{}rcl@{}} E\left[ w^{\ast} |L^{\ast} >0,Z,I_{h}\right] &=&X\upbeta +E\left[ u|v>-\left( Z\upgamma -\upalpha I_{h}\right) ,Z,I_{h}\right] \\ &=&X\upbeta +{\uprho}_{uv}{\upsigma}_{u}\uplambda \left( Z\frac{\upgamma} {{\upsigma}_{v}}- \frac{\upalpha} {{\upsigma}_{v}}I_{h}\right) , \end{array} $$

where λ(⋅)=ϕ(⋅)/Φ(⋅) is the inverse Mills ratio.Footnote 5 In order to follow the MR methodology precisely, we use the Heckman two-step estimation procedure. In the first step, the probit model \(\Pr \left (L^{\ast } >0|Z,I_{h}\right ) ={\Phi } \left (Z\frac {\upgamma } {{\upsigma }_{v}}-\frac { \upalpha } {{\upsigma }_{v}}I_{h}\right ) \) is estimated, which gives the predicted inverse Mills ratio \(\hat {\uplambda }=\uplambda \left (Z\widehat {\left (\frac { \upgamma } {{\upsigma }_{v}}\right )} -\widehat {\left (\frac {\upalpha } {{\upsigma }_{v}} \right )} I_{h}\right )\).Footnote 6 In the second step, the following regression specification is estimated with ordinary least squares (OLS) on the sample of participating women:

$$ w=X\upbeta +{\uprho}_{uv}{\upsigma}_{u}\hat{\uplambda}+\upeta , $$
(4)

where η is the model error in the wage equation plus the prediction error in the estimation of λ(⋅), which is independent of the included regressors. The selection bias is defined as the conditional mean of the error term u in Eq. (1) for working women:

$$ B=E\left[ u|v>-\left( Z\upgamma -\upalpha I_{h}\right) ,Z,I_{h}\right] ={\uprho}_{uv}{\upsigma}_{u}\uplambda \left( Z\frac{\upgamma} {{\upsigma}_{v}}-\frac{\upalpha} { {\upsigma}_{v}}I_{h}\right) . $$
(5)

The sign of this term depends on nonrandom selection into the labor force as determined by \({\uprho}_{uv}\), the correlation between the unobserved characteristics in the wage and participation equations. For example, if \(v\) represents the unobserved quality of schooling, which positively affects both the probability of work and market wages, then the estimate of \({\uprho}_{uv}\), and therefore the selection bias, is likely to be positive. In other words, a participating woman with a high \(\hat {\uplambda }\) is predicted to have a high \(v\) and therefore a high \(u\).
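The dependence of the selection bias in Eq. (5) on the sign of \({\uprho}_{uv}\) can be illustrated with a few lines of code. This is a minimal sketch under assumed values: the function names and the numbers passed to them are hypothetical, and the probit index is taken as already scaled by \({\upsigma}_{v}\).

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def inverse_mills(index):
    """Inverse Mills ratio lambda(index) = phi(index) / Phi(index)."""
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


def selection_bias(rho_uv, sigma_u, index):
    """Selection bias B = rho_uv * sigma_u * lambda(index), as in Eq. (5)."""
    return rho_uv * sigma_u * inverse_mills(index)


# Hypothetical parameter values, chosen only to show the sign pattern.
print(selection_bias(0.5, 0.8, index=0.0))   # positive selection
print(selection_bias(-0.5, 0.8, index=0.0))  # negative selection

# The ratio is larger for women with a low probit index (unlikely to work).
print(inverse_mills(-1.0), inverse_mills(1.0))
```

Because the inverse Mills ratio is always positive, the sign of the bias is the sign of \({\uprho}_{uv}\), while its size is largest for participants whose observables make work unlikely.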

The role of spousal income in the wage equation estimation is also clear. If spousal income negatively affects female participation, then participating women married to high earners (high predicted \(\hat {\uplambda }\)) will be predicted to have a high value of unobservable characteristics v and, under positive selection, a high u.

MR’s (2008) main hypothesis is that the increase in within-gender residual inequality \({\upsigma}_{u}\) improved selection of females into work, thereby raising the selection bias term \(B\) and reducing the observed GG. The GHR model makes the dependence of \({\uprho}_{uv}\) and \({\upsigma}_{v}\) on residual inequality \({\upsigma}_{u}\) explicit and therefore enables us to analyze the effect of rising \({\upsigma}_{u}\) on the overall selection bias term \(B\) as well as female participation. This is done in the upcoming subsection Comparison With the MR Specification.
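The two-step procedure described in this section can be sketched on simulated data. The block below is an illustration only, not the estimation code behind the reported results: the data-generating values, the variable names (`kids`, `husb_inc`), the hand-written probit routine, and the normalization \({\upsigma}_{v}=1\) are all assumptions made for the example.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def norm_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))


def fit_probit(W, y, n_iter=25):
    """Probit maximum likelihood by Fisher scoring; returns coefficients."""
    coef = np.zeros(W.shape[1])
    for _ in range(n_iter):
        idx = W @ coef
        pdf = norm_pdf(idx)
        cdf = np.clip(norm_cdf(idx), 1e-10, 1.0 - 1e-10)
        weight = pdf / (cdf * (1.0 - cdf))
        score = W.T @ (weight * (y - cdf))
        info = (W * (weight * pdf)[:, None]).T @ W
        coef = coef + np.linalg.solve(info, score)
    return coef


# Simulated data (all values hypothetical).
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)                   # stand-in for the regressors X
kids = rng.normal(size=n)                # stand-in for the excluded variable
husb_inc = 0.5 * x + rng.normal(size=n)  # spousal income, correlated with x
rho_uv, sigma_u = 0.5, 0.8               # sigma_v normalized to 1
v = rng.normal(size=n)
u = sigma_u * (rho_uv * v + math.sqrt(1.0 - rho_uv**2) * rng.normal(size=n))
works = (0.2 + 0.5 * x - 0.7 * kids - 0.6 * husb_inc + v > 0).astype(float)
wage = 1.0 + 0.5 * x + u                 # used only where works == 1

# Step 1: probit of participation on Z and spousal income.
W = np.column_stack([np.ones(n), x, kids, husb_inc])
gamma_hat = fit_probit(W, works)
index_hat = W @ gamma_hat
lam_hat = norm_pdf(index_hat) / norm_cdf(index_hat)

# Step 2: OLS of wages on X and the predicted inverse Mills ratio,
# using participants only (Eq. (4)).
sel = works == 1
X2 = np.column_stack([np.ones(sel.sum()), x[sel], lam_hat[sel]])
beta_hat, *_ = np.linalg.lstsq(X2, wage[sel], rcond=None)
print("probit coefficients:", gamma_hat)
print("wage equation (const, x, Mills):", beta_hat)
```

In this setup the probit coefficient on `husb_inc` estimates \(-\upalpha /{\upsigma}_{v}\), and the last element of `beta_hat` estimates \({\uprho}_{uv}{\upsigma}_{u}\), which equals 0.4 under the assumed values. Second-step standard errors would need the usual correction for the estimated regressor, which the sketch omits.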

MR Specification

The MR specification omits spousal income \(I_{h}\) from the participation Eq. (2). In view of the benchmark model with \(\upalpha \neq 0\) as the data-generating process, the MR specification will produce inconsistent estimates of the wage equation because \(I_{h}\) correlates with other observable characteristics, \(Z\), in the data. Formally, if \(\upalpha \neq 0\) but \(I_{h}\) is omitted from the probit, the error term \(v-\upalpha I_{h}\) will correlate with \(Z\), thereby violating the exogeneity assumption of the benchmark model. As a result, the probit estimation of Step 1 will be inconsistent; more importantly, however, the predicted inverse Mills ratios will contain an error, thereby introducing an omitted variables problem in the estimation of the wage equation.Footnote 7 To see this, denote the benchmark estimator of the inverse Mills ratio by \(\hat {\uplambda }=\uplambda \left (Z\widehat {\left (\frac {\upgamma } {{\upsigma }_{v}} \right )} -\widehat {\left (\frac {\upalpha } {{\upsigma }_{v}}\right )} I_{h}\right )\), and denote the estimator based on the incorrect probit specification, \(\Pr \left (L^{\ast } >0|Z\right ) ={\Phi } \left (Z\widetilde {\left (\frac {\upgamma } { {\upsigma }_{v}}\right )} \right ),\) by \(\tilde {\uplambda }=\uplambda \left (Z \widetilde {\left (\frac {\upgamma } {{\upsigma }_{v}}\right )} \right ) \). We can write \(\hat {\uplambda }=\tilde {\uplambda }+\zeta \), where \(\zeta\) is the error in the predicted inverse Mills ratio (i.e., error in variable). Substituting \( \hat {\uplambda }\) into the wage regression (Eq. (4)) helps illustrate that the second-stage estimation in the MR specification will suffer from an omitted variables problem:

$$ w=X\upbeta +{\uprho}_{uv}{\upsigma}_{u}\tilde{\uplambda}+\tilde{\upeta}, $$
(6)

where \(\tilde {\upeta }={\uprho }_{uv}{\upsigma }_{u}\zeta +\upeta \). All estimators will be inconsistent. Although the direction of bias cannot be derived analytically, we provide an intuitive data-based discussion in the upcoming section, Spousal Income and the 1970s Selection.

Empirical Results

We use the benchmark model in Eqs. (1)–(3) to repeat the estimation of wage equations in MR for the 1970s and the 1990s. We follow precisely the methodology in MR, using the Heckman two-step estimation procedure. We find that α significantly differs from 0 in both periods and that spousal income significantly correlates with most of the covariates in the participation equation, which provides support for our specification.

Relative to the benchmark model, the estimates based on the MR specification significantly overstate the role of the rise in the selection bias and understate the role of the decline in discrimination in its broad sense. Furthermore, in contrast to the MR specification, the estimates based on the benchmark model imply that the selection bias is positive for both the 1970s and the 1990s. This finding is highly significant. It also provides much stronger support for MR’s hypothesis that growing within-gender residual inequality contributed to the GG closing because it makes the theoretical implications of growing inequality consistent with increasing labor force participation and strengthens its quantitative impact on the selection bias.

Sample and Summary Statistics

In our sample restrictions and choice of variables, we follow MR (2008: appendix 1), with a single exception that we consider only married women with spouses reported to have positive incomes.Footnote 8 This exception is motivated by the important role of spousal income in the benchmark model, and it is warranted in light of the fact that the closing of the GG was driven almost exclusively by married women. Without this additional restriction, we can accurately replicate the results in MR, and their main results change only slightly with this additional restriction.

Our sample is from the March Current Population Survey (CPS) for 1975–1979 and 1995–1999. We apply the estimation procedure separately to these two periods. Among the regressors in \(X\), we include the location dummy variables (Midwest, South, and West, with Northeast being the omitted category), dummy variables for six education groups (high school dropouts 0–8, high school dropouts 9–11, high school graduates, college graduates, and advanced degree, with some college being the omitted category), four potential experience terms (\(exp=\max \left \{0, \,\,\textit {age}-\textit {schooling}-7\right \} -15,\frac {exp^{2}}{100},\frac {exp^{3}}{1,000},\frac { exp^{4}}{10,000}\)), and each experience term interacted with education dummy variables. Summary statistics for the two estimation periods are reported in Table 3 in the appendix.
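The four potential experience terms can be constructed as follows. This is a small sketch of the formula stated above; the function name and the example inputs are illustrative assumptions, and the centering at 15 years is read directly from the expression in the text.

```python
def experience_terms(age, schooling):
    """Return the four potential experience regressors.

    exp = max(0, age - schooling - 7) - 15, followed by its rescaled
    square, cube, and fourth power.
    """
    exp = max(0, age - schooling - 7) - 15
    return (exp, exp**2 / 100, exp**3 / 1_000, exp**4 / 10_000)


# Hypothetical person: 40 years old with 12 years of schooling.
print(experience_terms(40, 12))

# Hypothetical person with no potential experience yet.
print(experience_terms(18, 12))
```

Dividing the higher powers by 100, 1,000, and 10,000 keeps the regressors on comparable scales, which helps the numerical stability of the probit and OLS steps.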

Estimation of the Benchmark Model

We run the probit model on females to estimate their selection into full-time, full-year (FTFY) status. The regressors include spousal income \(I_{h}\), measured in thousands of 2000 dollars, and \(Z\), which includes all the regressors in \(X\) as well as the number of children aged 6 and younger.Footnote 9

Wage regressions are estimated on the sample of FTFY employees. The sample is restricted to civilian wage workers with nonmissing and positive wage income. We also exclude those with wage income classified as an outlier, the self-employed, agricultural workers, and private household employees.Footnote 10 We separately estimate the conditional female wage equation given in Eq. (4) and the unconditional male wage equation given by

$$ w^{m}=X{\upbeta}^{m}+u. $$
(7)

Table 4 in the appendix contains the estimated coefficients for the male and female wage equations, given in Eqs. (7) and (4), as well as the female wage equations reestimated for the MR specification, for both periods of estimation. These are discussed in the context of GG accounting in the next two subsections.

The first-step estimates, omitted for brevity, reveal that the marginal effects of spousal income (in thousands of 2000 dollars) on female participation are negative and significant at –0.00245 and –0.00129 for the 1970s and 1990s, respectively. Therefore, we strongly reject the hypothesis that \({\upalpha}_{t}=0\) against the alternative, \({\upalpha}_{t}>0\), for both periods of estimation (p values are .00007 and .00005, respectively).

We also strongly reject the hypothesis that spousal income is independent of the covariates in Z. To do so, we regress spousal income on the components of Z. For both periods, the coefficients on all but a few of the interaction terms are highly significant, with p values below .001.

If one assumes our benchmark model specification as the data-generating process, this evidence supports the conclusion that the MR estimates of the wage equation suffer from inconsistency and will lead to an error in the GG accounting (see discussion in the “MR Specification” section). Next, we show that this error makes a quantitatively important difference for the decomposition of the GG closing.

Gender Gap Accounting

Given the properties of OLS estimators, averaging the fitted wage equations over the appropriate samples gives the observed mean log wages for men and women:

$$\begin{array}{@{}rcl@{}} \bar{w}_{t}^{m} &=&\bar{X}_{t}^{m}\hat{\upbeta}_{t}^{m}, \\ \bar{w}_{t}^{f} &=&\bar{X}_{t}^{f}\left( \hat{\upbeta}_{t}^{m}+\hat{\upgamma} _{t}\right) +\left( \widehat{{\uprho}_{uv}{\upsigma}_{u}}\right)_{t}\overline{ \hat{\uplambda}}_{t}, \end{array} $$

where \(\hat {\upbeta }_{t}^{m}\) is a vector of estimated coefficients in the male equation (Eq. (7)), \(\hat {\upgamma }_{t}\equiv \hat {\upbeta }_{t}^{f}-\hat { \upbeta }_{t}^{m}\) denotes the difference between the vectors of estimated coefficients in the female wage equation (Eq. (4)) and the male equation (Eq. (7)), and \(\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}}\right )_{t}\) is the estimated coefficient on the inverse Mills ratio. Note that \(\hat {\upgamma }_{t}\) here is a gender difference in wage coefficients and is distinct from the participation coefficients \(\upgamma\) in Eq. (2). Bars denote sample averages. The last term in the fitted female equation gives the average estimated selection bias defined in Eq. (5).

We can then decompose the observed GG at a given point in time into its constituent components according to

$$ GG_{t}=\bar{w}_{t}^{f}-\bar{w}_{t}^{m}=\left( \bar{X}_{t}^{f}-\bar{X} _{t}^{m}\right) \hat{\upbeta}_{t}^{m}+\bar{X}_{t}^{f}\hat{\upgamma}_{t}+\left( \widehat{{\uprho}_{uv}{\upsigma}_{u}}\right)_{t}\overline{\hat{\uplambda}}_{t}. $$
(8)

To facilitate the discussion, we enumerate the terms appearing on the right side of the equation as follows:Footnote 11

  1. The female–male difference in terms of observable characteristics, \( \left (\bar {X}_{t}^{f}-\bar {X}_{t}^{m}\right ) \hat {\upbeta }_{t}^{m}\).

  2. The female–male difference in terms of market prices applied to female observable characteristics, \(\bar {X}_{t}^{f}\hat {\upgamma }_{t}\).

  3. The average value of unobservable characteristics of working women as inferred from their decision to work, \(\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}} \right )_{t}\overline {\hat {\uplambda }}_{t}\); that is, the estimate of \({\uprho}_{uv}{\upsigma}_{u}E(v|L^{\ast}>0,Z,I_{h})\).

Because we focus on the overall contributions of terms 1–3, our decomposition does not suffer from the identification issues discussed in Oaxaca and Ransom (1999).
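The three terms of Eq. (8) can be computed directly from sample means and estimated coefficients. The sketch below uses made-up numbers purely to show the bookkeeping; the function name and all inputs are hypothetical and do not correspond to any entry in Table 1.

```python
def decompose_gap_level(xbar_f, xbar_m, beta_m, beta_f, mills_coef, mills_mean):
    """Return terms 1-3 of Eq. (8) for one period.

    xbar_f, xbar_m: mean regressors for working women and for men.
    beta_m, beta_f: estimated male and female wage coefficients.
    mills_coef: estimated coefficient on the inverse Mills ratio.
    mills_mean: sample mean of the predicted inverse Mills ratio.
    """
    gamma = [bf - bm for bf, bm in zip(beta_f, beta_m)]
    term1 = sum((xf - xm) * bm for xf, xm, bm in zip(xbar_f, xbar_m, beta_m))
    term2 = sum(xf * g for xf, g in zip(xbar_f, gamma))
    term3 = mills_coef * mills_mean
    return term1, term2, term3


# Toy inputs: a constant and one regressor (all values hypothetical).
xbar_f, xbar_m = [1.0, 0.4], [1.0, 0.5]
beta_m, beta_f = [2.0, 0.3], [1.5, 0.2]
terms = decompose_gap_level(xbar_f, xbar_m, beta_m, beta_f, 0.1, 1.2)

# The terms add up to the difference in fitted mean log wages.
mean_w_f = sum(x * b for x, b in zip(xbar_f, beta_f)) + 0.1 * 1.2
mean_w_m = sum(x * b for x, b in zip(xbar_m, beta_m))
print(terms, sum(terms), mean_w_f - mean_w_m)
```

The final line confirms that the decomposition is an accounting identity: the three terms sum to the fitted gap regardless of the values used.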

Table 1 reports the observed gender gap decomposition, based on Eq. (8), for the late 1970s and the early 1990s. The first panel reports the decomposition implied by the benchmark model. The second panel, discussed in the next section, gives the decomposition implied by the MR specification.

Table 1 Decomposition of GG levels in the 1970s and 1990s

The observed GG for the 1970s is −0.47. More than 100 % of this gap is due to the second term, which captures the effect of the market price difference \(\left (\bar {X}_{70}^{f}\hat {\upgamma }_{70}=-.58\right )\). The average selection bias is positive at 0.12, working to reduce the observed gap. Finally, the contribution of the first term \(\left (\bar {X}_{70}^{f}- \bar {X}_{70}^{m}\right ) \hat {\upbeta }_{70}^{m}\) is close to 0, which can be interpreted in one of two ways: (1) women who selected themselves into formal labor markets were similar to men in terms of their observed characteristics, or (2) those differences did not translate into a significant difference in compensation, as evaluated with male coefficients.

The observed GG for the 1990s is much smaller, at around −0.25. Once again, the gender difference in market prices, applied to observable characteristics, is the single main factor behind the observed gap. The average selection bias is much greater, at 0.21, indicating that the positive selection into the labor force in terms of unobserved characteristics became substantially stronger over time. Next, we explore the GG closing in more detail.

The change in the GG can be decomposed exactly into six components—namely, the change in quantity and the change in price for each of the three terms outlined previously:

$$\begin{array}{@{}rcl@{}} && \left( \bar{w}_{90s}^{f}-\bar{w}_{90s}^{m}\right) -\left( \bar{w}_{70s}^{f}-\bar{w}_{70s}^{m}\right) \\ && =\left[ \left( \bar{X}_{90s}^{f}-\bar{X}_{90s}^{m}\right) -\left( \bar{X}_{70s}^{f}-\bar{X}_{70s}^{m}\right) \right] \frac{\hat{\upbeta}_{70s}^{m}+\hat{\upbeta}_{90s}^{m}}{2} \\ && +\frac{\left( \bar{X}_{90s}^{f}-\bar{X}_{90s}^{m}\right) +\left( \bar{X}_{70s}^{f}-\bar{X}_{70s}^{m}\right)} {2}\left( \hat{\upbeta}_{90s}^{m}-\hat{\upbeta}_{70s}^{m}\right) \\ && +\left( \bar{X}_{90s}^{f}-\bar{X}_{70s}^{f}\right) \frac{\hat{\upgamma}_{70s}^{w}+\hat{\upgamma}_{90s}^{w}}{2}+\frac{\bar{X}_{90s}^{f}+\bar{X}_{70s}^{f}}{2}\left( \hat{\upgamma}_{90s}^{w}-\hat{\upgamma}_{70s}^{w}\right) \\ && +\frac{\left( \widehat{{\uprho}_{uv}{\upsigma}_{u}}\right)_{90s}+\left( \widehat{{\uprho}_{uv}{\upsigma}_{u}}\right)_{70s}}{2}\left( \overline{\hat{\uplambda}}_{90s}^{w}-\overline{\hat{\uplambda}}_{70s}^{w}\right) +\left( \left( \widehat{{\uprho}_{uv}{\upsigma}_{u}}\right)_{90s}-\left( \widehat{{\uprho}_{uv}{\upsigma}_{u}}\right)_{70s}\right) \frac{\overline{\hat{\uplambda}}_{90s}^{w}+\overline{\hat{\uplambda}}_{70s}^{w}}{2}. \end{array} $$
(9)
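Each pair of terms in Eq. (9) applies the same midpoint identity to a product of a quantity and a price. The following sketch shows that identity for a single product; the function names and the numbers are illustrative assumptions and are not taken from Table 2.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def midpoint_split(a70, b70, a90, b90):
    """Split a90.b90 - a70.b70 into two exact components.

    Returns (change in a valued at the average b,
             change in b valued at the average a).
    """
    d_a = [x90 - x70 for x70, x90 in zip(a70, a90)]
    d_b = [y90 - y70 for y70, y90 in zip(b70, b90)]
    avg_a = [(x70 + x90) / 2 for x70, x90 in zip(a70, a90)]
    avg_b = [(y70 + y90) / 2 for y70, y90 in zip(b70, b90)]
    return dot(d_a, avg_b), dot(avg_a, d_b)


# Hypothetical example in the spirit of term 3: a is the coefficient on the
# inverse Mills ratio, b is the mean inverse Mills ratio.
coef70, mills70 = [0.1], [1.2]
coef90, mills90 = [0.3], [0.7]
coef_part, mills_part = midpoint_split(coef70, mills70, coef90, mills90)
total = dot(coef90, mills90) - dot(coef70, mills70)
print(coef_part, mills_part, total)
```

With these toy values the coefficient component is positive and the Mills-ratio component is negative, and the two add up exactly to the total change. Applying the same split to terms 1, 2, and 3 gives the six components of Eq. (9).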

In Table 2, we report the formal decomposition of the GG closing based on Eq. (9).Footnote 12 The decomposition based on the MR specification is reported in the same table for comparison and is discussed in the next section. The change in the observed GG is 0.222, reported in the first row. The sum of terms 1a and 1b gives the exact increase in term 1, \(\left (\bar {X}_{t}^{f}- \bar {X}_{t}^{m}\right ) \hat {\upbeta }_{t}^{m}\). This increase is 0.022, and it summarizes the change in the observable characteristics and prices at which these characteristics are valued in markets for male labor. This change is almost entirely due to the closing of the gender difference in terms of observable characteristics (term 1a), and it accounts for 11 % of the total GG closing. In other words, in terms of observable characteristics relative to men, working women fared relatively well in the 1990s. This change can be interpreted as either the change in selection on observable characteristics or the change in investments, such as education. This change, however, is a small part of the story. As expected given our prior discussion of the decomposition of the GG, increases in terms 2 and 3 (\(\bar {X}_{t}^{f}\hat { \upgamma }_{t}\) and \(\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}}\right )_{t}\overline {\hat {\uplambda }}_{t}\)) were the main drivers behind the closing of the overall gap. Table 2 reveals that these terms accounted for 51 % (= 9 % + 42 %) and 39 % (= 67 % – 28 %) of the GG closing, respectively.

Table 2 Decomposition of the GG closing, \(GG_{1990s}-GG_{1970s}\)

Taking a closer look at term 2, we examine whether the increase is due to the change in \(\hat {\upgamma }\) – that is, the decline in discrimination in its broad senseFootnote 13 – or instead due to \(\bar {X}^{f}\) – that is, whether working women’s observable characteristics shifted in favor of those associated with less discrimination. We document that the term \(\bar {X}_{t}^{f}\hat {\upgamma }_{t}\), which was always negative (because of the negative components of \(\hat {\upgamma }\)) but became less negative over time, increased primarily because of the rise in \(\hat {\upgamma }\). Indeed, this happened not so much because working women’s observable characteristics shifted in favor of those associated with less discrimination (the term \( \left (\bar {X}_{90s}^{f}-\bar {X}_{70s}^{f}\right ) \frac {\hat {\upgamma } _{70s}^{w}+\hat {\upgamma }_{90s}^{w}}{2}\) accounts for only 9 % of the GG closing), but rather because the market valuation of observable characteristics of females partly converged to that of males (the term \( \frac {\bar {X}_{90s}^{f}+\bar {X}_{70s}^{f}}{2}\left (\hat {\upgamma }_{90s}^{w}- \hat {\upgamma }_{70s}^{w}\right ) \) accounts for as much as 42 % of the GG closing). The rise in the components of \(\hat {\upgamma }\) is consistent with the effect of the introduction of antidiscrimination laws. It is also consistent with the fall in the relative wages in typically male-dominated occupations, the change in the occupational composition of females in favor of high-paying occupations, and the rise in the relative years of actual experience of females (Blau and Kahn 1997, 2007).Footnote 14 Note that we cannot use occupational dummy variables in \(X\) because occupations are not observed for nonworking females.

Taking a closer look at term 3, \(\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}} \right )_{t}\overline {\hat {\uplambda }}_{t}\), we examine whether the term increased because of \(\widehat {{\uprho }_{uv}{\upsigma }_{u}}\) or \(\overline {\hat { \uplambda }}\). Our estimates imply that this term grew entirely as a result of the rise in the estimated coefficient on the inverse Mills ratio. Indeed, the term \(\left (\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}}\right )_{90s}-\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}}\right )_{70s}\right ) \frac {\overline {\hat {\uplambda }}_{90s}+\overline {\hat {\uplambda }}_{70s}}{2}\) accounts for 67 % of the overall gap closing. This means that the estimate of ρ u v σ u increased substantially, and the already positive selection of females into the labor force got stronger. In fact, the term \(\frac {\left (\widehat { {\uprho }_{uv}{\upsigma }_{u}}\right )_{90s}+\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}} \right )_{70s}}{2}\left (\overline {\hat {\uplambda }}_{90s}-\overline {\hat {\uplambda }}_{70s}\right ) \) worked against the GG closing, thus dampening the overall effect of the term \(\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}}\right )_{t}\overline {\hat {\uplambda }}_{t}\). The intuition for this is simple. Because the inverse Mills ratio λ decreases in female participation, the effect of increasing female participation on the observed wage gap is negative in the presence of positive selection. It widens the gap and dampens the role of increasing selection bias in accounting for the overall gap closing. This result is important. If selection is erroneously found to be negative for the 1970s, the increase in female participation will be found to diminish the gap via the term \(\frac {\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}}\right )_{70s}}{2}d\overline {\hat {\uplambda }}\), and the role of increasing selection bias in accounting for the closing of the overall gap will be overstated. 
We will elaborate on this point in the next subsection, when we examine the GG closing decomposition implied by the estimation of the MR specification.

Comparison With the MR Specification

As reported in the section entitled Estimation of the Benchmark Model, we found strong evidence favoring the benchmark specification with \(\upalpha >0\). We also found that the MR specification violates the exogeneity assumption and leads to inconsistency of wage equation estimates. We now explore the implications of this misspecification for the decomposition of the GG closing. We report that this misspecification leads to an understatement of the selection bias in the 1970s and consequently overstates the role of the rising selection bias in the GG closing. The implications of the benchmark specification are better aligned with previous findings in the related literature (as noted in the Introduction). Moreover, we explain why the benchmark specification actually lends stronger support to MR’s main hypothesis that rising within-gender inequality (\({\upsigma}_{u}\)) contributed to the change in women’s time allocation and the closing of the GG.

To make a fair comparison, we reestimate the MR specification on our sample, which differs from the sample used in MR only because we restrict attention to married women. Note that in the absence of this additional restriction, we can accurately replicate the MR estimates; the sample change does not significantly alter MR’s main message.

To estimate the MR specification, we follow the same two-step procedure as in the estimation of the benchmark model, except we exclude spousal income \(I_{h}\) from the probit estimation in the first stage. The wage estimates are reported in Table 4 in the appendix, along with the benchmark estimates. The bias is especially strong in the 1970s, when spousal income played a particularly important role in female participation choice. One critical difference is in the sign of the selection bias, given by the coefficient on the inverse Mills ratio, \(\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}}\right )_{70s}\), shown in bold type in Table 4. Whereas the MR specification estimate is −0.07, the benchmark model implies a positive estimate of 0.107.Footnote 15 In the next subsection, we outline the data features that are responsible for this discrepancy.

The sign of the selection bias is very important for the theoretical argument underlying the main hypothesis in MR, namely, that the rise in within-gender residual inequality \({\upsigma}_{u}\) induced changes in female time allocation between market and nonmarket activities and increased the selection bias \(B={\uprho}_{uv}{\upsigma}_{u}\uplambda(\cdot)\), thereby closing the observed gender gap. In the appendix, we derive the effects of \({\upsigma}_{u}\) on female labor force participation and selection bias, drawing on the GHR model underlying our benchmark specification. We show that both effects crucially depend on the sign of the selection bias. Precisely, we obtain

$$ \frac{\partial} {\partial {\upsigma}_{u}}\Pr \left( L^{\ast} >0|Z,I_{h}\right) =\upphi \left( \frac{Z\upgamma -\upalpha I_{h}}{{\upsigma}_{v}}\right) \left[ -\left( \frac{Z\upgamma -\upalpha I_{h}}{{{\upsigma}_{v}^{2}}}\right) \right] {\uprho}_{uv} $$
(10)

and

$$ \frac{\partial B}{\partial {\upsigma}_{u}}=\left[ \left( \frac{1-{\uprho}_{uv}^{2}} {{\upsigma}_{v}}\right) {\upsigma}_{u}+{\uprho}_{uv}\right] \uplambda +\left[ -{\uplambda}^{\prime} \left( \frac{Z\upgamma -\upalpha I_{h}}{{\upsigma}_{v}}\right) \right] \left( \frac{Z\upgamma -\upalpha I_{h}}{{{\upsigma}_{v}^{2}}}\right) {\uprho}_{uv}^{2}{\upsigma}_{u}. $$
(11)

Because most women (74 %) were out of the labor force in the late 1970s, the average and median probit scores appearing in Eq. (10) were negative, implying a positive bracketed term. Since ϕ is positive, the sign of \({\uprho}_{uv}\) determined the qualitative effect of inequality on participation probability for most women and for a woman with average characteristics. Only under positive selection, as implied by our benchmark model, would rising inequality encourage their participation, making the MR hypothesis consistent with the empirical rise in female participation. Intuitively, with positive selection, an increase in \({\upsigma}_{u}\) also implies an increase in the variance of the error term v = u − ε, thereby making both low and high draws of v more likely. For a woman with a negative probit score, high draws of v are needed in order for her to work. Because these draws are now more likely to happen, she is more likely to participate.
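The sign argument for Eq. (10) can be checked numerically. The following is a minimal sketch, not part of the original estimation: the function name `participation_effect` and all parameter values are hypothetical and chosen only to represent a woman with a negative probit score.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def participation_effect(index: float, sigma_v: float, rho_uv: float) -> float:
    """Right-hand side of Eq. (10): d Pr(L* > 0 | Z, I_h) / d sigma_u.

    index stands for Z*gamma - alpha*I_h; all inputs are hypothetical values.
    """
    bracketed_term = -(index / sigma_v ** 2)
    return normal_pdf(index / sigma_v) * bracketed_term * rho_uv


# Hypothetical woman with a negative probit score (illustrative values only).
index, sigma_v = -0.6, 1.0
effect_positive_selection = participation_effect(index, sigma_v, rho_uv=0.3)
effect_negative_selection = participation_effect(index, sigma_v, rho_uv=-0.3)

print(effect_positive_selection, effect_negative_selection)
```

With a negative index, the computed effect takes the sign of the assumed correlation between u and v, which is the point made in the text.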

Equation (11) reveals that the overall effect of inequality on the selection bias is ambiguous. Even if \({\uprho}_{uv}>0\), making the first term unambiguously positive, the second term is negative for women with negative probit scores because \({\uplambda}^{\prime}\) is negative. It is, however, clear that a positive \({\uprho}_{uv}\) makes the effect of inequality on the selection bias more likely to be positive. For values of \({\uprho}_{uv}\) that render the effect of inequality on the selection bias positive, the effect is stronger for larger values of \({\uprho}_{uv}\). Computing the average \(\frac {\partial B}{\partial {\upsigma }_{u}}\) based on the estimates of the benchmark model and the MR specification for the 1970s gives 0.8 and 0.18, respectively. In other words, while both specifications imply a positive influence of within-gender inequality on the selection bias, the effect is substantially stronger in the benchmark model.Footnote 16
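As an illustration of Eq. (11), the sketch below evaluates its two terms and compares their sum with a finite-difference derivative of B. It rests on assumptions that go beyond the text: the inverse Mills ratio is taken to be λ(z) = φ(z)/Φ(z), and σ_v and ρ_uv are parameterized through v = u − ε with a hypothetical standard deviation `sigma_eps` and correlation `rho_ue` between u and ε. All numerical values are illustrative, not estimates.

```python
import math


def pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mills(z):
    """Inverse Mills ratio, assumed here to be pdf(z) / cdf(z)."""
    return pdf(z) / cdf(z)


def mills_prime(z):
    """Derivative of the inverse Mills ratio; always negative."""
    lam = mills(z)
    return -lam * (z + lam)


def moments(sigma_u, sigma_eps, rho_ue):
    """sigma_v and rho_uv implied by v = u - eps (hypothetical parameterization)."""
    sigma_v = math.sqrt(sigma_u**2 - 2.0 * rho_ue * sigma_u * sigma_eps + sigma_eps**2)
    rho_uv = (sigma_u - rho_ue * sigma_eps) / sigma_v
    return sigma_v, rho_uv


def bias(sigma_u, index, sigma_eps, rho_ue):
    """Selection bias B = rho_uv * sigma_u * lambda(index / sigma_v)."""
    sigma_v, rho_uv = moments(sigma_u, sigma_eps, rho_ue)
    return rho_uv * sigma_u * mills(index / sigma_v)


def eq11_terms(sigma_u, index, sigma_eps, rho_ue):
    """The two additive terms on the right-hand side of Eq. (11)."""
    sigma_v, rho_uv = moments(sigma_u, sigma_eps, rho_ue)
    z = index / sigma_v
    first = ((1.0 - rho_uv**2) / sigma_v * sigma_u + rho_uv) * mills(z)
    second = -mills_prime(z) * (index / sigma_v**2) * rho_uv**2 * sigma_u
    return first, second


# Illustrative values: negative probit index, positive selection.
sigma_u, index, sigma_eps, rho_ue = 0.5, -0.6, 1.0, 0.2
first_term, second_term = eq11_terms(sigma_u, index, sigma_eps, rho_ue)
analytic = first_term + second_term

h = 1e-6
numeric = (bias(sigma_u + h, index, sigma_eps, rho_ue)
           - bias(sigma_u - h, index, sigma_eps, rho_ue)) / (2.0 * h)

print(first_term, second_term, analytic, numeric)
```

Under these assumed values the first term is positive and the second is negative, so the net sign depends on their relative magnitudes, in line with the ambiguity noted in the text.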

The GG-level decomposition implied by the estimation of the MR specification is reported in Table 1. Clearly, the contribution of term 1 to the GG is unaffected by model specification in either of the two periods because the estimates \(\hat {\upbeta }_{t}^{m}\) are obtained by running an OLS on male wages. The main difference is that the MR specification attributes a much larger role to the selection bias in accounting for the overall level of the wage gap in the 1970s. Selection bias is estimated to be negative in the 1970s, and it accounts for 17 % of the observed GG. The upshot is that it allocates a smaller role to the term \(\bar {X}_{70s}^{f}\hat {\upgamma }_{70s}\), at only 81 %, in contrast to 124 % as implied by the benchmark model.

The decomposition of the overall gap closing implied by the MR specification is reported in Table 2. The contribution of term 1 to the GG closing is unaffected by the model specification because this term depends only on coefficient estimates in the male wage equation. Our main finding is that, relative to the benchmark model, the MR specification significantly understates the role of changes in discrimination and drastically overstates the role of the rise in selection bias. Precisely, term 2 accounts for only 11 % (= 4 % + 7 %) of the GG closing in the misspecified model, whereas the benchmark model attributes a much larger role (51 % = 9 % + 42 %) to that term. Term 3, which captures the overall selection bias, accounts for 78 % (= 82 % – 4 %) of the GG closing, whereas the benchmark model attributes a much smaller role (39 % = 67 % – 28 %) to that term.

Taking a closer look at term 3, we find that the role of its components, \( \frac {\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}}\right )_{90s}+\left (\widehat { {\uprho }_{uv}{\upsigma }_{u}}\right )_{70s}}{2}d\overline {\hat {\uplambda }}\) and \( d\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}}\right ) \frac {\overline {\hat {\uplambda }}_{90s}+\overline {\hat {\uplambda }}_{70s}}{2}\), is overstated (82 % vs. 67 % as predicted by the benchmark model for the first component, with respective percentages of –4 % vs. –28 % for the second component). This result is important. In contrast to the benchmark model, the misspecified model estimates negative selection in the 1970s, \(\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}}\right )_{70s}<0\); hence, the increase in female participation in the data works to close the gap via term \(\frac {\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}}\right )_{70s}}{2}d\overline {\hat { \uplambda }}\), thereby overstating the overall role of increasing selection bias.Footnote 17

Taking a closer look at term 2, we find that the discrepancy is due to the term \(\frac {\bar {X}_{90s}^{f}+\bar {X}_{70s}^{f}}{2}d\hat {\upgamma }^{w}\), which most directly captures the change in discrimination, in its broad sense. Its role is significantly understated by the misspecified model (7 % vs. 42 % found in the benchmark model).

We conclude that the benchmark model makes a much stronger case for the change in discrimination and a weaker case for the increase in the selection bias.

Spousal Income and Selection in the 1970s

As explained in the section entitled MR Specification, if α≠0, the second-stage estimation in the MR specification will suffer from an omitted variables problem, resulting in inconsistent estimators. We also noted that the critical difference is in the estimator of the selection bias term for the 1970s. In this section, we provide a qualitative description of data features responsible for this discrepancy.

Our goal is to understand why omitting spousal income from the probit estimation may reverse the sign of the estimated coefficient on the inverse Mills ratio from positive to negative.

To convey the intuition more effectively, which relies on the empirical relationship between spousal income and the other main determinants of participation, we consider a simple specification of the benchmark model. In this model, college attainment (s) is the only explanatory variable in the wage equation, and participation depends on college attainment (s), spousal income (\(I_{h}\)), and the number of small children (ch). We refer to this model as the “simple benchmark”; we refer to the corresponding specification that omits spousal income as the “simple MR.”

The 1970s estimates of these two simple models are reported in columns 1–2 and 4–5 of Table 5 in the appendix. As in the case of the benchmark specification, the estimated coefficient on the inverse Mills ratio in the simple benchmark model is positive, \(\widehat {{\uprho }_{uv}{\upsigma }_{u}}=0.167\). Assuming that the simple benchmark model is the true data-generating process, this estimator correctly recovers positive selection. When we omit spousal income from the simple benchmark, we estimate negative selection, \( \widetilde {{\uprho }_{uv}{\upsigma }_{u}}=-0.082\). We now examine which data features are responsible for this negative estimate.

Recall that the OLS estimator \(\widetilde {{\uprho }_{uv}{\upsigma }_{u}}\) can be mathematically represented as

$$ \widetilde{{\uprho}_{uv}{\upsigma}_{u}}=\left[ \frac{\text{SD}\left( u\right)} { \text{SD}\left( \tilde{\uplambda}\right)} \right] \left[ \frac{\text{corr} \left( u,\tilde{\uplambda}\right) -\text{corr}\left( u,s\right) \text{corr} \left( s,\tilde{\uplambda}\right)} {1-\text{corr}\left( s,\tilde{\uplambda} \right)^{2}}\right] , $$
(12)

where \(\text{corr}\left (u,\tilde {\uplambda }\right )\), \(\text{corr}\left (u,s\right )\), and \(\text{corr}\left (s,\tilde {\uplambda }\right )\) denote sample correlations between unobservable characteristics and predicted inverse Mills ratios, unobservable characteristics and college attainment, and college attainment and predicted inverse Mills ratios, in that order; and \(\text{SD}\left (u\right )\) and \(\text{SD}\left (\tilde {\uplambda }\right )\) refer to the sample standard deviations of unobservable characteristics and predicted inverse Mills ratios. This expression clarifies that the sign of \(\widetilde{{\uprho}_{uv}{\upsigma }_{u}}\) is determined by the sign of \(\text{corr}\left (u,\tilde {\uplambda }\right ) -\text{corr}\left (u,s\right ) \text{corr}\left (s,\tilde {\uplambda }\right )\).
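The right-hand side of Eq. (12) is the standard two-regressor expression for an OLS coefficient. The sketch below checks this on simulated data: it computes the expression from sample correlations and standard deviations and compares it with the coefficient on the second regressor obtained from the normal equations. The data-generating values and helper names are hypothetical, and the simulated series only stand in for u, s, and the predicted inverse Mills ratio.

```python
import math
import random


def mean(x):
    return sum(x) / len(x)


def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def sd(x):
    return math.sqrt(cov(x, x))


def corr(x, y):
    return cov(x, y) / (sd(x) * sd(y))


def coef_from_correlations(u, s, lam):
    """Right-hand side of Eq. (12)."""
    r_ul, r_us, r_sl = corr(u, lam), corr(u, s), corr(s, lam)
    return (sd(u) / sd(lam)) * (r_ul - r_us * r_sl) / (1.0 - r_sl**2)


def ols_coef_on_lam(u, s, lam):
    """Coefficient on lam from regressing u on a constant, s, and lam."""
    a, b, c = cov(s, s), cov(s, lam), cov(lam, lam)
    d, e = cov(s, u), cov(lam, u)
    return (a * e - b * d) / (a * c - b * b)


# Simulated stand-ins (hypothetical values, not the article's data).
rng = random.Random(0)
n = 2000
s = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(n)]
lam = [0.8 - 0.4 * si + rng.gauss(0.0, 0.3) for si in s]
u = [-0.1 * li + 0.05 * si + rng.gauss(0.0, 0.5) for li, si in zip(lam, s)]

from_formula = coef_from_correlations(u, s, lam)
from_ols = ols_coef_on_lam(u, s, lam)
print(from_formula, from_ols)
```

The two numbers agree up to floating-point error, since the correlation form is an algebraic rearrangement of the normal equations.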

Although u is unobservable, the estimates from the simple benchmark model, assumed to generate data, can be used to estimate u for working women:

$$ \hat{u}=\widehat{{\uprho}_{uv}{\upsigma}_{u}}\hat{\uplambda}\left( \underset{+}{I_{h}} ,\underset{-}{s},\underset{+}{ch}\right) +e, $$
(13)

where \(\hat {\uplambda }\) is the predicted inverse Mills ratio in the simple benchmark model, and \(e=w-\hat {w}\) is the residual. In light of positive selection, working women with high predicted inverse Mills ratios \(\hat { \uplambda }\) (i.e., high predicted values for v) are interpreted to also possess the unobservable characteristics that are highly valued in the labor market. The dependence of \(\hat {\uplambda }\) on the observable characteristics is explicitly stated in Eq. (13), and the estimates are given in column 3 of Table 5. Intuitively, women with high \( \hat {u}\) are those women who chose to work despite having low schooling attainment, many young children, and high-earning husbands. This is because the model interprets their decision to participate as a high unobservable value of v and, through positive selection, a high unobservable value for u.

Thus far, we have employed the simple benchmark model to understand the variation of unobservable characteristics in the sample of working women. In light of Eq. (12), we can now obtain the intuition for the negative estimate \(\widetilde {{\uprho }_{uv}{\upsigma }_{u}}\) by closely examining the critical quantity

$$\overset{-0.015}{\text{corr}}\left( \hat{u},\tilde{\uplambda}\left( \underset{-} {s},\underset{+}{ch}\right) \right) -\overset{-0.043}{\text{corr}}\left( \hat{u},s\right) \overset{-0.443}{\text{corr}}\left( s,\tilde{\uplambda}\left( \underset{-}{s},\underset{+}{ch}\right) \right) <0, $$

where we indicated the actual correlations in our data set. It is immediately clear from Eq. (13) that corr\(\left (\hat {u},s\right ) \) is negative.
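As a rounded arithmetic check, the three reported correlations can be substituted into the critical quantity and into the denominator of Eq. (12):

$$ -0.015-\left( -0.043\right) \left( -0.443\right) \approx -0.015-0.019=-0.034<0,\qquad 1-\left( -0.443\right)^{2}\approx 0.804>0. $$

Because the ratio of standard deviations and the denominator are both positive, the negative numerator is what delivers the negative estimate. The figures are computed from the rounded correlations shown above and are therefore approximate.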

As in the simple benchmark model, participation is affected positively by schooling and negatively by children in the simple MR model (column 4 of Table 5). Therefore, the model assigns high predicted values of v (i.e., inverse Mills ratios) to women who work despite low schooling attainment and having many young children, explaining why the last correlation in the above expression, corr\(\left (s,\tilde { \uplambda }\right ) \), is negative (column 6, Table 5).

The intuition for the negative relationship between \(\hat {u}\) and \(\tilde { \uplambda }\) is as follows. Substituting for \(\hat {u}\) in this correlation, corr \(\left (\widehat {{\uprho }_{uv}{\upsigma }_{u}}\hat {\uplambda }\left (\underset {+}{I_{h}} ,\underset {-}{s},\underset {+}{ch}\right ) +e,\text {} \tilde {\uplambda }\left (\underset {-}{s},\underset {+}{ch}\right ) \right ) \), we see that the variation in schooling and children in the sample of working women induces \(\hat { \uplambda }\) and \(\tilde {\uplambda }\) to move in the same direction. However, as reported in column 7 of Table 5, high levels of education and low numbers of young children (and therefore a low \(\tilde {\uplambda }\)) also indicate a high-earning spouse. Given the strong positive dependence of \( \hat {\uplambda }\) on spousal income, it follows that high education and low numbers of children also indicate a high value of \(\hat {\uplambda }\). This indirect effect induces \(\hat {\uplambda }\) and \(\tilde {\uplambda }\) to move in opposite directions. This effect is responsible for the negative relationship between \(\hat {u}\) and \(\tilde { \uplambda }\).

This intuition generalizes to the benchmark model. An omission of spousal income in the first step can be thought of as an omitted variable problem in the second step,

$$w=X\upbeta +{\uprho}_{uv}{\upsigma}_{u}\tilde{\uplambda}+\tilde{\upeta}, $$

where \(\tilde {\upeta }={\uprho }_{uv}{\upsigma }_{u}\left (\hat {\uplambda }-\tilde {\uplambda } \right ) +\upeta \). We provided the intuition using the simple model for why \( \hat {\uplambda }\) and \(\tilde {\uplambda }\) move in opposite directions. This implies that the omitted variable \({\uprho }_{uv}{\upsigma }_{u}\left (\hat {\uplambda }- \tilde {\uplambda }\right ) \) correlates inversely with the included variable \( \tilde {\uplambda }\). In light of standard econometric theory, this omission will result in an understatement of the estimated coefficient on \(\tilde { \uplambda }\). In addition, because the omitted variable also varies with all other covariates in the wage regression, all coefficients will generally be inconsistent.
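The direction of this bias can be illustrated with a small simulation. This is a stylized sketch under stated assumptions, not a replication: the inverse Mills ratios are replaced by normal draws, other covariates are dropped, the coefficient 0.167 is borrowed from the simple benchmark estimate purely for illustration, and the slope `k`, noise scales, and seed are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on a constant and x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
n = 20000
true_coef = 0.167  # illustrative value of rho_uv * sigma_u
k = 1.5            # hypothetical: k > 1 makes lam_hat and lam_tilde move oppositely

# Stand-in for the mismeasured inverse Mills ratio (spousal income omitted).
lam_tilde = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Gap lam_hat - lam_tilde, negatively related to lam_tilde by construction.
gap = [-k * x + rng.gauss(0.0, 0.5) for x in lam_tilde]
# Wages depend on the correctly specified term lam_hat = lam_tilde + gap.
w = [true_coef * (x + g) + rng.gauss(0.0, 0.3) for x, g in zip(lam_tilde, gap)]

short_regression_coef = ols_slope(lam_tilde, w)
implied_limit = true_coef * (1.0 - k)
print(short_regression_coef, implied_limit)
```

In this setup the regression on the mismeasured term converges to `true_coef * (1 - k)`, so a sufficiently strong negative relation between the omitted term and the included regressor not only understates the coefficient but reverses its sign.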

Conclusions

Mulligan and Rubinstein (2008) argued that the selection of females on unobservable characteristics switched from negative in the 1970s to positive in the 1990s, accounting for nearly the entire closing of the GG. This finding, they argued, supports the hypothesis that an exogenous rise in the market value of unobservable characteristics is responsible for increasing inequality within gender and rising equality across genders, both of which took place in the 1980s. However, the rise in the selection bias estimated by MR is so large that it leaves little role for the joint influence of other channels. This makes it difficult to reconcile MR’s findings with much of the previous literature.

We argued that, if one views our benchmark model as the data-generating process, the MR estimates of female wage equations are inconsistent. Omitting spousal income from the participation equation introduces an error in predicted inverse Mills ratios and therefore an omitted variables problem in the estimation of wage equations. This leads to drastically different wage equation estimates and decomposition of the GG closing. The estimates based on our benchmark specification make a much stronger case for the role of declining discrimination—that is, the closing of the gender difference in market prices paid on observable characteristics—and a much weaker case for the increase in the selection bias.

By highlighting the important role of the change in discrimination, the benchmark specification resonates better with the rest of the literature without taking away from MR’s main hypothesis that an exogenous rise in the market value of unobservable characteristics generated both across-gender equality and within-gender inequality. If anything, our estimates lend stronger support to this hypothesis by making its implications consistent with the rise in female labor force participation and by making the positive impact of an exogenous increase in \({\upsigma}_{u}\) on the selection bias substantially stronger.

In light of our findings, an exciting direction for future research is to take a more structured approach and focus on assessing the causal influence of the increase in the market value of unobservable characteristics on the GG closing through its influence on marital matching, female human capital accumulation, and selection into the labor force.