1 Introduction

Unbalanced and insufficient economic development has become the central contradiction in China, and unfair income distribution remains difficult to resolve. Statistics for 2018 show that China’s Gini coefficient of per capita disposable income still exceeds 0.4 (i.e., the international alert level). However, the Gini coefficient ignores whether the sources of inequality are justified. People readily accept inequality due to individual effort but do not tolerate inequality caused by circumstances such as gender or race. In other words, individuals pay more attention to the sources of income inequality than to income inequality itself: an individual’s perception of income inequality mainly depends on whether the source of income is fair (see, e.g., Cappelen et al. 2010).

Roemer’s (1993, 1998) pioneering work on the inequality of opportunity (hereafter “IO”) provides a new perspective for studying the sources of income inequality. All determinants of income, according to Roemer’s paradigm, can be divided into two categories: “circumstances,” which are out of an individual’s control (for example, gender or birthplace), and “efforts,” which individuals can influence and which are therefore not economically exogenous to them. Thus, income inequality caused by “circumstances” is IO, and income inequality due to “efforts” is inequality of effort.

Following Roemer’s “circumstance–effort” dichotomy, the literature has made progress along several dimensions in exploring the factors affecting IO. On the one hand, the “circumstance” factors studied include gender, hukou (China’s household registration system), and family background, which largely explain the sources of inequality (see, e.g., Fuller and Vosko 2008; Björklund et al. 2012); on the other hand, the “effort” factors studied include diploma, occupation, and others (see, e.g., Checchi and Peragine 2010; Ordine and Rose 2011). Moreover, some scholars have sought to improve methods by fully considering the interaction between circumstance and effort factors in order to reduce measurement errors (see Jusot et al. 2013).

The existing literature shows that the IO in developing countries is high, accounting for more than 30% of the total income gap. In contrast, the IO in developed countries is generally low, usually explaining less than 20% of the income gap (e.g., Lefranc et al. 2008; Ferreira and Gignoux 2011). Although the research design and the samples used affect the estimated share of disparity attributable to opportunity, IO accounts for more than 35% of the income gap in most studies of China (see, e.g., Shi et al. 2018; Li and Lv 2019).

What’s more, gender frequently remains the most prominent circumstance contributor in China. In Golley et al. (2019), gender explains 28% of IO. Based on different databases, Yang et al. (2021) reported that its share of IO has increased significantly over time, from 31.63% in 2013 to 36.9% in 2018. According to Li and Lv (2019), the contribution of hukou to IO is 17.51%, 12.42%, 12.8%, and 10.81% in 2008, 2010, 2013, and 2015, respectively.

However, how is IO formed? IO occurs both before and after individuals enter the labor market, violating “the level-the-playing-field principle” and “the nondiscrimination principle,” respectively (see Roemer 1998). If the former is also regarded as a kind of discrimination, IO can be said to result from pre-market and post-market discrimination. The existing studies cited above, however, do not distinguish between the pre-market and post-market components.

On the one hand, post-market discrimination is mainly embodied in “different pay for equal work.” The existing literature largely agrees on the key factors causing IO: the “circumstance” factors that attract discrimination are principally gender and hukou (see, e.g., Meng and Zhang 2001; Kübler et al. 2018). As for male–female inequality, the International Labor Organization (ILO) “2018–2019 Global Wage Report” shows that this discrimination is widespread worldwide, and women’s average monthly income is only 80% of men’s.Footnote 1 As for urban–rural inequality, China’s dual labor market, created by the hukou system, makes urban–rural opportunity inequality in wage premiums and employment inevitable (see, e.g., Zhang et al. 2016). Besides, being urban- or rural-born also significantly affects employees’ occupational mobility (see, e.g., Liu 2012; Yamamoto et al. 2019).

On the other hand, under “the level-the-playing-field principle,” if work is viewed as a competitive game, equality of opportunity means more than the absence of discrimination during the competition. It also requires that everyone step into the same “arena” before competing. The same “arena,” based on the life cycle of “education before work,” means that educational opportunities for all should not differ by “circumstances” (see Roemer 1998; Démurger et al. 2009).

Therefore, the pre-market mechanism for IO is an indirect effect: circumstances create differences in human capital through educational opportunities, which in turn lead to income inequality in the labor market (Neal and Johnson 1996; Krafft and Alawode 2018). As for male–female inequality in particular, “son preference” is common in rural China: parents prioritize their sons’ education even if the daughter is hardworking. How large, then, is the effect of pre-market discrimination caused by circumstances? According to Chetty et al. (2014), based on 216 different US commuting zones, individual educational skills can explain 12% of the variation in the intergenerational income elasticity. What’s more, Palomino et al. (2019) show that pre-market education accounts for up to 30% of total IO in 26 European countries.

In conclusion, we regard the path in which “circumstances” indirectly affect income through educational opportunities, before one’s labor force participation, as the pre-market discrimination channel of IO. The post-market discrimination channel, by comparison, is how “circumstances” directly affect income after individuals enter the labor market.

This paper makes the following main contributions: (1) Existing research does not pay enough attention to the mechanism of IO. We therefore distinguish the pre-market and post-market discrimination pathways of IO to measure its impact more comprehensively, which is rare in existing studies. We also compare these channels across male–female and urban–rural opportunity inequality. (2) Methodologically, we extend the Oaxaca–Blinder decomposition to estimate the specific contribution of each channel to IO with Chinese data. (3) Existing research focuses on measuring IO or decomposing its factors but lacks theoretical support. We therefore illustrate how pre-market and post-market discrimination shape IO through a theoretical model based on discrimination theory and asymmetric information.

2 The model

As mentioned above, we see the impact of “circumstances” on educational opportunities, before one’s labor force participation, as the pre-market discrimination channel in IO. In contrast, the post-market discrimination path is how “circumstances” directly affect one’s income in the labor market.

2.1 Pre-market discrimination

We devote this section to demonstrating pre-market discrimination against “circumstances.” Taking the male–female inequality in educational opportunities as an example, we propose the household’s utility function, written as

$$U = C_{t} + \varepsilon \left[ {\Gamma \left( {E_{m,t + 1} ,E_{f,t + 1} } \right) - \delta_{m} E_{m,t + 1} - \delta_{f} E_{f,t + 1} } \right] - \gamma E_{f,t + 1}$$
(1)

where \(C_{t}\) is household consumption in the current period \(t\), \(\varepsilon\) is the discount factor, and \(\Gamma\) is the expected return to education, which takes two arguments: the years of schooling the household expects for its boys, \(E_{m,t + 1}\), and for its girls, \(E_{f,t + 1}\). Let \(\delta_{m}\) and \(\delta_{f}\) denote the marginal cost of education for boys and girls, respectively. \(\gamma\) is the degree of “son preference” in the household: the larger \(\gamma\) is, the more likely the household is to support boys’ studies rather than girls’.

For simplicity, we assume that \(\Gamma\) satisfies Cobb–Douglas form, that is

$$\Gamma \left( {E_{m,t + 1} ,E_{f,t + 1} } \right) = E_{m,t + 1}^{u} E_{f,t + 1}^{v} ,{\text{with}}\;\left( {0 < u,v < 1} \right)$$
(2)

where both \(u\) and \(v\) are expected return rates on education. The household expectation can affect children’s educational opportunities; a higher expectation implies that the household is more likely to increase investment in children’s education (refer to Sewell and Shah 1968; Spera et al. 2008; Heckman and Kautz 2013). Thus, we use the ratio of household expectations to illustrate the gender gap in access to education (i.e., pre-market discrimination). Using the first-order condition, one can derive

$$E_{f,t + 1} /E_{m,t + 1} = \left( {v/u} \right) \times \left[ {\varepsilon \delta_{m} /\left( {\varepsilon \delta_{f} + \gamma } \right)} \right]$$
(3)

where \(\left( {E_{f,t + 1} /E_{m,t + 1} } \right) < 1\) implies the disadvantages of females in educational opportunities. Moreover, we assume that both \(\delta_{m}\) and \(\delta_{f}\) are constant in time and common to all households. Then, Eq. (3) shows that pre-market discrimination against females is the result of statistical discrimination (i.e., the gender difference in expected return rate, \(v < u\)) or taste-based discrimination in the household (i.e., the patriarchal concept, \(\gamma > 0\)).
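As a check on Eq. (3), the first-order conditions behind it can be written out explicitly. The sketch below treats \(C_{t}\) as given, which is an assumption made here for exposition because no budget constraint is stated. Differentiating Eq. (1) under the Cobb–Douglas form (2) with respect to each schooling level gives

$$\varepsilon \left( {uE_{m,t + 1}^{u - 1} E_{f,t + 1}^{v} - \delta_{m} } \right) = 0,\quad \varepsilon \left( {vE_{m,t + 1}^{u} E_{f,t + 1}^{v - 1} - \delta_{f} } \right) - \gamma = 0$$

so that \(u\Gamma /E_{m,t + 1} = \delta_{m}\) and \(v\Gamma /E_{f,t + 1} = \left( {\varepsilon \delta_{f} + \gamma } \right)/\varepsilon\). Dividing the first condition by the second yields

$$\left( {u/v} \right) \times \left( {E_{f,t + 1} /E_{m,t + 1} } \right) = \varepsilon \delta_{m} /\left( {\varepsilon \delta_{f} + \gamma } \right)$$

which rearranges to Eq. (3). For instance, with \(u = v\) and \(\delta_{m} = \delta_{f} = \delta\), the ratio reduces to \(\varepsilon \delta /\left( {\varepsilon \delta + \gamma } \right)\), which is below one whenever \(\gamma > 0\).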

2.2 Post-market discrimination

This section builds on the statistical discrimination literature (see Arrow 1973) and illustrates post-market discrimination against “circumstances.” Taking gender as the example “circumstance” again, we assume that the enterprise’s output is

$$Y = \lambda \left( {Q^{m} + Q^{f} } \right) - g\left( {S^{m} + S^{f} } \right)$$
(4)

where \(i\) denotes the individual’s gender, with \(i = m\) for males and \(i = f\) for females. \(\lambda\) is the qualified individual’s productivity, with \(\lambda \in \left( {0, 1} \right)\), whereas the productivity of unqualified individuals is zero. \(g\) is the unit production cost and satisfies \(g > 0\).

Following existing research on discrimination in the labor market (see Coate and Loury 1993; Conde-Ruiz et al. 2022), we assume that employers observe each individual’s “circumstance” identity (here, gender) and a signal (e.g., test scores, work experience) to determine whether the individual is qualified and what wage to pay, where an individual with a higher signal value has a higher posterior likelihood of being qualified. Let \(\epsilon \left( {0 \le \epsilon \le 1} \right)\) and \(\hat{\epsilon }^{i}\) be the signal sent by individuals and the hiring threshold, respectively. Thus, employed individuals satisfy \(\epsilon \ge \hat{\epsilon }^{i}\). Then, one can derive the numbers of employed individuals \(S^{i}\) and qualified individuals \(Q^{i}\), written as

$$S^{i} = \left[ {\theta^{i} \left( {1 - F_{b} \left( {\hat{\epsilon }^{i} } \right)} \right) + \left( {1 - \theta^{i} } \right)\left( {1 - F_{d} \left( {\hat{\epsilon }^{i} } \right)} \right)} \right]\overline{S}^{i}$$
(5)
$$Q^{i} = \theta^{i} \left( {1 - F_{b} \left( {\hat{\epsilon }^{i} } \right)} \right)\overline{S}^{i}$$
(6)

where \(\theta^{i}\) is the employer’s belief about the proportion of qualified individuals, \(\overline{S}^{i}\) is the total number of job seekers, \(F_{b} \left( {\epsilon } \right)\) (\(F_{d} \left( {\epsilon } \right)\)) is the cumulative distribution function of the signal for qualified (unqualified) individuals, and \(f_{b} \left( {\epsilon } \right)\) (\(f_{d} \left( {\epsilon } \right)\)) is the corresponding density function. We, in addition, assume that \(f_{b} \left( {\epsilon } \right)/f_{d} \left( {\epsilon } \right)\) is non-decreasing on \(\left[ {0, 1} \right]\), so that a higher signal implies a higher posterior likelihood of being qualified. Using the first-order condition, the wage conditional on the signal, \(I^{i} \left( {\epsilon } \right)\), is given by

$$I^{i} \left( {\epsilon } \right) = \left[ {\lambda \theta^{i} f_{b} \left( {\epsilon } \right)/\left( {\theta^{i} f_{b} \left( {\epsilon } \right) + \left( {1 - \theta^{i} } \right)f_{d} \left( {\epsilon } \right)} \right)} \right] - g$$
(7)

Let \(\Lambda \left( {\theta ,\epsilon } \right) = \theta f_{b} \left( {\epsilon } \right)/\left[ {\theta f_{b} \left( {\epsilon } \right) + \left( {1 - \theta } \right)f_{d} \left( {\epsilon } \right)} \right]\), then we can reach the following:

$$I^{m} \left( {\epsilon } \right) - I^{f} \left( {\epsilon } \right) = \lambda \left[ {\Lambda \left( {\theta^{m} ,\epsilon } \right) - \Lambda \left( {\theta^{f} ,\epsilon } \right)} \right]$$
(8)

Equation (8) shows that, given the same productivity and signals after individuals join the labor market, the inequality comes from post-market discrimination against gender due to employers’ prejudice (i.e., different beliefs \(\theta^{m}\) and \(\theta^{f}\)) under incomplete information (indicated by \(\epsilon\)).
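To make Eqs. (7) and (8) concrete, the following minimal sketch evaluates the posterior \(\Lambda\) and the resulting wage gap at a given signal. The signal densities and all parameter values are illustrative assumptions chosen for this example, not estimates from the paper.

```python
def f_b(eps):
    """Assumed signal density for qualified workers on [0, 1] (illustrative)."""
    return 2.0 * eps


def f_d(eps):
    """Assumed signal density for unqualified workers on [0, 1] (illustrative)."""
    return 2.0 * (1.0 - eps)


def posterior_qualified(theta, eps):
    """Lambda(theta, eps): posterior probability of being qualified."""
    num = theta * f_b(eps)
    return num / (num + (1.0 - theta) * f_d(eps))


def wage(theta, eps, lam, g):
    """Eq. (7): wage offered at signal eps under the employer's belief theta."""
    return lam * posterior_qualified(theta, eps) - g


# Hypothetical beliefs and parameters, for illustration only.
theta_m, theta_f = 0.6, 0.4
lam, g = 0.9, 0.1
eps = 0.5

# Eq. (8): the gap depends only on the difference in posteriors.
gap = wage(theta_m, eps, lam, g) - wage(theta_f, eps, lam, g)
print(round(gap, 4))
```

With these assumed inputs, two workers who send the same signal receive different wages only because the employer’s prior beliefs differ; the cost term \(g\) cancels out of the gap.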

3 Methodology

3.1 Inequality of opportunity

In Roemer’s framework (see Roemer 1993, 1998), all determinants of income can be partitioned into two components: “circumstances” (denoted by \(C_{i}\)) and “efforts” (denoted by \(E_{i}\)). Specifically, “circumstances” are beyond individual control (e.g., gender, birthplace, or race), and “efforts” are not economically exogenous to the individual.

In the sense of Roemer, equality of opportunity would be achieved when “circumstances” have no impact on income. Then, individuals can be divided into T exhaustive and mutually exclusive types, where individuals in the same type have the same “circumstances.” Let \(F\left( {Y|C^{t} } \right)\) denote the income distribution for individuals of type t, where \(t = 1, \ldots ,T\). Equality of opportunity demands that the income is independent of “circumstances,” written as

$$F\left( Y \right) = F\left( {Y|C^{t} } \right) = F\left( {Y|C^{l} } \right)$$
(9)

where \(t,l = 1, \ldots ,T\). However, testing whether the above condition holds is difficult. This article adopts the ex ante parametric method since it places no limit on the number of “circumstance” variables (see Ferreira and Gignoux 2011; Davillas and Jones 2020). Assuming income is a function of “circumstances” and “efforts,” we estimate the following log-linearized equation:

$$\ln Y_{i} = \alpha + \beta C_{i} + \gamma E_{i} + \mu_{i}$$
(10)

where \(Y_{i}\) is the individual income, \(\alpha\) is a constant term, \(C_{i}\) is the matrix of an individual’s “circumstances,” \(E_{i}\) is the matrix of an individual’s “efforts,” and \(\mu_{i}\) is the residual term, which contains unobservable factors such as luck.

To obtain the counterfactual distribution, we follow most of the literature and treat “efforts” as a function of “circumstances”Footnote 2 (see, e.g., Bourguignon et al. 2007; Ferreira and Gignoux 2011), as shown in Eq. (11).

$$E_{i} = \omega + \lambda C_{i} + \nu_{i}$$
(11)

For measuring IO—instead of estimating any causal relationship between income, “circumstances,” and “efforts”—we can write the reduced form by substituting Eq. (11) into Eq. (10), as follows:

$$\ln Y_{i} = \delta + \psi C_{i} + \varepsilon_{i}$$
(12)

where \(\delta = \alpha + \gamma \omega\), \(\psi = \beta + \gamma \lambda\), and \(\varepsilon_{i} = \gamma \nu_{i} + \mu_{i}\).

Then, the counterfactual distribution with the interference of “efforts” removed is

$$\hat{Y}_{i} = \exp \left( {\hat{\delta } + \hat{\psi }C_{i} } \right)$$
(13)

where \(\hat{Y}_{i}\) is the circumstance-conditioned income, and \(\hat{\delta }\) and \(\hat{\psi }\) are the estimates of \(\delta\) and \(\psi\) in Eq. (12), respectively. Thus, \(\hat{\psi }\) includes both the direct and indirect effects of “circumstances.”Footnote 3 Let \({\text{IO}}\) represent the inequality indicator (MLD index). IO, i.e., inequality between groups, is computed by \({\text{IO}}\left( {\hat{Y}_{i} } \right)\). Meanwhile, relative IO is \({\text{IO}}_{r} = {\text{IO}}\left( {\hat{Y}_{i} } \right)/{\text{IO}}\left( {Y_{i} } \right)\).

Besides, because some circumstances are unobservable, the above result is a lower bound on the true IO. Considering that the MLD index is sensitive to low-income samples, this paper also reports the Gini coefficient (sensitive to middle-income samples) and Theil index (sensitive to high-income samples) to ensure the robustness of the results.
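The measurement steps in Eqs. (12) and (13) can be sketched as follows. This is a minimal illustration on simulated data, assuming two binary circumstances and ordinary least squares via NumPy; the variable names, coefficients, and sample are hypothetical and do not reproduce the paper’s estimates.

```python
import numpy as np


def mld(y):
    """Mean log deviation: average of ln(mean(y) / y_i)."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.log(y.mean() / y)))


def inequality_of_opportunity(log_income, circumstances):
    """Ex ante parametric IO: regress ln Y on C, then apply MLD to fitted incomes."""
    X = np.column_stack([np.ones(len(log_income)), circumstances])
    coef, *_ = np.linalg.lstsq(X, log_income, rcond=None)  # Eq. (12)
    y_hat = np.exp(X @ coef)                               # Eq. (13)
    io_abs = mld(y_hat)
    io_rel = io_abs / mld(np.exp(log_income))
    return io_abs, io_rel


# Simulated data (assumption): gender and hukou dummies plus noise.
rng = np.random.default_rng(0)
n = 2000
male = rng.integers(0, 2, n)
urban = rng.integers(0, 2, n)
log_y = 7.0 + 0.3 * male + 0.2 * urban + rng.normal(0.0, 0.6, n)

io_abs, io_rel = inequality_of_opportunity(log_y, np.column_stack([male, urban]))
print(f"absolute IO = {io_abs:.4f}, relative IO = {io_rel:.2%}")
```

Because the fitted incomes vary only with circumstances, the MLD of \(\hat{Y}_{i}\) captures between-type inequality, and dividing by the MLD of observed income gives the relative measure.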

This paper uses the Shapley value method to explore the causes and differences of inequality based on the above measurement result. The ideas are as follows:

Firstly, we exclude any one “circumstance” variable from the income equation. Secondly, we estimate the income equation again to obtain a new counterfactual distribution and IO result. At this point, the result no longer includes the influence of the eliminated “circumstance” variable. Finally, the difference between the results before and after elimination is the contribution of this “circumstance” factor to the IO. Since the contribution value depends on the order in which variables are eliminated, this paper uses the average contribution over all orders.
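The averaging over orders described above can be sketched as follows. This is a minimal illustration, assuming MLD as the inequality index, OLS for the income equation, and simulated data with three hypothetical binary circumstances; averaging marginal contributions over all orderings gives the same Shapley value whether variables are added or eliminated.

```python
from itertools import permutations

import numpy as np


def mld(y):
    """Mean log deviation of a positive income vector."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.log(y.mean() / y)))


def io_value(log_income, C, subset):
    """IO (MLD of fitted incomes) using only the circumstance columns in subset."""
    cols = [np.ones(len(log_income))] + [C[:, j] for j in subset]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, log_income, rcond=None)
    return mld(np.exp(X @ coef))


def shapley_contributions(log_income, C):
    """Average marginal contribution of each circumstance over all orderings."""
    k = C.shape[1]
    contrib = np.zeros(k)
    orders = list(permutations(range(k)))
    for order in orders:
        included = []
        prev = 0.0  # intercept only: all fitted incomes are equal, so MLD is 0
        for j in order:
            included.append(j)
            cur = io_value(log_income, C, included)
            contrib[j] += cur - prev
            prev = cur
    return contrib / len(orders)


# Simulated data (assumption): three binary circumstances.
rng = np.random.default_rng(1)
n = 1500
C = rng.integers(0, 2, (n, 3)).astype(float)
log_y = 7.0 + C @ np.array([0.3, 0.2, 0.05]) + rng.normal(0.0, 0.5, n)

contrib = shapley_contributions(log_y, C)
shares = contrib / contrib.sum()
print(np.round(shares, 3))
```

The contributions sum to the IO obtained with all circumstances included, so the shares can be read as a full decomposition. The number of orderings grows factorially with the number of variables, so an exact enumeration like this is practical only for a small set of circumstances.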

3.2 Discrimination in pre-market and post-market

Under the level-the-playing-field principle proposed by Roemer (1993, 1998), equality of opportunity means no discrimination either before or after individuals enter the labor market. However, general decomposition methods cannot capture pre-market discrimination. We build on Fleurbaey and Schokkaert (2011) and extend the Oaxaca–Blinder approach to decompose the pre-market and post-market discrimination channels simultaneously when measuring equality of opportunity. The process is as follows:

For ease of exposition, consider the male–female opportunity inequality (the urban–rural case is analogous and is not repeated). Suppose all individuals have a commonly distributed error term, \(\sigma_{i}\); we then propose the following specification, based on (10), for individual \(i\), written as

$${\text{ln}}Y_{i} = \alpha + \beta C_{i}^{m} + \left( {\tau + \phi C_{i}^{m} } \right)C_{i}^{n} + \left( {\rho + \xi C_{i}^{m} } \right)E_{i}^{m} + \left( {\zeta + \varphi C_{i}^{m} } \right)E_{i}^{n} + \sigma_{i}$$
(14)

where \(C_{i}^{m}\) is a dummy indicating the group of individual \(i\), i.e., \(C_{i}^{m} = 1\) if individual \(i\) is male, \(C_{i}^{m} = 0\) if individual \(i\) is female. Let \(E_{i}^{m}\) be individual years of schooling, and let \(C_{i}^{n}\) and \(E_{i}^{n}\) denote other uncontrollable and controllable factors (namely, “circumstances” and “efforts”), respectively.

According to Roemer (1993, 1998), individuals’ income should depend on their relative efforts. To express the relative effort as functions of observable variables in light of previous research works (see, e.g., Checchi and Peragine 2010), define:

$$R_{i} = \Gamma \left( {\Delta^{S} - \Delta^{F} } \right)$$
(15)

where \(R_{i}\) is the individual’s relative effort, \(\Delta^{S}\) is the difference in years of schooling between individuals and their peers and \(\Delta^{F}\) is the difference in years of schooling between the individual’s father and his father’s peers. Indeed, \(\Delta^{S}\) indicates the excess effort of individuals and \(\Delta^{F}\) indicates the excess support of the circumstances. Let quantile function \(\Gamma\) convert the relative effort into the group’s quantile.
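One possible way to compute the relative effort in Eq. (15) is sketched below. It rests on two assumptions that go beyond the text: each “difference from peers” is taken as the deviation from a peer-group mean supplied by the user, and the quantile function \(\Gamma\) is implemented as the empirical rank within each gender group divided by the group size. All input values are hypothetical.

```python
import numpy as np


def relative_effort(own_school, own_peer_mean, father_school, father_peer_mean, group):
    """Within-group quantile of (Delta^S - Delta^F), a sketch of Eq. (15)."""
    delta_s = np.asarray(own_school, float) - np.asarray(own_peer_mean, float)
    delta_f = np.asarray(father_school, float) - np.asarray(father_peer_mean, float)
    raw = delta_s - delta_f
    group = np.asarray(group)
    r = np.empty(len(raw))
    for g in np.unique(group):
        idx = np.flatnonzero(group == g)
        # Ranks 1..n_g within the group; ties are broken by original order.
        ranks = raw[idx].argsort(kind="stable").argsort(kind="stable") + 1
        r[idx] = ranks / len(idx)
    return r


# Hypothetical inputs: years of schooling and peer-group means.
r = relative_effort(
    own_school=[9, 12, 16, 12],
    own_peer_mean=[10, 10, 12, 12],
    father_school=[6, 6, 9, 12],
    father_peer_mean=[6, 8, 9, 9],
    group=[0, 0, 1, 1],  # 0 = female, 1 = male
)
print(r.tolist())
```

Ranking within each group makes relative effort comparable across groups, since an individual at a given quantile of the female distribution is treated as having exerted the same relative effort as a male at the same quantile.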

If there is no pre-market discrimination, the relative effort rather than gender determines individual years of schooling. Accordingly, individual years of schooling can be modeled as a function of gender and relative effort:

$$E_{i}^{m} = \Psi \left( {C_{i}^{m} ,R_{i} } \right) = \eta + \theta C_{i}^{m} + \left( {\vartheta + \iota C_{i}^{m} } \right)R_{i} + \epsilon_{i}$$
(16)

Substituting (15) into (16) and estimating the simultaneous system consisting of (14) and (16) via 3SLS (three-stage least squares), we obtain the IO consisting of pre-market and post-market discrimination, denoted by D:

$$D = \hat{\beta } + \hat{\varphi }{\rm K}\left( {E_{i}^{n} |f} \right) + \hat{\phi }{\rm K}\left( {C_{i}^{n} |f} \right) + \hat{\xi }{\rm K}\left[ {\Psi \left( {m,R_{i} |f} \right)} \right] + \hat{\rho }{\rm K}\left[ {\Psi \left( {m,R_{i} |f} \right)} \right] - \hat{\rho }{\rm K}\left( {E_{i}^{m} |f} \right)$$
(17)

Here, \(\hat{\beta }\), \(\hat{\varphi }\), \(\hat{\xi }\), \(\hat{\phi }\), and \(\hat{\rho }\) are the estimated values and \({\rm K}\) indicates sample means, i.e., \({\rm K}\left( {E_{i}^{n} |f} \right)\) is the sample mean of \(E_{i}^{n}\) using only observations where \(C_{i}^{m} = 0\) (individual \(i\) is female). In (17), items 1 to 4 are differences in coefficients resulting from post-market discrimination.

It is worth noting that the last two terms in (17) are the IO attributable to pre-market discrimination. Under the premise of the same effort and return on education, the difference between the two terms represents the difference in years of schooling caused by unequal access to education before individuals enter the labor market.
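Given estimates from the system, the split of \(D\) in Eq. (17) into its two channels is simple arithmetic, as the sketch below illustrates. All numbers are hypothetical placeholders rather than results from this paper, \(E_{i}^{n}\) and \(C_{i}^{n}\) are treated as single variables for brevity, and the counterfactual schooling term is computed under one reading of \(\Psi \left( {m,R_{i} |f} \right)\): the fitted Eq. (16) evaluated for the female sample with the gender dummy set to male.

```python
def counterfactual_schooling(eta, theta, vartheta, iota, mean_r_f):
    """Mean schooling of females if Eq. (16) were evaluated with C^m = 1."""
    return eta + theta + (vartheta + iota) * mean_r_f


def decompose_gap(coef, means_f, psi_cf):
    """Split D in Eq. (17) into post-market (items 1-4) and pre-market (items 5-6)."""
    post = (
        coef["beta"]
        + coef["varphi"] * means_f["E_n"]
        + coef["phi"] * means_f["C_n"]
        + coef["xi"] * psi_cf
    )
    pre = coef["rho"] * (psi_cf - means_f["E_m"])
    return post, pre


# Hypothetical estimates and female-sample means (illustration only).
coef = {"beta": 0.10, "varphi": 0.02, "phi": 0.01, "xi": 0.005, "rho": 0.06}
means_f = {"E_n": 2.0, "C_n": 1.0, "E_m": 10.0}
psi_cf = counterfactual_schooling(eta=4.0, theta=0.5, vartheta=8.0, iota=2.0, mean_r_f=0.6)

post, pre = decompose_gap(coef, means_f, psi_cf)
d_total = post + pre
print(f"post-market share = {post / d_total:.1%}, pre-market share = {pre / d_total:.1%}")
```

The pre-market term is positive only when counterfactual schooling exceeds the observed female mean, that is, when females would have obtained more education had their relative effort been rewarded as it is for males.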

4 Data

The research data in this paper are obtained from the China Labor-force Dynamic Survey (CLDS).Footnote 4 Carried out by Sun Yat-Sen University, the survey is extensive in scale, broad in coverage, and highly targeted, covering three levels: individual workers, families, and communities. It is a comprehensive database with the labor population aged 15–64 as the primary survey object. To suit the extended decomposition method used in this paper, we pool CLDS into cross-sectional data consisting of all samples in CLDS2012 (the baseline survey) and the new samples in CLDS2014 and CLDS2016.Footnote 5 All monetary variables are deflated by the GDP deflator. Following the principles of data rationality and availability, the selected variables are as follows:

4.1 Income

Considering that discrimination is more closely related to wages, we choose the employee’s wage as the income measure.Footnote 6 The data come from the questionnaire question, “how much is your wage?” We also take the logarithm of income and perform 1% tailing to obtain a more stable distribution.

4.2 Circumstances

On the one hand, personal characteristics variables include an employee’s age, ethnicity, gender, area, and hukou. The specific processing is as follows: (1) We include age and its quadratic term in the basic regression equation to capture the relationship between income and age. (2) We set ethnicity, gender, and area as dummy variables: Han takes the value “1,” and other ethnic groups take the value “0”; male is “1” and female is “0”; the eastern provinces take the value “1,” and the central and western provinces take the value “0.” (3) Urban hukou takes the value “1,” while rural hukou takes the value “0”; the data are obtained from the questionnaire question, “what was hukou at your birth?”

On the other hand, circumstances also include family background variables. Considering that fathers have more decision-making power in the family (see, e.g., Solon 1999), this paper uses the father’s relevant characteristics to illustrate family background. The selected variables are the father’s occupation, education, and communist membership. The specific treatment is as follows: (1) The father’s occupation when the respondent was 14 years old is chosen as an indicator of the family’s economic situation. This paper divides the father’s occupation type into seven levels based on the Chinese social occupation prestige score calculated by Li (2005) and assigns values of 1–7 from low to high. The data are obtained from the questionnaire questions “father’s occupation at 14?” and “father’s work type at 14?” (2) The data on education are obtained from the questionnaire question, “What is your father’s education,” and we convert the answer to years of education.Footnote 7 (3) The father’s communist membership is set as a dummy variable: “1” is a communist and “0” is others.

4.3 Efforts

Effort variables include an employee’s education, communist membership, work attitude, marriage, and subjective health. The specific processing is as follows: (1) We convert everyone’s diploma into education years, and the data are obtained from the questionnaire question “what is your highest degree.” (2) The communist membership is set as a dummy variable: “1” is a communist and “0” is others. (3) We assign work attitude as a value of 1–4 from low to high according to the questionnaire question, “you agree that even if there are other reasons to take a break, I will work hard to complete what I should do every day.” The higher the willingness, the greater the value. (4) The marital status is set as a dummy variable: “1” is married and “0” is others. (5) We obtain the subjective health status from the questionnaire question “what do you think of your current health” and assign the value of 1–5 according to the health degree from ill to good.Footnote 8

In addition, we also exclude samples with missing values and outliers in critical variables and exclude samples with employees younger than 16 years old or older than 60 years old. Descriptive statistics are shown in Table 1.
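The coding rules above can be summarized in a short sketch. The raw field names and answer labels below are hypothetical stand-ins, since the actual CLDS variable names are not given in the text; only the 0/1 assignments and the age restriction follow the description.

```python
def encode_respondent(raw):
    """Map hypothetical raw answers to the dummy coding described in the text."""
    return {
        "age": raw["age"],
        "age_sq": raw["age"] ** 2,
        "han": 1 if raw["ethnicity"] == "Han" else 0,
        "male": 1 if raw["gender"] == "male" else 0,
        "east": 1 if raw["region"] == "east" else 0,
        "urban_hukou": 1 if raw["hukou_at_birth"] == "urban" else 0,
        "party_member": 1 if raw["party_member"] else 0,
        "married": 1 if raw["marital_status"] == "married" else 0,
    }


def in_sample(raw):
    """Keep respondents with no missing fields who are aged 16 to 60."""
    if any(value is None for value in raw.values()):
        return False
    return 16 <= raw["age"] <= 60


# Hypothetical respondent record.
raw = {
    "age": 35,
    "ethnicity": "Han",
    "gender": "female",
    "region": "west",
    "hukou_at_birth": "rural",
    "party_member": False,
    "marital_status": "married",
}
coded = encode_respondent(raw) if in_sample(raw) else None
print(coded)
```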

Table 1 Descriptive statistics for selected variables

5 Results

5.1 The measurement and influencing factors in IO

How much of income inequality does IO explain? How large is the contribution of each circumstance, and what is the specific mechanism of IO? Based on the estimated results of Eq. (13), we address these questions in turn. Besides the MLD index, this paper also reports the Theil index and Gini coefficient results.

Table 2 shows that IO is a fundamental cause of the income gap. Its overall level is 0.037, accounting for 14.45% of the income gap. The subsample results show that IO differs across groups. On the one hand, females face higher IO. The IO can explain 16.27% of the male–female income gap, indicating that females need to overcome greater inequality. On the other hand, IO is of roughly the same degree for the urban- and rural-born, and its explanatory power for the income gap is similar. These results suggest that the government has effectively balanced regional development and broken down the urban–rural dual structure in recent years.

Table 2 Income inequality and inequality of opportunity

This paper further uses the Shapley value to decompose each circumstance’s contribution to the IO and to judge the relative importance of each circumstance. Table 3 shows that male–female opportunity inequality makes a very large contribution to the overall IO, accounting for 31.66%, and urban–rural opportunity inequality accounts for 17.16%. What’s more, IO is also profoundly affected by family background, with a total contribution of 35.08%. Among the family background variables, the diploma accounts for the largest proportion.

Table 3 Contribution of circumstances based on Shapley value decomposition

5.2 Decomposition

The above results show that urban–rural and male–female opportunity inequality are fundamental causes of the overall income gap. Through what mechanism do circumstances that individuals cannot control affect IO? According to Roemer (1998), IO is explained by the pre-market and post-market discrimination channels. The results of estimating Eq. (17) are shown in Table 4.

Table 4 Influencing channels of IO: pre-market and post-market discrimination

On the one hand, pre-market discrimination is the primary channel of urban–rural opportunity inequality. In other words, the urban–rural difference in access to education is larger, so urban–rural opportunity inequality mainly comes from education inequality. In specific terms, pre-market discrimination accounts for 79.2% of IO, while post-market discrimination accounts for only 20.8%.

On the other hand, post-market discrimination is the central channel of male–female opportunity inequality. That is, gender has a larger impact through discrimination in the labor market than through the process of human capital accumulation. The results illustrate that the male–female difference in educational attainment is not evident, but women remain in a weak position and are vulnerable to employer discrimination. To a certain extent, the above analysis also explains why women’s average years of education are higher than men’s while their income is lower.

As for the subsample results, the primary channels do not change across the east and the central and west regions. Pre-market discrimination is always the main pathway of urban–rural opportunity inequality, and its proportions in the east and in the central and west regions are 71.0% and 85.1%, respectively. Likewise, post-market discrimination is still the principal mechanism of male–female opportunity inequality.

By age group, the share of pre-market discrimination in explaining urban–rural opportunity inequality has declined over time, indicating that the new generation of employees enjoys more equitable access to education. Male–female opportunity inequality operates mainly through post-market discrimination, and its share has gradually increased. Discrimination has increasingly become a constraint on narrowing the male–female income gap. Besides, considering that the decomposition results are affected by the choice of benchmarks, we replace the previous benchmark with urban and male samples, respectively, as shown in columns (3) and (6). The results confirm that the above conclusion still holds.

Given that the above decomposition is based on average income, we further explore the characteristics of the mechanism at different income quantiles, as shown in Fig. 1. The x-axis represents income quantiles, and the y-axis represents pre-market and post-market discrimination.

Fig. 1 Mechanism’s characteristics at different income quantiles

Figure 1a shows that IO trends downward as income increases. In the low-income group, IO is largely explained by post-market discrimination, which arises after individuals enter the labor market (e.g., employer prejudice). In contrast, in the high-income group, IO increasingly takes the form of pre-market discrimination, formed during the accumulation of human capital.

Figure 1b shows that the mechanism of male–female opportunity inequality changes in roughly the same way as that of urban–rural inequality. In the low-income group, the post-market discrimination channel accounts for a larger proportion. As income increases, the restrictions that employees face in accumulating human capital begin to appear. In some higher-income groups, pre-market discrimination even becomes the primary pathway.

6 Further discussion

Across income quantiles, post-market discrimination is almost always the primary channel, whether for urban–rural or male–female opportunity inequality. In addition, its proportion gradually weakens as income increases. What is the reason? We attempt to explain the phenomenon with the information asymmetry and employer prejudice mentioned above.

6.1 Information asymmetry

According to the signaling theory proposed by Spence (1973), employers cannot fully know an individual’s actual productivity due to asymmetric information, so diplomas serve to signal productivity. Given that the greater the information asymmetry individuals face, the higher the signaling value of education, we can use the signaling value to illustrate the degree of information asymmetry (see Lang and Manove 2011).

In light of the data features, we obtain the signaling value by estimating the sheepskin effect, a well-established method in existing research (see, e.g., Ferrer and Riddell 2002; Schady 2003; Olfindo 2018). Specifically, we estimate the extra benefit that obtaining the “sheepskin” (i.e., the diploma) brings to employees when controlling for their years of schooling (see, e.g., Jaeger and Page 1996; Bauer et al. 2005; Mora and Muro 2008).

The coefficient of the diploma variable represents the rate of return on signals, that is, the sheepskin effect, and reflects the information function of education. The coefficient of years of schooling, by comparison, represents the net return on each additional year of education after deducting the diploma effect; it is the human capital effect and reflects the productive function of education. We also ruled out multicollinearity in the model.
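One common way to write such a sheepskin specification is sketched below. The notation is illustrative and assumed here rather than taken from the estimated model: $w_i$ is the income of employee $i$, $S_i$ is years of schooling, $D_{ik}$ equals 1 if the employee holds diploma $k$ (college, bachelor’s, or master’s), and $X_i$ collects the control variables. The use of log income as the dependent variable is likewise an assumption.

```latex
\ln w_i = \alpha + \beta S_i + \sum_{k} \gamma_k D_{ik} + X_i'\delta + \varepsilon_i
```

In this form, $\gamma_k$ corresponds to the sheepskin effect of diploma $k$ (the signaling value), and $\beta$ corresponds to the net return on an additional year of schooling (the human capital effect).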

Table 5 shows that the net rate of return to education for the rural-born is 2.9%, lower than the 10.4% for the urban-born, while the signaling value of a diploma at every stage is higher for the rural-born than for the urban-born. Specifically, the signaling values of the college, bachelor’s, and master’s degrees are 38.4%, 84.4%, and 54.0%, respectively (Footnote 9), and all are significant at the 5% level. This indicates that employers face greater information asymmetry regarding the rural-born. Rural-born employees therefore have a greater need to signal their abilities to employers through diplomas, and the signaling value of education is correspondingly higher. Because the sample omits individuals who hold diplomas but have not entered the labor market, we use the Heckman model to deal with sample selection bias, with the number of employees’ children as the identification variable (Footnote 10). Although the inverse Mills ratio, significant at the 10% level, indicates that sample selection bias is present, information asymmetry for the rural-born is even more pronounced, and the above conclusion still holds.

Table 5 Urban–rural and male–female differences in the signaling value of education

In addition, the signaling value of education is usually greater for females than for males. In particular, the signaling value of a bachelor’s degree is 93.1% for females but only 60.2% for males. According to the inverse Mills ratio, there is no significant sample selection bias in the Heckman model.
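For readers unfamiliar with the selection correction, the inverse Mills ratio in a Heckman two-step procedure is computed from a first-stage probit index and then added as a regressor in the income equation; a significant coefficient on it points to sample selection bias. The following minimal sketch illustrates only that ratio. The function names and the index values are hypothetical and are not taken from the estimates in Table 5.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal distribution function via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def inverse_mills_ratio(z: float) -> float:
    """Return phi(z) / Phi(z) for a first-stage probit index z.

    For extremely negative z the denominator underflows to zero;
    this sketch does not handle that case.
    """
    return normal_pdf(z) / normal_cdf(z)


# Hypothetical probit indices (predicted propensity to be in the labor market).
indices = [-1.0, 0.0, 1.5]
ratios = [inverse_mills_ratio(z) for z in indices]
# The ratio shrinks as the probability of being observed rises,
# so the correction term is largest for the least likely participants.
```

In a full two-step estimate, `ratios` would be computed for every employed individual and entered as an additional regressor in the second-stage income equation.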

Because the sheepskin effect is larger for the rural-born and for females, we compare the signaling values across income groups within these two samples to further explain the changes in the proportions of the mechanisms. To make the conclusion more robust, we also exclude extreme income samples (below the 10th and above the 90th percentile).
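The trimming of extreme incomes can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the cut points use Python’s inclusive quantile method, and observations exactly at a cut point are kept. The procedure actually applied to the CLDS sample may differ in these details.

```python
import statistics


def trim_extreme_incomes(incomes, lower_pct=10, upper_pct=90):
    """Keep incomes between the lower and upper percentile cut points."""
    # statistics.quantiles with n=100 returns the 99 percentile cut points.
    cuts = statistics.quantiles(incomes, n=100, method="inclusive")
    lower_cut = cuts[lower_pct - 1]
    upper_cut = cuts[upper_pct - 1]
    return [x for x in incomes if lower_cut <= x <= upper_cut]


# Hypothetical incomes 1, 2, ..., 100 for illustration only.
incomes = list(range(1, 101))
kept = trim_extreme_incomes(incomes)
```

The retained observations can then be split into low-income and high-income groups before re-estimating the sheepskin effects.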

Table 6 shows that, in the low-income group, the signaling values of the college, bachelor’s, and master’s degrees for the rural-born are 14.5%, 18.5%, and 1.4%, respectively, and all are significant at the 5% or 10% level. By contrast, none of the sheepskin effects in the high-income group is significant. These results indicate that low-income groups face more severe information asymmetry in the labor market than high-income groups, so the post-market discrimination channel accounts for a larger proportion in low-income groups than in high-income groups. The results in Columns (3)–(4) also explain, to some extent, why the post-market discrimination channel accounts for a large proportion of male–female opportunity inequality.

Table 6 Differences in the signaling value of education by income group

6.2 Employer prejudice

We illustrate the post-market discrimination manifested as employer prejudice through the inequality in job opportunities, as shown in Table 7:

Table 7 Urban–rural and male–female differences in job opportunities

First, we set a dummy variable for the workplace, equal to “1” when the employee works in a state agency or state-owned enterprise and “0” otherwise (for example, in a private enterprise). The results are shown in Column (1). Second, we divide job types into white-collar and blue-collar occupations (Footnote 11) and set a dummy variable equal to “1” for white-collar and “0” for blue-collar. The results are shown in Column (2). Third, as a robustness test, we subdivide the workplace into government, state-owned enterprises, private enterprises, foreign-funded enterprises, and social organizations. The results are shown in Columns (3)–(6).

Table 7 shows an apparent job threshold caused by employer prejudice in the labor market, and the effect of employer prejudice on the workplace is more significant than its effect on the type of occupation. The odds of urban and male employees entering government or state-owned enterprises are about 1.8 times and 1.35 times those of rural and female employees, respectively. The occupation type is also affected by employer prejudice, but the influence is not statistically significant. To some extent, the threshold caused by prejudice hinders the social mobility of rural or female employees and keeps them in low-income jobs for a long time, which partly explains why the post-market discrimination channel accounts for a higher proportion.
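To make the odds comparison concrete, the sketch below computes an odds ratio in two ways: from a 2×2 table of counts and from a logit coefficient. It assumes, for illustration only, that the workplace dummy is modeled with a logit, in which case the exponentiated coefficient is the odds ratio. The counts and function names are hypothetical and are not the Table 7 data.

```python
import math


def odds_ratio_from_counts(in_a, out_a, in_b, out_b):
    """Odds of the outcome in group A divided by the odds in group B.

    in_x / out_x are the numbers of group members with / without the outcome.
    """
    return (in_a / out_a) / (in_b / out_b)


def odds_ratio_from_logit(beta):
    """Convert a logit coefficient on a group dummy into an odds ratio."""
    return math.exp(beta)


# Hypothetical counts: 90 of 140 urban employees and 50 of 100 rural
# employees work in a state agency or state-owned enterprise.
urban_vs_rural = odds_ratio_from_counts(90, 50, 50, 50)

# The same ratio expressed through the implied logit coefficient.
beta = math.log(urban_vs_rural)
recovered = odds_ratio_from_logit(beta)
```

With these hypothetical counts the odds ratio is 1.8, which is read as the odds for the first group being 1.8 times the odds for the second group, not as a ratio of probabilities.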

7 Conclusions

Equality of opportunity is an essential foundation of social justice. Based on CLDS data, we measure the degree of opportunity inequality and its influencing factors. We then extend the Oaxaca–Blinder approach to estimate pre-market and post-market discrimination and thereby explain the influence mechanism of IO. We further explain how this mechanism forms through information asymmetry and employer prejudice. The conclusions are as follows:

(1) IO is a fundamental cause of employees’ income inequality, explaining 14.45% of total income inequality. IO is greater for females than for males and greater for the rural-born than for the urban-born. Moreover, IO is primarily manifested in urban–rural and male–female opportunity inequality, which account for 17.16% and 31.66%, respectively.

(2) In terms of the influence mechanism, urban–rural opportunity inequality is mainly realized through the pre-market discrimination channel, that is, the pre-market effect of human capital accumulation, while post-market discrimination is the primary channel of male–female opportunity inequality. Moreover, the primary mechanism of IO does not vary by region. Although the young can obtain an education more equitably, they suffer more severe post-market discrimination, especially gender-based employer prejudice in the labor market.

(3) Because of differences in information asymmetry and employer prejudice, the influence mechanism has different characteristics in different income groups. As income increases, the proportion of the post-market discrimination channel shows a downward trend. A possible explanation is that greater information asymmetry and employer prejudice confront employees with invisible employment thresholds, keeping them in lower-income jobs for a long time.

Based on the above, the policy implications are as follows. On the one hand, public resources should be equalized and educational investment in less-developed areas increased to eliminate pre-market discrimination. It is essential to guarantee fair educational opportunities for vulnerable groups, especially children in rural areas. On the other hand, the government should reform the urban–rural dual structure and break down hukou barriers to facilitate the cross-regional mobility of labor. It is also necessary to ensure equal employment opportunities for males and females by eliminating post-market discrimination. Finally, to address inequity at its source, the government needs to improve the labor market by removing information asymmetry and employer prejudice.