1 Introduction

When a person reaches marriageable age, he/she enters the spouse supply and demand system, and can be chosen, and matched within this system. This spouse-choosing relationship between marriageable males and females is called the marriage market. Choice of a spouse will be affected by the social, economic, and cultural factors, as well as individual characteristics. However, an essential component of the marriage market in a population is the number of males and females. Imbalance between numbers of marriageable males and females entails that some males or females will be unable to choose a spouse according to generally accepted criteria, and this results in the marriage squeeze. When the male supply exceeds the demand, a male marriage squeeze appears (Jiang et al. 2011b).

China’s sex ratio at birth (hereafter SRB) has been consistently high since the 1980s, and many studies relate this male marriage squeeze to an imbalanced sex structure (Attané 2006; Jiang et al. 2011b; Guilmoto 2012; Jiang et al. 2014). The marriage squeeze is affected by not only sex structure but also age structure. When the concept of marriage squeeze was first proposed, the primary contributor was variations in birth cohort as an age structure problem (Akers 1967; Hirschman and Matras 1971; Musham 1974; Heer and Grossbard-Schectman 1981; Schoen 1983). Guilmoto (2012) divided his marriage squeeze indicator (MSI) into two parts, namely age structure and sex structure, and found that the impact of changes in age structure on the marriage squeeze in China will exceed 50 % after 2050. Goodkind (2006) claimed that in the past, present, or future, the main cause of sex imbalance in the peak marital ages lies in age structure, rather than in discrimination against girls before or after they are born.

Both Goodkind (2006) and Guilmoto (2012) employed the same three-step procedure to distinguish the impacts of age and sex structure on the marriage squeeze. First, the future population is projected under a specific SRB scenario beyond the normal range and a fertility scenario for the projection period; the marriage squeeze intensity is then measured by an index based on the projected population, and this index is taken as the total marriage squeeze intensity caused by age and sex structure. Second, the population is projected anew with the SRB in the normal range (105 or 106 male births per 100 female births), and the intensity of the marriage squeeze is measured again. As the second SRB is in the normal range, its resultant marriage squeeze is considered to be mainly due to age structure. Third, the difference between these two measures is regarded as the impact of sex structure on the marriage squeeze. This procedure elucidates the effect of different scenarios of total fertility rates (TFRs) and SRBs on the extent of the marriage squeeze, but it cannot easily distinguish the respective contributions of age structure and sex structure, as the two structures are interwoven. Even if fertility rates were to remain constant, different SRBs would make the total population’s age and sex structure quite different in the long term (Jiang et al. 2011a). Tucker and Van Hook (2013) employed different assumptions about SRBs, fertility levels, and spousal age gaps, and showed that these factors affected the marriage squeeze, but could not separate their effects.

In this paper we develop a simple index of spousal sex ratio (hereafter SSR) to measure the intensity of the marriage squeeze on the basis of spousal age differences, and a method of decomposing the proportion of excess males into contributions due to age and sex structure. After introducing population projection parameters, we present the projected marriage squeeze, as well as age and sex structure contributions from 2010 to 2050. We conclude with a discussion of the ramifications of our analysis.

2 The SSR and Decomposition

The sex ratio has been the dominant index for measuring the marriage squeeze. Originally, the marriage squeeze was measured as the ratio of males to females of prime marriageable ages regardless of their marital status (Akers 1967). Hirschman and Matras (1971) improved the measure by comparing the unmarried females at a certain age to males of the several ages at which males comprise the predominant spouse pool for females of the specified age. Tuljapurkar et al. (1995) used the sex ratio of potential first marriage partners, which is actually a weighted sex ratio, to measure the marriage squeeze in China. Jiang et al. (2011b) improved Tuljapurkar et al.’s (1995) sex ratio by adjusting first marriage frequencies in the base year, eliminating the tempo effects of first marriages, and normalizing total first marriage frequencies. Guilmoto (2012) and Jiang et al. (2014) used longitudinal simulation to project the proportion of never-married males from a cohort perspective. The indexes used by Tuljapurkar et al. (1995), Jiang et al. (2011b), Guilmoto (2012), and Jiang et al. (2014) to explore the marriage squeeze cover a wide age range: 14 to 50 in Tuljapurkar et al.’s (1995) sex ratio of potential first marriage partners, 14 to 60 in Jiang et al.’s (2011b) index, and 14 to 50 in Jiang et al.’s (2014) index.

The above-mentioned indexes are not appropriate for decomposition into age and sex structure for two reasons. First, these indexes cover too wide an age range, which may bias the estimated impacts of age and sex structure on the marriage squeeze. To be decomposed, the age range should be wide enough to include most marriageable ages and narrow enough to reflect dynamic changes in spouse selection. Second, the potential marriage partners in these indexes are those expected to marry in the marriage market according to an assumed marriage schedule, and the number of expected partners, male or female, is distributed over the age bounds of the marriage schedule, but no matching pattern (i.e., who marries whom and the spousal age difference) is available. Thus, for such a collective marriage pool, there is no age benchmark against which to locate ages and analyze the age effect. To be decomposed, an index must provide a clear age pattern of matching.

Another factor involved in the measurement of marriage squeeze and decomposition is the sex benchmark, namely which sex should be taken as the baseline for measuring the extent of squeeze. In China, due to a higher than normal SRB and male biased sex structure, as well as a universal marriage pattern for females, females are generally taken as a baseline, and a female dominance assumption is generally reasonable and accepted (Guilmoto 2012; Jiang et al. 2014).

In this paper, we develop a new SSR index that employs the female dominance assumption and limits the spousal age difference to a certain interval. Our indexes are constructed as follows:

First, due to the universal marriage pattern for females in China, we adopt the female dominance assumption. As an age benchmark is necessary, we choose the number of females at a certain age, say 30, as the denominator of our SSR index, while the numerator is the number of males falling within the range of spousal age difference, weighted by the corresponding normalized distribution of females of this age marrying males within the spousal age difference bounds. The ratio of the numerator to the denominator is our index of SSR.

For the population in a certain year, we take \( F_{x} \) and \( M_{x} \) as the number of females and males, respectively, at age x. \( P_{x + i} \) denotes the proportion of females aged x who are married to males aged x + i among the total number of females aged x. n and m are the upper and lower bounds of the spousal age difference, respectively (m may be negative as females may marry males younger than themselves). For corresponding male cohorts within the spousal age difference, the potential marriage pool for this female cohort is \( \sum\limits_{i = m}^{n} {P_{x + i} M_{x + i} } \) and the SSR is \( R_{x} = \sum\limits_{i = m}^{n} {\frac{{P_{x + i} M_{x + i} }}{{F_{x} }}} \). As the spousal age difference bounds may be quite large, but the majority of marriages occur within a concentrated age interval, we limit the spousal age difference to a certain interval (in this paper −1 to 5, namely, the husband being 1 year younger to 5 years older than the given female cohort, which accounts for 80 % of all marriages as shown in Table 1). As females are universally married, we normalize the distribution so that \( \sum\limits_{i = m}^{n} {P_{x + i} } = 1 \), as shown in Table 1. Of course, the spousal age difference interval can be widened or narrowed as needed for different populations.
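The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `spousal_sex_ratio`, the dictionary-based inputs, and the toy numbers are all assumptions made for the example, and the weights are renormalized inside the function so that they sum to one over the chosen interval.

```python
def spousal_sex_ratio(males, females, weights, x):
    """Return R_x = sum_i P_{x+i} * M_{x+i} / F_x.

    males, females: dicts mapping age -> population count (hypothetical inputs).
    weights: dict mapping age difference i -> share of females aged x whose
             husband is aged x + i; renormalized here so the shares sum to 1.
    x: age of the benchmark female cohort.
    """
    total = sum(weights.values())
    pool = sum((w / total) * males[x + i] for i, w in weights.items())
    return pool / females[x]


# Toy data (not from the paper): equal-sized cohorts and illustrative weights
# for age differences -1 to 5.
males = {age: 100.0 for age in range(29, 36)}
females = {30: 100.0}
weights = {-1: 0.05, 0: 0.15, 1: 0.20, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.05}

r_30 = spousal_sex_ratio(males, females, weights, x=30)
```

With equal-sized male and female cohorts the ratio is 1 whatever the weights are, which is a convenient sanity check before real age-specific counts are supplied.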

Table 1 Distribution of spousal age differences for females in Shaanxi province in 2012

Unlike the summary indexes used by Tuljapurkar et al. (1995), Jiang et al. (2011b), Guilmoto (2012), and Jiang et al. (2014), which include almost all marriageable ages for both males and females and measure the overall situation in the whole marriage market, SSR is the number of potential males within the spousal age difference interval (in this paper −1 to 5) per woman in the birth cohort, given the assumed spousal age difference pattern. It measures the tightness of the marriage squeeze for a certain female birth cohort and the corresponding marriageable male partners. If SSR is greater than 1, males are in excess relative to the number in the female birth cohort; if SSR is less than 1, males are in short supply relative to the number in this female cohort. Given the relative stability of the spousal age difference pattern (as shown in Table 1) and a universal marriage pattern for females (as shown in Fig. 1), we can compute \( R_{x} \) for different birth cohorts (say for cohorts aged from 25 to 45) so as to produce an overall picture of the whole marriage market in a certain year, as the results in Fig. 2 show.

\( R_{x} - 1 \) is the proportion of surplus males relative to the number of this female cohort aged x given the spousal age difference pattern. It equals the number of excess males per woman and is positive if there is a surplus of males and negative if there is a deficit. As both age and sex structures influence the marriage squeeze, we decompose the proportion into age and sex structure effects, drawing on Das Gupta’s (1993) method, which decomposes a rate into two additive effects and does not incorporate interaction effects. Below is our decomposition; further discussion is given in the “Appendix”.

$$ R_{x} - 1\; = \sum\limits_{i = m}^{n} {P_{x + i} \frac{{\frac{{M_{x + i} }}{{F_{x + i} }} + 1}}{2}\left( {\frac{{F_{x + i} }}{{F_{x} }} - 1} \right)} + \sum\limits_{i = m}^{n} {P_{x + i} \frac{{\frac{{F_{x + i} }}{{F_{x} }} + 1}}{2}\left( {\frac{{M_{x + i} }}{{F_{x + i} }} - 1} \right)} \tag{1} $$

where

$$ \sum\limits_{i = m}^{n} {P_{x + i} \frac{{\frac{{M_{x + i} }}{{F_{x + i} }} + 1}}{2}\left( {\frac{{F_{x + i} }}{{F_{x} }} - 1} \right)} $$

refers to the age structure effect, and

$$ \sum\limits_{i = m}^{n} {P_{x + i} \frac{{\frac{{F_{x + i} }}{{F_{x} }} + 1}}{2}\left( {\frac{{M_{x + i} }}{{F_{x + i} }} - 1} \right)} $$

represents the sex structure effect.
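A short check of why the two effects sum exactly to \( R_{x} - 1 \) may be helpful. The shorthand \( a_{i} \) and \( b_{i} \) is introduced here only for illustration and is not the authors' notation. Write \( a_{i} = F_{x + i} /F_{x} \) for the relative size of the female cohort aged x + i and \( b_{i} = M_{x + i} /F_{x + i} \) for the sex ratio at age x + i, so that \( M_{x + i} /F_{x} = a_{i} b_{i} \). Because the weights \( P_{x + i} \) sum to one,

$$ R_{x} - 1 = \sum\limits_{i = m}^{n} {P_{x + i} \left( {a_{i} b_{i} - 1} \right)} $$

and for each i

$$ \frac{{b_{i} + 1}}{2}\left( {a_{i} - 1} \right) + \frac{{a_{i} + 1}}{2}\left( {b_{i} - 1} \right) = \frac{{\left( {a_{i} b_{i} - b_{i} + a_{i} - 1} \right) + \left( {a_{i} b_{i} - a_{i} + b_{i} - 1} \right)}}{2} = a_{i} b_{i} - 1 $$

The cross terms cancel, so the age term (driven by \( a_{i} - 1 \)) and the sex term (driven by \( b_{i} - 1 \)) add up to the total with no residual interaction term.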

To elucidate this decomposition, we use a simple example. Consider two consecutive cohorts: the first consists of 200 males and 200 females, and the second, born 1 year later, comprises 100 males and 100 females. The sex ratio is balanced within each birth cohort. Now if all females choose a spouse 1 year older, the SSR for the females in the second cohort is \( R_{x} = 2 \), namely the number of marriageable males is double that of the females in question. Decomposing

$$ R_{x} - 1 = 1 $$

the age structure effect is

$$ \sum\limits_{i = m}^{n} {P_{x + i} \frac{{\frac{{M_{x + i} }}{{F_{x + i} }} + 1}}{2}\left( {\frac{{F_{x + i} }}{{F_{x} }} - 1} \right)} = \frac{1 + 1}{2}\left( {2 - 1} \right) = 1 $$

and the sex structure effect is

$$ \sum\limits_{i = m}^{n} {P_{x + i} \frac{{\frac{{F_{x + i} }}{{F_{x} }} + 1}}{2}\left( {\frac{{M_{x + i} }}{{F_{x + i} }} - 1} \right)} = \frac{2 + 1}{2}\left( {1 - 1} \right) = 0 $$

That is to say, the male marriage squeeze is entirely due to age structure, that is, to the change in the size of birth cohorts; sex structure contributes nothing to the male marriage squeeze, as the sex ratio of both cohorts is 1.
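The two-cohort example can be reproduced numerically. The sketch below applies Eq. (1) term by term; the function name `decompose`, the dictionary inputs, and the age labels 30 and 31 are assumptions chosen for illustration, not part of the original analysis.

```python
def decompose(males, females, weights, x):
    """Split R_x - 1 into (age_effect, sex_effect) following Eq. (1).

    males, females: dicts mapping age -> count; weights: dict mapping the
    age difference i -> share of females aged x marrying males aged x + i.
    """
    total = sum(weights.values())
    age_effect = 0.0
    sex_effect = 0.0
    for i, w in weights.items():
        p = w / total                              # normalized P_{x+i}
        size_ratio = females[x + i] / females[x]   # F_{x+i} / F_x
        sex_ratio = males[x + i] / females[x + i]  # M_{x+i} / F_{x+i}
        age_effect += p * (sex_ratio + 1) / 2 * (size_ratio - 1)
        sex_effect += p * (size_ratio + 1) / 2 * (sex_ratio - 1)
    return age_effect, sex_effect


# Older cohort (labelled age 31): 200 males and 200 females.
# Younger cohort (labelled age 30): 100 males and 100 females.
males = {30: 100.0, 31: 200.0}
females = {30: 100.0, 31: 200.0}
weights = {1: 1.0}  # every woman marries a man exactly 1 year older

age_effect, sex_effect = decompose(males, females, weights, x=30)
print(age_effect, sex_effect)  # 1.0 0.0
```

The two returned values always sum to \( R_{x} - 1 \), so the function can also be used to check a hand calculation for any other pair of cohorts.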

3 Projection and Parameters

We use the conventional cohort-component model to project the future population, and regard China’s population as closed, ignoring international migration. We use the software Padis-Int developed by the China Population and Development Research Center and set parameters to project population trends. The software requires the following data as input: age and sex structure in the baseline year, fertility levels and patterns, SRBs, mortality levels and patterns, and migrations. As we take the population to be closed, international migration level and patterns are all set to zero. Data used in the projection are as follows.
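For readers unfamiliar with the cohort-component model, one projection step for a closed population can be sketched as follows. This is a deliberately simplified illustration and not a description of how Padis-Int works internally: the function name, the single-year age lists, the use of start-of-year women to compute births, and the toy inputs are all assumptions.

```python
def project_one_year(males, females, surv_m, surv_f, asfr, srb,
                     birth_surv=(1.0, 1.0)):
    """Advance a closed population (no migration) by one year.

    males, females: lists of counts by single year of age (index = age);
                    the last entry is an open-ended age group.
    surv_m, surv_f: probability of surviving from age a to a + 1.
    asfr: age-specific fertility rates (births per woman), indexed by age.
    srb: sex ratio at birth, male births per 100 female births.
    birth_surv: survival of newborn (males, females) to the end of the year.
    """
    def age_forward(pop, surv):
        last = len(pop) - 1
        new = [0.0] * len(pop)
        for a in range(last):
            new[a + 1] = pop[a] * surv[a]
        new[last] += pop[last] * surv[last]  # survivors stay in the open group
        return new

    # Simplification: births are computed from women at the start of the year.
    births = sum(rate * women for rate, women in zip(asfr, females))
    male_share = srb / (100.0 + srb)

    new_m = age_forward(males, surv_m)
    new_f = age_forward(females, surv_f)
    new_m[0] = births * male_share * birth_surv[0]
    new_f[0] = births * (1.0 - male_share) * birth_surv[1]
    return new_m, new_f


# Toy inputs (not from the paper): three age groups, no mortality.
new_m, new_f = project_one_year(
    males=[10.0, 10.0, 10.0], females=[10.0, 10.0, 10.0],
    surv_m=[1.0, 1.0, 1.0], surv_f=[1.0, 1.0, 1.0],
    asfr=[0.0, 0.206, 0.0], srb=106.0)
```

Repeating this step year by year, with fertility, mortality, and SRB inputs updated according to the assumptions of Sects. 3.2 to 3.4, yields the projected age and sex structure from which the SSR is computed.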

3.1 Population Size and Age-Sex-Structure in Base Year 2010

China conducted a population census in 2010. The post-enumeration survey indicates an undercount of 0.12 percent in the census. Although the census data have some problems, they are the latest and most comprehensive available, giving a total population of 1,332.81 million (PCO 2012). However, the number of people under 1 year of age in the census is much smaller than the number aged 1, and much smaller than the projected number of births in 2011 under a total fertility rate of 1.4 (as adopted below), which may indicate under-enumeration of the population under 1 year of age. We therefore set the population under 1 year of age to the average of the number aged 1 in the 2010 census and the projected number of births in 2011, which makes the total population 1,334.85 million in the baseline year. Otherwise, we adopt the age and sex structure of the 2010 census.

3.2 Fertility Level and Pattern

China’s total fertility rate is controversial. The total fertility rate (TFR) was 1.22 in the 2000 census, and 1.18 in the 2010 census. The National Bureau of Statistics adjusted the TFR of 1.22 in the 2000 census to 1.4 for internal use (Morgan et al. 2009). Zhao and Chen (2011) accepted a TFR of 1.45 for 2005–2010 in their projection. In the medium scenario of the United Nations projection, China’s TFR was set at 1.66 for 2010–2015, 1.69 for 2015–2020, and 1.72 for 2020–2025 (United Nations 2013). Comparing the census population at younger ages with births projected under different TFRs, we find that a TFR of 1.4 ensures a smooth transition between the projected births and the younger birth cohorts in the 2010 census. In this paper we adopt two scenarios for TFR, 1.4 and 1.6, and adopt the fertility pattern reported by the 2010 census (PCO 2012).

3.3 Mortality Level and Pattern

The 2010 census provided sufficient data on mortality: the infant mortality rate was 3.82 per thousand (3.73 per thousand for male infants and 3.92 per thousand for female infants) (PCO 2012). Life expectancy was calculated as 77.9 years (75.6 for males and 80.4 for females), but these life expectancies may be heavily overestimated due to underreporting of mortality (Huang and Zeng 2013).

Using life tables generated in 1982, 1990, and 2000 (Huang et al. 2008), Jiang et al. (2013) adopted an extension of the Lee-Carter method for limited data (Li et al. 2004) to forecast China’s mortality. Jiang et al. (2013) estimated life expectancy for Chinese males at 71.32 years in 2010 and 74.62 years in 2030, and for females at 74.97 years in 2010 and 78.64 years in 2030. This implies an average annual increase of 0.165 years for males and 0.184 years for females. In the present study, we set male life expectancy at 71.32 years in 2010, increasing by 0.15 years per year, and female life expectancy at 74.97 years in 2010, increasing by 0.2 years per year, between 2010 and 2050. Based on China’s infant mortality and the general pattern of the United Nations model life tables, we obtain the life tables and mortality data.
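The arithmetic behind these life expectancy assumptions can be checked with a small sketch. The helper names are hypothetical; only the numbers quoted in the paragraph above are taken from the text.

```python
def implied_annual_gain(e_start, e_end, years):
    """Average yearly gain in life expectancy between two estimates."""
    return (e_end - e_start) / years


def life_expectancy(year, base, annual_gain, base_year=2010):
    """Linear path: base-year value plus a constant annual gain."""
    return base + annual_gain * (year - base_year)


# Gains implied by the 2010 and 2030 estimates of Jiang et al. (2013).
male_gain = implied_annual_gain(71.32, 74.62, 20)    # about 0.165
female_gain = implied_annual_gain(74.97, 78.64, 20)  # about 0.184

# End-of-projection values under the rounded gains adopted in this study.
male_2050 = life_expectancy(2050, 71.32, 0.15)    # about 77.3 years
female_2050 = life_expectancy(2050, 74.97, 0.20)  # about 83.0 years
```

Under these assumptions the gap between female and male life expectancy widens from 3.65 years in 2010 to roughly 5.65 years in 2050, because the assumed female gain is larger than the male gain.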

3.4 SRB

On the basis of censuses and sample surveys before 2010, some scholars were optimistic that China’s SRB had begun to fall (Das Gupta et al. 2009; Guilmoto 2009). The National Population and Family Planning Commission, one of the organizations that monitors SRB, claimed that the SRB in China had been falling for four consecutive years: 120.56 in 2008, 119.45 in 2009, 117.94 in 2010, and 117.70 in 2011 (Li 2013). In the 2010 census, the data from the long form show that the SRB was 118.6 in 2010, but it was 117.96 according to the short form (PCO 2012). In this paper, we assume that the SRB, 117.96 in 2010, will drop linearly to the normal level of 106 in 2030 and remain at 106 thereafter.

Due to the uncertain trend of future SRBs, some scholars have adopted multiple scenarios of SRBs in population projection (Guilmoto 2012; Tucker and Van Hook 2013). In this paper, our focus is not on the effect of different SRBs on the marriage market, so we employ just one SRB assumption as mentioned above.
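The single SRB assumption described above amounts to a piecewise-linear path, which can be written as a short function. The name `srb_path` and its keyword arguments are illustrative assumptions; the start and end values are those stated in the text.

```python
def srb_path(year, start_year=2010, start=117.96, end_year=2030, end=106.0):
    """Assumed SRB (male births per 100 female births) in a given year.

    Linear decline from `start` in `start_year` to `end` in `end_year`,
    then constant at `end`.
    """
    if year <= start_year:
        return start
    if year >= end_year:
        return end
    fraction = (year - start_year) / (end_year - start_year)
    return start + (end - start) * fraction


srb_2020 = srb_path(2020)  # halfway along the decline, about 111.98
srb_2040 = srb_path(2040)  # 106.0 after the decline ends
```

The implied decline is about 0.6 points per year (11.96 points over 20 years).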

3.5 Spousal Age Difference Pattern and Marriage Pattern

From the definition of SSR and the decomposition of \( R_{x} - 1 \) we can see that the distribution of spousal age difference is essential. The national data from the 2010 census should be a good source, but the published tabulation of 2010 census data does not provide this information (PCO 2012), and personal records are not available. In this paper we adopt the data from the dynamically updated All-Persons Database of Shaanxi Province, which includes data on over 37 million people in Shaanxi Province and takes the family as the unit. Table 1 shows a relatively stable distribution of spousal age difference for married females aged 24–35 in 2012, based on 2.55 million married females. The range of spousal age differences is wide, but for females, over 80 percent lie between −1 (namely, the husband is 1 year younger than the wife) and 5 (the husband is 5 years older than the wife), and the distribution remains stable over cohorts. Therefore, we limit spousal age differences to within −1 to 5, and normalize the age differences for females of a specific age.

Figure 1 presents the proportions of males and females ever married by age from 15 to 50 in the 2000 and 2010 censuses (1 minus the proportion never married). In 2000, the proportion of ever-married males aged 30 reached 0.896 and that of females aged 30 reached 0.978. In 2010, it was 0.819 for 30-year-old males and 0.912 for 30-year-old females. We can see that over 90 percent of females aged 30 have married, so we adopt 30 as the age benchmark and take the number of 30-year-old females as the baseline for the SSR model in this study. Figure 1 also displays changes in the proportion ever married by age from 2000 to 2010 and indicates a delay in marriage. As females are in short supply and can choose according to their age-difference criteria, delay in marriage may not affect the spousal age difference pattern significantly; females aged 24–35 exhibit a stable pattern, as shown in Table 1.

Fig. 1 Age-specific proportions of ever-married in 2000 and 2010. Data source: PCO (2002, 2012)

4 Results

4.1 Marriage Squeeze for a Year

We can assess the marriage squeeze by birth cohorts and project it for every year. We present the results for the years 2010, 2020, 2030, 2040, and 2050 with the TFR scenario of 1.4 in Fig. 2. In 2030, for example, the SSRs fluctuate between 1.1 and 1.3 for consecutive female birth cohorts aged under 40, indicating an excess supply of marriageable males. One dramatic drop occurs, from 1.2 for the female cohort aged 39 to 0.9 for the female cohort aged 40. Figure 4 shows why: the cohort aged 40 is much larger than the preceding and following cohorts, so marriageable male partners are in short supply relative to the total number of females aged 40. The steep drop for the female cohort aged 40 indicates the importance of variation in cohort size when the marriage squeeze is measured within narrow age-difference bounds for matching. As cohort size declines with age beyond 40, females of a specific age outnumber the males within their spousal age difference bounds, and SSRs for ages 40 and older are below 1.

Fig. 2 SSR by cohort in 5 years

In Fig. 2, another noteworthy phenomenon is the resemblance of curve shapes with a translational displacement. We can see that, for example, the curve of 2020 from age 35 onward resembles the curve from age 25 onward in 2010 in both shape and magnitude. In the definition of \( R_{x} = \sum\limits_{i = m}^{n} {P_{x + i} } \frac{{M_{x + i} }}{{F_{x} }} \), if we project the male and female birth cohorts k years forward,

$$ R_{x + k} = \sum\limits_{i = m}^{n} {P_{x + i} } \frac{{M_{x + i} \frac{{L_{x + i + k}^{m} }}{{L_{x + i}^{m} }}}}{{F_{x} \frac{{L_{x + k}^{f} }}{{L_{x}^{f} }}}} $$

where \( \frac{{L_{x + i + k}^{m} }}{{L_{x + i}^{m} }} \) (i is between m and n) and \( \frac{{L_{x + k}^{f} }}{{L_{x}^{f} }} \) should be very close, as the death rates for ages 25 to 45 are very low, making \( R_{x + k} \) (k years later) very close to \( R_{x} \). Here \( L_{x}^{m} \) and \( L_{x}^{f} \) are survival rates from birth to age x for males and females, respectively.

Figure 3 presents the decomposition into age and sex structure effects of the excess male supply relative to each female birth cohort in 2030. We see that the proportions of male surplus fluctuate between 0.1 and 0.3 for cohorts younger than 40 years old; the sex structure effect gradually declines with older cohorts, whereas the age structure effect increases. For cohorts aged 40 and older, females are in excess supply and there is a shortage of males; sex structure contributes almost nothing, and the relative shortage of males is almost completely due to the age structure effect. This coincides with the sex ratios of males to females of the same age, which are seen in Fig. 4 to be almost equal to one.

Fig. 3 \( R_{x} - 1 \), age and sex structure effect by cohort in 2030

Fig. 4 Numbers (in millions) and sex ratio of males and females by cohort in 2030

4.2 Marriage Squeeze over Time

The SSRs for female cohorts aged 30 from 2010 to 2050 in the two projection scenarios are shown in Fig. 5, where we see that the male marriage squeeze is not severe in China before 2020 if we use China’s 2010 census data without adjustment. In fact, males are in short supply in some years before 2020. From 2021, SSRs rise to about 1.2 and remain between 1.2 and 1.3 until 2033. Although there is a sudden decline in 2034, SSRs are between 1.1 and 1.2 from 2035 to 2050.

Fig. 5 SSRs for 2010–2050 in Scenarios 1 and 2

For the two TFR scenarios of 1.4 and 1.6, cohorts born during the projection period do not enter the marriage pool of the female cohort aged 30 until 2040, so the SSRs are identical in the two scenarios before 2040. The SSR in Scenario 2 is larger than that in Scenario 1 in 2040 and smaller for 2041–2045, and the two are almost the same afterwards. In 2040, the number of potential male partners aged 29 increases suddenly in Scenario 2, with its higher TFR, for the female cohort aged 30 (whose marriage pool runs from age 29 to 35); hence the SSR increases. For 2041–2045, the sizes of the female birth cohorts aged 30 increase, but in their potential male partner pools only the cohorts not older than them increase in Scenario 2, while those older than them remain the same as in Scenario 1. Hence SSRs are smaller in Scenario 2 than in Scenario 1. After 2045, both scenarios produce almost the same SSRs.
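The timing in this argument can be traced with a small helper that lists which cohorts entering the age-30 SSR were born during the projection period. It is a sketch under stated assumptions: birth year is approximated as calendar year minus age, the first births that differ between the scenarios are taken to be those of 2011, and the function and constant names are hypothetical.

```python
FIRST_PROJECTED_BIRTH_YEAR = 2011  # assumption: scenarios differ from 2011 births on


def affected_cohorts(year, x=30, m=-1, n=5):
    """Return (female_affected, male_ages_affected) for the SSR at age x.

    female_affected: True if females aged x were born in the projection period.
    male_ages_affected: ages x + i (m <= i <= n) of male cohorts in the
    marriage pool that were born in the projection period.
    """
    female_affected = (year - x) >= FIRST_PROJECTED_BIRTH_YEAR
    male_ages = [x + i for i in range(m, n + 1)
                 if (year - (x + i)) >= FIRST_PROJECTED_BIRTH_YEAR]
    return female_affected, male_ages


for year in (2039, 2040, 2041, 2045, 2046):
    print(year, affected_cohorts(year))
# 2039 (False, [])
# 2040 (False, [29])
# 2041 (True, [29, 30])
# 2045 (True, [29, 30, 31, 32, 33, 34])
# 2046 (True, [29, 30, 31, 32, 33, 34, 35])
```

Only the numerator changes in 2040, the denominator and part of the numerator change in 2041–2045, and from 2046 every cohort in the ratio comes from the projection period, which matches the three phases described above.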

Figure 6 shows the proportions of surplus males relative to females of the cohort aged 30, as well as age and sex structure effects for the two TFR scenarios. Since the curve for Scenario 2 is similar to that for Scenario 1, except for those years discussed above for Fig. 5, here we depict the curve in Scenario 1. We can see that the male shortage from 2010 to 2020 is basically caused by age structure. However, after 2020, male surplus is caused by both age and sex structure. Between 2020 and 2034, the effect of age structure declines and that of sex structure increases in the total male surplus. From 2034 to 2045, the main cause of male surplus is the sex structure, while age structure has little effect on, or even reduces the extent of male surplus (negative age structure contribution). After 2045, the effect of age structure gradually increases, and the influence of sex structure declines. However, on the whole, sex structure is still the dominant cause of the male surplus.

Fig. 6 \( R_{x} - 1 \), age, and sex structure effect for 2010–2050 in Scenarios 1 (S1) and 2 (S2)

We found that there will be a shortage rather than an excess of males in the years from 2010 to 2020. For example, the SSR is 0.857 in 2017, which means that the number of marriageable males within the spousal age difference is 14.3 % less than the number of females aged 30. Checking the male and female populations aged 29–35 in 2017 in Table 2, the number of females aged 30 in 2017 will be 12.77 million, whereas the number of males aged 31 will be 11.22 million and males aged 32 and 33 will only be 9.9 million and 9.81 million respectively. Thus changes in population numbers at different ages are great, resulting in the large shortage of males within this small age interval.

Table 2 Population numbers aged 29-35 in 2017 (millions)

5 Conclusion and Discussion

In the literature on the marriage squeeze, age structure has largely been neglected. Building on previous studies that have attempted to separate the contributions of age and sex structure to the marriage squeeze in China, we develop a new index, SSR, and a decomposition method, and project China’s marriage squeeze, as well as the contributions of age and sex structure, from 2010 to 2050.

We show that China’s marriage market will be confronted with a relatively severe male squeeze. For female cohorts aged 30, the proportion of excess males relative to this female cohort is 0.1–0.3 annually from 2020 onward (Fig. 5). Beyond this single cohort, we can provide results for cohorts of different ages, thus illustrating the whole marriage market. Taking the year 2030 in Fig. 2 as an example, females in cohorts aged 25–40 are in short supply and males are in excess. This finding supports previous predictions of a severe male marriage squeeze by Jiang et al. (2011b), Guilmoto (2012), and Jiang et al. (2014). But unlike their studies, our method does not provide a single summary value of the extent of the marriage squeeze for the whole marriage market in a specific year.

The decomposition results indicate that both age structure and sex structure contribute to marriage squeeze. Age structure will be dominant in 2010–2020. Between 2020 and 2034, the effect of age structure declines and that of sex structure increases in the total proportion of male surplus. From 2034 onward, the main cause of male surplus is the sex structure, while age structure has no or little effect on male surplus.

The two TFR scenarios in this paper do not produce large differences except in a few years. In the definition \( R_{x} = \sum\limits_{i = m}^{n} {P_{x + i} } \frac{{M_{x + i} }}{{F_{x} }} \), if a higher TFR (in this case 1.6 rather than 1.4) scales up the numbers of both males and females by a coefficient C, then \( R_{x} = \sum\limits_{i = m}^{n} {P_{x + i} } \frac{{C \times M_{x + i} }}{{C \times F_{x} }} \) remains the same.

The dramatic fluctuation of SSR and age structure effect at some ages in a year or for the female cohorts aged 30 in different years demonstrates the contribution of variations in birth cohort sizes to the marriage squeeze, especially when the measures are limited to a small age interval. Bergstrom and Lam (1989a) pointed out that the reason for most marriage squeezes lies in size variations of birth cohorts. During the years 1885–1940 in Sweden, the sex ratio of males to females with an age difference of 3 years changed radically, jumping to 1.25 from 0.9 within 5 years as a result of large variation in birth cohorts (Bergstrom and Lam 1989b). Our index, limited to a small age interval, is sensitive to cohort size variation. Numerous researchers have considered the possible effects of cohort size on resources available to children (Lam and Marteleto 2008). Marriageable partners are surely a kind of resource, and dramatic changes in the size of cohorts can create strong competition for this resource. Summary indexes like those used by Tuljapurkar et al. (1995) and Jiang et al. (2011b) reflect the situation for the whole marriage market, but do not distinguish differences among different cohorts. Beyond the overall marriage market, our results show the importance of cohort variation in the marriage market.

One limitation of our index is that we fix the intervals of spousal age differences. Different strategies will certainly be adopted in response to sex imbalance in the marriage market, including delaying marriage or changing the preferred age criteria. Thus the marriage squeeze will likely also affect spousal age difference intervals. But as mentioned above, in a primarily female-shortage market, females dominate the marriage market and choose spouses according to their traditional age criteria. Figure 1 shows the delay in marriage from 2000 to 2010, and we see from Table 1 that the age difference distribution in marriage for females shows little variation over cohorts. Therefore, we do not take into consideration the inverse effect of the marriage squeeze on the spousal age difference distribution. Another limitation is data source and availability. The accuracy of the age and sex structure in the baseline year affects our results and conclusions. The post-enumeration survey estimated a 0.12 % net under-enumeration rate, but statements about census data quality should be regarded with caution (Cai 2013). The male marriage squeeze would be overestimated if a large number of females were underreported in the 2010 census. Unlike previous studies by Jiang et al. (2011b), Guilmoto (2012), and Jiang et al. (2014), which claimed the existence of a male marriage squeeze from 2010, our results do not predict a severe male excess for 2010 to 2020, but rather a male shortage for several years. We see that this male shortage is basically caused by age structure, which may reflect inaccuracies in the 2010 census. Further, in analyzing the distribution of spousal age difference, we used data from Shaanxi Province, as the tabulations of 2010 national census data (PCO 2012) do not include this information, and the authors have no access to individual records from the census. A national dataset would of course be more representative, but the Shaanxi data are the best available to us. A further limitation is that this paper assumes a heterosexual marriage norm.

China’s male surplus has aroused wide concern with respect to socio-economic development and stability. The Chinese government has realized the severity of the imbalanced sex structure and has adopted a series of measures to combat gender discrimination, such as the nationwide “Care for Girls” program (Murphy 2014). Initially these measures brought about some positive results; however, the program’s long-term effectiveness has been undermined by various factors (Greenhalgh 2013). Even if programs like this take effect, they will affect the future marriage market only when the children born under them reach marriageable ages. Meanwhile, the existing male surplus is an unavoidable topic in Chinese society. Partly due to the male surplus and female deficit, bride-prices and marriage expenses borne by the male side have recently been skyrocketing, attitudes toward sons have begun to change, son preferences have begun to weaken, and women have acquired more bargaining power in the marriage market (Jiang et al. 2015). The traditional hypergamous marriage pattern and free mobility enable females to migrate to prosperous areas through marriage, leaving males in poverty-stricken areas unable to find a spouse. These destitute males are at a disadvantage in many respects, such as educational attainment and social status. They are concentrated in backward areas and can form bare-branch villages (Jiang and Sánchez-Barricarte 2012). Though their potential effect on social stability remains to be seen, historical records have shown a correlation between the congregation of males and social disorder. China’s marriage squeeze and male surplus have changed the society radically, and will confront Chinese society for decades.