1 Introduction

The ratio of male to female live births in China has been abnormally high since the 1980s. This trend is partly due to a Confucian patrilineal family structure that is deeply embedded in China’s culture and leads families to view sons as insurance and daughters as liabilities (Jiang et al., 2011). Family planning policies in China, however, have further entrenched a pattern of selective childbirth. Sex-selective abortions and the drowning of female babies are on the rise, especially in rural areas (Yu et al., 2019). China’s National Population Development Strategy Report (2007) predicts that there will be 30 million surplus males between the ages of 20 and 45 and that the number may even be as high as 40 to 50 million (Chen & Mueller, 2002).

The growing preference for male children is producing a severe and unavoidable male marriage squeeze in the marriage market. The costs to society of the male marriage squeeze are enormous. One direct consequence is that the marriage squeeze will pass from the upper echelons of the social structure to the bottom layer, producing a large number of bachelors in poor rural areas (Goodkind and Branch, 2006; Wei et al., 2013). The so-called bachelor class will occupy the bottom of Chinese society, endangering their health and well-being. The marriage squeeze is also anticipated to introduce anomie in the marriage market, which contributes not only to marital instability and an increasing divorce rate but also to the emergence of a large number of heterogeneous marriages (Ebenstein and Sharygin, 2009; Yang and Fu, 2016). Some predict that the male marriage squeeze may result in a “risky society” characterized by sex crimes, the proliferation of underground sex industries, and the spread of AIDS and other sexually transmitted diseases (Li et al., 2014).

Understanding the trend in the marriage squeeze in Chinese society and its social risks has garnered substantial attention in academic circles. Since the phenomenon was first documented by Tuljapurkar et al. (1995), researchers have devised various measures of a marriage squeeze. Traditional measures fall into two categories: sex ratio indicators, which take into account the age structure and gender characteristics of the population, and scale indicators, which are obtained by constructing a nuptiality table.

The sex ratio indicators used to measure the marriage squeeze include the same-age sex ratio, the relative sex ratio, and the single population sex ratio (Chen & Mueller, 2002); the most widely used is the potential first marriage ratio, first proposed by Tuljapurkar et al. (1995). Jiang et al. (2011) further revised this index by introducing the sex ratio of the population by age in the baseline year, standardizing the first marriage rate, and de-trending the effects of the sex ratio on the first marriage rate. They then proposed an improved model for the never-married population in the first marriage market. Another commonly used sex ratio indicator is the “spousal sex ratio” proposed by Guo and Deng (2000) and Chen and Mueller (2000). In combination with a spousal choice model for first marriages, Chen (2010) constructed an analysis table for the spousal sex ratio and studied the problems of spousal selection and the sex ratio among young and middle-aged individuals. Guilmoto (2012) and Jiang et al. (2013, 2014, 2016) decomposed the spousal sex ratio into two dimensions, the age structure and the sex structure, and studied the contribution of these two factors to the marriage squeeze in different circumstances.

Regarding the nuptiality table indicators, Schoen (1983) constructed a marriage squeeze index and the proportion of marriages “lost” to the marriage squeeze using a two-sex nuptiality-mortality lifetable. Chen and Mueller (2000) proposed the marriage equilibrium index and first marriage squeeze index to measure the tightness in the marriage market. Taking first marriage events into consideration together with mortality, Guo et al. (2016) devised a male multiple-decrement life table to study the characteristics of the marriage squeeze among rural Chinese males by period and cohort by focusing on hypothetical cohort indicators such as the distribution of the age-specific first marriage probability, the mean age at first marriage, the survival rate for unmarried men, the expected lifetime marriage rate, and the expected number of years of marriage.

In previous research, the measurement indexes often capture the overall degree of tightness in the marriage market without depicting the distribution of the marriage squeeze across different age cohorts, so these indicators cannot fully explain the causes of the marriage squeeze or the trends in its development. In addition, the construction of these indicators usually involves strict assumptions or requires detailed empirical support. For example, the spousal sex ratio requires setting the age gap between couples, and the compilation of nuptiality tables requires detailed marriage data.

Based on a model of the potential first marriage ratio, we devise a simple and effective measurement for the male marriage squeeze. Our index captures the intensity of the marriage squeeze not only in the overall population but also within each age cohort, and it has two practical advantages. First, we only need to predict the population size and age structure under the abnormal sex ratio at birth (SRB), in contrast to Guilmoto’s (2012) calculation, which is based on population prediction results under two SRBs. Second, we avoid the need to set and estimate the age structure that other sex ratio indicators require. The existing marriage squeeze measures are mostly based on the ratios of male and female cohorts of the same age. Although mating-age differences are considered, such an age difference is generally assumed to be constant and to cover a relatively small range. For example, a common hypothesis is that women are, on average, between five years younger and two years older than the men they pair with. In fact, however, the age difference between couples changes with the age of the husband, and the resulting marriage squeeze measure varies accordingly. Our indicator does not require assumptions or restrictions on the age gap between couples; instead, it takes the population of all ages in the marriage market into consideration, and the probability of men and women of different ages pairing together is implicit in the first marriage patterns.

Using this improved potential first marriage ratio index, we first evaluate the intensity of the squeeze in the Chinese marriage market from 2010 to 2019, nationally and across rural versus urban areas. We then rely on population forecasting to predict trends in the marriage market from 2020 to 2050. The remainder of this paper is organized as follows: Section 2 describes the methodology; Section 3 presents the data sources; Section 4 presents the results; Section 5 concludes.

2 Methods

2.1 Improved Marriage Squeeze Indicator

The marriage squeeze index proposed by Tuljapurkar et al. (1995) has two important principles: one is to establish a base period with which to compare the intensity of the marriage squeeze (assuming that the intensity of the marriage squeeze in the base period is 1); the second is to consider the first marriage rate by age cohort, that is, to use the age-specific first marriage rate to estimate the potential first marriage population. On this basis, a squeeze index that captured potential first marriages between men and women in the observation period (period t) was constructed.

This indicator has two potential shortcomings. First, it implicitly assumes that a relative increase in the number of potential marriages within different age cohorts will lead to marriage squeezes of the same intensity (that is, it simply adds up the number of potential marriages in each age group). In fact, a relative increase in the number of potential marriages within different age groups produces marriage squeezes of different intensities. Second, a change in this indicator does not reflect a change in the intensity of the marriage squeeze in a given age cohort (that is, the original indicator only reflects the impact of the change in the total number of potential marriages); in fact, a marriage squeeze is first produced by changes in the number of potentially married individuals in different cohorts.

Given China’s demographic situation, the future marriage squeeze will be caused first by an increase in the number of men available for potential first marriages in the younger cohorts. Then, as time passes, the marriage squeeze within the younger cohorts will be gradually passed on to the older cohorts and will eventually be reflected in the entire marriage market. This phenomenon will last for decades due to the persistent anomalous SRB in China. As the male populations in the younger cohorts increase significantly relative to the female populations in the same age cohorts, the relative surplus of males available for potential first marriages will gradually evolve to include every age cohort, thus forming the so-called full male marriage squeeze (that is, a marriage squeeze among all males available for a first marriage in each age group).

Therefore, we propose an improved measure of the male marriage squeeze based on the Tuljapurkar et al. (1995) marriage squeeze measure (the corresponding measurement for a female marriage squeeze can be obtained through a similar method).

We denote the number of males and females of age \(i\) in the baseline year by \({MP}_{i}\left(0\right)\) and \({FP}_{i}\left(0\right)\), respectively, and the first marriage rates by \({MPR}_{i}\left(0\right)\) and \({FPR}_{i}\left(0\right)\) for males and females, respectively. Note that 0 represents the baseline year. Suppose the age range of the population in the marriage market is from 15 to 60 years old; then, we obtain the male marriage squeeze index for year t, \(MS\left(t\right)\):

$$MS\left(t\right)=\sum_{i=15}^{60}\overline{{MPR }_{i}\left(0\right)}\bullet {MS}_{i}\left(t\right)$$
(1)

where \(\overline{{MPR }_{i}\left(0\right)}\) denotes the normalization of \({MPR}_{i}\left(0\right)\):

$$\overline{{MPR}_i\left(0\right)}=\frac{{MPR}_i\left(0\right)}{\sum_{i=15}^{60}{MPR}_i\left(0\right)},\begin{array}{c}i=15,\cdots,60\end{array}$$
(2)

The marriage squeeze index for age \(i\) in year \(t\) can be expressed as \({MS}_{i}\left(t\right)\):

$${MS}_i\left(t\right)=\frac{\left[{MP}_i\left(t\right)/{MP}_i\left(0\right)\right]}{\sum_{i=15}^{60}\left[{FP}_i\left(t\right)\bullet{FPR}_i\left(0\right)\right]/\sum_{i=15}^{60}\left[{FP}_i\left(0\right)\bullet{FPR}_i\left(0\right)\right]},\begin{array}{c}i=15,\cdots,60\end{array}$$
(3)
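
As a rough illustration of Eqs. (1)–(3), the following Python sketch computes the age-specific index and the weighted overall index. The function names, the dictionary-based inputs, and the three-age toy data are assumptions made purely for illustration; they are not census values.

```python
def potential_female_marriages(fp, fpr0):
    """Sum over ages of female population times the baseline female first marriage rate."""
    return sum(fp[i] * fpr0[i] for i in fp)


def age_specific_squeeze(mp_t, mp_0, fp_t, fp_0, fpr0):
    """Eq. (3): growth of males at age i divided by growth in potential female first marriages."""
    female_ratio = (potential_female_marriages(fp_t, fpr0)
                    / potential_female_marriages(fp_0, fpr0))
    return {i: (mp_t[i] / mp_0[i]) / female_ratio for i in mp_t}


def overall_squeeze(ms_i, mpr0):
    """Eqs. (1)-(2): weight MS_i(t) by the normalized baseline male first marriage rates."""
    total = sum(mpr0.values())
    return sum(mpr0[i] / total * ms_i[i] for i in ms_i)


# Hypothetical toy data for three ages (the paper uses ages 15 to 60).
mp_0 = {20: 100.0, 25: 100.0, 30: 100.0}   # males, baseline year
fp_0 = {20: 100.0, 25: 100.0, 30: 100.0}   # females, baseline year
mp_t = {20: 100.0, 25: 120.0, 30: 110.0}   # males, year t
fp_t = {20: 100.0, 25: 100.0, 30: 100.0}   # females, year t
mpr0 = {20: 0.02, 25: 0.10, 30: 0.04}      # male first marriage rates, baseline
fpr0 = {20: 0.05, 25: 0.10, 30: 0.02}      # female first marriage rates, baseline

ms_i = age_specific_squeeze(mp_t, mp_0, fp_t, fp_0, fpr0)
ms = overall_squeeze(ms_i, mpr0)
```

In this toy case the female side is unchanged, so each age-specific value equals the growth in the male population at that age, and the overall index is pulled toward age 25 because that age carries the largest first marriage rate weight.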

Furthermore, we denote the numbers of rural males, rural females, urban males, and urban females of age \(i\) in the baseline year by \({MP}_{i}^{1}\left(0\right)\), \({FP}_{i}^{1}\left(0\right)\), \({MP}_{i}^{2}\left(0\right)\), and \({FP}_{i}^{2}\left(0\right)\) and the first marriage rates by \({MPR}_{i}^{1}\left(0\right)\), \({FPR}_{i}^{1}\left(0\right)\), \({MPR}_{i}^{2}\left(0\right)\), and \({FPR}_{i}^{2}\left(0\right)\), respectively. Then, we can obtain the marriage index for rural and urban areas in year \(t\) as \({MS}^{1}\left(t\right)\) and \({MS}^{2}\left(t\right)\):

$${MS}^{1}\left(t\right)=\sum_{i=15}^{60}\overline{{{MPR }_{i}}^{1}\left(0\right)}\bullet {MS}_{i}^{1}\left(t\right)$$
(4)
$${MS}^{2}\left(t\right)=\sum_{i=15}^{60}\overline{{{MPR }_{i}}^{2}\left(0\right)}\bullet {MS}_{i}^{2}\left(t\right)$$
(5)
$$\overline{{MPR}_i^1\left(0\right)}=\frac{{MPR}_i^1(0)}{\sum_{i=15}^{60}{MPR}_i^1(0)},\begin{array}{c}i=15,\cdots,60\end{array}$$
(6)
$$\overline{{MPR}_i^2\left(0\right)}=\frac{{MPR}_i^2(0)}{\sum_{i=15}^{60}{MPR}_i^2(0)},\begin{array}{c}i=15,\cdots,60\end{array}$$
(7)
$${MS}_i^1\left(t\right)=\frac{\left[{MP}_i^1\left(t\right)/{MP}_i^1\left(0\right)\right]}{\sum_{i=15}^{60}\left[{FP}_i^1\left(t\right)\bullet{FPR}_i^1\left(0\right)\right]/\sum_{i=15}^{60}\left[{FP}_i^1\left(0\right)\bullet{FPR}_i^1\left(0\right)\right]},\begin{array}{c}i=15,\cdots,60\end{array}$$
(8)
$${MS}_i^2\left(t\right)=\frac{\left[{MP}_i^2\left(t\right)/{MP}_i^2\left(0\right)\right]}{\sum_{i=15}^{60}\left[{FP}_i^2\left(t\right)\bullet{FPR}_i^2\left(0\right)\right]/\sum_{i=15}^{60}\left[{FP}_i^2\left(0\right)\bullet{FPR}_i^2\left(0\right)\right]},\begin{array}{c}i=15,\cdots,60\end{array}$$
(9)

Four implications follow from these definitions. First, the overall marriage squeeze indexes are obtained by weighting the age-cohort marriage squeeze indexes. Taking the rural marriage squeeze index as an example, Eq. (8) can be further expressed as follows:

$${MS}_{i}^{1}\left(t\right)=\frac{\left({MP}_{i}^{1}\left(t\right)\bullet {MPR}_{i}^{1}(0)\right)/{\sum }_{i=15}^{60}\left[{FP}_{i}^{1}\left(t\right)\bullet {FPR}_{i}^{1}\left(0\right)\right]}{\left({MP}_{i}^{1}\left(0\right)\bullet {MPR}_{i}^{1}(0)\right)/{\sum }_{i=15}^{60}\left[{FP}_{i}^{1}\left(0\right)\bullet {FPR}_{i}^{1}\left(0\right)\right]}$$
(10)

The numerator is the ratio of the number of males of age \(i\) available for a potential first marriage to the number of females of all ages available for a potential first marriage in year \(t\), and the denominator is the ratio of the number of males of age \(i\) available for a potential first marriage to the number of females of all ages available for a potential first marriage in the baseline year. When the index is 1, all men and women in the marriage market can find a match; when the index is greater than 1, there are more marriageable-age men than marriageable-age women in the market. Women become a scarce resource in the marriage market, while men are a surplus resource. The higher the value of the index, the greater the scarcity of women of marriageable age.

The intensity of the marriage squeeze and its changes across different cohorts can be further distinguished by using age-specific marriage squeeze indicators, with the overall squeeze index being calculated by weighting the age-specific indexes. The reason for this weighting is that the intensity of the marriage squeeze differs across ages even if the size of the potential first marriage population is the same: the squeeze is more intense for cohorts with a higher rate of first marriages.

As an extreme case, if the abnormal SRB occurs only in a certain year and the SRB in other years remains unchanged, then a marriage squeeze will arise when males born in the abnormal-SRB year enter the marriage market, and the intensity of the squeeze will depend on the age of the cohort.

Second, the above equation implies that the base period’s marriage squeeze measure is 1 (nationwide and in urban and rural areas for each cohort), so the intensity of the marriage squeeze in period \(t\) is measured relative to that in the base period. This is an inherent limitation of the potential first marriage ratio indicator, which may lead to an underestimation of the intensity of the marriage squeeze because there are already more marriageable males than females in the baseline year. Here, we follow the adjustment of Jiang et al. (2011) and introduce the age-specific sex ratios in each year. We denote the sex ratio of age \(i\) for rural and urban areas in year \(t\) as \({SR}_{i}^{1}\left(t\right)\) and \({SR}_{i}^{2}\left(t\right)\):

$${SR}_i^1\left(t\right)=\frac{{MP}_i^1\left(t\right)}{{FP}_i^1\left(t\right)},\begin{array}{cc}SR_i^2\left(t\right)=\frac{{MP}_i^2\left(t\right)}{{FP}_i^2\left(t\right)}&i=15,\cdots,60\end{array}$$
(11)

Then, the adjusted indicators can be obtained by:

$${MS}^{1\ast}\left(t\right)=\sum_{i=15}^{60}\overline{{MPR}_i^1\left(0\right)}\bullet{MS}_i^1\left(t\right)\bullet{SR}_i^1\left(t\right)$$
(12)
$${MS}^{2\ast}\left(t\right)=\sum_{i=15}^{60}\overline{{MPR}_i^2\left(0\right)}\bullet{MS}_i^2\left(t\right)\bullet{SR}_i^2\left(t\right)$$
(13)

Third, the equation also implies that the age-specific first marriage rate remains unchanged. Obviously, the first marriage rate can be affected when the intensity of the cohort-specific marriage squeeze increases or the socioeconomic environment changes, so the marriage squeeze measure we propose here is still essentially a static measure.

Finally, the marriage squeeze index we propose here is for males, and it can be converted to a female marriage squeeze measure by changing the relevant variables.
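
The sex-ratio adjustment in Eqs. (11)–(13) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and all numbers are hypothetical, and the age-specific indexes are taken as given rather than computed from population data.

```python
def sex_ratios(mp_t, fp_t):
    """Eq. (11): age-specific ratio of males to females in year t."""
    return {i: mp_t[i] / fp_t[i] for i in mp_t}


def adjusted_squeeze(ms_i, mpr0, sr_t):
    """Eqs. (12)-(13): weighted sum of MS_i(t) scaled by the age-specific sex ratio."""
    total = sum(mpr0.values())
    return sum(mpr0[i] / total * ms_i[i] * sr_t[i] for i in ms_i)


# Hypothetical inputs for two ages in one area (rural or urban).
ms_i = {25: 1.2, 30: 1.1}        # age-specific squeeze indexes, assumed given
mpr0 = {25: 0.10, 30: 0.05}      # baseline male first marriage rates
mp_t = {25: 110.0, 30: 105.0}    # males in year t
fp_t = {25: 100.0, 30: 100.0}    # females in year t

sr_t = sex_ratios(mp_t, fp_t)
ms_adjusted = adjusted_squeeze(ms_i, mpr0, sr_t)

# Unadjusted index for comparison (sex ratios fixed at 1).
ms_unadjusted = adjusted_squeeze(ms_i, mpr0, {i: 1.0 for i in ms_i})
```

Because every sex ratio in this toy example exceeds 1, the adjusted index is larger than the unadjusted one, which mirrors the underestimation concern described above.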

2.2 The Urban–Rural Difference Coefficient

In Chinese society, it is common for women to marry men with higher socioeconomic status and education levels (Yang, 2019; Yu et al., 2019). Because the permanent urban population has significantly higher economic status and social status than the rural population, the chances of establishing a marriage favor those in urban areas. Thus, surplus males in urban areas can seek out marriageable females from rural areas and lead marriageable females to move from rural areas to urban areas. Such changes alleviate the intensity of the urban male marriage squeeze, but exacerbate the rural male marriage squeeze. Our marriage squeeze indicator accounts for rural and urban differences and is described in more detail in the Appendix.

3 Data

Data come from the population census held in China in 2010 (referred to as “the Sixth Census”). In preliminary analysis, we evaluated differences between the fifth and the sixth census. There were no significant statistical differences, which suggests the sixth census data is reliable.

Census information from 2010 makes it possible to observe the marriage squeeze between 2010 and 2020, but a ten-year period is relatively short. Therefore, we consider 2010 as the baseline year and refer to the first marriage patterns in the base period to predict the intensity of the male marriage squeeze between urban and rural areas in China over the next 30 years. For this reason, it is necessary to predict the size and structure of the population from 2020 to 2050, including by urban and rural area, sex, and age. Here, we select the population migration model based on the Leslie matrix, referring to the work of Li and Li (2012).
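
To make the projection step concrete, here is a minimal Python sketch of a single Leslie-matrix step for one sex and one region. It is a simplification and not the model of Li and Li (2012): migration between urban and rural areas, the split by sex, and birth order are all omitted, and the age groups, fertility rates, and survival rates are hypothetical.

```python
def leslie_step(population, fertility, survival):
    """Advance an age-structured population by one period.

    population[a]: people in age group a
    fertility[a]:  births per person in age group a during the period
    survival[a]:   share of age group a that survives into group a + 1
    The oldest group is not carried forward (no open-ended age group).
    """
    births = sum(f * n for f, n in zip(fertility, population))
    aged = [s * n for s, n in zip(survival, population[:-1])]
    return [births] + aged


# Hypothetical three-group population.
population = [100.0, 80.0, 60.0]
fertility = [0.0, 1.0, 0.5]
survival = [0.9, 0.8]

next_population = leslie_step(population, fertility, survival)
```

Repeating this step year by year, with separate vectors for each sex and area and an added migration term, would give the kind of population series that the marriage squeeze index takes as input.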

In the sixth census, the reported range of age at first marriage is 15 to 40, which is incomplete. Therefore, we derive a measure applicable to the age range 15 to 65 years old. We first calculate the age-specific share of unmarried individuals as the ratio of the unmarried population to the total population at each age, and subtract this share from 1 to obtain the age-specific share of married individuals. Using the hypothetical cohort method, this age-specific married share can be treated as an approximation of the cumulative share of the first married population, which can then be differenced across ages to obtain the age-specific first marriage rate. Because of the hypothetical cohort method, the calculated age-specific first marriage rate may take on abnormal values. We therefore apply cubic spline interpolation and truncate the first marriage rates in the 61–65 age cohort. The estimated first marriage rates by urban versus rural areas, sex, and age are shown in Fig. 1.
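
The differencing step described above can be sketched in Python as follows. The function name and the counts are hypothetical, the sketch assumes that nobody is married before the youngest age in the data, and the cubic spline smoothing and truncation are not reproduced here.

```python
def first_marriage_rates(unmarried, total):
    """Difference the cumulative ever-married share across ages (hypothetical cohort method)."""
    ages = sorted(total)
    ever_married = {a: 1.0 - unmarried[a] / total[a] for a in ages}
    rates = {}
    previous = 0.0  # assumption: no one is married before the youngest age
    for a in ages:
        rates[a] = ever_married[a] - previous
        previous = ever_married[a]
    return rates


# Hypothetical counts by single year of age.
total = {20: 1000, 21: 1000, 22: 1000}
unmarried = {20: 950, 21: 900, 22: 800}

rates = first_marriage_rates(unmarried, total)
```

If the ever-married share happens to fall from one age to the next in real data, this differencing yields a negative rate, which is one way the abnormal values mentioned above can arise.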

Fig. 1 Age-specific first marriage rate by urban and rural area, males and females, 2010 sixth census, China

Figure 2 presents the 2010 population pyramid for China, illustrating three baby booms (birth peaks). The first baby boom occurred from 1950 to 1957, in the early years of the People’s Republic of China. A policy to encourage childbirth resulted in an average of 12 million births per year and an average birth rate of 35.9‰. A second baby boom occurred from 1962 to 1971. After the end of the “three-year natural disasters,” China entered a stage of recuperation, during which the average annual birth rate was 36.1‰, and a total of approximately 271 million people were born. A third baby boom, which occurred between 1981 and 1992, represents the children of the second baby boom. This baby boom was also affected by a change in the law shifting upward the minimum age of marriage (22 for men and 20 for women). The births during the three baby booms correspond to those aged 53 to 60, 39 to 48, and 18 to 29, respectively, in the 2010 population pyramid.

Fig. 2 Population pyramid, China, 2010

3.1 Population Forecasting Parameters

The main parameters of the population forecasting model include the age-specific survival rate, fertility patterns by urban and rural area, birth order, the migration rate, and the SRB. The control variables in the model are the total fertility rate by urban and rural area and by birth order.

The population prediction model uses the survival rate by urban and rural area, gender, and age for each year. The estimation of the survival rate requires the use of the complete life tables for China from 2011 to 2050 by urban and rural area and gender. As China’s social economy has undergone tremendous change in the past few decades and its population has been in a transitional period during the same time, these complete life tables generally draw on the experiences of Western developed countries. Here, we use the Princeton West model life table to describe future mortality rates in China. By estimating China’s life expectancy in the years to come, complete life tables for China can be obtained, and the age-specific survival rate can be derived from them.

Next, we need to determine the fertility pattern in the population. The fertility pattern is the distribution of age-specific fertility rates among women of childbearing age, reflecting changes in the desired and realized fertility levels of women of childbearing age within a certain period of time. Our forecasting process requires six fertility models, and the fertility pattern can be calculated based on census data from the baseline year.

The prediction model also takes population migration into consideration. Currently, population migration generally occurs within and across provinces. Intraprovincial population migration is affected by urbanization, and interprovincial population migration is affected by interprovincial population mobility. When forecasting the national population, only internal migration is considered, not international migration. As the process of urbanization deepens, part of the rural resident population will become part of the urban resident population. Extrapolation methods are generally used when estimating the future urbanization rate.

Linear interpolation is used to calculate future sex ratio at birth (SRB), based on observed values in 2010. The SRB is predicted to trend downwards in the future due to progress in gender equality that will weaken the preference for boys as well as an anticipated reduction in sex-selective births as China pursues a comprehensive “two-child” policy (Basten and Jiang, 2015; Yu et al., 2018; Zhao and Yang, 2019). Even so, with the shrinking of the family structure and continued influence of traditional Chinese culture, it is difficult to eliminate gender differences completely. In addition, we must take into account that the sex ratio at birth in the urban population is lower than that in the rural population. For example, when making a national population forecast, we first use the SRB in 2010 between urban and rural areas, and then assume that the SRB in 2050 is 110:100 in urban areas and 115:100 in rural areas.
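
The linear interpolation of the SRB can be sketched in Python as follows. The 2050 targets (110 in urban areas and 115 in rural areas, per 100 female births) come from the text above; the 2010 starting values used here are placeholders chosen only for illustration and are not the observed census values.

```python
def interpolate_srb(year, start_year, start_srb, end_year, end_srb):
    """Linearly interpolate the SRB (males per 100 females) between two anchor years."""
    share = (year - start_year) / (end_year - start_year)
    return start_srb + share * (end_srb - start_srb)


# Placeholder 2010 values (hypothetical); 2050 targets as assumed in the text.
URBAN_2010, URBAN_2050 = 118.0, 110.0
RURAL_2010, RURAL_2050 = 120.0, 115.0

urban_2030 = interpolate_srb(2030, 2010, URBAN_2010, 2050, URBAN_2050)
rural_2030 = interpolate_srb(2030, 2010, RURAL_2010, 2050, RURAL_2050)
```

With these placeholder inputs the midpoint year 2030 falls exactly halfway between the two anchors for each area.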

The parameter settings follow Li et al. (2019). Although the SRB is one of the main indicators that affect the intensity of the marriage squeeze, the population in the marriage market from 2010 to 2050 consists mainly of people born between 1970 and 2030 (assuming that the main participants in the marriage market are between 20 and 40 years old). The estimation of the intensity of the marriage squeeze over this 40-year period is therefore mainly a deduction based on the existing population structure, and the choice of the SRB in the population prediction model does not have a significant impact on this estimate.

We refer to the design of Li et al. (2019) for the TFRs by urban and rural area and by birth order in the model.

4 Results

4.1 Observed Age Structure of the Marriage Squeeze in China, 2010–2019

Using 2010 as the baseline year, Fig. 3 shows the Chinese male marriage squeeze at different ages from 2010 to 2019. Figure 3a shows the changes in the intensity of the marriage squeeze experienced by Chinese men in each age cohort during the selected period. Over time, the intensity of the marriage squeeze increases in each age cohort, although the magnitude and rate of increase differ. The intensity of the marriage squeeze also differs markedly across age groups. For example, the value of the age-specific marriage squeeze index for 21-year-old men is below 1; that is, men of this age are theoretically free to choose a match in the marriage market. The age-specific marriage squeeze index for 26-year-old men is higher and increases year by year; that is, men of this age are squeezed more intensely in the marriage market, and the situation is getting worse. Taken together, the two dimensions of time and age cohort show that the marriage squeeze is clustered within specific age intervals, and this clustering not only deepens but also widens year by year.

Fig. 3 Distribution and shifts in the marriage squeeze in China, 2010–2019. a Joint distribution of the marriage squeeze indicator with respect to age and time. b Marriage squeeze distribution over time

To illustrate in more depth the distribution of the intensity of the marriage squeeze across different age cohorts, we projected the three-dimensional map of the marriage squeeze onto a plane, with the horizontal axis representing age and the vertical axis representing the marriage squeeze index for men of the corresponding age (see Fig. 3b). There are two obvious peaks in the distribution, roughly among those between 25 and 35 years old and those between 45 and 55 years old. Table 1 shows the corresponding ages and years for the two peaks.

Table 1 Peaks in the marriage squeeze age structure and the corresponding cohorts

The cause of the first peak is twofold. First, corresponding to the population trough around the age of 30 in the base period of 2010, men in the 30-year-old age group faced greater pressure in choosing a spouse. Second, because China started its family planning policy in the 1980s, the SRB became distorted. As these men entered the marriage market, they encountered an obvious peak in their age-specific marriage squeeze compared with the base-period population.

The second peak is mainly influenced by the population age structure in 2010 (the baseline year), reflecting a drop in births during the “three-year natural disasters” followed by a ten-year baby boom beginning in 1961. Twice as many births occurred in 1961 (30 million) as in 1960 (15 million). Although the second peak is significantly higher than the first peak, the actual intensity of the marriage squeeze is lower because the number of people is smaller and the proportion of first marriages is very low, which is fully captured in our marriage squeeze measure.

The age-specific marriage squeeze is related not only to the age structure of the base population but also to the abnormal SRB. A marriage squeeze caused by changes in the age structure can be alleviated or even eliminated by changing the age distribution of those entering the marriage market. A marriage squeeze caused by an abnormal SRB can likewise be alleviated in this way, but it cannot be eliminated fundamentally. The population born in a year with a high sex ratio will always experience a peak marriage squeeze after entering the marriage market, and the pressure from this squeeze will not relax as the cohort spends more time in the marriage market. This cohort is therefore in the worst situation in the marriage market.

4.2 Urban–Rural Differences in the Marriage Squeeze in China, 2010–2019

As noted earlier, the marriage squeeze has been shifting from urban to rural areas. Figure 4 presents the trend in the marriage squeeze index separately for urban and rural areas from 2010 to 2019 using data from the 2017 China General Social Survey (CGSS). The intensity of the marriage squeeze index for men has increased each year, rising from 1.03 in 2010 to 1.32 in 2019. Importantly, the gap between urban and rural areas widened during this period. By 2019, the marriage squeeze in urban areas was 1.26, but reached 1.42 in rural areas. In further analysis, the marriage squeeze indicators were decomposed for different age cohorts. The shape of the distribution was basically the same as for that of the whole country: clustering occurred within certain age ranges, and the wave peaks gradually shifted and widened over time (results not presented but available on request).

Fig. 4 Trend in the marriage squeeze index for men in China, 2010–2019

4.3 Prediction of the Marriageable Population in China, 2020–2050

We now turn to predicting patterns for the next thirty years. Through the population forecast analysis, we obtain the predicted number of males and females of marriageable age (15 to 60) in China from 2010 to 2050 (Fig. 5a) and calculate the predicted difference between the number of marriageable males and the number of marriageable females (Fig. 5b). Figure 5a shows more males than females in both urban and rural settings over this time period, with diverging patterns for rural and urban areas. In rural areas, the size of the marriageable-age population for males and females steadily decreases over the next thirty years. In contrast, the size of the marriageable population in urban areas peaks in 2025 and then begins to fall after 2040. Nonetheless, Fig. 5b reveals a steady increase in the number of surplus men in both urban and rural areas. As of 2019, the surplus male population in urban areas surpassed that in rural areas, and the emerging gap is maintained for the next thirty years. By 2050, the excess male population in China will reach 43.75 million.
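
The surplus-male count plotted in Fig. 5b is the difference between the male and female populations of marriageable age. A minimal Python sketch, with a hypothetical function name and made-up counts, is:

```python
def surplus_males(males, females, min_age=15, max_age=60):
    """Males minus females within the marriageable age range (both bounds inclusive)."""
    ages = range(min_age, max_age + 1)
    return sum(males.get(a, 0) - females.get(a, 0) for a in ages)


# Hypothetical counts (in thousands) for a few single years of age.
males = {14: 500, 15: 520, 30: 610, 60: 300, 61: 290}
females = {14: 470, 15: 480, 30: 580, 60: 305, 61: 300}

surplus = surplus_males(males, females)
```

Ages 14 and 61 fall outside the 15 to 60 window and are ignored, so only the three in-range ages contribute to the total.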

Fig. 5 Prediction of marriageable population in China, 2020–2050. a Size of the marriageable population for both sexes, urban and rural areas. b The number of surplus males of marriageable age, for urban and rural areas

4.4 Age Structure and Trends in the Marriage Squeeze in China: 2020–2050

With the population prediction results and the marriage squeeze indicators, we can obtain the trends in the changes to the marriage squeeze among China’s urban and rural age cohorts from 2020 to 2050, as shown in Fig. 6.

Fig. 6 Trends in the cohort marriage squeezes by urban and rural area, 2010–2050. a represents urban areas, and b represents rural areas

Due to the longer observation window and the greater fluctuations in the three-dimensional trend graph, we projected the nationwide cohort-specific distribution of the marriage squeeze onto a plane and plotted the age-specific distribution of the marriage squeeze every ten years to obtain Fig. 7.

Fig. 7 Trend in the shifts in the marriage squeeze distribution, 2010–2050

The two-peak structure previously observed in the analysis from 2010 to 2019 also appears clearly in Figs. 6 and 7, for the reasons detailed earlier. The peak age of the first crest gradually shifts to the right as the population changes. However, due to the internal age structure of the marriage market, the age-specific marriage squeeze index remains at a relatively high level between the ages of 26 and 37; the second crest peaks at approximately 50 years old. Whether judged by the number of people squeezed in the marriage market or by the impact of that squeeze, countermeasures to alleviate the male squeeze in the Chinese marriage market should focus on the population group corresponding to the first peak.

Figure 8 presents predictions of the male marriage squeeze over the next thirty years.

Fig. 8 a Trend in the marriage squeeze index in China, overall and urban versus rural, 2010–2050. b Marriage squeeze index by age cohort in 2020. c Marriage squeeze index by age cohort in 2043

Overall, rapid growth in the marriage squeeze index between 2010 and 2020 is followed by small increases over the next fifteen years, reaching 1.37 in 2035 (see Fig. 8a). Thereafter, there is another upward trend, with the marriage squeeze index reaching 1.42 in 2043, before slowly declining through 2050. Differences between rural and urban areas are evident. In urban areas, the marriage squeeze index will stabilize at around 1.3 between 2020 and 2035, then rise to a peak of 1.38 in 2045. In rural areas, the marriage squeeze index exhibits fluctuating growth, with peaks of 1.45 and 1.48 in 2031 and 2043, respectively. The peak in 2031 is mainly caused by the pattern of “urban squeeze rural,” while the peaks in 2020 and 2043 are due to China’s special population structure. To illustrate the peaks caused by the population structure, we show the distribution of the marriage squeeze indicators by age cohort in Fig. 8b and c. Figure 8b references the first peak, showing that men between 30 and 35 years old suffer the most from the marriage squeeze. In 2020, this group corresponded to the male population born between 1985 and 1990. It can be inferred that the peak in the rural marriage squeeze index in 2020 is due to the dual impact of the abnormal SRB and the third baby boom. In 2043 (see Fig. 8c), this group corresponds to the male population born between 2008 and 2013, which reflects the abnormal SRB and the entry of children from the third baby boom into the marriage market.
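The mapping from an age group in a given year to its birth cohort is plain subtraction. As a quick check of the cohort years quoted above, the following sketch (the helper name `birth_cohort` is hypothetical) reproduces them:

```python
from typing import Tuple


def birth_cohort(observation_year: int, age_low: int, age_high: int) -> Tuple[int, int]:
    """Return (earliest, latest) birth year for people aged age_low..age_high
    in observation_year, ignoring within-year birthday timing."""
    if age_low > age_high:
        raise ValueError("age_low must not exceed age_high")
    return observation_year - age_high, observation_year - age_low


# Men aged 30 to 35 in the two peak years discussed in the text.
print(birth_cohort(2020, 30, 35))
print(birth_cohort(2043, 30, 35))
```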

5 Discussion

Building on the potential first marriage ratio proposed by Tuljapurkar et al. (1995), this study provides new methods for, and insight into, China’s marriage market. Most existing studies focus on the overall trend of the marriage squeeze. Our index instead captures the age-specific marriage squeeze and can be used to predict the squeeze both nationally and separately for urban and rural areas. These distinctions matter for policy. For example, if men in rural areas bear the brunt of the marriage squeeze, policymakers could design interventions that improve the economic conditions of older unmarried men in rural areas.

Our analysis of the intensity of the squeeze in China’s marriage market from 2010 to 2050 allows us to draw three conclusions. First, China began to experience a significant marriage squeeze in 2011, and the intensity of the squeeze is predicted to reach two peaks, one in 2019 and one in 2045. More generally, the intensity of the marriage squeeze in China will remain at a high level from 2020 to 2050. In absolute terms, China’s future bachelor population will reach and remain at more than 40 million. This will be a long-term social phenomenon that will profoundly affect China’s marriage market and social formation.

Second, the marriage squeeze will be experienced unevenly, depending on age cohort. Our observed and predictive estimates reveal two peaks in the distribution of the marriage squeeze by age cohort. The first peak indicates that males around the age of 30 face a more severe marriage squeeze, which is caused by China’s special cohort population structure and abnormal SRB after the 1980s. The second peak indicates that men around the age of 50 are also facing a severe marriage squeeze, which is mainly due to the special cohort population structure caused by the second baby boom. Regardless of the number of people squeezed in the marriage market or the impact of that squeeze, researchers should focus on the population group corresponding to the first peak to formulate corresponding countermeasures to alleviate the male squeeze in the Chinese marriage market.

Finally, our analysis confirms that the marriage squeeze in rural areas will be much higher than that in urban areas. Prior research established that the rural–urban population transfer rates of the male and female populations were 0.713 and 0.783, respectively, in 1990, 1.093 and 1.257 in 2000, and 1.903 and 1.964 in 2010. As such, the transfer rate of the female population is higher than that of the male population. Importantly, the population that is migrating to urban areas is mainly between 15 and 35 years old. This suggests that urban men who are squeezed in the marriage market and cannot find a spouse can be matched with a rural woman. This promotes equilibrium in the gender structure of the urban marriage market while increasing the intensity of the marriage squeeze experienced by rural men. Our findings follow this logic. In 2019, the marriage squeeze index was 1.14 in urban areas and 1.23 in rural areas. In 2045, the predicted marriage squeeze index will remain at 1.14 in urban areas but will have risen to 1.28 in rural areas.

The marriage squeeze is shifting from urban to rural areas because surplus unmarried men in urban areas may search for spouses in rural areas, whereas unmarried women in rural areas might migrate to cities to find a suitable spouse. In addition, China began to implement its family planning policy in 1982, in which the “one-child” policy was implemented in urban areas and the “one-and-a-half child” policy was implemented in rural areas (that is, families in which the first child was a girl could have another child). In China, especially in rural areas, traditional concepts such as son preferences and bringing up sons to support their parents in their old age are deeply ingrained, leading to selective births, an abnormal SRB, and excessively high female infant mortality, which in turn have led to differences in the intensity of the male marriage squeeze between urban and rural areas.

Moreover, the steepening of the gradient in the traditional intermarriage circle has widened this difference in intensity: places in the same area have similar overall development levels but differ in their natural and economic conditions, such as mountains versus flat land, less arable land versus more arable land, mountainous areas versus suburbs, and suburbs versus towns. The natural conditions of flat land are better than those of the mountains, and places with more arable land have better economic conditions than places with less arable land. Towns and suburbs have more economic opportunities than mountainous areas. Therefore, in general, a woman from a mountainous area will marry into a family on the mountainside, a woman from a mountain will marry into a family in a suburb, and a woman from a suburb will marry into a family in the city. Marriageable-age females thus migrate from disadvantaged areas to more advantaged ones. This rural–urban population shift has exacerbated the urban–rural gap in the male marriage squeeze.

In addition, we found that the squeeze that urban men exert on rural men is, for the most part, not direct: it is concentrated among the older age groups and operates with a certain age delay. At the same time, the marriage squeeze in rural areas will also become stratified, with rich areas squeezing underdeveloped areas and the eventual formation of adverse environments such as “bare branch villages.”

Due to data limitations, it was not possible to test different theories to determine which best accounts for these trends. This is a direction for our future research. Another limitation of the current study is that it does not account for a growing pattern of life-long singlehood among women, especially in urban areas, in the context of the male marriage squeeze. If the share of never-married women in their late forties continues to rise, this phenomenon could further exacerbate the male marriage squeeze. Thus, it is important to estimate changes in the share of women who remain unmarried for life and shifts in first-marriage patterns to improve our model. In the future, researchers should consider combining the nuptiality table with the potential first marriage ratio to predict the marriage squeeze in the context of dynamic marriage patterns.

6 Conclusion

China’s male marriage squeeze stems from deep-rooted traditional culture and the country’s unique birth policies. The result is a large number of surplus men who cannot get married, which directly and indirectly creates risks to society. The concentration of bachelor groups at the bottom of society may lead to an increase in crime rates and other social problems. Moreover, the decline in the marriage rate brought about by the marriage squeeze will further translate into a decline in the fertility rate. A better understanding of how the male marriage squeeze will operate in the coming decades is critical to addressing the problem and averting a population crisis.

7 Appendix

The idea behind this adjustment is that we set a group of parameters that adjust the marriage squeeze index for urban males in year t to a certain value. For example, 1 means that there is no marriage squeeze among males of that age in period t compared with those in the baseline year.

We denote the adjusted rural male marriage squeeze index for males of age \(i\) in year \(t\) as \({MS}_{i}^{1**}\left(t\right)\):

$${MS}_{i}^{1**}\left(t\right)={MS}_{i}^{1*}\left(t\right)\bullet {\beta }_{i}\left(t\right),\quad i=15,\cdots ,60$$
(14)

where \({\beta }_{i}\left(t\right)\) represents the rural–urban adjustment factor for males of age \(i\) in year \(t\).

Then, we obtain the adjusted rural male marriage squeeze index, \({MS}^{1**}\left(t\right)\):

$${MS}^{1**}\left(t\right)=\sum_{i=15}^{60}\overline{{{MPR }_{i}}^{1}\left(0\right)}\bullet {MS}_{i}^{1**}\left(t\right)=\sum_{i=15}^{60}\overline{{{MPR }_{i}}^{1}\left(0\right)}\bullet {MS}_{i}^{1*}\left(t\right)\bullet {\beta }_{i}\left(t\right)$$
(15)

The total number of females available for a potential first marriage in urban areas can be expressed by \({\Delta }_{t}\), and we obtain:

$${MS}^{1**}\left(t\right)=\sum_{i=15}^{60}\overline{{{MPR }_{i}}^{1}\left(0\right)}\bullet {MS}_{i}^{1**}\left(t\right)=\sum_{i=15}^{60}\overline{{{MPR }_{i}}^{1}\left(0\right)}\cdot {\beta }_{i}\left(t\right)\bullet {MS}_{i}^{1*}\left(t\right)=\sum_{i=15}^{60}{\omega }_{i}\left(t\right)\bullet {MS}_{i}^{1*}\left(t\right)$$

where \({\omega }_{i}\left(t\right)=\overline{{{MPR }_{i}}^{1}\left(0\right)}\bullet {\beta }_{i}\left(t\right)\). Combined with Eq. (8), we obtain:

$$\sum_{i=15}^{60}{\omega }_{i}\left(t\right)\bullet {MS}_{i}^{1*}\left(t\right)=\frac{{\sum }_{i=15}^{60}\overline{{{MPR }_{i}}^{1}\left(0\right)}\bullet \left[{MP}_{i}^{1}\left(t\right)/{MP}_{i}^{1}\left(0\right)\right]}{\left\{{\sum }_{i=15}^{60}\left[{FP}_{i}^{1}\left(t\right)\bullet {FPR}_{i}^{1}\left(0\right)\right]+{\Delta }_{t}\right\}/{\sum }_{i=15}^{60}\left[{FP}_{i}^{1}\left(0\right)\bullet {FPR}_{i}^{1}\left(0\right)\right]}$$
(16)
$${\Delta }_{t}=\frac{{\sum }_{i=15}^{60}\overline{{{MPR }_{i}}^{1}\left(0\right)}\bullet \left[{MP}_{i}^{1}\left(t\right)/{MP}_{i}^{1}\left(0\right)\right]}{\sum_{i=15}^{60}{\omega }_{i}\left(t\right)\bullet {MS}_{i}^{1*}\left(t\right)}\bullet {FP}^{1}\left(0\right)-{FP}^{1}\left(t\right)$$
(17)

where \({FP}^{1}\left(t\right)={\sum }_{i=15}^{60}\left[{FP}_{i}^{1}\left(t\right)\bullet {FPR}_{i}^{1}\left(0\right)\right]\) denotes the total number of females available for a potential first marriage in rural areas in year \(t\) and \({FP}^{1}\left(0\right)={\sum }_{i=15}^{60}\left[{FP}_{i}^{1}\left(0\right)\bullet {FPR}_{i}^{1}\left(0\right)\right]\) denotes the total number of females available for a potential first marriage in rural areas in the baseline year.
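Equation (17) is Eq. (16) solved for \({\Delta }_{t}\). The following is a minimal numerical sketch of that rearrangement, assuming the age-specific inputs are supplied as equal-length sequences indexed by age; all function and argument names are hypothetical, and the inputs in any usage are toy values rather than the paper's data.

```python
from typing import Sequence


def weighted_sum(weights: Sequence[float], values: Sequence[float]) -> float:
    """Sum of weights[i] * values[i] over all ages i."""
    if len(weights) != len(values):
        raise ValueError("sequences must have the same length")
    return sum(w * v for w, v in zip(weights, values))


def female_pool(fp: Sequence[float], fpr_0: Sequence[float]) -> float:
    """FP^1 = sum_i FP_i^1 * FPR_i^1(0): females available for a potential first marriage."""
    return weighted_sum(fp, fpr_0)


def delta_t(
    mpr_bar: Sequence[float],   # baseline male weights, overline{MPR_i^1(0)}
    mp_t: Sequence[float],      # MP_i^1(t)
    mp_0: Sequence[float],      # MP_i^1(0)
    omega: Sequence[float],     # omega_i(t) = overline{MPR_i^1(0)} * beta_i(t)
    ms_star: Sequence[float],   # MS_i^{1*}(t)
    fp_pool_t: float,           # FP^1(t)
    fp_pool_0: float,           # FP^1(0)
) -> float:
    """Delta_t as in Eq. (17): numerator / target * FP^1(0) - FP^1(t)."""
    if len(mp_t) != len(mp_0):
        raise ValueError("mp_t and mp_0 must have the same length")
    growth = [m_t / m_0 for m_t, m_0 in zip(mp_t, mp_0)]
    numerator = weighted_sum(mpr_bar, growth)
    target = weighted_sum(omega, ms_star)  # left-hand side of Eq. (16)
    return numerator / target * fp_pool_0 - fp_pool_t
```

A useful sanity check is to substitute the returned value back into the right-hand side of Eq. (16) and confirm that it reproduces \(\sum_{i}{\omega }_{i}\left(t\right)\bullet {MS}_{i}^{1*}\left(t\right)\).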

Then, we obtain the adjusted rural male marriage squeeze index \({MS}^{1**}\left(t\right)\):

$${MS}^{1**}\left(t\right)=\sum_{i=15}^{60}\overline{{{MPR }_{i}}^{1}\left(0\right)}\bullet {\beta }_{i}\left(t\right)\bullet {MS}_{i}^{1*}\left(t\right)$$
(18)

If we assume that the adjustment coefficients for each cohort are the same, \({\beta }_{15}\left(t\right)=\cdots ={\beta }_{60}\left(t\right)=\beta \left(t\right)\), the above equation can be represented as \({MS}^{1**}\left(t\right)={MS}^{1*}\left(t\right)\bullet \beta \left(t\right)\).
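The simplification holds because a common factor can be pulled out of the weighted sum. A small numerical sketch, using toy weights and index values that are assumptions for illustration only (the function name is hypothetical):

```python
import math
from typing import Sequence


def adjusted_rural_index(
    mpr_bar: Sequence[float],  # overline{MPR_i^1(0)}
    beta: Sequence[float],     # beta_i(t)
    ms_star: Sequence[float],  # MS_i^{1*}(t)
) -> float:
    """Eq. (18): sum_i overline{MPR_i^1(0)} * beta_i(t) * MS_i^{1*}(t)."""
    if not (len(mpr_bar) == len(beta) == len(ms_star)):
        raise ValueError("sequences must have the same length")
    return sum(w * b * m for w, b, m in zip(mpr_bar, beta, ms_star))


# Toy inputs (not the paper's data): three age groups, one common beta.
weights = [0.2, 0.5, 0.3]
index_by_age = [1.1, 1.4, 1.2]
common_beta = 0.9

with_uniform_beta = adjusted_rural_index(weights, [common_beta] * 3, index_by_age)
unadjusted = adjusted_rural_index(weights, [1.0] * 3, index_by_age)

# With a uniform beta, the adjusted index is beta times the unadjusted one.
print(math.isclose(with_uniform_beta, common_beta * unadjusted))
```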

Then, the adjusted urban male marriage squeeze index, \({MS}^{2**}\left(t\right)\), is as follows:

$${MS}^{2**}\left(t\right)=\sum_{i=15}^{60}\overline{{{MPR }_{i}}^{2}\left(0\right)}\bullet {MS}_{i}^{2**}\left(t\right)$$
(19)
$${MS}_{i}^{2**}\left(t\right)=\frac{\left[{MP}_{i}^{2}\left(t\right)/{MP}_{i}^{2}\left(0\right)\right]}{\left\{{\sum }_{i=15}^{60}\left[{FP}_{i}^{2}\left(t\right)\bullet {FPR}_{i}^{2}\left(0\right)\right]-{\Delta }_{t}\right\}/{\sum }_{i=15}^{60}\left[{FP}_{i}^{2}\left(0\right)\bullet {FPR}_{i}^{2}\left(0\right)\right]}, i=15,\cdots ,60$$
(20)
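Equations (19) and (20) can be evaluated directly once \({\Delta }_{t}\) is known. The sketch below is one way to do so under the assumption that all age-specific inputs are equal-length sequences indexed by age; the function and argument names are hypothetical, and no values from the paper are used.

```python
from typing import Sequence


def adjusted_urban_index(
    mpr_bar: Sequence[float],  # overline{MPR_i^2(0)}
    mp_t: Sequence[float],     # MP_i^2(t)
    mp_0: Sequence[float],     # MP_i^2(0)
    fp_t: Sequence[float],     # FP_i^2(t)
    fp_0: Sequence[float],     # FP_i^2(0)
    fpr_0: Sequence[float],    # FPR_i^2(0)
    delta: float,              # Delta_t from Eq. (17)
) -> float:
    """Adjusted urban male marriage squeeze index, Eqs. (19) and (20)."""
    n = len(mpr_bar)
    if not all(len(seq) == n for seq in (mp_t, mp_0, fp_t, fp_0, fpr_0)):
        raise ValueError("all sequences must have the same length")
    pool_t = sum(f * r for f, r in zip(fp_t, fpr_0))
    pool_0 = sum(f * r for f, r in zip(fp_0, fpr_0))
    # Denominator of Eq. (20): Delta_t is subtracted from the urban female pool.
    denominator = (pool_t - delta) / pool_0
    ms_by_age = [(m_t / m_0) / denominator for m_t, m_0 in zip(mp_t, mp_0)]
    # Eq. (19): weight the age-specific indices by the baseline male weights.
    return sum(w * ms for w, ms in zip(mpr_bar, ms_by_age))
```

Note that \({\Delta }_{t}\) enters Eq. (16) with a plus sign and Eq. (20) with a minus sign, so for a positive \({\Delta }_{t}\) the adjustment lowers the rural index and raises the urban one relative to the unadjusted values.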