Introduction

In 2016, the total population of mainland China’s 31 provinces, autonomous regions and municipalities was 1.38 billion (National Bureau of Statistics of China 2016). The United Nations’ (2017) World Population Prospects showed that in 2015 China’s population accounted for 19.3% of the global population, 1.93 times the size of the population of Europe. Undoubtedly, China’s population plays a pivotal role in world population research. Thanks to improving living standards and medical conditions, China’s life expectancy at birth has improved significantly, rising from an average of 60 years during 1964−1982 to almost 70 years between 1990 and 2000 (Banister et al. 2004). The country’s mortality rate has been greatly reduced by its poverty alleviation program, economic development, disease prevention, treatment progress and the re-establishment of a nationwide medical care system (Zhao et al. 2016). However, as a country with a diverse population, significant disparities exist among provinces because of unequal social development and the attendant health care inequalities (Zhao 2006). Comprehensive research on these disparities and their causes is conducive to a deeper understanding of China’s demographic patterns and trends. Mortality analysis is also the theoretical foundation for such public-service policy issues as social security and long-term care. A precise analysis of mortality by province contributes to formulating appropriate systems for the provinces according to their various levels of social development, especially as China faces an emerging aging issue.

Province-specific mortality data in the population census powerfully supports the study of age patterns and levels of provincial mortality. Since the foundation of the People’s Republic of China, a total of six censuses have been conducted. While the first two censuses, of 1953 and 1964, were controversial regarding data transparency (Wu et al. 2015; Yu 1978), mortality as tabulated in the 1982 census provided high-quality data that could be viewed as reliable mortality indicators (C. Li 1985; Yu 1986). Regarding the 1990 census, Zhang et al. (1997) demonstrated the underreporting of deaths in infant and senior age groups leading to a downward bias in mortality estimates. A relatively serious undercounting of infant deaths was found in the 2000 census, again causing a lowered mortality estimate (S. Li et al. 2003). In the 2010 census, infant and old-age mortality data still suffer from serious underestimations (Gu et al. 2016; Hu et al. 2015; Huang et al. 2013), negatively impacting the effectiveness of province-specific mortality data.

Meanwhile, due to the implementation of the one-child policy as well as differences in social development, mortality underestimates vary from province to province. Most Chinese scholars’ research has focused on national studies (and corresponding corrections) concerning the quality of mortality data in the 2010 census, whereas few studies have paid attention to the provincial data. The main reason is that the abundant national-level data are more accessible and easier to compare with other countries. Besides the census and sample survey, few datasets are publicly accessible for provincial-level analysis, so few studies have systematically evaluated the quality of data on infant and old-age mortality from a province-based perspective.

One viable method of adjusting defective mortality data is the use of model life tables. Widely utilised, traditional model life tables include those contributed by Coale-Demeny (Coale et al. 1983) and the United Nations (1982), as well as Brass’s (1971) system. A common adjustment takes the model life table of Coale-Demeny or the United Nations as the standard of Brass’s system to re-estimate mortality accounting for deficiency. There is no apparent limitation in such a method; the problem lies in the traditional model life tables themselves. The Coale-Demeny model life table has such a strict screening standard that it leaves developing countries underrepresented, and the applicability of the United Nations life table to other populations is limited by the small number of empirical life tables in it. Additionally, the Brass system cannot capture age patterns for mortality when the observed mortality deviates from the standard (Murray et al. 2003). To tackle the disadvantages of traditional model life tables, two new types of two-parameter model life table have been developed: a modified logit system (Murray et al. 2003) and a flexible two-dimensional mortality model (Wilmoth et al. 2012), whose input parameters indicate child and adult deaths.

These two types of two-parameter model life tables overcome the defects of the traditional model life tables, embodying contemporary mortality experience and making it easier to capture age patterns. They also offer a more flexible way to estimate death rates at all ages, especially for countries without detailed census data. Based on these two new model life tables, the Developing Countries Mortality Database (DCMD) model life table (N. Li et al. 2018b; N. Li et al. 2018a) extends the inputs of the two-parameter model life tables to three, covering young-age, adult and old-age stages, therefore reflecting changes in mortality more comprehensively and especially improving the performance of estimating age-specific death rates for old age.

In this paper, we adjusted the infant and old-age mortality of China’s provincial 2010 census and assessed the underestimate. We used the DCMD model life table combined with the under-five mortality rate (U5MR) from the Institute for Health Metrics and Evaluation (IHME) and mortality data for adults and those aged 60−74 from the 2010 census. We made all calculations and figures using the R language (R Core Team 2019).

Data and method

The data we adopted include two parts. Firstly, the DCMD model life table’s input parameters include child mortality 5q0, adult mortality 45q15 and mortality of those aged 60−74, 15q60. The data required in order to obtain the corresponding life tables were China’s provincial 5q0, 45q15 and 15q60 for 2010. We will describe this aspect in further detail below. Secondly, to evaluate our modified values, we used international datasets that demonstrate the relationship of infant and old-age mortality to social development factors. Undoubtedly, social development influences mortality rates, so we adopted the Socio-Demographic Index (SDI). The SDI is calculated using the Human Development Index methodology, wherein a 0 to 1 index value is determined by the natural log of lag-distributed income (LDI) per capita, mean education over age 15 (EDU15+) and total fertility rate (TFR). LDI per capita is defined as GDP per capita, smoothed over the previous 10 years:

$${\text{LDI}}pc_{t} = \frac{1}{5.5}\left[ {{\text{GDP}}pc_{t} + \sum\limits_{i = 1}^{9} {{\text{GDP}}pc_{t - i} \cdot \left( {1 - \frac{i}{10}} \right)} } \right]$$

The minima and maxima of the scales for each input and the calculation of SDI appear in the publication of the Global Burden of Disease Study (GBD) 2016 Mortality Collaborators (2017). Higher SDI indicates better health outcomes, better well-being and lower mortality rates (Kemon 2017).
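The LDI smoothing above can be sketched in a few lines. This is a minimal illustration in Python, not the code used for the paper's calculations; the function name and the input layout (a list of annual GDP per capita values ordered oldest to newest) are assumptions made for the example.

```python
def lag_distributed_income(gdp_pc):
    """Return LDI per capita for the most recent year in `gdp_pc`.

    `gdp_pc` is assumed to hold annual GDP per capita values ordered from
    oldest to newest, with at least 10 entries (year t and the 9 before it).
    """
    if len(gdp_pc) < 10:
        raise ValueError("need at least 10 annual values")
    # recent[0] is year t, recent[i] is year t - i
    recent = list(gdp_pc)[-10:][::-1]
    # Weights 1.0, 0.9, ..., 0.1; they sum to 5.5, the normalising constant
    weights = [1 - i / 10 for i in range(10)]
    return sum(w * g for w, g in zip(weights, recent)) / 5.5
```

Because the weights sum to 5.5, a constant GDP per capita series returns the same constant, which is a quick sanity check on the normalisation.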

Data for evaluation

For evaluation, except where noted, international data on SDI and mortality rates came from the GBD 2016 SDI (GBD Collaborative Network 2017) and the GBD 2017 Life Tables (GBD Collaborative Network 2018) estimated by the IHME. In the SDI from GBD 2016, the GDP per capita series is produced using the James et al. (2012) method. The time series of TFR and EDU15+ are synthesised from point estimates from censuses using spatiotemporal Gaussian process regression (ST-GPR). The data types used in the GBD 2017 life tables include vital and sample registration systems, household surveys, censuses, disease surveillance systems and demographic surveillance systems. The completeness of these data sources was checked before further processing. The 5q0 and 45q15 were estimated from these sources using ST-GPR and were input into a logit model life table system to produce life tables by location, single calendar year, age and sex for the years 1950−2017. The procedure is detailed in the publications of the GBD 2017 Mortality Collaborators (2018) and GBD 2016 Mortality Collaborators (2017). All these original source inputs can be explored through the GBD’s Data Input Sources Tool (IHME 2017b, 2018b) and the data quality of original sources has been evaluated (Redford et al. 2018).

It is to be noted that this data set has some limitations. Firstly, the ST-GPR approach is a complex procedure, with some parameters of which we have limited understanding. The IHME (2017a, 2018a) has shared the codes, but researchers are unable to reproduce the estimates because some of the original data sources are not publicly accessible. Secondly, the details of the logit model life table system and its reference life table are quite limited. It is difficult to fully understand the appropriateness of the reference to the population in question.

Under-Five mortality

The provincial 5q0 we utilised came from China’s subnational Millennium Development Goal 4 (MDG 4) (IHME 2017a). The value is derived from censuses since 1982, the One per thousand Survey on Fertility and Birth Control, the One per thousand Annual Survey on Population Change, the 1% Survey on Population Change, the Maternal and Child Health Surveillance and the Disease Surveillance Point System. This dataset is checked by examination for completeness and fitted by Gaussian process regression and provides the province-specific 5q0 estimates for the period 1990−2013; details on the processing method are described by Y. Wang et al. (2016). So, it is plausible to consider that the quality of this dataset is superior to the census. However, this dataset only provides the 5q0 for the total population of both sexes, while our priority is to separate it into a sex-specific representation.

Chinese culture traditionally prefers sons. Discrimination against daughters varies with socio-economic conditions by province but is linked to higher female 1q0 in many provinces (Attané 2009). In this paper we adopted a compromise strategy to separate 5q0 by sex: the sex ratio of the processed sex-specific 5q0 is set identical to the ratio reported in the census. The assumption is that the error in 4q1 is minor compared to that in 1q0. The grounds for this are that, compared with other age groups, infant mortality is the most prone to underestimation, given social customs, technical issues and policy reasons. These factors may exist in other age groups, but their impact is not as significant as that in infants (Huang et al. 2013).

Thus, the sex-specific 5q0 can be estimated from non-sex-specific 5q0 in China subnational MDG4 by:

$$\begin{gathered} {}_{5}q_{0}^{M} = \frac{{}_{5}q_{0}}{w + \left( 1 - w \right)/S} \\ {}_{5}q_{0}^{F} = \frac{{}_{5}q_{0}^{M}}{S} \\ w = \frac{SRB}{1 + SRB} \end{gathered}$$

where SRB refers to the sex ratio at birth and S to the census sex ratio for 5q0. It should be stressed that although birth control departments fined families for violating the one-child policy, giving them incentives to hide real birth numbers, such error has minor impact on the accuracy of SRB (Banister 2004). And the Ministry of Public Security allowed people violating family planning policies to apply for household registration through the opportunities of the 2010 census, which should have encouraged families to report their children’s birth and addressed the problem of under-enumeration (Cai 2013). Therefore, the quality of census birth data is credible. Table 6 lists the final results. The values offered by China’s subnational MDG 4 are much higher than the reported values of the 2010 census.
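The sex split of 5q0 can be illustrated with a short sketch. It is written in Python for illustration only; the function name is hypothetical, and it assumes SRB is expressed as male births per female birth (for example 1.18 rather than 118) so that w is the male share of births.

```python
def split_5q0_by_sex(q_both, srb, s):
    """Split a both-sexes 5q0 into male and female values.

    q_both : 5q0 for both sexes combined
    srb    : sex ratio at birth, male births per female birth (assumed form)
    s      : census sex ratio of 5q0, male 5q0 divided by female 5q0
    """
    w = srb / (1 + srb)                     # male share of births
    q_male = q_both / (w + (1 - w) / s)     # male 5q0
    q_female = q_male / s                   # female 5q0 keeps the census ratio
    return q_male, q_female
```

By construction, the birth-weighted average of the two returned values reproduces the both-sexes 5q0, and their ratio equals the census ratio S.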

Table 1 Provincial Life Expectancy at birth and at age 60 in 2010

Adult mortality

Death undercounting occurs mainly in infant and old-age groups. The general reason is that respondents forget the accurate date of death, because the census is a retrospective survey. There are also specific reasons for infant and old-age deaths: under the one-child policy, families with more than one child often faced penalties so severe that a newborn’s birth and death would be deliberately concealed. Although unreported infant deaths and births coexist, the ratio of unreported deaths to reported deaths usually exceeds that of unreported births to reported births, leading to a downward bias in infant mortality values (Anthopolos et al. 2010). Additionally, some families would hide or delay reporting the death of their elderly parents and relatives to defraud the authorities regarding pensions and other social welfare provisions (He et al. 2018; Sanmenxia Social Endowment Insurance Center 2016). Due to the worship of ancestors and the taboos around death in Chinese culture, families may not be willing to talk about their elders’ deaths for fear of triggering the deceased’s displeasure, or of invoking bad luck (Hsu et al. 2009). Besides, the elderly are more likely than younger generations to adopt a nominal age up to two years greater than their chronological age (Booth et al. 2008). Such age overstatement would create a downward bias for old-age mortality estimates in the census (Preston et al. 1999).

Compared to infant and old-age cases, few motivations exist for concealing deaths in adult cohorts. Thus, the data of 45q15 in the census are relatively more accurate and stable than for infants and old ages. Adult mortality 45q15 for 31 provinces derived from China’s 2010 census was directly used as the second input parameter of the DCMD model life table.

Mortality between 60 and 74 years

Currently, because no vital statistics directly provide mortality rates in the 60−74-year-old age group 15q60, they have to be estimated in some other way. The original method of the DCMD model life table calculated the 15q60 through the census method (N. Li et al. 2018a) and the model provided by Wilmoth et al. (2012), which caused a problem in provincial research. The hypothesis of the census method is that the population aged 60 − 74 is closed to migration, a reasonable assumption for research on China’s total population because China has not traditionally seen a great deal of migration. But this assumption is unsuitable for provincial research for two reasons. Firstly, changes in the Household Registration (Hukou) System and diversification in family support and elderly care make it easier for people to migrate at old age (Tong et al. 2012). Secondly, inner-family demands such as taking care of grandchildren and benefits from migration like preferable lifestyles motivate the elders to migrate: older adults in underdeveloped provinces tend to migrate to more developed areas, following their children or seeking better social services (Dou et al. 2017). Considering the problems of underreported death in the population aged 60−74 combined with migration, we employed a general growth balance method open to migration (Bhat 2002) to correct provincial 15q60 from the 2010 census.

Bhat’s (2002) general growth balance method open to migration takes the influence of population migration into account and estimates the completeness of death registration in open populations at age x and over. The principle is that the growth rate of the population at age x and over equals the corresponding entry rate minus the death rate, plus the net migration rate, written as follows.

$$\frac{N\left( x \right)}{{N\left( {x + } \right)}} - r\left( {x + } \right) + \frac{{NM\left( {x + } \right)}}{{N\left( {x + } \right)}} = n + \frac{1}{C} \cdot \frac{{D^{*} \left( {x + } \right)}}{{N\left( {x + } \right)}}$$

Here, N(x) is the population at age x and N(x+) is the person-years lived at age x and over. Meanwhile, r(x+), NM(x+) and D*(x+) stand for the population’s growth rate, net migrating population and the number of registered deaths at age x and over. Further, C represents completeness of the mortality records and n stands for the rate of natural increase in the population. Finally, C and n do not vary with age x, so C can be inferred from the slope and n from the intercept of the growth balance equation above; we let K = 1/C be the adjustment factor.

The age group x+ to which we applied the method was open-ended, from 60 years of age upward. To correct province-specific 15q60, we utilised province-specific data from the 2010 census to fit the growth balance equation and took K as the average of the adjustment factor for the population at age 60 and over. The required data included population aged 60 and over in 2009 and 2010, death numbers during the same period, and average population and age-specific net migration. The census tabulation presented the deaths and average population from November 1, 2009 to October 31, 2010 and the population as of November 1, 2010. To calculate the population aged 60 and over on November 1, 2009, we combined the average population from November 1, 2009 to October 31, 2010 with the population on November 1, 2010. We also used data in five-year age groups in order to weaken the influence of age misreporting as much as possible. The criterion for judging that an individual belongs to the migrant group was whether the location of the individual at census time differed from their household registration. Because detailed migration data are not publicly accessible, we estimated approximate provincial age-specific net migration for the year before the census. This was done using “household registration- age- and sex-specific population registered outside the province” and “time-specific population registered outside the province” according to the census tabulation. For the national value, the influence of international migration is so small that age-specific net migration NM(x) is taken as 0.

We adopted group-mean (United Nations 1983) and orthogonal regression (Golub et al. 1980), suggested by Bhat (2002), to fit the growth balance equation, which addresses the issue of partial birth and death rates switching places in the equation. In so doing, we eliminated the data points for those aged 95 and over in most provinces because of their serious deviation from the line formed by the distribution of other points. If the adjustment factor K < 1 for a province, we set K = 1. Table 7 lists the province-specific results. For consistency, we re-estimated national 15q60 as the average of provincial 15q60 weighted by the population aged 60. The results were 0.368 for men and 0.226 for women, close to the values (0.362 for men and 0.217 for women) estimated by the original method of DCMD model life table (C. Li et al. 2018).
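To make the fitting step concrete, the following Python sketch fits a straight line by orthogonal regression, using the closed-form slope for a two-variable total least squares fit, and then applies the K ≥ 1 floor described above. It is an illustration under stated assumptions rather than the procedure actually run for the paper: the function names are hypothetical, each point is assumed to be one age cut-off x, and the group-mean variant is not shown.

```python
import math

def orthogonal_fit(x, y):
    """Fit y = intercept + slope * x by minimising perpendicular distances."""
    n_pts = len(x)
    mx = sum(x) / n_pts
    my = sum(y) / n_pts
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; slope is undefined")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - slope * mx, slope

def adjustment_factor(registered_death_rate, balance_lhs):
    """Return (n, K, C) from the growth balance points.

    registered_death_rate : D*(x+)/N(x+) for each age cut-off x
    balance_lhs           : N(x)/N(x+) - r(x+) + NM(x+)/N(x+) for the same x
    """
    n, k = orthogonal_fit(registered_death_rate, balance_lhs)
    k = max(k, 1.0)   # floor at 1, i.e. registration is never over-complete
    return n, k, 1.0 / k
```

Note that the intercept n returned here comes from the unconstrained fit even when K is floored at 1; whether to refit the intercept in that case is a design choice the sketch leaves open.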

Table 2 Province-specific 1q0 in 2010 (per thousand)

DCMD model life table

The DCMD model life table, or three-parameter model life table, augments the flexible two-dimensional mortality model (Wilmoth et al. 2012) to fit the observed old-age mortality, by adding a parameter to ax at age 60 and over, as follows.

$$\begin{gathered} \ln \left( {\hat{m}_{x} } \right) = \hat{a}_{x} + b_{x} \cdot \ln \left( {{}_{5}q_{0} } \right) + c_{x} \cdot \left[ {\ln \left( {{}_{5}q_{0} } \right)} \right]^{2} + v_{x} \cdot k \hfill \\ \hat{a}_{x} = \left\{ {\begin{array}{*{20}c} {a_{x} ,\;\;x < 60} \\ {a_{x} + \ln \left[ {\frac{{\ln \left( {1 - {}_{15}\hat{q}_{60} } \right)}}{{\ln \left( {1 - {}_{15}q_{60} } \right)}}} \right],\;\;x \ge 60} \\ \end{array} } \right. \hfill \\ \end{gathered}$$

Here, \({}_{15}\hat{q}_{60}\) stands for an observed value or data from other sources and 15q60 is calculated using the flexible two-dimensional mortality model. Coefficient vectors ax, bx, cx and vx are obtained by fitting death data from the Human Mortality Database (HMD, https://www.mortality.org/). The values of these coefficients can be seen in the paper of Wilmoth et al. (2012). The detailed method for augmenting the two-parameter model life table to the three-parameter can be seen in the work by N. Li (2014).
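The old-age adjustment in the model can be sketched as follows. This Python fragment is illustrative only: the function name is hypothetical, the coefficient containers are assumed to be dictionaries keyed by age, and the real coefficient values must be taken from Wilmoth et al. (2012), since none are reproduced here. The parameter k is treated as given.

```python
import math

def dcmd_log_mx(x, q5_0, k, q15_60_obs, q15_60_model, a, b, c, v):
    """Return ln(m_x) under the three-parameter model for age group x.

    q15_60_obs   : observed 15q60 (or a value from another source)
    q15_60_model : 15q60 implied by the two-parameter model
    a, b, c, v   : dicts mapping age x to the model coefficients
                   (placeholders here; see Wilmoth et al. 2012)
    """
    ln_q = math.log(q5_0)
    a_hat = a[x]
    if x >= 60:
        # Shift a_x so that the model reproduces the observed 15q60
        a_hat += math.log(math.log(1 - q15_60_obs) / math.log(1 - q15_60_model))
    return a_hat + b[x] * ln_q + c[x] * ln_q ** 2 + v[x] * k
```

When the observed and model 15q60 coincide, the shift is zero and the expression reduces to the two-parameter model; an observed value above the model value raises the death rates at ages 60 and over by the same amount on the log scale.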

Recompiling provincial life tables

When recompiling provincial life tables, the 1m0 and 5mx (x = 60, 65, …, 85) calculated from the DCMD model life table were adopted in place of the original data tabulated in the census. We utilised the exponential three-moving average formula

$$\ln \left( {{}_{5}\overline{m}_{x} } \right) = \frac{1}{3}\left[ {\ln \left( {{}_{5}m_{x - 5} } \right) + \ln \left( {{}_{5}m_{x} } \right) + \ln \left( {{}_{5}m_{x + 5} } \right)} \right],\;\;x{ = }15,20,...,85$$

to smooth the adjusted death rate curve, in order to eliminate the fluctuation. For the calculation of nax, we utilised Andreev et al. (2015) formulae for computing 1a0, the formulae of Coale et al. (1983) for 4a1 and Greville’s (1977) formula for 5ax.
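The smoothing formula above is a three-point moving average on the log scale, which is the geometric mean of three neighbouring death rates. A minimal Python sketch is given below; the function name and the dictionary layout (age mapped to 5mx) are assumptions made for the example, and the input must include the neighbouring groups at ages 10 and 90.

```python
import math

def smooth_log_moving_average(m, ages=range(15, 90, 5)):
    """Smooth five-year death rates with a three-point log-scale average.

    m    : dict mapping the starting age of each five-year group to 5mx
    ages : age groups to smooth (default 15, 20, ..., 85)
    Values outside `ages` are returned unchanged.
    """
    smoothed = dict(m)
    for x in ages:
        # Always average the original (unsmoothed) neighbours
        log_mean = (math.log(m[x - 5]) + math.log(m[x]) + math.log(m[x + 5])) / 3
        smoothed[x] = math.exp(log_mean)
    return smoothed
```

A death-rate schedule that is exactly log-linear in age is left unchanged by this filter, whereas a curved segment such as the bottom of the U-shape at young ages is pulled upward.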

The comparisons between our adjusted results and census data are displayed in Fig. 1a and Fig. 1b, from which it is clear that the adjusted age-specific mortality rates are higher than those from the census at infant and old ages. Intuitively, age-specific mortality levels of eastern provinces or municipalities like Zhejiang and Tianjin are lower than those of western provinces such as Gansu and Qinghai. Particularly, age-specific mortality rates of ethnic minorities in autonomous provinces like Ningxia or rates for provinces like Yunnan, where ethnic minorities live, tend to show higher levels.

Fig. 1

(a) Comparison between adjusted and census age-specific mortality rates: male. (b) Comparison between adjusted and census age-specific mortality rates: female. Source: Authors’ calculations and China’s 2010 provincial census (the same below)

We also compared our adjusted national age-specific mortality rates with the results of Banister et al. (2004), as Fig. 2 shows. Overall, age-specific mortality rates continued to decline for both sexes over time, and in 2009−2010 the accident hump around age 20 that was visible in earlier periods seemed to disappear. The most rapid decline in mortality rates between 2000 and 2010 occurred below age 5, with an average annual percentage decline of six percent for boys and nine percent for girls. The average annual percentage decline in 45q15 was three percent for men and six percent for women, while old-age mortality 30q60 declined sluggishly, with an annual percentage decline of less than one percent for both sexes.

Fig. 2

Age-specific mortality rates, China: 1964−2010. Source: Banister et al. (2004) and authors’ calculations

We restricted the smoothing to age groups in the range of 15−85 for two reasons. Firstly, since the death-rate curve for those aged 0−15 presents a U-shape, the moving average would abnormally increase the values at the U-shape’s bottom. Another potential problem is the accident hump around age 20. According to our observation in Fig. 1a and b, however, the effect of the moving average on the accident hump is rather small and will not have a significant impact on the final results. Secondly, there is debate about the significant decline in age-specific death rates at 90 years and over from China’s census. Coale et al. (1986) considered that low mortality at advanced ages is caused by age overstatement, which is obvious in the 1982 census (Coale et al. 1991). However, Zeng et al. (2003) considered that this phenomenon may be caused by mortality selection in the heterogeneous Chinese population, namely because the mortality of strong individuals who survive to advanced ages is much lower than normal. Discussion of the causes of such issues is beyond our scope. Whether underreporting triggers the reduction of age-specific mortality rates in these age groups requires further study, so we restricted the corrected old-age mortality to the ages of 60 to 89.

Table 1 lists the life expectancies at birth and at age 60, calculated by adjusted age-specific mortality rates. The adjusted national e0 decreased by 2.51 years for boys and 2.02 years for girls and e60 by 1.94 years for men and 1.33 years for women, as compared to the census report. Additionally, our adjusted e0 and e60 approximate the values estimated by the life tables of GBD 2017 (e0: 73.25 years for boys, 78.65 years for girls; e60: 18.79 for men, 22.20 for women). The largest decrease in e0 occurred in Tibet (5.89 years for boys and 6.22 years for girls) and the largest e60 decline for men was in Qinghai (5.31 years), while for women the largest decline was in Tibet (5.22 years).

Modification for mortality data

According to the 2010 census, underreported death in infants and in those of old age varies by province due to different social situations. Accordingly, we utilised the results calculated by the DCMD model life table to modify and assess infant and old-age mortality data by province across China. We made use of the relationship between SDI and mortality rates, fitted by Loess regression, to evaluate our corrected results. The relationship depends on SDI values and does not vary with time, so we adopted the root-mean-squared relative deviation (RMSRD) as an evaluation metric for the difference between the two values, as follows.

$$RMSRD = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {\frac{{q_{i}^{m} }}{{q_{i}^{e} }} - 1} \right)^{2} } }$$

Here, i is the ith estimate and n represents the total number of estimates. The qm stands for modified or census mortality and qe for expected mortality predicted by provincial SDI and the regression. So, the smaller RMSRD is, the more accurate the values are.
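The RMSRD metric defined above is straightforward to compute. The Python sketch below is illustrative; the function name is hypothetical, and the expected values are assumed to have been obtained beforehand from the fitted SDI curve (the Loess step is not shown).

```python
import math

def rmsrd(modified, expected):
    """Root-mean-squared relative deviation between two mortality series.

    modified : modified or census mortality values, one per estimate
    expected : mortality values predicted from SDI, in the same order
    """
    if len(modified) != len(expected) or len(modified) == 0:
        raise ValueError("series must be non-empty and of equal length")
    total = sum((qm / qe - 1) ** 2 for qm, qe in zip(modified, expected))
    return math.sqrt(total / len(modified))
```

For instance, two estimates that deviate from their expected values by +10% and −10% give an RMSRD of 0.1.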

We also tested our estimates by comparing them with values derived from the models of Gompertz (1825), \(\mu(x) = a e^{bx}\); Kannisto (1992) and Himes et al. (1994), \(\mu(x) = a e^{bx}/(1 + a e^{bx})\); and Logistic (Perks 1932), \(\mu(x) = c + a e^{bx}/(1 + d e^{bx})\). The models were based on single-year age-specific mortality rates extended from five-year age groups. The extension uses the piecewise cubic Hermite interpolating polynomial with iterations, suggested and detailed by United Nations (2013). The model parameters were fitted by maximising the log-likelihood function, as follows.

$$L = \sum\limits_{x}^{{}} {\left\{ {D\left( x \right)\ln q\left( x \right) + \left[ {N\left( x \right) - D\left( x \right)} \right]\ln \left[ {1 - q\left( x \right)} \right]} \right\}}$$

Here, \(q\left( x \right) = 1 - \exp \left( { - \int_{x}^{x + 1} {\mu \left( t \right)dt} } \right)\) is the probability for a person at age x to die before age x + 1, N(x) is the number of persons who live to age x and D(x) is the number of people who will die between age x and age x + 1. Given possible inaccuracy of age-specific mortality for ages 60 and over, we did not directly fit the models to the observed data. We made a preliminary modification through the method of Gu et al. (2016). Figure 1a and b show that the observed death rates for ages 60−69 deviated less from our adjustment, so we assumed that these mortality rates were relatively accurate. We then applied the Gompertz model to the observed data for ages 60−84 and we extrapolated to age 89, finally applying the three models to these rates for ages 60−89.
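For the Gompertz case the integral in q(x) has a closed form, \(q(x) = 1 - \exp\left[-(a/b)\,e^{bx}(e^{b} - 1)\right]\), so the log-likelihood can be evaluated directly. The Python sketch below shows only this evaluation; the function names are hypothetical, the optimiser used to maximise L is not shown, and the Kannisto and Logistic models would need their own integrals.

```python
import math

def gompertz_q(x, a, b):
    """Probability of dying between ages x and x + 1 under mu(t) = a*exp(b*t)."""
    cumulative_hazard = (a / b) * math.exp(b * x) * (math.exp(b) - 1)
    return 1 - math.exp(-cumulative_hazard)

def log_likelihood(ages, survivors, deaths, a, b):
    """Binomial log-likelihood L for single-year ages.

    survivors : N(x), persons alive at exact age x
    deaths    : D(x), deaths between ages x and x + 1
    """
    total = 0.0
    for x, n_x, d_x in zip(ages, survivors, deaths):
        q = gompertz_q(x, a, b)
        total += d_x * math.log(q) + (n_x - d_x) * math.log(1 - q)
    return total
```

In practice the parameters a and b would be chosen by passing the negative of this function to a numerical optimiser; the sketch stops short of that step.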

Modification for infant mortality

Table 2 lists provincial 1q0 adjusted by the DCMD model life table. Obviously, province-modified 1q0 rose significantly compared to reported values from the 2010 census. The largest increase was in Henan, showing rates about 17 times higher than reported values. After adjustment, Beijing had the lowest 1q0—4.96 per thousand for boys and 4.03 per thousand for girls. The highest 1q0 appeared in Tibet, at 38.53 per thousand for boys and 37.21 per thousand for girls. The modified national 1q0 of 13.55 per thousand for boys and 12.96 per thousand for girls showed values about three times higher than the census data. For data consistency, the national 1q0 weight-averaged by province-specific adjusted 1q0 and the population at birth, was 13.49 per thousand for boys and 12.55 per thousand for girls, which is very close to the directly modified value mentioned above. Table 3 lists the comparisons between our modified national IMR and the IMRs published by the National Health and Family Planning Commission of China (2017). Comparisons with the United Nations Inter-Agency Group for Child Mortality Estimates (UN IGME), IHME, the WHO and the United States Census Bureau (https://www.census.gov/) illustrate that our results approximate the values published by these organisations.

Table 3 China’s 1q0 estimated by different sources in 2010 (per thousand)

Figure 3 contrasts the modified IMR, the census-reported IMR and the fitted line of IMR, showing that the adjusted IMR is closer to the fitted line than the census IMR. Our results also show a distribution similar to the results given by Huang et al. (2013) and Hu et al. (2015). In Table 4, the RMSRD of our modified IMR is approximately half that of the census report and 76−83% of the values obtained by Hu et al. (2015), while the results of Huang et al. (2013) performed the best.

Fig. 3

Correlation between SDI and IMR. Source: GBD Collaborative Network (2017, 2018), Huang et al. (2013) and Hu et al. (2015) and Authors’ calculations, China’s 2010 provincial census

Table 4 RMSRDs between IMR estimates from different scholars

Modification for old-age mortality

Table 5 lists the modified 30q60 of China’s 31 provinces by DCMD model life table. Different from 1q0, the increase of modified 30q60 compared to census-reported values is small and the largest increase in men was in Xinjiang, rising by a factor of 1.20, while for women it was in Heilongjiang, 1.29 times higher. After modification, Hainan has the lowest 30q60, at 0.792 for men and 0.646 for women; Tibet has the highest 30q60, at 0.985 for men and 0.949 for women. The modified national 30q60 is 0.911 for male and 0.815 for female, double the census-reported values and almost remaining identical with IHME estimates of 0.911 for men and 0.811 for women. To validate consistency, the national 30q60, weight-averaged by province-specific adjusted 30q60 and population at age 60, is 0.904 for male and 0.804 for female, close to the national modified value listed in Table 5.

Table 5 Province-specific 30q60 in 2010

The scatter diagram of SDI against 30q60 in Fig. 4 indicates that the modified 30q60 points are distributed more evenly on both sides of the fitted line than the census-reported 30q60 points. The RMSRD of the modified 30q60 is 0.0500 for men and 0.08179 for women, while that of the census-reported 30q60 is 0.0670 for men and 0.1056 for women.

Fig. 4

Correlation between SDI and 30q60. Source: Authors’ calculations and China’s 2010 provincial census

Analysis for underestimate

According to our adjustments, the underestimation of male infant mortality is more severe than female, while the underestimate of old-age mortality shows an opposite trend. For infant mortality, China is dominated by a patriarchal system endowing the son in a family with special meaning like filial obligations and the continuity of the family line. Not having a son is viewed as a curse in life and it is a case of shame and blame for not being able to bear a male descendent for the extended family (Chan et al. 2002). Under such circumstances, stigma may lead some parents to not report their son’s death, causing more severe mortality underestimation for male infants than for female. However, the old-age mortality with higher female underestimation seems to be difficult to explain due to little research on such issues, but one possible reason is the age overstatement. Chinese culture is deeply influenced by Confucianism, which highly reveres authority and age (Chou et al. 2013). This tradition may prompt individuals to overstate their age. Plus, Chinese birthdays must be celebrated before or on the actual birth date and women have more birthday taboos than men (Mack 2020). For example, Chinese women do not celebrate turning age 30, 33, or 66, while men generally skip their 40th birthdays. These factors would make age overstatement more common in elderly women, causing higher underestimation rates of female old-age mortality in the census.

Figures 5 and 6 display the underestimation rates by province for infant and old-age mortality, and Fig. 7 compares them among seven main regions (Footnote 3). According to Fig. 7, the southwestern region has the least serious underestimation of infant mortality, at 56.93% for boys and 50.43% for girls, below the national values (72.55% for boys and 69.87% for girls). The rates are higher in the eastern region (72.79% for boys and 70.25% for girls) and the northwestern region (73.33% for boys and 71.78% for girls). Old-age mortality underestimation in eastern, south-central and northern China is less severe than in the southwest, northwest and northeast.
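
For readers who want to relate the underestimation rates to the increase factors quoted for 30q60, the following Python sketch assumes that the underestimation rate is the share of the adjusted value missing from the census-reported value, that is, (adjusted − reported) / adjusted. This definition is an assumption based on the rates staying below 100%, and both function names are hypothetical.

```python
def underestimation_rate(reported, adjusted):
    """Share of the adjusted probability that is missing from the reported value."""
    return (adjusted - reported) / adjusted


def rate_from_factor(factor):
    """Underestimation rate implied by an increase factor (adjusted / reported)."""
    return 1.0 - 1.0 / factor


# Increase factors quoted in the text for 30q60: 1.20 (men, Xinjiang)
# and 1.29 (women, Heilongjiang).
rate_men = rate_from_factor(1.20)
rate_women = rate_from_factor(1.29)
print(f"{rate_men:.1%}", f"{rate_women:.1%}")
```

Under this assumed definition, an increase factor of 1.20 corresponds to an underestimation rate of about one sixth.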

Fig. 5 Provincial infant mortality underestimation rates in 2010 census. Source: Authors’ calculations

Fig. 6 Provincial old-age mortality underestimation rates in 2010 census. Source: Authors’ calculations

Fig. 7 Regional infant and old-age mortality underestimation rates and proportion of ethnic minorities in 2010. Source: Authors’ calculations and China’s 2010 Census

One plausible interpretation of this pattern is that the quality of medical staff and of the vital registration system in eastern areas is superior to that in western areas (Yang et al. 2005). For example, the recording of deaths of live-born infants as stillbirths (L. Wang et al. 2011) could be avoided. Additionally, in contrast to the southwestern and northwestern regions, a large proportion of the population in the eastern, south-central and northern regions is Han Chinese (see Fig. 7 for the proportion of ethnic minorities). Because ethnic minorities face no (or few) restrictions under the one-child policy (the National People’s Congress 2002), they are less likely to conceal or falsify declarations out of fear of penalties, leading to lower underestimation rates for infant mortality in these areas. However, the same ethnic composition is associated with more serious underestimation of old-age mortality in the southwestern and northwestern regions. The reason is that age misstatement is practically universal in some ethnic minorities such as the Uyghur, causing serious distortion of age-specific death rates in the census, while age enumeration among Han Chinese is relatively accurate because they often use the Chinese zodiac or lunar calendar (Coale et al. 1991; Gu et al. 2008; Zeng et al. 2008; Zeng et al. 2003).

Furthermore, we compared our adjusted results with the estimates derived from the Gompertz, Kannisto and Logistic models; see Fig. 8. In most cases, the 30q60 values estimated by these mortality models were almost identical to each other, but lower than our adjusted values. Nevertheless, for men in some provinces, such as Gansu and Jiangxi, and for women in Henan, Hubei and Sichuan, the model and adjusted values matched each other well. This occurs because the model estimates were derived from observed age-specific mortality rates for ages 60−69. According to Fig. 1a and b, the observed age-specific mortality rates for ages 60−69 are still lower than the adjusted rates, and the larger this difference, the lower the model-estimated 30q60 relative to the adjusted 30q60. For provinces like Gansu, where the model-estimated 30q60 is close to the adjusted 30q60, the difference between the observed age-specific mortality rates and our adjustment is negligible for this age group. In addition, all model estimates based on observed age-specific mortality rates for ages 60−69 are higher than the census 30q60, indicating that systematic underreporting of deaths is potentially frequent at advanced ages.

Fig. 8 Adjusted, observed 30q60 and the model fittings. Source: Authors’ calculations and China’s 2010 provincial census
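
The model comparison above relies on fitting a mortality law to rates at ages 60−69 and extrapolating it to older ages. The Python sketch below illustrates the idea for the Gompertz model only. It assumes single-year death rates, an ordinary least-squares fit on the log scale and a hazard that is constant within each year of age; none of these choices is confirmed by the text, and the Kannisto and Logistic models are not covered. All names and the synthetic rates are hypothetical.

```python
import math


def fit_gompertz(ages, rates):
    """Least-squares fit of ln m(x) = ln(a) + b * x; returns (a, b)."""
    n = len(ages)
    log_rates = [math.log(m) for m in rates]
    x_bar = sum(ages) / n
    y_bar = sum(log_rates) / n
    sxx = sum((x - x_bar) ** 2 for x in ages)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, log_rates))
    b = sxy / sxx
    a = math.exp(y_bar - b * x_bar)
    return a, b


def q_30_60(a, b):
    """30q60 from a Gompertz hazard, evaluated at the midpoint of each age 60..89."""
    cumulative_hazard = sum(a * math.exp(b * (x + 0.5)) for x in range(60, 90))
    return 1.0 - math.exp(-cumulative_hazard)


# Synthetic death rates for ages 60-69 (mid-year ages), generated from known
# Gompertz parameters purely for illustration; these are not census data.
true_a, true_b = 2e-5, 0.1
fit_ages = [x + 0.5 for x in range(60, 70)]
fit_rates = [true_a * math.exp(true_b * x) for x in fit_ages]

a_hat, b_hat = fit_gompertz(fit_ages, fit_rates)
q_hat = q_30_60(a_hat, b_hat)
print(round(b_hat, 4), round(q_hat, 3))
```

Because the fit uses only rates at ages 60−69, any underreporting in that age range carries over to the extrapolated 30q60, which is consistent with the model estimates falling below the adjusted values.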

Conclusion

In this paper, we adopted the DCMD model life table to correct infant and old-age mortality rates among China’s provinces and to assess the corresponding underestimates. Our results reveal that most provinces had underestimation rates of at least 70% in infant mortality as reported in the 2010 census. The provinces with the best IMR data in this respect (i.e., most accurate and least underreported) were Guizhou and Yunnan, whose underestimation rates were less than 40% for males and 30% for females. The worst case was Henan, with underestimation rates over 90%. Old-age mortality underestimation rates also varied widely among the provinces, ranging from more than 15% to approximately zero. The province with the best data quality for men in the census was Guangdong, with an underestimation rate of less than 1%, while those with reliable data for women were Fujian, Shanghai and Zhejiang, where underestimation is negligible. Xinjiang had the worst performance, with an underestimation rate above 20%.

Furthermore, based on our adjustments, we re-calculated life expectancy at birth and at age 60. The results show that in 2010 the e0 gap between provinces ranged from nearly 0 (Zhejiang vs. Jiangsu) to 14.8 years (Shanghai vs. Tibet) for males and from 0 (Fujian vs. Guangxi) to 15.9 years (Shanghai vs. Tibet) for females. The e60 gap between provinces ranged from 0 (Jilin vs. Hebei) to 8.8 years (Hainan vs. Tibet) for men and from 0 (Henan vs. Tianjin) to 9.9 years (Hainan vs. Tibet) for women. The sex gap in e0 at the national level was 5.3 years; the biggest gap, 6.8 years, was in Guangxi, followed by a 6.7-year gap in Anhui, while Tibet had the smallest gap, at 3.5 years. Sex differences in life expectancy were larger than those implied by the reported values of the 2010 census. The sex gap in e60 varied from 2.2 years in Jilin to 5.2 years in Anhui, and Fujian, Hubei, Jiangxi and Qinghai also showed relatively large values, all above 4.5 years.

We recognise the limitations in interpreting our findings. Firstly, the DCMD model life table is an augmented version of the model provided by Wilmoth et al. (2012), so we essentially used the experience of HMD countries and regions to re-estimate China’s province-level infant and old-age mortality rates. Although C. Li et al. (2018) have preliminarily validated the DCMD model life table’s feasibility at China’s national level, and although our re-estimated results are reasonable compared with other sources, further validation for province-level analysis is necessary and encouraged. Secondly, our adjustments for provincial 1q0 rely on China’s subnational MDG 4 estimates from the IHME, so any uncertainty or limitation in the IHME’s methods would affect our results. Different data sources and processing procedures can produce discrepant estimates of the same mortality indicators, such as the under-5 mortality rates and deaths estimated by the IHME and the UN IGME (Alkema et al. 2012). Therefore, our results could change significantly if we used different datasets.

Despite the limitations above, the approach in this paper enables us to adjust provincial infant and old-age mortality rates using provincial age-specific data with relatively few assumptions and, to some extent, to evaluate the effects of migration more fully than previous studies (Gu et al. 2016; Hu et al. 2015; Huang et al. 2013). Such advantages make our results more exact and generalisable. Instead of focusing on infant mortality or old-age mortality alone, we investigated both underestimates comprehensively at the province level and elucidated a detailed geospatial distribution of underestimation rates.

Furthermore, our findings have important implications. Because underestimation rates vary by province, we have contributed a reasonable basis for adjustment factors, which can be used in future analyses of mortality in China’s provinces to reduce the deviations inherent in the direct use of original census data. The results also provide a practicable reference for evaluating the mortality data of the 2020 census in China. From an international perspective, the quality of mortality data in China’s censuses has attracted the attention of an increasing number of scholars (Missov et al. 2019). This paper’s discussion of mortality underestimates at the provincial level and its investigation of regional disparity enrich the international literature on such topics, particularly on regional mortality in China. Moreover, our research provides a methodological reference for substantive analyses assessing the accuracy of infant and old-age mortality as found in censuses or other surveys.

Mortality disparity is significant for public health systems, whose purpose is to safeguard social communities’ well-being and to reduce their inequities. The measurement of mortality and morbidity is the foundation for customising public health policies to fulfill local demands. For example, if an area is vulnerable to infectious disease, its public health policies will focus on preventive measures instead of other measures. Infant mortality is a particularly important indicator of population health (Reidpath et al. 2003); given the emerging issue of ageing populations, the accuracy of estimated mortality disparity will also have important implications for effective social security. The 31 provinces of mainland China remain at different stages of development, with varying infant and old-age mortality levels; the census is a good way to investigate such disparities, but underestimated mortality negatively impacts the utility of census data. Appropriate adjustments to defective provincial mortality data are therefore conducive to formulating proper public health policies and realising equity in social welfare systems.