Introduction

In recent years, many countries around the world (for example, China, the United States, Kenya, Ghana, Colombia, Mexico, Thailand, Vietnam, various former members of the Soviet Union, and India) have been trying to expand their public health insurance coverage (Wagstaff et al. 2009; Yip and Hsiao 2009; Kowalski 2010; Duarte 2012). Two fundamental concerns for policymakers on the shaping of their future health care systems are equity in health care utilization and control of health care costs. To achieve either goal, policymakers need a better understanding of the underlying determinants of individual health care expenditure.

In the past several decades, there has been a growing empirical literature on the factors that determine health care expenditures, which include prices of health care and individual characteristics such as age, sex, education, health status, income, and socioeconomic status (e.g., Musgrove 1983; Manning et al. 1987; Wedig 1988; Kenkel 1990; Hitiris and Posnett 1992; Chiappori et al. 1998; DiMatteo and DiMatteo 1998; Eichner 1998; DiMatteo 2003; Vera-Hernandez 2003; Mocan et al. 2004; Kaestner and Dave 2006; Kowalski 2010; Duarte 2012). One limitation of this literature is that most of the evidence collected is based on a mean regression approach (Footnote 1). The observed average effect may hide many complex behaviors. In fact, the role of determinant factors at different points of the medical expenditure distribution may have important implications for equity in health care utilization and cost control.

In this paper, we apply a quantile regression method to quantify the heterogeneous impacts of various factors at different segments of the medical expenditure distribution in China. Compared with the average effects, we find that health care expenditures at the upper end of the distribution are more strongly influenced by need factors such as health status, and less strongly influenced by socioeconomic factors and insurance status. Conversely, health care expenditures at the lower end of the distribution are more strongly influenced by socioeconomic factors and insurance status, and less strongly influenced by need factors.

Our results may provide useful information to policymakers for the optimal design of their health care systems. For example, as emphasized in Cutler and Zeckhauser (2000), an important difference between real world and optimal insurance policies is that the former almost invariably have a constant coinsurance rate, whereas the latter do not. Several studies show that, from the perspective of the tradeoff between risk reduction and moral hazard, a non-linear copayment schedule can be substantially superior (e.g., Blomqvist 1997; Ellis and Manning 2007; Felder 2008). Our results suggest that a regressive co-payment schedule might be preferred because of another concern: equity in health care utilization. Furthermore, this study may be of particular interest to health policymakers in China, which is still in a period of reshaping its health-care system.

The paper proceeds as follows. The “Background” section discusses the background of our study. The “Data and model” section describes the data and the model. Because endogeneity problems may pose significant challenges to our conclusions, we also give a detailed discussion of the potential selection issues in that section. In the “Empirical results” section, we report the econometric results. Finally, the “Conclusion” section draws the conclusions.

Background

China is still in a period of reshaping its health care system. Prior to the reform, health care in China had been financed primarily through three major public programs: the Cooperative Medical System (CMS), the Government Insurance Scheme (GIS), and the Labor Insurance Scheme (LIS). The CMS was established in rural areas; it was generally funded by contributions from participants and heavily subsidized by the collective welfare funds. The GIS was primarily for government employees, veterans, educators, and college students, whereas the LIS was for workers of all state-owned, and some non-state-owned, enterprises, and was financially supported by the welfare funds of enterprises. Together, these three types of insurance plan provided near-universal coverage to the Chinese population from the 1950s to the end of the 1970s (Zhong 2011).

China has been undergoing economic reform since the end of the 1970s. The communes disappeared after the introduction of the Household Responsibility System in the rural economy. Consequently, the CMS collapsed and its coverage fell to less than 5 % of the population by the early 1990s (Zhao 2006). Urban reform was initiated in the mid-1980s, and since then state-owned enterprises (SOEs) have been given substantial financial autonomy. The 1990s was a decade marked by profound shifts in industrial and enterprise structures in urban China. Since the early 1990s, non-state enterprises (including private, foreign, joint-venture, and mixed ownership) have emerged as important players in the Chinese economy. The proportion of the labor force employed in the state sector has fallen continuously. Meanwhile, the SOEs have performed poorly due to soft budget constraints and many other institutional problems. Radical reforms of the SOEs were introduced in the middle and late 1990s. To vitalize the Chinese economy, a wave of ownership restructuring occurred in which many loss-making SOEs went bankrupt, were merged with other enterprises, were transformed into joint-stock companies, or were privatized. As a result, large numbers of SOE workers were laid off. Because LIS schemes pool risks at the firm rather than the national level, many SOE workers who kept their jobs found their employers unable to maintain the plans because of the firm’s poor economic performance. As a result, health insurance coverage declined considerably in urban areas. By 1998, nearly 50 % of the urban population lacked insurance coverage (Wagstaff and Lindelow 2008). Meanwhile, both the absolute and relative costs of health care have risen sharply due to providers’ profit-seeking behavior and the resultant waste and inefficiency (Yip and Hsiao 2008).

In response to public discontent with unaffordable health care services, the Chinese government has initiated several public health insurance programs since 1998, each intended to cover a different segment of the population. The New Cooperative Medical Scheme (NCMS) was introduced in the rural areas. The new scheme intended to cover all formal-sector workers in urban areas is known as Urban Employee Basic Medical Insurance (UEBMI). Compared with the old urban health insurance systems (LIS and GIS), the UEBMI expanded coverage to private sector employees and pools its risk at the municipal level, which provides more stable financing. The final goal of UEBMI is to replace LIS and GIS in the cities. The fundamental rules of UEBMI are designed at the provincial level, and the scheme is managed by each city’s Social Insurance Bureau (SIB). Local SIBs have a certain degree of discretion regarding the choices of deductibles, services covered, reimbursement methods, and methods of payment to providers. UEBMI coverage has expanded continuously since 1998, reaching over 180 million urban enrollees by 2007 (Lin et al. 2009). The GIS is the most generous public health insurance available in China. Although the final goal of the health-care reforms is to gradually subsume LIS and GIS into UEBMI, the real power holders have incentives to protect their interests collectively. Consequently, the reform of GIS has been slow, and the scheme was still in existence in most urban areas until very recently. The percentage of the urban population covered by private and ‘other’ insurance plans has increased significantly during this period (Wagstaff and Lindelow 2008). Despite the existence of these insurance schemes, a significant proportion of urban residents without formal employment remained outside the health-care insurance system in 2007 (Lin et al. 2009).

Data and model

Data

The data used in this study come from the Rural–Urban Migration in China and Indonesia (RUMiCI) project, which includes longitudinal surveys in both countries. The RUMiCI China survey, also called the Chinese Household Income Project (CHIP) (Footnote 2), comprises three randomly selected samples (Footnote 3): the rural household sample covers households living in rural areas with rural Hukou (rural sample), the urban household sample covers households living in urban areas with urban Hukou (urban sample), and the migrant sample comprises households working in urban areas with rural Hukou (migrant sample). In this study we employ the urban sample of the most recent 2009 RUMiCI survey data, from which we exclude a small number of observations with multiple insurance plans.

The 2009 RUMiCI urban survey was conducted in nine provinces (Shanghai, Jiangsu, Zhejiang, Anhui, Henan, Hubei, Guangdong, Chongqing, and Sichuan) and includes 5000 households and 14,859 individuals. It collects detailed information on the demographic characteristics of household members, their income, earnings, and job situation, as well as their medical expenditures and types of health insurance. Among valid records, 23.82 % of individuals report zero medical consumption. Definitions and summary statistics of selected variables are presented in Table 1.

Table 1 Definition and summary statistics of selected variables

Model

Estimating the effects of the determinants of medical expenditure requires a treatment of the large fraction of zero values. Researchers have dealt with this issue by estimating either selection models or two-part models. The selection model is criticized by Dow and Norton (2003), who argue that no selection problem exists when the observed zero expenditure is the true consumption, as opposed to a latent potential outcome. A further objection to the selection model is that it assumes a bivariate normal distribution for the error terms, and its estimates are known to be sensitive to departures from normality (Duan et al. 1983; Goldberger 1983). The alternative, the two-part model, assumes that the decision to spend on medical care and the decision on the level of expenditure are independent processes. Although this potentially restrictive assumption has been criticized (Hay and Olsen 1984; Maddala 1985), the two-part model has been shown to be a robust estimator even if the true model is of the selection type (Duan et al. 1983). Additionally, the two-part model allows an investigation of whether variables of interest have larger impacts on the participation decision or on the consumption decision (Manning et al. 1995). Therefore, two-part models are frequently employed as benchmarks in health economics research that involves observations with a cluster of zeros (Doorslaer and Wagstaff 2000).

The first stage in the two-part model is to run a logit regression to check factors affecting the likelihood of urban residents’ care-seeking behavior. The participation equation is:

$$\begin{aligned} \Pr (D_i =1)=\frac{\exp \{Z_i \alpha \}}{1+\exp \{Z_i \alpha \}} \end{aligned}$$
(1)

where \(D_{i}\) is a dichotomous variable equal to 1 if individual \(i\) has positive health care spending, and 0 otherwise. \(Z_{i}\) contains the factors that influence the decision of urban residents to consume medical care. In the second stage, conditional on positive medical consumption, a log-linear model is used to analyze the effects of various factors on the conditional mean of medical expenditure.

$$\begin{aligned} \ln (Y_i \mid Y_i >0, X_i )=X_i \beta +\varepsilon _i , \quad \varepsilon _i \sim [0,\sigma _i^2 ] \end{aligned}$$
(2)

Here, \(X_{i}\) represents the factors that affect the level of medical consumption, and \(\varepsilon _i\) is the error term.
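The two stages in Eqs. (1) and (2) can be sketched on simulated data. The following is only an illustration under stated assumptions, not the estimation code used in this study: the data are synthetic, there is a single hypothetical covariate `x` standing in for both \(Z_i\) and \(X_i\), and the "true" coefficients `TRUE_ALPHA` and `TRUE_BETA` are invented for the example. The logit is fitted by Newton-Raphson and the second stage by ordinary least squares on the log of positive spending.

```python
import math
import random


def fit_logit(z, d, iters=25):
    """Newton-Raphson fit of Pr(D=1) = exp(a0 + a1*z) / (1 + exp(a0 + a1*z))."""
    a0, a1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, di in zip(z, d):
            p = 1.0 / (1.0 + math.exp(-(a0 + a1 * zi)))
            w = p * (1.0 - p)
            g0 += di - p              # score for the intercept
            g1 += (di - p) * zi       # score for the slope
            h00 += w                  # entries of the information matrix
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        a0 += (h11 * g0 - h01 * g1) / det
        a1 += (h00 * g1 - h01 * g0) / det
    return a0, a1


def fit_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x + error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Synthetic data; all numbers below are hypothetical.
rng = random.Random(0)
TRUE_ALPHA = (1.0, 0.8)   # participation equation (Eq. 1)
TRUE_BETA = (5.0, 0.3)    # log-expenditure equation (Eq. 2)
n = 5000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
d = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(TRUE_ALPHA[0] + TRUE_ALPHA[1] * xi))) else 0
     for xi in x]
log_y = [TRUE_BETA[0] + TRUE_BETA[1] * xi + rng.gauss(0.0, 1.0) for xi in x]

# Part 1: logit on everyone. Part 2: OLS on log spending, spenders only.
alpha_hat = fit_logit(x, d)
spenders = [(xi, yi) for xi, di, yi in zip(x, d, log_y) if di == 1]
beta_hat = fit_ols([s[0] for s in spenders], [s[1] for s in spenders])

print("logit estimates:", alpha_hat)
print("log-linear estimates:", beta_hat)
```

Because the two error processes are simulated independently, the independence assumption of the two-part model holds by construction here; with real data that assumption is exactly what the selection-model comparison is meant to probe.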

Departing from the existing literature, we argue that valuable information can be concealed when only the effects of the determinants on the mean of medical consumption are considered. Consequently, we employ quantile regression to inspect the heterogeneity of these effects, and to uncover how the effect of the same factor differs across groups with different levels of medical consumption. The quantile regression is specified as:

$$\begin{aligned} Q_\tau (Y_i \mid Y_i >0, X_i )=X_i^{\prime } \beta _\tau , \quad \tau \in (0,1) \end{aligned}$$
(3)

By solving the minimization problem below,

$$\begin{aligned} \beta _\tau \in \mathop {\arg \min }\limits _{\beta _\tau \in R^{k}} \left( \sum _{i:Y_i \ge X_i^{\prime }\beta _\tau } \tau \left| Y_i -X_i^{\prime } \beta _\tau \right| +\sum _{i:Y_i <X_i^{\prime }\beta _\tau } (1-\tau )\left| Y_i -X_i^{\prime } \beta _\tau \right| \right) , \quad \tau \in (0,1) \end{aligned}$$

we obtain the coefficient vector \(\beta _\tau \) at different quantiles of the medical expenditure distribution.
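To make the minimization concrete, the sketch below evaluates the check loss from the objective above and fits a quantile line on a small, artificial data set. It is an illustration only: the data are invented (for each `x` in 1 to 4, `y` takes the values `x*1, ..., x*9`, so the spread of `y` grows with `x`), and the exhaustive search over lines through pairs of points is a didactic device that relies on the known property that a minimizer of this objective passes through as many observations as there are coefficients. Applied work would normally use a linear-programming routine from a statistics package instead.

```python
from itertools import combinations


def check_loss(a, b, xs, ys, tau):
    """Sum of tau*|r| for residuals r >= 0 and (1 - tau)*|r| for r < 0."""
    total = 0.0
    for x, y in zip(xs, ys):
        r = y - (a + b * x)
        total += tau * r if r >= 0 else (1.0 - tau) * (-r)
    return total


def fit_quantile_line(xs, ys, tau):
    """Search all lines through two data points for the smallest check loss."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not a valid candidate
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = check_loss(a, b, xs, ys, tau)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


# Artificial data: the dispersion of y increases with x.
xs = [float(x) for x in range(1, 5) for _ in range(9)]
ys = [float(x * u) for x in range(1, 5) for u in range(1, 10)]

slopes = {tau: fit_quantile_line(xs, ys, tau)[1] for tau in (0.1, 0.5, 0.9)}
for tau, slope in slopes.items():
    print(f"tau = {tau}: slope = {slope}")
```

In this toy data set the slope of `x` rises from 1 at the 10th quantile to 5 at the median and 9 at the 90th quantile, while a single mean regression would report only one slope. This is the kind of heterogeneity that the quantile coefficients \(\beta _\tau \) are meant to reveal.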

In each of the regressions, we include the following set of independent variables: income, age, education, gender, working status, ethnic minority, marital status, self-reported health status, physical disability, and type of health insurance plan. Income may be positively related to medical expenditures; the income variable is defined as the natural logarithm of the monetary value of income. Age is measured in years and is associated with chronic health decline and the need for health care services. Males and females may have different attitudes toward, and needs for, health care services. Gender is indicated by a dummy variable equal to 1 for females and 0 for males. Working status is defined by a series of dummy variables that indicate whether the respondent has held a full-time job in the past year, is retired, or is unemployed. Labor force participation implies a higher time cost of seeking health care, which may have a negative influence on medical expenditure. Ethnic minority is a dummy variable that indicates whether the respondent is a member of an ethnic minority, who may have a different culture or different attitudes toward seeking health care. Marital status is defined by a series of dummy variables that indicate whether the respondent is single, married, divorced, or widowed. Levels of education are associated with attitudes to health care and with good health behaviors, which may influence health care utilization. We therefore include years of schooling in the regressions. Health status should have a strong influence on seeking health care. The respondent’s self-reported health status is defined by five categories: very good, good, general, poor, and very poor. We also include a dummy indicator for those who suffer a physical disability that affects their normal work or living. Finally, the type of health insurance plan is expressed as a series of dummy variables that indicate whether the respondent has GIS, UEBMI, self-purchased commercial insurance, or no insurance at all.

Potential endogeneity and selection issues

Endogeneity problems may pose significant challenges to our analysis. One concern here is that individuals may self-select into different insurance statuses, i.e., individuals who expect higher medical expenditure may choose a more generous insurance plan. As discussed in the “Background” section, the two major public health insurance plans (GIS and UEBMI) are both employer-sponsored and managed by local governments, and thus self-selection into these two types of health insurance might not be a significant issue. As one way to deal with the potential selection issue, in addition to the full sample regressions, we also exclude observations with private health insurance plans from the quantile regression as a sensitivity analysis. One may still argue that certain individuals could purposefully select into particular types of job, and thereby obtain a particular type of insurance, for example, GIS. This clearly can bias the results. In Table 2, we compare various demographic, socioeconomic, and health factors between the two groups of individuals. Generally speaking, we do not observe systematic differences in those characteristics between GIS holders and UEBMI holders. As another way to address this concern, we also run the regressions separately for GIS holders and UEBMI holders to see whether we reach the same conclusions.

Table 2 GIS and UEBMI holders

Another concern might be related to the omitted variable problem. Residents in different regions may have different income levels and face different supply-side factors of health care services, which could have significant impacts on medical expenditures. Moreover, as we discussed in the second section, various health insurance plans are mainly managed at province level, and thus people in different localities may face different insurance policies. To address this concern, we add a group of provincial dummies into the quantile regression.

Contrary to the conclusions of Duan et al. (1983, 1984, 1985), Leung and Yu (1996) argue that if the true model is of the selection type, the two-part model might be problematic. Therefore, we also run a Heckman selection model to check whether the results differ markedly between the two modeling choices.

Empirical results

Two-part model

Table 3 presents the two-part model estimates.

Table 3 The two-part model

Income and education have been suggested in previous studies as important predictors of one’s socioeconomic status. In this study, although income does not significantly affect the likelihood of using health care services, people with higher incomes spend more on health care once they get sick. Better educated persons are more likely to visit hospitals, but their medical expenditures are not significantly different from those of others. We also find that both age and gender significantly affect the likelihood of visiting hospitals and the level of medical expenditure. Older people are more likely to visit the hospital, and their medical expenditure is higher than that of younger people. Meanwhile, females are more inclined to access health care services, and their medical expenditure is on average larger than that of males.

We break urban residents’ work identity (ID) into four categories and take working (employed) individuals as the benchmark; the others are divided into those unable to work, the unemployed, and those not ready for a job (Footnote 4). Relative to working employees, individuals in the other work identity groups show no significant differences in how often they access medical services. Nonetheless, they uniformly have higher medical expenditure than the benchmark.

Health status (Health) is one of the most important factors that determine people’s propensity to use health care. As self-reported health status deteriorates, medical expenditure increases progressively. People who report their health to be general or poor tend to visit hospitals more frequently. We also find that people with a physical disability neither visit hospitals more often nor spend more on medical care.

For the health insurance plans (InsType), individuals whose medical and health services are publicly funded by the state or work unit are taken as the benchmark. People with UEBMI or commercial health insurance plans do not differ markedly from the benchmark in either their probability of health care utilization or their medical expenditure. However, individuals who have to pay for their health care themselves visit hospitals less frequently and spend much less on medical care than the benchmark group.

Marital status (MariStatus) also influences the likelihood of visiting a hospital. Relative to single persons, married, divorced, and widowed people access health care services more often. Nonetheless, their medical expenditure does not depend significantly on their marital status.

Quantile regression

Table 4 lists the heterogeneous quantile coefficients of various determinants of the medical expenditure, and offers supplemental information in understanding how those determinants influence people’s medical consumption. The Wald test reported in the notes below Table 4 indicates the presence of quantile effects.

Table 4 The quantile regression

Table 4 shows that the marginal effects of work identity (ID), marital status (MariStatus), and minority status have no clear pattern across the medical expenditure distribution; because their quantile coefficients are spread randomly around the mean coefficient, the mean estimates approximate their impacts on medical expenditure reasonably well. However, when the marginal effects of a determinant display a regular pattern, policy design based only on the mean coefficients can be incomplete. For example, the coefficient on gender decreases markedly along the medical expenditure distribution: the mean coefficient significantly underestimates females’ health care expenditure at the lower end of the distribution, and overestimates the difference in medical payments between males and females at higher quantiles. Meanwhile, age contributes positively to medical expenditure with decreasing marginal effects. Therefore, the mean estimate underestimates the age effect for subgroups with lower medical expenditure, and overestimates the age effect for individuals with higher health care spending.

The estimated income elasticity (Income) in the quantile regression follows a decreasing pattern, as reflected in its coefficient of 0.317 at the first quantile versus 0.262 at the third quantile. The mean estimate underestimates the income elasticity at lower levels of health care expenditure, and overestimates it for those with higher medical payments. This disparity in income elasticity across the medical expenditure distribution implies that income has a greater effect on health care consumption when medical expenditure is small, and that this effect diminishes as medical expenditure becomes larger.
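As a rough reading aid for these elasticities, the short calculation below converts the two coefficients quoted above into the proportional change in spending implied by a hypothetical 10 % rise in income. It assumes a log-log specification, in which both income and medical expenditure enter in natural logarithms, so that the coefficient is a constant elasticity; the 10 % income change is an arbitrary illustrative value, not a figure from the data.

```python
def spending_change(elasticity, income_change):
    """Proportional change in spending implied by a log-log coefficient."""
    return (1.0 + income_change) ** elasticity - 1.0


# Coefficients quoted in the text; the 10 % income rise is hypothetical.
low_q = spending_change(0.317, 0.10)
high_q = spending_change(0.262, 0.10)

print(f"lower quantile:  {100 * low_q:.2f} % more spending")
print(f"higher quantile: {100 * high_q:.2f} % more spending")
```

Under these assumptions, the same 10 % income increase is associated with roughly 3.1 % more spending at the lower quantile and roughly 2.5 % more at the higher one.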

Turning to health status, the quantile regression reveals that medical expenditure increases as an individual’s health condition deteriorates. For those who report their health condition to be general, poor, or very poor, the marginal effects of health status increase along the health expenditure distribution, implying that the mean coefficient seriously underestimates health care costs for those with high medical expenditure and overestimates them for those with low medical expenditure. Therefore, the mean estimates are insufficient to reflect the marginal effect of health status on medical expenditure. In addition, in contrast to the insignificant mean effect of disability on medical expenditure, the quantile regression shows that, at the 10th and 50th quantiles, people who suffer some kind of disability spend more on health care than people without a disability.

Previous studies suggest that in China, publicly funded health insurance programs encourage inefficient use of health care resources (Gao et al. 2001; Liu et al. 2002). The results of our study indicate that health care expenditures are closely related to insurance status. Compared with people who have health insurance, individuals without it consume far fewer medical resources, especially those with lower medical expenditures.

One important conclusion that can be drawn from the quantile regression results is that medical expenditures at the upper end of the distribution are mainly driven by need factors such as poor health status; on the other hand, medical expenditures at the lower end of the distribution are more strongly influenced by non-need factors such as income, gender, and insurance status.

Potential endogeneity problems may have a significant influence on this conclusion. First, residents in different regions may have different income levels and face different supply-side factors of health care services, which could have significant impacts on medical expenditures. Moreover, as we discussed in the second section, the fundamental rules of the various health insurance plans are mainly designed at the provincial level, and thus people in different provinces may face different insurance policies. To address this concern, we add a group of provincial dummies to the quantile regression; the results are reported in Table 5 (Footnote 5).

Table 5 Quantile regression with province dummies

Another concern might be that individuals may self-select into different insurance status. To address this concern, as discussed in “Data and model” section, we first exclude those observations that have private health insurance plans from the quantile regression. The results are reported in Table 6. We also run the regressions separately for the GIS holders and UEBMI holders, and report the results in Tables 7 and 8.

Table 6 Quantile regression without self-purchased insurance (with province dummies)
Table 7 GIS subgroup quantile regression with province dummies
Table 8 UEBMI subgroup quantile regression with province dummies

The results in Tables 4 and 5 differ: adding province dummies clearly changes the coefficients. Similarly, when we run the regression on different subsamples, we obtain different sets of coefficients in Tables 6, 7, and 8 (Footnote 6). However, generally speaking, we still observe the same pattern as in Table 4 in all cases. That is, medical expenditures at the upper end of the distribution are mainly driven by need factors, and medical expenditures at the lower end of the distribution are more strongly influenced by non-need factors. To highlight the pattern in Tables 4, 5, and 6, we summarize the results for several important variables in Table 9. For reasons of space, we only report the results for the most important need factors (general, poor, and very poor health) and two non-need factors (income and gender) at the 5th, 25th, 75th, and 95th quantiles. From Table 9, we can clearly observe that the influences of income and gender on medical expenditures become weaker, and the influences of general, poor, and very poor health become stronger, at the upper end of the distribution.

Table 9 A summary of results

Finally, we compare results of the two-part model to results obtained from a Heckman selection model in Table 10, and we find no systematic differences.

Table 10 Results from Heckman model and two-part model

Conclusion

In this paper, we apply a quantile regression method to investigate the heterogeneous effects of various determinants of medical expenditure in China. Compared with the mean effects, we find that health care expenditures at the upper end of the distribution are more strongly influenced by need factors such as poor health status, and less strongly influenced by socioeconomic factors and insurance status. Conversely, health care expenditures at the lower end of the distribution are more strongly influenced by socioeconomic factors and insurance status, and less strongly influenced by need factors.

Many countries have been trying to expand their public health insurance coverage in recent years. To achieve two fundamental policy goals (equity in health care utilization and control of health care costs), policymakers need a better understanding of the underlying determinants of individual health care expenditure beyond the results of mean regressions. This study may provide useful information to policymakers for the optimal design of their health care systems. Health care consumers in many countries face a linear copayment rate after they reach a given deductible. Our results suggest that a regressive co-payment schedule may promote equity in health care utilization and reduce overall health care costs to society. A higher copayment rate at the lower end of the medical expenditure distribution may reduce health care demands that are not driven by need factors. On the other hand, medical expenditures at the upper end of the distribution are less influenced by health insurance than by need factors, so a lower co-payment rate there may improve equity without causing large increases in expenditure. Although such a schedule has different cost implications for the public insurer and for individuals, it may reduce overall health care costs from the perspective of society. This result may be of particular interest to health policymakers in China, which is still in a period of reshaping its health-care system.
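The difference between a linear and a regressive co-payment schedule can be illustrated with a small calculation. The schedules below are purely hypothetical: the bracket limits and patient rates are invented for the example and are not taken from UEBMI, GIS, or any other actual plan. The function computes the patient’s out-of-pocket payment when each slice of a bill is charged at the rate of its bracket.

```python
def out_of_pocket(bill, brackets):
    """Patient's payment under a piecewise co-payment schedule.

    brackets: list of (upper_limit, patient_rate) pairs sorted by upper_limit;
    the last upper_limit should be float("inf").
    """
    paid, lower = 0.0, 0.0
    for upper, rate in brackets:
        if bill <= lower:
            break
        paid += (min(bill, upper) - lower) * rate
        lower = upper
    return paid


# Hypothetical schedules (all numbers are illustrative assumptions).
LINEAR = [(float("inf"), 0.30)]
REGRESSIVE = [(1000.0, 0.50), (10000.0, 0.30), (float("inf"), 0.10)]

for bill in (500.0, 50000.0):
    print(bill, out_of_pocket(bill, LINEAR), out_of_pocket(bill, REGRESSIVE))
```

With these invented numbers, a small bill of 500 costs the patient 250 under the regressive schedule instead of 150 under the linear one, whereas a large bill of 50,000 costs 7,200 instead of 15,000. The schedule thus raises the price of low-cost care, where non-need factors dominate, and lowers it for high-cost care, where need factors dominate.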

Our study has two limitations. First, because of data limitations, we are unable to examine the effects of more detailed features of health insurance plans (such as deductibles and copayment rates), which may have important implications for the design of a health insurance plan and deserve further study. Second, although the “Empirical results” section shows that potential endogeneity problems may not significantly change our fundamental conclusions, the endogeneity issue, a major challenge to most empirical studies, deserves further exploration when possible.