Introduction

In many developed countries, health insurance systems face rising expenditures [1], which pose a financing challenge. The increase is driven mainly by technological progress, demographic change, and inefficiencies in the health care system, one of which is moral hazard [2].

In general, co-payments can serve as a device to reduce moral hazard. They may reduce the demand for health care services by raising the price paid by the consumer at the time of consumption [3, 4]. The magnitude of this effect depends on the price elasticity of demand: when imposed on price-elastic health care services, co-payments can be expected to reduce demand.
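To make the role of the price elasticity concrete, the following sketch assumes, purely for illustration, a constant-elasticity demand curve Q = A·p^ε, where p is the out-of-pocket price and ε ≤ 0 is the price elasticity. The functional form, the prices, and the elasticity values are hypothetical and are not estimates from this paper or from the cited studies.

```python
def constant_elasticity_demand(price: float, elasticity: float, scale: float = 1.0) -> float:
    """Quantity demanded under the illustrative assumption Q = scale * price**elasticity."""
    return scale * price ** elasticity


def relative_demand_change(price_before: float, price_after: float, elasticity: float) -> float:
    """Relative change in demand when the out-of-pocket price moves from price_before to price_after."""
    q_before = constant_elasticity_demand(price_before, elasticity)
    q_after = constant_elasticity_demand(price_after, elasticity)
    return q_after / q_before - 1.0


# Hypothetical example: a co-payment doubles the out-of-pocket price from 10 to 20 euros.
for eps in (0.0, -0.2, -1.0):
    print(f"elasticity {eps:>5}: demand changes by {relative_demand_change(10.0, 20.0, eps):+.1%}")
```

With perfectly inelastic demand (ε = 0) the co-payment leaves demand unchanged, while the same price increase cuts demand substantially when demand is elastic.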

Indeed, there is empirical evidence that co-payments, regardless of their exact design, can reduce demand in the health insurance market. In the RAND Health Insurance Experiment, medical demand decreases as the level of cost-sharing increases [5]. Similarly, in the Oregon Health Insurance Experiment, randomly extending insurance coverage increases the use of health care services [6].

Furthermore, there is non-experimental literature on compulsory co-payments. A rise in co-payments for doctor visits and drug prescriptions among retired public employees in the US reduces both kinds of medical utilization [7]. Likewise, for Switzerland, it has been found that, in contrast to cost-sharing, full insurance coverage increases health care costs and decreases the probability of having zero health care expenditure [8]. In 2004, the German statutory health insurance (SHI) introduced a co-payment that had to be paid for the first doctor visit in each quarter. According to Farbmacher and Winter [9], this co-payment significantly reduces the probability of visiting a doctor, by 4–8 percentage points. Effects are larger for younger than for older adults. For young male adults, the authors find that the co-payment reduces the number of doctor visits by 0.2–0.3 visits per quarter, while young women do not seem to be affected. In a subsequent study, Farbmacher et al. [10] estimate a significant increase in the probability of not visiting a doctor, while the subgroup of elderly women with more severe diagnoses and higher drug consumption is rarely affected. In contrast, Kunz and Winkelmann [11] use a different methodology and do not find any effect of this co-payment.

Finally, there is literature on settings in which individuals are free to choose more or less insurance coverage. Schmitz [2] concludes that with less insurance coverage, the probability of visiting a doctor is reduced for insured individuals who previously had few doctor visits, while individuals who previously had many doctor visits are not affected. Optional deductibles have also been found to reduce the number of doctor visits and the probability of visiting a specialist, and to decrease medical expenditures, also in the medium term. The effect is stronger for higher deductibles [12,13,14].

All in all, the literature suggests that cost-sharing, whether compulsory or optional, can reduce the demand for health care services. Interestingly, the design of the cost-sharing scheme plays an important role. Hayen et al. [15] find that individuals react nearly twice as strongly to a deductible scheme as to a premium refund scheme. However, the authors use data from the Netherlands, where insurance contracts are designed differently from those in Germany. One crucial difference is that in the Dutch system, the deductible paid increases (or the premium refund decreases) by one euro with every euro of costs caused to the sickness fund, until a threshold value is reached. Therefore, these results cannot be generalized to the German system, where any use of health care resources (apart from some exceptions) cuts the premium refund to zero (see below). Furthermore, the German system differs from the Dutch system in the type of exceptions that do not cut the premium refund.
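The difference between the two designs can be illustrated with a stylized sketch. Both functions are simplifications made for illustration only: the Dutch-style schedule assumes that the refund falls one-for-one with claims until it reaches zero, the German-style schedule ignores the exempt services, and the maximum refund of 200 euros is a hypothetical value.

```python
def dutch_style_refund(claims: float, max_refund: float) -> float:
    """Stylized linear schedule: the refund falls by one euro per euro of claims, down to zero."""
    return max(max_refund - claims, 0.0)


def german_style_refund(claims: float, max_refund: float) -> float:
    """Stylized all-or-nothing schedule: any (non-exempt) claim cuts the refund to zero."""
    return max_refund if claims == 0 else 0.0


MAX_REFUND = 200.0  # hypothetical maximum refund in euros

for claims in (0.0, 30.0, 250.0):
    print(
        f"claims {claims:6.1f} EUR -> Dutch-style refund {dutch_style_refund(claims, MAX_REFUND):6.1f}, "
        f"German-style refund {german_style_refund(claims, MAX_REFUND):6.1f}"
    )
```

Under the linear schedule, a 30-euro visit costs the insured 30 euros of refund; under the all-or-nothing schedule, the first claim of the year costs the entire refund, while later claims cost nothing extra. The German design therefore concentrates its incentive on the decision to make a first contact.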

Premium refunds are relatively new to the German SHI and are the subject of this paper. They work differently from deductibles but pursue the same goal. Besides being optional, they carry no risk of loss for the insured. Broadly speaking, individuals who have chosen a tariff with premium refunds are rewarded in cash if they do not make use of their health insurance for one year. They never pay more than a person who stays in the default (“full insurance”) tariff (Footnote 1); they only risk missing the reward. Since individuals are often risk-averse, premium refunds may be an attractive way to set incentives similar to those of deductibles without the risk of paying more than in the default tariff. Insured persons lose their entitlement to the premium refund as soon as they make use of their health insurance, so the reward scheme is highly nonlinear. Nonlinear schemes are of high policy relevance, and earlier studies have provided credible evidence on their impact [10, 16, 17]. These analyses could be improved further by looking at specific diagnoses when studying moral hazard behavior.

My research question is whether premium refunds can also serve as a device to reduce ex-post moral hazard. Individuals who have opted for such a tariff will make use of their health insurance only if their utility from treatment is at least as high as their utility from forgoing treatment and receiving the premium refund. Due to information asymmetry, hospital treatment, drug prescriptions, and follow-up doctor visits are primarily decided on by the doctor. In contrast, the decision to visit a practitioner for the first time in a given year often lies with the patient. This applies especially to general practitioners (GPs) and only to a lesser extent to specialists.
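The decision rule for a participant can be stated compactly. The notation is introduced here only for illustration and assumes, for simplicity, that utilities are measured in money, so that the refund \(R\) enters additively; \(U^{T}\) denotes the utility from treatment and \(U^{N}\) the utility from forgoing it. A participant who has not yet made a (non-exempt) claim in the current year uses the health insurance only if

$$U^{T} \geq U^{N} + R,$$

whereas under the default tariff the condition is \(U^{T} \geq U^{N}\). The tariff therefore deters precisely those contacts for which

$$U^{N} \leq U^{T} < U^{N} + R.$$

Once the refund has been lost through an earlier claim in the same year, \(R\) no longer enters the decision, and the participant faces the same condition as in the default tariff. This is why the incentive bites mainly on the first contact of the year.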

Moreover, participants in the premium refund tariff may have an incentive to avoid visiting a doctor for a trivial illness that can also be treated by self-medication, e.g., a common cold. For such illnesses, demand is expected to be price-elastic. Assuming that nearly everybody catches a common cold once in a while, everybody is affected by this illness in a similar way. Thus, if individuals visit a doctor because of a common cold, this does not mean that they are sicker than individuals with a common cold who do not visit a doctor. Instead, the visit reveals their behavior.

I use administrative data from two German sickness funds, both of which offer a premium refund tariff. I estimate the effect of choosing this tariff (rather than staying in the default tariff) in 2010 on the probability of visiting a GP or a specialist, as well as on the probability of visiting a doctor because of a common cold in the same year. Selection bias is likely because the tariff is voluntary: in particular, younger and healthier individuals are more likely to opt for such an insurance scheme. To reduce this bias, I use regression-adjusted propensity score matching together with a rich set of control variables. However, matching on observables combined with OLS regression can never completely eliminate selection bias. Therefore, the results should not be interpreted causally. To support my findings, I carry out numerous robustness checks and assess the level of remaining unobserved heterogeneity using a method proposed by Altonji et al. [18] and Oster [19]. Since this paper is motivated by increasing expenditures in the health care sector, I additionally estimate the effect on the sickness funds’ medical expenditures for practitioners (GPs and specialists), bearing in mind that, due to information asymmetry, the patient in most cases cannot fully decide on the type and extent of treatment.

The literature suggests that the degree of moral hazard varies across individuals, e.g., by the extent of demand and the health status prior to the introduction (or removal) of any cost-sharing [2, 16], or by the type of disease the individual is suffering from [20]. Therefore, I repeat the analysis for subgroups to find out whether the effects are heterogeneous.

This paper contributes to the existing literature in two ways. First, by identifying relevant ICD-10-GM diagnoses (Footnote 2), it directly tests whether a reduction in medical demand is due to reduced ex-post moral hazard. This is not always clear when medical expenditures or doctor visits are analyzed. Second, to the best of my knowledge, this is the first study to analyze the effect of premium refunds in the German SHI on moral hazard behavior.

The paper is organized as follows. “Background” gives some background information on the legal regulations, while “Identification and estimation” discusses the identification and estimation strategy. In “Data”, the data are described. Results are shown in “Results”, and section “Discussion” discusses the results and concludes.

Background

The German health insurance system is characterized by the coexistence of a statutory and a private health insurance system. The vast majority of the population in Germany (85.34% in 2010) is covered by the SHI [21, 22]. Health insurance is compulsory for everybody. However, not everyone can choose between the two systems. Only high earners, the self-employed, and civil servants can (but do not have to) choose private health insurance. If they opt for the SHI, they are referred to as voluntarily insured, while the rest of the insured below pension age are mostly compulsorily insured. Within the SHI, there were 169 sickness funds in 2010 [23]. Sickness funds have to accept all applicants, irrespective of their health status or income. The contribution to the SHI depends only on the insurance member’s gross wage, not on the individual’s health status. Dependent children and non-working spouses can be co-insured free of charge (“co-insured family members”). Insured persons are free to choose their preferred provider, regardless of the sickness fund they are insured with. Most benefits (Footnote 3) are identical across sickness funds and are provided as benefits-in-kind. In addition, sickness funds can offer supplementary benefits, which may vary between sickness funds but represent only a small part of all medical services.

Premium refunds are relatively new to the SHI. After some pilot projects [24], they were introduced to the whole system in 2007. In 2010, 152,571 individuals were enrolled in the tariff [21], which corresponds to 0.2% of all statutorily insured persons. Sickness funds may, but need not, offer the tariff. If a sickness fund offers it, the tariff is available to all members who have been insured with the fund for at least three months. Enrollment is voluntary and may start at any time of the year. Insurance members decide whether to participate in the tariff, and their co-insured family members have to follow accordingly. If a person enrolls during the year, some sickness funds allow for retrospective enrollment (i.e., enrollment applies to the whole calendar year), while others restrict enrollment to the remainder of the year. If neither an insurance member nor his or her co-insured family members cause expenditures within one calendar year, the insurance member receives a refund of 1/12 of his or her annual insurance contribution (Footnote 4). Otherwise, the refund is cut to zero. Thus, the scheme is highly nonlinear. The only services that can be used without losing the refund are treatment of the under-18s, early diagnosis examinations, and prevention (Footnote 5).
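The refund rule described above can be sketched as follows. This is a simplified illustration, not the sickness funds’ actual implementation: the `Claim` record, the service labels, and the contribution amount are hypothetical, and further legal details (e.g., partial-year enrollment) are ignored.

```python
from dataclasses import dataclass

# Hypothetical service labels for the exempt services named in the text
EXEMPT_SERVICES = {"early_diagnosis", "prevention"}


@dataclass
class Claim:
    patient_age: int  # age of the treated family member
    service: str      # hypothetical service label, e.g. "gp_visit" or "prevention"
    amount: float     # euros billed to the sickness fund


def is_exempt(claim: Claim) -> bool:
    """Claims for under-18s, early diagnosis examinations, and prevention do not forfeit the refund."""
    return claim.patient_age < 18 or claim.service in EXEMPT_SERVICES


def premium_refund(annual_contribution: float, family_claims: list[Claim]) -> float:
    """Refund 1/12 of the member's annual contribution if no family member made a non-exempt claim."""
    if any(not is_exempt(claim) for claim in family_claims):
        return 0.0  # all-or-nothing: a single non-exempt claim cuts the refund to zero
    return annual_contribution / 12


contribution = 3600.0  # hypothetical annual contribution in euros
print(premium_refund(contribution, [Claim(8, "gp_visit", 40.0)]))     # child's visit is exempt
print(premium_refund(contribution, [Claim(41, "prevention", 25.0)]))  # prevention is exempt
print(premium_refund(contribution, [Claim(41, "gp_visit", 40.0)]))    # adult GP visit forfeits the refund
```

Because claims are pooled over the whole family, a single non-exempt adult claim by any co-insured member is enough to forfeit the member’s refund.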

If individuals choose the premium refund tariff, they are bound to the respective sickness fund for one year, which is one reason why not everyone wants to enroll. Another reason for the relatively small share of enrollees is that sickness funds do not promote this tariff very strongly (Footnote 6). Furthermore, women in their childbearing years in particular will not choose this tariff if they plan to collect a prescription for contraception every (second) quarter. Likewise, people on permanent medication have no reason to register for this tariff. Finally, for many individuals, it is more attractive to choose another tariff that sickness funds often offer alongside the premium refund tariff, the so-called deductible tariff. Here, individuals pay a lower premium to their sickness fund, but in the event of sickness, they bear the risk of paying more than the premium of the default tariff. The advantage of the deductible tariff is that individuals who visit a doctor for a minor issue might only pay a small deductible and thus still be better off overall than in the default tariff. Since insured individuals are not allowed to combine the premium refund tariff and the deductible tariff, many of those who are willing to choose a non-default tariff will select the deductible tariff (Footnote 7).
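The trade-off between the two non-default tariffs can be illustrated with a stylized comparison of annual net costs. All parameters (the default premium, the premium discount of the deductible tariff, and the deductible cap) are hypothetical, and actual deductible tariffs differ between sickness funds.

```python
DEFAULT_PREMIUM = 3600.0  # hypothetical annual contribution in the default tariff (euros)
DISCOUNT = 240.0          # hypothetical premium reduction in the deductible tariff
DEDUCTIBLE = 500.0        # hypothetical maximum out-of-pocket payment in the deductible tariff


def cost_default(claims: float) -> float:
    """Default tariff: the contribution does not depend on claims."""
    return DEFAULT_PREMIUM


def cost_premium_refund(claims: float) -> float:
    """Premium refund tariff: 1/12 of the contribution is refunded only if there are no claims."""
    refund = DEFAULT_PREMIUM / 12 if claims == 0 else 0.0
    return DEFAULT_PREMIUM - refund


def cost_deductible(claims: float) -> float:
    """Deductible tariff: lower premium, but claims are paid out of pocket up to the deductible."""
    return DEFAULT_PREMIUM - DISCOUNT + min(claims, DEDUCTIBLE)


for claims in (0.0, 30.0, 2000.0):
    print(
        f"claims {claims:7.1f}: default {cost_default(claims):.1f}, "
        f"premium refund {cost_premium_refund(claims):.1f}, deductible {cost_deductible(claims):.1f}"
    )
```

With these hypothetical parameters, the premium refund tariff is cheapest when there are no claims, the deductible tariff is cheapest after a minor claim, and only the deductible tariff can cost more than the default tariff when claims are high.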

If sickness funds offer the premium refund tariff, they must prove to their supervisory authority every three years that the tariff pays for itself, i.e., cross-subsidization is not allowed. This is intended to prevent insurance contributions from rising because of the tariff.

In addition, sickness funds are allowed to offer other tariffs. In the so-called bonus programs, the insured are rewarded if they can prove a healthy lifestyle and preventive actions. Again, participation is voluntary.

Identification and estimation

I observe two groups. The treatment group consists of individuals who participate in the premium refund tariff, whereas the control group consists of those who do not. Treatment takes place during the whole year 2010 (cf. Fig. 1). The outcomes, various measures of medical demand, are measured at the end of 2010 and cover the whole year, i.e., January 1, 2010, to December 31, 2010.

Fig. 1 Timeline. The figure shows in which years the treatment variable, outcomes, and the different covariates are measured

I am interested in the average treatment effect on the treated (ATT), i.e., the difference between the demand for medical treatment of the persons in the treatment group, who have been treated, and the demand these same persons would have had if they had not been treated:

$$\tau_{\text{ATT}} = E\left[ {Y\left( 1 \right)|D = 1} \right] - E\left[ {Y\left( 0 \right)|D = 1} \right],$$
(1)

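where \(D = 1\) indicates that an individual belongs to the group that chooses the tariff. \(Y \left( 1 \right)\) denotes an individual’s demand when participating in the tariff, and \(Y \left( 0 \right)\) the demand of the same individual without participation. For participants, \(Y \left( 1 \right)\) is observed, whereas \(Y \left( 0 \right)\) is their counterfactual demand had they not chosen the tariff.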

Naturally, the counterfactual \(E[Y\left( 0 \right)|D = 1]\) is not known. Since participation is voluntary, individuals opting for the tariff would differ from those who do not even in the absence of treatment (selection bias). Therefore, I use the large group of non-participants to find individuals who are similar to the participants in all relevant (pre-treatment) characteristics.
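The selection problem can be made explicit with a standard decomposition. Adding and subtracting \(E[Y\left( 0 \right)|D = 1]\) shows that a naive comparison of participants with all non-participants yields

$$E\left[ {Y\left( 1 \right)|D = 1} \right] - E\left[ {Y\left( 0 \right)|D = 0} \right] = \tau_{\text{ATT}} + \left\{ E\left[ {Y\left( 0 \right)|D = 1} \right] - E\left[ {Y\left( 0 \right)|D = 0} \right] \right\},$$

where \(\tau_{\text{ATT}}\) is defined in Eq. (1) and the term in braces is the selection bias. Matching on pre-treatment covariates \(X\) aims to make this term vanish: if \(E[Y\left( 0 \right)|D = 1, X] = E[Y\left( 0 \right)|D = 0, X]\), the matched non-participants provide a valid estimate of the missing counterfactual.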

The conditional independence assumption (CIA) is violated if participants and non-participants differ in their risk, i.e., their health status, or in their risk aversion [25] in ways not captured by the covariates. Individuals maximize their expected utility; therefore, if they expect high expenses in the future, they will prefer more coverage. Past health status is associated with future health status and future demand for health services and, therefore, also with the tariff choice. I use lagged values of health care claims and certain diagnoses as a proxy for health status before 2010 (cf. Fig. 1, “Lagged covariates”) (Footnote 8).

Furthermore, risk-averse individuals might avoid co-payments of any kind and may, therefore, be less likely to choose the premium refund tariff. At the same time, they might show higher preventive effort [2]. As a proxy for risk attitude toward health, I use information on whether the individual participated in the sickness fund’s bonus program in any of the years 2006–2010 (cf. Fig. 1, “Covariate: bonus program”). Individuals who participate in a bonus program are presumably more concerned with their health status and are, therefore, likely to be risk-averse. Cutler et al. [25] likewise regard the “receipt of preventive health care as a behavior that likely captures individual risk aversion”. Since bonus programs primarily consist of preventive health activities, participation is a useful proxy. I assume that risk attitude is stable over a period of five years and that participation in the bonus program in any of these years is a valid signal. The bonus program existed years before the premium refund tariff was implemented; therefore, it is unlikely that choosing the premium refund tariff affects participation in the bonus program. One limitation of this proxy is that the underlying risk aversion is presumably continuously distributed, while participation in the bonus program is a dichotomous measure. Therefore, it is useful to also match on pre-treatment outcomes (cf. Fig. 1, “Lagged covariates”). This approach accounts for historical factors that cause current differences in the dependent variable and are difficult to account for in other ways [26], and it has frequently been used in applied research (see, e.g., [27,28,29]) to control for individual-specific unobserved heterogeneity. Being a rather unspecific approach, using lagged dependent variables as proxies may also help to account for other sources of unobserved heterogeneity, such as genetic or lifestyle factors.

I use socioeconomic information measured in 2010 to further adjust the two groups (cf. Fig. 1, “Contemporaneous covariates”). Besides age and gender, I control for insurance status and education as well as being insured with one of the two sickness funds.

While some covariates were chosen based on economic theory, others were selected by an algorithm proposed by Imbens (Footnote 9) [30]. In addition, the algorithm selected a large set of interaction terms, which makes the model more flexible.

For the CIA to hold, it would also be necessary to consider the possibility of selection on moral hazard [31]. When opting for or against more insurance coverage (Footnote 10), individuals take into account how strongly they will react to an increase in coverage. According to Finkelstein et al. [32], if deductibles are optional, individuals who are less responsive than average to consumer cost-sharing are more likely to choose deductibles, whereas individuals with higher price sensitivity tend not to choose them. Although premium refunds and deductibles are not the same, they set similar incentives. Therefore, the estimated effects likely represent a lower bound; if participation were mandatory, effects might be up to two or three times larger [32].

I use an Epanechnikov kernel estimator for the propensity score matching (Footnote 11). Propensity score matching has the advantage of condensing the information of numerous matching variables into a one-dimensional measure. The Epanechnikov kernel estimator is appropriate for this application because it uses many controls for every treated individual and gives more weight to controls with similar propensity scores than to controls with dissimilar ones. The propensity score is estimated with a probit model as follows:

$${\text{participation}} = \alpha_{0} + X^{\prime}\delta + u,$$
(2)

where \(X\) represents the vector of covariates (cf. Table 1, plus a long list of interaction terms), and \(u\) is the error term. Treated individuals are then matched to controls with a similar, but not identical, propensity score. Although the matching procedure reduces differences between the covariates of the two groups, some discrepancies may remain, so the estimator may still be biased. Regression methods can be used to reduce this residual bias [33]. Therefore, I combine matching with regression adjustment (Footnote 12). Using the matched sample, I regress each outcome on participation in the premium refund tariff and on all control variables that were also used in the matching procedure. The regression model is

$$Y = \beta_{0} + \beta_{1} {\text{participation}} + X'\gamma + \varepsilon ,$$
(3)

where \(Y\) is one of the outcomes (cf. Table 1), \(X\) again represents the vector of covariates, and \(\varepsilon\) is the error term. In addition, the weights which result from the matching procedure are used in the regressions. In line with Schmitz and Westphal [34], in the OLS regressions, I employ robust standard errors, because they are easier to compute, even though they are slightly more conservative than bootstrapped standard errors [35].
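The matching and regression-adjustment steps can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the code used for this paper: it takes already estimated propensity scores as given (the probit step of Eq. (2) is omitted), drops treated units without any control inside the kernel window as a simple stand-in for the paper’s trimming, uses a hypothetical bandwidth and toy data, and returns only the point estimate of \(\beta_1\) in Eq. (3), without robust standard errors.

```python
import numpy as np


def epanechnikov(u: np.ndarray) -> np.ndarray:
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on |u| <= 1, zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def kernel_matching_weights(pscore, treated, bandwidth: float) -> np.ndarray:
    """Weight 1 for each treated unit on support; each control receives the sum
    of its normalized kernel weights across all treated units."""
    pscore = np.asarray(pscore, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    p_t, p_c = pscore[treated], pscore[~treated]

    # Kernel weight of every control (columns) for every treated unit (rows)
    k = epanechnikov((p_c[None, :] - p_t[:, None]) / bandwidth)
    row_sums = k.sum(axis=1, keepdims=True)
    k_norm = np.divide(k, row_sums, out=np.zeros_like(k), where=row_sums > 0)

    weights = np.zeros(pscore.size)
    on_support = row_sums[:, 0] > 0  # treated units with at least one control in the window
    weights[np.flatnonzero(treated)[on_support]] = 1.0
    weights[~treated] = k_norm.sum(axis=0)
    return weights


def regression_adjusted_att(y, treated, X, weights) -> float:
    """Weighted least squares of y on a constant, the treatment dummy, and X;
    returns the coefficient on the treatment dummy (beta_1 in Eq. (3))."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    d = np.asarray(treated, dtype=float)
    keep = weights > 0
    Z = np.column_stack([np.ones(keep.sum()), d[keep], X[keep]])
    sw = np.sqrt(weights[keep])
    coef, *_ = np.linalg.lstsq(Z * sw[:, None], y[keep] * sw, rcond=None)
    return float(coef[1])


# Hypothetical toy data: 2 treated units, 5 controls, one covariate x
pscore = [0.30, 0.50, 0.31, 0.29, 0.52, 0.48, 0.90]
treated = [1, 1, 0, 0, 0, 0, 0]
x = [0.2, 0.7, 0.25, 0.1, 0.6, 0.8, 0.3]
y = [1 + 0.5 * d + 2 * xi for d, xi in zip(treated, x)]  # true effect 0.5 by construction

w = kernel_matching_weights(pscore, treated, bandwidth=0.06)
print(w)  # the control with pscore 0.90 lies outside every kernel window and gets weight 0
print(regression_adjusted_att(y, treated, x, w))
```

Because kernel matching uses every control inside the window, the bandwidth trades bias (wider windows admit less similar controls) against variance; this is the parameter varied in the robustness check with a bandwidth of 0.01.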

Some sickness funds allow the insured to enroll retrospectively in the premium refund tariff, at the latest until the end of the calendar year. As a result, new participants have not necessarily been exposed to the tariff’s incentives; instead, one has to assume that a considerable share enroll by the end of the year after discovering that they have not caused any insurance claims. I aim to remove this effect by excluding from the sample all 2010 participants who did not already participate in 2009.

The effect of more (or less) insurance coverage on medical demand consists of two parts [32]. The substitution effect is the moral hazard response and, therefore, the effect I am interested in. In addition, there may be an income effect, i.e., individuals with more insurance coverage can afford treatment that would be too expensive for them with less coverage. In the present setting, the income effect is presumably absent. At the time of treatment, participants paid the same insurance contribution as if they had stayed in the default tariff, and in both tariffs they have access to the same portfolio of benefits. The only difference is that at the end of the year, participants in the premium refund tariff lose a financial reward if they have claimed medical services. At the time of treatment, however, there should be no income effect. Therefore, the estimated effect is the substitution effect, i.e., the moral hazard response.

Whether the CIA is fulfilled cannot be directly tested. However, the assumption is supported if one does not find an effect of the treatment on a pseudo outcome, i.e., an outcome that is known to be unaffected by the treatment [33]. I repeat the analysis illustrated above, replacing outcomes with the pseudo outcome “probability of visiting a hospital in 2010”. Treatment in hospital is mostly associated with severe illnesses. Therefore, the demand should be price-inelastic and the effect is expected to be zero.

Finally, to get a better idea of how strong the omitted variable bias may still be, I apply a method proposed by Altonji et al. [18] and further developed by Oster [19]. Its underlying idea is that the degree of selection on observables is a guide to the degree of selection on unobservables. Using the Stata command psacalc, I estimate the treatment effect for the various outcomes under three different assumptions: selection on unobservables is half as large as, as large as, or twice as large as selection on observables.
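For intuition, the bias-adjusted coefficient can be approximated with a closed-form expression from Oster [19], recalled here as \(\beta^{*} \approx \tilde{\beta} - \delta (\mathring{\beta} - \tilde{\beta}) (R_{\max} - \tilde{R})/(\tilde{R} - \mathring{R})\), where \(\mathring{\beta}, \mathring{R}\) come from the regression without controls, \(\tilde{\beta}, \tilde{R}\) from the regression with controls, and \(\delta\) is the relative degree of selection on unobservables (0.5, 1, or 2 in the three scenarios). This approximation need not coincide with the estimator computed by psacalc, and all inputs below, including \(R_{\max}\), are hypothetical values rather than results of this paper.

```python
def oster_beta_approx(beta_short: float, r2_short: float,
                      beta_long: float, r2_long: float,
                      r2_max: float, delta: float) -> float:
    """Approximate bias-adjusted treatment effect (Oster-type approximation).

    beta_short, r2_short: coefficient and R-squared without controls
    beta_long, r2_long:   coefficient and R-squared with controls
    r2_max:               assumed R-squared if all unobservables were included
    delta:                selection on unobservables relative to observables
    """
    if r2_long <= r2_short:
        raise ValueError("the approximation requires r2_long > r2_short")
    shift = (beta_short - beta_long) * (r2_max - r2_long) / (r2_long - r2_short)
    return beta_long - delta * shift


# Hypothetical inputs (not estimates from this paper)
for delta in (0.5, 1.0, 2.0):
    print(delta, round(oster_beta_approx(-0.030, 0.01, -0.026, 0.05, 0.065, delta), 5))
```

If the coefficient barely moves when controls are added (\(\mathring{\beta} \approx \tilde{\beta}\)), the adjustment is small for any \(\delta\), which is consistent with stable results across the three scenarios.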

Data

The panel data cover the years 2006–2010 and result from the billing processes of two German sickness funds. They cover the annual costs per insurance member, including co-insured family members but excluding under-18s. Costs comprise expenditures for hospitalization, doctor visits, drugs, sickness payments, and so-called other costs (Footnote 13). Thus, all relevant areas covered by the SHI are included, except for visits to the dentist. Annual costs (and count variables, e.g., the number of doctor visits) are standardized (averaged) by the number of members of the specific family (Footnote 14) and by the number of days the family was insured with this fund in the respective year. The sample is limited to individuals who were insured for at least 150 days in 2010 and for at least 150 days in total over the years 2006–2008, because observing individuals for only a few days may lead to biased results. Furthermore, participants in 2010 who had not participated in the tariff in 2009 were excluded from the sample (Footnote 15).
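The inclusion criteria can be expressed as a simple filter. The `Member` record and its fields are hypothetical, since the structure of the billing data is not described here. `min_days` defaults to the 150 days of the main specification; applying the same threshold to both day criteria when it is raised to 365 is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    member_id: str
    days_insured: dict[int, int] = field(default_factory=dict)  # year -> days insured
    in_tariff: dict[int, bool] = field(default_factory=dict)    # year -> premium refund participant


def passes_inclusion(m: Member, min_days: int = 150) -> bool:
    """Apply the sample restrictions described in the text."""
    # At least min_days of insurance in the outcome year 2010
    if m.days_insured.get(2010, 0) < min_days:
        return False
    # At least min_days in total over the pre-treatment years 2006-2008
    if sum(m.days_insured.get(year, 0) for year in (2006, 2007, 2008)) < min_days:
        return False
    # Drop 2010 participants who had not already participated in 2009 (possible retrospective enrollment)
    if m.in_tariff.get(2010, False) and not m.in_tariff.get(2009, False):
        return False
    return True


members = [
    Member("A", {2006: 365, 2007: 365, 2008: 365, 2010: 365}, {2009: True, 2010: True}),
    Member("B", {2008: 200, 2010: 365}, {2010: True}),  # new participant in 2010
    Member("C", {2008: 100, 2010: 365}),                 # too few pre-treatment days
]
print([m.member_id for m in members if passes_inclusion(m)])  # only "A" remains
```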

Beyond costs, information on the date and the ICD-10-GM diagnosis of any contact with the health care system is available. To identify doctor visits due to a common cold, I use two different measures: the ICD-10-GM codes J00 (acute rhinopharyngitis) and J00–J06 (acute infections of the upper respiratory tract). I identify treatment of the common cold in the data on practitioners and in hospitals’ outpatient departments. The data on practitioners differentiate between GPs and specialists (Footnote 16). For indicator variables (e.g., on diagnoses), the maximum per family is used. Moreover, some socioeconomic information on the insured person is available. Finally, it is known whether the person participated in the bonus program. Table 1 provides an overview of all variables used in this paper and explains what they measure.
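A minimal sketch of how the two common-cold indicators and the family-level maximum might be constructed. It assumes, hypothetically, that each family member’s ICD-10-GM codes from practitioners and hospital outpatient departments are available as strings such as "J06.9", possibly followed by additional markers, and classifies them by their three-character category. Taking the maximum of a 0/1 indicator over family members is the same as checking whether any member has the diagnosis.

```python
NARROW_CODES = {"J00"}                      # acute rhinopharyngitis (common cold)
BROAD_CODES = {f"J0{i}" for i in range(7)}  # J00-J06: acute upper respiratory infections


def icd_category(code: str) -> str:
    """Three-character ICD-10 category, e.g. 'J06.9 G' -> 'J06'."""
    return code.strip().upper()[:3]


def family_cold_flags(family_diagnoses: list[list[str]]) -> tuple[int, int]:
    """Return (narrow, broad) indicators, taking the maximum over all family members."""
    categories = {icd_category(code) for member in family_diagnoses for code in member}
    narrow = int(bool(categories & NARROW_CODES))
    broad = int(bool(categories & BROAD_CODES))
    return narrow, broad


print(family_cold_flags([["J06.9"], ["I10"]]))  # (0, 1): only the broad definition applies
print(family_cold_flags([["J00"], []]))          # (1, 1): J00 counts under both definitions
```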

Table 1 Variable description

All in all, the structure of insurance members in the sample is similar to that of the SHI with respect to gender and age (cf. Table 9 in the “Appendix”). For 2010, the raw sample contains 751,687 insurance members. After applying the inclusion criteria described above, the sample contains 439,143 insurance members, of whom 13,187 participated in the premium refund tariff. Of these, 1492 members received a premium refund (Footnote 17). Once individuals chose the tariff, they often stayed with it for many years: of the 13,187 participants in 2010, 12,120 had already participated in 2008 and 10,072 in 2007.
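The reported counts translate into the following shares; this is simple arithmetic on the figures above, with no additional data.

```python
counts = {
    "raw_members": 751_687,
    "analysis_sample": 439_143,
    "participants_2010": 13_187,
    "refund_recipients": 1_492,
    "participated_2008": 12_120,
    "participated_2007": 10_072,
}

shares = {
    "analysis sample / raw sample": counts["analysis_sample"] / counts["raw_members"],
    "participants / analysis sample": counts["participants_2010"] / counts["analysis_sample"],
    "refund recipients / participants": counts["refund_recipients"] / counts["participants_2010"],
    "participated in 2008 / participants": counts["participated_2008"] / counts["participants_2010"],
    "participated in 2007 / participants": counts["participated_2007"] / counts["participants_2010"],
}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

Rounded, about 58.4% of the raw sample remains after the inclusion criteria, participants make up about 3.0% of the analysis sample, about 11.3% of participants received a premium refund, and about 91.9% and 76.4% of the 2010 participants had already participated in 2008 and 2007, respectively.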

This study analyzes the effect of premium refunds on a variety of outcomes. Table 2 shows mean values and standard deviations and reveals how these outcomes are influenced by the data processing. It is noticeable that individuals that participate in the premium refund tariff have lower medical demand compared to non-participants with respect to nearly all measures. Furthermore, it can be seen that trimming the data (column 2 vs. column 1) primarily affects the treatment group, while matching (column 3 vs. column 2) mainly has an influence on the control group.

Table 2 Descriptive statistics: mean values of outcomes in 2010

Results

Matching quality

After trimming the data, participants in the tariff still differ in some dimensions from unmatched non-participants (cf. Table 3). This is reflected in the standardized bias, which lies far above 5% for most of the variables. Both groups are of nearly the same age and have a similar probability of participating in the bonus program. The distributions of insurance status and education are similar between the two groups, as are all probabilities of medical utilization (e.g., the probability of visiting a doctor due to a common cold). However, on average, the share of men is higher in the treatment group. Non-participants, on average, cause higher costs, and this holds for all kinds of costs. Moreover, non-participants make use of the health care system more often than participants (e.g., they have more drug prescriptions).

Table 3 Descriptive statistics: mean values of covariates

After the matching procedure, the average values of all covariates have converged between the treatment and the matched control group (cf. Table 3). The standardized bias is below 5% for all variables used for matching. Thus, the matching procedure is successful [36]. Common support is ensured by the trimming procedures (Footnote 18).
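For reference, the standardized bias of a covariate is commonly computed as 100 times the difference in means between treated and (matched) controls, divided by the square root of the average of the two group variances. The sketch below follows that common convention in pure Python; whether the paper uses exactly this variant (a weighted control mean with unweighted variances) is an assumption, and the data are hypothetical.

```python
from statistics import fmean, variance


def standardized_bias(x_treated, x_control, control_weights=None) -> float:
    """Standardized bias in percent; the control mean is optionally weighted by matching weights."""
    mean_t = fmean(x_treated)
    if control_weights is None:
        mean_c = fmean(x_control)
    else:
        mean_c = sum(w * x for w, x in zip(control_weights, x_control)) / sum(control_weights)
    # Variances taken from the unweighted samples (one common convention)
    pooled_sd = ((variance(x_treated) + variance(x_control)) / 2) ** 0.5
    return 100.0 * (mean_t - mean_c) / pooled_sd


# Hypothetical binary covariate (e.g., male = 1)
treated_x = [1, 0, 1, 1]
control_x = [0, 0, 1, 0]
before = standardized_bias(treated_x, control_x)
after = standardized_bias(treated_x, control_x, control_weights=[0.5, 0.5, 3.0, 0.0])
print(before, after, abs(after) < 5)  # 5% is the balance threshold used in the text
```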

Estimation results

Estimation results are presented in Table 4 (Footnote 19). Participation is associated with a significant reduction of 2.6 percentage points in the probability of visiting a GP. In contrast, the effect on the probability of visiting a specialist is smaller and only marginally significant. These findings are in line with theory, since the decision to make a first visit lies more with the patient for GPs than for specialists. Likewise, the number of visits to the GP is significantly reduced by 0.3 visits (− 7.4%), while the reduction in visits to a specialist is smaller (− 0.2 visits or − 3.5%). Moreover, participants have a 0.7 or 2.1 percentage point lower probability of visiting a doctor due to a common cold, depending on the definition of the ailment. This is a further indication that ex-post moral hazard behavior has been reduced: as expected, individuals avoid visiting a doctor for trivial ailments such as a common cold. With relative reductions of 8% and 35% (depending on the definition), this decrease is substantial.

Table 4 Estimation results (main specification)

Furthermore, I find a significant reduction of 7 euros in medical expenditures for visits to the GP, while there is no significant reduction in expenditures for specialists. Although 7 euros may not seem like much, it corresponds to a substantial decrease of 7%.
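Dividing each absolute effect by the corresponding relative effect recovers the approximate outcome level on which the relative figures are based (presumably the mean of the comparison group). Because the reported effects are rounded, the implied values are rough approximations and not figures reported in this paper.

```python
def implied_baseline(absolute_effect: float, relative_effect: float) -> float:
    """Outcome level for which absolute_effect corresponds to relative_effect."""
    return absolute_effect / relative_effect


effects = {
    "GP visits": (-0.3, -0.074),
    "specialist visits": (-0.2, -0.035),
    "GP expenditures (euros)": (-7.0, -0.07),
}
for outcome, (absolute, relative) in effects.items():
    print(f"{outcome}: about {implied_baseline(absolute, relative):.1f}")
```

The implied levels (roughly 4 GP visits, 6 specialist visits, and 100 euros of GP expenditures) help put the absolute effects into perspective.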

Sensitivity analysis

I test whether the results are stable by carrying out numerous robustness checks. First, to find support for the CIA, I run regressions for the pseudo outcome (cf. Table 5). As expected, there is no significant effect of participating in the premium refund tariff in 2010 on the probability of visiting a hospital in the same year, and the coefficient is close to zero. Since the CIA cannot be directly tested, this is not a proof, but it supports the assumption. It suggests that the treated observations do not differ from the controls in the sense that the distribution of Y(0) for the treated units is comparable to the distribution of Y(0) for the controls.

Table 5 Robustness checks I

Next, I vary the trimming procedure, since the choice of cutoff involves some discretion. For most outcomes, this does not lead to considerable differences (cf. Table 6, columns 1 and 2), and results are qualitatively robust to the exact trimming cutoff, although they tend to become slightly smaller. Notably, the effect on the probability of visiting a specialist becomes insignificant, so an effect on this outcome should not be assumed. Furthermore, I vary the bandwidth of the kernel matching. As an example, results are shown for a bandwidth of 0.01 (cf. column 3); they are essentially the same as those in the main specification.

Table 6 Robustness checks II

I also try other propensity score matching estimators. For nearest neighbor matching (1:30, cf. column 4), results are qualitatively similar to those in the main specification, only slightly smaller. For radius matching combined with regression, results are virtually the same as in the main specification (cf. column 5). Moreover, I raise the minimum number of days an individual must be observed in the data from 150 to 365 (cf. column 6), which does not affect the results. Subsequently, instead of pooling the years 2006–2008 to create the lagged covariates and leaving out 2009, I match treated and controls in the years 2006–2009 separately (cf. column 7). The results are qualitatively the same as in the main specification, if slightly smaller. Finally, I refrain from matching and trimming the data and use OLS instead. Matching the two groups and trimming the data ensure common support and make the groups more similar; this check assesses how much these steps affect the estimates. The OLS results (cf. column 8) are smaller in magnitude but qualitatively similar to the main specification. All in all, the results are stable across this variety of robustness checks.

Finally, to get a better idea of how strong the omitted variable bias still is, I apply Oster’s method [19] as described above. Columns 1 and 2 in Table 7 show that, for the trimmed and matched data, adding control variables to the regression makes little difference. Columns 3–5 make different assumptions about the degree of selection on unobservables relative to selection on observables. Whether selection on unobservables is smaller than (column 3), equal to (column 4), or larger than (column 5) selection on observables, the results are very stable. This is another reassuring result, indicating that selection on unobservables is not strong in this application.

Table 7 Robustness checks III

Effect heterogeneity

Furthermore, I analyze how the effects are composed, i.e., whether subgroups are affected differently. I differentiate individuals by gender and age group. According to Table 8 (column 1), the subgroup of men reacts more strongly to the tariff’s incentives than the whole sample. For women (column 2), I do not find a significant negative effect on the probability of visiting a GP, and some effects found in the overall sample become insignificant. Thus, men appear to react more strongly to the premium refund tariff’s incentives than women.

Table 8 Estimation results by subgroups

Although I analyze three age groups, the results suggest that they could be condensed into two. Individuals aged 34 and younger (column 3; see Footnote 20) and those aged 35–49 (column 4) react very similarly. Their results are qualitatively the same as in the whole sample, but the effects are slightly stronger. In contrast, individuals aged 50 and older (column 5; see Footnote 21) do not react to participation in the tariff at all: all coefficients are insignificant and mostly close to zero.

Discussion

This paper examines whether premium refunds are a suitable instrument to reduce ex-post moral hazard in the health insurance market. I use panel data covering the years 2006–2010, drawn from the billing records of two German sickness funds. By combining propensity score matching and regression, I analyze the effect of participating in the premium refund tariff in 2010 on several health measures in the same year.

I find that participating in the premium refund tariff is associated with a significant reduction in the probability of visiting a GP (−2.6 percentage points). This contrasts with Felder and Werblow [12] but is in line with Farbmacher and Winter [9] and Health Policy Brief [6], although they report larger reductions. This is not unexpected: since potential selection on moral hazard [31] is not accounted for in the present study, the estimated effects represent a lower bound [32], and the true effects may be larger. Like Farbmacher and Winter [9], I also find that effects are larger for younger than for older individuals and that men are more strongly affected than women. The reduction in the number of doctor visits in this study is of a similar magnitude to that in Farbmacher and Winter [9], and the effect goes in the same direction as in Chandra et al. [7]. In addition, I find that the probability of visiting a doctor because of a common cold decreases by 0.7 (or 2.1) percentage points. Both findings can be interpreted as evidence of reduced ex-post moral hazard. Apparently, the premium refund is large enough to encourage individuals to forgo unnecessary doctor visits.

Effects differ among subgroups. They are mainly driven by individuals aged 49 and under, and men react more strongly than women. By contrast, individuals aged 50 and over do not react to the tariff’s incentives at all. One reason why women react less to these incentives might be that they are, in general, more risk-averse than men [37]. Many women may therefore prefer a doctor’s opinion even in fairly harmless situations. Individuals aged 50 and over suffer, on average, from more severe illnesses than younger individuals, and demand for treating such illnesses is less price-elastic. This may explain why they generally do not react to the premium refund’s incentives, and it is in line with previous research. Schmitz [2] finds that individuals with high medical demand in the past (presumably ill individuals) do not react to the expansion of insurance coverage. Likewise, Gerfin et al. [16] observe that healthy individuals react much more strongly to incentives.

Even though I use lagged outcomes as proxy variables for unobserved heterogeneity, one possible weakness of this study is that some relevant characteristics (e.g., lifestyle factors) cannot be explicitly controlled for. Another limitation is that the data comprise only individuals from two sickness funds, which may not be fully representative of all sickness funds in Germany.

This study focuses on contemporaneous effects. Further research is needed to identify causal effects more convincingly, to account more fully for the nonlinear nature of this scheme, and to assess the long-term consequences of the tariff’s incentives.