1 Introduction

The causal effect of general health conditions on economic performance is intensely debated in the literature. A priori, health, as reflected in mortality of infants and adults, has an ambiguous effect on economic development as better health may increase the productivity of the workforce but, on the other hand, may also lead to faster population growth that dilutes the per capita effects. The main challenge for the identification of the total effect is the problem of reverse causality, since mortality is likely to be affected by economic development, for example, because rich countries can afford better health systems.

The existing literature has used two different instrumental variable strategies to circumvent the reverse causality problem and identify the causal effect of longevity on economic growth. Most of the literature, including recent work by Lorentzen et al. (2008), has used exogenous variation across countries, such as climatic factors, geographical features, or disease indices, as instruments for differences in life expectancy across countries, and has found positive effects of longevity on growth. A more recent line of research, starting with the contribution of Acemoglu and Johnson (2007), has exploited within-country variation by applying time-varying instruments to identify the causal effect of life expectancy on economic growth, and has found mixed or even negative effects.

This paper aims to contribute to this literature by estimating the causal effect of health improvements, as proxied by mortality reductions, on economic growth, using a novel identification strategy that exploits within-country variation in a long panel of 12 European countries over the period from 1820 to 2010. In particular, we apply an instrumental variable approach that exploits variation in the introduction dates of universal public health care. Universal public health care systems, in the form of public health insurance or public subsidies for health services, imply (potential) coverage of the entire population. In the sample, we adopt a rather broad concept of universal public health care systems, reflected by the introduction of access to health care for all people in need of health care, independent of their individual income. The novel feature of the identification strategy is its reliance on the particular timing of the implementation in each country, rather than on the implementation per se. While the implementation of public health care might be related to the level of economic development, the particular year in which the implementation takes place is largely random, since the implementation is typically the outcome of a lengthy political process with substantial uncertainty. In light of this fact, we apply a timing-of-events methodology (Abbring and Van den Berg 2003), in which identification is driven by within-country variation in mortality around the period of implementation, which can be used to identify causal effects on economic development.

The empirical results indicate that the introduction of public health systems led to a significant improvement in overall health conditions as measured by reductions in infant mortality and crude death rates. The second stage estimates provide evidence for a significant positive effect of overall health conditions on economic growth as well as on population growth.

These findings complement and qualify the existing estimates in the literature in several dimensions. Using cross-country data from the World Health Organization (WHO), the United Nations (UN) Population Division, and the World Bank, the empirical literature typically finds that an increase in adult mortality substantially reduces GDP per capita growth. Lorentzen et al. (2008), for example, find that an increase in adult mortality of one standard deviation reduces growth by 1.1 percentage points, mainly through the physical capital and fertility channels. This line of research cannot account for unobserved heterogeneity across countries or exploit health dynamics, since the instruments are constant over time. Noting this, Acemoglu and Johnson (2007) use panel data for 47 countries from the League of Nations, the WHO, and the UN, and exploit the drop in mortality from specific infectious diseases due to the international epidemiological transition as an instrument for the change in life expectancy. This identification makes use of the fact that the mortality rate from these diseases was exogenous in 1940, because no treatments, medication, or vaccines were available before that time. By 1980, on the other hand, all these diseases could be treated or prevented in all countries due to medical advances and international organizations such as the WHO. The findings suggest a positive but insignificant effect of life expectancy on aggregate GDP and a positive significant effect on population growth. The total effect on GDP per capita is negative.

This finding led to controversial discussions about the identifying assumptions that drive the results. Bloom et al. (2009) argue that mortality from the specific diseases in the instrument by Acemoglu and Johnson was not exogenous in 1940: countries like the USA had reduced their disease burden, e.g., from malaria, before 1940, and mortality from infectious diseases in the USA had peaked around 1900, as shown by Cutler et al. (2006). Acemoglu and Johnson (2009) clarify their identifying assumptions, leaving open whether the different results in the literature are driven by the different identification assumptions or by the different sample compositions in terms of countries and observation period, as suggested by Angeles (2010) and Cervellati and Sunde (2011).

The findings presented in this paper contribute to this debate in three ways. First, they indicate that the positive effect of life expectancy on growth found in cross-country studies is not necessarily due to the use of time-invariant instruments. Second, the findings support the evidence of Cervellati and Sunde (2011) that the different results in the literature might be driven by differences in sample composition. Third, the findings suggest that improving health conditions might have a substantial effect on economic performance and that public health policy and the institutional environment might play an important role for economic development.

The remainder of the paper is structured as follows. Section 2 describes the data and presents and discusses the identification strategy. The main results are presented in Section 3, and results from additional robustness checks are discussed in Section 4. Section 5 concludes.

2 Data and identification

2.1 Data

We use data for 12 Western European countries over the period from 1820 until 2010. The data on outcomes in terms of economic development include information on GDP per capita, population size, and GDP, all collected from Maddison (2006). These data are available on a yearly basis and go back to 1820. For lack of alternative data on health conditions over the long observation period considered here, we follow the literature and use infant mortality and crude death rates as measures for health conditions. Data on the infant mortality rate and the crude death rate are taken from Flora et al. (1987), who provide information for 13 Western European countries from 1815 until 1975.Footnote 1 We do not consider Ireland and Germany because of too many missing observations and major territorial changes.Footnote 2 Additional data for Spain are collected from Mitchell (1992). In case of missing observations, we use data from Mitchell (1992) until 1988 and, after that, we use the mortality rates from the OECD Health Data (updated June 20, 2010). The infant mortality rate is defined as deaths under the age of one per 1,000 live births, i.e., stillbirths are excluded. The crude death rate is the number of deaths per 1,000 persons. As additional controls, we use variables that approximate the political institutions in terms of democratization of a country, reflected by a dummy that takes a value of 1 after the first observation of election rules in a specific country, as well as a variable that reflects the age of these rules. Dates for the first year of election rules are collected from Persson and Tabellini (2003). Additionally, we use the political regime characteristics and transitions (1800–2009) from the Polity IV project. From this index, we create indicators for autocracies, anocracies, and democracies, as suggested by Gurr (1974). Information about compulsory schooling laws is taken from Fort et al. (2011), Garrouste (2010), and Kunnskapsdepartementet (2007).
In order to account for structural change, we condition on the shares of agriculture, industry (manufacturing), and services in total GDP from Flora (1983).Footnote 3 Moreover, we collected government expenditure as a share of total GDP, the number of labor disputes (per million workers), the number of workers involved in labor disputes (per million workers), the number of days lost in labor disputes (per million workers), and gross capital formation as a share of total GDP from Mitchell (1992). Missing data for government expenditure and gross capital formation are completed with data from the OECD Annual National Accounts (Volume 2), and additional data about labor disputes are collected from the ILO Department of Statistics.

Using this information, we create an unbalanced data set with a 20-year frequency from 1850 until 2008 as described in the next section.Footnote 4 In the baseline specification, we have 84 observations. Descriptive statistics for the variables used in the estimation can be found in Table 1. We use discrete growth rates in our estimation, because the mortality growth rates are typically too large to be approximated by log differences.Footnote 5 The infant mortality growth is between −75 and 37 %. On average the growth rate is −36 %. The growth in the crude death rate exhibits less variation. It is on average −11 % and lies between −50 and 33 %. GDP per capita growth is on average 49 % over a 20-year period, which implies an annual growth rate of about 2 %.Footnote 6 The population growth is on average 14 %, and the average aggregate GDP growth amounts to 70 % during 20 years.
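The arithmetic behind these statements can be checked with a short sketch. The function names and the mortality levels of 200 and 50 are hypothetical and chosen only to reproduce a −75 % change; they are not taken from the data. The sketch illustrates why a log difference is a poor approximation for changes of this size, and how 49 % growth over 20 years translates into an annual rate of about 2 %.

```python
import math


def discrete_growth(start, end):
    """Discrete growth rate between two levels, e.g. 0.49 for +49 %."""
    return (end - start) / start


def log_growth(start, end):
    """Log-difference approximation of the same growth rate."""
    return math.log(end / start)


def annualized(growth, years):
    """Constant annual rate that compounds to `growth` over `years` years."""
    return (1.0 + growth) ** (1.0 / years) - 1.0


# Hypothetical infant mortality levels that give a -75 % change over one period.
print(discrete_growth(200.0, 50.0))  # -0.75
print(log_growth(200.0, 50.0))       # about -1.386, far from -0.75

# 49 % GDP per capita growth over 20 years, as reported in the text.
print(annualized(0.49, 20))          # about 0.0201, i.e. roughly 2 % per year
```

For small changes the two measures nearly coincide, which is why the discrepancy only matters for variables such as mortality that move by large percentages within a 20-year period.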

Table 1 Summary statistics

2.2 Identification strategy

The aim of this study is the estimation of the causal effect of health improvements on economic growth. The identification strategy uses the exact date of the implementation of a universal public health system in the 12 countries as exogenous variation. The respective implementation dates are reported in Table 2. Detailed information about the history of universal public health care systems can be found, for each country separately, in Online Appendix A. This identification strategy exploits the effect of the implementation of a public health system on within-country variation in health in order to identify the effect of variation in health conditions on economic growth. Of course, the implementation and existence of a universal public health care system is influenced by the initial economic situation. Poor countries might not be able to afford the costs of a public health system, while rich countries can afford them. However, we argue that the timing of the introduction is exogenous and driven by many complex political processes that are unrelated to current economic performance. Due to the small sample size, it is possible to identify the driving political forces behind the implementation of a public health system and to investigate the plausibility of the identification assumption of the exogeneity of the implementation date in detail.

Table 2 Introduction dates of a universal public health care system

As mentioned in the introduction, we use a broad concept of universal public health systems. According to Mackenbach (1996), the introduction of universal public health systems improves life expectancy especially for children and less endowed individuals who, in contrast to rich individuals, cannot afford the contributions to a private insurer. Upon their introduction, many universal public health systems indeed only cover a small fraction of the population, often just children and very needy people. This implies that the introduction of a public health system can be expected to affect overall health in different ways. To capture this, we use infant mortality as well as the crude death rates as different proxy measures of longevity and health. We expect that the introduction has a stronger impact on infant mortality.

A first indication of exogeneity of the implementation date is the randomness of the implementation process. For example, in the Netherlands, several attempts to introduce a universal public health care system failed. Finally, the first universal public health insurance system was introduced under the German occupation in 1941. Other countries which were also under German occupation did not introduce a universal public health care system during World War II. In Spain, the Franco regime introduced the first universal public health care system in 1942. Earlier attempts to introduce a public health care system failed because there was no majority in parliament. We therefore think that the implementation dates are driven by events which have a high degree of uncertainty and are difficult to predict. In particular, we think that there are good reasons to assume that the implementation dates are exogenously determined and not driven by economic growth.Footnote 7

A second indication for the validity of this identification strategy is evidence suggesting that the introduction of a universal public health care system took many years and that the timing was heavily influenced by the political regime. In an autocratic regime, like in Italy at the time when the health system was implemented, the government could decide about a public system by itself. In a democracy, a majority in the parliament or in a referendum is required. This typically takes much longer. For example, in Switzerland, the first referendum was rejected in 1899. Only 12 years later was the health insurance law finally passed by another referendum. Evidence strongly suggests that the introduction dates depend on the type of government. Lindert (2004) distinguishes between elite democracies and full democracies.Footnote 8 He argues that elite democracies are less willing than full democracies to set up government-financed social programs. In monarchies, like Austria, social insurance systems were introduced relatively early in order to reduce the power of the socialists.

A third factor is the type of health care provider in place before the introduction of a universal public health system. Private insurers tend to oppose a public insurance scheme. Especially in the Netherlands, they prevented the introduction of an early public health system for a long time. In other countries like Denmark or Finland, corporate groups or municipalities were responsible for early health care. They typically supported the introduction of a universal public health system, speeding up the implementation in an international comparison.

A fourth factor is the type of public insurance that is introduced. We do not distinguish the introduction of a health system that covers the entire population from the beginning, from the introduction of subsidies for health services only for a specific subpopulation of needy people.Footnote 9 When an adequate insurance scheme is already in place before public health care, the government can simply pay subsidies to these institutions, as was the case in Belgium and Denmark. The payment of subsidies typically covers only very needy people and is therefore less expensive. Laws for such systems can pass the parliament more easily than a more comprehensive health law like in the UK, where the public health insurance scheme covers the entire population, from the first day of introduction. Not distinguishing among the different types of public health care systems might constitute a problem because it potentially weakens the link to health outcomes. On the other hand, the consideration of different types of public health care implies a large degree of randomness in the timing of the implementation.

Thus, the introduction dates of a universal public health care system are influenced by factors that can either be controlled for or that are independent of the economic situation, so that the dates of implementation can be assumed to be plausibly exogenous for the purpose of this paper. Note that this does not imply that the fact that a public health care system is eventually implemented is independent of the economic situation. The exclusion restriction (that the implementation of a public health system affects economic growth in the intermediate aftermath of the implementation only through effects on public health conditions) appears plausible, in particular in light of the fact that we consider growth rates rather than levels in our outcome specifications. If anything, the identifying assumption is conservative, since less developed countries can also have high income or population growth rates (e.g., during a convergence process or the demographic transition). It is therefore unlikely that the growth rate has an influence on the introduction date of a public health care system, especially when one takes into account the fact that such an introduction takes many years. Moreover, since underdeveloped and less developed countries might simply not be able to afford the costs of a public health care system, we consider in our sample only relatively developed countries.Footnote 10 We argue that during the entire observation period, each of the countries in our sample could, in principle, afford the costs of a public health system.

A potential concern with the identification could be that the results are driven by global trends or events. We argue that this is not the case, since our instrument has variation both in the time and in the cross-country dimension. Accordingly, important events that affect global economic development are captured by the countries that are in the control group during that specific time period. This reasoning becomes apparent from the composition of the sample (see Fig. 1 and Table 3). As an example, consider the post-World War II boom years. There are 12 countries in the sample, for five of which (Finland, the Netherlands, Spain, Sweden, and the UK) the 20 years following the passage of health care legislation cover primarily the post-World War II boom years. For another three countries, the 20-year post-legislation period occurred during the Second Industrial Revolution in the late nineteenth and early twentieth century. Four countries experienced the Great Depression during the first 20 years after the introduction of universal public health care. This implies that we observe seven countries in the post-World War II period that did not implement a new universal public health scheme and that therefore serve as a control group. The identifying assumptions are satisfied as long as the countries that implemented a new insurance scheme after the end of World War II would, had they not implemented the scheme, have experienced on average a similar economic development as the average country in the control group during the same period. This appears reasonable given that we do not observe that public health systems were introduced by the allied forces systematically after the war.Footnote 11 Moreover, we will present robustness checks that carefully control for three different types of deterministic time dependence.

Fig. 1

Timing of events: data structure. The figure illustrates the data structure. For each country we code the calendar time τ and the synthetic time index t, which is constructed relative to the country-specific introduction date of the universal public health care system. For each country, the universal public health care system is introduced in t = −20, which can be associated with a different calendar year τ for each country (see Table 2). Starting with the introduction date, we construct a sample that contains only the years in a 20-year data frequency before and after the introduction. These years are different for each country, as becomes apparent from the figure. Vertical lines indicate that the constructed data set contains an observation for the specific country and year. (Notice that for each country one additional lag is observed, which gets lost when growth rates are calculated.) The instrument is equal to one when t = 0, i.e., in the first period after the introduction of the public health system (indicated by a dot); in all other periods the instrument is equal to zero. Time intervals indicated with a box illustrate this period, which begins with the year of introduction (t = −20) and ends in t = 0. Boxes are distributed over the calendar time 1880 to 1980, suggesting that countries did not systematically introduce universal health care at a specific point in time.

Table 3 Coding of synthetic time index

2.3 Estimation strategy

In order to exploit the exact introduction dates, we construct the panel data as follows. In addition to calendar time τ, we create a synthetic time variable t, which is normalized to be equal to zero in the 20-year period after the introduction of a universal public health care system. We then calculate the growth rates for each country and 20-year time period on the basis of this synthetic time frame. Figure 1 and Table 3 illustrate that the exact years for which the growth rates are calculated are different for each country.

The universal public health care system is introduced in t = −20, which can be associated with a different calendar year τ for each country (see Table 2). Starting with the introduction date, we construct a sample that contains only the years in a 20-year data frequency before and after the introduction. For example, in Switzerland, the public health system was implemented in 1911. The synthetic time variable corresponds to −40 in 1891, −20 in 1911, 0 in 1931, and so on. We compute the growth rates from 1891 to 1911, from 1911 to 1931, etc. In the Netherlands, where the public health system was implemented in 1941, the synthetic time variable corresponds to −40 in 1921, −20 in 1941, 0 in 1961, and so on. We calculate the growth rates from 1921 to 1941, from 1941 to 1961, etc. As a consequence, the exact years over which the growth rates are computed differ across countries. This procedure has two advantages. First, it measures growth rates over the period that starts exactly in the year of introduction of the public system. Second, it helps to disentangle common time factors, since we do not calculate the growth rates for exactly the same years, even though there is some overlap.

In the next step, we construct an instrument for the introduction of a universal public health care system, which is equal to 1 at time t = 0 of the synthetic time frame, and 0 otherwise (see Table 3). This means that the instrument is only equal to 1 immediately after the introduction of the public health care system, as indicated by the dots in Fig. 1. This is a precise way to identify the timing of the introduction, and hence its effect on health. This construction is conservative, as long-run effects are not captured by this strategy. Likewise, this identification strategy prevents the results from picking up long-run time trends or shocks that are unrelated to the introduction of public health insurance. Overall, exploiting a one-off variation in public health systems and constructing the data around the implementation date suggests that the exclusion restriction, namely that the implementation of a health system affects growth only through improvements in health (conditional on other control variables), is plausibly satisfied.
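The coding of the synthetic time index and the instrument can be illustrated with a minimal sketch. The function names are hypothetical, and the bounds 1820 and 2010 are used only for illustration, so the resulting rows need not match the actual estimation sample. The introduction years for Switzerland (1911) and the Netherlands (1941) are those given in the text.

```python
def synthetic_panel(intro_year, first_year=1820, last_year=2010, step=20):
    """Return (calendar_year, synthetic_t, instrument) tuples for one country.

    The introduction year is coded t = -step, so t = 0 is the observation
    one period after the introduction; the instrument equals 1 only there.
    first_year and last_year are illustrative bounds, not the exact sample.
    """
    # Step back from the introduction year to the earliest year within bounds.
    start = intro_year
    while start - step >= first_year:
        start -= step

    rows = []
    for year in range(start, last_year + 1, step):
        t = year - intro_year - step
        rows.append((year, t, 1 if t == 0 else 0))
    return rows


def growth_windows(rows):
    """Pair each observation with its predecessor: growth is measured over (prev_year, year)."""
    return [(prev[0], cur[0], cur[1], cur[2]) for prev, cur in zip(rows, rows[1:])]


# Introduction years as stated in the text.
for country, intro in [("Switzerland", 1911), ("Netherlands", 1941)]:
    for start, end, t, z in growth_windows(synthetic_panel(intro)):
        print(country, f"{start}-{end}", "t =", t, "z =", z)
```

Pairing consecutive observations drops the earliest row of each country, which corresponds to the additional lag that is lost when growth rates are calculated.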

The use of a 20-year frequency, despite data being available on a yearly basis, accounts for the fact that health is expected to have a long-run, rather than an immediate, effect on economic development. Our data frequency is even high in comparison to Acemoglu and Johnson (2007), who use 40- and 60-year frequencies, respectively. Indeed, the choice of the appropriate data frequency poses a trade-off. On the one hand, with too low a frequency, the effect of the introduction of universal public health care on health would be less precisely identified and the number of observations would become exceedingly small. On the other hand, the data frequency should not be too high because the change in health needs some time to affect economic development through the different channels. Moreover, using a very high data frequency would lead to other fundamental concerns, such as autocorrelation patterns. The use of a 20-year frequency is also in line with the findings of Angeles (2010), which suggest that it takes 20 years for fertility, and hence population dynamics, to respond to changes in mortality. For robustness, we also assess the sensitivity of our results to different data frequencies.

We are interested in estimating the causal effect of changes in mortality, measured by infant mortality or crude death growth rates, on GDP per capita growth, population growth, and aggregate GDP growth, respectively. While all variables are observed in calendar time τ, they are coded in terms of the synthetic time index t. Consequently, all variables are indexed by the subscript i indicating the country, subscript τ indicating the year of observation, and t indicating the 20-year period in terms of the synthetic time index. Since the empirical analysis is conducted on the basis of a 20-year frequency, τ and t essentially contain the same information conditional on i (see Table 3), such that we can limit the notation to the synthetic time index t. The respective outcome variables (GDP per capita growth, population growth, and aggregate GDP growth) are denoted by \(\Delta y_{i,t}\) for \(i = 1, \ldots, N\) and \(t = 1, \ldots, T\). The variable \(\Delta m_{i,t}\) corresponds to the change in mortality, either in terms of the infant mortality or the crude death rate. Additional lagged control variables are denoted by the vectors \(x_{1,i,t-1}\) and \(x_{2,i,t-1}\). Let \(s_{i,t}\) be a selection indicator, where \(s_{i,t} = 1\) if \(\{\Delta y_{i,t}, \Delta m_{i,t}, x_{1,i,t-1}, x_{2,i,t-1}\}\) are jointly observed in a particular country and observation period, and zero otherwise.Footnote 12 Then \(\Delta y\) and \(\Delta m\) are two \(1 \times S\) vectors, where the number of observations is denoted by \(S\), with \(S= \sum_{i=1}^{N} \sum_{t=1}^{T} s_{i,t}\). \(X_1\) and \(X_2\) are two \(k_j \times S\) matrices (with \(j \in \{1,2\}\)), involving \(k_j\) lagged control variables, including deterministic time patterns and country-specific intercepts.Footnote 13 The regression model is specified as

$$ \Delta y = \widehat{\alpha} \Delta m + \widehat{\beta}_1' X_{1} + \widehat{\beta}_2' X_{2} + \widehat{u}, $$

where the residual vector \(\widehat{u}\) has an expected value \(E[\widehat{u}]=0\). The parameter \(\widehat{\beta}_j\) is a \(k_j \times 1\) vector (with \(j \in \{1,2\}\)) and the coefficient \(\widehat{\alpha}\) is a scalar. The estimated parameter of interest is \(\widehat{\alpha}\), which has no causal interpretation in this simple setting in light of the potential problem of reverse causality.

In order to identify causal effects, we use the introduction of a universal public health care system (\(z_{i,t} \equiv \Delta \textrm{Insurance}_{i,t-1}\)) as an instrument for infant mortality growth and crude death growth. The instrument is equal to one in t = 0 and zero in all other periods (see Fig. 1 and Table 3). The identification of causal effects is based on assumptions about the instrument, which hold jointly conditional on the control variables \(X_1\) and \(X_2\). First, we assume that the outcome \(\Delta y\) has no systematic influence on the instrument \(z\). Second, the instrument has only an indirect effect on the outcome variables, i.e., the introduction of universal health services improves the health status \(\Delta m\) and via this channel the expected outcome; alternative channels do not exist. These two assumptions are not testable, and we can only use economic arguments and plausibility tests to justify that they are satisfied. The third assumption, which states that the instrument has sufficient power to influence the endogenous variable, can be tested. In light of modest unconditional effects of the introduction of a universal public health care system, this instrument could potentially have little power. In order to account for weak instrument problems in the identification of the effect of interest, we use interactions between the instrument and the control variables contained in the matrix \(X_1\) as additional instruments (Angrist and Krueger 1991; Chamberlain and Imbens 2004).Footnote 14 This allows for more exogenous variation, since the introduction of a public health care system can have heterogeneous effects with respect to the initial economic situation. The matrix \(X_2\) is not interacted in order to avoid problems with too many instruments providing little additional explanatory power. The first stage regression is given by

$$ \Delta m = \widehat{\gamma}_1 z + \widehat{\gamma}_2' ( X_{1} \cdot \textrm{diag}(z' z) ) + \widehat{\delta}_1' X_{1} + \widehat{\delta}_2' X_{2} + \widehat{v}, $$

where the instrument \(z\) (dimension \(1 \times S\)) is equal to one when a public health insurance is introduced in \(t-1\) and zero otherwise. The estimated residual vector is denoted by \(\widehat{v}\), with \(E[\widehat{v}]=0\). The parameter \(\widehat{\gamma}_2\) is a \(k_1 \times 1\) vector, the parameter \(\widehat{\delta}_j\) is a \(k_j \times 1\) vector (with \(j \in \{1,2\}\)), and the coefficient \(\widehat{\gamma}_1\) is a scalar.

In the second stage, we use the predicted mortality growth (\(\widehat{\Delta m}\)) as regressor,

$$ \Delta y = \widetilde{\alpha} \widehat{\Delta m} + \widetilde{\beta}_1' X_{1} + \widetilde{\beta}_2' X_{2} + \widetilde{\varepsilon}, $$

where the residual vector is represented by \(\widetilde{\varepsilon}\), with \(E[\widetilde{\varepsilon}]=0\). The parameter \(\widetilde{\beta}_j\) is a \(k_j \times 1\) vector (with \(j \in \{1,2\}\)) and the coefficient \(\widetilde{\alpha}\) is a scalar. The coefficient \(\widetilde{\alpha}\) represents the estimated causal effect of interest. This setup makes implicit functional form assumptions in terms of assuming linearity. Even though we do not think this parametric assumption reflects the reality exactly, we think it is a good approximation. Moreover, this specification corresponds to the canonical empirical growth model as used by, e.g., Barro (1991), Durlauf et al. (2005), and Acemoglu and Johnson (2007).
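The logic of the two stages can be illustrated with a deliberately simplified sketch on simulated data. It is not the specification estimated in this paper: it uses a single binary instrument, no control variables \(X_1\) and \(X_2\), and no interaction instruments, and all parameter values (a true effect of −0.5, the noise scales, the sample size) are assumptions made for illustration. An unobserved confounder stands in for the endogeneity problem, so that the simple regression is biased while the two-stage estimate is not.

```python
import random


def ols(x, y):
    """Slope and intercept of a simple least squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_stage(y, m, z):
    """First stage: regress m on z. Second stage: regress y on the fitted values."""
    gamma, const = ols(z, m)
    m_hat = [const + gamma * zi for zi in z]
    alpha, _ = ols(m_hat, y)
    return alpha


def simulate(n=20000, alpha=-0.5, seed=0):
    """Simulated data in which an unobserved factor u drives both m and y."""
    rng = random.Random(seed)
    y, m, z = [], [], []
    for _ in range(n):
        zi = 1.0 if rng.random() < 0.5 else 0.0   # binary instrument
        u = rng.gauss(0.0, 1.0)                   # unobserved confounder
        mi = -1.0 * zi + u + rng.gauss(0.0, 0.5)  # instrument lowers mortality growth
        yi = alpha * mi + 2.0 * u + rng.gauss(0.0, 0.5)
        y.append(yi)
        m.append(mi)
        z.append(zi)
    return y, m, z


y, m, z = simulate()
print("simple regression:", ols(m, y)[0])   # biased upward by the confounder
print("two-stage estimate:", two_stage(y, m, z))  # should lie near the assumed -0.5
```

With one instrument and no controls, the second-stage slope equals the ratio of the covariance of the instrument with the outcome to its covariance with the endogenous regressor. The standard errors from a manually computed second stage are not valid, so applied work would rely on a dedicated instrumental variable routine.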

2.4 Preliminary analysis

In Table 4, we show descriptive statistics for periods when the public health systems are unchanged and for periods when a universal public health care system is introduced, separately. The lagged level of GDP per capita, which is an indicator for the initial economic situation, is lower on average in periods when the public health system has changed. Accordingly, we do not find a systematic positive correlation between the introduction of public health systems and the level of economic development. This is also confirmed when considering the economic and demographic growth rates. Only population growth is on average slightly higher in periods when a public health care system is introduced. We do not find evidence that public health care systems are introduced systematically in periods of fast economic or demographic growth. This supports the argument that the exact timing of the introduction of a universal public health care system is determined exogenously and not driven systematically by initial economic or demographic growth rates.

Table 4 Summary statistics separated for periods when public health care system is implemented

As a plausibility test, we regress a dummy for the present introduction of a universal public health care system (\(\Delta \textrm{Insurance}_{i,t}\)) on GDP per capita growth (\(\Delta \textrm{gdpc}_{i,t}\)) and its components, population growth (\(\Delta \textrm{pop}_{i,t}\)) and GDP growth (\(\Delta \textrm{gdp}_{i,t}\)), respectively.Footnote 15 In order to account for deviations from the long-run equilibrium, we condition on log GDP per capita (\(\textrm{lgdpc}_{i,t}\)) and log population size (\(\textrm{lpop}_{i,t}\)), following the suggestions of Durlauf et al. (2005). The results in Table 5 show insignificant coefficients for the economic growth rates. Only for population growth do we find a weak influence on the instrument, and this influence disappears as soon as we condition on the initial economic situation. All other growth rates have an insignificant influence and rather high standard errors. Accordingly, the economic and demographic growth rates have no predictive power for the introduction of a universal public health care system. As expected, the initial GDP per capita has some predictive power for the introduction dates. In the final specifications, we do not use the present but the lagged introduction of a universal public health care system as the instrument (\(z_{i,t} = \Delta \textrm{Insurance}_{i,t-1}\)). Therefore, we are confident that our identification strategy is not prone to reverse causality.

Table 5 Logit regression of the dummy for the introduction of a universal public health care system on GDP per capita growth, population growth, GDP growth, and control variables in levels

3 Empirical findings

As a first step of the analysis, we present the correlations between economic and demographic development and mortality growth that are obtained from ordinary least squares (OLS) regressions. Table 6 presents results from regressions of the outcomes (GDP per capita growth, population growth, and aggregate GDP growth) on changes in infant mortality as a measure of health improvements. As suggested by Durlauf et al. (2005), all specifications condition on lagged log GDP per capita (\(\textrm{lgdpc}_{i,t-1}\)) and lagged log population size (\(\textrm{lpop}_{i,t-1}\)). This specification allows us to account for deviations from long-run equilibrium. Moreover, we condition on a lagged dummy for countries that have completed their demographic transition in the respective period (\(\textrm{PostTrans}_{i,t-1}\)), based on evidence by Cervellati and Sunde (2011) that the level of demographic development is an important factor for the effect of life expectancy on growth.Footnote 16

Table 6 OLS regression of GDP per capita growth, population growth, and GDP growth on infant mortality growth and other controls

We estimate six different specifications. In the first specification, we include a linear year trend. In the second specification, we allow for country-specific linear year trends. In the third specification, we include period fixed effects.Footnote 17 The other three specifications include country-specific intercepts (country fixed effects) in addition to a linear time trend, to country-specific time trends, or to period dummies, respectively.Footnote 18 It is worth noting that all specifications, particularly specification (6), are considerably more flexible than the specifications estimated in the existing literature, which has typically relied on cross-country variation or within-country variation over two periods and which therefore could not identify country- and time-specific growth patterns. We find that a one-percentage-point increase in infant mortality growth (\(\Delta \textrm{imr}_{i,t}\)) reduces GDP per capita growth (\(\Delta \textrm{gdpc}_{i,t}\)) by about 0.6 percentage points, has no significant effect on population growth (\(\Delta \textrm{pop}_{i,t}\)), and reduces GDP growth (\(\Delta \textrm{gdp}_{i,t}\)) by about 0.8 percentage points. To investigate to what extent these results are driven by infant mortality as a particular measure of health, we conduct the same analysis using changes in crude death rates as an alternative health measure. Table 7 presents the results. A one-percentage-point increase in crude death rate growth (\(\Delta \textrm{cdr}_{i,t}\)) has no or only a marginally significant positive effect on GDP per capita growth in the first two specifications. As soon as we control for period dummies, we find a weakly significant negative effect of 0.4 percentage points. A one-percentage-point increase in crude death rate growth leads to a reduction in population growth by about 0.1–0.2 percentage points. The effect on GDP growth is again only significantly negative in the specifications with period dummies. Thus, the overall pattern of the results is similar, even though there are slight differences in how the two health measures affect outcomes. Since we expect reverse causality, however, we suspect that the coefficients obtained with OLS are biased and expect the causal coefficients to be larger in absolute size (more negative).Footnote 19

Table 7 OLS regression of GDP per capita growth, population growth, and GDP growth on crude death rate growth and other controls

In the next step, we therefore apply the instrumental variable approach described in the last subsection. In Tables 8 and 9, we report the first stage estimation results for infant mortality and crude death rate growth, respectively. As suspected, we find that the interaction effects play an important role in predicting the health variables. The instrument and most of the interactions have a significant influence on infant mortality growth and crude death rate growth, respectively. It is difficult to interpret the coefficients because there might be a problem of multicollinearity due to the high number of interaction terms and the one-off variation in the instrument. This is no reason for concern, however, since the first stage only represents a prediction model, and we do not aim to draw any causal conclusions from the coefficient estimates. Multicollinearity in the first stage causes no problem for identification in the second stage (see, e.g., Wooldridge 2010).Footnote 20 F-tests for the joint influence of all coefficients that involve the instrument exceed the usual threshold and suggest a highly significant joint influence in all specifications. Also, the first stage explains a substantial proportion of the variation of the health variables. The first stage statistics therefore suggest that the instruments have sufficient power to identify exogenous variation in the crude death rate growth. Further, we report Shea’s \(R^2\) statistics, which suggest substantial explanatory power in all specifications.
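
The joint F-test on the instrument terms can be sketched as a comparison of a restricted first stage (controls only) with an unrestricted one (controls plus instrument and interaction). The sketch below is a hedged illustration on simulated data: the variable names, the single interaction with log population, and the classical homoskedastic F statistic are assumptions, and the tests reported in the tables may rely on a robust variant. A commonly cited rule of thumb treats first-stage F statistics above 10 as indicating adequate instrument strength.

```python
import numpy as np

def residual_sum_of_squares(X, y):
    """RSS from an OLS regression of y on the columns of X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

def excluded_instruments_f(m, X, Z):
    """Classical F statistic for H0: all first-stage coefficients on Z are zero."""
    n = len(m)
    unrestricted = np.column_stack([X, Z])
    q = Z.shape[1]                       # number of restrictions
    k = unrestricted.shape[1]            # parameters in the unrestricted model
    rss_r = residual_sum_of_squares(X, m)
    rss_u = residual_sum_of_squares(unrestricted, m)
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))

# Simulated first stage (hypothetical values, not the paper's data)
rng = np.random.default_rng(2)
n = 1000
lpop = rng.normal(size=n)                          # stand-in for lagged log population
X = np.column_stack([np.ones(n), lpop])
z = rng.binomial(1, 0.2, size=n).astype(float)     # stand-in for the lagged introduction dummy
Z = np.column_stack([z, z * lpop])                 # instrument and its interaction
dm = -1.0 * z - 0.5 * z * lpop + 0.2 * lpop + rng.normal(size=n)

f_relevant = excluded_instruments_f(dm, X, Z)
f_irrelevant = excluded_instruments_f(rng.normal(size=n), X, Z)  # outcome unrelated to Z
print(f"F (relevant): {f_relevant:.1f}, F (irrelevant): {f_irrelevant:.1f}")
```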

Table 8 First stage: 2SLS regression of infant mortality growth on the instruments and other controls
Table 9 First stage: 2SLS regression of crude death rate growth on the instruments and other controls

Table 10 presents estimation results of the causal effect of changes in mortality, in terms of infant mortality growth, from the second stage of 2SLS estimations. The results in the first panel suggest a significant negative effect of infant mortality growth on growth in GDP per capita. We find that a one-percentage-point increase in infant mortality growth leads to a reduction in GDP per capita growth by about 1.3 percentage points. In the second panel of Table 10, we estimate the causal effect of infant mortality growth on population growth. The coefficients of infant mortality growth are still insignificant, which suggests that infant mortality growth has little influence on population growth. The last panel of Table 10 contains the results regarding the causal effect of infant mortality growth on GDP growth. A one-percentage-point increase in infant mortality growth reduces GDP growth by 1.6–1.9 percentage points. Since child mortality is sensitive to bad overall health conditions, these findings suggest that health improvements are indeed conducive to economic development, while not affecting population growth.Footnote 21

Table 10 Second stage: 2SLS regression of GDP per capita growth, population growth, and GDP growth on infant mortality growth and other controls

In Table 11, we present the corresponding IV results for crude death rate growth as an alternative measure for variation in health conditions. The effect on GDP per capita growth turns negative but is still insignificant except for the specification without period fixed effects. Including period and country dummies, we find a significant effect of −1.3, which is comparable to the findings for infant mortality growth. The effect on population growth is significant and even stronger (with a coefficient between −0.1 and −0.4). The effects on GDP growth are all negative but only significant in the specifications with period fixed effects.

Table 11 Second stage: 2SLS regression of GDP per capita growth, population growth and GDP growth on crude death rate growth and other controls

In the lower part of each panel of Tables 10 and 11, we also report Hansen’s J-statistic because we use a large number of interactions as instruments. The null hypothesis can generally not be rejected, suggesting that we do not face a considerable problem of too many instruments with little additional explanatory power. We also present the results of a Hausman specification test. We regress the outcome variable (GDP per capita growth, population growth, or aggregate GDP growth) on the observed change in mortality (infant mortality growth and crude death rate growth, respectively) and the estimated error from the first stage. Significant coefficients for the error term would indicate problems of endogeneity in the OLS regressions; otherwise, the problem of reverse causality is less severe in the growth rates. We find significant evidence for endogeneity in regressions with infant mortality growth on GDP per capita growth and on GDP growth, respectively. However, the effect of infant mortality growth on population growth does not differ significantly between OLS and 2SLS estimates. For crude death rate growth, we also do not find strong indications of endogeneity. Only in the specifications where we include period and country dummies might endogeneity be an issue. This indicates that the use of growth rates instead of levels in the outcome equation, as well as the sample construction, already accounts to a large extent for the endogeneity issues of the crude death rate.
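
The regression-based Hausman test described above can be sketched in a few lines: the first-stage residual is added to the outcome regression, and the t statistic on that residual serves as the endogeneity test. The simulated data, the variable names, and the classical standard errors below are assumptions for illustration only. A useful by-product of this construction is that the coefficient on observed mortality growth in the augmented regression coincides numerically with the 2SLS estimate.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and classical (homoskedastic) standard errors."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

def endogeneity_test(y, m, X, Z):
    """Regress y on m, the first-stage residual, and X.
    Returns the coefficient on m and the t statistic on the residual."""
    first_stage = np.column_stack([Z, X])
    gamma, _ = ols_with_se(first_stage, m)
    v_hat = m - first_stage @ gamma                 # estimated first-stage error
    beta, se = ols_with_se(np.column_stack([m, v_hat, X]), y)
    return beta[0], beta[1] / se[1]

# Simulated data with built-in endogeneity (hypothetical)
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
z = rng.binomial(1, 0.3, size=n).astype(float)
u = rng.normal(size=n)                              # shared unobserved component
dm = -0.8 * z + 0.3 * x1 + u + rng.normal(size=n)
dy = -1.3 * dm + 0.5 * x1 + 2.0 * u + rng.normal(size=n)

alpha_cf, t_resid = endogeneity_test(dy, dm, X, z)
print(f"coefficient on mortality growth: {alpha_cf:.2f}, t on residual: {t_resid:.1f}")
```

A large absolute t statistic on the residual points to endogeneity of mortality growth in the OLS regression; a small one suggests that OLS and 2SLS do not differ systematically.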

Comparing the results presented in Tables 10 and 11, we find very similar estimates of the causal effect of health on growth in GDP per capita and aggregate GDP growth, regardless of the particular measure that is used.Footnote 22 However, there is a systematic difference in the effects on population growth between infant mortality and crude death rates. We find no population effect of infant mortality, but growth in crude death rates has a negative effect on population growth. This difference can be rationalized since crude death rates capture health conditions at all ages and thus directly affect population size, while infant mortality has an a priori ambiguous effect on net fertility and hence population growth.

Despite the evidence from the first stage statistics, the 2SLS results might be biased due to the relatively large number of instruments, some of which might be weak (Bound et al. 1996). In order to investigate the robustness of our results, we replicate our analysis using limited information maximum likelihood (LIML) methods, which imply a lower bias than 2SLS estimates, in particular, in small samples (see, e.g., Flores-Lagunes 2007; Angrist and Pischke 2008). Tables 12 and 13 present the results for infant mortality and crude death rate growth, respectively. The results are qualitatively as well as quantitatively very similar to those obtained with 2SLS, even though the standard errors are larger, as is to be expected with LIML. Taken together, the results suggest that the bias of the IV results is, at most, modest. As suggested by Angrist and Pischke (2008), we replicate the baseline specification using only the single best instrument in another robustness test. The single best instrument is the interaction between the introduction dummy and population size (\(\Delta \textrm{Insurance}_{i,t-1} \times \textrm{lpop}_{i,t-1}\)). To allow for a constant effect, we also include (\(\Delta \textrm{Insurance}_{i,t-1}\)) in the first stage. The significance of the point estimates does not change, and the size of the coefficient is even slightly larger in absolute terms in all specifications. However, the first stage is considerably weaker in these specifications.Footnote 23
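
The relationship between 2SLS and LIML can be illustrated with the k-class family of estimators, in which \(k=0\) gives OLS, \(k=1\) gives 2SLS, and \(k=\kappa\) gives LIML, where \(\kappa\) is the smallest generalized eigenvalue computed from the residual cross-product matrices. The following sketch is a textbook-style implementation on simulated data, not the routine used for Tables 12 and 13; the four continuous instruments, the sample size, and all names are assumptions.

```python
import numpy as np

def annihilator(A):
    """Residual-maker M_A = I - A A^+, which removes the projection on A's columns."""
    return np.eye(A.shape[0]) - A @ np.linalg.pinv(A)

def liml_kappa(y, m, X, Z):
    """Smallest eigenvalue of (Y' M_ZX Y)^{-1} (Y' M_X Y) with Y = [y, m]."""
    Y = np.column_stack([y, m])
    M_X = annihilator(X)
    M_ZX = annihilator(np.column_stack([Z, X]))
    eigenvalues = np.linalg.eigvals(
        np.linalg.solve(Y.T @ M_ZX @ Y, Y.T @ M_X @ Y)
    )
    return float(np.min(eigenvalues.real))

def k_class(y, m, X, Z, k):
    """k-class estimator; coefficients are ordered [m, X...]."""
    M_ZX = annihilator(np.column_stack([Z, X]))
    W = np.column_stack([m, X])
    A = np.eye(len(y)) - k * M_ZX
    return np.linalg.solve(W.T @ A @ W, W.T @ A @ y)

# Simulated overidentified model (hypothetical)
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 4))                         # four excluded instruments
u = rng.normal(size=n)                              # source of endogeneity
dm = Z @ np.full(4, 0.5) + 0.3 * X[:, 1] + u + rng.normal(size=n)
alpha_true = -1.3
dy = alpha_true * dm + 0.5 * X[:, 1] + u + rng.normal(size=n)

kappa = liml_kappa(dy, dm, X, Z)
alpha_2sls = k_class(dy, dm, X, Z, 1.0)[0]
alpha_liml = k_class(dy, dm, X, Z, kappa)[0]
print(f"kappa: {kappa:.4f}, 2SLS: {alpha_2sls:.3f}, LIML: {alpha_liml:.3f}")
```

With reasonably strong instruments, as in this simulation, \(\kappa\) stays close to one and the two estimates nearly coincide, which is the pattern one would expect if the many-instruments bias of 2SLS is small.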

Table 12 LIML estimation results for GDP per capita growth, population growth, and GDP growth on infant mortality growth and other controls
Table 13 LIML estimation results for GDP per capita growth, population growth, and GDP growth on crude death rate growth and other controls

4 Robustness checks

4.1 Additional controls

Up to this point, the focus was mainly on the problem of reverse causality, since this is the most severe endogeneity problem in empirical growth models. In this section, we present the results of several robustness checks that address various potential confounds. Potentially, there exists an omitted variable that is jointly correlated with the instrument, the change in mortality, and economic or demographic development. In this case, the results obtained with our identification strategy would still be biased.

Unfortunately, we have missing observations for most of the additional controls. Therefore, we include the additional controls one by one. Further, we estimate only the most parsimonious baseline specification including a calendar time trend. To account for the lower number of observations, we replicate the baseline specification with the reduced sample (unconstrained model without the additional control).Footnote 24 Then we apply a Durbin–Wu–Hausman (DWH) test to check if there is a significant difference in the parameter of interest between the constrained and unconstrained model.Footnote 25

4.1.1 Influence of institutions

As discussed above, institutions can also have an important influence on economic and demographic growth (see, e.g., Acemoglu et al. 2005). In order to account for this influence, we use three different measures for institutions. We use a dummy for democratization, which is equal to one in the period after election rules are introduced and zero otherwise (\(\Delta \textrm{ElectRules}_{i,t-1}\)). Introduction dates of election rules are collected from Persson and Tabellini (2003). Another measure is the age of these election rules (\(\textrm{AgeElectRules}_{i,t-1}\)). This variable is equal to zero before the introduction of election rules and afterwards corresponds to the difference between the respective observation year and the introduction year. Finally, we use political regime dummies from the Polity IV Project. We distinguish between autocracies, democracies, and anocracies. The omitted category is anocracies, which are mixed or incoherent authority regimes.

Regression results are presented in Table 14. The differences in the coefficients compared to the benchmark results are marginal. According to the DWH test, we do not find a significant difference in any of the models. This suggests that the institution variables cannot account for the influence of mortality growth on economic and demographic growth, i.e., the effect of mortality changes on economic development does not appear to be just an indirect effect of institutions, and accounting for institutions also does not undermine our identification strategy. However, especially for population growth, institutions have some additional explanatory power.

Table 14 Second stage: 2SLS regression of GDP per capita growth, population growth, and GDP growth on infant mortality growth and on crude death rate growth, with controlling for democratization variables

4.1.2 Contemporaneous reforms

Another concern is that contemporaneous reforms (e.g., land or education reforms) take place in the same time period as the introduction of universal public health care. In order to constitute a threat to our identification of the effect, these reforms would have to be contemporaneous, they would have to be picked up by the instrument, they would have to work through health (in terms of child mortality or crude death rates), and they would have to have an effect on the outcomes, particularly on economic growth. For some reforms (e.g., land reforms), we think that such a scenario is less likely. Nevertheless, such reforms could be approximated by a general change in the political system, which we controlled for in the previous subsection.

For educational reforms, it is likely that they have an effect on health and economic outcomes, e.g., through the fertility channel (Fort et al. 2011). In Table 15, we control for compulsory years of education (\(\textrm{CompEduc}_{i,t-1}\)) and changes in years of compulsory education (\( \Delta \textrm{CompEduc}_{i,t-1}\)). Compared to the baseline specifications, the results remain qualitatively unchanged. The point estimates become even larger in size and the effect of crude death rate on GDP growth is significant when controlling for change in compulsory years of education. At the same time, we find that, in particular, compulsory years of education have additional explanatory power.

Table 15 Second stage: 2SLS regression of GDP per capita growth, population growth, and GDP growth on infant mortality growth and on crude death rate growth, with controlling for compulsory years of schooling variables

4.1.3 Sectoral composition

The introduction of a universal public health care system could increase the number of doctors, nurses, and chemists. People who work in the service sector have lower mortality rates, since the probability of an accident is much lower in such occupations. Finally, the fraction of people working in the health sector could be correlated with economic and demographic growth since it is an indicator for structural change. In order to account for potential biases, we would ideally condition on the fraction of people working in the service sector. This variable is not available, but we have the fraction of GDP by sector of origin as a proxy. We distinguish between the agricultural (%\(~\textrm{Agriculture}_{i,t-1}\)), industrial (%\(~\textrm{Industry}_{i,t-1}\)), and service sectors. Since the shares sum up to one, we use the service sector as the omitted category. The coefficients of interest remain robust when including control variables for structural change (see Table C.1 in Online Appendix C).

4.1.4 Public consumption

Additionally, one might think that it is important to condition on government expenditure. The introduction of a universal public health system could be rather expensive and could have a direct effect on economic growth and, via the larger income, also on mortality growth. However, Lindert (2004) argues that developed countries show much care in choosing the design of taxes and transfers so as to avoid compromising growth. We can observe public consumption as share of total GDP (%\(~\textrm{PubCons}_{i,t-1}\)) in the data. Results including public consumption as an additional control do not deliver any significant difference in the coefficient of infant mortality growth in comparison to the baseline model with this smaller sample (see Table C.2 in Online Appendix C). Additionally, one could argue that it is not the share of public consumption but the change in public consumption that influences the coefficients of interest, following the argument that only changes in fiscal policy influence economic development. However, when conditioning on the lagged relative public consumption growth (Δ% \(\textrm{PubCons}_{i,t-1}\)), the coefficients of infant mortality growth are not significantly different. The effect of infant mortality growth on population growth even becomes significant, but this can be explained by the different sample compositions (see Table C.3 in Online Appendix C).

4.1.5 Labor disputes

It could also be that individuals have a higher work-life satisfaction after the introduction of a universal public health care system, because they have the feeling that the government takes care of them. Higher satisfaction could have an influence on health as well as on economic development, since more satisfied individuals could have a higher productivity. We use the lagged number of labor disputes (per million workers) (\(\# \textrm{Disp}_{i,t-1}\)) and the lagged number of workers involved in labor disputes (per million workers) (\(\# \textrm{Work}_{i,t-1}\)) as proxies for work-life satisfaction. Since we have a number of autocratic (terror) regimes in our sample, where the population is potentially very dissatisfied but is not allowed to demonstrate, we additionally condition on democracies. Results can be found in Table C.4 in Online Appendix C. According to the DWH test, the coefficient of interest is not affected significantly by this. As a different proxy for work-life satisfaction, we include the lagged number of days lost in labor disputes (per million workers) (\(\textrm{DaysLost}_{i,t-1}\)) in the regressions, but the coefficient of interest is not affected by this according to the DWH test (see Table C.5 in Online Appendix C).

4.1.6 Gross capital formation

Finally, one could argue that our estimation model is misspecified since one needs to condition on the gross capital formation according to the prediction of the Solow growth model. For a small number of observations, we have data on the gross capital formation as share of total GDP (\(~\% \textrm{GCF}_{i,t-1}\)). Results are presented in Table C.6 in Online Appendix C. We find that the effect of infant mortality growth on GDP per capita growth differs significantly at the 10 %-level when we condition on gross capital formation, and the effect is even more negative than that in the baseline. If anything, our baseline specifications would underestimate the true absolute effect of infant mortality growth on GDP per capita growth. The effects on population growth and GDP growth are unaffected by the inclusion of gross capital formation.

4.2 Placebo treatments

As another robustness check, we apply a placebo treatment test. As placebo treatment, we artificially switch the instrument on one period “too early”.Footnote 26 Accordingly, the instrument is equal to one in the period prior to the introduction and not in the period after the introduction of a universal public health care system. Since there is not enough adjustment time for the mechanisms that affect changes in mortality, we expect zero effect at the first stage. As expected, the instrument and the interactions have virtually no effect on mortality growth. Tables D.1 and D.2 in Online Appendix D display the first stage of the placebo treatment test for the baseline specifications. Almost all coefficients of the instruments are insignificant or only weakly significant at the 10 % level. Shea’s \(R^2\) and the F-statistic are rather low, which indicates that the placebo instrument has nearly no power despite a number of observations comparable to that in the main analysis. In the second stage (see Tables D.3 and D.4 in Online Appendix D), the coefficients of interest are insignificant (excluding two cases) and have high standard errors; most coefficients change sign and become positive. We interpret this as an indication that our identification strategy passes the placebo treatment test. This also suggests that it is unlikely that the previous results are driven by common time trends or random shocks. Moreover, this supports our initial assumptions about the causal channel, namely that first the universal public health care systems were introduced and then the effects on economic and demographic development unfolded, and not vice versa.
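
The difference between the baseline instrument and the placebo instrument amounts to shifting the introduction dummy in opposite directions within each country. The following sketch illustrates this with two made-up countries and five periods; the country labels, the introduction periods, and the choice to pad the unobserved edge periods with zero are assumptions for illustration.

```python
def shift_within_country(values, periods):
    """Shift one country's sequence by `periods` steps, padding with 0.
    periods=+1 yields the lag x_{t-1}; periods=-1 yields the lead x_{t+1}."""
    n = len(values)
    shifted = [0] * n
    for t in range(n):
        source = t - periods
        if 0 <= source < n:
            shifted[t] = values[source]
    return shifted

# Hypothetical introduction dummies: 1 in the period of introduction
introduction = {
    "A": [0, 0, 1, 0, 0],
    "B": [0, 0, 0, 1, 0],
}

# Baseline instrument: equal to one in the period after the introduction
baseline = {c: shift_within_country(d, 1) for c, d in introduction.items()}
# Placebo instrument: equal to one in the period before the introduction
placebo = {c: shift_within_country(d, -1) for c, d in introduction.items()}

print(baseline)
print(placebo)
```

Shifting within each country separately matters in a panel, because a naive shift over the stacked data would carry the last period of one country into the first period of the next.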

4.3 Different data frequencies

One important parameter in our empirical analysis is the choice of the data frequency. The choice of a 20-year data frequency is to some extent arbitrary. In our identification approach, the data frequency has two important implications. On the one hand, we want a fairly high data frequency, because otherwise the instrument has no power on mortality growth. In the long run, the effect of the introduction of a universal public health care system diminishes. On the other hand, we need a sufficiently low data frequency, because we expect that the effects of mortality change on economic and demographic development have a rather long-run character and need time to unfold. If the data frequency is too high, mortality growth has no influence on the outcome variables. In this section, we therefore investigate the sensitivity of our results for alternative data frequencies of 15- and 30-year periods.

Using a 15-year frequency, the main results remain robust (see Tables E.1 and E.2 in Online Appendix E). Infant mortality growth has a negative effect on GDP per capita growth and GDP growth and no significant effect on population growth. The coefficients are smaller, which indicates that infant mortality has a smaller effect on economic and demographic growth in the short run. When using a 30-year data frequency, the number of observations becomes very small, but the size of the coefficients is even larger than for the 20-year data frequency in the specifications without period dummies (see Tables E.3 and E.4 in Online Appendix E). When we increase the data frequency further to 40 years, the coefficients of interest get even larger. However, the first stage has less power and the number of observations decreases to 35. For crude death growth, we find that the instrument has strong power independent of the data frequency. In summary, we conclude that our estimates might potentially not capture the total effect of mortality growth on economic and demographic growth. Our findings might nevertheless be valid for the medium term. Long-run effects of mortality growth on economic and demographic growth could be even stronger but are difficult to identify with our identification strategy.

5 Conclusion

This paper has applied a novel identification strategy based on the timing of the implementation of a universal public health system to estimate the causal effect of mortality changes on economic growth and population growth. The results indicate that a reduction in mortality accelerates growth of income per capita and population size and reconcile earlier findings in the literature by documenting a positive effect of mortality reductions on growth based on an identification strategy that exploits within-country over time variation. This suggests that the discrepancies in earlier findings might be the result of differences in sample composition, rather than identification method. Moreover, our results suggest that public health policy plays a potentially important role for economic development.

Naturally, there are caveats to our analysis that need to be taken into account when interpreting our results. First, the findings are based on a small sample, with the identifying variation stemming from European countries in the late nineteenth and early twentieth centuries. As in previous studies, sample composition might affect the generality and external validity of our results. Nevertheless, given the particular sample, the results can be seen as a complement to studies using exclusively cross-country variation in geo-climatological conditions (as in Lorentzen et al. 2008) or using within-country variation during a very particular period of global development (as the global epidemiological transition exploited by Acemoglu and Johnson 2007). As always in this type of empirical work, the interpretation of the results critically hinges on the assumption that all possible confounds and relevant variables have been included in the estimation. In light of our extensive robustness checks, we are confident that the results are robust and not driven by omitted variables such as other reforms like institutional change in the quality of democracy, education reforms, or other developments like sectoral change, worker movements, or changes in government expenditure patterns. However, more work is needed to investigate the channels through which the causal effect of health on economic growth operates, and what might be additional consequences of health improvements, e.g., on fertility, education, consumption patterns, or retirement.Footnote 27