1 Introduction

Healthcare is one of the important public policy issues of our time. The quality of healthcare services, their distribution and their accessibility are often perceived as being related to the quality of life of individuals. As advances in technology and science are reflected in the healthcare domain and as demand increases, costs and expenditures have been increasing, reaching levels between 10–20% of gross domestic product (GDP) in several countries (OECD 2013). This has contributed to a higher emphasis being placed on performance evaluation for healthcare institutions (Kazandjian and Lied 1999; Özcan 2008). While healthcare organizations can be viewed and evaluated as service systems to some extent, it is difficult to assess the quality of the healthcare system of a country. In WHO (2000), the World Health Organization published a ranking of the healthcare systems of 191 countries and identified France as the country with the best healthcare system. Rankings of healthcare systems of countries appear in the media from time to time, as in Capell (2008), The Guardian (2003) and The New York Times (2007). Most of the time, these articles are based on WHO reports or Organisation for Economic Cooperation and Development (OECD) annual reports, as presented in WHO (2015), De Looper and Lafortune (2009), and OECD (2013).

The focus placed on the efficiency of the healthcare system can vary not only across countries but also across the years within a country. Different countries can have different priorities depending on the current state of their economy and other relevant parameters. In addition, when the political system changes and technology progresses, these priorities may also change.

Past research has often identified life expectancy (LE) at birth and infant mortality rate (IMR) as key outcomes of the healthcare system. Mohan and Mirmirani (2007) investigated the significance of different factors that can influence LE and IMR using panel data of 12 years, from 1990 to 2002. Two regression models were built, one where LE was chosen as the dependent variable and another where IMR was chosen as the dependent variable. Several independent variables were used in the regression models. Empirical results indicated that the level of healthcare expenditure among OECD countries has been an important factor in extending LE but did not have much impact on lowering IMR. Education level used as an indicator of health awareness was significant in both of the regressions.

Data Envelopment Analysis (DEA) has been among the methods used for performance evaluation of healthcare systems (Greene 2004; Tandon et al. 2001; Jacobs 2001). DEA is a methodology based on linear programming that can be used to analyze the relative efficiency of similar Decision Making Units (DMUs). DEA has been applied in very different areas with success (Afonso and Aubyn 2005; Galterio et al. 2009; Johnes 2006; Zhou et al. 2008). In healthcare, the methodology has been used for performance evaluation of units such as hospitals (Jacobs 2001; Özcan 2008), or different units within hospitals (Wang and Yu 2006) as well as countries (Afonso and Aubyn 2005; Reinhardt et al. 2002; Varabyova and Schreyögg 2013; Borisov et al. 2012).

Afonso and Aubyn (2005) compared performances of 24 OECD countries in education and healthcare using two different methodologies, Free Disposal Hull (FDH) and DEA. They used number of physicians, nurses and beds per 1000 population (for brevity, the term per 1000 population will be omitted when referring to these widely used inputs in this manuscript) as input measures and infant survival rate (ISR) and LE at birth as output measures. They used data from the year 2000. According to their results, 11 out of 24 countries were efficient in the FDH analysis whereas 8 of them remained efficient in the DEA analysis.

Greene (2004) analyzed 191 countries based on WHO data, using stochastic frontier analysis. He used disability adjusted LE and a composite measure of health care delivery as output measures. As input measures, health expenditure per capita in 1997 and average years of schooling were considered. He also used different variables that were considered as indicators of cross country heterogeneity such as the Gini coefficient that measures income inequality, and OECD membership. As a result, Greene pointed out that expenditure is a major component of healthcare system performance and should be taken into account. The author also found that OECD membership explained much of the variation in the outcome measures and the distribution of income is a significant factor.

In another study, Retzlaff-Roberts et al. (2004) developed different types of DEA models using the OECD 2000 database with data of 1998. As in Afonso and Aubyn (2005), ISR and LE at birth were considered as output measures but these measures were dealt with separately. For input measures, they considered four healthcare-related inputs and three social environment inputs. The healthcare inputs included the number of physicians, beds, healthcare expenditure as a percentage of GDP and MRI units per million population. The social environment input variables included the expected number of years of education, the Gini coefficient and the maximum value of the percentage of male and female smokers. From their different models, the authors concluded that countries with relatively modest outcomes like Turkey and Mexico turned out to be efficient while other countries with good health outcomes were not necessarily using their resources efficiently.

Varabyova and Schreyögg (2013) analyzed the hospital care efficiency in OECD countries using panel data between 2000 and 2009. They used DEA and Stochastic Frontier Analysis (SFA) methods and compared them. For both methodologies, they considered hospital discharges and mortality as output measures, whereas hospital resources such as the number of beds, physicians, nurses and total hospital employment were the inputs in the models. Total hospital employment refers to the number of persons employed (including self-employed and full-time equivalent employed) in general and special hospitals. Varabyova and Schreyögg argued that their analyses are good indicators of efficient use of resources. Hence, countries with good health outcomes in terms of longevity like Japan can be inefficient while developing countries like Turkey can be efficient.

Samut and Cafrı (2015) analyzed the efficiency of hospitals for 29 OECD countries between 2000 and 2010 to understand the parameters that have an impact on efficiency. They used a two-stage model in which DEA and Panel Tobit were used for the first and second stage, respectively. They found that the countries that were fully efficient during this 10-year period were Mexico, Turkey, and the United Kingdom. On the other hand, Japan, Iceland, France and Belgium had below-average efficiency scores for the same period. From the Panel Tobit analysis, the authors pointed out that wealthier countries had better hospital efficiency. Indeed, they found a positive relation between GDP and education along with a positive relation between GDP and efficiency.

Frogner et al. (2015) analyzed health data of 25 OECD member countries using panel data analysis. They applied stochastic frontier analysis and fixed effect analyses over 11 input variables, including health care resources, health-related behavior, and economic and environmental factors. They built 36 different models and by comparing their results from different models, they revealed the fragility of the results of ranking models. However, they were not able to demonstrate that the U.S., in particular, performed significantly better than its WHO ranking in these alternative ranking models.

In this study, we experiment with Assurance Region Global (ARG) models based on the expectation that they may produce more conservative and consistent evaluations than those produced by standard DEA models. In addition to the widely utilized outputs of ISR and LE at birth, we also explore a different approach that considers survival rates from major causes of death as output variables. Our goal is to see how efficiency and inefficiency with respect to these models differ from those obtained by traditional models. We also provide a comprehensive evaluation of OECD member countries, as some past studies exclude some member countries due to a lack of data observations (Mohan and Mirmirani 2007; Jacobs 2001; Afonso and Aubyn 2005; Retzlaff-Roberts et al. 2004; Frogner et al. 2015). We use OECD data for the years 2008 and 2012, thereby making it possible to observe any shifts in healthcare system performance at different points in time in the studied countries. In the next section, we explain the DEA methodology and present basic models. In Sect. 6.3, we describe how the data is processed, give our model results and discuss them. In Sect. 6.4, we build a model with specific causes of death as outputs. Finally, we conclude and provide further research directions in Sect. 6.5.

2 DEA

DEA methodology was originally introduced by Charnes, Cooper and Rhodes in 1978 as a method for evaluating the relative efficiency of Decision Making Units (DMUs) performing essentially the same task. The methodology got its name from the idea of enveloping the observations to identify an efficient frontier. This frontier is computed via a ratio where multiple inputs produce multiple outputs (Joro et al. 1998; Cooper et al. 2006). The main idea of this linear programming (LP) model is to have a score between 0 and 1 representing the degree of efficiency of a DMU, where 1 represents an efficient DMU. The model also identifies sources and amounts of possible inefficiency and a direction of improvement based on orthogonal projection of the observation to the frontier.

Let n denote the number of DMUs to be evaluated. Suppose that there are m input and s output variables and let the input and output data be denoted by matrices X and Y of size m × n and s × n respectively. DEA methodology seeks to attach weights to each input and output variable. In the formulation below, these are decision variables denoted as \(v_i\), i = 1, …, m and \(u_j\), j = 1, …, s respectively.

The first model introduced by Charnes, Cooper and Rhodes, denoted as CCR, sets up a fractional problem \((FP_k)\) for an arbitrary DMU k, k ∈ {1, …, n}, which can be expressed as:

$$\displaystyle \begin{aligned} \max \quad & \theta = \frac{u_{1}y_{1k}+u_{2}y_{2k}+\ldots+ u_{s}y_{sk}}{v_{1}x_{1k}+v_{2}x_{2k}+\ldots+v_{m}x_{mk}} \\ \text{subject to} \quad & \frac{u_{1}y_{1j}+u_{2}y_{2j}+\ldots+u_{s}y_{sj}}{v_{1}x_{1j}+v_{2}x_{2j}+\ldots+v_{m}x_{mj}} \leq 1, \quad \forall j=1,\ldots,n \end{aligned} $$
(6.1)
$$\displaystyle \begin{aligned} \begin{array}{rcl} v_{1},\ldots,v_{m} &\geq& 0, \end{array} \end{aligned} $$
(6.2)
$$\displaystyle \begin{aligned} \begin{array}{rcl} u_{1},\ldots,u_{s} &\geq& 0, \end{array} \end{aligned} $$
(6.3)
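For intuition, consider the special case of a single input and a single output. There the fractional program (6.1)–(6.3) reduces to dividing each DMU's output-to-input ratio by the best ratio observed among all DMUs. The sketch below illustrates only this special case; the function name and the data are hypothetical and are not taken from the study.

```python
def ccr_scores_single_ratio(inputs, outputs):
    """CCR efficiency for the special case of one input and one output.

    With m = s = 1 the fractional program reduces to
    theta_k = (y_k / x_k) / max_j (y_j / x_j).
    """
    if len(inputs) != len(outputs) or not inputs:
        raise ValueError("inputs and outputs must be non-empty and equally long")
    if any(x <= 0 for x in inputs) or any(y <= 0 for y in outputs):
        raise ValueError("DEA assumes strictly positive inputs and outputs")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)  # the most productive DMU defines the frontier
    return [r / best for r in ratios]


# Hypothetical data for three DMUs (not taken from the chapter).
scores = ccr_scores_single_ratio(inputs=[2.0, 4.0, 5.0], outputs=[4.0, 6.0, 5.0])
print(scores)  # [1.0, 0.75, 0.5]
```

With several inputs and outputs no such closed form exists, which is why the weights must be found by optimization.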

The objective is to obtain the optimal weights such that the ratio of weighted output over weighted input is maximized for DMU k. In other words, the objective is to find the most favorable weights for DMU k, namely those that maximize the weighted output obtained per weighted input used. Constraint (6.1) ensures that for these weights, output-to-input ratios for all DMUs are between zero and one. Constraints (6.2) and (6.3) are non-negativity constraints for the weights. The values for inputs and outputs are also assumed to be positive. At optimality, \((v^{*}_k, u^{*}_k)\) represents the set of most favorable weights for DMU k. Each weight indicates how highly the associated input or output is valued relative to the others. Note that the above formulation is a fractional program that can be linearized relatively easily. In addition, an output-oriented version of this model can be written as opposed to this input-oriented version. The orientation is named based on the objective function of the dual problem. Discussion of equivalence between these two versions, derivations of equivalent linear programming transformations as well as a discussion of alternative DEA models can be found in Cooper et al. (2006). In this study, we employ the output-oriented BCC model, which differs from the CCR model by an additional convexity constraint in the dual formulation. This constraint translates into a new variable, free in sign, in the LP model in which the observations for the n DMUs may be combined, thus allowing variable returns to scale in the production frontier. Formulation (BCC-O-FP) represents the dual problem with the convexity constraint and the corresponding fractional problem of an output-oriented BCC model.

$$\displaystyle \begin{aligned} \text{(BCC-O-FP)}\quad \min \quad & \bar{\theta} = \frac{v_{1}x_{1k}+v_{2}x_{2k}+\ldots+v_{m}x_{mk}-v_{0}}{u_{1}y_{1k}+u_{2}y_{2k}+\ldots+ u_{s}y_{sk}} \\ \text{subject to} \quad & \frac{v_{1}x_{1j}+v_{2}x_{2j}+\ldots+v_{m}x_{mj}-v_{0}}{u_{1}y_{1j}+u_{2}y_{2j}+\ldots+u_{s}y_{sj}} \geq 1, \quad \forall j=1,\ldots,n \end{aligned} $$
(6.4)
$$\displaystyle \begin{aligned} \begin{array}{rcl} v_{1},\ldots,v_{m} &\geq& 0, \end{array} \end{aligned} $$
(6.5)
$$\displaystyle \begin{aligned} \begin{array}{rcl} u_{1},\ldots,u_{s} &\geq& 0, \end{array} \end{aligned} $$
(6.6)
$$\displaystyle \begin{aligned} \begin{array}{rcl} v_{0} &&\textit{free \ in \ sign} \end{array} \end{aligned} $$
(6.7)

Note that since BCC has an additional constraint compared to CCR, the feasible region of the CCR problem contains the feasible region of BCC. Hence, a DMU's BCC score is at least as favorable as its CCR score, and any CCR-efficient DMU is also BCC efficient.
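As a sketch of the linearization step, (BCC-O-FP) can be rewritten as a linear program by normalizing the weighted output of DMU k to one, which is the standard Charnes-Cooper transformation. The form below is intended to follow the treatment in Cooper et al. (2006) and uses the same weights as (6.4)–(6.7); it is given for illustration rather than as the exact model solved by the software used in this study.

$$\displaystyle \begin{aligned} \min \quad & v_{1}x_{1k}+v_{2}x_{2k}+\ldots+v_{m}x_{mk}-v_{0} \\ \text{subject to} \quad & u_{1}y_{1k}+u_{2}y_{2k}+\ldots+u_{s}y_{sk} = 1, \\ & v_{1}x_{1j}+\ldots+v_{m}x_{mj}-v_{0}-\left(u_{1}y_{1j}+\ldots+u_{s}y_{sj}\right) \geq 0, \quad \forall j=1,\ldots,n, \\ & v_{1},\ldots,v_{m} \geq 0, \quad u_{1},\ldots,u_{s} \geq 0, \quad v_{0}\ \text{free in sign} \end{aligned} $$

Fixing \(v_{0}=0\) in this program gives the corresponding output-oriented CCR model.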

DEA methodology has been used in a variety of application areas such as education, healthcare and energy efficiency, among others (Galterio et al. 2009; Johnes 2006; Zhou et al. 2008; Liu et al. 2013). One drawback of the methodology is that it may present a very favorable outlook of a DMU, since it is designed to choose input and output weights in a way that benefits that particular DMU the most. In other words, the methodology is capable of overemphasizing the strengths of a DMU while ignoring its weaknesses by setting the respective weights to zero. As a way to alleviate this shortcoming, imposing restrictions on the weight vectors u and v has been proposed. Thompson et al. (1986) developed the assurance region approach, which is based on imposing constraints on the magnitude of the weights for specific inputs or outputs of DMUs relative to each other. Another proposal, presented in Charnes et al. (1990) and known as the cone ratio approach, has been to restrict input and output weights to predetermined cones via additional constraints. Wong and Beasley (1990) propose limiting the proportion of total output of DMU k devoted to output measure i by imposing limits as follows:

$$\displaystyle \begin{aligned} L_{i}\leq\frac{u_{i}y_{ik}}{\sum_{j=1}^{s}u_{j}y_{jk}}\leq U_{i}. \end{aligned} $$
(6.8)

Similar constraints can be defined for inputs and their corresponding weights. This approach has the advantage of restricting relative weights instead of virtual weights and therefore may be more intuitive. While Wong and Beasley (1990) motivated this approach as a means of incorporating value judgments in a DEA model, Cooper et al. (2006) mentioned that it can also be used to establish some consistency in weight choices of different DMUs by a careful choice of the bounds \(L_i\) and \(U_i\). We employ this approach as a variation of our base model and label it as the ARG model following the terminology in Cooper et al. (2006). Since there are additional constraints in ARG models, it may be expected that some DMUs that were formerly classified as efficient may become inefficient once the constraints are imposed.
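To illustrate constraint (6.8), the sketch below computes each output's share of a DMU's total weighted output and checks it against the bounds. The function names, the output values and the weight vectors are hypothetical; the bounds 0.4 and 0.6 are those used for the ARG models later in this chapter.

```python
def output_shares(u, y_k):
    """Share of DMU k's total weighted output attributed to each output."""
    weighted = [u_i * y_i for u_i, y_i in zip(u, y_k)]
    total = sum(weighted)
    if total <= 0:
        raise ValueError("total weighted output must be positive")
    return [w / total for w in weighted]


def within_bounds(u, y_k, lower, upper, tol=1e-9):
    """True if every output share lies in [L_i, U_i], as in constraint (6.8)."""
    shares = output_shares(u, y_k)
    return all(lo - tol <= s <= hi + tol for s, lo, hi in zip(shares, lower, upper))


# Hypothetical outputs of one DMU and two candidate weight vectors.
y_k = [250.0, 80.0]
lopsided = [0.004, 0.0]      # ignores the second output entirely
balanced = [0.002, 0.00625]  # each output contributes half of the total

print(within_bounds(lopsided, y_k, [0.4, 0.4], [0.6, 0.6]))  # False
print(within_bounds(balanced, y_k, [0.4, 0.4], [0.6, 0.6]))  # True
```

Because the denominator in (6.8) is positive, multiplying through by it gives constraints that are linear in the weights, so they can be added to the LP directly.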

3 LE, Infant Mortality and Efficiency

Our data come from the OECD online library. The OECD is an organization that aims to promote policies that will improve the economic and social well-being of people around the world (OECD 2016). It was established in 1961 and consisted of 34 member countries at the time the analyses for this study were conducted. With Latvia becoming a member in July 2016, the number of member countries reached 35. The OECD collects data from its members in order to develop policies in line with its mission. It keeps track of significant amounts of data about not only economics, taxes, trade finance but also education, health, environment and social issues. Within the healthcare domain, the OECD collects a variety of data, from various expenditure figures to amounts of tobacco consumption. The reader can find the list of variables related to healthcare in the Health at a Glance report (OECD 2013). Table 6.1 provides a list of variables that we use in this part of our study. In these models, we pick ISR and LE at birth as our two output measures. ISR is computed using IMR as given in Afonso and Aubyn (2005) by:

$$\displaystyle \begin{aligned} ISR=\frac{1000-IMR}{IMR} \end{aligned} $$
(6.9)

ISR represents the ratio of children that survived their first year to the number of children that died. As inputs, we pick the number of physicians, the number of nurses and the number of hospital beds. The vast majority of countries report the number of physicians and nurses as practicing professionals. A few countries report these numbers by including practicing physicians or nurses plus others working in the healthcare sector as managers, educators and researchers, adding another 5–10% to each group. We use the figures as reported in the database. We take two cross sections of data for the 34 countries, from 2008 and 2012, to obtain two snapshots in time and to see if any differences can be observed.
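A minimal sketch of Eq. (6.9) follows. The function name, the `per` parameter and the IMR values are illustrative assumptions, not figures from the OECD data.

```python
def survival_ratio(mortality_rate, per=1000.0):
    """Survivors per death, following Eq. (6.9): (per - rate) / rate."""
    if not 0 < mortality_rate < per:
        raise ValueError("mortality_rate must lie strictly between 0 and `per`")
    return (per - mortality_rate) / mortality_rate


# Hypothetical IMR values (deaths per 1000 live births), for illustration only.
isr_low = survival_ratio(4.0)    # 996 / 4 = 249.0
isr_high = survival_ratio(10.0)  # 990 / 10 = 99.0
print(isr_low, isr_high)
```

The transformation turns IMR, where smaller is better, into a quantity where larger is better, as a DEA output requires. It is also strongly nonlinear: at low mortality levels, small differences in IMR produce large differences in ISR.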

Table 6.1 Description of input and output variables

The missing data in our data set were estimated from the previous and subsequent available data by means of linear interpolation. In the literature, it is suggested that the number of DMUs should exceed 3 times the total number of inputs and outputs. With 34 DMUs, 3 inputs and 2 outputs, we satisfy this guideline. All of our models are based on the output-oriented BCC approach. In addition to the base models, we build ARG models where bounds on the output weights are imposed with the values \(L_i = 0.4\) and \(U_i = 0.6\) for i = 1, 2 in constraint (6.8). These bounds are chosen so that nearly equal importance is given to both output variables. We use DEA-Solver-Learning Version developed by Kaoru Tone, where the platform is Microsoft Excel 2003 (Cooper et al. 2006).
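The two data-preparation steps above can be sketched as follows. The exact interpolation procedure is not specified in the text, so the function below assumes a straight line between the nearest earlier and later observations; the function names and the sample series are hypothetical.

```python
def interpolate_missing(year, known):
    """Linearly interpolate a value for `year` from the nearest known years.

    `known` maps year -> observed value and must contain at least one
    earlier and one later year than `year`.
    """
    if year in known:
        return known[year]
    earlier = [y for y in known if y < year]
    later = [y for y in known if y > year]
    if not earlier or not later:
        raise ValueError("need observations both before and after the missing year")
    y0, y1 = max(earlier), min(later)
    fraction = (year - y0) / (y1 - y0)
    return known[y0] + fraction * (known[y1] - known[y0])


def enough_dmus(n_dmus, n_inputs, n_outputs, factor=3):
    """Rule of thumb: the number of DMUs should exceed factor * (inputs + outputs)."""
    return n_dmus > factor * (n_inputs + n_outputs)


# Hypothetical series with 2008 missing (values are made up).
beds_2008 = interpolate_missing(2008, {2007: 5.0, 2009: 4.0})
rule_ok = enough_dmus(34, 3, 2)  # 34 > 3 * (3 + 2) = 15
print(beds_2008, rule_ok)  # 4.5 True
```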

3.1 2008 Models with Respect to LE and Infant Mortality

In terms of inputs and outputs, our base BCC model is parallel to the work of Afonso and Aubyn (2005). The descriptive statistics of the variables are presented in Table 6.2. In addition to the implementation of the base model on 2008 data, we run our ARG model as well and observe the differences between the results of the two models. As we use the same inputs and outputs, in the ARG models we expect to see a subset of the efficient countries of the base model. The efficiency scores of all countries for both BCC and ARG models can be found in Table 6.3. In the following tables with efficiency scores, a score of 1 is presented in bold in order to highlight countries that are efficient.

Table 6.2 Descriptive statistics of variables for 2008
Table 6.3 Scores of all OECD countries for 2008 models

In the 2008 BCC model, Canada, Chile, Greece, Italy, Japan, South Korea, Luxembourg, Mexico, New Zealand, Spain, Sweden and Switzerland are efficient. We can see that some of the developed economies (e.g. Germany) are inefficient whereas some of the developing economies are efficient (e.g. Chile). A similar counterintuitive result was pointed out in the work of Retzlaff-Roberts et al. (2004) with data from the OECD 2000 database. Note that due to the nature of DEA methodology, inefficiency does not necessarily imply a deficiency in outputs. The developed countries may be using more inputs than the developing ones to obtain a certain level of outputs, whereas developing countries do relatively well with their limited available resources. A closer look at the output weights of the efficient countries reveals that in 2008, Greece, Luxembourg, Spain and Sweden have zero weight on their LE at birth output. Sixteen of the 22 inefficient countries achieve their best score by nullifying one of their output weights.

In the 2008 ARG model, six of the BCC-efficient countries (Chile, Greece, Luxembourg, Mexico, Spain and Sweden) are still efficient. On the other hand, Canada, Italy, Japan, South Korea, New Zealand and Switzerland lose their efficiency once the additional constraints are added. We note that the countries that lose their efficiency when a more balanced combination of outputs is enforced are developed countries. This strengthens the interpretation that the inefficiencies may stem from abundant inputs rather than poor outputs. We also observe that scores now come from a wider range, between 0.627 and 1, as expected.

3.2 2012 Models with Respect to LE and Infant Mortality

The descriptive statistics of the variables for 2012 are presented in Table 6.4. In the 2012 BCC model, some countries that were efficient in 2008 are no longer efficient, whereas some countries that were not efficient in 2008 are efficient in 2012. The countries that were efficient in 2008 but not in 2012 are Italy, Luxembourg, New Zealand and Switzerland. In 2012, Iceland, Israel, Slovenia and Turkey join the list of efficient countries. An even larger share of the inefficient countries (21 out of 22) place zero weight on one of their outputs. In 2012, Canada, Israel and Sweden are the countries that are BCC efficient but inefficient with respect to the ARG model. Again, it is mainly developed countries that are pushed into inefficiency by a balanced use of outputs. Furthermore, in the ARG model scores vary between 0.523 and 1. All the results for the BCC and ARG models for 2012 can be found in Table 6.5.

Table 6.4 Descriptive statistics of variables for 2012
Table 6.5 Scores of all OECD countries for 2012 models

3.3 Discussion of Results

Looking at the overall results, it is possible to observe that Chile, Greece, Mexico and Spain are efficient in both models for both years. On the other hand, 18 countries (including Austria, France, the United Kingdom and the United States) out of 34 are always inefficient regardless of the model types and years.

We conducted reference set analyses on the 2008 and the 2012 ARG models. Tables 6.6 and 6.7 report the reference set members along with the associated weights for the inefficient countries for 2008 and 2012, respectively. In Table 6.6 we can see that Luxembourg is dominant, appearing in 26 out of 28 reference sets. Furthermore, Luxembourg is the country with the highest reference set weight value for more than half of the countries. Therefore it can be stated that Luxembourg, with the highest ISR (554.6) and a relatively good LE at birth (80.7), can be seen as the “ideal” country in terms of healthcare system performance. This outcome is worth attention because Retzlaff-Roberts et al. (2004) did not include Luxembourg in their data set due to lack of information in the OECD 2000 database. Chile, Sweden and Spain are the other countries that appear frequently in the reference sets. An interesting observation is that Luxembourg is not efficient in either of the 2012 models although it has a strong presence in the 2008 models. DEA model results include the values of the slacks, which give a direction of improvement. We remark that most of the excess is present in the number of nurses and the number of hospital beds according to the ARG models.

Table 6.6 Reference set for inefficient countries for ARG model in 2008
Table 6.7 Reference set for inefficient countries for ARG model in 2012

A reference set analysis for 2012 is given in Table 6.7. It can be observed that Slovenia is dominant, appearing in 21 out of 25 reference sets. Iceland also appears in 17 out of 25. Chile, Greece, Spain and Turkey are the other countries that appear frequently in the reference sets.

4 Survival from Major Causes of Death and Efficiency

Our results above show that when LE at birth and ISR are taken as the outputs that indicate system efficiency, developing countries may have an advantage, as they achieve relatively good results in relation to the inputs they provide to the system. Our models clearly correspond to a very high-level analysis of the healthcare system. In order to focus on the system’s ability to deal with major health issues, we analyze some major causes of death and their corresponding survival rates as outputs. According to the WHO fact sheet, the two leading causes of death in the world are ischemic heart disease and stroke. Death from all types of cancer is also listed as a major cause of death in WHO (2014). We build new models by using survival rates of ischemic heart disease, cerebrovascular disease (since stroke is the most common type of cerebrovascular disease) and malignant neoplasms as outputs of the system while the inputs remain the same. We label these new models, which take survival rates from the three major causes of death as outputs along with the same inputs as in the previous models, BCCs and ARGs. Data from the OECD library is presented in the form of age-standardized mortality rates per 100,000 population. These rates are calculated by the OECD Secretariat, using the total OECD population for 2010 of each corresponding country as the reference population. Age standardization makes it possible to compare the level of mortality across countries and over time. We compute survival rates based on mortality rates using Eq. (6.9), replacing 1000 by 100,000.
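Written out, the adapted form of Eq. (6.9) for a cause of death c is as follows, where \(MR_{c}\) denotes the age-standardized mortality rate per 100,000 population and \(SR_{c}\) the resulting survival rate. Both symbols are introduced here only for illustration.

$$\displaystyle \begin{aligned} SR_{c}=\frac{100{,}000-MR_{c}}{MR_{c}} \end{aligned} $$

For example, a hypothetical mortality rate of 200 per 100,000 would give \(SR_{c} = 99{,}800/200 = 499\).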

4.1 2008 Models with Respect to Survival from Major Causes of Death

For 2008, Turkey is the only country with missing data. We use data from 2009 for Turkey. The descriptive statistics of the output variables for 2008 are presented in Table 6.8.

Table 6.8 Descriptive statistics of mortality rates from common causes of death for 2008

Table 6.9 shows the results of the BCC and ARG models. In the BCC model nearly half of the countries (16 out of 34) are efficient. The efficient countries are Canada, Chile, France, Israel, Italy, Japan, South Korea, Luxembourg, Mexico, Netherlands, Portugal, Slovenia, Switzerland, Turkey, the United Kingdom and the United States. Compared to the BCC model in Sect. 6.3.1, France, Israel, Netherlands, Portugal, Slovenia, Turkey, the United Kingdom and the United States become efficient with the new output variables. On the other hand, Greece, New Zealand, Spain and Sweden are no longer efficient. Looking at the ARG model results, we see that Portugal, Slovenia, Turkey and the United Kingdom are not listed as efficient anymore. Looking at the scores, we also see that the discrepancy is much more apparent once the output variables are changed. This can be explained by the high standard deviation values of the output variables.

Table 6.9 Scores of all OECD countries for 2008 mortality output models

4.2 2012 Models with Respect to Survival From Major Causes of Death

The most recent and complete data about survival from the three conditions are from 2012. Data on survival from these conditions are not available for Canada, Iceland and Slovenia for that year. We use the most recent data for these countries: 2011 for Canada, 2010 for Iceland and 2009 for Slovenia. The descriptive statistics of the variables are presented in Table 6.10. Table 6.11 shows the results of the BCC and ARG models using 2012 data.

Table 6.10 Descriptive statistics of mortality rates from common causes of death for 2012
Table 6.11 Scores of all OECD countries for 2012 mortality output models

We observe that Canada, Chile, France, Israel, Japan, South Korea, Mexico, Spain and Turkey are efficient in both models. Portugal and Switzerland are BCC efficient but ARG inefficient. France, Portugal and Switzerland are efficient when survival rates from major causes of death are considered as outputs instead of LE at birth and ISR. On the other hand, Greece, Iceland, Slovenia and Sweden are not efficient when survival rates from major causes of death are considered as outputs, although they are efficient with respect to LE at birth and ISR. Again, we observe that efficiency and inefficiency do not necessarily align with the status of countries as developed or developing. Looking at the reference sets presented in Table 6.12, even though France is not efficient when classical outputs are considered, this country can be followed as the model country by most of the inefficient countries of the ARG model when main causes of mortality are considered as output variables. Canada, Mexico and Spain also appear frequently in the reference sets.

Table 6.12 Reference set for inefficient countries for ARGs model in 2012

5 Conclusion

This research aims to provide an evaluation of healthcare system efficiency of 34 OECD countries. DEA methodology is used with different modeling techniques and with different output measures. OECD data for 2008 and 2012 are used. The two output measures we use in the first models, LE at birth and ISR, are generally accepted measures as system output in the literature. We suggest the use of ARG models in a way to balance the two outputs as a remedy against the overly optimistic nature of the DEA methodology. To our knowledge, this is the first study that suggests using an ARG model for evaluating healthcare system efficiency at the country level. We observe a more consistent result when constraints on relative weights are imposed as weight restrictions require a more balanced relative output generation. A reference set analysis on the ARG model of 2008 displays Luxembourg as a good role model for inefficient countries whereas for 2012, Slovenia assumes that role.

In addition to using LE and ISR as outputs as is traditionally done, we experiment with survival rates from major causes of death as possible measures of outputs and implement this model with 2008 and 2012 data. When compared to LE at birth as a measure, this model rules out deaths from less common causes as well as deaths whose occurrence has a more indirect relationship to the performance of the healthcare system, such as deaths from accidents, homicides and suicides. In other words, under the assumption that most countries spend more effort on increasing their healthcare system capacity geared towards conditions surrounding the most likely causes of death, efficiency with respect to this model can be taken as an indication of their success. We observe that while there is significant overlap among efficient countries of different models, there are also differences. Countries that are not efficient with respect to LE and ISR but are efficient with respect to survival rates from major causes of death may focus on other causes of death to improve their standing. On the other hand, countries that are efficient with respect to LE and ISR but are not efficient with respect to survival rates from major causes of death need to focus more on these major causes of death. A reference set analysis on the ARG models with these outputs displays France as a good role model for inefficient countries, whereas for the previous models this role was assumed by Luxembourg in 2008 and Slovenia in 2012.

The lack of output measures that capture the quality of health services remains a limitation of the current study. This is mainly due to the difficulty of quantifying system quality and obtaining associated complete data. Bringing commonly accepted, appropriate pseudo-measures that quantify system quality into our models remains as future work. Another future research direction is to study how the efficiency of healthcare systems changes over time by using, for instance, the Malmquist Index, named after Malmquist (1953). This may be more revealing in terms of the nature of the changes that take place in the studied time interval. Although we studied two snapshots in time to observe any possible shifts with respect to different DEA models and different output measures and also to better relate to past literature, a multi-year study designed to track changes in system performance might provide more insights to policy makers.