1 Introduction

The three major dimensions in the assessment of health system performance are effectiveness, equity, and efficiency [1]. In most developed countries with comprehensive medical standards and near-universal access to health services, the efficiency goal is particularly important. Measuring, reporting, and comparing the efficiency of health care systems allows managers and policy makers to identify areas with high improvement potential. Health care efficiency analysis, particularly in the hospital sector, has attracted substantial research attention in recent decades.

One of the most widespread approaches to measure hospital efficiency is to use frontier techniques, which analyze the performance of hospitals relative to the best-practice frontier. Most of the studies analyzing hospital efficiency have relied on nonparametric methodology, with the most frequently used approaches being Data Envelopment Analysis (DEA) and Free Disposal Hull (FDH) [2]. The popularity of nonparametric methodology is a result of its apparent advantages: the ability to accommodate multiple inputs and outputs and the simplicity of the underlying assumptions. However, because of their deterministic nature, the DEA and FDH methods are highly sensitive to outliers, extreme values, and random shocks, all of which are common among hospital data. Moreover, the conventional nonparametric approaches do not provide a satisfactory technique for incorporating and interpreting the relation of external factors to efficiency. A good overview of the nonparametric methods and their limitations can be found in Ozcan [3] and Daraio & Simar [4].

The majority of studies estimating hospital efficiency have focused on within-country analyses [2]. However, efficiency analyses based on a single-country dataset are limited in generalizability [5]. The efficiency estimates from the nonparametric analysis represent a relative measure, meaning that the true best practice frontier is unknown, and the best available hospital sets the benchmark against which other hospitals are measured. The analysis of hospital efficiency based on a single-country dataset may lead managers and policy makers to draw false conclusions about the extent of inefficiency in the country if all hospitals are systematically inefficient, for example, due to detrimental institutions. Moreover, small countries may lack suitable benchmarks, for example, if a country has a single large university hospital. One solution to this problem is to compare decision-making units (DMUs) in a particular sector across several countries and orient the efficiency improvement targets to the internationally best available practice [5, 6].

In a recent review of studies comparing hospital efficiency across the Nordic countries, Medin et al. [7] described the data challenges inherent to international comparative studies of hospital efficiency. These challenges included differences in patient and treatment classifications and comparability of hospital staff and operating costs. The existence of a uniform Diagnosis-Related Group (DRG) system and similar structures in the organization of hospital care in the Nordic countries mitigates these challenges. In contrast, only a few studies have attempted to compare hospital efficiency across countries with different DRG systems [8–10]. To overcome the absence of a uniform system to adjust hospital discharges for case-mix severity, these studies grouped patients into a small number of discharge categories based on the type of intervention or patient age. Mateus et al. [11] weighted the discharges in four European countries by the relative weights associated with the DRG system used in each country; however, because of the considerable differences in treatment costs across countries, this approach could potentially generate different case-mix adjustments across the countries.

In the context of the above, the objectives of this study are twofold. First, we demonstrate the application of advanced nonparametric methods to the analysis of hospital efficiency and explain the usefulness of these methods in overcoming the limitations of the conventional nonparametric estimators. In particular, we rely on a partial frontier analysis of order-m [12, 13] to overcome the sensitivity of DEA and FDH to outliers, extreme values, and random shocks, which are common in hospital datasets. The estimator based on a partial frontier does not envelop all data points and is therefore more robust to outliers than the conventional nonparametric estimators. Additionally, we rely on the conditional approach to incorporate hospital and regional characteristics into the estimation of the efficient frontier (see the excellent review of the literature on the conditional approach by Bădin et al. [14]). Because hospitals facing different operational environments are characterized by different attainable frontiers, adjusting the efficiency estimates to the environmental conditions provides the conditional efficiency measure, which may deviate from the naïve unconditional measure. To identify the relation of the hospital and regional characteristics to the attainable efficient frontier, we regress the ratios of conditional to unconditional efficiency measures on the set of external variables. Despite its advantages over the conventional nonparametric approaches, the conditional approach based on the partial frontier has only rarely been applied in the health care setting [15, 16].

Our second objective is to demonstrate how an empirical comparison of hospital efficiency can be conducted using the example of two European countries: Italy and Germany. There have been several analyses of hospital efficiency within Italy [17–23] and within Germany [24–31]. In this study, we analyze the hospital efficiency in both countries by projecting the observations on the common frontier. Doing so may uncover efficiency improvement potential beyond that revealed using single-country analyses because there may be a better benchmark in another country. To assure the comparability of hospital outputs in two countries with different DRG systems, we adjust the number of inpatient discharges for case-mix severity based on the average length of stay (LOS) in different diagnostic groups [24]. Using this approach to adjust the discharges allows us to generate the same weights across the analyzed countries, and it is more precise than the approach selected in previous studies based on a small number of output categories.

The study is structured as follows. The next section briefly describes the prominent features of the German and Italian hospital sectors. Section 3 introduces the methodology. Section 4 explains our selection of input, output, and environmental variables and describes data challenges encountered in the cross-country context. Section 5 presents hospital efficiency estimates and examines the relation between efficiency and environmental variables. Finally, the study concludes with a brief discussion of implications and provides guidelines for scholars wishing to perform similar analyses.

2 Institutional background on Italian and German hospitals

2.1 Organization

Italy’s health-care system is a regionally organized National Health Service that provides universal coverage and is largely free of charge at the point of service [32]. Full jurisdiction over health care belongs to the regions, which set goals and conduct planning. The regions fund their spending with regional taxes and user charges. Investments in the health care infrastructure are regulated by the Center for Evaluation and Verification of Public Investments supervised by the Ministry of Health [32].

In Germany, the health care system is based on Statutory Health Insurance (SHI), which is funded through contributions from employers and employees. The SHI covers approximately 90 % of the population; the rest of the population is mainly privately insured. Hospital inpatient stays are covered at the same rates for statutorily and privately insured patients and require a co-payment of 10€ per hospital day for SHI patients. In Germany, the states are responsible for planning inpatient capacities and financing hospital investments.

2.2 Reimbursement

To encourage more efficient utilization of health care resources, both countries have adopted DRG-based reimbursement systems. In Italy, the transition to the DRG-based payment system occurred at the end of 1994, whereas Germany transitioned to the DRG-based system only in 2004. The introduction of the DRG-based system in Italy led to a swift decrease in the average LOS in acute hospital care [17], which amounted to 6.7 days in 2010 [33]. In Germany, the average LOS in acute hospital care in 2010 was 8.1 days [33].

In an attempt to reduce excess capacity, many Organisation for Economic Co-operation and Development (OECD) countries have reduced their numbers of beds in recent decades. Indeed, the number of hospital beds per 1000 residents decreased from 6.6 in 1994 to 3.6 in 2010 in Italy and from 9.7 to 8.3 in the same time period in Germany [33]. Piacenza et al. [34] evaluated the outcomes of bed downsizing in Italy and determined that because of the lack of substitutability between capital and labor in hospitals, reducing the number of beds without a simultaneous reduction in the medical staff could not successfully reduce inefficiencies. In fact, although Italy had a substantially lower number of hospital beds per 1000 residents than Germany in 2010, the number of physicians per 1000 residents employed in hospitals in 2010 was 2.1, which was slightly higher than the 1.9 physicians per 1000 residents in Germany in the same year [33].

Volume-driven profits associated with DRG reimbursement might incentivize hospitals to induce demand for their services. Although there has been a general trend toward reducing inpatient hospitalizations in Italy, Germany has indeed observed an upward trend in inpatient discharges over the past two decades. As of 2010, Germany achieved almost 24,000 inpatient care discharges per 100,000 residents and thus had the second-highest rate (after Austria) of all the OECD countries, whereas Italy had only approximately 13,130 inpatient care discharges per 100,000 residents, which is below the OECD average rate of 16,000 discharges per 100,000 residents [33].

2.3 Ownership

Over the past two decades, the hospital sectors in both countries have experienced extensive changes in their regulatory and competitive environments. To enhance their competitive position, hospitals have engaged in mergers, acquisitions, and cooperative agreements. Both countries have also experienced a wave of privatizations in this time period, which have generally aimed to improve the efficiency of the hospital sector [17].

In Italy and Germany, hospital care is delivered by hospitals under three forms of ownership: public, private non-profit, and private for-profit. In both countries, private for-profit and private non-profit hospitals are reimbursed by the DRG system and receive funding for infrastructure in the same way as public hospitals. Most patients treated by private for-profit and private non-profit hospitals are paid for by public payers. Table 1 summarizes the main characteristics of the health care systems in both countries.

Table 1 Country facts

3 Methods

The purpose of this section is to describe the methodology in a non-technical way and to provide direction to other analysts wishing to perform the analysis of hospital efficiency. For technical details, the reader is invited to consult the Appendix; moreover, the references in this section identify studies that provide an extensive description of the related concepts. We start by delineating the advantages of partial frontier methods over the conventional full frontier methods, such as DEA and FDH. Next, we present the conditional efficiency measure, which incorporates hospital and regional characteristics, and explain its difference from the unconditional efficiency measure. Subsequently, we explain how performing a nonparametric regression of the ratios of conditional to unconditional efficiency estimates on the hospital and regional characteristics is useful in understanding the effect of external factors on production possibilities. We conclude this section by describing the empirical application of the conditional approach based on the partial frontier.

3.1 The idea of partial frontier analysis

Nonparametric estimators based on envelopment techniques, such as DEA and FDH, are frequently used to examine the efficiency of production units by analyzing a vector of inputs X utilized to produce a vector of outputs Y. However, the conventional estimators are deterministic because all observations belong to the production set and all deviations from the efficient frontier are regarded as inefficiency [4]. As a consequence, the estimators based on full frontier are sensitive to extreme values and outliers, which can have a large effect on the efficient frontier. To obtain estimates that are less sensitive to outliers, Cazals et al. [12] suggested estimating a partial efficiency measure by comparing a unit operating at level (x, y) to m randomly selected peers. The derived frontier is less deterministic because the benchmark is set according to the partial frontier of m units, which does not need to envelop all observations [35]. The parameter m has a dual function [4]. First, the value of m determines the degree of robustness. For m = 1, the m-frontier characterizes an average production frontier. To obtain a robust estimator, a large value of m is required. For m approaching infinity, the order-m efficiency estimator converges to the full frontier FDH measure. The second function of the parameter m concerns its economic interpretation as the number of potential competitors against which the firm is benchmarked. Thus, the choice of m in empirical applications can be guided by either statistical properties (the required degree of robustness) or economic considerations (how many competitors are needed to provide a realistic benchmark).
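The benchmarking idea behind the order-m estimator can be illustrated with a minimal Monte Carlo sketch. It is not the estimator code used in this study: the function names, the number of repetitions (`draws`), the seed, and the five-hospital dataset are all hypothetical and chosen only to show how an outlier drives the full frontier but has a limited effect on a partial frontier with small m.

```python
import random

def peer_ratios(x0, y0, X, Y):
    """Output ratios min_j(Y_ij / y0_j) for peers using no more of any input than x0."""
    return [
        min(yi / y0j for yi, y0j in zip(y, y0))
        for x, y in zip(X, Y)
        if all(xi <= x0i for xi, x0i in zip(x, x0))
    ]

def fdh_output_efficiency(x0, y0, X, Y):
    """Full-frontier (FDH) benchmark: the best of all comparable peers."""
    return max(peer_ratios(x0, y0, X, Y))

def order_m_output_efficiency(x0, y0, X, Y, m, draws=200, seed=0):
    """Average, over `draws` repetitions, of the best of m randomly drawn peers."""
    rng = random.Random(seed)
    ratios = peer_ratios(x0, y0, X, Y)
    total = 0.0
    for _ in range(draws):
        # Draw m peers with replacement and benchmark against the best of them.
        total += max(rng.choices(ratios, k=m))
    return total / draws

# Hypothetical data: five hospitals with identical input; the last one is an outlier.
X = [[1.0]] * 5
Y = [[10.0], [12.0], [11.0], [9.0], [30.0]]
x0, y0 = [1.0], [10.0]

fdh = fdh_output_efficiency(x0, y0, X, Y)
eff_m1 = order_m_output_efficiency(x0, y0, X, Y, m=1)
eff_m1000 = order_m_output_efficiency(x0, y0, X, Y, m=1000)
```

In this toy example the FDH score is set entirely by the outlier, the m = 1 score reflects the average peer, and the m = 1,000 score coincides with the FDH score, in line with the convergence property described above.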

Additionally, the conventional nonparametric methods are affected by the curse of dimensionality, whereby the rate of convergence deteriorates as the dimension in the input-output space increases [4]. The rate of convergence indicates the likelihood of obtaining sensible results using finite samples. Unlike the DEA or FDH estimators, the rate of convergence of the partial frontier estimator does not depend on the number of input or output dimensions [4]. In this way, the partial frontier estimator avoids the curse of dimensionality.
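For orientation, the convergence rates usually quoted in the frontier literature can be summarized as follows, for p inputs, q outputs, and n observations. They are restated here from that literature rather than derived, they hold only under regularity conditions not discussed in this section, and the hat notation is introduced here for illustration only.

$$ \hat{\lambda}_{FDH}-\lambda = O_p\left(n^{-1/(p+q)}\right),\qquad \hat{\lambda}_{DEA}-\lambda = O_p\left(n^{-2/(p+q+1)}\right),\qquad \hat{\lambda}_{m}-\lambda_m = O_p\left(n^{-1/2}\right) $$

The first two rates slow down as p + q grows, whereas the order-m rate does not involve the dimension. Note that the order-m estimator converges to the order-m frontier for the chosen m, not to the full frontier.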

In health care settings, the aforementioned advantages of using partial frontiers over full frontiers are particularly useful. The fact that hospitals provide heterogeneous services at various levels of health care leads to some extreme or atypical observations in terms of input and output combinations. Additionally, errors are quite common in hospital data, which may also cause outliers lying beyond the true efficient frontier. Therefore, relying on partial frontier analysis in empirical applications is convenient because it is less sensitive to extreme values and outliers than are deterministic full frontier estimators. Moreover, avoiding the curse of dimensionality means that in empirical applications, a large sample size is not necessary to achieve a high level of statistical confidence. This can be particularly beneficial in situations in which the sample size is limited by the institutional characteristics of the hospital sector—for example, when there is only a small number of comparable hospitals in a country.

3.2 The conditional approach

A hospital’s attainable efficiency is often influenced by its operational environment, which, unlike inputs and outputs, is not controlled by the hospital. For example, hospitals in disadvantaged neighborhoods may have a more severe patient case-mix, which will in turn affect those hospitals’ production possibilities. The conventional nonparametric approaches do not provide a satisfactory technique to incorporate and interpret the relation of external factors to efficiency [36]. Cazals et al. [12] and Daraio & Simar [13] introduced the conditional measure of efficiency, whereby the estimate of efficiency was adapted to the potential presence of favorable or unfavorable environmental conditions.

To illustrate the idea of the conditional approach compared to conventional approaches, we describe a simple data generating process (DGP) (inspired by Bădin et al. [14]) that is applicable to the hospital environment. Thus, consider a production technology in which the input value for all hospitals (n = 200) is standardized to one (X ≡ 1) and is used to produce the value of output Y, for example, the number of treated patients. Assume that the environmental factor Z represents a hypothetical case-mix of patients residing in the hospital’s neighborhood, where higher values of Z represent a more severe patient case-mix (Z is uniformly distributed with Z ∈ [0, 50]). Suppose that there is no effect of patient case-mix at small values of Z, but that there is a negative effect on output Y for values of Z larger than 20 (as illustrated in the left panel of Fig. 1). Thus, for Z ≤ 20, the amount of produced output Y is independent of Z and equals 40 - U, where U represents the inefficiency of a hospital, which is exponentially distributed with the mean of 3. In the left panel of Fig. 1, we highlight the most efficient observations (U smaller than the first-quartile value of 1.05) in black. Note that according to our DGP, the maximum value of Y in the absence of inefficiency (U = 0) equals 40. However, for Z > 20 the amount of output Y is determined by the expression \( Y = 40 + 20^{0.9} - Z^{0.9} - U \) and thereby decreases with Z. In this situation, hospitals located in neighborhoods with a more severe case-mix can treat fewer patients than hospitals with a less severe case-mix, because these hospitals have to manage additional comorbidities.
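A data set of this kind can be generated with a few lines of Python. The sketch below is an illustration, not the simulation code behind Fig. 1: it takes the frontier for Z > 20 to be 40 + 20^0.9 - Z^0.9, which joins the flat segment continuously at Z = 20, and the function name, seed, and tuple layout are assumptions.

```python
import random

def simulate_hospitals(n=200, seed=1):
    """Return a list of (x, y, z) tuples following the DGP described in the text."""
    rng = random.Random(seed)
    hospitals = []
    for _ in range(n):
        z = rng.uniform(0.0, 50.0)        # case-mix severity, uniform on [0, 50]
        u = rng.expovariate(1.0 / 3.0)    # inefficiency, exponential with mean 3
        if z <= 20.0:
            frontier = 40.0               # no case-mix effect
        else:
            frontier = 40.0 + 20.0 ** 0.9 - z ** 0.9   # frontier falls with z
        hospitals.append((1.0, frontier - u, z))        # input standardized to one
    return hospitals

hospitals = simulate_hospitals()
```

Because U is non-negative, no simulated hospital lies above its case-mix-specific frontier, which is the feature the conditional approach exploits.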

Fig. 1

The illustration of the conditional approach based on simulated data (adapted from Bădin et al. [14]). Note: The left panel illustrates the difference between the unconditional efficiency estimate, λ(x, y), and conditional efficiency estimate, λ(x, y| z). The right panel plots the values of R(x, y| z) against Z and shows the nonparametrically fitted line.

In a conventional nonparametric analysis, the efficiency, λ(x, y), of a hospital producing at (x, y, z) can be expressed using the distance (solid line) to the attainable frontier defined by the maximum value of Y (dotted line). However, the fact that the patient case-mix has an effect on hospital output causes a shift in the attainable frontier. In this case, the so-called separability condition does not hold, and λ(x, y) will overestimate the true inefficiency of the hospital (x, y, z). Note that in the output-oriented framework, higher values of λ(x, y) represent higher inefficiency.

According to the conditional approach, the measure of efficiency is adapted to the influence of the environmental factor Z. The hospital (x, y, z) is thus benchmarked according to similar hospitals, which in this case are characterized by a similar patient case-mix. The interval that includes similar hospitals is defined by the value of the bandwidth factor, h. Bandwidth selection is a key component of sound nonparametric analysis. The data-driven approach to optimal bandwidth selection was described in Bădin et al. [36]. In our generated dataset, h of the hospital (x, y, z) is roughly equal to 2, meaning that a hospital with a case-mix of 34 will be compared to hospitals with a case-mix in the interval between 32 and 36 because of the additional condition |Z - z| ≤ h. Thus, the conditional measure of efficiency, λ(x, y| z), which takes Z into account, estimates the efficiency of hospital (x, y, z) based on the distance (dashed line) to the conditional attainable set represented by the solid gray line in the left panel of Fig. 1. Therefore, in cases in which Z causes a shift in the attainable frontier, the conditional approach adjusts the estimate of efficiency to Z and in this way better captures the true efficiency than the traditional DEA/FDH methods.
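The effect of the additional condition |Z - z| ≤ h can be seen in a small full-frontier sketch for the single-input case (X ≡ 1), where every hospital is a valid input peer. The six hospitals below are hypothetical, the window is a simple uniform one with a fixed h = 2 instead of a data-driven bandwidth, and the function names are illustrative.

```python
def unconditional_efficiency(y0, Y):
    """lambda(x, y): best output of any hospital relative to y0 (all inputs equal)."""
    return max(Y) / y0

def conditional_efficiency(y0, z0, Y, Z, h):
    """lambda(x, y | z): best output among hospitals with |Z - z0| <= h, relative to y0."""
    peers = [y for y, z in zip(Y, Z) if abs(z - z0) <= h]
    return max(peers) / y0

# Hypothetical (output, case-mix) pairs; the evaluated hospital has y = 27 and z = 34.
Y = [40.0, 38.0, 30.0, 27.0, 29.0, 24.0]
Z = [5.0, 15.0, 32.0, 34.0, 36.0, 45.0]

lam = unconditional_efficiency(27.0, Y)                     # benchmark: y = 40
lam_z = conditional_efficiency(27.0, 34.0, Y, Z, h=2.0)     # benchmark: y = 30
ratio = lam_z / lam
```

The evaluated hospital is compared only with the hospitals whose case-mix lies between 32 and 36, so its conditional score (30/27) is much closer to one than its unconditional score (40/27).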

The theoretical background on the consistency and asymptotic properties of the conditional efficiency estimators was provided by Jeong et al. [37]. The mathematical derivation of conditional efficiency measure is provided in the Appendix.

3.3 Examining the influence of external variables on the frontier shift

Most nonparametric applications rely on a two-stage approach to examine the influence of external variables on efficiency [38]. In the two-stage approach, the efficiency estimates derived from the DEA or FDH analysis in the first stage are regressed on a set of environmental variables in the second stage (for further discussion of the two-stage approach, see [39]). However, the two-stage approach becomes problematic when the environmental factors are related to the hospital’s inputs or outputs, which define the attainable production set [38, 39]. In this case, the separability condition does not hold, which results in the shift of the attainable frontier (as illustrated in the left panel of Fig. 1). The formal test of the separability condition is described in Daraio et al. [40].

To examine the potential influence of a set of environmental variables on the shift of the attainable frontier, Daraio & Simar [13] suggested nonparametrically regressing the ratios of conditional to unconditional efficiency measures, \( R\left(x,y|z\right)=\frac{\lambda \left(x,y|z\right)}{\lambda \left(x,y\right)} \), on the set of environmental variables. The individual effects of environmental variables can be visualized and interpreted using partial nonparametric regression plots. The right panel of Fig. 1 shows a nonparametric regression plot of the effect of Z on R(x, y| z) from our hypothetical example. As expected from our simulated data sample, there is no effect of Z on the efficient frontier for the values of Z less than or equal to 20 (as evidenced by the horizontal regression line). A decreasing regression line for the values of Z greater than 20 indicates an unfavorable influence of Z on hospital efficiency. In other words, the complexity of the patient case-mix represents the production of an unavoidable output (e.g., managing additional comorbidities), which diminishes a hospital’s production possibilities.
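The second-stage regression can be mimicked on simulated data with a simple kernel smoother. The sketch below substitutes a Nadaraya-Watson estimator with a Gaussian kernel and a fixed bandwidth for the regression procedure used in the paper, uses full-frontier scores with a uniform window of h = 2, and takes the frontier for Z > 20 to be 40 + 20^0.9 - Z^0.9. All names, seeds, and bandwidths are assumptions made for illustration.

```python
import math
import random

def simulate(n=200, seed=1):
    """Simulated outputs Y and case-mix values Z with the input fixed at one."""
    rng = random.Random(seed)
    Z = [rng.uniform(0.0, 50.0) for _ in range(n)]
    Y = []
    for z in Z:
        frontier = 40.0 if z <= 20.0 else 40.0 + 20.0 ** 0.9 - z ** 0.9
        Y.append(frontier - rng.expovariate(1.0 / 3.0))
    return Y, Z

def efficiency_ratios(Y, Z, h=2.0):
    """R = lambda(x, y | z) / lambda(x, y) for every hospital."""
    y_max = max(Y)
    R = []
    for y0, z0 in zip(Y, Z):
        cond = max(y for y, z in zip(Y, Z) if abs(z - z0) <= h) / y0
        uncond = y_max / y0
        R.append(cond / uncond)
    return R

def kernel_fit(z_eval, Z, R, bandwidth=3.0):
    """Nadaraya-Watson estimate of E[R | Z = z_eval] with a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((z - z_eval) / bandwidth) ** 2) for z in Z]
    return sum(w * r for w, r in zip(weights, R)) / sum(weights)

Y, Z = simulate()
R = efficiency_ratios(Y, Z)
fit_low = kernel_fit(10.0, Z, R)    # region without a case-mix effect
fit_high = kernel_fit(45.0, Z, R)   # region with an unfavorable case-mix effect
```

A fitted value close to one at Z = 10 and a clearly lower value at Z = 45 reproduce the flat-then-decreasing pattern described for the right panel of Fig. 1.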

The analysis based on the robust order-m frontier provides very similar results because our simulated dataset does not contain outliers. However, the results based on the average frontier (m = 1) may be informative in situations in which m has an effect on the distribution of inefficiencies. The average frontier can be examined if the researcher has a clear picture of the separability condition [41].

3.4 The application of the conditional approach in our analysis

In the efficiency analysis, the analyst decides whether to select the input- or the output-oriented model. In our study, we apply the output-oriented model because regulatory issues and labor laws in both countries fix hospital inputs (i.e., the numbers of beds and personnel) in the short run, and the objective becomes the maximization of output using a given number of inputs [17]. In the output-oriented framework, an efficiency estimate above 1 gives the factor by which all outputs of a hospital would have to be expanded radially for the hospital to reach the efficient frontier.

To obtain a robust measure of the efficient frontier, we use partial frontier analysis of order-m. Because the partial frontier does not envelop all observations, some efficiency estimates can be smaller than one, representing observations that are considered super-efficient with respect to the order-m frontier. We select m = 1,000 as the optimal number of reference hospitals because further increases in m result in only a slight decrease in the proportion of super-efficient units [3]. We further use the partial frontier analysis to identify potential outliers by increasing the value of m and checking which observations remain beyond the frontier [42]. The outlier detection procedure is necessary when working with real-life hospital data.
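The choice of m can be explored by tracking the share of super-efficient units over a grid of m values. The sketch below is a hedged illustration on hypothetical single-input, single-output data. Instead of Monte Carlo draws it uses the closed-form expected maximum of m draws with replacement from the empirical set of peer ratios, and it counts a unit as super-efficient only if its score falls below 1 by more than a small tolerance `tol`. Both choices, as well as all names and the simulated data, are assumptions of this example.

```python
import random

def order_m_score(ratios, m):
    """Expected maximum of m draws with replacement from the peer ratios."""
    r = sorted(ratios)
    k = len(r)
    # P(max <= r[j]) = ((j + 1) / k) ** m, so each term weights r[j] by P(max == r[j]).
    return sum(r[j] * (((j + 1) / k) ** m - (j / k) ** m) for j in range(k))

def super_efficient_share(X, Y, m, tol=1e-6):
    """Share of units whose order-m output score is below 1 - tol."""
    count = 0
    for x0, y0 in zip(X, Y):
        ratios = [y / y0 for x, y in zip(X, Y) if x <= x0]   # peers with no more input
        if order_m_score(ratios, m) < 1.0 - tol:
            count += 1
    return count / len(X)

# Hypothetical sample: 100 units, output grows with input and includes inefficiency.
rng = random.Random(42)
X = [rng.uniform(1.0, 10.0) for _ in range(100)]
Y = [x ** 0.5 * rng.uniform(0.5, 1.0) for x in X]

grid = (1, 10, 100, 1000)
shares = {m: super_efficient_share(X, Y, m) for m in grid}
```

Plotting `shares` against m and looking for the point at which the curve flattens is one way to operationalize the selection rule described above; units that remain super-efficient for large m are candidates for the outlier check.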

To obtain the conditional efficiency estimates, we adapt the code written in R by De Witte & Kortelainen [43]. The primary challenge of the conditional analysis is to estimate observation-specific bandwidths for continuous and discrete environmental variables. De Witte & Kortelainen [43] suggested using the kernel function of Li & Racine [44] because it can accommodate continuous and ordered and unordered discrete environmental variables. The R ‘np’ package [45] can be used to estimate the appropriate bandwidths. This package is also useful for estimating the nonparametric regression of the ratios of conditional to unconditional efficiency estimates both to determine the significance of individual environmental variables and to produce the partial regression plots, which identify the direction of the influence of the environmental variable on the shift of the efficient frontier.

4 Data sources and definitions

The following section describes how the data challenges in international hospital comparisons are addressed in the context of the Italian and German hospital sectors. The data used in this analysis come from the calendar year 2010. The Italian hospital dataset includes only hospitals that are accredited by the National Health Service and is obtained from the national database of hospital discharges. The German hospital data are retrieved from the structured quality reports, which all acute care hospitals in Germany are required to publish.

A regional analysis across countries can be complicated by differences in the interpretation of the term “region”. In this study, we use the standardized classification of regions at two territorial levels developed by the OECD to facilitate international comparisons. Territorial level 2 represents the higher level of regional classification and consists of macro-regions (20 regions in Italy and 16 states in Germany), while Territorial level 3 represents the lower-level micro-regions and is composed of 96 spatial planning regions in Germany and 107 provinces in Italy. The variables used to account for a hospital’s operating environment in the conditional analysis are at Territorial level 3, which provides more precision than Territorial level 2, and come from the OECD Regional Data [46].

In our efficiency analysis, the DMU is an acute care hospital excluding psychiatric facilities, hospitals exclusively providing neurological beds, prevention and rehabilitation facilities, and long-term nursing care facilities. We examine the dataset for the presence of erroneous records, such as missing inpatient diagnoses or unrealistic numbers. Consequently, we exclude all hospitals with missing values in the input-output set and hospitals with zero values for beds, physicians, nurses, or inpatient days. The day cases are permitted to have minimum values of zero because some hospitals may choose not to perform certain services [8]. Moreover, we delete hospitals with fewer than 25 beds from the sample because of their often-atypical service provision. In the last step, we exclude outliers using partial frontier analysis, as described in Section 3.4. The final sample in our analysis comprises 920 Italian and 1381 German hospitals.

4.1 Input and output variables

We include three inputs and two outputs to describe hospital production technology. The selected inputs and outputs are in line with the previous literature (see the review of selected applications of frontier efficiency measurement by Worthington [47]). In our study, the inputs consist of the number of hospital beds and the head counts of physicians and nurses. The measure of full-time equivalent (FTE) units would be preferable; however, this measure is not available for Italy. In contrast, the number of German personnel is reported in FTEs, inducing us to convert the German personnel numbers into head counts using the corresponding ratios of FTEs to head counts for physicians and nurses employed in German hospitals from the OECD Health Statistics [33]. The adjustment makes German and Italian staff levels comparable but does not substantially affect individual hospital rankings.

The two output measures are represented by day cases and inpatient discharges adjusted for case-mix. We define day cases as hospital episodes in which a patient stayed less than 24 h in a hospital but the provided treatment requires substantially more resources than outpatient visits. Day cases include, for example, day surgery, oncological treatments, pain therapy, HIV therapy, and dialysis. This definition of day cases is similar in Germany and Italy and consistent with the definitions in other European countries [see, e.g., 48].

For the second output measure, inpatient discharges, we utilize the case-mix adjustment based on the relative LOS in different diagnostic categories following Herr [24]. First, we aggregate hospital discharges to 130 diagnostic groups listed in the International Shortlist for Hospital Morbidity Tabulation (ISHMT). The ISHMT accommodates for countries using different International Classification of Diseases (ICD) revisions, as is the case with Italy using the ICD-9-CM and Germany using the ICD-10-GM. Then, we estimate an index of discharge weights for each diagnostic category, d, as follows:

$$ {\pi}_d=\frac{LOS_d}{\frac{1}{D}\sum_{d'=1}^{D}{LOS}_{d'}},\quad \text{for } d=1,\dots, D, $$

where \( LOS_d \) represents the cross-country average LOS in an ISHMT diagnostic category, d. Overall, we use the LOS for 130 diagnostic categories (D = 130) from OECD Health Statistics for 2010 [33]. A weight \( \pi_d \) larger (smaller) than one indicates a treatment that requires a longer (shorter) hospital stay than the average LOS. Subsequently, for each hospital i, the number of hospital discharges in each of the 130 diagnostic groups is adjusted by the corresponding weight \( \pi_d \) and aggregated to produce the measure of inpatient discharges adjusted for case-mix as follows:

$$ Inpatient\ {adjusted}_i=\sum_{d=1}^D{\pi}_d\times {inpatient}_{d,i}. $$
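The two formulas above amount to a short computation, sketched here with three hypothetical diagnostic groups instead of the 130 ISHMT categories. The group labels, LOS values, and discharge counts are invented for illustration and are not taken from the OECD data.

```python
def discharge_weights(los_by_group):
    """pi_d = LOS_d divided by the unweighted mean LOS over all diagnostic groups."""
    mean_los = sum(los_by_group.values()) / len(los_by_group)
    return {d: los / mean_los for d, los in los_by_group.items()}

def adjusted_inpatient(discharges_by_group, weights):
    """Case-mix-adjusted discharges of one hospital: sum over d of pi_d * inpatient_d."""
    return sum(weights[d] * n for d, n in discharges_by_group.items())

# Hypothetical cross-country average LOS (days) per diagnostic group.
los = {"A": 2.0, "B": 6.0, "C": 10.0}
weights = discharge_weights(los)             # A: 1/3, B: 1, C: 5/3

# Hypothetical raw discharges of one hospital.
hospital = {"A": 300, "B": 100, "C": 60}
adjusted = adjusted_inpatient(hospital, weights)
```

In this example the hospital has 460 raw discharges but only 300 adjusted discharges, because most of its cases fall into the group with a below-average LOS.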

Since its introduction by Herr [24], this measure of case-mix adjustment has been applied in a number of studies (e.g., [26, 49]). We conduct a robustness test to check whether the measure of case-mix severity based on the LOS captures the intensity of resource use. The Pearson correlation between our case-mix index and the aggregated input measure (the aggregation procedure is described in Section 5.2) is 0.58 and is highly significant. Therefore, this measure of case-mix severity represents a good proxy for resource use.

4.2 Operational environment

We include three hospital characteristics and five regional characteristics in the conditional efficiency model. The first hospital characteristic is a discrete ordered variable representing the bed size category of a hospital. Bed size category is often related to the different levels of acute inpatient and day surgery services, in which the complexity of the provided care often increases with hospital size. Moreover, large hospitals often provide additional services, such as teaching or research.

Second, we differentiate among three types of ownership: public, private for-profit, and private non-profit. Hospital ownership is associated with hospital mission and is often related to incentive structures that reward efficient performance. Economic theories (agency theory, property rights theory, public choice theory) suggest that because of the differences in incentives and budget constraints, private for-profit hospitals perform more efficiently than private non-profit and public hospitals, with public hospitals being the least efficient [28]. Although there are standardized definitions of the three ownership types in Germany, the classification of Italian hospitals by ownership is less straightforward. We classify the Italian hospitals into the three types of ownership along the lines of the approach suggested by Barbetta et al. [17].

The third hospital characteristic is the degree of hospital specialization. The direction of the association between specialization and efficiency is theoretically unclear. The advantages of operational specialization include reduced complexity, lower uncertainty, development of specialized expertise, and leveraging task repetition; however, the advantages of greater breadth are related to the economies of scope achieved by sharing common resources across activities [50]. Moreover, a hospital specializing in one discipline but containing all of the departments of a general hospital may appear inefficient because some labor (or capital) inputs, such as specialty physicians or medical technologies, cannot be substituted across functions [23]. In this study, we estimate hospital specialization based on the concentration of discharges in 130 diagnostic groups using the version of the Gini Index described in Kobel & Theurl [51]. Those authors showed that this index outperforms the traditional measures of specialization (the Herfindahl-Hirschman Index (HHI), the Distance measure, and the Information Theory Index), although all the measures show a high level of concordance. The maximum value of specialization is one, indicating the highest concentration of services; the minimum value is zero, representing complete diversification.

In addition to hospital characteristics, our model includes the following regional characteristics: market concentration, degree of urbanization, income, population age structure, and a country dummy. The inclusion of these regional characteristics is consistent with previous studies [52, 53]. To capture the degree of market concentration within a hospital-unique area, we use the HHI defined by a fixed radius of 32 km. We further adjust the measure of market concentration by the number of inpatient discharges in 20 major ICD chapters to account for the fact that hospitals do not compete for generic patients [54]. The degree of urbanization is defined as an ordered categorical variable ranging in our study from 1 = “rural remote” to 4 = “urban” based on the regional typology used by the OECD Regional Statistics [46]. Income is measured as the gross domestic product (GDP) per capita and is used as a proxy for socioeconomic differences among regions. Population age structure is measured as the percentage of the population over 65 years old and is used as a control for patient case-mix. Finally, a country dummy captures all other differences between the two countries.
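
A fixed-radius HHI of the kind described above can be sketched as follows. This is only an illustration under several assumptions: market shares are computed from total discharges, distances are great-circle (haversine) distances between hypothetical coordinates, and the additional adjustment by ICD chapter is omitted. None of the names or numbers below come from the study.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def fixed_radius_hhi(lat, lon, discharges, radius_km=32.0):
    """HHI of the market formed by all hospitals within radius_km of each hospital."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    discharges = np.asarray(discharges, dtype=float)
    hhi = np.empty(len(lat))
    for i in range(len(lat)):
        in_market = haversine_km(lat[i], lon[i], lat, lon) <= radius_km
        shares = discharges[in_market] / discharges[in_market].sum()
        hhi[i] = np.sum(shares ** 2)  # 1.0 means a local monopoly
    return hhi

# Hypothetical example: two neighbouring hospitals and one isolated hospital.
lat = [50.0, 50.0, 45.0]
lon = [8.0, 8.1, 8.0]
discharges = [100, 300, 500]
hhi = fixed_radius_hhi(lat, lon, discharges)
print(hhi)
```

The two neighbouring hospitals share one market (shares 0.25 and 0.75), whereas the isolated hospital is the only provider within its radius.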

5 Results and discussion

5.1 Descriptive statistics

Table 2 presents the summary statistics of the variables used in the unconditional and conditional models. In the descriptive analysis of hospital characteristics, we observe that Italy has more hospitals in the smallest bed category than Germany and that most Italian hospitals have between 50 and 150 beds. In Germany, most hospitals have between 150 and 400 beds. The average degree of specialization is similar in both countries and is approximately 40 %. In terms of ownership type, most Italian hospitals in our sample are under public ownership, followed by private for-profit ownership, and only 63 hospitals are under private non-profit ownership. In contrast, most German hospitals in the sample are under private non-profit ownership, followed by public ownership and then by private for-profit ownership.

Table 2 Descriptive statistics

Considering regional characteristics, the hospital market in Italy is more concentrated than the German hospital market. Most hospitals in the Italian sample are located in intermediately urban regions, whereas in the German sample, the majority of hospitals are located in urban regions. Furthermore, the German regions have a higher median income. The regional age structure is similar in both countries.

5.2 Efficiency based on the national frontiers and the common frontier

In the following section, we present the unconditional efficiency estimates based on two frontier measures: a national frontier includes only observations from the same country, whereas a common frontier pools the observations from different countries. Comparisons of efficiency using national frontiers can indicate the dispersion of firms away from their national frontier but cannot be used to determine whether firms in one country are more efficient than firms in another country because firms’ efficiency is estimated using different benchmarks [6]. In contrast, comparisons of efficiency using a common frontier can be used to directly compare firms in different countries, provided that the underlying data are comparable.

We first restrict the dataset to hospitals in the same country and estimate the efficiency of hospitals against the national frontiers using the partial order-m methodology with m = 1,000 to obtain robust estimates (Table 3). Recall that in the output-oriented framework, efficiency estimates larger than one represent inefficient observations, whereas efficiency estimates below one represent observations that are considered super-efficient with respect to the order-m frontier. The dispersion of efficiency scores is higher in Italy than in Germany, meaning that there is much more variation in hospital performance in Italy. This high dispersion also explains why Italy has a higher median inefficiency than Germany (1.16 versus 1.05): in the nonparametric analysis, the relatively more efficient hospitals set the benchmark for the relatively less efficient hospitals, so individual hospitals appear more efficient when the hospitals in the reference set are similar [6].
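
To make the interpretation of the scores concrete, the following sketch shows a standard Monte Carlo approximation of the output-oriented order-m estimator. It is an illustration rather than the code used in this study: the function name, the number of replications, and the toy data are assumptions, and all outputs are assumed to be strictly positive.

```python
import numpy as np

def order_m_output_efficiency(X, Y, m=1000, n_rep=200, seed=0):
    """Monte Carlo order-m output scores (>1 inefficient, <1 super-efficient)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    scores = np.empty(X.shape[0])
    for i in range(X.shape[0]):
        # Peers that use no more of any input than hospital i (always includes i).
        peers = np.flatnonzero(np.all(X <= X[i], axis=1))
        # For each peer: factor by which all outputs of hospital i could be
        # scaled while remaining at or below the outputs of that peer.
        ratios = np.min(Y[peers] / Y[i], axis=1)
        # Draw m peers with replacement, keep the best one, average over n_rep draws.
        draws = rng.integers(0, len(peers), size=(n_rep, m))
        scores[i] = ratios[draws].max(axis=1).mean()
    return scores

# Hypothetical toy data: one input, one output, three hospitals.
X = np.array([[1.0], [2.0], [3.0]])
Y = np.array([[1.0], [4.0], [2.0]])
scores_large_m = order_m_output_efficiency(X, Y, m=1000, n_rep=50)
scores_m1 = order_m_output_efficiency(X, Y, m=1, n_rep=200)
print(scores_large_m, scores_m1)
```

With a large m the estimates approach the full-frontier (FDH) values, here 1, 1, and 2. With a very small m a hospital is often compared with weaker peers only, which is how scores below one arise.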

Table 3 Efficiency estimates and bandwidths

In the next step, we examine the efficiency of hospitals compared against a common frontier. In the nonparametric efficiency estimation, by construction, joining separate datasets cannot improve the efficiency of the individual hospitals in comparison with the analysis based on the separate frontiers because enlarging the reference set can only leave a hospital's benchmark unchanged or make it more demanding [6]. Therefore, the median efficiency estimates based on the common frontier for each country are higher (meaning more inefficiency in the output-oriented model) than the median efficiency estimates based on the national frontiers. The median efficiency estimate based on the common frontier equals 1.19 in Italy and 1.07 in Germany. Whereas the median national efficiency estimates change slightly when we compare hospitals based on the common frontier (in the predicted direction), the hospital rankings in each country remain very similar. For Italian hospitals, the Spearman rank correlation for efficiency estimates based on the common frontier and estimates based on the Italian frontier is 0.95 (the Pearson correlation is also 0.95). For German hospitals, the Spearman rank correlation for the efficiency estimates based on the common frontier and the estimates based on the German frontier is 0.94 (the Pearson correlation is 0.98).
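
The effect of pooling can be illustrated with the full (FDH) frontier, for which the argument holds exactly; partial order-m scores, being averages over random subsamples, need not obey it in every case. The sketch below uses hypothetical data for two countries, A and B, and an illustrative function name; it assumes that every evaluated hospital is itself part of the reference set.

```python
import numpy as np

def fdh_output_scores(X_eval, Y_eval, X_ref, Y_ref):
    """Full-frontier (FDH) output scores of evaluated units against a reference set."""
    X_eval, Y_eval = np.asarray(X_eval, float), np.asarray(Y_eval, float)
    X_ref, Y_ref = np.asarray(X_ref, float), np.asarray(Y_ref, float)
    scores = np.empty(len(X_eval))
    for i in range(len(X_eval)):
        # Reference units that use no more of any input than unit i.
        peers = np.all(X_ref <= X_eval[i], axis=1)
        scores[i] = np.min(Y_ref[peers] / Y_eval[i], axis=1).max()
    return scores

# Hypothetical toy data: one input, one output, two hospitals per country.
X_a, Y_a = np.array([[1.0], [2.0]]), np.array([[1.0], [2.0]])
X_b, Y_b = np.array([[1.0], [2.0]]), np.array([[2.0], [3.0]])
X_all, Y_all = np.vstack([X_a, X_b]), np.vstack([Y_a, Y_b])

national_a = fdh_output_scores(X_a, Y_a, X_a, Y_a)      # country A frontier only
common_a = fdh_output_scores(X_a, Y_a, X_all, Y_all)    # pooled frontier
print(national_a, common_a)
```

Both hospitals of country A are efficient against their national frontier (scores of 1) but inefficient against the pooled frontier (scores of 2 and 1.5), because the pooled reference set contains more productive peers.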

To graphically examine the production frontiers (common and national), we aggregate the three inputs (beds, physicians, nurses) into one input factor and the two outputs (adjusted inpatient discharges, day cases) into one output factor. We use the aggregation procedure based on principal component analysis (PCA), as suggested in Daraio & Simar [4], because there is no clear managerial reason to impose particular weight restrictions. To make the inputs and the outputs unit-free, we first divide each input and output by its mean. Note that because the radial efficiency measures are scale-invariant, the mean standardization does not affect the efficiency scores. Then, the three inputs are aggregated using the eigenvector with the largest eigenvalue of the matrix X^T X as the weights. The resulting univariate input factor is highly correlated with the three original inputs (Pearson correlations are above 0.96). We follow the same procedure with the two outputs to derive an aggregated output factor.
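
The aggregation steps described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name and the toy input data are hypothetical, and any further rescaling of the eigenvector that the original procedure may apply is not reproduced.

```python
import numpy as np

def aggregate_factor(V):
    """Aggregate the columns of V into one factor using the leading eigenvector of V^T V."""
    V = np.asarray(V, dtype=float)
    scaled = V / V.mean(axis=0)              # divide each variable by its mean
    eigvals, eigvecs = np.linalg.eigh(scaled.T @ scaled)
    weights = eigvecs[:, -1]                 # eigh sorts eigenvalues in ascending order
    if weights.sum() < 0:                    # the sign of an eigenvector is arbitrary
        weights = -weights
    return scaled @ weights, weights

# Hypothetical inputs for four hospitals: beds, physicians, nurses.
inputs = np.array([[100, 20, 60],
                   [200, 45, 110],
                   [400, 90, 230],
                   [800, 170, 450]])
input_factor, input_weights = aggregate_factor(inputs)
correlations = [np.corrcoef(input_factor, inputs[:, j])[0, 1] for j in range(3)]
print(input_weights, correlations)
```

Because all entries of the mean-scaled data are positive, the leading eigenvector has components of the same sign, so the aggregate factor is a positively weighted combination of the original variables.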

Figure 2 shows the distribution of the Italian and German hospitals in the production space and plots the partial order-m frontiers (m = 1,000) based on the national frontiers and the common frontier. Note that because we are conducting the efficiency analysis based on the partial order-m frontier and not the full frontier (as in FDH), there are some observations above the efficient frontier (super-efficient observations). Figure 2 illustrates that the national frontiers are very close in the production space; therefore, the assumption of the common technology does not appear to be violated. Moreover, the common frontier provides a more robust reference than the national frontiers because it is based on more information. This robustness is especially important in the production space with few observations, as is the case for larger hospitals in these two countries.

Fig. 2 National and common production possibility frontiers. Notes: Production possibility frontiers are obtained by the partial frontier of order-m (m = 1,000) and are based on the output-oriented model with one aggregate input factor and one aggregate output factor

To summarize, we believe that Italy and Germany can be compared using the common frontier. The unconditional efficiency scores for each country derived using the national or the common frontier are significantly and strongly correlated (correlation coefficients are 0.94 or higher). Moreover, the national and the common frontiers lie close to each other in the two-dimensional production space; the common frontier is the robust average representation of the two national frontiers because it is based on more information. In the following estimations, we assume a common frontier, allowing hospitals from both Italy and Germany to support the efficient frontier.

5.3 Conditional efficiency estimates

The bandwidth values listed in Table 3 provide some indicative evidence for the relation of the operational environment to the production set. High bandwidth values help identify irrelevant external factors for particular observations, which are effectively smoothed out for these observations [36]. In our sample, the optimal bandwidths for market concentration and population age structure are very high even at the third quartile and well beyond the range of these variables. Consequently, most of the bandwidth values for these variables produce oversmoothing. In contrast, small values of optimal bandwidths indicate the influence of an external variable on the production process. Most of the external variables (with the exception of market concentration and population age structure) in our sample have quite small bandwidth values in the three quartiles with respect to the range of these variables.
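
The link between bandwidth size and relevance can be seen from the kernel weights themselves. The sketch below uses a Gaussian kernel purely for illustration; the kernel actually used in the estimation, the variable, and the bandwidth values are assumptions and are not taken from Table 3.

```python
import numpy as np

def gaussian_kernel_weights(z, z0, h):
    """Unnormalized Gaussian kernel weights of observations z around the point z0."""
    z = np.asarray(z, dtype=float)
    return np.exp(-0.5 * ((z - z0) / h) ** 2)

# Hypothetical external variable whose observed range is 1.
z = np.linspace(0.0, 1.0, 11)
small_h = gaussian_kernel_weights(z, z0=0.5, h=0.1)    # bandwidth well inside the range
large_h = gaussian_kernel_weights(z, z0=0.5, h=50.0)   # bandwidth far beyond the range
print(small_h.min(), large_h.min())
```

With the small bandwidth, only hospitals with similar values of the variable receive noticeable weight, so the variable shapes the reference set. With the very large bandwidth, all hospitals receive almost identical weight, and the variable is effectively smoothed out.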

In the conditional efficiency model, we control for the heterogeneity of hospital operational environments using the observation-specific bandwidths described above. The median conditional efficiency estimate for the total sample of 1.03 is lower than the unconditional measure of 1.10 because the reference sample is smaller and includes hospitals with similar characteristics. The median conditional efficiency of German hospitals is 1.01; thus, the median German hospital could increase its outputs by only 1 % when using best practices. In Italy, by contrast, the corresponding median improvement potential is 9 %.

Figure 3 presents the medians of conditional efficiency estimates aggregated at Territorial level 2 (20 regions in Italy and 16 states in Germany). In Italy, the most efficient regions are Liguria, the Aosta Valley, Veneto, and Emilia-Romagna in the North and Apulia and Campania in the South. In Germany, all the new federal states of East Germany (except Berlin) perform well in terms of conditional efficiency, possibly because the modernization and restructuring of the hospital sector after the German reunification put in place new infrastructure and efficient hospital systems [10, 28].

Fig. 3 Median conditional efficiency by region

Note that the conditional efficiency estimates already account for the differences in production possibilities determined by hospital and regional characteristics. This purely descriptive analysis only illustrates the regional differences in the median conditional efficiency; however, the variation within regions is large, and the regional factors per se are not the main reason for the efficiency differences. This will become clear in the next section, which formally explores the relation between the environmental variables and the estimated efficiency scores using nonparametric significance tests.

5.4 Interpretation of operational environment

The results of the nonparametric regression, which uses the ratio of the conditional efficiency estimates to unconditional efficiency estimates as the dependent variable, are listed in Table 4. The significance test for continuous and categorical variables suggests that hospital characteristics play a significant role in explaining the differences in hospital efficiency. In contrast, market concentration, degree of urbanization, income, and population age structure are significant only at the 10 % level. Despite controlling for certain hospital and regional characteristics, some country-specific factors remain unexplained and are captured in the country dummy variable, which is significant at the 0.1 % level.

Table 4 Nonparametric significance analysis of environmental factors

To detect the direction of the relation of the significant environmental variables to efficiency, we examine the partial regression plots, in which all of the exogenous variables are set at their median values (the last column in Table 4 summarizes the observed relations). We find an inverted U-shaped relationship between hospital bed size and efficiency. Hospital efficiency increases with size and peaks for hospitals with 150 to 400 beds, after which point the effect of size on performance becomes negative. Hospitals with more than 650 beds are the least efficient. This finding shows that the smallest hospitals operate below the optimal scale and could still realize economies of scale, whereas the biggest hospitals operate under diseconomies of scale. This finding is consistent with the literature on the cost-size relationship, in which average hospital costs decline with increased hospital size but, after a point, begin to increase with size [55]. Moreover, large hospitals often combine the delivery of patient care with other outputs that are not included in our production model, such as teaching activities or research.

With respect to the type of hospital ownership, private for-profit hospitals tend to be the most efficient, followed by private non-profit hospitals. Public hospitals are the least efficient. Although empirical findings on the association of hospital ownership with efficiency appear mixed (see the review by [56]), our finding is consistent with the economic theory [28] that for-profit hospitals—followed by non-profit hospitals—have greater incentives to control costs and therefore exhibit greater efficiency than public hospitals. Moreover, public owners have limited financial capacity to finance investments in hospital facilities, major technology, and general hospital maintenance. In both countries, the limited financial capacity of public owners is aggravated by the fact that public entities have accumulated substantial public deficits in recent years, making it difficult to maintain public infrastructure. The difficulty in financing hospital investment along with the increased financial risk associated with DRG-based funding have led to a series of hospital mergers and acquisitions, with privatization being a welcome relief for public budgets [57].

The degree of hospital specialization has a negative association with efficiency. Empirical evidence addressing the link between specialization and efficiency is limited and contains conflicting findings. Although some studies found a positive association between specialization and efficiency [e.g., 20], others revealed the exact opposite findings [23, 58]. The theoretical explanation for our result could be that hospitals offering diversified services take advantage of the economies of scope, whereas hospitals specializing in a small number of diagnoses but also providing a number of other services may be inefficient because they maintain labor and capital inputs that are not utilized fully and cannot be used across specialties.

Finally, hospitals in Germany are significantly more efficient than hospitals in Italy. A significant country dummy variable suggests that some unobserved country factors (e.g., institutional arrangements, tax-based versus social insurance funding, gatekeeping) play an important role in hospital efficiency. Whereas the provision of hospital care represents the largest share of the health care system (measured by health expenditures), there exist important cross-country differences in how primary, emergency, rehabilitation, long-term, palliative, mental health, and informal care are provided. The efficiency of the hospital sector is inextricably linked to the organization of other types of health care provision.

5.5 Study limitations

Porter [59] claimed that the appropriate way to measure value in health care should encompass all services and activities that contribute to success in meeting patient needs and advocated the need to change the focus from the volume of services delivered to the quality of the achieved outcomes. Unfortunately, because of the lack of suitable data at the hospital or appropriate regional level (Territorial level 3), we do not control for the quality of outcomes (e.g., mortality, re-admission rate, patient satisfaction) in the main model specification. However, in the sensitivity analysis, we estimate three models including a control for three different quality indicators at Territorial level 2. These indicators are death rates per 100,000 inhabitants (standardized by age and sex) from ischemic heart disease, circulatory system disease, and cerebrovascular disease. The results show that the quality variables are not significant in any of the three efficiency models.

Another limitation of our analysis is the lack of additional variables to comprehensively describe the production set. For example, we do not have data on inputs such as other medical and non-medical staff or technical equipment. We also do not possess information on outputs such as the number of outpatient visits or research and teaching activities. Future research would benefit from extended comparative statistics, provided these statistics are reported using internationally standardized definitions.

The results of a cross-country comparison of hospital efficiency may also be skewed when the patterns of health care provision in other types of care (e.g., primary care) vary across countries. For instance, we cannot make any judgments about the necessity of inpatient or day-case treatments and the extent of supplier influence on the demand for medical services. The problem of supplier-induced demand is particularly acute for countries without a gatekeeping system, such as Germany, but is less critical in Italy, where general practitioners control the utilization of hospital care [32]. Differences in practices in primary care create different demand pressures on inpatient care, which in turn may have an impact on the efficiency of hospitals in these two countries.

There remains a possibility that the case-mix measure of output adjustment based on the LOS will underestimate the complexity of some diagnoses that require particularly intensive care not captured by the LOS, which represents a limitation. Fattore and Torbica [60] analyzed tariff setting and costing in Italy, finding that the tariffs did not necessarily reflect production costs and might have different levels of generosity across clinical areas. In turn, the variation in tariff setting could affect the LOS in different diagnostic categories. However, we used the average LOS across the two countries, and in this way, the effect of country-specific tariffs will be partly mitigated.

Moreover, we stress that the conditional approach corrects the efficiency estimates for the potential influence of environmental factors on the shift to an attainable frontier. The nonparametric regression of the ratios of conditional to unconditional efficiency estimates explains the significance and direction of the effect of operational environment on production possibilities. However, these ratios are not sensitive to changes in the distribution of inefficiencies that may occur because of the influence of certain environmental variables. Bădin et al. [41] explained in detail how to differentiate the effect of environmental factors on the shift of the attainable frontier from their effect on the changes in the distribution of inefficiencies. Those authors further showed how to derive the values of managerial efficiencies whitened from the effect of the environmental factors.

6 Conclusion

The present study demonstrates an application of the nonparametric conditional approach based on the robust partial frontier to the comparison of hospital efficiency in Italy and Germany. Partial frontier analysis produces robust efficiency estimates that are not affected by extreme values and outliers (which are common in large datasets) and thus avoids the main limitation of traditional efficiency methods, such as DEA and FDH. Additionally, the conditional approach naturally incorporates the operational environment into the efficiency analysis and provides a means to interpret the relation of external variables to efficiency without making the strong separability assumption of the traditional two-stage approach. Finally, the cross-country analysis offers the possibility of benchmarking DMUs according to the common best-practice frontier, taking advantage of an increased reference set, which allows the development of better managerial and policy implications from the hospital efficiency analysis. To address the issue of output comparability in the absence of a uniform DRG system, we utilize the information on the average LOS, which serves as a good proxy for case-mix complexity in the analysis of hospital efficiency.

Our analysis is based on all accredited Italian hospitals and all German hospitals, which include 920 Italian and 1381 German hospitals after the removal of atypical observations. The examination of hospital efficiency based on the two countries’ national frontiers shows that there is more dispersion in the efficiency performance of the Italian hospitals than that of the German hospitals. The high dispersion in the efficiency of the Italian hospitals suggests that if the inefficiency of extremely inefficient units is addressed, there is substantial potential for improvement. The graphical inspection of the national frontiers against the common frontier, which combines the Italian and German hospitals, shows that the assumption of a common technology is not violated. Therefore, our adjustment of hospital inputs and especially hospital outputs to make them comparable across the two countries is successful, and the hospitals in both countries can be realistically compared against a common frontier.

The analysis of hospital efficiency that uses a common frontier and is conditional on the differences in operational environments reveals that the Italian hospitals can improve their median conditional efficiency by 9 %, whereas the German hospitals have a median improvement potential of 1 %. The analysis of the significance of the individual factors in the operational environment shows that hospital characteristics play a more important role than regional factors in explaining the differences in hospital production possibilities. The consideration of these hospital factors may shed light on the lower median performance of the Italian hospitals compared with the German hospitals.

Thus, one potential reason for Italian hospitals showing lower efficiency may be that they do not achieve the optimal scale of production. More than half of the Italian hospitals in our sample (55 %) have 150 or fewer beds, whereas in the German sample, only 31 % of hospitals fall into that bed size category. Our analysis clearly shows that hospitals with fewer beds have greater difficulties in achieving efficiency, suggesting that optimal hospital size should be an important issue for Italian policy makers. Moreover, the popular policy of bed downsizing to control health expenditures can only succeed if excess staffing is also addressed because of the lack of substitutability between hospital staff and beds [34].

Additionally, privatization could be another way to enhance the efficiency of the Italian hospital sector. Currently, 72 % of hospital beds in the Italian sample are publicly owned, whereas in the German dataset, public beds represent only 54 % of total hospital beds. Privatization within a comprehensive regulatory framework could help the regions reduce the budget deficits and put into place better incentive systems to encourage efficient performance. Indeed, public investment in hospital buildings and technology in Italy has been slowing in the aftermath of the economic downturn, which might not only play a role in the inefficiency of hospitals but also represent a risk to health workers and patients, considering that the average age of an Italian hospital is 70 years [32].

Finally, policy makers should not embrace the conception that specialized hospitals provide patient care more efficiently. The complexity of hospital care production may actually lead to the non-substitutability of some labor (or capital) inputs across departments, and a high degree of specialization will necessitate the use of additional labor and capital inputs. Based on our analysis, the Italian and the German hospitals could in fact increase their efficiency performance by taking advantage of economies of scope through sharing major technology, administration, and medical expertise and resources across departments.

Some unexplained cross-country differences remain because hospitals in different nations may face different regulatory and institutional conditions and because efficiency of hospital care provision is related to the interplay between different health care providers. Further research should attempt to explain the country-specific variation in hospital efficiency using a larger sample of countries and specifying variables for the different types of health care organizations in these countries. In this study, we offer new empirical results from a comparison of two European countries, using state-of-the-art techniques in nonparametric efficiency analysis.