1 Introduction

The health care industry faces new challenges every day, such as new regulations or new technologies. Managers and policy makers need to respond to these challenges with sound performance evaluation [1]. In the health sector, the outputs (products) are difficult to define, since health is a complex concept, there is no traditional health market [2,3], and there is lack of homogeneity between the output measurements, as in case of inpatients [4]. Differences between hospitals are due to factors such as technology, complexity and severity differences and legal status. There are several ways to try to homogenize that information, e.g. using a service-mix index (for structures and facilities) or a case mix (for complexity and severity of patients). There is a special care in some studies to adjust the hospital production in terms of complexity, using the case mix index (CMI), especially for inpatients, e.g. [510]. Indeed, highly aggregated variables (such as gross measures of hospital services, e.g. the total number of inpatients) may, however, harm the efficiency comparisons against hospitals that treat a relatively high proportion of severely ill patients [11].

Nonetheless, only few studies have been performed to evaluate the implication of such index on the efficiency scores (ES). While some studies conclude that there are no differences in technical ESs whether the case mix is incorporated or not [1215], others conclude that there are statistically significant differences in the ES distribution due to different case mix specifications [1618]. However, other authors studied the limitations of case mix grouping and found that such instrument should be applied with caution both for monitoring and financing purposes at the department level [19]. Concerning the Portuguese case, Dismuke and Sena [20] concluded that the inclusion of the diagnosis-related group (DRG) payment “has had a positive impact on productivity and technical efficiency of some commonly employed diagnostic technologies”. Furthermore, according to Yang and Reinke [21] both classification systems (ICD and DRG) generate similar case mix-based indexes.

Therefore, there is a noteworthy discussion about the influence of the CMI (as the most frequent risk adjustment way) on hospital efficiency. Nevertheless, the literature does not address conveniently the effect of environmental variables, which may be considered as good proxies of the complexity/ risk/ severity of illness. Furthermore, most of them use deterministic efficiency methods, and it is well known that they may entail some problems (see infra for details).

The objective of this paper is threefold. It is intended to answer three different questions: (1) Does risk adjustment (through the CMI) really matter for efficiency assessment? (2) Can risk adjustment be replaced by a hospital service complexity-based index without major efficiency measurement changes? To do so, a new service-mix index is proposed and analyzed. (3) Can risk adjustment be done through the inclusion of some environmental variables (such as the morbidity)?

This study is organized as follows: after the CMI has been characterized in Section 2, a new index (service-mix based) is proposed in Section 3. An overview of the main methods is provided in Section 4 and the main results are presented in Section 5. The conclusions, results discussion and further research topics are given in the last section.

2 The case-mix index

Despite the existence of multiple services in a hospital, it might not be correct to include them all in the same model, due to the necessity of considering a lot of outputs (and even inputs). Then, only the internment department is considered in this study, due to lack of data available. The main outputs are the inpatients. However, they should be adjusted by risk because “not all patients arriving at the hospital require the same level of attention and service” [1], pp. 107. It seems to be clear that the inpatients risk adjustment is necessary [13,2224], at least theoretically, although some authors (e.g. [25]) have noted that this adjustment may be a fallacy. Being the CMI the most common risk adjustment (for details, see infra), there are two main approaches to do that: (1) by multiplying the number of inpatients (admissions or discharges) by that index—given the way it is computed (see infra), this seems to be the most appropriate procedure, or (2) by considering the CMI as an extra (non-discretionary) variable, as discussed below (see [14]). Both approaches are tested in this paper so as to assess the model sensitivity to this variable (or adjustment).

The most common case mix classification scheme is the DRG system, which is a set of patient case types established under a disease and injury classification system that identifies patients with similar illness conditions and processes of care (assigning the DRGs with a weighting factor that compares its costliness to the average for all DRGs under a case-mix funding scheme) [11]. The DRG system has been employed in Portugal since 1984 [20]. The DRG system reflects the distribution of patients based on morbidity, patient characteristics such as age, and aspects of the treatment received. Each DRG represents a class of patients with similar clinical conditions, hence requiring similar resource usage for treatment.

Associated with DRG groups, there is the CMI, which can be computed using the Eq. (2.1), where (1) it is assumed that two patients in the same DRG group are equivalent, i.e., the expenses are the same for both, and (2) the weight of the k-th DRG is the relationship between its average costs and the national costs, then it measures the expected costs for an average inpatient (of this group). The CMI reflects the complexity of the treatments provided and the level of resources consumed in a hospital [21].

$$ CM{I}_{hospital\;k}=\frac{\sum_i{\left( equivalent\; inpatients\; DR{G}_i\right)}_k\times weight\; DR{G}_i}{\sum_i{\left( equivalent\; inpatients\; DR{G}_i\right)}_k} $$
(2.1)

The national CMI is unitary—any hospital with CMI < 1 (resp. > 1) treats, on average, patients with low (high) complexity. That is, the higher the CMI, the higher the average diseases complexity the hospital handles. In the Portuguese case, the CMI is only available for inpatient and ambulatory services, medical or surgical. In general, the CMI for surgical services is greater than the CMI for medical services.

3 The service-mix index: a new approach

Despite the theoretical importance of weighting the inpatients by their clinical complexity (morbidity), it is not unusual that the CMI values may be often non-available in hospital reports. Analogously to the CMI, one can define a new index, changing the patient classification into DRG to a patient classification by serviceFootnote 1—for this research purpose, all considered services belong to the internment production line, i.e., all specialty wards where patients are admitted for a period longer or shorter, including both medical and surgical services/ specialties and intensive care units.

This new index is called Service-Mix Index (SMI) because it measures the complexity of hospitals by their services, meaning that if a hospital has services that consume more resources than the national average, then its SMI is greater than 1. This index allows homogenizing the hospital services and, furthermore, it has some advantages relative to the CMI: a) it can be computed with respect to any baseline (e.g., the global internment service or a lower/ higher hospital level)Footnote 2 and b) it requires much less data (only the costs and production), which are usually available. The SMI also takes into account the complexity of inpatients, if it is accepted that two patients treated in the same service have similar disease complexity.

Ozcan and Luke [26] and Ozcan [27] discuss the importance of some proxies of capital investments, namely the service-mix that measures the hospital services complexity. However, that complexity is frequently associated with the inpatients severity/ complexity/ risk—the higher the patient’s illness severity, the higher the capital requirements in the hospital service where the patient is treated. For instance, an Intensive Care Unit (ICU) treats pathologies with severe clinical profiles, thus a priori it requires more inputs than other services. Consequently, a hospital with ICU services (and/ or other equally complex services) has a priori a higher CMI. Therefore, the SMI may be a similar appropriate inpatients weight, or an extra variable, as it will be discussed further.

The formula of the SMI is equal to the CMI, replacing the DRG weight by the service weight (sw), as indicated in Eq. (3.1). Weights could be estimated as follows. First, the geometric mean of unitary costs for the internment global service (GMIGS) is computed. Next, the geometric mean of unitary costs for each service (GMS i ) is determined and divided by GMIGS, i.e., \( s{w}_i=\raisebox{1ex}{$ GM{S}_i$}\!\left/ \!\raisebox{-1ex}{$ GM IGS$}\right. \). Then, it is easy to conclude that services with higher relative weights present higher costs. It may be observed, from Eq. (3.1) that the same patient can be counted more than once. Actually, one patient can be admitted to more than one service during the same hospital stay. However, for two different services, the same patient is considered twice. As a consequence, the SMI is an aggregate index of all services complexity.

$$ SM{I}_{hospital\;k}=\frac{\sum_i{\left( inpatients\; of\; servic{e}_i\right)}_k\times s{w}_i}{{\displaystyle {\sum}_i{\left( inpatients\; of\; servic{e}_i\right)}_k}} $$
(3.1)

In Portugal, for instance, in 2009, the geometric mean of the unitary costs of all internment services was 2,776€ (GMIGS per inpatient). Thirteen hospitals provide “Convalescence Units” services (CUS), such that the geometric mean of their unitary costs was 5,566€ (GMSCUS per patient), thus the weight of this kind of service is 2.01 (sw CUS = 5,566/2,776), i.e., this service consumes, on average, about twice more resources than the national standard (taken as unitary). Meanwhile, 7 hospitals provide “Palliative Care Units” services (PCUS) such that GMSPCUS = 3,984€ (per patient), then sw PCUS = 3,984/2,776 = 1.44. Suppose that two hospitals provide only these two services—H1 treats 100 inpatients in CUS and 200 inpatients in PCUS, thus SMIHospital1 = (100 × 2.01 + 200 × 1.44)/(100 + 200) = 1.63; H2 treats 500 inpatients in CUS and 50 in PCUS, thus SMIHospital2 = (500 × 2.01 + 50 × 1.44)/(500 + 50) = 1.96; hospital H2 has an higher SMI because it treats more inpatients than H1 in the service that consumes more resources due to its implicit complexity (kind of technology and drugs, staff specialization level and inherent inpatients complexity).

The comparison of CMI and SMI can be observed in Fig. 1. A simple correlation was performed to assess the Pearson’s coefficient R = 0.7840, which means that the linear equation describes the relationship between CMI and SMI with a strong correlation, i.e., there is a strong linear relationship between the global complexity of inpatients and the global complexity of the hospital internment services. Thus, the SMI could be considered as a good proxy of CMI, which means that the most complex patients are generally associated with the most complex services. The fact that the relationship between CMI and SMI is not perfectly linear results from the possible existence of a significantly broad spectrum of patients’ complexities in the same service, especially regarding the less complex services, or eventually due to the fact that evidence based protocols are sometimes vague.

Fig. 1
figure 1

Correlation between CMI and SMI: R = 0.7840, SMI = 0.3750xCMI + 0.6625

It should be noted that there are several definitions of service-mix in the literature. In particular, some studies (e.g. [1,28,29]) describe and use the service-mix as the non-weighted quantity of services provided by the hospital, from a list of possible 151 services identified by the American Hospital Association. Meanwhile, Chou et al. [30] develop a relative weighted service-mix since the non-weighted one does not correctly differentiate the hospitals, because there are services that require larger financial investment due to their inherent complexity and their patients’ severity/ complexity (which is the basis of the new SMI developed in this paper). This weighted service-mix scoring is based on two different dimensions (clinical intensity and investment intensity). Nonetheless, such measure is categorical and one may argue that even between two similar services, the same scoring on clinical and/or investment intensities may result into biased measurements of the service complexity. Thus, the SMI computed by Eq. (3.1), being a continuous variable, seems to be more appropriate to differentiate the services concerning their complexities. It has a further advantage: unlike the standard service-mix measure (as previously defined), it is able to adjust the inpatients since they are included in the computation method. Furthermore, the SMI (1) is comparable to the CMI and may replace it (at least, theoretically) when that one is not available, (2) it is easy to compute—one only needs the total costs and the production, per service, and (3) it is able to represent the capital inputs, as well as the standard approach [26,28].

4 Efficiency of Portuguese hospitals: the internment case

4.1 Methods

This paper is based on two different efficiency estimation approaches, which are based on frontier analysis. The locally convex non-parametric order-m is employed to achieve robust efficiency estimation, without strong frontier assumptions. This procedure is quite useful since it allows the inclusion of environmental effects in the model; furthermore, being a partial frontier method, it is much less sensitive to outliers and extreme data and to the curse of dimensionality than full frontier methods, such as DEA or FDH. Non-parametric methods only require few frontier assumptions (RTS, disposability of variables and convexity/non-convexity), thus they are not based on frontier functional specifications [31,32].

Let X ∈ ℝ p+ be the set of p inputs consumed to produce Y ∈ ℝ q+ outputs by n Decision Making Units (DMU), under the influence of Z ∈ ℝs exogenous (or environmental) factors. The conditional, locally convex order-m ES, θ, of the unit (x, y |z) projected against the set (X, Y |Z) can be computed as follows (after an appropriate m has been defined by the user) [33]:

  1. 1.

    For a given y, draw a sample of size m with replacement, and with a probability K((z − Z)/h)/∑ n j = 1 K((z − Z j )/h), among those X such that Y ≥ y—denote this sample by ℒ = (X 1,b  … X m,b );

  2. 2.

    \( \mathrm{Compute}\;{\theta}_b= mi{n}_{\lambda, \theta}\left\{\theta \left|\begin{array}{c}\hfill \theta {x}_i\ge {\displaystyle {\sum}_{j=1}^n{\lambda}_j{X}_{ji},i=1,\dots, p;}\hfill \\ {}\hfill {y}_r\le {\displaystyle {\sum}_{j=1}^n{\lambda}_j{Y}_{jr},r=1,\dots, q;}\hfill \\ {}\hfill {\displaystyle {\sum}_{j=1}^n{\lambda}_j=1;}\kern0.5em \hfill \\ {}\hfill {\forall}_{j=1,\dots, n}{\lambda}_j\ge 0;\hfill \\ {}\hfill X\in L\hfill \end{array}\right.\right\} \);Footnote 3

  3. 3.

    Redo previous steps for b = 1… B, where B is large (>>500) that allows tuning the estimation qualityFootnote 4;

  4. 4.

    Finally, \( \theta \left(x,y\right)\cong \frac{1}{B}{\displaystyle {\sum}_{b=1}^B{\theta}_b} \).

In the first step of the previous procedure, K represents a kernel function that works as a probability density function (pdf), and h is a bandwidth that is useful to restrict the entities to be selected in the first step of the previous procedure (if h → ∞, then the drawing process is equiprobable among all units, no matter what kernel function was chosen). The most common kernel function is the Gaussian one, and its optimal h is given by the normal reference rule as follows [34]:

$$ h=1.06\cdot \min \left\{\sigma, {R}_{23}/1.34\right\}\cdot {n}^{-\frac{1}{5}} $$
(4.1.1)

In the previous equation, σ and R 23 are the empirical standard deviation and the interquartile range of the n data points, respectively. In the case of multivariate Gaussian kernels (several environmental variables), it is assumed that the global pdf is the product of all pdf’s although some others methods have been proposed in the literature [32]. Indeed, there is no noticeable reason to choose one method in particular: the pdf product seems, however, to be the simplest procedure.

Since the fourth step of order-m has been concluded, one may fit the set of efficiency estimations by employing a parametric function [31].Footnote 5 It is employed to strengthen the conclusions. The employment of both parametric and nonparametric measures in health care can be found in several studies, e.g. [35], although most of the health care analyses follow non-parametric approaches [36]: some examples include O'Neill [37], Hollingsworth et al. [38], Bjorkgren et al. [39] and Biorn et al. [40]. Following Daraio and Simar [32], the input-oriented translog function is given by:

$$ \begin{array}{l} \ln \frac{1}{\theta }={\alpha}_0+{\displaystyle \sum_{i=1}^p{\alpha}_i\cdot \ln {X}_i}+{\displaystyle \sum_{r=1}^q{\beta}_r\cdot \ln {Y}_r}+\frac{1}{2}{\displaystyle \sum_{i=1}^p{\displaystyle \sum_{j=1}^p{\Gamma}_{ij}\cdot \ln {X}_i\cdot \ln {X}_j}}+\frac{1}{2}{\displaystyle \sum_{r=1}^q{\displaystyle \sum_{w=1}^q{\Gamma}_{rw}}\cdot \ln {Y}_r\cdot \ln {Y}_w}+\hfill \\ {}\frac{1}{2}{\displaystyle \sum_{i=1}^p{\displaystyle \sum_{r=1}^q{\Gamma}_{ir,1}\cdot \ln {X}_i\cdot \ln {Y}_r}}+\frac{1}{2}{\displaystyle \sum_{r=1}^q{\displaystyle \sum_{i=1}^p{\Gamma}_{ir,2}\cdot \ln {X}_i\cdot \ln {Y}_r}}\hfill \\ {}s.t.\hfill \\ {}{\displaystyle \sum_{i=1}^p{\upalpha}_i}=1;\hfill \\ {}{\displaystyle \sum_{r=1}^q{\Gamma}_{ir,1}}=0\;\left(q\; constraints\right);\hfill \\ {}{\displaystyle \sum_{j=1}^p{\Gamma}_{ij}}=0\;\left(p\; constraints\right)\ \hfill \end{array} $$
(4.1.2)

In the previous equation, α 0, α i , β r , Γ ij , Γ rw , Γ ir,1 and Γ ir,2 are the model coefficients that are computed in the least-squares sense subjected to 1 + p + q homogeneity (of degree one in X) constraints, which follow an input-oriented framework. Daraio and Simar [32] argue that it overcomes the difficulties of classical approximations [37,38], namely (1) the restrictive and non-realistic assumption of homoscedasticity, and (2) some problems of consistency between the error term and the dependent term. Indeed, the previous translog model uses the projected values on the non-parametric frontier and does not make any assumption on the error term, which results into more sensible results [32].

4.2 Modeling issues—an overview

In this paper, three different approaches are proposed and developed in the following sections. Although some methodological details are described there, some common issues may have already been defined elsewhere.

4.2.1 Data collection

All required data for this research is available in the official database of the Portuguese Ministry of Health, the Central Administration of Health Systems (CAHS), and in lawful annual reports of each hospital, for the period between 2002 and 2009 (8 years). Only this period has been considered due to the readiness of data.Footnote 6

Moreover, only 49 hospitals (among all Portuguese units) could be selected from this database, since they were the ones that provided reliable information without gaps for all the years and all the variables. Maternities, oncology centers and psychology centers (specialized hospitals found in Portugal) were excluded because their own production process and structures are expected to be significantly different from those non-specialized units considered in this paper. No private hospitals have been included in the sample, as well. Therefore, 392 units, homogenously distributed in 2002–2009, comprise this sample.

4.2.2 Variables

Data are evaluated under the most important hospital dimension—the internment department—given its expenditure volume (~50 %). Furthermore, the CMI is only available for this department, which is why no other departments have been chosen to this study, although all hospitals in this study provide outpatient services. The choice of variables followed the availability of data and the purpose of the paper, which focuses on the case of technical efficiency (no outcomes measures are included here, mainly due to absence of this kind of information in official databases).

The main output considered is the Inpatient Discharges (InpD) which is the total number of patients treated in any service from the internment department, within a year, adjusted by mortality rates. Since it is an internment department study, no other outputs have been used, such as outpatients or number of emergencies, although all 49 hospitals provide these services.

Since an economic outlook is desirable (given, at least, the theoretical financial unsustainability of the health system), most of the input variables employed here are expenses, updated to 2009 (by using the consumer price index):

  1. a)

    Costs of goods sold and consumed (CGSC), representing the drugs and clinical materials expenditures;

  2. b)

    Supplies and external services (SES), which include external labor outsourcing;

  3. c)

    Staff Costs (StaffC);

  4. d)

    Other costs (OthC), including depreciation and indirect costs related to the other services in the same hospital; and, finally

  5. e)

    Hospital Days (HpD), which correspond to the total number of days used by all inpatients within 1 year. The latter output may be considered as an extra input since it is necessary to produce InpD, along with the other inputs.

4.2.3 Adjusting for environment

To take into account the environmental effect, the conditional formulations are employed here. The Gaussian kernel and the bandwidth given by Eq. (4.1.1) are used in all approaches. The environment factors adopted were:

  1. a)

    Population density—PD—(inhabitants per km2), since in urban regions the secondary health care access is facilitated compared to rural areas, where the populations have typically lower levels of education and, therefore, a priori lower wealth levels; in this sample, PD ∈ [14.80; 2811.30] and the coefficient of variation (CV) is 132.03 %, i.e., there is a significant heterogeneity in the PD variable;

  2. b)

    Wealth index—WI—(purchasing power, adjusted for inflation); WI ∈ [63.82; 150.63], CV = 19.20 %;

  3. c)

    Aging index—AI—(number of older people, >65, per 100 youth, 0–14 years old) is included since expenses with health usually increase with aging, as predicted by the Grossman model, because diseases are generally more usual and complex; AI ∈ [78.60; 228.80], CV = 28.83 %;

  4. d)

    Morbidity—M—measured by the total of patients discharged from hospitals by 1,000 inhabitants in the most frequent diseases groups in International Classification of Diseases (ICD-9), i.e., groups II, VII, VIII, IX, XI and XVII; M ∈ [8.80; 982.71], CV = 141.37 %;

  5. e)

    Corporatization status—CS—categorical variable (SPA, SA, EPE)—until 2002, all Public Portuguese hospitals had belonged to the Administrative Public Sector (SPA), subjected to public/administrative law, but at the end of 2002 the legal status of some of them was changed: they were transformed into hospital enterprises (SA) with limited liabilities and became subjected to the commercial/private law [36,39,40]; however, all SA hospitals were transformed into corporate public entities (EPE) in 2005—in practical terms the difference between SA and EPE is that SA hospital are easier to be privatized (this change occurred only for political reasons when the new government come into force) [3,40]. This sample is composed of 216 SPA, 40 SA and 136 EPE hospitals, heterogeneously distributed among the 2002–2009 period; EPE hospitals tendentiously treat populations with higher WI and AI (although these ones are negatively correlated); apart from that, there is no significant relationship between the CS and the other epidemiological variables (PD and W);

  6. f)

    Merging status—MS—categorical variable (SH, HC, LHU)—for efficiency gains, some singular hospitals (SH) have been merged with other SHs, creating hospital centers (HC), or with other SH and few primary health units, creating local health units (LHU) [3]; this sample (of 392 units) is composed by 269 SH, 106 HC and 17 LHU (all HC and LHU are EPE units), heterogeneously distributed among the 2002–2009 period;

  7. g)

    Year—Y—categorical variable (2002–2009)—a pooled frontier is employed (see infra), gathering the information of those 8 years into a single meta-frontier; this variable only has significantly non-zero correlation coefficients with CS and MS because these reforms have gradually occurred in time—however, its lack of correlation with the epidemiological variables means that they do not significantly change with time.

The inclusion of the CS and the MS as exogenous variables allows filtering the effect of the political and structural reforms on hospital efficiency.Footnote 7 These have been the most important ones in the Portuguese public health system in the past few years. Therefore, any conclusion drawn from environment-corrected results does not depend on those important reforms. Note also that all the exogenous variables are, in general, good proxies of the services provided by the hospitals (e.g. large hospitals are usually HC providing services to richer populations with high PD and low AI), thus the comparison imposed by the method does not generate injustices among hospitals providing (resp. treating) different ranges of services (resp. inpatients).

It should be noted that the sample is quite homogenous, such as in Grosskopf and Valdmanis [14], because most of units lie near CMI ~ 1 and SMI ~ 1, while the exogenous variables present high heterogeneity levels (their coefficient of variation may reach 141 %). However, the CMI and the SMI present statistically significant Pearson’s coefficients of correlation with the population density, the wealth index (being positively correlated with these ones), the aging index and the morbidity.Footnote 8 There is no evidence of correlation between the CMI and the year or the corporatization status.

4.2.4 Models specification

The models adopted assume an input minimization emphasis, since (1) the Portuguese public hospitals (non-for-profit) should satisfy all the patients’ needs at the least possible cost, with a standard quality of service [3], and (2) one expects that it is easier to manipulate resources rather than production. A DMU is technically efficient under that standpoint if it is not possible to reduce its inputs without worsening its outputs [45,46].

In this paper, the locally convex order-m method is employed using m = 49 and B = 1,500. Conditional (or environmental-corrected) models assume Z = {PD, WI, AI, M, CS, MS, Y}, while unconditional models only adopt Z = {Y}, due to the aforementioned reasons. Note that the convexity seems to be more appropriate than the non-convexity assumption since it allows the feasibility of some linear combinations between two or more observations [33], which is particularly important with cost-related inputs. Furthermore, the non-convexity assumption tends to overestimate the efficiency estimation [45] and to bias the efficiency spread analysis (see infra). Note that the global convexity assumption does not correctly handle conditional measures.

A pooled frontier (or meta-frontier) is employed, which projects units from different time moments in the same frontier—however, since the variable year is included as an exogenous variable and imposing m = 49 (the size of the sample per year) and \( h=\left[1.06\cdot min\left\{\sigma, {R}_{23}/1.34\right\}\cdot {n}^{-\frac{1}{5}}\right]/3 \), it becomes clear that only units from the same year are chosen in the first order-m step, with a probability of 99.8 %. The time frontier shift effect is then prevented. This procedure, with above-mentioned restrictions, allows the increase of the sample size (392 units = 49 × 8) and then, the scale-up of the results discrimination, the decrease of effect of the curse of dimensionality as well as the reduction of the number of exercises (1 instead of 8).

However, there may be some evidence of the curse of dimensionality due to the choice of m = 49.

In order to avoid that effect, to strengthen the results, and since there is no clear political or managerial reason to impose virtual weight restrictions, the following approach seems to be the most appropriate one [32]: the inputs (or even outputs) are merged, using the eigenvector components (corresponding to the largest eigenvalue of the matrix XTX) as the weights. In this paper, for all approaches, X g  = α 1 ⋅ CGSC + α 2 ⋅ SES + α 3 ⋅ StaffC + α 4 ⋅ OthC + α 5 ⋅ HpD with α = [0.1956, 0.0478, 0.6399, 0.7416, 0.0034]T. X g has a significant correlation (p < <5 %) with each input indicating how well this one-dimensional variable represents the original inputs. Another examination that may be employed is to add up all capital-related inputs into a single variable (total costs), i.e., imposing  X T  = ∑  4 k = 1 X k  = CGSC + SES + StaffC + OthC.

For each approach, at least three different models are proposed to test the results robustness. Table 1 summarizes the variables considered in each model and approach. Note that this study analyses 60 different models (=15 conditional parametric + 15 conditional non-parametric + 15 unconditional parametric + 15 unconditional non-parametric) in order to strengthen the conclusions. The InpD variable is always weighted by mortality rates, which was the best outcome that could be found in databases. Further sub-sections describe some details that are specific of each approach.

Table 1 List of variables considered in each model and approach

5 Empirical results of efficiency assessment—the case of Portuguese hospitals

5.1 First approach

5.1.1 Modeling issues

As suggested by Simões and Marques [46], this approach tries to regress the ratios Q 1 = θ(x, y|z ∪ {CMI})/θ(x, y|z)  and Q 2 = θ(x, y|z ∪ {SMI})/θ(x, y|z) against the CMI and the SMI, respectively, to test whether these indexes are associated or not with high hospital performance. Simões and Marques [46] use the double-bootstrap approach to test the influence of the CMI in the Portuguese hospital performance and conclude that it negatively affects hospital performance. To compute θ(x, y|z ∪ {CMI}) one only has to include the CMI as an exogenous variable, discarding it when θ(x, y|z) is calculated.

These regressions shall be done by using the non-parametric Nadaraya-Watson method. Under an input-oriented framework, an exogenous variable Z is detrimental (conductive) to efficiency if the smoothed nonparametric regression is increasing (decreasing) [32]. Clearly, if Q j  = 1, no effect of Z on efficiency should be expected. In a statistical sense, one may argue that if the 95 %-confidence intervals of Q 1 = θ(x, y|z ∪ {CMI})/θ(x, y|z), contain the unit, then there is no reason to believe that the CMI affects hospital productivity.

5.1.2 Main results

Several important results may be drawn from the analysis described above. First, there is no meaningful influence of the CMI on efficiency if the epidemiological pattern and important political/ structural health reforms variables are taken into account by the models and under the non-parametric framework, since the 95 % Nadaraya-Watson confidence intervals (NWCI) of Q1 include the unit over all values of the CMI. Thus, according to these specifications, it is not expected that the CMI affects the hospital performance. Under the parametric framework, the CMI seems to be detrimental (conductive) to efficiency for low (high) values, i.e., when the hospital treats patients with lower complexities, on average, it is expected that this lower complexity is harmful to efficiency due to the existence of outsourced assets and human capital. All these results are consistent among models 1a, 1b and 1c. Figure 2 shows the non-parametric regression (against the CMI) for the model 1b, where the efficiencies have been environment-corrected and computed under the non-parametric (on top) and the parametric (on bottom) frameworks.

Fig. 2
figure 2

Non-parametric Nadaraya-Watson regression (against the CMI) using the Gaussian kernel, for environment-corrected efficiency estimates (Model 1b): non-parametric (top) and parametric (bottom)

Second, if the model disregards the environment, CMI seems to affect efficiency. Under the parametric basis, CMI only affects hospital efficiency for low values (being favorable), i.e., CMI < 1; otherwise, the 95 % NWCIs contain the unit and there is no significant influence on hospital efficiency. However, if the model is non-parametric, then there is no consistency among the models. While model 1b points out that CMI is unfavorable to the production process, according to model 1a the CMI is only unfavorable (favorable) for low (high) CMI values, i.e., CMI < 1 (CMI > 1). Meanwhile, model 1c indicates that CMI is unfavorable until CMI = 1 and after CMI = 1.5, being favorable to the efficiency between these two values.

Third, when Q2 is regressed against the SMI, the results show some consistency among the models. In the case of conditional measures, (1) parametric models show that Q2 is increasing with the SMI, thus this variable is adverse to hospital efficiency, while (2) non-parametric models indicate that it seems to have no influence on efficiency. On the other hand, in the case of non-environment-corrected measures, (1) the SMI seems to be detrimental for SMI > 1.5 when the model is parametric, but (2) it appears to have no influence on efficiency when the model is non-parametric. Note that here there is a higher consistency between conditional and non-conditional backgrounds.

5.2 Second approach

5.2.1 Modeling issues

This approach aims to examine the influence of both indexes on hospital efficiency assuming that they are InpD weights. To do so, the most appropriate procedure is to compare those different frontiers (obtained for CMI, SMI and NW schemes).

With the purpose of comparing two different frontiers, two different metrics are suggested, as follows [47]:

$$ \begin{array}{l}{\Theta}_{\mathrm{AB}}=\sqrt[n]{{\displaystyle \prod_{i=1}^n}\frac{\delta^A\left({x}_i^A,{y}_i^A\right)}{\delta^B\left({x}_i^B,{y}_i^B\right)}}\hfill \\ {}{\Phi}_{\mathrm{AB}}=\sqrt[2n]{{\displaystyle \prod_{i=1}^n}\frac{\delta^B\left({x}_i^A,{y}_i^A\right)\cdot {\delta}^B\left({x}_i^B,{y}_i^B\right)}{\delta^A\left({x}_i^B,{y}_i^B\right)\cdot {\delta}^A\left({x}_i^A,{y}_i^A\right)}}\hfill \end{array} $$
(5.2.1)

In previous eq., (1) δ A(x A i , y A i ) represents the Shephard’s input distance of DMU characterized by (x A i , y A i ) against the frontier A (usually, a cluster frontier); (2) δ B(x A i , y A i ) represents the Shephard’s input distance of DMU characterized by (x A i , y A i ) against the frontier B; (3) ΘAB denotes the efficiency spread between clusters A and B—if ΘAB < 1, then DMUs belonging to cluster A have greater consistency in efficiency than DMUs from cluster B; and (4) ΦAB designates the productivity gap between both clusters’ frontiers—if ΦAB < 1, then cluster A frontier has, on average, higher levels of productivity than group B frontier. In this work, there is only one cluster, but three different frontiers. Using previous concepts in a straightforward way, one may perform their comparison. Clearly, if CMI and/or SMI have no effect on efficiency (productivity), then ΘAB (resp. ΦAB) is unitary. In a statistical viewpoint, it may be measured by the confidence intervals of ΘAB and ΦAB: if they contain the unit, then no meaningful differences are observable between those different models.

If any difference is detected by non-parametric methods, then it is totally attributed to inefficiency (which may be not true in practice), but one may argue that the detected inefficiency is the same whether the InpD are weighted by the CMI, by the SMI or none—that is, the analysis proposed allows disentangling the inefficiency from the index effect on performance if one states that the “true” efficiency is that one obtained by using the CMI as the InpD weighting scheme. In such a case, one may argue that we are under a separability condition case and possible deviations may be due to differences between the CMI and the SMI/ non-weighting schemes. No differences mean that providers follow evidence-based principles to guide patient care (standardized care) and act in the interest of their patients.

5.2.2 Main results

According to the results of this approach,Footnote 9 if the model has been corrected by epidemiological pattern and important political/ structural health reforms variables, then there is no noteworthy efficiency spread (ΘAB) among those different InpD weighting schemes. That is, efficiencies do not change by using the CMI or the SMI or none of them. It is noticeable that (1) both parametric and non-parametric efficiencies lead to quite similar results, being the translog function a good choice, and (2) there is no remarkable presence of the curse of dimensionality since all models, 2a, 2b and 2c, yield alike values of ΘAB. Notwithstanding, unconditional models generate statistically significant deviations of ΘAB from the unit, except when the SMI model is compared with the NW model. Actually, the employment of the CMI seems to generate efficiencies with higher consistency than the ones obtained in the other models. Once again, there is a high consistency among parametric and non-parametric measures.

Even though conditional measures have shown that there is no efficiency spread among those different metrics, i.e., the relative position of the units against their own frontier does not change whether the InpD are risk adjusted or not. The same does not happen in the case of the relative positions of the different frontiers (one for each weighting scheme) where conditional and non-conditional measures show similar trends, as opposed to the parametric vs non-parametric dichotomy. The CMI seems to strongly decrease (increase) the productivity of the units when compared to the NW and the SMI models in non-parametric (parametric) frameworks. This effect is strengthened in non-conditional backgrounds. One should remark that in conditional parametric models, the CMI and the SMI’s frontiers tend to overlap, which means that under these conditions, it is not relevant to choose between one index or the other.

5.3 Third approach

5.3.1 Modeling issues

In this approach, it is assumed that any possible differences between the case mix and service intensity indices are solely due to inefficiency, and there is no explicit hospital-wide intention to game the CMI or the SMI for financial (reimbursement-related) purposes. In such cases, if the CMI is truly meant to measure illness severity, it should be an extra non-discretionary output in the model, according to Grosskopf and Valdmanis [14] and Garavaglia et al. [48], because it is based on patient conditions and other factors external to the hospital. On the other hand, the severity of services provided should be a discretionary input [29], given that the hospital determines when and sometimes how to treat certain conditions when evidence based protocols are vague. Since the models are input-oriented, the discretionary status of those extra variables does not change the model—all inputs are discretionary and the outputs non-discretionary, by default.

However, by using the bootstrap it is possible to test whether the inclusion of the CMI and the SMI as variables is meaningful to the DMU efficiency analysis. As a matter of fact, the inclusion of a variable keeps or increases the ESs, but never the opposite. Let θCMI & SMI be the ES using the CMI and the SMI as previously defined; let also be θCMI, θSMI and θ the ESs obtained only with the CMI but without the SMI, only with the SMI but without the CMI, and without any of these variables, respectively. Thus, the succeeding five hypotheses sets are imposed (the p-value is computed through Eq. (5.3.1)):

$$ \begin{array}{llll}{H}_{0(1)}:{\theta}_{\mathrm{CMI}\&\mathrm{SMI}}=\theta; \hfill & {H}_{0(2)}:{\theta}_{\mathrm{CMI}\&\mathrm{SMI}}={\theta}_{\mathrm{CMI}};\hfill & {H}_{0(3)}:{\theta}_{\mathrm{CMI}\&\mathrm{SMI}}={\theta}_{\mathrm{SMI}};\hfill & {H}_{0(4)}:{\theta}_{\mathrm{CMI}}=\theta; \hfill \\ {}{H}_{1(1)}:{\theta}_{\mathrm{CMI}\&\mathrm{SMI}}>\theta; \hfill & {H}_{1(2)}:{\theta}_{\mathrm{CMI}\&\mathrm{SMI}}>{\theta}_{\mathrm{CMI}};\hfill & {H}_{1(3)}:{\theta}_{\mathrm{CMI}\&\mathrm{SMI}}>{\theta}_{\mathrm{SMI}};\hfill & {H}_{1(4)}:{\theta}_{\mathrm{CMI}}>\theta; \hfill \\ {}{H}_{0(5)}:{\theta}_{\mathrm{SMI}}=\theta; \hfill & \hfill & \hfill & \hfill \\ {}{H}_{1(5)}:{\theta}_{\mathrm{SMI}}>\theta .\hfill & \hfill & \hfill & \hfill \\ {}p=\frac{1}{B}\cdot {\displaystyle \sum_{k=1}^B\mathbb{I}\left({T}_k^{bootstrap}\le {T}_k^{observed}\right)}\hfill & \hfill & \hfill & \hfill \end{array} $$
(5.3.1)

In the previous equation, \( \mathbb{I} \) is the indicator function and it is equal to the number of true occurrences on its argument. Meanwhile, T k,j is a statistical test that may be computed using, for instance, the following equationsFootnote 10:

$$ \begin{array}{l}{T}_{k,1}=\raisebox{1ex}{${\sum}_{i=1}^n\left\{{\theta}_{Model\ 1}\left({x}_i,{y}_i\right)\right\}$}\!\left/ \!\raisebox{-1ex}{${\sum}_{i=1}^n\left\{{\theta}_{Model\ 2}\left({x}_i,{y}_i\right)\right\}$}\right.\hfill \\ {}{T}_{k,2}=\raisebox{1ex}{${\left({\displaystyle {\prod}_{i=1}^n\left\{{\theta}_{Model\ 1}\left({x}_i,{y}_i\right)\right\}}\right)}^{\frac{1}{n}}$}\!\left/ \!\raisebox{-1ex}{${\left({\displaystyle {\prod}_{i=1}^n\left\{{\theta}_{Model\ 2}\left({x}_i,{y}_i\right)\right\}}\right)}^{\frac{1}{n}}$}\right.\hfill \\ {}{T}_{k,3}=\raisebox{1ex}{$ median\left\{{\theta}_{Model\;1}\left({x}_i,{y}_i\right)\right\}$}\!\left/ \!\raisebox{-1ex}{$ median\left\{{\theta}_{Model\;2}\left({x}_i,{y}_i\right)\right\}$}\right.\hfill \end{array} $$
(4.5.2)

5.3.2 Main results

The results of this approach are peremptory, as shown in Table 2 (for the case of Model 3b). In terms of efficiency measurements, there is no significant differences whether those two variables are included or not in models. This is in line with the efficiency spread analysis previously performed and its results, concerning the conditional framework. One should note that there is a considerable consistency among parametric and non-parametric measures, except for few special circumstances. Although it is not clear whether the SMI (as formulated in this paper) would be an input or an output, one would expect that if it is an output (unlikely this approach states), it would return similar results because the SMI and the CMI are highly comparable and present quite similar values.

Table 2 p-values for those five different hypotheses sets, three different statistical tests, T k , using bootstrap (Model 3b)

6 Discussion and conclusions

The results attained point out towards one simple but important conclusion: both the CMI and the SMI seem to be pointless in the model if and only if (1) it takes into account epidemiological pattern and other environmental variables,Footnote 11 and (2) one does not perform any cluster comparison analysis (e.g. by using the Malmquist index) because the CMI tends to considerably change the frontier, but not the relative position of units against that. This means that a time evolution analysis (or others somehow related) should account for the inpatients complexity, either through the CMI, the SMI or the environmental first-step adjustment, instead, since the latter group of units is truly comparable. Therefore, the inpatients adjustment has an influence on hospitals productivity.Footnote 12 Otherwise, the model would certainly be biased/ mis-specified and the results would be wrongly obtained/ addressed. Thus, the kind of analysis and the chosen model (parametric/ non-parametric; conditional/ non-conditional; weighting schemes/ extra variables) really matter because they tend to significantly influence the results. Nonetheless, the results suggest that the CMI does not actually reflect the hospital costs. One strong limitation of this study is that CMI data is quite homogenous, which might be the result of considerable epidemiology homogeneity within Portugal (because it is a small country), and the reader should be aware of that. Even so, this may introduce important changes in efficiency assessment if not corrected by the environment. Nevertheless, hospitals may become efficient or increase their efficiency without necessarily increasing the average inpatients illness by choosing those ones with higher diseases complexities, which is in line with [1215]. This is an important, debatable but stimulating result. It is debatable because the results seem to be model- and hospital realities-dependent, although a lot of authors have obtained similar conclusions, as previously stated.

One should also note that the high correlation among CMI and SMI as well as the similar efficiency results obtained through these different adjustment schemes may not occur in other countries, where the CMI may present a broader spectrum of values and/or if there are services where the inpatients have a larger broad of complexities than in the Portuguese case—indeed, the service notion is not standardized at all across countries as a common unit of hospital organization. Clearly, the CMI and the SMI are only comparable if and only if the last one is computed with the same reference than the CMI, in this case, the hospital admission. A statistical test is then required to test whether the influence of one index on efficiency is higher/ smaller than the other.

At the level of the facility, there is a tremendous amount of patient aggregation. Hence, many differences in clinical practice (which are based on patients who present with very different conditions, each of which was managed differently by the patient prior to inpatient care) would average out. If the hospitals that are being compared are truly comparable, and if the efficient frontier is solely based upon comparable hospitals, then CMI/SMI should not matter, regardless of how they are measured. In essence, at an aggregate level, comparable hospitals treat comparable patients. So if one identifies the “right” set of hospitals against which to create the efficient frontier (the choice of environmental variables) these differences effectively mitigate any need to control for illness severity and/or service mix, at the level of the facility as the DMU. Two main implications arise from this logic.

Firstly, it does not necessarily imply that such adjustments are “meaningless”. Rather, the use of some CMI/SMI adjustment forms the basis for a test of model misspecification, especially at the level of the facility. CMI/SMI data should be collected and used as a robustness check. If the results are dependent on the inclusion of such variables, the researcher, in all likelihood, has made some other error in his analysis. Secondly, the choice of Z is per se a hot topic and has several implications on results. A wrong/ inappropriate/ incomplete choice of Z (and, possibly the bandwidth and the kernel functions, as well) generally biases the final results and, therefore, the arising conclusions [40]. Moreover, the frequently employed two-step methodology (e.g. by using a truncated Tobit model or a double bootstrap method, as proposed by Simar and Wilson [49]), without an a priori units environmental adjustment, inherently assumes a separability condition on the production process, which is not necessarily true [32,33,40]. In such a case, the results in the first step are biased because units are not compared with truly comparable units [50]. Future analysis should take into account other exogenous variables, other kernel functions and other bandwidths for a robustness check.

Although the results point out that those indexes seem to be needless to assess hospital efficiency, the SMI may replace the CMI as InpD weight, due to theoretical reasons, or as an extra variable to describe the hospital resources. This index is easy to compute, it is comparable to the CMI, which is not usually available, and may be computed taking any baseline (a service, a department, a hospital or a country). Thus, the SMI, as proposed in this paper, may be useful for international comparisons. Nevertheless, we should be cautious in concluding that CMI and SMI indexes measure the same thing. While the SMI is a cost-based measure, since it is based on expenses associated with specific types of patient care, the CMI, being a DRG-based measure, is revenue based measure, because prospective payment system reimbursement models are based on the DRG units. There is certainly quite a bit of overlap, especially for non-profit firms (as happens in the present paper), who essentially equate revenues and accounting expenses. However, if the researcher is using variables, DEA orientations, or extensions of VRS technical efficiency (revenue efficiency, cost efficiency, etc.), it may be more appropriate to use one over the other. One would use the most appropriate measure as the basis for the robustness test. Hence, these results should be compared with other obtained through suchlike methods, but using other realities (countries with different health systems, for-profit units, etc). For instance, the sample used by Grosskopf and Valdmanis [14] has a quite different nature compared to ours (including the significant time shift between both samples), thus they are not directly comparable. Notwithstanding, by using similar methods (less robust however), similar conclusions are drawn.

This paper is important for a) scholars/ practitioners, because it is a hot topic on hospital efficiency measurement and contributes with several important questions and answers, and for b) policy-makers and hospital managers, since it has been concluded that, under the environment influence, the effect of inpatients severity/ complexity (assumed to being quite related to service complexity) seems to be meaningless for efficiency assessment, which has important implications on hospital management and new political reforms. E.g. this contributes to the decrease of the perverse inpatient choice probability by the hospital as a function of the case mix. That is, efficiency is not a function of the output complexity and heterogeneity, but rather of the inputs consumption. Therefore, the “case mix game” (and specially the so-called “DRG creep”, where the lower severity inpatients are coded as sicker) has no sense if hospitals want to increase their own efficiency. Furthermore, this conclusion has an important influence on hospitals reimbursement by the Ministry of Health, since the increase of the case mix (or eventually the service mix) does not necessarily translate itself into technical efficiency improvements, in line with this paper results. One may also suggest that the prospective payment should be done through a function of the environmental conditions, instead of the CMI.

One should strongly recognize that this analysis takes place at the level of the facility (non-for-profit hospitals), and more specifically over inpatient services. If efficiency analyses are completed at a much more local level, say, individual operating units (intensive care, medical/surgical, physical medicine, etc.), it may be not necessary to control for patient heterogeneity, as well as organizational differences (for example, separate intensive care and medical/surgical units vs combined units) across hospitals. The more specific and narrowly defined are the operating units in the analysis, the less likely are differences in patient heterogeneity (no matter how they are adjusted: CMI or SMI) to average out. It would be incumbent upon future research to replicate this study in these more “local” areas of production to determine whether or not CMI/SMI adjustments are necessary.

Also, it must be acknowledged that this is an introductory study of efficiency in the Portuguese health sector and the influence of the severity adjustment of inpatients. All techniques and models here presents should be viewed as complementary. Then, the models employed should be target of a further research. Apart from the previous model specifications, new models should consider, for instance, (1) the HpD as an output (as frequently pointed out in the literature), (2) new inputs (as doctors, nurses and beds), (3) other SMI formulations (see supra), (4) other CMI formulations (by changing the DRG grouping schemes by the ICD ones, e.g. [21]), or (5) other model orientations (such as directional models). These points should be included for a robustness analysis. Furthermore, one may test the local convexity and/or the efficiency estimates adjustment by other parametric functions. Finally, the models should include outcome (quality) measures [48], because there might be some efficiency/quality trade-offs that are not accounted for in this paper models and may change due to the presence of such outcomes information. It would have some political, economic and managerial implications on hospitals because the tendentious “case mix game” may affect the health care quality. Because the outputs volume may affect the quality (in general, it puts down the outcomes), the information concerning the illness severity may explain the outcomes, like the mortality rate or the increase in the length of stay, being a powerful indicator for the hospital management and higher entities, such as the Ministry of Health.