Introduction

Ever-rising healthcare expenditures force not only policymakers but also healthcare providers, health insurance companies and patients to make choices in healthcare. Especially in spinal disorders, where a small group of patients is responsible for a large share of the costs, efficient interventions are needed. In an economic evaluation, both the costs and consequences of two or more interventions are compared [15]. The evaluation aims to answer the question of whether an intervention is worth doing compared with other strategies that could be performed within a certain budget. Economic evaluations may help to identify ‘value for money’ interventions. They do not necessarily identify the cheapest intervention. If an intervention is more effective than another intervention but associated with higher costs, it may still be used by healthcare providers and patients may still be reimbursed by insurance companies.

During the last decade, the number of economic evaluations and of methodological papers concerning economic evaluations has increased. In order to value and use economic evaluations in the decision-making process, knowledge of the methodology is necessary. This paper discusses the interpretation and practical application of economic evaluations in the field of spinal disorders. Although economic evaluations can also be performed in modelling studies, this paper focuses on economic evaluations alongside randomised controlled trials.

Types of economic evaluations

A distinction is made between full and partial economic evaluations. The criteria for a full evaluation are (1) two or more interventions are being considered and (2) both costs and consequences of the interventions are assessed. Evaluations not meeting both criteria are considered partial evaluations and are often outcome and/or cost descriptions, effectiveness analyses or cost analyses. Although partial evaluations may contribute to understanding effectiveness or costs involved in an intervention, full economic evaluations are the most useful in resource allocation questions.

The most commonly used full economic evaluations are [15]:

Cost–minimisation analysis (CMA)

In this analysis, the consequences (or effects) are considered to be equal and therefore only the costs of the interventions are compared. For instance, in the study by Seferlis et al., the costs of three conservative treatment programmes for acute low back pain were compared to identify the least costly alternative [39]. Although the relative simplicity of this analysis may be appealing, the circumstances that allow this analysis to be performed are rare [8].

Cost–benefit analysis (CBA)

In a CBA, both costs and consequences are expressed in monetary terms. An intervention may be considered efficient when the benefits outweigh the costs. However, the translation of outcomes to monetary terms is challenging. One method for this translation is ‘willingness-to-pay’. Using this method, patients are asked what they would be willing to pay for an intervention given certain changes in, for example, their low back pain, physical functioning or quality of life [15].

Cost–effectiveness analysis (CEA)

In a CEA, consequences are expressed as disease-specific effects. A core set of outcome measures is recommended for spinal disorders, including pain, functioning and work disability [5]. The incremental effects of an intervention are related to the incremental costs in a so-called cost–effectiveness ratio. In the study by Kovacs et al., for example, neuroreflexotherapy was compared to routine general practice in patients with sub-acute and chronic low back pain. Low back pain intensity and disability were included as outcome measures and related to the healthcare and indirect costs in both intervention groups [28].

Cost–utility analysis (CUA)

In a CUA, the outcomes are patients’ preferences, which are expressed as quality-adjusted life-years (QALYs) or disability-adjusted life-years. For instance, the EQ-5D was used to estimate how many QALYs participants had experienced in the UK BEAM trial, comparing best care in general practice, an exercise programme, a spinal manipulation package and combined treatment. The results were expressed as costs per QALY gained [1].

Because QALYs represent generic health status, theoretically, a ranking of different interventions across different disorders can be made. In practice, it is difficult to obtain information on the full costs and benefits of all health problems and alternative interventions [35]. Consequently, policymakers do not often use a ranking system.

In the field of spinal disorders, CEA and CUA are the most commonly used full economic evaluations.

Key elements of economic evaluations

In order to use and interpret economic evaluations, knowledge of key elements is essential.

Alternatives

To determine the efficiency or cost–effectiveness of an intervention, a comparison must be made with one or more alternatives. The alternative is generally ‘usual care’ or the best or most widely used alternative, since these are the most informative comparisons for policymakers. In the case of a new intervention in a field where no other interventions are available, a ‘doing nothing’ alternative is possible. A ‘placebo intervention’ is not recommended as an alternative in economic evaluations since this is not a real treatment option and information derived from the evaluation is therefore not useful for policymakers. For example, economic evaluations in spinal disorders have compared surgery with usual care [19], and manual therapy with exercise therapy, usual care by the general practitioner and combined therapy [1].

Perspective

Different perspectives can be chosen for the economic evaluation, such as the societal perspective, the patient’s perspective, the health insurance perspective, the healthcare provider perspective or the perspective of companies. Whether results can be generalised to other settings and which costs and outcomes are considered in the evaluation depend on the chosen perspective. If, for instance, the economic evaluation is performed from the perspective of the healthcare provider, effects such as patient satisfaction and functioning may be the most relevant outcomes and the relevant costs are the costs of the therapy. However, economic evaluations are usually performed from a societal perspective. In that case all relevant outcomes and costs are measured, regardless of who is responsible for the costs and who benefits from the effects. Since the chosen perspective has large implications for the design of the economic evaluation, the perspective of the study should be clearly stated.

Identifying, measuring and valuing outcomes

The outcomes should be relevant to the type of disorder. Bombardier et al. have recommended a core set of outcome measures for intervention studies in the field of spinal disorders [5]. Although this element is very important in economic evaluations, other papers in the current issue address this topic extensively and therefore further details are not provided here.

Identifying, measuring and valuing costs

One of the main challenges in economic evaluations is to decide which costs should be included and how these costs should be measured and valued. The types of costs that are relevant in a specific economic evaluation depend on the chosen perspective. A distinction can be made between direct and indirect costs, within and outside the healthcare sector [33]. Direct costs within the healthcare sector include all costs of healthcare services. For spinal disorders these costs include, for example, costs of general practitioner care, physical therapy and hospitalisation. Direct costs outside the healthcare sector include, for example, out-of-pocket costs and travel expenses. Indirect costs within the healthcare sector are costs incurred during life years gained, for example, the costs of treating unrelated heart problems several years after a life-saving operation for spinal cord trauma. In spinal disorder studies, these costs are usually not relevant, because interventions do not prolong life. Indirect costs outside the healthcare sector include costs of productivity loss [33]. The costs of work absenteeism amount to 93% of the total costs of back pain [48]. Because absenteeism has a substantial impact on the total costs of spinal disorders, this cost category should be included in economic evaluations in this field. Table 1 provides an overview of the different cost categories with examples of relevant costs within each category.

Table 1 Different cost categories in economic evaluations in spinal disorders

Usually costs cannot be measured directly. Resource use is measured first and subsequently valued. When performing an economic evaluation alongside a randomised clinical trial, resource use can be measured with different instruments. Data on healthcare use can be collected in patient interviews, or patients can be asked to fill in questionnaires or to keep a cost diary [22]. Databases of insurance companies, healthcare providers or employers and patient files can also be used to measure resource use. However, none of these methods is perfect. Patients may not adequately remember healthcare visits or days of absenteeism (recall bias) and databases do not always provide the necessary information (information bias). Using several methods simultaneously in one study generates a large amount of information but can be time consuming and expensive. It also raises the question of which method is the gold standard when results from different sources conflict.

When all relevant resource data have been collected, the next step is valuing the costs. Direct costs are determined by valuing resource use with valid unit prices. Five different ways of obtaining unit prices are distinguished: (1) prices derived from national registries; (2) prices derived from health economics literature and previous research; (3) standard costs; (4) tariffs and charges; (5) calculation of unit costs [33]. These methods are summarised in Table 2. Two approaches are used to value indirect costs due to absenteeism: the Human Capital Approach [15] and the Friction Cost Method [26]. In the Human Capital Approach, the production losses for an individual worker are calculated from the moment of absence until full recovery or, in the absence of recovery, until the moment of death or retirement. This method is most frequently used. The Friction Cost Method takes into account that sick workers are replaced after a certain period of time, the friction period, the length of which depends on the elasticity of the labour market. For example, suppose a worker is sick-listed for more than 6 months due to low back pain and the estimated friction period is 4 months. The cost of absenteeism is then calculated for 4 months only, because in theory the worker can be replaced after this period.

Table 2 Advantages, disadvantages and recommendations for different methods of valuing resource use [33]
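The difference between the two approaches to valuing absenteeism can be illustrated with a minimal sketch. It uses the friction period example given above (an absence of 6 months, a friction period of 4 months); the monthly wage of € 3,000 and the function names are hypothetical and serve only as an illustration.

```python
def human_capital_cost(months_absent, monthly_wage):
    """Human Capital Approach: every month of absence is valued at the wage."""
    return months_absent * monthly_wage


def friction_cost(months_absent, monthly_wage, friction_months):
    """Friction Cost Method: absence is valued only up to the friction period."""
    return min(months_absent, friction_months) * monthly_wage


monthly_wage = 3000.0    # hypothetical value, not taken from any study
months_absent = 6.0      # absence period from the example in the text
friction_months = 4.0    # friction period from the example in the text

hca = human_capital_cost(months_absent, monthly_wage)
fcm = friction_cost(months_absent, monthly_wage, friction_months)
print(f"Human Capital Approach: {hca:.0f}")
print(f"Friction Cost Method:   {fcm:.0f}")
```

Under these assumptions the Human Capital Approach yields the higher estimate, because the two months after the friction period still count as production loss.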

Analysing cost data

Reviews assessing the statistical methods used in economic evaluations showed that cost analyses need improvement [2, 24]. Analysing and interpreting cost data from a randomised clinical trial can be challenging due to the highly skewed distribution of cost data and relatively small sample sizes. The skewness is caused by a relatively small number of patients with high costs. For example, there may be a few subjects with substantial periods of absenteeism from paid work. Also, the sample size for economic evaluations needs to be larger than is usually required in a randomised clinical trial due to the large variance in cost data [2]. Usually sample size calculations are based on expected differences in effects and not costs. Consequently, interpreting results from economic evaluations requires caution because the study may have been underpowered.

For the comparison of mean costs, standard non-parametric tests and data transformation are not regarded as appropriate, because these methods do not necessarily compare arithmetic mean costs [3, 11, 46]. For policymakers, the arithmetic mean is the most informative measure since the total costs of implementing the intervention can be calculated from it [3]. An alternative is the non-parametric bootstrap, which involves drawing samples with replacement from the original data [30]. For example, from an original dataset containing the costs of 50 patients, 50 values are randomly selected with replacement. In this way another dataset (a bootstrap dataset) of 50 observations is created; this process is repeated many times. Although the bootstrap does not make assumptions about the shape of the distribution, it does assume that the original dataset represents the true distribution of the data. The non-parametric bootstrap is recommended for analysing cost data or as a check on the robustness of standard parametric methods [3, 12, 46]. However, O’Hagan and Stevens have argued that non-parametric methods, such as the bootstrap based on the sample mean, may be inappropriate because of the non-robustness of the sample mean to skewed data [31]. A recent study has shown that a single parametric or non-parametric form for the distribution of costs cannot be assumed; modelling the tail of the distribution is problematic. Sample sizes should be large enough for accurate modelling of the tail of the cost distribution, and sensitivity analysis should be performed to address model uncertainty [29].
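The resampling procedure described above can be sketched as follows. This is only an illustration: the cost values are invented, the function name is hypothetical, and the simple percentile interval is one of several possible choices that the cited literature does not prescribe.

```python
import random
import statistics


def bootstrap_mean_difference(costs_a, costs_b, n_boot=2000, seed=0):
    """Percentile bootstrap interval for the difference in arithmetic mean costs."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each trial arm with replacement, keeping its original size.
        resample_a = rng.choices(costs_a, k=len(costs_a))
        resample_b = rng.choices(costs_b, k=len(costs_b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lower = diffs[round(0.025 * n_boot)]
    upper = diffs[round(0.975 * n_boot) - 1]
    observed = statistics.fmean(costs_a) - statistics.fmean(costs_b)
    return observed, (lower, upper)


# Invented, right-skewed cost data: a few patients generate very high costs.
intervention = [200, 250, 300, 320, 400, 450, 500, 650, 900, 7500]
control = [150, 180, 220, 260, 300, 340, 420, 480, 700, 5200]

observed, (lower, upper) = bootstrap_mean_difference(intervention, control)
print(f"Mean cost difference: {observed:.0f} (95% interval {lower:.0f} to {upper:.0f})")
```

With only ten patients per arm and one extreme value in each, the interval is wide, which reflects the point made above about skewed cost data and small samples.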

Missing values

When economic data are collected alongside a randomised clinical trial, drop-outs and missing data will occur. However, few studies report drop-out rates and missing values in cost data [2]. In case of missing values, both a complete case analysis and an analysis with imputed data are recommended. Recent studies have shown that the different methods for dealing with missing data may influence the outcomes of an economic evaluation, stressing the importance of reporting the completeness of economic data and the methods used to deal with missing data [32, 34]. For randomly missing values, the bootstrap Expectation Maximisation algorithm, multiple imputation regression and multiple imputation Markov chain Monte Carlo are recommended as methods for analysing incomplete data [32]. However, as Briggs et al. have stated, ‘imputation methods are not a cure for poor study design and/or a poor data collection process’ [9].

Discounting

Discounting means computing equivalent present values of future costs or benefits. Costs should be discounted in studies with a time horizon longer than 1 year. Although the value of the appropriate discount rate is debated, discount rates usually vary between 3 and 5% [14, 21]. Discounting effects is more controversial; discount rates equal to or lower than the rate applied to costs have been proposed [21, 23]. The discount rates used should be clearly stated.
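As a minimal sketch of how discounting works, the function below converts a stream of yearly costs to its present value. The yearly amounts and the 4% rate are hypothetical, and the convention that costs in the first year are not discounted is an assumption; guidelines may differ on this point.

```python
def present_value(yearly_amounts, rate):
    """Sum of yearly amounts discounted to the present.

    Index 0 is the first year and is left undiscounted (assumed convention).
    """
    return sum(amount / (1.0 + rate) ** year
               for year, amount in enumerate(yearly_amounts))


yearly_costs = [1000.0, 1000.0, 1000.0]   # hypothetical costs in years 1 to 3
discounted = present_value(yearly_costs, rate=0.04)
undiscounted = present_value(yearly_costs, rate=0.0)
print(f"Undiscounted: {undiscounted:.2f}, discounted at 4%: {discounted:.2f}")
```

The same function can be applied to effects when a discount rate for effects is used.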

Sensitivity analysis

Different assumptions and choices made during economic evaluations cause uncertainty in the outcomes. In sensitivity analyses, the robustness of the results to these assumptions and choices is investigated. For example, in a cost–effectiveness study of physiotherapy, manual therapy and general practitioner care for neck pain, a sensitivity analysis was performed leaving out two patients who had been hospitalised and who generated considerably more costs than the remaining patients [27]. Different types of sensitivity analysis are identified: one-way sensitivity analysis, extreme scenarios and probabilistic sensitivity analysis [7]. In the first variant, the impact of a single variable is assessed by varying it across its range of plausible values. Extreme scenarios examine the most optimistic and/or the most pessimistic cost and effectiveness estimates. Monte Carlo simulations are used in a probabilistic sensitivity analysis, in which variables vary simultaneously [7, 37]. Sensitivity analyses enhance the interpretability and quality of an economic evaluation and should always be performed and reported.
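A one-way sensitivity analysis can be sketched as follows: one input is varied over a plausible range while all others are held at their base-case values. The simple cost model, the parameter names and all numbers are hypothetical and are not taken from any of the cited studies.

```python
def cost_per_point_gained(extra_healthcare_cost, absence_days_saved,
                          cost_per_absence_day, extra_effect):
    """Incremental costs per extra unit of effect in a deliberately simple model."""
    delta_cost = extra_healthcare_cost - absence_days_saved * cost_per_absence_day
    return delta_cost / extra_effect


def one_way_sensitivity(model, base_case, parameter, values):
    """Recompute the model while varying a single parameter."""
    results = []
    for value in values:
        scenario = dict(base_case)
        scenario[parameter] = value
        results.append((value, model(**scenario)))
    return results


base_case = {
    "extra_healthcare_cost": 1500,   # hypothetical
    "absence_days_saved": 5,         # hypothetical
    "cost_per_absence_day": 150,     # hypothetical
    "extra_effect": 2.5,             # hypothetical points on an outcome scale
}

results = one_way_sensitivity(cost_per_point_gained, base_case,
                              "cost_per_absence_day", [100, 150, 200])
for value, ratio in results:
    print(f"cost per absence day {value}: {ratio:.0f} per point gained")
```

If the ratio changes substantially across the range, the conclusions of the evaluation are sensitive to the valuation of absenteeism and this should be reported.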

Interpreting results

Interpreting the results of a CEA and/or CUA is challenging. For each outcome measure, an incremental cost–effectiveness ratio (ICER) can be calculated. In the incremental approach, the additional costs of one intervention over another are compared to the additional effects [15]. The costs and effects of two interventions can thus be compared directly, since the ratio represents the difference in costs divided by the difference in effects.

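\[ \mathrm{ICER} = \frac{C_{1} - C_{2}}{E_{1} - E_{2}} \]

where C and E denote the mean costs and mean effects of interventions 1 and 2.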

The ICER indicates the additional investment needed to gain one extra unit of effect, for example, the additional costs of surgical treatment compared to non-surgical treatment per point gained on the Roland Morris Disability Questionnaire or per QALY gained. Table 3 shows that the ICER is difficult to interpret. An ICER of € 2,000 could mean that the intervention is € 10,000 more expensive and five points more effective (situation A), but could also indicate that the intervention is € 10,000 less costly and five points less effective (situation B). Without the values of the difference in costs and the difference in effects, the ICER is uninformative. To determine which treatment is to be preferred, measures of precision and the policymaker’s maximum willingness to pay (in the literature often referred to as λ) are needed.

Table 3 Examples of incremental cost–effectiveness ratios (ICERs)
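The ambiguity of the ICER in situations A and B can be made concrete with a short sketch. The cost and effect differences are those given in the text; the willingness to pay of € 3,000 per point is hypothetical, and the net monetary benefit (λ times the difference in effects minus the difference in costs) is used here only as one convenient way to show that the same ratio can lead to opposite decisions.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio."""
    return delta_cost / delta_effect


def net_monetary_benefit(delta_cost, delta_effect, willingness_to_pay):
    """Positive values favour the intervention at the given willingness to pay."""
    return willingness_to_pay * delta_effect - delta_cost


situations = {
    "A": (10_000.0, 5.0),     # more expensive, more effective
    "B": (-10_000.0, -5.0),   # less costly, less effective
}
willingness_to_pay = 3_000.0  # hypothetical value of lambda

summary = {}
for name, (delta_cost, delta_effect) in situations.items():
    summary[name] = (
        icer(delta_cost, delta_effect),
        net_monetary_benefit(delta_cost, delta_effect, willingness_to_pay),
    )
    print(name, summary[name])
```

Both situations give the same ratio, but at this willingness to pay only situation A has a positive net monetary benefit; with a willingness to pay below € 2,000 per point the conclusion would reverse.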

Different methods have been proposed for estimating confidence intervals for the ICER [6, 36]. The bootstrap estimates can be displayed on a cost–effectiveness plane [30]. The x-axis represents the difference in effects and the y-axis represents the difference in costs. Four quadrants can be distinguished (see Fig. 1). In situations 1 and 2, one treatment dominates the other; these situations pose no significant problems for interpretation. Situations 7 and 8 require information on the threshold value for determining cost–effectiveness. Interpreting the ICER is challenging when the confidence surfaces of the bootstrapped ratios overlap one of the axes. This is the case in situations 3, 4, 5 and 6 in Fig. 1, in which there is no statistically significant difference in either the costs or the effects.

Fig. 1 Cost–effectiveness plane with nine possible situations resulting from an economic evaluation [9]. Reprinted, with permission, from the Annual Review of Public Health, Volume 23 ©2002 by Annual Reviews www.annualreviews.org
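How bootstrapped cost and effect differences are distributed over the four quadrants of the plane can be summarised with a small sketch. The pairs below are invented, the function names are hypothetical, and assigning points that lie exactly on an axis to the NE or adjacent quadrant is an arbitrary choice made only to keep the example short.

```python
from collections import Counter


def quadrant(delta_effect, delta_cost):
    """Quadrant of the cost-effectiveness plane (x: effects, y: costs)."""
    if delta_effect >= 0 and delta_cost >= 0:
        return "NE"   # more effective and more costly
    if delta_effect >= 0:
        return "SE"   # more effective and less costly (dominant)
    if delta_cost >= 0:
        return "NW"   # less effective and more costly (dominated)
    return "SW"       # less effective and less costly


def quadrant_shares(pairs):
    """Proportion of (delta_effect, delta_cost) pairs falling in each quadrant."""
    counts = Counter(quadrant(effect, cost) for effect, cost in pairs)
    return {name: counts[name] / len(pairs) for name in ("NE", "SE", "SW", "NW")}


# Invented bootstrap replicates of (difference in effects, difference in costs).
replicates = [(1.0, 100.0), (2.0, -50.0), (-1.0, 30.0), (-2.0, -10.0)]
shares = quadrant_shares(replicates)
print(shares)
```

When nearly all replicates fall in a single quadrant, interpretation is straightforward; a spread over several quadrants corresponds to the situations in which the confidence surface overlaps an axis.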

Negative ratios are also problematic for decision-making since they have no meaningful ordering [45]. Van Hout et al. introduced the cost–effectiveness acceptability curve (CEAC) to overcome some of the difficulties of the ICER [47]. The curve represents the proportion of the sampling distribution of costs and effects that lies below the policymakers’ maximum willingness to pay. It shows the probability that an intervention is cost-effective for a wide range of threshold ratios [40]. Because ICERs and their confidence surfaces can lie in different quadrants, the acceptability curve can take different forms [18]. Stinnett and Mullahy introduced another approach for the analysis of uncertainty: the net health benefit framework, in which the cost–effectiveness decision rule is reformulated [43].
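An acceptability curve can be approximated from bootstrap replicates as sketched below. For each threshold, the share of replicates with a positive net monetary benefit (threshold times the difference in effects minus the difference in costs) is counted; this formulation is an assumption chosen because it handles all four quadrants without computing ratios. The replicates, thresholds and function name are hypothetical.

```python
def acceptability_curve(replicates, thresholds):
    """Probability of cost-effectiveness for each willingness-to-pay threshold.

    replicates: (delta_effect, delta_cost) pairs from bootstrap datasets.
    """
    curve = []
    for threshold in thresholds:
        favourable = sum(1 for delta_effect, delta_cost in replicates
                         if threshold * delta_effect - delta_cost > 0)
        curve.append((threshold, favourable / len(replicates)))
    return curve


# Invented bootstrap replicates of (difference in effects, difference in costs).
replicates = [(1.0, 500.0), (2.0, 3000.0), (0.5, -100.0), (-1.0, 200.0)]
curve = acceptability_curve(replicates, thresholds=[0, 1000, 2000])
for threshold, probability in curve:
    print(f"willingness to pay {threshold}: probability {probability:.2f}")
```

In practice thousands of replicates and a fine grid of thresholds would be used, and the resulting points plotted as a curve.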

Interpretation of an economic evaluation: an example

An example of an economic evaluation in the field of spinal disorders is the cost–effectiveness study of lumbar fusion versus non-surgical treatment by Fritzell et al. [19]. A total of 284 patients with severe and therapy-resistant chronic low back pain of unknown origin for at least 2 years were included in the study. Patients were randomly assigned to four treatment groups; three included different surgical procedures and one consisted of commonly used non-surgical treatments.

The authors clearly describe almost all of the key elements mentioned above. The study was initially not organised as an economic evaluation, which has consequences for the sample size and cost data collection, but this is clearly discussed in the paper. Information on missing data was not provided and the choice of cost analysis methods was not explained.

The ICER and confidence intervals all fall within the NE quadrant; lumbar fusion is both more costly and more effective than non-surgical treatment. Whether lumbar fusion is more cost-effective depends on the policymakers’ willingness to pay. The acceptability curves show that the more policymakers are willing to pay for surgical treatment, the higher the probability that lumbar fusion is more cost-effective than non-surgical treatment.

Cost–effectiveness threshold

Acceptability curves and net benefit only indicate the probability that a certain therapy is cost-effective. Whether the therapy is cost-effective depends on the policymakers’ maximum willingness to pay. Recent papers have focussed on cost–effectiveness thresholds [13, 16, 25]. Eichler et al. identified different types of thresholds: those proposed by individuals or institutions, thresholds estimated from willingness-to-pay or related studies, thresholds inferred from past allocation decisions and cost–effectiveness ratios from other (non-medical) programs [16]. However, the use of threshold values is debated. Gafni and Birch have argued that thresholds might lead to uncontrolled growth in healthcare expenditure and that the necessary assumptions for application of thresholds are not met in practice [4, 20]. An ‘affordability curve’ has been introduced that combines budget constraints with cost–effectiveness. The curve shows the probability that the therapy under study is affordable given a wide range of threshold budgets [40]. It is unlikely that policymakers will use a single threshold value in the decision to implement an intervention. Other factors, such as the overall budget impact of the intervention and the absence of adequate alternatives, also influence policy decisions [44].

Progress to date

To assess the quality of economic evaluations for systematic reviews, several guidelines and recommendations have been developed [14, 41, 42]. Recently, the Consensus on Health Economic Criteria (CHEC) list was designed [17]. The CHEC-list consists of 19 items and focuses on the methodological quality of economic evaluations. Table 4 summarises the 19 items of the CHEC-list. In a recent systematic review, economic evaluations in non-specific low back pain studies were assessed using the CHEC-list. Due to the heterogeneity of interventions, controls and study populations, no definite conclusions could be drawn about the most cost-effective non-operative treatment in patients with low back pain [38].

Table 4 Items of the Consensus on Health Economic Criteria (CHEC)-list

Conclusion and recommendations

It is often argued that decisions in clinical practice should not be based on cost issues but on medical necessity or clinical effectiveness. With rising healthcare expenditure and its consequences for budgets, choices have to be made. These choices do not have to be made on cost considerations alone, but basing decisions solely on medical necessity would be insufficient. Implicitly, choices in clinical practice are already based on cost considerations. Simply providing all available care to one group of patients implies that other groups of patients are left with nothing. Economic evaluations can provide valuable information, but the methodology, and especially the cost methodology, needs to improve. To be able to critically appraise economic evaluations and consequently use these studies, knowledge of the methodological aspects is of utmost importance. Although the ‘perfect’ study is a utopia, the specific assumptions, choices and methods used should be clearly described to provide insight into the quality and practical use of the evaluation. In the near future, economic assessment may thus play an increasingly important role in the outcome evaluation of spinal interventions.