Key Points for Decision Makers

Reimbursement recommendations in Ireland are not only driven by cost effectiveness.

Quality of evidence appears to be an important driver for reimbursement recommendations in Ireland.

The analysis highlights a need for the formal inclusion of additional criteria to ensure transparent and consistent decision making.

1 Introduction

Health technology assessment (HTA) plays an important role in informing reimbursement decisions in many jurisdictions across Europe and the world [1]. While the focus of such evaluations has historically been on cost-effectiveness analysis [2], it is widely accepted that cost effectiveness alone should not determine the decision on reimbursement [3]. A decision problem, such as the one at hand, requires simultaneous consideration of multiple, sometimes conflicting, objectives.

In Ireland, decisions on the reimbursement of pharmaceutical products are made by the Health Service Executive (HSE). On behalf of the HSE, the National Centre for Pharmacoeconomics (NCPE) carries out assessments of all new pharmaceuticals where an application has been made for reimbursement. Following a rapid review, the NCPE either recommends the reimbursement of a product or requests a full pharmacoeconomic assessment. Based on the full assessment, the NCPE makes one of three recommendations: (1) reimbursement at the requested price; (2) against reimbursement; or (3) against reimbursement at the submitted price. A cost-effectiveness analysis (including an analysis of comparative safety and efficacy) and a budget impact (BI) analysis are the core parts of each assessment. An agreement between the HSE and the marketing authorisation holders (MAHs) is in place in Ireland, which suggests a threshold of €45,000 per quality-adjusted life-year (QALY), below which technologies will be reimbursed [4]. The agreement states that exceptional products that fail to satisfy the €45,000/QALY threshold may proceed to discussions between the HSE, relevant stakeholders and the MAH. Likewise, the attainment of an incremental cost-effectiveness ratio (ICER) below €45,000/QALY may not result in a positive recommendation, particularly where there are concerns about the validity of any aspect of the cost-effectiveness analysis. In practice, therefore, there is flexibility in the threshold, which allows for other aspects (criteria) of individual interventions to be taken into account. No list of relevant criteria or their influence on the decision is made explicit, limiting the transparency of the decision-making process.

A number of authors have identified influential criteria in the HTA process [5–8]. Dakin et al. [5] modelled National Institute for Health and Care Excellence (NICE) decisions with three possible outcomes (“recommendation”, “restricted recommendation” and “no recommendation”). Multinomial logistic regression results suggest that clinical evidence, higher ICERs and decision date influence the decision. In a binary model, Devlin and Parkin [6] found that a combination of ICERs, burden of disease and uncertainty are acceptable indicators for NICE decisions. Harris et al. [7] modelled a binary decision outcome for Australia using probit multiple regression. Their analysis suggests clinical significance, cost effectiveness, cost to government and severity of disease as predictors for coverage decisions. Tanios et al. [8] conducted an international survey of 140 decision makers in 23 countries to report criteria that are currently considered or should be considered in healthcare decisions. Clinical efficacy, safety, quality of evidence (QoE), disease severity and costs were ranked to be the most relevant criteria.

The work in this paper aims to identify criteria that are likely to have influenced reimbursement decisions in Ireland in the past, through a retrospective analysis. The revealed preferences do not necessarily represent the criteria the HSE and the Irish population would select in a formal process to influence decisions on drug reimbursement [9]; however, they establish what is likely to currently influence decisions. The work highlights the need for a systematic process to incorporate influential criteria to improve transparency and consistency in reimbursement decision making.

2 Methods

2.1 Identification of Criteria

Based on case studies of HTA agencies in the UK, Germany, Australia and the USA, Rotter et al. [10] recently explored the changing landscape of economic evaluations in HTA. They propose a taxonomy of factors impacting on the value of medical technologies, which we used as a basis for this analysis. Guindo et al. [11] recently published a literature review identifying decision criteria for resource allocation. The top ten criteria identified in their analysis are covered by the taxonomy proposed by Rotter et al. [10]. The list of criteria was modified to suit Irish experiences in the past and clear definitions were added to each criterion. Clear definitions and measurement scales of each criterion were agreed upon in discussions with members of the NCPE review team.

2.2 Scoring of Assessments

Information was sought for every full pharmacoeconomic assessment completed by the NCPE up to July 2015. Data were extracted from summaries published on the NCPE website (http://www.ncpe.ie/), as these do not contain any confidential information. Each intervention was scored against each of the criteria using the predefined scales. Two members of the NCPE review team (LMC and SS) independently extracted the data; disagreements were resolved in discussions with the NCPE review team.

2.3 Statistical Analysis

The statistical analysis explores the relationship between the set of criteria identified in the literature review and the outcome of the NCPE assessment. The outcome was binary, with either a positive recommendation for reimbursement (1) or a negative recommendation (0).

As a first step, descriptive statistics are used to explore the relationship of each individual criterion and reimbursement. The dataset is then analysed using logistic regression. Logistic regression predicts the outcome of a categorical variable based on a number of predictor variables, which can be either categorical or continuous. In addition to the set of criteria identified in the literature review, the year in which the technology was assessed (coded as year of assessment) is considered as an additional predictor variable to account for the changes made during the last 7 years caused by financial constraints. Values are imputed where applicable for the primary analysis; cases with missing data were excluded in a sensitivity analysis. A logistic regression is a special case of a generalised linear model and takes the following form:

$$ \begin{aligned} Y_{i} &\sim \text{Bern}\left( p_{i} \right) \\ \text{logit}\left( p_{i} \right) &= \sum_{j} w_{j} \times S_{ij} =: u_{i} \end{aligned} $$

where $Y_i$ is the observed outcome of treatment $i$, taking values 1 and 0; each observation has a Bernoulli distribution with a probability $p_i$ of success. The weight given to each criterion is denoted by $w_j$; the score achieved by treatment $i$ on criterion $j$ is denoted by $S_{ij}$. The overall utility $u_i$ of treatment $i$ is then calculated as a weighted sum of these scores. A logit link connects the probability of reimbursement $p_i$ with the utility of treatment $i$.
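The model above can be illustrated with a minimal Python sketch. The weights and scores below are hypothetical values chosen for illustration only; they are not the fitted coefficients of this analysis, and treating the intercept as a criterion with a constant score of 1 is an assumption.

```python
import math

def reimbursement_probability(weights, scores):
    """Return p_i = 1 / (1 + exp(-u_i)), where u_i = sum_j w_j * S_ij."""
    utility = sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-utility))

# Hypothetical weights: intercept, ICER (per EUR 1,000/QALY), QoE.
weights = [2.0, -0.05, 3.0]
# Hypothetical scores: constant 1 for the intercept, ICER of 40, QoE of 0.
scores = [1.0, 40.0, 0.0]

p = reimbursement_probability(weights, scores)
print(p)  # utility is 2.0 - 2.0 + 0.0 = 0, so p is 0.5
```

With these illustrative values the utility is zero, so the treatment sits exactly on the boundary between a positive and a negative predicted recommendation.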

Model selection is used to select important predictors out of the pool of defined criteria. Model selection is conducted using the Bolasso method, a combination of bootstrapping and LASSO (least absolute shrinkage and selection operator) [12]. The LASSO regression selects relevant variables by minimising the residual sum of squares (or, for logistic regression, the negative log-likelihood) while constraining the sum of the absolute values of the coefficients to be lower than a tuning parameter. If the tuning parameter is large, the constraint is not binding and LASSO will give the same results as the standard unpenalised regression. Cross-validation, which trades off bias and variance, is used to choose the tuning parameter [13]. To ensure a consistent model choice, the analysis is repeated on 100 bootstrapped realisations of the dataset and only variables selected in more than 95 % of replications are selected for the base-case model. Variables selected in more than 85 % of cases are considered as an alternative scenario.
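The bootstrap-plus-LASSO idea can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the analysis code: the L1-penalised logistic regression is fitted with a hand-written proximal gradient solver rather than glmnet, the penalty is fixed instead of being chosen by cross-validation, only 40 replications are run, and the data are synthetic. All function names are hypothetical.

```python
import math
import random

def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_l1_logistic(rows, labels, penalty, steps=100, step_size=1.0):
    """L1-penalised logistic regression fitted by proximal gradient descent."""
    n, k = len(rows), len(rows[0])
    intercept, coefs = 0.0, [0.0] * k
    for _ in range(steps):
        grad_intercept, grad = 0.0, [0.0] * k
        for x, y in zip(rows, labels):
            u = intercept + sum(c * v for c, v in zip(coefs, x))
            residual = 1.0 / (1.0 + math.exp(-u)) - y
            grad_intercept += residual / n
            for j in range(k):
                grad[j] += residual * x[j] / n
        intercept -= step_size * grad_intercept  # the intercept is not penalised
        coefs = [soft_threshold(c - step_size * g, step_size * penalty)
                 for c, g in zip(coefs, grad)]
    return coefs

def bolasso_frequencies(rows, labels, penalty, replications=40, seed=1):
    """Share of bootstrap replications in which each coefficient is non-zero."""
    rng = random.Random(seed)
    n, k = len(rows), len(rows[0])
    counts = [0] * k
    for _ in range(replications):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        coefs = fit_l1_logistic([rows[i] for i in idx],
                                [labels[i] for i in idx], penalty)
        for j in range(k):
            counts[j] += coefs[j] != 0.0
    return [c / replications for c in counts]

# Synthetic data: column 0 drives the outcome, column 1 is pure noise.
data_rng = random.Random(0)
rows = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
labels = [1 if data_rng.random() < 1 / (1 + math.exp(-3 * x[0])) else 0
          for x in rows]

frequencies = bolasso_frequencies(rows, labels, penalty=0.1)
selected = [j for j, f in enumerate(frequencies) if f > 0.95]
print(frequencies, selected)
```

The selection frequency per variable plays the role of the >95 % and >85 % cut-offs described above: a variable enters the model only if the LASSO keeps it in nearly every bootstrap replication.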

Model selection is conducted in R (version 3.2.1) using the glmnet package [14, 15]. The logistic regression incorporating the selected predictors is fitted in R using the JAGS package [16].

3 Results

3.1 Identification of Criteria

The final list of criteria and their definitions are summarised in Table 1. A total of 13 criteria were selected: cost effectiveness, BI, safety and tolerability, process utility (evaluation of the delivery/implementation of the product within the health service), unmet need, orphan status, disadvantaged population, end of life, severe disease, innovation, reversibility, QoE and uncertainty (based on probability of cost effectiveness). Cost effectiveness is measured as the ICER. The use of the ICER as a criterion in a regression can cause problems, since negative ICERs can refer to situations where QALYs are gained at a reduced cost (dominant technology) as well as situations where a higher cost is requested for a reduction in QALYs (dominated technologies). However, in this application, all negative ICERs refer to dominant technologies. The 5-year gross BI is measured in euros. The gross BI was chosen as it was broadly available and a high level of heterogeneity was found in the net budget calculations. All other criteria are measured on a simple two- or three-category scale as more detailed information on these criteria was not broadly available. Pharmaceuticals listed on the Orphanet Database (having been granted an orphan designation for disease(s) considered to be rare in Europe) [17] are classified as orphan drugs. Further details can be found in Table 1.

Table 1 Details on the criteria assessed in this analysis

3.2 Scoring of Assessments

A total of 85 full pharmacoeconomic assessments were completed by the NCPE between January 2006 and July 2015. Details on the technologies are summarised in Table 2. Each of the assessments was scored against each of the criteria. Each assessment evaluated one intervention (Table 3).

Table 2 General information on assessments conducted by the National Centre for Pharmacoeconomics between January 2006 and July 2015: each pharmaceutical is either recommended for reimbursement (outcome = yes) or not (outcome = no)
Table 3 Criteria scores for pharmacoeconomic assessments conducted by the National Centre for Pharmacoeconomics between January 2006 and July 2015

The base-case ICER was chosen; in cases where more than one base case was specified, the average across values was calculated. (In six cases, multiple comparators were deemed equally appropriate. In one case patients were split into subgroups; however, an overall recommendation on reimbursement was made.) Annual BIs are multiplied by five to determine the 5-year BI; the average is used where a range of values was reported.

Three assessments [17, 20, 44] did not include a cost-effectiveness analysis and three assessments [23, 30, 75] conducted a cost-minimisation analysis. No ICER was reported in the summary in a further four cases [56, 59, 78, 81]. Two cases reported an ICER in the southwest quadrant of the cost-effectiveness plane (less costly and less effective than comparator) [22, 64] and the intervention in one case was dominated (more costly, less effective) [69]. These 13 cases were excluded from the analysis. One of the remaining assessments was a re-evaluation of a previous assessment. We only included the more recent assessment in the analysis in order to avoid double counting. Case 85 was therefore excluded.

No BI was reported in nine of the remaining cases [18, 24, 32, 39, 52, 54, 57, 72, 74]. Thirteen assessments [2, 3, 4, 12, 18, 21, 39, 41, 47, 49, 50, 52, 67] show dominance of the intervention in question and do not report an ICER.

In the base-case analysis, the BI is imputed as the mean BI of the remaining treatments. An ICER of –1 is adopted for dominant treatments. A sensitivity analysis (SA1) explores the analysis of the ICER as a categorical variable (0 for dominant technologies, 1 for ICERs €0–20,000, 2 for ICERs €20,000–40,000, 3 for ICERs €40,000–60,000, 4 for ICERs €60,000–80,000, 5 for ICERs €80,000–100,000 and 6 for ICERs more than €100,000). We have excluded dominant cases and cases for which no BI was reported in a sensitivity analysis (SA2).
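The data preparation steps described above can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the treatment of ICERs falling exactly on a category boundary (upper bound inclusive) is an assumption since the text does not specify it, and the BI values are made up.

```python
def icer_category(icer, dominant=False):
    """Map an ICER in EUR/QALY to the SA1 categories 0 to 6."""
    if dominant:
        return 0
    upper_bounds = [20_000, 40_000, 60_000, 80_000, 100_000]
    for category, bound in enumerate(upper_bounds, start=1):
        if icer <= bound:  # assumption: the upper bound is inclusive
            return category
    return 6  # more than EUR 100,000/QALY

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Hypothetical 5-year budget impacts in EUR million; None marks "not reported".
budget_impacts = [10.0, None, 30.0]
print(impute_mean(budget_impacts))            # [10.0, 20.0, 30.0]
print(icer_category(40_633))                  # 3
print(icer_category(-1, dominant=True))       # 0
```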

The base-case analysis and SA1 are therefore based on 71 assessments, while 52 assessments are included in SA2.
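These sample sizes can be reproduced from the case numbers listed above. The sketch below assumes that the 85 assessments are numbered 1 to 85; the variable names are illustrative.

```python
total_assessments = 85

# Cases without a usable ICER (13 in total), as listed in the text.
no_usable_icer = ({17, 20, 44} | {23, 30, 75} | {56, 59, 78, 81}
                  | {22, 64} | {69})
re_evaluation = {85}  # excluded to avoid double counting

no_budget_impact = {18, 24, 32, 39, 52, 54, 57, 72, 74}
dominant = {2, 3, 4, 12, 18, 21, 39, 41, 47, 49, 50, 52, 67}

base_case = set(range(1, total_assessments + 1)) - no_usable_icer - re_evaluation
sa2 = base_case - no_budget_impact - dominant

print(len(base_case))  # 71
print(len(sa2))        # 52
```

Cases 18, 39 and 52 are both dominant and lack a reported BI, so 19 rather than 22 assessments are removed for SA2.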

3.3 Descriptive Analysis

Of the 71 evaluations included in the base-case analysis, 27 (38 %) interventions were recommended for reimbursement. Table 4 summarises the descriptive statistics. For continuous criteria, the table reports the overall mean as well as the mean stratified by reimbursement recommendation. For binary and categorical outcomes, the table reports overall counts and percentages in each category as well as stratified by recommendation.

Table 4 Descriptive statistics reporting overall means and means stratified by recommendation for continuous criteria and overall counts as well as counts stratified by recommendation for binary and categorical criteria

All but one of the reimbursement decisions were classified as being reversible and only one intervention related to a disadvantaged population. These criteria are therefore not further considered in the analysis.

3.4 Regression Results

Two variables, ICER and QoE, are selected in >95 % of the bootstrap replications to be important predictors for the reimbursement recommendation and are therefore chosen for the base-case model. Other strong predictors (selected in >85 % of the bootstrap replications) are year of assessment, safety and tolerability, as well as level of uncertainty.

The regression model estimates a probability of reimbursement for each decision made in the past. The decision rule based on the logistic regression suggests a positive recommendation for technologies with a probability of ≥0.5 and a negative recommendation for technologies with a probability of <0.5.

Figure 1 plots the estimated probability of reimbursement for each assessment stratified by actual recommendation using the ICER and QoE as predictors.

Fig. 1 Estimated probability of positive reimbursement stratified by the actual recommendation for each assessment, based on the model using the incremental cost-effectiveness ratio and quality of evidence as predictors

The model correctly classifies 64 of 71 assessments (90 %). When extending the model to incorporate year of assessment, safety and tolerability and the level of uncertainty, 96 % of assessments are classified correctly.
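The decision rule and the share of correctly classified assessments can be sketched as follows. The probabilities and recommendations in the example are hypothetical and are not the NCPE data; the function names are illustrative.

```python
def recommend(probability, cutoff=0.5):
    """Positive recommendation (1) if the probability is at least the cutoff."""
    return 1 if probability >= cutoff else 0

def share_correct(probabilities, actual):
    """Proportion of assessments whose predicted recommendation matches."""
    predicted = [recommend(p) for p in probabilities]
    correct = sum(pred == act for pred, act in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical fitted probabilities and actual recommendations.
probabilities = [0.91, 0.62, 0.50, 0.35, 0.08]
actual = [1, 1, 0, 1, 0]
print(share_correct(probabilities, actual))  # 0.6

# The reported base-case figure: 64 of 71 assessments.
print(round(100 * 64 / 71))  # 90
```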

The coefficients of both models are summarised in Table 5; they contain information on the impact the different criteria have on the reimbursement decision. In the base case, coefficients are significant at a 95 % confidence level. Coefficients in the extended base case are more uncertain. The ICER, QoE, uncertainty and year of assessment remain significant at a 95 % confidence level, while confidence intervals for coefficients for safety and tolerability (95 % CI –0.02 to 0.04) as well as the intercept (95 % CI –49.7 to 70.3) are very wide.

Table 5 Regression coefficients and standard deviation for base case, extended base case, SA1 (incremental cost-effectiveness ratio categorical) and SA2 (excluding dominant cases and cases where budget impact is missing)

Since QoE is a categorical variable, it is possible to calculate a threshold for the cost per QALY for each level. The model suggests the reimbursement of interventions based on evaluations with no issues associated with the quality of available evidence (QoE = 0) up to a threshold of €40,633/QALY. For interventions with poor QoE (QoE = –1), a negative threshold is estimated, indicating that evaluations based on poor QoE should not be recommended for reimbursement.
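The threshold follows from setting the probability of reimbursement to 0.5, where the logit is zero. As a sketch, write the base-case model with an intercept $w_{0}$ and coefficients $w_{\text{ICER}}$ and $w_{\text{QoE}}$ (this notation is an assumption made here for illustration; the fitted values are those reported in Table 5):

$$ \begin{aligned} \text{logit}\left( 0.5 \right) = 0 &= w_{0} + w_{\text{ICER}} \times \text{ICER}^{*} + w_{\text{QoE}} \times \text{QoE} \\ \Rightarrow \quad \text{ICER}^{*} &= - \frac{w_{0} + w_{\text{QoE}} \times \text{QoE}}{w_{\text{ICER}}} \end{aligned} $$

Since $w_{\text{ICER}}$ is negative (a higher ICER lowers the probability of reimbursement), the threshold $\text{ICER}^{*}$ is positive when $w_{0} + w_{\text{QoE}} \times \text{QoE} > 0$ and negative otherwise, which is the situation described for QoE = –1.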

For the extended base case, the probability of a positive recommendation decreases with increasing ICER; a probability of 0.5 is reached for an ICER of €59,046/QALY for a technology assessed in 2015 where all other predictors equal 0. QoE remains important with a probability of close to zero for evaluations with poor QoE. The probability of a positive recommendation increases slightly with time. The probability decreases with a worsening of safety and tolerability and increased uncertainty.

SA1, treating the ICER as categorical, also selects the ICER and QoE in >95 % of simulations. The model suggests the reimbursement of technologies at a threshold between 2 (ICER: €20,000–40,000) and 3 (ICER: €40,000–60,000) for technologies with appropriate evidence and does not recommend the reimbursement of technologies with a poor evidence base. The model correctly classifies 90 % of technologies.

SA2, excluding dominant cases as well as cases where BI was not reported, also selects the same two criteria (ICER and QoE) in >95 % of bootstrap replications; year of assessment is selected in >85 % of replications.

4 Discussion

The analysis demonstrates that recommendations for or against the reimbursement of technologies in Ireland are not only driven by cost effectiveness. Apart from the ICER, the quality of available evidence is identified as a potential key predictor. Other potential drivers of the decision include safety and tolerability as well as decision uncertainty. While uncertainty surrounding the estimates is high, the results resemble what one would expect. Ireland currently operates at a threshold of €45,000/QALY for pharmaceuticals, which was reduced to €20,000/QALY between 2010 and 2012; an estimated threshold of €40,633/QALY in the base-case model is therefore in line with current policy. An evaluation can only be as good as the evidence it is built on; estimates based on poor QoE or inadequate modelling approaches are unreliable. The large impact of QoE on reimbursement recommendation is therefore not surprising. While all interventions have adverse effects, a significant improvement in safety and tolerability has been identified to positively impact on reimbursement. In the same way, a significant deterioration in safety and tolerability makes a positive recommendation less likely. While the economic evaluation takes place post market authorisation, meaning companies have demonstrated an acceptable level of safety, the consideration of this criterion remains important as it may impact on the well-being of the patient.

These findings resemble those of analyses conducted in other countries [5–8]. The criteria selected here were also identified in one or several of these studies (see Sect. 1 for details). Burden of disease had been identified in several of the published studies; however, this was not mirrored in this analysis.

Multi-criteria decision analysis (MCDA) in HTA is emerging as a new approach allowing for the systematic inclusion of multiple objectives in the assessment. Thokala and Duenas [3] have illustrated how different approaches to MCDA can be incorporated in the HTA framework. Goetghebeur et al. [18] have developed the EVIDEM (Evidence and Value: Impact on DEcisionMaking) framework for Canada, a model which has also been applied in Nepal [19], Chile [20] and Ghana [21]. Sullivan [22] has proposed the application of MCDA for New Zealand. One approach to MCDA extends the incremental net benefit (INB) to a linear additive value function incorporating relevant criteria [23]. The methodology easily extends the current decision-making approach in Ireland.
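For orientation, the INB of a technology with incremental effect $\Delta E$ (in QALYs) and incremental cost $\Delta C$ at a threshold $\lambda$ is

$$ \text{INB} = \lambda \times \Delta E - \Delta C $$

One way such an extension could be written, given here as an illustrative assumption rather than as the formulation used in [23], is to add weighted scores $S_{k}$ for further criteria $k$:

$$ V = \lambda \times \Delta E - \Delta C + \sum_{k} w_{k} \times S_{k} $$

where the weights $w_{k}$ would express the value of each additional criterion in monetary terms. A technology would then be considered acceptable if $V > 0$, which reduces to the usual threshold rule when all $w_{k}$ are zero.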

This analysis is the first phase of the development of an MCDA approach for the Irish healthcare setting. A range of potentially relevant criteria was selected based on the literature and a descriptive approach is taken to explain the impact different criteria have had on reimbursement recommendations in the past. The results of this analysis could be used to inform parameters of a linear additive value function. However, while the identified criteria are likely to have influenced decision making in the past, they have done so in an informal way; their value and relative importance have never been formally assessed. Such a formal assessment is needed to select parameters of an MCDA model to inform future decisions.

The NCPE now routinely assesses all factors identified as relevant in this analysis and incorporates outcomes in their report. The findings of the analysis can strengthen the case for conditional reimbursement in a policy setting. MCDA structures may offer a way of formally recognising concerns regarding QoE or safety and tolerability.

There are some limitations associated with this analysis. The analysis is limited to data available in the public domain; it is therefore possible that other factors influencing the recommendation may not be captured.

The outcome variable of the analysis distinguishes between positive and negative recommendations for reimbursement. Approximately half of the negative recommendations by the NCPE were issued with a recommendation “not at this price”, which may result in post-recommendation price discussions and potential reimbursement following price reduction. While this study analysed factors influencing the recommendation process, an analysis of factors influencing the actual reimbursement decision would complete the picture of decision making in Ireland. However, price discussions and agreements following the recommendation are often commercial in confidence and assessments based on the newly agreed price are not available.

The ICER as a measure for cost effectiveness has certain properties that are not ideal in a regression analysis. Positive values can indicate a QALY gain at an additional cost as well as a loss of QALYs at a saving, and negative values can indicate a QALY gain at a saving as well as a QALY loss at additional costs. However, this is not a problem in this analysis, since the technologies included all show a positive QALY gain. A solution to the problems associated with the ICER is the use of the INB instead. Unfortunately, data to infer the INB were not available for many of the technologies analysed here.
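The ambiguity can be made concrete with a short Python sketch. The incremental costs and QALYs are hypothetical; the €45,000/QALY threshold is the one cited earlier in the text, and the function names are illustrative.

```python
def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio in EUR per QALY."""
    return delta_cost / delta_qaly

def inb(delta_cost, delta_qaly, threshold=45_000):
    """Incremental net benefit in EUR at the given threshold per QALY."""
    return threshold * delta_qaly - delta_cost

# Hypothetical (incremental cost in EUR, incremental QALYs) per quadrant.
quadrants = {
    "more costly, more effective": (10_000, 0.5),
    "less costly, less effective": (-10_000, -0.5),
    "dominant": (-10_000, 0.5),
    "dominated": (10_000, -0.5),
}

for name, (cost, qaly) in quadrants.items():
    print(name, icer(cost, qaly), inb(cost, qaly))
```

The first two cases share an ICER of 20,000 and the last two share an ICER of –20,000, whereas the INB differs in sign within each pair (12,500 versus –12,500, and 32,500 versus –32,500), so it separates cases the ICER cannot.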

Some of the criteria are not easily defined. Gross BI was reported for the majority of treatments and therefore used to measure BI. However, it would be interesting to see if net BI yields a different outcome. Unfortunately, the net BI is not in the public domain for many cases, and there is high heterogeneity in its calculation between the different assessments. Nevertheless, there is uncertainty in the gross BI as it is influenced by many factors including projections on patient numbers and market penetration. Furthermore, a constant BI over 5 years was assumed where only an annual BI was reported.

The scale used to measure the criteria also influences the analysis. Due to the difficulties in obtaining information on all criteria as well as to improve clarity, a simple two- or three-level scale was used for most of the criteria. A more sophisticated approach for some of the criteria may be beneficial. For instance, measuring uncertainty on a continuous scale, such as the expected value of perfect information, would capture the full level of uncertainty.

In addition, while year of assessment was included in the analysis to explain some of the changes over time, the ICER and BI were not adjusted to a common price year to account for inflation.

5 Conclusion

Transparency is reduced when criteria informally influence the decision process. We hope that highlighting potentially influencing criteria in this analysis will trigger communication between decision makers, experts and other stakeholders to discuss potential ways of incorporating relevant criteria in a more formalised manner. The results of this analysis provide a basis for discussions on whether the revealed criteria, which have influenced decisions in the past, should also be the criteria used to influence decisions in the future. MCDA forces an explicit list of criteria and the quantification of their value. Careful consideration is necessary to determine the relative importance of the criteria before an MCDA approach can be applied to actually aid decision making in the Irish healthcare setting.