Key Points for Decision Makers

There exists no formal guidance for modelling treatments that achieved a marketing authorisation based only on uncontrolled clinical trial data.

Of treatments gaining a marketing authorisation in this manner, approximately half have been analysed in published models, with around a quarter of indications not submitted to UK health technology assessment bodies for review when requested.

Where models have been constructed, the most frequent approach is a naive comparison to a historical control without any adjustment for differences in patient population. This approach is open to bias should the patients be non-exchangeable.

1 Introduction

Treatments are usually granted a marketing authorisation on the basis of randomised controlled trials (RCTs), conducted against either a placebo or an active control [1]. This provides a basis for regulators to make decisions regarding the efficacy of interventions compared with the current standard of care [2]. This evidence may then be used to estimate the difference between the new treatment and the standard of care. Indirect treatment comparisons using a common comparator sometimes enable the comparison of the efficacy of treatments across different studies [3, 4].

Less commonly, treatments can be granted a marketing authorisation without a study containing a control arm. In a few cases, it may be apparent that the treatment is efficacious, for example, if all patients died before an intervention was available, but all live afterwards [5], or patients achieve a marked improvement in an objective measure, for example, blood count [2]. While treatments may receive a licence without being supported by RCTs, estimates of their comparative efficacy (relative to the current standard of care) are still needed to inform decisions on reimbursement in many healthcare systems. The decision problem faced by regulators is different to that of payers: whilst a regulator must ask whether the benefit/risk of a product is positive, a payer is interested in how much benefit is gained for the additional cost of treatment (or alternatively may use the additional benefit to set a price). In many countries (particularly in Western Europe), these calculations are formally brought into decision making through the use of cost-effectiveness analysis for resource allocation decisions [3].

Where cost-effectiveness analysis is used as a decision criterion, in general, treatments are required to generate more health (usually defined in terms of quality-adjusted life-years) than the treatments that would be displaced (represented by a ‘shadow budget’). This means that in practice the money spent on the new intervention should generate more health than money spent elsewhere in the healthcare system. To estimate the magnitude of the health gains seen with new technologies, modelling is used to extrapolate the benefits beyond the trial(s), though how comparative estimates should be constructed without controlled trials is unclear. While there exists extensive guidance on constructing economic models based on RCT results, there is no health technology agency or professional body guidance on the most appropriate method of modelling study data without an internal control (Table 1).
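The decision rule described above can be written as a short calculation. The following is a minimal sketch only: the function name, the threshold value and all inputs are hypothetical assumptions chosen for illustration, and none are taken from the models reviewed here.

```python
def incremental_net_health_benefit(delta_qalys, delta_cost, threshold):
    """QALYs gained by the new treatment minus the QALYs displaced elsewhere.

    threshold is the assumed cost per QALY of displaced care, so
    delta_cost / threshold estimates the health forgone elsewhere.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return delta_qalys - delta_cost / threshold


# Hypothetical inputs: 0.5 extra QALYs for 8,000 extra cost per patient,
# with displaced care assumed to produce 1 QALY per 20,000 spent.
inhb = incremental_net_health_benefit(0.5, 8_000, 20_000)
is_cost_effective = inhb > 0
```

A positive value means the new treatment is estimated to generate more health than the care it displaces. Without a controlled trial, the difficulty lies in obtaining a credible value for the incremental QALYs in the first place.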

Table 1 Guidance for economic modelling relating to uncontrolled clinical studies from key agencies and societies

The objective of this study was therefore to identify models constructed for treatments granted marketing authorisation without RCT evidence, and the approach taken to estimating relative efficacy of the treatment(s).

2 Methods

Hatswell et al. [6] identified treatments granted a marketing authorisation by either the US Food and Drug Administration or the European Medicines Agency from January 1999 to May 2014, without supportive RCT results (74 indications for 62 drugs). We conducted a systematic review for economic evaluations published for each of the treatments in the relevant indications using PubMed (search terms given in Fig. 1). The search strategy was deliberately broad, as multiple types of study may have included methods used to estimate comparative efficacy; for example, both clinical papers estimating the benefit of treatment and cost-effectiveness studies will have required comparative effectiveness as an input. Furthermore, cost-effectiveness studies are often published with varying titles, again supporting a wide search strategy with the expectation that a large amount of filtering would be needed on the hits.

Fig. 1 PubMed search strategy for cost-effectiveness papers

To ensure we identified all relevant modelling approaches, searches were also conducted for health technology appraisals conducted by the National Institute for Health and Care Excellence (NICE), the Scottish Medicines Consortium (SMC) and the All Wales Medicines Strategy Group (AWMSG), as well as the grey literature of the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Scientific Presentations Database. As the search tools on the health technology appraisal agency and ISPOR websites lack the sophistication and complexity of PubMed, a search was conducted on each website for the generic drug name, the US brand name or the European Union brand name. This again was expected to result in a large number of hits not meeting the inclusion criteria (as any document mentioning the product would be included), but is likely to include all relevant papers.

After identification, results (papers, health technology appraisal submissions and scientific presentations) were filtered for models that analysed indications where uncontrolled study data were the primary basis for approval and used only these non-RCT data (some pharmaceuticals had multiple indications, or subsequently had RCT data become available). The exclusion criteria used were for hits that did not include a method of generating comparative effectiveness in the specified indication, for example, it only discussed the (uncontrolled) trial results in isolation, or made comment on the cost of the drug. Results were then de-duplicated, based on the model descriptions and study authors, to account for the same model being used for different purposes (for example, a model used in a NICE submission, then published with Spanish costs, all while using the same approach to modelling efficacy). Where it was not clear whether a model was reported on multiple occasions, or was a similar (yet independent) approach, this was discussed by the reviewers and a decision reached by consensus.

Following identification of the economic models, the approaches used to estimate efficacy against the relevant comparator were categorised for each model. If a model included multiple approaches to modelling efficacy data, these were classed as separate modelling approaches. The modelling approaches identified were then placed into a taxonomic framework and analysed for commonality in approach.

3 Results

Figure 2 shows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses diagram for economic evaluations retrieved through PubMed [7]. The initial 74 literature searches in PubMed yielded 1202 hits, which were reduced to 56 full articles after abstract and title review. Twenty-nine papers were included after the full paper review. The main reasons for exclusion during this review were models being based on RCT data (n = 9), models evaluating a different indication (including a different stage of the same disease, n = 7) and papers that did not contain an economic model (e.g., burden of illness studies, n = 6). As expected, a large number of initial hits (owing to the wide search strategy) were excluded at the initial review stage, for example, papers that discussed the treatment of interest (search 4), and mentioned cost (in any context).

Fig. 2 PRISMA diagram of economic evaluations retrieved from PubMed. PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses, RCT randomised controlled trial

In addition to published papers, searches of health technology assessment body websites led to 19 NICE appraisals being identified (9 included), 52 SMC appraisals identified (16 included) and 27 AWMSG appraisals identified (5 included). Overall, there was a notable level of non-submission to health technology assessment agencies, in particular to the SMC (13/52 non-submissions) and AWMSG (13/27 non-submissions). Appraisals also often occurred after RCT-based results had become available (NICE 8/19, SMC 9/52, AWMSG 3/27), leading to exclusion from this study. Full results of the review are shown in Table 2.

Table 2 PRISMA information for economic evaluations identified as being based on uncontrolled clinical study data

Searching the ISPOR Scientific Presentations Database led to 1780 abstract hits, with 43 records selected for further review and 15 full records included. The most common reason for exclusion of records selected for full review was insufficient information reported regarding the model or approach used (n = 14). The large number of hits relative to included documents was mostly owing to the imprecise nature of the search function available, where we were forced to search for all abstracts that mentioned the drug in any context. For widely used drugs, this resulted in a large number of hits that were, on the whole, not relevant: 1569/1780 hits (88 %) were for just six products (imatinib, cetuximab, bortezomib, sunitinib, dasatinib and nilotinib) and yielded only eight relevant abstracts. Whilst this pattern was similar with PubMed hits, the additional specificity of search terms meant these six products constituted 882/1202 of hits (73 %) whilst representing 15 of the 74 indications (20 %).

In total, 74 relevant documents were identified (including publications, health technology appraisals and scientific presentations), which described 91 distinct modelling approaches for 30 indications. After consolidation of approaches reported multiple times (for example, one model being used for NICE and SMC submissions, presented at ISPOR and then published in an indexed journal), 51 unique modelling approaches were identified. Of these 51 modelling approaches, the overwhelming majority (n = 43, 84 %) were based on historical controls. Other approaches identified included using patients as their own control by statistical analysis or comparisons with baseline values (n = 3, 6 %), cost-minimisation analyses (n = 3, 6 %), threshold analyses (n = 1, 2 %) or assuming in oncology that time on treatment (assumed to be equal to progression-free survival in the model) was added to overall survival, with treatments then given in sequence (termed the ‘cumulative method’; n = 1, 2 %) (Fig. 3).
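The shares quoted for each category of the taxonomy follow directly from the counts. As a quick arithmetic check, the sketch below tallies the counts reported in the text; the dictionary keys are shorthand labels introduced here for illustration.

```python
# Counts of unique modelling approaches as reported in the text.
approach_counts = {
    "historical control": 43,
    "patients as own control": 3,
    "cost-minimisation": 3,
    "threshold analysis": 1,
    "cumulative method": 1,
}

total = sum(approach_counts.values())
# Percentage share of each approach, rounded to the nearest whole number.
shares = {name: round(100 * n / total) for name, n in approach_counts.items()}

for name, n in approach_counts.items():
    print(f"{name}: {n}/{total} ({shares[name]} %)")
```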

Fig. 3 Taxonomy of economic modelling approaches used for estimating incremental benefit from uncontrolled clinical studies

All 43 historical controls identified compared the results of the uncontrolled study of the new treatment with a separate set of data. In 16 cases (37 %), the treatment was compared with an investigational arm from another clinical study, and in five cases (12 %), the treatment was compared with pooled or meta-analysed data from a series of studies. A further 15 models (35 %) used comparisons to registry or case series data, and seven models (16 %) compared the results of the uncontrolled study with expert opinion. Trial and registry data appeared to be used interchangeably in evaluations, with only seven studies (16 %) attempting to account for differences in patient characteristics or patient selection between data sources. A summary of each of the modelling approaches identified is given in Table 3, which reports the approach taken, taxonomic category and reporting source.

Table 3 Economic models identified as having been based on uncontrolled clinical study data by treatment (alphabetical order)

When looking at the taxonomy by source (health technology assessment, published paper or conference proceedings), a similar pattern of modelling methods is apparent. This is shown in the Online Supplementary Material Appendix (Taxonomy of economic modelling approaches used for estimating incremental benefit from uncontrolled clinical studies by source).

4 Discussion

The results of this review show that 51 unique models have been published for 30 different indications granted a marketing authorisation without a comparative trial. Of the 74 indications for treatments approved without a comparative trial [6], 44 indications have not been modelled and estimates of relative effectiveness are not available. It is not known what the rate of economic evaluation of new treatments is, although we suspect it will be higher than the 40 % rate seen in this study.

The use of a historical control was by far the most common approach (43/51), which was most frequently taken from another trial or trials (21/43). However, even within this method there was substantial variation: some studies compared the results of uncontrolled trials with results taken from multiple trials (for example, Dinnes et al., who pooled the results of eight other clinical trials as a comparator), whereas the majority of models compared against single arms from other studies.

The assumption inherent in naive comparisons to historical controls (first proposed by Pocock [8]) is that patients are similar, or “exchangeable”, between studies. If this is not the case, and patients do systematically differ between studies, then this procedure will introduce bias in the comparison. Several approaches to matching patients and baseline characteristics between studies are available in the literature, including methods based on propensity scores [9] and matching-adjusted indirect comparisons [10]. Despite the availability of these approaches, only seven models attempted to control for any differences between trials, with one notable example being the work by Annemans et al., who constructed a historical control by reviewing patient records at the centres participating in the clinical trial in the time period before the clinical trial was open for enrolment, matching patients against the trial inclusion criteria [11].
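To illustrate how a population adjustment differs from a naive comparison, the sketch below reweights the patients of a single-arm study so that their mean age matches that of a historical control, in the spirit of a matching-adjusted comparison. It is a deliberately simplified illustration with one covariate and invented data: the function name `maic_weights`, the ages, the responses and the target mean are all assumptions, and it does not reproduce the method of any model identified in this review.

```python
import math


def maic_weights(x, target_mean, iterations=50):
    """Weights w_i = exp(b * (x_i - target_mean)) chosen so that the
    weighted mean of x equals target_mean (one covariate only)."""
    centred = [xi - target_mean for xi in x]
    if not (min(centred) < 0 < max(centred)):
        raise ValueError("target_mean must lie strictly inside the range of x")
    b = 0.0
    # Newton's method on the convex objective sum(exp(b * centred_i)).
    for _ in range(iterations):
        w = [math.exp(b * c) for c in centred]
        gradient = sum(c * wi for c, wi in zip(centred, w))
        curvature = sum(c * c * wi for c, wi in zip(centred, w))
        step = gradient / curvature
        b -= step
        if abs(step) < 1e-12:
            break
    return [math.exp(b * c) for c in centred]


# Hypothetical single-arm study: patient ages and response (1 = responder).
ages = [50, 55, 60, 65, 70]
responses = [1, 1, 1, 0, 0]
historical_mean_age = 58  # assumed mean age of the historical control

weights = maic_weights(ages, historical_mean_age)
weighted_mean_age = sum(w * a for w, a in zip(weights, ages)) / sum(weights)
naive_rate = sum(responses) / len(responses)
adjusted_rate = sum(w * r for w, r in zip(weights, responses)) / sum(weights)
```

In this invented example, the adjusted response rate differs from the naive one because responders are younger than non-responders and the historical population is younger than the study population. An adjustment of this kind can only correct for characteristics that are measured and reported in both data sources.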

The lack of adjustment of outcomes to reflect potentially more favourable patient cohorts may represent a substantial bias in the literature in favour of the new treatments. In a study by Sacks et al. of 50 RCTs and 56 historically controlled trials of the same interventions, the randomised control arm performed better than the historical control arm. In the studies cited, 79 % of historically controlled trials stated that the intervention was effective, compared with only 20 % of RCTs [12]. Diehl and Perry investigated the same question looking at overall survival or relapse-free survival in oncology, finding 43 examples in the literature of well-matched historical cohorts and RCT control groups. However, when comparing the outcomes of the two groups, 18 of the 43 studies had a greater than 10 % difference in effect size between the control groups, with the randomised group performing better on 17 out of 18 occasions [13]. This latter finding is particularly concerning given that 32 of the 43 historically controlled models identified in our study were in oncology, though in other examples too, historical controls have proved a poor match for RCT control arms that would have been expected to show similar results based solely on the inclusion criteria of patients [14–16].

Outside of historical controls, cost minimisation (though frowned upon in the literature [17]) was used in three models. While it may appear superficially attractive to assume treatments have equal efficacy to similar ones, it is unlikely that they exhibit exactly the same efficacy, with zero uncertainty. A further three models compared patient outcomes on treatment with a patient’s baseline result. This is also a potentially biased approach, owing to issues such as regression to the mean [18]. One additional approach, comparing all patients with non-responders, allows the estimation of an effect size, but it will be overly favourable towards the intervention, as non-responders will include an inherently sicker population [19]. The final approach noted was that of Tappenden et al., who pragmatically performed threshold analysis of the relative risk needed for the drug to be considered cost effective. Although this does not necessarily give an estimate of effect size, it allows a decision maker to make a more informed decision after reviewing the clinical evidence [20]; as such, we would recommend the use of similar threshold analyses where appropriate.
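A threshold analysis of this kind can be sketched as a search for the relative risk at which a treatment just breaks even. The example below is a hypothetical illustration, not the model of Tappenden et al.: the one-line outcome model, the function names and every parameter value (baseline risk, QALYs per event avoided, extra cost and cost-effectiveness threshold) are assumptions made for the sake of the example.

```python
def net_health_benefit(relative_risk, baseline_risk=0.40, qalys_per_event=5.0,
                       extra_cost=10_000, threshold=20_000):
    """Toy model: QALYs gained from events avoided minus QALYs displaced."""
    events_avoided = baseline_risk * (1 - relative_risk)
    return events_avoided * qalys_per_event - extra_cost / threshold


def threshold_relative_risk(model, low=0.0, high=1.0, tolerance=1e-9):
    """Bisection for the relative risk at which the model returns zero.

    Assumes the model's output falls as the relative risk rises.
    """
    if model(low) < 0 or model(high) > 0:
        raise ValueError("no break-even relative risk between low and high")
    while high - low > tolerance:
        mid = (low + high) / 2
        if model(mid) > 0:
            low = mid  # still cost effective, so the break-even is higher
        else:
            high = mid
    return (low + high) / 2


break_even_rr = threshold_relative_risk(net_health_benefit)
```

Under these assumed inputs, the treatment would be considered cost effective only if its true relative risk were below the break-even value, which a decision maker can then judge against the available clinical evidence.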

That there are a number of differing approaches to modelling, with no standard approach to handling issues such as patient selection, is likely a reflection of the relative rarity of evaluations with this type of data (we identified only 51 models, compared with the vast literature of health economic evaluations published [21]). Nevertheless, despite the lack of standard approaches and guidelines, some studies appear to be well conducted, with attempts to select an approach based on reasonable assumptions and control for any patient selection (for example Woods et al. [22]). Guidance has also recently been published by the NICE Decision Support Unit on the use of observational data in modelling where individual patient data are available for both trials [23]; although this is not likely to be relevant in all instances, it does provide an outline of the available methods for use by modellers.

Whilst we have focussed on how comparative estimates have been generated, other limitations should also be noted regarding clinical studies without an internal control. These include limited sample size (with correspondingly large uncertainty) from which to extrapolate, the use of surrogate endpoints or interim endpoints (such as response rates rather than overall survival), and the duration of evidence collected (requiring extrapolation). Because of the limited information collected in studies without a control arm (in sample size, duration and comparative data), regulators often specify the need for confirmatory clinical trials to be conducted. These may be comparative (yet in an earlier stage of disease) or may be single arm, and will most commonly be used to confirm the benefits seen with the new treatment in a larger cohort, and increase the number of treated patients for a better understanding of the adverse-event profile.

5 Conclusion

The majority of treatments granted a marketing authorisation without controlled study results have not been subject to economic evaluation in a published form, and there is a high level of non-submission to UK health technology agencies for such products. The evaluations that have been performed were generally based on naive comparisons to historical controls from individual arms of clinical trials, or registry/case series data.

Further research and guidance are required on the appropriateness of historical controls in economic evaluation, and on the most relevant methods to use when modelling without RCT data with the aim of estimating comparative effectiveness (including the relevance of data from other indications already approved). Ultimately, formal guidance and standardisation may reduce the level of bias in economic evaluations of indications approved without RCT data, and lead to an improvement in the average quality of published models. Standardisation would also provide a basis for comparison between studies, such that interventions can be more readily compared with other approaches to evaluation, where methods are comparable [24].