Key Points

The structure of the models was consistent with little deviation from the pre-progression, post-progression and death health states.

The modelling of overall survival is routinely one of the most influential factors on the cost-effectiveness conclusions but is often associated with considerable uncertainty.

Studies should report with greater transparency their methods for extrapolating survival curves to reduce bias in cost-effectiveness analyses.

There is insufficient evidence to conclude which treatment is the most cost effective and further research is necessary.

1 Introduction

Lung cancer is one of the most commonly diagnosed cancers and the leading cause of cancer-related deaths globally [1], with non-small cell lung cancer (NSCLC) accounting for 85%–90% of all forms of lung cancer [2]. The development of targeted therapies and immunotherapies promises to fill some of the unmet needs for the treatment of advanced/metastatic NSCLC. To date, 13 agents have a label indication for the treatment of advanced/metastatic NSCLC in patients after failure of first-line chemotherapy (docetaxel, pemetrexed, ramucirumab with docetaxel, erlotinib, nintedanib with docetaxel, afatinib, nivolumab, pembrolizumab, atezolizumab, crizotinib, ceritinib, gefitinib and osimertinib), four of which are targeted therapies for patients with anaplastic lymphoma kinase expression (ALK +) or epidermal growth factor receptor expression (EGFR +) disease. In the absence of head-to-head comparison studies between most of the licensed drugs for this specific population, we showed, in a previous systematic review with network meta-analyses, that the three recent immune checkpoint inhibitors, namely nivolumab, pembrolizumab and atezolizumab, exhibited a superior benefit/risk balance compared with other licensed drugs [3]. The same was found in a secondary analysis of trials using restricted mean survival and parametric modelling to measure effectiveness [4].

However, the substantial costs of these drugs raise concerns about the economic impact they are likely to have on health systems [5]. This calls for economic modelling to compare these licensed drugs comprehensively on both the cost and effectiveness dimensions.

Prior to this comprehensive cost-effectiveness evaluation, we aimed to undertake a systematic review of existing economic evaluations relating to previously treated NSCLC drugs to synthesise existing evidence, specifically focusing on model-based economic analyses. This first stage is required because of the anticipated complexity of the cost-effectiveness modelling of NSCLC drugs. In this systematic review, we have summarised the modelling techniques, clinical inputs, resource use and costs, and outcome measures used in the analyses, and suggested key issues to consider in developing further cost-effectiveness models. Previous systematic reviews comparing the clinical effectiveness of interventions for NSCLC have been published [3, 6], but our literature search did not identify any systematic reviews with a focus on cost-effectiveness evidence. This paper addresses this gap in the literature.

2 Methods

The protocol for this systematic review was registered on the PROSPERO international prospective register of systematic reviews [7].

2.1 Search Strategy

A literature search of published economic evaluations was performed, following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [8]. Electronic databases [MEDLINE, EMBASE (Ovid), Cochrane Library (Wiley), Science Citation Index (Web of Knowledge), Research Papers in Economics (RePEc) and the Cost-Effectiveness Analysis (CEA) Registry] and the National Institute for Health and Care Excellence (NICE) website were searched for relevant literature. We also performed citation searches and searched the reference lists of relevant included studies and of any previously published systematic reviews. The search was limited to studies published in the English language from 1 January 2001 until 26 July 2019. This start point was chosen because it corresponds to the year that docetaxel was appraised by NICE for NSCLC, docetaxel being the first agent that was labelled for this indication and established as the standard of care in second-line therapy [9]. The search strategy combined NSCLC terminology with economic terms. A copy of the search terms is available in the supplementary information.

2.2 Study Selection/Inclusion and Exclusion Criteria

All citations retrieved were screened independently by two reviewers (DG and PA) at title/abstract stage, and full texts of potentially relevant records were further examined. Any disagreements between the reviewers were resolved by consensus or recourse to a third reviewer (XA). We examined original papers, technology appraisal guidance, letters, editorials and meeting abstracts. Studies were considered to be relevant if they examined at least one treatment with a label indication for advanced/metastatic NSCLC as of January 2018 (docetaxel, pemetrexed, ramucirumab with docetaxel, erlotinib, nintedanib with docetaxel, afatinib, nivolumab, pembrolizumab, atezolizumab), or best supportive care alone or in combination with a drug of interest. We excluded the four targeted therapies (crizotinib, ceritinib, gefitinib and osimertinib). To be included, studies should have used an economic analysis to compare treatments licensed for adults with advanced/metastatic NSCLC, in a population meeting the following characteristics:

  • Non-squamous (adenocarcinoma, large cell), or squamous histology

  • ALK expression either predominantly negative or 100% negative

  • EGFR expression either predominantly negative or 100% negative

  • Patients who experienced failure of prior first-line chemotherapy (i.e. those receiving second-line treatment and beyond)

We excluded studies that included people with ALK + and/or EGFR + expression, as according to current practices, these patients are routinely offered targeted therapies.

2.3 Data Extraction and Synthesis

Two reviewers (DG and PA) each extracted information from half of the studies and then cross-checked each other’s extractions. Any disagreements were resolved by discussion or by recourse to a third reviewer (XA). Information was extracted on study details (title, author and year of study), baseline characteristics (population, intervention, comparator and outcomes), methods (study perspective, time horizon, discount rate, measure of effectiveness, units of currency, conversions, assumptions and analytical methods), results (study parameters, base-case and sensitivity analysis results), discussion (study findings, limitations of the models and generalisability), other (source of funding and conflicts of interests), overall comments and conclusions (author’s and reviewer’s). A template of the extraction form is provided in the supplementary information. Information extracted from the included studies was summarised and is presented in Table 1. Because the economic analyses differed in their aims/objectives, study designs, populations and methods, the findings from individual studies were compared narratively, and recommendations for future economic analyses are discussed.

Table 1 Summary characteristics and results of included studies

2.4 Critical Appraisal and Quality Assessment Tools

The reporting quality of the studies was assessed against the Consolidated Health Economic Reporting Standards (CHEERS) checklist [10] and the Philips checklist [11]. PA and DG each critically appraised half of the final list of included studies, with XA independently verifying the accuracy of the information. Any differences were resolved by discussion or by a fourth assessor (HM).

3 Results

3.1 Search Results

Details of the literature search and review process can be found in the PRISMA flow chart [12] in Fig. 1. Of the 837 records identified, 612 were screened at title and abstract level and 54 were assessed at full-text level, with 30 records included in the review, representing 30 separate studies.

Fig. 1

PRISMA flow diagram detailing the results of the search and screening process

3.2 Summary of Modelling Techniques, Clinical Inputs, Resource Use and Costs, and Outcome Measures

3.2.1 Structure

Eight studies did not use a formal economic model for their analysis, and used benefits observed from a clinical trial or registry data [13, 14, 21,22,23, 26, 37, 41]. The structures of the economic models in all other studies were clearly stated and were consistent, all reflecting the progressive nature of NSCLC. Most model-based studies used the same three health states (progression-free, post-progression and death) to distinguish between patient quality of life and associated costs, but two used alternative health states [15, 35]. The most commonly used models were partitioned-survival models [16, 24, 25, 28,29,30,31,32,33,34,35,36, 39, 40, 42] and Markov models [15, 18,19,20, 27, 28, 40]. Carlson et al. appeared to use a decision tree [17], whilst the type of model used was unclear for two studies [36, 38]. All studies but one [27] with model-based cost-effectiveness analyses clearly stated the time horizon, which ranged from 2 years [16,17,18,19] to 25 years [35]. The study that did not provide a specific time horizon did state that it used a ‘lifetime’ time horizon [27]. The choice of economic model was rarely well justified, and it was often unclear why a study opted for its chosen approach. The modelling approach may influence the outcome, so it is important to consider which approach is best suited to answer the question with the data available [43]. Goeree et al. and Gao et al. both compared Markov and partitioned survival approaches. The results from Goeree et al. were similar for both models, although for one treatment comparison the incremental cost-effectiveness ratio (ICER) did vary by CAD$1200/quality-adjusted life-year (QALY) based only on the modelling approach [28]. The two approaches in Gao et al. produced ICERs that differed by over A$20,000/QALY [40].

3.2.2 Data

The source of clinical and cost inputs was reported and was adequate for all studies, except one [37] where the sources were not referenced. All studies that included medical resource use in their analysis clearly stated the source of the resource use information. The choices of outcome measure were clearly stated and always consistent with the model structure [overall survival (OS) and progression-free survival (PFS)]. All but two studies [37, 41] stated the perspective from which their economic analysis was conducted, with all other studies including costs from the perspective of the relevant healthcare system or funder. Ten studies also considered costs related to personal social services [15, 18, 19, 24, 25, 29, 32, 34, 35, 38].

3.2.3 Uncertainty and Assumptions

Almost all studies explored potential sources of uncertainty in their analysis, with three studies not performing any sensitivity analyses [36, 37, 41]. Uncertainty was most commonly explored through one-way sensitivity analyses (OWSA) (n = 23), probabilistic sensitivity analyses (PSA) (n = 22), and scenario analyses (n = 21). Sixteen studies included all three approaches [18,19,20, 22,23,24,25, 27, 29,30,31,32,33,34,35, 40]; however, there was inconsistency over the parameters included in the sensitivity analyses.

The most influential factors observed in multiple studies were the survival related parameters, including hazard ratios, parametric curves and cure proportions, and utility values; however, not all studies were exhaustive in their inclusion of variables within their sensitivity analyses. No study comprehensively addressed all potential sources of uncertainty.
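To make the one-way sensitivity analysis described above concrete, the following is a minimal sketch rather than the method of any included study. The function `toy_icer`, the parameter names and all numerical values are hypothetical stand-ins for a full economic model; only the procedure (vary one input across a range while holding the others at their base-case values, then rank inputs by the spread in the ICER) reflects the OWSA approach.

```python
def toy_icer(params):
    """Hypothetical stand-in for a full model: incremental cost / incremental QALYs."""
    incremental_qalys = params["utility"] * params["life_years_gained"]
    return params["incremental_cost"] / incremental_qalys


def one_way_sensitivity(model, base, ranges):
    """Vary one parameter at a time across (low, high), holding the rest at base."""
    results = {}
    for name, (low, high) in ranges.items():
        outputs = [model({**base, name: value}) for value in (low, high)]
        results[name] = (min(outputs), max(outputs))
    return results


# Illustrative base case and ranges (assumed values, not from any study).
base = {"incremental_cost": 30_000.0, "utility": 0.7, "life_years_gained": 0.5}
ranges = {
    "utility": (0.6, 0.8),
    "life_years_gained": (0.25, 1.0),
}

owsa = one_way_sensitivity(toy_icer, base, ranges)

# Rank parameters by the width of the ICER range they induce (tornado ordering).
ranked = sorted(owsa, key=lambda k: owsa[k][1] - owsa[k][0], reverse=True)
```

With these assumed ranges the survival-related input produces the widest ICER range, but that ordering is a property of the illustrative numbers and should not be read as a finding.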

The majority of assumptions were related to patient survival, assuming either equal survival or proportional hazard rates between different interventions. Additional assumptions made within studies were related to the utility values, treatment duration or the impact of later lines of treatment. Many studies did not report any assumptions made within their economic analysis. Only the technology appraisals presented analyses where the impacts of the main assumptions were assessed through the application of alternative assumptions.

3.2.4 Economic Results

Table 1 summarises the characteristics of included studies. Docetaxel was the most commonly considered intervention, with only three studies not including it as an intervention of interest or comparator [23, 31, 39]. The majority of studies reported results in terms of cost per QALY as well as cost per life-year gained (LYG), with six reporting results in terms of LYG alone [13, 14, 23, 26, 37, 41] and two only in terms of QALYs [38, 39]. One study did not present any form of cost-effectiveness ratio, due to the interventions being indistinguishable in their benefits [21].
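For readers less familiar with the ratios reported above, the sketch below shows how a cost per QALY or cost per LYG is formed as an incremental cost-effectiveness ratio. The function name `icer` and every input value are hypothetical and are not taken from any included study.

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_cost = cost_new - cost_old
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        # With identical benefits the ratio is undefined.
        raise ValueError("ICER is undefined when the effects are equal")
    return delta_cost / delta_effect


# Hypothetical inputs: costs in a single currency, effects in QALYs or life-years.
cost_per_qaly = icer(cost_new=60_000, cost_old=20_000, effect_new=1.5, effect_old=1.0)
cost_per_lyg = icer(cost_new=60_000, cost_old=20_000, effect_new=2.0, effect_old=1.2)
```

The guard on a zero denominator mirrors the situation in which interventions are indistinguishable in their benefits, where no ratio can be presented and a comparison of costs alone is the natural fallback.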

The majority of studies were from Western Europe (UK = 11 [14, 15, 18, 19, 24, 25, 29, 32,33,34,35], France = 2 [22, 31], Spain = 3 [20, 26, 41], Portugal = 1 [16], Switzerland = 1 [27]), with the remaining studies based in the Americas (Canada = 7 [13, 17, 21, 23, 28, 38, 42], USA = 2 [30, 37], South America = 1 [36]), China (n = 1) [39] and Australia (n = 1) [40].

Most studies (n = 19) were sponsored by pharmaceutical companies with 11 studies not declaring any pharmaceutical support [18, 21,22,23,24, 26, 27, 36, 37, 40, 41].

3.2.5 Patient Characteristics

All analyses focussed on the same general population (patients on second-line or later treatment for previously treated metastatic or advanced NSCLC); however, there was slight variation in the staging of patients, with two studies including stage IIIA patients in addition to stage IIIB and IV patients [15, 16], and five studies not reporting on the disease stage of patients [14, 27, 36, 37, 41]. Aside from Carlson et al. [17] (stage III and IV) and technology appraisal 403 (stage IV only) [29], all other studies considered stage IIIB and IV patients. Restrictions on Eastern Cooperative Oncology Group (ECOG) performance status were also reported by 13 studies, with restrictions varying across 0 to 1 [29, 33, 34, 40, 42], 0 to 2 [13,14,15, 22, 39, 41], and 0 to 3 [19, 24]. Two analyses focussed on non-squamous disease [20, 34] and six on squamous disease [28, 31, 33, 38,39,40], with the remaining studies not distinguishing between NSCLC subtypes. Nine studies presented results by subgroup, with the following subgroups considered: ECOG score [15, 16], line of treatment [16], squamous/non-squamous disease [29, 36, 37], EGFR negative/unknown [24, 34], programmed death-ligand 1 (PD-L1) expression [27, 34, 36, 37] and adenocarcinoma [29, 32].

3.2.6 Survival

A range of approaches was used to model the clinical effectiveness of the interventions. Retrospective studies used the mean or median survival observed from their relevant source of data. The Markov models, meanwhile, assumed a constant hazard to calculate the transition rates between their health states. An increasingly popular approach was to use a partitioned survival model where the PFS and OS curves are modelled parametrically, either jointly across multiple trial arms, or independently. This provides the number of patients in the progression-free and death health states. The number of patients in the post-progression health state is then calculated as the difference between the OS and PFS curves. For indirect comparisons, hazard ratios were estimated and applied to parametric models.
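The partitioned survival calculation described above can be sketched as follows. This is an illustration under stated assumptions: the Weibull form, the curve parameters and the function names are hypothetical, and the guard against crossing curves is a common practical safeguard rather than something reported by the included studies.

```python
import math


def weibull_survival(t, scale, shape):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def state_occupancy(t, pfs, os_):
    """Split a cohort across three health states at time t from PFS and OS curves."""
    alive = os_(t)
    # Guard against fitted curves crossing: PFS cannot exceed OS.
    progression_free = min(pfs(t), alive)
    return {
        "progression_free": progression_free,
        "post_progression": alive - progression_free,  # OS minus PFS
        "dead": 1.0 - alive,
    }


# Hypothetical curve parameters (time in months), not taken from any study.
def pfs_curve(t):
    return weibull_survival(t, scale=6.0, shape=1.2)


def os_curve(t):
    return weibull_survival(t, scale=14.0, shape=1.1)


occupancy_at_12_months = state_occupancy(12.0, pfs_curve, os_curve)
```

Because the state proportions are read directly from the two curves, no transition probabilities between states need to be estimated, which is one reason the approach sits naturally alongside trial-reported PFS and OS.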

3.3 Quality Assessment

We assessed the reporting quality of 30 studies using both the CHEERS and Philips checklists, summaries of which can be found in Tables 2 and 3, respectively. The reporting quality was generally high, with the majority of items on both checklists fulfilled by over 85% of studies. All studies reported their characteristics such as setting, perspective, the comparators and measures of effectiveness. There were several key areas for improvement: inclusion of additional relevant comparators, presentation of justification when multiple sources of information were available, consideration of subgroups and other sources of heterogeneity and discussions of the generalisability of the findings. It was apparent that half-cycle corrections were not used in the majority of models, but given the short cycle length used, this was not thought to detract from the quality.

Table 2 Summary of results from CHEERS checklist
Table 3 Summary of results of Philips checklist

4 Discussion

This review demonstrated that there are a number of different approaches to performing an economic analysis within the scope of assessing therapeutic options for advanced/metastatic NSCLC. Whilst in the older studies economic evaluations of clinical trials were very popular, as computing power and awareness of modelling techniques increased, Markov models and partitioned survival models have become more common. This likely reflects better awareness of modelling approaches combined with superior treatments and healthcare which prolong patient survival. Whilst 2 years of trial follow-up may have been sufficient to observe all survival events 20 years ago, with time horizons of over 20 years in the more recent articles, it is clear that some prediction and accompanying assumptions are necessary.

Whilst all of the studies provided a comparison to a suitable and relevant intervention, often the comparator was not recently licensed. Whilst this may be explained by the rapidly evolving nature of healthcare and interventions, there may also be a bias when selecting comparators to ensure new interventions look as good as possible [44]. It is important to compare to the current best treatments, to ensure patients receive the best care and that a healthcare system receives optimal value for money.

The complexity of the evaluations varied greatly, with some making assumptions such as equal efficacy between treatments with limited or no direct comparative evidence, whilst others created de novo economic models.

Alongside the shift towards partitioned survival modelling is the consideration of quality of life, through quality-adjusted life years (QALYs), rather than length of life alone, measured in life years (LYs). This reflects a change in the attitude of decision makers: the quality of life of patients should be considered together with the length of life, and treatments that offer life-extending benefits but with heavy side effects may not be in the interest of patients.

There was a general trend of increasing time horizon as studies became more recent, suggesting that improved healthcare is improving the life expectancy of NSCLC patients.

It was rare for studies that were not directly related to a technology appraisal to consider and explore sources of uncertainty within their economic evaluation. Those that did explore uncertainty performed either probabilistic sensitivity analyses, allowing for uncertainty around multiple factors feeding into the economic model, or explored scenario/one-way sensitivity analyses, using confidence intervals or other parameter values to capture uncertainty in individual parameters.

A challenge of this systematic review was how to extract information from evaluations taken directly from a NICE technology appraisal, as there can often be multiple opinions from the company, the Evidence Review Group (ERG) and even the committee itself. Opinions may also change during an appraisal as more information becomes available. It was sometimes challenging for our review team to select the most useful information for inclusion in this review, and so we focused our extraction on the first available set of committee papers.

It is plausible that publication-based evaluations may also be hampered by mistakes in modelling or bias(es) that are not identified, without the level of critique that comes with a NICE technology appraisal. A further limitation is that we have not specifically captured the quality of the methodology within each paper, having focused on the reporting quality, nor completed a formal assessment of the transferability of each study.

The geographical range of studies showed that the cost effectiveness of treatments is an important factor in the decision-making process in many countries around the world. The transferability of all the results is difficult to ascertain because what may be cost effective in one setting is not necessarily cost effective in another setting. Different countries have different healthcare priorities and budgets with which to accomplish them. Indeed, the relative cost effectiveness of two interventions may vary between countries due to differences in administration, cost and availability of later line treatments, and discounts offered by the manufacturer on the interventions. Whilst aspects of the different studies may be generalisable to other settings, the different currencies, decision makers and funders make it difficult to transfer conclusions of cost effectiveness.

It is difficult to draw conclusions over which treatment is the most cost effective, not least because manufacturers often offer a discount on the list price for their interventions. These discounts are confidential, and so economic analyses published in journals are based on list prices, with only analyses from decision-making processes (such as NICE technology appraisal documentation) including the actual prices paid. Whilst this suggests that technology appraisals may be the more informative source of information, parts of the cost-effectiveness results are often redacted. The ICER is usually available, but detailed breakdowns of costs and benefits are withheld to prevent back-calculation of the discount.

Whilst all licensed interventions have had their cost effectiveness assessed against at least one comparator, there has been no published work comparing them simultaneously. This review has highlighted an unmet area of research. In order to ensure that health services receive best value-for-money, it is important to perform such an evaluation.

Both partitioned survival models and Markov models have their limitations. A Markov model can cope with any number of health states, and allow for transitions from any one state to any other. However, these transition probabilities will often be modelled in a simple manner and assumed to be constant over time. It becomes harder to obtain reliable estimates for the transitions when modelling more health states.
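A three-state Markov cohort model with time-constant transitions, of the kind discussed above, can be sketched as below. The hazard rates, the monthly cycle length and the function names are assumptions chosen for illustration. Converting each rate to a probability independently is a simplification when several events compete within a cycle.

```python
import math


def rate_to_probability(rate, cycle_length):
    """Convert a constant hazard rate into a per-cycle transition probability."""
    return 1.0 - math.exp(-rate * cycle_length)


def markov_trace(p_progress, p_die_pf, p_die_pp, n_cycles):
    """Three-state cohort trace with time-constant transition probabilities."""
    if p_progress + p_die_pf > 1.0:
        raise ValueError("probabilities of leaving progression-free exceed 1")
    pf, pp, dead = 1.0, 0.0, 0.0  # whole cohort starts progression-free
    trace = [(pf, pp, dead)]
    for _ in range(n_cycles):
        new_pf = pf * (1.0 - p_progress - p_die_pf)
        new_pp = pf * p_progress + pp * (1.0 - p_die_pp)
        new_dead = dead + pf * p_die_pf + pp * p_die_pp
        pf, pp, dead = new_pf, new_pp, new_dead
        trace.append((pf, pp, dead))
    return trace


# Hypothetical monthly hazards; a median of m months would imply ln(2) / m.
trace = markov_trace(
    p_progress=rate_to_probability(0.12, 1.0),
    p_die_pf=rate_to_probability(0.02, 1.0),
    p_die_pp=rate_to_probability(0.10, 1.0),
    n_cycles=24,
)
```

Each additional health state adds further transition probabilities to estimate, which illustrates why reliable inputs become harder to obtain as the structure grows.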

Meanwhile, a partitioned survival model can more easily capture hazards that vary over time, utilising a wide range of parametric survival curves, but it requires the health states to be ordered, with transitions between them allowed in one direction only. Whilst this may be adequate at present for progressive diseases such as NSCLC, it is unclear whether such models will always be suitable for capturing the benefits of future treatments. As demonstrated by Goeree et al. [28], the approaches can lead to almost identical results. It is likely that for the majority of interventions considered in this review, the decision to analyse using either a partitioned survival model or a Markov model has relatively little consequence for the outcome. However, for more recent interventions such as immunotherapies, which claim to be very effective in certain patients, both approaches can fall short of accurately capturing the patient pathway without adjustment. For example, two of the most recent technology appraisals reported altering the basic partitioned survival framework to assume that certain patients were cured or at a reduced risk of a cancer-related death [34, 35]. Further developments in the treatment of advanced NSCLC may require further adjustments to the traditional modelling approaches, but the suitability of any such adjustment cannot be established without supporting data.
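One way such a cure assumption can be expressed is a mixture cure model, sketched below. This is an assumption for illustration and is not claimed to be the adjustment used in the cited appraisals; the cure fraction, Weibull parameters and function names are hypothetical, and background mortality among cured patients is ignored for brevity.

```python
import math


def weibull_survival(t, scale, shape):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def mixture_cure_survival(t, cure_fraction, scale, shape):
    """S(t) = pi + (1 - pi) * S_u(t): cured fraction pi plus uncured Weibull patients."""
    uncured = weibull_survival(t, scale, shape)
    return cure_fraction + (1.0 - cure_fraction) * uncured


# Hypothetical comparison at 10 years (time in years).
standard_at_10 = weibull_survival(10.0, scale=1.5, shape=1.0)
cure_at_10 = mixture_cure_survival(10.0, cure_fraction=0.10, scale=1.5, shape=1.0)
```

In this sketch the long-run survival plateaus at the assumed cure fraction instead of falling to zero, so the cure fraction itself becomes a highly influential and hard-to-verify input when follow-up is short.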

Whilst all aspects of a cost-effectiveness analysis should be scrutinised, survival extrapolations should be given extra attention since they were highly influential on cost-effectiveness results in a number of studies. If an intervention was wrongly demonstrated to be cost effective, and became a benchmark for future treatments to be assessed against, this could result in more heavily stretched healthcare budgets. With model time horizons increasing alongside pressure from public and patient demands to get rapid access to treatments, survival extrapolations will only become more influential. In the NICE technology appraisals, it was common for the ERG to disagree with the company’s survival-related assumptions. This raises questions over the reliability of the extrapolations in other published studies, as it is unlikely that the peer-review process applied the same rigour as a NICE technology appraisal. A recent review of NICE technology appraisals showed that in only 7% of appraisals did the ERG agree with all the major survival-related assumptions [45]. This demonstrates the need for well-established guidelines to reduce the extent to which survival extrapolations are based on subjective assumptions. The methods used to select the extrapolation approach should be clearly described, with all supporting material provided in appendices.
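The sensitivity of results to the choice of extrapolation can be illustrated with a small sketch. Two parametric curves are constructed to agree exactly at the end of a hypothetical 2-year follow-up and are then extrapolated to a 20-year horizon. The follow-up length, the survival proportion, the Weibull shape and all function names are assumptions chosen for illustration only.

```python
import math


def exponential_survival(t, rate):
    """Constant-hazard survival function S(t) = exp(-rate * t)."""
    return math.exp(-rate * t)


def weibull_survival(t, scale, shape):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def area_under_curve(survival, start, end, steps=2000):
    """Trapezoidal approximation of mean time alive between start and end."""
    width = (end - start) / steps
    total = 0.5 * (survival(start) + survival(end))
    total += sum(survival(start + i * width) for i in range(1, steps))
    return total * width


# Hypothetical trial: 20% of patients alive at the end of 2 years of follow-up.
followup, alive_at_followup, horizon = 2.0, 0.20, 20.0
cum_hazard = -math.log(alive_at_followup)

rate = cum_hazard / followup  # exponential calibrated to pass through S(2) = 0.2
shape = 0.7  # assumed decreasing hazard
scale = followup / cum_hazard ** (1.0 / shape)  # Weibull calibrated to the same point

# Mean life-years accrued in the extrapolated period only (years 2 to 20).
exp_tail = area_under_curve(lambda t: exponential_survival(t, rate), followup, horizon)
weibull_tail = area_under_curve(
    lambda t: weibull_survival(t, scale, shape), followup, horizon
)
```

Both curves match the assumed observation at 2 years, yet the life-years accrued beyond follow-up differ, so the incremental benefit (and hence the ICER) would depend on a choice that the observed data alone cannot settle.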

We recommend that an economic model should accurately capture all of the major phases of a patient’s pathway. The framework, inputs and assumptions should be clearly stated and referenced. Inputs should be relevant to the population and setting where possible. The potential effects of key areas of uncertainty should be explored through OWSAs, PSAs and scenario analyses. Supporting evidence related to decisions around influential assumptions, such as choice of survival extrapolation, should be presented as supplementary material to maximise transparency and reproducibility.

This approach could be used to undertake a cost-effectiveness assessment comparing all currently licensed drugs used for EGFR and ALK negative advanced/metastatic NSCLC, and could be extended to other disease areas.

5 Conclusion

This review summarises the range of methods used in assessing the cost effectiveness of licensed interventions for advanced/metastatic NSCLC. The structure of the models was generally consistent. The modelling of overall survival is routinely one of the most influential factors on the cost-effectiveness conclusions and often contains considerable uncertainty due to the short follow-up of the most recent studies used in the economic evaluations. Transparency over survival extrapolation approaches is critical to reduce bias in cost-effectiveness analyses.