Key Points for Decision Makers

Vedolizumab appears to be more effective than placebo in both the induction and maintenance phases in patients with moderate-to-severe, active Crohn’s disease who have had an inadequate response to, loss of response to or intolerance of conventional therapy or tumour necrosis factor alpha antagonist (anti-TNF-α) therapy. However, it is noted that the primary endpoint (clinical remission at week 6) in the GEMINI III trial was not met, though clinical remission was observed at week 10.

The effectiveness of vedolizumab compared with adalimumab and infliximab remains uncertain, given the absence of a head-to-head, randomised, controlled trial and the differences between the studies included in a network meta-analysis.

The Evidence Review Group identified a number of limitations in the company’s model that were believed to limit the robustness of the results presented by the company.

The National Institute for Health and Care Excellence Appraisal Committee recommended vedolizumab as an option for treating moderately to severely active Crohn’s disease only in patients for whom anti-TNF-α therapy has failed (that is, the disease has responded inadequately or has lost response to treatment), cannot be tolerated or is contra-indicated, on the condition that the company provides vedolizumab with the discount agreed to in the patient access scheme.

1 Introduction

In order to be recommended for use within the National Health Service (NHS) in England, health technologies must be shown to be clinically effective and to represent a cost-effective use of NHS resources. The National Institute for Health and Care Excellence (NICE) is an independent organisation, which provides national guidance and advice to improve health and social care. The NICE single technology appraisal (STA) process covers a single technology in a single indication and is usually conducted soon after a UK marketing authorisation is granted [1]. The manufacturer of the technology provides a written submission to NICE, which details the company’s estimates of the clinical effectiveness and cost effectiveness of the technology, together with an executable health economic model, which provides estimates of the cost per quality-adjusted life-year (QALY). An independent external organisation (the Evidence Review Group [ERG]) reviews the submission in consultation with clinical specialists and produces an ERG report. The NICE Appraisal Committee then meets to make a decision based on the company’s submission (CS), the ERG report and testimony from experts and other stakeholders. Where the Committee decides to recommend a technology without restrictions, a Final Appraisal Determination (FAD) is issued. Where the initial decision is to restrict or not recommend a technology, an Appraisal Consultation Document (ACD) is produced. Stakeholders are then invited to comment on the ACD and on the submitted evidence, after which a subsequent ACD may be produced or a FAD may be issued, which is open to appeal.

This paper presents a summary of the ERG report [2, 3] for the STA of vedolizumab for the treatment of adults with moderate-to-severe, active Crohn’s disease (CD) and the subsequent development of the NICE guidance for the use of this drug in England. Full details of all relevant appraisal documents can be found on the NICE website [4].

2 The Decision Problem

The prevalence of CD is approximately 50–100 per 100,000 patients, with CD estimated to affect approximately 60,000 patients in the UK in total [5]. CD is characterised by a chronic, relapsing inflammation, which mainly affects the gastrointestinal tract and is often accompanied by abdominal pain, diarrhoea, weight loss, malaise and anorexia [6, 7]. CD may also be complicated by stricturing (leading to intestinal obstruction), fistulae (often perianal) or abscesses [7].

The diagnosis of CD combines the patient history, physical symptoms and evidence from imaging and laboratory studies [6]. Disease activity, in combination with phenotypic and endoscopic features, allows stratification of patients and selection of appropriate therapeutic strategies [6]. In clinical trials, the Crohn’s Disease Activity Index (CDAI) has been widely used to describe disease activity [8], though the index is based on symptoms and does not capture intestinal mucosal activity or healing. A simplified form, the Harvey Bradshaw Index, may be used in trials and clinical practice.

The aim of drug treatment in CD is to induce and maintain remission and mucosal healing, with the optimal outcome of maintaining corticosteroid-free remission and reducing complications and the need for hospitalisations and surgery [5].

Existing guidelines [5, 7] suggest a standard ‘step-up approach’ for the treatment of CD in the UK. This involves the initial use of monotherapy with a conventional or locally released glucocorticosteroid to induce remission, escalating to the addition of azathioprine, mercaptopurine or methotrexate in patients who do not respond.

Infliximab and adalimumab are currently recommended as treatment options for adults with severe, active CD whose disease has not responded to conventional therapy, and for those who are intolerant of or have contra-indications to conventional therapy [9].

Vedolizumab (brand name Entyvio®, Takeda UK) is a humanised monoclonal antibody, which binds exclusively to α4β7 integrin on gut-homing T-helper lymphocytes and selectively inhibits adhesion of these cells to mucosal vascular addressin cell adhesion molecule 1 (MAdCAM-1) and fibronectin, but not vascular cell adhesion molecule 1 (VCAM-1) [5]. Vedolizumab has a therapeutic indication for the treatment of adults with moderately to severely active CD who have had an inadequate response to, loss of response to or intolerance of conventional therapy, including tumour necrosis factor alpha antagonist (anti-TNF-α) therapy [10]. The recommended dosage of vedolizumab is 300 mg, given by intravenous infusion at 0, 2 and 6 weeks and then every 8 weeks thereafter. In patients who have not shown a response by week 10, an additional dose should be considered at that point, resulting in a 0, 2, 6, 10 and 14-week schedule. The licensing states that treatment should be stopped if no evidence of therapeutic benefit is observed by week 14 [10]. Finally, the licensing states that the dose could be increased to every 4 weeks in patients who have experienced a decrease in their response.

NICE issued a final scope [11] to appraise the clinical effectiveness and cost effectiveness of vedolizumab, within its licensed indication, for the treatment of moderate-to-severe, active CD in adults in whom the disease has responded inadequately to, or is no longer responding to, either conventional therapy or an anti-TNF-α agent, or who are intolerant of either therapy.

3 The Independent ERG Review

The company provided a submission to NICE on the clinical and cost effectiveness of vedolizumab for the treatment of patients with moderate-to-severe, active CD [5]. The CS [5] included a systematic review and network meta-analysis (NMA) of the clinical effectiveness literature and a model-based health economic analysis.

In line with the STA process, the ERG critically reviewed the evidence presented in the CS by assessing (1) whether the submission conformed to NICE methodological guidelines; (2) whether the company’s interpretation and analysis of the evidence were appropriate; and (3) the presence of other evidence or alternative interpretations of the evidence. In addition, the ERG identified areas requiring clarification, for which the company provided additional evidence.

3.1 Clinical Effectiveness Evidence Submitted by the Company

The company [5] presented a systematic review of the clinical effectiveness and safety of vedolizumab for the treatment of moderately to severely active CD in adults in whom the disease has responded inadequately to, or is no longer responding to, either conventional therapy or an anti-TNF-α agent, and in those who are intolerant of either therapy. The systematic review aimed to assess the best available evidence to evaluate the efficacy and safety of all biologic treatments (vedolizumab, adalimumab and infliximab) in patients with moderate-to-severe CD, to inform an NMA.

Two trials, GEMINI II [12] and GEMINI III [13], formed the main supporting evidence for the intervention. Both studies were phase III (GEMINI II was performed in 39 countries; GEMINI III was performed in 19 countries), randomised, double-blind, placebo-controlled, multicentre trials designed to evaluate the efficacy and safety of vedolizumab, and included patients who were naïve to anti-TNF-α therapy and patients who had an inadequate response to, loss of response to or intolerance of immunomodulators or an anti-TNF-α agent.

GEMINI II [12] was designed to evaluate the efficacy and safety of vedolizumab as an induction treatment (dosing at weeks 0 and 2, with assessment at week 6) and maintenance treatment (during weeks 6–52, with dosing every 4 or 8 weeks). In contrast, GEMINI III [13] was designed to evaluate the efficacy and safety of vedolizumab as an induction treatment only, with a dosing regimen of weeks 0, 2 and 6, and assessment at weeks 6 and 10. In general, the efficacy analyses in GEMINI II and III [12, 13] were conducted according to the intention-to-treat (ITT) principle, whereby patients who withdrew prematurely were considered treatment failures.

For the 6-week induction phase of GEMINI II [12], 368 individuals were randomised (in a 3:2 ratio) to receive intravenous vedolizumab 300 mg or placebo (as saline) at weeks 0 and 2 (cohort 1). In order to fulfil sample size requirements for the maintenance study, an additional 748 individuals were enrolled in an open-label group (cohort 2), which also received intravenous vedolizumab 300 mg. For the maintenance phase, patients from both cohorts (cohorts 1 and 2) who had a clinical response (defined as a ≥70-point decrease in the CDAI score) to vedolizumab at week 6 were randomised (in a 1:1:1 ratio) to receive double-blind treatment with intravenous vedolizumab 300 mg every 8 weeks (with placebo administered at every other visit to preserve blinding), intravenous vedolizumab 300 mg every 4 weeks or placebo every 4 weeks for up to 52 weeks. Randomisation was stratified by three factors: (1) cohort; (2) concomitant use/non-use of glucocorticoids; and (3) concomitant use/non-use of immunosuppressive agents or prior use/non-use of anti-TNF-α agents, or both. The two primary endpoints in the induction trial phase were an enhanced clinical response at week 6 (defined as a ≥100-point decrease in the CDAI score) and clinical remission at week 6 (defined as a CDAI score ≤150 points). The primary endpoint for the maintenance trial phase was clinical remission at week 52. Secondary outcome measures included an enhanced clinical response at 52 weeks, glucocorticoid-free remission at week 52 in patients receiving glucocorticoids at baseline, and durable clinical remission (defined as clinical remission at ≥80 % of study visits, including the final visit). The proportion of patients meeting these endpoints was analysed.
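The CDAI-based endpoint definitions used in the GEMINI trials can be expressed as a small classification routine. The following is a minimal sketch, not the trial analysis code: it takes clinical response as a decrease of at least 70 points (as in the footnote to Fig. 1), enhanced clinical response as a decrease of at least 100 points and clinical remission as a score of 150 points or lower, and it treats a missing follow-up score as a treatment failure, in line with the ITT handling of withdrawals described above. The names `classify` and `EndpointStatus` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the endpoint definitions in the text.
RESPONSE_DROP = 70    # clinical response: CDAI decrease of at least 70 points
ENHANCED_DROP = 100   # enhanced clinical response: decrease of at least 100 points
REMISSION_MAX = 150   # clinical remission: CDAI score of 150 points or lower


@dataclass
class EndpointStatus:
    clinical_response: bool
    enhanced_response: bool
    clinical_remission: bool


def classify(baseline_cdai: float, followup_cdai: Optional[float]) -> EndpointStatus:
    """Classify one patient against the three CDAI-based endpoints.

    A missing follow-up score (e.g. premature withdrawal) is counted as a
    failure on every endpoint.
    """
    if followup_cdai is None:
        return EndpointStatus(False, False, False)
    drop = baseline_cdai - followup_cdai
    return EndpointStatus(
        clinical_response=drop >= RESPONSE_DROP,
        enhanced_response=drop >= ENHANCED_DROP,
        clinical_remission=followup_cdai <= REMISSION_MAX,
    )
```

Note that the endpoints are not nested: a patient with a high baseline score can show an enhanced clinical response without reaching remission, which is one reason why the two induction endpoints in GEMINI II can give different results.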

During the 10-week induction phase of GEMINI III [13], 416 individuals were enrolled, of whom 315 had a previous inadequate response to, loss of response to or intolerance of one or more anti-TNF-α agents, and 101 individuals were naïve to anti-TNF-α therapy. Patients were randomly assigned to receive intravenous vedolizumab (300 mg) or placebo (as saline) at weeks 0, 2 and 6, with three stratification factors: (1) the presence or absence of previous anti-TNF-α failure; (2) concomitant use/non-use of glucocorticoids; and (3) concomitant use/non-use of immunosuppressive agents. The primary endpoint in GEMINI III [13] focussed on patients for whom an anti-TNF-α agent had failed (i.e. patients with an inadequate response to, loss of primary response to, loss of secondary response to or intolerance of ≥1 anti-TNF-α agent), and was the proportion of patients in clinical remission (those with a CDAI score ≤150 points) at week 6. A secondary analysis evaluated an overall population, which included patients who were naïve to anti-TNF-α therapy, and pre-specified exploratory analyses examined the group naïve to anti-TNF-α therapy.

Key efficacy data for both trials are presented in Table 1. Only the results for the primary outcomes are summarised here. Further efficacy data can be found in the CS [5] and the ERG report [2, 3].

Table 1 Summary of key efficacy outcomes

In GEMINI II [12], patients treated with vedolizumab had significantly higher rates of clinical remission (CDAI score ≤150 points) at week 6 in comparison with placebo (14.5 % versus 6.8 %, treatment difference 7.8 % [95 % confidence interval (CI) 1.2 to 14.3], p = 0.0206) in the ITT population. In the anti-TNF-α-failure population in GEMINI III [13], there was no significant difference in the proportion of patients achieving clinical remission at week 6 between vedolizumab and placebo (15.2 % versus 12.1 %, treatment difference 3.0 % [95 % CI −4.5 to 10.5], p = 0.433); thus, vedolizumab was not significantly better than placebo with respect to the primary outcome. Therefore, the statistical evaluations of all remaining endpoints in GEMINI III were considered exploratory. For the full recruited population in GEMINI III, the exploratory analysis reported a significant difference in favour of vedolizumab (19.1 % versus 12.1 %, treatment difference 6.9 % [95 % CI 0.1 to 13.8], p = 0.0478) for clinical remission at week 6. Only GEMINI III reported results at 10 weeks, and significant differences in clinical remission in the exploratory analyses were reported in both the anti-TNF-α-failure population (14.4 % [95 % CI 5.7 to 23.1], p = 0.0012) and the whole recruited population (15.5 % [95 % CI 7.8 to 23.3], p < 0.0001).

There was no significant difference between the vedolizumab and placebo groups for the second primary outcome in GEMINI II [12], which analysed the number of patients achieving an enhanced clinical response (defined as a ≥100-point reduction in the CDAI score from baseline) at week 6.

In the maintenance phase of GEMINI II [12], 52 % of patients (242/461) discontinued from the study. Patients treated with vedolizumab every 8 weeks and every 4 weeks had significantly higher rates of clinical remission at week 52 in comparison with placebo (treatment difference 17.4 % [95 % CI 7.3 to 27.5], p = 0.0007, and 14.7 % [95 % CI 4.6 to 24.7], p = 0.0042, respectively).

In the absence of any direct head-to-head, randomised, controlled trials comparing vedolizumab with other relevant biologic therapies (adalimumab and infliximab) for the treatment of moderate-to-severe CD, the company conducted an NMA [5]. The NMA, as reported in the CS [5], compared vedolizumab, adalimumab, infliximab and placebo for the outcomes of clinical response, enhanced clinical response, clinical remission and discontinuation due to adverse events (AEs). The data were gathered from the trials GEMINI II [12], GEMINI III [13], CLASSIC I [14], Targan et al. (1997) [15], ClinicalTrials.gov study NCT00105300 [16], ClinicalTrials.gov study NCT00445939 [17], EXTEND [18], ACCENT I [19], CLASSIC II [20], ClinicalTrials.gov study NCT00445432 [17] and CHARM [21]. The size of the network for each outcome varied depending on the availability of the data in each study.

In the induction phase in the anti-TNF-α-naïve population, for clinical response (a drop of ≥70 points in the CDAI score), all treatments were significantly more effective than placebo. Infliximab was significantly better than vedolizumab. For clinical remission, all treatments except for adalimumab 40/20 mg (a dose not licensed in the UK) were significantly better than placebo. In pairwise comparisons, infliximab was significantly better than vedolizumab at 6 and 10 weeks; vedolizumab had a better odds ratio (OR) versus placebo than adalimumab 80/40 mg but a worse OR versus placebo than adalimumab 160/80 mg, though neither difference was significant. For the outcome of discontinuations due to AEs, adalimumab 160/80 mg was significantly better (with a lower discontinuation rate) than vedolizumab; there were no data available for infliximab.

In the maintenance phase in the anti-TNF-α-naïve population, vedolizumab every 4 weeks was significantly better than placebo only for the outcome of clinical remission. Vedolizumab every 8 weeks was significantly better than placebo for both clinical response and clinical remission. Infliximab was significantly better than placebo for all three outcomes (clinical remission, clinical response and discontinuation due to AEs). The significance of the difference in clinical response between vedolizumab and infliximab was not reported for the standard infliximab dose (5 mg/kg) licensed in the UK, but infliximab 10 mg/kg was significantly better than vedolizumab every 4 weeks. The clinical response OR for infliximab 5 mg/kg versus placebo was better than those for both vedolizumab every 4 weeks and vedolizumab every 8 weeks (the dose every 4 weeks is licensed in the UK only for patients who have experienced a decrease in their response; it was not clear if the patients in this analysis met this criterion). The difference between vedolizumab and infliximab for the outcome of clinical remission was not significant. There was a high OR for discontinuation due to AEs with infliximab compared with placebo; vedolizumab was significantly better than infliximab in terms of discontinuations due to AEs.

For most outcomes, no significant differences were observed between vedolizumab and adalimumab in the induction phase in the anti-TNF-α-experienced/failure network analyses. A network analysis for the maintenance phase in the anti-TNF-α-failure subgroups was not possible, because of lack of data.

3.1.1 Critique of Clinical Effectiveness Evidence and Interpretation

The ERG considered the systematic review process followed by the company to be satisfactory, although the details were not reported fully in the CS [2] but were provided in a separate document (commercial in confidence). Despite minor limitations in the company’s search strategy, the ERG was confident that all relevant studies of vedolizumab were included in the CS [2]. The specified inclusion and exclusion criteria appeared generally appropriate, though lacking in detail in places, and reflected the information given in the decision problem. The validity assessment tool used to appraise the included studies, as suggested by NICE’s Specification For Company/Sponsor Submission Of Evidence template [22], was based on the quality assessment criteria for randomised, controlled trials and was considered appropriate by the ERG [2].

The efficacy and safety of vedolizumab were demonstrated in GEMINI II [12]. Because of the high discontinuation rates in the maintenance phase of GEMINI II, estimates of treatment effects (including the magnitude) may have been affected. The imputation of missing patients as failures should, however, have limited the impact of attrition on estimates of efficacy to underestimation of treatment effects, though the effect of attrition could be more problematic for safety outcomes and lead to underestimates of AEs. The trials assessed response in the induction phase earlier than would be done in the UK (at 6 weeks compared with 10 weeks). As such, the population entering the maintenance phase of GEMINI II [12] may not have been fully representative of the UK spectrum, as patients who took longer to respond were excluded. This could conceivably have led to an overestimation of maintenance treatment effects if these patients were also less likely to maintain a response when in remission. In addition, the trial of maintenance therapy was not of sufficient size or duration to estimate the risk of uncommon AEs. The primary endpoint was not achieved in GEMINI III [13]; therefore, the statistical evaluation of the secondary endpoints was acknowledged as exploratory by the company.

The ERG considered that the results presented in the company’s NMA may have underestimated the uncertainty in treatment effects, since fixed-effects models were used [5]. The network analyses included in the CS [5] were of varying quality and relevance. The results of the ‘entire population’ network analyses were thought to be difficult to interpret, as the study populations were too heterogeneous in terms of potentially important treatment-modifying effects [2]. The anti-TNF-α-failure network analysis may have overestimated the efficacy of adalimumab, as primary anti-TNF-α-failure patients were excluded from the adalimumab study but not from the vedolizumab studies. Several studies across the evidence base excluded patients with strictures, meaning that generalisation to this population is problematic, and most did not report the proportion of patients with fistulising disease, so it is unclear whether all studies were representative of UK populations in this respect [2]. Similarly, no studies included patients with a CDAI score >450 points, meaning that generalisation to severe patients (if defined as those with a CDAI score of 450–600 points) is problematic. Uncertainty remains around how the ‘usual care’ comparator provided in the studies compares with UK practice. No analysis for serious AEs was provided for the anti-TNF-α-naïve networks. Additionally, for the induction network analyses, there were limitations with the induction schedule used in the included trials, with fewer doses than recommended being provided and/or assessments taking place earlier than would be done in UK practice or earlier than stated in the licence. The maintenance network analyses were subject to potential bias from the recruitment of patients on the basis of assessment at earlier timepoints than would commonly be done in the UK.

3.2 Cost-Effectiveness Evidence Submitted by the Company

The company submitted a model-based health economic analysis as part of its submission to NICE, which was subsequently revised. The analysis was undertaken from the perspective of the NHS over a 10-year time horizon. All costs and health outcomes were discounted at a rate of 3.5 % per annum, in accordance with the NICE guidance. The company’s analysis was presented for three populations: (1) the mixed ITT population, which comprised patients who had previously received anti-TNF-α therapy and those who were anti-TNF-α naïve; (2) patients who were anti-TNF-α naïve only; and (3) patients who had previously received anti-TNF-α therapy only. Within all three analyses, the comparators included conventional non-biologic therapies (a combination of 5-aminosalicylic acids [5-ASAs], immunomodulators and corticosteroids). Other anti-TNF-α agents (infliximab, adalimumab) were included only in the analysis of the anti-TNF-α-naïve subgroup; they were excluded from the analyses of the mixed ITT and anti-TNF-α-failure subgroups.
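The effect of discounting at 3.5 % per annum can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: values are taken to accrue at the start of each model cycle and time is measured in years from model entry. The CS does not state the exact convention used by the company, and the function names are illustrative.

```python
def discount_factor(years: float, rate: float = 0.035) -> float:
    """Present-value weight for a cost or QALY accrued `years` from model entry."""
    return 1.0 / (1.0 + rate) ** years


def present_value(values_per_cycle, cycle_years: float, rate: float = 0.035) -> float:
    """Sum per-cycle costs or QALYs after discounting.

    Assumes each value accrues at the start of its cycle (cycle 0 is undiscounted).
    """
    return sum(
        value * discount_factor(index * cycle_years, rate)
        for index, value in enumerate(values_per_cycle)
    )
```

With an 8-week cycle, `cycle_years` would be 8/52. Because the same rate applies to costs and health outcomes, extending the time horizon from 10 years to a lifetime adds progressively smaller increments to both.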

The company’s model structure was based on the structure published by Bodger and Hughes [23] and adopted a hybrid approach, whereby a decision tree was used to evaluate outcomes at the end of the initial induction therapy, during which all patients received initial treatment to induce a response (Fig. 1). The induction period was assumed to be 6 weeks for all biologic and non-biologic therapy. A Markov structure (8-week cycle) was used afterwards to evaluate subsequent outcomes (Fig. 2). The model was composed of a total of 12 mutually exclusive and exhaustive health states, according to the treatment received, the severity of the condition and surgery.
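The Markov component of such a model can be sketched as a cohort trace, in which the distribution of patients across health states is updated once per 8-week cycle. The example below is a deliberately reduced illustration with three hypothetical disease states and hypothetical transition probabilities; it does not reproduce the company’s 12-state structure or any of its inputs.

```python
# Hypothetical, reduced set of health states (the company's model had 12).
STATES = ["remission", "mild", "moderate_to_severe"]

# Hypothetical per-cycle (8-week) transition probabilities.
# Row i gives the probabilities of moving from state i to each state; rows sum to 1.
TRANSITIONS = [
    [0.80, 0.15, 0.05],
    [0.20, 0.60, 0.20],
    [0.05, 0.15, 0.80],
]


def step(distribution, matrix):
    """Advance the cohort distribution by one cycle."""
    n = len(distribution)
    return [
        sum(distribution[i] * matrix[i][j] for i in range(n))
        for j in range(n)
    ]


def markov_trace(start, matrix, n_cycles):
    """Return the cohort distribution at cycle 0, 1, ..., n_cycles."""
    for row in matrix:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of the transition matrix must sum to 1")
    trace = [list(start)]
    for _ in range(n_cycles):
        trace.append(step(trace[-1], matrix))
    return trace
```

For example, `markov_trace([0.0, 0.0, 1.0], TRANSITIONS, 6)` follows a cohort that starts with moderate-to-severe disease over roughly 1 year. Costs and QALYs would then be obtained by weighting each row of the trace by state-specific values.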

Fig. 1

Decision tree for induction treatment (reproduced from the company’s submission). ᵃResponse is defined as a drop of ≥70 points in the Crohn’s Disease Activity Index score. Asterisk indicates Markov structures. AE adverse event, CT conventional therapy, Mod/Sev moderate to severe

Fig. 2

Markov model schematics for the Crohn’s disease maintenance phase and beyond (reproduced from the company’s submission). ᵃThe reasons for discontinuation include lack of response and adverse events. Discontinuation due to adverse events is applicable only to responders receiving biologic treatments, because non-responders on biologics switch to conventional therapy and continue receiving it until the end of the model’s time horizon. ᵇPatients may transition to death from any health state during any cycle. CDAI Crohn’s Disease Activity Index score

Key efficacy parameters used within the company’s model were either (1) observed or (2) derived and taken from the two pivotal trials of vedolizumab (GEMINI II and III [12, 13]) and from the NMA of anti-TNF-α therapy [5]. Key parameters were the transition probabilities in the maintenance phase. These were ‘calibrated’ using the Solver function within Excel, so that (a) the proportion of patients in remission at the end of the maintenance treatment (approximately at 1 year) predicted by the model matched the ‘expected’ proportion of patients in remission at the end of the maintenance phase; and (b) the proportion of patients with mild disease at the end of the maintenance phase predicted by the model matched the ‘expected’ percentage of responders in the induction phase with a drop of ≥70 points in the CDAI score and not in remission at the end of the maintenance phase. AEs were included in the model, and the EQ-5D utility scores from the GEMINI trials [12, 13] were used to represent the utility values for the disease health states. Management costs (healthcare resource use associated with inpatient treatment, outpatient visits, investigations and medications) for the different health states were taken from Bodger and Hughes [23] and uplifted to 2012 prices.
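The idea behind this kind of calibration can be illustrated with a much simpler problem: finding the per-cycle probability of losing remission that makes a model reproduce a target proportion in remission at the end of maintenance. The sketch below swaps in a bisection search for the Excel Solver routine and uses a hypothetical two-state model (in remission or not), with a fixed hypothetical probability of regaining remission. All numerical inputs and function names are assumptions for illustration, not values from the CS.

```python
def predict_remission(p_loss, n_cycles, start, p_regain):
    """Proportion in remission after n_cycles in a two-state cohort model.

    p_loss   : per-cycle probability of leaving remission (the calibrated parameter)
    p_regain : per-cycle probability of returning to remission (held fixed)
    start    : proportion in remission at the start of maintenance
    """
    in_remission = start
    for _ in range(n_cycles):
        in_remission = in_remission * (1.0 - p_loss) + (1.0 - in_remission) * p_regain
    return in_remission


def calibrate_p_loss(target, n_cycles, start, p_regain, tol=1e-10):
    """Find p_loss so that the predicted proportion in remission matches the target.

    On the interval [0, 1 - p_regain] the prediction falls monotonically as
    p_loss rises, so bisection is sufficient.
    """
    low, high = 0.0, 1.0 - p_regain
    upper = predict_remission(low, n_cycles, start, p_regain)
    lower = predict_remission(high, n_cycles, start, p_regain)
    if not lower <= target <= upper:
        raise ValueError("target cannot be reached within the search interval")
    while high - low > tol:
        mid = (low + high) / 2.0
        if predict_remission(mid, n_cycles, start, p_regain) > target:
            low = mid   # too many patients remain in remission: raise p_loss
        else:
            high = mid
    return (low + high) / 2.0
```

For instance, `calibrate_p_loss(0.40, 6, 0.5, 0.05)` returns the loss probability that gives 40 % in remission after six 8-week cycles under these hypothetical inputs. The fitted value depends on the assumed structure and on the fixed inputs, so a calibrated parameter carries any error in those assumptions with it.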

Key results provided by the company are presented here. The full list of results is available in the CS [5]. Within the anti-TNF-α-naïve subgroup, the company reported the incremental cost-effectiveness ratio (ICER) for vedolizumab versus adalimumab to be £758,344 per QALY gained and that for infliximab versus vedolizumab to be £26,580 per QALY gained [5]. On the basis of a fully incremental analysis (performed by the ERG), vedolizumab was subject to extended dominance [2].
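A fully incremental analysis orders the options by cost, removes any that are dominated and then compares each remaining option with the next less costly one. A high ICER followed by a lower ICER for the next option along is the signature of extended dominance. The sketch below shows one way to carry out these steps; the strategy labels, costs and QALYs in the usage note are hypothetical and are not the values from the CS or the ERG report.

```python
def efficiency_frontier(strategies):
    """Fully incremental analysis.

    strategies: dict mapping name -> (total cost, total QALYs).
    Returns a list of (name, ICER versus the previous frontier option);
    the least costly option has an ICER of None.
    """
    # Order by increasing cost; at equal cost, the option with more QALYs comes first.
    ordered = sorted(strategies.items(), key=lambda item: (item[1][0], -item[1][1]))

    # Strong dominance: drop options that are no more effective than a cheaper one.
    kept = []
    for name, (cost, qalys) in ordered:
        if not kept or qalys > kept[-1][2]:
            kept.append((name, cost, qalys))

    def icer(previous, current):
        return (current[1] - previous[1]) / (current[2] - previous[2])

    # Extended dominance: drop an option whose ICER exceeds that of the next
    # more effective option, then re-check the shortened list.
    changed = True
    while changed:
        changed = False
        for i in range(1, len(kept) - 1):
            if icer(kept[i - 1], kept[i]) > icer(kept[i], kept[i + 1]):
                del kept[i]
                changed = True
                break

    frontier = [(kept[0][0], None)]
    for previous, current in zip(kept, kept[1:]):
        frontier.append((current[0], icer(previous, current)))
    return frontier
```

With hypothetical inputs `{"A": (10000.0, 5.00), "B": (20000.0, 5.01), "C": (22000.0, 5.10)}`, option B has a very high ICER against A and C has a much lower ICER against B, so B is removed by extended dominance and C is compared directly with A.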

Within the anti-TNF-α-failure subgroup, the company reported the ICER for vedolizumab versus conventional non-biologic therapy to be £98,452 per QALY gained in its original submission to NICE [5]. Following publication of the ACD [24], the company submitted a revised economic model, which included the following modifications:

  • A focus on patients for whom anti-TNF-α therapy had failed (i.e. patients with an inadequate response to, loss of response to or intolerance of ≥1 anti-TNF-α agent).

  • Employment of a lifetime horizon.

  • Use of an assessment time of response in line with its licensing [10].

  • Inclusion of a revised patient access scheme.

  • Amendment of inputs and assumptions, including assumptions around mortality, and updating the health state costs, using resource use estimated through a survey conducted among eight clinical experts rather than the costs reported from Bodger and Hughes [23].

  • Amendments to the Markov trace and calculations.

These had the effect of reducing the company’s base-case deterministic (probabilistic) ICER from £98,452 (not reported) per QALY gained [5] to £21,620 (£27,428) per QALY gained [25] within the anti-TNF-α-failure population.

3.2.1 Critique of Cost-Effectiveness Evidence and Interpretation

The ERG [2, 3] critically appraised the company’s health economic analyses and the models upon which these analyses were based. In summary, the ERG identified a number of limitations, with the main limitations being described below. The ERG noted that the combination of all of these issues led to discrepancies between the model prediction and trial data in terms of the proportion of patients in remission in the placebo arm and responders to vedolizumab in the induction phase remaining on treatment and discontinuing treatment.

3.2.1.1 Limitations Regarding the Model Structure/Key Structural Assumptions

While the model structure was based on a previous economic evaluation by Bodger and Hughes [23], the ERG [2] noted the following:

  (a) The company’s model captured two key aspects of the condition: changes in disease severity (measured by the CDAI score) and the risk of surgery. The model ignored a key aspect of the condition in that CD is relapsing (exacerbation) and remitting (some patients may improve spontaneously).

  (b) Surgery was modelled as a single health state representing a mix of procedures.

  (c) The difficulty associated with parameterising the company’s chosen structure led the company to make a series of assumptions and adjustments that were not adequately justified by the evidence.

  (d) Key structural assumptions were debatable. These included the assumption that non-responders had moderate-to-severe disease; the lack of distinction between responders and non-responders with moderate-to-severe CD; the assumption of the same induction phase duration for all therapy; the relevance to clinical practice of a drop of ≥70 points in the CDAI score to identify patients going on to receive maintenance treatment; the end of scheduled maintenance at approximately 1 year; and a potentially optimistic assumption following discontinuation during biologic therapy and omission of discontinuation due to lack of efficacy.

3.2.1.2 Generalisability of the Population

The ERG [2] further noted that the population included in the economic model was based on the GEMINI trials [12, 13], which included only patients with a CDAI score between 220 and 450 points, and therefore may not be representative of clinical practice in England. The trials recruited participants from a large number of centres worldwide; therefore, the conventional non-biologic therapy provided may not be generalisable to England. The ERG noted that the interpretation of the results and the relevance of the mixed ITT population to the decision problem were open to debate. The ERG believed that patients who had previously received anti-TNF-α agents and those who were anti-TNF-α naïve were two distinct, defined patient groups, with different characteristics and propensities to respond to treatment, as demonstrated in the GEMINI trials [12, 13]. The appropriate comparators as chosen by the company were also different within these two populations. As such, the ERG believed that the use of vedolizumab in these groups represented two separate decisions.

3.2.1.3 Comparators and Treatment Regimens

The company’s analysis within the anti-TNF-α-failure subgroup excluded all other biologic therapy. However, use of a second anti-TNF-α agent following the failure of a first anti-TNF-α agent may be possible, particularly where loss of response has occurred due to development of antibodies to the first anti-TNF-α therapy, although the ERG recognised that the efficacy evidence available was limited.

The ERG had concerns with the treatment regimens assumed in the company’s model. Notably, despite biologics having different treatment regimens, the company assumed the same induction phase duration for all therapies (6 weeks in the original model and 10 weeks in the revised model), with adjustment of the cost accordingly leading to discrepancies in the company’s model (in terms of the costing, cycle length and efficacy).

3.2.1.4 Parameterisation of the Company’s Model

The ERG [2] discussed the efficacy data that were used in the economic model, notably the comparability of the data for the different biologics in the maintenance phase, the efficacy data used for conventional non-biologic treatment, the partial use of the NMA and the lack of clarity in the derivation of inputs, in particular the transition probabilities during the maintenance phase, which were ‘calibrated’. The ERG observed that the calibration approach was complex and may have been unnecessary, as patient-level data from GEMINI II [12] were available and could have been used to estimate the transition probabilities in the maintenance phase in patients treated with conventional non-biologic therapy and vedolizumab. The ERG identified a number of limitations with the calibration approach used by the company, notably that the target datapoints used in the fitting process seemed inconsistent with the datapoint the model was fitted to and that the derivation of the transition probabilities was dependent on structural assumptions and input parameters. Transition probabilities were assumed to be constant and were applied to the remainder of the model, an assumption that was uncertain, given the lack of evidence after 1 year.

While the ERG [2, 3] recognised that there may have been limitations with health state costs taken from Bodger and Hughes [23], use of costs estimated from the clinician survey conducted by the company may also have been inaccurate. This is particularly important, given that this amendment to health state costs had a considerable impact on the ICER. The revised base-case ICER estimated by the company was £21,620 per QALY gained, using the updated cost for the CD health states, based on the clinician survey. Use of the original management cost for the CD health states from Bodger and Hughes [23] increased the ICER to £46,025 per QALY gained.
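
The sensitivity of the ICER to health state costs follows from its definition as the difference in total costs divided by the difference in total QALYs between two strategies. The following minimal sketch illustrates this arithmetic only. The function name and all of the numbers are invented for illustration and are not taken from the company’s model, the clinician survey or Bodger and Hughes [23].

```python
def icer(cost_new, cost_comparator, qaly_new, qaly_comparator):
    """Incremental cost per QALY gained (new treatment vs comparator)."""
    delta_cost = cost_new - cost_comparator
    delta_qaly = qaly_new - qaly_comparator
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when incremental QALYs are zero")
    return delta_cost / delta_qaly


# Invented lifetime totals per patient (illustration only).
# Scenario A: lower costs attached to the active-disease health states.
icer_low_state_costs = icer(60000, 43000, 8.5, 8.0)
# Scenario B: higher costs attached to the active-disease health states, so
# the comparator (more time in active disease) accrues more additional cost.
icer_high_state_costs = icer(65000, 55000, 8.5, 8.0)

print(icer_low_state_costs)   # 34000.0
print(icer_high_state_costs)  # 20000.0
```

In this invented example, the QALY difference is unchanged and only the cost inputs move, yet the ICER shifts substantially, which is why the source of the health state costs matters for the decision.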

3.3 Additional Work Undertaken by the ERG

The key issues described above could not be addressed by the ERG [2, 3] without major restructuring of the economic model, which was not achievable within the timeframe for this STA. Incorporation of changes to the model was challenging, given the structure of the model and the lack of transparency. As a result, the ERG was not able to amend the economic model structure.

However, the ERG [2, 3] conducted additional scenario analyses where possible, which included removing AEs, changing utility values associated with the surgery health state, amending the cost of adalimumab, assuming the same efficacy between the different biologics in the maintenance phase, accounting for lack of efficacy and assuming the same excess mortality rate for each CD health state. In summary, each of the additional exploratory analyses conducted by the ERG had a limited impact on the ICER in isolation (a variation in the ICER of <5%).

3.4 Conclusion of the ERG Report

In comparison with placebo, the addition of vedolizumab to standard care in patients with moderately to severely active CD who had previously had an inadequate response to, loss of response to or intolerance of conventional therapy or anti-TNF-α therapy was significantly more effective in terms of remission (defined as a CDAI score ≤150 points) at week 6 in the induction phase of GEMINI II [12]. However, in GEMINI III [13], there was no significant difference between vedolizumab and placebo in the primary endpoint of the proportion of patients achieving clinical remission at week 6 (defined as a CDAI score ≤150 points) in the anti-TNF-α-failure population.

In the maintenance phase of GEMINI II [12], patients treated with vedolizumab every 8 weeks and every 4 weeks had significantly higher rates of clinical remission at week 52 (defined as a CDAI score ≤150 points) in comparison with placebo.

There were, however, a number of limitations and uncertainties in the evidence base, which warranted caution in its interpretation. Key issues related to the high attrition rates in the maintenance phase of GEMINI II [12], the uncertainty about the long-term treatment effect, the duration of optimal therapy, and how and when withdrawal should be introduced. The primary endpoint was also not achieved in GEMINI III; therefore, the statistical evaluation of the secondary endpoints was exploratory. The results presented in the NMA were highly uncertain.

Changes made by the company in the revised economic model following the ACD had the effect of reducing the company’s base-case deterministic (probabilistic) ICER from £98,452 (not reported in the CS) to £21,620 (£27,428) per QALY gained in the anti-TNF-α-failure population [25]. It should be noted that most of the changes were attributable to two amendments that were subject to uncertainty: increasing the time horizon from 10 years to a lifetime; and updating the health state costs, using resource use estimated through a survey conducted among clinical experts.

The ERG [2, 3] identified a number of limitations that were believed to limit the robustness of the results presented by the company. These limitations could not be addressed by the ERG without major restructuring of the economic model. Therefore, the ERG concluded that the results from the company’s model needed to be interpreted with caution and that it was unclear whether the ICERs would increase or decrease following amendment of the identified structural issues.

4 Key Methodological Issues

The NMAs included in the CS [5] were also of varying quality and relevance. There was heterogeneity in the populations and outcomes in the studies that were included in the network. The company’s NMA was also likely to have underestimated the uncertainty in treatment effects, since fixed-effects models were used.
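
The point about fixed-effects models can be illustrated with a simplified pairwise analogue. The following minimal sketch compares an inverse-variance fixed-effect pooled estimate with a DerSimonian–Laird random-effects estimate. It is not the company’s NMA, which was more complex, and the log odds ratios and standard errors are invented for illustration.

```python
import math


def pool_fixed(effects, ses):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / s**2 for s in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))


def pool_random_dl(effects, ses):
    """DerSimonian-Laird random-effects pooled estimate and standard error."""
    weights = [1.0 / s**2 for s in ses]
    fixed, _ = pool_fixed(effects, ses)
    # Cochran's Q measures how far the studies scatter around the fixed estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w**2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    re_weights = [1.0 / (s**2 + tau2) for s in ses]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return pooled, math.sqrt(1.0 / sum(re_weights))


# Invented study-level log odds ratios and standard errors (illustration only).
log_or = [0.10, 0.80, 0.45]
se = [0.20, 0.25, 0.30]

fixed_est, fixed_se = pool_fixed(log_or, se)
random_est, random_se = pool_random_dl(log_or, se)
print(f"fixed:  {fixed_est:.3f} (SE {fixed_se:.3f})")
print(f"random: {random_est:.3f} (SE {random_se:.3f})")
```

When the studies disagree by more than sampling error alone would explain, the random-effects standard error is larger than the fixed-effect one, so intervals from a fixed-effects analysis of heterogeneous studies will tend to be too narrow.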

The health economic model submitted by the company was subject to a number of methodological issues that limited the credibility of the company’s results [2], including potential omission of key aspects of the condition, such as the relapsing–remitting nature of CD; simplification and debatable assumptions regarding surgery; the difficulty associated with parameterisation of the company’s chosen model structure (most notably, the derivation of the transition matrices); and debatable key structural assumptions. The combination of all of these issues led to some discrepancies between the model predictions and the observed trial data. These issues could not be addressed by the ERG without major restructuring of the economic model.

5 NICE Guidance

The Appraisal Committee reviewed the data available on the clinical and cost effectiveness of vedolizumab, having considered evidence on the nature of moderate-to-severe, active CD and the value placed on the benefits of vedolizumab by patients with the condition, those who represent them and clinical experts. It also took into account the effective use of NHS resources.

In December 2014, the Appraisal Committee produced a preliminary negative recommendation [24] for the use of vedolizumab within its marketing authorisation, i.e. in adults whose disease had responded inadequately to, or had lost response to, either conventional therapy or an anti-TNF-α agent, or who could not tolerate either of these treatment types.

As part of the appraisal consultation process, the company provided further analyses of patients in GEMINI II and III [12, 13] for whom an anti-TNF-α agent had failed (i.e. patients with an inadequate response to, loss of response to or intolerance of ≥1 anti-TNF-α agent) [25] and submitted a revised economic model focusing on the anti-TNF-α-failure population, including a revised patient access scheme.

Following consideration of the evidence presented on the clinical and cost effectiveness of vedolizumab in patients for whom an anti-TNF-α agent had failed [25], NICE issued its final guidance [4] in August 2015 and recommended the use of vedolizumab as an option for treating moderately to severely active CD only if:

  • An anti-TNF-α agent has failed (that is, the disease has responded inadequately or has lost response to treatment), or

  • An anti-TNF-α agent cannot be tolerated or is contra-indicated, and

  • The company provides vedolizumab with the discount agreed to in the patient access scheme.

The guidance [4] also states that vedolizumab should be given as a planned course of treatment until it stops working or surgery is needed, or until 12 months after the start of treatment, whichever occurs sooner. The guidance recommends that at 12 months, patients should be reassessed to determine whether treatment should continue, and that treatment should continue only if there is clear evidence of ongoing clinical benefit. The guidance [4] recommends that for patients in complete remission at 12 months, cessation of vedolizumab should be considered, with treatment being resumed if there is a relapse. The guidance recommends that patients receiving vedolizumab should be reassessed at least every 12 months to decide whether continued treatment is justified.

The guidance [4] further states that patients whose treatment with vedolizumab is not recommended but was started within the NHS before this guidance was published should be able to continue treatment until they and their NHS clinician consider it appropriate to stop.

5.1 Consideration of Clinical and Cost-Effectiveness Issues

This section discusses the key issues considered by the Appraisal Committee. The full list can be found in the Appraisal Committee’s FAD [4].

5.1.1 Generalisability of the GEMINI Trials to the Likely Use and Population for Vedolizumab in Clinical Practice in England

The Committee [4] considered the generalisability of the populations enrolled in GEMINI II [12] and GEMINI III [13] to the populations that would be eligible to receive vedolizumab in clinical practice in England. The Committee heard from clinical experts that only a small number of patients seen in clinical practice have a CDAI score above 450 points and therefore considered the spectrum of disease activity of patients included in the trial broadly comparable to that seen in clinical practice. The Committee also discussed the induction regimens used in the GEMINI trials [12, 13] and heard from the clinical experts that the induction response would usually be assessed later than was done in the trials. The Committee considered that the two populations (anti-TNF-α naïve and anti-TNF-α failure) needed to be evaluated separately and that assessing response at week 6, as was done in the GEMINI trials, would not detect all patients whose disease responds to therapy.

5.1.2 Clinical Effectiveness of Vedolizumab

The Committee [4] discussed the efficacy estimates for vedolizumab from GEMINI II [12] in the induction phase in comparison with placebo and noted that vedolizumab was more effective at week 6 in inducing clinical remission in the ITT mixed population, patients who had not received an anti-TNF-α agent and patients in whom an anti-TNF-α agent had failed. The Committee considered the efficacy estimates for vedolizumab from GEMINI III [13] in the induction phase in comparison with placebo and noted that while vedolizumab did not meet the primary outcome of a higher rate of clinical remission than placebo at week 6 in patients in whom anti-TNF-α therapy had failed, a significant benefit was observed at week 10. The Committee then discussed the efficacy estimates for vedolizumab in the maintenance phase in comparison with placebo and noted that only GEMINI II provided 52-week evidence for this outcome. The Committee noted that vedolizumab showed higher remission rates than placebo in the mixed ITT population, patients who had never received anti-TNF-α therapy and patients in whom anti-TNF-α therapy had failed. The Committee also heard from the clinical experts that even a small absolute treatment effect would be perceived as beneficial, given the absence of alternative treatment options. After consideration of the clinical evidence, the Committee concluded that vedolizumab improved clinical remission in the induction phase and that vedolizumab was more effective than placebo in maintaining response up to 52 weeks in patients who had never received anti-TNF-α therapy and patients in whom anti-TNF-α therapy had failed.

The Committee [4] also considered the results from the NMA to estimate the relative effectiveness of vedolizumab compared with adalimumab and infliximab, but concluded that the results from the NMA were too uncertain in light of the ERG’s comments and the testimony from the clinical experts.

Finally, the Committee considered the evidence presented on the impact of vedolizumab on health-related quality of life and identified discrepancies in the reporting of the EQ-5D. These discrepancies could not be explained by the company; therefore, the Committee was not able to conclude whether vedolizumab would have an effect on the EQ-5D value, but noted that results using other assessment tools suggested that vedolizumab could improve quality of life.

5.1.3 Uncertainties Around the Model Structure and Plausibility of Assumptions and Inputs Used in the Economic Model

The Committee [4] considered the model structure used by the company and concluded that it was uncertain whether the model was structurally sound in light of the number of concerns expressed by the ERG, but that, overall, it was acceptable to inform its decision-making. The Committee then went on to discuss the dosing assumptions and the assessment of response used in the economic model, and considered that the dosing assumptions used in the revised economic model were appropriate.

The Committee discussed the discontinuation rule assumed by the company, whereby biologic treatments would be stopped after a maximum of 1 year. The Committee heard from the clinical experts that patients at high risk of relapse or surgery are likely to remain on treatment after 1 year, but that clinicians would try to stop treatment if it was no longer needed. The Committee considered that the assumption made by the company was not unreasonable, but that in clinical practice, patients could be treated for longer.

The Committee [4] considered whether the time horizon used in the original model (10 years) was appropriate, and concluded that, while there was uncertainty in the long-term extrapolation, given the few data available, the use of a lifetime horizon in the revised economic model was more appropriate.

The Committee considered the modelling of long-term AEs and noted that AEs associated with the long-term use of corticosteroids, such as diabetes and osteoporosis, were not included, and that their inclusion would be likely to improve the cost effectiveness of vedolizumab in comparison with conventional non-biologic treatments.

Finally, the Committee considered the modelling of surgery, health state costs and mortality rates, and was generally satisfied with the assumptions used in the revised economic model, but highlighted that there were some uncertainties.

6 Conclusion

Vedolizumab appears to be more effective in both the induction and maintenance phases, in comparison with placebo, in patients with moderate-to-severe, active CD who have had an inadequate response to, loss of response to or intolerance of conventional therapy or anti-TNF-α therapy. The effectiveness of vedolizumab compared with adalimumab and infliximab is unknown and uncertain in the absence of head-to-head, randomised, controlled trials, and given the differences between the studies included in the NMA. The ERG identified a number of limitations that were believed to limit the robustness of the results presented by the company. These limitations could not be addressed by the ERG without major restructuring of the economic model. Therefore, the ERG concluded that the results from the company’s model needed to be interpreted with caution, and that it was unclear whether the ICERs would increase or decrease following amendment of the identified structural issues. Nevertheless, after taking into account the uncertainty in the modelling of the long-term treatment effect of vedolizumab and structural assumptions, the absence of modelling of long-term AEs associated with corticosteroids and the high unmet need in patients in whom anti-TNF-α therapy has failed, the Committee considered that, on balance, vedolizumab could be considered cost effective, and it recommended vedolizumab in this population, provided that the company provides vedolizumab with the discount agreed to in the patient access scheme [4]. The Committee [4] also considered the high unmet need of a subgroup of patients who cannot take anti-TNF-α therapy and in whom vedolizumab would provide the only medical alternative to conventional non-biologic therapy, and concluded that vedolizumab could be prescribed for this population provided that the company provides vedolizumab with the discount agreed to in the patient access scheme. The Committee [4] did not recommend the use of vedolizumab in patients who have never received anti-TNF-α therapy and are able to receive it.