Key Points for Decision Makers

Evidence on the cost effectiveness of bladder cancer screening is consistent but very limited.

Bladder cancer models rely on data with high uncertainty such as international data and assumptions.

In the absence of sufficient data for complex models, more trials are needed to inform the parameters of natural history disease models, which in turn can inform the protocols of the trials to test the bladder cancer screening interventions.

1 Introduction

Bladder cancer (BC) is a common malignancy with its highest burden falling on economically developed countries [1,2,3]. Worldwide, BC ranks sixth in men and 17th in women, with lifetime incidence risks of 1.1% and 0.27%, respectively [1]. The risk of BC increases with age, and the higher risk for men than for women reflects a higher exposure to carcinogens [1,2,3]. Tobacco smoking is the strongest risk factor, accounting for an estimated 50–65% of all BC cases [4, 5]. Other common risk factors include occupational exposure [6, 7], contamination of drinking water with arsenic and a family history of BC [1, 2].

Bladder cancer is usually first suspected because of visible haematuria or urinary symptoms [8, 9]. At the time of diagnosis, around 75% of patients have non-muscle-invasive BC (NMIBC) [10], which generally has a favourable prognosis. However, around 15% of patients with NMIBC will progress to invasive disease with a much lower expected survival [11]. The diagnostic procedures for symptomatic patients may include: cystoscopy, telescopic endoscopy, ultrasound and/or computed tomography [4]. Screening (i.e. detection of asymptomatic cancers) has been demonstrated to provide survival benefits in prospective studies [8]. However, there remains no conclusive evidence on the effectiveness of the implementation of either national or regional BC screening programmes [1, 8].

In clinical trial settings, several BC screening approaches have been explored [12]: urine dipstick is often considered as a screening intervention in primary care settings, with the potential for urinary biomarkers as well as cystoscopy with ultrasound or computed tomography [8, 13]. Guidelines from professional organisations across different countries, including the USA, Canada, the UK, Japan and the Netherlands, are consistent in recommending an evaluation for asymptomatic microscopic haematuria [14]. However, the recommendations vary regarding screening interventions, particularly the role of urine dipstick and how to define the target screening population [14].

From an economic perspective, BC is one of the most expensive malignancies to manage, with the follow-up costs being twice as high for medium-risk disease and five times as high for high-risk disease compared with low-risk (NMIBC) disease [15]. As multiple BC screening options emerge, modelling studies are often used to assess optimal screening regimes and outcomes prior to large-scale recommendations. The aim of this study was to classify the approaches that have been used in cost-effectiveness models in BC screening and early diagnosis with a specific focus on understanding the modelling methods that have been applied, the structure of the economic models, and modelling inputs and parameterisation. This review also summarises the main outcomes of the identified cost-effectiveness models.

2 Materials and Methods

An initial scoping search was conducted in September 2021 to identify existing reviews. No reviews of BC natural history or cost-effectiveness models were identified; however, search strategies from previous reviews of diagnostic and treatment interventions, and a review of the economics of BC were used to define the most appropriate search terms [16,17,18,19]. As the scoping search identified few studies, the literature scope was then expanded to include diagnostic and surveillance models to provide a comprehensive understanding of BC modelling. The International Society for Pharmacoeconomics and Outcomes Research Good Practices Task Force Report on Critical Appraisal of Systematic Reviews With Costs and Cost-Effectiveness Outcomes was followed in the development of the protocol and reporting of these studies [20]. The protocol registration number in the Prospective Register of Ongoing Systematic Reviews (PROSPERO) is CRD42021281256.

Based on the initial scoping review, a systematic search was conducted in MEDLINE via PubMed, Embase, EconLit and the Web of Science databases. This search was supplemented by searching the Health Technology Assessment database of the Centre for Reviews and Dissemination of the University of York, the National Institute for Health and Care Excellence appraisal system, the Open Access Theses and Dissertations (https://oatd.org), Google Scholar (the first 300 hits in the search for “bladder cancer”, “cost-effectiveness”, “model”) and the references of the included studies. The search period in the review was restricted from 01/01/2006 to 08/09/2021 to reflect current practice in both cost-effectiveness modelling methods and early detection pathways. The development of the search strategy was based on the recommendations of the UK InterTASC Information Specialists’ Sub-Group [21]. The search strategy was validated on the modelling studies identified through a targeted search. An example of the search strategy developed for one of the databases is reported in the Electronic Supplementary Material (ESM). A targeted update of the literature search was conducted in May 2022.

Studies in any language were included if they met the following criteria:

  • Population: human adult population;

  • Intervention: bladder cancer screening or diagnostic interventions;

  • Design: model-based research (either cost-effectiveness models or natural history models of bladder cancer);

  • Perspective/time horizon: any;

  • Publication type: original studies; full-text publications or reports.

Exclusion list:

  • Risk models, animal models, lab models, in vitro models, regression statistical models assessing relationships between the parameters or only cost assessments;

  • Reviews of the literature, protocols, commentaries and conference abstracts.

Titles and abstracts were screened by the first author (OM) using the Rayyan tool to synthesise the studies that fit the inclusion criteria [22]. The full texts of the articles were independently evaluated by a second researcher (AIH), who also validated the data extraction and duplicated the quality assessment for each of the included studies.

The extraction tables covered several dimensions: (1) general information (authors, publication year, country, setting, funding) and PICO (Population, Intervention, Comparator and Outcome); (2) modelling methods (model type according to the taxonomy of model structures for economic evaluations of health technologies [23], software, cycle, time horizon, disease states, discounting, inflation, methods used for costs and outcomes, parametrisation approach and sensitivity analysis); (3) data sources; (4) choices in modelling BC; and (5) quality of the studies using the Philips checklist [24] and the Bilcke et al. guide on uncertainty evaluation [25].

The standardised evaluation of the included models was based on two instruments: the Philips checklist [24] and the guide on uncertainty evaluation by Bilcke et al. [25]. The Philips checklist included questions on structure (S1–S9), data (D1–D3) and consistency (C1, C2) [24]. The questions on uncertainty (D4) were excluded from the Philips checklist and were instead assessed using the Bilcke et al. methodology [25] to avoid incompatibility between the instruments (this approach was selected as more detailed and explicit; see the ESM for details). The ranking options of the Philips checklist included “yes”; “partially”, “can’t tell” and “no” (all treated as “no”); or “NA”.

The approach for data synthesis was consistent with the International Society for Pharmacoeconomics and Outcomes Research Good Practices for systematic reviews with cost and cost-effectiveness outcomes [20]. A narrative synthesis was used to address qualitative aspects of model design, including model scope, methods and choices in modelling BC. For screening studies, graphical synthesis reported standardised (inflated to 2022 and converted to international dollars) incremental cost-effectiveness ratios to visualise the cost-effectiveness outcomes by underlying disease prevalence, using the consumer price index and purchasing power parities to standardise the values [26,27,28]. Graphical synthesis of the outcomes for the diagnostic studies was not undertaken because of heterogeneity in PICO, methods and health settings [20].
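The standardisation of incremental cost-effectiveness ratios described above can be sketched as follows. This is a minimal illustration, assuming that inflation is applied within the original currency before conversion with the target-year purchasing power parity; the function name, parameter names and input values are hypothetical and are not taken from the review or from any included study.

```python
def standardise_icer(icer_local, cpi_cost_year, cpi_target_year, ppp_target_year):
    """Inflate an ICER to the target year and convert it to international dollars.

    icer_local       -- ICER in local currency at cost-year prices
    cpi_cost_year    -- consumer price index of the country in the cost year
    cpi_target_year  -- consumer price index of the country in the target year
    ppp_target_year  -- purchasing power parity in the target year
                        (local currency units per international dollar)
    """
    if cpi_cost_year <= 0 or ppp_target_year <= 0:
        raise ValueError("CPI and PPP values must be positive")
    # Step 1: inflate within the local currency (assumed order of operations).
    inflated = icer_local * (cpi_target_year / cpi_cost_year)
    # Step 2: convert the inflated value to international dollars.
    return inflated / ppp_target_year


# Hypothetical inputs for illustration only.
example = standardise_icer(icer_local=10_000, cpi_cost_year=80.0,
                           cpi_target_year=100.0, ppp_target_year=0.5)
print(round(example, 2))
```

Inflating first and converting second keeps the inflation adjustment tied to the price level of the country in which the costs were incurred; other orders of operation are possible and can give different results.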

3 Results

3.1 General Description and PICO

Our search identified 3082 records, of which 18 models met our inclusion criteria: four on BC screening and the remaining 14 on BC diagnostic or surveillance interventions (Fig. S1 of the ESM). The excluded full-text articles are reported in the ESM.

All included models were developed in high-income countries, with nine of them within the US context (Tables 1, 2). Payer perspective was mentioned in the majority of the studies (n = 12) with two studies stating the societal perspective but reporting the inputs for the direct medical costs only [29, 30].

Table 1 Bladder cancer screening cost-effectiveness studies: PICO and the outcomes
Table 2 Bladder cancer diagnosis and surveillance cost-effectiveness models: PICO and the outcomes

Three and two out of four screening models simulated high-risk and general-risk populations, respectively [29, 31,32,33]. High-risk groups were defined in the models as heavy smokers and those with occupational exposure, and as any male individual above the specified age. Two related cost-effectiveness studies assessed biochemical bladder markers [32, 34] as an intervention for BC screening, and two assessed dipstick haematuria testing [29, 33] (all compared to no screening, Table 1).

The diagnostic models included patients with haematuria (n = 5), NMIBC (n = 8) and muscle-invasive BC (n = 1). A range of different diagnostic and surveillance interventions were assessed in the models. Hexaminolevulinate blue light cystoscopy and white light cystoscopy (WLC) were the most frequently compared interventions, followed by cystoscopy as a stand-alone or a combination of the interventions (Table 2).

In screening models, two out of four studies reported quality adjusted life-years (QALYs) [29, 33] and one more life-years saved (LYS) [31] (Table 1). In diagnostic models, QALYs were reported only in four out of 14 studies [35,36,37,38] and two more studies reported LYS only [39, 40], with cases detected and resource utilisation used as the primary modelling outcomes (Table 2).

3.2 Screening Models: Outcomes

The models that evaluated haematuria tests included the impact on bladder and kidney cancers, as well as other urological diseases. All studies concluded that BC screening is cost effective in either all populations (n = 1) or only high-risk population groups (n = 3, as defined using BC demographic features) (Table 1).

All studies concluded that screening is more cost effective with a higher incidence or prevalence of the disease (Fig. 1). However, the studies did not agree on a value of BC prevalence or incidence above which screening becomes a cost-effective intervention. Cost per cancer detected was the lowest in the older age group (71–80 years) with the highest disease prevalence [24]; however, no studies compared cost per QALY across age groups to examine how the cost effectiveness of screening varies by age.

Fig. 1 Incremental cost-effectiveness ratio for bladder cancer screening with different prevalence rates for the disease. Notes: Squares reflect outcomes per cancer detected; circles reflect outcomes per life-year saved or quality-adjusted life-year. Incremental cost-effectiveness ratios below the axis represent cost-saving outcomes. The grey circle reflects the incremental cost-effectiveness ratio in the UK study (assumed cost-effectiveness threshold £20,000)

3.3 Diagnostic Models: Outcomes

White light cystoscopy dominated the computed tomography scan [41], the protocol including a microsatellite analysis with control cystoscopy at 3, 12 and 24 months [30], and the protocol using virtual cystoscopy followed up by cystoscopy if the first test is positive [42]. Interventions that supplemented cystoscopy had higher costs and effects, while tumour markers had higher costs and varied values for clinical effects [30, 39, 41, 42]. The strategy of using the cystoscopy only for positive cases with other primary diagnostic tools (such as urine cytology or cystosonography) had lower costs and effects [38, 42]. Compared with hexaminolevulinate blue light cystoscopy, WLC had higher costs in two out of four studies [36, 43,44,45]. These studies concluded that hexaminolevulinate blue light cystoscopy had higher therapeutic effects than WLC, and is therefore likely to be cost effective [36, 43,44,45]. Only one of the included studies [28] assessed incremental cost-effectiveness ratio as costs per QALY (the intervention was considered as dominating). Three other studies [32, 35, 36] assessed cost per progression, recurrence or resource use, leaving a high uncertainty around interpretation of their results. The heterogeneity in the choice of other evaluated diagnostic interventions and their comparators was too large to support a systematic comparison (Table 2).
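The terms used in these comparisons (dominance, higher costs and effects, lower costs and effects) follow the standard incremental logic of cost-effectiveness analysis, which can be sketched as below. The function name and the numbers in the usage lines are hypothetical and serve only to illustrate the classification; they do not reproduce results from any included study.

```python
def compare_strategies(cost_new, effect_new, cost_old, effect_old):
    """Classify a strategy against its comparator on the cost-effectiveness plane.

    Returns a (label, icer) tuple; icer is None unless there is a trade-off
    between costs and effects.
    """
    d_cost = cost_new - cost_old
    d_effect = effect_new - effect_old
    if d_cost == 0 and d_effect == 0:
        return ("equivalent", None)
    # No more costly and no less effective, and better on at least one axis.
    if d_cost <= 0 and d_effect >= 0:
        return ("dominant", None)
    # No less costly and no more effective, and worse on at least one axis.
    if d_cost >= 0 and d_effect <= 0:
        return ("dominated", None)
    # Costs and effects move in the same direction: report the ratio.
    return ("trade-off", d_cost / d_effect)


# Hypothetical values: cheaper and more effective, then costlier and more effective.
print(compare_strategies(900.0, 5.2, 1000.0, 5.0))
print(compare_strategies(1500.0, 5.5, 1000.0, 5.0))
```

Only the trade-off case yields a ratio that needs to be judged against a willingness-to-pay threshold; dominant and dominated strategies can be ranked without one.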

3.4 Screening Models: Methods

Two screening models used a decision tree and two others used Markov model structures [29, 34] (Table S1 of the ESM). All screening models were cohort-level rather than individual patient-level models.

The models with decision tree structures predicted the potential health and cost impact of screening interventions by combining the characteristics of the screening tests (such as sensitivity and specificity) with underlying BC prevalence data [32, 33]. Average life expectancy by stage among the modelled population group (75-year-old men) was used in the decision tree model predicting LYS and QALYs over the lifetime [33]. The models with Markov structures (one with a lifetime horizon and the other with a 5-year horizon) used a decision tree to model the screening and diagnostic pathways leading to the detection of BC; patients with diagnosed BC entered one of the BC states (Markov model) and could undergo recurrence, surveillance, progression or death [26, 27].
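The way a decision tree combines test characteristics with prevalence can be illustrated with a minimal sketch. It assumes a one-off screen in which every positive test triggers a diagnostic work-up at a fixed cost; the function name, the cost structure and all input values are hypothetical simplifications and do not reproduce any of the included models.

```python
def screen_cohort(n, prevalence, sensitivity, specificity,
                  cost_screen, cost_workup):
    """One-off screening decision tree for a cohort of n people."""
    diseased = n * prevalence
    healthy = n - diseased
    true_pos = diseased * sensitivity          # cancers detected by screening
    false_neg = diseased - true_pos            # cancers missed by screening
    false_pos = healthy * (1.0 - specificity)  # unnecessary work-ups
    # Everyone is screened; every positive test leads to a work-up.
    total_cost = n * cost_screen + (true_pos + false_pos) * cost_workup
    cost_per_detected = total_cost / true_pos if true_pos > 0 else float("inf")
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "total_cost": total_cost,
        "cost_per_cancer_detected": cost_per_detected,
    }


# Hypothetical inputs: the same test applied at two prevalence levels.
low = screen_cohort(1000, 0.02, 0.80, 0.90, cost_screen=1.0, cost_workup=100.0)
high = screen_cohort(1000, 0.05, 0.80, 0.90, cost_screen=1.0, cost_workup=100.0)
print(low["cost_per_cancer_detected"], high["cost_per_cancer_detected"])
```

With the test characteristics held fixed, the cost per cancer detected falls as prevalence rises, because the fixed screening cost and the false-positive work-ups are spread over more true positives.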

3.5 Diagnostic Models: Methods

All but one diagnostic model [39] had a time horizon of 5 years or less. Five out of 14 diagnostic and surveillance models had a decision tree cohort structure [36, 40,41,42, 45], and one model was a simulated patient-level decision tree model [46] (Table S2 of the ESM). The decision tree structure was applied mainly in the diagnostic and surveillance models with the focus on clinical or healthcare outcomes (e.g. cancers detected, or healthcare resources used, and not LYS or QALYs); similar to screening models, the decision tree structure was used to model the diagnostic and treatment pathways based on sensitivities of the tests. In the simulated patient-level decision tree model of Georgieva et al., patients were assigned individual characteristics (including sex, age, smoking status and history of gross haematuria), and the probabilities of different types of urinary tract cancers were based on these characteristics at diagnosis [46]. This model predicted the number of detected and missed cancers, which allowed for the assessment of costs and the cost effectiveness of each intervention based on the sensitivity and specificity of each diagnostic test.

Seven diagnostic models were cohort-level Markov models [30, 35, 37, 39, 43, 44, 47] (one of them was a semi-Markov model [30]), and one model was an individual-level Markov model [40]. Markov states were used in the cohort models to simulate the transitions during the surveillance period (i.e. after the diagnosis), such as progression, recurrence of the disease or death. The simulated patient-level Markov model of Yuan simulated the natural history of secondary BC to assess the impact of different diagnostic guidelines, with the Markov states including the natural history of BC, treatment and death [38].
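A cohort-level Markov model of the kind described here can be sketched in a few lines. The state names, the annual cycle and every transition probability below are hypothetical placeholders chosen for illustration; the sketch applies no discounting or half-cycle correction and does not reproduce the structure of any included model.

```python
STATES = ["disease_free", "recurrence", "progressed", "dead"]

# Hypothetical per-cycle transition probabilities; each row sums to 1.
TRANSITIONS = {
    "disease_free": {"disease_free": 0.85, "recurrence": 0.10,
                     "progressed": 0.03, "dead": 0.02},
    "recurrence":   {"disease_free": 0.50, "recurrence": 0.30,
                     "progressed": 0.15, "dead": 0.05},
    "progressed":   {"disease_free": 0.00, "recurrence": 0.00,
                     "progressed": 0.80, "dead": 0.20},
    "dead":         {"disease_free": 0.00, "recurrence": 0.00,
                     "progressed": 0.00, "dead": 1.00},
}


def run_cohort(start, transitions, cycles):
    """Return the cohort trace: state occupancy at cycles 0..cycles."""
    trace = [{s: start.get(s, 0.0) for s in STATES}]
    for _ in range(cycles):
        current = trace[-1]
        nxt = {s: 0.0 for s in STATES}
        for src, share in current.items():
            for dst, p in transitions[src].items():
                nxt[dst] += share * p
        trace.append(nxt)
    return trace


def life_years(trace, cycle_length_years=1.0):
    """Undiscounted life-years, counting occupancy at the start of each cycle."""
    return sum((1.0 - row["dead"]) * cycle_length_years for row in trace[:-1])


trace = run_cohort({"disease_free": 1.0}, TRANSITIONS, cycles=5)
print(trace[-1], life_years(trace))
```

Attaching a cost and a utility weight to each state and summing them over the trace would turn the same structure into a cost per QALY calculation.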

3.6 Screening Models: Sources of Data

Screening models were directly parameterised from published sources and/or registers and were based on assumptions on the disease incidence, prevalence and screening effect (e.g. downstaging) [32,33,34] [Table S3 of the ESM]. Base-case epidemiological inputs, such as incidence, were based on experts’ or researchers’ assumptions. The definition of high-risk populations varied by study, from 2% for prevalence to 10% for incidence [29, 31,32,33]. Data on costs were retrieved from the databases (Medicare, National Health Service reference costs and the National Health Insurance) and supplemented with data from local hospitals and expert opinions [24,25,26,27]. Three studies [31,32,33] used other inputs from published sources; the screening accuracy and downstaging data were retrieved from meta-analyses of international studies, individual publications, clinical experts’ and authors’ opinions. Models had differing assumptions on screening test sensitivities, which ranged from 60 to 100% for different tests (dipstick tests or biomarkers) and population groups (average risk or high risk) [32,33,34]. Two studies (with the UK and Japan context) reported QALYs as the outcome measures and both retrieved utility values from previous cost-effectiveness analyses, including those conducted in other countries (from Canada and the USA, respectively). A recent study by Okubo et al. evaluating the cost effectiveness of combining haematuria screening with a Specific Health Checkup (where a haematuria test is already performed for around 38% of participants) informed the transition probabilities by the Specific Health Checkup report and the National Cancer Registry data [29].

Specificity of the primary tests in the screening models (with values ranging from 60 to 99.9%) impacted the follow-up interventions and costs of diagnosis [24,25,26,27]. None of the models reported screen-induced overdiagnosis, overtreatment or other potential screening-related harms.

3.7 Diagnostic Models: Sources of Data

Most of the models were directly parameterised from published sources (i.e. used published data as direct model inputs) with one study also using a within clinical trial assessment [30] and two others manually calibrating some of the disease parameters by using the data from the European Organisation for Research and Treatment of Cancer as calibration targets [38, 47] (Table S4 of the ESM). Expert elicitation, assumptions and published sources were used for epidemiological data, with all but three studies referencing international data for some of the parameters including sensitivities, disease severity and progression [35, 37,38,39, 41,42,43,44,45,46,47]. National datasets (such as Medicare for all the US studies, National Health Service reference costs or the National formularies) were used in all but two studies with in-hospital cost calculations [30, 44] to estimate the direct medical costs. Variable uptake for the diagnostic and surveillance interventions was not considered in the included models, as it was not measured empirically for the evaluated interventions. Diagnostic studies included harms (n = 7) related to unnecessary tests for those with false-positive diagnoses, complications from invasive diagnostic and treatment procedures, including mortality from radiation-induced tumours and anaesthesia based on published data [30, 37,38,39, 41, 42, 46].

Three out of four studies reporting QALYs retrieved health-related utility values from previous cost-effectiveness studies [35, 37, 38]; all three studies (two from the USA and one from the UK) referenced a cost-effectiveness analysis of radical cystectomy in Canada that evaluated related utilities based on a standard gamble approach involving 25 urologists [48]. Mowatt et al. used utility values from the other urological cancers [39] stating that the modellers selected the best available source of the evidence to inform health-related utility values. While the study of Mowatt et al. [39] is not recent, the reliance of the later studies on qualitative data from the previous model suggests that scarcity in utility values may still be an issue.

3.8 Modelling BC

The identified models defined BC states in the following ways (Table 3):

  1. Without a standard classification system defining the cancer as detected, progressed and/or recurrent [30, 35, 40, 42, 45,46,47].

  2. Using Tumour, Node, Metastasis (TNM) system or its elements [34, 36] or numerical staging [29].

  3. Using risk-based classification states, such as NMIBC of low, intermediate and high risk, and non-metastatic and metastatic muscle invasive BC [33, 37, 39, 43, 44].

Table 3 Modelling bladder cancer in diagnostic and screening models

Some of the diagnostic and screening models simulated population groups including patients with asymptomatic microscopic haematuria [41], microscopic haematuria [39, 42, 46] or suspected haematuria [42], while predicting outcomes from the time of the diagnosis. However, none of the models simulated a complete natural history (i.e. progression of asymptomatic disease from primary cancer onset). Screening or diagnostic models can be divided into several types according to the inclusion of the natural history components.


1. Models Without Progression of Undiagnosed Cancer

Models of this type simulated effects and costs based on stage at diagnosis for screen-detected and symptomatic disease and either did not consider cancer progression [33,34,35,36, 40,41,42, 45, 46] or considered progression only for diagnosed disease [29, 30, 47]. These models were informed by assumed or evidenced incidence rates and test sensitivities. Rather than modelling disease progression, these models represented the consequences of a false-negative test as incremental costs. For example, the diagnostic study of Rodgers et al. [42] considered the costs of repeat testing for microscopic haematuria after a false-negative diagnosis. Teoh et al. [33] applied higher lifetime treatment costs to false-negative screened patients, similar to those detected symptomatically.


2. Models with Progression of Undiagnosed Cancer as a Result of a False-Negative Test

These models simulate progression to more advanced BC states for patients with a false-negative test result by combining prevalence data and characteristics of screening tests [37, 39, 43, 44]. For example, the diagnostic model of Sutton et al. included an undiagnosed state for patients with false-negative results and assumed that these patients will be diagnosed within the next 2 years; patients in an undiagnosed state could progress to low-risk, high-risk or metastatic states and could then be diagnosed [37].


3. Models with Progression of Asymptomatic Cancer

The only model that included undiagnosed states for BC that were not related to a false-negative test (i.e. asymptomatic cancer) was the clinical surveillance model of Yuan et al. [38]. This model simulated the natural history of secondary BC for patients who were defined as low risk at the time of diagnosis and were disease free following treatment. This model assumed a progression of patients from treated low-risk BC to asymptomatic intermediate-risk and then finally high-risk disease. At each of these states, patients could transit to the detected state following the surveillance intervention. Diagnosed patients could not progress to more advanced disease but could progress to the death state as a result of BC death or age-specific death from other causes. The progression of asymptomatic disease was estimated by comparing the predicted disease rates to those observed in the European Organisation for Research and Treatment of Cancer trials [49]. The process of calibration is not described in the article.
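Because the calibration process is not described, the following is only a generic illustration of how a single progression parameter might be calibrated manually against an observed target; it is not a reconstruction of the method used by Yuan et al. It assumes a constant per-cycle progression probability and one calibration target, and all function names and numbers are hypothetical.

```python
def cumulative_progression(p_per_cycle, cycles):
    """Cumulative probability of progressing within `cycles` cycles at constant risk."""
    return 1.0 - (1.0 - p_per_cycle) ** cycles


def calibrate(target, cycles, step=0.001):
    """Grid search for the per-cycle probability whose prediction is closest to target."""
    best_p = 0.0
    best_err = abs(cumulative_progression(0.0, cycles) - target)
    n_steps = int(round(1.0 / step))
    for i in range(1, n_steps + 1):
        p = i * step
        err = abs(cumulative_progression(p, cycles) - target)
        if err < best_err:
            best_p, best_err = p, err
    return best_p


# Hypothetical target: 20% of patients observed to progress within 5 annual cycles.
p_hat = calibrate(target=0.20, cycles=5)
print(p_hat, cumulative_progression(p_hat, 5))
```

In a model with several unobserved transitions, the same idea extends to searching over combinations of parameters and comparing predictions with several targets at once, which is one reason why reporting the calibration procedure matters for reproducibility.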

3.9 Quality Ranking Using the Philips Checklist

In general, studies addressed most of the evaluated quality criteria of the Philips checklist (Table S5 of the ESM), with 14 studies scoring “no” on 30% or fewer of the questions. Meanwhile, assessment of internal and external consistency was not reported in 17 and 14 studies, respectively (possibly being reported in separate publications or reports). A short time horizon was also a frequent concern (n = 8 out of 18 studies) in the models (Fig. 2). The quality of two older screening studies was lower than the quality of later screening models and the diagnostic studies; however, because only a few studies were identified, no meaningful comparison can be provided. Agreement between the two reviewers for each category of the Philips checklist [24] was very high at 92%.

Fig. 2 Critical appraisal of the economic models using the Philips et al. checklist. Notes: Dimensions of quality in the Philips et al. checklist: S1 clear statement of decision problem, defined objectives and decision makers; S2 clear statement, justification, and consistency of scope and perspective; S3 rationale for structure explained and based on evidence; S4 structural assumptions justified and reasonable; S5 strategies/comparators defined with all the options considered; S6 model type based on decision problem; S7 sufficient and justified time horizon; S8 disease states/pathways reflect biological process; S9 cycle length justified by the nature of the disease; D1 data identification is transparent, appropriate, justified and high quality; D2a baseline data described and justified; D2b treatment effect based on recognised meta-synthesis, justified extrapolation and survival, with all assumptions documented and justified; D2c costs and discounting accord with standard guidelines; D2d quality of life weights (utilities) appropriate, justified and referenced; D3 data incorporation justified and transparent; C internal and external consistency is evaluated. The categories used: “yes”, “no” (no, partially, or can’t tell), “NA” (not applicable)

3.10 Structural and Parameter Uncertainty in BC Models

Structural uncertainty in screening models was related to different structural assumptions, such as using a decision tree structure to ascertain long-term outcomes, choice of static probabilities, using the BC cases detected as the modelling outcome (instead of the LYS or QALYs), methods and assumptions on BC mortality/survival used in modelling and mismatch between the selected perspective and costs [29, 31,32,33]. Structural uncertainty was not fully addressed in screening models (Table S6 of the ESM), with the study of Teoh et al. partially exploring structural uncertainty by specifying the availability of sources of evidence, their appropriateness and the limitations. Parameter uncertainty (related to the assumed epidemiologic values, unrepresentative populations [using international data or small-sized samples] or unspecified sources) was present in all screening studies (Tables S3 and S6 of the ESM) [29, 31,32,33]. None of the published articles mentioned model validation. Only the most recent study by Okubo et al. fully addressed parameter uncertainty through an explicit probabilistic sensitivity analysis [29].

In diagnostic models, a short time horizon, static transition rates, choice of the outcomes (only BC cases detected or resource use and not LYS or QALYs), and the approaches to test sensitivity, incidence, disease progression, recurrence and BC mortality evaluation were recorded among the sources of structural uncertainty (Table S6 of the ESM) [7, 30, 35,36,37,38,39,40,41,42,43,44,45,46,47]. While most models did not report on structural uncertainty, Klaassen et al. [44] and Mowatt et al. [39] addressed it by conducting scenario analyses, and three other studies [37, 38, 42] partially addressed it by explicitly specifying the accepted and the alternative assumptions. Similar to the screening models, parameter uncertainty was identified in all included studies and was addressed through probabilistic sensitivity analysis in five diagnostic studies [37, 39, 41, 42, 46]. Two publications described a validation conducted for the diagnostic models (one of them with the calibrated parameters) using survival data or the risk distribution [38, 46].
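A probabilistic sensitivity analysis of the kind referred to here can be sketched as follows. The sketch assumes that the incremental effect and the incremental cost are drawn independently from a scaled beta and a gamma distribution, respectively, and it reports the share of draws with a positive net monetary benefit at a given threshold. The distributions, their parameters and the function name are hypothetical and are not taken from any included study.

```python
import random


def psa_probability_cost_effective(threshold, n_draws=5000, seed=1):
    """Share of parameter draws in which the intervention has positive net benefit."""
    rng = random.Random(seed)  # fixed seed so that repeated runs are reproducible
    favourable = 0
    for _ in range(n_draws):
        # Hypothetical incremental QALYs: beta draw scaled to the range 0 to 0.5.
        d_effect = rng.betavariate(20, 80) * 0.5
        # Hypothetical incremental cost: gamma with shape 25 and scale 40.
        d_cost = rng.gammavariate(25, 40)
        net_monetary_benefit = threshold * d_effect - d_cost
        if net_monetary_benefit > 0:
            favourable += 1
    return favourable / n_draws


# Evaluate the same draws against a range of hypothetical thresholds.
for wtp in (5_000, 20_000, 50_000):
    print(wtp, psa_probability_cost_effective(wtp))
```

Plotting this probability against the threshold gives a cost-effectiveness acceptability curve, which is the usual way of reporting statements such as a given probability of an intervention being cost effective.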

3.11 Summary from Studies with Low Uncertainty

Studies that at least partially addressed both structural and parameter uncertainty [25] were also ranked high on the Philips checklist criteria [24]. All three studies that explicitly addressed structural and parameter uncertainty (two of which were reports) were diagnostic studies [37,38,39]. Mowatt et al. [39] analysed multiple diagnostic interventions and concluded that cytology followed by WLC in initial diagnosis and follow-up, while being the least effective strategy, is the most cost-effective approach in the UK setting. Sutton et al. concluded that a diagnostic classifier for risk stratification of haematuria patients is cost effective in the UK with a probability of 68% [37]. Yuan et al. compared the long-term clinical effect of different guidelines in the US setting and concluded that none of the comparators dominated the others [38].

4 Discussion

This review explored methods used in modelling the cost effectiveness of diagnostic, surveillance and screening interventions in BC. The screening models evaluated the cost effectiveness of biomarkers and urine dipstick tests in general-risk and high-risk populations; all screening studies concluded that screening is cost effective, with the underlying disease prevalence being an important determinant. However, the earlier models evaluating the cost effectiveness of biomarkers [31, 32] had low quality and high structural and parameter uncertainties.

Diagnostic models assessed a wide range of interventions. In studies of variable quality, hexaminolevulinate blue light cystoscopy was consistently considered as a cost-effective intervention compared with WLC. The studies with low structural and parameter uncertainty concluded on the cost effectiveness of cytology followed by WLC in the initial diagnosis (compared with multiple alternatives) [39] and a risk stratification approach for patients with haematuria in the UK [37]. Diagnostic models had variable predictions on the cost effectiveness of urine biomarkers in BC diagnosis (reporting higher costs and variable effects compared with their alternatives), with a high-quality model with low uncertainty reporting that tumour markers are not cost effective in the UK setting [39].

The conclusions of the cost-effectiveness analyses are subject to provisos regarding limitations of the methods used and available data constraints, with the following discussion points identified:


(1) Correspondence of the PICO to the decision problem

The description of the population (asymptomatic, symptomatic, or diagnosed with NMIBC or muscle invasive BC) defined the initial and the following states of the models. The choice of the intervention will affect the model design because some screening and diagnostic tests, such as the urine dipstick test, may also lead to the diagnosis of other diseases (e.g. kidney cancer or other urological conditions). As such, BC models should assess the need to include the simulation of other relevant health conditions to avoid underestimating the potential benefits of screening and diagnostic interventions.

While patients, interventions and comparators were well defined in BC models, the economic outcomes investigated were more inconsistent. Bladder cancer models frequently reported cost per detection, recurrence, progression or resources used as the main outcome. While these outcomes may be interesting in their own right, they are inadequate in two regards: first, they do not capture the long-term mortality or health-related quality-of-life impacts of early or delayed detection; second, they do not allow comparative economic analyses across different health conditions and thus cannot inform policy decisions [50].


(2) Choice of the model structure

Selection of the model should be based on the simplest structure that addresses the objectives of the study, the structure of the disease and the clinical guidelines or treatment pathways [23]. Healthcare decisions, particularly large investments such as national screening programmes, should consider uncertainty that cannot be reflected in deterministic models. In cancer modelling, timing is important for costs and health outcomes, as costs are commonly higher in the first year after diagnosis than in the following years [51] and cancer-related decrements in health-related utilities vary over time [52]. While stochastic timed models without interaction would be the expected choice for most BC screening and diagnostic models, in our review, most of the included models were deterministic, and the decision tree structure was used in more than one-third of all the analysed models.
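As an illustration of the two points above (timing of costs and parameter uncertainty), the following minimal sketch combines a three-state cohort model (NMIBC, invasive disease, dead) with a probabilistic analysis. It is one of several possible approaches and is not the structure of any reviewed model. The states, annual costs, discount rate and beta distributions are all assumptions chosen for illustration, and background mortality is ignored for brevity.

```python
import random
import statistics


def run_cohort(p_progress, p_die, years=20, cost_first_year=8000.0,
               cost_later_year=2000.0, discount=0.035):
    """Return discounted (cost, life-years) per person for a diagnosed cohort.

    Annual cycles; everyone starts in NMIBC. All inputs are hypothetical.
    """
    nmibc, invasive = 1.0, 0.0
    total_cost = 0.0
    total_life_years = 0.0
    for t in range(years):
        alive = nmibc + invasive
        discount_factor = 1.0 / (1.0 + discount) ** t
        # Timing matters: the first year after diagnosis is costed separately.
        annual_cost = cost_first_year if t == 0 else cost_later_year
        total_cost += alive * annual_cost * discount_factor
        total_life_years += alive * discount_factor
        # Transitions applied at the end of the cycle.
        progressed = nmibc * p_progress
        deaths = invasive * p_die
        nmibc -= progressed
        invasive += progressed - deaths
    return total_cost, total_life_years


def probabilistic_analysis(n_draws=1000, seed=1):
    """Re-run the cohort model with transition probabilities drawn from
    assumed beta distributions instead of fixed point estimates."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        p_progress = rng.betavariate(15, 85)  # assumed prior, mean 0.15
        p_die = rng.betavariate(20, 80)       # assumed prior, mean 0.20
        draws.append(run_cohort(p_progress, p_die))
    return draws


draws = probabilistic_analysis()
costs = sorted(cost for cost, _ in draws)
print("Mean discounted cost:", round(statistics.mean(costs)))
# Approximate 95% interval from the ordered draws.
print("Approximate 95% interval:", round(costs[24]), "to", round(costs[974]))
```

A single deterministic run returns one cost and one life-year estimate, whereas the probabilistic run returns a distribution of results, which is the information a decision maker needs to judge how likely a strategy is to be cost effective.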


(3) Modelling natural history of bladder cancer in screening models

In comparison to breast, cervical and colorectal cancers [53,54,55], the evidence pertaining to the cost effectiveness of BC screening is currently limited. As such, BC models are less sophisticated and have a much greater reliance on expert judgement than models for cancers with well-established screening programmes. Only one natural history model, without a cost-effectiveness component, was identified. However, as this model simulated only secondary BC, it is not directly applicable to a screening population [38]. None of the cost-effectiveness models simulated a complete natural history (i.e. a progression of asymptomatic primary BC from cancer onset), which hinders cross-comparisons between modelling predictions. While there is some understanding of the BC risk factors, onset, progression and recurrence [11, 56], modelling the natural history of BC is constrained by a lack of direct or indirect data that are able to: (a) inform the progression of asymptomatic disease (e.g. dwell time) and (b) inform long-term clinical outcomes (e.g. survival) in complex individual-level models or when the model states are consistent with the detailed histology of the disease. The absence of natural history modelling is a general limitation of published BC screening models: such models are not nimble enough to compare different designs of screening programmes, or to accurately predict the long-term effect of repeated screenings or the impact of screening on screening-related harms, such as overdiagnosis.

Modelling a complete natural history in screening models requires a complex structure and a lifetime horizon to capture the long-term effects and harms. Such models have substantial data requirements for indirect parametrisation (i.e. calibration of the parameters to inform the transitions in unobserved health states), including the prevalence of undiagnosed cancer, the speed of cancer growth or sojourn time, and the probability of spontaneous cancer regression or recurrence [47]. This in turn implies that more modelling inputs need to be evaluated for their quality in screening models than in diagnostic models.
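The idea of calibration can be sketched as follows, under strong simplifying assumptions. A hypothetical cohort moves from healthy to undiagnosed (preclinical) cancer with a known annual onset probability, and the unobserved annual probability of becoming clinically apparent is chosen by grid search so that the modelled prevalence of undiagnosed cancer matches an observed target. The two-state structure, the parameter values and the target are all illustrative; regression, recurrence and mortality are ignored.

```python
def preclinical_prevalence(p_onset, p_clinical, years=50):
    """Prevalence of undiagnosed cancer among people not yet diagnosed,
    after `years` annual cycles of a hypothetical cohort."""
    healthy, preclinical = 1.0, 0.0
    for _ in range(years):
        onset = healthy * p_onset
        surfaced = preclinical * p_clinical  # becomes symptomatic and diagnosed
        healthy -= onset
        preclinical += onset - surfaced
    return preclinical / (healthy + preclinical)


def calibrate(target_prevalence, p_onset, grid_size=999):
    """Grid search for the unobserved transition probability that best
    reproduces the target prevalence (least squared error)."""
    candidates = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(
        candidates,
        key=lambda p: (preclinical_prevalence(p_onset, p) - target_prevalence) ** 2,
    )


# Hypothetical target, generated here from a known value so the fit can be checked.
P_ONSET = 0.002
target = preclinical_prevalence(P_ONSET, 0.25)
best = calibrate(target, P_ONSET)
print("Calibrated annual probability of clinical presentation:", best)
print("Implied mean sojourn time (years):", 1 / best)
```

With a single calibration target, only one unobserved parameter can be identified. A complete natural history model has several such parameters, which is why the number and quality of calibration targets become a central data requirement.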


(4) Uncertainty in bladder cancer modelling

Structural and parameter uncertainty is common in screening and diagnostic BC models. This uncertainty relates to epistemic uncertainty in the applicability of data (e.g. using international data or assumptions), aleatory uncertainty with a frequent reliance on deterministic analysis, and a lack of validation or scenario analyses to explore uncertainty in model structures.

The parameter uncertainty in the identified models suggests a possible scarcity of sources to inform country-specific parameters and a need to assess the transferability of sources available for modelling. In particular, the data need to be improved to inform health-related utility values in BC models.

While the clinical effect of medical interventions is generally considered to be generalisable, there may be specific considerations that make this less so for diagnostic tests, especially for screening interventions [57, 58]. It is common for cancer screening models to assume that disease onset is a setting-specific transition relying on a set of risk factors, while cancer progression consists of generalisable parameters [59]. This assumption, mainly based on a lack of data to state otherwise, suggests that care should be taken when generalising the baseline disease risk from other settings [57]. Considering that all models were developed within the context of high-income countries, neither their outcomes nor their inputs are generalisable to middle-income or lower-income settings.

4.1 Implications for Research

While empirical evidence is necessary to inform the modelling parameters and to improve predictions, mathematical disease models are also used to inform trial design [60, 61]. As such, the development and implementation of trials needed to inform the models, and of models to inform the trials, should be an iterative process. This also suggests that BC models informed by the limited trial data should be flexible enough to incorporate new data as they appear, especially where these have the potential to inform developments to model structure in addition to simple parameter updates. The utility values for BC health states, as well as population preferences for different diagnostic and screening interventions, are currently not considered in the mathematical disease models and should be explored in future studies.

4.2 Limitations of the Review

While this review sought to search the literature comprehensively, several limitations should be noted. Only one reviewer screened the initial abstracts, which may have resulted in missed studies or an unintentional bias in the initial search. To mitigate further bias, two independent reviewers assessed the full texts of the included publications and the quality of studies. Moreover, two of the included publications were grey-literature reports (i.e. publications that did not go through the formal peer-review process), which may not appear in a systematic search if a reproduction of the search strategy is attempted. To standardise the quality assessment, the Philips checklist [23] was used, with a very high average agreement rate among the raters (92%). However, some of its components, such as a short time horizon, are better suited to screening studies than to diagnostic studies. Moreover, some limitations of the appraised health economic studies may be explained by compliance with local guidelines. Finally, all the models that were included in this review were from high-income countries, and therefore may not be generalisable to other populations across the globe.

5 Conclusions

Although the evidence pertaining to the cost effectiveness of BC screening is consistent, it is still in its nascent stages. More data are needed to systematically address uncertainties in models, as well as in the natural history of BC. This suggests that BC models are not nimble enough to compare different designs of screening programmes, or to predict screening-related harms such as overdiagnosis. Future clinical trials may help to decrease uncertainty in the structures and parameters of BC models, as all models rely on data. Once natural history models of BC are established, they can inform optimal population screening and surveillance strategies that may not be possible to evaluate within the scope of clinical trials.