Urine-based testing would seem to be the obvious diagnostic choice for bladder cancer. Conceptually, an ideal diagnostic test would be simple and application of the test would determine if the disease is present or absent. However, like all diagnostic tests for cancer, urine-based tests for bladder cancer suffer from poor performance, limited clinical utility, and the potential for introducing harm. Consequently, none are universally recommended diagnostic tests for use in the evaluation of patients at risk of having bladder cancer [1,2,3]. Despite this fact, extensive investment into the research and development of urine-based technologies promising to be better bladder cancer tests continues to be made [4].

Test Performance Characteristics for Urine-Based Tests

The quality of diagnostic accuracy studies can be assessed using the QUADAS-2 tool [5], and the STARD initiative was developed to make reporting of diagnostic accuracy studies complete and transparent [6]. Describing the performance of a urinary test is usually done using several metrics including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), the diagnostic likelihood ratio positive (DLR+), and the diagnostic likelihood ratio negative (DLR-) [7]. It is important to recognize that the calculation of these parameters requires that the test result be binary: either positive or negative. For non-binary tests with results reported on ordinal or continuous scales, alternative methods are available (e.g., receiver operating characteristic (ROC) curves) that summarize test performance appropriately. In these cases of non-binary results, however, “optimal” thresholds are often selected by researchers in order to dichotomize the test. This allows results to appear binary and therefore makes a more straightforward binary analysis possible. However, dichotomizing a continuous test using a sample-driven threshold can lead to several biases and should be undertaken with great care [8, 9].

Using Table 8.1 (where a and b are the numbers of positive tests in subjects with and without the disease, and c and d are the corresponding numbers of negative tests), the prevalence of disease in the sample is calculated as \( \rho =\frac{a+c}{a+b+c+d} \). Sensitivity is the probability of a positive test if the subject truly has the disease and is calculated by \( Sens=\frac{a}{a+c} \). Specificity is the probability of a negative test if the subject truly is disease-free and is calculated by \( Spec=\frac{d}{b+d} \). Note that since the specificity and sensitivity are calculated within the columns of the contingency table, they are not affected by the prevalence of disease in the sample. This means that the sensitivity and specificity of a test are independent of how rare the disease of interest is in the sample population. Furthermore, these parameters should be similar in cohort and case-control designs, which utilize different sampling methods. While sensitivity and specificity are not affected by disease prevalence, they are affected by disease severity, something known as spectrum bias [10,11,12]. Generally, when the disease severity/burden is high, the test sensitivity will appear better and the specificity will appear lower.

Table 8.1 Diagnostic contingency table

The positive predictive value is the probability of having the disease if the test is positive and is given by \( PPV=\frac{a}{a+b} \). The negative predictive value is the probability of not having the disease if the test is negative and is given by \( NPV=\frac{d}{c+d} \). Unlike sensitivity and specificity, the predictive values are calculated from the rows of the contingency table and are therefore directly affected by the prevalence of disease in the sample population. Predictive values will vary when the same test is applied to different patient subgroups that have different risks of disease. For example, referral bias occurs when a diagnostic test is applied to a sample taken from a specialty clinic with a higher than expected disease prevalence.
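The dependence of predictive values on prevalence can be illustrated with a short sketch. It applies Bayes' theorem to a hypothetical test; the sensitivity (70%), specificity (85%), and the three prevalence values are made-up illustrative inputs, not figures from this chapter, and the function name `predictive_values` is ours.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' theorem."""
    tp = sens * prevalence                # true positive fraction
    fp = (1 - spec) * (1 - prevalence)    # false positive fraction
    fn = (1 - sens) * prevalence          # false negative fraction
    tn = spec * (1 - prevalence)          # true negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical test applied in three settings with different disease risk.
for prev in (0.02, 0.15, 0.40):
    ppv, npv = predictive_values(0.70, 0.85, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With sensitivity and specificity held fixed, the PPV rises and the NPV falls as prevalence increases, which is the pattern expected when a test is moved from a general population to a referral clinic.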

Predictive values must in turn be distinguished from diagnostic likelihood ratios. The positive diagnostic likelihood ratio is given by \( DLR+=\frac{Sens}{1- Spec} \) and the negative diagnostic likelihood ratio is given by \( DLR-=\frac{1- Sens}{Spec} \). Like sensitivity and specificity, the diagnostic likelihood ratios are calculated along the columns of the diagnostic contingency table and are therefore independent of disease prevalence. The diagnostic likelihood ratios quantify the increase in knowledge about the presence/absence of disease that is gained by applying the test, something that becomes very important in Bayesian decision-making frameworks. The interpretation of the diagnostic likelihood ratio is given in Table 8.2.
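The formulas above can be collected into one small sketch. The cell layout (a and b in the test-positive row, c and d in the test-negative row; a and c in the disease-present column) is inferred from the formulas in the text, and the counts are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    a: int  # test positive, disease present (true positives)
    b: int  # test positive, disease absent (false positives)
    c: int  # test negative, disease present (false negatives)
    d: int  # test negative, disease absent (true negatives)

    def metrics(self):
        sens = self.a / (self.a + self.c)
        spec = self.d / (self.b + self.d)
        total = self.a + self.b + self.c + self.d
        return {
            "prevalence": (self.a + self.c) / total,
            "sens": sens,
            "spec": spec,
            "ppv": self.a / (self.a + self.b),
            "npv": self.d / (self.c + self.d),
            "dlr_pos": sens / (1 - spec),
            "dlr_neg": (1 - sens) / spec,
        }


# Made-up counts; the second table has ten times as many disease-free subjects.
for label, table in (("original", TwoByTwo(30, 20, 10, 140)),
                     ("lower prevalence", TwoByTwo(30, 200, 10, 1400))):
    print(label)
    for name, value in table.metrics().items():
        print(f"  {name}: {value:.3f}")
```

Scaling only the disease-free column changes the prevalence and the predictive values but leaves sensitivity, specificity, and both likelihood ratios unchanged, in line with the column-wise argument in the text.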

Table 8.2 Interpretation of diagnostic likelihood ratios

The Gold Standard for Bladder Cancer Diagnosis: Cystoscopy

The gold standard against which most diagnostic tests for bladder cancer are measured is white light cystoscopy (WLC) [3, 13]. While WLC is considered extremely sensitive and very specific for bladder cancer, it remains costly and somewhat invasive. Newer augmented cystoscopy methods such as hexaminolevulinate photodynamic diagnosis (PDD) and narrow-band imaging (NBI) [14,15,16] have been developed with the goal of increased sensitivity; however, they are even more costly and have not yet become the community standard [1, 17,18,19]. Tumor histology obtained from cystoscopic biopsy specimens is an inappropriate gold standard for the evaluation of urine tests for two reasons. First, histology results are available only after the decision to biopsy has been made. This is the key decision point (biopsy or no biopsy) that cystoscopy and urine tests are trying to inform. It is not possible to use results that occur after a decision to inform that decision (e.g., you cannot make the decision to bet or fold your poker hand after you know if you won the hand). Second, histology is not available on all patients since only patients with a positive/suspicious cystoscopy or a positive urine test proceed to biopsy. This unequal application of the gold standard to the study population is known as verification bias [20], and it can have a very significant impact on how the diagnostic test performance metrics discussed above are interpreted.

Urine test results that are discordant from negative cystoscopy are a significant problem as they may falsely indicate the need for further diagnostic workup. The risks include morbidity from unneeded extra interventions and testing done to chase down a positive test result (e.g., ureteroscopy and bladder biopsy), the financial consequences of excess testing, patient worry and anxiety, and the medico-legal risk to the physician of a missed diagnosis. Currently available urine tests for bladder cancer are plagued in the clinic by mediocre performance, a strong dependency on how suspicious/atypical results are handled [21], spectrum bias where the test performs dramatically differently in one group of patients than another [12], cost issues, reader/interpreter dependency [22, 23], and an inability to replace cystoscopy [18, 24, 25]. The AUA has recently recommended against using urine cancer tests during microscopic hematuria evaluation for these reasons [3], and this argument could be extended to bladder cancer surveillance as well [21].

Anticipatory Positive Tests

Occasionally, a urine-based diagnostic test will be positive while cystoscopy and upper urinary tract evaluation are negative. In these cases, it is possible that either the urine test result is a false positive or that it is, in fact, a true positive which will become clinically apparent after some interval of follow-up when tumors become visible. An anticipatory positive test result refers to these true positives, which detect bladder cancer prior to clinical detection by cystoscopy, the gold standard [26,27,28]. Several criteria can be used to define what constitutes an anticipatory positive test result: (1) the urine test must be positive prior to cystoscopy or upper urinary tract imaging or endoscopy; (2) the probability of developing a positive cystoscopy over time must be higher when the urine test is positive than when it is negative; and both (3) the measured specificity and (4) the sensitivity of the urine test must increase when the cystoscopy results that occur in the future (i.e., the cystoscopy results that show that the prior urine test anticipated the tumor) are credited to the urine test. Some urine tests appear to anticipate future tumors, but do so in such an unpredictable and inconsistent way that this property becomes all but useless. Anticipatory positivity was recently assessed in a large sample of urine cytology and FISH tests, and the analysis demonstrated that positive urine tests frequently do not anticipate cancer [29].

Spectrum Bias

Sensitivity and specificity (and consequently the diagnostic likelihood ratios) are not fixed test properties and often vary across subgroups. This means that when a urine test is reported to have a particular sensitivity or specificity, this result may not apply to your patient population, a phenomenon that is known as spectrum bias [11, 30,31,32]. Although reporting the spectrum biases of diagnostic tests is recommended by the STARD initiative, it is an uncommon practice [33]. Sometimes, the differences in test performance can be so dramatic between patient subgroups that the test becomes very difficult to use. For example, we have shown that urine cytology and Urovysion FISH performance has dramatic variation between patient subgroups [12]. Proper stratification into relevant subgroups during the evaluation of a diagnostic test can highlight important spectrum biases [10].

Combining Diagnostic Test Results

It is common that more than one diagnostic test for a disease is considered. These multiple tests may be obtained sequentially or in parallel. When tests are ordered sequentially, the results of the first test inform the decision to obtain the second, and so on. Sequential testing leads to a decrease in sensitivity and NPV while causing an increase in specificity and PPV. Parallel testing, when a battery of tests is obtained all at once, leads to an increase in sensitivity and NPV while causing a decrease in specificity and PPV. Bayesian methods that use diagnostic likelihood ratios are particularly well suited for the combination of multiple decisions in medical decision making [34].
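The direction of these changes can be checked with a minimal sketch for two tests. It assumes that the tests are conditionally independent given disease status (an assumption added here that may well not hold for urine tests measuring related biology), that a sequential strategy counts as positive only if both tests are positive, and that a parallel strategy counts as positive if either test is positive. The input values are hypothetical.

```python
def serial(sens1, spec1, sens2, spec2):
    """Combined (sens, spec) when BOTH tests must be positive (second test follows a positive first)."""
    return sens1 * sens2, 1 - (1 - spec1) * (1 - spec2)


def parallel(sens1, spec1, sens2, spec2):
    """Combined (sens, spec) when EITHER positive test counts as a positive result."""
    return 1 - (1 - sens1) * (1 - sens2), spec1 * spec2


# Hypothetical tests: test 1 (sens 60%, spec 90%), test 2 (sens 70%, spec 80%).
for name, rule in (("sequential", serial), ("parallel", parallel)):
    sens, spec = rule(0.60, 0.90, 0.70, 0.80)
    print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

Under these assumptions the sequential rule yields a lower sensitivity and higher specificity than either test alone, and the parallel rule does the opposite, which matches the pattern described above.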

Hematuria and Bladder Cancer

Hematuria is the presence of microscopic (≥3 RBCs per high-power field) or visible blood in the urine [3]. The association of hematuria with the presence of bladder cancer (BC) varies greatly in gross versus microscopic hematuria. Bladder cancer has a high prevalence (10–20%) in patients presenting with gross hematuria, indicating a clear need for cystoscopy in this population for detection [35,36,37]. However, the indication for cystoscopy in patients with microscopic hematuria is far more controversial, as the probability of BC in this setting is only 1–3% [3, 38]. This is complicated further by the high prevalence of microscopic hematuria in the adult population (2–20%) [36, 39], representing millions of adult Americans. Detecting microscopic hematuria is easy and inexpensive; urine dipstick tests have a sensitivity of ~80% and specificity of ~90% [40]. The problem arises when we attempt to use microscopic hematuria evaluation as a screening test for bladder cancer. This is due to the fact that microscopic hematuria itself has a low specificity for bladder cancer [40]. The low prevalence of bladder cancer in the general population therefore has dulled the enthusiasm for generalized microscopic hematuria screening [41,42,43]. In certain occupational settings where the risk of bladder cancer is felt to be very high, microscopic hematuria screening may make sense; however, it is likely inappropriate in the general population. Analysis of a single urine dipstick as a screening tool for bladder cancer in the general population actually results in a PPV of 0.2% and an NPV of 98.8% [44]. Attempts at raising the cutoff for hematuria [45] or performing serial dipsticks have proven only marginally better [46]. Ultimately, the low prevalence of BC has rendered broad screening measures ineffective [46].

Urine Cytology

Urine cytology involves looking for exfoliated neoplastic cells in the urine by microscopy and was first described in 1864 [47, 48]. It is the most commonly used urine test in the detection of bladder cancer. The urine cytology procedure involves centrifuging urine to obtain a cellular pellet, washing and resuspending the pellet, smearing the cells on a glass slide, then staining the slide with a Papanicolaou stain (or equivalent). In many centers, a cytotechnologist screens the cells and any abnormal slides go on to second tier evaluation by a cytopathologist (verification bias). Traditionally, urine cytology results are reported as positive, negative, atypical, inconclusive, suspicious, or as an inadequate sample. However, cytology results are not very reproducible and significant intra- and interobserver variation has been observed [49, 50]. Furthermore, urine cytology results are often (25–50%) reported as equivocal (atypical, inconclusive, or suspicious) [12, 51,52,53,54,55], which confounds clinicians and patients [56,57,58]. Equivocal results have a very large impact on the diagnostic performance of urine cytology and are rarely taken into account in studies of its diagnostic accuracy where test results are assumed to be binary, either positive or negative. When equivocal results are considered, the sensitivity and specificity of cytology worsen dramatically [59]. Adjunctive diagnostic tests have been used to adjudicate equivocal cytologies, as discussed later.

In order to make urine cytology more reproducible, a new classification method called the Paris system has been developed [60]. This system is designed to focus on high-grade cytological features (Table 8.3). Surprisingly, the new system includes a review of imaging and cystoscopy reports for certain cytology categories which indicates that diagnostic review bias is a significant possibility [13]. Ideally, the result of the urine cytology test should not depend on the results of other tests. It remains to be seen whether the Paris system will improve cytology performance.

Table 8.3 Paris system for reporting urine cytology [60]

Cytology is generally reported to have a sensitivity of ~30% and a specificity of ~95% for bladder cancer, though these overall estimates are likely overly optimistic given more recent findings (see below) [61]. Urine cytology performance also varies significantly between patients. Numerous investigators have found a better sensitivity/specificity for high-grade tumors and worse sensitivity/specificity for low-grade lesions, a manifestation of spectrum bias [56, 57, 62,63,64,65]. Low-grade lesions and small tumors are thought to be less likely to exfoliate cancer cells into the urine and consequently are harder to recognize with urine cytology [66]. Other patient factors also affect urine cytology. Increasing age, male gender, and history of smoking are associated with increased sensitivity and decreased specificity [12]. Also, false positive results occur in the settings of instrumentation, inflammation, infection, stones, and treatment with chemotherapy and radiotherapy [52]. Despite all these factors affecting urine cytology results and universal acceptance that it has extremely poor sensitivity for bladder cancer, it is still widely used, predominantly because of a prevailing belief that it is rarely falsely positive. Indeed, some positive urine cytology tests have been shown to anticipate some future bladder cancers that are currently invisible with cystoscopy [67]. While this undoubtedly occurs in some cases, other investigators have shown that random bladder biopsies done in normal appearing bladders for positive cytologies have little benefit [68]. In consideration of these limitations, the AUA no longer recommends cytology in the workup of asymptomatic hematuria or in surveillance of low-grade bladder cancer [3].

Several things can be done to improve urine cytology performance. Immediate centrifugation prevents loss of cells due to prolonged processing [66]. Using the whole voided specimen and multiple urine samples can increase the sensitivity (though this probably also reduces specificity) [69]. Although it is a routine practice to obtain a voided urine specimen, a bladder barbotage obtained at cystoscopy increases the sensitivity for high-grade lesions [70, 71]. However, others have found that instrumentation can be a potential source for a false positive result [72, 73]. Other causes of a false positive cytology include inflammation, infection, stones, and treatment with chemotherapy and radiotherapy [52].

UroVysion Fluorescence In Situ Hybridization (FISH)

Fluorescence in situ hybridization (FISH) is the second most commonly used urine test for bladder cancer. UroVysion FISH is a cell-based assay that detects aneuploidy of chromosomes 3, 7, and 17 as well as the deletion of the 9p21 locus in exfoliated urine cells. Though FISH was long known to have the ability to detect bladder cancer [74,75,76], it was not until 2000 that its current form was FDA-approved for initial bladder cancer diagnosis as well as for surveillance [77]. A meta-analysis of studies of UroVysion FISH has calculated its sensitivity at 63% and specificity at 87% in the detection of bladder cancer [78].

Spectrum bias has also been reported for FISH [12]. Unsurprisingly, FISH sensitivity has been reported to vary by stage: pTa (65%), pTis (100%), and pT1-pT4 (95%) [79]. For surveillance, sensitivity was 55% (CI 36–72%) and specificity was 80% (CI 66–89%) [78]. When UroVysion is obtained in the context of an equivocal cytology, the reported sensitivity and specificity are 72% and 83%, respectively [80]. Importantly, several retrospective studies have noted that a persistently positive FISH result during Bacillus Calmette Guérin (BCG) therapy predicts a poor response to therapy [81,82,83,84,85]. If these results are validated in a current prospective trial, FISH could serve as an early indicator of BCG treatment failure.

FISH has also been shown to anticipate future bladder cancer [26, 27, 86]. These studies usually assume that any future bladder cancer that develops after a positive FISH can be attributed to the positive FISH test, even if the test was obtained years earlier. Others have disputed this claim, and careful analysis has shown that only a portion of future bladder cancers are actually anticipated by FISH [21, 29, 59].

Perhaps the most common clinical utilization of FISH is to adjudicate positive or equivocal cytologies occurring in the context of a normal cystoscopy [87,88,89,90]. Multiple studies have shown that FISH detects most cancers and misses few high-grade bladder cancers when used in patients with equivocal cytologies [27, 89, 91, 92]. Furthermore, data from two prospective studies of reflex FISH testing (done in equivocal cytology or cystoscopy) showed a decrease in bladder cancer associated costs and a 60% PPV and 97% NPV [93].

Bladder Tumor Antigen (BTA) Tests

The Bladder Tumor Antigen (BTA) test is a protein-based test that is FDA-approved for diagnosis and surveillance of bladder cancer. The BTA tests identify two basement membrane antigens, human complement factor H-related protein and complement factor H, which are present within the urine of bladder cancer patients [94]. The original BTA test described by Sarosdy and later validated by D’Hallewin [95, 96] was different from the current tests and is no longer available because of its low sensitivity and specificity [97]. There are now two forms of the BTA test available: BTA stat and BTA TRAK. BTA stat is a point-of-care test that uses an immunochromatographic method to give a result in 5 min and does not require specialized personnel [94]. A meta-analysis of 22 studies of BTA stat calculated the sensitivity as 64% and specificity as 77% [78]. This was confirmed in a second meta-analysis [98]. BTA TRAK is a quantitative sandwich immunoassay that requires a laboratory assessment and several hours to perform [99]. A meta-analysis of four studies of BTA TRAK calculated the sensitivity as 65% and specificity as 74% [78].

Overall, BTA appears to have a higher sensitivity but lower specificity than urine cytology [56, 98]. Like most urine tests, it does seem to anticipate future bladder cancer in some cases [95, 100,101,102]. The test suffers from cross-reactivity with blood since complement factor H is present in high concentrations in serum, and consequently it has a high rate of false positives in hematuria [103]. It also suffers from poor performance in patients treated with BCG due to local inflammation [104]. Studies of BTA tests suffer from poor reporting [6, 33] and, consequently, test sensitivity has varied by study design: 66% in case-control studies and 77% in cohort studies [105].

Nuclear Matrix Protein-22 (NMP-22) Test

NMP-22 is an immunoassay that detects a nuclear matrix protein involved in the mitotic apparatus which is present in greater concentration within tumor cells [106,107,108]. NMP-22 has been FDA-approved for both diagnosis and surveillance of bladder cancer. Like BTA, NMP-22 is available either as a qualitative point-of-care test or as a quantitative, laboratory-based test. Meta-analysis estimated the sensitivity and specificity of the qualitative assay as 58% and 88%, respectively, and those of the quantitative assay as 69% and 77% [78]. The improvement in sensitivity of NMP22 over cytology is due to improved detection of low-grade tumors.

NMP22 does, however, display spectrum bias. For example, the test has better sensitivity in women [107], and when multiple tumors are present [109, 110]. NMP22 anticipates future bladder cancers when cystoscopy is negative [111]. Several factors affect the performance of NMP22 including UTI, benign inflammatory conditions, urinary calculi, instrumentation, foreign bodies, other urologic malignancies, and genitourinary bowel interposition [112]. In fact, the false positive rate has been reported to be >80% when UTI is present and 100% with bowel interposition [113]. Even a concentrated urine secondary to dehydration can cause a false positive result by overestimating the NMP22 level [114]. In general, studies of NMP22 have been of poor quality [6, 33].

ImmunoCyt Test

ImmunoCyt is a cell-based test approved by the FDA for bladder cancer surveillance. This test consists of fluorescent monoclonal antibodies that bind specifically to three cell surface glycoproteins present on the membrane of bladder cancer cells, making urinary bladder cancer cells visible microscopically. ImmunoCyt is used in conjunction with cytology to enhance the sensitivity of cytology [115,116,117,118]. A meta-analysis of 14 studies calculated the sensitivity of ImmunoCyt as 78% and specificity as 78% [78]. Due to spectrum bias, sensitivity increases with bladder cancer grade and stage. In a separate review examining the sensitivity, specificity, and predictive value of ImmunoCyt, the negative predictive value was better than the positive predictive value, suggesting that it has more false positives and fewer false negatives [119].

Perhaps the greatest limitation of ImmunoCyt is that, like cytology, the test is operator-dependent. Some investigators have found high interobserver variability and poor agreement [120], while others suggest that adequate training can overcome this limitation [121]. ImmunoCyt does not appear to anticipate future bladder cancers, though this aspect has not been carefully considered [122].

CxBladder Test

CxBladder is a cell-based test that identifies the presence of five mRNA fragments (MDK, HOXA13, CDC2, IGFBP5, CXCR2) in the urine that are expressed at high levels in patients with BC [123]. CxBladder is not FDA-approved, though it is marketed for both hematuria evaluation and surveillance of BC. At a set specificity of 85%, CxBladder was able to detect 48%, 90%, and 100% of stage Ta, T1, and >T1 bladder cancers, respectively [123]. It was then validated in a cohort presenting with macroscopic hematuria [124, 125]. Based on a limited number of studies, test sensitivity is estimated to be ~85% and specificity ~85% [124, 126]. Given the paucity of studies involving CxBladder, it is difficult to compare it to other urine-based diagnostic tests. Breen et al. performed multiple imputations with five datasets to compare four diagnostic tests (cytology, NMP22, FISH, and CxBladder) and found that CxBladder had a higher signal-to-noise ratio and better sensitivity than the other tests [127].

Arguments for and Against Routine Urine-Based Testing for Bladder Cancer

The purpose of urine-based diagnostic tests for bladder cancer is ultimately to replace cystoscopy for hematuria evaluations or for bladder surveillance in patients with a history of bladder cancer. This is an excellent goal with potentially significant benefits for patients and for healthcare costs. After all, cystoscopy is an invasive test that is expensive, impacts patient quality of life, and can cause adverse events like urethral strictures, pain, and urinary tract infections. Unfortunately, several limitations preclude the recommendation of routine urine-based testing in place of cystoscopy. In the case of hematuria evaluation, particularly microscopic hematuria, the pre-test probability of bladder cancer is so low (1–3%) that even a near perfect urine test would not change decision making. For example, in Table 8.4 we have calculated the pre-test probability of bladder cancer and the post-test probabilities of bladder cancer given either a positive or negative result on a urine test. This is actually an overly optimistic view because many of these tests can have indeterminate results which would complicate things further. What can be seen in Table 8.4 is that none of the urine tests obtained for microhematuria, whether positive or negative, significantly changes the probability of having BC; they are therefore uninformative. In the case of gross hematuria, a negative test result is associated with a ~10% probability of having bladder cancer. Most patients and physicians would agree that a 1 in 10 chance of bladder cancer is high enough to proceed to cystoscopy. The result of the urine test therefore does not change the need for cystoscopy and is of little utility.
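The pre-test to post-test calculation described here can be sketched with diagnostic likelihood ratios. This is an illustration, not a reproduction of Table 8.4: the sensitivity and specificity pairs are the pooled estimates quoted earlier in this chapter, while the pre-test probabilities (2% for microscopic and 15% for gross hematuria) are values picked here from within the quoted ranges. Indeterminate results are ignored.

```python
def post_test_probability(pre_test, sens, spec, positive):
    """Update a pre-test probability with the DLR+ (positive result) or DLR- (negative result)."""
    lr = sens / (1 - spec) if positive else (1 - sens) / spec
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# (sensitivity, specificity) as quoted in the text; treated as fixed for illustration.
TESTS = {
    "cytology": (0.30, 0.95),
    "UroVysion FISH": (0.63, 0.87),
    "BTA stat": (0.64, 0.77),
    "NMP-22 (qualitative)": (0.58, 0.88),
    "ImmunoCyt": (0.78, 0.78),
    "CxBladder": (0.85, 0.85),
}
# Assumed pre-test probabilities, chosen from the ranges given in the chapter.
SCENARIOS = {"microscopic hematuria": 0.02, "gross hematuria": 0.15}

for scenario, pre in SCENARIOS.items():
    print(f"{scenario} (pre-test {pre:.0%})")
    for name, (sens, spec) in TESTS.items():
        pos = post_test_probability(pre, sens, spec, positive=True)
        neg = post_test_probability(pre, sens, spec, positive=False)
        print(f"  {name}: positive -> {pos:.1%}, negative -> {neg:.1%}")
```

Because sensitivity and specificity vary across subgroups (spectrum bias), treating them as constants is itself a simplification; the sketch is meant only to show the mechanics of the update.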

Table 8.4 Probability of having bladder cancer before and after a urine-based test done for hematuria

A more complicated issue exists in non-muscle invasive bladder cancer (NMIBC) surveillance since the pre-test probability of disease depends on patient risk. This is related to the characteristics of their particular BC as well as the time interval between cystoscopies. In our BC population at Duke, for example, the 1-year probability of recurrence in patients with NMIBC undergoing surveillance is approximately 25%. This overall value is not personalized, however, and could be much higher or lower than what is seen in the general community due to referral and other biases. The EORTC risk tables can help in this regard [128], although they tend to overestimate risk slightly in modern cohorts that use immediate postoperative intravesical chemotherapy and second-look transurethral resection. For example, in low-risk NMIBC patients (EORTC score 0) the 1-year cumulative incidence of recurrence is 15%, and since these patients undergo annual cystoscopy the pre-test probability of having a tumor is also 15%. In the very high risk (EORTC score ≥ 10) cohort, the 1-year cumulative incidence of recurrence is 61%, but since these patients undergo cystoscopy every 3 months (at least initially), the pre-test probability of having a tumor is actually 21% (note that it is not 61%/4, the reasons for which are explained in the following reference [129]). In Table 8.5, we demonstrate how these factors affect test performance for NMIBC patients undergoing surveillance. We would argue that in all cases, any of the urine tests being positive would indicate a clear need for cystoscopy because even the worst performing test done with the most frequency would have a 24% probability of bladder cancer if positive. The more important question is whether a negative urine test would cause a clinician to forgo cystoscopy. In some of the scenarios below, a negative urine test is associated with a <5% risk of bladder cancer, which for some physicians and patients would be low enough to avoid cystoscopy. In other scenarios (very high risk), the risk with a negative test is still ~10% or so, probably more risk than most patients and physicians would accept to avoid cystoscopy.
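One simple way to see why the per-interval probability is not 61%/4 is to assume a constant recurrence hazard over the year, so that the probability of remaining recurrence-free compounds across the four 3-month intervals. This constant-hazard assumption is introduced here for illustration and is not necessarily the method used in [129], although it does give a value close to the 21% quoted above.

```python
def interval_probability(cumulative_incidence, intervals):
    """Per-interval event probability for `intervals` equal periods, assuming a constant hazard."""
    return 1 - (1 - cumulative_incidence) ** (1 / intervals)


# Very high risk cohort: 61% 1-year cumulative incidence, cystoscopy every 3 months.
print(f"{interval_probability(0.61, 4):.1%}")
# Low-risk cohort: 15% 1-year cumulative incidence, annual cystoscopy.
print(f"{interval_probability(0.15, 1):.1%}")
```

Dividing 61% by 4 would give about 15%, which understates the probability of finding a tumor at the first 3-month cystoscopy under this assumption.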

Table 8.5 Probability of having bladder cancer before and after a urine-based test done for bladder cancer surveillance

Conclusions

Urine tests are widely available for bladder cancer, but their value in routine clinical practice is unclear. Careful consideration of how these tests affect clinical decision making is required in order to understand their use.