Introduction

Bladder cancer (BCa) is the ninth most common malignancy worldwide, with more than 80,000 cases diagnosed and 17,000 deaths in the USA each year [1, 2]. In Europe, approximately 118,000 cases and 52,000 deaths were estimated in 2012 [3]. The majority of BCa are histologically typed as urothelial carcinoma and stratified into low- and high-grade cancers. High-grade lesions may be non-muscle-invasive (NMIBC) or muscle-invasive (MIBC) cancers [4], with distinct treatments and clinical outcomes [5, 6].

The local staging of BCa traditionally relies on transurethral resection of bladder tumor (TURBT), with cross-sectional imaging (such as CT and MRI of the abdomen, pelvis, and thorax) used to assess nodal disease and distant metastases.

TURBT carries a significant risk of understaging [7, 8] and of bladder perforation, can be a morbid and uncomfortable procedure, and delays radical treatment in those who need it. Therefore, more accurate, faster, and non-invasive staging techniques are needed to improve outcomes in BCa. Given its superior contrast resolution and the addition of functional sequences, MRI has proved to be the best imaging modality for local staging [9]. Several studies have tested its ability to differentiate NMIBC from MIBC, and two recent meta-analyses have shown pooled sensitivities of 87% and 92% and specificities of 79% and 87%, respectively [10, 11]. As such, one could imagine replacing TURBT with mpMRI in suitable patients, with the potential for faster radical treatment when necessary and non-invasive diagnosis and staging in those with poor fitness. Indeed, an ongoing trial is currently testing this hypothesis (ISRCTN Reference 35296862).

Despite the growing use of MRI in local staging of BCa, there is a lack of standardization in terms of protocol, reporting, and aims of imaging. A panel of experts has recently issued a reporting system aimed at standardizing multiparametric MRI (mpMRI) of the bladder for research and clinical application, the Vesical Imaging-Reporting and Data System (VI-RADS), in accordance with previously successful attempts at standardizing mpMRI protocols [12, 13]. In particular, the purpose of the proposed 5-point scoring system is to define the risk of muscle invasion in untreated patients with BCa. To date, however, the recently developed score has not been validated in clinical practice.

Here we evaluate both the accuracy and inter-observer variability with the use of VI-RADS for discrimination between NMIBC and MIBC, in patients undergoing mpMRI before TURBT.

Materials and methods

Patient population

This retrospective study received Institutional Review Board approval prior to starting, and the need for informed consent was waived.

Between September 2017 and July 2018, patients referred to our institution for suspected BCa were offered an institutional protocol which entailed mpMRI prior to TURBT.

Imaging protocol

All exams were performed using a 3 T magnet (Discovery MR750W, GE Healthcare), equipped with a 32-channel phased-array coil. The imaging protocol of the pelvis, focused on the bladder, was VI-RADS-compliant and included 2D fast spin-echo (FSE) or PROPELLER T2-weighted sequences in axial, coronal, and sagittal planes, axial diffusion-weighted imaging (DWI), and axial dynamic contrast-enhanced imaging (DCE) after a single injection of Gadovist administered at a dose of 0.1 mmol/kg at a rate of 3 ml/s. The imaging protocol is summarized in Table 1. Patients were administered an intramuscular antispasmodic agent and instructed to drink 500–1000 ml of water in the 30 min before the examination to obtain adequate distension of the bladder.

Table 1 Multiparametric MRI protocol at 3 T. Source: [12]

TURBT and histopathologic evaluation

All patients underwent bipolar TURBT within 6 weeks after mpMRI at the Department of Gynecological-Obstetric and Urological Sciences, “Sapienza” Rome University, and at the Department of Urology, Regina Elena National Cancer Institute, Rome, Italy. All the procedures were carried out by the same two experienced surgeons. According to the European Association of Urology (EAU) Guidelines [6], high-risk NMIBC was defined as any high-grade transitional cell carcinoma (TCC) (primary or recurrent), any multiple, recurrent and large (> 3 cm) low-grade TCC, and any primary carcinoma in situ (Cis). Four weeks after the first TURBT, all high-risk patients underwent re-TURBT with resection of the previous primary tumor site to eliminate any suspicious residual areas and to confirm the stage of the disease. All tumor samples were analyzed by two uropathologists with more than 15 years of experience in BCa.

MR image interpretation

All mpMRI exams were reviewed by two radiologists with a special interest in urogenital imaging, with more than 10 years of experience (reader 1) and 5 years of experience (reader 2). Both radiologists had read more than 50 mpMRI exams of the bladder in the years prior to the beginning of the study, starting from 2014. Both readers were blinded to clinical history and histopathology results and were asked to score each lesion, up to 3 per patient, according to VI-RADS, which is summarized in Fig. 1. When patients presented with more than one lesion, only the lesion with the highest VI-RADS score was considered. A schematic representation of each VI-RADS category is depicted in supplementary fig. 1. In brief, T2WI is the first modality to be assessed, looking for a continuous low signal intensity (SI) line in the bladder wall that represents an intact muscularis propria. On DWI (high b-value images), the tumor is hyperintense, with a corresponding hypointensity on ADC maps, while the tumor stalk and inner layer have low SI. On DCE images, the tumor and inner layer enhance early and can enhance to the same degree, while the muscularis propria should maintain a low SI in the early phase. Case examples of VI-RADS scoring are shown in Figs. 2 and 3. An example of a false-negative MRI exam is depicted in Fig. 4. An example of a false-positive MRI exam is shown in supplementary fig. 2.

Fig. 1

Schematic representation of VI-RADS scoring. Source: [12]. SI: signal intensity, CE: dynamic contrast-enhanced imaging, DWI: diffusion-weighted imaging

Fig. 2

Sixty-seven-year-old man presenting with hematuria and a bladder mass discovered at ultrasound. a T2W imaging shows an exophytic lesion on the posterior bladder wall, > 1 cm in greatest dimension, with a low SI stalk and preservation of the low SI representing the muscularis propria. VI-RADS score for T2W imaging is 2. b and c DWI (b = 2000) and ADC map, respectively, show an exophytic lesion with restricted diffusion, with low SI stalk on DWI and muscularis propria with continuous intermediate signal on DWI. VI-RADS score for DWI is 2. d DCE imaging shows early enhancement of the lesion and inner layer, without early enhancement of the muscularis propria. DCE was assigned a VI-RADS category 2. Overall VI-RADS score was 2. Histopathology after TURBT and RE-TURBT confirmed a T1 tumor. T2W T2-weighted, SI signal intensity, DWI diffusion-weighted imaging, ADC apparent diffusion coefficient

Fig. 3

Seventy-eight-year-old man with hematuria. a T2W imaging shows a lesion > 1 cm in the right lateral bladder wall, with intermediate SI that extends through the muscularis propria. T2W imaging was assigned a VI-RADS category 4. b and c DWI (b = 2000) and ADC maps show a lesion with significant restricted diffusion, extending through the muscularis propria. VI-RADS score of DWI was 4. d DCE imaging shows early and heterogeneous enhancement of the lesion, which extends through the muscularis propria. DCE was assigned a VI-RADS category 4. Overall VI-RADS score was 4. Pathologic stage after cystectomy was pT2bN0Mx. T2W T2-weighted, SI signal intensity, DWI diffusion-weighted imaging, ADC apparent diffusion coefficient

Fig. 4

Seventy-year-old man with a history of T1 BCa treated with TURBT. a T2W imaging shows a broad-based tumor > 1 cm, localized on the posterior wall of the bladder, with thickened inner layer and uninterrupted low SI of the muscularis propria. VI-RADS score of T2WI was 2. b, c DWI (b = 2000) and ADC map, respectively, show a lesion with restricted diffusion, without a stalk, and a muscularis propria with continuous intermediate signal, particularly evident on ADC map. VI-RADS score of DWI was 2. d DCE imaging shows early enhancement of the lesion and the inner layer, without extension through the muscularis propria. VI-RADS score of DCE was 2. Overall VI-RADS score was 2. Histopathology after TURBT and RE-TURBT showed a T2 tumor; therefore, the case was recorded as false-negative. T2W T2-weighted, SI signal intensity, DWI diffusion-weighted imaging, ADC apparent diffusion coefficient

Statistical analysis

Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for both readers, using a 2 × 2 contingency table, with each VI-RADS category used as cutoff. MIBC (stage pT2 or higher) correctly classified by mpMRI was considered a true positive. The performance of mpMRI with combined T2W, DWI, and DCE and the use of VI-RADS was assessed by means of receiver operating characteristic (ROC) curve analysis, for both readers. The κ statistic was used to estimate inter-reader agreement. The bootstrap resampling procedure was used to calculate the standard error of the κ estimates, as previously described [14]. All statistical analyses were performed using SPSS version 23.0 (IBM Corp.). All tests were two-sided and statistical significance was set at p < 0.05.
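The 2 × 2 contingency calculation described above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study; the function name, the data class, and the ten toy lesions are hypothetical and serve only to show how a VI-RADS cutoff turns scores into a 2 × 2 table.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    accuracy: float


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def metrics_at_cutoff(scores, is_mibc, cutoff):
    """Call a lesion positive when its VI-RADS score is > cutoff,
    then derive the usual measures from the resulting 2 x 2 table."""
    tp = fp = tn = fn = 0
    for score, invasive in zip(scores, is_mibc):
        positive = score > cutoff
        if positive and invasive:
            tp += 1      # MIBC correctly called invasive
        elif positive:
            fp += 1      # NMIBC called invasive
        elif invasive:
            fn += 1      # MIBC missed
        else:
            tn += 1      # NMIBC correctly called non-invasive
    return DiagnosticMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        accuracy=_ratio(tp + tn, tp + fp + tn + fn),
    )


# Hypothetical toy data (NOT study data): one score and one pathology label per patient.
toy_scores = [1, 2, 2, 3, 4, 5, 3, 2, 5, 4]
toy_is_mibc = [False, False, False, False, True, True, True, False, True, False]

for cutoff in (2, 3):
    print(f"VI-RADS > {cutoff}:", metrics_at_cutoff(toy_scores, toy_is_mibc, cutoff))
```

Running the function once per cutoff mirrors the per-cutoff reporting used for both readers.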

Results

Patient and lesion characteristics

In total, 94 patients were eligible for inclusion, of whom 8 were excluded because of a cardiac pacemaker, 5 for claustrophobia, and 3 for renal failure. A total of 78 patients were enrolled in the study and underwent mpMRI (median 2, range 1–6 weeks) prior to TURBT. Of these, 2 were excluded because of incomplete mpMRI imaging and one because the tumor identified at imaging was recurrent ureteric cancer. The final study population included 75 patients (including 13 females), with a median age of 69 years (Table 2). A total of 53 patients were diagnosed with NMIBC and 22 patients with MIBC. Eighteen patients underwent radical cystectomy. Table 3 summarizes the lesion characteristics, including stage, size, and location.

Table 2 Patient demographics
Table 3 Lesion characteristics

VI-RADS performance, ROC curve analysis, and inter-reader agreement

The proportion of patients assigned to each VI-RADS category is summarized in Table 4. The sensitivity of VI-RADS for readers 1 and 2 was 91% (95% CI 71–99) and 82% (95% CI 60–95), respectively, when the cutoff VI-RADS > 2 was used to define MIBC. The specificity for the same cutoff was 89% (95% CI 77–96) and 85% (95% CI 72–93) for readers 1 and 2, respectively. PPV and NPV at the same cutoff were 77% (95% CI 56–91) and 96% (95% CI 86–99) for reader 1 and 69% (95% CI 48–85) and 91% (95% CI 81–98) for reader 2. If the cutoff to define MIBC was moved to VI-RADS > 3, sensitivity and specificity were 82% (95% CI 60–95) and 94% (95% CI 84–99) for reader 1; for reader 2, they were 77% (95% CI 55–92) and 89% (95% CI 77–96). PPV and NPV at the same cutoff were 86% (95% CI 63–97) and 93% (95% CI 82–98) for reader 1; for reader 2, they were 74% (95% CI 51–90) and 91% (95% CI 79–97). Accuracy was 91% for reader 1 at both cutoffs; for reader 2, it was 84% for the first cutoff and 85% for the second. Table 5 summarizes sensitivity and specificity for both readers at each cutoff.
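The text does not state how the 95% confidence intervals were computed. As an illustration only, the sketch below assumes an exact (Clopper-Pearson) binomial interval and applies it to 20 of 22 MIBC detected, a count back-calculated here from the reported 91% sensitivity and the 22 MIBC cases rather than taken from the tables. The helper names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(func, increasing, iterations=100):
    """Bisection on [0, 1] for the root of a monotone function."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (func(mid) < 0) == increasing:
            lo = mid   # root lies above mid
        else:
            hi = mid   # root lies below mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion."""
    if successes == 0:
        lower = 0.0
    else:
        # Lower bound: p at which P(X >= successes) equals alpha / 2.
        lower = _solve(
            lambda p: 1 - binom_cdf(successes - 1, n, p) - alpha / 2,
            increasing=True,
        )
    if successes == n:
        upper = 1.0
    else:
        # Upper bound: p at which P(X <= successes) equals alpha / 2.
        upper = _solve(
            lambda p: binom_cdf(successes, n, p) - alpha / 2,
            increasing=False,
        )
    return lower, upper


# Assumed count: 20 true positives among 22 MIBC (reader 1, VI-RADS > 2).
low, high = clopper_pearson(20, 22)
print(f"sensitivity 20/22 = {20 / 22:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

Under these assumptions the interval should land close to the reported 71–99 for reader 1, which suggests, but does not confirm, that an exact method was used.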

Table 4 Proportion of muscle-invasive bladder cancer in each VI-RADS category, for both readers
Table 5 Sensitivity and specificity of both readers for each cutoff. In brackets, 95% confidence interval

ROC curve analysis showed that the optimal criterion, identified with Youden’s index, was VI-RADS > 2 for both readers. The area under the curve (AUC) was similar for both readers: 0.926 (95% CI 0.842 to 0.974) for reader 1 and 0.873 (95% CI 0.776 to 0.938) for reader 2 (p = 0.21). ROC curve analysis is summarized in Fig. 5.
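Youden’s index is J = sensitivity + specificity − 1, and the optimal criterion is the cutoff that maximizes J. The short sketch below applies it to the sensitivities and specificities reported in the text for the two cutoffs discussed there; the remaining cutoffs in Table 5 are not reproduced, and the function and variable names are illustrative.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic for one operating point."""
    return sensitivity + specificity - 1.0


def best_cutoff(by_cutoff):
    """Return the cutoff label with the largest Youden's J."""
    return max(by_cutoff, key=lambda label: youden_index(*by_cutoff[label]))


# (sensitivity, specificity) as reported in the text, rounded to two decimals.
reported = {
    "reader 1": {"VI-RADS > 2": (0.91, 0.89), "VI-RADS > 3": (0.82, 0.94)},
    "reader 2": {"VI-RADS > 2": (0.82, 0.85), "VI-RADS > 3": (0.77, 0.89)},
}

for reader, by_cutoff in reported.items():
    for label, (sens, spec) in by_cutoff.items():
        print(reader, label, f"J = {youden_index(sens, spec):.2f}")
    print(reader, "optimal:", best_cutoff(by_cutoff))
```

For both readers the larger J falls at VI-RADS > 2, consistent with the optimal criterion reported above.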

Fig. 5

Comparison of receiver operating characteristics curve analysis for both readers. AUC area under curve

Inter-reader agreement evaluated by the κ statistic was good for the overall VI-RADS score (κ = 0.731, SE 0.072). As for the single mpMRI sequences, T2WI showed excellent agreement (κ = 0.804, SE 0.065), while DCE and DWI showed moderate (κ = 0.554, SE 0.074) and good agreement (κ = 0.714, SE 0.058), respectively. Assessment of inter-reader agreement is summarized in supplementary table 1.
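To make the agreement statistic concrete, the sketch below computes an unweighted Cohen's kappa and a bootstrap standard error by resampling patients with replacement. It is an illustration under stated assumptions: the text does not say whether a weighted kappa was used, the exact bootstrap procedure of [14] is not reproduced, and the ratings shown are hypothetical toy values rather than study data.

```python
import random
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases.
    Returns None when chance agreement is 1 (kappa undefined)."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return None
    return (observed - expected) / (1 - expected)


def bootstrap_se(ratings_a, ratings_b, n_resamples=2000, seed=0):
    """Standard deviation of kappa over bootstrap resamples of the cases."""
    rng = random.Random(seed)
    n = len(ratings_a)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        kappa = cohen_kappa([ratings_a[i] for i in idx],
                            [ratings_b[i] for i in idx])
        if kappa is not None:          # skip degenerate resamples
            estimates.append(kappa)
    mean = sum(estimates) / len(estimates)
    variance = sum((k - mean) ** 2 for k in estimates) / (len(estimates) - 1)
    return variance ** 0.5


# Hypothetical VI-RADS scores from two readers (NOT study data).
reader_1 = [1, 2, 2, 3, 4, 5, 2, 3, 4, 5, 1, 2, 3, 2, 4]
reader_2 = [1, 2, 3, 3, 4, 5, 2, 2, 4, 4, 1, 2, 3, 2, 5]

print("kappa:", round(cohen_kappa(reader_1, reader_2), 3))
print("bootstrap SE:", round(bootstrap_se(reader_1, reader_2), 3))
```

Resampling whole patients keeps each pair of scores together, so the variability between cases is carried into the standard error.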

Discussion

In this study, we retrospectively validated the accuracy of VI-RADS in discriminating between NMIBC and MIBC in our cohort of patients with BCa. Firstly, we found a low proportion (5–10%) of false-negative exams, i.e., a small number of MIBC in VI-RADS categories 1 and 2. As for the overall performance of VI-RADS in differentiating NMIBC from MIBC, we found an overall accuracy between 85% and 91%.

In the past decade, several authors have demonstrated the ability of MRI to distinguish NMIBC from MIBC [15, 16]. The introduction of functional in addition to morphological sequences has proved to be crucial in such distinction; particularly, DWI has been shown to be able to accurately T stage BCa, thanks to the description of the “tumor stalk” semiotics by Takeuchi et al [17]. The utility of the stalk in describing T stage was more recently confirmed by Wang et al [18]; our results are in accordance with those authors, as we found that 81% of NMIBC presented with a stalk. Given the growing interest in evaluating mpMRI performance in BCa local staging, three meta-analyses have recently addressed the issue, finding a pooled sensitivity of 0.87 (95% CI 0.82–0.91), 0.92 (95% CI 0.88–0.95), and 0.90 (95% CI 0.83–0.94) and specificity of 0.79 (95% CI 0.72–0.85), 0.87 (95% CI 0.78–0.93), and 0.88 (95% CI 0.77–0.94), respectively [10, 11, 19].

In our study, sensitivity and specificity, though higher for the more experienced reader 1, were in accordance with the recent meta-analyses. Reader 1 obtained a sensitivity and specificity of 91% and 89%; reader 2 had a sensitivity of 82% and a specificity of 85%. If the cutoff VI-RADS > 3 was used to define MIBC, the improvement in specificity was modest for both readers (89 to 94% and 85 to 89%, respectively) and came with a larger decrease in sensitivity (91 to 82% and 82 to 77%, respectively). Given the aggressiveness of MIBC, we believe that it is of utmost importance to avoid false negatives, and therefore, in our practice, we tend to consider a VI-RADS category 3 tumor as MIBC until proven otherwise. ROC curve analysis confirmed this finding, setting the optimal criterion at VI-RADS > 2. Inter-reader agreement analyzed by Cohen’s κ was excellent for T2W imaging (κ = 0.804), good for the overall score and DWI (κ = 0.731 and 0.714, respectively), and only moderate for DCE imaging (κ = 0.554). We hypothesize that this discrepancy could be attributed to the less widespread use of DCE in clinical routine, and therefore inter-reader agreement for DCE is likely to improve as mpMRI of the bladder becomes more widespread.

The current approach to diagnosis and local staging of BCa has several well-known limitations. To encourage further diffusion of mpMRI into research and clinical practice, a panel of experts recently proposed a 5-point scoring system aimed at standardizing its use [12]. The VI-RADS scoring system, however, is still in its infancy, and therefore it needs validation against the available staging techniques.

Even if cystoscopy and TURBT are still considered the gold standard, a substantial proportion of patients are understaged at TURBT [20]. Moreover, TURBT frequently needs to be repeated [6], but adherence to guidelines varies between urologists [21]. Imaging modalities such as CT or MRI are currently recommended by several guidelines for local and distant staging, with the caveat that neither technique can be accurate in differentiating T2 from higher-stage tumors [5]. If a non-invasive diagnostic modality such as mpMRI were considered reliable in identifying MIBC, the attending urologist could perform a “diagnostic” TURBT and expedite radical treatment without waiting for patients to undergo a TURBT and a RE-TURBT. On the other hand, if an NMIBC could be identified preoperatively, the surgeon could be more confident in performing a “curative” TURBT.

This study has several limitations. Firstly, its retrospective design carries a strong risk of selection bias, even though the prevalence of MIBC in our cohort broadly reflects the epidemiology of BCa [5]. In addition, it is a single-center study in which all mpMRI exams were performed on the same high-performance 3 T magnet, which limits the applicability of our results to clinical routine. Nonetheless, BCa is often treated at tertiary care centers, and therefore the availability of good equipment and expertise should not be of concern. The need for multicentric validation studies remains strong, because radiologists working in the same reference center may have very similar backgrounds, which is reflected in the way they interpret mpMRI exams. The lack of radical cystectomy as a gold standard for all patients is another important limitation; however, for patients without a diagnosis of MIBC, TURBT and RE-TURBT remain the best standard available. Furthermore, we did not test the performance of single sequences, and therefore we could not extrapolate any information about a possible “dominant sequence,” as was previously done for other scoring systems such as PI-RADS [13]. In addition, we did not assess the accuracy of VI-RADS by tumor location: some “special” locations, such as the dome and the trigone, may pose specific problems to the reader and may need a more focused assessment, which is difficult to standardize.

Another important limitation of the present study is the restricted patient population offered mpMRI. Even though the distinction between MIBC and NMIBC is imperative to further direct patient management, the clinician may ask different questions regarding BCa. In particular, van der Pol et al showed that mpMRI can reliably stage BCa even after TURBT [22]; however, their patient population consisted only of patients with MIBC who underwent radical cystectomy. In the setting of neoadjuvant therapy offered to patients with MIBC, the role of mpMRI is also uninvestigated; further studies will clarify its role in this context, as suggested by Necchi et al [23]. Finally, the accuracy of mpMRI in nodal staging was not assessed in the present study, limiting its applicability as a comprehensive staging tool. However, a recent meta-analysis showed that MRI and PET/CT have a comparable performance in nodal staging [24].

In conclusion, mpMRI with the use of VI-RADS is accurate in differentiating MIBC from NMIBC and has good sensitivity and specificity. Inter-reader agreement is good overall. Our results support the use of mpMRI in patients with BCa before TURBT, as it could give the urologist more confidence in performing a curative TURBT or prompt more caution in interpreting TURBT results. However, prospective, multicentric studies are needed to validate VI-RADS.