Introduction

Bladder cancer is the tenth most common malignancy in the world [1] and the fourth most common in men in the United States [2]. Urothelial carcinoma is the most common pathologic subtype of bladder cancer [3]. Muscle-invasive bladder cancer (MIBC) has a five-year survival rate of 50% [4] whereas non-muscle-invasive bladder cancer (NMIBC) has a rate of 90% [5]. Determining muscle invasion is therefore important; NMIBC is treated with transurethral resection of bladder tumor (TURBT) with or without intravesical chemo- or immunotherapy whereas MIBC is usually treated with cystectomy with or without neoadjuvant chemotherapy, followed by adjuvant chemotherapy when needed [6]. Bladder cancer is usually diagnosed by examining the pathology specimens obtained at TURBT, but the procedure may underestimate muscle invasion in up to half of patients [7, 8].

The Vesical Imaging-Reporting and Data System (VI-RADS) was proposed in 2018 to estimate the probability of MIBC based on a combination of features derived from T2-weighted imaging, diffusion-weighted imaging (DWI), and dynamic contrast material-enhanced (DCE) images [9]. The system uses a five-point Likert scale to express the probability of muscle invasion and has been found to be useful for managing urothelial cancers; VI-RADS has been shown to have high sensitivity and specificity for predicting muscle invasion before TURBT and to have good inter-observer agreement [10,11,12,13,14,15,16,17,18,19,20].

Relatively limited data exist regarding use of MRI for evaluation of bladder cancer after TURBT. Although multiparametric MRI has been shown to have an 88–92% sensitivity and 74–84% specificity for muscle invasion after TURBT [21], the applicability of VI-RADS in this setting has not been fully evaluated [22]. Therefore, the purpose of this study was to assess performance and inter-reader agreement of VI-RADS for detecting MIBC following TURBT.

Methods

Patient population

This institutional review board-approved, Health Insurance Portability and Accountability Act-compliant retrospective study was performed at a large academic center, and informed consent was waived. A list of patients who had a cystectomy between January 1, 2016, and October 1, 2020, was obtained from an electronic urology database. The inclusion criteria were urothelial carcinoma of the bladder, MRI performed after TURBT, and cystectomy within three months of the MRI with no treatment between the MRI and the cystectomy (Fig. 1). Patients who received treatments other than TURBT (e.g., Bacillus Calmette-Guerin (BCG) and neoadjuvant chemotherapy) before the MRI were included because these treatments are part of established clinical practice and their inclusion did not alter the purpose of the study. Pathological stage of bladder cancer at cystectomy recorded in electronic medical records (Epic Systems, Verona, WI) was used as the reference standard.

Fig. 1

Flow chart showing cohort development of 70 patients with urothelial carcinoma of the bladder examined with MRI following TURBT who were then treated with cystectomy within 3 months of the MRI without intervening treatments. TURBT: transurethral resection of bladder tumor, CE: contrast-enhanced, DW: diffusion-weighted

MRI technique

Pelvic MRI examinations were performed on nine scanners. All studies included T2-weighted imaging in axial, coronal, and sagittal planes; diffusion-weighted (DW) imaging in the axial plane; and dynamic contrast-enhanced (CE) T1-weighted imaging with fat saturation in the axial plane before and 30, 60, and 90 s after intravenous administration of 0.1 mmol/kg Gadovist (Bayer Inc, Mississauga, Ontario, Canada; 1 mmol/mL, maximum dose of 10 mL). Patients were not given specific preparation instructions for bladder distension and were not administered antispasmodic agents.

Review of images

MRI examinations were reviewed by two fellowship-trained abdominal radiologists (Faculty 1, JLC, with 27 years of experience and Faculty 2, LAR, with 2 years of experience) and an abdominal radiology fellow (XL). The readers together assigned VI-RADS scores based on T2-weighted (referred to as ‘structural category’ (SC)), DW, and CE sequences [9] to a training cohort of 15 pelvic MRI exams that were not included in the final study cohort and then reviewed the corresponding pathology. For the study cohort, the readers were blinded to pathology results of both the TURBT procedure and cystectomy and independently reviewed MR images on a commercial picture archiving and communication system (Visage 7 Imaging Platform, Visage Imaging, San Diego, CA). Each reader assigned a VI-RADS score to the largest mass (if multiple) for each patient to ensure the same mass was evaluated by each reader.

Reference standard

Each reader documented location and size of the assessed mass. Readers’ VI-RADS scores were compared to pathology reports from cystectomy specimens obtained from the electronic medical record (Epic Health Research Network, Verona, WI) (initials withheld for blinded review).

Statistical analysis

For each reader, the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy of VI-RADS for detecting MIBC were calculated using two thresholds: VI-RADS ≥ 3 (score of 3, 4, or 5) and VI-RADS ≥ 4 (score of 4 or 5). An optimal VI-RADS score threshold for detecting MIBC was determined by generating a receiver-operating characteristic (ROC) curve for each reader’s scores and by maximizing the sum of sensitivity and specificity. We also calculated the area under the curve (AUC) for each reader’s scores. Inter-reader agreement was assessed for SC, DW, and CE sequences and for each VI-RADS score using Gwet’s AC [23,24,25] and percentage agreement, both in quadratic-weighted form to reflect the similarities of grading. Inter-reader agreement for the presence of MIBC was also evaluated with VI-RADS scores ≥ 3 and ≥ 4 using Gwet’s AC and percentage agreement. Though Gwet’s AC is less well-known than Cohen’s or Fleiss’ kappa, it is less prone to paradoxical results and can be interpreted similarly [24,25,26]. Test characteristics and inter-reader agreement were also compared between patients who received treatment other than TURBT prior to the MRI and those who did not. All statistical analyses were performed using R 4.2.0 (r-project.org).

Results

Patient cohort

Of 462 patients who underwent cystectomy, 180 patients with a pathological diagnosis of urothelial carcinoma of the bladder underwent TURBT and had MRI prior to cystectomy. Of these, 84 were excluded because the MRI was performed prior to the TURBT procedure (n = 41), the cystectomy was performed more than three months after MRI (n = 38), or patients received one or more additional treatments between MRI and cystectomy (n = 5). Finally, of the 96 remaining patients, 26 were excluded because the MRI was incomplete or of insufficient image quality. The final study cohort consisted of 70 patients (Fig. 1). Of these, 55/70 (79%) were examined on 3 T scanners and 15/70 (21%) on 1.5 T scanners. The mean age of patients was 68 years ± 11 years [SD] with a range of 39–85 years; 58 patients were men and 12 were women.

Of 70 patients, 31 (44%) received treatment other than TURBT before the MRI. Of these, 14 patients were treated with systemic chemotherapy (cisplatin/gemcitabine, combination therapy with methotrexate, vinblastine, adriamycin and cisplatin [MVAC], or both), 12 with intravesical BCG, 3 with both BCG and systemic chemotherapy, and 2 with radiation therapy and systemic chemotherapy. The median duration between the TURBT and MRI was 66 days (interquartile range, IQR, 33–110 days) and the median duration between the MRI and cystectomy was 25 days (IQR 13–39 days).

Final pathology at cystectomy

At cystectomy, of 70 patients, 32 (46%) had MIBC; these included 8 with muscle invasion only, 16 with invasion of the perivesical fat, and 8 with invasion of adjacent organs. Of 70 patients, 38 (54%) did not have MIBC; of these, 14 had no residual tumor. On pathology, evidence of inflammatory or post-therapy changes such as “fibrosis,” “giant cell reaction,” “post-therapy effect,” “therapy-related changes,” or “scar” was described in 9/32 (28%) patients with MIBC and 31/38 (82%) patients without MIBC.

VI-RADS assessment and inter-reader agreement

The distribution of VI-RADS scores for each reader is summarized in Table 1. The AUC for each of the three readers was 0.65, 0.71, and 0.74, respectively, with no statistically significant pairwise differences between them (p-values = 0.06, 0.08, 0.97) (Fig. 2). For VI-RADS score of ≥ 3, sensitivity for detection of MIBC for each of the three readers ranged from 81.3 to 93.8%, specificity 36.8–55.3%, PPV 55.6–60.5%, NPV 77.3–87.5%, and accuracy 62.9–67.1%. For VI-RADS score of ≥ 4, sensitivity ranged from 78.1 to 81.3%, specificity 47.4–68.4%, PPV 55.6–67.6%, NPV 72.0–76.9%, and accuracy 61.4–72.9%, with no significant difference when the three readers’ scores were compared to each other (Table 2, Fig. 3). For VI-RADS scores ≥ 3 and ≥ 4, there was no significant difference in detection of MIBC in patients who received treatments other than TURBT prior to MRI compared to those who did not (p ≥ 0.33, Table 3).

Table 1 Distribution of Vesical Imaging-Reporting and Data System scores among three readers (Faculty 1, Faculty 2, and Fellow) for cohort of 70 patients with bladder urothelial carcinoma following transurethral resection

Fig. 2

Receiver-operating characteristic curve depicts Vesical Imaging—Reporting and Data System (VI-RADS) scores of each reader for predicting muscle invasion of bladder urothelial carcinoma after transurethral resection. Area under curve (AUC) for consensus, Faculty 1, Faculty 2, and Fellow VI-RADS scores with values ranging from 0 (no tumor) to 5 (high likelihood of muscle-invasive tumor extending into perivesical fat)

Table 2 Test characteristics of applying Vesical Imaging–Reporting and Data System (VI-RADS) in the detection of muscle-invasive bladder cancer using scores ≥ 3 and ≥ 4 for all cystectomy patients (n = 70) who underwent MRI following transurethral resection

Fig. 3

64-year-old male who had muscle-invasive bladder cancer (MIBC) at transurethral resection and underwent neoadjuvant chemotherapy. MRI performed after treatment yielded a VI-RADS score of 4 by Faculty 1 and 2 and a VI-RADS score of 5 by Fellow. a Axial T2-weighted image shows intermediate signal within muscular layer of posterior bladder wall (arrow); b axial contrast-enhanced image demonstrates enhancement within muscular layer (arrow); c axial diffusion-weighted image and d apparent diffusion coefficient map reveal restricted diffusion within muscular layer (arrow). Cystectomy confirmed MIBC, T3 stage

Table 3 Test characteristics for detection of tumor and detection of muscle-invasive bladder cancer with VI-RADS scores ≥ 3 and ≥ 4 for patients who did not (n = 39) and did receive (n = 31) treatment other than TURBT prior to MRI

Inter-reader agreement using Gwet’s AC was 0.67 [95% confidence interval (CI) 0.53, 0.83] for SC, 0.65 [95% CI 0.50, 0.80] for DW, 0.67 [95% CI 0.52, 0.82] for CE, and 0.69 [95% CI 0.55, 0.83] for VI-RADS score, and did not differ significantly among the readers for any category (p = 0.36–0.92). Percent agreement was 88% [95% CI 83, 92] for SC, 88% [95% CI 84, 93] for DW, 88% [95% CI 84, 93] for CE, and 89% [95% CI 84, 93] for VI-RADS score, and did not differ significantly among the readers for any category (p ≥ 0.19). Inter-reader agreement for detection of MIBC using Gwet’s AC was 0.64 [95% CI 0.49, 0.78] with VI-RADS score ≥ 3 and 0.54 [95% CI 0.38, 0.70] with VI-RADS score ≥ 4. Percent agreement was 79% [95% CI 72, 87] for MIBC using a VI-RADS score ≥ 3 and 76% [95% CI 69, 84] using a VI-RADS score ≥ 4 (Table 4). For VI-RADS scores ≥ 3 and ≥ 4, there was no significant difference in Gwet’s AC and percent agreement in patients who received treatments other than TURBT prior to MRI compared to those who did not (p ≥ 0.37).
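For readers unfamiliar with Gwet’s AC, a minimal sketch of the unweighted coefficient (AC1) and the matching percent agreement for several readers is given here. It is an illustration only: it does not reproduce the quadratic-weighted form used for the ordinal scores, it assumes every reader scored every patient, the function names are hypothetical, and the ratings are invented toy data rather than study data.

```python
from collections import Counter


def dichotomize(score_table, threshold):
    """Convert each reader's VI-RADS score into a 0/1 call (score >= threshold)."""
    return [[int(score >= threshold) for score in row] for row in score_table]


def gwet_ac1(ratings, categories):
    """Unweighted Gwet's AC1 for a complete patients x readers table.

    Returns (percent_agreement, ac1), where percent_agreement is the mean
    proportion of agreeing reader pairs per patient.
    """
    n = len(ratings)
    q = len(categories)
    observed = 0.0
    prevalence = {c: 0.0 for c in categories}
    for row in ratings:
        r = len(row)
        counts = Counter(row)
        # Proportion of reader pairs that agree on this patient.
        observed += sum(k * (k - 1) for k in counts.values()) / (r * (r - 1))
        for c in categories:
            prevalence[c] += counts.get(c, 0) / r
    observed /= n
    # Chance agreement as defined for AC1: sum of pi * (1 - pi) over (q - 1).
    chance = sum(
        (prevalence[c] / n) * (1 - prevalence[c] / n) for c in categories
    ) / (q - 1)
    return observed, (observed - chance) / (1 - chance)


# Hypothetical toy data: four patients, three readers, VI-RADS scores.
toy_scores = [
    [4, 3, 5],
    [3, 4, 2],
    [1, 2, 2],
    [3, 2, 1],
]
toy_calls = dichotomize(toy_scores, threshold=3)
agreement, ac1 = gwet_ac1(toy_calls, categories=(0, 1))
print(f"percent agreement = {agreement:.3f}, AC1 = {ac1:.3f}")
```

Because the chance-agreement term of AC1 stays small when one category dominates, the coefficient tends to remain closer to the observed agreement than kappa does when prevalence is skewed.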

Table 4 Inter-reader agreement using Gwet’s AC and % agreement for A) T2-weighted or structural category (SC), diffusion-weighted (DW), and contrast-enhanced (CE) MRI categories and Vesical Imaging-Reporting and Data System (VI-RADS) scoring for all 70 patients and for those who did not (n = 39) and did (n = 31) receive treatment other than TURBT prior to MRI. B) Inter-reader agreement using Gwet’s AC and % agreement for VI-RADS ≥ 3 and VI-RADS ≥ 4 for all 70 patients and for those who did not (n = 39) and did (n = 31) receive treatment other than TURBT prior to MRI

Although Faculty 2 and Fellow had slightly higher accuracy for VI-RADS score ≥ 4 vs ≥ 3 (72.9% versus 67.1%, p = 0.58, and 65.7% versus 62.9%, p = 0.86, respectively) and Faculty 1 had higher accuracy for score of ≥ 3 (62.9% versus 61.4%, p = 1.0), these differences were not statistically significant. Further, there was no statistical difference between sensitivity, specificity, PPV, and NPV of each reader’s VI-RADS ≥ 3 and ≥ 4 scores (Faculty 1 p ≥ 0.15, Faculty 2 p ≥ 0.34, Fellow p ≥ 0.65).

Correlation of VI-RADS scores and pathology

The majority of false positive (FP) assessments (19/24 for Faculty 1, 13/17 for Faculty 2, 15/21 for Fellow) had evidence of inflammatory or post-therapy changes at pathology. FP results were more common with VI-RADS scores ≥ 3 than VI-RADS scores ≥ 4. Among those patients in whom all three readers’ results using a VI-RADS score ≥ 3 were FP (n = 12), 10 pathology reports at cystectomy revealed inflammatory or post-therapy changes (of which six had coexisting NMIBC) and two had NMIBC without inflammation or post-therapy changes; among those patients in whom all three readers’ results using a VI-RADS score ≥ 4 were FP (n = 9), all revealed inflammatory or post-therapy changes at pathology (Fig. 4), four of which were associated with NMIBC. Regarding false negative results, of the two patients in whom all three readers’ results were false negative at both the ≥ 3 and ≥ 4 thresholds (that is, MIBC was present at cystectomy but was not predicted by VI-RADS), one had a 0.5 cm T2 tumor, and in another, the prostate gland obscured visualization of a 3.2 cm T2 tumor.
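The false positive counts above can be tied back to the reported specificities with simple arithmetic. The sketch below assumes that the denominators 24, 17, and 21 are each reader's FP counts at the VI-RADS ≥ 3 threshold and uses the 38 patients without MIBC at cystectomy; the function name is illustrative only.

```python
def specificity_from_fp(false_positives, n_without_disease):
    """Specificity = true negatives / all patients without the disease."""
    true_negatives = n_without_disease - false_positives
    return true_negatives / n_without_disease


N_WITHOUT_MIBC = 38  # patients without MIBC at cystectomy
# Assumption: these are the per-reader FP counts at VI-RADS >= 3.
fp_counts = {"Faculty 1": 24, "Faculty 2": 17, "Fellow": 21}

specificity_pct = {
    reader: round(100 * specificity_from_fp(fp, N_WITHOUT_MIBC), 1)
    for reader, fp in fp_counts.items()
}
for reader, value in specificity_pct.items():
    print(f"{reader}: {value}%")
```

Under that assumption the values come out as 36.8%, 55.3%, and 44.7%, which span the 36.8–55.3% specificity range reported for VI-RADS ≥ 3.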

Fig. 4

73-year-old male with muscle-invasive bladder cancer at transurethral resection who received neoadjuvant chemotherapy prior to cystectomy. MRI after treatment yielded a three-reader agreement of VI-RADS score of 4. a Axial T2-weighted image shows intermediate signal within thickened muscular layer of left bladder neck (arrow); b axial contrast-enhanced image demonstrates arterial enhancement of muscular layer of left bladder neck (arrows) and blood in bladder lumen (arrowhead); c axial diffusion-weighted image and d apparent diffusion coefficient map reveal restricted diffusion involving the thickened muscular layer (arrows). Cystectomy revealed “chronic inflammation”

Discussion

In our study of 70 patients examined with MRI following TURBT, using a VI-RADS score ≥ 3, the sensitivity among the three readers ranged from 81.3 to 93.8% and specificity ranged from 36.8 to 55.3%; using a VI-RADS score of ≥ 4, sensitivity ranged from 78.1 to 81.3% and specificity ranged from 47.4 to 68.4%. Our sensitivity results are comparable to those of a meta-analysis of 20 studies of 2725 patients in which VI-RADS was applied before TURBT, which showed a 92% pooled sensitivity for VI-RADS ≥ 3 and 82% for VI-RADS ≥ 4, but our specificity results are lower than the pooled specificity of 85% for VI-RADS ≥ 3 and 95% for VI-RADS ≥ 4 reported in that meta-analysis [26]. The lower specificity of VI-RADS scores we observed following TURBT may have been due to inflammatory or post-therapy changes, as 82% (31/38) of patients without MIBC had those findings at pathology. Others have postulated that CE and DW images may help distinguish inflammation from tumor [27,28,29]. In a study of 61 patients who had MRI after TURBT or biopsy, malignant lesions at cystectomy enhanced significantly earlier (at least 3.6 s) than non-malignant lesions [27]. In a study of 12 patients, DW images showed higher accuracy, sensitivity, specificity, and PPV (92.6, 100, 81.8, 88.9%, respectively) compared with CE images (59.3, 81.3, 27.3, 54.2%, respectively) for detecting recurrent urothelial cancers [30]. In this small study, the degree of restriction and enhancement helped discriminate cancer from inflammation and fibrosis. The technical parameters for the dynamic CE series used in our study followed VI-RADS recommendations [9] and as a result did not have the temporal resolution to distinguish between early and late arterial enhancement. We did not specifically analyze characteristics of diffusion restriction in our cohort to distinguish between inflammation and tumor.

We included patients who received treatments other than TURBT before MRI in our cohort as neoadjuvant treatments were used commonly (44%, 31/70) and have become an important part of the management of bladder cancer. The analysis showed that the performance of VI-RADS in these patients did not differ significantly from that in patients who did not receive additional treatment, and there was no statistically significant difference in inter-observer agreement between these two groups.

There was moderate inter-reader agreement in our study; Gwet’s AC value was 0.64 with a VI-RADS score ≥ 3 and 0.54 with a VI-RADS score ≥ 4. This level of inter-reader agreement for VI-RADS is comparable to that of a prior study on performance of MRI for detection of MIBC in the post-TURBT setting with κ = 0.66 [21] but lower than the pooled κ = 0.76 reported in a recent meta-analysis of VI-RADS performance that comprised 19 studies and 2439 patients, used cystectomy or repeat TURBT as reference standards, and reported inter-reader agreement for detection of MIBC [31]. Our inter-reader percent agreement was similar for VI-RADS scores of ≥ 3 (79% agreement) and ≥ 4 (76% agreement) and comparable to 81% agreement in a prospective study of 231 pre-TURBT patients [15].

A VI-RADS score of ≥ 3 has been considered optimal for detection of MIBC prior to TURBT [10, 15, 16, 19]. However, in the post-TURBT setting, a VI-RADS score of ≥ 4 may be more appropriate. In our study, VI-RADS scores ≥ 3 and ≥ 4 among the three readers had similar sensitivities (ranges of 81.3–93.8% and 78.1–81.3%, respectively); however, a VI-RADS score of ≥ 4 had a higher specificity (47.4–68.4%) than a VI-RADS score ≥ 3 (36.8–55.3%). A VI-RADS threshold score of 4 had a slightly better, though not statistically significant, performance for two of the three readers based on ROC analysis, maximum sum of sensitivity and specificity, and accuracy, similar to the conclusion of another study of 73 patients that included 31 patients who underwent TURBT and additional treatments at least two weeks prior to MRI [32]. Although a VI-RADS score of ≥ 4 had slightly better overall performance, the choice of VI-RADS threshold to guide subsequent management in clinical practice may depend on a shared decision between the patient and urologist. Opting for a VI-RADS score of ≥ 4 would maximize specificity; on the other hand, choosing a VI-RADS score of ≥ 3 would maximize sensitivity and thus decrease the chance of undertreating MIBC. Although there were slightly fewer exams in which all three readers incorrectly diagnosed MIBC using a VI-RADS score ≥ 4 (9/70, 13%) than using a VI-RADS score ≥ 3 (12/70, 17%), using the higher VI-RADS threshold may still result in some patients undergoing an unnecessary cystectomy. Despite this, MRI results can be communicated to urologists and patients and assist with decision making.

Limitations of this study included its retrospective, single-institution design. Because pathology was used as the reference standard, only patients undergoing cystectomy for bladder cancer were included. As a result, the cohort likely had a higher proportion of patients with muscle-invasive disease than if all patients with bladder cancer had been included. Nevertheless, 38/70 patients (54%) had non-muscle-invasive disease at surgical pathology, including 14 who had no tumor at surgical pathology, perhaps due to prior medicinal treatments. Our study included examinations performed on both 3 T (79%, 55/70) and 1.5 T (21%, 15/70) scanners. Although inter-reader agreement has been shown to be higher when VI-RADS was used in patients examined with 3 T scanners [31], VI-RADS has been thought to be applicable to patients examined at 1.5 T [9]. Also, patients were not given specific bladder preparation instructions prior to the MRI. In our experience, patients with bladder cancer, especially in the post-TURBT setting, often cannot tolerate a fully distended bladder. However, we believe bladder distension was adequate for interpretation. In patients with multiple tumors, the largest was assessed and may not have been the tumor with the greatest likelihood of invading muscle. However, this methodology assured that the same mass was assessed by the three readers, an approach used by others [10, 33]. Though each reader documented size and location of each mass, we did not evaluate the effects of these variables on performance of VI-RADS. Finally, our study was performed at a tertiary care center with abdominal radiologists, and its results may not be applicable in the community setting with general radiologists.

In summary, VI-RADS had moderate accuracy and inter-reader agreement for detecting muscle-invasive bladder cancer following TURBT. Compared to what has been reported in the pre-TURBT setting, application of VI-RADS in patients after TURBT had similar sensitivity but reduced specificity, likely due to inflammatory and post-treatment changes.