Introduction

Bladder cancer is the second most common genitourinary malignancy, with an estimated 80,470 new cases and 17,670 deaths in the USA in 2019 [1]. Pretreatment evaluation of muscle invasion is crucial for determining the therapeutic strategy. For non-muscle invasive bladder cancers (NMIBCs), bladder-sparing techniques such as transurethral resection of bladder tumor (TURBT) and intravesical instillation are generally applied, while cystectomy with urinary diversion or adjuvant chemotherapy is recommended for muscle invasive bladder cancer (MIBC) [2, 3]. However, TURBT is an operator-dependent procedure, and the quality of surgery is partly influenced by the experience of the surgeon [4]. The first TURBT understages 7 to 30% of NMIBCs, and as many as 45% of high-risk tumors when the initial surgical specimen contains no muscle tissue [5, 6]. Imaging modalities, especially magnetic resonance imaging (MRI), have been widely used for pretreatment evaluation. A recent study proposed a model combining TURBT with diffusion-weighted imaging (DWI) and showed that it could improve the accuracy of identifying muscle invasion in clinical practice [7]. Furthermore, typical imaging features can help surgeons avoid unnecessary invasive operations and proceed to definitive surgery.

MRI has been suggested as a promising alternative for tumor staging in recent years. A previous meta-analysis reported that the sensitivity and specificity of MRI for differentiating ≤ T1 from ≥ T2 tumors were 0.87 and 0.79, respectively [8]. In addition, multiparametric MRI (mpMRI) with advanced functional imaging sequences such as DWI and dynamic contrast enhancement (DCE) can provide quantitative features and anatomic information for clinical assessment [9,10,11]. To standardize the imaging protocol and reporting principles, the Vesical Imaging-Reporting and Data System (VI-RADS), a 5-point scoring system, was recently proposed to indicate the probability of muscle invasion (Supplementary Fig. 1). The VI-RADS score is an overall estimate generated by scoring the appearance of tumors on T2-weighted imaging (T2WI), DWI, and DCE sequences [12]. To date, several validation studies of the diagnostic value of the VI-RADS system have been reported. The current study aimed to systematically assess the performance of the VI-RADS score for detecting muscle invasion in bladder cancer.

Materials and methods

Literature search and study selection

We systematically searched PubMed, Embase, and Web of Science for eligible studies from inception to November 20, 2019. The search terms “VI-RADS” or “vesical imaging reporting and data system” were applied. The reference lists of relevant articles were also searched for potential reports. The current study was carried out in accordance with the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [13]. Studies were selected according to the following criteria: (1) the study aimed to report the performance of VI-RADS for detecting muscle invasion of bladder cancer; (2) muscle invasion was confirmed by the pathologic results of the surgical specimen; (3) raw true positive (TP), false positive (FP), false negative (FN), and true negative (TN) values could be directly extracted or calculated from the crosstabs; (4) the full text was available for quality assessment. Exclusion criteria were as follows: (1) articles written in a language other than English; (2) reviews, comments, or conference abstracts; (3) studies with inadequate information for data extraction or quality assessment. Two authors independently conducted the literature search and study selection; any discrepancies were resolved by discussion until a consensus was reached.

Two cutoff values of the VI-RADS score (VI-RADS 3 and VI-RADS 4) have been evaluated in previous studies. This meta-analysis therefore evaluated the ability of the VI-RADS score to detect muscle invasion with VI-RADS 3 and VI-RADS 4 as the cutoff value separately.

Data extraction and quality assessment

Two authors independently extracted data from the eligible studies using a standard form that included the following items: authors’ names, year of publication, study country, sample size, number of tumors with the percentage of muscle invasive tumors, mean or median age of patients, sex distribution, study design, surgical pattern for the reference standard, magnetic field strength, and number of readers. Specifically, the TP, FP, FN, and TN values were retrieved with VI-RADS 3 and VI-RADS 4 as the cutoff point, respectively. For studies that did not report TP, FP, FN, and TN values explicitly, these values were calculated from the crosstabs of VI-RADS score against tumor stage. Tumors of stage T1 or lower were defined as non-muscle invasive, and tumors of stage T2 or higher as muscle invasive.
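
The derivation of TP, FP, FN, and TN from a crosstab can be illustrated with a short sketch. It assumes that a score at or above the cutoff counts as test positive and that stage T2 or higher counts as disease positive. The function names and the counts in `example_crosstab` are hypothetical and are not taken from any included study.

```python
from typing import Dict, Tuple

# Maps a VI-RADS score (1-5) to (number of <= T1 tumors, number of >= T2 tumors).
Crosstab = Dict[int, Tuple[int, int]]


def confusion_at_cutoff(crosstab: Crosstab, cutoff: int) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), treating scores >= cutoff as test positive."""
    tp = fp = fn = tn = 0
    for score, (n_non_invasive, n_invasive) in crosstab.items():
        if score >= cutoff:
            tp += n_invasive      # muscle invasive, called positive
            fp += n_non_invasive  # non-muscle invasive, called positive
        else:
            fn += n_invasive      # muscle invasive, called negative
            tn += n_non_invasive  # non-muscle invasive, called negative
    return tp, fp, fn, tn


def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for illustration only.
example_crosstab: Crosstab = {1: (40, 1), 2: (30, 2), 3: (10, 5), 4: (4, 12), 5: (1, 20)}

for cutoff in (3, 4):
    counts = confusion_at_cutoff(example_crosstab, cutoff)
    sens, spec = sensitivity_specificity(*counts)
    print(f"VI-RADS {cutoff}: TP/FP/FN/TN = {counts}, sens = {sens:.3f}, spec = {spec:.3f}")
```

With these illustrative counts, raising the cutoff from 3 to 4 lowers sensitivity and raises specificity, which is the trade-off the two cutoff values are meant to capture.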

To evaluate the quality of the included studies, the revised Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool [14], which focused on four domains of participant selection, index test, reference standard, and flow and timing, was utilized.

Statistical analysis

Heterogeneity in the pooled estimates was evaluated using the Q test and the I² heterogeneity index. An I² value greater than 50% indicated significant heterogeneity; in this case, the bivariate mixed-effects regression model was applied for the meta-analysis [15]. The pooled estimates were sensitivity, specificity, positive likelihood ratio (LR+), and negative likelihood ratio (LR−). Forest plots of sensitivity and specificity were drawn for VI-RADS 3 and VI-RADS 4, respectively. The hierarchical summary receiver operating characteristic (HSROC) curve and the area under the curve (AUC) with 95% confidence interval (95% CI) were constructed to assess diagnostic usefulness. A Fagan nomogram was plotted to show the post-test probabilities at a pre-test probability of 50%, which reflects the clinical utility of the VI-RADS score [16].
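
As a rough illustration of how the I² index relates to the Q statistic, the sketch below uses the standard definition I² = max(0, (Q − df) / Q) × 100 with df = k − 1 for k studies, together with a generic inverse-variance form of Cochran's Q. It is an assumption-laden sketch and not necessarily the exact computation performed by the software used in this study; the function names are hypothetical.

```python
from typing import Sequence


def cochran_q(estimates: Sequence[float], variances: Sequence[float]) -> float:
    """Cochran's Q: weighted squared deviations from the inverse-variance pooled estimate."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))


def i_squared(q: float, n_studies: int) -> float:
    """I^2 in percent; truncated at 0 when Q does not exceed its degrees of freedom."""
    df = n_studies - 1
    if q <= 0.0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical value: Q = 25 across 6 studies gives I^2 = 80%, above the 50% threshold.
print(f"I^2 = {i_squared(25.0, 6):.1f}%")
```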

Because of the high heterogeneity, subgroup analysis and meta-regression were conducted to explore its potential sources. Subgroup analysis was based on the following groups: sample size (< 100 and > 100), study design (retrospective and prospective), field strength (3.0 T and 1.5 T), and number of readers (2 readers and 5 readers). Sensitivity analysis was carried out to examine the robustness of the pooled results. Deeks’ funnel plot analysis was applied to assess publication bias. All statistical analyses were performed using the MIDAS module of STATA software (version 14.1) [17]. A p value less than 0.05 was defined as statistically significant.

Results

Search results

In total, 70 articles were retrieved by the literature search, and 39 articles remained after removing duplicates. The titles and abstracts were then screened; 20 articles were excluded for irrelevant content, 6 because they were comments, and 4 because they were conference abstracts. After independent full-text review by two authors, three studies were excluded: the original study proposing the VI-RADS score (n = 1), a study written in a language other than English (n = 1), and a study with inadequate information for data extraction (n = 1). In addition, we manually searched the reference lists for potential studies, but no new eligible articles were found. Finally, six articles [18,19,20,21,22,23] with 1064 patients were included in this meta-analysis (Fig. 1).

Fig. 1 Flowchart of the literature search and selection

Study characteristics and quality assessment

The basic characteristics of the included studies are listed in Table 1. All six studies were published in 2019; four were conducted retrospectively [18, 19, 21, 23] and two prospectively [20, 22]. The mean/median age of patients ranged from 57.2 to 72.8 years. The pathological results of the surgical specimen were adopted as the reference standard. In four studies, the surgical pattern was TURBT, or re-TURBT for high-risk tumors [18, 20,21,22]; in one study, it was partial or radical cystectomy and TURBT [19]; and in the remaining study, it was cystectomy or TURBT, with re-TURBT for previously inadequate assessment of muscle invasion [23]. The percentage of muscle invasive tumors in these reports ranged from 25.0 to 50.0%. One study from Japanese researchers presented the interobserver agreement of 5 readers in interpreting the VI-RADS score [18], and the other five studies reported the results of 2 readers [19,20,21,22,23]. The quality of the included studies, evaluated with the QUADAS-2 assessment tool, is shown in Supplementary Figs. 2 and 3, which revealed only low or unclear risk of bias and applicability concerns.

Table 1 Characteristics of included studies in this meta-analysis

Synthesis of included studies

The sensitivities and specificities of these studies at the different cutoff values were calculated from the TP, FP, FN, and TN values and are listed in Table 2. With VI-RADS 3 as the cutoff value, the sensitivity ranged from 0.78 to 0.95 and the specificity from 0.44 to 0.96. The corresponding results with VI-RADS 4 as the cutoff value, which could be extracted or calculated in five studies, were 0.66–0.91 and 0.76–1.00. By synthesizing these estimates, the pooled sensitivity and specificity of VI-RADS 3 for detecting muscle invasion were 0.90 (95% CI 0.86–0.94; I² 78.84%) and 0.86 (95% CI 0.71–0.94; I² 98.02%), respectively (Fig. 2). The pooled LR+ was 6.5 (95% CI 3.0–14.2) and the pooled LR− was 0.11 (95% CI 0.08–0.16). The HSROC curve with VI-RADS 3 as the cutoff value is presented in Fig. 3a; the AUC was 0.93 (95% CI 0.91–0.95), which was similar to the result with VI-RADS 4 as the cutoff value (Fig. 3b). Figure 4 shows the Fagan nomogram, which indicates that, at a pre-test probability of 50.0%, the post-test probabilities of muscle invasion given VI-RADS ≥ 3 and < 3 were 87.0% and 10.0%, respectively. With VI-RADS 4 as the cutoff value, the pooled sensitivity, specificity, LR+, and LR− were 0.77 (95% CI 0.65–0.86), 0.97 (95% CI 0.88–0.99) (Supplementary Fig. 4), 23.3 (95% CI 6.9–79.1), and 0.24 (95% CI 0.15–0.36), respectively.
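
The post-test probabilities read off a Fagan nomogram follow from Bayes' theorem in odds form: post-test odds = pre-test odds × likelihood ratio. The sketch below reproduces this arithmetic for the pooled VI-RADS 3 likelihood ratios reported above; the function names are illustrative assumptions. The crude ratios computed directly from the pooled sensitivity and specificity (about 6.4 and 0.12) differ slightly from the reported LR+ of 6.5 and LR− of 0.11, presumably because of rounding and because the pooled values come from the bivariate model.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Crude LR+ = sens / (1 - spec) and LR- = (1 - sens) / spec."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pooled likelihood ratios for VI-RADS 3 as reported in the text, pre-test probability 50%.
positive = post_test_probability(0.5, 6.5)   # VI-RADS >= 3
negative = post_test_probability(0.5, 0.11)  # VI-RADS < 3
print(f"post-test probability: {positive:.1%} (positive), {negative:.1%} (negative)")
```

Rounded to whole percentages, these values agree with the 87% and 10% read from the nomogram.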

Table 2 Summary of the diagnostic estimates of included studies

Fig. 2 Forest plot of pooled sensitivity and specificity of VI-RADS 3 as the cutoff value for detecting muscle invasion

Fig. 3 The HSROC curve of VI-RADS 3 (a) and VI-RADS 4 (b) as the cutoff values for diagnosing MIBC. HSROC: hierarchical summary receiver operating characteristic; MIBC: muscle invasive bladder cancer

Fig. 4 Fagan nomogram reflecting the clinical utility of VI-RADS score

Subgroup analysis, meta-regression, and sensitivity analysis

To identify the sources of heterogeneity, subgroup analysis and meta-regression were performed. Study design (p value 0.01) and surgical pattern of the reference standard (p value 0.02) were identified as sources of heterogeneity in sensitivity. However, the heterogeneity in specificity could not be explained by the meta-regression analysis (Table 3). Deeks’ funnel plot analysis showed no evidence of publication bias (p value 0.94; Fig. 5). Furthermore, in the influence analysis all the included studies fell below the red-dotted line, and the outlier detection analysis identified no outliers, which supports including all six studies in this meta-analysis (Fig. 6). These tests confirmed the robustness of our results.

Table 3 Subgroup analysis and meta-regression results

Fig. 5 Deeks’ funnel plot to evaluate publication bias

Fig. 6 Sensitivity analysis including residual-based goodness of fit (a), bivariate normality (b), influence analysis (c), and outlier detection analysis (d)

Discussion

The current meta-analysis comprehensively evaluated the diagnostic performance of the VI-RADS score for detecting muscle invasion. To the best of our knowledge, this is the first meta-analysis focusing on this subject. Our results revealed that, with VI-RADS 3 as the cutoff value, the pooled sensitivity and specificity of the included studies were 0.90 (95% CI 0.86–0.94) and 0.86 (95% CI 0.71–0.94), respectively. The AUC of the HSROC curve was 0.93 (95% CI 0.91–0.95). Therefore, based on the results of this study, the VI-RADS score can differentiate MIBC from NMIBC with high diagnostic accuracy.

Most of the included studies reported the diagnostic performance with VI-RADS 3 as the cutoff value. In this meta-analysis, we calculated the pooled estimates separately for VI-RADS 3 and VI-RADS 4 as the cutoff value. The AUC values of the HSROC curves for the two cutoff values were similar, but the specificity and LR+ of VI-RADS 4 were clearly higher than those of VI-RADS 3. These results suggest that treating only VI-RADS 4 and VI-RADS 5 as “positive” makes a positive result more reliable in predicting muscle invasion. Because radical cystectomy carries substantial complications and impairs quality of life, the decision to perform it should be made only after cautious evaluation, with certainty about the surgical indications [24, 25]. Therefore, VI-RADS 4 should be applied as the cutoff value when muscle invasion is predicted by imaging alone and TURBT is not performed before radical cystectomy. In contrast, VI-RADS 3 as the cutoff value performed better in sensitivity and LR−, which would reduce missed diagnoses of MIBC and accordingly reduce the incidence of metastasis and recurrence. VI-RADS 3 could therefore be used as the cutoff value for patients who will undergo conventional diagnostic TURBT for pathologic confirmation of muscle invasion.

In the past decade, MRI has been recommended as a promising tool for pretreatment assessment and has shown better performance than computed tomography in the clinical staging of bladder cancer [26]. One prior meta-analysis included 24 studies and 1774 patients to review the diagnostic ability of ≥ 1.5 T MRI for local staging [27]. The pooled sensitivity and specificity were 0.92 (95% CI 0.88–0.95) and 0.86 (95% CI 0.42–1.00), respectively. In particular, the subgroup analysis of functional techniques showed that the sensitivity and specificity increased to 0.94 (95% CI 0.89–1.00) and 0.95 (95% CI 0.89–0.98) for conventional plus two functional sequences. Another meta-analysis focusing on mpMRI reported an AUC of the HSROC curve of 0.946 [28]. The results of our current study were inferior to those of these two reports, which might be explained by the lack of uniform MRI interpretation criteria among the studies included in the earlier meta-analyses. The VI-RADS system was created to unify and standardize the interpretation and reporting of mpMRI. The system adopted several typical features, such as the tumor stalk [29] and enhancement of the bladder wall, to define its scoring principles. In addition, the excellent agreement between different observers indicates that the VI-RADS score is easy to apply, which supports the generalizability of our results.

Significant heterogeneity among the included studies was observed. Meta-regression analysis revealed that the study design and the surgical pattern of the reference standard might be sources of the heterogeneity in pooled sensitivity. However, the cause of the heterogeneity in specificity was not identified by meta-regression analysis. Prospective studies are considered more reliable than retrospective studies because they avoid several biases. Regarding the surgical pattern, four studies reported TURBT only, and the other two reported cystectomy or TURBT. Because the surgical specimen may contain no muscle layer, TURBT might yield an uncertain or incorrect pathological tumor stage when the resection is performed with insufficient skill [30]. However, the VI-RADS score was created as a consensus for pretreatment evaluation of primary bladder cancer without previous surgical history, so using cystectomy alone as the reference standard is hardly achievable. Even so, we further conducted the sensitivity analysis, which confirmed the robustness of our results.

The VI-RADS system was initially created for primary tumors without prior intravesical instillation or surgical history, because these treatments cause edema and inflammation of the bladder tissues. The application of MRI might therefore be restricted in recurrent tumors because of overestimation of tumor stage. In an exploratory study, Del Giudice et al. advocated an ambitious perspective of applying the VI-RADS score in the management of candidates for re-TURBT [22]. High diagnostic efficiency was reported, with a sensitivity and specificity of 0.85 (95% CI 0.62–0.97) and 0.94 (95% CI 0.87–0.98) in differentiating adverse pathology (upstaging to MIBC) from persistent NMIBC at re-TURBT. Despite this good predictive ability, mpMRI should still be used cautiously as a criterion for avoiding unnecessary re-TURBT. Therefore, future research focusing on the application of the VI-RADS score before re-TURBT or in recurrent tumors is warranted.

Several limitations of this meta-analysis should be noted. First, because the VI-RADS score was proposed within the past year, only six validation studies have been published, and three of them had limited sample sizes of fewer than 100 patients. In addition, we included only studies written in English, so one study was excluded because of the difficulty of extracting important information. Second, most of the included studies were conducted retrospectively, which might contribute to the heterogeneity of the pooled estimates through unavoidable biases. Third, one of the included studies reported the diagnostic results of two readers, and we chose the results with higher accuracy for the meta-analysis. This choice was based on the assumption that higher accuracy likely reflected the interpretation of a more experienced reader. Moreover, including the results of the other reader would change the pooled results only slightly and would not reverse the conclusion. Overall, prospective studies with large sample sizes are needed for further validation of the VI-RADS score.

Conclusions

The VI-RADS score performs well in detecting muscle invasion in primary bladder cancer. VI-RADS 3 and VI-RADS 4 as the cutoff value seem to provide similar overall diagnostic efficiency and should be applied selectively according to the individual clinical situation.