Introduction

Cervical cancer is the second most common malignancy in women [1]. One of the most important aspects of the pretreatment evaluation of cervical cancer is the assessment of parametrial invasion (PMI) [2]. PMI is associated with prognosis, and patients with suspected PMI are usually treated with primary chemoradiation or receive adjuvant treatment after surgery [3]. Accurate assessment of PMI is therefore crucial for selecting the optimal treatment in patients with cervical cancer.

Currently, the International Federation of Gynecology and Obstetrics (FIGO) staging system is widely used for clinical staging of cervical cancer [4]. FIGO staging is primarily based on physical examination, which may be supplemented by cystoscopy and proctoscopy. Errors in clinical FIGO staging have been consistently reported, with rates of understaging and overstaging of up to 40% and 64%, respectively [5]. By contrast, magnetic resonance imaging (MRI) has shown promising staging accuracy in cervical cancer. However, the FIGO Committee on Gynecologic Oncology only recommends MRI and does not require it [6].

To date, two meta-analyses have assessed the diagnostic performance of MRI for the detection of PMI. Bipat et al. [7] compared computed tomography (CT) and MRI using studies published from 1985 to 2002 and reported an MRI sensitivity of 74% and specificity of 82%. A more recent report, covering studies published up to 2011, found a sensitivity of 84% and a specificity of 92% for MRI [8]. Both meta-analyses showed that MRI was superior to CT and clinical examination. Nevertheless, a recent survey found that up to 30% of groups still do not use MRI [9]. Both meta-analyses also included studies from the remote past. Since then, MRI has seen technical advances such as higher magnetic field strength (i.e. 3-Tesla) and diffusion-weighted imaging (DWI). The diagnostic performance of MRI might therefore be expected to have improved over the years, which could provide additional evidence for incorporating MRI into the FIGO staging system.

Therefore, the purpose of our study was to review the literature published since 2012 to obtain updated diagnostic performance values of MRI for detecting PMI in patients with cervical cancer using surgico-pathological results as the reference standard.

Materials and methods

This meta-analysis was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Based on the PICOS criteria [10], we formulated the following research question: What is the diagnostic performance of MRI for the detection of PMI in patients with cervical cancer, compared with surgico-pathological results, in studies published since 2012?

Literature search

A computerised search of the MEDLINE and EMBASE databases was conducted up to 29 December 2016. All synonyms and related terms were included in the search query, as follows: ((‘cervical cancer’) OR (‘cervical carcinoma’) OR (‘cervical malignancy’) OR (‘cervical neoplasm’) OR (‘cervical tumor’) OR (‘cervical tumour’) OR (‘cervix cancer’) OR (‘cervix carcinoma’) OR (‘cervix malignancy’) OR (‘cervix neoplasm’) OR (‘cervix tumor’) OR (‘cervix tumour’)) AND ((staging) OR (stage) OR (parametrial invasion) OR (parametrial infiltration)) AND ((magnetic resonance imaging) OR (MRI)). The bibliographies of the included articles were screened to identify other eligible studies. No language restrictions were applied.

Inclusion criteria

We included studies that met the following PICOS criteria [10]: (1) patients diagnosed with cervical cancer; (2) MRI as the index test for the detection of PMI; (3) surgico-pathological results available as the reference standard for comparison; (4) reporting of the sensitivity and specificity of MRI, or of the raw data needed to construct a 2 × 2 contingency table; and (5) publication as an original article.

Exclusion criteria

The exclusion criteria were: (1) publication before 1 January 2012; (2) fewer than ten patients; (3) publication type other than original article; (4) use of MRI for the evaluation of cervical cancer with a focus on topics other than the detection of PMI; (5) overlapping patient population; and (6) insufficient data to reconstruct 2 × 2 tables, even after attempts to contact the authors. If multiple publications with overlapping study populations were identified, only the study with the largest patient cohort was included.

Two reviewers (S.W. and C.H.S.) independently performed the literature search and study selection. Disagreements were resolved by consensus after discussion with a third reviewer (S.Y.K.).

Data extraction and quality assessment

The following data on patient, study and MRI characteristics were extracted using a standardised form:

(1) Patient characteristics – number of patients, median age and age range, prevalence of PMI, histological subtypes of included tumours, and FIGO stages.

(2) Study characteristics – origin of study (authors, institution and country), publication year, duration of patient recruitment, study design (prospective vs. retrospective, and whether enrolment was consecutive), reference standard, interval between MRI and the reference standard, blinding to surgico-pathological results, and level of analysis (per patient or separately for each parametrium).

(3) MRI characteristics – magnetic field strength (3- vs. <3-Tesla), scanner model and manufacturer, coil type, spin echo (SE) technique (fast SE [FSE] or turbo SE [TSE] vs. SE), slice thickness (≤5 mm vs. >5 mm), acquired imaging planes for T2-weighted imaging (T2WI), inclusion of DWI and contrast-enhanced (CE) MRI, and use of an antispasmodic drug (i.e. scopolamine butylbromide).

We assessed the methodological quality of the selected studies using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [11].

Data extraction and quality assessment were both performed independently by the same two reviewers (S.W. and C.H.S.). Disagreements were resolved by discussion with a third reviewer (S.Y.K.).

Data synthesis and analysis

Data from the included studies were reconstructed into 2 × 2 tables (true positive, false negative, false positive and true negative), from which sensitivity and specificity were calculated. If a study reported the diagnostic performance of several MRI protocols separately, the most advanced and comprehensive protocol was selected (T2WI + DWI > T2WI; endovaginal coil > phased-array coil; and 3D > 2D). If results from multiple independent readers were available, we chose those with the higher sensitivity. Some studies assessed diagnostic performance both on a per-patient basis and for each parametrium separately. In these cases we used the per-patient results, because the treatment decision (primary chemoradiation vs. radical surgery) depends on whether PMI is present on at least one side.
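The 2 × 2 reconstruction and the per-patient rule described above can be illustrated with a minimal Python sketch. The counts are hypothetical. The class and function names (`TwoByTwo`, `patient_level_call`) are illustrative only and do not come from the study or from any meta-analysis package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Per-patient 2 x 2 table for one study (hypothetical counts)."""
    tp: int  # MRI positive, PMI confirmed surgico-pathologically
    fn: int  # MRI negative, PMI confirmed
    fp: int  # MRI positive, no PMI
    tn: int  # MRI negative, no PMI

    @property
    def sensitivity(self) -> float:
        # Proportion of patients with PMI whom MRI called positive
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of patients without PMI whom MRI called negative
        return self.tn / (self.tn + self.fp)

    @property
    def prevalence(self) -> float:
        # Proportion of all patients with PMI at the reference standard
        return (self.tp + self.fn) / (self.tp + self.fn + self.fp + self.tn)


def patient_level_call(left_pmi: bool, right_pmi: bool) -> bool:
    # Per-patient analysis: PMI is present if either parametrium is involved
    return left_pmi or right_pmi


# Hypothetical study of 100 patients, 20 of whom have PMI
study = TwoByTwo(tp=15, fn=5, fp=4, tn=76)
print(f"sensitivity={study.sensitivity:.2f}, specificity={study.specificity:.2f}")
```

Collapsing per-parametrium calls with a logical OR mirrors the stated rationale: a positive finding on either side changes the treatment decision.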

Pooled estimates of sensitivity and specificity were calculated with hierarchical logistic regression modelling, including bivariate modelling and hierarchical summary receiver operating characteristic (HSROC) modelling [12]. An HSROC curve with 95% confidence and prediction regions was plotted to present the results graphically. Publication bias was evaluated by visual inspection of Deeks’ funnel plot and by the p-value of Deeks’ asymmetry test [13].
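Deeks’ asymmetry test regresses each study's log diagnostic odds ratio on 1/√ESS, weighted by ESS. Here ESS = 4·n1·n2/(n1 + n2), where n1 and n2 are the numbers of patients with and without the target condition. A slope that departs from zero suggests funnel-plot asymmetry. The sketch below is a standard-library Python illustration of that regression, not the ‘midas’ implementation. Adding 0.5 to every cell, returning a t statistic rather than a p-value, and the helper name `deeks_test` are all assumptions.

```python
import math


def deeks_test(tables):
    """Deeks' funnel plot asymmetry regression (illustrative sketch).

    tables: iterable of (tp, fn, fp, tn) tuples, one per study.
    Returns (slope, t_statistic, degrees_of_freedom).
    """
    xs, ys, ws = [], [], []
    for tp, fn, fp, tn in tables:
        n_dis, n_non = tp + fn, fp + tn
        ess = 4 * n_dis * n_non / (n_dis + n_non)  # effective sample size
        # Assumption: 0.5 added to every cell so that zero cells do not break log()
        ln_dor = math.log(((tp + 0.5) * (tn + 0.5)) / ((fp + 0.5) * (fn + 0.5)))
        xs.append(1 / math.sqrt(ess))
        ys.append(ln_dor)
        ws.append(ess)  # weighted least squares with weights = ESS
    k = len(xs)
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Weighted residual variance with k - 2 degrees of freedom
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / (k - 2) / sxx)
    return slope, slope / se_slope, k - 2
```

A two-sided p-value can then be taken from a t distribution with k − 2 degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.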

Heterogeneity, i.e. the variation in outcomes between the included studies, was assessed using several statistical methods. First, Cochran’s Q-test was performed, with p < 0.05 indicating heterogeneity. Second, the Higgins inconsistency index (I²) was calculated and interpreted using the following criteria: 0–40%, heterogeneity might not be important; 30–60%, moderate heterogeneity may be present; 50–90%, substantial heterogeneity may be present; and 75–100%, considerable heterogeneity [14]. Third, we looked for a threshold effect, defined as a positive correlation between sensitivity and false-positive rate across the included studies. Fourth, tau squared (τ²) was calculated, as it is considered the most informative measure of heterogeneity in a meta-analysis [15].
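The Python sketch below shows how Cochran’s Q, I² and τ² can be computed for one outcome, such as sensitivity, on the logit scale. It uses the univariate inverse-variance (DerSimonian–Laird) formulas. This is a simplification: it is not the bivariate model used for pooling, and software such as ‘midas’ may use different corrections. The 0.5 continuity correction and the function name `heterogeneity` are assumptions.

```python
import math


def heterogeneity(events):
    """Cochran's Q, Higgins I^2 (%) and DerSimonian-Laird tau^2 for logit proportions.

    events: list of (x, n) pairs, e.g. (true positives, patients with PMI) per study.
    """
    ys, ws = [], []
    for x, n in events:
        # Logit with a 0.5 continuity correction (assumption; software may differ)
        y = math.log((x + 0.5) / (n - x + 0.5))
        var = 1 / (x + 0.5) + 1 / (n - x + 0.5)  # approximate variance of the logit
        ys.append(y)
        ws.append(1 / var)  # inverse-variance weight
    k = len(ys)
    sw = sum(ws)
    y_fixed = sum(w * y for w, y in zip(ws, ys)) / sw  # fixed-effect pooled logit
    q = sum(w * (y - y_fixed) ** 2 for w, y in zip(ws, ys))  # Cochran's Q
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # share of variation beyond chance
    c = sw - sum(w * w for w in ws) / sw
    tau2 = max(0.0, (q - df) / c)  # between-study variance (DerSimonian-Laird)
    return q, i2, tau2
```

The Q statistic is compared with a chi-squared distribution with k − 1 degrees of freedom to obtain its p-value. I² expresses how far Q exceeds its expected value under homogeneity.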

Meta-regression analysis was performed to investigate sources of heterogeneity using the following variables: (1) study design (prospective vs. retrospective); (2) ethnicity (Asian vs. non-Asian); (3) prevalence of PMI (≥16.7% [median value across studies] vs. <16.7%); (4) magnetic field strength (3- vs. <3-Tesla); (5) coil type (phased-array or endovaginal coil vs. others); (6) SE technique (FSE/TSE vs. SE); (7) slice thickness (≤5 mm vs. >5 mm); (8) T2WI planes (both axial oblique and sagittal planes included vs. not included); (9) inclusion of DWI; (10) inclusion of CE MRI; and (11) use of an antispasmodic drug. In addition, a subgroup analysis was planned for studies that solely used radical hysterectomy as the reference standard.
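The idea behind testing a binary covariate, such as 3-T vs. <3-T scanners, can be illustrated with a simpler technique than the one used in the study. The Python sketch below pools logit sensitivity separately in each subgroup, using fixed-effect inverse-variance weights, and compares the two pooled values with a z-test. It does not reproduce the bivariate meta-regression. The function names, the 0.5 continuity correction and the counts in the test are hypothetical. The median split shown last mirrors how the prevalence covariate was dichotomised.

```python
import math
from statistics import median


def pooled_logit(events):
    """Fixed-effect inverse-variance pooled logit proportion and its variance."""
    ys, ws = [], []
    for x, n in events:
        ys.append(math.log((x + 0.5) / (n - x + 0.5)))  # 0.5 continuity correction
        ws.append(1 / (1 / (x + 0.5) + 1 / (n - x + 0.5)))  # inverse of logit variance
    sw = sum(ws)
    return sum(w * y for w, y in zip(ws, ys)) / sw, 1 / sw


def expit(t):
    # Back-transform from the logit scale to a proportion
    return 1 / (1 + math.exp(-t))


def compare_subgroups(events, covariate):
    """Pooled sensitivity for covariate True vs. False and a two-sided z-test p-value.

    events: (true positives, patients with PMI) per study; covariate: bool per study.
    """
    m1, v1 = pooled_logit([e for e, c in zip(events, covariate) if c])
    m0, v0 = pooled_logit([e for e, c in zip(events, covariate) if not c])
    z = (m1 - m0) / math.sqrt(v1 + v0)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return expit(m1), expit(m0), p_value


def dichotomise_at_median(values):
    """Binary covariate such as 'prevalence >= median of the included studies'."""
    cut = median(values)
    return [v >= cut for v in values]
```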

The ‘midas’ module in Stata 10.0 (StataCorp LP, College Station, TX, USA) and the ‘mada’ package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 indicating statistical significance.

Results

Literature search

The systematic literature search yielded 1,195 articles. Of these, 379 were duplicates and 785 were excluded after review of the abstract alone. Full-text reviews were performed for the remaining 31 articles, and 17 were excluded for the following reasons (see Online Supplementary Table 1): (1) not in the field of interest (n = 13); (2) overlapping population (n = 2); and (3) insufficient data to reconstruct 2 × 2 contingency tables (n = 2). Ultimately, 14 original articles including a total of 1,436 patients were included [16,17,18,19,20,21,22,23,24,25,26,27,28,29]. Figure 1 shows the detailed study selection process.
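As a consistency check, the counts in the study-selection flow can be reconciled arithmetically. The short Python snippet below uses only the numbers stated in the text.

```python
# Counts reported for the literature search and study selection
identified = 1195
duplicates = 379
excluded_after_abstract = 785
full_text_exclusions = {
    "not in the field of interest": 13,
    "overlapping population": 2,
    "insufficient data for 2 x 2 tables": 2,
}

# Articles left for full-text review, then articles finally included
full_text_reviewed = identified - duplicates - excluded_after_abstract
included = full_text_reviewed - sum(full_text_exclusions.values())
print(f"full-text reviews: {full_text_reviewed}, included studies: {included}")
```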

Fig. 1

Flow diagram showing study selection process. * = MRI was used for staging but parametrial invasion was not separately assessed (n = 6); criteria for determination of parametrial invasion were tumour size (n = 3) or tumour visibility (n = 2); MRI was used to assess parametrial invasion but was correlated only with clinical FIGO staging (n = 2)

Characteristics of included studies

The patient characteristics are described in Table 1. The size of the study population ranged from 25 to 298 patients, and the prevalence of PMI ranged from 4.0% to 43.3%. The median age of patients ranged from 34.4 to 57.8 years. Most studies included only adenocarcinoma or squamous cell carcinoma. Other histological subtypes constituted 0.9–6.6% of patients in five studies, and one study did not report the subtype. FIGO stages varied among the studies: eight included only stage IIA or lower, five also included advanced stages (IIB or higher), and one did not report stage.

Table 1 Patient characteristics

The study characteristics are described in Table 2. Seven studies originated from Asian countries and seven from non-Asian countries. Six studies were prospective and eight were retrospective. Patient recruitment was consecutive in seven studies and was not explicitly described in the other seven. Eight studies solely used radical hysterectomy specimens as the reference standard, and three used radical hysterectomy or trachelectomy. The remaining studies used:
- surgery, without details of the type of operation;
- radical hysterectomy or biopsy;
- histopathological correlation or a multidisciplinary decision based on imaging (initial and follow-up), clinical examination and treatment change.

Only one study analysed the right and left parametria separately; all other studies performed a per-patient analysis.

Table 2 Study characteristics

The MRI characteristics are described in Table 3. Six studies used only 3-Tesla scanners, six used only 1.5-Tesla scanners, one used either 1.5- or 3-Tesla scanners, and one used a 1-Tesla scanner. Regarding coil type, one study used an endovaginal coil, two used phased-array or body coils, and the remaining studies used only phased-array coils. All studies except one, which did not report the technique, used TSE or FSE sequences. Only two studies used a slice thickness >5 mm; the remaining studies used ≤5 mm. Half of the studies included both axial oblique and sagittal planes in the MRI protocol. DWI was used in two studies, CE-MRI in five, and both DWI and CE-MRI in three. Antispasmodic drugs were used in eight studies and not used in one; their use was not reported in the remaining five.

Table 3 MRI characteristics

Quality assessment

The distribution of QUADAS-2 scores in the 14 included studies is shown in Fig. 2. The quality of the studies was generally moderate, with 12 (86%) studies satisfying more than four of the seven domains [30]. The details for each domain are provided in the Online Supplementary Material.

Fig. 2

Grouped bar charts show risk of bias (left) and concerns for applicability (right) of 14 included studies assessed with QUADAS-2

Heterogeneity among the included studies

Cochran’s Q-test suggested that heterogeneity was unlikely among the 14 studies (p = 0.127). The Higgins I² statistics indicated that moderate heterogeneity may be present for both sensitivity (I² = 49.29%) and specificity (I² = 51.16%). Visual assessment of the coupled forest plots of sensitivity and specificity showed no threshold effect (Fig. 3). Correlation analysis likewise showed no threshold effect between sensitivity and false-positive rate (Spearman correlation coefficient = −0.042 [95% CI −0.560 to 0.500]). When heterogeneity was assessed in terms of the diagnostic odds ratio (DOR), Cochran’s Q-test (p = 0.471), τ² (0.240) and Higgins I² (0%) all suggested that heterogeneity was unlikely.
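The threshold-effect check reported above is a Spearman rank correlation between study-level sensitivity and false-positive rate (1 − specificity). A strongly positive value would suggest that studies differ mainly in their positivity threshold. The standard-library Python sketch below assigns average ranks to ties and computes Pearson's correlation on the ranks. The function names are illustrative assumptions. The confidence interval quoted in the text is not computed here.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as Pearson's correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def threshold_correlation(sensitivities, specificities):
    # Threshold effect: correlation of sensitivity with false-positive rate
    return spearman(sensitivities, [1 - s for s in specificities])
```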

Fig. 3

Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% confidence intervals (CIs) in parentheses. The corresponding heterogeneity statistics are shown in the bottom right corners. Horizontal lines indicate 95% CIs. Studies are numbered (1–14) from bottom to top in descending order of sensitivity

Diagnostic accuracy of MRI for detection of parametrial invasion

For all 14 studies, the pooled sensitivity was 0.76 (95% CI 0.67–0.84) and the pooled specificity was 0.94 (95% CI 0.91–0.95). In the HSROC curve, the 95% confidence and prediction regions differed only slightly, again implying that heterogeneity among the included studies was low (Fig. 4). The area under the HSROC curve was 0.94 (95% CI 0.92–0.96). Deeks’ funnel plot and asymmetry test indicated a low likelihood of publication bias (p = 0.31) (Fig. 5).

Fig. 4

Hierarchical summary receiver operating characteristic curve of the diagnostic performance of MRI for detection of parametrial invasion in cervical cancer. Each numbered circle represents one included study, numbered in descending order of sensitivity as in Fig. 3

Fig. 5

Deeks’ funnel plot. A p-value of 0.31 suggests a low likelihood of publication bias. Each numbered circle represents one included study, numbered in descending order of sensitivity as in Fig. 3. ESS, effective sample size

Heterogeneity exploration using meta-regression and subgroup analyses

The results of the meta-regression analysis are shown in Table 4. Of the variables evaluated, only magnetic field strength, use of DWI and administration of antispasmodic drugs significantly affected heterogeneity (p < 0.01 for all three). Studies using 3-T scanners showed higher sensitivity (0.84 [95% CI 0.76–0.93]) than studies using scanners of 1.5 T or lower (0.66 [95% CI 0.55–0.77]). Specificity was similar in the two groups (0.94 [95% CI 0.91–0.98] vs. 0.94 [95% CI 0.91–0.97]). Studies that used DWI showed higher sensitivity (0.82 [95% CI 0.70–0.94]) and specificity (0.97 [95% CI 0.95–0.99]) than studies that did not (sensitivity 0.72 [95% CI 0.62–0.82]; specificity 0.91 [95% CI 0.89–0.93]). Statistical comparison for antispasmodic drugs was limited, as only one study did not use them [20]. The remaining factors did not significantly affect heterogeneity: study design (p = 0.97), ethnicity (p = 0.17), prevalence of PMI (p = 0.15), coil type (p = 0.11), slice thickness (p = 0.17), T2WI planes (p = 0.42) and inclusion of CE MRI (p = 0.61). SE technique (FSE/TSE vs. SE) was not included as a covariate in the meta-regression, as no study reported using sequences other than FSE/TSE.

Table 4 Results of meta-regression analysis of MRI for detection of parametrial invasion (PMI) in cervical cancer

As some studies used reference standards other than radical hysterectomy, an additional subgroup analysis was restricted to studies that solely used radical hysterectomy as the reference standard. For these eight studies, the pooled sensitivity was 0.73 (95% CI 0.60–0.83) and the pooled specificity was 0.93 (95% CI 0.90–0.95). The area under the HSROC curve was 0.94 (95% CI 0.91–0.96).

Discussion

In our meta-analysis, we assessed the diagnostic performance of MRI for detection of PMI in patients with cervical cancer. The pooled sensitivity and specificity of the included studies were 0.76 (95% CI 0.67–0.84) and 0.94 (95% CI 0.91–0.95), respectively. A subgroup analysis of the studies (n = 8) that solely used radical hysterectomy as the reference standard gave consistent results, with a pooled sensitivity and specificity of 0.73 (95% CI 0.60–0.83) and 0.93 (95% CI 0.90–0.95), respectively. The overall sensitivity of MRI for detecting PMI has not improved substantially compared with the previous meta-analyses. Bipat et al. [7], covering studies published up to 2002, reported a sensitivity of 74%, and Thomeer et al. [8], published in 2013, reported 84%. Furthermore, Bipat et al. [7] found that the publication period (1985–1991 vs. 1992–1997 vs. 1998–2002) did not influence the sensitivity of MRI. The 14 studies in our meta-analysis do not overlap with those in the prior meta-analyses and therefore reflect the performance of MRI using more recent techniques. For instance, FSE or TSE sequences were used in 13 of 14 studies in our meta-analysis, compared with only 17 of 36 studies in the analysis by Thomeer et al. [8]. Likewise, all but one of our studies used 1.5- or 3-Tesla scanners, compared with only 24 of 36 studies in the meta-analysis by Thomeer et al. [8]. The sensitivities of 74–84% reported in the two previous meta-analyses and ours can be considered good. Still, given the rapid advances in MRI technology, it is discouraging that there has been no remarkable improvement over decades.
Nevertheless, the updated performance values of MRI in our study are consistent with previously reported values. They show better diagnostic accuracy than CT (sensitivity of 55%) and clinical examination (sensitivity and specificity of 40% and 93%, respectively). On the basis of these results, MRI should be the preferred modality for detection of PMI in patients with cervical cancer.

One of the strengths of the current meta-analysis is the relatively low degree of heterogeneity between the included studies. The Higgins I² statistics suggested that moderate heterogeneity may be present (I² = 49.29% and 51.16% for sensitivity and specificity, respectively). All other statistical methods, however, indicated that heterogeneity was unlikely. By contrast, substantial heterogeneity was thought to be present among the MRI studies in the earlier meta-analysis by Thomeer et al. [8] (I² = 72.93% and 70.94% for sensitivity and specificity, respectively). The degree of heterogeneity matters because it may affect the general applicability of the results [14, 15]. The small degree of heterogeneity in the current meta-analysis therefore suggests that the good performance of MRI for detection of PMI may be generally applicable. It also provides additional evidence for using MRI as a crucial modality in the FIGO staging system.

Several variables related to MRI technique were evaluated as potential sources of variation in diagnostic performance. Magnetic field strength, use of DWI and administration of antispasmodic drugs were statistically significant factors, whereas coil type, slice thickness, imaging planes and use of CE-MRI were not.

A previous meta-analysis reported that a higher magnetic field strength (≥1.5 T compared with <1.5 T) improved the detection of PMI in cervical cancer [8]. We found that higher field strength still added value even at a higher threshold (3 T vs. <3 T; p < 0.01): pooled sensitivity was 0.84 with 3-T scanners vs. 0.66 with <3-T scanners. We speculate that this improvement reflects several advantages of 3 T over 1.5 T [31]:
- higher spatial resolution;
- greater signal-to-noise ratio for tumour and cervical stroma;
- greater tumour-to-cervical stroma contrast-to-noise ratio.

Studies that used DWI also showed higher pooled sensitivity and specificity than those that did not (sensitivity 0.82 vs. 0.72; specificity 0.97 vs. 0.91; p < 0.01). On T2WI, cervical cancer shows high signal intensity against the low signal intensity of cervical stroma, which gives high contrast for evaluating PMI. DWI may nonetheless add value, as cervical cancer shows higher signal intensity on DWI, with correspondingly lower apparent diffusion coefficient values, than normal cervical stroma [32]. Importantly, no study included in our meta-analysis used DWI alone; in studies using DWI, PMI was assessed by combined evaluation of T2WI and DWI.
DWI by itself suffers from poor spatial resolution and limited anatomical detail, but using it as an adjunct to T2WI may improve diagnostic performance for detecting PMI in cervical cancer. The use of antispasmodic drugs was also a significant factor affecting heterogeneity (p < 0.01). However, only one study reported not using antispasmodic drugs, which limited further statistical analysis, so this effect should be interpreted with caution. Nevertheless, antispasmodic drugs are well known to reduce motion artefacts from bowel peristalsis. For this reason, most members of the European Society of Urogenital Radiology (ESUR) use them [33].

Our meta-analysis had some limitations.

First, it included a relatively small number of articles (n = 14). This was mainly because we included only studies published since 2012, which avoided overlap with previous meta-analyses and gave updated diagnostic performance values. Nevertheless, the pooled estimates showed relatively low heterogeneity.

Second, most studies included few patients with advanced disease: only five (35.7%) of the 14 studies included patients with FIGO stage IIB or higher. Pooling studies that mostly included low- or intermediate-stage disease may have biased sensitivity downward. This bias may have been more pronounced because the five studies that included high-stage disease were relatively small (n = 29–45).

Third, six studies included patients who did not undergo radical hysterectomy. We therefore performed a subgroup analysis of studies that solely used radical hysterectomy as the reference standard, and the results were consistent.

Fourth, when there were multiple readers, we used the performance values of the most experienced reader. However, inter-reader agreement was substantial or almost perfect [34], with kappa values of:
- 0.735 between readers with 20 and 11 years of experience (Yu et al. [29]);
- 0.82 between readers with 14 and 4 years of experience (Shin et al. [26]);
- 0.86 between readers with 9 and 3 years of experience (Park et al. [25]).

Fifth, the ethnicity of each study population was inferred from the country of the institution. This is likely to be generally correct, but a small number of non-Asian patients may have undergone MRI at hospitals in Asia, or vice versa.
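The inter-reader kappa values quoted above are chance-corrected agreement statistics. The Python sketch below computes Cohen's kappa from a hypothetical two-reader 2 × 2 agreement table. It then maps the result to the commonly used Landis and Koch bands, under which 0.61–0.80 is "substantial" and 0.81–1.00 is "almost perfect". Two points are assumptions: that these bands are the ones cited, and the helper names `cohens_kappa` and `landis_koch`.

```python
def cohens_kappa(both_pos, only_r1_pos, only_r2_pos, both_neg):
    """Cohen's kappa for two readers making binary PMI calls (hypothetical counts)."""
    n = both_pos + only_r1_pos + only_r2_pos + both_neg
    p_obs = (both_pos + both_neg) / n  # observed agreement
    r1_pos = (both_pos + only_r1_pos) / n  # reader 1 positive rate
    r2_pos = (both_pos + only_r2_pos) / n  # reader 2 positive rate
    p_exp = r1_pos * r2_pos + (1 - r1_pos) * (1 - r2_pos)  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)


def landis_koch(kappa):
    """Verbal label for a kappa value using the Landis and Koch bands (assumed)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values reported for the included studies
for k in (0.735, 0.82, 0.86):
    print(k, landis_koch(k))
```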

Conclusion

MRI shows good performance for detection of PMI in patients with cervical cancer with a pooled sensitivity of 0.76 and specificity of 0.94. The use of 3-T scanners and DWI may further improve diagnostic performance.