Introduction

Cervical cancer is the fourth most common cause of cancer-related mortality in women worldwide, and squamous cervical cancer (SCC) accounts for 75–80% of cases [1]. With the widespread adoption of cancer screening, the incidence of early-stage (IB-IIA) SCC (ESSCC) has increased dramatically during the past decades [1, 2]. Radical hysterectomy with bilateral pelvic lymph node dissection is the cornerstone of treatment for ESSCC [3]. Despite treatment with curative intent, 25–30% of patients with ESSCC experience locoregional recurrence or distant metastasis [2]. Therefore, accurate prognosis prediction is essential for choosing the optimal surgical plan, postoperative adjuvant therapy, and intensity of postoperative surveillance.

The International Federation of Gynecology and Obstetrics (Figo) staging system is the most commonly used tool for evaluating cervical cancer prognosis in clinical practice [1, 4]. However, Figo staging is easily affected by the examining physician’s experience and does not account for the pathological characteristics of the tumor. Pathological characteristics such as tumor differentiation, lymph node metastasis, and lymphovascular space invasion (LVI) have been identified as adverse prognostic factors for cervical cancer in previous studies [5, 6]. Accordingly, the revised 2018 Figo staging system incorporates pathological findings into disease staging [4]. However, pathological information can only be obtained through invasive procedures and is susceptible to sampling error and interobserver variability. Therefore, noninvasive prognostic indicators that provide quantifiable risk measures for patients with cervical cancer are needed.

Given its excellent soft-tissue contrast resolution, magnetic resonance imaging (MRI) is the first-line imaging modality for diagnosing, staging, and monitoring cervical cancer [7, 8]. In addition to detailed anatomical information, MRI can reflect microscopic tumor characteristics through functional sequences such as contrast-enhanced T1-weighted imaging (CET1WI) and diffusion-weighted imaging (DWI). A meta-analysis revealed that DWI and its derived apparent diffusion coefficient (ADC) may be suitable candidate indicators of recurrence in patients with cervical cancer [9]. However, MRI-derived quantitative metrics are usually expressed as average values, which ignore intratumoral heterogeneity.

Radiomics is a mathematical-statistical approach that extracts high-throughput quantitative features from medical images and enables noninvasive profiling of tumor heterogeneity [10, 11]. In cervical cancer, previous studies have suggested that radiomics may be a promising tool for predicting tumor stage, LVI, and lymph node status [12,13,14,15]. Fang et al reported an MRI-derived radiomics score as a prognostic biomarker in patients with cervical cancer [16]; however, only T2-weighted imaging (T2WI) and CET1WI were used, and functional DWI and ADC maps were not incorporated. Moreover, previous studies usually extracted radiomics features from the entire tumor volume and ignored peritumoral information, although peritumoral radiomics has shown considerable value in tumor assessment [17, 18].

Therefore, this study aimed to develop multiparametric MRI-derived radiomics models based on multi-scale tumor regions for predicting disease-free survival (DFS) in patients with ESSCC.

Materials and methods

Study population

Our institutional review board approved this retrospective study, and the requirement for written informed consent was waived. A total of 346 consecutive patients with suspected cervical cancer who underwent preoperative pelvic MRI from March 2016 to September 2019 were retrospectively identified. The inclusion criteria were as follows: (1) radical hysterectomy and bilateral pelvic lymph node dissection; (2) ESSCC (Figo stage IB-IIA) confirmed by postoperative pathology; (3) no history of chemotherapy or radiotherapy before the MRI examination; (4) image quality adequate for analysis; and (5) complete clinical data. Finally, 191 patients (mean age, 50 years; range, 24–73 years) were included in the study. The patients were allocated to a training set and a validation set according to the date of surgery at a ratio of approximately 7:3. The training set consisted of 135 patients (mean age, 50 years; range, 24–73 years) who underwent surgery from March 2016 to August 2018, whereas the validation set consisted of 56 patients (mean age, 50 years; range, 32–69 years) who underwent surgery from September 2018 to September 2019. Detailed demographic characteristics are summarized in Table 1.
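The temporal allocation can be expressed as a simple date cutoff. The following is a minimal sketch assuming a hypothetical list of (patient ID, surgery date) records; the cutoff of 1 September 2018 follows from the periods stated above, and the toy records are invented for illustration, not study data.

```python
from datetime import date

# Boundary implied by the stated periods: training until August 2018,
# validation from September 2018 onward.
CUTOFF = date(2018, 9, 1)

def temporal_split(patients, cutoff=CUTOFF):
    """Split (patient_id, surgery_date) pairs by surgery date.

    Patients operated on before `cutoff` go to the training set and the rest
    to the validation set; both lists are returned in chronological order.
    """
    ordered = sorted(patients, key=lambda record: record[1])
    training = [pid for pid, surgery_date in ordered if surgery_date < cutoff]
    validation = [pid for pid, surgery_date in ordered if surgery_date >= cutoff]
    return training, validation


# Hypothetical toy records (not study data)
toy = [("P3", date(2018, 9, 3)), ("P1", date(2016, 5, 2)),
       ("P4", date(2019, 7, 15)), ("P2", date(2018, 8, 30))]
train_ids, valid_ids = temporal_split(toy)
print(train_ids, valid_ids)  # ['P1', 'P2'] ['P3', 'P4']
print(round(135 / 191, 3))   # training fraction reported above: 0.707
```

Unlike a random split, a date cutoff keeps every validation patient later in time than every training patient, which is why the resulting ratio (135:56) is only approximately 7:3.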

Table 1 Characteristics of patients with early-stage cervical cancer in the training and validation sets

Postoperative follow-up

All of the patients underwent radical hysterectomy and bilateral pelvic lymph node dissection. Thirty-two patients (training set, n = 22; validation set, n = 10) received postoperative adjuvant chemoradiotherapy. The patients were routinely followed up every 3–6 months during the first 2 years, every 6 months during years 3–5, and once a year thereafter. DFS was defined as the period from the date of surgery to the date of first locoregional recurrence or distant metastasis, or to the date of last follow-up for patients without such an event. Locoregional recurrence and distant metastasis were identified by gynecological examination and cross-sectional imaging, such as computed tomography (CT), MRI, and positron emission tomography/CT, or were proven by biopsy [1, 2].
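Under this definition, each patient contributes a follow-up time and an event indicator: a recurrence or metastasis is an event, and otherwise the patient is censored at last follow-up. A minimal sketch, assuming per-patient dates are available; the month conversion (365.25/12 days per month) is a convention chosen here, not one stated in the study.

```python
from datetime import date

# Average month length in days; this conversion convention is an assumption.
DAYS_PER_MONTH = 365.25 / 12

def dfs_record(surgery, first_event=None, last_follow_up=None):
    """Return (DFS time in months, event indicator) for one patient.

    first_event: date of first locoregional recurrence or distant metastasis.
    If it is None, the patient is censored at last_follow_up (event = 0).
    """
    if first_event is not None:
        end, event = first_event, 1
    elif last_follow_up is not None:
        end, event = last_follow_up, 0
    else:
        raise ValueError("an event date or a last follow-up date is required")
    months = (end - surgery).days / DAYS_PER_MONTH
    return round(months, 1), event


print(dfs_record(date(2017, 1, 1), first_event=date(2018, 1, 1)))     # (12.0, 1)
print(dfs_record(date(2016, 3, 1), last_follow_up=date(2019, 3, 1)))  # (36.0, 0)
```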

MRI acquisitions and parameters

All of the patients were examined on a 1.5-T MRI scanner (Achieva, Philips Healthcare) with a 16-channel torso matrix coil. For contrast-enhanced imaging, a gadolinium-based contrast agent (Omniscan, GE HealthCare) was administered intravenously at a dose of 0.1 mL/kg and a rate of 2.5 mL/s, followed by a 20-mL saline flush. Detailed MRI scan parameters are listed in Table S1.

Clinical and pathological information

Clinical information, including age, routine blood test results (erythrocyte, neutrophil, and platelet counts), human papillomavirus (HPV) status, and serum gynecological tumor markers (CEA, CA125, CA199, and squamous cell carcinoma antigen (SCCA)), was obtained from medical records [1]. Age was dichotomized separately at 50 and 55 years to obtain binary variables. Tumor size was defined as the longest diameter of the tumor, measured where the tumor was largest. Pathological information included tumor differentiation, lymph node metastasis, LVI, and depth of invasion. The number of lymph node metastases was dichotomized with a cutoff of two. Figo stage was determined according to the 2018 Figo staging guidelines [4].

Multi-scale tumor segmentation and radiomics feature extraction

Tumor segmentation was performed by reader 1 (with 6 years of experience in gynecological MRI) and confirmed by reader 2 (with 8 years of experience in gynecological MRI). Both readers were blinded to clinical information and pathological results.

The workflow of our study is shown in Fig. 1. Tumor segmentation was performed semi-automatically with syngo.via Frontier Radiomics (Siemens Healthcare) on axial oblique T2WI, DWI, and ADC maps and on sagittal CET1WI [19]. The software enables semi-automatic segmentation of the entire tumor volume of interest (VOIentire) through an integrated hybrid algorithm that combines a threshold-based approach with model-based morphological processing [19, 20]. The method consists of three steps: (1) identification of points on the tumor boundary by region growing and subsequent ray casting from the seed point; (2) calculation of an ellipsoid that approximates the shape of the tumor; and (3) computation of the convex hull of the inverse mask within the dilated ellipsoid [20]. Then, regions extending 5 mm outward (VOI+5 mm) and 5 mm inward (VOI−5 mm) from the tumor surface were automatically generated with the software’s dilate and erode tools, which apply morphological dilation and erosion, respectively, to VOIentire. Finally, three types of VOIs were generated (VOIentire, VOI+5 mm, and VOI−5 mm), and the contour of each VOI was locally refined where necessary, mainly for tumors with ill-defined boundaries. The process of multi-scale tumor segmentation is depicted in Fig. 2.
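The dilation and erosion step can be approximated outside the vendor software with a Euclidean distance transform, which measures the 5-mm margin in millimetres even for anisotropic voxels. This is a minimal sketch assuming a binary NumPy mask and known voxel spacing; it is not the syngo.via implementation, and because the text does not say whether VOI+5 mm is the whole dilated volume or only the peritumoral shell, the function returns both.

```python
import numpy as np
from scipy import ndimage

def multiscale_vois(tumor_mask, spacing_mm, margin_mm=5.0):
    """Derive VOI+margin, VOI-margin and the peritumoral shell from VOIentire.

    tumor_mask : 3D boolean array (VOIentire)
    spacing_mm : voxel spacing per axis in mm, so the margin is measured in mm
                 even when voxels are anisotropic (e.g., 5-mm slices)
    """
    mask = np.asarray(tumor_mask, dtype=bool)
    # Distance (mm) from every voxel outside the tumor to the nearest tumor voxel
    dist_outside = ndimage.distance_transform_edt(~mask, sampling=spacing_mm)
    # Distance (mm) from every tumor voxel to the nearest non-tumor voxel
    dist_inside = ndimage.distance_transform_edt(mask, sampling=spacing_mm)

    voi_plus = mask | (dist_outside <= margin_mm)   # dilation by margin_mm
    voi_minus = mask & (dist_inside > margin_mm)    # erosion by margin_mm
    shell = voi_plus & ~mask                        # peritumoral ring only
    return voi_plus, voi_minus, shell


# Toy example: a sphere of radius 10 voxels on a 1-mm isotropic grid
zz, yy, xx = np.mgrid[:41, :41, :41]
sphere = (zz - 20) ** 2 + (yy - 20) ** 2 + (xx - 20) ** 2 <= 100
plus, minus, ring = multiscale_vois(sphere, spacing_mm=(1.0, 1.0, 1.0))
print(minus.sum(), sphere.sum(), plus.sum())
```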

Fig. 1

The study flowchart and the radiomics workflow

Fig. 2

VOI segmentation for the radiomics analysis. A radiologist first segmented the whole tumor region (VOIentire) semi-automatically with the help of the software (red line). On the basis of VOIentire, the regions 5 mm outside (VOI+5 mm, blue line) and 5 mm inside (VOI−5 mm, green line) the tumor boundary surface were automatically reconstructed using the object operations module integrated in the software

After tumor segmentation, radiomics features were automatically computed from each VOI using syngo.via Frontier Radiomics, which interfaces with the PyRadiomics library [21]. Radiomics features, including 17 shape-based features, 18 first-order statistical features, and 75 textural features, were extracted from the original images. The textural features comprised five classes: 24 Gray Level Co-occurrence Matrix (GLCM), 14 Gray Level Dependence Matrix (GLDM), 16 Gray Level Size Zone Matrix (GLSZM), 16 Gray Level Run Length Matrix (GLRLM), and 5 Neighbouring Gray Tone Difference Matrix (NGTDM) features. Detailed information on the texture features is listed in Table S2. The default parameter settings are available online (https://github.com/AIM-Harvard/pyradiomics/tree/master/examples/exampleSettings). To obtain high-throughput features, non-linear voxel intensity transforms (square, square root, logarithm, and exponential), Laplacian of Gaussian (LoG) filtering with sigma values of 0.5, 1.5, 2.5, 3.5, and 4.5 mm, and eight wavelet decompositions (LLL, LLH, LHL, LHH, HLL, HLH, HHL, and HHH) were applied before extraction of the first-order statistical and textural features (Table S3). All extracted radiomics features were standardized with z scores to eliminate the impact of different gray-value ranges.
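The feature bookkeeping and the z-score step can be made concrete with a short sketch. The count below only multiplies the numbers stated above, assuming every filtered image contributes all 18 first-order and 75 texture features while the 17 shape features are computed once; the resulting total per sequence and VOI is therefore an inferred upper bound, not a reported value. Fitting the mean and SD on the training set only is likewise an assumption, since the text does not say which data supplied them.

```python
import numpy as np

# Per-sequence, per-VOI bookkeeping implied by the counts stated above
N_SHAPE = 17                        # computed once, on the original image
N_FIRST_ORDER = 18
N_TEXTURE = 24 + 14 + 16 + 16 + 5   # GLCM + GLDM + GLSZM + GLRLM + NGTDM = 75
N_IMAGE_TYPES = 1 + 4 + 5 + 8       # original + 4 intensity transforms + 5 LoG sigmas + 8 wavelet bands

def feature_count():
    return N_SHAPE + (N_FIRST_ORDER + N_TEXTURE) * N_IMAGE_TYPES

def zscore_fit(train):
    """Column-wise mean and SD from the training matrix (patients x features)."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    sd = train.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0               # constant features map to 0 instead of dividing by 0
    return mean, sd

def zscore_apply(x, mean, sd):
    """Standardize any matrix (training or validation) with the training statistics."""
    return (np.asarray(x, dtype=float) - mean) / sd


print(feature_count())  # 1691
train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])
mean, sd = zscore_fit(train)
print(zscore_apply([[7.0, 10.0]], mean, sd))  # [[2. 0.]]
```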

Radiomics feature selection and calculation of the three radiomics scores

A two-step procedure was applied for high-dimensional radiomics feature selection. First, 75 patients were randomly selected for test–retest analysis, and the intraclass correlation coefficient (ICC) was calculated to evaluate feature reproducibility; features with an ICC higher than 0.8 were retained for subsequent analysis. Second, least absolute shrinkage and selection operator (LASSO) Cox regression was applied for radiomics feature selection and construction of radiomics scores (rad-scores). LASSO Cox regression shrinks the coefficients of uninformative features to zero, with the regularization parameter λ selected by tenfold cross-validation [16]. Rad-scores were then constructed as a linear combination of the finally selected robust features, weighted by their respective coefficients.
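The text does not state which ICC form was used. The sketch below assumes the two-way random-effects, absolute-agreement, single-measurement ICC(2,1) of Shrout and Fleiss, computed feature by feature from the test and retest values and thresholded at 0.8.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: array of shape (n_subjects, k_measurements), e.g. test and retest
    values of one feature for each patient.
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)   # between-subject
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)   # between-measurement
    ss_total = np.sum((y - grand) ** 2)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

def reproducible_features(test, retest, threshold=0.8):
    """Indices of feature columns whose test-retest ICC exceeds `threshold`."""
    test, retest = np.asarray(test, float), np.asarray(retest, float)
    keep = []
    for j in range(test.shape[1]):
        if icc_2_1(np.column_stack([test[:, j], retest[:, j]])) > threshold:
            keep.append(j)
    return keep


test = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
retest = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])
print(reproducible_features(test, retest))  # [0]
```

In the toy data, feature 1 is retested with a constant offset; absolute-agreement ICC penalizes that systematic shift (ICC = 10/13 ≈ 0.77), so the feature is dropped, whereas a consistency-type ICC would have kept it.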

Because three types of VOIs were used, three radiomics scores (rad-score (VOIentire), rad-score (VOI+5 mm), and rad-score (VOI−5 mm)) were built separately with the procedure described above.
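The LASSO Cox step and the linear rad-score can be sketched as follows; in the study this would be repeated once per VOI. This is a minimal sketch on simulated data. The study used glmnet, which fits the penalized Cox model by coordinate descent with λ chosen by tenfold cross-validation; here plain proximal gradient descent (ISTA) with a fixed, hypothetical λ is swapped in, and its λ scale need not match glmnet’s.

```python
import numpy as np

def _risk_matrix(time):
    # at_risk[i, j] = 1 if patient j is still at risk at patient i's time (t_j >= t_i)
    time = np.asarray(time, dtype=float)
    return (time[None, :] >= time[:, None]).astype(float)

def cox_gradient(beta, X, at_risk, event):
    """Gradient of the negative Cox log partial likelihood (Breslow form)."""
    eta = X @ beta
    w = np.exp(eta - eta.max())                 # shift for stability; cancels in the ratio
    denom = at_risk @ w                         # sum of weights over each risk set
    mean_x = (at_risk @ (w[:, None] * X)) / denom[:, None]
    ev = np.asarray(event, dtype=bool)
    return -np.sum(X[ev] - mean_x[ev], axis=0)

def lasso_cox(X, time, event, lam, step=0.1, n_iter=1000):
    """L1-penalized Cox regression by proximal gradient descent (ISTA).

    Minimizes (negative log partial likelihood) / n + lam * ||beta||_1.
    Assumes standardized features; `step` may need to shrink otherwise.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    at_risk = _risk_matrix(time)
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - step * cox_gradient(beta, X, at_risk, event) / n
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return beta

def rad_score(X, beta):
    """Rad-score: linear combination of the features weighted by their coefficients."""
    return np.asarray(X, dtype=float) @ beta


# Simulated data (not study data): only feature 0 affects the hazard
rng = np.random.default_rng(0)
n = 300
X = rng.standard_normal((n, 3))
t_event = rng.exponential(np.exp(-X[:, 0]))     # hazard = exp(1.0 * x0)
t_censor = rng.exponential(2.0, size=n)
time = np.minimum(t_event, t_censor)
event = (t_event <= t_censor).astype(int)
beta = lasso_cox(X, time, event, lam=0.01)
print(np.round(beta, 2))
```

A large λ drives every coefficient to exactly zero, which is how LASSO removes features; the surviving nonzero coefficients define the rad-score.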

Establishment, performance, and validation of clinical and combined models

Together with the three constructed radiomics scores, clinical predictors of recurrence that were statistically significant in univariable Cox regression analysis were entered into multivariable Cox regression analysis. Backward stepwise variable selection was implemented with the Bayesian information criterion (BIC) [22]. The variance inflation factor (VIF) was checked for each variable included in the final combined model to assess multicollinearity [23]. A clinical model including only the clinical risk factors was also built for comparison.
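The VIF check can be illustrated with ordinary least-squares regressions of each predictor on the others. This is a minimal sketch; the text gives no VIF cutoff, and commonly quoted rules of thumb (such as VIF above 5 or 10) are conventions rather than values from this study.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2) for each column j of X (patients x predictors).

    R_j^2 comes from an ordinary least-squares regression of column j on all
    other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


a = np.array([-1.0, -1.0, 1.0, 1.0])
b = np.array([-1.0, 1.0, -1.0, 1.0])
c = a + np.array([0.0, 0.0, 0.0, 0.1])   # nearly a copy of a
print(np.round(variance_inflation_factors(np.column_stack([a, b])), 3))  # [1. 1.]
print(np.round(variance_inflation_factors(np.column_stack([a, b, c])), 1))
```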

DFS probabilities were estimated using the Kaplan–Meier method and compared using the log-rank test [24]. The prognostic performance of the models was assessed and compared using the concordance index (C-index) [25]. Time-dependent receiver operating characteristic (t-ROC) analysis, with the areas under the curve at different time points, was applied to assess prognostic accuracy [26]. Calibration curves were used to evaluate the agreement between predicted and observed DFS. Internal validation was performed in the independent validation set.
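Harrell’s C-index, the performance measure reported for each model, can be computed directly from follow-up times, event indicators, and model risk scores (higher score meaning higher predicted risk). The sketch below is a plain O(n²) version that skips pairs with tied times; the Hmisc routines used in the study may handle ties differently.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when the patient with the shorter time (i) had
    an event; it is concordant when that patient also has the higher risk
    score. Ties in risk count as 0.5.
    """
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue                     # a censored patient cannot anchor a pair
        for j in range(len(time)):
            if time[j] > time[i]:        # j outlived i, so i should have the higher risk
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


print(harrell_c_index([1, 2, 3, 4], [1, 0, 1, 1], [0.9, 0.2, 0.5, 0.5]))  # 0.875
```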

Clinical utility

The clinical usefulness of the constructed models was evaluated by decision curve analysis, which quantifies the net benefit at different threshold probabilities in the entire cohort [27]. The y-axis represented the net benefit, calculated as the proportion of true positives minus the proportion of false positives weighted by the odds of the threshold probability (i.e., benefits minus harms). The x-axis represented the threshold probability, i.e., the predicted probability of a positive clinical outcome (locoregional recurrence, distant metastasis, or death). “All” and “none” referred to the reference strategies that assume all or none of the patients had a positive clinical outcome [28].
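The net benefit on the y-axis follows the standard decision-curve definition, NB = TP/n − FP/n × p_t/(1 − p_t), where p_t is the threshold probability. The sketch assumes a binary outcome at a fixed horizon and ignores censoring, which is a simplification; the rmda package used in the study provides its own implementation.

```python
import numpy as np

def net_benefit(y_true, prob, threshold):
    """Net benefit of intervening in patients whose predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    y_true: 1 = outcome occurred (e.g., recurrence), 0 = not.
    """
    y = np.asarray(y_true).astype(bool)
    treat = np.asarray(prob) >= threshold
    n = y.size
    tp = np.sum(treat & y)
    fp = np.sum(treat & ~y)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy assuming every patient has the outcome ("all")."""
    prevalence = np.mean(np.asarray(y_true).astype(bool))
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
    # The "none" strategy always has a net benefit of 0.


y = [1, 1, 0, 0, 0]                  # hypothetical outcomes
prob = [0.9, 0.4, 0.6, 0.1, 0.2]     # hypothetical predicted risks
print(round(net_benefit(y, prob, 0.3), 4))       # 0.3143
print(round(net_benefit_treat_all(y, 0.3), 4))   # 0.1429
```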

Statistical analysis

Statistical analyses were conducted using SPSS (version 23.0, SPSS), MedCalc (version 15.0, MedCalc), and R (The R Foundation: http://www.r-project.org; version 3.4.3). The t test or Mann–Whitney U test was used to compare numerical variables (age, erythrocyte count, neutrophil count, platelet count, serum levels of gynecological tumor markers, tumor size, rad-scores, and DFS time) between groups. The chi-square test was used to compare categorical variables (age categorized by 50, age categorized by 55, HPV, Figo stage, differentiation, lymph node metastasis, number of lymph node metastases, LVI, depth of invasion, and additional chemotherapy) between groups. LASSO Cox regression was implemented using the “glmnet” and “rms” packages. Kaplan–Meier survival analysis was performed using the “survival” package, and survival curves were compared using the log-rank test. The “Hmisc” package was used to compare C-indexes. The “rmda” package was used for decision curve analysis. A two-sided p value less than 0.05 was considered statistically significant.

Results

Clinical characteristics

Detailed baseline characteristics of the patients are shown in Table 1. There were no significant differences in any of the clinical and pathological characteristics between the training set and the validation set (p range, 0.093 to 0.999). The median duration of the follow-up was 29 months (interquartile range, 16–40 months).

Radiomics feature selection and construction of the three radiomics scores

In total, 1187 T2WI features, 971 DWI features, 1054 ADC features, and 1092 CET1WI features with an ICC higher than 0.8 were retained. The reproducibility of the extracted features is depicted in Fig S1. Based on the multi-scale tumor segmentation, three radiomics scores (rad-score (VOIentire), rad-score (VOI+5 mm), and rad-score (VOI−5 mm)) were built separately through LASSO Cox regression analysis. Rad-score (VOIentire) contained three T2WI features, one DWI feature, two ADC features, and three CET1WI features. Rad-score (VOI+5 mm) contained one T2WI feature, two DWI features, one ADC feature, and four CET1WI features. Rad-score (VOI−5 mm) contained two T2WI features and one ADC feature. The process of LASSO Cox regression is shown in Fig S2. The formulae of the three radiomics scores are detailed in Appendix 1. Detailed information regarding the finally selected individual radiomics features in each of the three radiomics scores is shown in Table S4.

DFS prediction performance of the three radiomics scores

All three radiomics scores showed favorable performance in DFS prediction, with C-index values above 0.700 in both the training set and the validation set. Rad-score (VOI+5 mm) performed best, with a C-index of 0.750 (95% CI: 0.658, 0.842) in the training set and 0.839 (95% CI: 0.696, 0.983) in the validation set. Two representative ESSCC patients with similar clinicopathological features but distinctly different DFS times (49.6 months vs 13.8 months) showed significantly different rad-score (VOIentire) (−1.9 vs 1.3; p < 0.001), rad-score (VOI+5 mm) (−1.7 vs 2.5; p < 0.001), and rad-score (VOI−5 mm) (−2.3 vs 0.6; p < 0.001) values (Fig. 3).

Fig. 3

Two representative cases of ESSCC patients with similar clinicopathological features but distinctly different DFS times (49.6 months vs 13.8 months) showed significantly different rad-score (VOIentire) (−1.9 vs 1.3; p < 0.001), rad-score (VOI+5 mm) (−1.7 vs 2.5; p < 0.001), and rad-score (VOI−5 mm) (−2.3 vs 0.6; p < 0.001)

The performance of the finally selected individual radiomics features in the three radiomics scores for predicting 2-year DFS probability is shown in Table S4. The distributions of the three rad-scores and their correlations with DFS are shown in Figure S3. The performance of the three radiomics scores in predicting DFS and 2-year DFS probability is shown in Table S5.

Development, performance, and validation of clinical and combined models

Three clinicopathological characteristics (age categorized by 55, Figo stage, and LVI) and the three radiomics scores (rad-score (VOIentire), rad-score (VOI+5 mm), and rad-score (VOI−5 mm)) were predictive of recurrence in univariable Cox analysis (Table 2). In backward stepwise multivariable Cox analysis based on the lowest BIC, age categorized by 55, Figo stage, LVI, and rad-score (VOI+5 mm) were identified as independent predictors of recurrence (Table 2).

Table 2 Univariable and multivariable Cox regression analyses for association with DFS

Age categorized by 55, Figo stage, and LVI were used to build the clinical model. For DFS prediction, the clinical model achieved a C-index of 0.709 (95% CI: 0.624, 0.784) in the training set and 0.719 (95% CI: 0.583, 0.831) in the validation set. The combined model was constructed by combining these three clinicopathological characteristics with rad-score (VOI+5 mm) and was presented as a nomogram (Fig. 4a). The combined model demonstrated good predictive performance, with a C-index of 0.815 (95% CI: 0.739, 0.876) in the training set and 0.866 (95% CI: 0.748, 0.942) in the validation set. The combined model outperformed the clinical model in DFS prediction in both the training set and the validation set (p = 0.024 and p = 0.001, respectively). The performance of the clinical and combined models in predicting DFS and 2-year DFS probability is summarized in Table 3 and shown in Fig S4. The performance of the combined model in predicting DFS at multiple time points is detailed in Table 4. Time-dependent ROC curves of the combined model for DFS prediction are depicted in Fig. 4b–c. Calibration curves confirmed good calibration of the combined model (Fig. 4d–e).

Fig. 4

The combined model was constructed and presented as a nomogram (a). To use the nomogram, project the value of each of the four variables vertically onto the points scale at the top, sum the resulting points, and locate the sum on the total points scale; then project vertically from the total points scale to the risk scales at the bottom to obtain the probability of DFS. The linear predictor scale shows the corresponding value of the model’s linear predictor, which is converted into the probability of DFS through the corresponding transformation function. The graphs show the areas under the time-dependent ROC curves of the nomogram for DFS in the training set (b) and validation set (c). The calibration curves demonstrated good calibration of the nomogram in the training set (d) and validation set (e)
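For a Cox model such as the combined model, the transformation from the linear predictor (LP) to a DFS probability at horizon t is S(t | x) = S0(t)^exp(LP), where S0(t) is the baseline survival corresponding to LP = 0 under whatever centering the fitted model uses. A minimal sketch; the baseline survival of 0.90 below is a hypothetical value, not taken from the study’s model.

```python
import math

def dfs_probability(linear_predictor, baseline_survival):
    """Cox model: S(t | x) = S0(t) ** exp(LP).

    linear_predictor: sum of coefficient * covariate for one patient (the value
        read off the nomogram's linear predictor scale).
    baseline_survival: S0(t), the DFS probability at horizon t for LP = 0.
    """
    return baseline_survival ** math.exp(linear_predictor)


s0_2yr = 0.90   # hypothetical 2-year baseline DFS, not a study value
for lp in (-1.0, 0.0, 1.0):
    print(lp, round(dfs_probability(lp, s0_2yr), 3))
```

Because exp(LP) grows with the total points, a higher point total always maps to a lower predicted DFS probability, which is why the bottom risk scale of a nomogram is monotonic but non-linear.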

Table 3 Predictive performance of clinical and combined model in predicting DFS and 2-year DFS probability
Table 4 Predictive performance of the combined model in predicting DFS at multiple time points

Kaplan–Meier analysis

Kaplan–Meier analysis was applied to further validate the prognostic value of the selected clinicopathological and radiomics characteristics. Age > 55 years, Figo stage IIA, presence of LVI, and a high-risk rad-score (VOI+5 mm) were all significantly associated with worse DFS in the entire cohort (p = 0.001, p = 0.001, p = 0.001, and p < 0.001, respectively). The Kaplan–Meier cumulative event curves for ESSCC recurrence stratified by age categorized by 55, Figo stage, LVI, and rad-score (VOI+5 mm) are depicted in Fig. 5.
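The product-limit estimate and the two-group log-rank comparison used for these curves can be sketched in a few lines. The rule that assigns patients to the high-risk rad-score group is not stated here, so the sketch takes any boolean grouping as given; it is a plain illustration of the standard formulas, not the survival package’s code.

```python
import numpy as np
from scipy.stats import chi2

def kaplan_meier(time, event):
    """Distinct event times and the Kaplan-Meier survival estimate just after each."""
    time, event = np.asarray(time, float), np.asarray(event, bool)
    t_events = np.unique(time[event])
    surv, s = [], 1.0
    for t in t_events:
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return t_events, np.array(surv)

def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p value) with 1 df."""
    time = np.asarray(time, float)
    event, group = np.asarray(event, bool), np.asarray(group, bool)
    o_minus_e, variance = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & group).sum()
        d = ((time == t) & event).sum()
        d1 = ((time == t) & event & group).sum()
        o_minus_e += d1 - d * n1 / n                 # observed minus expected in group 1
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / variance
    return stat, chi2.sf(stat, df=1)


times, surv = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
print(times, np.round(surv, 2))  # [1. 2. 3.] [0.8 0.6 0.3]
print(logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 0, 0]))
```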

Fig. 5

The Kaplan–Meier cumulative event curves for survival in ESSCC stratified by age categorized by 55 (a), Figo stage (b), LVI (c), and rad-score (VOI+5 mm) (d)

Clinical practice

The combined model showed a positive net benefit across a wide range of threshold probabilities. At threshold probabilities of 0.286 and 0.500, the net benefit was 0.500 and 0.387, respectively. At threshold probabilities above 0.103, the combined model provided a greater net benefit than the strategies assuming that all or none of the patients had a positive clinical outcome, and a greater net benefit than the clinical model (Fig. 6). This indicates the favorable clinical utility of the combined model in helping clinicians decide on adjuvant treatment (chemotherapy) for patients with ESSCC according to their risk of recurrence.

Fig. 6

Decision curve analysis for the clinical model and the combined model in predicting DFS. The y-axis represents the net benefit, and the x-axis represents the threshold probability. The decision curves indicated that at threshold probabilities above 0.103, the combined model (green line) provided a greater net benefit than the strategies assuming that all (gray line) or none (black line) of the patients had a positive clinical outcome, and a greater net benefit than the clinical model (red line)

Discussion

In this study, we developed multiparametric MRI-derived radiomics models based on multi-scale tumor regions to predict DFS in patients with ESSCC. Encouragingly, the combined model, which incorporated radiomics features with clinicopathological characteristics, exhibited better prognostic performance and clinical usefulness than the clinical model alone. Presented as a nomogram, the combined model can easily be used to predict survival, thereby facilitating the choice of initial surgery, postoperative adjuvant radiotherapy, and the intensity of postoperative surveillance.

Radiomics has been recognized as a useful tool for predicting survival in patients with cervical cancer [16, 29]. Fang et al [16] validated an MRI-derived radiomics score as a prognostic biomarker in patients with early cervical cancer. Lucia et al confirmed PET/CT and MRI radiomics as independent predictors of recurrence in locally advanced cervical cancer treated with chemoradiotherapy [29]. However, previous studies performed tumor segmentation entirely manually, an approach limited by poor inter-reader reliability. In addition, feature extraction was based on the entire tumor volume, and peritumoral information was ignored. Instead of manual delineation, we performed tumor segmentation semi-automatically using an integrated hybrid algorithm, which is less susceptible to variability in hand-drawn contours. Apart from the entire tumor volume (VOIentire), the regions 5 mm outside (VOI+5 mm) and 5 mm inside (VOI−5 mm) the tumor surface were also reconstructed to obtain multi-scale tumor information. Consistent with other studies [16, 29], all three rad-scores showed favorable performance in DFS prediction. These results indicate that radiomics may be a promising tool for predicting survival in patients with ESSCC. Notably, rad-score (VOI+5 mm) performed best in both the training set and the validation set and was identified as an independent predictor of survival in multivariable Cox regression analysis. These results suggest that radiomics features from the peritumoral region, which may reflect biological properties associated with intratumoral heterogeneity [17, 18], provide additional information for predicting cervical cancer prognosis.

In addition to features from conventional structural images, radiomics features from DWI, ADC, and CET1WI were analyzed in our study. In total, eight radiomics features (one extracted from T2WI, two from DWI, one from the ADC map, and four from CET1WI) were identified as predictive of DFS in rad-score (VOI+5 mm). The shape-based feature MajorAxisLength measures the largest axis length of the ellipsoid enclosing the VOI [21]. A larger MajorAxisLength indicates a larger tumor, consistent with previous studies showing that larger tumors have a poorer prognosis [5, 7]. The texture-based features (LargeAreaHighGrayLevelEmphasis, GrayLevelNonUniformity, DependenceNonUniformity, DependenceVariance, GrayLevelNonUniformity, LargeAreaHighGrayLevelEmphasis, and ZoneVariance) describe the spatial distribution of voxel signal intensities, which is related to tumor heterogeneity [21]. Our results agree with previous studies reporting that intratumoral heterogeneity is closely correlated with invasion and metastasis of ESSCC [12, 16, 29]. Notably, half of the selected features (4 of 8) were derived from CET1WI, suggesting that intratumoral and peritumoral enhancement features, which are potentially associated with tumor perfusion and vascularization, may provide additional prognostic information for patients with ESSCC [30, 31].

By combining the clinical risk factors and radiomics features, we established a radiomics nomogram containing age categorized by 55, Figo stage, LVI, and rad-score (VOI+5 mm) to predict DFS in patients with ESSCC. Consistent with previous studies [4,5,6], older age, higher Figo stage, and the presence of LVI were adverse prognostic factors. Notably, the radiomics nomogram showed favorable predictive ability and provided significant incremental value over the clinical model, indicating the prognostic value of multiparametric MRI-derived radiomics in predicting DFS for patients with ESSCC.

The current study had several limitations. First, this study was performed at a single center, and the sample size was relatively small; multi-center studies with larger sample sizes are needed. Second, 32 of 191 patients received additional chemotherapy after hysterectomy, which may have influenced survival. Third, some acquisition parameters differed from the current European recommendations [7], which may have introduced bias. The slice thickness of T2WI and DWI was 5 mm, which is thicker than recommended (4 mm or less), and the fields of view of T2WI (240 × 240 mm) and DWI (380 × 380 mm) were not matched, which may have hindered side-by-side interpretation. Fourth, dynamic contrast-enhanced MRI was not performed, although its derived parameters may help identify hypoxic tumor regions, which are closely correlated with tumor migration, invasion, and metastasis [31]; this may limit the generalizability of our results. Fifth, tumor biological behavior was not analyzed in depth or correlated with radiomics features; the biological interpretation of radiomics features should be addressed in future radiogenomics studies. Sixth, deep learning methods were not applied because of the limited sample size. Future studies combining deep learning and radiomics are needed to better predict DFS in patients with ESSCC.

In conclusion, multiparametric MRI-derived radiomics based on multi-scale tumor regions is useful for predicting DFS in patients with ESSCC and may thereby facilitate clinical decision-making.