Introduction

Colorectal cancer is the second leading cause of cancer-related deaths globally, with rectal cancer alone accounting for one-third of these cases [1]. Extramural venous invasion (EMVI) is defined as the presence of tumor cells in blood vessels located beyond the muscularis propria in the mesorectal fat. Interstitial adipose tissue around the tumor is at an increased risk of vascular invasion, which, in turn, significantly increases the risk of distant metastasis. Hence, EMVI can be considered a significant predictor of both the local recurrence and distant metastasis in rectal cancer [2, 3]. Therefore, EMVI is used for routine assessment and to identify risk stratification indicators in rectal cancer [4]. Accurate identification of the EMVI status is crucial for treatment decisions and prognosis.

Multiparameter magnetic resonance imaging (mpMRI) is the first choice for early noninvasive assessment of rectal cancer and detection of EMVI [5]. However, mpMRI has limited spatial resolution. It cannot be used to accurately diagnose the invasion of vessels smaller than 3 mm [6]. It is also sometimes difficult to distinguish such small blood vessels from small lymphatic vessels and peritoneal reverse folds [7]. In addition, inflammation, edema, and fibrosis may also adversely affect the mrEMVI evaluation [8]. Therefore, visual assessment based on mpMRI alone may not be sufficient to accurately identify EMVI. Hence, there is an urgent need to develop an objective, noninvasive, and accurate method for the preoperative evaluation of EMVI.

Radiomics uses big data mining techniques to analyze the correlation between radiological characteristics and pathological data. Hence, it is a powerful tool to provide decision support in oncology [9, 10]. Radiomics has been successfully used for the diagnosis, treatment, and prognosis of rectal cancer [11]. A recent study has also shown that radiomics is a superior tool to predict the occurrence of EMVI in rectal cancer [12]. However, such studies used only a single sequence. In contrast, mpMRI can often provide more useful information [13]. Additionally, predictive and prognostic models are important in radiomics [14]. Highly accurate and reliable models are needed in clinical practice to improve the decision-making process, which can be achieved through machine learning algorithms [15]. Accordingly, we hypothesize that more valuable radiomics features could be extracted using mpMRI and a new model can be constructed through machine learning for better prediction and stratification of the EMVI state.

The purpose of this study is to apply mpMRI-based radiomics to preoperatively predict the EMVI status of rectal cancer. We used different machine learning algorithms to build the best radiomics signature, and then developed and validated a joint model that combines the radiomics signature with clinical and radiological characteristics.

Materials and methods

Patients

The study design was approved by our institutional ethics committee, and the need for informed consent was waived (No. 2021QT211). For this retrospective study, 1123 patients with rectal cancer confirmed between January 2017 and January 2021 were identified from a picture archiving and communication system (PACS). Among these patients, 317 were finally selected for the study based on the following inclusion criteria: (1) pathologically confirmed rectal cancer; (2) complete clinical and radiological data; (3) no history of other malignant tumors; and (4) no preoperative antitumor treatments. The exclusion criteria were as follows: (1) incomplete pathological data; and (2) poor imaging quality of mpMRI. The patients diagnosed between January 2017 and December 2019 were grouped into a training set (n = 221), which was used to identify robust radiomics features and construct the model. Those diagnosed between January 2020 and January 2021 were grouped into a test set (n = 96) for validating the reliability of the model. A flowchart for patient recruitment is shown in Fig. 1.

Fig. 1

Flowchart for recruitment of patients in this study (Note. PACS, picture archiving and communication system; EMVI, extramural venous invasion; T stage, tumor stage)

Clinical and radiological data

The clinical and radiological data of all patients were retrospectively analyzed from our PACS, including data on age, gender, carcinoembryonic antigen (CEA) level, EMVI status, mpMRI-based radiological tumor (T) stage, lymph node (N) stage, tumor long diameter, transverse diameter, anteroposterior diameter, tumor volume, distance (DIS), circumferential resection margin (CRM), anal canal invasion (ACI), and mrEMVI status [16]. The postoperative pathological tissue was used as the reference standard for EMVI status. These features were independently assessed by two experienced radiologists. For the purpose of this study, the quantitative measurements obtained by the two radiologists were averaged for further analysis. For qualitative parameters, these two radiologists carefully reviewed all the images until a consensus was reached. Detailed information can be found in the Supplementary Materials.

Image preprocessing and segmentation

All patients underwent mpMRI examination, which was performed using a 3.0-T MRI scanner (Skyra; Siemens Healthineers). The image protocols and detailed parameters are provided in Supplementary Materials and Table S1. Noncommercial A.K. Software (Analysis Kit, GE Healthcare) was used for image preprocessing and registration of T2-weighted imaging (T2WI), T1-weighted imaging (T1WI), diffusion-weighted imaging (DWI), and enhanced T1-weighted imaging (T1 + C) sequences before extracting the features in order to reduce the potential influence of the parameters of a scanning scheme. The axial T2WI sequences were acquired on an oblique-axial plane perpendicular to the rectal axis. Image preprocessing was performed by resampling the images to a resolution of 1 × 1 × 1 mm³ through the linear interpolation method and by discretizing and normalizing the image gray level to 32 gray levels. Then, the registration function of the A.K. Software was used with T2WI as the template for rigid registration of all sequences to ensure that the four sequences shared the same resolution, spacing, and origin. The standardized T2WI images were imported into the ITK Software to segment the entire rectal tumor layer by layer to determine the volume of interest (VOI). Because the sequences were registered, T1WI, DWI, and T1 + C could share the same VOI obtained from T2WI. Finally, the VOIs were imported into A.K. Software for feature extraction.
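The gray-level discretization step can be illustrated with a minimal sketch. It assumes a fixed number of equal-width bins over the intensity range of the VOI; the exact scheme implemented in A.K. Software is not described in the text, so the function name `discretize` and the sample intensities are hypothetical.

```python
def discretize(intensities, n_levels=32):
    """Map raw intensities to integer gray levels 1..n_levels (equal-width bins)."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        # A constant region carries no contrast; place everything in level 1.
        return [1 for _ in intensities]
    width = (hi - lo) / n_levels
    levels = []
    for value in intensities:
        # Floor into a bin; the maximum intensity is clamped into the top bin.
        level = int((value - lo) / width) + 1
        levels.append(min(level, n_levels))
    return levels


# Made-up voxel intensities inside a VOI (illustration only).
voi = [12.0, 80.5, 143.2, 255.0, 12.0, 200.1]
print(discretize(voi))
```

Fixing the number of gray levels makes texture matrices comparable in size across patients and sequences, which is one common motivation for this step.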

Extraction and selection of radiomics features

We extracted 378 radiomics features from each sequence, including 42 histogram features, 10 Haralick features, 9 form factor features, 126 gray-level co-occurrence matrix features, 180 run-length matrix features, and 11 gray-level size zone matrix features. Because four sequences were acquired for each patient, this yielded 1512 radiomics features per patient. To ensure the stability and accuracy of the radiomics features, tumor segmentation was performed manually and independently by two radiologists (radiologist A and radiologist B) using ITK software, yielding feature set A (from radiologist A) and feature set B (from radiologist B). Spearman’s rank correlation test was used to calculate the correlation coefficient (CC) of each feature in sets A and B. Features with CC > 0.8 were considered robust.
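The robustness filter can be sketched as follows, under stated assumptions. Spearman's coefficient is computed as the Pearson correlation of average ranks, and a feature is kept when the coefficient between the two radiologists' values exceeds 0.8. The feature names and values are made up, and `robust_features` is a hypothetical helper rather than the authors' code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant feature: treated here as non-robust (assumption)
    return cov / (var_x * var_y) ** 0.5


def robust_features(set_a, set_b, threshold=0.8):
    """Keep features whose inter-reader Spearman CC exceeds the threshold."""
    return [name for name in set_a
            if spearman(set_a[name], set_b[name]) > threshold]


# One value per patient, per reader (made-up numbers).
set_a = {"feat_stable": [1.0, 2.0, 3.0, 4.0, 5.0],
         "feat_unstable": [1.0, 2.0, 3.0, 4.0, 5.0]}
set_b = {"feat_stable": [1.1, 2.3, 2.9, 4.2, 5.5],
         "feat_unstable": [3.0, 1.0, 5.0, 2.0, 4.0]}
print(robust_features(set_a, set_b))
```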

Construction and validation of radiomics signatures

Dimension reduction of the robust features selected was performed using the training set. More information can be found in the Supplementary Materials. On the basis of the retained features, we selected five machine learning algorithms—logistic regression (LR), support vector machine (SVM), Bayes, k-nearest neighbor (KNN), and random forests (RF)—to construct radiomics signatures. To select the best machine learning algorithm, we used the relative standard deviation (RSD) and Bootstrap method to quantify the stability of the five algorithms. The machine learning algorithm with the minimum RSD value was selected as the best algorithm to construct the radiomics signature. Detailed information about the RSD can be found in the Supplementary Materials. Finally, to quantify the signature discriminability, a machine learning score of each patient was calculated using the radiomics signature model. This result reflected the possibility of EMVI and was defined as the RAD score.
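As a rough illustration of the stability comparison, the sketch below computes the RSD of bootstrapped AUCs. It assumes the conventional definition RSD = 100 × SD / mean; the exact procedure is given only in the Supplementary Materials and may differ. As a simplification, it resamples fixed prediction scores rather than refitting each algorithm on every bootstrap sample. All names and data are hypothetical.

```python
import random
from statistics import mean, stdev


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_rsd(labels, scores, n_rounds=200, seed=0):
    """Relative standard deviation (%) of the AUC over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_rounds:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # AUC is undefined unless both classes are present
        aucs.append(auc(y, [scores[i] for i in idx]))
    return 100.0 * stdev(aucs) / mean(aucs)


# Made-up labels and scores for two hypothetical signatures.
labels = [0, 0, 0, 0, 1, 1, 1, 1] * 5
scores_by_model = {
    "model_a": [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9] * 5,
    "model_b": [0.1, 0.6, 0.3, 0.7, 0.2, 0.4, 0.8, 0.9] * 5,
}
rsd = {name: bootstrap_rsd(labels, s) for name, s in scores_by_model.items()}
most_stable = min(rsd, key=rsd.get)  # smallest RSD is taken as the most stable
print(rsd, most_stable)
```

A lower RSD means the AUC varies less across resamples, which is the sense of stability used to choose among the candidate algorithms.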

Model construction and evaluation

Multivariate logistic regression analysis and backward stepwise selection method with the stopping rule based on Akaike’s information criterion (AIC) were conducted to select independent predictors from clinical and radiological variables, based on which a joint model was built. To verify the improvement in the performance of the model after including the radiomics signature, we used the selected independent predictors to construct different combined models. We used the area under the receiver operating characteristic (ROC) curves (AUC) to evaluate the performance of different models. In addition, we used the DeLong test to determine the difference between the joint model and other combined models. To assess the clinical efficacy of the joint model, we developed a visual nomogram to calculate the probability of EMVI for each patient. Finally, we used the Hosmer–Lemeshow test to analyze the goodness-of-fit of the nomogram and employed the calibration curve to visually assess the consistency between the predicted and actual EMVI probabilities. We used the optimal cut-off value corresponding to the Youden index of the ROC curve as the threshold. Using the EMVI probability of each patient, we grouped patients with negative mrEMVI into high-risk and low-risk groups and compared their pathological EMVI results. The construction and evaluation of different models are shown in Fig. 2.
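The cut-off selection based on the Youden index can be sketched as follows. The index is J = sensitivity + specificity − 1, and the threshold that maximizes J is chosen. The sketch assumes that a case is called positive when its predicted probability is at or above the candidate threshold and that ties in J are broken in favor of the lowest threshold; neither convention is specified in the text. The data are made up.

```python
def youden_threshold(labels, probs):
    """Return (threshold, sensitivity, specificity) maximizing J = sens + spec - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(probs)):
        # True positives: EMVI-positive cases predicted at or above the threshold.
        tp = sum(1 for l, p in zip(labels, probs) if l == 1 and p >= t)
        # True negatives: EMVI-negative cases predicted below the threshold.
        tn = sum(1 for l, p in zip(labels, probs) if l == 0 and p < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1:]


# Made-up pathological labels (1 = EMVI-positive) and predicted probabilities.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
print(youden_threshold(labels, probs))
```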

Fig. 2

Construction and evaluation of different models

Statistical analysis

Statistical analyses were performed with SPSS software (version 24.0), MedCalc software (version 11.2), and Python (version 3.5). The continuous variables were compared by performing a two-sample t-test or the Mann–Whitney U test, and the categorical variables were compared by a chi-square test. The ROC curves were used to evaluate the predictive performance of different models. The metrics included AUC, sensitivity, and specificity. All statistical tests were two-sided, and statistical significance was set at p < 0.05.

Results

Characteristics of the patients

No statistical differences were noted between the clinical and radiological characteristics of the training and test sets (p = 0.159–0.904), as shown in Table 1. In contrast, statistical differences were observed in CEA, mrEMVI, and tumor stage of both the EMVI and non-EMVI groups in the training and test sets (p < 0.05). In addition, we observed statistical differences between the EMVI and non-EMVI groups in terms of the lymph node (p = 0.017) and CRM (p < 0.001) status in the test set. The characteristics of the patients in the EMVI and non-EMVI groups in the training and test sets are detailed in Table 2.

Table 1 Clinical and radiological characteristics of patients in the training and test sets
Table 2 Characteristics of patients in the EMVI and non-EMVI groups in the training and test sets

Performance of radiomics signatures

Of the 1512 radiomics features in the training set, we selected 793 as robust features and used them for dimension reduction. Finally, 20 features were retained from the four sequences—T2WI (n = 4), T1WI (n = 6), DWI (n = 7), and T1 + C (n = 3)—to construct the radiomics signatures. These features have been detailed in the Supplementary Materials. The RSD values of the radiomics signatures based on LR, SVM, Bayes, KNN, and RF were 2.72, 8.13, 2.43, 7.46, and 11.01, respectively. Therefore, Bayes was chosen as the machine learning algorithm for constructing the radiomics signature in this study. The Bayes-based radiomics signature performed well in both the training and test sets, with the AUCs of 0.744 and 0.738, sensitivities of 0.754 and 0.728, and specificities of 0.887 and 0.918, respectively, as shown in Fig. 3.

Fig. 3

Density distribution of the area under the receiver operating characteristic (AUC) curve of radiomics signatures constructed by five machine learning algorithms (a) and diagnostic efficiency of the Bayes-based radiomics signature for predicting extramural venous invasion in the training set (b) and the test set (c)

Model construction and comparison

CEA (p = 0.043), mrEMVI (p = 0.039), transverse diameter (p = 0.003), and radiomics signature (p < 0.001) were determined using multivariate logistic regression analysis and selected as independent predictors (Table 3) to construct the joint model and develop the visual nomogram (Fig. 4a). Calibration curves of the joint model for predicting the EMVI demonstrated good agreement with the ideal curve in both the training set (Fig. 4b) and the test set (Fig. 4c). The Hosmer–Lemeshow test yielded no significant difference between the predictive calibration curve and the ideal curve for EMVI prediction in the training set (p = 0.745). In addition, the joint model performed better than MRI in predicting the EMVI, as shown in Fig. 5.

Table 3 Results of univariate and multivariate logistic regression analyses
Fig. 4

Visual nomogram based on the joint model (a). Calibration curves of the joint model for predicting extramural venous invasion in the training set (b) and the test set (c), which demonstrated good agreement with the ideal curve (Note. mrEMVI, MRI-based extramural vascular invasion; CEA, carcinoembryonic antigen)

Fig. 5

Two cases with rectal cancer and positive extramural venous invasion (EMVI) at histopathologic examination. a–d One case had obvious EMVI on MRI, which could also be detected by the joint model. e–h In the other case, EMVI was detected by the joint model but was not apparent on MRI

The radiological model was constructed using mrEMVI and transverse diameter; the clinicoradiological model was constructed using CEA, mrEMVI, and transverse diameter; and the clinicoradiomic model was constructed using CEA and radiomics signature. Among the four different models, the joint model performed best with the AUCs of 0.839 and 0.835, sensitivities of 0.633 and 0.714, and specificities of 0.901 and 0.885 in the training and test sets, respectively. The DeLong test showed that the joint model was statistically different from the other three combined models in both the training and test sets (p < 0.05), highlighting the improved predictive performance of the radiomics signature, as shown in Table 4 and Fig. 6a, b.

Table 4 Predictive performance of different models in the training and test sets
Fig. 6

Diagnostic performance for extramural venous invasion prediction of different models in the training (a) and test (b) sets. Negative mrEMVI patients were divided into high-risk and low-risk cases according to the nomogram. The probability of pathological EMVI in the high-risk group was significantly higher than that in the low-risk group in both the training (c) and test (d) sets. (Note. * means p < .05; EMVI, extramural venous invasion; mrEMVI, MRI-based extramural vascular invasion)

The optimal threshold of 0.5545 was selected in the ROC analysis based on the Youden Index of the joint model for patients with negative mrEMVI. Patients with model scores of > 0.5545 were predicted as high-risk cases, while those with scores of < 0.5545 were predicted as low-risk cases. The number of patients with a positive pathological EMVI status in the low-risk and high-risk groups in both the training and test sets was significantly different, indicating the clinical applicability of the nomogram, as shown in Fig. 6c, d.
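The stratification rule can be sketched as below. It uses the reported threshold of 0.5545 and applies it only to patients with negative mrEMVI. The text does not say how a score exactly equal to the threshold is handled; the sketch assigns it to the low-risk group as an assumption. The patient records and field names are hypothetical.

```python
THRESHOLD = 0.5545  # optimal cut-off reported in the text


def stratify(patients, threshold=THRESHOLD):
    """Split mrEMVI-negative patients into risk groups; return pEMVI-positive rates."""
    groups = {"high": [], "low": []}
    for p in patients:
        if p["mr_emvi"]:
            continue  # stratification applies only to mrEMVI-negative patients
        key = "high" if p["prob"] > threshold else "low"
        groups[key].append(p["path_emvi"])
    # Proportion of pathologically EMVI-positive patients per group (None if empty).
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}


# Made-up records: mrEMVI status, joint-model probability, pathological EMVI.
patients = [
    {"mr_emvi": False, "prob": 0.80, "path_emvi": 1},
    {"mr_emvi": False, "prob": 0.60, "path_emvi": 0},
    {"mr_emvi": False, "prob": 0.30, "path_emvi": 0},
    {"mr_emvi": False, "prob": 0.10, "path_emvi": 0},
    {"mr_emvi": True, "prob": 0.90, "path_emvi": 1},
]
print(stratify(patients))
```

Comparing the two proportions, for example with a chi-square test, corresponds to the group comparison described above.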

Discussion

We compared mpMRI-based radiomics for preoperatively predicting the EMVI status of rectal cancer using five machine learning algorithms and observed that the Bayes-based radiomics signature performed well. Our results showed that radiomics can be used for predictions, which further validated Zech et al.’s proposal that the EMVI can be used as a new imaging biomarker for the prognosis of rectal cancer [3]. In addition, the joint model showed significantly improved prediction performance for EMVI. The nomogram can also provide good classification and recognition for patients with the negative mrEMVI status, which may be used as a convenient and accurate tool to identify and predict the EMVI.

The mrEMVI assessment mainly depends on the scanning technique and subjective imaging analysis [17]. However, radiologists often find it difficult to perform, which may lead to misdiagnosis of EMVI [18]. Our study also confirmed this phenomenon. The radiological model based on mrEMVI showed poor performance in the preoperative prediction of EMVI. In contrast, the joint model demonstrated the best diagnostic performance in both the training set (AUC = 0.839) and the test set (AUC = 0.835). Encouragingly, the joint model can also afford good classification and recognition in patients with negative mrEMVI, further demonstrating its superiority, which may also be due to its high specificities (90.1% and 88.5% in the training and test sets, respectively). The superior performance of the joint model is attributable to the inclusion of the radiomics signature. In fact, the radiomics signature itself outperformed the radiological model in the test set (AUC: 0.738 vs 0.647). The application of a machine learning algorithm and multiparameter radiomics features was the main reason for the better diagnostic performance of the radiomics signature.

Machine learning algorithms have been widely applied in the field of radiomics and have greatly improved diagnostic performance [19]. LR is currently the most widely used machine learning algorithm because of its simplicity. Although the diagnostic efficiency of LR was higher than that of Bayes, the latter was more stable than the former in our study. The stability of the radiomics model is also highly important for its clinical application. Therefore, we chose Bayes to construct the radiomics signature. In addition, multiparameter features may contain more information, which allows for a more comprehensive characterization of the tumor [20]. The sensitivity and specificity of the radiomics signature for EMVI prediction observed in our study are significantly higher than those reported by Roberto et al. using DWI and T2WI [21], which further demonstrates the great potential of multiparameter radiomics features.

Previous studies have identified the tumor size, location, T staging, and N staging as the risk factors for EMVI [22, 23]. In contrast, our study showed that only the transverse diameter, which is related to the tumor size, can be used as an independent predictor of EMVI. This result indirectly showed that the transverse diameter may help in determining the impact of tumor size on prognosis, further validating the findings of Yoshimoto et al. that the tumor diameter can be used as a prognostic indicator of colorectal cancer [24]. Our study also showed that CEA is an independent predictor of EMVI. CEA, a large glycoprotein, has been proposed as a prognostic biomarker that can be used to determine the prognosis and stage of colorectal cancer [25, 26]. Although mrEMVI is still the most favorable independent predictor, it is important to note that the radiological model based on mrEMVI and transverse diameter performed poorly in predicting EMVI, perhaps owing to the inherent defects of the visual subjective imaging analysis. Despite this drawback, it cannot be denied that mrEMVI is still one of the important contents routinely used in mpMRI-based imaging analysis [27].

A previous study [28] has reported that the computed tomography (CT)–based superior hemorrhoidal vein diameter had a better discrimination power in predicting the EMVI (AUC = 0.83, sensitivity = 88.2%, specificity = 94.6%) than our joint model. However, our model may be more suitable for clinical practice than CT, because the latter causes radiation damage. Aysegul et al. used changes in the diameters of the superior rectal vein (SRV) and inferior mesenteric vein (IMV), and apparent diffusion coefficient (ADC) values for EMVI prediction [29]; their AUC values were 0.851, 0.893, and 0.664, respectively. Although the AUCs of CT-based SRV and IMV were higher than those of our joint model, the AUC of the MRI-based ADC value was significantly lower than that of our model. In addition, the specificities of our joint model (90.1% and 88.5% in the training and test sets, respectively) are significantly higher than those reported by Aysegul et al. (67.9%, 71.4%, and 57.1% for SRV, IMV, and ADC, respectively). A previous study showed that functional imaging such as DWI cannot improve the efficiency of EMVI detection [30]. Yu et al. combined DCE and clinical-pathological factors to construct a radiomics model to predict EMVI with an AUC of 0.904, sensitivity of 90.5%, and specificity of 79.2% [12]. Although this result was significantly higher than that reported in the present study, it should be noted that the AUC was only 0.812 in the test set, indicating the poor stability of the model used by Yu et al. The AUC of our joint model reached 0.835 in the test set, probably because of its higher stability. In fact, the radiomics signature showed greater variability than the clinical characteristics [31]. We used a stable machine learning algorithm and multiparameter radiomics features to overcome the variability in the radiomics signature of our study, which may provide a new idea for conducting future radiomics research in clinical practice.

In addition, our joint model has good clinical applicability, because the model construction is mainly based on noninvasive mpMRI examination and image analysis. The other feature used in the model, CEA, is measured by a routine test performed for each hospitalized patient. As the test cost is low, it is suitable for clinical development. We also built a nomogram based on the joint model, which enables clinicians to quantify the EMVI status of patients more conveniently and quickly. Therefore, our study provides a reliable, convenient, and rapid tool to predict the EMVI status in clinical practice.

Our research has some limitations. First, it is a retrospective study. However, the eligible patients were consecutively retrieved from a prospective database that included all patients with rectal cancer in our hospital. Second, the pathological evaluation could not be checked for consistency because of the retrospective design. Nevertheless, the pathologic EMVI status used for training the radiologist was generally reliable, although it was difficult to ensure that the pathologic EMVI status was correct for every patient. Third, this research lacks external validation because of data confidentiality, which we aim to resolve in our future work. Finally, there is a discrepancy between the number of EMVI-positive and EMVI-negative patients. However, it did not have any effect on our results, because the proportion of EMVI-positive and EMVI-negative patients is approximately the same in both groups.

Conclusion

This study showed that mpMRI-based radiomics can improve the diagnostic performance of preoperative EMVI prediction in patients with rectal cancer, especially for inexperienced radiologists and residents. The visual nomogram based on the radiomics signature is a useful tool that may greatly reduce misdiagnoses caused by radiologists' inexperience. The study results also provide important evidence for the potential use of the joint model for risk stratification of rectal cancer in the future.