Introduction

Colorectal carcinoma (CRC) is among the most frequently diagnosed cancers and a leading cause of cancer-related mortality [1]. Rectal carcinoma (RC) constituted approximately 29% of newly diagnosed CRC cases between 2012 and 2016 and is the most common type of CRC in individuals below 50 years of age [2]. The prognosis of CRC depends on the tumor’s biology and heterogeneity [3]. Routine preoperative computed tomography (CT) and magnetic resonance imaging (MRI) examinations predict the stage and degree of RCs with varying accuracy [4], which affects clinical decision-making. The subtle information underlying these images may reflect the genetic status of RCs.

Microsatellite instability (MSI) is a crucial biomarker of CRC with prominent diagnostic, prognostic, and predictive significance. MSI status determines whether RCs respond well to immunotherapy, and patients with MSI generally do not benefit from preoperative 5-fluorouracil-based adjuvant therapy [5]. Tumors that show loss of one or more mismatch repair (MMR) proteins upon immunohistochemistry testing are classified as high-MSI (MSI-H) [6], while those with intact MMR proteins are likely to be microsatellite stable or low-MSI (MSS or MSI-L). MSI is detected in approximately 15% of CRC patients and has emerged as a predictor of patient response to adjuvant chemotherapy [7]. MSI tumors, which exhibit clinicopathological characteristics distinct from those of MSS tumors, have been reported to be more prevalent in stage II CRC [8] and to be associated with a better prognosis [9].

Radiomics, which extracts high-throughput quantitative data from conventional images to improve diagnostic and predictive accuracy [10], is gaining considerable attention in medical research. For example, entropy features reflect the invasiveness and heterogeneity of a tumor, texture features represent the appearance of the surface and the distribution of elements, and factor parameters describe the size and shape of the tumor region. Previous studies have demonstrated that radiomics analysis based on CT [11] or MRI [12] can help predict MSI status in CRC. To the best of our knowledge, only a few articles have investigated MR-based [13, 14] and T2WI-based [15] radiomic signatures for predicting the MSI phenotype of RCs. Radiomics models based on contrast-enhanced T1WI and on multiparametric MRI have shown similar performance in predicting MSI status in RCs [16]. In a comparison of different machine learning algorithms, the radiomics model based on a logistic regression algorithm performed best in preoperatively identifying the MSI status of RCs on MRI [17]. However, no CT-based radiomic analysis has been reported in this field. This article therefore aims to develop a non-invasive, reproducible CT-based radiomic approach: an integrative model, constructed and validated on preoperative CT images, that combines clinical and tumoral/peritumoral radiomic features to evaluate the MSI-H status of RCs.

Materials and methods

This retrospective study was conducted with the approval of the Medical Ethics Committee of our hospital (No. 2021QT339) and in conformity with the Declaration of Helsinki. Informed consent was waived for this retrospective study.

Patient selection

From January 2015 to January 2021, a total of 1103 patients with pathologically proven RCs were identified through a search of the surgical database in our hospital. The inclusion criteria required pathological confirmation of RC, including classical adenocarcinoma, mucinous adenocarcinoma, and signet-ring cell carcinoma. All CT examinations were conducted within 2 weeks prior to surgery. Additionally, patients with tumors originating from the rectum to the adjacent sigmoid colon were also recruited. The exclusion criteria comprised patients who received preoperative therapy such as radiation, chemotherapy, or chemoradiotherapy, those with metachronous or recurrent cancer, and those with lesions in the ascending, descending, and sigmoid colon or in the junction of the rectosigmoid. Patients without MSI evaluation were also excluded. Ultimately, a total of 788 patients, consisting of 97 MSI-H and 691 MSS, were retrospectively enrolled in this analysis.

Clinical characteristics of RC patients

Baseline clinical variables for analysis included age, gender, body mass index (BMI), CT-displayed long diameter, tumor location (low RC: lesion within 5 cm of the anal margin; middle RC: lesion 5–10 cm from the anal margin; high RC: lesion more than 10 cm from the anal margin), carcinoembryonic antigen (CEA) with a threshold value of 5.0 μg/L [18], carbohydrate antigen 19-9 (CA19-9) with a threshold value of 37.0 U/mL [18], history of diabetes and hypertension, and liver metastasis. Additionally, tumors originating from the rectosigmoid region and those more than 10 cm from the anal margin were classified as high RC [19].
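The categorical coding of these variables can be illustrated with a minimal Python sketch. The function and constant names are hypothetical and are not taken from the study's software. The handling of boundary values (a lesion at exactly 5 cm is treated as middle RC, and a marker value equal to the threshold is treated as normal) is an assumption, since the text does not specify it.

```python
# Thresholds quoted in the text [18]; the constant names are illustrative.
CEA_THRESHOLD_UG_L = 5.0
CA199_THRESHOLD_U_ML = 37.0


def tumor_location(distance_cm, rectosigmoid=False):
    """Classify RC location from the distance to the anal margin (cm).

    Assumption: a lesion at exactly 5 cm is coded as "middle" and a
    lesion at exactly 10 cm is coded as "middle" as well.
    """
    if rectosigmoid or distance_cm > 10:
        return "high"
    if distance_cm >= 5:
        return "middle"
    return "low"


def is_elevated(value, threshold):
    """Return True if a serum marker exceeds its threshold.

    Assumption: values equal to the threshold are treated as normal.
    """
    return value > threshold


print(tumor_location(7.5))                    # middle
print(is_elevated(6.2, CEA_THRESHOLD_UG_L))   # True
```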

Evaluation of MSI status

Immunohistochemistry was used to test the MMR proteins MLH1, MSH2, MSH6, and PMS2. Tumors lacking one or more MMR proteins were classified as defective mismatch repair (dMMR) and expected to be MSI-H, while those with intact MMR proteins were considered proficient mismatch repair (pMMR) and estimated to be MSS or MSI-L. In accordance with the revised Bethesda guidelines for MSI, the MSI-L type for CRCs was categorized as MSS for clinical purposes [20]. Therefore, our study divided all RC patients into two groups based on the MMR proteins: the MSI-H cohort and the MSS cohort.
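The grouping rule described above amounts to a simple decision function. The following Python sketch is illustrative only: the function name and the input format (a mapping from protein name to whether expression is retained) are assumptions, not part of the study's workflow.

```python
MMR_PROTEINS = ("MLH1", "MSH2", "MSH6", "PMS2")


def classify_msi(ihc):
    """Assign the study group from immunohistochemistry results.

    `ihc` maps each MMR protein name to True if expression is retained
    and False if it is lost (hypothetical input format).
    """
    missing = [p for p in MMR_PROTEINS if p not in ihc]
    if missing:
        raise ValueError("missing IHC result for: " + ", ".join(missing))
    lost = [p for p in MMR_PROTEINS if not ihc[p]]
    # Loss of one or more proteins -> dMMR, treated as MSI-H.
    # All proteins intact -> pMMR, treated as MSS (MSI-L is merged into MSS).
    return "MSI-H" if lost else "MSS"


intact = {"MLH1": True, "MSH2": True, "MSH6": True, "PMS2": True}
print(classify_msi(intact))                       # MSS
print(classify_msi(dict(intact, MLH1=False)))     # MSI-H
```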

CT examination

All 788 RC patients underwent three-phase CT examinations using 64/128-slice CT scanners (Siemens, Somatom Definition AS). The three-phase examination included an unenhanced phase, an arterial phase, and a venous phase, acquired with computer-aided bolus tracking. Contrast medium (iomeprol 350, GE Healthcare) was administered at a rate of 3.0 mL/s and a dose of 1.3 mL/kg. The arterial phase was scanned 35 s after injection, and the venous phase was obtained 25 s later. The venous-phase images were used for radiomic analysis. The scanning parameters were as follows: tube voltage, 120 kV; tube current, 200 mA; field of view, 360 mm; collimation, 64 × 0.625 mm; rotation time, 0.75 s; slice thickness and interval, 5 mm; window width, 300 HU; and window level, 40 HU.

Tumor segmentation and radiomic features selection

The original CT images were obtained from our picture archiving and communication system in DICOM format. After the original images were standardized using the “A.K. 3.0.0” software (Artificial Intelligence Kit, GE Healthcare), the tumoral volumes of interest (VOIs) were manually segmented in “itk-SNAP 3.4.0” (http://www.itksnap.org/) by two radiologists with 7 and 10 years of experience, respectively (Fig. 1a). The peritumoral VOIs were then automatically obtained by expanding the tumor contour by 5 mm (Fig. 1b). Regions of necrosis, intraluminal air, non-invaded rectal wall, vessels, and peri-rectal fat were manually removed from the VOI contours.

Fig. 1 The VOIs were manually segmented in the software of “itk-SNAP.” (a) Tumoral VOI segmentation in the axial image. (b) Peritumoral VOI segmentation in the sagittal image
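The 5 mm peritumoral expansion can be sketched as follows. This is a brute-force, two-dimensional illustration in plain Python, assuming a binary tumor mask and a known pixel spacing; the study worked on three-dimensional VOIs with dedicated software, and the manual removal of air, vessels, and other structures is not modeled here. The function name and arguments are hypothetical.

```python
import math


def peritumoral_ring(mask, spacing_mm, margin_mm=5.0):
    """Return a 2D mask of pixels within `margin_mm` of the tumor but outside it.

    `mask` is a list of rows of 0/1 values (1 = tumor), and `spacing_mm`
    is the (row, column) pixel spacing in millimetres.
    """
    rows, cols = len(mask), len(mask[0])
    row_mm, col_mm = spacing_mm
    tumor = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    ring = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                continue  # tumor pixels are excluded from the ring
            for tr, tc in tumor:
                # Physical Euclidean distance to a tumor pixel
                if math.hypot((r - tr) * row_mm, (c - tc) * col_mm) <= margin_mm:
                    ring[r][c] = 1
                    break
    return ring


# A single tumor pixel in the centre of a 5 x 5 grid with 5 mm pixels
tumor_mask = [[0] * 5 for _ in range(5)]
tumor_mask[2][2] = 1
for row in peritumoral_ring(tumor_mask, spacing_mm=(5.0, 5.0)):
    print(row)
```

For realistic image sizes, a distance transform or morphological dilation from an image-processing library would be far more efficient than this quadratic search.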

The tumoral and peritumoral radiomic features were automatically calculated by the A.K. software. Intraclass correlation coefficients (ICCs) of the radiomic features were calculated to assess interobserver agreement between the two radiologists; all ICCs were greater than 0.75, indicating good agreement [21]. The mean values of the radiomic features from the two radiologists were used for subsequent analysis. Because the sample sizes of the two groups were unbalanced, the synthetic minority over-sampling technique (SMOTE), a straightforward approach for adjusting the ratio between unbalanced groups [22], was used to balance them. The cohort (97 MSI-H and 691 MSS) was randomly partitioned into a training set (68 MSI-H and 484 MSS) and a validation set (29 MSI-H and 207 MSS) at a ratio of 7:3. The CT-based tumoral and peritumoral models were constructed on the training set and tested on the validation set to predict the MSI-H status of RCs. Before analysis, variables with zero variance were excluded, outlier values were replaced by the median, and the data were standardized. Thereafter, variance analysis, correlation analysis, gradient boosting decision tree (GBDT), and multivariate logistic analysis with stepwise selection were performed to select the optimal radiomic features. Tenfold cross-validation was used in both the training and validation cohorts to construct the model with the best performance: the dataset was divided into 10 parts, with 9 parts used as training data and 1 part as test data in each trial, and the average of the correct (or error) rates over the 10 trials was used as an estimate of the accuracy of the algorithm.
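The 7:3 partition and the tenfold cross-validation scheme described above can be sketched in plain Python. This is a minimal illustration, not the study's R code: the function names and the random seed are assumptions, SMOTE and feature selection are not shown, and the text does not state whether the original split was stratified. A stratified split is used here because it reproduces the reported group sizes.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into training and validation sets per class."""
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_fraction)
        train_idx += idx[:n_train]
        valid_idx += idx[n_train:]
    return sorted(train_idx), sorted(valid_idx)


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists: k-1 parts for training, 1 for testing."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


# 1 = MSI-H (97 patients), 0 = MSS (691 patients), as reported in the text
labels = [1] * 97 + [0] * 691
train_idx, valid_idx = stratified_split(labels)
print(sum(labels[i] for i in train_idx), len(train_idx))   # MSI-H count, total
print(sum(labels[i] for i in valid_idx), len(valid_idx))
```

Averaging a per-fold accuracy over the ten `(train, test)` pairs gives the cross-validated estimate described in the text.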

Construction of the prediction models

After the selection of radiomic features and resampling of the data with 100 bootstrap replications, the corresponding logistic models of tumoral radiomics (LM-tRadio), peritumoral radiomics (LM-ptRadio), and tumoral/peritumoral radiomics (LM-Radio) were constructed, and the tumoral radiomic score (t-Radscore), peritumoral radiomic score (pt-Radscore), and tumoral/peritumoral radiomic score (Radscore) were quantified. The Radscore was calculated as a linear combination of the selected radiomic features weighted by their respective coefficients. All models were constructed and validated with tenfold cross-validation.
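The linear combination that defines a Radscore, and its conversion to a predicted probability through the logistic function, can be sketched as follows. The feature names, coefficients, and intercept below are hypothetical placeholders chosen for illustration; they are not values from the study.

```python
import math


def radscore(features, coefficients, intercept=0.0):
    """Linear combination of selected (standardized) features and coefficients."""
    return intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )


def predicted_probability(score):
    """Logistic transform of a linear score into a probability."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients and standardized feature values
coefficients = {"feature_a": 0.8, "feature_b": -0.5}
patient = {"feature_a": 1.0, "feature_b": 2.0}

score = radscore(patient, coefficients, intercept=-1.0)
print(score, predicted_probability(score))
```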

The clinical characteristics were analyzed using the independent t test or χ2 test. An integrative clinical and CT-based tumoral/peritumoral radiomics nomogram (LM-Nomo) incorporating the significant clinical characteristics and the Radscore was then constructed to evaluate the MSI-H status of RC. The area under the receiver operating characteristic (ROC) curve (AUC) was used to assess the performance of each logistic model, and AUCs were compared using the Delong test.
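For reference, the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from this definition. The function name and the example scores are illustrative, and the Delong comparison of two AUCs is not implemented here.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model outputs for two MSI-H and two MSS cases
print(auc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```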

Statistical analysis

Radiomic feature selection and logistic model construction were performed using R software (https://www.r-project.org/). The Hosmer-Lemeshow test (HL-test) was used to evaluate the goodness-of-fit and accuracy of the model. The clinical characteristics were analyzed in SPSS software (https://spss-64bits.en.softonic.com/) using the independent t test or chi-square test. ICCs were used to assess the consistency of VOI segmentation between the two radiologists. The Delong test was carried out in MedCalc software (https://www.medcalc.org/), and the corresponding AUCs and 95% confidence intervals (CIs) were recorded. A two-tailed p value < 0.05 indicated statistical significance.
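The Hosmer-Lemeshow statistic compares observed and expected event counts within groups of patients ordered by predicted probability. The following sketch computes only the statistic, assuming equal-sized groups formed by sorting; the function name and grouping rule are assumptions and may differ from the R implementation used in the study. The p value, which is obtained from a chi-square distribution with the number of groups minus 2 degrees of freedom, is not computed here.

```python
def hosmer_lemeshow_statistic(probs, outcomes, groups=10):
    """Sum over groups of (observed - expected)^2 / (n * mean_p * (1 - mean_p))."""
    pairs = sorted(zip(probs, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        size = len(chunk)
        variance = expected * (1.0 - expected / size)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat


# Hypothetical, perfectly calibrated predictions give a statistic of 0
print(hosmer_lemeshow_statistic([0.5] * 4, [1, 0, 1, 0], groups=2))
```

A small statistic (and hence a non-significant p value) indicates that predicted and observed event rates agree across the groups.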

Results

Baseline clinical and pathological characteristics

The baseline clinical characteristics are presented in Table 1. The MSI-H cohort comprised 97 patients (33 females and 64 males) with a mean age of 64.04 ± 11.01 years and a mean BMI of 22.93 ± 3.01 kg/m², while the MSS cohort comprised 691 patients (256 females and 435 males) with a mean age of 63.32 ± 11.52 years and a mean BMI of 22.93 ± 3.32 kg/m². Among the clinical characteristics, CEA (p = 0.043) and history of hypertension (p = 0.036) differed significantly between the two cohorts. The MSI-H cohort tended to have normal CEA levels (71.1%) and a higher incidence of hypertension (45.4%) compared with the MSS cohort.

Table 1 The baseline clinical characteristics

Performance of the tumoral and peritumoral radiomic model

The LM-tRadio (p value of HL-test was 0.879), containing 55 radiomic features, was developed, and its AUCs in the training set and validation set were 0.708 (95%CI 0.648–0.766) and 0.602 (95%CI 0.515–0.687), respectively. Twenty-five peritumoral radiomic features remained in LM-ptRadio (Fig. 2). The AUCs of LM-ptRadio (p value of HL-test was 0.375) in the training set (Fig. 3a) and validation set (Fig. 3b) were slightly higher than those of LM-tRadio, at 0.724 (95%CI 0.668–0.778) and 0.613 (95%CI 0.514–0.714), respectively. After 41 radiomic features from the venous-phase CT images were retained, the LM-Radio was developed (p value of HL-test was 0.263). Its AUCs of 0.785 (95%CI 0.732–0.837) in the training set and 0.628 (95%CI 0.528–0.723) in the validation set were the highest compared with those of LM-tRadio and LM-ptRadio. The heatmap of LM-Radio in the training set is shown in Fig. 4.

Fig. 2 The coefficients of the 25 peritumoral radiomic features in LM-ptRadio

Fig. 3 The comparison of AUCs of LM-tRadio (yellow line) and LM-ptRadio (green line) in the training (a) and validation (b) sets by Delong test

Fig. 4 The heatmap of LM-Radio in the training set. After GBDT, 41 radiomic features were retained

Performance of the clinical and tumoral/peritumoral radiomics nomogram

The significant clinical characteristics of CEA and hypertension were integrated with the Radscore to constitute the LM-Nomo (Fig. 5). The AUCs of LM-Nomo were 0.796 (95%CI 0.748–0.843) in the training set and 0.679 (95%CI 0.588–0.771) in the validation set. The non-significant HL-test (p = 0.438) indicated good fit of the model.

Fig. 5 The integrative clinical and tumoral/peritumoral radiomics nomogram, including the variables of CEA, hypertension, and Radscore

Discussion

MSI-H is a biomarker for predicting the clinical outcomes of RCs. Unlike MSS CRCs, MSI-H CRCs are associated with abundant lymphocyte infiltration, a poor differentiation pattern, longer postoperative survival, a predominance in the proximal colon [23], and a mucinous or signet-ring cell component [24]. Patients with MSI-H tumors may have a mildly better prognosis than patients with MSS tumors and may not benefit from 5-FU-based chemotherapy [25]. A previous CT-based radiomics study found that a clinical-radiomics nomogram combining clinical risk factors, qualitative imaging data, and radiomics features may effectively predict the MSI status of CRC [26]. Another CT-based radiomics study in CRC found that a radiomics nomogram incorporating radiomics signatures, tumor location, patient age, high-density lipoprotein expression, and platelet counts showed good discrimination of MSI status [11]. An MRI-based radiomics analysis concluded that T2WI and DWI radiomics were significant in predicting the MSI status of RC [27]. Very few published studies have evaluated the clinical and tumoral/peritumoral radiomic differences between MSI-H and MSS status in RCs. Hence, preoperatively predicting MSI-H status from these aspects in RCs could facilitate adjuvant therapy strategies, follow-up monitoring, and management. In this analysis, we focused solely on RCs to reduce the bias introduced by including the ascending, descending, and sigmoid colon. Although the difference is not statistically significant, MSI-H tumors are more common in the right-sided colon than in the left-sided colon and rectum [28]. Heterogeneity in the clinical and radiomic manifestations of MSI-H in RCs is the rule rather than the exception. Clinical characteristics such as CEA and hypertension history, as well as the tumoral and peritumoral radiomics, showed statistical differences. RCs with MSI-H were found to be more likely to have a history of hypertension and normal CEA levels, which emphasizes the importance of medical history. In contrast, the characteristics of location, histological type, and differentiation pattern did not differ significantly from those observed in CRCs.

The peritumoral region immediately surrounding the tumor mass has remained relatively unexplored and may offer unique information that cannot be effectively captured from the bulk of the tumoral parenchyma. The characteristics of peritumoral tissue provide additional information on tumor infiltration and pathological stage, which may affect the therapeutic regimen. This article aimed to calculate the CT-based radiomic features of the tumor and peritumoral tissue and to explore their relationship with MSI-H status, given that CT has been suggested as the most commonly used modality to evaluate RC. After the tumoral and peritumoral CT radiomic features were calculated, the corresponding logistic models for predicting the MSI-H status of RCs were constructed. The performance of the clinical, radiomics, and integrative nomogram models was then quantified as AUC values. To elucidate which factors contribute to a more favorable prediction of MSI-H tumors, clinical characteristics as well as CT radiomic features were analyzed in 788 patients (97 MSI-H and 691 MSS). Since radiological images are closely connected with pathological characteristics [29], the quantitative radiomic features showed potential for predicting the MSI-H status of RCs in our study. Only a few previous publications on peritumoral radiomics of RC have been reported, showing that intratumoral and peritumoral radiomics can help predict the lymph node metastasis status of RCs [19]. This study provides new insights into tumoral and peritumoral radiomics for evaluating the MSI-H status of RC. Interestingly, the predictive efficacy of the simple LM-tRadio and LM-ptRadio was similar in both sets and lower than that of LM-Radio: it was acceptable in the training set, with AUCs of 0.708 and 0.724 compared with 0.785 for LM-Radio, and suboptimal in the validation set, with AUCs of 0.602 and 0.613 compared with 0.628 for LM-Radio. The heatmap of LM-Radio visualized the correlation of the selected CT radiomic features in the data matrix by varying color, which helped us to grasp the research focus and further analyze their differences. Our research explores, for the first time, the value of peritumoral radiomics of CT images in distinguishing MSI-H from MSS status in RC patients.

Therefore, an integrative nomogram comprising clinical characteristics and Radscore could serve as an important non-invasive tool to predict the MSI-H status of RCs. Previous studies have almost exclusively focused on evaluating the MSI-H phenotype of CRCs, without specialized analysis of RCs. Data from the study of YT Cao et al. [29] suggested that the radiomics signature of triphasic enhanced CT was a reliable method to predict MSI in CRCs, and that a clinical-radiomics nomogram including age, location, CEA, and radiomics showed promising predictive performance. Our integrative clinical and tumoral/peritumoral radiomics nomogram, including CEA, hypertension, and Radscore, was the best-performing model for predicting the MSI-H phenotype of RCs, with the highest AUCs of 0.796 (95%CI 0.748–0.843) in the training set and 0.679 (95%CI 0.588–0.771) in the validation set, compared with the simple LM-tRadio, LM-ptRadio, and LM-Radio. The p values of the HL-test were non-significant for all models, indicating good fit.

Despite some strengths, this study had several limitations. First, this retrospective analysis was subject to several biases, including a single-center design, an unbalanced sample size, and limited generalizability. Thus, future multi-center studies are necessary to validate and improve the performance of the predictive nomogram. Second, we only evaluated the tumoral and peritumoral radiomics of venous-phase CT images to predict the MSI-H phenotype of RCs. We chose the venous phase because, after consulting the previous literature [30] and conducting a preliminary analysis of some cases, we found that venous-phase images performed better and were more conducive to delineating the areas of interest of RC lesions. Nevertheless, CT images of the unenhanced and arterial phases may provide additional information to better predict MSI-H status and should be examined in future work. Third, the irregular shape of RCs may lead to bias in manual segmentation, which could affect the radiomic analysis, despite efforts to reduce interobserver differences through ICC calculations. Therefore, an automatic approach to segmenting RCs for radiomic analysis needs to be further explored.

Conclusion

In conclusion, our study demonstrated that an integrative clinical and CT-based tumoral/peritumoral radiomics nomogram including a history of hypertension, CEA levels, and Radscore showed encouraging performance in predicting the MSI-H status of RCs and may provide a non-invasive tool for clinical decision-making. However, additional research on multi-phase CT images and external validation are needed to improve confidence in this nomogram's prediction of the MSI-H status of RC.