Introduction

Rectal cancer is a common cancer with high mortality and morbidity rates [1]. Mesorectal lymph node metastasis (LNM) is a primary route of metastasis in rectal cancer and an important prognostic factor affecting local recurrence and overall survival [2]. The National Comprehensive Cancer Network rectal cancer guidelines recommend accurate evaluation of lymph node (LN) invasion to guide surgical and individualized treatment strategies [3]. Therefore, preoperative clarification of LN status is essential for managing rectal cancer.

In clinical practice, magnetic resonance imaging (MRI) is the preferred method for staging rectal cancer [4]. However, because uniform criteria for defining LN status are lacking, previous studies have shown that MRI prediction of the N stage of rectal cancer is unreliable, whether it is based on short-axis diameter or on morphological features [5, 6]. Therefore, more powerful and sensitive diagnostic tools are urgently needed to improve the diagnostic accuracy for LNM.

In recent years, radiomics research has attracted much attention with the development of artificial intelligence. As an emerging field of translational research, radiomics converts medical images into high-throughput quantitative features or biomarkers for tissue characterization and constructs predictive models through machine learning, which can be used for tumor diagnosis, staging, treatment response prediction, and prognosis [7, 8]. Several colorectal cancer studies have applied radiomics to predict LN status and provide a basis for clinical decision-making, implying that radiomics has great potential for predicting LN status [9,10,11,12,13,14].

However, because pathological ground truth for individual LNs is usually lacking, most studies have extracted imaging features of the primary tumor to predict LNM indirectly, and few have explored the features of the LNs themselves. Therefore, this study prospectively investigated the performance of T2-weighted imaging (T2WI)-based texture features of LNs, combined them with clinical risk factors to develop and validate radiomics models, and assessed their value for the preoperative prediction of LNM.

Methods

This methodological study was based on “Prospective study for the accuracy of imaging in the diagnosis of mesangial lymph node staging in rectal cancer.” The Medical Ethics Committee of our hospital approved the study, and written informed consent was obtained from all patients before surgery. From October 2021 to October 2022, patients with histopathologically verified rectal cancer underwent rectal MRI before surgery. The inclusion criteria were: (1) rectal carcinoma located ≤ 10 cm above the anal verge; (2) radical surgery scheduled within two weeks after rectal MRI; and (3) between 1 and 15 LNs identified on preoperative imaging. The exclusion criteria were: (1) incomplete imaging or clinical data; and (2) preoperative neoadjuvant chemoradiotherapy. A total of 74 eligible patients with rectal cancer were finally enrolled (Fig. 1).

Fig. 1 Flowchart of patient selection. pN+, any pathologic tumor stage with lymph node positivity; pN-, any pathologic tumor stage with lymph node negativity

MR imaging protocol

MR imaging was performed with a 3 T MAGNETOM Skyra MR scanner (Siemens Healthineers, Malvern, PA, USA) using an 18-channel body coil. All patients received an intravenous antiperistaltic agent (10 mg raniscopolamine hydrochloride) 30 min before MRI for bowel preparation. The high-resolution rectal MRI protocol comprised turbo spin-echo sagittal, oblique coronal, and oblique axial T2-weighted imaging (T2WI) and diffusion-weighted imaging (DWI). The scan parameters for the oblique axial T2WI sequence were as follows: repetition time/echo time, 6890/100 ms; slice thickness, 3 mm; voxel size, 0.3 × 0.3 × 3 mm; field of view (FOV), 180 mm; matrix, 384 × 346; slices, 48; averages, 3; total scanning time, 5 min 5 s; parallel acquisition technique with generalized autocalibrating partial parallel acquisition acceleration factor. The oblique axial DWI sequence was a transverse echo-planar imaging diffusion sequence with a highest b value of 1000 s/mm². The same FOV, slice thickness, and gap were used in DWI to match the tumor position on the oblique axial T2WI. The total scan time was 30 min. All Digital Imaging and Communications in Medicine (DICOM) images were retrieved from our hospital's picture archiving and communication system (PACS) for image segmentation and analysis.

Radiologic–pathologic node-by-node comparison

All rectal MRI images were analyzed preoperatively by a radiologist with more than 15 years of experience in interpreting rectal imaging studies. Following the method described in previous reports, the radiologist used anatomic correlation on T2WI to identify regional LNs and drew them on anatomical maps to obtain accurate histology-imaging correlation [15]. All patients underwent total mesorectal excision (TME) within 2 weeks of the imaging examination. Fresh specimens were sent to the pathology laboratory for macroscopic examination. A pathologist with 15 years of experience sectioned each specimen transversely at 3 mm intervals, guided by a 3 mm ruled template. LNs were carefully identified on each slice, and the anatomical map was used as a template for node-by-node correspondence. LNs were reported as benign or malignant according to the microscopy results.

Clinical and imaging data collection

The clinical data included patient age, sex, carcinoembryonic antigen (CEA), and carbohydrate antigen 19–9 (CA19-9) levels. The histological grades were obtained from pathological reports.

For the MRI assessment, two radiologists experienced in reading rectal MRI, blinded to the histological information, independently reviewed the complete MR image sets. The tumor diameter, infiltration depth, MRI-detected extramural vascular invasion (MRI-EMVI), and MRI-assessed circumferential resection margin (MRI-CRM) were determined. Their findings were then reviewed with a senior radiologist for the final judgment.

Radiomics workflow

The radiomics workflow is illustrated in Fig. 2, and each step is described in detail below.

Fig. 2 The workflow of radiomics in this study

Image data acquisition and segmentation

The MRI DICOM image files of the matched LNs were collected, and the region of interest (ROI) on oblique axial T2WI was traced manually along the largest cross-sectional area of each LN using 3D Slicer version 4.11.0 (Harvard University, Boston, MA, USA), excluding adjacent air, vessels, fat, and normal tissues. All ROIs were delineated independently by two experienced colorectal surgeons.

Feature extraction and selection

Slicer-radiomics (an extension of the 3D Slicer program that wraps the PyRadiomics library to compute radiomic features) was used for texture feature extraction. Seven classes of raw features were extracted from each ROI: morphological features (Shape, 14), gray level co-occurrence matrix (GLCM, 24), gray level dependence matrix (GLDM, 14), first-order statistics (FoS, 18), gray level run length matrix (GLRLM, 16), gray level size zone matrix (GLSZM, 16), and neighboring gray tone difference matrix (NGTDM, 5), for a total of 107 texture features. Supplementary Table 1 summarizes the details of the features. The intraclass correlation coefficient (ICC) was used to assess the intra-observer and inter-observer agreement of feature extraction. Features with ICC > 0.70 were considered repeatable and retained for analysis. Before modeling, features were rescaled to the range [0,1] using min–max normalization, and irrelevant or redundant features were eliminated using the minimum redundancy maximum relevance (mRMR) method. The least absolute shrinkage and selection operator (LASSO) was then applied to select the optimal features, with the penalty parameter lambda tuned by ten-fold cross-validation to avoid over-fitting. The selected features, weighted by their LASSO coefficients, were linearly combined to construct the radiomics score (Rad-score) and the radiomics model.
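The repeatability filter and the rescaling step can be sketched as follows. This is a minimal illustration in Python rather than the software used in the study, and the function names are hypothetical. The ICC form shown, a two-way random-effects, absolute-agreement, single-measurement ICC(2,1), is an assumption, because the text does not state which form was used.

```python
from statistics import mean


def min_max_scale(values):
    """Rescale one feature column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per ROI and one column per reader.

    Assumed form: two-way random effects, absolute agreement, single rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    # Mean squares for ROIs (rows), readers (columns), and residual error.
    ms_rows = k * sum((r - grand) ** 2 for r in row_means) / (n - 1)
    ms_cols = n * sum((c - grand) ** 2 for c in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def keep_repeatable(feature_ratings, threshold=0.70):
    """Return the names of features whose ICC exceeds the threshold."""
    return [
        name
        for name, table in feature_ratings.items()
        if icc_2_1(table) > threshold
    ]


# Hypothetical toy values: feature "a" agrees perfectly between two readers,
# feature "b" carries a constant offset of 1 between readers.
toy = {
    "a": [[1, 1], [2, 2], [3, 3]],
    "b": [[1, 2], [2, 3], [3, 4]],
}
print(keep_repeatable(toy))
print(min_max_scale([2, 4, 6]))
```

In the toy data, the constant offset between readers lowers the absolute-agreement ICC of feature "b" below 0.70, so only "a" is retained.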

Model construction and validation

LNs were randomly allocated to training and validation groups in a 7:3 ratio while preserving the proportion of malignant and benign LNs in both groups. With the pathology report as the gold standard, the Wilcoxon test was first performed for all clinical risk factors in the training cohort, and univariate logistic regression was applied to factors with p < 0.1 to select independent predictors. Multivariable logistic regression was then performed on the independent clinical risk factors and the Rad-score to establish the clinical prediction model, the radiomics prediction model, and the clinical–radiomics prediction model. Nomograms based on the multivariate logistic regression were generated for model visualization and application. The performance of the nomogram was evaluated using calibration plots. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to assess the diagnostic value of the three models in the training cohort and to validate them in the validation cohort. Decision curve analysis (DCA) was used to compare the net benefits of the three models across a range of threshold probabilities. All steps were performed with R version 3.6.2 (www.r-project.org) using the “glmnet,” “glm2,” and “pROC” packages.
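A stratified 7:3 allocation of this kind can be sketched as follows. The snippet is an illustrative Python sketch, not the authors' R code; the function name, the seed, and the per-class rounding rule are assumptions. With the node counts reported in the Results (342 benign and 39 malignant LNs), rounding 70% within each class yields the same cohort sizes as reported there.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices into training/validation sets, class by class."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_fraction)
        train_idx += idx[:n_train]
        valid_idx += idx[n_train:]
    return sorted(train_idx), sorted(valid_idx)


# 0 = benign LN, 1 = malignant LN (counts taken from the Results section).
labels = [0] * 342 + [1] * 39
train_idx, valid_idx = stratified_split(labels)

n_train_pos = sum(labels[i] for i in train_idx)
n_valid_pos = sum(labels[i] for i in valid_idx)
print(len(train_idx) - n_train_pos, n_train_pos)  # training: benign, malignant
print(len(valid_idx) - n_valid_pos, n_valid_pos)  # validation: benign, malignant
```

Splitting within each class keeps the malignant fraction at roughly 10% in both cohorts, which matters when positive nodes are this rare.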

Results

Demographic and clinical findings

From October 2021 to October 2022, 74 patients were prospectively enrolled. Table 1 summarizes and compares the basic demographic, tumor, and clinical characteristics according to pN status. Among them, pT stage, preoperative CEA level, CA19-9 level, and MRI-EMVI differed between the two groups (p < 0.1). For the node-by-node evaluation, 381 histopathologically confirmed LNs were collected, of which 342 were negative and 39 were positive. The matched LNs were randomly allocated to a training cohort (239 negative/27 positive) and a validation cohort (103 negative/12 positive) in a ratio of 7:3. Univariate logistic regression showed that preoperative CEA level and MRI-EMVI were associated with LNM, and both proved to be independent predictors in multivariate logistic regression (p < 0.05) (Table 2). A clinical prediction model was therefore built from these two predictors (Fig. 3). In the training cohort, the diagnostic accuracy of this model was 53.8%, the AUC was 0.676 (95% CI 0.582–0.771), and the sensitivity and specificity were 85.2% and 50.2%, respectively. In the validation cohort, the accuracy was 60.9%, the sensitivity and specificity were 91.7% and 57.3%, respectively, and the AUC was 0.774 (95% CI 0.648–0.899) (Table 3).
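The low accuracy of the clinical model despite its high sensitivity follows from the class imbalance: accuracy is the prevalence-weighted average of sensitivity and specificity, so with about 10% positive nodes it stays close to the specificity. The short Python check below uses only the figures reported above; the function name is hypothetical.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Accuracy = sensitivity * prevalence + specificity * (1 - prevalence)."""
    prevalence = n_pos / (n_pos + n_neg)
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Clinical model, using the reported sensitivity, specificity, and LN counts.
train_acc = accuracy_from_rates(0.852, 0.502, n_pos=27, n_neg=239)
valid_acc = accuracy_from_rates(0.917, 0.573, n_pos=12, n_neg=103)
print(f"training {train_acc:.1%}, validation {valid_acc:.1%}")
```

Both values agree with the reported accuracies of 53.8% and 60.9%.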

Table 1 Demographic comparison between pN+/pN- cohorts
Table 2 Univariate and multivariate logistic regression analyses for LNM
Fig. 3 ROC curves of the clinical, radiomics, and clinical–radiomics combined models. A ROC curves in the training cohort. B ROC curves in the validation cohort

Table 3 Accuracy and predictive value between 3 models

Radiomics-based model comparison and validation

A total of 107 texture features were extracted from the LN MRI images. The mean inter-observer ICC was 0.850, indicating substantial agreement. Ninety-seven features with favorable repeatability were retained for analysis, and LASSO regression reduced these to eight features (Fig. 4).
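How an L1 penalty reduces a feature set can be illustrated with a small coordinate-descent sketch. This is a deliberate simplification: it fits a linear LASSO without an intercept in plain Python, whereas the study applied LASSO within a classification setting using the R package "glmnet" with ten-fold cross-validation. All names and the toy data are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink a value toward zero; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimize (1/2n) * ||y - X b||^2 + penalty * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            if scale == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the residual that ignores j.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - fit_without_j)
            rho /= n
            beta[j] = soft_threshold(rho, penalty) / scale
    return beta


# Toy data: the first feature tracks y strongly, the second only weakly.
X = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [2.0, 0.1, -2.0, -0.1]
print(lasso_coordinate_descent(X, y, penalty=0.1))
```

With this penalty the weak feature's coefficient is set exactly to zero while the strong one is only shrunk, which is the mechanism by which LASSO performs feature selection.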

Fig. 4 Feature selection and dimension reduction. A The ten-fold cross-validation of the LASSO analysis. B The regression coefficients of LASSO

The final formula of Rad-score used to predict LNM is as follows:

Rad-score = 1.692 * MinorAxisLength - 13.527 * Sphericity - 0.023 * VoxelVolume - 2.242 * X90Percentile + 8.503 * JointEntropy - 1.539 * DependenceNonUniformity + 1.371 * LowGrayLevelZoneEmphasis - 0.093 * Contrast + 3.171.
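The Rad-score formula can be written as a small function for illustration. The coefficients and intercept are copied from the formula above, with subtracted terms entered as negative coefficients. The function name and the input format are hypothetical, and the sketch assumes that the inputs are the min–max-normalized feature values described in the Methods.

```python
RAD_SCORE_COEFFICIENTS = {
    "MinorAxisLength": 1.692,
    "Sphericity": -13.527,
    "VoxelVolume": -0.023,
    "X90Percentile": -2.242,
    "JointEntropy": 8.503,
    "DependenceNonUniformity": -1.539,
    "LowGrayLevelZoneEmphasis": 1.371,
    "Contrast": -0.093,
}
RAD_SCORE_INTERCEPT = 3.171


def rad_score(features):
    """Linear combination of the eight selected features plus the intercept."""
    missing = sorted(set(RAD_SCORE_COEFFICIENTS) - set(features))
    if missing:
        raise KeyError(f"missing features: {missing}")
    return RAD_SCORE_INTERCEPT + sum(
        coef * features[name]
        for name, coef in RAD_SCORE_COEFFICIENTS.items()
    )


# Hypothetical input: every normalized feature at its minimum (0.0),
# so the score reduces to the intercept.
all_zero = {name: 0.0 for name in RAD_SCORE_COEFFICIENTS}
print(rad_score(all_zero))
```

The signs indicate direction only: a larger minor axis length and joint entropy raise the score, whereas higher sphericity lowers it.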

The AUC values for the radiomics model were 0.793 (95% CI 0.707–0.880) and 0.847 (95% CI 0.734–0.960) in the training and validation cohorts, respectively. The sensitivity and specificity were 81.5% and 70.7% in the training cohort and 75.0% and 83.5% in the validation cohort. After clinical and radiomics features were combined, preoperative CEA level was no longer an independent predictor in multivariate logistic regression. The comparison revealed that the clinical–radiomics prediction model had the best performance in both cohorts, with the highest AUC values: 0.818 (95% CI 0.742–0.893) for the training cohort and 0.922 (95% CI 0.863–0.980) for the validation cohort. The sensitivity, specificity, and accuracy of the combined model were 88.9%, 61.9%, and 64.7% in the training cohort and 91.7%, 82.5%, and 83.5% in the validation cohort. The corresponding ROC curves and AUC values are shown in Fig. 3 and Table 3. Based on this combined model, we generated a clinical–radiomics nomogram (Fig. 5), in which the Rad-score made the largest contribution to classification.

Fig. 5 The clinical–radiomics predictive nomogram

The Hosmer–Lemeshow goodness-of-fit test was p = 0.453 and the bootstrapped-concordance index was 0.818 in the training cohort. In the validation cohort, the Hosmer–Lemeshow goodness-of-fit test was p = 0.927 and the bootstrapped-concordance index was 0.922. The calibration curves demonstrate good agreement between the predicted and actual probabilities for the training and validation cohorts (Fig. 6).
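For a binary outcome, the concordance index is the same quantity as the AUC: the probability that a randomly chosen malignant LN receives a higher predicted score than a randomly chosen benign LN. This is consistent with the concordance indices above matching the AUC values of the combined model. A minimal Python sketch of that pairwise definition follows; the function name and the toy data are hypothetical, and bootstrapping is not shown.

```python
def concordance_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted probabilities for three malignant and three benign LNs.
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(concordance_auc(toy_scores, toy_labels))
```

In the toy data, eight of the nine malignant-benign pairs are ordered correctly, so the value is 8/9.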

Fig. 6 Calibration curve of the nomogram. A Calibration curve of the nomogram in the training cohort. B Calibration curve of the nomogram in the validation cohort

DCA of the three models indicated that the radiomics-based predictive model provided a greater net benefit than the “treat all” or “treat none” strategies when the LNM threshold probability ranged from 0.5 to 1.0 (Fig. 7).
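The quantity plotted in a decision curve is the net benefit, commonly defined at a threshold probability p_t as TP/n - (FP/n) * p_t / (1 - p_t), where "treat none" always has a net benefit of zero. The Python sketch below illustrates this definition. The function names and the toy probabilities are hypothetical, and only the "treat all" example uses figures from this study (12 malignant nodes among the 115 validation LNs).

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit = TP/n - FP/n * p_t / (1 - p_t), for 0 <= p_t < 1."""
    weight = threshold / (1.0 - threshold)
    return tp / n - fp / n * weight


def model_net_benefit(probs, labels, threshold):
    """Call a node positive when its predicted probability reaches p_t."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return net_benefit(tp, fp, len(labels), threshold)


def treat_all_net_benefit(n_pos, n, threshold):
    """Every node is called positive: all positives are TP, all negatives FP."""
    return net_benefit(n_pos, n - n_pos, n, threshold)


# Hypothetical predicted probabilities and outcomes for six LNs.
toy_probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(model_net_benefit(toy_probs, toy_labels, threshold=0.5))

# "Treat all" in the validation cohort at a threshold of 0.5.
print(treat_all_net_benefit(n_pos=12, n=115, threshold=0.5))
```

With roughly 10% malignant nodes, the "treat all" strategy has a negative net benefit at thresholds of 0.5 and above, so in that range a model only needs a positive net benefit to outperform both reference strategies.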

Fig. 7 DCA of the nomogram. The y-axis shows the net benefit. The gray curve represents the assumption that all LNs are metastatic. The black line represents the assumption that no LNs are metastatic

Discussion

In this study, we built and validated a combined clinical–radiomics model by extracting LN T2WI texture features on a node-by-node basis. Our results demonstrate that the texture features perform well and improve diagnostic accuracy, and that the model can be used as a non-invasive adjunctive assessment tool in clinical decision-making.

Accurate assessment of LN status remains challenging in current clinical practice. Previous studies have revealed several clinical features associated with LNM [10,11,12, 16,17,18]. In our study, preoperative CEA level and MRI-EMVI were independent risk factors associated with LNM, which is consistent with the results of previous studies [11, 12, 17, 18]. CEA is the most common tumor marker of colorectal cancer. It plays a fundamental role in cell recognition and adhesion, and its measurement is a simple and widely used test in clinical practice. High preoperative levels of CEA may indicate increased tumor metastasis or invasiveness [19]. In addition, we found that pN+ patients were more likely to present with specific features of the mesorectum, such as changes in the diameter and signal of invaded vessels. Liu et al. demonstrated that MRI-EMVI positivity was associated with LNM [20]. In this study, the clinical prediction model achieved high sensitivity but low specificity and diagnostic accuracy, with moderate AUC values, implying that clinical features alone may not be sufficient to diagnose LNM reliably. Furthermore, although there was a strong correlation between pT staging and LNM (p < 0.001), clinicopathological variables were not included in the final risk factors because we aimed to propose a non-invasive assessment tool.

Texture analysis can reflect the spatial distribution of gray level intensity and thus objectively describe the heterogeneity of the target area. Huang et al. were the first to develop a radiomics nomogram by extracting texture features of the primary tumor in the CT venous phase and obtained a C-index of 0.778 in the validation cohort [17]. Liu et al. found that the entropy of the tumor, among ADC histogram texture features, was an independent predictor of LN involvement [21]. Yang et al. went a step further and found that skewness, kurtosis, and energy from T2WI histogram features of the primary tumor were all correlated with LNM, with AUCs of 0.648–0.750 as independent predictors [22]. However, these studies extracted features only from the primary tumor, not from individual LNs. Features of the LNs themselves, matched to pathology node by node, may be more directly predictive.

In this study, we determined the status of individual LNs by preoperative node-by-node comparison and selected 8 out of 107 texture features from T2W images, including three shape features, one first-order feature, and four higher-order features. These features characterize the geometry and the gray-level distribution of the LNs, suggesting that metastatic LNs may have greater intra-nodal heterogeneity. Interestingly, the selection of MinorAxisLength indicates that LN size is closely related to LNM. Although the current diagnostic accuracy for LNM based on the maximum short diameter of LNs is only between 0.5 and 0.7 [23], our findings suggest that morphologic features and intra-nodal heterogeneity are critical factors in identifying metastatic LNs. After fitting the Rad-score, the radiomics prediction model showed good predictive performance in the training cohort (AUC, 0.793; 95% CI 0.707–0.880) and the validation cohort (AUC, 0.847; 95% CI 0.734–0.960), indicating that LN texture features are helpful in LNM diagnosis.

Considering the variability of individualized treatment modalities, a single feature or model may be inadequate in clinical practice. In exploring an individualized preoperative prediction nomogram for LNM, Li et al. included nine clinical risk factors and demonstrated that the clinical-primary lesion radiomics-peripheral lymph node radiomics model obtained the highest AUC value (0.7606) [11]. Liu et al. developed a multi-regional MRI radiomics prediction model whose performance improved after clinical data were combined with radiomics features (AUC = 0.832 (95% CI 0.717–0.915)) [16]. In our study, the clinical–radiomics combined prediction model obtained the highest AUC value and the maximum net benefit. It should be noted that although CEA was statistically significant in the univariate analysis, it was excluded from the final combined model. The strong discriminatory power of the texture features diminished the contribution of CEA, similar to the findings of Chen et al. [18]. Furthermore, we found that the combined model had a high negative predictive value (NPV) in both cohorts. This is arguably the most clinically valuable metric, because patients accurately identified as pN- can consider organ preservation strategies, which is important for improving quality of life. To facilitate clinical application, we constructed a nomogram in which the LN Rad-score was the dominant contributor, suggesting that texture analysis could be a valuable tool for LNM prediction.
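The NPV statement can be checked approximately from the figures reported in the Results. The Python sketch below back-calculates confusion-matrix counts from the rounded sensitivity and specificity of the combined model and the cohort sizes. These reconstructed counts are an assumption and may differ slightly from the values in Table 3; the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, n_pos, n_neg):
    """Reconstruct counts from rounded rates, then return (PPV, NPV)."""
    tp = round(sensitivity * n_pos)  # assumed true positives
    tn = round(specificity * n_neg)  # assumed true negatives
    fn = n_pos - tp
    fp = n_neg - tn
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Combined clinical-radiomics model, reported sensitivity and specificity.
train_ppv, train_npv = predictive_values(0.889, 0.619, n_pos=27, n_neg=239)
valid_ppv, valid_npv = predictive_values(0.917, 0.825, n_pos=12, n_neg=103)
print(f"training NPV {train_npv:.3f}, validation NPV {valid_npv:.3f}")
```

Under these reconstructed counts the NPV is about 98% in both cohorts, while the positive predictive value stays well below 50%, which reflects the low proportion of malignant nodes.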

Since 2016, the role of texture analysis in predicting LNM of rectal cancer has been reported in several studies, mostly using CT as the initial evaluation tool [10, 11, 17, 24]. In contrast, MRI has high soft-tissue contrast and spatial resolution, which is advantageous for characterizing details and more conducive to finding differences between LNs [4]. In delineating the ROI, as in the study by Huang et al. [17], we focused on the largest cross-section of each LN. Although the largest cross-section may not capture all texture features of large or morphologically irregular LNs, small LNs may be displayed on only a single slice because of the relatively large slice thickness of MRI. Thus, to balance the errors caused by tracing, the current segmentation method is probably the most clinically feasible, and future studies need to improve and optimize this aspect.

This study has some limitations. First, the sample size was relatively small, with only 39 positive lymph nodes, which may affect the stability of the radiomics model and was reflected in lower sensitivity and higher specificity. We aim to increase the sample size in the future and substantially improve the model’s sensitivity. Second, the lack of external validation may limit the generalizability of the results. Third, we selected only T2WI for feature extraction. Studies have demonstrated that combining multi-regional or multi-objective features can further improve prediction [16, 18]. Fourth, manual lymph node segmentation was used in this study, which is time-consuming and error-prone. Several studies have used radiomics features extracted from automatic or semi-automatic segmentation based on deep learning [25]. The use of artificial intelligence has enhanced the reliability of research and holds promise as a replacement for the time-consuming and poorly reproducible manual segmentation method currently in use [26]. Fifth, the inherent shortcomings of radiomics lead to poor reproducibility. For example, different MRI techniques and parameters can affect the results. In addition, variations in treatment approaches across regions or institutions can result in distinct lymph node responses. Reactive LN swelling is difficult to differentiate from involved nodes. Edema of the perirectal fat tissue caused by radiation and post-radiation fibrosis around the LNs may lead to false-positive assessments of LN status, which challenges the reliability of the extracted features. Therefore, a larger sample and multi-center validation are required to complete the clinical translation.

Conclusion

The results of this study demonstrate that the lymph node T2WI-based texture features can facilitate individualized preoperative prediction of LNM in rectal cancer.