Introduction

Approximately 338,000 people are diagnosed with renal cell carcinoma (RCC) worldwide each year, and RCC accounts for more than 90% of renal malignancies [1]. Clear cell renal cell carcinoma (ccRCC) is the most common histological subtype of RCC, accounting for approximately 70% of cases and for most RCC-related deaths [2]. Moreover, ccRCC has received more attention in clinical practice because of its high risk of poor prognosis [3, 4].

Malignancy grade assessment plays a key role in clinical management decisions for ccRCC. The traditional 4-tiered Fuhrman grading system (FGS), with grades I, II, III, and IV, is the most frequently used grading system in ccRCC [5]. A higher grade indicates a higher risk of postoperative recurrence [6]. A simplified 2-tiered FGS, which stratifies the traditional 4-tiered system into a low grade (corresponding to grades I and II) and a high grade (grades III and IV), is also widely used in the clinic because it reduces variability and improves reproducibility among pathologists. The simplified FGS predicts cancer-specific mortality as well as the traditional 4-tiered FGS does [7]. Unfortunately, because of the high spatial and temporal heterogeneity of ccRCC, a biopsy cannot fully reflect the Fuhrman grade of the entire tumor [8]. Therefore, preoperative, noninvasive evaluation of the Fuhrman grade from medical images has received wide attention.

Radiomics methods translate medical imaging data into high-dimensional features, which are used as noninvasive biomarkers to support prognosis or prediction [9,10,11]. Previous studies have shown that radiomics signatures are associated with tumor vascularity and vascular endothelial growth factor expression in ccRCC [12, 13], and that they provide valuable predictive information for distinguishing benign from malignant tumors, tumor subtypes [14], and tumor grades [15, 16].

The purpose of this study was to investigate the ability of CT-based peritumoral radiomics signatures to noninvasively distinguish low from high grades in the simplified Fuhrman grading system.

Materials and methods

Datasets

This is a retrospective study. All CT images were downloaded from the Cancer Genome Atlas-Kidney Renal Clear Cell Carcinoma (TCGA-KIRC) collection [17, 18], which includes 267 patients with ccRCC confirmed by pathology reports. The images in TCGA-KIRC were acquired at seven centers in the US on scanners from multiple manufacturers (GE, SIEMENS, and Philips), with slice thicknesses ranging from 1.25 to 7.5 mm. Inclusion criteria in our study were as follows: (1) ccRCC confirmed by pathology after surgery; (2) CT scans in the nephrographic phase acquired before surgery and radiotherapy. Exclusion criteria were: (1) only MR images available; (2) only plain (unenhanced) CT scans available; (3) surgery and/or chemotherapy prior to the CT scans; (4) multiple lesions; (5) poor quality of CT scans. The flow chart in Fig. 1 details the inclusion and exclusion criteria.

Fig. 1

Flow chart detailing the inclusion and exclusion criteria

Figure 2 shows the workflow of radiomics-based malignancy grading, which included data preparation, feature extraction, feature selection, and signature building.

Fig. 2

Flow chart of CT-based radiomics signatures extraction and analysis for ccRCC grading

Tumor segmentation

To obtain the volumes of interest (VOIs) of the tumor mass (TMV) for further analysis, a radiologist with 8 years of experience first segmented the TMV by delineating the tumor outline on all contiguous slices of the entire tumor with ITK-SNAP (http://www.itksnap.org/). The TMV contour in each slice was drawn within the borders of the tumor mass on nephrographic-phase CT images, including necrotic, cystic, and hemorrhagic areas but excluding normal renal tissue and perinephric and sinus fat. A senior radiologist with 25 years of experience checked all TMV VOIs and carried out the subsequent feature extraction. Both radiologists were blinded to the clinical and pathological reports. Subsequently, the TMVs were used as templates to create the corresponding VOIs of the peritumoral region (PTV). The PTV VOIs were created automatically by a morphological method that expanded and shrank the borders of the tumor mass by 5 mm [19] uniformly in three dimensions. Fig. 3 sketches the TMV and PTV VOIs and shows the fusion of TMV and PTV in the axial, coronal, and sagittal views of a CT image. In this study, PTV segmentation was noninteractive, meaning that the PTV could contain some normal renal tissue, perinephric fat, sinus fat, or micro-vessels. The main advantage of this automatic process was that it reduced manual intervention in PTV segmentation and ensured the consistency and reproducibility of malignancy grading by the PTV model. Moreover, a third type of VOI was derived from TMV and PTV, defined in this study as the combined region of tumor mass and peritumoral region (rTM-PTV).
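The automatic construction of the peritumoral VOI can be illustrated with a small sketch. It assumes that the PTV is the shell between the tumor mask dilated by 5 mm and the same mask eroded by 5 mm, and that the image has already been resampled to 1 mm isotropic voxels; the text does not state the exact morphological operators, so the spherical structuring element and all function names here are hypothetical.

```python
from itertools import product

def ball_offsets(radius):
    """Integer voxel offsets inside a sphere of the given radius (in voxels)."""
    r = int(radius)
    return [(dz, dy, dx)
            for dz, dy, dx in product(range(-r, r + 1), repeat=3)
            if dz * dz + dy * dy + dx * dx <= radius * radius]

def dilate(mask, offsets):
    """Binary dilation: every voxel reachable from the mask by one offset."""
    return {(z + dz, y + dy, x + dx)
            for (z, y, x) in mask for (dz, dy, dx) in offsets}

def erode(mask, offsets):
    """Binary erosion: voxels whose whole neighbourhood lies inside the mask."""
    return {(z, y, x) for (z, y, x) in mask
            if all((z + dz, y + dy, x + dx) in mask for (dz, dy, dx) in offsets)}

def peritumoral_ring(tumor, margin_mm=5.0, spacing_mm=1.0):
    """Assumed PTV: dilated tumor minus eroded tumor (a shell around the border)."""
    offsets = ball_offsets(margin_mm / spacing_mm)
    return dilate(tumor, offsets) - erode(tumor, offsets)

# Toy tumor: a solid ball with a radius of 8 voxels centered at the origin.
tumor = set(ball_offsets(8))
ring = peritumoral_ring(tumor)   # analogue of the PTV
combined = tumor | ring          # analogue of rTM-PTV

print(len(tumor), len(ring), len(combined))
```

Masks are stored as sets of voxel coordinates to keep the example dependency-free; on real CT volumes an array-based morphology routine would be the practical choice.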

Fig. 3

Demonstration of TMV and PTV. The left sub-figure illustrates the definition and relationship of TMV and PTV, where the pink ellipse represents a tumor mass and the elliptical ring filled with yellow diagonal lines represents a PTV. The three sub-figures on the right show the fusion of TMV (red regions) and PTV (yellow regions around TMV) in the axial, coronal, and sagittal views of a CT image

Radiomics signatures extraction

Radiomics features extraction

Radiomics features in each VOI of TMV, PTV, and rTM-PTV were quantitatively extracted using PyRadiomics [20], an open-source Python package for extracting radiomics features from medical images. Several preprocessing steps were applied before feature extraction. All VOIs were resampled to a voxel spacing of 1 × 1 × 1 mm using B-spline interpolation, and voxel values were shifted by 1000 to guarantee a positive value for each voxel. Moreover, a bin width of 25 was used for high-order feature extraction in PyRadiomics. The radiomics features were categorized into seven classes: 19 first-order gray-level statistic features, 16 3D shape-based features, 24 gray-level co-occurrence matrix (GLCM) features, 16 gray-level run length matrix (GLRLM) features, 16 gray-level size zone matrix (GLSZM) features, 5 neighboring gray tone difference matrix (NGTDM) features, and 14 gray-level dependence matrix (GLDM) features (Fig. 4), that is, 110 features per image type. All features comply with the definitions of the Imaging Biomarker Standardization Initiative (IBSI) [21]. Moreover, eight image filters, namely wavelet, Laplacian of Gaussian (LoG), square, square root, logarithm, exponential, gradient, and 3D local binary pattern (LBP), were applied to each original image to yield the corresponding derived images. All classes of radiomics features were extracted from sixteen image types (original, Laplacian of Gaussian, square, square root, logarithm, exponential, gradient, 3D local binary pattern, wavelet-LHL, wavelet-LHH, wavelet-HLL, wavelet-LLH, wavelet-HLH, wavelet-HHH, wavelet-HHL, and wavelet-LLL). Ultimately, 1760 quantitative 3D radiomics features were extracted from each VOI of TMV, PTV, and rTM-PTV.
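The extraction settings and the feature count can be written down compactly. The sketch below uses setting and image-type names from the PyRadiomics customization documentation, with values taken from the text; whether the study used exactly these keys is an assumption, and filter-specific parameters that the text does not report (for example the LoG sigma values) are deliberately left out. It assumes 16 GLSZM features, the number documented by PyRadiomics and the one that reproduces the reported total of 1760. It only reproduces the counting implied by the text and does not run an extraction.

```python
# Settings named after PyRadiomics customization parameters; values from the text.
settings = {
    "resampledPixelSpacing": [1, 1, 1],  # isotropic resampling, in mm
    "interpolator": "sitkBSpline",       # B-spline interpolation
    "voxelArrayShift": 1000,             # shift so that voxel values are positive
    "binWidth": 25,                      # fixed bin width for gray-level discretization
}

# Sixteen image types: the original, seven single-output filters,
# and the eight sub-bands produced by the wavelet filter.
wavelet_subbands = ["LHL", "LHH", "HLL", "LLH", "HLH", "HHH", "HHL", "LLL"]
image_types = ["Original", "LoG", "Square", "SquareRoot", "Logarithm",
               "Exponential", "Gradient", "LBP3D"]
image_types += [f"Wavelet-{band}" for band in wavelet_subbands]

# Features per class (GLSZM count assumed to be 16, see the note above).
features_per_class = {
    "firstorder": 19, "shape": 16, "glcm": 24, "glrlm": 16,
    "glszm": 16, "ngtdm": 5, "gldm": 14,
}

features_per_image = sum(features_per_class.values())
total_features = features_per_image * len(image_types)
print(features_per_image, len(image_types), total_features)
```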

Fig. 4

Details of radiomics features

Feature selection

Before feature selection, all radiomics features of TMV, PTV, and rTM-PTV were standardized to zero mean and unit variance to prevent features with larger numeric ranges from dominating those with smaller ranges. Discriminative features were then selected in several steps. First, features with low reproducibility were removed if the variance of the normalized feature values was close to zero; specifically, the threshold was set to 10⁻³ in this study. Second, Pearson correlation analysis [22] was performed to assess the distinctiveness of the features, and redundant and poorly reproducible features were removed if their mean absolute correlation was higher than 0.9. Previous studies have shown that an additional feature ranking step is an effective way to improve the final performance of classifiers [23]. Consequently, a multivariate ranking method, minimum redundancy maximum relevance (mRMR), was applied after these removals to identify the most important features on the basis of a heuristic scoring criterion. Finally, only the top-ranked features were input into the least absolute shrinkage and selection operator (LASSO) to build the radiomics signature for evaluating malignancy grades.
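The first two filtering steps can be sketched in plain Python. Several details are assumptions made for illustration: because every z-scored feature has unit variance by construction, the variance check is applied before standardization; and the correlation rule is implemented greedily, dropping from the most strongly correlated pair the feature with the larger mean absolute correlation to the remaining features. The function names and the toy data are hypothetical, and mRMR and LASSO are not covered here.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def zscore(values):
    """Standardize to zero mean and unit variance."""
    m, sd = mean(values), sqrt(variance(values))
    return [(v - m) / sd for v in values]

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

def filter_features(features, var_threshold=1e-3, corr_threshold=0.9):
    """features maps a feature name to its list of per-patient values."""
    # Step 1: drop near-constant features.
    kept = {n: v for n, v in features.items() if variance(v) >= var_threshold}
    # Step 2: standardize the surviving features.
    kept = {n: zscore(v) for n, v in kept.items()}
    # Step 3: greedy removal of highly correlated features.
    names = list(kept)
    while len(names) > 1:
        corr = {(a, b): abs(pearson(kept[a], kept[b]))
                for a in names for b in names if a != b}
        (a, b), strongest = max(corr.items(), key=lambda item: item[1])
        if strongest <= corr_threshold:
            break

        def mean_abs(name):
            return mean([corr[(name, other)] for other in names if other != name])

        names.remove(a if mean_abs(a) >= mean_abs(b) else b)
    return names

# Hypothetical toy data: six patients, four features.
toy = {
    "volume": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "volume_copy": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],  # nearly collinear with volume
    "entropy": [0.9, 0.1, 0.7, 0.3, 0.8, 0.2],
    "constant": [5.0] * 6,                            # zero variance
}
selected = filter_features(toy)
print(selected)
```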

Classification

To evaluate the feasibility and potential of the CT-based peritumoral radiomics signature for identifying low and high grades in the simplified FGS of ccRCC, four types of radiomics signatures were input to the LASSO classifier, and four classification models were trained using fivefold cross-validation with five repeats on the training cohort to determine an optimal parameter configuration. The four types of radiomics signatures were: (1) the radiomics signature from TMV features, (2) the radiomics signature from PTV features, (3) the radiomics signature from rTM-PTV features, and (4) the radiomics signature from the merged features of TMV and PTV (sTM-PTV). The radiomics signatures from TMV, PTV, and rTM-PTV were each built using the same procedure described in the “Radiomics features extraction” and “Feature selection” subsections. The sTM-PTV features were obtained by combining the features of TMV and PTV, after which the statistically significant and top-ranked features were selected following the procedure described in the “Feature selection” subsection. The independent validation data were used to assess the performance of the classification models by the receiver operating characteristic (ROC) curve and the quantitative indices of area under the curve (AUC), accuracy, sensitivity, specificity, and precision. The DeLong test [24] was used for statistical comparison of the ROC curves.
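The quantitative indices listed above follow standard definitions, which the sketch below computes for a toy example. It assumes that high grade is coded as the positive class (1) and that predicted scores are thresholded at 0.5; both choices, as well as the labels and scores, are hypothetical. The AUC is computed as the fraction of positive/negative pairs that are ranked correctly, with ties counted as one half.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and precision from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }

def auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical validation labels (1 = high grade) and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```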

Results

Patient characteristics

A total of 203 patients were included in this study (134 males and 69 females; mean age 59.93 ± 12.02 years, range 34–88 years), of whom 80 (39.41%) were high grade (50 grade III and 30 grade IV) and 123 (60.59%) were low grade (106 grade I and 17 grade II). The high-grade group was significantly older (p = 0.023) and had a larger tumor size (p < 0.001) than the low-grade group. Patient characteristics of the two groups are shown in Table 1.

Table 1 Patient characteristics of low and high grade of ccRCC

On the basis of random allocation, 122 patients (82 male and 40 female, with an average age of 58.74 ± 11.70 years) were assigned to the training group, while 81 patients (52 male and 29 female, with an average age of 61.48 ± 12.65 years) were assigned to the validation group. The training set contained 46 high-grade and 76 low-grade patients, while there were 46 high-grade and 33 low-grade patients in the independent validation set. No significant intra-group difference in gender was observed in either the training set (p value = 0.011) or the validation set (p value = 0.013). There appeared to be significant intra-group differences in tumor size in both sets (p < 0.001) and in age in the training set (p < 0.001). A detailed analysis of the patient characteristics and grades in the training and validation sets is shown in Table 2.

Table 2 Detail of patient characteristics and grade in training and validation sets

Performance of peritumoral radiomics signature

Features with low reproducibility, whose variance was smaller than 10⁻³, were first removed from the feature set. Subsequently, pair-wise Pearson correlation coefficients were calculated, with a threshold of 0.9, to identify highly correlated feature pairs. The remaining features were ranked by mRMR, and the top 40 features were selected from each of TMV, PTV, rTM-PTV, and sTM-PTV. After LASSO, 20 discriminative features were selected to build the final peritumoral radiomics signature for malignancy grading, comprising 10 high-order texture features, 7 3D shape features, and 3 first-order features from PTV, while the TMV signature comprised 13 high-order texture features, 5 3D shape features, and 2 first-order features. The details of these features are shown in Supplementary S1. The discrimination abilities of the radiomics signatures were evaluated on the training cohort by fivefold cross-validation repeated five times and subsequently on the independent validation cohort.
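Fivefold cross-validation repeated five times yields 25 train/validation splits of the training cohort. The sketch below generates such splits for the reported training composition (46 high-grade and 76 low-grade patients). Stratifying the folds by grade and the fixed random seed are assumptions, since the text does not say how folds were drawn, and the function name is hypothetical.

```python
import random

def repeated_stratified_kfold(labels, n_splits=5, n_repeats=5, seed=0):
    """Yield (train_indices, validation_indices) for repeated stratified k-fold."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        # Deal the shuffled indices of each class round-robin into the folds
        # so that every fold keeps roughly the overall class ratio.
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            val = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train, val

# Training cohort composition from the text: 46 high grade (1), 76 low grade (0).
labels = [1] * 46 + [0] * 76
splits = list(repeated_stratified_kfold(labels))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

In a tuning loop, each candidate LASSO penalty would be scored on all 25 validation folds and the mean score used to pick the final configuration.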

The performance of the four types of signatures is shown in Tables 3 and 4 and Fig. 5. The PTV signature showed the highest AUC on both the training and validation cohorts (training cohort: 0.807, 95% CI 0.800–0.834; validation cohort: 0.848, 95% CI 0.760–0.936), while the AUC of TMV (training cohort: 0.773, 95% CI 0.744–0.802; validation cohort: 0.810, 95% CI 0.706–0.914) was the smallest compared with PTV, rTM-PTV, and sTM-PTV.

Table 3 Performance of radiomics signatures on training cohort
Table 4 Performance of radiomics signatures on validation cohort
Fig. 5

Receiver operating characteristic (ROC) curves of the radiomics signatures on the training set and validation set

Table 5 displays the statistical differences in AUC values between the peritumoral-based signatures and the TMV signature, where p values were derived from the DeLong test in pairwise comparisons. It shows that the AUC of the PTV model differed slightly, in statistical terms, from that of the TMV model in the validation dataset.
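For readers unfamiliar with the DeLong test, the following sketch shows how two AUCs measured on the same patients can be compared. It follows the standard structural-components formulation of the test for two correlated ROC curves and is not the study's own code; the labels and scores are hypothetical, and the convention of returning p = 1 when the estimated variance of the difference is zero is an assumption.

```python
from math import erfc, sqrt

def _psi(x, y):
    """Pairwise kernel: 1 if the positive outscores the negative, 0.5 on ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0

def _components(y_true, scores):
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]
    return sum(v10) / len(v10), v10, v01

def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def delong_test(y_true, scores_a, scores_b):
    """Return (auc_a, auc_b, z, two-sided p) for two models on the same cases."""
    auc_a, v10_a, v01_a = _components(y_true, scores_a)
    auc_b, v10_b, v01_b = _components(y_true, scores_b)
    m, n = len(v10_a), len(v01_a)
    var_a = _cov(v10_a, v10_a) / m + _cov(v01_a, v01_a) / n
    var_b = _cov(v10_b, v10_b) / m + _cov(v01_b, v01_b) / n
    cov_ab = _cov(v10_a, v10_b) / m + _cov(v01_a, v01_b) / n
    var_diff = var_a + var_b - 2 * cov_ab
    if var_diff <= 0:  # identical rankings: no evidence of a difference
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / sqrt(var_diff)
    return auc_a, auc_b, z, erfc(abs(z) / sqrt(2))

# Hypothetical labels (1 = high grade) and scores from two models.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
scores_b = [0.9, 0.3, 0.7, 0.2, 0.6, 0.8, 0.4, 0.5, 0.1, 0.65]

auc_a, auc_b, z, p = delong_test(y_true, scores_a, scores_b)
print(auc_a, auc_b, z, p)
```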

Table 5 Statistical difference of AUC values between peritumoral-based signatures and TMV signature

Discussion

In this study, a novel Fuhrman grading model based on a peritumoral radiomics signature was presented for noninvasively assessing simplified Fuhrman grades of ccRCC. Compared with previous studies [25,26,27], this study focused on the role of the peritumoral microenvironment in the malignancy grading of ccRCC. The statistical results showed that the CT-based peritumoral radiomics signature was able to distinguish malignancy grades of ccRCC with satisfactory performance. Compared with the TMV signature, PTV, rTM-PTV, and sTM-PTV all improved the grading performance: the AUC improved by 2.40 to 3.40 percentage points in the training cohort and by 0.10 to 3.80 percentage points in the validation cohort, suggesting that the peritumoral delineation reflected the characteristics and heterogeneity of the tumors accurately and comprehensively [28]. Moreover, our evaluations also showed that a radiomics signature from the peritumoral region, whether the PTV signature or one of its variants, was a stable and more accurate indicator for CT-image-based machine learning models to distinguish ccRCC malignancy grades in the training cohort, which was further confirmed in the analysis of the independent validation cohort.

The final selected features showed that high-order texture features played a major role in building the radiomics signatures (13/20 in TMV, 11/20 in PTV, 14/20 in rTM-PTV, and 12/20 in sTM-PTV), which implied that high-order texture features were able to quantify tumor heterogeneity from multiple perspectives and at multiple scales [29]. The result of feature selection showed that ‘zone entropy’ and ‘busyness’ in the high-order texture feature set had the highest correlation with Fuhrman grades in the peritumoral region, and ‘joint energy’ was the top-ranked feature in TMV. Entropy is a valid radiomics feature for describing the heterogeneity of a tumor; a higher value indicates more heterogeneity in the texture patterns [30]. Previous studies [31, 32] also pointed out that entropy was closely related to tumor invasiveness. In this study, ‘zone entropy’ in PTV was the most important of the high-order texture features; it measures the uncertainty and randomness in the distribution of zone sizes and gray levels. From the viewpoint of pathology, nuclear diameter, nuclear shape, and nucleoli together determine the Fuhrman grade of ccRCC: a higher grade exhibits a larger nuclear diameter, a more irregular nuclear shape, and a more irregular arrangement of internal histological components in pathological sections. ‘Joint energy’ was the top-ranked high-order texture feature in the rTM-PTV model. Joint energy is a measure of homogeneous patterns in an image; a greater energy value implies that there are more instances of intensity value pairs that neighbor each other at higher frequencies. The top rank of ‘zone entropy’ and ‘joint energy’ implies that the heterogeneity in the CT-based peritumoral and tumor mass texture patterns could macroscopically reflect the irregularity of nuclear shape and arrangement. ‘Busyness’ is another important feature in PTV; it measures the intensity change from a pixel to its neighbors. A high value for busyness indicates a ‘busy’ image with rapid changes of CT intensity between a pixel and its neighborhood. ‘Small dependence emphasis’ (SDE) was the feature most relevant to malignancy grades in the sTM-PTV model. Small dependence emphasis is a measure of the distribution of small dependencies; a greater value indicates smaller dependence and less homogeneous textures. The results also showed that wavelet features made up the majority of the selected high-order features in the PTV model, the exceptions being ‘busyness’ and ‘gray-level nonuniformity’ from the original image, ‘large area emphasis’ from the LBP image, and ‘gray-level nonuniformity’ from the gradient image.
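How energy and entropy respond to texture heterogeneity can be seen on toy images. The sketch below builds a normalized, symmetric gray-level co-occurrence matrix for horizontally adjacent pixels of a 2D image and computes joint energy (the sum of squared probabilities) and joint entropy. This is a deliberate simplification: PyRadiomics evaluates 3D neighborhoods in several directions, and ‘zone entropy’ applies the same entropy form to the size zone matrix rather than to the GLCM. The images and function names are hypothetical.

```python
from collections import Counter
from math import log2

def glcm_probabilities(image):
    """Symmetric, normalized co-occurrence of horizontally adjacent gray levels."""
    counts = Counter()
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[(left, right)] += 1
            counts[(right, left)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

def joint_energy(p):
    """High when a few gray-level pairs dominate (homogeneous texture)."""
    return sum(v * v for v in p.values())

def joint_entropy(p):
    """High when gray-level pairs are spread evenly (heterogeneous texture)."""
    return -sum(v * log2(v) for v in p.values())

uniform = [[3, 3, 3, 3]] * 4
checker = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
mixed = [[0, 1, 2, 3], [3, 2, 1, 0], [1, 3, 0, 2], [2, 0, 3, 1]]

results = {}
for name, image in (("uniform", uniform), ("checker", checker), ("mixed", mixed)):
    p = glcm_probabilities(image)
    results[name] = (joint_energy(p), joint_entropy(p))
    print(name, results[name])
```

The uniform image gives the maximum energy and zero entropy, while the mixed image gives the lowest energy and the highest entropy, matching the interpretation of these features as homogeneity and heterogeneity measures.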

3D shape features also played an important role in the malignancy grading of ccRCC (5/20 in TMV, 7/20 in PTV, 6/20 in rTM-PTV, and 6/20 in sTM-PTV), with ‘major axis length’ being the most relevant feature in the 3D shape feature set for TMV, rTM-PTV, and sTM-PTV. This feature is calculated from the largest principal component and indicates the largest axis length of the enclosing ellipsoid. This study demonstrated that the 3D size of the tumor was a highly relevant indicator for malignancy grading. A previous study concluded that every centimeter increase in tumor size increased the probability of a high grade by 25% compared with a low-grade tumor [33].
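For reference, the PyRadiomics documentation defines this feature from a principal component analysis of the physical coordinates of the voxel centers in the VOI. With λ_major denoting the largest eigenvalue of that analysis (the symbol is introduced here for illustration), the definition reads:

```latex
\text{MajorAxisLength} = 4\sqrt{\lambda_{\text{major}}}
```

Because the eigenvalue is a variance along the principal direction, the feature grows linearly with the extent of the tumor along its longest axis.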

‘Contrast’ was a discriminative feature among the first-order histogram features; it quantitatively describes the spatial intensity change and depends on the overall gray-level dynamic range. The value of this feature is high when both the dynamic range and the spatial change rate are high, i.e., in an image with a large range of gray levels and large changes between voxels and their neighborhoods.

There were several limitations in this study. First, we only constructed signatures from VOIs delineated on nephrographic-phase CT images and did not use the other phases (e.g., the corticomedullary phase) for further multi-parameter analysis. Some studies showed that the AUC of signatures in the corticomedullary phase was higher than that in the nephrographic phase for ccRCC grading. Additional external validation studies using more images with multiple phases are expected to confirm the findings of this study. Second, although the total number of patients in this study met the sample size standard for a diagnostic experiment, a prospective, multicenter study is needed to experimentally verify the validity of the peritumoral radiomics signature [27]. The third limitation is that the tumor mass region was manually delineated, meaning that this study required expert annotation. Radiomics features are well known to vary with small changes in a manually delineated ROI, which limits their integration into clinical routine and hinders radiologists’ workflows. Despite the above limitations, our study found that the peritumoral radiomics signature was a valid model for malignancy grading of ccRCC.

In conclusion, CT-based peritumoral radiomics signature could noninvasively predict the Fuhrman grades of ccRCC.