Introduction

Renal cell carcinoma (RCC) is the seventh most common cancer worldwide and causes approximately 140,000 deaths per year [1]. The most common histological subtypes are clear cell RCC (ccRCC), papillary RCC (pRCC), and chromophobe RCC (chRCC) [2, 3]. ccRCC accounts for approximately 70% of kidney cancers and pRCC for 10–15%, whereas chRCC is the least common of the three, representing only about 5% of kidney cancer cases [4]. ccRCC is associated with a survival of less than 5 years and carries a higher risk of metastasis than pRCC and chRCC [5].

One of the most important tasks in cancer diagnosis and treatment is tumor staging and grading. Tumor grading describes how differentiated the tumor cells are relative to normal tissue cells and indicates how quickly a tumor is expected to grow and spread. The Fuhrman grading system is widely recognized in the clinical oncology community [6]. It is based on the assessment of three cell nucleus characteristics: nuclear size, nuclear shape, and nucleolar prominence. Based on these assessments, the tumor is classified into one of four grades (I–IV). Grades I and II are considered low-grade tumors with a favorable prognosis, while grades III and IV are considered high-grade tumors, which commonly have an unfavorable prognosis [7].

Currently, fine-needle aspiration (FNA) and imaging-guided biopsies are the gold-standard methods for preoperative kidney tumor grading. However, these techniques carry risks, including infection, bleeding, and tumor cell spreading, and they provide limited information about the whole tumor because of tumor heterogeneity. Owing to intra-tumoral heterogeneity in ccRCC [8], biopsy has been reported to underestimate the Fuhrman grade in 55% of cases [9].

A number of noninvasive or minimally invasive therapeutic strategies for RCC have been devised during the last decade, including radiofrequency ablation, cryoablation, and active surveillance [10, 11]. However, proper criteria for managing patients with these methods are still lacking, and most patients are treated surgically after diagnosis [12, 13]. It is therefore desirable to recommend individualized treatment strategies, in which radical approaches (e.g., surgery) are reserved for aggressive or high-grade ccRCC tumors (grades III and IV), whereas conservative management (e.g., active surveillance) is applied to low-grade (grades I and II) lesions [14]. To guide the decision-making process, an accurate noninvasive method for preoperative Fuhrman grading of renal cell carcinoma tumors is needed. To this end, two promising approaches have been adopted in clinical settings, namely MRI-guided derivation of apparent diffusion coefficient (ADC) values [15] and CT-based semiquantitative and quantitative techniques [16, 17].

Radiomics serves as a bridge between medical imaging and personalized medicine [18] and refers to the comprehensive quantification of tumor phenotype to uncover disease characteristics that cannot be appreciated by the naked eye [19,20,21]. Radiomics is an emerging field that faces many challenges, including image acquisition [22], reconstruction and processing [23, 24], and the development of models that provide robust and reproducible representations. Previous studies have shown that the radiomics signature is valuable for differentiating high- and low-grade ccRCC tumors [25,26,27]. This study aims to construct a radiomics feature-based machine learning model to predict the Fuhrman grade of ccRCC patients preoperatively.

Materials and methods

Figure 1 presents the workflow followed in the current study.

Fig. 1 Workflow adopted in the study protocol

Patient population

Two hundred and twenty-two clinical studies from The Cancer Imaging Archive database [28] were considered for the study protocol. The inclusion criteria were simultaneous availability of CT images and pathological information, good CT image quality, and tumor visibility on CT images; studies that did not meet these criteria were excluded. This resulted in the inclusion of 71 patients in the current study. Table 1 provides the demographics of the patient population.

Table 1 Clinical characteristics of the patient population presenting with clear cell renal cell carcinoma

CT scanning protocol

All patients underwent a three-phase CT scan, including (1) a routine unenhanced CT scan, (2) a corticomedullary phase (CMP) contrast-enhanced scan starting 40 s after injection of the contrast material, and (3) a nephrographic phase (NP) contrast-enhanced scan performed 70–90 s after intravenous injection of iodinated contrast material. The contrast material (iodine concentration 300 mg/mL) was infused at a rate of 3 mL/s, with a total volume of 80–100 mL. All subjects were scanned on GE Healthcare and Siemens Healthineers CT scanners with a tube voltage of 120 kVp and a tube current of 150–300 mA using routine clinical reconstruction parameters.

Tumor segmentation

In this study, manual volume of interest (VOI) segmentation was performed and verified by an experienced radiologist using the 3D Slicer software package [29].

Image preprocessing

Prior to feature extraction, the images were resampled to a common isotropic voxel size. This allowed comparisons between image data from different samples and scanners [30]. Laplacian of Gaussian (LoG) filtering, wavelet decomposition (WAV), and discretization into 32, 64, and 128 bins were then applied to generate different sets of features. For the LoG filter, different sigma values were used to extract fine, medium, and coarse features. Wavelet filtering yields 8 decompositions per level, corresponding to all possible combinations of applying either a high-pass (H) or a low-pass (L) filter in each of the three dimensions: HHH, HHL, HLH, HLL, LHH, LHL, LLH, and LLL. Each preprocessing step (discretization, LoG, and wavelet) was applied before extracting all intensity, histogram, and textural features.
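The discretization and wavelet naming described above can be illustrated with a minimal Python sketch. It assumes equal-width bins spanning the minimum to maximum intensity of the VOI (a fixed bin count scheme); the text does not state how the bin edges were placed, and the function names and intensity values are hypothetical.

```python
from itertools import product


def discretize_fixed_bin_count(values, n_bins):
    """Map intensities to integer gray levels 1..n_bins using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant region has a single gray level; place everything in bin 1.
        return [1 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum intensity would fall into bin n_bins + 1, so clamp it.
    return [min(int((v - lo) / width) + 1, n_bins) for v in values]


def wavelet_subbands(n_dims=3):
    """Enumerate every high (H) / low (L) pass combination over n_dims axes."""
    return ["".join(combo) for combo in product("HL", repeat=n_dims)]


# Hypothetical intensities (in HU) sampled from a tumor VOI.
voi_intensities = [-20.0, 5.0, 30.0, 42.5, 80.0, 110.0]
for n_bins in (32, 64, 128):
    print(n_bins, discretize_fixed_bin_count(voi_intensities, n_bins))
print(wavelet_subbands())
```

A larger bin count preserves finer intensity differences at the cost of sparser texture matrices, which is one reason the three settings can yield different feature values for the same VOI.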

Feature extraction

Radiomic features were extracted using the PyRadiomics open-source Python library [31]. The extracted features were categorized into the following subgroups. Shape features describe the shape and geometric properties of the tumor volume, such as volume, maximum surface, tumor compactness, and sphericity. First-order statistical features describe the distribution of voxel intensities within the tumor volume, including the mean, median, maximum, and minimum voxel intensities. Second- and higher-order statistical features (known as textural features) measure the inter-relationships between voxels within the tumor volume, reflecting spatial changes in image gray levels. These features include gray-level co-occurrence matrix (GLCM), gray-level run length matrix (GLRLM), gray-level size-zone matrix (GLSZM), and gray-level dependence matrix (GLDM) features. Table 2 provides detailed information about the extracted features.
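To make the notion of a textural feature concrete, the following Python sketch builds a gray-level co-occurrence matrix and derives two GLCM features from it (Contrast and Cluster Tendency). It is a deliberate simplification, not the PyRadiomics implementation: it assumes a 2D image already discretized to gray levels 1..n_levels and a single pixel offset, whereas a 3D extraction would typically combine several directions. The function names and the toy images are hypothetical.

```python
def glcm(image, n_levels, offset=(0, 1)):
    """Symmetric, normalized co-occurrence matrix of a 2D image for one offset."""
    d_row, d_col = offset
    counts = [[0] * n_levels for _ in range(n_levels)]
    n_rows, n_cols = len(image), len(image[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                i, j = image[r][c] - 1, image[r2][c2] - 1  # gray levels start at 1
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("no voxel pairs exist for this offset")
    return [[n / total for n in row] for row in counts]


def glcm_contrast(p):
    """Sum of p(i, j) * (i - j)^2: large when neighbors differ strongly."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def glcm_cluster_tendency(p):
    """Sum of p(i, j) * (i + j - mu_x - mu_y)^2: grouping of similar gray levels."""
    n = len(p)
    mu_x = sum(i * p[i][j] for i in range(n) for j in range(n))
    mu_y = sum(j * p[i][j] for i in range(n) for j in range(n))
    return sum(p[i][j] * (i + j - mu_x - mu_y) ** 2
               for i in range(n) for j in range(n))


stripes = [[1, 2], [1, 2]]  # horizontal neighbors always differ
blocks = [[1, 1], [2, 2]]   # horizontal neighbors are always equal
for name, img in (("stripes", stripes), ("blocks", blocks)):
    p = glcm(img, n_levels=2)
    print(name, glcm_contrast(p), glcm_cluster_tendency(p))
```

With a horizontal offset, the striped image gives a high Contrast and zero Cluster Tendency, while the blocked image gives the opposite pattern, showing that the two features capture different aspects of local heterogeneity.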

Table 2 Summary of radiomic features used in this work

Univariate analysis

For univariate analysis, pairwise correlation tests between features were first used to eliminate highly correlated features. Student’s t tests were then used for comparisons between the low- and high-grade groups. To control the false discovery rate (FDR) in multiple hypothesis testing, the Benjamini–Hochberg correction was applied to the resulting p values, and the adjusted values are reported as q-values [32].
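As an illustration of the FDR step, the sketch below computes Benjamini–Hochberg adjusted p values in plain Python. It is a generic implementation of the standard procedure rather than the code used in the study, and the input p values are made up for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that each q-value is the
    # minimum of p * m / rank over its own rank and all larger ranks. This keeps
    # the q-values monotone in the p values and capped at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p values from four univariate feature tests.
p_values = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(p_values))
```

A feature is then called significant at an FDR of 5% when its q-value is below 0.05, which is the threshold quoted for the univariate results.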

Feature set preprocessing

Radiomic features span very different numerical ranges. Without normalization, features with large values can carry a disproportionately large weight in a model, while features with small values carry too little, depending on the distribution of the feature values. To overcome this issue, z-score normalization was applied to the feature values [33].
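A minimal sketch of z-score normalization for a single feature column is given here. It assumes the sample standard deviation (n − 1 denominator); the text does not say which estimator was used, and the feature values are hypothetical.

```python
from statistics import mean, stdev


def z_score(column):
    """Rescale one feature column to zero mean and unit standard deviation."""
    mu, sigma = mean(column), stdev(column)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Two hypothetical features on very different scales.
tumor_volume_mm3 = [12000.0, 45000.0, 30000.0]
sphericity = [0.61, 0.74, 0.68]
print(z_score(tumor_volume_mm3))
print(z_score(sphericity))
```

After this step both columns are expressed in units of standard deviations from their own mean, so neither dominates a distance- or penalty-based model purely because of its scale.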

Feature selection

Three different feature selection methods were implemented in this framework (Table 3), namely variable selection based on the least absolute shrinkage and selection operator (LASSO) [34], Student’s t test, and the minimum redundancy maximum relevance (MRMR) algorithm.
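Of the three methods, t test-based selection is the simplest to sketch. The Python example below ranks features by the absolute value of the pooled-variance Student’s t statistic between the low- and high-grade groups and keeps the top k. Ranking by |t| and the choice of k are assumptions made for illustration; the text does not specify the selection threshold, and the feature names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled (equal) variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a)
              + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))


def rank_features(features, labels, top_k):
    """Return the top_k feature names ordered by decreasing |t|."""
    scores = {}
    for name, values in features.items():
        low = [v for v, y in zip(values, labels) if y == 0]   # low grade
        high = [v for v, y in zip(values, labels) if y == 1]  # high grade
        scores[name] = abs(student_t(low, high))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical normalized feature values for six patients (0 = low, 1 = high grade).
labels = [0, 0, 0, 1, 1, 1]
features = {
    "glcm_cluster_tendency": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "shape_sphericity": [1.0, 5.0, 2.0, 6.0, 3.0, 4.0],
}
print(rank_features(features, labels, top_k=1))
```

The sketch does not guard against a feature that is constant within both groups (zero pooled variance); such features would need to be removed beforehand.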

Table 3 Feature selection and classification methods

Multivariate machine learning classifier

The following three classifiers were implemented and compared (Table 3): logistic regression, random forest, and support vector machines (SVM).

Model evaluation

Cross-validation (CV) was applied to tune the model parameters, and bootstrapped datasets were used for model evaluation. The predictive power of all models was investigated using the area under the receiver operating characteristic (ROC) curve (AUC). All analyses and evaluations were performed using the R software for statistical computing (version 3.5.2).
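The evaluation metric can be sketched as follows. The Python example computes the AUC as the probability that a high-grade case receives a higher score than a low-grade case, and then estimates a mean and a 95% percentile interval over bootstrap resamples. It resamples a fixed set of model scores, which is a simplifying assumption: the exact bootstrap protocol of the study (for example, whether models were refitted on each resample) is not described here, and the study itself was carried out in R. All names and numbers are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Each positive/negative pair counts 1 for a win and 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(scores, labels, n_boot=1000, seed=0):
    """Mean AUC and 95% percentile interval over bootstrap resamples."""
    if len(set(labels)) < 2:
        raise ValueError("both classes are required to compute an AUC")
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            aucs.append(auc([scores[i] for i in idx], ys))
    aucs.sort()
    lower = aucs[int(0.025 * n_boot)]
    upper = aucs[int(0.975 * n_boot) - 1]
    return sum(aucs) / n_boot, (lower, upper)


# Hypothetical classifier scores (1 = high grade, 0 = low grade).
scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.20, 0.90, 0.55]
labels = [0, 0, 1, 1, 1, 0, 1, 0]
print(auc(scores, labels))
print(bootstrap_auc(scores, labels, n_boot=500))
```

With a cohort of only 71 patients, the width of the bootstrap interval is as informative as the mean AUC itself.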

Results

Considering the inclusion/exclusion criteria, 71 (31 low grade and 40 high grade) patients were enrolled in the study protocol. The mean age of low- and high-grade groups was 60.05 and 60.08 years, respectively. Overall, there were 51 male and 20 female subjects.

Univariate analysis demonstrated that, among filtered and non-filtered images, only the 128 bin-discretized images showed statistically significant differences in texture parameters, with a mean AUC of 0.74 ± 0.03 (q value < 0.05). These features include Long-Run High Gray-Level Emphasis from GLRLM (AUC = 0.77, q value < 0.0005), Cluster Tendency from GLCM (AUC = 0.72, q value < 0.002), Contrast from NGTDM (AUC = 0.74, q value < 0.05), and Dependence Non-Uniformity from GLDM (AUC = 0.72, q value < 0.05) (Fig. 2).

Fig. 2 Area under receiver operating characteristic curve for discrimination between high- and low-grade ccRCC. a Univariate analysis of best predictor, b LR model with 128 bin discretization, c SVM model with coarse LoG filter, d RF model with wavelet filter. AUC: area under receiver operating characteristic curve, LR: logistic regression, SVM: support vector machine, RF: random forest

Table 4 shows the AUC (95% CI) of the three ML-based classifiers. As shown in the table, performance varies widely, ranging from 0.5 to 0.86. Three different feature selection methods were applied prior to each ML-based classifier to determine the best technique for that model. The LASSO method performed best for logistic regression, whereas Student’s t test proved best for the random forest and SVM classifiers. For logistic regression, 128 bin-discretized images and fine LoG features gave the highest performance, with a mean AUC of 0.75. The predictive performance of the random forest model ranged from 0.48 to 0.67; among the preprocessing techniques, wavelet-filtered images showed the lowest performance and 128 bin-discretized images the highest. Among the three classifiers, SVM with Student’s t test feature selection presented the best predictive performance. SVM with coarse LoG features demonstrated a mean AUC of 0.83 (Fig. 2).

Table 4 Classifiers performance with different preprocessing techniques

Discussion

There is a significant association between Fuhrman grade and patient prognosis [35, 36]. Among the different therapeutic strategies, radical surgery remains the main and most effective treatment for RCC patients. Radiofrequency ablation and active surveillance can be considered alternative treatment options for small and low-risk renal masses [37, 38]. Therefore, the preoperative assessment of RCC tumor aggressiveness plays a key role in optimal treatment planning and in the selection of appropriate follow-up regimens [11]. Various strategies have been proposed for the preoperative noninvasive prediction of ccRCC Fuhrman grade. MRI-derived ADC values are known to be an indicator of tumor activity, and several studies have assessed the utility of ADC in distinguishing low- and high-grade clear cell RCC [15, 39]. These studies showed that MRI has an acceptable predictive accuracy in the preoperative detection of high-grade RCC (AUC = 0.80) [40]. However, MRI is not as widely available as CT, and a wide range of ADC values for ccRCC has been reported in the literature [41, 42]. The robustness and repeatability of ADC values therefore need to be further validated. In parallel, CT-based semiquantitative and quantitative studies have attempted to classify low- and high-grade ccRCC [16, 17]. These studies showed that CT is a promising method for the classification of low- and high-grade ccRCC.

The radiomics approach converts medical images into quantitative, high-dimensional, and mineable features that can be used to predict tumor status. However, the abundance of predictive modeling techniques requires a selection process to choose the most appropriate one for predicting tumor status. Given that a number of previous radiomic studies [17, 25] on Fuhrman grade prediction did not include shape features in their analyses, this study combined shape and texture features to differentiate low and high grades of ccRCC. It was observed that shape features should not be left out of multivariate machine learning models.

Univariate analysis of the extracted radiomic features demonstrated that, among filtered and non-filtered images, only the 128 bin-discretized images showed statistically significant texture parameters. In a similar univariate analysis, Feng et al. [26] analyzed CT texture parameters and found effective quantitative parameters to evaluate the heterogeneity of ccRCC. After applying the LoG filter, they reported that only entropy showed a statistically significant difference after FDR correction in all image phases. In this work, four features showed statistically significant differences between the two groups: Long-Run High Gray-Level Emphasis from GLRLM, Cluster Tendency from GLCM, Contrast from NGTDM, and Dependence Non-Uniformity from GLDM. Among these features, Long-Run High Gray-Level Emphasis demonstrated the highest AUC (AUC = 0.77, q value < 0.0005).

The first machine learning model applied in this study was logistic regression, a classification algorithm that predicts the class probability of a categorical dependent variable. Among the three feature selection methods, the best results for the logistic regression model were obtained with the LASSO algorithm. The AUC of the logistic regression model is comparable to the results obtained in previous studies. Ding et al. [17] used a texture-score-based logistic regression model on a training cohort, resulting in an AUC of 0.878. When the predictive models were applied to the validation cohort, good results were still obtained (AUC > 0.670). Shu et al. [43] extracted radiomic features from the corticomedullary (CMP) and nephrographic (NP) phases of CT images of 161 and 99 patients diagnosed with low- and high-grade ccRCC, respectively. They constructed logistic regression classification models to discriminate between high- and low-grade ccRCC. Application of the model to CMP and NP images yielded AUCs of 0.766 (95% CI 0.709–0.816) and 0.818 (95% CI 0.765–0.838), respectively.

Random forest, an ensemble learning method consisting of a collection of decision trees, was an alternative machine learning model investigated in this work. It uses a weighted average of the outputs of these trees for the final decision [44], which commonly gives good results over a large range of data, but it is susceptible to overfitting. In this work, applying the random forest model to the dataset yielded unsatisfactory results (Table 4).

SVM, which creates a decision boundary between two classes to enable the prediction of labels from one or more feature vectors, was the best-performing classifier. After applying the SVM model to filtered and unfiltered images, the best classification result was obtained with coarse LoG features, with a mean AUC of 0.81. LoG filtering is an image filtering technique that combines Laplacian and Gaussian filtering. In a similar single-center retrospective study [25], the performance of quantitative CT texture analysis combined with different ML-based classifiers was evaluated for discriminating low- and high-grade ccRCC. Despite differences in the procedure followed, that study also found that the highest predictive performance was achieved by an SVM classifier. In summary, both studies support the common conclusion that CT texture analysis is a useful and promising noninvasive method to predict the Fuhrman grade of ccRCC preoperatively.

In the current study, we applied machine learning algorithms to CT radiomic features to noninvasively predict the Fuhrman grade of ccRCC, with promising results. The noninvasive identification of ccRCC grade could help in defining appropriate treatments, especially for patients with small masses, and could potentially serve as an alternative to FNA in renal cancer.

This work has a number of limitations. (1) This was a retrospective study with a small sample size and no external validation data. For this reason, we used the bootstrap technique (which estimates statistics on a population by sampling a dataset with replacement) to evaluate our models. Future studies building on the current model should involve a large external validation set drawn from a multicenter study protocol. (2) Since the tumor boundary was drawn manually, interference from the volume effect cannot be completely avoided. Moreover, image segmentation in the current study was performed and evaluated by a single human observer. Future studies should assess the effect of volume segmentation on the results to ensure repeatability in multicenter clinical studies.

Conclusion

The results of this study show that a CT-based SVM classifier with t test feature selection could be a useful and promising noninvasive approach for discriminating between low and high Fuhrman nuclear-grade ccRCC. Moreover, the results demonstrate that 128 bin discretization is an effective preprocessing method under these conditions.