Introduction

Renal tumors encompass a large spectrum of neoplasms that vary in clinical behavior, histopathologic features, and genetic expressions. These lesions range from benign (e.g., oncocytoma, angiomyolipoma [AML]) and indolent tumors (e.g., variants of papillary and chromophobe renal cell carcinomas [RCCs]), to aggressive malignant tumors (e.g., clear cell RCC [ccRCC]) [1,2,3].

Distinguishing aggressive RCC subtypes from benign and indolent lesions has been a topic of interest. Malignant and indolent kidney tumors differ in prognosis, biological behavior, and response to available therapies [4]. Percutaneous renal biopsy of renal masses can provide pre-treatment pathologic diagnosis, with a reported accuracy ranging from 70% to 90%. However, biopsy is an invasive procedure with potential complications, and it carries a risk of sampling error and non-diagnostic results in up to 20% of cases [5, 6]. Non-invasive methods, such as dynamic contrast-enhanced computed tomography (CT) and contrast-enhanced magnetic resonance imaging (MRI), can provide qualitative assessment for characterizing renal masses. In particular, MRI has been well described in the evaluation of more common subtypes of RCC (ccRCC, papillary [pRCC], and chromophobe), primarily using a combination of T2-weighted imaging (T2-WI) and post-contrast T1-weighted imaging (T1-WI) [7, 8]. Despite this, an accurate diagnosis remains challenging, mainly because of known histologic and molecular heterogeneity among different renal neoplasms, among subtypes of RCC, and within a single tumor [9, 10].

Radiomics is an emerging field that aims to extract maximal information from standard-of-care images, to provide information beyond what can be achieved from human image interpretation alone [11, 12]. Radiomics features can be divided into semantic features (such as size, shape, location, and necrosis) and agnostic features (quantitative features defined by advanced mathematical algorithms) [12, 13]. Statistical features are part of agnostic features, including first-order, second-order, and higher order radiomics features [13, 14]. First-order features (histogram) describe the distribution of pixel intensity values [15]. Second-order features (texture) describe the spatial relationships of voxels. Gray-level co-occurrence matrix (GLCM) texture features introduced by Haralick et al. are the most often analyzed features [16]. Higher order radiomics methods impose filters on the images to extract patterns [13].
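The difference between first-order and second-order features can be illustrated with a small Python sketch. It is not part of the software used in this study; the function names, the two toy images, and the choice of a horizontal one-pixel offset are assumptions made only for illustration. Both images contain the same gray levels in the same proportions, so their histograms are identical, yet GLCM contrast (one of the Haralick features) separates them.

```python
from collections import Counter

def histogram(image):
    """First-order view: how often each gray level occurs, ignoring position."""
    return Counter(value for row in image for value in row)

def glcm_horizontal(image, levels):
    """Symmetric, normalized co-occurrence matrix for horizontally adjacent pixels."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            counts[b][a] += 1  # count each pair in both directions (symmetric GLCM)
    total = sum(map(sum, counts))
    return [[c / total for c in r] for r in counts]

def contrast(p):
    """Haralick contrast: sum of p(i, j) * (i - j)^2 over all gray-level pairs."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

# Two hypothetical 2-level images with identical histograms.
blocks = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
stripes = [[0, 1, 0, 1],
           [0, 1, 0, 1]]

same_histogram = histogram(blocks) == histogram(stripes)
contrast_blocks = contrast(glcm_horizontal(blocks, levels=2))
contrast_stripes = contrast(glcm_horizontal(stripes, levels=2))
```

Here the striped image has a higher contrast than the blocked image even though a histogram-based feature could not tell them apart.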

Large datasets of features can be extracted from a single image with radiomics. These features can be used in machine learning (ML) algorithms and potentially aid tumor detection, diagnosis, assessment of prognosis, prediction of response to treatment, and monitoring disease status [17].

Machine learning is a subfield of artificial intelligence in which algorithms or classifiers learn patterns from large databases, to generate valuable predictive outputs [18, 19]. In general, ML uses a training set of examples to perform tasks such as feature selection and parameter fitting, and a validation or test set to evaluate the model performance [20]. Random Forest (RF) is a type of ML model known as a highly effective algorithm for classification. In RF algorithms, multiple decision trees are combined to get strong and robust models, minimizing training errors, and allowing generalization to new datasets [20, 21]. Application of ML models to clinical practice may have beneficial implications, improving healthcare quality and safety [22].
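The idea of combining many simple trees can be sketched as follows. This is a deliberately stripped-down stand-in for a random forest (bootstrap resampling, one randomly chosen feature per tree, depth-one trees, and majority voting), written for illustration only; it is not the MATLAB implementation used in this study, and all names and the toy data are hypothetical.

```python
import random

def fit_stump(rows, labels, feature):
    """Depth-one tree on a single feature: predicts `polarity` when value > threshold."""
    if len(set(labels)) == 1:
        # Bootstrap sample with one class only: always predict that class.
        return feature, float("-inf"), labels[0]
    best = None
    for t in sorted({r[feature] for r in rows}):
        for polarity in (1, 0):
            preds = [polarity if r[feature] > t else 1 - polarity for r in rows]
            errors = sum(p != y for p, y in zip(preds, labels))
            if best is None or errors < best[0]:
                best = (errors, t, polarity)
    return feature, best[1], best[2]

def predict_stump(stump, row):
    feature, t, polarity = stump
    return polarity if row[feature] > t else 1 - polarity

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        feature = rng.randrange(n_features)            # random feature selection
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps (1 if more than half vote 1)."""
    votes = sum(predict_stump(s, row) for s in forest)
    return int(2 * votes > len(forest))

# Hypothetical two-feature toy data: label 1 has larger values on both features.
rows = [(1, 10), (2, 11), (3, 12), (7, 20), (8, 21), (9, 22)]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
```

Individual stumps trained on different bootstrap samples may disagree near the class boundary; the vote across trees is what gives the ensemble its stability.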

The aim of our study was to assess the diagnostic value of MRI-based radiomics features using ML models in characterizing solid renal neoplasms, in comparison and in combination with qualitative evaluation.

Materials and methods

Patient selection

The local institutional review board approved this retrospective single-center study, and HIPAA compliance was maintained throughout the study period. Our center’s urological database was queried between August 2015 and June 2019 using the search terms “renal mass,” “renal cell carcinoma,” “RCC,” “nephrectomy,” and “partial nephrectomy.” This search identified a total of 152 patients with renal masses with preoperative MRI and pathologic correlation. Inclusion criteria were adult patients (> 18 years) who (1) underwent total or partial nephrectomy for a renal mass, (2) underwent abdominal MRI before surgery, and (3) had solid or mixed (solid and cystic) lesions. Exclusion criteria were (1) cystic lesions without any solid component and (2) lesion size less than 1 cm. One hundred twenty-five patients (84 M/41 F; mean age 58.8 ± 11.5 years, range 20–83 years) met the inclusion criteria and comprise our study population. Average time between MRI and surgery was 78.1 ± 91 days (range 4–808 days). A flowchart of patient inclusion is shown in Fig. 1.

Fig. 1

Flowchart of patient population

MRI examination

Imaging studies were performed with multichannel MRI systems including 1.5T (n = 95; Aera, Espree, Symphony, Amira, Siemens Healthineers; Signa HD, HDxt and Optima, GE Medical Systems) or 3T (n = 30; Skyra and Verio, Siemens Healthineers; Signa HDxt, Discovery 750w, GE Medical Systems) imaging platforms. All patients were imaged in supine position using phased array torso coils. Routine abdominal MRI sequences performed included axial and coronal single-shot fast spin echo T2-WI (HASTE/SSFSE), axial fat-suppressed (FS) fast spin echo T2-WI, diffusion-weighted imaging (DWI, at b values of 50, 400, and 800 s/mm²) with apparent diffusion coefficient (ADC) maps, and dynamic multiphase T1-WI. For dynamic contrast-enhanced imaging, unenhanced, arterial phase (AP), nephrographic phase (NP), and delayed phase (DP) were obtained using 3D T1-WI fat-suppressed spoiled gradient-recalled echo sequence (VIBE/LAVA) before and after the administration of a gadolinium-based contrast agent (Gadobutrol [Gadavist, Bayer Healthcare]; Gadoterate meglumine [Dotarem, Guerbet]; Gadobenate dimeglumine [Multihance, Bracco Diagnostics]; Gadopentetate dimeglumine [Magnevist, Bayer Healthcare]; Gadoversetamide [Optimark, Guerbet]; Gadoteridol [Prohance, Bracco Diagnostics]; Gadoxetate disodium [Eovist, Bayer Healthcare]; Gadodiamide [Omniscan, GE Healthcare] or unknown). Eighty-four (67.2%) patients had an outside MRI; 4 of them did not have contrast-enhanced images and 33 patients did not have DWI.

Image analysis

Qualitative analysis

The index lesion was defined as the largest pathologically confirmed solid or mixed (solid and cystic) renal mass. Two independent radiologists (with 20 and 14 years of experience in abdominal MRI) identified the index lesion and performed qualitative evaluation, using PACS (Centricity 3.0, General Electric Medical Systems). The observers were aware that the patient had a renal mass; however, they were blinded to the final diagnosis. The evaluation consisted of filling out a form that included tumor size, laterality (right or left), margins (well-defined or ill-defined), composition (solid, cystic or mixed), presence of hemosiderin, tumor fat content, growth pattern (endophytic, < 50% exophytic or > 50% exophytic), collecting system invasion, renal vein invasion, contrast enhancement on AP (hypovascular [lower enhancement compared to renal cortex] or hypervascular [same or higher enhancement compared to renal cortex]), heterogeneity on NP (homogeneous or heterogeneous nodular enhancement), chemical shift, and T1-WI, T2-WI, diffusion-weighted imaging (DWI), and apparent diffusion coefficient (ADC) map signal (hypointense, hyperintense, isointense, or heterogeneous) [23]. Presumed benign or malignant appearance of the tumor was also recorded based mainly on signal intensity of the lesion on T2-WI, degree of contrast enhancement on AP, and presence of intra-lesional fat on in- and out-of-phase T1-WI [24].

Quantitative analysis

The same index lesions were evaluated for quantitative analysis. T2-WI, DWI/ADC, and pre- and post-contrast T1-WI (AP, NP, and DP) images were analyzed using OsiriX software (Pixmeo SARL, Bernex, Switzerland) by a third radiologist (with 3 years of experience in body imaging). Regions of interest (ROIs) were placed in the previously defined index lesion to cover almost the entire area of the solid portion of the tumor, avoiding cystic areas and the most peripheral portions to exclude partial volume effects. One single-slice ROI was placed in the largest tumor area if the lesion was < 3 cm, and two ROIs were placed on two consecutive slices if the lesion was ≥ 3 cm. Signal intensity (SI) in the lesion ROI was normalized by the mean SI in an ROI placed in the uninvolved renal cortex of the ipsilateral kidney (area 90–100 mm²) to reduce SI variation due to heterogeneity in MR acquisition protocols [25]. Mean enhancement ratios were extracted from the ROIs during AP, NP, and DP.
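The cortex-based normalization described above can be sketched in a few lines of Python. The function names and the sample values are hypothetical. The enhancement ratio shown is one common definition, (post-contrast SI − pre-contrast SI) / pre-contrast SI, and is an assumption, since the exact formula is not stated here.

```python
def mean(values):
    return sum(values) / len(values)

def normalized_si(lesion_roi, cortex_roi):
    """Mean lesion SI divided by mean SI of uninvolved ipsilateral renal cortex."""
    return mean(lesion_roi) / mean(cortex_roi)

def enhancement_ratio(si_pre, si_post):
    """ASSUMED definition: relative SI increase after contrast administration."""
    return (si_post - si_pre) / si_pre

# Hypothetical pixel values from a lesion ROI and a cortex ROI on the same image.
lesion = [200, 220, 180]
cortex = [400, 400]
ratio_to_cortex = normalized_si(lesion, cortex)       # 0.5
ap_enhancement = enhancement_ratio(si_pre=100, si_post=250)  # 1.5
```

Because the ratio is dimensionless, it is less sensitive to scanner-specific intensity scaling than raw SI, which is the motivation given for the normalization.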

Radiomics analysis was performed by an MRI physicist (with 4 years of experience) using in-house software developed in MATLAB (MathWorks, Inc., Natick, MA). For each sequence, histogram features (first-order radiomics) included central tendency parameters (mean and median) and heterogeneity parameters (standard deviation [SD], kurtosis, and skewness), for a total of 50 histogram features per lesion. Fourteen Haralick texture features were also calculated from the gray-level co-occurrence matrix (GLCM) for each sequence, for a total of 140 texture features per lesion. Gray-level intensity data were normalized before texture extraction to standardize the signal intensity range to 0–64. The lists of histogram and texture features are included in Supplemental Fig. 1. The radiomics workflow is illustrated in Fig. 2.
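For orientation, the five histogram features and the gray-level normalization can be sketched in Python as follows. This is not the in-house MATLAB code; the function names are hypothetical, and several details are assumptions: population (not sample) moments, non-excess kurtosis (a normal distribution gives 3), and quantization into 64 bins indexed 0 to 63 with min-max scaling.

```python
import statistics

def first_order_features(values):
    """Mean, median, SD, skewness, and kurtosis of the pixel values in one ROI."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance (assumed)
    if var == 0:
        raise ValueError("constant ROI: skewness and kurtosis are undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": var ** 0.5,
        "skewness": m3 / var ** 1.5,
        "kurtosis": m4 / var ** 2,  # non-excess kurtosis (assumed convention)
    }

def quantize(values, levels=64):
    """Min-max rescale intensities to integer gray levels 0..levels-1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(levels - 1, int((v - lo) / (hi - lo) * levels)) for v in values]

features = first_order_features([1, 2, 3, 4, 10])
gray_levels = quantize([0, 50, 100])

# The reported totals are mutually consistent with ten image series per lesion:
# 50 histogram features / 5 per series and 140 texture features / 14 per series.
series_from_histogram = 50 // 5
series_from_texture = 140 // 14
```

The single large value in the toy ROI pulls the mean above the median and yields a positive skewness, which is the kind of asymmetry the histogram features are meant to capture.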

Fig. 2

Illustration of radiomics workflow assessment in two different patients. The figure shows post-contrast T1-weighted images obtained on delayed phase with regions of interest placed on the lesion, mask image of the tumor, texture map (sum variance), and histogram skewness distribution of the lesion. (a) A 32-year-old female patient with right clear cell renal cell carcinoma (ccRCC). (b) A 42-year-old male with right papillary renal cell carcinoma (pRCC) with atypical magnetic resonance imaging (MRI) appearance (hypervascular tumor). Both lesions were characterized as hypervascular malignant tumors, with heterogeneous enhancement. Texture map images showed higher sum variance (texture feature) in ccRCC compared to pRCC. Histograms showed a distribution skewed to the left in ccRCC, while pRCC showed a more symmetrical histogram distribution. This is a representative example where radiomics features helped differentiate these RCC subtypes

Histopathologic analysis

Histopathologic tissue confirmation for tumor type/subtype of each lesion was extracted from the pathology report of partial (82.4%; n = 103) or total nephrectomy (17.6%; n = 22).

Statistical analysis

Clinical and demographic data were summarized using descriptive statistics expressed as mean ± standard deviation (SD). The study sample was randomly divided into a training set (70%; n = 88) and a validation set (30%; n = 37) [26]. The chi-square test was used to compare demographics and clinical data of both sets.
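A random split of this size can be sketched as follows. The function name and the seed are hypothetical, and the sketch assumes a simple unstratified shuffle; whether the original split was stratified by diagnosis is not stated.

```python
import random

def random_split(n_total, n_train, seed=0):
    """Shuffle patient indices and cut them into training and validation sets."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    return indices[:n_train], indices[n_train:]

# 125 patients divided into 88 training and 37 validation cases.
train_idx, val_idx = random_split(n_total=125, n_train=88)
train_fraction = len(train_idx) / 125  # 0.704, i.e. approximately 70%
```

Keeping the validation indices untouched during feature selection and model fitting is what allows the validation results to serve as an estimate of performance on unseen cases.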

We defined three binary classification tasks. The first differentiated malignant from benign lesions. The second and third differentiated ccRCC and pRCC, respectively, from all other renal lesions, given that these two subtypes are the most common [27]. The Mann–Whitney U test was used to assess differences in qualitative and quantitative radiomics features for each of the three classifications, and receiver-operating characteristic (ROC) curves were generated to assess the diagnostic performance of statistically significant features in the training set. ML modeling was performed using qualitative features, quantitative radiomics features, and a combination of both for prediction of tumor diagnosis (RCC, ccRCC, and pRCC) using random forest in the MATLAB statistical tools. Only features that were significant on the Mann–Whitney U test in the training set were selected as input for the models. Sensitivity and specificity for the models were calculated using the Youden index. Missing data were filled using an interpolation imputation method. All statistical tests were conducted in MATLAB (MathWorks, Natick, MA) and SPSS (IBM, Armonk, NY). For all tests, a p value below 0.05 was considered statistically significant.
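Two of the quantities used above are closely related and can be sketched in plain Python. The area under the ROC curve equals the Mann–Whitney U statistic divided by the number of positive-negative pairs, and the Youden index J = sensitivity + specificity − 1 selects the operating point on the ROC curve. The function names and the scores below are hypothetical, and the rule "positive when score ≥ threshold" is an assumption; this is not the MATLAB/SPSS code used in the study.

```python
def auc_mann_whitney(pos, neg):
    """AUC as U / (n_pos * n_neg): P(pos score > neg score), ties counted as 0.5."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_cutoff(pos, neg):
    """Return (threshold, sensitivity, specificity) maximizing the Youden index.

    A case is called positive when its score is >= threshold (assumed rule).
    """
    best = None
    for t in sorted(set(pos) | set(neg)):
        sens = sum(p >= t for p in pos) / len(pos)
        spec = sum(n < t for n in neg) / len(neg)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]

# Hypothetical model scores for malignant (positive) and benign (negative) cases.
malignant = [0.9, 0.8, 0.6, 0.4]
benign = [0.7, 0.3, 0.2, 0.1]

auc = auc_mann_whitney(malignant, benign)                  # 14 of 16 pairs ordered correctly
threshold, sensitivity, specificity = youden_cutoff(malignant, benign)
```

In this toy example the AUC is 0.875 and the Youden-optimal threshold is 0.4, giving 100% sensitivity and 75% specificity.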

Results

Study population

Our patient population included 104 patients with RCC (83.2%) and 21 with benign lesions (16.8%). RCC subtypes (n = 104) included ccRCC (n = 51, 49.1%), pRCC (n = 29, 27.9%), chromophobe RCC (n = 12, 11.5%), and other less common subtypes (n = 12, 11.5%). Benign lesions (n = 21) consisted of fat-poor AML (n = 8, 38.1%), oncocytoma (n = 11, 52.3%), solitary fibrous tumor (n = 1, 4.8%), and mixed epithelial and stromal tumor (n = 1, 4.8%). Mean tumor size at pathology was 3.5 ± 2.6 cm (range 1.0–13 cm). Demographics and clinical parameters of the patient population are described in Table 1.

Table 1 Demographics and tumor characteristics of our study population

Qualitative analysis

Mean tumor size on imaging was 3.5 cm ± 2.7 cm (range 1.0–12.2 cm). Sixty-three tumors were located in the right kidney (50.4%), and sixty-two in the left kidney (49.6%).

On the training set, statistically significant differences for distinguishing between RCC and benign lesions were observed for contrast enhancement on AP (p = 0.002; AUC 0.72, p = 0.001), and presumed histologic subtype based on enhancement and T2 signal (p = 0.002; AUC 0.64, p = 0.025).

For differentiating ccRCC from other lesions, there were significant differences in contrast enhancement on AP (p = 0.005; AUC 0.66, p = 0.008); heterogeneity on NP (p < 0.001; AUC 0.71, p < 0.001); presence of cystic components (p = 0.007; AUC 0.64, p = 0.010); signal intensity on T2-WI (p < 0.001; AUC 0.68, p = 0.002); and ADC map signal (p = 0.019; AUC 0.65, p = 0.030). Presumed benign or malignant MRI appearance was significant on the Mann–Whitney U test (p = 0.011), but the AUC analysis was not significant, with a value of 0.59 (p = 0.079). When comparing pRCC with other lesions, we found significant differences in growth pattern (presence of exophytic component, p = 0.036; AUC 0.64, p = 0.025); heterogeneity on NP (p = 0.003; AUC 0.68, p = 0.002); signal intensity on T2-WI (p < 0.001; AUC 0.71, p < 0.001); contrast enhancement on AP (p < 0.001; AUC 0.75, p < 0.001); and ADC map signal (p = 0.021; AUC 0.65, p = 0.02). Significant qualitative features are summarized in Table 2.

Table 2 Qualitative imaging features in the training set (n = 88)

Quantitative analysis

On the training set, five texture features were found to be significant for differentiating benign from malignant tumors, with highest AUC of 0.81 (p < 0.001) for ADC homogeneity. Representative cases are shown in Fig. 3.

Fig. 3

Representative examples of apparent diffusion coefficient (ADC) homogeneity texture maps helping to differentiate oncocytoma from clear cell renal cell carcinoma (ccRCC). T1 post-contrast on arterial phase, ADC map, and texture ADC homogeneity map in (a) a 51-year-old male with left oncocytoma and (b) a 69-year-old female with right ccRCC, showing higher ADC homogeneity in ccRCC

When comparing ccRCC versus other renal lesions we found significant differences in 22 histogram features, 9 texture features, and mean enhancement ratios. Standard deviation on AP and mean signal intensity in DWI-b800 were the most significant with AUCs of 0.84 (p < 0.001) and 0.90 (p < 0.001), respectively.

When comparing pRCC versus other tumors, there were significant differences in 19 histogram features, 13 texture features, and enhancement ratios. The highest AUC was 0.85 (p < 0.001), obtained for the mean and median on AP. Box plots of examples of significant features are shown in Fig. 4. Details of all significant features on the training set are included in Supplemental Table 1.

Fig. 4

Box plot distributions of significant radiomics features. Renal cell carcinomas (RCCs) vs. other; RCCs showed higher texture values (homogeneity and maximal correlation) on diffusion-weighted image (DWI [low b value]) and apparent diffusion coefficient (ADC). Clear cell RCC (ccRCC) vs. other; ccRCC showed higher standard deviation on arterial phase (AP), and lower median values on DWI (high b value). Papillary RCC (pRCC) vs. other; pRCC showed lower mean and median values on AP and nephrographic phase (NP), respectively

ML random forest models for qualitative features, quantitative radiomics features, and combination of features

Three models were generated for each prediction task (RCC, ccRCC, and pRCC), using qualitative features, quantitative features, and a combination of qualitative and quantitative features.

Only one ML model was significant for differentiating RCC from benign lesions, using a combination of qualitative and quantitative radiomics features, with AUC of 0.97 (95% confidence interval [CI] 0.94–1.00, p < 0.001) on the training set (93% sensitivity and 94% specificity), and AUC of 0.73 (95% CI 0.50–0.96, p = 0.002) on the validation set (58% sensitivity and 100% specificity).

For diagnosing ccRCC versus all other lesions, radiomics-based ML model had the highest diagnostic performance with AUC of 1.0 (95% CI 1.00–1.00, p < 0.001) on the training set (100% sensitivity and 100% specificity), and 0.77 (95% CI 0.62–0.92, p < 0.001) on the validation set (95% sensitivity and 56% specificity).

For differentiating pRCC from all other renal tumors, the model with highest AUC was generated using qualitative features, with AUC of 0.91 (95% CI 0.82–0.99, p < 0.001) on the training set (91% sensitivity and 81% specificity), and 0.74 (95% CI 0.53–0.95, p = 0.010) on the validation set (75% sensitivity and 69% specificity). Complete results of multivariate modeling are summarized in Table 3.

Table 3 Diagnostic performance of machine learning models based on qualitative and quantitative radiomics features

Discussion

Our study found associations between qualitative and quantitative imaging features and different types of renal neoplasms. Based on informative features, we generated ML models to predict RCC, ccRCC, and pRCC diagnosis. The best diagnostic performances were observed with the quantitative radiomics model for differentiating ccRCC, the qualitative features model for predicting pRCC, and the combined features model for discriminating malignant lesions. Our study suggests that MRI-based radiomics features may improve characterization of solid renal neoplasms.

There is an increasing amount of evidence supporting the utility of radiomics [28,29,30].

Medical images are becoming a valuable source of data, and quantitative radiomics features may be used as a non-invasive tool for lesion characterization and classification [11]. Several studies have shown promising results for radiomics features in differentiating histologic subtypes of renal neoplasms. One study found similar performance to our results using MRI radiomics features in renal tumor evaluation [25]. Several informative features they found were also significant in our study, including skewness on AP and DP, DP variance, and DP sum average. The authors also generated models using cross-validation and random forest, showing 79% accuracy for differentiating RCC and ccRCC from oncocytoma, and 78% accuracy for distinguishing pRCC from ccRCC. That study differs from ours because the authors compared RCC and ccRCC only with oncocytomas as benign lesions, and results were not correlated with qualitative evaluation.

Another study reported that several MRI texture features had excellent diagnostic performance in differentiating ccRCC from non-ccRCC (AUC > 0.8), including T2-WI entropy, DWI standard deviation at b-500 and b-1000, ADC mean, and skewness on T1-WI and on AP [28]. From these features, we agree on ADC mean as an informative feature that showed high AUC value (0.80) when comparing ccRCC to other tumors.

Recently, a study proposed radiologic-radiomics ML models for differentiation of benign and malignant solid renal masses [31]. The authors extracted CT-based radiomics features in different renal tumors and also compared them with clinical radiologic evaluation. In concordance with our study, they found that models incorporating radiologic assessment and radiomics features may help in the differentiation of different renal solid tumors, with sensitivity up to 90% and specificity up to 91.7%.

Our results are relevant to clinical practice because the heterogeneous nature of renal tumors makes better characterization of this disease necessary [10]. Radiomics analysis can add complementary information for tumor characterization, which may not be perceptible to the human eye [12]. These features can be used to generate ML models and aid radiological evaluation of renal tumors non-invasively. Also, in the era of personalized medicine, radiomics in combination with histopathologic, genetic, and metabolic datasets may help to improve patient management, and might be used as a biomarker that would help in tumor characterization, treatment selection, and prognosis [32, 33].

Our findings have several future applications such as facilitating workflow in busy clinical settings. Further studies are needed to incorporate these ML models in deep learning and AI algorithms including tasks like lesion detection/segmentation, characterization, and prediction of diagnosis and treatment response [34, 35]. Future studies should also evaluate the use of radiomics and ML models to assess treatment response on minimally invasive treatment modalities, such as ablative therapies, which have been used as an alternative treatment to surgery for T1a stage RCC, especially for medically fragile patients [36].

Our study has several limitations. It was conducted in a single center and the sample size was relatively small, with only 16.8% of benign lesions. This class imbalance may have contributed to the limited number of significant features found when comparing benign versus malignant tumors. Selection bias may exist because we retrospectively analyzed only patients with renal tumors who underwent surgery with prior abdominal MRI. Variability in MRI protocols/scanners may have affected the robustness and reproducibility of radiomics features [11, 37]. Results can also be affected by the fact that segmentation of the lesion was performed by one observer, and we did not measure inter-observer variability [11, 38]. Our results can potentially be improved in the future with a larger, multicenter cohort of patients, standardization of MRI protocols, and development of new ML techniques.

In conclusion, this study showed that ML models incorporating MRI-based radiomics features and qualitative radiologic assessment can help characterize renal masses.