Introduction

The rapid development of computational analysis of radiological images represents a major shift from conventional visual interpretation. Radiological images are a rich source of quantitative data [1]. The quantitative analysis of images has given rise to the field of radiomics, which has attracted increasing interest for clinical use, for example in tumour characterisation (e.g. grading and differentiation) and clinical prediction (e.g. survival) [1,2,3,4,5]. Computational analysis based on data extraction and modelling can detect features within radiological images that are not readily apparent to the human eye. By quantifying characteristics such as shape, intensity and texture in heterogeneous volumes of interest (VOI), radiomics-based computational analysis may become a useful tool to help radiologists detect, differentiate and grade tumours and other pathologies.

Radiomic analysis begins with image acquisition. Patients’ images are acquired during standard-of-care procedures such as contrast-enhanced computed tomography (CT). The images are then processed into quantitative data for data mining in an ordered sequence of steps. First, the image is segmented to define a volume of interest [1]. This may be done manually, with trained personnel drawing regions of interest (ROI) around the tumour on each CT slice. Alternatively, various automated techniques can segment tumours, of which convolutional neural networks (CNNs) are the best known [6]. A CNN is a machine learning (ML) model whose architecture is biologically inspired by the human visual cortex. Second, features are extracted from each segmented volume. These include shape features (geometric parameters of the tumour), 1st order features (the distribution of voxel intensities) and 2nd order features (the texture of the voxel habitat, i.e. how voxels relate to their neighbours). Higher order features (such as fractal and wavelet features) are also prevalent in the literature, for example in glioma radiomic analysis [7].
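To make 1st order features concrete, the following minimal pure-Python sketch computes a few common intensity statistics from the voxel values inside a segmented VOI. The function name, the input format (a flat list of Hounsfield unit values) and the 25 HU bin width used for entropy are assumptions for illustration; dedicated radiomics software applies its own standardised definitions and preprocessing.

```python
import math
from collections import Counter


def first_order_features(voxels, bin_width=25.0):
    """Compute a few 1st order (intensity) features from VOI voxel values.

    voxels: flat list of HU values inside the segmented VOI (hypothetical input).
    bin_width: histogram bin width in HU used for entropy (an assumed setting).
    """
    n = len(voxels)
    mean = sum(voxels) / n
    # Population variance (divide by n), as used for descriptive statistics
    variance = sum((v - mean) ** 2 for v in voxels) / n
    std = math.sqrt(variance)
    # Skewness and (non-excess) kurtosis are undefined for a constant region,
    # so this sketch returns 0.0 in that case
    skewness = sum((v - mean) ** 3 for v in voxels) / n / std ** 3 if std else 0.0
    kurtosis = sum((v - mean) ** 4 for v in voxels) / n / std ** 4 if std else 0.0
    # Discretise intensities into fixed-width bins, then compute Shannon entropy (bits)
    bins = Counter(math.floor(v / bin_width) for v in voxels)
    entropy = -sum((c / n) * math.log2(c / n) for c in bins.values())
    return {"mean": mean, "variance": variance, "skewness": skewness,
            "kurtosis": kurtosis, "entropy": entropy}
```

For example, `first_order_features([0, 0, 50, 50])` gives a mean of 25 HU, a variance of 625 HU², zero skewness, a kurtosis of 1 and an entropy of 1 bit (two equally populated bins). Texture (2nd order) features differ in that they also depend on the spatial arrangement of the voxels, not only on their intensity distribution.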

Once the radiomic data have been extracted, redundant features are removed. The selected features can then be used to predict tumour characteristics or clinical sequelae with an algorithm developed through ML. In simple terms, ML is a form of artificial intelligence in which a model is trained to recognise patterns within a dataset. Various ML models have been developed, such as the convolutional neural network [8], whilst others, such as the decision tree [9], have a tree-like structure. The model is first fitted to a training dataset and then verified on separate ‘testing’ and ‘validation’ datasets. Once verified, the trained model can be applied to new images to classify the pathology of interest. The model’s discriminative performance can be assessed by producing a receiver operating characteristic (ROC) or precision-recall curve and calculating the associated area under the curve (AUC); the ROC curve plots sensitivity against 1 − specificity across decision thresholds. Clinical outcomes such as survival can also be predicted, with an associated correlation coefficient [10]. The basic steps of a radiomic model for renal tumours are shown in Fig. 1.
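The relationship between sensitivity, specificity and the ROC AUC can be illustrated with a short pure-Python sketch. It relies on the fact that the ROC AUC equals the probability that a randomly chosen positive case (e.g. a high-grade tumour) receives a higher model score than a randomly chosen negative case, with ties counted as one half (the Mann-Whitney formulation). The function names and example scores are hypothetical; this is not the evaluation code of any reviewed study.

```python
def roc_auc(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative).

    labels are 1 (positive, e.g. high grade) or 0 (negative); ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when cases scoring >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
print(roc_auc(scores, labels))                       # 5 of 6 pairs ranked correctly
print(sensitivity_specificity(scores, labels, 0.5))  # (2/3, 1.0)
```

Sweeping the threshold from high to low and plotting each (1 − specificity, sensitivity) pair traces the ROC curve; the AUC summarises that whole curve in a single threshold-independent number, which is why it is the main outcome measure in this review.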

Fig. 1

CT Radiomic Analysis Pipeline for Renal Tumours: The CT image is manually or automatically segmented to identify volumes of interest (in this case normal kidney, red; and tumour, green). Data from the volume are extracted for shape, 1st order (intensity), 2nd order (texture) and higher order features (not pictured); four examples are given in the diagram. Feature selection is then applied. The selected features are run through an ML model, and a receiver operating characteristic (ROC) curve is generated with an area under the curve (AUC)

Renal tumours are a growing focus of radiomic analysis. Over 400,000 new cases of renal cancer are diagnosed globally each year [11]. Predicting tumour grade and subtype before histological diagnosis may guide treatment decisions and aid prognostication [12,13,14]. Radiomics research has sought to grade renal tumours from imaging before formal histological analysis. The radiomic literature reports a number of models that successfully differentiate between renal tumours with different clinical sequelae, such as non-clear cell renal cell carcinoma (non-clear cell RCC), clear cell renal cell carcinoma (clear cell RCC) and angiomyolipoma (AML) [15,16,17].

This study aims to systematically review the literature on CT radiomics for characterising renal tumours by (1) predicting pathological grade and (2) differentiating between tumour subtypes. CT was identified as the most commonly used modality for incidentally detecting renal tumours. If the same images can be used for initial tumour characterisation [18], further imaging may be avoided, benefiting the patient and reducing demand on busy clinical departments. Results are presented together with the discussion to facilitate an educational approach.

Methods

A systematic review was performed in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) checklist [19]. Search terms were developed from pilot searches of the literature and the PICO (population, intervention, comparison, outcome) framework. Studies of interest involved patients with renal tumours that were either graded or differentiated using CT radiomics, with the prediction confirmed by histology. The main outcomes of interest were the classification of pathological grade and renal tumour type, with model performance measured by the area under the curve (AUC) of the receiver operating characteristic (ROC) or precision-recall curve. PubMed, Scopus and Web of Science databases were searched using the search string: (renal OR kidney) AND (CT OR "computed tomography") AND radiomic*

Study selection and extraction

Studies were included if they were journal articles that: (1) graded or differentiated renal tumour subtypes using radiomic features with ML; (2) provided adequate information to extract pipeline characteristics such as image acquisition parameters, segmentation method, features used, ML model, and classification of results against histology as the ‘gold standard’ (e.g. total/partial nephrectomy or core biopsy); and (3) reported an AUC above 0.8 from a receiver operating characteristic or precision-recall curve. The associated confidence interval was recorded where available. Exclusion criteria were (1) reviews, abstracts, books and opinion articles, and (2) non-English articles. Data were extracted by the authors A.B., M.I. and C.S. In addition to the AUC, information on the segmentation techniques and radiomic features used to grade and differentiate the renal tumours was extracted. The search was last performed on 9 July 2020.

Data synthesis

The analysis of multiple radiomic pipelines is difficult given their heterogeneity. Features, feature selection method, and ML models differed between the studies; therefore, a meta-analysis was not performed.

Quality assessment

The Radiomic Quality Score (RQS) was applied to assess study quality. The RQS is a radiomics-specific scoring system based on the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) initiative, which sets out recommendations for reporting predictive models [20].

Results and discussion

The literature search found 49 articles from PubMed, 67 from Scopus, 72 from Web of Science and 21 from the hand search, giving a total of 209 articles. Once duplicates were removed, 125 articles remained. After title and abstract screening for relevance, 24 articles remained. On full-text review, 1 article was not in English, 3 articles did not meet our inclusion criterion of an AUC > 0.8, and 7 articles did not report an AUC. The results from the thirteen remaining papers are discussed in the order of the radiomic pipeline steps, namely image acquisition, segmentation, radiomic features, and the machine learning used to grade and differentiate between renal tumours.
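The record counts reported in this paragraph can be checked as a simple PRISMA flow tally. The dictionary layout and variable names below are illustrative only; the numbers are those reported in this section, and the two intermediate counts (duplicates removed and records excluded at title/abstract screening) are derived by subtraction rather than stated directly in the text.

```python
# Records identified per source, as reported in this review
identified = {"PubMed": 49, "Scopus": 67, "Web of Science": 72, "Hand search": 21}
after_duplicates = 125   # records remaining after de-duplication
after_screening = 24     # records remaining after title/abstract screening
# Reasons for exclusion at full-text review
full_text_exclusions = {"non-English": 1, "AUC <= 0.8": 3, "no AUC reported": 7}

total_identified = sum(identified.values())
duplicates_removed = total_identified - after_duplicates     # derived, not reported
screened_out = after_duplicates - after_screening             # derived, not reported
included = after_screening - sum(full_text_exclusions.values())

print(total_identified, duplicates_removed, screened_out, included)  # 209 84 101 13
```

The tally confirms that the stated totals are internally consistent: 209 records identified, 84 duplicates removed, 101 excluded at screening and 13 studies included.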

Image acquisition

The acquisition parameters of the images used for radiomic feature extraction are detailed in Table 1. Slice thickness ranged from 1 to 8 mm and tube voltage from 120 to 140 kVp. Phases included the unenhanced phase (UP), corticomedullary phase (CMP), nephrogenic phase (NP), portal venous phase (PVP) and excretory phase (EP); one study used only unenhanced scans [21]. The volume of contrast injected ranged from 70 to 150 mL, at rates of 3 to 4.5 mL/s. The iodinated contrast agents used were ioversol, iopamidol and iohexol; three studies [22,23,24] did not report the type of contrast used.

Table 1 CT acquisition parameters for images used in radiomic feature extraction to differentiate renal tumours

Segmentations of renal tumours

Segmentation methods are detailed in Table 2. All thirteen studies used manual components for segmentation, the first step in the radiomics pipeline (see Fig. 1). To avoid the partial volume effect, some studies drew the segmentation 1–3 mm inside the tumour margin [2, 22, 25, 27,28,29,30], and some also skipped slices at the superior and inferior poles [2, 25, 27]. Others included the whole tumour cross-section on each slice [16, 21, 24] without contouring inside the tumour margin. Two studies used a single-slice axial ROI with texture radiomics, classifying renal tumour subtype [24] (AUC = 0.90) and grade [26] (AUC = 0.91).
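Drawing the ROI 1–3 mm inside the tumour margin is geometrically equivalent to eroding a whole-tumour mask. The sketch below shows that operation under stated assumptions: a single axial slice stored as a 0/1 grid, isotropic in-plane pixel spacing, and a square neighbourhood. The function name is hypothetical, and the reviewed studies drew their inner contours manually rather than with code like this.

```python
import math


def erode_mask(mask, margin_mm, pixel_spacing_mm):
    """Shrink a 2D binary ROI so it lies about margin_mm inside the tumour edge.

    mask: list of lists of 0/1 (one axial slice); pixel_spacing_mm: in-plane
    spacing, assumed isotropic. Uses a square (Chebyshev) neighbourhood: a pixel
    survives only if every pixel within k pixels of it is inside the ROI.
    """
    k = math.ceil(margin_mm / pixel_spacing_mm)  # margin converted to whole pixels
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            keep = True
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    rr, cc = r + dr, c + dc
                    # Pixels outside the image count as background
                    if not (0 <= rr < rows and 0 <= cc < cols) or not mask[rr][cc]:
                        keep = False
                        break
                if not keep:
                    break
            out[r][c] = 1 if keep else 0
    return out
```

With 1 mm pixels and a 2 mm margin, a 7 × 7 pixel ROI shrinks to its central 3 × 3 pixels. The trade-off is visible here: the eroded ROI avoids edge voxels affected by partial volume averaging, but discards information at the tumour periphery.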

Table 2 Segmentation methods to grade and differentiate subtypes in renal tumours

Manual segmentation is sub-optimal because it is subjective and time-consuming. Accurate and reliable automated segmentation software should be developed and clinically validated, as it would make the radiomic pathway more efficient [31]. In response to the lack of automated segmentation, the Medical Image Computing and Computer Assisted Intervention (MICCAI) society hosted the KiTS19 (Kidney Tumour Segmentation) Grand Challenge, in which teams competed to develop algorithms that automate the segmentation of kidney tumours. The KiTS19 challenge is associated with a single-institution database that is freely available on GitHub (https://github.com/neheller/kits19). It contains arterial phase abdominal CT scans of patients with kidney tumours. The manual segmentations were performed by medical students under the guidance of a urological surgeon. A total of 210 lesions were used for the training set and 90 lesions for the testing set. Clinical information associated with the images is also available, including the type of surgery, risk factors for renal cancer, comprehensive clinical outcomes and histological characteristics [32]. Agreement between automated and manual segmentations in the challenge is measured with the Dice-Sørensen coefficient (DSC), which lies between 0 and 1: a value of 1 indicates perfect overlap and 0 indicates no overlap.
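The DSC compares two segmentations A and B as DSC = 2|A ∩ B| / (|A| + |B|), i.e. twice the number of voxels labelled by both, divided by the total number of voxels labelled by each. A minimal pure-Python sketch follows, assuming both masks are flattened into equal-length sequences of 0/1 values; the function name and the convention that two empty masks score 1 are assumptions, not part of the KiTS19 evaluation code.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice-Sørensen coefficient between two binary masks of equal size.

    DSC = 2|A ∩ B| / (|A| + |B|); 1 means perfect overlap, 0 means none.
    Masks are flat sequences of 0/1 values (e.g. a flattened CT volume).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Convention assumed here: two empty masks agree perfectly
    return 1.0 if total == 0 else 2.0 * intersection / total


manual = [0, 1, 1, 1, 0, 0]     # hypothetical manual tumour mask
automated = [0, 0, 1, 1, 1, 0]  # hypothetical automated tumour mask
print(dice_coefficient(manual, automated))  # 2 * 2 / (3 + 3) = 0.666...
```

Because the DSC is normalised by the size of both masks, a small tumour that is slightly mis-segmented loses more DSC than a large kidney with the same number of misplaced voxels, which is one reason tumour DSC values are typically lower than kidney DSC values.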

The first- and second-place teams in the MICCAI KiTS19 challenge both used the U-Net segmentation architecture [33]. U-Net has a U-shaped computational architecture that “contracts” the image data (downsampling) along its descending path and “expands” it (upsampling) along its ascending path, forming a U. The KiTS19 challenge assessed segmentations of the kidney, the tumour and the composite of both. The winning team’s U-Net method achieved DSC values of 0.85 for the tumour, 0.97 for the kidney and 0.90 for the kidney plus tumour composite [17]. Developing automated segmentation methods would help improve efficiency; however, all studies in this review incorporated manual steps in segmentation.

Radiomic features

Texture features were the most frequently used radiomic features in the studies in this systematic review [3, 16, 21,22,23,24,25,26,27,28,29,30]. Table 3 reports the texture features used to differentiate renal tumours in these studies, and we provide a definition of each to aid understanding. Other radiomic features used include shape, intensity and wavelet (higher order) features. The most common texture features were the grey-level co-occurrence matrix (GLCM) and grey-level run length matrix (GLRLM), followed by the grey-level difference matrix (GLDM), grey-level size zone matrix (GLSZM), autoregressive model, neighbouring grey-tone difference matrix (NGTDM) and gradient features. Feature extraction should also be performed on images with common image quality parameters so that models generalise across multi-institutional contexts.
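Because the GLCM is the most common texture feature here, the following pure-Python sketch shows how it is built and how two GLCM features are derived from it. It assumes a single 2D slice whose grey levels have already been quantised to the integers 0 to levels − 1, and a single pixel offset; real pipelines typically aggregate over several directions and may work in 3D. The function names are hypothetical, and "homogeneity" is computed here as the inverse difference 1/(1 + |i − j|), one of several definitions in use.

```python
def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalised grey-level co-occurrence matrix for one pixel offset (dx, dy).

    image: 2D list of integer grey levels already quantised to 0..levels-1.
    Entry [i][j] counts how often level i has level j as its neighbour at the
    offset; the matrix is then normalised so that its entries sum to 1.
    """
    m = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dy, c + dx
            if 0 <= rr < rows and 0 <= cc < cols:
                i, j = image[r][c], image[rr][cc]
                m[i][j] += 1
                if symmetric:  # count the pair in both directions
                    m[j][i] += 1
    total = sum(map(sum, m))
    return [[v / total for v in row] for row in m] if total else m


def glcm_contrast(p):
    """Contrast: sum of p(i, j) * (i - j)^2; high when neighbours differ strongly."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def glcm_homogeneity(p):
    """Inverse difference: sum of p(i, j) / (1 + |i - j|); high for uniform texture."""
    n = len(p)
    return sum(p[i][j] / (1 + abs(i - j)) for i in range(n) for j in range(n))
```

A uniform patch such as `[[1, 1], [1, 1]]` gives a contrast of 0 and a homogeneity of 1, whereas a checkerboard `[[0, 1], [1, 0]]` gives a contrast of 1 and a homogeneity of 0.5, matching the intuition that heterogeneous tumours produce higher GLCM contrast.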

Table 3 Radiomic texture features used in differentiating renal tumours

Application of machine learning

Support vector machine (SVM) models were the most common method of differentiating renal tumours using CT radiomics [2, 3, 16, 22,23,24, 29, 30]. An SVM separates data into two classes (e.g. voxels belonging to either healthy tissue or tumour) using a hyperplane constructed during training. The hyperplane is chosen to give the widest possible margin between the two classes. Figure 2 shows a linear SVM, in which the hyperplane is a linear function separating the blue points from the red points. The points that lie on the margins are the support vectors, from which the method takes its name. Other ML models used were logistic regression [21, 27], decision tree [25] and random forest [28].
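To show how a maximum-margin hyperplane can be found in practice, the following minimal pure-Python sketch trains a linear SVM with the Pegasos stochastic sub-gradient method on the regularised hinge loss. This is a swapped-in, simplified solver chosen for brevity; the reviewed papers do not specify it, and libraries such as LIBSVM solve the same optimisation problem with more exact methods. The function names and the `lam` and `epochs` values are assumptions, and the bias is folded into the weight vector (and is therefore lightly regularised), a common simplification.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Fit a linear SVM (w, b) by Pegasos-style stochastic sub-gradient descent.

    X: list of feature vectors; y: labels in {-1, +1}.
    lam: regularisation strength; smaller values approach a hard margin.
    """
    rng = random.Random(seed)
    # Append a constant 1 to every sample so the last weight acts as the bias b
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size, decreasing over time
            margin = y[i] * sum(wk * xk for wk, xk in zip(w, data[i]))
            # Regularisation shrinks w towards zero, favouring a wider margin
            w = [(1.0 - eta * lam) * wk for wk in w]
            if margin < 1.0:
                # Point is misclassified or inside the margin: step towards it
                w = [wk + eta * y[i] * xk for wk, xk in zip(w, data[i])]
    return w[:-1], w[-1]


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane w.x + b = 0 x lies."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1


# Hypothetical two-feature data: +1 = tumour, -1 = healthy tissue
X = [(2, 2), (3, 2), (2, 3), (3, 3), (-2, -2), (-3, -2), (-2, -3), (-3, -3)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

Only points that violate the margin (`margin < 1`) change the weights, so the final hyperplane is determined by the points nearest the boundary, the support vectors in Fig. 2. Non-linear SVMs replace the dot product with a kernel function, which this sketch does not implement.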

Fig. 2

Support Vector Machine: Schematic representation of a linear SVM. The red dots represent voxels belonging to tumour, and the blue dots are voxels from healthy tissue. These are separated by a hyperplane. Machine learning is used to construct the hyperplane, with the goal being maximal separation between tumour and healthy tissue voxels. Data points on the margins of separation are the support vectors

Grading and differentiating

Table 4 lists the articles from this systematic review that graded tumours as high-grade or low-grade using radiomic features. To be included in the table, a study had to directly compare the two grade categories (high vs low) and report an AUC above 0.8. Three grading systems were used: the Fuhrman system (I–IV), the International Society of Urological Pathology (ISUP) system (I–IV) and the Paner system. High-grade tumours were defined as Fuhrman III–IV, ISUP III–IV or Paner grade 2 or 3 (out of 3) [34]. Low-grade tumours were Fuhrman I–II, ISUP I–II or Paner grade 1 (out of 3) [34].
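The high/low dichotomy defined in this paragraph can be expressed as a small lookup. This is a sketch only: the function and constant names are hypothetical, and grades are encoded as integers (e.g. ISUP III as 3, Paner grade 2 of 3 as 2).

```python
# Lowest grade counted as "high" in each system, per the definitions above
HIGH_GRADE_THRESHOLD = {"Fuhrman": 3, "ISUP": 3, "Paner": 2}
# Highest valid grade in each system (Fuhrman and ISUP I-IV; Paner 1-3)
MAX_GRADE = {"Fuhrman": 4, "ISUP": 4, "Paner": 3}


def dichotomise_grade(system, grade):
    """Return 'high' or 'low' for a grade in the given grading system."""
    if not 1 <= grade <= MAX_GRADE[system]:
        raise ValueError(f"{system} grade must be between 1 and {MAX_GRADE[system]}")
    return "high" if grade >= HIGH_GRADE_THRESHOLD[system] else "low"


print(dichotomise_grade("ISUP", 2))   # low
print(dichotomise_grade("Paner", 2))  # high
```

Making the cut-off explicit shows why results from different grading systems are not directly interchangeable: the binary label a model learns depends on which system, and which threshold, produced the reference standard.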

Table 4 Grading renal tumours using CT radiomics

All grading studies with an AUC > 0.8 examined clear cell RCC, apart from one study of chromophobe RCC [21]. The most accurate pipeline used a multi-layer perceptron (MLP) ML model on texture features to discriminate between high-grade (III–IV) and low-grade (I–II) clear cell RCC under the ISUP system (AUC = 0.978) [2]. Using the Fuhrman grading system, the most accurate discrimination between high- and low-grade renal tumours was achieved by a decision tree ML model with texture features (AUC = 0.87) [25]. Other Fuhrman grading pipelines include an SVM with texture and wavelet features on PVP CT (AUC = 0.869) [22] and an SVM with shape, intensity and texture features (AUC = 0.822) [25]. Schieda et al. 2018 [21] used an alternative grading system proposed by Paner [34] and achieved an AUC of 0.84 for chromophobe RCC using logistic regression and texture features. CT contrast phase may influence radiomic differentiation of high-grade and low-grade clear cell RCC: Lin et al. 2019 [25] showed that ML with radiomics based on three-phase CT (precontrast, corticomedullary and nephrogenic) achieved an AUC of 0.87, outperforming models using UP, CMP or NP alone.

The ISUP grading system was used in the top radiomic pipeline [26]. It was proposed to address deficiencies in the Fuhrman grading system. The Fuhrman system is more complex, involving three parameters (nuclear size, nuclear shape and nucleolar prominence) without clear guidance on how to weigh conflicting information between them, which leads to interpretation errors and poor to moderate inter-observer reproducibility. The ISUP grading system is based on the premise that nucleolar grade alone is sufficient for grading clear cell and papillary RCC, which has been shown to give higher inter-observer consistency [35]. Given its ease of use, potentially lower inter-observer variability and higher classification performance in renal radiomics, it may be pertinent for more studies to use the ISUP grading system.

The main findings of studies that address differentiation of renal tumour types are summarised in Table 5. Only those studies that used radiomic features and reported an AUC > 0.8 were included in the table.

Table 5 Differentiating renal tumour subtypes using CT radiomics

Differentiating malignant from benign renal tumours is important for decisions regarding invasive procedures [4]. Benign (oncocytoma and fat-poor AML) and malignant (clear cell RCC, papillary RCC, chromophobe RCC) tumours can be differentiated with an AUC of 0.915 [28]. Differentiating fat-poor AML from RCC is difficult because, without macroscopic fat, the AML may be mistaken for an RCC. Four studies examined fat-poor AML differentiation [24, 27, 29, 30], and three studies examined lesions smaller than 4 cm [4, 29, 30]. The best-performing pipeline combined radiomics with human interpretation [29]. Another study found that papillary, chromophobe and clear cell RCC can each be differentiated from the other RCC subtypes with AUCs of 0.92, 0.81 and 0.91, respectively [16]. In addition, chromophobe RCC can be differentiated from renal oncocytoma with an AUC of 0.964 [27].

Recent research has compared qualitative interpretation with radiomic ML pipelines. Sun et al. [26] compared radiologists’ interpretation of tumour subtype, based on clinical experience, with radiomic analysis using an SVM ML model on textural features of CT scans. The study aimed to differentiate among clear cell RCC, papillary RCC, chromophobe RCC, renal oncocytoma and fat-poor AML. A promising finding was that combining radiologist and radiomic classification improved performance. For example, for differentiating fat-poor AML from clear cell RCC/renal oncocytoma, radiomics alone had an accuracy of 61.9%, whilst radiologist accuracy ranged from 73.5 to 95.1% [26]; combining the radiomic model with radiologist interpretation gave an accuracy of 85.8%. Cui et al. 2019 [29] found that fat-poor AML could be differentiated from RCC with an AUC of 0.96 (accuracy = 87%) using unenhanced, corticomedullary and nephrogenic phase images of lesions < 4 cm. With radiologist input, accuracy increased to 93% (AUC = 0.96). This radiologist input involved scoring the attenuation of the tumour relative to the cortex (hypo-, iso- or hyperattenuating), the amount of exophytic growth (< 50% or ≥ 50%) and the homogeneity of the tumour (markedly heterogeneous: ≥ 50% heterogeneous; mildly heterogeneous: < 50% heterogeneous; or completely homogeneous).

Quality assessment

Figure 3 presents the Radiomic Quality Score (RQS) [36] of the included studies graphically. The median RQS was 12/36 (range: 5/36 to 15/36), i.e. 33.3% (range: 13.9–41.7%) of the maximum score. The main deficiencies relevant to integration into clinical practice were that none of the studies included a cost-effectiveness analysis, applied its model in a clinical setting or was prospective, and that validation was mostly performed within single institutions, so generalisability to other settings and scanners may be limited. Further large-scale, multi-institutional studies are warranted to establish applicability in clinical settings.

Fig. 3

Radiomic Quality Score (RQS) of Included Studies

Future directions

Translating research into the clinical sphere remains challenging. Limitations of existing research include a lack of prospective data, single-centre studies, the need to adapt models to a clinical setting and a lack of cost analysis. Steps such as segmentation also need to be automated. As a stepping stone, it may be beneficial to compare or combine computerised techniques with qualitative interpretation. Only two studies in this review compared human observers with radiomics, and both found that combining the two improved classification performance [26, 37]. Other gaps in the renal tumour CT radiomic literature are that only one type of higher order feature (wavelet) was examined, and that grading studies were limited mainly to one tumour type (clear cell RCC). Another area of interest is the use of CT radiomics to predict clinical outcomes in patients with renal tumours. In lung tumours, for example, clinical outcomes such as disease survival [38] and overall survival [39] have been reported, indicating a role for radiomics in prognostication. Studies in non-small cell lung cancer have also predicted the absence of distant metastasis [40] and response to neoadjuvant chemotherapy [41].

Conclusion

CT radiomics shows promise for grading and differentiating renal tumours. Eight studies differentiated between clear cell RCC, fat-poor angiomyolipoma, papillary RCC, chromophobe RCC and renal oncocytoma (RO), with AUCs ranging from 0.82 to 0.96. The renal tumour grading studies focused on clear cell RCC (AUC = 0.82–0.978) and chromophobe RCC (AUC = 0.84); further work is needed on other renal tumour types. Challenges remain in translating radiomic pipelines for use at the radiologist’s workstation.