Introduction

Colorectal cancer (CRC) is one of the most commonly diagnosed tumours worldwide, and metastatic disease is the main cause of death for patients with CRC. The liver is the most common site of distant metastases. Proper identification and characterization of liver lesions allows better patient selection and avoids unnecessary treatment, so the radiologist plays a crucial role in the multidisciplinary team caring for colorectal patients with liver metastases [1, 2]. Although computed tomography (CT) is usually the diagnostic method used for staging and surveillance, magnetic resonance imaging (MRI) is the most valued diagnostic tool in liver assessment thanks to its ability to provide morphological and functional data that improve lesion characterization [1, 2]. Radiomics is a promising field that analyses medical images to extract quantitative data that could be used as biomarkers to evaluate pathological processes at the microscopic level, in order to increase diagnostic, prognostic and predictive accuracy in the oncological setting [1,2,3,4,5,6,7,8]. The primary endpoint of radiomics is to improve the detection rate of tumours, together with correct estimation of prognosis and identification of patients who respond to a specific treatment [9,10,11,12,13,14]. In this context, radiomics is conceived as a decision-support tool for precision medicine that uses standard-of-care images routinely acquired in clinical practice, without adding to the cost of a radiological examination for either patients or health facilities [15,16,17,18]. Moreover, by providing prognostic and/or predictive biomarkers, this tool offers a low-cost and repeatable instrument for longitudinal monitoring [19, 20].

Radiogenomics, that is, the correlation of radiomics with patient molecular data, improves treatment with a view to medicine tailored to the patient. Although several studies have evaluated radiogenomics in hepatocellular carcinoma, only a few have assessed radiomics in colorectal cancer liver metastases [1,2,3]. Imaging plays a crucial role in the management of patients with liver metastases, since it must guarantee not only an early diagnosis but also a correct post-therapy assessment, in order to avoid harmful treatments [21,22,23,24]. Although CT is the diagnostic tool most often used during staging and follow-up, MRI is the only technique that allows assessment of both morphological and functional data of lesions, providing quantitative data that improve characterization and post-treatment assessment [21,22,23,24].

In this setting, the opportunity to correlate radiomics data obtained by MRI with recurrence, mutational status, pathological characteristics (mucinous type and tumour budding) and surgical resection margin offers significant benefits over qualitative assessment, allowing useful treatment selection in the perspective of personalized medicine. In the present study, we assessed the efficacy of radiomics features obtained from conventional T2-weighted (T2-W) MRI sequences in predicting clinical outcomes following liver resection in patients with colorectal liver metastases.

Materials and methods

Dataset characteristics

The local Ethical Committee board approved this retrospective study and waived the requirement for signed patient consent because of the nature of the study.

Patients were selected from January 2018 to May 2021 according to the following inclusion criteria: (1) pathologically proven liver metastases; (2) a high-quality MRI study in the pre-surgical setting; and (3) a follow-up CT scan at least six months after surgery. The exclusion criteria were: (1) discordance between the imaging diagnosis and the pathological diagnosis; and (2) no MRI study. An external validation dataset was built using data from “Careggi Hospital”, Florence, Italy. The patient cohort therefore included a training set and an external validation set. The internal training set included 51 patients (18 women and 33 men) with a median age of 61 years (range 35–82 years) and 121 liver metastases. The external cohort consisted of 30 patients with a single lesion each (10 women and 20 men) with a median age of 60 years (range 40–78 years).

As prognostic features we considered data obtained from the pathological assessment of the lesions, namely (1) front of tumour growth: expansive versus infiltrative; (2) tumour budding: high grade versus low grade or absent; and (3) mucinous type, together with clinical data obtained at follow-up: (4) presence of recurrence.

The characteristics of the patients and their metastases are summarized in Table 1.

Table 1 Characteristics of the study population (81 patients)

MR imaging protocol and images post-processing

A Magnetom Symphony (Siemens, Erlangen, Germany) and a Magnetom Aera (Siemens) scanner, equipped with eight-element body and phased-array coils, were used to acquire the MRI study protocol, which included breath-hold fat-saturated and non-fat-saturated T2-weighted (T2-W) turbo spin-echo sequences, in- and opposed-phase T1-weighted (T1-W) gradient-echo sequences and fat-saturated (FS) T1-W gradient-echo sequences before and after contrast agent injection.

In this study, radiomic features were extracted from the fat-suppressed SPACE (sampling perfection with application-optimized contrasts using different flip angle evolution) sequences. Detailed MR imaging parameters are summarized in Table 2.

Table 2 MR Sequence parameters

Regions of interest (ROIs) were manually drawn slice by slice on SPACE images by two expert radiologists with 22 and 15 years of abdominal imaging experience, first separately and then together in consensus. Radiomics features were extracted as median values from the volumes of interest obtained by the consensus of the two radiologists. No registration techniques were applied to reduce motion artefacts; however, the use of the median value of each metric reduces the influence of artefacts.

ROIs were defined using the segmentation tool of 3DSlicer [https://www.slicer.org/].

Radiomic features were extracted using PyRadiomics [https://pyradiomics.readthedocs.io/en/latest/features.html], which includes First Order Statistics (19 features); Shape-based (3D) (16 features); Shape-based (2D) (10 features); Gray Level Co-occurrence Matrix (24 features); Gray Level Run Length Matrix (16 features); Gray Level Size Zone Matrix (16 features); Neighbouring Gray Tone Difference Matrix (5 features); Gray Level Dependence Matrix (14 features).

The features are calculated according to the definitions of the Imaging Biomarker Standardization Initiative (IBSI). Details about radiomics features are reported in [https://readthedocs.org/projects/pyradiomics/downloads/] [22, 25].
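As a rough illustration of the extraction step described above, the following Python sketch configures a PyRadiomics extractor for the listed feature classes on original and wavelet-filtered images. The file paths, the `bin_width` value and the helper names `classes_to_enable` and `extract_features` are assumptions made for illustration and are not the settings used in this study. The 2D shape class is left out here because PyRadiomics requires the `force2D` setting for it.

```python
# Feature classes named in the text, with the feature counts reported there.
FEATURE_CLASSES = {
    "firstorder": 19,
    "shape": 16,
    "shape2D": 10,
    "glcm": 24,
    "glrlm": 16,
    "glszm": 16,
    "ngtdm": 5,
    "gldm": 14,
}


def classes_to_enable(skip=("shape2D",)):
    """Return the PyRadiomics class names to enable, in a fixed order."""
    return [name for name in FEATURE_CLASSES if name not in skip]


def extract_features(image_path, mask_path, bin_width=25):
    """Extract features for one lesion; returns a dict of name -> value.

    bin_width is an assumed value, not a parameter reported in the study.
    """
    # Imported lazily so the helpers above work without PyRadiomics installed.
    from radiomics import featureextractor

    extractor = featureextractor.RadiomicsFeatureExtractor(binWidth=bin_width)
    extractor.disableAllFeatures()
    for name in classes_to_enable():
        extractor.enableFeatureClassByName(name)
    # Original image plus wavelet decompositions (LLH, LHL, HLL, LLL, ...).
    extractor.disableAllImageTypes()
    extractor.enableImageTypeByName("Original")
    extractor.enableImageTypeByName("Wavelet")
    result = extractor.execute(image_path, mask_path)
    # Drop the "diagnostics_*" entries and keep only the feature values.
    return {k: v for k, v in result.items() if not k.startswith("diagnostics_")}
```

A call such as `extract_features("t2_space.nrrd", "lesion_mask.nrrd")` (hypothetical file names) would return one value per enabled feature for a single segmented lesion.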

Statistical analysis

Univariate and multivariate analyses were performed using the Statistics and Machine Learning Toolbox of MATLAB R2021b (MathWorks, Natick, MA, USA).

Observer variability was assessed by calculating the intraclass correlation coefficient. The nonparametric Kruskal–Wallis test was performed to identify statistically significant differences in clinical parameters and radiomic metrics between the two groups of each outcome (front of tumour growth: expansive versus infiltrative; tumour budding: high grade versus low grade or absent; mucinous type; and presence of recurrence).
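To make the two-group Kruskal–Wallis comparison concrete, the sketch below computes the H statistic with tie correction and its p value in plain Python. It is an illustrative re-implementation under stated assumptions, not the MATLAB routine used in the study; the function names are hypothetical. With two groups, H has one degree of freedom, so the chi-square tail probability reduces to `erfc(sqrt(H / 2))`.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(group_a, group_b):
    """Return (H, p) for two groups, with the standard tie correction."""
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[: len(group_a)])
    rank_sum_b = sum(ranks[len(group_a):])
    h = 12.0 / (n * (n + 1)) * (
        rank_sum_a ** 2 / len(group_a) + rank_sum_b ** 2 / len(group_b)
    ) - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N).
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    h = max(h / correction, 0.0)
    # Chi-square survival function for one degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p
```

For example, `kruskal_wallis_two_groups(values_expansive, values_infiltrative)` would test one radiomic feature between the two growth-front groups, where both lists are hypothetical per-lesion feature values.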

Receiver operating characteristic (ROC) analysis was performed, and the Youden index was used to identify the optimal cut-off value for each feature and to obtain the area under the ROC curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy. The McNemar test was used to assess statistically significant differences in the performance results of the dichotomous tables. A p value < 0.05 was considered significant for each statistical test.
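The cut-off selection described above can be sketched as follows. The code scans every observed feature value as a candidate cut-off, keeps the one that maximises the Youden index J = sensitivity + specificity - 1, and reports the associated performance metrics together with the AUC. It assumes that higher feature values indicate the positive class and that labels are coded 0/1; the function name `roc_summary` is hypothetical.

```python
def roc_summary(values, labels):
    """Return the Youden-optimal cut-off and its performance metrics."""
    positives = [v for v, y in zip(values, labels) if y == 1]
    negatives = [v for v, y in zip(values, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed")
    # AUC as the probability that a positive case scores above a negative one
    # (ties count as one half).
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    auc = wins / (len(positives) * len(negatives))
    best = None
    for cutoff in sorted(set(values)):
        # A case is called positive when its value is >= cutoff.
        tp = sum(v >= cutoff for v in positives)
        fn = len(positives) - tp
        fp = sum(v >= cutoff for v in negatives)
        tn = len(negatives) - fp
        sens = tp / len(positives)
        spec = tn / len(negatives)
        j = sens + spec - 1
        if best is None or j > best["youden_j"]:
            best = {
                "cutoff": cutoff,
                "youden_j": j,
                "sensitivity": sens,
                "specificity": spec,
                "ppv": tp / (tp + fp) if tp + fp else float("nan"),
                "npv": tn / (tn + fn) if tn + fn else float("nan"),
                "accuracy": (tp + tn) / len(values),
            }
    best["auc"] = auc
    return best
```

When lower values indicate the positive class (for instance a negative cut-off on a correlation-type feature), the same function can be applied to the negated feature values.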

To identify the combinations of variables with the best results in the prediction of the clinical outcomes, a multivariate analysis was performed. The clinical outcomes considered were: (1) front of tumour growth: expansive versus infiltrative; (2) tumour budding: high grade versus low grade or absent; (3) mucinous type; and (4) presence of recurrence.

A first selection of variables was made on the basis of the univariate analysis, retaining only the features whose univariate accuracy exceeded the threshold reported in Table 3.

Table 3 (Sub)datasets, variables selection criteria and predictors combinations

Linear regression modelling was used to identify the best linear model of the textural features considered as predictors for each outcome. ROC analysis with the Youden index was used to identify the optimal cut-off value of the linear model and to obtain sensitivity, specificity, PPV and NPV.
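As a minimal sketch of the linear modelling step, the code below fits an ordinary least-squares model with an intercept by solving the normal equations, so that the resulting linear score can then be thresholded with a ROC-derived cut-off. It assumes a binary outcome coded 0/1 and a small number of non-collinear predictors; the function names are hypothetical and this is not the MATLAB implementation used in the study.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear_model(features, outcome):
    """Least-squares fit of outcome ~ intercept + features."""
    design = [[1.0] + list(row) for row in features]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, outcome)) for i in range(p)]
    return solve_linear_system(xtx, xty)  # [intercept, coef_1, ..., coef_p]


def predict(coefficients, row):
    """Linear score for one lesion; compare it with a cut-off to classify."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))
```

The scores returned by `predict` for all lesions can be passed to a ROC analysis to choose the cut-off, in the same way as for a single feature.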

Moreover, pattern recognition methods were tested, including support vector machine (SVM), k-nearest neighbours (KNN), artificial neural network (NNET) and decision tree (DT). The best model was chosen as the one with the highest area under the ROC curve and the highest accuracy. A 10-fold cross-validation approach was used to identify the best classifier on the training set, while the external validation cohort was used to validate the findings of the best classifier.
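The cross-validation scheme can be illustrated with a small, dependency-free KNN classifier. The sketch below shuffles the samples with a fixed seed, splits them into 10 folds, holds each fold out once and pools the held-out predictions into a single accuracy. The number of neighbours, the seed and the function names are assumptions for illustration; in practice the features would usually be standardized first, and the study itself used MATLAB.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


def k_fold_accuracy(xs, ys, n_folds=10, k=3, seed=0):
    """Pooled accuracy over all held-out samples of an n_folds split."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_x = [xs[i] for i in indices if i not in held_out]
        train_y = [ys[i] for i in indices if i not in held_out]
        correct += sum(
            knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in fold
        )
    return correct / len(xs)
```

Running the same loop for each candidate classifier and keeping the one with the best cross-validated performance mirrors the model selection described above; the external cohort is then used only once, on the selected model.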

Results

The median value of intraclass correlation coefficients for features was 0.91 (range 0.86–0.95).

Among the significant features for differentiating the front of tumour growth, 15 textural parameters obtained an accuracy ≥ 70% (Table 4). The best performance in discriminating expansive versus infiltrative front of tumour growth was obtained by wavelet_LHL_gldm_DependenceNonUniformityNormalized, with an accuracy of 82%, a sensitivity of 99%, a specificity of 53%, and a PPV and an NPV of 78% and 96%, respectively, at a cut-off value of 0.06.

Table 4 Findings by univariate analysis with ROC performance results

Among the significant features for differentiating tumour budding, four textural parameters obtained an accuracy ≥ 85% (Table 4). The best performance in discriminating high grade versus low grade or absent was obtained by wavelet_LLH_glcm_Imc1, with an accuracy of 88%, a sensitivity of 93%, a specificity of 71%, and a PPV and an NPV of 90% and 79%, respectively, at a cut-off value of −0.14.

Among the significant features for differentiating the mucinous type of tumour, 15 textural parameters obtained an accuracy ≥ 87% (Table 4). The best performance was obtained by wavelet_LLH_glcm_JointEntropy, with an accuracy of 92%, a sensitivity of 83%, a specificity of 94%, and a PPV and an NPV of 78% and 95%, respectively, at a cut-off value of 4.61.

Among the significant features for identifying tumour recurrence, six textural parameters obtained an accuracy ≥ 80% (Table 4). The best performance was obtained by wavelet_LLL_glcm_Correlation, with an accuracy of 85%, a sensitivity of 52%, a specificity of 97%, and a PPV and an NPV of 84% and 85%, respectively, at a cut-off value of 0.88.

The linear regression model improved on the univariate analysis only in the discrimination of expansive versus infiltrative front of tumour growth, while for the other predictions the univariate analysis obtained the highest accuracy (see Tables 5 and 6, Fig. 1). In the discrimination of expansive versus infiltrative front of tumour growth, the linear model of the 15 predictors reached an accuracy of 90%, a sensitivity of 95%, a specificity of 80%, a PPV of 89% and an NPV of 90%.

Table 5 Linear regression and pattern recognition analysis with significant features

Fig. 1 ROC curves of the linear regression analysis with respect to the tumour growth front (A), tumour budding (B), tumour mucinous type (C) and presence of recurrence (D)

Considering the significant texture metrics tested with pattern recognition approaches, the best performance for the identification of the front of tumour growth was reached by a decision tree, while for the discrimination of tumour budding, mucinous type and presence of recurrence it was reached by a KNN (Table 5, Fig. 2). The best accuracy was reached by the KNN in the discrimination of tumour budding using the four textural predictors (original_glcm_Idn; wavelet_HLL_glcm_InverseVariance; wavelet_LHL_gldm_DependenceNonUniformityNormalized; wavelet_LLH_glcm_Imc1), with an AUC of 0.93, an accuracy of 93%, a sensitivity of 81% and a specificity of 97%.

Fig. 2 ROC curves of the best classifier with respect to the tumour growth front (A), tumour budding (B), tumour mucinous type (C) and presence of recurrence (D)

A statistically significant difference in accuracy between univariate and multivariate analysis was obtained only in the prediction of the tumour growth front, for both the linear model and the decision tree, and in the prediction of tumour budding for the KNN, compared with the accuracy of the single best predictor at univariate analysis (p value < 0.05 at McNemar test).
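A comparison of two classifiers on the same lesions with the McNemar test depends only on the discordant cases. The sketch below computes the two-sided exact (binomial) version of the test; whether the study used the exact or the chi-square form is not stated, so this choice and the function name `mcnemar_exact` are assumptions.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from the discordant counts.

    b: lesions classified correctly by model A only
    c: lesions classified correctly by model B only
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant cases, no evidence of a difference
    # Under the null hypothesis, discordant cases split 50/50 between b and c.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For instance, with hypothetical counts of 1 lesion correct only under the single best predictor and 9 lesions correct only under the multivariate model, `mcnemar_exact(1, 9)` gives a p value of about 0.021.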

Discussion and Conclusions

The present study demonstrated that radiomics analysis can identify, as biomarkers, several features that could affect the therapeutic choice in patients with colorectal liver metastases. Our data were verified on an external validation dataset.

We obtained good performance with single significant textural metrics in the identification of the front of tumour growth (expansive versus infiltrative) and of tumour budding (high grade versus low grade or absent), in the recognition of mucinous type and in the detection of recurrences.

With regard to the front of tumour growth, 15 textural parameters obtained an accuracy ≥ 70%, and the best performance was obtained by wavelet_LHL_gldm_DependenceNonUniformityNormalized, with an accuracy of 82%, a sensitivity of 99%, a specificity of 53%, and a PPV and an NPV of 78% and 96%, respectively, at a cut-off value of 0.06.

Regarding tumour budding, four textural parameters obtained an accuracy ≥ 85%, and the best performance in discriminating high grade versus low grade or absent was obtained by wavelet_LLH_glcm_Imc1, with an accuracy of 88%, a sensitivity of 93%, a specificity of 71%, and a PPV and an NPV of 90% and 79%, respectively, at a cut-off value of −0.14.

Among the significant features for differentiating the mucinous type of tumour, 15 textural parameters obtained an accuracy ≥ 87%. The best performance was obtained by wavelet_LLH_glcm_JointEntropy, with an accuracy of 92%, a sensitivity of 83%, a specificity of 94%, and a PPV and an NPV of 78% and 95%, respectively, at a cut-off value of 4.61.

With regard to tumour recurrence, six textural parameters obtained an accuracy ≥ 80%. The best performance was obtained by wavelet_LLL_glcm_Correlation, with an accuracy of 85%, a sensitivity of 52%, a specificity of 97%, and a PPV and an NPV of 84% and 85%, respectively, at a cut-off value of 0.88.

The linear regression model improved on the univariate analysis only in the discrimination of expansive versus infiltrative front of tumour growth, while for the other predictions the univariate analysis obtained the highest accuracy.

Several studies have demonstrated a correlation between radiomics parameters and prognosis [26,27,28,29,30,31,32,33,34,35,36,37,38,39]. An association between homogeneity and worse overall survival (OS) was demonstrated by Andersen et al. [31]. According to Rahmim et al., radiomic parameters of heterogeneity obtained by FDG PET were predictors of lower OS [36]. Lubner et al. demonstrated that the degree of skewness was inversely correlated with KRAS, while entropy was related to OS [33]. In addition to survival prediction, the possibility of predicting recurrence in the liver has been demonstrated [36,37,38,39]. In agreement with our results, Ravanelli et al. related high CT uniformity to low OS and PFS in patients with CRC and liver metastasis [38].

Radiomics and radiogenomics are emerging tools with significant limitations. The major limitation is the heterogeneity of the software employed in different studies, as well as the variety of imaging devices in different clinics. This clearly hampers the comparison of results in multicentre studies. In addition, the segmentation could affect the results [39,40,41,42,43,44,45,46,47].

The present study has several limitations: (1) the small population analysed, although the investigation was done on a homogeneous sample and on individual lesions; (2) the retrospective nature of the study; (3) the manual segmentation: although several studies support automatic segmentation to avoid inter-observer variability, in our opinion the manual approach is more realistic. Moreover, we did not assess the impact of other sequences, such as T1-W or diffusion-weighted imaging, or of the different phases of the contrast study. Data that we plan to evaluate in a future study are shown in Table 6.

Table 6 Linear regression model parameters with respect to the tumour growth front

Our results confirmed the capacity of radiomics to identify, as biomarkers, several prognostic features that could affect the treatment choice in patients with liver metastases, in order to obtain a more personalized approach. These results were confirmed on an external validation dataset. We obtained good performance with single significant textural metrics in the identification of the front of tumour growth (expansive versus infiltrative) and of tumour budding (high grade versus low grade or absent), in the identification of mucinous type and in the detection of recurrences.