Introduction

In recent years, predictive, preventive, and personalized medicine (PPPM) has been advocated in the field of tumor research and has become the direction of future research [1]. It is evident that a paradigm shift in medicine is needed to move from traditional reactive medical practice to PPPM to devise treatment strategies [1,2,3]. When PPPM is applied to the clinical management of adrenal incidentalomas (AIs), it is important to predict the pathology of the adrenal tumor before surgery to prevent perioperative morbidity and mortality. A subclinical pheochromocytoma (sPHEO) may predispose the patient to life-threatening hypertensive crises during surgery, which may lead to serious cardiovascular events. However, due to the overlap in imaging features and non-specific clinical findings, sPHEO is frequently misdiagnosed as other AIs, such as lipid-poor adrenal adenoma (LPA) [4,5,6,7,8,9]. There has been no clear way to distinguish sPHEO from LPA based on traditional radiological analysis. New imaging-related methodologies with a PPPM approach might be useful to improve predictive diagnosis for AI.

Radiomics is a PPPM-based approach that uses computerized quantitative imaging analysis for extracting large numbers of image-related features, such as intensity, geometry, and texture, from medical images [10]. In short, radiomics converts medical images into numerical values that could not be obtained by observation of the image. This process uses high-throughput data-extraction algorithms, which enable subsequent data analysis and model building [11,12,13]. Recent radiomic studies have shown promising results for tumor differentiation and subtype classification [14, 15]. This method provides an alternative to traditional imaging analysis, and it may assist in the development of predictive diagnostics for personalized medicine [16,17,18]. However, the effectiveness of a radiomic approach in differentiating sPHEO from LPA is not known.

In the present study, we retrospectively evaluated 265 cases that had been surgically resected with pathological confirmation of sPHEO and LPA. We developed radiomic nomograms and validated their efficiency for differentiating between sPHEO and LPA.

Materials and methods

Patients

This retrospective study was approved by the Ethics Committee and Institutional Review Board in Xiangya Hospital of Central South University, P. R. China (IRB No. 201612638). Informed consent was waived for all patients.

Data for surgically and pathologically confirmed adrenal adenoma (AA) and pheochromocytoma cases in our hospital were collected from 1 June 2006 to 31 December 2017 using a search of our institutional database and medical record system. None of the patients included in this study received tumor-related therapy before the computed tomography (CT) scans. Patients without detailed medical records, pathological results, or quality pre-operative CT images were excluded. The patients included in this study were divided into two cohorts (training and validation) at a ratio of 4:1 using computer-generated random numbers. This process is shown in Fig. 1. Enhanced CT (CTpost) features and pre-enhanced (CTpre) features were used to generate the radiomic nomograms M1 and M2, respectively.
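The 4:1 random allocation described above can be illustrated with a minimal sketch. The function name, the seed, and the rounding rule are assumptions made for illustration; the study does not report how its random numbers were generated.

```python
import random

def split_cohorts(patient_ids, train_parts=4, validation_parts=1, seed=2017):
    """Randomly split patient IDs into training and validation cohorts."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed (assumed) so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_parts / (train_parts + validation_parts))
    return ids[:n_train], ids[n_train:]

# Placeholder IDs 0..264 stand in for the 265 evaluated cases.
training, validation = split_cohorts(range(265))
print(len(training), len(validation))
```

Fixing the seed keeps the allocation reproducible, which matters when the same cohorts are reused for several models.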

Fig. 1

The patient recruitment pathway, along with the inclusion and exclusion criteria. Note: PHEO, pheochromocytoma; AA, adrenal adenoma; sPHEO, subclinical pheochromocytoma; LPA, lipid-poor adenoma; CTpre, pre-enhanced CT value (Hu); CTpost, enhanced CT value (Hu)

CT image acquisition, retrieval, analysis, and feature extraction

Details about CT image acquisition, retrieval, and analysis, as well as the algorithms for radiomic feature extraction [19] are in the Electronic supplementary material, along with intraobserver (two observations by reader 1) and interobserver (reader 1 vs. reader 2) reproducibility evaluation methods.

Radiomic feature selection and signature building

The least absolute shrinkage and selection operator (LASSO) method, which is suitable for analyzing large sets of radiomic features and regression of high-dimensional data, was applied to select the most significant predictive features from the training cohort. LASSO reduced most of the covariate coefficients to zero, and the features with the remaining non-zero coefficients were selected. These non-zero coefficients were used to construct the radiomic signature (Rad-score). The Rad-score was calculated for each patient as a linear combination of the selected features weighted by their respective coefficients.
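As a rough illustration of the two ideas in this paragraph, the sketch below shows the soft-thresholding operator (the closed-form LASSO solution in the simplified case of an orthonormal design, and the reason small coefficients become exactly zero) and a Rad-score computed as a weighted sum. The feature names, coefficient values, penalty, and intercept are all invented; the actual 16 features and their weights are given only in the Electronic supplementary material.

```python
def soft_threshold(value, penalty):
    """LASSO shrinkage: move a coefficient toward zero by `penalty`."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0  # small coefficients are set exactly to zero

def rad_score(features, coefficients, intercept=0.0):
    """Linear combination of the features that kept a non-zero coefficient."""
    return intercept + sum(
        weight * features[name]
        for name, weight in coefficients.items()
        if weight != 0.0
    )

# Hypothetical unpenalized estimates for three made-up feature names.
raw = {"glcm_entropy": 0.90, "histogram_skewness": -0.05, "glrlm_run_length": 0.30}
coefficients = {name: soft_threshold(w, 0.10) for name, w in raw.items()}

# Hypothetical standardized feature values for one patient.
patient = {"glcm_entropy": 1.2, "histogram_skewness": 0.4, "glrlm_run_length": -0.5}
score = rad_score(patient, coefficients, intercept=-0.2)
```

In this toy case the skewness coefficient is shrunk to zero, so that feature drops out of the score.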

Diagnostic validation of the radiomic signature

The potential use of the Rad-scores to differentiate the two lesions was first evaluated in the training cohort and then in the validation cohort using the area under the receiver operating characteristic (ROC) curve (AUC).
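For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen sPHEO case receives a higher Rad-score than a randomly chosen LPA case. A minimal sketch of that pairwise definition follows; the scores are invented and are not study data.

```python
def auc(positive_scores, negative_scores):
    """AUC as the probability that a positive case outranks a negative one."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positive_scores) * len(negative_scores))

# Invented Rad-scores for four sPHEO and four LPA cases.
spheo_scores = [0.91, 0.75, 0.66, 0.40]
lpa_scores = [0.55, 0.30, 0.22, 0.10]
example_auc = auc(spheo_scores, lpa_scores)  # 15 of 16 pairs are ordered correctly
```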

Development and performance of a radiomic nomogram based on enhanced CT images

Multivariable logistic regression analysis was performed with both the Rad-scores and the CT imaging features showing significant differences between sPHEO and LPA patients. A radiomic signature was applied to build a predictive model for differentiating sPHEO from LPA using the training cohort.

To provide the clinicians with a quantitative tool for accurate diagnosis, we built a nomogram (M1) using multivariable logistic regression analysis of CTpost images in the training cohort. AUC analysis was used to quantify the predictive performance of the radiomics nomogram in the training cohort. The AUC values were then used to test the predictive performance of the radiomics nomogram in the validation cohort. A calibration curve was obtained from multivariable Cox proportional hazard regression analysis.
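A nomogram is a graphical rendering of a logistic regression model: each predictor's contribution is rescaled to points, and the total points map back to a predicted probability. The sketch below uses one common scaling convention, in which the predictor with the widest effect range spans 0 to 100 points. The coefficients, predictor ranges, and intercept are hypothetical and are not the fitted M1 values.

```python
import math

def nomogram_scale(coefficients, ranges):
    """Return each predictor's minimum contribution and the log-odds per point."""
    lows, spans = {}, {}
    for name, beta in coefficients.items():
        ends = [beta * ranges[name][0], beta * ranges[name][1]]
        lows[name] = min(ends)
        spans[name] = max(ends) - min(ends)
    unit = max(spans.values()) / 100.0  # widest predictor spans 0-100 points
    return lows, unit

def total_points(values, coefficients, lows, unit):
    """Sum of per-predictor points for one patient."""
    return sum((coefficients[k] * values[k] - lows[k]) / unit for k in coefficients)

def probability_from_points(points, intercept, lows, unit):
    """Map total points back to the logistic-model probability."""
    linear_predictor = intercept + sum(lows.values()) + points * unit
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical model: invented coefficients and predictor ranges.
coefficients = {"rad_score": 2.0, "LD": 0.5, "CTpre": -0.04}
ranges = {"rad_score": (-2.0, 2.0), "LD": (1.0, 9.0), "CTpre": (10.0, 60.0)}
intercept = -1.0
patient = {"rad_score": 0.8, "LD": 4.0, "CTpre": 35.0}

lows, unit = nomogram_scale(coefficients, ranges)
points = total_points(patient, coefficients, lows, unit)
probability = probability_from_points(points, intercept, lows, unit)
```

Because the rescaling is linear, the probability read from the total points matches the one computed directly from the regression equation.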

Development and performance of a radiomics nomogram without enhanced CT features

To evaluate the feasibility and efficiency of the radiomic nomogram using non-enhanced CT images in the classification task, another radiomic nomogram (M2) was developed for the CTpre imaging features using the training cohort. The predictive performance of the M2 model was validated in the validation cohort using AUC values. A calibration curve was obtained from multivariable Cox proportional hazard regression analyses.

Statistical analysis

Quantitative data with a normal distribution are presented as the mean ± SD, and the groups were compared using an independent Student’s t test. Quantitative data with a non-normal distribution are presented as medians (interquartile ranges), and the groups were compared using a Wilcoxon rank sum test. Patient sex ratios and lesion location ratios between the two groups were compared using a chi-square test. The LASSO regression was performed using MATLAB. Other statistical analyses were performed using R software. A two-sided P value was computed, and statistical significance was set at 0.05.
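As an illustration of the chi-square comparison of proportions, the following sketch computes the Pearson statistic and its P value for a 2 x 2 table. The counts are invented. Note that R's `chisq.test` applies a continuity correction to 2 x 2 tables by default, which this sketch omits, so its output would differ slightly.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Invented counts: rows are two diagnostic groups, columns are male / female.
statistic, p_value = chi_square_2x2([[30, 20], [45, 55]])
significant = p_value < 0.05  # two-sided threshold stated in the text
```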

Results

Clinical characteristics of the patients

The patient characteristics for the training and validation cohorts are summarized in Table 1. There were no significant differences in the clinical characteristics between the training cohort and validation cohort, for either the sPHEO or LPA group, which justified their use as training and validation cohorts.

Table 1 Characteristics of patients in the training and validation cohorts

Inter- and intraobserver reproducibility of radiomic feature extraction

Satisfactory inter- and intraobserver reproducibility of the texture feature extraction was achieved. The interobserver intraclass correlation coefficients (ICCs) calculated based on features extracted by reader 1 (first extraction) and reader 2 ranged from 0.775 to 0.907. The intraobserver ICCs, calculated based on reader 1’s feature extraction results from two attempts, ranged from 0.794 to 0.933. Therefore, all outcomes were based on the features extracted by reader 1.

Radiomic feature extraction and radiomic signature construction

A total of 340 features were extracted from the unenhanced CT images. Figure S1 contains a visual representation of the extracted radiomic features of all the patients. From this representation, an estimated classification was visible based on the differences in texture associated with the two diagnoses. Of these features, 16 were selected by LASSO for inclusion (Fig. S1) in the Rad-score calculation formula, as shown in the Electronic supplementary material.

Figure S2 contains a graphical presentation of the Rad-scores for the patients in the training and validation cohorts. With the patients' diagnoses color-coded and the Rad-score values sorted in ascending order, a clear separation of LPA and sPHEO images could be observed.

Diagnostic validation of the radiomic signature

A significant difference was found in the Rad-scores between sPHEO and LPA patients in the training cohort (P < 0.001), which was further confirmed in the validation cohort. Patients with sPHEO generally had higher Rad-scores in the training cohort. The radiomic signature yielded an AUC of 0.903 in the training cohort and 0.904 in the validation cohort. A Rad-score value of more than 0.661 was considered sPHEO, with a sensitivity of 92.5% and specificity of 83.5%. ROC curves are presented in Fig. 2.
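The way a cut-off translates into sensitivity and specificity can be sketched as follows. The scores and labels are invented, and selecting the cut-off by the Youden index is an assumption made for illustration; the text does not state how the 0.661 threshold was chosen.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """labels: 1 = sPHEO, 0 = LPA; a score above `cutoff` is called sPHEO."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s > cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s <= cutoff)
    tn = sum(1 for s, y in pairs if y == 0 and s <= cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def youden_cutoff(scores, labels):
    """Observed score that maximizes sensitivity + specificity - 1."""
    def youden_index(cutoff):
        sens, spec = sensitivity_specificity(scores, labels, cutoff)
        return sens + spec - 1.0
    return max(sorted(set(scores)), key=youden_index)

# Invented Rad-scores: four sPHEO cases followed by five LPA cases.
scores = [0.95, 0.80, 0.70, 0.50, 0.68, 0.45, 0.40, 0.30, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, 0.661)
best = youden_cutoff(scores, labels)
```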

Fig. 2

Receiver operating characteristic (ROC) curve analysis based on the radiomic signature (Rad-score). Cut-off values are indicated along the curves. a ROC curve of the training cohort. b ROC curve of the validation cohort. Fn, false-negative rate

Development of a radiomic nomogram

Multivariable logistic regression analysis was performed with Rad-score, LD, SD, CTpre, shape, homogeneity, and N/C. A prediction model incorporating the above predictors was developed and is presented as a nomogram in Fig. 3. This model (M1) provided an AUC of 0.957 in the training cohort and 0.967 in the validation cohort (Fig. 3). An M1 value of more than 75.97 could be diagnosed as sPHEO, with a sensitivity of 89.7% and specificity of 95.5%.

Fig. 3

The receiver operating characteristic (ROC) curves and the corresponding nomogram with multiple predictors. Cut-off values are indicated along the curves. a ROC curve for the training cohort. b ROC curve for the validation cohort. c Nomogram

Development of a radiomic nomogram without enhanced CT features

To determine whether pre-enhanced CT images were sufficient to distinguish sPHEO from LPA, the M2 nomogram was constructed without inclusion of the imaging features from the enhanced CT. M2, which was developed without N/C, also showed a promising AUC of 0.955 in the training cohort and 0.958 in the validation cohort. These were very close to the values obtained when N/C was included. The corresponding ROC curves are shown in Fig. 4. An M2 value of more than 81.50 could be diagnosed as sPHEO, with a sensitivity of 97.0% and specificity of 84.8%.

Fig. 4

Receiver operating characteristic (ROC) curves based on the nomogram without enhanced CT features (necrosis/cystic degeneration, N/C). Cut-off values are indicated along the curves. a ROC curve for the training cohort. b ROC curve for the validation cohort. c Nomogram without N/C

Validation of the radiomic nomogram

The calibration curves for the two radiomic nomograms to be used for differentiation between sPHEO and LPA, based on either enhanced or unenhanced images, are shown in Fig. S3. They both showed good agreement between evaluation by the radiomic nomogram and actual observation.
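Agreement between predicted and observed outcomes is often summarized by binning the predicted probabilities and comparing each bin's mean prediction with the observed event fraction. The sketch below shows this generic binned approach with invented values; it is not necessarily the procedure used to produce Fig. S3.

```python
def calibration_table(probabilities, outcomes, n_bins=5):
    """Return (mean predicted, observed fraction, count) for each occupied bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_predicted = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            rows.append((mean_predicted, observed, len(members)))
    return rows

# Invented predicted probabilities of sPHEO and confirmed diagnoses (1 = sPHEO).
predicted = [0.05, 0.10, 0.30, 0.35, 0.60, 0.65, 0.90, 0.95]
confirmed = [0, 0, 0, 1, 1, 0, 1, 1]
rows = calibration_table(predicted, confirmed, n_bins=4)
```

A well-calibrated model yields rows in which the first two entries are close to each other, which corresponds to points lying near the diagonal of a calibration plot.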

Discussion

Radiomics for medical imaging analysis facilitates identification of predictive diagnostic imaging biomarkers, which allows for a personalized approach to therapy [10]. Such information may help clinicians use a PPPM approach [1,2,3] to optimize presurgical diagnosis of subclinical pheochromocytoma. In the present study, we found that the Rad-score alone could differentiate sPHEO from LPA. Our predictive models combining the Rad-score with traditional imaging features improved performance in differentiating sPHEO from LPA. Using AI as an example, our study showed the relevance of a PPPM approach for correctly predicting the tumor pathology before surgery, which may help prevent adverse outcomes during surgery and enable a personalized therapeutic strategy for each patient.

Our study was designed to address a real dilemma that is encountered by clinicians, especially radiologists and surgeons [7, 20]. In clinical practice, AIs are often identified when patients undergo a routine abdominal or chest CT scan, even when it is only a non-contrast CT scan. A dedicated multi-phase contrast-enhanced CT adrenal scan may help the radiologists to make a definitive diagnosis regarding the pathology of the tumor, but the potential risk of an additional CT scan should be considered. The disadvantages of doing a second scan include additional cost, radiation hazards and the potential risks associated with contrast media, which include allergy and potential renal damage. Unfortunately, the majority of patients currently undergo an additional CT scan due to the difficulty in distinguishing the various adrenal tumors with traditional radiological analysis. Furthermore, the risks associated with contrast media are further exacerbated in vulnerable populations such as the elderly and pediatric populations.

Our approach has some advantages over the existing methods in the field of AI research. It belongs to a growing number of studies that use a new kind of predictive model for pre-operative diagnosis of cancer. For example, one study combined a large panel of predictors, including radiological and clinicopathological features, to improve the prediction of lymph node metastasis in colorectal cancer [21]. Notably, this method relied heavily on the key factors chosen to construct the prediction model. In addition, traditional predictive modeling required many cases to achieve high efficiency (sensitivity and specificity), which was not practical. The number of patients enrolled in clinical studies is usually modest, so only a few features can be included, which may decrease the accuracy of predictive models. Our study addressed this issue by using a data reduction method to condense the radiomic features into the Rad-score, which served as a single key factor for modeling throughout the study. Therefore, the loss of accuracy that a modest number of patients, and the resulting limit on the number of features that can be included in model building, would have caused had the previously published method been used [21] appears to have been mitigated in our study. In addition, our study introduces the Rad-score, which was calculated using 16 radiomic features selected by LASSO regression and showed good classification ability in the ROC test. Our study results indicated that the Rad-score might be a useful stand-alone diagnostic indicator.

Application of radiomic analysis has been increasing in medical imaging research [10, 22, 23]. Successful examples have been reported for cancers throughout the body [10, 22,23,24,25]. Radiomic models have been reported as potential biomarkers for distinguishing different types of tumors or different subtypes of a common tumor. Furthermore, radiomic signatures could be important predictors for differential diagnosis, therapeutic response, and survival in cancers [26, 27]. Most recently, radiomics has been used to differentiate small angiomyolipoma without visible fat from renal cell carcinoma with high accuracy, sensitivity, and specificity (93.9, 87.8, and 100%, respectively) [14]. Therefore, radiomic signatures can be useful in clinical practice.

Our study has filled a gap in the literature on sPHEO in the setting of AI. Previous studies on AIs have mostly focused on the differentiation of benign and malignant adrenal tumors. Prior studies have reported that some imaging characteristics on CT scans, as well as a dedicated adrenal CT scan with washout and 15-min delayed imaging, could be helpful in differentiating adenoma from other adrenal tumors [28,29,30]. However, specific data on LPA and sPHEO has been lacking. Gufler et al. reported an attempt to distinguish metastatic adrenal tumors from benign lesions, using 56 adrenal tumor cases with a history of malignancy [31]. The researchers developed a scoring system based on traditional radiological features in pre-enhanced CT images, which demonstrated a high efficiency for identifying metastatic lesions. Our study is similar to theirs in that both studies developed effective scoring systems for identifying adrenal tumors. However, our study focus was different; we studied benign adrenal tumors, and they studied both benign and malignant adrenal tumors. In addition, our study had the advantage of a large study cohort with all cases pathologically confirmed while their study had a much smaller cohort with only a part of the cohort pathologically confirmed. To the best of our knowledge, our study presents the largest cohort to date for sPHEO and LPA in the setting of AI.

Our findings may have value for clinical practice. The goal of imaging for an adrenal tumor is to differentiate a lesion that may need treatment, such as pheochromocytoma, from a benign “leave-alone” tumor, such as an adenoma. From a radiological perspective, an optimal algorithm should be used to differentiate the leave-alone lesions from the lesions that need treatment using as few features as possible [28]. Our study has yielded very encouraging results. It is exciting that we may not need the additional contrast-enhanced adrenal CT scan. A non-contrast-enhanced CT scan is easier to obtain with reliable image quality, cheaper, and less time-consuming than enhanced CT or other imaging modalities such as magnetic resonance imaging (MRI) and positron emission tomography (PET)/CT. More importantly, the CT images already obtained for clinical purposes may be sufficient for clinicians to perform the calculations introduced in the present study. Therefore, our results may be rapidly incorporated into clinical practice.

There are several limitations to our study. First, due to the retrospective nature of the study, there might be selection bias. Second, the clinical outcome and prognosis data for the cardiology and oncology domains were not available due to lack of relevant documentation in the medical records or loss to follow-up. Because of the benign nature of the adrenal tumors in this study, patients and their doctors do not have a strong incentive to follow up after initial treatment. Third, we used a validation cohort that was from the same institution as the training cohort, which prevented us from generalizing the results to other institutions and other clinical settings. As a single-center study, the patient population was relatively small and homogeneous. A large-scale independent prospective multicenter study is needed to assess the generalizability of the results. Fourth, it is possible that adding features from other imaging modalities, such as MRI or PET/CT, might improve the accuracy of the model. However, this was beyond the scope of this study and will require further investigation. Fifth, because we only compared sPHEO and LPA in the present study, this method should be validated in future studies that include other adrenal tumors with overlapping CT findings. Last, the radiomic features in our study were extracted from the largest cross-sectional area of the tumor. We recognize that different approaches, such as assessing the entire tumor in a three-dimensional image, may provide additional information about tumor heterogeneity and may affect the study results. Nevertheless, our method of selecting the largest cross-sectional area of the tumor has been validated in previously published studies [21, 32].

Conclusions and expert recommendations

In conclusion, our PPPM-based radiomic models were effective in differentiating sPHEO from LPA in the setting of AI. In addition, our pre-enhanced CT-based nomogram has the advantage of avoiding the additional radiation exposure, risk, and cost of enhanced CT. The two models may allow clinicians to use existing pre-enhanced or enhanced CT images to correctly predict the pathology of adrenal tumors before surgery. Future prospective multicenter studies with larger samples may be needed to validate our models for clinical practice.

Our promising study results may facilitate PPPM-based efforts in both research and clinical practice. By helping clinicians make an accurate pre-operative diagnosis without additional imaging examination, the implementation of our diagnostic algorithm may improve personalized therapy. Moreover, we believe this CT image-based model and operational algorithm may be replicated by other institutions.

Based on our experience from the present study, we believe the following recommendations are important for establishing an accurate pre-operative diagnosis of adrenal tumors using PPPM principles [1,2,3]. First, multidisciplinary collaboration and partnership, including radiologists, urologists, cardiologists, and endocrinologists, are essential for the successful planning, calculation, integration, and applicability of radiomic research results in clinical practice. Second, quality CT images are vital for achieving an accurate diagnosis. These recommendations can enable the radiomics method developed here to improve predictive diagnostics and support personalized medicine.