Introduction

Radiomics converts imaging data into high-dimensional mineable features, which have the potential to yield imaging biomarkers for tumor classification and prognostication [1]. This is currently a “hot topic” and an exciting frontier in radiology research. Radiomics and other artificial intelligence approaches dominate discussions at our radiology meetings and in our publications [2, 3]. These publications usually report impressive results with high levels of statistical significance [4,5,6], but most if not all of them end with the same disclaimers about reproducibility [5, 7, 8]. This is because there are currently no standardized image acquisition and post-processing protocols in radiomics. Technical parameters related to image acquisition (dose, phase) [9, 10], reconstruction (slice thickness, reconstruction kernel) [9, 11], segmentation technique [10], and radiomics feature extraction [12] can all affect the computed radiomics features. A number of studies have explored compensation techniques or deep-learning-based algorithms to mitigate the effect of these technical parameters on radiomics features [13,14,15,16]. In addition, there is a high barrier to entry in radiomics research due to the amount of data required for analysis, and many of the published reports have used in-house radiomics software that requires expertise in computer science. To our knowledge, only one publication has investigated the effect of in-house vs. freely available radiomics software on the calculated radiomics features [12]. Foy et al. evaluated radiomics features extracted from regions of interest on 40 mammograms and 39 head and neck CTs using two in-house radiomics software programs and two freely available radiomics packages, and found significant variations in the calculated values across software platforms [12].
Foy et al. evaluated the effect of software on the calculated radiomics features, but they did not evaluate whether these variations affected overall performance in a classification task. The purpose of this study is to evaluate the diagnostic performance of a commercially available radiomics research prototype vs. in-house radiomics software in the binary classification of CT images from patients with pancreatic ductal adenocarcinoma (PDAC) vs. healthy control subjects.

Materials and methods

Patients

This was an Institutional Review Board-approved, HIPAA-compliant retrospective study. The same dataset of patients with PDAC and healthy controls [6] was used for both the in-house radiomics software and the commercially available software; results for the in-house software were previously published [6]. Briefly, 190 patients with surgically resected PDAC were identified from the radiology and pathology databases at our institution from 2012 to 2017. One hundred and ninety healthy renal donors without known pancreatic disease were identified from the radiology database from 2005 to 2009. Patients with suspected PDAC based on imaging features without surgical proof were excluded. Medical records of potential renal donors were reviewed to exclude participants with pancreatic disease (e.g., pancreatitis, pancreatic mass) or diabetes mellitus. Preoperative CT scans of patients with PDAC and healthy control subjects were analyzed. The 380 cases (190 PDAC and 190 healthy controls) were randomly divided into a training set of 255 cases (67%; 130 PDAC and 125 healthy control cases) and a test set of 125 cases (33%; 60 PDAC and 65 healthy control cases). The training set was made approximately twice the size of the test set so that the training samples would adequately represent the distribution of the test cases.
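A simple random split with the reported set sizes can be sketched as follows. This is an illustrative assumption, not the procedure used in the study: the function name, seed, and case IDs are hypothetical. Note that a plain shuffle fixes only the total sizes (255 and 125); the per-class composition of each set (130/125 and 60/65 in this study) varies with the seed unless the split is stratified by class.

```python
import random

def split_cases(case_ids, n_train, seed=0):
    """Randomly partition case IDs into disjoint training and test sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical case IDs: 190 PDAC cases and 190 healthy controls
cases = [("PDAC", i) for i in range(190)] + [("control", i) for i in range(190)]
train, test = split_cases(cases, n_train=255)
print(len(train), len(test))  # 255 125
```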

CT acquisition

Patients with PDAC were scanned on a 64-slice MDCT scanner (Sensation 64, Siemens Healthineers) or a dual-source MDCT scanner (FLASH, Siemens Healthineers), and healthy control subjects were scanned on a 64-slice MDCT scanner (Sensation 64, Siemens Healthineers). Patients with PDAC and healthy control subjects were injected with 100–120 mL of iohexol (Omnipaque, GE Healthcare) at an injection rate of 4–5 mL/s. Scanning protocols were customized for each patient to minimize dose but were on the order of 120 kVp, 300 mAs, and a pitch of 0.6–0.8. Both arterial and venous phases were acquired per institutional protocol for both patients with PDAC and healthy renal donors.

Image segmentation

Venous-phase images with 0.75-mm slices were chosen for image segmentation and radiomics analysis. For healthy control cases, the whole 3D volume of the pancreas was manually segmented; for PDAC cases, the whole 3D volumes of the tumor, the background pancreas, and the whole pancreas (including the tumor region and the background normal pancreas) were manually segmented. Segmentation was performed by four researchers (a radiation oncologist with 30 years of experience, a CT technologist with 20 years of experience, and two post-doctoral fellows with 1 year of experience) using commercial segmentation software (Velocity 3.2.0, Varian Medical Systems), and the contours were verified by three abdominal radiologists with 5–30 years of experience. Features extracted from the whole-pancreas contours were used in the radiomics analysis.

Image analysis

Image analysis using the in-house software has been described in detail previously [6]; 478 features were extracted. The features were based on tumor intensity, shape, texture, and wavelet features as described in [17], and the process was implemented in C++ by our computer scientist. Because the number of features (478) was larger than the number of training cases (255), it was necessary to reduce the redundancy of the computed features. Minimum-redundancy maximum-relevance (mRMR) feature selection was applied to the computed feature set, and 40 features (and a subset of the 5 most relevant features) were selected for random forest classification.
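Minimum-redundancy maximum-relevance selection can be illustrated with a greedy loop: at each step, add the feature whose mutual information with the class label, minus its mean mutual information with the features already selected, is largest. The sketch below is a minimal stdlib illustration under stated assumptions (equal-width binning into three bins, the difference criterion, hypothetical feature names and toy data); it does not reproduce the in-house C++ code or the prototype's selection step.

```python
import math
from collections import Counter

def discretize(values, n_bins=3):
    """Equal-width binning of a continuous feature into integer bin labels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # I(X;Y) = sum p(a,b) * log(p(a,b) / (p(a) p(b))), written in terms of counts
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def mrmr(features, labels, k, n_bins=3):
    """Greedy mRMR selection using the difference criterion (relevance - redundancy).

    features: dict mapping feature name -> list of values, one per case
    labels:   list of class labels, one per case (e.g. 1 = PDAC, 0 = control)
    Returns up to k feature names in the order they were selected.
    """
    binned = {name: discretize(vals, n_bins) for name, vals in features.items()}
    relevance = {name: mutual_information(b, labels) for name, b in binned.items()}
    selected = []

    def score(name):
        # Relevance to the label minus mean redundancy with already-selected features
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(binned[name], binned[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    remaining = sorted(binned)  # sorted so that ties are broken deterministically
    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: 4 controls (0) and 4 PDAC cases (1)
labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy = {
    "feat_a": [0, 0, 0, 1, 1, 1, 1, 1],       # informative, one control out of place
    "feat_a_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # exact duplicate of feat_a (redundant)
    "feat_b": [0, 0, 0, 0, 0, 0, 1, 1],       # weaker, but partly complementary
    "feat_noise": [0, 1, 0, 1, 0, 1, 0, 1],   # unrelated to the label
}
print(mrmr(toy, labels, k=2))  # ['feat_a', 'feat_b']
```

The duplicate feature is exactly as relevant as `feat_a` but is skipped in the second step because its redundancy with `feat_a` is maximal; this is the behavior that makes mRMR useful when the feature count exceeds the number of training cases.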

Image analysis using the commercial software was performed on the syngo.via Frontier Radiomics prototype (syngo.via Frontier, Siemens Healthineers). Eight hundred and fifty-four radiomics features, including first-order statistics, shape, and texture features, were extracted from the original images. Additional first-order statistics and texture features were computed from filtered images, such as wavelet-filtered images. Feature reduction was performed, and the 40 most relevant features were selected for random forest classification. The decision trees of the random forest were built on the training set, and each test case was classified by majority vote across the trees. Performance of the algorithm was evaluated by overall sensitivity, specificity, and accuracy in the binary classification of cases from patients with PDAC vs. healthy control subjects.
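The train-then-vote procedure can be sketched in a simplified form. This is a minimal illustration, not the prototype's or the in-house implementation: each tree is reduced to a single split (a decision stump) chosen by Gini impurity, fit on a bootstrap sample of the training cases with a random subset of features, and test cases are labeled by majority vote. The function names, the 25-tree default, the square-root feature rule, and the toy data are all assumptions.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label (majority vote)."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature_ids):
    """Fit a depth-1 tree: the single threshold split with the lowest weighted Gini impurity."""
    best = None  # (weighted impurity, feature index, threshold, left label, right label)
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between adjacent values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            w = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or w < best[0]:
                best = (w, f, t, majority(left), majority(right))
    if best is None:  # no usable split (constant features): always predict the majority class
        label = majority(y)
        return lambda row: label
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label

def fit_forest(X, y, n_trees=25, seed=0):
    """Bagged stumps: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(1, int(d ** 0.5))  # features considered per split (square-root rule)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample cases with replacement
        feats = rng.sample(range(d), m)
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees

def predict(trees, row):
    """Classify one case by majority vote across the trees."""
    return majority([tree(row) for tree in trees])

# Hypothetical toy training data with two features; 0 = control, 1 = PDAC
X_train = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.4], [0.4, 0.3], [0.15, 0.35],
           [0.6, 0.7], [0.7, 0.6], [0.8, 0.9], [0.9, 0.8], [0.65, 0.85]]
y_train = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
forest = fit_forest(X_train, y_train)
print(predict(forest, [0.0, 0.05]), predict(forest, [0.95, 1.0]))
```

Full random forests grow deeper trees and resample features at every split; with stumps there is only one split per tree, so sampling features per tree is equivalent here.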

Results

Demographic information for the 190 patients with surgically resected PDAC and the 190 healthy control subjects is shown in Table 1. The mean ± SD maximal 2D tumor diameter in the 190 patients with PDAC was 4.1 ± 1.7 cm. The unsupervised clustering results of all 854 radiomics features extracted using the commercially available research prototype for both the PDAC cases and the healthy control cases are shown in Fig. 1. This heat map is a color-coded array of all feature values (y-axis) in all cases (x-axis). For visualization, each radiomics feature is normalized across all 380 cases.
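According to the Fig. 1 caption, the heat map displays Z-scores, i.e., each feature rescaled across the 380 cases to zero mean and unit standard deviation. A minimal sketch of that row-wise normalization follows; the use of the population SD, the zero output for constant features, the function name, and the toy matrix are assumptions, and the clustering step itself is not shown.

```python
import statistics

def zscore_rows(matrix):
    """Z-score each feature (row) across all cases (columns), as in a features-by-cases heat map."""
    normalized = []
    for row in matrix:
        mu = statistics.mean(row)
        sd = statistics.pstdev(row)  # population SD (an assumption; sample SD is also common)
        # A constant feature has SD 0; map it to zeros instead of dividing by zero
        normalized.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return normalized

# Hypothetical 2-feature x 4-case matrix (the study's matrix is 854 features x 380 cases)
features_by_cases = [
    [10.0, 12.0, 14.0, 16.0],
    [0.5, 0.5, 0.5, 0.5],
]
print(zscore_rows(features_by_cases))
```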

Table 1 Demographic characteristics of patients with pancreatic ductal adenocarcinoma (PDAC) and healthy control subjects
Fig. 1

Heat map representation of radiomics features (y-axis) and cases (x-axis). The left color bar shows the color coding of the Z-score of each radiomics feature across the 380 cases from patients with PDAC and healthy controls. Patients with PDAC and healthy controls show different clustering patterns

Forty features were selected from those computed by the commercially available research prototype using minimum-redundancy maximum-relevance feature selection, and the 10 most relevant features are shown in Table 2. The number of features was further reduced to five to allow a more direct comparison with results from the previous publication [6]. Diagnostic performance of the commercially available research prototype and the in-house software in classifying CT cases from patients with PDAC and healthy control subjects is shown in Table 3. When 40 radiomics features were used, the in-house software achieved higher sensitivity (1.00) and accuracy (0.992) than the commercially available research prototype (sensitivity 0.950, accuracy 0.968); both programs achieved the same specificity (0.985). When the number of features was reduced to five, the diagnostic performance of the in-house software decreased (sensitivity 0.950, specificity 0.923, accuracy 0.936), whereas that of the commercially available research prototype was unchanged (sensitivity 0.950, specificity 0.985, accuracy 0.968).
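Because the test set contains 60 PDAC and 65 healthy control cases, each reported metric corresponds to a specific confusion-matrix count (for example, a specificity of 0.985 corresponds to 64 of 65 controls, and a sensitivity of 0.950 to 57 of 60 PDAC cases). The sketch below recomputes the reported values from counts back-calculated in this way; the counts are derived arithmetic rather than figures taken from Table 3, and the function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts (PDAC = positive)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# (TP, FN, TN, FP) back-calculated from the reported metrics and the 60/65 test set
configs = {
    "in-house, 40 features": (60, 0, 64, 1),
    "in-house, 5 features": (57, 3, 60, 5),
    "prototype, 40 or 5 features": (57, 3, 64, 1),
}
for name, counts in configs.items():
    sens, spec, acc = diagnostic_metrics(*counts)
    print(f"{name}: sensitivity={sens:.3f}, specificity={spec:.3f}, accuracy={acc:.3f}")
```

The three false negatives per program in the five-feature setting (FN = 3) are the cases discussed in Figs. 2–4.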

Table 2 Ten most relevant radiomics features selected by commercially available research prototype for binary classification of pancreatic ductal adenocarcinoma cases vs. healthy control cases
Table 3 Diagnostic performance of commercially available radiomics research prototype vs. in-house software in binary classification of pancreatic ductal adenocarcinoma and healthy control subjects

Although both radiomics software programs generated three false negatives when only five radiomics features were used (sensitivity 0.950), they shared only one false negative. The case misclassified as a false negative by both programs was a predominantly exophytic mass arising from the head of the pancreas with contiguous porta hepatis lymphadenopathy (Fig. 2). The remaining two false negatives of each program were correctly classified by the other program (Figs. 3, 4). The discrepancy was likely due to differences in the computation and selection of relevant features (Table 4).

Fig. 2

False negative case of pancreatic ductal adenocarcinoma misclassified by both the commercially available and the in-house radiomics software using five radiomics features for classification. Coronal IV contrast-enhanced CT image of an 85-year-old man shows a predominantly exophytic mass arising from the head of the pancreas that is contiguous with locoregional lymphadenopathy (arrows)

Fig. 3

False negative cases of pancreatic ductal adenocarcinoma misclassified by the commercially available software but correctly classified by the in-house software using five radiomics features for classification. a Axial IV contrast-enhanced CT image of a 62-year-old woman shows a diffusely infiltrative tumor throughout the pancreatic body and tail (white arrow) associated with dilatation of the pancreatic duct (black arrow). b Axial IV contrast-enhanced CT image of a 65-year-old woman shows an exophytic hypoenhancing mass arising from the uncinate process of the pancreas (arrow)

Fig. 4

False negative cases of pancreatic ductal adenocarcinoma misclassified by the in-house software but correctly classified by the commercially available software using five radiomics features for classification. a Coronal IV contrast-enhanced CT image of a 59-year-old man shows a dilated common bile duct and pancreatic duct with a subtle isoenhancing mass at the transition point in the ampullary region (arrow). b Axial IV contrast-enhanced CT image of a 69-year-old man shows a hypoenhancing mass in the pancreatic body (arrow)

Table 4 Five maximally relevant features in the commercially available radiomics research prototype and in-house software

Discussion

Radiomics has the potential to generate imaging biomarkers for classification and prognostication. Technical parameters from image acquisition to feature extraction and analysis can affect radiomics features [9,10,11,12]. The current study applied a commercially available research prototype and in-house radiomics software to the same CT images with the same manual segmentations, thereby controlling for variability at the image acquisition and segmentation steps, and compared the diagnostic performance of the two programs. Both programs achieved similar diagnostic performance in the binary classification of CT images from patients with PDAC and healthy control subjects, despite differences in the radiomics features they employed (854 features in the commercial program vs. 478 features in the in-house program). This is reassuring: even though the computed values of individual radiomics features may vary between programs, these differences do not appear to substantially affect the overall diagnostic performance of the constellation of radiomics features. This is important for the broader implementation of radiomics research. Currently, many radiomics studies are performed using proprietary in-house software, which requires in-house expertise in computer science, a luxury that only a few academic centers can afford. The results of this study suggest that commercially available radiomics software may be a viable alternative to in-house computer science expertise, which can lower the barrier to entry for radiomics research and allow clinicians to validate the findings of published studies with their own local datasets.

In the previously published study [6], we observed a decrease in diagnostic performance when the number of features was reduced from 40 to 5. In the current study, the performance of the commercially available research prototype did not change when the number of features was reduced. Interestingly, although both programs achieved the same sensitivity (0.950) using five features, their false negative cases differed, likely due to differences in the computation and selection of relevant features. The cases misclassified as false negatives by one or both programs showed diverse imaging appearances, ranging from a small, subtle isoenhancing mass to diffuse tumor infiltration of the pancreas. These algorithms appear to focus on different imaging features as the basis for classification, so it may be possible to combine different algorithms to achieve superior performance. Due to the “black box” nature of radiomics, it is not easy to pinpoint the exact cause of the difference in performance between these programs. Future research is needed to understand these differences.

This study has a number of limitations. First, it was a retrospective study with a relatively small sample size. The study population was selected based on the previously published study to allow direct comparison of diagnostic performance between the commercially available and the in-house software. Second, this study compared the performance of two software programs in one specific application. Future research is needed to determine whether other commercially available radiomics software will achieve similar results, and whether these programs will achieve similar results in other clinical applications and imaging modalities. Third, there is currently no standardized imaging protocol for radiomics studies. In the future, these radiomics programs will require validation across different institutions, vendors, and scanning protocols.

Conclusion

This study showed that commercially available radiomics software may be able to achieve diagnostic performance similar to that of in-house radiomics software. Results obtained with one radiomics program may be transferable to another. The availability of commercial radiomics software may lower the barrier to entry for radiomics research and allow more researchers to engage in this exciting area of research.