Introduction

Decades of research on tumor biology have revealed that tumors are heterogeneous entities at all scales (macroscopic, physiological, microscopic, genetic) [1, 2]. This tumor heterogeneity refers to the fact that different tumor cells can show distinct morphological and phenotypic profiles, including gene expression, metabolism, motility, proliferation, and metastatic potential [3,4,5]. Over the last few years, PET/CT has been proposed as a tool for noninvasive exploration of intratumor heterogeneity at the macroscopic scale [6,7,8], supposedly providing information about the biological features of tumors [9, 10]. Using a quantitative and automated approach termed texture analysis, several prior studies have found significant correlations between tumor biology and heterogeneity measures derived from that analysis [11,12,13,14,15,16,17,18] in a number of different tumor types, such as breast [19, 20], pancreas [21], esophageal [22, 23], prostate [24], and, most frequently, lung cancer [25,26,27,28,29,30,31,32,33,34,35,36,37]. In this regard, several studies have reported that texture indices correlate with clinically relevant outcomes such as overall survival [25, 27, 31, 34, 35], progression-free survival [27, 32, 37], and treatment response [26, 29, 36].

Although PET-based texture analysis seems to be an ideal tool for personalized medicine in oncology, numerous challenges need to be addressed before its reliable and interpretable use in the clinic [38]. There is still a need for a standardized quantification protocol to address all the widely reported issues related to the differences in the acquisition and reconstruction parameters, post-processing techniques, tumor segmentation methods or even the texture algorithm itself [38,39,40,41,42]. Nevertheless, the current major limitation of texture analysis is the lack of understanding of what PET texture indices actually represent in terms of the underlying real spatial distribution of radiotracer within the tumor. The complexity of the formulation of texture analysis makes it difficult to explain pervasive findings such as the correlations displayed between different texture indices [30, 31, 33,34,35, 43, 44], as well as the strong correlations between textural indices and tumor volume [45, 46]. Unraveling the biological substrate of these correlations is crucial to understand the complementary information provided by texture analysis, as well as to prove its clinical interpretation.

In this regard, the origin of the correlation of texture indices with tumor volume is particularly important in lung cancer. Associations between texture indices and overall survival have been consistently reported before by different studies [25, 27, 31, 34, 35]; however, tumor volume, which is also correlated with a more advanced disease stage, is strongly associated with survival [47,48,49], and therefore, it is unclear whether the predictive power of textural indices is simply driven by this correlation. In an attempt to account for this, some prior studies have statistically adjusted survival models for tumor volume [27, 46, 50]; however, correlations between texture indices and volume have been shown to be highly non-linear, leaving the efficacy of these approaches and, therefore, the real added value of texture indices, unclear.

In this study, we investigated the origin of the correlations between tumor volume and texture indices. For this, we employed a novel approach based on phantom data and studied the correlations between texture indices and volume in both real lung tumors and spheres with a homogeneous distribution of 18F-FDG.

Material and methods

Patient cohort

Our retrospective study included patients with newly diagnosed non-small cell lung carcinoma (NSCLC) referred to the Nuclear Medicine Department at Complexo Hospitalario Universitario de Santiago de Compostela (CHUS) for a pre-treatment 18F-FDG PET/CT study from January to September 2018. Tumor stage was established according to the AJCC 7th Edition TNM classification [51].

Phantom data

A NEMA image quality phantom [52] was used to mimic the shape of the upper human body. It included the commonly used hollow glass spheres with inner diameters of 3.7, 2.8, 1.7, 1.3, and 1 cm, as well as larger spheres with inner diameters of 10 cm and 6 cm that were specifically designed and manufactured for this work. In this way, our phantom study covered a range of sphere volumes from 0.29 to 294 cm3, which is similar to the range found in lung tumors [46].

Vereos PET/CT scanner

PET studies were carried out with a digital Vereos PET/CT (Philips), a scanner designed to improve small-lesion detection while reducing received dose and study time. The Vereos system is a digital photon counting PET scanner combined with a 128-channel CT system. The CT component, based on the Ingenuity CT, is a helical system with 40 mm axial coverage. The PET detector ring consists of 18 detector modules, each containing a 40 × 32 array of 4 × 4 × 19 mm3 LYSO crystals individually coupled to digital photon counters. The ring has a diameter of 764 mm and an axial length of 164 mm [53].

PET data acquisition

Patient preparation

Patients fasted for at least 6 h before the injection of 3.5 MBq/kg of 18F-FDG to ensure correct incorporation of the radiotracer. Blood glucose levels were checked, and patients rested in a warm room 30 min before administration.

Phantom preparation

The phantom was prepared with two different sphere configurations: (a) spheres with 3.7, 2.8, 1.7, 1.3, and 1 cm diameter, and (b) spheres with 6 cm and 10 cm diameter. For each configuration, the phantom background was filled with a homogeneous 18F-FDG solution while spheres were filled with a higher 18F-FDG concentration to generate sphere-to-background ratios of approximately 4 and 6. Injected activity was selected so that the initial activity of the spheres was higher than that measured in the tumors with higher FDG uptake, thus covering, after decay for 8 h, the entire range of activities observed in our lung tumor sample (from 67 to 2.8 kBq/cm3).

Acquisition protocol

Patient and phantom data were acquired using exactly the same protocol employed for whole-body exams at our institution. Acquisition time was 2 min per bed position, with an axial field of view of 16.4 cm for the first bed and 10 cm for subsequent ones, and a transaxial field of view of 576 mm. Phantom data were acquired using only one bed position centered on the phantom, with 2 min frames for 8 h. Frames with activities below or above the activity range observed in tumors were discarded and not included in any of the analyses.

All images were reconstructed with the software provided by the manufacturer (Philips PET/Vereos version 2.0.2.26321) using an OSEM algorithm with 2 iterations and 10 subsets. Isotropic voxels of 2 mm size were used. Scatter, random, and attenuation corrections were turned on, but no PSF correction was performed.

PET data analysis

To investigate which part of the measured heterogeneity reflects the biology of the tumor and which part arises from other factors, we calculated textural features from PET images of NSCLC patients and compared them with the same textural features calculated from PET images of homogeneous spheres. The same procedure was performed for patient and phantom data, following the general scheme illustrated in Fig. 1. Each tumor/sphere was manually enclosed in a cropping box by an experienced nuclear physician, and automatic segmentation was performed inside the box by applying a threshold of 0.45 × SUVmax [54]. The textural features were obtained using in-house algorithms implemented in MATLAB (The MathWorks, Inc.). To facilitate comparison with previously published literature, we computed exactly the same indices as in [46], and further computed 4 extra textural features that have previously been found to be associated with clinically relevant outcomes among lung cancer patients [33]. This set of indices includes first-order statistics features and textural features derived from the co-occurrence matrix (CM) [12] and the size zone matrix (SZM) [55]. We adopted the same definition for the CM as in [46], in which a single matrix takes into account all 13 directions simultaneously to compute the elements P(I,J) of the CM, using a neighbor distance of n = 1 pixel. The selected features calculated from the CM [12] were entropy, dissimilarity, energy, contrast, and homogeneity. The selected features calculated from the SZM [55] were high intensity large area emphasis (HILAE), gray level non-uniformity normalized (GLNN), and zone percentage (ZP). Features were calculated after quantization with 64 gray levels as previously described [45, 46, 56,57,58,59]. A more detailed description of the textural indices is given in the Supplementary Material.
Entropy, dissimilarity, HILAE, and ZP have shown robustness with respect to partial volume effects and segmentation [56], and reconstruction settings [42], as well as high reproducibility [57].
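The segmentation, quantization, and co-occurrence steps described above can be illustrated with a minimal Python sketch. It is not the in-house MATLAB implementation used in this study. The function names, the dictionary-based voxel layout, the linear min-max quantization rule, the symmetric counting of pairs, and the base-2 logarithm for entropy are all assumptions made for illustration, and the exact definitions in the Supplementary Material may differ. The feature formulas are the standard Haralick-type definitions.

```python
import math
from itertools import product

# The 13 unique directions of the 26-voxel neighbourhood (distance 1).
DIRECTIONS = [d for d in product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]

def segment(volume, fraction=0.45):
    """Keep voxels of the cropping box at or above fraction * maximum."""
    voxels = {(z, y, x): v
              for z, plane in enumerate(volume)
              for y, row in enumerate(plane)
              for x, v in enumerate(row)}
    threshold = fraction * max(voxels.values())
    return {pos: v for pos, v in voxels.items() if v >= threshold}

def quantize(roi, n_levels=64):
    """Assumed rule: linear resampling between ROI minimum and maximum."""
    lo, hi = min(roi.values()), max(roi.values())
    if hi == lo:
        return {pos: 0 for pos in roi}
    return {pos: min(int(n_levels * (v - lo) / (hi - lo)), n_levels - 1)
            for pos, v in roi.items()}

def cooccurrence(levels):
    """Single normalized matrix pooling all 13 directions (stored sparsely)."""
    counts = {}
    for (z, y, x), i in levels.items():
        for dz, dy, dx in DIRECTIONS:
            j = levels.get((z + dz, y + dy, x + dx))
            if j is None:
                continue  # neighbour lies outside the segmented region
            counts[(i, j)] = counts.get((i, j), 0) + 1
            counts[(j, i)] = counts.get((j, i), 0) + 1  # symmetric counting
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

def cm_features(p):
    """Standard co-occurrence features computed from P(i, j)."""
    return {
        "entropy": -sum(v * math.log2(v) for v in p.values()),
        "energy": sum(v * v for v in p.values()),
        "contrast": sum((i - j) ** 2 * v for (i, j), v in p.items()),
        "dissimilarity": sum(abs(i - j) * v for (i, j), v in p.items()),
        "homogeneity": sum(v / (1 + abs(i - j)) for (i, j), v in p.items()),
    }

# Hypothetical 2 x 2 x 2 cropping box, indexed as volume[z][y][x].
box = [[[10.0, 9.0], [8.0, 3.0]], [[7.0, 6.0], [5.0, 2.0]]]
features = cm_features(cooccurrence(quantize(segment(box))))
print(features)
```

Storing the matrix as a sparse dictionary keeps the sketch short. A dense 64 × 64 array would be the more usual choice for realistic lesion sizes.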

Fig. 1

Schematic diagram of the processing pipeline employed for the computation of textural features in the two groups of images, namely, lung tumors and homogeneous spheres

Statistical analysis

For descriptive statistics, mean ± standard deviation was used. We used a smoothing spline to model the dependence between texture indices and volume in homogeneous spheres. The smoothing parameter was set using cross-validation. The resulting spline models were then used to predict texture indices in lung tumors, using tumor volume as the only predictor. We additionally report root mean-squared errors (RMSE) as a measure of fit quality. Finally, Pearson correlation analyses were performed between predicted and measured texture features to estimate the fraction of variance (r2) that is explained by the phantom-based model. To facilitate spline regression fitting, HILAE was log-transformed due to its wide range of values (~ 700 to 22000) and to reduce skewness of the residuals. Cox proportional hazards regression was used to investigate the associations between textural features and overall survival. These analyses were performed with both raw textural features and volume-adjusted textural features, the latter obtained by subtracting the volume dependence predicted by the phantom-based spline model. All the analyses were performed using MATLAB (The MathWorks, Inc.).
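The evaluation step of this analysis (predicting a tumor feature from volume alone, then computing RMSE, Pearson r, r2, and the volume-adjusted feature) can be sketched as follows. This is an illustrative Python sketch, not the MATLAB code used in the study. In particular, a piecewise-linear interpolation of the phantom points stands in for the cross-validated smoothing spline, and all function names and numerical values are hypothetical.

```python
import math
from bisect import bisect_right

def phantom_curve(volumes, values):
    """Piecewise-linear stand-in for the phantom-based smoothing spline."""
    points = sorted(zip(volumes, values))
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def predict(volume):
        if volume <= xs[0]:
            return ys[0]   # clamp below the smallest sphere
        if volume >= xs[-1]:
            return ys[-1]  # clamp above the largest sphere
        k = bisect_right(xs, volume)
        t = (volume - xs[k - 1]) / (xs[k] - xs[k - 1])
        return ys[k - 1] + t * (ys[k] - ys[k - 1])

    return predict

def rmse(predicted, measured):
    """Root mean-squared error between predicted and measured values."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)

def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical phantom measurements: sphere volume (cm3) and feature value.
predict = phantom_curve([1.0, 10.0, 100.0], [1.0, 2.0, 4.0])

# Hypothetical tumor measurements.
tumor_volume = [5.5, 55.0, 100.0]
tumor_feature = [1.6, 3.1, 4.1]

predicted = [predict(v) for v in tumor_volume]
r = pearson_r(predicted, tumor_feature)
print("RMSE:", rmse(predicted, tumor_feature))
print("r:", r, "r2:", r ** 2)

# Volume-adjusted feature: measured value minus the phantom prediction.
adjusted = [m - p for m, p in zip(tumor_feature, predicted)]
print("volume-adjusted:", adjusted)
```

Because the model is fitted on phantom data only, r2 here measures how much of the tumor feature variance is reproduced by volume alone under homogeneous uptake.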

Results

Patients’ characteristics

Table 1 shows the tumor characteristics of the patients included in this study. Eighty-five patients met clinical and imaging criteria for inclusion. The median age was 68 ± 10 years, and the cohort comprised 19 females (12/19 smokers) and 62 males (62/62 smokers). Tumor stage was determined in 75/85 patients, with 11 patients in stage I, 2 patients in stage II, 28 patients in stage III, and 34 patients in stage IV. All tumors were correctly segmented, and tumor volumes ranged from 0.35 to 399.7 cm3. As an example, Fig. 2 shows PET scans from three representative patients.

Table 1 Patient and tumor characteristics
Fig. 2

Representative examples of three tumor lesions analyzed in this study, with volumes of (a) 3.5 cm3, (b) 25.2 cm3, and (c) 399.7 cm3. Panel (d) shows the resulting tumor segmentation, described in the PET data analysis section

Textural features and volume

Figure 3 shows the dependence on volume of the textural features measured in tumors and phantom spheres. All textural features showed strong, non-linear correlations with volume, both in tumors and in phantom spheres. Moreover, the texture indices measured in tumors and phantom spheres displayed a substantial overlap, particularly for dissimilarity (Fig. 3b), ZP (Fig. 3d), contrast (Fig. 3f), GLNN (Fig. 3g), and homogeneity (Fig. 3h). Entropy (Fig. 3a), HILAE (Fig. 3c), and energy (Fig. 3e) showed slight differences between spheres and tumors for larger volumes, although the overlap remained substantial. For large volumes, entropy and energy measured in tumors were higher than those measured in spheres, while HILAE measured in tumors was lower than HILAE measured in spheres.

Fig. 3

Dependence of textural features on lesion volume, for real tumors (red solid triangles) and homogenous spheres (black solid circles). The analyzed textural features were (a) entropy, (b) dissimilarity, (c) HILAE, (d) ZP, (e) energy, (f) contrast, (g) GLNN, and (h) homogeneity

Correlations between tumor and sphere texture features

Figure 4 displays the textural feature values predicted from tumor volume with the phantom-based spline model versus the textural features measured in tumors. Significant correlations were found for all the studied features (p < 0.0001), and they were strong (r ≥ 0.7) for all features except energy. Correlations between predicted and measured texture indices were very high for dissimilarity (r = 0.95, r2 = 0.90, RMSE = 0.993), contrast (r = 0.95, r2 = 0.91, RMSE = 28.83), ZP (r = 0.94, r2 = 0.90, RMSE = 0.055), GLNN (r = 0.93, r2 = 0.86, RMSE = 0.0393), and homogeneity (r = 0.90, r2 = 0.82, RMSE = 0.0197), and high for entropy (r = 0.70, r2 = 0.50, RMSE = 0.261) and log(HILAE) (r = 0.73, r2 = 0.53, RMSE = 0.192). Only energy showed a low correlation (r = 0.55, r2 = 0.30, RMSE = 0.00046). The r2 values indicate that approximately 90% of the variance of dissimilarity, contrast, and ZP was driven by non-heterogeneity information, and more than 80% for GLNN and homogeneity, while this fraction dropped to approximately 50% for entropy and HILAE and to approximately 30% for energy.

Fig. 4

Predicted versus measured textural features in lung tumors. Predicted textural features were computed using a spline model describing the dependence of textural features on lesion volume in homogeneous spheres. Thus, the model is entirely built on phantom data and only uses tumor volume for the estimation of the textural features. The analyzed textural features were (a) entropy, (b) dissimilarity, (c) HILAE, (d) ZP, (e) energy, (f) contrast, (g) GLNN, and (h) homogeneity. Squared Pearson correlation coefficients are reported as r2 and interpreted as the fraction of variance described by the phantom-based model

Survival analysis

We next tested whether the observed correlations with volume could drive or obscure associations with overall survival. Table 2 summarizes the results of univariable Cox regressions for each textural feature, both with and without subtraction of the volume effect estimated with our phantom-based model. We found that, among volume-unadjusted features, only entropy and energy were associated with survival (b = − 0.90, p = 0.01; b = 510, p = 0.01, respectively). These associations disappeared when adjusting for the effect of volume as estimated by the phantom-based model (b = − 0.43, p = 0.20; b = 3.6, p = 0.98, respectively); however, HILAE emerged as associated with survival (b = − 0.35, p = 0.008).
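For readers unfamiliar with how the coefficient b and its p value arise in a univariable Cox model, the following Python sketch fits one by Newton-Raphson maximization of the partial likelihood. It is a didactic sketch under stated assumptions, not the MATLAB routine used for Table 2: the function names are hypothetical, tied event times are handled with the Breslow approximation, the p value is a Wald test, and the survival data are invented for illustration. In the study, the covariate would be either a raw textural feature or its volume-adjusted counterpart.

```python
import math

def cox_terms(beta, times, events, x):
    """Log partial likelihood, score and information for one covariate."""
    loglik = score = info = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects only contribute to risk sets
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        loglik += beta * x[i] - math.log(s0)
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return loglik, score, info

def fit_cox_univariable(times, events, x, tol=1e-8, max_iter=50):
    """Return (coefficient, standard error, Wald p value)."""
    beta = 0.0
    for _ in range(max_iter):
        loglik, score, info = cox_terms(beta, times, events, x)
        if abs(score) < tol or info <= 0:
            break
        step = score / info
        # Halve the Newton step while it would lower the log-likelihood.
        while abs(step) > 1e-12 and cox_terms(beta + step, times, events, x)[0] < loglik:
            step /= 2
        beta += step
    _, _, info = cox_terms(beta, times, events, x)
    se = 1 / math.sqrt(info) if info > 0 else float("inf")
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p_value

# Invented example: follow-up time, event indicator (1 = death), covariate.
times = [1, 2, 3, 4, 5]
events = [1, 1, 1, 1, 1]
feature = [2.0, 0.0, 3.0, 1.0, 0.0]

b, se, p = fit_cox_univariable(times, events, feature)
print("b =", b, "SE =", se, "p =", p)
```

A positive b indicates that higher covariate values are associated with a higher hazard, and a negative b with a lower hazard.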

Table 2 Results of Cox proportional hazards models. Results are presented as model coefficients (p values)

Discussion

Although texture analysis in lung cancer has shown promising performance for some clinically relevant tasks [25,26,27,28,29,30,31,32,33,34,35], its widespread use is still limited by the lack of understanding of what texture features actually measure [40,41,42]. To shed light on the biological correlates of texture analysis, we studied the correlations between texture indices and lesion volumes [45, 46] in both lung tumors and phantom spheres, in order to estimate the physiological and non-physiological contributions to the observed correlations. Our strategy, based on multiple patient-to-phantom comparisons, identified a strong non-physiological contribution to the correlation between volume and eight robust and well-established textural features (entropy, dissimilarity, HILAE, ZP, energy, contrast, GLNN, and homogeneity). Strikingly, this non-biologically driven correlation derived from phantom data was able to explain between 30 and 90% of the observed correlation between tumor volume and texture indices in real tumors.

Our findings demonstrated strong non-linear correlations between textural features and volume, with analogous behavior for spheres and tumors. This was clearly evident for dissimilarity, ZP, contrast, GLNN, and homogeneity, which showed a virtually complete overlap between texture indices in tumors and texture indices in phantom spheres, suggesting that these metrics might not be useful for heterogeneity assessment. This indicates that prior findings linking these texture indices with survival or biological features of lung tumors are likely driven by this spurious correlation [26, 50, 60]. Moreover, given the strong non-linear dependencies on volume, it is far from clear that conventional linear techniques for confounder adjustment have effectively removed this spurious effect. The overlap between the trends of spheres and tumors was less complete for entropy, energy, and HILAE: entropy and energy were higher in tumors than in homogeneous spheres, whereas HILAE was lower in tumors than in homogeneous spheres. This indicates that entropy, energy, and HILAE might provide biologically driven information beyond the non-biological information driven by volume. Nevertheless, our correlation analyses demonstrated that, even for these features, the spurious correlation with volume explains at least 30% of the observed correlation. Moreover, entropy displayed a clear non-monotonic relationship with volume, which further complicates the interpretation of this measure as a marker of intratumor heterogeneity. Future studies are warranted to further understand the technical factors that lead to these complex associations. Taken together, our findings suggest that (1) the ability of the studied indices to measure real intratumor heterogeneity is severely confounded by tumor volume and (2) the correlation between tumor volume and texture indices is mostly non-biologically driven.

To explore the implications of our findings, we conducted a survival analysis to investigate whether the observed correlations with volume could drive or obscure associations with overall survival. For this, we studied the association between texture indices and overall survival, considering texture indices both with and without adjustment for the volume effect estimated from our phantom model. Our results showed that only entropy and energy were associated with survival in the volume-unadjusted model [46]. Remarkably, these associations disappeared when removing volume dependence through our phantom model and, in contrast, a HILAE effect emerged [60]. This finding suggests that spurious volume correlations can obscure biologically driven associations in texture analysis, highlighting the need for novel approaches that effectively remove artificial volume dependencies in textural features.

From an interpretability point of view, our findings challenge the established (but weakly supported) notion that the correlation between tumor volume and texture indices arises as a reflection of higher intratumor heterogeneity due to tumor growth [46]. The fact that we were able to describe more than 75% of the variance for some textural features in tumors with a model entirely based on phantom data argues against a biological origin and casts doubt on the suitability of these indices for assessing intratumor heterogeneity. These findings are consistent with those from a prior report in which the limited ability of texture indices for the assessment of intratumor heterogeneity was suggested using phantom and simulation data [61]. Here, we add to these results by performing direct comparisons between phantom lesions and tumors, observing a strong overlap between the texture indices and confirming these previous findings. Overall, our findings suggest that a significant number of texture indices previously found to be correlated with clinically relevant outcomes might not provide any useful information apart from that driven by their correlation with tumor volume.

Our study has some strengths. We performed the first systematic comparison between texture indices measured in real lung tumors and in activity-matched homogeneous phantoms, allowing us to determine the values of texture indices in the absence of heterogeneity. Moreover, these analyses were carried out on a single PET/CT scanner under highly controlled and routinely used settings. Compared to previous studies that analyzed the dependence of textural features on volume in a purely correlational way [46], we were able to isolate non-biological contributions to textural features in lung tumors, demonstrating the large impact of these non-heterogeneity-driven correlations on texture indices and allowing a better understanding of what textural features actually reflect. This study also has some limitations. First, we only analyzed one specific type of tumor in a group of patients from the same hospital. Second, we only included a relatively small number of heterogeneity quantification metrics. However, the eight textural features included here have been widely studied previously [33,34,35, 46] and have also been shown to have prognostic value in different cancer types [46, 56, 57, 61]. Third, we employed a widely used approach for tumor segmentation based on intensity thresholding. Other approaches might lead to different results; however, we kept the same approach for both tumors and phantoms, and therefore, any potential bias should cancel out in direct comparisons between tumor and phantom data.

Conclusion

We have shown that textural features previously found to be correlated with a number of clinically relevant outcomes present strong, non-biologically driven correlations with tumor volume. Textural features measured in homogeneous phantom spheres were highly predictive of textural features measured in tumors of similar volume. Our findings suggest that these spurious correlations might hamper reliable measurement of intratumor heterogeneity with texture analysis.