Introduction

Using quantitative parameters extracted from the positron emission tomography (PET) component of PET/Computed Tomography (CT) images, such as standardised uptake values (SUV), as biomarkers in multicentre trials or in sites equipped with multiple PET/CT scanners requires that these parameters be comparable among patients, regardless of the PET/CT system used. This can be achieved by harmonising patient preparation, as well as data acquisition, reconstruction, and processing, including the steps for image analysis and parameter extraction [1–3]. The American College of Radiology (ACR) program [4], the European Association of Nuclear Medicine (EANM)/EANM Research Ltd. (EARL) accreditation program [5] and the Society of Nuclear Medicine (SNM) clinical trials network (SNM-CTN) [6] have set up harmonisation programs based on the use of phantom acquisitions. These are used as standardised objects in order to harmonise data acquisition, processing, and analysis so that the physical, technical, and biological sources of error [1, 7] in SUV measurements can be limited.

A specific issue is related to reconstruction-dependent variations encountered with recently introduced advanced image reconstruction algorithms, such as those incorporating the point spread function (PSF) in the system matrix [8] or Bayesian penalised likelihood (BPL) reconstruction [9]. These new image reconstruction schemes have been shown to produce SUV metrics significantly higher than conventional ordered subset expectation maximisation (OSEM) algorithms [10]. Consequently, an additional filtering step can be used in order to meet harmonising standards [11–13]. With regard to the EANM/EARL program [5], a set of PET images with NEMA NU-2 anthropomorphic phantom-based filtering is mandatory to harmonise SUVs to the EANM standards. Given that centres running PET systems with advanced reconstruction algorithms are often willing to use them with parameters chosen in order to achieve optimal lesion detection, EARL-accredited centres tend to use two PET datasets when participating in multicentre trials: one for optimal lesion detection and image interpretation, and the filtered one for harmonised quantification [12].

It is important to emphasise that all these previous efforts have focused on typical SUV metrics, as they are commonly used in oncology for therapy assessment and risk stratification. However, there is growing interest in using alternative measurements, for instance metabolically active tumour volume (MATV) and heterogeneity metrics, in order to provide a more comprehensive quantitative assessment of lesions from PET images [14, 15]. One of the most promising approaches for heterogeneity quantification is textural feature analysis, introduced for image processing applications in the 1970s, used in magnetic resonance imaging (MRI) and CT since the early 1990s, and more recently in PET [16]. As PSF reconstruction improves resolution and therefore provides higher definition of structures within a lesion, it is reasonable to expect improved evaluation of tumour heterogeneity as compared to OSEM algorithms. This raises the question of which reconstruction should be used for assessing tumour heterogeneity within a program using a smoothed dataset to reach harmonising standards. Two studies have already reported on the impact of the type of reconstruction algorithm or variation of reconstruction parameters on textural feature values [17, 18]. However, they mostly reported the quantitative impact only, and neither explored the issue within the context of harmonisation programs nor looked at the relationship between heterogeneity and volume, an important aspect that was recently demonstrated [19].

This study focused on lung cancer, a tumour type for which standard SUV metrics have been proven to be clinically useful [20–23] and for which quantification of tumour heterogeneity in PET images has recently gained interest [24, 25], and aimed at evaluating the potential impact of the EARL accreditation program [5] on selected 18F-FDG heterogeneity metrics. The primary aim was to compare several heterogeneity features previously identified as reliable (robust and repeatable) in lung cancer patients, in PSF-reconstructed images, PSF-reconstructed images with a filter chosen to meet harmonising standards, and in EARL-compliant OSEM images, hereafter referred to as OSEM images. This comparison was performed not only in terms of absolute values but also in terms of their distributions with respect to tumour volume, which was not considered in previous studies. A secondary aim was to study whether potential differences in heterogeneity features amongst these three reconstructions would be similar in adenocarcinomas (ADC), squamous cell carcinomas (SqCC), and large cell lung cancer (LCC), the main histological types encountered in non-small cell lung cancers (NSCLC).

Materials and methods

Patient selection

Over a 3-month period, 60 consecutive biopsy-proven lung cancer patients (four small cell lung cancer and 56 NSCLC) were prospectively included. Informed consent was waived for this type of study by the local ethics committees (Ref A12-D24-VOL13, Comité de protection des personnes Nord-Ouest III) since the scans were performed for clinical indications and the study procedures were performed independently of normal clinical reporting.

PET calibration and cross-calibration

The calibration of the PET system was performed daily with a 68Ge cylinder with a known radioactive concentration.

The cross-calibration procedure was performed twice during the present study, as per the EANM guidelines [11]. Details regarding this cross-calibration can be found elsewhere [12]. The cross-calibration factors were found to be 0.99 and 1.00.

PET/CT examinations

After a 15-min rest in a warm room, patients who had been fasting for 6 h were injected with 18F-FDG. The injected activity and the exact delay between injection and the start of the acquisition were recorded for each patient.

All PET imaging studies were performed on a Biograph TrueV (Siemens Medical Solutions) with a 6-slice spiral CT component. For additional technical details regarding this system we refer to a previous publication [26]. CT acquisition was performed first, with the following parameters: 60 mAs, 130 kVp, pitch 1 and 6 × 2 mm collimation. Subsequently, the PET emission acquisition was performed in three-dimensional (3-D) mode. Patients were scanned from the skull base to the mid-thighs.

PET reconstruction

The standard reconstruction in the department where patients were recruited is a PSF reconstruction algorithm (HD; TrueX, Siemens Medical Solutions; three iterations and 21 subsets) without filtering. For the purpose of the present study, raw data were also reconstructed with the OSEM reconstruction algorithm (four iterations and eight subsets) and a PSF reconstruction algorithm (HD; TrueX, Siemens Medical Solutions; three iterations and 21 subsets) with a 7-mm Gaussian filter (PSF7). As shown in a previous study, this latter reconstruction leads to protocol-specific images with NEMA NU-2 phantom-based filtering that meet EANM quantitative harmonising standards, therefore reducing reconstruction-dependent variation in SUVs [12]. The OSEM reconstruction parameters met the EANM requirements regarding activity recovery.

For all reconstructions, matrix size was 168 × 168 voxels, resulting in isotropic voxels of 4.07 × 4.07 × 4.07 mm3. Scatter and attenuation (using the associated CT) corrections were applied.

PET tumour delineation

All lesions were first automatically delineated in the PSF PET image using the fuzzy locally adaptive Bayesian (FLAB) algorithm, and the resulting segmentations were copied to the two other images (OSEM and PSF7). This process avoided any variability in the tumour volume definition and number of voxels involved in the calculations when comparing features across the three images. FLAB has been developed specifically for PET image segmentation [27] and has been thoroughly validated for reproducibility, robustness, and repeatability [28, 29], as well as for accuracy on simulated and clinical images [30]. Tumours were first located and isolated in a volume of interest (VOI) fully enclosing the tumour and its surrounding background, without including nearby pathological uptake. This was performed using in-house software in which points are placed by the user around the tumour (see Supplemental Fig. 1). The FLAB algorithm was then applied to this VOI in fully automated mode, in contrast to a semi-supervised approach considered in a previous work on [18F]fluorothymidine (FLT) images during radiotherapy, in which the contrast and signal-to-noise ratio were lower [31].

However, in order to be more representative of a current multi-centric clinical setting, tumour volumes were also defined using a fixed threshold at 50 % of SUVmax (SUVmax50%) applied independently to each of the three images. Furthermore, for the most discordant volumes between PSF7/OSEM and PSF (outliers located above the 90th percentile) when using SUVmax50%, tumours were also segmented independently with FLAB on each set of images.

Features were then extracted from each of the three volumes and compared.
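The fixed-threshold delineation described above can be illustrated with a minimal sketch. It is not the software used in this study: the function names, the flat-list input, and the choice of an inclusive threshold (SUV greater than or equal to 50 % of SUVmax) are assumptions made for illustration, and the sketch ignores spatial connectivity. The voxel size is the 4.07 mm isotropic value given in the reconstruction section.

```python
def threshold_mask(suv, fraction=0.5):
    """Flag voxels whose SUV is at least `fraction` of the VOI maximum.

    `suv` is a flat list of SUVs for the voxels of one VOI (hypothetical input).
    The inclusive comparison (>=) is an assumption; the text does not specify it.
    """
    cutoff = fraction * max(suv)
    return [value >= cutoff for value in suv]


def matv_cm3(mask, voxel_size_mm=(4.07, 4.07, 4.07)):
    """Convert a count of selected voxels into a volume in cm3."""
    voxel_mm3 = voxel_size_mm[0] * voxel_size_mm[1] * voxel_size_mm[2]
    return sum(mask) * voxel_mm3 / 1000.0


# Synthetic SUVs, for illustration only (not patient data).
voi = [1.0, 2.0, 4.9, 5.0, 8.0, 10.0]
mask = threshold_mask(voi)   # cutoff is 5.0, so the last three voxels are kept
volume = matv_cm3(mask)      # three voxels of about 0.0674 cm3 each
```

Because the cutoff scales with SUVmax, a reconstruction that raises SUVmax also raises the cutoff, which is one way the same lesion can yield a smaller thresholded volume on a higher-contrast image.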

Tumour characterisation and quantification

From the FLAB-delineated volumes, all “standard” metrics were extracted: SUVmax, SUVmean and metabolically active tumour volume. To characterise uptake heterogeneity, several metrics were considered: on the one hand, a first-order metric based on the intensity histogram (IH) [16] denoted area under the curve of the cumulative histogram (CHAUC) [32], and on the other hand, second- and third-order textural features (TF). The metric CHAUC is based on the intensity histogram only and does not incorporate spatial information. TFs have been defined to quantify patterns of spatial arrangements and/or intensity variations. There exist dozens of TFs based on different computational frameworks. In the present work, we used only a few selected TFs. This selection was based on several previous studies showing that most of the features (including first-order metrics such as skewness), especially third-order metrics focusing on small areas and/or low intensities, are unreliable due to poor robustness vs. reconstruction [17, 18] or partial-volume effects and segmentation [33], and low repeatability on test-retest images [34]. The remaining features are either calculated from co-occurrence (second-order: entropy, correlation, and dissimilarity) or size-zone matrices [third-order: high-intensity larger area emphasis (HILAE) and zone percentage (ZP)]. Before building these matrices, images are first discretised into a chosen number of bins (B) with a quantisation step. It has been shown that the choice of the quantisation value (usually between 8 and 256) has an important impact on the resulting TF value, but also on its reproducibility [34] and on its complementary value with respect to the tumour volume in which it is calculated [19]. 
Based on these previous results, a value of B = 64 was used in the present work, and the quantisation was performed using Eq. 1, in which I(x) is the original SUV of the voxel of interest and SUVmin and SUVmax are the minimal and maximal SUV values within the tumour volume.

$$ I_B(x) = B \times \frac{I(x) - \mathrm{SUV}_{min}}{\mathrm{SUV}_{max} - \mathrm{SUV}_{min}} $$
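As an illustration of this quantisation step, the following sketch applies the equation to a list of SUVs. The equation maps the tumour SUV range onto the interval from 0 to B; how the scaled value is rounded to an integer bin is not stated here, so flooring and placing the maximum in the last bin (B − 1) are assumptions of this sketch, as is the function name.

```python
import math


def quantise(suv, n_bins=64):
    """Map SUVs inside a tumour volume to integer grey levels 0 .. n_bins - 1.

    Scaling follows I_B(x) = B * (I(x) - SUVmin) / (SUVmax - SUVmin).
    Flooring and clipping to n_bins - 1 are assumptions of this sketch.
    """
    suv_min, suv_max = min(suv), max(suv)
    if suv_max == suv_min:
        # A perfectly uniform volume has a single grey level.
        return [0 for _ in suv]
    span = suv_max - suv_min
    levels = []
    for value in suv:
        scaled = n_bins * (value - suv_min) / span
        levels.append(min(int(math.floor(scaled)), n_bins - 1))
    return levels


# Synthetic SUVs, for illustration only.
levels = quantise([2.0, 4.0, 6.0, 10.0], n_bins=64)
```

Because SUVmin and SUVmax are taken within each tumour volume, the grey levels are relative to that volume: the same absolute SUV can fall in different bins in two lesions, or in the same lesion reconstructed differently.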

For co-occurrence matrices, it has been shown that less redundant features are obtained when calculated using a single co-occurrence matrix taking into account all 13 spatial directions simultaneously, rather than computing a matrix for each direction followed by averaging [19, 35]. A single co-occurrence matrix was thus adopted in the present work.
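A minimal sketch of the single-matrix approach is given below, assuming the quantised tumour voxels are stored as a dictionary from (z, y, x) indices to grey levels. The data structure, the function names, the symmetric counting, and the base-2 logarithm in the entropy are assumptions made for illustration rather than details taken from the implementation used in this work.

```python
import itertools
import math

# The 13 unique 3-D directions: one of each pair of opposite nearest-neighbour offsets.
DIRECTIONS = [d for d in itertools.product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]


def cooccurrence(levels, n_bins):
    """Build one normalised, symmetric co-occurrence matrix over all 13 directions.

    `levels` maps (z, y, x) voxel indices inside the tumour to grey levels.
    Pairs with a neighbour outside the tumour are ignored.
    """
    counts = [[0] * n_bins for _ in range(n_bins)]
    for (z, y, x), i in levels.items():
        for dz, dy, dx in DIRECTIONS:
            j = levels.get((z + dz, y + dy, x + dx))
            if j is not None:
                counts[i][j] += 1
                counts[j][i] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no neighbouring voxel pairs inside the volume")
    return [[c / total for c in row] for row in counts]


def entropy(p):
    """Entropy of the matrix; the logarithm base (2) is an assumption."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)


def dissimilarity(p):
    """Mean absolute grey-level difference between neighbouring voxels."""
    return sum(abs(i - j) * v for i, row in enumerate(p) for j, v in enumerate(row))


# Synthetic 1 x 2 x 2 checkerboard with two grey levels, for illustration only.
toy = {(0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 0}
matrix = cooccurrence(toy, n_bins=2)
```

Pooling all directions into one matrix before computing features means each feature is evaluated once per tumour, instead of 13 times followed by averaging.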

Noise analysis in PET images

In order to evaluate noise characteristics of each of the three reconstructions, signal-to-noise ratio (SNR, defined as \( 20 \times \log_{10}\left(\frac{\mu}{\sigma}\right) \) dB [36], where μ and σ are the mean and standard deviation of intensities) was measured in circular regions of interest (ROIs) placed in homogeneous regions of the liver and automatically copied to each reconstruction.
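The SNR definition can be written as a short function. This is a sketch under stated assumptions: the ROI is passed as a flat list of voxel intensities, and the sample standard deviation is used, since the text does not say whether the sample or population form was applied.

```python
import math
import statistics


def snr_db(roi):
    """SNR in decibels, 20 * log10(mean / standard deviation), for one ROI.

    `roi` is a flat list of voxel intensities (hypothetical input).
    The sample standard deviation is an assumption of this sketch.
    """
    mu = statistics.mean(roi)
    sigma = statistics.stdev(roi)
    return 20.0 * math.log10(mu / sigma)


# Synthetic liver ROI values, for illustration only: mean 10, SD 1.
value = snr_db([9.0, 10.0, 11.0])
```

With this definition, halving the standard deviation at constant mean raises the SNR by about 6 dB.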

Statistical analysis

Quantitative data are presented as mean (standard deviation), as well as the median when not normally distributed. Bland-Altman analyses were used to compare the SUV metrics obtained in the three images. The features obtained on each of the three sets of PET images were first compared globally using Friedman tests. Graphical plots of each feature depending on tumour volume were also used to estimate the impact of PSF reconstruction compared to OSEM and PSF7 images, and the features were then compared by categories of volumes using Friedman tests. MATV, SUVmax, and TFs extracted from the three sets of data were compared according to the histological type of the tumour (ADC, SqCC, and LCC) using Kruskal-Wallis tests. For all tests, a two-tailed P value of less than 0.05 was considered statistically significant. Graphs and analyses were carried out using Prism (GraphPad Software, La Jolla, CA).
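For readers unfamiliar with the Friedman test, the sketch below shows how its statistic is obtained for one feature measured on the same lesions in three reconstructions. It is not the Prism implementation: it uses the chi-square approximation without tie correction, and the closed-form P value applies only to three groups (two degrees of freedom). Function names and input values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values within one lesion from 1 upwards, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square statistic; each row holds one lesion's k paired values."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def p_value_three_groups(stat):
    """Chi-square survival function for 2 degrees of freedom (k = 3 only)."""
    return math.exp(-stat / 2.0)


# Synthetic (OSEM, PSF, PSF7) values for four lesions, for illustration only.
rows = [(10.1, 15.2, 10.3), (6.0, 9.1, 6.2), (12.4, 18.0, 12.5), (4.2, 5.9, 4.4)]
stat = friedman_statistic(rows)
p = p_value_three_groups(stat)
```

The test uses only the within-lesion ranking of the three reconstructions, so it is insensitive to the large between-lesion spread in feature values.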

Results

Population characteristics and compliance to guidelines for tumour imaging

Population characteristics are displayed in Table 1. Overall, 58 (96.7 %) patient examinations fulfilled the EANM 2.0 guidelines for PET tumour imaging. The mean (SD) injected dose of 18F-FDG was 4.02 (0.16) MBq/kg. The mean (SD) delay between the injection and the start of the PET acquisition was 60.43 (3.38) min. The mean (SD) blood glucose level was 1.04 (0.23) mmol/L.

Table 1 Patient demographics

Validation of the use of an additional harmonised PET dataset to overcome reconstruction-dependency of SUVs

Overall, 71 pulmonary lesions were delineated. The mean (SD, median) FLAB-derived MATV was 31.7 (46.4, 9.7) cm3. The mean (SD) SUVmax for OSEM, PSF, and PSF7 reconstructions were 10.50 (5.85), 15.42 (9.56), and 10.56 (5.88), respectively. The mean (SD) SUVmean for OSEM, PSF, and PSF7 reconstructions were 6.14 (2.99), 7.37 (4.03), and 6.25 (2.98), respectively.

As shown in Supplemental Fig. 2, a Bland-Altman analysis demonstrated that the mean ratio of PSF and OSEM reconstructions for SUVmax and SUVmean were 1.46 (95 % CI = 0.86–2.08) and 1.19 (95 % CI = 0.71–1.67), respectively. When using the PSF7 harmonised reconstruction, the mean ratio between PSF7 and OSEM reconstructions were 1.01 (95 % CI = 0.93–1.09) and 1.02 (95 % CI = 0.95–1.09) for SUVmax and SUVmean, respectively.
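The ratio analysis can be sketched as follows. The reported intervals are roughly symmetric about the mean ratio, so this sketch assumes they correspond to the mean plus or minus 1.96 standard deviations of the per-lesion ratios; that reading, the function name, and the input values are assumptions, not details confirmed by the text.

```python
import statistics


def ratio_limits(test_values, reference_values):
    """Mean per-lesion ratio and mean +/- 1.96 SD limits.

    Interpreting the reported interval as mean +/- 1.96 SD is an assumption.
    """
    ratios = [t / r for t, r in zip(test_values, reference_values)]
    mean_ratio = statistics.mean(ratios)
    sd = statistics.stdev(ratios)
    return mean_ratio, mean_ratio - 1.96 * sd, mean_ratio + 1.96 * sd


# Synthetic SUVmax values for three lesions, for illustration only.
psf = [12.0, 15.0, 18.0]
osem = [10.0, 10.0, 10.0]
mean_ratio, lower, upper = ratio_limits(psf, osem)
```

A mean ratio close to 1 with narrow limits, as reported for PSF7 versus OSEM, indicates that the two reconstructions give interchangeable SUVs lesion by lesion and not only on average.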

Compared to OSEM, SNR in the liver was lower in PSF images (−25.8 ± 3.9 %), whereas it was very similar in PSF7 images (1.2 ± 2.9 %).

Impact of newest reconstruction algorithms on textural features

As a first step, we used FLAB to delineate lesions in PSF images and copied this segmentation to the OSEM images. For the first-order metric CHAUC based on the intensity histogram, most tumours were quantified as significantly more heterogeneous in PSF images compared to OSEM images, as PSF values were significantly lower than OSEM ones (a lower area under the curve indicating higher heterogeneity). Regarding second-order metrics calculated on the co-occurrence matrix (entropy, correlation, and dissimilarity), on the one hand, PSF values were significantly lower for correlation and significantly higher for dissimilarity (in both cases indicating higher heterogeneity), compared to OSEM reconstruction. On the other hand, no significant difference between PSF and OSEM images was observed for entropy. Regarding third-order metrics calculated on size-zone matrices (HILAE and ZP), there was a significant difference between PSF and OSEM images only for HILAE values, which were lower, indicating higher heterogeneity. Figure 1 displays TFs for the three reconstructions used.

Fig. 1

Impact of the EARL harmonisation strategy on textural features using the FLAB algorithm to delineate lesions. Textural features are shown for the three reconstructions used. CHAUC: area under the curve of the cumulative histogram; HILAE: high-intensity larger area emphasis; ZP: zone percentage. Data are shown as Tukey boxplots (lines displaying median, 25th and 75th percentiles; cross represents the mean value). *, **, and *** indicate two-tailed P < .05, P < .01, and P < .001, respectively. ns: non-significant

Heterogeneity features were also analysed depending on the range of tumour volumes. As shown in Fig. 2, the dispersion of the values (represented by the interquartile range) was larger and calculated values were significantly smaller for PSF reconstruction as compared to OSEM reconstruction for tumour volumes larger than 1 cm3 in the case of CHAUC. For HILAE, calculated values were also significantly smaller for tumour volumes larger than 1 cm3 but the dispersion of values was narrower for PSF compared to OSEM. Dissimilarity values were significantly higher in PSF for volumes >50 cm3 and the dispersion of these values was larger in PSF images for tumour volumes larger than 1 cm3, compared to OSEM reconstruction. Distributions for all metrics are shown in detail in Supplemental Figs. 3 and 4.

Fig. 2

Impact of tumour volume on textural features. Textural features used for the three reconstructions, depending on tumour volume. CHAUC: area under the curve of the cumulative histogram; HILAE: high-intensity larger area emphasis; ZP: zone percentage. Data are shown as Tukey boxplots (lines displaying median, 25th and 75th percentiles; cross represents the mean value). *, **, and *** indicate two-tailed P < .05, P < .01, and P < .001, respectively. ns: non-significant

When defining volumes using SUVmax50% applied independently to OSEM and PSF images, mean (SD, median) MATVs were significantly smaller on PSF images [9.0 (12.4, 3.3) cm3] than on OSEM images [18.8 (24.2, 7.3) cm3] (p < 0.0001). There were significant differences between PSF and OSEM images for first-order, second-order, and third-order metrics with the same trends as detailed above (Fig. 3). Figure 4 displays representative examples of tumour delineation using the FLAB algorithm and a 50 % of SUVmax threshold, as well as the related SUV and TF metrics.

Fig. 3

Impact of the EARL harmonisation strategy on textural features using a 50 % of SUVmax threshold to delineate lesions. Textural features are shown for the three reconstructions used. CHAUC: area under the curve of the cumulative histogram; HILAE: high-intensity larger area emphasis; ZP: zone percentage. Data are shown as Tukey boxplots (lines displaying median, 25th and 75th percentiles; cross represents the mean value). *, **, and *** indicate two-tailed P < .05, P < .01, and P < .001, respectively

Fig. 4

Representative examples of lung tumours. OSEM, PSF, and PSF7 images and textural features are displayed for a 67-year-old male patient with a squamous cell carcinoma (panels a and c) and for a 44-year-old female patient with an adenocarcinoma (panels b and d). Images have been scaled to the same maximum value. Note the improvement in tumour apparent activity and contrast in PSF images compared to OSEM images, and the similarity between OSEM and PSF7 images. This can also be observed quantitatively in the extracted SUV and TF metrics. Green contours (panels a and b) denote the tumour delineation using the automatic FLAB algorithm on the PSF image and copied to the two other datasets. Red contours (panels c and d) denote the delineation using the 50 % of SUVmax threshold applied independently to each image. CHAUC: area under the curve of the cumulative histogram; HILAE: high-intensity larger area emphasis; ZP: zone percentage

When comparing volumes obtained with SUVmax50% on PSF to OSEM and PSF7 ones, eight outliers (above the 90th percentile) were observed and re-processed by independently contouring with FLAB. There was no significant difference between PSF and OSEM mean (median, SD) MATV [12.8 (12.2, 4.8) cm3 and 13.4 (13.0, 4.8) cm3, respectively]. There were significant differences between PSF and OSEM images for first-order, second-order, and third-order metrics as previously described, except for dissimilarity and ZP, for which there were only trends (Supplemental Fig. 5).

Effect of the harmonisation strategy on textural features

In the previous section, using the FLAB algorithm to delineate lesions, we found that data extracted from OSEM reconstructions were different from those extracted from PSF reconstructions for several TFs but also for CHAUC. When comparing these metrics extracted from OSEM and PSF7 reconstructions, none exhibited significant differences (Fig. 1). Furthermore, the distributions of their values according to MATV were much more similar for all ranges of volumes (Fig. 2) with no significant difference in whatever range of tumour volumes was considered (except for HILAE in volumes larger than 50 cm3), highlighting that the quantifiable heterogeneity content of the PSF7 images was very close to the one contained in OSEM images.

We also defined tumour volumes using SUVmax50% applied independently to OSEM and PSF7 images. When using this methodology, no significant difference was observed between MATV, CHAUC, and all TFs extracted from OSEM and PSF7 reconstructions (Fig. 3). Mean (SD, median) MATV were 18.8 (24.2, 7.3) cm3 and 19.5 (25.5, 7.7) cm3 for OSEM and PSF7 reconstructions, respectively (ns).

For the eight outliers described above, for which FLAB was used independently on the three sets of images, there was no difference between OSEM and PSF7 mean (median, SD) MATV [13.4 (13.0, 4.8) cm3 and 13.6 (13.2, 5.11) cm3, respectively], nor between textural features extracted from OSEM and PSF7 images (Supplemental Fig. 5).

SUV metrics and heterogeneity features amongst the histological subtypes

Standard metrics exhibited significant differences amongst the three NSCLC histological subtypes. In particular, there was a trend towards smaller volumes in ADC. SUVmax values also differed among the three subtypes, although the three distributions overlapped considerably. SUVmax values obtained in PSF-reconstructed images were higher for all three subtypes, but the differentiation between subtypes was unchanged: ADC had significantly lower SUVmax than SqCC and LCC in all three reconstructions (Fig. 5).

Fig. 5

Standard quantification metrics. FLAB-derived metabolically active tumour volume (MATV) and standardised uptake values (SUVs) according to the histological subtypes (adenocarcinoma: ADC; squamous cell carcinoma: SqCC; large cell carcinoma: LCC) in non-small cell lung cancer patients for the three reconstructions used. Data are shown as Tukey boxplots (lines displaying median, 25th and 75th percentiles; cross represents the mean value). * and ** indicate two-tailed P < .05 and P < .01, respectively. ns: non-significant

Although the heterogeneity metrics were differently distributed with the three different reconstruction schemes, none of them were significantly different among the three histological subtypes, whatever reconstructed image set was considered (Fig. 6).

Fig. 6

Impact of the histological subtype on textural features in non-small cell lung cancer patients. FLAB-derived textural features according to the histological subtypes (adenocarcinoma: ADC; squamous cell carcinoma: SqCC; large cell carcinoma: LCC) in non-small cell lung cancer patients for the three reconstructions used. CHAUC: area under the curve of the cumulative histogram; HILAE: high-intensity larger area emphasis; ZP: zone percentage. Data are shown as Tukey boxplots (lines displaying median, 25th and 75th percentiles; cross represents the mean value)

Discussion

Heterogeneity metrics, especially textural features, have gained interest in the past few years to quantify intratumour heterogeneity in PET images. There have been several studies highlighting the dependency of these metrics to various factors, including the image analysis workflow (such as tumour delineation or partial-volume effects correction) [33], the image reconstruction schemes or parameters [17, 18, 37] and basic stochastic effects occurring in the PET acquisition process [38].

Our results confirm some of these previous results regarding the impact of the reconstruction choices on the values of these metrics [18]. Compared to OSEM images, unfiltered PSF-reconstructed images showed lower SNR in the liver, higher heterogeneity and a wider range of heterogeneity values in the tumour, for most of the metrics when using FLAB (whether applied independently to the three sets of images or not) and for all of the metrics considered in the present work when using SUVmax50%, which is more representative of a current multi-centric clinical setting. As could be expected, this difference was most apparent in larger tumours. Our study indeed sheds light on the impact of reconstruction algorithms on the distributions of heterogeneity features with respect to tumour volume, which had not been considered in these previous studies. Regarding the differences observed in the case of SUVmax50%, it should be emphasised that part of these can be directly attributed to the fact that this segmentation method applied to PSF images led to significantly smaller volumes than on OSEM and PSF7 images, with sometimes drastically reduced volumes not covering the spatial extent of the tumour uptake (see Fig. 4c and d and Supplemental Fig. 6). This method, which has previously been evaluated mostly on standard non-PSF images, is thus clearly not appropriate for extracting tumour volume and associated metrics from PSF-reconstructed images because of their higher contrast.

Thus, the impact of reconstruction for comparable tumour volumes was found to be significant for some metrics (CHAUC, correlation, dissimilarity, HILAE) and only an observable trend for others (entropy, ZP), and the differences increased with larger tumour volumes (for instance, in the case of ZP, differences were significant only for tumours larger than 50 cm3). This suggests that PSF-based reconstruction may provide more quantifiable heterogeneity-related information in larger tumours than OSEM images, as the interval between smallest and highest values increases, thereby providing more potential for differentiating different levels of heterogeneity in these tumours. Our results also highlight the fact that some TFs seem more sensitive than others to the changes in PSF-reconstructed images compared to OSEM images when analysing similar volumes determined with FLAB: CHAUC, correlation, and HILAE showed higher sensitivity with larger differences in both overall and volume-related distributions than entropy, dissimilarity, and ZP.

The present study was conducted within the overall harmonisation strategy context and focused on the EARL accreditation program, which is why unfiltered PSF images (optimised for diagnostic purposes) and OSEM images were compared to PSF images filtered with a 7-mm Gaussian filter chosen to meet the EANM 1.0 guidelines (PSF7). As previously published [12], the use of PSF7 resulted in SNR in the liver and SUVmax values in the tumour very close to OSEM, and the same pattern was observed for heterogeneity metrics. All metrics considered in the present study were very close with no significant differences when extracted from OSEM and PSF7 images, no matter the delineation technique used. This suggests that OSEM and PSF7 EARL-compliant images present a similar quantifiable heterogeneity content, and validates the use of TFs extracted from PSF-filtered images for multi-centre studies. However, as stated above, our results also suggest that using unfiltered PSF-based reconstructions could potentially provide more discriminative image features allowing for higher differentiation amongst patients, for studies aiming to quantify tumour heterogeneity using TFs and exploit these metrics for a clinical endpoint, such as patient stratification according to survival or response to therapy. Of note, these studies have been mostly performed in single sites, but future validation studies will likely require pooling data from several centres in order to obtain larger cohorts with enough statistical power. This raises the issue of using filtered-harmonised PSF images so that they can be pooled with OSEM data from other centres (potentially losing some discriminative power from TFs), or pooling data only from centres using PSF reconstruction with no post-filtering. This issue is problematic, as the EARL accreditation program is not meant to exclude images from centres running PET systems not equipped with PSF reconstruction or other advanced algorithms. 
Also, the sensitivity of TFs to reconstruction parameters needs to be interpreted in the context of an important reconstruction disparity within PET centres, even in centres running the same PET system, as recently reported by the Clinical Trials Network of the Society of Nuclear Medicine and Molecular Imaging (SNMMI) [39]. Taken together, these findings suggest that PSF reconstruction with a Gaussian filtering chosen to meet harmonising standards could be used within the harmonisation strategy context for studies aiming to quantify intra-tumour heterogeneity to stratify, rank, or classify patients with respect to a given clinical endpoint. In addition, we recommend that whenever available, unfiltered PSF images should also be analysed, especially for large, single-centre series since quantitative metrics obtained from these could potentially offer higher discriminative power. This of course requires additional studies in larger cohorts to be carried out.

One limitation of our study is the inclusion of a single system where the underlying reconstruction was identical apart from the use of PSF modelling. Differences in image reconstruction methods between vendors may give rise to additional variability that needs to be evaluated before texture analysis can be reliably used in the context of multi-centric studies, despite the demonstrated repeatability and robustness of several features versus changes in image properties [17, 18, 37].

Finally, we sought to identify differences in heterogeneity features within NSCLC histological subtypes. Although we showed that ADC presented much smaller volumes as well as lower SUVmax values than the SqCC and LCC subtypes, none of the heterogeneity metrics showed any discriminative power in differentiating these subtypes, in any of the three reconstructions used, which is in line with recent results obtained in breast cancer studies [40]. On the other hand, it contradicts another recent study that suggested textural features could differentiate between ADC and SqCC in a cohort of 30 Asian NSCLC patients [41]. These results were obtained from 2D slice analysis only, not 3D volume analysis, and required the combination of numerous parameters through machine learning (automated clustering) in order to differentiate the two subtypes. The derived model was not validated in an external cohort. This possibly led to overfitting and the results might not be generalisable to other series of patients, especially European ones. Our results suggest that heterogeneity features could be used in a multi-centre setting regardless of the histology in series of European patients. Indeed, results in terms of heterogeneity in Asian patients may not be applicable to European patients, as not only is the ratio between ADC and SqCC inverted in these populations, but the rate of EGFR mutation is also higher in Asian patients (20–40 %) than in European patients (around 10 %) [42]. One could therefore postulate different TFs in ADC depending on the mutation status. Studies with a larger ADC population, focusing on the differences in heterogeneity features between mutated and non-mutated ADC, are therefore required to complement recent data on standard SUV metrics [43, 44].

Conclusion

The use of PSF reconstruction with Gaussian filtering chosen to meet harmonising standards produced comparable SUV values, as well as similar levels of heterogeneity information, compared to OSEM images, which validates its use within the harmonisation strategy context for studies aiming to quantify intra-tumour heterogeneity to stratify, rank, or classify patients. However, unfiltered PSF-reconstructed images showed significantly higher heterogeneity for CHAUC, correlation, and HILAE, as well as a wider range of heterogeneity values than OSEM ones, for most of the metrics considered, especially when analysing larger tumours. This suggests that, whenever available, unfiltered PSF images should also be analysed because resulting quantitative heterogeneity features could be more discriminative in stratifying or ranking patients, which remains to be demonstrated. Finally, the main NSCLC histological subtypes in this cohort did not show any differences in terms of intra-tumour heterogeneity, despite some notable differences in metabolically active tumour volume and levels of uptake (SUVmax). This may facilitate the potential multi-centre use of heterogeneity features regardless of the histology in series of European patients.