Introduction

Decades of observation and research have shown that cancer is a heterogeneous disease. The phenomenon of intratumor heterogeneity, i.e., phenotypic and functional variability among cancer cells within the same tumor, arises as a result of the genetic instability of cancer cells and of the tumor microenvironment. Continuous mutations lead to different cellular subpopulations within the same tumor, and the spatial distribution of these populations is complex and dependent on the tumor microenvironment [1, 2]. These biological features have a strong impact on patient treatment and outcome, for instance in breast cancer [3], making the characterization of heterogeneity essential for treatment planning.

In this context, positron emission tomography (PET) with 18F-fluorodeoxyglucose (18F-FDG) can reflect metabolic tumor heterogeneity by measuring variations in metabolism across tumor regions. Apart from well-known metrics such as SUVs (standardized uptake values) and MATV (metabolically active tumor volume), texture analysis [4,5,6] of 18F-FDG PET images has emerged as a promising tool for assessing intratumor heterogeneity [7]. The spatial distribution of SUVs within the tumor is the most important input of texture analysis, since it is essential for an accurate characterization of metabolic heterogeneity. This metabolic heterogeneity is expected to reflect biological and clinical tumor properties and, therefore, most studies have sought to demonstrate this correlation. Texture analysis has performed better when predicting outcome or response to treatment in different types of tumors [8,9,10] than when discriminating tumor histology or immunohistochemistry (IHC) [7]. Although outcome and response are very relevant clinical variables, one would expect to find significant differences between biologically different tumors, with different aggressiveness and resistance to treatment. However, there is little evidence supporting this association [7]. Considering that the distance scales at which variations in tumor cell types are present range from the cellular level (0.01–100 μm) to the macroscopic level (subpopulations larger than a few mm) [11, 12], and that the spatial resolution of the most advanced PET systems is about 2–4 mm, it seems clear that we can only measure heterogeneity directly if it appears at the macroscopic level and if differences in radiotracer uptake between the different subpopulations of cells can be effectively measured by PET scanners.
Nevertheless, it is possible that a microscopically heterogeneous population of cells presents a macroscopic signature, for instance, a particular shape, allowing us to measure microscopic heterogeneity indirectly by correlating texture analysis with anatomopathological information.

Breast cancer (BC) is a good example of a heterogeneous type of tumor [12]. According to [3], BC can be classified into five IHC subtypes: luminal A like (LA), luminal B like/HER2− (LB−), luminal B like/HER2+ (LB+), HER2-positive non-luminal (HER2pnl), and triple negative (TN). Each of these is associated with a particular molecular subtype but, although both classifications are largely concordant, there is no exact overlap between IHC subtype and intrinsic molecular subtype [13]. Despite this imperfect correspondence, IHC subtypes have shown prognostic and predictive power and are one of the main inputs in treatment planning [3], which indicates that this surrogate classification encodes most of the relevant information of genetic analyses. Therefore, the study of correlations between tumor heterogeneity and IHC subtypes is valuable in the sense that it may provide the same information as IHC analyses. Furthermore, given the high overlap between IHC surrogates and genetic profiles, a positive correlation between heterogeneity descriptors and IHC subtypes is likely to imply a positive correlation between these descriptors and tumor genetic profiles. Although this can only be proven with a dedicated study, demonstrating these correlations seems to be a necessary prerequisite before carrying out the more resource-demanding genetic studies. Previous investigations have not found strong correlations between textural features and IHC subtype [14,15,16,17]. Since a good spatial characterization of tumor uptake is essential to quantify metabolic heterogeneity, an improvement in the spatial resolution of PET scanners may help to reveal new associations between PET features and tumor biology. In the present study, we used a high-resolution dedicated breast PET (dbPET) scanner (MAMMI-PET, GEM Imaging S.A., Valencia, Spain) [18].
The aim of this work was to explore possible correlations between tumor biological characteristics and PET features measured with a high-resolution dbPET scanner.

Materials and methods

dbPET

PET studies were carried out with MAMMI-PET (GEM Imaging S.A., Valencia, Spain), a dbPET scanner designed to improve small-lesion detection while reducing received dose and study time [18]. Patients lie in the prone position, allowing better tumor detection and minimizing breathing motion compared with a whole-body PET scanner. Our dbPET scanner consists of 12 continuous LYSO crystals arranged in a ring with an aperture of 186 mm that provides an axial field of view of 40 mm. The ring can be moved along the axial direction by a precision motor, so acquisitions in step-and-shoot mode can extend the axial field of view up to 170 mm. Images acquired in 3D mode were reconstructed with a 3D maximum likelihood expectation maximization (MLEM 3D) algorithm, and individual bed positions were merged to form a single 3D image. Different settings can be used in the reconstruction, namely, voxel size of 0.5 × 0.5 × 0.5 or 1 × 1 × 1 mm³; random, scatter, decay, and shape-based attenuation corrections; and different numbers of iterations. Spatial resolutions below 3 mm were measured in most of the field of view [18], and a comparative study between this dbPET scanner and a whole-body PET/CT reported better resolution and detectability for the former [19]. An improvement in visual heterogeneity assessment was also reported [20].

Patient cohort

This study was approved by the local ethics committees (CEIC-Galicia), and informed consent was obtained from all participants. The MAMMI-PET project consists of two research lines. ARM 1 investigates the value of MAMMI-PET as a complementary imaging tool compared to MRI: every breast cancer patient scheduled for an MRI scan due to at least one of the following indications, (a) suspected multifocal/multicentric/bilateral disease, (b) discrepancies between ultrasonography and mammography, or (c) hidden tumor, was scanned with MAMMI-PET before any treatment. ARM 2 studies the value of MAMMI-PET in evaluating response to neoadjuvant treatment: every patient scheduled to receive neoadjuvant therapy was scanned with MAMMI-PET before the start of treatment. Participants in both ARMs had no previous history of breast cancer. A patient who met the inclusion criteria for both ARM 1 and ARM 2 was assigned to ARM 2. From July 2014 to February 2017, consecutive patients from ARM 1 and ARM 2 were included in this study if they met the following two imaging criteria: (1) lesions were fully contained in the field of view and (2) lesions were detectable, so that they could be unambiguously located by three independent nuclear physicians. TNM stage was established according to the 7th edition of the American Joint Committee on Cancer (AJCC) staging system.

PET protocol

Patients fasted for at least 6 h before the injection of 180 MBq of 18F-FDG. To ensure correct incorporation of the radiotracer, blood glucose levels were checked and patients rested in a warm room for 30 min before administration. All PET images were reconstructed with the settings recommended by the vendor: voxel size of 0.5 mm, 12 iterations, and attenuation, scatter, and randoms corrections. The axial field of view covered by each bed position was 40 mm, with an acquisition time of 3 min and 30 s.

Tumor histology and immunohistochemistry

Core-needle biopsy was used for diagnosis. Histological grading was performed using the modified Scarff–Bloom–Richardson (SBR) system. Tumors were considered positive for estrogen (ER) or progesterone (PR) receptors according to the Allred method when the total score (TS) was equal to or higher than 3 (TS ≥ 3). Tumors were considered HER2-positive if a 3+ score was obtained according to HercepTest criteria. Fluorescence in situ hybridization (FISH) was used to assess the positivity of equivocal (2+) cases. IHC classification followed the recommendations of the 13th St. Gallen International Breast Cancer Conference (2013) with a Ki67 threshold of 20% [3].

Texture analysis

Each tumor was manually enclosed in a cropping box by an experienced nuclear physician. A threshold of 0.45 × SUVmax was applied within the cropping box to segment lesions. Any holes in the resulting segmentation were filled with a 3D flood-fill algorithm. Quantitative analysis was carried out on the segmented tumors. To allow a precise comparison between whole-body PET and dbPET, we computed the same features as those used in [16], which include first-order statistics (FOS) features and textural features (TF) derived from the co-occurrence matrix (CM) [4] and the size-zone matrix (SZM) [21]. FOS features take the frequency histogram of tumor uptake values as their only input and therefore do not capture any information about the spatial arrangement of voxels. In contrast, textural features quantify the presence of spatial structures at a local (CM) or regional (SZM) level. The computed FOS features were SUVmax, SUVmean, MATV, TLG (total lesion glycolysis) = SUVmean × MATV, and CH_AUC [22]. The latter is the only FOS feature designed to quantify global heterogeneity, by means of the area under the curve of the cumulative histogram. TF were calculated after quantization with 64 grey levels [16, 17, 23,24,25]. The CM was computed as a single matrix that pools the co-occurrences found along any of the 13 possible directions, instead of averaging 13 feature values, each derived from a separate matrix for one direction [25]. The selected features were entropy, homogeneity, and dissimilarity from the CM [4], and HILAE (high-intensity large-area emphasis) and ZP (zone percentage) from the SZM [21]. All these features have shown robustness with respect to partial volume effects and segmentation [24] and reconstruction settings [26], as well as good reproducibility [23]. Most of them were also tested in previous studies, so the performance of dbPET can be compared with that of current whole-body PET scanners. A comprehensive definition of all of the features is provided in Supplemental Methods. Algorithms were implemented in Matlab (The MathWorks, Inc).
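
The segmentation and the simplest first-order features described above can be illustrated with a minimal Python sketch. This is not the Matlab implementation used in the study. It assumes that the cropping box is a nested list of SUVs indexed as [z][y][x], that voxels at or above 0.45 × SUVmax belong to the lesion, that a hole is any below-threshold region not connected to the border of the box through 6-connected neighbours, and that the features are computed on the filled mask. The names `segment_lesion`, `first_order_features`, and `voxel_volume_cm3` are hypothetical.

```python
from collections import deque

# 6-connected neighbour offsets (an assumption; the connectivity is not stated in the text).
STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def segment_lesion(box, fraction=0.45):
    """Threshold a 3D cropping box at fraction * SUVmax, then fill enclosed holes."""
    nz, ny, nx = len(box), len(box[0]), len(box[0][0])
    cutoff = fraction * max(v for plane in box for row in plane for v in row)
    above = [[[v >= cutoff for v in row] for row in plane] for plane in box]

    # Flood-fill the background inward from every below-threshold border voxel.
    outside = [[[False] * nx for _ in range(ny)] for _ in range(nz)]
    queue = deque()
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                border = z in (0, nz - 1) or y in (0, ny - 1) or x in (0, nx - 1)
                if border and not above[z][y][x]:
                    outside[z][y][x] = True
                    queue.append((z, y, x))
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in STEPS:
            a, b, c = z + dz, y + dy, x + dx
            if 0 <= a < nz and 0 <= b < ny and 0 <= c < nx:
                if not above[a][b][c] and not outside[a][b][c]:
                    outside[a][b][c] = True
                    queue.append((a, b, c))

    # Voxels never reached from the border are either lesion or enclosed holes.
    return [[[not o for o in row] for row in plane] for plane in outside]


def first_order_features(box, mask, voxel_volume_cm3):
    """SUVmax, SUVmean, MATV (cm^3) and TLG = SUVmean * MATV over the mask."""
    values = [v for plane, mplane in zip(box, mask)
              for row, mrow in zip(plane, mplane)
              for v, m in zip(row, mrow) if m]
    suv_mean = sum(values) / len(values)
    matv = len(values) * voxel_volume_cm3
    return {"SUVmax": max(values), "SUVmean": suv_mean,
            "MATV": matv, "TLG": suv_mean * matv}


# Toy example: a 3x3x3 hot cube (SUV 10) with a cold core (SUV 2) in a background of SUV 1.
box = [[[1.0] * 5 for _ in range(5)] for _ in range(5)]
for z in range(1, 4):
    for y in range(1, 4):
        for x in range(1, 4):
            box[z][y][x] = 10.0
box[2][2][2] = 2.0

mask = segment_lesion(box)
# A 0.5 mm isotropic voxel has a volume of 0.125 mm^3 = 0.000125 cm^3.
features = first_order_features(box, mask, voxel_volume_cm3=0.5 ** 3 / 1000.0)
print(features)
```

In the toy example the cold core falls below the threshold but is enclosed by lesion voxels, so the flood fill from the border never reaches it and it is counted as part of the tumor volume.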

Statistical analysis

For descriptive statistics, medians ± standard deviations were used. Correlations between features were evaluated using Spearman's rho. Associations between PET features and clinical, histopathological, and IHC factors were assessed using a two-sided Wilcoxon rank-sum test. Associations between PET features and molecular subtype were assessed using a Kruskal–Wallis test. All these tests were corrected for multiple testing using the Benjamini–Hochberg method [27], and a subsequent Tukey–Kramer test was applied to identify which pairs of molecular groups differed significantly. The significance level used in all tests was the standard α = 0.05. ANCOVA was used to adjust for covariates. Discriminative power was quantified with the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. All of the analyses were computed with Matlab (The MathWorks, Inc).
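
Two of the steps above, the Benjamini–Hochberg correction and the ROC AUC for a two-group comparison, can be sketched in a few lines of Python. This is an illustrative sketch, not the Matlab code used in the study. It assumes the correction is expressed as adjusted p values, and it computes the AUC through its known equivalence with the normalized rank-sum (Mann–Whitney) statistic, which may differ from how the AUC was actually obtained here. The function names and the input values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p value so that adjusted
    # values stay monotone: adj(k) = min over j >= k of p(j) * m / j.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def auc_two_groups(group_a, group_b):
    """AUC as P(a > b) + 0.5 * P(a == b), i.e. U / (n_a * n_b)."""
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


# Hypothetical raw p values for four features tested against one factor.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adj]
print(adj, significant)

# Hypothetical feature values in two groups.
auc = auc_two_groups([3.0, 4.0, 5.0], [1.0, 2.0, 4.0])
print(auc)
```

An AUC of 0.5 corresponds to no discrimination between the two groups, and values below 0.5 indicate that the feature tends to be lower in the first group.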

Results

Patient population characteristics

A total of 179 patients participated in ARM 1 and ARM 2, of whom 127 (median age 52 ± 12 years) fulfilled the imaging criteria described in Section “Patient cohort” (Fig. 1). Overall, nine were diagnosed with stage I BC, 74 with stage II, 41 with stage III, and three with stage IV. Multifocal/multicentric disease with more than one biopsied lesion that fulfilled the inclusion criteria was present in ten patients. A total of 139 tumors were analyzed. The number of tumors for each molecular subtype was 36 LA, 50 LB−, 25 LB+, 16 HER2pnl, and 11 TN. Some examples of PET images from these tumors are presented in Fig. 2. Patient and tumor characteristics are listed in Table 1.

Fig. 1

Diagram showing the number of patients involved in the study. Of the patients excluded from ARM 1 (ARM 2), 11 (31) did not show FDG uptake and ten (27) had lesions that were not fully included in the field of view

Fig. 2

Examples of some of the analyzed tumors. The SUV scale is depicted on the right of each image. (a) T1, HER2pnl tumor; (b) T3, LB− tumor; (c) multifocal tumor

Table 1 Patient and tumor classification

Correlations between PET features

Correlations between SUVmax and FOS features ranged from 0.24 for MATV to 0.99 for SUVmean (0.37 for CH_AUC and 0.63 for TLG). TF were less correlated with SUVmax than FOS features: −0.30 for dissimilarity, 0.03 for entropy, 0.24 for homogeneity, 0.53 for HILAE, and −0.34 for ZP. Correlations between MATV and FOS features were low (−0.24 for CH_AUC) except for TLG (0.89), as expected. Stronger correlations were found between MATV and TF: −0.86 for dissimilarity, 0.64 for entropy, 0.83 for homogeneity, 0.64 for HILAE, and −0.85 for ZP. Dissimilarity was strongly correlated with TLG (0.80), entropy (0.82), homogeneity (0.88), and ZP (0.97) (Supplementary Table 1).

Relation between PET features and clinical, histological, and immunohistochemical factors

Associations between features and tumor characteristics are summarized in Table 2. Clinical T stage was associated with MATV and TLG, as expected. CH_AUC and TF, with the exception of HILAE, were also associated with T stage. No association between clinical N stage and PET features reached statistical significance after multiple testing correction.

Table 2 Associations (p values) between PET features and clinical, histological, and molecular factors. When significant, AUCs comparing each group are presented in parentheses

Ductal carcinoma had significantly higher uptake than lobular carcinoma (p = 0.0022). None of the heterogeneity metrics was significantly associated with histology. Grade 3 lesions were associated with higher uptake and higher TLG than grade 1 and 2 lesions. HILAE was also associated with histologic grade (p < 0.001). CH_AUC and ZP did not reach statistical significance after multiple testing correction.

Estrogen receptor positivity was associated only with SUVmax and SUVmean. Progesterone receptor status was significantly associated with SUVmax and SUVmean, as well as with TLG, CH_AUC, and HILAE. HER2 status was significantly associated with SUVmax and SUVmean (p < 0.001). This result remained significant after adjusting for histologic grade (p = 0.011 and p = 0.006, respectively). CH_AUC and HILAE were also associated with HER2 status (p = 0.002 and p = 0.016, respectively) (Fig. 3).

Fig. 3

Box plots showing significantly associated features with HER2 status. The center mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the + symbol

All features were significantly associated with molecular subtype after multiple testing correction, although, for entropy, the subsequent Tukey–Kramer test did not identify any significantly different pair of groups. Box plots of each feature are presented in Fig. 4. Most of the significant associations involved LA tumors. SUVmax and SUVmean were lower for LA tumors (p < 0.001) than for the other subtypes. The discriminative power for these comparisons ranged from AUC = 0.717 for LB− to AUC = 0.880 for HER2pnl (Supplementary Figure 1). MATV differed significantly between LA and LB− (p = 0.023), with an AUC of 0.712 (Supplemental Figure 1b). TLG was significantly different (p < 0.001) for LA when compared to LB−, HER2pnl, and TN. CH_AUC was discriminative when comparing LA vs. HER2pnl and LB− vs. HER2pnl (p = 0.022). Among TF, dissimilarity, homogeneity, HILAE, and ZP were significantly different when comparing LA and LB−, with AUCs of 0.768, 0.712, 0.797 (Supplemental Figure 1b), and 0.789, respectively. HILAE was also significantly different for LA vs. LB+ and LA vs. HER2pnl (Supplemental Figure 1a).

Fig. 4

Box plots showing significantly associated features with IHC subtype. The center mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the + symbol

Comparison with previous studies

In order to evaluate the performance of dbPET with respect to whole-body PET, we performed a subanalysis to compare results with [16] and [17].

In [16], patients were classified into three IHC groups: TN, ER+/HER2−, and HER2+. Associations between PET features and the IHC classification from [16] are summarized in Supplementary Table 2. Only SUVmax, SUVmean, and CH_AUC were significantly associated with the IHC groups (p < 0.001 for SUV, p = 0.001 for CH_AUC). AUCs were computed using the criteria of [16], comparing each group against the other two together. Discriminative power was better for the ER+/HER2− and HER2+ groups than for TN.

In [17], only ER+ tumors were included in the analysis. Significant associations after multiple testing correction were found for SUVmax and SUVmean when discriminating tumor grade (p < 0.001, AUCs of 0.758 and 0.760, respectively) and HER2 status (p = 0.001 and p < 0.001, respectively). The association with HER2 status remained significant after adjusting for histologic grade (p = 0.017 and p = 0.014, respectively). TLG and HILAE remained significant (p = 0.009 and p = 0.012) for tumor grade, although AUCs were modest (< 0.7). T stage associations had low p values (0.006) but did not survive multiple testing correction. These results are listed in Table 3.

Table 3 Associations (p values) between PET features and IHC factors with a subsample of 111 ER+ tumors, as studied in [17]. When significant, AUCs comparing each group are presented in parentheses

Discussion

Because breast cancer is a heterogeneous disease, differences in histology and clinical outcome have served as the basis for tumor classification; recently, however, this classification has been refined and at times replaced by molecular classifications, which can combine biological features with clinical outcome. In this context, the primary aim of texture analysis is to characterize tumors non-invasively by medical imaging. Although correlations with response and outcome were found in other tumors and imaging modalities [8,9,10, 28], poor associations were found with biological features of breast cancer [14,15,16,17]. A possible reason for this failure was pointed out in [29], where it was found that texture analysis with current PET resolutions in tumor volumes below 45 cm³ may lead to inaccurate heterogeneity quantification. This issue is especially problematic in BC, since most tumors are smaller than this limit [14,15,16,17]. In this study, we have measured five FOS features and five TF in a cohort of 139 different breast tumors using a dbPET scanner. All these features were previously studied in [16] and most of them in [15, 17].

Our results indicate that HER2-positive tumors show higher metabolism than HER2-negative ones, contrary to the findings of several previous studies [16, 17, 30,31,32]. This result holds when considering only ER+ tumors, so ER status does not appear to be confounding our measurement. A possible reason for the inability of whole-body PET to detect this association might be its lower spatial resolution, which yields a less accurate SUVmax measurement for a given HER2-positive tumor than the one obtained with our dbPET scanner. Although a recent study has found this association using whole-body PET [33], the authors claim that their finding is based only on the histopathological nature of their patient sample. In any case, further investigation is needed to clarify the origin of the discrepancies among studies.

HILAE, a regional heterogeneity measure, was the only TF that showed correlation with IHC factors. Significant associations with tumor grade, progesterone status, and HER2 status were found. Furthermore, HILAE was the most discriminative feature of the four that were significantly associated with IHC subtype, and it was moderately correlated with SUVmax (rho = 0.53) and MATV (rho = 0.64), with TLG being its most correlated feature (rho = 0.76). Its better discrimination power with respect to SUV when comparing LA to LB−, the two most similar IHC groups in terms of uptake, and the fact that the rest of the TF also showed better discrimination than SUVs when comparing these two groups (Fig. 4), may indicate that the role of tumor uptake is minimized while the heterogeneity information captured by dbPET becomes more relevant in characterizing LA and LB− tumors. Figure 5 depicts the differences in uptake distribution for these two types of tumors, both of which have very similar SUV and MATV. HILAE also showed good discrimination power (AUC > 0.75) when differentiating LA vs. LB+ and LA vs. HER2pnl. This connects with the above discussion concerning HER2-positive tumors: if such tumors present metabolic heterogeneity at small distance scales, HILAE might be reflecting this characteristic, since LB+ and HER2pnl define our population of HER2-positive tumors.

Fig. 5

Example of uptake distributions of LA and LB− tumors with very similar SUV and MATV values but different HILAE, along with the resulting segmentations. (a) LA with SUVmax = 13.8, MATV = 1.1 cm³, and HILAE = 2951. (b) LB− with SUVmax = 13.8, MATV = 1.2 cm³, and HILAE = 6716

To compare our results with [16], we used the same IHC classification as that study. We found that SUVs and CH_AUC were associated with IHC subtype. In [16], SUVs and TLG were significantly associated. The discriminative power of SUV for TN tumors was lower in our study, but an increase (> 0.05) in discriminative power was observed for ER+/HER2− tumors, and a substantial increase (> 0.15) for HER2+ tumors. Due to differences in patient population (54 TN in [16] vs. 11 TN in our study), it is not possible to establish a direct comparison of performance between dbPET and whole-body PET. However, we have shown that regrouping the initial five groups into three can obscure any heterogeneity characterization. We must bear in mind that two molecular subtypes are biologically different entities, with completely different gene expression profiles, even when they are differentiated by just a single IHC factor (for instance LA and LB−, which basically differ in Ki67 index) [3, 34]. In this sense, IHC factors act as surrogates to classify different genetic entities, so any molecular classification based on only a few of these factors is going to contain a heterogeneous mixture of different biological profiles in each group. Our results suggest that heterogeneity can be captured with dbPET if the more modern five-group IHC classification is used but, as pointed out in [16], this is still pending work in whole-body PET.

When the analysis was restricted to ER+ tumors, as in [17], our main results remained unchanged. HER2-positive tumors showed, again, higher uptake values. Apart from correlations of TLG and HILAE with histologic grade, we obtained very low p values for SUV associations with histology and PR status. Taking into account that our patient sample was very similar to the one used in [17], this comparison suggests that MAMMI-PET significantly improves on the performance of whole-body PET in quantifying tumor characteristics.

Our study had some limitations. Although we found a positive relation between TF and IHC subtypes, the relation between TF and molecular subtypes determined by genomic analyses, which are the most accurate descriptors of tumor biology and aggressiveness, remains to be explored. The results of this work suggest that the existence of this relation is probable. We included stage I and IV patients, making our sample slightly different from the samples studied in [16, 17]. Our segmentation algorithm was chosen for its simplicity but was different from the algorithms used in [16, 17]. This should not represent a serious problem, since the chosen TF have demonstrated robustness against the choice of segmentation algorithm [24]. The Ki-67 index is a measure with a high degree of variation between laboratories, and therefore it is important to follow the recommendations from [35] to achieve fully reproducible results.

The results of this work suggest that the path toward a complete heterogeneity characterization is two-fold. On the one hand, and even though this study may indicate that TF provide additional information, it is essential to have a better understanding of TF, their correlations, and how they behave when measuring complex uptake distributions in order to determine the biological mechanisms behind different heterogeneity scores. This will make it possible to select the set of features that contain complementary and relevant information, a crucial step in designing a machine-learning algorithm that reliably classifies tumor molecular subtypes in a non-invasive way. It should be noted that this methodology can be extended to the prediction of other relevant variables, such as clinical outcome or response to treatment, provided that enough data are available and a good selection of input features is performed. On the other hand, an improvement in the quality of PET images, in terms of resolution and noise, may reveal new and strong associations that are currently masked, as we have shown in this study. Bearing these strategies in mind, it would be desirable, from a theoretical point of view, to collect more evidence linking heterogeneity in PET images with the expected pathological manifestations, such as axillary nodal involvement. In our opinion, these are the guidelines that future studies should follow.

Conclusions

Our study suggests that the improved resolution of dedicated breast PET enables a more precise heterogeneity characterization. Thus, TF were able to discriminate IHC subgroups, showing their best performance when comparing LA and LB−, the two subgroups with the most similar uptake levels. Due to the low correlation of TF with SUV, this result suggests that there is an actual metabolic difference in the uptake distribution of both types of tumors. Additionally, HER2+ tumors showed increased SUV, a result that was not reported before.