Introduction

Fluorine-18 fluorodeoxyglucose positron emission tomography ([18F]FDG PET) could play an important role in guiding the clinical management of breast cancer patients. Because [18F]FDG uptake in most breast cancers is higher than that in normal breast tissue, it can be used for diagnosis [1, 2, 3, 4, 5, 6, 7] and staging [8, 9, 10, 11, 12], for monitoring response to chemotherapy [13, 14, 15, 16, 17, 18, 19] and for predicting clinical and pathological outcome [20, 21]. Several methods can be used to assess [18F]FDG uptake, and because methodology is still heterogeneous, it is difficult to compare studies from different institutions. Although, in theory, non-linear regression (NLR) of the full compartment model [22, 23, 24] provides the most accurate quantitative measure of glucose metabolic rate (MRglu), it is also technically and analytically the most complicated method, which limits its clinical applicability. Therefore, a multitude of alternative analytical methods have been proposed over the years. These alternatives have been claimed to offer simpler calculations and simpler clinical protocols, as well as higher accuracy, less variability in parameter estimation and incorporation of tumour heterogeneity [25, 26, 27, 28, 29, 30, 31, 32, 33]. Unfortunately, studies that comprehensively validate these alternatives are lacking [34]. Simplified methodology needs to be validated not only for each tumour type, but also for each therapeutic intervention. The latter is needed because therapy itself might alter the body distribution and thus the plasma clearance of [18F]FDG, possibly violating underlying assumptions of at least some of the simplified methods.

In this study several analytical methods for measuring [18F]FDG uptake were compared with NLR in 20 breast cancer patients. The main purpose was to select the most promising methods in terms of accuracy, using full compartmental analysis with NLR as the reference. Because 18 of the 20 patients were also scanned at least once during chemotherapy, a secondary purpose was to explore whether the correlation of each method with NLR changed over time during therapy.

Materials and methods

Patients

Twenty women with stage III or IV breast cancer were included in this study: ten with newly diagnosed locally advanced breast cancer (LABC) and ten with locoregional recurrence and/or distant metastases. Patients were scanned as part of ongoing clinical studies. The Medical Ethics Committees of the two participating hospitals approved the study and patients gave informed consent. Mean age (± SD) was 51.3±11.2 years (range 31–77 years), mean body weight (±SD) was 75±11 kg (range 54–97 kg) and no patient had diabetes.

Patients with LABC received either six courses of doxorubicin 90 mg/m2 and cyclophosphamide 1,000 mg/m2 (AC) at 3-weekly intervals followed by mastectomy, or a split course of three cycles of AC, followed by mastectomy and three more cycles of AC. G-CSF or GM-CSF was administered from days 2 to 12 after the start of chemotherapy. Treatment for stage IV disease consisted of first-line FAC or FEC (5-fluorouracil 500 mg/m2, cyclophosphamide 500 mg/m2 and doxorubicin 50 mg/m2 at 3-weekly intervals in three patients; the same regimen with 4-epirubicin 60 mg/m2 instead of doxorubicin in one patient) or second-line docetaxel (100 mg/m2 every 3 weeks in six patients).

Primary tumours had all been confirmed by either fine-needle aspiration or core biopsy. Abdominal ultrasound, abdominal CT, bone scan and, in some cases, whole-body [18F]FDG PET scans had been performed prior to inclusion. Metastases were found in liver, lung and skeleton.

No patient with recurrent disease had received either radio- or chemotherapy in the 3 months prior to inclusion. Characteristics of all patients are summarised in Table 1.

Table 1. Patient characteristics

Study protocol

To address the first research question (selection of the most accurate methods at baseline), all 20 patients were scanned prior to treatment. To analyse the potential impact of time and therapy on the correlation between these methods and NLR, 18 of these patients underwent one or more follow-up scans during chemotherapy: 8 days after start of the first cycle (n=8), shortly before the second cycle of chemotherapy (n=18), and after three and six cycles of chemotherapy (n=10 and 6, respectively).

Data acquisition and reconstruction

An ECAT EXACT HR+ PET scanner (Siemens/CTI, Knoxville) was used, the characteristics of which have been described elsewhere [35]. Patients fasted for at least 6 h before scanning and were scanned in the supine position. Imaging was performed over the breast and, where applicable, over metastases, with the heart included in the same field of view. Prior to the injection of approximately 370 MBq of [18F]FDG, a 10-min transmission scan was acquired for subsequent attenuation correction of the emission data. During dynamic scanning, three venous blood samples were taken at 35, 45 and 55 min post injection. Whole blood and plasma [18F]FDG concentrations in these samples were measured using a well counter that was cross-calibrated against the PET scanner. In addition, plasma glucose concentration was measured (hexokinase method).

Dynamic emission data were acquired for 60 min (6×5, 6×10, 3×20, 5×30, 5×60, 8×150, 6×300 s frames) and were corrected for decay, scatter, random coincidences and measured photon attenuation. Scans were reconstructed as 128×128 matrices using filtered back-projection (FBP) with a 0.5 Hanning filter, resulting in a transaxial spatial resolution of ~7 mm full-width at half-maximum at the centre of the field of view. In addition, summed images (45–60 min post injection) were reconstructed using ordered subsets expectation maximisation (OSEM) [36, 37] with 2 iterations and 16 subsets.
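As a quick consistency check on the frame schedule above, the following minimal sketch expands it into frame start, end and mid-times (in seconds). The names `FRAME_SCHEDULE` and `frame_times` are illustrative and not taken from the original analysis software. The sketch confirms 39 frames spanning 60 min and shows that three of the 300-s frames would fall in the 45–60 min window used for the summed images.

```python
# Frame schedule from the text: (number of frames, frame duration in s)
FRAME_SCHEDULE = [(6, 5), (6, 10), (3, 20), (5, 30), (5, 60), (8, 150), (6, 300)]


def frame_times(schedule):
    """Return a (start, end, mid) tuple in seconds for every frame, in order."""
    times = []
    t = 0.0
    for n_frames, duration in schedule:
        for _ in range(n_frames):
            times.append((t, t + duration, t + duration / 2.0))
            t += duration
    return times


frames = frame_times(FRAME_SCHEDULE)
print(len(frames))          # 39
print(frames[-1][1] / 60)   # 60.0 (minutes)

# Frames starting at or after 45 min (candidates for the 45-60 min summed image)
late_frames = [f for f in frames if f[0] >= 45 * 60]
print(len(late_frames))     # 3
```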

Data analysis

For the image-derived input functions (IDIF), regions of interest (ROIs) were drawn on early frames, when activity was restricted to the arterial blood pool. Small circular ROIs were drawn on the aorta, left atrium and left ventricle [38] in all cases. In addition, adopting a recently described and validated procedure [39], 15-mm ROIs on the aorta were defined for the baseline scans only. The resulting input functions (IDIFsmall and IDIF15, respectively) were used to assess the sensitivity of baseline MRglu values (obtained with NLR) to the actual definition of input ROIs. Projection of input ROIs onto all 39 frames yielded arterial blood time-activity curves (TACs). Individual curves were inspected with respect to peak height and correspondence with the venous blood samples using a previously reported quality control procedure [38]. Accepted curves were averaged, weighted by ROI volume, and the mean curve was used for further kinetic analysis.
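The final averaging step can be sketched as follows. This is a minimal illustration, assuming that every ROI curve is sampled on the same frames and that the accept/reject decision of the quality control procedure [38] has already been made (that procedure is not reproduced here). `volume_weighted_idif` is a hypothetical helper name.

```python
def volume_weighted_idif(curves, volumes, accepted):
    """Average the accepted arterial TACs, weighting each curve by its ROI volume.

    curves:   list of TACs (one list of frame values per ROI)
    volumes:  ROI volumes (any consistent unit)
    accepted: booleans from a prior quality-control step (assumed given)
    """
    chosen = [(c, v) for c, v, ok in zip(curves, volumes, accepted) if ok]
    if not chosen:
        raise ValueError("no accepted input curves")
    total_volume = sum(v for _, v in chosen)
    n_frames = len(chosen[0][0])
    # Weighted mean per frame: sum(volume * value) / total accepted volume
    return [sum(c[i] * v for c, v in chosen) / total_volume for i in range(n_frames)]


# Hypothetical two-frame example: aorta, left ventricle, rejected left atrium
idif = volume_weighted_idif([[10, 20], [30, 40], [100, 100]], [2, 6, 1], [True, True, False])
print(idif)  # [25.0, 35.0]
```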

Primary tumours and metastases were defined as areas of focally increased [18F]FDG uptake corresponding to known tumour lesions. ROIs were drawn on the OSEM-reconstructed summed images using a semi-automatic technique, which creates a 3D ROI of a lesion by including all pixels within 50% of the maximum tumour activity. Tumour TACs were generated by projecting these ROIs onto all frames of the original FBP-reconstructed dynamic scan.
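A minimal sketch of a 50%-of-maximum 3D ROI follows. The text does not specify how the semi-automatic technique seeds or connects the region, so growing a 6-connected region from the hottest voxel is an assumption here. The image is a nested list indexed `image[z][y][x]`; `threshold_roi` and `roi_mean` are hypothetical names. Applying `roi_mean` to each frame of the dynamic scan would give a tumour TAC.

```python
from collections import deque


def threshold_roi(image, fraction=0.5):
    """Grow a 3D ROI from the hottest voxel, keeping 6-connected voxels whose
    value is >= fraction * maximum. Returns a set of (z, y, x) indices."""
    nz, ny, nx = len(image), len(image[0]), len(image[0][0])
    seed = max(((z, y, x) for z in range(nz) for y in range(ny) for x in range(nx)),
               key=lambda p: image[p[0]][p[1]][p[2]])
    cutoff = fraction * image[seed[0]][seed[1]][seed[2]]
    roi = {seed}
    queue = deque([seed])
    neighbours = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in neighbours:
            p = (z + dz, y + dy, x + dx)
            inside = 0 <= p[0] < nz and 0 <= p[1] < ny and 0 <= p[2] < nx
            if inside and p not in roi and image[p[0]][p[1]][p[2]] >= cutoff:
                roi.add(p)
                queue.append(p)
    return roi


def roi_mean(volume, roi):
    """Mean voxel value of one frame inside the ROI (one point of a TAC)."""
    return sum(volume[z][y][x] for z, y, x in roi) / len(roi)


# One-row example: max 10, cutoff 5; the isolated 7 is not connected to the maximum
img = [[[1, 6, 10, 2, 7]]]
roi = threshold_roi(img)
print(sorted(roi))         # [(0, 0, 1), (0, 0, 2)]
print(roi_mean(img, roi))  # 8.0
```

In the example, the voxel with value 7 exceeds the cutoff but is excluded because it is separated from the maximum by a voxel below the threshold.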

Analytical methods for evaluating FDG uptake

The gold standard in this study was NLR, a method based on the compartment model originally described for carbon-14 deoxyglucose by Sokoloff et al. [22]. Either three or four rate constants can be incorporated in the calculation of MRglu [23, 24]. Rate constants K1 and k2 refer to forward and reverse capillary transport of [18F]FDG, and k3 and k4 refer to phosphorylation of [18F]FDG and dephosphorylation of [18F]FDG-6-PO4, respectively. In the present study, curve fitting was performed using both three and four rate constants, together with a parameter to account for intravascular [18F]FDG activity [40]. The best fit was determined by the Akaike and Schwarz criteria [41, 42]. Fitted MRglu values were only accepted if their standard errors were less than 50%. The lumped constant (LC), a value accounting for the kinetic differences between [18F]FDG and glucose, is unknown for breast cancer and was assumed to be unity.
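In the standard formulation, MRglu follows from the fitted rate constants as MRglu = (Cglu / LC) × K1·k3 / (k2 + k3), where Cglu is the plasma glucose concentration. The sketch below shows that step, the 50% standard-error acceptance rule and one common least-squares form of the Akaike and Schwarz criteria, n·ln(RSS/n) + 2p and n·ln(RSS/n) + p·ln(n). It illustrates the model-selection logic and is not the fitting software used in the study. The parameter counts (four for K1, k2, k3 plus blood volume; five when k4 is added) and the rule that both criteria must favour the simpler model are assumptions, and all function names are hypothetical.

```python
import math


def mrglu(K1, k2, k3, glucose, lumped_constant=1.0):
    """MRglu from fitted rate constants: glucose * Ki / LC,
    with net influx Ki = K1*k3/(k2+k3). LC = 1 as assumed in the text.
    Units follow from those of glucose and the rate constants."""
    ki = K1 * k3 / (k2 + k3)
    return glucose * ki / lumped_constant


def aic(rss, n, p):
    """Akaike criterion for a least-squares fit with p parameters (lower is better)."""
    return n * math.log(rss / n) + 2 * p


def schwarz(rss, n, p):
    """Schwarz criterion for a least-squares fit with p parameters (lower is better)."""
    return n * math.log(rss / n) + p * math.log(n)


def prefer_three_k(rss3, rss4, n):
    """True if both criteria favour the 3k model (p=4) over the 4k model (p=5).
    Requiring agreement of both criteria is an assumption of this sketch."""
    return (aic(rss3, n, 4) <= aic(rss4, n, 5)
            and schwarz(rss3, n, 4) <= schwarz(rss4, n, 5))


def fit_accepted(value, standard_error, max_relative_se=0.5):
    """Accept a fitted MRglu only if its standard error is < 50% of its value."""
    return standard_error / abs(value) < max_relative_se


# Hypothetical fit on 39 frames: the extra k4 barely lowers the residual sum of squares
print(prefer_three_k(10.0, 9.9, 39))       # True
print(fit_accepted(0.1, 0.04))             # True
```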

The analytical methods that were compared with NLR were Patlak graphical analysis [28], the standardised uptake value (SUV) [27], the simplified kinetic method (SKM) [29], the SUV-based net influx constant ("Sadato" method) [30], the tumour to non-tumour ratio (TNT) [31], total lesion evaluation (TLE) [32] and the 6P model [33]. SUV was normalised for body weight (W), lean body mass (LBM) and body surface area (BSA) and calculated with or without correction for plasma glucose (g). All these methods have been described in detail in a recent review [43]; a brief summary is given in the Appendix.
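The SUV normalisations can be illustrated with a short sketch. The text does not state which LBM or BSA formula, or which glucose-correction convention, was used; the James formula for women, the Du Bois BSA formula and simple multiplication of SUV by plasma glucose (mmol/l) are assumptions here. Height is an additional input required by both formulas, and all function names and example values are hypothetical.

```python
def lbm_james_female(weight_kg, height_cm):
    """Lean body mass in kg (James formula for women; assumed, not stated in the text)."""
    return 1.07 * weight_kg - 148.0 * (weight_kg / height_cm) ** 2


def bsa_du_bois(weight_kg, height_cm):
    """Body surface area in m^2 (Du Bois formula; assumed, not stated in the text)."""
    return 0.007184 * weight_kg ** 0.425 * height_cm ** 0.725


def suv(tissue_kbq_per_ml, injected_mbq, normaliser):
    """SUV = tissue concentration / (injected activity / normaliser).

    Both activities must be decay-corrected to the same time. With the
    normaliser in kg (weight or LBM) the result is in g/ml; with BSA in m^2
    the numerical scale differs from that of weight-based SUV."""
    return tissue_kbq_per_ml / (injected_mbq / normaliser)


def glucose_corrected(suv_value, plasma_glucose_mmol_l):
    """SUVg: SUV multiplied by plasma glucose (one common convention; assumed)."""
    return suv_value * plasma_glucose_mmol_l


# Hypothetical patient and lesion (illustrative values, not study data)
weight, height = 75.0, 165.0
c_tumour, dose = 10.0, 370.0   # kBq/ml averaged over 40-60 min; MBq injected
suv_w = suv(c_tumour, dose, weight)
suv_lbm = suv(c_tumour, dose, lbm_james_female(weight, height))
suv_bsa = suv(c_tumour, dose, bsa_du_bois(weight, height))
suv_lbm_g = glucose_corrected(suv_lbm, 5.5)
```

Because LBM is smaller than total body weight, SUVLBM is lower than SUVW for the same lesion. Dividing all glucose-corrected values by a fixed reference glucose level would rescale them without changing their correlation with NLR.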

Statistics

Intra-class correlation coefficients (ICCs) were used to compare baseline MRglu values obtained with NLR using either IDIFsmall or IDIF15.
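The text does not state which ICC form was used. The following minimal sketch computes one common choice, the two-way random-effects, absolute-agreement, single-measure ICC(2,1) of Shrout and Fleiss, from ANOVA mean squares. Each row holds the baseline MRglu values of one lesion obtained with IDIFsmall and IDIF15. Both the choice of ICC(2,1) and the function name `icc_2_1` are assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measures.

    ratings: list of rows (one per lesion), each with k measurements
    (here k = 2: MRglu with IDIFsmall and with IDIF15)."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-lesion mean square
    msc = ss_cols / (k - 1)               # between-method mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Perfect agreement gives 1; a constant offset between methods is penalised
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # 1.0
print(icc_2_1([[1, 2], [2, 3], [3, 4]]))  # about 0.667
```

An absolute-agreement form is used in this sketch because a systematic offset between the two input functions would change MRglu values, not merely their ranking.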

For further statistical analysis, the data were divided into two datasets, which were analysed separately. The first dataset comprised only the lesions identified on the 20 baseline scans. All analytical methods for assessment of [18F]FDG uptake were applied to this dataset. Correlation coefficients with NLR were calculated with multilevel analysis [44] to account for the inclusion of multiple lesions within the same patient. This approach resulted in "adjusted" correlation coefficients.

The second dataset pooled data from 18 baseline scans and all follow-up scans. Only the analytical methods with the highest "adjusted" correlation with NLR at baseline were considered for further analysis in this dataset. Multilevel analysis was applied to account not only for multiple lesions within the same patient, but also for repeated scans of the same patient. It was assessed whether correlation with NLR changed over time after the start of chemotherapy. In addition, it was assessed whether correlation was influenced by the chemotherapy administered or by the type of tumour lesion involved, by adding therapy or tumour type as an interaction term to the multilevel model. Three therapy groups (AC, docetaxel and FAC/FEC combined) and four tumour groups (primary tumour, and lymph node, soft tissue and bone metastases) were considered. The interaction term was considered statistically significant if P<0.10, which is customary in interaction analysis.

Results

Comparison of analytical methods

The 20 patients scanned prior to therapy provided a dataset of 108 separate tumour sites (primary tumours and metastases to lymph nodes, lung, skeleton and liver). Data from three lesions (two lymph nodes and one bone metastasis) were excluded from further analysis because fitted MRglu values (using NLR) had standard errors larger than 50%. Therefore, the comparison of methods was based on a total of 105 lesions.

Comparing the baseline MRglu values obtained with NLR using either IDIFsmall or IDIF15 resulted in ICCs of 0.99 [95% confidence interval (CI) 0.98–1.00] with three rate constants and 0.97 (95% CI 0.91–0.98) with four rate constants. Based on the Akaike and Schwarz criteria, NLR gave better parameter fits with three rather than four rate constants in 86% (90/105) of cases. This was even true for 36 of the 38 liver metastases identified on the baseline and follow-up scans. Since the average value of k4 (0.003±0.022) was also not significantly different from zero, NLR with three rate constants was used as the reference test.

Table 2 lists the "adjusted" correlation coefficients for all analytical methods compared with NLR. Best results were obtained for Patlak10–60 and Patlak10–45 (r=0.98 and 0.97, respectively), followed by SUVLBMg40–60, SKM40–60 and Patlak20–60, all with r=0.96. Correlation of SUV with NLR was best for SUVLBM (r=0.91–0.96), followed by SUVBSA (r=0.87–0.93), and was worst for SUVW (r=0.82–0.88). Glucose correction improved correlation with NLR in every case. Correlation coefficients for the "Sadato" methods were similar to those for the corresponding SUV parameters. For TLE, comparable correlation coefficients (r=0.90–0.94) were found for threshold values ranging from 0.5 to 0.8. This analysis could not be performed with thresholds higher than 0.8, as most lesions could then no longer be distinguished. TNT based on large mirror regions (i.e. the whole normal breast) resulted in better correlation coefficients (r=0.82–0.84) than TNT based on small mirror ROIs in the non-affected breast (r=0.66–0.73). With the 6P model, correlation with NLR was moderate when small mirror ROIs were used (r=0.62), but became virtually non-existent (r=0.14) with large reference regions.

Table 2. Comparison between analytical methods and non-linear regression

Based on these results, the following three methods were selected for further analysis: Patlak (both 10–60 and 10–45), SKM40–60 and SUVLBMg40–60. In addition, SUVBSAg40–60 was investigated further since this normalisation procedure is generally thought to compensate more accurately for the dependency of SUV on overall body size [45, 46, 47, 48].

Effect of time, therapy and tumour lesion

On the baseline and follow-up scans a total of 270 lesions could be identified. Data from 25 lesions (15 lymph node, six bone and four liver metastases) were excluded from further analysis because fitted MRglu values (using NLR) had standard errors larger than 50%. The evaluable dataset therefore comprised 245 lesion ROIs.

After an initial rise 8 days after the start of chemotherapy, the "adjusted" correlation between NLR and SUV decreased significantly on follow-up scans compared with baseline values. In contrast, changes in correlation with NLR for SKM40–60 and Patlak were small and not statistically significant (Table 3). On a patient basis, this meant that SUV generally showed the same trend in [18F]FDG uptake over time (i.e. increase or decrease) as the other methods, but with more variability in the magnitude of change. In a number of cases, however, SUV changed in the opposite direction to the other methods (Fig. 1). On a lesion basis, bone and soft tissue metastases accounted for 57% of the cases of SUV variability over time.

Table 3. Changes in correlation with NLR during chemotherapy in primary and metastatic tumour lesions
Fig. 1A–D. Examples of changes (%) in FDG uptake of four different tumour lesions in three patients. Primary tumour (A) and lung metastasis (B) in the same patient: changes after one course of chemotherapy. Bone metastasis (C): changes after 8 days and after one and three courses of chemotherapy. Liver metastasis (D): changes after one, three and six courses of chemotherapy.

To further explore possible influences of chemotherapy and tumour lesion type on the observed changes in correlation, interaction analysis was performed for Patlak, SKM and SUV. The group receiving AC consisted of eight patients with 71 ROIs, the FEC/FAC group of four patients with 67 ROIs and the docetaxel group of six patients with 107 ROIs. Further characteristics are given in Table 4. For Patlak10–45 and SKM, correlation with NLR varied with the therapy given. The interaction analysis also showed that, regardless of the method used, correlation with NLR was lowest for bone metastases (P<0.05).

Table 4. Tumour characteristics in different tumour groups

Discussion

The main purpose of this study was to investigate which analytical methods displayed the highest accuracy in terms of correlation with full compartmental analysis using NLR (the gold standard). NLR could not be used as the reference method in a total of 28 ROIs that had to be excluded because of large standard errors of fitted MRglu values. These ROIs mostly involved small lymph nodes, or lesions after three or more courses of chemotherapy, which had very low uptake, and, therefore, high noise levels in the parameter fitting, a known shortcoming of NLR. Use of an IDIF rather than an input function based on arterial sampling could compromise NLR analysis. The procedures used for defining the IDIF, however, have been validated in earlier studies [38, 39], and in the present study the use of two different sets of ROIs (IDIFsmall and IDIF15) resulted in excellent intra-class correlation coefficients for NLR. In addition, use of an IDIF resulted in excellent test-retest variability for NLR in lung tumours [49]. Therefore, it seems justified to use NLR as the gold standard, even with an IDIF.

A limited number of comparative methodological studies have been published with respect to breast cancer. Wahl et al. [13] compared SUV50–60, k 3 and Patlak as tools for monitoring response in 11 patients with primary breast cancer and reported the same trend (decrease) for all these parameters in responding patients, although variability was highest for k 3. Avril et al. [50] examined the ability of TNT, Patlak and SUV40–60 (with several normalisation procedures) to differentiate between benign and malignant breast tumours in 73 patients and found no clear advantage in using Patlak over SUV. SUV normalised for weight and partial volume effect and corrected for plasma glucose yielded the highest diagnostic accuracy. Kole et al. [51] compared the correlation between SUV30–50 (with different normalisation procedures) and Patlak in 13 patients with malignant and benign breast lesions and found the best correlation for SUV normalised for body surface area and plasma glucose. No comparative breast cancer study involving SKM could be found, but Graham et al. [52] compared SUV45–60, SKM and Patlak with NLR in a group of 40 colon cancer patients with metastases to the liver and ranked Patlak first, followed by SKM and SUV normalised for body surface area. More recently, Hoekstra et al. [49] performed a comparative study in lung cancer patients, comprising all analytical methods presented in this paper, and also found the best correlation with NLR for Patlak and SKM, followed by SUV normalised for BSA and with plasma glucose correction.

In the present study, Patlak10–60 and Patlak10–45 were found to be superior to all other methods with respect to correlation with NLR (r=0.98 and 0.97, respectively). SKM40–60 and SUVLBMg40–60 had comparable correlation coefficients (r=0.96).

Correlation of SUV with NLR was slightly higher for LBM than for BSA normalisation, in contrast to EORTC guidelines and other studies [34, 45, 46, 47, 48], which recommend BSA normalisation. The differences between LBM and BSA normalisation in this study were actually relatively small (r=0.96 vs r=0.93), but correction for plasma glucose levels seemed to be important, in concordance with other studies [49, 50, 51, 53, 54]. The "Sadato" method (SUV-based influx constant) [30] is basically a net influx constant derived from SUV, and the method will have the same characteristics and the same disadvantages as SUV [25, 26, 27]. Not surprisingly, correlations with NLR were virtually identical to those of the corresponding SUVs. The 6P model [33] was developed to take tumour heterogeneity (i.e. mixture with normal tissue) into account. In the present study, two kinds of reference regions were defined, but correlations with NLR were moderate at best. This is most likely due to the large number of parameters, leading to more uncertainty in the estimation of each parameter. The TLE method [32] is based on correlation coefficient constrained parametric (Patlak) images of glucose metabolism [55], which should improve accuracy in identifying (small) focal increases in MRglu and reduce background noise. In the present study, correlation with NLR was good (r=0.90–0.94 with thresholds ranging from 0.5 to 0.8), but definitely lower than that of the original (non-constrained) Patlak images (r=0.98). Finally, the correlation of TNT with NLR was only moderate (r=0.66–0.84).

Three analytical methods were identified that could potentially replace NLR for monitoring response to chemotherapy in breast cancer. These methods represent different levels of simplification, and each was superior to the other methods with a similar level of simplification. Patlak analysis requires a "slow" dynamic scanning protocol, but allows for functional images of MRglu. SKM and SUV only require a static image (both 40–60 min), with SKM requiring (at least) one additional venous blood sample. The advantage of SKM over SUV is that the blood sample(s) provide a means to perform a first-order correction for variation in plasma clearance of [18F]FDG.
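Patlak analysis needs dynamic data because the plot combines the running integral of the plasma input with tissue values over a series of frames: after equilibration, Ct(t)/Cp(t) plotted against the integral of Cp from 0 to t, divided by Cp(t), becomes linear with slope Ki, and MRglu = Ki × Cglu / LC. The sketch below is minimal and stdlib-only, assuming frame mid-times in minutes, trapezoidal integration of the input with zero activity at t = 0, and an ordinary least-squares slope. Patlak10–60 and Patlak10–45 correspond to the `t_start`/`t_end` arguments. The function names are hypothetical.

```python
def cumulative_trapezoid(times, values):
    """Running integral of values(t) from t = 0, assuming values(0) = 0."""
    out, total, t_prev, v_prev = [], 0.0, 0.0, 0.0
    for t, v in zip(times, values):
        total += 0.5 * (v + v_prev) * (t - t_prev)
        out.append(total)
        t_prev, v_prev = t, v
    return out


def patlak_ki(times_min, plasma, tissue, t_start=10.0, t_end=60.0):
    """Net influx constant Ki: least-squares slope of the Patlak plot over
    frames whose mid-time lies in [t_start, t_end] minutes."""
    integral = cumulative_trapezoid(times_min, plasma)
    pts = [(integral[i] / plasma[i], tissue[i] / plasma[i])
           for i, t in enumerate(times_min) if t_start <= t <= t_end]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx


# Synthetic irreversible-uptake data: tissue = Ki * integral(Cp) + V * Cp
times = [float(t) for t in range(1, 61)]
plasma = [100.0 / (t + 1.0) + 1.0 for t in times]
integral = cumulative_trapezoid(times, plasma)
tissue = [0.02 * a + 0.1 * p for a, p in zip(integral, plasma)]
ki_10_60 = patlak_ki(times, plasma, tissue)              # recovers Ki = 0.02
ki_10_45 = patlak_ki(times, plasma, tissue, 10.0, 45.0)  # same slope on the shorter window
```

With real data, the choice of start time matters because the plot is only linear once the reversible compartment has equilibrated; the synthetic curve above is linear by construction.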

The above methods were selected on the basis of scans performed prior to chemotherapy. If they are to be used for response monitoring purposes, however, good correlation with NLR at baseline is not sufficient. In this respect it should be noted that according to the interaction analysis, correlation with NLR was not constant, but could either increase or decrease depending on the chemotherapy or tumour type. Results from this analysis should be interpreted with care, because the number of patients in each therapy group was small and in addition, patients in each therapy group differed with regard to tumour stage and type of tumour lesion (Table 4). Nevertheless, it was seen that bone metastases behaved differently from all other tumour lesions in the sense that correlation with NLR was lowest in bone metastases, regardless of the analytical method used.

Correlation with NLR decreased significantly over time for both SUVs evaluated, possibly complicating interpretation of SUV results during therapy. This "time effect" was seen on all scans after the first course of chemotherapy, but its magnitude varied with time point and with the tumour lesions involved (primary or metastatic). Correlation of SUV with NLR was lower for soft tissue and bone metastases than for primary tumours or lymph node metastases. This does not explain the observed effect, however, because the proportion of soft tissue and bone metastases did not change over time compared with baseline. Since Patlak did not suffer from this time effect, and SKM only to a much lesser extent, the most plausible explanation seems to be therapy-mediated changes in the plasma clearance of [18F]FDG that go undetected by SUV, because this method does not take plasma activity into account. A limitation of this study is that for the time analyses, patients were grouped either by therapy (three groups) or by tumour lesion (four groups), but not by both simultaneously, because individual groups would then have been too small. Studies with larger, uniform patient populations are needed to clarify how types of therapy and tumour lesions affect the (relative) performance of analytical methods.

In conclusion, apart from full compartmental analysis, three analytical methods are eligible for response monitoring studies using [18F]FDG. Patlak analysis is the alternative method of choice in terms of accuracy. SKM and SUV are both candidates in terms of simplicity (static scan), but SKM incorporates a first-order correction for changes in plasma clearance of [18F]FDG. For the present data, SUV should ideally be normalised for lean body mass (normalisation for body surface area performed slightly worse) and corrected for plasma glucose. Finally, of the selected methods, only SUV showed a significant change in correlation with NLR during chemotherapy.