Introduction

Standardized uptake values (SUVs) or metabolic rates are routinely used to monitor response to therapy when using 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET). However, both parameters capture the tumour response to therapy incompletely, as they reflect metabolic activity per gram of tissue rather than total tumour metabolic activity; a tumour could grow or shrink without a change in the metabolic rate per gram of tissue [1]. This limitation may be overcome by multiplying the metabolic rate or SUV of a lesion by its volume to obtain the “total lesion glycolysis” (TLG) [2]. Various methodologies have been described to define tumour volumes using FDG PET. Adaptive methods, which include in the tumour volume all voxels whose activity concentration exceeds a variable, tumour-to-background-dependent threshold, have proven to be the most robust [3, 4]. While these adaptive methodologies work well for untreated, large tumours with high levels of metabolic activity, accurate assessment of tumour volume, and thus also of TLG, can prove challenging during or after treatment. Firstly, with decreasing tumour volumes, the accuracy of SUVs is affected by partial volume effects (PVE), resulting in underestimation of the true SUV [5–7]. PVE will also influence the determination of the tumour volume by modifying the optimal threshold value for tumour delineation. Secondly, if the contrast between the tumour and surrounding normal tissue is low, no clear tumour borders can be identified. In this case, it could be argued that when using PET/CT, tumour volumes might be derived from CT images. However, quantification of tumour volumes by means of CT has proven to be quite labour intensive.
Finally, tumour volume measurements by thresholding on PET images are dependent on the scan time; volumes decrease significantly and the variability of the volume is significantly larger for images with scan times below 3 min per bed position when compared to images with scan times of ≥3 min [8]. Consequently, easily applicable and straightforward methodologies that assess TLG while avoiding PVE and tumour volume assessment by thresholding, and that are less count dependent, are of major interest. In this paper, such a method is described. The method assesses the total excess SUV above the tumour background: the standardized added metabolic activity (SAM). Phantom data as well as initial clinical validation data are presented.

Materials and methods

Phantom study

A Jaszczak Deluxe phantom was filled with a background concentration of 5.92 kBq/ml (0.16 μCi/ml) 18F-FDG. The spheres of the Jaszczak phantom, with diameters of 9.5, 12.7, 15.9, 19.1, 25.4 and 31.8 mm, were filled with a 5.4 times higher concentration. Data were acquired for 10 min with a Philips Allegro PET system using a 2 mm³ voxel size. Subsequently, additional activity was added to the background of the phantom to obtain a sphere to background activity ratio of 3.64, and a new data set was collected for 10 min. This procedure was repeated a third time using a sphere to background ratio of 2.0.

Images were reconstructed using standard, commercially available reconstruction software (Philips). For the determination of SAM, two volumes of interest (VOIs) were delineated using PMOD software (version 3.1, PMOD Technologies Ltd., Zurich, Switzerland). The first VOI (VOI1) was delineated around the sphere and a second, larger VOI (VOI2) around VOI1. In all cases, VOI1 was placed at a sufficient distance from the sphere borders in order to avoid PVE and to ensure that no spillover from the sphere to VOI2 occurred.

The exact volumes of the spheres were determined by weighing the filled as well as the empty spheres. The measured inside volumes were 18.3, 11.4, 5.68, 2.13, 1.19 and 0.58 cm³. The respective volumes of the plastic shells were 12.3, 6.53, 4.13, 2.25, 1.60 and 1.36 cm³. These volumes were subtracted from VOI1 for each sphere since they contain no activity (volume VOI1 corrected). Without this adjustment the background correction would overcompensate and, more importantly, would influence the result for each sphere differently. Errors would become especially large in the smallest spheres, where the volume of the plastic becomes comparable to or even larger than the inner sphere volume. Based on the volume and total SUV count of VOI1 (corrected for the sphere shell volume) and VOI2, the SAM of each sphere was derived as follows:

First, mean background activity (BG in SUV) was derived by the formula:

$$ Mean\,BG = \left( {Total\,SUV\,VO{I_2} - Total\,SUV\,VO{I_1}} \right)/\left( {Volume\,of\,VO{I_2} - Volume\,of\,VO{I_1}} \right) $$

in which “total SUV” is the multiplication of the mean SUV by the respective volume. Subsequently, SAM was calculated using the following formula:

$$ SAM = Total\,SUV\,VO{I_1} - \left( {mean\,BG*Volume\,VO{I_{{1corrected}}}} \right) $$

Finally, normalized SAM was calculated as follows:

$$ norm\,SAM = SAM/mean\,BG. $$

Measured SAM as well as normalized SAM were expressed as a percentage of the expected value. For normalized SAM per unit of sphere volume, this expected value is equal to the actual activity ratio minus 1.
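
The three formulas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in the study: the function name `compute_sam` and its argument names are hypothetical, each "total SUV" is taken to be the mean SUV multiplied by the VOI volume, and `shell_volume` stands for the activity-free plastic shell volume subtracted from VOI1 in the phantom (it would be left at zero for patients). The numbers in the example are synthetic.

```python
def compute_sam(total_suv_voi1, volume_voi1, total_suv_voi2, volume_voi2,
                shell_volume=0.0):
    """Return (mean_bg, sam, norm_sam) for two nested VOIs.

    total_suv_* : mean SUV multiplied by the VOI volume
    volume_*    : VOI volumes, both in the same unit (e.g. ml)
    shell_volume: activity-free volume inside VOI1 (phantom only; hypothetical name)
    """
    ring_volume = volume_voi2 - volume_voi1
    if ring_volume <= 0:
        raise ValueError("VOI2 must be larger than VOI1")
    # Mean background SUV, estimated from the region between VOI2 and VOI1
    mean_bg = (total_suv_voi2 - total_suv_voi1) / ring_volume
    # Background is subtracted only over the part of VOI1 that can hold activity
    volume_voi1_corrected = volume_voi1 - shell_volume
    sam = total_suv_voi1 - mean_bg * volume_voi1_corrected
    norm_sam = sam / mean_bg
    return mean_bg, sam, norm_sam


# Synthetic example: a 10 ml lesion at SUV 5 inside a 50 ml VOI1,
# surrounded by background at SUV 1; VOI2 has a volume of 100 ml.
total_voi1 = 40.0 * 1.0 + 10.0 * 5.0   # 90.0
total_voi2 = total_voi1 + 50.0 * 1.0   # 140.0
mean_bg, sam, norm_sam = compute_sam(total_voi1, 50.0, total_voi2, 100.0)
```

In this synthetic case the normalized SAM divided by the lesion volume (10 ml) equals 4, that is, the lesion to background ratio of 5 minus 1.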

Patient studies

Clinical settings

Impact of acquisition time on SAM

Fifteen patients suffering from primary squamous cell carcinoma (PSCC) of the head and neck were included in this substudy. All patients underwent a whole-body FDG PET/CT examination and subsequently a 10-min list-mode acquisition PET/CT study of the head and neck region. From the list-mode data, image frames with scan times of 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10 min were reconstructed. On the reconstructed images, SAM of the primary tumour as well as SUVmax values were determined. The effects of the scan time on SAM and SUVmax values of the primary tumour were assessed. SAM values derived from images with a duration of 10 min (SAM10) were considered the gold standard. SAM values of the remaining nine images (duration 1–9 min) were expressed as a percentage of the 10-min SAM value. The same measurements were performed for normalized SAM and SUVmax.

SAM for response assessment

In this substudy, 19 patients (11 women) were included. All patients suffered from colorectal carcinoma metastasized to the liver, for which they were treated by means of chemotherapy (FOLFOX, FOLFIRI). All patients underwent a PET/CT examination prior to treatment initiation (scan 1) and after 3–6 cycles of chemotherapy (scan 2). SAM values of the liver lesions on scan 1 (SAM1) and scan 2 (SAM2) as well as the corresponding SUVmax values were determined, and the percentage reduction between the two scans was calculated as follows:

$$ \begin{array}{l} \Delta \text{SAM} = \left( \text{SAM1} - \text{SAM2} \right)/\text{SAM1} \times 100 \\ \Delta \text{SUVmax} = \left( \text{SUVmax1} - \text{SUVmax2} \right)/\text{SUVmax1} \times 100 \end{array} $$

The complete disappearance of abnormal FDG uptake was considered a 100 % reduction. Results obtained were related to Response Evaluation Criteria in Solid Tumors (RECIST) response outcome [9]. Receiver-operating characteristic (ROC) curve analysis was performed to assess the discriminatory power of ΔSAM and ΔSUVmax to separate responders from non-responders as defined by RECIST response outcome.
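
As a hedged illustration of the two steps described here, the percentage reduction and the ROC analysis can be sketched as follows. The function names are hypothetical and the text does not state which statistical software was used; the AUC below is computed with the standard rank-based (Mann-Whitney) formulation. All input values are made up for illustration and are not patient data.

```python
def percent_reduction(baseline, follow_up):
    """Percentage reduction between scan 1 and scan 2.

    A follow-up value of 0 (complete disappearance of abnormal uptake)
    yields a 100 % reduction.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - follow_up) / baseline * 100.0


def roc_auc(responder_values, non_responder_values):
    """Area under the ROC curve as the fraction of responder/non-responder
    pairs in which the responder has the higher value (ties count one half)."""
    wins = 0.0
    for r in responder_values:
        for n in non_responder_values:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(responder_values) * len(non_responder_values))


# Made-up values for illustration only
complete = percent_reduction(200.0, 0.0)     # complete disappearance
partial = percent_reduction(200.0, 150.0)
auc_perfect = roc_auc([100.0, 95.0], [25.0, 60.0])
auc_overlap = roc_auc([90.0, 50.0], [60.0, 40.0])
```

An AUC of 1.0 means every responder shows a larger reduction than every non-responder; overlapping groups give values between 0.5 and 1.0.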

Data acquisition

All patients were injected with a dose of 3.7 MBq (100 μCi) of 18F-FDG per kg body weight. PET/CT acquisition was started between 45 and 60 min after tracer administration, with a duration of 3 min per bed position (4 mm³ voxel size). Images were acquired from the base of the skull to the proximal third of the femora.

The Gemini PET/CT imaging system (Philips Co., Cleveland, OH, USA) consists of a gadolinium oxyorthosilicate (GSO) full-ring PET scanner with 5.0 mm spatial resolution and a 16-slice helical CT scanner. Standard software recommended by the manufacturer was used for acquisition and processing. Reconstruction was performed using an iterative 3-D row action maximum likelihood algorithm (RAMLA), with 2 iterations and using generalized Kaiser-Bessel functions (blobs). Scatter and attenuation correction were applied using the single scatter simulation algorithm and measured attenuation correction, respectively.

Patients in substudy 1 subsequently underwent an additional PET/CT study in list-mode acquisition of 10 min duration. First a low-dose CT [30 mA, 120 kV, field of view (FOV) 600 mm, 0.5 s rotation time, pitch 0.9, collimation 16×1.5 mm] was performed with the arms at the side of the torso. Immediately after the CT acquisition a list-mode acquisition with a duration of 10 min was performed. From the list-mode data, image frames of 1–10 min duration were reconstructed and corrected for scatter and attenuation.

Image analysis (calculation of SAM)

A volume of interest (VOI1) was drawn using PMOD software over the primary tumour lesions of patients of substudy 1 on all reconstructed image sets (1- to 10-min reconstructed images) as well as over all individual metastatic liver lesions of patients of substudy 2 (scan 1 and scan 2). A second VOI (VOI2) was delineated around VOI1 so as to include a small zone of homogeneous background. The borders of VOI1 were set at a reasonable distance from the tumour lesions in order to avoid PVE and to ensure that no spillover from the tumour lesion to VOI2 occurred (see Fig. 1). SAM and normalized SAM values were derived using the formulas described in the phantom study, but without the volume correction for VOI1, as this does not apply to patients. In patients with multiple liver metastases (substudy 2), SAM was calculated as the sum of the individual SAMs of the lesions.

Fig. 1 18F-FDG PET scan of a patient with colorectal carcinoma metastasized to the liver. A volume of interest (VOI1, blue VOI) was set at a reasonable distance from the metastatic lesion in order to avoid PVE. A second VOI (VOI2, green VOI) was delineated around VOI1. Subtracting the background activity, estimated from the region between VOI2 and VOI1, from VOI1 yields SAM.

Results

Phantom study

Table 1 shows that for a ratio of 5.4, the normalized SAM per unit of sphere volume remained constant across all spheres, with a mean value of 91.9 % (SD 6.8 %) of the expected value of 4.4. For the phantom study with a ratio of 3.64, the 0.58 ml sphere was no longer visible. For the five remaining spheres a mean value of 97.6 % (SD 3.6 %) of the expected value of 2.64 was obtained. For the lowest ratio of 2.0, four spheres were still visible, the smallest of which had a volume of 2.1 ml. A mean value of 96.0 % (SD 6.2 %) of the expected 1.0 was calculated. Overall, for the 3 acquisitions and the 15 visible spheres, the mean normalized SAM/sphere volume was 94.9 % (SD 5.9 %) of the expected value. These results show that SAM per unit of volume remains unchanged for smaller spheres, indicating that this method is not influenced by PVE and can be used for small lesions. Since the tumour size is not known in patients, SAM/tumour volume cannot be calculated. Instead, SAM or SAM normalized for background activity can be used.
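
The expected values and the pooled mean quoted above can be checked with a short calculation. The sketch below uses only the figures reported in the text; the sphere count of six for the 5.4 ratio is inferred from the total of 15 visible spheres, and the variable names are illustrative.

```python
# (activity ratio, number of visible spheres, mean % of expected value)
acquisitions = [(5.4, 6, 91.9), (3.64, 5, 97.6), (2.0, 4, 96.0)]

# Expected normalized SAM per unit sphere volume: activity ratio minus 1
expected = [round(ratio - 1.0, 2) for ratio, _, _ in acquisitions]

# Pooled mean over all visible spheres, weighting each acquisition
# by its number of visible spheres
n_spheres = sum(n for _, n, _ in acquisitions)
pooled_mean = sum(n * mean for _, n, mean in acquisitions) / n_spheres
```

The expected values come out as 4.4, 2.64 and 1.0, and the pooled mean rounds to 94.9 %, in agreement with the overall figure reported for the 15 spheres.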

Table 1 Normalized SAM/sphere volume

Clinical studies

Impact of acquisition time on SAM

Absolute mean (range) SAM10 values, mean BG10 values and normalized SAM10 values of PSCC were 70.9 (6.9–105.4), 0.7 (0.4–1.1) and 73.8 (13.3–255.6), respectively. The absolute mean (range) SUVmax value of PSCC was 5.8 (2.6–10.5).

SAM values, normalized SAM values and SUVmax values of PSCC, expressed as a percentage of the 10-min value for the different acquisition durations (1–10 min), are shown in Table 2. While SAM decreased significantly with shorter scan time (Friedman, p < 0.0001), the differences were small. The mean difference between the 1-min acquisition and the 10-min reference was only 4 % (SD 5 %; range 88–105 %). This decrease with shorter scan time was not seen for normalized SAM, but the division by the mean background increased the statistical noise, resulting in a larger variability (SD 10 %; range 88–121 %).

Table 2 SAM, normalized SAM and SUVmax expressed as a percentage of the 10-min value

SUVmax was not significantly different between images with various scan times (Friedman, p = 0.2). The variability of SUVmax values was, however, slightly larger than that of SAM values, as illustrated by the SD of 6.4 % (range 86–110 %).

SAM for response assessment

No systematic differences in starting times (between baseline and response scans, between responders and non-responders or a combination of both) were observed (p ≥ 0.2).

SAM1 values of liver lesions ranged from 18.3 to 2,111 (mean 196; SD 487). SUVmax1 values ranged from 1.7 to 14.6 (mean 5.3; SD 3.0).

According to the RECIST criteria as applied on CT we identified 8 responders and 11 non-responders. SAM1 and ΔSAM values for responders versus non-responders were 57 (SD 119) versus 297 (SD 625) for SAM1 (p = 0.2) and 99 % (SD 3 %) versus 32 % (SD 44 %) for ΔSAM (p = 0.001), respectively. The area under the ROC curve (AUC) for SAM1 and ΔSAM were 0.74 and 1.0, respectively, suggesting that ΔSAM may accurately assess response to treatment of liver metastases in patients suffering from colorectal cancer (CRC) (see Fig. 2). Using a cutoff of 85 % reduction for ΔSAM, responders could be separated from non-responders with a sensitivity of 100 % and a specificity of 100 %.
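
How a cutoff such as the 85 % reduction translates into sensitivity and specificity can be sketched as below. This is an illustrative helper with a hypothetical name; whether the cutoff itself counts as a response (greater than or equal) is an assumption, since the text does not specify it, and the input values are made up rather than taken from the patient series.

```python
def sensitivity_specificity(responder_values, non_responder_values, cutoff):
    """Return (sensitivity, specificity) when a reduction >= cutoff
    is classified as a response (the >= convention is an assumption)."""
    true_pos = sum(1 for v in responder_values if v >= cutoff)
    true_neg = sum(1 for v in non_responder_values if v < cutoff)
    sensitivity = true_pos / len(responder_values)
    specificity = true_neg / len(non_responder_values)
    return sensitivity, specificity


# Made-up percentage reductions for illustration only
responders = [100.0, 99.0, 92.0]
non_responders = [80.0, 30.0, -10.0, 86.0]
sens, spec = sensitivity_specificity(responders, non_responders, cutoff=85.0)
```

With these made-up values all responders exceed the cutoff (sensitivity 1.0), while one of the four non-responders is misclassified (specificity 0.75).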

Fig. 2 ROC curves for SUVmax1, SAM1, ΔSUVmax and ΔSAM

SUVmax1 and ΔSUVmax values in responders versus non-responders were 3.9 (SD 2.4) versus 6.3 (SD 3.1) for SUVmax1 (p = 0.08) and 94 % (SD 17 %) versus 7 % (SD 40 %) for ΔSUVmax (p = 0.0001), respectively. The AUC for SUVmax1 and ΔSUVmax were 0.79 and 0.99, respectively, suggesting that ΔSUVmax may accurately assess response to treatment of liver metastases in patients suffering from CRC (see Fig. 2). Using a cutoff of 48 % reduction of ΔSUVmax, responders could be separated from non-responders with a sensitivity of 100 % and a specificity of 91 %.

The AUC of ΔSAM and ΔSUVmax proved to be not significantly different (p = 0.6).

Discussion

The main problem for accurate quantification in PET is the relatively low resolution of the PET camera. Both quantification of uptake values and volume determination are affected by the PVE [6].

In this study, using a Jaszczak phantom with a homogeneous background in order to simulate normal liver activity, we show that SAM is sphere size independent, confirming that, when using SAM as a quantitative parameter, problems related to PVE and threshold values for tumour delineation are avoided.

Still there are some factors that might affect the accuracy and precision of SAM such as the calibration of the PET scanner, injected dose and patient weight. For this reason, another parameter, normalized SAM (SAM/mean BG) was introduced. As the aforementioned sources of error affect both SAM and the background similarly, normalized SAM is no longer affected by them. However, the dividing operation increases the statistical noise.

Another factor affecting the accuracy of all quantitative parameters is the reconstruction algorithm. Despite the fact that we found consistent SAM values for different sphere volumes, the values proved to be systematically lower than expected. It is well known that the reconstruction algorithm and the adjustment of its parameters can influence the quantification accuracy [10]. Also, inadequate scatter correction might be the cause of these consistently lower values. Especially in large axial FOV 3-D PET scanners the contribution of scatter from outside the FOV is a challenging issue that needs to be addressed [11].

Tumour volume assessment on FDG PET imaging is gaining wide clinical interest outside the field of radiotherapy planning, where its primary interest is biological target volume delineation [3, 12, 13].

First, in line with studies performing volumetric tumour staging by means of CT or MRI [14–16], it was recently shown that the metabolic tumour volume as assessed by FDG PET contains independent prognostic information on disease-free and overall survival in various types of cancer [17–22]. The assessment of the metabolic volume as described in these reports was performed using either manual or automated delineation of the tumour, e.g. region-growing or segmentation-based methods. Automated tumour contouring or delineation is more robust than manual delineation. Unfortunately, in clinical practice the less reproducible manual method is most often used, as most tumours present with an irregular pattern of FDG uptake and a correspondingly complex tumour demarcation. More recently, some of the more advanced automated methods, e.g. gradient-based methods, the improved fuzzy C-means algorithm and the fuzzy locally adaptive Bayesian algorithm, have proved to be more robust and able to deal efficiently with complex tumour demarcation on PET images acquired on different PET/CT systems, using different voxel sizes and different tracers, both in phantom models and in patient studies [3, 4, 23, 24]. Thus, these segmentation algorithms may provide a robust and repeatable tool to aid physicians in determining functional volumes in PET imaging.

In contrast to automated techniques, SAM avoids the problem of tumour delineation as it assumes that the tumour tissue develops on top of the normal tissue, hence the subtraction of local surrounding normal tissue activity. Furthermore, SAM is only minimally influenced by the acquisition duration. In patients suffering from PSCC of the head and neck, the mean difference between SAM measurements derived from 1-min versus 10-min acquisitions was only 4 %; for volume measurements using automated thresholding, a mean difference of 16 % in a similar setup was reported previously [8]. While SUVmax values were not significantly different between images of variable scan time, the variability was slightly larger (SD 6.4 % for SUVmax versus 5 % for SAM). SUVmax is by definition obtained from the hottest voxel and is thus more prone to be affected by noise than SAM [25].

It might be argued that by adopting SAM the true tumour volume is underestimated. However, the concept of SAM and background subtraction is fully in line with the current clinical practice in FDG PET interpretation in which normal scan findings or a complete response are defined as the absence of detectable lesions against the normal background.

Recently, the PET Response Criteria in Solid Tumors (PERCIST) framework was introduced for tumour response evaluation, in which the authors also identify TLG as an attractive parameter. PERCIST suggests that for tumour volume delineation a threshold should be used equal to 2–3 SDs above the mean SUV in healthy liver parenchyma of the patient [26]. Similarly, by subtracting background activity, SAM takes into account the variability in metabolic activity of normal tissue between scans, but SAM goes one step further as the background SUV is tissue specific.

The basic principle behind PERCIST is that tumour response is a continuous and time-dependent variable which is expressed as a percentage change in SUVpeak (average activity within a spherical region of interest measuring 1.2 cm in diameter centred at the most active portion of the tumour), rather than SUVmax, between the pre- and post-treatment scans. PERCIST further specifies that the SUVpeak is to be obtained on the single most active lesion on each scan, which may be located in a different lesion on a follow-up scan. It is further recommended that a sum of SUVpeak of up to five target lesions (no more than two per organ) be measured as a secondary determinant of response. In those lesions that are smaller than 2 cm in diameter, TLG and SUVpeak will be underestimated due to the PVE [6, 23]. Furthermore, the reduction in size of lesions <2 cm in diameter and the reduction in size of lesions >2 cm to a size prone to PVE will result in an overestimation of the metabolic response. Accordingly, the spread of the percentage change in TLG and SUVpeak in partial responders may vary significantly depending on the size of the lesions included for PERCIST analysis. As SAM is not influenced by the PVE, it is hypothetically likely to significantly decrease the range of percentage change in partial responders. In the patient series presented, only one RECIST responder presented with a partial response on PET; as such the aforementioned hypothesis remains to be proven. As nearly all responders were complete responders, the AUC of 1 is likely to decrease in a setting which includes more partial responders. Nevertheless, the findings presented suggest that SAM, which incorporates both the primary and secondary determinants of response suggested by PERCIST, is a promising parameter for tumour response assessment of liver metastases.
Additional studies in larger patient populations and different tumour pathologies, including tumours with more heterogeneous background activity, as well as in a mix of tumour responses are warranted.

Study limitations

Because of the retrospective character of the clinical study included, FDG PET/CT images were acquired within a range of 45–60 min following injection of FDG. However, no systematic differences in starting times (between baseline and response scans, between responders and non-responders or a combination of both) were observed.

The presented methodology has only been tested in conditions with a homogeneous background, namely the Jaszczak phantom and liver parenchyma. As already mentioned above, the performance of the presented methodology in other tumour pathologies with more heterogeneous background is not known. Phantom and clinical studies in this regard should also be performed.

In the clinical substudy, the change in SAM of lesions is related to the RECIST response. The use of these morphometric criteria is based on the abundant evidence showing that agents that produce tumour shrinkage will most likely also prolong survival [9]. In the last decade, PET has proven to be a useful tool in response assessment, even outperforming morphometric evaluation in several indications [26–28]. Nevertheless, the RECIST criteria are still considered the gold standard in assessing treatment response by imaging. Further research is warranted to identify a possible relationship between SAM and progression-free or overall survival.

Another potential limitation has to be addressed. While SAM assesses the added metabolic activity on top of the background activity, it does not provide information on tumour volume or mean tumour SUV. In some applications, these two separate parameters are of interest. Accordingly, SAM should be considered as a complementary parameter that provides additional information to more commonly used PET response parameters like SUV values or metabolic tumour volume.

Conclusion

Similarly to TLG, the SAM method takes into account the tracer uptake as well as the size of the lesion, while avoiding the need for thresholding or another delineation method. The phantom study has demonstrated that the presented method does not suffer from PVE. Preliminary clinical data show that SAM has a potential role in tumour response assessment of liver metastases by means of FDG PET, though additional clinical studies in larger patient populations and different tumour pathologies, including tumours with more heterogeneous background activity, as well as in a mix of tumour responses, are warranted.