Introduction

In patients with advanced stage squamous cell carcinoma of the head and neck (HNSCC), radiotherapy with or without concomitant chemotherapy is commonly the treatment of choice. Although accurate delineation of the gross tumour volume (GTV) is pivotal for high precision radiation therapy, there is significant uncertainty in tumour delineation using anatomical imaging (i.e. CT and MRI) due to interobserver and intraobserver variability [1]. HNSCC displays a number of characteristics that may adversely affect treatment outcome including intrinsic radiation sensitivity, tumour cell hypoxia and tumour cell repopulation during radiotherapy [2, 3]. The last of these tumour resistance mechanisms may be counteracted by concurrently administering targeted therapy such as epidermal growth factor receptor (EGFR) inhibitors, or by accelerating radiation schedules [4, 5]. Alternatively, enhanced dose delivery to the proliferative tumour subvolume may compensate for prolonged overall treatment times in conventional once-daily radiation schedules. Ideally, one would like to monitor tumour proliferation during treatment to steer counteractive measures based on the repopulative response.

PET is a noninvasive imaging modality capable of depicting functional characteristics of the entire tumour volume before and during treatment. 18F-Fluorothymidine (FLT), a thymidine analogue, is a PET tracer that reflects cellular proliferation [6]. In contrast to 18F-fluorodeoxyglucose (FDG), FLT does not accumulate in inflammatory tissue [7], which is frequently found in or near primary tumours of the head and neck or is induced during the course of (chemo)radiotherapy. It has previously been demonstrated that FLT accumulates in HNSCC, and that uptake diminishes significantly during therapy, preceding morphological changes [3, 8–10]. Similarly, FLT uptake in HNSCC has been shown to result in reproducible standardized uptake values (SUVs) before and during treatment [11, 12], and is therefore suitable for early response monitoring [12]. Thus, FLT PET may help identify highly proliferative tumour (sub)volumes, and select patients eligible for individual treatment modifications, ultimately leading to improved treatment outcome.

In order to transfer tailored therapy approaches from a research setting to clinical practice, the proliferative tumour volume (PV) should be accurately and objectively delineated. Several semiautomatic segmentation methods for PET functional volume determination have been developed over the last few years. Most of them have been validated using synthetic/simulated datasets, simplistic phantom acquisitions and visual tumour delineation, and occasionally against histopathological specimens [13–17]. In addition, the majority of clinical evaluation studies using such algorithms have considered FDG tumour imaging.

The goal of the present study was to compare three semiautomatic PET segmentation methods for derivation of PV in primary HNSCC on sequential FLT PET images before and during (chemo)radiotherapy. The abilities of these methods to deliver usable and coherent PV segmentations were assessed and compared with that of visual delineation as a reference. Furthermore, segmented PVs were related to clinical outcome.

Materials and methods

Patients

From July 2006 until September 2008, eligible patients aged 18 years or older with newly diagnosed UICC stage II – IV HNSCC, treated with (chemo)radiotherapy with curative intent, were included in the study. Exclusion criteria were surgery as primary tumour therapy, palliative treatment and pregnancy. The Institutional Review Board approved the study and written informed consent was obtained from all patients. To evaluate clinical outcome, patients were followed until death or otherwise for at least 3 years. Data were censored on 31 December 2011.

Treatment

Eighteen patients were treated with three-dimensional conformal radiotherapy delivering a dose of 68 Gy in 2-Gy fractions to the primary tumour and metastatic cervical lymph nodes, and 44 Gy to electively treated nodal levels. The other patients were treated with intensity-modulated radiation therapy using a simultaneous integrated boost technique, delivering a dose of 68 Gy in 2-Gy fractions to the primary tumour and metastatic cervical lymph nodes, and 50.3 Gy in 1.48-Gy fractions to electively treated nodal levels. In all patients, an accelerated fractionation schedule was used with an overall treatment time of 5.5 weeks, delivering two fractions daily during the last 1.5 weeks of treatment. In accordance with institutional guidelines, 14 patients with cT3 and cT4 tumours, without contraindications and age below 70 years, were concomitantly treated with intravenous cisplatinum at 40 mg/m2 once weekly.

FLT PET/CT imaging

FLT was obtained from the Department of Nuclear Medicine and PET Research, Free University Medical Center, Amsterdam, The Netherlands. The procedure for FLT synthesis and purification has been described elsewhere [3, 18]. FLT PET/CT of the head and neck region was performed using a PET/CT scanner (Biograph Duo; Siemens/CTI, Munich, Germany). All patients received an FLT PET/CT scan prior to treatment, and a second FLT PET/CT scan was performed in 44 patients followed by a third scan in 28 patients. The first 14 patients accrued to the study did not receive a third FLT PET/CT scan because this was added in an amendment to the study protocol authorized by the Institutional Review Board 7 months after initiation of the study. All scans were acquired with the patient in the supine position, immobilized with an individual head support and a rigid customized mask to increase positioning accuracy and to prevent movement during image acquisition. Emission images of the head and neck area were recorded 60 min after intravenous injection of approximately 250 MBq of FLT. Two bed positions were recorded with a 7-min acquisition time per bed position. PET images were reconstructed using the ordered-subsets expectation maximization iterative algorithm with parameters optimized for the head and neck region (four iterations, 16 subsets, and a 5-mm FWHM, three-dimensional Gaussian filter) and a matrix size of 128 × 128 × 94. Correction for radioactive decay to the start of the scan was applied to the PET images. Simultaneously, an intravenous contrast-enhanced (Optiray 300; Tyco Healthcare/Mallinckrodt Inc., Santa Monica, CA) CT scan for anatomic correlation and PET attenuation correction was acquired (120 mAs, 130 kV, rotation time 1.0 s, slice collimation 2.5 mm, and feed 4.3 mm).

Image analysis

PET and CT images were imported into the radiotherapy planning system Pinnacle3 (version 8.0d; Philips Radiation Oncology Systems, Andover, MA). With this software, consecutive CT and PET scans were resampled and registered to the first CT scan using cross-correlation and rigid transformation. Tumour SUVpeak was defined as the mean SUV of the hottest voxel in the tumour and its eight surrounding voxels in one transaxial slice. Two experienced radiation oncologists delineated the GTV of the primary tumour on the CT scans in consensus using all available clinical and anatomical information. The PVs on consecutive FLT PET images were visually delineated (PVVIS) by a trained radiation oncologist. In previous studies, the value of SUVpeak and two commonly used segmentation methods, the 50 % isocontour of SUVpeak and signal-to-background ratio (SBR) [13] methods, have already been addressed in patients in this cohort and were therefore not reassessed here [3, 9]. The following three semiautomatic segmentation methods for determining the PV were applied.
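The SUVpeak definition given above (mean of the hottest tumour voxel and its eight in-plane neighbours) can be illustrated with a minimal sketch. It is not the Pinnacle3 implementation used in the study. The function name suv_peak, the nested-list layout suv[z][y][x], the boolean tumour mask and the clipping of the neighbourhood at the image border are all assumptions made for illustration.

```python
from statistics import mean


def suv_peak(suv, mask):
    """Mean SUV of the hottest masked voxel and its in-plane 3 x 3 neighbourhood.

    suv  : nested lists indexed as suv[z][y][x] (assumed layout)
    mask : same shape, True where the voxel belongs to the tumour
    """
    hottest = None
    for z, plane in enumerate(suv):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if mask[z][y][x] and (hottest is None or value > hottest[0]):
                    hottest = (value, z, y, x)
    if hottest is None:
        raise ValueError("tumour mask is empty")

    _, z, y, x = hottest
    plane = suv[z]
    # Neighbours are taken from the same transaxial slice only;
    # at the image border the window is clipped (an assumption).
    window = [
        plane[j][i]
        for j in range(max(0, y - 1), min(len(plane), y + 2))
        for i in range(max(0, x - 1), min(len(plane[0]), x + 2))
    ]
    return mean(window)
```

Note that the neighbours are not required to lie inside the mask; only the hottest voxel is. Whether the study restricted the neighbours to the tumour is not stated in the text.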

The first was the iterative background-subtracted relative-threshold level (RTL) method using Pinnacle3 scripts (PVRTL). This model-based method provides threshold levels independent of the SBR and is based on the convolution by the scanner’s point spread function (PSF) of a sphere that takes into account differences in sphere size and scanner resolution [14]. Thresholds for RTL were iteratively derived using the adopted PVVIS and a manually defined background ROI in the neck musculature (about 10 cm3) at a sufficient distance from the vertebrae, the primary tumour, and lymph node metastases. This iterative algorithm was applied until the threshold level changes were less than 0.1 %.

In the second method, the PV derived from the watershed transform and hierarchical clustering (W&C; PVW&C) segmentation was created using the Matlab-based program IMRE viewer (Université Catholique de Louvain, Belgium) [15]. This is a gradient-based method which allows better estimation of the gradient intensity by first denoising with an edge-preserving filter and partial volume effect correction with a constrained iterative deconvolution algorithm before gradient-based delineation is applied. A PSF with a FWHM of 8 mm was chosen to deconvolve the images. Visual inspection of the segmented volumes allowed more accurate positioning of the volume of interest in which the algorithm performed the segmentation.

In the final method, the PV derived from the fuzzy locally adaptive Bayesian (FLAB) algorithm (PVFLAB) was obtained using specifically designed software (ImageD; LaTIM INSERM laboratory, Brest, France) [19]. This approach is based on computing the probability that each voxel belongs to a given class (i.e. tumour or background) by taking into account the intensity with respect to the statistical distributions of the voxels in the various regions of the image, as well as spatial correlation with neighbouring voxels. In cases where the automatic FLAB algorithm initialization procedure led to unsatisfactory results on visual inspection, the user exploited the available option of manually adjusting the initialization parameters (mean and/or standard deviation of tumour and/or background uptake) and restarting the automatic FLAB volume estimation process.
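The iteration described for the RTL method (a background-subtracted relative threshold, updated until it changes by less than 0.1 %) can be sketched as follows. This is an illustrative outline only, not the Pinnacle3 script used in the study. In particular, rtl_for_volume is a hypothetical caller-supplied function standing in for the PSF-based model of [14], which maps the current volume estimate to a relative threshold level. The flat list of SUVs, the parameter names and the iteration cap are also assumptions.

```python
def iterative_rtl_threshold(values, background, rtl_for_volume,
                            voxel_volume_cm3, start_volume_cm3,
                            tol=0.001, max_iter=50):
    """Iterate a background-subtracted relative threshold until it stabilizes.

    values           : SUVs of the voxels in the search region (flat list)
    background       : mean SUV of the background ROI
    rtl_for_volume   : hypothetical function, volume (cm3) -> relative level (0..1)
    start_volume_cm3 : initial volume estimate (e.g. the visually delineated PV)
    tol              : relative change that counts as converged (0.001 = 0.1 %)
    """
    peak = max(values)
    volume = start_volume_cm3
    threshold = None
    for _ in range(max_iter):
        level = rtl_for_volume(volume)
        # Threshold is expressed relative to the background-subtracted peak.
        new_threshold = background + level * (peak - background)
        volume = sum(v >= new_threshold for v in values) * voxel_volume_cm3
        if threshold is not None and abs(new_threshold - threshold) <= tol * abs(threshold):
            return new_threshold, volume
        threshold = new_threshold
    return threshold, volume  # not converged within max_iter
```

With a constant rtl_for_volume the loop converges on the second pass. A size-dependent function makes the threshold and the volume estimate update each other, which is the behaviour the iteration is meant to capture.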

Images were not segmented when the radiation oncologist and nuclear medicine physician were unable to identify any FLT accumulation within known lesions. Before accepting the results, both medical specialists visually inspected the segmented PV obtained by the semiautomatic algorithms. Segmentation was deemed successful if it adhered to the anatomical boundaries of the primary tumour as defined in the first CT scan, and did not delineate a large area of evidently normal tissue.

Statistical analysis

Differences between groups were tested using linear mixed model analysis (random intercept patient effect, fixed method effects). Using untransformed PV as dependent variable led to relatively large skewness (6.38) of the residuals. Therefore, the volumes were logarithmically transformed, resulting in a considerably smaller skewness of 0.96, enabling parametric analysis. Correlations were determined using the Pearson test.
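The effect of the logarithmic transformation on skewness can be illustrated with a small sketch. It computes the moment-based sample skewness of a list of values and of their natural logarithms. The study computed the skewness of mixed-model residuals in SAS; here the statistic is applied directly to hypothetical example volumes that are not study data, and the function name skewness is an assumption.

```python
import math


def skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5 (population moments)."""
    n = len(xs)
    centre = sum(xs) / n
    m2 = sum((x - centre) ** 2 for x in xs) / n
    m3 = sum((x - centre) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed volumes in cm3 (illustrative only, not study data).
volumes = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 60.0]
log_volumes = [math.log(v) for v in volumes]

raw_skew = skewness(volumes)
log_skew = skewness(log_volumes)
```

For right-skewed data such as these, log_skew is smaller than raw_skew, which is the property that motivated analysing log-transformed volumes.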

Median follow-up was determined using the inverse Kaplan-Meier method. The changes in GTVs based on CT and PVs obtained by FLT PET during treatment were correlated with long-term clinical outcome. Four-year locoregional control (LRC), disease-free survival (DFS) and overall survival (OS) (± standard error of the mean) were analysed using Kaplan-Meier estimates and compared using the log-rank test. LRC was defined as no recurrence of primary tumour or cervical lymphadenopathy at the date of last follow-up. DFS was defined as freedom from locoregional or distant recurrence at the date of last follow-up. OS was defined as the time to death from any cause; surviving patients were censored at the date of last follow-up. Survival parameters were related to the date of the first PET/CT scan. Continuous variables were dichotomized at the median. Two-sided p values <0.05 were considered to indicate significance.
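The two building blocks of the outcome analysis, dichotomization at the median and the Kaplan-Meier product-limit estimate, can be sketched as below. The study used SAS and GraphPad Prism; this stdlib-only version is an illustration. The function names are assumptions, and so is the choice to place values equal to the median in the lower group. The log-rank comparison is not implemented here.

```python
from statistics import median


def dichotomise_at_median(values):
    """Return True for values strictly above the median, False otherwise."""
    cut = median(values)
    return [v > cut for v in values]


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time per patient
    events : 1/True if the event occurred at that time, 0/False if censored
    Returns a list of (time, survival probability) at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_t = [e for tt, e in zip(times, events) if tt == t]
        n_events = sum(1 for e in at_t if e)
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set.
        at_risk -= len(at_t)
    return curve
```

In use, each group produced by dichotomise_at_median would be passed to kaplan_meier separately, giving one curve per group.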

Statistical analyses were performed using SAS version 9.2 (SAS institute Inc., Cary, NC) and GraphPad Prism version 5.04 (GraphPad Software, La Jolla, CA).

Results

Patient characteristics

The study accrued 52 patients with a total of 54 primary HNSCCs. Six patients were excluded from the present analysis, two due to technical difficulties and four because they were treated in a clinical phase III trial incorporating modification of the tumour oxygenation status [20] which deviated from the standard institutional treatment guideline. Therefore, the final population eligible for analysis consisted of 46 patients undergoing (chemo)radiotherapy with 48 HNSCCs in total. Their average age was 60 years at enrolment in the study, ranging from 39 to 75 years. Median follow-up was 52 months (range 8 – 65 months). The pretreatment FLT PET/CT scan was performed at a median of 4 days (range 0 – 14) before the start of treatment in all patients, the second scan in the 2nd week (after a median of 7 fractions, range 4 – 12 fractions) in 44 patients with 46 tumours, and the third scan in the 4th week of (chemo)radiotherapy (after a median of 18 fractions, range 15 – 19 fractions) in 28 patients with 30 tumours. Patient and tumour characteristics, as well as events are presented in Table 1.

Table 1 Patient and tumour characteristics, as well as events

Comparison of CT-delineated GTVs and PET-segmented PVs on pretreatment scans

Figure 1 shows the CT-delineated GTVs and the PET-segmented PVs on the pretreatment scans. The mean PVs ranged from 9.4 to 14.4 cm3. All segmentation methods resulted in statistically significantly different PVs (p = 0.003). PVVIS was the largest (mean volume ± standard deviation, 14.4 ± 12.7 cm3) and correlated best with PVFLAB and GTVCT (Pearson correlation coefficient 0.85, p < 0.0001, and 0.88, p < 0.00001, respectively). Correlations of PVVIS with PVRTL and PVW&C were weaker, although still statistically significant (Pearson correlation coefficient 0.77, p < 0.0001, for PVRTL; 0.77, p < 0.0001, for PVW&C).

Fig. 1

Plots of successfully delineated PVs from segmentation methods on consecutive FLT PET scans: a visual delineation, b RTL, c W&C and d FLAB (e CT volumes per scan). The bottom and top of each box represent the 25th and 75th percentiles. The horizontal line within the boxes is the median. The lower and upper ends of the whiskers represent the minimum and maximum values

Delineation of PVs on repeat FLT PET/CT scans

Previous studies have shown that FLT SUVpeak decreases significantly during treatment [3, 9]. CT volumes did not show a significant decrease until the 4th week of treatment (mean decrease 1.6 mL/7 % between the first and second scan, p = 0.1; mean decrease 4.8 mL/41 % between the first and third scan, p < 0.0001). Regarding PVs, PVVIS showed a significant decrease between one FLT PET/CT scan and the next (p < 0.0001; Fig. 1). Changes in PVRTL were small and not statistically significant (overall difference between scans, p = 0.33). PVW&C and PVFLAB both showed a significant reduction, with a steeper decrease in PVFLAB than in PVW&C (p < 0.0001 for decreases in PVFLAB between all scans, and in PVW&C between the first and second scan; and p = 0.16 for decrease in PVW&C between the second and third scan).

As illustrated in Figs. 2 and 3, the RTL, W&C and FLAB methods showed a considerable decrease in the percentage of PV delineations deemed successful on the second and third scan. FLAB demonstrated a higher success rate than RTL and W&C (e.g. in the third scan, 67 % of tumours as compared to 40 % for PVW&C and 27 % for PVRTL).

Fig. 2

Rates of accurate tumour segmentation using the three segmentation methods RTL, W&C and FLAB as a percentage of all PET/CT scans performed in comparison with visual delineation on PET images

Fig. 3

PVRTL (red) and PVFLAB (blue) on the scans before treatment (a), and after 7 fractions (b) and 17 fractions (c) of radiotherapy in a patient with a cT4N0M0 oropharyngeal tumour who was treated with radiotherapy only. When FLT uptake is evidently higher in the tumour than in surrounding tissues, both segmentation algorithms give approximately the same segmentation (a, b). As the tumour FLT signal approaches background levels (c) and the proliferating tumour volume decreases significantly, the RTL algorithm produces a delineation that clearly expands beyond the initial tumour boundaries (as delineated on the primary CT scan), and was thus termed “unsuccessful”

Influence of the primary tumour site and the treatment regimen on PV

Since the delineation of oropharyngeal tumours on FLT PET scans may be hampered by proliferative activity in nearby lymphoid tissues [21], the influence of the tumour site on the performance of the segmentation methods was assessed. On the pretreatment scans, there was no statistically significant difference in the proportion of PV delineations deemed successful between the various segmentation methods when comparing oropharyngeal and oral cavity tumours with HNSCC at other sites (p = 0.36). However, the percentage of segmentations deemed successful during treatment was in general lower for tumours in the oropharynx and oral cavity than in tumours at other sites (Table 2). The change in PV during (chemo)radiotherapy was not related to the primary tumour site (data not shown).

Table 2 Percentages of PV segmentations by RTL, W&C and FLAB deemed successful on consecutive scans in patients with oropharyngeal and oral cavity tumours versus HNSCC at other sites

There were no significant differences in PV changes over time between the patients receiving radiotherapy and those receiving chemoradiotherapy. The number of successful segmentations on the second scans was comparable between the treatment groups. No conclusions could be drawn regarding the percentage of accurate segmentations on the third scans in relation to treatment due to the small number of patients receiving chemoradiotherapy (Table 3).

Table 3 Comparison of percentages of PV segmentations by RTL, W&C and FLAB deemed successful on the consecutive scans in patients treated with radiotherapy versus chemoradiotherapy

Correlation between semiautomatically segmented PV changes in sequential FLT PET scans and clinical outcome

Changes in the CT-based GTVs during (chemo)radiotherapy were not statistically significantly correlated with treatment outcome. Due to the low number of successful segmentations during the course of (chemo)radiotherapy, PVRTL was not related to treatment outcome. Changes in PVW&C between pretreatment and the subsequent two scans were not significantly predictive of outcome. For PVFLAB, a decrease between the pretreatment scan and the scan acquired in the 4th week of therapy of more than the median (7.39 cm3) was predictive of a better 4-year DFS (90 ± 9.5 versus 53 ± 17.6 %, p = 0.040; Fig. 4). Significant correlations were not found between PVFLAB obtained at 2 weeks and outcome parameters, nor between PVFLAB obtained at 4 weeks and LRC or OS at 4 years (results not shown).

Fig. 4

Patients with a PVFLAB decrease above the median between the baseline scan and the scan in the 4th week of (chemo)radiotherapy demonstrated a better DFS than patients with a PVFLAB decrease below the median

Discussion

In this study in patients with advanced HNSCC, of three semiautomated PV segmentation methods, the highest percentage of delineations deemed successful was obtained using the FLAB algorithm. The reduction in PVFLAB during therapy was also correlated with clinical outcome.

Common automated segmentation methods using fixed thresholds, such as the 50 % isocontour of the SUVpeak, are easy to use but are open to criticism because the threshold is fixed arbitrarily. These methods are sensitive to contrast variations as well as to noise [19, 22]. In a recent study [9], we showed that PV50% leads to grossly overestimated functional volumes on repeat imaging, and therefore this method was dismissed from in-depth analysis. The same holds true for the adaptive threshold segmentation method SBR developed by Daisne et al. [13]. The decrease in signal-to-background contrast during treatment hampers this segmentation method. The RTL method, although independent of the tumour-to-background ratio because of background subtraction and based on tumour radius and image resolution only [14, 23], was the least robust advanced semiautomatic segmentation method in this study, especially for the third scan. On visual inspection of the results, the segmented PVRTL encompassed normal vital structures, such as vertebrae, vessels, submandibular gland, tongue and soft palate, and thus the resulting segmentations were considered unsuccessful. An example of this can be seen in Fig. 3. These unsuccessful segmentations were probably due to a reduced performance of RTL when PVs became smaller than the FWHM of the PSF of the PET/CT scanner during (chemo)radiotherapy, and to a decrease in contrast between tumour and surrounding tissue on repeated PET/CT scans [14].

Also with the gradient-based W&C method, the number of successful segmentations diminished considerably, although the overall success rate was higher than with the RTL method. “Semiautomatic” corrections imply that the volume of interest for the algorithm can be changed and small changes in the binning step can be made. As with the previously discussed methods, the dissociation between low FLT uptake in PET-positive volumes and the background on pretreatment scans hampered the accuracy of this method. The uptake registered as background uptake might even be increased with the W&C method because of preprocessing steps. Notably, the gradient-based methodologies have been validated for accuracy against phantom images [15, 24] as well as histopathology in head and neck tumours [15] and lung tumours [25, 26]. They have also been applied to clinical datasets to derive image-based parameters of clinical prognostic value [27, 28]. However, the W&C method was validated using FDG datasets only, and in this study using repeat FLT PET scans, performance of this method decreased during therapy.

The FLAB segmentation method resulted in the highest percentage of segmentations deemed successful and the delineated tumour volumes significantly decreased over time. However, it should be emphasized that this high success rate resulted from manual adjustment of the input parameters (mean and/or standard deviation of tumour and/or background uptake); this user-dependent operation was performed in 6 % of the first, 28 % of the second and 55 % of the third scans. The possibility to manually adjust the input parameters during segmentation thus seems to be an advantage over the other semiautomatic segmentation methods, but also induces operator dependency. FLAB is less sensitive to image noise than the other segmentation approaches tested here. Its robustness, reproducibility and repeatability have been assessed for (sequential) FDG and pretreatment FLT PET imaging of primary tumours at different sites [29–33]. Furthermore, a number of FDG image-derived parameters obtained with the FLAB algorithm have been proven to be of prognostic and predictive value [33, 34]. To date, no FLAB volume measurements on sequential FLT PET images have been performed.

A limitation of this study was that the results of the segmentations could not be validated against histopathological resection specimens, as this treatment regimen was chosen for organ preservation purposes. This also precluded immunohistochemical verification of proliferation decreases during therapy as found with the consecutive PET/CT scans. However, a previous study has validated the use of FLT against immunohistochemical proliferation markers for primary HNSCC [35]. Since we could not validate the segmentation methods against a ground-truth, we compared the semiautomated segmentations with manually delineated PVs. PVVIS is highly interoperator and intraoperator dependent and reliable application requires a learning curve and a standardized approach such as, for example, PET image window level settings. However, when standards are applied this method can be used for qualitative assessment of the capacity of automated algorithms to delineate a PV; for example, if an intratumoral volume with an FLT uptake level above that of the background and of surrounding tissues is discerned qualitatively. An “ideal” automated segmentation algorithm would be capable of segmenting a PV within the physical boundaries of the tumour. If the tested method proves unable to achieve this, it may be assessed as “unsuccessful”.

In an earlier analysis of the current patient cohort, we observed that a decrease in SUVpeak of 45 % or more or a decrease in PVVIS of 41 % (median) or more in the second treatment week is associated with a better 3-year DFS rate [9]. Among the semiautomated algorithms evaluated here, only a decrease in PVFLAB was predictive of 4-year DFS, but not until the fourth treatment week. The combined results from the studies in this cohort indicate that, although a decrease in PVFLAB is useful for prognostication purposes, early alterations in treatment might best be founded on SUV-based observations. However, SUV by itself does not convey information about the PV. The FLAB segmentation method can supply robust volumes before and during therapy. Defining the SUVpeak and the PV on FLT PET prior to treatment may help identify patients who will benefit from customized antiproliferative treatment, such as accelerated radiotherapy [5], or from the addition of antiproliferative systemic treatment to radiotherapy, e.g. EGFR inhibition [4]. In patients not eligible for these two treatment approaches, enhanced dose delivery to the FLAB-defined proliferative tumour subvolume during treatment, as this changes during therapy, may compensate for prolonged overall treatment times in conventional once-daily radiation schedules [3]. FLAB-based FLT PET segmentation may be a guidance tool with respect to target volume reduction in patients who show an excellent proliferative response during therapy and may receive over-treatment when the original target volume is adhered to. These assumptions need to be explored in further investigation to warrant eventual clinical application of PVFLAB.

The present study indicated that FLAB is a promising candidate for radiation target volume adaptation based on sequential FLT PET scanning. However, there is a need for improved automatic initialization of the FLAB method before it can be applied with reduced user dependency in common clinical practice. Similarly, results obtained with the gradient-based method could be improved by adjusting preprocessing steps for FLT data and/or by allowing manual adjustment.

Conclusion

In this study, the FLAB algorithm performed best in terms of segmentation of the proliferative FLT PET tumour volume and correlation with patient outcome, when compared to the other semiautomatic segmentation methods RTL and W&C. With further operator-independent automation of the method, it may be a good candidate for FLT PET-guided individualized treatment choices, radiotherapy delivery strategies and response assessment.