Introduction

Lung cancer is a leading cause of cancer-related death worldwide [1] and shows a wide variety of clinical courses and prognoses. More than 85% of lung cancer patients are diagnosed with non-small cell lung cancer (NSCLC), of which approximately 70% present with unresectable disease. NSCLC is considered a rapidly proliferating pathology. Treatment strategies include surgery, radiation therapy (RT), chemotherapy (CH) and genotype-driven therapies [2,3,4]. At present, concurrent chemo-radiotherapy (CRT) is considered the neoadjuvant treatment of choice in patients with operable disease. For non-surgical patients (stage IIIA or IIIB), CRT is considered the standard treatment; the two modalities can be delivered sequentially or concurrently, the concomitant schedule being more efficacious but more toxic [5]. Nonetheless, the prognosis of patients with locally advanced NSCLC is still very poor (median OS 21–28 months), despite progress in diagnosis and staging, the development of biologic therapies, and improvements in RT delivery. In this setting, the possibility of identifying responder patients early during treatment, or of giving a boost dose to the tumours of non-responders, is very attractive and challenging [6].

The administration of the radiopharmaceutical Fluorodeoxyglucose (18F–FDG) is a non-invasive procedure that allows a Positron Emission Tomography (PET) detector to visualize cells with prominent glucose uptake, such as metabolically active or proliferating cells, including cancer cells. In the last two decades, PET imaging with 18F–FDG has found increasing application in cancer care, and is nowadays performed with a PET detector coupled to Computed Tomography (CT) (PET/CT). 18F–FDG-PET/CT is widely used for disease staging, detection of recurrence, target volume delineation for RT, and assessment of response to CH and CRT [7,8,9]. More recently, research has paid special attention to the potential of ad interim 18F–FDG-PET/CT (FDGint) acquired during radiation treatment, aiming at the early detection of therapy-induced metabolic changes and at the distinction between responder and non-responder patients.

CRT rapidly decreases proliferation in responding tumours, so diminished metabolism as measured by FDGint may precede the structural changes that typically correlate with tumour response, as confirmed in several tumours [10]. Conversely, the visualization of residual proliferative tumour cells could, in principle, reveal radio-resistant tissue and prompt a timely modification of therapy.

There are two main rationales exploiting the metabolic variations detected by FDGint.

The first rationale is related to response and prognostication. Patients who respond early to neoadjuvant CRT could have better survival and benefit from further preoperative treatment. Alternatively, in early non-responders, ineffective CRT could be discontinued to avoid unnecessary toxicity [11, 12].

The second rationale is the possibility to perform an adaptive RT plan based on the FDGint, guiding therapy during the course of RT towards a more favourable therapeutic ratio and personalized cancer treatment, as applied, e.g., in head and neck cancer [13].

FDGint may be used to select poor responders, who could benefit most from dose intensification protocols or who could be treated alternatively with chemotherapy alone. Moreover, the early interruption of an ineffective therapy avoids or reduces the risk of side effects and allows a more expedient start of treatment targeted for resistant disease. Finally, the execution of FDGint could be useful even in responder patients, as changes in 18F–FDG uptake distribution associated with tumour shrinkage could be used to better define the target volume modification during RT (adaptive RT, dose painting by numbers [14, 15]).

The interest in FDGint for prognostic assessment and for adapting RT or CRT is supported by a broad literature. However, its role still needs to be explored. Some studies, covering various anatomical districts, support the two rationales described above and suggest individual therapeutic decisions based on FDGint [16,17,18,19,20,21]. Others do not [22,23,24]. Attempts to explain these discrepancies have pointed to a possibly non-optimal time for FDGint image acquisition and/or to possible misinterpretation due to radiation-induced inflammation [12, 25, 26]. Other authors focused their efforts on new radiopharmaceuticals potentially more specific than 18F–FDG, developed to identify proliferative activity or hypoxia [27, 28]. Nevertheless, their application is still confined to niche research and, to date, the role of 18F–FDG remains undisputed for its proven usefulness and properties, which are superior in some cases [29].

In NSCLC, 18F–FDG-PET/CT has mainly been performed at the end of neoadjuvant therapy, providing evidence that the quantitative metabolic parameters derived from this examination correlate significantly with histology, response, and overall survival (OS) [30,31,32]. Similar results have been obtained for early metabolic tumour response, involving, however, only chemotherapy and targeted genotype-driven therapies, in particular in patients with, for example, epidermal growth factor receptor (EGFR) mutations or NSCLC harbouring anaplastic lymphoma kinase fusions [2,3,4].

In this study, we focused on the smaller body of literature describing the potential of FDGint. In principle, its behaviour may differ from that observed with chemotherapy alone, as radiation-induced inflammation might enhance 18F–FDG uptake (see the “Timing and inflammation” paragraph). Therefore, even in those cases where RT or CRT is the treatment of choice, there are no recognized standards of care specific for the histology of lung cancer. This has led researchers to wait prudently for larger analyses before modifying therapeutic decisions based on FDGint.

The whole scenario highlights the need for a thorough literature review providing clinically based evidence of the potential of FDGint in RT or CRT [8]. The present systematic review assembles the original papers published in the last decade that are specific to FDGint and address NSCLC.

Methods

Six different searches were completed on Medline (http://www.ncbi.nlm.nih.gov/pubmed) and Embase (http://www.embase.com), using keywords combined by Boolean operators (“and”, “or”):

  i) (lung cancer) AND (predictive OR prediction OR response assessment OR response OR assessment) AND (early OR ad interim) AND therapy AND (FDG OR 18F–FDG) AND (PET OR PET/CT);

  ii) (positron emission tomography) AND (18F–FDG-uptake) AND (SUV) AND (lung cancer);

  iii) (lung cancer) AND (adaptive radiotherapy) AND (FDG OR 18F–FDG) AND (PET OR PET/CT);

  iv) (early pet OR interim pet) AND (lung);

  v) (early PET OR interim PET) AND (lung cancer) AND (metabolic response) AND (radiotherapy);

  vi) (18F–FDG PET/CT) AND (chemoradiotherapy) AND lung cancer.

Papers published from January 2005 to December 2016 reporting studies of FDGint in patients affected by NSCLC were selected. Studies concerning FDGint for adaptive RT were also included. From these queries, papers in languages other than English and studies of FDGint related to chemotherapy (CH) only, or performed after neoadjuvant treatment, were excluded.

A further restriction was to consider only studies using the hybrid PET/CT system rather than PET alone, in order to gather more uniform, recent, and accurate data. The abstracts of the papers were read to exclude those evidently outside the aim of the review. The final selection was made by reading the full papers to verify that none of the exclusion criteria were met. Furthermore, we manually searched the cross-references of the identified studies and of relevant reviews in order to complete the literature search. Reviews and editorial letters were not included in the analysis, but their references were also checked.

Findings

Figure 1 represents the literature retrieval workflow according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [33].

Fig. 1 Search strategy and exclusion criteria. Flowchart according to the PRISMA guidelines [33]

After the various steps of the selection process, 21 studies were extracted out of 970 records from PubMed and 1256 from Embase [19,20,21, 25, 26, 34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49]. Twelve studies were from Europe, the others from Asia (6) and North America (3). All but three studies [38, 43, 44] were prospective. Together, the studies analysed a total of 627 patients with NSCLC (mean: 30; range: 6–66). The details of the studies analysed are shown in Tables 1, 2, 3 and 4.

Table 1 FDGint for early response, prognosis and variation vs. time: Changes of 18F–FDG uptake during RT or CRT
Table 2 FDGint for early response, prognosis and variation vs. time: Assessment of NSCLC response to RT or CRT
Table 3 FDGint for early response, prognosis and variation vs. time: Prognostic value during RT or CRT
Table 4 FDGint for adaptive RT in NSCLC

Table 1 gathers five papers concerning “Changes of 18F–FDG uptake during RT or CRT”, which did not study specific relationships with response but simply observed the time trend of 18F–FDG uptake during RT or CRT [25, 26, 34,35,36].

Table 2 gathers three papers concerning the “Assessment of NSCLC response to RT or CRT” [19, 37, 38]. In this category, the Response Evaluation Criteria In Solid Tumours (RECIST) were applied to classify the response.

Table 3 gathers eight papers with the common aim to evaluate the “Prognostic value of FDGint during RT” [20, 39,40,41,42,43,44,45].

Overall, FDGint sensitivity and specificity were reported only in four studies [19, 40, 42, 44], and are compared in Fig. 2 for the different PET parameters analysed.

Fig. 2 Sensitivity, Specificity and AUC of FDGint for different PET parameters reported in four papers [19, 40, 42, 44]. Legend: TR = Tumour Response; TP = Tumour Progression; 2y OS = Overall survival at 2 years; AUC = Area Under the Curve (ROC); COV / ΔCOV = Coefficient Of Variation of SUV and corresponding variation; MTV / ΔMTV = Metabolic Tumour Volume and corresponding variation; SUVmax / SUVmean / ΔSUVmax / ΔSUVmean = maximum SUV / mean SUV and corresponding variations
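The accuracy measures compared in Fig. 2 follow the standard definitions, which the short sketch below illustrates. The counts and the ΔSUVmax values are hypothetical and are not taken from any of the reviewed papers; the function names are likewise illustrative. The AUC is computed here with the rank-based (pairwise) formulation, which is only one of several equivalent ways to obtain it.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def auc(scores_positive, scores_negative):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive case scores higher; ties count one half."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical confusion-matrix counts (responders = positive class).
sens, spec = sensitivity_specificity(tp=8, fn=2, tn=9, fp=3)

# Hypothetical percentage reductions of SUVmax in responders / non-responders.
responders = [60.0, 45.0, 30.0]
non_responders = [35.0, 20.0, 10.0]
area = auc(responders, non_responders)

print(sens, spec, area)
```

With these made-up numbers, eight of the nine responder/non-responder pairs are correctly ordered, so the AUC is 8/9.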

Finally, Table 4 reports the details of five papers [21, 46,47,48,49] focused on the potential of FDGint as a guide for biologically adaptive RT in NSCLC, with possible intensification of CRT.

Considerable heterogeneity is apparent in the stage and histology of the included patients. Eight papers included only stage III patients [20, 21, 25, 39, 41, 44,45,46], three papers also included stage IV patients [19, 40, 48], and the remaining ten papers evaluated stage I-III or II-III patients [26, 34,35,36,37,38, 42, 43, 47, 49]. Treatments were also highly heterogeneous: they consisted of concomitant CRT, of radiotherapy alone, or of both within the same study population. Moreover, in some studies more than half of the patients received induction chemotherapy. 3-D RT was applied in the majority of the studies, whereas Intensity Modulated Radiation Therapy (IMRT) was used in six.

There was no agreement among researchers on the timing of FDGint acquisition, which ranged from 1 to 5 weeks after the start of RT (at a delivered dose varying from 20 to 50 Gy).

The main parameters considered for PET image evaluation were the maximum and the mean values of the Standardized Uptake Value (SUVmax and SUVmean, respectively), the metabolic tumour volume (MTV), the Total Lesion Glycolysis (TLG, calculated by multiplying MTV by SUVmean) and their corresponding variations (ΔSUVmax, ΔSUVmean, ΔTLG, ΔMTV). Follow-up was also very variable, ranging from 1 to 68 months.
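As an illustration of how these parameters relate to one another, the following minimal sketch computes SUVmax, SUVmean, MTV and TLG from an SUV map and a lesion mask, and then the percentage variation between a baseline and an interim scan. The tiny arrays, the voxel volume and the masks are invented for the example, and the Δ values are assumed here to be percentage reductions relative to baseline; the reviewed studies may use other conventions.

```python
import numpy as np


def pet_parameters(suv, mask, voxel_volume_ml):
    """Return SUVmax, SUVmean, MTV (mL) and TLG for one delineated lesion.

    suv  : array of SUV values
    mask : boolean array of the same shape, True inside the lesion
    """
    voxels = suv[mask]
    suv_max = float(voxels.max())
    suv_mean = float(voxels.mean())
    mtv = float(mask.sum() * voxel_volume_ml)   # metabolic tumour volume
    tlg = mtv * suv_mean                        # TLG = MTV x SUVmean
    return {"SUVmax": suv_max, "SUVmean": suv_mean, "MTV": mtv, "TLG": tlg}


def percent_reduction(baseline_value, interim_value):
    """Percentage reduction from baseline (assumed sign convention)."""
    return 100.0 * (baseline_value - interim_value) / baseline_value


# Hypothetical 2 x 2 SUV maps; the masks stand in for a real delineation.
voxel_volume_ml = 0.5
baseline_suv = np.array([[1.0, 8.0], [6.0, 10.0]])
interim_suv = np.array([[1.0, 4.0], [3.0, 5.0]])

baseline = pet_parameters(baseline_suv, baseline_suv >= 4.0, voxel_volume_ml)
interim = pet_parameters(interim_suv, interim_suv >= 2.0, voxel_volume_ml)

delta_suvmax = percent_reduction(baseline["SUVmax"], interim["SUVmax"])
delta_tlg = percent_reduction(baseline["TLG"], interim["TLG"])
print(baseline, interim, delta_suvmax, delta_tlg)
```

In this toy case the lesion volume is unchanged while uptake halves, so both ΔSUVmax and ΔTLG equal 50%.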

Discussion

This review has assembled the findings on NSCLC with the intent to clarify the issues emerging from the use of FDGint in conjunction with RT or CRT. The experience in NSCLC finds a total of 21 papers (Tables 1, 2, 3, 4). Globally, even considering the large heterogeneity, these studies have shown that FDGint is a very promising tool in the evaluation of response and prognosis of NSCLC, as well as for adaptive RT.

From the five papers simply describing the time variation of 18F–FDG uptake during RT or CRT (Table 1), interim PET parameters emerge as capable of detecting early metabolic modifications linked to metabolic response [34, 35] and to longer Progression Free Survival (PFS) in patients with complete metabolic response [36]. In particular, Massaccesi et al. found a decrease of SUVmax and metabolic tumour volume with respect to baseline as early as the third week of CRT [36]. Van Baardwijk et al. demonstrated a different evolution of SUVmax in metabolic responders (no changes during RT) and non-responders (consistent increase at all time points, up to ∼50%) [34]. Giovacchini et al. found a progressive, significant decrease of SUVmax at all time points in metabolic responders [35]. Edet-Sanson et al., who performed FDGint every 14 Gy in order to determine an optimal time window for tumour response [26], observed a linear decrease in SUVmax for tumours and lymph nodes, with a ∼50% decrease around 40–45 Gy. De Ruysscher et al. focused on normal lung uptake and found higher lung SUVmax at 1 and 2 weeks during treatment in six out of 18 patients who later developed Radiation Induced Lung Toxicity (RILT) [25].

The three studies focused specifically on the FDGint assessment of NSCLC response to RT or CRT (Table 2) demonstrated that metabolic changes measured by FDGint were predictive of tumour response after treatment, as established with RECIST criteria [19, 37, 38]. Despite the differences in stages, treatment approaches, and timing of FDGint, these studies were able to distinguish responders from non-responders and identified parameters correlated with response. Kim et al. found that ΔSUVmax and ΔSUVmean were significantly larger in responders, whilst ΔMTV and ΔTLG had no correlation with outcome [38]. Huang et al. (2011) found that not only ΔSUVmax and ΔSUVmean but also ΔMTV were significantly more pronounced in responders [19]. Responses of tumours and lymph nodes during RT correlated with those at 3 months after treatment in the study by Kong et al. [37].

The eight papers concerning the prognostic value of FDGint also verified its capability to identify groups with better PFS and survival rate (Table 3) [20, 39,40,41,42,43,44,45]. Several PET parameters were found to be useful to distinguish responder and non-responder groups: ΔSUVmax of 50% for significant response and survival rate at 1 and 2 years [39]; ΔSUVmean of 15% with statistically different survival (2-year OS of 33% vs. 92%) [40]; absolute SUVmax of ∼5 as the single variable predictive of death or tumour progression at 1 year in multivariate analysis [42]; cut-off values for survival curves of ΔSUVmax, ΔSUVmean and ΔMTV of 37%, 42% and 30%, respectively, with ΔMTV being the only independent prognostic factor for OS [41]; ΔTLG of 15% as the more robust predictor for OS and PFS, with higher ΔTLG correlating with better OS in multivariate analysis for stage IIIB [43]; ΔTLG of 40%–50% for longer PFS [20].

Grootjans et al. evaluated different automatic segmentation algorithms, such as the fixed threshold region (MTV40 and MTV50), iterative relative-threshold level (RTL), signal to background ratio (SBR), and fuzzy locally adaptive Bayesian algorithm (FLAB). The authors found that the sum of the ΔTLG of the tumour and of lymph nodes (ΔTLGs) was significantly associated with PFS and OS, with SBR as the method of choice for calculation of TLG [45]. Dong et al. instead evaluated the early change of metabolic tumour heterogeneity during CRT by textural features, together with SUVmax and MTV. On multivariate analysis, they found that contrast (a textural feature derived from the normalized grey-level co-occurrence matrix) was the only feature with significant independent prognostic value, with Δcontrast% > 70% associated with improved PFS and OS [44].
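For readers unfamiliar with the textural feature mentioned above, the sketch below computes the textbook "contrast" of a normalized grey-level co-occurrence matrix (GLCM), i.e. the sum over grey-level pairs (i, j) of (i - j)^2 p(i, j). It is only a minimal illustration: the choice of a single horizontal neighbour offset, the symmetric counting and the pre-quantized integer input are assumptions, and the quantization and offsets actually used by Dong et al. are not described here.

```python
import numpy as np


def glcm_contrast(image, levels):
    """Contrast of the normalized, symmetric GLCM for horizontal neighbours.

    image  : 2-D integer array with values in [0, levels - 1] and at
             least two columns (already quantized grey levels)
    levels : number of grey levels
    """
    glcm = np.zeros((levels, levels), dtype=float)
    left = image[:, :-1].ravel()
    right = image[:, 1:].ravel()
    # Count each neighbouring pair in both directions (symmetric GLCM).
    np.add.at(glcm, (left, right), 1.0)
    np.add.at(glcm, (right, left), 1.0)
    p = glcm / glcm.sum()                 # normalize to probabilities
    i, j = np.indices(p.shape)
    return float(np.sum((i - j) ** 2 * p))


# A uniform patch has zero contrast; a checkerboard has maximal local change.
uniform = np.ones((3, 3), dtype=int)
checker = np.array([[0, 1], [1, 0]])
contrast_uniform = glcm_contrast(uniform, levels=2)
contrast_checker = glcm_contrast(checker, levels=2)
print(contrast_uniform, contrast_checker)
```

A homogeneous uptake pattern therefore yields low contrast, whereas strong voxel-to-voxel variation yields high contrast.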

On the whole, all these studies showed that FDGint is useful for identifying responder groups and different survival rates. However, the lack of agreed PET parameters is a major limitation, and standardization would be required for best practice.

Biologically adaptive radiotherapy

Biologically adaptive RT is an approach that modifies the initial treatment plan during the course of RT according to interim images. The advantage of this technique resides in accounting for the geometrical and biological variations induced by RT, instead of applying the same baseline characteristics to all fractions. After delivery of a certain dose, the treatment plan is re-optimized, aiming to “boost” the dose to the residual tumour while sparing healthy tissues. Besides, biologically adaptive RT also considers that the target tissues may not have uniform radiobiological characteristics, due to spatial differences in metabolism, radio-resistance, hypoxia, and proliferation. Techniques like IMRT allow the dose distribution to be modulated accordingly, e.g., using the dose-painting-by-numbers strategy, which entails assigning different doses to voxels based on PET voxel intensities. Thus, IMRT is ideal to boost a more 18F–FDG-avid area, or a residual 18F–FDG-avid area, in a tumour mass identified with FDGint [50, 51].

Five papers were found focusing on adaptive RT (Table 4) and reporting proof-of-concept treatment planning studies that used FDGint to design a boost over the remaining RT fractions. However, patient treatment was not modified according to FDGint outcomes.
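The idea of dose painting by numbers can be sketched as a voxel-wise mapping from PET intensity to prescribed dose. The example below assumes a simple linear relation between SUV and dose, clipped between a minimum and a maximum dose; this linear form, the SUV bounds and the dose levels are illustrative assumptions, not the prescription used in any of the reviewed studies.

```python
import numpy as np


def dose_painting_by_numbers(suv, d_min, d_max, suv_low, suv_high):
    """Map each voxel's SUV linearly onto a dose between d_min and d_max.

    Voxels at or below suv_low receive d_min, voxels at or above
    suv_high receive d_max (assumed linear prescription function).
    """
    fraction = (suv - suv_low) / (suv_high - suv_low)
    fraction = np.clip(fraction, 0.0, 1.0)
    return d_min + fraction * (d_max - d_min)


# Hypothetical SUVs of four voxels inside the target, doses in Gy.
suv = np.array([2.0, 5.0, 8.0, 12.0])
dose = dose_painting_by_numbers(suv, d_min=66.0, d_max=78.0,
                                suv_low=2.0, suv_high=10.0)
print(dose)
```

In practice the resulting voxel doses would only be an objective for the planning system, which must still respect normal tissue constraints.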

The study by Gillham et al. considered two hypothetical dose escalation strategies: the standard one (66 Gy + 12 Gy boost to the planning target volume) and one based on the maximal dose that would not exceed the normal tissue constraints. The authors found that the adaptive strategy would result in a modest dose escalation, and would have been feasible in 4 out of 10 patients [46].

More encouraging results were observed by Feng et al., who designed a boost based on FDGint that was feasible in six out of 14 patients, with a mean dose escalation of 58 (30–102) Gy to the target or a 2% reduction in Normal Tissue Complication Probability [47].

A much larger patient population (66 patients) was analysed by Ding et al., who demonstrated that adaptive RT based on FDGint after 40 Gy could spare normal tissues (lung, spinal cord, oesophagus and heart) and potentially allow dose escalation and increased local control [21]. Similar results were obtained by Kelsey et al. in 17 patients submitted to adaptive RT [48].

Finally, the study of Yap et al. demonstrated that the use of volumes from a respiratory-gated 4D PET/CT is feasible to dose escalate primary and nodal disease [49].

The results of these studies show the feasibility of biologically adaptive radiotherapy, with great potential for improving loco-regional control in NSCLC and for better sparing of normal tissues.

Timing and inflammation

The different timing of FDGint and the differences in the treatments performed (RT alone or CRT) have been identified as causes of the discordant radiotracer uptake observed during RT in different investigations [34,35,36]. The timing of FDGint in the papers analysed varied from 1 to 6 weeks (with doses ranging from 14 Gy to 50 Gy).

The hypothesis of van Baardwijk et al. was that differences in time trends of 18F–FDG uptake might reflect a complex relationship between several factors, such as changes in blood flow and extracellular compartment together with intrinsic tumour properties [34].

Other authors highlighted that 18F–FDG uptake after CRT might be due not only to recurrent tumour but also to RT-induced inflammation. This is because, in principle, 18F–FDG is unable to distinguish inflammatory cells from residual viable tumour cells [25, 52].

Examining the issue further, Kong et al. showed that 18F–FDG uptake within irradiated normal lung did not differ significantly between basal 18F–FDG-PET/CT and FDGint, while uptake was higher on the post-CRT scan in 46% of patients [37]. Edet-Sanson et al. confirmed these results, finding that RT-induced inflammation does not preclude an adequate evaluation of FDGint [26]. Giovacchini et al. could differentiate inflammation from tumour uptake based on CT morphology (presence of ground-glass attenuation and fibrous thickening) and uptake patterns, and showed that the RT-induced increase in parenchymal uptake was less frequent at 50 Gy than at 3 months after RT [35].

In this field, De Ruysscher et al. found that higher 18F–FDG uptake in the lung tissue well outside the target, early during radiotherapy, reflects subclinical RILT [25]. This seems in contrast with the paper by Kong et al. [37], but it could be due to the different treatment schedule (twice vs. once daily), timing of FDGint, and definition of the region of interest. The possibility of an early identification of patients at risk to develop RILT would have great clinical value, possibly leading to a therapy switch.

Further analysis is needed in targeted studies with larger populations of patients, homogeneous for stage, FDGint evaluation, and treatments, in order to better define the ideal timing for FDGint. Overall, FDGint acquired at 2 weeks seems a good compromise to avoid the overlapping of tumour response and tissue inflammation.

PET parameters

SUVmax is the most widely used parameter to assess therapy response in PET images. SUVmax is easy to determine, reflects the uptake of the metabolically active regions, and has good intra-observer reproducibility. More recently, the PET Response Criteria in Solid Tumours (PERCIST, 2009) [53] proposed the mean SUV (SUVmean) on a small region as a suitable parameter to assess response of the whole tumour. Moreover, other parameters, such as the Metabolic Tumour Volume (MTV) and Total Lesion Glycolysis (TLG), have shown a prognostic capability in NSCLC [54]. Parameter variation correlated significantly with survival in all studies evaluating the prognostic value of FDGint in NSCLC [20, 39,40,41,42,43]. However, the parameters and cut-offs found to separate responders from non-responders differed between studies.

One of the reasons for this lack of a single agreed parameter might be the non-standardized segmentation of the MTV. As a clear example, Edet-Sanson et al. used three methods for PET volume measurement in FDGint (visual delineation, MTV40, adaptive threshold) and all gave adequate volume measurements during the first week of RT [26]. However, later on during RT, manual delineation appeared to be more reliable, whilst MTV40 and the adaptive threshold method failed in 48% and 9% of lesions, respectively.

PET images often show non-ideal shapes, non-uniform activity distributions, and low contrast. Thus, in the majority of cases “experienced” operators manually delineated the MTV, at the cost of low repeatability and possibly inconsistent outcomes.
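The dependence of MTV on the segmentation rule can be seen with a minimal fixed-threshold sketch, in which a voxel belongs to the volume if its SUV is at least a given fraction of SUVmax (40% for MTV40, 50% for MTV50). The SUV values and voxel volume are hypothetical, and the sketch deliberately omits steps a clinical implementation would need, such as restricting the search to a region of interest and checking connectivity.

```python
import numpy as np


def fixed_threshold_mtv(suv, fraction, voxel_volume_ml):
    """Segment voxels with SUV >= fraction * SUVmax and return (mask, MTV in mL).

    Assumes suv already contains only the region of interest around the lesion.
    """
    mask = suv >= fraction * suv.max()
    return mask, float(mask.sum() * voxel_volume_ml)


# Hypothetical SUVs within a region of interest, 0.5 mL voxels.
roi_suv = np.array([1.0, 3.0, 4.5, 6.0, 10.0])
mask40, mtv40 = fixed_threshold_mtv(roi_suv, 0.40, voxel_volume_ml=0.5)
mask50, mtv50 = fixed_threshold_mtv(roi_suv, 0.50, voxel_volume_ml=0.5)
print(mtv40, mtv50)
```

Here the same lesion measures 1.5 mL with the 40% rule and 1.0 mL with the 50% rule, and both values would shift again if SUVmax fell during treatment, which may help explain why threshold methods become less reliable on later acquisitions.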

Indeed, some authors investigated the impact of different automated delineation strategies. In particular, Grootjans et al. analysed several methods in FDGint and concluded that SBR was the segmentation method of choice for stronger association of TLG with PFS and OS, although also MTV50 and FLAB segmentation were successful [45]. Although not in FDGint, Hatt et al. concluded that fixed (MTV50) and adaptive threshold-based methods should not be used in cases of large heterogeneous NSCLC, while advanced image segmentation algorithms able to deal with heterogeneity should be preferred [55].

Another interesting approach is the analysis of tumour metabolic heterogeneity by textural features, that could correctly predict treatment response and survival of NSCLC patients undergoing CRT, in basal 18F–FDG images and in FDGint as well, allowing to stratify patients in clinical trials [44, 56, 57].

Without claiming to establish the performance of the parameters (only four papers report such an analysis), Fig. 2 offers a first-glance comparison of the sensitivity, specificity, and area under the curve found by different authors with various approaches, and shows promising values in the majority of cases, for all the parameters studied.

Limitations and potential

The major limitation of the reviewed studies is their heterogeneity. Some enrolled a relatively large sample, while others comprised a smaller series of patients. The disease stage also differed, potentially biasing the correlations of the evaluated parameters. Furthermore, the prescription of surgery, different chemotherapy schemes, and different RT schedules could hinder comparison among the studies and/or lower the predictive power of FDGint. Above all, these investigations differed widely in the acquisition time of FDGint, in the reference response assessment (CT-based or PET-based), and in the PET parameters used to correlate with response and/or prognosis.

We acknowledge that the information available in the literature concerning NSCLC is unbalanced as far as the response and prognostic abilities of FDGint are concerned. Parameters of accuracy such as sensitivity, specificity, and AUC are reported only in a few papers [19, 40, 42, 44] (Fig. 2), despite the major interest and the encouraging results for early response and prognosis in lung cancer. In fact, these parameters are higher than 75%–80% in the majority of cases.

Many applications can be explored, for CRT response assessment as well as for adaptive RT.

Valuable learning points have been highlighted as a basis for further research. Future studies should: recruit a high number of patients; be multicentric; apply the same CRT regimen; distinguish the population by different stages of NSCLC [2, 17]; apply the same PET/CT acquisition and quantification methods and the same PET parameters for the evaluation of metabolic tumour response [58, 59]; and have an adequate follow-up. The best timing remains an open question, as does the cut-off value to distinguish responders from non-responders, which may differ for different tumour stages.

With such a variety of factors, translation of these observations into clinical practice, or even into clinical trials, still seems challenging. Therefore, prospective randomized trials are required to investigate whether the results from FDGint are solid enough to guide treatment decisions. We feel that the necessary prerequisites exist for a positive answer to this issue and for a future application of FDGint in NSCLC clinical practice.

Conclusion

The studies included in this review were quite heterogeneous, but they show that the early identification of metabolic tumour response to RT through FDGint could be used as a predictor of response and prognosis in NSCLC patients.

Further steps include the evaluation of the real clinical impact of these results, which is strictly linked to the evolution of treatment strategies in NSCLC, such as the development of adaptive protocols. Molecular imaging could become the guide for the selection of patients who need a closer follow-up to detect and treat a disease relapse earlier, or who deserve more aggressive therapies to improve their outcome. This aim can be reached only through prospective trials with adequate statistical power and follow-up. The review of the literature concerning FDGint in NSCLC supports the possibility of distinguishing responder from non-responder patients, the value of FDGint for predicting response and prognosis, and its use for adaptive RT. 18F–FDG remains a promising and challenging tracer for early outcome assessment and for the identification of non-responder patients early during neoadjuvant CRT.