Introduction

Lung cancer is the leading cause of cancer death among both men and women, and is expected to account for 26 % of all female cancer deaths and 28 % of all male cancer deaths in 2013 [1]. Non-small-cell lung cancer (NSCLC) accounts for 80 % of lung cancers [2]. The standard of care for early NSCLC is surgical resection and/or radiation therapy, depending on the patient's eligibility for surgery [3]. For advanced NSCLC, chemotherapy or chemoradiotherapy is the principal treatment modality [4]. Despite standard treatment, overall survival (OS) in NSCLC is very poor even in low-stage disease (50 % in stage IA) and becomes progressively worse with increasing TNM stage (2 % in stage IV) [5, 6]. However, there is no established prognostic marker except TNM stage and performance status [7].

PET/CT using 18F-FDG has become a valuable tool in the differential diagnosis of a solitary pulmonary nodule and a standard modality for staging and monitoring treatment response in lung cancer [8, 9]. To quantify a lesion’s metabolism, maximum standardized uptake value (SUVmax) is widely used in clinical practice. It provides a semiquantitative measure of the normalized concentration of radioactivity in a lesion [10]. In a meta-analysis the European Lung Cancer Working Party for the International Association for the Study of Lung Cancer Staging Project demonstrated the prognostic value of the SUV of the primary tumour in NSCLC [11, 12]. However, SUVmax is not recommended for risk stratification in the 7th edition of the American Joint Committee on Cancer cancer staging manual [13], and is also not considered to be a prognostic biomarker in the National Comprehensive Cancer Network guidelines (version 3, 2014) [14]. The reasons for this are that SUVmax is a single voxel value that may not represent total tumour metabolism [15] and it is not certain whether SUVmax is a reliable independent prognostic marker or whether it provides additional risk stratification over T staging [13, 16].

Instead of SUVmax, metabolic tumour volume (MTV) and total lesion glycolysis (TLG), which are volumetric indices derived from 18F-FDG PET, have been proposed for risk stratification of lung cancer patients [17]. On 18F-FDG PET images, the tumour can be delineated using a specific threshold SUV or other methods such as the gradient or the fuzzy C-means method, and MTV refers to the volume of the delineated tumour [3, 18, 19]. TLG is calculated by multiplying MTV by the mean SUV of all voxels in the MTV, and represents both the degree of 18F-FDG uptake and the size of the tumour, in other words the whole metabolic and volumetric burden of the tumour [10, 20–22]. Growing interest in volumetric indices has led to the development of commercially available tools that enable the rapid and simple measurement of the indices for tumour analysis [20]. In fact, MTV and TLG are considered to be more reliable markers reflecting tumour burden and aggressiveness and are thus better candidates as prognostic markers in a variety of types of malignancy including lung cancer [15, 23, 24]. However, there are also several conflicting results regarding the prognostic value of volumetric parameters in NSCLC [3, 25]. Therefore, we designed a meta-analysis to assess the prognostic value of MTV and TLG in patients with NSCLC.
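The relationship between a fixed SUV threshold, MTV and TLG described above can be illustrated with a minimal sketch. The function name `mtv_and_tlg`, the voxel values, the voxel volume and the choice to count voxels at or above the threshold are all assumptions made for illustration; they are not taken from any of the included studies or from a specific software package.

```python
def mtv_and_tlg(suv_voxels, voxel_volume_cm3, threshold=2.5):
    """Return (MTV in cm3, TLG) for a fixed-SUV-threshold delineation.

    suv_voxels: SUVs of all voxels in the region examined (hypothetical input).
    threshold:  fixed SUV cut-off; 2.5 is used here only as an example.
    """
    # Assumption: voxels at or above the threshold are counted as tumour.
    tumour = [suv for suv in suv_voxels if suv >= threshold]
    if not tumour:
        return 0.0, 0.0
    mtv = len(tumour) * voxel_volume_cm3   # volume of the delineated tumour
    suv_mean = sum(tumour) / len(tumour)   # mean SUV inside the MTV
    tlg = mtv * suv_mean                   # TLG = MTV x mean SUV
    return mtv, tlg


# Hypothetical SUVs for six voxels of 0.5 cm3 each.
suvs = [1.0, 2.0, 3.0, 4.0, 5.0, 8.0]
mtv, tlg = mtv_and_tlg(suvs, voxel_volume_cm3=0.5)
print(mtv, tlg)
```

In this toy example four voxels exceed the threshold, so a larger or more avid lesion raises TLG through either factor, whereas SUVmax would reflect only the single hottest voxel.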

Materials and methods

Data search and study selection

We performed a systematic search of MEDLINE (inception to November 2013) and EMBASE (inception to November 2013) for English language publications using the keywords “positron emission tomography”, “lung”, and “volume.” All searches were limited to human studies. Inclusion criteria were: 18F-FDG PET used as an initial imaging tool; studies limited to NSCLC; volume measurement of lung cancer; patients who had not undergone surgery, chemotherapy, or radiotherapy before the 18F-FDG PET scan; and articles that reported survival data. Reviews, abstracts, and editorial material were excluded. Two authors conducted the searches and screening independently. Any discrepancies were resolved by consensus.

Data extraction and quality assessment

Data were extracted from the publications independently by two reviewers (K. Pak and H.J. Im) and the following information was recorded: first author, year of publication, country, study design, number of patients, TNM staging, treatment, and endpoints. Three reviewers scored each publication according to a quality scale, which was based on that used in previous studies [11, 26]. This quality scale was grouped into four categories: scientific design, generalizability, analysis of results, and PET reports (Supplementary Table 1). A value of 0, 1 or 2 was attributed to each item. Each category had a maximum score of 10 points.

Statistical analysis

We followed the same methodology as used in our previous study [27]. The primary outcome was event-free survival (EFS). Disease-free survival, recurrence-free survival and progression-free survival were obtained as primary outcomes and newly defined as EFS, which was measured from the date of initiation of therapy to the date of recurrence or metastasis [28]. The secondary endpoint was OS, defined as the time from initiation of therapy until death from any cause. The relationships between MTV and TLG and survival were measured in terms of the hazard ratio (HR) effect size. Survival data were extracted using the methodology suggested by Parmar et al. [29]. We extracted a univariate HR estimate and 95 % confidence intervals (CIs) directly from each study, if provided by the authors. Otherwise, p values of the log-rank test, 95 % CI, number of events and number at risk were extracted to estimate the HR indirectly. Survival rates on Kaplan-Meier curves were read using Engauge Digitizer version 3.0 (http://digitizer.sourceforge.net) to reconstruct the HR estimate and its variance, assuming that patients were censored at a constant rate during follow-up. An HR greater than 1 implied worse survival in patients with a high MTV or TLG, whereas an HR less than 1 implied a survival benefit in patients with a high MTV or TLG. Heterogeneity between studies was assessed in terms of the χ² test and the I² statistic, as described by Higgins et al. [30]. Funnel plots were used to assess publication bias graphically [31]. Survival data in relation to SUVmax were also extracted from the same studies included in this meta-analysis. P values less than 0.05 were considered statistically significant. Data from each study were analysed using Review Manager (RevMan, version 5.2; The Nordic Cochrane Centre, The Cochrane Collaboration, 2012, Copenhagen).
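The extraction and pooling steps described above can be sketched as follows. This is a simplified illustration, not the RevMan implementation: the function names are hypothetical, the indirect route uses the commonly quoted approximation of the log-rank variance from the total number of events and the group sizes, and the study values in the usage example are invented.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # normal quantile for a 95 % CI (about 1.96)


def log_hr_from_ci(hr, lower, upper):
    """Direct route: ln(HR) and its standard error from a reported 95 % CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.log(hr), se


def log_hr_from_logrank(p_two_sided, events, n_high, n_low, high_is_worse=True):
    """Indirect route: approximate ln(HR) and SE from a log-rank p value.

    Assumes the log-rank variance V can be approximated from the total
    number of events and the two group sizes; the direction of the effect
    must be supplied by the reader of the paper.
    """
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)
    v = events * n_high * n_low / (n_high + n_low) ** 2
    o_minus_e = z * math.sqrt(v)
    if not high_is_worse:
        o_minus_e = -o_minus_e
    return o_minus_e / v, 1 / math.sqrt(v)


def pool_fixed(log_hrs, ses):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I2."""
    weights = [1 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_hrs)) / total
    se_pooled = math.sqrt(1 / total)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "hr": math.exp(pooled),
        "ci": (math.exp(pooled - Z95 * se_pooled),
               math.exp(pooled + Z95 * se_pooled)),
        "q": q,
        "i2_percent": i2,
    }


# Hypothetical studies: two report HR with CI, one only a log-rank p value.
studies = [
    log_hr_from_ci(2.0, 1.0, 4.0),
    log_hr_from_ci(2.8, 1.5, 5.2),
    log_hr_from_logrank(0.05, events=100, n_high=50, n_low=50),
]
result = pool_fixed([y for y, _ in studies], [se for _, se in studies])
print(result)
```

Working on the log scale keeps the confidence intervals symmetric, which is why each HR is log-transformed before weighting and exponentiated only at the end.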

Results

Study characteristics

The electronic search identified 507 articles. After excluding 24 articles in languages other than English, 233 conference abstracts, and 113 studies that did not meet the inclusion criteria based on title and abstract, and reviewing the full text of 57 articles, 13 studies including 1,581 patients were eligible for inclusion in this study (Fig. 1). All 13 studies were of a retrospective design. We excluded whole-body MTV or TLG data from this meta-analysis. Either MTV [32–34] or TLG [2] was measured in four studies, and both [3, 24, 25, 35–40] were measured in nine studies. The volume of interest (VOI) was defined as the primary lung cancer lesion. Four threshold methods were adopted to segment the VOI across the studies. A fixed SUV of 2.5 [2, 24, 25, 33, 34, 38, 40] or 7 [32] was used in eight studies. The gradient segmentation method was applied in two studies [3, 36], and 50 % of SUVmax was used in two studies [35, 39]. In one study, a threshold was determined using mediastinal background average SUV plus 2 standard deviations [37]. In each study, patients were divided into two groups (high and low volume) based on cut-off values. To determine cut-off values, receiver operating characteristic analysis was applied in six studies [25, 33, 34, 38–40], median values in four studies [3, 32, 35, 36], maximally selected rank statistics in two studies [24, 37], and maximization of the profile partial likelihood in one study [2]. The cut-off values of MTV ranged from 0.3 to 68.3 cm³ and those of TLG ranged from 9.6 to 525. Visual inspection of the funnel plot suggested no evidence of publication bias. Study characteristics are summarized in Tables 1 and 2.

Fig. 1 Flowchart for the identification of eligible studies

Table 1 Studies included in the meta-analysis
Table 2 PET protocols of included studies

Primary outcome: EFS

EFS was analysed based on eight studies investigating the prognostic value of MTV. The combined HR for adverse events was 2.71 (95 % CI 1.82 – 4.02, p < 0.00001). There was significant heterogeneity (χ² = 15.82, p = 0.03; I² = 56 %). Eight studies investigating the prognostic value of TLG were included in the second analysis of EFS. Using a fixed-effect model, the pooled HR was 2.43 (95 % CI 1.95 – 3.02, p < 0.00001; I² = 0 %), indicating that tumours with a high TLG are associated with progression and recurrence. Forest plots of the HR in studies investigating the prognostic value of MTV and TLG are presented in Figs. 2 and 3.
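The link between the reported χ² value and I² can be checked with a short calculation, assuming the standard definition I² = (Q − df)/Q with df equal to the number of studies minus one. The helper name `i_squared` is hypothetical; the inputs are the MTV values for EFS reported above (χ² = 15.82, eight studies).

```python
def i_squared(chi2, n_studies):
    """I2 (in percent) from the heterogeneity chi-squared statistic (Cochran's Q)."""
    df = n_studies - 1
    if chi2 <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (chi2 - df) / chi2) * 100


mtv_efs_i2 = i_squared(15.82, 8)
print(round(mtv_efs_i2))  # rounds to the reported 56 %
```

An I² of 0 %, as reported for TLG, corresponds to a χ² value no larger than its degrees of freedom.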

Fig. 2 Forest plots of hazard ratios for events in studies investigating the prognostic value of MTV. Hazard ratios for events in individual studies together with the pooled result are shown (error bars 95 % CI, SE standard error)

Fig. 3 Forest plots of hazard ratios for events in studies investigating the prognostic value of TLG. Hazard ratios for events in individual studies together with the pooled result are shown (error bars 95 % CI, SE standard error)

Subgroup analyses were performed in relation to tumour delineation method, cut-off value, and TNM stage. For each of these three variables, eligible studies were divided into two subgroups. Two studies [36, 39] included patients with stage I to IV disease, and thus were excluded from the subgroup meta-analysis of TNM stage. Each subgroup analysis showed a significant HR for events (Table 3).

Table 3 Subgroup analyses of volumetric parameters of 18F-FDG PET

Secondary outcome: OS

The survival analysis was based on seven studies investigating the prognostic value of MTV. The combined HR was 2.31 (95 % CI 1.54 – 3.47, p < 0.0001; χ² = 18.97, p = 0.004; I² = 68 %; Fig. 4). Six studies investigating the prognostic value of TLG were included in the analysis of OS. The pooled HR for death was 2.49 (95 % CI 1.94 – 3.18, p < 0.00001; Fig. 5). There was no evidence of significant heterogeneity (I² = 28 %, χ² = 6.99, p = 0.22). Subgroup meta-analyses in relation to cut-off value, tumour delineation method, and TNM stage were performed. Each subgroup analysis showed a significant HR for death (Table 3).

Fig. 4 Forest plots of HR for death in studies investigating the prognostic value of MTV. HRs for death in individual studies together with the pooled result are shown (error bars 95 % CI, SE standard error)

Fig. 5 Forest plots of HR for death in studies investigating the prognostic value of TLG. HRs for death in individual studies together with the pooled result are shown (error bars 95 % CI, SE standard error)

Combined SUVmax data

Survival data were extracted from studies investigating the value of SUVmax in predicting EFS (seven studies) and OS (four studies). The HR for adverse events was 2.12 (95 % CI 1.30 – 3.47, p = 0.003) with significant heterogeneity (χ² = 16.59, p = 0.01; I² = 64 %). The pooled HR for death was 1.2 (95 % CI 1.05 – 1.38, p = 0.008) with significant heterogeneity (χ² = 10.89, p = 0.01; I² = 72 %; Table 4).

Table 4 Pooled hazard ratios of 18F-FDG PET parameters

Discussion

In the present meta-analysis, the prognostic value of volumetric indices from 18F-FDG PET in NSCLC patients was evaluated by analysing the HR for EFS and OS in patients with high MTV and/or TLG compared to those with low MTV and/or TLG. The pooled results showed that patients with high MTV had a 2.71-fold higher risk of adverse events and a 2.31-fold higher risk of death than patients with low MTV. Patients with a high TLG had a 2.35-fold higher risk of adverse events and a 2.43-fold higher risk of death. HRs of MTV and TLG for OS were higher than those of SUVmax for OS without overlapping 95 % CI (Table 4). In addition, SUVmax was not a significant prognostic factor either for EFS (three of seven studies) or for OS (two of four studies) in most studies included in this meta-analysis. In contrast, a single study [3] (1 of 13 studies) showed that MTV and TLG cannot predict EFS and OS. However, for adverse events, we could not confirm if pooled HRs of MTV and TLG are higher than that of SUVmax because of overlapping 95 % CIs (Table 4).
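The comparison of confidence intervals described above can be reproduced as a simple overlap check. This is only an informal sketch: the helper `intervals_overlap` is hypothetical, the intervals are those reported in the Results, and non-overlap of two 95 % CIs is a conservative rule of thumb rather than a formal test of the difference between two pooled HRs.

```python
def intervals_overlap(ci_a, ci_b):
    """True if two closed intervals (lower, upper) share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Pooled 95 % CIs as reported in the Results section.
os_ci = {"MTV": (1.54, 3.47), "TLG": (1.94, 3.18), "SUVmax": (1.05, 1.38)}
efs_ci = {"MTV": (1.82, 4.02), "TLG": (1.95, 3.02), "SUVmax": (1.30, 3.47)}

for name in ("MTV", "TLG"):
    print(name, "OS overlaps SUVmax:", intervals_overlap(os_ci[name], os_ci["SUVmax"]))
    print(name, "EFS overlaps SUVmax:", intervals_overlap(efs_ci[name], efs_ci["SUVmax"]))
```

For OS, the lower bounds for MTV and TLG lie above the upper bound for SUVmax, whereas for EFS the intervals overlap, which matches the distinction drawn in the text.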

MTV and TLG are combined volumetric and metabolic parameters that reflect both properties of the tumour. More precisely, MTV is affected by tumour size and the distribution of the SUV and TLG is affected by MTV and also SUV. Also, SUV itself can vary according to blood glucose level, fasting time, uptake time, and methods of attenuation correction and reconstruction. We reviewed these factors in the studies included using a quality assessment form (Supplementary Table 1). In the quality assessment of the PET studies, four studies scored 5/8 (62.5 %) and the other nine studies scored 6/8 (75 %). In all studies, blood sugar levels were determined and imaging was done when the patient had a blood sugar level lower than their upper limit (blood sugar range 120 – 200 mg/dL). Fasting time was also well documented in all studies except one [36], and ranged from 4 to 8 h. Uptake time after injection of 18F-FDG was well reported in all studies and ranged from 45 to about 60 min, except in one study with an uptake time of 84 ± 32 min [36] (Table 2). The procedure for measuring SUV was acceptable in all studies except one which had relatively long uptake periods with a wide range [36]. However, exclusion of this study did not affect the pooled HRs: the HR for OS in patients with a high MTV changed from 2.31 (95 % CI 1.54 – 3.47) to 2.33 (95 % CI 1.48 – 3.66), and the HR for OS in patients with a high TLG changed from 2.49 (95 % CI 1.94 – 3.18) to 2.48 (95 % CI 1.92 – 3.21).

Although an SUVmax threshold of 2.5 is widely used for tumour delineation, Abelson et al. [32] found in their patient population that an SUVmax threshold of 7 was better than a threshold of 2 or 4 for predicting prognosis. Thus, to find specific cut-off MTV and TLG values for a worse prognosis for further research, the measurement of SUV should be well controlled and the SUV for tumour delineation should also be standardized. However, regardless of the method of tumour delineation or the MTV and TLG cut-off values selected in each study, high values of MTV and TLG were associated with a higher risk of adverse events and/or death.

The search for previous meta-analyses evaluating the utility of PET or PET/CT in lung cancer identified 20 articles (Table 5). Of these 20 studies, 15 evaluated PET for detecting lymph node metastasis [41–49] or distant metastasis [50–55], two evaluated the accuracy of PET for diagnosing a solitary pulmonary nodule [56, 57], one determined the predictive value of PET after neoadjuvant therapy [58], and two evaluated PET for determining disease-free survival and OS using the HR effect size [11, 12]. In a meta-analysis, Berghmans et al. [11] determined the prognostic value of SUVmax in NSCLC patients. These authors subsequently conducted another meta-analysis [12] which showed that SUVmax was associated with a 2.08-fold higher risk of death (95 % CI 1.69 – 2.56), which is similar to the pooled HR found in the current study (2.33, 95 % CI 1.51 – 3.61), even though there was no overlap in the studies between the two meta-analyses. In one study in patients with advanced stage NSCLC, high SUVmax was not a significant risk factor [16]. This might be explained by the fact that if the cancer becomes advanced, SUVmax can neither represent the whole tumour burden nor predict prognosis. Interestingly, the subgroup analysis in this study according to TNM stage showed that both MTV and TLG were significant risk factors for EFS and OS in patients with stage I/II and III/IV NSCLC.

In 11 of the included studies multivariate analysis was performed using the Cox proportional hazards model [2, 24, 25, 33–35, 37–40] or logistic regression model [36] to evaluate the independence of MTV and TLG as prognostic markers with covariates including TNM stage and/or tumour size. Of seven studies in which multivariate analysis for EFS was performed [23, 25, 31, 33, 39, 45, 52], three [25, 38, 40] of six [25, 33, 35, 38–40] and four [2, 35, 38, 40] of seven showed that MTV and TLG, respectively, are independent prognostic markers for EFS. On the other hand, SUVmax was found to be an independent prognostic marker in only one study [40] of seven [2, 25, 33, 35, 38–40]. Of six studies in which multivariate analysis was performed for OS, five of five [24, 25, 34, 36, 37] and three [2, 24, 37] of five [2, 24, 25, 36, 37] showed that MTV and TLG, respectively, are independent prognostic markers for OS. However, SUVmax was found to be an independent prognostic marker in only one study [37] of six [2, 24, 25, 34, 36, 37]. These results indicate that, unlike SUVmax, MTV and TLG might be independent prognostic markers regardless of TNM stage and tumour size. However, since the results are heterogeneous and all included studies had a retrospective design, a further large-scale prospective study is warranted to assess whether MTV and TLG could be independent prognostic factors for clinical outcome.

Table 5 Previous meta-analyses of 18F-FDG PET in patients with lung cancer

Heterogeneity was detected in the present meta-analysis. In pooled data, significant heterogeneity was found for MTV in predicting EFS [38] and OS [34], and thus a random-effects model was used to derive a pooled HR. In each analysis of the value of MTV, the studies that contributed to heterogeneity were identified [34, 38]. The study by Yan et al. [34] was the only study that used PET rather than PET/CT in the analysis of the value of MTV in predicting EFS, and the study by Yoo et al. [38] was the only study that included only patients with stage IV lung cancer. Excluding these two studies reduced the heterogeneity (I² from 56 % to 42 % for EFS, and from 68 % to 11 % for OS), with an HR of 2.34 (95 % CI 1.64 – 3.34) for EFS and 2.64 (95 % CI 1.99 – 3.50) for OS.
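How a random-effects model widens the pooled estimate under heterogeneity can be sketched with the DerSimonian-Laird estimator. This is an illustrative assumption: the text does not state which estimator was used beyond the RevMan software, the function name `pool_random_dl` is hypothetical, and the three study inputs are invented rather than taken from the included studies.

```python
import math


def pool_random_dl(log_hrs, ses, z=1.96):
    """DerSimonian-Laird random-effects pooling of ln(HR) estimates.

    Returns (pooled HR, CI lower, CI upper, tau2), where tau2 is the
    estimated between-study variance on the log scale.
    """
    w = [1 / se ** 2 for se in ses]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_hrs))
    df = len(log_hrs) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1 / (se ** 2 + tau2) for se in ses]
    sw_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_hrs)) / sw_star
    se_pooled = math.sqrt(1 / sw_star)
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled),
            tau2)


# Hypothetical, deliberately heterogeneous studies (HR 1.2, 2.5 and 4.0).
log_hrs = [math.log(1.2), math.log(2.5), math.log(4.0)]
ses = [0.2, 0.2, 0.2]
hr, lower, upper, tau2 = pool_random_dl(log_hrs, ses)
print(hr, lower, upper, tau2)
```

When the studies agree, tau2 is estimated as zero and the result coincides with the fixed-effect estimate; removing outlying studies, as done above, lowers tau2 and narrows the interval.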

This is the first meta-analysis investigating the prognostic value of volumetric parameters in patients with lung cancer; however, the study had several limitations. We were unable to determine an optimal cut-off value to categorize volumetric parameters as high or low. Different cut-off values and delineation strategies, and various histological methods were applied in the studies, which might have affected the occurrence of events and survival. Further studies with data from individual patients are needed to determine standard cut-off values and delineation methods for predicting prognosis using volumetric PET parameters. Although we found that patients with a high MTV or TLG had a higher risk of adverse events or death than patients with a low MTV or TLG, there was difficulty in interpreting the HRs for MTV and TLG because exact incidence rates for the events were unknown. Further prospective studies are needed which also include incidence rates. The included studies were all retrospective in design and thus the results could have been underpowered. There was a single study with a prospective design, but we could not extract survival data [7]. A publication bias cannot be excluded even though funnel plots showed no clear evidence of it. In addition, language bias could have been present because articles in languages other than English were excluded. In addition, although two reviewers independently extracted data from each study, the complete accuracy of the data could not be ensured by the strategy.

Conclusion

Volumetric parameters from 18F-FDG PET are significant prognostic factors for outcome in patients with NSCLC. Patients with a high MTV or TLG are at higher risk of adverse events or death. In addition, volumetric parameters may be used as incremental predictors of EFS rather than SUVmax even in patients with advanced NSCLC.