Introduction

Although thymic epithelial tumours (TETs) are relatively rare, they are the most common primary neoplasms of the anterior mediastinum in adults [1]. The World Health Organization (WHO) classification divides TETs into six histologic subtypes (thymoma A, AB, B1, B2, B3, and thymic carcinoma) according to their morphology and degree of atypia [2]. A further simplified classification has been proposed, comprising low-risk thymoma (LRT; thymoma A, AB, and B1), high-risk thymoma (HRT; thymoma B2 and B3), and thymic carcinoma (TC), and these simplified subgroups have different prognostic implications [3, 4].

The wide spectrum of morphology and oncologic behaviour in TETs makes accurate prognostication critical for their therapeutic management. Currently, management decisions for TETs are largely based on several prognostic factors, including histologic grade and stage. Complete surgical resection is the recommended standard of care and an important prognostic factor in potentially resectable TETs [5], while adjuvant radiation therapy or systemic therapy may also be required depending on risk factors [6]. Neoadjuvant chemotherapy can be considered in advanced disease to make complete resection of the tumour feasible [7]. The choice between open and minimally invasive surgery, as well as the extent of resection, also depends on stage and histologic subtype [8].

Computed tomography (CT) and magnetic resonance imaging (MRI) are the standard imaging modalities for TETs, but they may have limited value for differentiating histologic subtypes because morphologic features overlap substantially across disease entities [9,10,11]. 2-Deoxy-2-[18F]fluoro-D-glucose positron emission tomography/CT (2-[18F]FDG PET/CT), which reflects glycolytic activity, a metabolic hallmark of malignancy [12], has been evaluated for the differentiation of TETs. The maximum standardised uptake value (SUVmax) and texture parameters on 2-[18F]FDG PET/CT have shown favourable results for determining histologic grade [13,14,15]. Moreover, a few studies have shown that SUVmax and volumetric parameters are associated with disease recurrence [16, 17]. We considered that establishing the role of 2-[18F]FDG PET/CT in predicting histologic grade and prognosis would advance the therapeutic management of TETs. Therefore, this study evaluated the diagnostic and prognostic value of pretreatment 2-[18F]FDG PET/CT for determining the simplified WHO classification and for predicting freedom-from-recurrence (FFR) and overall survival (OS) in resectable TETs.

Materials and methods

Study design and subjects

This retrospective single-centre cohort study was approved by our local institutional review board (IRB No: 2021-0076), and the need for informed patient consent was waived because of the study's retrospective nature.

Between January 2012 and December 2018, 161 consecutive patients with TETs underwent pretreatment 2-[18F]FDG PET/CT and surgery at our institution. Of these, seven patients who underwent surgical biopsy only, because complete resection of the tumour was considered unfeasible, were excluded. Histologic classification was performed according to the WHO classification of TETs, and all slides of the surgically resected specimens were evaluated by experienced pathologists. Tumours with mixed histologic types were classified according to the component with the highest grade of malignancy.

PET/CT image acquisition

All patients fasted for at least 6 h before the procedure and had a venous blood glucose level below 150 mg/dl. PET imaging was performed using one of the following scanners: Discovery STe 8, Discovery 690, Discovery 710, Discovery 690 Elite (GE Healthcare), Biograph Sensation 16, or Biograph TruePoint 40 (Siemens Healthineers). Of these, PET/CT images acquired on the Discovery STe 8 (n = 9), Biograph Sensation 16 (n = 5), and Biograph TruePoint 40 (n = 26) scanners were excluded from the quantitative analysis because, compared with the other three scanners, these scanners have lower spatial resolution and different acquisition and reconstruction parameters (i.e., no time-of-flight or point-spread-function modelling), which can lead to significant differences in SUV and image quality. Finally, 114 patients who underwent pretreatment 2-[18F]FDG PET/CT were included in the analysis (Fig. 1).

Fig. 1 Patient inclusion flowchart

Patients were intravenously administered 5.2 MBq/kg of 2-[18F]FDG, and image acquisition began approximately 60 min after injection. PET/CT images were acquired from the skull base to mid-thigh in three-dimensional mode at 2 min per bed position. Data were reconstructed on a 192 × 192 matrix with a voxel size of 2.6 × 2.6 × 3.75 mm using an ordered-subset expectation maximisation algorithm (18 subsets, four iterations, 4.0-mm full-width-at-half-maximum Gaussian smoothing) with time-of-flight and point-spread-function modelling and CT-based attenuation correction. SUV harmonisation among the three aforementioned scanners was ensured by routine dose calibrator quality control (CRC-25 PET; Capintec, Inc.; daily constancy, quarterly linearity, and annual accuracy/precision testing), annual cross-calibration with the same dose calibrator, and quarterly quality control of the recovery coefficients of all hot cylinders in an American College of Radiology (ACR)-accredited PET phantom (Flangeless Esser PET phantom; Biodex Medical Systems, Inc.), measured without any smoothing.
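The dose and decay arithmetic implied by this protocol can be illustrated with a minimal Python sketch. Only the 5.2 MBq/kg dose, the approximately 60-min uptake time, and the 2.6 × 2.6 × 3.75 mm voxel size come from the protocol; the function names, the 70-kg example patient, and the use of the physical half-life of fluorine-18 (about 109.77 min) are illustrative assumptions rather than the authors' procedure.

```python
# Hypothetical helpers; only the 5.2 MBq/kg dose, the ~60-min uptake time and the
# 2.6 x 2.6 x 3.75 mm voxel size come from the protocol described in the text.
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18 (minutes)


def injected_activity_mbq(weight_kg: float, dose_per_kg: float = 5.2) -> float:
    """Weight-based 2-[18F]FDG activity to administer."""
    return dose_per_kg * weight_kg


def decayed_activity_mbq(activity_mbq: float, elapsed_min: float) -> float:
    """Activity remaining after `elapsed_min` minutes of physical decay."""
    return activity_mbq * 2.0 ** (-elapsed_min / F18_HALF_LIFE_MIN)


def voxel_volume_ml(dx_mm: float, dy_mm: float, dz_mm: float) -> float:
    """Voxel volume in mL (1 mL = 1000 mm^3)."""
    return dx_mm * dy_mm * dz_mm / 1000.0


if __name__ == "__main__":
    a0 = injected_activity_mbq(70.0)      # hypothetical 70-kg patient
    a60 = decayed_activity_mbq(a0, 60.0)  # activity when acquisition starts
    print(f"injected {a0:.0f} MBq, {a60:.0f} MBq remaining at 60 min")
    print(f"voxel volume {voxel_volume_ml(2.6, 2.6, 3.75):.5f} mL")
```

For a 70-kg patient this gives 364 MBq injected and roughly 249 MBq remaining when acquisition starts; each reconstructed voxel is about 0.025 mL, the unit from which a voxel-count-based tumour volume is built.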

PET/CT image analysis

The PET/CT parameters of the anterior mediastinal tumours were semi-quantitatively measured in a blinded manner by one experienced nuclear medicine physician (S.H.). SUVs were calculated according to a standard formula using lean body mass, which was calculated from body weight and height [18]. The metabolically active tumour was segmented using an absolute SUV threshold of 2.5 (voxels with SUV ≥ 2.5), as reported in a previous study [1]. SUVmax, metabolic tumour volume (MTV), and total lesion glycolysis (TLG) were derived using Mirada DBX software (version 1.2.0.59; Mirada Medical Ltd). Additionally, the maximum tumour diameter in any direction was measured on axial images of the combined CT.
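The lean-body-mass normalisation and the fixed-threshold segmentation can be sketched in plain Python. The lean body mass formula used here (Janmahasatian) is an assumption, because the specific formula of [18] is not named in the text. The segmentation is also simplified to thresholding the voxel SUVs of a pre-drawn tumour volume of interest (VOI); it does not reproduce the Mirada DBX workflow.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PetMetrics:
    suv_max: float
    mtv_ml: float  # metabolic tumour volume
    tlg: float     # total lesion glycolysis = MTV x SUVmean of the segmented voxels


def lean_body_mass_kg(weight_kg: float, height_cm: float, male: bool) -> float:
    """Janmahasatian lean body mass (ASSUMED formula; the text only cites [18])."""
    bmi = weight_kg / (height_cm / 100.0) ** 2
    if male:
        return 9270.0 * weight_kg / (6680.0 + 216.0 * bmi)
    return 9270.0 * weight_kg / (8780.0 + 244.0 * bmi)


def sul(conc_kbq_per_ml: float, injected_mbq: float, lbm_kg: float) -> float:
    """SUV normalised to lean body mass: C / (A / LBM).

    Assumes a tissue density of 1 g/mL (so kBq/mL over MBq/kg is dimensionless)
    and that C and A are decay-corrected to the same reference time.
    """
    return conc_kbq_per_ml * lbm_kg / injected_mbq


def metabolic_metrics(voi_suvs: Sequence[float], voxel_ml: float,
                      threshold: float = 2.5) -> PetMetrics:
    """SUVmax, MTV and TLG from the SUVs of voxels inside a tumour VOI.

    Voxels with SUV >= threshold form the metabolically active tumour.
    """
    active = [s for s in voi_suvs if s >= threshold]
    mtv = len(active) * voxel_ml
    suv_mean = sum(active) / len(active) if active else 0.0
    return PetMetrics(suv_max=max(voi_suvs), mtv_ml=mtv, tlg=mtv * suv_mean)
```

For example, for a hypothetical 70-kg, 175-cm male the assumed formula gives a lean body mass of about 56 kg. Because TLG equals MTV multiplied by the mean SUV of the segmented voxels, it is simply the voxel volume times the sum of the above-threshold SUVs.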

Statistical analysis

Continuous variables are described as mean and standard deviation or median and interquartile range (IQR), and categorical variables as numbers with proportions (%). Categorical variables were compared using Pearson’s chi-square test or Fisher’s exact test. Continuous variables were compared using one-way analysis of variance (ANOVA) or the Kruskal–Wallis test with post hoc Dunn’s test. Receiver operating characteristic (ROC) curve analysis was used to evaluate diagnostic performance for the different histologic subtypes. Area under the curve (AUC) values and their 95% confidence intervals (CIs) were derived and compared using DeLong’s method [19]. The cut-off value yielding the highest accuracy was chosen as the optimal cut-off.
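The empirical AUC and an accuracy-maximising cut-off can be computed without any library, as in the minimal Python sketch below. The function names are hypothetical, and the rule "positive if value > cut-off" is an assumption consistent with the SUVmax ≤ 6.4 versus > 6.4 grouping used in the Results. The sketch gives point estimates only; confidence intervals and AUC comparisons require DeLong's method.

```python
from typing import Sequence, Tuple


def roc_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Empirical AUC = P(score_pos > score_neg), ties counted as 1/2 (Mann-Whitney form)."""
    wins = 0.0
    for x in pos:
        for y in neg:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_accuracy_cutoff(pos: Sequence[float],
                         neg: Sequence[float]) -> Tuple[float, float]:
    """Cut-off maximising accuracy for the rule 'positive if value > cut-off'.

    Candidates are -inf (everything called positive) and every observed value;
    when accuracies tie, the lowest cut-off is kept (an assumed tie-break).
    """
    n = len(pos) + len(neg)
    best_cut, best_acc = float("-inf"), -1.0
    for cut in [float("-inf")] + sorted(set(pos) | set(neg)):
        tp = sum(1 for x in pos if x > cut)   # positives correctly above the cut-off
        tn = sum(1 for y in neg if y <= cut)  # negatives correctly at or below it
        acc = (tp + tn) / n
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc
```

Maximising accuracy, rather than the Youden index, weights each misclassified patient equally, so the selected cut-off depends on the class balance of the cohort.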

The survival measures used in this study adhere to the standard outcome measures proposed by the International Thymic Malignancy Interest Group (ITMIG) [20]. Following the ITMIG recommendation and considering the indolent behaviour of TETs, FFR was chosen as a putative surrogate oncologic outcome for OS in our study population. FFR was defined as the period from the date of completion of treatment to the date of recurrence. OS was defined as the time from the date of diagnosis to death from any cause. All patients either underwent R0 resection or had no radiologic evidence of disease after resection followed by adjuvant treatment. Patients without events were censored at the last follow-up (March 16, 2021). Survival analyses were performed using univariable and multivariable Cox proportional hazards models. Possible nonlinearities were evaluated by plotting the martingale residuals of the null Cox models against continuous predictors; accordingly, age at diagnosis was dichotomised at the median. Potentially influential observations were evaluated using deviance residuals and dfbeta values. The proportional hazards assumption was tested using Schoenfeld residuals. Multivariable Cox regression with stepwise model selection based on the Akaike information criterion (AIC) was performed on the variables that were significant in the univariable analyses. To avoid multicollinearity, highly correlated variables (i.e., Masaoka and TNM stage; SUVmax, MTV, and TLG) were incorporated into separate multivariable Cox models. Harrell’s concordance index (C-index) was used to compare the discriminatory capacity of the models [21]. Survival curves were estimated using the Kaplan–Meier method and compared using the log-rank test. p values from multiple pairwise comparisons were adjusted using the Benjamini–Hochberg procedure to control the false discovery rate.
Crude or adjusted p values of less than 0.05 were regarded as significant. Statistical analyses were performed using R software (version 3.6.0; R Foundation for Statistical Computing).
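Harrell's C-index, used here to compare the discrimination of the Cox models, can be sketched in plain Python. The risk score would typically be a fitted model's linear predictor; the function name and the handling of tied survival times (such pairs are simply skipped) are simplifying assumptions, and R implementations may treat ties differently.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[bool],
              risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is usable when subject i has an observed event and
    times[i] < times[j]; it is concordant when the earlier failure has the higher
    predicted risk. Risk ties count 1/2; pairs with tied times are skipped.
    """
    concordant, usable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of failure times, so a model with SUVmax can be compared with one without SUVmax by computing C for each model's risk scores on the same patients.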

Results

Patient characteristics

The characteristics of the included patients are presented in Table 1. There were 52 (46%), 33 (29%), and 29 (25%) patients with LRT, HRT, and TC, respectively. Thirty-nine (34%) and 31 (27%) patients had advanced disease (stage III or IV) according to the Masaoka and TNM staging systems, respectively. Neoadjuvant and adjuvant treatments were administered to 9 (8%) and 53 (46%) patients, respectively. The median interval between PET/CT and surgery was 25 days (IQR, 12–43 days). SUVmax, MTV, and TLG differed significantly across the simplified WHO classifications (Fig. 2; all p < 0.001), whereas maximum diameter did not. SUVmax, MTV, TLG, and maximum diameter differed across the Masaoka and TNM stages (Supplementary Fig. 1).

Table 1 Patient characteristics

Fig. 2 Boxplots showing SUVmax (a), MTV (b), TLG (c), and maximum diameter (d) according to the simplified WHO classification. *, **, and *** represent adjusted p values of < 0.05, < 0.01, and < 0.001, respectively
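The adjusted p values for pairwise comparisons follow the Benjamini–Hochberg step-up procedure, which can be sketched in a few lines of Python. The function name is hypothetical; the procedure should match R's `p.adjust(p, method = "BH")`, although the sketch is not the authors' code.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The running minimum keeps the adjusted values monotone in the raw p values, so a smaller raw p value can never receive a larger adjusted p value.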

Diagnostic performance for determination of WHO classification

For differentiating HRT/TC from LRT, SUVmax (AUC = 0.84 [95% CI, 0.76–0.92]) showed good discrimination, with significantly better performance than MTV (p = 0.022; AUC = 0.77 [95% CI, 0.68–0.86]), TLG (p = 0.039; AUC = 0.78 [95% CI, 0.70–0.87]), or maximum diameter (p < 0.001; AUC = 0.50 [95% CI, 0.39–0.61]), as shown in Fig. 3a. Using the optimal SUVmax cut-off value of 4.1, the sensitivity, specificity, and accuracy for differentiating HRT/TC from LRT were 73% (45/62), 92% (48/52), and 82% (93/114), respectively (Supplementary Table 1 and Supplementary Fig. 2a).
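The reported percentages follow directly from the 2 × 2 counts. A short Python check (function name hypothetical) reproduces them from the numerators and denominators reported in this section for both SUVmax cut-offs.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Sensitivity, specificity and accuracy from a 2 x 2 confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# HRT/TC vs LRT, SUVmax cut-off 4.1: 45 of 62 HRT/TC and 48 of 52 LRT classified correctly
print([round(100 * v) for v in diagnostic_metrics(tp=45, fn=17, tn=48, fp=4)])  # [73, 92, 82]
# TC vs LRT/HRT, SUVmax cut-off 6.4: 20 of 29 TC and 82 of 85 LRT/HRT classified correctly
print([round(100 * v) for v in diagnostic_metrics(tp=20, fn=9, tn=82, fp=3)])   # [69, 96, 89]
```

Because accuracy pools both classes, it is dominated by the larger group; with 85 LRT/HRT versus 29 TC patients, the high specificity drives the 89% accuracy despite a sensitivity of 69%.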

Fig. 3 Pairwise ROC curves of PET/CT parameters for HRT/TC vs LRT (a) and TC vs LRT/HRT (b). Representative cases are shown in (c)

For differentiating TC from LRT/HRT, SUVmax (AUC = 0.94 [95% CI, 0.90–0.98]) showed excellent discrimination, with significantly better performance than MTV (p = 0.001; AUC = 0.84 [95% CI, 0.76–0.92]), TLG (p = 0.004; AUC = 0.86 [95% CI, 0.78–0.94]), or maximum diameter (p < 0.001; AUC = 0.62 [95% CI, 0.50–0.74]), as shown in Fig. 3b. The diagnostic performance of SUVmax was significantly higher for differentiating TC from LRT/HRT than for differentiating HRT/TC from LRT (p = 0.021). Using the optimal SUVmax cut-off value of 6.4, the sensitivity, specificity, and accuracy for differentiating TC from LRT/HRT were 69% (20/29), 96% (82/85), and 89% (102/114), respectively (Supplementary Table 1 and Supplementary Fig. 2b). Representative cases are shown in Fig. 3c.
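The pairwise AUC comparisons rely on DeLong's method [19] for correlated ROC curves, since all markers are measured on the same patients. The following is a minimal Python sketch of the structural-components formulation with O(mn) loops. It is an illustration, not the R implementation used by the authors; the function names and the argument layout (scores of both markers aligned by patient) are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _placements(pos: Sequence[float], neg: Sequence[float]) -> Tuple[List[float], List[float]]:
    """DeLong structural components: V10 per positive case, V01 per negative case."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def delong_paired(pos1, neg1, pos2, neg2) -> Tuple[float, float, float, float]:
    """Two-sided DeLong test of AUC1 - AUC2 for two markers on the same subjects.

    pos1[k] and pos2[k] must belong to the same positive case; likewise for negatives.
    Returns (auc1, auc2, z, p).
    """
    v10_1, v01_1 = _placements(pos1, neg1)
    v10_2, v01_2 = _placements(pos2, neg2)
    auc1 = sum(v10_1) / len(v10_1)
    auc2 = sum(v10_2) / len(v10_2)
    m, n = len(pos1), len(neg1)
    var = ((_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
           + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n)
    if var <= 0:
        raise ValueError("variance of the AUC difference is zero")
    z = (auc1 - auc2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return auc1, auc2, z, p
```

The covariance terms capture the pairing: when two markers rank patients similarly, their AUC estimates are positively correlated, which shrinks the variance of the difference and makes the paired test more powerful than comparing two independent AUCs.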

Survival analysis

Eight of the 114 patients were excluded from the survival analysis because of concurrent malignancy (n = 4) or immediate loss to follow-up (n = 4; Fig. 1). The median follow-up durations for FFR and OS were 32 months (IQR, 21–50 months) and 54 months (IQR, 35–76 months), respectively. During follow-up, 18 patients had disease recurrence (one in the mediastinum, one in a lymph node, six in the pleura, four in the lung, two in bone, one in the liver, one in the pararenal space, one in the pleura and lung, and one in the pleura and bone). Eight patients died, including one who died without disease recurrence.

In the univariable Cox model, SUVmax, MTV, TLG, simplified WHO classification, Masaoka stage, TNM stage, and resection margin were significantly associated with FFR (Table 2, Fig. 4a, and Supplementary Fig. 3). In the multivariable analyses, SUVmax was independently associated with FFR (Table 3; adjusted hazard ratio 1.39 [95% CI, 1.22–1.58] for model 1 and 1.33 [95% CI, 1.15–1.52] for model 2), whereas MTV and TLG were not (Supplementary Table 2). The multivariable Cox models including SUVmax outperformed those without SUVmax in terms of both model fit (AIC) and discrimination (C-index). In a post hoc sensitivity analysis, the association between SUVmax and FFR appeared relatively consistent across the WHO classification and clinical stage subgroups (Fig. 5). When the simplified WHO classification groups were further stratified using the optimal diagnostic SUVmax cut-off of 6.4, FFR was clearly stratified: TC patients with SUVmax ≤ 6.4 showed FFR similar to that of the LRT/HRT group, whereas LRT/HRT patients with SUVmax > 6.4 showed FFR similar to that of the TC group (Fig. 4b).
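Because SUVmax entered the Cox models as a continuous variable, the reported adjusted hazard ratio is presumably per one-unit increase in SUVmax (an assumption, as the unit is not stated explicitly). Under the log-linear form of the Cox model, the hazard ratio for a difference of Δ units is the per-unit ratio raised to the power Δ. As an illustration only, comparing two hypothetical patients whose SUVmax values equal the two diagnostic cut-offs (4.1 and 6.4), with all other covariates held equal, model 1 gives:

```latex
\begin{aligned}
\mathrm{HR}(\Delta) &= \exp(\beta\,\Delta) = \mathrm{HR}_{1}^{\Delta},
  \qquad \Delta = 6.4 - 4.1 = 2.3,\\
\mathrm{HR}(2.3) &= 1.39^{2.3} = \exp(2.3 \ln 1.39) \approx \exp(0.757) \approx 2.13,\\
95\%\ \mathrm{CI} &\approx \left[\,1.22^{2.3},\ 1.58^{2.3}\,\right] \approx [\,1.58,\ 2.86\,].
\end{aligned}
```

The interval transforms in the same way because the CI is computed on the coefficient scale and then exponentiated; this scaling holds only if the effect of SUVmax on the log hazard is linear over the range considered.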

Table 2 Univariable Cox proportional hazards model for freedom-from-recurrence and overall survival

Fig. 4 Kaplan–Meier curves of freedom-from-recurrence according to the simplified WHO classification (a) and subgroups divided by the optimal diagnostic SUVmax cut-off of 6.4 (b). The table below the figure shows the adjusted p values between groups determined using the log-rank test
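The Kaplan–Meier estimator and the two-group log-rank statistic behind these curves and comparisons can be sketched in plain Python. Function names are hypothetical, and this is not the R code used in the study. Pairwise p values from such a test would then be adjusted with the Benjamini–Hochberg procedure, as described in the Methods.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate S(t) at each distinct event time."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # subjects censored at t still count as at risk
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times_a: Sequence[float], events_a: Sequence[bool],
                 times_b: Sequence[float], events_b: Sequence[bool]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic with 1 df, p value)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    o_minus_e, var = 0.0, 0.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n_a = sum(1 for u, g in zip(times, in_a) if u >= t and g)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d_a = sum(1 for u, e, g in zip(times, events, in_a) if u == t and e and g)
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:  # hypergeometric variance contribution at time t
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # P(chi-square with 1 df > chi2)
```

The log-rank test weights every event time equally, so it is most sensitive to differences that persist over follow-up, which suits the proportional hazards framework used for the Cox models.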

Table 3 Multivariable Cox proportional hazards analyses of two models for predicting freedom-from-recurrence

Fig. 5 Extended Cox proportional hazards analyses of SUVmax for freedom-from-recurrence according to clinical characteristics. SUVmax was included as a continuous variable in the Cox models

Higher SUVmax, higher Masaoka and TNM stages, and R1 resection were associated with worse OS (Table 2), but multivariable analysis could not be performed because of the small number of events.

Discussion

The current study evaluated the diagnostic and prognostic value of 2-[18F]FDG PET/CT in patients with resectable TETs. Our major findings are as follows: (1) 2-[18F]FDG uptake parameters (SUVmax, MTV, and TLG) differed across the simplified WHO classifications of TETs; (2) SUVmax, MTV, and TLG showed good performance in differentiating LRT, HRT, and TC, with SUVmax showing excellent performance for predicting TC; and (3) SUVmax was an independent predictor of disease recurrence and improved the performance of the prognostic model. As current therapeutic management decisions for TETs are largely based on well-established prognostic factors, including histologic grade and stage, our findings support the use of SUVmax for predicting histologic grade and patient prognosis, which may help in choosing the optimal management for patients with TETs.

Our findings are consistent with previous studies reporting that SUVmax has higher discriminative ability for histologic grade than volumetric or textural parameters [1, 14]. SUVmax performed better in differentiating TC from thymoma than in differentiating HRT/TC from LRT, which has important implications for surgical planning: systematic sampling of supraclavicular and lower cervical lymph nodes is recommended for TC, whereas lymph node evaluation in thymoma is usually limited to the mediastinum [8]. Previous studies indicated that 2-[18F]FDG uptake parameters, including SUVmax, MTV, and TLG, were significantly associated with disease recurrence in resectable TETs [16, 17]. In addition, the metabolic response on 2-[18F]FDG PET/CT is associated with disease progression and survival in patients with unresectable TETs [22,23,24]. Interestingly, SUVmax showed higher diagnostic and prognostic value for TETs than MTV and TLG, which have been reported to have incremental prognostic value over SUVmax in other malignancies [25,26,27]. Previous TET studies also indicated that SUVmax has diagnostic [1, 14] and prognostic [28] value that is higher than, or at least comparable to, that of volumetric parameters. This may suggest that the oncologic behaviour of TETs, which often show highly variable histology with intratumoural heterogeneity, depends largely on the region with the highest degree of malignancy rather than on the tumour as a whole.

To our knowledge, this study included the largest cohort of patients with resectable TETs evaluated with pretreatment 2-[18F]FDG PET/CT to date, and we found that SUVmax had independent and incremental prognostic value in addition to traditional prognostic markers, including stage and WHO classification. Beyond the large cohort, the survival analyses were conducted in line with the ITMIG guidelines [20], without applying data-dependent cut-offs to continuous predictors [29], and with careful checking of the proportional hazards assumption [30, 31], which may help provide robust evidence for the prognostic significance of 2-[18F]FDG PET/CT.

As an imaging biomarker, 2-[18F]FDG PET/CT has several theoretical advantages: (1) PET/CT can provide information preoperatively, whereas histologic grade and stage based on the Masaoka or TNM staging systems can be obtained only after surgery; (2) PET/CT allows nodal and distant staging as well as prediction of primary tumour histology and outcome; (3) PET/CT can non-invasively evaluate the entire tumour, whereas biopsy may not reflect tumour heterogeneity and may increase the risk of tumour implantation [32]; (4) SUVmax is a simple numeric value that is largely independent of the observer, whereas the histologic classification of thymoma is subject to inter-observer variability, with only moderate inter-observer agreement [33, 34]; and (5) pretreatment 2-[18F]FDG PET/CT can also serve as baseline imaging in patients with unresectable TETs who undergo first-line systemic treatment [22,23,24]. On the other hand, an important technical hurdle hampers the widespread use of 2-[18F]FDG PET/CT: quantitative measures are affected by the spatial resolution of the PET scanner, acquisition parameters, and reconstruction protocols. Voxel size and smoothing also strongly affect SUVmax because it is a single-voxel value. Standardised PET acquisition and reconstruction parameters are therefore needed to establish the analytical validity of 2-[18F]FDG PET/CT as a biomarker. Together with such technical efforts, our findings and these potential advantages may bolster the role of 2-[18F]FDG PET/CT as a useful imaging biomarker for TETs in clinical practice.

There are several limitations to our study. First, the study was retrospective. Nevertheless, we included consecutive patients to mitigate selection bias, and given the rarity of TETs, retrospective studies are much more feasible than prospective trials and have the advantage of reflecting routine clinical practice. Second, there were only a small number of OS events, so we could not perform multivariable survival analyses for OS. Considering the generally indolent nature of TETs, FFR may serve as a putative surrogate marker for OS [35]. Third, the semi-quantitative measures of 2-[18F]FDG PET/CT are affected by the scanner and by acquisition and reconstruction protocols. We therefore excluded data from PET/CT scanners with different SUV profiles; nevertheless, caution is required when applying our SUVmax cut-offs to images from other facilities. Fourth, our prediction models for FFR could not be validated because of the limited number of events. Further studies are needed for internal–external validation [36] of our models and for comparison with other available prediction models [5, 37, 38]. Finally, our results cannot be applied to other anterior mediastinal tumours, such as germ cell tumours and lymphoma. However, TETs can, to some extent, be distinguished from these tumours on the basis of demographics, tumour marker profiles, or imaging features.

In conclusion, 2-[18F]FDG PET/CT showed good to excellent performance in differentiating the histologic grades of TETs. SUVmax was an independent prognostic factor for FFR and contributed to an improved prognostic model. 2-[18F]FDG PET/CT can be a useful diagnostic and prognostic imaging biomarker in conjunction with WHO classification and stage, and it can help select the optimal treatment strategy for patients with TETs.