Introduction

The gastrointestinal tract is the most commonly involved extranodal site of non-Hodgkin’s lymphoma (NHL). Intestinal NHL (I-NHL) is far less frequent than gastric lymphoma and accounts for 30–40% of primary gastrointestinal lymphomas [1, 2]. A multicenter, retrospective analysis of 581 patients showed that non-Hodgkin’s lymphoma of the small and large intestines was predominantly of the diffuse large B-cell lymphoma (DLBCL) subtype (66.4%) [3]. Combined treatment with surgery and CHOP (cyclophosphamide, doxorubicin, vincristine, and prednisolone) or R-CHOP (rituximab plus CHOP) chemotherapy was reported to provide better survival outcomes for patients with primary intestinal DLBCL (PI-DLBCL) than chemotherapy alone [4]. However, surgical resection of the primary mass remains a topic of debate because DLBCL itself can be cured by chemotherapy, and some PI-DLBCL patients have been observed to respond well to chemotherapy without surgical tumor resection [5, 6]. To date, the appropriate role of chemotherapy versus surgical tumor resection in the therapeutic strategy for PI-DLBCL has not been defined. Therefore, an accurate risk stratification method is urgently needed to tailor therapies for individual patients.

A growing body of literature shows that quantitative FDG PET/CT indices, such as metabolic tumor volume (MTV) or total lesion glycolysis (TLG), prior to treatment can be used to risk-stratify patients with nodal DLBCL and may be valuable in guiding clinical and therapeutic decisions [7,8,9]. However, evidence that the quantitative metrics are able to predict survival in patients with primary gastrointestinal DLBCL remains limited due to the scarcity of research on this subject [10, 11]. Moreover, gastric and intestinal NHL are often discussed together in most studies but differ significantly from each other in clinical features, pathology, treatment and prognosis [2, 12, 13]. The published data and recommendations concerning the role of 18F-FDG PET in PI-DLBCL are still lacking. Therefore, in the current study we explored the value of 18F-FDG PET/CT for prognostic stratification in patients with newly diagnosed PI-DLBCL treated with an R-CHOP-like regimen.

Patients and methods

Patients

We conducted a retrospective study of patients with newly diagnosed PI-DLBCL between January 2010 and May 2019 who underwent baseline whole-body FDG PET/CT. The main inclusion criteria were histological confirmation of PI-DLBCL and treatment with an R-CHOP-like regimen with curative intent. Patients were excluded if they were treated surgically. Patients with a previous malignancy, prior chemotherapy or radiotherapy, pregnancy or lactation, or diabetes mellitus with a fasting blood glucose level greater than 150 mg/dL were also excluded. Clinical parameters [sex, age, B symptoms, Eastern Cooperative Oncology Group Performance Status (ECOG PS), International Prognostic Index (IPI), and LDH level] were collected from the medical records. Disease stage was defined according to the Ann Arbor classification as modified by Musshoff [14, 15]. The NCCN-IPI, which has a maximum of 8 points, was evaluated; patients with a score of 0–1 were categorized as low risk, 2–3 as low-intermediate, 4–5 as high-intermediate and 6–8 as high risk [16]. For the purpose of this study, the four NCCN-IPI risk groups were dichotomized into a low-risk NCCN-IPI group (comprising low- and low-intermediate-risk patients) and a high-risk NCCN-IPI group (comprising high-intermediate- and high-risk patients). Approval was obtained from the Ethics Committee of Nanjing Drum Tower Hospital, the Affiliated Hospital of Nanjing University Medical School.
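The NCCN-IPI grouping and its dichotomization described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the computation of the NCCN-IPI score itself (from age, LDH, stage, extranodal sites and performance status) is assumed to have been done beforehand.

```python
def nccn_ipi_risk_group(score: int) -> str:
    """Map an NCCN-IPI score (0-8) to the four risk groups stated in the text."""
    if not 0 <= score <= 8:
        raise ValueError("NCCN-IPI score must be between 0 and 8")
    if score <= 1:
        return "low"
    if score <= 3:
        return "low-intermediate"
    if score <= 5:
        return "high-intermediate"
    return "high"


def dichotomize_nccn_ipi(score: int) -> str:
    """Collapse the four groups into the two groups used in this study."""
    group = nccn_ipi_risk_group(score)
    # Low and low-intermediate -> low-risk; the remaining two -> high-risk.
    return "low-risk" if group in ("low", "low-intermediate") else "high-risk"
```

For example, a patient with an NCCN-IPI score of 3 falls in the low-intermediate group and is analyzed as low-risk, whereas a score of 4 is analyzed as high-risk.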

PET/CT scan protocol

All patients underwent whole-body 18F-FDG PET/CT on a combined Gemini GXL PET/CT scanner with a 16-slice CT component (Philips Corp, the Netherlands). After six hours of fasting (no oral or intravenous fluids containing dextrose or other sugars), 185–370 MBq of 18F-FDG (5.18 MBq/kg) was administered intravenously. Each patient’s blood glucose level was checked immediately before 18F-FDG administration, and each patient was weighed prior to each scan for standardized uptake value (SUV) determination. Whole-body PET/CT scans (from the top of the head to the upper thigh) were started sixty minutes after radiopharmaceutical injection. The CT scan was performed at 80 milliampere-seconds (mAs) and 150 kilovolt peak (kVp) with a slice thickness of 3.75–5 mm. For the FDG-PET scan, 2-min emission acquisitions per field of view were obtained in three-dimensional mode. PET scans were reconstructed in a 128 × 128 matrix with an ordered-subset expectation maximization (OSEM) iterative reconstruction algorithm and attenuation correction based on CT data. The acquired PET and CT images were sent for image registration and fusion using Syntegra software.

Imaging analysis

PET/CT images were read by two nuclear medicine physicians who were blinded to all patient information, including the patient’s clinical condition. Results were determined by consensus between the two physicians. Images were reviewed using the volume viewer software on a dedicated workstation (CompassView 5.0, Philips Corp, the Netherlands) to calculate SUV, MTV and TLG. Two-dimensional regions of interest (ROIs) were placed manually to cover the lesion, and a volumetric ROI was then produced automatically by the software. Finally, the ROI was adjusted manually to fully encase all involved lesions in axial, coronal and sagittal PET/CT images. When a polylobular or extremely extensive lesion with multiple hypermetabolic tumor foci was present, the ROI of each hypermetabolic tumor focus was drawn separately. PET parameters such as SUVmean, SUVmax and MTV of each lesion were produced automatically by the software. Lesion boundaries were delineated automatically with the 41% SUVmax threshold method recommended by the European Association of Nuclear Medicine, because of its high interobserver reproducibility previously described in lymphoma [17]. The ROI was modified to include pathological lesions and to exclude normal organs and false-positive lesions (e.g., inflammation, infection or other benign FDG-avid lesions). When the location of lesions or the level of intestinal 18F-FDG uptake could not be determined on PET/CT, endoscopy and contrast-enhanced abdominal CT results were used for reference. The total MTV (TMTV) was obtained by summing the MTV of all lesions. TLG was calculated for each patient as the sum over all lesions of MTV × SUVmean. SUVmax values were obtained and corrected for body weight using the standard formula: SUV = mean ROI activity (MBq/mL) / [injected dose (MBq) / body weight (kg)].
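The volumetric quantities defined above can be illustrated with a minimal sketch. It is not the workstation software's implementation: the function names are hypothetical, each lesion is assumed to be available as a flat list of voxel SUVs inside its manually adjusted ROI together with a voxel volume in cm³, and the SUV helper assumes activity in kBq/mL, dose in MBq and weight in kg (so that the unit factors cancel for tissue of density 1 g/mL), with no decay correction.

```python
def suv_body_weight(activity_kbq_per_ml, injected_dose_mbq, body_weight_kg):
    """Body-weight SUV: tissue activity divided by (injected dose / body weight).

    Assumed units: kBq/mL, MBq and kg; decay correction is not modeled here.
    """
    return activity_kbq_per_ml / (injected_dose_mbq / body_weight_kg)


def lesion_metrics(voxel_suvs, voxel_volume_cm3, fraction=0.41):
    """Apply a fixed-fraction SUVmax threshold (41% by default) to one lesion ROI."""
    suv_max = max(voxel_suvs)
    # Keep only voxels at or above the threshold fraction of SUVmax.
    inside = [v for v in voxel_suvs if v >= fraction * suv_max]
    mtv = len(inside) * voxel_volume_cm3      # metabolic tumor volume, cm^3
    suv_mean = sum(inside) / len(inside)      # mean SUV within the thresholded volume
    return {"suv_max": suv_max, "suv_mean": suv_mean, "mtv": mtv, "tlg": mtv * suv_mean}


def patient_metrics(lesions):
    """Aggregate per-lesion results into patient-level SUVmax, TMTV and TLG."""
    return {
        "suv_max": max(l["suv_max"] for l in lesions),
        "tmtv": sum(l["mtv"] for l in lesions),
        "tlg": sum(l["tlg"] for l in lesions),
    }
```

Because TLG is summed lesion by lesion, each lesion contributes its own SUVmean weighted by its own volume, rather than a single patient-wide mean SUV being applied to the total volume.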

Tissue microarrays and immunohistochemistry

Three representative cores (0.6 mm) of formalin-fixed, paraffin-embedded tissue (FFPE) from each case were used to build tissue microarrays (TMAs). Immunohistochemical staining was performed on 4-μm sections using standard procedures. GCB and non-GCB phenotypes were defined using the decision tree established by Hans and colleagues with indicated cut-offs [18].

Treatment and follow-up period

All patients had been treated with four to eight cycles of a standard-dose R-CHOP-like regimen. Follow-up evaluation consisted of history, physical examinations, endoscopy, abdominal ultrasound, whole-body FDG PET/CT (not in all patients), MRI scans of the head and CT scans of the neck, thorax, abdomen, and pelvis (if necessary). Patients were examined every 3 months for at least 2 years and twice a year afterwards.

Statistical methods

Progression-free survival (PFS) and overall survival (OS) were chosen as the end points to evaluate the prognosis of PI-DLBCL patients. PFS was defined as the interval between the date of diagnosis and the date of first relapse, progression, death from any cause, or last follow-up examination. OS was defined as the interval from the date of diagnosis until the time of death from any cause or last follow-up. Survival functions were estimated using the Kaplan–Meier method and were compared using log-rank tests. PET and clinical variables, as well as prognostic scores (including the IPI and NCCN-IPI), that were significant in the univariate analysis were entered into a multivariate analysis using the Cox proportional hazards model. Receiver operating characteristic (ROC) curves were constructed to determine the optimal cut-off values for SUVmax, TMTV and TLG and to estimate their accuracy. All statistical analyses were performed using SPSS 22.0, and P values less than 0.05 were considered significant.
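To make the survival estimate concrete, the following is a minimal sketch of the Kaplan–Meier (product-limit) estimator. It only illustrates the method and is not the SPSS procedure used in the study; the function name and the toy data in the usage note are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up time for each patient (e.g., months from diagnosis)
    events: 1 if the end point occurred at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= n_leaving
    return curve
```

With follow-up times of 3, 5, 5, 8 and 12 months and event indicators 1, 0, 1, 1 and 0, the estimated survival steps down to 0.8, 0.6 and 0.3 at 3, 5 and 8 months; the censored patient at 12 months does not produce a step.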

Results

Patient characteristics and treatment results

The clinical characteristics of the 73 patients (27 women and 46 men) included in the study are summarized in Table 1. After a median follow-up of 20 months (range 3–117 months), 33 patients had disease relapse or progression, and 26 patients died.

Table 1 Clinical and imaging characteristics of the study population

Clinical characteristics of patients in relation to TMTV and TLG

The differences in clinical characteristics between the dichotomized TLG groups are shown in Table 2.

Table 2 MTV and TLG in relation to patient clinical parameters

ROC curve analysis of SUVmax, TMTV and TLG

In the present study, ROC curve analysis was used to identify the optimal cut-off values for separating patients into low and high SUVmax groups, low and high TMTV groups, and low and high TLG groups. The estimated area under the ROC curve (AUROC) was 0.656 for SUVmax, 0.784 for TMTV and 0.794 for TLG. The optimal cut-off values for SUVmax, TMTV and TLG were 25.3, 211.1 cm³ and 1559.8, respectively (Fig. 1).
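As an illustration of how a cut-off and an AUROC can be derived from a continuous PET parameter, the sketch below scans every observed value as a candidate cut-off. The criterion used here, maximizing the Youden index (sensitivity + specificity − 1), is an assumption: the text does not state which criterion defined the optimal cut-offs. The function names and the binary outcome coding (1 = event, 0 = no event, ignoring censoring) are also illustrative.

```python
def roc_points(values, labels):
    """Return (cut-off, sensitivity, false-positive rate) for each observed value.

    A patient is classified as high-risk when value >= cut-off.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for cut in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= cut and y == 1)
        fp = sum(1 for v, y in zip(values, labels) if v >= cut and y == 0)
        points.append((cut, tp / n_pos, fp / n_neg))
    return points


def youden_cutoff(values, labels):
    """Pick the cut-off maximizing sensitivity - false-positive rate (assumed criterion)."""
    return max(roc_points(values, labels), key=lambda p: p[1] - p[2])[0]


def auroc(values, labels):
    """AUROC as the probability that an event case has a higher value than a non-event case."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    # Ties between an event and a non-event case count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUROC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is the scale on which the values reported above (0.656 to 0.794) should be read.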

Fig. 1 Analysis of ROC curves to determine whether SUVmax, MTV or TLG was the better predictor of survival

Univariate and multivariate analysis

The mean PFS was 60.5 months (95% CI 46.7–74.2, range 3–117 months), and the mean OS was 71.4 months (95% CI 57.9–85.0, range 3–117 months). The PFS and OS estimates for all patients were 54.8% and 64.4%, respectively. The univariate analyses are shown in Table 3 and Fig. 2. High NCCN-IPI, non-GCB as well as high TLG were significantly correlated with both inferior PFS and OS. NCCN-IPI (P = 0.001), non-GCB (P = 0.037) and TLG (P = 0.025) remained statistically independent predictors of PFS, and both TLG (P < 0.001) and NCCN-IPI (P = 0.016) were statistically independent predictors of OS after multivariate analysis (Table 4).

Table 3 Univariate analysis of factors predictive of PFS and OS

Fig. 2 Kaplan–Meier plots for PFS and OS in all patients in relation to TLG (< 1559.8 vs. ≥ 1559.8), cell-of-origin (GCB and non-GCB) and NCCN-IPI score (low-risk score 0–3 vs. high-risk score 4–8). PFS (a) and OS (b) in relation to TLG; PFS (c) and OS (d) in relation to the cell-of-origin; PFS (e) and OS (f) in relation to the NCCN-IPI score

Table 4 Multivariate analysis of factors predictive of survival

Grading system to predict PFS and OS in PI-DLBCL

The grading system was based on the number of risk factors identified in the univariate and multivariate analyses (high TLG, non-GCB, high NCCN-IPI), and patients were divided into 4 risk groups (PFS: χ² = 33.858, P < 0.001; OS: χ² = 29.435, P < 0.001): a low-risk group (none of the 3 risk factors, 18 patients); a low-intermediate risk group (1 risk factor, 24 patients); a high-intermediate risk group (2 risk factors, 16 patients); and a high-risk group (all 3 risk factors, 15 patients) (Fig. 3a). Kaplan–Meier survival curves displayed the differences among these 4 risk groups, and the grading system discriminated among subgroups more strongly than the NCCN-IPI alone (Fig. 3b, c and Table 5).
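The grading rule amounts to counting risk factors, which can be sketched as below. The function and constant names are hypothetical; the thresholds follow the values reported in this paper (TLG ≥ 1559.8 and NCCN-IPI score 4–8 counted as high).

```python
GRADING_GROUPS = {
    0: "low",
    1: "low-intermediate",
    2: "high-intermediate",
    3: "high",
}


def grading_group(tlg, is_non_gcb, nccn_ipi_score, tlg_cutoff=1559.8):
    """Count the three risk factors and return (count, risk group)."""
    risk_factors = sum([
        tlg >= tlg_cutoff,        # high TLG
        bool(is_non_gcb),         # non-GCB cell-of-origin
        nccn_ipi_score >= 4,      # high-risk NCCN-IPI (score 4-8)
    ])
    return risk_factors, GRADING_GROUPS[risk_factors]
```

A patient with a TLG of 2000, a non-GCB phenotype and an NCCN-IPI score of 5 carries all three risk factors and is assigned to the high-risk group, whereas a patient with none of them is assigned to the low-risk group.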

Fig. 3 a Illustration of the grading system using maximal intensity projection on FDG-PET images. b Kaplan–Meier curve of progression-free survival (PFS) according to the grading system. c Kaplan–Meier curve of overall survival (OS) according to the grading system

Table 5 Comparing the NCCN-IPI with the grading system based on TLG, cell-of-origin and the NCCN-IPI

Discussion

The combined treatment of rituximab with CHOP (R-CHOP) was readily adopted as a standard treatment for DLBCL and shown to achieve a significant improvement in the prognosis of patients with nodal DLBCL [19, 20]. However, the published data based on PI-DLBCL are still limited. PI-DLBCL, as a subtype of DLBCL with the extranodal presentation, has significantly different molecular and clinical characteristics from nodal DLBCL, suggesting that they should be regarded as separate entities [21,22,23]. Therefore, we aimed to explore the value of 18F-FDG PET/CT for prognostic stratification in patients with PI-DLBCL treated with an R-CHOP-like regimen in the current study.

Quantitative parameters of 18F-FDG PET/CT (MTV and TLG) have been shown to be useful for prognostic evaluation in patients with nodal DLBCL [9]. TLG on baseline PET also appeared to be a powerful predictor of outcome in patients with extranodal DLBCL of primary mediastinal [24], central nervous system [25] and testicular [26] origin. In the present study, we addressed the prognostic value of TLG on pretreatment 18F-FDG PET/CT in PI-DLBCL patients. The results indicated that patients with TLG greater than 1559.8 had poorer survival, with TLG being an independent predictor of survival outcomes in multivariate analysis. This finding is inconsistent with that of Alagöz et al., who reported that TLG was not a predictive marker in primary gastrointestinal lymphoma [11]. The discrepancy may be partially explained by the fact that Alagöz et al. analyzed gastric and intestinal DLBCL patients together, although the two groups differ significantly in prognosis [12]. Interestingly, TMTV was also evaluated but was not found to be an independent predictor in our study. A possible explanation is that TLG, obtained by multiplying SUVmean by MTV, combines tumor volume and metabolic activity in a single measure and can thereby contribute to patient management [27].

In recent years, with the progress of molecular genetic research, molecular tumor heterogeneity has been shown to correlate directly with treatment response and prognosis, and it will be crucial to the development of individualized risk-adapted therapy [28]. The cell-of-origin phenotype has been demonstrated to be a strong prognostic biomarker in DLBCL, with the non-GCB type being associated with a dismal prognosis [18, 29, 30]. In our study, the non-GCB type was significantly associated with inferior PFS (HR 3.400, P = 0.004) and OS (HR 2.830, P = 0.026) and was an independent predictor of PFS. These results confirm that the cell-of-origin phenotype is a powerful tool for predicting survival outcomes in PI-DLBCL patients.

Following the introduction of rituximab in the treatment of patients with DLBCL, the prognostic power of the IPI was reduced and questioned [31,32,33]. Given the pressing need for newer strategies to better subcategorize DLBCL patients in the rituximab era, the NCCN-IPI was proposed by Zhou et al. in 2014 and discriminates low- and high-risk subgroups better than the IPI [16]. In the current study, Kaplan–Meier curves for PFS and OS were plotted with patients stratified into 2 subgroups based on the IPI and the NCCN-IPI, respectively. Patients in the 2 NCCN-IPI groups showed distinct clinical outcomes (PFS: HR 3.219, P = 0.001; OS: HR 3.515, P = 0.002), whereas patients subcategorized by the IPI exhibited a less discriminatory survival pattern (PFS: HR 2.891, P = 0.002; OS: HR 2.723, P = 0.011). Moreover, the NCCN-IPI was shown to be an independent predictor of survival outcomes in multivariate analysis. Our results confirm that the NCCN-IPI is a more robust and useful prognostic tool for stratifying PI-DLBCL patients.

Prior studies have demonstrated that integrating molecular indices or quantitative PET parameters with the IPI or NCCN-IPI could yield better stratification of DLBCL patients [34, 35]. Other studies have reported that combining molecular and PET imaging metrics at diagnosis could lead to more accurate patient selection [36, 37]. To establish a risk stratification model for PI-DLBCL patients, we included the factors that showed significant prognostic value in multivariate analysis. The resulting grading system, which incorporates clinical and laboratory information (NCCN-IPI), tumor volume and metabolism (TLG), and molecular profile (cell-of-origin), stratified patients into groups defined by the number of risk factors more effectively than the NCCN-IPI alone.

This study has several limitations. First, our analysis is a single-center retrospective study with a limited number of patients; multicenter prospective studies with larger cohorts are needed to reduce potential bias. Second, differences in the threshold used to delineate the tumor when calculating TMTV and TLG might result in inconsistencies among studies. While an SUV ≥ 2.5 threshold was reported to be easier to use in practice [38], it is more prone to variability across reconstruction protocols or PET/CT systems than the 41% SUVmax threshold recommended for tumor imaging by the European Association of Nuclear Medicine [17]. To date, the proper method for calculating volumetric parameters is still under debate and warrants further research. Finally, the summed volumes obtained from manual ROI positioning may be affected by operator-dependent systematic error.

In summary, our study focused on the prognostic value of PET/CT quantitative parameters in PI-DLBCL. The results of our study suggest that NCCN-IPI, non-GCB as well as TLG could be prognostic factors of PI-DLBCL. The grading system based on NCCN-IPI, non-GCB and TLG could more accurately predict the prognosis of patients and guide treatments.