Introduction

18F-FDG PET/CT (PET) is the recommended imaging modality for the staging and restaging of FDG-avid lymphoma, including aggressive non-Hodgkin’s lymphoma (NHL) [1]. The overall prognosis of patients with diffuse large B-cell lymphoma (DLBCL), the most common subtype of NHL, has improved with the addition of rituximab to chemotherapy, with a 5-year progression-free survival of 60%. Despite this, nearly a third of patients treated with standard immunochemotherapy (R-CHOP: rituximab plus cyclophosphamide, doxorubicin, vincristine, and prednisone) will have refractory disease or relapse [2]. Traditional prognostic factors are of limited value in identifying these high-risk patients. In recent years, new metrics derived from staging PET that estimate overall tumor burden, such as metabolic tumor volume (MTV) and total lesion glycolysis (TLG), have been used to predict progression-free and overall survival in patients with lymphoma, including aggressive NHL [3,4,5,6].

Radiomics is a rapidly evolving research field that refers to the extraction of large volumes of quantitative data from medical images, such as CT or PET, in a minable form, and to their analysis to build predictive models associating texture features with phenotypes, genetic and proteomic signatures, or treatment outcomes [7, 8]. These parameters may capture intratumoral biological heterogeneity associated with cellular and molecular characteristics such as cellular proliferation, fibrosis, necrosis, metabolism, blood flow and angiogenesis, hypoxia, and receptor expression [9]. Texture analysis typically involves accumulating multi-dimensional histograms of image intensities or calculating advanced features such as image frequencies. A large number of nonlinear metrics are then computed from these intensity distributions, measuring properties such as heterogeneity, directionality, and entropy. This yields a large set of features that can subsequently be tested for accuracy in predicting treatment outcomes, even though the physiological underpinnings of these measures remain largely unknown. Typically, each tumor segmented on pre-treatment images is characterized by many dozens of such features, some of which, considered in isolation, have been moderately successful in predicting the success or failure of specific treatments in certain cancers [7,8,9,10]. In lymphoma, whole-body metabolic tumor volumes and tumor texture features, if validated, may enable more accurate risk stratification and tailoring of therapy, reducing toxicity in patients with a favorable prognosis and allowing intensified regimens for patients at high risk of treatment failure.
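As a concrete instance of "accumulating multi-dimensional histograms of image intensities", a gray-level co-occurrence matrix (GLCM) is a 2D histogram of the gray levels of pixel pairs separated by a fixed offset, from which scalar features such as entropy and contrast are computed. The sketch below is a minimal single-offset 2D illustration in plain Python, assuming the image has already been discretized to integer gray levels; the function names are hypothetical, and it is not the LIFEx implementation, which works in 3D, combines several directions, and applies its own resampling and discretization.

```python
from collections import Counter
from math import log2


def glcm(image, offset=(0, 1)):
    """Symmetric, normalized gray-level co-occurrence matrix for a single offset.

    image:  2D list of integer gray levels that have already been discretized.
    offset: (row, col) displacement; (0, 1) pairs each pixel with its right-hand neighbor.
    Returns a dict {(i, j): probability} containing only the non-zero entries.
    """
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1  # count each pair in both orders so that
                counts[(b, a)] += 1  # the resulting matrix is symmetric
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_entropy(p):
    """Shannon entropy (bits) of the co-occurrence probabilities."""
    return -sum(v * log2(v) for v in p.values() if v > 0)


def glcm_contrast(p):
    """Contrast: co-occurrence probabilities weighted by squared gray-level difference."""
    return sum(v * (i - j) ** 2 for (i, j), v in p.items())
```

For a uniform patch such as `[[1, 1, 1], [1, 1, 1], [1, 1, 1]]`, every pair is (1, 1), so entropy and contrast are both zero; for the checkerboard `[[0, 1, 0, 1], [1, 0, 1, 0]]`, only (0, 1) and (1, 0) occur, each with probability 0.5, giving an entropy of 1 bit and a contrast of 1. Higher entropy thus reflects a more heterogeneous spatial arrangement of intensities.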

One barrier to the widespread clinical use of tumor radiomic signatures is whole-body tumor segmentation, especially in lymphoma, where disease can be disseminated. Automated segmentation methods have reduced the time required to segment tumors, especially on PET. However, the software used in the current study extracts texture features only for solitary lesions, so each lesion must be segmented separately. This is challenging and time-consuming, especially for tumors close to metabolically active physiological structures such as the heart or kidneys. The purpose of the current study was to determine whether metabolic tumor parameters and radiomics features can predict response to therapy and outcome in patients with aggressive B-cell lymphoma undergoing staging FDG PET/CT.

Methods

This single-institution retrospective study included patients with biopsy-proven aggressive non-Hodgkin’s lymphoma who underwent baseline staging with 18F-FDG PET/CT (PET) and had clinical and imaging follow-up data available after the end of first-line therapy. Acceptable end-of-therapy imaging was PET for patients with residual masses, and PET or CT (if an end-of-therapy PET was not available) for patients with no residual morphological mass ≥ 1.5 cm [1]. Patients with incomplete clinical and imaging datasets, or in whom disease status could not be confirmed at the end of therapy, were excluded. The study was approved by the institutional ethics review board, and informed consent was waived. Demographic data, histology, disease stage (including bone marrow status), presence of bulky disease (defined as a mass ≥ 6 cm in maximal diameter), therapy regimen, and outcome data, including disease-free survival (DFS) and overall survival (OS), were recorded from the electronic patient charts and the radiology information system.

Standard of reference

Patients in clinical remission with no residual morphological abnormality on CT were considered responders. On PET, response was determined at the end of therapy according to the Lugano classification [1]: patients assigned a Deauville score of 1–3 were considered responders to first-line therapy (complete metabolic response), and those assigned a score of 4 or 5 were considered nonresponders [1].

PET imaging protocol

PET scans were performed in 3D mode on a dedicated in-line PET/CT scanner (Siemens Biograph mCT 40, Siemens Healthcare, Knoxville, TN, USA). Patients were asked to fast for at least 6 h before the examination. Data were acquired 60–70 min after intravenous injection of approximately 5 MBq/kg body weight of FDG (up to 550 MBq). First, a spiral CT scan from the skull base to the upper thighs was obtained with the following parameters: 120 kVp; 40–105 mAs; 5.0-mm reconstructed section thickness; 2.0-mm overlap. On completion of CT, PET scans of the same area were acquired for 3 min per bed position, with 5–7 bed positions per patient. PET was interpreted clinically on a dedicated fused imaging workstation (Thinking Systems, Petersburg, FL, USA).
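Every threshold in this study (SUV3, SUV6, and percentages of SUVmax) is expressed in standardized uptake values, so it may help to see how the weight-based protocol above feeds into an SUV. The sketch below computes the prescribed activity (5 MBq/kg, capped at 550 MBq) and a body-weight SUV. The paper does not state which SUV normalization was used; body-weight SUV, decay correction of the injected activity to scan time using the F-18 half-life, and a tissue density of 1 g/mL are assumptions, and the function names are hypothetical.

```python
from math import exp, log

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def prescribed_activity_mbq(weight_kg, per_kg=5.0, cap=550.0):
    """Weight-based FDG activity as in the protocol: about 5 MBq/kg, capped at 550 MBq."""
    return min(per_kg * weight_kg, cap)


def suv_bw(conc_kbq_per_ml, injected_mbq, weight_kg, minutes_since_injection=0.0):
    """Body-weight SUV = tissue concentration / (decayed injected activity / body weight).

    Assumes a tissue density of 1 g/mL, so kBq/mL and kBq/g are interchangeable.
    The injected activity is decay-corrected from injection to the time of measurement.
    """
    decay = exp(-log(2) * minutes_since_injection / F18_HALF_LIFE_MIN)
    decayed_kbq = injected_mbq * 1000.0 * decay
    return conc_kbq_per_ml / (decayed_kbq / (weight_kg * 1000.0))


dose = prescribed_activity_mbq(70.0)
print(dose, suv_bw(15.0, dose, 70.0), suv_bw(15.0, dose, 70.0, minutes_since_injection=60.0))
```

For a 70-kg patient the protocol gives 350 MBq, and a concentration of 15 kBq/mL then corresponds to an SUV of 3.0 without decay correction; accounting for 60 min of decay lowers the effective dose and therefore raises the SUV for the same measured concentration.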

Tumor segmentation, metabolic tumor volume and texture analysis

For each patient, whole-body tumor segmentation was performed semi-automatically using fixed thresholds, namely an SUV threshold of 3.0 (SUV3) and of 6.0 (SUV6) (Fig. 1), and adaptive thresholds, with total lesion glycolysis volumes (TLGv) measured at 20%, 40%, and 70% of the maximum intensity in the region (TLGv20, TLGv40, and TLGv70, respectively). For tumor texture analysis, an abbreviated segmentation protocol was used: for each patient, between 1 and 3 tumor sites representing the most metabolically active disease areas (the sites with the highest SUV) were contoured on the baseline PET. For analysis, contoured lesions were categorized as having either a residual mass (> 1.5 cm in diameter) or no residual morphological abnormality at the end of therapy. The volume of interest (VOI) of each selected lesion on PET was automatically segmented using an SUV threshold of 3, and the segmentation was corrected manually in consensus by two readers (AP, NT). All radiomics features were extracted with the Local Image Features Extraction (LIFEx) software (http://www.lifexsoft.org) [11]. Texture features included parameters from the gray-level co-occurrence matrix (GLCM; Haralick), neighborhood gray-level difference matrix (NGLDM; Amadasun), gray-level run length matrix (GLRLM; Xu), and gray-level zone length matrix (GLZLM; Thibault), as well as shape (sphericity) and histogram-based indices. A detailed description of the texture parameters can be found at http://www.lifexsoft.org. The conventional and advanced metabolic tumor parameters and radiomics features assessed are summarized in Table 1. All parameters were correlated with response to therapy as defined by the standard of reference, with the presence of a residual mass on end-of-therapy CT, and with DFS and OS.
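The fixed and relative thresholds described above differ only in how the cutoff is chosen: a fixed threshold keeps every voxel above a constant SUV, whereas a relative threshold scales with the region's own SUVmax. The plain-Python sketch below illustrates both on a toy list of voxel SUVs and derives MTV and TLG (MTV × SUVmean) from the resulting mask. The helper names, the 0.064-mL voxel volume (4 × 4 × 4 mm), the use of "≥" at the cutoff, and the treatment of a region as a flat list are assumptions; it does not reproduce LIFEx's 3D region handling or the manual corrections described above.

```python
def segment(suv, mode, value):
    """Indices of voxels in a region that pass a threshold (hypothetical helper).

    mode="fixed":    keep voxels with SUV >= value (3.0 for SUV3, 6.0 for SUV6).
    mode="relative": keep voxels with SUV >= value * SUVmax of the region
                     (0.20, 0.40, 0.70 for the 20%, 40%, and 70% volumes).
    """
    if mode == "fixed":
        cutoff = value
    elif mode == "relative":
        cutoff = value * max(suv)
    else:
        raise ValueError(f"unknown threshold mode: {mode!r}")
    return [i for i, s in enumerate(suv) if s >= cutoff]


def mtv_and_tlg(suv, mask, voxel_ml):
    """MTV (mL) = voxel count x voxel volume; TLG = MTV x SUVmean within the mask."""
    if not mask:
        return 0.0, 0.0
    mtv = len(mask) * voxel_ml
    suv_mean = sum(suv[i] for i in mask) / len(mask)
    return mtv, mtv * suv_mean


# Toy region: SUVs of eight voxels, each assumed to be 4 x 4 x 4 mm (0.064 mL).
region = [1.0, 2.5, 3.5, 5.0, 8.0, 12.0, 4.0, 2.0]
for label, mode, value in [("SUV3", "fixed", 3.0), ("SUV6", "fixed", 6.0),
                           ("20%", "relative", 0.20), ("40%", "relative", 0.40),
                           ("70%", "relative", 0.70)]:
    mask = segment(region, mode, value)
    mtv, tlg = mtv_and_tlg(region, mask, voxel_ml=0.064)
    print(f"{label:>5}: {len(mask)} voxels, MTV = {mtv:.3f} mL, TLG = {tlg:.3f}")
```

In this toy region, SUV3 keeps five voxels while the 70% threshold keeps only the hottest voxel, which shows why volumes obtained with different methods are not interchangeable and must be compared against outcome separately.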

Fig. 1

A 27-year-old man diagnosed with stage IV diffuse large B-cell lymphoma. (a) Coronal PET image showing metabolically active bulky abdominal lymphadenopathy. (b) Same image after tumor segmentation with the LIFEx segmentation tool, using a fixed SUV threshold of 3.0. Each separate confluent disease site is contoured individually and marked with a different color.

Table 1 Parameters evaluated included conventional and advanced metabolic indices, and texture and other features

Statistical analysis

Categorical variables are reported as frequencies and percentages, and continuous variables as means ± standard deviations together with medians. Summary statistics are reported at both the patient and the lesion level. Differences between the “response” and “no response” groups were tested with the chi-square test for categorical variables and with Student’s t test or the Wilcoxon–Mann–Whitney test for continuous variables. Univariable associations of patient-level (response vs. no response) and lesion-level outcomes (no residual mass vs. residual mass) with lesion-related variables were analyzed using the generalized linear mixed model procedure (PROC GLIMMIX) in SAS. Univariate Cox regression analysis was used to identify patient-related predictors of OS and DFS, and odds ratios (ORs) were calculated for response to therapy. For lesion-level data, univariate Cox regression with a cluster option, which accounts for multiple lesions per patient, was used to identify predictors of OS and DFS. Statistical significance was set at a two-sided p value ≤ 0.05. All statistical analyses were performed with SAS v9.4 or R 3.1.3.

Results

Patient population

Ninety-eight consecutive patients with aggressive non-Hodgkin’s lymphoma underwent initial staging with PET between May 2012 and May 2016 at our institution. Sixteen patients were excluded because of incomplete datasets (lack of clinical or imaging follow-up data). The final study cohort consisted of 82 patients with a median age of 55 years (range 18–86), including 40 men (49%) and 42 women (51%). Histologic subtype, disease stage at diagnosis, and treatment received are summarized in Table 2. Radiotherapy was given to 46 patients (56.1%). Forty patients (49%) had bulky disease. Of the entire cohort, 66 patients (80.5%) were responders and 16 (19.5%) were nonresponders with residual disease at the end of therapy (Deauville score 4–5).

Table 2 Distribution of histological subtype, stage at diagnosis, and therapy received

Prediction of response to therapy

Clinical and morphological parameters

Of the clinical parameters evaluated, only the presence of bulky disease at baseline was associated with response: bulky disease was present in 12/16 nonresponders (75%) versus 28/66 responders (42%) (p = 0.026). There was no significant association between response and disease stage (p = 0.53), type of chemotherapy (p = 0.38), or radiotherapy (p = 0.25).
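The bulky-disease comparison is a 2 × 2 contingency table (bulky vs. non-bulky by nonresponder vs. responder), which the chi-square test named in the statistical methods evaluates by comparing observed counts with those expected under independence. The sketch below computes the Pearson statistic from the counts reported above, with and without Yates' continuity correction; the function name is hypothetical, and the text does not state which variant (or whether an exact test) produced the reported p value.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p value). With 1 degree of freedom the upper-tail
    probability is P(X >= x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:  # continuity correction
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    return stat, erfc(sqrt(stat / 2))


# Rows: bulky / non-bulky; columns: nonresponders / responders (counts from the text).
for yates in (False, True):
    stat, p = chi_square_2x2(12, 28, 4, 38, yates=yates)
    print(f"Yates correction={yates}: chi2 = {stat:.2f}, p = {p:.3f}")
```

With these counts the uncorrected statistic is about 5.5 (p ≈ 0.02) and the Yates-corrected statistic about 4.2 (p ≈ 0.04); both are significant at the 0.05 level, but neither is guaranteed to equal the reported p = 0.026, which may reflect a test configuration the text does not specify.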

A total of 211 lesions were evaluated at baseline in 82 patients, with a median of 3 lesions per patient (range 2–5). At the end of therapy, 47/211 lesions (22.3%) had a residual mass measuring ≥ 1.5 cm in diameter on CT. Absence of a residual mass ≥ 1.5 cm on end-of-therapy CT was predictive of response (OR = 4.3; 95% CI 1.2–15.9; p = 0.028).

Metabolic and radiomics parameters

Of the whole-body volumes obtained with the segmentation methods described above, SUV3 and SUV6 were the best predictors of response (Table 3). Univariable generalized linear mixed model analysis showed that conventional lesion indices at baseline, including SUVmin, SUVmean, SUVmax, SUVpeak, and the standard deviation within the ROI (SUVstd), did not predict treatment outcome (Table 4). Similarly, none of the tumor texture features evaluated reliably predicted therapy response (Online Resource 1).

Table 3 Metabolic tumor volumes (in mL) at baseline as predictors of response to therapy
Table 4 Conventional indices at baseline as predictors of response to therapy

Predictors of residual mass on end-of-therapy CT

The most metabolically active lesions were associated with the presence of a residual mass at the end of therapy (OR, 5.64; p < 0.001). Metabolic tumor parameters that correlated with the presence of a residual mass at the end of therapy were SUVmax (OR, 0.97; p = 0.037), SUVpeak (OR, 0.95; p = 0.003), and TLGv (OR per 1000-cc increase, 0.57; p < 0.001). Radiomics features that correlated with the presence of a residual mass at the end of therapy were skewness (OR, 3.65; p = 0.001), entropy (OR, 0.03; p = 0.016), compacity (OR, 0.48; p < 0.001), SRE (OR, 0.18; p = 0.07), SRHGE (OR, 0.22; p = 0.001), LGRE (OR, 0.29; p = 0.001), SZHGE (OR, 0.31; p = 0.018), LGZE (OR, 0.24; p = 0.001), GLNU (OR, 0.81; p < 0.001), and ZLNU (OR, 0.35; p < 0.001).
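Skewness and entropy above, and kurtosis in the survival analysis, describe the shape of the intensity distribution within a VOI. The sketch below shows one standard way to compute these first-order indices from the SUVs inside a VOI. The function name and bin width are hypothetical; the text does not specify whether the reported entropy is histogram-based or GLCM-based, and LIFEx's own conventions (discretization before computing entropy, and whether kurtosis is reported with or without subtracting 3) may differ, so values from this sketch are not directly comparable with the study's.

```python
from collections import Counter
from math import log2


def histogram_indices(values, bin_width):
    """Skewness, kurtosis, and entropy of the SUV distribution inside a VOI.

    Uses population (biased) moments. Kurtosis is the non-excess (Pearson) form,
    so a normal distribution gives about 3. Entropy (bits) is computed after
    grouping the values into bins of width `bin_width`.
    """
    n = len(values)
    if n == 0:
        raise ValueError("empty VOI")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("constant VOI: skewness and kurtosis are undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    bins = Counter(int(v // bin_width) for v in values)
    entropy = -sum((k / n) * log2(k / n) for k in bins.values())
    return skewness, kurtosis, entropy
```

A symmetric set such as [1, 2, 3, 4, 5] has zero skewness, whereas a VOI dominated by low values with a few very hot voxels, such as [1, 1, 1, 1, 10], has positive skewness.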

Predictors of DFS and OS

During the available surveillance period (median 18 months; range 3–54), 67/82 patients (81.7%) were disease-free at the last clinical follow-up. Of the remaining 15 patients, 12 had recurrent disease and 3 had refractory disease. A few GLZLM indices correlated with DFS, namely LZE (p = 0.045), LZLGE (p = 0.039), and GLNU (p = 0.013). Each 1000-unit increase in LZE, 10-unit increase in LZLGE, and 100-unit increase in GLNU was associated with a 4%, 6%, and 8% increase, respectively, in the expected hazard of recurrence.
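The hazard statements above come from univariate Cox models, in which the coefficient β is the change in log hazard per unit of the covariate. Under the model's log-linearity and proportional-hazards assumptions, the hazard ratio for a k-unit increase is exp(kβ), so effects compound multiplicatively: an 8% increase per 100 units of GLNU implies 1.08² ≈ 1.17 (about 17%, not 16%) per 200 units. The sketch below converts the percentages quoted in the text into assumed point-estimate HRs of 1.04, 1.06, and 1.08; the fitted coefficients and confidence intervals are not reported here, and the function names are hypothetical. The same arithmetic applies to the kurtosis result for OS (HR 1.67 per unit).

```python
from math import exp, log


def beta_from_reported_hr(hr, per_units):
    """Per-unit Cox coefficient (log hazard) from an HR reported per `per_units` units."""
    return log(hr) / per_units


def hazard_ratio(beta_per_unit, delta):
    """Hazard ratio for a `delta`-unit increase in the covariate: exp(beta * delta)."""
    return exp(beta_per_unit * delta)


# HRs read off the percentages in the text (assumed point estimates, no CIs).
reported = {"LZE": (1.04, 1000), "LZLGE": (1.06, 10), "GLNU": (1.08, 100)}
for name, (hr, per) in reported.items():
    beta = beta_from_reported_hr(hr, per)
    print(f"{name}: beta = {beta:.6f} per unit; "
          f"HR for {2 * per} units = {hazard_ratio(beta, 2 * per):.4f}")
```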

Six of 82 patients (7.3%) died during the surveillance period. Age, gender, and bulky disease were not significantly associated with OS (p = 0.94, 0.39, and 0.88, respectively). Stage IV disease was associated with a 9.4-fold higher expected hazard of death than stage I–III disease, although this did not reach statistical significance (p = 0.054). Predictors of OS included SUV3 (p = 0.0028), SUV6 (p = 0.012), and TLGv20 (p = 0.03). Kurtosis was the only radiomics parameter that correlated with OS: each one-unit increase in kurtosis was associated with a 67% increase in the expected hazard of death (p = 0.035).

Discussion

Previous reports have suggested that in patients with DLBCL, a high SUVmax on baseline PET may be a significant adverse prognostic factor for both progression-free survival and OS, independent of the International Prognostic Index score [12,13,14]. These studies used different SUVmax cutoffs; for example, Chihara et al. [12] used a cutoff of 30, and Miyazaki et al. [14] a cutoff of 15. When SUVmax was assessed as a continuous variable, we did not find a strong correlation with response to therapy.

Studies have also suggested a correlation between whole-body tumor volume and outcome [15]. Measuring tumor volume on CT is labor-intensive and clinically challenging because lesions are difficult to delineate. With PET, several strategies for semiautomated whole-body tumor contouring have been used, including fixed or relative thresholds, adaptive thresholds, and gradient-based methods (adaptive region growing). Among semiautomated methods, the A41 isocontour (41% of the maximum pixel value) generally corresponds best to the morphological dimensions of the tumor, but only for high tumor-to-background ratios and homogeneous backgrounds [16]. In our study, MTV obtained with a fixed threshold of SUV3 or SUV6 was the best predictor of response to first-line therapy and of overall survival.
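Why a relative isocontour such as A41 depends on the tumor-to-background ratio can be illustrated with a toy 1D simulation: a uniform "tumor" of known width on a uniform background, blurred by a Gaussian point-spread function, then thresholded at 41% of the profile maximum. All parameters (profile length, 20-voxel tumor, σ = 2 voxels, intensity levels) and function names are hypothetical; the model ignores noise, 3D partial-volume effects, and reconstruction, and it is not the method evaluated in [16].

```python
from math import exp


def blurred_profile(length, start, stop, tumor, background, sigma):
    """1D activity profile: uniform 'tumor' on [start, stop) over a uniform background,
    blurred by a Gaussian point-spread function with standard deviation `sigma` (voxels)."""
    truth = [tumor if start <= x < stop else background for x in range(length)]
    radius = int(4 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    norm = sum(kernel)
    profile = []
    for x in range(length):
        # Clamp indices at the borders, i.e. extend the edge values outward.
        acc = sum(w * truth[min(max(x + k, 0), length - 1)] for k, w in zip(offsets, kernel))
        profile.append(acc / norm)
    return profile


def width_above_fraction(profile, fraction=0.41):
    """Number of voxels at or above `fraction` of the profile maximum (A41-style isocontour)."""
    cutoff = fraction * max(profile)
    return sum(1 for v in profile if v >= cutoff)


# The true tumor width is 20 voxels; only the background level changes.
for background in (0.0, 3.0, 5.0):
    prof = blurred_profile(100, 40, 60, tumor=10.0, background=background, sigma=2.0)
    print(f"tumor:background = 10:{background:g} -> A41 width = {width_above_fraction(prof)} voxels")
```

With no background the 41% contour recovers the true 20-voxel width; as the background rises it widens, and once the background exceeds 41% of the peak the contour spills over the entire profile. A fixed SUV cutoff does not rescale with the peak in this way, which is one reason fixed and relative thresholds can behave differently across the heterogeneous disease sites typical of lymphoma.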

Contouring tumor sites in patients with lymphoma poses particular challenges. Disease is often distributed across many nodal and extranodal sites with varying background tissue, including regions of low or high physiological metabolic activity (e.g., a mediastinal nodal mass overlying the left ventricle, or infiltration of the renal parenchyma). Furthermore, disease volume is heterogeneous, with both small-volume and bulky sites, and SUV varies between sites. Because of these inherent challenges, we evaluated an abbreviated segmentation method for texture analysis in which only a few lesions, including the most metabolically active site of disease, are contoured; such a method may be easier to implement in routine clinical work. None of the texture features evaluated was associated with therapy response as determined at the end of therapy, although several features, including entropy, skewness, compacity, and certain GLRLM features, correlated with the presence of a residual mass at the end of therapy. Previous research has suggested that a residual mass > 2 cm on CT in patients with a negative PET after completion of chemotherapy for DLBCL may predict inferior OS and PFS [17]. This may highlight a limitation of using the Lugano classification as the sole surrogate of outcome. Nonetheless, we chose the Lugano classification as the outcome measure in the current analysis because it is the current standard for therapy response assessment and guides further management of patients treated with curative intent. Interestingly, one of the texture features assessed in the current study, GLNU, a parameter measuring the non-uniformity of gray-level intensity values in the image, correlated with both the presence of a residual mass at the end of therapy and DFS: higher GLNU values, indicating greater gray-level non-uniformity, were associated with shorter DFS.
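GLNU, as used in the DFS analysis, is derived from the gray-level zone length matrix (GLZLM), which counts connected zones of equal gray level by their size. The sketch below uses one common unnormalized definition, GLNU = (1/Z) Σᵢ (number of zones with gray level i)² and ZLNU = (1/Z) Σⱼ (number of zones of size j)², where Z is the total number of zones. It operates on a 2D image with 8-connectivity as an assumption; LIFEx works in 3D on resampled, discretized VOIs and its exact normalization should be checked against its documentation, so values differ in scale. Function names are hypothetical.

```python
from collections import Counter


def glzlm(image):
    """Gray-level zone length matrix counts for a 2D image of integer gray levels.

    A zone is a connected group of pixels sharing the same gray level (8-connectivity here).
    Returns a Counter mapping (gray_level, zone_size) -> number of zones.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    zones = Counter()
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            level, size, stack = image[r][c], 0, [(r, c)]
            seen[r][c] = True
            while stack:  # depth-first flood fill over same-level neighbors
                y, x = stack.pop()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and image[ny][nx] == level):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            zones[(level, size)] += 1
    return zones


def glnu_zlnu(zones):
    """Gray-level and zone-length non-uniformity from GLZLM zone counts."""
    n_zones = sum(zones.values())
    per_level, per_size = Counter(), Counter()
    for (level, size), count in zones.items():
        per_level[level] += count
        per_size[size] += count
    glnu = sum(v * v for v in per_level.values()) / n_zones
    zlnu = sum(v * v for v in per_size.values()) / n_zones
    return glnu, zlnu
```

For `[[1, 1, 2], [1, 1, 2], [3, 3, 3]]`, each gray level forms exactly one zone, so GLNU = 1; for `[[1, 0, 1]]`, two of the three zones share gray level 1, raising GLNU to 5/3. GLNU therefore grows as zones concentrate in fewer gray levels.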

The current study has some limitations. Foremost, the study cohort is relatively small, especially given the large number of parameters available in texture analysis. Because we included all eligible patients at our institution, future studies should include data from other centers to validate our findings. As in all texture-based studies, our study is limited by the contouring and lesion selection techniques used. We aimed to use an abbreviated and reproducible method of contouring the PET data with a widely available software package, but this method will require validation in future studies. Furthermore, our texture analysis included data from a single PET/CT scanner and would require validation on data from other equipment before clinical use.

Conclusions

In conclusion, in patients with aggressive NHL, MTV measured with a threshold of SUV3 or SUV6 correlates with response to therapy and OS. Various tumor texture features correlate with the presence of a residual mass on CT, which has previously been shown to be a negative prognostic marker, and other radiomics parameters correlate with DFS and OS. These parameters could serve as prognostic tools if validated in future prospective trials using datasets from multiple institutions to ensure reproducibility across scanners and image reconstruction algorithms. The relative weight of each of these biomarkers in predicting treatment response, and how they compare with other known factors for high-risk disease, remain to be determined.