Dear Sir,

Given the absence of a recognized gold standard for measuring metabolic tumor volume (MTV) on FDG PET, it is important to discuss the strengths and weaknesses of the different methods used in diffuse large B-cell lymphoma (DLBCL).

In their retrospective series of 147 patients, Ilyas et al. [1] tested three fixed thresholding methods: SUV ≥ 2.5, SUV ≥ 41% of the SUVmax, and a liver-uptake-dependent threshold as suggested in PERCIST. They confirmed the strong prognostic value of baseline MTV regardless of the method used, consistent with previous findings [2, 3]. These results deserve further comment.
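To make the three rules concrete, the following toy sketch applies them to the same set of voxels. Everything in it is hypothetical: the voxel SUVs, the liver sample, the 4 × 4 × 4 mm voxel size and the helper names `mtv` and `liver_threshold`. The liver rule is written in a PERCIST-style form (1.5 × mean liver uptake + 2 × its standard deviation); PERCIST defines it on lean-body-mass-normalized uptake (SUL), and whether [1] used exactly this form is an assumption.

```python
from statistics import mean, stdev

VOXEL_ML = 0.064  # hypothetical 4 x 4 x 4 mm voxel = 0.064 cm3

def mtv(suvs, threshold, voxel_ml=VOXEL_ML):
    """Metabolic volume (cm3) of the voxels whose SUV reaches the threshold."""
    return sum(1 for s in suvs if s >= threshold) * voxel_ml

def liver_threshold(liver_suvs):
    """PERCIST-style liver-based threshold: 1.5 x mean + 2 x SD of liver uptake."""
    return 1.5 * mean(liver_suvs) + 2 * stdev(liver_suvs)

# Hypothetical SUVs of the voxels inside one operator-defined VOI
voi = [1.8, 2.2, 2.6, 3.1, 4.0, 5.5, 7.9, 10.2, 12.0, 9.1, 6.3, 3.4]
# Hypothetical SUVs sampled in a normal-liver reference region
liver = [2.0, 2.1, 1.9, 2.2, 1.8]

thresholds = {
    "SUV >= 2.5": 2.5,
    ">= 41% SUVmax": 0.41 * max(voi),
    "liver-based": liver_threshold(liver),
}
for name, t in thresholds.items():
    print(f"{name:15s} threshold {t:5.2f} -> MTV {mtv(voi, t):.3f} cm3")
```

On these made-up voxels the three rules keep 10, 6 and 8 voxels respectively, i.e. the same VOI yields three different MTVs, which is why volumes from different methods cannot be compared directly.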

The median MTV reported in this study with the 41% SUVmax method is surprisingly low (165 cm3) and differs markedly from that obtained with the 2.5 method (~590 cm3). Although 70% of the patients had advanced-stage disease, this median is much lower than those reported in previous DLBCL studies using the 41% SUVmax method: 258 cm3 [4], 315 cm3 [5], 320 cm3 [6] and 373 cm3 [7]. Using a threshold based on liver SUV, Kostakoglu et al. [8] reported a median of 336 cm3 in 1,334 DLBCL patients.

Ilyas et al. used either in-house or commercial software. VOIs were obtained automatically for the 2.5 and 41% thresholds after the operator had selected the tumor sites with a single click per region. Although the algorithms used are not described, the volume was most likely determined by region-growing techniques whose result depends both on the threshold and on the position of the click [9, 10]. Figure 6 of [1] clearly shows that a single VOI or click for all mediastinal nodes excludes a large part of the tumor. The authors state that they manually added further volumes, but they do not report how many VOIs were drawn per patient, which makes the results difficult to interpret. The manual selection of VOIs before segmentation has a major impact on the delineated metabolic volumes and must be defined carefully when the 41% method is applied; this dependence is a weakness of the method. Furthermore, the results of Ilyas et al. were obtained from scans acquired 90 min post-injection and may not hold for scans acquired at the recommended 60 min post-injection.
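The effect of VOI placement on the 41% rule can be reproduced on a toy 2D image. The sketch below is not the software used in [1]: it assumes a rectangular VOI, 4-connected region growing from the clicked voxel, and a threshold of 41% of the SUVmax found inside that VOI. The image, the hypothetical function `grow` and the two adjacent "nodes" (SUVmax 20 and 6) are invented for illustration.

```python
from collections import deque

# Hypothetical SUV image: a hot node (columns 1-3) touching a cooler node (columns 4-6)
IMAGE = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 12, 20, 12, 5, 6, 5, 1],
    [1, 12, 16, 12, 5, 5, 5, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

def grow(image, seed, box, fraction=0.41):
    """Region growing from seed inside box; threshold = fraction * SUVmax of the box."""
    r0, r1, c0, c1 = box  # inclusive row/column bounds of the VOI
    suvmax = max(image[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1))
    threshold = fraction * suvmax
    if image[seed[0]][seed[1]] < threshold:
        return set()
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (r0 <= nr <= r1 and c0 <= nc <= c1
                    and (nr, nc) not in region
                    and image[nr][nc] >= threshold):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# One VOI and one click covering both nodes: threshold 0.41 * 20 = 8.2
single = grow(IMAGE, seed=(1, 2), box=(0, 3, 0, 7))
# One VOI and one click per node: each node gets its own 41% threshold
split = grow(IMAGE, (1, 2), (0, 3, 0, 3)) | grow(IMAGE, (1, 5), (0, 3, 4, 7))
print(len(single), len(split))  # 6 12
```

With a single VOI, the cooler node (SUVmax 6) falls below 41% of the hot node's SUVmax and is lost, halving the volume; with one VOI per node, both are kept. This mirrors the mediastinal example of Figure 6 of [1] only qualitatively.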

The 2.5 method, favored by the authors, also has several severe limitations. First, because the limited spatial resolution of PET systems causes a partial volume effect, the apparent activity of a tumor region depends on its volume, and no single absolute threshold can accurately estimate tumor volume regardless of the tumor's size and uptake [11]. Figure 6 of [1] and the figure of a recent editorial [12] clearly show that peripheral background around the tumor, i.e. non-tumor regions, is included in the tumor volume when the 2.5 threshold is used. The necrotic part of a bulky mass with high uptake could also be included in the volume as a result of the partial volume effect. In addition, the authors report a 95% confidence interval of 313.2 cm3 when the same observers used two different software applications, which is about 30% of the reported mean MTV (~990 cm3) and demonstrates the strong influence of the practical implementation of the 2.5 threshold. The limits of agreement of a Bland-Altman plot are centred on the mean difference; since the two implementations yield very similar means, the limits should be centred around 0, but they are not. Moreover, a median difference close to zero in the Bland-Altman plot (Figure 1 of [1]) does not mean that the two implementations agree well, as the authors state; it is rather the mean difference, also close to zero (Table 1 of [1]), that shows there is no systematic difference between them, whereas agreement itself must be judged from the width of the limits of agreement. Finally, recent advances in PET reconstruction, such as modeling of the point spread function of the imaging system, can substantially increase SUVs, which calls into question the relevance of a 2.5 threshold established many years ago, when PET images had poorer spatial resolution and contrast [13].
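The partial volume argument can be checked with a one-dimensional simulation. The sketch assumes a uniform lesion 10 voxels wide on a background of SUV 1.0, blurred by a Gaussian point spread function with a standard deviation of 2 voxels (truncated at 3 sigma); all values and the helper names are hypothetical and chosen only to show the direction of the bias of an absolute 2.5 threshold.

```python
import math

def gaussian_kernel(sigma):
    """Discrete, normalised Gaussian PSF truncated at 3 sigma (an assumed PSF model)."""
    radius = int(3 * sigma)
    raw = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def blurred_profile(length, start, width, uptake, background, sigma):
    """1D activity profile of a uniform lesion after convolution with the PSF."""
    truth = [uptake if start <= i < start + width else background for i in range(length)]
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    out = []
    for i in range(length):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            # voxels beyond the ends of the line are assumed to contain background
            acc += w * (truth[j] if 0 <= j < length else background)
        out.append(acc)
    return out

def apparent_width(profile, threshold=2.5):
    """Number of voxels at or above an absolute SUV threshold."""
    return sum(1 for v in profile if v >= threshold)

# Same true width (10 voxels), two different uptakes
for uptake in (3.0, 20.0):
    profile = blurred_profile(50, 20, 10, uptake, 1.0, sigma=2.0)
    print(uptake, apparent_width(profile))
# 3.0 8
# 20.0 16
```

For the same true width, the low-uptake lesion is underestimated (8 voxels) while the high-uptake lesion is overestimated (16 voxels) because blurred activity spills into the surrounding background above SUV 2.5. Changing `sigma` (a crude stand-in for a change in reconstruction resolution) also changes the result, in line with the concern about newer reconstructions.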

As for early response assessment on interim PET with the Deauville scale, the choice of the MTV cut-off depends on its primary objective, which can be to escalate or to de-escalate treatment: a lower MTV cut-off increases sensitivity, whereas a higher one increases specificity. The cut-off should also depend on patient characteristics, such as localized versus advanced stage, and on treatment. The worse the prognosis of a population, the lower the MTV cut-off needed to identify patients with a high probability of events. Therefore, the statement of Ilyas et al. that “the cut-off might have been expected to be higher in an older population” is questionable. Elderly patients more frequently have multiple comorbidities and poor performance status; consequently, a small tumor volume has a more detrimental impact than in younger patients.
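The sensitivity/specificity trade-off can be illustrated with a handful of invented patients. The sketch below is hypothetical throughout (the data and the helper `sens_spec`), treats "event" as a simple yes/no outcome and ignores censoring and time-to-event aspects that a real prognostic analysis would have to handle.

```python
# Hypothetical (baseline MTV in cm3, event during follow-up) pairs
patients = [(120, False), (180, False), (250, True), (300, False), (420, False),
            (510, True), (640, True), (800, False), (950, True), (1200, True)]

def sens_spec(data, cutoff):
    """Sensitivity and specificity when MTV >= cutoff is called high risk."""
    tp = sum(1 for m, e in data if e and m >= cutoff)
    fn = sum(1 for m, e in data if e and m < cutoff)
    tn = sum(1 for m, e in data if not e and m < cutoff)
    fp = sum(1 for m, e in data if not e and m >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

for cutoff in (200, 600):
    sens, spec = sens_spec(patients, cutoff)
    print(f"cut-off {cutoff} cm3: sensitivity {sens:.1f}, specificity {spec:.1f}")
# cut-off 200 cm3: sensitivity 1.0, specificity 0.4
# cut-off 600 cm3: sensitivity 0.6, specificity 0.8
```

Lowering the cut-off catches every patient with an event at the cost of flagging more patients without one, which suits an escalation strategy; raising it does the opposite.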

In conclusion, although all the threshold-based methods discussed have their own limitations, MTV is a promising tool in DLBCL. More advanced threshold-based segmentation methods that account for background activity and/or the signal-to-background ratio might improve delineation and make it less dependent on the initial VOI placement. Cooperative studies between research groups are needed to reach agreement and produce recommendations that could help end users wishing to calculate MTV in lymphoma patients.
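As one simple example of a background-aware rule, a threshold can be placed a fixed fraction of the way from the local background to the SUVmax. This particular formula, its 0.41 fraction, the helper names and the voxel values are illustrative assumptions, not a method proposed in [1] or validated here; when the background is 0, the rule reduces to the usual 41% SUVmax threshold.

```python
from statistics import mean

def adapted_threshold(suvmax, background, fraction=0.41):
    """Hypothetical background-adapted rule: fraction of the way from background to SUVmax."""
    return background + fraction * (suvmax - background)

def count_at_or_above(suvs, threshold):
    """Number of voxels whose SUV reaches the threshold."""
    return sum(1 for s in suvs if s >= threshold)

lesion = [3.8, 4.5, 5.0, 4.4, 3.9]   # hypothetical low-contrast lesion voxels
background = [2.6] * 6               # hypothetical surrounding background voxels
voi = lesion + background            # operator VOI containing both

bkg = mean(background)
suvmax = max(voi)
rules = {
    "SUV >= 2.5": 2.5,
    ">= 41% SUVmax": 0.41 * suvmax,
    "background-adapted": adapted_threshold(suvmax, bkg),
}
for name, t in rules.items():
    print(f"{name:19s} threshold {t:4.2f}: {count_at_or_above(voi, t)} voxels")
```

In this toy low-contrast case, both fixed rules keep all 11 voxels, background included, so their volume would grow with any extra background enclosed by the VOI; the background-adapted threshold (about 3.58) keeps only the 5 lesion voxels.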