Introduction

Dynamic susceptibility contrast–enhanced perfusion-weighted imaging (DSC-PWI) is the most widely used advanced imaging technique for the diagnosis and follow-up of gliomas. The European Society of Neuroradiology highly recommends using it in this specific setting [1]. Perfusion data analysis has been proven effective in differentiating low- from high-grade gliomas and radiation-induced reactions from tumour progression [2, 3]. However, the published perfusion parameter thresholds vary widely, raising the question of the reliability of this imaging modality for patient management [4]. These discrepancies may be due to differences between institutions in both acquisition and processing, such as the use of different magnetic field strengths, gadolinium pre-load and different post-processing algorithms [5]. Additionally, the commonly used method to evaluate tumour perfusion and to calculate its maximum relative cerebral blood volume (rCBVmax) could also be responsible for this high variability.

During DSC-PWI, the first pass of a bolus of a paramagnetic agent leads to changes in signal intensity, which are measured over time on T2*-weighted echo planar images. Cerebral blood volume, which is proportional to the area under the signal intensity-time curve, correlates with the microvessel density of gliomas and is typically evaluated in the “hotspots” of the tumour [6]. For this, the radiologist manually draws regions of interest (ROIs) in the hypervascularized portions of the tumour, identified on the colour CBV map [7]. Because these CBV values are not absolute, they are normalized to an internal reference ROI, usually drawn in the normal-appearing contralateral white matter, to obtain the tumour hotspot rCBV. This method relies heavily on the radiologist’s subjective assessment of the lesion’s vascularization and allows only a few areas within the tumour to be assessed. In heterogeneous tumours, such as glioblastomas, this bias can be particularly detrimental [8].

High inter-observer variability of rCBVmax, as well as of the reference ROI, has previously been reported with the hotspot method, but this seems to be accepted as no alternative has been proposed in clinical practice [9, 10]. With potentially different radiologists taking care of the same patient during the longitudinal MRI perfusion follow-up, such variability may lead to an inconsistent evaluation of the patient’s response to treatment. Improved reproducibility of MRI perfusion parameters would also allow values to be compared across studies. In addition to the standardization of DSC-PWI protocols, there is a critical need to develop a reliable analysis method that is less subjective and less selective than the hotspot method. Volume approaches may be a solution and have been proposed to better characterize lesions using CBV histogram patterns or the fractional tumour burden (pMRI-FTB) [11]. pMRI-FTB has already been correlated to histologic tumour fraction and can differentiate treatment effect from tumour recurrence [12]. Nevertheless, this type of parameter remains little used in current practice and is instead reserved for research activities. We have developed a similar parameter that assesses the hypervascularized fraction of a tumour by measuring the fraction of pixels in the whole tumour volume displaying an rCBV above 2: %rCBV > 2. By analysing the totality of the pixels contained in a lesion, volume methods allow the calculation of various parameters that are potentially more representative and more reproducible. However, only a few reproducibility studies on volume methods have been published so far [9].

The aim of our study was to compare the inter- and intra-observer reproducibility of the common perfusion parameter, rCBVmax, calculated either with the hotspot method or a volume method. Secondly, we aimed to investigate the inter- and intra-observer reproducibility of our new volumetric parameter, %rCBV > 2.

Method

Study population

For this retrospective study, informed consent was waived.

Between January 2017 and December 2018, 30 consecutive patients with newly diagnosed glioblastoma imaged in our radiology department were included. The selection criteria were as follows: (a) subjects were over 18 years old; (b) they had a histopathological diagnosis of glioblastoma according to the World Health Organization 2016 classification; (c) they had an initial 3-T MRI in our institution, including a DSC-PWI sequence and contrast-enhanced gradient-echo 3D T1-weighted imaging (CE-T1WI).

MRI protocol

Images were acquired using two 3 T systems (Magnetom Skyra, Siemens Healthcare and Achieva, Philips Medical System). The imaging protocol included at least axial spin-echo T1-weighted imaging and axial fluid–attenuated inversion recovery imaging (FLAIR), followed by DSC-PWI data and 3D CE-T1WI.

DSC-PWI was acquired with a gradient-echo echoplanar imaging technique during the first pass of a standard bolus of gadolinium contrast without pre-bolus. The imaging parameters were TR 1710 ms, TE 20 ms, slice thickness 4 mm, flip angle 75° (Magnetom) and TR 1657 ms, TE 40 ms, slice thickness 4 mm, flip angle 75° (Achieva), with a voxel size of 2.1 × 2.1 × 4.0 mm. During 45 consecutive echoplanar imaging scans lasting 1 min 30 s, with 15 s for baseline signal intensity measurements, an intravenous bolus injection of 0.2 ml/kg gadolinium chelate (gadoteric acid, Dotarem®, Guerbet, France) was administered at a flow rate of 5 ml/s followed by a 20-ml saline flush.

3D CE-T1WI (MPRAGE) data were acquired with the following parameters: TR 1670 ms, TI 970 ms, TE 2.30 ms, slice thickness 1 mm, flip angle 8° (Magnetom) and TR 1750 ms, TI 942 ms, TE 2.19 ms, slice thickness 1 mm, flip angle 8° (Achieva). FLAIR data were acquired with the following parameters: TR 8000 ms, TI 2500 ms, TE 100 ms, and slice thickness 3 mm [13].

Image post-processing

DSC-PWI data were post-processed with vendor-independent commercial software (Olea Sphere 3.0 SP-6, Olea Medical, La Ciotat, France), using automatic arterial input function (AIF) selection, a unidirectional leakage correction algorithm to calculate the contrast-agent extravasation-corrected CBV maps and automatic rigid 3D co-registration of series [14].

First, according to the classical Wetzel method [7], we obtained the maximum relative CBV of the tumour (rCBVmax) by placing ROIs within the tumour and normalizing by contralateral white matter. For the tumour values, 3 to 4 pre-shaped circular ROIs, ranging from 40 to 60 mm², were manually drawn on the CE-T1WI images. To search for the maximum value, they were placed on multiple slices covering the enhancing tumour that visually appeared hypervascularized on the colour CBV maps. Care was taken to avoid areas of necrosis, cysts or non-tumour macrovessels. The highest CBV obtained among these 3 to 4 ROIs was considered as the maximum CBV of the tumour. Then, for normalization, one comparable pre-shaped circular ROI was placed in the normal white matter of the contralateral lobe (CBVWM), to obtain a relative CBV.
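The hotspot calculation described above can be sketched in a few lines. This is only an illustration, not the software's implementation: the function name and the sample values are hypothetical, and it assumes that each ROI is summarized by its mean CBV, which the text does not state explicitly.

```python
from statistics import mean


def hotspot_rcbv_max(tumour_rois, wm_roi):
    """Return rCBVmax for the hotspot method.

    tumour_rois: list of ROIs, each a list of CBV values (arbitrary units).
    wm_roi: CBV values of the contralateral white matter reference ROI.
    Assumption: each ROI is summarized by its mean CBV.
    """
    cbv_wm = mean(wm_roi)
    if cbv_wm <= 0:
        raise ValueError("reference CBV (CBVWM) must be positive")
    # The highest ROI value is taken as the maximum CBV of the tumour.
    cbv_tum = max(mean(roi) for roi in tumour_rois)
    return cbv_tum / cbv_wm


# Hypothetical values: three tumour ROIs and one reference ROI.
tumour_rois = [[4.0, 5.0, 6.0], [8.0, 9.0, 10.0], [2.0, 3.0, 4.0]]
wm_roi = [1.5, 2.0, 2.5]
print(hotspot_rcbv_max(tumour_rois, wm_roi))  # 4.5
```

Because both the choice of tumour ROIs and the single reference ROI are placed by hand, either term of this ratio can change between observers.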

Secondly, a volume analysis was performed as follows: (1) manual segmentation of the contrast-enhancing lesion on the 3D CE-T1WI images on all cross-sections, including central necrosis; (2) volume masks were transferred to the co-registered CBV maps, followed by visual inspection; (3) the maximum CBV value and the volume of the segmented lesion were automatically generated by the software; (4) the reference ROI, which measured between 1 and 1.5 cm³, consisted of 3 ROI delineations drawn freehand in the contralateral white matter at the upper, middle and lower levels of the tumour; (5) the absolute CBV values of each pixel contained in the tumour volume were extracted for analysis and normalized with the mean CBV value of the reference ROI (CBVWM); (6) the ratio between the number of pixels with a value above the threshold of 2 and the number of pixels contained in the volume defined our volumetric parameter: %rCBV > 2. As various thresholds have been published depending on the institution and clinical context [12, 15], we chose to use an institutional threshold of 2, predetermined in a previous work [16].
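Steps (5) and (6) above amount to a simple per-pixel computation, sketched here under stated assumptions: the tumour pixels and reference pixels have already been extracted as flat lists of CBV values, and the function name and sample values are hypothetical rather than taken from the software used in the study.

```python
from statistics import mean


def volume_parameters(tumour_cbv, reference_cbv, threshold=2.0):
    """Return (%rCBV > threshold, rCBVmax) for a segmented tumour volume.

    tumour_cbv: CBV value of every pixel inside the tumour mask.
    reference_cbv: CBV values pooled from the white matter reference ROIs.
    """
    if not tumour_cbv:
        raise ValueError("tumour mask is empty")
    cbv_wm = mean(reference_cbv)  # CBVWM, mean of the reference ROI
    if cbv_wm <= 0:
        raise ValueError("reference CBV (CBVWM) must be positive")
    # Step (5): normalize each pixel by the reference mean.
    rcbv = [value / cbv_wm for value in tumour_cbv]
    # Step (6): fraction of pixels strictly above the threshold.
    n_above = sum(1 for value in rcbv if value > threshold)
    return 100.0 * n_above / len(rcbv), max(rcbv)


# Hypothetical values: eight tumour pixels, three reference pixels.
tumour_cbv = [1.0, 3.0, 5.0, 7.0, 2.0, 9.0, 0.5, 4.0]
reference_cbv = [1.5, 2.0, 2.5]
percent_above, rcbv_max = volume_parameters(tumour_cbv, reference_cbv)
print(percent_above, rcbv_max)  # 37.5 4.5
```

Every pixel in the mask contributes to the fraction, so a small change in the segmentation boundary shifts the result only slightly, whereas the maximum depends on a single pixel.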

Lesion size was estimated by calculating the average volume of the enhancing tumour segmented by each observer.

Figure 1 illustrates the two methods of post-processing.

Fig. 1

Flowchart of the two different post-processing methods. (a) and (c) show axial views of 3D contrast–enhanced T1-weighted images (CE-T1WI); (b) and (d) show colour maps of cerebral blood volume (CBV). Top row: the hotspot method, in which regions of interest are manually drawn on 3D CE-T1WI (a) in parts of the enhancing lesion that appeared hypervascularized on the CBV map (b). Bottom row: the volume method, in which the entire enhancing lesion is manually segmented on each slice of the 3D CE-T1WI (c) and then transferred to each corresponding slice of the CBV map by rigid co-registration (d). The highest absolute tumour CBV (CBVtum) was normalized with a reference contralateral white matter (CBVWM) to calculate maximum relative CBV (rCBVmax). %rCBV > 2 was only calculated with the volume method.

Observers

Three observers (M.Ro and G.A, certified neuroradiologists with 5 years of experience, and M.Ra, a 5th-year radiology resident) evaluated the included cases within 8 weeks. They independently performed the tumour segmentation on the CE-T1WI and calculated all the perfusion parameter measurements. All observers were familiar with the principles of perfusion MRI and were blinded to patient history. Intra-observer variability was assessed by observer no. 1 (M.Ro), who performed a second evaluation after 8 to 12 weeks, using the same dataset in random order and the same technique.

Statistical analysis

Statistical analysis was performed on MedCalc statistical software version 18.11.6 (MedCalc Software bvba, Ostend, Belgium; https://www.medcalc.org; 2019).

The mean and standard deviation of all CBV measurements were calculated. All the data were tested for normality using Shapiro–Wilk tests. If needed, a log transformation was applied to normalize data.

First, we used the intraclass correlation coefficient (ICC), a statistical criterion, to determine the inter-observer reproducibility across all three observers and the intra-observer reproducibility of the perfusion parameter measurements. For inter-observer reproducibility, only the measurements of the first evaluation by observer no. 1 were used. ICC was calculated with a two-way random model, with consistency, on single measures. For every pairwise combination of observers, Lin’s concordance correlation coefficient (LCC) was used, since, contrary to ICC, it combines precision (Pearson correlation coefficient) and accuracy (bias correction factor). Agreement was considered using standard guidelines [17] as follows: ICC or LCC < 0.4 = poor, 0.4–0.59 = fair, 0.6–0.74 = good and > 0.74 = excellent. ICC and LCC are reported with their 95% confidence intervals (CI).
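The difference between the two coefficients can be illustrated with a minimal sketch. It assumes the usual mean-square formula for a two-way, consistency, single-measures ICC and Lin's original definition of the concordance coefficient with 1/n variances. It is not the MedCalc implementation, and the function names and data are hypothetical.

```python
def icc_consistency_single(ratings):
    """Two-way, consistency, single-measures ICC.

    ratings: one row per subject, one column per observer.
    ICC = (MS_rows - MS_err) / (MS_rows + (k - 1) * MS_err)
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for two observers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes systematic bias.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical data: observer 2 reads exactly 1 unit higher than observer 1.
obs1 = [1.0, 2.0, 3.0, 4.0]
obs2 = [2.0, 3.0, 4.0, 5.0]
print(icc_consistency_single([list(pair) for pair in zip(obs1, obs2)]))  # 1.0
print(round(lin_ccc(obs1, obs2), 3))  # 0.714
```

In this toy case a constant offset between observers leaves the consistency ICC at 1, while the concordance coefficient drops, which is why the pairwise comparison adds information.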

Second, we constructed Bland–Altman plots, a clinical criterion assessing the absolute difference between paired measurements. Visual inspection of the graph shows the data dispersion and individual differences between measures. If the differences between the two measures vary widely, the measures are assumed not to be concordant. If the two measures are concordant, the differences are scattered closely along the zero line [18].
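The quantities drawn on a Bland–Altman plot can be computed as in the following sketch, which assumes the conventional limits of agreement (mean difference ± 1.96 sample standard deviations). The function name and the measurements are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (means, diffs, bias, lower_limit, upper_limit).

    a, b: paired measurements from two observers (or two sessions).
    means and diffs are the x and y coordinates of the plotted points.
    """
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2 for x, y in zip(a, b)]
    bias = mean(diffs)   # mean difference (solid line)
    sd = stdev(diffs)    # sample standard deviation of the differences
    return means, diffs, bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical rCBVmax values from two observers.
obs_a = [2.0, 4.0, 6.0, 8.0]
obs_b = [1.0, 4.0, 5.0, 8.0]
_, _, bias, lower, upper = bland_altman(obs_a, obs_b)
print(bias, round(lower, 2), round(upper, 2))  # 0.5 -0.63 1.63
```

A bias far from zero indicates a systematic difference between observers, and wide limits indicate poor agreement on individual cases.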

The study was designed according to Walter et al., based on α = 0.05 and β = 0.20 and assuming that intraclass correlation (ρ) would equal 0.6 (ρ0) for the hotspot method and 0.8 (ρ1) for the volume method. The optimal number of observers was n = 3, and the required sample size was k = 26 [19].
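As a rough check of this design, the sketch below assumes the closed-form approximation commonly attributed to Walter et al., k = 1 + 2(z_α + z_β)² n / [(ln C0)² (n − 1)], with C0 = (1 + nθ0)/(1 + nθ1), θ = ρ/(1 − ρ) and a one-sided α. Whether this exact form was used in the study is an assumption, and the function name is hypothetical.

```python
from math import log
from statistics import NormalDist


def reliability_sample_size(rho0, rho1, n_obs, alpha=0.05, beta=0.20):
    """Approximate number of subjects k for an ICC reliability study.

    rho0: ICC under the null hypothesis, rho1: ICC under the alternative,
    n_obs: number of observers per subject.
    Assumption: one-sided alpha, closed-form approximation.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    theta0 = rho0 / (1 - rho0)
    theta1 = rho1 / (1 - rho1)
    c0 = (1 + n_obs * theta0) / (1 + n_obs * theta1)
    return 1 + 2 * (z_alpha + z_beta) ** 2 * n_obs / (log(c0) ** 2 * (n_obs - 1))


# Design values from the text: rho0 = 0.6, rho1 = 0.8, three observers.
k = reliability_sample_size(0.6, 0.8, 3)
print(round(k, 1))  # roughly 26 subjects
```

Under these assumptions the result is close to the reported k = 26, so the 27 analysed patients would satisfy the planned sample size.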

To evaluate the statistical difference in correlation coefficients between the two methods, the hotspot method was considered as the reference and p < 0.05 was considered statistically significant. For the fractional parameter, calculated only by the volume method, agreement was considered satisfactory when the lower limit of the 95% CI of the concordance coefficient was greater than 0.60, corresponding to a match that was at least “good”.

Results

In total, 27 consecutive patients were included. Three patients were excluded from the cohort because it was not possible to segment their lesion on the 3D CE-T1WI (ill-defined margins or < 1 mm³) or because magnetic susceptibility artefacts did not allow for DSC assessment.

There were 11 women and 16 men with a median age of 63 years (range: 25–85 years). The average lesion size was 36.97 ± 23.12 cm³. Inter-observer reproducibility for the volume of segmented lesion was excellent (ICC = 0.99).

The mean and standard deviation of CBV measurements for the three observers with both methods are shown in Table 1.

Table 1 Mean measurements and standard deviation of maximum relative CBV of the tumour (rCBVmax) and the reference ROI (CBVWM)

Inter-observer reproducibility of rCBVmax and the reference ROI

Using the hotspot method, inter-observer reproducibility was fair for rCBVmax (ICC 0.46 [0.22–0.67]) and CBVWM (ICC 0.53 [0.30–0.73]). With the volume method, reproducibility was good for rCBVmax (ICC 0.65 [0.46–0.80], p = 0.34) and excellent for CBVWM (ICC 0.84 [0.72–0.92], p = 0.04) (see Table 2).

Table 2 Inter- and intra-observer intraclass correlation coefficient (ICC) values of maximum relative CBV of the tumour (rCBVmax), CBV of the reference ROI (CBVWM) measurements obtained with the hotspot and the volume methods and of the fractional parameter %rCBV > 2

Pairwise inter-observer reproducibility evaluated by LCC was poor to fair for rCBVmax (LCC range 0.30–0.47) and for CBVWM (LCC 0.38–0.57) using the hotspot method. With the volume method, reproducibility was fair to excellent for rCBVmax (LCC 0.57–0.75) and good to excellent for CBVWM (LCC 0.74–0.89). The comparison of the measurements by observers no. 1 and no. 3 demonstrated that the volume method significantly improved the reproducibility of each of the perfusion parameters (Table 3).

Table 3 Pairwise inter-observer Lin’s concordance correlation coefficient (LCC) values of maximum relative CBV of the tumour (rCBVmax) and the reference ROI (CBVWM) measurements obtained with the hotspot and volume methods and of the fractional parameter %rCBV > 2

Figure 2 compares Bland–Altman plots of the two metrics for observers no. 1 and no. 3: the mean differences were lower with the volume method than with the hotspot method. Panel (a) illustrates the systematic overestimation of rCBVmax values calculated with the hotspot method by observer no. 1 compared to observer no. 3.

Fig. 2

Inter-observer agreement of maximum relative CBV and the reference ROI calculated with the hotspot and the volume methods for observers no. 1 and no. 3. Bland–Altman plots were used to evaluate the inter-observer agreement with the hotspot (a, c) and the volume (b, d) methods. The differences between the measurements of the two observers are plotted on the y-axis and the means of the two evaluations are plotted on the x-axis. The solid (blue) line represents the mean difference and the dashed (red) lines represent the mean difference ± 1.96 SD.

Intra-observer reproducibility of rCBVmax and the reference ROI

Intra-observer reproducibility of rCBVmax was fair with the hotspot method (ICC = 0.57 [0.25–0.78]) and good with the volume method (ICC = 0.74 [0.50–0.87], p = 0.29). With both methods, intra-observer reproducibility of CBVWM (ICC = 0.82 [0.65–0.92] and 0.91 [0.82–0.96] respectively, p = 0.2) was excellent (see Table 2).

Inter- and intra-observer variability of %rCBV > 2

Our volumetric parameter %rCBV > 2, assessing the fraction of pixels displaying an rCBV above 2 within the whole segmented tumour, showed excellent inter-observer reproducibility (ICC = 0.94 [0.88–0.97] and LCC range 0.90–0.96) and excellent intra-observer reproducibility (ICC = 0.91 [0.80–0.96]) (Tables 2 and 3).

Figure 3 shows Bland–Altman plots of both inter- and intra-observer excellent agreement. The mean differences in %rCBV > 2 measurements were 1.2% between observers no. 1 and no. 2, 0.7% between observers no. 1 and no. 3, and 2% between observers no. 3 and no. 2. The mean difference between the two measurements by observer no. 1 was 0.9%.

Fig. 3

Inter- and intra-observer agreement of the volumetric parameter %rCBV > 2, evaluated with Bland–Altman plots. The differences between the measurements of every pairwise combination of observers are plotted on the y-axis and the means of the two evaluations are plotted on the x-axis. The solid (blue) line represents the mean difference and the dashed (red) lines represent the mean difference ± 1.96 SD.

Discussion

Despite a high inter- and intra-observer variability, the hotspot method remains widely used in daily practice to evaluate MRI perfusion parameters, and therefore to characterize tumours and their response to treatment [20]. Our results confirm this variability of the hotspot method, showing only fair agreement for the classical parameter rCBVmax and for the reference ROI, CBVWM. We also demonstrated that with a volume method, these metrics reached good to excellent agreement and that our fractional parameter, %rCBV > 2, exhibited excellent inter- and intra-observer agreement. Thus, a volume assessment of perfusion parameters is more reliable than the classically used hotspot method. This is consistent with previous results showing the advantages of volume approaches, although methodological differences make a strict comparison difficult. As an example, Dijkstra et al. compared the ICCs of small ROIs at maximum perfusion of the tumour with histograms obtained either on freeform 2D on the largest cross-section of the tumour or on freeform 3D on all cross-sections, with freeform 3D showing the greatest agreement [9].

In our work, observer no. 3 obtained lower rCBVmax measurements with the hotspot method than observers no. 1 and no. 2. Interestingly, the higher rCBVmax values calculated by observers no. 1 and no. 2 with the hotspot method approximated the rCBVmax values obtained with the volume method. Observers no. 1 and no. 2, who were more experienced, may have better detected areas of high neo-angiogenesis within heterogeneous glioblastomas. Indeed, because the hotspot method only samples the lesion, the experience of the operator may influence the results. However, in clinical practice, DSC-PWI is not always post-processed by an expert radiologist.

Regarding CBVWM, our results are consistent with a recent study showing a fair agreement (ICC 0.44–0.57) on normal contralateral white matter selected with the hotspot method [10]. This variability directly impacts rCBVmax variability since the tumour CBV must be normalized with this reference. Hence, with observer no. 2, who had the lowest ICC for CBVWM, an excellent agreement for rCBVmax was not achieved even with the volume method. Some authors have proposed using absolute CBV values to differentiate tumour recurrence from radionecrosis of brain metastases [21]. Nevertheless, CBV values are derived from a non-quantitative calculation and may vary between patients and even within the same patient, depending on cardiac output or hematocrit values. Others have proposed using an ROI placed in the centrum semiovale, showing better reproducibility (ICC > 0.74) than ROIs placed arbitrarily in contralateral white matter [10]. For our volume method, we selected 3 free-hand ROIs in the upper, middle and lower parts of the tumour to calculate an average white matter CBV, considerably improving the reproducibility of the reference ROI (ICC 0.84 vs 0.53, p < 0.05).

The volume method allows the calculation not only of rCBVmax but also of other volumetric parameters of interest, such as histogram patterns or the tumour fraction displaying a high rCBV, like the fractional tumour burden (FTB). Using an institutional threshold of 1.8, this type of fractional parameter has been correlated to the malignant histological features of recurrent neoplasm [15]. Using a threshold of 1.75 to define a high pMRI-FTB, this parameter could help to differentiate treatment effect from tumour recurrence, helping to inform clinical decision-making [22]. However, to our knowledge, no study has compared the inter- and intra-observer agreement of fractional parameters with that of usual parameters, such as rCBVmax. In this study, our fractional parameter, %rCBV > 2, yielded excellent inter- and intra-observer agreement (ICC = 0.94 and 0.91), providing a potentially reliable tool for patient follow-up.

We believe that this fractional parameter, based on analysis of the entire lesion, would provide a better assessment of the hypervascularization of heterogeneous tumours such as glioblastomas, than measurements on one or a few selected points. This is especially the case during follow-up, where treatment-related changes such as pseudoprogression or radionecrosis remain a diagnostic challenge [23]. Lesion heterogeneity weakens the capacity of any sampling method, such as ROI analysis or even stereotactic biopsy, to make a reliable diagnosis [24]. A more representative and more reproducible perfusion parameter could help with these difficult diagnoses. %rCBV > 2, unlike pMRI-FTB, includes the central necrotic component of the glioblastoma, which could facilitate its use in current practice, but this requires further studies for clinical validation.

Our study has some limitations. First, it was performed exclusively on the initial MRIs of glioblastomas before any treatment was started. A recent inter-reader variability study showed that ICC considerably worsens after treatment, ranging from 0.9 to 1 on baseline MRIs and from 0.48 to 0.76 on post-treatment MRIs [25]. In this multicentre study, variability in tumour segmentation was the most significant factor contributing to inter-reader variability. Only including the baseline MRI in our study could have contributed to the good reproducibility of volume parameters. Secondly, we used a post-processing leakage correction without a gadolinium pre-load dose, which could have resulted in an underestimation of our CBV values [26]. However, as the protocol was the same for each patient, this would not have affected our reproducibility results. Finally, while some authors demonstrated better reproducibility with semi-automatic segmentation in the measurement of rCBVmax of glioblastomas, we chose to rely on the radiologists’ assessment of the tumour volume and to perform the segmentation manually to avoid software errors [27]. Even so, with our volume method, the fractional parameter reproducibility was excellent. As thousands of pixels are included in the fractional parameter calculation, this may mitigate the effect of variations in the volume segmentation.

In conclusion, rCBVmax, the reference ROI (CBVWM) and our recently developed fractional parameter, %rCBV > 2, yielded good to excellent inter- and intra-observer agreement when calculated with the volume method but only fair agreement with the commonly used hotspot method. For heterogeneous tumours such as glioblastomas, the use of a volume method for analysing DSC-PWI data as well as fractional parameters appears to be more reliable than the hotspot method, potentially improving radiological assessment during patient follow-up. Despite the existence of automatic segmentation tools, these are not commonly available in clinical practice and this type of study emphasizing the advantages of volume approaches could encourage manufacturers to develop and offer simpler and more effective segmentation solutions.