Introduction

In dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI), a series of consecutive T1-weighted MR images is acquired after the administration of a bolus of paramagnetic gadolinium-based contrast agent (CA). As the bolus of CA arrives at the tissue of interest, it modifies the inherent relaxation rates of the protons in that region, effecting a change in the measured signal intensity. By collecting a consecutive series of images, a signal intensity-time curve is produced at each imaging voxel, from which contrast concentration–time curves (CTCs) can be derived. Phenomenological parameters, such as the wash-in and wash-out rates, can be calculated from the CTCs, and since the shape of the CTCs is related to the perfusion conditions of the tissue being imaged, pharmacokinetic (PK) modelling can also be used to estimate physiological parameters, such as the volume transfer coefficient (K trans) [min−1] and extravascular-extracellular volume fraction (v e), providing quantitative information related to the vascular properties of the tissue.

Prostate cancer (PCa) is known to have increased vascular density, permeability, and interstitial fluid volume compared with healthy prostate tissue, leading to tumorous tissue exhibiting earlier enhancement and higher washout, with higher-grade tumours tending to have higher rate constants [1, 2]. Since these tissue properties are known to vary significantly between malignant and benign prostate tissues [3], phenomenological curve-shape analysis and PK modelling of DCE-MRI data show great promise as a potential non-invasive gold standard imaging technique for PCa detection [1], localisation [4, 5], and grading [2, 6]. However, the widespread acceptance of DCE-MRI has been hindered by discrepancies in the published results, mainly as a result of methodological differences between studies. The quality of MR-derived parameters is known to be influenced by the experimental settings governing the data acquisition and hence the measurements [7], with adequate acquisition duration (AD; the overall length of time that the CTC is acquired for from the start of the CA enhancement) and temporal resolution (T res; the time-spacing between subsequent image volumes) being required for accurate PK model fitting [8, 9]. For optimal parameter estimation, the temporal resolution should be sufficiently high (i.e. the T res value sufficiently short) to adequately capture the rapidly changing features of the CTC (i.e. the wash-in phase) and the AD should be long enough to include the slowly changing features (i.e. the wash-out phase). However, long ADs can introduce motion artefacts in patient imaging, which can be an additional source of measurement errors [8], while conversely, inadequate ADs can lead to underestimation of perfusion parameters [9].
Considering the range of DCE-MRI acquisition and analysis protocols reported in the literature, and the large variability in data derived from these varied approaches, it is apparent that a reliable ‘gold standard’ method is required for determining the measurement accuracy of the CTCs, as well as derived parameters, with errors calculated against known ground truth values [9–11].

Previous attempts to investigate DCE-MRI measurements have used model phantom systems to physically mimic permeable microvasculature [12, 13]. However, in none of these systems were the ‘true’ or ‘ground truth’ values known, where the ‘ground truth values’ in this context refer to an accurate and precise knowledge of the actual CTCs within the object being scanned. Indeed, only when such ground truth CTCs are known can the accuracy of a DCE measurement be determined. Additionally, none of these previous designs have allowed for MR measurements to be made in an environment which realistically challenges the prostate DCE-MR imaging protocol by closely mimicking conditions observed in vivo. Theoretical approaches have also been used to investigate the effects of T res and AD on PK parameter measurements [9, 14]; however, although beneficial in the planning of MRI protocols, a purely theoretical approach does not allow for the full assessment of the MR scanner’s ability to accurately measure known reference CTCs using a particular protocol. Other studies have used retrospectively resampled patient prostate data to investigate the effects of reduced AD and T res [10, 15, 16], but, again, the ground truth values are not known and hence the measurement accuracy of the resampled data (or even the fully sampled data) cannot be assessed.

Previous DCE-MRI studies have reported using a wide range of both T res (~2–30 s) and AD (~90–760 s) values for the acquisition of data in the prostate [17–20]. The aim of the present work was to use a recently developed novel DCE-MRI prostate phantom test device, in which MR-measured CTCs could be compared with precisely known ground truth values [21], to quantitatively investigate the effects of T res and AD on the accuracy of MR-measured CTCs, as well as phenomenological and PK parameters derived from these CTCs, across a range of T res and AD values used in previously published prostate DCE-MRI studies.

Materials and methods

MRI measurements

DCE-MRI data were acquired using a 3T scanner (Achieva TX, Philips, The Netherlands) and a 32-channel phased array detector coil. A 3D spoiled gradient echo imaging sequence with multi-transmit was used, with the following scan parameters: repetition time (TR) = 4.3 ms; echo time (TE) = 1.4 ms; flip angle = 10°; FOV = 224 × 224 × 80 mm3; spatial resolution = 1 × 1 × 4 mm3; no parallel imaging (R-factor); and number of signal averages (NSA) = 1; resulting in a T res of 8.1 s. This protocol was then modified by applying parallel imaging to produce protocols with T res values of 2, 3.8, and 5.3 s [R-factor (AP/FH) = 2/2, 1.8/1.3, and 1.3/1.3, respectively]. Additionally, the NSA was increased to produce protocols with T res = 16.3 and 24.4 s (NSA = 2 and 3, respectively). This resulted in six scanning protocols, with T res values spanning the range used in previously published prostate DCE-MRI studies. Each 720-s experimental run consisted of at least 5 dynamic scans measured at the baseline CA concentration, with the remaining dynamics measured for a further 600 s (i.e. AD = 600 s). The fully sampled MR data thus acquired were then retrospectively truncated to produce subsets of data at AD = 480, 360, 240, 180, 120, 60, and 30 s.
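The retrospective truncation described above can be sketched as follows. This is a minimal illustration rather than the processing code used in the study: the function name `truncate_to_ad`, the uniform sampling grid, and the assumption that enhancement begins after 120 s of baseline are all introduced here for the example.

```python
import numpy as np


def truncate_to_ad(times_s, values, onset_s, ad_s):
    """Keep the baseline plus the first `ad_s` seconds after enhancement onset."""
    times_s = np.asarray(times_s, dtype=float)
    values = np.asarray(values, dtype=float)
    keep = times_s <= onset_s + ad_s
    return times_s[keep], values[keep]


# Hypothetical sampling grid for one 720-s run at T res = 8.1 s.
times = np.arange(0.0, 720.0, 8.1)
curve = np.zeros_like(times)  # placeholder standing in for a measured CTC

# Assumed onset of enhancement at 120 s (end of the baseline period).
subsets = {
    ad: truncate_to_ad(times, curve, onset_s=120.0, ad_s=ad)
    for ad in (480, 360, 240, 180, 120, 60, 30)
}
```

Each entry of `subsets` holds the time points and curve values that would remain for the corresponding AD; shorter ADs simply discard the later part of the wash-out phase.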

Contrast concentration–time curves (CTCs)

‘Healthy’ and ‘tumour’ tissue-mimicking CTCs were generated using the standard Tofts model [22], a model arterial input function (AIF) [23], and input PK parameters taken from representative published patient data [K trans = 0.14 (‘healthy’) and 0.36 (‘tumour’); v e = 0.47 (‘healthy’) and 0.55 (‘tumour’)] [18, 24]. The CTCs consisted of 120 s of baseline data, followed by a 600-s physiological CTC shape mimicking those typically observed in healthy or tumorous tissue. Ground truth CTCs were established from repeated measurements made using a highly precise (±0.4%), high spatiotemporal-resolution optical imaging system [21]. Concordance correlation coefficient (CCC) analysis was performed on the repeat optical measurements, using Statistical Package for the Social Sciences (SPSS; v.22, IBM Corp, USA), with high reproducibility calculated between repeat optical experimental CTC runs (CCC = 0.992, 95% confidence intervals = {0.990, 0.993}).
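For orientation, the standard Tofts model expresses the tissue concentration as the convolution of the plasma concentration with an exponential kernel, C_t(t) = K trans ∫ C_p(τ) exp[−(K trans/v e)(t − τ)] dτ. The sketch below evaluates this with a simple rectangle-rule discrete convolution on a uniform time grid. The biexponential `toy_aif` is a hypothetical stand-in chosen only for illustration; it is not the model AIF of Ref. [23], and the resulting curves are therefore not the CTCs used in the phantom.

```python
import numpy as np


def toy_aif(t_min):
    """Hypothetical biexponential plasma curve (mM); NOT the AIF of Ref. [23]."""
    t = np.asarray(t_min, dtype=float)
    return 5.0 * np.exp(-4.0 * t) + 1.0 * np.exp(-0.05 * t)


def standard_tofts(t_min, cp, ktrans, ve):
    """Standard Tofts tissue curve on a uniform time grid (time in minutes).

    Evaluates Ktrans * conv(Cp, exp(-(Ktrans/ve) * t)) with a rectangle rule.
    """
    t = np.asarray(t_min, dtype=float)
    dt = t[1] - t[0]
    kernel = np.exp(-(ktrans / ve) * t)
    return ktrans * np.convolve(cp, kernel)[: t.size] * dt


# 600 s of enhancement sampled every 1 s, expressed in minutes.
t = np.arange(0, 601) / 60.0
cp = toy_aif(t)

# Input parameters quoted in the text (Ktrans in min^-1, ve dimensionless).
healthy = standard_tofts(t, cp, ktrans=0.14, ve=0.47)
tumour = standard_tofts(t, cp, ktrans=0.36, ve=0.55)
```

The rectangle rule is adequate for a sketch at 1-s sampling; a production implementation would more likely use a higher-order integration scheme or an analytical convolution.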

Phantom

The anthropomorphically shaped phantom device used herein allowed for the simultaneous production of two distinct CTCs, for which ground truth values were known a priori, within an environment which closely mimics the male pelvic region. The CTCs were produced by means of a custom-built, computer-controlled, four-pump flow system, by alternating the instantaneous volume of solutions pumped from two reservoirs, each containing different concentrations (0.15 and 1.5 mM) of a contrast agent (Multihance, Bracco, USA). The mixing of these solutions in turn created the desired concentration of the CA at a particular time-point, giving rise to the desired CTC shapes, which were subsequently measured within two ‘measurement chambers’ located inside a prostate-mimicking region of the anthropomorphic phantom. The measurement chambers were designed to ensure an optimally uniform distribution of fluids at each instant of time at the low flow rate used (1.5 ml s−1), thereby avoiding any mixing bias within the regions of interest placed inside each measurement chamber in the MR images.
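The mixing principle can be illustrated with a simple mass-balance calculation. The sketch below assumes ideal, instantaneous mixing of the two reservoir solutions and a constant combined flow rate of 1.5 ml s−1; the helper `pump_flow_rates` is hypothetical, and the actual pump control scheme is described in Ref. [21].

```python
import numpy as np


def pump_flow_rates(target_mM, c_low=0.15, c_high=1.5, total_flow=1.5):
    """Flow rates (ml/s) from the low and high reservoirs for a target mix.

    Assumes ideal mixing: target = f * c_high + (1 - f) * c_low, where f is
    the fraction of the total flow drawn from the high-concentration reservoir.
    """
    target = np.asarray(target_mM, dtype=float)
    if np.any(target < c_low) or np.any(target > c_high):
        raise ValueError("target concentration outside the reservoir range")
    frac_high = (target - c_low) / (c_high - c_low)
    return (1.0 - frac_high) * total_flow, frac_high * total_flow


# Example: flow rates needed for a mid-range concentration of 0.825 mM.
flow_low, flow_high = pump_flow_rates(0.825)
```

Applying the function to a whole target CTC (an array of concentrations) returns the pair of time-varying flow rates that would reproduce it under these assumptions.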

The phantom system was calibrated using a custom-built, high-spatiotemporal resolution optical imaging system, which was set up using the same tubing configuration and lengths as used for the main phantom device, with two pumps producing CTCs within a single measurement chamber, set within a custom-built, light-sealed enclosure. A highly-stable endoscopic light source (Fujinon Eve Σ400, Fujifilm Corp, Japan) was used in conjunction with a high-resolution CMOS camera (Canon 50D, Canon Inc, Japan) with a 4752 × 3168 array for detection (4 × 4 µm2 pixels; T res = 1 s). The signal intensity was modified over time to produce the required CTCs by adjusting the concentration of black dye (used as a CA surrogate in the system) in the measurement chamber, with precise concentrations being subsequently calculated using a calibration method. Further details of this novel phantom device and calibration system can be found in Ref. [21].

Data analysis

The AIF used in the present work for the analysis of all optical and MR data was generated using code developed in MATLAB (R2015b, MathWorks, USA) at temporal resolutions matching those used in the optical and MR studies [23]. CCC values were calculated between the MR-measured and ground truth CTCs (AD = 600 s); prior to analysis, the MR-measured and ground truth CTCs were temporally aligned, and the ground truth CTC was temporally resampled to match the MR data points. CCC values were used as a metric for both the precision (ρ, the Pearson correlation coefficient, a measure of the degree of scatter) and accuracy (C b, a bias correction factor, the degree of systematic location and scale shifts) of the full CTC measurements [25]. The wash-in (the maximum slope between the time of onset of contrast inflow and the time of peak intensity) and wash-out (the maximum slope of the late-wash phase of the curve) rates [2] were calculated from the CTCs at two manually selected regions of interest (ROIs), each containing 26 voxels and set within the respective measurement chamber (voxel-wise analysis), as illustrated in Fig. 1. Voxel-wise PK modelling using the standard Tofts model was also performed at the same ROIs using the DCEMRI.jl toolkit [26]. Wash-in, wash-out, K trans, and v e values were derived from the MR-measured CTCs and compared with the ground truth values (derived from the optically measured CTCs), with the percentage differences calculated.
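The decomposition of the CCC into precision and accuracy terms can be sketched as below, using the usual definition CCC = ρ × C b with population (1/n) moments. This is an illustrative implementation, not the software used for the reported values, and the function names are introduced here for the example; the ground truth curve is resampled onto the MR time points by linear interpolation, which is an assumption about the resampling method.

```python
import numpy as np


def concordance(x, y):
    """Return (CCC, rho, C_b) for paired samples x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                # population variances (1/n)
    sxy = np.mean((x - mx) * (y - my))       # population covariance
    rho = sxy / np.sqrt(vx * vy)             # precision (Pearson correlation)
    ccc = 2.0 * sxy / (vx + vy + (mx - my) ** 2)
    return ccc, rho, ccc / rho               # C_b = CCC / rho (accuracy)


def ccc_against_ground_truth(t_mr, c_mr, t_truth, c_truth):
    """Resample the ground truth onto the MR time points, then compare."""
    truth_at_mr = np.interp(t_mr, t_truth, c_truth)
    return concordance(c_mr, truth_at_mr)


# A constant offset leaves rho at 1 but lowers C_b, and hence the CCC.
x = np.array([1.0, 2.0, 3.0, 4.0])
ccc, rho, c_b = concordance(x, x + 1.0)
```

In this toy case the scatter is zero (ρ = 1), so the drop in CCC is due entirely to the systematic shift captured by C b.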

Fig. 1

Representative T2-weighted (a) axial and (b) coronal scans of the phantom during peak enhancement within the ‘prostate’ region. The white box in (a) outlines the region shown in (c). (c) Pharmacokinetic parameter map (K trans) showing the ROIs placed within the measurement chambers which were used to measure the healthy (ROI 1) and tumour (ROI 2) parameter values

Signal to noise ratio (SNR) values were calculated in the same ROIs used for the CTC analysis using a ‘difference method’ [27], based on the evaluation of a difference between two repeated acquisitions at baseline concentration, given as

$$\text{SNR}_{\text{diff}}\left( b_{1}, b_{2} \right) = \frac{\frac{1}{2}\,\bar{S}_{\mathbf{r} \in \text{ROI}}\left( S_{N}\left( \mathbf{r}, b_{1} \right) + S_{N}\left( \mathbf{r}, b_{2} \right) \right)}{\frac{1}{\sqrt{2}}\,\sigma_{\mathbf{r} \in \text{ROI}}\left( S_{N}\left( \mathbf{r}, b_{1} \right) - S_{N}\left( \mathbf{r}, b_{2} \right) \right)}, \tag{1}$$

where \(\bar{S}_{\mathbf{r} \in \text{ROI}}\) is the mean signal value and \(\sigma_{\mathbf{r} \in \text{ROI}}\) is the standard deviation, for calculations performed at each imaging voxel (\(\mathbf{r}\)) in the ROI, and \(S_{N}(\mathbf{r}, b_{n})\) is the signal at each imaging voxel in the ROI for each respective baseline measurement (\(b_{1}\) and \(b_{2}\)).
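Equation (1) translates directly into a few lines of array code. The sketch below is illustrative only: the function name `snr_diff` is introduced here, the inputs are assumed to be the ROI voxel values from two repeated baseline acquisitions, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the estimator is not specified in the text.

```python
import numpy as np


def snr_diff(roi_b1, roi_b2):
    """Difference-method SNR from two repeated baseline acquisitions (Eq. 1)."""
    s1 = np.asarray(roi_b1, dtype=float)
    s2 = np.asarray(roi_b2, dtype=float)
    mean_signal = 0.5 * np.mean(s1 + s2)               # numerator of Eq. (1)
    noise = np.std(s1 - s2, ddof=1) / np.sqrt(2.0)     # denominator of Eq. (1)
    return mean_signal / noise


# Hypothetical voxel values for a four-voxel ROI in two baseline dynamics.
baseline_1 = [100.0, 102.0, 98.0, 100.0]
baseline_2 = [101.0, 99.0, 100.0, 100.0]
snr = snr_diff(baseline_1, baseline_2)
```

Subtracting the two acquisitions removes the static signal pattern within the ROI, so the spread of the difference image reflects noise alone; the factor of 1/√2 accounts for the noise of two images being combined in the difference.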

Results

Figure 2 shows representative dynamic images taken from the MR data at the ‘tumour’ CTC peak at each of the T res values tested, along with SNR values derived from the DCE-MRI data; SNR increased as the T res value was increased. Furthermore, this increase in SNR was evident in the MR-measured CTCs as a decrease in signal variation across the ROIs, resulting in a reduction in the calculated uncertainties, as shown in Fig. 3. Figure 3 also shows an underestimation in the measurement of the ‘tumour’ CTC peak using protocols with T res < 8.1 s, as well as an overestimation of the wash-out phase of the tumour CTC for all protocols tested. Ground truth CTCs are shown in Fig. 3 as solid red (‘tumour’) and blue (‘healthy’) lines.

Fig. 2

Representative dynamic images taken from the MR-data at the CTC peak (tumour) at each of the T res values tested. SNR values derived from the DCE-MRI data are also presented. (SNR results presented herein should be treated as relative, for the purpose of highlighting relative behavioural trends in the data)

Fig. 3

Graphs showing the MR-measured CTCs at different T res values, with error bars derived from the standard deviation across the ROI. Ground truth CTCs are also shown as solid lines

Figure 4 shows the correlation plots comparing the MR-measured and ground truth CTCs, as well as the CCC values with 95% confidence intervals. Inaccuracies in the measurement of the wash-in phase of the ‘tumour’ CTC were evident as outlying data points when using T res values ≤8.1 s. Using the fastest protocol (T res = 2 s), there was an overestimation of the MR-measured healthy CTC at higher concentrations (>0.3 mM), and these measurement inaccuracies were also apparent in the relatively low CCC value of 0.906 (95% CI = {0.889, 0.920}). There was a general trend for higher CCC values as T res was increased.

Fig. 4

Correlation plots of the MR-measured versus ground truth CTCs for full 600-s CTC measurements. The diagonal line indicates equality, with divergence from this line indicating discrepancies between the MR-measured and ground truth values. The CCC values are also shown, along with their 95% confidence intervals

Errors in the measured wash-in and wash-out parameter values are given in Fig. 5. For the measurement of the wash-in, despite the fact that the standard deviation across the ROI was reduced with increased T res (as evidenced by a reduction in the size of the error bars), large underestimation errors (up to 40%) were measured using protocols with T res values ≥16.3 s. Errors in the measurement of the ‘tumour’ CTC wash-out did not vary greatly across all T res values tested, with all errors <15%; however, there was large variation in the wash-out values calculated across the ROI for the ‘healthy’ CTCs (as evidenced by the large uncertainties of up to ±65%, see Fig. 5), indicating that all protocols struggled to measure this portion of the ‘healthy’ CTC accurately.

Fig. 5

Percentage errors in wash-in and wash-out parameter values derived from the DCE-MRI data at different temporal resolutions using the standard Tofts model. Error bars are derived from the standard deviation of the data

Figures 6 and 7 present the percentage errors in the PK parameter values derived from the DCE-MRI data at different T res and AD values using the standard Tofts model. K trans and v e values derived from the MR data differed from the ground truth values by up to 230 and 98%, respectively. Errors of <14% were found for all MR-measurements of K trans made at AD ≥360 s and T res ≤ 8.1 s. The 24.4-s protocol was the least accurate in the measurement of K trans, with errors of up to 172% using the full 600-s acquisition, and these errors increased as the AD was reduced, as shown in Fig. 6 (bottom-right graph). For the measurement of v e, errors of <12% were measured at T res ≤16.3 s, with AD ≥360 s. As with the measurement of K trans, the greatest errors in the measurement of v e were observed in data measured at T res = 24.4 s. The lowest overall errors (<10%) in the measurement of both ‘tumour’ and ‘healthy’ K trans and v e values occurred using: T res = 2 s, with AD ≥360 s.

Fig. 6

Percentage errors in K trans values derived from the DCE-MRI data at different temporal resolutions and acquisition lengths using the standard Tofts model. Error bars are derived from the standard deviation of the data

Fig. 7

Percentage errors in v e values derived from the DCE-MRI data at different temporal resolutions and acquisition lengths using the standard Tofts model. Error bars are derived from the standard deviation of the data

Discussion

The absence of a robust validation method has contributed to a lack of standardisation in the way that DCE-MRI is performed, hindering its full clinical implementation and rendering comparisons between published studies difficult [8]. As a consequence, the published literature is replete with many different DCE acquisition and analysis protocols, resulting in inconsistencies in the reported data. It is clear that a consensus needs to be reached regarding the optimal data acquisition and analysis methodology to use, so that consistent evidence-based data can be accumulated. The current study demonstrates the trade-offs which arise from modifications of a prostate DCE acquisition protocol involving two of the most widely varied acquisition parameters: temporal resolution and AD. This was possible through the use of a novel phantom device wherein precisely known ground truth CTCs could be repeatedly produced and presented to the MRI scanner for measurement, so that the accuracy of the measured CTCs, as well as that of the phenomenological (wash-in and wash-out) and PK (K trans and v e) parameter values derived from them, could be determined.

The data presented herein for a standard 3D spoiled gradient-recalled echo DCE imaging sequence (with a T res of 8.1 s) demonstrate that non-negligible errors in wash-in, wash-out, K trans, and v e parameters occur for both tumour and healthy tissue-like regions. Similar, if not larger, errors can be expected with rapidly accelerated techniques such as parallel imaging and compressed sensing (used to reduce T res values to facilitate more complex PK modelling), where decreases in SNR will invariably occur [28]. The measured CCC values increased with increasing T res, indicating greater precision and accuracy in the point-by-point measurement of the CTCs; however, further parametric analysis of the CTCs showed that the reduced number of data points, resulting from larger T res values, hindered the accurate measurement of certain important curve-shape features. Prior studies have reported that model-free phenomenological parameters, such as the wash-in and wash-out rates, increase the specificity and sensitivity for PCa detection compared with other diagnostic techniques, even when using T res values as high as 30 s [5, 29]. However, the actual wash-in and wash-out rates were not known in these studies; rather, a relative comparison of the measured rate values across the prostate area was performed. The current work demonstrates that a T res value ≤8.1 s is required for accurate measurement of the absolute wash-in rate.
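The sensitivity of the wash-in rate to sparse sampling can be illustrated with a toy calculation. The sketch below takes the wash-in as the maximum point-to-point slope between the first sample and the curve peak, following the definition given in the data analysis section, and applies it to a hypothetical saturating curve with a 10-s rise constant. The curve and the function name `wash_in_rate` are assumptions made for the example and do not represent the phantom CTCs.

```python
import numpy as np


def wash_in_rate(times_s, conc):
    """Maximum point-to-point slope between the first sample and the peak."""
    t = np.asarray(times_s, dtype=float)
    c = np.asarray(conc, dtype=float)
    peak = int(np.argmax(c))
    if peak < 1:
        raise ValueError("need at least two samples up to the peak")
    return np.max(np.diff(c[: peak + 1]) / np.diff(t[: peak + 1]))


def toy_curve(times_s):
    """Hypothetical fast-enhancing curve; onset assumed at t = 0."""
    return 1.0 - np.exp(-np.asarray(times_s, dtype=float) / 10.0)


# Sample the same underlying curve at the fastest and slowest T res tested.
rates = {}
for t_res in (2.0, 24.4):
    t = np.arange(0.0, 120.0, t_res)
    rates[t_res] = wash_in_rate(t, toy_curve(t))
```

With the coarser sampling, the steep initial rise is averaged over a single long interval, so the estimated slope is lower than that obtained at T res = 2 s, which is the same direction of error as the wash-in underestimation reported for long T res values.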

Conflicting data have been published in the literature regarding the effect of T res on PK parameter measurements. For example, in a theoretical simulation study, Aerts et al. reported that the T res requirements were strongly related to the actual PK parameter values being measured, with lower T res values required for the precise measurement of higher K trans values [9]. The physical data from the current study corroborate these theoretical findings, with up to 50% greater precision in the measurement of the lower (healthy) versus higher (tumour) K trans values observed, as well as two to ten times higher accuracy in the measured K trans values (using T res of ≤8.1 s). For higher T res values, the K trans accuracy decreased markedly due to inadequate sampling of the wash-in portion of the curves. T res had less effect on the accuracy of v e measurements, with errors found to be fairly consistent at up to approximately 10% for T res values of ≤16.3 s (AD = 600 s), increasing moderately to approximately 20% using T res = 24.4 s; in this case, a variation in the number of sampling points due to modified T res did not cause a major hindrance in the measurement of the final peak value of the CTCs, a key component for the accurate measurement of v e. This is in line with the recent findings of Ginsburg et al., where it was reported that the effect of temporal resolution was different for estimates of K trans than for v e [10]. Another recent study reported no significant effect of varying T res on PK parameter values [16]; however, T res values of 5–30 s in this study were obtained by a retrospective resampling of the original T res = 5 s data set, with a relative comparison among datasets used to explore any PK parameter changes, without knowledge of the ground truth values.

Reported AD values have also varied widely in published DCE-MRI studies, again with conflicting results presented. In one theoretical study, AD values greater than 420 s were found to offer no improvement in PK parameter estimation for T res values of less than 10 s, whereas below 420 s, the precision in the measurement of higher K trans values was found to decrease [9]. Conversely, another study investigated the effect of retrospectively reducing the AD of patient prostate DCE data on the measured K trans and v e values, and reported no statistical difference using acquisition lengths as short as 150 s in the former and 50 s for the latter [15]. However, these results were based on a ‘full acquisition’ length of only 250 s, against which the PK parameter values measured from retrospectively reduced data sets (AD = 50, 100, 150, and 200 s) were compared in a relative sense, and hence any potential CTC truncation effects (particularly on v e measurements) are unknown. In the current study, large errors in ‘healthy’ v e values were measured for AD values ≤240 s, which may be attributed to the appearance of the peak in the ‘healthy’ CTC at approximately 345 s: the v e parameter estimation from PK model fitting is critically dependent on the CTC reaching a clearly identifiable peak, with some degree of subsequent wash-out, in order for the model fit to produce an accurate v e value. For instance, this explains why, in the current study for the ‘tumour’ CTC used (where the peak of the CTC appeared at approximately 35 s), a reasonably good v e measurement accuracy was maintained with AD values as low as 180 s (<13% error using all T res values tested). Thus, the findings in [15] that ADs as low as 50 s did not significantly change the measured v e values may only be valid for certain CTC shapes with peaks occurring at early time points (for example, as found typically in malignant tumours with fast wash-in/wash-out profiles). 
Indeed, this critical dependence on an adequate AD for the accurate measurement of v e for different tissue/tumour types may explain the wide discrepancy in published v e values and the consequent perceived lack of diagnostic sensitivity and specificity to PCa detection. One such example is provided by a study by Chen et al., which reported no significant difference for v e values measured in tumour versus healthy prostate tissue, a finding which, in the light of the present study’s results, may have been strongly influenced by the short 120-s AD used for the measurements [6]. On the other hand, the authors did report that the measured K trans values were significantly higher in tumour tissue compared with healthy tissue, even at the short 120-s AD [6], and this was also seen in the present work where the AD was not found to exert a strong influence on the accuracy of K trans measurements for AD values ≥120 s and for T res ≤ 8.1 s. Considering the critical dependence of the accuracy of the K trans parameter estimation on the initial wash-in slope of the CTC, this result is perhaps not surprising. Indeed, the results presented herein, along with those of certain previous patient studies, suggest that acquisition times as low as 120 s could be justified if K trans is the sole object of any PK modelling exercise. On the other hand, if v e is also to be measured, longer ADs are required. In establishing an optimum AD, clinical studies will also need to consider the potential detrimental effect of patient movement with longer dynamic scan times [8]. However, various image registration techniques, specific to DCE, are being developed which may go some way to mitigating these effects [30, 31]. Such effects could be investigated using the current phantom system, using techniques similar to previous phantom studies where motion was simulated [32].

The phantom used in the present study was designed to be anthropomorphic in both size and complexity, mimicking the conditions associated with abdominal patient imaging; as such, the protocols tested herein faced the same challenges, such as B1 and B0 inhomogeneity, inadequate spoiling, and off-resonance effects, all of which may have contributed to the errors in the MR measurements. Although some of these homogeneity issues were compensated for in the present study, future work using this quantitative phantom-based approach will probe these effects in more detail. It is worth noting that in the present study, T res values < 8.1 s were achieved using the commonly used and easily implementable technique of parallel imaging, which caused an inevitable reduction in SNR. SNR values have been shown to exert an influence on the precision of derived PK parameters [28], and while no consistent pattern of decreased precision with SNR was noted in the current study, it would nevertheless be interesting to investigate this in more detail; this is the focus of on-going work with the model phantom system. Furthermore, the use of more complex PK models, which offer the potential to reveal additional (patho-)physiological information yet have an increased sensitivity to SNR [28] and T res [14], is also the focus of on-going work.

Conclusion

This phantom study demonstrated the critical dependence of the accuracy of model-free phenomenological and pharmacokinetic parameters derived from DCE-MRI data on the T res and AD used in the acquisition. K trans errors were below 14% for acquisitions with T res ≤ 8.1 s and AD ≥360 s, but increased dramatically for longer T res and shorter AD values. v e errors were below 12% for acquisitions with T res ≤ 16.3 s and AD ≥360 s, and again increased dramatically outside of this range. No major gain in the measurement accuracy of the K trans, v e, and wash-in parameters was found from the use of AD ≥360 s at T res ≤ 8.1 s (all measurement errors were <15%). The use of quantitative, phantom-based approaches, such as the one described in this study, to assess the accuracy and precision of DCE-MRI techniques offers the prospect of standardising DCE acquisition protocols for the prostate and beyond, and, ultimately, a wider acceptance of the technique for use in routine clinical examinations.