Introduction

Tumor hypoxia correlates with increased metastatic risk and resistance to radiation and chemotherapy [1]. Invasive and non-invasive techniques have been developed to measure tumor hypoxia. Invasive methods, e.g., the Eppendorf pO2 electrode, provide real-time pO2 measurements and have shown a correlation between tumor hypoxia and poorer patient survival [2, 3]. However, such techniques are user-dependent, subject to sampling error, and limited to easily accessible tumors [4].

Non-invasive molecular imaging techniques are increasingly used in the clinic to address limitations of invasive methods [5]. Positron emission tomography (PET) can be used to assess the magnitude and variability of tumor hypoxia and guide treatment decisions [6, 7]. [18F]Fluoromisonidazole ([18F]FMISO) PET [8] is considered one of the more promising hypoxia quantification methods because the tracer selectively binds in hypoxic cells [9–12]. [18F]FMISO is reduced by nitroreductase enzymes and covalently bound to intracellular macromolecules in the absence of oxygen, causing accumulation of F-18 radioactivity in hypoxic cells [13]. Others [9, 10] first demonstrated that [18F]FMISO could detect hypoxia in human tumors [11] and showed the biomarker was representative of intracellular pO2 [14].

Current methods used to quantify the hypoxic fraction (HF), i.e., percentage of the tumor volume defined as hypoxic, do not provide absolute pO2 values and can lead to highly variable results [5]. Tissue-to-blood ratio (TBR), the most common metric to quantify tumor hypoxia with PET (~1 in normal tissues), is dependent on threshold values (first defined as ≥1.4 [9], ranging from 1.1 to 1.5 in the literature). Others [15, 16] have attempted to identify an [18F]FMISO parameter that correlates with a pO2 value for the same tumor volume [17].

PET tracer kinetic modeling derives tracer binding parameters from the tissue activity curves of dynamic PET datasets [18]. Thorwarth et al. [19] and others [20, 21] suggested kinetic modeling could provide more accurate tumor hypoxia quantification accounting for both delivery of tracer to tumor (K 1) and binding to hypoxic cells (k 3) [19]. Bartlett et al. [22] compared these parameters with TBR and determined that k 3 was the most accurate hypoxic quantifier. However, it remains unclear which metric or threshold is the most clinically relevant, and the method of image analysis could impact biological treatment planning and treatment outcomes analysis.

We assess the impact of different PET imaging metrics (TBR and kinetic parameters) and thresholds on the accuracy of tumor HF quantification and correlate these parameters with independent pO2 measurements.

Methods and Materials

Animal and Tumor Models

EMT6 mouse mammary carcinoma cells were grown in vitro and 2 × 10^5 cells were injected subcutaneously into the right-upper limb of all six male BALB/c mice (average weight ~30 g) aged ≥2 months. Imaging was performed ~2 weeks post-inoculation when tumors were ~0.6 cm3. Murine experimental protocols were approved by the Yale University Institutional Animal Care and Use Committee and conducted in full compliance with the Association for the Assessment and Accreditation of Laboratory Animal Care, governmental and institutional regulations, and principles outlined in the United States Public Health Service Guide. After completion of all experiments, mice were euthanized under anesthesia by cervical dislocation.

Cell line information, including origin, characteristics, growth methods, and presence of tumor hypoxia, is described elsewhere [23–27]. Others [4, 28] have characterized the use of this subcutaneous model for tumor hypoxia measurement with pO2 probes and PET imaging.

PET Imaging

Preclinical Imaging

Mice were anesthetized with isoflurane and air and injected i.v. with [18F]FMISO (maximum 0.5 mCi). [18F]FMISO was prepared according to previously published methods [29]. Animals underwent a whole-body 120-min dynamic PET scan using a microPET scanner (Focus 220, Siemens, U.S.A.). A transmission scan was acquired for attenuation correction and computed tomography (CT) for registration purposes. Rodents were transferred from the microPET (in the same immobilized position) on a removable tray to the microSPECT-CT scanner (X-SPECT, GammaMedica-Ideas, Northridge, CA) for CT imaging.

List mode data were binned as follows: 6 × 30 s and 3 × 1, 2 × 2, and 22 × 5 min. Dynamic PET images were reconstructed into a 256 × 256 × 95 matrix (voxel dimensions, 0.949 × 0.949 × 0.796 mm) using Fourier rebinning followed by a filtered back-projection algorithm with a ramp filter. Corrections for attenuation, scatter, randoms, and dead time were applied to each frame.
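As an illustrative sketch only (not the reconstruction software used in this study), the preclinical framing schedule above can be expanded into frame start and end times to confirm that it spans the 120-min scan. The function name `build_frames` and the variable names are hypothetical.

```python
def build_frames(schedule):
    """Expand (count, duration_s) pairs into a list of (start_s, end_s) frames."""
    frames = []
    start = 0.0
    for count, duration in schedule:
        for _ in range(count):
            frames.append((start, start + duration))
            start += duration
    return frames


# Preclinical binning from the text: 6 x 30 s, 3 x 1 min, 2 x 2 min, 22 x 5 min.
PRECLINICAL_SCHEDULE = [(6, 30), (3, 60), (2, 120), (22, 300)]

frames = build_frames(PRECLINICAL_SCHEDULE)
# Frame mid-times in minutes, the usual time axis for a time-activity curve.
mid_times_min = [(start + end) / 120.0 for start, end in frames]
total_duration_min = frames[-1][1] / 60.0

print(len(frames), "frames covering", total_duration_min, "min")
```

The frame mid-times are what a time-activity curve would typically be plotted against; that choice is an assumption here, not something stated in the text.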

Clinical Imaging

Human [18F]FMISO PET/CT scans were acquired on a Siemens Biograph mCT scanner (Siemens Medical Solutions, Hoffman Estates, Knoxville, TN, USA) in patients with early stage non-small cell lung cancer (NSCLC) tumors. Each [18F]FMISO PET scan consisted of three parts, acquired dynamically from 0 to 120, 150–180, and 210–240 min post-injection. A CT was performed before each part for attenuation correction. List mode data were divided into frames of 6 × 30 s and 3 × 1, 2 × 2, and 33 × 5 min. Dynamic PET images were reconstructed into a 400 × 400 × 111 matrix (voxel dimensions, 2.036 × 2.036 × 2 mm) using OP-OSEM with PSF and TOF, a 3.0-mm FWHM isotropic Gaussian filter, 2-mm isotropic voxels, and no zoom. Global dead time and model-based scatter corrections were also applied to each frame.

Preclinical and Clinical [18F]FMISO PET Analysis

Input Function and ROI Definition

MEDx 7.1 (Medical Numerics, Sterling, VA) and MATLAB (Mathworks Inc, Natick, MA) were used for image visualization and processing. Image-derived input functions were identified by drawing a region of interest (ROI) on the PET transaxial slices containing the heart in frames acquired <2–5 min post-injection. The heart ROI was then applied to the whole dynamic PET dataset to generate an input function using the average activity in the heart ROI. The heart time-activity curve (TAC), after its peak, was fitted using a sum of exponential functions (1 to 3 exponentials, automatically determined by minimizing the chi-square criterion) to reduce noise for more accurate kinetic analysis.

For preclinical imaging, tumor and muscle ROIs were drawn on CT and co-registered to PET images using FLIRT (Oxford UK) [30, 31] to perform a linear six-parameter rigid registration based on mutual information. Absolute tumor volume was verified using caliper measurement after tumor excision. For clinical imaging, the absolute tumor volume was defined by a radiation oncologist who contoured the tumor ROI on CT. The tumor diameter was measured on the treatment planning CT.

Calculation of Tissue-to-Blood Ratio and Hypoxic Fraction

Uptake values in each tumor ROI voxel were divided by the average value in the heart ROI from 90 to 120 min post-injection (preclinical) and 210–240 min post-injection (clinical) to create a TBR value for each voxel. Values were averaged to give a mean TBR for the tumor ROI, and a threshold was applied to calculate the percentage of tumor voxels that were hypoxic, i.e., above the defined threshold.
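The calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the in-house code used here: the voxel activities and blood value are made-up numbers in arbitrary units, and the helper names `tbr_map` and `hypoxic_fraction` are hypothetical.

```python
def tbr_map(tumor_voxels, blood_mean):
    """Divide each tumor voxel value by the mean late-frame blood (heart ROI) value."""
    return [value / blood_mean for value in tumor_voxels]


def hypoxic_fraction(values, threshold):
    """Percentage of voxels whose value lies strictly above the threshold."""
    above = sum(1 for value in values if value > threshold)
    return 100.0 * above / len(values)


# Hypothetical late-frame tumor voxel activities and heart ROI mean (arbitrary units).
tumor_voxels = [1.8, 2.2, 2.5, 2.7, 3.0, 2.0, 2.64, 2.9]
blood_mean = 2.0

tbr = tbr_map(tumor_voxels, blood_mean)
mean_tbr = sum(tbr) / len(tbr)
hf_by_threshold = {t: hypoxic_fraction(tbr, t) for t in (1.2, 1.3, 1.4)}

print("mean TBR:", round(mean_tbr, 3))
print("HF (%) by threshold:", hf_by_threshold)
```

The same `hypoxic_fraction` helper applies unchanged to voxel-wise k 3 or K i maps once a threshold for that metric is chosen.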

Voxel-by-Voxel Tumor Kinetic Analysis

A two-tissue compartment (2TC) model was used to generate tracer kinetic parameters [22]. Kinetic parameters K 1 (ml min−1 cm−3), k 2 (min−1), and k 3 (min−1) were estimated for each voxel and averaged over the whole tumor. The relevant parameters were k 3, the rate of tracer binding, and K i (K 1 k 3 /(k 2 + k 3)), the net rate of tracer influx. A Levenberg–Marquardt least-squares optimization algorithm was used to estimate model parameters in each voxel. In addition, Patlak analysis was performed to provide K i only [32].
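To illustrate how K i relates to the 2TC rate constants and to the Patlak slope, the sketch below evaluates an irreversible two-tissue model analytically for a constant blood input and recovers K i by linear regression on the late Patlak points. It is a toy example under loud assumptions: the rate constants, the constant input, the 60-min linear start time, and all function names are hypothetical and are not the fitting procedure used in this study.

```python
import math


def net_influx(k1, k2, k3):
    """K_i = K_1 * k_3 / (k_2 + k_3) for an irreversible two-tissue model."""
    return k1 * k3 / (k2 + k3)


def tissue_curve_constant_input(t, k1, k2, k3, cp=1.0):
    """Analytic tissue activity (free + bound) for a constant blood level cp."""
    a = k2 + k3
    free = cp * k1 / a * (1.0 - math.exp(-a * t))
    bound = cp * net_influx(k1, k2, k3) * (t - (1.0 - math.exp(-a * t)) / a)
    return free + bound


def patlak_slope(times, cp, ct, t_star):
    """Slope of Ct/Cp versus (integral of Cp)/Cp for samples with t >= t_star."""
    integral = [0.0]  # cumulative trapezoidal integral of the input function
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        integral.append(integral[-1] + 0.5 * dt * (cp[i] + cp[i - 1]))
    keep = [i for i, t in enumerate(times) if t >= t_star]
    xs = [integral[i] / cp[i] for i in keep]
    ys = [ct[i] / cp[i] for i in keep]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical rate constants (K1 in ml/min/cm^3, k2 and k3 in 1/min).
K1, K2, K3 = 0.1, 0.3, 0.008
times = [float(m) for m in range(0, 121, 5)]  # minutes
cp = [1.0] * len(times)                       # constant input, illustration only
ct = [tissue_curve_constant_input(t, K1, K2, K3) for t in times]

ki_model = net_influx(K1, K2, K3)
ki_patlak = patlak_slope(times, cp, ct, t_star=60.0)
print("K_i from rate constants:", ki_model)
print("K_i from Patlak slope:  ", ki_patlak)
```

With real data the input function is not constant and the data are noisy, so the two estimates would agree only approximately.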

HFs were quantified using different individual imaging metrics (TBR, k 3, or K i) and thresholds (1.2, 1.3, and 1.4 for TBR; 0.008 min−1 for k 3; and 0.004 ml min−1 cm−3 for K i). Variation in HF across these metrics and across TBR thresholds was assessed by comparing the median HF across all mice. Parametric maps were visualized in MATLAB. For clinical imaging, HF was defined as the percentage of the absolute tumor volume exhibiting a TBR >1.2 or >1.4, k 3 >0.008 min−1, or K i >0.004 ml min−1 cm−3.

Preclinical pO2 Measurements

Partial oxygen pressure (pO2) was measured with an Eppendorf pO2 histograph (Eppendorf KIMOC 6650, Hamburg, Germany) [33]. Needle trajectories were guided using plastic grids (three per row and 1 mm apart). A net needle step length of 0.5 mm (~half the size of a PET voxel) was set by advancing 0.8 mm and retracting 0.3 mm to avoid inaccuracy due to needle tip pressure. Forty pO2 tumor measurements (four tracks of ten measurements each) and 30 pO2 anterior right leg muscle measurements (three tracks of ten measurements each) were acquired. Distributions of measured values were plotted as histograms to compare tumor readings with muscle readings (muscle serving as reference normoxic tissue with a pO2 range of 20–40 mmHg). Mean pO2 values were calculated.

Preclinical pO2 Measurement Analysis and Parametric Image Comparison

Parametric maps (TBR, k 3, and K i) were created using in-house software and ROI values were imported into GraphPad (La Jolla, CA, USA). Four tumor ROI regions were selected (to reduce sampling error) to spatially correspond to pO2 readings based on external grid placement. Two track readings corresponded to one PET voxel width. To reduce the impact of spatial registration error between PET voxel location and electrode placement, measured pO2 values were averaged to calculate mean pO2 across all tumor and muscle voxels and correlated with mean TBR, k 3, or K i voxel values for the same approximate spatial tumor or muscle area. Kinetic parameter thresholds k 3 = 0.008 min−1 and K i = 0.004 ml min−1 cm−3 were selected to yield HF values comparable to those obtained with TBR >1.2.

Statistical Methods

The median imaging metric value was calculated for tumor and muscle ROIs for all mice. An assessment of the difference of the medians was done using a Wilcoxon rank-sum test (p < 0.05) [34]. R 2 values were calculated from the correlation plots and used to assess how much of the variation in [18F]FMISO PET metrics can be explained by differences in pO2. Standard deviation (SD) was calculated for each metric to provide the uncertainty in HF estimation.
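For readers unfamiliar with the rank-sum test, the sketch below computes an exact two-sided p value by enumerating every way of assigning the pooled ranks to the two groups, which is feasible for six tumor and six muscle values. It is an illustration only: the data are invented, no multiple-comparison correction is applied, and it is not claimed to reproduce the statistical software or the p values reported in this study. The function names are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Rank the values from 1..N, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def exact_rank_sum_p(group_a, group_b):
    """Exact two-sided p value for the rank sum of group_a under random labelling."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    expected = n_a * (len(ranks) + 1) / 2.0
    observed_dev = abs(sum(ranks[:n_a]) - expected)
    extreme = 0
    total = 0
    for subset in combinations(ranks, n_a):
        total += 1
        if abs(sum(subset) - expected) >= observed_dev - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical mean k3 values (1/min) for six tumors and six muscle ROIs.
tumor_k3 = [0.012, 0.015, 0.010, 0.018, 0.011, 0.014]
muscle_k3 = [0.003, 0.004, 0.002, 0.005, 0.0035, 0.0045]

p_value = exact_rank_sum_p(tumor_k3, muscle_k3)
print("exact two-sided p:", p_value)
```

Because the two invented groups do not overlap at all, only the two most extreme labellings out of 924 are as extreme as the observed one, so this example yields the smallest two-sided p value attainable with six values per group.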

Results

Figure 1 shows histograms of mouse tumor and muscle pO2 values. Mean tumor pO2 values ranged between 1 and 9 mmHg and mean muscle values between 21 and 35 mmHg. Mean tumor and mean muscle pO2 values were within expected ranges, 0–10 mmHg for tumor and 20–60 mmHg for muscle. Muscle values <20 mmHg are attributed to prolonged time under anesthesia and possible probe misplacement into nearby fascia or fatty tissue.

Fig. 1.

Histograms showing measured pO2 values for mouse tumors and muscles (M1–M6). Mean tumor pO2 values range between 1.1 and 9.4 mmHg and mean muscle values between 20.9 and 35 mmHg.

Table 1 shows tumor HFs as defined by TBR thresholds >1.2, >1.3, and >1.4 and by Patlak and 2TC rate constants. The median HFs for mice 1–6 (M1–M6) using TBR >1.2, >1.3, and >1.4 were 43, 29, and 19 %, respectively, a variation of up to a factor of 1.5. Median HFs defined by 2TC k 3, 2TC K i, and Patlak K i were 39, 57, and 54 %, respectively, also varying by up to a factor of 1.5. Variation of up to a factor of 3 was observed between median HFs calculated by TBR and those calculated by kinetic modeling parameters. Median HF using pO2 <5 mmHg (a commonly accepted threshold for radiobiological hypoxia [35, 36]) was 71 %, up to 3.5 times larger than the median HFs calculated using TBR and kinetic parameter thresholds.

Table 1 Comparison of hypoxic fractions for each tumor using TBRs and rates of tracer influx (K i in ml/min/g) and binding (k 3 in min−1) for 2TC and Patlak models. Error is shown as ± standard deviation for each metric

Figure 2 shows transaxial images (n = 6) highlighting the visual differences in HFs by comparing TBR values estimated from static late PET images with kinetic parameters derived from the entire dynamic scan (display range shows hypoxia as white, tumor location indicated with arrows). Each column corresponds to one mouse (M1–M6). Row (1) shows the CT images of a representative mouse tumor slice. Subsequent rows show PET images created using different [18F]FMISO imaging metrics. Row (2) shows TBR (>1.2) images calculated using the 90–120 min summed image. Rows (3), (4), and (5) show parametric images for k 3 (>0.008), K i 2TC (>0.004), and Patlak K i (>0.004), respectively.

Fig. 2.

Axial images (M1–M6) showing HF variations. Row (1): CT. Row (2): 90–120 min summed PET; TBR >1.2 indicates hypoxia. Rows (3), (4), and (5): parametric maps of 2TC k 3 (threshold = 0.008 min−1), 2TC K i (threshold = 0.004 ml min−1 cm−3), and Patlak K i (threshold = 0.004 ml min−1 cm−3), respectively. Arrows indicate tumor hypoxia, and the scale bar on the right shows hypoxia as white.

Figure 3 shows the correlation between mean tumor and muscle pO2 and mean imaging metrics for tumor hypoxia (TBR, k 3, and K i). An inverse relationship was observed between mean pO2 and mean [18F]FMISO values. Data were fit using an exponential function (Y = Y0*exp(K*X), X = pO2), which provided a reasonable empirical description of the data (Y0 = 1.464, 0.013, 0.007 and 0.008 and K = −0.018, −0.044, −0.046, and −0.046 for TBR, k 3, 2TC K i, and Patlak K i, respectively). R 2 values from this function fit were 0.25, 0.25, 0.30, and 0.32 for TBR, k 3, 2TC K i, and Patlak K i, respectively. For low pO2 values, a high k 3, high K i, and high TBR were observed. The fit for the K i Patlak metric yielded the highest R 2 value of 0.32.
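The exponential model above can be illustrated with a short sketch. Note the swapped-in technique: the code fits ln(Y) against X by ordinary least squares, which is a simpler stand-in for the nonlinear fit presumably performed in the statistics package and will generally give slightly different parameters on noisy data. The pO2 sample points are hypothetical, and the Y values are generated noise-free from the reported TBR parameters (Y0 = 1.464, K = −0.018) purely to check that the fit recovers them.

```python
import math


def exp_model(x, y0, k):
    """Y = Y0 * exp(K * X), with X the pO2 in mmHg."""
    return y0 * math.exp(k * x)


def fit_exponential_loglinear(xs, ys):
    """Least-squares fit of ln(Y) = ln(Y0) + K * X; requires all Y > 0."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxy = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    k = sxy / sxx
    y0 = math.exp(mean_log - k * mean_x)
    return y0, k


def r_squared(xs, ys, y0, k):
    """Coefficient of determination of the fitted curve on the original scale."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - exp_model(x, y0, k)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical mean pO2 values (mmHg) spanning tumor and muscle ranges.
po2 = [1.0, 5.0, 9.0, 21.0, 28.0, 35.0]
# Noise-free synthetic TBR values generated from the reported fit parameters.
tbr = [exp_model(x, 1.464, -0.018) for x in po2]

y0_fit, k_fit = fit_exponential_loglinear(po2, tbr)
print("Y0:", y0_fit, "K:", k_fit, "R^2:", r_squared(po2, tbr, y0_fit, k_fit))
```

On real, noisy measurements the R 2 would be far below 1, as the values reported above show.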

Fig. 3.

Symbols show mean tumor (circle) or muscle (triangle) PET imaging metric and mean tumor or muscle pO2 for each mouse (M1–M6). Solid lines indicate the inverse-exponential relationship between mean pO2 (by Eppendorf electrode) and [18F]FMISO parameter described by Y = Y0*exp(K*X), where Y0 = 1.464, 0.013, 0.007, and 0.008 and K = −0.018, −0.044, −0.046, and −0.046 for a, b, c, and d, respectively. a TBR, b k 3, c 2TC K i, and d Patlak K i. R 2 values were 0.25, 0.25, 0.30, and 0.32, respectively. Error bars are standard deviations of the mean for each [18F]FMISO metric.

An assessment of the difference of the medians was done using a Wilcoxon rank-sum test (p < 0.05, corrected for multiple comparisons). The median values of all [18F]FMISO imaging metrics (TBR, k 3, and K i) were compared between the tumor and muscle groups across all mice. For k 3, K i (2TC), and K i (Patlak), the p values were 0.0043, 0.0043, and 0.0043, respectively, all statistically significant (p ≤ 0.05). No statistically significant difference was observed between the tumor and muscle TBR values, suggesting that model-based methods are more sensitive.

Figure 4 (a–d) shows a voxel-by-voxel correlation of imaging metrics (TBR, k 3, and K i) and pO2 values for one mouse tumor, as described by Y = Y0*exp(K*X), where Y0 = 1.00, 0.004, 0.01, and 0.005; K = −0.003, −0.04, −0.13, and −0.13; and R 2 values were 0.02, 0.24, 0.42, and 0.39 for a, b, c, and d, respectively. Again, an inverse relationship was observed between pO2 and mean [18F]FMISO values. Figure 4 (e–h) shows the correlation between mean tumor pO2 and mean imaging metrics for tumor hypoxia (TBR, k 3, and K i) for all mice (M1–M6). A solid line indicates the fit of a nonlinear inverse exponential described by Y = Y0*exp(K*X). R 2 values were 0.04, 0.25, 0.15, and 0.08. Error bars are standard deviations of the mean for each [18F]FMISO metric. Due to the small sample size of the tumor data, the under-sampling of the pO2 probe, and additional factors outlined in the discussion, the correlation without the muscle data is not consistent with physiological expectations.

Fig. 4.

a TBR, b k 3, c 2TC K i, and d Patlak K i as a function of independently measured partial oxygen pressure for a single mouse (M1) tumor, described by Y = Y0*exp(K*X). (e–h) Symbols show mean tumor (circle) PET imaging metric and mean tumor pO2 for each mouse (M1–M6). Dashed lines show the fit of a horizontal line through the data, compared with solid lines indicating the fit of a nonlinear inverse exponential described by Y = Y0*exp(K*X) for e, f, g, and h. Error bars are standard deviations of the mean for each [18F]FMISO metric.

Figure 5 shows the clinical impact of imaging metric and threshold on HF quantification in human tumors with [18F]FMISO PET imaging. In patient A, HFs were estimated as 67, 52, 41, and 38 % of total tumor volume using TBR >1.2, TBR >1.4, k 3 > 0.008, and K i > 0.004, respectively. In patients B and C, HFs were estimated as 14, 1.6, 0, and 0 % and 27, 7.4, 1.2, and 0.1 %, respectively, using the same metrics. The variation in HF in all three patients highlights the importance of metric and threshold choice in the use of [18F]FMISO PET imaging for radiation therapy applications. For example, the yellow contours in each panel could be used as a biological target volume that would receive a higher radiation prescription dose than the surrounding tumor volume.

Fig. 5.

Translation of tumor hypoxic volumes onto treatment planning CT. Yellow contours represent potential biological target volume (BTV) defined by [18F]FMISO PET to receive radiation dose escalation. Patients had early stage (I–II) NSCLC (tumor diameters of 4.1, 2.2, and 5.3 cm for patients A, B, and C) and were prescribed 10–18 Gy × 3 stereotactic radiation therapy. a, b, and c represent patients A, B, and C: [18F]FMISO PET image summed from 210 to 240 min post-injection, BTVs defined using tumor-to-blood ratio (TBR) >1.2 and >1.4, and k 3 2TC and K i Patlak images for the same image slice and time frame. Arrows indicate tumor hypoxia, and the scale bar on the right shows hypoxia as white.

Discussion

In this work, several PET imaging metrics (including kinetic model-derived metrics) used to quantify tumor hypoxia were directly compared and correlated with independent pO2 measurements. Similar correlative experiments have been done [22, 37–41]; however, this is the first to assess the impact of thresholding various imaging metrics on tumor HFs. The dependence of tumor hypoxia quantification on metric and threshold choice is shown.

Three commonly used TBR thresholds (~1 in normal tissue) to define tumor hypoxia were compared: >1.2 [42–51], >1.3 [52–54], and >1.4 [9, 55]. Variation in median HFs for the same tumor depends on TBR threshold selection when compared to median HFs defined by pO2 <5 mmHg. This suggests that certain TBR thresholds may provide an inaccurate HF that is different from physiologically meaningful hypoxia. To account for TBR inaccuracies, kinetic modeling can be applied on a voxel-by-voxel basis to provide hypoxic heterogeneity information [19] and differentiate hypoxic (irreversible tracer accumulation) from normoxic regions (reversible tracer uptake) [56]. Mouse median HFs varied by up to a factor of 3 when defined by a TBR of >1.2, >1.3, or >1.4 (thresholds used in clinical studies [9, 52, 57–59]), when compared to those defined by kinetic parameters (k 3, 2TC K i, and Patlak K i). For M1, HF was 0 % (TBR >1.4) versus 33 % (k 3 > 0.004 min−1). As TBR is a static measure, it provides a composite map of peripheral tracer clearance, tracer delivery, and clearance in the tumor, as well as irreversible tumor uptake due to hypoxia. This may explain the large variation in HFs and significant size differences when compared to kinetic modeling parameters. Although we expect the use of different thresholds to produce different HFs, in the clinic, this reality is often ignored and HFs are defined somewhat arbitrarily.

Despite the advantages, kinetic modeling remains imperfect for HF quantification. Aside from lengthy imaging protocols, the partial volume effect (PVE) still affects all kinetic modeling tumor HF measures (as it does for TBR, especially for small tumors) and can impact the input function [21] and the shape of the TACs [19]. We showed a variation by up to a factor of 1.5 in median HF defined by k 3 compared to those defined by K i (with either model). Had the model been constrained for noise when calculating HFs defined by k 3, the difference between these volumes and those defined by K i might have been reduced [22]. However, we did not implement a constrained fit as we lacked sufficient data to adequately and precisely define the constrained values. In addition, there is no clear overestimation or underestimation of hypoxia using pO2 compared to the various PET measurements. This could be due to the fact that PET resolution (1.5 mm) cannot directly capture all radiobiologically relevant heterogeneity in tumor hypoxia, which occurs at a distance of 100–200 μm [60]. As [18F]FMISO uptake parameters all have steep gradient changes and considering the low resolution of PET, averaging of these values may have resulted in a loss of information. Also, the bioreduction of [18F]FMISO only occurs over a finite range of pO2 values (0–20 mmHg), and this may be a source of error, as this range corresponds to varying levels of hypoxia (i.e., severe or moderate) and this information could be lost in the image [22, 35, 61].

As clinical practice moves towards the use of hypoxic quantification metrics, including kinetic parameters [62], it is critical to correctly interpret their physiological meaning to more accurately define HF and facilitate comparison of results across institutions. As highlighted by the change in HF with threshold in Fig. 5, inaccurate assessment of patient hypoxic status could make it challenging to stratify patients and overcome hypoxic radioresistance [26, 46, 50–53, 63].

To elucidate this, we correlated mean pO2 values with imaging metrics for the same approximate spatial tumor volume (Fig. 3). The difference in kinetic parameter values for tumor and muscle was statistically significant (p ≤ 0.05), but this was not the case with TBR. This is consistent with other studies that suggest TBR may not provide accurate hypoxic analysis [19] and others showing it may not correlate with pO2 [64].

However, the voxel-by-voxel correlation between each imaging metric and pO2 for the same approximate spatial area was unclear in most subjects, with the exception of M1 (Fig. 4 a–d). We attribute this to a number of factors. First, the Eppendorf pO2 electrode is user-dependent [4] and spatial registration error may have resulted from deformation of the probe trajectory during measurements. Second, as HF is defined for viable tissue, macroscopic areas of necrosis (if present in the tumor) do not affect the value of the HF as defined by [18F]FMISO PET, as these areas lack detectable uptake. However, a low pO2 reading could be obtained in a necrotic region using the electrode and artificially lower the HF value [65]. Third, tumor ROI values were averaged to obtain representative pO2 and [18F]FMISO uptake values to account for under-sampling and user dependence of the pO2 electrode [4] and poor tracer differentiation of hypoxic regions. Moreover, muscle histograms do contain values <20 mmHg and this may be a result of prolonged time under anesthesia and possible misplacement of the probe into nearby fascia or fatty tissue. Finally, of the six mice presented, CAIX staining was only performed on two of them; thus, we have limited knowledge of the necrotic fraction. Also, quantitative analysis of the CAIX staining results was challenging and suffered from severe undersampling. As a result, we decided to remove the IHC section from the manuscript.

Despite the aforementioned challenges, increased accuracy of tumor hypoxia quantification techniques in the clinical setting has the potential to improve cancer patient outcomes through (1) selective stratification to hypoxia-targeted drug trials and (2) more optimal radiotherapy planning and delivery, both aimed at overcoming treatment resistance. Inaccurate assessment of patient hypoxic status can make it challenging to stratify patients and overcome hypoxic radioresistance [46, 50–53, 63]. This has potentially significant implications for the clinical setting in the context of dose escalation for targeting hypoxic subvolumes or stratification of patients into subgroups for personalized hypoxia modification therapies. While our results do not demonstrate a clear statistical significance and none of the quantification strategies elucidate a clear and reliable correlation with pO2, they are suggestive that tracer kinetics may provide an improved characterization of tumor hypoxia using [18F]FMISO PET imaging and indicate a direction for future research.

Conclusions

Different PET imaging metrics and thresholds can result in substantially different tumor HF measurements. Our results suggest a direction for future studies without providing a statistically significant demonstration that tracer kinetic modeling has potential to improve quantification clinically. Future experiments are needed to provide more accurate imaging metrics and refine pO2 measurement techniques to better quantify tumor hypoxia. Accurate hypoxia quantification techniques have the potential to impact patient treatment decisions and improve radiotherapy patient outcomes.