Introduction

The liver is the primary site of iron storage in patients with hereditary hemochromatosis and transfusional hemosiderosis, and iron distribution in the liver is remarkably uniform in those patients [1]. Iron overload leads to liver damage, which eventually progresses to cirrhosis, liver failure, and hepatocellular carcinoma [2, 3]. Furthermore, patients with elevated liver iron content (LIC) are at great risk of cardiac complications and premature death [4]. The clinical LIC thresholds of 1.8, 3.2, 7.0, and 15.0 mg Fe/g dry tissue indicate normal storage, the lower and upper limits for chelation therapy, and an increased risk of iron-induced complications, respectively [4]. Accordingly, accurate liver iron quantification and stratification are necessary to support clinicians’ treatment decisions [1].

Currently, invasive needle biopsy, which is often subject to sampling errors and low reproducibility, is the direct method to quantify LIC [5]. Noninvasive methods for hepatic iron quantification have been developed on several modalities, namely superconducting quantum interference device (SQUID), magnetic resonance imaging (MRI), and dual-energy CT (DECT) [6–10]. SQUID apparatus is not widely available because of its high cost [11, 12]. MRI is widely used for LIC quantification, but it shows low sensitivity in cases of heavy iron accumulation because of MR susceptibility artefacts [13]. DECT with two spectra detects elements of high atomic number, such as iodine and iron, and thus could quantify LIC [14].

On dual-source DECT, dual-energy analysis was developed using image-based algorithms to discriminate and quantify materials [15]. An ex vivo phantom study showed that virtual iron concentration (VIC) images, which were derived from an iron-specific three-material decomposition algorithm, could quantify LIC while eliminating the confounding effect of fat [16]. However, those results were obtained ex vivo; they might not be applicable to an in vivo situation and might not reach the same accuracy. To the best of our knowledge, no study has been published on VIC imaging to quantify LIC and to stratify it by the different clinical LIC thresholds in vivo. Therefore, the purpose of this study was to assess the accuracy of VIC-derived LIC quantification on DECT and to evaluate its ability to grade different clinical LIC thresholds in an iron overload rabbit model.

Materials and methods

Iron overload rabbit models

This study was approved by the local Institutional Animal Care and Use Committee. A total of 51 male New Zealand white rabbits (CHEDUN Experimental Animal Seed Multiplication Farm, Shanghai, China) were used. The rabbits were 3 to 5 months old and weighed 2.0 to 3.0 kg. They were raised in a pathogen-free environment with free access to water and a normal diet. Iron accumulation was induced by intravenous injection of iron dextran (100 mg Fe/mL) (Shanxi Zhaoyi Biological Co. Ltd., Shanxi, China). Intravenous iron injection was used to mimic transfusional iron overload in a clinical setting.

Thirty-five rabbits were included in the main group and 16 in the validation group. In the main group, total iron injection doses of 0, 25, 50, 100, 200, 300, 400, 800, 1,200, 1,600, 2,000, and 2,400 mg were used. Each dose level was applied to three rabbits, except the 2,400 mg level, which was applied to two rabbits. In the validation group, total iron doses of 0, 50, 100, 200, 400, 800, 1,600, and 2,000 mg were used, with two rabbits at each dose level. The iron doses were selected on the basis of our preliminary experimental results, in which we successfully established different iron accumulation levels that covered the clinical LIC range. Rabbits with a total iron dose higher than 200 mg received several separate injections, whereas rabbits with doses of 200 mg or lower were injected once. In case of multiple injections, the maximum dose per injection was 100 mg Fe/kg of body weight. All rabbits were examined using DECT 1 week after iron injection.
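The per-injection cap can be turned into a small worked example. The sketch below computes the smallest number of injections that keeps each one at or below 100 mg Fe/kg; the study does not report the actual injection schedule, so the function `min_injections` and the 2.5 kg body weight are illustrative assumptions only.

```python
import math

MAX_MG_FE_PER_KG = 100.0  # per-injection cap stated in the text


def min_injections(total_dose_mg: float, body_weight_kg: float) -> int:
    """Smallest number of injections that keeps each one at or below the cap.

    This is only a lower bound implied by the cap, not the schedule
    actually used in the study (which is not reported).
    """
    if total_dose_mg <= 0:
        return 0
    per_injection_cap_mg = MAX_MG_FE_PER_KG * body_weight_kg
    return math.ceil(total_dose_mg / per_injection_cap_mg)


# Hypothetical 2.5 kg rabbit (study weights ranged from 2.0 to 3.0 kg)
for dose_mg in (200, 400, 2400):
    print(dose_mg, "mg ->", min_injections(dose_mg, 2.5), "injection(s)")
```

For the lightest rabbits (2.0 kg) the cap is 200 mg per injection, which is consistent with single injections being used for total doses of 200 mg or lower.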

Pathological and chemical assessment of LIC

After a rabbit was sacrificed, the whole liver was obtained. Large vessels and bile ducts were removed. The remaining hepatic tissues were rinsed repeatedly with distilled water. Our preliminary experimental results showed that iron distribution was uniform and diffuse in the entire liver in this model. All samples for pathological and chemical analyses were selected from almost the same position as those used for DECT measurements. For pathological analysis, haematoxylin and eosin and Prussian blue staining methods were used. For chemical analysis, two liver samples from each rabbit were dried in an oven at 60 °C overnight until their weights remained constant. The dried liver samples were then sent to the Center of Analysis and Measurement, Fudan University, Shanghai, China for LIC quantification using an inductively-coupled plasma spectrometer (ICP) (Hitachi, Ltd., Tokyo, Japan).

CT imaging and analysis

All CT examinations were performed on a dual-source dual-energy CT system (SOMATOM Definition Flash, Siemens AG, Forchheim, Germany). Rabbits were anaesthetized by intravenous (IV) injection of 30 mg/kg pentobarbital sodium (Sigma, USA). Rabbits were fixed by an abdominal belt in order to reduce breathing motion artefacts. Upper abdominal unenhanced DECT acquisition was performed with tube voltages at 80 and 140 kVp with a tin filter. The tin filter was used to improve separation of two spectra [17]. The tube current was 350 and 135 eff. mAs for the low and high voltage tube, respectively. The volumetric CT dose index (CTDIvol) for this setting was 13.17 mGy. Other CT parameters included 64 × 0.6 mm detector collimation, 0.5 s gantry rotation time and 0.6 pitch. The field of view (FOV) was 180 mm × 180 mm with a matrix of 512 × 512. Images were reconstructed using 3 mm slice thickness and 1.5 mm slice increments by a dedicated dual-energy kernel (D30f). We used the same CT acquisition parameters for rabbits with all different levels of iron overload.

Three sets of data were reconstructed for each rabbit as follows: (1) images acquired at 80 kVp; (2) images acquired at 140 kVp; and (3) virtual iron concentration (VIC) images obtained by commercially available post-processing software (Liver VNC, Syngo Dual Energy, Siemens AG, Forchheim, Germany). The pre-settings of the algorithm for the decomposition of iodine were adjusted to obtain an iron-specific algorithm. A previously described iron-specific slope value of 1.9 was used in our study [16]. Two types of images (VIC images and virtual non-iron images) were obtained after image processing. VIC images provided CT attenuation values enhanced by iron, which were similar to the images of iodine enhancement on contrast-enhanced CT [16]. Thus, VIC images indicated iron content and distribution in the liver.

Two radiologists, each with at least 5 years of experience in abdominal imaging and blinded to the ICP and histopathological results, independently evaluated the images. For 80 kVp, 140 kVp and VIC images, the mean attenuation of the liver in Hounsfield units (HU) was measured by placing freehand regions of interest (ROIs) approximately 5 mm away from the liver boundary. Each ROI was placed in a homogeneous area, avoiding major vessels, bile ducts and image artefacts. Mean HU was measured on three consecutive CT images, and the mean of the three HU values was used for further analysis.

Statistics

The Kolmogorov–Smirnov test was used to assess the normality of the data distribution. Normally distributed variables were described as mean ± standard deviation. Bland–Altman analysis was used to assess the agreement of CT measurements between the two observers and the consistency of ICP measurements between the two liver samples from each rabbit [18]. Pearson’s correlation analysis was used to assess the correlation between CT measurements and the reference LIC confirmed by ICP. Data from the 35 rabbits in the main experimental group were used to build a linear regression equation describing the relationship between CT measurements and reference LIC. Data from the 16 rabbits in the validation group were used to validate the accuracy of the established equation. Bland–Altman analysis was used to determine the agreement between VIC-derived LIC [LIC(VIC)] and ICP-derived LIC [LIC(ICP)]. Receiver operating characteristic (ROC) analysis was performed for all 51 rabbits to determine the optimal cutoff value of VIC for discriminating each LIC threshold. The optimal cutoff value was defined as the criterion associated with Youden’s index J, that is, the value that maximized the sum of sensitivity and specificity [19]. A P value < 0.05 was considered statistically significant. Statistical analysis was performed using SPSS (v.13.0, Chicago, USA) and MedCalc (v.13.1, Ostend, Belgium).
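As an illustration of the Bland–Altman step, the following minimal sketch computes the mean difference (bias) and an interval of bias ± 1.96 standard deviations of the paired differences. It assumes that the intervals reported with each mean difference correspond to these limits; the function name `bland_altman` and the paired readings are made up for the example and are not study data.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences; the limits are
    bias -/+ 1.96 * SD of the differences (sample SD, n - 1).
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up HU readings from two observers (illustration only)
observer_1 = [50.0, 62.0, 71.0, 90.0]
observer_2 = [52.0, 60.0, 73.0, 88.0]
bias, lower, upper = bland_altman(observer_1, observer_2)
print(f"bias={bias:.2f} HU, limits=({lower:.2f}, {upper:.2f}) HU")
```

The same function applies unchanged to the other two comparisons in this section: the two liver samples per rabbit, and LIC(VIC) against LIC(ICP).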

Results

Pathological analysis

The higher the total injected iron dose, the greater the iron staining observed, with uniform distribution in hepatocytes and hemosiderin granules. In six rabbits that had not received iron dextran injections, iron staining was not observed. At the minimum dose of 25 mg, iron-injected rabbits showed homogeneous distribution of iron staining. For doses of <50 mg, iron staining appeared only in Kupffer cells of hepatic sinusoids, but not in most hepatocytes. For doses of ≥100 mg, iron deposition appeared in hepatic sinusoids and hepatocytes (Fig. 1). Neither fat deposition nor hepatic fibrosis was observed in the samples.

Fig. 1

Examples of rabbit models with different iron accumulation levels, as graded by clinical liver iron content (LIC) thresholds of 1.8, 3.2, 7.0 and 15.0 mg Fe/g dry tissue. Rows 1–5 represent the five rabbits with LICs of 0.29, 1.95, 6.01, 13.92 and 33.58 mg Fe/g dry tissue, respectively. Columns 1–5 are CT images at 80 kVp, 140 kVp, virtual iron concentration (VIC) images (colour overlay with window centre of 40 HU and width of 80 HU) and Prussian blue staining (100×), respectively. With increasing LIC, the hepatic attenuation increased on images of 80 and 140 kVp, while higher attenuation was observed on 80 kVp images than 140 kVp. On colour-coded VIC images, the colour changed from blue to red with increasing LIC. This finding was confirmed by Prussian blue staining

Chemical assessment of LIC

A total of 102 hepatic samples (two from each rabbit) were analysed for liver iron content by ICP. Dry sample weight was 0.4 ± 0.2 mg (range, 0.1 to 1.1 mg), with a 95 % confidence interval (CI) of 0.4–0.5 mg. Bland–Altman analysis showed good LIC agreement between the two samples from each rabbit, with a mean difference of 0.29 mg Fe/g (95 % CI, −1.75 to 2.33 mg Fe/g). Given this agreement, the mean LIC of the two samples from each rabbit was used for further analysis.

The LIC of the dry samples was 9.9 ± 10.7 mg Fe/g (range, 0.2–39.6 mg Fe/g; 95 % CI, 7.8–12.0 mg Fe/g). According to the widely accepted LIC thresholds, 14, 7, 8, 12 and 10 rabbits fell into the LIC groups of <1.8, 1.8 to 3.2, 3.2 to 7.0, 7.0 to 15.0 and >15.0 mg Fe/g, respectively.
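The grouping by clinical thresholds can be expressed as a short lookup. The sketch below is illustrative: the function name `grade_lic` is hypothetical, and treating a value exactly equal to a threshold as belonging to the higher group is an assumption, since the text does not specify boundary handling. The example values are the five LICs shown in Fig. 1.

```python
from bisect import bisect_right

# Clinical LIC thresholds in mg Fe/g dry tissue, as given in the text
THRESHOLDS = [1.8, 3.2, 7.0, 15.0]
GROUP_LABELS = ["<1.8", "1.8 to 3.2", "3.2 to 7.0", "7.0 to 15.0", ">15.0"]


def grade_lic(lic_mg_per_g: float) -> int:
    """Return the group index (0-4) for a liver iron content value.

    Assumption: a value equal to a threshold falls in the higher group.
    """
    return bisect_right(THRESHOLDS, lic_mg_per_g)


# LIC values of the five example rabbits in Fig. 1
for lic in (0.29, 1.95, 6.01, 13.92, 33.58):
    print(lic, "->", GROUP_LABELS[grade_lic(lic)])
```

With these inputs, each of the five example rabbits falls into a different group, matching the one-rabbit-per-grade layout of Fig. 1.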

Interobserver agreement of DECT measurements

Bland–Altman analysis showed excellent agreement of CT measurements between the two observers. The mean differences were −0.18 HU (95 % CI, −4.51 to 4.14 HU), −0.41 HU (95 % CI, −3.56 to 2.73 HU) and 0.45 HU (95 % CI, −6.01 to 6.92 HU) for 80 kVp, 140 kVp and VIC, respectively. Because of this strong interobserver agreement, the mean of the two observers’ measurements was used for further analysis.

Correlations between CT measurements and LIC

Hepatic attenuation increased on both 80 and 140 kVp images with increasing LIC values. Moreover, 80 kVp images showed higher attenuation than 140 kVp images. Hepatic VIC value also increased with titrated LIC (Fig. 2). Accordingly, the colour-coded VIC images changed from blue to red with increasing LIC. Prussian blue staining also confirmed that the LIC increased progressively (Fig. 1).

Fig. 2

Scatter plots for the correlations between CT measurements (80 kVp, 140 kVp and virtual iron concentration [VIC]) and liver iron content (LIC). All CT measurements linearly correlated with LIC (P < 0.01)

In the main experimental group comprising 35 rabbits, hepatic attenuation at 80 kVp correlated linearly with that at 140 kVp (r = 0.825, P < 0.001) (Fig. 3). Hepatic attenuation at both 80 and 140 kVp correlated with LIC. VIC showed a stronger correlation with LIC than the 80 and 140 kVp images (Table 1). The linear regression equation for LIC calculated from VIC was as follows:

$$ y = 0.43x - 6.88 \quad (F = 493.03,\ P < 0.001) \tag{1} $$

where x indicates VIC, and y represents LIC(VIC) (Fig. 4).
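To make Eq. 1 concrete, the following sketch applies it in both directions. The coefficients are taken from the equation; the function names `lic_from_vic` and `vic_from_lic` are hypothetical, and the example VIC of 40 HU is an arbitrary illustrative input.

```python
SLOPE = 0.43       # mg Fe/g per HU, from Eq. 1
INTERCEPT = -6.88  # mg Fe/g, from Eq. 1


def lic_from_vic(vic_hu: float) -> float:
    """Predicted LIC(VIC) in mg Fe/g dry tissue for a VIC value in HU."""
    return SLOPE * vic_hu + INTERCEPT


def vic_from_lic(lic_mg_per_g: float) -> float:
    """Inverse of Eq. 1: the VIC value in HU that maps to a given LIC."""
    return (lic_mg_per_g - INTERCEPT) / SLOPE


print(round(lic_from_vic(40.0), 2))  # arbitrary example input of 40 HU
print(round(vic_from_lic(0.0), 2))   # VIC at which the predicted LIC is zero
```

A VIC of 40 HU maps to about 10.3 mg Fe/g, and the predicted LIC reaches zero at a VIC of 16 HU, so the fitted line does not pass through the origin and VIC values below 16 HU yield negative predictions.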

Fig. 3

Scatter plot for the correlation between the CT values in Hounsfield units (HU) at 80 and 140 kVp. The mean attenuation values at 80 and 140 kVp showed a linear correlation (r = 0.825, P < 0.001)

Table 1 Pearson’s correlation coefficient (r) between CT measurements and liver iron content (LIC)
Fig. 4

Scatter plot of virtual iron concentration (VIC) against liver iron content (LIC) with linear fit (solid line) and 95 % CI (dashed lines)

Validation of the regression equation

The accuracy of Eq. 1 for LIC quantification was validated in a group of 16 rabbits loaded with different iron levels. The mean LIC(VIC) obtained using Eq. 1 was 7.1 ± 8.7 mg Fe/g dry tissue, and the postmortem LIC(ICP) was 8.1 ± 8.8 mg Fe/g dry tissue. Bland–Altman analysis between LIC(VIC) and LIC(ICP) showed a mean difference of 0.95 mg Fe/g (95 % CI, −3.50 to 5.39 mg Fe/g), which was not statistically significant (P > 0.05) (Fig. 5).

Fig. 5

Bland–Altman plot showing the agreement between LIC(VIC) and LIC(ICP), which were determined using virtual iron concentration (VIC) imaging and inductively-coupled plasma (ICP) spectrometry, respectively. The mean difference was 0.95 mg Fe/g, which was not statistically significant (P > 0.05). The solid line shows the mean difference between the two methods, whereas the dashed lines indicate the upper and lower 95 % limits of agreement between them

Diagnostic accuracy for discriminating different clinical LIC thresholds

ROC analysis was performed to discriminate rabbits at the different LIC thresholds (1.8, 3.2, 7.0 and 15.0 mg Fe/g); the corresponding optimal cutoff values for VIC were 19.6, 25.3, 36.9 and 61.5 HU, respectively. At 1.8 mg Fe/g, the threshold used to distinguish iron accumulation from normal storage, VIC had 89 % sensitivity and 100 % specificity. At the threshold of 15.0 mg Fe/g, with the VIC cutoff set at 61.5 HU, the highest sensitivity (100 %) and specificity (100 %) were obtained (Table 2).
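The cutoff selection by Youden’s index can be sketched as follows. This is a minimal illustration and not the MedCalc procedure itself: using the observed VIC values as candidate cutoffs, and calling a case positive when its VIC is at or above the cutoff, are both assumptions, and the data are synthetic.

```python
def youden_cutoff(values, positives):
    """Return (J, cutoff, sensitivity, specificity) for the best cutoff.

    values    : VIC measurements in HU
    positives : True where the reference LIC is at or above the threshold
    A case is called positive when value >= cutoff (assumed convention).
    """
    n_pos = sum(positives)
    n_neg = len(positives) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, p in zip(values, positives) if p and v >= cutoff)
        tn = sum(1 for v, p in zip(values, positives) if not p and v < cutoff)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        j = sensitivity + specificity - 1.0  # Youden's index
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best


# Synthetic VIC (HU) and reference LIC (mg Fe/g) pairs, illustration only
vic = [12, 15, 18, 21, 24, 30, 45, 70]
lic = [0.3, 0.9, 1.5, 1.7, 2.2, 4.0, 9.0, 20.0]
is_overloaded = [value >= 1.8 for value in lic]
print(youden_cutoff(vic, is_overloaded))
```

Repeating the call with the labels recomputed for 3.2, 7.0 and 15.0 mg Fe/g would give one cutoff per clinical threshold, which is the form in which the results above are reported.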

Table 2 The sensitivity and specificity of virtual iron concentration (VIC) with an optimal criterion for liver iron content (LIC) prediction at different clinical LIC thresholds

Discussion

Our study demonstrated that VIC exhibited a strong and positive linear relationship with titrated LIC in an iron-overload rabbit model. Based on the given equation, VIC could accurately predict LIC. The predicted values were highly consistent with reference ICP results. ROC analysis indicated that VIC could stratify the clinically significant LIC thresholds with high sensitivity and specificity.

Accurate assessment of iron accumulation in the liver is necessary for quantitative staging of iron overload and for monitoring chelation therapy in a clinical setting [20]. An ideal liver iron quantification technique should be sensitive and accurate over the entire clinically relevant LIC range. Rabbit models with different severities of iron overload were successfully established by injection of iron dextran. LIC in those models ranged from 0.2 to 39.6 mg Fe/g dry tissue, a range wide enough to mimic the various degrees of iron overload seen in patients [10]. In our animal model, the iron distribution was homogeneous, as it is in patients with hereditary haemochromatosis and transfusion-dependent iron overload [1]. However, other chronic liver diseases might be associated with heterogeneous iron deposition. Thus, our results might be most helpful for iron quantification when the distribution is homogeneous. The total iron in the liver might also be helpful to assess the iron burden when the distribution is heterogeneous.

A previous phantom study used a polypropylene test tube filled with veal liver and iron [Fe(NO3)3] at titrated iron concentrations for DECT acquisition, in which VIC showed a significant positive linear correlation (coefficient = 0.984) with LIC [16]. Our in vivo results were similar to those of the phantom study. The VIC quantification algorithm is based on three-material decomposition theory to quantify iron and fat from soft tissues in the liver. Two slopes are necessary in this theory. The first slope is defined by the density of two basic components, e.g., soft tissue and fat. The second slope is defined by the photoelectric effect of the contrast material being quantified, e.g., iodine, iron or xenon gas [15]. Similar to iodine, iron shows energy-dependent CT attenuation, but with a lower slope than iodine (1.9 vs. 3.2). This slope was used in the iron-specific algorithm [16]. After post-processing of the dual-energy data, the iron content was subtracted and subsequently pseudocolour-coded to illustrate the distribution and enhancement of iron. As shown in Fig. 1, the colour of the VIC image turned from blue to red with increasing LIC.
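The two-slope idea can be illustrated geometrically in the plane of low-kVp versus high-kVp attenuation. The sketch below is not the vendor algorithm, whose internals are not described here: it assumes that a measured voxel is the sum of a point on the fat/soft-tissue line and an iron shift whose low-to-high attenuation ratio equals the slope of 1.9. The fat and soft-tissue reference values and the function name `decompose` are hypothetical placeholders.

```python
IRON_SLOPE = 1.9  # low-kVp / high-kVp attenuation ratio for iron, from the text

# Hypothetical reference attenuations in HU as (high_kvp, low_kvp) pairs;
# real values depend on the scanner, spectra and calibration.
FAT = (-90.0, -110.0)
SOFT_TISSUE = (55.0, 60.0)


def decompose(hu_low, hu_high, slope=IRON_SLOPE, fat=FAT, tissue=SOFT_TISSUE):
    """Split a (low, high) HU pair into a baseline point plus an iron shift.

    Solves  measured = fat + t * (tissue - fat) + e * (1, slope)
    for t (position along the fat/soft-tissue line) and e (iron-related
    attenuation at high kVp). Returns (t, iron_hu_high, iron_hu_low).
    """
    d_high = tissue[0] - fat[0]
    d_low = tissue[1] - fat[1]
    denom = d_low - slope * d_high
    if denom == 0:
        raise ValueError("fat/tissue line is parallel to the material slope")
    t = ((hu_low - fat[1]) - slope * (hu_high - fat[0])) / denom
    iron_high = hu_high - fat[0] - t * d_high
    return t, iron_high, slope * iron_high


# A voxel of pure reference soft tissue plus 10 HU of iron signal at high kVp
t, fe_high, fe_low = decompose(hu_low=60.0 + 19.0, hu_high=55.0 + 10.0)
print(round(t, 3), round(fe_high, 3), round(fe_low, 3))
```

In this picture the baseline point plays the role of the virtual non-iron attenuation and the shift along the iron slope plays the role of the iron-related enhancement; how the commercial software combines the two energies into the displayed VIC value is not specified in the text.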

Fat is a confounding factor in iron quantification. The presence of fat reduces the attenuation of the hepatic parenchyma [21]. Therefore, LIC would be underestimated in the presence of hepatic fat on conventional single-source CT [16]. The traditional method of analysing the attenuation difference (ΔH) for iron quantification can also be affected by the presence of fat. A previous study showed that ΔH had a weak but significant negative correlation with fat [22]. Thus, if a substantial amount of fat accumulated in the liver along with heavy iron deposition, ΔH would inevitably underestimate iron accumulation. However, a phantom study indicated that VIC imaging based on an iron-specific algorithm could eliminate the confounding effect of fat [16]. The same iron-specific algorithm was used in this study to quantify iron and eliminate the confounding effect of fat. Conversely, a previous study showed that DECT allowed for accurate quantification of liver fat concentration based on the same iron-specific algorithm, even in the presence of iron and iodinated contrast media [23]. In our animal study, pathological analyses confirmed that no obvious fat deposition was present. Thus, the accurate iron quantification in the current animal model holds under the precondition that no obvious hepatic steatosis was present. However, before clinical application of VIC, the confounding effect of fat should be carefully studied.

An LIC above the threshold of 7.0 mg Fe/g is an important criterion for starting chelation therapy and for monitoring treatment efficacy [24]. In our study, VIC showed 95 % sensitivity and 97 % specificity at the LIC threshold of 7.0 mg Fe/g. Both sensitivity and specificity increased to 100 % at the threshold of 15.0 mg Fe/g. A previous MRI study showed 89 % sensitivity and 96 % specificity at 7.0 mg Fe/g; at 15.0 mg Fe/g, however, the sensitivity and specificity dropped to 85 % and 92 %, respectively [4]. Although iron quantification by MRI is widely used and considered an indicator for iron chelation therapy in some clinical guidelines [25], MRI quantification of liver iron concentration is not sufficiently accurate in clinical practice, especially for patients with heavy iron deposition in the liver, because of strong MR susceptibility artefacts [26].

Our results indicated that VIC has the potential to be more accurate than MRI, especially for patients with severe iron overload in the liver. When a patient has substantial iron accumulation or contraindications for MR imaging, DECT might be a better choice for noninvasive LIC quantification. However, further investigations comparing the two modalities in the same patient sample are necessary to confirm this.

Radiation exposure should be carefully considered in the clinical application of any CT technique. A previous patient study on DECT for iron quantification showed that the mean effective radiation dose was 3.60 mSv per examination [22]. That dose level was similar to the annual effective background dose of approximately 3 mSv and much lower than the average dose of 8 mSv for a traditional single-phase abdominal CT examination [27]. VIC was derived from post-processing analysis of dual-energy data without an additional radiation dose. For patients with homogeneous iron distribution or follow-up patients, a reduction in scan coverage will further decrease the radiation exposure.

This study had several limitations. Firstly, the inherent limitation of an animal model should be considered; the observed experimental results might not be reproducible in patients. Nevertheless, a well-established animal model of titrated iron accumulation in the liver was capable of simulating patients with different levels of iron overload, including severe iron overload, which was important for initiating the investigation of this topic. Secondly, the sample size of the validation group was relatively small, which might explain the mean difference of 0.95 mg Fe/g between LIC(VIC) and LIC(ICP). However, this difference was not statistically significant. Thirdly, in the scatter plot of VIC against LIC, the intercept was non-zero, which indicated that the parameters of the dual-energy material decomposition algorithm were not optimized for rabbits. The original algorithm was designed for humans, with iodine as the contrast agent. Further studies are necessary to address the abovementioned limitations and to evaluate the diagnostic performance of VIC imaging for LIC quantification in patients.

In conclusion, virtual iron concentration (VIC) imaging derived from an iron-specific algorithm of dual-energy analysis was feasible and accurate for iron quantification in an iron overload rabbit model. VIC was highly accurate for liver iron content grading, especially for heavy iron accumulation in the liver. The present animal model study may serve as a basis for the potential clinical use of VIC imaging to guide iron chelation therapy.