Introduction

The global prevalence of hepatitis C virus (HCV) infection is estimated to be 143 million people (2%) as of 2015 [1].

Knowledge of liver fibrosis stage in chronic HCV infections is beneficial for prognosis, follow-up, and treatment decisions [2].

Liver biopsy is still considered the gold standard for staging hepatic fibrosis. Results are expressed in a semi-quantitative classification system validated for HCV fibrosis (i.e., the METAVIR score) [3, 4]. However, it is an invasive procedure, sometimes leading to life-threatening complications. It allows the assessment of only 1/50,000 of the whole liver volume and is prone to sampling errors and intra-/inter-observer variability [5]. To overcome these limitations, several noninvasive methods for liver fibrosis quantification have been proposed and introduced in clinical practice.

Direct and indirect serum biomarkers alone can provide information about liver fibrosis and can be useful in low-resource environments, but they have variable accuracy. The current guidelines recommend that these laboratory tests be used in combination with an elastography technique to detect patients who have clinically significant fibrosis [6].

Quantitative elastography methods include ultrasound-based modalities and magnetic resonance elastography (MRE) [7, 8].

One-dimensional transient elastography (1D-TE) is currently the most validated technique for the noninvasive assessment of liver fibrosis in HCV patients [9]. 1D-TE has a short procedure time (< 5 min), can be performed after minimal training, and has good reproducibility and high performance for advanced liver fibrosis. However, it has lower applicability than other noninvasive techniques (e.g., in patients with ascites or obesity) [6]. Two-dimensional shear wave elastography (2D-SWE) allows the analysis of a larger area of liver parenchyma and the measurement of average stiffness within a region-of-interest (ROI) chosen by the operator [7].

In MRE, mechanical waves are produced in tissues and then imaged with a dedicated MRI sequence. Shear wave information is used to generate elastograms (i.e., color-coded maps that quantitatively depict tissue stiffness) [10]. MRE visualizes a large amount of liver volume and has an excellent accuracy in detecting and staging liver fibrosis [11]. The main limitations are its cost and low availability [6].

Given the wide variety of laboratory tests and stiffness imaging modalities that are available to monitor the progression of liver fibrosis in HCV patients, there is a need for mutual validation among them for a better implementation in routine clinical practice [12].

In the recent literature, there are various studies comparing the diagnostic performance of pairs of stiffness imaging techniques (i.e., 1D-TE vs. 2D-SWE [13]; 1D-TE vs. MRE [14]; MRE vs. 2D-SWE [12]), but obtaining more data on the inter-modality concordance among the different elastographic methods is still necessary. The present study is the first that prospectively assesses the inter-modality concordance/agreement among three stiffness imaging modalities (MRE, 1D-TE, and 2D-SWE) in the same cohort of HCV patients. A secondary objective was to understand which patient-related factors may cause disagreement among the elastographic modalities.

Materials and methods

This was a prospective study that was approved by the institutional review board (449REG2016), and informed consent was obtained from all patients.

Study design and inclusion of patients

This was a pilot study, and a formal sample size calculation was not performed. Ninety-one consecutive patients with current or previous chronic HCV infection were enrolled at the Infectious Disease Unit of our institution between March 2017 and September 2018. The time span for enrollment was determined by the availability of 2D-SWE in our radiology department (on loan for research purposes). Demographics (sex, age, and BMI) and various laboratory values [i.e., alanine aminotransferase (ALT), aspartate aminotransferase (AST), gamma-glutamyl transferase (GGT), total bilirubin, platelet count, HCV–RNA, and HBsAg] were obtained for each patient. Clinical evaluation and blood tests had to be performed within 1 week of inclusion in the study. Patients with chronic liver disease from causes other than HCV and patients with general contraindications to MRI were excluded. MRE, 1D-TE, and 2D-SWE were all performed on the same day, in random order.

Technical and biological confounders

To avoid potential confounders in stiffness measurements, all included patients had transaminase levels < 5 × the upper limit of normal (ULN) and no clinical/radiological signs of severe right heart failure, extrahepatic cholestasis, or infiltrative liver disease. At the time of the examinations, patients had been fasting for at least 6 h [15].

Fibrosis assessment with serum biomarkers

The Fibrosis-4 (FIB-4) score is a noninvasive index based on serum biomarkers to predict significant fibrosis and was calculated using the following formula: $\text{FIB-4} = [\text{age (years)} \times \text{AST (U/L)}] / [\text{platelets } (10^{9}/\text{L}) \times \sqrt{\text{ALT (U/L)}}]$ [16]. We performed a separate analysis to assess the concordance between MRE, 1D-TE, and 2D-SWE measurements and FIB-4.
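As a minimal sketch of the FIB-4 calculation described above, the snippet below takes the platelet count in units of 10^9/L and uses the square root of ALT, as in the standard definition of the score. The function name, the argument names, and the example patient are hypothetical and are not taken from the study data.

```python
import math


def fib4(age_years: float, ast_u_per_l: float, alt_u_per_l: float,
         platelets_1e9_per_l: float) -> float:
    """FIB-4 = (age x AST) / (platelets x sqrt(ALT))."""
    if alt_u_per_l <= 0 or platelets_1e9_per_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_per_l) / (
        platelets_1e9_per_l * math.sqrt(alt_u_per_l)
    )


# Hypothetical patient (illustrative values only, not from the cohort).
score = fib4(age_years=56, ast_u_per_l=40, alt_u_per_l=36,
             platelets_1e9_per_l=200)
print(round(score, 2))  # 1.87
```

Passing the platelet count as an absolute number per liter rather than in 10^9/L would shrink the score by nine orders of magnitude, so unit handling is the main practical pitfall.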

Stiffness imaging techniques

MRE technique

To perform MRE, we used a Signa HDxt™ 1.5 Tesla scanner (GE Healthcare) and placed a 19 cm diameter, 1.5 cm thick cylindrical passive driver (MR-Touch; GE Healthcare) against the patient’s right anterior chest wall with the center of the driver at the level of the xiphoid process. Tissue shear stiffness maps (elastograms) were automatically generated in kilopascals (kPa) by using the complex shear modulus [10]. One of two abdominal radiologists (F.P., L.B.), each with at least 10 years of clinical practice and 2 years of MRE experience, drew the largest ROI on each of four axial images, and the average stiffness was reported. MRE was considered to have failed if the wave pattern was disorganized or no pixel value was available on the confidence map [17]. In the same MRI session, T2* decay values were calculated by using a multigradient echo sequence with 16 echoes [18]. Because no patients had significantly low T2* decay values (the minimum T2* value was 17.50 ms), liver fat fraction was calculated by using the two-point dual-Dixon method [19]. An example of liver stiffness measurement in MRE is shown in Fig. 1a, b.

Fig. 1

Liver stiffness measurements obtained by magnetic resonance elastography (MRE) and two-dimensional shear wave elastography (2D-SWE) in the same HCV patient. a Wave image showing the progression of shear waves through the liver parenchyma. No artifacts (i.e., regions of wave interference) are appreciable in the image. b Drawing of the free-hand ROI on the confidence map yielded a liver stiffness value of 5.21 kPa, which is indicative of advanced fibrosis (group 3 fibrosis). c Liver stiffness measurement obtained by 2D-SWE provided a value of 9.44 kPa, which is indicative of advanced fibrosis (group 3 fibrosis)

1D-TE technique

1D-TE was performed with FibroScan™ (Echosens). The operator (G.F.) located a portion of the liver at least 6 cm thick and free of large vascular structures using time-motion ultrasound (based on multiple A-mode lines in time at different proximal locations assembled to form a low-quality image) [20]. The probe was placed at the 9th to 10th intercostal spaces at the mid-axillary line level in supine position. The machine displayed the median of the measured Young’s modulus in kPa, the interquartile range (IQR), and the IQR/median (IQR/M). The assessment was considered reliable when 10 valid readings and an IQR ≤ 30% of the median (IQR/M ≤ 30%) were obtained. An XL probe was used for patients with a skin-to-liver capsule distance > 25 mm [7].

2D-SWE technique

2D-SWE was performed on the Logiq™ E9 XD Clear 2.0 (GE Healthcare) by one of the two abdominal radiologists (L.C., S.P.) with at least 5 years of clinical experience and more than 2 years of clinical experience of US elastography. The convex abdominal 1–6 MHz probe was placed in the right intercostal space that provided the best view of the right liver lobe in supine position. Measurements were performed by placing a 1 cm circular ROI over the different saved 2D-SWE images. Median stiffness was expressed in terms of Young’s modulus E. IQR/M value below 30% was considered a quality criterion. Failure was defined if there was an IQR/M ≥ 30% [7, 21]. An example of liver stiffness measurement in 2D-SWE is shown in Fig. 1c.
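The IQR/M quality criterion used for the ultrasound-based techniques can be sketched as follows. This is an illustration only: the quartile interpolation method (`method="inclusive"`) is an assumption, and the scanners may compute the IQR differently. The function names and the example readings are hypothetical.

```python
import statistics


def iqr_over_median(readings_kpa):
    """Return IQR / median for a series of stiffness readings (kPa)."""
    # Quartile interpolation is an assumption; vendors may differ.
    q1, _, q3 = statistics.quantiles(readings_kpa, n=4, method="inclusive")
    return (q3 - q1) / statistics.median(readings_kpa)


def is_reliable(readings_kpa, min_valid=10, max_iqr_m=0.30):
    """1D-TE-style rule: at least 10 valid readings and IQR/M <= 30%."""
    if len(readings_kpa) < min_valid:
        return False
    return iqr_over_median(readings_kpa) <= max_iqr_m


# Hypothetical series of ten valid readings.
tight = [5.1, 5.4, 5.0, 5.6, 5.3, 5.2, 5.5, 5.1, 5.3, 5.4]
scattered = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

print(is_reliable(tight))      # True
print(is_reliable(scattered))  # False
```

For 2D-SWE the text applies a strict threshold (IQR/M below 30%), so a caller reproducing that rule would compare with `<` rather than `<=`.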

Reading strategy

Each of the elastographic techniques was performed by a different operator who obtained stiffness measurements independently and was blinded to all clinical, biological, and other stiffness measurement data. After obtaining a stiffness measurement in kPa, the registered value was subsequently assigned to a fibrosis group according to the cut-off values described in the following section.

Stratification of patients according to fibrosis groups

Patients were stratified into fibrosis groups according to the consensus statement of the Society of Radiologists in Ultrasound (Table 1). The cut-off values separate patients who are at low risk for clinically significant fibrosis and do not require additional follow-up from patients who are at high risk for advanced fibrosis or cirrhosis. Between these two cut-off values, there is substantial overlap of fibrosis stages, and the consensus statement suggests liver biopsy or MRE for clarification [22].
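The two-cut-off stratification can be sketched as a small function. The cut-off values themselves must come from Table 1; the lower cut-off used in the example is a placeholder chosen only for illustration, and the handling of values that fall exactly on a cut-off is an assumption. The 5.0 kPa upper value and the 5.21 kPa measurement are the MRE figures quoted elsewhere in this article.

```python
def fibrosis_group(stiffness_kpa: float, low_cutoff: float,
                   high_cutoff: float) -> int:
    """Assign a fibrosis group from a stiffness value and two cut-offs.

    1 = low risk of clinically significant fibrosis
    2 = intermediate zone (overlap of fibrosis stages)
    3 = high risk of advanced fibrosis or cirrhosis
    """
    if low_cutoff >= high_cutoff:
        raise ValueError("low_cutoff must be smaller than high_cutoff")
    if stiffness_kpa < low_cutoff:
        return 1
    if stiffness_kpa <= high_cutoff:  # boundary handling is an assumption
        return 2
    return 3


# PLACEHOLDER lower cut-off (hypothetical); use the Table 1 value in practice.
MRE_LOW_PLACEHOLDER = 3.0
MRE_HIGH = 5.0  # advanced fibrosis or cirrhosis is assigned above 5.0 kPa

print(fibrosis_group(5.21, MRE_LOW_PLACEHOLDER, MRE_HIGH))  # 3
```

Keeping the cut-offs as arguments lets the same function serve all three modalities, each with its own pair of values.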

Table 1 Cut-off values of kPa for stratifying HCV patients according to their fibrosis group

Statistical analyses

Statistical analyses were performed using MedCalc for Windows, version 15.0 (MedCalc Software) and RStudio for Windows, version 1.1.463 (RStudio, Inc.).

Descriptive statistics were produced for patient data. Categorical data were expressed as number and percentage, whereas continuous data were expressed as mean and standard deviation (SD) or median and range (from minimum to maximum). The normal distribution of different data sets was assessed employing the D’Agostino-Pearson test [23]. Nominal statistical significance was defined with a P value of 0.05.

The correlation of kPa values between MRE and 1D-TE, between 2D-SWE and 1D-TE, and between MRE and 2D-SWE was tested by means of Spearman’s rank test. The correlation of kPa values was also assessed by means of linear regression. The (r) values were interpreted as follows: 0.9–1 (very strong), 0.7–0.89 (strong), 0.5–0.69 (moderate), 0.3–0.49 (moderate to low), 0.16–0.29 (weak to low), and < 0.16 (too low to be meaningful) [24]. Differences between each pair of techniques were plotted against the averages of the two techniques by using the method suggested by Bland and Altman. Inter-modality agreement in the stratification of patients according to the different fibrosis groups was calculated for each pair of techniques by using weighted kappa, according to Cohen. Kappa values were interpreted as follows: < 0.20 (poor), 0.21–0.40 (fair), 0.41–0.60 (moderate), 0.61–0.80 (good), and 0.81–1.00 (very good) [25]. Inter-modality agreement was further evaluated using Gwet’s AC1 [26]. Multivariate logistic regression analysis was performed to assess which patient-related factors were significantly associated with disagreement among the three techniques [27].
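To make the weighted kappa statistic concrete, the sketch below computes Cohen's weighted kappa for two techniques that each assign patients to ordered fibrosis groups (1, 2, or 3). Linear weights are an assumption, since the weighting scheme is not stated in the text, and the example ratings are hypothetical rather than study data.

```python
def linear_weighted_kappa(ratings_a, ratings_b, n_categories):
    """Cohen's weighted kappa with linear weights (assumed scheme).

    Ratings are integers 1..n_categories, one per patient.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n, k = len(ratings_a), n_categories
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        counts[a - 1][b - 1] += 1
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(counts[i][j] for i in range(k)) for j in range(k)]

    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = 1 - abs(i - j) / (k - 1)  # 1 on the diagonal
            observed += weight * counts[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical fibrosis groups from two techniques for six patients.
technique_a = [1, 2, 3, 1, 2, 3]
technique_b = [1, 2, 3, 2, 3, 1]
kappa = linear_weighted_kappa(technique_a, technique_b, n_categories=3)
print(round(kappa, 3))  # 0.25
```

With linear weights, a disagreement by one group counts as half agreement, whereas a disagreement between groups 1 and 3 counts as none.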

Results

Seventy-seven patients met the inclusion criteria and had reliable measurements with all three techniques; they included 67/77 (87.01%) males and 10/77 (12.99%) females, with a mean age of 55.87 ± 8.79 years and a mean BMI of 25.31 ± 4.04 kg/m².

A flow diagram of patients’ inclusion is shown in Fig. 2.

Fig. 2

Flow diagram showing the inclusion of patients

Patient data are summarized in Table 2. The distribution of HCV patients in each fibrosis group for each modality is shown in Table 3. Liver stiffness measurements obtained by the different modalities in each fibrosis group are reported in Fig. 3.

Table 2 Demographic, clinical, and laboratory features of included patients
Table 3 Stratification of patients in the three groups of fibrosis according to the stiffness measurements obtained by the various elastographic techniques
Fig. 3

Kilopascal values in each fibrosis group obtained by the different stiffness imaging techniques. The 77 patients with reliable measurements on all three modalities were included. The bottom and the top of the boxes are the first and third quartiles, respectively. The length of the box represents the interquartile range, which includes 50% of the values. The line through the middle of each box represents the median. The error bars show the minimum and maximum values (range). An outside value (separate point) is defined as a value that is smaller than the lower quartile minus 1.5 times the interquartile range or larger than the upper quartile plus 1.5 times the interquartile range. a MRE. b 1D-TE. c 2D-SWE

Technical failure rate

The overall technical failure rate was 14/91 (15.38%). There was only one case of MRE technical failure (1/91, 1.10%). Ultrasound-based techniques failed in 13 of 91 patients (14.29%): 1D-TE failed in 6/91 (6.59%) patients, and 2D-SWE failed in 7/91 (7.69%) patients.

Correlation of stiffness measurements in kPa between techniques

Spearman’s correlation of stiffness measurements, expressed in kPa, was at least strong for all pairs of techniques. The highest correlation was seen between MRE and 2D-SWE [r = 0.898; CI 95% (0.843–0.934)], and the lowest between the two ultrasound-based techniques, 2D-SWE and 1D-TE [r = 0.795; CI 95% (0.695–0.865)]. The correlation between MRE and 1D-TE was r = 0.867 [CI 95% (0.798–0.914)]. The P value was below 0.001 for all pairs of techniques. Results of the correlation analysis are reported in Fig. 4.

Fig. 4

Correlation analysis between stiffness measurements, expressed in kilopascals, obtained by the various elastographic techniques. a MRE versus 1D-TE, r = 0.867. b 1D-TE versus 2D-SWE, r = 0.795. c MRE versus 2D-SWE, r = 0.898

Linear regression analysis showed a strong correlation between MRE and 1D-TE (r = 0.794; R² = 0.630; P < 0.0001) and between MRE and 2D-SWE (r = 0.841; R² = 0.707; P < 0.0001). Correlation between kPa values of ultrasound-based methods (2D-SWE and 1D-TE) was moderate (r = 0.608; R² = 0.370; P ≤ 0.0001).

Bland–Altman plots

In the Bland–Altman analysis, the highest mean difference between kPa values [−8.49; CI 95% (−10.55 to −6.44); SD = 9.06; lower limit = −26.25; upper limit = 9.27] was found between MRE and 1D-TE, whereas the lowest [−4.11; CI 95% (−4.50 to −3.72); SD = 1.72; lower limit = −7.48; upper limit = −0.74] was obtained between MRE and 2D-SWE. Figure 5 illustrates Bland–Altman plots for each pair of techniques.

Fig. 5

Bland-Altman plots showing the differences between pairs of techniques plotted against the averages of the two techniques. Horizontal lines are drawn at the mean difference, and at the limits of agreement, which are defined as the mean difference plus and minus 1.96 times the standard deviation of the differences. a 1D-TE versus MRE. b 1D-TE versus 2D-SWE. c MRE versus 2D-SWE
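As a worked check of the limits of agreement defined in the caption, the sketch below recomputes them from the mean differences and standard deviations reported in the Bland–Altman analysis. The function name is illustrative; the input values are those reported in the Results.

```python
def limits_of_agreement(mean_diff: float, sd_diff: float, z: float = 1.96):
    """Return (lower, upper) limits: mean difference -/+ z * SD."""
    return mean_diff - z * sd_diff, mean_diff + z * sd_diff


# MRE versus 1D-TE: mean difference -8.49 kPa, SD 9.06 kPa.
lower_te, upper_te = limits_of_agreement(-8.49, 9.06)
print(round(lower_te, 2), round(upper_te, 2))    # -26.25 9.27

# MRE versus 2D-SWE: mean difference -4.11 kPa, SD 1.72 kPa.
lower_swe, upper_swe = limits_of_agreement(-4.11, 1.72)
print(round(lower_swe, 2), round(upper_swe, 2))  # -7.48 -0.74
```

Both pairs of limits match the reported values after rounding to two decimals.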

Inter-modality agreement in the stratification of patients according to fibrosis group

There was an agreement among all techniques in 50/77 patients (64.94%). In 14/77 (18.18%), there was an agreement between MRE and 1D-TE, whereas 2D-SWE was discordant. In 5/77 patients (6.49%), there was concordance between 2D-SWE and 1D-TE, whereas MRE was discordant. In 6/77 patients (7.79%), MRE and 2D-SWE assigned patients to the same fibrosis group, whereas 1D-TE assigned them to different fibrosis groups. In only 2/77 patients (2.60%) was there a complete disagreement among all three techniques. Rates of agreement are summarized in Table 4.

Table 4 Rates of agreement between the elastographic techniques

The agreement was highest between MRE and 1D-TE, with a Cohen’s κ value of 0.801 (CI 95% [0.7–0.903]), and lowest between 2D-SWE and 1D-TE, with a Cohen’s κ of 0.662 (CI 95% [0.535–0.788]). The intermediate κ value was found between MRE and 2D-SWE (κ = 0.704; CI 95% [0.594–0.815]).

Gwet’s AC1 analysis gave results comparable to those of Spearman’s rank correlation. In particular, Gwet’s AC1 was 0.748, CI 95% [0.621–0.875] in MRE vs. 1D-TE; 0.577, CI 95% [0.422–0.732] in 2D-SWE vs. 1D-TE; and 0.593, CI 95% [0.442–0.745] in MRE vs. 2D-SWE.
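For readers less familiar with Gwet's AC1, the sketch below shows the unweighted two-rater form of the coefficient: observed agreement corrected by a chance term based on the average marginal proportion of each category. The function name and the example ratings are hypothetical, and this is not necessarily the exact routine used by the statistical software in this study.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories):
    """Gwet's AC1 for two raters and a fixed list of categories."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n, q = len(ratings_a), len(categories)
    # Observed agreement: share of patients placed in the same group.
    p_agree = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from the mean marginal proportion per category.
    p_chance = 0.0
    for c in categories:
        pi_c = (count_a[c] + count_b[c]) / (2 * n)
        p_chance += pi_c * (1 - pi_c)
    p_chance /= q - 1
    return (p_agree - p_chance) / (1 - p_chance)


# Hypothetical fibrosis groups from two techniques for six patients.
technique_a = [1, 2, 3, 1, 2, 3]
technique_b = [1, 2, 3, 2, 3, 1]
ac1 = gwet_ac1(technique_a, technique_b, categories=[1, 2, 3])
print(round(ac1, 3))  # 0.25
```

Because the chance term depends on the average marginals rather than on their product, AC1 is less sensitive than kappa to unbalanced group sizes.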

A separate analysis of agreement was conducted between each of the stiffness techniques and FIB-4. 1D-TE and FIB-4 assigned patients to the same fibrosis group in 38/77 cases (49.4%) and to different fibrosis groups in 39/77 patients (50.6%); inter-modality agreement was fair (κ = 0.318, CI 95% [0.153–0.484]). MRE and FIB-4 agreed in 39/77 patients (50.6%), and inter-modality agreement was fair (κ = 0.322, CI 95% [0.157–0.486]). FIB-4 and 2D-SWE assigned the same fibrosis group in 42/77 patients (54.5%) (moderate agreement: κ = 0.445, CI 95% [0.296–0.594]).

Gwet’s AC1 was 0.231, CI 95% [0.062–0.399] in 1D-TE vs. FIB-4; 0.269, CI 95% [0.102–0.436] in MRE vs. FIB-4; and 0.331, CI 95% [0.156–0.504] in 2D-SWE vs. FIB-4.

Factors influencing disagreement between techniques

A multivariate logistic regression analysis was performed, introducing the disagreement between two or more techniques as the dichotomous dependent variable and various patient-related factors as independent variables, including age, BMI, fibrosis group on 1D-TE, T2*, and fat fraction values (Table 5). T2* and fat fraction values were obtained with MRI-based methods (in the same session as MRE) and cannot be obtained with ultrasound-based techniques. Increasing BMI was found to be significantly associated with disagreement between techniques, with an odds ratio of 1.15 (CI 95% [1.01–1.31]; P = 0.0339).
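To illustrate what an odds ratio of 1.15 implies, the sketch below rescales it to a larger BMI difference. It assumes that the reported odds ratio refers to a 1 kg/m² increase in BMI and that the effect is log-linear, as in a standard logistic model; neither assumption is stated explicitly in the text, and the function name is hypothetical.

```python
import math


def scaled_odds_ratio(or_per_unit: float, delta: float) -> float:
    """Odds ratio for a change of `delta` units, given the per-unit OR."""
    beta = math.log(or_per_unit)  # logistic regression coefficient
    return math.exp(beta * delta)


# Assumed interpretation: OR = 1.15 per 1 kg/m2 of BMI.
or_5_units = scaled_odds_ratio(1.15, 5)
print(round(or_5_units, 2))  # 2.01
```

Under these assumptions, a BMI higher by 5 kg/m² would roughly double the odds of disagreement between techniques.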

Table 5 Logistic regression analysis

Discussion

To our knowledge, this is the first study assessing the inter-modality agreement among MRE, 1D-TE, and 2D-SWE in a prospective cohort of HCV patients.

Several previous studies evaluated the diagnostic performance of the techniques examined in this study. The diagnostic performance of MRE is the highest, with AUC values ranging from 0.78 to 0.99 [14, 28]. The AUC of 1D-TE varied from 0.73 to 0.91 [29, 30] and that of 2D-SWE from 0.77 to 0.97 [31, 32].

MRE and 2D-SWE are relatively recent techniques and need further validation; in addition, some challenges may arise when comparing measurements obtained with different modalities as well as when converting these into the corresponding fibrosis stage. In our study, MRE, 1D-TE, and 2D-SWE assigned the majority of the patients (about 65%) to the same fibrosis group. This figure may not seem optimal, but it can be explained by several factors. First, the various stiffness imaging techniques measure different physical quantities: Young’s modulus E in the case of both 1D-TE and 2D-SWE, and the complex shear modulus in the case of MRE. Second, stiffness measurements may vary by up to 12% between ultrasound-based scanners from different manufacturers [22]. Third, there are potential variations in parenchymal stiffness across the Couinaud segments of the liver, which may reflect the heterogeneous nature of fibrosis; this could explain some cases of disagreement, since the regions being evaluated are not strictly the same when using the different techniques [33]. Nevertheless, agreement in assigning the same fibrosis group was good on weighted kappa and moderate to good on Gwet’s AC1. The lowest inter-modality agreement was found between 1D-TE and 2D-SWE, despite the strong correlation between kPa values at quantitative analysis. One possible explanation is that the 2D-SWE module employed in our work was only recently developed, and optimal cut-off values for converting stiffness measurements into the corresponding fibrosis stage still need to be established. In this regard, Bende et al. obtained cut-off values different from those suggested by the manufacturer [34].
In the present study, the value used to determine advanced fibrosis or cirrhosis (METAVIR F4 and some F3) by means of MRE is > 5.0 kPa; according to the results of recent studies, some patients with stage 3 disease may fall in the range between 4.0 and 5.0 kPa [8, 35]. Therefore, these patients may represent false negative cases in group 2 detected on MRE. In order to avoid missing clinically significant fibrosis, patients assigned to group 2 fibrosis deserve particular attention and require follow-up examinations.

Even though serum biomarkers are commonly used in clinical practice, none of these markers have evolved as the standard of practice for primary assessment of liver fibrosis. In terms of accuracy, they are not able to replace liver biopsy or stiffness imaging techniques as the standard of reference for primary assessment of liver fibrosis [8]. The agreement between elastographic techniques and FIB-4 was lower than the inter-modality agreement among the various elastographic techniques. The observed discrepancies with FIB-4 are reasonably due to the well-known limitations of this laboratory score. Therefore, combining two noninvasive elastographic modalities may be more helpful for an accurate estimation of liver fibrosis than the integration of a clinical/laboratory score and only one stiffness imaging technique. However, this could not be verified in this concordance study due to the absence of liver biopsy as standard of reference.

We found a good agreement between MRE and 2D-SWE in the stratification of patients according to their fibrosis group, and the strongest correlation between kPa values at quantitative analysis. Interestingly, we found that the correlation was weaker for higher kPa values, as seen in Fig. 4c. This is in line with a previous study published by Yoon et al., which found a correlation rho value ranging from 0.3 to 0.9 between 2D-SWE and MRE, with lower correlation for higher kPa values. In fact, shear wave generation, using focused US push-pulses, could be more unevenly attenuated in cirrhotic livers, resulting in more variable LS measurements [12].

In the Bland–Altman analysis, it was interesting to notice that the highest mean difference between kPa values was found when comparing MRE and 1D-TE. This result comes from the intrinsic difference between velocity measurements and kPa scales used in these two elastographic modalities. However, after converting kPa values into stages of fibrosis by means of validated cut-offs, we found that the highest inter-modality agreement was seen between these two modalities. This result may be seen as a point of strength for both techniques, because 1D-TE is still the stiffness imaging modality of reference, and MRE is the most promising among the currently available elastographic techniques.

In our study, we found an overall technical failure rate of 15.38%. MRE failed in 1.10% of cases (one patient), which is slightly lower than previously reported values (3.5–5.6%) [17, 36]. Ultrasound-based techniques failed in 13 of 91 patients (14.29%). 1D-TE failed in 6.59% of cases, which is lower than the rates previously reported in the literature (14.3–18.4%) [37, 38]. 2D-SWE failed in 7.69% of cases, which falls within the range of previously reported rates of failure/unreliable results (4.2–24.8%) [12, 34]. Given the higher rates of technical failure of both 2D-SWE and 1D-TE, MRE should be considered the first-line modality for noninvasive assessment of liver fibrosis in clinical settings where it is available. However, if MRE is unavailable, ultrasound-based elastography techniques may be used.

In multivariate logistic regression analysis, we found that BMI was significantly associated with discordance. With regard to 1D-TE, Wong et al. noted that even using the 1D-TE XL probe, unreliable measurements were found in about 35% of patients with BMI greater than 30 kg/m² [39]. Another study found that BMI and increasing abdominal wall thickness were associated with unreliable measurements with 2D-SWE [40]. MRE can be useful in cases of high BMI and great abdominal wall thickness, because increasing BMI was found to have little to no effect on MRE success [17, 36]. In our experience, correct driver positioning and wrapping the elastic belt as tightly as possible are two technical clues of utmost importance to obtain reliable MRE stiffness measurements.

The limitations of this study include the missing histopathological gold standard and the small number of patients. No patient had clinical indication for liver biopsy. The small number of patients may be a consequence of the restrictive inclusion criteria to avoid confounding factors. We emphasize that all three stiffness imaging techniques were performed on the same day.

MRE, 1D-TE, and 2D-SWE assigned the majority of patients to the same fibrosis group. The agreement was at least good, and there was a strong correlation between kPa values in all three pairs of techniques. Highest agreement was found between MRE and 1D-TE. The technical failure rate was very low, especially in the case of MRE. High BMI was the only factor associated with discordance among the techniques.