Introduction

The clinical management of chronic liver disease (CLD) depends on the correct assessment of liver fibrosis. Liver biopsy is currently recommended as the gold standard to determine the degree of fibrosis, but it has many drawbacks. Approximately 1–3% of patients require hospitalization for complications, and 25% report post-procedural pain [1]. In addition, its diagnostic accuracy is strongly influenced by the quality of the specimen [2] and by inter- or intra-observer variation [3]. Therefore, noninvasive methods for assessing the degree of liver fibrosis have been proposed, such as serum markers and transient elastography (TE) (Fibroscan, Echosens, Paris). Serum markers consist of several biochemical tests and scores, but which of these tests should be used in clinical practice is still unclear. Fibroscan measures liver stiffness, which is correlated with fibrosis stage, and two recent meta-analyses have shown high accuracy in the diagnosis of liver cirrhosis [4, 5]. This method, however, has several limitations: failure or unreliable results occur in 10–20% of patients because of overweight, narrow intercostal spaces [6], or high variability of measurements [7]. Additional problems are its high cost, low accuracy in diagnosing significant fibrosis [4, 5], and poor reproducibility for both low and high stiffness values [8–10]. For these reasons, new methods for assessing liver fibrosis have been proposed and are now under evaluation. Real-time tissue elastography (RTE) and acoustic radiation force impulse (ARFI) imaging technology are two different add-on modules that can be embedded into standard ultrasound imaging devices. They both measure liver stiffness and have the advantage, over TE, of combining direct visualization of the liver parenchyma and liver stiffness measurements.
This enables the operator to match the tissue elasticity measurement directly to the anatomy shown on the B-mode display, thus avoiding the subcapsular region and reducing the variability of measurements [8]. The results are not affected by the patient being overweight or having narrow intercostal spaces, and the failure rate is virtually nil. Initial studies have shown that the results of both techniques are reproducible and correlate with liver fibrosis [11–15]. Two studies [16, 17] have demonstrated a good correlation between RTE and TE, with similar AUROCs for the diagnosis of cirrhosis, while in another study [18], the diagnostic accuracy of RTE was found to be inferior to that of TE in the diagnosis of both significant fibrosis and cirrhosis. Regarding ARFI, no difference from TE was found in one study [19], while in another study, ARFI performed less well than TE in the diagnosis of significant fibrosis [20]. There are no studies that directly compare the three methods in the same population.

The aim of our study was to perform a head-to-head comparison of these techniques in diagnosing fibrosis, significant fibrosis, and cirrhosis in a population consisting of normal subjects and patients with CLD.

Patients and methods

Patients

A total of 91 subjects were selected to enter the study. All consecutive patients with CLD who underwent percutaneous liver biopsy during the last year were invited to participate. Fifty-four out of 78 (70%) gave their informed consent and attended the study, which was approved by the local ethical committee. Two patients with decompensated cirrhosis and 3 with hepatocellular carcinoma were excluded. The control group consisted of 37 normal subjects who were randomly selected from among voluntary blood donors in our area.

Liver biopsy was performed with a 16–17 G needle (Biomoll, HS Hospital Service, Aprilia, Italy) at a median time of 7 months (range 1–12) prior to the study. The specimens were taken only from the right liver lobe and were adequate (at least 2 cm in length and 11 portal tracts) in all cases. More specifically, the median length of all specimens was 3.8 cm (range 2–4.8), and the median number of portal tracts per specimen was 15 (range 11–17). All liver biopsies were formalin-fixed, embedded in paraffin, and routinely stained with hematoxylin and eosin, periodic acid–Schiff after diastase digestion, Masson’s trichrome, and Perls’ method for iron.

Liver fibrosis was staged according to the Metavir scoring system [21]: F0, no fibrosis; F1, portal fibrosis without septa; F2, portal fibrosis with few septa; F3, numerous septa without cirrhosis; F4, cirrhosis. Significant fibrosis was defined as stage F2 or greater, while healthy voluntary blood donors were considered to have F0 fibrosis. Necro-inflammatory activity was graded as follows: A0, none; A1, mild; A2, moderate; A3, severe.

On the same day in November 2010, the subjects underwent a liver stiffness measurement (Fibroscan), an upper abdominal ultrasound examination (MyLab70, Esaote, Genova, Italy) and an RTE (Preirus, Hitachi Medical Systems Europe Holding AG, Zug, Switzerland). An interval of at least one month between the liver biopsy and the performance of TE plus RTE was required in order for the patient to be included in the study.

Fifteen days after the first visit, all subjects were further evaluated with ARFI (Acuson S2000 Virtual Touch Tissue Quantification, Siemens, Erlangen, Germany). The procedures were performed after an 8 h period of fasting and in the resting condition. Only patients who underwent all three examinations were included in the study and analyzed.

Liver stiffness measurements

TE was performed with Fibroscan and with the regular probe by a specifically trained physician (PD). The tip of the probe transducer was placed in the intercostal space at the level of the right midaxillary line and at the center of the right liver lobe. Results were expressed in kilopascals (kPa) and as the median of 10 valid acquisitions, with values ranging from 2.5 to 75 kPa. Only procedures with at least 10 valid acquisitions, a success rate of at least 60%, and an interquartile range (IQR)/median stiffness ratio of <0.3 were considered. TE was classified as having failed if no or <10 measurements were obtained, while unreliable TE was defined as a success rate of <60% and/or an IQR/median stiffness ratio of >0.3 [6].
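The quality criteria above can be sketched as a small classifier. This is only an illustration under stated assumptions: the function name `classify_te`, its arguments, and the quartile method (the default of Python's `statistics.quantiles`) are hypothetical choices, not the Fibroscan software's own computation.

```python
from statistics import median, quantiles


def classify_te(valid_kpa, total_shots):
    """Label a TE examination as 'failed', 'unreliable', or 'reliable'.

    valid_kpa   -- list of valid stiffness acquisitions in kPa
    total_shots -- total number of acquisitions attempted (assumed input)
    """
    n_valid = len(valid_kpa)
    # Failed: no measurement or fewer than 10 valid measurements.
    if n_valid < 10:
        return "failed"
    success_rate = n_valid / total_shots
    # IQR taken from the quartiles of the valid acquisitions (assumed method).
    q1, _, q3 = quantiles(valid_kpa, n=4)
    iqr_ratio = (q3 - q1) / median(valid_kpa)
    # Unreliable: success rate below 60% and/or IQR/median above 0.3.
    if success_rate < 0.60 or iqr_ratio > 0.30:
        return "unreliable"
    return "reliable"
```

An examination with ten tightly clustered acquisitions out of twelve attempts would be labelled reliable, whereas ten valid acquisitions out of twenty attempts would be labelled unreliable because the success rate is 50%.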

Real-time tissue elastography

All examinations were performed by one of the authors (MB) with the assistance of an ESAOTE technician. Hitachi Preirus ultrasound equipment was used, with an embedded elastography module (Hitachi Medical Systems Europe Holding AG) and a 3.5–7 MHz linear probe. The ultrasound probe was placed in an intercostal space, with the patient lying supine. Measurements were taken from the right liver lobe. A rectangular area devoid of large vessels, measuring 3 cm in length and 2 cm in breadth, was chosen 10 mm below the liver surface. The device automatically captures the internal compression transmitted to the liver parenchyma by the heartbeat. Numerical strain values for the pixels are converted into a color image within the rectangle, ranging from 0 (blue) to 255 (red) as hardness increases, and a histogram is generated. Ten static images were analyzed using the software Elasto_ver 1.5.1, kindly provided by Hitachi. Briefly, the distribution of pixels was represented by a histogram from which eleven parameters were derived and analyzed by the software. Four main functions (Z1–Z4) were calculated and included in an integrative function from which the common elastic index of RTE was calculated, according to the formula

$$ I = 5.174\,Z_1 + 2.154\,Z_2 + 1.366\,Z_3 + 0.985\,Z_4. $$

Results were expressed as the mean elastic index of all measurements. Variability of sampling was expressed as the standard deviation when the sampling area was distributed normally or as the complexity of the histogram for an uneven distribution. Complexity was calculated by the following equation: periphery²/area of the histogram.
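As a worked illustration of the two formulas above, the following sketch evaluates the elastic index as the weighted sum of the four functions and the complexity as the squared periphery divided by the area. The function names are hypothetical, and the way Z1 to Z4 are derived from the histogram is not described here, so they are treated as given inputs.

```python
from statistics import mean


def elastic_index(z1, z2, z3, z4):
    """Weighted sum of the four main functions, using the published weights."""
    return 5.174 * z1 + 2.154 * z2 + 1.366 * z3 + 0.985 * z4


def mean_elastic_index(z_per_image):
    """Mean elastic index over several images; each item is a (z1, z2, z3, z4) tuple."""
    return mean(elastic_index(*z) for z in z_per_image)


def complexity(periphery, area):
    """Histogram complexity defined as periphery squared divided by area."""
    return periphery ** 2 / area
```

For orientation, a unit square (periphery 4, area 1) has a complexity of 16, while a circle has the smallest possible value, 4π; more irregular shapes give larger values.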

Acoustic radiation force impulse imaging

The ARFI system is a module of a standard ultrasound device (Acuson S2000 Virtual Touch Tissue Quantification) that generates a high-energy ultrasound pulse. The pulse produces mechanical excitation along the acoustic wave propagation path and shear waves are generated. The speed of the shear waves or the shear wave velocity (SWV), expressed in m/s, is detected in the region of interest (corresponding to a cylinder 0.5 cm wide and 1 cm long). The speed increases as the hardness of the liver parenchyma increases. Practically, the probe was placed in an intercostal space and the right liver lobe was visualized. On the conventional B-mode images the cylinder is visible as a green rectangle that can be freely moved within the liver parenchyma down to a depth of 5.5 cm below the skin surface. The cylinder was placed in a region devoid of large vessels, and the ultrasound pulse was generated by pressing a button while the patient was holding his breath. Ten measurements were acquired for each patient, and the mean along with the standard deviation were automatically calculated by the software. The examinations were performed on the same day by one of the authors (SC) and under the supervision of a Siemens technician. The three examiners (PD, MB, and SC) who performed TE, RTE, and ARFI were blinded to the results of the other techniques.
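The per-patient summary described above (mean and standard deviation of ten SWV readings) can be sketched as follows. This is an assumption-laden illustration: the function name is hypothetical, and the sample standard deviation is used here, whereas the device software may use a different estimator. The ratio of standard deviation to mean is included because it is the variability measure examined later in the results.

```python
from statistics import mean, stdev


def summarize_swv(swv_m_per_s):
    """Return mean, sample SD, and SD/mean ratio for a list of SWV readings (m/s)."""
    m = mean(swv_m_per_s)
    sd = stdev(swv_m_per_s)  # sample SD (n - 1 denominator); an assumption
    return {"mean": m, "sd": sd, "sd_over_mean": sd / m}
```

A subject whose ratio exceeds a chosen threshold (for example 0.33 or 0.40) could then be flagged as having highly variable measurements.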

Analysis of data

The correlations between fibrosis stage, necro-inflammatory activity, and the results of TE, RTE, and ARFI were calculated via the Spearman rank order correlation coefficient. Box plots were used to study the distribution of values according to the stage of fibrosis. The accuracy of TE, RTE, and ARFI when assessing the fibrosis stage was evaluated by calculating the sensitivity, specificity, positive and negative likelihood ratios, and the area under the receiver operating characteristic curve (AUROC). The statistical analysis was performed with the SigmaStat 3.5 package (Systat Software, Inc., Richmond, CA, USA) and Med-Calc Version 11.4.4 (MedCalc Software, Mariakerke, Belgium). The best cut-off values for diagnosing fibrosis, significant fibrosis and cirrhosis were calculated according to the Youden index, i.e., the cut-off giving the best combination of sensitivity and specificity. The study was powered to detect a 0.100 difference in AUROC for TE versus RTE and ARFI with a type I error (alpha) of 0.05. The minimal sample size required for fibrosis and significant fibrosis (assuming TE AUROC = 0.800 vs. RTE and ARFI AUROCs = 0.700) was 90 subjects, while the minimal sample size for cirrhosis (assuming TE AUROC = 0.900 vs. RTE and ARFI AUROCs = 0.800) was 67 subjects. We also calculated the adjusted AUROCs and the positive (PPV) and negative predictive values (NPV), taking into account the prevalence of each fibrosis stage, using the Bayesian methodology present in the MedCalc statistical package.
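The accuracy measures named above can be illustrated with a minimal sketch. It is not the MedCalc or SigmaStat implementation: the function names are hypothetical, a result at or above the cut-off is assumed to count as positive, the AUROC is computed by the pairwise (Mann-Whitney) definition, and the predictive values follow Bayes' theorem for a given prevalence.

```python
def auroc(scores, labels):
    """Probability that a diseased subject scores higher than a non-diseased one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive (assumed rule)."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(scores, labels):
    """Cut-off maximizing the Youden index J = sensitivity + specificity - 1."""
    return max(sorted(set(scores)),
               key=lambda c: sum(sens_spec(scores, labels, c)) - 1)


def likelihood_ratios(sens, spec):
    """Positive and negative likelihood ratios (undefined when spec is 1 or 0)."""
    return sens / (1 - spec), (1 - sens) / spec


def predictive_values(sens, spec, prevalence):
    """PPV and NPV from sensitivity, specificity, and prevalence via Bayes' theorem."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv
```

With a sensitivity and specificity of 0.9 each, the PPV is 0.9 at a prevalence of 50% but falls to 0.5 at a prevalence of 10%, which is why prevalence-adjusted predictive values are reported separately from the AUROC.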

Finally, we performed an intention-to-diagnose analysis, applying a wrong result to failed and unreliable TE attempts. Two types of intention-to-diagnose analyses were performed. In the first, in order to maximize sensitivity, all nondiagnostic TE attempts were considered positive results: we attributed to them the median value of successful examinations in patients with fibrosis, significant fibrosis, and cirrhosis (true positive results). In the second, in order to maximize specificity, all nondiagnostic TE attempts were considered negative results: we attributed to them the median value of successful examinations in patients below the studied fibrosis stage (true negative results).
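The two imputation schemes can be sketched as below. This is a simplified reading of the procedure, with hypothetical names: nondiagnostic attempts are represented as `None`, and `labels` marks whether each subject is at or above the fibrosis stage being studied.

```python
from statistics import median


def impute_nondiagnostic(values, labels, mode):
    """Fill failed or unreliable TE results (None) for an intention-to-diagnose analysis.

    mode='max_sensitivity': use the median of successful exams in subjects
                            at or above the studied stage (all treated as positive).
    mode='max_specificity': use the median of successful exams in subjects
                            below the studied stage (all treated as negative).
    """
    if mode not in ("max_sensitivity", "max_specificity"):
        raise ValueError("unknown mode: " + mode)
    use_positive = mode == "max_sensitivity"
    fill = median(v for v, y in zip(values, labels)
                  if v is not None and bool(y) == use_positive)
    return [fill if v is None else v for v in values]
```

The imputed list can then be passed to whatever ROC routine is used for the conventional analysis, so that the subjects with nondiagnostic examinations remain in the denominator.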

Results

Characteristics of the subjects and success rate of the examinations

The clinical, biochemical, and histological characteristics of the entire population are shown in Table 1. Seventy-two of the 91 subjects who presented at the clinic and were initially examined by TE and RTE were examined by ARFI 15 days later. Only the 72 subjects who underwent all three procedures were analyzed (45 patients with CLD and 27 normal subjects). The clinical characteristics of these patients are shown in Table 2. TE failed (i.e., there was no measurement or <10 measurements) in 5 out of 72 subjects (6.9%), and it was unreliable (i.e., there was a success rate of <60% or an IQR/median stiffness ratio of >0.3) in 4 other subjects (5.6%). All failed, incomplete, and unreliable TE examinations were excluded in the conventional analysis but included in the intention-to-diagnose analysis. None of the subjects who underwent RTE and ARFI had <10 valid measurements, so all subjects were included.

Table 1 Clinical characteristics of the entire population (91 patients)
Table 2 Clinical characteristics of the population who underwent all three examinations (72 patients)

Relationships between liver stiffness, elasticity index, shear wave velocity, and histological parameters

The variations of liver stiffness on TE, elastic index on RTE, and SWV on ARFI with fibrosis stage are shown in Fig. 1. TE showed the lowest overlap through all the stages and RTE the greatest. ARFI showed a high degree of overlap for F0 to F3, while stage F4 was well separated from the others (median SWV F4: 2.6 m/s, 25th percentile: 2.2 vs. median SWV F3: 1.56 m/s, 75th percentile: 1.8). The Spearman correlation coefficients between fibrosis stage and the values obtained by the three methods were highest for TE and ARFI: TE = 0.646 (p < 0.0001), ARFI = 0.535 (p < 0.0001), RTE = 0.363 (p < 0.002).
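For readers unfamiliar with the statistic, the Spearman coefficient is the Pearson correlation computed on ranks, with tied values sharing their average rank (ties are frequent here, since fibrosis stage takes only five values). The sketch below is a plain-Python illustration with hypothetical function names; it does not reproduce the p-values or the software used in the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Calling `spearman_rho(stages, stiffness)` on paired lists of Metavir stages and stiffness values would give the kind of coefficient reported above.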

Fig. 1 Variations of liver stiffness (TE), elasticity index (RTE), and shear wave velocity (ARFI) in the entire population of 91 subjects according to the stage of fibrosis. The bottom and the top of each box are the 25th and 75th percentiles. The horizontal line represents the median and the vertical line the range, excluding the outliers, which are represented by dots

Further analysis (see Fig. 2) was performed on the data generated by RTE in order to determine which parameter was better correlated with fibrosis stage. The elasticity index alone was found to correlate with fibrosis, while no correlation was found for the percentage of the blue area and the complexity of the histogram. Regarding ARFI, in spite of the internal control method of the device, variability could not be totally eliminated from the measurements. In fact, 10 (13.8%) and 22 (30.5%) subjects, respectively, had standard deviations that were higher than 40 and 33% of the mean SWV. However, upon eliminating these subjects from the analysis, the correlation coefficient between SWV and fibrosis did not vary, suggesting that the performance of this technique could not be further improved by discarding the subjects with greater variability. We therefore disregarded these adjunctive parameters, and only considered the elastic index and SWV in our analysis. In order to determine the influence of necro-inflammatory activity on the three techniques, we also calculated the correlation coefficients between necro-inflammatory activity and the relevant indices, adjusted for fibrosis stage (Table 3). The results were inconclusive, and no evidence of any influence of necro-inflammatory activity on the results of the three tests could be clearly demonstrated.

Table 3 Correlations of liver stiffness (TE), elasticity index (RTE), and shear wave velocity (ARFI) with histological necro-inflammatory activity in patients with chronic liver disease, according to different stages of fibrosis

Fig. 2a–c Image analysis of transient elastography (TE), real-time tissue elastography (RTE), and acoustic radiation force impulse (ARFI) imaging. a In TE, the transducer gives a mechanical impulse to the thoracic wall, generating an elastic shear wave, which is represented by the elastographic curve in the right upper corner. The steepness of the curve is directly related to liver stiffness and is expressed in kPa. b In RTE, the strain transmitted to the liver parenchyma by the heartbeat is captured by the device and converted into a color image. Liver elasticity is calculated by a complex function and expressed as an elastic index. c In ARFI, a high-energy ultrasound pulse generates elastic shear waves in the liver parenchyma, which are measured in the region of interest (ROI). The ROI, represented by the box, can be freely moved within the liver, with a depth limitation of 5.5 cm below the skin surface. The mean speed of the shear waves is measured in m/s and is directly related to the stiffness of the liver

Calculation of the areas under the receiver operating characteristic curves for TE, RTE, and ARFI

We calculated the best cut-off values for TE, RTE, and ARFI for any fibrosis (F0 vs. F1, 2, 3, 4), significant fibrosis (F0, 1 vs. F2, 3, 4), and cirrhosis (F0, 1, 2, 3 vs. F4), and their corresponding AUROCs (Table 4). The AUROC values were as follows: TE 0.878, RTE 0.834, and ARFI 0.807 for predicting fibrosis (no significant difference between the three curves); TE 0.897, RTE 0.751, and ARFI 0.815 for predicting significant fibrosis (TE better than RTE with p < 0.01, no significant difference between TE and ARFI, or between ARFI and RTE); TE 0.922, RTE 0.852, ARFI 0.934 for predicting cirrhosis (no significant difference between the three curves).
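The three diagnostic targets are simple dichotomizations of the Metavir stage, which can be made explicit in a short sketch. The function name and the dictionary keys are hypothetical labels chosen for illustration.

```python
def fibrosis_targets(stage):
    """Map a Metavir stage (0-4) to the three binary outcomes analyzed."""
    if stage not in (0, 1, 2, 3, 4):
        raise ValueError("Metavir stage must be an integer from 0 to 4")
    return {
        "fibrosis": stage >= 1,              # F0 vs. F1-F4
        "significant_fibrosis": stage >= 2,  # F0-F1 vs. F2-F4
        "cirrhosis": stage == 4,             # F0-F3 vs. F4
    }
```

Each binary outcome is then paired with the continuous measurement (kPa, elastic index, or m/s) to build one ROC curve per technique and per target.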

Table 4 Cut-off values for transient elastography (TE), real-time elastography (RTE), and acoustic radiation force impulse (ARFI) imaging for fibrosis, significant fibrosis, and cirrhosis, as well as their corresponding AUROCs

Considering only AUROCs >0.9, which is the commonly accepted threshold to classify a test as highly accurate [22], we found that the methods with the highest diagnostic accuracy were TE and ARFI for the diagnosis of cirrhosis, with the best cut-offs set at 9.2 kPa for TE and 1.7 m/s for ARFI. Regarding the diagnosis of fibrosis and significant fibrosis, the best AUROCs were observed for TE, with the best cut-offs set at 6.3 kPa for fibrosis and 7.8 kPa for significant fibrosis. The AUROCs were not affected by differences in the prevalence of fibrosis stages; they were exactly the same with and without the prevalence correction. In the intention-to-diagnose analysis maximizing sensitivity, the AUROCs of TE decreased from 0.878 to 0.853 (95% CI 0.747–0.927) for predicting fibrosis, from 0.897 to 0.874 (95% CI 0.775–0.940) for predicting significant fibrosis, and from 0.922 to 0.867 (95% CI 0.766–0.936) for predicting cirrhosis. In the intention-to-diagnose analysis maximizing specificity, the AUROCs of TE decreased to 0.802 (95% CI 0.691–0.887) for predicting fibrosis, 0.837 (95% CI 0.731–0.914) for predicting significant fibrosis, and 0.688 (95% CI 0.568–0.792) for predicting cirrhosis. Comparison of the AUROC curves of the three methods in the intention-to-diagnose analysis failed to demonstrate significant differences between the various curves, although a trend toward significance was found in favor of ARFI versus TE for the diagnosis of cirrhosis in the analysis maximizing specificity (ARFI 0.934 vs. TE 0.688, p = 0.0642).

Discussion

Transient elastography, RTE, and ARFI are the techniques most commonly used to measure liver stiffness, which is correlated with liver fibrosis. The physical principles on which these techniques are based are different: TE and ARFI use shear wave elastography, while RTE uses strain tissue elastography. The aim of our study was to compare these methods with histological parameters in the same group of patients.

In order to reduce confounding factors, we performed all examinations in the same population and at the same time: only 15 days elapsed between TE, RTE, and ARFI. We also obtained measurements entirely from the right lobe, thus eliminating interlobe variations. In comparative studies with TE, it is important to restrict sampling to one lobe, because TE only explores the right liver, and significant interlobe variability has been shown for ARFI [23, 24].

The results of all three methods were strongly correlated with fibrosis and not with necro-inflammatory activity, corrected for fibrosis stage. This is partially at odds with the findings of Lupsor et al. [20], who report that both fibrosis and necro-inflammatory activity, but not steatosis, influenced ARFI. In our study, the number of patients with fatty liver was small, so we could not investigate the effect of steatosis on these techniques. We found that TE and ARFI are both highly effective in diagnosing cirrhosis, but that TE is probably more accurate in predicting significant fibrosis (AUROC TE 0.897 vs. ARFI 0.815), although we could not demonstrate a significant difference between the two curves. Our results are at variance with three studies that found similar accuracies of TE and ARFI in diagnosing significant fibrosis [19, 24, 25], but are consistent with two other studies that found the same diagnostic accuracy for cirrhosis, but better performance of TE in predicting significant fibrosis [15, 20].

TE, however, has limitations, because in 15% of our patients it was unsuccessful. Our failure rate is in agreement with a large study on TE feasibility [7]; the most common reason for failure was obesity. A special TE probe for overweight people has been produced, but preliminary data show that the cut-offs for fibrosis obtained with the new probe may be different from the standard cut-offs obtained with the regular probe [26]. In this case, the diagnostic accuracy of TE should be re-calculated. In our study, RTE and ARFI were successful in all patients, but other studies have reported a failure rate of 5–8% for these techniques too [15, 20]. Not surprisingly, when we performed an intention-to-diagnose analysis of our data, TE lost ground and ARFI remained the only highly accurate method for diagnosing cirrhosis.

In our study, RTE AUROCs were inferior to TE and ARFI, but RTE still showed good accuracy for the diagnosis of fibrosis (AUROC 0.834) and cirrhosis (AUROC 0.852). Data from the literature regarding this issue are conflicting, with two studies reporting low accuracies of RTE in the diagnosis of both significant fibrosis and cirrhosis [11, 18], and three other studies showing the opposite results [13, 16, 27]. The reason for this variability is most likely the fact that RTE technology and the equations used to calculate tissue elasticity are rapidly changing. Further studies are therefore needed to fully explore the potential of RTE, and it is surely premature to conclude that RTE has a lower diagnostic performance than the other two techniques.

Despite showing better diagnostic accuracy, ARFI has a very narrow range of measurements, with only 0.27 m/s separating the cut-offs for significant fibrosis and cirrhosis. Reproducibility of measurements is therefore crucial for ARFI. Preliminary data show that this technique has good intra- and inter-observer agreement [15], but these results should be confirmed by further investigations.

A limitation of our study is the overprevalence of low fibrosis stages, but it is unlikely that it altered our results, since the AUROCs did not change after correcting for the varying prevalences of fibrosis stages. It is also unlikely that the seven-month interval that elapsed from liver biopsy to the performance of the examinations influenced the consistency of our results. The progression of fibrosis in CLD is slow [28–30], and we excluded from our study patients with acute and subacute hepatitis who could rapidly progress to advanced fibrosis stages.

In conclusion, our study, performed in a mixed population of normal subjects and CLD patients, showed that TE and ARFI provide high diagnostic accuracy in the prediction of cirrhosis. When feasible, TE is probably the best method to screen for CLD in the general population and to identify significant fibrosis, but larger studies are needed to reach statistical significance.